
Paper Notes: Attention Is All You Need


Source paper: Attention Is All You Need

Model Architecture

(Figure: Transformer model architecture)

Scaled Dot-Product Attention

$$\mathrm{Attention}(Q, K, V) = \mathrm{softmax}\!\left(\frac{QK^{T}}{\sqrt{d_k}}\right)V$$

Reference link
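
A minimal NumPy sketch of the scaled dot-product attention formula above. The function name, shapes, and random inputs are my own illustrative choices, not from the paper:

```python
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    """Scaled dot-product attention.

    Q: (n_q, d_k), K: (n_k, d_k), V: (n_k, d_v) -> output: (n_q, d_v)
    """
    d_k = Q.shape[-1]
    # Query-key similarity scores, scaled by sqrt(d_k)
    scores = Q @ K.T / np.sqrt(d_k)
    # Softmax over the key dimension so each query's weights sum to 1
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    # Output is a weighted sum of the value vectors
    return weights @ V

# Toy example: 4 queries/keys, d_k = d_v = 8 (arbitrary small sizes)
rng = np.random.default_rng(0)
Q = rng.normal(size=(4, 8))
K = rng.normal(size=(4, 8))
V = rng.normal(size=(4, 8))
print(scaled_dot_product_attention(Q, K, V).shape)  # (4, 8)
```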

Multi-Head Attention

$$\mathrm{MultiHead}(Q, K, V) = \mathrm{Concat}(\mathrm{head}_1, \dots, \mathrm{head}_h)W^{O}$$

$$\text{where } \mathrm{head}_i = \mathrm{Attention}(QW_i^{Q}, KW_i^{K}, VW_i^{V})$$
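
A hedged NumPy sketch of the multi-head formula: each head projects $Q$, $K$, $V$ with its own matrices, runs scaled dot-product attention, and the concatenated heads are mapped back by $W^{O}$. The dimensions and helper names below are illustrative assumptions, not values fixed by the paper:

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def multi_head_attention(Q, K, V, W_Q, W_K, W_V, W_O):
    """Multi-head attention built from independent scaled dot-product heads.

    Q, K, V:        (n, d_model)
    W_Q, W_K, W_V:  lists of h projection matrices, each (d_model, d_k)
    W_O:            (h * d_k, d_model) output projection
    """
    heads = []
    for wq, wk, wv in zip(W_Q, W_K, W_V):
        # Project inputs into this head's subspace, then attend
        q_i, k_i, v_i = Q @ wq, K @ wk, V @ wv
        d_k = q_i.shape[-1]
        weights = softmax(q_i @ k_i.T / np.sqrt(d_k))
        heads.append(weights @ v_i)
    # Concatenate all heads and apply the output projection W^O
    return np.concatenate(heads, axis=-1) @ W_O

# Toy self-attention example: d_model = 16, h = 4 heads, d_k = d_model / h = 4
rng = np.random.default_rng(0)
n, d_model, h = 5, 16, 4
d_k = d_model // h
X = rng.normal(size=(n, d_model))
W_Q = [rng.normal(size=(d_model, d_k)) for _ in range(h)]
W_K = [rng.normal(size=(d_model, d_k)) for _ in range(h)]
W_V = [rng.normal(size=(d_model, d_k)) for _ in range(h)]
W_O = rng.normal(size=(h * d_k, d_model))
print(multi_head_attention(X, X, X, W_Q, W_K, W_V, W_O).shape)  # (5, 16)
```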