Paper source: Attention Is All You Need
Model Architecture
Scaled Dot-Product Attention
\[\mathrm{Attention}(Q, K, V) = \mathrm{softmax}\left(\frac{QK^T}{\sqrt{d_k}}\right)V\]
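A minimal NumPy sketch of the formula above, for a single (unbatched) query/key/value set; the function names and shapes are my own illustration, not code from the paper.

```python
import numpy as np

def softmax(x, axis=-1):
    # Subtract the row-wise max for numerical stability before exponentiating.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def scaled_dot_product_attention(Q, K, V):
    # Q: (seq_q, d_k), K: (seq_k, d_k), V: (seq_k, d_v)
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)      # (seq_q, seq_k), scaled by sqrt(d_k)
    weights = softmax(scores, axis=-1)   # attention weights sum to 1 over keys
    return weights @ V                   # (seq_q, d_v)
```

The division by $\sqrt{d_k}$ keeps the dot products from growing with the key dimension, which would otherwise push the softmax into regions with very small gradients.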
Multi-Head Attention
\[\mathrm{MultiHead}(Q, K, V) = \mathrm{Concat}(\mathrm{head}_1, \dots, \mathrm{head}_h)W^O\]
where $\mathrm{head}_i = \mathrm{Attention}(QW_i^Q, KW_i^K, VW_i^V)$
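Continuing the sketch above (and reusing `scaled_dot_product_attention` and `np` from it), a hedged illustration of multi-head attention with per-head projection matrices passed in as plain lists; the parameter layout is an assumption for clarity, not the paper's exact implementation.

```python
def multi_head_attention(Q, K, V, W_q, W_k, W_v, W_o):
    # W_q, W_k: lists of h matrices of shape (d_model, d_k)
    # W_v:      list of h matrices of shape (d_model, d_v)
    # W_o:      output projection of shape (h * d_v, d_model)
    heads = [
        scaled_dot_product_attention(Q @ wq, K @ wk, V @ wv)  # each: (seq_q, d_v)
        for wq, wk, wv in zip(W_q, W_k, W_v)
    ]
    # Concatenate the heads along the feature axis, then project back to d_model.
    return np.concatenate(heads, axis=-1) @ W_o

# Toy usage with hypothetical sizes: d_model=8, h=2 heads, d_k=d_v=4.
rng = np.random.default_rng(0)
X = rng.normal(size=(5, 8))                       # 5 tokens, d_model=8
W_q = [rng.normal(size=(8, 4)) for _ in range(2)]
W_k = [rng.normal(size=(8, 4)) for _ in range(2)]
W_v = [rng.normal(size=(8, 4)) for _ in range(2)]
W_o = rng.normal(size=(2 * 4, 8))
out = multi_head_attention(X, X, X, W_q, W_k, W_v, W_o)  # self-attention: (5, 8)
```

Each head attends with its own learned projections, so different heads can pick up different relations between positions; concatenating them and applying $W^O$ maps the result back to the model dimension.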