Source paper: Bidirectional Attention Flow for Machine Comprehension
SQuAD dataset
The Stanford Question Answering Dataset
Stanford Question Answering Dataset (SQuAD) is a reading comprehension dataset, consisting of questions posed by crowdworkers on a set of Wikipedia articles, where the answer to every question is a segment of text, or span, from the corresponding reading passage, or the question might be unanswerable.
Character Embedding Layer
Word Embedding Layer
GloVe (Pennington et al., 2014)
After obtaining the character-level and word-level embeddings, concatenate them and feed the result into a highway network to obtain the final embedding sequence.
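The concatenate-then-highway step can be sketched in NumPy as follows. This is a minimal illustration, not the paper's implementation: the sizes, the random weights, the ReLU candidate transform, and the row-per-word layout `[T, dim]` are all assumptions made for the example, and only a single highway layer is shown.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def highway_layer(x, W_h, b_h, W_t, b_t):
    """One highway layer applied to each row of x (shape [seq_len, dim])."""
    h = np.maximum(0.0, x @ W_h + b_h)   # candidate transform (ReLU is an assumption)
    t = sigmoid(x @ W_t + b_t)           # transform gate, values in (0, 1)
    return t * h + (1.0 - t) * x         # gate mixes the transform with the raw input

rng = np.random.default_rng(0)
T, char_dim, word_dim = 5, 4, 6          # hypothetical sizes
char_emb = rng.normal(size=(T, char_dim))   # stand-in for character-level embeddings
word_emb = rng.normal(size=(T, word_dim))   # stand-in for GloVe word embeddings

x = np.concatenate([char_emb, word_emb], axis=1)   # [T, char_dim + word_dim]
dim = x.shape[1]
W_h, W_t = rng.normal(size=(dim, dim)), rng.normal(size=(dim, dim))
b_h, b_t = np.zeros(dim), np.full(dim, -1.0)

embedded = highway_layer(x, W_h, b_h, W_t, b_t)
print(embedded.shape)
```

With a strongly negative gate bias the layer passes its input through almost unchanged, which is the property that makes highway layers easy to stack.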
Contextual Embedding Layer
Both the context word sequence and the query word sequence are passed through a bi-directional LSTM, and the outputs of the two directions are concatenated, giving:
\[\mathbf H \in \mathbb R^{2d\times T}\] \[\mathbf U \in \mathbb R^{2d\times J}\]
Attention Flow Layer
Similarity Matrix $\mathbf S\in \mathbb R^{T\times J}$
\[\mathbf S_{tj} = \alpha(\mathbf H_{:t}, \mathbf U_{:j})\in \mathbb R\] \[\alpha(\mathbf h,\mathbf u) = \mathbf w^\top[\mathbf h;\mathbf u;\mathbf h\circ \mathbf u]\]
Context-to-query Attention
attention weight $\mathbf a_t\in \mathbb R^J$
\[\mathbf a_t = \mathrm {softmax}(\mathbf S_{t:})\]
attended query vector $\tilde {\mathbf U}_{:t}$
\[\tilde {\mathbf U}_{:t} = \sum_j \mathbf a_{tj}\mathbf U_{:j}\]
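The similarity matrix and the context-to-query attention can be sketched in NumPy, keeping the column convention used in these notes ($\mathbf H$ is $2d\times T$, $\mathbf U$ is $2d\times J$). The function names, sizes, and random inputs are assumptions for illustration. The dot product with the concatenation $[\mathbf h;\mathbf u;\mathbf h\circ\mathbf u]$ is split into three terms so that no $T\times J\times 6d$ tensor has to be built.

```python
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)   # subtract the max for numerical stability
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def similarity_matrix(H, U, w):
    """S[t, j] = w^T [h; u; h*u] with h = H[:, t], u = U[:, j]."""
    two_d = H.shape[0]
    w_h, w_u, w_hu = w[:two_d], w[two_d:2 * two_d], w[2 * two_d:]
    # w^T [h; u; h*u] = w_h.h + w_u.u + sum_k w_hu[k] * h[k] * u[k]
    return (w_h @ H)[:, None] + (w_u @ U)[None, :] + (H * w_hu[:, None]).T @ U

def context_to_query(S, U):
    """Return A (T x J attention weights) and U_tilde (2d x T)."""
    A = softmax(S, axis=1)    # row t is a_t, a distribution over the J query words
    U_tilde = U @ A.T         # column t is sum_j a_tj * U[:, j]
    return A, U_tilde

rng = np.random.default_rng(0)
two_d, T, J = 4, 3, 2                     # hypothetical sizes
H = rng.normal(size=(two_d, T))           # stand-in for contextual context embeddings
U = rng.normal(size=(two_d, J))           # stand-in for contextual query embeddings
w = rng.normal(size=3 * two_d)            # trainable weight vector of size 6d

S = similarity_matrix(H, U, w)
A, U_tilde = context_to_query(S, U)
print(S.shape, A.shape, U_tilde.shape)
```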
Query-to-context Attention
attention weight $\mathbf b\in \mathbb R^T$
\[\mathbf b = \mathrm {softmax}(\mathbf m), \quad \mathbf m_t = \max_j \mathbf S_{tj}\]
attended context vector $\tilde {\mathbf h} = \sum_t \mathbf b_t \mathbf H_{:t} \in \mathbb R^{2d}$
Tiling $\tilde {\mathbf h}$ $T$ times across the columns gives:
\[\tilde {\mathbf H}\in \mathbb R^{2d\times T}\]
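A minimal NumPy sketch of query-to-context attention, following the BiDAF paper in applying a softmax over the context positions to the row-wise maxima of $\mathbf S$. The function name and the toy inputs are assumptions for illustration.

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())   # subtract the max for numerical stability
    return e / e.sum()

def query_to_context(S, H):
    """Return b (T,), h_tilde (2d,) and H_tilde (2d x T)."""
    b = softmax(S.max(axis=1))   # max over query words, then softmax over context words
    h_tilde = H @ b              # weighted sum of the context columns
    H_tilde = np.tile(h_tilde[:, None], (1, H.shape[1]))   # repeat T times along columns
    return b, h_tilde, H_tilde

# Toy inputs: T = 3 context words, J = 2 query words, 2d = 4.
S = np.array([[0.0, 1.0],
              [2.0, 0.0],
              [1.0, 1.0]])
H = np.arange(12.0).reshape(4, 3)

b, h_tilde, H_tilde = query_to_context(S, H)
print(b, H_tilde.shape)
```

In this toy case the second context word has the largest similarity to any query word, so it receives the largest weight in `b`.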
Query-aware representation of each context word $\mathbf G_{:t}$
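\[\mathbf G_{:t} = \beta(\mathbf H_{:t}, \tilde {\mathbf U}_{:t}, \tilde {\mathbf H}_{:t})\in \mathbb R^{d_G}\] \[\beta(\mathbf h,\tilde {\mathbf u}, \tilde {\mathbf h}) = [\mathbf h;\tilde {\mathbf u};\mathbf h\circ \tilde {\mathbf u};\mathbf h \circ \tilde {\mathbf h}]\in \mathbb R^{8d}\]
so $d_G = 8d$ and $\mathbf G \in \mathbb R^{8d\times T}$.
Modeling Layer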
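Feed $\mathbf G$ into a bi-LSTM to obtain $\mathbf M \in \mathbb R^{2d\times T}$. Each column of $\mathbf M$ can be viewed as the representation of a context word that takes into account both its surrounding context and the query.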
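As a shape check on the input to this layer, the merge $\beta$ that builds $\mathbf G$ can be sketched in NumPy. The inputs here are random stand-ins rather than real attention outputs, and the function name and sizes are assumptions for illustration.

```python
import numpy as np

def merge(H, U_tilde, H_tilde):
    """beta applied column-wise: G[:, t] = [h; u~; h*u~; h*h~]."""
    return np.concatenate([H, U_tilde, H * U_tilde, H * H_tilde], axis=0)

rng = np.random.default_rng(0)
two_d, T = 4, 3                              # hypothetical sizes (d = 2)
H = rng.normal(size=(two_d, T))              # contextual context embeddings
U_tilde = rng.normal(size=(two_d, T))        # stand-in for attended query vectors
H_tilde = rng.normal(size=(two_d, T))        # stand-in for tiled attended context vector

G = merge(H, U_tilde, H_tilde)
print(G.shape)   # four blocks of 2d rows each, i.e. 8d x T
```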
Output Layer
\[\mathbf p^1 = \mathrm {softmax}(\mathbf w_{p^1}^\top[\mathbf G;\mathbf M]), \quad \mathbf w_{p^1}\in \mathbb R^{10d}\]
End
Feed $\mathbf M$ into another bi-LSTM to obtain $\mathbf M^2 \in \mathbb R^{2d\times T}$:
\[\mathbf p^2 = \mathrm {softmax}(\mathbf w_{p^2}^\top[\mathbf G;\mathbf M^2])\]
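The two output distributions share the same form, so one helper covers both. The sketch below uses random stand-ins for $\mathbf G$, $\mathbf M$, and $\mathbf M^2$ (no LSTM is run), and the helper name `pointer_distribution` is hypothetical.

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())   # subtract the max for numerical stability
    return e / e.sum()

def pointer_distribution(w, G, M):
    """softmax(w^T [G; M]) over the T context positions."""
    GM = np.concatenate([G, M], axis=0)   # (8d + 2d) x T = 10d x T
    return softmax(w @ GM)

rng = np.random.default_rng(0)
d, T = 2, 5                               # hypothetical sizes
G = rng.normal(size=(8 * d, T))           # stand-in for the query-aware representation
M = rng.normal(size=(2 * d, T))           # stand-in for the modeling layer output
M2 = rng.normal(size=(2 * d, T))          # stand-in for the second bi-LSTM output
w_p1 = rng.normal(size=10 * d)
w_p2 = rng.normal(size=10 * d)

p1 = pointer_distribution(w_p1, G, M)     # start-index distribution
p2 = pointer_distribution(w_p2, G, M2)    # end-index distribution
print(p1.shape, p2.shape)
```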
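\[L(\theta) = -\frac 1 N \sum_{i=1}^{N}\left[\log(\mathbf p^1_{y_i^1}) + \log(\mathbf p^2_{y_i^2})\right]\]
Here $\mathbf p_k$ denotes the probability of position $k$, and $y_i^1$, $y_i^2$ are the true start and end indices of the $i$-th example.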
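The training objective is the negative log-likelihood of the true start and end indices, averaged over examples, as in the BiDAF paper. A small NumPy sketch follows; it assumes all passages in the batch share the same length $T$ (padding and masking are ignored), and the function name is hypothetical.

```python
import numpy as np

def span_loss(p1_batch, p2_batch, y1, y2):
    """Mean negative log-likelihood of the true start and end indices.

    p1_batch, p2_batch: arrays of shape [N, T] holding start/end distributions.
    y1, y2: integer arrays of length N with the true start/end positions.
    """
    n = np.arange(len(y1))
    log_start = np.log(p1_batch[n, y1])   # log p^1 at each true start index
    log_end = np.log(p2_batch[n, y2])     # log p^2 at each true end index
    return -np.mean(log_start + log_end)

p1_batch = np.array([[0.5, 0.25, 0.25],
                     [0.1, 0.8, 0.1]])
p2_batch = np.array([[0.25, 0.5, 0.25],
                     [0.1, 0.1, 0.8]])
y1 = np.array([0, 1])
y2 = np.array([1, 2])

loss = span_loss(p1_batch, p2_batch, y1, y2)
print(loss)
```

For these toy values the loss equals $-\frac{1}{2}(2\log 0.5 + 2\log 0.8) = -\log 0.4$.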
The answer span $(k, l)$, with $k\leq l$, is chosen to maximize the product $\mathbf p_k^1\mathbf p_l^2$.
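The span search can be done in a single left-to-right pass by tracking the best start position seen so far. The sketch below is one way to do it under the constraint $k\leq l$. The function name and the toy distributions are assumptions for illustration.

```python
def best_span(p1, p2):
    """Return (k, l, score) maximizing p1[k] * p2[l] subject to k <= l."""
    best = (0, 0, p1[0] * p2[0])
    best_start = 0                        # argmax of p1[0..l] seen so far
    for l in range(len(p2)):
        if p1[l] > p1[best_start]:
            best_start = l
        score = p1[best_start] * p2[l]    # best product for a span ending at l
        if score > best[2]:
            best = (best_start, l, score)
    return best

p1 = [0.1, 0.6, 0.3]   # toy start distribution
p2 = [0.5, 0.1, 0.4]   # toy end distribution
k, l, score = best_span(p1, p2)
print(k, l, score)
```

In this toy case the unconstrained maximum would pair start 1 with end 0, which is not a valid span. The constrained search instead returns start 1 and end 2.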