“Attention Is All You Need”: Transformer Paper Review

"Attention Is All You Need" is a paper published in 2017 that introduces the Transformer architecture.

The Transformer has an encoder-decoder structure.

The encoder maps an input sequence $$(x_1, \dots, x_n)$$ to a sequence of continuous representations $$\mathbf{z} = (z_1, \dots, z_n)$$.
Given the encoder output $\mathbf{z}$, the decoder generates an output sequence of symbols $$(y_1, \dots, y_m)$$, one element at a time.

At each step the model is auto-regressive (AR): when generating the next symbol, it takes the previously generated symbols as additional input.
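The auto-regressive loop described above can be sketched in a few lines of Python. This is a minimal illustration, not the paper's implementation: `greedy_decode`, `next_symbol`, and `toy_next_symbol` are hypothetical names, the `<bos>`/`<eos>` markers are assumptions, and the toy function merely stands in for a trained decoder. Picking a single symbol per step is also a simplification chosen for brevity.

```python
def greedy_decode(next_symbol, z, bos="<bos>", eos="<eos>", max_len=10):
    """Generate y_1..y_m one symbol at a time.

    next_symbol(z, ys) is a hypothetical stand-in for the decoder: it sees the
    encoder output z and every symbol generated so far, and returns the next one.
    """
    ys = [bos]
    for _ in range(max_len):
        y = next_symbol(z, ys)   # conditions on all previous outputs
        ys.append(y)
        if y == eos:             # stop once the end marker is produced
            break
    return ys[1:]                # drop the start marker


def toy_next_symbol(z, ys):
    """Toy 'decoder' (assumption): copies z element by element, then emits <eos>."""
    i = len(ys) - 1              # number of symbols generated so far
    return z[i] if i < len(z) else "<eos>"


print(greedy_decode(toy_next_symbol, ["a", "b", "c"]))
# ['a', 'b', 'c', '<eos>']
```

The key point is that `ys` grows by one element per step and is passed back in on every call, which is what makes the generation auto-regressive.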

The Transformer uses stacked self-attention and point-wise fully connected (dense) layers in both the encoder and the decoder.

Encoder
In the paper, the authors build the encoder as a stack of N = 6 identical layers.
Each layer has two sub-layers:

multi-head self-attention mechanism
simple, position-wise fully connected feed-forward network

A residual connection is applied around each of the two sub-layers, followed by layer normalization.
That is, the output of each sub-layer is LayerNorm(x + Sublayer(x)),
where Sublayer(x) is the function implemented by the sub-layer itself.
To facilitate these residual connections, every sub-layer in the model, as well as the embedding layers, produces outputs of dimension $d_{model}$ (512 in the paper).
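The rule LayerNorm(x + Sublayer(x)) can be made concrete with a small sketch that operates on a single position vector. The names `layer_norm`, `residual_sublayer`, and `toy_sublayer` are hypothetical, the learned gain and bias of layer normalization are omitted, and the toy sub-layer is only a placeholder for attention or the feed-forward network.

```python
import math


def layer_norm(x, eps=1e-6):
    """Normalize one position's vector to zero mean and (nearly) unit variance.

    Assumption: the learned gain and bias parameters are left out for brevity.
    """
    mean = sum(x) / len(x)
    var = sum((v - mean) ** 2 for v in x) / len(x)
    return [(v - mean) / math.sqrt(var + eps) for v in x]


def residual_sublayer(x, sublayer):
    """Return LayerNorm(x + Sublayer(x)) for a single position vector x."""
    y = sublayer(x)
    # The element-wise sum is only defined if the sub-layer keeps the width d_model.
    if len(y) != len(x):
        raise ValueError("sub-layer must preserve the model dimension")
    return layer_norm([a + b for a, b in zip(x, y)])


def toy_sublayer(x):
    """Hypothetical sub-layer: a fixed element-wise scaling."""
    return [0.5 * v for v in x]


out = residual_sublayer([1.0, 2.0, 3.0, 4.0], toy_sublayer)
print(len(out))  # 4
```

The dimension check mirrors why every sub-layer and the embedding layers share the same output width: without it, x and Sublayer(x) could not be added.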

Decoder
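Like the encoder, the decoder is a stack of N = 6 identical layers.
In addition to the two sub-layers found in each encoder layer, each decoder layer inserts a third sub-layer that performs multi-head attention over the output of the encoder stack.
In the encoder's self-attention, the queries, keys, and values all come from the same source, whereas in this additional multi-head attention sub-layer the queries come from the decoder and the keys and values come from the encoder output.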
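The difference between the two kinds of attention lies only in where the queries, keys, and values come from, which the following sketch tries to show. It uses a single head of scaled dot-product attention and skips the learned projection matrices, both of which are simplifying assumptions; `softmax`, `attention`, `encoder_out`, and `decoder_state` are hypothetical names and the numbers are made up.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention(queries, keys, values):
    """Single-head scaled dot-product attention (no learned projections)."""
    d_k = len(keys[0])
    outputs = []
    for q in queries:
        # One score per key position, scaled by sqrt(d_k).
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d_k) for k in keys]
        weights = softmax(scores)
        # Weighted sum of the value vectors.
        outputs.append([
            sum(w * v[j] for w, v in zip(weights, values))
            for j in range(len(values[0]))
        ])
    return outputs


encoder_out = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]   # n = 3 encoder positions
decoder_state = [[0.5, 0.5], [1.0, -1.0]]            # m = 2 decoder positions

# Encoder self-attention: queries, keys, and values share one source.
self_attn = attention(encoder_out, encoder_out, encoder_out)

# Encoder-decoder attention: queries from the decoder, keys/values from the encoder.
cross_attn = attention(decoder_state, encoder_out, encoder_out)

print(len(self_attn), len(cross_attn))  # 3 2
```

The output always has one vector per query, so encoder-decoder attention yields one result per decoder position even though it reads from all encoder positions.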
