The Annotated Transformer
  • Conv nets require a number of operations to relate signals from two arbitrary input or output positions that grows with the distance between those positions.
  • Transformers reduce this to a constant number of operations between any pair of positions, though the total cost of the attention calculation grows quadratically with sequence length.
    • Self-attention, sometimes called intra-attention, is an attention mechanism relating different positions of a single sequence in order to compute a representation of that sequence (see the sketch below).
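
A minimal single-head sketch in PyTorch (the framework the Annotated Transformer uses) of scaled dot-product self-attention, showing both points above: any two positions interact in one matrix multiply, and the score matrix is seq_len × seq_len, which is where the quadratic cost comes from. The function name is illustrative, and the learned query/key/value projections and multi-head split of the real model are omitted for brevity:

```python
import math
import torch

def self_attention(x):
    """Illustrative single-head self-attention over a (seq_len, d_model) input."""
    d_k = x.size(-1)
    # Simplification: q, k, v are the raw inputs here; in the actual model
    # each comes from its own learned linear projection.
    q, k, v = x, x, x
    scores = q @ k.transpose(-2, -1) / math.sqrt(d_k)  # (seq_len, seq_len): quadratic in seq_len
    weights = scores.softmax(dim=-1)                   # attention distribution for each position
    return weights @ v                                 # each output is a mix of all positions

x = torch.randn(10, 64)   # sequence of 10 positions, model dimension 64
out = self_attention(x)   # shape (10, 64); one matmul relates every pair of positions
```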