TensorFlow implementation of "Attention Is All You Need" (Transformer) [1].
The MNIST dataset is used to verify that the Transformer implementation works.
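For reference, a minimal sketch of the scaled dot-product attention at the core of the Transformer, as defined in [1]: Attention(Q, K, V) = softmax(QKᵀ / √d_k) V. The function below is illustrative only, not the API of this repository:

```python
import tensorflow as tf

def scaled_dot_product_attention(q, k, v):
    """Attention(Q, K, V) = softmax(QK^T / sqrt(d_k)) V, from [1]."""
    d_k = tf.cast(tf.shape(k)[-1], tf.float32)
    # (batch, seq_q, seq_k): similarity of every query with every key,
    # scaled by sqrt(d_k) to keep the softmax gradients well-behaved
    scores = tf.matmul(q, k, transpose_b=True) / tf.sqrt(d_k)
    weights = tf.nn.softmax(scores, axis=-1)  # rows of the attention map
    return tf.matmul(weights, v), weights
```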
The dataset is processed as follows so that each image can be treated as a sequence (a preprocessing sketch follows this list).
- Trim off the side columns of the square image.
  - (H X W) -> (H X W_trim)
  - H (Height) = W (Width) = 28
  - W_trim = 18
- The height axis is regarded as the sequence axis, and the width axis as the feature vector of each sequence step.
  - (H X W_trim) -> (S X F)
  - S (Sequence) = 28
  - F (Feature) = 18
- The target Y is defined as the reversed sequence of X, so that the target sequence differs from the input sequence.
- Because of this, the data appears upside down in the result figures.
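Below is a minimal sketch of this preprocessing, assuming the standard tf.keras MNIST loader; trimming 5 columns from each side (symmetric trimming is an assumption) yields W_trim = 18:

```python
import tensorflow as tf

(x_train, _), _ = tf.keras.datasets.mnist.load_data()  # (N, 28, 28) uint8

trim = (28 - 18) // 2  # 5 columns per side; symmetric trim is an assumption
x = x_train[:, :, trim:28 - trim].astype("float32") / 255.0  # (N, 28, 18)

# Height axis -> sequence (S = 28), width axis -> per-step feature (F = 18)
X = x             # input sequence,  shape (N, S, F)
Y = x[:, ::-1, :] # target: the input sequence reversed along the sequence axis
```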
Results table: per-class (0-9) Attention Map and Reconstruction images (omitted here).
Requirements:
- TensorFlow 2.4.0
- whiteboxlayer 0.2.1
[1] Ashish Vaswani et al., "Attention Is All You Need," Advances in Neural Information Processing Systems, 2017.