Replicating the experiments takes three steps:
- Install the fairseq code located in the fairseq directory
- Preprocess the data
- Run the training commands
To install, follow the installation instructions of the original fairseq repo.
To preprocess the data:
- Download the OpenWebText corpus
- Preprocess it following the fairseq instructions for GPT-2 BPE preprocessing
The training commands are in commands.txt. Some of the paths in them must be changed to point to your dataset location or to the checkpoint location specified by --save-dir.
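Editing the paths by hand is fine for a few commands. If you prefer to script it, the following is a minimal sketch of one way to swap the value of a flag such as --save-dir in a command line. The helper name set_flag_value and the example command are hypothetical and are not taken from commands.txt; the sketch assumes each command sits on a single line. The dataset path is not passed through a flag, so it would still need to be changed separately.

```python
import shlex


def set_flag_value(command: str, flag: str, new_value: str) -> str:
    """Return `command` with the value of `flag` replaced by `new_value`.

    Handles both "--flag value" and "--flag=value" spellings.
    Hypothetical helper for illustration; not part of fairseq.
    """
    tokens = shlex.split(command)
    out = []
    replaced = False
    i = 0
    while i < len(tokens):
        tok = tokens[i]
        if tok == flag and i + 1 < len(tokens):
            # "--flag value": keep the flag, swap the token that follows it.
            out.extend([flag, new_value])
            i += 2
            replaced = True
        elif tok.startswith(flag + "="):
            # "--flag=value": rebuild the single token.
            out.append(f"{flag}={new_value}")
            i += 1
            replaced = True
        else:
            out.append(tok)
            i += 1
    if not replaced:
        raise ValueError(f"{flag} not found in command")
    return shlex.join(out)


# Made-up command line, used only to show the substitution.
example = "fairseq-train /old/data-bin --save-dir /old/checkpoints"
print(set_flag_value(example, "--save-dir", "/my/checkpoints"))
```

The helper raises an error when the flag is missing, so a command that does not set --save-dir is not silently left pointing at the wrong location.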
The code for the compression layers is in ./fairseq/fairseq/modules/transformer_layer.py. Search for "bottleneck" and "maxout".
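Any editor search or grep will do for this. As a small sketch, the script below lists every line of the file that mentions either keyword, assuming it is run from the repository root. The function name find_keywords is hypothetical, and the match is case-insensitive so that class names containing the keywords are also reported.

```python
from pathlib import Path


def find_keywords(path, keywords=("bottleneck", "maxout")):
    """Return (line_number, keyword, stripped_line) for each case-insensitive match."""
    hits = []
    lines = Path(path).read_text(encoding="utf-8").splitlines()
    for lineno, line in enumerate(lines, start=1):
        lowered = line.lower()
        for kw in keywords:
            if kw in lowered:
                hits.append((lineno, kw, line.strip()))
    return hits


if __name__ == "__main__":
    # Path as given in this README, relative to the repository root.
    target = Path("./fairseq/fairseq/modules/transformer_layer.py")
    if target.exists():
        for lineno, kw, text in find_keywords(target):
            print(f"{target}:{lineno}: [{kw}] {text}")
    else:
        print(f"{target} not found; run this from the repository root")
```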