Replication of experiments on compression-aware architectures

To replicate the experiments, there are three steps:

Install the fairseq code located in the fairseq directory
Preprocess the data
Run the training commands

Installing Fairseq

Follow the installation instructions of the original fairseq repository.

Processing the data

Download the OpenWebText corpus
Preprocess according to the instructions for GPT-2 BPE preprocessing in fairseq
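
The two preprocessing stages can be sketched as the Python helper below, which only builds the command lines and does not run them. It assumes the GPT-2 BPE recipe from fairseq's RoBERTa pretraining example (the `examples.roberta.multiprocessing_bpe_encoder` module followed by `fairseq-preprocess`). The `gpt2_bpe` directory, the `owt.*` file names, the `data-bin/owt` output directory, and the worker count are hypothetical placeholders, not values taken from this repository.

```python
import shlex


def bpe_encode_command(raw_file, bpe_file, bpe_dir="gpt2_bpe", workers=8):
    """Build the argv for fairseq's GPT-2 BPE encoder (run from the fairseq root).

    bpe_dir is assumed to hold encoder.json and vocab.bpe downloaded beforehand.
    """
    return [
        "python", "-m", "examples.roberta.multiprocessing_bpe_encoder",
        "--encoder-json", f"{bpe_dir}/encoder.json",
        "--vocab-bpe", f"{bpe_dir}/vocab.bpe",
        "--inputs", raw_file,
        "--outputs", bpe_file,
        "--keep-empty",
        "--workers", str(workers),
    ]


def binarize_command(prefix, destdir, bpe_dir="gpt2_bpe", workers=8):
    """Build the argv that binarizes the BPE-encoded train/valid splits.

    Assumes the encoded splits are named <prefix>.train.bpe and <prefix>.valid.bpe.
    """
    return [
        "fairseq-preprocess", "--only-source",
        "--srcdict", f"{bpe_dir}/dict.txt",
        "--trainpref", f"{prefix}.train.bpe",
        "--validpref", f"{prefix}.valid.bpe",
        "--destdir", destdir,
        "--workers", str(workers),
    ]


if __name__ == "__main__":
    # Hypothetical file names; adjust to wherever the corpus was extracted.
    for split in ("train", "valid"):
        print(shlex.join(bpe_encode_command(f"owt.{split}.raw", f"owt.{split}.bpe")))
    print(shlex.join(binarize_command("owt", "data-bin/owt")))
```

The printed commands can be pasted into a shell after checking them against the fairseq instructions for the installed version.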

Running the commands

The training commands are listed in commands.txt. Before running them, change the paths so that they point to your dataset location and to the checkpoint location given by --save-dir.
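
As a convenience, the checkpoint path can be rewritten programmatically. The sketch below assumes each command passes the checkpoint directory as `--save-dir <path>` or `--save-dir=<path>` with no spaces in the path. The function name `set_save_dir` and the sample command are hypothetical. The dataset path is not handled here and still has to be edited by hand.

```python
import re

# Matches "--save-dir <path>" or "--save-dir=<path>"; the path must not contain spaces.
SAVE_DIR_PATTERN = re.compile(r"(--save-dir(?:\s+|=))(\S+)")


def set_save_dir(command, new_dir):
    """Return the command with the value of --save-dir replaced by new_dir."""
    # A function replacement avoids backslash-escape surprises in new_dir.
    return SAVE_DIR_PATTERN.sub(lambda m: m.group(1) + new_dir, command)


if __name__ == "__main__":
    # Hypothetical command, for illustration only.
    example = "fairseq-train data-bin/owt --save-dir /old/checkpoints"
    print(set_save_dir(example, "/scratch/me/checkpoints"))
```

Applying the function to every line of commands.txt leaves lines without --save-dir unchanged.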

Maxout and bottleneck compression code

The code for the compression layers is in ./fairseq/fairseq/modules/transformer_layer.py. Search that file for "bottleneck" and "maxout".
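
The search can also be done with a short script. This is a minimal sketch that assumes it is run from the repository root. The helper name `find_keyword_lines` is hypothetical, and the match is a case-insensitive substring test, so it will also report comments and variable names that contain either word.

```python
from pathlib import Path

KEYWORDS = ("bottleneck", "maxout")


def find_keyword_lines(text, keywords=KEYWORDS):
    """Return (line_number, stripped_line) for each line containing any keyword."""
    hits = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        lowered = line.lower()
        if any(keyword in lowered for keyword in keywords):
            hits.append((lineno, line.strip()))
    return hits


if __name__ == "__main__":
    path = Path("./fairseq/fairseq/modules/transformer_layer.py")
    if path.exists():
        for lineno, line in find_keyword_lines(path.read_text(encoding="utf-8")):
            print(f"{path}:{lineno}: {line}")
    else:
        print(f"{path} not found; run this from the repository root")
```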