A small encoder-decoder transformer implemented by Peng.
This is just a runnable demo.
Run the command below to install requirements:
pip install ./Mytransformer
The training script is train_main.py. Change the hyperparameters there by editing args=ModelArgs(...) to adjust the model parameters.
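For example, a minimal sketch of what this might look like (the field names below are assumptions; check the actual ModelArgs definition in the repo):

    # Hypothetical example: the exact ModelArgs fields may differ in this repo.
    args = ModelArgs(
        vocab_size=1000,   # tokenizer vocabulary size
        d_model=256,       # embedding / hidden dimension
        n_heads=4,         # number of attention heads
        n_layers=4,        # encoder and decoder layers
        max_seq_len=128,   # maximum sequence length
    )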
The prediction script is predict_main.py. The model and vocabulary parameters used for prediction must be the same as those used during training.
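One way to guarantee this is to save the args alongside the checkpoint and reload them at prediction time. A rough sketch (the checkpoint keys and the Transformer class name are assumptions, not necessarily what this repo uses):

    import torch

    # Load the checkpoint and rebuild the model with the exact training-time args.
    ckpt = torch.load("checkpoint.pt", map_location="cpu")
    args = ckpt["args"]                  # hypothetical key holding the training ModelArgs
    model = Transformer(args)            # hypothetical model class name
    model.load_state_dict(ckpt["model"]) # hypothetical key holding the weights
    model.eval()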
At least the code now runs and the loss decreases during training. However, the model's translation accuracy is close to zero. I don't know why; maybe it's the tokenizer, or maybe the vocabulary size is too big...
New: The poor performance of the model is caused by the large vocabulary (the vocabulary dimension is 128,000). Consider increasing the model size and the amount of training data to compensate.
New: I set the vocabulary size to 1000, but the results were still terrible. I still don't know why. :(
New: Found it! I hadn't implemented a KV cache, so during inference the decoder was fed only the most recent token instead of the whole generated sequence. With that fixed, it works!
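For reference, here is a minimal sketch of greedy decoding without a KV cache (the model's call signature, the bos/eos token ids, and the function name are assumptions): the key point is that the decoder must receive the entire generated prefix at every step, not just the last token.

    import torch

    @torch.no_grad()
    def greedy_translate(model, src_ids, bos_id, eos_id, max_len=64):
        """Greedy decoding without a KV cache: re-feed the whole prefix each step."""
        ys = torch.tensor([[bos_id]], dtype=torch.long)   # (1, 1) decoder input
        for _ in range(max_len):
            # Without a KV cache the decoder must see the full prefix every step;
            # passing only the last token (the earlier bug) throws away all context.
            logits = model(src_ids, ys)                   # assumed shape: (1, len(ys), vocab)
            next_id = logits[:, -1, :].argmax(dim=-1, keepdim=True)
            ys = torch.cat([ys, next_id], dim=1)
            if next_id.item() == eos_id:
                break
        return ys.squeeze(0).tolist()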