Mytransformer

A small encoder-decoder transformer implemented by Peng.
This is just a runnable demo.

Install requirements

Run the command below to install requirements:
pip install ./Mytransformer

Train

The training script is train_main.py. Set the hyperparameters there; edit args=ModelArgs(...) to change the model parameters.
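
A minimal sketch of what this might look like (the import path and the exact ModelArgs field names are assumptions based on a llama3-style ModelArgs; check the actual code in this repo):

from mytransformer import ModelArgs   # assumed module name; adjust to the real package layout

args = ModelArgs(
    dim=256,          # embedding / hidden size
    n_layers=4,       # number of encoder and decoder layers
    n_heads=8,        # attention heads
    vocab_size=1000,  # keep this small (see the remark below)
    max_seq_len=128,  # maximum source/target sequence length
)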

Predict

The prediction script is predict_main.py. The model and vocabulary parameters used for prediction must be the same as those used during training.
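
In practice this means rebuilding an identical ModelArgs before loading the trained weights. A hedged sketch follows; Transformer, ModelArgs, the import path, and the checkpoint filename are assumptions, not necessarily the repo's actual names:

import torch
from mytransformer import ModelArgs, Transformer   # assumed names; adjust to the real code

args = ModelArgs(dim=256, n_layers=4, n_heads=8, vocab_size=1000, max_seq_len=128)  # identical to training
model = Transformer(args)
model.load_state_dict(torch.load("model.pt"))       # checkpoint path is an assumption
model.eval()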

Remark

At least the code runs and the loss goes down during training. However, the model's translation accuracy is close to zero. I don't know why; maybe it's the tokenizer, or maybe the vocabulary size is too big...
New: The poor performance is caused by the large vocabulary (the vocabulary dimension is 128000). Consider increasing the model size and the amount of training data to compensate.
New: I set the vocabulary size to 1000, but the results were still terrible. I still don't know why. :(
New: Found it! I hadn't implemented a KV cache, so during prediction the decoder was given only the most recent token instead of the whole generated prefix. It works now!
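
A small, self-contained sketch of the fix: without a KV cache, the decoder must be re-fed the entire generated prefix at every step, not just the last token. decoder_step below is a dummy stand-in, not this repo's decoder:

import torch

def decoder_step(tgt_tokens: torch.Tensor) -> torch.Tensor:
    # Dummy stand-in: returns random logits over a tiny vocabulary for each position.
    vocab_size = 10
    return torch.randn(tgt_tokens.size(0), tgt_tokens.size(1), vocab_size)

def greedy_decode(bos_id: int = 1, eos_id: int = 2, max_len: int = 20) -> list:
    generated = [bos_id]
    for _ in range(max_len):
        # Re-feed the full prefix every step. Feeding only generated[-1:]
        # (the bug described above) throws away all previous context.
        tgt = torch.tensor([generated])
        logits = decoder_step(tgt)
        next_id = int(logits[0, -1].argmax())
        generated.append(next_id)
        if next_id == eos_id:
            break
    return generated

print(greedy_decode())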

References

https://github.com/meta-llama/llama3
https://zh-v2.d2l.ai/
