Mytransformer

A small encoder-decoder transformer implemented by Peng.
This is just a runnable demo.

Install requirements

Run the command below to install requirements:
pip install ./Mytransformer

Train

The training script is train_main.py. Set the hyperparameters there; edit args=ModelArgs(...) to change the model parameters.
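
A minimal sketch of what this might look like (the import path and the exact ModelArgs field names are assumptions based on a llama3-style ModelArgs; check the actual code in this repo):

from mytransformer import ModelArgs   # assumed module name; adjust to the real package layout

args = ModelArgs(
    dim=256,          # embedding / hidden size
    n_layers=4,       # number of encoder and decoder layers
    n_heads=8,        # attention heads
    vocab_size=1000,  # keep this small (see the remark below)
    max_seq_len=128,  # maximum source/target sequence length
)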

Predict

The prediction script is predict_main.py. The model and vocabulary parameters used for prediction must be the same as those used during training.
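
In practice this means rebuilding an identical ModelArgs before loading the trained weights. A hedged sketch follows; Transformer, ModelArgs, the import path, and the checkpoint filename are assumptions, not necessarily the repo's actual names:

import torch
from mytransformer import ModelArgs, Transformer   # assumed names; adjust to the real code

args = ModelArgs(dim=256, n_layers=4, n_heads=8, vocab_size=1000, max_seq_len=128)  # identical to training
model = Transformer(args)
model.load_state_dict(torch.load("model.pt"))       # checkpoint path is an assumption
model.eval()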

Remark

At least the code runs and the loss goes down during training. However, the model's translation accuracy is close to zero. I don't know why; maybe it's the tokenizer, or maybe the vocabulary size is too big...
New: The poor performance is caused by the large vocabulary (the vocabulary dimension is 128000). Consider increasing the model size and the amount of training data to compensate.
New: I set the vocabulary size to 1000, but the results were still terrible. I still don't know why. :(
New: Found it! I hadn't implemented a KV cache, so during prediction the decoder was given only the most recent token instead of the whole generated prefix. It works now!
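
A small, self-contained sketch of the fix: without a KV cache, the decoder must be re-fed the entire generated prefix at every step, not just the last token. decoder_step below is a dummy stand-in, not this repo's decoder:

import torch

def decoder_step(tgt_tokens: torch.Tensor) -> torch.Tensor:
    # Dummy stand-in: returns random logits over a tiny vocabulary for each position.
    vocab_size = 10
    return torch.randn(tgt_tokens.size(0), tgt_tokens.size(1), vocab_size)

def greedy_decode(bos_id: int = 1, eos_id: int = 2, max_len: int = 20) -> list:
    generated = [bos_id]
    for _ in range(max_len):
        # Re-feed the full prefix every step. Feeding only generated[-1:]
        # (the bug described above) throws away all previous context.
        tgt = torch.tensor([generated])
        logits = decoder_step(tgt)
        next_id = int(logits[0, -1].argmax())
        generated.append(next_id)
        if next_id == eos_id:
            break
    return generated

print(greedy_decode())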

References

https://github.com/meta-llama/llama3
https://zh-v2.d2l.ai/
