The simplest, fastest repository for training/finetuning medium-sized GPTs, inspired by Andrej Karpathy's nanoGPT.
- Install `huggingface_hub`.
- If you are in Mainland China, set the Hugging Face mirror first:

  ```shell
  export HF_ENDPOINT=https://hf-mirror.com
  ```

- Download the FineWeb dataset:

  ```shell
  ./fineweb-dataset.sh
  ```

- Modify the target folder path in the script before further use.
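As a sketch of what the target folder typically ends up holding: tokenized shards are commonly stored as flat binary files of `uint16` token ids, since the GPT-2 vocabulary (50,257 tokens) fits in 16 bits. The file name `shard_000.bin`, the helper functions, and the sample token ids below are illustrative, not the repository's actual layout:

```python
import os
import tempfile

import numpy as np

def write_shard(tokens, path):
    """Write a list of GPT-2 token ids (all < 2**16) as a flat uint16 binary file."""
    arr = np.asarray(tokens, dtype=np.uint16)
    arr.tofile(path)

def read_shard(path):
    """Read a flat uint16 binary shard back into a NumPy array."""
    return np.fromfile(path, dtype=np.uint16)

# Round-trip a few hypothetical token ids through a temporary "target folder".
target_dir = tempfile.mkdtemp()
shard_path = os.path.join(target_dir, "shard_000.bin")
write_shard([15496, 995, 0], shard_path)
print(read_shard(shard_path).tolist())  # back to [15496, 995, 0]
```

Storing raw `uint16` arrays keeps shards compact and lets the training loop memory-map them instead of loading everything into RAM.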
- Install `datasets` and `tiktoken`.
- Here is the final result graph of our 124M nanoGPT compared with OpenAI GPT-2:
  - Left: train/val loss against the GPT-2 baseline.
  - Right: HellaSwag validation accuracy against OpenAI GPT-2.
- Process the final results to generate the graphs.
- Formalize the README.
- Train the model on an RTX 4090.