A simple implementation of Accelerating Large Language Model Decoding with Speculative Sampling in NumPy for GPT-2. See main.py. I also wrote a blog post for this implementation.
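For orientation, here is a minimal sketch of the accept/reject step that speculative sampling is built around, as described in the paper. It is not taken from main.py: the names `accept_or_resample` and `max_fn`, the argument layout, and the use of `np.random.default_rng` are all assumptions made for illustration. The draft model proposes K tokens, the target model scores all of them in one pass, and each proposal is kept with probability min(1, q/p), where p and q are the draft and target probabilities of that token.

```python
import numpy as np


def max_fn(x):
    """Clip negative entries to zero and renormalize to a distribution."""
    x = np.maximum(x, 0.0)
    return x / x.sum()


def accept_or_resample(draft_tokens, p_draft, q_target, rng):
    """Verify one round of K draft tokens (hypothetical helper, not main.py).

    draft_tokens: K token ids sampled from the draft model
    p_draft:      (K, vocab) draft probabilities at each proposed position
    q_target:     (K + 1, vocab) target probabilities at those positions,
                  plus one extra position after the last draft token
    Returns the tokens to append to the sequence (between 1 and K + 1).
    """
    out = []
    for t, tok in enumerate(draft_tokens):
        # Accept the draft token with probability min(1, q / p).
        if rng.random() < min(1.0, q_target[t, tok] / p_draft[t, tok]):
            out.append(int(tok))
        else:
            # On rejection, resample from the normalized residual
            # max(0, q - p) and discard the remaining draft tokens.
            residual = max_fn(q_target[t] - p_draft[t])
            out.append(int(rng.choice(len(residual), p=residual)))
            return out
    # Every draft token was accepted: take one more token from the target.
    out.append(int(rng.choice(q_target.shape[1], p=q_target[-1])))
    return out
```

When the draft and target distributions agree, every proposal is accepted and a round yields K + 1 tokens for a single target-model pass, which is where the speedup comes from.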
Install Dependencies:
pip install -r picoGPT/requirements.txt
Tested on Python 3.9.10.
Usage:
python main.py \
--prompt "Alan Turing theorized that computers would one day become" \
--n_tokens_to_generate 40 \
--draft_model_size "124M" \
--target_model_size "1558M" \
--K 4
Which outputs:
Autoregressive Decode
---------------------
Time = 71.64s
Text = Alan Turing theorized that computers would one day become so powerful that they would be able to think like humans.
In the 1950s, he proposed a way to build a computer that could think like a human. He called it the "T
Speculative Decode
------------------
Time = 30.11s
Text = Alan Turing theorized that computers would one day become so powerful that they would be able to think for themselves. But it's not just computers that are capable of thinking for themselves.
In fact, the brain is a computer, and it's capable