bpe.py
function bytes_to_unicode():
Create a dictionary to map byte values to unicode characters
return the dictionary
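The outline does not say which characters the bytes map to. A minimal sketch, assuming the scheme popularized by the GPT-2 tokenizer (printable bytes keep their own code point, every other byte is shifted to an unused code point from 256 upward), might look like this:

```python
def bytes_to_unicode():
    """Map each of the 256 byte values to a distinct, printable unicode character."""
    # Assumption: these three ranges are the bytes that already print cleanly.
    bs = (
        list(range(ord("!"), ord("~") + 1))
        + list(range(ord("¡"), ord("¬") + 1))
        + list(range(ord("®"), ord("ÿ") + 1))
    )
    cs = bs[:]
    n = 0
    for b in range(256):
        if b not in bs:
            # Whitespace and control bytes get a stand-in code point >= 256.
            bs.append(b)
            cs.append(256 + n)
            n += 1
    return dict(zip(bs, (chr(c) for c in cs)))
```

The mapping is a bijection, so inverting it gives the byte decoder that a decode step would need.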
function get_pairs(word):
Initialize an empty set for pairs
Iterate through the word to find all pairs of consecutive characters
return the set of pairs
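The pair extraction above can be sketched in a couple of lines. This version assumes `word` is a string or a tuple of symbols; it is an illustration, not necessarily how the original file is written:

```python
def get_pairs(word):
    """Return the set of adjacent symbol pairs in word (a str or tuple of symbols)."""
    # zip stops at the shorter input, so a word of length 0 or 1 yields no pairs.
    return set(zip(word, word[1:]))
```

For example, `get_pairs(("h", "e", "l", "l", "o"))` contains `("l", "l")` once, because the result is a set.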
function get_file(filepath):
Open the file located at filepath
Read the contents of the file
return the contents
function get_encoder(filepath):
Load contents from the file using get_file
Create a dictionary mapping for encoding
return the encoder dictionary
class Encoder:
Initialize attributes: pat, cache, bpe_ranks, byte_encoder, decoder, byte_decoder, encoder
method __init__:
Set up byte to unicode mapping
Create encoder and decoder dictionaries
Initialize other required attributes
method bpe:
Perform BPE on the given text
return BPE applied text
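The outline leaves the merge procedure itself unstated. A minimal sketch of a greedy BPE merge loop, assuming `bpe_ranks` maps a symbol pair to its merge priority (lower merges first) and that the result is returned as space-separated symbols, could be:

```python
def bpe(token, bpe_ranks):
    """Repeatedly merge the lowest-ranked adjacent pair until no known pair remains."""
    word = tuple(token)
    while len(word) > 1:
        pairs = set(zip(word, word[1:]))
        # Choose the pair with the lowest rank; unknown pairs rank as infinity.
        best = min(pairs, key=lambda p: bpe_ranks.get(p, float("inf")))
        if best not in bpe_ranks:
            break  # nothing left that the merge table knows about
        first, second = best
        merged, i = [], 0
        while i < len(word):
            if i < len(word) - 1 and word[i] == first and word[i + 1] == second:
                merged.append(first + second)  # fuse the pair into one symbol
                i += 2
            else:
                merged.append(word[i])
                i += 1
        word = tuple(merged)
    return " ".join(word)
```

The function name and the `bpe_ranks` layout are assumptions for illustration. A real encoder would typically also memoize results per token, which is presumably what the `cache` attribute is for.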
method encode:
Encode input text using BPE
return encoded text
method encode_and_show_work:
Encode text and show computation steps
return encoded text
method decode:
Convert encoded text back to original text
return original text
class BPETokenizer:
Initialize attributes: encoder
method __init__:
Create an encoder instance
method __call__:
Tokenize input text using encoder
return tokens
method decode:
Convert tokens back to original text
return original text
model.py
class NewGELU:
method forward(input):
Apply GELU activation function on input
return activated output
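The outline does not give the formula. Assuming `NewGELU` refers to the tanh approximation of GELU used in GPT-2 style models, a scalar sketch in plain Python (the function name `new_gelu` is hypothetical) is:

```python
import math


def new_gelu(x):
    """Tanh approximation of GELU for a single float."""
    inner = math.sqrt(2.0 / math.pi) * (x + 0.044715 * x ** 3)
    return 0.5 * x * (1.0 + math.tanh(inner))
```

In a model this would be applied elementwise to a tensor; the scalar form is only meant to make the arithmetic visible.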
class CausalSelfAttention:
Initialize attributes: c_attn, n_head, n_embd, resid_dropout, c_proj, attn_dropout
method __init__(params):
Set up self-attention configurations using params
method forward(input):
Compute self-attention on input
Apply projection and dropout
return attention output
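To make the "causal" part concrete, here is a single-head sketch in plain Python. It assumes the queries, keys, and values have already been computed, and it leaves out the learned projections (`c_attn`, `c_proj`), multiple heads, and dropout that the class attributes suggest:

```python
import math


def causal_attention(q, k, v):
    """q, k, v are lists of T vectors (lists of floats); returns T output vectors."""
    d = len(q[0])
    out = []
    for t in range(len(q)):
        # Position t may only attend to positions 0..t; later ones are masked out.
        scores = [
            sum(a * b for a, b in zip(q[t], k[s])) / math.sqrt(d)
            for s in range(t + 1)
        ]
        # Numerically stable softmax over the visible positions.
        top = max(scores)
        exps = [math.exp(s - top) for s in scores]
        total = sum(exps)
        weights = [e / total for e in exps]
        # Weighted sum of the visible value vectors.
        out.append(
            [sum(w * v[s][j] for s, w in enumerate(weights)) for j in range(len(v[0]))]
        )
    return out
```

Because position 0 sees only itself, its output equals `v[0]` regardless of what comes later in the sequence.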
class Block:
Initialize attributes: attn, mlp, ln_1, ln_2, mlpf
method __init__(params):
Initialize self-attention, MLP layers and normalizations
method forward(input):
Normalize input, apply self-attention
Normalize result, apply MLP
return block output
class GPT:
Initialize attributes: lm_head, transformer, block_size
method get_default_config():
Create and return a default configuration dictionary
method __init__(config):
Set up GPT model with configuration
Initialize transformer and other components
method _init_weights():
Initialize model weights randomly or using a predefined scheme
method from_pretrained(model_name):
Load pretrained model weights from model_name
method configure_optimizers(train_config):
Set up the optimizer for training from the given training configuration
return optimizer
method forward(input):
Pass input through transformer layers
Apply language modeling head
return model output
method generate(input, max_length):
Generate sequence of tokens based on input up to max_length
return generated sequence
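The generation step can be sketched as a greedy autoregressive loop. Everything here is an assumption made for illustration: `next_token_logits` is a hypothetical callable standing in for the model's forward pass, `max_new_tokens` counts newly generated tokens, and the context is cropped to `block_size` before each call:

```python
def generate(next_token_logits, tokens, max_new_tokens, block_size):
    """Greedy decoding: append the highest-scoring next token, one step at a time."""
    tokens = list(tokens)
    for _ in range(max_new_tokens):
        context = tokens[-block_size:]  # the model only sees the last block_size tokens
        logits = next_token_logits(context)
        next_id = max(range(len(logits)), key=logits.__getitem__)  # argmax
        tokens.append(next_id)
    return tokens


def toy_model(context):
    """Hypothetical stand-in model over 5 tokens: always predicts (last + 1) mod 5."""
    nxt = (context[-1] + 1) % 5
    return [1.0 if i == nxt else 0.0 for i in range(5)]
```

Sampling from a temperature-scaled or top-k-filtered distribution would replace the argmax line; the outline does not say which strategy the original uses.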
trainer.py
class Trainer:
@staticmethod
method get_default_config():
Create a configuration object
Set device, dataloader parameters, and optimizer parameters
return the configuration object
method __init__(self, config, model, train_dataset):
Assign configurations, model, and dataset to self
Initialize optimizer as None
Create a dictionary for storing callbacks
Determine training device based on config
Move model to the training device
Print device information
Initialize iteration counters and timers
method add_callback(self, onevent, callback):
Add callback to the specified event in the callbacks dictionary
method set_callback(self, onevent, callback):
Set a single callback for the specified event, replacing any existing ones
method trigger_callbacks(self, onevent):
Iterate through callbacks of the specified event and execute them
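The three callback methods can be sketched with a `defaultdict` of lists. The class name `CallbackRegistry` is hypothetical, and passing the registry itself to each callback is an assumption; the outline only says the callbacks are executed:

```python
from collections import defaultdict


class CallbackRegistry:
    def __init__(self):
        # Maps an event name to the list of callbacks registered for it.
        self.callbacks = defaultdict(list)

    def add_callback(self, onevent, callback):
        self.callbacks[onevent].append(callback)

    def set_callback(self, onevent, callback):
        # Replaces whatever was registered for this event.
        self.callbacks[onevent] = [callback]

    def trigger_callbacks(self, onevent):
        # .get avoids creating an empty entry for events nobody registered.
        for callback in self.callbacks.get(onevent, []):
            callback(self)
```

Handing the owner object to the callback lets it read state such as the iteration counter without closing over globals.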
method run(self):
model = self.model
config = self.config
# Set up the optimizer using the model's method
self.optimizer = model.configure_optimizers(config)
# Set up the data loader
train_loader = DataLoader(
    self.train_dataset,
    sampler=RandomSampler(self.train_dataset, replacement=True, num_samples=int(1e10)),
    shuffle=False,
    pin_memory=True,
    batch_size=config.batch_size,
    num_workers=config.num_workers,
)
model.train()
Initialize iteration number and start time
Create an iterator from train_loader
while True:
    try:
        Fetch the next batch of data (x, y)
    except StopIteration:
        Reinitialize the data iterator
        Fetch the next batch of data (x, y)
    Move x and y to the training device
    Forward pass through the model to get logits and loss
    Backward pass to compute gradients
    Clip gradients if necessary
    Update model parameters
    Trigger 'on_batch_end' callbacks
    Increment the iteration number
    Record the current time and compute the time difference since the last iteration
    if termination condition is met (based on max iterations):
        break
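The control flow of the loop, in particular restarting the iterator on `StopIteration`, can be shown without PyTorch. In this sketch `batches` is any re-iterable collection and `step_fn` is a hypothetical stand-in for the forward pass, backward pass, gradient clipping, and optimizer step:

```python
import time


def run(batches, step_fn, max_iters):
    """Iterate over batches indefinitely, stopping after max_iters steps."""
    data_iter = iter(batches)
    iter_num = 0
    iter_time = time.time()
    while True:
        try:
            batch = next(data_iter)
        except StopIteration:
            # The data ran out: start a fresh pass and fetch again.
            data_iter = iter(batches)
            batch = next(data_iter)
        step_fn(batch)
        iter_num += 1
        # Time spent on this iteration, e.g. for a progress callback to report.
        now = time.time()
        iter_dt = now - iter_time
        iter_time = now
        if max_iters is not None and iter_num >= max_iters:
            break
    return iter_num
```

With three batches and `max_iters=7`, the batches are visited in the order 1, 2, 3, 1, 2, 3, 1. Passing `max_iters=None` makes the loop run until something else stops it.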
utils.py
function set_seed(seed):
Set random seed for the random library
Set random seed for numpy
Set random seed for torch for CPU
Set random seed for torch for all CUDA devices
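A sketch of the seeding steps follows. The `try`/`except ImportError` guards are an addition so the example also runs where numpy or torch is not installed; the original presumably imports both unconditionally:

```python
import random


def set_seed(seed):
    """Seed every random number generator the training code may touch."""
    random.seed(seed)
    try:
        import numpy as np
        np.random.seed(seed)
    except ImportError:
        pass  # numpy not installed; nothing to seed
    try:
        import torch
        torch.manual_seed(seed)            # CPU generator
        torch.cuda.manual_seed_all(seed)   # all CUDA devices; ignored without CUDA
    except ImportError:
        pass  # torch not installed; nothing to seed
```

Calling `set_seed` with the same value before two runs makes their draws from `random` identical, which is the property reproducible experiments rely on.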
function setup_logging(config):
Get the working directory from the config
Create the working directory if it does not exist
Write command-line arguments to args.txt in the working directory
Serialize and save the configuration to config.json in the working directory
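The logging setup can be sketched with the standard library alone. Here the configuration is assumed to be a plain dict with a hypothetical `"work_dir"` key; the original takes a config object and may locate the directory differently:

```python
import json
import os
import sys


def setup_logging(config):
    """Record how the run was launched and with which settings."""
    work_dir = config["work_dir"]  # assumption: the directory lives under this key
    os.makedirs(work_dir, exist_ok=True)
    # Save the exact command line so the run can be repeated later.
    with open(os.path.join(work_dir, "args.txt"), "w") as f:
        f.write(" ".join(sys.argv))
    # Save the full configuration in a machine-readable form.
    with open(os.path.join(work_dir, "config.json"), "w") as f:
        json.dump(config, f, indent=4)
```

Using `exist_ok=True` makes the call safe to repeat when the directory is already there.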
class CfgNode:
""" A lightweight configuration class """
method __init__(**kwargs):
Update instance dictionary with kwargs
method __str__():
return string representation of the configuration node with nested indentation
method _str_helper(indent):
Create parts list for storing string representation
Iterate over self's dictionary items
If the item is a `CfgNode`, recursively call `_str_helper`
Else, just append the key-value pair
Indent the parts and join them into a single string
return the string
method to_dict():
Convert each item in the instance dictionary to a dictionary if it is a `CfgNode`
return the resulting dictionary
method merge_from_dict(d):
Update instance dictionary with items from dictionary `d`
method merge_from_args(args):
Iterate over each argument in the list `args`
Split the argument into key and value
Translate the value into a python object using `literal_eval`
If `literal_eval` fails, keep `val` as a string
Strip the argument prefix `--`
Split the resulting key by `.` to handle nested attributes
Traverse the configuration node to reach the nested configuration object
Ensure the attribute exists
Overwrite the attribute with the new value
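The `to_dict` and `merge_from_args` steps can be sketched as follows. The `--key=value` argument format and the use of assertions for validation are assumptions consistent with the steps listed above, not confirmed details of the original:

```python
from ast import literal_eval


class CfgNode:
    """A lightweight configuration class."""

    def __init__(self, **kwargs):
        self.__dict__.update(kwargs)

    def to_dict(self):
        # Recurse into nested nodes so the result contains only plain values.
        return {
            k: v.to_dict() if isinstance(v, CfgNode) else v
            for k, v in self.__dict__.items()
        }

    def merge_from_args(self, args):
        for arg in args:
            key, sep, val = arg.partition("=")
            assert sep == "=", f"expected --key=value, got {arg!r}"
            try:
                val = literal_eval(val)  # "12" -> 12, "3e-4" -> 0.0003
            except (ValueError, SyntaxError):
                pass  # not a Python literal: keep val as a string
            assert key.startswith("--"), f"expected a -- prefix, got {key!r}"
            keys = key[2:].split(".")
            node = self
            for name in keys[:-1]:
                node = getattr(node, name)  # walk down to the nested node
            leaf = keys[-1]
            assert hasattr(node, leaf), f"{key} is not an existing attribute"
            setattr(node, leaf, val)
```

Requiring the attribute to exist already turns a misspelled override such as `--model.n_layers=12` into an immediate error instead of a silently ignored setting.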