Skip to content

Latest commit

 

History

History
252 lines (184 loc) · 7.62 KB

pseudocode.md

File metadata and controls

252 lines (184 loc) · 7.62 KB

bpe.py

function bytes_to_unicode: Create a dictionary to map byte values to unicode characters return the dictionary

function get_pairs(word): Initialize an empty set for pairs Iterate through the word to find all pairs of consecutive characters return the set of pairs

function get_file(filepath): Open the file located at filepath Read the contents of the file return the contents

function get_encoder(filepath): Load contents from the file using get_file Create a dictionary mapping for encoding return the encoder dictionary

class Encoder: Initialize attributes: pat, cache, bpe_ranks, byte_encoder, decoder, byte_decoder, encoder

method __init__:
    Set up byte to unicode mapping
    Create encoder and decoder dictionaries
    Initialize other required attributes
    
method bpe:
    Perform BPE on the given text
    return BPE applied text
    
method encode:
    Encode input text using BPE
    return encoded text
    
method encode_and_show_work:
    Encode text and show computation steps
    return encoded text
    
method decode:
    Convert encoded text back to original text
    return original text

class BPETokenizer: Initialize attributes: encoder

method __init__:
    Create an encoder instance
    
method __call__:
    Tokenize input text using encoder
    return tokens
    
method decode:
    Convert tokens back to original text
    return original text

model.py

class NewGELU: method forward(input): Apply GELU activation function on input return activated output

class CausalSelfAttention: Initialize attributes: c_attn, n_head, n_embd, resid_dropout, c_proj, attn_dropout

method __init__(params):
    Set up self-attention configurations using params
    
method forward(input):
    Compute self-attention on input
    Apply projection and dropout
    return attention output

class Block: Initialize attributes: attn, mlp, ln_1, ln_2, mlpf

method __init__(params):
    Initialize self-attention, MLP layers and normalizations
    
method forward(input):
    Normalize input, apply self-attention
    Normalize result, apply MLP
    return block output

class GPT: Initialize attributes: lm_head, transformer, block_size

method get_default_config():
    Create and return a default configuration dictionary
    
method __init__(config):
    Set up GPT model with configuration
    Initialize transformer and other components
    
method _init_weights():
    Initialize model weights randomly or using a predefined scheme
    
method from_pretrained(model_name):
    Load pretrained model weights from model_name
    
method configure_optimizers():
    Set up optimizers for training
    return optimizer
    
method forward(input):
    Pass input through transformer layers
    Apply language modeling head
    return model output
    
method generate(input, max_length):
    Generate sequence of tokens based on input up to max_length
    return generated sequence

trainer.py

class Trainer:

@staticmethod
function get_default_config():
    Create a configuration object
    Set device, dataloader parameters, and optimizer parameters
    return the configuration object

method __init__(self, config, model, train_dataset):
    Assign configurations, model, and dataset to self
    Initialize optimizer as None
    Create a dictionary for storing callbacks
    
    Determine training device based on config
    Move model to the training device
    Print device information
    
    Initialize iteration counters and timers
    
method add_callback(self, onevent, callback):
    Add callback to the specified event in the callbacks dictionary
    
method set_callback(self, onevent, callback):
    Set a single callback for the specified event, replacing any existing ones
    
method trigger_callbacks(self, onevent):
    Iterate through callbacks of the specified event and execute them
    
method run(self):
    model = self.model
    config = self.config
    
    # Setup the optimizer using model's method
    self.optimizer = model.configure_optimizers(config)
    
    # Setup the data loader
    train_loader = DataLoader(
        self.train_dataset,
        sampler=RandomSampler(self.train_dataset, replacement=True, num_samples=int(1e10)),
        shuffle=False,
        pin_memory=True,
        batch_size=config.batch_size,
        num_workers=config.num_workers
    )
    
    model.train()
    Initialize iteration number and start time
    
    Create an iterator from train_loader
    
    while True:
        try:
            Fetch the next batch of data (x, y)
        except StopIteration:
            Reinitialize the data iterator
            Fetch the next batch of data (x, y)
        
        Move x and y to the training device
        
        Forward pass through the model to get logits and loss
        
        Backward pass to compute gradients
        Clip gradients if necessary
        Update model parameters
        
        Trigger 'on_batch_end' callbacks
        Increment the iteration number
        Record the current time and compute the time difference since last iteration
        
        if termination condition is met (based on max iterations):
            break

utils.py

function set_seed(seed): Set random seed for random library Set random seed for numpy Set random seed for torch for CPU Set random seed for torch for all CUDA devices

function setup_logging(config): Get the working directory from the config Create the working directory if it does not exist Write command-line arguments to args.txt in the working directory Serialize and save the configuration to config.json in the working directory

class CfgNode: """ A lightweight configuration class """

method __init__(**kwargs):
    Update instance dictionary with kwargs
    
method __str__():
    return string representation of the configuration node with nested indentation

method _str_helper(indent):
    Create parts list for storing string representation
    Iterate over self's dictionary items
    If the item is a `CfgNode`, recursively call `_str_helper`
    Else, just append the key-value pair
    Indent the parts and join them into a single string
    return the string
    
method to_dict():
    Convert each item in the instance dictionary to a dictionary if it is a `CfgNode`
    return the resulting dictionary
    
method merge_from_dict(d):
    Update instance dictionary with items from dictionary `d`
    
method merge_from_args(args):
    Iterate over each argument in the list `args`
    Split the argument into key and value
    Translate the value into a python object using `literal_eval`
    If `literal_eval` fails, keep `val` as a string
    Strip the argument prefix `--`
    Split the resulting key by `.` to handle nested attributes
    Traverse the configuration node to reach the nested configuration object
    Ensure the attribute exists
    Overwrite the attribute with the new value