bpe.py

function bytes_to_unicode: Create a dictionary to map byte values to unicode characters return the dictionary

function get_pairs(word): Initialize an empty set for pairs Iterate through the word to find all pairs of consecutive characters return the set of pairs

function get_file(filepath): Open the file located at filepath Read the contents of the file return the contents

function get_encoder(filepath): Load contents from the file using get_file Create a dictionary mapping for encoding return the encoder dictionary

class Encoder: Initialize attributes: pat, cache, bpe_ranks, byte_encoder, decoder, byte_decoder, encoder

method __init__:
    Set up byte to unicode mapping
    Create encoder and decoder dictionaries
    Initialize other required attributes
    
method bpe:
    Perform BPE on the given text
    return BPE applied text
    
method encode:
    Encode input text using BPE
    return encoded text
    
method encode_and_show_work:
    Encode text and show computation steps
    return encoded text
    
method decode:
    Convert encoded text back to original text
    return original text

class BPETokenizer: Initialize attributes: encoder

method __init__:
    Create an encoder instance
    
method __call__:
    Tokenize input text using encoder
    return tokens
    
method decode:
    Convert tokens back to original text
    return original text

model.py

class NewGELU: method forward(input): Apply GELU activation function on input return activated output

class CausalSelfAttention: Initialize attributes: c_attn, n_head, n_embd, resid_dropout, c_proj, attn_dropout

method __init__(params):
    Set up self-attention configurations using params
    
method forward(input):
    Compute self-attention on input
    Apply projection and dropout
    return attention output

class Block: Initialize attributes: attn, mlp, ln_1, ln_2, mlpf

method __init__(params):
    Initialize self-attention, MLP layers and normalizations
    
method forward(input):
    Normalize input, apply self-attention
    Normalize result, apply MLP
    return block output

class GPT: Initialize attributes: lm_head, transformer, block_size

method get_default_config():
    Create and return a default configuration dictionary
    
method __init__(config):
    Set up GPT model with configuration
    Initialize transformer and other components
    
method _init_weights():
    Initialize model weights randomly or using a predefined scheme
    
method from_pretrained(model_name):
    Load pretrained model weights from model_name
    
method configure_optimizers():
    Set up optimizers for training
    return optimizer
    
method forward(input):
    Pass input through transformer layers
    Apply language modeling head
    return model output
    
method generate(input, max_length):
    Generate sequence of tokens based on input up to max_length
    return generated sequence

trainer.py

class Trainer:

@staticmethod
function get_default_config():
    Create a configuration object
    Set device, dataloader parameters, and optimizer parameters
    return the configuration object

method __init__(self, config, model, train_dataset):
    Assign configurations, model, and dataset to self
    Initialize optimizer as None
    Create a dictionary for storing callbacks
    
    Determine training device based on config
    Move model to the training device
    Print device information
    
    Initialize iteration counters and timers
    
method add_callback(self, onevent, callback):
    Add callback to the specified event in the callbacks dictionary
    
method set_callback(self, onevent, callback):
    Set a single callback for the specified event, replacing any existing ones
    
method trigger_callbacks(self, onevent):
    Iterate through callbacks of the specified event and execute them
    
method run(self):
    model = self.model
    config = self.config
    
    # Setup the optimizer using model's method
    self.optimizer = model.configure_optimizers(config)
    
    # Setup the data loader
    train_loader = DataLoader(
        self.train_dataset,
        sampler=RandomSampler(self.train_dataset, replacement=True, num_samples=int(1e10)),
        shuffle=False,
        pin_memory=True,
        batch_size=config.batch_size,
        num_workers=config.num_workers
    )
    
    model.train()
    Initialize iteration number and start time
    
    Create an iterator from train_loader
    
    while True:
        try:
            Fetch the next batch of data (x, y)
        except StopIteration:
            Reinitialize the data iterator
            Fetch the next batch of data (x, y)
        
        Move x and y to the training device
        
        Forward pass through the model to get logits and loss
        
        Backward pass to compute gradients
        Clip gradients if necessary
        Update model parameters
        
        Trigger 'on_batch_end' callbacks
        Increment the iteration number
        Record the current time and compute the time difference since last iteration
        
        if termination condition is met (based on max iterations):
            break

utils.py

function set_seed(seed): Set random seed for random library Set random seed for numpy Set random seed for torch for CPU Set random seed for torch for all CUDA devices

function setup_logging(config): Get the working directory from the config Create the working directory if it does not exist Write command-line arguments to args.txt in the working directory Serialize and save the configuration to config.json in the working directory

class CfgNode: """ A lightweight configuration class """

method __init__(**kwargs):
    Update instance dictionary with kwargs
    
method __str__():
    return string representation of the configuration node with nested indentation

method _str_helper(indent):
    Create parts list for storing string representation
    Iterate over self's dictionary items
    If the item is a `CfgNode`, recursively call `_str_helper`
    Else, just append the key-value pair
    Indent the parts and join them into a single string
    return the string
    
method to_dict():
    Convert each item in the instance dictionary to a dictionary if it is a `CfgNode`
    return the resulting dictionary
    
method merge_from_dict(d):
    Update instance dictionary with items from dictionary `d`
    
method merge_from_args(args):
    Iterate over each argument in the list `args`
    Split the argument into key and value
    Translate the value into a python object using `literal_eval`
    If `literal_eval` fails, keep `val` as a string
    Strip the argument prefix `--`
    Split the resulting key by `.` to handle nested attributes
    Traverse the configuration node to reach the nested configuration object
    Ensure the attribute exists
    Overwrite the attribute with the new value

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

pseudocode.md

pseudocode.md

Files

pseudocode.md

Latest commit

History

pseudocode.md

File metadata and controls