ShallowFlow is a distributed training framework designed for LLM training on cost-effective AWS GPU instances (g4dn.xlarge with NVIDIA T4). The project aims to make LLM training and fine-tuning accessible to developers with limited GPU resources.
Features
- Parameter-efficient fine-tuning (PEFT) support (see the LoRA sketch after this list)
- Memory optimization techniques for the NVIDIA T4 GPU
- AWS integration and cost monitoring
- Support for smaller, efficient models
- Built-in monitoring and evaluation tools
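ShallowFlow's own training entry point is the `train.py` CLI shown under the usage commands below. As a rough illustration of what LoRA-based PEFT involves under the hood, here is a minimal sketch using the Hugging Face `peft` library; the model name, rank, and target modules are illustrative assumptions, not ShallowFlow's internal API:

```python
# Illustrative LoRA/PEFT sketch with Hugging Face peft,
# not ShallowFlow's internal API; hyperparameters are assumed values.
from transformers import AutoModelForCausalLM
from peft import LoraConfig, TaskType, get_peft_model

model = AutoModelForCausalLM.from_pretrained("gpt2")

lora_config = LoraConfig(
    task_type=TaskType.CAUSAL_LM,
    r=8,                         # low-rank adapter dimension (assumed)
    lora_alpha=16,               # adapter scaling factor (assumed)
    lora_dropout=0.05,
    target_modules=["c_attn"],   # GPT-2's fused attention projection
)

model = get_peft_model(model, lora_config)
model.print_trainable_parameters()  # only the adapter weights require gradients
```

With a rank-8 adapter on GPT-2's attention projections, well under 1% of the parameters are trainable, which is what keeps fine-tuning inside the T4's memory budget.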
Optimization
- Utilizes 8-bit quantization for memory efficiency
- Implements gradient checkpointing
- Supports efficient model parallelism
- Optimizes for the T4 GPU's 16 GB memory constraint (see the sketch after this list)
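The quantization and checkpointing techniques above can be approximated outside ShallowFlow with the standard `transformers` + `bitsandbytes` APIs; the sketch below illustrates the general approach and is not the project's internal implementation:

```python
# Sketch of 8-bit weight loading plus gradient checkpointing using
# transformers + bitsandbytes; ShallowFlow may wire this up differently.
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig

bnb_config = BitsAndBytesConfig(load_in_8bit=True)   # 8-bit weights via bitsandbytes

model = AutoModelForCausalLM.from_pretrained(
    "gpt2",
    quantization_config=bnb_config,
    device_map="auto",           # place all layers on the single T4
    torch_dtype=torch.float16,
)

# Trade compute for memory: recompute activations during the backward pass
model.gradient_checkpointing_enable()
model.config.use_cache = False   # the KV cache conflicts with checkpointing
```

Relative to fp16, 8-bit weights roughly halve the model's resident memory, and checkpointing discards intermediate activations at the cost of an extra forward recomputation during backward.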
Efficiency
- Leverages AWS g4dn.xlarge ($0.526/hour)
- Implements spot instance support
- Provides cost monitoring and optimization (a rough cost estimate follows this list)
- Enables efficient resource utilization
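For a rough sense of what a run costs at the on-demand rate quoted above, a back-of-the-envelope estimate is easy to script; the spot discount below is an assumed illustrative figure, not a quoted AWS price:

```python
# Back-of-the-envelope training-cost estimate for g4dn.xlarge.
# The on-demand rate matches the figure above; the spot discount is an
# assumed illustrative value, not a quoted AWS price.
ON_DEMAND_USD_PER_HOUR = 0.526
ASSUMED_SPOT_DISCOUNT = 0.70      # spot capacity often prices well below on-demand

def estimate_cost(training_hours: float, use_spot: bool = False) -> float:
    rate = ON_DEMAND_USD_PER_HOUR
    if use_spot:
        rate *= 1.0 - ASSUMED_SPOT_DISCOUNT
    return training_hours * rate

if __name__ == "__main__":
    hours = 6.0  # hypothetical fine-tuning run
    print(f"on-demand: ${estimate_cost(hours):.2f}")                      # $3.16
    print(f"spot (assumed): ${estimate_cost(hours, use_spot=True):.2f}")  # $0.95
```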
Goals
- Accessibility: Make LLM training accessible to developers with limited resources
- Efficiency: Optimize training for cost-effective GPU instances
- Simplicity: Provide easy-to-use interfaces for LLM fine-tuning
- Scalability: Enable scaling from single GPU to larger setups when needed
```bash
# Clone the repository
git clone https://github.com/NinoRisteski/ShallowFlow.git
cd ShallowFlow

# Create a virtual environment
python -m venv venv
source venv/bin/activate  # On Windows: venv\Scripts\activate

# Install the package
pip install -e .
```
```bash
# Or use the conda environment
conda env create -f environment.yml
conda activate shallowflow
```
```bash
# Environment variables
export CUDA_VISIBLE_DEVICES=0
export WANDB_PROJECT="shallowflow-training"  # Optional, for experiment tracking
```
```bash
# Configure AWS credentials
aws configure

# Set AWS environment variables
export AWS_REGION=us-west-2
export AWS_INSTANCE_TYPE=g4dn.xlarge

# Set WandB API key
export WANDB_API_KEY="your-wandb-api-key"
```
```bash
# Train on the Tiny Shakespeare dataset
python train.py \
    --model_name gpt2 \
    --dataset tiny_shakespeare \
    --batch_size 8 \
    --num_epochs 3 \
    --use_wandb \
    --use_quantization \
    --use_lora \
    --output_dir trained_models
```
```bash
# Train with LoRA and quantization
python train.py \
    --model_name gpt2 \
    --dataset tiny_shakespeare \
    --batch_size 8 \
    --use_lora \
    --use_quantization \
    --use_wandb
```
```bash
# Train on AWS
python train.py \
    --model_name gpt2 \
    --dataset tiny_shakespeare \
    --batch_size 8 \
    --learning_rate 1e-4 \
    --use_lora \
    --use_quantization \
    --quantization_bits 8 \
    --quantization_method dynamic \
    --use_aws \
    --use_wandb
```
```bash
# Set up wandb
wandb login

# Run training with monitoring
python train.py \
    --model_name gpt2 \
    --use_wandb \
    --wandb_project "my-project" \
    --wandb_entity "my-username"
```
Or monitor directly from the command line:
```bash
# Monitor GPU usage
nvidia-smi

# Check training logs
tail -f logs/training.log
```
```bash
# Fast testing configuration
python train.py \
    --model_name gpt2 \
    --dataset tiny_shakespeare \
    --batch_size 4 \
    --num_epochs 1 \
    --use_quantization
```
```bash
# Complete training configuration
python train.py \
    --model_name gpt2 \
    --dataset tiny_shakespeare \
    --batch_size 8 \
    --num_epochs 3 \
    --use_lora \
    --use_quantization \
    --use_wandb \
    --output_dir trained_models
```
ShallowFlow fills a specific niche by providing a practical solution for ML engineers and researchers who want to work with LLMs but don't have access to high-end GPU clusters, making distributed training more accessible and cost-effective.
[1] Atlassian, "How to write project objectives and project goals," Atlassian Work Management Guide, 2024. [Online]. Available: https://www.atlassian.com/work-management/project-management/project-objectives
[2] A. Kumar et al., "Efficient Large Language Model Training Techniques," arXiv:2404.08573v1 [cs.LG], Apr. 2024.
[3] SuperAnnotate, "A Comprehensive Guide to LLM Fine-Tuning," SuperAnnotate Technical Blog, Mar. 2024. [Online]. Available: https://www.superannotate.com/blog/llm-fine-tuning
[4] Anodot, "AWS G4 Instance Cost Optimization Guide," Anodot Learning Center, 2024. [Online]. Available: https://www.anodot.com/learning-center/aws-cost-optimization/ec2/g4/
[5] Hyperight, "The 4 Pillars of Effective LLM Training," Hyperight Technical Resources, Feb. 2024. [Online]. Available: https://hyperight.com/4-pillars-to-effective-training-of-large-language-models/
[6] S. Böhm, "ShallowSpeed: Small scale distributed training of sequential deep learning models," GitHub Repository, 2024. [Online]. Available: https://github.com/siboehm/ShallowSpeed
Note: ShallowSpeed [6] served as an inspiration for this project; it implements similar concepts for small-scale distributed training, while ShallowFlow focuses specifically on LLM training on cost-effective GPU setups.