
A barebones template for ML projects using PyTorch, Transformers, and Hydra. It includes definitions for a Docker development container to streamline the environment setup in VS Code.


austinleedavis/barebones-ml


Barebones ML Project Template

This repository provides a barebones template for ML projects using PyTorch, Transformers, and Hydra. It includes definitions for a Docker development container to streamline the environment setup in VS Code.

📋 Prerequisites

  1. Docker Engine (Installation Guide): Follow the installation steps for your Linux distribution, or use Docker Desktop on Windows.
  2. NVIDIA Container Toolkit (Installation Guide): Follow the steps under Installation and Configuration.
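Before building the container, you can confirm that both tools are installed. The sketch below only checks that the docker executable and nvidia-ctk (the CLI shipped with recent versions of the NVIDIA Container Toolkit) are on your PATH; it does not verify that the Docker daemon is running or that a GPU is visible. The helper name check_prerequisites is hypothetical and not part of this template.

```python
import shutil


def check_prerequisites(tools=("docker", "nvidia-ctk")):
    """Report which prerequisite executables are on PATH.

    Assumes the NVIDIA Container Toolkit provides the `nvidia-ctk` CLI;
    adjust the tool names if your installation differs.
    """
    return {tool: shutil.which(tool) is not None for tool in tools}


if __name__ == "__main__":
    for tool, found in check_prerequisites().items():
        print(f"{tool}: {'found' if found else 'MISSING'}")
```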
Move Docker's default data-dir (only if needed)

On my system, /home has plenty of free space, but Docker's default data directory (/var/lib/docker) has very little. The following commands configure Docker to store its data in a different directory.

  1. Shut down the Docker service

    sudo systemctl stop docker docker.socket
    sudo systemctl status docker
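  2. Copy the data to the new path (if it's not already there) and point Docker at it. rsync copies rather than moves, so the original /var/lib/docker remains until you delete it yourself.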

    sudo mkdir -p /etc/docker
    sudo rsync -avxP /var/lib/docker/ /home/docker/
    echo '{
      "data-root": "/home/docker"
    }' | sudo tee /etc/docker/daemon.json
  3. Restart the Docker service

    sudo systemctl restart docker
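Note that the echo ... | sudo tee command in step 2 replaces /etc/docker/daemon.json wholesale. If the NVIDIA Container Toolkit's configuration step (nvidia-ctk runtime configure --runtime=docker) has already written a "runtimes" entry to that file, overwriting it drops that entry. A minimal sketch of merging data-root into the existing file instead, assuming the file (if present) holds valid JSON; set_data_root is a hypothetical helper, not part of this template:

```python
import json
from pathlib import Path


def set_data_root(config_path, data_root):
    """Set "data-root" in a Docker daemon.json while keeping its other keys.

    Writes the merged configuration back to config_path and returns it.
    """
    path = Path(config_path)
    text = path.read_text() if path.exists() else ""
    # Start from the existing settings (e.g. an "nvidia" runtime entry), if any.
    config = json.loads(text) if text.strip() else {}
    config["data-root"] = str(data_root)
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(json.dumps(config, indent=2) + "\n")
    return config
```

Run it as root in place of the echo | tee step, e.g. by calling set_data_root("/etc/docker/daemon.json", "/home/docker"), then restart Docker as in step 3.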

Useful links:

🚀 Installation

Use this template to initialize a new project on GitHub. Then run the following:

git clone <repo-url> your_new_project
cd your_new_project

# Modify requirements.txt as needed, then...

# Build the container
make docker-build

Your environment is now ready. Access the development environment in one of two ways:

  • VS Code DevContainer: Open the project in VS Code, install the Dev Containers extension, and select "Dev Containers: Reopen in Container" from the command palette (Ctrl+Shift+P) to work inside the Docker environment.

  • Docker Run Command: Dispatch scripts to run in the container, e.g.:

    docker run --rm -v "$(pwd)":/workspace "$(basename "$(pwd)")":latest bash -c "./scripts/train.sh"
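The same invocation can be scripted from Python, for example to launch several runs. This sketch rebuilds the command as an argument list, assuming (as the $(basename $(pwd)):latest expression implies) that the image is tagged after the project directory. The shell command above does not pass --gpus, so whether the container sees a GPU depends on your Docker setup; the sketch makes Docker's --gpus all flag opt-in. docker_run_command is a hypothetical helper.

```python
import shlex
from pathlib import Path


def docker_run_command(script, project_dir=".", gpus=False):
    """Build a `docker run` argv that runs `script` inside the project image.

    Assumes the image is tagged <project-dir-name>:latest and that the
    project is mounted at /workspace, mirroring the shell command.
    """
    project = Path(project_dir).resolve()
    argv = ["docker", "run", "--rm"]
    if gpus:
        argv += ["--gpus", "all"]  # expose all GPUs via the NVIDIA runtime
    argv += ["-v", f"{project}:/workspace", f"{project.name}:latest"]
    argv += ["bash", "-c", script]
    return argv


if __name__ == "__main__":
    # Print a shell-quoted version of the command for inspection.
    print(shlex.join(docker_run_command("./scripts/train.sh", gpus=True)))
```

Pass the returned list to subprocess.run(..., check=True) to execute it.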

📂 Project Structure

.
├── configs               # Configuration files for Hydra
│   ├── paths             # Path configurations
│   │   └── default.yaml  # Default paths configuration
│   └── train.yaml        # Training-specific configuration
├── data                  # Data storage directory
├── logs                  # Logs generated from experiments
├── models                # Saved models
├── notebooks             # Jupyter notebooks for research and experimentation
│   └── template.ipynb    # Notebook template
├── scripts               # Shell scripts for automation
│   ├── eval.sh           # Evaluation script
│   └── train.sh          # Training script
├── src                   # Source code for the project
│   └── train.py          # Barebones train script
├── Dockerfile            # Docker environment setup
├── Makefile              # Makefile for automation (build, train, format, etc.)
├── pyproject.toml        # Python project configuration
├── README.md             # Project documentation
├── requirements.txt      # List of required Python packages
└── setup.py              # Python package setup
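To sanity-check that a project created from this template still has the expected layout, a sketch like the following compares a directory against the entries in the tree above. The EXPECTED list is transcribed from that tree, and missing_entries is a hypothetical helper. Git does not track empty directories, so data/, logs/ and models/ may be reported as missing in a fresh clone unless they contain placeholder files.

```python
from pathlib import Path

# Entries transcribed from the project tree; directories end with "/".
EXPECTED = [
    "configs/paths/default.yaml",
    "configs/train.yaml",
    "data/",
    "logs/",
    "models/",
    "notebooks/template.ipynb",
    "scripts/eval.sh",
    "scripts/train.sh",
    "src/train.py",
    "Dockerfile",
    "Makefile",
    "pyproject.toml",
    "README.md",
    "requirements.txt",
    "setup.py",
]


def missing_entries(root="."):
    """Return the expected files and directories that are absent under root."""
    base = Path(root)
    missing = []
    for entry in EXPECTED:
        path = base / entry.rstrip("/")
        present = path.is_dir() if entry.endswith("/") else path.is_file()
        if not present:
            missing.append(entry)
    return missing
```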

🛠 Features

  • Pre-configured PyTorch environment using Docker
  • Hydra-based configuration for flexibility in experiment settings
  • Pre-commit hooks for enforcing code quality
  • Sensible file structure to facilitate development
  • Automated setup with Makefile

📝 Notes

  • Configurations: Modify configs/train.yaml to adjust training settings.
  • Logs & Checkpoints: Stored in outputs/ and models/ respectively.
  • Extensibility: Add new scripts to scripts/ or modify Makefile for custom workflows.
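If the configs leave Hydra's hydra.run.dir at its default, outputs/${now:%Y-%m-%d}/${now:%H-%M-%S}, each run writes to its own timestamped directory under outputs/. A sketch for locating the most recent one, assuming that default layout (latest_run_dir is a hypothetical helper):

```python
from datetime import datetime
from pathlib import Path


def latest_run_dir(outputs="outputs"):
    """Return the most recent Hydra run directory under `outputs`, or None.

    Assumes Hydra's default layout, outputs/<YYYY-MM-DD>/<HH-MM-SS>;
    directories that do not match that pattern are ignored.
    """
    runs = []
    for run in Path(outputs).glob("*/*"):
        if not run.is_dir():
            continue
        try:
            stamp = datetime.strptime(f"{run.parent.name} {run.name}", "%Y-%m-%d %H-%M-%S")
        except ValueError:
            continue  # not a timestamped run directory
        runs.append((stamp, run))
    return max(runs)[1] if runs else None
```

Hydra also saves the composed configuration of each run under <run dir>/.hydra/config.yaml, which is handy for checking exactly which settings produced a given result.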

💡 Recommendations

  • Run make format to apply the pre-commit hooks before committing your code.
  • Update requirements.txt whenever you install a new package in the container.
  • The configs/ folder is just a template. For day-to-day experiments, consider copying it to a location outside the code base, then use the command-line flags --config-path (-cp) and --config-name (-cn) to point Hydra at it. See Hydra's article on command line flags for more details.
  • Run make help for more commands
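For example, with the configs copied to an external directory, the training command can be assembled as below. This is a sketch: the directory ~/experiments/configs, the python interpreter name, and the seed=1 override are hypothetical, and train_command is not part of the template. Resolving the path to an absolute one sidesteps questions about what a relative --config-path is resolved against.

```python
import shlex
from pathlib import Path


def train_command(config_dir, config_name="train", overrides=()):
    """Build a command that points Hydra at an external config directory.

    --config-path/-cp and --config-name/-cn are Hydra's own flags;
    config_name is given without the .yaml extension.
    """
    argv = [
        "python", "src/train.py",
        "--config-path", str(Path(config_dir).expanduser().resolve()),
        "--config-name", config_name,
    ]
    argv += list(overrides)  # Hydra key=value overrides, appended last
    return argv


if __name__ == "__main__":
    # "seed=1" is a hypothetical key; use keys that exist in your train.yaml.
    print(shlex.join(train_command("~/experiments/configs", overrides=["seed=1"])))
```

If you launch training through the docker run command shown earlier, only the project directory is mounted at /workspace, so an external config directory must be mounted as well (with another -v flag) and referenced by its path inside the container.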
