Getting Started with Containerized NVIDIA Environment

Prerequisites

This document assumes that you have the following:

OpenTofu installed on your system.
Kind (Kubernetes in Docker) installed and configured on your system.
Docker or Podman installed on your system, along with NVIDIA drivers to run GPU-enabled containers.
Ubuntu/Linux system with NVIDIA GPU hardware
NVIDIA drivers properly installed and configured
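
The prerequisites above can be checked before starting. The following is a minimal sketch, not part of this repository: the helper name missing_tools is hypothetical, and the binary names (tofu for OpenTofu, kind, docker, nvidia-smi) are assumptions about how the tools are installed on your system. Substitute podman for docker if that is what you use.

```shell
#!/bin/sh
# Print every command name from the argument list that is not found on PATH.
# (missing_tools is a hypothetical helper, not something this repository ships.)
missing_tools() {
    for tool in "$@"; do
        command -v "$tool" >/dev/null 2>&1 || printf '%s\n' "$tool"
    done
}

# Check the tools named in the prerequisites (binary names are assumptions).
missing=$(missing_tools tofu kind docker nvidia-smi)
if [ -n "$missing" ]; then
    printf 'Missing tools:\n%s\n' "$missing"
else
    echo "All listed tools were found."
fi
```

The check only confirms that the binaries are on PATH; it does not verify that the NVIDIA driver or container runtime is configured correctly.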

Install Requirements

To follow the guidelines outlined in this document, install the tools described in the subsections below.

Installing Kind

Follow the official installation instructions for OpenTofu.
Follow the official installation instructions for Kind.
Follow the official installation instructions for nvKind.
This will provide a way to containerize and deploy Kubernetes environments without having to set up a physical cluster.

Helpful installation link

To install kind with GPU support, see kind-with-gpus.

Installing Docker and Podman

Install Docker using the package manager of your operating system (e.g. apt for Ubuntu-based systems, brew for macOS, etc.).
Alternatively, install Podman, which allows you to run containers in isolated environments without needing root privileges.
For NVIDIA drivers, ensure that they are properly installed and configured on your system.

Supported Environments

This setup supports the following environments:

NVIDIA-Kind Environment

Uses Kind to run Kubernetes clusters inside containers.
Mounts NVIDIA drivers into the containers requiring a GPU.
Provides an efficient way to develop, test, and deploy applications that rely on GPUs.

Docker Podman NVIDIA Environment

Uses Docker or Podman as the container runtime environment for running containers with NVIDIA drivers.
Supports running containers that require GPU acceleration without relying on Kind's cluster setup.

Usage

To use this setup, follow these steps:

1. Clone this repository to your local machine.
2. Define the necessary variables in an AWS Secrets Manager file (e.g., cluster_name, http_ingress_port...).
3. Run the following command to install Kind and create a Kubernetes cluster:

make local-deploy

This will deploy the Kind cluster with the specified configuration, and ArgoCD will be installed on the cluster with the preconfigured GitHub applications.

4. Access the ArgoCD web interface:

Navigate to http://<your-cluster-ip>:<http_ingress_port>/argocd in your browser.

Log in using the default credentials.

GPU-Enabled Kubernetes Setup Guide

Step 1: Install Go Programming Language

First, install Go 1.24.4:

# Download Go
wget https://go.dev/dl/go1.24.4.linux-amd64.tar.gz

# Install Go to local directory (avoiding system-wide installation)
mkdir -p ~/.local
tar -C ~/.local -xzf go1.24.4.linux-amd64.tar.gz

# Add Go to PATH
export PATH=$PATH:$HOME/.local/go/bin

# Verify installation
go version
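
The export above only affects the current shell session. As a sketch of one way to make it repeatable, the hypothetical helper path_append below adds a directory to PATH only when it is not already there, so it can be placed in a shell startup file such as ~/.bashrc or ~/.profile (an assumption about your shell setup) without growing PATH on every login. The second directory, $HOME/go/bin, is Go's default location for binaries built with go install.

```shell
# Append a directory to PATH only if it is not already present.
# (path_append is a hypothetical helper name used for illustration.)
path_append() {
    case ":$PATH:" in
        *":$1:"*) ;;                 # already on PATH, nothing to do
        *) PATH="$PATH:$1" ;;
    esac
    export PATH
}

path_append "$HOME/.local/go/bin"   # Go toolchain extracted in this step
path_append "$HOME/go/bin"          # default location of `go install` binaries
```

Calling the function repeatedly with the same directory leaves PATH unchanged.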

Step 2: Install Homebrew on Linux

Since some tools are easier to install via Homebrew, set up Homebrew for Linux:

# Create Homebrew directory
mkdir -p $HOME/.homebrew

# Clone Homebrew
git clone https://github.com/Homebrew/brew $HOME/.homebrew

# Clone homebrew-core tap (taps live under Library/Taps of the Homebrew checkout)
mkdir -p "$HOME/.homebrew/Library/Taps/homebrew"
git clone https://github.com/Homebrew/homebrew-core "$HOME/.homebrew/Library/Taps/homebrew/homebrew-core"

# Set up Homebrew environment variables
export HOMEBREW_PREFIX="$HOME/.homebrew"
export HOMEBREW_REPOSITORY="$HOME/.homebrew"
export HOMEBREW_CELLAR="$HOME/.homebrew/Cellar"
export PATH="$HOMEBREW_PREFIX/bin:$HOMEBREW_PREFIX/sbin:$PATH"

# Verify installation
brew --version

Step 3: Install Kubernetes Tools

Install essential Kubernetes tools:

# Install kind (Kubernetes in Docker)
brew install kind

# Install kubectl and helm
brew install kubectl helm

# Create kubectl alias for convenience
alias k=kubectl

Step 4: Configure NVIDIA Container Runtime

Set up Docker to work with NVIDIA GPUs:

# Test GPU availability
nvidia-smi -L

# Test Docker with NVIDIA runtime
docker run --runtime=nvidia -e NVIDIA_VISIBLE_DEVICES=all ubuntu:20.04 nvidia-smi -L

# Configure NVIDIA container toolkit
sudo nvidia-ctk runtime configure --runtime=docker --set-as-default --cdi.enabled
sudo nvidia-ctk config --set accept-nvidia-visible-devices-as-volume-mounts=true --in-place

# Restart Docker services
sudo systemctl restart docker
sudo systemctl restart containerd

# Verify services are running
sudo systemctl status docker
sudo systemctl status containerd
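
After the configuration commands above, it can help to confirm that Docker's daemon configuration really names nvidia as the default runtime. The sketch below assumes the configuration was written to /etc/docker/daemon.json (the usual location for Docker, but an assumption for your system) and uses a hypothetical helper, default_runtime_is_nvidia, that does a simple text match rather than full JSON parsing.

```shell
# Return success if the given Docker daemon config names "nvidia" as the
# default runtime. The default path is an assumption; pass another path if needed.
default_runtime_is_nvidia() {
    config_file=${1:-/etc/docker/daemon.json}
    [ -r "$config_file" ] &&
        grep -Eq '"default-runtime"[[:space:]]*:[[:space:]]*"nvidia"' "$config_file"
}

if default_runtime_is_nvidia; then
    echo "Docker default runtime is nvidia."
else
    echo "nvidia is not the default runtime (or the config file is unreadable)."
fi
```

A text match is enough for a quick check; it does not validate the rest of the file.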

Step 5: Install and Use nvkind

Install NVIDIA's nvkind tool for GPU-enabled Kubernetes clusters:

# Install nvkind
go install github.com/NVIDIA/nvkind/cmd/nvkind@latest

# Add Go bin directory to PATH
export PATH=$PATH:$HOME/go/bin

# Create GPU-enabled Kubernetes cluster
nvkind cluster create

Step 6: Install NVIDIA GPU Operator

Deploy the NVIDIA GPU Operator to manage GPU resources in Kubernetes:

# Add NVIDIA Helm repository
helm repo add nvidia https://helm.ngc.nvidia.com/nvidia || true
helm repo update

# Install GPU Operator (driver installation disabled because the host already provides the NVIDIA drivers)
helm install --wait --generate-name \
  -n gpu-operator --create-namespace \
  nvidia/gpu-operator --set driver.enabled=false

# Verify GPU operator pods
k get pods -n gpu-operator

Step 7: Test GPU Access in Kubernetes

Test that GPU resources are available in the cluster:

# Run interactive GPU test container
k run -it --rm gpu-test --image=nvidia/cuda:12.0.1-runtime-ubuntu22.04 -- /bin/bash

# Inside the container, you can run:
# nvidia-smi -L
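
The interactive test above does not request a GPU from the Kubernetes scheduler. As a complementary sketch, the hypothetical helper gpu_test_manifest below prints a Pod manifest that asks for GPUs through the nvidia.com/gpu resource, which is the resource name advertised by the NVIDIA device plugin. The pod name gpu-request-test and the container name cuda are made up for illustration; the image is the one used in this step.

```shell
# Print a Pod manifest that requests GPUs through the nvidia.com/gpu resource.
# (gpu_test_manifest and the pod name are hypothetical, for illustration only.)
gpu_test_manifest() {
    gpu_count=${1:-1}
    cat <<EOF
apiVersion: v1
kind: Pod
metadata:
  name: gpu-request-test
spec:
  restartPolicy: Never
  containers:
    - name: cuda
      image: nvidia/cuda:12.0.1-runtime-ubuntu22.04
      command: ["nvidia-smi", "-L"]
      resources:
        limits:
          nvidia.com/gpu: ${gpu_count}
EOF
}

# Show the manifest for a single GPU.
gpu_test_manifest 1
```

To try it on the cluster, pipe the output to kubectl apply -f - and read the result with kubectl logs gpu-request-test once the pod has finished. If the pod stays Pending, the cluster is probably not advertising any nvidia.com/gpu resources yet.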

Step 8: Deploy NASA R2O Project

Clone and deploy the NASA R2O project:

# Clone the repository
git clone https://github.com/NASA-IMPACT/r2o-deploy.git
cd r2o-deploy

# Switch to GPU support branch
git checkout feature/support-cluster-with-gpu

# Set up environment configuration
cp .env.example .env
nano .env  # Edit configuration as needed

# Generate SSH key if needed
ssh-keygen -t rsa

# Source environment variables
set -a; source .env; set +a

# Navigate to local setup directory
cd local-setup

# Generate SSL certificates
openssl req -x509 -newkey rsa:2048 -nodes -keyout key.pem -out cert.pem -days 365

# Install OpenTofu (Terraform alternative)
brew install opentofu

# Initialize and deploy infrastructure
make init
make plan
make apply
make deploy

Step 9: Verification

Verify the deployment:

# Check all pods
k get pods

# Check GPU operator pods specifically
k get pods -n gpu-operator

# Test GPU access again
k run -it --rm gpu-test --image=nvidia/cuda:12.0.1-runtime-ubuntu22.04 -- /bin/bash

Cleanup

To clean up the environment:

# List clusters
kind get clusters

# Delete the cluster (replace with actual cluster name)
kind delete cluster --name nvkind-rqbcv
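
Because nvkind generates the cluster name, it can be convenient to pick the matching clusters out of the kind get clusters output instead of copying the name by hand. The sketch below assumes that generated names start with nvkind-, which is inferred only from the example name above, and uses a hypothetical helper called filter_clusters. It runs as a dry run on sample names and only prints the commands it would execute.

```shell
# Read cluster names on stdin and keep only those starting with the given prefix.
# (filter_clusters is a hypothetical helper; the nvkind- prefix is an assumption.)
filter_clusters() {
    prefix=${1:-nvkind-}
    while IFS= read -r name; do
        case "$name" in
            "$prefix"*) printf '%s\n' "$name" ;;
        esac
    done
}

# Dry run on sample names; replace the printf with `kind get clusters` for real use.
printf 'kind\nnvkind-rqbcv\n' | filter_clusters | while IFS= read -r name; do
    echo "would run: kind delete cluster --name $name"
done
```

Review the printed commands before replacing the echo with the actual delete command.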

Key Points

nvkind is NVIDIA's tool for creating GPU-enabled Kubernetes clusters using kind
The GPU Operator manages GPU resources and drivers within Kubernetes
The driver.enabled=false setting is important when using nvkind: the NVIDIA drivers are already installed on the host, so the GPU Operator should not deploy its own driver containers
The NASA R2O project appears to be a research-to-operations deployment system with GPU support
OpenTofu is used as a Terraform alternative for infrastructure management

Troubleshooting

Ensure NVIDIA drivers are properly installed on the host
Verify Docker can access GPUs before creating the cluster
Check that all services (Docker, containerd) are running after configuration changes
Make sure environment variables are properly sourced when working with the R2O project
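
The last point can be checked mechanically after sourcing .env. The sketch below uses a hypothetical helper, unset_vars, that reports which of the named variables are unset or empty. The variable names cluster_name and http_ingress_port are taken from the Usage section; whether they are defined in .env for your setup is an assumption, so adjust the list to match .env.example.

```shell
# Print the names of variables from the argument list that are unset or empty.
# (unset_vars is a hypothetical helper; only pass trusted variable names to it.)
unset_vars() {
    for var_name in "$@"; do
        eval "value=\${$var_name:-}"
        [ -n "$value" ] || printf '%s\n' "$var_name"
    done
}

# Example check; the variable list is an assumption based on the Usage section.
missing=$(unset_vars cluster_name http_ingress_port)
if [ -n "$missing" ]; then
    printf 'Unset or empty variables:\n%s\n' "$missing"
fi
```

Run the check in the same shell session in which .env was sourced, since the variables are not visible to other sessions.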
