Set up CUDA SDK 12.1 or later (RTX 20xx, 30xx, 40xx)

This is a short guide to setting up the llm-inference project on a Windows machine with the latest CUDA drivers.
This setup works if you have an NVIDIA RTX 20xx series GPU or later.

NOTE: Python 3.12 breaks the torch installation. Please use Python 3.10.
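
Before creating the virtual environment, it can help to confirm which interpreter `python` resolves to. The sketch below is only an illustration: the helper name `is_recommended_python` is hypothetical, and it treats 3.10 as the only accepted version simply because that is the one this guide recommends (other versions below 3.12 may also work).

```python
import sys

# Assumption: only 3.10 counts as "recommended", since it is the one
# version this guide names explicitly.
RECOMMENDED = (3, 10)


def is_recommended_python(version=None):
    """Return True if the (major, minor) version matches RECOMMENDED."""
    major, minor = tuple(version or sys.version_info)[:2]
    return (major, minor) == RECOMMENDED


if __name__ == "__main__":
    found = "{}.{}".format(*sys.version_info[:2])
    if is_recommended_python():
        print(f"Python {found}: OK")
    else:
        print(f"Python {found}: this guide recommends Python 3.10")
```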

Install the Windows build tools from:
https://visualstudio.microsoft.com/visual-cpp-build-tools/
In the installer, select the "Desktop development with C++" workload.
Set up CUDA:
Navigate to: https://developer.nvidia.com/cuda-downloads?target_os=Windows&target_arch=x86_64
Select your Windows version and installation method, then run the installer.
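
One way to confirm the toolkit is visible after installation is to look for the `nvcc` compiler, as in the sketch below. It assumes the installer added `nvcc` to `PATH` (you may need to open a new terminal first); the helper name `nvcc_version` is hypothetical.

```python
import shutil
import subprocess


def nvcc_version():
    """Return the output of `nvcc --version`, or None if nvcc is not on PATH."""
    nvcc = shutil.which("nvcc")
    if nvcc is None:
        return None
    result = subprocess.run([nvcc, "--version"], capture_output=True, text=True)
    return result.stdout.strip() or None


if __name__ == "__main__":
    print(nvcc_version() or "nvcc not found on PATH; open a new terminal or check the CUDA install")
```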
Create Python Virtual Environment:
python -m venv venv
Activate the virtual environment:
.\venv\Scripts\activate
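
If you are unsure whether activation worked, a small check like the following can tell you whether the running interpreter belongs to a virtual environment. This is a sketch; `in_virtualenv` is a hypothetical helper name.

```python
import sys


def in_virtualenv():
    """Return True when the running interpreter belongs to a venv."""
    # Inside a venv, sys.prefix points at the environment, while
    # sys.base_prefix still points at the base Python installation.
    return sys.prefix != sys.base_prefix


if __name__ == "__main__":
    print("virtual environment active" if in_virtualenv() else "no virtual environment active")
```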
Install PyTorch with CUDA support:
- pip3 install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu121
  - source: https://pytorch.org/get-started/locally/
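
After the install finishes, it is worth checking that the CUDA build of PyTorch was picked up rather than a CPU-only wheel. The following is a minimal sketch; `cuda_status` is a hypothetical helper name, and it reports "not installed" instead of failing when torch is missing.

```python
def cuda_status():
    """Summarize whether torch is installed and can see a CUDA device."""
    try:
        import torch
    except ImportError:
        return {"torch": None, "cuda_build": None, "cuda_available": False, "device": None}

    available = torch.cuda.is_available()
    return {
        "torch": torch.__version__,
        # CUDA version the wheel was built against; None for CPU-only builds.
        "cuda_build": torch.version.cuda,
        "cuda_available": available,
        "device": torch.cuda.get_device_name(0) if available else None,
    }


if __name__ == "__main__":
    for key, value in cuda_status().items():
        print(f"{key}: {value}")
```

If `cuda_build` is empty or `cuda_available` is false, the most likely causes are a CPU-only wheel (the `--index-url` was omitted) or an outdated GPU driver.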
Install required packages:
- pip3 install -r requirements.txt
- llama-cpp-python guide: https://llama-cpp-python.readthedocs.io/en/latest/api-reference/
Install bitsandbytes (Windows-compatible version):
- pip3 install git+https://github.com/Keith-Hon/bitsandbytes-windows.git
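
To confirm the packages landed in the active environment, you can check that their import names resolve without actually importing them. This is a sketch: the module list is an assumption (in particular, that `requirements.txt` installs llama-cpp-python, which is imported as `llama_cpp`), and `missing_modules` is a hypothetical helper name.

```python
from importlib.util import find_spec

# Import names, which can differ from the pip package names.
# Assumption: these are the modules this project needs.
MODULES = ["torch", "llama_cpp", "bitsandbytes"]


def missing_modules(names=MODULES):
    """Return the module names that cannot be found in this environment."""
    return [name for name in names if find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_modules()
    print("all modules found" if not missing else f"missing: {', '.join(missing)}")
```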
Create a .env file based on .env.example or env-samples/env.cuda.example:
- Change the model path and config, then run the server:
  - python main.py --multiprocess
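
A common failure at this step is a model path in `.env` that does not exist on disk. The sketch below parses simple `KEY=VALUE` lines and reports missing paths. The key name `MODEL_PATH` is purely hypothetical; use whatever key your `.env.example` defines. The parser is intentionally minimal and is not how the project itself loads its configuration.

```python
from pathlib import Path


def read_env(path=".env"):
    """Parse simple KEY=VALUE lines, skipping blank lines and # comments."""
    values = {}
    for line in Path(path).read_text(encoding="utf-8").splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def missing_paths(values, keys):
    """Return the keys that are unset or point at a path that does not exist."""
    return [k for k in keys if not values.get(k) or not Path(values[k]).exists()]


if __name__ == "__main__":
    if Path(".env").exists():
        env = read_env()
        # "MODEL_PATH" is a placeholder key name, not taken from the project.
        print("missing:", missing_paths(env, ["MODEL_PATH"]))
    else:
        print(".env not found in the current directory")
```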

Back to main doc
