This program was written as part of the Fall 2024 EE 451 course project. The objective of the exercise is, given an existing program, to parallelize it and achieve a speedup using the parallelization techniques taught in the course.
- OS: Windows, macOS, Linux
- RAM: >8GB
- Disk Space: 12GB + space to store language models of your choice
- CPU: 4+ cores, 8+ cores recommended
- GPU: Optional
- Reference: https://www.gpu-mart.com/blog/run-llms-with-ollama
- Install the Ollama server, following the instructions provided at https://ollama.com/
- Alternatively, run:
curl https://ollama.ai/install.sh | sh
- Enable concurrent request handling:
export OLLAMA_NUM_PARALLEL=10
export OLLAMA_MAX_LOADED_MODELS=6
- Reference: ollama/ollama#358
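The two variables above can be set together before launching the server. The values mirror the ones in this guide; what each variable controls is noted in the comments (a minimal sketch, assuming a bash-compatible shell):

```shell
# OLLAMA_NUM_PARALLEL: how many requests each loaded model serves concurrently.
export OLLAMA_NUM_PARALLEL=10
# OLLAMA_MAX_LOADED_MODELS: how many models may be kept loaded in memory at once.
export OLLAMA_MAX_LOADED_MODELS=6
# Confirm the values are visible to child processes such as `ollama serve`.
echo "$OLLAMA_NUM_PARALLEL $OLLAMA_MAX_LOADED_MODELS"
```

Because environment variables only propagate to child processes, these exports must happen in the same shell (or service definition) that starts the server.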
- Start the Ollama server:
ollama serve
To start the server on a specific address/port, set this environment variable before starting the server:
export OLLAMA_HOST=[ip address]:[port number]
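For example (the address and port below are placeholders, not values from this guide), binding the server to all interfaces on port 8080 would look like:

```shell
# Placeholder address/port; substitute your own, then start `ollama serve`
# from this same shell so it inherits the variable.
export OLLAMA_HOST=0.0.0.0:8080
echo "$OLLAMA_HOST"
```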
- Run the test script:
./run_test.sh
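The contents of run_test.sh are specific to this project, but the kind of parallel client the server configuration above enables can be sketched as follows. This is an illustrative example, not the project's actual test: the model name, prompts, and thread count are assumptions, while the endpoint path and default port are Ollama's standard /api/generate API.

```python
# Hypothetical sketch of issuing concurrent generation requests to a local
# Ollama server; requires a running server and a pulled model.
import json
import urllib.request
from concurrent.futures import ThreadPoolExecutor

OLLAMA_URL = "http://localhost:11434/api/generate"  # Ollama's default endpoint

def build_payload(model, prompt):
    """Build a non-streaming /api/generate request body."""
    return {"model": model, "prompt": prompt, "stream": False}

def generate(model, prompt, url=OLLAMA_URL):
    """Send one blocking generation request and return the response text."""
    data = json.dumps(build_payload(model, prompt)).encode("utf-8")
    req = urllib.request.Request(
        url, data=data, headers={"Content-Type": "application/json"})
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)["response"]

def generate_parallel(model, prompts, workers=10):
    """Fan prompts out across a thread pool. The server-side
    OLLAMA_NUM_PARALLEL setting controls how many requests are
    actually served concurrently."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(lambda p: generate(model, p), prompts))

# Usage (assumes a running server and a pulled model such as `llama3`):
#   answers = generate_parallel("llama3",
#                               ["Why is the sky blue?", "What is 2 + 2?"])
```

Client-side threads pair naturally with OLLAMA_NUM_PARALLEL: with fewer workers than the server's parallel slots, some server capacity sits idle; with more, excess requests queue on the server.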