Getting Started - Docs - Changelog - Bug reports - Discord
⚠️ Nitro is currently in Development: Expect breaking changes and bugs!
- GGML inference support (llama.cpp, etc...)
- Local file server
- Cache
- Plugin support
Nitro is a lightweight integration layer (and soon-to-be inference engine) for cutting-edge inference engines, making deployment of AI models easier than ever!
Zipped, the Nitro binary is only ~3 MB with minimal to no dependencies (you only need CUDA if you use a GPU, for example), making it well suited for any edge or server deployment 🚀.
The repo is structured as follows:
.
├── controllers
├── docs
├── llama.cpp -> Upstream llama C++
├── nitro_deps -> Dependencies of the Nitro project as a sub-project
└── utils
Step 1: Download Nitro
To use Nitro, download the released binaries from the release page on GitHub.
After downloading the release, double-click on the Nitro binary.
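If you prefer the command line, here is a minimal sketch for Linux/macOS (the archive name is illustrative; actual asset names vary by platform and release):

# Unpack the downloaded release (archive name is illustrative)
tar -xzf nitro.tar.gz
chmod +x ./nitro
# Start the server; the API calls in this guide assume it listens on localhost:3928
./nitro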
Step 2: Download a Model
Download a llama model to try running the llama C++ integration. You can find "GGUF" models on TheBloke's Hugging Face page.
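For example, you could fetch a quantized Llama 2 chat model from TheBloke's Hugging Face repo (an illustrative choice; any GGUF model should work), saving it to the path you will reference in Step 3:

# Download an example GGUF model to a path of your choosing
wget -O /path/to/your_model.gguf \
  https://huggingface.co/TheBloke/Llama-2-7B-Chat-GGUF/resolve/main/llama-2-7b-chat.Q4_K_M.gguf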
Step 3: Run Nitro
Double-click on Nitro to run it. Note the path where your downloaded model is saved, then make an API call to load the model into Nitro.
curl -X POST 'http://localhost:3928/inferences/llamacpp/loadmodel' \
-H 'Content-Type: application/json' \
-d '{
"llama_model_path": "/path/to/your_model.gguf",
"ctx_len": 2048,
"ngl": 100,
"embedding": true
}'
ctx_len and ngl are typical llama C++ parameters (context length and number of GPU layers, respectively), and embedding determines whether the embedding endpoint is enabled.
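With embedding set to true, you should also be able to request embeddings. A minimal sketch, assuming your Nitro version exposes an OpenAI-compatible /v1/embeddings route (check the docs for your release; the model name here is a placeholder):

curl -X POST 'http://localhost:3928/v1/embeddings' \
  -H 'Content-Type: application/json' \
  -d '{
    "input": "Hello",
    "model": "llama2",
    "encoding_format": "float"
  }'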
Step 4: Perform Inference on Nitro for the First Time
curl --location 'http://localhost:3928/inferences/llamacpp/chat_completion' \
--header 'Content-Type: application/json' \
--header 'Accept: text/event-stream' \
--header 'Access-Control-Allow-Origin: *' \
--data '{
"messages": [
{"content": "Hello there π", "role": "assistant"},
{"content": "Can you write a long story", "role": "user"}
],
"stream": true,
"model": "gpt-3.5-turbo",
"max_tokens": 2000
}'
The Nitro server is compatible with the OpenAI format, so you can expect output in the same format as the OpenAI ChatGPT API.
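Since "stream" is true, the reply arrives as server-sent events whose chunks follow the OpenAI streaming format. An illustrative chunk (all values made up):

data: {"id":"chatcmpl-123","object":"chat.completion.chunk","created":1700000000,"model":"gpt-3.5-turbo","choices":[{"index":0,"delta":{"content":"Once"},"finish_reason":null}]}
...
data: [DONE]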
To compile Nitro yourself, please visit Compile from source.
Nitro is an integration layer on top of the most cutting-edge inference engines.
- For support, please file a GitHub ticket.
- For questions, join our Discord here.
- For long-form inquiries, please email hello@jan.ai.