Getting Started - Docs - Changelog - Bug reports - Discord
⚠️ Nitro is currently in Development: Expect breaking changes and bugs!
- GGML inference support (llama.cpp, etc.)
- Local file server
- Cache
- Plugin support
Nitro is a lightweight integration layer (and soon-to-be inference engine) for cutting-edge inference engines, making deployment of AI models easier than ever before!
Zipped, the Nitro binary is only ~3 MB in size, with few to no dependencies (for example, CUDA is needed only if you use a GPU), making it desirable for any edge/server deployment 🚀.
Its structure can be simplified as follows:
```
.
├── controllers
├── docs
├── llama.cpp -> Upstream llama C++
├── nitro_deps -> Dependencies of the Nitro project as a sub-project
└── utils
```
Step 1: Download Nitro
To use Nitro, download the released binaries from the release page below:
After downloading the release, double-click on the Nitro binary.
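If you prefer the command line, a hypothetical download for a Linux build might look like the following; the repository URL, version, and asset name are placeholders, so check the release page for the actual asset names:

```bash
# Hypothetical release download: substitute <version> and the asset name
# with a real asset from the releases page.
curl -LO https://github.com/janhq/nitro/releases/download/<version>/nitro-<version>-linux-amd64.tar.gz
tar -xzf nitro-<version>-linux-amd64.tar.gz
./nitro   # starts the server, which listens on port 3928 by default
```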
Step 2: Download a Model
Download a llama model to try running the llama C++ integration. You can find a "GGUF" model on The Bloke's page below:
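For example, one way to fetch a small quantized model from the command line (this particular repository and file name are just one option; any GGUF model works):

```bash
# Download a quantized Llama 2 7B chat model in GGUF format from Hugging Face.
wget https://huggingface.co/TheBloke/Llama-2-7B-Chat-GGUF/resolve/main/llama-2-7b-chat.Q4_K_M.gguf
```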
Step 3: Run Nitro
Double-click on Nitro to run it. Then, with your downloaded model saved at a known path, make an API call to load the model into Nitro:
```bash
curl -X POST 'http://localhost:3928/inferences/llamacpp/loadmodel' \
  -H 'Content-Type: application/json' \
  -d '{
    "llama_model_path": "/path/to/your_model.gguf",
    "ctx_len": 2048,
    "ngl": 100,
    "embedding": true,
    "n_parallel": 4,
    "pre_prompt": "A chat between a curious user and an artificial intelligence",
    "user_prompt": "USER: ",
    "ai_prompt": "ASSISTANT: "
  }'
```
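To verify the model loaded successfully, you can query Nitro's model status route; if your build lacks it, the chat_completion request in Step 4 works as a smoke test (the endpoint path below is an assumption, inferred from the llamacpp routes used above):

```bash
# Assumed status route: reports whether a model is currently loaded.
curl http://localhost:3928/inferences/llamacpp/modelstatus
```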
Table of parameters
Parameter | Type | Description |
---|---|---|
`llama_model_path` | String | The file path to the LLaMA model. |
`ngl` | Integer | The number of GPU layers to use. |
`ctx_len` | Integer | The context length for the model operations. |
`embedding` | Boolean | Whether to use embedding in the model. |
`n_parallel` | Integer | The number of parallel operations. Uses the Drogon thread count if not set. |
`cont_batching` | Boolean | Whether to use continuous batching. |
`user_prompt` | String | The prompt to use for the user. |
`ai_prompt` | String | The prompt to use for the AI assistant. |
`system_prompt` | String | The prompt to use for system rules. |
`pre_prompt` | String | The prompt to use for internal configuration. |
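As a worked example of the remaining parameters, a load request that sets system rules and enables continuous batching could look like this (the parameter values are illustrative, not recommendations):

```bash
curl -X POST 'http://localhost:3928/inferences/llamacpp/loadmodel' \
  -H 'Content-Type: application/json' \
  -d '{
    "llama_model_path": "/path/to/your_model.gguf",
    "ctx_len": 2048,
    "cont_batching": true,
    "system_prompt": "SYSTEM: ",
    "user_prompt": "USER: ",
    "ai_prompt": "ASSISTANT: "
  }'
```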
Step 4: Perform Inference on Nitro for the First Time
```bash
curl --location 'http://localhost:3928/inferences/llamacpp/chat_completion' \
  --header 'Content-Type: application/json' \
  --header 'Accept: text/event-stream' \
  --data '{
    "messages": [
      {"content": "Hello there 👋", "role": "assistant"},
      {"content": "Can you write a long story", "role": "user"}
    ],
    "stream": true,
    "model": "gpt-3.5-turbo",
    "max_tokens": 2000
  }'
```
The Nitro server is compatible with the OpenAI format, so you can expect the same output as the OpenAI ChatGPT API.
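For example, assuming the endpoint honors the OpenAI "stream" flag, you can set it to false to receive a single JSON chat.completion object instead of an event stream:

```bash
curl 'http://localhost:3928/inferences/llamacpp/chat_completion' \
  -H 'Content-Type: application/json' \
  -d '{
    "messages": [
      {"content": "Hello there", "role": "user"}
    ],
    "stream": false,
    "model": "gpt-3.5-turbo",
    "max_tokens": 100
  }'
```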
To compile Nitro yourself, please see Compile from source.
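As a rough sketch of what a source build typically involves (the repository URL and CMake layout here are assumptions; the Compile from source guide is authoritative):

```bash
# Hypothetical build outline: clone with submodules (llama.cpp, nitro_deps)
# and build with CMake. The exact flags are in the Compile from source guide.
git clone --recurse-submodules https://github.com/janhq/nitro
cd nitro
cmake -S . -B build
cmake --build build -j
```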
- For support, please file a GitHub ticket.
- For questions, join our Discord here.
- For long-form inquiries, please email hello@jan.ai.