This is not an official Google product
Minja is a minimalistic reimplementation of the Jinja templating engine to integrate in/with C++ LLM projects (such as llama.cpp or gemma.cpp).
It is not general purpose: it includes just what’s needed for actual chat templates (very limited set of filters, tests and language features). Users with different needs should look at third-party alternatives such as Jinja2Cpp, Jinja2CppLight, or inja (none of which we endorse).
Warning
TL;DR: use of Minja is at your own risk, and the risks are plenty! See Security & Privacy section below.
- Support each and every major LLM found on HuggingFace
- See
MODEL_IDS
in tests/CMakeLists.txt for the list of models currently supported
- See
- Easy to integrate to/with projects such as llama.cpp or gemma.cpp:
- Header-only
- C++17
- Only depend on nlohmann::json (no Boost)
- Keep codebase small (currently 2.5k LoC) and easy to understand
- Decent performance compared to Python.
- Address glaring Prompt injection risks in current Jinja chat templating practices. See Security & Privacy below
- Additional features from Jinja that aren't used by the template(s) of any major LLM (no feature creep!)
- Please don't submit PRs with such features, they will unfortunately be rejected.
- Full Jinja compliance (neither syntax-wise, nor filters / tests / globals)
This library is header-only: just copy the header(s) you need, make sure to use a compiler that handles C++11 and you're done. Oh, and get nlohmann::json's json.hpp
in your include path.
See API in minja/minja.hpp and minja/chat-template.h (experimental).
For raw Jinja templating (see examples/raw.cpp):
#include <minja.hpp>
#include <iostream>
using json = nlohmann::ordered_json;
int main() {
auto tmpl = minja::Parser::parse("Hello, {{ location }}!", /* options= */ {});
auto context = minja::Context::make(minja::Value(json {
{"location", "World"},
}));
auto result = tmpl->render(context);
std::cout << result << std::endl;
}
To apply a template to a JSON array of messages
and tools
in the HuggingFace standard (see examples/chat-template.cpp):
#include <chat-template.hpp>
#include <iostream>
using json = nlohmann::ordered_json;
int main() {
minja::chat_template tmpl(
"{% for message in messages %}"
"{{ '<|' + message['role'] + '|>\\n' + message['content'] + '<|end|>' + '\\n' }}"
"{% endfor %}",
/* bos_token= */ "<|start|>",
/* eos_token= */ "<|end|>"
);
std::cout << tmpl.apply(
json::parse(R"([
{"role": "user", "content": "Hello"},
{"role": "assistant", "content": "Hi there"}
])"),
json::parse(R"([
{"type": "function", "function": {"name": "google_search", "arguments": {"query": "2+2"}}}
])"),
/* add_generation_prompt= */ true,
/* extra_context= */ {}) << std::endl;
}
(Note that some template quirks are worked around by minja/chat-template.hpp so that all templates can be used the same way)
Models have increasingly complex templates (see some examples), so a fair bit of Jinja's language constructs is required to execute their templates properly.
Minja supports the following subset of the Jinja2/3 template syntax:
- Full expression syntax
- Statements
{{% … %}}
, variable sections{{ … }}
, and comments{# … #}
with pre/post space elision{%- … -%}
/{{- … -}}
/{#- … -#}
if
/elif
/else
/endif
for
(recursive
) (if
) /else
/endfor
w/loop.*
(includingloop.cycle
) and destructuringset
w/ namespaces & destructuringmacro
/endmacro
filter
/endfilter
- Extensible filters collection:
count
,dictsort
,equalto
,e
/escape
,items
,join
,joiner
,namespace
,raise_exception
,range
,reject
,tojson
,trim
Main limitations (non-exhaustive list):
- Not supporting most filters. Only the ones actually used in templates of major (or trendy) models are/will be implemented.
- No difference between
none
andundefined
- Single namespace with all filters / tests / functions / macros / variables
- No tuples (templates seem to rely on lists only)
- No
if
expressions w/oelse
(butif
statements are fine) - No
{% raw %}
,{% block … %}
,{% include … %}
, `{% extends … %},
- Fix known issues w/ CRLF on Windows
- Integrate to llama.cpp: ggerganov/llama.cpp#11016 + ggerganov/llama.cpp#9639
- Improve fuzzing coverage:
- use thirdparty jinja grammar to guide exploration of inputs (or implement prettification of internal ASTs and use them to generate arbitrary values)
- fuzz each filter / test
- Measure / track test coverage
- Setup performance tests
- Simplify two-pass parsing
- Pass tokens to IfNode and such
- Macro nested set scope = global?
- Get listed in https://jbmoelker.github.io/jinja-compat-tests/, https://en.cppreference.com/w/cpp/links/libs
minja::Parser
does two-phased parsing:- its
tokenize()
method creates coarse template "tokens" (plain text section, or expression blocks or opening / closing blocks). Tokens may have nested expressions ASTs, parsed withparseExpression()
- its
parseTemplate()
method iterates on tokens to build the finalTemplateNode
AST.
- its
minja::Value
represents a Python-like value- It relies on
nlohmann/json
for primitive values, but does its own JSON dump to be exactly compatible w/ the Jinja / Python implementation ofdict
string representation
- It relies on
minja::chat_template
wraps a template and provides an interface similar to HuggingFace's chat template formatting. It also normalizes the message history to accommodate different expectations from some templates (e.g.message.tool_calls.function.arguments
is typically expected to be a JSON string representation of the tool call arguments, but some templates expect the arguments object instead)- Testing involves a myriad of simple syntax tests and full e2e chat template rendering tests. For each model in
MODEL_IDS
(see tests/CMakeLists.txt), we fetch thechat_template
field of the repo'stokenizer_config.json
, use the official jinja2 Python library to render them on each of the (relevant) test contexts (in tests/contexts) into a golden file, and run a C++ test that renders w/ Minja and checks we get exactly the same output.
-
Install Prerequisites:
- cmake
- GCC / clang
- python 3.8+ (for tests)
- flake8
- editorconfig-checker
-
Optional: test additional templates:
-
Add their HuggingFace model identifier to
MODEL_IDS
in tests/CMakeLists.txt (e.g.meta-llama/Llama-3.2-3B-Instruct
) -
For gated models you have access to, first authenticate w/ HuggingFace:
pip install huggingface_hub huggingface-cli login
-
-
Build & run tests (shorthand:
./scripts/run_tests.sh
):rm -fR build && \ cmake -B build && \ cmake --build build -j && \ ctest --test-dir build -j --output-on-failure
-
Fuzzing tests
-
Note:
fuzztest
doesn't work natively on Windows or MacOS.Show instructions to run it inside a Docker container
Beware of Docker Desktop's licensing: you might want to check out alternatives such as colima (we'll still use the docker client in the example below).
docker run --rm -it -v $PWD:/src:rw $( echo " FROM python:3.12-slim-bookworm COPY requirements.txt /tmp RUN apt update && \ apt install -y cmake clang ccache git python3 python-is-python3 python3-pip && \ apt-get clean && \ rm -rf /var/lib/apt/lists/* RUN pip install setuptools pip --upgrade --force-reinstall RUN pip install -r /tmp/requirements.txt CMD /usr/bin/bash WORKDIR /src " | docker build . -f - -q )
-
Build in fuzzing mode & run all fuzzing tests (optionally, set a higher
TIMEOUT
as env var):./scripts/run_fuzzing_mode.sh
-
-
If your model's template doesn't run fine, please consider the following before opening a bug:
- Is the template using any unsupported filter / test / method / global function, and which one(s)?
- Is the template publicly available? Non-gated models are more likely to become supported.
- Which version of GCC / clang did you compile the tests with? On which OS version?
- If you intend to contribute a fix:
- Please read CONTRIBUTING first. You'd have to sign a CLA, which your employer may need to accept.
- Please test as many gated models as possible (use
cmake -B build -DMINJA_TEST_GATED_MODELS=1 ...
and edit MODEL_LIST appropriately)
-
For bonus points, check the style of your edits with:
flake8 editorconfig-checker
This library doesn't store any data by itself, it doesn't access files or the web, it only transforms a template (string) and context (JSON w/ fields "messages"
, "tools"
...) into a formatted string.
You should still be careful about untrusted third-party chat templates, as these could try and trigger bugs in Minja to exfiltrate user chat data (we only have limited fuzzing tests in place).
Risks are even higher with any user-defined functions.
HTML processing with this library is UNSAFE: no escaping of is performed (and the safe
filter is a passthrough), leaving users vulnerable to XSS. Minja is not intended to produce HTML.
Prompt injection is NOT protected against by this library.
There are many types of prompt injection, some quite exotic (cf. data exfiltration exploits leveraging markdown image previews).
For the simpler cases, it is perfectly possible for a user to craft a message that will look like a system prompt, like an assistant response or like the results of tool calls. While some models might be fine-tuned to ignore system calls not at the very start of the prompt or out of order messages / tool call results, it is expected that most models will be very confused & successfully manipulated by such prompt injections.
Note that injection of tool calls should typically not result in their execution as LLM inference engines should not try to parse the template output (just generated tokens), but this is something to watch out for when auditing such inference engines.
As there isn't any standard mechanism to escape special tokens to prevent those attacks, it is advised users of this library take their own message sanitization measures before applying chat templates. We do not recommend any specific such measure as each model reacts differently (some even understand l33tcode as instructions).