
minja.hpp - A minimalistic C++ Jinja templating engine for LLM chat templates

This is not an official Google product

Minja is a minimalistic reimplementation of the Jinja templating engine, designed for integration in C++ LLM projects (such as llama.cpp or gemma.cpp).

It is not general purpose: it includes just what’s needed for actual chat templates (very limited set of filters, tests and language features). Users with different needs should look at third-party alternatives such as Jinja2Cpp, Jinja2CppLight, or inja (none of which we endorse).

Warning

TL;DR: use Minja at your own risk, and the risks are many! See the Security & Privacy section below.


Design goals:

  • Support each and every major LLM found on HuggingFace
  • Easy to integrate with projects such as llama.cpp or gemma.cpp:
    • Header-only
    • C++17
    • Only depend on nlohmann::json (no Boost)
    • Keep codebase small (currently 2.5k LoC) and easy to understand
  • Decent performance compared to Python.

Non-goals:

  • Addressing the glaring prompt injection risks in current Jinja chat templating practices (see Security & Privacy below)
  • Additional features from Jinja that aren't used by the template(s) of any major LLM (no feature creep!)
    • Please don't submit PRs with such features, they will unfortunately be rejected.
  • Full Jinja compliance (neither syntax-wise, nor filters / tests / globals)

Usage:

This library is header-only: just copy the header(s) you need, make sure to use a compiler that handles C++17, and you're done. Oh, and put nlohmann::json's json.hpp in your include path.

See the API in minja/minja.hpp and minja/chat-template.hpp (experimental).

For raw Jinja templating (see examples/raw.cpp):

#include <minja.hpp>
#include <iostream>

using json = nlohmann::ordered_json;

int main() {
    auto tmpl = minja::Parser::parse("Hello, {{ location }}!", /* options= */ {});
    auto context = minja::Context::make(minja::Value(json {
        {"location", "World"},
    }));
    auto result = tmpl->render(context);
    std::cout << result << std::endl;
}

To apply a template to a JSON array of messages and tools in the HuggingFace standard (see examples/chat-template.cpp):

#include <chat-template.hpp>
#include <iostream>

using json = nlohmann::ordered_json;

int main() {
    minja::chat_template tmpl(
        "{% for message in messages %}"
        "{{ '<|' + message['role'] + '|>\\n' + message['content'] + '<|end|>' + '\\n' }}"
        "{% endfor %}",
        /* bos_token= */ "<|start|>",
        /* eos_token= */ "<|end|>"
    );
    std::cout << tmpl.apply(
        json::parse(R"([
            {"role": "user", "content": "Hello"},
            {"role": "assistant", "content": "Hi there"}
        ])"),
        json::parse(R"([
            {"type": "function", "function": {"name": "google_search", "arguments": {"query": "2+2"}}}
        ])"),
        /* add_generation_prompt= */ true,
        /* extra_context= */ {}) << std::endl;
}

(Note that minja/chat-template.hpp works around some template quirks so that all templates can be used the same way.)

Supported features

Models have increasingly complex templates (see some examples), so a fair number of Jinja's language constructs are required to execute their templates properly.

Minja supports the following subset of the Jinja2/3 template syntax:

  • Full expression syntax
  • Statements {% … %}, variable sections {{ … }}, and comments {# … #} with pre/post space elision {%- … -%} / {{- … -}} / {#- … -#}
  • if / elif / else / endif
  • for (recursive) (if) / else / endfor w/ loop.* (including loop.cycle) and destructuring
  • set w/ namespaces & destructuring
  • macro / endmacro
  • filter / endfilter
  • Extensible filters collection: count, dictsort, equalto, e / escape, items, join, joiner, namespace, raise_exception, range, reject, tojson, trim

Main limitations (non-exhaustive list):

  • Not supporting most filters. Only the ones actually used in templates of major (or trendy) models are/will be implemented.
  • No difference between none and undefined
  • Single namespace with all filters / tests / functions / macros / variables
  • No tuples (templates seem to rely on lists only)
  • No if expressions w/o else (but if statements are fine)
  • No {% raw %}, {% block … %}, {% include … %}, {% extends … %}

Roadmap / TODOs

Developer corner

Design overview

  • minja::Parser does two-phase parsing:
    • its tokenize() method creates coarse template "tokens" (plain text sections, expression blocks, or opening / closing blocks). Tokens may contain nested expression ASTs, parsed with parseExpression()
    • its parseTemplate() method iterates on tokens to build the final TemplateNode AST.
  • minja::Value represents a Python-like value
    • It relies on nlohmann/json for primitive values, but implements its own string dump so that its output exactly matches how the Jinja / Python implementation renders dicts as strings
  • minja::chat_template wraps a template and provides an interface similar to HuggingFace's chat template formatting. It also normalizes the message history to accommodate different expectations from some templates (e.g. message.tool_calls.function.arguments is typically expected to be a JSON string representation of the tool call arguments, but some templates expect the arguments object instead)
  • Testing involves a myriad of simple syntax tests and full e2e chat template rendering tests. For each model in MODEL_IDS (see tests/CMakeLists.txt), we fetch the chat_template field of the repo's tokenizer_config.json, use the official jinja2 Python library to render them on each of the (relevant) test contexts (in tests/contexts) into a golden file, and run a C++ test that renders w/ Minja and checks we get exactly the same output.

Adding new Templates / Building

  • Install Prerequisites:

    • cmake
    • GCC / clang
    • python 3.8+ (for tests)
    • flake8
    • editorconfig-checker
  • Optional: test additional templates:

    • Add their HuggingFace model identifier to MODEL_IDS in tests/CMakeLists.txt (e.g. meta-llama/Llama-3.2-3B-Instruct)

    • For gated models you have access to, first authenticate w/ HuggingFace:

      pip install huggingface_hub
      huggingface-cli login
  • Build & run tests (shorthand: ./scripts/run_tests.sh):

    rm -fR build && \
        cmake -B build && \
        cmake --build build -j && \
        ctest --test-dir build -j --output-on-failure
  • Fuzzing tests

    • Note: fuzztest doesn't work natively on Windows or macOS.

      To run it inside a Docker container:

      Beware of Docker Desktop's licensing: you might want to check out alternatives such as colima (we'll still use the docker client in the example below).

      docker run --rm -it -v $PWD:/src:rw $( echo "
          FROM python:3.12-slim-bookworm
          COPY requirements.txt /tmp
          RUN apt update && \
              apt install -y cmake clang ccache git python3 python-is-python3 python3-pip && \
              apt-get clean && \
              rm -rf /var/lib/apt/lists/*
          RUN pip install setuptools pip --upgrade --force-reinstall
          RUN pip install -r /tmp/requirements.txt
          CMD /usr/bin/bash
          WORKDIR /src
      " | docker build . -f - -q )
    • Build in fuzzing mode & run all fuzzing tests (optionally, set a higher TIMEOUT as env var):

      ./scripts/run_fuzzing_mode.sh
  • If your model's template doesn't render correctly, please consider the following before opening a bug:

    • Is the template using any unsupported filter / test / method / global function, and which one(s)?
    • Is the template publicly available? Non-gated models are more likely to become supported.
    • Which version of GCC / clang did you compile the tests with? On which OS version?
    • If you intend to contribute a fix:
      • Please read CONTRIBUTING first. You'd have to sign a CLA, which your employer may need to accept.
      • Please test as many gated models as possible (use cmake -B build -DMINJA_TEST_GATED_MODELS=1 ... and edit MODEL_IDS appropriately)
  • For bonus points, check the style of your edits with:

    flake8
    editorconfig-checker

Security & Privacy

Data protection

This library doesn't store any data by itself and doesn't access files or the web: it only transforms a template (string) and a context (JSON w/ fields "messages", "tools"...) into a formatted string.

You should still be careful about untrusted third-party chat templates, as these could try to trigger bugs in Minja to exfiltrate user chat data (we only have limited fuzzing tests in place).

Risks are even higher with any user-defined functions.

Do NOT produce HTML or JavaScript with this!

HTML processing with this library is UNSAFE: no automatic HTML escaping is performed (and the safe filter is a passthrough), leaving users vulnerable to XSS. Minja is not intended to produce HTML.

Beware of Prompt injection risks!

Prompt injection is NOT protected against by this library.

There are many types of prompt injection, some quite exotic (cf. data exfiltration exploits leveraging markdown image previews).

For the simpler cases, it is perfectly possible for a user to craft a message that looks like a system prompt, an assistant response, or the results of tool calls. While some models might be fine-tuned to ignore system messages that are not at the very start of the prompt, or out-of-order messages / tool call results, most models are expected to be very confused & successfully manipulated by such prompt injections.

Note that injected tool calls should typically not be executed, as LLM inference engines should not try to parse the template output (only the generated tokens), but this is something to watch out for when auditing such inference engines.

As there isn't any standard mechanism to escape special tokens to prevent those attacks, users of this library are advised to take their own message sanitization measures before applying chat templates. We do not recommend any specific measure, as each model reacts differently (some even understand l33tcode as instructions).