Limit default context size in the node template #435

Open
CrossPr0duct opened this issue Feb 23, 2025 · 4 comments · May be fixed by #444
Labels
bug Something isn't working

Comments

@CrossPr0duct

Issue description

When loading an 8B model in the project created by npm create node-llama-cpp@latest, it saturates GPU memory up to 24GB.

Expected Behavior

It should only use ~8GB of VRAM.

Actual Behavior

Shouldn't this only use ~8GB of VRAM? I am using Q8.
My GPU memory usage starts at around 3GB, then jumps to 24GB.

Steps to reproduce

Just run the latest npm create node-llama-cpp@latest to create an app, run npm install, then npm start, and load the 8GB Llama model, as shown below.
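
For reference, the reproduction boils down to roughly these commands (the project directory name is whatever you pick during the interactive setup; "my-app" below is just a placeholder):

npm create node-llama-cpp@latest
cd my-app
npm install
npm start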

My Environment

OS: Windows 10.0.26100 (x64) <-- reported as Windows 10, but it's actually Windows 11
Node: 22.13.0 (x64)
TypeScript: 5.7.3
node-llama-cpp: 3.6.0

CUDA: available
Vulkan: available

CUDA device: NVIDIA GeForce RTX 4090
CUDA used VRAM: 6.38% (1.53GB/23.99GB)
CUDA free VRAM: 93.61% (22.46GB/23.99GB)

Vulkan device: NVIDIA GeForce RTX 4090
Vulkan used VRAM: 6.38% (1.53GB/23.99GB)
Vulkan free VRAM: 93.61% (22.46GB/23.99GB)
Vulkan unified memory: 512MB (2.08%)

CPU model: AMD Ryzen 9 7900X 12-Core Processor
Math cores: 12
Used RAM: 50.15% (63.75GB/127.12GB)
Free RAM: 49.84% (63.37GB/127.12GB)
Used swap: 51.24% (76.41GB/149.12GB)
Max swap size: 149.12GB
mmap: supported

Additional Context

No response

Relevant Features Used

  • Metal support
  • CUDA support
  • Vulkan support
  • Grammar
  • Function calling

Are you willing to resolve this issue by submitting a Pull Request?

Yes, I have the time, and I know how to start.

@CrossPr0duct added the bug (Something isn't working) and requires triage (Requires triaging) labels on Feb 23, 2025
@CrossPr0duct
Author

@giladgd why does an 8B Q8 model require 21GB of VRAM? This wasn't the case for the llama.cpp server before.

@giladgd
Contributor

giladgd commented Feb 23, 2025

By default, node-llama-cpp uses the largest context size that can fit in your GPU’s VRAM (up to the model's training context size), to allow the model to ingest as much information as possible before a context shift happens; this makes it produce significantly higher quality responses when using long inputs.
The llama.cpp server uses a significantly shorter default context size (4096).

If you don’t need such a long context size, you can configure it when creating a context.
Here's an example of how you can do that:

import {fileURLToPath} from "url";
import path from "path";
import {getLlama, LlamaChatSession} from "node-llama-cpp";

const __dirname = path.dirname(fileURLToPath(import.meta.url));

const llama = await getLlama();
const model = await llama.loadModel({
    modelPath: path.join(__dirname, "models", "Meta-Llama-3-8B-Instruct.Q4_K_M.gguf")
});
const context = await model.createContext({
    contextSize: {
        // cap the context size to reduce VRAM usage
        max: 4096
    }
});
const session = new LlamaChatSession({
    contextSequence: context.getSequence()
});


const q1 = "Hi there, how are you?";
console.log("User: " + q1);

const a1 = await session.prompt(q1);
console.log("AI: " + a1);

I'm working on incremental allocation at runtime to optimize memory consumption while still supporting advanced use cases without having to configure anything, but it's not ready yet.

@CrossPr0duct
Copy link
Author

@giladgd that makes a lot of sense. Still, this might leave a very bad impression of the library; I had thought it was broken. Perhaps you should add a warning or go with a default context size? When it did that, my whole computer basically froze and it took forever to load.

@giladgd
Copy link
Contributor

giladgd commented Feb 25, 2025

@CrossPr0duct You're right. I'll add a default limit to the max context size in the node template for now, until the incremental allocation is ready.
Thanks for reporting this :)
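
As a rough sketch, the template's context creation would then look something like this (the 8192 cap below is only an illustrative value, not necessarily the one the template will end up using):

const context = await model.createContext({
    contextSize: {
        // illustrative default cap; the actual template value may differ
        max: 8192
    }
});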

@giladgd self-assigned this on Feb 25, 2025
@giladgd removed the requires triage (Requires triaging) label on Feb 25, 2025
@giladgd changed the title from "npm create node-llama-cpp memory issue." to "limit default context size in the node template" on Feb 25, 2025
@giladgd changed the title from "limit default context size in the node template" to "Limit default context size in the node template" on Feb 25, 2025
@giladgd linked a pull request on Mar 20, 2025 that will close this issue