cannot find tokenizer merges in model file #120
Comments
I agree it may be due to your outdated Wllama version bundling an older llama.cpp: Llama 1B is available to try on https://github.ngxson.com/wllama/examples/main/dist/ and it works fine there. I've also tried the 3B model, and it's all good.
Odd, it didn't solve it. I tried re-downloading the model itself, but that didn't help. Then I tried Firefox for comparison, and actually noticed the same error there. (screenshot of the error omitted) I'm attempting a non-chunked version of the model next. https://huggingface.co/BoscoTheDog/llama_3_2_it_1b_q4_k_m_chunked
Bingo.
I re-chunked the 1B using the very latest version of llama.cpp. Now it loads, but only outputs a single word before giving this error: (screenshot of the error omitted)
Edit: looking back, this error may have just been my code trying to unload Wllama after inference was complete, and failing.
Could you try these splits and confirm if they work? (Those are the ones I'm using without issues on Wllama v1.16.2)
We first need to find out whether the problem is with the model splits themselves or with the config passed to Wllama.
By the way, here is the chunked model that only outputs one word: https://huggingface.co/BoscoTheDog/llama_3_2_it_1b_q4_k_m_chunked/resolve/main/llama-3_2_it_1b_q4_0-00001-of-00004.gguf
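For reference, here is a minimal sketch of how a split GGUF like the one above can be loaded with Wllama. The constructor placeholder mirrors the snippet later in this thread, and the `progressCallback` option and split-shard handling are assumptions based on the Wllama README, not the exact code used here:

```ts
import { Wllama } from "@wllama/wllama";

// WASM asset paths go here, as documented in the Wllama README.
const wllama = new Wllama(/* CONFIG_PATHS */);

// Pass the URL of the first shard of the split GGUF. How the remaining
// -0000X-of-00004.gguf shards are picked up (automatically vs. via an array
// of URLs) depends on the Wllama version, so treat this as a sketch.
await wllama.loadModelFromUrl(
  "https://huggingface.co/BoscoTheDog/llama_3_2_it_1b_q4_k_m_chunked/resolve/main/llama-3_2_it_1b_q4_0-00001-of-00004.gguf",
  {
    // assumption: progress reporting option as shown in the Wllama README
    progressCallback: ({ loaded, total }) => console.log(`${loaded}/${total}`),
  },
);
```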
Setting the sampling to minimal worked! I should mention that I keep 'allow_offline' enabled all the time. Is that a bad idea?
Not at all! I always leave it enabled too! :D
Interesting! Did it work with https://huggingface.co/BoscoTheDog/llama_3_2_it_1b_q4_k_m_chunked/resolve/main/llama-3_2_it_1b_q4_0-00001-of-00004.gguf? If so, we can conclude it is something specific to the config passed to Wllama. (If that's the case, have you found the specific config combination that caused the issue?)
I used to have this enabled all the time too, but I've removed it now.
Re-enabling it as a test had no (negative) effect, though. I still have this enabled:
Should I remove that?
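For context, here is a rough sketch of the kind of loading and sampling options being discussed above. The `allowOffline` flag and the sampling parameter names are assumptions based on the Wllama README, not the exact config from this thread:

```ts
// `wllama` is an already-constructed Wllama instance, as in the sketch above.
// Option names below are assumptions based on the Wllama README.
await wllama.loadModelFromUrl("https://example.com/model.gguf" /* placeholder URL */, {
  allowOffline: true, // assumption: reuse the cached model when offline
});

// "Minimal" sampling: near-greedy decoding with as few parameters as possible.
const output = await wllama.createCompletion("Hello!", {
  nPredict: 128,
  sampling: {
    temp: 0.0, // assumption: effectively greedy decoding
  },
});
console.log(output);
```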
I have a vague notion that .gguf files carry chat template information within them? Currently I use Transformers.js to turn a conversation dictionary into a templated string, and then feed that into the model. Is there a way I can skip that step and feed a conversation dictionary straight into Wllama?
Aha! There is a function to get the Jinja template from the GGUF, and Wllama's example then uses a dependency on @huggingface/jinja to render it.
There is, by using the @huggingface/jinja package (the same one Transformers.js uses). Here's the same logic used in https://github.ngxson.com/wllama/examples/main/dist/:

```ts
import { Template } from "@huggingface/jinja";
import { Wllama } from "@wllama/wllama";

// Minimal message shape expected by formatChat.
type Message = { role: string; content: string };

const wllama = new Wllama(/*...*/);
await wllama.loadModelFromUrl(/*...*/);

export const formatChat = async (wllama: Wllama, messages: Message[]) => {
  // Fallback ChatML template, used when the GGUF does not embed one.
  const defaultChatTemplate =
    "{% for message in messages %}{{'<|im_start|>' + message['role'] + '\n' + message['content'] + '<|im_end|>' + '\n'}}{% endfor %}{% if add_generation_prompt %}{{ '<|im_start|>assistant\n' }}{% endif %}";
  const template = new Template(
    wllama.getChatTemplate() ?? defaultChatTemplate,
  );
  return template.render({
    messages,
    bos_token: await wllama.detokenize([wllama.getBOS()]),
    eos_token: await wllama.detokenize([wllama.getEOS()]),
    add_generation_prompt: true,
  });
};

const messages: Message[] = [
  {
    role: "user",
    content: "Hi!",
  },
  {
    role: "assistant",
    content: "Hello! How may I help you today?",
  },
  {
    role: "user",
    content: "How many R's are there in the word strawberry?",
  },
];

const prompt = await formatChat(wllama, messages);
// <|im_start|>user
// Hi!<|im_end|>
// <|im_start|>assistant
// Hello! How may I help you today?<|im_end|>
// <|im_start|>user
// How many R's are there in the word strawberry?<|im_end|>
// <|im_start|>assistant
```
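To close the loop with the original question, the rendered prompt can then be fed to the model for generation. A short sketch follows; `createCompletion` and its sampling options are taken from the Wllama README, so treat the exact parameters as assumptions:

```ts
// Feed the templated prompt to the model; parameter names follow the
// Wllama README and may differ between versions (assumption).
const answer = await wllama.createCompletion(prompt, {
  nPredict: 128,
  sampling: { temp: 0.2, top_p: 0.9 },
});
console.log(answer);
```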
Oh wow, diving into your info I realized there is even an abstraction layer above Transformers.js. Edit: wait, no, it's just for using the API.
I've implemented your templating approach, thank you! Much simpler than creating an entire Transformers.js instance.
This might be related to unslothai/unsloth#1065 and unslothai/unsloth#1062 - temporary fixes are provided for Unsloth finetuners, and this is being confirmed with the Hugging Face team at ggml-org/llama.cpp#9692.
This problem is reported on the upstream repo: ggml-org/llama.cpp#9692
I noticed this error when loading the Llama 1B and 3B models.
I'm updating Wllama now; hopefully that fixes it.