Misc. bug: Missing <think> tag in response (DeepSeek R1) #11861
Comments
I observed the same problem when I was playing with non-thinking models and making them think within <think> tags.
I had the same issue, but once I upgraded to a release newer than b4706, the issue went away. It looks like PR #11607 resolved the problem, and I now get both think tags (<think> ... </think>). Here is how I am calling it (using the shared library via llama-cpp-python):
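A minimal sketch of that kind of call (placeholder model path and sampling parameters, not the exact original invocation):

```python
# Minimal llama-cpp-python chat-completion sketch; model path, context size
# and sampling settings are placeholders.
from llama_cpp import Llama

llm = Llama(
    model_path="./DeepSeek-R1-Distill-Qwen-7B-Q4_K_M.gguf",  # placeholder path
    n_ctx=8192,
)

resp = llm.create_chat_completion(
    messages=[{"role": "user", "content": "Why is the sky blue?"}],
    temperature=0.6,
)

# With a build newer than b4706, the content should include both
# the opening <think> and the closing </think>.
print(resp["choices"][0]["message"]["content"])
```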
I noticed another model that injects <think> via its chat template. I used b4837 and the Q3_K_M quant from mradermacher, launched with llama-server.
@ochafik I know it's not a model from a major provider, but could you take a look at whether it's handled properly on the llama.cpp side?
The problem is the trend of adding <think> at the end of the prompt via the chat template. In theory we shouldn’t output something that’s already in the prompt (kinda working as intended), but in practice we’ll have to special-case this 👌.
The new QwQ template contains a think token in the assistant turn, so it doesn't get returned by the server API. This is OK if you manage the other side of the call, but for some frontends like ollama-webui it results in a broken think infodump in the UI. We can't really special-case everything, as I've seen a fair share of <think>-style tokens etc. in templates. Maybe a fixed prepend parameter on the server, where a fixed string is added in front of every response?
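For example, a consumer-side repair could look roughly like this (just a sketch, not something any existing frontend actually ships):

```python
def repair_think_tags(content: str) -> str:
    """Client-side stopgap: if the reply contains a closing </think> but the
    opening tag stayed behind in the prompt, glue the opening tag back on."""
    if "</think>" in content and "<think>" not in content:
        return "<think>" + content
    return content


# Example: a reply whose opening tag was consumed by the chat template.
print(repair_think_tags("Okay, the user asks about X...</think>\nHere is the answer."))
```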
If major model providers like DeepSeek and Qwen are including <think> in their official chat templates, then I think llama.cpp should handle it properly. Tested using b4855 (7ab3643), launched with llama-server.
@MoonRide303 I do plan on accommodating this; it's only a special case in the wider context of text generation from a prompt (which has always been about returning content generated after the prompt, especially in streamed mode), which is why the opening <think> goes missing in the first place.
@LorenzoBoccaccia The simplest way I can think of is to just let model-specific chat handlers set that prepend variable when they detect a trailing <think> in the rendered prompt.
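In rough pseudo-Python, the idea would be something like the following (hypothetical names; the real chat handling lives in llama.cpp's C++ server code):

```python
THINK_TAG = "<think>"

def detect_prepend(rendered_prompt: str) -> str:
    """Hypothetical helper: if the rendered chat template leaves a bare
    <think> at the end of the prompt, return it so the server can glue it
    back onto the front of the generated reply."""
    return THINK_TAG if rendered_prompt.rstrip().endswith(THINK_TAG) else ""

def finalize_reply(generated: str, prepend: str) -> str:
    """Hypothetical helper: re-attach whatever the template pre-filled."""
    return prepend + generated

# Usage sketch with an illustrative prompt that ends in <think>:
prompt = "<|User|>hi<|Assistant|><think>\n"
reply = finalize_reply("reasoning...</think>\nFinal answer.", detect_prepend(prompt))
print(reply)  # starts with <think> again
```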
Well regardless of who came up with it, it seems like a really stupid idea to me. It's just a matter of time before different think modes are introduced and I don't even want to imagine what the Jinja template would look like for these.
Definitely, we should just not support these templates and focus on getting the basics right for now.
Both QwQ-32B and DeepSeek-R1 use this technique and are SotA open-weight models in their weight classes, so... maybe it would be good to figure out a way to support templates like that? Aren't both working just fine in transformers, without any ugly hacks?
@ggerganov I wish I could unsee DeepSeek R1 Distill's template (even before the <think> prefill was added).
@MoonRide303 I'd love for someone to confirm. My experience in #11607 is that the official template is broken & does not make R1 Distill Qwen good at tool calling, as it leaves the prompt dangling after tool call results (hence why I wrote an alternative template; but I still added a workaround to fix the original template).
I'm thinking of ways we can help model authors write SotA templates. Might get round to compiling an online template analyzer w/ Minja + WASM. But anyway, I digress: QwQ is gonna start reporting thoughts w/ #12297 in non-streaming mode. And streaming is on its way with promises of fixing everyone's sorrows (except mine; it's... a joyous mess 😓).
Name and Version
I don't know whether it's a bug or not.
The latest Jinja chat template for the DeepSeek R1 model adds a <think>\n postfix to force the model into thinking. However, this makes all responses lose the leading <think> tag. I suggest manually adding the <think> prefix to the response when add_generation_prompt = true.
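For illustration, the behaviour can be reproduced by rendering the template directly with transformers (a sketch; the model id and message are examples only):

```python
# Render the chat template with add_generation_prompt=True and check how the
# prompt ends; per the updated template it should end with "<think>\n".
from transformers import AutoTokenizer

tok = AutoTokenizer.from_pretrained("deepseek-ai/DeepSeek-R1")  # example model id
prompt = tok.apply_chat_template(
    [{"role": "user", "content": "Why is the sky blue?"}],
    add_generation_prompt=True,
    tokenize=False,
)
print(repr(prompt[-10:]))  # expected to end with '<think>\n'
```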
Operating systems
Linux
Which llama.cpp modules do you know to be affected?
libllama (core library)
Command line
Problem description & steps to reproduce
llama-server
DeepSeek R1
First Bad Commit
No response
Relevant log output