
Long prompts and/or system prompts being clipped or ignored when using chat_ollama (but not with chat_openai) #276

Closed
mark-andrews opened this issue Jan 26, 2025 · 5 comments · Fixed by #283

Comments

@mark-andrews

mark-andrews commented Jan 26, 2025

(If this issue is off-topic and is about Ollama's API and not about ellmer per se, just let me know.)

I was successfully using chat_ollama and getting satisfactory results doing the following:

client <- chat_ollama(model = "llama3.3", system_prompt = instructions)
result <- client$chat(text)

where instructions was around 200 words and text was about 20 words. Comparing the results with those from chat_openai (using GPT-4o), with nothing else changed, the responses were quite similar in quality.

However, when I tried another problem, in which the system prompt instructions were increased to around 1500 words and the text to around 2000 words, things were more or less disastrous with chat_ollama. It appeared as if Llama was ignoring all but the last 500 or so words of the text, and seemingly ignoring the instructions too, so the response was completely unusable. If I switch to chat_openai instead of chat_ollama, changing nothing else, the results are about as good as could be expected.

I don't think the problem here is that GPT-4o is just so much better than Llama 3.3. As stated, they were equally satisfactory before the prompts got long. Rather, I assume there is some setting I need to change to tell Llama to use all of the prompts' contents; I just don't know what that might be. I tried changing num_ctx after seeing some discussion about that setting, e.g.

client <- chat_ollama(model = "llama3.3", system_prompt = instructions, api_args = list(num_ctx = 8192)) 

but that made no difference, nor did setting num_ctx much higher.

Am I doing something wrong? Do I set Ollama settings with api_args as in the previous example? Does anyone know what settings I need to change?

For context,

> ollama -v
ollama version is 0.5.7

> ollama list                                             
NAME               ID              SIZE      MODIFIED    
llama3.3:latest    a6eb4748fd29    42 GB     4 weeks ago    

I'm using ellmer version 0.1.0.9000, and I have a relatively good GPU (RTX A6000), 64 cores, and 500 GB of RAM, so hardware is not the problem.

@hadley
Member

hadley commented Jan 27, 2025

Would you mind creating a reprex that illustrates the problem? I think you could do something simple like pasting a text sequence together and asking what the first word was (or something similar).

But quickly looking at the linked issue, I'd think you'd want api_args = list(options = list(num_ctx = 8192)).
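
To make that nesting concrete, here is a minimal sketch of the full call, reusing the model and instructions from the original post and assuming api_args is appended to the request body as-is:

# num_ctx is an Ollama generation option, so it needs to sit inside an
# "options" object in the request body rather than at the top level
client <- chat_ollama(
  model = "llama3.3",
  system_prompt = instructions,
  api_args = list(options = list(num_ctx = 8192))
)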

@hadley hadley added the reprex (needs a minimal reproducible example) label Jan 27, 2025
@mark-andrews
Author

mark-andrews commented Jan 27, 2025

To provide some context, what I am doing is using LLMs to do qualitative text analysis in social science, initially inspired by this article (https://arxiv.org/abs/2307.13106) and using their Python code. They used the OpenAI API through Python, but I particularly want to use a local LLM instead, if possible. As mentioned above, everything worked great using Llama 3.3 locally when the instructions and the text to be analysed were short.

Here is a reprex that shows things going badly with Llama when the system prompt and the text become long, specifically when both are over 1500 words. By contrast, GPT-4o via chat_openai does a great job.

For this example, I made up a fake student essay marking assignment with the help of ChatGPT: I asked ChatGPT to make up a description of a student coursework assignment, make a detailed marking rubric, and then generate an example student essay on the relevant topic. Just to be very clear, all of this content is fake and generated by ChatGPT.

My reprex code:

library(tidyverse)
library(ellmer)

instructions_url <- 'https://gist.githubusercontent.com/mark-andrews/49f54e9abfc8a0cbb8ef556749702011/raw/9260ca82734322f000b8e6aed245551413ede3d3/instructions.md'
coursework_url <- 'https://gist.githubusercontent.com/mark-andrews/49f54e9abfc8a0cbb8ef556749702011/raw/9260ca82734322f000b8e6aed245551413ede3d3/coursework.md'

instructions <- readLines(instructions_url) |> str_c(collapse = '\n')
coursework <- readLines(coursework_url) |> str_c(collapse = '\n')

client_llama <- chat_ollama(model = "llama3.3", system_prompt = instructions, api_args = list(options = list(num_ctx = 8192)))
client_gpt <- chat_openai(system_prompt = instructions)

cat("# Llama 3.3 response:\n\n")
results_llama <- client_llama$chat(coursework)

cat("# GPT 4o responses\n\n")
results_gpt <- client_gpt$chat(coursework)

To avoid this code getting too long, I put the instructions and the fake essay in a GitHub gist.

If you run the above code, you will see that Llama basically doesn't seem to understand the instructions at all and also seems to focus only on the reference list of the essay. On the other hand, GPT-4o does a very satisfactory job. I also put the responses into a markdown file in the gist in case you wish to see exactly the responses that I got.

This time I used api_args = list(options = list(num_ctx = 8192)) (thank you for the recommendation). The results did not seem any different from when I use the default setting, which is 2048 (see https://github.com/ollama/ollama/blob/main/docs/faq.md#how-can-i-specify-the-context-window-size).

@hadley
Member

hadley commented Jan 27, 2025

Here's a slightly simpler reprex:

library(ellmer)
coursework_url <- 'https://gist.githubusercontent.com/mark-andrews/49f54e9abfc8a0cbb8ef556749702011/raw/9260ca82734322f000b8e6aed245551413ede3d3/coursework.md'
coursework <- paste(readLines(coursework_url, warn = FALSE), collapse = "\n")

client_llama <- chat_ollama(model = "llama3.3", api_args = list(max_tokens = 8192))
client_llama$chat(paste0(coursework, "\n\n", "What was the title of the article?"))
client_llama$tokens()

Unfortunately it seems like this is a bug in ollama, with only an annoying workaround: ollama/ollama#6544 (comment).
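
For reference, the workaround commonly suggested in that Ollama thread (as I understand it; not necessarily the exact fix in the linked comment) is to bake a larger context window into a derived model via a Modelfile and then point chat_ollama() at the new model name. A rough sketch, where llama3.3-8k is just an illustrative name:

# Modelfile: derive a model with a larger default context window
FROM llama3.3
PARAMETER num_ctx 8192

# then build the derived model and use it from ellmer:
#   ollama create llama3.3-8k -f Modelfile
#   client <- chat_ollama(model = "llama3.3-8k", system_prompt = instructions)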

@mark-andrews
Author

Hi @hadley, thank you for looking into it. I assume that, for now at least, there is nothing that can be done on the R/ellmer side of things and the only option is the workaround in Ollama itself. That's fine. In that case, I assume this issue can be closed.

@hadley
Member

hadley commented Jan 28, 2025

Yeah, I'll probably close this issue with a note in the documentation.

@hadley hadley added the documentation label and removed the reprex (needs a minimal reproducible example) label Jan 28, 2025
hadley added a commit that referenced this issue Jan 28, 2025
@hadley hadley closed this as completed in 32f8497 Jan 28, 2025