
Long prompts and/or system prompts being clipped or ignored when using chat_ollama (but not with chat_openai) #276

Closed
mark-andrews opened this issue Jan 26, 2025 · 5 comments · Fixed by #283

Comments

@mark-andrews

mark-andrews commented Jan 26, 2025

(If this issue is off-topic and is about Ollama's API and not about ellmer per se, just let me know.)

I was successfully using chat_ollama and getting satisfactory results doing the following:

client <- chat_ollama(model = "llama3.3", system_prompt = instructions)
result <- client$chat(text)

where instructions was around 200 words and text was about 20 words. Comparing the results with those from chat_openai (using GPT-4o), with nothing else changed, the responses were quite similar in quality.

However, when I tried another problem, in which the system prompt instructions were increased to around 1500 words and the text to around 2000 words, things were more or less disastrous with chat_ollama. It appeared as if Llama was ignoring all but the last 500 or so words of the text, and seemingly ignoring the instructions too, so the response was completely unusable. If I switch to chat_openai instead of chat_ollama, changing nothing else, the results are about as good as could be expected.

I don't think the problem here is that GPT-4o is just so much better than Llama 3.3. As stated, they were equally satisfactory before the prompts got long. Rather, I assume there is some setting I need to change to tell Llama to use all of the prompts' contents; I just don't know what that might be. I tried changing num_ctx after seeing some discussion about that setting, e.g.

client <- chat_ollama(model = "llama3.3", system_prompt = instructions, api_args = list(num_ctx = 8192)) 

but that made no difference, nor did setting num_ctx much higher.

Am I doing something wrong? Do I set Ollama settings with api_args as in the previous example? Does anyone know what settings I need to change?

For context,

> ollama -v
ollama version is 0.5.7

> ollama list                                             
NAME               ID              SIZE      MODIFIED    
llama3.3:latest    a6eb4748fd29    42 GB     4 weeks ago    

I'm using ellmer version 0.1.0.9000, and I have a relatively good GPU (RTX A6000), 64 cores, and 500 GB of RAM, so hardware is not the problem.

@hadley
Member

hadley commented Jan 27, 2025

Would you mind creating a reprex that illustrates the problem? I think you could do something simple like pasting a text sequence together and asking what the first word was (or something similar).

But quickly looking at the linked issue, I'd think you'd want api_args = list(options = list(num_ctx = 8192)).
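
To make that nesting concrete, here is a minimal sketch of the full call, reusing the model and instructions from the original post and assuming api_args is appended to the request body as-is:

# num_ctx is an Ollama generation option, so it needs to sit inside an
# "options" object in the request body rather than at the top level
client <- chat_ollama(
  model = "llama3.3",
  system_prompt = instructions,
  api_args = list(options = list(num_ctx = 8192))
)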

@hadley hadley added the reprex (needs a minimal reproducible example) label Jan 27, 2025
@mark-andrews
Author

mark-andrews commented Jan 27, 2025

To provide some context, what I am doing is using LLMs to do qualitative text analysis in social science, initially inspired by this article (https://arxiv.org/abs/2307.13106) and using their Python code. They used the OpenAI API through Python, but I particularly want to use a local LLM instead, if possible. As mentioned above, everything worked great using Llama 3.3 locally when the instructions and the text to be analysed were short.

Here is a reprex that shows things going badly with Llama when the system prompt and the text become long, specifically when both are over 1500 words. By contrast, GPT-4o via chat_openai does a great job.

For this example, I made up a fake student essay marking assignment with the help of ChatGPT: I asked ChatGPT to make up a description of a student coursework assignment, make a detailed marking rubric, and then generate an example student essay on the relevant topic. Just to be very clear, all of this content is fake and generated by ChatGPT.

My reprex code:

library(tidyverse)
library(ellmer)

instructions_url <- 'https://gist.githubusercontent.com/mark-andrews/49f54e9abfc8a0cbb8ef556749702011/raw/9260ca82734322f000b8e6aed245551413ede3d3/instructions.md'
coursework_url <- 'https://gist.githubusercontent.com/mark-andrews/49f54e9abfc8a0cbb8ef556749702011/raw/9260ca82734322f000b8e6aed245551413ede3d3/coursework.md'

instructions <- readLines(instructions_url) |> str_c(collapse = '\n')
coursework <- readLines(coursework_url) |> str_c(collapse = '\n')

client_llama <- chat_ollama(model = "llama3.3", system_prompt = instructions, api_args = list(options = list(num_ctx = 8192)))
client_gpt <- chat_openai(system_prompt = instructions)

cat("# Llama 3.3 response:\n\n")
results_llama <- client_llama$chat(coursework)

cat("# GPT 4o responses\n\n")
results_gpt <- client_gpt$chat(coursework)

To avoid this code getting too long, I put the instructions and the fake essay in a GitHub gist.

If you run the above code, you will see that Llama basically doesn't seem to understand the instructions at all and also seems to focus only on the reference list of the essay. On the other hand, GPT-4o does a very satisfactory job. I also put the responses into a markdown file in the gist in case you wish to see exactly the responses that I got.

This time I used api_args = list(options = list(num_ctx = 8192)) (thank you for the recommendation). The results did not seem any different from when I use the default setting, which is 2048 (see https://github.com/ollama/ollama/blob/main/docs/faq.md#how-can-i-specify-the-context-window-size).

@hadley
Member

hadley commented Jan 27, 2025

Here's a slightly simpler reprex:

library(ellmer)
coursework_url <- 'https://gist.githubusercontent.com/mark-andrews/49f54e9abfc8a0cbb8ef556749702011/raw/9260ca82734322f000b8e6aed245551413ede3d3/coursework.md'
coursework <- paste(readLines(coursework_url, warn = FALSE), collapse = "\n")

client_llama <- chat_ollama(model = "llama3.3", api_args = list(max_tokens = 8192))
client_llama$chat(paste0(coursework, "\n\n", "What was the title of the article?"))
client_llama$tokens()

Unfortunately it seems like this is a bug in ollama, with only an annoying workaround: ollama/ollama#6544 (comment).
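
For reference, the workaround commonly suggested in that Ollama thread (as I understand it; not necessarily the exact fix in the linked comment) is to bake a larger context window into a derived model via a Modelfile and then point chat_ollama() at the new model name. A rough sketch, where llama3.3-8k is just an illustrative name:

# Modelfile: derive a model with a larger default context window
FROM llama3.3
PARAMETER num_ctx 8192

# then build the derived model and use it from ellmer:
#   ollama create llama3.3-8k -f Modelfile
#   client <- chat_ollama(model = "llama3.3-8k", system_prompt = instructions)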

@mark-andrews
Author

Hi @hadley, thank you for looking into it. I assume that, for now at least, there is nothing that can be done on the R/ellmer side of things and the only option is the workaround in Ollama itself. That's fine. In that case, I assume this issue can be closed.

@hadley
Member

hadley commented Jan 28, 2025

Yeah, I'll probably close this issue with a note in the documentation.

@hadley hadley added the documentation label and removed the reprex (needs a minimal reproducible example) label Jan 28, 2025
hadley added a commit that referenced this issue Jan 28, 2025
@hadley hadley closed this as completed in 32f8497 Jan 28, 2025