Long prompts and/or system prompts being clipped or ignored when using chat_ollama (but not with chat_openai) #276
Comments
Would you mind creating a reprex that illustrates the problem? I think you could do something simple by pasting a text sequence together and asking what the first word was (or something similar)? But quickly looking at the linked issue, I'd think you'd want …
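A sketch of that kind of reprex (the filler word, its repetition count, and the model choice here are just placeholders):

```r
library(ellmer)

# Build a long prompt whose correct answer depends only on its very first word.
long_text <- paste(c("zebra", rep("lorem", 3000)), collapse = " ")

client <- chat_ollama(model = "llama3.3")
client$chat(paste0(long_text, "\n\nWhat was the first word of the text above?"))
```

If the model cannot name the first word, that points to the front of the prompt being dropped rather than to model quality.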
To provide some context, what I am doing is using LLMs to do qualitative text analysis in social science, initially inspired by this article https://arxiv.org/abs/2307.13106 and using their Python code. They used the OpenAI API through Python; I particularly want to use a local LLM instead if possible. As mentioned above, everything worked great using Llama 3.3 locally when the instructions and the text to be analysed were short. Here is a reprex that shows things going badly with Llama when the system prompt and text become long, specifically when both are over 1500 words. By contrast, GPT 4o via `chat_openai` handles the same inputs without any problem.

For this example, I made up a fake student essay marking assignment with the help of ChatGPT: I asked ChatGPT to write a description of a student coursework assignment, make a detailed marking rubric, and then generate an example student essay on the relevant topic. So, just to be very clear, all of this content is fake and generated by ChatGPT.

My reprex code:

```r
library(tidyverse)
library(ellmer)

instructions_url <- 'https://gist.githubusercontent.com/mark-andrews/49f54e9abfc8a0cbb8ef556749702011/raw/9260ca82734322f000b8e6aed245551413ede3d3/instructions.md'
coursework_url <- 'https://gist.githubusercontent.com/mark-andrews/49f54e9abfc8a0cbb8ef556749702011/raw/9260ca82734322f000b8e6aed245551413ede3d3/coursework.md'

# Read the (fake) marking instructions and the (fake) student essay as single strings.
instructions <- readLines(instructions_url) |> str_c(collapse = '\n')
coursework <- readLines(coursework_url) |> str_c(collapse = '\n')

# Same system prompt for both clients; num_ctx is raised for Ollama via api_args.
client_llama <- chat_ollama(
  model = "llama3.3",
  system_prompt = instructions,
  api_args = list(options = list(num_ctx = 8192))
)
client_gpt <- chat_openai(system_prompt = instructions)

cat("# Llama 3.3 response:\n\n")
results_llama <- client_llama$chat(coursework)

cat("# GPT 4o responses\n\n")
results_gpt <- client_gpt$chat(coursework)
```

To avoid this code getting too long, I put the instructions and the fake essay in a GitHub gist. If you run the above code, you will see that Llama basically doesn't seem to understand the instructions at all and also seems to focus only on the reference list of the essay. On the other hand, GPT 4o does a very satisfactory job. I also put the responses into a markdown file in the gist if you wish to see exactly the responses that I got. This time, I set num_ctx via api_args, as shown in the code above.
Here's a slightly simpler reprex:

```r
library(ellmer)

coursework_url <- 'https://gist.githubusercontent.com/mark-andrews/49f54e9abfc8a0cbb8ef556749702011/raw/9260ca82734322f000b8e6aed245551413ede3d3/coursework.md'
coursework <- paste(readLines(coursework_url, warn = FALSE), collapse = "\n")

client_llama <- chat_ollama(model = "llama3.3", api_args = list(max_tokens = 8192))
client_llama$chat(paste0(coursework, "\n\n", "What was the title of the article?"))
client_llama$tokens()
```

Unfortunately it seems like this is a bug in ollama, with only an annoying workaround: ollama/ollama#6544 (comment).
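For reference, one Ollama-side workaround of this kind is to bake a larger context window into a derived model via a Modelfile and then point `chat_ollama()` at that model. A sketch in R, assuming the `ollama` CLI is on the PATH, `llama3.3` has already been pulled, and the derived model name `llama3.3-8k` is just an illustrative choice:

```r
library(ellmer)

# Create a derived model whose default context window is 8192 tokens.
writeLines(c("FROM llama3.3", "PARAMETER num_ctx 8192"), "Modelfile")
system2("ollama", c("create", "llama3.3-8k", "-f", "Modelfile"))

# Use the derived model; num_ctx no longer needs to be passed per request.
client_llama <- chat_ollama(model = "llama3.3-8k")
```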
Hi @hadley, thank you for looking into it. I assume that, for now at least, there is nothing that can be done on the R or ellmer side of things and the only option is the workaround in Ollama itself. That's fine. In that case, I assume this issue can be closed.
Yeah, I'll probably close this issue with a note in the documentation.
(If this issue is off-topic and is about Ollama's API rather than about ellmer per se, just let me know.)

I was successfully using `chat_ollama` and getting satisfactory results with a setup where `instructions` was around 200 words and `text` was about 20 words. Comparing the results to those using `chat_openai` (with GPT 4o), with nothing else being changed, the responses were quite similar in quality.

However, when I tried another problem where the system prompt `instructions` was increased to around 1500 words and the `text` was increased to around 2000, things were more or less disastrous with `chat_ollama`. It appeared as if llama was ignoring all but the last 500 or so words of `text`, and seemingly ignoring `instructions` too, so the response was completely unusable. If I switch to `chat_openai` instead of `chat_ollama`, changing nothing else, the results are about as good as could be expected.

I don't think the problem here is that GPT 4o is just so much better than llama 3.3; as stated, they were equally satisfactory before the prompts got long. Rather, I assume there is some setting I need to change to tell llama to read all of the prompts' contents; I just don't know what that might be. I tried changing `num_ctx` after seeing some discussion about that setting, but that made no difference, nor did increasing `num_ctx` (much) higher.

Am I doing something wrong? Do I set Ollama settings with `api_args`, as in the previous example? Does anyone know what settings I need to change?

For context, I'm using `ellmer` version 0.1.0.9000, and I have a relatively good GPU (an RTX A6000), 64 cores, and 500 GB of RAM, so hardware is not the problem.