I mean the Generation settings area for the model, such as the context size and so on. Can retrieval quality be improved by tuning those?
Here are some assumptions. I would expect that the larger the document split (chunk) size is, the better the model will understand each excerpt. But the retrieved chunks all have to fit in the context window, so setting the chunk size as high as the model allows should benefit retrieval, right? A rough token-budget sketch follows below.
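To make the trade-off concrete, here's a back-of-the-envelope calculation. All the numbers are made-up assumptions for illustration, not values from any particular model or app; the point is that chunk size can't simply be "as high as possible", because the prompt template, the question, and the reply also consume the window.

```python
# Illustrative token budget for a single RAG turn.
# Every number below is an assumption, not a real model's setting.
CONTEXT_WINDOW   = 4096  # total tokens the model can see at once
SYSTEM_PROMPT    = 200   # instructions + RAG prompt template
QUESTION         = 50    # the user's question
ANSWER_RESERVE   = 512   # room left for the model's reply
RETRIEVED_CHUNKS = 3     # how many excerpts the retriever injects

budget_for_chunks = CONTEXT_WINDOW - SYSTEM_PROMPT - QUESTION - ANSWER_RESERVE
max_chunk_tokens  = budget_for_chunks // RETRIEVED_CHUNKS

print(f"Tokens available for excerpts: {budget_for_chunks}")   # 3334
print(f"Max chunk size with {RETRIEVED_CHUNKS} chunks: {max_chunk_tokens}")  # 1111
```

So with these assumed numbers, pushing the chunk size past ~1100 tokens would force the retriever to inject fewer chunks, or overflow the window entirely.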
On the other hand, in my experience the chat memory of previous messages often does more harm than good for RAG, and that also depends on the context size. For example, I made a list of questions about the topic of my document to quickly assess each model's proficiency, but asking them all in succession confuses the bot, since they concern different parts of the original document and seem random and unrelated to each other. Some models even directly ask me whether I want to continue discussing the previous question or switch topics.
Is there a way to reduce or disable chat memory for the purposes of RAG?
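What I'm picturing is something like the following, where each question is answered from a fresh prompt containing only the retrieved excerpts, with no accumulated conversation. This is just a hypothetical sketch; `retrieve` and `llm_complete` are placeholders, not any real API.

```python
def retrieve(question: str) -> list[str]:
    """Placeholder: return the top document chunks for this question."""
    return ["<excerpt 1>", "<excerpt 2>"]

def llm_complete(prompt: str) -> str:
    """Placeholder for whatever backend actually generates the answer."""
    return "<answer>"

def answer_statelessly(question: str) -> str:
    context = "\n\n".join(retrieve(question))
    # No chat history is included in the prompt, so earlier,
    # unrelated questions can't confuse the model.
    prompt = (
        "Answer strictly from the context below.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}\nAnswer:"
    )
    return llm_complete(prompt)

# Each test question gets an independent, history-free prompt.
for q in ["question about section 2", "question about the appendix"]:
    print(answer_statelessly(q))
```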
Thirdly, I think temperature should be kept at 0 to make sure the LLM only uses the provided context to answer questions, instead of its imagination. However, it's hard to notice the effect of this setting: some models continue to hallucinate even at 0, while others fail to come up with anything at high settings.
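For what it's worth, here is a toy illustration of what temperature actually does: sampling probabilities come from softmax(logits / T), so as T approaches 0 the top token takes all the probability mass and decoding becomes effectively greedy. The logits below are made up.

```python
import math

def softmax_with_temperature(logits, t):
    scaled = [x / t for x in logits]
    m = max(scaled)                       # subtract max for numerical stability
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]

logits = [2.0, 1.5, 0.5]                  # toy next-token scores
for t in (1.0, 0.5, 0.05):
    probs = softmax_with_temperature(logits, t)
    print(t, [round(p, 3) for p in probs])
# T=1.0  -> [0.547, 0.331, 0.122]  (fairly flat)
# T=0.05 -> [1.0, 0.0, 0.0]        (effectively argmax)
```

Which would explain what I'm seeing: temperature only changes how tokens are sampled, so it can't stop a model whose most likely token is already wrong, and hallucination survives T=0.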
Are there any other settings that affect RAGging that I should know about?