What should be a good approach to achieve the context based discussion keeping token limit in mind? #344
naveengujjar29 started this conversation in General
Replies: 1 comment
There are two approaches I have found. The first is to pass only the last K conversation turns, keeping the model's context window size in mind. The second is to summarise the earlier discussion, compressing the information that gets passed along so the historical context of the conversation is still tracked.
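The two approaches above can be sketched in plain Java (no Spring AI dependency; the class and method names here are illustrative, and the summariser is a trivial placeholder for what would normally be an LLM call):

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative sketch of the two history-management strategies.
public class ChatHistory {

    // Approach 1: keep only the last K turns of the conversation.
    public static List<String> lastKTurns(List<String> turns, int k) {
        int from = Math.max(0, turns.size() - k);
        return new ArrayList<>(turns.subList(from, turns.size()));
    }

    // Approach 2: compress older turns into one summary message and keep
    // only the most recent turns verbatim. The concatenation below is a
    // placeholder; in practice the summary would come from the LLM itself.
    public static List<String> summarizedHistory(List<String> turns, int keepRecent) {
        if (turns.size() <= keepRecent) {
            return new ArrayList<>(turns);
        }
        List<String> older = turns.subList(0, turns.size() - keepRecent);
        String summary = "Summary of " + older.size() + " earlier turns: "
                + String.join(" / ", older); // placeholder summarisation
        List<String> result = new ArrayList<>();
        result.add(summary);
        result.addAll(turns.subList(turns.size() - keepRecent, turns.size()));
        return result;
    }
}
```

Either strategy bounds the prompt size: the window approach is simpler, while summarisation retains more long-range context at the cost of an extra model call.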
I have used spring-ai for RAG on custom data with the Pinecone VectorStore.
In my current implementation, I perform a VectorStore similarity search on the user's input and pass that context along in the UserMessage, but after some time this causes a "token limit exceeded" error.
What is the general practice for such a scenario? How should the chat context be maintained?
Below is the code I currently use, for reference.
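Since the original code is not reproduced here, a minimal sketch of one common mitigation: cap the retrieved documents to a fixed token budget before placing them in the UserMessage. The names below are hypothetical, and token counts are approximated at roughly four characters per token rather than computed with a real tokenizer:

```java
import java.util.List;

// Hypothetical helper: fit as many retrieved documents as possible
// into a fixed token budget before building the prompt.
public class ContextBudget {

    // Rough heuristic: ~4 characters per token for English text.
    static int approxTokens(String text) {
        return (text.length() + 3) / 4;
    }

    // Documents are assumed ordered by similarity (best first);
    // stop adding once the budget would be exceeded.
    public static String fitToBudget(List<String> docs, int maxTokens) {
        StringBuilder context = new StringBuilder();
        int used = 0;
        for (String doc : docs) {
            int cost = approxTokens(doc);
            if (used + cost > maxTokens) break;
            context.append(doc).append("\n");
            used += cost;
        }
        return context.toString();
    }
}
```

Combined with one of the history strategies from the reply above (a last-K window or a rolling summary), this keeps the total prompt under the model's limit even as the conversation grows.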