feat: add conversation id to HRI message #480

Merged
maciejmajek merged 14 commits into development from feat/add-id-hrimessage
Apr 1, 2025

Conversation

rachwalk
Contributor

Purpose

It is useful to have a conversation id assigned when a message is passed between different agents.

Proposed Changes

Adds a conversation id to HRIMessage
Makes the ASR agent "follow" a single conversation, so that delayed messages from other conversations don't affect runtime
Makes the TTS and conversational agents handle the conversation id
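
The "follow a single conversation" behavior can be sketched roughly as below. This is only an illustration under stated assumptions: `Message` and `ConversationFollower` are hypothetical names, not the classes changed in this PR, and the real ASR agent may decide which conversation to follow differently.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Message:
    # Hypothetical stand-in for HRIMessage; only the fields needed here.
    text: str
    conversation_id: Optional[str] = None


class ConversationFollower:
    """Keeps only messages that belong to the conversation currently followed."""

    def __init__(self) -> None:
        self.current_id: Optional[str] = None
        self.accepted: List[str] = []

    def start(self, conversation_id: str) -> None:
        # Begin following a new conversation; older ones are ignored from now on.
        self.current_id = conversation_id

    def on_message(self, msg: Message) -> bool:
        # Drop delayed messages that carry a different conversation id.
        if msg.conversation_id != self.current_id:
            return False
        self.accepted.append(msg.text)
        return True
```

With this shape, a late reply from an earlier conversation is rejected by `on_message` instead of affecting the current one.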

Issues

N/A

Testing

tests pass

@rachwalk rachwalk requested a review from maciejmajek March 25, 2025 15:38
Comment on lines 118 to 121
def from_langchain(
    cls,
    message: LangchainBaseMessage | RAIMultimodalMessage,
    conversation_id: Optional[str] = None,
Member

Both RAIMultimodalMessage and LangchainBaseMessage have id field. How about using them instead of new argument?

Contributor Author

From the Langchain documentation: https://python.langchain.com/api_reference/core/messages/langchain_core.messages.base.BaseMessage.html#langchain_core.messages.base.BaseMessage.id - the Langchain id field is unique per message. As discussed, conversation_id should not be assumed to be unique per message, so the Langchain id cannot be used. RAIMultimodalMessage uses the same id as the aforementioned one.

Contributor Author

@rachwalk rachwalk Mar 27, 2025

Also note that the documentation says "This should ideally be provided by the provider/model which created the message." It is therefore not guaranteed to be provided by every provider/model, and relying on it adds unnecessary layers of complexity, especially given that logically the message id is not the conversation id: e.g. multiple HRI messages are published on /to_human with the same conversation id during the response to a single prompt.

Member

In that case, I think we should consider adding chunk_id and message_id to the HRI message instead. This way it would be semantically complete for both streaming and standard use case. What do you think?

Contributor Author

> adding chunk_id and message_id to the HRI message instead

What is the difference between these two? And how would their pair be used instead of conversation id?

> This way it would be semantically complete for both streaming and standard use case.

I do not understand what you mean by "semantically complete".

Member

> What is the difference between these two? And how would their pair be used instead of conversation id?

message_id refers to the unique identifier for a specific message (previously referred to as conversation_id, but renamed for clarity, as "conversation" was unnecessarily specific).
chunk_id identifies a specific part of a message.

In streaming use cases, chunk_id is necessary to track individual parts of a message as they are received.
In standard (non-streaming) use cases, chunk_id can be set to None, indicating that the message was received in full.

> I do not understand what you mean by "semantically complete".

This refers to whether the message (or chunk) contains all the necessary information to be considered complete in meaning—i.e., whether it represents a full, coherent message or only a partial one.
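
As a rough illustration of the two-id proposal described above, the pair could look like the sketch below. `ProposedIds` is a hypothetical name, and the `int` type for `chunk_id` is an assumption; the comment only states that `chunk_id` is `None` when the message was received in full.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ProposedIds:
    # Hypothetical container for the two ids proposed in this comment.
    message_id: str
    # None means the message was received in full (non-streaming).
    # The int type is an assumption made for this sketch.
    chunk_id: Optional[int] = None

    @property
    def is_streamed(self) -> bool:
        # Compare against None explicitly so that chunk 0 still counts as a chunk.
        return self.chunk_id is not None
```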

Member

As discussed outside of Git, message_id naturally corresponds to the primary key of the message, meaning it must be unique. Other possible names for message_id include conversation_id, communication_id, etc.
The rest of the comment remains valid.

Also, a boolean is_stream has been proposed outside of Git.
I'd rather use two ids instead.

Contributor Author

To summarise for clarity:

  • there will not be message_id, as there is apparently no need for this information to be included
  • there will be communication_id which identifies a single instance of communication (request/response), whether streaming or not streaming
  • there will be chunk_id which will be set to None in non-streaming cases, and otherwise will identify specific chunks of the message

Then I have a question, as it seems to me that there is no use case for chunk_id being anything other than a boolean unless additional data is provided: i.e. either a (creation) timestamp is added to the message or chunk_ids are sequential. Both of these options would allow the stream to be recreated in case messages arrive out of order, as may be the case with some creation protocols.

Otherwise, if chunk_id is not to be sequential and a timestamp is not to be provided, I see no rationale for using an id instead of a boolean to identify streaming. If that's the case, it would be helpful if you could provide one.
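
To illustrate why sequential chunk ids matter here, a minimal sketch of rebuilding a stream whose chunks arrived out of order. `reassemble` is a hypothetical helper, and representing each chunk as a `(chunk_id, text)` pair is an assumption made only for this example.

```python
from typing import Iterable, Tuple


def reassemble(chunks: Iterable[Tuple[int, str]]) -> str:
    """Join (chunk_id, text) pairs in chunk_id order, whatever the arrival order."""
    ordered = sorted(chunks, key=lambda chunk: chunk[0])
    return "".join(text for _, text in ordered)
```

With a plain boolean in place of the id, the pairs above would carry no ordering information and the sort would not be possible.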

Contributor Author

@rachwalk rachwalk Mar 31, 2025

To re-summarize:

  • communication_id stays, defined as above
  • seq_no will be added, providing a sequential id for chunks (starting at 0). It works the same way for streaming and non-streaming messages (a single message is by definition the 0th message in the sequence)
  • is_done will be added: a boolean field signifying whether the communication is finished
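
A consumer of the three agreed fields might look roughly like the sketch below. `Chunk` and `StreamCollector` are hypothetical names, not the merged HRIMessage implementation, and the sketch assumes that `is_done` is set on the chunk with the highest `seq_no` of a communication.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class Chunk:
    # Hypothetical stand-in for an HRI message carrying the three agreed fields.
    text: str
    communication_id: str
    seq_no: int = 0        # a non-streaming message is the 0th chunk
    is_done: bool = True   # assumption: set on the last chunk of a communication


class StreamCollector:
    """Buffers chunks per communication_id and returns the text once complete."""

    def __init__(self) -> None:
        self._parts: Dict[str, Dict[int, str]] = {}
        self._last: Dict[str, int] = {}

    def add(self, chunk: Chunk) -> Optional[str]:
        parts = self._parts.setdefault(chunk.communication_id, {})
        parts[chunk.seq_no] = chunk.text
        if chunk.is_done:
            # Remember which seq_no closes this communication.
            self._last[chunk.communication_id] = chunk.seq_no
        last = self._last.get(chunk.communication_id)
        if last is None or not all(i in parts for i in range(last + 1)):
            return None  # still waiting for the final or a missing chunk
        del self._parts[chunk.communication_id]
        del self._last[chunk.communication_id]
        return "".join(parts[i] for i in range(last + 1))
```

Because the defaults are `seq_no=0` and `is_done=True`, a non-streaming message completes on its first and only chunk, which matches the "0th message in the sequence" definition above.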

Member

Yes

@rachwalk rachwalk force-pushed the feat/add-id-hrimessage branch from a7e2ce2 to ac729d9 Compare March 31, 2025 14:08
@rachwalk rachwalk force-pushed the feat/add-id-hrimessage branch from 9f88c47 to fb12a21 Compare April 1, 2025 10:40
@maciejmajek maciejmajek self-requested a review April 1, 2025 13:11
@maciejmajek maciejmajek merged commit 2a2c613 into development Apr 1, 2025
5 checks passed
@maciejmajek maciejmajek deleted the feat/add-id-hrimessage branch April 1, 2025 13:12
2 participants