-
Notifications
You must be signed in to change notification settings - Fork 39
feat: add conversation id to HRI message #480
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
def from_langchain( | ||
cls, | ||
message: LangchainBaseMessage | RAIMultimodalMessage, | ||
conversation_id: Optional[str] = None, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Both RAIMultimodalMessage and LangchainBaseMessage have id field. How about using them instead of new argument?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
From langchain documenation: https://python.langchain.com/api_reference/core/messages/langchain_core.messages.base.BaseMessage.html#langchain_core.messages.base.BaseMessage.id - the Langchain id field is unique per message. As discussed conversation_id
should not be assumed to be unique per message. Langchain id therefore cannot be used. RAIMultimodalMessage uses the same id as the aforementioned one.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also note, that in the documenation it says "This should ideally be provided by the provider/model which created the message." - it is therefore not guaranteed to be provided by every provider/model, and adds unnecessary layers of complexity, especially given that logicallly message id is not the conversation id -- e.g. multiple HRI messages are published on /to_human
with the same id, during response to a single prompt.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In that case, I think we should consider adding chunk_id and message_id to the HRI message instead. This way it would be semantically complete for both streaming and standard use case. What do you think?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
adding chunk_id and message_id to the HRI message instead
What is the difference between these two? And how would their pair be used instead of conversation id?
This way it would be semantically complete for both streaming and standard use case.
I do not understand what you mean by "semantically complete".
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What is the difference between these two? And how would their pair be used instead of conversation id?
message_id refers to the unique identifier for a specific message (previously referred to as conversation_id, but renamed for clarity, as "conversation" was unnecessarily specific).
chunk_id identifies a specific part of a message.
In streaming use cases, chunk_id is necessary to track individual parts of a message as they are received.
In standard (non-streaming) use cases, chunk_id can be set to None, indicating that the message was received in full.
I do not understand what you mean by "semantically complete".
This refers to whether the message (or chunk) contains all the necessary information to be considered complete in meaning—i.e., whether it represents a full, coherent message or only a partial one.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As discussed outside of Git, message_id naturally corresponds to the primary key of the message, meaning it must be unique. Other possible names for message_id include conversation_id, communication_id, etc.
The rest of the comment remains valid.
Also, a bool is_stream has been proposed outside of git.
I'd rather use two ids instead.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
To summarise for clarity:
- there will not be
message_id
, as there is apparently no need for this information to be included - there will be
communication_id
which identifies a single instance of communication (request/response) wether streaming or not streaming - there will be
chunk_id
which will be set toNone
in non-streaming cases, and otherwise will identify specific chunks of the message
Then I have a question, as it seems to me that there is no use case for a chunk_id
not being a boolean, unless additional data is provided: i.e. either a (creation) timestamp is added to the message or chunk_id
s are are sequential. Both of these options would allow to recreate the stream in case massages arrive in non-ordered sequence, as may be the case with some creation protocols.
Otherwise if chunk_id
is not to be sequential, and a timestamp is not to be provided, I see no rationale behind using and id instead of a boolean to identify streaming. If that's the case, it would be helpful if you could provide one.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
To re-summarize:
communication_id
stays, defined as aboveseq_no
will be added, providing a sequential id for chunks (starts at 0). This is the same for streaming and non streaming messages (single message is by definition 0th message in the sequence)is_done
- boolean field, to signify whether communication is finished or not will be added
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes
a7e2ce2
to
ac729d9
Compare
…again before new speech is generated
9f88c47
to
fb12a21
Compare
Purpose
It is useful to have conversation id assigned when the message is passed between different agents
Proposed Changes
Adds conversation id to
HRIMessage
Makes it so that ASR agent "follows" a single conversation, so delayed messages from different conversations don't affect runtime
Makes TTS and conversational agents handle the conversation id
Issues
N/A
Testing
tests pass