Confirm this is a feature request for the Python library and not the underlying OpenAI API.
- This is a feature request for the Python library
Describe the feature or improvement you're requesting
After streaming a chat completion response, it is often necessary to recombine the streamed chunks into a message. Two examples:
- in the https://github.com/pydantic/logfire observability platform, where the final Assistant message could be displayed nicely in the UI once a streamed response has ended.
- in https://github.com/jackmpcollins/magentic, where parallel tool calls are streamed so they can be invoked during generation, and inserting their outputs back into `messages` also requires creating an Assistant message from the streamed chunks (see the sketch after this list).
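
For the magentic case, a minimal sketch of what that recombination could look like, assuming `completion` is the recombined `ChatCompletion` (e.g. from `get_final_completion()`, shown below) and `call_tool` is a hypothetical helper that executes a single tool call:

```python
from openai.types.chat import ChatCompletion

def insert_streamed_response(messages: list[dict], completion: ChatCompletion) -> None:
    # The recombined Assistant message, including any streamed tool calls
    assistant_message = completion.choices[0].message
    messages.append(assistant_message.model_dump(exclude_unset=True))
    # Run each tool call and insert its output back into the conversation
    for tool_call in assistant_message.tool_calls or []:
        messages.append(
            {
                "role": "tool",
                "tool_call_id": tool_call.id,
                "content": call_tool(tool_call),  # hypothetical dispatcher
            }
        )
```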
Currently the internal class `ChatCompletionStreamState` makes this easy, but it is private, which indicates it should not be relied on. Would it be possible to make this or similar functionality a supported part of the public API?
The current feature set of `ChatCompletionStreamState` is ideal:
- get a `ChatCompletion` at any point during the stream (`current_completion_snapshot`). This allows logging a partial stream response in case of error, including if `max_tokens` was reached.
- parse the chunks into the correct pydantic BaseModels for the tools/response_format (`get_final_completion()`).
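
To make both bullets concrete, a hedged sketch combining them, assuming a hypothetical `Weather` pydantic model as the response_format and eliding the request arguments as in the example below:

```python
import openai
import pydantic
from openai.lib.streaming.chat._completions import ChatCompletionStreamState

class Weather(pydantic.BaseModel):  # hypothetical response_format model
    city: str
    temperature_celsius: float

client = openai.OpenAI()
state = ChatCompletionStreamState(
    input_tools=openai.NOT_GIVEN,
    response_format=Weather,
)
try:
    # Placeholder request: it must set stream=True and a response_format
    # matching Weather for the parsing below to succeed.
    for chunk in client.chat.completions.create(...):
        state.handle_chunk(chunk)
except Exception:
    # Bullet 1: a partial ChatCompletion is still available on error
    # (if any chunks were received before the failure)
    print(state.current_completion_snapshot)
    raise

# Bullet 2: the chunks are parsed into the response_format pydantic model
weather = state.get_final_completion().choices[0].message.parsed
```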
Example usage of the existing class
```python
import openai
from openai.lib.streaming.chat._completions import ChatCompletionStreamState

client = openai.OpenAI()
state = ChatCompletionStreamState(
    input_tools=openai.NOT_GIVEN,
    response_format=openai.NOT_GIVEN,
)
response = client.chat.completions.create(...)  # assumes stream=True so chunks are yielded
for chunk in response:
    state.handle_chunk(chunk)
    print(state.current_completion_snapshot)  # partial ChatCompletion built so far

print(state.get_final_completion())
```
Additional context
No response