Flexible IO proposal #1161

theomonnom · 2024-12-02T15:28:30Z

No description provided.

changeset-bot · 2024-12-02T15:28:33Z

⚠️ No Changeset found

Latest commit: 678208b

Merging this PR will not cause a version bump for any packages. If these changes should not result in a new version, you're good to go. If these changes should result in a version bump, you need to add a changeset.

This PR includes no changesets

When changesets are added to this PR, you'll see the packages that this PR includes changesets for and the associated semver types


rektdeckard · 2024-12-03T01:06:11Z

proposal.md

+class PipelineIO(ABC):
+
+    def before_stt_node(self, source: AsyncIterator[rtc.AudioFrame]) -> AsyncIterator[rtc.AudioFrame]:
+        return source
+
+    def after_stt_node(self, source: AsyncIterator[SpeechEvent]) -> AsyncIterator[SpeechEvent]:
+        return source
+
+    def before_llm_node(self, chat_ctx: ChatContext) -> AsyncIterator[ChatChunk] | None:
+        return None
+
+    def after_llm_node(self, source: AsyncIterator[ChatChunk]) -> AsyncIterator[ChatChunk]:
+        return source
+
+    def before_tts_node(self, source: AsyncIterator[str]) -> AsyncIterator[rtc.AudioFrame] | None:
+        return source
+
+    def after_tts_node(self, source: AsyncIterator[rtc.AudioFrame]) -> AsyncIterator[rtc.AudioFrame]:
+        return source


Not my wheelhouse, but is this in any way more convenient or idiomatic than if each pipeline stage managed its own pre/post transform callbacks? E.G.:

    def passthrough_audio(source: AsyncIterator[rtc.AudioFrame]) -> AsyncIterator[rtc.AudioFrame]:
        return source

    def filter_swearwords(source: AsyncIterator[SpeechEvent]) -> AsyncIterator[SpeechEvent]:
        return source

    agent = PipelineAgent(
        stt=STT(pre=passthrough_audio, post=filter_swearwords)
        ...
    )

theomonnom · 2024-12-03

The scope is different. Ideally, we can add new parameters such as speech_id to each step (before_tts_node, ...).
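To make the comparison concrete, here is a minimal sketch of the hook-object style under discussion. It assumes a stand-in `SpeechEvent` with a single `text` field and a `PipelineIO` reduced to one hook; `CensoringIO`, `fake_stt`, and the word list are hypothetical names for illustration, not part of the proposal.

```python
import asyncio
from dataclasses import dataclass
from typing import AsyncIterator


@dataclass
class SpeechEvent:
    # Stand-in for the real SpeechEvent; only a text field is assumed here.
    text: str


class PipelineIO:
    # Reduced to the one hook this sketch needs; the default is a pass-through.
    def after_stt_node(self, source: AsyncIterator[SpeechEvent]) -> AsyncIterator[SpeechEvent]:
        return source


class CensoringIO(PipelineIO):
    BLOCKED = {"darn"}  # hypothetical word list

    async def after_stt_node(self, source: AsyncIterator[SpeechEvent]) -> AsyncIterator[SpeechEvent]:
        # Async generator: re-emits each event with blocked words masked.
        async for event in source:
            words = ["***" if w.lower() in self.BLOCKED else w for w in event.text.split()]
            yield SpeechEvent(" ".join(words))


async def fake_stt() -> AsyncIterator[SpeechEvent]:
    # Hypothetical STT output used only to drive the hook.
    for text in ("well darn", "hello there"):
        yield SpeechEvent(text)


async def collect(stream: AsyncIterator[SpeechEvent]) -> list:
    return [event.text async for event in stream]


def run_demo() -> list:
    return asyncio.run(collect(CensoringIO().after_stt_node(fake_stt())))
```

Because all hooks live on one object, a later revision could pass extra per-step arguments to every hook without changing each stage's constructor; that is one reading of the "scope" remark above, not a confirmed design.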

bcherry · 2024-12-03T17:08:01Z

proposal.md

+class TextOutput(Protocol):
+    async def write(self, text: str) -> None: ...
+
+    def flush(self) -> None: ...


how should frontend applications reason about "turns" with these two output types? is that what "flush" means? UI will likely want to render each complete "message" in a chat bubble, for instance. maybe having a unique id somewhere could help?
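One way the question could be answered is sketched below, assuming `flush()` marks the end of a message and that each message carries a generated id. `ChatBubbleOutput` and its `messages` list are hypothetical; the proposal itself does not define turn semantics.

```python
import asyncio
import uuid
from typing import Protocol


class TextOutput(Protocol):
    async def write(self, text: str) -> None: ...

    def flush(self) -> None: ...


class ChatBubbleOutput:
    """Hypothetical TextOutput that treats flush() as the end of one message."""

    def __init__(self) -> None:
        self.messages = []  # list of (message_id, full_text) pairs
        self._id = uuid.uuid4().hex
        self._parts = []

    async def write(self, text: str) -> None:
        # Chunks written between two flushes belong to the same message id.
        self._parts.append(text)

    def flush(self) -> None:
        # Close the current message (if any) and start a new id.
        if self._parts:
            self.messages.append((self._id, "".join(self._parts)))
        self._id = uuid.uuid4().hex
        self._parts = []


async def demo() -> list:
    out = ChatBubbleOutput()
    for chunk in ("Hel", "lo"):
        await out.write(chunk)
    out.flush()
    await out.write("Bye")
    out.flush()
    return out.messages
```

A frontend could then key each chat bubble on the message id and append chunks to it until the next flush.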

bcherry · 2024-12-03T17:14:51Z

proposal.md

+STT -> LLM -> TTS
+
+```python
+AudioInput = AsyncIterator[rtc.AudioFrame | rtc.AudioFrameEvent]


not sure about the scope of what you're working on, but where would text input/image/file "chat" input fit in?

bcherry · 2024-12-03T17:26:39Z

proposal.md

+    def clear_queue(self) -> None: ...
+
+
+class TextOutput(Protocol):


In addition to text and audio output that are essentially "talking" or "chat", many applications "output" other structured things either by returning images or through function calls (i.e. JSON output). do we have any thoughts about whether it would make sense to provide an affordance for that in pipelineagent?

bcherry · 2024-12-03T17:36:50Z

proposal.md

+        return source
+
+    def before_tts_node(self, source: AsyncIterator[str] | str) -> AsyncIterator[rtc.AudioFrame] | None:
+        return source


shouldn't this method return AsyncIterator[str] | str?

bcherry · 2024-12-03T17:39:52Z

proposal.md

+
+class PipelineIO(ABC):
+
+    def before_stt_node(self, source: AsyncIterator[rtc.AudioFrame]) -> AsyncIterator[rtc.AudioFrame]:


shouldn't these methods all be async?
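For context on this question: in Python, a plain `def` that returns an async iterator and an `async def` that contains `yield` (an async generator function) are both called without `await`, so either form can satisfy these signatures. The sketch below uses hypothetical `passthrough` and `doubled` hooks over integers to show this; it takes no position on which style the proposal should adopt.

```python
import asyncio
import inspect
from typing import AsyncIterator


def passthrough(source: AsyncIterator[int]) -> AsyncIterator[int]:
    # Plain def: hands the same iterator back, nothing to await.
    return source


async def doubled(source: AsyncIterator[int]) -> AsyncIterator[int]:
    # async def + yield is an async generator function; calling it
    # returns an async iterator immediately, also without await.
    async for item in source:
        yield item * 2


async def numbers() -> AsyncIterator[int]:
    for n in (1, 2, 3):
        yield n


async def main() -> list:
    results = []
    for hook in (passthrough, doubled):
        stream = hook(numbers())  # no await in either case
        results.append([x async for x in stream])
    return results
```

An `async def` hook without `yield` would be different: calling it returns a coroutine that must be awaited before the iterator is available.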

bcherry · 2024-12-03T17:43:06Z

proposal.md

+        return source
+
+    def before_llm_node(self, chat_ctx: ChatContext) -> AsyncIterator[ChatChunk] | None:
+        return None


why is this one different than the others? it feels odd that it doesn't have the same return type as input type, and the default implementation returns None, which implies it's actually very semantically different than the other methods, which are all open-ended hooks to transform data in the pipeline or add logging or other side effects.
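One possible reading of the signature, not confirmed anywhere in this thread, is that `before_llm_node` is an override point: returning `None` means "run the default LLM", and returning a stream replaces it. The sketch below illustrates that reading with stand-in types; `llm_stage`, `default_llm`, and `CannedReplyIO` are hypothetical.

```python
import asyncio
from typing import AsyncIterator, List, Optional

ChatContext = List[str]  # stand-in: a list of message strings
ChatChunk = str          # stand-in for the real chunk type


class PipelineIO:
    def before_llm_node(self, chat_ctx: ChatContext) -> Optional[AsyncIterator[ChatChunk]]:
        return None  # assumed meaning: no override, use the default LLM


class CannedReplyIO(PipelineIO):
    def before_llm_node(self, chat_ctx: ChatContext) -> Optional[AsyncIterator[ChatChunk]]:
        # Short-circuit the LLM for one known input, otherwise defer.
        if chat_ctx and chat_ctx[-1] == "ping":
            return self._canned()
        return None

    async def _canned(self) -> AsyncIterator[ChatChunk]:
        yield "pong"


async def default_llm(chat_ctx: ChatContext) -> AsyncIterator[ChatChunk]:
    # Hypothetical default LLM call.
    yield f"llm reply to {chat_ctx[-1]!r}"


def llm_stage(io: PipelineIO, chat_ctx: ChatContext) -> AsyncIterator[ChatChunk]:
    override = io.before_llm_node(chat_ctx)
    return override if override is not None else default_llm(chat_ctx)


async def run(io: PipelineIO, chat_ctx: ChatContext) -> list:
    return [chunk async for chunk in llm_stage(io, chat_ctx)]
```

Under this reading the hook is a replacement rather than a transform, which would explain the asymmetry the comment points out.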

bcherry · 2024-12-03T18:07:41Z

proposal.md

+    def flush(self) -> None: ...
+
+
+class PipelineIO(ABC):


this name feels a little odd, given that in addition to PipelineIO we also have PipelineOutput (and maybe some forthcoming Input protocol too), but the "IO" in PipelineIO is not related to the pipeline's input or output itself... might be better as PipelineHooks

Create proposal.md

feb4457

theomonnom changed the title ~~Create proposal.md~~ Flexible IO proposal Dec 2, 2024

rektdeckard reviewed Dec 3, 2024


theomonnom and others added 2 commits December 3, 2024 13:50

Update proposal.md

432efe4

Update proposal.md

678208b

bcherry reviewed Dec 3, 2024

