
feat: Adding watsonx support in Haystack #1949


Open
divyaruhil wants to merge 16 commits into main

Conversation

divyaruhil commented Jun 15, 2025

Related Issues

Proposed Changes:

Add WatsonxGenerator and WatsonxChatGenerator components to wrap IBM watsonx.ai’s text- and chat-generation APIs (supporting streaming, custom models, and generation parameters).

Add WatsonxTextEmbedder and WatsonxDocumentEmbedder components that support embedding use cases for both text and documents.

Ensure all components follow the existing Haystack interfaces and patterns for Generator and Embedder.

Include test files accordingly.
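
For orientation, here is a minimal usage sketch of how these components could be invoked; the import paths, parameter names (e.g. api_key), and return keys shown here are illustrative assumptions, and the authoritative signatures are the ones defined in this PR:

    # Hypothetical import paths; the real module layout is defined by this integration.
    from haystack.utils import Secret
    from haystack_integrations.components.generators.watsonx import WatsonxGenerator
    from haystack_integrations.components.embedders.watsonx import WatsonxTextEmbedder

    # Text generation (parameter names assumed for illustration).
    generator = WatsonxGenerator(api_key=Secret.from_env_var("WATSONX_API_KEY"))
    result = generator.run(prompt="Summarize watsonx.ai in one sentence.")
    print(result["replies"][0])

    # Text embedding (default model taken from the review discussion below).
    embedder = WatsonxTextEmbedder(model="ibm/slate-30m-english-rtrvr")
    embedding = embedder.run(text="IBM watsonx.ai")["embedding"]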

How did you test it?

  • Created unit and integration test files for the new WatsonxGenerator, WatsonxTextEmbedder, and WatsonxDocumentEmbedder components.
  • Verified functionality by running the test suite to ensure all components behave as expected.
  • Manually tested by importing the components into a Jupyter Notebook and running example queries to confirm that:
    - Text generation works with WatsonxGenerator
    - Chat generation works with WatsonxChatGenerator
    - Embeddings are correctly generated for both text and documents using the respective embedder components

Checklist

@divyaruhil divyaruhil requested a review from a team as a code owner June 15, 2025 20:10
@divyaruhil divyaruhil requested review from Amnah199 and removed request for a team June 15, 2025 20:10
CLAassistant commented Jun 15, 2025

CLA assistant check
All committers have signed the CLA.

@github-actions github-actions bot added the type:documentation Improvements or additions to documentation label Jun 15, 2025
sjrl (Contributor) commented Jun 16, 2025

Hey @divyaruhil thanks for working on this integration! Here are some initial comments I have before doing an in-depth review:

[Screenshot of initial review comments, taken 2025-06-16 at 11:46]

divyaruhil (Author) commented

Thank you @sjrl for reviewing! Sure, I'll make the requested changes.

sjrl (Contributor) commented Jun 23, 2025

@divyaruhil thanks for making the changes so far!

I'll also go ahead and do a deeper-dive review of the actual components themselves this week.

@sjrl sjrl removed the request for review from Amnah199 June 23, 2025 13:41
sjrl (Contributor) commented Jun 24, 2025

Hey @divyaruhil I realize there are quite a few comments. I'd also be happy to push the changes myself if you are willing to give me write access to your branch. Let me know!

divyaruhil (Author) commented

Hi @sjrl, thank you so much for reviewing! I know it’s quite a large PR, and I really appreciate your time. This is my first big contribution, so I’m still learning a lot as I go — which is probably why there are quite a few mistakes. I’ve already enabled “Allow edits by maintainers,” so feel free to push any changes directly to the branch!

divyaruhil (Author) commented

  • Update the repo-level README.md (can find here) with a row for watsonx-haystack

Also, @sjrl can you please help me with this part? I’m not entirely sure how to go about it and could use a bit of guidance.

sjrl (Contributor) commented Jun 24, 2025

> • Update the repo-level README.md (can find here) with a row for watsonx-haystack
>
> Also, @sjrl can you please help me with this part? I'm not entirely sure how to go about it and could use a bit of guidance.

Yeah for sure! Add a new line for this integration here https://github.com/divyaruhil/haystack-core-integrations/blob/45a180bd2071550b222e35939903d3617b94036f/README.md?plain=1#L62

I'd suggest using the Bedrock row (near the top) as an example to follow.

Comment on lines +46 to +47
self,
model: str = "ibm/slate-30m-english-rtrvr",

Repeating this just in case it was missed.

For all components, let's enforce keyword-only arguments by making the following change:

Suggested change
Before:
self,
model: str = "ibm/slate-30m-english-rtrvr",
After:
self,
*,
model: str = "ibm/slate-30m-english-rtrvr",
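
For illustration, a minimal sketch of what the bare * does; the surrounding class body is hypothetical and only the parameter names mirror the suggestion:

    class WatsonxTextEmbedder:
        def __init__(
            self,
            *,  # every parameter after this marker must be passed by keyword
            model: str = "ibm/slate-30m-english-rtrvr",
        ):
            self.model = model

    WatsonxTextEmbedder(model="ibm/slate-30m-english-rtrvr")   # OK
    # WatsonxTextEmbedder("ibm/slate-30m-english-rtrvr")       # TypeError: positional argument not allowed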

deserialize_secrets_inplace(data["init_parameters"], keys=["api_key"])
return default_from_dict(cls, data)

@component.output_types(replies=list[str], meta=list[dict[str, Any]], chunks=list[StreamingChunk])

Ahh, we don't typically return the chunks as part of the response. We usually only create them internally and pass them to a streaming_callback function. You can see an example of that in our OpenAIChatGenerator._handle_stream_response, where one of its args is callback: SyncStreamingCallbackT. You can find the function here.
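
As a rough sketch of that pattern (not the actual Haystack implementation; the response iteration, field names, and return shape are assumptions), the chunks are consumed by the callback instead of being returned in the component's output:

    from typing import Callable

    from haystack.dataclasses import StreamingChunk

    def _handle_stream_response(self, response, callback: Callable[[StreamingChunk], None]):
        # Build chunks internally, forward each one to the user-supplied callback,
        # and return only the assembled replies/meta to the caller.
        chunks = []
        for event in response:
            chunk = StreamingChunk(content=event.get("generated_text", ""))
            callback(chunk)
            chunks.append(chunk)
        reply = "".join(chunk.content for chunk in chunks)
        return {"replies": [reply], "meta": [{}]}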

Comment on lines +17 to +20
@component
class WatsonxGenerator:
"""
Generates text using IBM's watsonx.ai foundational models.

Sorry to have not brought this up earlier, but it would be great if you could have WatsonxGenerator inherit from WatsonxChatGenerator and simply override the relevant methods (e.g. run and run_async) to work with our standard Generator.run inputs, which are:

    @component.output_types(replies=List[str], meta=List[Dict[str, Any]])
    def run(
        self,
        prompt: str,
        system_prompt: Optional[str] = None,
        streaming_callback: Optional[StreamingCallbackT] = None,
        generation_kwargs: Optional[Dict[str, Any]] = None,
    ):

This is a pattern we are planning to follow for our other chat generators but haven't had a chance to implement yet. Hopefully it should help reduce duplicate code between the two components.

Let me know if you need any more clarifications!
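
A hedged sketch of that inheritance pattern, assuming Haystack 2.x import paths and a WatsonxChatGenerator.run that accepts messages, streaming_callback, and generation_kwargs (not the final implementation):

    from typing import Any, Dict, List, Optional

    # Import paths assumed for this sketch (Haystack 2.x).
    from haystack import component
    from haystack.dataclasses import ChatMessage, StreamingCallbackT

    @component
    class WatsonxGenerator(WatsonxChatGenerator):
        """Text-completion wrapper that reuses the chat component internally."""

        @component.output_types(replies=List[str], meta=List[Dict[str, Any]])
        def run(
            self,
            prompt: str,
            system_prompt: Optional[str] = None,
            streaming_callback: Optional[StreamingCallbackT] = None,
            generation_kwargs: Optional[Dict[str, Any]] = None,
        ):
            # Wrap the plain prompt into chat messages and delegate to the parent run.
            messages = []
            if system_prompt:
                messages.append(ChatMessage.from_system(system_prompt))
            messages.append(ChatMessage.from_user(prompt))
            result = super().run(
                messages=messages,
                streaming_callback=streaming_callback,
                generation_kwargs=generation_kwargs,
            )
            # Flatten ChatMessage replies into the standard Generator output format.
            return {
                "replies": [msg.text or "" for msg in result["replies"]],
                "meta": [msg.meta for msg in result["replies"]],
            }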

divyaruhil and others added 2 commits June 24, 2025 14:29
divyaruhil (Author) commented

Hi @sjrl, after committing the suggested changes, some tests are now failing. Should I revert those changes? Also, just a quick heads-up—it might take me a little while to thoughtfully work through all the requested updates, but I’m on it!

sjrl (Contributor) commented Jun 25, 2025

> Hi @sjrl, after committing the suggested changes, some tests are now failing. Should I revert those changes? Also, just a quick heads-up—it might take me a little while to thoughtfully work through all the requested updates, but I’m on it!

Hey @divyaruhil, my colleague @anakin87 and I decided to go through the pyproject.toml and the GitHub workflow to make sure they matched our other integrations. Currently everything is passing, which is great!

So I think it's safe to leave those files alone now and please go ahead with the other requested updates!

divyaruhil (Author) commented

Sure, @sjrl! Really appreciate the time and effort you and @anakin87 have put into the review. I'll work through the remaining comments and aim to wrap things up as my schedule allows.

Labels: integration:watsonx, topic:CI, type:documentation (Improvements or additions to documentation)

Successfully merging this pull request may close these issues.

Add watsonx support to Haystack
4 participants