Add Ibm Granite Completion and Chat Completion support #129146

Evgenii-Kazannik · 2025-06-09T12:54:38Z

Extend the watsonx.ai service with completion and chat completion tasks so that the corresponding IBM Granite models can be used.

PUT {{base-url}}/_inference/completion/ibm_watsonx_competion
PUT {{base-url}}/_inference/chat_completion/ibm_watsonx_chat_competion
{
"service": "watsonxai",
"service_settings": {
"api_key": "{{api-key}}",
"url": "us-south.ml.cloud.ibm.com",
"model_id": "ibm/granite-3-3-8b-instruct",
"project_id": "{{project-id}}",
"api_version": "2024-05-02"
}
}

{
"inference_id": "ibm_watsonx_competion",
"task_type": "completion",
"service": "watsonxai",
"service_settings": {
"url": "us-south.ml.cloud.ibm.com",
"api_version": "2024-05-02",
"model_id": "ibm/granite-3-3-8b-instruct",
"project_id": "{{project-id}}",
"rate_limit": {
"requests_per_minute": 120
}
}
}

POST {{base-url}}/_inference/completion/ibm_watsonx_competion/
POST {{base-url}}/_inference/completion/ibm_watsonx_competion/_stream
{
"input": [
"Greenland in short."
]
}

POST {{base-url}}/_inference/chat_completion/ibm_watsonx_chat_competion/_stream
{
"messages": [
{
"role": "user",
"content": "content"
}
],
"max_completion_tokens": 2,
"temperature": 1.2
}

Jan-Kazlouski-elastic

Left a few suggestions.

Jan-Kazlouski-elastic · 2025-06-10T09:51:14Z

...va/org/elasticsearch/xpack/inference/services/ibmwatsonx/action/IbmWatsonxActionCreator.java

+     * @return A formatted error message.
+     */
+    public static String buildErrorMessage(TaskType requestType, String inferenceId) {
+        return format("Failed to send Ibm Watsonx %s request from inference entity id [%s]", requestType.toString(), inferenceId);


Suggested change

return format("Failed to send Ibm Watsonx %s request from inference entity id [%s]", requestType.toString(), inferenceId);

return format("Failed to send IBM Watsonx %s request from inference entity id [%s]", requestType.toString(), inferenceId);

Done. Thank you

Jan-Kazlouski-elastic · 2025-06-10T09:51:39Z

...va/org/elasticsearch/xpack/inference/services/ibmwatsonx/action/IbmWatsonxActionCreator.java

    protected IbmWatsonxEmbeddingsRequestManager getEmbeddingsRequestManager(
        IbmWatsonxEmbeddingsModel model,
        Truncator truncator,
        ThreadPool threadPool
    ) {
        return new IbmWatsonxEmbeddingsRequestManager(model, truncator, threadPool);
    }
+
+    /**
+     * Builds an error message for Ibm Watsonx actions.


Suggested change

* Builds an error message for Ibm Watsonx actions.

* Builds an error message for IBM Watsonx actions.

Tnx. Applied as suggested

Jan-Kazlouski-elastic · 2025-06-10T09:56:43Z

...va/org/elasticsearch/xpack/inference/services/ibmwatsonx/action/IbmWatsonxActionVisitor.java

 import org.elasticsearch.xpack.inference.services.ibmwatsonx.embeddings.IbmWatsonxEmbeddingsModel;
 import org.elasticsearch.xpack.inference.services.ibmwatsonx.rerank.IbmWatsonxRerankModel;

 import java.util.Map;

+/**
+ * Interface for creating {@link ExecutableAction} instances for Watsonx models.


IMHO
From here on, logs and Javadoc should use "IBM Watsonx" instead of "Ibm Watsonx" so that the text is human-readable. Class and variable names should stay camel-cased.

Updated. Thank you

Jan-Kazlouski-elastic · 2025-06-10T10:02:59Z

...sticsearch/xpack/inference/services/ibmwatsonx/completion/IbmWatsonxChatCompletionModel.java

+
+    /**
+     * Accepts a visitor to create an executable action. The returned action will not return documents in the response.
+     * @param visitor _


I have seen that an underscore was used for some other param descriptions earlier, but it doesn't provide any useful information. I think it should be replaced with a proper description.

Thank you, Jan. Done

Jan-Kazlouski-elastic · 2025-06-10T10:06:38Z

.../xpack/inference/services/ibmwatsonx/completion/IbmWatsonxChatCompletionServiceSettings.java

+    /**
+     * Rate limits are defined at
+     * <a href="https://www.ibm.com/docs/en/watsonx/saas?topic=learning-watson-machine-plans">Watson Machine Learning plans</a>.
+     * For Lite plan, you've 120 requests per minute.


The original wording seems a bit off to me. I'd change the rerank one as well.

Suggested change

* For Lite plan, you've 120 requests per minute.

* For the Lite plan, the limit is 120 requests per minute.

Thank you. I looked through and applied the suggestions.

unify naming: IBM Watsonx

use suggested comments

replace a visitor param underscore ( _ ) with a definition

Jan-Kazlouski-elastic · 2025-06-10T13:38:48Z

...va/org/elasticsearch/xpack/inference/services/ibmwatsonx/action/IbmWatsonxActionCreator.java

 public class IbmWatsonxActionCreator implements IbmWatsonxActionVisitor {
    private final Sender sender;
    private final ServiceComponents serviceComponents;

+    static final String COMPLETION_REQUEST_TYPE = "IBM WatsonX completions";


After a discussion in DMs, it turned out that the platform is called Watsonx, not WatsonX. Could you please unify the naming? Thank you!

Thanks. Unified the naming as IBM Watsonx

A bit of a nitpick but the platform seems to be called IBM watsonx (from their website) but this change updates it everywhere to IBM Watsonx. Can we keep consistent with IBM's capitalization to avoid confusion?

Sure. I updated it everywhere from IBM Watsonx to IBM watsonx. Thanks

elasticsearchmachine · 2025-06-10T14:13:17Z

Pinging @elastic/search-experiences-team (Team:Search - Experiences)

elasticsearchmachine · 2025-06-10T14:13:17Z

Pinging @elastic/search-eng (Team:SearchOrg)

elasticsearchmachine · 2025-06-10T14:23:35Z

Pinging @elastic/ml-core (Team:ML)

…hat-completion # Conflicts: # server/src/main/java/org/elasticsearch/TransportVersions.java

dan-rubinstein

Great work. I'd like to also manually test this change. Would you be able to provide some information in the PR description about how you manually tested such as some example API calls to help me get started with the testing?

dan-rubinstein · 2025-06-16T15:18:25Z

...nce/src/main/java/org/elasticsearch/xpack/inference/services/ibmwatsonx/IbmWatsonxModel.java

+
+    @Override
+    public int rateLimitGroupingHash() {
+        return Objects.hash(uri);


Why does this not need to include the rateLimitServiceSettings?

Including both back then would have made hashing more accurate. Thank you.
Eventually I removed uri from the model as it's not needed there.
Therefore, only rateLimitServiceSettings is hashed now.
I made an update
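
To illustrate the outcome of this thread, here is a minimal sketch of a grouping hash that depends only on the rate limit settings, so that models with equal settings land in the same rate limit group. `RateLimitSketch`, `GroupedModelSketch`, and `RateLimitHashDemo` are hypothetical stand-ins, not the actual Elasticsearch classes, and the real settings object may carry more fields.

```java
import java.util.Objects;

// Hypothetical stand-in for the service's rate limit settings.
record RateLimitSketch(long requestsPerMinute) {}

// Hypothetical stand-in for a model that exposes a rate limit grouping hash.
final class GroupedModelSketch {
    private final RateLimitSketch rateLimit;

    GroupedModelSketch(RateLimitSketch rateLimit) {
        this.rateLimit = Objects.requireNonNull(rateLimit);
    }

    // Only the rate limit settings feed the hash, as described in the reply above.
    int rateLimitGroupingHash() {
        return Objects.hash(rateLimit);
    }
}

class RateLimitHashDemo {
    public static void main(String[] args) {
        var a = new GroupedModelSketch(new RateLimitSketch(120));
        var b = new GroupedModelSketch(new RateLimitSketch(120));
        // Equal settings produce equal hashes, so both models share one group.
        System.out.println(a.rateLimitGroupingHash() == b.rateLimitGroupingHash());
    }
}
```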

dan-rubinstein · 2025-06-16T15:20:34Z

...nce/src/main/java/org/elasticsearch/xpack/inference/services/ibmwatsonx/IbmWatsonxModel.java


    private final IbmWatsonxRateLimitServiceSettings rateLimitServiceSettings;

+    protected URI uri;


Is URI only going to be used for completion/chat completion use cases? If yes, can it be in the completion model implementation instead?

I removed uri from the model.
The URI is set during inference endpoint creation as part of the service settings.
It's not part of the IBM watsonx.ai API for completions.
Thanks

dan-rubinstein · 2025-06-16T15:23:48Z

...nce/src/main/java/org/elasticsearch/xpack/inference/services/ibmwatsonx/IbmWatsonxModel.java

 import java.util.Map;
 import java.util.Objects;

-public abstract class IbmWatsonxModel extends Model {
+public abstract class IbmWatsonxModel extends RateLimitGroupingModel {


Can you clarify why this needs to be a RateLimitGroupingModel?

This type is required by GenericRequestManager, which I believe will also handle the requests for other tasks in the future.

dan-rubinstein · 2025-06-16T15:34:29Z

...va/org/elasticsearch/xpack/inference/services/ibmwatsonx/action/IbmWatsonxActionCreator.java

 public class IbmWatsonxActionCreator implements IbmWatsonxActionVisitor {
    private final Sender sender;
    private final ServiceComponents serviceComponents;

+    static final String COMPLETION_REQUEST_TYPE = "IBM WatsonX completions";


A bit of a nitpick but the platform seems to be called IBM watsonx (from their website) but this change updates it everywhere to IBM Watsonx. Can we keep consistent with IBM's capitalization to avoid confusion?

dan-rubinstein · 2025-06-16T15:40:13Z

...ain/java/org/elasticsearch/xpack/inference/services/voyageai/rerank/VoyageAIRerankModel.java

@@ -109,8 +109,8 @@ public DefaultSecretSettings getSecretSettings() {

    /**
     * Accepts a visitor to create an executable action. The returned action will not return documents in the response.
-     * @param visitor _
-     * @param taskSettings _
+     * @param visitor          Interface for creating {@link ExecutableAction} instances for IBM Voyage AI models.


I believe this should just be Voyage AI instead of IBM Voyage AI.

Corrected. Thanks

dan-rubinstein · 2025-06-16T18:37:42Z

...search/xpack/inference/services/ibmwatsonx/request/IbmWatsonxChatCompletionRequestTests.java

+    private static final String API_COMPLETIONS_PATH = "https://abc.com/ml/v1/text/chat?version=apiVersion";
+
+    public void testCreateRequest_WithStreaming() throws IOException, URISyntaxException {
+        var request = createRequest("secret", randomAlphaOfLength(15), "model", true);


Can we use randomized strings when possible? (ex. "secret", "model", etc)

I used randomAlphaOfLength in order to generate other values. Thank you

dan-rubinstein · 2025-06-16T18:37:54Z

...search/xpack/inference/services/ibmwatsonx/request/IbmWatsonxChatCompletionRequestTests.java

+    private static final String AUTH_HEADER_VALUE = "foo";
+    private static final String API_COMPLETIONS_PATH = "https://abc.com/ml/v1/text/chat?version=apiVersion";
+
+    public void testCreateRequest_WithStreaming() throws IOException, URISyntaxException {


Should there be a test for creating a request without streaming?

Added one more test case so now it's checked with streaming and without. Thanks

dan-rubinstein · 2025-06-16T18:40:16Z

...search/xpack/inference/services/ibmwatsonx/request/IbmWatsonxChatCompletionRequestTests.java

+        return new IbmWatsonxChatCompletionWithoutAuthRequest(new UnifiedChatInput(List.of(input), "user", stream), chatCompletionModel);
+    }
+
+    private static class IbmWatsonxChatCompletionWithoutAuthRequest extends IbmWatsonxChatCompletionRequest {


Can you clarify why we need to create a WithoutAuth version of the request?

I saw it used throughout the plugin, and the watsonx service also uses it for other inference tasks, so I followed the pattern. That said, it's about faking the auth implementation to avoid static mocking.

Got it, thanks for clarifying.
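
The "WithoutAuth" pattern discussed here can be sketched roughly as follows. This is an illustration under assumptions, not the plugin's code: `ChatRequestSketch`, `ChatRequestWithoutAuthSketch`, and `fetchToken` are hypothetical names, and the production auth step is assumed to do something that is unavailable in a unit test (simulated here by throwing). Only the `decorateWithAuth` hook name and the `"foo"` header value come from the review.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical request type; the real one builds an HTTP POST request.
class ChatRequestSketch {
    private final String apiKey;

    ChatRequestSketch(String apiKey) {
        this.apiKey = apiKey;
    }

    final Map<String, String> createHeaders() {
        Map<String, String> headers = new LinkedHashMap<>();
        headers.put("Content-Type", "application/json");
        decorateWithAuth(headers); // single place where the auth header is added
        return headers;
    }

    // Overridable hook: production code would obtain a bearer token here.
    protected void decorateWithAuth(Map<String, String> headers) {
        headers.put("Authorization", "Bearer " + fetchToken(apiKey));
    }

    // Assumption: stands in for a remote token exchange that unit tests cannot perform.
    private static String fetchToken(String apiKey) {
        throw new IllegalStateException("token exchange is not available in unit tests");
    }
}

// Test-only subclass that fakes the auth step instead of relying on static mocking.
class ChatRequestWithoutAuthSketch extends ChatRequestSketch {
    static final String AUTH_HEADER_VALUE = "foo";

    ChatRequestWithoutAuthSketch(String apiKey) {
        super(apiKey);
    }

    @Override
    protected void decorateWithAuth(Map<String, String> headers) {
        headers.put("Authorization", AUTH_HEADER_VALUE);
    }
}

class WithoutAuthDemo {
    public static void main(String[] args) {
        System.out.println(new ChatRequestWithoutAuthSketch("secret").createHeaders());
    }
}
```

Because the header is set only inside the hook, a second explicit bearer header in the request builder would be redundant, which matches the later removal of that line in this PR.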

dan-rubinstein · 2025-06-16T18:43:50Z

...k/inference/services/ibmwatsonx/completion/IbmWatsonxChatCompletionServiceSettingsTests.java

+import static org.elasticsearch.xpack.inference.MatchersUtils.equalToIgnoringWhitespaceInJsonString;
+import static org.hamcrest.Matchers.is;
+
+public class IbmWatsonxChatCompletionServiceSettingsTests extends AbstractWireSerializingTestCase<IbmWatsonxChatCompletionServiceSettings> {


Can we add tests for fromMap for the non-happy cases (ex. modelId missing, projectId missing, URL missing, etc.)? The same goes for cases where optional values aren't set (ex. falling back to default rate limit settings when none are provided).

Thanks. Done
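
As a rough illustration of the cases requested above, the sketch below shows a simplified `fromMap` that reports missing required fields and falls back to a default rate limit. `ServiceSettingsSketch` and its helpers are hypothetical and much simpler than the real settings class; the field names come from the PR description, and the default of 120 requests per minute is the Lite plan limit quoted earlier in this review.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical, simplified stand-in for the chat completion service settings.
record ServiceSettingsSketch(String url, String modelId, String projectId, String apiVersion, long requestsPerMinute) {

    // Default taken from the Lite plan limit mentioned in this review.
    static final long DEFAULT_REQUESTS_PER_MINUTE = 120;

    static ServiceSettingsSketch fromMap(Map<String, Object> map) {
        List<String> missing = new ArrayList<>();
        String url = requiredString(map, "url", missing);
        String modelId = requiredString(map, "model_id", missing);
        String projectId = requiredString(map, "project_id", missing);
        String apiVersion = requiredString(map, "api_version", missing);
        if (!missing.isEmpty()) {
            // Report every missing field at once rather than failing on the first.
            throw new IllegalArgumentException("Missing required settings: " + missing);
        }
        return new ServiceSettingsSketch(url, modelId, projectId, apiVersion, parseRequestsPerMinute(map));
    }

    private static String requiredString(Map<String, Object> map, String key, List<String> missing) {
        Object value = map.get(key);
        if (value instanceof String s && !s.isBlank()) {
            return s;
        }
        missing.add(key);
        return null;
    }

    // Optional nested setting: rate_limit.requests_per_minute, else the default.
    private static long parseRequestsPerMinute(Map<String, Object> map) {
        if (map.get("rate_limit") instanceof Map<?, ?> rateLimit
            && rateLimit.get("requests_per_minute") instanceof Number n) {
            return n.longValue();
        }
        return DEFAULT_REQUESTS_PER_MINUTE;
    }
}

class FromMapDemo {
    public static void main(String[] args) {
        Map<String, Object> settings = Map.of(
            "url", "us-south.ml.cloud.ibm.com",
            "model_id", "ibm/granite-3-3-8b-instruct",
            "project_id", "my-project",
            "api_version", "2024-05-02"
        );
        System.out.println(ServiceSettingsSketch.fromMap(settings));
    }
}
```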

dan-rubinstein · 2025-06-16T18:57:07Z

...icsearch/xpack/inference/services/ibmwatsonx/action/IbmWatsonxChatCompletionActionTests.java

+import static org.mockito.Mockito.doThrow;
+import static org.mockito.Mockito.mock;
+
+public class IbmWatsonxChatCompletionActionTests extends ESTestCase {


Seems like this test class is either identical or almost identical to some of our other ...ChatCompletionActionTests classes, with the exception of the createAction function and the responseJson we define (example). To reduce duplication, can we create a base class ChatCompletionActionTests extends ESTestCase with all the shared code (ex. the tests) and have each service provide a subclass, such as IbmWatsonxChatCompletionActionTests extends ChatCompletionActionTests, which overrides a set of functions like createAction, createResponseJson, etc.?

Thank you, Dan. Done

Great work on this. I think it'll really clean the code up for us going forward.
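
The base-class refactoring suggested in this thread follows the template method pattern. The sketch below shows the shape of it in plain Java without the Elasticsearch test framework; `ActionSketch` and the `...Sketch` class names are hypothetical, the action is a fake that reads a canned response, and the JSON shape is an assumption for illustration only.

```java
// Hypothetical minimal action interface; the real tests exercise ExecutableAction.
interface ActionSketch {
    String execute(String input);
}

// Shared tests live here once; services only supply the parts that differ.
abstract class ChatCompletionActionTestsSketch {
    protected abstract ActionSketch createAction(String cannedResponseJson);

    protected abstract String createResponseJson(String content);

    // Shared test body, reused by every service-specific subclass.
    final void testExecute_ReturnsContentFromResponse() {
        String expected = "result";
        ActionSketch action = createAction(createResponseJson(expected));
        String actual = action.execute("hello");
        if (!expected.equals(actual)) {
            throw new AssertionError("expected [" + expected + "] but got [" + actual + "]");
        }
    }
}

// Service-specific subclass: overrides only the factory methods.
final class IbmWatsonxChatCompletionActionTestsSketch extends ChatCompletionActionTestsSketch {
    @Override
    protected String createResponseJson(String content) {
        // Assumed response shape, for illustration only.
        return "{\"choices\":[{\"message\":{\"content\":\"" + content + "\"}}]}";
    }

    @Override
    protected ActionSketch createAction(String cannedResponseJson) {
        // Fake action: ignores the input and parses the canned response.
        return input -> extractContent(cannedResponseJson);
    }

    private static String extractContent(String json) {
        String marker = "\"content\":\"";
        int start = json.indexOf(marker) + marker.length();
        return json.substring(start, json.indexOf('"', start));
    }
}

class BaseTestsDemo {
    public static void main(String[] args) {
        new IbmWatsonxChatCompletionActionTestsSketch().testExecute_ReturnsContentFromResponse();
        System.out.println("shared test passed");
    }
}
```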

…hat-completion

Evgenii-Kazannik · 2025-06-19T09:37:50Z

Thank you for the review Daniel.
Much appreciated

I updated PR description in order to make the manual testing easier

@dan-rubinstein

dan-rubinstein

Looks good. Great work on the changes! Just have one clarifying question I've added.

dan-rubinstein · 2025-06-23T14:50:04Z

...asticsearch/xpack/inference/services/ibmwatsonx/request/IbmWatsonxChatCompletionRequest.java

@@ -42,7 +40,6 @@ public HttpRequest createHttpRequest() {
        httpPost.setEntity(byteEntity);

        httpPost.setHeader(HttpHeaders.CONTENT_TYPE, XContentType.JSON.mediaType());
-        httpPost.setHeader(createAuthBearerHeader(model.getSecretSettings().apiKey()));


Can you clarify why this was removed?

Sure. The method decorateWithAuth() already adds a header with the Bearer token, so setting it here as well was a duplication.

dan-rubinstein · 2025-06-23T15:04:01Z

...sticsearch/xpack/inference/services/ibmwatsonx/completion/IbmWatsonxChatCompletionModel.java

+    }
+
+    @Override
+    public IbmWatsonxRateLimitServiceSettings rateLimitServiceSettings() {


Got it, thanks for clarifying. I'm okay to leave getServiceSettings and getSecretSettings as is.

dan-rubinstein · 2025-06-23T15:07:11Z

...earch/xpack/inference/services/ibmwatsonx/request/IbmWatsonxChatCompletionRequestEntity.java

+    @Override
+    public XContentBuilder toXContent(XContentBuilder builder, Params params) throws IOException {
+        builder.startObject();
+        builder.field(PROJECT_ID_FIELD, model.getServiceSettings().projectId());


Good point, thanks for clarifying this for me!

dan-rubinstein · 2025-06-23T15:07:33Z

...search/xpack/inference/services/ibmwatsonx/request/IbmWatsonxChatCompletionRequestTests.java

+        return new IbmWatsonxChatCompletionWithoutAuthRequest(new UnifiedChatInput(List.of(input), "user", stream), chatCompletionModel);
+    }
+
+    private static class IbmWatsonxChatCompletionWithoutAuthRequest extends IbmWatsonxChatCompletionRequest {


Got it, thanks for clarifying.

dan-rubinstein · 2025-06-23T15:07:55Z

...icsearch/xpack/inference/services/ibmwatsonx/action/IbmWatsonxChatCompletionActionTests.java

+import static org.mockito.Mockito.doThrow;
+import static org.mockito.Mockito.mock;
+
+public class IbmWatsonxChatCompletionActionTests extends ESTestCase {


Great work on this. I think it'll really clean the code up for us going forward.

…hat-completion # Conflicts: # server/src/main/java/org/elasticsearch/TransportVersions.java

Add Ibm Granite Completion and Chat Completion support

78ab1da

elasticsearchmachine added the needs:triage, v9.1.0, and external-contributor labels Jun 9, 2025

Evgenii-Kazannik mentioned this pull request Jun 10, 2025

Update Inference specification for Watsonx's completion and chat comp… elastic/elasticsearch-specification#4505

Merged

AI-IshanBhatt added the :SearchOrg/Experiences label Jun 10, 2025

Jan-Kazlouski-elastic reviewed Jun 10, 2025

Apply suggestions

f92f348

Samiul-TheSoccerFan added the Team:ML label and removed the :SearchOrg/Experiences label Jun 10, 2025

elasticsearchmachine removed the Team:ML label Jun 10, 2025

Samiul-TheSoccerFan added the :ml and Team:ML labels and removed the needs:triage label Jun 10, 2025

Samiul-TheSoccerFan added the >enhancement label Jun 10, 2025

Merge branch 'main' into Add-IBM-Granite-support-for-completion-and-c…

510e3c5

…hat-completion # Conflicts: # server/src/main/java/org/elasticsearch/TransportVersions.java

dan-rubinstein reviewed Jun 16, 2025

Merge branch 'main' into Add-IBM-Granite-support-for-completion-and-c…

d6d19be

…hat-completion

github-actions bot deployed to docs-preview June 19, 2025 07:59

jonathan-buttner added the v8.19.0 and auto-backport labels Jun 23, 2025

dan-rubinstein reviewed Jun 23, 2025

Merge branch 'main' into Add-IBM-Granite-support-for-completion-and-c…

a6eaec6

…hat-completion # Conflicts: # server/src/main/java/org/elasticsearch/TransportVersions.java

github-actions bot deployed to docs-preview June 23, 2025 16:18
