Skip to content

Add Ibm Granite Completion and Chat Completion support #129146

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 5 commits into
base: main
Choose a base branch
from

Conversation

Evgenii-Kazannik
Copy link
Contributor

@Evgenii-Kazannik Evgenii-Kazannik commented Jun 9, 2025

  • Extend watsonx ai with completion and chat completion tasks in order to use corresponding IBM Granite models

elastic inference
watsonx-ai

PUT {{base-url}}/_inference/completion/ibm_watsonx_competion
PUT {{base-url}}/_inference/chat_completion/ibm_watsonx_chat_competion
{
"service": "watsonxai",
"service_settings": {
"api_key": "{{api-key}}",
"url": "us-south.ml.cloud.ibm.com",
"model_id": "ibm/granite-3-3-8b-instruct",
"project_id": "{{project-id}}",
"api_version": "2024-05-02"
}
}

{
"inference_id": "ibm_watsonx_competion",
"task_type": "completion",
"service": "watsonxai",
"service_settings": {
"url": "us-south.ml.cloud.ibm.com",
"api_version": "2024-05-02",
"model_id": "ibm/granite-3-3-8b-instruct",
"project_id": "{{project-id}}",
"rate_limit": {
"requests_per_minute": 120
}
}
}

POST {{base-url}}/_inference/completion/ibm_watsonx_competion/
POST {{base-url}}/_inference/completion/ibm_watsonx_competion/_stream
{
"input": [
"Greenland in short."
]
}

POST {{base-url}}/_inference/chat_completion/ibm_watsonx_chat_competion/_stream
{
"messages": [
{
"role": "user",
"content": "content"
}
],
"max_completion_tokens": 2,
"temperature": 1.2
}

@elasticsearchmachine elasticsearchmachine added needs:triage Requires assignment of a team area label v9.1.0 external-contributor Pull request authored by a developer outside the Elasticsearch team labels Jun 9, 2025
@AI-IshanBhatt AI-IshanBhatt added the :SearchOrg/Experiences Label for the Search Experiences team label Jun 10, 2025
Copy link
Contributor

@Jan-Kazlouski-elastic Jan-Kazlouski-elastic left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Left a few suggestions.

* @return A formatted error message.
*/
public static String buildErrorMessage(TaskType requestType, String inferenceId) {
return format("Failed to send Ibm Watsonx %s request from inference entity id [%s]", requestType.toString(), inferenceId);
Copy link
Contributor

@Jan-Kazlouski-elastic Jan-Kazlouski-elastic Jun 10, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
return format("Failed to send Ibm Watsonx %s request from inference entity id [%s]", requestType.toString(), inferenceId);
return format("Failed to send IBM Watsonx %s request from inference entity id [%s]", requestType.toString(), inferenceId);

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done. Thank you

protected IbmWatsonxEmbeddingsRequestManager getEmbeddingsRequestManager(
IbmWatsonxEmbeddingsModel model,
Truncator truncator,
ThreadPool threadPool
) {
return new IbmWatsonxEmbeddingsRequestManager(model, truncator, threadPool);
}

/**
* Builds an error message for Ibm Watsonx actions.
Copy link
Contributor

@Jan-Kazlouski-elastic Jan-Kazlouski-elastic Jun 10, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
* Builds an error message for Ibm Watsonx actions.
* Builds an error message for IBM Watsonx actions.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Tnx. Applied as suggested

import org.elasticsearch.xpack.inference.services.ibmwatsonx.embeddings.IbmWatsonxEmbeddingsModel;
import org.elasticsearch.xpack.inference.services.ibmwatsonx.rerank.IbmWatsonxRerankModel;

import java.util.Map;

/**
* Interface for creating {@link ExecutableAction} instances for Watsonx models.
Copy link
Contributor

@Jan-Kazlouski-elastic Jan-Kazlouski-elastic Jun 10, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IMHO
From here and further down in logs and javadoc "IBM Watsonx" should be used instead of "Ibm Watsonx". It should be human readable. While class and variable names should stay camel cased.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated. Thank you


/**
* Accepts a visitor to create an executable action. The returned action will not return documents in the response.
* @param visitor _
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have seen that underscore was used for some other param descriptions earlier, but it doesn't provide any useful information. I think it should be replaced with proper description.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you, Jan. Done

/**
* Rate limits are defined at
* <a href="https://www.ibm.com/docs/en/watsonx/saas?topic=learning-watson-machine-plans">Watson Machine Learning plans</a>.
* For Lite plan, you've 120 requests per minute.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Original wording seems a bit off to me. I'd change rerank one as well.

Suggested change
* For Lite plan, you've 120 requests per minute.
* For the Lite plan, the limit is 120 requests per minute.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you. I looked through and applied the suggestions.

  • unify naming: IBM Watsonx
  • use suggested comments
  • replace a visitor param undescore ( _ ) with a definition

public class IbmWatsonxActionCreator implements IbmWatsonxActionVisitor {
private final Sender sender;
private final ServiceComponents serviceComponents;

static final String COMPLETION_REQUEST_TYPE = "IBM WatsonX completions";
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

After discussion in dms it was found that platform is called Watsonx not WatsonX. Could you please unify naming. Thank you!

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks. Unified the naming as IBM Watsonx

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A bit of a nitpick but the platform seems to be called IBM watsonx (from their website) but this change updates it everywhere to IBM Watsonx. Can we keep consistent with IBM's capitalization to avoid confusion?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure. I updated it everywhere from IBM Watsonx to IBM watsonx. Thanks

@elasticsearchmachine
Copy link
Collaborator

Pinging @elastic/search-experiences-team (Team:Search - Experiences)

@elasticsearchmachine
Copy link
Collaborator

Pinging @elastic/search-eng (Team:SearchOrg)

@Samiul-TheSoccerFan Samiul-TheSoccerFan added Team:ML Meta label for the ML team and removed :SearchOrg/Experiences Label for the Search Experiences team labels Jun 10, 2025
@elasticsearchmachine elasticsearchmachine removed the Team:ML Meta label for the ML team label Jun 10, 2025
@Samiul-TheSoccerFan Samiul-TheSoccerFan added :ml Machine learning Team:ML Meta label for the ML team and removed needs:triage Requires assignment of a team area label labels Jun 10, 2025
@elasticsearchmachine
Copy link
Collaborator

Pinging @elastic/ml-core (Team:ML)

…hat-completion

# Conflicts:
#	server/src/main/java/org/elasticsearch/TransportVersions.java
Copy link
Member

@dan-rubinstein dan-rubinstein left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great work. I'd like to also manually test this change. Would you be able to provide some information in the PR description about how you manually tested such as some example API calls to help me get started with the testing?


@Override
public int rateLimitGroupingHash() {
return Objects.hash(uri);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why does this not need to include the rateLimitServiceSettings?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Including both back then would have made hashing more accurate. Thank you.
Eventually I removed uri from the model as it's not needed there.
Therefore, only rateLimitServiceSettings is hashed now.
I made an update


private final IbmWatsonxRateLimitServiceSettings rateLimitServiceSettings;

protected URI uri;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is URI only going to be used for completion/chat completion use cases? If yes, can it be in the completion model implementation instead?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I removed uri from the model.
Uri is to be set during an inference endpoint creation as part of the service settings.
It's not part of the IBM watsonx.ai API for completions.
Thanks

import java.util.Map;
import java.util.Objects;

public abstract class IbmWatsonxModel extends Model {
public abstract class IbmWatsonxModel extends RateLimitGroupingModel {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you clarify why this needs to be a RateLimitGroupingModel?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This type needs to be used in GenericRequestManager
which I believe is also going to handle the requests for other tasks in the future

public class IbmWatsonxActionCreator implements IbmWatsonxActionVisitor {
private final Sender sender;
private final ServiceComponents serviceComponents;

static final String COMPLETION_REQUEST_TYPE = "IBM WatsonX completions";
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A bit of a nitpick but the platform seems to be called IBM watsonx (from their website) but this change updates it everywhere to IBM Watsonx. Can we keep consistent with IBM's capitalization to avoid confusion?

@@ -109,8 +109,8 @@ public DefaultSecretSettings getSecretSettings() {

/**
* Accepts a visitor to create an executable action. The returned action will not return documents in the response.
* @param visitor _
* @param taskSettings _
* @param visitor Interface for creating {@link ExecutableAction} instances for IBM Voyage AI models.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I believe this should just be Voyage AI instead of IBM Voyage AI.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Corrected. Thanks

private static final String API_COMPLETIONS_PATH = "https://abc.com/ml/v1/text/chat?version=apiVersion";

public void testCreateRequest_WithStreaming() throws IOException, URISyntaxException {
var request = createRequest("secret", randomAlphaOfLength(15), "model", true);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we use randomized strings when possible? (ex. "secret", "model", etc)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I used randomAlphaOfLength in order to generate other values. Thank you

private static final String AUTH_HEADER_VALUE = "foo";
private static final String API_COMPLETIONS_PATH = "https://abc.com/ml/v1/text/chat?version=apiVersion";

public void testCreateRequest_WithStreaming() throws IOException, URISyntaxException {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should there be a test for creating a request without streaming?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added one more test case so now it's checked with streaming and without. Thanks

return new IbmWatsonxChatCompletionWithoutAuthRequest(new UnifiedChatInput(List.of(input), "user", stream), chatCompletionModel);
}

private static class IbmWatsonxChatCompletionWithoutAuthRequest extends IbmWatsonxChatCompletionRequest {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you clarify why we need to create a WithoutAuth version of the request?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I saw it to be used throughout the plugin and watsonx servicealso uses it for other inference tasks
so I followed the pattern. That said it's about faking the auth implementation to avoid static mocking

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Got it, thanks for clarifying.

import static org.elasticsearch.xpack.inference.MatchersUtils.equalToIgnoringWhitespaceInJsonString;
import static org.hamcrest.Matchers.is;

public class IbmWatsonxChatCompletionServiceSettingsTests extends AbstractWireSerializingTestCase<IbmWatsonxChatCompletionServiceSettings> {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we add tests for fromMap for the non-happy cases (ex. modeld missing, projectId missing, URL missing, etc.)? Same goes for cases where optional values aren't set (ex.falling back to default rate limit settings when none are provided)?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks. Done

import static org.mockito.Mockito.doThrow;
import static org.mockito.Mockito.mock;

public class IbmWatsonxChatCompletionActionTests extends ESTestCase {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Seems like this test class is either identical or almost identical to some of our other ...ChatCompletionActionTests classes with the exception of the createAction function and the responsJson we define (example). To reduce duplication can we create a base class ChatCompletionActionTests extends ESTestCase with all the shared code (ex. the tests) and just have each service have a IbmWatsonxChatCompletionActionTests extends ChatCompletionActionTests which overrides a createAction, createResponseJson, etc. set of functions?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you, Dan. Done

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great work on this. I think it'll really clean the code up for us going forward.

@Evgenii-Kazannik
Copy link
Contributor Author

Evgenii-Kazannik commented Jun 19, 2025

Thank you for the review Daniel.
Much appreciated

I updated PR description in order to make the manual testing easier

@dan-rubinstein

@jonathan-buttner jonathan-buttner added v8.19.0 auto-backport Automatically create backport pull requests when merged labels Jun 23, 2025
Copy link
Member

@dan-rubinstein dan-rubinstein left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good. Great work on the changes! Just have one clarifying question I've added.

@@ -42,7 +40,6 @@ public HttpRequest createHttpRequest() {
httpPost.setEntity(byteEntity);

httpPost.setHeader(HttpHeaders.CONTENT_TYPE, XContentType.JSON.mediaType());
httpPost.setHeader(createAuthBearerHeader(model.getSecretSettings().apiKey()));
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you clarify why this was removed?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure. The method decorateWithAuth( adds a header with Bearer token so it's a bit of duplication there

}

@Override
public IbmWatsonxRateLimitServiceSettings rateLimitServiceSettings() {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Got it, thanks for clarifying. I'm okay to leave getServiceSettings and getSecretSettings as is.

@Override
public XContentBuilder toXContent(XContentBuilder builder, Params params) throws IOException {
builder.startObject();
builder.field(PROJECT_ID_FIELD, model.getServiceSettings().projectId());
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point, thanks for clarifying this for me!

return new IbmWatsonxChatCompletionWithoutAuthRequest(new UnifiedChatInput(List.of(input), "user", stream), chatCompletionModel);
}

private static class IbmWatsonxChatCompletionWithoutAuthRequest extends IbmWatsonxChatCompletionRequest {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Got it, thanks for clarifying.

import static org.mockito.Mockito.doThrow;
import static org.mockito.Mockito.mock;

public class IbmWatsonxChatCompletionActionTests extends ESTestCase {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great work on this. I think it'll really clean the code up for us going forward.

…hat-completion

# Conflicts:
#	server/src/main/java/org/elasticsearch/TransportVersions.java
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
auto-backport Automatically create backport pull requests when merged >enhancement external-contributor Pull request authored by a developer outside the Elasticsearch team :ml Machine learning Team:ML Meta label for the ML team v8.19.0 v9.1.0
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants