add EIS rerank default inference endpoint #129681

Open: wants to merge 5 commits into base: main

brendan-jugan-elastic
Contributor

Overview

This PR adds the rerank default inference endpoint for the Elastic Inference Service (EIS). This change makes a few assumptions:

  1. The future model ID contained in the EIS authorizations response is rerank-v1
  2. The default inference endpoint ID follows existing conventions and is .rerank-v1-elastic
  3. The EIS task type -> inference API task type mapping is rerank/text/text-similarity -> TaskType.RERANK
    • This mapping was taken from a document outlining EIS task type mappings. That information might be outdated, so I'm happy to change the mapping if needed.
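
The assumptions above can be illustrated with a small shell sketch. Both helper names are hypothetical and not part of Elasticsearch. The first assumes the naming convention is a leading dot, the model ID, and an `-elastic` suffix, as seen in `.elser-v2-elastic` and `.rainbow-sprinkles-elastic`. The second assumes `rerank/text/text-similarity` is a single EIS task type identifier and lists only the mapping proposed in this PR.

```shell
# Hypothetical helper (not part of Elasticsearch): derives a default
# inference endpoint ID from an EIS model ID, assuming the convention
# ".<model_id>-elastic" used by the existing default endpoints.
default_endpoint_id() {
  printf '.%s-elastic\n' "$1"
}

# Hypothetical helper: maps an EIS task type to an inference API task type.
# Only the mapping assumed in this PR is listed; anything else is rejected.
inference_task_type() {
  case "$1" in
    rerank/text/text-similarity) printf 'rerank\n' ;;
    *) printf 'unknown\n'; return 1 ;;
  esac
}

default_endpoint_id "rerank-v1"   # prints .rerank-v1-elastic
```

If the model ID in the EIS authorization response turns out to differ from `rerank-v1`, the endpoint ID would change with it under this convention.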

Testing

My testing included basic chat completions and sparse embeddings requests using the existing default endpoints with eis-gateway and eis-ray running locally, to ensure existing functionality works as expected. I've also modified the authorized models in my local eis-gateway, restarted ES, and verified that the list of default endpoints includes the new one for rerank.

Chat Completions:

curl -k -N --location 'http://localhost:9200/_inference/chat_completion/.rainbow-sprinkles-elastic/_stream' \
  --header 'Content-Type: application/json' \
  --header "Authorization: Basic ${ES_AUTH}" \
  --data '{
    "messages": [
        {
            "role": "user",
            "content": "In only two digits and nothing else, what is the meaning of life?"
        }
    ],
    "temperature": 0.7,
    "max_completion_tokens": 300
}'

Sparse Embeddings:

curl -k --location --request POST 'http://localhost:9200/_inference/sparse_embedding/.elser-v2-elastic' \
  --header 'Content-Type: application/json' \
  --header "Authorization: Basic ${ES_AUTH}" \
  --data '{
    "input": "A blue sky"
}'

Default Endpoints:

curl -k --location --request GET 'http://localhost:9200/_inference?pretty' \
  --header 'Content-Type: application/json' \
  --header "Authorization: Basic ${ES_AUTH}"
{
  "endpoints" : [
    {
      "inference_id" : ".elser-v2-elastic",
      "task_type" : "sparse_embedding",
      "service" : "elastic",
      "service_settings" : {
        "model_id" : "elser-v2",
        "rate_limit" : {
          "requests_per_minute" : 1000
        }
      }
    },
    {
      "inference_id" : ".rainbow-sprinkles-elastic",
      "task_type" : "chat_completion",
      "service" : "elastic",
      "service_settings" : {
        "model_id" : "rainbow-sprinkles",
        "rate_limit" : {
          "requests_per_minute" : 720
        }
      }
    },
    {
      "inference_id" : ".rerank-v1-elastic",
      "task_type" : "rerank",
      "service" : "elastic",
      "service_settings" : {
        "model_id" : "rerank-v1",
        "rate_limit" : {
          "requests_per_minute" : 500
        }
      }
    },
    other default inference endpoints....
  ]
}
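
The last check, confirming that the new endpoint appears in the list, could also be scripted. This is a minimal sketch that assumes the response shape shown above. `has_endpoint` is a hypothetical helper, and it relies on plain `tr` and `grep` rather than a JSON parser, so it is only suitable for a quick local check.

```shell
# Hypothetical helper: reads a GET _inference response on stdin and
# succeeds if an endpoint with the given inference_id is present.
# Whitespace is stripped first so the match does not depend on ?pretty.
has_endpoint() {
  tr -d '[:space:]' | grep -Fq "\"inference_id\":\"$1\""
}

# Example usage against a local cluster (requires ES_AUTH to be set):
#   curl -k -s 'http://localhost:9200/_inference' \
#     --header "Authorization: Basic ${ES_AUTH}" \
#     | has_endpoint '.rerank-v1-elastic' && echo "rerank endpoint present"
```

The fixed-string match (`grep -F`) keeps the dots in the endpoint ID from being treated as regex wildcards.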

@elasticsearchmachine added the Team:SearchOrg and Team:Search - Inference labels on Jun 19, 2025
@elasticsearchmachine
Collaborator

Pinging @elastic/search-inference-team (Team:Search - Inference)

@elasticsearchmachine
Collaborator

Pinging @elastic/search-eng (Team:SearchOrg)

Labels
>enhancement, :SearchOrg/Inference, Team:Search - Inference, Team:SearchOrg, v8.19.0, v9.1.0