add is_cross_encoder #1869

Open · wants to merge 2 commits into main
Conversation

@Samoed (Collaborator) commented Jan 24, 2025

Checklist

  • Run tests locally to make sure nothing is broken using make test.
  • Run the formatter to format the code using make lint.

Small step toward #1841. There are a lot of T5 and LLM models that are used for reranking tasks, but I'm not sure how to annotate them.

@sam-hey (Contributor) commented Jan 26, 2025

Suggestion: If I understand the intent of this PR correctly, the primary focus is to enable filtering on the leaderboard. Wouldn't it make more sense to add a list of model types as an Enum? This way, it would also allow filtering for specific categories like Late Interaction models or other types that might be added in the future.
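
For illustration, a minimal sketch of what such an enum could look like (the names are purely hypothetical, not part of this PR):

```python
from enum import Enum


class ModelType(str, Enum):
    """Hypothetical model-type enum for leaderboard filtering."""

    BI_ENCODER = "bi_encoder"
    CROSS_ENCODER = "cross_encoder"
    LATE_INTERACTION = "late_interaction"  # e.g. ColBERT
    SPARSE = "sparse"  # e.g. BM25, SPLADE
```

Adding a new category would then be a one-line change to the enum rather than another boolean field on ModelMeta.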

@x-tabdeveloping (Collaborator) left a comment

Thanks for looking into this. I think setting False as a default would make more sense, since most models are not cross-encoders, but I'd love to hear your arguments about this.

@@ -103,6 +104,7 @@ class ModelMeta(BaseModel):
     training_datasets: dict[str, list[str]] | None
     adapted_from: str | None = None
     superseded_by: str | None = None
+    is_cross_encoder: bool | None = None
Collaborator:

Shouldn't the default be False? Doesn't None signal that we don't know?

@Samoed (Collaborator, Author) commented Jan 27, 2025

Yes, None means "we don't know". I think we should only specify True or False if someone has explicitly annotated the model, because in other cases it might be misleading.
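
To illustrate the three-state semantics, a hypothetical leaderboard filter (not code from this PR) could treat None separately:

```python
def filter_cross_encoders(models, include_unknown=False):
    # True  -> annotated as a cross-encoder
    # False -> annotated as not a cross-encoder
    # None  -> nobody has annotated the model yet
    return [
        m
        for m in models
        if m.is_cross_encoder is True
        or (include_unknown and m.is_cross_encoder is None)
    ]
```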

@@ -591,6 +601,7 @@ def get_prediction_tokens(self, *args, **kwargs):
     use_instructions=None,
     training_datasets=None,
     framework=["PyTorch"],
+    is_cross_encoder=None,
Collaborator:

Why is it None? Do we not know whether it is a cross-encoder or not?

@Samoed (Collaborator, Author):

This is an LLM (Mistral), and because it is not truly a cross-encoder I marked it as None (like the other LLM and T5 models).

Contributor:

I think I added this as a cross-encoder. I used it like the MonoT5 reranker but zero-shot (no retrieval training).

It's actually not bad zero-shot, although of course worse than if trained.

I'd add it as a cross-encoder in this setting, since it's doing full attention between query and doc. But I agree it's a weird case.
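
For context, this is roughly what MonoT5-style scoring looks like (a minimal sketch with a plain t5-base checkpoint, i.e. the zero-shot setting described above; monoT5 fine-tunes this same setup):

```python
import torch
from transformers import T5ForConditionalGeneration, T5Tokenizer

tokenizer = T5Tokenizer.from_pretrained("t5-base")
model = T5ForConditionalGeneration.from_pretrained("t5-base")

def relevance(query: str, doc: str) -> float:
    # Query and document share one input sequence, so the encoder applies
    # full attention between them -- the cross-encoder property.
    prompt = f"Query: {query} Document: {doc} Relevant:"
    inputs = tokenizer(prompt, return_tensors="pt", truncation=True)
    start = torch.tensor([[model.config.decoder_start_token_id]])
    logits = model(**inputs, decoder_input_ids=start).logits[0, -1]
    true_id = tokenizer.encode("true")[0]
    false_id = tokenizer.encode("false")[0]
    # Probability of "true" vs "false" as the relevance score.
    return torch.softmax(logits[[true_id, false_id]], dim=0)[0].item()
```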

@x-tabdeveloping (Collaborator) commented

@sam-hey Yeah, I think that would probably make sense (an enum or Literal[...]).
What categories would you then choose, though?

@sam-hey (Contributor) commented Jan 27, 2025

> Thanks for raising @sam-hey!
>
> I can definitely see the benefit! On the other hand, having it standardized makes it so each model class has the same function and is more reliable that way.
>
> I can see both sides, but personally I think I would prefer to keep the core search functions in MTEB, so users can see them there and assume each model searches the same within their own "class" (e.g. that all dense retrievers use the same base functionality). I think it'd be great if we made BM25 a first-class MTEB model so we didn't have to rely on that (and could also add other sparse non-neural versions like Pyserini).
>
> OTOH, there are probably 3-ish other model "classes" or types that would involve different search functionality: multi-vector (like ColBERT, as you say), and then perhaps neural sparse retrieval (like SPLADE) and generative retrieval.
>
> So we should definitely make it so that each of these could be added, which as @KennethEnevoldsen says likely involves a change to the interface. But since there are fewer than 10 model "classes", it seems like we could do that with an if statement. But perhaps it's too early in the morning and I'm missing something!

Originally posted by @orionw in #1826

I believe the categories would be more or less the same, @x-tabdeveloping.

@Samoed (Collaborator, Author) commented Jan 27, 2025

I think we can add these model types (rough sketch after the list):

  • Encoder
  • SparseEncoder (ColBERT, BM25)
  • BiEncoder
  • CrossEncoder
  • T5 (rename)
  • LLM
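
As a rough sketch of the Literal variant @x-tabdeveloping mentioned (the string names are placeholders, open to discussion):

```python
from typing import Literal

# Hypothetical annotation mirroring the list above.
ModelType = Literal[
    "encoder",
    "sparse_encoder",  # ColBERT, BM25
    "bi_encoder",
    "cross_encoder",
    "t5",              # to be renamed
    "llm",
]
```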

@orionw (Contributor) commented Jan 27, 2025

Catching up a bit, but are these supposed to be model "classes" or "types" or just general descriptions? This is for the leaderboard, so I could see either way.

If model "classes", it should probably be each type that has a unique way of searching (and correspond to searching code in the repo). e.g. for encoders it's a dense encode followed by some similarity func. For cross-encoders it's prediction over all (query, doc) pairs. For sparse models, it's an index-building step followed by a search step. And so on. If this is the case I think we can simplify that list a bit and exclude LLMs since they're cross-encoders.

If we're adding general descriptors then I think it makes sense to add fine-grained labels like LLM (zero-shot), Cross-Encoder (which implies fine-tuning), Bi-Encoder, Late-Interaction, etc.
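
To make the "unique way of searching" distinction concrete, here is a hedged sketch of the three flows described above (all model methods are hypothetical placeholders, not MTEB's actual interface):

```python
import numpy as np

def rank(model, query: str, docs: list[str], model_class: str) -> np.ndarray:
    """Return doc indices, best first; dispatch over hypothetical model classes."""
    if model_class == "encoder":
        # Dense encode, then a similarity function over the embeddings.
        q, d = model.encode([query]), model.encode(docs)
        return np.argsort(q @ d.T)[0][::-1]
    if model_class == "cross_encoder":
        # One prediction per (query, doc) pair, full attention inside the model.
        scores = model.predict([(query, doc) for doc in docs])
        return np.argsort(scores)[::-1]
    if model_class == "sparse":
        # Index-building step first, then a search step against the index.
        index = model.build_index(docs)
        return index.search(query)
    raise ValueError(f"unknown model class: {model_class}")
```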

@Samoed (Collaborator, Author) commented Jan 27, 2025

These names are only for descriptions and the leaderboard.

@Samoed (Collaborator, Author) commented Jan 31, 2025

@x-tabdeveloping What do you think we should do?
