add is_cross_encoder #1869
Conversation
Suggestion: If I understand the intent of this PR correctly, the primary focus is to enable filtering on the leaderboard. Wouldn't it make more sense to add a list of model types as an Enum? This way, it would also allow filtering for specific categories like Late Interaction models, or other types that might be added in the future.
Thanks for looking into this. I think setting `False` as a default would make more sense, since most models are not cross-encoders, but I'd love to hear your arguments about this.
@@ -103,6 +104,7 @@ class ModelMeta(BaseModel):
    training_datasets: dict[str, list[str]] | None
    adapted_from: str | None = None
    superseded_by: str | None = None
    is_cross_encoder: bool | None = None
Shouldn't the default be `False`? Doesn't `None` signal that we don't know?
Yes, `None` means we don't know. I think we should specify `True` or `False` only if someone has explicitly added it, because in other cases it might be misleading.
@@ -591,6 +601,7 @@ def get_prediction_tokens(self, *args, **kwargs):
    use_instructions=None,
    training_datasets=None,
    framework=["PyTorch"],
    is_cross_encoder=None,
Why is it `None`? Do we not know whether it is a cross-encoder or not?
This is the Mistral LLM, and because it is not truly a cross-encoder I marked it as `None` (like the other LLM and T5 models).
I think I added this as a cross-encoder. I used it like the MonoT5 reranker but zero-shot (no retrieval training).
It's actually not bad zero-shot, although of course worse than if trained.
I'd add it as a cross-encoder in this setting, since it's doing full attention between query and doc. But I agree it's a weird case.
@sam-hey Yeah, I think that would probably make sense (an enum or
Originally posted by @orionw in #1826. I believe the categories would be more or less the same, @x-tabdeveloping.
I think we can add these model types:
Catching up a bit, but are these supposed to be model "classes" or "types", or just general descriptions? This is for the leaderboard, so I could see it either way.

If model "classes", it should probably be each type that has a unique way of searching (and correspond to search code in the repo). E.g. for encoders it's a dense encode followed by some similarity function; for cross-encoders it's a prediction over all (query, doc) pairs; for sparse models it's an index-building step followed by a search step; and so on. If this is the case, I think we can simplify that list a bit and exclude LLMs, since they're cross-encoders.

If we're adding general descriptors, then I think it makes sense to add fine-grained labels like LLM (zero-shot), Cross-Encoder (which implies fine-tuning), Bi-Encoder, Late-Interaction, etc.
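The point that each model "class" implies a distinct search procedure can be sketched as follows. All names here are hypothetical toy functions, not mteb APIs:

```python
def dense_search(encode, query, docs):
    """Bi-encoder style: encode everything once, rank by dot-product similarity."""
    qv = encode(query)
    sims = [sum(a * b for a, b in zip(qv, encode(d))) for d in docs]
    # Highest similarity first
    return [d for _, d in sorted(zip(sims, docs), reverse=True)]


def cross_encoder_search(score_pair, query, docs):
    """Cross-encoder style: one joint forward pass per (query, doc) pair."""
    return sorted(docs, key=lambda d: score_pair(query, d), reverse=True)
```

Under this reading, an LLM used as a zero-shot reranker still runs the cross-encoder procedure, which is why the comment above suggests folding LLMs into that class rather than giving them their own.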
These namings are only for description and the leaderboard.
@x-tabdeveloping What do you think we should do?
Checklist

- Run tests with `make test`.
- Run `make lint`.

Small step toward #1841. There are a lot of `T5` and `LLM` models that are used for reranking tasks, but I'm not sure how to annotate them.