Add nomic-embed-text-v2-moe #193

Open
PierreMesure opened this issue Feb 13, 2025 · 2 comments

Comments

@PierreMesure

Can we add this new MoE model to the benchmark? It looks promising on paper! 😊

PS: I would have happily sent a PR, but the other additions I saw in the repo were all made by you, @KennethEnevoldsen, and they have 80+ files changed, so it looks a bit too advanced for an external contributor. Tell me if I'm wrong and I'll take a second look.

@KennethEnevoldsen
Owner

Hi @PierreMesure, I think this is a great candidate for a PR.

@jalkestrup opened a PR a while ago that added a new model (#191).

However, in your case you don't even need to implement the model: you can simply use the sentence transformer wrapper. So you probably only need to add the metadata, following the pattern of this entry:

@models.register("paraphrase-multilingual-MiniLM-L12-v2")
def create_multilingual_mini_lm_l12_v2() -> SebModel:
    hf_name = "sentence-transformers/paraphrase-multilingual-MiniLM-L12-v2"
    meta = ModelMeta(
        name=hf_name.split("/")[-1],
        huggingface_name=hf_name,
        reference=f"https://huggingface.co/{hf_name}",
        languages=[],
        open_source=True,
        embedding_size=384,
        architecture="BERT",
        release_date=date(2021, 6, 2),
    )
    return SebModel(
        encoder=LazyLoadEncoder(partial(wrap_sentence_transformer, model_name=hf_name)),  # type: ignore
        meta=meta,
    )

You can add it to this script here:
https://github.com/KennethEnevoldsen/scandinavian-embedding-benchmark/blob/main/src/seb/registered_models/sentence_transformer_models.py

What will take the longest is running the model. Once the model is registered, you can do that by running:

seb run
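
For anyone new to this pattern, here is a minimal, self-contained sketch of what a decorator-based registry with lazy loading looks like. It is not seb's implementation: `Registry`, `Meta`, `LazyEncoder` and `fake_load` are hypothetical stand-ins for `models`, `ModelMeta`, `LazyLoadEncoder` and `wrap_sentence_transformer`, and the only model-specific value used is the Hugging Face name `nomic-ai/nomic-embed-text-v2-moe`. Fields such as `embedding_size`, `architecture` and `release_date` are left out on purpose and should be taken from the model card.

```python
from dataclasses import dataclass
from functools import partial
from typing import Callable, Dict, List, Optional, Tuple


@dataclass
class Meta:
    """Hypothetical stand-in for the metadata object (not seb's ModelMeta)."""

    name: str
    huggingface_name: str
    reference: str


class LazyEncoder:
    """Hypothetical stand-in: defers building the encoder until first use."""

    def __init__(self, loader: Callable[[], object]) -> None:
        self._loader = loader
        self._encoder: Optional[object] = None

    @property
    def loaded(self) -> bool:
        return self._encoder is not None

    def get(self) -> object:
        # Build the encoder on first access, then reuse it.
        if self._encoder is None:
            self._encoder = self._loader()
        return self._encoder


class Registry:
    """Maps a model name to a factory function that builds it on demand."""

    def __init__(self) -> None:
        self._factories: Dict[str, Callable[[], Tuple[Meta, LazyEncoder]]] = {}

    def register(self, name: str):
        def decorator(fn):
            self._factories[name] = fn
            return fn

        return decorator

    def get(self, name: str) -> Tuple[Meta, LazyEncoder]:
        return self._factories[name]()

    def names(self) -> List[str]:
        return sorted(self._factories)


models = Registry()


def fake_load(model_name: str) -> str:
    # Hypothetical loader: a real one would download and wrap the model.
    return f"encoder for {model_name}"


@models.register("nomic-embed-text-v2-moe")
def create_nomic_embed_text_v2_moe() -> Tuple[Meta, LazyEncoder]:
    hf_name = "nomic-ai/nomic-embed-text-v2-moe"
    meta = Meta(
        name=hf_name.split("/")[-1],
        huggingface_name=hf_name,
        reference=f"https://huggingface.co/{hf_name}",
    )
    # partial() binds the model name now; nothing is loaded until .get().
    return meta, LazyEncoder(partial(fake_load, model_name=hf_name))
```

The point of the lazy wrapper is that registering or listing models stays cheap: the expensive load only happens when a benchmark run actually asks for the encoder.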

@PierreMesure
Author

Awesome, will try in the coming days!
