Add nomic-embed-text-v2-moe #193

PierreMesure · 2025-02-13T09:54:48Z

Can we add this new MoE model to the benchmark? It looks promising on paper! 😊

PS: I would have happily sent a PR, but the other additions I saw in the repo were all made by you, @KennethEnevoldsen, and each of them changes 80+ files, so it looks a bit too advanced for an external contributor. Tell me if I'm wrong and I'll take a second look.

KennethEnevoldsen · 2025-02-13T15:48:14Z

Hi @PierreMesure, I think this would be a great PR.

@jalkestrup opened a PR a while ago adding a new model (#191).

However, in your case you don't even need to implement the model; you can simply use the sentence transformer wrapper. So you probably only need to add the metadata, following the pattern of an existing entry such as this one:

@models.register("paraphrase-multilingual-MiniLM-L12-v2")
def create_multilingual_mini_lm_l12_v2() -> SebModel:
    hf_name = "sentence-transformers/paraphrase-multilingual-MiniLM-L12-v2"
    meta = ModelMeta(
        name=hf_name.split("/")[-1],
        huggingface_name=hf_name,
        reference=f"https://huggingface.co/{hf_name}",
        languages=[],
        open_source=True,
        embedding_size=384,
        architecture="BERT",
        release_date=date(2021, 6, 2),
    )
    return SebModel(
        encoder=LazyLoadEncoder(partial(wrap_sentence_transformer, model_name=hf_name)),  # type: ignore
        meta=meta,
    )

You can add it to this script here:
https://github.com/KennethEnevoldsen/scandinavian-embedding-benchmark/blob/main/src/seb/registered_models/sentence_transformer_models.py
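
As a rough sketch of how the metadata above could be adapted for this issue, the snippet below derives the name and reference fields from a Hugging Face identifier. It deliberately uses a hypothetical stand-in dataclass (`ModelMetaSketch`) instead of importing `seb`, so that it runs on its own. The identifier `nomic-ai/nomic-embed-text-v2-moe` is an assumption not stated in this thread, and the embedding size, architecture, release date, and license flag are left unset because they should be taken from the model card rather than guessed.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import List, Optional


@dataclass
class ModelMetaSketch:
    """Hypothetical stand-in mirroring the ModelMeta fields shown above."""

    name: str
    huggingface_name: str
    reference: str
    languages: List[str] = field(default_factory=list)
    # The fields below are intentionally unset: fill them in from the model card.
    open_source: Optional[bool] = None
    embedding_size: Optional[int] = None
    architecture: Optional[str] = None
    release_date: Optional[date] = None


def sketch_meta(hf_name: str) -> ModelMetaSketch:
    """Derive the name and reference URL the same way the existing entry does."""
    return ModelMetaSketch(
        name=hf_name.split("/")[-1],
        huggingface_name=hf_name,
        reference=f"https://huggingface.co/{hf_name}",
    )


# Assumed Hugging Face identifier; verify it before registering the model.
meta = sketch_meta("nomic-ai/nomic-embed-text-v2-moe")
print(meta.name)  # nomic-embed-text-v2-moe
print(meta.reference)
```

In the actual file, the same values would go into the real `ModelMeta` inside a function decorated with `@models.register(...)`, as in the existing entry. Whether the wrapper needs extra loading options for this particular model is not covered in this thread and would need to be checked.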

The part that will take the longest is running the model, which you can do, once it is implemented, by running:

seb run

PierreMesure · 2025-02-13T19:48:26Z

Awesome, will try in the coming days!
