Tom Aarsen (tomaarsen)

Sentence Transformers, SetFit & NLTK maintainer, ML Engineer & Fellow @ 🤗Hugging Face

751 followers · 0 following

Achievements

x4 x3 x3


Highlights

Pro
1 security advisory credit

Starred repositories

weaviate / t2v-transformers-models

This is the repo for the container that holds the models for the text2vec-transformers module

Python · 49 stars · 29 forks · Updated Feb 1, 2025

jfkback / hypencoder-paper

Official Repository for "Hypencoder: Hypernetworks for Information Retrieval"

Python · 19 stars · 1 fork · Updated Feb 19, 2025

datavistics / encoder-analysis

Analysis on the cost of encoder based models

Jupyter Notebook · 9 stars · 1 fork · Updated Feb 12, 2025

KindXiaoming / grow-crystals

Getting crystal-like representations with harmonic loss

Jupyter Notebook · 175 stars · 7 forks · Updated Feb 7, 2025

huggingface / ai-blueprint

A blueprint for AI development, focusing on applied examples of RAG, information extraction, analysis and fine-tuning in the age of LLMs and agents.

Jupyter Notebook · 50 stars · 4 forks · Updated Feb 6, 2025

embeddings-benchmark / results

Data for the MTEB leaderboard

Python · 18 stars · 37 forks · Updated Feb 27, 2025

hotchpotch / yasem

YASEM - Yet Another Splade|Sparse Embedder - A simple and efficient library for SPLADE embeddings

Python · 6 stars · Updated Dec 16, 2024

MinishLab / vicinity

Lightweight Nearest Neighbors with Flexible Backends

Python · 244 stars · 7 forks · Updated Feb 28, 2025

MinishLab / tokenlearn

Pre-train Static Word Embeddings

Python · 47 stars · 3 forks · Updated Jan 29, 2025

MinishLab / semhash

Fast Semantic Text Deduplication

Python · 545 stars · 23 forks · Updated Feb 28, 2025

JeanKaddour / LAWA

Latest Weight Averaging (NeurIPS HITY 2022)

Python · 28 stars · 2 forks · Updated Jun 20, 2023

huggingface / smolagents

🤗 smolagents: a barebones library for agents. Agents write python code to call tools and orchestrate other agents.

Python · 12,853 stars · 1,172 forks · Updated Feb 28, 2025

smpanaro / ModernBERT-AppleNeuralEngine

ModernBERT model optimized for Apple Neural Engine.

Python · 23 stars · 1 fork · Updated Jan 10, 2025

patchy631 / ai-engineering-hub

Jupyter Notebook · 2,954 stars · 638 forks · Updated Feb 27, 2025

neuml / paperai

📄 🤖 Semantic search and workflows for medical/scientific papers

Python · 1,376 stars · 105 forks · Updated Dec 28, 2024

abhishekkrthakur / chat-ext

chrome & firefox extension to chat with webpages: local llms

JavaScript · 111 stars · 11 forks · Updated Dec 20, 2024

kddubey / mpnrl

Multiple Positives and Negatives Ranking Loss

Python · 6 stars · 1 fork · Updated Jan 18, 2025

s-smits / modernbert-finetune

Fine-tune ModernBERT on a large Dataset with Custom Tokenizer Training

Python · 61 stars · 10 forks · Updated Feb 7, 2025

stefan-it / modern-bert-ner

My NER Experiments with ModernBERT

Python · 17 stars · 1 fork · Updated Jan 5, 2025

brandonstarxel / chunking_evaluation

This package, developed as part of our research detailed in the Chroma Technical Report, provides tools for text chunking and evaluation. It allows users to compare different chunking methods and i…

Python · 244 stars · 39 forks · Updated Sep 27, 2024

AstraBert / SenTrEv

Simple customizable evaluation for text retrieval performance of Sentence Transformers embedders on PDFs

Python · 19 stars · 2 forks · Updated Jan 20, 2025

PrithivirajDamodaran / Route0x

Low latency, High Accuracy, Custom Query routers for Humans and Agents. Built by Prithivi Da

Python · 94 stars · 7 forks · Updated Dec 19, 2024

flairNLP / transformer-ranker

Efficiently find the best-suited language model (LM) for your NLP task

Python · 117 stars · 9 forks · Updated Feb 28, 2025

chonkie-ai / chonkie

🦛 CHONK your texts with Chonkie ✨ - The no-nonsense RAG chunking library

Python · 2,678 stars · 117 forks · Updated Feb 26, 2025

huggingface / optimum-tpu

Google TPU optimizations for transformers models

Python · 100 stars · 24 forks · Updated Jan 21, 2025

ariG23498 / smart-commit

Smart commit messages

Python · 18 stars · 3 forks · Updated Oct 25, 2024

MoritzLaurer / prompt_templates

A library for working with prompt templates locally or on the Hugging Face Hub.

Python · 41 stars · 2 forks · Updated Feb 12, 2025

huggingface / lighteval

Lighteval is your all-in-one toolkit for evaluating LLMs across multiple backends

Python · 1,230 stars · 177 forks · Updated Feb 26, 2025

mit-han-lab / duo-attention

[ICLR 2025] DuoAttention: Efficient Long-Context LLM Inference with Retrieval and Streaming Heads

Python · 430 stars · 26 forks · Updated Feb 10, 2025

ielab / Starbucks

Starbucks: Improved Training for 2D Matryoshka Embeddings

Python · 18 stars · Updated Feb 5, 2025