
Organizations

@nltk @huggingface @embeddings-benchmark @Hugging-Face-Helping-Hand

Starred repositories

This is the repo for the container that holds the models for the text2vec-transformers module

Python 49 29 Updated Feb 1, 2025

Official Repository for "Hypencoder: Hypernetworks for Information Retrieval"

Python 19 1 Updated Feb 19, 2025

Analysis on the cost of encoder based models

Jupyter Notebook 9 1 Updated Feb 12, 2025

Getting crystal-like representations with harmonic loss

Jupyter Notebook 175 7 Updated Feb 7, 2025

A blueprint for AI development, focusing on applied examples of RAG, information extraction, analysis and fine-tuning in the age of LLMs and agents.

Jupyter Notebook 50 4 Updated Feb 6, 2025

Data for the MTEB leaderboard

Python 18 37 Updated Feb 27, 2025

YASEM - Yet Another Splade|Sparse Embedder - A simple and efficient library for SPLADE embeddings

Python 6 Updated Dec 16, 2024

Lightweight Nearest Neighbors with Flexible Backends

Python 244 7 Updated Feb 28, 2025

Pre-train Static Word Embeddings

Python 47 3 Updated Jan 29, 2025

Fast Semantic Text Deduplication

Python 545 23 Updated Feb 28, 2025

Latest Weight Averaging (NeurIPS HITY 2022)

Python 28 2 Updated Jun 20, 2023

🤗 smolagents: a barebones library for agents. Agents write python code to call tools and orchestrate other agents.

Python 12,853 1,172 Updated Feb 28, 2025

ModernBERT model optimized for Apple Neural Engine.

Python 23 1 Updated Jan 10, 2025
Jupyter Notebook 2,954 638 Updated Feb 27, 2025

📄 🤖 Semantic search and workflows for medical/scientific papers

Python 1,376 105 Updated Dec 28, 2024

chrome & firefox extension to chat with webpages: local llms

JavaScript 111 11 Updated Dec 20, 2024

Multiple Positives and Negatives Ranking Loss

Python 6 1 Updated Jan 18, 2025

Fine-tune ModernBERT on a large Dataset with Custom Tokenizer Training

Python 61 10 Updated Feb 7, 2025

My NER Experiments with ModernBERT

Python 17 1 Updated Jan 5, 2025

This package, developed as part of our research detailed in the Chroma Technical Report, provides tools for text chunking and evaluation. It allows users to compare different chunking methods and i…

Python 244 39 Updated Sep 27, 2024

Simple customizable evaluation for text retrieval performance of Sentence Transformers embedders on PDFs

Python 19 2 Updated Jan 20, 2025

Low latency, High Accuracy, Custom Query routers for Humans and Agents. Built by Prithivi Da

Python 94 7 Updated Dec 19, 2024

Efficiently find the best-suited language model (LM) for your NLP task

Python 117 9 Updated Feb 28, 2025

🦛 CHONK your texts with Chonkie ✨ - The no-nonsense RAG chunking library

Python 2,678 117 Updated Feb 26, 2025

Google TPU optimizations for transformers models

Python 100 24 Updated Jan 21, 2025

Smart commit messages

Python 18 3 Updated Oct 25, 2024

A library for working with prompt templates locally or on the Hugging Face Hub.

Python 41 2 Updated Feb 12, 2025

Lighteval is your all-in-one toolkit for evaluating LLMs across multiple backends

Python 1,230 177 Updated Feb 26, 2025

[ICLR 2025] DuoAttention: Efficient Long-Context LLM Inference with Retrieval and Streaming Heads

Python 430 26 Updated Feb 10, 2025

Starbucks: Improved Training for 2D Matryoshka Embeddings

Python 18 Updated Feb 5, 2025