Feature Request: 1-bit Embedding search #4510

Open
samueldashadrach opened this issue Jan 27, 2025 · 1 comment
Assignees: BagritsevichStepan
Labels: enhancement (New feature or request), minor (nice to have)

Comments

samueldashadrach commented Jan 27, 2025

What?

DragonflyDB supports vector embedding search via the FT.SEARCH command. However, it only supports embedding search where each vector stores its weights as float32 (4 bytes per weight).

Please prioritise supporting embedding search using 1-bit weights. It would also be nice to support intermediate levels like 2-bit and 4-bit embeddings, but 1-bit is what I'm most interested in.

It is not mandatory for you to provide any logic for how to generate or quantise the embeddings. Search is what is most important.

It is okay if your implementation's search time is 50 milliseconds worse than the state of the art. It will still be useful if it is as easy to use as the rest of DragonflyDB. Ease of use is an important factor that some libraries fail on.
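To make the request concrete, here is a minimal sketch of what 1-bit embedding search amounts to, written in plain NumPy with made-up data. This is not DragonflyDB's API, and the function names are my own: sign-quantise each float32 vector to a packed bit code, then rank candidates by Hamming distance (XOR plus popcount).

```python
import numpy as np

def quantise_1bit(vectors: np.ndarray) -> np.ndarray:
    """Sign-quantise float32 vectors to packed 1-bit codes (uint8, 1/32 the size)."""
    return np.packbits(vectors > 0, axis=1)

def hamming_search(codes: np.ndarray, query_code: np.ndarray, k: int) -> np.ndarray:
    """Return indices of the k codes nearest to the query in Hamming distance."""
    xor = np.bitwise_xor(codes, query_code)          # differing bits, still packed
    dists = np.unpackbits(xor, axis=1).sum(axis=1)   # popcount per row
    return np.argsort(dists)[:k]

rng = np.random.default_rng(0)
db = rng.standard_normal((1000, 768)).astype(np.float32)  # hypothetical corpus
codes = quantise_1bit(db)                                 # 96 bytes/vector instead of 3072
query = db[42] + 0.01 * rng.standard_normal(768).astype(np.float32)
top = hamming_search(codes, quantise_1bit(query[None, :]), k=5)
```

A brute-force scan like this is already fast because XOR/popcount over packed bits is cache-friendly; a production implementation would pair the same codes with an ANN index.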

Why?

In practice, quantising embeddings drastically reduces RAM requirements without significant loss in accuracy, so many devs are quantising their embeddings instead.

  • Accuracy loss is not significant; there is empirical data you can google on this, and I've observed the same on my own dataset.
  • Hosting 32-bit embeddings versus 1-bit embeddings can mean the difference between paying $10,000/month and $300/month in hosting costs. Devs working on independent projects or in small startups will obviously pick the latter.
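The cost gap follows directly from the storage arithmetic. A back-of-the-envelope check, using illustrative corpus sizes of my own choosing (not figures from DragonflyDB or any provider):

```python
# RAM needed to hold 100M vectors of 1024 dims, illustrative numbers only.
n_vectors, dims = 100_000_000, 1024

float32_bytes = n_vectors * dims * 4    # 4 bytes per weight
one_bit_bytes = n_vectors * dims // 8   # 1 bit per weight, packed into bytes

print(float32_bytes / 2**30)  # ~381 GiB
print(one_bit_bytes / 2**30)  # ~12 GiB, a flat 32x reduction
```

A 32x smaller working set is the difference between a fleet of high-memory machines and a single modest instance, which is where the hosting-bill gap comes from.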

This is one of the highest revenue-generating use cases of the AI revolution. Perplexity recently raised $500 million at an $8 billion valuation largely to build more embedding search. Pinecone is the most popular hosted solution for embedding search; it was last valued at $750 million.

You can use Perplexity, Claude, or ChatGPT yourself to confirm that RAG and embedding search add significant real value to users' lives; they are not just buzzwords.

Whichever library is able to support devs for this use case is very likely to increase in popularity.

Alternatives

I have been using Pinecone for smaller datasets, as it is the fastest way to host embeddings today, IMO. However, it is expensive for large datasets, and it does not support 1-bit embeddings yet.

The FAISS library by Facebook is currently one of the easier-to-use open-source libraries for someone who wants 1-bit embedding search.

ANN Benchmarks tries to benchmark existing embedding search libraries. Not all of these are easy to use, though, and not all of them support quantisation.

@BagritsevichStepan BagritsevichStepan self-assigned this Jan 28, 2025
@BagritsevichStepan (Contributor) commented:

Hi @samueldashadrach, thanks for your request. I will look into it and get back to you later on whether we will add this.

@BagritsevichStepan added the enhancement (New feature or request) and minor (nice to have) labels on Jan 28, 2025