Idea: Support for embedding vector data type #142

s4ke · 2024-10-05T10:13:17Z

With pgvector being available and NF Compose being heavily based on Postgres, it should be fairly easy to add support for vector datatypes in NF Compose. This would increase the usefulness of NF Compose in the AI realm tremendously, as users could use NF Compose for more than an integration tool and a simple API generator. With embedding support it would be possible to get a robust REST API with embeddings in minutes while still being on a well-understood platform such as Postgres.

Benefits over raw pgvector usage:

Spin up a datastore quickly
No need for manual database migrations; the command line client and REST API already take care of most things
No need to maintain any dependencies in the app you are building. Just use the REST API directly as you would with chromadb, but with the possibilities of a mature project and a more stable API.

Problems that need solving:

In order for embedding vector datatypes to be useful, the REST API will need support for custom ordering, i.e. ordering that is not based on insertion order. We never attempted this at the start because we did not have indexing support in Data Series. Now that we have indexing support, it should be easier to give users control over ordering. Problem: "noisy neighbours" that run complex sorts on unindexed data might degrade performance for others more than usual. But the same could be said about filters on large datasets that are not indexed. A better solution would be not to hide functionality but to limit the processing time of queries in the read case. (set statement_timeout TO X; at the start of the query should make this much more stable anyway)
Compound filtering must be thoroughly tested. How does filtering on vectors behave when combined with regular $mongo filters?
We will have to support different filter types and more filter operands
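To make the ordering and timeout idea above a bit more concrete, here is a minimal sketch of how a read query could be assembled. It is not how NF Compose does things today; the function name `build_nearest_query` and the table/column names `documents` and `embedding` are made up for illustration. It only builds the SQL and parameters (psycopg-style `%s` placeholders assumed), using the pgvector distance operators and Postgres' `set_config` with `is_local = true`, so the timeout only holds for the current transaction.

```python
import re

# Plain identifiers only, since table/column names cannot be bound as parameters.
_IDENT = re.compile(r"^[A-Za-z_][A-Za-z0-9_]*$")

# pgvector distance operators.
OPERATORS = {
    "l2": "<->",             # Euclidean distance
    "cosine": "<=>",         # cosine distance
    "inner_product": "<#>",  # negative inner product
}


def build_nearest_query(table, column, query_vector, limit=10,
                        metric="l2", timeout_ms=500):
    """Return [(sql, params), ...] to run in order inside ONE transaction.

    Hypothetical helper: the first statement caps the processing time,
    the second orders rows by distance to query_vector.
    """
    for name in (table, column):
        if not _IDENT.match(name):
            raise ValueError(f"unsafe identifier: {name!r}")
    if metric not in OPERATORS:
        raise ValueError(f"unknown metric: {metric!r}")

    # pgvector accepts a text literal such as '[1.0,0.5]' cast to vector.
    vector_literal = "[" + ",".join(str(float(x)) for x in query_vector) + "]"

    # set_config(name, value, is_local=true) behaves like SET LOCAL,
    # but accepts bind parameters. The value is read as milliseconds.
    timeout_stmt = (
        "SELECT set_config('statement_timeout', %s, true)",
        (str(int(timeout_ms)),),
    )
    select_stmt = (
        f'SELECT * FROM "{table}" '
        f'ORDER BY "{column}" {OPERATORS[metric]} %s::vector LIMIT %s',
        (vector_literal, int(limit)),
    )
    return [timeout_stmt, select_stmt]
```

A caller would execute both statements on the same cursor within one transaction, e.g. `for sql, params in build_nearest_query("documents", "embedding", [1, 0.5]): cur.execute(sql, params)`. Combining this with the existing $mongo filters would mean adding a WHERE clause before the ORDER BY, which is exactly the part that needs testing.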

s4ke · 2024-10-05T10:23:14Z

Very important: We should not shoehorn something into NF Compose just because we can. This ticket is here to keep track of ideas and notes on what could be useful. We will explore these ideas in other projects first.
