Idea: Support for embedding vector data type #142

Open
s4ke opened this issue Oct 5, 2024 · 1 comment

Comments

@s4ke
Member

s4ke commented Oct 5, 2024

With pgvector available and NF Compose heavily based on Postgres, it should be fairly easy to add support for vector data types to NF Compose. This would make NF Compose considerably more useful in the AI realm, since users could then use it for more than integration and simple API generation. With embedding support, it would be possible to get a robust REST API with embeddings in minutes while staying on a well-understood platform such as Postgres.

Benefits over raw pgvector usage:

  1. Spin up a datastore quickly
  2. No need for manual database migrations; the command line client and REST API already take care of most things.
  3. No need to maintain any dependencies in the app you are building. Just use the REST API directly, as you would with chromadb, while getting the benefits of a mature project with a more stable API.

Problems that need solving:

  1. For embedding vector data types to be useful, the REST API needs support for custom ordering, i.e. ordering that is not based on insertion order. We never attempted this at the start because we did not have indexing support in Data Series. Now that we have indexing support, it should be easier to give users control over ordering. Problem: "noisy neighbours" that run complex sorts on unindexed columns might degrade performance for others more than usual. But the same could be said about filters on large datasets that are not indexed. A better solution would be not to hide functionality but to limit the processing time of queries in the read case. (Running set statement_timeout TO X; at the start of the query should make this much more stable anyway.)
  2. Compound filtering must be thoroughly tested: how does filtering on vectors behave when combined with regular $mongo filters?
  3. We will have to support additional filter types and more filter operands.
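
To make points 1 and 2 more concrete, here is a rough sketch of what a read query could look like at the SQL level: a statement timeout, a regular equality filter, and ordering by vector distance in a single query. It only builds the SQL and the bind parameters and does not talk to a database. The operators `<->`, `<=>` and `<#>` are pgvector's distance operators; everything else (the function name `build_nearest_query`, the table and column names, the `%s` placeholder style) is an assumption for illustration and not existing NF Compose code.

```python
from typing import Any, Dict, List, Tuple

# pgvector distance operators: L2, cosine, and negative inner product.
DISTANCE_OPERATORS = {
    "l2": "<->",
    "cosine": "<=>",
    "inner_product": "<#>",
}


def _quote_ident(name: str) -> str:
    """Very conservative identifier check; a real implementation would
    validate against the known schema instead (assumption)."""
    if not name.isidentifier():
        raise ValueError(f"unsafe identifier: {name!r}")
    return f'"{name}"'


def build_nearest_query(
    table: str,
    vector_column: str,
    query_vector: List[float],
    equals_filters: Dict[str, Any],
    metric: str = "l2",
    limit: int = 10,
    timeout_ms: int = 500,
) -> Tuple[List[str], List[Any]]:
    """Hypothetical helper: returns the statements to run inside one
    transaction, plus the bind parameters for the SELECT."""
    operator = DISTANCE_OPERATORS[metric]

    # Regular (non-vector) filters become plain equality conditions.
    where_parts: List[str] = []
    params: List[Any] = []
    for column, value in equals_filters.items():
        where_parts.append(f"{_quote_ident(column)} = %s")
        params.append(value)
    where_sql = f" WHERE {' AND '.join(where_parts)}" if where_parts else ""

    # pgvector accepts a text literal such as '[0.1,0.2]' cast to vector.
    vector_literal = "[" + ",".join(str(float(x)) for x in query_vector) + "]"
    params.extend([vector_literal, int(limit)])

    select_sql = (
        f"SELECT * FROM {_quote_ident(table)}{where_sql} "
        f"ORDER BY {_quote_ident(vector_column)} {operator} %s::vector LIMIT %s"
    )
    # SET LOCAL scopes the timeout (in milliseconds) to the current transaction.
    timeout_sql = f"SET LOCAL statement_timeout = {int(timeout_ms)}"
    return [timeout_sql, select_sql], params
```

For example, `build_nearest_query("docs", "embedding", [0.1, 0.2], {"lang": "en"}, metric="cosine", limit=5)` yields a timeout statement followed by a `SELECT` that filters on `lang` and orders by cosine distance. Whether such a query can use a vector index when combined with other filters is exactly the kind of thing point 2 says needs testing.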
@s4ke
Member Author

s4ke commented Oct 5, 2024

Very important: We should not shoehorn something into NF Compose just because we can. This ticket is here to keep track of ideas and notes on what could be useful. We will explore these ideas in other projects first.
