Read path #7

Open
piodul opened this issue Jan 9, 2025 · 0 comments

The goal is to make it possible to issue an ANN (approximate nearest neighbor) query, assuming an already populated index in an OpenSearch cluster.

The idea is to reuse the existing infrastructure for querying secondary indexes. A secondary index is a helper table whose partition key is the indexed column and whose clustering key is the full key of the table being indexed. Currently, when you query a table with WHERE indexed_column = X, the following happens:

  1. The partition corresponding to the value of the indexed column is queried in the index (SELECT * FROM index_table WHERE pk = X).
  2. For each row in the result of the previous query, we extract the key of the base table and query the corresponding row in the base table.

The idea is to plug into step (1) and replace it with a query to the OpenSearch instance.
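To illustrate the shape of the change, here is a minimal Python sketch of the two-step read with step (1) made pluggable. It is not Scylla code: the in-memory tables, `indexed_select`, `read_posting_list_from_index`, and `fake_ann_lookup` are hypothetical names used only for illustration, and the ANN lookup is a stub standing in for a real OpenSearch request.

```python
from typing import Callable, Dict, Iterable, List, Tuple

# Hypothetical in-memory stand-ins; none of these names exist in Scylla.
BaseKey = Tuple[int, int]   # (partition key, clustering key) of the base table
Row = Dict[str, object]


def read_posting_list_from_index(
    index_table: Dict[object, List[BaseKey]], value: object
) -> List[BaseKey]:
    """Step (1) as described above: read the index partition for `value`."""
    return list(index_table.get(value, []))


def indexed_select(
    find_keys: Callable[[], Iterable[BaseKey]], base_table: Dict[BaseKey, Row]
) -> List[Row]:
    """Run step (1) through `find_keys`, then step (2) against the base table."""
    rows: List[Row] = []
    for key in find_keys():
        row = base_table.get(key)
        if row is not None:  # skip keys that have no matching base row
            rows.append(row)
    return rows


base_table: Dict[BaseKey, Row] = {
    (1, 10): {"id": 1, "color": "red"},
    (2, 20): {"id": 2, "color": "blue"},
    (3, 30): {"id": 3, "color": "red"},
}
index_table: Dict[object, List[BaseKey]] = {
    "red": [(1, 10), (3, 30)],
    "blue": [(2, 20)],
}

# Regular secondary-index read: step (1) hits the index table.
by_index = indexed_select(
    lambda: read_posting_list_from_index(index_table, "red"), base_table
)


def fake_ann_lookup() -> List[BaseKey]:
    """Stub for the OpenSearch call; returns base keys ordered by similarity."""
    return [(2, 20), (1, 10)]


# ANN read: only step (1) changes, step (2) is reused as-is.
by_ann = indexed_select(fake_ann_lookup, base_table)
```

The point of the sketch is that step (2) only needs a sequence of base table keys, so whatever replaces step (1) has to return keys in that form and, for ANN, in similarity order.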

In addition to this, the syntax for ANN queries needs to be implemented (ORDER BY column_of_vector_type ANN OF ...).
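For reference, a small Python helper that formats a query in this syntax might look as follows. The helper name `ann_query` and the table and column names are made up for illustration, and the trailing `LIMIT` clause follows the examples in the Cassandra documentation linked under Tips rather than anything decided in this issue.

```python
from typing import Sequence


def ann_query(
    table: str, vector_column: str, query_vector: Sequence[float], limit: int
) -> str:
    """Format a CQL ANN query string (hypothetical helper, e.g. for tests)."""
    literal = ", ".join(repr(float(x)) for x in query_vector)
    return (
        f"SELECT * FROM {table} "
        f"ORDER BY {vector_column} ANN OF [{literal}] LIMIT {limit}"
    )


# Example with made-up keyspace, table and column names.
query = ann_query("ks.items", "embedding", [0.1, 0.2, 0.3], 3)
```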

Tips

  • The code implementing the two-step algorithm above is in the cql3/statements/select_statement.cc file, in the indexed_table_select_statement class; the code responsible for step (1) seems to be in the read_posting_list method.
  • Short note about the syntax of ANN queries: https://cassandra.apache.org/doc/latest/cassandra/vector-search/vector-search-working-with.html#query-vector-data-with-cql. For now, implementing the similarity_{dot_product,cosine,euclidean} functions can be skipped.
  • For now, tests can assume that a vector index is already created in the OpenSearch cluster; the tests should pre-populate the index manually and only query data through Scylla.
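As a rough sketch of what the manual pre-population could involve, the Python snippet below builds the request bodies a test might send to OpenSearch: an index definition with a `knn_vector` field, an NDJSON payload for the `_bulk` endpoint, and a k-NN query body. Only the body shapes come from the OpenSearch k-NN plugin; the function names, the index and field names, and the choice to store the base table key in `pk` and `ck` fields are assumptions made for illustration. The snippet does not perform any network calls.

```python
import json
from typing import Dict, Sequence


def knn_index_body(field: str, dimension: int) -> Dict:
    """Body for creating an index with one k-NN vector field."""
    return {
        "settings": {"index": {"knn": True}},
        "mappings": {
            "properties": {field: {"type": "knn_vector", "dimension": dimension}}
        },
    }


def bulk_body(index: str, docs: Dict[str, Dict]) -> str:
    """NDJSON payload for _bulk: one action line plus one document line each."""
    lines = []
    for doc_id, doc in docs.items():
        lines.append(json.dumps({"index": {"_index": index, "_id": doc_id}}))
        lines.append(json.dumps(doc))
    return "\n".join(lines) + "\n"


def knn_query_body(field: str, query_vector: Sequence[float], k: int) -> Dict:
    """Body for a k-NN search returning the k nearest documents."""
    return {
        "size": k,
        "query": {"knn": {field: {"vector": list(query_vector), "k": k}}},
    }


# Assumption: each document carries the base table key alongside the vector.
docs = {
    "1:10": {"pk": 1, "ck": 10, "vec": [1.0, 0.0, 0.0]},
    "2:20": {"pk": 2, "ck": 20, "vec": [0.0, 1.0, 0.0]},
}
create_body = knn_index_body("vec", 3)
payload = bulk_body("test_index", docs)
search_body = knn_query_body("vec", [0.9, 0.1, 0.0], 2)
```

A test would send `create_body`, `payload`, and then issue the ANN query through Scylla, which in turn would be expected to send something like `search_body` to OpenSearch.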