Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

MB-62230 - Support for pre-filtering with kNN #255

Merged
merged 12 commits into from
Sep 9, 2024
Merged

Conversation

metonymic-smokey
Copy link
Member

@metonymic-smokey metonymic-smokey commented Aug 12, 2024

  1. Accommodates filtered doc IDs, if required, within a kNN search over a vector index.
  2. Builds and caches a document to vector ID map for looking up vector IDs of the filtered doc IDs.

faiss_vector_posting.go Outdated Show resolved Hide resolved

// vector IDs corresponding to the local doc numbers to be
// considered for the search
vectorIDsToInclude := make([]int64, eligibleDocIDs.Stats().Cardinality)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Lot of allocations here - which is not good ..

  • make of an array of int64s
  • ToArray() below allocates an array as well

Ideally just ToArray() should've sufficed, but if you absolutely cannot work with the return type, then you should just iterate over the bitmap to populate vectorIDsToInclude as opposed to converting it and then reconverting it - which is quite bad!

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

VectorIDs are int64 hashes of the original float32 vector. Hence, the conversion.

@metonymic-smokey metonymic-smokey marked this pull request as draft August 26, 2024 04:54
@metonymic-smokey metonymic-smokey marked this pull request as ready for review August 26, 2024 09:00
faiss_vector_cache.go Outdated Show resolved Hide resolved
faiss_vector_cache.go Show resolved Hide resolved
faiss_vector_posting.go Outdated Show resolved Hide resolved
}

scores, ids, err = vecIndex.SearchWithIDs(qVector, k,
vectorIDsToInclude, params)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shouldn't you be removing vectorIDsToExclude from vectorIDsToInclude before this or are we certain there never really is going to be an overlap there because of the pre-filter?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As per my understanding, deleted results aren't returned as part of the initial filter search, whose results form the basis of the vector include list.

@metonymic-smokey metonymic-smokey force-pushed the pre-filter branch 3 times, most recently from c08d676 to c856dd1 Compare August 28, 2024 13:34
faiss_vector_cache.go Outdated Show resolved Hide resolved
faiss_vector_cache.go Show resolved Hide resolved
Copy link
Member

@abhinavdangeti abhinavdangeti left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Couple minor comments, but looking good to me @metonymic-smokey .

faiss_vector_cache.go Outdated Show resolved Hide resolved
faiss_vector_cache.go Outdated Show resolved Hide resolved
abhinavdangeti
abhinavdangeti previously approved these changes Sep 9, 2024
@abhinavdangeti
Copy link
Member

@metonymic-smokey no more force pushing here please :)

@abhinavdangeti abhinavdangeti merged commit ba364ef into master Sep 9, 2024
6 checks passed
moshaad7 pushed a commit that referenced this pull request Sep 12, 2024
*Accommodates filtered doc IDs, if required, within a kNN search over a vector index.
* Builds and caches a document to vector ID map for looking up vector IDs of the filtered doc IDs.
* Account for nested vectors
* Upgrade bleve_index_api, scorch_segment_api, go-faiss & workflows

---------

Co-authored-by: Abhinav Dangeti <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants