MB-62230 - Support for pre-filtering with kNN #255
metonymic-smokey commented Aug 12, 2024 (edited)
- Accommodates filtered doc IDs, if required, within a kNN search over a vector index.
- Builds and caches a document to vector ID map for looking up vector IDs of the filtered doc IDs.
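
A minimal sketch of the "build once, then reuse" idea behind the document to vector ID map. Everything here is hypothetical: the type docVecIDCache, its fields, and the assumption that the segment already holds a vector ID to doc number mapping to invert are illustrative and not taken from the PR.

```go
package main

import "sync"

// docVecIDCache is a hypothetical stand-in for the cached mapping described
// above; the type and field names are illustrative, not taken from the PR.
type docVecIDCache struct {
	once     sync.Once
	docToVec map[uint32][]int64 // local doc number -> vector IDs
}

// build inverts an (assumed) vector ID -> doc number mapping exactly once
// and returns the cached result on every later call.
func (c *docVecIDCache) build(vecToDoc map[int64]uint32) map[uint32][]int64 {
	c.once.Do(func() {
		c.docToVec = make(map[uint32][]int64, len(vecToDoc))
		for vecID, docNum := range vecToDoc {
			// A doc may own several vectors, so keep a slice per doc.
			c.docToVec[docNum] = append(c.docToVec[docNum], vecID)
		}
	})
	return c.docToVec
}
```

Keeping a slice per doc number lets one document map to more than one vector ID. A second call to build returns the cached map and ignores its argument, so the sketch only fits data that does not change after the first build.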
356757b to e213fdd
faiss_vector_posting.go (outdated)

    // vector IDs corresponding to the local doc numbers to be
    // considered for the search
    vectorIDsToInclude := make([]int64, eligibleDocIDs.Stats().Cardinality)
There are a lot of allocations here, which is not good: make allocates an array of int64s, and ToArray() below allocates an array as well.
Ideally just ToArray() should have sufficed, but if you absolutely cannot work with its return type, then you should just iterate over the bitmap to populate vectorIDsToInclude, as opposed to converting it and then reconverting it, which is quite bad!
VectorIDs are int64 hashes of the original float32 vector. Hence, the conversion.
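
For illustration, the single-pass approach suggested above might look roughly like the sketch below. It is not the code that landed in the PR. The docIterator interface is modeled on the HasNext/Next shape of a roaring bitmap iterator but is declared locally so the example has no external dependencies; sliceIterator, collectVectorIDs, and the docToVec map are hypothetical names.

```go
package main

// docIterator mirrors the HasNext/Next shape of a roaring bitmap iterator;
// it is declared locally so the sketch needs no external packages.
type docIterator interface {
	HasNext() bool
	Next() uint32
}

// sliceIterator is a hypothetical stand-in for a real bitmap iterator.
type sliceIterator struct {
	docs []uint32
	pos  int
}

func (s *sliceIterator) HasNext() bool { return s.pos < len(s.docs) }

func (s *sliceIterator) Next() uint32 {
	d := s.docs[s.pos]
	s.pos++
	return d
}

// collectVectorIDs walks the eligible doc numbers once and appends their
// int64 vector IDs directly, skipping the intermediate []uint32 copy.
func collectVectorIDs(it docIterator, cardinality int, docToVec map[uint32][]int64) []int64 {
	out := make([]int64, 0, cardinality) // length 0; cardinality is only a capacity hint
	for it.HasNext() {
		out = append(out, docToVec[it.Next()]...)
	}
	return out
}
```

The only allocation tied to the filter size is the output slice itself. Doc numbers with no entry in docToVec contribute nothing, since appending a nil slice is a no-op.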
9c4107d to 44193bf
    }

    scores, ids, err = vecIndex.SearchWithIDs(qVector, k,
        vectorIDsToInclude, params)
Shouldn't you be removing vectorIDsToExclude from vectorIDsToInclude before this, or are we certain there never really is going to be an overlap there because of the pre-filter?
As per my understanding, deleted results aren't returned as part of the initial filter search, whose results form the basis of the vector include list.
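
If an overlap between the two lists ever did need to be handled defensively, the subtraction asked about above could be sketched as follows. This is an assumption-laden illustration, not something the PR does: removeExcluded is a hypothetical helper, and the reply above indicates the pre-filter already keeps deleted docs out of the include list.

```go
package main

// removeExcluded returns the IDs from include that do not appear in exclude.
// It filters in place, so the caller should use only the returned slice.
func removeExcluded(include, exclude []int64) []int64 {
	if len(exclude) == 0 {
		return include // nothing to subtract
	}
	skip := make(map[int64]struct{}, len(exclude))
	for _, id := range exclude {
		skip[id] = struct{}{}
	}
	kept := include[:0] // reuse the backing array to avoid another allocation
	for _, id := range include {
		if _, found := skip[id]; !found {
			kept = append(kept, id)
		}
	}
	return kept
}
```

The relative order of the surviving IDs is preserved, and the only extra allocation is the lookup set built from the exclude list.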
c08d676 to c856dd1
021a290 to ab1b5d2
ab1b5d2 to cd7a3db
A couple of minor comments, but looking good to me, @metonymic-smokey.
cd7a3db to 53c4de8
@metonymic-smokey no more force pushing here please :)
* Accommodates filtered doc IDs, if required, within a kNN search over a vector index.
* Builds and caches a document to vector ID map for looking up vector IDs of the filtered doc IDs.
* Account for nested vectors
* Upgrade bleve_index_api, scorch_segment_api, go-faiss & workflows

---------

Co-authored-by: Abhinav Dangeti <[email protected]>