feat: implement Approximate Nearest Neighbor support for DDL (CREATE TABLE, CREATE VECTOR INDEX) #124

@odeke-em odeke-em commented Dec 26, 2024

This change adds ANN distance strategies for GoogleSQL semantics.
While here, I also started adding unit tests so that components can
be tested without a running Cloud Spanner instance.

Implements Data Definition Language (DDL) functionality for:

  • CREATE TABLE
  • CREATE VECTOR INDEX
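
As an illustration of the two DDL statements listed above, here is a minimal sketch of building them as plain strings. It is not the code from this PR: `DistanceType`, `VectorIndexSpec`, `create_table_ddl`, and `create_vector_index_ddl` are hypothetical names, the column layout is made up, and the option names (`distance_type`, `tree_depth`, `num_leaves`) and the `vector_length` annotation are assumed from the GoogleSQL vector index syntax. Because the builders are pure functions, they can be unit tested without a running Cloud Spanner instance.

```python
from dataclasses import dataclass
from enum import Enum


class DistanceType(Enum):
    # Assumed to match the values accepted by the distance_type index option.
    COSINE = "COSINE"
    DOT_PRODUCT = "DOT_PRODUCT"
    EUCLIDEAN = "EUCLIDEAN"


@dataclass(frozen=True)
class VectorIndexSpec:
    """Hypothetical description of a vector index (not the package's API)."""

    index_name: str
    table_name: str
    embedding_column: str
    distance_type: DistanceType
    num_leaves: int = 1000
    tree_depth: int = 2


def create_table_ddl(table_name, id_column, embedding_column, vector_length):
    # The embedding column carries a fixed vector length, which is assumed
    # to be required for columns covered by a vector index.
    # Identifiers are interpolated as-is; this sketch does not validate them.
    return (
        f"CREATE TABLE {table_name} (\n"
        f"  {id_column} STRING(36) NOT NULL,\n"
        f"  {embedding_column} ARRAY<FLOAT32>(vector_length=>{vector_length})\n"
        f") PRIMARY KEY ({id_column})"
    )


def create_vector_index_ddl(spec):
    # Rows with a NULL embedding are excluded from the index.
    return (
        f"CREATE VECTOR INDEX {spec.index_name}\n"
        f"  ON {spec.table_name}({spec.embedding_column})\n"
        f"  WHERE {spec.embedding_column} IS NOT NULL\n"
        f"  OPTIONS (distance_type = '{spec.distance_type.value}', "
        f"tree_depth = {spec.tree_depth}, num_leaves = {spec.num_leaves})"
    )


if __name__ == "__main__":
    spec = VectorIndexSpec(
        "DocEmbeddingIndex", "Documents", "embedding", DistanceType.COSINE
    )
    print(create_table_ddl("Documents", "id", "embedding", 128))
    print(create_vector_index_ddl(spec))
```

The resulting strings would then be passed to whatever call applies schema changes to the database; that step is left out here because it needs a live instance.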

Updates #94

@odeke-em odeke-em requested review from a team as code owners December 26, 2024 12:40
@product-auto-label product-auto-label bot added the api: spanner Issues related to the googleapis/langchain-google-spanner-python API. label Dec 26, 2024
@odeke-em odeke-em changed the title Ann support update distance strategies feat: add Approximate Nearest Neighbor support to distance strategies Dec 26, 2024
@odeke-em odeke-em force-pushed the ANN-support-update-distance_strategies branch 2 times, most recently from dfeab0a to 2ef461a Compare December 26, 2024 15:06
@odeke-em odeke-em changed the title feat: add Approximate Nearest Neighbor support to distance strategies feat: implement Approximate Nearest Neighbor support Dec 26, 2024
@odeke-em odeke-em force-pushed the ANN-support-update-distance_strategies branch 3 times, most recently from 36c552c to 64358ab Compare January 20, 2025 08:56
@odeke-em odeke-em changed the title feat: implement Approximate Nearest Neighbor support feat: implement Approximate Nearest Neighbor support for DDL (CREATE TABLE, CREATE VECTOR INDEX) Jan 20, 2025

@gauravpurohit06 gauravpurohit06 left a comment

I have added comments, focusing only on the VectorClass. Please run the linters and add line breaks to improve readability.

Review comments on src/langchain_google_spanner/vector_store.py: 10 (all outdated, resolved)
@odeke-em odeke-em force-pushed the ANN-support-update-distance_strategies branch from 7e5279a to 44d0996 Compare January 21, 2025 10:53
@gauravpurohit06
/gcbrun


@gauravpurohit06 gauravpurohit06 left a comment

Added two more comments. Based on them, you also need to update the function signatures throughout the file; I am not calling out each one explicitly.

Review comments on src/langchain_google_spanner/vector_store.py: 2 (all outdated, resolved)
@odeke-em odeke-em force-pushed the ANN-support-update-distance_strategies branch 3 times, most recently from a791690 to 8ea75e1 Compare January 30, 2025 09:09
This change introduces new nox directives:
* blacken: `nox -s blacken`
* format: `nox -s format` to apply formatting to files
* lint: `nox -s lint` to flag linting issues
* unit: to run unit tests locally

which are the basis to enable scalable development
and continuous testing as I prepare to bring in
Approximate Nearest Neighbors (ANN) functionality into
this package.

Also while here, fixed a typo in the README.rst file
that didn't have the correct import path.
@odeke-em odeke-em force-pushed the ANN-support-update-distance_strategies branch 2 times, most recently from bc0b254 to d654428 Compare January 30, 2025 11:11
@odeke-em odeke-em force-pushed the ANN-support-update-distance_strategies branch from d654428 to 430d14d Compare January 30, 2025 11:15
@odeke-em odeke-em force-pushed the ANN-support-update-distance_strategies branch from 0f2cd8e to 8be267d Compare January 31, 2025 07:45
@odeke-em odeke-em force-pushed the ANN-support-update-distance_strategies branch from b8948a3 to 66930b4 Compare February 1, 2025 15:56

@gauravpurohit06 gauravpurohit06 left a comment

@odeke-em, other than the granular comments, there are inconsistencies in the return types throughout the implementation and in the use of the None keyword. Please correct them.

Review comments on src/langchain_google_spanner/vector_store.py: 10 (all outdated, resolved)
@gauravpurohit06
/gcbrun

@odeke-em odeke-em force-pushed the ANN-support-update-distance_strategies branch 2 times, most recently from 90220f1 to 5b8cfd3 Compare February 3, 2025 11:13
@odeke-em odeke-em force-pushed the ANN-support-update-distance_strategies branch from 5b8cfd3 to 130bc46 Compare February 3, 2025 11:15

@gauravpurohit06 gauravpurohit06 left a comment

In the public APIs, decide whether to use KNN or ANN, and state explicitly which one applies and what the caller should do.
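
One way to make that choice explicit is sketched below, purely as an illustration of the review point and not as the API that was merged. `SearchAlgorithm`, `DistanceStrategy`, and `distance_function_name` are hypothetical names; the returned strings are assumed to be the GoogleSQL exact (KNN) and approximate (ANN) distance function names.

```python
from enum import Enum


class SearchAlgorithm(Enum):
    # Hypothetical enum: the caller must state which search mode is wanted.
    KNN = "knn"
    ANN = "ann"


class DistanceStrategy(Enum):
    COSINE = "COSINE"
    EUCLIDEAN = "EUCLIDEAN"
    DOT_PRODUCT = "DOT_PRODUCT"


# Exact (KNN) distance functions, assumed GoogleSQL names.
_KNN_FUNCTIONS = {
    DistanceStrategy.COSINE: "COSINE_DISTANCE",
    DistanceStrategy.EUCLIDEAN: "EUCLIDEAN_DISTANCE",
    DistanceStrategy.DOT_PRODUCT: "DOT_PRODUCT",
}

# Approximate (ANN) counterparts, assumed GoogleSQL names.
_ANN_FUNCTIONS = {
    DistanceStrategy.COSINE: "APPROX_COSINE_DISTANCE",
    DistanceStrategy.EUCLIDEAN: "APPROX_EUCLIDEAN_DISTANCE",
    DistanceStrategy.DOT_PRODUCT: "APPROX_DOT_PRODUCT",
}


def distance_function_name(strategy, *, algorithm):
    """Return the SQL distance function for the given strategy and algorithm.

    `algorithm` is keyword-only and has no default, so every call site
    has to say KNN or ANN explicitly.
    """
    if algorithm is SearchAlgorithm.ANN:
        return _ANN_FUNCTIONS[strategy]
    if algorithm is SearchAlgorithm.KNN:
        return _KNN_FUNCTIONS[strategy]
    raise ValueError(f"unsupported search algorithm: {algorithm!r}")


if __name__ == "__main__":
    print(distance_function_name(
        DistanceStrategy.COSINE, algorithm=SearchAlgorithm.ANN
    ))
```

Making the argument keyword-only with no default is one design option; another would be separate KNN and ANN entry points, which keeps each signature smaller at the cost of some duplication.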

Review comments on src/langchain_google_spanner/vector_store.py: 2 (both resolved, 1 outdated)
odeke-em commented Feb 3, 2025

@gauravpurohit06 I humbly and strongly recommend that we get this PR in as is, and then I can send follow-ups. First, the PR is already massive: I actually had to delete code from other helpers to keep it manageable. Second, there is your time constraint. Getting 90% in before you are out of the office goes much further than a 5+ week long PR. As I have mentioned a couple of times offline, I plan to send smaller PRs for the other parts, which also allows for much more effective testing.

@gauravpurohit06
/gcbrun

odeke-em commented Feb 4, 2025

@gauravpurohit06 kindly help me run gcbrun.

@gauravpurohit06
/gcbrun


@gauravpurohit06 gauravpurohit06 left a comment

LGTM

@gauravpurohit06 gauravpurohit06 merged commit 5a25f91 into googleapis:main Feb 4, 2025
10 checks passed
@odeke-em odeke-em deleted the ANN-support-update-distance_strategies branch February 4, 2025 10:59