Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix parsing priority issue with NULL/TRUE/FALSE. Add support for IN Clause. #66

Merged
merged 5 commits into from
Feb 17, 2025

Conversation

davidmezzetti
Copy link
Contributor

@davidmezzetti davidmezzetti commented Feb 14, 2025

The recent update of the lark parser to the latest version introduced an issue with properly parsing NULL/TRUE/FALSE. It's matching CNAME first.

lark has the ability to set a priority. This change makes sure the rule for NULL/TRUE/FALSE (case insensitive) is matched before CNAME.

This change also removes deprecated Python builds as I believe this would have been caught by the unit tests otherwise.

Fixes #64. Fixes #65.

@davidmezzetti davidmezzetti changed the title Fix parsing priority issue with NULL/TRUE/FALSE Fix parsing priority issue with NULL/TRUE/FALSE. Add support for IN Clause. Feb 14, 2025
@davidmezzetti
Copy link
Contributor Author

Ended up putting both the changes into a single PR.

The IN clause is related to something I'm working on with txtai. I'm trying to sync the SQL capability and graph querying so that vector search is now a feature with txtai's graph search.

For example:

MATCH P=(A)-[]->(B)
WHERE SIMILAR(A, "vector query")
RETURN P
LIMIT 500

txtai pre-processes this query and will replace similar with an IN clause after doing a vector search lookup. Then the following query is passed to GrandCypher.

MATCH P=(A)-[]->(B)
WHERE A IN [1,2,3...500]
RETURN P
LIMIT 500

I compared doing this vs a list of EQUAL/ORs and the IN clause is significantly faster.

For example, in one test 500 equal/or combos took 1m 56s whereas the same query with an IN clause took 1.4s.

Perhaps in the future, there is a way to build on this as mentioned in #58 in terms of registering custom functions to handle arbitrary logic.

@j6k4m8
Copy link
Member

j6k4m8 commented Feb 15, 2025

Ooh yes I'd love to formalize this with something like #58; @ntjess did some awesome work and I'm still trying to figure out how to upstream it with minimal security question marks. (sorry @ntjess, I didn't forget about you!!!!)

@@ -15,7 +15,7 @@ jobs:
runs-on: ubuntu-latest
strategy:
matrix:
python-version: [3.7, 3.8, 3.9, '3.10', '3.11']
python-version: [3.9, '3.10', '3.11']
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thoughts on including 3.12 and 3.13 now that we're editing this anyway? I agree that EOL versions can go away!

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll add those in and see what happens 😄

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It works!

host.add_edge(2, 3)

qry = """
MATCH (A)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this well-mapped to a cypher implementation? i.e., can you check if a vertex (A) is IN a list? I thought it'd have to be something like A.id IN [...] but I may be misremembering?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I can change the test to do that but I don't believe it would change the code.

Looking at the spec: https://s3.amazonaws.com/artifacts.opencypher.org/openCypher9.pdf

I see this:

MATCH (a)
WHERE a.name IN ['Peter', 'Tobias']
RETURN a.name, a.age
MATCH (n)
WHERE id(n) IN [0, 3, 5]
RETURN n

The latter seems to be the same idea as the unit test within the confines of GrandCypher.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yeah maybe smartest for us to make id a no-op so that the syntax is consistent? I've been (very very gently) trying to keep the queries 1:1 with a "real" neo4j/cypher database so you could ostensibly copy-paste between the two, or just change the execution location to run on a graphdb... but in practice I'll defer to what you think is right here if you think there's an obvious win off-spec?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I just added a new function section to the lark spec and implemented id(). It can be built upon to add additional scalar functions as defined in the spec.

@davidmezzetti
Copy link
Contributor Author

davidmezzetti commented Feb 17, 2025

For now, I'll probably just go with the OR solution until this is ready.

EDIT: I was able to come up with a workaround that gives similar performance. I added an attribute to each node that matches the IN clause and just made that a simple equals check. The IN clause is cleaner but I'm good for now.

If there is further action on this PR or anything else you'd like to see, please let me know.

Copy link
Member

@j6k4m8 j6k4m8 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry for the delay. This is great, merging shortly and will ping when it's up on pypi :)

@j6k4m8 j6k4m8 merged commit ef00478 into aplbrain:master Feb 17, 2025
6 checks passed
@j6k4m8
Copy link
Member

j6k4m8 commented Feb 17, 2025

Available in grand-cypher>=0.13.0!

@davidmezzetti
Copy link
Contributor Author

Wow, faster than I expected, thank you!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Add support for openCypher IN clause Failed tests with new lark library
2 participants