-
Notifications
You must be signed in to change notification settings - Fork 11
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix parsing priority issue with NULL/TRUE/FALSE. Add support for IN Clause. #66
Conversation
Ended up putting both the changes into a single PR. The IN clause is related to something I'm working on with txtai. I'm trying to sync the SQL capability and graph querying so that vector search is now a feature with txtai's graph search. For example: MATCH P=(A)-[]->(B)
WHERE SIMILAR(A, "vector query")
RETURN P
LIMIT 500 txtai pre-processes this query and will replace MATCH P=(A)-[]->(B)
WHERE A IN [1,2,3...500]
RETURN P
LIMIT 500 I compared doing this vs a list of EQUAL/ORs and the IN clause is significantly faster. For example, in one test 500 equal/or combos took Perhaps in the future, there is a way to build on this as mentioned in #58 in terms of registering custom functions to handle arbitrary logic. |
.github/workflows/python-package.yml
Outdated
@@ -15,7 +15,7 @@ jobs: | |||
runs-on: ubuntu-latest | |||
strategy: | |||
matrix: | |||
python-version: [3.7, 3.8, 3.9, '3.10', '3.11'] | |||
python-version: [3.9, '3.10', '3.11'] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thoughts on including 3.12 and 3.13 now that we're editing this anyway? I agree that EOL versions can go away!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'll add those in and see what happens 😄
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It works!
host.add_edge(2, 3) | ||
|
||
qry = """ | ||
MATCH (A) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this well-mapped to a cypher implementation? i.e., can you check if a vertex (A)
is IN
a list? I thought it'd have to be something like A.id IN [...]
but I may be misremembering?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I can change the test to do that but I don't believe it would change the code.
Looking at the spec: https://s3.amazonaws.com/artifacts.opencypher.org/openCypher9.pdf
I see this:
MATCH (a)
WHERE a.name IN ['Peter', 'Tobias']
RETURN a.name, a.age
MATCH (n)
WHERE id(n) IN [0, 3, 5]
RETURN n
The latter seems to be the same idea as the unit test within the confines of GrandCypher.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yeah maybe smartest for us to make id
a no-op so that the syntax is consistent? I've been (very very gently) trying to keep the queries 1:1 with a "real" neo4j/cypher database so you could ostensibly copy-paste between the two, or just change the execution location to run on a graphdb... but in practice I'll defer to what you think is right here if you think there's an obvious win off-spec?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I just added a new function section to the lark spec and implemented id()
. It can be built upon to add additional scalar functions as defined in the spec.
EDIT: I was able to come up with a workaround that gives similar performance. I added an attribute to each node that matches the IN clause and just made that a simple equals check. The If there is further action on this PR or anything else you'd like to see, please let me know. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry for the delay. This is great, merging shortly and will ping when it's up on pypi :)
Available in |
Wow, faster than I expected, thank you! |
The recent update of the
lark
parser to the latest version introduced an issue with properly parsing NULL/TRUE/FALSE. It's matching CNAME first.lark
has the ability to set a priority. This change makes sure the rule for NULL/TRUE/FALSE (case insensitive) is matched before CNAME.This change also removes deprecated Python builds as I believe this would have been caught by the unit tests otherwise.
Fixes #64. Fixes #65.