Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Feature request: grammar / WHERE namespace extensions #58

Open
ntjess opened this issue Dec 19, 2024 · 5 comments
Open

Feature request: grammar / WHERE namespace extensions #58

ntjess opened this issue Dec 19, 2024 · 5 comments

Comments

@ntjess
Copy link
Contributor

ntjess commented Dec 19, 2024

Use case: Cypher supports a robust library for datetime analysis, among other extensions. While it is unrealistic to expect their integration here, it would be nice to allow python equivalents during queries. I envision something like this:

import pandas as pd
GrandCypher(g, namespace={"datetime": pd.to_datetime, ...}).run("""
MATCH (a) --> (b)
WHERE a.date < datetime("2024-01-01")
RETURN a.name
""")

Would love to hear your thoughts.

@ntjess ntjess changed the title Feature request: grammar / eval namespace extensions Feature request: grammar / WHERE namespace extensions Dec 19, 2024
@ntjess
Copy link
Contributor Author

ntjess commented Dec 24, 2024

I was able to achieve something similar with this grammar modification:

%import python.expr_stmt -> python_expr

// Replaces current `where_clause` definition
where_clause: "where"i python_expr

And this implementation:

from lark.reconstruct import Reconstructor

# Add global items here such as "to_datetime": pd.to_datetime
WHERE_EXPRESSION_GLOBALS = {}
...


class _CypherNamespace(dict):
    """
    dot.notation access to dictionary attributes, useful for enabling cypher filtering
    syntax like `m.born < to_datetime("1990")`
    """

    def __getattr__(self, attr):
        out = self.get(attr)
        if isinstance(out, dict):
            return type(self)(out)
        return out

    __setattr__ = dict.__setitem__
    __delattr__ = dict.__delitem__

...

_CypherGrammar = Lark(..., maybe_placeholders=False)
reconstructor = Reconstructor(_CypherGrammar)

... 

# Update these CypherTransformer methods
def where_clause(self, where_clause: list[Tree]):
    self.where_string = reconstructor.reconstruct(where_clause[0])

def _new_where_condition(cname_value_map: dict, target_graph: nx.DiGraph, _):
    if not self.where_string:
        return True, []
    eval_locals = {
        cname: _CypherNamespace(target_graph.nodes[value])
        for cname, value in cname_value_map.items()
    }
    result = eval(self.where_string, WHERE_EXPRESSION_GLOBALS, eval_locals)
    return result, [result]

Currently, it assumes a hard-coded list of globals that the user can update with their own values.

It was a fun intro to lark 🙂

@j6k4m8
Copy link
Member

j6k4m8 commented Dec 31, 2024

@ntjess just wanted to pop in and tell you this looks so awesome — I need to take a closer look here (and in #59) but I'm unfortunately in the midst of my phd dissertation prooposal process and it's using up all my cycles for the next wee or so. But I love what you did here! I'm brainstorming about how we can integrate into the official codebase while keeping back-compat and vuln surface-area low!!

@ntjess
Copy link
Contributor Author

ntjess commented Jan 2, 2025

back-compat

One option is to add python_expr at the end instead of replacing the current where matches. In cases that match python expressions, the same behavior will result. In cases (like contains) that are not valid python, Lark should resolve in favor of legacy behaivor.

vuln surface-area low

I've discovered pd.eval which can help with this. It limits python "control codes" like dot-access and module imports, but of course there will always be vulnerabilities associated with eval'ing python code...

@davidmezzetti
Copy link
Contributor

I wonder if a pattern similar to what the python sqlite3 module does is one to consider. Check out this create_function documentation.

I'd be concerned from a security standpoint to introduce evals into the query syntax. Seems like a lot could go wrong there.

@j6k4m8
Copy link
Member

j6k4m8 commented Feb 15, 2025

I'll also just drop here that if we support create_function / namespace style additions, we should totally make sure that there's a clear error message that says disambiguates things like

  • this is not recognized syntax
  • this is a stdlib function but we haven't implemented it yet
  • this is not a recognized function (maybe you forgot a namespace/create_function?)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants