-
Notifications
You must be signed in to change notification settings - Fork 158
Support filter script push down for calcite #3812
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
Signed-off-by: Heng Qian <[email protected]>
Signed-off-by: Heng Qian <[email protected]>
...ch/src/main/java/org/opensearch/sql/opensearch/storage/script/PPLCompoundedScriptEngine.java
Outdated
Show resolved
Hide resolved
opensearch/src/main/java/org/opensearch/sql/opensearch/request/PredicateAnalyzer.java
Show resolved
Hide resolved
opensearch/src/main/java/org/opensearch/sql/opensearch/request/PredicateAnalyzer.java
Show resolved
Hide resolved
opensearch/src/main/java/org/opensearch/sql/opensearch/storage/script/CalciteScriptEngine.java
Show resolved
Hide resolved
opensearch/src/main/java/org/opensearch/sql/opensearch/storage/script/CalciteScriptEngine.java
Show resolved
Hide resolved
opensearch/src/main/java/org/opensearch/sql/opensearch/storage/script/CalciteScriptEngine.java
Show resolved
Hide resolved
This is a side project that runs the plugin as a standalone CLI and leverages DSL queries to interact with the OpenSearch cluster. |
Yes, we can check the "calcite" related package information in script code to determine the engine type, rather than encode additional information. |
Add option incompatible with rest client is also for security consideration. Otherwise, users can use Rest client to submit java code script without limitation. |
Signed-off-by: Heng Qian <[email protected]>
public static Function1<DataContext, Object[]> compile(String code, Object reason) { | ||
try { | ||
ClassBodyEvaluator cbe = new ClassBodyEvaluator(); | ||
cbe.setClassName("Reducer"); | ||
cbe.setExtendedClass(Utilities.class); | ||
cbe.setImplementedInterfaces(new Class[] {Function1.class, Serializable.class}); | ||
cbe.setParentClassLoader(RexExecutable.class.getClassLoader()); | ||
cbe.cook(new Scanner((String) null, new StringReader(code))); | ||
Class c = cbe.getClazz(); | ||
Constructor<Function1<DataContext, Object[]>> constructor = c.getConstructor(); | ||
return (Function1) constructor.newInstance(); | ||
} catch (IOException | ||
| InstantiationException | ||
| IllegalAccessException | ||
| InvocationTargetException | ||
| NoSuchMethodException | ||
| CompileException var5) { | ||
Exception e = var5; | ||
throw new RuntimeException("While compiling " + reason, e); | ||
} | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Minor: Since we already get the code string, we can new RexExecutable and call getFunction() to avoid this copied method?
Signed-off-by: Heng Qian <[email protected]>
Current implementation depend on Script.java in Core deny any options, if it is not true in future, user can run ANY java code.
Are there any workarounds we can consider? Instead of using raw Java code, can we leverage LINQ4j Enumerable or V2 Expression to limit user input to only allowlisted expressions and reduce the risk? |
// Compile code when creating to detect exception as early as possible | ||
JavaTypeFactoryImpl typeFactory = | ||
new JavaTypeFactoryImpl(rexBuilder.getTypeFactory().getTypeSystem()); | ||
RexToLixTranslator.InputGetter getter = | ||
new ScriptInputGetter(typeFactory, rowType, fieldTypes); | ||
this.code = CalciteScriptEngine.translate(rexBuilder, List.of(rexNode), getter, rowType); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm thinking is it possible we delay this compilation to Java until compile()
in ScriptEngine? In this way, we serialize and deserialize RexCall
instead, similar as V2 expression pushdown? Just some thoughts.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
RexNode is unserializable. We've though of similar method as mentioned in the option2 here: #3379 (comment).
@songkant-aws is working on option2 but seems it's not feasible.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Leveraging Kryo serialization for classes without zero-argument constructor requires explicitly registering specific class serializer. That means we may need to spend more efforts on creating separate customized Kryo serializers for Calcite Expression classes/Enumerable classes.
If we go this way, I'm not confident yet to ensure all kinds of expressions can be correctly serialized.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
RexNode is unserializable.
You mean it's not by JDK or Kryo right? Does other serializer work, or even Json's?
Do we know how many classes related to RexCall
we need to serialize? Just thinking if that's not many and if we can ignore fields not needed by code-gen, can we have customize serializer?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@qianheng-aws I checked the option 2 you posted. I feel it's different?
What I was thinking is:
- Coordinator: RexNode -> serialized code
- Worker: Code -> RexNode -> Linq4j expression -> Interpret or compile to Java
In this way, the script engine can be bounded to allowlisted RexNode.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@songkant-aws Not sure if you've explored the idea above. If so, could you share your PoC branch if any? I can do more test from my side. Thanks!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@dai-chen For the sake of convenience, I pulled Heng's code in my local and add my Kryo serialization prototype code on top of it. And then pushed the code to my personal branch: https://github.com/songkant-aws/sql/tree/kryo-serialization
Basically, the logic is similar. I just feel adding additional serialization logic for third party classes requires too much effort.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we know how many classes related to RexCall we need to serialize?
The common RexNode we need may not be too many. Like RexCall, RexLiteral, RexInputRef, RexLambda, RexLambdaRef... I think the biggest blocker is the inner fields of them. Like RexCall, it has plenty of SqlOperators implemented by Calcite or us.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@songkant-aws @qianheng-aws Got it. Will have a look and also see if other options on my side. Thanks!
Description
Implement filter script push down for calcite by registering a new script language for java code string. This new script language can only be invoked by Node client since it requires a option of
engine_type
which is not allowed in Rest client.Related Issues
Resolves #3379
Check List
--signoff
.By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.