Support "User defined coercion" rules #10423
Comments
I found that to have

    impl UDFImpl for T {
        fn name(&self) -> &str {
            &self.name
        }
        fn coerce_types(&self, data_types: &[DataType]) -> Result<Vec<DataType>> {
            not_impl_err!("Function {} does not implement coerce_types", self.name)
        }
    }
    impl ScalarUDFImpl: UDFImpl
    impl AggregateUDFImpl: UDFImpl

Then we can have

    fn coerce_arguments_for_signature(
        expressions: Vec<Expr>,
        schema: &DFSchema,
        signature: &Signature,
        func: Arc<dyn UDFImpl>,
    ) -> Result<Vec<Expr>> {}

Another alternative is having a duplicate function for scalar and aggregate for a related function:

    fn coerce_arguments_for_signature(
        expressions: Vec<Expr>,
        schema: &DFSchema,
        signature: &Signature,
        func: &ScalarUDF,
    ) -> Result<Vec<Expr>> {}

    fn coerce_arguments_for_signature(
        expressions: Vec<Expr>,
        schema: &DFSchema,
        signature: &Signature,
        func: &AggregateUDF,
    ) -> Result<Vec<Expr>> {}

I think the first option is potentially beneficial in the long run (?) but the user now needs to define two traits. The second option only increases the maintenance cost. What do you think about this @alamb
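To make the first option concrete, here is a minimal, self-contained sketch of a shared base trait with a default `coerce_types`. It is not DataFusion code: `DataType` is a three-variant stand-in for arrow's type, `Result` uses a plain `String` error instead of `not_impl_err!`, the helper `coerce_argument_types` works on types rather than `Expr`s, and `MySum` / `MyAbs` are hypothetical functions invented for illustration.

```rust
use std::sync::Arc;

// Stand-in for arrow's `DataType`; only the variants this sketch needs.
#[derive(Debug, Clone, PartialEq)]
enum DataType {
    Int32,
    Int64,
    Float64,
}

// Stand-in for DataFusion's `Result`, with a plain string error.
type Result<T> = std::result::Result<T, String>;

/// Hypothetical common base trait shared by scalar and aggregate UDFs.
trait UDFImpl {
    fn name(&self) -> &str;

    /// Default: the function does not define custom coercion.
    fn coerce_types(&self, _arg_types: &[DataType]) -> Result<Vec<DataType>> {
        Err(format!(
            "Function {} does not implement coerce_types",
            self.name()
        ))
    }
}

// Supertrait bounds: every scalar or aggregate implementation is also a `UDFImpl`.
trait ScalarUDFImpl: UDFImpl {}
trait AggregateUDFImpl: UDFImpl {}

/// A single coercion entry point that serves both kinds of function.
fn coerce_argument_types(
    arg_types: &[DataType],
    func: Arc<dyn UDFImpl>,
) -> Result<Vec<DataType>> {
    func.coerce_types(arg_types)
}

/// Hypothetical aggregate that widens integer inputs to Int64.
struct MySum;

impl UDFImpl for MySum {
    fn name(&self) -> &str {
        "my_sum"
    }

    fn coerce_types(&self, arg_types: &[DataType]) -> Result<Vec<DataType>> {
        Ok(arg_types
            .iter()
            .map(|t| match t {
                DataType::Int32 | DataType::Int64 => DataType::Int64,
                DataType::Float64 => DataType::Float64,
            })
            .collect())
    }
}

impl AggregateUDFImpl for MySum {}

/// Hypothetical scalar function that keeps the default (unimplemented) coercion.
struct MyAbs;

impl UDFImpl for MyAbs {
    fn name(&self) -> &str {
        "my_abs"
    }
}

impl ScalarUDFImpl for MyAbs {}

fn main() {
    let widened = coerce_argument_types(&[DataType::Int32], Arc::new(MySum));
    assert_eq!(widened, Ok(vec![DataType::Int64]));

    let unsupported = coerce_argument_types(&[DataType::Int32], Arc::new(MyAbs));
    assert!(unsupported.is_err());
}
```

The sketch also shows the cost mentioned above: each function author writes two `impl` blocks (`UDFImpl` plus `ScalarUDFImpl` or `AggregateUDFImpl`) instead of one.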
It is a good observation that
I agree with your analysis of the tradeoffs: a common base trait would result in less duplication in DataFusion. However, I personally prefer duplicating
Is your feature request related to a problem or challenge?
DataFusion automatically "coerces" (see docs here) input argument types to match the types required of operations or functions.
For functions, this is described by a desired TypeSignature.
However, some functions have special hard-coded coercion logic such as sum and count (TODO link) as well as some Array functions like make_array. We started down the path of encoding the special array semantics into TypeSignature (see ArrayFunctionSignature). However, as we continue to find other examples of different desired rules (most recently in sum and count), TypeSignature will grow and become more and more specialized.
Describe the solution you'd like
@jayzhan211 had a great suggestion #10268 (comment) that in addition to encoding common coercion behaviors in TypeSignature, we can also add a variant of TypeSignature that permits user defined coercion rules.
Describe alternatives you've considered
No response
Additional context
No response
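As a rough illustration of the proposed solution, the sketch below models a signature enum with one built-in rule (`Exact`) and one `UserDefined` variant that hands coercion to the function itself. Everything here is a toy stand-in and an assumption, not DataFusion's actual API: the `DataType` and `TypeSignature` enums are reduced to a few variants, `FunctionImpl`, `can_coerce` and `coerced_types` are hypothetical names, and `MySqrt` / `MySum` are invented functions.

```rust
// Stand-in for arrow's `DataType`; only the variants this sketch needs.
#[derive(Debug, Clone, PartialEq)]
enum DataType {
    Int32,
    Int64,
    Float64,
    Utf8,
}

/// Toy signature enum: one built-in rule plus a user-defined escape hatch.
#[derive(Debug, Clone, PartialEq)]
enum TypeSignature {
    /// Arguments are coerced to exactly these types.
    Exact(Vec<DataType>),
    /// Coercion is delegated to the function's own `coerce_types`.
    UserDefined,
}

type Result<T> = std::result::Result<T, String>;

/// Hypothetical function trait for this sketch.
trait FunctionImpl {
    fn name(&self) -> &str;
    fn signature(&self) -> &TypeSignature;

    /// Only consulted when the signature is `UserDefined`.
    fn coerce_types(&self, _arg_types: &[DataType]) -> Result<Vec<DataType>> {
        Err(format!(
            "Function {} does not implement coerce_types",
            self.name()
        ))
    }
}

/// Toy built-in rule: identical types, or a lossless widening of Int32.
fn can_coerce(from: &DataType, to: &DataType) -> bool {
    from == to
        || matches!(
            (from, to),
            (DataType::Int32, DataType::Int64) | (DataType::Int32, DataType::Float64)
        )
}

/// Compute the argument types a call should be coerced to.
fn coerced_types(func: &dyn FunctionImpl, arg_types: &[DataType]) -> Result<Vec<DataType>> {
    match func.signature() {
        TypeSignature::Exact(expected) => {
            let ok = expected.len() == arg_types.len()
                && arg_types
                    .iter()
                    .zip(expected)
                    .all(|(from, to)| can_coerce(from, to));
            if ok {
                Ok(expected.clone())
            } else {
                Err(format!("{} cannot accept {:?}", func.name(), arg_types))
            }
        }
        // The function supplies its own rule instead of growing the enum.
        TypeSignature::UserDefined => func.coerce_types(arg_types),
    }
}

/// Hypothetical function described entirely by a built-in signature.
struct MySqrt {
    signature: TypeSignature,
}

impl FunctionImpl for MySqrt {
    fn name(&self) -> &str {
        "my_sqrt"
    }
    fn signature(&self) -> &TypeSignature {
        &self.signature
    }
}

/// Hypothetical function with special-case coercion logic of its own.
struct MySum {
    signature: TypeSignature,
}

impl FunctionImpl for MySum {
    fn name(&self) -> &str {
        "my_sum"
    }
    fn signature(&self) -> &TypeSignature {
        &self.signature
    }
    fn coerce_types(&self, arg_types: &[DataType]) -> Result<Vec<DataType>> {
        match arg_types {
            [DataType::Int32 | DataType::Int64] => Ok(vec![DataType::Int64]),
            [DataType::Float64] => Ok(vec![DataType::Float64]),
            other => Err(format!("my_sum does not support {:?}", other)),
        }
    }
}

fn main() {
    let sqrt = MySqrt {
        signature: TypeSignature::Exact(vec![DataType::Float64]),
    };
    assert_eq!(
        coerced_types(&sqrt, &[DataType::Int32]),
        Ok(vec![DataType::Float64])
    );

    let sum = MySum {
        signature: TypeSignature::UserDefined,
    };
    assert_eq!(
        coerced_types(&sum, &[DataType::Int32]),
        Ok(vec![DataType::Int64])
    );
    assert!(coerced_types(&sum, &[DataType::Utf8]).is_err());
}
```

In this model, a function with unusual rules (here `MySum`) does not require a new specialized signature variant; it opts into `UserDefined` and keeps its logic next to its implementation.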