I'm getting inconsistent behaviour depending on whether I supply all fields in a substrait_project() or not (and differing behaviour between Arrow and DuckDB) to the point where it's unclear what the expected behaviour should be.
library(dplyr)
library(substrait)
# y/x/z
tibble::tibble(x=1:3, y=4:6) %>%
arrow_substrait_compiler() %>%
substrait_project(x, z=x+1) %>%
collect()
#> # A tibble: 3 × 3
#>       y     x     z
#>   <int> <int> <dbl>
#> 1     4     1     2
#> 2     5     2     3
#> 3     6     3     4

# x/y/z
tibble::tibble(x=1:3, y=4:6) %>%
arrow_substrait_compiler() %>%
substrait_project(y, z=x+1) %>%
collect()
#> # A tibble: 3 × 3
#>       x     y     z
#>   <int> <int> <dbl>
#> 1     1     4     2
#> 2     2     5     3
#> 3     3     6     4

# x/y/z
tibble::tibble(x=1:3, y=4:6) %>%
arrow_substrait_compiler() %>%
substrait_project(x, y, z=x+1) %>%
collect()
#> # A tibble: 3 × 3
#>       x     y     z
#>   <int> <int> <dbl>
#> 1     1     4     2
#> 2     2     5     3
#> 3     3     6     4

# error
tibble::tibble(x=1:3, y=4:6) %>%
arrow_substrait_compiler() %>%
substrait_project(z=x+1) %>%
collect()
#> Error: Invalid: Invalid emit case
#> /home/nic2/arrow/cpp/src/arrow/engine/substrait/serde.cc:157 FromProto(plan_rel.has_root() ? plan_rel.root().input() : plan_rel.rel(), ext_set, conversion_options)

# error
tibble::tibble(x=1:3, y=4:6) %>%
duckdb_substrait_compiler() %>%
substrait_project(x, z=x+1) %>%
collect()
#> Error: Binder Error: Positional reference 3 out of range (total 2 columns)

# error
tibble::tibble(x=1:3, y=4:6) %>%
duckdb_substrait_compiler() %>%
substrait_project(y, z=x+1) %>%
collect()
#> Error: Binder Error: Positional reference 3 out of range (total 2 columns)

# success
tibble::tibble(x=1:3, y=4:6) %>%
duckdb_substrait_compiler() %>%
substrait_project(x, y, z=x+1) %>%
collect()
#> # A tibble: 3 × 3
#>       x     y     z
#>   <int> <int> <dbl>
#> 1     1     4     2
#> 2     2     5     3
#> 3     3     6     4

# error
tibble::tibble(x=1:3, y=4:6) %>%
duckdb_substrait_compiler() %>%
substrait_project(z=x+1) %>%
collect()
#> Error: Binder Error: Positional reference 2 out of range (total 1 columns)
The behaviour for substrait_project() is definitely strange here! In general substrait_project() always appends columns (rather than replaces), but I forget the details and our emit case does look strange:
library(dplyr, warn.conflicts=FALSE)
library(substrait, warn.conflicts=FALSE)
projected<-tibble::tibble(x=1:3, y=4:6) %>%
arrow_substrait_compiler() %>%
substrait_project(x, z=x+1)
# Seems correct: we append two fields to the output
projected$rel$project$expressions
#> [[1]]
#> message of type 'substrait.Expression' with 1 field set
#> selection {
#>   direct_reference {
#>     struct_field {
#>     }
#>   }
#>   root_reference {
#>   }
#> }
#>
#> [[2]]
#> message of type 'substrait.Expression' with 1 field set
#> scalar_function {
#>   function_reference: 2
#>   output_type {
#>     fp64 {
#>       nullability: NULLABILITY_NULLABLE
#>     }
#>   }
#>   arguments {
#>     value {
#>       selection {
#>         direct_reference {
#>           struct_field {
#>           }
#>         }
#>         root_reference {
#>         }
#>       }
#>     }
#>   }
#>   arguments {
#>     value {
#>       literal {
#>         fp64: 1
#>       }
#>     }
#>   }
#>   options {
#>     name: "overflow"
#>     preference: "SILENT"
#>   }
#> }

# Incorrect: the emit should be 0, 1, 2, 3
projected$rel$project$common$emit
#> message of type 'substrait.RelCommon.Emit' with 1 field set
#> output_mapping: 1
#> output_mapping: 2
#> output_mapping: 3

# Incorrect: the names should be x, y, x, z
projected$schema$names
#> [1] "y" "x" "z"
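To make the emit comments concrete, here is a minimal base-R sketch of how an emit mapping picks columns. It rests on two assumptions about Substrait semantics: a project relation's full output is the input fields followed by the projected expressions, and `output_mapping` indices are zero-based. `apply_emit()` is a hypothetical helper written for illustration; it is not part of the substrait package.

```r
# Hypothetical helper (not part of the substrait package): applies an emit
# output_mapping to the full output of a project relation.
apply_emit <- function(input_names, expression_names, output_mapping) {
  # Assumption: a project relation outputs every input field first,
  # followed by every projected expression.
  full_output <- c(input_names, expression_names)
  # Assumption: output_mapping is zero-based, while R vectors are one-based.
  full_output[output_mapping + 1]
}

input_names <- c("x", "y")       # fields of tibble(x = 1:3, y = 4:6)
expression_names <- c("x", "z")  # substrait_project(x, z = x + 1)

# The mapping currently generated (1, 2, 3) skips the first input field.
observed <- apply_emit(input_names, expression_names, c(1, 2, 3))

# The mapping suggested above (0, 1, 2, 3) keeps every field.
expected <- apply_emit(input_names, expression_names, c(0, 1, 2, 3))

print(observed)
print(expected)
```

Under these assumptions, the generated mapping yields the `y`, `x`, `z` order seen in the first Arrow example, while the full mapping yields `x`, `y`, `x`, `z`.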
In the meantime, I think maybe you could use substrait_select()?