library(dplyr)
#>
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#>
#>     filter, lag
#> The following objects are masked from 'package:base':
#>
#>     intersect, setdiff, setequal, union
library(substrait)
#>
#> Attaching package: 'substrait'
#> The following object is masked from 'package:stats':
#>
#>     filter

tbl <- tibble::tibble(x = c("apple pie", "pork pie", "pork chop"))

# dplyr - all good without regex
tbl %>%
  filter(grepl("pie", x))
#> # A tibble: 2 × 1
#>   x
#>   <chr>
#> 1 apple pie
#> 2 pork pie

# substrait - all good without regex
tbl %>%
  duckdb_substrait_compiler() %>%
  filter(grepl("pie", x)) %>%
  collect()
#> # A tibble: 2 × 1
#>   x
#>   <chr>
#> 1 apple pie
#> 2 pork pie

# dplyr - all good with regex
tbl %>%
  filter(grepl("pie$", x))
#> # A tibble: 2 × 1
#>   x
#>   <chr>
#> 1 apple pie
#> 2 pork pie

# substrait - doesn't work with regex
tbl %>%
  duckdb_substrait_compiler() %>%
  filter(grepl("pie$", x)) %>%
  collect()
#> # A tibble: 0 × 1
#> # … with 1 variable: x <chr>
The Substrait spec doesn't state whether regex can or cannot be used in these functions, so this is something we want to raise there. I suspect that the contains() function, which is bound to R's grepl() here, doesn't translate to something that allows regex. The Substrait string function spec has pairs of functions like count_substring and regexp_count_substring, which implies separate versions of a function depending on whether regex is allowed, and there is currently no regexp_contains. We should probably open an issue on the Substrait repo, and in the meantime use a workaround (e.g. could we perhaps bind regexp_count_substring(x) > 0 to grepl())?
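To illustrate the suspicion above, here is a minimal base R sketch that does not touch substrait at all. It assumes that contains() behaves like a literal substring match, which is what `fixed = TRUE` does in grepl(), and it emulates the proposed "regex match count > 0" workaround with a hypothetical helper, `count_regex_matches()`, that is not part of substrait or any package.

```r
x <- c("apple pie", "pork pie", "pork chop")

# Regex matching (grepl()'s default): "$" anchors the end of the string.
regex_match <- grepl("pie$", x)

# Literal matching: "$" is treated as an ordinary character, so nothing
# matches. This is the behaviour we assume contains() has.
literal_match <- grepl("pie$", x, fixed = TRUE)

# Hypothetical helper (an assumption for illustration only): count regex
# matches per element, similar in spirit to regexp_count_substring.
count_regex_matches <- function(pattern, x) {
  lengths(regmatches(x, gregexpr(pattern, x)))
}

# Proposed workaround: a string "contains" the pattern if the count is > 0.
workaround_match <- count_regex_matches("pie$", x) > 0

print(regex_match)
print(literal_match)
print(workaround_match)
```

If the assumption holds, the literal match reproduces the empty result seen with substrait, while the count-based comparison gives the same rows as dplyr. Whether regexp_count_substring can actually be bound this way in the compiler still needs checking.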
Created on 2023-04-05 with reprex v2.0.2