Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add .by to fill() #1572

Merged
merged 6 commits into from
Aug 28, 2024
Merged

Add .by to fill() #1572

merged 6 commits into from
Aug 28, 2024

Conversation

olivroy
Copy link
Contributor

@olivroy olivroy commented Aug 18, 2024

Addresses part of #1439.

R/fill.R Outdated
fill.data.frame <- function(data,
...,
.direction = c("down", "up", "downup", "updown"),
.by = NULL) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Putting .by in the .data.frame method rather than the generic to try and be as least disruptive as possible for S3 method authors like dbplyr

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

TODO: We did add .by to the nest() generic, so maybe we should look at this a little more.

Maybe it isn't as disruptive as we thought due to the presence of the ... in the generic.

Copy link
Member

@DavisVaughan DavisVaughan Aug 27, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

According to my old research on this, it doesn't look like we will cause a warning for other S3 method authors if we add .by to the generic (this is probably why we could do it with nest() too):

https://github.com/DavisVaughan/methodtest/blob/0010788d798e9ff851a871db3cd35e8ee1cb4dd6/R/methods.R#L55-L67

They will just have to opt in to support of .by on their own over time or catch .by and error if its not NULL if they don't support it.

Copy link
Member

@DavisVaughan DavisVaughan Aug 27, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, I've gone back to .by in the generic of fill() based on the above analysis. That means that for dbplyr we will see something like this where .by is captured in the ... of the fill.tbl_lazy() method until dbplyr updates its S3 method to handle .by (possibly by just erroring).

By some weird combination of events this passes through dbplyr without error, I've opened tidyverse/dbplyr#1536

df <- data.frame(x = c(1, 2), y = c(2, 2), z = c(1, NA))
df_sqlite <- tbl_lazy(df, con = simulate_sqlite())
df_sqlite |> window_order(x) |> fill(z, .by = y)

<SQL>
SELECT `x`, `y`, MAX(`z`) OVER (PARTITION BY `..dbplyr_partition_1`) AS `z`
FROM (
  SELECT
    `df`.*,
    SUM(CASE WHEN ((`z` IS NULL)) THEN 0 ELSE 1 END) OVER (ORDER BY `x` ROWS UNBOUNDED PRECEDING) AS `..dbplyr_partition_1`,
    SUM(CASE WHEN ((`.by` IS NULL)) THEN 0 ELSE 1 END) OVER (ORDER BY `x` ROWS UNBOUNDED PRECEDING) AS `..dbplyr_partition_2`
  FROM `df`
) AS `q01`

R/fill.R Outdated Show resolved Hide resolved
Comment on lines -111 to +129
dplyr::mutate_at(data, .vars = dplyr::vars(any_of(vars)), .funs = fn)
dplyr::mutate(
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note also switching away from mutate_at() here

dplyr::mutate(
data,
dplyr::across(any_of(vars), .fns = fn),
.by = {{ .by }}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Passing straight through to dplyr means that if .by is wrong for some reason, then the error call shows dplyr::mutate() rather than fill(). I think that is worth it to not have to fiddle with .by in any way, because dplyr has some pretty unique handling for it. (See the tests for more info)

Comment on lines +130 to +136
test_that("errors on named `...` inputs", {
df <- tibble(x = c(1, NA, 2))

expect_snapshot(error = TRUE, {
fill(df, fooy = x)
})
})
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I noted we didn't have a test for this and since we are touching check_dots_unnamed() here we may as well add one

@DavisVaughan
Copy link
Member

@olivroy thanks a lot! I'll take over from here. Leaving some comments for another colleague to review

@DavisVaughan DavisVaughan merged commit 622afe8 into tidyverse:main Aug 28, 2024
12 checks passed
@DavisVaughan
Copy link
Member

Thanks @olivroy!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants