Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

docs: manual pages for select() and mutate() #351

Draft
wants to merge 6 commits into
base: main
Choose a base branch
from

Conversation

maelle
Copy link
Collaborator

@maelle maelle commented Nov 21, 2024

@@ -16,6 +17,9 @@ select.duckplyr_df <- function(.data, ...) {
rel_try(list(name = "select", x = .data, args = try_list(dots = enquos(...))),
# We could count and create a zero-col data frame, but we can't
# create a duckplyr frame from it anyway.
#' @section Caveats:
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe this section should be called Fallbacks?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh and we could have an automatic sentence about telemetry in all fallback section later once telemetry is easier to tweak?

@@ -0,0 +1,18 @@
#' @title Keep or drop columns using their names and types
#'
#' @description This is a method for the [`dplyr::select()`] generic.
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

also can we say "generic" here

@@ -16,6 +17,9 @@ select.duckplyr_df <- function(.data, ...) {
rel_try(list(name = "select", x = .data, args = try_list(dots = enquos(...))),
# We could count and create a zero-col data frame, but we can't
# create a duckplyr frame from it anyway.
#' @section Caveats:
#' You cannot use `select.duckplyr_df` with no expression,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

or with a selection that returns no columns

Copy link
Contributor

This is how benchmark results would change (along with a 95% confidence interval in relative change) if 6e757c9 is merged into main:

  • ✔️001_tpch_01: 25.1ms -> 25.7ms [-2.3%, +7.19%]
  • ✔️001_tpch_02: 132ms -> 133ms [-0.47%, +1.88%]
  • ✔️001_tpch_03: 70.4ms -> 70.3ms [-1.8%, +1.4%]
  • ✔️001_tpch_04: 25.2ms -> 24.8ms [-4.36%, +1.5%]
  • ✔️001_tpch_05: 129ms -> 129ms [-1.21%, +1.52%]
  • ✔️001_tpch_06: 15.8ms -> 15.7ms [-2.56%, +1.75%]
  • ✔️001_tpch_07: 145ms -> 145ms [-1.51%, +0.61%]
  • ✔️001_tpch_08: 168ms -> 167ms [-1.95%, +0.37%]
  • ✔️001_tpch_09: 136ms -> 136ms [-1.07%, +1.19%]
  • ✔️001_tpch_10: 80.8ms -> 81.4ms [-1.23%, +2.75%]
  • ✔️001_tpch_11: 69.2ms -> 69.3ms [-2.9%, +3.18%]
  • ✔️001_tpch_12: 29.8ms -> 29.7ms [-2.01%, +1.39%]
  • ✔️001_tpch_13: 20.9ms -> 21ms [-1.97%, +2.45%]
  • ✔️001_tpch_14: 22.3ms -> 22.2ms [-2.53%, +1.34%]
  • ✔️001_tpch_15: 68.4ms -> 68.9ms [-1.44%, +2.93%]
  • ✔️001_tpch_16: 68.2ms -> 68.9ms [-1.13%, +3.06%]
  • ✔️001_tpch_17: 29ms -> 28.6ms [-5.72%, +2.88%]
  • ✔️001_tpch_18: 24.8ms -> 24.7ms [-2.6%, +2.42%]
  • ✔️001_tpch_19: 134ms -> 135ms [-0.68%, +1.97%]
  • ✔️001_tpch_20: 83.8ms -> 84.8ms [-0.88%, +3.37%]
  • ✔️001_tpch_21: 182ms -> 180ms [-2.18%, +0.14%]
  • ✔️001_tpch_22: 131ms -> 132ms [-1.14%, +1.84%]
  • ✔️010_tpch_01: 83ms -> 84.5ms [-2.19%, +5.83%]
  • ✔️010_tpch_02: 74.7ms -> 75.6ms [-3.49%, +6.02%]
  • ✔️010_tpch_03: 68.3ms -> 68ms [-3.15%, +2.27%]
  • ✔️010_tpch_04: 50.5ms -> 50.7ms [-1.44%, +2.58%]
  • ✔️010_tpch_05: 97.4ms -> 97.4ms [-1.35%, +1.47%]
  • ✔️010_tpch_06: 35.7ms -> 36.9ms [-0.38%, +7.55%]
  • ✔️010_tpch_07: 117ms -> 116ms [-3.96%, +1.65%]
  • ✔️010_tpch_08: 138ms -> 138ms [-2.7%, +3.32%]
  • ✔️010_tpch_09: 120ms -> 121ms [-1%, +1.92%]
  • ✔️010_tpch_10: 86.3ms -> 86.2ms [-2.87%, +2.52%]
  • ✔️010_tpch_11: 39.6ms -> 40ms [-2.57%, +4.4%]
  • ✔️010_tpch_12: 63.3ms -> 65.7ms [-1.59%, +8.93%]
  • ✔️010_tpch_13: 53.2ms -> 54.4ms [-1.23%, +5.63%]
  • ✔️010_tpch_14: 43.2ms -> 42.5ms [-5.03%, +1.85%]
  • ✔️010_tpch_15: 58.6ms -> 61.8ms [-1.67%, +12.41%]
  • ✔️010_tpch_16: 45.8ms -> 46ms [-2.93%, +3.91%]
  • ✔️010_tpch_17: 56.1ms -> 57.5ms [-1.26%, +6.51%]
  • ✔️010_tpch_18: 57.2ms -> 56.8ms [-6.8%, +5.52%]
  • ❗🐌010_tpch_19: 123ms -> 125ms [+0.31%, +2.8%]
  • ✔️010_tpch_20: 74.9ms -> 75.3ms [-2.4%, +3.32%]
  • ✔️010_tpch_21: 422ms -> 425ms [-2.18%, +3.63%]
  • ✔️010_tpch_22: 79ms -> 79.4ms [-1.51%, +2.57%]
  • ✔️100_tpch_01: 1.21s -> 1.21s [-3.59%, +4.89%]
  • ✔️100_tpch_02: 123ms -> 122ms [-2.82%, +1.5%]
  • ✔️100_tpch_03: 1.1s -> 1.1s [-4.46%, +5.21%]
  • ✔️100_tpch_04: 1.07s -> 1.08s [-1.58%, +3.66%]
  • ✔️100_tpch_05: 1.19s -> 1.18s [-3.38%, +1.33%]
  • ✔️100_tpch_06: 1.01s -> 1.02s [-1.37%, +2.79%]
  • ✔️100_tpch_07: 1.15s -> 1.15s [-1.33%, +2.52%]
  • ✔️100_tpch_08: 1.19s -> 1.2s [-2.33%, +4.58%]
  • ✔️100_tpch_09: 1.27s -> 1.29s [-6.08%, +8.22%]
  • ✔️100_tpch_10: 1.16s -> 1.16s [-2.84%, +3.7%]
  • ✔️100_tpch_11: 90.7ms -> 86.4ms [-17.54%, +8.15%]
  • ✔️100_tpch_12: 1.14s -> 1.12s [-5.25%, +1.01%]
  • ✔️100_tpch_13: 357ms -> 361ms [-11.72%, +13.97%]
  • ✔️100_tpch_14: 1.03s -> 1.02s [-2.42%, +1.04%]
  • ✔️100_tpch_15: 1.13s -> 1.11s [-3.77%, +1.43%]
  • ✔️100_tpch_16: 132ms -> 128ms [-8.98%, +1.85%]
  • ✔️100_tpch_17: 1.07s -> 1.1s [-0.16%, +6.18%]
  • ✔️100_tpch_18: 1.11s -> 1.12s [-0.95%, +4.22%]
  • ✔️100_tpch_19: 1.2s -> 1.21s [-2.7%, +4.21%]
  • ✔️100_tpch_20: 1.1s -> 1.11s [-0.3%, +0.95%]
  • ✔️100_tpch_21: 2.26s -> 2.23s [-5.34%, +2.84%]
  • ✔️100_tpch_22: 174ms -> 176ms [-9.38%, +11.63%]

Further explanation regarding interpretation and methodology can be found in the documentation.

@krlmlr
Copy link
Member

krlmlr commented Nov 21, 2024

That won't fly, unfortunately. Looks like if we want to reexport, we need to document.

  Undocumented code objects:
    ‘mutate’ ‘select’

https://github.com/tidyverse/duckplyr/actions/runs/11953558550/job/33321797657?pr=351#step:17:223

@hadley: Neither dbplyr nor dtplyr reexport the dplyr generics. I'm inclined to avoid reexporting here too, but to support the existing

library(duckplyr)
data.frame(a = 1) |> mutate(b = 2)

pattern we'd need to add Depends: dplyr . What do you think?

@krlmlr
Copy link
Member

krlmlr commented Nov 28, 2024

Let's ignore the reexport question for the purpose of this PR, and focus only on providing ?mutate.duckplyr_df and similar.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants