Commit

as_duckplyr_tibble()
krlmlr committed Jul 11, 2024
1 parent 7fa1894 commit 8466ce0
Showing 10 changed files with 71 additions and 27 deletions.
1 change: 1 addition & 0 deletions NAMESPACE
@@ -107,6 +107,7 @@ export(anti_join)
export(any_of)
export(arrange)
export(as_duckplyr_df)
export(as_duckplyr_tibble)
export(as_tibble)
export(between)
export(bind_cols)
1 change: 1 addition & 0 deletions NEWS.md
@@ -5,6 +5,7 @@
## Features

- `df_from_file()` and related functions support multiple files (#194, #195), show a clear error message for non-string `path` arguments (#182), and create a tibble by default (#177).
- New `as_duckplyr_tibble()` to convert a data frame to a duckplyr tibble (#177).
- Support descending sort for character and other non-numeric data (@toppyy, #92, #175).
- Avoid setting memory limit (#193).
- Check compatibility of join columns (#168, #185).
17 changes: 10 additions & 7 deletions R/as_duckplyr_df.R
@@ -1,16 +1,23 @@
#' Convert to a duckplyr data frame
#'
#' For an object of class `duckplyr_df`,
#' @description
#' These functions convert a data-frame-like input to an object of class `"duckplyr_df"`.
#' For such objects,
#' dplyr verbs such as [mutate()], [select()] or [filter()] will attempt to use DuckDB.
#' If this is not possible, the original dplyr implementation is used.
#'
#' `as_duckplyr_df()` requires the input to be a plain data frame or a tibble,
#' and will fail for any other classes, including subclasses of `"data.frame"` or `"tbl_df"`.
#' This behavior is likely to change; do not rely on it.
#'
#' @details
#' Set the `DUCKPLYR_FALLBACK_INFO` and `DUCKPLYR_FORCE` environment variables
#' for more control over the behavior, see [config] for more details.
#'
#' @param .data data frame or tibble to transform
#'
#' @return An object of class `"duckplyr_df"`, inheriting from the classes of the
#' `.data` argument.
#' @return For `as_duckplyr_df()`, an object of class `"duckplyr_df"`,
#' inheriting from the classes of the `.data` argument.
#'
#' @export
#' @examples
@@ -36,7 +43,3 @@ as_duckplyr_df <- function(.data) {
  class(.data) <- c("duckplyr_df", class(.data))
  .data
}

default_df_class <- function() {
  class(new_tibble(list()))
}
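
The documentation above says that verbs try DuckDB first and otherwise use the original dplyr implementation, and `as_duckplyr_df()` enables this by prepending a class with `class(.data) <- c("duckplyr_df", class(.data))`. The base R sketch below illustrates that general S3 pattern only. The names `describe()`, `sketch_df`, `fast_path()` and `add_sketch_class()` are hypothetical, and this is not duckplyr's actual dispatch code.

```r
# Hypothetical generic standing in for a dplyr verb.
describe <- function(x, ...) UseMethod("describe")

# Stand-in for the original implementation on plain data frames.
describe.data.frame <- function(x, ...) {
  "default implementation"
}

# Stand-in for the accelerated path; it refuses inputs it cannot handle.
fast_path <- function(x) {
  if (any(vapply(x, is.factor, logical(1)))) {
    stop("factor columns are not supported in the fast path")
  }
  "fast path"
}

# Method for the prepended class: try the fast path, otherwise
# hand over to the next class in the class vector.
describe.sketch_df <- function(x, ...) {
  result <- tryCatch(fast_path(x), error = function(e) NULL)
  if (!is.null(result)) {
    return(result)
  }
  NextMethod()
}

# Same class-prepending step as in as_duckplyr_df(), with a made-up class name.
add_sketch_class <- function(.data) {
  class(.data) <- c("sketch_df", class(.data))
  .data
}

tagged <- add_sketch_class(data.frame(a = 1:3))
with_factor <- add_sketch_class(data.frame(f = factor("x")))

class(tagged)
describe(tagged)       # handled by the fast path
describe(with_factor)  # falls back to the data.frame method
```

Because the new class comes first in the class vector, its methods are tried first, and `NextMethod()` reaches the method for the remaining classes when the fast path declines.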
13 changes: 13 additions & 0 deletions R/as_duckplyr_tibble.R
@@ -0,0 +1,13 @@
#' as_duckplyr_tibble
#'
#' `as_duckplyr_tibble()` converts the input to a tibble and then to a duckplyr data frame.
#'
#' @return For `as_duckplyr_tibble()`, an object of class
#' `c("duckplyr_df", class(tibble()))`.
#'
#' @rdname as_duckplyr_df
#' @export
as_duckplyr_tibble <- function(.data) {
  # Extra as.data.frame() call for good measure and perhaps https://github.com/tidyverse/tibble/issues/1556
  as_duckplyr_df(as_tibble(as.data.frame(.data)))
}
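
As a usage sketch, assuming a duckplyr version that includes this commit and that the tibble package is installed, a plain data frame passed to `as_duckplyr_tibble()` should come back as a tibble that also carries the `duckplyr_df` class. The input `df` is made up for illustration.

```r
library(tibble)

# Made-up input: a plain data frame, not a tibble.
df <- data.frame(a = 1)

# Converts the input to a tibble first, then adds the duckplyr class.
out <- duckplyr::as_duckplyr_tibble(df)

is_tibble(out)                # is the result a tibble?
inherits(out, "duckplyr_df")  # does it carry the duckplyr class?
```

Starting from a tibble rather than keeping the input's own class is what lets the README state that the result is a tibble.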
4 changes: 4 additions & 0 deletions R/io-.R
@@ -78,3 +78,7 @@ duckplyr_df_from_file <- function(
  out <- df_from_file(path, table_function, options = options, class = class)
  as_duckplyr_df(out)
}

default_df_class <- function() {
  class(new_tibble(list()))
}
14 changes: 7 additions & 7 deletions README.Rmd
@@ -75,8 +75,8 @@ conflict_prefer("filter", "dplyr")

There are two ways to use duckplyr.

1. To enable duckplyr for individual data frames, use `duckplyr::as_duckplyr_df()` as the first step in your pipe, without attaching the package.
1. By calling `library(duckplyr)`, it overwrites dplyr methods and is automatically enabled for the entire session without having to call `as_duckplyr_df()`. To turn this off, call `methods_restore()`.
1. To enable duckplyr for individual data frames, use `duckplyr::as_duckplyr_tibble()` as the first step in your pipe, without attaching the package.
1. Calling `library(duckplyr)` overwrites dplyr methods and enables duckplyr for the entire session, without having to call `as_duckplyr_tibble()`. To turn this off, call `methods_restore()`.

The examples below illustrate both methods.
See also the companion [demo repository](https://github.com/Tmonster/duckplyr_demo) for a use case with a large dataset.
@@ -85,20 +85,20 @@ See also the companion [demo repository](https://github.com/Tmonster/duckplyr_de

This example illustrates usage of duckplyr for individual data frames.

Use `duckplyr::as_duckplyr_df()` to enable processing with duckdb:
Use `duckplyr::as_duckplyr_tibble()` to enable processing with duckdb:

```{r}
out <-
  palmerpenguins::penguins %>%
  # CAVEAT: factor columns are not supported yet
  mutate(across(where(is.factor), as.character)) %>%
  duckplyr::as_duckplyr_df() %>%
  duckplyr::as_duckplyr_tibble() %>%
  mutate(bill_area = bill_length_mm * bill_depth_mm) %>%
  summarize(.by = c(species, sex), mean_bill_area = mean(bill_area)) %>%
  filter(species != "Gentoo")
```

The result is a data frame or tibble, with its own class.
The result is a tibble, with its own class.

```{r}
class(out)
@@ -137,7 +137,7 @@ Use `library(duckplyr)` or `duckplyr::methods_overwrite()` to overwrite dplyr me
duckplyr::methods_overwrite()
```

This is the same query as above, without `as_duckplyr_df()`:
This is the same query as above, without `as_duckplyr_tibble()`:

```{r echo = FALSE}
Sys.setenv(DUCKPLYR_FALLBACK_COLLECT = 0)
@@ -206,7 +206,7 @@ Sys.setenv(DUCKPLYR_FALLBACK_COLLECT = "")

```{r}
palmerpenguins::penguins %>%
  duckplyr::as_duckplyr_df() %>%
  duckplyr::as_duckplyr_tibble() %>%
  transmute(bill_area = bill_length_mm * bill_depth_mm) %>%
  head(3)
```
14 changes: 7 additions & 7 deletions README.md
@@ -41,28 +41,28 @@ Or from [GitHub](https://github.com/) with:

There are two ways to use duckplyr.

1. To enable duckplyr for individual data frames, use [`duckplyr::as_duckplyr_df()`](https://duckdblabs.github.io/duckplyr/reference/as_duckplyr_df.html) as the first step in your pipe, without attaching the package.
2. By calling [`library(duckplyr)`](https://duckdblabs.github.io/duckplyr/), it overwrites dplyr methods and is automatically enabled for the entire session without having to call `as_duckplyr_df()`. To turn this off, call `methods_restore()`.
1. To enable duckplyr for individual data frames, use [`duckplyr::as_duckplyr_tibble()`](https://duckdblabs.github.io/duckplyr/reference/as_duckplyr_tibble.html) as the first step in your pipe, without attaching the package.
2. Calling [`library(duckplyr)`](https://duckdblabs.github.io/duckplyr/) overwrites dplyr methods and enables duckplyr for the entire session, without having to call `as_duckplyr_tibble()`. To turn this off, call `methods_restore()`.

The examples below illustrate both methods. See also the companion [demo repository](https://github.com/Tmonster/duckplyr_demo) for a use case with a large dataset.

### Usage for individual data frames

This example illustrates usage of duckplyr for individual data frames.

Use [`duckplyr::as_duckplyr_df()`](https://duckdblabs.github.io/duckplyr/reference/as_duckplyr_df.html) to enable processing with duckdb:
Use [`duckplyr::as_duckplyr_tibble()`](https://duckdblabs.github.io/duckplyr/reference/as_duckplyr_tibble.html) to enable processing with duckdb:

<pre class='chroma'>
<span><span class='nv'>out</span> <span class='o'>&lt;-</span></span>
<span> <span class='nf'>palmerpenguins</span><span class='nf'>::</span><span class='nv'><a href='https://allisonhorst.github.io/palmerpenguins/reference/penguins.html'>penguins</a></span> <span class='o'><a href='https://magrittr.tidyverse.org/reference/pipe.html'>%&gt;%</a></span></span>
<span> <span class='c'># CAVEAT: factor columns are not supported yet</span></span>
<span> <span class='nf'><a href='https://dplyr.tidyverse.org/reference/mutate.html'>mutate</a></span><span class='o'>(</span><span class='nf'><a href='https://dplyr.tidyverse.org/reference/across.html'>across</a></span><span class='o'>(</span><span class='nf'><a href='https://tidyselect.r-lib.org/reference/where.html'>where</a></span><span class='o'>(</span><span class='nv'>is.factor</span><span class='o'>)</span>, <span class='nv'>as.character</span><span class='o'>)</span><span class='o'>)</span> <span class='o'><a href='https://magrittr.tidyverse.org/reference/pipe.html'>%&gt;%</a></span></span>
<span> <span class='nf'>duckplyr</span><span class='nf'>::</span><span class='nf'><a href='https://duckdblabs.github.io/duckplyr/reference/as_duckplyr_df.html'>as_duckplyr_df</a></span><span class='o'>(</span><span class='o'>)</span> <span class='o'><a href='https://magrittr.tidyverse.org/reference/pipe.html'>%&gt;%</a></span></span>
<span> <span class='nf'>duckplyr</span><span class='nf'>::</span><span class='nf'><a href='https://duckdblabs.github.io/duckplyr/reference/as_duckplyr_tibble.html'>as_duckplyr_tibble</a></span><span class='o'>(</span><span class='o'>)</span> <span class='o'><a href='https://magrittr.tidyverse.org/reference/pipe.html'>%&gt;%</a></span></span>
<span> <span class='nf'><a href='https://dplyr.tidyverse.org/reference/mutate.html'>mutate</a></span><span class='o'>(</span>bill_area <span class='o'>=</span> <span class='nv'>bill_length_mm</span> <span class='o'>*</span> <span class='nv'>bill_depth_mm</span><span class='o'>)</span> <span class='o'><a href='https://magrittr.tidyverse.org/reference/pipe.html'>%&gt;%</a></span></span>
<span> <span class='nf'><a href='https://dplyr.tidyverse.org/reference/summarise.html'>summarize</a></span><span class='o'>(</span>.by <span class='o'>=</span> <span class='nf'><a href='https://rdrr.io/r/base/c.html'>c</a></span><span class='o'>(</span><span class='nv'>species</span>, <span class='nv'>sex</span><span class='o'>)</span>, mean_bill_area <span class='o'>=</span> <span class='nf'><a href='https://rdrr.io/r/base/mean.html'>mean</a></span><span class='o'>(</span><span class='nv'>bill_area</span><span class='o'>)</span><span class='o'>)</span> <span class='o'><a href='https://magrittr.tidyverse.org/reference/pipe.html'>%&gt;%</a></span></span>
<span> <span class='nf'><a href='https://dplyr.tidyverse.org/reference/filter.html'>filter</a></span><span class='o'>(</span><span class='nv'>species</span> <span class='o'>!=</span> <span class='s'>"Gentoo"</span><span class='o'>)</span></span></pre>

The result is a data frame or tibble, with its own class.
The result is a tibble, with its own class.

<pre class='chroma'>
<span><span class='nf'><a href='https://rdrr.io/r/base/class.html'>class</a></span><span class='o'>(</span><span class='nv'>out</span><span class='o'>)</span></span>
@@ -211,7 +211,7 @@ Use [`library(duckplyr)`](https://duckdblabs.github.io/duckplyr/) or [`duckplyr:
<span><span class='c'>#&gt; <span style='color: #00BB00;'>✔</span> Overwriting <span style='color: #0000BB;'>dplyr</span> methods with <span style='color: #0000BB;'>duckplyr</span> methods.</span></span>
<span><span class='c'>#&gt; <span style='color: #00BBBB;'>ℹ</span> Turn off with `duckplyr::methods_restore()`.</span></span></pre>

This is the same query as above, without `as_duckplyr_df()`:
This is the same query as above, without `as_duckplyr_tibble()`:

<pre class='chroma'>
<span><span class='nv'>out</span> <span class='o'>&lt;-</span></span>
@@ -298,7 +298,7 @@ The first time the package encounters an unsupported function, data type, or ope

<pre class='chroma'>
<span><span class='nf'>palmerpenguins</span><span class='nf'>::</span><span class='nv'><a href='https://allisonhorst.github.io/palmerpenguins/reference/penguins.html'>penguins</a></span> <span class='o'><a href='https://magrittr.tidyverse.org/reference/pipe.html'>%&gt;%</a></span></span>
<span> <span class='nf'>duckplyr</span><span class='nf'>::</span><span class='nf'><a href='https://duckdblabs.github.io/duckplyr/reference/as_duckplyr_df.html'>as_duckplyr_df</a></span><span class='o'>(</span><span class='o'>)</span> <span class='o'><a href='https://magrittr.tidyverse.org/reference/pipe.html'>%&gt;%</a></span></span>
<span> <span class='nf'>duckplyr</span><span class='nf'>::</span><span class='nf'><a href='https://duckdblabs.github.io/duckplyr/reference/as_duckplyr_tibble.html'>as_duckplyr_tibble</a></span><span class='o'>(</span><span class='o'>)</span> <span class='o'><a href='https://magrittr.tidyverse.org/reference/pipe.html'>%&gt;%</a></span></span>
<span> <span class='nf'><a href='https://dplyr.tidyverse.org/reference/transmute.html'>transmute</a></span><span class='o'>(</span>bill_area <span class='o'>=</span> <span class='nv'>bill_length_mm</span> <span class='o'>*</span> <span class='nv'>bill_depth_mm</span><span class='o'>)</span> <span class='o'><a href='https://magrittr.tidyverse.org/reference/pipe.html'>%&gt;%</a></span></span>
<span> <span class='nf'><a href='https://rdrr.io/r/utils/head.html'>head</a></span><span class='o'>(</span><span class='m'>3</span><span class='o'>)</span></span>
<span><span class='c'>#&gt; The <span style='color: #0000BB;'>duckplyr</span> package is configured to fall back to <span style='color: #0000BB;'>dplyr</span> when it encounters an</span></span>
21 changes: 17 additions & 4 deletions man/as_duckplyr_df.Rd

Generated file, not rendered by default.

6 changes: 4 additions & 2 deletions man/df_from_file.Rd

Generated file, not rendered by default.

7 changes: 7 additions & 0 deletions tests/testthat/test-as_duckplyr_tibble.R
@@ -0,0 +1,7 @@
test_that("as_duckplyr_tibble() works", {
  expect_s3_class(as_duckplyr_tibble(tibble(a = 1)), "duckplyr_df")
  expect_equal(class(as_duckplyr_tibble(tibble(a = 1))), c("duckplyr_df", class(tibble())))

  expect_s3_class(as_duckplyr_tibble(data.frame(a = 1)), "duckplyr_df")
  expect_equal(class(as_duckplyr_tibble(data.frame(a = 1))), c("duckplyr_df", class(tibble())))
})
