Implement DISTINCT ON for dbplyr backend

Currently, if `distinct` is used with `.keep_all = TRUE`, `ROW_NUMBER` is used. Which is normal `dbplyr` behavior for generics. But in my daily uses, I found `DISTINCT ON` is much faster than filtering a window function of a subquery.

```
library(duckdb)
#> Loading required package: DBI
library(dplyr)
#> 
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#> 
#>     filter, lag
#> The following objects are masked from 'package:base':
#> 
#>     intersect, setdiff, setequal, union
conn1 <- DBI::dbConnect(duckdb())

duckdb_register(conn1, "iris", iris)
tbl_iris <- tbl(conn1, "iris")

# Current behavior
tbl_iris |> 
  distinct(
    Petal.Width, 
    Species,
    .keep_all = TRUE
  ) |> 
  show_query()
#> <SQL>
#> SELECT "Sepal.Length", "Sepal.Width", "Petal.Length", "Petal.Width", Species
#> FROM (
#>   SELECT
#>     iris.*,
#>     ROW_NUMBER() OVER (PARTITION BY "Petal.Width", Species ORDER BY "Sepal.Length") AS col01
#>   FROM iris
#> ) q01
#> WHERE (col01 = 1)

dbDisconnect(conn1)
rm(conn1, tbl_iris)
```
Created on 2024-09-18 with [reprex v2.1.1](https://reprex.tidyverse.org/)

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Implement DISTINCT ON for dbplyr backend #384

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Implement DISTINCT ON for dbplyr backend #384

Description

Metadata

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Issue actions