# Compound arguments {#sec-compound-arguments}
```{r}
#| include = FALSE
source("common.R")
```
## What's the pattern?
Sometimes it's useful to take the same data as either one complex argument (e.g. a matrix or data frame) or as multiple simple arguments.
I think the most compelling reason to use this pattern is when a function might be called either directly by a user (who will supply individual arguments) or with the output of another function (which needs to pour into a single argument).
For example, it seems reasonable that you should be able to feed the output of `str_locate()` directly into `str_sub()`:
```{r}
library(stringr)
x <- c("aaaaab", "aaab", "ccccb")
loc <- str_locate(x, "a+b")
str_sub(x, loc)
```
But equally, it's nice to be able to supply individual start and end values if known:
```{r}
str_sub("Hadley", start = 2, end = 4)
```
So `str_sub()` allows either individual vectors supplied to `start` and `end`, or a two-column matrix supplied to `start`.
The main challenge with using this pattern is that there's no way to make the pattern clear from the function signature alone, so you need to carefully document and generate error messages.
Because this is a very rare pattern, there are no helpers available so you'll have to think about much of it yourself.
I've included this pattern mostly as an example of something that we have only partly thought through, so that you can get some sense of the questions that we ask when developing something new.
## What are some examples?
- `rgb(cbind(r, g, b))` is equivalent to `rgb(r, g, b)`.
This is useful because what `rgb()` really operates on is colours, as specified by their red, green, and blue components.
When working interactively it's useful to set the components individually, but when operating on colour with functions, it's nice to have a single object.
- `options(list(a = 1, b = 2))` is equivalent to `options(a = 1, b = 2)`.
This is half of a very useful pattern.
The other half of that pattern is that `options()` returns the previous value of any options that you set.
That means you can do `old <- options(…); options(old)` to temporarily set then reset options.
- `withr::local_options()` and `withr::local_envvar()` work similarly: you can either supply a single list of values, or individually named values.
- `stringr::str_sub(x, cbind(start, end))` is equivalent to `str_sub(x, start, end)`.
I'm not sure if it would be better to accept a data frame or a matrix.
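
The base R equivalences in this list are easy to check for yourself. The following is a minimal sketch; the colour values and the choice of the `digits` and `scipen` options are arbitrary and only for illustration.

```{r}
# Two colours, given as separate red, green, and blue components in [0, 1].
r <- c(1, 0)
g <- c(0, 1)
b <- c(0, 0)

# The compound (matrix) form and the individual form give the same colours.
identical(rgb(cbind(r, g, b)), rgb(r, g, b))

# options() accepts a list and returns the previous values, so the
# set-then-reset idiom round-trips cleanly.
before <- getOption("digits")
old <- options(list(digits = 3, scipen = 5))
getOption("digits")
options(old)
identical(getOption("digits"), before)
```
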
## Open questions
This pattern is only used in a couple of places, so I still have a lot of questions:
- Is it better to use a matrix (simpler) or a data frame (more common)?
- Should you respect the names of the matrix/data.frame? Should you require that the names match the argument names? For matrices, you could do this only if the names are present, but data frames require names.
- Is it better to branch on the first argument being complex (like `str_sub()`) or the additional arguments being missing (like `rgb()`)?
- Is this pattern even worth it? Would it be clearer to just have a pair of functions? Or always require that the user use the complex form? Is this just a trick that I think is cool but very few other people would ever discover?
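
One way to make the names question concrete is to write the normalisation step down. The sketch below is not how stringr behaves; `as_start_end()` is a hypothetical helper, and its rule (match by name when the column names are exactly `start` and `end`, otherwise by position) is just one possible answer among several.

```{r}
# Hypothetical helper: normalise a compound argument to list(start, end).
# Neither the name nor the matching rule comes from stringr.
as_start_end <- function(x) {
  if (is.data.frame(x)) {
    x <- as.matrix(x)
  }
  stopifnot(is.matrix(x), ncol(x) == 2)

  nms <- colnames(x)
  if (!is.null(nms) && setequal(nms, c("start", "end"))) {
    # Names are present and match the arguments: use them in any column order.
    list(start = unname(x[, "start"]), end = unname(x[, "end"]))
  } else {
    # No usable names: fall back to column position.
    list(start = unname(x[, 1]), end = unname(x[, 2]))
  }
}

# Columns in the "wrong" order are still matched correctly by name.
as_start_end(cbind(end = c(4, 4), start = c(2, 1)))
# A data frame and an unnamed matrix are handled as well.
as_start_end(data.frame(start = c(2, 1), end = c(4, 4)))
as_start_end(cbind(c(2, 1), c(4, 4)))
```

Matching by name only when the names are an exact match keeps unnamed matrices working, but it also means a misspelled column name silently falls back to position, which is part of why the question is still open.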
## `dplyr::bind_rows()` and `!!!`
Another place that this pattern crops up is in `dplyr::bind_rows()`.
When binding rows together, it's just as useful to bind a few named data frames as it is to bind a list of data frames that comes from `map()` or similar.
In base R you need to know about `do.call(rbind, mylist)` which is a relatively sophisticated pattern.
So in dplyr we tried to make `bind_rows()` automatically figure out if you were in situation one or two.
Unfortunately, it turns out to be really hard to tell which of the situations you are in, so dplyr implemented heuristics that work most of the time, but occasionally fail in surprising ways.
Now we have generally steered away from interfaces that try to automatically splice their inputs and instead require that you use `!!!` to splice explicitly.
This has some advantages and disadvantages: it's an interface that's becoming increasingly common in the tidyverse (and we have a good convention for documenting it with the `<dynamic-dots>` tag), but it's still relatively rare and is an advanced technique that we don't expect everyone to learn.
That's why for this important case, we also have `purrr::list_rbind()`.
But it means that functions like `tidyr::hoist()`, `forcats::fct_cross()`, and `rvest::html_form_set()`, which are less commonly given lists, have a clearly documented escape hatch that doesn't require another different function.
(And of course, if you understand the `do.call()` pattern, you can still use that too.)
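
To make the two situations concrete, here is a minimal sketch that sticks to base R's `rbind()`. The `bind_all()` wrapper is hypothetical and exists only to show `!!!` with dynamic dots; it is not a dplyr or purrr function, and it is only defined if rlang is installed.

```{r}
# Data frames collected in a list, e.g. as returned by lapply() or map().
dfs <- list(data.frame(x = 1:2), data.frame(x = 3:4))

# Situation one: a few data frames supplied individually.
by_hand <- rbind(dfs[[1]], dfs[[2]])

# Situation two: do.call() spreads the list elements across the arguments.
from_list <- do.call(rbind, dfs)
identical(by_hand, from_list)

# With dynamic dots, the caller splices the list explicitly with `!!!`.
# `bind_all()` is a hypothetical wrapper, not part of any package.
if (requireNamespace("rlang", quietly = TRUE)) {
  bind_all <- function(...) do.call(rbind, rlang::list2(...))
  identical(bind_all(!!!dfs), by_hand)
}
```

The function never has to guess which situation it is in: a bare list is always one input, and a list prefixed with `!!!` is always spread across the dots.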
## How do I use the pattern?
To implement this pattern in your own functions, you should branch on the type of the first argument and then check that the others aren't supplied.
```{r}
str_sub <- function(string, start, end) {
  if (is.matrix(start)) {
    # Compound form: `start` carries both positions, so `end` must be absent.
    if (!missing(end)) {
      stop("`end` must be missing when `start` is a matrix", call. = FALSE)
    }
    if (ncol(start) != 2) {
      stop("Matrix `start` must have exactly two columns", call. = FALSE)
    }
    stringi::stri_sub(string, from = start[, 1], to = start[, 2])
  } else {
    # Simple form: `start` and `end` are individual vectors.
    stringi::stri_sub(string, from = start, to = end)
  }
}
```
And make it clear in the documentation:
```{r}
#' @param start,end Integer vectors giving the `start` (default: first)
#' and `end` (default: last) positions, inclusively.
#'
#' Alternatively, you can pass a two-column matrix to `start`, i.e.
#' `str_sub(x, start, end)` is equivalent to
#' `str_sub(x, cbind(start, end))`
```
(If you look at `stringr::str_sub()`, you'll notice that `start` and `end` do have defaults; I think this is a mistake because `start` and `end` are important enough that the user should always be forced to supply them.)
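
If you'd like to see the branching and the error messages in action without depending on stringi, the same structure can be sketched with base R's `substr()`. Here `sub_str()` is a hypothetical name chosen for illustration, and the inputs are made up; it mirrors the checks above rather than reproducing stringr's actual behaviour.

```{r}
# Hypothetical base R analogue of the wrapper above, built on substr().
sub_str <- function(string, start, end) {
  if (is.matrix(start)) {
    if (!missing(end)) {
      stop("`end` must be missing when `start` is a matrix", call. = FALSE)
    }
    if (ncol(start) != 2) {
      stop("Matrix `start` must have exactly two columns", call. = FALSE)
    }
    # Unpack the compound argument into the two simple ones.
    end <- start[, 2]
    start <- start[, 1]
  }
  substr(string, start, end)
}

x <- c("Hadley", "Wickham")
loc <- cbind(start = c(2, 1), end = c(4, 4))

# The compound form and the individual form give the same result.
sub_str(x, loc)
sub_str(x, start = c(2, 1), end = c(4, 4))

# Mixing the two forms is an error rather than a silent guess.
res <- try(sub_str(x, loc, end = 3), silent = TRUE)
inherits(res, "try-error")
```
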