forked from tidyverse/design
-
Notifications
You must be signed in to change notification settings - Fork 0
/
Copy pathdefaults-short-and-sweet.qmd
155 lines (112 loc) · 5.97 KB
/
defaults-short-and-sweet.qmd
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
# Keep defaults short and sweet {#sec-defaults-short-and-sweet}
```{r}
#| include = FALSE
source("common.R")
```
```{r}
#| eval = FALSE,
#| include = FALSE
source("fun_def.R")
funs <- c(pkg_funs("base"), pkg_funs("stats"))
arg_length <- function(x) map_int(x$formals, ~ nchar(expr_text(.x)))
args <- map(funs, arg_length)
args_max <- map_dbl(args, ~ if (length(.x) == 0) 0 else max(.x))
funs[args_max > 50] %>% discard(~ grepl("as.data.frame", .x$name, fixed = TRUE))
```
## What's the pattern?
Default values should be short and sweet.
Avoid large or complex calculations in the default values, instead using `NULL` or helper function to where more calculation is needed.
This keeps the function specification focussed on the big picture (i.e. what are the arguments and are they required or not) rather than the details of the defaults.
## What are some examples?
The following examples, drawn from base R, illustrate some functions that don't follow this pattern:
- `sample.int()` uses a complicated rule to determine whether or not to use a faster hash based method that's only applicable in some circumstances: `useHash = (!replace && is.null(prob) && size <= n/2 && n > 1e+07))`
- `exists()`, which figures out if a variable exists in a given environment, uses a complex default to determine which environment to look in if not specifically provided: `envir = (if (missing(frame)) as.environment(where) else sys.frame(frame))` (NB: `?exists` cheats and hides the long default in the documentation.)
- `reshape()` has a very long default argument: the `split` argument is one of two possible lists depending on the value of the `sep` argument:
```{r}
#| eval = FALSE
reshape <- function(
...,
split = if (sep == "") {
list(regexp = "[A-Za-z][0-9]", include = TRUE)
} else {
list(regexp = sep, include = FALSE, fixed = TRUE)
}
) {}
```
## How do I use it?
There are three approaches:
- Set the default value to `NULL` and calculate the default only when the argument is `NULL`.
Providing a default of `NULL` signals that the argument is optional (@sec-required-no-defaults) but that the default requires some calculation.
- If the calculation is complex, and the user might find it useful in other scenarios, compute it with an exported function that documents exactly what happens.
- If `NULL` is meaningful, so you can't use the first approach, use a "sentinel" object instead.
### `NULL` default {#sec-arg-short-null}
The most common approach is to use `NULL` as a sentinel value that indicates that the argument is optional, but non-trivial.
This pattern is made substantially more elegant with the infix `%||%` operator.
You can either get it by importing it from rlang, or copying and pasting it in to your `utils.R`:
```{r}
`%||%` <- function(x, y) if (is.null(x)) y else x
```
This allows you to write code like this (extracted from `ggplot2::geom_bar()`).
It computes the width by first looking at the data, then in the parameters, finally falling back to computing it from the resolution of the `x` variable:
```{r}
#| eval = FALSE
width <- data$width %||% params$width %||% (resolution(data$x, FALSE) * 0.9)
```
Or this code from the colourbar legend: it finds the horizontal justification by first looking in the guide settings, then in the specific theme setting, then then title element, finally using `0` if nothing else is set:
```{r}
#| eval = FALSE
title.hjust <- guide$title.hjust %||%
theme$legend.title.align %||%
title.theme$hjust %||%
0
```
As you can see, `%||%` is particularly well suited to arguments where the default value is found through a cascading system of fallbacks.
Don't use `%||%` for more complex examples where the individual clauses can't fit on their own line.
For example in `reshape()` I would set `split = NULL` and then write:
```{r}
#| eval = FALSE
if (is.null(split)) {
if (sep == "") {
split <- list(regexp = "[A-Za-z][0-9]", include = TRUE)
} else {
split <- list(regexp = sep, include = FALSE, fixed = TRUE)
}
}
```
### Helper function
For more complicated cases, you'll probably want to pull the code that computes the default out into a separate function, and in many cases you'll want to export (and document) the function.
A good example of this pattern is `readr::show_progress()`: it's used in every `read_` function in readr and it's sufficiently complicated that you don't want to copy and paste it between functions.
It's also nice to document it in its own file, rather than cluttering up file reading functions with incidental details.
### Sentinel value {#sec-args-default-sentinel}
Sometimes a default argument has a complex calculation that you don't want to include in arguments list.
You'd normally use `NULL` to indicate that it's calculated by default, but `NULL` is a meaningful option.
In that case, you can use a **sentinel** object.
```{r}
str(ggplot2::waiver())
str(purrr::done())
str(rlang::zap())
```
Take `purrr::reduce()`: it has an optional details argument called `init`.
When supplied, it serves as the initial value for the computation.
But any value (including `NULL`) can a valid value.
And using a sentinel value for this one case seemed like overkill.
## How do I remediate existing problems?
If you have a function with a long default, you can use any of the three approaches above to remediate it.
As long as you don't accidentally change the default value, this does not affect the function interface.
Make sure you have a test for the default operation of the function before embarking on this change.
```{r}
# BEFORE
sample.int <- function(n,
size = n,
replace = FALSE,
prob = NULL,
useHash = (!replace && is.null(prob) && size <= n/2 && n > 1e+07)
) {
...
}
# AFTER
sample.int <- function(n, size = n, replace = FALSE, prob = NULL, useHash = NULL) {
useHash <- useHash %||% (!replace && is.null(prob) && size <= n/2 && n > 1e+07)
...
}
```