Commit

Rename chapters in scannable definitions section
hadley committed Jul 25, 2023
1 parent 07457af commit 9acdeb8
Showing 16 changed files with 87 additions and 30 deletions.
14 changes: 7 additions & 7 deletions _quarto.yml
@@ -28,13 +28,13 @@ book:

- part: Scannable definitions
chapters:
- args-data-details.qmd
- args-hidden.qmd
- dots-position.qmd
- def-required.qmd
- args-independence.qmd
- def-enum.qmd
- def-short.qmd
- important-args-first.qmd
- inputs-explicit.qmd
- dots-after-required.qmd
- required-no-defaults.qmd
- arguments-independent.qmd
- enumerate-options.qmd
- defaults-short-and-sweet.qmd
- cs-setNames.qmd
- cs-stringr.qmd

2 changes: 1 addition & 1 deletion args-independence.qmd → arguments-independent.qmd
@@ -1,4 +1,4 @@
# Keep arguments independent {#sec-args-independence}
# Keep arguments independent {#sec-arguments-independent}

```{r}
#| include = FALSE
2 changes: 1 addition & 1 deletion cs-setNames.qmd
@@ -59,7 +59,7 @@ It was defined this way to make it possible to name a character vector with itse
setNames(nm = c("apple", "banana", "cake"))
```

But that decision leads to a function signature that violates one of the principles of @sec-args-data-details: a required argument comes after an optional argument.
But that decision leads to a function signature that violates one of the principles of @sec-important-args-first: a required argument comes after an optional argument.
Fortunately, we can fix this easily and still preserve the useful ability to name a vector with itself:

```{r}
2 changes: 1 addition & 1 deletion cs-stringr.qmd
@@ -23,7 +23,7 @@ Part of this problem could be resolved by making it more clear that this functio
- Perl-style regular expressions (`perl = TRUE`)
- Fixed matching (`fixed = TRUE`).

One way to make this more clear would be to use @sec-def-enum and create a new argument called something like `engine = c("POSIX", "perl", "fixed")`.
One way to make this more clear would be to use @sec-enumerate-options and create a new argument called something like `engine = c("POSIX", "perl", "fixed")`.

The other problem is that `ignore.case` only works with two of the three engines: POSIX and perl.
This is hard to remedy without creating a completely new matching engine for fixed case, which is particularly hard because different languages have different rules for case.
2 changes: 1 addition & 1 deletion def-inform.qmd
@@ -10,7 +10,7 @@ library(dplyr, warn.conflicts = FALSE)

If a default value is important, and the computation is non-trivial, inform the user what value was used.
This is particularly important when the default value is an educated guess, and you want the user to change it.
It is also important when descriptor arguments (@sec-args-data-details) have defaults.
It is also important when descriptor arguments (@sec-important-args-first) have defaults.

## What are some examples?

2 changes: 1 addition & 1 deletion def-magical.qmd
@@ -128,7 +128,7 @@ This problem is generally easy to avoid for new functions:
## How do I remediate the problem?
If you have made a mistake in an older function, you can remediate it by using a `NULL` default, as described in @sec-def-short.
If you have made a mistake in an older function, you can remediate it by using a `NULL` default, as described in @sec-defaults-short-and-sweet.
If the problem is caused by an unexported function, you can also choose to document and export it.
Remediating this problem shouldn't break existing code, because it expands the function interface: all previous code will continue to work, and the function will also work if the argument is passed `NULL` input (which it probably didn't previously).
2 changes: 1 addition & 1 deletion def-user.qmd
@@ -15,7 +15,7 @@ The two primary uses are for controlling the appearance of output, particularly

Related patterns:

- If a global option affects the results of the computation (not just its side-effects), you have an example of @sec-args-hidden.
- If a global option affects the results of the computation (not just its side-effects), you have an example of @sec-inputs-explicit.

## What are some examples?

4 changes: 2 additions & 2 deletions def-short.qmd → defaults-short-and-sweet.qmd
@@ -1,4 +1,4 @@
# Keep defaults short and sweet {#sec-def-short}
# Keep defaults short and sweet {#sec-defaults-short-and-sweet}

```{r}
#| include = FALSE
@@ -52,7 +52,7 @@ The following examples, drawn from base R, illustrate some functions that don't
There are three approaches:
- Set the default value to `NULL` and calculate the default only when the argument is `NULL`.
Providing a default of `NULL` signals that the argument is optional (@sec-def-required) but that the default requires some calculation.
Providing a default of `NULL` signals that the argument is optional (@sec-required-no-defaults) but that the default requires some calculation.
- If the calculation is complex, and the user might find it useful in other scenarios, compute it with an exported function that documents exactly what happens.
4 changes: 2 additions & 2 deletions dots-position.qmd → dots-after-required.qmd
@@ -1,4 +1,4 @@
# Put `...` before optional arguments {#sec-dots-position}
# Put `...` after required arguments {#sec-dots-after-required}

```{r}
#| include = FALSE
@@ -40,7 +40,7 @@ Not only does `...` allow confusing code[^dots-position-1], it also makes it hard to l

If `mean()` instead placed `...` before `trim` and `na.rm`, like `mean2()`[^dots-position-2] below, then you must fully name each argument:

[^dots-position-2]: Note that I moved `na.rm = TRUE` in front of `trim` because I believe `na.rm` is the more important argument: it's used vastly more often than `trim`, and I'm following @sec-args-data-details.
[^dots-position-2]: Note that I moved `na.rm = TRUE` in front of `trim` because I believe `na.rm` is the more important argument: it's used vastly more often than `trim`, and I'm following @sec-important-args-first.

```{r}
mean2 <- function(x, ..., na.rm = FALSE, trim = 0) {
2 changes: 1 addition & 1 deletion dots-data.qmd
@@ -39,7 +39,7 @@ In general, I think it is best to avoid using `...` for this purpose because it
mean(1, 2, 3)
```
(See @sec-dots-position to learn why this doesn't give an error message.)
(See @sec-dots-after-required to learn why this doesn't give an error message.)
- It makes it harder to adapt the function for new uses.
For example, `fct_relevel()` can also be called with a function:
2 changes: 1 addition & 1 deletion dots-inspect.qmd
@@ -90,7 +90,7 @@ str_sort(x, numeric = TRUE)
This wrapper is useful because it decouples `str_sort()` from `stri_opts_collator()`, meaning that if `stri_opts_collator()` gains new arguments, users of `str_sort()` can take advantage of them immediately.
But most of the arguments in `stri_opts_collator()` are sufficiently arcane that they don't need to be exposed directly in stringr, which is designed to minimise the user's cognitive load by hiding some of the full complexity of string handling.

(The importance of the `locale` argument comes up in "hidden inputs", @sec-args-hidden.)
(The importance of the `locale` argument comes up in "hidden inputs", @sec-inputs-explicit.)

However, `stri_opts_collator()` deliberately ignores any arguments in `...`.
This means that misspellings are silently ignored:
4 changes: 2 additions & 2 deletions def-enum.qmd → enumerate-options.qmd
@@ -1,4 +1,4 @@
# Enumerate possible options {#sec-def-enum}
# Enumerate possible options {#sec-enumerate-options}

```{r}
#| include = FALSE
@@ -115,7 +115,7 @@ rank2(x, ties.method = "r")

This technique is best used when the set of possible values is short.
You can see that it's already getting unwieldy in `rank()`.
If you have a long list of possibilities, there are two options that you could use from @sec-def-short.
If you have a long list of possibilities, there are two options that you could use from @sec-defaults-short-and-sweet.
Unfortunately both approaches have major downsides:

- Set a single default and supply the possible values to `match.arg()`:
57 changes: 57 additions & 0 deletions important-args-first.qmd
@@ -0,0 +1,57 @@
# Put the most important arguments first {#sec-important-args-first}

```{r}
#| include = FALSE
source("common.R")
```

## What's the pattern?

In a function call, the most important arguments should come first.
How can you tell what arguments are most important?
That's largely a judgement call that you'll need to make based on your beliefs about which arguments will be used most commonly.
However, there are a few general principles:

- If the output is a transformation of an input (e.g. `log()`, `stringr::str_replace()`, `dplyr::left_join()`), that input is the most important argument.
- Other arguments that determine the type or shape of the output are typically very important.
- Optional arguments (i.e. arguments with a default) are the least important, and should come last.
- If the function uses `...`, the optional arguments should come after `...`; see @sec-dots-after-required for more details.

This convention makes it easy to understand the structure of a function at a glance: the more important an argument is, the earlier you see it.
When the output is very strongly tied to an input, putting that input first also ensures that your function works well with the pipe, leading to code that focuses on the transformations rather than the object being transformed.
I believe that always making the first argument the object being transformed helps make stringr and purrr functions easier to learn than their base equivalents.
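
To make the ordering concrete, here is a minimal sketch using a hypothetical function, `add_affixes()`, which is not part of any package. The vector being transformed comes first, the required detail second, and the optional argument (with a default) last. Because the transformed object is first, the function drops straight into a pipeline. The sketch assumes R 4.1 or later for the native pipe `|>`.

```{r}
# Hypothetical function for illustration only (not from any package).
# `x` is the object being transformed, `prefix` is a required detail,
# and `suffix` is optional, so it has a default and comes last.
add_affixes <- function(x, prefix, suffix = "") {
  paste0(prefix, x, suffix)
}

# With `x` first, each step of the pipeline reads as a transformation
# of the same object, and no argument needs to be named.
fruit <- c("apple", "banana") |>
  add_affixes("fruit: ") |>
  toupper()
fruit
```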

## What are some examples?

The vast majority of functions get this right, so we'll pick on a few examples which I think get it wrong:

- I think the arguments to base R string functions (`grepl()`, `gsub()`, etc) are in the wrong order because they consistently make the regular expression (`pattern`) the first argument, rather than the character vector being manipulated.
I think the character vector is more important because it's the argument that fundamentally determines the size of the output.

- The first two arguments to `lm()` are `formula` and `data`.
I'd argue that `data` should be the first argument; even though it doesn't affect the shape of the output (which is always an lm S3 object), it affects the shape of the output of many important functions like `predict()`.
However, the designers of `lm()` wanted `data` to be optional, so you could still fit models even if you hadn't collected the individual variables into a data frame.
Because `formula` is required and `data` is not, `formula` must come first.

- The first two arguments to `ggplot()` are `data` and `mapping`.
Both data and mapping are required for every plot, so why make `data` first?
I picked this ordering because in most plots there's one dataset shared across all layers and only the mapping changes.

It's worth noting that the layer functions, like `geom_point()`, flip the order of these arguments, because in an individual layer you're more likely to specify `mapping` than `data`, and in many cases if you do specify `data` you'll want `mapping` as well.
This makes the argument order inconsistent with `ggplot()`, but I think time has shown it to be a reasonable design decision.

- ggplot2 functions work by creating some object that's then added on to a plot object, so the plot, which is arguably the most important argument, is not used at all.
ggplot2 works this way in part because it was invented before the pipe was discovered, and the best way I came up with to write plots from left to right was to rely on `+` (so-called operator overloading).
As an interesting historical fact, ggplot (the precursor to ggplot2) actually works great with the pipe, and a couple of years ago I brought it back to life as [ggplot1](https://github.com/hadley/ggplot1).
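
The two base R examples in the list above can be seen directly in code. In both cases, using the function in a pipeline requires naming an argument, because the object you'd naturally pipe in is not the first argument. By contrast, stringr's `str_detect(string, pattern)` puts the vector first, so `x |> str_detect("an")` works without naming anything. This sketch assumes R 4.2 or later, for the native pipe's `_` placeholder.

```{r}
x <- c("apple", "banana", "cherry")

# grepl(pattern, x): the regular expression comes first, so the vector
# being searched must be passed second, or by name when piping.
grepl("an", x)
x |> grepl(pattern = "an")

# lm(formula, data): `formula` is required and `data` is optional, so
# `formula` comes first. Piping a data frame in needs the `_`
# placeholder; otherwise the data frame would be matched to `formula`.
fit1 <- lm(mpg ~ wt, data = mtcars)
fit2 <- mtcars |> lm(mpg ~ wt, data = _)

# Both calls fit the same model.
all.equal(coef(fit1), coef(fit2))
```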

## How do I remediate past mistakes?

Generally, it is not possible to remediate an existing exported function with this problem.
Typically, you will need to perform major surgery on the function arguments, and this will convey different conventions about which arguments should be named.
This implies that you should deprecate the entire existing function and replace it with a new alternative.
Because this is invasive to the user, it's best done sparingly: if the mistake is minor, you're better off waiting until you've collected other problems before fixing it.
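
A minimal sketch of that deprecate-and-replace approach, using hypothetical `detect()` and `detect2()` functions that don't exist in any package: the new function fixes the argument order, and the old one keeps working but warns and forwards to its replacement. It uses base R's `.Deprecated()` to stay dependency-free; tidyverse packages typically use the lifecycle package for the same job.

```{r}
# Hypothetical replacement with the corrected order: the vector being
# searched comes first, the pattern second.
detect2 <- function(x, pattern) {
  grepl(pattern, x)
}

# Hypothetical original with the old order. It still returns the same
# result, but .Deprecated() signals a warning pointing users at detect2().
detect <- function(pattern, x) {
  .Deprecated("detect2")
  detect2(x, pattern)
}

detect2(c("apple", "banana"), "an")
suppressWarnings(detect("an", c("apple", "banana")))
```

Existing code that calls `detect()` keeps working, so users can migrate on their own schedule.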

For example, take `tidyr::gather()`.
It has a number of problems with its design that made it hard to use.
Relevant to this chapter is that the argument order is wrong, because you almost always want to specify which variables to gather, which is the fourth argument, not the second (after the `data`).
Because it wasn't possible to easily fix this mistake, we accumulated other `gather()` problems for several years before fixing them all at once in `pivot_longer()`.
2 changes: 1 addition & 1 deletion args-hidden.qmd → inputs-explicit.qmd
@@ -1,4 +1,4 @@
# Make inputs explicit {#sec-args-hidden}
# Make inputs explicit {#sec-inputs-explicit}

```{r}
#| include = FALSE
2 changes: 1 addition & 1 deletion def-required.qmd → required-no-defaults.qmd
@@ -1,4 +1,4 @@
# Required args don't have defaults {#sec-def-required}
# Required args don't have defaults {#sec-required-no-defaults}

```{r}
#| include = FALSE
14 changes: 7 additions & 7 deletions structure.qmd
@@ -8,23 +8,23 @@ A function's interface describes how, independent of its internal implementation

### Inputs

- @sec-args-hidden: All inputs to a function should be explicit arguments.
- @sec-inputs-explicit: All inputs to a function should be explicit arguments.
Avoid functions that surprise the user by returning different results when the inputs look the same.

- @sec-args-data-details: Required arguments should come before optional arguments.
- @sec-important-args-first: Required arguments should come before optional arguments.

- @sec-args-independence: Make arguments as orthogonal as possible.
- @sec-arguments-independent: Make arguments as orthogonal as possible.
Avoid complex interdependencies.

#### Default values

- @sec-def-required: the absence of a default value should indicate that an argument is required; the presence of a default value should indicate that an argument is optional.
- @sec-required-no-defaults: the absence of a default value should indicate that an argument is required; the presence of a default value should indicate that an argument is optional.

- @sec-def-enum: If a details argument can take one of a fixed set of possible strings, record them in the default value and use `match.arg()` or `rlang::arg_match()` inside the function.
- @sec-enumerate-options: If a details argument can take one of a fixed set of possible strings, record them in the default value and use `match.arg()` or `rlang::arg_match()` inside the function.

- @sec-def-magical: Default values should return the same answer when set directly.

- @sec-def-short: Default values should be short.
- @sec-defaults-short-and-sweet: Default values should be short.
If you have a complex calculation, either use `NULL` or an exported function.

- @sec-def-inform: If a default value is particularly important, or has a non-trivial calculation, let the user know what it is.
@@ -33,7 +33,7 @@ A function's interface describes how, independent of its internal implementation

#### Dots

- @sec-dots-position: `...` should be placed between the data and details arguments.
- @sec-dots-after-required: `...` should be placed between the data and details arguments.

- @sec-dots-data: don't use `...` just to save the user from typing `c()` (unless the function is purely for data structure creation).
