Sometimes the strategy will be tangled in with many other arguments, or there might be multiple strategies used simultaneously.
In these situations you want to avoid creating a combinatorial explosion of functions, and instead might want to use a strategy object.
For example, generating the bins for a histogram is a surprisingly complex topic. ggplot2::stat_bin(), which powers ggplot2::geom_histogram(), has a total of 5 arguments that control where the bins are placed:
You can supply either binwidth or bins to specify either the width or the number of evenly spaced bins. Alternatively, you can supply breaks to specify the exact bin locations yourself (which allows you to create unevenly sized bins¹).
If you use binwidth or bins, you're specifying the width of each bin, but not where the bins start. So additionally you can use either boundary or center² to specify the location of a side (boundary) or the middle (center) of a bin³. boundary and center are mutually exclusive; you can only specify one (see @sec-mutually-exclusive for more).
Regardless of how you specify the locations of the bins, you need to choose whether a bin from a to b is [a, b) or (a, b], which is the job of the closed argument.
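To make the interaction between these arguments concrete, here is a minimal base R sketch. It is not how stat_bin() is implemented: breaks_from_width() is a hypothetical helper written for this illustration, and cut() stands in for the binning step to show what closed controls.

```r
# Hypothetical helper: evenly spaced breaks of a given width that cover `range`,
# aligned so that `boundary` falls on a bin edge.
breaks_from_width <- function(range, width, boundary = 0) {
  # Largest aligned edge that is <= the lower end of the range
  start <- boundary + floor((range[1] - boundary) / width) * width
  # Number of bins needed to reach the upper end of the range
  n <- ceiling((range[2] - start) / width)
  start + width * seq(0, n)
}

rng <- c(0.3, 4.2)

# `boundary = 0`: a bin edge sits at 0
by_boundary <- breaks_from_width(rng, width = 1, boundary = 0)

# `center = 0` is equivalent to an edge at 0 - width / 2
by_center <- breaks_from_width(rng, width = 1, boundary = 0 - 1 / 2)

# `closed` decides which bin receives a value lying exactly on a break
left_closed  <- cut(1, breaks = c(0, 1, 2), right = FALSE)  # bins are [a, b)
right_closed <- cut(1, breaks = c(0, 1, 2), right = TRUE)   # bins are (a, b]
```

With the same width, boundary and center only shift the grid of breaks by half a bin, which is why supplying both would be contradictory.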
One way to resolve this problem would be to encapsulate the three basic strategies into three functions:
bin_width(width, center, boundary, closed)
bin_number(bins, center, boundary, closed)
bin_breaks(breaks, closed)
That immediately makes the relationship between the arguments and the strategies clearer.
Note that these functions create "strategies"; i.e. they don't take the data needed to actually perform the operation: none of these functions takes the range of the data.
This makes these functions function factories, which is a relatively complex technique.
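bin_width <- function(width, center, boundary, closed = c("left", "right")) {
  # Force the arguments now so the returned function captures their values; see
  # https://adv-r.hadley.nz/function-factories.html#forcing-evaluation
  list(width, center, boundary, closed)
  function(range) {
    # Compute the breaks for `range` here (left unimplemented in this sketch).
  }
}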
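One way the skeleton above might be filled in is sketched below. The NULL defaults, the error for supplying both center and boundary, the fallback to an edge at 0, and the shape of the returned list are all assumptions made for illustration; none of this is ggplot2's implementation.

```r
# Sketch of a completed strategy factory (illustrative assumptions throughout).
bin_width <- function(width, center = NULL, boundary = NULL,
                      closed = c("left", "right")) {
  # Evaluate every argument up front so the closure captures values
  force(width)
  closed <- match.arg(closed)
  if (!is.null(center) && !is.null(boundary)) {
    stop("Only one of `center` and `boundary` may be supplied.")
  }
  # Express both options as the position of a bin edge; falling back to an
  # edge at 0 is an arbitrary choice for this sketch.
  edge <- if (!is.null(center)) {
    center - width / 2
  } else if (!is.null(boundary)) {
    boundary
  } else {
    0
  }

  # The strategy itself: it only needs the range of the data
  function(range) {
    start <- edge + floor((range[1] - edge) / width) * width
    n <- ceiling((range[2] - start) / width)
    list(breaks = start + width * seq(0, n), closed = closed)
  }
}

# The strategy is created without any data ...
strategy <- bin_width(0.5, boundary = 0)
# ... and applied later, once the range is known
result <- strategy(c(0.2, 1.6))
```

The call that creates the strategy and the call that uses it are separate steps, which is what lets a function like stat_bin() accept a strategy as a single argument and supply the range itself.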
As in @sec-argument-clutter, you may want to give the strategies that these functions return a custom class so that the function that uses them can provide better error messages if the user supplies the wrong type of object.
Alternatively, you might want to just check that the input is a function with the correct formals; that allows the user to supply their own strategy function.
It's probably something that few people will take advantage of, but it's a nice escape hatch.
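Both checks can be sketched together. The names new_bin_strategy(), check_bin_strategy(), and the bin_strategy class are hypothetical, as is the convention that a strategy takes a single argument called range.

```r
# Hypothetical constructor: tag a strategy function with a custom class
new_bin_strategy <- function(fn) {
  structure(fn, class = c("bin_strategy", "function"))
}

# Hypothetical check, as the function consuming the strategy might perform it
check_bin_strategy <- function(x, arg = "bins") {
  # Preferred path: an object created by one of the strategy functions
  if (inherits(x, "bin_strategy")) {
    return(invisible(x))
  }
  # Escape hatch: any function whose only argument is `range`
  if (is.function(x) && identical(names(formals(x)), "range")) {
    return(invisible(x))
  }
  stop(
    "`", arg, "` must be a bin strategy or a function of `range`.",
    call. = FALSE
  )
}

# Accepted: a classed strategy
check_bin_strategy(new_bin_strategy(function(range) range))
# Accepted: a user-written function with the expected formals
check_bin_strategy(function(range) seq(range[1], range[2], length.out = 11))
```

Anything else, such as a bare number or a function with different arguments, fails with a message that names the offending argument.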
Footnotes
1. One nice application of this principle is to create a histogram where each bin contains (approximately) the same number of points, as implemented in https://github.com/eliocamp/ggpercentogram/.
2. center is also a little problematic as an argument name, because UK English would prefer centre. It's probably ok here since it's a very rarely used argument, but middle would be a good alternative that doesn't have the same US/UK problem. Alternatively, the pair could be endpoint and midpoint, which perhaps suggests a tighter pairing than center and boundary.
3. It can be any bin; stat_bin() will automatically adjust all the other bins.