
Can ggplot2 have a Stat that simply summarises data by group? #3501

Closed
@yutannihilation


Every time I encounter a question like #3497, I wonder why ggplot2 doesn't have a Stat that simply applies a function by group. Although it's generally more efficient to summarise the data before passing it to ggplot2, it would be handy to be able to summarise within ggplot2, especially when generating plots one after another with different groupings.
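For comparison, here is a minimal sketch of the "summarise first, then plot" route, using base R's aggregate() on toy data shaped like the reprex below. Each new grouping needs its own pre-summarised copy of the data, which is the repetition a by-group Stat would avoid.

```r
# Toy data shaped like the reprex (illustrative only).
d <- data.frame(x = c(1:5, 3:7), y = 1:10, g = rep(c("a", "b"), each = 5),
                stringsAsFactors = FALSE)

# Summarise outside ggplot2: one row per g with the mean of y
# (y = 3 for "a", y = 8 for "b").
by_g <- aggregate(y ~ g, data = d, FUN = mean)

# A different grouping means building another summarised data frame.
by_x <- aggregate(y ~ x, data = d, FUN = mean)
```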

I believe StatSummary could have been implemented to summarise data by groupings other than c("group", "x"), because the following code seems quite general:

ggplot2/R/stat-summary.r, lines 163 to 169 (at b842024):

summarise_by_x <- function(data, summary, ...) {
  summary <- dapply(data, c("group", "x"), summary, ...)
  unique <- dapply(data, c("group", "x"), uniquecols)
  unique$y <- NULL
  merge(summary, unique, by = c("x", "group"), sort = FALSE)
}
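To illustrate how little of summarise_by_x() depends on the c("group", "x") choice, here is a base-R sketch in which the grouping columns are a parameter. summarise_by() is a hypothetical name, and split()/lapply() stand in for ggplot2's internal dapply(); as in summarise_by_x(), summary is assumed to take one group's data frame and return a data frame.

```r
# Hypothetical generalisation of summarise_by_x(): `by` replaces the
# hard-coded c("group", "x"). Base R only, no ggplot2 internals.
summarise_by <- function(data, by, summary, ...) {
  # One data frame per observed combination of the `by` columns.
  pieces <- split(data, data[by], drop = TRUE)
  rows <- lapply(pieces, function(piece) {
    res <- summary(piece, ...)
    # Keep the grouping keys next to the summary, like the merge() step above.
    cbind(piece[1, by, drop = FALSE], res)
  })
  out <- do.call(rbind, unname(rows))
  rownames(out) <- NULL
  out
}

d2 <- data.frame(x = c(1, 1, 2, 2), y = c(1, 3, 5, 7), g = c("a", "a", "a", "b"))
mean_y <- function(df) data.frame(y = mean(df$y))

summarise_by(d2, c("g", "x"), mean_y)
#>   g x y
#> 1 a 1 2
#> 2 a 2 5
#> 3 b 2 7
```

Only the `by` argument changes between groupings; summarise_by(d2, "g", mean_y) gives one row per g instead.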

However, because the current make_summary_fun() expects a function that takes a vector rather than a data.frame, it would be difficult to extend StatSummary to accept a function that summarises both x and y. To meet that need, I think it might be nice to have a simple Stat like the one below.
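The vector-versus-data-frame distinction can be seen with two toy summary functions (both names are hypothetical). A function of the vector kind only ever receives the group's y values, which is what make_summary_fun() hands to fun.data, so it has no way to report an x range; a data-frame function sees every column of the group.

```r
# Vector-based: receives only y, so it can only return y-derived columns.
summary_vec <- function(y) data.frame(y = mean(y), ymin = min(y), ymax = max(y))

# Data-frame-based: receives the whole group, so it can summarise x and y together.
summary_df <- function(df) data.frame(x = min(df$x), xend = max(df$x), y = mean(df$y))

grp <- data.frame(x = c(3, 4, 5), y = c(6, 8, 10))
v <- summary_vec(grp$y)  # y = 8, ymin = 6, ymax = 10
s <- summary_df(grp)     # x = 3, xend = 5, y = 8
```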

I don't see any reason why we shouldn't implement such a Stat. Am I missing something?

library(ggplot2)

stat_summary_by_group <- function(mapping = NULL, data = NULL,
                                  geom = "pointrange", position = "identity",
                                  ...,
                                  fun.data = NULL,
                                  na.rm = FALSE,
                                  show.legend = NA,
                                  inherit.aes = TRUE) {
  layer(
    data = data,
    mapping = mapping,
    stat = StatSummaryByGroup,
    geom = geom,
    position = position,
    show.legend = show.legend,
    inherit.aes = inherit.aes,
    params = list(
      fun.data = fun.data,
      na.rm = na.rm,
      ...
    )
  )
}

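StatSummaryByGroup <- ggproto("StatSummaryByGroup", Stat,
  # compute_group() runs once per group; fun.data receives the whole group's
  # data frame, so it can summarise x and y together.
  compute_group = function(data, scales, fun.data = NULL, na.rm = FALSE) {
    summary <- fun.data(data)
    # Keep the columns that are constant within the group (PANEL, group, ...).
    unique <- ggplot2:::dapply(data, c("group"), ggplot2:::uniquecols)
    # Add (or overwrite) the summarised columns.
    unique[names(summary)] <- summary
    unique
  }
)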

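# Toy data: two groups whose x ranges overlap.
d <- data.frame(x = c(1:5, 3:7), y = 1:10, g = rep(c("a", "b"), each = 5),
                stringsAsFactors = FALSE)
# Summary function: takes one group's data frame and returns a single
# horizontal segment spanning its x range at its mean y.
f <- function(d) {
  data.frame(x = min(d$x), xend = max(d$x), y = mean(d$y), yend = mean(d$y))
}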

ggplot(d) +
  geom_point(aes(x, y, colour = g)) +
  stat_summary_by_group(fun.data = f, aes(x, y, xend = stat(xend), yend = stat(yend)), geom = "segment") +
  facet_grid(cols = vars(g))

Created on 2019-08-24 by the reprex package (v0.3.0)
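As a cross-check on the reprex, the segments it should draw can be computed in base R by applying the same f() once per value of g (each facet here contains a single group, since g is not mapped in the summary layer). This only reproduces the summary, not the Stat; in ggplot2, layer_data() on the plot object could be used to compare it with what the layer actually computed.

```r
d <- data.frame(x = c(1:5, 3:7), y = 1:10, g = rep(c("a", "b"), each = 5),
                stringsAsFactors = FALSE)
f <- function(d) {
  data.frame(x = min(d$x), xend = max(d$x), y = mean(d$y), yend = mean(d$y))
}

# Apply f() to each facet's data and stack the one-row results.
pieces <- split(d, d$g)
expected <- do.call(rbind, lapply(pieces, f))
expected$g <- names(pieces)
# "a": segment from x = 1 to 5 at y = 3; "b": from x = 3 to 7 at y = 8.
```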
