Feature request: function for splitting / unflattening lists #1127

Open
prototaxites opened this issue May 14, 2024 · 1 comment
Labels
feature a feature request or enhancement tidy-dev-day 🤓 Tidyverse Developer Day rstd.io/tidy-dev-day

Comments

prototaxites commented May 14, 2024

{purrr} provides list_flatten(), which takes a list and removes a single layer of hierarchy. However, it would be quite useful to be able to do the reverse and "unflatten" a list or vector into a list of lists. While an exact inverse of a flatten operation is likely to be impossible, this could be usefully implemented by letting the user specify either a grouping vector (a character or numeric vector the same length as the list to "unflatten") or a chunk size at which to aggregate.

This would be useful when a user has a list or vector and a function that can operate on subsets of that vector rather than only on individual elements, especially when operating over chunks gives a speed gain that operating over the whole vector at once does not.

For example, in my case I have a vector x of column indices of a matrix and want to do some matrix multiplication. I can do this in a single operation without purrr, but for a large matrix that is likely to be slow. I can also do it column-wise by mapping over x, but that can be slow too, depending on the number of columns. It would be useful to be able to split x into a list of equal-sized chunks, so that I can look for an optimal chunk size for the computation before combining the final output with reduce(). (Note that in my case the full computation is very slow because I am using rvar types from the {posterior} package rather than scalars.)

library(purrr)

beta <- matrix(rnorm(100000, 0, 1), ncol = 10000)
mat <- matrix(runif(10000, 0, 1), ncol = 10)
x <- 1:10000

## single computation
dim(mat %*% beta)
# [1] 1000 10000

## completely split computation
dim(
  map(x, \(y) mat %*% beta[, y]) |>
    reduce(cbind)
)
# [1] 1000 10000

## chunked computation - chunks of size 20
chunk_size <- 20
z <- seq_along(x)
chunks <- split(x, ceiling(z/chunk_size))

dim(
  map(chunks, \(y) mat %*% beta[, y]) |>
    reduce(cbind)
)
# [1] 1000 10000
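
To look for a good chunk size in practice, the chunked computation above can be wrapped in a function and timed over a few candidate sizes. The following is only a rough sketch on a smaller matrix: chunked_product() and the candidate sizes are illustrative assumptions, and the timings will vary with the machine and the BLAS in use.

```r
library(purrr)

set.seed(1)
beta <- matrix(rnorm(10 * 2000), ncol = 2000)  # 10 x 2000 (smaller than above)
mat  <- matrix(runif(1000 * 10), ncol = 10)    # 1000 x 10
x    <- seq_len(ncol(beta))

# Hypothetical helper: multiply in chunks of `chunk_size` columns,
# then bind the partial results back together column-wise.
chunked_product <- function(chunk_size) {
  chunks <- split(x, ceiling(seq_along(x) / chunk_size))
  map(chunks, \(y) mat %*% beta[, y, drop = FALSE]) |>
    reduce(cbind)
}

# Elapsed time (seconds) for each candidate chunk size
chunk_sizes <- c(1, 20, 200, 2000)
timings <- map_dbl(
  chunk_sizes,
  \(s) system.time(chunked_product(s))[["elapsed"]]
)
names(timings) <- chunk_sizes
timings

# Sanity check: chunking should not change the result
same_result <- isTRUE(all.equal(chunked_product(20), mat %*% beta))
```

For more reliable numbers, each size would need to be timed several times rather than once.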

The proposal would add something like the following:

## split into even-sized chunks of 20 elements each (only the first three chunks are shown)
chunks <- list_split(x, n = 20)
# [[1]]
# [1]  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20
# 
# [[2]]
# [1] 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40
# 
# [[3]]
# [1] 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

# some random grouping vector, one entry per element being split - this is similar to how base split() works now
group_vec <- sample(letters[1:3], 100, replace = TRUE)
chunks <- list_split(x[1:100], groups = group_vec)
# $a
# [1]  10  12  16  28  34  41  51  52  60  65  68  70  71  72  73  78  83  90...
# 
# $b
# [1]  1  2  3  4  5  8 11 15 17 19 23 29 33 35 38 40 42 43 44 47 49 54 55...
# 
# $c
# [1]  6  7  9 13 14 18 20 21 22 24 25 26 27 30 31 32 36 37 39 45 46 48...


map(chunks, some_function)
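
One thing to keep in mind when the groups are not contiguous: results come back in group order, not in the original order of x. The sketch below uses base split() as a stand-in for the proposed list_split() (which does not exist in purrr) and restores the original column order after recombining; the matrix sizes are arbitrary.

```r
library(purrr)

set.seed(42)
beta <- matrix(rnorm(10 * 100), ncol = 100)  # 10 x 100
mat  <- matrix(runif(50 * 10), ncol = 10)    # 50 x 10
x    <- seq_len(ncol(beta))

# Random grouping vector with one entry per element of x
group_vec <- sample(letters[1:3], length(x), replace = TRUE)
chunks <- split(x, group_vec)  # stand-in for list_split(x, groups = group_vec)

res <- map(chunks, \(y) mat %*% beta[, y, drop = FALSE]) |>
  reduce(cbind)

# Columns of `res` follow the group order; put them back in the order of x
original_order <- order(unlist(chunks, use.names = FALSE))
res <- res[, original_order]

matches_full <- isTRUE(all.equal(res, mat %*% beta))
```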

I see that a similar function was proposed and closed in #274, but I think this proposal differs in that it is specifically about splitting/unflattening lists rather than data frame rows.

hadley (Member) commented Jul 15, 2024

I think that the implementation of this would be relatively straightforward since you could use vec_chop(); you'd just have to figure out how to generate the right index vectors. It'll also require some thinking about the interface, since you might want to provide the number of groups, the group size, or an actual vector of group ids.
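
As a rough illustration of that idea, here is one way the index generation could look. This is a sketch, not purrr's implementation: list_split() is the name from this issue, the argument names n_groups, size and groups are hypothetical, and all three cases are reduced to a vector of group ids before calling vctrs::vec_group_loc() and vctrs::vec_chop().

```r
library(vctrs)

# Hypothetical helper; not part of purrr or vctrs.
list_split <- function(x, n_groups = NULL, size = NULL, groups = NULL) {
  supplied <- sum(!is.null(n_groups), !is.null(size), !is.null(groups))
  if (supplied != 1) {
    stop("Supply exactly one of `n_groups`, `size`, or `groups`.")
  }

  n <- vec_size(x)
  use_names <- !is.null(groups)  # only name the output for explicit group ids

  if (!is.null(size)) {
    # consecutive chunks of `size` elements; the last chunk may be shorter
    groups <- ceiling(seq_len(n) / size)
  } else if (!is.null(n_groups)) {
    # `n_groups` consecutive chunks of near-equal size
    groups <- ceiling(seq_len(n) * n_groups / n)
  } else if (vec_size(groups) != n) {
    stop("`groups` must be the same size as `x`.")
  }

  # One integer index vector per distinct group id, in order of first appearance
  loc <- vec_group_loc(groups)
  out <- vec_chop(x, indices = loc$loc)
  if (use_names) names(out) <- as.character(loc$key)
  out
}

by_size   <- list_split(1:10, size = 4)
by_number <- list_split(1:10, n_groups = 3)
by_id     <- list_split(letters[1:6], groups = c("a", "b", "a", "c", "b", "a"))
```

Note that in this sketch groups come back in order of first appearance, whereas base split() orders them by factor level; which behaviour is preferable is one of the interface questions raised above.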

@hadley hadley added feature a feature request or enhancement tidy-dev-day 🤓 Tidyverse Developer Day rstd.io/tidy-dev-day labels Jul 15, 2024