Implement distinct #117

Open
PaulinaUrban opened this issue May 20, 2021 · 9 comments
Labels
feature a feature request or enhancement

Comments

@PaulinaUrban

PaulinaUrban commented May 20, 2021

I have a problem running calculations on several cores with multidplyr in R. I assign each row of my data a number (the data is grouped by this number, so rows with number 1 are sent to worker 1, and so on), as in the code below:


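library(dplyr, warn.conflicts = FALSE)
library(multidplyr)

# `dane` is my data frame; `cores` is the number of workers
group <- rep(1:cores, length.out = nrow(dane))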

dane <- bind_cols(tibble(group), dane)

cluster <- multidplyr::new_cluster(cores)

dane <-
  dane %>%
  group_by(group) %>%
  partition(cluster) 
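The `rep(1:cores, length.out = nrow(dane))` line above numbers the rows round-robin, so group sizes differ by at most one row; `partition()` then keeps every row of a group on the same worker. A small local check, assuming 3 workers and a hypothetical 7-row tibble `toy` in place of `dane` (multidplyr is not needed for this part):

```r
library(dplyr, warn.conflicts = FALSE)

cores <- 3                                  # assumed number of workers
toy <- tibble(value = letters[1:7])         # hypothetical stand-in for `dane`

# Round-robin group ids: 1, 2, 3, 1, 2, 3, 1
group <- rep(1:cores, length.out = nrow(toy))
toy <- bind_cols(tibble(group), toy)

# Rows per group differ by at most one: 3, 2 and 2 here
count(toy, group)
```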

I also send the required libraries, other values, and functions to each worker in the cluster.
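For reference, multidplyr has dedicated helpers for each of these: `cluster_library()` attaches packages, `cluster_copy()` copies existing objects by name, `cluster_assign()` creates values on every worker, and `cluster_call()` runs code on each worker and returns one list element per worker. A minimal sketch, assuming 2 workers; `scale01` and `threshold` are hypothetical names used only for illustration:

```r
library(multidplyr)

cluster <- new_cluster(2)                  # assumed: 2 workers

# Attach packages on every worker
cluster_library(cluster, "dplyr")

# Copy an existing object by name (a hypothetical helper function)
scale01 <- function(x) (x - min(x)) / (max(x) - min(x))
cluster_copy(cluster, "scale01")

# Create a value directly on every worker (hypothetical `threshold`)
cluster_assign(cluster, threshold = 10)

# Run code on each worker; the result is a list with one element per worker
cluster_call(cluster, scale01(c(0, 5, threshold)))
```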

After the data is split and sent to the cluster, I want to run the calculations and collect the results:

dane %>% select() %>% distinct() %>% ...

but unfortunately I get the following error and don't know how to solve it. (If I use unique() instead of distinct(), a different error appears.)

"Error in UseMethod("distinct"): no applicable method for 'distinct' applied to an object of class "multidplyr_party_df""

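Until a method exists, one workaround that uses only verbs multidplyr already supports is to narrow the columns on the workers, `collect()` the result, and call `distinct()` on the ordinary tibble in the main session. A minimal sketch, assuming 2 workers and `mtcars` as stand-in data (`cars_party` and `cyl_gear` are illustrative names):

```r
library(dplyr, warn.conflicts = FALSE)
library(multidplyr)

cluster <- new_cluster(2)                  # assumed: 2 workers
cars_party <- partition(mtcars, cluster)   # a multidplyr_party_df

# distinct() has no party_df method, so shrink the data on the workers,
# bring it back to the main session, and deduplicate there.
cyl_gear <- cars_party %>%
  select(cyl, gear) %>%                    # select() runs on the workers
  collect() %>%                            # returns an ordinary data frame
  distinct()
```

This gives up parallelism for the deduplication step, so it works best after `select()` or `filter()` has already reduced the data.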
@hadley
Member

hadley commented May 20, 2021

Can you please provide a minimal reprex (reproducible example)? The goal of a reprex is to make it as easy as possible for me to recreate your problem so that I can fix it: please help me help you! If you've never heard of a reprex before, start by reading about the reprex package, including the advice further down the page. Please make sure your reprex is created with the reprex package as it gives nicely formatted output and avoids a number of common pitfalls.

@PaulinaUrban
Author

PaulinaUrban commented May 21, 2021

Dear Hadley,
Unfortunately, reprex() gives strange errors when I try to create the example code:

library(dplyr, warn.conflicts = FALSE)
library(nycflights13)
numCores <- detectCores()
#> Error in detectCores(): could not find function "detectCores"
cores <- numCores - 4
#> Error in eval(expr, envir, enclos): object 'numCores' not found
group <- rep(1:cores, length.out = nrow(flights))
#> Error in eval(expr, envir, enclos): object 'cores' not found
flights <- bind_cols(tibble(group), flights)
#> Error in eval_tidy(xs[[j]], mask): object 'group' not found
cluster <- multidplyr::new_cluster(cores)
#> Error in integer(n): object 'cores' not found
View(flights)
flights <-
+     flights %>%
+     group_by(group) %>%
+     partition(cluster) 
#> Error in FUN(left): invalid argument to unary operator
cluster_library(cluster,"tidyverse")
#> Error in is_cluster(cluster): object 'cluster' not found
cluster_library(cluster,"tidytext")
#> Error in is_cluster(cluster): object 'cluster' not found
cluster_library(cluster,"dplyr")
#> Error in is_cluster(cluster): object 'cluster' not found
cluster_copy(cluster, 'flights')
#> Error in is_cluster(cluster): object 'cluster' not found
flights <-
+     flights %>%
+     select(contains("dest"), everything()) %>%
+     select(`ID`=1, group = 2, abstract=3) %>%
+     distinct() 
#> Error in FUN(left): invalid argument to unary operator

So here is plain code, using data that is available to everyone (from the nycflights13 package), which gives the same error as in my situation:

library(dplyr, warn.conflicts = FALSE)
library(multidplyr)
library(nycflights13)
library(parallel)   # provides detectCores()

numCores <- detectCores()
cores <- max(1, numCores - 4)   # keep at least one worker on small machines
group <- rep(1:cores, length.out = nrow(flights))
flights <- bind_cols(tibble(group), flights)
cluster <- multidplyr::new_cluster(cores)

flights <-
  flights %>%
  group_by(group) %>%
  partition(cluster) 

cluster_library(cluster,"tidyverse")
cluster_library(cluster,"tidytext")
cluster_library(cluster,"dplyr")
cluster_copy(cluster, 'flights')

flights <-
  flights %>%
  # select by name: by position, column 3 would have been `year`, not `origin`
  select(dest, group, origin) %>%
  distinct() %>%
  collect()

When you run this code in the RStudio console, you get this error: Error in UseMethod("distinct"): no applicable method for 'distinct' applied to an object of class "multidplyr_party_df"

@hadley
Member

hadley commented May 21, 2021

Here is a minimal reprex:

library(multidplyr)
library(dplyr, warn.conflicts = FALSE)

cluster <- multidplyr::new_cluster(2)

mtcars2 <- partition(mtcars, cluster)
mtcars2 %>% distinct()
#> Error in UseMethod("distinct"): no applicable method for 'distinct' applied to an object of class "multidplyr_party_df"

Created on 2021-05-21 by the reprex package (v2.0.0)

Looks like I forgot to provide a distinct method.

@PaulinaUrban
Copy link
Author

Dear Hadley,
Now I understand the error. The question now is: will you add a distinct() method to multidplyr in the near future, or how can I add this method in my own code?
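As a workaround in your own code, without registering a new method, deduplication can be pushed onto the workers with `group_by()` plus an empty `summarise()`, which leaves one row per distinct combination of the grouping columns. A sketch under assumed conditions (2 workers, `mtcars` as stand-in data, illustrative names `cars_party` and `uniq`):

```r
library(dplyr, warn.conflicts = FALSE)
library(multidplyr)

cluster <- new_cluster(2)                  # assumed: 2 workers
cars_party <- partition(mtcars, cluster)

uniq <- cars_party %>%
  # Grouping by every key column and summarising with no expressions
  # leaves one row per distinct combination, computed on each worker.
  group_by(cyl, gear) %>%
  summarise(.groups = "drop") %>%
  collect() %>%
  # The same combination may exist on several workers, so
  # deduplicate once more after collecting.
  distinct()
```

The final `distinct()` matters only when the same key combination can live on more than one worker. If the partitioning column (such as `group` in the original code) is one of the key columns, each combination sits on a single worker and that last step removes nothing.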

@hadley
Member

hadley commented May 22, 2021

I will add it next time I work on multidplyr.

@pwwang

pwwang commented Jul 10, 2021

The group_map() family has the same issue.
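Until group_map() is supported, a similar per-group computation can often be written as a `summarise()` that returns a list-column, which runs on the workers. A minimal sketch, assuming 2 workers and `mtcars` grouped by `cyl` (`by_cyl` and `ranges` are illustrative names); it does not add group_map() support, it only expresses a comparable result with verbs multidplyr provides:

```r
library(dplyr, warn.conflicts = FALSE)
library(multidplyr)

cluster <- new_cluster(2)                         # assumed: 2 workers
by_cyl <- mtcars %>% group_by(cyl) %>% partition(cluster)

# Each group yields one list element, similar to what
# group_map(~ range(.x$mpg)) would return for a local grouped tibble.
ranges <- by_cyl %>%
  summarise(mpg_range = list(range(mpg)), .groups = "drop") %>%
  collect()
```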

@Tkastylevsky

Any chance one of you has come up with a fix for this?

hadley added the feature label (a feature request or enhancement) on Oct 31, 2023
hadley changed the title from "Error while using distinct in multidplyr in R" to "Implement distinct" on Oct 31, 2023
@JohannesFriedrich

Quoting @hadley: "I will add it next time I work on multidplyr."

That comment is now 3 years old. Is any maintenance planned to fix the issue(s) mentioned here?

I would be very interested in further development of the package.

@hadley
Member

hadley commented Jul 28, 2024

@JohannesFriedrich I don't have time to work on it right now, but I'd be happy to review PRs. (And I don't think this would be that hard to fix following the template of the other methods.)
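Whatever the final code looks like, a distinct() for partitioned data has to deduplicate within each worker and then once more after the pieces are combined, because identical rows can sit on different workers. The sketch below simulates that two-step logic locally with `split()`; the shards are hypothetical stand-ins for workers, no multidplyr internals are involved, and this is not the package's implementation:

```r
library(dplyr, warn.conflicts = FALSE)

# Hypothetical stand-in for a partitioned data frame: a list of shards,
# made by splitting mtcars round-robin across 2 "workers".
shards <- split(mtcars, rep(1:2, length.out = nrow(mtcars)))

# Step 1 (would run on each worker): deduplicate locally.
local_distinct <- lapply(shards, function(d) distinct(d, cyl, gear))

# Step 2 (main session): combine and deduplicate again, because the same
# (cyl, gear) combination can appear on more than one shard.
combined <- bind_rows(local_distinct)
result <- distinct(combined)
```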

Projects
None yet
Development

No branches or pull requests

5 participants