Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

crs handling in combine_list_of_sf #46

Closed
timelyportfolio opened this issue Jun 29, 2017 · 14 comments
Closed

crs handling in combine_list_of_sf #46

timelyportfolio opened this issue Jun 29, 2017 · 14 comments

Comments

@timelyportfolio
Copy link
Contributor

timelyportfolio commented Jun 29, 2017

How should we handle CRS in combine_list_of_sf() (see lines)? Originally, I specifically chose 4326 since I wrote combine_list_of_sf to deal with GeoJSON from Leaflet.Draw events in edit*(). However, I am not sure I am comfortable with this approach for a couple of reasons.

  1. With an imperfect rbind for sf, combine_list_of_sf() could be helpful elsewhere. If used elsewhere, combine_list_of_sf cannot be so crude with crs.
  2. What if the map or original features are in a different projection?

Options

I wanted to think through some options for dealing with this and discuss them in this issue. I have confirmed that even if edits made on a project leaflet map the edits are converted back to 4326,

  • add a crs argument - seems like we should do this no matter what even if we decide to continue to default to 4326
  • use crs from the list of features if all have the same crs
  • not assign a crs and let user decide what to do with the returned sf - I could see some merit in this also and this seems most consistent with sf.
@tim-salabim
Copy link
Member

tim-salabim commented Jun 29, 2017

My thoughts,

  1. what is imperfect about rbind.sf?
  2. Even if the map is in a different projection, the features still need to be in 4326 to be passed to add* functions. Only exception is L.CRS.Simple which is used in mapview(x, native.crs = TRUE). Hence, whatever features come back after editing from a projected map should also be in 4326 unless L.CRS.Simple is used. This may mean we might need to implement something similar (native.crs argument) for editFeatures, but I need to think this through a little more.

Options:

The two viable options I see are:

  • add crs argument and set default to crs = sf::st_crs(sf_list[[1]]) and reproject all other sf_list elements to that one (with a suitable warning if not all are consistent).
  • to be consistent with sf (and general GIS) behaviour, through an error if not all crs are equal.

I favour the latter as you should never be allowed to combine features with different projection (the result will be meaningless), though I see the convenience of the former.

pinging @edzer for thoughts on this one.

@timelyportfolio
Copy link
Contributor Author

timelyportfolio commented Jun 29, 2017

Here is my issue with rbind.sf, and it is one of the reasons for dplyr::bind_rows.

library(sf)

pt = st_sfc(st_point(c(1,2)))
rbind(
  st_sf(geometry=pt, a=1),
  st_sf(geometry=pt, b=2)
)

versus

mapedit:::combine_list_of_sf(
  list(
    st_sf(geometry=pt, a=1),
    st_sf(geometry=pt, b=2)
  )
)

I feel like there has been discussion on sf about this but my search skills/luck are not working this morning.

@edzer
Copy link
Member

edzer commented Jun 29, 2017

r-spatial/sf#49
r-spatial/sf#92
r-spatial/sf#223
tidyverse/dplyr#2459
tidyverse/dplyr#2457 which is still open...

we could "fix" this with @etiennebr's proposal for an st_bind_rows that does bind_rows on the non-geom columns, and then adds the combined geometry list-column?

@tim-salabim
Copy link
Member

Given that we are a bit pressured for time, I think we should maybe pursue option one of my comment. This is pretty close to @etiennebr 's suggestion I guess. We can easily substitute with a version hosted in sf later.

@timelyportfolio
Copy link
Contributor Author

Given time constraints I propose we leave as is for now and await a sf solution. For internal use I think 4326 GeoJSON is appropriate for this release.

@sheffe
Copy link

sheffe commented Jul 3, 2018

Writing in a year after the last comment on this issue -- for starters, thanks for all the work on this package! I use it often.

I found this issue a few months back and it solved a major headache for me. I've been unable to find an officially-supported solution to the basic problem of a bind_rows() style rbind-ing of a list of sf objects. All of the issues above look to be closed with references to an underlying issue with dplyr, roughly a year back. I now use mapedit's internal function for this a lot -- a quick search of my private repos this morning yielded >500 instances of calls to mapedit:::combine_list_of_sf() in lieu of a purrr::reduce(list_of_sf, sf:::rbind.sf) call. In case y'all think it's helpful, here or in related packages, here's my reasoning and a broad use-case where it's been helpful.

I've moved to using mapedit's internal function over reduce/rbind.sf for speed reasons -- if I understand correctly, reduce/rbind.sf takes elements 1 and 2 to create an intermediate sf dataframe A; then dataframe A is rbind-ed to element 3 to create intermediate sf df B; and so on, such that there are (for N list elements) a total of N-1 intermediate list objects and therefore N-1 copy in memory operations. I see a noticeable speed drop on polygon dataframes over about 200 rows.

The generic use-case -- I very frequently (>100 times/day?) need to split / map / combine sf objects, primarily for reasons of access speed. For instance: I have all 11 million US Census blocks from tigris, saved to disk by county, and need to readRDS or read_sf those files in arbitrary subsets. (For instance -- pick a focal county, find neighbors, load block geometries only for that subset of counties.) I would do something like list_of_county_files %>% map(read_sf) %>% mapedit:::combine_list_of_sf(), sometimes with intermediate operations.

Thanks again!

@edzer
Copy link
Member

edzer commented Jul 3, 2018

Is what you are proposing say a bind_rows_sf in sf which does the same as mapedit:::combine_list_of_sf does?

@sheffe
Copy link

sheffe commented Jul 4, 2018

From my perspective, that would be great! After reading the issue history in sf I thought there was a plan to implement bind_rows() as a generic so that sf and data.frame objects would both be handled. If that's not the case, a standalone bind_rows_sf would be great.

The other feature from dplyr::bind_rows I'd love to borrow is the .id parameter that embeds list names or element numbers into the resulting data frame as a column.

@edzer if you think it's worth considering this for sf proper, I will close this issue, move it to sf, and hopefully send you a PR in the next few days. I am not yet familiar enough with sf internals to know how complex this would be, so any warnings welcome.

Thanks much.

@timelyportfolio
Copy link
Contributor Author

@sheffe, I am so happy that you commented on the issue and have found the function helpful! I just used combine_list_of_sf tonight :)

@edzer
Copy link
Member

edzer commented Jul 4, 2018

Yes, worth considering:

  • iirc, bind_rows cannot be extended for sf objects because its first argument is ... and it is an S3 generic, unlike rbind and cbind which use a different method dispatch mechanism
  • given that bind_rows is easily a factor 20 faster than rbind.sf, I think it makes sense to add an bind_rows_sf to sf.

Happy to see an issue on sf, possibly followed by a PR.

@sheffe
Copy link

sheffe commented Jul 16, 2018

My comment / @edzer 's followup has been summarized over on sf. If anyone stumbles here searching for a solution like I did, check the linked issue. @tim-salabim added a crazy-fast data.table solution - sf::st_as_sf(data.table::rbindlist(<list_of_sf>)).

(This isn't my issue to close and looks like some other discussion was open, so I'll just say thanks all for engaging on this!)

@timelyportfolio
Copy link
Contributor Author

@sheffe I agree that this is safe to close for mapedit purposes. Thanks everyone for the very worthwhile discussion here and at sf.

@ramkumardeo
Copy link

ramkumardeo commented Jul 10, 2019

Hello,
I have more than thousand shape files in a directory, and I want to combine them into one big shape file using the following R scripts. But I am getting an "Error: arguments have different crs". I will appreciate if anyone could help me in correcting the R script.

library(sf)
filepaths <- list.files("D:/tree_objects_by_lidar_tiles/", pattern="*_Polygons.shp$", full.names=TRUE, recursive=TRUE)
filepaths1<- filepaths[1:1000]
listOfShp <- lapply(filepaths1, st_read)
combinedShp <- do.call(what = sf:::rbind.sf, args=listOfShp)
st_write(obj = combinedShp, dsn = "D:/tree_objects/tree_objects_combined1.gpkg", update = TRUE)

Thanks,
Ram

@tim-salabim
Copy link
Member

Hi @ramkumardeo, this is not the right place to ask this question. Please consider stackoverflow or r-sig-geo mailing list instead.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

5 participants