Feature Request: Support for left joining sf objects, preserving multiple geometry columns #2337

Open
@AarshBatra

Have you ever needed two geometry columns in the same sf object?

Suppose the two geometries initially live in separate sf objects (say sf_1 and sf_2), and the goal is to left_join them (not a spatial join) on a common key (e.g. uid). The result should be a new sf object (sf_12) with two geometry columns. For example, in sf_12 you may want to calculate the row-wise distance between the first geometry (e.g., a village centroid, geometry type: point) and the second geometry (the irrigation canal nearest to the village, geometry type: linestring).

However, attempting to perform this operation as follows:

library(sf)
library(dplyr)

# read in the village centroid layer
sf_1 <- st_read("path/to/sf1.gpkg") # village centroids

# read in the irrigation canal linestring layer
sf_2 <- st_read("path/to/sf2.gpkg") # irrigation canal linestrings

# join sf_1 and sf_2 by "uid", the unique identifier for each row in both objects (an attribute join, ignoring geometries),
# and calculate the row-wise distance from each village centroid to its nearest irrigation canal
sf_12 <- sf_1 %>%
  left_join(sf_2, by = "uid") %>%
  mutate(dist_g1_g2 = st_distance(geometry.x, geometry.y, by_element = TRUE))

Results in this error:

Error: y should not have class sf; for spatial joins, use st_join.

sf's left_join method refuses a y of class sf and points to st_join instead. That makes sense as a default, but it is not the desired behavior for the use case above. Ideally, the resulting sf_12 would be an sf object with two geometry columns (one from sf_1 and one from sf_2) plus all the other columns of sf_2.

As a workaround, coercing each sf object to a data frame before the left join, and then coercing the result back to an sf object, worked. Here is the code:

# coerce both sf objects to data frames (or tibbles), join them on "uid",
# then coerce the joined data frame back to sf to calculate distances;
# st_as_sf() takes the first geometry column (here geometry.x) as the active geometry
sf_12 <- sf_1 %>%
  as.data.frame() %>%
  left_join(sf_2 %>% as.data.frame(), by = "uid") %>%
  st_as_sf() %>%
  mutate(dist_g1_g2 = st_distance(geometry.x, geometry.y, by_element = TRUE))
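
The workaround can be tried end to end without the original files. The sketch below is only an illustration: the toy points and linestrings, the canal_name column, and the choice of EPSG:32643 as a projected CRS are all assumptions made up for the example, and the active geometry column is named explicitly rather than left to the default.

```r
library(sf)
library(dplyr)

# Hypothetical stand-in for the village centroids (projected CRS, units in metres)
sf_1 <- st_sf(
  uid = 1:2,
  geometry = st_sfc(st_point(c(0, 0)), st_point(c(10, 0)), crs = 32643)
)

# Hypothetical stand-in for the irrigation canals, sharing the "uid" key
sf_2 <- st_sf(
  uid = 1:2,
  canal_name = c("a", "b"),
  geometry = st_sfc(
    st_linestring(rbind(c(0, 3), c(5, 3))),
    st_linestring(rbind(c(10, 4), c(20, 4))),
    crs = 32643
  )
)

# Attribute join on plain data frames, then back to sf with two geometry columns
sf_12 <- sf_1 %>%
  as.data.frame() %>%
  left_join(as.data.frame(sf_2), by = "uid") %>%
  st_as_sf(sf_column_name = "geometry.x") %>%  # make the active geometry explicit
  mutate(dist_g1_g2 = st_distance(geometry.x, geometry.y, by_element = TRUE))

print(sf_12)
```

Both geometry columns must share a CRS for st_distance to work, which is why the toy layers are created with the same one.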

However, this may not be the most efficient approach, or I may be missing something. Is there a better way to handle this in sf?

If not, a proposed solution:

Possibly a dedicated function in the sf package for left joining sf objects while retaining multiple geometry columns. Something like:

# Hypothetical function to perform left join with multiple geometry columns (from above example)
sf_12 <- sf_left_join_multi_geom(sf_1, sf_2, by = "uid") %>%
  mutate(dist_g1_g2 = st_distance(geometry.x, geometry.y, by_element = TRUE))
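
Until something like this exists in sf, the proposed helper could be approximated in user code. The following is a minimal sketch and not part of the sf API: the function name comes from the proposal above, while the suffix argument, the choice of x's geometry as the active column, and the toy data are assumptions made for illustration.

```r
library(sf)
library(dplyr)

# Hypothetical user-level helper (not an sf function): attribute left join of
# two sf objects that keeps both geometry columns in the result.
sf_left_join_multi_geom <- function(x, y, by, suffix = c(".x", ".y")) {
  stopifnot(inherits(x, "sf"), inherits(y, "sf"))
  x_geom <- attr(x, "sf_column")  # name of x's active geometry column

  # Join as plain data frames so the sf join method is never dispatched
  joined <- left_join(as.data.frame(x), as.data.frame(y), by = by, suffix = suffix)

  # If both inputs used the same geometry column name, dplyr suffixed it
  active <- if (x_geom %in% names(joined)) x_geom else paste0(x_geom, suffix[1])
  st_as_sf(joined, sf_column_name = active)
}

# Toy inputs (made up for the example)
villages <- st_sf(
  uid = 1:2,
  geometry = st_sfc(st_point(c(0, 0)), st_point(c(10, 0)), crs = 32643)
)
canals <- st_sf(
  uid = 1:2,
  geometry = st_sfc(
    st_linestring(rbind(c(0, 3), c(5, 3))),
    st_linestring(rbind(c(10, 4), c(20, 4))),
    crs = 32643
  )
)

res <- sf_left_join_multi_geom(villages, canals, by = "uid") %>%
  mutate(dist_g1_g2 = st_distance(geometry.x, geometry.y, by_element = TRUE))
```

In this sketch the geometry from x stays the active one, so plotting and spatial operations on the result use the village points unless st_geometry is switched to the second column.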

Happy to hear thoughts!

Thanks,
Aarsh

Labels: feature (a feature request or enhancement)
