Description
Have you ever needed two geometry columns in the same sf object? The two geometries initially live in separate sf objects (say sf_1 and sf_2), and the goal is to left_join them (an attribute join, not a spatial join) on a common key (e.g. uid). The result should be a new sf object (sf_12) with two geometry columns. For example, in sf_12 you may want to calculate the row-wise distance between the first geometry (e.g. a village centroid, geometry type: point) and the second geometry (the irrigation canal nearest to the village, geometry type: linestring).
However, attempting to perform this operation as follows:
library(sf)
library(dplyr)
# read in the village centroid layer
sf_1 <- st_read("path/to/sf1.gpkg") # village centroids
# read in the irrigation canal linestring layer
sf_2 <- st_read("path/to/sf2.gpkg") # irrigation canal linestrings
# join sf_1 and sf_2 by "uid", the unique identifier for each row in both objects (an attribute join, ignoring geometries)
# and calculate the row-wise distance from each village centroid to its nearest irrigation canal
sf_12 <- sf_1 %>%
  left_join(sf_2, by = "uid") %>%
  mutate(dist_g1_g2 = st_distance(geometry.x, geometry.y, by_element = TRUE))
Results in this error:
Error: y should not have class sf; for spatial joins, use st_join.
sf refuses to left_join two sf objects and points to st_join instead. That makes sense as a default, but it is not the desired behavior for this use case. Ideally, the resulting sf_12 would be an sf object with two geometry columns (one from sf_1 and one from sf_2) plus all the other columns of sf_2.
As a workaround, coercing each sf object to a data frame before the left join, then coercing the result back to an sf object, worked. Here is the code:
# coerce both sf objects to data frames (or tibbles), join them, then coerce the result back to sf to calculate distances
sf_12 <- sf_1 %>%
  as.data.frame() %>%
  left_join(sf_2 %>% as.data.frame(), by = "uid") %>%
  st_as_sf() %>%
  mutate(dist_g1_g2 = st_distance(geometry.x, geometry.y, by_element = TRUE))
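For anyone who wants to reproduce this without the original files, here is a minimal sketch of the same workaround on toy data. The coordinates, the canal_name column, and the missing CRS are all made up for illustration, and it assumes the geometry column is called geometry in both objects (so the join suffixes give geometry.x and geometry.y). Passing sf_column_name to st_as_sf() makes the choice of active geometry explicit instead of relying on the default.

```r
library(sf)
library(dplyr)

# Toy data (hypothetical): two village centroids and two canals sharing "uid".
# Planar coordinates with no CRS, so distances are in plain coordinate units.
sf_1 <- st_sf(
  uid = c(1, 2),
  geometry = st_sfc(st_point(c(0, 0)), st_point(c(3, 4)))
)
sf_2 <- st_sf(
  uid = c(1, 2),
  canal_name = c("A", "B"),
  geometry = st_sfc(
    st_linestring(rbind(c(0, 1), c(5, 1))),
    st_linestring(rbind(c(3, 0), c(3, 2)))
  )
)

# Drop the sf class on both sides, do an attribute join, then rebuild the
# sf object with the village centroids as the active geometry column.
sf_12 <- sf_1 %>%
  as.data.frame() %>%
  left_join(as.data.frame(sf_2), by = "uid") %>%
  st_as_sf(sf_column_name = "geometry.x") %>%
  mutate(dist_g1_g2 = st_distance(geometry.x, geometry.y, by_element = TRUE))

print(sf_12)
```

The second geometry column (geometry.y) stays in the object as an ordinary sfc list-column, so it is still available for row-wise operations like the distance above.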
However, this may not be the most efficient implementation, or I may be missing something. Is there a better way to handle this in sf?
If not, proposed solution:
Possibly a dedicated function in the sf package for left joining sf objects while retaining multiple geometry columns. Something like:
# Hypothetical function to perform a left join that keeps multiple geometry columns (from the example above)
sf_12 <- sf_left_join_multi_geom(sf_1, sf_2, by = "uid") %>%
  mutate(dist_g1_g2 = st_distance(geometry.x, geometry.y, by_element = TRUE))
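To make the proposal concrete, a user-level stand-in could look roughly like the sketch below. To be clear, sf_left_join_multi_geom does not exist in sf. This is only a thin wrapper around the data frame workaround, and the active argument and the toy data are assumptions for illustration. It also assumes both inputs name their geometry column geometry, so that the join produces geometry.x and geometry.y.

```r
library(sf)
library(dplyr)

# Hypothetical helper, NOT part of sf: attribute left join of two sf objects
# that keeps both geometry columns. `active` names the column that becomes
# the active geometry of the result (assumed default: "geometry.x").
sf_left_join_multi_geom <- function(x, y, by, active = "geometry.x") {
  joined <- left_join(as.data.frame(x), as.data.frame(y), by = by)
  st_as_sf(joined, sf_column_name = active)
}

# Toy inputs (made up): one point and one linestring per uid, no CRS.
sf_1 <- st_sf(
  uid = c(1, 2),
  geometry = st_sfc(st_point(c(0, 0)), st_point(c(3, 4)))
)
sf_2 <- st_sf(
  uid = c(1, 2),
  geometry = st_sfc(
    st_linestring(rbind(c(0, 1), c(5, 1))),
    st_linestring(rbind(c(3, 0), c(3, 2)))
  )
)

sf_12 <- sf_left_join_multi_geom(sf_1, sf_2, by = "uid") %>%
  mutate(dist_g1_g2 = st_distance(geometry.x, geometry.y, by_element = TRUE))

# Switch the active geometry to the canals when needed, e.g. for plotting.
sf_12_canals <- st_set_geometry(sf_12, "geometry.y")
```

A built-in version would presumably also need to decide how to name the two geometry columns and what to do with rows of x that have no match in y. This sketch does not address either.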
Happy to hear thoughts!
Thanks,
Aarsh