Consider allowing 1 dimensional `[` to drop the `geometry` column

In dplyr and in many other places in the tidyverse, we program with 1 dimensional calls to `[`, such as `df["x"]`, where we expect the result to have _exactly_ 1 column, named `x`.

In `?dplyr::dplyr_extending`, we discuss how this is one of the invariants required for compatibility with dplyr.
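
As a minimal sketch of that invariant (the helper name `returns_exact_columns()` is hypothetical and is not part of dplyr or sf), a class behaves as expected when `df[cols]` returns precisely the requested columns, in the requested order:

``` r
# Hypothetical helper for illustration only; not part of dplyr or sf.
# Returns TRUE when 1 dimensional `[` yields exactly the requested columns.
returns_exact_columns <- function(df, cols) {
  out <- df[cols]
  identical(names(out), cols)
}

base_df <- data.frame(x = 1:3, y = letters[1:3])

returns_exact_columns(base_df, "x")
#> [1] TRUE
returns_exact_columns(base_df, c("y", "x"))
#> [1] TRUE
```

A class with a sticky column would fail this check whenever the sticky column is not part of `cols`, because the names of the result would contain an extra entry.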

But sf doesn't do this, and instead retains the `geometry` column as a "sticky" column:

``` r
library(sf)
#> Linking to GEOS 3.11.1, GDAL 3.6.0, PROJ 9.1.1; sf_use_s2() is TRUE

nrows <- 10
geometry <- st_sfc(lapply(1:nrows, function(x) st_geometrycollection()))
df <- st_sf(id = 1:nrows, geometry = geometry)

df
#> Simple feature collection with 10 features and 1 field (with 10 geometries empty)
#> Geometry type: GEOMETRYCOLLECTION
#> Dimension:     XY
#> Bounding box:  xmin: NA ymin: NA xmax: NA ymax: NA
#> CRS:           NA
#>    id                 geometry
#> 1   1 GEOMETRYCOLLECTION EMPTY
#> 2   2 GEOMETRYCOLLECTION EMPTY
#> 3   3 GEOMETRYCOLLECTION EMPTY
#> 4   4 GEOMETRYCOLLECTION EMPTY
#> 5   5 GEOMETRYCOLLECTION EMPTY
#> 6   6 GEOMETRYCOLLECTION EMPTY
#> 7   7 GEOMETRYCOLLECTION EMPTY
#> 8   8 GEOMETRYCOLLECTION EMPTY
#> 9   9 GEOMETRYCOLLECTION EMPTY
#> 10 10 GEOMETRYCOLLECTION EMPTY

# `geometry` sticks around
df["id"]
#> Simple feature collection with 10 features and 1 field (with 10 geometries empty)
#> Geometry type: GEOMETRYCOLLECTION
#> Dimension:     XY
#> Bounding box:  xmin: NA ymin: NA xmax: NA ymax: NA
#> CRS:           NA
#>    id                 geometry
#> 1   1 GEOMETRYCOLLECTION EMPTY
#> 2   2 GEOMETRYCOLLECTION EMPTY
#> 3   3 GEOMETRYCOLLECTION EMPTY
#> 4   4 GEOMETRYCOLLECTION EMPTY
#> 5   5 GEOMETRYCOLLECTION EMPTY
#> 6   6 GEOMETRYCOLLECTION EMPTY
#> 7   7 GEOMETRYCOLLECTION EMPTY
#> 8   8 GEOMETRYCOLLECTION EMPTY
#> 9   9 GEOMETRYCOLLECTION EMPTY
#> 10 10 GEOMETRYCOLLECTION EMPTY
```

This has caused quite a bit of pain in dplyr over the years, and it recently bit me again in hardhat, where I also use `df[cols]` to select columns: https://github.com/tidymodels/hardhat/issues/228.

In dplyr, the algorithms underlying functions like `arrange()` and `distinct()` use `df[i]` to first select the columns to order by or to compute the unique values of, so retaining extra columns here can be particularly problematic. I know sf has methods for these two verbs to work around this, but I think those could be avoided entirely if the geometry column weren't sticky here.

I think:
- Having `dplyr::select()` retain sticky columns makes for a great user experience
- Having `df[i]` retain sticky columns makes for a painful programming experience

This is our general advice regarding sticky columns, and it is how dplyr's grouped data frames work:

``` r
library(dplyr, warn.conflicts = FALSE)

df <- tibble(g = 1, x = 2)
df <- group_by(df, g)

select(df, x)
#> Adding missing grouping variables: `g`
#> # A tibble: 1 × 2
#> # Groups:   g [1]
#>       g     x
#>   <dbl> <dbl>
#> 1     1     2

# Returns a bare tibble, not a grouped data frame
df["x"]
#> # A tibble: 1 × 1
#>       x
#>   <dbl>
#> 1     2
```

It is also how tsibble works with its index column.

Would you ever consider allowing `df[i]` to only return _exactly_ the columns selected by `i`? If the geometry column isn't selected, then the appropriate behavior would be to return a bare data frame or bare tibble depending on the underlying data structure.
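
For comparison, here is a caller-side sketch of the result I have in mind. It assumes `i` does not include the geometry column and uses `sf::st_drop_geometry()` before subsetting; it only illustrates the desired output of `df[i]`, not how sf would need to implement it:

``` r
library(sf)

geometry <- st_sfc(lapply(1:3, function(x) st_geometrycollection()))
df <- st_sf(id = 1:3, geometry = geometry)

# Drop the geometry explicitly, then select with 1 dimensional `[`.
# Assumption: the selected columns do not include the geometry column.
bare <- st_drop_geometry(df)["id"]

# Exactly one column, and no longer an sf object
names(bare)
inherits(bare, "sf")
```

Code that programs with `df[cols]` currently needs a step like this for sf inputs, which is the kind of special casing the change would remove.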
