st_write significantly slower when writing logical (boolean) field types #1689
Interesting 🤔 ... Can confirm on macOS:
library(sf)
#> Linking to GEOS 3.8.1, GDAL 3.1.1, PROJ 6.3.1
f <- function(x, n, fmt = ".csv") {
  x <- rep(x, n)
  x <- data.frame(0, 0, x)
  x <- sf::st_as_sf(x, coords = 1:2)
  sf::st_write(x, tempfile(fileext = fmt), quiet = TRUE)
}
microbenchmark::microbenchmark(
  f(FALSE, 10000),
  f(0L, 10000),
  f(0, 10000),
  f("FALSE", 10000),
  f(FALSE, 10000, ".gpkg"),
  f(0L, 10000, ".gpkg"),
  f(0, 10000, ".gpkg"),
  f("FALSE", 10000, ".gpkg"),
  times = 10)
#> Unit: milliseconds
#>                        expr       min        lq      mean    median        uq       max neval
#>             f(FALSE, 10000) 209.85522 230.28042 238.60764 234.83916 241.14995 279.97131    10
#>                f(0L, 10000)  32.27804  34.97311  36.81165  35.90451  38.60695  41.73219    10
#>                 f(0, 10000)  35.26579  36.09508  36.74345  36.49578  37.87945  38.51427    10
#>           f("FALSE", 10000)  34.76497  35.47865  37.41611  36.41996  38.75253  43.07141    10
#>    f(FALSE, 10000, ".gpkg") 273.42011 282.26102 300.65752 290.11968 312.02942 378.14181    10
#>       f(0L, 10000, ".gpkg")  92.12086  93.14257  97.89846  98.11679  99.52682 107.44184    10
#>        f(0, 10000, ".gpkg")  94.32044  97.65801  99.13713  98.62017 100.35575 105.80861    10
#>  f("FALSE", 10000, ".gpkg")  94.62713  96.33537  98.80034  97.97429  99.84456 105.31229    10
Created on 2021-06-08 by the reprex package (v2.0.0)
Session info
sessioninfo::session_info()
#> ─ Session info ───────────────────────────────────────────────────────────────
#> setting value
#> version R version 4.0.2 (2020-06-22)
#> os macOS Mojave 10.14.3
#> system x86_64, darwin17.0
#> ui X11
#> language (EN)
#> collate en_AU.UTF-8
#> ctype en_AU.UTF-8
#> tz Australia/Melbourne
#> date 2021-06-08
#>
#> ─ Packages ───────────────────────────────────────────────────────────────────
#> package * version date lib source
#> assertthat 0.2.1 2019-03-21 [1] CRAN (R 4.0.0)
#> backports 1.2.1 2020-12-09 [1] CRAN (R 4.0.2)
#> class 7.3-17 2020-04-26 [1] CRAN (R 4.0.2)
#> classInt 0.4-3 2020-04-07 [1] CRAN (R 4.0.0)
#> cli 2.5.0 2021-04-26 [1] CRAN (R 4.0.2)
#> crayon 1.4.1 2021-02-08 [1] CRAN (R 4.0.2)
#> DBI 1.1.0 2019-12-15 [1] CRAN (R 4.0.0)
#> digest 0.6.27 2020-10-24 [1] CRAN (R 4.0.2)
#> dplyr 1.0.6 2021-05-05 [1] CRAN (R 4.0.2)
#> e1071 1.7-4 2020-10-14 [1] CRAN (R 4.0.2)
#> ellipsis 0.3.2 2021-04-29 [1] CRAN (R 4.0.2)
#> evaluate 0.14 2019-05-28 [1] CRAN (R 4.0.0)
#> fansi 0.4.2 2021-01-15 [1] CRAN (R 4.0.2)
#> fs 1.5.0 2020-07-31 [1] CRAN (R 4.0.2)
#> generics 0.1.0 2020-10-31 [1] CRAN (R 4.0.2)
#> glue 1.4.2 2020-08-27 [1] CRAN (R 4.0.2)
#> highr 0.9 2021-04-16 [1] CRAN (R 4.0.2)
#> htmltools 0.5.1.1 2021-01-22 [1] CRAN (R 4.0.2)
#> KernSmooth 2.23-17 2020-04-26 [1] CRAN (R 4.0.2)
#> knitr 1.33 2021-04-24 [1] CRAN (R 4.0.2)
#> lifecycle 1.0.0 2021-02-15 [1] CRAN (R 4.0.2)
#> magrittr 2.0.1 2020-11-17 [1] CRAN (R 4.0.2)
#> microbenchmark * 1.4-7 2019-09-24 [1] CRAN (R 4.0.2)
#> pillar 1.6.1 2021-05-16 [1] CRAN (R 4.0.2)
#> pkgconfig 2.0.3 2019-09-22 [1] CRAN (R 4.0.0)
#> purrr 0.3.4 2020-04-17 [1] CRAN (R 4.0.0)
#> R6 2.5.0 2020-10-28 [1] CRAN (R 4.0.2)
#> Rcpp 1.0.6 2021-01-15 [1] CRAN (R 4.0.2)
#> reprex 2.0.0 2021-04-02 [1] CRAN (R 4.0.2)
#> rlang 0.4.11 2021-04-30 [1] CRAN (R 4.0.2)
#> rmarkdown 2.5 2020-10-21 [1] CRAN (R 4.0.2)
#> sessioninfo 1.1.1 2018-11-05 [1] CRAN (R 4.0.0)
#> sf * 0.9-6 2020-09-13 [1] CRAN (R 4.0.2)
#> stringi 1.6.2 2021-05-17 [1] CRAN (R 4.0.2)
#> stringr 1.4.0 2019-02-10 [1] CRAN (R 4.0.0)
#> styler 1.4.1 2021-03-30 [1] CRAN (R 4.0.2)
#> tibble 3.1.2 2021-05-16 [1] CRAN (R 4.0.2)
#> tidyselect 1.1.1 2021-04-30 [1] CRAN (R 4.0.2)
#> units 0.6-7 2020-06-13 [1] CRAN (R 4.0.0)
#> utf8 1.2.1 2021-03-12 [1] CRAN (R 4.0.2)
#> vctrs 0.3.8 2021-04-29 [1] CRAN (R 4.0.2)
#> withr 2.4.2 2021-04-18 [1] CRAN (R 4.0.2)
#> xfun 0.23 2021-05-15 [1] CRAN (R 4.0.2)
#> yaml 2.2.1 2020-02-01 [1] CRAN (R 4.0.0)
#>
#> [1] /Library/Frameworks/R.framework/Versions/4.0/Resources/library
Can also confirm. FlatGeobuf helps speed in general, but is also slower when writing logical fields:
library(sf)
#> Linking to GEOS 3.9.0, GDAL 3.2.1, PROJ 7.2.1
f <- function(x, n, fmt = ".csv") {
  x <- rep(x, n)
  x <- data.frame(0, 0, x)
  x <- sf::st_as_sf(x, coords = 1:2)
  sf::st_write(x, tempfile(fileext = fmt), quiet = TRUE)
}
microbenchmark::microbenchmark(
  f(FALSE, 10000),
  f(0L, 10000),
  f(0, 10000),
  f("FALSE", 10000),
  f(FALSE, 10000, ".fgb"),
  f(0L, 10000, ".fgb"),
  f(0, 10000, ".fgb"),
  f("FALSE", 10000, ".fgb"),
  times = 10)
#> Unit: milliseconds
#>                       expr       min        lq      mean    median        uq       max neval
#>            f(FALSE, 10000) 116.03899 120.08824 142.52775 132.83730 146.39599 234.55547    10
#>               f(0L, 10000)  23.38568  24.20939  25.40193  26.00254  26.17420  27.09925    10
#>                f(0, 10000)  25.33234  25.73252  26.94360  26.05844  28.04113  31.91724    10
#>          f("FALSE", 10000)  21.54567  21.88413  22.14330  22.15599  22.50748  22.56170    10
#>    f(FALSE, 10000, ".fgb") 124.45799 129.56546 143.20141 140.17218 153.84233 186.41495    10
#>       f(0L, 10000, ".fgb")  31.12163  31.95204  32.87096  33.04909  33.58276  35.00678    10
#>        f(0, 10000, ".fgb")  30.96207  31.16706  32.97509  31.75533  34.07206  39.57030    10
#>  f("FALSE", 10000, ".fgb")  31.68097  32.49478  33.56320  33.04091  34.89044  36.27695    10
Created on 2021-06-08 by the reprex package (v2.0.0)
Same here. We had to convert the dummy variables to integers. Converting them to characters also seems to be slow to export. Strange!
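The integer workaround mentioned above could look roughly like the sketch below. `logical_to_integer()` is a hypothetical helper written for this illustration, not a function from sf; it uses only base R, so it should work on plain data frames and on sf objects alike. The sf usage part is guarded and only runs if sf is installed.

```r
# Hypothetical helper (not part of sf): convert every logical column
# of a data frame or sf object to integer (TRUE -> 1L, FALSE -> 0L, NA stays NA).
logical_to_integer <- function(x) {
  # The geometry column of an sf object is a list column, so is.logical() is FALSE for it.
  is_lgl <- vapply(x, is.logical, logical(1))
  for (nm in names(is_lgl)[is_lgl]) {
    x[[nm]] <- as.integer(x[[nm]])
  }
  x
}

# Usage sketch, assuming sf is installed; column names are made up for the example.
if (requireNamespace("sf", quietly = TRUE)) {
  pts <- data.frame(lon = 0, lat = 0, flag = rep(FALSE, 10000))
  pts <- sf::st_as_sf(pts, coords = c("lon", "lat"))
  sf::st_write(logical_to_integer(pts), tempfile(fileext = ".gpkg"), quiet = TRUE)
}
```

The trade-off is that the column is stored as a plain integer field, so it will come back as integer when the file is read and would need `as.logical()` to restore the original type.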
The logical (boolean) type is treated differently from the others by GDAL: https://trac.osgeo.org/gdal/wiki/rfc50_ogr_field_subtype
Thanks!
I have noticed that when an sf object contains logical columns, writing it out to a file is much slower. The following illustrates the issue: