I was trying to create and use a locale for the Walloon dialect, following the example of day and month names for the Maori language (see the locale vignette). It was an exercise for a coding club event.
However, I ran into suspicious behavior (see the reprex below): the dates "vén 3 djun 2016" and "vén 1 djulete 2016" are not parsed, while "vén 1 avri 2016" is parsed correctly. This happens with a UTF-8 encoded file. The same file saved in latin-1 ("iso8859-1") is parsed correctly when read with encoding = "iso8859-1".
Input files: example_walloon_latin1.txt example_walloon_utf8.txt
My session info follows the reprex.
Any kind of help is very welcome! Thanks a lot.
library(readr)
#> Warning: package 'readr' was built under R version 4.1.2

days_walloon <- c("londi", "mårdi", "mierkidi", "djudi", "vénrdi",
                  "semdi", "dimegne")
months_walloon <- c("djanvî", "fevrî", "måss", "avri", "may", "djun", "djulete",
                    "awousse", "setimbre", "octôbe", "nôvimbe", "decimbe")
days_abbr_walloon <- c("lon", "mår", "mie", "dju", "vén", "sem", "dim")
months_abbr_walloon <- c("djan", "fev", "mås", "avr", "may", "djun", "djul",
                         "awou", "set", "oct", "nôv", "dec")
walloon_utf8 <- locale(
  date_names(
    day = days_walloon,
    mon = months_walloon,
    day_ab = days_abbr_walloon,
    mon_ab = months_abbr_walloon
  ),
  encoding = "UTF-8",
  decimal_mark = ".",
  grouping_mark = "'",
  date_format = "%a %d %B %Y"
)
walloon_latin1 <- walloon_utf8
walloon_latin1$encoding <- "iso8859-1"

# parsing problems
df_utf8 <-
  read_csv(
    "example_walloon_utf8.txt",
    col_types = cols(
      Date = col_date(format = walloon_utf8$date_format)
    ),
    locale = walloon_utf8
  )
#> Warning: One or more parsing issues, see `problems()` for details
df_utf8
#> # A tibble: 4 x 1
#>   Date
#>   <date>
#> 1 2016-04-01
#> 2 NA
#> 3 2016-07-02
#> 4 NA
problems(df_utf8)
#> # A tibble: 2 x 5
#>     row   col expected              actual             file
#>   <int> <int> <chr>                 <chr>              <chr>
#> 1     3     1 date like %a %d %B %Y vén 3 djun 2016    C:/Users/damiano_oldoni/~
#> 2     5     1 date like %a %d %B %Y vén 1 djulete 2016 C:/Users/damiano_oldoni/~

# everything works fine
df_latin1 <-
  read_csv(
    "example_walloon_latin1.txt",
    col_types = cols(
      Date = col_date(format = walloon_latin1$date_format)
    ),
    locale = walloon_latin1
  )
df_latin1
#> # A tibble: 4 x 1
#>   Date
#>   <date>
#> 1 2016-04-01
#> 2 2016-06-03
#> 3 2016-07-02
#> 4 2016-07-01
Created on 2022-03-07 by the reprex package (v2.0.1)
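In case it helps anyone narrow this down, here is a minimal base-R sketch of how the day abbreviation "vén" differs at the byte level between the two encodings used above. The variable names are made up for illustration, and this only shows the encoding difference; it is not a claim about where the parsing actually goes wrong in readr.

```r
# "vén" written with a Unicode escape, so the script does not depend on
# the encoding of the source file or of the R session
ven_utf8 <- "v\u00e9n"

# The same text re-encoded as latin-1 (hypothetical helper variable)
ven_latin1 <- iconv(ven_utf8, from = "UTF-8", to = "latin1")

# In UTF-8 the "é" takes two bytes (c3 a9); in latin-1 it takes one (e9)
charToRaw(ven_utf8)
charToRaw(ven_latin1)

# Same number of characters, different number of bytes
nchar(ven_utf8, type = "chars")
nchar(ven_utf8, type = "bytes")
nchar(ven_latin1, type = "bytes")
```

So any code that mixes up byte length and character length would behave differently for the two files, which might be worth checking, but that is only a guess.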
I think I might be experiencing a similar bug, except that my example works in UTF-8 and fails in latin1 (or Windows-1252). In my case, readr/vroom is unable to parse abbreviated month names containing an umlaut ("Mär") correctly, although in some cases it also failed to parse other months when using the Windows-1252 encoding.
Manually transferring from tidyverse/readr#1383