
segfault error reading large compressed file #544

Open
n3ssuno opened this issue Sep 25, 2024 · 2 comments

Comments

@n3ssuno

n3ssuno commented Sep 25, 2024

Hi,

I cannot read a large zipped file with vroom (v1.6.5), but I can read it once it is unzipped.

I have seen similar closed issues, but none of their solutions work for me, and (if I understand correctly) those bugs should already be fixed in the version I installed.

r$> download.file(
  "https://s3.amazonaws.com/data.patentsview.org/download/g_inventor_disambiguated.tsv.zip",
  destfile = "g_inventor_disambiguated.tsv.zip",
  mode = "wb"
)

r$> df <- "g_inventor_disambiguated.tsv.zip" |>
  vroom::vroom(
    col_select = c(inventor_id),
    col_types = vroom::cols(inventor_id = vroom::col_character())
  )

r$> df

 *** caught segfault ***
address 0x7f4dafee9009, cause 'memory not mapped'

Traceback:
 1: vec_slice(x, seq_len(n))
 2: vec_head(as.data.frame(x), n)
 3: df_head(x, n)
 4: tbl_format_setup.tbl(x, width, ..., n = n, max_extra_cols = max_extra_cols,     max_footer_lines = max_footer_lines, focus = focus)
 5: tbl_format_setup_dispatch(x, width, ..., n = n, max_extra_cols = max_extra_cols,     max_footer_lines = max_footer_lines, focus = focus)
 6: tbl_format_setup(x, width = width, ..., n = n, max_extra_cols = max_extra_cols,     max_footer_lines = max_footer_lines, focus = attr(x, "pillar_focus"))
 7: format_tbl(x, width, ..., n = n, max_extra_cols = max_extra_cols,     max_footer_lines = max_footer_lines)
 8: format.tbl(x, width = width, ..., n = n, max_extra_cols = max_extra_cols,     max_footer_lines = max_footer_lines)
 9: format(x, width = width, ..., n = n, max_extra_cols = max_extra_cols,     max_footer_lines = max_footer_lines)
10: writeLines(format(x, width = width, ..., n = n, max_extra_cols = max_extra_cols,     max_footer_lines = max_footer_lines))
11: print_tbl(x, width, ..., n = n, max_extra_cols = max_extra_cols,     max_footer_lines = max_footer_lines)
12: print.tbl(x)
13: (function (x, ...) UseMethod("print"))(x)

Possible actions:
1: abort (with core dump, if enabled)
2: normal R exit
3: exit R without saving workspace
4: exit R saving workspace

Here is some information about my environment:

r$> sessionInfo()
R version 4.3.1 (2023-06-16)
Platform: x86_64-conda-linux-gnu (64-bit)
Running under: Debian GNU/Linux 12 (bookworm)

Matrix products: default
BLAS/LAPACK: /home/******/miniforge3/envs/******/lib/libopenblasp-r0.3.27.so;  LAPACK version 3.12.0

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C               LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8     LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8    LC_PAPER=en_US.UTF-8
 [8] LC_NAME=C                  LC_ADDRESS=C               LC_TELEPHONE=C             LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C

time zone: Europe/Rome
tzcode source: system (glibc)

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base

loaded via a namespace (and not attached):
[1] compiler_4.3.1

Please let me know if you need any extra information.
I hope this helps resolve the issue.

@Zilong-Li

I second this

@Zilong-Li

This can be fixed by resetting TMPDIR to a directory on a disk with enough free space.
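A minimal sketch of a workaround, assuming the crash comes from vroom decompressing the archive into R's session temporary directory on a disk that runs out of space (this is what the TMPDIR fix suggests, not something confirmed in this thread). R chooses its per-session temporary directory (`tempdir()`) at startup, so TMPDIR has to be set before R is launched; calling `Sys.setenv(TMPDIR = ...)` inside a running session does not move it. Since the unzipped file reads fine, another option is to extract the archive yourself into a directory of your choice. The `read_zipped_tsv()` helper and the `/mnt/bigdisk/vroom` path below are hypothetical.

```r
# Hypothetical helper, not part of vroom: extract a single-member zip archive
# into `exdir` (a directory assumed to have enough free space) and read the
# extracted plain-text file, which the original report says works.
read_zipped_tsv <- function(zipfile, exdir, ...) {
  dir.create(exdir, showWarnings = FALSE, recursive = TRUE)

  # List the archive members first so we know where the file will land.
  members <- utils::unzip(zipfile, list = TRUE)$Name
  stopifnot(length(members) == 1L)  # assumes a single-file archive

  # Prefer an external unzip program if R was configured with one: the
  # internal method may truncate members of 4 GB or more (see ?unzip).
  method <- getOption("unzip", "internal")
  if (!nzchar(method)) method <- "internal"
  utils::unzip(zipfile, exdir = exdir, unzip = method)

  # Extra arguments (col_select, col_types, ...) are passed on to vroom().
  vroom::vroom(file.path(exdir, members), ...)
}

# Where R currently puts its temporary files (fixed for the whole session).
tempdir()

# Usage with the file from the report; the exdir path is a placeholder.
# df <- read_zipped_tsv(
#   "g_inventor_disambiguated.tsv.zip",
#   exdir = "/mnt/bigdisk/vroom",
#   col_select = c(inventor_id),
#   col_types = vroom::cols(inventor_id = vroom::col_character())
# )
```

Extracting explicitly keeps the large intermediate file on a disk you control instead of relying on wherever `tempdir()` happens to point.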
