Data corruption #28

i-strielkov · 2022-06-27T15:55:14Z

Hi, we have been using your great tool for several years and it has saved us a lot of disk space! However, we have recently encountered an error that appears during DSRC encoding. The algorithm occasionally skips a number of reads at seemingly random positions and then continues. The resulting file contains artifacts like this:

@L183:321:CAFVJANXX:6:2213:18462:88964 3:N:0:0
TATAAATGGATTCTCTTTGTCCATGATCACAAAATAAGAAT@L183:321:CAFVJANXX:6:2213:5699:93216 3:N:0:0

Renaming the reads solves the problem.
Do you happen to know what may cause such issues?
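For anyone who wants to scan a decoded file for the artifact shown above, here is a minimal sketch. It is a heuristic only: it assumes a corrupted line looks like a run of bases immediately followed by `@` and a colon-separated read name, as in the example. The function name `find_fused_headers` and the pattern are illustrative assumptions, not part of DSRC, and a quality line could in principle match by coincidence.

```python
import re

# Heuristic (assumption): a corrupted line is a run of bases immediately
# followed by "@" and a colon-separated read name, as in the artifact above.
FUSED_HEADER = re.compile(r"^[ACGTN]+@[^\s:]+(?::[^\s:]+)+")


def find_fused_headers(lines):
    """Return 1-based numbers of lines where a header is glued onto bases."""
    hits = []
    for number, line in enumerate(lines, start=1):
        if FUSED_HEADER.match(line.rstrip("\n")):
            hits.append(number)
    return hits
```

The `lines` argument can be any iterable of strings, for example a file opened in text mode with `gzip.open(path, "rt")`.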

The problems were encountered with this public dataset: https://www.ebi.ac.uk/arrayexpress/experiments/E-MTAB-10175/
In particular, the problem can be reproduced with this file: http://ftp.sra.ebi.ac.uk/vol1/run/ERR539/ERR5396174/AML_low_input_AAAACT_r2.fq.gz

Many thanks for any information in advance,
Best,
Ievgen


earonesty · 2022-06-27T18:32:54Z

If you have reads that are named the same, they might be seen as duplicates, right? I'm not sure the algorithm handles duplicate-read inputs well (which should never happen).

ggoussarov-evotec · 2022-06-28T08:17:47Z

Hi, I have been working with @i-strielkov on this, using the FASTQ files that were linked. After poking at the settings for a while, I have found that using a non-default buffer size (I tried -b11 and -b12) reproducibly changes which lines get corrupted, and setting it larger than the file size (I tried -b500) removes the corrupted output. To me, this indicates that the issue is probably related to the buffer size rather than to the file itself.
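To see which lines get corrupted for a given buffer size, one option is to compare the original FASTQ with the file obtained after compressing and decompressing it. The sketch below only does the comparison and does not invoke DSRC; the function name `first_divergence` is a hypothetical helper, and both inputs are assumed to be iterables of lines.

```python
from itertools import zip_longest


def first_divergence(original_lines, roundtrip_lines):
    """Return the 1-based number of the first differing line, or None.

    A missing line (one input ending early) counts as a difference.
    """
    pairs = zip_longest(original_lines, roundtrip_lines)
    for number, (original, roundtrip) in enumerate(pairs, start=1):
        if original != roundtrip:
            return number
    return None
```

Running this once per buffer setting gives a quick way to confirm that the position of the first corrupted line moves when the setting changes.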

In addition, I have also verified that all read names are indeed unique, since this was proposed as a potential cause.
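A uniqueness check along these lines can be sketched as follows. This is not the script that was actually used; it assumes strict four-line FASTQ records and takes the read name to be the first whitespace-delimited token of each header line. The function name `duplicate_read_names` is illustrative.

```python
from collections import Counter


def duplicate_read_names(lines):
    """Return {name: count} for read names that occur more than once.

    Assumes four-line FASTQ records, so headers sit at indices 0, 4, 8, ...
    """
    names = Counter()
    for index, line in enumerate(lines):
        if index % 4 == 0:
            fields = line.split()
            if fields:
                # Drop the leading "@" and ignore the comment after the space.
                names[fields[0].lstrip("@")] += 1
    return {name: count for name, count in names.items() if count > 1}
```

An empty result means every read name in the input is unique under these assumptions.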
