Data corruption #28

Open
i-strielkov opened this issue Jun 27, 2022 · 2 comments

@i-strielkov

Hi, we have been using your great tool for several years and it has saved us a lot of disk space! However, we have recently encountered an error that appears during DSRC encoding. The algorithm occasionally skips a number of reads at a seemingly random position and then continues. The resulting file contains artifacts like this:

@L183:321:CAFVJANXX:6:2213:18462:88964 3:N:0:0
TATAAATGGATTCTCTTTGTCCATGATCACAAAATAAGAAT@L183:321:CAFVJANXX:6:2213:5699:93216 3:N:0:0

Renaming the reads solves the problem.
Do you happen to know what may cause such issues?

The problems were encountered with this public dataset: https://www.ebi.ac.uk/arrayexpress/experiments/E-MTAB-10175/
In particular, the problem can be reproduced with this file: http://ftp.sra.ebi.ac.uk/vol1/run/ERR539/ERR5396174/AML_low_input_AAAACT_r2.fq.gz
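
A decoded file can be screened for artifacts like the one shown above with a small record checker. This is only a minimal sketch, not part of DSRC: it assumes strict 4-line FASTQ records and sequences made of A, C, G, T and N only, and the function name `first_bad_record` is made up for illustration. It reports only the first malformed record, because the 4-line framing is no longer reliable after that point.

```python
import re
from itertools import islice

# Assumption: sequence lines contain only A/C/G/T/N (upper or lower case).
SEQ_RE = re.compile(r"[ACGTNacgtn]+")

def first_bad_record(lines):
    """Return (record_index, reason) for the first malformed record, or None.

    `lines` is any iterable of text lines, e.g. an open file handle.
    Record indices are 0-based.
    """
    it = iter(lines)
    index = 0
    while True:
        record = [line.rstrip("\n") for line in islice(it, 4)]
        if not record:
            return None  # reached the end without finding a problem
        if len(record) < 4:
            return index, "truncated record"
        header, seq, plus, qual = record
        if not header.startswith("@"):
            return index, "header does not start with '@'"
        if not SEQ_RE.fullmatch(seq):
            # A header fused onto a sequence line ends up here.
            return index, "sequence contains non-nucleotide characters"
        if not plus.startswith("+"):
            return index, "separator does not start with '+'"
        if len(seq) != len(qual):
            return index, "sequence and quality lengths differ"
        index += 1

# Usage (hypothetical file name):
#   import gzip
#   with gzip.open("decoded.fq.gz", "rt") as fh:
#       print(first_bad_record(fh))
```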

Many thanks in advance for any information,
Best,
Ievgen

@earonesty

earonesty commented Jun 27, 2022 via email

@ggoussarov-evotec

Hi, I have been working with @i-strielkov on this, using the FASTQ files that were linked. After experimenting with the settings for a while, I found that using a non-default buffer size (I tried -b11 and -b12) reproducibly changes which lines get corrupted, and that setting the buffer larger than the file size (I tried -b500) removes the corruption from the output. To me, this indicates that the issue is probably related to buffer handling rather than to the file itself.
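
For anyone who wants to repeat the comparison, the position of the first corrupted line for a given buffer size can be found by comparing the original FASTQ with the round-tripped output line by line. The following is a rough sketch rather than the exact script I used; the function name and the file names in the usage comment are placeholders.

```python
from itertools import zip_longest

def first_difference(original, decoded):
    """Return the 1-based number of the first line that differs, or None.

    Both arguments are iterables of text lines (e.g. open file handles).
    A missing line in either input counts as a difference.
    """
    pairs = zip_longest(original, decoded)  # pads the shorter input with None
    for number, (a, b) in enumerate(pairs, start=1):
        if a != b:
            return number
    return None

# Usage (placeholder file names, one decoded file per tested buffer size):
#   with open("original.fq") as a, open("roundtrip_b11.fq") as b:
#       print(first_difference(a, b))
```

Running this once per tested buffer size shows whether the first corrupted line moves when the buffer size changes.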

In addition, I have also verified that all read names are indeed unique, since this was proposed as a potential cause.
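
A uniqueness check of this kind can be sketched as follows. It assumes well-formed 4-line records in the original (uncorrupted) file and treats the header text up to the first whitespace as the read name; both are assumptions of this sketch, and `duplicate_read_names` is an illustrative name.

```python
from collections import Counter
from itertools import islice

def duplicate_read_names(lines):
    """Return a sorted list of read names that occur more than once.

    Every 4th line, starting with the first, is taken as a header line.
    """
    counts = Counter()
    for header in islice(lines, 0, None, 4):
        # Drop the leading '@' and keep the text up to the first whitespace.
        fields = header[1:].split()
        if fields:
            counts[fields[0]] += 1
    return sorted(name for name, n in counts.items() if n > 1)

# Usage (placeholder file name):
#   with open("original.fq") as fh:
#       print(duplicate_read_names(fh))
```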
