gerbil output file comparison #15

Open
chklopp opened this issue Jun 23, 2020 · 1 comment

Comments

chklopp commented Jun 23, 2020

Is there an efficient way to compare two large gerbil output files in order to retrieve the k-mers that occur in only one of them?

merbert (Collaborator) commented Jul 7, 2020

Unfortunately, that is not easily possible. One option would be to process the two input datasets one after the other with the same hash strategy, and then to fix and sort the partitions in the output file. With that in place, the comparison could be done quite efficiently. However, implementing this would be quite complex, and unfortunately I don't have time for it at the moment.
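
To illustrate why sorted output would make the comparison cheap, here is a minimal sketch of the final comparison step only. It is not part of gerbil. It assumes the k-mers from each run have already been exported to plain strings, sorted in ascending order, and deduplicated; the function name `unique_kmers` and the `"A"`/`"B"` labels are hypothetical. Under those assumptions, a single linear merge pass over both streams finds the k-mers present in only one of them without loading either file into memory.

```python
from typing import Iterable, Iterator, Tuple


def unique_kmers(a: Iterable[str], b: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (kmer, source) for k-mers present in exactly one of two streams.

    Assumption: both streams are sorted ascending and contain no duplicates.
    `source` is "A" if the k-mer occurs only in `a`, "B" if only in `b`.
    """
    it_a, it_b = iter(a), iter(b)
    x = next(it_a, None)
    y = next(it_b, None)

    # Advance through both streams in lockstep, like the merge step of merge sort.
    while x is not None and y is not None:
        if x == y:
            # Present in both inputs: skip it.
            x = next(it_a, None)
            y = next(it_b, None)
        elif x < y:
            # x cannot appear later in b, because b is sorted.
            yield x, "A"
            x = next(it_a, None)
        else:
            yield y, "B"
            y = next(it_b, None)

    # Whatever remains in either stream is unique to that stream.
    while x is not None:
        yield x, "A"
        x = next(it_a, None)
    while y is not None:
        yield y, "B"
        y = next(it_b, None)


if __name__ == "__main__":
    run_a = ["AAC", "ACG", "CGT"]
    run_b = ["AAC", "CGT", "GGA"]
    for kmer, source in unique_kmers(run_a, run_b):
        print(kmer, source)
```

Because the function accepts any iterable, it could be fed lazily from two open text files with one k-mer per line (stripping the trailing newline), so memory use stays constant regardless of file size. The hard part described above, getting both outputs into the same sorted order in the first place, is not addressed by this sketch.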
