fix: handle TRF processing for sequences fewer than total processes #46
We welcome feedback and issue reports for all bioBakery tools through our Discourse site. For users who would like to contribute directly to the tools, we are happy to field PRs that address bug fixes. Please note that the turnaround time on our end might be a bit long, but that does not mean we don't value the contribution! We currently don't accept PRs that add new functionality to tools, but we would be happy to receive your feedback on Discourse.
Also, we will make sure to attribute your contribution in our User’s manual (README.md) and in any associated paper Acknowledgements.
Description
Fix TRF parallel processing for scenarios with fewer sequences than configured processes.
In the current implementation, when there are fewer input sequences than configured processes:
Modified process allocation logic in kneaddata/trf_parallel.py (line 69) to:
Example log showing low sequence count:
kneaddata.utilities - INFO: READ COUNT: trimmed pair1 : Total reads after trimming ( /path/to/.trimmed.1.fastq ): 25542560.0
kneaddata.utilities - INFO: READ COUNT: trimmed pair2 : Total reads after trimming ( /path/to/.trimmed.2.fastq ): 25542560.0
kneaddata.utilities - INFO: READ COUNT: trimmed orphan1 : Total reads after trimming ( /path/to/.trimmed.single.1.fastq ): 226.0
kneaddata.utilities - INFO: READ COUNT: trimmed orphan2 : Total reads after trimming (/path/to/.trimmed.single.2.fastq ): 7.0
With this change, TRF successfully processes the 7 orphan2 sequences, which was previously impossible with 16 processes (nproc=16).
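For illustration only, the general idea of capping the worker count at the number of available sequences can be sketched as below. This is not the code changed in kneaddata/trf_parallel.py; the function name `plan_chunks` and its parameters are hypothetical, and the sketch assumes the input is split into roughly equal chunks, one per worker.

```python
def plan_chunks(total_sequences, nproc):
    """Return how many sequences each worker would receive.

    Hypothetical helper, not part of kneaddata: it never creates more
    workers than there are sequences, so no worker gets an empty chunk.
    """
    if total_sequences <= 0:
        return []
    # Cap the worker count at the number of sequences (at least one worker).
    workers = max(1, min(nproc, total_sequences))
    # Spread any remainder over the first few workers.
    base, extra = divmod(total_sequences, workers)
    return [base + 1 if i < extra else base for i in range(workers)]
```

Under these assumptions, `plan_chunks(7, 16)` yields seven chunks of one sequence each rather than sixteen chunks, nine of which would be empty.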
Related Issue
Screenshots (if appropriate):