This workflow gives an example of using Makeflow to parallelize the Burrows-Wheeler Aligner (BWA).
If you have not done so already, please clone this example repository like so:
git clone https://github.com/cooperative-computing-lab/makeflow-examples.git
cd ./makeflow-examples/bwa
First, build the bwa binary for your architecture:
git clone https://github.com/lh3/bwa bwa-src
cd bwa-src
make
cp bwa ..
cd ..
If you do not have real data to work with, then generate some simulated data (~10 second workflow):
./fastq_generate.pl 10000 1000 > ref.fastq
./fastq_generate.pl 1000 100 ref.fastq > query.fastq
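For readers unfamiliar with the format: FASTQ stores each read as four lines (an `@` header, the sequence, a `+` separator, and a quality string of the same length as the sequence). The Python sketch below produces random records in that layout. It is only an illustration of the file format, not the logic of fastq_generate.pl, which may well differ (the second command above also takes ref.fastq as an argument, which this sketch does not model). The function names, the `@readN` headers, and the constant quality character are all assumptions made for the example.

```python
import random


def random_fastq_records(num_reads, read_length, seed=0):
    """Yield (header, sequence, quality) tuples for random reads.

    Hypothetical helper for illustration; not part of makeflow-examples.
    """
    rng = random.Random(seed)
    for i in range(num_reads):
        sequence = "".join(rng.choice("ACGT") for _ in range(read_length))
        quality = "I" * read_length  # constant placeholder quality (assumption)
        yield f"@read{i}", sequence, quality


def format_fastq(records):
    """Render records as FASTQ text: four lines per record."""
    out = []
    for header, sequence, quality in records:
        out.extend([header, sequence, "+", quality])
    return "".join(line + "\n" for line in out)


if __name__ == "__main__":
    # Print two 10-base reads in FASTQ layout.
    print(format_fastq(random_fastq_records(2, 10)), end="")
```

The two positional numbers passed to the generator above (for example `10000 1000`) match the "Number x Length" columns of the table further down, which is why the sketch takes a read count and a read length.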
Then, generate a workflow to process the data:
./make_bwa_workflow --ref ref.fastq --query query.fastq --num_seq 100 > bwa.mf
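The `--num_seq` value appears to correspond to the "Number of seq per split" column in the table below: the query file is cut into pieces of that many sequences so that each piece can be aligned independently. A minimal Python sketch of that splitting idea follows, assuming plain four-line FASTQ records. The function name `split_fastq` is hypothetical, and the scripts actually used by the workflow may split the data differently.

```python
def split_fastq(lines, num_seq):
    """Group FASTQ lines into chunks of at most num_seq four-line records.

    Hypothetical helper for illustration; assumes four lines per record.
    """
    if num_seq <= 0:
        raise ValueError("num_seq must be positive")
    if len(lines) % 4 != 0:
        raise ValueError("expected four lines per FASTQ record")
    lines_per_chunk = 4 * num_seq
    return [lines[i:i + lines_per_chunk]
            for i in range(0, len(lines), lines_per_chunk)]


if __name__ == "__main__":
    # Build 250 tiny records and split them 100 per chunk.
    records = []
    for i in range(250):
        records += [f"@read{i}", "ACGT", "+", "IIII"]
    chunks = split_fastq(records, 100)
    print([len(chunk) // 4 for chunk in chunks])  # [100, 100, 50]
```

The last chunk is simply smaller when the number of sequences is not a multiple of `num_seq`.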
Finally, execute the workflow using makeflow locally, or using a batch system like Condor, SGE, or Work Queue:
makeflow bwa.mf
makeflow -T condor bwa.mf
makeflow -T sge bwa.mf
makeflow -T wq bwa.mf
Alternatively, the makeflow can be run using the JX or JSON format:
makeflow --jx bwa.jx
makeflow --json bwa.json
NOTE: both the JX and JSON formats utilize fastq_reduce and cat_bwa, which are created using the make_bwa_workflow script.
Workflow Size | Reference Size (Number x Length) | Query Size (Number x Length) | Number of Seq per Split | Approx. Time : Machines |
Small | 10000x1000 (Fixed 20M) | 1000x100 (237K) | 100 | ~10 sec : 1 machine |
Medium | 100000x1000 (Fixed 196M) | 10000x1000 (20M) | 1000 | ~2 min : 20 machines |
Medium | 100000x1000 (Fixed 196M) | 1000000x100 (237M) | 1000 | ~6 min : 20 machines |
Large | 1000000x1000 (Fixed 2.0G) | 1000000x100 (237M) | 1000 | ~30 min : 20 machines |
Note: when using generated data, we did not use the paired-end functionality of BWA, because we cannot guarantee that query and rquery are matched as a pair would be in real data.
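As a rough guide to how the table rows translate into parallel work, the number of query splits is the query count divided by the sequences per split, rounded up. The Python sketch below does that arithmetic for the four rows. It assumes one alignment task per split, which is an assumption for illustration and not something the table states; the row labels are also invented here to tell the two Medium rows apart.

```python
# (label, number of query sequences, sequences per split), taken from the table.
ROWS = [
    ("Small", 1000, 100),
    ("Medium (10000 queries)", 10000, 1000),
    ("Medium (1000000 queries)", 1000000, 1000),
    ("Large", 1000000, 1000),
]


def num_splits(num_queries, seq_per_split):
    """Ceiling division: how many splits are needed to cover every query."""
    return -(-num_queries // seq_per_split)


if __name__ == "__main__":
    for label, queries, per_split in ROWS:
        print(f"{label}: {num_splits(queries, per_split)} splits")
```

Under that assumption, the Small row yields 10 splits and the Large row 1000, which is consistent with the larger rows being timed on 20 machines.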