BWA Workflow Example

This workflow gives an example of using Makeflow to parallelize the Burrows-Wheeler Aligner (BWA) tool.

If you have not done so already, please clone this example repository like so:

git clone https://github.com/cooperative-computing-lab/makeflow-examples.git
cd ./makeflow-examples/bwa

First, build the bwa binary for your architecture:

git clone https://github.com/lh3/bwa bwa-src
cd bwa-src
make
cp bwa ..
cd ..

If you do not have real data to work with, generate some simulated data (these sizes yield a workflow that runs in about 10 seconds):

./fastq_generate.pl 10000 1000 > ref.fastq
./fastq_generate.pl 1000 100 ref.fastq > query.fastq
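
A quick sanity check on the generated files can catch an empty or truncated output before the workflow is built. The sketch below assumes the common four-lines-per-record FASTQ layout, which has not been verified here for the output of fastq_generate.pl; `count_fastq_records` is a hypothetical helper and not part of this repository.

```shell
# Hypothetical helper: count records in a FASTQ file, assuming each record
# occupies exactly four lines (header, sequence, '+' separator, quality).
count_fastq_records() {
    lines=$(wc -l < "$1")
    echo $((lines / 4))
}
```

If the assumption holds, `count_fastq_records ref.fastq` and `count_fastq_records query.fastq` should report the sequence counts passed to the generator (10000 and 1000 in the commands above).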

Then, generate a workflow to process the data:

./make_bwa_workflow --ref ref.fastq --query query.fastq --num_seq 100 > bwa.mf
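
The value of --num_seq sets how many query sequences go into each split, so it largely determines how many alignment tasks the workflow contains. The arithmetic can be sketched as below; `estimate_splits` is a hypothetical helper, and it assumes a final partial chunk still becomes its own split (ceiling division), which may differ from what make_bwa_workflow actually does.

```shell
# Hypothetical helper: estimate the number of query splits for a given
# --num_seq value, assuming any leftover sequences form one extra split.
estimate_splits() {
    total=$1       # number of query sequences
    per_split=$2   # value passed to --num_seq
    echo $(( (total + per_split - 1) / per_split ))
}
```

For the small example above, `estimate_splits 1000 100` prints 10, so roughly ten alignment tasks can run in parallel.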

Finally, execute the workflow using makeflow locally, or using a batch system like Condor, SGE, or Work Queue:

makeflow bwa.mf
makeflow -T condor bwa.mf
makeflow -T sge bwa.mf
makeflow -T wq bwa.mf
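
When switching between backends often, the four invocations above can be folded into one small wrapper. This is only a sketch: `makeflow_cmd` is a hypothetical function name, it uses nothing beyond the options listed above, and it prints the command rather than running it so the choice can be inspected first.

```shell
# Hypothetical wrapper: print the makeflow command for a chosen backend.
# Only the invocations listed above are used; "local" means no -T option.
makeflow_cmd() {
    backend=${1:-local}
    workflow=${2:-bwa.mf}
    case "$backend" in
        local)          echo "makeflow $workflow" ;;
        condor|sge|wq)  echo "makeflow -T $backend $workflow" ;;
        *)              echo "unknown backend: $backend" >&2; return 1 ;;
    esac
}
```

For example, `makeflow_cmd wq` prints `makeflow -T wq bwa.mf`, and an unrecognized backend name returns a non-zero status.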

Alternatively, the workflow can be run from its JX or JSON description:

makeflow --jx bwa.jx
makeflow --json bwa.json

NOTE: both the JX and JSON formats rely on the fastq_reduce and cat_bwa scripts, which are created by the make_bwa_workflow script, so run make_bwa_workflow at least once before using either format.
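
Because of the dependency described in the note above, a pre-flight check can save a failed run. The sketch below is one way to do it; `check_helpers` is a hypothetical function, and it assumes the two scripts are written to the directory where make_bwa_workflow was run.

```shell
# Hypothetical pre-flight check: confirm that the helper scripts named in the
# note exist in the given directory (default: current directory).
check_helpers() {
    dir=${1:-.}
    for f in fastq_reduce cat_bwa; do
        [ -e "$dir/$f" ] || {
            echo "missing $f: run make_bwa_workflow first" >&2
            return 1
        }
    done
}
```

A typical use would be `check_helpers && makeflow --jx bwa.jx`, which only starts the workflow when both scripts are present.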

| Workflow Size | Reference Size (Number x Length) | Query Size (Number x Length) | Number of Seq per Split | Approx. Time with Machines |
|---|---|---|---|---|
| Small | 10000x1000 (Fixed 20M) | 1000x100 (237K) | 100 | ~10 sec : 1 machine |
| Medium | 100000x1000 (Fixed 196M) | 10000x1000 (20M) | 1000 | ~2 min : 20 machines |
| Medium | 100000x1000 (Fixed 196M) | 1000000x100 (237M) | 1000 | ~6 min : 20 machines |
| Large | 1000000x1000 (Fixed 2.0G) | 1000000x100 (237M) | 1000 | ~30 min : 20 machines |

Note: with generated data we do not use the paired-end functionality of BWA, because we cannot guarantee that query and rquery match as a pair the way they would in real data.