Multi-threaded construction of multistring BWT and LCP array.
The input for this program should be a FASTA file on the alphabet {A, C, G, T}
.
Some examples are provided in the directory tests/
.
The output of the program are the BWT and the LCP array of the collection of strings in the input file.
After the main command execution, the binary-encoded BWT is contained in the file outfile-BWT.bin
,
while the binary LCP array is contained in the file outfile-LCP.bin
.
Each entry of the latter file is a 8-bit unsigned integer with the only exception of -1
encoded as 0xFF
.
After cloning the repository, issue the following command:
make bin
(Any modern C++ compiler should suffice for compiling the program.)
Now it is possible to run the program as follows:
./bwt_lcp <path to the FASTA file> <no of threads>
To output the ASCII version of the BWT use the decode_bwt
tool as follows:
./decode_bwt <path to outfile-BWT.bin>
Some test instances are provided in the repository.
To run them, one can issue the command make test
.
Paola Bonizzoni, Gianluca Della Vedova, Yuri Pirola, Marco Previtali, and Raffaella Rizzi (2019).
Multithread Multistring Burrows–Wheeler Transform and Longest Common Prefix Array.
Journal of Computational Biology, 26(9), 948–961.
Paola Bonizzoni, Gianluca Della Vedova, Serena Nicosia, Yuri Pirola, Marco Previtali, and Raffaella Rizzi (2018).
Divide and Conquer Computation of the Multi-string BWT and LCP Array.
CiE 2018: 107-117
DOI: 978-3-319-94418-0_11