Readme updated (version 1.0.2)
i-xiaohu committed Dec 24, 2023
1 parent 04eba52 commit d6c2b7a
Showing 6 changed files with 57 additions and 26 deletions.
30 changes: 26 additions & 4 deletions CMakeLists.txt
@@ -1,10 +1,32 @@
cmake_minimum_required(VERSION 3.10)
project(comp_seed)

#set(CMAKE_C_FLAGS "${CMAKE_C_FLAGS} -g -Wall -O2")
#set(CMAKE_CXX_FLAGS "${CMAKE_CXX_FLAGS} -g -Wall -O3")
set(CMAKE_C_FLAGS "${CMAKE_C_FLAGS} -g -O3 -mavx")
set(CMAKE_CXX_FLAGS "${CMAKE_CXX_FLAGS} -g -O3 -fpermissive -mavx")
include(CheckCXXCompilerFlag)
check_cxx_compiler_flag("-mavx512bw" AVX512_SUPPORTED)
check_cxx_compiler_flag("-mavx2" AVX2_SUPPORTED)
check_cxx_compiler_flag("-mavx" AVX_SUPPORTED)
check_cxx_compiler_flag("-msse4.2" SSSE_SUPPORTED)

if (AVX512_SUPPORTED)
message("AVX512 instructions are supported on this machine.")
set(CMAKE_C_FLAGS "${CMAKE_C_FLAGS} -mavx512bw")
set(CMAKE_CXX_FLAGS "${CMAKE_CXX_FLAGS} -mavx512bw")
elseif(AVX2_SUPPORTED)
message("AVX2 instructions are supported on this machine.")
set(CMAKE_C_FLAGS "${CMAKE_C_FLAGS} -mavx2")
set(CMAKE_CXX_FLAGS "${CMAKE_CXX_FLAGS} -mavx2")
elseif (AVX_SUPPORTED)
message("AVX instructions are supported on this machine.")
set(CMAKE_C_FLAGS "${CMAKE_C_FLAGS} -mavx")
set(CMAKE_CXX_FLAGS "${CMAKE_CXX_FLAGS} -mavx")
elseif (SSSE_SUPPORTED)
message("SSE4.2 instructions are supported on this machine.")
set(CMAKE_C_FLAGS "${CMAKE_C_FLAGS} -msse -msse2 -msse3 -mssse3 -msse4.1 -msse4.2")
set(CMAKE_CXX_FLAGS "${CMAKE_CXX_FLAGS} -msse -msse2 -msse3 -mssse3 -msse4.1 -msse4.2")
endif()

set(CMAKE_C_FLAGS "${CMAKE_C_FLAGS} -g -O3")
set(CMAKE_CXX_FLAGS "${CMAKE_CXX_FLAGS} -g -O3")

set(cstl
cstl/kvec.h
39 changes: 20 additions & 19 deletions README.md
@@ -15,16 +15,16 @@ cmake ..; make
```

If the installation is successful, the build subdirectory will contain the executable files.
- `bwaidx` to create the FM-index for a reference file.
- `bwamem_seeding` to run BWA-MEM seeding for sequencing reads.
- `comp_seeding` to run compressive seeding for reordered reads.
- `bwaidx` to create the FM-index for a reference file.
- `bwamem` to run BWA-MEM (v0.7.17) for sequencing reads.
- `CompSeed` to run compressive seeding for reordered reads, generating the same seeds and alignments as BWA-MEM.

## Declaration

CompSeed is an algorithm demonstration for compressive alignment, by far not a standalone tool. It
CompSeed is an algorithm demonstration for compressive alignment. It
receives reads compressed and reordered by upstream reordering-based compressors, including
[SPRING](https://github.com/shubhamchandak94/Spring), [Minicom](https://github.com/yuansliu/minicom)
and [PgRC](https://github.com/kowallus/PgRC). While CompSeed can only support for single-end compression and alignment,
and [PgRC](https://github.com/kowallus/PgRC). While CompSeed currently supports only single-end compression and alignment,
the project of integrating compression and alignment is underway.

## Example usage
@@ -49,23 +49,19 @@ minicom -d data.mincom -t 16; mv data_dec.reads minicom.reads
pgrc -t 16 -d data.pgrc; mv data.pgrc_out pgrc.reads
```

Run BWA-MEM seeding and record the time.
Run BWA-MEM.
```bash
/usr/bin/time bwamem_seeding -t 16 hg19 data.fq
bwamem -t 16 hg19 data.fq > bwa.sam
```

Run CompSeed and record the time.
Run CompSeed.
```bash
/usr/bin/time comp_seeding -t 16 hg19 spring.reads
/usr/bin/time comp_seeding -t 16 hg19 minicom.reads
/usr/bin/time comp_seeding -t 16 hg19 pgrc.reads
CompSeed -t 16 hg19 spring.reads > css.sam
CompSeed -t 16 hg19 minicom.reads > csm.sam
CompSeed -t 16 hg19 pgrc.reads > csp.sam
```

Both `bwamem_seeding` and `comp_seeding` have an option `--print` to output seeds in text format to stdout for
checking the seed identity. But do not turn it on when comparing the speed because the output time significantly degrades
the benchmarked results.

For `bwamem_seeding` and `comp_seeding`, all the original parameters of BWA-MEM seeding are supported.
For `CompSeed`, all the original parameters of BWA-MEM seeding are supported.
```
-t number of threads
-k minimum seed length
Expand All @@ -76,15 +72,20 @@ For `bwamem_seeding` and `comp_seeding`, all the original parameters of BWA-MEM
```

## Results
CompSeed fully utilizes the redundancy information provided from upstream compressors, and avoids ~50% of the redundant
time-consuming FM-index operations during the BWA-MEM seeding process.
CompSeed uses trie structures to exploit the redundancy information provided by upstream compressors, and
avoids ~50% of the redundant, time-consuming FM-index operations during the BWA-MEM seeding process.
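
To give an intuition for why a trie removes redundant index work, here is a toy model, not CompSeed's actual algorithm: it treats each base of a read as one index-extension step and counts how many steps remain when reads that share a prefix share the corresponding trie path. The names `TrieNode`, `StepCount` and `count_steps` are hypothetical, and real BWA-MEM seeding uses bidirectional SMEM search rather than plain prefix extension.

```cpp
#include <array>
#include <cstddef>
#include <memory>
#include <string>
#include <vector>

// Toy trie over the DNA alphabet. Each node stands for one extension step
// that would otherwise be repeated for every read sharing the same prefix.
struct TrieNode {
    std::array<std::unique_ptr<TrieNode>, 4> child;
};

// Map A/C/G/T to 0..3; returns -1 for any other character (e.g. N).
int base_code(char c) {
    switch (c) {
        case 'A': return 0;
        case 'C': return 1;
        case 'G': return 2;
        case 'T': return 3;
        default:  return -1;
    }
}

struct StepCount {
    std::size_t naive = 0;  // one step per base of every read
    std::size_t shared = 0; // one step per newly created trie node
};

// Insert every read into one trie and count steps with and without sharing.
StepCount count_steps(const std::vector<std::string> &reads) {
    TrieNode root;
    StepCount n;
    for (const std::string &read : reads) {
        TrieNode *cur = &root;
        for (char c : read) {
            int b = base_code(c);
            if (b < 0) break; // stop at ambiguous bases in this toy model
            ++n.naive;
            if (!cur->child[b]) {
                cur->child[b] = std::make_unique<TrieNode>();
                ++n.shared; // work is only done for a prefix not seen before
            }
            cur = cur->child[b].get();
        }
    }
    return n;
}
```

For the reads `ACGT`, `ACGA` and `ACTT`, the naive count is 12 steps while the shared count is 7, because the prefixes `ACG` and `AC` are extended only once. Reordering by an upstream compressor places similar reads next to each other, which is what makes this kind of sharing practical.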

![SST](images/Figure1.jpg)

When combined with AVX instructions for the extension stage, a doubled alignment throughput is observed.

![Seeding time of BWA-MEM and CompSeed](images/Table1.jpg)

It shows enhanced performance as sequencing coverage increases, and it is almost unaffected by the re-seeding parameter.
Moreover, it has a substantial memory advantage over existing solutions, because it does not replace or modify
the FM-index. All of the acceleration comes from compression, so it does not conflict with existing hardware-based optimizations.
![Seeding time of BWA-MEM and CompSeed](images/Figure1.jpg)

![Seeding time of BWA-MEM and CompSeed](images/Figure2.jpg)

## References

Binary file modified images/Figure1.jpg
Binary file added images/Figure2.jpg
Binary file modified images/Table1.jpg
14 changes: 11 additions & 3 deletions main.cpp
@@ -214,9 +214,6 @@ void display_profile(const thread_aux_t &t) {
}

int main(int argc, char *argv[]) {
#if ((!__AVX512BW__) && (!__AVX2__) && (__SSE2__))
fprintf(stderr, "Smith-Waterman in AVX mode\n");
#endif
mem_opt_t *opt, opt0;
int fd, fd2, i, c, ignore_alt = 0, no_mt_io = 0;
int fixed_chunk_size = -1;
@@ -342,6 +339,16 @@ int main(int argc, char *argv[]) {
return 1;
}

#if __AVX512BW__
fprintf(stderr, "Executing Banded Smith-Waterman in AVX512 mode\n");
#elif __AVX2__
fprintf(stderr, "Executing Banded Smith-Waterman in AVX2 mode\n");
#elif __AVX__
fprintf(stderr, "Executing Banded Smith-Waterman in AVX mode\n");
#elif __SSE4_2__
fprintf(stderr, "Executing Banded Smith-Waterman in SSE4.2 mode\n");
#endif

if (rg_line) {
hdr_line = bwa_insert_header(rg_line, hdr_line);
free(rg_line);
@@ -425,6 +432,7 @@ int main(int argc, char *argv[]) {
// opt->flag |= MEM_F_PE;
// }
// }

bwa_print_sam_hdr(aux.idx->bns, hdr_line);
aux.actual_chunk_size = fixed_chunk_size > 0? fixed_chunk_size : opt->chunk_size * opt->n_threads;
kt_pipeline(no_mt_io? 1 : 2, process, &aux, 3);
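
The SIMD mode report added to `main` in this commit could also be factored into a helper, which keeps the preprocessor chain in one place. This is a sketch only, not code from the commit: `simd_mode` and `report_simd_mode` are hypothetical names, and the `"scalar"` fallback is an assumption, since the committed code prints nothing when none of the macros is defined.

```cpp
#include <cstdio>

// Name the widest SIMD instruction set the compiler was told to target.
// GCC and Clang predefine these macros when the matching -m flag is set.
const char *simd_mode() {
#if defined(__AVX512BW__)
    return "AVX512";
#elif defined(__AVX2__)
    return "AVX2";
#elif defined(__AVX__)
    return "AVX";
#elif defined(__SSE4_2__)
    return "SSE4.2";
#else
    return "scalar"; // assumption: the commit has no fallback branch
#endif
}

// Print the same style of message that the commit writes to stderr.
void report_simd_mode() {
    std::fprintf(stderr, "Executing Banded Smith-Waterman in %s mode\n", simd_mode());
}
```

Because the choice is made at compile time, the reported mode reflects the flags selected in `CMakeLists.txt`, not what the CPU running the binary supports.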