TASSEL is a hybrid transcript assembly pipeline that merges transcriptomes from short-read and long-read RNA-seq. The output is a merged transcriptome file (GTF) that combines the high depth of short-read sequencing with the long-range information from long-read RNA-seq. The distinguishing feature of TASSEL is that it strands the otherwise unstranded long reads using the built-in SLURP methodology and then uses them for transcript assembly.
Create two directories: one containing the short-read fastq files and one containing the long-read fastq files that you want to merge.
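As a minimal sketch of this layout, the commands below create the two input directories. The names tassel_run, short_reads and long_reads are hypothetical; any names should work as long as the paths given to the script match them.

```shell
# Hypothetical directory names (not prescribed by TASSEL).
project_dir="tassel_run"

# One directory per read type; -p avoids an error if they already exist.
mkdir -p "$project_dir/short_reads" "$project_dir/long_reads"

# Copy or link your fastq files into the matching directory, e.g.:
#   cp /path/to/short/*.fastq "$project_dir/short_reads/"
#   cp /path/to/long/*.fastq  "$project_dir/long_reads/"
ls "$project_dir"
```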
User input is required for the following variables within the script:
hisat_indices: provide path to hisat indices
rnastranded: provide type of strandedness for short reads (options: F, R; default: R)
referenceGTF: provide path to reference_annotation_file (required for guided assembly)
referenceFASTA: provide path to reference_fasta_file
shortread_fastq_dir: provide path to directory that contains only short-read fastq files
longread_fastq_dir: provide path to directory that contains only long-read fastq files
library_type: provide library strandedness (options: --rf for first strand, --fr for second strand; default: --rf)
primer1: sequence of primer used for first strand synthesis during long-read cDNA library prep; limit to 15 nt (default: GCTCTATCTTCTTT)
primer2: sequence of primer used for second strand synthesis (strand switching) during long-read cDNA library prep; limit to 15 nt (default: CTGATATTGCTGGG)
rc_primer2: reverse complement of primer2 (default: CCCAGCAATATCAG)
processor: number of processors (default: 4)
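If you change primer2, rc_primer2 has to be updated to match. The sketch below derives it on the command line instead of by hand. The revcomp function is a hypothetical helper, not part of TASSEL.sh, and it assumes the standard rev and tr utilities are available.

```shell
# Hypothetical helper (not part of TASSEL.sh): reverse-complement a DNA sequence.
revcomp() {
    # rev reverses the string; tr swaps each base for its complement.
    printf '%s\n' "$1" | rev | tr 'ACGTacgt' 'TGCAtgca'
}

primer2="CTGATATTGCTGGG"              # default primer2 from the list above
rc_primer2="$(revcomp "$primer2")"
echo "rc_primer2=$rc_primer2"         # prints rc_primer2=CCCAGCAATATCAG
```

With the default primer2, the result matches the default rc_primer2 listed above.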
Run bash TASSEL.sh
in the directory that contains directories for short-read and long-read fastq files.
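Because each input directory is expected to contain only fastq files, a quick check before launching the pipeline can catch stray files. The following is a minimal sketch, not part of TASSEL.sh: check_fastq_dir is a hypothetical helper, and the accepted extensions (.fastq, .fq, and their .gz forms) are an assumption about how your files are named.

```shell
# Hypothetical pre-flight check (not part of TASSEL.sh).
# Assumption: fastq files end in .fastq, .fq, .fastq.gz or .fq.gz.
check_fastq_dir() {
    fq_dir="$1"
    [ -d "$fq_dir" ] || { echo "not a directory: $fq_dir"; return 1; }
    fq_rc=0
    fq_found=0
    for fq_file in "$fq_dir"/*; do
        [ -e "$fq_file" ] || continue      # empty directory: glob did not match
        case "$fq_file" in
            *.fastq|*.fq|*.fastq.gz|*.fq.gz) fq_found=$((fq_found + 1)) ;;
            *) echo "unexpected file: $fq_file"; fq_rc=1 ;;
        esac
    done
    [ "$fq_found" -gt 0 ] || { echo "no fastq files in: $fq_dir"; fq_rc=1; }
    return "$fq_rc"
}

# Example use, with hypothetical directory names:
#   check_fastq_dir short_reads && check_fastq_dir long_reads && bash TASSEL.sh
```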
hisat2: hisat2 can be obtained from (http://daehwankimlab.github.io/hisat2/download/) or conda install -c bioconda hisat2
samtools: samtools can be obtained from (http://www.htslib.org/download/) or conda install -c bioconda samtools
StringTie2: StringTie2 can be obtained from (https://ccb.jhu.edu/software/stringtie/index.shtml) or conda install -c bioconda stringtie
seqkit: seqkit can be obtained from (https://bioinf.shenwei.me/seqkit/) or conda install -c bioconda seqkit
minimap2: minimap2 can be obtained from (https://github.com/lh3/minimap2#install) or conda install -c bioconda minimap2
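To confirm that the tools listed above are installed, a short loop over their executable names can report which ones are on your PATH. This is a sketch only; it assumes the executables carry their usual names (StringTie2 is installed as stringtie) and it does not check versions.

```shell
# Report which of the required tools are missing from PATH.
missing=0
for tool in hisat2 samtools stringtie seqkit minimap2; do
    if command -v "$tool" >/dev/null 2>&1; then
        echo "found:   $tool"
    else
        echo "missing: $tool"
        missing=$((missing + 1))
    fi
done
echo "$missing tool(s) missing"
```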
If you use or discuss this method, please cite:
Kainth AS, Haddad GA, Hall JM, Ruthenburg AJ (2023) Merging short and stranded long reads improves transcript assembly. PLoS Comput Biol 19(10): e1011576. https://doi.org/10.1371/journal.pcbi.1011576