TASSEL (Transcript Assembly using Short and Strand Emended Long reads)

TASSEL is a hybrid transcript assembly pipeline that merges transcriptomes assembled from short-read RNA-seq and long-read RNA-seq data. The output is a merged transcriptome file (GTF) that combines the high depth of short-read sequencing with the long-range information of long-read RNA-seq. A distinctive feature of TASSEL is that it assigns strand to the otherwise unstranded long reads using its built-in SLURP methodology and then uses these stranded reads for transcript assembly.
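The idea of stranding long reads with library-prep primers can be illustrated with a short Python sketch. This is not TASSEL's SLURP code. It assumes that a read whose start carries primer2 (or whose end carries the reverse complement of primer1) is in transcript orientation, and that a read whose start carries primer1 (or whose end carries rc_primer2) is antisense and should be reverse-complemented. Primers are found by exact matching inside a hypothetical 100-nt window at each end. The default primer sequences are the ones listed under Usage in this README.

```python
# Illustrative primer-based stranding of long reads (NOT TASSEL's SLURP code).
# Assumption: primer2 at the 5' end or rc(primer1) at the 3' end marks a sense
# read; primer1 at the 5' end or rc_primer2 at the 3' end marks an antisense read.

COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def reverse_complement(seq: str) -> str:
    """Reverse-complement a DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def assign_strand(read: str,
                  primer1: str = "GCTCTATCTTCTTT",
                  primer2: str = "CTGATATTGCTGGG",
                  rc_primer2: str = "CCCAGCAATATCAG",
                  window: int = 100) -> tuple[str, str]:
    """Return (strand, oriented_read), where strand is '+', '-' or '?'."""
    read = read.upper()
    head, tail = read[:window], read[-window:]  # hypothetical search windows
    sense = primer2 in head or reverse_complement(primer1) in tail
    antisense = primer1 in head or rc_primer2 in tail
    if sense and not antisense:
        return "+", read
    if antisense and not sense:
        # Flip antisense reads so every stranded read is in transcript orientation.
        return "-", reverse_complement(read)
    return "?", read  # no primer found, or conflicting evidence: leave unstranded
```

Real long reads contain sequencing errors, so exact substring matching is a simplification; a practical implementation would tolerate mismatches in the primer search.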

Usage

Create two directories: one containing only the short-read fastq files and another containing only the long-read fastq files that you want to merge.
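Because each directory should hold only fastq files, a quick pre-flight check can list anything else that slipped in. The following Python sketch is a hypothetical helper, not part of TASSEL; the accepted extensions are an assumption and should be matched to the file patterns TASSEL.sh actually uses (for example, whether gzipped files are accepted).

```python
from pathlib import Path

# Assumed fastq file extensions; adjust to whatever TASSEL.sh actually globs.
FASTQ_SUFFIXES = (".fastq", ".fq")


def non_fastq_names(names):
    """Return, sorted, the file names that do not end in a fastq suffix."""
    return sorted(n for n in names if not n.endswith(FASTQ_SUFFIXES))


def check_read_dir(directory):
    """List regular files in `directory` that do not look like fastq files."""
    names = [p.name for p in Path(directory).iterdir() if p.is_file()]
    return non_fastq_names(names)
```

For example, with hypothetical directories named shortreads and longreads, check_read_dir("shortreads") and check_read_dir("longreads") should both return an empty list before the pipeline is started.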

User input is required for the following variables within the script:
hisat_indices: provide path to hisat indices
rnastranded: provide the type of strandedness for short reads (options: F, R; default: R)
referenceGTF: provide path to reference_annotation_file (required for guided assembly)
referenceFASTA: provide path to reference_fasta_file
shortread_fastq_dir: provide path to directory that contains only short-read fastq files
longread_fastq_dir: provide path to directory that contains only long-read fastq files
library_type: library strandedness (options: --rf (first strand), --fr (second strand); default: --rf)
primer1: sequence of the primer used for first-strand synthesis during long-read cDNA library prep; at most 15 nt (default: GCTCTATCTTCTTT)
primer2: sequence of the primer used for second-strand synthesis (strand switching) during long-read cDNA library prep; at most 15 nt (default: CTGATATTGCTGGG)
rc_primer2: reverse complement of primer2 (default: CCCAGCAATATCAG)
processor: number of processors (default: 4)
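Several of these variables must agree with each other, so a small sanity check can catch typos before a long run. The Python sketch below is hypothetical and not part of TASSEL.sh. Its defaults are copied from the list above. The pairing of rnastranded R with library_type --rf (and F with --fr) is an assumption based on HISAT2's --rna-strandness and StringTie's --rf/--fr conventions, consistent with the README defaults.

```python
# Defaults copied from the README; the checks are a hypothetical pre-flight step.
DEFAULTS = {
    "rnastranded": "R",
    "library_type": "--rf",
    "primer1": "GCTCTATCTTCTTT",
    "primer2": "CTGATATTGCTGGG",
    "rc_primer2": "CCCAGCAATATCAG",
    "processor": 4,
}

# Assumed pairing: HISAT2 R (fr-firststrand) <-> StringTie --rf; F <-> --fr.
STRAND_PAIRS = {"R": "--rf", "F": "--fr"}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Reverse-complement an A/C/G/T sequence."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def check_config(cfg):
    """Return a list of problems; an empty list means the settings look consistent."""
    problems = []
    if cfg["rnastranded"] not in STRAND_PAIRS:
        problems.append("rnastranded must be F or R")
    elif STRAND_PAIRS[cfg["rnastranded"]] != cfg["library_type"]:
        problems.append("rnastranded and library_type describe different strandedness")
    for name in ("primer1", "primer2", "rc_primer2"):
        seq = cfg[name].upper()
        if len(seq) > 15 or set(seq) - set("ACGT"):
            problems.append(f"{name} must be at most 15 nt of A/C/G/T")
    if reverse_complement(cfg["primer2"]) != cfg["rc_primer2"].upper():
        problems.append("rc_primer2 is not the reverse complement of primer2")
    if int(cfg["processor"]) < 1:
        problems.append("processor must be a positive integer")
    return problems
```

With the README defaults, check_config(DEFAULTS) returns an empty list; changing primer2 without updating rc_primer2 is reported as a mismatch.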

Run bash TASSEL.sh from the directory that contains the short-read and long-read fastq directories.

Dependencies

hisat2: hisat2 can be obtained from (http://daehwankimlab.github.io/hisat2/download/) or conda install -c bioconda hisat2
samtools: samtools can be obtained from (http://www.htslib.org/download/) or conda install -c bioconda samtools
StringTie2: StringTie2 can be obtained from (https://ccb.jhu.edu/software/stringtie/index.shtml) or conda install -c bioconda stringtie
seqkit: seqkit can be obtained from (https://bioinf.shenwei.me/seqkit/) or conda install -c bioconda seqkit
minimap2: minimap2 can be obtained from (https://github.com/lh3/minimap2#install) or conda install -c bioconda minimap2
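Before running TASSEL.sh, it can help to confirm that all five dependencies are on PATH. The Python sketch below is a hypothetical helper; it assumes each tool is invoked under its usual command name (StringTie2 installs as stringtie).

```python
import shutil

# Assumed command names for the dependencies listed above.
REQUIRED_TOOLS = ("hisat2", "samtools", "stringtie", "seqkit", "minimap2")


def missing_tools(tools=REQUIRED_TOOLS):
    """Return the tools that cannot be found on PATH."""
    return [tool for tool in tools if shutil.which(tool) is None]


if __name__ == "__main__":
    missing = missing_tools()
    if missing:
        print("Missing dependencies:", ", ".join(missing))
    else:
        print("All dependencies found on PATH.")
```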

Citation

If you use or discuss this method, please cite:
Kainth AS, Haddad GA, Hall JM, Ruthenburg AJ (2023) Merging short and stranded long reads improves transcript assembly. PLoS Comput Biol 19(10): e1011576. https://doi.org/10.1371/journal.pcbi.1011576
