morisUtokyo/vTR: vTR (a program for computing Variant units in mosaic Tandem Repeats) takes an input string and an estimated mosaic tandem repeat pattern that partly matches the input, and outputs a series of variant units.

Usage

vTR -f [fasta_file]

-f: Input FASTA file, for example sample.fasta

Real data

Real data and a script, "test.sh", for calculating variant units are in the directory realdata/.

Introduction

vTR (a program for computing Variant units in mosaic Tandem Repeats) takes an input string and an estimated mosaic tandem repeat pattern that partly matches the input, and outputs a series of variant units.

For example, in the following alignment the upper string is the estimated pattern (AGGGG)3(AAAAGAAAGAGAGGG)2 and the lower string is the input string. The alignment has 39 matches (|), 4 mismatches (X), and 5 indels (-).

AGGGG -A-GGGG AGGGG AAAAGAAAGA-GAGGG AAAAGAAAGAGAGGG
|||||  | |||| | ||X |||||X|||| || || ||||||||||X|||X
AGGGG AAGGGGG A-GGT AAAAGTAAGAGGA-GG AAAAGAAAGAAAGGT
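
The counts quoted for this alignment can be checked directly from the two gapped strings. The following is a minimal Python sketch for illustration only; the function name count_alignment_columns is hypothetical and is not part of vTR, which is written in C.

```python
def count_alignment_columns(upper, lower, gap="-"):
    """Return (matches, mismatches, indels) for two gapped strings of equal length."""
    if len(upper) != len(lower):
        raise ValueError("aligned strings must have the same length")
    matches = mismatches = indels = 0
    for a, b in zip(upper, lower):
        if a == " " and b == " ":
            continue  # spaces only separate units for display
        if a == gap or b == gap:
            indels += 1       # a gap on either side is one indel column
        elif a == b:
            matches += 1
        else:
            mismatches += 1
    return matches, mismatches, indels


upper = "AGGGG -A-GGGG AGGGG AAAAGAAAGA-GAGGG AAAAGAAAGAGAGGG"
lower = "AGGGG AAGGGGG A-GGT AAAAGTAAGAGGA-GG AAAAGAAAGAAAGGT"
print(count_alignment_columns(upper, lower))  # (39, 4, 5)
```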

The estimated pattern, (AGGGG)3(AAAAGAAAGAGAGGG)2, is concise and makes the structure of the mosaic tandem repeat easy to understand, but it does not show precisely which substitutions and indels are present in the lower string. For that, it is useful to examine the series of variant units in the string:

AGGGG AAGGGGG AGGT AAAAGTAAGAGGAGG AAAAGAAAGAAAGGT
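
One way to picture how such a decomposition follows from an alignment is to walk along the aligned columns and cut the input wherever a pattern unit ends. The sketch below is an illustration under assumptions, not vTR's actual procedure: the function name split_into_variant_units is hypothetical, and the rule that an insertion falling on a unit boundary is attached to the following unit is chosen only because it reproduces the example above.

```python
def split_into_variant_units(upper, lower, units, gap="-"):
    """Cut the gapped input (lower) wherever a pattern unit ends in upper.

    units is a list of (unit_string, repeat_count) pairs.
    """
    unit_lengths = [len(unit) for unit, count in units for _ in range(count)]
    variants, current = [], []
    consumed = 0  # pattern characters seen so far in the current unit
    k = 0         # index of the current unit
    for a, b in zip(upper, lower):
        # Close the current unit once all of its pattern characters are used;
        # the last unit stays open so trailing insertions remain attached to it.
        if k < len(unit_lengths) - 1 and consumed == unit_lengths[k]:
            variants.append("".join(current))
            current, consumed, k = [], 0, k + 1
        if b != gap:
            current.append(b)
        if a != gap:
            consumed += 1
    variants.append("".join(current))
    return variants


upper = "AGGGG -A-GGGG AGGGG AAAAGAAAGA-GAGGG AAAAGAAAGAGAGGG".replace(" ", "")
lower = "AGGGG AAGGGGG A-GGT AAAAGTAAGAGGA-GG AAAAGAAAGAAAGGT".replace(" ", "")
units = [("AGGGG", 3), ("AAAAGAAAGAGAGGG", 2)]
variant_units = split_into_variant_units(upper, lower, units)
print(" ".join(variant_units))
# AGGGG AAGGGGG AGGT AAAAGTAAGAGGAGG AAAAGAAAGAAAGGT
```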

For this purpose, vTR outputs the above decomposition from the estimated pattern and the input string. vTR takes a FASTA file of the form

> #Pat (AGGGG)3(AAAAGAAAGAGAGGG)2
AGGGGAAGGGGGAGGTAAAAGTAAGAGGAGGAAAAGAAAGAAAGGT

where the string following #Pat is the original pattern, and vTR outputs

> #Len 46 #Err 0.188 #Pat <AGGGG>3<AAAAGAAAGAGAGGG>2 
#PrecisePat <AGGGG>1<AAGGGGG>1<AGGT>1<AAAAGTAAGAGGAGG>1<AAAAGAAAGAAAGGT>1
AGGGGAAGGGGGAGGTAAAAGTAAGAGGAGGAAAAGAAAGAAAGGT

where "#Len 46" means the input is of length 46, "#Err 0.188" is the error rate between the pattern and the input string, and "#PrecisePat" gives the series of variant units.
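
For downstream processing, a record like the one above can be read back and sanity-checked. The following Python sketch is not part of vTR; the helper names (parse_units, expand, parse_record) are hypothetical, and it assumes only the field layout visible in this example, accepting both the (UNIT)n and <UNIT>n notations.

```python
import re

# One unit with its repeat count, written as (UNIT)n or <UNIT>n.
UNIT = re.compile(r"[<(]([A-Za-z]+)[>)](\d+)")


def parse_units(pattern):
    """Turn '<AGGGG>3<AAAAG>2' into [('AGGGG', 3), ('AAAAG', 2)]."""
    return [(unit, int(count)) for unit, count in UNIT.findall(pattern)]


def expand(units):
    """Concatenate each unit repeated by its count."""
    return "".join(unit * count for unit, count in units)


def parse_record(text):
    """Parse one output record: header fields first, sequence on the last line."""
    lines = [ln.strip() for ln in text.strip().splitlines() if ln.strip()]
    sequence = lines[-1]
    header = " ".join(lines[:-1])
    fields = dict(re.findall(r"#(\w+)\s+(\S+)", header))
    return {
        "len": int(fields["Len"]),
        "err": float(fields["Err"]),
        "pat": parse_units(fields["Pat"]),
        "precise_pat": parse_units(fields["PrecisePat"]),
        "sequence": sequence,
    }


example = """
> #Len 46 #Err 0.188 #Pat <AGGGG>3<AAAAGAAAGAGAGGG>2
#PrecisePat <AGGGG>1<AAGGGGG>1<AGGT>1<AAAAGTAAGAGGAGG>1<AAAAGAAAGAAAGGT>1
AGGGGAAGGGGGAGGTAAAAGTAAGAGGAGGAAAAGAAAGAAAGGT
"""
record = parse_record(example)
# The variant units, concatenated, should reproduce the input string exactly.
print(expand(record["precise_pat"]) == record["sequence"])  # True
print(record["len"] == len(record["sequence"]))             # True
```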

The error rate is defined as X/Y, where X is the sum of mismatches, insertions, and deletions, and Y is the number of matches plus X. In the running example, X = 9 (= 4 + 5), Y = 48 (= 39 + 9), and X/Y = 9/48 = 0.188.
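
The arithmetic for the running example can be written out as a short Python sketch. The function name error_rate is hypothetical and the code only restates the definition given here; it is not taken from the vTR source.

```python
def error_rate(matches, mismatches, indels):
    """Error rate X / Y with X = mismatches + indels and Y = matches + X."""
    errors = mismatches + indels   # X
    columns = matches + errors     # Y
    if columns == 0:
        raise ValueError("alignment has no columns")
    return errors / columns


rate = error_rate(matches=39, mismatches=4, indels=5)
print(f"{rate:.3f}")  # 0.188
```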

vTR uses KSW2, a library for aligning a pair of reads that implements a global alignment dynamic programming algorithm. You can obtain a copy of KSW2 from:

https://github.com/lh3/ksw2

Suzuki, H. and Kasahara, M. (2018). Introducing difference recurrence relations for faster semi-global alignment of long sequences. BMC Bioinformatics, 19:45.

Li, H. (2018). Minimap2: pairwise alignment for nucleotide sequences. Bioinformatics, 19:45 is not this entry; Bioinformatics, 34:3094-3100.

For your convenience, a copy of KSW2 is included in this repository.
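
To give a feel for what global alignment by dynamic programming computes, here is a deliberately simple Python sketch with unit costs (each mismatch or indel costs 1). It is not KSW2 and does not use its API: KSW2 is a C library with its own scoring scheme and optimizations, and the function name global_alignment_counts is hypothetical. Because several optimal alignments can exist, the split between mismatches and indels depends on the tie-breaking order in the traceback.

```python
def global_alignment_counts(pattern, text):
    """Unit-cost global alignment; returns (matches, mismatches, indels)."""
    n, m = len(pattern), len(text)
    # dist[i][j] = minimum number of edits aligning pattern[:i] with text[:j]
    dist = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        dist[i][0] = i
    for j in range(1, m + 1):
        dist[0][j] = j
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            diagonal = dist[i - 1][j - 1] + (pattern[i - 1] != text[j - 1])
            dist[i][j] = min(diagonal, dist[i - 1][j] + 1, dist[i][j - 1] + 1)

    # Trace back one optimal path, preferring the diagonal move on ties.
    matches = mismatches = indels = 0
    i, j = n, m
    while i > 0 or j > 0:
        if (i > 0 and j > 0
                and dist[i][j] == dist[i - 1][j - 1] + (pattern[i - 1] != text[j - 1])):
            if pattern[i - 1] == text[j - 1]:
                matches += 1
            else:
                mismatches += 1
            i, j = i - 1, j - 1
        elif i > 0 and dist[i][j] == dist[i - 1][j] + 1:
            indels += 1   # pattern character with no partner in text
            i -= 1
        else:
            indels += 1   # text character with no partner in pattern
            j -= 1
    return matches, mismatches, indels


pattern = "AGGGG" * 3 + "AAAAGAAAGAGAGGG" * 2
text = "AGGGGAAGGGGGAGGTAAAAGTAAGAGGAGGAAAAGAAAGAAAGGT"
m_count, x_count, g_count = global_alignment_counts(pattern, text)
print(m_count, x_count, g_count)
```

On the running example the number of edits found this way cannot exceed 9, since the alignment shown in the Introduction already achieves 9.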
