Alignment Free Homology Detection

This is a collection of scripts for detecting sequence homology without alignment. They mainly match a given training set of regions from hg18 to danRer5. Each region was identified using its conserved flanks, even though the middle of the region cannot be aligned.

Requirements

Python 2.7, or 2.6 + argparse
Perl
Bash
Biopython
numpy, scipy, scikits.statsmodels, matplotlib
twoBitToFa (UCSC Kent)
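
Before starting the pipeline, it can help to confirm that the requirements above are installed. The following is a minimal sketch and is not part of the repository. The helper names (find_executable, missing_requirements) are hypothetical, and the import names in the usage section are assumptions about how each package is exposed (for example, Biopython is imported as Bio).

```python
import importlib
import os


def find_executable(name):
    """Return the full path of an executable found on PATH, or None."""
    for directory in os.environ.get("PATH", "").split(os.pathsep):
        candidate = os.path.join(directory, name)
        if os.path.isfile(candidate) and os.access(candidate, os.X_OK):
            return candidate
    return None


def missing_requirements(modules, executables):
    """Return (missing_modules, missing_executables) as two lists."""
    missing_modules = []
    for module in modules:
        try:
            importlib.import_module(module)
        except ImportError:
            missing_modules.append(module)
    missing_executables = [e for e in executables if find_executable(e) is None]
    return missing_modules, missing_executables


if __name__ == "__main__":
    # Import names below are assumptions, not taken from the repository.
    mods, exes = missing_requirements(
        ["Bio", "numpy", "scipy", "scikits.statsmodels", "matplotlib"],
        ["perl", "bash", "twoBitToFa"],
    )
    print("missing modules: %s" % ", ".join(mods))
    print("missing executables: %s" % ", ".join(exes))
```

The check uses only the standard library, so it should run under both the Python 2.7 listed above and Python 3.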

Pipeline

Download the 2bit genomes for hg18 and danRer5 (over 1 GB):

./get_data.sh

Generate the formatted file that most of the utilities here read:

./extract_sequences.sh

Perform the alignment for D2z and HexMCD (takes a long time):

cat hg18.toDanRer5.seqs.txt | ./af.py d2z
cat hg18.toDanRer5.seqs.txt | ./af.py hexmcd
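
For orientation, D2-style statistics compare two sequences through their k-mer counts rather than through an alignment. The sketch below computes only the raw D2 score, the inner product of the two k-mer count vectors, which D2z then standardizes. It leaves out the mean and variance normalization that D2z applies, and it is not the code in af.py or d2z.py. The function names and the default k=6 are illustrative assumptions.

```python
from collections import Counter


def kmer_counts(seq, k):
    """Count every overlapping k-mer in seq (case-insensitive)."""
    seq = seq.upper()
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def d2(seq_a, seq_b, k=6):
    """Raw D2 score: sum over k-mers w of count_a(w) * count_b(w)."""
    counts_a = kmer_counts(seq_a, k)
    counts_b = kmer_counts(seq_b, k)
    # A Counter returns 0 for k-mers it has not seen, so k-mers that are
    # absent from seq_b contribute nothing to the sum.
    return sum(n * counts_b[w] for w, n in counts_a.items())
```

Two sequences that share no k-mers score 0, and the score grows with the number of shared k-mer occurrences, which is why an unnormalized D2 favors long or repetitive sequences.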

These produce pickled files, which can be converted to .dat as below:

./pkl_to_dat.py d2z.pkl > d2z.dat

For the HexDiff and HexYMF alignments:

cat hg18.toDanRer5.seqs.txt | ./hexdiff.pl region 1.0 > hexdiff.dat
cat hg18.toDanRer5.seqs.txt | hexymf/hexymf.pl > hexymf.dat

Use scoring.py to generate ranked score reports with either the overlap or the ranked_peaks algorithm:

./scoring.py -f d2z.dat ranked_peaks > d2z_overlap
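
As a rough illustration of what ranking peaks in a score profile can mean, the sketch below finds the strict local maxima in a list of scores and orders them from highest to lowest. This is one plausible reading only. It is not taken from scoring.py or peakdetect.py, the function name rank_local_maxima is hypothetical, and the real ranked_peaks algorithm may differ.

```python
def rank_local_maxima(scores):
    """Return (index, score) for each strict local maximum, highest first."""
    peaks = []
    # Endpoints are skipped because they have only one neighbor.
    for i in range(1, len(scores) - 1):
        if scores[i - 1] < scores[i] > scores[i + 1]:
            peaks.append((i, scores[i]))
    return sorted(peaks, key=lambda peak: peak[1], reverse=True)
```

For the profile [0, 2, 1, 5, 3, 4, 0], the peaks are at positions 3, 5 and 1, in that order.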

Generate cdf plots:

./scoring_cdf.py -f d2z_overlap hexmcd_overlap
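
A CDF plot of this kind shows, for each value x, the fraction of items whose value is at most x. The sketch below computes those points in plain Python. It assumes the input is a flat list of numbers, which may not match the report format that scoring_cdf.py actually reads, and the function name empirical_cdf is hypothetical.

```python
def empirical_cdf(values):
    """Return sorted (x, fraction of values <= x) pairs, one per distinct x."""
    xs = sorted(values)
    n = len(xs)
    points = []
    for i, x in enumerate(xs, start=1):
        fraction = i / float(n)
        if points and points[-1][0] == x:
            # Repeated value: keep only the highest fraction for this x.
            points[-1] = (x, fraction)
        else:
            points.append((x, fraction))
    return points
```

Plotting the x values against the fractions for each method on the same axes gives curves that can be compared directly: a curve that rises earlier has more of its values concentrated at low x.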

You can also plot an individual CNE's impulse function:

./plot.py -f d2z.dat <cne_name>


yesimon/af-homology
