Sequence Analysis

Rajan edited this page Apr 24, 2024 · 7 revisions

Sequence analysis Python scripts

gc_percent.py

This script takes a FASTA file as input and returns its GC content

$ python gc_percent.py <FASTA_File>
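As a rough illustration of the calculation, here is a minimal sketch that computes GC content as a percentage of sequence length. The function name `gc_percent` and the choice to return 0.0 for an empty sequence are assumptions made for this example, not details taken from the script itself.

```python
def gc_percent(seq):
    """Return the GC content of a nucleotide sequence as a percentage."""
    seq = seq.upper()
    if not seq:
        return 0.0  # assumed behaviour for empty input
    gc = seq.count("G") + seq.count("C")
    return 100.0 * gc / len(seq)


if __name__ == "__main__":
    print(round(gc_percent("ATGCGC"), 2))
```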

k-mer_constructor.py

This script takes a FASTA file as input and produces all k-mers of a length specified by the user

$ python k-mer_constructor.py <FASTA_File>

kmer_constructor-1.py

This script takes a FASTA file as input and creates overlapping or non-overlapping k-mers of the length(s) specified by the user

$ python kmer_constructor-1.py <seq.fasta> <kmer_size> <kmer_type>
kmer_size: a single length (e.g. 3) or a range (e.g. 3..7)
kmer_type: -N for non-overlapping, -O for overlapping
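The difference between the two k-mer types can be sketched as a change in step size: overlapping k-mers advance one base at a time, non-overlapping k-mers advance k bases. The helper names `kmers` and `parse_sizes` are hypothetical, and treating `3..7` as an inclusive range is an assumption about how the script reads that argument.

```python
def kmers(seq, k, overlapping=True):
    """Return k-mers of length k; trailing bases that do not fill a k-mer are dropped."""
    step = 1 if overlapping else k
    return [seq[i:i + k] for i in range(0, len(seq) - k + 1, step)]


def parse_sizes(spec):
    """Parse '3' or '3..7' into a list of k values (inclusive range assumed)."""
    if ".." in spec:
        low, high = spec.split("..")
        return list(range(int(low), int(high) + 1))
    return [int(spec)]


if __name__ == "__main__":
    for k in parse_sizes("3..4"):
        print(k, kmers("ATGCGTA", k, overlapping=False))
```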

random_seq_generator.py

This script takes a sequence type and a sequence length as input and produces a random nucleotide sequence in FASTA format

$ python random_seq_generator.py

base_composition.py

This script takes a DNA FASTA or multi-FASTA file as input and creates a base_composition.tsv file

$ python base_composition.py <Fasta/Multi_fasta File>
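One way to picture the output is a table with one row per FASTA record and one column per base. The sketch below is only an assumption about the layout: the column order `A C G T`, the `id` header, and the helper names `read_fasta` and `base_composition_tsv` are all hypothetical.

```python
from collections import Counter


def read_fasta(lines):
    """Yield (header, sequence) pairs from an iterable of FASTA lines."""
    header, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        else:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def base_composition_tsv(records, bases="ACGT"):
    """Return a TSV string with per-record base counts (assumed layout)."""
    rows = ["id\t" + "\t".join(bases)]
    for header, seq in records:
        counts = Counter(seq.upper())
        rows.append(header + "\t" + "\t".join(str(counts[b]) for b in bases))
    return "\n".join(rows)


if __name__ == "__main__":
    demo = [">s1", "ACGT", "AC", ">s2", "GGGG"]
    print(base_composition_tsv(read_fasta(demo)))
```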

translate.py

This script takes a DNA sequence FASTA file and produces amino acid sequences in three reading frames for each strand (six frames in total)

$ python translate.py <FASTA_File>
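Six-frame translation can be sketched as translating three offsets of the forward strand and three offsets of its reverse complement. This example assumes the standard genetic code, marks stop codons with `*`, and writes `X` for codons containing ambiguous bases; the frame labels (`+1` to `-3`) and function names are choices made for illustration.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    return seq.upper().translate(str.maketrans("ACGT", "TGCA"))[::-1]


def translate(seq):
    """Translate complete codons only; unknown codons become 'X'."""
    seq = seq.upper()
    return "".join(CODON_TABLE.get(seq[i:i + 3], "X")
                   for i in range(0, len(seq) - 2, 3))


def six_frames(seq):
    """Return a dict mapping frame labels (+1..+3, -1..-3) to protein strings."""
    frames = {}
    for strand, s in (("+", seq.upper()), ("-", reverse_complement(seq))):
        for offset in range(3):
            frames[f"{strand}{offset + 1}"] = translate(s[offset:])
    return frames


if __name__ == "__main__":
    for label, protein in six_frames("ATGGCCTAA").items():
        print(label, protein)
```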

consensus.py

This script takes a multi-FASTA file as input and generates a consensus matrix

$ python consensus.py <multi_fasta_file>
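A consensus matrix is commonly built by counting, for each position, how often each base occurs across the sequences. The sketch below assumes all sequences have the same length and ignores characters outside `ACGT`; the function names and the tie-breaking rule (first base in alphabet order wins) are assumptions for this example.

```python
def consensus_matrix(seqs, alphabet="ACGT"):
    """Count each base at each position across equal-length sequences."""
    length = len(seqs[0])
    if any(len(s) != length for s in seqs):
        raise ValueError("all sequences must have the same length")
    matrix = {b: [0] * length for b in alphabet}
    for s in seqs:
        for i, ch in enumerate(s.upper()):
            if ch in matrix:
                matrix[ch][i] += 1
    return matrix


def consensus(matrix):
    """Pick the most frequent base per column (ties go to the first base)."""
    length = len(next(iter(matrix.values())))
    return "".join(max(matrix, key=lambda b: matrix[b][i]) for i in range(length))


if __name__ == "__main__":
    m = consensus_matrix(["ATGC", "ATGA", "TTGC"])
    for base, counts in m.items():
        print(base, counts)
    print(consensus(m))
```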

prototype_aligner1.py

This script takes a FASTA file containing two sequences of the same length as input and performs an ungapped global alignment

$ python prototype_aligner1.py <file.fasta>
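Because no gaps are allowed and the sequences have equal length, an ungapped global alignment reduces to comparing the two sequences position by position. The sketch below assumes a simple scoring scheme (+1 for a match, -1 for a mismatch); the scores and the function name are illustrative and may differ from what the script uses.

```python
def ungapped_alignment(a, b, match=1, mismatch=-1):
    """Score two equal-length sequences position by position.

    Returns (score, midline), where the midline marks matches with '|'.
    """
    if len(a) != len(b):
        raise ValueError("sequences must have the same length")
    pairs = list(zip(a.upper(), b.upper()))
    midline = "".join("|" if x == y else " " for x, y in pairs)
    score = sum(match if x == y else mismatch for x, y in pairs)
    return score, midline


if __name__ == "__main__":
    s1, s2 = "ACGT", "ACCT"
    score, midline = ungapped_alignment(s1, s2)
    print(s1)
    print(midline)
    print(s2)
    print("score:", score)
```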

prot_mol_weight_calculator.py

Program to calculate the molecular weight of one or more proteins from the amino acid sequence(s) in a FASTA or multi-FASTA file

$ python prot_mol_weight_calculator.py prot.fasta
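A common way to estimate a protein's molecular weight is to sum the average masses of the free amino acids and subtract one water molecule for each peptide bond. The sketch below follows that approach with masses rounded to two decimals; the mass table and the function name are assumptions for this example and may not match the values used by the script.

```python
# Average masses (g/mol) of the free amino acids, rounded to two decimals.
AA_MASS = {
    "A": 89.09, "R": 174.20, "N": 132.12, "D": 133.10, "C": 121.16,
    "E": 147.13, "Q": 146.15, "G": 75.07, "H": 155.16, "I": 131.17,
    "L": 131.17, "K": 146.19, "M": 149.21, "F": 165.19, "P": 115.13,
    "S": 105.09, "T": 119.12, "W": 204.23, "Y": 181.19, "V": 117.15,
}
WATER = 18.015  # mass of water released per peptide bond


def molecular_weight(seq):
    """Estimate protein molecular weight in g/mol (Da)."""
    seq = seq.upper()
    if not seq:
        return 0.0
    total = sum(AA_MASS[aa] for aa in seq)  # KeyError on unknown residues
    return total - (len(seq) - 1) * WATER


if __name__ == "__main__":
    print(round(molecular_weight("GG"), 3))
```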

PSSM.py

Create a PSSM (Position-Specific Scoring Matrix) from DNA sequences in a multi-FASTA file

$ python PSSM.py <multi.fasta>
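A PSSM is typically derived by converting per-position base frequencies into log-odds scores against a background frequency. The sketch below assumes equal-length sequences, a uniform background (0.25 per base), a pseudocount of 1, and log base 2; all of these, along with the name `build_pssm`, are assumptions that may differ from the script.

```python
import math


def build_pssm(seqs, alphabet="ACGT", pseudocount=1.0):
    """Return {base: [log2-odds score per position]} for equal-length sequences."""
    length = len(seqs[0])
    if any(len(s) != length for s in seqs):
        raise ValueError("all sequences must have the same length")
    background = 1.0 / len(alphabet)  # assumed uniform background
    total = len(seqs) + pseudocount * len(alphabet)
    pssm = {b: [] for b in alphabet}
    for i in range(length):
        column = [s[i].upper() for s in seqs]
        for b in alphabet:
            freq = (column.count(b) + pseudocount) / total
            pssm[b].append(math.log2(freq / background))
    return pssm


if __name__ == "__main__":
    matrix = build_pssm(["ACGT", "ACGA", "ACGT", "ACGT"])
    for base, scores in matrix.items():
        print(base, [round(x, 2) for x in scores])
```

Positive scores mark bases that are over-represented at a position relative to the background; negative scores mark under-represented bases.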

temp_to_cod.py

Convert an input template sequence to the corresponding coding sequence

$ python temp_to_cod.py
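The coding strand is complementary to the template strand, so the conversion depends on the direction in which the template is written. The sketch below assumes by default that the template is given 5'→3', in which case the coding strand (also written 5'→3') is its reverse complement; if the template is written 3'→5', a plain complement is enough. The function name and the `template_is_5_to_3` flag are hypothetical.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def template_to_coding(template, template_is_5_to_3=True):
    """Return the coding strand (5'->3') for a DNA template strand."""
    complement = template.upper().translate(COMPLEMENT)
    # Reverse only when the template was written 5'->3' (assumed default).
    return complement[::-1] if template_is_5_to_3 else complement


if __name__ == "__main__":
    print(template_to_coding("TTAGGCCAT"))
    print(template_to_coding("TACCGGATT", template_is_5_to_3=False))
```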

alignment2consensus.py

Generate a consensus sequence from a Multiple Sequence Alignment (MSA) file

$ python alignment2consensus.py <seq.aln>

orf_analyzer.py

Generate nucleotide, GC/AT, codon, and amino acid composition plots for a given ORF sequence in FASTA format

$ python orf_analyzer.py <seq.fasta>

hydrophobicity_plot.py

Calculate and plot the hydrophobicity of a given peptide/protein sequence using the Kyte-Doolittle scale

$ python hydrophobicity_plot.py prot.fasta
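A Kyte-Doolittle hydropathy profile is usually computed as a sliding-window average of per-residue scale values. The sketch below covers only the calculation, not the plotting; the default window size of 9 and the function name are assumptions, and the script may use a different window.

```python
# Kyte-Doolittle hydropathy scale.
KD = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def hydropathy_profile(seq, window=9):
    """Return the mean Kyte-Doolittle score for each full window along seq."""
    seq = seq.upper()
    if window < 1 or window > len(seq):
        return []
    scores = [KD[aa] for aa in seq]  # KeyError on non-standard residues
    return [sum(scores[i:i + window]) / window
            for i in range(len(seq) - window + 1)]


if __name__ == "__main__":
    profile = hydropathy_profile("MKTLLLTLVVVTIVCLDLGYT", window=9)
    print([round(x, 2) for x in profile])
```

The resulting list can be passed to any plotting library, with the window index (or window centre) on the x-axis.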

aa_comp.py

Create a stacked bar plot for each amino acid position showing sequence conservation

$ python aa_comp.py aligned_seq.fasta

random_seq_generator-1.py

Generate a random nucleotide sequence with a given length and GC frequency

$ python random_seq_generator-1.py
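One simple way to control GC frequency is to draw each base independently with weights derived from the target GC fraction. In the sketch below the GC content matches the target only in expectation, not exactly; the function names, the `seed` parameter, and the 60-character FASTA line width are assumptions for this example.

```python
import random


def random_dna(length, gc=0.5, seed=None):
    """Draw a random DNA sequence whose expected GC fraction is `gc`."""
    if not 0.0 <= gc <= 1.0:
        raise ValueError("gc must be between 0 and 1")
    rng = random.Random(seed)
    at = (1.0 - gc) / 2
    weights = [at, at, gc / 2, gc / 2]  # order matches "ATGC"
    return "".join(rng.choices("ATGC", weights=weights, k=length))


def to_fasta(name, seq, width=60):
    """Format a sequence as a FASTA record with fixed-width lines."""
    lines = [seq[i:i + width] for i in range(0, len(seq), width)]
    return ">" + name + "\n" + "\n".join(lines)


if __name__ == "__main__":
    print(to_fasta("random_seq", random_dna(120, gc=0.6)))
```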