protein-classification-and-generation

Using deep learning to distinguish naturally occurring protein sequences from randomly shuffled ones

Motivation and Results

The extent to which protein amino acid sequences found in nature differ from random sequences remains an open question. Using an LSTM-based neural network, I have shown that natural sequences can be distinguished from randomly shuffled ones with 98% accuracy. This is significantly higher than any previous benchmark I'm aware of, and it relies solely on the information content of the sequences themselves, without reference to chemical properties. See, e.g., here, here, and here for related work. A neural network model that can reliably distinguish natural proteins from random ones lays the foundation for a GAN that can propose new proteins not found in nature (coming soon).
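To make the classification task concrete, here is a minimal sketch of how shuffled negatives could be built from natural sequences. It is an illustration only, not necessarily what make_data.py does: the function names, the one-shuffle-per-sequence pairing, and the example sequences are all assumptions. Shuffling keeps the amino acid composition of each sequence and changes only the order, so a classifier has to learn from residue order rather than residue frequency.

```python
import random


def shuffle_sequence(sequence, rng):
    """Return a random permutation of the residues in `sequence`."""
    residues = list(sequence)
    rng.shuffle(residues)
    return "".join(residues)


def make_labeled_pairs(natural_sequences, seed=0):
    """Pair each natural sequence (label 1) with a shuffled copy (label 0).

    Hypothetical helper: one shuffled negative per natural sequence gives
    a balanced dataset, which is an assumption made for this sketch.
    """
    rng = random.Random(seed)
    examples = []
    for seq in natural_sequences:
        examples.append((seq, 1))
        examples.append((shuffle_sequence(seq, rng), 0))
    return examples


# Made-up sequences, used only to exercise the functions.
natural = ["MKTAYIAKQRQISFVKSHFSRQ", "MSDNGPQNQRNAPRITFGGP"]
dataset = make_labeled_pairs(natural)
for seq, label in dataset:
    print(label, seq)
```

Each shuffled sequence has the same length and composition as its natural counterpart, so the two classes cannot be separated by simple residue counts.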

Use

Run make_data.py to preprocess the raw data in the included .fasta files, which were downloaded from UniProt. The query information used to generate these files can be found in make_data.py.
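For readers unfamiliar with the FASTA format, the following is a rough sketch of the kind of parsing and filtering a preprocessing step might perform. It is not taken from make_data.py. The names `read_fasta` and `keep_sequence` are hypothetical, the default 100 to 200 residue range is only a guess based on the file name 100_to_200.fasta, and restricting to the 20 standard amino acids is likewise an assumption.

```python
STANDARD_AMINO_ACIDS = set("ACDEFGHIKLMNPQRSTVWY")


def read_fasta(lines):
    """Yield (header, sequence) pairs from an iterable of FASTA lines."""
    header, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new record starts; emit the previous one if there was one.
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def keep_sequence(sequence, min_len=100, max_len=200):
    """Keep sequences within the length range that use only standard residues.

    The default range is an assumption inferred from the file name.
    """
    return (
        min_len <= len(sequence) <= max_len
        and set(sequence) <= STANDARD_AMINO_ACIDS
    )


# Tiny made-up records; the length limits are lowered so the demo stays short.
example_lines = [
    ">example_1 hypothetical entry",
    "MKTAY",
    "IAKQR",
    ">example_2 too short",
    "MK",
    ">example_3 has ambiguous residue",
    "MKTAXIAK",
]
records = list(read_fasta(example_lines))
kept = [seq for _, seq in records if keep_sequence(seq, min_len=5, max_len=10)]
print(kept)
```

Because `read_fasta` accepts any iterable of lines, an open file handle (for example, one opened on 100_to_200.fasta) can be passed in directly.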

Run protein_classifier.py to train on the processed data. Note that training will be much faster with a GPU. Saved models can be tested by running testing.py.
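As a reminder of what an accuracy figure means for this binary task, here is a small sketch of scoring predicted probabilities against true labels. It is not the code in testing.py: the `accuracy` function, the 0.5 decision threshold, and the example numbers are assumptions chosen for illustration.

```python
def accuracy(probabilities, labels, threshold=0.5):
    """Fraction of examples whose thresholded prediction matches the label.

    `probabilities` are model outputs in [0, 1], read here as the predicted
    probability that a sequence is natural (label 1) rather than shuffled
    (label 0). The 0.5 threshold is an assumption.
    """
    if len(probabilities) != len(labels):
        raise ValueError("probabilities and labels must have the same length")
    if not labels:
        raise ValueError("cannot compute accuracy on an empty test set")
    correct = sum(
        int(p >= threshold) == y for p, y in zip(probabilities, labels)
    )
    return correct / len(labels)


# Made-up model outputs and labels, for illustration only.
predicted = [0.97, 0.08, 0.61, 0.45]
true_labels = [1, 0, 1, 1]
print(accuracy(predicted, true_labels))  # 3 of 4 correct -> 0.75
```

If the test set contains equal numbers of natural and shuffled sequences, random guessing would score about 50%, which is the baseline against which a reported accuracy should be read.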

(protein_GAN.py is a work in progress.)

Files

100_to_200.fasta
100_to_200_transcript_level.fasta
README.md
current_best_classifier.hdf5
make_data.py
protein_GAN.py
protein_classifier.py
requirements.txt
testing.py

elanstop/protein-classification-and-generation
