RushiBhatt007/Gene-compression-and-Cancer-type-classification

Gene-compression-and-Cancer-type-classification

Fall 2021 Final Project for CM226 - Machine Learning in Bioinformatics

Dataset

  • TCGA: The Cancer Genome Atlas PanCanAtlas RNA-seq data from the National Cancer Institute Genomic Data Commons.
  • The raw data consist of 11,069 samples with 20,531 measured genes. They were preprocessed as follows:
  • Tumors that were measured from multiple sites were removed.
  • The data were normalised.
  • This resulted in a final TCGA PanCanAtlas gene expression matrix of 11,060 samples, covering 33 different cancer types, and 16,148 genes.
  • The data are split into 90% training and 10% testing partitions, stratified by cancer type so that each partition contains roughly the same proportion of every cancer type.
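The stratified 90/10 split can be sketched with scikit-learn. This is a minimal illustration, not the repository's code: it assumes the unspecified normalisation is min-max scaling to [0, 1], and it fits the scaler on the training partition only (a common way to avoid test leakage; the README does not say which order the authors used). The random matrix is a small stand-in for the real 11,060 x 16,148 expression matrix, and `split_and_scale` is a hypothetical helper name.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import MinMaxScaler


def split_and_scale(X, y, test_size=0.1, seed=0):
    """Stratified train/test split, then per-gene min-max scaling.

    Assumption: "normalised" means min-max scaling, fit on training data only.
    """
    # stratify=y keeps each cancer type's proportion the same in both partitions
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=test_size, stratify=y, random_state=seed
    )
    scaler = MinMaxScaler()
    X_train = scaler.fit_transform(X_train)  # each gene mapped to [0, 1]
    X_test = scaler.transform(X_test)  # may fall slightly outside [0, 1]
    return X_train, X_test, y_train, y_test


# Toy stand-in: 200 samples x 50 genes, 4 balanced "cancer types"
rng = np.random.default_rng(0)
X = rng.lognormal(size=(200, 50))
y = np.repeat(np.arange(4), 50)

X_train, X_test, y_train, y_test = split_and_scale(X, y)
print(X_train.shape, X_test.shape)  # (180, 50) (20, 50)
print(np.bincount(y_test))  # [5 5 5 5]
```

With balanced toy labels, stratification puts exactly five samples of each type in the test partition; on the real, imbalanced TCGA labels the per-type counts would instead be proportional to each type's share.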

Proposed Model

Implementation of PCA, NMF, ICA

  • These models were built using the scikit-learn library.
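A minimal sketch of how the three compressions might be fit with scikit-learn's `PCA`, `NMF` and `FastICA`. The latent size of 10, the solver settings and the helper name `compress` are illustrative assumptions, not values taken from the project. Each model is fit on the training partition and then applied to the test partition. Note that `NMF` only accepts non-negative input, which [0, 1]-scaled expression satisfies. The `whiten="unit-variance"` argument assumes scikit-learn 1.1 or later.

```python
import numpy as np
from sklearn.decomposition import PCA, NMF, FastICA


def compress(X_train, X_test, n_components=10, seed=0):
    """Fit PCA, NMF and ICA on training data; return (train, test) codes per model."""
    models = {
        "pca": PCA(n_components=n_components, random_state=seed),
        # NMF requires non-negative input (e.g. min-max scaled expression)
        "nmf": NMF(n_components=n_components, init="nndsvda",
                   max_iter=500, random_state=seed),
        "ica": FastICA(n_components=n_components, whiten="unit-variance",
                       max_iter=500, random_state=seed),
    }
    codes = {}
    for name, model in models.items():
        Z_train = model.fit_transform(X_train)  # learn components on train only
        Z_test = model.transform(X_test)  # project held-out samples
        codes[name] = (Z_train, Z_test)
    return codes


# Toy stand-in for scaled expression data in [0, 1)
rng = np.random.default_rng(0)
X_train = rng.random((180, 50))
X_test = rng.random((20, 50))

codes = compress(X_train, X_test)
for name, (Z_train, Z_test) in codes.items():
    print(name, Z_train.shape, Z_test.shape)
```

The compressed codes (one row per sample) are what a downstream cancer-type classifier would consume in place of the full gene vector. On random toy data NMF or ICA may emit a `ConvergenceWarning`, which does not affect the output shapes.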

Implementation of VAE Model

  • This VAE model is inspired by Tybalt's implementation.
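To show what a Tybalt-style VAE computes, here is a minimal NumPy sketch of one forward pass and its loss. It is a simplification under stated assumptions, not the repository's model. The encoder is a single linear layer per output, whereas Tybalt also uses batch normalisation and ReLU. The decoder has a sigmoid output. The loss is a binary cross-entropy reconstruction term on [0, 1]-scaled expression plus a KL term weighted by `beta`, as in KL warm-up. Training (gradients, optimiser) is omitted, and all weight names are hypothetical. A real model would be built in a deep-learning framework.

```python
import numpy as np

rng = np.random.default_rng(0)


def init_params(n_genes, latent_dim, scale=0.01):
    """Hypothetical single-layer encoder/decoder weights (small random init)."""
    return {
        "W_mu": rng.normal(0, scale, (n_genes, latent_dim)), "b_mu": np.zeros(latent_dim),
        "W_lv": rng.normal(0, scale, (n_genes, latent_dim)), "b_lv": np.zeros(latent_dim),
        "W_dec": rng.normal(0, scale, (latent_dim, n_genes)), "b_dec": np.zeros(n_genes),
    }


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def vae_forward_loss(X, p, beta=1.0):
    """Return latent samples z and the mean per-sample VAE loss."""
    # Encoder: mean and log-variance of the diagonal Gaussian q(z|x)
    mu = X @ p["W_mu"] + p["b_mu"]
    log_var = X @ p["W_lv"] + p["b_lv"]
    # Reparameterization trick: z = mu + sigma * eps, eps ~ N(0, I)
    eps = rng.standard_normal(mu.shape)
    z = mu + np.exp(0.5 * log_var) * eps
    # Decoder: sigmoid output to match expression scaled to [0, 1]
    x_hat = np.clip(sigmoid(z @ p["W_dec"] + p["b_dec"]), 1e-7, 1 - 1e-7)
    # Reconstruction: binary cross-entropy summed over genes
    recon = -np.sum(X * np.log(x_hat) + (1 - X) * np.log(1 - x_hat), axis=1)
    # Closed-form KL(q(z|x) || N(0, I)) for a diagonal Gaussian
    kl = -0.5 * np.sum(1 + log_var - mu**2 - np.exp(log_var), axis=1)
    return z, np.mean(recon + beta * kl)


# Toy batch: 32 samples x 50 genes in [0, 1), 8-dimensional latent space
X = rng.random((32, 50))
params = init_params(n_genes=50, latent_dim=8)
z, loss = vae_forward_loss(X, params, beta=0.5)
print(z.shape, loss)
```

After training, the mean vector `mu` (or a sample `z`) serves as the compressed representation of each sample, playing the same role as the PCA, NMF and ICA codes.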

Authors

  • Rushi Bhatt
  • Ronak Kaoshik
  • Shruti Mohanty

See also the list of contributors who participated in this project.

License

This project is licensed under the MIT License - see the LICENSE.md file for details
