
enrichment-visualizer

This repository contains the Python script "make_dotplot.py", which produces a dot plot summarizing the gene/protein functional enrichment results generated by the Cytoscape app "stringApp" (available at: http://apps.cytoscape.org/apps/stringapp). The script can also handle data generated by other tools, as long as it is given a table with the required columns and the options are set accordingly.

Requirements

make_dotplot.py requires Python 3, pandas, matplotlib and seaborn.
The following versions of these modules were used for testing:
  • python=3.8.3
  • matplotlib=3.2.1
  • pandas=1.0.4
  • seaborn=0.10.1
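To compare a local environment with the tested versions listed above, a quick check along the lines of the following sketch can be run. It only reads installed package metadata through the standard library's importlib.metadata (Python 3.8+); the helper report_versions is hypothetical and not part of this repository.

```python
import sys
from importlib.metadata import PackageNotFoundError, version

# Versions listed in this README as the ones used for testing.
TESTED = {"matplotlib": "3.2.1", "pandas": "1.0.4", "seaborn": "0.10.1"}


def report_versions(tested=TESTED):
    """Return {package: (installed version or None, tested version)}."""
    report = {}
    for name, tested_version in tested.items():
        try:
            installed = version(name)
        except PackageNotFoundError:
            installed = None  # the package is not installed
        report[name] = (installed, tested_version)
    return report


if __name__ == "__main__":
    print(f"python: {sys.version.split()[0]} (tested: 3.8.3)")
    for name, (installed, tested_version) in report_versions().items():
        print(f"{name}: {installed or 'NOT INSTALLED'} (tested: {tested_version})")
```

Newer versions than the tested ones may work, but this README does not say so.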

Usage

The simplest way to run the script on a stringApp output table is:

./make_dotplot.py -e table_enrichment.tsv -o path_to_output_folder

where:
  • table_enrichment.tsv is the stringApp output .tsv table
  • path_to_output_folder is the path to the folder where the output will be placed (if it does not exist, it will be created).
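The output-folder behavior described above (create it if missing, then write one .svg file per plot) can be mirrored in a calling script as in this sketch. It is not the script's code: the helper prepare_output and the file-naming rule (non-alphanumeric runs replaced by "_") are hypothetical.

```python
import re
from pathlib import Path


def prepare_output(folder, categories):
    """Create `folder` if it is missing and return {category: .svg path}."""
    out = Path(folder)
    out.mkdir(parents=True, exist_ok=True)  # no error if the folder already exists
    paths = {}
    for category in categories:
        # HYPOTHETICAL naming rule: runs of other characters become "_".
        stem = re.sub(r"[^A-Za-z0-9_-]+", "_", category).strip("_")
        paths[category] = out / f"{stem}.svg"
    return paths
```

For example, prepare_output("path_to_output_folder", ["KEGG Pathways"]) creates the folder and maps "KEGG Pathways" to path_to_output_folder/KEGG_Pathways.svg.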

One plot in .svg format is generated for each annotation category (KEGG Pathways, Gene Ontology Components, ...). By default, KEGG Pathways and Reactome Pathways are grouped together; to change this behavior, set the option --groups to an empty list.
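How rows might be split into one plot per category, with the categories listed in --groups sharing a single plot, is sketched below. This is an illustration under assumptions, not the script's implementation: each row is taken to be a dict with a "category" field, and the combined label format (names joined with " + ") is hypothetical.

```python
from collections import defaultdict

# Default grouping described in this README.
DEFAULT_GROUPS = ("KEGG Pathways", "Reactome Pathways")


def split_by_category(rows, category_key="category", groups=DEFAULT_GROUPS):
    """Return {plot label: [rows]}, i.e. one entry per plot to draw.

    Categories listed in `groups` share one plot under a combined label
    (HYPOTHETICAL label format); an empty `groups` gives one plot per category.
    """
    merged_label = " + ".join(groups)
    plots = defaultdict(list)
    for row in rows:
        category = row[category_key]
        plots[merged_label if category in groups else category].append(row)
    return dict(plots)


if __name__ == "__main__":
    rows = [
        {"category": "KEGG Pathways", "term": "A"},
        {"category": "Reactome Pathways", "term": "B"},
        {"category": "Gene Ontology Components", "term": "C"},
    ]
    print(sorted(split_by_category(rows)))
    # → ['Gene Ontology Components', 'KEGG Pathways + Reactome Pathways']
    print(sorted(split_by_category(rows, groups=[])))
    # → ['Gene Ontology Components', 'KEGG Pathways', 'Reactome Pathways']
```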
The script automatically adjusts the size of the images based on the number of terms and the length of the term descriptions (e.g. pathway names). If the automatic adjustment is not sufficient, use the options --increase_height and/or --increase_width (set to an integer > 0).
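The exact sizing rule is not documented here. The sketch below is a hypothetical heuristic in the same spirit: height grows with the number of terms, width with the longest description, and the --increase_height / --increase_width values are added on top. All coefficients are illustrative defaults, not the script's values.

```python
def figure_size(descriptions, increase_height=0, increase_width=0,
                inches_per_term=0.3, inches_per_char=0.08,
                min_height=3.0, base_width=4.0):
    """Return a (width, height) figure size in inches for one dot plot.

    HYPOTHETICAL heuristic: the coefficients are illustrative, not the script's.
    """
    n_terms = len(descriptions)
    longest = max((len(d) for d in descriptions), default=0)
    # One row per term on the y axis, so height scales with the term count.
    height = max(min_height, n_terms * inches_per_term) + increase_height
    # Term descriptions are the y tick labels, so width scales with the longest one.
    width = base_width + longest * inches_per_char + increase_width
    return width, height
```

The returned tuple has the shape expected by matplotlib's figsize argument, e.g. plt.subplots(figsize=figure_size(descriptions)).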

Running on enrichment results other than a stringApp output table

At a minimum, the table containing the enrichment results should contain columns with the following data for each row (each row represents a term tested for enrichment):
  • Number of background elements (e.g. genes)
  • Number of elements assigned to the term
  • Category name (used to group the results by category; use the same category for all rows to avoid grouping)
  • Term description
  • FDR

Each of these columns must have a name in the header, and that name must be passed to the corresponding option (-c_n, -c_nb, -c_c, -c_d, -c_f).
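Before running the script on a custom table, it can help to check that every column name passed to -c_n, -c_nb, -c_c, -c_d and -c_f actually appears in the header. The sketch below does that with the standard csv module; the helper missing_columns and the column names in the example are hypothetical, and the delimiter should match your file.

```python
import csv
import io


def missing_columns(header, c_n, c_nb, c_c, c_d, c_f):
    """Return {option: column name} for every named column absent from `header`."""
    required = {"-c_n": c_n, "-c_nb": c_nb, "-c_c": c_c, "-c_d": c_d, "-c_f": c_f}
    return {opt: name for opt, name in required.items() if name not in header}


if __name__ == "__main__":
    # Hypothetical tab-separated table; the column names are illustrative only.
    table = io.StringIO(
        "category\tdescription\tterm_genes\tbackground_genes\tfdr\n"
        "KEGG Pathways\tTerm A\t12\t124\t0.001\n"
    )
    header = next(csv.reader(table, delimiter="\t"))
    print(missing_columns(header, "background_genes", "term_genes",
                          "category", "description", "fdr"))  # → {}
```

With those hypothetical column names, the call would presumably look like this (whether --GSEA Other is also required is an assumption based on the option list):

./make_dotplot.py -e my_table.tsv -o path_to_output_folder --GSEA Other -c_n background_genes -c_nb term_genes -c_c category -c_d description -c_f fdr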

List of all options

  • -h, --help
    Show the help message and exit
  • -e ENRICHMENT_RESULTS, --enrichment_results ENRICHMENT_RESULTS
    Path to table with enrichment results, in csv format
  • -o O
    Path to output folder
  • --GSEA {StringApp,Other}
    Source of the GSEA analysis
  • -c_n C_N
    Name of the column containing the number of background genes
  • -c_nb C_NB
    Name of the column containing the number of genes assigned to each term
  • -c_c C_C
    Name of the column containing a category, used to group enriched terms
  • -c_d C_D
    Name of the column containing the term description
  • -c_f C_F
    Name of the column containing the FDR value
  • -n N
    Maximum number of terms to plot
  • --ratio_min RATIO_MIN
    Consider only terms whose gene ratio is above the given threshold
  • --palette PALETTE
    Color palette to use
  • --n_bins {2,3,4}
    Number of bins for grouping gene counts
  • --groups GROUPS [GROUPS ...]
    List of categories to be grouped together
  • --increase_height INCREASE_HEIGHT
    Integer, increases the height of the plot
  • --increase_width INCREASE_WIDTH
    Integer, increases the width of the plot
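The effect of --ratio_min, -n and --n_bins can be illustrated with the sketch below. The README does not define these options precisely, so the code rests on stated assumptions: the gene ratio is taken to be the number of elements assigned to a term divided by the number of background elements, -n keeps the terms with the lowest FDR, and gene counts are split into equal-width bins. The dictionary keys n_term, n_background and fdr are hypothetical stand-ins for the -c_nb, -c_n and -c_f columns.

```python
def select_terms(rows, n=None, ratio_min=0.0,
                 c_n="n_background", c_nb="n_term", c_f="fdr"):
    """Filter and rank enrichment rows (each row is a dict).

    ASSUMPTION: gene ratio = c_nb / c_n; -n keeps the n lowest-FDR terms.
    """
    kept = [r for r in rows
            if r[c_n] > 0 and r[c_nb] / r[c_n] > ratio_min]
    kept.sort(key=lambda r: r[c_f])  # most significant (lowest FDR) first
    return kept if n is None else kept[:n]


def bin_counts(counts, n_bins=3):
    """Map each count to a bin index 0..n_bins-1 (ASSUMPTION: equal-width bins)."""
    if n_bins not in (2, 3, 4):  # the choices documented for --n_bins
        raise ValueError("n_bins must be 2, 3 or 4")
    if not counts:
        return []
    lo, hi = min(counts), max(counts)
    if hi == lo:
        return [0] * len(counts)  # all counts identical: a single bin
    width = (hi - lo) / n_bins
    # The maximum count would land at index n_bins, so clamp it into the top bin.
    return [min(int((c - lo) / width), n_bins - 1) for c in counts]


if __name__ == "__main__":
    rows = [
        {"term": "A", "n_term": 10, "n_background": 100, "fdr": 0.01},
        {"term": "B", "n_term": 1, "n_background": 50, "fdr": 0.001},
        {"term": "C", "n_term": 8, "n_background": 20, "fdr": 0.04},
    ]
    top = select_terms(rows, n=2, ratio_min=0.05)
    print([r["term"] for r in top])                   # → ['A', 'C']
    print(bin_counts([r["n_term"] for r in top], 2))  # → [1, 0]
```

Term B is dropped because its assumed gene ratio (1/50 = 0.02) is below the 0.05 threshold, even though it has the lowest FDR.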

Author

Giulia I. Corsi giulia at rth dot dk

License

Copyright © 2020 Giulia I. Corsi, code released under the GNU General Public License version 3 or later.