enrichment-visualizer

This repository contains the Python script "make_dotplot.py", which produces a dot plot describing the gene/protein functional enrichment results generated by the Cytoscape app "stringApp" (available at: http://apps.cytoscape.org/apps/stringapp). The script can also handle data generated by other tools, as long as a table with the required columns is provided and the options are set accordingly.

Requirements

make_dotplot.py requires Python 3, pandas, matplotlib and seaborn.
The following versions of these modules were used for testing:

python=3.8.3
matplotlib=3.2.1
pandas=1.0.4
seaborn=0.10.1

Usage

The simplest way of running the script given a stringApp output table is to type:

./make_dotplot.py -e table_enrichment.tsv -o path_to_output_folder

where:
table_enrichment.tsv is the stringApp output .tsv table
path_to_output_folder is the path to the folder where the output will be placed (if it does not exist, it will be created).

One plot in .svg format is generated for each annotation category (KEGG Pathways, Gene Ontology Components...). By default, KEGG Pathways and Reactome Pathways are grouped together. To change this behavior, set the option --groups to an empty list.
The script automatically adjusts the size of the images based on the number of terms and the length of the term descriptions (e.g. pathway names). If the automatic adjustment is not sufficient, the options --increase_height and/or --increase_width can be used (set an integer > 0).
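The grouping behavior described above can be pictured with a small Python sketch. This is only an illustration under assumptions, not the code used by make_dotplot.py: the function name assign_plot_groups is hypothetical, and the category strings are example labels.

```python
def assign_plot_groups(categories, groups=("KEGG Pathways", "Reactome Pathways")):
    """Return one list of categories per plot.

    Hypothetical helper: categories listed in `groups` share a single plot,
    every other category gets a plot of its own.
    """
    unique = list(dict.fromkeys(categories))  # drop duplicates, keep order
    grouped = [c for c in unique if c in groups]
    plots = [grouped] if grouped else []
    plots += [[c] for c in unique if c not in groups]
    return plots


if __name__ == "__main__":
    found = ["Gene Ontology Components", "KEGG Pathways", "Reactome Pathways"]
    print(assign_plot_groups(found))             # default grouping
    print(assign_plot_groups(found, groups=()))  # empty list: no grouping
```

With an empty groups list every category ends up in its own plot, which mirrors what setting --groups to an empty list is described to do.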

Running on enrichment results other than stringApp output table

At a minimum, the table containing the enrichment results should contain columns with the following data for each row (each row represents a term being tested for enrichment):

N. background elements (e.g. genes)
N. elements assigned to the term
Category name (to group the results based on a certain category. Use one category for all rows to avoid grouping)
Term description
FDR

Each of these columns must be named in the header, and the column names must be passed to the corresponding options (-c_n, -c_nb, -c_c, -c_d, -c_f).
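As a rough illustration of the requirements above, the following Python sketch writes a minimal table and assembles a matching command line. The column names, file names and rows are made up for the example. The table is written tab-separated as an assumption, mirroring the stringApp .tsv table; the option list describes the input as csv, so the delimiter may need to be changed. The flag-to-column mapping follows the option descriptions in the list of all options.

```python
import csv
import shlex

# Hypothetical column names: any names should work as long as the same
# names are passed to the corresponding options.
COLUMNS = {
    "-c_n": "n_background",   # number of background genes
    "-c_nb": "n_in_term",     # number of genes assigned to each term
    "-c_c": "category",       # category used to group terms
    "-c_d": "description",    # term description
    "-c_f": "fdr",            # FDR value
}

# Made-up rows, for illustration only. A single category avoids grouping.
ROWS = [
    {"n_background": 120, "n_in_term": 12, "category": "all",
     "description": "example term A", "fdr": 0.001},
    {"n_background": 80, "n_in_term": 5, "category": "all",
     "description": "example term B", "fdr": 0.03},
]


def write_table(path, rows, delimiter="\t"):
    """Write the rows with a header line containing the column names."""
    with open(path, "w", newline="") as handle:
        writer = csv.DictWriter(handle, fieldnames=list(COLUMNS.values()),
                                delimiter=delimiter)
        writer.writeheader()
        writer.writerows(rows)


def build_command(table_path, out_dir):
    """Assemble the make_dotplot.py call for a non-stringApp table."""
    args = ["./make_dotplot.py", "-e", table_path, "-o", out_dir,
            "--GSEA", "Other"]
    for flag, column in COLUMNS.items():
        args += [flag, column]
    return args


if __name__ == "__main__":
    write_table("my_enrichment.tsv", ROWS)
    print(shlex.join(build_command("my_enrichment.tsv", "plots")))
```

The printed command can be pasted into a shell from the folder that contains make_dotplot.py.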

List of all options

-h, --help
show the help message and exit
-e ENRICHMENT_RESULTS, --enrichment_results ENRICHMENT_RESULTS
Path to table with enrichment results, in csv format
-o O
Path to output folder
--GSEA {StringApp,Other}
Source of the GSEA analysis
-c_n C_N
Name of the column containing the number of background genes
-c_nb C_NB
Name of the column containing the number of genes assigned to each term
-c_c C_C
Name of the column containing a category, used to group enriched terms
-c_d C_D
Name of the column containing the term description
-c_f C_F
Name of the column containing the FDR value
-n N
Maximum number of terms to plot
--ratio_min RATIO_MIN
Consider only terms for which the gene-ratio term is above the given threshold
--palette PALETTE
Color palette used
--n_bins {2,3,4}
Number of bins for grouping gene counts
--groups GROUPS [GROUPS ...]
List of categories to be grouped together
--increase_height INCREASE_HEIGHT
Integer, increases the height of the plot
--increase_width INCREASE_WIDTH
Integer, increases the width of the plot
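
The filtering options -n and --ratio_min can be pictured with the following Python sketch. It rests on assumptions that are not confirmed by this README: the gene ratio is taken to be the number of genes assigned to a term divided by the number of background genes, and terms are assumed to be ranked by FDR before the first N are kept. The function name select_terms and the dictionary keys are hypothetical.

```python
def select_terms(rows, n=10, ratio_min=0.0):
    """Keep terms above a gene-ratio threshold, then the n lowest-FDR terms.

    Assumption: gene ratio = genes assigned to the term / background genes.
    """
    kept = []
    for row in rows:
        ratio = row["n_in_term"] / row["n_background"]
        if ratio > ratio_min:
            kept.append({**row, "gene_ratio": ratio})
    kept.sort(key=lambda r: r["fdr"])  # most significant terms first
    return kept[:n]


if __name__ == "__main__":
    example = [
        {"description": "a", "n_background": 100, "n_in_term": 10, "fdr": 0.01},
        {"description": "b", "n_background": 100, "n_in_term": 2, "fdr": 0.001},
        {"description": "c", "n_background": 100, "n_in_term": 30, "fdr": 0.02},
    ]
    for term in select_terms(example, n=2, ratio_min=0.05):
        print(term["description"], term["gene_ratio"], term["fdr"])
```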

Author

Giulia I. Corsi giulia at rth dot dk
