This repository contains the Python script "make_dotplot.py", which produces a dot plot of the gene/protein functional enrichment results
generated by the Cytoscape app "stringApp" (available at: http://apps.cytoscape.org/apps/stringapp).
The code can also handle data generated by other tools, as long as a table with the required columns is given and options are set accordingly.
The script was tested with the following versions of Python and its dependencies:
- python=3.8.3
- matplotlib=3.2.1
- pandas=1.0.4
- seaborn=0.10.1
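To compare your environment with the tested configuration, a small check like the following may help. It is a sketch and not part of the repository: it relies on the standard-library importlib.metadata (Python 3.8+), and the helper name compare_versions is hypothetical. A different version does not by itself mean the script will fail.

```python
import sys
from importlib.metadata import PackageNotFoundError, version

# Versions listed above as the tested configuration.
TESTED_VERSIONS = {"matplotlib": "3.2.1", "pandas": "1.0.4", "seaborn": "0.10.1"}


def compare_versions(tested):
    """Return {package: (installed_version_or_None, tested_version)}."""
    report = {}
    for package, tested_version in tested.items():
        try:
            installed = version(package)
        except PackageNotFoundError:
            installed = None  # the package is not installed in this environment
        report[package] = (installed, tested_version)
    return report


if __name__ == "__main__":
    print("python", sys.version.split()[0], "(tested: 3.8.3)")
    for package, (installed, tested) in compare_versions(TESTED_VERSIONS).items():
        status = "same" if installed == tested else "differs"
        print(f"{package}: installed={installed}, tested={tested} [{status}]")
```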
Usage:

./make_dotplot.py -e table_enrichment.tsv -o path_to_output_folder
where:
- table_enrichment.tsv is the .tsv table exported by stringApp;
- path_to_output_folder is the folder where the output will be written (it is created if it does not exist).
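If you prefer to launch the script from another Python program, a minimal sketch using subprocess is shown below. The helpers build_command and run_dotplot are hypothetical, and the script is assumed to sit in the current working directory; only the -e and -o options come from the usage line above.

```python
import subprocess
import sys
from pathlib import Path


def build_command(enrichment_table, output_folder, extra_args=()):
    """Build the argument list for make_dotplot.py using the -e and -o options."""
    script = Path("make_dotplot.py")  # assumption: script is in the working directory
    return [
        sys.executable, str(script),
        "-e", str(enrichment_table),
        "-o", str(output_folder),
        *extra_args,
    ]


def run_dotplot(enrichment_table, output_folder, extra_args=()):
    """Run the script; raises CalledProcessError on a non-zero exit status."""
    cmd = build_command(enrichment_table, output_folder, extra_args)
    return subprocess.run(cmd, check=True)
```

For example, run_dotplot("table_enrichment.tsv", "path_to_output_folder", ["-n", "20"]) would also pass the -n option. Calling the script through sys.executable avoids depending on the shebang line and the executable bit that ./make_dotplot.py requires.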
The script generates one plot in .svg format for each annotation category (KEGG Pathways, Gene Ontology Components, ...). By default, KEGG Pathways and Reactome Pathways are grouped together. To change this behavior, set the option --groups to an empty list.
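The grouping rule can be pictured as follows. This is an illustrative sketch in plain Python, not the script's actual code: the function name split_by_category and the " + " label given to the merged plot are assumptions.

```python
from collections import defaultdict

# Default grouping described above: KEGG and Reactome pathways share one plot.
DEFAULT_GROUPS = ("KEGG Pathways", "Reactome Pathways")


def split_by_category(rows, category_col, groups=DEFAULT_GROUPS):
    """Return {plot_label: [rows]}, one entry per plot to draw.

    Rows whose category is listed in `groups` are collected under one
    combined label; every other category gets its own plot. Passing an
    empty `groups` disables grouping.
    """
    merged_label = " + ".join(groups)  # hypothetical name for the merged plot
    plots = defaultdict(list)
    for row in rows:
        category = row[category_col]
        label = merged_label if category in groups else category
        plots[label].append(row)
    return dict(plots)
```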
The script automatically adjusts the size of the images based on the number of terms and the length of the term descriptions (e.g. pathway names). If the automatic adjustment is not sufficient, use the options --increase_height and/or --increase_width (set to an integer greater than 0).
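As a rough illustration of this kind of size heuristic, the sketch below lets the height grow with the number of terms and the width with the longest description. The function figure_size, its constants, and the assumption that --increase_height/--increase_width add whole inches are all hypothetical; the script's real formula may differ.

```python
def figure_size(descriptions, increase_height=0, increase_width=0,
                base_height=2.0, height_per_term=0.3,
                base_width=4.0, width_per_char=0.08):
    """Return (width, height) in inches for a dot plot.

    Height grows with the number of terms, width with the longest
    description; the increase_* arguments add extra inches on top.
    All constants are illustrative assumptions, not the script's values.
    """
    n_terms = len(descriptions)
    longest = max((len(d) for d in descriptions), default=0)
    height = base_height + height_per_term * n_terms + increase_height
    width = base_width + width_per_char * longest + increase_width
    return width, height
```

The result could be passed to matplotlib, e.g. plt.subplots(figsize=figure_size(descriptions)), since figsize is expressed in inches.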
At a minimum, the table containing the enrichment results must contain columns with the following data for each row (each row represents a term tested for enrichment):
- N. background elements (e.g. genes)
- N. elements assigned to the term
- Category name (used to group the results by category; use the same category for all rows to avoid grouping)
- Term description
- FDR
Each of these columns must have a name in the header, and that name must be passed to the corresponding option (-c_n, -c_nb, -c_c, -c_d, -c_f).
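When using a table produced by a tool other than stringApp, it can be worth checking the header before running the script. The sketch below assumes a tab-separated file, hypothetical column names, and that --GSEA Other is the appropriate setting for non-stringApp input; missing_columns and to_cli_args are hypothetical helpers.

```python
import csv

# Hypothetical column names of a table produced by a tool other than stringApp,
# keyed by the make_dotplot.py option that must receive each name.
COLUMN_OPTIONS = {
    "-c_n": "background_genes",   # N. background elements
    "-c_nb": "term_genes",        # N. elements assigned to the term
    "-c_c": "category",           # category used for grouping
    "-c_d": "description",        # term description
    "-c_f": "fdr",                # FDR
}


def missing_columns(lines, column_options, delimiter="\t"):
    """Return the options whose column name is not in the table header.

    `lines` is any iterable of text lines, e.g. an open file handle.
    """
    header = next(csv.reader(lines, delimiter=delimiter), [])
    return [flag for flag, name in column_options.items() if name not in header]


def to_cli_args(column_options, source="Other"):
    """Turn the mapping into extra command-line arguments for the script."""
    args = ["--GSEA", source]  # assumption: non-stringApp input uses "Other"
    for flag, name in column_options.items():
        args += [flag, name]
    return args
```

For example, open the table with open(path, newline="") and pass the handle to missing_columns; an empty result means every column named in COLUMN_OPTIONS was found, and to_cli_args(COLUMN_OPTIONS) gives the extra arguments to append to the command line.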
Options:

- -h, --help
  show the help message and exit
- -e ENRICHMENT_RESULTS, --enrichment_results ENRICHMENT_RESULTS
  Path to table with enrichment results, in csv format
- -o O
  Path to output folder
- --GSEA {StringApp,Other}
  Source of the GSEA analysis
- -c_n C_N
  Name of the column containing the number of background genes
- -c_nb C_NB
  Name of the column containing the number of genes assigned to each term
- -c_c C_C
  Name of the column containing a category, used to group enriched terms
- -c_d C_D
  Name of the column containing the term description
- -c_f C_F
  Name of the column containing the FDR value
- -n N
  Maximum number of terms to plot
- --ratio_min RATIO_MIN
  Consider only terms for which the gene-ratio term is above the given threshold
- --palette PALETTE
  Color palette used
- --n_bins {2,3,4}
  Number of bins for grouping gene counts
- --groups GROUPS [GROUPS ...]
  List of categories to be grouped together
- --increase_height INCREASE_HEIGHT
  Integer, increases the height of the plot
- --increase_width INCREASE_WIDTH
  Integer, increases the width of the plot
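The effect of --ratio_min, -n and --n_bins can be sketched as follows. Several details are assumptions rather than documented behavior: the gene ratio is taken to be the number of genes assigned to the term divided by the number of background genes, the n terms kept are those with the lowest FDR, and gene counts are split into equal-width bins. The functions select_terms and bin_counts, and their default values, are hypothetical.

```python
def select_terms(rows, background_col, term_col, fdr_col, ratio_min=0.0, n=20):
    """Filter and rank enrichment terms (illustrative sketch).

    background_col / term_col play the role of the -c_n / -c_nb columns.
    Assumption: gene ratio = genes assigned to the term / background genes.
    Terms with ratio > ratio_min are kept, sorted by FDR, truncated to n.
    """
    kept = []
    for row in rows:
        background = float(row[background_col])
        assigned = float(row[term_col])
        ratio = assigned / background if background else 0.0
        if ratio > ratio_min:
            kept.append({**row, "gene_ratio": ratio})
    kept.sort(key=lambda r: float(r[fdr_col]))  # lowest FDR first
    return kept[:n]


def bin_counts(counts, n_bins=3):
    """Map each gene count to a bin index 0..n_bins-1 using equal-width bins."""
    if n_bins not in (2, 3, 4):  # the choices accepted by --n_bins
        raise ValueError("n_bins must be 2, 3 or 4")
    if not counts:
        return []
    low, high = min(counts), max(counts)
    width = (high - low) / n_bins
    if width == 0:  # all counts equal: put everything in the first bin
        return [0] * len(counts)
    # The maximum value would land just past the last edge, so clamp it.
    return [min(int((c - low) / width), n_bins - 1) for c in counts]
```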