In the tutorial.ipynb workflow, a file is loaded from data/test_files/ptm_file.csv, which contains a set of sites and the known PTMs associated with each site (e.g. p, ub, m, etc.).
There is also a *_reg column for some of these sites; however, it's not explained what this means, and I'm unsure to what extent these extra columns are used in the downstream analysis.
For example, in perform_enrichment_analysis_per_protein, we supply a ptm_dict which, to my understanding, just tells the function which residues to use for the "random" background generation (i.e. S/T/Y residues that are not necessarily modified are analysed to see whether their structural properties differ statistically from those of the known phosphorylation sites). But is p_reg also important for the enrichment analysis here? Are these the background residues...?
Thanks in advance!
Hi,
thanks for your message and sorry for the delay in my reply.
The *_reg columns in ptm_file.csv specify sites with a known regulatory function. So if you don't want to look at all modified sites but only at a subgroup of known regulatory sites, you can use those. For any follow-up analysis you could also use, e.g., all p sites as background and the p_reg sites as target to see specific trends for regulatory sites against the background of all p sites. But this is not necessary for the general functions shown in the tutorial.
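To make the target/background idea concrete, here is a minimal sketch in plain Python. It is not part of the package: the toy CSV, its column names (protein_id, AA, position, p, p_reg) and the helper names load_sites and split_target_background are assumptions made for illustration, so check them against the actual ptm_file.csv.

```python
import csv
import io

# Toy stand-in for ptm_file.csv. The column names are assumptions for
# illustration and may differ from the real file.
TOY_CSV = """protein_id,AA,position,p,p_reg
P1,S,10,1,1
P1,T,25,1,0
P1,Y,40,0,0
P2,S,7,1,0
P2,S,88,1,1
P2,K,90,0,0
"""


def load_sites(text):
    """Parse CSV text into a list of dicts with integer position and flags."""
    rows = []
    for row in csv.DictReader(io.StringIO(text)):
        row["position"] = int(row["position"])
        row["p"] = int(row["p"])
        row["p_reg"] = int(row["p_reg"])
        rows.append(row)
    return rows


def split_target_background(rows, target_col="p_reg", background_col="p"):
    """Return (target, background) as lists of (protein_id, position).

    Background: every site flagged in background_col.
    Target: the subset of those sites that is also flagged in target_col.
    """
    background = [
        (r["protein_id"], r["position"])
        for r in rows
        if r[background_col] == 1
    ]
    target = [
        (r["protein_id"], r["position"])
        for r in rows
        if r[background_col] == 1 and r[target_col] == 1
    ]
    return target, background


sites = load_sites(TOY_CSV)
target, background = split_target_background(sites)
print(len(target), "regulatory p sites out of", len(background), "p sites")
```

The two lists could then be compared on any structural property of interest, with the regulatory sites as the target group and all p sites as the reference group.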
And yes, the ptm_dict only specifies the possible residues for each modification. The p_reg sites could be used instead of the p sites in this analysis, but they don't have any other function.
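As a rough illustration of what "possible residues" means, the sketch below partitions a toy site table into modified and unmodified candidates for a given modification. This is not the package's implementation: the dictionary contents, the site records and the helper partition_candidates are all hypothetical, and the assumed shape of ptm_dict (column name mapped to a list of residues) should be checked against the tutorial.

```python
# Hypothetical mapping: keys are modification column names, values are the
# residues on which that modification can occur.
ptm_dict = {"p": ["S", "T", "Y"], "p_reg": ["S", "T", "Y"], "ub": ["K"]}

# Toy site table; field names and values are made up for illustration.
sites = [
    {"protein_id": "P1", "AA": "A", "position": 5, "p": 0, "p_reg": 0, "ub": 0},
    {"protein_id": "P1", "AA": "S", "position": 10, "p": 1, "p_reg": 1, "ub": 0},
    {"protein_id": "P1", "AA": "T", "position": 25, "p": 1, "p_reg": 0, "ub": 0},
    {"protein_id": "P1", "AA": "Y", "position": 40, "p": 0, "p_reg": 0, "ub": 0},
    {"protein_id": "P1", "AA": "K", "position": 90, "p": 0, "p_reg": 0, "ub": 1},
]


def partition_candidates(sites, ptm_dict, modification):
    """Split the residues eligible for `modification` into two groups.

    Only residues listed in ptm_dict[modification] are considered at all;
    among those, sites flagged 1 are 'modified' and the rest 'unmodified'.
    """
    allowed = set(ptm_dict[modification])
    candidates = [s for s in sites if s["AA"] in allowed]
    modified = [s for s in candidates if s[modification] == 1]
    unmodified = [s for s in candidates if s[modification] == 0]
    return modified, unmodified


for mod in ("p", "p_reg", "ub"):
    modified, unmodified = partition_candidates(sites, ptm_dict, mod)
    print(mod, len(modified), "modified,", len(unmodified), "unmodified candidates")
```

Switching the modification from "p" to "p_reg" changes only which S/T/Y sites count as modified; the pool of eligible residues stays the same.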
I hope this answers your questions :)