Clearer documentation needed for tutorial examples #16

Open
kamurani opened this issue Apr 24, 2023 · 1 comment
Comments

@kamurani
Contributor

In the tutorial.ipynb workflow, a file is loaded from data/test_files/ptm_file.csv, which contains a set of sites and the known PTMs associated with each site (e.g. p, ub, m, etc.).

There is also a *_reg column for some of these sites; however, it's not explained what this means, and I'm unsure to what extent these extra columns are used in the downstream analysis.

For example, in perform_enrichment_analysis_per_protein, we supply a ptm_dict which, to my understanding, just tells the function which residues to use for the "random" background generation (i.e. S, T and Y residues that are not necessarily modified are analysed to see whether their structural properties differ statistically from those of the known phosphorylation sites). But is p_reg also important for the enrichment analysis here? Are these the background residues?

Thanks in advance!

@ibludau
Collaborator

ibludau commented Jan 8, 2024

Hi,
thanks for your message and sorry for the delay in my reply.
The *_reg columns in ptm_file.csv specify sites with a known regulatory function. So if you don't want to look at all modified sites but only at the subgroup of known regulatory sites, you can use those. For a follow-up analysis you could also use, e.g., all p sites as background and the p_reg sites as target, to see trends specific to regulatory sites against the background of all p sites. But this is not necessary for the general functions shown in the tutorial.
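To make the target-versus-background idea concrete, here is a minimal sketch using only the Python standard library. The column names (protein_id, AA, position, p, p_reg), the 0/1 flag encoding and the toy rows are assumptions for illustration, not a confirmed description of ptm_file.csv, and the sketch does not call any function of the package itself.

```python
import csv
import io

# Toy stand-in for ptm_file.csv. Column names and the 0/1 encoding are
# assumptions for illustration; check them against the real file.
toy_csv = """protein_id,AA,position,p,p_reg
P1,S,10,1,1
P1,T,25,1,0
P1,Y,40,0,0
P2,S,7,1,0
P2,K,12,0,0
"""

rows = list(csv.DictReader(io.StringIO(toy_csv)))

# Background: every site annotated as phosphorylated (p == 1).
background = [r for r in rows if r["p"] == "1"]

# Target: the subset of those sites with a known regulatory function.
target = [r for r in background if r["p_reg"] == "1"]

# Share of phosphosites in the toy data that are flagged as regulatory.
regulatory_fraction = len(target) / len(background)

print(len(background), len(target))
```

Any structural property attached to the rows could then be compared between `target` and `background` with a statistical test of your choice.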
And yes, the ptm_dict only specifies the possible residues for a modification. The p_reg sites could be used instead of the p sites in this analysis, but they don't have any other function.
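As a rough illustration of what "possible residues for a modification" means, the sketch below maps PTM labels to amino acids and lists the positions in a sequence that could carry each modification. The dictionary contents, the helper name candidate_positions and the toy sequence are hypothetical; this is not the package's implementation.

```python
# Hypothetical mapping from PTM label to the residues that can carry it.
# p_reg shares the residues of p because it marks a subset of the same sites.
ptm_dict = {
    "p": ["S", "T", "Y"],
    "p_reg": ["S", "T", "Y"],
    "ub": ["K"],
}


def candidate_positions(sequence, ptm, ptm_dict):
    """Return 1-based positions whose residue could carry the given PTM."""
    allowed = set(ptm_dict[ptm])
    return [i for i, aa in enumerate(sequence, start=1) if aa in allowed]


sequence = "MSTKAYSK"  # toy sequence, not a real protein
p_candidates = candidate_positions(sequence, "p", ptm_dict)
ub_candidates = candidate_positions(sequence, "ub", ptm_dict)

print(p_candidates, ub_candidates)
```

Under this reading, swapping p for p_reg changes which annotated sites count as modified, not which residues are eligible.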
I hope this answers your questions :)
