-
Notifications
You must be signed in to change notification settings - Fork 93
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
TSS enrichment with a region extension 2000bp #1638
Comments
Hi @Montsecb, this is indeed not expected, thanks for reporting the issue. There may be small changes in TSS enrichment score depending on the size of the flanking region used but certainly not to this extent. I will try to look into this, for now I'd suggest sticking with the |
I looked at the code of Lines 5 to 12 in 8b98f63
If we look at the link provided, we have the following definition for TSS enrichment:
But if we look at the actual code of the function:
As a comparison, ArchR is more in line with ENCODE definition and uses by default a region extension of 2000 (total length 4000), 100 bp at each flank for background, and signal window of 101 bp centered at the TSS. Furthermore they provide a way to feed those 3 values as parameter to the function and to arbitrary exclude extra chromosomes (not just mitochondrial) which is convenient to compare with other tools. I would suggest to either fix TSSEnrichment to comply with ENCODE definition, or provides a better description of what is computed. Note that the former would drastically change TSSEnrichment computed in users’ previous analyses (the QC thresholds would have to be adapted) but since it currently does not follow any standard in ATACseq I think that would be the right course. It would also be nice if we could have the following extra parameters:
I could have a look if you are open to pull requests. Cheers, |
@nigiord when this function was implemented in Signac ENCODE did not actually say what size region was being used for the central and flanking part. It looks like they've since added more documentation for this, and we are now doing something that isn't in line with the ENCODE terminology. I agree we should change this to be consistent with ENCODE and other ATAC analysis packages. The current values are not wrong, but it is obviously preferable if the definition of TSS enrichment score can be consistent across different tools. This would change the TSS scores being returned, but as long as this change is clearly documented in the release notes I'm ok with that.
Agreed these would be nice to have, I'm certainly open to a PR if you're interested in contributing. This would also enable users to get the current TSS scores if we were to update the defaults to be in line with ENCODE |
Hi @timoast, thank you for your answer! It seems simple enough since the code is rather neat and well commented so I’ll try to suggest a PR as soon as possible. Regarding @Montsecb issue with very high TSS values when extending the region to -2000:+2000, the code that implements region_extension looks fine to me, so I think we lack information to properly understand what’s actually happening. Could you generate the TSSPlot(seuratobj) +
ggplot2::geom_vline(
data = data.frame(
x = c(-500, 500, -901, 901, -1901, 1901),
label = c(
"center window", "center window",
"flank default", "flank default",
"flank 2000bp", "flank 2000bp"
)
),
mapping = aes(xintercept = x, color = label),
linetype = "dashed",
) +
theme(legend.position = "right") I sometimes have very high TSS value but it’s always with barcodes that fail other QC metrics though (usually number of fragments). |
Hi there,
I was trying to calculate the TSS enrichment score using signac version 1.11 by adding an extension up to 2000 bp from TSSs. I see that the results produced have a high enrichment value (up to 150 in some cells). I would like to know if these values are normal or if they are too high, as I usually see enrichment values ranging up to 60 or 80.
I'm leaving the code and the results here using 1000 and 2000 bp.
seu <- TSSEnrichment(object = seu, fast=F, region_extension = 2000)
Here are the results using a region extension of 1000 bp(left) and 2000 bp (right)
The text was updated successfully, but these errors were encountered: