-
Notifications
You must be signed in to change notification settings - Fork 36
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
LogFC Clarification #6
Comments
To replicate Seurat::FindMarkers results Modifications based on : immunogenomics#6
Hi! I also noticed differences between presto::wilcoxauc and Seurat::FindMarkers in the reported logFC and avg_log2FC respectively. I was wondering if you could clarify this a bit more, and whether one or the other calculation would be better? |
Hey @ilyakorsunsky I think there is a Line 192 in 31dc97f
See the example below for details. ExampleHere is a complete example you can try running on your own: Generate random data: library(magrittr)
set.seed(42)
x1 <- rnorm(10) + 3
x2 <- rnorm(10) + 3 This is the fold-change: mean(x1) / mean(x2)
#> [1] 1.25057 And the log fold-change: log(mean(x1) / mean(x2))
#> [1] 0.2235997 OK so now let’s take a closer look at the X <- matrix(c(x1, x2), nrow = 1)
y <- factor(rep(c(1, 2), each = 10))
presto::wilcoxauc(X, y)
#> feature group avgExpr logFC statistic auc pval padj pct_in
#> 1 Feature1 1 3.547297 0.7107535 66 0.66 0.2413216 0.2413216 100
#> 2 Feature1 2 2.836543 -0.7107535 34 0.34 0.2413216 0.2413216 100
#> pct_out
#> 1 100
#> 2 100 OK, we wanted Let's go through each line of the ### Auxiliary Statistics (AvgExpr, PctIn, LFC, etc)
group_sums <- presto::sumGroups(X, y, 1)
group_sums
#> [,1]
#> [1,] 35.47297
#> [2,] 28.36543
group_nnz <- presto::nnzeroGroups(X, y, 1)
group_nnz
#> [,1]
#> [1,] 10
#> [2,] 10
group_pct <- sweep(group_nnz, 1, as.numeric(table(y)), "/") %>% t()
group_pct
#> [,1] [,2]
#> [1,] 1 1
group_pct_out <- -group_nnz %>%
sweep(2, colSums(group_nnz) , "+") %>%
sweep(1, as.numeric(length(y) - table(y)), "/") %>% t()
group_pct_out
#> [,1] [,2]
#> [1,] 1 1
group_means <- sweep(group_sums, 1, as.numeric(table(y)), "/") %>% t()
group_means
#> [,1] [,2]
#> [1,] 3.547297 2.836543
cs <- colSums(group_sums)
cs
#> [1] 63.8384
gs <- as.numeric(table(y))
gs
#> [1] 10 10 So far so good. But this is giving us lfc <- Reduce(cbind, lapply(seq_len(length(levels(y))), function(g) {
group_means[, g] - ((cs - group_sums[g, ]) / (length(y) - gs[g]))
}))
lfc
#> init
#> [1,] 0.7107535 -0.7107535 After we change lfc <- Reduce(cbind, lapply(seq_len(length(levels(y))), function(g) {
group_means[, g] / ((cs - group_sums[g, ]) / (length(y) - gs[g]))
})) %>% log()
lfc
#> init
#> [1,] 0.2235997 -0.2235997 |
I also am getting different results from FindAllMarkers and presto's function (presto:::wilcoxauc.Seurat). I am running presto as shown below:
|
I tried this modification, in addition to that, I added 0.1 to the count matrix to avoid Inf. Then the returned logFC seems to be consistent with the default logFC. So I guess the purpose of original logFC was to avoid Inf? |
Hi,
Thank you for your work- it is really remarkable.
Not sure if this is a bug or a feature, but when doing a comparison between presto::wilcoxauc and Seurat::FindMarkers I noticed a difference in the LogFC calculation. When calculating the LogFC from a Seurat Data Object, the values are untransformed prior to calculating the mean, then re-log-transformed on the mean values- plus a pseudo count of default 1 to avoid Inf LogFCs (Line 551 @ https://github.com/satijalab/seurat/blob/master/R/differential_expression.R). This difference causes changes in the avgExpr and logFC when compared to the Seurat output. I was wondering if you could clarify whether you believe your approach is correct, or if this a potential issue with an edge case when the data are previously log transformed?
Note: If helpful I am able to replicate the values in Seurat with a couple slight modifications (see lines 1, 7 and 11 below).
Thanks!
Chris
The text was updated successfully, but these errors were encountered: