Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

LogFC Clarification #6

Open
christopher-hardy opened this issue May 12, 2020 · 4 comments
Open

LogFC Clarification #6

christopher-hardy opened this issue May 12, 2020 · 4 comments

Comments

@christopher-hardy
Copy link

Hi,

Thank you for your work- it is really remarkable.

Not sure if this is a bug or a feature, but when doing a comparison between presto::wilcoxauc and Seurat::FindMarkers I noticed a difference in the LogFC calculation. When calculating the LogFC from a Seurat Data Object, the values are untransformed prior to calculating the mean, then re-log-transformed on the mean values- plus a pseudo count of default 1 to avoid Inf LogFCs (Line 551 @ https://github.com/satijalab/seurat/blob/master/R/differential_expression.R). This difference causes changes in the avgExpr and logFC when compared to the Seurat output. I was wondering if you could clarify whether you believe your approach is correct, or if this a potential issue with an edge case when the data are previously log transformed?

Note: If helpful I am able to replicate the values in Seurat with a couple slight modifications (see lines 1, 7 and 11 below).

  X <- expm1(X)
  group_sums <- sumGroups(X, y, 1)
  group_nnz <- nnzeroGroups(X, y, 1)
  group_pct <- sweep(group_nnz, 1, as.numeric(table(y)), "/") %>% t()
  group_pct_out <- -group_nnz %>% sweep(2, colSums(group_nnz), 
                                        "+") %>% sweep(1, as.numeric(length(y) - table(y)), "/") %>% t()
  group_means <- log(sweep(group_sums, 1, as.numeric(table(y)), "/") + 1) %>% t()
  cs <- colSums(group_sums)
  gs <- as.numeric(table(y))
  lfc <- Reduce(cbind, lapply(seq_len(length(levels(y))), function(g) {
    group_means[, g] - (log((cs - group_sums[g, ]) / (length(y) - gs[g]) + 1))
  }))

Thanks!

Chris

Simon-Leonard added a commit to Simon-Leonard/presto that referenced this issue Jul 20, 2022
To replicate Seurat::FindMarkers results
Modifications based on : immunogenomics#6
@ireen-kal
Copy link

Hi! I also noticed differences between presto::wilcoxauc and Seurat::FindMarkers in the reported logFC and avg_log2FC respectively. I was wondering if you could clarify this a bit more, and whether one or the other calculation would be better?

@slowkow
Copy link
Member

slowkow commented Jan 5, 2024

Hey @ilyakorsunsky I think there is a - where there should be a / on this line:

group_means[, g] - ((cs - group_sums[g, ]) / (length(y) - gs[g]))

See the example below for details.

Example

Here is a complete example you can try running on your own:

Generate random data:

library(magrittr)
set.seed(42)
x1 <- rnorm(10) + 3
x2 <- rnorm(10) + 3

This is the fold-change:

mean(x1) / mean(x2)
#> [1] 1.25057

And the log fold-change:

log(mean(x1) / mean(x2))
#> [1] 0.2235997

OK so now let’s take a closer look at the presto::wilcoxauc() function to see if we are getting the correct value for log fold-change:

X <- matrix(c(x1, x2), nrow = 1)
y <- factor(rep(c(1, 2), each = 10))
presto::wilcoxauc(X, y)
#>    feature group  avgExpr      logFC statistic  auc      pval      padj pct_in
#> 1 Feature1     1 3.547297  0.7107535        66 0.66 0.2413216 0.2413216    100
#> 2 Feature1     2 2.836543 -0.7107535        34 0.34 0.2413216 0.2413216    100
#>   pct_out
#> 1     100
#> 2     100

OK, we wanted 0.22 but we got 0.71, so maybe something is not right.

Let's go through each line of the presto::wilcoxauc() function, one line at a time, and see where we might be going wrong:

### Auxiliary Statistics (AvgExpr, PctIn, LFC, etc)
group_sums <- presto::sumGroups(X, y, 1)
group_sums
#>          [,1]
#> [1,] 35.47297
#> [2,] 28.36543
group_nnz <- presto::nnzeroGroups(X, y, 1)
group_nnz
#>      [,1]
#> [1,]   10
#> [2,]   10
group_pct <- sweep(group_nnz, 1, as.numeric(table(y)), "/") %>% t()
group_pct
#>      [,1] [,2]
#> [1,]    1    1
group_pct_out <- -group_nnz %>%
  sweep(2, colSums(group_nnz) , "+") %>% 
  sweep(1, as.numeric(length(y) - table(y)), "/") %>% t()
group_pct_out
#>      [,1] [,2]
#> [1,]    1    1
group_means <- sweep(group_sums, 1, as.numeric(table(y)), "/") %>% t()
group_means
#>          [,1]     [,2]
#> [1,] 3.547297 2.836543
cs <- colSums(group_sums)
cs
#> [1] 63.8384
gs <- as.numeric(table(y))
gs
#> [1] 10 10

So far so good.

But this is giving us 0.71 instead of 0.22:

lfc <- Reduce(cbind, lapply(seq_len(length(levels(y))), function(g) {
  group_means[, g] - ((cs - group_sums[g, ]) / (length(y) - gs[g]))
}))
lfc
#>           init           
#> [1,] 0.7107535 -0.7107535

After we change - to / and add a %>% log() at the end, we get the correct numbers:

lfc <- Reduce(cbind, lapply(seq_len(length(levels(y))), function(g) {
  group_means[, g] / ((cs - group_sums[g, ]) / (length(y) - gs[g]))
})) %>% log()
lfc
#>           init           
#> [1,] 0.2235997 -0.2235997

@TeodoraTockovska
Copy link

I also am getting different results from FindAllMarkers and presto's function (presto:::wilcoxauc.Seurat). I am running presto as shown below:

markers_rna <- presto:::wilcoxauc.Seurat(X = seur_obj, 
                                         group_by = 'wsnn_res.0.8', assay = 'data', 
                                         seurat_assay = 'SCT')

@WT215
Copy link

WT215 commented Apr 1, 2024

Hey @ilyakorsunsky I think there is a - where there should be a / on this line:

group_means[, g] - ((cs - group_sums[g, ]) / (length(y) - gs[g]))

See the example below for details.

Example

Here is a complete example you can try running on your own:

Generate random data:

library(magrittr)
set.seed(42)
x1 <- rnorm(10) + 3
x2 <- rnorm(10) + 3

This is the fold-change:

mean(x1) / mean(x2)
#> [1] 1.25057

And the log fold-change:

log(mean(x1) / mean(x2))
#> [1] 0.2235997

OK so now let’s take a closer look at the presto::wilcoxauc() function to see if we are getting the correct value for log fold-change:

X <- matrix(c(x1, x2), nrow = 1)
y <- factor(rep(c(1, 2), each = 10))
presto::wilcoxauc(X, y)
#>    feature group  avgExpr      logFC statistic  auc      pval      padj pct_in
#> 1 Feature1     1 3.547297  0.7107535        66 0.66 0.2413216 0.2413216    100
#> 2 Feature1     2 2.836543 -0.7107535        34 0.34 0.2413216 0.2413216    100
#>   pct_out
#> 1     100
#> 2     100

OK, we wanted 0.22 but we got 0.71, so maybe something is not right.

Let's go through each line of the presto::wilcoxauc() function, one line at a time, and see where we might be going wrong:

### Auxiliary Statistics (AvgExpr, PctIn, LFC, etc)
group_sums <- presto::sumGroups(X, y, 1)
group_sums
#>          [,1]
#> [1,] 35.47297
#> [2,] 28.36543
group_nnz <- presto::nnzeroGroups(X, y, 1)
group_nnz
#>      [,1]
#> [1,]   10
#> [2,]   10
group_pct <- sweep(group_nnz, 1, as.numeric(table(y)), "/") %>% t()
group_pct
#>      [,1] [,2]
#> [1,]    1    1
group_pct_out <- -group_nnz %>%
  sweep(2, colSums(group_nnz) , "+") %>% 
  sweep(1, as.numeric(length(y) - table(y)), "/") %>% t()
group_pct_out
#>      [,1] [,2]
#> [1,]    1    1
group_means <- sweep(group_sums, 1, as.numeric(table(y)), "/") %>% t()
group_means
#>          [,1]     [,2]
#> [1,] 3.547297 2.836543
cs <- colSums(group_sums)
cs
#> [1] 63.8384
gs <- as.numeric(table(y))
gs
#> [1] 10 10

So far so good.

But this is giving us 0.71 instead of 0.22:

lfc <- Reduce(cbind, lapply(seq_len(length(levels(y))), function(g) {
  group_means[, g] - ((cs - group_sums[g, ]) / (length(y) - gs[g]))
}))
lfc
#>           init           
#> [1,] 0.7107535 -0.7107535

After we change - to / and add a %>% log() at the end, we get the correct numbers:

lfc <- Reduce(cbind, lapply(seq_len(length(levels(y))), function(g) {
  group_means[, g] / ((cs - group_sums[g, ]) / (length(y) - gs[g]))
})) %>% log()
lfc
#>           init           
#> [1,] 0.2235997 -0.2235997

I tried this modification, in addition to that, I added 0.1 to the count matrix to avoid Inf. Then the returned logFC seems to be consistent with the default logFC. So I guess the purpose of original logFC was to avoid Inf?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

5 participants