You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
Thanks for the incredibly fast implementation. I have been trying to clone the repo and learn from the code. When I was trying to extend it from a default two-sided hypothesis to a one-sided hypothesis, I found something potentially mistaken.
For the hypothesis extension, it's rather simple by switching a correction value and pnorm() call in compute_pval(), referring to R's implementation. However, when I'm testing it with a sparse matrix where one of the features/genes is all zero, I found the result pretty odd -- It got a <1 p-value! So I digged into the code and suspected the ties returned from Rcpp side lacks the counting on zeros, which I believe is intentional for sparse data. I can see that the ties on zeros are easily addressed when calculating the ustats, but I never found it even mentioned when you calculate p-values. With suspecting that this is the cause, I modified my the Rcpp code in my copy:
src/fast_wilcox.cpp function cpp_rank_matrix_dgc
std::vector<std::list<float> > cpp_rank_matrix_dgc(
arma::vec& x, const arma::vec& p, int nrow, int ncol) {
vector<list<float> > ties(ncol);
int n_zero;
for (int i = 0; i < ncol; i++) {
n_zero = nrow - (p[i+1] - p[i]);
if (p[i+1] == p[i]) {
ties[i].push_back(n_zero);
continue;
}
ties[i] = cpp_in_place_rank_mean(x, p[i], p[i + 1] - 1);
ties[i].push_back(n_zero);
x.rows(p[i], p[i + 1] - 1) += n_zero;
}
return ties;
}
and also in function cpp_in_place_rank_mean, it seems the lastly counted ties are also not pushed back as well.
// code originally between line 125-143
for (i = 1U; i < v_sort.size(); i++) {
if (v_sort[i].first != v_sort[i - 1].first) {
// .....
} else {
// .....
}
}
// Things look good until now but I guess we need to add the following line
// right after the loop ends
if (n > 1) ties.push_back(n);
OK, all of these are based on that I want the function to be able to replicate what stats::wilcox.test() can produce. If it was on purpose to ignore the ties on zeros and the last happening ties when calculating p-values, I would really love to learn about the reason. But overall I appreciate it so much that you have made this package!
-Yichen
The text was updated successfully, but these errors were encountered:
Hi team,
Thanks for the incredibly fast implementation. I have been trying to clone the repo and learn from the code. When I was trying to extend it from a default two-sided hypothesis to a one-sided hypothesis, I found something potentially mistaken.
For the hypothesis extension, it's rather simple by switching a correction value and
pnorm()
call incompute_pval()
, referring to R's implementation. However, when I'm testing it with a sparse matrix where one of the features/genes is all zero, I found the result pretty odd -- It got a <1 p-value! So I digged into the code and suspected theties
returned from Rcpp side lacks the counting on zeros, which I believe is intentional for sparse data. I can see that the ties on zeros are easily addressed when calculating theustats
, but I never found it even mentioned when you calculate p-values. With suspecting that this is the cause, I modified my the Rcpp code in my copy:src/fast_wilcox.cpp function
cpp_rank_matrix_dgc
and also in function
cpp_in_place_rank_mean
, it seems the lastly counted ties are also not pushed back as well.It's very easy to test with something like
And if I apply the modification I described above, we can then have
OK, all of these are based on that I want the function to be able to replicate what
stats::wilcox.test()
can produce. If it was on purpose to ignore the ties on zeros and the last happening ties when calculating p-values, I would really love to learn about the reason. But overall I appreciate it so much that you have made this package!-Yichen
The text was updated successfully, but these errors were encountered: