-
Notifications
You must be signed in to change notification settings - Fork 81
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add chromPeakSummary function and a fix #772
Conversation
- Report also a chromatogram's m/z values if `findChromPeaks` is run on a `Chromatogram` or `Chromatograms` object (issue #765).
- Peak shape quality (similarity to gaussian shape) calculation is now performed using the EIC representation of a chromatographic peak, i.e. with intensities of mass peaks for the same retention time (but different m/z) summed.
Add internal function to calculate beta metrics
added chromPeakSummary method
Added peak quality section to xcms vignette.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is great ! Also thanks for the small vignette ! The code looks good to me, thank @pablovgd, excited to incorporate it in the overall workflow and see how to integrate it with the other preprocessing steps to improve them.
PS: I think the GHA needs to be updated, i they are failing because of warnings.
R/XcmsExperiment-functions.R
Outdated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks great, thanks for implementing this! I haven't checked it for bugs but the logic looks sound.
Thanks @wkumler for your review! I have now addressed all your suggestions. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi, thanks for the PR !
What I couldn't attach to a specific line, would it make sense to add the paper to https://github.com/sneumann/xcms/blob/devel/inst/CITATION ?
Yours, Steffen
#' @param skews A numeric vector of the skews to try, corresponding to the | ||
#' shape1 of dbeta with a shape2 of 5. Values less than 5 will be increasingly | ||
#' right-skewed, while values greater than 5 will be left-skewed. | ||
#' shape1 of dbeta with a shape2 of 5. Values less than 5 will be |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi @wkumler , I needed to read that a few times. So a value of 5 means symmetric ? Why is it not zero centered, so that abs()
tells you the skewedness (regardless in what direction) and you can <0
or >0
if you only want the direction ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's a good point, thank you for the review. I'm using the language inherent to the dbeta()
function where the values of 2-5 correspond to the "positive parameters" alpha and beta (or, in R, shape1 and shape2). I'm absolutely open to rescaling these and I agree that a positive/negative skew would be more intuitive. If we're open to rescaling and we have a parameter to pass now in chromPeakSummary
then we also may want to allow a wider variety of numbers - the values of 5 matched my intuition for peak shape corners but other folks may want more/less tail and more/less skew.
@@ -279,6 +279,7 @@ dropGenericProcessHistory <- function(x, fun) { | |||
valsPerSpect = valsPerSpect, rtrange = rtr, | |||
mzrange = mzr) | |||
if (length(mtx)) { | |||
## mtx: time, mz, intensity | |||
if (any(!is.na(mtx[, 3]))) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I usually recommend avoiding hardcoded column numbers, imagine someone came up with (time, ccs, mz, intensity)
. Might need that mtx is created with named columns somewhere above. Is that the only occurance of a hardcoded column number in the PR ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree - generally. Here, we load the mtx
matrix
with a function that is supposed to return a 3 column matrix with the columns exactly in the expected order. Thus, IMHO for this case (as long as there is no user input involved) it's save to use access-by-index. Also, because here we're calling this in a loop thousands of times, the way how we subset could affect the performance.
I did some timings on that:
mtx <- cbind(time = 1:5, mz = 1:5, intensity = c(2342.2, 123.1, 231.1, 23.1, 123.23))
int_col <- 3L
library(microbenchmark)
> microbenchmark(mtx[, 3L], mtx[, "intensity"], mtx[, int_col])
Unit: nanoseconds
expr min lq mean median uq max neval cld
mtx[, 3L] 354 363.5 437.30 380.5 477.5 948 100 a
mtx[, "intensity"] 389 404.0 543.07 412.0 474.0 5036 100 a
mtx[, int_col] 381 399.5 469.45 407.0 484.0 1135 100 a
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ok, convinced. The paranoid among us could check `colnames()[3]=="intensity" once before the loop.
((rtr[2] - rtr[1]) / | ||
max(1, (sum(rtim >= rtr[1] & rtim <= rtr[2]) - 1))) | ||
maxi <- which.max(mtx[, 3]) | ||
maxi <- which.max(mtx[, 3L]) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No, above was not the only hardcoded 3
:-) and more below ...
Regarding the CITATIONS - I added the reference to the respective man page. I think adding that as a CITATION for xcms itself might be bit too much, as the paper deals more with peak shape quality, independently of xcms? |
For now I only bumped the version and fixed the NEWS.md, but I did not address the other points above - can we maybe discuss them again @sneumann ? |
I'll make the PR a draft and fix the conflicts. After that I'll re-open. |
- Add `c()` method to combine `XcmsExperiment` objects. - Add a method to coerce from `XCMSnExp` to `XcmsExperiment` objects. - Fix references in documentation.
I've now fixed the conflicts and in addition also added the |
@sneumann , just a little reminder that this PR is still open and we need to address it :) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi, thanks everyone for the PR and review work. Yours, Steffen
#' Convert a XCMSnExp to a XcmsExperiment. | ||
#' | ||
#' @noRd | ||
.xcms_n_exp_to_xcms_experiment <- function(from) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Unusual (to me) way of converting CamelCase class name to (part of) function name. As it is internal, I am just curious.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yes, indeed, true. was not paying attention actually. I tend to write internal function names now always in snake_case - for no particular reason except that I find it a bit more readable
#' @param x `list` of `XcmsExperiment` objects. | ||
#' | ||
#' @noRd | ||
.xmse_combine <- function(x) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Again internal, but would be xcmse
or something else be more intiutive ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
again, yes. I was using .mse_
as an abbreviation for MsExperiment
in internal function names - to quickly see from the function name that this is supposed to work on MsExperiment
objects - then for XcmsExperiment
I used xmse
- and true, I sometimes even misspell it now. Would just need changes in a lot of places if we want to revert :(
#' results is supported. Any eventually present alignment or correspondence | ||
#' results will be dropped before combining the `XcmsExperiment` objects. | ||
#' Finally, at present, only the MS data of the individual `XcmsExperiment` | ||
#' objects is combined and any data eventually present in the `@qdata`, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I haven't dug into the function, but what prevents c()
ing also slots like @experimentFiles
from two or more XcmsExperiment
s ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Honestly - I never used the @experimentFiles
slot - and am actually also not quite sure it will be used a lot...
@@ -279,6 +279,7 @@ dropGenericProcessHistory <- function(x, fun) { | |||
valsPerSpect = valsPerSpect, rtrange = rtr, | |||
mzrange = mzr) | |||
if (length(mtx)) { | |||
## mtx: time, mz, intensity | |||
if (any(!is.na(mtx[, 3]))) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ok, convinced. The paranoid among us could check `colnames()[3]=="intensity" once before the loop.
This PR adds the
chromPeakSummary()
method and a first implementation to calculate the peak shape quality from @wkumler on chrom peak results (thanks to @pablovgd for contributing).In addition, it fixes a bug in the calculation of the beta scores during gap filling.