Add chromPeakSummary function and a fix #772

jorainer · 2024-09-30T05:42:40Z

This PR adds the `chromPeakSummary()` method and a first implementation to calculate the peak shape quality metrics proposed by @wkumler for chromatographic peak results (thanks to @pablovgd for contributing).
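A minimal sketch of the idea behind a beta-shape quality score. It assumes the score correlates a peak's intensities with `dbeta()` curves of several skews (the `shape1` values, with `shape2 = 5`, as in the `skews` parameter documented in the review) and keeps the best correlation. `beta_cor_sketch()` and its default `skews = 3:7` are hypothetical and for illustration only. This is not the xcms implementation.

```r
# Hypothetical helper (not the xcms implementation): score how well a
# chromatographic peak matches a beta-distribution shape.
beta_cor_sketch <- function(intensity, rtime, skews = 3:7, shape2 = 5) {
    keep <- !is.na(intensity) & !is.na(rtime)
    intensity <- intensity[keep]
    rtime <- rtime[keep]
    if (length(intensity) < 3L || diff(range(rtime)) == 0)
        return(NA_real_)
    # Rescale retention times to [0, 1], the support of dbeta().
    x <- (rtime - min(rtime)) / diff(range(rtime))
    # Correlate the observed intensities with one beta curve per skew value.
    cors <- vapply(skews, function(s)
        cor(intensity, dbeta(x, shape1 = s, shape2 = shape2)),
        numeric(1))
    # The best-matching skew defines the score (1 = perfect beta shape).
    max(cors)
}

# Made-up example: a symmetric beta-shaped peak and its inverted "valley".
rt <- seq(100, 120, by = 1)
peak <- 1e5 * dbeta((rt - 100) / 20, shape1 = 5, shape2 = 5)
beta_cor_sketch(peak, rt)               # close to 1
beta_cor_sketch(max(peak) - peak, rt)   # negative: not peak-shaped
```

Taking the maximum over several skews lets tailing or fronting peaks still score well, as long as one of the candidate shapes fits.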

In addition, it fixes a bug in the calculation of the beta scores during gap filling.

- Also report a chromatogram's m/z values if `findChromPeaks` is run on a `Chromatogram` or `Chromatograms` object (issue #765).

- Peak shape quality (similarity to a Gaussian shape) calculation is now performed using the EIC representation of a chromatographic peak, i.e. with intensities of mass peaks for the same retention time (but different m/z) summed.
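A minimal sketch of the EIC representation described above. It assumes a per-peak matrix with one row per mass peak, with hypothetical columns `rt`, `mz` and `intensity` and made-up values. Base R's `rowsum()` sums the intensities of mass peaks that share a retention time, which leaves one intensity per retention time for the shape calculation.

```r
# Made-up mass peaks of one chromatographic peak: two m/z values were
# recorded at rt 10 and at rt 12.
mtx <- cbind(rt = c(10, 10, 11, 12, 12, 13),
             mz = c(200.01, 200.02, 200.01, 200.00, 200.02, 200.01),
             intensity = c(100, 50, 300, 200, 100, 80))
# Sum intensities per retention time (rowsum() sorts the groups by default).
summed <- rowsum(mtx[, "intensity"], group = mtx[, "rt"])
eic <- data.frame(rt = as.numeric(rownames(summed)),
                  intensity = summed[, 1L])
eic
```

Without this summing step, two mass peaks at the same retention time would be treated as separate points of the peak shape, which distorts any similarity measure.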

Add internal function to calculate beta metrics

added chromPeakSummary method

Added peak quality section to xcms vignette.

philouail

This is great! Also thanks for the small vignette! The code looks good to me, thanks @pablovgd. I'm excited to incorporate it into the overall workflow and to see how it can be integrated with the other preprocessing steps to improve them.

PS: I think the GHA (GitHub Actions) workflows need to be updated; they seem to be failing because of warnings.

R/AllGenerics.R

wkumler · 2024-09-30T16:09:41Z

R/XcmsExperiment-functions.R

Looks great, thanks for implementing this! I haven't checked it for bugs, but the logic looks sound.

R/methods-XChromatogram.R

vignettes/xcms.Rmd

jorainer · 2024-10-01T09:25:03Z

Thanks @wkumler for your review! I have now addressed all your suggestions.

sneumann

Hi, thanks for the PR!
One thing I couldn't attach to a specific line: would it make sense to add the paper to https://github.com/sneumann/xcms/blob/devel/inst/CITATION ?
Yours, Steffen

sneumann · 2024-10-07T15:05:26Z

R/do_findChromPeaks-functions.R

 #' @param skews A numeric vector of the skews to try, corresponding to the
-#' shape1 of dbeta with a shape2 of 5. Values less than 5 will be increasingly
-#' right-skewed, while values greater than 5 will be left-skewed.
+#'     shape1 of dbeta with a shape2 of 5. Values less than 5 will be


Hi @wkumler, I needed to read that a few times. So a value of 5 means symmetric? Why is it not zero-centered, so that `abs()` tells you the skewness (regardless of direction) and you can check `< 0` or `> 0` if you only want the direction?

That's a good point, thank you for the review. I'm using the language inherent to the `dbeta()` function, where the values of 2-5 correspond to the "positive parameters" alpha and beta (or, in R, `shape1` and `shape2`). I'm absolutely open to rescaling these, and I agree that a positive/negative skew would be more intuitive. If we're open to rescaling, and since we now have a parameter to pass in `chromPeakSummary`, we may also want to allow a wider range of values: the values of 5 matched my intuition for peak shape corners, but other folks may want more/less tail and more/less skew.
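To illustrate the zero-centered alternative discussed above: the skewness of a Beta(`shape1`, `shape2`) distribution has a standard closed form. It is 0 when `shape1 == shape2`, positive (right-skewed, i.e. tailing) when `shape1 < shape2`, and negative when `shape1 > shape2`. `beta_skewness()` is a hypothetical helper for illustration only. It is not part of xcms or of the `chromPeakSummary()` interface.

```r
# Hypothetical helper: skewness of a Beta(shape1, shape2) distribution,
# 2 * (b - a) * sqrt(a + b + 1) / ((a + b + 2) * sqrt(a * b)).
beta_skewness <- function(shape1, shape2) {
    2 * (shape2 - shape1) * sqrt(shape1 + shape2 + 1) /
        ((shape1 + shape2 + 2) * sqrt(shape1 * shape2))
}

# Example shape1 values against shape2 = 5, mapped to a signed skewness:
# 0 = symmetric, > 0 = right-skewed (tailing), < 0 = left-skewed (fronting).
skews <- c(3, 4, 5, 6, 7)
signed <- beta_skewness(skews, shape2 = 5)
abs(signed)   # magnitude of the skew, regardless of direction
sign(signed)  # direction only
```

A user-facing parameter could then be expressed on this signed scale and translated back to `shape1` internally.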

sneumann · 2024-10-07T15:08:15Z

R/functions-XCMSnExp.R

@@ -279,6 +279,7 @@ dropGenericProcessHistory <- function(x, fun) {
                       valsPerSpect = valsPerSpect, rtrange = rtr,
                       mzrange = mzr)
        if (length(mtx)) {
+            ## mtx: time, mz, intensity
            if (any(!is.na(mtx[, 3]))) {


I usually recommend avoiding hardcoded column numbers; imagine someone came up with (time, ccs, mz, intensity). That might require `mtx` to be created with named columns somewhere above. Is that the only occurrence of a hardcoded column number in the PR?

I agree, generally. Here, we load the `mtx` matrix with a function that is supposed to return a 3-column matrix with the columns in exactly the expected order. Thus, IMHO, for this case (as long as there is no user input involved) it's safe to use access-by-index. Also, because we're calling this in a loop thousands of times here, the way we subset could affect performance.

I did some timings on that:

```r
mtx <- cbind(time = 1:5, mz = 1:5,
             intensity = c(2342.2, 123.1, 231.1, 23.1, 123.23))
int_col <- 3L
library(microbenchmark)
microbenchmark(mtx[, 3L], mtx[, "intensity"], mtx[, int_col])
## Unit: nanoseconds
##                expr min    lq   mean median    uq  max neval cld
##           mtx[, 3L] 354 363.5 437.30  380.5 477.5  948   100   a
##  mtx[, "intensity"] 389 404.0 543.07  412.0 474.0 5036   100   a
##      mtx[, int_col] 381 399.5 469.45  407.0 484.0 1135   100   a
```

OK, convinced. The paranoid among us could check `colnames(mtx)[3] == "intensity"` once before the loop.
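A sketch of the "check once before the loop" idea. It assumes (hypothetically) that all per-peak matrices share the (time, mz, intensity) layout, so validating the first one is enough, while the cheap index-based access is kept inside the loop. `mtxs` and the `vapply()` loop are illustrative stand-ins, not the code in `R/functions-XCMSnExp.R`.

```r
# Hypothetical per-peak matrices, all with columns (time, mz, intensity).
mtxs <- list(
    cbind(time = 1:5, mz = 1:5,
          intensity = c(2342.2, 123.1, 231.1, 23.1, 123.23)),
    cbind(time = 1:3, mz = 1:3, intensity = c(10, 40, 20)))
int_col <- 3L
# Validate the column layout once, before the loop ...
if (!identical(colnames(mtxs[[1L]])[int_col], "intensity"))
    stop("column ", int_col, " is expected to be 'intensity'")
# ... then use fast index-based access inside the loop.
maxi <- vapply(mtxs, function(mtx) which.max(mtx[, int_col]), integer(1))
maxi
```

The check costs a single comparison, so it keeps the performance benefit measured above while failing loudly if the column order ever changes.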

sneumann · 2024-10-07T15:08:52Z

R/functions-XCMSnExp.R

                    ((rtr[2] - rtr[1]) /
                     max(1, (sum(rtim >= rtr[1] & rtim <= rtr[2]) - 1)))
-                maxi <- which.max(mtx[, 3])
+                maxi <- which.max(mtx[, 3L])


No, the one above was not the only hardcoded 3 :-) and there are more below ...

jorainer · 2024-10-08T11:53:41Z

Regarding the CITATION: I added the reference to the respective man page. I think adding it as a CITATION for xcms itself might be a bit too much, as the paper deals more with peak shape quality, independently of xcms?

jorainer · 2024-10-09T07:43:17Z

For now I only bumped the version and fixed the NEWS.md, but I did not address the other points above - can we maybe discuss them again @sneumann ?

Update vignette

jorainer · 2024-12-16T07:21:46Z

I'll make the PR a draft and fix the conflicts. After that I'll re-open.

- Add `c()` method to combine `XcmsExperiment` objects.
- Add a method to coerce from `XCMSnExp` to `XcmsExperiment` objects.
- Fix references in documentation.

jorainer · 2024-12-17T06:35:48Z

I've now fixed the conflicts and, in addition, added the `c()` method to combine multiple `XcmsExperiment` objects into one, as well as a method to coerce from `XCMSnExp` to `XcmsExperiment` objects. It's ready for review @sneumann :)

jorainer · 2025-01-17T06:47:58Z

@sneumann , just a little reminder that this PR is still open and we need to address it :)

sneumann

Hi, thanks everyone for the PR and review work. Yours, Steffen

sneumann · 2025-02-06T08:36:19Z

R/XcmsExperiment-functions.R

+#' Convert a XCMSnExp to a XcmsExperiment.
+#'
+#' @noRd
+.xcms_n_exp_to_xcms_experiment <- function(from) {


Unusual (to me) way of converting a CamelCase class name to (part of) a function name. As it is internal, I am just curious.

Yes, indeed, true. I was actually not paying attention. I now tend to always write internal function names in snake_case, for no particular reason except that I find it a bit more readable.

sneumann · 2025-02-06T08:37:13Z

R/XcmsExperiment-functions.R

+#' @param x `list` of `XcmsExperiment` objects.
+#'
+#' @noRd
+.xmse_combine <- function(x) {


Again internal, but would `xcmse` or something else be more intuitive?

Again, yes. I was using `.mse_` as an abbreviation for `MsExperiment` in internal function names, to quickly see from the function name that it is supposed to work on `MsExperiment` objects. Then, for `XcmsExperiment`, I used `xmse`, and true, I sometimes even misspell it now. Reverting would require changes in a lot of places though :(

sneumann · 2025-02-06T08:40:26Z

R/XcmsExperiment.R

+#'   results is supported. Any eventually present alignment or correspondence
+#'   results will be dropped before combining the `XcmsExperiment` objects.
+#'   Finally, at present, only the MS data of the individual `XcmsExperiment`
+#'   objects is combined and any data eventually present in the `@qdata`,


I haven't dug into the function, but what prevents `c()`-ing slots like `@experimentFiles` as well, from two or more `XcmsExperiment`s?

Honestly, I never used the `@experimentFiles` slot, and I'm actually not quite sure it will be used a lot...

sneumann · 2025-02-06T08:43:30Z

R/functions-XCMSnExp.R

@@ -279,6 +279,7 @@ dropGenericProcessHistory <- function(x, fun) {
                       valsPerSpect = valsPerSpect, rtrange = rtr,
                       mzrange = mzr)
        if (length(mtx)) {
+            ## mtx: time, mz, intensity
            if (any(!is.na(mtx[, 3]))) {


OK, convinced. The paranoid among us could check `colnames(mtx)[3] == "intensity"` once before the loop.

jorainer and others added 15 commits September 16, 2024 13:49

feat: report m/z values in chromPeaks matrix for XChromatograms

2c57df3

- Report also a chromatogram's m/z values if `findChromPeaks` is run on a `Chromatogram` or `Chromatograms` object (issue #765).

fix: peak shape quality calculation for gap filling

dacd665

- Peak shape quality (similarity to gaussian shape) calculation is now performed using the EIC representation of a chromatographic peak, i.e. with intensities of mass peaks for the same retention time (but different m/z) summed.

feat: add chromPeakSummary generic (issue #705)

188541c

docs: add documentation for chromPeakSummary generic

e5aa3ff

Add internal function to calculate beta metrics

b59b15e

Fixed requested changes for PR #767

c017cba

Fix for PR #767

fe7509c

Merge pull request #767 from pablovgd/issue705

670de04

Add internal function to calculate beta metrics

added chromPeakSummary method

c581cde

requested changes for PR #768

f9d4b0d

Merge pull request #768 from pablovgd/issue705

e7ee120

added chromPeakSummary method

Added section on peak quality to vignette.

dd68e2f

Fixed typos in peak quality vignette.

a49539e

Merge pull request #770 from pablovgd/issue705

13cfaf3

Added peak quality section to xcms vignette.

refactor: little fixes

156788a

jorainer requested review from sneumann and philouail September 30, 2024 05:42

philouail approved these changes Sep 30, 2024
