Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add chromPeakSummary function and a fix #772

Merged
merged 24 commits into from
Feb 6, 2025
Merged

Add chromPeakSummary function and a fix #772

merged 24 commits into from
Feb 6, 2025

Conversation

jorainer
Copy link
Collaborator

This PR adds the chromPeakSummary() method and a first implementation to calculate the peak shape quality from @wkumler on chrom peak results (thanks to @pablovgd for contributing).

In addition, it fixes a bug in the calculation of the beta scores during gap filling.

jorainer and others added 15 commits September 16, 2024 13:49
- Report also a chromatogram's m/z values if `findChromPeaks` is run on a
  `Chromatogram` or `Chromatograms` object (issue #765).
- Peak shape quality (similarity to gaussian shape) calculation is now performed
  using the EIC representation of a chromatographic peak, i.e. with intensities
  of mass peaks for the same retention time (but different m/z) summed.
Add internal function to calculate beta metrics
added chromPeakSummary method
Added peak quality section to xcms vignette.
Copy link
Collaborator

@philouail philouail left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is great ! Also thanks for the small vignette ! The code looks good to me, thank @pablovgd, excited to incorporate it in the overall workflow and see how to integrate it with the other preprocessing steps to improve them.

PS: I think the GHA needs to be updated, i they are failing because of warnings.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks great, thanks for implementing this! I haven't checked it for bugs but the logic looks sound.

@jorainer
Copy link
Collaborator Author

jorainer commented Oct 1, 2024

Thanks @wkumler for your review! I have now addressed all your suggestions.

Copy link
Owner

@sneumann sneumann left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi, thanks for the PR !
What I couldn't attach to a specific line, would it make sense to add the paper to https://github.com/sneumann/xcms/blob/devel/inst/CITATION ?
Yours, Steffen

#' @param skews A numeric vector of the skews to try, corresponding to the
#' shape1 of dbeta with a shape2 of 5. Values less than 5 will be increasingly
#' right-skewed, while values greater than 5 will be left-skewed.
#' shape1 of dbeta with a shape2 of 5. Values less than 5 will be
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi @wkumler , I needed to read that a few times. So a value of 5 means symmetric ? Why is it not zero centered, so that abs() tells you the skewedness (regardless in what direction) and you can <0 or >0 if you only want the direction ?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's a good point, thank you for the review. I'm using the language inherent to the dbeta() function where the values of 2-5 correspond to the "positive parameters" alpha and beta (or, in R, shape1 and shape2). I'm absolutely open to rescaling these and I agree that a positive/negative skew would be more intuitive. If we're open to rescaling and we have a parameter to pass now in chromPeakSummary then we also may want to allow a wider variety of numbers - the values of 5 matched my intuition for peak shape corners but other folks may want more/less tail and more/less skew.

@@ -279,6 +279,7 @@ dropGenericProcessHistory <- function(x, fun) {
valsPerSpect = valsPerSpect, rtrange = rtr,
mzrange = mzr)
if (length(mtx)) {
## mtx: time, mz, intensity
if (any(!is.na(mtx[, 3]))) {
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I usually recommend avoiding hardcoded column numbers, imagine someone came up with (time, ccs, mz, intensity). Might need that mtx is created with named columns somewhere above. Is that the only occurance of a hardcoded column number in the PR ?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree - generally. Here, we load the mtx matrix with a function that is supposed to return a 3 column matrix with the columns exactly in the expected order. Thus, IMHO for this case (as long as there is no user input involved) it's save to use access-by-index. Also, because here we're calling this in a loop thousands of times, the way how we subset could affect the performance.

I did some timings on that:

mtx <- cbind(time = 1:5, mz = 1:5, intensity = c(2342.2, 123.1, 231.1, 23.1, 123.23))
int_col <- 3L

library(microbenchmark)
> microbenchmark(mtx[, 3L], mtx[, "intensity"], mtx[, int_col])
Unit: nanoseconds
               expr min    lq   mean median    uq  max neval cld
          mtx[, 3L] 354 363.5 437.30  380.5 477.5  948   100   a
 mtx[, "intensity"] 389 404.0 543.07  412.0 474.0 5036   100   a
     mtx[, int_col] 381 399.5 469.45  407.0 484.0 1135   100   a

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ok, convinced. The paranoid among us could check `colnames()[3]=="intensity" once before the loop.

((rtr[2] - rtr[1]) /
max(1, (sum(rtim >= rtr[1] & rtim <= rtr[2]) - 1)))
maxi <- which.max(mtx[, 3])
maxi <- which.max(mtx[, 3L])
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, above was not the only hardcoded 3 :-) and more below ...

@jorainer
Copy link
Collaborator Author

jorainer commented Oct 8, 2024

Regarding the CITATIONS - I added the reference to the respective man page. I think adding that as a CITATION for xcms itself might be bit too much, as the paper deals more with peak shape quality, independently of xcms?

@jorainer
Copy link
Collaborator Author

jorainer commented Oct 9, 2024

For now I only bumped the version and fixed the NEWS.md, but I did not address the other points above - can we maybe discuss them again @sneumann ?

@jorainer jorainer requested a review from sneumann October 9, 2024 07:43
@jorainer
Copy link
Collaborator Author

I'll make the PR a draft and fix the conflicts. After that I'll re-open.

@jorainer jorainer marked this pull request as draft December 16, 2024 07:21
- Add `c()` method to combine `XcmsExperiment` objects.
- Add a method to coerce from `XCMSnExp` to `XcmsExperiment` objects.
- Fix references in documentation.
@jorainer jorainer marked this pull request as ready for review December 17, 2024 06:34
@jorainer
Copy link
Collaborator Author

I've now fixed the conflicts and in addition also added the c() method to combine multiple XcmsExperiment objects into one as well as method to coerce from XCMSnExp to XcmsExperiment objects. Would be ready to review @sneumann :)

@jorainer
Copy link
Collaborator Author

@sneumann , just a little reminder that this PR is still open and we need to address it :)

Copy link
Owner

@sneumann sneumann left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi, thanks everyone for the PR and review work. Yours, Steffen

#' Convert a XCMSnExp to a XcmsExperiment.
#'
#' @noRd
.xcms_n_exp_to_xcms_experiment <- function(from) {
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unusual (to me) way of converting CamelCase class name to (part of) function name. As it is internal, I am just curious.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes, indeed, true. was not paying attention actually. I tend to write internal function names now always in snake_case - for no particular reason except that I find it a bit more readable

#' @param x `list` of `XcmsExperiment` objects.
#'
#' @noRd
.xmse_combine <- function(x) {
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Again internal, but would be xcmse or something else be more intiutive ?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

again, yes. I was using .mse_ as an abbreviation for MsExperiment in internal function names - to quickly see from the function name that this is supposed to work on MsExperiment objects - then for XcmsExperiment I used xmse - and true, I sometimes even misspell it now. Would just need changes in a lot of places if we want to revert :(

#' results is supported. Any eventually present alignment or correspondence
#' results will be dropped before combining the `XcmsExperiment` objects.
#' Finally, at present, only the MS data of the individual `XcmsExperiment`
#' objects is combined and any data eventually present in the `@qdata`,
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I haven't dug into the function, but what prevents c()ing also slots like @experimentFiles from two or more XcmsExperiments ?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Honestly - I never used the @experimentFiles slot - and am actually also not quite sure it will be used a lot...

@@ -279,6 +279,7 @@ dropGenericProcessHistory <- function(x, fun) {
valsPerSpect = valsPerSpect, rtrange = rtr,
mzrange = mzr)
if (length(mtx)) {
## mtx: time, mz, intensity
if (any(!is.na(mtx[, 3]))) {
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ok, convinced. The paranoid among us could check `colnames()[3]=="intensity" once before the loop.

@sneumann sneumann merged commit 55d39de into devel Feb 6, 2025
3 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants