Merge pull request #772 from sneumann/jomain
Add chromPeakSummary function and a fix, and plenty of improvements in the docs linking to functions in other packages
sneumann authored Feb 6, 2025
2 parents 143f3b9 + 6a67604 commit 55d39de
Showing 46 changed files with 917 additions and 235 deletions.
3 changes: 1 addition & 2 deletions DESCRIPTION
@@ -1,5 +1,5 @@
Package: xcms
Version: 4.5.2
Version: 4.5.3
Title: LC-MS and GC-MS Data Analysis
Description: Framework for processing and visualization of chromatographically
separated and single-spectra mass spectral data. Imports from AIA/ANDI NetCDF,
@@ -159,4 +159,3 @@ Collate:
'writemztab.R'
'xcmsSource.R'
'zzz.R'

8 changes: 5 additions & 3 deletions NAMESPACE
@@ -95,7 +95,7 @@ S3method(plot, xcmsEIC)
S3method(split, xcmsSet)
S3method(c, xcmsSet)
S3method(c, XCMSnExp)

S3method(c, XcmsExperiment)
S3method(split, xcmsRaw)

exportClasses(
@@ -461,7 +461,8 @@ export("CentWaveParam",
"CleanPeaksParam",
"MergeNeighboringPeaksParam",
"FilterIntensityParam",
"ChromPeakAreaParam")
"ChromPeakAreaParam",
"BetaDistributionParam")
## Param class methods.

## New Classes
@@ -530,7 +531,8 @@ exportMethods("hasChromPeaks",
"featureSpectra",
"chromPeakSpectra",
"chromPeakChromatograms",
"featureChromatograms"
"featureChromatograms",
"chromPeakSummary"
)

## feature grouping functions and methods.
27 changes: 20 additions & 7 deletions NEWS.md
@@ -1,13 +1,26 @@
# xcms 4.5.2
# xcms 4.5

## Changes in version 4.5.3

- Address issue #765: peak detection on chromatographic data: report a
chromatogram's `"mz"`, `"mzmin"` and `"mzmax"` as the mean m/z and lower and
upper m/z in the `chromPeaks()` matrix.
- Fix calculation of the correlation coefficient for peak shape similarity with
an idealized bell shape (*beta*) during gap filling for centWave-based
chromatographic peak detection with parameter `verboseBetaColumns = TRUE`.
- Add `chromPeakSummary` generic (issue #705).
- Add `chromPeakSummary()` method to calculate the *beta* quality metrics.
- Add `c()` method to combine multiple `XcmsExperiment` objects into one.
- Add a method to coerce from `XCMSnExp` to `XcmsExperiment` objects.

## Changes in version 4.5.2

- Small update to `featureSpectra()` and `chromPeakSpectra()` to allow addition
of `chromPeaks()` and `featuresDefinitions()` columns to be added to the
`Spectra` output.
- Tidied the `xcms` vignette, to order the filtering of features and remove
the outdated normalisation paragraph.In depth discussion on this subject can
be found on `metabonaut`.
`Spectra` output.
- Tidied the `xcms` vignette, to order the filtering of features and remove
the outdated normalisation paragraph.In depth discussion on this subject can
be found on `metabonaut`.

## Changes in version 4.5.1

@@ -18,8 +31,8 @@
## Changes in version 4.3.4

- Small update to the `matchLamaChromPeaks()` function to get the chromPeaksId
of the chromPeaks matched with Lamas.
- Small fix to the .yml file for the github actions, so they do not crash on
of the chromPeaks matched with Lamas.
- Small fix to the .yml file for the github actions, so they do not crash on
warnings.

## Changes in version 4.3.3
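The 4.5.3 NEWS entries refer to the *beta* quality metrics (Kumler 2023): a Pearson correlation between a peak's intensities and an idealized bell shape, plus a signal-to-noise ratio computed on the residuals. The following is only a minimal sketch of that idea in base R, not the xcms implementation. The helper name `beta_metrics`, the symmetric Beta(5, 5) curve, the min-max scaling and the exact signal-to-noise definition are all assumptions made for illustration.

```r
## Hypothetical helper (NOT part of xcms): score the intensities of one
## chromatographic peak against an idealized bell shape.
## Assumptions: symmetric Beta(5, 5) curve, min-max scaling, and
## signal-to-noise defined as peak height / sd(residuals).
beta_metrics <- function(intensity, shape1 = 5, shape2 = 5) {
    n <- length(intensity)
    stopifnot(n >= 3L)
    ## Evenly spaced grid strictly inside (0, 1), one point per data point.
    x <- seq(0, 1, length.out = n + 2L)[-c(1L, n + 2L)]
    bell <- dbeta(x, shape1, shape2)
    ## Bring the bell curve onto the intensity range of the observed peak.
    height <- max(intensity) - min(intensity)
    bell <- (bell - min(bell)) / (max(bell) - min(bell))
    fitted <- min(intensity) + bell * height
    resid <- intensity - fitted
    c(beta_cor = cor(intensity, fitted, method = "pearson"),
      beta_snr = height / sd(resid))
}

## A smooth, bell-like peak versus a jagged signal of the same length.
grid <- seq(0, 1, length.out = 11L)[-c(1L, 11L)]
smooth_peak <- 100 * dbeta(grid, 5, 5) + 10
jagged_peak <- c(5, 1, 4, 2, 5, 1, 4, 2, 5)
good <- beta_metrics(smooth_peak)
bad <- beta_metrics(jagged_peak)
```

Under these assumptions a well-shaped peak yields a correlation close to 1 and a large signal-to-noise value, while a jagged signal scores low on both. Both values are only meaningful for peaks with at least a handful of data points and a non-constant intensity.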
122 changes: 95 additions & 27 deletions R/AllGenerics.R
@@ -98,7 +98,7 @@ setGeneric("addProcessHistory", function(object, ...)
#' parameter in \code{\link{profile-matrix}} documentation for more details.
#'
#' @param BPPARAM parallel processing setup. Defaults to `BPPARAM = bpparam()`.
#' See [bpparam()] for details.
#' See [BiocParallel::bpparam()] for details.
#'
#' @param centerSample \code{integer(1)} defining the index of the center sample
#' in the experiment. It defaults to
@@ -143,7 +143,7 @@ setGeneric("addProcessHistory", function(object, ...)
#'
#' @param family For `PeakGroupsParam`: `character(1)` defining the method for
#' loess smoothing. Allowed values are `"gaussian"` and `"symmetric"`. See
#' [loess()] for more information.
#' [stats::loess()] for more information.
#'
#' @param gapExtend For `ObiwarpParam`: `numeric(1)` defining the penalty for
#' gap enlargement. The default value for `gapExtend` depends on the value
@@ -177,8 +177,8 @@ setGeneric("addProcessHistory", function(object, ...)
#' @param msLevel For `adjustRtime`: `integer(1)` defining the MS level on
#' which the alignment should be performed.
#'
#' @param object For `adjustRtime`: an [OnDiskMSnExp()], [XCMSnExp()],
#' [MsExperiment()] or [XcmsExperiment()] object.
#' @param object For `adjustRtime`: an [MSnbase::OnDiskMSnExp()], [XCMSnExp()],
#' [MsExperiment::MsExperiment()] or [XcmsExperiment()] object.
#'
#' @param param The parameter object defining the alignment method (and its
#' setting).
@@ -212,7 +212,7 @@ setGeneric("addProcessHistory", function(object, ...)
#'
#' @param span For `PeakGroupsParam`: `numeric(1)` defining
#' the degree of smoothing (if `smooth = "loess"`). This parameter is
#' passed to the internal call to [loess()].
#' passed to the internal call to [stats::loess()].
#'
#' @param subset For `ObiwarpParam` and `PeakGroupsParam`: `integer` with the
#' indices of samples within the experiment on which the alignment models
@@ -463,7 +463,8 @@ setGeneric("chromPeakData<-", function(object, value)
#' The columns will be named as they are written in the `chromPeaks` object
#' with a prefix `"chrom_peak_"`. Defaults to `c("mz", "rt")`.
#'
#' @param BPPARAM parallel processing setup. Defaults to [bpparam()].
#' @param BPPARAM parallel processing setup. Defaults to
#' [BiocParallel::bpparam()].
#'
#' @param ... ignored.
#'
@@ -545,6 +546,66 @@ setGeneric("chromPeakData<-", function(object, value)
setGeneric("chromPeakSpectra", function(object, ...)
standardGeneric("chromPeakSpectra"))

#' @title Chromatographic peak summaries
#'
#' @name chromPeakSummary
#'
#' @description
#'
#' The `chromPeakSummary()` method calculates summary statistics or other
#' metrics for each of the identified chromatographic peaks in an *xcms* result
#' object, such as the [XcmsExperiment()]. Different metrics can be calculated,
#' depending upon (and configured by) using dedicated *parameter* classes. As a
#' result, the method returns a `matrix` or `data.frame` with one row per
#' chromatographic peak. Each column contains calculated values, depending on
#' the used method/parameter class.
#'
#' Currently implemented methods/parameter classes are:
#'
#' - `BetaDistributionParam`: calculates the *beta_cor* and *beta_snr* quality
#' metrics as described in Kumler 2023 representing the result from a
#' (correlation) test of similarity (using Pearson's correlation coefficient)
#' to a bell curve and the signal-to-noise ratio calculated on the residuals
#' of this test.
#'
#' @param BPPARAM Parallel processing setup. See
#' [BiocParallel::bpparam()] for details.
#'
#' @param chunkSize `integer(1)` defining the number of samples from which data
#' should be loaded and processed at a time.
#'
#' @param msLevel `integer(1)` with the MS level of the chromatographic peaks
#' on which the metric should be calculated.
#'
#' @param object an *xcms* result object containing information on
#' identified chromatographic peaks.
#'
#' @param param a parameter object defining the method/summaries that should
#' be calculated (see description above for supported parameter classes).
#'
#' @param ... additional arguments passed to the method implementation.
#'
#' @return
#'
#' A `matrix` or `data.frame` with the same number of rows as there are
#' chromatographic peaks. Columns contain the calculated values. The number of
#' columns, their names and content depend on the used parameter object. See
#' the respective documentation above for more details.
#'
#' @author Pablo Vangeenderhuysen, Johannes Rainer, William Kumler
#'
#' @md
#'
#' @references
#'
#' Kumler W, Hazelton B J and Ingalls A E (2023) "Picky with peakpicking:
#' assessing chromatographic peak quality with simple metrics in metabolomics"
#' *BMC Bioinformatics* 24(1):404. doi: 10.1186/s12859-023-05533-4
#'
#' @export
setGeneric("chromPeakSummary", function(object, param, ...)
standardGeneric("chromPeakSummary"))

setGeneric("collect", function(object, ...) standardGeneric("collect"))
setGeneric("consecMissedLimit", function(object, ...)
standardGeneric("consecMissedLimit"))
@@ -642,8 +703,8 @@ setGeneric("family<-", function(object, value) standardGeneric("family<-"))
#' chromatogram.
#'
#' @param BPPARAM For `object` being an `XcmsExperiment`: parallel processing
#' setup. Defaults to `BPPARAM = bpparam()`. See [bpparam()] for more
#' information.
#' setup. Defaults to `BPPARAM = bpparam()`. See [BiocParallel::bpparam()]
#' for more information.
#'
#' @param chunkSize For `object` being an `XcmsExperiment`: `integer(1)`
#' defining the number of files from which the data should be loaded at
@@ -810,7 +871,8 @@ setGeneric("featureDefinitions<-", function(object, value)
#' spectra per feature).
#'
#' The information from `featureDefinitions` for each feature can be included
#' in the returned [Spectra()] object using the `featureColumns` parameter.
#' in the returned [Spectra::Spectra()] object using the `featureColumns`
#' parameter.
#' This is useful for keeping details such as the median retention time (`rtmed`)
#' or median m/z (`mzmed`). The columns will retain their names as specified
#' in the `featureDefinitions` object, prefixed by `"feature_"`
@@ -819,9 +881,11 @@ setGeneric("featureDefinitions<-", function(object, value)
#' as a metadata column named `"feature_id"`.
#'
#' See also [chromPeakSpectra()], as it supports a similar parameter for
#' including columns from the chromatographic peaks in the returned spectra object.
#' including columns from the chromatographic peaks in the returned spectra
#' object.
#' These parameters can be used in combination to include information from both
#' the chromatographic peaks and the features in the returned [Spectra()].
#' the chromatographic peaks and the features in the returned
#' [Spectra::Spectra()].
#' The *peak ID* (i.e., the row name of the peak in the `chromPeaks` matrix)
#' is added as a metadata column named `"chrom_peak_id"`.
#'
@@ -847,7 +911,8 @@ setGeneric("featureDefinitions<-", function(object, value)
#'
#' @return
#'
#' The function returns either a [Spectra()] (for `return.type = "Spectra"`)
#' The function returns either a [Spectra::Spectra()] (for
#' `return.type = "Spectra"`)
#' or a `List` of `Spectra` (for `return.type = "List"`). For the latter,
#' the order and the length matches parameter `features` (or if no `features`
#' is defined the order of the features in `featureDefinitions(object)`).
@@ -1146,7 +1211,7 @@ setGeneric("filterFeatureDefinitions", function(object, ...)
#' object will remove previous results.
#'
#' @param BPPARAM Parallel processing setup. Uses by default the system-wide
#' default setup. See [bpparam()] for more details.
#' default setup. See [BiocParallel::bpparam()] for more details.
#'
#' @param chunkSize `integer(1)` for `object` being an `MsExperiment` or
#' [XcmsExperiment()]: defines the number of files (samples) for which the
@@ -1165,14 +1230,15 @@ setGeneric("filterFeatureDefinitions", function(object, ...)
#' will thus in most settings cause an out-of-memory error.
#' By setting `chunkSize = -1` the peak detection will be performed
#' separately, and in parallel, for each sample. This will however not work
#' for all `Spectra` *backends* (see eventually [Spectra()] for details).
#' for all `Spectra` *backends* (see eventually [Spectra::Spectra()] for
#' details).
#'
#' @param msLevel `integer(1)` defining the MS level on which the
#' chromatographic peak detection should be performed.
#'
#' @param object The data object on which to perform the peak detection. Can be
#' an [OnDiskMSnExp()], [XCMSnExp()], [MChromatograms()] or [MsExperiment()]
#' object.
#' an [MSnbase::OnDiskMSnExp()], [XCMSnExp()], [MSnbase::MChromatograms()]
#' or [MsExperiment::MsExperiment()] object.
#'
#' @param param The parameter object selecting and configuring the algorithm.
#'
@@ -1242,7 +1308,8 @@ setGeneric("findChromPeaks", function(object, param, ...)
#' more information.
#'
#' @param BPPARAM if `object` is an `MsExperiment` or `XcmsExperiment`:
#' parallel processing setup. See [bpparam()] for more information.
#' parallel processing setup. See [BiocParallel::bpparam()] for more
#' information.
#'
#' @param ... currently not used.
#'
@@ -1537,7 +1604,8 @@ setGeneric("loadRaw", function(object, ...) standardGeneric("loadRaw"))
#' chromatographic peaks into features by providing their index in the
#' object's `chromPeaks` matrix.
#'
#' @param BPPARAM parallel processing settings (see [bpparam()] for details).
#' @param BPPARAM parallel processing settings (see [BiocParallel::bpparam()]
#' for details).
#'
#' @param chromPeaks For `manualChromPeaks`: `matrix` defining the boundaries
#' of the chromatographic peaks with one row per chromatographic peak and
@@ -1745,9 +1813,9 @@ setGeneric("rawMZ", function(object, ...) standardGeneric("rawMZ"))
#' Each MS2 chromatographic peak selected for an MS1 peak will thus represent
#' one **mass peak** in the reconstructed spectrum.
#'
#' The resulting [Spectra()] object provides also the peak IDs of the MS2
#' chromatographic peaks for each spectrum as well as their correlation value
#' with spectra variables *ms2_peak_id* and *ms2_peak_cor*.
#' The resulting [Spectra::Spectra()] object provides also the peak IDs of
#' the MS2 chromatographic peaks for each spectrum as well as their
#' correlation value with spectra variables *ms2_peak_id* and *ms2_peak_cor*.
#'
#' @param object `XCMSnExp` with identified chromatographic peaks.
#'
@@ -1774,8 +1842,8 @@ setGeneric("rawMZ", function(object, ...) standardGeneric("rawMZ"))
#' `chromPeaks`) of MS1 peaks for which MS2 spectra should be reconstructed.
#' By default they are reconstructed for all MS1 chromatographic peaks.
#'
#' @param BPPARAM parallel processing setup. See [bpparam()] for more
#' information.
#' @param BPPARAM parallel processing setup. See [BiocParallel::bpparam()]
#' for more information.
#'
#' @param return.type `character(1)` defining the type of the returned object.
#' Only `return.type = "Spectra"` is supported, `return.type = "MSpectra"`
@@ -1785,14 +1853,14 @@ setGeneric("rawMZ", function(object, ...) standardGeneric("rawMZ"))
#'
#' @return
#'
#' - [Spectra()] object (defined in the `Spectra` package) with the
#' - [Spectra::Spectra()] object (defined in the `Spectra` package) with the
#' reconstructed MS2 spectra for all MS1 peaks in `object`. Contains
#' empty spectra (i.e. without m/z and intensity values) for MS1 peaks for
#' which reconstruction was not possible (either no MS2 signal was recorded
#' or the correlation of the MS2 chromatographic peaks with the MS1
#' chromatographic peak was below threshold `minCor`. Spectra variables
#' `"ms2_peak_id"` and `"ms2_peak_cor"` (of type [CharacterList()]
#' and [NumericList()] with length equal to the number of peaks per
#' `"ms2_peak_id"` and `"ms2_peak_cor"` (of type [IRanges::CharacterList()]
#' and [IRanges::NumericList()] with length equal to the number of peaks per
#' reconstructed MS2 spectrum) providing the IDs and the correlation of the
#' MS2 chromatographic peaks from which the MS2 spectrum was reconstructed.
#' As retention time the median retention times of all MS2 chromatographic
@@ -1888,7 +1956,7 @@ setGeneric("reconstructChromPeakSpectra", function(object, ...)
#'
#' @param BPPARAM parameter object to set up parallel processing. Uses the
#' default parallel processing setup returned by `bpparam()`. See
#' [bpparam()] for details and examples.
#' [BiocParallel::bpparam()] for details and examples.
#'
#' @param chunkSize For `refineChromPeaks` if `object` is either an
#' `XcmsExperiment`: `integer(1)` defining the number of files (samples)
5 changes: 5 additions & 0 deletions R/DataClasses.R
@@ -2182,3 +2182,8 @@ setClass("FilterIntensityParam",
msg
else TRUE
})

setClass("BetaDistributionParam",
contains = "Param"
)
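The new `chromPeakSummary` generic dispatches on both the result object and a parameter object, with `BetaDistributionParam` as an empty class that only selects the method. The dispatch pattern can be illustrated with a self-contained toy example in base R. Every name below (`ToyParam`, `ToyHeightParam`, `ToyResult`, `toyPeakSummary`, the `apex` and `base` columns) is hypothetical and stands in for the xcms classes; this is a sketch of the design, not the package's code.

```r
library(methods)

## Hypothetical stand-ins for the xcms classes (NOT the real ones).
setClass("ToyParam", representation("VIRTUAL"))
## An empty parameter class: it carries no settings and only selects
## which summary should be calculated.
setClass("ToyHeightParam", contains = "ToyParam")
## Minimal result object holding a peak matrix, one row per peak.
setClass("ToyResult", representation(peaks = "matrix"))

## Generic with the same argument layout as the one added in this commit.
setGeneric("toyPeakSummary", function(object, param, ...)
    standardGeneric("toyPeakSummary"))

## Method chosen by the combination of result class and parameter class.
setMethod("toyPeakSummary",
          signature(object = "ToyResult", param = "ToyHeightParam"),
          function(object, param, ...) {
              pk <- object@peaks
              ## Return a matrix with one row per peak.
              cbind(height = pk[, "apex"] - pk[, "base"])
          })

peaks <- rbind(CP1 = c(apex = 100, base = 10),
               CP2 = c(apex = 50, base = 5))
toy <- new("ToyResult", peaks = peaks)
res <- toyPeakSummary(toy, new("ToyHeightParam"))
```

Adding another summary then only requires a new parameter class and a matching method; the generic and the calling code stay unchanged.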

31 changes: 31 additions & 0 deletions R/MsExperiment-functions.R
@@ -546,3 +546,34 @@
x@sampleDataLinks[["spectra"]] <- sdl
x
}

#' WARNING: this only joins @sampleData, @spectra and
#' `@sampleDataLinks[["spectra"]]`! All other slots are ignored.
#'
#' @noRd
.mse_combine <- function(x) {
if (!all(vapply(x, inherits, NA, "MsExperiment")))
stop("Only objects extending 'MsExperiment' accepted as input.")
## check other slots
lapply(x, function(z) {
if (length(z@experimentFiles) || length(z@qdata) || length(z@otherData))
stop("Slots 'experimentFiles', 'qdata' or 'otherData' are not ",
"empty! Can only combine objects for which these data slots ",
"are empty.", call. = FALSE)
})
res <- x[[1L]]
res@sampleData <- do.call(MsCoreUtils::rbindFill, lapply(x, sampleData))
res@spectra <- do.call(c, lapply(x, spectra))
sl <- lapply(x, function(z) z@sampleDataLinks[["spectra"]])
nsamp <- lengths(x)
nsamp <- c(0, cumsum(nsamp)[-length(nsamp)])
nspec <- vapply(sl, nrow, NA_integer_)
nspec <- c(0, cumsum(nspec)[-length(nspec)])
res@sampleDataLinks[["spectra"]] <- do.call(
rbind, mapply(function(z, i, j) {
z[, 1L] <- z[, 1L] + i
z[, 2L] <- z[, 2L] + j
z
}, sl, nsamp, nspec, SIMPLIFY = FALSE, USE.NAMES = FALSE))
res
}
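The index arithmetic in `.mse_combine` shifts the sample and spectra indices of each object by the number of samples and spectra in the objects before it. A minimal base R sketch of just that step is shown here. The helper name `combine_links` and its explicit `n_samples` and `n_spectra` arguments are assumptions for illustration; the function in the diff derives these counts from the objects themselves.

```r
## Hypothetical helper (NOT the xcms function): combine two-column link
## matrices (column 1: sample index, column 2: spectrum index) from
## several objects into a single matrix with consistent indices.
combine_links <- function(links, n_samples, n_spectra) {
    ## Offsets: number of samples/spectra in all preceding objects.
    samp_off <- c(0L, cumsum(n_samples)[-length(n_samples)])
    spec_off <- c(0L, cumsum(n_spectra)[-length(n_spectra)])
    do.call(rbind, mapply(function(z, i, j) {
        z[, 1L] <- z[, 1L] + i  # shift sample indices
        z[, 2L] <- z[, 2L] + j  # shift spectrum indices
        z
    }, links, samp_off, spec_off, SIMPLIFY = FALSE, USE.NAMES = FALSE))
}

## First object: 2 samples and 3 spectra; second object: 1 sample, 2 spectra.
a <- cbind(c(1L, 1L, 2L), 1:3)
b <- cbind(c(1L, 1L), 1:2)
combined <- combine_links(list(a, b),
                          n_samples = c(2L, 1L),
                          n_spectra = c(3L, 2L))
```

In the combined matrix the single sample of the second object becomes sample 3 and its spectra become spectra 4 and 5, so the links stay valid once the sample data and spectra are concatenated in the same order.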