install.packages("BiocManager")
-## Packa
BiocManager::install(c('RforMassSpectrometry/MsIO', 'RforMassSpectrometry/MsBackendMetaboLights'), ask = FALSE, dependencies = TRUE)
BiocManager::install("rformassspectrometry/metabonaut",
diff --git a/pkgdown.yml b/pkgdown.yml
index f1818a0..70dfb56 100644
--- a/pkgdown.yml
+++ b/pkgdown.yml
@@ -6,7 +6,7 @@ articles:
alignment-to-external-dataset: alignment-to-external-dataset.html
dataset-investigation: dataset-investigation.html
install_v0: install_v0.html
-last_built: 2024-10-21T15:42Z
+last_built: 2024-10-21T22:50Z
urls:
reference: https://rformassspectrometry.github.io/metabonaut/reference
article: https://rformassspectrometry.github.io/metabonaut/articles
diff --git a/search.json b/search.json
index 99728c9..cd01349 100644
--- a/search.json
+++ b/search.json
@@ -1 +1 @@
-[{"path":"https://rformassspectrometry.github.io/metabonaut/articles/a-end-to-end-untargeted-metabolomics.html","id":"introduction","dir":"Articles","previous_headings":"","what":"Introduction","title":"Complete end-to-end LC-MS/MS Metabolomic Data analysis","text":"present workflow describes steps analysis LC-MS/MS experiment, includes preprocessing raw data generate abundance matrix features various samples, followed data normalization, differential abundance analysis finally annotation features metabolites. Note also alternative analysis options R packages used different steps examples mentioned throughout workflow. Steps end--end workflow possible alternatives","code":""},{"path":"https://rformassspectrometry.github.io/metabonaut/articles/a-end-to-end-untargeted-metabolomics.html","id":"data-description","dir":"Articles","previous_headings":"","what":"Data description","title":"Complete end-to-end LC-MS/MS Metabolomic Data analysis","text":"See data description vignette detailed explanation dataset go workflow general tips done first get data.","code":""},{"path":"https://rformassspectrometry.github.io/metabonaut/articles/a-end-to-end-untargeted-metabolomics.html","id":"packages-needed","dir":"Articles","previous_headings":"","what":"Packages needed","title":"Complete end-to-end LC-MS/MS Metabolomic Data analysis","text":"workflow therefore based following dependencies:","code":"## Data Import and handling library(readxl) library(MsExperiment) library(MsIO) library(MsBackendMetaboLights) library(SummarizedExperiment) ## Preprocessing of LC-MS data library(xcms) library(Spectra) library(MetaboCoreUtils) ## Statistical analysis library(limma) # Differential abundance library(matrixStats) # Summaries over matrices ## Visualisation library(pander) library(RColorBrewer) library(pheatmap) library(vioplot) library(ggfortify) # Plot PCA library(gridExtra) # To arrange multiple ggplots into single plots ## Annotation library(AnnotationHub) # Annotation resources 
library(CompoundDb) # Access small compound annotation data. library(MetaboAnnotation) # Functionality for metabolite annotation."},{"path":"https://rformassspectrometry.github.io/metabonaut/articles/a-end-to-end-untargeted-metabolomics.html","id":"data-import","dir":"Articles","previous_headings":"","what":"Data import","title":"Complete end-to-end LC-MS/MS Metabolomic Data analysis","text":"Note different equipment generate various file extensions, conversion step might needed beforehand, though apply dataset. Spectra package supports variety ways store retrieve MS data, including mzML, mzXML, CDF files, simple flat files, database systems. necessary, several tools, ProteoWizard’s MSConvert, can used convert files .mzML format [@chambers_cross-platform_2012]. show extract dataset MetaboLigths database load MsExperiment object. information load data MetaboLights database, refer vignette. type data loading, check xcms vignette specific vignette created data import soon. next configure parallel processing setup. functions xcms package allow per-sample parallel processing, can improve performance analysis, especially large data sets. xcms packages RforMassSpectrometry package ecosystem use parallel processing setup configured BiocParallel Bioconductor package. 
code use fork-based parallel processing unix system, socket-based parallel processing Windows operating system.","code":"param <- MetaboLightsParam(mtblsId = \"MTBLS8735\", assayName = paste0(\"a_MTBLS8735_LC-MS_positive_\", \"hilic_metabolite_profiling.txt\"), filePattern = \".mzML\") lcms1 <- readMsObject(MsExperiment(), param, keepOntology = FALSE, keepProtocol = FALSE, simplify = TRUE) #' Set up parallel processing using 2 cores if (.Platform$OS.type == \"unix\") { register(MulticoreParam(2)) } else { register(SnowParam(2)) }"},{"path":"https://rformassspectrometry.github.io/metabonaut/articles/a-end-to-end-untargeted-metabolomics.html","id":"data-organisation","dir":"Articles","previous_headings":"","what":"Data organisation","title":"Complete end-to-end LC-MS/MS Metabolomic Data analysis","text":"experimental data now represented MsExperiment object MsExperiment package. MsExperiment object container metadata spectral data provides manages also linkage samples spectra. provide brief overview data structure content. sampleData() function extracts sample information object. next extract data use pander package render show information Table 1 . Throughout document use R pipe operator (|>) avoid nested function calls hence improve code readability. sampleData() output MetaboLights always ideal direct easy access data. therefore rename transform user-friendly way. user can add, transform remove column want using base R functionalities. Table 1. Samples data set. Table 2. Simplified sample data. 11 samples data set. abbreviations essential proper interpretation metadata information: injection_index: index representing order (position) individual sample measured (injected) within LC-MS measurement run experiment. \"QC\": Quality control sample (pool serum samples external, large cohort). \"CVD\": Sample individual cardiovascular disease. \"CTR\": Sample presumably healthy control. sample_name: arbitrary name/identifier sample. age: (rounded) age individuals. 
define colors sample groups based sample group using RColorBrewer package: MS data experiment stored Spectra object (Spectra Bioconductor package) within MsExperiment object can accessed using spectra() function. element object spectrum - organised linearly combined Spectra object one (ordered retention time samples). access dataset’s Spectra object summarize available information provide, among things, total number spectra data set. can also summarize number spectra respective MS level (extracted msLevel() function). fromFile() function returns spectrum index sample (data file) can thus used split information (MS level case) sample summarize using base R table() function combine result matrix. Note number spectra acquired run, number spectral features sample. present data set thus contains MS1 data, ideal quantification signal. second (LC-MS/MS) data set also fragment (MS2) spectra samples used later workflow. Note users restrict data evaluation examples shown tutorials. Spectra package enables user-friendly access full MS data functionality extensively used explore, visualize summarize data. another example, determine retention time range entire data set. Data obtained LC-MS experiments typically analyzed along retention time axis, MS data organized spectrum, orthogonal retention time axis.","code":"lcms1 Object of class MsExperiment Spectra: MS1 (17210) Experiment data: 10 sample(s) Sample data links: - spectra: 10 sample(s) to 17210 element(s). 
sampleData(lcms1)[, c(\"Derived_Spectral_Data_File\", \"Characteristics[Sample type]\", \"Factor Value[Phenotype]\", \"Sample Name\", \"Factor Value[Age]\")] |> kable(format = \"pipe\") # Let's rename the column for easier access colnames(sampleData(lcms1)) <- c(\"sample_name\", \"derived_spectra_data_file\", \"metabolite_asssignment_file\", \"source_name\", \"organism\", \"blood_sample_type\", \"sample_type\", \"age\", \"unit\", \"phenotype\") # Add \"QC\" to the phenotype of the QC samples sampleData(lcms1)$phenotype[sampleData(lcms1)$sample_name == \"POOL\"] <- \"QC\" sampleData(lcms1)$sample_name[sampleData(lcms1)$sample_name == \"POOL\" ] <- c(\"POOL1\", \"POOL2\", \"POOL3\", \"POOL4\") # Add injection index column sampleData(lcms1)$injection_index <- seq_len(nrow(sampleData(lcms1))) #let's look at the updated sample data sampleData(lcms1)[, c(\"derived_spectra_data_file\", \"phenotype\", \"sample_name\", \"age\", \"injection_index\")] |> kable(format = \"pipe\") #' Access Spectra Object spectra(lcms1) MSn data (Spectra) with 17210 spectra in a MsBackendMetaboLights backend: msLevel rtime scanIndex 1 1 0.274 1 2 1 0.553 2 3 1 0.832 3 4 1 1.111 4 5 1 1.390 5 ... ... ... ... 17206 1 479.052 1717 17207 1 479.331 1718 17208 1 479.610 1719 17209 1 479.889 1720 17210 1 480.168 1721 ... 36 more variables/columns. file(s): MS_QC_POOL_1_POS.mzML MS_A_POS.mzML MS_B_POS.mzML ... 7 more files #' Count the number of spectra with a specific MS level per file. 
spectra(lcms1) |> msLevel() |> split(fromFile(lcms1)) |> lapply(table) |> do.call(what = cbind) 1 2 3 4 5 6 7 8 9 10 1 1721 1721 1721 1721 1721 1721 1721 1721 1721 1721 #' Retention time range for entire dataset spectra(lcms1) |> rtime() |> range() [1] 0.273 480.169"},{"path":"https://rformassspectrometry.github.io/metabonaut/articles/a-end-to-end-untargeted-metabolomics.html","id":"data-visualization-and-general-quality-assessment","dir":"Articles","previous_headings":"","what":"Data visualization and general quality assessment","title":"Complete end-to-end LC-MS/MS Metabolomic Data analysis","text":"Effective visualization paramount inspecting assessing quality MS data. general overview LC-MS data, can: Combine mass peaks (MS1) spectra sample single spectrum mass peak represents maximum signal mass peaks similar m/z. spectrum might called Base Peak Spectrum (BPS), providing information abundant ions sample. Aggregate mass peak intensities spectrum, resulting Base Peak Chromatogram (BPC). BPC shows highest measured intensity distinct retention time (hence spectrum) thus orthogonal BPS. Sum mass peak intensities spectrum create Total Ion Chromatogram (TIC). Compare BPS samples experiment evaluate similarity ion content. Compare BPC samples experiment identify samples similar dissimilar chromatographic signal. addition general data evaluation visualization, also crucial investigate specific signal e.g. internal standards compounds/ions known present samples. providing reliable reference, internal standards help achieve consistent accurate analytical results. BPS collapses data retention time dimension reveals prevalent ions present samples, creation BPS however straightforward. Mass peaks, even representing signals ion, never identical m/z values consecutive spectra due measurement error/resolution instrument. use combineSpectra function combine spectra one file (defined using parameter f = fromFile(data)) single spectrum. 
mass peaks difference m/z value smaller 3 parts-per-million (ppm) combined one mass peak, intensity representing maximum grouped mass peaks. reduce memory requirement, addition first bin spectrum combining mass peaks within spectrum, aggregating mass peaks bins 0.01 m/z width. case large datasets, also recommended set processingChunkSize() parameter MsExperiment object finite value (default Inf) causing data processed (loaded memory) chunks processingChunkSize() spectra. can reduce memory demand speed process. can now generate BPS sample plot() . , observable overlap ion content files, particularly around 300 m/z 700 m/z. however also differences sets samples. particular, BPS 1, 4, 7 10 (counting row-wise left right) seem different others. fact, four BPS QC samples, remaining six study samples. observed differences might explained fact QC samples pools serum samples different cohort, study samples represent plasma samples, different sample collection. Next visual inspection , can also calculate express similarity BPS heatmap. use compareSpectra() function calculate pairwise similarities BPS use pheatmap() function pheatmap package cluster visualize result. get first glance different samples distribute terms similarity. heatmap confirms observations made BPS, showing distinct clusters QCs study samples, owing different matrices sample collections. also strongly recommended delve deeper data exploring detail. can accomplished carefully assessing data extracting spectra regions interest examination. next chunk, look extract information specific spectrum distinct samples. Figure 3. Intensity m/z values 125th spectrum two CTR samples. significant dissimilarities peak distribution intensity confirm difference composition QCs study samples. next compare full MS1 spectrum CVD CTR sample. Figure 4. Intensity m/z values 125th spectrum one CVD one CTR sample. 
, can observe spectra CVD CTR samples entirely similar, exhibit similar main peaks 200 600 m/z general higher intensity control samples. However peak distribution (least intensity) seems vary m/z 10 210 m/z 600. CTR spectrum exhibits significant peaks around m/z 150 - 200 much lower intensity CVD sample. delve details specific spectrum, wide range functions can employed: Table 3. Intensity m/z values 125th spectrum one CTR sample. chromatogram() function facilitates extraction intensities along retention time. However, access chromatographic information currently efficient seamless spectral information. Work underway develop/improve infrastructure chromatographic data new Chromatograms object aimed flexible user-friendly Spectra object. visualizing LC-MS data, BPC TIC serves valuable tool assess performance liquid chromatography across various samples experiment. case, extract BPC data create plot. BPC captures maximum peak signal spectrum data file plots information retention time spectrum y-axis. BPC can extracted using chromatogram function. setting parameter aggregationFun = \"max\", instruct function report maximum signal per spectrum. Conversely, setting aggregationFun = \"sum\", sums intensities spectrum, thereby creating TIC. Figure 5. BPC samples colored phenotype. 240 seconds signal seems measured. Thus, filter data removing part well first 10 seconds measured LC run. Figure 6. BPC filtering retention time. Initially, examined entire BPC subsequently filtered based desired retention times. results smaller file size also facilitates straightforward interpretation BPC. final plot illustrates BPC sample colored phenotype, providing insights signal measured along retention times sample. reveals points compounds eluted LC column. essence, BPC condenses three-dimensional LC-MS data (m/z retention time intensity) two dimensions (retention time intensity). can also compare similarities BPCs heatmap. retention times however identical different samples. 
Thus bin() chromatographic signal per sample along retention time axis bins two seconds resulting data number bins/data points. can calculate pairwise similarities data vectors using cor() function visualize result using pheatmap(). Figure 7. Heatmap BPC similarities. heatmap reinforces exploration spectra data showed, strong separation QC study samples. important bear mind later analyses. Additionally, study samples group two clusters, cluster containing samples C F cluster II samples. plot TIC samples, using different color cluster. Figure 8. Example TIC unusual signal. TIC samples look similar, samples cluster show different signal retention time range 40 160 seconds. Whether, strong difference impact following analysis remains determined. Throughout entire process, crucial reference points within dataset, well-known ions. experiments nowadays include internal standards (), case . strongly recommend using visualization throughout entire analysis. experiment, set 15 spiked samples. reviewing signal , selected two guide analysis process. However, also advise plot evaluate ions steps. illustrate , generate Extracted Ion Chromatograms (EIC) selected test ions. restricting MS data intensities within restricted, small m/z range selected retention time window, EICs expected contain signal single type ion. expected m/z retention times set determined different experiment. Additionally, cases internal standards available, commonly present ions sample matrix can serve suitable alternatives. Ideally, compounds distributed across entire retention time range experiment. Table 4. Internal standard list respective m/z expected retention time [s]. plot EICs isotope labeled cystine methionine. Figure 9. EIC cystine methionine. can observe clear concentration difference QCs study samples isotope labeled cystine ion. Meanwhile, labeled methionine internal standard exhibits discernible signal amidst noise noticeable retention time shift samples. 
artificially isotope labeled compounds spiked individual samples, also signal endogenous compounds serum (plasma) samples. Thus, calculate next mass m/z [M+H]+ ion endogenous cystine chemical formula extract also EIC ion. calculation exact mass m/z selected ion adduct use calculateMass() mass2mz() functions MetaboCoreUtils package. Figure 10. EIC endogenous cystine vs spiked. two cystine EICs look highly similar (endogenous shown left, isotope labeled right plot ), shift m/z, arises artificial labeling. shift allows us discriminate endogenous non-endogenous compound.","code":"#' Setting the chunksize chunksize <- 1000 processingChunkSize(spectra(lcms1)) <- chunksize #' Combining all spectra per file into a single spectrum bps <- spectra(lcms1) |> bin(binSize = 0.01) |> combineSpectra(f = fromFile(lcms1), intensityFun = max, ppm = 3) #' Plot the base peak spectra par(mar = c(2, 1, 1, 1)) plotSpectra(bps, main= \"\") #' Calculate similarities between BPS sim_matrix <- compareSpectra(bps) #' Add sample names as rownames and colnames rownames(sim_matrix) <- colnames(sim_matrix) <- sampleData(lcms1)$sample_name ann <- data.frame(phenotype = sampleData(lcms1)[, \"phenotype\"]) rownames(ann) <- rownames(sim_matrix) #' Plot the heatmap pheatmap(sim_matrix, annotation_col = ann, annotation_colors = list(phenotype = col_phenotype)) #' Accessing a single spectrum - comparing with QC par(mfrow = c(1,2), mar = c(2, 2, 2, 2)) spec1 <- spectra(lcms1[1])[125] spec2 <- spectra(lcms1[3])[125] plotSpectra(spec1, main = \"QC sample\") plotSpectra(spec2, main = \"CTR sample\") #' Accessing a single spectrum - comparing CVD and CTR par(mfrow = c(1,2), mar = c(2, 2, 2, 2)) spec1 <- spectra(lcms1[2])[125] spec2 <- spectra(lcms1[3])[125] plotSpectra(spec1, main = \"CVD sample\") plotSpectra(spec2, main = \"CTR sample\") #' Checking its intensity intensity(spec2) NumericList of length 1 [[1]] 18.3266733266736 45.1666666666667 ... 
27.1048951048951 34.9020979020979 #' Checking its rtime rtime(spec2) [1] 34.872 #' Checking its m/z mz(spec2) NumericList of length 1 [[1]] 51.1677328505635 53.0461968245186 ... 999.139446289161 999.315208803072 #' Filtering for a specific m/z range and viewing in a tabular format filt_spec <- filterMzRange(spec2,c(50,200)) data.frame(intensity = unlist(intensity(filt_spec)), mz = unlist(mz(filt_spec))) |> head() |> kable(format = \"markdown\") #' Extract and plot BPC for full data bpc <- chromatogram(lcms1, aggregationFun = \"max\") plot(bpc, col = paste0(col_sample, 80), main = \"BPC\", lwd = 1.5) grid() legend(\"topright\", col = col_phenotype, legend = names(col_phenotype), lty = 1, lwd = 2, horiz = TRUE, bty = \"n\") #' Filter the data based on retention time lcms1 <- filterRt(lcms1, c(10, 240)) Filter spectra bpc <- chromatogram(lcms1, aggregationFun = \"max\") #' Plot after filtering plot(bpc, col = paste0(col_sample, 80), main = \"BPC after filtering retention time\", lwd = 1.5) grid() legend(\"topright\", col = col_phenotype, legend = names(col_phenotype), lty = 1, lwd = 2, horiz = TRUE, bty = \"n\") #' Total ion chromatogram tic <- chromatogram(lcms1, aggregationFun = \"sum\") |> bin(binSize = 2) #' Calculate similarity (Pearson correlation) between BPCs ticmap <- do.call(cbind, lapply(tic, intensity)) |> cor() rownames(ticmap) <- colnames(ticmap) <- sampleData(lcms1)$sample_name ann <- data.frame(phenotype = sampleData(lcms1)[, \"phenotype\"]) rownames(ann) <- rownames(ticmap) #' Plot heatmap pheatmap(ticmap, annotation_col = ann, annotation_colors = list(phenotype = col_phenotype)) cluster_I_idx <- sampleData(lcms1)$sample_name %in% c(\"F\", \"C\") cluster_II_idx <- sampleData(lcms1)$sample_name %in% c(\"A\", \"B\", \"D\", \"E\") temp_col <- c(\"grey\", \"red\") names(temp_col) <- c(\"Cluster II\", \"Cluster I\") col <- rep(temp_col[1], length(lcms1)) col[cluster_I_idx] <- temp_col[2] col[sampleData(lcms1)$phenotype == \"QC\"] <- NA lcms1 |> 
chromatogram(aggregationFun = \"sum\") |> plot( col = col, main = \"TIC after filtering retention time\", lwd = 1.5) grid() legend(\"topright\", col = temp_col, legend = names(temp_col), lty = 1, lwd = 2, horiz = TRUE, bty = \"n\") #' Load our list of standard intern_standard <- read.delim(\"intern_standard_list.txt\") # Extract EICs for the list eic_is <- chromatogram( lcms1, rt = as.matrix(intern_standard[, c(\"rtmin\", \"rtmax\")]), mz = as.matrix(intern_standard[, c(\"mzmin\", \"mzmax\")])) #' Add internal standard metadata fData(eic_is)$mz <- intern_standard$mz fData(eic_is)$rt <- intern_standard$RT fData(eic_is)$name <- intern_standard$name fData(eic_is)$abbreviation <- intern_standard$abbreviation rownames(fData(eic_is)) <- intern_standard$abbreviation #' Summary of IS information fData(eic_is)[, c(\"name\", \"mz\", \"rt\")] |> kable(format = \"pipe\") #' Extract the two IS from the chromatogram object. eic_cystine <- eic_is[\"cystine_13C_15N\"] eic_met <- eic_is[\"methionine_13C_15N\"] #' plot both EIC par(mfrow = c(1, 2), mar = c(4, 2, 2, 0.5)) plot(eic_cystine, main = fData(eic_cystine)$name, cex.axis = 0.8, cex.main = 0.8, col = paste0(col_sample, 80)) grid() abline(v = fData(eic_cystine)$rt, col = \"red\", lty = 3) plot(eic_met, main = fData(eic_met)$name, cex.axis = 0.8, cex.main = 0.8, col = paste0(col_sample, 80)) grid() abline(v = fData(eic_met)$rt, col = \"red\", lty = 3) legend(\"topright\", col = col_phenotype, legend = names(col_phenotype), lty = 1, bty = \"n\") #' extract endogenous cystine mass and EIC and plot. 
cysmass <- calculateMass(\"C6H12N2O4S2\") cys_endo <- mass2mz(cysmass, adduct = \"[M+H]+\")[, 1] #' Plot versus spiked par(mfrow = c(1, 2)) chromatogram(lcms1, mz = cys_endo + c(-0.005, 0.005), rt = unlist(fData(eic_cystine)[, c(\"rtmin\", \"rtmax\")]), aggregationFun = \"max\") |> plot(col = paste0(col_sample, 80)) |> grid() plot(eic_cystine, col = paste0(col_sample, 80)) grid() legend(\"topright\", col = col_phenotype, legend = names(col_phenotype), lty = 1, bty = \"n\")"},{"path":"https://rformassspectrometry.github.io/metabonaut/articles/a-end-to-end-untargeted-metabolomics.html","id":"spectra-data-visualization-bps","dir":"Articles","previous_headings":"","what":"Spectra Data Visualization: BPS","title":"Complete end-to-end LC-MS/MS Metabolomic Data analysis","text":"BPS collapses data retention time dimension reveals prevalent ions present samples, creation BPS however straightforward. Mass peaks, even representing signals ion, never identical m/z values consecutive spectra due measurement error/resolution instrument. use combineSpectra function combine spectra one file (defined using parameter f = fromFile(data)) single spectrum. mass peaks difference m/z value smaller 3 parts-per-million (ppm) combined one mass peak, intensity representing maximum grouped mass peaks. reduce memory requirement, addition first bin spectrum combining mass peaks within spectrum, aggregating mass peaks bins 0.01 m/z width. case large datasets, also recommended set processingChunkSize() parameter MsExperiment object finite value (default Inf) causing data processed (loaded memory) chunks processingChunkSize() spectra. can reduce memory demand speed process. can now generate BPS sample plot() . , observable overlap ion content files, particularly around 300 m/z 700 m/z. however also differences sets samples. particular, BPS 1, 4, 7 10 (counting row-wise left right) seem different others. fact, four BPS QC samples, remaining six study samples. 
observed differences might explained fact QC samples pools serum samples different cohort, study samples represent plasma samples, different sample collection. Next visual inspection , can also calculate express similarity BPS heatmap. use compareSpectra() function calculate pairwise similarities BPS use pheatmap() function pheatmap package cluster visualize result. get first glance different samples distribute terms similarity. heatmap confirms observations made BPS, showing distinct clusters QCs study samples, owing different matrices sample collections. also strongly recommended delve deeper data exploring detail. can accomplished carefully assessing data extracting spectra regions interest examination. next chunk, look extract information specific spectrum distinct samples. Figure 3. Intensity m/z values 125th spectrum two CTR samples. significant dissimilarities peak distribution intensity confirm difference composition QCs study samples. next compare full MS1 spectrum CVD CTR sample. Figure 4. Intensity m/z values 125th spectrum one CVD one CTR sample. , can observe spectra CVD CTR samples entirely similar, exhibit similar main peaks 200 600 m/z general higher intensity control samples. However peak distribution (least intensity) seems vary m/z 10 210 m/z 600. CTR spectrum exhibits significant peaks around m/z 150 - 200 much lower intensity CVD sample. delve details specific spectrum, wide range functions can employed: Table 3. 
Intensity m/z values 125th spectrum one CTR sample.","code":"#' Setting the chunksize chunksize <- 1000 processingChunkSize(spectra(lcms1)) <- chunksize #' Combining all spectra per file into a single spectrum bps <- spectra(lcms1) |> bin(binSize = 0.01) |> combineSpectra(f = fromFile(lcms1), intensityFun = max, ppm = 3) #' Plot the base peak spectra par(mar = c(2, 1, 1, 1)) plotSpectra(bps, main= \"\") #' Calculate similarities between BPS sim_matrix <- compareSpectra(bps) #' Add sample names as rownames and colnames rownames(sim_matrix) <- colnames(sim_matrix) <- sampleData(lcms1)$sample_name ann <- data.frame(phenotype = sampleData(lcms1)[, \"phenotype\"]) rownames(ann) <- rownames(sim_matrix) #' Plot the heatmap pheatmap(sim_matrix, annotation_col = ann, annotation_colors = list(phenotype = col_phenotype)) #' Accessing a single spectrum - comparing with QC par(mfrow = c(1,2), mar = c(2, 2, 2, 2)) spec1 <- spectra(lcms1[1])[125] spec2 <- spectra(lcms1[3])[125] plotSpectra(spec1, main = \"QC sample\") plotSpectra(spec2, main = \"CTR sample\") #' Accessing a single spectrum - comparing CVD and CTR par(mfrow = c(1,2), mar = c(2, 2, 2, 2)) spec1 <- spectra(lcms1[2])[125] spec2 <- spectra(lcms1[3])[125] plotSpectra(spec1, main = \"CVD sample\") plotSpectra(spec2, main = \"CTR sample\") #' Checking its intensity intensity(spec2) NumericList of length 1 [[1]] 18.3266733266736 45.1666666666667 ... 27.1048951048951 34.9020979020979 #' Checking its rtime rtime(spec2) [1] 34.872 #' Checking its m/z mz(spec2) NumericList of length 1 [[1]] 51.1677328505635 53.0461968245186 ... 
999.139446289161 999.315208803072 #' Filtering for a specific m/z range and viewing in a tabular format filt_spec <- filterMzRange(spec2,c(50,200)) data.frame(intensity = unlist(intensity(filt_spec)), mz = unlist(mz(filt_spec))) |> head() |> kable(format = \"markdown\")"},{"path":"https://rformassspectrometry.github.io/metabonaut/articles/a-end-to-end-untargeted-metabolomics.html","id":"chromatographic-data-visualization-bpc-and-tic","dir":"Articles","previous_headings":"","what":"Chromatographic Data Visualization: BPC and TIC","title":"Complete end-to-end LC-MS/MS Metabolomic Data analysis","text":"chromatogram() function facilitates extraction intensities along retention time. However, access chromatographic information currently efficient seamless spectral information. Work underway develop/improve infrastructure chromatographic data new Chromatograms object aimed flexible user-friendly Spectra object. visualizing LC-MS data, BPC TIC serves valuable tool assess performance liquid chromatography across various samples experiment. case, extract BPC data create plot. BPC captures maximum peak signal spectrum data file plots information retention time spectrum y-axis. BPC can extracted using chromatogram function. setting parameter aggregationFun = \"max\", instruct function report maximum signal per spectrum. Conversely, setting aggregationFun = \"sum\", sums intensities spectrum, thereby creating TIC. Figure 5. BPC samples colored phenotype. 240 seconds signal seems measured. Thus, filter data removing part well first 10 seconds measured LC run. Figure 6. BPC filtering retention time. Initially, examined entire BPC subsequently filtered based desired retention times. results smaller file size also facilitates straightforward interpretation BPC. final plot illustrates BPC sample colored phenotype, providing insights signal measured along retention times sample. reveals points compounds eluted LC column. 
essence, BPC condenses three-dimensional LC-MS data (m/z retention time intensity) two dimensions (retention time intensity). can also compare similarities BPCs heatmap. retention times however identical different samples. Thus bin() chromatographic signal per sample along retention time axis bins two seconds resulting data number bins/data points. can calculate pairwise similarities data vectors using cor() function visualize result using pheatmap(). Figure 7. Heatmap BPC similarities. heatmap reinforces exploration spectra data showed, strong separation QC study samples. important bear mind later analyses. Additionally, study samples group two clusters, cluster containing samples C F cluster II samples. plot TIC samples, using different color cluster. Figure 8. Example TIC unusual signal. TIC samples look similar, samples cluster show different signal retention time range 40 160 seconds. Whether, strong difference impact following analysis remains determined. Throughout entire process, crucial reference points within dataset, well-known ions. experiments nowadays include internal standards (), case . strongly recommend using visualization throughout entire analysis. experiment, set 15 spiked samples. reviewing signal , selected two guide analysis process. However, also advise plot evaluate ions steps. illustrate , generate Extracted Ion Chromatograms (EIC) selected test ions. restricting MS data intensities within restricted, small m/z range selected retention time window, EICs expected contain signal single type ion. expected m/z retention times set determined different experiment. Additionally, cases internal standards available, commonly present ions sample matrix can serve suitable alternatives. Ideally, compounds distributed across entire retention time range experiment. Table 4. Internal standard list respective m/z expected retention time [s]. plot EICs isotope labeled cystine methionine. Figure 9. EIC cystine methionine. 
can observe clear concentration difference QCs study samples isotope labeled cystine ion. Meanwhile, labeled methionine internal standard exhibits discernible signal amidst noise noticeable retention time shift samples. artificially isotope labeled compounds spiked individual samples, also signal endogenous compounds serum (plasma) samples. Thus, calculate next mass m/z [M+H]+ ion endogenous cystine chemical formula extract also EIC ion. calculation exact mass m/z selected ion adduct use calculateMass() mass2mz() functions MetaboCoreUtils package. Figure 10. EIC endogenous cystine vs spiked. two cystine EICs look highly similar (endogenous shown left, isotope labeled right plot ), shift m/z, arises artificial labeling. shift allows us discriminate endogenous non-endogenous compound.","code":"#' Extract and plot BPC for full data bpc <- chromatogram(lcms1, aggregationFun = \"max\") plot(bpc, col = paste0(col_sample, 80), main = \"BPC\", lwd = 1.5) grid() legend(\"topright\", col = col_phenotype, legend = names(col_phenotype), lty = 1, lwd = 2, horiz = TRUE, bty = \"n\") #' Filter the data based on retention time lcms1 <- filterRt(lcms1, c(10, 240)) Filter spectra bpc <- chromatogram(lcms1, aggregationFun = \"max\") #' Plot after filtering plot(bpc, col = paste0(col_sample, 80), main = \"BPC after filtering retention time\", lwd = 1.5) grid() legend(\"topright\", col = col_phenotype, legend = names(col_phenotype), lty = 1, lwd = 2, horiz = TRUE, bty = \"n\") #' Total ion chromatogram tic <- chromatogram(lcms1, aggregationFun = \"sum\") |> bin(binSize = 2) #' Calculate similarity (Pearson correlation) between BPCs ticmap <- do.call(cbind, lapply(tic, intensity)) |> cor() rownames(ticmap) <- colnames(ticmap) <- sampleData(lcms1)$sample_name ann <- data.frame(phenotype = sampleData(lcms1)[, \"phenotype\"]) rownames(ann) <- rownames(ticmap) #' Plot heatmap pheatmap(ticmap, annotation_col = ann, annotation_colors = list(phenotype = col_phenotype)) cluster_I_idx <- 
sampleData(lcms1)$sample_name %in% c(\"F\", \"C\") cluster_II_idx <- sampleData(lcms1)$sample_name %in% c(\"A\", \"B\", \"D\", \"E\") temp_col <- c(\"grey\", \"red\") names(temp_col) <- c(\"Cluster II\", \"Cluster I\") col <- rep(temp_col[1], length(lcms1)) col[cluster_I_idx] <- temp_col[2] col[sampleData(lcms1)$phenotype == \"QC\"] <- NA lcms1 |> chromatogram(aggregationFun = \"sum\") |> plot( col = col, main = \"TIC after filtering retention time\", lwd = 1.5) grid() legend(\"topright\", col = temp_col, legend = names(temp_col), lty = 1, lwd = 2, horiz = TRUE, bty = \"n\") #' Load our list of standard intern_standard <- read.delim(\"intern_standard_list.txt\") # Extract EICs for the list eic_is <- chromatogram( lcms1, rt = as.matrix(intern_standard[, c(\"rtmin\", \"rtmax\")]), mz = as.matrix(intern_standard[, c(\"mzmin\", \"mzmax\")])) #' Add internal standard metadata fData(eic_is)$mz <- intern_standard$mz fData(eic_is)$rt <- intern_standard$RT fData(eic_is)$name <- intern_standard$name fData(eic_is)$abbreviation <- intern_standard$abbreviation rownames(fData(eic_is)) <- intern_standard$abbreviation #' Summary of IS information fData(eic_is)[, c(\"name\", \"mz\", \"rt\")] |> kable(format = \"pipe\") #' Extract the two IS from the chromatogram object. eic_cystine <- eic_is[\"cystine_13C_15N\"] eic_met <- eic_is[\"methionine_13C_15N\"] #' plot both EIC par(mfrow = c(1, 2), mar = c(4, 2, 2, 0.5)) plot(eic_cystine, main = fData(eic_cystine)$name, cex.axis = 0.8, cex.main = 0.8, col = paste0(col_sample, 80)) grid() abline(v = fData(eic_cystine)$rt, col = \"red\", lty = 3) plot(eic_met, main = fData(eic_met)$name, cex.axis = 0.8, cex.main = 0.8, col = paste0(col_sample, 80)) grid() abline(v = fData(eic_met)$rt, col = \"red\", lty = 3) legend(\"topright\", col = col_phenotype, legend = names(col_phenotype), lty = 1, bty = \"n\") #' extract endogenous cystine mass and EIC and plot. 
cysmass <- calculateMass(\"C6H12N2O4S2\") cys_endo <- mass2mz(cysmass, adduct = \"[M+H]+\")[, 1] #' Plot versus spiked par(mfrow = c(1, 2)) chromatogram(lcms1, mz = cys_endo + c(-0.005, 0.005), rt = unlist(fData(eic_cystine)[, c(\"rtmin\", \"rtmax\")]), aggregationFun = \"max\") |> plot(col = paste0(col_sample, 80)) |> grid() plot(eic_cystine, col = paste0(col_sample, 80)) grid() legend(\"topright\", col = col_phenotype, legend = names(col_phenotype), lty = 1, bty = \"n\")"},{"path":"https://rformassspectrometry.github.io/metabonaut/articles/a-end-to-end-untargeted-metabolomics.html","id":"known-compounds","dir":"Articles","previous_headings":"Data visualization and general quality assessment","what":"Known compounds","title":"Complete end-to-end LC-MS/MS Metabolomic Data analysis","text":"Throughout entire process, crucial reference points within dataset, well-known ions. experiments nowadays include internal standards (), case . strongly recommend using visualization throughout entire analysis. experiment, set 15 spiked samples. reviewing signal , selected two guide analysis process. However, also advise plot evaluate ions steps. illustrate , generate Extracted Ion Chromatograms (EIC) selected test ions. restricting MS data intensities within restricted, small m/z range selected retention time window, EICs expected contain signal single type ion. expected m/z retention times set determined different experiment. Additionally, cases internal standards available, commonly present ions sample matrix can serve suitable alternatives. Ideally, compounds distributed across entire retention time range experiment. Table 4. Internal standard list respective m/z expected retention time [s]. plot EICs isotope labeled cystine methionine. Figure 9. EIC cystine methionine. can observe clear concentration difference QCs study samples isotope labeled cystine ion. 
Meanwhile, labeled methionine internal standard exhibits discernible signal amidst noise noticeable retention time shift samples. artificially isotope labeled compounds spiked individual samples, also signal endogenous compounds serum (plasma) samples. Thus, calculate next mass m/z [M+H]+ ion endogenous cystine chemical formula extract also EIC ion. calculation exact mass m/z selected ion adduct use calculateMass() mass2mz() functions MetaboCoreUtils package. Figure 10. EIC endogenous cystine vs spiked. two cystine EICs look highly similar (endogenous shown left, isotope labeled right plot ), shift m/z, arises artificial labeling. shift allows us discriminate endogenous non-endogenous compound.","code":"#' Load our list of standard intern_standard <- read.delim(\"intern_standard_list.txt\") # Extract EICs for the list eic_is <- chromatogram( lcms1, rt = as.matrix(intern_standard[, c(\"rtmin\", \"rtmax\")]), mz = as.matrix(intern_standard[, c(\"mzmin\", \"mzmax\")])) #' Add internal standard metadata fData(eic_is)$mz <- intern_standard$mz fData(eic_is)$rt <- intern_standard$RT fData(eic_is)$name <- intern_standard$name fData(eic_is)$abbreviation <- intern_standard$abbreviation rownames(fData(eic_is)) <- intern_standard$abbreviation #' Summary of IS information fData(eic_is)[, c(\"name\", \"mz\", \"rt\")] |> kable(format = \"pipe\") #' Extract the two IS from the chromatogram object. 
eic_cystine <- eic_is[\"cystine_13C_15N\"] eic_met <- eic_is[\"methionine_13C_15N\"] #' plot both EIC par(mfrow = c(1, 2), mar = c(4, 2, 2, 0.5)) plot(eic_cystine, main = fData(eic_cystine)$name, cex.axis = 0.8, cex.main = 0.8, col = paste0(col_sample, 80)) grid() abline(v = fData(eic_cystine)$rt, col = \"red\", lty = 3) plot(eic_met, main = fData(eic_met)$name, cex.axis = 0.8, cex.main = 0.8, col = paste0(col_sample, 80)) grid() abline(v = fData(eic_met)$rt, col = \"red\", lty = 3) legend(\"topright\", col = col_phenotype, legend = names(col_phenotype), lty = 1, bty = \"n\") #' extract endogenous cystine mass and EIC and plot. cysmass <- calculateMass(\"C6H12N2O4S2\") cys_endo <- mass2mz(cysmass, adduct = \"[M+H]+\")[, 1] #' Plot versus spiked par(mfrow = c(1, 2)) chromatogram(lcms1, mz = cys_endo + c(-0.005, 0.005), rt = unlist(fData(eic_cystine)[, c(\"rtmin\", \"rtmax\")]), aggregationFun = \"max\") |> plot(col = paste0(col_sample, 80)) |> grid() plot(eic_cystine, col = paste0(col_sample, 80)) grid() legend(\"topright\", col = col_phenotype, legend = names(col_phenotype), lty = 1, bty = \"n\")"},{"path":"https://rformassspectrometry.github.io/metabonaut/articles/a-end-to-end-untargeted-metabolomics.html","id":"data-preprocessing","dir":"Articles","previous_headings":"","what":"Data preprocessing","title":"Complete end-to-end LC-MS/MS Metabolomic Data analysis","text":"Preprocessing stands inaugural step analysis untargeted LC-MS. characterized 3 main stages: chromatographic peak detection, retention time shift correction (alignment) correspondence results features defined. primary objective preprocessing quantification signals ions measured sample, addressing potential retention time drifts samples, ensuring alignment quantified signals across samples within experiment. final result LC-MS data preprocessing numeric matrix abundances quantified entities samples experiment. 
initial preprocessing step involves detecting intensity peaks along retention time axis, called chromatographic peaks. achieve , employ findChromPeaks() function within xcms. function supports various algorithms peak detection, can selected configured respective parameter objects. preferred algorithm case, CentWave, utilizes continuous wavelet transformation (CWT)-based peak detection [@tautenhahn_highly_2008]. method known effectiveness handling non-Gaussian shaped chromatographic peaks peaks varying retention time widths, commonly encountered HILIC separations. apply CentWave algorithm default settings extracted ion chromatogram cystine methionine ions evaluate results. CentWave highly performant algorithm, requires customized dataset. implies parameters fine-tuned based user’s data. example serves clear motivation users familiarize various parameters need adapt data set. discuss main parameters can easily adjusted suit user’s dataset: peakwidth: Specifies minimal maximal expected width peaks retention time dimension. Highly dependent chromatographic settings used. ppm: maximal allowed difference mass peaks’ m/z values (parts-per-million) consecutive scans consider representing signal ion. integrate: parameter defines integration method. , primarily use integrate = 2 integrates also signal chromatographic peak’s tail considered accurate developers. determine peakwidth, recommend users refer previous EICs estimate range peak widths observe dataset. Ideally, examining multiple EICs goal. dataset, peak widths appear around 2 10 seconds. advise choosing range wide narrow peakwidth parameter can lead false positives negatives. determine ppm, deeper analysis dataset needed. clarified ppm depends instrument, users necessarily input vendor-advertised ppm. ’s determine accurately possible: following steps involve generating highly restricted MS area single mass peak per spectrum, representing cystine ion. 
m/z peaks extracted, absolute difference calculated finally expressed ppm. therefore, choose value close maximum within range parameter ppm, .e., 15 ppm. can now perform chromatographic peak detection adapted settings EICs. important note , properly estimate background noise, sufficient data points outside chromatographic peak need present. generally problem peak detection performed full LC-MS data set, peak detection EICs retention time range EIC needs sufficiently wide. function fails find peak EIC, initial troubleshooting step increase range. Additionally, signal--noise threshold snthresh reduced peak detection EICs, within small retention time range, enough signal present properly estimate background noise. Finally, case MS1 data points per peaks, setting CentWave’s advanced parameter extendLengthMSW TRUE can help peak detection. customized parameters, chromatographic peak detected sample. , use plot() function EICs visualize results. Figure 11. Chromatographic peak detection EICs. can see peak seems detected sample ions. indicates custom settings seem thus suitable dataset. now proceed apply entire dataset, extracting EICs ions evaluate confirm chromatographic peak detection worked expected. Note: revert value parameter snthresh default, , mentioned , background noise estimation reliable performed full data set. Parameter chunkSize findChromPeaks() defines number data files loaded memory processed simultaneously. parameter thus allows fine-tune memory demand well performance chromatographic peak detection step. plot EICs two selected internal standards evaluate chromatographic peak detection results. Figure 12. Chromatographic peak detection EICs processing. Peaks seem detected properly samples ions. indicates peak detection process entire dataset successful. identification chromatographic peaks using CentWave algorithm can sometimes result artifacts, overlapping split peaks. 
address issue, refineChromPeaks() function utilized, conjunction MergeNeighboringPeaksParam, aims merging split peaks. show examples CentWave peak detection artifacts. examples pre-selected illustrate necessity next step: Figure 13. Examples CentWave peak detection artifacts. cases signal presumably single type ion split two separate chromatographic peaks (indicated vertical line). MergeNeighboringPeaksParam allows combine split peaks. parameters algorithm defined : expandMz: Suggested kept relatively small (0.0015) prevent merging isotopes. expandRt: Usually set approximately half size average retention time width used chromatographic peak detection (case, 2.5 seconds). minProp: Used determine whether candidates merged. Chromatographic peaks overlapping m/z ranges (expanded side expandMz) tail--head distance retention time dimension less 2 * expandRt, signal higher minProp apex intensity chromatographic peak lower intensity, merged. Values parameter small avoid merging closely co-eluting ions, isomers. test settings EICs split peaks. Figure 14. Examples CentWave peak detection artifacts merging. can observe artificially split peaks appropriately merged. Therefore, next apply settings entire dataset. peak merging, column \"merged\" result object’s chromPeakData() data frame can used evaluate chromatographic peaks result represent signal merged, originally identified chromatographic peaks. proceeding next preprocessing step generally suggested evaluate results chromatographic peak detection EICs e.g. internal standards compounds/ions known present samples. Additionally, evaluating comparing number identified chromatographic peaks samples data set can help spotting potentially problematic samples. count number chromatographic peaks per sample show numbers table. Table 5. Samples number identified chromatographic peaks. similar number chromatographic peaks identified within various samples data set. 
Additional options evaluate results chromatographic peak detection can implemented using plotChromPeaks() function summarizing results using base R commands. Despite using chromatographic settings conditions retention time shifts unavoidable. Indeed, performance instrument can change time, example due small variations environmental conditions, temperature pressure. shifts generally small samples measured within batch/measurement run, can considerable data experiment acquired across longer time period. evaluate presence shift extract plot BPC QC samples. Figure 15. BPC QC samples. QC samples representing sample (pool) measured regular intervals measurement run experiment measured day. Still, small shifts can observed, especially region 100 150 seconds. facilitate proper correspondence signals across samples (hence definition LC-MS features), essential minimize differences retention times. Theoretically, proceed two steps: first select QC samples dataset first alignment , using -called anchor peaks. way can assume linear shift time, since always measuring sample different regular time intervals. Despite external QCs data set, still use subset-based alignment assuming retention time shifts independent different sample matrix (human serum plasma) instead mostly instrument-dependent. Note also possible manually specify anchor peaks, respectively retention times align data set external, reference, data set. information provided vignettes xcms package. calculating much adjust retention time samples, apply shift also study samples. xcms retention time alignment can performed using adjustRtime() function alignment algorithm. example use PeakGroups method [@smith_xcms_2006] performs alignment minimizing differences retention times set anchor peaks different samples. method requires initial correspondence analysis match/group chromatographic peaks across samples algorithm selects anchor peaks alignment. 
initial correspondence, use PeakDensity approach [@smith_xcms_2006] groups chromatographic peaks similar m/z retention time LC-MS features. parameters algorithm, can configured using PeakDensityParam object, sampleGroups, minFraction, binSize, ppm bw. binSize, ppm bw allow specify similar chromatographic peaks’ m/z retention time values need consider grouping feature. binSize ppm define required similarity m/z values. Within m/z bin (defined binSize ppm) areas along retention time axis high chromatographic peak density (considering peaks samples) identified, chromatographic peaks within regions considered grouping feature. High density areas identified using base R density() function, bw parameter: higher values define wider retention time areas, lower values require chromatographic peaks similar retention times. parameter can seen black line plot , corresponding smoothness density curve. Whether candidate peaks get grouped feature depends also parameters sampleGroups minFraction: sampleGroups provide, sample, sample group belongs . minFraction expected value 0 1 defining proportion samples within least one sample groups (defined sampleGroups) chromatographic peaks detected group feature. initial correspondence, parameters don’t need fully optimized. Selection dataset-specific parameter values described detail next section. dataset, use small values binSize ppm , importantly, also parameter bw, since data set ultra high performance (UHP) LC setup used. minFraction use high value (0.9) ensure features defined chromatographic peaks present almost samples one sample group (can used anchor peaks actual alignment). base alignment later QC samples hence define sampleGroups binary variable grouping samples either study, QC group. Figure 16. Initial correspondence analysis. PeakGroups-based alignment can next performed using adjustRtime() function PeakGroupsParam parameter object. parameters algorithm : subsetAdjust subset: Allows subset alignment. 
base retention time alignment QC samples, .e., retention time shifts estimated based repeatedly measured samples. resulting adjustment applied entire data. data sets QC samples (e.g. sample pools) measured repeatedly, strongly suggest use method. Note also subset-based alignment samples ordered injection index (.e., order measured measurement run). minFraction: value 0 1 defining proportion samples (full data set, data subset defined subset) chromatographic peak identified use anchor peak. contrast PeakDensityParam parameter used define proportion within sample group. span: PeakGroups method allows, depending data, adjust regions along retention time axis differently. enable local alignments LOESS function used parameter defines degree smoothing function. Generally, values 0.4 0.6 used, however, suggested evaluate alignment results eventually adapt parameters result satisfactory. perform alignment data set based retention times anchor peaks defined subset QC samples. Alignment adjusted retention times spectra data set, well retention times identified chromatographic peaks. alignment performed, user evaluate results using plotAdjustedRtime() function. function visualizes difference adjusted raw retention time sample y-axis along adjusted retention time x-axis. Dot points represent position used anchor peak along retention time axis. optimal alignment areas along retention time axis, anchor peaks scattered retention time dimension. Figure 17. Retention time alignment results. samples present data set measured within measurement run, resulting small retention time shifts. Therefore, little adjustments needed performed (shifts maximum 1 second can seen plot ). Generally, magnitude adjustment seen plots match expectation analyst. can also compare BPC alignment. get original data, .e. raw retention times, can use dropAdjustedRtime() function: Figure 18. BPC alignment. largest shift can observed retention time range 120 130s. 
Apart retention time range, little changes can observed. next evaluate impact alignment EICs selected internal standards. thus first extract ion chromatograms alignment. can now evaluate alignment effect test ions. plot EICs alignment isotope labeled cystine methionine. Figure 19. EICs cystine methionine alignment. non-endogenous cystine ion already well aligned difference minimal. methionine ion, however, shows improvement alignment. addition visual inspection results, also evaluate impact alignment comparing variance retention times internal standards alignment. end, first need identify chromatographic peaks sample m/z retention time close expected values internal standard. use matchValues() function MetaboAnnotation package [@rainer_modular_2022] using MzRtParam method identify chromatographic peaks similar m/z (+/- 50 ppm) retention time (+/- 10 seconds) internal standard’s values. parameters mzColname rtColname specify column names query () target (chromatographic peaks) contain m/z retention time values match entities. perform matching separately sample. internal standard every sample, use filterMatches() function SingleMatchParam() parameter select chromatographic peak highest intensity. now internal standard ID chromatographic peak sample likely represents signal ion. can now extract retention times chromatographic peaks alignment. can now evaluate impact alignment retention times internal standards across full data set: Figure 20. Retention time variation internal standards alignment. average, variation retention times internal standards across samples slightly reduced alignment. briefly touched subject correspondence determine anchor peaks alignment. Generally, goal correspondence analysis identify chromatographic peaks originate types ions samples experiment group LC-MS features. point, proper configuration parameter bw crucial. illustrate sensible choices parameter’s value can made. 
use plotChromPeakDensity() function simulate correspondence analysis default values PeakDensity extracted ion chromatograms two selected isotope labeled ions. plot shows EIC top panel, apex position chromatographic peaks different samples (y-axis), along retention time (x-axis) lower panel. Figure 21. Initial correspondence analysis, Cystine. Figure 22. Initial correspondence analysis, Methionine. Grouping peaks depends smoothness previously mentioned density curve can configured parameter bw. seen , smoothness high properly group features. looking default parameters, can observe indeed, bw parameter set bw = 30, high modern UHPLC-MS setups. reduce value parameter 1.8 evaluate impact. Figure 23. Correspondence analysis optimized parameters, Cystine. Figure 24. Correspondence analysis optimized parameters, Methionine. can observe peaks now grouped accurately single feature test ion. important parameters optimized : binSize: data generated high resolution MS instrument, thus select low value parameter. ppm: TOF instruments, suggested use value ppm larger 0 accommodate higher measurement error instrument larger m/z values. minFraction: set minFraction = 0.75, hence defining features chromatographic peak identified least 75% samples one sample groups. sampleGroups: use information available sampleData’s \"phenotype\" column. correspondence analysis suggested evaluate results selected EICs. extract signal m/z similar isotope labeled methionine larger retention time range. Importantly, show actual correspondence results, set simulate = FALSE plotChromPeakDensity() function. Figure 25. Correspondence analysis results, Methionine. hoped, signal two different ions now grouped separate features. Generally, correspondence results evaluated extracted chromatograms. Another interesting information look distribution features along retention time axis. Table 5. Distribution features along retention time axis. 
results correspondence analysis now stored, along results preprocessing steps, within XcmsExperiment result object. correspondence results, .e., definition LC-MS features, can extracted using featureDefinitions() function. data frame provides average m/z retention time (columns \"mzmed\" \"rtmed\") characterize LC-MS feature. Column, \"peakidx\" contains indices chromatographic peaks assigned feature. actual abundances features, represent also final preprocessing results, can extracted featureValues() function: can note features (e.g. F0003 F0006) missing values samples. expected certain degree samples features, respectively ions, need present. address next section. previously observed missing values (NA) attributed various reasons. Although might represent genuinely missing value, indicating ion (feature) truly present particular sample, also result failure preceding chromatographic peak detection step. crucial able recover missing values latter category much possible reduce eventual need data imputation. next examine prevalent missing values present dataset: can observe substantial number missing values values dataset. Let’s therefore delve process gap-filling. first evaluate example features chromatographic peak detected samples: Figure 26. Examples chromatographic peaks missing values. instances, chromatographic peak identified one two selected samples (red line), hence missing value reported feature particular samples (blue line). However, cases, signal measured samples, thus, reporting missing value correct example. signal feature low, likely reason peak detection failed. rescue signal cases, fillChromPeaks() function can used ChromPeakAreaParam approach. method defines m/z-retention time area feature based detected peaks, signal respective ion expected. integrates intensities within area samples missing values feature. reported feature abundance. apply method using default parameters. fillChromPeaks() thus rescue missing data data set. 
Note , even sample ion present, worst case noise integrated, expected much lower actual chromatographic peak signal. Let’s look previously missing values : Figure 27. Examples chromatographic peaks missing values gap-filling. gap-filling, also blue colored sample chromatographic peak present peak area reported feature abundance sample. assess effectiveness gap-filling method rescuing signals, can also plot average signal features least one missing value average filled-signal. advisable perform analysis repeatedly measured samples; case, QC samples used. , extract: Feature values detected chromatographic peaks setting filled = FALSE featureValues() call. filled-signal first extracting detected gap-filled abundances replace values detected chromatographic peaks NA. , calculate row averages matrices plot . detected (x-axis) gap-filled (y-axis) values QC samples highly correlated. Especially higher abundances, agreement high, low intensities, can expected, differences higher trending correlation line. , addition, fit linear regression line data summarize results linear regression line slope 1.12 intercept -1.62. indicates filled-signal average 1.12 times higher detected signal. final results LC-MS data preprocessing stored within XcmsExperiment object. includes identified chromatographic peaks, alignment results, well correspondence results. addition, guarantee reproducibility, result object keeps track performed processing steps, including individual parameter objects used configure . processHistory() function returns list various applied processing steps chronological order. , extract information first step performed preprocessing. processParam() function used extract actual parameter class used configure processing step. final result whole LC-MS data preprocessing two-dimensional matrix abundances -called LC-MS features samples. Note stage analysis features characterized m/z retention time don’t yet information metabolite feature represent. 
seen , feature matrix can extracted featureValues() function corresponding feature characteristics (.e., m/z retention time values) using featureDefinitions() function. Thus, two arrays extracted xcms result object used/imported analysis packages processing. example also exported tab delimited text files, used external tool, used, also MS2 spectra available, feature-based molecular networking GNPS analysis environment [@nothias_feature-based_2020]. processing R, reference link raw MS data required, suggested extract xcms preprocessing result using quantify() function SummarizedExperiment object, Bioconductor’s default container data biological assays/experiments. simplifies integration Bioconductor analysis packages. quantify() function takes parameters featureValues() function, thus, call extract SummarizedExperiment detected, gap-filled, feature abundances: Sample identifications xcms result’s sampleData() now available colData() (column, sample annotations) featureDefinitions() rowData() (row, feature annotations). feature values added first assay() SummarizedExperiment even processing history available object’s metadata(). SummarizedExperiment supports multiple assays, numeric matrices dimensions. thus add detected gap-filled feature abundances additional assay SummarizedExperiment. Feature abundances can extracted assay() function. extract first 6 lines detected gap-filled feature abundances: advantage, addition container full preprocessing results also possibility easy intuitive creation data subsets ensuring data integrity. example easy subset full data selection features /samples: moving next step analysis, advisable save preprocessing results. multiple format options save , can found MsIO package documentation. 
save XcmsExperiment object file format handled alabaster framework, ensures object can easily read languages like Python Javascript well loaded easily back R.","code":"#' Use default Centwave parameter param <- CentWaveParam() #' Look at the default parameters param Object of class: CentWaveParam Parameters: - ppm: [1] 25 - peakwidth: [1] 20 50 - snthresh: [1] 10 - prefilter: [1] 3 100 - mzCenterFun: [1] \"wMean\" - integrate: [1] 1 - mzdiff: [1] -0.001 - fitgauss: [1] FALSE - noise: [1] 0 - verboseColumns: [1] FALSE - roiList: list() - firstBaselineCheck: [1] TRUE - roiScales: numeric(0) - extendLengthMSW: [1] FALSE - verboseBetaColumns: [1] FALSE #' Evaluate for Cystine cystine_test <- findChromPeaks(eic_cystine, param = param) chromPeaks(cystine_test) rt rtmin rtmax into intb maxo sn row column #' Evaluate for Methionine met_test <- findChromPeaks(eic_met, param = param) chromPeaks(met_test) rt rtmin rtmax into intb maxo sn row column #' Restrict the data to signal from cystine in the first sample cst <- lcms1[1L] |> spectra() |> filterRt(rt = c(208, 218)) |> filterMzRange(mz = fData(eic_cystine)[\"cystine_13C_15N\", c(\"mzmin\", \"mzmax\")]) #' Show the number of peaks per m/z filtered spectra lengths(cst) [1] 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 0 0 0 1 1 1 1 1 1 1 1 1 0 1 1 1 1 1 1 #' Calculate the difference in m/z values between scans mz_diff <- cst |> mz() |> unlist() |> diff() |> abs() #' Express differences in ppm range(mz_diff * 1e6 / mean(unlist(mz(cst)))) [1] 0.08829605 14.82188728 #' Parameters adapted for chromatographic peak detection on EICs. 
param <- CentWaveParam(peakwidth = c(1, 8), ppm = 15, integrate = 2, snthresh = 2) #' Evaluate on the cystine ion cystine_test <- findChromPeaks(eic_cystine, param = param) chromPeaks(cystine_test) rt rtmin rtmax into intb maxo sn row column [1,] 209.251 207.577 212.878 4085.675 2911.376 2157.459 4 1 1 [2,] 209.251 206.182 213.995 24625.728 19074.407 12907.487 4 1 2 [3,] 209.252 207.020 214.274 19467.836 14594.041 9996.466 4 1 3 [4,] 209.251 207.577 212.041 4648.229 3202.617 2458.485 3 1 4 [5,] 208.974 206.184 213.159 23801.825 18126.978 11300.289 3 1 5 [6,] 209.250 207.018 213.714 25990.327 21036.768 13650.329 5 1 6 [7,] 209.252 207.857 212.879 4528.767 3259.039 2445.841 4 1 7 [8,] 209.252 207.299 213.995 23119.449 17274.140 12153.410 4 1 8 [9,] 208.972 206.740 212.878 28943.188 23436.119 14451.023 4 1 9 [10,] 209.252 207.578 213.437 4470.552 3065.402 2292.881 4 1 10 #' Evaluate on the methionine ion met_test <- findChromPeaks(eic_met, param = param) chromPeaks(met_test) rt rtmin rtmax into intb maxo sn row column [1,] 159.867 157.913 162.378 20026.61 14715.42 12555.601 4 1 1 [2,] 160.425 157.077 163.215 16827.76 11843.39 8407.699 3 1 2 [3,] 160.425 157.356 163.215 18262.45 12881.67 9283.375 3 1 3 [4,] 159.588 157.635 161.820 20987.72 15424.25 13327.811 4 1 4 [5,] 160.985 156.799 163.217 16601.72 11968.46 10012.396 4 1 5 [6,] 160.982 157.634 163.214 17243.24 12389.94 9150.079 4 1 6 [7,] 159.867 158.193 162.099 21120.10 16202.05 13531.844 3 1 7 [8,] 160.426 157.356 162.937 18937.40 13739.73 10336.000 3 1 8 [9,] 160.704 158.472 163.215 17882.21 12299.43 9395.548 3 1 9 [10,] 160.146 157.914 162.379 20275.80 14279.50 12669.821 3 1 10 #' Using the same settings, but with default snthresh param <- CentWaveParam(peakwidth = c(1, 8), ppm = 15, integrate = 2) lcms1 <- findChromPeaks(lcms1, param = param, chunkSize = 5) #' Update EIC internal standard object eics_is_noprocess <- eic_is eic_is <- chromatogram(lcms1, rt = as.matrix(intern_standard[, c(\"rtmin\", 
\"rtmax\")]), mz = as.matrix(intern_standard[, c(\"mzmin\", \"mzmax\")])) Processing chromatographic peaks fData(eic_is) <- fData(eics_is_noprocess) Processing chromatographic peaks #' set up the parameter param <- MergeNeighboringPeaksParam(expandRt = 2.5, expandMz = 0.0015, minProp = 0.75) #' Perform the peak refinement on the EICs eics <- refineChromPeaks(eics, param = param) plot(eics) #' Apply on whole dataset lcms1 <- refineChromPeaks(lcms1, param = param, chunkSize = 5) Reduced from 106714 to 89182 chromatographic peaks. chromPeakData(lcms1)$merged |> table() FALSE TRUE 79908 9274 eics_is_chrompeaks <- eic_is eic_is <- chromatogram(lcms1, rt = as.matrix(intern_standard[, c(\"rtmin\", \"rtmax\")]), mz = as.matrix(intern_standard[, c(\"mzmin\", \"mzmax\")])) Processing chromatographic peaks fData(eic_is) <- fData(eics_is_chrompeaks) eic_cystine <- eic_is[\"cystine_13C_15N\", ] eic_met <- eic_is[\"methionine_13C_15N\", ] #' Count the number of peaks per sample and summarize them in a table. 
data.frame(sample_name = sampleData(lcms1)$sample_name, peak_count = as.integer(table(chromPeaks(lcms1)[, \"sample\"]))) |> kable(format = \"pipe\") #' Get QC samples QC_samples <- sampleData(lcms1)$phenotype == \"QC\" #' extract BPC lcms1[QC_samples] |> chromatogram(aggregationFun = \"max\", chromPeaks = \"none\") |> plot(col = col_phenotype[\"QC\"], main = \"BPC of QC samples\") |> grid() # Initial correspondence analysis param <- PeakDensityParam(sampleGroups = sampleData(lcms1)$phenotype == \"QC\", minFraction = 0.9, binSize = 0.01, ppm = 10, bw = 2) lcms1 <- groupChromPeaks(lcms1, param = param) plotChromPeakDensity( eic_cystine, param = param, col = paste0(col_sample, \"80\"), peakCol = col_sample[chromPeaks(eic_cystine)[, \"sample\"]], peakBg = paste0(col_sample[chromPeaks(eic_cystine)[, \"sample\"]], 20), peakPch = 16) #' Define parameters of choice subset <- which(sampleData(lcms1)$phenotype == \"QC\") param <- PeakGroupsParam(minFraction = 0.9, extraPeaks = 50, span = 0.5, subsetAdjust = \"average\", subset = subset) #' Perform the alignment lcms1 <- adjustRtime(lcms1, param = param) Performing retention time correction using 5373 peak groups. Aligning sample number 2 against subset ... OK Aligning sample number 3 against subset ... OK Aligning sample number 5 against subset ... OK Aligning sample number 6 against subset ... OK Aligning sample number 8 against subset ... OK Aligning sample number 9 against subset ... 
OK #' Visualize alignment results plotAdjustedRtime(lcms1, col = paste0(col_sample, 80), peakGroupsPch = 1) grid() legend(\"topright\", col = col_phenotype, legend = names(col_phenotype), lty = 1, bty = \"n\") #' Get data before alignment lcms1_raw <- dropAdjustedRtime(lcms1) #' Apply the adjusted retention time to our dataset lcms1 <- applyAdjustedRtime(lcms1) #' Plot the BPC before and after alignment par(mfrow = c(2, 1), mar = c(2, 1, 1, 0.5)) chromatogram(lcms1_raw, aggregationFun = \"max\", chromPeaks = \"none\") |> plot(main = \"BPC before alignment\", col = paste0(col_sample, 80)) grid() legend(\"topright\", col = col_phenotype, legend = names(col_phenotype), lty = 1, bty = \"n\", horiz = TRUE) chromatogram(lcms1, aggregationFun = \"max\", chromPeaks = \"none\") |> plot(main = \"BPC after alignment\", col = paste0(col_sample, 80)) grid() legend(\"topright\", col = col_phenotype, legend = names(col_phenotype), lty = 1, bty = \"n\", horiz = TRUE) #' Store the EICs before alignment eics_is_refined <- eic_is #' Update the EICs eic_is <- chromatogram(lcms1, rt = as.matrix(intern_standard[, c(\"rtmin\", \"rtmax\")]), mz = as.matrix(intern_standard[, c(\"mzmin\", \"mzmax\")])) Processing chromatographic peaks fData(eic_is) <- fData(eics_is_refined) #' Extract the EICs for the test ions eic_cystine <- eic_is[\"cystine_13C_15N\"] eic_met <- eic_is[\"methionine_13C_15N\"] par(mfrow = c(2, 2), mar = c(4, 4.5, 2, 1)) old_eic_cystine <- eics_is_refined[\"cystine_13C_15N\"] plot(old_eic_cystine, main = \"Cystine before alignment\", peakType = \"none\", col = paste0(col_sample, 80)) grid() abline(v = intern_standard[\"cystine_13C_15N\", \"RT\"], col = \"red\", lty = 3) old_eic_met <- eics_is_refined[\"methionine_13C_15N\"] plot(old_eic_met, main = \"Methionine before alignment\", peakType = \"none\", col = paste0(col_sample, 80)) grid() abline(v = intern_standard[\"methionine_13C_15N\", \"RT\"], col = \"red\", lty = 3) plot(eic_cystine, main = \"Cystine after alignment\", 
peakType = \"none\", col = paste0(col_sample, 80)) grid() abline(v = intern_standard[\"cystine_13C_15N\", \"RT\"], col = \"red\", lty = 3) legend(\"topright\", col = col_phenotype, legend = names(col_phenotype), lty = 1, bty = \"n\") plot(eic_met, main = \"Methionine after alignment\", peakType = \"none\", col = paste0(col_sample, 80)) grid() abline(v = intern_standard[\"methionine_13C_15N\", \"RT\"], col = \"red\", lty = 3) #' Extract the matrix with all chromatographic peaks and add a column #' with the ID of the chromatographic peak chrom_peaks <- chromPeaks(lcms1) |> as.data.frame() chrom_peaks$peak_id <- rownames(chrom_peaks) #' Define the parameters for the matching and filtering of the matches p_1 <- MzRtParam(ppm = 50, toleranceRt = 10) p_2 <- SingleMatchParam(duplicates = \"top_ranked\", column = \"target_maxo\", decreasing = TRUE) #' Iterate over samples and identify for each the chromatographic peaks #' with similar m/z and retention time than the onse from the internal #' standard, and extract among them the ID of the peaks with the #' highest intensity. 
intern_standard_peaks <- lapply(seq_along(lcms1), function(i) { tmp <- chrom_peaks[chrom_peaks[, \"sample\"] == i, , drop = FALSE] mtch <- matchValues(intern_standard, tmp, mzColname = c(\"mz\", \"mz\"), rtColname = c(\"RT\", \"rt\"), param = p_1) mtch <- filterMatches(mtch, p_2) mtch$target_peak_id }) |> do.call(what = cbind) #' Define the index of the selected chromatographic peaks in the #' full chromPeaks matrix idx <- match(intern_standard_peaks, rownames(chromPeaks(lcms1))) #' Extract the raw retention times for these rt_raw <- chromPeaks(lcms1_raw)[idx, \"rt\"] |> matrix(ncol = length(lcms1_raw)) #' Extract the adjusted retention times for these rt_adj <- chromPeaks(lcms1)[idx, \"rt\"] |> matrix(ncol = length(lcms1_raw)) list(all_raw = rowSds(rt_raw, na.rm = TRUE), all_adj = rowSds(rt_adj, na.rm = TRUE) ) |> vioplot(ylab = \"sd(retention time)\") grid() #' Default parameter for the grouping and apply them to the test ions BPC param <- PeakDensityParam(sampleGroups = sampleData(lcms1)$phenotype, bw = 30) param Object of class: PeakDensityParam Parameters: - sampleGroups: [1] \"QC\" \"CVD\" \"CTR\" \"QC\" \"CTR\" \"CVD\" \"QC\" \"CTR\" \"CVD\" \"QC\" - bw: [1] 30 - minFraction: [1] 0.5 - minSamples: [1] 1 - binSize: [1] 0.25 - maxFeatures: [1] 50 - ppm: [1] 0 plotChromPeakDensity( eic_cystine, param = param, col = paste0(col_sample, \"80\"), peakCol = col_sample[chromPeaks(eic_cystine)[, \"sample\"]], peakBg = paste0(col_sample[chromPeaks(eic_cystine)[, \"sample\"]], 20), peakPch = 16) plotChromPeakDensity(eic_met, param = param, col = paste0(col_sample, \"80\"), peakCol = col_sample[chromPeaks(eic_met)[, \"sample\"]], peakBg = paste0(col_sample[chromPeaks(eic_met)[, \"sample\"]], 20), peakPch = 16) #' Updating parameters param <- PeakDensityParam(sampleGroups = sampleData(lcms1)$phenotype, bw = 1.8) plotChromPeakDensity( eic_cystine, param = param, col = paste0(col_sample, \"80\"), peakCol = col_sample[chromPeaks(eic_cystine)[, \"sample\"]], peakBg = 
paste0(col_sample[chromPeaks(eic_cystine)[, \"sample\"]], 20), peakPch = 16) plotChromPeakDensity(eic_met, param = param, col = paste0(col_sample, \"80\"), peakCol = col_sample[chromPeaks(eic_met)[, \"sample\"]], peakBg = paste0(col_sample[chromPeaks(eic_met)[, \"sample\"]], 20), peakPch = 16) #' Define the settings for the param param <- PeakDensityParam(sampleGroups = sampleData(lcms1)$phenotype, minFraction = 0.75, binSize = 0.01, ppm = 10, bw = 1.8) #' Apply to whole data lcms1 <- groupChromPeaks(lcms1, param = param) #' Extract chromatogram for an m/z similar to the one of the labeled methionine chr_test <- chromatogram(lcms1, mz = as.matrix(intern_standard[\"methionine_13C_15N\", c(\"mzmin\", \"mzmax\")]), rt = c(145, 200), aggregationFun = \"max\") Processing chromatographic peaks Processing features plotChromPeakDensity( chr_test, simulate = FALSE, col = paste0(col_sample, \"80\"), peakCol = col_sample[chromPeaks(chr_test)[, \"sample\"]], peakBg = paste0(col_sample[chromPeaks(chr_test)[, \"sample\"]], 20), peakPch = 16) # Bin features per RT slices vc <- featureDefinitions(lcms1)$rtmed breaks <- seq(0, max(vc, na.rm = TRUE) + 1, length.out = 15) |> round(0) cuts <- cut(vc, breaks = breaks, include.lowest = TRUE) table(cuts) |> kable(format = \"pipe\") #' Definition of the features featureDefinitions(lcms1) |> head() mzmed mzmin mzmax rtmed rtmin rtmax npeaks CTR CVD QC FT0001 50.98979 50.98949 50.99038 203.6001 203.1181 204.2331 8 1 3 4 FT0002 51.05904 51.05880 51.05941 191.1675 190.8787 191.5050 9 2 3 4 FT0003 51.98657 51.98631 51.98699 203.1467 202.6406 203.6710 7 0 3 4 FT0004 53.02036 53.02009 53.02043 203.2343 202.5652 204.0901 10 3 3 4 FT0005 53.52080 53.52051 53.52102 203.1936 202.8490 204.0901 10 3 3 4 FT0006 54.01007 54.00988 54.01015 159.2816 158.8499 159.4484 6 1 3 2 peakidx ms_level FT0001 7702, 16.... 1 FT0002 7176, 16.... 1 FT0003 7680, 17.... 1 FT0004 7763, 17.... 1 FT0005 8353, 17.... 1 FT0006 5800, 15.... 
1 #' Extract feature abundances featureValues(lcms1, method = \"sum\") |> head() MS_QC_POOL_1_POS.mzML MS_A_POS.mzML MS_B_POS.mzML MS_QC_POOL_2_POS.mzML FT0001 421.6162 689.2422 NA 481.7436 FT0002 710.8078 875.9192 NA 693.6997 FT0003 445.5711 613.4410 NA 497.8866 FT0004 16994.5260 24605.7340 19766.707 17808.0933 FT0005 3284.2664 4526.0531 3521.822 3379.8909 FT0006 10681.7476 10009.6602 NA 10800.5449 MS_C_POS.mzML MS_D_POS.mzML MS_QC_POOL_3_POS.mzML MS_E_POS.mzML FT0001 NA 635.2732 439.6086 570.5849 FT0002 781.2416 648.4344 700.9716 1054.0207 FT0003 NA 634.9370 449.0933 NA FT0004 22780.6683 22873.1061 16965.7762 23432.1252 FT0005 4396.0762 4317.7734 3270.5290 4533.8667 FT0006 NA 7296.4262 NA 9236.9799 MS_F_POS.mzML MS_QC_POOL_4_POS.mzML FT0001 579.9360 437.0340 FT0002 534.4577 711.0361 FT0003 461.0465 232.1075 FT0004 22198.4607 16796.4497 FT0005 4161.0132 3142.2268 FT0006 6817.8785 NA #' Percentage of missing values sum(is.na(featureValues(lcms1))) / length(featureValues(lcms1)) * 100 [1] 26.41597 ftidx <- which(is.na(rowSums(featureValues(lcms1)))) fts <- rownames(featureDefinitions(lcms1))[ftidx] farea <- featureArea(lcms1, features = fts[1:2]) chromatogram(lcms1[c(2, 3)], rt = farea[, c(\"rtmin\", \"rtmax\")], mz = farea[, c(\"mzmin\", \"mzmax\")]) |> plot(col = c(\"red\", \"blue\"), lwd = 2) Processing chromatographic peaks #' Fill in the missing values in the whole dataset lcms1 <- fillChromPeaks(lcms1, param = ChromPeakAreaParam(), chunkSize = 5) #' Percentage of missing values after gap-filling sum(is.na(featureValues(lcms1))) / length(featureValues(lcms1)) * 100 [1] 5.155492 Processing chromatographic peaks #' Get only detected signal in QC samples vals_detect <- featureValues(lcms1, filled = FALSE)[, QC_samples] #' Get detected and filled-in signal vals_filled <- featureValues(lcms1)[, QC_samples] #' Replace detected signal with NA vals_filled[!is.na(vals_detect)] <- NA #' Identify features with at least one filled peak has_filled <- 
is.na(rowSums(vals_detect)) #' Calculate row averages for features with missing values avg_detect <- rowMeans(vals_detect[has_filled, ], na.rm = TRUE) avg_filled <- rowMeans(vals_filled[has_filled, ], na.rm = TRUE) #' Plot the values against each other (in log2 scale) plot(log2(avg_detect), log2(avg_filled), xlim = range(log2(c(avg_detect, avg_filled)), na.rm = TRUE), ylim = range(log2(c(avg_detect, avg_filled)), na.rm = TRUE), pch = 21, bg = \"#00000020\", col = \"#00000080\") grid() abline(0, 1) #' fit a linear regression line to the data l <- lm(log2(avg_filled) ~ log2(avg_detect)) summary(l) Call: lm(formula = log2(avg_filled) ~ log2(avg_detect)) Residuals: Min 1Q Median 3Q Max -6.8176 -0.3807 0.1725 0.5492 6.7504 Coefficients: Estimate Std. Error t value Pr(>|t|) (Intercept) -1.62359 0.11545 -14.06 <2e-16 *** log2(avg_detect) 1.11763 0.01259 88.75 <2e-16 *** --- Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1 Residual standard error: 0.9366 on 2846 degrees of freedom (846 observations deleted due to missingness) Multiple R-squared: 0.7346, Adjusted R-squared: 0.7345 F-statistic: 7877 on 1 and 2846 DF, p-value: < 2.2e-16 #' Check first step of the process history processHistory(lcms1)[[1]] Object of class \"XProcessHistory\" type: Peak detection date: Mon Oct 21 15:53:03 2024 info: fileIndex: 1,2,3,4,5,6,7,8,9,10 Parameter class: CentWaveParam MS level(s) 1 #' Extract results as a SummarizedExperiment res <- quantify(lcms1, method = \"sum\", filled = FALSE) res class: SummarizedExperiment dim: 9068 10 metadata(6): '' '' ... '' '' assays(1): raw rownames(9068): FT0001 FT0002 ... FT9067 FT9068 rowData names(11): mzmed mzmin ... QC ms_level colnames(10): MS_QC_POOL_1_POS.mzML MS_A_POS.mzML ... MS_F_POS.mzML MS_QC_POOL_4_POS.mzML colData names(11): sample_name derived_spectra_data_file ... 
phenotype injection_index assays(res)$raw_filled <- featureValues(lcms1, method = \"sum\", filled = TRUE ) #' Different assay in the SummarizedExperiment object assayNames(res) [1] \"raw\" \"raw_filled\" assay(res, \"raw_filled\") |> head() MS_QC_POOL_1_POS.mzML MS_A_POS.mzML MS_B_POS.mzML MS_QC_POOL_2_POS.mzML FT0001 421.6162 689.2422 411.3295 481.7436 FT0002 710.8078 875.9192 457.5920 693.6997 FT0003 445.5711 613.4410 277.5022 497.8866 FT0004 16994.5260 24605.7340 19766.7069 17808.0933 FT0005 3284.2664 4526.0531 3521.8221 3379.8909 FT0006 10681.7476 10009.6602 9599.9701 10800.5449 MS_C_POS.mzML MS_D_POS.mzML MS_QC_POOL_3_POS.mzML MS_E_POS.mzML FT0001 314.7567 635.2732 439.6086 570.5849 FT0002 781.2416 648.4344 700.9716 1054.0207 FT0003 425.3774 634.9370 449.0933 556.2544 FT0004 22780.6683 22873.1061 16965.7762 23432.1252 FT0005 4396.0762 4317.7734 3270.5290 4533.8667 FT0006 4792.2390 7296.4262 2382.1788 9236.9799 MS_F_POS.mzML MS_QC_POOL_4_POS.mzML FT0001 579.9360 437.0340 FT0002 534.4577 711.0361 FT0003 461.0465 232.1075 FT0004 22198.4607 16796.4497 FT0005 4161.0132 3142.2268 FT0006 6817.8785 6911.5439 res[1:14, 3:8] class: SummarizedExperiment dim: 14 6 metadata(6): '' '' ... '' '' assays(2): raw raw_filled rownames(14): FT0001 FT0002 ... FT0013 FT0014 rowData names(11): mzmed mzmin ... QC ms_level colnames(6): MS_B_POS.mzML MS_QC_POOL_2_POS.mzML ... MS_QC_POOL_3_POS.mzML MS_E_POS.mzML colData names(11): sample_name derived_spectra_data_file ... phenotype injection_index #' Save the preprocessing results #' d <- file.path(tempdir(), \"objects/lcms1\") # saveMsObject(lcms1, AlabasterParam(path = d)) #' for now let's do R object because the previous method is not implemented yet. 
save(lcms1, file = \"preprocessed_lcms1.RData\")"},{"path":"https://rformassspectrometry.github.io/metabonaut/articles/a-end-to-end-untargeted-metabolomics.html","id":"chromatographic-peak-detection","dir":"Articles","previous_headings":"","what":"Chromatographic peak detection","title":"Complete end-to-end LC-MS/MS Metabolomic Data analysis","text":"initial preprocessing step involves detecting intensity peaks along retention time axis, called chromatographic peaks. achieve , employ findChromPeaks() function within xcms. function supports various algorithms peak detection, can selected configured respective parameter objects. preferred algorithm case, CentWave, utilizes continuous wavelet transformation (CWT)-based peak detection [@tautenhahn_highly_2008]. method known effectiveness handling non-Gaussian shaped chromatographic peaks peaks varying retention time widths, commonly encountered HILIC separations. apply CentWave algorithm default settings extracted ion chromatogram cystine methionine ions evaluate results. CentWave highly performant algorithm, requires costumized dataset. implies parameters fine-tuned based user’s data. example serves clear motivation users familiarize various parameters need adapt data set. discuss main parameters can easily adjusted suit user’s dataset: peakwidth: Specifies minimal maximal expected width peaks retention time dimension. Highly dependent chromatographic settings used. ppm: maximal allowed difference mass peaks’ m/z values (parts-per-million) consecutive scans consider representing signal ion. integrate: parameter defines integration method. , primarily use integrate = 2 integrates also signal chromatographic peak’s tail considered accurate developers. determine peakwidth, recommend users refer previous EICs estimate range peak widths observe dataset. Ideally, examining multiple EICs goal. dataset, peak widths appear around 2 10 seconds. 
advise choosing range wide narrow peakwidth parameter can lead false positives negatives. determine ppm, deeper analysis dataset needed. clarified ppm depends instrument, users necessarily input vendor-advertised ppm. ’s determine accurately possible: following steps involve generating highly restricted MS area single mass peak per spectrum, representing cystine ion. m/z peaks extracted, absolute difference calculated finally expressed ppm. therefore, choose value close maximum within range parameter ppm, .e., 15 ppm. can now perform chromatographic peak detection adapted settings EICs. important note , properly estimate background noise, sufficient data points outside chromatographic peak need present. generally problem peak detection performed full LC-MS data set, peak detection EICs retention time range EIC needs sufficiently wide. function fails find peak EIC, initial troubleshooting step increase range. Additionally, signal--noise threshold snthresh reduced peak detection EICs, within small retention time range, enough signal present properly estimate background noise. Finally, case MS1 data points per peaks, setting CentWave’s advanced parameter extendLengthMSW TRUE can help peak detection. customized parameters, chromatographic peak detected sample. , use plot() function EICs visualize results. Figure 11. Chromatographic peak detection EICs. can see peak seems ot detected sample ions. indicates custom settings seem thus suitable dataset. now proceed apply entire dataset, extracting EICs ions evaluate confirm chromatographic peak detection worked expected. Note: revert value parameter snthresh default, , mentioned , background noise estimation reliable performed full data set. Parameter chunkSize findChromPeaks() defines number data files loaded memory processed simultaneously. parameter thus allows fine-tune memory demand well performance chromatographic peak detection step. 
plot EICs two selected internal standards evaluate chromatographic peak detection results. Figure 12. Chromatographic peak detection EICs processing. Peaks seem detected properly samples ions. indicates peak detection process entire dataset successful. identification chromatographic peaks using CentWave algorithm can sometimes result artifacts, overlapping split peaks. address issue, refineChromPeaks() function utilized, conjunction MergeNeighboringPeaksParam, aims merging split peaks. show examples CentWave peak detection artifacts. examples pre-selected illustrate necessity next step: Figure 13. Examples CentWave peak detection artifacts. cases signal presumably single type ion split two separate chromatographic peaks (indicated vertical line). MergeNeigboringPeaksParam allows combine split peaks. parameters algorithm defined : expandMz: Suggested kept relatively small (0.0015) prevent merging isotopes. expandRt: Usually set approximately half size average retention time width used chromatographic peak detection (case, 2.5 seconds). minProp: Used determine whether candidates merged. Chromatographic peaks overlapping m/z ranges (expanded side expandMz) tail--head distance retention time dimension less 2 * expandRt, signal higher minProp apex intensity chromatographic peak lower intensity, merged. Values parameter small avoid merging closely co-eluting ions, isomers. test settings EICs split peaks. Figure 14. Examples CentWave peak detection artifacts merging. can observe artificially split peaks appropriately merged. Therefore, next apply settings entire dataset. peak merging, column \"merged\" result object’s chromPeakData() data frame can used evaluate chromatographic peaks result represent signal merged, originally identified chromatographic peaks. proceeding next preprocessing step generally suggested evaluate results chromatographic peak detection EICs e.g. internal standards compounds/ions known present samples. 
Additionally, evaluating comparing number identified chromatographic peaks samples data set can help spotting potentially problematic samples. count number chromatographic peaks per sample show numbers table. Table 5. Samples number identified chromatographic peaks. similar number chromatographic peaks identified within various samples data set. Additional options evaluate results chromatographic peak detection can implemented using plotChromPeaks() function summarizing results using base R commands.","code":"#' Use default Centwave parameter param <- CentWaveParam() #' Look at the default parameters param Object of class: CentWaveParam Parameters: - ppm: [1] 25 - peakwidth: [1] 20 50 - snthresh: [1] 10 - prefilter: [1] 3 100 - mzCenterFun: [1] \"wMean\" - integrate: [1] 1 - mzdiff: [1] -0.001 - fitgauss: [1] FALSE - noise: [1] 0 - verboseColumns: [1] FALSE - roiList: list() - firstBaselineCheck: [1] TRUE - roiScales: numeric(0) - extendLengthMSW: [1] FALSE - verboseBetaColumns: [1] FALSE #' Evaluate for Cystine cystine_test <- findChromPeaks(eic_cystine, param = param) chromPeaks(cystine_test) rt rtmin rtmax into intb maxo sn row column #' Evaluate for Methionine met_test <- findChromPeaks(eic_met, param = param) chromPeaks(met_test) rt rtmin rtmax into intb maxo sn row column #' Restrict the data to signal from cystine in the first sample cst <- lcms1[1L] |> spectra() |> filterRt(rt = c(208, 218)) |> filterMzRange(mz = fData(eic_cystine)[\"cystine_13C_15N\", c(\"mzmin\", \"mzmax\")]) #' Show the number of peaks per m/z filtered spectra lengths(cst) [1] 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 0 0 0 1 1 1 1 1 1 1 1 1 0 1 1 1 1 1 1 #' Calculate the difference in m/z values between scans mz_diff <- cst |> mz() |> unlist() |> diff() |> abs() #' Express differences in ppm range(mz_diff * 1e6 / mean(unlist(mz(cst)))) [1] 0.08829605 14.82188728 #' Parameters adapted for chromatographic peak detection on EICs. 
param <- CentWaveParam(peakwidth = c(1, 8), ppm = 15, integrate = 2, snthresh = 2) #' Evaluate on the cystine ion cystine_test <- findChromPeaks(eic_cystine, param = param) chromPeaks(cystine_test) rt rtmin rtmax into intb maxo sn row column [1,] 209.251 207.577 212.878 4085.675 2911.376 2157.459 4 1 1 [2,] 209.251 206.182 213.995 24625.728 19074.407 12907.487 4 1 2 [3,] 209.252 207.020 214.274 19467.836 14594.041 9996.466 4 1 3 [4,] 209.251 207.577 212.041 4648.229 3202.617 2458.485 3 1 4 [5,] 208.974 206.184 213.159 23801.825 18126.978 11300.289 3 1 5 [6,] 209.250 207.018 213.714 25990.327 21036.768 13650.329 5 1 6 [7,] 209.252 207.857 212.879 4528.767 3259.039 2445.841 4 1 7 [8,] 209.252 207.299 213.995 23119.449 17274.140 12153.410 4 1 8 [9,] 208.972 206.740 212.878 28943.188 23436.119 14451.023 4 1 9 [10,] 209.252 207.578 213.437 4470.552 3065.402 2292.881 4 1 10 #' Evaluate on the methionine ion met_test <- findChromPeaks(eic_met, param = param) chromPeaks(met_test) rt rtmin rtmax into intb maxo sn row column [1,] 159.867 157.913 162.378 20026.61 14715.42 12555.601 4 1 1 [2,] 160.425 157.077 163.215 16827.76 11843.39 8407.699 3 1 2 [3,] 160.425 157.356 163.215 18262.45 12881.67 9283.375 3 1 3 [4,] 159.588 157.635 161.820 20987.72 15424.25 13327.811 4 1 4 [5,] 160.985 156.799 163.217 16601.72 11968.46 10012.396 4 1 5 [6,] 160.982 157.634 163.214 17243.24 12389.94 9150.079 4 1 6 [7,] 159.867 158.193 162.099 21120.10 16202.05 13531.844 3 1 7 [8,] 160.426 157.356 162.937 18937.40 13739.73 10336.000 3 1 8 [9,] 160.704 158.472 163.215 17882.21 12299.43 9395.548 3 1 9 [10,] 160.146 157.914 162.379 20275.80 14279.50 12669.821 3 1 10 #' Using the same settings, but with default snthresh param <- CentWaveParam(peakwidth = c(1, 8), ppm = 15, integrate = 2) lcms1 <- findChromPeaks(lcms1, param = param, chunkSize = 5) #' Update EIC internal standard object eics_is_noprocess <- eic_is eic_is <- chromatogram(lcms1,, rt = as.matrix(intern_standard[, c(\"rtmin\", 
\"rtmax\")]), mz = as.matrix(intern_standard[, c(\"mzmin\", \"mzmax\")])) Processing chromatographic peaks fData(eic_is) <- fData(eics_is_noprocess) Processing chromatographic peaks #' set up the parameter param <- MergeNeighboringPeaksParam(expandRt = 2.5, expandMz = 0.0015, minProp = 0.75) #' Perform the peak refinement on the EICs eics <- refineChromPeaks(eics, param = param) plot(eics) #' Apply on whole dataset lcms1 <- refineChromPeaks(lcms1, param = param, chunkSize = 5) Reduced from 106714 to 89182 chromatographic peaks. chromPeakData(lcms1)$merged |> table() FALSE TRUE 79908 9274 eics_is_chrompeaks <- eic_is eic_is <- chromatogram(lcms1, rt = as.matrix(intern_standard[, c(\"rtmin\", \"rtmax\")]), mz = as.matrix(intern_standard[, c(\"mzmin\", \"mzmax\")])) Processing chromatographic peaks fData(eic_is) <- fData(eics_is_chrompeaks) eic_cystine <- eic_is[\"cystine_13C_15N\", ] eic_met <- eic_is[\"methionine_13C_15N\", ] #' Count the number of peaks per sample and summarize them in a table. data.frame(sample_name = sampleData(lcms1)$sample_name, peak_count = as.integer(table(chromPeaks(lcms1)[, \"sample\"]))) |> kable(format = \"pipe\")"},{"path":"https://rformassspectrometry.github.io/metabonaut/articles/a-end-to-end-untargeted-metabolomics.html","id":"refine-identified-chromatographic-peaks","dir":"Articles","previous_headings":"Data preprocessing","what":"Refine identified chromatographic peaks","title":"Complete end-to-end LC-MS/MS Metabolomic Data analysis","text":"identification chromatographic peaks using CentWave algorithm can sometimes result artifacts, overlapping split peaks. address issue, refineChromPeaks() function utilized, conjunction MergeNeighboringPeaksParam, aims merging split peaks. show examples CentWave peak detection artifacts. examples pre-selected illustrate necessity next step: Figure 13. Examples CentWave peak detection artifacts. 
cases signal presumably single type ion split two separate chromatographic peaks (indicated vertical line). MergeNeigboringPeaksParam allows combine split peaks. parameters algorithm defined : expandMz: Suggested kept relatively small (0.0015) prevent merging isotopes. expandRt: Usually set approximately half size average retention time width used chromatographic peak detection (case, 2.5 seconds). minProp: Used determine whether candidates merged. Chromatographic peaks overlapping m/z ranges (expanded side expandMz) tail--head distance retention time dimension less 2 * expandRt, signal higher minProp apex intensity chromatographic peak lower intensity, merged. Values parameter small avoid merging closely co-eluting ions, isomers. test settings EICs split peaks. Figure 14. Examples CentWave peak detection artifacts merging. can observe artificially split peaks appropriately merged. Therefore, next apply settings entire dataset. peak merging, column \"merged\" result object’s chromPeakData() data frame can used evaluate chromatographic peaks result represent signal merged, originally identified chromatographic peaks. proceeding next preprocessing step generally suggested evaluate results chromatographic peak detection EICs e.g. internal standards compounds/ions known present samples. Additionally, evaluating comparing number identified chromatographic peaks samples data set can help spotting potentially problematic samples. count number chromatographic peaks per sample show numbers table. Table 5. Samples number identified chromatographic peaks. similar number chromatographic peaks identified within various samples data set. 
Additional options evaluate results chromatographic peak detection can implemented using plotChromPeaks() function summarizing results using base R commands.","code":"Processing chromatographic peaks #' set up the parameter param <- MergeNeighboringPeaksParam(expandRt = 2.5, expandMz = 0.0015, minProp = 0.75) #' Perform the peak refinement on the EICs eics <- refineChromPeaks(eics, param = param) plot(eics) #' Apply on whole dataset lcms1 <- refineChromPeaks(lcms1, param = param, chunkSize = 5) Reduced from 106714 to 89182 chromatographic peaks. chromPeakData(lcms1)$merged |> table() FALSE TRUE 79908 9274 eics_is_chrompeaks <- eic_is eic_is <- chromatogram(lcms1, rt = as.matrix(intern_standard[, c(\"rtmin\", \"rtmax\")]), mz = as.matrix(intern_standard[, c(\"mzmin\", \"mzmax\")])) Processing chromatographic peaks fData(eic_is) <- fData(eics_is_chrompeaks) eic_cystine <- eic_is[\"cystine_13C_15N\", ] eic_met <- eic_is[\"methionine_13C_15N\", ] #' Count the number of peaks per sample and summarize them in a table. data.frame(sample_name = sampleData(lcms1)$sample_name, peak_count = as.integer(table(chromPeaks(lcms1)[, \"sample\"]))) |> kable(format = \"pipe\")"},{"path":"https://rformassspectrometry.github.io/metabonaut/articles/a-end-to-end-untargeted-metabolomics.html","id":"retention-time-alignment","dir":"Articles","previous_headings":"","what":"Retention time alignment","title":"Complete end-to-end LC-MS/MS Metabolomic Data analysis","text":"Despite using chromatographic settings conditions retention time shifts unavoidable. Indeed, performance instrument can change time, example due small variations environmental conditions, temperature pressure. shifts generally small samples measured within batch/measurement run, can considerable data experiment acquired across longer time period. evaluate presence shift extract plot BPC QC samples. Figure 15. BPC QC samples. 
QC samples representing sample (pool) measured regular intervals measurement run experiment measured day. Still, small shifts can observed, especially region 100 150 seconds. facilitate proper correspondence signals across samples (hence definition LC-MS features), essential minimize differences retention times. Theoretically, proceed two steps: first select QC samples dataset first alignment , using -called anchor peaks. way can assume linear shift time, since always measuring sample different regular time intervals. Despite external QCs data set, still use subset-based alignment assuming retention time shifts independent different sample matrix (human serum plasma) instead mostly instrument-dependent. Note also possible manually specify anchor peaks, respectively retention times align data set external, reference, data set. information provided vignettes xcms package. calculating much adjust retention time samples, apply shift also study samples. xcms retention time alignment can performed using adjustRtime() function alignment algorithm. example use PeakGroups method [@smith_xcms_2006] performs alignment minimizing differences retention times set anchor peaks different samples. method requires initial correspondence analysis match/group chromatographic peaks across samples algorithm selects anchor peaks alignment. initial correspondence, use PeakDensity approach [@smith_xcms_2006] groups chromatographic peaks similar m/z retention time LC-MS features. parameters algorithm, can configured using PeakDensityParam object, sampleGroups, minFraction, binSize, ppm bw. binSize, ppm bw allow specify similar chromatographic peaks’ m/z retention time values need consider grouping feature. binSize ppm define required similarity m/z values. Within m/z bin (defined binSize ppm) areas along retention time axis high chromatographic peak density (considering peaks samples) identified, chromatographic peaks within regions considered grouping feature. 
High density areas identified using base R density() function, bw parameter: higher values define wider retention time areas, lower values require chromatographic peaks similar retention times. parameter can seen black line plot , corresponding smoothness density curve. Whether candidate peaks get grouped feature depends also parameters sampleGroups minFraction: sampleGroups provide, sample, sample group belongs . minFraction expected value 0 1 defining proportion samples within least one sample groups (defined sampleGroups) chromatographic peaks detected group feature. initial correspondence, parameters don’t need fully optimized. Selection dataset-specific parameter values described detail next section. dataset, use small values binSize ppm , importantly, also parameter bw, since data set ultra high performance (UHP) LC setup used. minFraction use high value (0.9) ensure features defined chromatographic peaks present almost samples one sample group (can used anchor peaks actual alignment). base alignment later QC samples hence define sampleGroups binary variable grouping samples either study, QC group. Figure 16. Initial correspondence analysis. PeakGroups-based alignment can next performed using adjustRtime() function PeakGroupsParam parameter object. parameters algorithm : subsetAdjust subset: Allows subset alignment. base retention time alignment QC samples, .e., retention time shifts estimated based repeatedly measured samples. resulting adjustment applied entire data. data sets QC samples (e.g. sample pools) measured repeatedly, strongly suggest use method. Note also subset-based alignment samples ordered injection index (.e., order measured measurement run). minFraction: value 0 1 defining proportion samples (full data set, data subset defined subset) chromatographic peak identified use anchor peak. contrast PeakDensityParam parameter used define proportion within sample group. 
span: PeakGroups method allows, depending data, adjust regions along retention time axis differently. enable local alignments LOESS function used parameter defines degree smoothing function. Generally, values 0.4 0.6 used, however, suggested evaluate alignment results eventually adapt parameters result satisfactory. perform alignment data set based retention times anchor peaks defined subset QC samples. Alignment adjusted retention times spectra data set, well retention times identified chromatographic peaks. alignment performed, user evaluate results using plotAdjustedRtime() function. function visualizes difference adjusted raw retention time sample y-axis along adjusted retention time x-axis. Dot points represent position used anchor peak along retention time axis. optimal alignment areas along retention time axis, anchor peaks scattered retention time dimension. Figure 17. Retention time alignment results. samples present data set measured within measurement run, resulting small retention time shifts. Therefore, little adjustments needed performed (shifts maximum 1 second can seen plot ). Generally, magnitude adjustment seen plots match expectation analyst. can also compare BPC alignment. get original data, .e. raw retention times, can use dropAdjustedRtime() function: Figure 18. BPC alignment. largest shift can observed retention time range 120 130s. Apart retention time range, little changes can observed. next evaluate impact alignment EICs selected internal standards. thus first extract ion chromatograms alignment. can now evaluate alignment effect test ions. plot EICs alignment isotope labeled cystine methionine. Figure 19. EICs cystine methionine alignment. non-endogenous cystine ion already well aligned difference minimal. methionine ion, however, shows improvement alignment. addition visual inspection results, also evaluate impact alignment comparing variance retention times internal standards alignment. 
end, first need identify chromatographic peaks sample m/z retention time close expected values internal standard. use matchValues() function MetaboAnnotation package [@rainer_modular_2022] using MzRtParam method identify chromatographic peaks similar m/z (+/- 50 ppm) retention time (+/- 10 seconds) internal standard’s values. parameters mzColname rtColname specify column names query () target (chromatographic peaks) contain m/z retention time values match entities. perform matching separately sample. internal standard every sample, use filterMatches() function SingleMatchParam() parameter select chromatographic peak highest intensity. now internal standard ID chromatographic peak sample likely represents signal ion. can now extract retention times chromatographic peaks alignment. can now evaluate impact alignment retention times internal standards across full data set: Figure 20. Retention time variation internal standards alignment. average, variation retention times internal standards across samples slightly reduced alignment.","code":"#' Get QC samples QC_samples <- sampleData(lcms1)$phenotype == \"QC\" #' extract BPC lcms1[QC_samples] |> chromatogram(aggregationFun = \"max\", chromPeaks = \"none\") |> plot(col = col_phenotype[\"QC\"], main = \"BPC of QC samples\") |> grid() # Initial correspondence analysis param <- PeakDensityParam(sampleGroups = sampleData(lcms1)$phenotype == \"QC\", minFraction = 0.9, binSize = 0.01, ppm = 10, bw = 2) lcms1 <- groupChromPeaks(lcms1, param = param) plotChromPeakDensity( eic_cystine, param = param, col = paste0(col_sample, \"80\"), peakCol = col_sample[chromPeaks(eic_cystine)[, \"sample\"]], peakBg = paste0(col_sample[chromPeaks(eic_cystine)[, \"sample\"]], 20), peakPch = 16) #' Define parameters of choice subset <- which(sampleData(lcms1)$phenotype == \"QC\") param <- PeakGroupsParam(minFraction = 0.9, extraPeaks = 50, span = 0.5, subsetAdjust = \"average\", subset = subset) #' Perform the alignment lcms1 <- 
adjustRtime(lcms1, param = param) Performing retention time correction using 5373 peak groups. Aligning sample number 2 against subset ... OK Aligning sample number 3 against subset ... OK Aligning sample number 5 against subset ... OK Aligning sample number 6 against subset ... OK Aligning sample number 8 against subset ... OK Aligning sample number 9 against subset ... OK #' Visualize alignment results plotAdjustedRtime(lcms1, col = paste0(col_sample, 80), peakGroupsPch = 1) grid() legend(\"topright\", col = col_phenotype, legend = names(col_phenotype), lty = 1, bty = \"n\") #' Get data before alignment lcms1_raw <- dropAdjustedRtime(lcms1) #' Apply the adjusted retention time to our dataset lcms1 <- applyAdjustedRtime(lcms1) #' Plot the BPC before and after alignment par(mfrow = c(2, 1), mar = c(2, 1, 1, 0.5)) chromatogram(lcms1_raw, aggregationFun = \"max\", chromPeaks = \"none\") |> plot(main = \"BPC before alignment\", col = paste0(col_sample, 80)) grid() legend(\"topright\", col = col_phenotype, legend = names(col_phenotype), lty = 1, bty = \"n\", horiz = TRUE) chromatogram(lcms1, aggregationFun = \"max\", chromPeaks = \"none\") |> plot(main = \"BPC after alignment\", col = paste0(col_sample, 80)) grid() legend(\"topright\", col = col_phenotype, legend = names(col_phenotype), lty = 1, bty = \"n\", horiz = TRUE) #' Store the EICs before alignment eics_is_refined <- eic_is #' Update the EICs eic_is <- chromatogram(lcms1, rt = as.matrix(intern_standard[, c(\"rtmin\", \"rtmax\")]), mz = as.matrix(intern_standard[, c(\"mzmin\", \"mzmax\")])) Processing chromatographic peaks fData(eic_is) <- fData(eics_is_refined) #' Extract the EICs for the test ions eic_cystine <- eic_is[\"cystine_13C_15N\"] eic_met <- eic_is[\"methionine_13C_15N\"] par(mfrow = c(2, 2), mar = c(4, 4.5, 2, 1)) old_eic_cystine <- eics_is_refined[\"cystine_13C_15N\"] plot(old_eic_cystine, main = \"Cystine before alignment\", peakType = \"none\", col = paste0(col_sample, 80)) grid() abline(v = 
intern_standard[\"cystine_13C_15N\", \"RT\"], col = \"red\", lty = 3) old_eic_met <- eics_is_refined[\"methionine_13C_15N\"] plot(old_eic_met, main = \"Methionine before alignment\", peakType = \"none\", col = paste0(col_sample, 80)) grid() abline(v = intern_standard[\"methionine_13C_15N\", \"RT\"], col = \"red\", lty = 3) plot(eic_cystine, main = \"Cystine after alignment\", peakType = \"none\", col = paste0(col_sample, 80)) grid() abline(v = intern_standard[\"cystine_13C_15N\", \"RT\"], col = \"red\", lty = 3) legend(\"topright\", col = col_phenotype, legend = names(col_phenotype), lty = 1, bty = \"n\") plot(eic_met, main = \"Methionine after alignment\", peakType = \"none\", col = paste0(col_sample, 80)) grid() abline(v = intern_standard[\"methionine_13C_15N\", \"RT\"], col = \"red\", lty = 3) #' Extract the matrix with all chromatographic peaks and add a column #' with the ID of the chromatographic peak chrom_peaks <- chromPeaks(lcms1) |> as.data.frame() chrom_peaks$peak_id <- rownames(chrom_peaks) #' Define the parameters for the matching and filtering of the matches p_1 <- MzRtParam(ppm = 50, toleranceRt = 10) p_2 <- SingleMatchParam(duplicates = \"top_ranked\", column = \"target_maxo\", decreasing = TRUE) #' Iterate over samples and identify for each the chromatographic peaks #' with similar m/z and retention time to the ones from the internal #' standard, and extract among them the ID of the peaks with the #' highest intensity. 
intern_standard_peaks <- lapply(seq_along(lcms1), function(i) { tmp <- chrom_peaks[chrom_peaks[, \"sample\"] == i, , drop = FALSE] mtch <- matchValues(intern_standard, tmp, mzColname = c(\"mz\", \"mz\"), rtColname = c(\"RT\", \"rt\"), param = p_1) mtch <- filterMatches(mtch, p_2) mtch$target_peak_id }) |> do.call(what = cbind) #' Define the index of the selected chromatographic peaks in the #' full chromPeaks matrix idx <- match(intern_standard_peaks, rownames(chromPeaks(lcms1))) #' Extract the raw retention times for these rt_raw <- chromPeaks(lcms1_raw)[idx, \"rt\"] |> matrix(ncol = length(lcms1_raw)) #' Extract the adjusted retention times for these rt_adj <- chromPeaks(lcms1)[idx, \"rt\"] |> matrix(ncol = length(lcms1_raw)) list(all_raw = rowSds(rt_raw, na.rm = TRUE), all_adj = rowSds(rt_adj, na.rm = TRUE) ) |> vioplot(ylab = \"sd(retention time)\") grid()"},{"path":"https://rformassspectrometry.github.io/metabonaut/articles/a-end-to-end-untargeted-metabolomics.html","id":"correspondence","dir":"Articles","previous_headings":"","what":"Correspondence","title":"Complete end-to-end LC-MS/MS Metabolomic Data analysis","text":"briefly touched subject correspondence determine anchor peaks alignment. Generally, goal correspondence analysis identify chromatographic peaks originate types ions samples experiment group LC-MS features. point, proper configuration parameter bw crucial. illustrate sensible choices parameter’s value can made. use plotChromPeakDensity() function simulate correspondence analysis default values PeakDensity extracted ion chromatograms two selected isotope labeled ions. plot shows EIC top panel, apex position chromatographic peaks different samples (y-axis), along retention time (x-axis) lower panel. Figure 21. Initial correspondence analysis, Cystine. Figure 22. Initial correspondence analysis, Methionine. Grouping peaks depends smoothness previously mentioned density curve can configured parameter bw. 
seen , smoothness high properly group features. looking default parameters, can observe indeed, bw parameter set bw = 30, high modern UHPLC-MS setups. reduce value parameter 1.8 evaluate impact. Figure 23. Correspondence analysis optimized parameters, Cystine. Figure 24. Correspondence analysis optimized parameters, Methionine. can observe peaks now grouped accurately single feature test ion. important parameters optimized : binSize: data generated high resolution MS instrument, thus select low value parameter. ppm: TOF instruments, suggested use value ppm larger 0 accommodate higher measurement error instrument larger m/z values. minFraction: set minFraction = 0.75, hence defining features chromatographic peak identified least 75% samples one sample groups. sampleGroups: use information available sampleData’s \"phenotype\" column. correspondence analysis suggested evaluate results selected EICs. extract signal m/z similar isotope labeled methionine larger retention time range. Importantly, show actual correspondence results, set simulate = FALSE plotChromPeakDensity() function. Figure 25. Correspondence analysis results, Methionine. hoped, signal two different ions now grouped separate features. Generally, correspondence results evaluated extracted chromatograms. Another interesting information look distribution features along retention time axis. Table 5. Distribution features along retention time axis. results correspondence analysis now stored, along results preprocessing steps, within XcmsExperiment result object. correspondence results, .e., definition LC-MS features, can extracted using featureDefinitions() function. data frame provides average m/z retention time (columns \"mzmed\" \"rtmed\") characterize LC-MS feature. Column, \"peakidx\" contains indices chromatographic peaks assigned feature. actual abundances features, represent also final preprocessing results, can extracted featureValues() function: can note features (e.g. 
FT0003 FT0006) missing values samples. expected certain degree samples features, respectively ions, need present. address next section.","code":"#' Default parameter for the grouping and apply them to the test ions BPC param <- PeakDensityParam(sampleGroups = sampleData(lcms1)$phenotype, bw = 30) param Object of class: PeakDensityParam Parameters: - sampleGroups: [1] \"QC\" \"CVD\" \"CTR\" \"QC\" \"CTR\" \"CVD\" \"QC\" \"CTR\" \"CVD\" \"QC\" - bw: [1] 30 - minFraction: [1] 0.5 - minSamples: [1] 1 - binSize: [1] 0.25 - maxFeatures: [1] 50 - ppm: [1] 0 plotChromPeakDensity( eic_cystine, param = param, col = paste0(col_sample, \"80\"), peakCol = col_sample[chromPeaks(eic_cystine)[, \"sample\"]], peakBg = paste0(col_sample[chromPeaks(eic_cystine)[, \"sample\"]], 20), peakPch = 16) plotChromPeakDensity(eic_met, param = param, col = paste0(col_sample, \"80\"), peakCol = col_sample[chromPeaks(eic_met)[, \"sample\"]], peakBg = paste0(col_sample[chromPeaks(eic_met)[, \"sample\"]], 20), peakPch = 16) #' Updating parameters param <- PeakDensityParam(sampleGroups = sampleData(lcms1)$phenotype, bw = 1.8) plotChromPeakDensity( eic_cystine, param = param, col = paste0(col_sample, \"80\"), peakCol = col_sample[chromPeaks(eic_cystine)[, \"sample\"]], peakBg = paste0(col_sample[chromPeaks(eic_cystine)[, \"sample\"]], 20), peakPch = 16) plotChromPeakDensity(eic_met, param = param, col = paste0(col_sample, \"80\"), peakCol = col_sample[chromPeaks(eic_met)[, \"sample\"]], peakBg = paste0(col_sample[chromPeaks(eic_met)[, \"sample\"]], 20), peakPch = 16) #' Define the settings for the param param <- PeakDensityParam(sampleGroups = sampleData(lcms1)$phenotype, minFraction = 0.75, binSize = 0.01, ppm = 10, bw = 1.8) #' Apply to whole data lcms1 <- groupChromPeaks(lcms1, param = param) #' Extract chromatogram for an m/z similar to the one of the labeled methionine chr_test <- chromatogram(lcms1, mz = as.matrix(intern_standard[\"methionine_13C_15N\", c(\"mzmin\", \"mzmax\")]), rt = c(145, 
200), aggregationFun = \"max\") Processing chromatographic peaks Processing features plotChromPeakDensity( chr_test, simulate = FALSE, col = paste0(col_sample, \"80\"), peakCol = col_sample[chromPeaks(chr_test)[, \"sample\"]], peakBg = paste0(col_sample[chromPeaks(chr_test)[, \"sample\"]], 20), peakPch = 16) # Bin features per RT slices vc <- featureDefinitions(lcms1)$rtmed breaks <- seq(0, max(vc, na.rm = TRUE) + 1, length.out = 15) |> round(0) cuts <- cut(vc, breaks = breaks, include.lowest = TRUE) table(cuts) |> kable(format = \"pipe\") #' Definition of the features featureDefinitions(lcms1) |> head() mzmed mzmin mzmax rtmed rtmin rtmax npeaks CTR CVD QC FT0001 50.98979 50.98949 50.99038 203.6001 203.1181 204.2331 8 1 3 4 FT0002 51.05904 51.05880 51.05941 191.1675 190.8787 191.5050 9 2 3 4 FT0003 51.98657 51.98631 51.98699 203.1467 202.6406 203.6710 7 0 3 4 FT0004 53.02036 53.02009 53.02043 203.2343 202.5652 204.0901 10 3 3 4 FT0005 53.52080 53.52051 53.52102 203.1936 202.8490 204.0901 10 3 3 4 FT0006 54.01007 54.00988 54.01015 159.2816 158.8499 159.4484 6 1 3 2 peakidx ms_level FT0001 7702, 16.... 1 FT0002 7176, 16.... 1 FT0003 7680, 17.... 1 FT0004 7763, 17.... 1 FT0005 8353, 17.... 1 FT0006 5800, 15.... 
1 #' Extract feature abundances featureValues(lcms1, method = \"sum\") |> head() MS_QC_POOL_1_POS.mzML MS_A_POS.mzML MS_B_POS.mzML MS_QC_POOL_2_POS.mzML FT0001 421.6162 689.2422 NA 481.7436 FT0002 710.8078 875.9192 NA 693.6997 FT0003 445.5711 613.4410 NA 497.8866 FT0004 16994.5260 24605.7340 19766.707 17808.0933 FT0005 3284.2664 4526.0531 3521.822 3379.8909 FT0006 10681.7476 10009.6602 NA 10800.5449 MS_C_POS.mzML MS_D_POS.mzML MS_QC_POOL_3_POS.mzML MS_E_POS.mzML FT0001 NA 635.2732 439.6086 570.5849 FT0002 781.2416 648.4344 700.9716 1054.0207 FT0003 NA 634.9370 449.0933 NA FT0004 22780.6683 22873.1061 16965.7762 23432.1252 FT0005 4396.0762 4317.7734 3270.5290 4533.8667 FT0006 NA 7296.4262 NA 9236.9799 MS_F_POS.mzML MS_QC_POOL_4_POS.mzML FT0001 579.9360 437.0340 FT0002 534.4577 711.0361 FT0003 461.0465 232.1075 FT0004 22198.4607 16796.4497 FT0005 4161.0132 3142.2268 FT0006 6817.8785 NA"},{"path":"https://rformassspectrometry.github.io/metabonaut/articles/a-end-to-end-untargeted-metabolomics.html","id":"gap-filling","dir":"Articles","previous_headings":"","what":"Gap filling","title":"Complete end-to-end LC-MS/MS Metabolomic Data analysis","text":"previously observed missing values (NA) attributed various reasons. Although might represent genuinely missing value, indicating ion (feature) truly present particular sample, also result failure preceding chromatographic peak detection step. crucial able recover missing values latter category much possible reduce eventual need data imputation. next examine prevalent missing values present dataset: can observe substantial number missing values values dataset. Let’s therefore delve process gap-filling. first evaluate example features chromatographic peak detected samples: Figure 26. Examples chromatographic peaks missing values. instances, chromatographic peak identified one two selected samples (red line), hence missing value reported feature particular samples (blue line). 
However, cases, signal measured samples, thus, reporting missing value correct example. signal feature low, likely reason peak detection failed. rescue signal cases, fillChromPeaks() function can used ChromPeakAreaParam approach. method defines m/z-retention time area feature based detected peaks, signal respective ion expected. integrates intensities within area samples missing values feature. reported feature abundance. apply method using default parameters. fillChromPeaks() thus rescue missing data data set. Note , even sample ion present, worst case noise integrated, expected much lower actual chromatographic peak signal. Let’s look previously missing values : Figure 27. Examples chromatographic peaks missing values gap-filling. gap-filling, also blue colored sample chromatographic peak present peak area reported feature abundance sample. assess effectiveness gap-filling method rescuing signals, can also plot average signal features least one missing value average filled-signal. advisable perform analysis repeatedly measured samples; case, QC samples used. , extract: Feature values detected chromatographic peaks setting filled = FALSE featureValues() call. filled-signal first extracting detected gap-filled abundances replace values detected chromatographic peaks NA. , calculate row averages matrices plot . detected (x-axis) gap-filled (y-axis) values QC samples highly correlated. Especially higher abundances, agreement high, low intensities, can expected, differences higher trending correlation line. , addition, fit linear regression line data summarize results linear regression line slope 1.12 intercept -1.62. 
indicates filled-signal average 1.12 times higher detected signal.","code":"#' Percentage of missing values sum(is.na(featureValues(lcms1))) / length(featureValues(lcms1)) * 100 [1] 26.41597 ftidx <- which(is.na(rowSums(featureValues(lcms1)))) fts <- rownames(featureDefinitions(lcms1))[ftidx] farea <- featureArea(lcms1, features = fts[1:2]) chromatogram(lcms1[c(2, 3)], rt = farea[, c(\"rtmin\", \"rtmax\")], mz = farea[, c(\"mzmin\", \"mzmax\")]) |> plot(col = c(\"red\", \"blue\"), lwd = 2) Processing chromatographic peaks #' Fill in the missing values in the whole dataset lcms1 <- fillChromPeaks(lcms1, param = ChromPeakAreaParam(), chunkSize = 5) #' Percentage of missing values after gap-filling sum(is.na(featureValues(lcms1))) / length(featureValues(lcms1)) * 100 [1] 5.155492 Processing chromatographic peaks #' Get only detected signal in QC samples vals_detect <- featureValues(lcms1, filled = FALSE)[, QC_samples] #' Get detected and filled-in signal vals_filled <- featureValues(lcms1)[, QC_samples] #' Replace detected signal with NA vals_filled[!is.na(vals_detect)] <- NA #' Identify features with at least one filled peak has_filled <- is.na(rowSums(vals_detect)) #' Calculate row averages for features with missing values avg_detect <- rowMeans(vals_detect[has_filled, ], na.rm = TRUE) avg_filled <- rowMeans(vals_filled[has_filled, ], na.rm = TRUE) #' Plot the values against each other (in log2 scale) plot(log2(avg_detect), log2(avg_filled), xlim = range(log2(c(avg_detect, avg_filled)), na.rm = TRUE), ylim = range(log2(c(avg_detect, avg_filled)), na.rm = TRUE), pch = 21, bg = \"#00000020\", col = \"#00000080\") grid() abline(0, 1) #' fit a linear regression line to the data l <- lm(log2(avg_filled) ~ log2(avg_detect)) summary(l) Call: lm(formula = log2(avg_filled) ~ log2(avg_detect)) Residuals: Min 1Q Median 3Q Max -6.8176 -0.3807 0.1725 0.5492 6.7504 Coefficients: Estimate Std. 
Error t value Pr(>|t|) (Intercept) -1.62359 0.11545 -14.06 <2e-16 *** log2(avg_detect) 1.11763 0.01259 88.75 <2e-16 *** --- Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1 Residual standard error: 0.9366 on 2846 degrees of freedom (846 observations deleted due to missingness) Multiple R-squared: 0.7346, Adjusted R-squared: 0.7345 F-statistic: 7877 on 1 and 2846 DF, p-value: < 2.2e-16"},{"path":"https://rformassspectrometry.github.io/metabonaut/articles/a-end-to-end-untargeted-metabolomics.html","id":"preprocessing-results","dir":"Articles","previous_headings":"","what":"Preprocessing results","title":"Complete end-to-end LC-MS/MS Metabolomic Data analysis","text":"final results LC-MS data preprocessing stored within XcmsExperiment object. includes identified chromatographic peaks, alignment results, well correspondence results. addition, guarantee reproducibility, result object keeps track performed processing steps, including individual parameter objects used configure . processHistory() function returns list various applied processing steps chronological order. , extract information first step performed preprocessing. processParam() function used extract actual parameter class used configure processing step. final result whole LC-MS data preprocessing two-dimensional matrix abundances -called LC-MS features samples. Note stage analysis features characterized m/z retention time don’t yet information metabolite feature represent. seen , feature matrix can extracted featureValues() function corresponding feature characteristics (.e., m/z retention time values) using featureDefinitions() function. Thus, two arrays extracted xcms result object used/imported analysis packages processing. example also exported tab delimited text files, used external tool, used, also MS2 spectra available, feature-based molecular networking GNPS analysis environment [@nothias_feature-based_2020]. 
processing R, reference link raw MS data required, suggested extract xcms preprocessing result using quantify() function SummarizedExperiment object, Bioconductor’s default container data biological assays/experiments. simplifies integration Bioconductor analysis packages. quantify() function takes parameters featureValues() function, thus, call extract SummarizedExperiment detected, gap-filled, feature abundances: Sample identifications xcms result’s sampleData() now available colData() (column, sample annotations) featureDefinitions() rowData() (row, feature annotations). feature values added first assay() SummarizedExperiment even processing history available object’s metadata(). SummarizedExperiment supports multiple assays, numeric matrices dimensions. thus add detected gap-filled feature abundances additional assay SummarizedExperiment. Feature abundances can extracted assay() function. extract first 6 lines detected gap-filled feature abundances: advantage, addition container full preprocessing results also possibility easy intuitive creation data subsets ensuring data integrity. example easy subset full data selection features /samples: moving next step analysis, advisable save preprocessing results. multiple format options save , can found MsIO package documentation. save XcmsExperiment object file format handled alabster framework, ensures object can easily read languages like Python Javascript well loaded easily back R.","code":"#' Check first step of the process history processHistory(lcms1)[[1]] Object of class \"XProcessHistory\" type: Peak detection date: Mon Oct 21 15:53:03 2024 info: fileIndex: 1,2,3,4,5,6,7,8,9,10 Parameter class: CentWaveParam MS level(s) 1 #' Extract results as a SummarizedExperiment res <- quantify(lcms1, method = \"sum\", filled = FALSE) res class: SummarizedExperiment dim: 9068 10 metadata(6): '' '' ... '' '' assays(1): raw rownames(9068): FT0001 FT0002 ... FT9067 FT9068 rowData names(11): mzmed mzmin ... 
QC ms_level colnames(10): MS_QC_POOL_1_POS.mzML MS_A_POS.mzML ... MS_F_POS.mzML MS_QC_POOL_4_POS.mzML colData names(11): sample_name derived_spectra_data_file ... phenotype injection_index assays(res)$raw_filled <- featureValues(lcms1, method = \"sum\", filled = TRUE ) #' Different assay in the SummarizedExperiment object assayNames(res) [1] \"raw\" \"raw_filled\" assay(res, \"raw_filled\") |> head() MS_QC_POOL_1_POS.mzML MS_A_POS.mzML MS_B_POS.mzML MS_QC_POOL_2_POS.mzML FT0001 421.6162 689.2422 411.3295 481.7436 FT0002 710.8078 875.9192 457.5920 693.6997 FT0003 445.5711 613.4410 277.5022 497.8866 FT0004 16994.5260 24605.7340 19766.7069 17808.0933 FT0005 3284.2664 4526.0531 3521.8221 3379.8909 FT0006 10681.7476 10009.6602 9599.9701 10800.5449 MS_C_POS.mzML MS_D_POS.mzML MS_QC_POOL_3_POS.mzML MS_E_POS.mzML FT0001 314.7567 635.2732 439.6086 570.5849 FT0002 781.2416 648.4344 700.9716 1054.0207 FT0003 425.3774 634.9370 449.0933 556.2544 FT0004 22780.6683 22873.1061 16965.7762 23432.1252 FT0005 4396.0762 4317.7734 3270.5290 4533.8667 FT0006 4792.2390 7296.4262 2382.1788 9236.9799 MS_F_POS.mzML MS_QC_POOL_4_POS.mzML FT0001 579.9360 437.0340 FT0002 534.4577 711.0361 FT0003 461.0465 232.1075 FT0004 22198.4607 16796.4497 FT0005 4161.0132 3142.2268 FT0006 6817.8785 6911.5439 res[1:14, 3:8] class: SummarizedExperiment dim: 14 6 metadata(6): '' '' ... '' '' assays(2): raw raw_filled rownames(14): FT0001 FT0002 ... FT0013 FT0014 rowData names(11): mzmed mzmin ... QC ms_level colnames(6): MS_B_POS.mzML MS_QC_POOL_2_POS.mzML ... MS_QC_POOL_3_POS.mzML MS_E_POS.mzML colData names(11): sample_name derived_spectra_data_file ... phenotype injection_index #' Save the preprocessing results #' d <- file.path(tempdir(), \"objects/lcms1\") # saveMsObject(lcms1, AlabasterParam(path = d)) #' for now let's do R object because the previous method is not implemented yet. 
save(lcms1, file = \"preprocessed_lcms1.RData\")"},{"path":"https://rformassspectrometry.github.io/metabonaut/articles/a-end-to-end-untargeted-metabolomics.html","id":"data-normalization","dir":"Articles","previous_headings":"","what":"Data normalization","title":"Complete end-to-end LC-MS/MS Metabolomic Data analysis","text":"preprocessing, data normalization scaling might need applied remove technical variances data. simple approaches like median scaling can implemented lines R code, advanced normalization algorithms available packages Bioconductor’s preprocessCore. comprehensive workflow “Notame” also propose interesting normalization approach adaptable scalable user dataset [@klavus_notame_2020]. Generally, LC-MS data, bias can categorized three main groups[@broadhurst_guidelines_2018]: Variances introduced sample collection initial processing, can include differences sample amounts. type bias expected sample-specific affect signals sample way. Methods like median scaling, LOESS quantiles normalization can adjust bias. Signal drifts along measurement samples experiment. Reasons drifts can related aging instrumentation used (columns, detector), also changes metabolite abundances characteristics due reactions modifications, oxidation. changes expected affect samples measured later run rather ones measured beginning. reason, bias can play major role large experiments measured long time range usually considered affect individual metabolites (metabolite groups) differently. adjustment, moving average linear regression-based approaches can used. latter can example performed using adjust_lm() function MetaboCoreUtils package. Batch-related biases. comprise noise specific larger set samples, can set samples measured one LC-MS measurement run (.e. one analysis plate) samples measured using specific batch reagents. noise assumed affect samples one batch way linear modeling-based approaches can used adjust . 
Unwanted variation can arise various sources highly dependent experiment. Therefore, data normalization chosen carefully based experimental design, statistical aims, balance accuracy precision achieved use auxiliary information. Sample preparation biases can evaluated using internal standards, depending however also added sample mixes sample processing. Repeated measurements QC samples hand allows estimate correct LC-MS specific biases. Also, proper planning experiment, measurement study samples random order, can largely avoid biases introduced mentioned sources variance. workflow present tools assess data quality evaluate need normalization well options normalization. space reasons able provide solutions adjust possible sources variation. principal component analysis (PCA) helpful tool initial, unsupervised, visualization data also provides insights potential quality issues data. order apply PCA measured feature abundances, need however impute (still present) missing values. assume missing values (gap-filling step) represent signal detection limit. cases, missing values can replaced random values sampled uniform distribution, ranging half smallest measured value smallest measured value specific feature. uniform distribution defined two parameters (minimum maximum) values equal probability selected. impute missing values approach add resulting data matrix new assay result object. PCA powerful tool detecting biases data. dimensionality reduction technique, enables visualization data lower-dimensional space. context LC-MS data, PCA can used identify overall biases batch, sample, injection index, etc. However, important note PCA linear method may able detect biases data. plotting PCA, apply log2 transform, center scale data. log2 transformation applied stabilize variance centering remove dependency absolute abundances. Figure 28. PCA data. PCA shows clear separation study samples (plasma) QC samples (serum) first principal component (PC1). 
separation based phenotype visible third principal component (PC3). cases, can better option remove imputed values evaluate PCA . especially true imputed values replacing large proportion data. Global differences feature abundances samples (e.g. due sample-specific biases) can evaluated plotting distribution log2 transformed feature abundances using boxplots violin plots. show number detected chromatographic peaks per sample distribution log2 transformed feature abundances. Figure 29. Number detected peaks feature abundances. upper part plot show gap filling steps allowed rescue substantial number NAs allowed us consistent number feature values per sample. consistency aligns assumption every sample similar amount features detected. Additionally observe , average, signal distribution individual samples similar. alternative way evaluate differences abundances samples relative log abundance (RLA) plots [@de_livera_normalizing_2012]. RLA value abundance feature sample relative median abundance feature across multiple samples. can discriminate within group across group RLAs, depending whether abundance compared samples within sample group across samples. Within group RLA plots assess tightness replicates within groups median close zero low variation around . used across groups, allow compare behavior groups. Generally, -sample differences can easily spotted using RLA plots. calculate visualize within group RLA values using rowRla() function MsCoreUtils package defining parameter f sample groups. Figure 30. RLA plot raw data filled data. RLA plot , can observe medians samples indeed centered around 0. Exception two CVD samples. Thus, distribution signals across samples comparable, differences seem present require sample normalization. Depending added sample mixes, allow evaluation variances introduced subsequent processing analysis steps. present experiment, added original plasma samples sample extraction included also protein lipid removal steps. 
can therefore used evaluate variances introduced sample extraction subsequent steps, can however used infer conclusions performance differences original sample collection (blood drawing, storage, plasma creation). use matchValues() function identify features representing signal . filter matches keep match single feature using filterMatches() function combination SingleMatchParam. internal standards play crucial role guiding normalization process. Given assumption samples artificially spiked, possess known ground truth—abundance intensity internal standard consistent. difference expected due technical differences/variance. Consequently, normalization aims minimize variation samples internal standard, reinforcing reliability analyses. previous RLA plot showed data biases need corrected. Therefore, implement -sample normalization using filled-features. process effectively mitigates variations influenced technical issues, differences sample preparation, processing injection methods. instance, employ commonly used technique known median scaling [@de_livera_normalizing_2012]. method involves computing median sample, followed determining median individual sample medians. ensures consistent median values sample throughout entire data set. Maintaining uniformity average total metabolite abundance across samples crucial effective implementation. process aims establish shared baseline central tendency metabolite abundance, mitigating impact sample-specific technical variations. approach fosters robust comparable analysis top features across data set. assumption normalizing based median, known lower sensitivity extreme values, enhances comparability top features ensures consistent average abundance across samples. median scaling calculated imputed non-imputed data, set stored separately within SummarizedExperiment object. 
approach facilitates testing various normalization strategies maintaining record processing steps undertaken, enabling easy regression previous stages necessary. crucial evaluate effectiveness normalization process. can achieved comparing distribution log2 transformed feature abundances normalization. Additionally, RLA plots can used assess tightness replicates within groups compare behavior groups. Figure 31. PC1 PC2 data normalization. Normalization large impact PC1 PC2, separation study groups PC3 seems better difference QC samples lower normalization (see ). Figure 32. PC3 PC4 data normalization. PCA plots show normalization process changed overall structure data. separation study QC samples remains . expected results normalization correct biological variance technical. compare RLA plots -sample normalization evaluate impact data. Figure 33. RLA plot normalization. normalization process effectively centered data around median medians samples now closer zero. next evaluate coefficient variation (CV, also referred relative standard deviation RSD) features across samples either QC study samples. QC samples, CV represent technical noise, study samples include also expected biological differences. Thus, normalization reduce CV QC samples, slightly reducing CV study samples. CV calculated using rowRsd() function MetaboCoreUtils package. setting mad = TRUE use robust calculation using median absolute deviation instead standard deviation. Table 6. Distribution CV values across samples raw normalized data. table shows distribution CV raw normalized data. first column highlights % data given CV value, e.g. 25% data CV equal lower 0.04557 QC_raw data. anticipated, CV values QCs, reflect technical variance, lower compared study samples, include technical biological variance. Overall, minimal disparity exists raw normalized data, positive indication normalization process introduced bias dataset, also reflects little differences average abundances sample raw data. 
overall conclusion normalization process little variance present beginning, normalization however able center data around median (shown RLA plot). Given simplicity limited size example dataset, conclude normalization process stage. intricate datasets diverse biases, tailored approach devised. include also approaches adjust signal drifts batch effects. One possible option use linear-model based approach can example applied adjust_lm() function MetaboCoreUtils package.","code":"#' Load preprocessing results ## load(\"SumExp.RData\") ## loadResults(RDataParam(\"data.RData\")) #' Impute missing values using an uniform distribution na_unidis <- function(z) { na <- is.na(z) if (any(na)) { min = min(z, na.rm = TRUE) z[na] <- runif(sum(na), min = min/2, max = min) } z } #' Row-wise impute missing values and add the data as a new assay tmp <- apply(assay(res, \"raw_filled\"), MARGIN = 1, na_unidis) assays(res)$raw_filled_imputed <- t(tmp) #' Log2 transform and scale data vals <- assay(res, \"raw_filled_imputed\") |> log2() |> t() |> scale(center = TRUE, scale = TRUE) #' Perform the PCA pca_res <- prcomp(vals, scale = FALSE, center = FALSE) #' Plot the results vals_st <- cbind(vals, phenotype = res$phenotype) pca_12 <- autoplot(pca_res, data = vals_st , colour = 'phenotype', scale = 0) + scale_color_manual(values = col_phenotype) pca_34 <- autoplot(pca_res, data = vals_st, colour = 'phenotype', x = 3, y = 4, scale = 0) + scale_color_manual(values = col_phenotype) grid.arrange(pca_12, pca_34, ncol = 2) layout(mat = matrix(1:3, ncol = 1), height = c(0.2, 0.2, 0.8)) par(mar = c(0.2, 4.5, 0.2, 3)) barplot(apply(assay(res, \"raw\"), MARGIN = 2, function(x) sum(!is.na(x))), col = paste0(col_sample, 80), border = col_sample, ylab = \"# detected peaks\", xaxt = \"n\", space = 0.012) grid(nx = NA, ny = NULL) barplot(apply(assay(res, \"raw_filled\"), MARGIN = 2, function(x) sum(!is.na(x))), col = paste0(col_sample, 80), border = col_sample, ylab = \"# detected + filled peaks\", xaxt = 
\"n\", space = 0.012) grid(nx = NA, ny = NULL) vioplot(log2(assay(res, \"raw_filled\")), xaxt = \"n\", ylab = expression(log[2]~feature~abundance), col = paste0(col_sample, 80), border = col_sample) points(colMedians(log2(assay(res, \"raw_filled\")), na.rm = TRUE), type = \"b\", pch = 1) grid(nx = NA, ny = NULL) legend(\"topright\", col = col_phenotype, legend = names(col_phenotype), lty=1, lwd = 2, xpd = TRUE, ncol = 3, cex = 0.8, bty = \"n\") par(mfrow = c(1, 1), mar = c(3.5, 4.5, 2.5, 1)) boxplot(MsCoreUtils::rowRla(assay(res, \"raw_filled\"), f = res$phenotype, transform = \"log2\"), cex = 0.5, pch = 16, col = paste0(col_sample, 80), ylab = \"RLA\", border = col_sample, boxwex = 1, outline = FALSE, xaxt = \"n\", main = \"Relative log abundance\", cex.main = 1) axis(side = 1, at = seq_len(ncol(res)), labels = colData(res)$sample_name) grid(nx = NA, ny = NULL) abline(h = 0, lty=3, lwd = 1, col = \"black\") legend(\"topright\", col = col_phenotype, legend = names(col_phenotype), lty=1, lwd = 2, xpd = TRUE, ncol = 3, cex = 0.8, bty = \"n\") # Do we keep IS in normalisation ? Does not give much info... Would simplify a bit #' Creating a column within our IS table intern_standard$feature_id <- NA_character_ #' Identify features matching m/z and RT of internal standards. fdef <- featureDefinitions(lcms1) fdef$feature_id <- rownames(fdef) match_intern_standard <- matchValues( query = intern_standard, target = fdef, mzColname = c(\"mz\", \"mzmed\"), rtColname = c(\"RT\", \"rtmed\"), param = MzRtParam(ppm = 50, toleranceRt = 10)) #' Keep only matches with a 1:1 mapping standard to feature. 
param <- SingleMatchParam(duplicates = \"remove\", column = \"score_rt\", decreasing = TRUE) match_intern_standard <- filterMatches(match_intern_standard, param) intern_standard$feature_id <- match_intern_standard$target_feature_id intern_standard <- intern_standard[!is.na(intern_standard$feature_id), ] #' Compute median and generate normalization factor mdns <- apply(assay(res, \"raw_filled\"), MARGIN = 2, median, na.rm = TRUE ) nf_mdn <- mdns / median(mdns) #' divide dataset by median of median and create a new assay. assays(res)$norm <- sweep(assay(res, \"raw_filled\"), MARGIN = 2, nf_mdn, '/') assays(res)$norm_imputed <- sweep(assay(res, \"raw_filled_imputed\"), MARGIN = 2, nf_mdn, '/') #' Data before normalization vals_st <- cbind(vals, phenotype = res$phenotype) pca_raw <- autoplot(pca_res, data = vals_st, colour = 'phenotype', scale = 0) + scale_color_manual(values = col_phenotype) #' Data after normalization vals_norm <- apply(assay(res, \"norm\"), MARGIN = 1, na_unidis) |> log2() |> scale(center = TRUE, scale = TRUE) pca_res_norm <- prcomp(vals_norm, scale = FALSE, center = FALSE) vals_st_norm <- cbind(vals_norm, phenotype = res$phenotype) pca_adj <- autoplot(pca_res_norm, data = vals_st_norm, colour = 'phenotype', scale = 0) + scale_color_manual(values = col_phenotype) grid.arrange(pca_raw, pca_adj, ncol = 2) pca_raw <- autoplot(pca_res, data = vals_st , colour = 'phenotype', x = 3, y = 4, scale = 0) + scale_color_manual(values = col_phenotype) pca_adj <- autoplot(pca_res_norm, data = vals_st_norm, colour = 'phenotype', x = 3, y = 4, scale = 0) + scale_color_manual(values = col_phenotype) grid.arrange(pca_raw, pca_adj, ncol = 2) par(mfrow = c(2, 1), mar = c(3.5, 4.5, 2.5, 1)) boxplot(rowRla(assay(res, \"raw_filled\"), group = res$phenotype), cex = 0.5, pch = 16, ylab = \"RLA\", border = col_sample, col = paste0(col_sample, 80), cex.main = 1, outline = FALSE, xaxt = \"n\", main = \"Raw data\", boxwex = 1) grid(nx = NA, ny = NULL) legend(\"topright\", inset 
= c(0, -0.2), col = col_phenotype, legend = names(col_phenotype), lty=1, lwd = 2, xpd = TRUE, ncol = 3, cex = 0.7, bty = \"n\") abline(h = 0, lty=3, lwd = 1, col = \"black\") boxplot(rowRla(assay(res, \"norm\"), group = res$phenotype), cex = 0.5, pch = 16, ylab = \"RLA\", border = col_sample, col = paste0(col_sample, 80), boxwex = 1, outline = FALSE, xaxt = \"n\", main = \"Normalized data\", cex.main = 1) axis(side = 1, at = seq_len(ncol(res)), labels = res$sample_name) grid(nx = NA, ny = NULL) abline(h = 0, lty = 3, lwd = 1, col = \"black\") #' Calculate the CV values index_study <- res$phenotype %in% c(\"CTR\", \"CVD\") index_QC <- res$phenotype == \"QC\" sample_res <- cbind( QC_raw = rowRsd(assay(res, \"raw_filled\")[, index_QC], na.rm = TRUE, mad = TRUE), QC_norm = rowRsd(assay(res, \"norm\")[, index_QC], na.rm = TRUE, mad = TRUE), Study_raw = rowRsd(assay(res, \"raw_filled\")[, index_study], na.rm = TRUE, mad = TRUE), Study_norm = rowRsd(assay(res, \"norm\")[, index_study], na.rm = TRUE, mad = TRUE) ) #' Summarize the values across features res_df <- data.frame( QC_raw = quantile(sample_res[, \"QC_raw\"], na.rm = TRUE), QC_norm = quantile(sample_res[, \"QC_norm\"], na.rm = TRUE), Study_raw = quantile(sample_res[, \"Study_raw\"], na.rm = TRUE), Study_norm = quantile(sample_res[, \"Study_norm\"], na.rm = TRUE) ) kable(res_df, format = \"pipe\")"},{"path":"https://rformassspectrometry.github.io/metabonaut/articles/a-end-to-end-untargeted-metabolomics.html","id":"initial-quality-assessment","dir":"Articles","previous_headings":"","what":"Initial quality assessment","title":"Complete end-to-end LC-MS/MS Metabolomic Data analysis","text":"principal component analysis (PCA) helpful tool initial, unsupervised, visualization data also provides insights potential quality issues data. order apply PCA measured feature abundances, need however impute (still present) missing values. assume missing values (gap-filling step) represent signal detection limit.
cases, missing values can replaced random values sampled uniform distribution, ranging half smallest measured value smallest measured value specific feature. uniform distribution defined two parameters (minimum maximum) values equal probability selected. impute missing values approach add resulting data matrix new assay result object. PCA powerful tool detecting biases data. dimensionality reduction technique, enables visualization data lower-dimensional space. context LC-MS data, PCA can used identify overall biases batch, sample, injection index, etc. However, important note PCA linear method may able detect biases data. plotting PCA, apply log2 transform, center scale data. log2 transformation applied stabilize variance centering remove dependency absolute abundances. Figure 28. PCA data. PCA shows clear separation study samples (plasma) QC samples (serum) first principal component (PC1). separation based phenotype visible third principal component (PC3). cases, can better option remove imputed values evaluate PCA . especially true imputed values replacing large proportion data. Global differences feature abundances samples (e.g. due sample-specific biases) can evaluated plotting distribution log2 transformed feature abundances using boxplots violin plots. show number detected chromatographic peaks per sample distribution log2 transformed feature abundances. Figure 29. Number detected peaks feature abundances. upper part plot show gap filling steps allowed rescue substantial number NAs allowed us consistent number feature values per sample. consistency aligns assumption every sample similar amount features detected. Additionally observe , average, signal distribution individual samples similar. alternative way evaluate differences abundances samples relative log abundance (RLA) plots [@de_livera_normalizing_2012]. RLA value abundance feature sample relative median abundance feature across multiple samples.
can discriminate within group across group RLAs, depending whether abundance compared samples within sample group across samples. Within group RLA plots assess tightness replicates within groups median close zero low variation around . used across groups, allow compare behavior groups. Generally, -sample differences can easily spotted using RLA plots. calculate visualize within group RLA values using rowRla() function MsCoreUtils package defining parameter f sample groups. Figure 30. RLA plot raw data filled data. RLA plot , can observe medians samples indeed centered around 0. Exception two CVD samples. Thus, distribution signals across samples comparable, differences seem present require sample normalization. Depending added sample mixes, allow evaluation variances introduced subsequent processing analysis steps. present experiment, added original plasma samples sample extraction included also protein lipid removal steps. can therefore used evaluate variances introduced sample extraction subsequent steps, can however used infer conclusions performance differences original sample collection (blood drawing, storage, plasma creation). use matchValues() function identify features representing signal . filter matches keep match single feature using filterMatches() function combination SingleMatchParam. internal standards play crucial role guiding normalization process. Given assumption samples artificially spiked, possess known ground truth—abundance intensity internal standard consistent. difference expected due technical differences/variance. 
Consequently, normalization aims minimize variation samples internal standard, reinforcing reliability analyses.","code":"#' Load preprocessing results ## load(\"SumExp.RData\") ## loadResults(RDataParam(\"data.RData\")) #' Impute missing values using an uniform distribution na_unidis <- function(z) { na <- is.na(z) if (any(na)) { min = min(z, na.rm = TRUE) z[na] <- runif(sum(na), min = min/2, max = min) } z } #' Row-wise impute missing values and add the data as a new assay tmp <- apply(assay(res, \"raw_filled\"), MARGIN = 1, na_unidis) assays(res)$raw_filled_imputed <- t(tmp) #' Log2 transform and scale data vals <- assay(res, \"raw_filled_imputed\") |> log2() |> t() |> scale(center = TRUE, scale = TRUE) #' Perform the PCA pca_res <- prcomp(vals, scale = FALSE, center = FALSE) #' Plot the results vals_st <- cbind(vals, phenotype = res$phenotype) pca_12 <- autoplot(pca_res, data = vals_st , colour = 'phenotype', scale = 0) + scale_color_manual(values = col_phenotype) pca_34 <- autoplot(pca_res, data = vals_st, colour = 'phenotype', x = 3, y = 4, scale = 0) + scale_color_manual(values = col_phenotype) grid.arrange(pca_12, pca_34, ncol = 2) layout(mat = matrix(1:3, ncol = 1), height = c(0.2, 0.2, 0.8)) par(mar = c(0.2, 4.5, 0.2, 3)) barplot(apply(assay(res, \"raw\"), MARGIN = 2, function(x) sum(!is.na(x))), col = paste0(col_sample, 80), border = col_sample, ylab = \"# detected peaks\", xaxt = \"n\", space = 0.012) grid(nx = NA, ny = NULL) barplot(apply(assay(res, \"raw_filled\"), MARGIN = 2, function(x) sum(!is.na(x))), col = paste0(col_sample, 80), border = col_sample, ylab = \"# detected + filled peaks\", xaxt = \"n\", space = 0.012) grid(nx = NA, ny = NULL) vioplot(log2(assay(res, \"raw_filled\")), xaxt = \"n\", ylab = expression(log[2]~feature~abundance), col = paste0(col_sample, 80), border = col_sample) points(colMedians(log2(assay(res, \"raw_filled\")), na.rm = TRUE), type = \"b\", pch = 1) grid(nx = NA, ny = NULL) legend(\"topright\", col = col_phenotype, 
legend = names(col_phenotype), lty=1, lwd = 2, xpd = TRUE, ncol = 3, cex = 0.8, bty = \"n\") par(mfrow = c(1, 1), mar = c(3.5, 4.5, 2.5, 1)) boxplot(MsCoreUtils::rowRla(assay(res, \"raw_filled\"), f = res$phenotype, transform = \"log2\"), cex = 0.5, pch = 16, col = paste0(col_sample, 80), ylab = \"RLA\", border = col_sample, boxwex = 1, outline = FALSE, xaxt = \"n\", main = \"Relative log abundance\", cex.main = 1) axis(side = 1, at = seq_len(ncol(res)), labels = colData(res)$sample_name) grid(nx = NA, ny = NULL) abline(h = 0, lty=3, lwd = 1, col = \"black\") legend(\"topright\", col = col_phenotype, legend = names(col_phenotype), lty=1, lwd = 2, xpd = TRUE, ncol = 3, cex = 0.8, bty = \"n\") # Do we keep IS in normalisation ? Does not give much info... Would simplify a bit #' Creating a column within our IS table intern_standard$feature_id <- NA_character_ #' Identify features matching m/z and RT of internal standards. fdef <- featureDefinitions(lcms1) fdef$feature_id <- rownames(fdef) match_intern_standard <- matchValues( query = intern_standard, target = fdef, mzColname = c(\"mz\", \"mzmed\"), rtColname = c(\"RT\", \"rtmed\"), param = MzRtParam(ppm = 50, toleranceRt = 10)) #' Keep only matches with a 1:1 mapping standard to feature. param <- SingleMatchParam(duplicates = \"remove\", column = \"score_rt\", decreasing = TRUE) match_intern_standard <- filterMatches(match_intern_standard, param) intern_standard$feature_id <- match_intern_standard$target_feature_id intern_standard <- intern_standard[!is.na(intern_standard$feature_id), ]"},{"path":"https://rformassspectrometry.github.io/metabonaut/articles/a-end-to-end-untargeted-metabolomics.html","id":"principal-component-analysis","dir":"Articles","previous_headings":"Data normalization","what":"Principal Component Analysis","title":"Complete end-to-end LC-MS/MS Metabolomic Data analysis","text":"PCA powerful tool detecting biases data. 
dimensionality reduction technique, enables visualization data lower-dimensional space. context LC-MS data, PCA can used identify overall biases batch, sample, injection index, etc. However, important note PCA linear method may able detect biases data. plotting PCA, apply log2 transform, center scale data. log2 transformation applied stabilize variance centering remove dependency absolute abundances. Figure 28. PCA data. PCA shows clear separation study samples (plasma) QC samples (serum) first principal component (PC1). separation based phenotype visible third principal component (PC3). cases, can better option remove imputed values evaluate PCA . especially true imputed values replacing large proportion data.","code":"#' Log2 transform and scale data vals <- assay(res, \"raw_filled_imputed\") |> log2() |> t() |> scale(center = TRUE, scale = TRUE) #' Perform the PCA pca_res <- prcomp(vals, scale = FALSE, center = FALSE) #' Plot the results vals_st <- cbind(vals, phenotype = res$phenotype) pca_12 <- autoplot(pca_res, data = vals_st , colour = 'phenotype', scale = 0) + scale_color_manual(values = col_phenotype) pca_34 <- autoplot(pca_res, data = vals_st, colour = 'phenotype', x = 3, y = 4, scale = 0) + scale_color_manual(values = col_phenotype) grid.arrange(pca_12, pca_34, ncol = 2)"},{"path":"https://rformassspectrometry.github.io/metabonaut/articles/a-end-to-end-untargeted-metabolomics.html","id":"intensity-evaluation","dir":"Articles","previous_headings":"Data normalization","what":"Intensity evaluation","title":"Complete end-to-end LC-MS/MS Metabolomic Data analysis","text":"Global differences feature abundances samples (e.g. due sample-specific biases) can evaluated plotting distribution log2 transformed feature abundances using boxplots violin plots. show number detected chromatographic peaks per sample distribution log2 transformed feature abundances. Figure 29. Number detected peaks feature abundances. 
upper part plot show gap filling steps allowed rescue substantial number NAs allowed us consistent number feature values per sample. consistency aligns assumption every sample similar amount features detected. Additionally observe , average, signal distribution individual samples similar. alternative way evaluate differences abundances samples relative log abundance (RLA) plots [@de_livera_normalizing_2012]. RLA value abundance feature sample relative median abundance feature across multiple samples. can discriminate within group across group RLAs, depending whether abundance compared samples within sample group across samples. Within group RLA plots assess tightness replicates within groups median close zero low variation around . used across groups, allow compare behavior groups. Generally, -sample differences can easily spotted using RLA plots. calculate visualize within group RLA values using rowRla() function MsCoreUtils package defining parameter f sample groups. Figure 30. RLA plot raw data filled data. RLA plot , can observe medians samples indeed centered around 0. Exception two CVD samples.
Thus, distribution signals across samples comparable, differences seem present require sample normalization.","code":"layout(mat = matrix(1:3, ncol = 1), height = c(0.2, 0.2, 0.8)) par(mar = c(0.2, 4.5, 0.2, 3)) barplot(apply(assay(res, \"raw\"), MARGIN = 2, function(x) sum(!is.na(x))), col = paste0(col_sample, 80), border = col_sample, ylab = \"# detected peaks\", xaxt = \"n\", space = 0.012) grid(nx = NA, ny = NULL) barplot(apply(assay(res, \"raw_filled\"), MARGIN = 2, function(x) sum(!is.na(x))), col = paste0(col_sample, 80), border = col_sample, ylab = \"# detected + filled peaks\", xaxt = \"n\", space = 0.012) grid(nx = NA, ny = NULL) vioplot(log2(assay(res, \"raw_filled\")), xaxt = \"n\", ylab = expression(log[2]~feature~abundance), col = paste0(col_sample, 80), border = col_sample) points(colMedians(log2(assay(res, \"raw_filled\")), na.rm = TRUE), type = \"b\", pch = 1) grid(nx = NA, ny = NULL) legend(\"topright\", col = col_phenotype, legend = names(col_phenotype), lty=1, lwd = 2, xpd = TRUE, ncol = 3, cex = 0.8, bty = \"n\") par(mfrow = c(1, 1), mar = c(3.5, 4.5, 2.5, 1)) boxplot(MsCoreUtils::rowRla(assay(res, \"raw_filled\"), f = res$phenotype, transform = \"log2\"), cex = 0.5, pch = 16, col = paste0(col_sample, 80), ylab = \"RLA\", border = col_sample, boxwex = 1, outline = FALSE, xaxt = \"n\", main = \"Relative log abundance\", cex.main = 1) axis(side = 1, at = seq_len(ncol(res)), labels = colData(res)$sample_name) grid(nx = NA, ny = NULL) abline(h = 0, lty=3, lwd = 1, col = \"black\") legend(\"topright\", col = col_phenotype, legend = names(col_phenotype), lty=1, lwd = 2, xpd = TRUE, ncol = 3, cex = 0.8, bty = \"n\")"},{"path":"https://rformassspectrometry.github.io/metabonaut/articles/a-end-to-end-untargeted-metabolomics.html","id":"internal-standards","dir":"Articles","previous_headings":"Data normalization","what":"Internal standards","title":"Complete end-to-end LC-MS/MS Metabolomic Data analysis","text":"Depending added sample mixes, allow 
evaluation variances introduced subsequent processing analysis steps. present experiment, added original plasma samples sample extraction included also protein lipid removal steps. can therefore used evaluate variances introduced sample extraction subsequent steps, can however used infer conclusions performance differences original sample collection (blood drawing, storage, plasma creation). use matchValues() function identify features representing signal . filter matches keep match single feature using filterMatches() function combination SingleMatchParam. internal standards play crucial role guiding normalization process. Given assumption samples artificially spiked, possess known ground truth—abundance intensity internal standard consistent. difference expected due technical differences/variance. Consequently, normalization aims minimize variation samples internal standard, reinforcing reliability analyses.","code":"# Do we keep IS in normalisation ? Does not give much info... Would simplify a bit #' Creating a column within our IS table intern_standard$feature_id <- NA_character_ #' Identify features matching m/z and RT of internal standards. fdef <- featureDefinitions(lcms1) fdef$feature_id <- rownames(fdef) match_intern_standard <- matchValues( query = intern_standard, target = fdef, mzColname = c(\"mz\", \"mzmed\"), rtColname = c(\"RT\", \"rtmed\"), param = MzRtParam(ppm = 50, toleranceRt = 10)) #' Keep only matches with a 1:1 mapping standard to feature. 
param <- SingleMatchParam(duplicates = \"remove\", column = \"score_rt\", decreasing = TRUE) match_intern_standard <- filterMatches(match_intern_standard, param) intern_standard$feature_id <- match_intern_standard$target_feature_id intern_standard <- intern_standard[!is.na(intern_standard$feature_id), ]"},{"path":"https://rformassspectrometry.github.io/metabonaut/articles/a-end-to-end-untargeted-metabolomics.html","id":"between-sample-normalisation","dir":"Articles","previous_headings":"","what":"Between sample normalisation","title":"Complete end-to-end LC-MS/MS Metabolomic Data analysis","text":"previous RLA plot showed data biases need corrected. Therefore, implement -sample normalization using filled-features. process effectively mitigates variations influenced technical issues, differences sample preparation, processing injection methods. instance, employ commonly used technique known median scaling [@de_livera_normalizing_2012]. method involves computing median sample, followed determining median individual sample medians. ensures consistent median values sample throughout entire data set. Maintaining uniformity average total metabolite abundance across samples crucial effective implementation. process aims establish shared baseline central tendency metabolite abundance, mitigating impact sample-specific technical variations. approach fosters robust comparable analysis top features across data set. assumption normalizing based median, known lower sensitivity extreme values, enhances comparability top features ensures consistent average abundance across samples. median scaling calculated imputed non-imputed data, set stored separately within SummarizedExperiment object. 
approach facilitates testing various normalization strategies maintaining record processing steps undertaken, enabling easy regression previous stages necessary.","code":"#' Compute median and generate normalization factor mdns <- apply(assay(res, \"raw_filled\"), MARGIN = 2, median, na.rm = TRUE ) nf_mdn <- mdns / median(mdns) #' divide dataset by median of median and create a new assay. assays(res)$norm <- sweep(assay(res, \"raw_filled\"), MARGIN = 2, nf_mdn, '/') assays(res)$norm_imputed <- sweep(assay(res, \"raw_filled_imputed\"), MARGIN = 2, nf_mdn, '/')"},{"path":"https://rformassspectrometry.github.io/metabonaut/articles/a-end-to-end-untargeted-metabolomics.html","id":"median-scaling","dir":"Articles","previous_headings":"Data normalization","what":"Median scaling","title":"Complete end-to-end LC-MS/MS Metabolomic Data analysis","text":"method involves computing median sample, followed determining median individual sample medians. ensures consistent median values sample throughout entire data set. Maintaining uniformity average total metabolite abundance across samples crucial effective implementation. process aims establish shared baseline central tendency metabolite abundance, mitigating impact sample-specific technical variations. approach fosters robust comparable analysis top features across data set. assumption normalizing based median, known lower sensitivity extreme values, enhances comparability top features ensures consistent average abundance across samples. median scaling calculated imputed non-imputed data, set stored separately within SummarizedExperiment object. 
approach facilitates testing various normalization strategies maintaining record processing steps undertaken, enabling easy regression previous stages necessary.","code":"#' Compute median and generate normalization factor mdns <- apply(assay(res, \"raw_filled\"), MARGIN = 2, median, na.rm = TRUE ) nf_mdn <- mdns / median(mdns) #' divide dataset by median of median and create a new assay. assays(res)$norm <- sweep(assay(res, \"raw_filled\"), MARGIN = 2, nf_mdn, '/') assays(res)$norm_imputed <- sweep(assay(res, \"raw_filled_imputed\"), MARGIN = 2, nf_mdn, '/')"},{"path":"https://rformassspectrometry.github.io/metabonaut/articles/a-end-to-end-untargeted-metabolomics.html","id":"assessing-overall-effectiveness-of-the-normalization-approach","dir":"Articles","previous_headings":"","what":"Assessing overall effectiveness of the normalization approach","title":"Complete end-to-end LC-MS/MS Metabolomic Data analysis","text":"crucial evaluate effectiveness normalization process. can achieved comparing distribution log2 transformed feature abundances normalization. Additionally, RLA plots can used assess tightness replicates within groups compare behavior groups. Figure 31. PC1 PC2 data normalization. Normalization large impact PC1 PC2, separation study groups PC3 seems better difference QC samples lower normalization (see ). Figure 32. PC3 PC4 data normalization. PCA plots show normalization process changed overall structure data. separation study QC samples remains . expected results normalization correct biological variance technical. compare RLA plots -sample normalization evaluate impact data. Figure 33. RLA plot normalization. normalization process effectively centered data around median medians samples now closer zero. next evaluate coefficient variation (CV, also referred relative standard deviation RSD) features across samples either QC study samples. QC samples, CV represent technical noise, study samples include also expected biological differences. 
Thus, normalization reduce CV QC samples, slightly reducing CV study samples. CV calculated using rowRsd() function MetaboCoreUtils package. setting mad = TRUE use robust calculation using median absolute deviation instead standard deviation. Table 6. Distribution CV values across samples raw normalized data. table shows distribution CV raw normalized data. first column highlights % data given CV value, e.g. 25% data CV equal lower 0.04557 QC_raw data. anticipated, CV values QCs, reflect technical variance, lower compared study samples, include technical biological variance. Overall, minimal disparity exists raw normalized data, positive indication normalization process introduced bias dataset, also reflects little differences average abundances sample raw data. overall conclusion normalization process little variance present beginning, normalization however able center data around median (shown RLA plot). Given simplicity limited size example dataset, conclude normalization process stage. intricate datasets diverse biases, tailored approach devised. include also approaches adjust signal drifts batch effects. 
One possible option use linear-model based approach can example applied adjust_lm() function MetaboCoreUtils package.","code":"#' Data before normalization vals_st <- cbind(vals, phenotype = res$phenotype) pca_raw <- autoplot(pca_res, data = vals_st, colour = 'phenotype', scale = 0) + scale_color_manual(values = col_phenotype) #' Data after normalization vals_norm <- apply(assay(res, \"norm\"), MARGIN = 1, na_unidis) |> log2() |> scale(center = TRUE, scale = TRUE) pca_res_norm <- prcomp(vals_norm, scale = FALSE, center = FALSE) vals_st_norm <- cbind(vals_norm, phenotype = res$phenotype) pca_adj <- autoplot(pca_res_norm, data = vals_st_norm, colour = 'phenotype', scale = 0) + scale_color_manual(values = col_phenotype) grid.arrange(pca_raw, pca_adj, ncol = 2) pca_raw <- autoplot(pca_res, data = vals_st , colour = 'phenotype', x = 3, y = 4, scale = 0) + scale_color_manual(values = col_phenotype) pca_adj <- autoplot(pca_res_norm, data = vals_st_norm, colour = 'phenotype', x = 3, y = 4, scale = 0) + scale_color_manual(values = col_phenotype) grid.arrange(pca_raw, pca_adj, ncol = 2) par(mfrow = c(2, 1), mar = c(3.5, 4.5, 2.5, 1)) boxplot(rowRla(assay(res, \"raw_filled\"), group = res$phenotype), cex = 0.5, pch = 16, ylab = \"RLA\", border = col_sample, col = paste0(col_sample, 80), cex.main = 1, outline = FALSE, xaxt = \"n\", main = \"Raw data\", boxwex = 1) grid(nx = NA, ny = NULL) legend(\"topright\", inset = c(0, -0.2), col = col_phenotype, legend = names(col_phenotype), lty=1, lwd = 2, xpd = TRUE, ncol = 3, cex = 0.7, bty = \"n\") abline(h = 0, lty=3, lwd = 1, col = \"black\") boxplot(rowRla(assay(res, \"norm\"), group = res$phenotype), cex = 0.5, pch = 16, ylab = \"RLA\", border = col_sample, col = paste0(col_sample, 80), boxwex = 1, outline = FALSE, xaxt = \"n\", main = \"Normalized data\", cex.main = 1) axis(side = 1, at = seq_len(ncol(res)), labels = res$sample_name) grid(nx = NA, ny = NULL) abline(h = 0, lty = 3, lwd = 1, col = \"black\") #' Calculate the CV
values index_study <- res$phenotype %in% c(\"CTR\", \"CVD\") index_QC <- res$phenotype == \"QC\" sample_res <- cbind( QC_raw = rowRsd(assay(res, \"raw_filled\")[, index_QC], na.rm = TRUE, mad = TRUE), QC_norm = rowRsd(assay(res, \"norm\")[, index_QC], na.rm = TRUE, mad = TRUE), Study_raw = rowRsd(assay(res, \"raw_filled\")[, index_study], na.rm = TRUE, mad = TRUE), Study_norm = rowRsd(assay(res, \"norm\")[, index_study], na.rm = TRUE, mad = TRUE) ) #' Summarize the values across features res_df <- data.frame( QC_raw = quantile(sample_res[, \"QC_raw\"], na.rm = TRUE), QC_norm = quantile(sample_res[, \"QC_norm\"], na.rm = TRUE), Study_raw = quantile(sample_res[, \"Study_raw\"], na.rm = TRUE), Study_norm = quantile(sample_res[, \"Study_norm\"], na.rm = TRUE) ) kable(res_df, format = \"pipe\")"},{"path":"https://rformassspectrometry.github.io/metabonaut/articles/a-end-to-end-untargeted-metabolomics.html","id":"principal-component-analysis-1","dir":"Articles","previous_headings":"Data normalization","what":"Principal Component Analysis","title":"Complete end-to-end LC-MS/MS Metabolomic Data analysis","text":"Figure 31. PC1 PC2 data normalization. Normalization large impact PC1 PC2, separation study groups PC3 seems better difference QC samples lower normalization (see ). Figure 32. PC3 PC4 data normalization. PCA plots show normalization process changed overall structure data. separation study QC samples remains . 
expected results normalization correct biological variance technical.","code":"#' Data before normalization vals_st <- cbind(vals, phenotype = res$phenotype) pca_raw <- autoplot(pca_res, data = vals_st, colour = 'phenotype', scale = 0) + scale_color_manual(values = col_phenotype) #' Data after normalization vals_norm <- apply(assay(res, \"norm\"), MARGIN = 1, na_unidis) |> log2() |> scale(center = TRUE, scale = TRUE) pca_res_norm <- prcomp(vals_norm, scale = FALSE, center = FALSE) vals_st_norm <- cbind(vals_norm, phenotype = res$phenotype) pca_adj <- autoplot(pca_res_norm, data = vals_st_norm, colour = 'phenotype', scale = 0) + scale_color_manual(values = col_phenotype) grid.arrange(pca_raw, pca_adj, ncol = 2) pca_raw <- autoplot(pca_res, data = vals_st , colour = 'phenotype', x = 3, y = 4, scale = 0) + scale_color_manual(values = col_phenotype) pca_adj <- autoplot(pca_res_norm, data = vals_st_norm, colour = 'phenotype', x = 3, y = 4, scale = 0) + scale_color_manual(values = col_phenotype) grid.arrange(pca_raw, pca_adj, ncol = 2)"},{"path":"https://rformassspectrometry.github.io/metabonaut/articles/a-end-to-end-untargeted-metabolomics.html","id":"intensity-evaluation-1","dir":"Articles","previous_headings":"Data normalization","what":"Intensity evaluation","title":"Complete end-to-end LC-MS/MS Metabolomic Data analysis","text":"compare RLA plots -sample normalization evaluate impact data. Figure 33. RLA plot normalization. 
normalization process effectively centered data around median medians samples now closer zero.","code":"par(mfrow = c(2, 1), mar = c(3.5, 4.5, 2.5, 1)) boxplot(rowRla(assay(res, \"raw_filled\"), group = res$phenotype), cex = 0.5, pch = 16, ylab = \"RLA\", border = col_sample, col = paste0(col_sample, 80), cex.main = 1, outline = FALSE, xaxt = \"n\", main = \"Raw data\", boxwex = 1) grid(nx = NA, ny = NULL) legend(\"topright\", inset = c(0, -0.2), col = col_phenotype, legend = names(col_phenotype), lty=1, lwd = 2, xpd = TRUE, ncol = 3, cex = 0.7, bty = \"n\") abline(h = 0, lty=3, lwd = 1, col = \"black\") boxplot(rowRla(assay(res, \"norm\"), group = res$phenotype), cex = 0.5, pch = 16, ylab = \"RLA\", border = col_sample, col = paste0(col_sample, 80), boxwex = 1, outline = FALSE, xaxt = \"n\", main = \"Normallized data\", cex.main = 1) axis(side = 1, at = seq_len(ncol(res)), labels = res$sample_name) grid(nx = NA, ny = NULL) abline(h = 0, lty = 3, lwd = 1, col = \"black\")"},{"path":"https://rformassspectrometry.github.io/metabonaut/articles/a-end-to-end-untargeted-metabolomics.html","id":"coefficient-of-variation","dir":"Articles","previous_headings":"Data normalization","what":"Coefficient of variation","title":"Complete end-to-end LC-MS/MS Metabolomic Data analysis","text":"next evaluate coefficient variation (CV, also referred relative standard deviation RSD) features across samples either QC study samples. QC samples, CV represent technical noise, study samples include also expected biological differences. Thus, normalization reduce CV QC samples, slightly reducing CV study samples. CV calculated using rowRsd() function MetaboCoreUtils package. setting mad = TRUE use robust calculation using median absolute deviation instead standard deviation. Table 6. Distribution CV values across samples raw normalized data. table shows distribution CV raw normalized data. first column highlights % data given CV value, e.g. 25% data CV equal lower 0.04557 QC_raw data. 
anticipated, CV values QCs, reflect technical variance, lower compared study samples, include technical biological variance. Overall, minimal disparity exists raw normalized data, positive indication normalization process introduced bias dataset, also reflects little differences average abundances sample raw data.","code":"#' Calculate the CV values index_study <- res$phenotype %in% c(\"CTR\", \"CVD\") index_QC <- res$phenotype == \"QC\" sample_res <- cbind( QC_raw = rowRsd(assay(res, \"raw_filled\")[, index_QC], na.rm = TRUE, mad = TRUE), QC_norm = rowRsd(assay(res, \"norm\")[, index_QC], na.rm = TRUE, mad = TRUE), Study_raw = rowRsd(assay(res, \"raw_filled\")[, index_study], na.rm = TRUE, mad = TRUE), Study_norm = rowRsd(assay(res, \"norm\")[, index_study], na.rm = TRUE, mad = TRUE) ) #' Summarize the values across features res_df <- data.frame( QC_raw = quantile(sample_res[, \"QC_raw\"], na.rm = TRUE), QC_norm = quantile(sample_res[, \"QC_norm\"], na.rm = TRUE), Study_raw = quantile(sample_res[, \"Study_raw\"], na.rm = TRUE), Study_norm = quantile(sample_res[, \"Study_norm\"], na.rm = TRUE) ) kable(res_df, format = \"pipe\")"},{"path":"https://rformassspectrometry.github.io/metabonaut/articles/a-end-to-end-untargeted-metabolomics.html","id":"conclusion-on-normalization","dir":"Articles","previous_headings":"Data normalization","what":"Conclusion on normalization","title":"Complete end-to-end LC-MS/MS Metabolomic Data analysis","text":"overall conclusion normalization process little variance present beginning, normalization however able center data around median (shown RLA plot). Given simplicity limited size example dataset, conclude normalization process stage. intricate datasets diverse biases, tailored approach devised. include also approaches adjust signal drifts batch effects. 
One possible option use linear-model based approach can example applied adjust_lm() function MetaboCoreUtils package.","code":""},{"path":"https://rformassspectrometry.github.io/metabonaut/articles/a-end-to-end-untargeted-metabolomics.html","id":"quality-control-feature-prefiltering","dir":"Articles","previous_headings":"","what":"Quality control: Feature prefiltering","title":"Complete end-to-end LC-MS/MS Metabolomic Data analysis","text":"normalizing data can now pre-filter clean data performing statistical analysis. general, pre-filtering samples features performed remove outliers. copy original result object also keep unfiltered data later comparisons. eliminate features exhibit high variability dataset. Repeatedly measured QC samples typically serve robust basis cleansing datasets allowing identify features excessively high noise. data set external QC samples used, .e. pooled samples different collection using slightly different sample matrix, utility filtering somewhat limited. comprehensive description guidelines data filtering untargeted metabolomic studies, please refer [@broadhurst_guidelines_2018]. first restrict data set features chromatographic peak detected least 2/3 samples least one study samples groups. ensures statistical tests carried later study samples performed reliable signal. Also, filter remove features mostly detected QC samples, study samples. filter can performed filterFeatures() function xcms package PercentMissingFilter setting. parameters filer: threshold: defines maximal acceptable percentage samples missing value(s) least one sample groups defined parameter f. f: factor defining sample groups. replacing \"QC\" sample group NA parameter f exclude QC samples evaluation consider study samples. threshold = 40 keep features peak detected 2 3 samples one sample groups. consider detected chromatographic peaks per sample, apply filter \"raw\" assay result object, contains abundance values detected chromatographic peaks (prior gap-filling). 
Following guidelines stated decided still use QC samples pre-filtering, basis represent similar bio-fluids study samples, thus, anticipate observing relatively similar metabolites affected similar measurement biases. therefore evaluate dispersion ratio (Dratio) [@broadhurst_guidelines_2018] features data set. accomplish task using function time DratioFilter parameter. filters exist function invite user explore decide best dataset. Dratio filter powerful tool identify features exhibit high variability data, relating variance observed QC samples study samples. setting threshold 0.4, remove features high degree variability QC study samples. example, feature deviation QC higher 40% (threshold = 0.4)deviation study samples removed. filtering step ensures features retained considerably lower technical biological variance. Note rowDratio() rowRsd() functions MetaboCoreUtils package used calculate actual numeric values estimates used filtering, e.g. evaluate distribution whole data set identify data set-dependent threshold values. Finally, evaluate number features left filtering steps calculate percentage features removed. dataset reduced 9068 4279 features. remove considerable amount features expected want focus reliable features analysis. rest analysis need separate QC samples study samples. store QC samples separate object later use. addition calculate CV QC samples add additional column rowData() result object. used later prioritize identified significant features e.g. low technical noise. Now data set preprocessed, normalized filtered, can start evaluate distribution data estimate variation due biology.","code":"#' Number of features before filtering nrow(res) [1] 9068 #' keep unfiltered object res_unfilt <- res #' Limit features to those with at least two detected peaks in one study group. #' Setting the value for QC samples to NA excludes QC samples from the #' calculation. 
f <- res$phenotype f[f == \"QC\"] <- NA f <- as.factor(f) res <- filterFeatures(res, PercentMissingFilter(f = f, threshold = 40), assay = \"raw\") 1808 features were removed #' Compute and filter based on the Dratio filter_dratio <- DratioFilter(threshold = 0.4, qcIndex = res$phenotype == \"QC\", studyIndex = res$phenotype != \"QC\", mad = TRUE) res <- filterFeatures(res, filter = filter_dratio, assay = \"norm_imputed\") 2981 features were removed #' Number of features after analysis nrow(res) [1] 4279 #' Percentage left: end/beginning nrow(res)/nrow(res_unfilt) * 100 [1] 47.18791 res_qc <- res[, res$phenotype == \"QC\"] res <- res[, res$phenotype != \"QC\"] #' Calculate the QC's CV and add as feature variable to the data set rowData(res)$qc_cv <- assay(res_qc, \"norm\") |> rowRsd()"},{"path":"https://rformassspectrometry.github.io/metabonaut/articles/a-end-to-end-untargeted-metabolomics.html","id":"differential-abundance-analysis","dir":"Articles","previous_headings":"","what":"Differential abundance analysis","title":"Complete end-to-end LC-MS/MS Metabolomic Data analysis","text":"normalization quality control, next step identify features differentially abundant study groups. crucial step allows us identify potential biomarkers metabolites associated study groups. various approaches methods available identification features interest. workflow use multiple linear regression analysis identify features significantly difference abundances CVD CTR study group. performing tests evaluate similarities study samples using PCA (excluding QC samples avoid influencing results). Figure 34. PCA data normalization quality control. samples clearly separate study group PCA indicating differences metabolite profiles two groups. However, drives separation PC1 clear. evaluate whether explained available variable study, .e., age: Figure 35. PCA colored age data normalization quality control. According PCA , PC1 seem related age. 
Even variance data set can’t explain stage, proceed (supervised) statistical tests identify features interest. compute linear models metabolite explaining observed feature abundance available study variables. also use base R function lm(), utilize R Biocpkg(\"limma\") package conduct differential abundance analysis: moderated test statistics [@smyth_linear_2004] provided package specifically well suited experiments limited number replicates. tests use linear model ~ phenotype + age, hence explaining abundances one metabolite accounting study group assignment age individual. analysis might benefit inclusion study covariate associated PC2 explaining variance seen principal component, present analysis participant’s age disease association provided. define design study model.matrix() function fit feature-wise linear models log2-transformed abundances using lmFit() function. P-values significance association calculated using eBayes() function, also performs empirical Bayes-based robust estimation standard errors. See also excellent vignette/user guide limma package examples details linear model procedure. linear models fitted, can now proceed extract results. create data frame containing coefficients, raw adjusted p-values (applying Benjamini-Hochberg correction, .e., method = \"BH\" improved control false discovery rate), average intensity signals CVD CTR samples, indication whether feature deemed significant . consider metabolites adjusted p-value smaller 0.05 significant, also include (absolute) difference abundances cut-criteria. last, add differential abundance results result object’s rowData(). can now proceed visualize distribution raw adjusted p-values. Figure 36. Distribution raw (left) adjusted p-values (right). histograms show distribution raw adjusted p-values. Except enrichment small p-values, raw p-values (less) uniformly distributed, indicates absence strong systematic biases data. 
adjusted p-values conservative account multiple testing; important fit linear model feature therefore perform large number tests leads high chance false positive findings. see features low p-values, indicating likely significantly different two study groups. plot adjusted p-values log2 fold change (average) abundances. volcano plot allow us visualize features significantly different two study groups. highlighted blue color plot . Figure 37. Volcano plot showing analysis results. interesting features top corners volcano plot (.e., features large difference abundance groups small p-value). significant features negative coefficient (log2 fold change value) indicating abundance lower CVD samples compared CTR samples. features listed, along average difference (log2) abundance compared groups, adjusted p-values, average (log2) abundance sample group RSD (CV) QC samples table . Table 7. Features significant differences abundances. visualize EICs significant features evaluate (raw) signal. restrict MS data set study samples. Parameters keepFeatures = TRUE: ensures identified features retained `subset object. peakBg: defines (background) color individual chromatographic peak EIC object. Figure 38. Extracted ion chromatograms significant features. EICs significant features show clear single peak. intensities (already observed ) much larger CTR CVD samples. exception second feature (second EIC top row), intensities significant features however generally low. 
might make challenging identify using LC-MS/MS setup.","code":"#' Define the colors for the plot col_sample <- col_phenotype[res$phenotype] #' Log transform and scale the data for PCA analysis vals <- assay(res, \"norm_imputed\") |> t() |> log2() |> scale(center = TRUE, scale = TRUE) pca_res <- prcomp(vals, scale = FALSE, center = FALSE) vals_st <- cbind(vals, phenotype = res$phenotype) autoplot(pca_res, data = vals_st , colour = 'phenotype', scale = 0) + scale_color_manual(values = col_phenotype) #' Add age to the PCA plot vals_st <- cbind(vals, age = res$age) autoplot(pca_res, data = vals_st , colour = 'age', scale = 0) #' Define the linear model to be applied to the data p.cut <- 0.05 # cut-off for significance. m.cut <- 0.5 # cut-off for log2 fold change age <- res$age phenotype <- factor(res$phenotype) design <- model.matrix(~ phenotype + age) #' Fit the linear model to the data, explaining metabolite #' concentrations by phenotype and age. fit <- lmFit(log2(assay(res, \"norm_imputed\")), design = design) fit <- eBayes(fit) #' Compile a result data frame tmp <- data.frame( coef.CVD = fit$coefficients[, \"phenotypeCVD\"], pvalue.CVD = fit$p.value[, \"phenotypeCVD\"], adjp.CVD = p.adjust(fit$p.value[, \"phenotypeCVD\"], method = \"BH\"), avg.CVD = rowMeans( log2(assay(res, \"norm_imputed\")[, res$phenotype == \"CVD\"])), avg.CTR = rowMeans( log2(assay(res, \"norm_imputed\")[, res$phenotype == \"CTR\"])) ) tmp$significant.CVD <- tmp$adjp.CVD < 0.05 #' Add the results to the object's rowData rowData(res) <- cbind(rowData(res), tmp) #' Plot the distribution of p-values par(mfrow = c(1, 2)) hist(rowData(res)$pvalue.CVD, breaks = 64, xlab = \"p value\", main = \"Distribution of raw p-values\", cex.main = 1, cex.lab = 1, cex.axis = 1) hist(rowData(res)$adjp.CVD, breaks = 64, xlab = expression(p[BH]~value), main = \"Distribution of adjusted p-values\", cex.main = 1, cex.lab = 1, cex.axis = 1) #' Plot volcano plot of the statistical results par(mfrow = c(1, 1), mar = 
c(5, 5, 5, 1)) plot(rowData(res)$coef.CVD, -log10(rowData(res)$adjp.CVD), xlab = expression(log[2]~difference), ylab = expression(-log[10]~p[BH]), pch = 16, col = \"#00000060\", cex.main = 1.5, cex.lab = 1.5, cex.axis = 1.3) grid() abline(h = -log10(0.05), col = \"#0000ffcc\") if (any(rowData(res)$significant.CVD)) { points(rowData(res)$coef.CVD[rowData(res)$significant.CVD], -log10(rowData(res)$adjp.CVD[rowData(res)$significant.CVD]), col = \"#0000ffcc\") } # Table of significant features tab <- rowData(res)[rowData(res)$significant.CVD, c(\"mzmed\", \"rtmed\", \"coef.CVD\", \"adjp.CVD\", \"avg.CTR\", \"avg.CVD\", \"qc_cv\")] |> as.data.frame() tab <- tab[order(abs(tab$coef.CVD), decreasing = TRUE), ] kable(tab, format = \"pipe\") #' Restrict the raw data to study samples. lcms1_study <- lcms1[sampleData(lcms1)$phenotype != \"QC\", keepFeatures = TRUE] #' Extract EICs for the significant features eic_sign <- featureChromatograms( lcms1_study, features = rownames(tab), expandRt = 5, filled = TRUE) #' Plot the EICs. plot(eic_sign, col = col_sample, peakBg = paste0(col_sample[chromPeaks(eic_sign)[, \"sample\"]], 40)) legend(\"topright\", col = col_phenotype, legend = names(col_phenotype), lty = 1)"},{"path":"https://rformassspectrometry.github.io/metabonaut/articles/a-end-to-end-untargeted-metabolomics.html","id":"annotation","dir":"Articles","previous_headings":"","what":"Annotation","title":"Complete end-to-end LC-MS/MS Metabolomic Data analysis","text":"now identified features significant differences abundances two study groups. provide information metabolic pathways differentiate affected healthy individuals might hence also serve biomarkers. However, stage analysis know compounds/metabolites actually represent. thus need now annotate signals. Annotation can performed different level confidence [@sumner_proposed_2007,@schymanski_identifying_2014]. lowest level annotation, highest rate false positive hits, bases features m/z ratios. 
Higher levels annotations employ fragment spectra (MS2 spectra) ions interest requiring however acquisition additional data. section, demonstrate multiple ways annotate significant features using functionality provided Bioconductor packages. Alternative approaches external software tools, may better suited, also discussed later section. data set acquired using LC-MS setup features thus characterized m/z retention times. retention time LC-setup-specific , without prior data/knowledge provide little information features’ identity. Modern MS instruments high accuracy m/z values therefore reliable estimates compound ion’s mass--charge ratio. first approach, use features’ m/z values match reference values, .e., exact masses chemical compounds provided reference database, case MassBank database. full MassBank data re-distributed Bioconductor’s AnnotationHub resource, simplifies integration reproducible R-based analysis workflows. load resource, list available MassBank data sets/releases load one . MassBank data provided self-contained SQLite database data can queried accessed CompoundDb Bioconductor package. use compounds() function extract small compound annotations database. MassBank (small compound annotation databases) provides (exact) molecular mass compound. Since almost small compounds neutral natural state, need first converted m/z values allow matching feature’s m/z. calculate m/z neutral mass, need assume ion (adduct) might generated measured metabolites employed electro-spray ionization. positive polarity, human serum samples, common ions protonated ([M+H]+), bear addition sodium ([M+Na]+) ammonium ([M+H-NH3]+) ions. match observed m/z values reference values potential ions use matchValues() function Mass2MzParam approach, allows specify types expected ions adducts parameter maximal allowed difference compared values using tolerance ppm parameters. 
first prepare data.frame significant features, set parameters matching perform comparison query features reference database. resulting Matched object shows 4 6 significant features matched ions compounds MassBank database. extract full result Matched object. Thus, total 237 ions compounds MassBank matched significant features based specified tolerance settings. Many compounds, different structure thus function/chemical property, identical chemical formula thus mass. Matching exclusively m/z features hence result many potentially false positive hits thus considered provide low confidence annotation. additional complication annotation resources, like MassBank, community maintained, contain large amount redundant information. reduce redundancy result table iterate hits feature keep matches unique compounds (identified INCHIKEY). INCHI INCHIKEY combine information compound’s chemical formula structure, different compounds can share chemical formula, different structure thus INCHI. Table 9. MS1 annotation results. table shows results MS1-based annotation process. can see four significant features matched. matches seem pretty accurate low ppm errors. deduplication performed considerably reduced number hits feature, first still matches ions large number compounds (chemical formula). Considering features’ m/z retention times MS1-based annotation increase annotation confidence, requires additional data, recording retention time thepure standard compound LC setup. alternative approach might provide better inside annotations help choose different annotations feature evaluate certain chemical properties possible matches. instance, LogP value, available several databases HMDB, provides insight given compound’s polarity. property highly affects interaction analyte column, usually also directly affects separation. Therefore, comparison analyte’s retention time polarity can help rule possible misidentifications. 
low confidence, MS1-based annotation can provide first candidate annotations confirmed rejected additional analyses. MS1 annotation fast efficient method annotate features therefore give first insight compounds significantly different two study groups. However, always accurate. MS2 data can provide higher level confidence annotation process provides, observed fragmentation pattern, information structure compound. MS2 data can generated LC-MS/MS measurement MS2 spectra recorded ions either data dependent acquisition (DDA) data independent acquisition (DIA) mode. Generally, advised include LC-MS/MS runs QC samples randomly selected study samples already acquisition MS1 data used quantification signals. alternative, addition, post-hoc LC-MS/MS acquisition can performed generate MS2 data needed annotation. present experiment, separate LC-MS/MS measurement conducted QC samples selected study samples generate data using inclusion list pre-selected ions. represent features found significantly different CVD CTR samples initial analysis full experiment. use subset second LC-MS/MS data set show data can used MS2-based annotation. differential abundance analysis found features significantly higher abundances CTR samples. Consequently, utilize MS2 data obtained CTR samples annotate significant features. load LC-MS/MS data experiment restrict data acquired CTR sample. Table 10. Samples LC-MS/MS data set. total 3 LC-MS/MS data files control samples, different collision energy fragment ions. show number MS1 MS2 spectra files. Compared number MS2 spectra, far less MS1 spectra acquired. configuration MS instrument set ensure ions specified inclusion list selected fragmentation, even intensity might low. setting, however, recorded MS2 spectra represent noise. plot shows location precursor ions m/z - retention time plane three files. can see MS2 spectra recorded m/z interest along full retention time range, even actual ions eluting within certain retention time windows. 
next extract Spectra object MS data data object assign new spectra variable employed collision energy, extract data object sampleData. next filter MS data first restricting MS2 spectra removing mass peaks spectrum intensity lower 5% highest intensity spectrum, assuming low intensity peaks represent background signal. next remove also mass peaks m/z value greater equal precursor m/z ion. puts, later matching reference spectra, weight fragmentation pattern ions avoids hits based precursor m/z peak (hence similar mass compared compounds). last, restrict data spectra least two fragment peaks scale intensities sum 1 spectrum. similarity calculations affected scaling, makes visual comparison fragment spectra easier read. Finally, also speed later comparison spectra reference database, load full MS data memory (changing backend MsBackendMemory) apply processing steps performed data far. Keeping MS data memory performance benefits, generally suggested large data sets. evaluate impact present data set print addition size data object changing backend. thus moderate increase memory demand loading MS data memory (also filtered cleaned MS2 data). proceed match experimental MS2 spectra reference fragment spectra, workflow aim annotate features found significant differential abundance analysis. goal thus identify MS2 spectra second (LC-MS/MS) run represent fragments ions features data first (LC-MS) run. approach match MS2 spectra significant features determined earlier based precursor m/z retention time (given acceptable tolerance) feature’s m/z retention time. can easily done using featureArea() function effectively considers actual m/z retention time ranges features’ chromatographic peaks therefore increase chance finding correct match. however also assumes retention times first second run don’t differ much. Alternatively, need align retention times second LC-MS/MS data set first. first extract feature area, .e., m/z retention time ranges, significant features. 
next identify fragment spectra precursor m/z retention times within ranges. use filterRanges() function allows filter Spectra object using multiple ranges simultaneously. apply function separately feature (row matrix) extract MS2 spectra representing fragmentation information presumed feature’s ions. result apply() call list Spectra, element representing result one feature. exception last feature, multiple MS2 spectra identified. next combine list Spectra single Spectra object using concatenateSpectra() function add additional spectra variable containing respective feature identifier. now Spectra object fragment spectra significant features differential expression analysis. next build reference data need process way query spectra. extract fragment spectra MassBank database, restrict positive polarity data (since experiment acquired positive polarity) perform processing fragment spectra MassBank database. Note switch MsBackendMemory backend hence loading full data reference database memory. positive impact performance subsequent spectra matching, however also increase memory demand present analysis. Now Spectra object second run database spectra prepared, can proceed matching process. use matchSpectra() function MetaboAnnotation package CompareSpectraParam define settings matching. following parameters: requirePrecursor = TRUE: Limits spectra similarity calculations fragment spectra similar precursor m/z. tolerance ppm: Defines acceptable difference compared m/z values. relaxed tolerance settings ensure find matches even reference spectra acquired instruments lower accuracy. THRESHFUN: Defines matches report. , keep matches resulting spectra similarity score (calculated normalized dot product [@stein_optimization_1994], default similarity function) larger 0.6. Thus, total 315 query MS2 spectra, 16 matched (least) one reference fragment spectrum. 
restrict results matching spectra extract metadata query target spectra well similarity score (complete list available metadata information can listed colnames() function). Now, query-target pairs spectra similarity higher 0.6. Similar MS1-based annotation also result table contains redundant information: multiple fragment spectra per feature also MassBank contains several fragment spectra compound, measured using differing collision energies MS instruments, different laboratories. thus iterate feature-compound pairs select one highest score. identifier compound, use fragment spectra’s INCHI-key, since compound names MassBank accepted consensus/controlled vocabularies. Table 9.MS2 annotation results. Thus, 6 significant features, one annotated compound based MS2-based approach. many reasons failure find matches features. Although MS2 spectra selected feature, appear represent noise, features, LC-MS/MS run, low MS1 signal recorded, indicating selected sample original compound might (longer) present. Also, reference databases contain predominantly fragment spectra protonated ([M+H]+) ions compounds, features might represent signal types ions result different fragmentation pattern. Finally, fragment spectra compounds interest might also simply present used reference database. Thus, combining information MS1- MS2 based annotation can annotate one feature considerable confidence. feature m/z 195.0879 retention time 32 seconds seems ion caffeine. result somewhat disappointing also clearly shows importance proper experimental planning need control potential confounding factors. present experiment, disease-specific biomarker identified, life-style property individuals suffering disease: coffee consumption probably contraindicated patients CVD group reduce risk heart arrhythmia. plot EIC feature highlighting retention time highest scoring MS2 spectra recorded create mirror plot comparing MS2 spectra reference fragment spectra caffeine. 
plot clearly shows higher signal feature CTR compared CVD samples. QC samples exhibit lower highly consistent signal, suggesting absence strong technical noise biases raw data experiment. vertical line indicates retention time fragment spectrum best match reference spectrum. noted , since fragment spectra measured separate LC-MS/MS experiment, considered indication approximate retention time ions fragmented second experiment. fragment spectrum feature, shown upper panel right plot highly similar reference spectrum caffeine MassBank (shown lower panel). addition matching precursor m/z, two fragments (m/z intensity) present spectra. can also extract additional metadata matching reference spectrum, used collision energy, fragmentation mode, instrument type, instrument well ion (adduct) fragmented. present workflow highlights annotation performed within R using packages Bioconductor project, also excellent external softwares used alternative, SIRIUS [@duhrkop_sirius_2019], mummichog [@li_predicting_2013] GNPS [@nothias_feature-based_2020] among others. use , data need exported format supported . MS2 spectra, data easily exported required MGF file format using MsBackendMgf Bioconductor package. Integration xcms feature-based molecular networking GNPS described GNPS documentation. alternative, addition, evidence potential matching chemical formula feature derived evaluating isotope pattern full MS1 scan. provide information isotope composition. 
Also , various functions isotopologues() MetaboCoreUtils package functionality envipat R package [@loos_accelerated_2015] used.","code":"#' load reference data ah <- AnnotationHub() #' List available MassBank data sets query(ah, \"MassBank\") AnnotationHub with 6 records # snapshotDate(): 2024-10-14 # $dataprovider: MassBank # $species: NA # $rdataclass: CompDb # additional mcols(): taxonomyid, genome, description, # coordinate_1_based, maintainer, rdatadateadded, preparerclass, tags, # rdatapath, sourceurl, sourcetype # retrieve records with, e.g., 'object[[\"AH107048\"]]' title AH107048 | MassBank CompDb for release 2021.03 AH107049 | MassBank CompDb for release 2022.06 AH111334 | MassBank CompDb for release 2022.12.1 AH116164 | MassBank CompDb for release 2023.06 AH116165 | MassBank CompDb for release 2023.09 AH116166 | MassBank CompDb for release 2023.11 #' Load one MAssBank release mb <- ah[[\"AH116166\"]] downloading 1 resources retrieving 1 resource loading from cache #' Extract compound annotations cmps <- compounds(mb, columns = c(\"compound_id\", \"name\", \"formula\", \"exactmass\", \"inchikey\")) head(cmps) compound_id formula exactmass inchikey 1 1 C27H29NO11 543.1741 AOJJSUZBOXZQNB-UHFFFAOYSA-N 2 2 C40H54O4 598.4022 KFNGKYUGHHQDEE-AXWOCEAUSA-N 3 3 C10H24N2O2 204.1838 AEUTYOVWOVBAKS-UWVGGRQHSA-N 4 4 C16H27NO5 313.1889 LMFKRLGHEKVMNT-UJDVCPFMSA-N 5 5 C20H15Cl3N2OS 435.9971 JLGKQTAYUIMGRK-UHFFFAOYSA-N 6 6 C15H14O5 274.0841 BWNCKEBBYADFPQ-UHFFFAOYSA-N name 1 Epirubicin 2 Crassostreaxanthin A 3 Ethambutol 4 Heliotrine 5 Sertaconazole 6 (R)Semivioxanthin #' Prepare query data frame rowData(res)$feature_id <- rownames(rowData(res)) res_sig <- res[rowData(res)$significant.CVD, ] #' Setup parameters for the matching param <- Mass2MzParam(adducts = c(\"[M+H]+\", \"[M+Na]+\", \"[M+H-NH3]+\"), tolerance = 0, ppm = 5) #' Perform the matching. 
mtch <- matchValues(res_sig, cmps, param = param, mzColname = \"mzmed\") mtch Object of class Matched Total number of matches: 237 Number of query objects: 6 (4 matched) Number of target objects: 117732 (237 matched) #' Extracting the results mtch_res <- matchedData(mtch, c(\"feature_id\", \"mzmed\", \"rtmed\", \"adduct\", \"ppm_error\", \"target_formula\", \"target_name\", \"target_inchikey\")) mtch_res DataFrame with 239 rows and 8 columns feature_id mzmed rtmed adduct ppm_error target_formula FT0371 FT0371 138.055 148.396 [M+H]+ 2.08055 C7H7NO2 FT0371 FT0371 138.055 148.396 [M+H]+ 1.93568 C7H7NO2 FT0371 FT0371 138.055 148.396 [M+H]+ 1.93568 C7H7NO2 FT0371 FT0371 138.055 148.396 [M+H]+ 1.93568 C7H7NO2 FT0371 FT0371 138.055 148.396 [M+H]+ 2.08055 C7H7NO2 ... ... ... ... ... ... ... FT1171 FT1171 229.13 181.0883 [M+Na]+ 3.07708 C12H18N2O FT1171 FT1171 229.13 181.0883 [M+Na]+ 3.07708 C12H18N2O FT1171 FT1171 229.13 181.0883 [M+Na]+ 3.07708 C12H18N2O FT1171 FT1171 229.13 181.0883 [M+Na]+ 3.07708 C12H18N2O FT5606 FT5606 560.36 33.5492 NA NA NA target_name target_inchikey FT0371 Benzohydro... VDEUYMSGMP... FT0371 Trigonelli... WWNNZCOKKK... FT0371 Trigonelli... WWNNZCOKKK... FT0371 Trigonelli... WWNNZCOKKK... FT0371 Salicylami... SKZKKFZAGN... ... ... ... FT1171 Isoproturo... PUIYMUZLKQ... FT1171 Isoproturo... PUIYMUZLKQ... FT1171 Isoproturo... PUIYMUZLKQ... FT1171 Isoproturo... PUIYMUZLKQ... 
FT5606 NA NA rownames(mtch_res) <- NULL #' Keep only info on features that matched - create a utility function for that mtch_res <- split(mtch_res, mtch_res$feature_id) |> lapply(function(x) { lapply(split(x, x$target_inchikey), function(z) { z[which.min(z$ppm_error), ] }) }) |> unlist(recursive = FALSE) |> do.call(what = rbind) #' Display the results kable(mtch_res, format = \"pipe\") #' Load from the MetaboLights Database param <- MetaboLightsParam(mtblsId = \"MTBLS8735\", assayName = paste0(\"a_MTBLS8735_LC-MSMS_positive_\", \"hilic_metabolite_profiling.txt\"), filePattern = \".mzML\") lcms2 <- readMsObject(MsExperiment(), param, keepOntology = FALSE, keepProtocol = FALSE, simplify = TRUE) #adjust sampleData colnames(sampleData(lcms2)) <- c(\"sample_name\", \"derived_spectra_data_file\", \"metabolite_asssignment_file\", \"source_name\", \"organism\", \"blood_sample_type\", \"sample_type\", \"age\", \"unit\", \"phenotype\") # filter samples to keep MSMS data from CTR samples: sampleData(lcms2) <- sampleData(lcms2)[sampleData(lcms2)$phenotype == \"CTR\", ] sampleData(lcms2) <- sampleData(lcms2)[grepl(\"MSMS\", sampleData(lcms2)$derived_spectra_data_file), ] # Add fragmentation data information (from filenames) sampleData(lcms2)$fragmentation_mode <- c(\"CE20\", \"CE30\", \"CES\") #let's look at the updated sample data sampleData(lcms2)[, c(\"derived_spectra_data_file\", \"phenotype\", \"sample_name\", \"age\")] |> kable(format = \"pipe\") #' Filter the data to the same RT range as the LC-MS run lcms2 <- filterRt(lcms2, c(10, 240)) Filter spectra #' check the number of spectra per ms level spectra(lcms2) |> msLevel() |> split(spectraSampleIndex(lcms2)) |> lapply(table) |> do.call(what = cbind) 1 2 3 4 5 6 7 8 9 10 11 12 1 825 186 186 186 825 186 186 186 825 185 186 185 2 825 3121 3118 3124 825 3123 3118 3120 825 3117 3117 3116 plotPrecursorIons(lcms2) ms2_ctr <- spectra(lcms2) ms2_ctr$collision_energy <- 
sampleData(lcms2)$fragmentation_mode[spectraSampleIndex(lcms2)] #' Remove low intensity peaks low_int <- function(x, ...) { x > max(x, na.rm = TRUE) * 0.05 } ms2_ctr <- filterMsLevel(ms2_ctr, 2L) |> filterIntensity(intensity = low_int) #' Remove precursor peaks and restrict to spectra with a minimum #' number of peaks ms2_ctr <- filterPrecursorPeaks(ms2_ctr, ppm = 50, mz = \">=\") ms2_ctr <- ms2_ctr[lengths(ms2_ctr) > 1] |> scalePeaks() #' Size of the object before loading into memory print(object.size(ms2_ctr), units = \"MB\") 5.1 Mb #' Load the MS data subset into memory ms2_ctr <- setBackend(ms2_ctr, MsBackendMemory()) ms2_ctr <- applyProcessing(ms2_ctr) #' Size of the object after loading into memory print(object.size(ms2_ctr), units = \"MB\") 18.2 Mb #' Define the m/z and retention time ranges for the significant features target <- featureArea(lcms1)[rownames(res_sig), ] target mzmin mzmax rtmin rtmax FT0371 138.0544 138.0552 146.32270 152.86115 FT0565 161.0391 161.0407 159.00234 164.30799 FT0732 182.0726 182.0756 32.71242 42.28755 FT0845 195.0799 195.0887 30.73235 35.67337 FT1171 229.1282 229.1335 178.01450 183.35303 FT5606 560.3539 560.3656 32.06570 35.33456 #' Identify for each feature MS2 spectra with their precursor m/z and #' retention time within the feature's m/z and retention time range ms2_ctr_fts <- apply(target[, c(\"rtmin\", \"rtmax\", \"mzmin\", \"mzmax\")], MARGIN = 1, FUN = filterRanges, object = ms2_ctr, spectraVariables = c(\"rtime\", \"precursorMz\")) lengths(ms2_ctr_fts) FT0371 FT0565 FT0732 FT0845 FT1171 FT5606 38 36 135 68 38 0 l <- lengths(ms2_ctr_fts) #' Combine the individual Spectra objects ms2_ctr_fts <- concatenateSpectra(ms2_ctr_fts) #' Assign the feature identifier to each MS2 spectrum ms2_ctr_fts$feature_id <- rep(rownames(res_sig), l) ms2_ref <- Spectra(mb) |> filterPolarity(1L) |> filterIntensity(intensity = low_int) |> filterPrecursorPeaks(ppm = 50, mz = \">=\") ms2_ref <- ms2_ref[lengths(ms2_ref) > 1] |> scalePeaks() 
register(SerialParam()) #' Define the settings for the spectra matching. prm <- CompareSpectraParam(ppm = 40, tolerance = 0.05, requirePrecursor = TRUE, THRESHFUN = function(x) which(x >= 0.6)) ms2_mtch <- matchSpectra(ms2_ctr_fts, ms2_ref, param = prm) ms2_mtch Object of class MatchedSpectra Total number of matches: 214 Number of query objects: 315 (16 matched) Number of target objects: 69561 (21 matched) #' Keep only query spectra with matching reference spectra ms2_mtch <- ms2_mtch[whichQuery(ms2_mtch)] #' Extract the results ms2_mtch_res <- matchedData(ms2_mtch) nrow(ms2_mtch_res) [1] 214 #' - split the result per feature #' - select for each feature the best matching result for each compound #' - combine the result again into a data frame ms2_mtch_res <- ms2_mtch_res |> split(f = paste(ms2_mtch_res$feature_id, ms2_mtch_res$target_inchikey)) |> lapply(function(z) { z[which.max(z$score), ] }) |> do.call(what = rbind) |> as.data.frame() #' List the best matching feature-compound pair pandoc.table(ms2_mtch_res[, c(\"feature_id\", \"target_name\", \"score\", \"target_inchikey\")], style = \"rmarkdown\", caption = \"Table 9.MS2 annotation results.\", split.table = Inf) par(mfrow = c(1, 2)) col_sample <- col_phenotype[sampleData(lcms1)$phenotype] #' Extract and plot EIC for the annotated feature eic <- featureChromatograms(lcms1, features = ms2_mtch_res$feature_id[1]) plot(eic, col = col_sample, peakCol = col_sample[chromPeaks(eic)[, \"sample\"]], peakBg = paste0(col_sample[chromPeaks(eic)[, \"sample\"]], 20)) legend(\"topright\", col = col_phenotype, legend = names(col_phenotype), lty = 1) #' Identify the best matching query-target spectra pair idx <- which.max(ms2_mtch_res$score) #' Indicate the retention time of the MS2 spectrum in the EIC plot abline(v = ms2_mtch_res$rtime[idx]) #' Get the index of the MS2 spectrum in the query object query_idx <- which(query(ms2_mtch)$.original_query_index == ms2_mtch_res$.original_query_index[idx]) query_ms2 <- 
query(ms2_mtch)[query_idx] #' Get the index of the MS2 spectrum in the target object target_idx <- which(target(ms2_mtch)$spectrum_id == ms2_mtch_res$target_spectrum_id[idx]) target_ms2 <- target(ms2_mtch)[target_idx] #' Create a mirror plot comparing the two best matching spectra plotSpectraMirror(query_ms2, target_ms2) legend(\"topleft\", legend = paste0(\"precursor m/z: \", format(precursorMz(query_ms2), 3))) spectraData(target_ms2, c(\"collisionEnergy_text\", \"fragmentation_mode\", \"instrument_type\", \"instrument\", \"adduct\")) |> as.data.frame() collisionEnergy_text fragmentation_mode instrument_type 1 55 (nominal) HCD LC-ESI-ITFT instrument adduct 1 LTQ Orbitrap XL Thermo Scientific [M+H]+"},{"path":"https://rformassspectrometry.github.io/metabonaut/articles/a-end-to-end-untargeted-metabolomics.html","id":"ms1-based-annotation","dir":"Articles","previous_headings":"","what":"MS1-based annotation","title":"Complete end-to-end LC-MS/MS Metabolomic Data analysis","text":"data set acquired using LC-MS setup features thus characterized m/z retention times. retention time LC-setup-specific , without prior data/knowledge provide little information features’ identity. Modern MS instruments high accuracy m/z values therefore reliable estimates compound ion’s mass--charge ratio. first approach, use features’ m/z values match reference values, .e., exact masses chemical compounds provided reference database, case MassBank database. full MassBank data re-distributed Bioconductor’s AnnotationHub resource, simplifies integration reproducible R-based analysis workflows. load resource, list available MassBank data sets/releases load one . MassBank data provided self-contained SQLite database data can queried accessed CompoundDb Bioconductor package. use compounds() function extract small compound annotations database. MassBank (small compound annotation databases) provides (exact) molecular mass compound. 
Since almost small compounds neutral natural state, need first converted m/z values allow matching feature’s m/z. calculate m/z neutral mass, need assume ion (adduct) might generated measured metabolites employed electro-spray ionization. positive polarity, human serum samples, common ions protonated ([M+H]+), bear addition sodium ([M+Na]+) ammonium ([M+H-NH3]+) ions. match observed m/z values reference values potential ions use matchValues() function Mass2MzParam approach, allows specify types expected ions adducts parameter maximal allowed difference compared values using tolerance ppm parameters. first prepare data.frame significant features, set parameters matching perform comparison query features reference database. resulting Matched object shows 4 6 significant features matched ions compounds MassBank database. extract full result Matched object. Thus, total 237 ions compounds MassBank matched significant features based specified tolerance settings. Many compounds, different structure thus function/chemical property, identical chemical formula thus mass. Matching exclusively m/z features hence result many potentially false positive hits thus considered provide low confidence annotation. additional complication annotation resources, like MassBank, community maintained, contain large amount redundant information. reduce redundancy result table iterate hits feature keep matches unique compounds (identified INCHIKEY). INCHI INCHIKEY combine information compound’s chemical formula structure, different compounds can share chemical formula, different structure thus INCHI. Table 9. MS1 annotation results. table shows results MS1-based annotation process. can see four significant features matched. matches seem pretty accurate low ppm errors. deduplication performed considerably reduced number hits feature, first still matches ions large number compounds (chemical formula). 
Considering features’ m/z retention times MS1-based annotation increase annotation confidence, requires additional data, recording retention time the pure standard compound LC setup. alternative approach might provide better insight annotations help choose different annotations feature evaluate certain chemical properties possible matches. instance, LogP value, available several databases HMDB, provides insight given compound’s polarity. property highly affects interaction analyte column, usually also directly affects separation. Therefore, comparison analyte’s retention time polarity can help rule possible misidentifications. low confidence, MS1-based annotation can provide first candidate annotations confirmed rejected additional analyses.","code":"#' load reference data ah <- AnnotationHub() #' List available MassBank data sets query(ah, \"MassBank\") AnnotationHub with 6 records # snapshotDate(): 2024-10-14 # $dataprovider: MassBank # $species: NA # $rdataclass: CompDb # additional mcols(): taxonomyid, genome, description, # coordinate_1_based, maintainer, rdatadateadded, preparerclass, tags, # rdatapath, sourceurl, sourcetype # retrieve records with, e.g., 'object[[\"AH107048\"]]' title AH107048 | MassBank CompDb for release 2021.03 AH107049 | MassBank CompDb for release 2022.06 AH111334 | MassBank CompDb for release 2022.12.1 AH116164 | MassBank CompDb for release 2023.06 AH116165 | MassBank CompDb for release 2023.09 AH116166 | MassBank CompDb for release 2023.11 #' Load one MassBank release mb <- ah[[\"AH116166\"]] downloading 1 resources retrieving 1 resource loading from cache #' Extract compound annotations cmps <- compounds(mb, columns = c(\"compound_id\", \"name\", \"formula\", \"exactmass\", \"inchikey\")) head(cmps) compound_id formula exactmass inchikey 1 1 C27H29NO11 543.1741 AOJJSUZBOXZQNB-UHFFFAOYSA-N 2 2 C40H54O4 598.4022 KFNGKYUGHHQDEE-AXWOCEAUSA-N 3 3 C10H24N2O2 204.1838 AEUTYOVWOVBAKS-UWVGGRQHSA-N 4 4 C16H27NO5 313.1889 
LMFKRLGHEKVMNT-UJDVCPFMSA-N 5 5 C20H15Cl3N2OS 435.9971 JLGKQTAYUIMGRK-UHFFFAOYSA-N 6 6 C15H14O5 274.0841 BWNCKEBBYADFPQ-UHFFFAOYSA-N name 1 Epirubicin 2 Crassostreaxanthin A 3 Ethambutol 4 Heliotrine 5 Sertaconazole 6 (R)Semivioxanthin #' Prepare query data frame rowData(res)$feature_id <- rownames(rowData(res)) res_sig <- res[rowData(res)$significant.CVD, ] #' Setup parameters for the matching param <- Mass2MzParam(adducts = c(\"[M+H]+\", \"[M+Na]+\", \"[M+H-NH3]+\"), tolerance = 0, ppm = 5) #' Perform the matching. mtch <- matchValues(res_sig, cmps, param = param, mzColname = \"mzmed\") mtch Object of class Matched Total number of matches: 237 Number of query objects: 6 (4 matched) Number of target objects: 117732 (237 matched) #' Extracting the results mtch_res <- matchedData(mtch, c(\"feature_id\", \"mzmed\", \"rtmed\", \"adduct\", \"ppm_error\", \"target_formula\", \"target_name\", \"target_inchikey\")) mtch_res DataFrame with 239 rows and 8 columns feature_id mzmed rtmed adduct ppm_error target_formula FT0371 FT0371 138.055 148.396 [M+H]+ 2.08055 C7H7NO2 FT0371 FT0371 138.055 148.396 [M+H]+ 1.93568 C7H7NO2 FT0371 FT0371 138.055 148.396 [M+H]+ 1.93568 C7H7NO2 FT0371 FT0371 138.055 148.396 [M+H]+ 1.93568 C7H7NO2 FT0371 FT0371 138.055 148.396 [M+H]+ 2.08055 C7H7NO2 ... ... ... ... ... ... ... FT1171 FT1171 229.13 181.0883 [M+Na]+ 3.07708 C12H18N2O FT1171 FT1171 229.13 181.0883 [M+Na]+ 3.07708 C12H18N2O FT1171 FT1171 229.13 181.0883 [M+Na]+ 3.07708 C12H18N2O FT1171 FT1171 229.13 181.0883 [M+Na]+ 3.07708 C12H18N2O FT5606 FT5606 560.36 33.5492 NA NA NA target_name target_inchikey FT0371 Benzohydro... VDEUYMSGMP... FT0371 Trigonelli... WWNNZCOKKK... FT0371 Trigonelli... WWNNZCOKKK... FT0371 Trigonelli... WWNNZCOKKK... FT0371 Salicylami... SKZKKFZAGN... ... ... ... FT1171 Isoproturo... PUIYMUZLKQ... FT1171 Isoproturo... PUIYMUZLKQ... FT1171 Isoproturo... PUIYMUZLKQ... FT1171 Isoproturo... PUIYMUZLKQ... 
FT5606 NA NA rownames(mtch_res) <- NULL #' Keep only info on features that matched - create a utility function for that mtch_res <- split(mtch_res, mtch_res$feature_id) |> lapply(function(x) { lapply(split(x, x$target_inchikey), function(z) { z[which.min(z$ppm_error), ] }) }) |> unlist(recursive = FALSE) |> do.call(what = rbind) #' Display the results kable(mtch_res, format = \"pipe\")"},{"path":"https://rformassspectrometry.github.io/metabonaut/articles/a-end-to-end-untargeted-metabolomics.html","id":"ms2-based-annotation","dir":"Articles","previous_headings":"","what":"MS2-based annotation","title":"Complete end-to-end LC-MS/MS Metabolomic Data analysis","text":"MS1 annotation fast efficient method annotate features therefore give first insight compounds significantly different two study groups. However, always accurate. MS2 data can provide higher level confidence annotation process provides, observed fragmentation pattern, information structure compound. MS2 data can generated LC-MS/MS measurement MS2 spectra recorded ions either data dependent acquisition (DDA) data independent acquisition (DIA) mode. Generally, advised include LC-MS/MS runs QC samples randomly selected study samples already acquisition MS1 data used quantification signals. alternative, addition, post-hoc LC-MS/MS acquisition can performed generate MS2 data needed annotation. present experiment, separate LC-MS/MS measurement conducted QC samples selected study samples generate data using inclusion list pre-selected ions. represent features found significantly different CVD CTR samples initial analysis full experiment. use subset second LC-MS/MS data set show data can used MS2-based annotation. differential abundance analysis found features significantly higher abundances CTR samples. Consequently, utilize MS2 data obtained CTR samples annotate significant features. load LC-MS/MS data experiment restrict data acquired CTR sample. Table 10. Samples LC-MS/MS data set. 
total 3 LC-MS/MS data files control samples, different collision energy fragment ions. show number MS1 MS2 spectra files. Compared number MS2 spectra, far less MS1 spectra acquired. configuration MS instrument set ensure ions specified inclusion list selected fragmentation, even intensity might low. setting, however, recorded MS2 spectra represent noise. plot shows location precursor ions m/z - retention time plane three files. can see MS2 spectra recorded m/z interest along full retention time range, even actual ions eluting within certain retention time windows. next extract Spectra object MS data data object assign new spectra variable employed collision energy, extract data object sampleData. next filter MS data first restricting MS2 spectra removing mass peaks spectrum intensity lower 5% highest intensity spectrum, assuming low intensity peaks represent background signal. next remove also mass peaks m/z value greater equal precursor m/z ion. puts, later matching reference spectra, weight fragmentation pattern ions avoids hits based precursor m/z peak (hence similar mass compared compounds). last, restrict data spectra least two fragment peaks scale intensities sum 1 spectrum. similarity calculations affected scaling, makes visual comparison fragment spectra easier read. Finally, also speed later comparison spectra reference database, load full MS data memory (changing backend MsBackendMemory) apply processing steps performed data far. Keeping MS data memory performance benefits, generally suggested large data sets. evaluate impact present data set print addition size data object changing backend. thus moderate increase memory demand loading MS data memory (also filtered cleaned MS2 data). proceed match experimental MS2 spectra reference fragment spectra, workflow aim annotate features found significant differential abundance analysis. goal thus identify MS2 spectra second (LC-MS/MS) run represent fragments ions features data first (LC-MS) run. 
approach match MS2 spectra significant features determined earlier based precursor m/z retention time (given acceptable tolerance) feature’s m/z retention time. can easily done using featureArea() function effectively considers actual m/z retention time ranges features’ chromatographic peaks therefore increase chance finding correct match. however also assumes retention times first second run don’t differ much. Alternatively, need align retention times second LC-MS/MS data set first. first extract feature area, .e., m/z retention time ranges, significant features. next identify fragment spectra precursor m/z retention times within ranges. use filterRanges() function allows filter Spectra object using multiple ranges simultaneously. apply function separately feature (row matrix) extract MS2 spectra representing fragmentation information presumed feature’s ions. result apply() call list Spectra, element representing result one feature. exception last feature, multiple MS2 spectra identified. next combine list Spectra single Spectra object using concatenateSpectra() function add additional spectra variable containing respective feature identifier. now Spectra object fragment spectra significant features differential expression analysis. next build reference data need process way query spectra. extract fragment spectra MassBank database, restrict positive polarity data (since experiment acquired positive polarity) perform processing fragment spectra MassBank database. Note switch MsBackendMemory backend hence loading full data reference database memory. positive impact performance subsequent spectra matching, however also increase memory demand present analysis. Now Spectra object second run database spectra prepared, can proceed matching process. use matchSpectra() function MetaboAnnotation package CompareSpectraParam define settings matching. following parameters: requirePrecursor = TRUE: Limits spectra similarity calculations fragment spectra similar precursor m/z. 
tolerance ppm: Defines acceptable difference compared m/z values. relaxed tolerance settings ensure find matches even reference spectra acquired instruments lower accuracy. THRESHFUN: Defines matches report. , keep matches resulting spectra similarity score (calculated normalized dot product [@stein_optimization_1994], default similarity function) larger 0.6. Thus, total 315 query MS2 spectra, 16 matched (least) one reference fragment spectrum. restrict results matching spectra extract metadata query target spectra well similarity score (complete list available metadata information can listed colnames() function). Now, query-target pairs spectra similarity higher 0.6. Similar MS1-based annotation also result table contains redundant information: multiple fragment spectra per feature also MassBank contains several fragment spectra compound, measured using differing collision energies MS instruments, different laboratories. thus iterate feature-compound pairs select one highest score. identifier compound, use fragment spectra’s INCHI-key, since compound names MassBank accepted consensus/controlled vocabularies. Table 9.MS2 annotation results. Thus, 6 significant features, one annotated compound based MS2-based approach. many reasons failure find matches features. Although MS2 spectra selected feature, appear represent noise, features, LC-MS/MS run, low MS1 signal recorded, indicating selected sample original compound might (longer) present. Also, reference databases contain predominantly fragment spectra protonated ([M+H]+) ions compounds, features might represent signal types ions result different fragmentation pattern. Finally, fragment spectra compounds interest might also simply present used reference database. Thus, combining information MS1- MS2 based annotation can annotate one feature considerable confidence. feature m/z 195.0879 retention time 32 seconds seems ion caffeine. 
result somewhat disappointing also clearly shows importance proper experimental planning need control potential confounding factors. present experiment, disease-specific biomarker identified, life-style property individuals suffering disease: coffee consumption probably contraindicated patients CVD group reduce risk heart arrhythmia. plot EIC feature highlighting retention time highest scoring MS2 spectra recorded create mirror plot comparing MS2 spectra reference fragment spectra caffeine. plot clearly shows higher signal feature CTR compared CVD samples. QC samples exhibit lower highly consistent signal, suggesting absence strong technical noise biases raw data experiment. vertical line indicates retention time fragment spectrum best match reference spectrum. noted , since fragment spectra measured separate LC-MS/MS experiment, considered indication approximate retention time ions fragmented second experiment. fragment spectrum feature, shown upper panel right plot highly similar reference spectrum caffeine MassBank (shown lower panel). addition matching precursor m/z, two fragments (m/z intensity) present spectra. 
can also extract additional metadata matching reference spectrum, used collision energy, fragmentation mode, instrument type, instrument well ion (adduct) fragmented.","code":"#' Load from the MetaboLights Database param <- MetaboLightsParam(mtblsId = \"MTBLS8735\", assayName = paste0(\"a_MTBLS8735_LC-MSMS_positive_\", \"hilic_metabolite_profiling.txt\"), filePattern = \".mzML\") lcms2 <- readMsObject(MsExperiment(), param, keepOntology = FALSE, keepProtocol = FALSE, simplify = TRUE) #adjust sampleData colnames(sampleData(lcms2)) <- c(\"sample_name\", \"derived_spectra_data_file\", \"metabolite_asssignment_file\", \"source_name\", \"organism\", \"blood_sample_type\", \"sample_type\", \"age\", \"unit\", \"phenotype\") # filter samples to keep MSMS data from CTR samples: sampleData(lcms2) <- sampleData(lcms2)[sampleData(lcms2)$phenotype == \"CTR\", ] sampleData(lcms2) <- sampleData(lcms2)[grepl(\"MSMS\", sampleData(lcms2)$derived_spectra_data_file), ] # Add fragmentation data information (from filenames) sampleData(lcms2)$fragmentation_mode <- c(\"CE20\", \"CE30\", \"CES\") #let's look at the updated sample data sampleData(lcms2)[, c(\"derived_spectra_data_file\", \"phenotype\", \"sample_name\", \"age\")] |> kable(format = \"pipe\") #' Filter the data to the same RT range as the LC-MS run lcms2 <- filterRt(lcms2, c(10, 240)) Filter spectra #' check the number of spectra per ms level spectra(lcms2) |> msLevel() |> split(spectraSampleIndex(lcms2)) |> lapply(table) |> do.call(what = cbind) 1 2 3 4 5 6 7 8 9 10 11 12 1 825 186 186 186 825 186 186 186 825 185 186 185 2 825 3121 3118 3124 825 3123 3118 3120 825 3117 3117 3116 plotPrecursorIons(lcms2) ms2_ctr <- spectra(lcms2) ms2_ctr$collision_energy <- sampleData(lcms2)$fragmentation_mode[spectraSampleIndex(lcms2)] #' Remove low intensity peaks low_int <- function(x, ...) 
{ x > max(x, na.rm = TRUE) * 0.05 } ms2_ctr <- filterMsLevel(ms2_ctr, 2L) |> filterIntensity(intensity = low_int) #' Remove precursor peaks and restrict to spectra with a minimum #' number of peaks ms2_ctr <- filterPrecursorPeaks(ms2_ctr, ppm = 50, mz = \">=\") ms2_ctr <- ms2_ctr[lengths(ms2_ctr) > 1] |> scalePeaks() #' Size of the object before loading into memory print(object.size(ms2_ctr), units = \"MB\") 5.1 Mb #' Load the MS data subset into memory ms2_ctr <- setBackend(ms2_ctr, MsBackendMemory()) ms2_ctr <- applyProcessing(ms2_ctr) #' Size of the object after loading into memory print(object.size(ms2_ctr), units = \"MB\") 18.2 Mb #' Define the m/z and retention time ranges for the significant features target <- featureArea(lcms1)[rownames(res_sig), ] target mzmin mzmax rtmin rtmax FT0371 138.0544 138.0552 146.32270 152.86115 FT0565 161.0391 161.0407 159.00234 164.30799 FT0732 182.0726 182.0756 32.71242 42.28755 FT0845 195.0799 195.0887 30.73235 35.67337 FT1171 229.1282 229.1335 178.01450 183.35303 FT5606 560.3539 560.3656 32.06570 35.33456 #' Identify for each feature MS2 spectra with their precursor m/z and #' retention time within the feature's m/z and retention time range ms2_ctr_fts <- apply(target[, c(\"rtmin\", \"rtmax\", \"mzmin\", \"mzmax\")], MARGIN = 1, FUN = filterRanges, object = ms2_ctr, spectraVariables = c(\"rtime\", \"precursorMz\")) lengths(ms2_ctr_fts) FT0371 FT0565 FT0732 FT0845 FT1171 FT5606 38 36 135 68 38 0 l <- lengths(ms2_ctr_fts) #' Combine the individual Spectra objects ms2_ctr_fts <- concatenateSpectra(ms2_ctr_fts) #' Assign the feature identifier to each MS2 spectrum ms2_ctr_fts$feature_id <- rep(rownames(res_sig), l) ms2_ref <- Spectra(mb) |> filterPolarity(1L) |> filterIntensity(intensity = low_int) |> filterPrecursorPeaks(ppm = 50, mz = \">=\") ms2_ref <- ms2_ref[lengths(ms2_ref) > 1] |> scalePeaks() register(SerialParam()) #' Define the settings for the spectra matching. 
prm <- CompareSpectraParam(ppm = 40, tolerance = 0.05, requirePrecursor = TRUE, THRESHFUN = function(x) which(x >= 0.6)) ms2_mtch <- matchSpectra(ms2_ctr_fts, ms2_ref, param = prm) ms2_mtch Object of class MatchedSpectra Total number of matches: 214 Number of query objects: 315 (16 matched) Number of target objects: 69561 (21 matched) #' Keep only query spectra with matching reference spectra ms2_mtch <- ms2_mtch[whichQuery(ms2_mtch)] #' Extract the results ms2_mtch_res <- matchedData(ms2_mtch) nrow(ms2_mtch_res) [1] 214 #' - split the result per feature #' - select for each feature the best matching result for each compound #' - combine the result again into a data frame ms2_mtch_res <- ms2_mtch_res |> split(f = paste(ms2_mtch_res$feature_id, ms2_mtch_res$target_inchikey)) |> lapply(function(z) { z[which.max(z$score), ] }) |> do.call(what = rbind) |> as.data.frame() #' List the best matching feature-compound pair pandoc.table(ms2_mtch_res[, c(\"feature_id\", \"target_name\", \"score\", \"target_inchikey\")], style = \"rmarkdown\", caption = \"Table 9.MS2 annotation results.\", split.table = Inf) par(mfrow = c(1, 2)) col_sample <- col_phenotype[sampleData(lcms1)$phenotype] #' Extract and plot EIC for the annotated feature eic <- featureChromatograms(lcms1, features = ms2_mtch_res$feature_id[1]) plot(eic, col = col_sample, peakCol = col_sample[chromPeaks(eic)[, \"sample\"]], peakBg = paste0(col_sample[chromPeaks(eic)[, \"sample\"]], 20)) legend(\"topright\", col = col_phenotype, legend = names(col_phenotype), lty = 1) #' Identify the best matching query-target spectra pair idx <- which.max(ms2_mtch_res$score) #' Indicate the retention time of the MS2 spectrum in the EIC plot abline(v = ms2_mtch_res$rtime[idx]) #' Get the index of the MS2 spectrum in the query object query_idx <- which(query(ms2_mtch)$.original_query_index == ms2_mtch_res$.original_query_index[idx]) query_ms2 <- query(ms2_mtch)[query_idx] #' Get the index of the MS2 spectrum in the target object 
target_idx <- which(target(ms2_mtch)$spectrum_id == ms2_mtch_res$target_spectrum_id[idx]) target_ms2 <- target(ms2_mtch)[target_idx] #' Create a mirror plot comparing the two best matching spectra plotSpectraMirror(query_ms2, target_ms2) legend(\"topleft\", legend = paste0(\"precursor m/z: \", format(precursorMz(query_ms2), 3))) spectraData(target_ms2, c(\"collisionEnergy_text\", \"fragmentation_mode\", \"instrument_type\", \"instrument\", \"adduct\")) |> as.data.frame() collisionEnergy_text fragmentation_mode instrument_type 1 55 (nominal) HCD LC-ESI-ITFT instrument adduct 1 LTQ Orbitrap XL Thermo Scientific [M+H]+"},{"path":"https://rformassspectrometry.github.io/metabonaut/articles/a-end-to-end-untargeted-metabolomics.html","id":"external-tools-or-alternative-annotation-approaches","dir":"Articles","previous_headings":"","what":"External tools or alternative annotation approaches","title":"Complete end-to-end LC-MS/MS Metabolomic Data analysis","text":"present workflow highlights annotation performed within R using packages Bioconductor project, also excellent external softwares used alternative, SIRIUS [@duhrkop_sirius_2019], mummichog [@li_predicting_2013] GNPS [@nothias_feature-based_2020] among others. use , data need exported format supported . MS2 spectra, data easily exported required MGF file format using MsBackendMgf Bioconductor package. Integration xcms feature-based molecular networking GNPS described GNPS documentation. alternative, addition, evidence potential matching chemical formula feature derived evaluating isotope pattern full MS1 scan. provide information isotope composition. 
Also , various functions isotopologues() MetaboCoreUtils package functionality envipat R package [@loos_accelerated_2015] used.","code":""},{"path":"https://rformassspectrometry.github.io/metabonaut/articles/a-end-to-end-untargeted-metabolomics.html","id":"summary","dir":"Articles","previous_headings":"","what":"Summary","title":"Complete end-to-end LC-MS/MS Metabolomic Data analysis","text":"tutorial, describe end--end workflow LC-MS-based untargeted metabolomics experiments, conducted entirely within R using packages Bioconductor project base R functionality. excellent software exists perform similar analyses, power R-based workflow lies adaptability individual data sets research questions ability build reproducible workflows documentation. Due space restrictions don’t provide comprehensive listing methodologies individual analysis steps. advanced options approaches available, e.g., normalization data, however also heavily dependent size properties analyzed data set, well annotation features. result, found present analysis set features significant abundance differences compared groups. however reliably annotate single feature, related lifestyle individuals rather pathological properties investigated disease. 
low proportion annotated signals however uncommon untargeted metabolomics experiments reflects need comprehensive reliable reference annotation libraries.","code":""},{"path":"https://rformassspectrometry.github.io/metabonaut/articles/a-end-to-end-untargeted-metabolomics.html","id":"session-information","dir":"Articles","previous_headings":"","what":"Session information","title":"Complete end-to-end LC-MS/MS Metabolomic Data analysis","text":"","code":"sessionInfo() R version 4.4.1 (2024-06-14) Platform: x86_64-pc-linux-gnu Running under: Ubuntu 22.04.5 LTS Matrix products: default BLAS: /usr/lib/x86_64-linux-gnu/openblas-pthread/libblas.so.3 LAPACK: /usr/lib/x86_64-linux-gnu/openblas-pthread/libopenblasp-r0.3.20.so; LAPACK version 3.10.0 locale: [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C [3] LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8 [5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8 [7] LC_PAPER=en_US.UTF-8 LC_NAME=C [9] LC_ADDRESS=C LC_TELEPHONE=C [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C time zone: Etc/UTC tzcode source: system (glibc) attached base packages: [1] stats4 stats graphics grDevices utils datasets methods [8] base other attached packages: [1] MetaboAnnotation_1.9.2 CompoundDb_1.9.5 [3] AnnotationFilter_1.29.0 AnnotationHub_3.13.3 [5] BiocFileCache_2.13.2 dbplyr_2.5.0 [7] gridExtra_2.3 ggfortify_0.4.17 [9] ggplot2_3.5.1 vioplot_0.5.0 [11] zoo_1.8-12 sm_2.2-6.0 [13] pheatmap_1.0.12 RColorBrewer_1.1-3 [15] pander_0.6.5 limma_3.61.12 [17] MetaboCoreUtils_1.13.0 xcms_4.3.3 [19] SummarizedExperiment_1.35.4 Biobase_2.65.1 [21] GenomicRanges_1.57.2 GenomeInfoDb_1.41.2 [23] IRanges_2.39.2 MatrixGenerics_1.17.0 [25] matrixStats_1.4.1 MsBackendMetaboLights_0.99.1 [27] Spectra_1.15.12 BiocParallel_1.39.0 [29] S4Vectors_0.43.2 BiocGenerics_0.51.3 [31] MsIO_0.0.6 MsExperiment_1.7.0 [33] ProtGenerics_1.37.1 readxl_1.4.3 [35] BiocStyle_2.33.1 quarto_1.4.4 [37] knitr_1.48 loaded via a namespace (and not attached): [1] later_1.3.2 bitops_1.0-9 [3] 
filelock_1.0.3 tibble_3.2.1 [5] cellranger_1.1.0 preprocessCore_1.67.1 [7] XML_3.99-0.17 lifecycle_1.0.4 [9] doParallel_1.0.17 processx_3.8.4 [11] lattice_0.22-6 MASS_7.3-61 [13] alabaster.base_1.5.10 MultiAssayExperiment_1.31.5 [15] magrittr_2.0.3 rmarkdown_2.28 [17] yaml_2.3.10 MsCoreUtils_1.17.2 [19] DBI_1.2.3 abind_1.4-8 [21] zlibbioc_1.51.1 purrr_1.0.2 [23] RCurl_1.98-1.16 rappdirs_0.3.3 [25] GenomeInfoDbData_1.2.13 MSnbase_2.31.1 [27] ncdf4_1.23 codetools_0.2-20 [29] DelayedArray_0.31.14 DT_0.33 [31] xml2_1.3.6 tidyselect_1.2.1 [33] UCSC.utils_1.1.0 farver_2.1.2 [35] base64enc_0.1-3 jsonlite_1.8.9 [37] iterators_1.0.14 foreach_1.5.2 [39] tools_4.4.1 progress_1.2.3 [41] Rcpp_1.0.13 glue_1.8.0 [43] SparseArray_1.5.45 xfun_0.48 [45] dplyr_1.1.4 withr_3.0.1 [47] BiocManager_1.30.25 fastmap_1.2.0 [49] rhdf5filters_1.17.0 fansi_1.0.6 [51] digest_0.6.37 R6_2.5.1 [53] mime_0.12 colorspace_2.1-1 [55] rsvg_2.6.1 RSQLite_2.3.7 [57] utf8_1.2.4 tidyr_1.3.1 [59] generics_0.1.3 prettyunits_1.2.0 [61] PSMatch_1.9.0 httr_1.4.7 [63] htmlwidgets_1.6.4 S4Arrays_1.5.11 [65] pkgconfig_2.0.3 gtable_0.3.5 [67] blob_1.2.4 impute_1.79.0 [69] MassSpecWavelet_1.71.0 XVector_0.45.0 [71] htmltools_0.5.8.1 MALDIquant_1.22.3 [73] clue_0.3-65 scales_1.3.0 [75] png_0.1-8 rstudioapi_0.17.0 [77] reshape2_1.4.4 rjson_0.2.23 [79] curl_5.2.3 cachem_1.1.0 [81] rhdf5_2.49.0 stringr_1.5.1 [83] BiocVersion_3.20.0 parallel_4.4.1 [85] AnnotationDbi_1.67.0 mzID_1.43.0 [87] vsn_3.73.0 pillar_1.9.0 [89] grid_4.4.1 alabaster.schemas_1.5.0 [91] vctrs_0.6.5 MsFeatures_1.13.0 [93] pcaMethods_1.97.0 cluster_2.1.6 [95] evaluate_1.0.1 cli_3.6.3 [97] compiler_4.4.1 rlang_1.1.4 [99] crayon_1.5.3 labeling_0.4.3 [101] QFeatures_1.15.3 ChemmineR_3.57.1 [103] ps_1.8.0 affy_1.83.1 [105] plyr_1.8.9 fs_1.6.4 [107] stringi_1.8.4 munsell_0.5.1 [109] Biostrings_2.73.2 lazyeval_0.2.2 [111] Matrix_1.7-1 hms_1.1.3 [113] bit64_4.5.2 Rhdf5lib_1.27.0 [115] KEGGREST_1.45.1 statmod_1.5.0 [117] mzR_2.39.2 igraph_2.1.1 [119] 
memoise_2.0.1 affyio_1.75.1 [121] bit_4.5.0"},{"path":[]},{"path":"https://rformassspectrometry.github.io/metabonaut/articles/a-end-to-end-untargeted-metabolomics.html","id":"appendix","dir":"Articles","previous_headings":"","what":"Appendix","title":"Complete end-to-end LC-MS/MS Metabolomic Data analysis","text":"Thanks Steffen Neumann continuous work develop maintain xcms software. … align data set using internal standards. suggested eventually enrich anchor peaks signal ions retention time regions covered internal standards.","code":""},{"path":"https://rformassspectrometry.github.io/metabonaut/articles/a-end-to-end-untargeted-metabolomics.html","id":"aknowledgment","dir":"Articles","previous_headings":"","what":"Aknowledgment","title":"Complete end-to-end LC-MS/MS Metabolomic Data analysis","text":"Thanks Steffen Neumann continuous work develop maintain xcms software. …","code":""},{"path":"https://rformassspectrometry.github.io/metabonaut/articles/a-end-to-end-untargeted-metabolomics.html","id":"alignment-using-manually-selected-anchor-peaks","dir":"Articles","previous_headings":"","what":"Alignment using manually selected anchor peaks","title":"Complete end-to-end LC-MS/MS Metabolomic Data analysis","text":"align data set using internal standards. 
suggested eventually enrich anchor peaks signal ions retention time regions covered internal standards.","code":""},{"path":"https://rformassspectrometry.github.io/metabonaut/articles/a-end-to-end-untargeted-metabolomics.html","id":"additional-informations","dir":"Articles","previous_headings":"","what":"Additional informations","title":"Complete end-to-end LC-MS/MS Metabolomic Data analysis","text":"","code":"#possible extra info: # -"},{"path":"https://rformassspectrometry.github.io/metabonaut/articles/alignment-to-external-dataset.html","id":"introduction","dir":"Articles","previous_headings":"","what":"Introduction","title":"Seamless Alignment: Merging New Data with and Existing Preprocessed Dataset","text":"certain experiments, aligning datasets recorded different times necessary. can involve comparing runs samples different laboratories matching MS2 data initial MS1 run. Variation retention time across laboratories LC systems often requires alignment step using adjustRtime() LamaParama parameter. described data description vignette, samples run twice: LC-MS mode LC-MS/MS mode. tutorial show align LC-MS/MS run preprocessed LC-MS dataset. following packages needed: Setting parallel processing improve efficiency process: First, let’s load pre-processed LC-MS object, steps get object shown End--end worflow vignette. Next, load unprocessed LC-MS/MS data MetaboLights database: adjust sampleData() LC-MS/MS object make easier access: Table 10. Samples LC-MS/MS data set. keep MS runs (MS/MS) remove pooled samples, focusing samples E common runs. alignment, ensure retention time (RT) ranges match datasets: need adjust RT range LC-MS/MS object match LC-MS data: evaluate retention time shifts, ’ll plot base peak chromatogram (BPC): Compare run1 sample run2 sample Similarly, compare BPC sample E: Perform peak detection refining alignment, detailed end--end vignette. setting applied. Now, attempt align two samples previous dataset. 
first step extract landmark features (referred lamas). achieve , identify features present every phenotype group lcms1 dataset. , categorize (using factor()) data phenotype retain QC samples. variable utilized filter features using PercentMissingFilter parameter within filterFeatures() function. , setting threshold = 0 select features present QC samples. lamas input look like alignment. terms method works, alignment algorithm matches chromatographic peaks experimental data lamas, fitting model based match adjust retention times minimize differences two datasets. Now can define param object LamaParama prepare alignment. Parameters tolerance, toleranceRt, ppm relate matching chromatographic peaks lamas. parameters related type fitting generated data points. details parameter overall method can found searching ?adjustRtime. example using default parameters. matchLamaChromPeaks() function facilitates assessment well lamas correspond chromatographic peaks file. extract matched results using matchedRtimes() function. used later evaluate alignment. Now can adjust retention time LC-MS/MS dataset using adjustRtime() function. extract base peak chromatogram (BPC) aligned object: evaluate performance alignment process, generate plots comparing alignment reference dataset (black) LC-MS data (red) (blue) adjustment. Although overall matching imperfect due initial sample issues, certain regions show significant improvement. alignment signal’s start particularly well done. Specifically, regions right 150 seconds show substantial improvement. visualization distribution chromatographic peaks matched anchor peaks (Lamas) Sample . red vertical lines represent positions matched peaks. quantitatively assess quality alignment, compute distance chromatographic peaks LC-MS data anchor peaks (Lamas) alignment. library(vioplot) Furthermore, detailed examination matching model used fitting file possible. Numerical information can obtained using summarizeLamaMatch() function. 
, percentage chromatographic peaks utilized alignment can computed relative total number peaks file. Additionally, feasible directly plot() param object file interest, showcasing distribution chromatographic peaks along fitted model line. tutorial demonstrated align LC-MS LC-MS/MS datasets correct retention time shifts, crucial handling data different runs platforms. preprocessed data, detected chromatographic peaks, used landmark features (lamas) QC samples adjust retention times via adjustRtime() function. Visual comparisons base peak chromatograms alignment, along distance calculations, showed clear improvements RT synchronization. Ultimately, aligning chromatographic data ensures subsequent analyses, feature extraction statistical comparisons, based consistent time points, improving data quality reliability. tutorial outlined end--end workflow can adapted various LC-MS-based metabolomics studies, helping researchers manage retention time variation effectively.","code":"library(MsIO) library(MsBackendMetaboLights) library(xcms) library(MsExperiment) library(Spectra) library(vioplot) #' Set up parallel processing using 2 cores if (.Platform$OS.type == \"unix\") { register(MulticoreParam(2)) } else { register(SnowParam(2)) } load(\"/shared/data/preprocessed_lcms1.RData\") #' Load form the MetaboLights Database param <- MetaboLightsParam(mtblsId = \"MTBLS8735\", assayName = paste0(\"a_MTBLS8735_LC-MSMS_positive_\", \"hilic_metabolite_profiling.txt\"), filePattern = \".mzML\") lcms2 <- readMsObject(MsExperiment(), param, keepOntology = FALSE, keepProtocol = FALSE, simplify = TRUE) #adjust sampleData colnames(sampleData(lcms2)) <- c(\"sample_name\", \"derived_spectra_data_file\", \"metabolite_asssignment_file\", \"source_name\", \"organism\", \"blood_sample_type\", \"sample_type\", \"age\", \"unit\", \"phenotype\") #let's look at the updated sample data sampleData(lcms2)[, c(\"derived_spectra_data_file\", \"phenotype\", \"sample_name\", \"age\")] |> kable(format = 
\"pipe\") # Only keep MS run lcms2 <- lcms2[!grepl(\"MSMS\", sampleData(lcms2)$derived_spectra_data_file),] range(rtime(lcms1)) [1] 9.674428 240.115311 range(rtime(lcms2)) [1] 0.275 480.176 #' Filter the data to the same RT range as the LC-MS run lcms2 <- filterRt(lcms2, range(rtime(lcms1))) idx_A <- which(sampleData(lcms1)$sample_name == \"A\") idx_E <- which(sampleData(lcms1)$sample_name == \"E\") bpc1 <-chromatogram(lcms1[c(idx_A,idx_E)], aggregationFun = \"max\", msLevel = 1) Processing chromatographic peaks bpc2 <- chromatogram(lcms2, aggregationFun = \"max\", msLevel = 1) plot(bpc1[1,1], col = \"#00000080\", main = \"BPC sample A LC-MS vs A LC-MS/MS\", lwd = 1.5, peakType = \"none\") grid() points(rtime(bpc2[1, 1]), intensity(bpc2[1, 1]), col = \"#0000ff80\", type = \"l\") legend(\"topleft\", col = c(\"#00000080\", \"#0000ff80\"), legend = c(\"LC-MS\", \"LC-MS/MS\"), lty = 1, lwd = 2, horiz = TRUE, bty = \"n\") plot(bpc1[1, 2], col = \"#00000080\", main = \"BPC sample E LC-MS vs E LC-MS/MS\", lwd = 1.5, peakType = \"none\") grid() points(rtime(bpc2[1, 2]), intensity(bpc2[1, 2]), col = \"#0000ff80\", type = \"l\") legend(\"topleft\", col = c(\"#00000080\", \"#0000ff80\"), legend = c(\"LC-MS\", \"LC-MS/MS\"), lty = 1, lwd = 2, horiz = TRUE, bty = \"n\") param <- CentWaveParam(peakwidth = c(1, 8), ppm = 15, integrate = 2) lcms2 <- findChromPeaks(lcms2, param = param, chunkSize = 2) param <- MergeNeighboringPeaksParam(expandRt = 2.5, expandMz = 0.0015, minProp = 0.75) lcms2 <- refineChromPeaks(lcms2, param = param, chunkSize = 2) f <- sampleData(lcms1)$phenotype f[f != \"QC\"] <- NA lcms1 <- filterFeatures(lcms1, PercentMissingFilter(threshold = 0, f = f), filled = FALSE) 3694 features were removed lcms1_mz_rt <- featureDefinitions(lcms1)[, c(\"mzmed\",\"rtmed\")] head(lcms1_mz_rt) mzmed rtmed FT0001 50.98979 203.6001 FT0002 51.05904 191.1675 FT0003 51.98657 203.1467 FT0004 53.02036 203.2343 FT0005 53.52080 203.1936 FT0007 54.01010 235.9032 nrow(lcms1_mz_rt) [1] 
5374 param <- LamaParama(lamas = lcms1_mz_rt, method = \"loess\", span = 0.5, outlierTolerance = 3, zeroWeight = 10, ppm =20, tolerance = 0, toleranceRt = 20, bs = \"tp\") param <- matchLamasChromPeaks(lcms2, param = param) ref_vs_obs <- matchedRtimes(param) #' input into `adjustRtime()` lcms2 <- adjustRtime(lcms2, param = param) lcms2 <- applyAdjustedRtime(lcms2) #' evaluate the results with BPC bpc2_adj <- chromatogram(lcms2, aggregationFun = \"max\", msLevel = 1) #' BPC of sample A par(mfrow = c(2, 1), mar = c(2.5, 2.5, 2.5, 0.5), mgp = c(1.5, 0.5, 0)) plot(bpc1[1, 1], col = \"#00000080\", main = \"Before Alignment\", lwd = 1.5, peakType = \"none\", xlab = NA) grid() points(rtime(bpc2[1,1]), intensity(bpc2[1,1]), type = \"l\", col = \"#0000ff80\") legend(\"topleft\", col = c(\"#00000080\", \"#0000ff80\"), legend = c(\"LC-MS\", \"LC-MS/MS\"), lty = 1, lwd = 2, horiz = TRUE, bty = \"n\") plot(bpc1[1, 1], col = \"#00000080\", main = \"After Alignment\", lwd = 1.5, peakType = \"none\", xlab = \"rtime (s)\") grid() points(rtime(bpc2_adj[1,1]), intensity(bpc2_adj[1,1]), type = \"l\", col = \"#0000ff80\") #' BPC of sample B par(mfrow = c(2, 1), mar = c(2.5, 2.5, 2.5, 0.5), mgp = c(1.5, 0.5, 0)) plot(bpc1[1, 2], col = \"#00000080\", main = \"Before Alignment\", lwd = 1.5, peakType = \"none\", xlab = NA) grid() points(rtime(bpc2[1, 2]), intensity(bpc2[1, 2]), type = \"l\", col = \"#0000ff80\") legend(\"topleft\", col = c(\"#00000080\", \"#0000ff80\"), legend = c(\"LC-MS\", \"LC-MS/MS\"), lty = 1, lwd = 2, horiz = TRUE, bty = \"n\") plot(bpc1[1, 2], col = \"#00000080\", main = \"After Alignment\", lwd = 1.5, peakType = \"none\", xlab = \"rtime (s)\") grid() points(rtime(bpc2_adj[1, 2]), intensity(bpc2_adj[1, 2]), type = \"l\", col = \"#0000ff80\") #' BPC of the first sample with matches to lamas overlay par(mfrow = c(1, 1)) plot(bpc1[1, 1], col = \"#00000080\", main = \"Distribution CP matched to Lamas\", lwd = 1.5, peakType = \"none\") points(rtime(bpc2_adj[1, 1]), 
intensity(bpc2_adj[1, 1]), type = \"l\", col = \"#0000ff80\") grid() abline(v = ref_vs_obs[[1]]$obs, col = \"#c4114510\") # Extract data for sample 3 directly ref_obs_sample_1 <- ref_vs_obs[[\"1\"]] # Calculate distances before and after alignment dist_before <- abs(ref_obs_sample_1$obs - ref_obs_sample_1$ref) dist_after <- abs(chromPeaks(lcms2)[ref_obs_sample_1$chromPeaksId, \"rt\"] - ref_obs_sample_1$ref) # Create a data frame for plotting distances <- data.frame( Distance = c(dist_before, dist_after), Alignment = rep(c(\"Before\", \"After\"), each = length(dist_before)) ) # Set factor levels for Alignment to ensure correct order distances$Alignment <- factor(distances$Alignment, levels = c(\"Before\", \"After\")) # Plot distances between anchor peaks between the two runs before and after alignment. vioplot(Distance ~ Alignment, data = distances, xlab = \"\", rectCol = \"#c4114580\", lineCol = \"white\", col=\"#17138fe8\", border = \"white\", ylab = \"Distance (s)\", main = \"Distance to Anchor Peaks: Before vs. After Alignment\") #' Access summary of matches and model information summary <- summarizeLamaMatch(param) summary Total_peaks Matched_peaks Total_lamas Model_summary 1 6832 1825 5374 1666, c(.... 2 6860 1785 5374 1617, c(.... 3 7588 2082 5374 1869, c(.... #' Coverage for each file summary$Matched_peaks / summary$Total_peaks * 100 [1] 26.71253 26.02041 27.43806 #' Access the information on the model of for the first file summary$Model_summary[[1]] Call: loess(formula = ref ~ obs, data = rt_map, weights = weights, span = span) Number of Observations: 1666 Equivalent Number of Parameters: 7.38 Residual Standard Error: 2.315 Trace of smoother matrix: 8.13 (exact) Control settings: span : 0.5 degree : 2 family : gaussian surface : interpolate cell = 0.2 normalize: TRUE parametric: FALSE drop.square: FALSE #' Plot obs vs. 
lcms1 with fitting line plot(param, index = 1L, main = \"ChromPeaks versus Lamas for sample A\", colPoint = \"red\") abline(0, 1, lty = 3, col = \"grey\") grid()"},{"path":"https://rformassspectrometry.github.io/metabonaut/articles/alignment-to-external-dataset.html","id":"load-preprocessed-lc-ms-object","dir":"Articles","previous_headings":"","what":"Load preprocessed LC-MS object","title":"Seamless Alignment: Merging New Data with and Existing Preprocessed Dataset","text":"First, let’s load pre-processed LC-MS object, steps get object shown End--end worflow vignette.","code":"load(\"/shared/data/preprocessed_lcms1.RData\")"},{"path":"https://rformassspectrometry.github.io/metabonaut/articles/alignment-to-external-dataset.html","id":"load-unprocessed-lc-msms-data","dir":"Articles","previous_headings":"","what":"Load unprocessed LC-MS/MS data","title":"Seamless Alignment: Merging New Data with and Existing Preprocessed Dataset","text":"Next, load unprocessed LC-MS/MS data MetaboLights database: adjust sampleData() LC-MS/MS object make easier access: Table 10. Samples LC-MS/MS data set. keep MS runs (MS/MS) remove pooled samples, focusing samples E common runs. 
alignment, ensure retention time (RT) ranges match datasets: need adjust RT range LC-MS/MS object match LC-MS data:","code":"#' Load form the MetaboLights Database param <- MetaboLightsParam(mtblsId = \"MTBLS8735\", assayName = paste0(\"a_MTBLS8735_LC-MSMS_positive_\", \"hilic_metabolite_profiling.txt\"), filePattern = \".mzML\") lcms2 <- readMsObject(MsExperiment(), param, keepOntology = FALSE, keepProtocol = FALSE, simplify = TRUE) #adjust sampleData colnames(sampleData(lcms2)) <- c(\"sample_name\", \"derived_spectra_data_file\", \"metabolite_asssignment_file\", \"source_name\", \"organism\", \"blood_sample_type\", \"sample_type\", \"age\", \"unit\", \"phenotype\") #let's look at the updated sample data sampleData(lcms2)[, c(\"derived_spectra_data_file\", \"phenotype\", \"sample_name\", \"age\")] |> kable(format = \"pipe\") # Only keep MS run lcms2 <- lcms2[!grepl(\"MSMS\", sampleData(lcms2)$derived_spectra_data_file),] range(rtime(lcms1)) [1] 9.674428 240.115311 range(rtime(lcms2)) [1] 0.275 480.176 #' Filter the data to the same RT range as the LC-MS run lcms2 <- filterRt(lcms2, range(rtime(lcms1)))"},{"path":"https://rformassspectrometry.github.io/metabonaut/articles/alignment-to-external-dataset.html","id":"comparing-chromatograms","dir":"Articles","previous_headings":"","what":"Comparing chromatograms","title":"Seamless Alignment: Merging New Data with and Existing Preprocessed Dataset","text":"evaluate retention time shifts, ’ll plot base peak chromatogram (BPC): Compare run1 sample run2 sample Similarly, compare BPC sample E:","code":"idx_A <- which(sampleData(lcms1)$sample_name == \"A\") idx_E <- which(sampleData(lcms1)$sample_name == \"E\") bpc1 <-chromatogram(lcms1[c(idx_A,idx_E)], aggregationFun = \"max\", msLevel = 1) Processing chromatographic peaks bpc2 <- chromatogram(lcms2, aggregationFun = \"max\", msLevel = 1) plot(bpc1[1,1], col = \"#00000080\", main = \"BPC sample A LC-MS vs A LC-MS/MS\", lwd = 1.5, peakType = \"none\") grid() 
points(rtime(bpc2[1, 1]), intensity(bpc2[1, 1]), col = \"#0000ff80\", type = \"l\") legend(\"topleft\", col = c(\"#00000080\", \"#0000ff80\"), legend = c(\"LC-MS\", \"LC-MS/MS\"), lty = 1, lwd = 2, horiz = TRUE, bty = \"n\") plot(bpc1[1, 2], col = \"#00000080\", main = \"BPC sample E LC-MS vs E LC-MS/MS\", lwd = 1.5, peakType = \"none\") grid() points(rtime(bpc2[1, 2]), intensity(bpc2[1, 2]), col = \"#0000ff80\", type = \"l\") legend(\"topleft\", col = c(\"#00000080\", \"#0000ff80\"), legend = c(\"LC-MS\", \"LC-MS/MS\"), lty = 1, lwd = 2, horiz = TRUE, bty = \"n\")"},{"path":"https://rformassspectrometry.github.io/metabonaut/articles/alignment-to-external-dataset.html","id":"peak-detection","dir":"Articles","previous_headings":"","what":"Peak detection","title":"Seamless Alignment: Merging New Data with and Existing Preprocessed Dataset","text":"Perform peak detection refining alignment, detailed end--end vignette. setting applied.","code":"param <- CentWaveParam(peakwidth = c(1, 8), ppm = 15, integrate = 2) lcms2 <- findChromPeaks(lcms2, param = param, chunkSize = 2) param <- MergeNeighboringPeaksParam(expandRt = 2.5, expandMz = 0.0015, minProp = 0.75) lcms2 <- refineChromPeaks(lcms2, param = param, chunkSize = 2)"},{"path":"https://rformassspectrometry.github.io/metabonaut/articles/alignment-to-external-dataset.html","id":"alignment","dir":"Articles","previous_headings":"","what":"Alignment","title":"Seamless Alignment: Merging New Data with and Existing Preprocessed Dataset","text":"Now, attempt align two samples previous dataset. first step extract landmark features (referred lamas). achieve , identify features present every phenotype group lcms1 dataset. , categorize (using factor()) data phenotype retain QC samples. variable utilized filter features using PercentMissingFilter parameter within filterFeatures() function. , setting threshold = 0 select features present QC samples. lamas input look like alignment. 
terms method works, alignment algorithm matches chromatographic peaks experimental data lamas, fitting model based match adjust retention times minimize differences two datasets. Now can define param object LamaParama prepare alignment. Parameters tolerance, toleranceRt, ppm relate matching chromatographic peaks lamas. parameters related type fitting generated data points. details parameter overall method can found searching ?adjustRtime. example using default parameters. matchLamaChromPeaks() function facilitates assessment well lamas correspond chromatographic peaks file. extract matched results using matchedRtimes() function. used later evaluate alignment. Now can adjust retention time LC-MS/MS dataset using adjustRtime() function.","code":"f <- sampleData(lcms1)$phenotype f[f != \"QC\"] <- NA lcms1 <- filterFeatures(lcms1, PercentMissingFilter(threshold = 0, f = f), filled = FALSE) 3694 features were removed lcms1_mz_rt <- featureDefinitions(lcms1)[, c(\"mzmed\",\"rtmed\")] head(lcms1_mz_rt) mzmed rtmed FT0001 50.98979 203.6001 FT0002 51.05904 191.1675 FT0003 51.98657 203.1467 FT0004 53.02036 203.2343 FT0005 53.52080 203.1936 FT0007 54.01010 235.9032 nrow(lcms1_mz_rt) [1] 5374 param <- LamaParama(lamas = lcms1_mz_rt, method = \"loess\", span = 0.5, outlierTolerance = 3, zeroWeight = 10, ppm =20, tolerance = 0, toleranceRt = 20, bs = \"tp\") param <- matchLamasChromPeaks(lcms2, param = param) ref_vs_obs <- matchedRtimes(param) #' input into `adjustRtime()` lcms2 <- adjustRtime(lcms2, param = param) lcms2 <- applyAdjustedRtime(lcms2)"},{"path":"https://rformassspectrometry.github.io/metabonaut/articles/alignment-to-external-dataset.html","id":"evaluation","dir":"Articles","previous_headings":"","what":"Evaluation","title":"Seamless Alignment: Merging New Data with and Existing Preprocessed Dataset","text":"extract base peak chromatogram (BPC) aligned object: evaluate performance alignment process, generate plots comparing alignment reference dataset (black) LC-MS 
data (red) (blue) adjustment. Although overall matching imperfect due initial sample issues, certain regions show significant improvement. alignment signal’s start particularly well done. Specifically, regions right 150 seconds show substantial improvement. visualization distribution chromatographic peaks matched anchor peaks (Lamas) Sample . red vertical lines represent positions matched peaks. quantitatively assess quality alignment, compute distance chromatographic peaks LC-MS data anchor peaks (Lamas) alignment. library(vioplot) Furthermore, detailed examination matching model used fitting file possible. Numerical information can obtained using summarizeLamaMatch() function. , percentage chromatographic peaks utilized alignment can computed relative total number peaks file. Additionally, feasible directly plot() param object file interest, showcasing distribution chromatographic peaks along fitted model line.","code":"#' evaluate the results with BPC bpc2_adj <- chromatogram(lcms2, aggregationFun = \"max\", msLevel = 1) #' BPC of sample A par(mfrow = c(2, 1), mar = c(2.5, 2.5, 2.5, 0.5), mgp = c(1.5, 0.5, 0)) plot(bpc1[1, 1], col = \"#00000080\", main = \"Before Alignment\", lwd = 1.5, peakType = \"none\", xlab = NA) grid() points(rtime(bpc2[1,1]), intensity(bpc2[1,1]), type = \"l\", col = \"#0000ff80\") legend(\"topleft\", col = c(\"#00000080\", \"#0000ff80\"), legend = c(\"LC-MS\", \"LC-MS/MS\"), lty = 1, lwd = 2, horiz = TRUE, bty = \"n\") plot(bpc1[1, 1], col = \"#00000080\", main = \"After Alignment\", lwd = 1.5, peakType = \"none\", xlab = \"rtime (s)\") grid() points(rtime(bpc2_adj[1,1]), intensity(bpc2_adj[1,1]), type = \"l\", col = \"#0000ff80\") #' BPC of sample B par(mfrow = c(2, 1), mar = c(2.5, 2.5, 2.5, 0.5), mgp = c(1.5, 0.5, 0)) plot(bpc1[1, 2], col = \"#00000080\", main = \"Before Alignment\", lwd = 1.5, peakType = \"none\", xlab = NA) grid() points(rtime(bpc2[1, 2]), intensity(bpc2[1, 2]), type = \"l\", col = \"#0000ff80\") legend(\"topleft\", 
col = c(\"#00000080\", \"#0000ff80\"), legend = c(\"LC-MS\", \"LC-MS/MS\"), lty = 1, lwd = 2, horiz = TRUE, bty = \"n\") plot(bpc1[1, 2], col = \"#00000080\", main = \"After Alignment\", lwd = 1.5, peakType = \"none\", xlab = \"rtime (s)\") grid() points(rtime(bpc2_adj[1, 2]), intensity(bpc2_adj[1, 2]), type = \"l\", col = \"#0000ff80\") #' BPC of the first sample with matches to lamas overlay par(mfrow = c(1, 1)) plot(bpc1[1, 1], col = \"#00000080\", main = \"Distribution CP matched to Lamas\", lwd = 1.5, peakType = \"none\") points(rtime(bpc2_adj[1, 1]), intensity(bpc2_adj[1, 1]), type = \"l\", col = \"#0000ff80\") grid() abline(v = ref_vs_obs[[1]]$obs, col = \"#c4114510\") # Extract data for sample 3 directly ref_obs_sample_1 <- ref_vs_obs[[\"1\"]] # Calculate distances before and after alignment dist_before <- abs(ref_obs_sample_1$obs - ref_obs_sample_1$ref) dist_after <- abs(chromPeaks(lcms2)[ref_obs_sample_1$chromPeaksId, \"rt\"] - ref_obs_sample_1$ref) # Create a data frame for plotting distances <- data.frame( Distance = c(dist_before, dist_after), Alignment = rep(c(\"Before\", \"After\"), each = length(dist_before)) ) # Set factor levels for Alignment to ensure correct order distances$Alignment <- factor(distances$Alignment, levels = c(\"Before\", \"After\")) # Plot distances between anchor peaks between the two runs before and after alignment. vioplot(Distance ~ Alignment, data = distances, xlab = \"\", rectCol = \"#c4114580\", lineCol = \"white\", col=\"#17138fe8\", border = \"white\", ylab = \"Distance (s)\", main = \"Distance to Anchor Peaks: Before vs. After Alignment\") #' Access summary of matches and model information summary <- summarizeLamaMatch(param) summary Total_peaks Matched_peaks Total_lamas Model_summary 1 6832 1825 5374 1666, c(.... 2 6860 1785 5374 1617, c(.... 3 7588 2082 5374 1869, c(.... 
#' Coverage for each file summary$Matched_peaks / summary$Total_peaks * 100 [1] 26.71253 26.02041 27.43806 #' Access the information on the model of for the first file summary$Model_summary[[1]] Call: loess(formula = ref ~ obs, data = rt_map, weights = weights, span = span) Number of Observations: 1666 Equivalent Number of Parameters: 7.38 Residual Standard Error: 2.315 Trace of smoother matrix: 8.13 (exact) Control settings: span : 0.5 degree : 2 family : gaussian surface : interpolate cell = 0.2 normalize: TRUE parametric: FALSE drop.square: FALSE #' Plot obs vs. lcms1 with fitting line plot(param, index = 1L, main = \"ChromPeaks versus Lamas for sample A\", colPoint = \"red\") abline(0, 1, lty = 3, col = \"grey\") grid()"},{"path":"https://rformassspectrometry.github.io/metabonaut/articles/alignment-to-external-dataset.html","id":"visualizing-alignment-quality","dir":"Articles","previous_headings":"Introduction","what":"Visualizing Alignment Quality","title":"Seamless Alignment: Merging New Data with and Existing Preprocessed Dataset","text":"evaluate performance alignment process, generate plots comparing alignment reference dataset (black) LC-MS data (red) (blue) adjustment. Although overall matching imperfect due initial sample issues, certain regions show significant improvement. alignment signal’s start particularly well done. Specifically, regions right 150 seconds show substantial improvement. visualization distribution chromatographic peaks matched anchor peaks (Lamas) Sample . 
red vertical lines represent positions matched peaks.","code":"#' BPC of sample A par(mfrow = c(2, 1), mar = c(2.5, 2.5, 2.5, 0.5), mgp = c(1.5, 0.5, 0)) plot(bpc1[1, 1], col = \"#00000080\", main = \"Before Alignment\", lwd = 1.5, peakType = \"none\", xlab = NA) grid() points(rtime(bpc2[1,1]), intensity(bpc2[1,1]), type = \"l\", col = \"#0000ff80\") legend(\"topleft\", col = c(\"#00000080\", \"#0000ff80\"), legend = c(\"LC-MS\", \"LC-MS/MS\"), lty = 1, lwd = 2, horiz = TRUE, bty = \"n\") plot(bpc1[1, 1], col = \"#00000080\", main = \"After Alignment\", lwd = 1.5, peakType = \"none\", xlab = \"rtime (s)\") grid() points(rtime(bpc2_adj[1,1]), intensity(bpc2_adj[1,1]), type = \"l\", col = \"#0000ff80\") #' BPC of sample B par(mfrow = c(2, 1), mar = c(2.5, 2.5, 2.5, 0.5), mgp = c(1.5, 0.5, 0)) plot(bpc1[1, 2], col = \"#00000080\", main = \"Before Alignment\", lwd = 1.5, peakType = \"none\", xlab = NA) grid() points(rtime(bpc2[1, 2]), intensity(bpc2[1, 2]), type = \"l\", col = \"#0000ff80\") legend(\"topleft\", col = c(\"#00000080\", \"#0000ff80\"), legend = c(\"LC-MS\", \"LC-MS/MS\"), lty = 1, lwd = 2, horiz = TRUE, bty = \"n\") plot(bpc1[1, 2], col = \"#00000080\", main = \"After Alignment\", lwd = 1.5, peakType = \"none\", xlab = \"rtime (s)\") grid() points(rtime(bpc2_adj[1, 2]), intensity(bpc2_adj[1, 2]), type = \"l\", col = \"#0000ff80\") #' BPC of the first sample with matches to lamas overlay par(mfrow = c(1, 1)) plot(bpc1[1, 1], col = \"#00000080\", main = \"Distribution CP matched to Lamas\", lwd = 1.5, peakType = \"none\") points(rtime(bpc2_adj[1, 1]), intensity(bpc2_adj[1, 1]), type = \"l\", col = \"#0000ff80\") grid() abline(v = ref_vs_obs[[1]]$obs, col = \"#c4114510\")"},{"path":"https://rformassspectrometry.github.io/metabonaut/articles/alignment-to-external-dataset.html","id":"quantitative-evaluation-of-alignment","dir":"Articles","previous_headings":"Introduction","what":"Quantitative Evaluation of Alignment","title":"Seamless Alignment: Merging New 
Data with and Existing Preprocessed Dataset","text":"quantitatively assess quality alignment, compute distance chromatographic peaks LC-MS data anchor peaks (Lamas) alignment. library(vioplot) Furthermore, detailed examination matching model used fitting file possible. Numerical information can obtained using summarizeLamaMatch() function. , percentage chromatographic peaks utilized alignment can computed relative total number peaks file. Additionally, feasible directly plot() param object file interest, showcasing distribution chromatographic peaks along fitted model line.","code":"# Extract data for sample 3 directly ref_obs_sample_1 <- ref_vs_obs[[\"1\"]] # Calculate distances before and after alignment dist_before <- abs(ref_obs_sample_1$obs - ref_obs_sample_1$ref) dist_after <- abs(chromPeaks(lcms2)[ref_obs_sample_1$chromPeaksId, \"rt\"] - ref_obs_sample_1$ref) # Create a data frame for plotting distances <- data.frame( Distance = c(dist_before, dist_after), Alignment = rep(c(\"Before\", \"After\"), each = length(dist_before)) ) # Set factor levels for Alignment to ensure correct order distances$Alignment <- factor(distances$Alignment, levels = c(\"Before\", \"After\")) # Plot distances between anchor peaks between the two runs before and after alignment. vioplot(Distance ~ Alignment, data = distances, xlab = \"\", rectCol = \"#c4114580\", lineCol = \"white\", col=\"#17138fe8\", border = \"white\", ylab = \"Distance (s)\", main = \"Distance to Anchor Peaks: Before vs. After Alignment\") #' Access summary of matches and model information summary <- summarizeLamaMatch(param) summary Total_peaks Matched_peaks Total_lamas Model_summary 1 6832 1825 5374 1666, c(.... 2 6860 1785 5374 1617, c(.... 3 7588 2082 5374 1869, c(.... 
#' Coverage for each file summary$Matched_peaks / summary$Total_peaks * 100 [1] 26.71253 26.02041 27.43806 #' Access the information on the model of for the first file summary$Model_summary[[1]] Call: loess(formula = ref ~ obs, data = rt_map, weights = weights, span = span) Number of Observations: 1666 Equivalent Number of Parameters: 7.38 Residual Standard Error: 2.315 Trace of smoother matrix: 8.13 (exact) Control settings: span : 0.5 degree : 2 family : gaussian surface : interpolate cell = 0.2 normalize: TRUE parametric: FALSE drop.square: FALSE #' Plot obs vs. lcms1 with fitting line plot(param, index = 1L, main = \"ChromPeaks versus Lamas for sample A\", colPoint = \"red\") abline(0, 1, lty = 3, col = \"grey\") grid()"},{"path":"https://rformassspectrometry.github.io/metabonaut/articles/alignment-to-external-dataset.html","id":"conclusion","dir":"Articles","previous_headings":"","what":"Conclusion","title":"Seamless Alignment: Merging New Data with and Existing Preprocessed Dataset","text":"tutorial demonstrated align LC-MS LC-MS/MS datasets correct retention time shifts, crucial handling data different runs platforms. preprocessed data, detected chromatographic peaks, used landmark features (lamas) QC samples adjust retention times via adjustRtime() function. Visual comparisons base peak chromatograms alignment, along distance calculations, showed clear improvements RT synchronization. Ultimately, aligning chromatographic data ensures subsequent analyses, feature extraction statistical comparisons, based consistent time points, improving data quality reliability. 
tutorial outlined end--end workflow can adapted various LC-MS-based metabolomics studies, helping researchers manage retention time variation effectively.","code":""},{"path":[]},{"path":"https://rformassspectrometry.github.io/metabonaut/articles/dataset-investigation.html","id":"introduction","dir":"Articles","previous_headings":"","what":"Introduction","title":"Dataset investigation: What to do when you get your data","text":", (amazing lab mate) finally finished data acquisition, now dataset hand. ’s next? Unfortunately, work isn’t yet. diving analysis, ’s crucial understand dataset . first step data analysis workflow, ensuring data good quality well-prepared preprocessing downstream analysis plan perform. vignette, present dataset used throughout different vignettes website. ’s far perfect dataset, actually mirrors reality datasets ’ll encounter research. issues indeed specific described dataset. However, purpose vignette encourage think critically data guide steps can help avoid spending hours analysis, realize later samples features removed flagged earlier .","code":""},{"path":"https://rformassspectrometry.github.io/metabonaut/articles/dataset-investigation.html","id":"dataset-description","dir":"Articles","previous_headings":"","what":"Dataset Description","title":"Dataset investigation: What to do when you get your data","text":"workflow, two datasets used: LC-MS-based (MS1 level ) untargeted metabolomics dataset quantify small polar metabolites human plasma samples. additional LC-MS/MS dataset selected samples former study identification annotation significant features. samples randomly selected larger study aimed identifying metabolites varying abundances individuals suffering cardiovascular disease (CVD) healthy controls (CTR). subset analyzed includes data three CVD patients, three CTR individuals, four quality control (QC) samples. 
QC samples, representing pooled serum sample large cohort, measured repeatedly throughout experiment monitor signal stability. data metadata workflow available MetaboLights database ID: MTBLS8735. detailed materials methods used sample analysis also available MetaboLights entry. particularly important understanding analysis parameters used. noted samples analyzed using ultra-high-performance liquid chromatography (UHPLC) coupled Q-TOF mass spectrometer (TripleTOF 5600+), chromatographic separation achieved using hydrophilic interaction liquid chromatography (HILIC). Consider moving visualizations end--end vignette clearer understanding dataset. Provide -depth visualizations explore understand dataset quality. Compare pool lc-ms pool lc-ms/ms show better separation second run.","code":""},{"path":"https://rformassspectrometry.github.io/metabonaut/articles/install_v0.html","id":"running-workflows-locally","dir":"Articles","previous_headings":"","what":"Running workflows locally","title":"Install","text":"install computer packages necessary workflows run code follow: BiocManager::install(“rformassspectrometry/metabonaut”, dependencies = TRUE, ask = FALSE, update = TRUE)","code":"install.packages(\"BiocManager\") ## Packa BiocManager::install(c('RforMassSpectrometry/MsIO', 'RforMassSpectrometry/MsBackendMetaboLights'), ask = FALSE, dependencies = TRUE) BiocManager::install(\"rformassspectrometry/metabonaut\", dependencies = TRUE, ask = FALSE, update = TRUE)"},{"path":"https://rformassspectrometry.github.io/metabonaut/articles/install_v0.html","id":"docker-image","dir":"Articles","previous_headings":"","what":"Docker image","title":"Install","text":"vignettes files along R runtime environment including required packages RStudio (Posit) editor bundled docker container. installation, docker container can run computer code examples vignettes can evaluated within environment (without need install additional packages files). don’t already , install docker. 
Find installation information . Get docker image tutorial e.g. command line : Start docker container, either Docker Desktop, command line Enter http://localhost:8787 web browser log username rstudio password bioc. RStudio server version: open Quarto files vignettes folder evaluate R code blocks document.","code":"docker pull rformassspectrometry/metabonaut:latest docker run -e PASSWORD=bioc -p 8787:8787 rformassspectrometry/metabonaut:latest"},{"path":"https://rformassspectrometry.github.io/metabonaut/authors.html","id":null,"dir":"","previous_headings":"","what":"Authors","title":"Authors and Citation","text":"Philippine Louail. Author, maintainer. ORCID: 0009-0007-5429-6846 Anna Tagliaferri. Contributor. ORCID: 0009-0001-4044-4285 Vinicius Verri Hernandes. Contributor. ORCID: 0000-0002-3057-6460 Daniel Marques de Sá e Silva. Contributor. ORCID: 0000-0002-9674-042X Johannes Rainer. Author. ORCID: 0000-0002-6977-7147","code":""},{"path":"https://rformassspectrometry.github.io/metabonaut/authors.html","id":"citation","dir":"","previous_headings":"","what":"Citation","title":"Authors and Citation","text":"Philippine Louail, & Johannes Rainer. (2024). Streamlining LC-MS/MS Data Analysis R Open-Source xcms RforMassSpectrometry: End--End Workflow (Version v1).Zenodo. https://doi.org/10.5281/zenodo.11370612","code":"@Manual{, title = {Streamlining LC-MS/MS Data Analysis in R with Open-Source xcms and RforMassSpectrometry: An End-to-End Workflow}, author = {Philippine Louail and Johannes Rainer}, publisher = {Zenodo}, year = {2024}, month = {may}, version = {v1.1.0}, doi = {10.5281/zenodo.11370612}, url = {https://doi.org/10.5281/zenodo.11370612}, }"},{"path":"https://rformassspectrometry.github.io/metabonaut/index.html","id":"lets-explore-and-learn-to-analyze-untargeted-metabolomics-data","dir":"","previous_headings":"","what":"Let’s explore and learn to analyze untargeted metabolomics data","title":"Exploring and Analyzing LC-MS Data","text":"Welcome Metabonaut! 
🧑🚀 initiative presents series workflows based small LC-MS/MS dataset, utilizing R Bioconductor packages. Throughout workflows, demonstrate adapt various algorithms specific datasets seamlessly integrate R packages ensure efficient, reproducible processing. primary workflow “Complete End--End LC-MS/MS Metabolomic Data Analysis”. full R code examples, along detailed descriptions, available end--end-untargeted-metabolomics.qmd file. file can opened RStudio, allowing execute individual R command. vignettes website interlinked, can find detailed description dataset used throughout . strive reproducibility. workflows designed remain stable time, allowing run vignettes together one comprehensive “super-vignette”. major change document smaller updates check News","code":""},{"path":"https://rformassspectrometry.github.io/metabonaut/index.html","id":"for-r-beginners","dir":"","previous_headings":"","what":"For R beginners","title":"Exploring and Analyzing LC-MS Data","text":"tutorials provided assume users basic knowledge R RMarkdown. ’re unfamiliar either, recommend completing short tutorial help test code adapt data. vignettes written Quarto format, learn go , farily new format, functionallity shared RMarkdown format, therefore learning can usefull . basic R course documentation, recommand check website try interactive course fun introduction basic R programming. cheatsheet also help.","code":""},{"path":"https://rformassspectrometry.github.io/metabonaut/index.html","id":"known-issues","dir":"","previous_headings":"","what":"Known Issues","title":"Exploring and Analyzing LC-MS Data","text":"just beginning Metabonaut journey, website still refined. ’re actively addressing ongoing issues. ’re aware problem, ’ll list . Currently, known issues code. encounter , please ensure latest versions required packages (detailed ). issue persists, please report reproducible example GitHub . 
encounter issues, don’t hesitate let us know!","code":""},{"path":"https://rformassspectrometry.github.io/metabonaut/index.html","id":"contribution","dir":"","previous_headings":"","what":"Contribution","title":"Exploring and Analyzing LC-MS Data","text":"contributions, please see RforMassSpectrometry contributions guideline.","code":""},{"path":"https://rformassspectrometry.github.io/metabonaut/index.html","id":"code-of-conduct","dir":"","previous_headings":"","what":"Code of Conduct","title":"Exploring and Analyzing LC-MS Data","text":"Please review RforMassSpectrometry Code Conduct.","code":""},{"path":[]},{"path":"https://rformassspectrometry.github.io/metabonaut/news/index.html","id":"changes-in-0-0-2","dir":"Changelog","previous_headings":"","what":"Changes in 0.0.2","title":"metabonaut 0.0.2","text":"Switch Quarto instead Rmarkdown Addition Alignment reference dataset vignette Addition Data investigation vignette Addition Install vignette","code":""},{"path":"https://rformassspectrometry.github.io/metabonaut/news/index.html","id":"changes-in-0-0-2-1","dir":"Changelog","previous_headings":"","what":"Changes in 0.0.1","title":"metabonaut 0.0.2","text":"Addition basic files workflow package. Addition end--end vignette.","code":""}]
+[{"path":"https://rformassspectrometry.github.io/metabonaut/articles/a-end-to-end-untargeted-metabolomics.html","id":"introduction","dir":"Articles","previous_headings":"","what":"Introduction","title":"Complete end-to-end LC-MS/MS Metabolomic Data analysis","text":"present workflow describes steps analysis LC-MS/MS experiment, includes preprocessing raw data generate abundance matrix features various samples, followed data normalization, differential abundance analysis finally annotation features metabolites. Note also alternative analysis options R packages used different steps examples mentioned throughout workflow. Steps end--end workflow possible alternatives","code":""},{"path":"https://rformassspectrometry.github.io/metabonaut/articles/a-end-to-end-untargeted-metabolomics.html","id":"data-description","dir":"Articles","previous_headings":"","what":"Data description","title":"Complete end-to-end LC-MS/MS Metabolomic Data analysis","text":"See data description vignette detailed explanation dataset go workflow general tips done first get data.","code":""},{"path":"https://rformassspectrometry.github.io/metabonaut/articles/a-end-to-end-untargeted-metabolomics.html","id":"packages-needed","dir":"Articles","previous_headings":"","what":"Packages needed","title":"Complete end-to-end LC-MS/MS Metabolomic Data analysis","text":"workflow therefore based following dependencies:","code":"## Data Import and handling library(readxl) library(MsExperiment) library(MsIO) library(MsBackendMetaboLights) library(SummarizedExperiment) ## Preprocessing of LC-MS data library(xcms) library(Spectra) library(MetaboCoreUtils) ## Statistical analysis library(limma) # Differential abundance library(matrixStats) # Summaries over matrices ## Visualisation library(pander) library(RColorBrewer) library(pheatmap) library(vioplot) library(ggfortify) # Plot PCA library(gridExtra) # To arrange multiple ggplots into single plots ## Annotation library(AnnotationHub) # Annotation resources 
library(CompoundDb) # Access small compound annotation data. library(MetaboAnnotation) # Functionality for metabolite annotation."},{"path":"https://rformassspectrometry.github.io/metabonaut/articles/a-end-to-end-untargeted-metabolomics.html","id":"data-import","dir":"Articles","previous_headings":"","what":"Data import","title":"Complete end-to-end LC-MS/MS Metabolomic Data analysis","text":"Note different equipment generate various file extensions, conversion step might needed beforehand, though apply dataset. Spectra package supports variety ways store retrieve MS data, including mzML, mzXML, CDF files, simple flat files, database systems. necessary, several tools, ProteoWizard’s MSConvert, can used convert files .mzML format [@chambers_cross-platform_2012]. show extract dataset MetaboLights database load MsExperiment object. information load data MetaboLights database, refer vignette. type data loading, check xcms vignette specific vignette created data import soon. next configure parallel processing setup. functions xcms package allow per-sample parallel processing, can improve performance analysis, especially large data sets. xcms packages RforMassSpectrometry package ecosystem use parallel processing setup configured BiocParallel Bioconductor package. 
code use fork-based parallel processing unix system, socket-based parallel processing Windows operating system.","code":"param <- MetaboLightsParam(mtblsId = \"MTBLS8735\", assayName = paste0(\"a_MTBLS8735_LC-MS_positive_\", \"hilic_metabolite_profiling.txt\"), filePattern = \".mzML\") lcms1 <- readMsObject(MsExperiment(), param, keepOntology = FALSE, keepProtocol = FALSE, simplify = TRUE) #' Set up parallel processing using 2 cores if (.Platform$OS.type == \"unix\") { register(MulticoreParam(2)) } else { register(SnowParam(2)) }"},{"path":"https://rformassspectrometry.github.io/metabonaut/articles/a-end-to-end-untargeted-metabolomics.html","id":"data-organisation","dir":"Articles","previous_headings":"","what":"Data organisation","title":"Complete end-to-end LC-MS/MS Metabolomic Data analysis","text":"experimental data now represented MsExperiment object MsExperiment package. MsExperiment object container metadata spectral data provides manages also linkage samples spectra. provide brief overview data structure content. sampleData() function extracts sample information object. next extract data use pander package render show information Table 1 . Throughout document use R pipe operator (|>) avoid nested function calls hence improve code readability. sampleData() output MetaboLights always ideal direct easy access data. therefore rename transform user-friendly way. user can add, transform remove column want using base R functionalities. Table 1. Samples data set. Table 2. Simplified sample data. 10 samples data set. abbreviations essential proper interpretation metadata information: injection_index: index representing order (position) individual sample measured (injected) within LC-MS measurement run experiment. \"QC\": Quality control sample (pool serum samples external, large cohort). \"CVD\": Sample individual cardiovascular disease. \"CTR\": Sample presumably healthy control. sample_name: arbitrary name/identifier sample. age: (rounded) age individuals. 
define colors sample groups based sample group using RColorBrewer package: MS data experiment stored Spectra object (Spectra Bioconductor package) within MsExperiment object can accessed using spectra() function. element object spectrum - organised linearly combined Spectra object one (ordered retention time samples). access dataset’s Spectra object summarize available information provide, among things, total number spectra data set. can also summarize number spectra respective MS level (extracted msLevel() function). fromFile() function returns spectrum index sample (data file) can thus used split information (MS level case) sample summarize using base R table() function combine result matrix. Note number spectra acquired run, number spectral features sample. present data set thus contains MS1 data, ideal quantification signal. second (LC-MS/MS) data set also fragment (MS2) spectra samples used later workflow. Note users restrict data evaluation examples shown tutorials. Spectra package enables user-friendly access full MS data functionality extensively used explore, visualize summarize data. another example, determine retention time range entire data set. Data obtained LC-MS experiments typically analyzed along retention time axis, MS data organized spectrum, orthogonal retention time axis.","code":"lcms1 Object of class MsExperiment Spectra: MS1 (17210) Experiment data: 10 sample(s) Sample data links: - spectra: 10 sample(s) to 17210 element(s). 
sampleData(lcms1)[, c(\"Derived_Spectral_Data_File\", \"Characteristics[Sample type]\", \"Factor Value[Phenotype]\", \"Sample Name\", \"Factor Value[Age]\")] |> kable(format = \"pipe\") # Let's rename the column for easier access colnames(sampleData(lcms1)) <- c(\"sample_name\", \"derived_spectra_data_file\", \"metabolite_asssignment_file\", \"source_name\", \"organism\", \"blood_sample_type\", \"sample_type\", \"age\", \"unit\", \"phenotype\") # Add \"QC\" to the phenotype of the QC samples sampleData(lcms1)$phenotype[sampleData(lcms1)$sample_name == \"POOL\"] <- \"QC\" sampleData(lcms1)$sample_name[sampleData(lcms1)$sample_name == \"POOL\" ] <- c(\"POOL1\", \"POOL2\", \"POOL3\", \"POOL4\") # Add injection index column sampleData(lcms1)$injection_index <- seq_len(nrow(sampleData(lcms1))) #let's look at the updated sample data sampleData(lcms1)[, c(\"derived_spectra_data_file\", \"phenotype\", \"sample_name\", \"age\", \"injection_index\")] |> kable(format = \"pipe\") #' Access Spectra Object spectra(lcms1) MSn data (Spectra) with 17210 spectra in a MsBackendMetaboLights backend: msLevel rtime scanIndex 1 1 0.274 1 2 1 0.553 2 3 1 0.832 3 4 1 1.111 4 5 1 1.390 5 ... ... ... ... 17206 1 479.052 1717 17207 1 479.331 1718 17208 1 479.610 1719 17209 1 479.889 1720 17210 1 480.168 1721 ... 36 more variables/columns. file(s): MS_QC_POOL_1_POS.mzML MS_A_POS.mzML MS_B_POS.mzML ... 7 more files #' Count the number of spectra with a specific MS level per file. 
spectra(lcms1) |> msLevel() |> split(fromFile(lcms1)) |> lapply(table) |> do.call(what = cbind) 1 2 3 4 5 6 7 8 9 10 1 1721 1721 1721 1721 1721 1721 1721 1721 1721 1721 #' Retention time range for entire dataset spectra(lcms1) |> rtime() |> range() [1] 0.273 480.169"},{"path":"https://rformassspectrometry.github.io/metabonaut/articles/a-end-to-end-untargeted-metabolomics.html","id":"data-visualization-and-general-quality-assessment","dir":"Articles","previous_headings":"","what":"Data visualization and general quality assessment","title":"Complete end-to-end LC-MS/MS Metabolomic Data analysis","text":"Effective visualization paramount inspecting assessing quality MS data. general overview LC-MS data, can: Combine mass peaks (MS1) spectra sample single spectrum mass peak represents maximum signal mass peaks similar m/z. spectrum might called Base Peak Spectrum (BPS), providing information abundant ions sample. Aggregate mass peak intensities spectrum, resulting Base Peak Chromatogram (BPC). BPC shows highest measured intensity distinct retention time (hence spectrum) thus orthogonal BPS. Sum mass peak intensities spectrum create Total Ion Chromatogram (TIC). Compare BPS samples experiment evaluate similarity ion content. Compare BPC samples experiment identify samples similar dissimilar chromatographic signal. addition general data evaluation visualization, also crucial investigate specific signal e.g. internal standards compounds/ions known present samples. providing reliable reference, internal standards help achieve consistent accurate analytical results. BPS collapses data retention time dimension reveals prevalent ions present samples, creation BPS however straightforward. Mass peaks, even representing signals ion, never identical m/z values consecutive spectra due measurement error/resolution instrument. use combineSpectra function combine spectra one file (defined using parameter f = fromFile(data)) single spectrum. 
mass peaks difference m/z value smaller 3 parts-per-million (ppm) combined one mass peak, intensity representing maximum grouped mass peaks. reduce memory requirement, addition first bin spectrum combining mass peaks within spectrum, aggregating mass peaks bins 0.01 m/z width. case large datasets, also recommended set processingChunkSize() parameter MsExperiment object finite value (default Inf) causing data processed (loaded memory) chunks processingChunkSize() spectra. can reduce memory demand speed process. can now generate BPS sample plot() . , observable overlap ion content files, particularly around 300 m/z 700 m/z. however also differences sets samples. particular, BPS 1, 4, 7 10 (counting row-wise left right) seem different others. fact, four BPS QC samples, remaining six study samples. observed differences might explained fact QC samples pools serum samples different cohort, study samples represent plasma samples, different sample collection. Next visual inspection , can also calculate express similarity BPS heatmap. use compareSpectra() function calculate pairwise similarities BPS use pheatmap() function pheatmap package cluster visualize result. get first glance different samples distribute terms similarity. heatmap confirms observations made BPS, showing distinct clusters QCs study samples, owing different matrices sample collections. also strongly recommended delve deeper data exploring detail. can accomplished carefully assessing data extracting spectra regions interest examination. next chunk, look extract information specific spectrum distinct samples. Figure 3. Intensity m/z values 125th spectrum two CTR samples. significant dissimilarities peak distribution intensity confirm difference composition QCs study samples. next compare full MS1 spectrum CVD CTR sample. Figure 4. Intensity m/z values 125th spectrum one CVD one CTR sample. 
, can observe spectra CVD CTR samples entirely similar, exhibit similar main peaks 200 600 m/z general higher intensity control samples. However peak distribution (least intensity) seems vary m/z 10 210 m/z 600. CTR spectrum exhibits significant peaks around m/z 150 - 200 much lower intensity CVD sample. delve details specific spectrum, wide range functions can employed: Table 3. Intensity m/z values 125th spectrum one CTR sample. chromatogram() function facilitates extraction intensities along retention time. However, access chromatographic information currently efficient seamless spectral information. Work underway develop/improve infrastructure chromatographic data new Chromatograms object aimed flexible user-friendly Spectra object. visualizing LC-MS data, BPC TIC serves valuable tool assess performance liquid chromatography across various samples experiment. case, extract BPC data create plot. BPC captures maximum peak signal spectrum data file plots information retention time spectrum y-axis. BPC can extracted using chromatogram function. setting parameter aggregationFun = \"max\", instruct function report maximum signal per spectrum. Conversely, setting aggregationFun = \"sum\", sums intensities spectrum, thereby creating TIC. Figure 5. BPC samples colored phenotype. 240 seconds signal seems measured. Thus, filter data removing part well first 10 seconds measured LC run. Figure 6. BPC filtering retention time. Initially, examined entire BPC subsequently filtered based desired retention times. results smaller file size also facilitates straightforward interpretation BPC. final plot illustrates BPC sample colored phenotype, providing insights signal measured along retention times sample. reveals points compounds eluted LC column. essence, BPC condenses three-dimensional LC-MS data (m/z retention time intensity) two dimensions (retention time intensity). can also compare similarities BPCs heatmap. retention times however identical different samples. 
Thus bin() chromatographic signal per sample along retention time axis bins two seconds resulting data number bins/data points. can calculate pairwise similarities data vectors using cor() function visualize result using pheatmap(). Figure 7. Heatmap BPC similarities. heatmap reinforces exploration spectra data showed, strong separation QC study samples. important bear mind later analyses. Additionally, study samples group two clusters, cluster containing samples C F cluster II samples. plot TIC samples, using different color cluster. Figure 8. Example TIC unusual signal. TIC samples look similar, samples cluster show different signal retention time range 40 160 seconds. Whether, strong difference impact following analysis remains determined. Throughout entire process, crucial reference points within dataset, well-known ions. experiments nowadays include internal standards (), case . strongly recommend using visualization throughout entire analysis. experiment, set 15 spiked samples. reviewing signal , selected two guide analysis process. However, also advise plot evaluate ions steps. illustrate , generate Extracted Ion Chromatograms (EIC) selected test ions. restricting MS data intensities within restricted, small m/z range selected retention time window, EICs expected contain signal single type ion. expected m/z retention times set determined different experiment. Additionally, cases internal standards available, commonly present ions sample matrix can serve suitable alternatives. Ideally, compounds distributed across entire retention time range experiment. Table 4. Internal standard list respective m/z expected retention time [s]. plot EICs isotope labeled cystine methionine. Figure 9. EIC cystine methionine. can observe clear concentration difference QCs study samples isotope labeled cystine ion. Meanwhile, labeled methionine internal standard exhibits discernible signal amidst noise noticeable retention time shift samples. 
artificially isotope labeled compounds spiked individual samples, also signal endogenous compounds serum (plasma) samples. Thus, calculate next mass m/z [M+H]+ ion endogenous cystine chemical formula extract also EIC ion. calculation exact mass m/z selected ion adduct use calculateMass() mass2mz() functions MetaboCoreUtils package. Figure 10. EIC endogenous cystine vs spiked. two cystine EICs look highly similar (endogenous shown left, isotope labeled right plot ), shift m/z, arises artificial labeling. shift allows us discriminate endogenous non-endogenous compound.","code":"#' Setting the chunksize chunksize <- 1000 processingChunkSize(spectra(lcms1)) <- chunksize #' Combining all spectra per file into a single spectrum bps <- spectra(lcms1) |> bin(binSize = 0.01) |> combineSpectra(f = fromFile(lcms1), intensityFun = max, ppm = 3) #' Plot the base peak spectra par(mar = c(2, 1, 1, 1)) plotSpectra(bps, main= \"\") #' Calculate similarities between BPS sim_matrix <- compareSpectra(bps) #' Add sample names as rownames and colnames rownames(sim_matrix) <- colnames(sim_matrix) <- sampleData(lcms1)$sample_name ann <- data.frame(phenotype = sampleData(lcms1)[, \"phenotype\"]) rownames(ann) <- rownames(sim_matrix) #' Plot the heatmap pheatmap(sim_matrix, annotation_col = ann, annotation_colors = list(phenotype = col_phenotype)) #' Accessing a single spectrum - comparing with QC par(mfrow = c(1,2), mar = c(2, 2, 2, 2)) spec1 <- spectra(lcms1[1])[125] spec2 <- spectra(lcms1[3])[125] plotSpectra(spec1, main = \"QC sample\") plotSpectra(spec2, main = \"CTR sample\") #' Accessing a single spectrum - comparing CVD and CTR par(mfrow = c(1,2), mar = c(2, 2, 2, 2)) spec1 <- spectra(lcms1[2])[125] spec2 <- spectra(lcms1[3])[125] plotSpectra(spec1, main = \"CVD sample\") plotSpectra(spec2, main = \"CTR sample\") #' Checking its intensity intensity(spec2) NumericList of length 1 [[1]] 18.3266733266736 45.1666666666667 ... 
27.1048951048951 34.9020979020979 #' Checking its rtime rtime(spec2) [1] 34.872 #' Checking its m/z mz(spec2) NumericList of length 1 [[1]] 51.1677328505635 53.0461968245186 ... 999.139446289161 999.315208803072 #' Filtering for a specific m/z range and viewing in a tabular format filt_spec <- filterMzRange(spec2,c(50,200)) data.frame(intensity = unlist(intensity(filt_spec)), mz = unlist(mz(filt_spec))) |> head() |> kable(format = \"markdown\") #' Extract and plot BPC for full data bpc <- chromatogram(lcms1, aggregationFun = \"max\") plot(bpc, col = paste0(col_sample, 80), main = \"BPC\", lwd = 1.5) grid() legend(\"topright\", col = col_phenotype, legend = names(col_phenotype), lty = 1, lwd = 2, horiz = TRUE, bty = \"n\") #' Filter the data based on retention time lcms1 <- filterRt(lcms1, c(10, 240)) Filter spectra bpc <- chromatogram(lcms1, aggregationFun = \"max\") #' Plot after filtering plot(bpc, col = paste0(col_sample, 80), main = \"BPC after filtering retention time\", lwd = 1.5) grid() legend(\"topright\", col = col_phenotype, legend = names(col_phenotype), lty = 1, lwd = 2, horiz = TRUE, bty = \"n\") #' Total ion chromatogram tic <- chromatogram(lcms1, aggregationFun = \"sum\") |> bin(binSize = 2) #' Calculate similarity (Pearson correlation) between BPCs ticmap <- do.call(cbind, lapply(tic, intensity)) |> cor() rownames(ticmap) <- colnames(ticmap) <- sampleData(lcms1)$sample_name ann <- data.frame(phenotype = sampleData(lcms1)[, \"phenotype\"]) rownames(ann) <- rownames(ticmap) #' Plot heatmap pheatmap(ticmap, annotation_col = ann, annotation_colors = list(phenotype = col_phenotype)) cluster_I_idx <- sampleData(lcms1)$sample_name %in% c(\"F\", \"C\") cluster_II_idx <- sampleData(lcms1)$sample_name %in% c(\"A\", \"B\", \"D\", \"E\") temp_col <- c(\"grey\", \"red\") names(temp_col) <- c(\"Cluster II\", \"Cluster I\") col <- rep(temp_col[1], length(lcms1)) col[cluster_I_idx] <- temp_col[2] col[sampleData(lcms1)$phenotype == \"QC\"] <- NA lcms1 |> 
chromatogram(aggregationFun = \"sum\") |> plot( col = col, main = \"TIC after filtering retention time\", lwd = 1.5) grid() legend(\"topright\", col = temp_col, legend = names(temp_col), lty = 1, lwd = 2, horiz = TRUE, bty = \"n\") #' Load our list of standard intern_standard <- read.delim(\"intern_standard_list.txt\") # Extract EICs for the list eic_is <- chromatogram( lcms1, rt = as.matrix(intern_standard[, c(\"rtmin\", \"rtmax\")]), mz = as.matrix(intern_standard[, c(\"mzmin\", \"mzmax\")])) #' Add internal standard metadata fData(eic_is)$mz <- intern_standard$mz fData(eic_is)$rt <- intern_standard$RT fData(eic_is)$name <- intern_standard$name fData(eic_is)$abbreviation <- intern_standard$abbreviation rownames(fData(eic_is)) <- intern_standard$abbreviation #' Summary of IS information fData(eic_is)[, c(\"name\", \"mz\", \"rt\")] |> kable(format = \"pipe\") #' Extract the two IS from the chromatogram object. eic_cystine <- eic_is[\"cystine_13C_15N\"] eic_met <- eic_is[\"methionine_13C_15N\"] #' plot both EIC par(mfrow = c(1, 2), mar = c(4, 2, 2, 0.5)) plot(eic_cystine, main = fData(eic_cystine)$name, cex.axis = 0.8, cex.main = 0.8, col = paste0(col_sample, 80)) grid() abline(v = fData(eic_cystine)$rt, col = \"red\", lty = 3) plot(eic_met, main = fData(eic_met)$name, cex.axis = 0.8, cex.main = 0.8, col = paste0(col_sample, 80)) grid() abline(v = fData(eic_met)$rt, col = \"red\", lty = 3) legend(\"topright\", col = col_phenotype, legend = names(col_phenotype), lty = 1, bty = \"n\") #' extract endogenous cystine mass and EIC and plot. 
cysmass <- calculateMass(\"C6H12N2O4S2\") cys_endo <- mass2mz(cysmass, adduct = \"[M+H]+\")[, 1] #' Plot versus spiked par(mfrow = c(1, 2)) chromatogram(lcms1, mz = cys_endo + c(-0.005, 0.005), rt = unlist(fData(eic_cystine)[, c(\"rtmin\", \"rtmax\")]), aggregationFun = \"max\") |> plot(col = paste0(col_sample, 80)) |> grid() plot(eic_cystine, col = paste0(col_sample, 80)) grid() legend(\"topright\", col = col_phenotype, legend = names(col_phenotype), lty = 1, bty = \"n\")"},{"path":"https://rformassspectrometry.github.io/metabonaut/articles/a-end-to-end-untargeted-metabolomics.html","id":"spectra-data-visualization-bps","dir":"Articles","previous_headings":"","what":"Spectra Data Visualization: BPS","title":"Complete end-to-end LC-MS/MS Metabolomic Data analysis","text":"BPS collapses data retention time dimension reveals prevalent ions present samples, creation BPS however straightforward. Mass peaks, even representing signals ion, never identical m/z values consecutive spectra due measurement error/resolution instrument. use combineSpectra function combine spectra one file (defined using parameter f = fromFile(data)) single spectrum. mass peaks difference m/z value smaller 3 parts-per-million (ppm) combined one mass peak, intensity representing maximum grouped mass peaks. reduce memory requirement, addition first bin spectrum combining mass peaks within spectrum, aggregating mass peaks bins 0.01 m/z width. case large datasets, also recommended set processingChunkSize() parameter MsExperiment object finite value (default Inf) causing data processed (loaded memory) chunks processingChunkSize() spectra. can reduce memory demand speed process. can now generate BPS sample plot() . , observable overlap ion content files, particularly around 300 m/z 700 m/z. however also differences sets samples. particular, BPS 1, 4, 7 10 (counting row-wise left right) seem different others. fact, four BPS QC samples, remaining six study samples. 
observed differences might explained fact QC samples pools serum samples different cohort, study samples represent plasma samples, different sample collection. Next visual inspection , can also calculate express similarity BPS heatmap. use compareSpectra() function calculate pairwise similarities BPS use pheatmap() function pheatmap package cluster visualize result. get first glance different samples distribute terms similarity. heatmap confirms observations made BPS, showing distinct clusters QCs study samples, owing different matrices sample collections. also strongly recommended delve deeper data exploring detail. can accomplished carefully assessing data extracting spectra regions interest examination. next chunk, look extract information specific spectrum distinct samples. Figure 3. Intensity m/z values 125th spectrum two CTR samples. significant dissimilarities peak distribution intensity confirm difference composition QCs study samples. next compare full MS1 spectrum CVD CTR sample. Figure 4. Intensity m/z values 125th spectrum one CVD one CTR sample. , can observe spectra CVD CTR samples entirely similar, exhibit similar main peaks 200 600 m/z general higher intensity control samples. However peak distribution (least intensity) seems vary m/z 10 210 m/z 600. CTR spectrum exhibits significant peaks around m/z 150 - 200 much lower intensity CVD sample. delve details specific spectrum, wide range functions can employed: Table 3. 
Intensity m/z values 125th spectrum one CTR sample.","code":"#' Setting the chunksize chunksize <- 1000 processingChunkSize(spectra(lcms1)) <- chunksize #' Combining all spectra per file into a single spectrum bps <- spectra(lcms1) |> bin(binSize = 0.01) |> combineSpectra(f = fromFile(lcms1), intensityFun = max, ppm = 3) #' Plot the base peak spectra par(mar = c(2, 1, 1, 1)) plotSpectra(bps, main= \"\") #' Calculate similarities between BPS sim_matrix <- compareSpectra(bps) #' Add sample names as rownames and colnames rownames(sim_matrix) <- colnames(sim_matrix) <- sampleData(lcms1)$sample_name ann <- data.frame(phenotype = sampleData(lcms1)[, \"phenotype\"]) rownames(ann) <- rownames(sim_matrix) #' Plot the heatmap pheatmap(sim_matrix, annotation_col = ann, annotation_colors = list(phenotype = col_phenotype)) #' Accessing a single spectrum - comparing with QC par(mfrow = c(1,2), mar = c(2, 2, 2, 2)) spec1 <- spectra(lcms1[1])[125] spec2 <- spectra(lcms1[3])[125] plotSpectra(spec1, main = \"QC sample\") plotSpectra(spec2, main = \"CTR sample\") #' Accessing a single spectrum - comparing CVD and CTR par(mfrow = c(1,2), mar = c(2, 2, 2, 2)) spec1 <- spectra(lcms1[2])[125] spec2 <- spectra(lcms1[3])[125] plotSpectra(spec1, main = \"CVD sample\") plotSpectra(spec2, main = \"CTR sample\") #' Checking its intensity intensity(spec2) NumericList of length 1 [[1]] 18.3266733266736 45.1666666666667 ... 27.1048951048951 34.9020979020979 #' Checking its rtime rtime(spec2) [1] 34.872 #' Checking its m/z mz(spec2) NumericList of length 1 [[1]] 51.1677328505635 53.0461968245186 ... 
999.139446289161 999.315208803072 #' Filtering for a specific m/z range and viewing in a tabular format filt_spec <- filterMzRange(spec2,c(50,200)) data.frame(intensity = unlist(intensity(filt_spec)), mz = unlist(mz(filt_spec))) |> head() |> kable(format = \"markdown\")"},{"path":"https://rformassspectrometry.github.io/metabonaut/articles/a-end-to-end-untargeted-metabolomics.html","id":"chromatographic-data-visualization-bpc-and-tic","dir":"Articles","previous_headings":"","what":"Chromatographic Data Visualization: BPC and TIC","title":"Complete end-to-end LC-MS/MS Metabolomic Data analysis","text":"chromatogram() function facilitates extraction intensities along retention time. However, access chromatographic information currently efficient seamless spectral information. Work underway develop/improve infrastructure chromatographic data new Chromatograms object aimed flexible user-friendly Spectra object. visualizing LC-MS data, BPC TIC serves valuable tool assess performance liquid chromatography across various samples experiment. case, extract BPC data create plot. BPC captures maximum peak signal spectrum data file plots information retention time spectrum y-axis. BPC can extracted using chromatogram function. setting parameter aggregationFun = \"max\", instruct function report maximum signal per spectrum. Conversely, setting aggregationFun = \"sum\", sums intensities spectrum, thereby creating TIC. Figure 5. BPC samples colored phenotype. 240 seconds signal seems measured. Thus, filter data removing part well first 10 seconds measured LC run. Figure 6. BPC filtering retention time. Initially, examined entire BPC subsequently filtered based desired retention times. results smaller file size also facilitates straightforward interpretation BPC. final plot illustrates BPC sample colored phenotype, providing insights signal measured along retention times sample. reveals points compounds eluted LC column. 
essence, BPC condenses three-dimensional LC-MS data (m/z retention time intensity) two dimensions (retention time intensity). can also compare similarities BPCs heatmap. retention times however identical different samples. Thus bin() chromatographic signal per sample along retention time axis bins two seconds resulting data number bins/data points. can calculate pairwise similarities data vectors using cor() function visualize result using pheatmap(). Figure 7. Heatmap BPC similarities. heatmap reinforces exploration spectra data showed, strong separation QC study samples. important bear mind later analyses. Additionally, study samples group two clusters, cluster containing samples C F cluster II samples. plot TIC samples, using different color cluster. Figure 8. Example TIC unusual signal. TIC samples look similar, samples cluster show different signal retention time range 40 160 seconds. Whether, strong difference impact following analysis remains determined. Throughout entire process, crucial reference points within dataset, well-known ions. experiments nowadays include internal standards (), case . strongly recommend using visualization throughout entire analysis. experiment, set 15 spiked samples. reviewing signal , selected two guide analysis process. However, also advise plot evaluate ions steps. illustrate , generate Extracted Ion Chromatograms (EIC) selected test ions. restricting MS data intensities within restricted, small m/z range selected retention time window, EICs expected contain signal single type ion. expected m/z retention times set determined different experiment. Additionally, cases internal standards available, commonly present ions sample matrix can serve suitable alternatives. Ideally, compounds distributed across entire retention time range experiment. Table 4. Internal standard list respective m/z expected retention time [s]. plot EICs isotope labeled cystine methionine. Figure 9. EIC cystine methionine. 
can observe clear concentration difference QCs study samples isotope labeled cystine ion. Meanwhile, labeled methionine internal standard exhibits discernible signal amidst noise noticeable retention time shift samples. artificially isotope labeled compounds spiked individual samples, also signal endogenous compounds serum (plasma) samples. Thus, calculate next mass m/z [M+H]+ ion endogenous cystine chemical formula extract also EIC ion. calculation exact mass m/z selected ion adduct use calculateMass() mass2mz() functions MetaboCoreUtils package. Figure 10. EIC endogenous cystine vs spiked. two cystine EICs look highly similar (endogenous shown left, isotope labeled right plot ), shift m/z, arises artificial labeling. shift allows us discriminate endogenous non-endogenous compound.","code":"#' Extract and plot BPC for full data bpc <- chromatogram(lcms1, aggregationFun = \"max\") plot(bpc, col = paste0(col_sample, 80), main = \"BPC\", lwd = 1.5) grid() legend(\"topright\", col = col_phenotype, legend = names(col_phenotype), lty = 1, lwd = 2, horiz = TRUE, bty = \"n\") #' Filter the data based on retention time lcms1 <- filterRt(lcms1, c(10, 240)) Filter spectra bpc <- chromatogram(lcms1, aggregationFun = \"max\") #' Plot after filtering plot(bpc, col = paste0(col_sample, 80), main = \"BPC after filtering retention time\", lwd = 1.5) grid() legend(\"topright\", col = col_phenotype, legend = names(col_phenotype), lty = 1, lwd = 2, horiz = TRUE, bty = \"n\") #' Total ion chromatogram tic <- chromatogram(lcms1, aggregationFun = \"sum\") |> bin(binSize = 2) #' Calculate similarity (Pearson correlation) between BPCs ticmap <- do.call(cbind, lapply(tic, intensity)) |> cor() rownames(ticmap) <- colnames(ticmap) <- sampleData(lcms1)$sample_name ann <- data.frame(phenotype = sampleData(lcms1)[, \"phenotype\"]) rownames(ann) <- rownames(ticmap) #' Plot heatmap pheatmap(ticmap, annotation_col = ann, annotation_colors = list(phenotype = col_phenotype)) cluster_I_idx <- 
sampleData(lcms1)$sample_name %in% c(\"F\", \"C\") cluster_II_idx <- sampleData(lcms1)$sample_name %in% c(\"A\", \"B\", \"D\", \"E\") temp_col <- c(\"grey\", \"red\") names(temp_col) <- c(\"Cluster II\", \"Cluster I\") col <- rep(temp_col[1], length(lcms1)) col[cluster_I_idx] <- temp_col[2] col[sampleData(lcms1)$phenotype == \"QC\"] <- NA lcms1 |> chromatogram(aggregationFun = \"sum\") |> plot( col = col, main = \"TIC after filtering retention time\", lwd = 1.5) grid() legend(\"topright\", col = temp_col, legend = names(temp_col), lty = 1, lwd = 2, horiz = TRUE, bty = \"n\") #' Load our list of standard intern_standard <- read.delim(\"intern_standard_list.txt\") # Extract EICs for the list eic_is <- chromatogram( lcms1, rt = as.matrix(intern_standard[, c(\"rtmin\", \"rtmax\")]), mz = as.matrix(intern_standard[, c(\"mzmin\", \"mzmax\")])) #' Add internal standard metadata fData(eic_is)$mz <- intern_standard$mz fData(eic_is)$rt <- intern_standard$RT fData(eic_is)$name <- intern_standard$name fData(eic_is)$abbreviation <- intern_standard$abbreviation rownames(fData(eic_is)) <- intern_standard$abbreviation #' Summary of IS information fData(eic_is)[, c(\"name\", \"mz\", \"rt\")] |> kable(format = \"pipe\") #' Extract the two IS from the chromatogram object. eic_cystine <- eic_is[\"cystine_13C_15N\"] eic_met <- eic_is[\"methionine_13C_15N\"] #' plot both EIC par(mfrow = c(1, 2), mar = c(4, 2, 2, 0.5)) plot(eic_cystine, main = fData(eic_cystine)$name, cex.axis = 0.8, cex.main = 0.8, col = paste0(col_sample, 80)) grid() abline(v = fData(eic_cystine)$rt, col = \"red\", lty = 3) plot(eic_met, main = fData(eic_met)$name, cex.axis = 0.8, cex.main = 0.8, col = paste0(col_sample, 80)) grid() abline(v = fData(eic_met)$rt, col = \"red\", lty = 3) legend(\"topright\", col = col_phenotype, legend = names(col_phenotype), lty = 1, bty = \"n\") #' extract endogenous cystine mass and EIC and plot. 
cysmass <- calculateMass(\"C6H12N2O4S2\") cys_endo <- mass2mz(cysmass, adduct = \"[M+H]+\")[, 1] #' Plot versus spiked par(mfrow = c(1, 2)) chromatogram(lcms1, mz = cys_endo + c(-0.005, 0.005), rt = unlist(fData(eic_cystine)[, c(\"rtmin\", \"rtmax\")]), aggregationFun = \"max\") |> plot(col = paste0(col_sample, 80)) |> grid() plot(eic_cystine, col = paste0(col_sample, 80)) grid() legend(\"topright\", col = col_phenotype, legend = names(col_phenotype), lty = 1, bty = \"n\")"},{"path":"https://rformassspectrometry.github.io/metabonaut/articles/a-end-to-end-untargeted-metabolomics.html","id":"known-compounds","dir":"Articles","previous_headings":"Data visualization and general quality assessment","what":"Known compounds","title":"Complete end-to-end LC-MS/MS Metabolomic Data analysis","text":"Throughout entire process, crucial reference points within dataset, well-known ions. experiments nowadays include internal standards (), case . strongly recommend using visualization throughout entire analysis. experiment, set 15 spiked samples. reviewing signal , selected two guide analysis process. However, also advise plot evaluate ions steps. illustrate , generate Extracted Ion Chromatograms (EIC) selected test ions. restricting MS data intensities within restricted, small m/z range selected retention time window, EICs expected contain signal single type ion. expected m/z retention times set determined different experiment. Additionally, cases internal standards available, commonly present ions sample matrix can serve suitable alternatives. Ideally, compounds distributed across entire retention time range experiment. Table 4. Internal standard list respective m/z expected retention time [s]. plot EICs isotope labeled cystine methionine. Figure 9. EIC cystine methionine. can observe clear concentration difference QCs study samples isotope labeled cystine ion. 
Meanwhile, labeled methionine internal standard exhibits discernible signal amidst noise noticeable retention time shift samples. artificially isotope labeled compounds spiked individual samples, also signal endogenous compounds serum (plasma) samples. Thus, calculate next mass m/z [M+H]+ ion endogenous cystine chemical formula extract also EIC ion. calculation exact mass m/z selected ion adduct use calculateMass() mass2mz() functions MetaboCoreUtils package. Figure 10. EIC endogenous cystine vs spiked. two cystine EICs look highly similar (endogenous shown left, isotope labeled right plot ), shift m/z, arises artificial labeling. shift allows us discriminate endogenous non-endogenous compound.","code":"#' Load our list of standard intern_standard <- read.delim(\"intern_standard_list.txt\") # Extract EICs for the list eic_is <- chromatogram( lcms1, rt = as.matrix(intern_standard[, c(\"rtmin\", \"rtmax\")]), mz = as.matrix(intern_standard[, c(\"mzmin\", \"mzmax\")])) #' Add internal standard metadata fData(eic_is)$mz <- intern_standard$mz fData(eic_is)$rt <- intern_standard$RT fData(eic_is)$name <- intern_standard$name fData(eic_is)$abbreviation <- intern_standard$abbreviation rownames(fData(eic_is)) <- intern_standard$abbreviation #' Summary of IS information fData(eic_is)[, c(\"name\", \"mz\", \"rt\")] |> kable(format = \"pipe\") #' Extract the two IS from the chromatogram object. 
eic_cystine <- eic_is[\"cystine_13C_15N\"] eic_met <- eic_is[\"methionine_13C_15N\"] #' plot both EIC par(mfrow = c(1, 2), mar = c(4, 2, 2, 0.5)) plot(eic_cystine, main = fData(eic_cystine)$name, cex.axis = 0.8, cex.main = 0.8, col = paste0(col_sample, 80)) grid() abline(v = fData(eic_cystine)$rt, col = \"red\", lty = 3) plot(eic_met, main = fData(eic_met)$name, cex.axis = 0.8, cex.main = 0.8, col = paste0(col_sample, 80)) grid() abline(v = fData(eic_met)$rt, col = \"red\", lty = 3) legend(\"topright\", col = col_phenotype, legend = names(col_phenotype), lty = 1, bty = \"n\") #' extract endogenous cystine mass and EIC and plot. cysmass <- calculateMass(\"C6H12N2O4S2\") cys_endo <- mass2mz(cysmass, adduct = \"[M+H]+\")[, 1] #' Plot versus spiked par(mfrow = c(1, 2)) chromatogram(lcms1, mz = cys_endo + c(-0.005, 0.005), rt = unlist(fData(eic_cystine)[, c(\"rtmin\", \"rtmax\")]), aggregationFun = \"max\") |> plot(col = paste0(col_sample, 80)) |> grid() plot(eic_cystine, col = paste0(col_sample, 80)) grid() legend(\"topright\", col = col_phenotype, legend = names(col_phenotype), lty = 1, bty = \"n\")"},{"path":"https://rformassspectrometry.github.io/metabonaut/articles/a-end-to-end-untargeted-metabolomics.html","id":"data-preprocessing","dir":"Articles","previous_headings":"","what":"Data preprocessing","title":"Complete end-to-end LC-MS/MS Metabolomic Data analysis","text":"Preprocessing stands inaugural step analysis untargeted LC-MS. characterized 3 main stages: chromatographic peak detection, retention time shift correction (alignment) correspondence results features defined. primary objective preprocessing quantification signals ions measured sample, addressing potential retention time drifts samples, ensuring alignment quantified signals across samples within experiment. final result LC-MS data preprocessing numeric matrix abundances quantified entities samples experiment. 
initial preprocessing step involves detecting intensity peaks along retention time axis, called chromatographic peaks. achieve , employ findChromPeaks() function within xcms. function supports various algorithms peak detection, can selected configured respective parameter objects. preferred algorithm case, CentWave, utilizes continuous wavelet transformation (CWT)-based peak detection [@tautenhahn_highly_2008]. method known effectiveness handling non-Gaussian shaped chromatographic peaks peaks varying retention time widths, commonly encountered HILIC separations. apply CentWave algorithm default settings extracted ion chromatogram cystine methionine ions evaluate results. CentWave highly performant algorithm, requires customized dataset. implies parameters fine-tuned based user’s data. example serves clear motivation users familiarize various parameters need adapt data set. discuss main parameters can easily adjusted suit user’s dataset: peakwidth: Specifies minimal maximal expected width peaks retention time dimension. Highly dependent chromatographic settings used. ppm: maximal allowed difference mass peaks’ m/z values (parts-per-million) consecutive scans consider representing signal ion. integrate: parameter defines integration method. , primarily use integrate = 2 integrates also signal chromatographic peak’s tail considered accurate developers. determine peakwidth, recommend users refer previous EICs estimate range peak widths observe dataset. Ideally, examining multiple EICs goal. dataset, peak widths appear around 2 10 seconds. advise choosing range wide narrow peakwidth parameter can lead false positives negatives. determine ppm, deeper analysis dataset needed. clarified ppm depends instrument, users necessarily input vendor-advertised ppm. ’s determine accurately possible: following steps involve generating highly restricted MS area single mass peak per spectrum, representing cystine ion. 
m/z peaks extracted, absolute difference calculated finally expressed ppm. therefore, choose value close maximum within range parameter ppm, .e., 15 ppm. can now perform chromatographic peak detection adapted settings EICs. important note , properly estimate background noise, sufficient data points outside chromatographic peak need present. generally problem peak detection performed full LC-MS data set, peak detection EICs retention time range EIC needs sufficiently wide. function fails find peak EIC, initial troubleshooting step increase range. Additionally, signal--noise threshold snthresh reduced peak detection EICs, within small retention time range, enough signal present properly estimate background noise. Finally, case MS1 data points per peaks, setting CentWave’s advanced parameter extendLengthMSW TRUE can help peak detection. customized parameters, chromatographic peak detected sample. , use plot() function EICs visualize results. Figure 11. Chromatographic peak detection EICs. can see peak seems detected sample ions. indicates custom settings seem thus suitable dataset. now proceed apply entire dataset, extracting EICs ions evaluate confirm chromatographic peak detection worked expected. Note: revert value parameter snthresh default, , mentioned , background noise estimation reliable performed full data set. Parameter chunkSize findChromPeaks() defines number data files loaded memory processed simultaneously. parameter thus allows fine-tune memory demand well performance chromatographic peak detection step. plot EICs two selected internal standards evaluate chromatographic peak detection results. Figure 12. Chromatographic peak detection EICs processing. Peaks seem detected properly samples ions. indicates peak detection process entire dataset successful. identification chromatographic peaks using CentWave algorithm can sometimes result artifacts, overlapping split peaks. 
address issue, refineChromPeaks() function utilized, conjunction MergeNeighboringPeaksParam, aims merging split peaks. show examples CentWave peak detection artifacts. examples pre-selected illustrate necessity next step: Figure 13. Examples CentWave peak detection artifacts. cases signal presumably single type ion split two separate chromatographic peaks (indicated vertical line). MergeNeighboringPeaksParam allows combine split peaks. parameters algorithm defined : expandMz: Suggested kept relatively small (0.0015) prevent merging isotopes. expandRt: Usually set approximately half size average retention time width used chromatographic peak detection (case, 2.5 seconds). minProp: Used determine whether candidates merged. Chromatographic peaks overlapping m/z ranges (expanded side expandMz) tail--head distance retention time dimension less 2 * expandRt, signal higher minProp apex intensity chromatographic peak lower intensity, merged. Values parameter small avoid merging closely co-eluting ions, isomers. test settings EICs split peaks. Figure 14. Examples CentWave peak detection artifacts merging. can observe artificially split peaks appropriately merged. Therefore, next apply settings entire dataset. peak merging, column \"merged\" result object’s chromPeakData() data frame can used evaluate chromatographic peaks result represent signal merged, originally identified chromatographic peaks. proceeding next preprocessing step generally suggested evaluate results chromatographic peak detection EICs e.g. internal standards compounds/ions known present samples. Additionally, evaluating comparing number identified chromatographic peaks samples data set can help spotting potentially problematic samples. count number chromatographic peaks per sample show numbers table. Table 5. Samples number identified chromatographic peaks. similar number chromatographic peaks identified within various samples data set. 
Additional options evaluate results chromatographic peak detection can implemented using plotChromPeaks() function summarizing results using base R commands. Despite using chromatographic settings conditions retention time shifts unavoidable. Indeed, performance instrument can change time, example due small variations environmental conditions, temperature pressure. shifts generally small samples measured within batch/measurement run, can considerable data experiment acquired across longer time period. evaluate presence shift extract plot BPC QC samples. Figure 15. BPC QC samples. QC samples representing sample (pool) measured regular intervals measurement run experiment measured day. Still, small shifts can observed, especially region 100 150 seconds. facilitate proper correspondence signals across samples (hence definition LC-MS features), essential minimize differences retention times. Theoretically, proceed two steps: first select QC samples dataset first alignment , using -called anchor peaks. way can assume linear shift time, since always measuring sample different regular time intervals. Despite external QCs data set, still use subset-based alignment assuming retention time shifts independent different sample matrix (human serum plasma) instead mostly instrument-dependent. Note also possible manually specify anchor peaks, respectively retention times align data set external, reference, data set. information provided vignettes xcms package. calculating much adjust retention time samples, apply shift also study samples. xcms retention time alignment can performed using adjustRtime() function alignment algorithm. example use PeakGroups method [@smith_xcms_2006] performs alignment minimizing differences retention times set anchor peaks different samples. method requires initial correspondence analysis match/group chromatographic peaks across samples algorithm selects anchor peaks alignment. 
initial correspondence, use PeakDensity approach [@smith_xcms_2006] groups chromatographic peaks similar m/z retention time LC-MS features. parameters algorithm, can configured using PeakDensityParam object, sampleGroups, minFraction, binSize, ppm bw. binSize, ppm bw allow specify similar chromatographic peaks’ m/z retention time values need consider grouping feature. binSize ppm define required similarity m/z values. Within m/z bin (defined binSize ppm) areas along retention time axis high chromatographic peak density (considering peaks samples) identified, chromatographic peaks within regions considered grouping feature. High density areas identified using base R density() function, bw parameter: higher values define wider retention time areas, lower values require chromatographic peaks similar retention times. parameter can seen black line plot , corresponding smoothness density curve. Whether candidate peaks get grouped feature depends also parameters sampleGroups minFraction: sampleGroups provide, sample, sample group belongs . minFraction expected value 0 1 defining proportion samples within least one sample groups (defined sampleGroups) chromatographic peaks detected group feature. initial correspondence, parameters don’t need fully optimized. Selection dataset-specific parameter values described detail next section. dataset, use small values binSize ppm , importantly, also parameter bw, since data set ultra high performance (UHP) LC setup used. minFraction use high value (0.9) ensure features defined chromatographic peaks present almost samples one sample group (can used anchor peaks actual alignment). base alignment later QC samples hence define sampleGroups binary variable grouping samples either study, QC group. Figure 16. Initial correspondence analysis. PeakGroups-based alignment can next performed using adjustRtime() function PeakGroupsParam parameter object. parameters algorithm : subsetAdjust subset: Allows subset alignment. 
base retention time alignment QC samples, .e., retention time shifts estimated based repeatedly measured samples. resulting adjustment applied entire data. data sets QC samples (e.g. sample pools) measured repeatedly, strongly suggest use method. Note also subset-based alignment samples ordered injection index (.e., order measured measurement run). minFraction: value 0 1 defining proportion samples (full data set, data subset defined subset) chromatographic peak identified use anchor peak. contrast PeakDensityParam parameter used define proportion within sample group. span: PeakGroups method allows, depending data, adjust regions along retention time axis differently. enable local alignments LOESS function used parameter defines degree smoothing function. Generally, values 0.4 0.6 used, however, suggested evaluate alignment results eventually adapt parameters result satisfactory. perform alignment data set based retention times anchor peaks defined subset QC samples. Alignment adjusted retention times spectra data set, well retention times identified chromatographic peaks. alignment performed, user evaluate results using plotAdjustedRtime() function. function visualizes difference adjusted raw retention time sample y-axis along adjusted retention time x-axis. Dot points represent position used anchor peak along retention time axis. optimal alignment areas along retention time axis, anchor peaks scattered retention time dimension. Figure 17. Retention time alignment results. samples present data set measured within measurement run, resulting small retention time shifts. Therefore, little adjustments needed performed (shifts maximum 1 second can seen plot ). Generally, magnitude adjustment seen plots match expectation analyst. can also compare BPC alignment. get original data, .e. raw retention times, can use dropAdjustedRtime() function: Figure 18. BPC alignment. largest shift can observed retention time range 120 130s. 
Apart retention time range, little changes can observed. next evaluate impact alignment EICs selected internal standards. thus first extract ion chromatograms alignment. can now evaluate alignment effect test ions. plot EICs alignment isotope labeled cystine methionine. Figure 19. EICs cystine methionine alignment. non-endogenous cystine ion already well aligned difference minimal. methionine ion, however, shows improvement alignment. addition visual inspection results, also evaluate impact alignment comparing variance retention times internal standards alignment. end, first need identify chromatographic peaks sample m/z retention time close expected values internal standard. use matchValues() function MetaboAnnotation package [@rainer_modular_2022] using MzRtParam method identify chromatographic peaks similar m/z (+/- 50 ppm) retention time (+/- 10 seconds) internal standard’s values. parameters mzColname rtColname specify column names query () target (chromatographic peaks) contain m/z retention time values match entities. perform matching separately sample. internal standard every sample, use filterMatches() function SingleMatchParam() parameter select chromatographic peak highest intensity. now internal standard ID chromatographic peak sample likely represents signal ion. can now extract retention times chromatographic peaks alignment. can now evaluate impact alignment retention times internal standards across full data set: Figure 20. Retention time variation internal standards alignment. average, variation retention times internal standards across samples slightly reduced alignment. briefly touched subject correspondence determine anchor peaks alignment. Generally, goal correspondence analysis identify chromatographic peaks originate types ions samples experiment group LC-MS features. point, proper configuration parameter bw crucial. illustrate sensible choices parameter’s value can made. 
use plotChromPeakDensity() function simulate correspondence analysis default values PeakGroups extracted ion chromatograms two selected isotope labeled ions. plot shows EIC top panel, apex position chromatographic peaks different samples (y-axis), along retention time (x-axis) lower panel. Figure 21. Initial correspondence analysis, Cystine. Figure 22. Initial correspondence analysis, Methionine. Grouping peaks depends smoothness previously mentioned density curve can configured parameter bw. seen , smoothness high properly group features. looking default parameters, can observe indeed, bw parameter set bw = 30, high modern UHPLC-MS setups. reduce value parameter 1.8 evaluate impact. Figure 23. Correspondence analysis optimized parameters, Cystine. Figure 24. Correspondence analysis optimized parameters, Methionine. can observe peaks now grouped accurately single feature test ion. important parameters optimized : binSize: data generated high resolution MS instrument, thus select low value parameter. ppm: TOF instruments, suggested use value ppm larger 0 accommodate higher measurement error instrument larger m/z values. minFraction: set minFraction = 0.75, hence defining features chromatographic peak identified least 75% samples one sample groups. sampleGroups: use information available sampleData’s \"phenotype\" column. correspondence analysis suggested evaluate results selected EICs. extract signal m/z similar isotope labeled methionine larger retention time range. Importantly, show actual correspondence results, set simulate = FALSE plotChromPeakDensity() function. Figure 25. Correspondence analysis results, Methionine. hoped, signal two different ions now grouped separate features. Generally, correspondence results evaluated extracted chromatograms. Another interesting information look distribution features along retention time axis. Table 5. Distribution features along retention time axis. 
results correspondence analysis now stored, along results preprocessing steps, within XcmsExperiment result object. correspondence results, .e., definition LC-MS features, can extracted using featureDefinitions() function. data frame provides average m/z retention time (columns \"mzmed\" \"rtmed\") characterize LC-MS feature. Column, \"peakidx\" contains indices chromatographic peaks assigned feature. actual abundances features, represent also final preprocessing results, can extracted featureValues() function: can note features (e.g. F0003 F0006) missing values samples. expected certain degree samples features, respectively ions, need present. address next section. previously observed missing values (NA) attributed various reasons. Although might represent genuinely missing value, indicating ion (feature) truly present particular sample, also result failure preceding chromatographic peak detection step. crucial able recover missing values latter category much possible reduce eventual need data imputation. next examine prevalent missing values present dataset: can observe substantial number missing values values dataset. Let’s therefore delve process gap-filling. first evaluate example features chromatographic peak detected samples: Figure 26. Examples chromatographic peaks missing values. instances, chromatographic peak identified one two selected samples (red line), hence missing value reported feature particular samples (blue line). However, cases, signal measured samples, thus, reporting missing value correct example. signal feature low, likely reason peak detection failed. rescue signal cases, fillChromPeaks() function can used ChromPeakAreaParam approach. method defines m/z-retention time area feature based detected peaks, signal respective ion expected. integrates intensities within area samples missing values feature. reported feature abundance. apply method using default parameters. fillChromPeaks() thus rescue missing data data set. 
Note , even sample ion present, worst case noise integrated, expected much lower actual chromatographic peak signal. Let’s look previously missing values : Figure 27. Examples chromatographic peaks missing values gap-filling. gap-filling, also blue colored sample chromatographic peak present peak area reported feature abundance sample. assess effectiveness gap-filling method rescuing signals, can also plot average signal features least one missing value average filled-signal. advisable perform analysis repeatedly measured samples; case, QC samples used. , extract: Feature values detected chromatographic peaks setting filled = FALSE featureValues() call. filled-signal first extracting detected gap-filled abundances replace values detected chromatographic peaks NA. , calculate row averages matrices plot . detected (x-axis) gap-filled (y-axis) values QC samples highly correlated. Especially higher abundances, agreement high, low intensities, can expected, differences higher trending correlation line. , addition, fit linear regression line data summarize results linear regression line slope 1.12 intercept -1.62. indicates filled-signal average 1.12 times higher detected signal. final results LC-MS data preprocessing stored within XcmsExperiment object. includes identified chromatographic peaks, alignment results, well correspondence results. addition, guarantee reproducibility, result object keeps track performed processing steps, including individual parameter objects used configure . processHistory() function returns list various applied processing steps chronological order. , extract information first step performed preprocessing. processParam() function used extract actual parameter class used configure processing step. final result whole LC-MS data preprocessing two-dimensional matrix abundances -called LC-MS features samples. Note stage analysis features characterized m/z retention time don’t yet information metabolite feature represent. 
seen , feature matrix can extracted featureValues() function corresponding feature characteristics (.e., m/z retention time values) using featureDefinitions() function. Thus, two arrays extracted xcms result object used/imported analysis packages processing. example also exported tab delimited text files, used external tool, used, also MS2 spectra available, feature-based molecular networking GNPS analysis environment [@nothias_feature-based_2020]. processing R, reference link raw MS data required, suggested extract xcms preprocessing result using quantify() function SummarizedExperiment object, Bioconductor’s default container data biological assays/experiments. simplifies integration Bioconductor analysis packages. quantify() function takes parameters featureValues() function, thus, call extract SummarizedExperiment detected, gap-filled, feature abundances: Sample identifications xcms result’s sampleData() now available colData() (column, sample annotations) featureDefinitions() rowData() (row, feature annotations). feature values added first assay() SummarizedExperiment even processing history available object’s metadata(). SummarizedExperiment supports multiple assays, numeric matrices dimensions. thus add detected gap-filled feature abundances additional assay SummarizedExperiment. Feature abundances can extracted assay() function. extract first 6 lines detected gap-filled feature abundances: advantage, addition container full preprocessing results also possibility easy intuitive creation data subsets ensuring data integrity. example easy subset full data selection features /samples: moving next step analysis, advisable save preprocessing results. multiple format options save , can found MsIO package documentation. 
save XcmsExperiment object file format handled alabster framework, ensures object can easily read languages like Python Javascript well loaded easily back R.","code":"#' Use default Centwave parameter param <- CentWaveParam() #' Look at the default parameters param Object of class: CentWaveParam Parameters: - ppm: [1] 25 - peakwidth: [1] 20 50 - snthresh: [1] 10 - prefilter: [1] 3 100 - mzCenterFun: [1] \"wMean\" - integrate: [1] 1 - mzdiff: [1] -0.001 - fitgauss: [1] FALSE - noise: [1] 0 - verboseColumns: [1] FALSE - roiList: list() - firstBaselineCheck: [1] TRUE - roiScales: numeric(0) - extendLengthMSW: [1] FALSE - verboseBetaColumns: [1] FALSE #' Evaluate for Cystine cystine_test <- findChromPeaks(eic_cystine, param = param) chromPeaks(cystine_test) rt rtmin rtmax into intb maxo sn row column #' Evaluate for Methionine met_test <- findChromPeaks(eic_met, param = param) chromPeaks(met_test) rt rtmin rtmax into intb maxo sn row column #' Restrict the data to signal from cystine in the first sample cst <- lcms1[1L] |> spectra() |> filterRt(rt = c(208, 218)) |> filterMzRange(mz = fData(eic_cystine)[\"cystine_13C_15N\", c(\"mzmin\", \"mzmax\")]) #' Show the number of peaks per m/z filtered spectra lengths(cst) [1] 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 0 0 0 1 1 1 1 1 1 1 1 1 0 1 1 1 1 1 1 #' Calculate the difference in m/z values between scans mz_diff <- cst |> mz() |> unlist() |> diff() |> abs() #' Express differences in ppm range(mz_diff * 1e6 / mean(unlist(mz(cst)))) [1] 0.08829605 14.82188728 #' Parameters adapted for chromatographic peak detection on EICs. 
param <- CentWaveParam(peakwidth = c(1, 8), ppm = 15, integrate = 2, snthresh = 2) #' Evaluate on the cystine ion cystine_test <- findChromPeaks(eic_cystine, param = param) chromPeaks(cystine_test) rt rtmin rtmax into intb maxo sn row column [1,] 209.251 207.577 212.878 4085.675 2911.376 2157.459 4 1 1 [2,] 209.251 206.182 213.995 24625.728 19074.407 12907.487 4 1 2 [3,] 209.252 207.020 214.274 19467.836 14594.041 9996.466 4 1 3 [4,] 209.251 207.577 212.041 4648.229 3202.617 2458.485 3 1 4 [5,] 208.974 206.184 213.159 23801.825 18126.978 11300.289 3 1 5 [6,] 209.250 207.018 213.714 25990.327 21036.768 13650.329 5 1 6 [7,] 209.252 207.857 212.879 4528.767 3259.039 2445.841 4 1 7 [8,] 209.252 207.299 213.995 23119.449 17274.140 12153.410 4 1 8 [9,] 208.972 206.740 212.878 28943.188 23436.119 14451.023 4 1 9 [10,] 209.252 207.578 213.437 4470.552 3065.402 2292.881 4 1 10 #' Evaluate on the methionine ion met_test <- findChromPeaks(eic_met, param = param) chromPeaks(met_test) rt rtmin rtmax into intb maxo sn row column [1,] 159.867 157.913 162.378 20026.61 14715.42 12555.601 4 1 1 [2,] 160.425 157.077 163.215 16827.76 11843.39 8407.699 3 1 2 [3,] 160.425 157.356 163.215 18262.45 12881.67 9283.375 3 1 3 [4,] 159.588 157.635 161.820 20987.72 15424.25 13327.811 4 1 4 [5,] 160.985 156.799 163.217 16601.72 11968.46 10012.396 4 1 5 [6,] 160.982 157.634 163.214 17243.24 12389.94 9150.079 4 1 6 [7,] 159.867 158.193 162.099 21120.10 16202.05 13531.844 3 1 7 [8,] 160.426 157.356 162.937 18937.40 13739.73 10336.000 3 1 8 [9,] 160.704 158.472 163.215 17882.21 12299.43 9395.548 3 1 9 [10,] 160.146 157.914 162.379 20275.80 14279.50 12669.821 3 1 10 #' Using the same settings, but with default snthresh param <- CentWaveParam(peakwidth = c(1, 8), ppm = 15, integrate = 2) lcms1 <- findChromPeaks(lcms1, param = param, chunkSize = 5) #' Update EIC internal standard object eics_is_noprocess <- eic_is eic_is <- chromatogram(lcms1,, rt = as.matrix(intern_standard[, c(\"rtmin\", 
\"rtmax\")]), mz = as.matrix(intern_standard[, c(\"mzmin\", \"mzmax\")])) Processing chromatographic peaks fData(eic_is) <- fData(eics_is_noprocess) Processing chromatographic peaks #' set up the parameter param <- MergeNeighboringPeaksParam(expandRt = 2.5, expandMz = 0.0015, minProp = 0.75) #' Perform the peak refinement on the EICs eics <- refineChromPeaks(eics, param = param) plot(eics) #' Apply on whole dataset lcms1 <- refineChromPeaks(lcms1, param = param, chunkSize = 5) Reduced from 106714 to 89182 chromatographic peaks. chromPeakData(lcms1)$merged |> table() FALSE TRUE 79908 9274 eics_is_chrompeaks <- eic_is eic_is <- chromatogram(lcms1, rt = as.matrix(intern_standard[, c(\"rtmin\", \"rtmax\")]), mz = as.matrix(intern_standard[, c(\"mzmin\", \"mzmax\")])) Processing chromatographic peaks fData(eic_is) <- fData(eics_is_chrompeaks) eic_cystine <- eic_is[\"cystine_13C_15N\", ] eic_met <- eic_is[\"methionine_13C_15N\", ] #' Count the number of peaks per sample and summarize them in a table. 
data.frame(sample_name = sampleData(lcms1)$sample_name, peak_count = as.integer(table(chromPeaks(lcms1)[, \"sample\"]))) |> kable(format = \"pipe\") #' Get QC samples QC_samples <- sampleData(lcms1)$phenotype == \"QC\" #' extract BPC lcms1[QC_samples] |> chromatogram(aggregationFun = \"max\", chromPeaks = \"none\") |> plot(col = col_phenotype[\"QC\"], main = \"BPC of QC samples\") |> grid() # Initial correspondence analysis param <- PeakDensityParam(sampleGroups = sampleData(lcms1)$phenotype == \"QC\", minFraction = 0.9, binSize = 0.01, ppm = 10, bw = 2) lcms1 <- groupChromPeaks(lcms1, param = param) plotChromPeakDensity( eic_cystine, param = param, col = paste0(col_sample, \"80\"), peakCol = col_sample[chromPeaks(eic_cystine)[, \"sample\"]], peakBg = paste0(col_sample[chromPeaks(eic_cystine)[, \"sample\"]], 20), peakPch = 16) #' Define parameters of choice subset <- which(sampleData(lcms1)$phenotype == \"QC\") param <- PeakGroupsParam(minFraction = 0.9, extraPeaks = 50, span = 0.5, subsetAdjust = \"average\", subset = subset) #' Perform the alignment lcms1 <- adjustRtime(lcms1, param = param) Performing retention time correction using 5373 peak groups. Aligning sample number 2 against subset ... OK Aligning sample number 3 against subset ... OK Aligning sample number 5 against subset ... OK Aligning sample number 6 against subset ... OK Aligning sample number 8 against subset ... OK Aligning sample number 9 against subset ... 
OK #' Visualize alignment results plotAdjustedRtime(lcms1, col = paste0(col_sample, 80), peakGroupsPch = 1) grid() legend(\"topright\", col = col_phenotype, legend = names(col_phenotype), lty = 1, bty = \"n\") #' Get data before alignment lcms1_raw <- dropAdjustedRtime(lcms1) #' Apply the adjusted retention time to our dataset lcms1 <- applyAdjustedRtime(lcms1) #' Plot the BPC before and after alignment par(mfrow = c(2, 1), mar = c(2, 1, 1, 0.5)) chromatogram(lcms1_raw, aggregationFun = \"max\", chromPeaks = \"none\") |> plot(main = \"BPC before alignment\", col = paste0(col_sample, 80)) grid() legend(\"topright\", col = col_phenotype, legend = names(col_phenotype), lty = 1, bty = \"n\", horiz = TRUE) chromatogram(lcms1, aggregationFun = \"max\", chromPeaks = \"none\") |> plot(main = \"BPC after alignment\", col = paste0(col_sample, 80)) grid() legend(\"topright\", col = col_phenotype, legend = names(col_phenotype), lty = 1, bty = \"n\", horiz = TRUE) #' Store the EICs before alignment eics_is_refined <- eic_is #' Update the EICs eic_is <- chromatogram(lcms1, rt = as.matrix(intern_standard[, c(\"rtmin\", \"rtmax\")]), mz = as.matrix(intern_standard[, c(\"mzmin\", \"mzmax\")])) Processing chromatographic peaks fData(eic_is) <- fData(eics_is_refined) #' Extract the EICs for the test ions eic_cystine <- eic_is[\"cystine_13C_15N\"] eic_met <- eic_is[\"methionine_13C_15N\"] par(mfrow = c(2, 2), mar = c(4, 4.5, 2, 1)) old_eic_cystine <- eics_is_refined[\"cystine_13C_15N\"] plot(old_eic_cystine, main = \"Cystine before alignment\", peakType = \"none\", col = paste0(col_sample, 80)) grid() abline(v = intern_standard[\"cystine_13C_15N\", \"RT\"], col = \"red\", lty = 3) old_eic_met <- eics_is_refined[\"methionine_13C_15N\"] plot(old_eic_met, main = \"Methionine before alignment\", peakType = \"none\", col = paste0(col_sample, 80)) grid() abline(v = intern_standard[\"methionine_13C_15N\", \"RT\"], col = \"red\", lty = 3) plot(eic_cystine, main = \"Cystine after alignment\", 
peakType = \"none\", col = paste0(col_sample, 80)) grid() abline(v = intern_standard[\"cystine_13C_15N\", \"RT\"], col = \"red\", lty = 3) legend(\"topright\", col = col_phenotype, legend = names(col_phenotype), lty = 1, bty = \"n\") plot(eic_met, main = \"Methionine after alignment\", peakType = \"none\", col = paste0(col_sample, 80)) grid() abline(v = intern_standard[\"methionine_13C_15N\", \"RT\"], col = \"red\", lty = 3) #' Extract the matrix with all chromatographic peaks and add a column #' with the ID of the chromatographic peak chrom_peaks <- chromPeaks(lcms1) |> as.data.frame() chrom_peaks$peak_id <- rownames(chrom_peaks) #' Define the parameters for the matching and filtering of the matches p_1 <- MzRtParam(ppm = 50, toleranceRt = 10) p_2 <- SingleMatchParam(duplicates = \"top_ranked\", column = \"target_maxo\", decreasing = TRUE) #' Iterate over samples and identify for each the chromatographic peaks #' with m/z and retention time similar to the ones from the internal #' standard, and extract among them the ID of the peaks with the #' highest intensity. 
intern_standard_peaks <- lapply(seq_along(lcms1), function(i) { tmp <- chrom_peaks[chrom_peaks[, \"sample\"] == i, , drop = FALSE] mtch <- matchValues(intern_standard, tmp, mzColname = c(\"mz\", \"mz\"), rtColname = c(\"RT\", \"rt\"), param = p_1) mtch <- filterMatches(mtch, p_2) mtch$target_peak_id }) |> do.call(what = cbind) #' Define the index of the selected chromatographic peaks in the #' full chromPeaks matrix idx <- match(intern_standard_peaks, rownames(chromPeaks(lcms1))) #' Extract the raw retention times for these rt_raw <- chromPeaks(lcms1_raw)[idx, \"rt\"] |> matrix(ncol = length(lcms1_raw)) #' Extract the adjusted retention times for these rt_adj <- chromPeaks(lcms1)[idx, \"rt\"] |> matrix(ncol = length(lcms1_raw)) list(all_raw = rowSds(rt_raw, na.rm = TRUE), all_adj = rowSds(rt_adj, na.rm = TRUE) ) |> vioplot(ylab = \"sd(retention time)\") grid() #' Default parameter for the grouping and apply them to the test ions BPC param <- PeakDensityParam(sampleGroups = sampleData(lcms1)$phenotype, bw = 30) param Object of class: PeakDensityParam Parameters: - sampleGroups: [1] \"QC\" \"CVD\" \"CTR\" \"QC\" \"CTR\" \"CVD\" \"QC\" \"CTR\" \"CVD\" \"QC\" - bw: [1] 30 - minFraction: [1] 0.5 - minSamples: [1] 1 - binSize: [1] 0.25 - maxFeatures: [1] 50 - ppm: [1] 0 plotChromPeakDensity( eic_cystine, param = param, col = paste0(col_sample, \"80\"), peakCol = col_sample[chromPeaks(eic_cystine)[, \"sample\"]], peakBg = paste0(col_sample[chromPeaks(eic_cystine)[, \"sample\"]], 20), peakPch = 16) plotChromPeakDensity(eic_met, param = param, col = paste0(col_sample, \"80\"), peakCol = col_sample[chromPeaks(eic_met)[, \"sample\"]], peakBg = paste0(col_sample[chromPeaks(eic_met)[, \"sample\"]], 20), peakPch = 16) #' Updating parameters param <- PeakDensityParam(sampleGroups = sampleData(lcms1)$phenotype, bw = 1.8) plotChromPeakDensity( eic_cystine, param = param, col = paste0(col_sample, \"80\"), peakCol = col_sample[chromPeaks(eic_cystine)[, \"sample\"]], peakBg = 
paste0(col_sample[chromPeaks(eic_cystine)[, \"sample\"]], 20), peakPch = 16) plotChromPeakDensity(eic_met, param = param, col = paste0(col_sample, \"80\"), peakCol = col_sample[chromPeaks(eic_met)[, \"sample\"]], peakBg = paste0(col_sample[chromPeaks(eic_met)[, \"sample\"]], 20), peakPch = 16) #' Define the settings for the param param <- PeakDensityParam(sampleGroups = sampleData(lcms1)$phenotype, minFraction = 0.75, binSize = 0.01, ppm = 10, bw = 1.8) #' Apply to whole data lcms1 <- groupChromPeaks(lcms1, param = param) #' Extract chromatogram for an m/z similar to the one of the labeled methionine chr_test <- chromatogram(lcms1, mz = as.matrix(intern_standard[\"methionine_13C_15N\", c(\"mzmin\", \"mzmax\")]), rt = c(145, 200), aggregationFun = \"max\") Processing chromatographic peaks Processing features plotChromPeakDensity( chr_test, simulate = FALSE, col = paste0(col_sample, \"80\"), peakCol = col_sample[chromPeaks(chr_test)[, \"sample\"]], peakBg = paste0(col_sample[chromPeaks(chr_test)[, \"sample\"]], 20), peakPch = 16) # Bin features per RT slices vc <- featureDefinitions(lcms1)$rtmed breaks <- seq(0, max(vc, na.rm = TRUE) + 1, length.out = 15) |> round(0) cuts <- cut(vc, breaks = breaks, include.lowest = TRUE) table(cuts) |> kable(format = \"pipe\") #' Definition of the features featureDefinitions(lcms1) |> head() mzmed mzmin mzmax rtmed rtmin rtmax npeaks CTR CVD QC FT0001 50.98979 50.98949 50.99038 203.6001 203.1181 204.2331 8 1 3 4 FT0002 51.05904 51.05880 51.05941 191.1675 190.8787 191.5050 9 2 3 4 FT0003 51.98657 51.98631 51.98699 203.1467 202.6406 203.6710 7 0 3 4 FT0004 53.02036 53.02009 53.02043 203.2343 202.5652 204.0901 10 3 3 4 FT0005 53.52080 53.52051 53.52102 203.1936 202.8490 204.0901 10 3 3 4 FT0006 54.01007 54.00988 54.01015 159.2816 158.8499 159.4484 6 1 3 2 peakidx ms_level FT0001 7702, 16.... 1 FT0002 7176, 16.... 1 FT0003 7680, 17.... 1 FT0004 7763, 17.... 1 FT0005 8353, 17.... 1 FT0006 5800, 15.... 
1 #' Extract feature abundances featureValues(lcms1, method = \"sum\") |> head() MS_QC_POOL_1_POS.mzML MS_A_POS.mzML MS_B_POS.mzML MS_QC_POOL_2_POS.mzML FT0001 421.6162 689.2422 NA 481.7436 FT0002 710.8078 875.9192 NA 693.6997 FT0003 445.5711 613.4410 NA 497.8866 FT0004 16994.5260 24605.7340 19766.707 17808.0933 FT0005 3284.2664 4526.0531 3521.822 3379.8909 FT0006 10681.7476 10009.6602 NA 10800.5449 MS_C_POS.mzML MS_D_POS.mzML MS_QC_POOL_3_POS.mzML MS_E_POS.mzML FT0001 NA 635.2732 439.6086 570.5849 FT0002 781.2416 648.4344 700.9716 1054.0207 FT0003 NA 634.9370 449.0933 NA FT0004 22780.6683 22873.1061 16965.7762 23432.1252 FT0005 4396.0762 4317.7734 3270.5290 4533.8667 FT0006 NA 7296.4262 NA 9236.9799 MS_F_POS.mzML MS_QC_POOL_4_POS.mzML FT0001 579.9360 437.0340 FT0002 534.4577 711.0361 FT0003 461.0465 232.1075 FT0004 22198.4607 16796.4497 FT0005 4161.0132 3142.2268 FT0006 6817.8785 NA #' Percentage of missing values sum(is.na(featureValues(lcms1))) / length(featureValues(lcms1)) * 100 [1] 26.41597 ftidx <- which(is.na(rowSums(featureValues(lcms1)))) fts <- rownames(featureDefinitions(lcms1))[ftidx] farea <- featureArea(lcms1, features = fts[1:2]) chromatogram(lcms1[c(2, 3)], rt = farea[, c(\"rtmin\", \"rtmax\")], mz = farea[, c(\"mzmin\", \"mzmax\")]) |> plot(col = c(\"red\", \"blue\"), lwd = 2) Processing chromatographic peaks #' Fill in the missing values in the whole dataset lcms1 <- fillChromPeaks(lcms1, param = ChromPeakAreaParam(), chunkSize = 5) #' Percentage of missing values after gap-filling sum(is.na(featureValues(lcms1))) / length(featureValues(lcms1)) * 100 [1] 5.155492 Processing chromatographic peaks #' Get only detected signal in QC samples vals_detect <- featureValues(lcms1, filled = FALSE)[, QC_samples] #' Get detected and filled-in signal vals_filled <- featureValues(lcms1)[, QC_samples] #' Replace detected signal with NA vals_filled[!is.na(vals_detect)] <- NA #' Identify features with at least one filled peak has_filled <- 
is.na(rowSums(vals_detect)) #' Calculate row averages for features with missing values avg_detect <- rowMeans(vals_detect[has_filled, ], na.rm = TRUE) avg_filled <- rowMeans(vals_filled[has_filled, ], na.rm = TRUE) #' Plot the values against each other (in log2 scale) plot(log2(avg_detect), log2(avg_filled), xlim = range(log2(c(avg_detect, avg_filled)), na.rm = TRUE), ylim = range(log2(c(avg_detect, avg_filled)), na.rm = TRUE), pch = 21, bg = \"#00000020\", col = \"#00000080\") grid() abline(0, 1) #' fit a linear regression line to the data l <- lm(log2(avg_filled) ~ log2(avg_detect)) summary(l) Call: lm(formula = log2(avg_filled) ~ log2(avg_detect)) Residuals: Min 1Q Median 3Q Max -6.8176 -0.3807 0.1725 0.5492 6.7504 Coefficients: Estimate Std. Error t value Pr(>|t|) (Intercept) -1.62359 0.11545 -14.06 <2e-16 *** log2(avg_detect) 1.11763 0.01259 88.75 <2e-16 *** --- Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1 Residual standard error: 0.9366 on 2846 degrees of freedom (846 observations deleted due to missingness) Multiple R-squared: 0.7346, Adjusted R-squared: 0.7345 F-statistic: 7877 on 1 and 2846 DF, p-value: < 2.2e-16 #' Check first step of the process history processHistory(lcms1)[[1]] Object of class \"XProcessHistory\" type: Peak detection date: Mon Oct 21 23:01:56 2024 info: fileIndex: 1,2,3,4,5,6,7,8,9,10 Parameter class: CentWaveParam MS level(s) 1 #' Extract results as a SummarizedExperiment res <- quantify(lcms1, method = \"sum\", filled = FALSE) res class: SummarizedExperiment dim: 9068 10 metadata(6): '' '' ... '' '' assays(1): raw rownames(9068): FT0001 FT0002 ... FT9067 FT9068 rowData names(11): mzmed mzmin ... QC ms_level colnames(10): MS_QC_POOL_1_POS.mzML MS_A_POS.mzML ... MS_F_POS.mzML MS_QC_POOL_4_POS.mzML colData names(11): sample_name derived_spectra_data_file ... 
phenotype injection_index assays(res)$raw_filled <- featureValues(lcms1, method = \"sum\", filled = TRUE ) #' Different assay in the SummarizedExperiment object assayNames(res) [1] \"raw\" \"raw_filled\" assay(res, \"raw_filled\") |> head() MS_QC_POOL_1_POS.mzML MS_A_POS.mzML MS_B_POS.mzML MS_QC_POOL_2_POS.mzML FT0001 421.6162 689.2422 411.3295 481.7436 FT0002 710.8078 875.9192 457.5920 693.6997 FT0003 445.5711 613.4410 277.5022 497.8866 FT0004 16994.5260 24605.7340 19766.7069 17808.0933 FT0005 3284.2664 4526.0531 3521.8221 3379.8909 FT0006 10681.7476 10009.6602 9599.9701 10800.5449 MS_C_POS.mzML MS_D_POS.mzML MS_QC_POOL_3_POS.mzML MS_E_POS.mzML FT0001 314.7567 635.2732 439.6086 570.5849 FT0002 781.2416 648.4344 700.9716 1054.0207 FT0003 425.3774 634.9370 449.0933 556.2544 FT0004 22780.6683 22873.1061 16965.7762 23432.1252 FT0005 4396.0762 4317.7734 3270.5290 4533.8667 FT0006 4792.2390 7296.4262 2382.1788 9236.9799 MS_F_POS.mzML MS_QC_POOL_4_POS.mzML FT0001 579.9360 437.0340 FT0002 534.4577 711.0361 FT0003 461.0465 232.1075 FT0004 22198.4607 16796.4497 FT0005 4161.0132 3142.2268 FT0006 6817.8785 6911.5439 res[1:14, 3:8] class: SummarizedExperiment dim: 14 6 metadata(6): '' '' ... '' '' assays(2): raw raw_filled rownames(14): FT0001 FT0002 ... FT0013 FT0014 rowData names(11): mzmed mzmin ... QC ms_level colnames(6): MS_B_POS.mzML MS_QC_POOL_2_POS.mzML ... MS_QC_POOL_3_POS.mzML MS_E_POS.mzML colData names(11): sample_name derived_spectra_data_file ... phenotype injection_index #' Save the preprocessing results #' d <- file.path(tempdir(), \"objects/lcms1\") # saveMsObject(lcms1, AlabasterParam(path = d)) #' for now let's do R object because the previous method is not implemented yet. 
save(lcms1, file = \"preprocessed_lcms1.RData\")"},{"path":"https://rformassspectrometry.github.io/metabonaut/articles/a-end-to-end-untargeted-metabolomics.html","id":"chromatographic-peak-detection","dir":"Articles","previous_headings":"","what":"Chromatographic peak detection","title":"Complete end-to-end LC-MS/MS Metabolomic Data analysis","text":"initial preprocessing step involves detecting intensity peaks along retention time axis, called chromatographic peaks. achieve , employ findChromPeaks() function within xcms. function supports various algorithms peak detection, can selected configured respective parameter objects. preferred algorithm case, CentWave, utilizes continuous wavelet transformation (CWT)-based peak detection [@tautenhahn_highly_2008]. method known effectiveness handling non-Gaussian shaped chromatographic peaks peaks varying retention time widths, commonly encountered HILIC separations. apply CentWave algorithm default settings extracted ion chromatogram cystine methionine ions evaluate results. CentWave highly performant algorithm, requires customized dataset. implies parameters fine-tuned based user’s data. example serves clear motivation users familiarize various parameters need adapt data set. discuss main parameters can easily adjusted suit user’s dataset: peakwidth: Specifies minimal maximal expected width peaks retention time dimension. Highly dependent chromatographic settings used. ppm: maximal allowed difference mass peaks’ m/z values (parts-per-million) consecutive scans consider representing signal ion. integrate: parameter defines integration method. , primarily use integrate = 2 integrates also signal chromatographic peak’s tail considered accurate developers. determine peakwidth, recommend users refer previous EICs estimate range peak widths observe dataset. Ideally, examining multiple EICs goal. dataset, peak widths appear around 2 10 seconds. 
advise choosing range wide narrow peakwidth parameter can lead false positives negatives. determine ppm, deeper analysis dataset needed. clarified ppm depends instrument, users necessarily input vendor-advertised ppm. ’s determine accurately possible: following steps involve generating highly restricted MS area single mass peak per spectrum, representing cystine ion. m/z peaks extracted, absolute difference calculated finally expressed ppm. therefore, choose value close maximum within range parameter ppm, .e., 15 ppm. can now perform chromatographic peak detection adapted settings EICs. important note , properly estimate background noise, sufficient data points outside chromatographic peak need present. generally problem peak detection performed full LC-MS data set, peak detection EICs retention time range EIC needs sufficiently wide. function fails find peak EIC, initial troubleshooting step increase range. Additionally, signal--noise threshold snthresh reduced peak detection EICs, within small retention time range, enough signal present properly estimate background noise. Finally, case MS1 data points per peaks, setting CentWave’s advanced parameter extendLengthMSW TRUE can help peak detection. customized parameters, chromatographic peak detected sample. , use plot() function EICs visualize results. Figure 11. Chromatographic peak detection EICs. can see peak seems detected sample ions. indicates custom settings seem thus suitable dataset. now proceed apply entire dataset, extracting EICs ions evaluate confirm chromatographic peak detection worked expected. Note: revert value parameter snthresh default, , mentioned , background noise estimation reliable performed full data set. Parameter chunkSize findChromPeaks() defines number data files loaded memory processed simultaneously. parameter thus allows fine-tune memory demand well performance chromatographic peak detection step. 
plot EICs two selected internal standards evaluate chromatographic peak detection results. Figure 12. Chromatographic peak detection EICs processing. Peaks seem detected properly samples ions. indicates peak detection process entire dataset successful. identification chromatographic peaks using CentWave algorithm can sometimes result artifacts, overlapping split peaks. address issue, refineChromPeaks() function utilized, conjunction MergeNeighboringPeaksParam, aims merging split peaks. show examples CentWave peak detection artifacts. examples pre-selected illustrate necessity next step: Figure 13. Examples CentWave peak detection artifacts. cases signal presumably single type ion split two separate chromatographic peaks (indicated vertical line). MergeNeighboringPeaksParam allows combine split peaks. parameters algorithm defined : expandMz: Suggested kept relatively small (0.0015) prevent merging isotopes. expandRt: Usually set approximately half size average retention time width used chromatographic peak detection (case, 2.5 seconds). minProp: Used determine whether candidates merged. Chromatographic peaks overlapping m/z ranges (expanded side expandMz) tail--head distance retention time dimension less 2 * expandRt, signal higher minProp apex intensity chromatographic peak lower intensity, merged. Values parameter small avoid merging closely co-eluting ions, isomers. test settings EICs split peaks. Figure 14. Examples CentWave peak detection artifacts merging. can observe artificially split peaks appropriately merged. Therefore, next apply settings entire dataset. peak merging, column \"merged\" result object’s chromPeakData() data frame can used evaluate chromatographic peaks result represent signal merged, originally identified chromatographic peaks. proceeding next preprocessing step generally suggested evaluate results chromatographic peak detection EICs e.g. internal standards compounds/ions known present samples. 
Additionally, evaluating comparing number identified chromatographic peaks samples data set can help spotting potentially problematic samples. count number chromatographic peaks per sample show numbers table. Table 5. Samples number identified chromatographic peaks. similar number chromatographic peaks identified within various samples data set. Additional options evaluate results chromatographic peak detection can implemented using plotChromPeaks() function summarizing results using base R commands.","code":"#' Use default Centwave parameter param <- CentWaveParam() #' Look at the default parameters param Object of class: CentWaveParam Parameters: - ppm: [1] 25 - peakwidth: [1] 20 50 - snthresh: [1] 10 - prefilter: [1] 3 100 - mzCenterFun: [1] \"wMean\" - integrate: [1] 1 - mzdiff: [1] -0.001 - fitgauss: [1] FALSE - noise: [1] 0 - verboseColumns: [1] FALSE - roiList: list() - firstBaselineCheck: [1] TRUE - roiScales: numeric(0) - extendLengthMSW: [1] FALSE - verboseBetaColumns: [1] FALSE #' Evaluate for Cystine cystine_test <- findChromPeaks(eic_cystine, param = param) chromPeaks(cystine_test) rt rtmin rtmax into intb maxo sn row column #' Evaluate for Methionine met_test <- findChromPeaks(eic_met, param = param) chromPeaks(met_test) rt rtmin rtmax into intb maxo sn row column #' Restrict the data to signal from cystine in the first sample cst <- lcms1[1L] |> spectra() |> filterRt(rt = c(208, 218)) |> filterMzRange(mz = fData(eic_cystine)[\"cystine_13C_15N\", c(\"mzmin\", \"mzmax\")]) #' Show the number of peaks per m/z filtered spectra lengths(cst) [1] 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 0 0 0 1 1 1 1 1 1 1 1 1 0 1 1 1 1 1 1 #' Calculate the difference in m/z values between scans mz_diff <- cst |> mz() |> unlist() |> diff() |> abs() #' Express differences in ppm range(mz_diff * 1e6 / mean(unlist(mz(cst)))) [1] 0.08829605 14.82188728 #' Parameters adapted for chromatographic peak detection on EICs. 
param <- CentWaveParam(peakwidth = c(1, 8), ppm = 15, integrate = 2, snthresh = 2) #' Evaluate on the cystine ion cystine_test <- findChromPeaks(eic_cystine, param = param) chromPeaks(cystine_test) rt rtmin rtmax into intb maxo sn row column [1,] 209.251 207.577 212.878 4085.675 2911.376 2157.459 4 1 1 [2,] 209.251 206.182 213.995 24625.728 19074.407 12907.487 4 1 2 [3,] 209.252 207.020 214.274 19467.836 14594.041 9996.466 4 1 3 [4,] 209.251 207.577 212.041 4648.229 3202.617 2458.485 3 1 4 [5,] 208.974 206.184 213.159 23801.825 18126.978 11300.289 3 1 5 [6,] 209.250 207.018 213.714 25990.327 21036.768 13650.329 5 1 6 [7,] 209.252 207.857 212.879 4528.767 3259.039 2445.841 4 1 7 [8,] 209.252 207.299 213.995 23119.449 17274.140 12153.410 4 1 8 [9,] 208.972 206.740 212.878 28943.188 23436.119 14451.023 4 1 9 [10,] 209.252 207.578 213.437 4470.552 3065.402 2292.881 4 1 10 #' Evaluate on the methionine ion met_test <- findChromPeaks(eic_met, param = param) chromPeaks(met_test) rt rtmin rtmax into intb maxo sn row column [1,] 159.867 157.913 162.378 20026.61 14715.42 12555.601 4 1 1 [2,] 160.425 157.077 163.215 16827.76 11843.39 8407.699 3 1 2 [3,] 160.425 157.356 163.215 18262.45 12881.67 9283.375 3 1 3 [4,] 159.588 157.635 161.820 20987.72 15424.25 13327.811 4 1 4 [5,] 160.985 156.799 163.217 16601.72 11968.46 10012.396 4 1 5 [6,] 160.982 157.634 163.214 17243.24 12389.94 9150.079 4 1 6 [7,] 159.867 158.193 162.099 21120.10 16202.05 13531.844 3 1 7 [8,] 160.426 157.356 162.937 18937.40 13739.73 10336.000 3 1 8 [9,] 160.704 158.472 163.215 17882.21 12299.43 9395.548 3 1 9 [10,] 160.146 157.914 162.379 20275.80 14279.50 12669.821 3 1 10 #' Using the same settings, but with default snthresh param <- CentWaveParam(peakwidth = c(1, 8), ppm = 15, integrate = 2) lcms1 <- findChromPeaks(lcms1, param = param, chunkSize = 5) #' Update EIC internal standard object eics_is_noprocess <- eic_is eic_is <- chromatogram(lcms1,, rt = as.matrix(intern_standard[, c(\"rtmin\", 
\"rtmax\")]), mz = as.matrix(intern_standard[, c(\"mzmin\", \"mzmax\")])) Processing chromatographic peaks fData(eic_is) <- fData(eics_is_noprocess) Processing chromatographic peaks #' set up the parameter param <- MergeNeighboringPeaksParam(expandRt = 2.5, expandMz = 0.0015, minProp = 0.75) #' Perform the peak refinement on the EICs eics <- refineChromPeaks(eics, param = param) plot(eics) #' Apply on whole dataset lcms1 <- refineChromPeaks(lcms1, param = param, chunkSize = 5) Reduced from 106714 to 89182 chromatographic peaks. chromPeakData(lcms1)$merged |> table() FALSE TRUE 79908 9274 eics_is_chrompeaks <- eic_is eic_is <- chromatogram(lcms1, rt = as.matrix(intern_standard[, c(\"rtmin\", \"rtmax\")]), mz = as.matrix(intern_standard[, c(\"mzmin\", \"mzmax\")])) Processing chromatographic peaks fData(eic_is) <- fData(eics_is_chrompeaks) eic_cystine <- eic_is[\"cystine_13C_15N\", ] eic_met <- eic_is[\"methionine_13C_15N\", ] #' Count the number of peaks per sample and summarize them in a table. data.frame(sample_name = sampleData(lcms1)$sample_name, peak_count = as.integer(table(chromPeaks(lcms1)[, \"sample\"]))) |> kable(format = \"pipe\")"},{"path":"https://rformassspectrometry.github.io/metabonaut/articles/a-end-to-end-untargeted-metabolomics.html","id":"refine-identified-chromatographic-peaks","dir":"Articles","previous_headings":"Data preprocessing","what":"Refine identified chromatographic peaks","title":"Complete end-to-end LC-MS/MS Metabolomic Data analysis","text":"identification chromatographic peaks using CentWave algorithm can sometimes result artifacts, overlapping split peaks. address issue, refineChromPeaks() function utilized, conjunction MergeNeighboringPeaksParam, aims merging split peaks. show examples CentWave peak detection artifacts. examples pre-selected illustrate necessity next step: Figure 13. Examples CentWave peak detection artifacts. 
cases signal presumably single type ion split two separate chromatographic peaks (indicated vertical line). MergeNeighboringPeaksParam allows combine split peaks. parameters algorithm defined : expandMz: Suggested kept relatively small (0.0015) prevent merging isotopes. expandRt: Usually set approximately half size average retention time width used chromatographic peak detection (case, 2.5 seconds). minProp: Used determine whether candidates merged. Chromatographic peaks overlapping m/z ranges (expanded side expandMz) tail--head distance retention time dimension less 2 * expandRt, signal higher minProp apex intensity chromatographic peak lower intensity, merged. Values parameter small avoid merging closely co-eluting ions, isomers. test settings EICs split peaks. Figure 14. Examples CentWave peak detection artifacts merging. can observe artificially split peaks appropriately merged. Therefore, next apply settings entire dataset. peak merging, column \"merged\" result object’s chromPeakData() data frame can used evaluate chromatographic peaks result represent signal merged, originally identified chromatographic peaks. proceeding next preprocessing step generally suggested evaluate results chromatographic peak detection EICs e.g. internal standards compounds/ions known present samples. Additionally, evaluating comparing number identified chromatographic peaks samples data set can help spotting potentially problematic samples. count number chromatographic peaks per sample show numbers table. Table 5. Samples number identified chromatographic peaks. similar number chromatographic peaks identified within various samples data set. 
Additional options evaluate results chromatographic peak detection can implemented using plotChromPeaks() function summarizing results using base R commands.","code":"Processing chromatographic peaks #' set up the parameter param <- MergeNeighboringPeaksParam(expandRt = 2.5, expandMz = 0.0015, minProp = 0.75) #' Perform the peak refinement on the EICs eics <- refineChromPeaks(eics, param = param) plot(eics) #' Apply on whole dataset lcms1 <- refineChromPeaks(lcms1, param = param, chunkSize = 5) Reduced from 106714 to 89182 chromatographic peaks. chromPeakData(lcms1)$merged |> table() FALSE TRUE 79908 9274 eics_is_chrompeaks <- eic_is eic_is <- chromatogram(lcms1, rt = as.matrix(intern_standard[, c(\"rtmin\", \"rtmax\")]), mz = as.matrix(intern_standard[, c(\"mzmin\", \"mzmax\")])) Processing chromatographic peaks fData(eic_is) <- fData(eics_is_chrompeaks) eic_cystine <- eic_is[\"cystine_13C_15N\", ] eic_met <- eic_is[\"methionine_13C_15N\", ] #' Count the number of peaks per sample and summarize them in a table. data.frame(sample_name = sampleData(lcms1)$sample_name, peak_count = as.integer(table(chromPeaks(lcms1)[, \"sample\"]))) |> kable(format = \"pipe\")"},{"path":"https://rformassspectrometry.github.io/metabonaut/articles/a-end-to-end-untargeted-metabolomics.html","id":"retention-time-alignment","dir":"Articles","previous_headings":"","what":"Retention time alignment","title":"Complete end-to-end LC-MS/MS Metabolomic Data analysis","text":"Despite using chromatographic settings conditions retention time shifts unavoidable. Indeed, performance instrument can change time, example due small variations environmental conditions, temperature pressure. shifts generally small samples measured within batch/measurement run, can considerable data experiment acquired across longer time period. evaluate presence shift extract plot BPC QC samples. Figure 15. BPC QC samples. 
QC samples representing sample (pool) measured regular intervals measurement run experiment measured day. Still, small shifts can observed, especially region 100 150 seconds. facilitate proper correspondence signals across samples (hence definition LC-MS features), essential minimize differences retention times. Theoretically, proceed two steps: first select QC samples dataset first alignment , using -called anchor peaks. way can assume linear shift time, since always measuring sample different regular time intervals. Despite external QCs data set, still use subset-based alignment assuming retention time shifts independent different sample matrix (human serum plasma) instead mostly instrument-dependent. Note also possible manually specify anchor peaks, respectively retention times align data set external, reference, data set. information provided vignettes xcms package. calculating much adjust retention time samples, apply shift also study samples. xcms retention time alignment can performed using adjustRtime() function alignment algorithm. example use PeakGroups method [@smith_xcms_2006] performs alignment minimizing differences retention times set anchor peaks different samples. method requires initial correspondence analysis match/group chromatographic peaks across samples algorithm selects anchor peaks alignment. initial correspondence, use PeakDensity approach [@smith_xcms_2006] groups chromatographic peaks similar m/z retention time LC-MS features. parameters algorithm, can configured using PeakDensityParam object, sampleGroups, minFraction, binSize, ppm bw. binSize, ppm bw allow specify similar chromatographic peaks’ m/z retention time values need consider grouping feature. binSize ppm define required similarity m/z values. Within m/z bin (defined binSize ppm) areas along retention time axis high chromatographic peak density (considering peaks samples) identified, chromatographic peaks within regions considered grouping feature. 
High density areas identified using base R density() function, bw parameter: higher values define wider retention time areas, lower values require chromatographic peaks similar retention times. parameter can seen black line plot , corresponding smoothness density curve. Whether candidate peaks get grouped feature depends also parameters sampleGroups minFraction: sampleGroups provide, sample, sample group belongs . minFraction expected value 0 1 defining proportion samples within least one sample groups (defined sampleGroups) chromatographic peaks detected group feature. initial correspondence, parameters don’t need fully optimized. Selection dataset-specific parameter values described detail next section. dataset, use small values binSize ppm , importantly, also parameter bw, since data set ultra high performance (UHP) LC setup used. minFraction use high value (0.9) ensure features defined chromatographic peaks present almost samples one sample group (can used anchor peaks actual alignment). base alignment later QC samples hence define sampleGroups binary variable grouping samples either study, QC group. Figure 16. Initial correspondence analysis. PeakGroups-based alignment can next performed using adjustRtime() function PeakGroupsParam parameter object. parameters algorithm : subsetAdjust subset: Allows subset alignment. base retention time alignment QC samples, .e., retention time shifts estimated based repeatedly measured samples. resulting adjustment applied entire data. data sets QC samples (e.g. sample pools) measured repeatedly, strongly suggest use method. Note also subset-based alignment samples ordered injection index (.e., order measured measurement run). minFraction: value 0 1 defining proportion samples (full data set, data subset defined subset) chromatographic peak identified use anchor peak. contrast PeakDensityParam parameter used define proportion within sample group. 
span: PeakGroups method allows, depending data, adjust regions along retention time axis differently. enable local alignments LOESS function used parameter defines degree smoothing function. Generally, values 0.4 0.6 used, however, suggested evaluate alignment results eventually adapt parameters result satisfactory. perform alignment data set based retention times anchor peaks defined subset QC samples. Alignment adjusted retention times spectra data set, well retention times identified chromatographic peaks. alignment performed, user evaluate results using plotAdjustedRtime() function. function visualizes difference adjusted raw retention time sample y-axis along adjusted retention time x-axis. Dot points represent position used anchor peak along retention time axis. optimal alignment areas along retention time axis, anchor peaks scattered retention time dimension. Figure 17. Retention time alignment results. samples present data set measured within measurement run, resulting small retention time shifts. Therefore, little adjustments needed performed (shifts maximum 1 second can seen plot ). Generally, magnitude adjustment seen plots match expectation analyst. can also compare BPC alignment. get original data, .e. raw retention times, can use dropAdjustedRtime() function: Figure 18. BPC alignment. largest shift can observed retention time range 120 130s. Apart retention time range, little changes can observed. next evaluate impact alignment EICs selected internal standards. thus first extract ion chromatograms alignment. can now evaluate alignment effect test ions. plot EICs alignment isotope labeled cystine methionine. Figure 19. EICs cystine methionine alignment. non-endogenous cystine ion already well aligned difference minimal. methionine ion, however, shows improvement alignment. addition visual inspection results, also evaluate impact alignment comparing variance retention times internal standards alignment. 
end, first need identify chromatographic peaks sample m/z retention time close expected values internal standard. use matchValues() function MetaboAnnotation package [@rainer_modular_2022] using MzRtParam method identify chromatographic peaks similar m/z (+/- 50 ppm) retention time (+/- 10 seconds) internal standard’s values. parameters mzColname rtColname specify column names query () target (chromatographic peaks) contain m/z retention time values match entities. perform matching separately sample. internal standard every sample, use filterMatches() function SingleMatchParam() parameter select chromatographic peak highest intensity. now internal standard ID chromatographic peak sample likely represents signal ion. can now extract retention times chromatographic peaks alignment. can now evaluate impact alignment retention times internal standards across full data set: Figure 20. Retention time variation internal standards alignment. average, variation retention times internal standards across samples slightly reduced alignment.","code":"#' Get QC samples QC_samples <- sampleData(lcms1)$phenotype == \"QC\" #' extract BPC lcms1[QC_samples] |> chromatogram(aggregationFun = \"max\", chromPeaks = \"none\") |> plot(col = col_phenotype[\"QC\"], main = \"BPC of QC samples\") |> grid() # Initial correspondence analysis param <- PeakDensityParam(sampleGroups = sampleData(lcms1)$phenotype == \"QC\", minFraction = 0.9, binSize = 0.01, ppm = 10, bw = 2) lcms1 <- groupChromPeaks(lcms1, param = param) plotChromPeakDensity( eic_cystine, param = param, col = paste0(col_sample, \"80\"), peakCol = col_sample[chromPeaks(eic_cystine)[, \"sample\"]], peakBg = paste0(col_sample[chromPeaks(eic_cystine)[, \"sample\"]], 20), peakPch = 16) #' Define parameters of choice subset <- which(sampleData(lcms1)$phenotype == \"QC\") param <- PeakGroupsParam(minFraction = 0.9, extraPeaks = 50, span = 0.5, subsetAdjust = \"average\", subset = subset) #' Perform the alignment lcms1 <- 
adjustRtime(lcms1, param = param) Performing retention time correction using 5373 peak groups. Aligning sample number 2 against subset ... OK Aligning sample number 3 against subset ... OK Aligning sample number 5 against subset ... OK Aligning sample number 6 against subset ... OK Aligning sample number 8 against subset ... OK Aligning sample number 9 against subset ... OK #' Visualize alignment results plotAdjustedRtime(lcms1, col = paste0(col_sample, 80), peakGroupsPch = 1) grid() legend(\"topright\", col = col_phenotype, legend = names(col_phenotype), lty = 1, bty = \"n\") #' Get data before alignment lcms1_raw <- dropAdjustedRtime(lcms1) #' Apply the adjusted retention time to our dataset lcms1 <- applyAdjustedRtime(lcms1) #' Plot the BPC before and after alignment par(mfrow = c(2, 1), mar = c(2, 1, 1, 0.5)) chromatogram(lcms1_raw, aggregationFun = \"max\", chromPeaks = \"none\") |> plot(main = \"BPC before alignment\", col = paste0(col_sample, 80)) grid() legend(\"topright\", col = col_phenotype, legend = names(col_phenotype), lty = 1, bty = \"n\", horiz = TRUE) chromatogram(lcms1, aggregationFun = \"max\", chromPeaks = \"none\") |> plot(main = \"BPC after alignment\", col = paste0(col_sample, 80)) grid() legend(\"topright\", col = col_phenotype, legend = names(col_phenotype), lty = 1, bty = \"n\", horiz = TRUE) #' Store the EICs before alignment eics_is_refined <- eic_is #' Update the EICs eic_is <- chromatogram(lcms1, rt = as.matrix(intern_standard[, c(\"rtmin\", \"rtmax\")]), mz = as.matrix(intern_standard[, c(\"mzmin\", \"mzmax\")])) Processing chromatographic peaks fData(eic_is) <- fData(eics_is_refined) #' Extract the EICs for the test ions eic_cystine <- eic_is[\"cystine_13C_15N\"] eic_met <- eic_is[\"methionine_13C_15N\"] par(mfrow = c(2, 2), mar = c(4, 4.5, 2, 1)) old_eic_cystine <- eics_is_refined[\"cystine_13C_15N\"] plot(old_eic_cystine, main = \"Cystine before alignment\", peakType = \"none\", col = paste0(col_sample, 80)) grid() abline(v = 
intern_standard[\"cystine_13C_15N\", \"RT\"], col = \"red\", lty = 3) old_eic_met <- eics_is_refined[\"methionine_13C_15N\"] plot(old_eic_met, main = \"Methionine before alignment\", peakType = \"none\", col = paste0(col_sample, 80)) grid() abline(v = intern_standard[\"methionine_13C_15N\", \"RT\"], col = \"red\", lty = 3) plot(eic_cystine, main = \"Cystine after alignment\", peakType = \"none\", col = paste0(col_sample, 80)) grid() abline(v = intern_standard[\"cystine_13C_15N\", \"RT\"], col = \"red\", lty = 3) legend(\"topright\", col = col_phenotype, legend = names(col_phenotype), lty = 1, bty = \"n\") plot(eic_met, main = \"Methionine after alignment\", peakType = \"none\", col = paste0(col_sample, 80)) grid() abline(v = intern_standard[\"methionine_13C_15N\", \"RT\"], col = \"red\", lty = 3) #' Extract the matrix with all chromatographic peaks and add a column #' with the ID of the chromatographic peak chrom_peaks <- chromPeaks(lcms1) |> as.data.frame() chrom_peaks$peak_id <- rownames(chrom_peaks) #' Define the parameters for the matching and filtering of the matches p_1 <- MzRtParam(ppm = 50, toleranceRt = 10) p_2 <- SingleMatchParam(duplicates = \"top_ranked\", column = \"target_maxo\", decreasing = TRUE) #' Iterate over samples and identify for each the chromatographic peaks #' with similar m/z and retention time to the ones from the internal #' standard, and extract among them the ID of the peaks with the #' highest intensity. 
intern_standard_peaks <- lapply(seq_along(lcms1), function(i) { tmp <- chrom_peaks[chrom_peaks[, \"sample\"] == i, , drop = FALSE] mtch <- matchValues(intern_standard, tmp, mzColname = c(\"mz\", \"mz\"), rtColname = c(\"RT\", \"rt\"), param = p_1) mtch <- filterMatches(mtch, p_2) mtch$target_peak_id }) |> do.call(what = cbind) #' Define the index of the selected chromatographic peaks in the #' full chromPeaks matrix idx <- match(intern_standard_peaks, rownames(chromPeaks(lcms1))) #' Extract the raw retention times for these rt_raw <- chromPeaks(lcms1_raw)[idx, \"rt\"] |> matrix(ncol = length(lcms1_raw)) #' Extract the adjusted retention times for these rt_adj <- chromPeaks(lcms1)[idx, \"rt\"] |> matrix(ncol = length(lcms1_raw)) list(all_raw = rowSds(rt_raw, na.rm = TRUE), all_adj = rowSds(rt_adj, na.rm = TRUE) ) |> vioplot(ylab = \"sd(retention time)\") grid()"},{"path":"https://rformassspectrometry.github.io/metabonaut/articles/a-end-to-end-untargeted-metabolomics.html","id":"correspondence","dir":"Articles","previous_headings":"","what":"Correspondence","title":"Complete end-to-end LC-MS/MS Metabolomic Data analysis","text":"briefly touched subject correspondence determine anchor peaks alignment. Generally, goal correspondence analysis identify chromatographic peaks originate types ions samples experiment group LC-MS features. point, proper configuration parameter bw crucial. illustrate sensible choices parameter’s value can made. use plotChromPeakDensity() function simulate correspondence analysis default values PeakGroups extracted ion chromatograms two selected isotope labeled ions. plot shows EIC top panel, apex position chromatographic peaks different samples (y-axis), along retention time (x-axis) lower panel. Figure 21. Initial correspondence analysis, Cystine. Figure 22. Initial correspondence analysis, Methionine. Grouping peaks depends smoothness previously mentioned density curve can configured parameter bw. 
seen , smoothness high properly group features. looking default parameters, can observe indeed, bw parameter set bw = 30, high modern UHPLC-MS setups. reduce value parameter 1.8 evaluate impact. Figure 23. Correspondence analysis optimized parameters, Cystine. Figure 24. Correspondence analysis optimized parameters, Methionine. can observe peaks now grouped accurately single feature test ion. important parameters optimized : binSize: data generated high resolution MS instrument, thus select low value parameter. ppm: TOF instruments, suggested use value ppm larger 0 accommodate higher measurement error instrument larger m/z values. minFraction: set minFraction = 0.75, hence defining features chromatographic peak identified least 75% samples one sample groups. sampleGroups: use information available sampleData’s \"phenotype\" column. correspondence analysis suggested evaluate results selected EICs. extract signal m/z similar isotope labeled methionine larger retention time range. Importantly, show actual correspondence results, set simulate = FALSE plotChromPeakDensity() function. Figure 25. Correspondence analysis results, Methionine. hoped, signal two different ions now grouped separate features. Generally, correspondence results evaluated extracted chromatograms. Another interesting information look distribution features along retention time axis. Table 5. Distribution features along retention time axis. results correspondence analysis now stored, along results preprocessing steps, within XcmsExperiment result object. correspondence results, .e., definition LC-MS features, can extracted using featureDefinitions() function. data frame provides average m/z retention time (columns \"mzmed\" \"rtmed\") characterize LC-MS feature. Column, \"peakidx\" contains indices chromatographic peaks assigned feature. actual abundances features, represent also final preprocessing results, can extracted featureValues() function: can note features (e.g. 
F0003 F0006) missing values samples. expected certain degree samples features, respectively ions, need present. address next section.","code":"#' Default parameter for the grouping and apply them to the test ions BPC param <- PeakDensityParam(sampleGroups = sampleData(lcms1)$phenotype, bw = 30) param Object of class: PeakDensityParam Parameters: - sampleGroups: [1] \"QC\" \"CVD\" \"CTR\" \"QC\" \"CTR\" \"CVD\" \"QC\" \"CTR\" \"CVD\" \"QC\" - bw: [1] 30 - minFraction: [1] 0.5 - minSamples: [1] 1 - binSize: [1] 0.25 - maxFeatures: [1] 50 - ppm: [1] 0 plotChromPeakDensity( eic_cystine, param = param, col = paste0(col_sample, \"80\"), peakCol = col_sample[chromPeaks(eic_cystine)[, \"sample\"]], peakBg = paste0(col_sample[chromPeaks(eic_cystine)[, \"sample\"]], 20), peakPch = 16) plotChromPeakDensity(eic_met, param = param, col = paste0(col_sample, \"80\"), peakCol = col_sample[chromPeaks(eic_met)[, \"sample\"]], peakBg = paste0(col_sample[chromPeaks(eic_met)[, \"sample\"]], 20), peakPch = 16) #' Updating parameters param <- PeakDensityParam(sampleGroups = sampleData(lcms1)$phenotype, bw = 1.8) plotChromPeakDensity( eic_cystine, param = param, col = paste0(col_sample, \"80\"), peakCol = col_sample[chromPeaks(eic_cystine)[, \"sample\"]], peakBg = paste0(col_sample[chromPeaks(eic_cystine)[, \"sample\"]], 20), peakPch = 16) plotChromPeakDensity(eic_met, param = param, col = paste0(col_sample, \"80\"), peakCol = col_sample[chromPeaks(eic_met)[, \"sample\"]], peakBg = paste0(col_sample[chromPeaks(eic_met)[, \"sample\"]], 20), peakPch = 16) #' Define the settings for the param param <- PeakDensityParam(sampleGroups = sampleData(lcms1)$phenotype, minFraction = 0.75, binSize = 0.01, ppm = 10, bw = 1.8) #' Apply to whole data lcms1 <- groupChromPeaks(lcms1, param = param) #' Extract chromatogram for an m/z similar to the one of the labeled methionine chr_test <- chromatogram(lcms1, mz = as.matrix(intern_standard[\"methionine_13C_15N\", c(\"mzmin\", \"mzmax\")]), rt = c(145, 
200), aggregationFun = \"max\") Processing chromatographic peaks Processing features plotChromPeakDensity( chr_test, simulate = FALSE, col = paste0(col_sample, \"80\"), peakCol = col_sample[chromPeaks(chr_test)[, \"sample\"]], peakBg = paste0(col_sample[chromPeaks(chr_test)[, \"sample\"]], 20), peakPch = 16) # Bin features per RT slices vc <- featureDefinitions(lcms1)$rtmed breaks <- seq(0, max(vc, na.rm = TRUE) + 1, length.out = 15) |> round(0) cuts <- cut(vc, breaks = breaks, include.lowest = TRUE) table(cuts) |> kable(format = \"pipe\") #' Definition of the features featureDefinitions(lcms1) |> head() mzmed mzmin mzmax rtmed rtmin rtmax npeaks CTR CVD QC FT0001 50.98979 50.98949 50.99038 203.6001 203.1181 204.2331 8 1 3 4 FT0002 51.05904 51.05880 51.05941 191.1675 190.8787 191.5050 9 2 3 4 FT0003 51.98657 51.98631 51.98699 203.1467 202.6406 203.6710 7 0 3 4 FT0004 53.02036 53.02009 53.02043 203.2343 202.5652 204.0901 10 3 3 4 FT0005 53.52080 53.52051 53.52102 203.1936 202.8490 204.0901 10 3 3 4 FT0006 54.01007 54.00988 54.01015 159.2816 158.8499 159.4484 6 1 3 2 peakidx ms_level FT0001 7702, 16.... 1 FT0002 7176, 16.... 1 FT0003 7680, 17.... 1 FT0004 7763, 17.... 1 FT0005 8353, 17.... 1 FT0006 5800, 15.... 
1 #' Extract feature abundances featureValues(lcms1, method = \"sum\") |> head() MS_QC_POOL_1_POS.mzML MS_A_POS.mzML MS_B_POS.mzML MS_QC_POOL_2_POS.mzML FT0001 421.6162 689.2422 NA 481.7436 FT0002 710.8078 875.9192 NA 693.6997 FT0003 445.5711 613.4410 NA 497.8866 FT0004 16994.5260 24605.7340 19766.707 17808.0933 FT0005 3284.2664 4526.0531 3521.822 3379.8909 FT0006 10681.7476 10009.6602 NA 10800.5449 MS_C_POS.mzML MS_D_POS.mzML MS_QC_POOL_3_POS.mzML MS_E_POS.mzML FT0001 NA 635.2732 439.6086 570.5849 FT0002 781.2416 648.4344 700.9716 1054.0207 FT0003 NA 634.9370 449.0933 NA FT0004 22780.6683 22873.1061 16965.7762 23432.1252 FT0005 4396.0762 4317.7734 3270.5290 4533.8667 FT0006 NA 7296.4262 NA 9236.9799 MS_F_POS.mzML MS_QC_POOL_4_POS.mzML FT0001 579.9360 437.0340 FT0002 534.4577 711.0361 FT0003 461.0465 232.1075 FT0004 22198.4607 16796.4497 FT0005 4161.0132 3142.2268 FT0006 6817.8785 NA"},{"path":"https://rformassspectrometry.github.io/metabonaut/articles/a-end-to-end-untargeted-metabolomics.html","id":"gap-filling","dir":"Articles","previous_headings":"","what":"Gap filling","title":"Complete end-to-end LC-MS/MS Metabolomic Data analysis","text":"previously observed missing values (NA) attributed various reasons. Although might represent genuinely missing value, indicating ion (feature) truly present particular sample, also result failure preceding chromatographic peak detection step. crucial able recover missing values latter category much possible reduce eventual need data imputation. next examine prevalent missing values present dataset: can observe substantial number missing values values dataset. Let’s therefore delve process gap-filling. first evaluate example features chromatographic peak detected samples: Figure 26. Examples chromatographic peaks missing values. instances, chromatographic peak identified one two selected samples (red line), hence missing value reported feature particular samples (blue line). 
However, cases, signal measured samples, thus, reporting missing value correct example. signal feature low, likely reason peak detection failed. rescue signal cases, fillChromPeaks() function can used ChromPeakAreaParam approach. method defines m/z-retention time area feature based detected peaks, signal respective ion expected. integrates intensities within area samples missing values feature. reported feature abundance. apply method using default parameters. fillChromPeaks() thus rescue missing data data set. Note , even sample ion present, worst case noise integrated, expected much lower actual chromatographic peak signal. Let’s look previously missing values : Figure 27. Examples chromatographic peaks missing values gap-filling. gap-filling, also blue colored sample chromatographic peak present peak area reported feature abundance sample. assess effectiveness gap-filling method rescuing signals, can also plot average signal features least one missing value average filled-signal. advisable perform analysis repeatedly measured samples; case, QC samples used. , extract: Feature values detected chromatographic peaks setting filled = FALSE featureValues() call. filled-signal first extracting detected gap-filled abundances replace values detected chromatographic peaks NA. , calculate row averages matrices plot . detected (x-axis) gap-filled (y-axis) values QC samples highly correlated. Especially higher abundances, agreement high, low intensities, can expected, differences higher trending correlation line. , addition, fit linear regression line data summarize results linear regression line slope 1.12 intercept -1.62. 
indicates filled-signal average 1.12 times higher detected signal.","code":"#' Percentage of missing values sum(is.na(featureValues(lcms1))) / length(featureValues(lcms1)) * 100 [1] 26.41597 ftidx <- which(is.na(rowSums(featureValues(lcms1)))) fts <- rownames(featureDefinitions(lcms1))[ftidx] farea <- featureArea(lcms1, features = fts[1:2]) chromatogram(lcms1[c(2, 3)], rt = farea[, c(\"rtmin\", \"rtmax\")], mz = farea[, c(\"mzmin\", \"mzmax\")]) |> plot(col = c(\"red\", \"blue\"), lwd = 2) Processing chromatographic peaks #' Fill in the missing values in the whole dataset lcms1 <- fillChromPeaks(lcms1, param = ChromPeakAreaParam(), chunkSize = 5) #' Percentage of missing values after gap-filling sum(is.na(featureValues(lcms1))) / length(featureValues(lcms1)) * 100 [1] 5.155492 Processing chromatographic peaks #' Get only detected signal in QC samples vals_detect <- featureValues(lcms1, filled = FALSE)[, QC_samples] #' Get detected and filled-in signal vals_filled <- featureValues(lcms1)[, QC_samples] #' Replace detected signal with NA vals_filled[!is.na(vals_detect)] <- NA #' Identify features with at least one filled peak has_filled <- is.na(rowSums(vals_detect)) #' Calculate row averages for features with missing values avg_detect <- rowMeans(vals_detect[has_filled, ], na.rm = TRUE) avg_filled <- rowMeans(vals_filled[has_filled, ], na.rm = TRUE) #' Plot the values against each other (in log2 scale) plot(log2(avg_detect), log2(avg_filled), xlim = range(log2(c(avg_detect, avg_filled)), na.rm = TRUE), ylim = range(log2(c(avg_detect, avg_filled)), na.rm = TRUE), pch = 21, bg = \"#00000020\", col = \"#00000080\") grid() abline(0, 1) #' fit a linear regression line to the data l <- lm(log2(avg_filled) ~ log2(avg_detect)) summary(l) Call: lm(formula = log2(avg_filled) ~ log2(avg_detect)) Residuals: Min 1Q Median 3Q Max -6.8176 -0.3807 0.1725 0.5492 6.7504 Coefficients: Estimate Std. 
Error t value Pr(>|t|) (Intercept) -1.62359 0.11545 -14.06 <2e-16 *** log2(avg_detect) 1.11763 0.01259 88.75 <2e-16 *** --- Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1 Residual standard error: 0.9366 on 2846 degrees of freedom (846 observations deleted due to missingness) Multiple R-squared: 0.7346, Adjusted R-squared: 0.7345 F-statistic: 7877 on 1 and 2846 DF, p-value: < 2.2e-16"},{"path":"https://rformassspectrometry.github.io/metabonaut/articles/a-end-to-end-untargeted-metabolomics.html","id":"preprocessing-results","dir":"Articles","previous_headings":"","what":"Preprocessing results","title":"Complete end-to-end LC-MS/MS Metabolomic Data analysis","text":"final results LC-MS data preprocessing stored within XcmsExperiment object. includes identified chromatographic peaks, alignment results, well correspondence results. addition, guarantee reproducibility, result object keeps track performed processing steps, including individual parameter objects used configure . processHistory() function returns list various applied processing steps chronological order. , extract information first step performed preprocessing. processParam() function used extract actual parameter class used configure processing step. final result whole LC-MS data preprocessing two-dimensional matrix abundances -called LC-MS features samples. Note stage analysis features characterized m/z retention time don’t yet information metabolite feature represent. seen , feature matrix can extracted featureValues() function corresponding feature characteristics (.e., m/z retention time values) using featureDefinitions() function. Thus, two arrays extracted xcms result object used/imported analysis packages processing. example also exported tab delimited text files, used external tool, used, also MS2 spectra available, feature-based molecular networking GNPS analysis environment [@nothias_feature-based_2020]. 
processing R, reference link raw MS data required, suggested extract xcms preprocessing result using quantify() function SummarizedExperiment object, Bioconductor’s default container data biological assays/experiments. simplifies integration Bioconductor analysis packages. quantify() function takes parameters featureValues() function, thus, call extract SummarizedExperiment detected, gap-filled, feature abundances: Sample identifications xcms result’s sampleData() now available colData() (column, sample annotations) featureDefinitions() rowData() (row, feature annotations). feature values added first assay() SummarizedExperiment even processing history available object’s metadata(). SummarizedExperiment supports multiple assays, numeric matrices dimensions. thus add detected gap-filled feature abundances additional assay SummarizedExperiment. Feature abundances can extracted assay() function. extract first 6 lines detected gap-filled feature abundances: advantage, addition container full preprocessing results also possibility easy intuitive creation data subsets ensuring data integrity. example easy subset full data selection features /samples: moving next step analysis, advisable save preprocessing results. multiple format options save , can found MsIO package documentation. save XcmsExperiment object file format handled alabster framework, ensures object can easily read languages like Python Javascript well loaded easily back R.","code":"#' Check first step of the process history processHistory(lcms1)[[1]] Object of class \"XProcessHistory\" type: Peak detection date: Mon Oct 21 23:01:56 2024 info: fileIndex: 1,2,3,4,5,6,7,8,9,10 Parameter class: CentWaveParam MS level(s) 1 #' Extract results as a SummarizedExperiment res <- quantify(lcms1, method = \"sum\", filled = FALSE) res class: SummarizedExperiment dim: 9068 10 metadata(6): '' '' ... '' '' assays(1): raw rownames(9068): FT0001 FT0002 ... FT9067 FT9068 rowData names(11): mzmed mzmin ... 
QC ms_level colnames(10): MS_QC_POOL_1_POS.mzML MS_A_POS.mzML ... MS_F_POS.mzML MS_QC_POOL_4_POS.mzML colData names(11): sample_name derived_spectra_data_file ... phenotype injection_index assays(res)$raw_filled <- featureValues(lcms1, method = \"sum\", filled = TRUE ) #' Different assay in the SummarizedExperiment object assayNames(res) [1] \"raw\" \"raw_filled\" assay(res, \"raw_filled\") |> head() MS_QC_POOL_1_POS.mzML MS_A_POS.mzML MS_B_POS.mzML MS_QC_POOL_2_POS.mzML FT0001 421.6162 689.2422 411.3295 481.7436 FT0002 710.8078 875.9192 457.5920 693.6997 FT0003 445.5711 613.4410 277.5022 497.8866 FT0004 16994.5260 24605.7340 19766.7069 17808.0933 FT0005 3284.2664 4526.0531 3521.8221 3379.8909 FT0006 10681.7476 10009.6602 9599.9701 10800.5449 MS_C_POS.mzML MS_D_POS.mzML MS_QC_POOL_3_POS.mzML MS_E_POS.mzML FT0001 314.7567 635.2732 439.6086 570.5849 FT0002 781.2416 648.4344 700.9716 1054.0207 FT0003 425.3774 634.9370 449.0933 556.2544 FT0004 22780.6683 22873.1061 16965.7762 23432.1252 FT0005 4396.0762 4317.7734 3270.5290 4533.8667 FT0006 4792.2390 7296.4262 2382.1788 9236.9799 MS_F_POS.mzML MS_QC_POOL_4_POS.mzML FT0001 579.9360 437.0340 FT0002 534.4577 711.0361 FT0003 461.0465 232.1075 FT0004 22198.4607 16796.4497 FT0005 4161.0132 3142.2268 FT0006 6817.8785 6911.5439 res[1:14, 3:8] class: SummarizedExperiment dim: 14 6 metadata(6): '' '' ... '' '' assays(2): raw raw_filled rownames(14): FT0001 FT0002 ... FT0013 FT0014 rowData names(11): mzmed mzmin ... QC ms_level colnames(6): MS_B_POS.mzML MS_QC_POOL_2_POS.mzML ... MS_QC_POOL_3_POS.mzML MS_E_POS.mzML colData names(11): sample_name derived_spectra_data_file ... phenotype injection_index #' Save the preprocessing results #' d <- file.path(tempdir(), \"objects/lcms1\") # saveMsObject(lcms1, AlabasterParam(path = d)) #' for now let's do R object because the previous method is not implemented yet. 
save(lcms1, file = \"preprocessed_lcms1.RData\")"},{"path":"https://rformassspectrometry.github.io/metabonaut/articles/a-end-to-end-untargeted-metabolomics.html","id":"data-normalization","dir":"Articles","previous_headings":"","what":"Data normalization","title":"Complete end-to-end LC-MS/MS Metabolomic Data analysis","text":"preprocessing, data normalization scaling might need applied remove technical variances data. simple approaches like median scaling can implemented lines R code, advanced normalization algorithms available packages Bioconductor’s preprocessCore. comprehensive workflow “Notame” also propose interesting normalization approach adaptable scalable user dataset [@klavus_notame_2020]. Generally, LC-MS data, bias can categorized three main groups [@broadhurst_guidelines_2018]: Variances introduced sample collection initial processing, can include differences sample amounts. type bias expected sample-specific affect signals sample way. Methods like median scaling, LOESS quantiles normalization can adjust bias. Signal drifts along measurement samples experiment. Reasons drifts can related aging instrumentation used (columns, detector), also changes metabolite abundances characteristics due reactions modifications, oxidation. changes expected affect samples measured later run rather ones measured beginning. reason, bias can play major role large experiments measured long time range usually considered affect individual metabolites (metabolite groups) differently. adjustment, moving average linear regression-based approaches can used. latter can example performed using adjust_lm() function MetaboCoreUtils package. Batch-related biases. comprise noise specific larger set samples, can set samples measured one LC-MS measurement run (.e. one analysis plate) samples measured using specific batch reagents. noise assumed affect samples one batch way linear modeling-based approaches can used adjust .
Unwanted variation can arise various sources highly dependent experiment. Therefore, data normalization chosen carefully based experimental design, statistical aims, balance accuracy precision achieved use auxiliary information. Sample preparation biases can evaluated using internal standards, depending however also added sample mixes sample processing. Repeated measurements QC samples hand allows estimate correct LC-MS specific biases. Also, proper planning experiment, measurement study samples random order, can largely avoid biases introduced mentioned sources variance. workflow present tools assess data quality evaluate need normalization well options normalization. space reasons able provide solutions adjust possible sources variation. principal component analysis (PCA) helpful tool initial, unsupervised, visualization data also provides insights potential quality issues data. order apply PCA measured feature abundances, need however impute (still present) missing values. assume missing values (gap-filling step) represent signal detection limit. cases, missing values can replaced random values sampled uniform distribution, ranging half smallest measured value smallest measured value specific feature. uniform distribution defined two parameters (minimum maximum) values equal probability selected. impute missing values approach add resulting data matrix new assay result object. PCA powerful tool detecting biases data. dimensionality reduction technique, enables visualization data lower-dimensional space. context LC-MS data, PCA can used identify overall biases batch, sample, injection index, etc. However, important note PCA linear method may able detect biases data. plotting PCA, apply log2 transform, center scale data. log2 transformation applied stabilize variance centering remove dependency absolute abundances. Figure 28. PCA data. PCA shows clear separation study samples (plasma) QC samples (serum) first principal component (PC1). 
separation based phenotype visible third principal component (PC3). cases, can better option remove imputed values evaluate PCA . especially true imputed values replacing large proportion data. Global differences feature abundances samples (e.g. due sample-specific biases) can evaluated plotting distribution log2 transformed feature abundances using boxplots violin plots. show number detected chromatographic peaks per sample distribution log2 transformed feature abundances. Figure 29. Number detected peaks feature abundances. upper part plot show gap filling steps allowed rescue substantial number NAs allowed us consistent number feature values per sample. consistency aligns assumption every sample similar amount features detected. Additionally observe , average, signal distribution individual samples similar. alternative way evaluate differences abundances samples relative log abundance (RLA) plots [@de_livera_normalizing_2012]. RLA value abundance feature sample relative median abundance feature across multiple samples. can discriminate within group across group RLAs, depending whether abundance compared samples within sample group across samples. Within group RLA plots assess tightness replicates within groups median close zero low variation around . used across groups, allow compare behavior groups. Generally, -sample differences can easily spotted using RLA plots. calculate visualize within group RLA values using rowRla() function MsCoreUtils package defining parameter f sample groups. Figure 30. RLA plot raw data filled data. RLA plot , can observe medians samples indeed centered around 0. Exception two CVD samples. Thus, distribution signals across samples comparable, differences seem present require sample normalization. Depending added sample mixes, allow evaluation variances introduced subsequent processing analysis steps. present experiment, added original plasma samples sample extraction included also protein lipid removal steps.
can therefore used evaluate variances introduced sample extraction subsequent steps, can however used infer conclusions performance differences original sample collection (blood drawing, storage, plasma creation). use matchValues() function identify features representing signal . filter matches keep match single feature using filterMatches() function combination SingleMatchParam. internal standards play crucial role guiding normalization process. Given assumption samples artificially spiked, possess known ground truth—abundance intensity internal standard consistent. difference expected due technical differences/variance. Consequently, normalization aims minimize variation samples internal standard, reinforcing reliability analyses. previous RLA plot showed data biases need corrected. Therefore, implement -sample normalization using filled-features. process effectively mitigates variations influenced technical issues, differences sample preparation, processing injection methods. instance, employ commonly used technique known median scaling [@de_livera_normalizing_2012]. method involves computing median sample, followed determining median individual sample medians. ensures consistent median values sample throughout entire data set. Maintaining uniformity average total metabolite abundance across samples crucial effective implementation. process aims establish shared baseline central tendency metabolite abundance, mitigating impact sample-specific technical variations. approach fosters robust comparable analysis top features across data set. assumption normalizing based median, known lower sensitivity extreme values, enhances comparability top features ensures consistent average abundance across samples. median scaling calculated imputed non-imputed data, set stored separately within SummarizedExperiment object. 
approach facilitates testing various normalization strategies maintaining record processing steps undertaken, enabling easy regression previous stages necessary. crucial evaluate effectiveness normalization process. can achieved comparing distribution log2 transformed feature abundances normalization. Additionally, RLA plots can used assess tightness replicates within groups compare behavior groups. Figure 31. PC1 PC2 data normalization. Normalization large impact PC1 PC2, separation study groups PC3 seems better difference QC samples lower normalization (see ). Figure 32. PC3 PC4 data normalization. PCA plots show normalization process changed overall structure data. separation study QC samples remains . expected results normalization correct biological variance technical. compare RLA plots -sample normalization evaluate impact data. Figure 33. RLA plot normalization. normalization process effectively centered data around median medians samples now closer zero. next evaluate coefficient variation (CV, also referred relative standard deviation RSD) features across samples either QC study samples. QC samples, CV represent technical noise, study samples include also expected biological differences. Thus, normalization reduce CV QC samples, slightly reducing CV study samples. CV calculated using rowRsd() function MetaboCoreUtils package. setting mad = TRUE use robust calculation using median absolute deviation instead standard deviation. Table 6. Distribution CV values across samples raw normalized data. table shows distribution CV raw normalized data. first column highlights % data given CV value, e.g. 25% data CV equal lower 0.04557 QC_raw data. anticipated, CV values QCs, reflect technical variance, lower compared study samples, include technical biological variance. Overall, minimal disparity exists raw normalized data, positive indication normalization process introduced bias dataset, also reflects little differences average abundances sample raw data. 
overall conclusion normalization process little variance present beginning, normalization however able center data around median (shown RLA plot). Given simplicity limited size example dataset, conclude normalization process stage. intricate datasets diverse biases, tailored approach devised. include also approaches adjust signal drifts batch effects. One possible option use linear-model based approach can example applied adjust_lm() function MetaboCoreUtils package.","code":"#' Load preprocessing results ## load(\"SumExp.RData\") ## loadResults(RDataParam(\"data.RData\")) #' Impute missing values using an uniform distribution na_unidis <- function(z) { na <- is.na(z) if (any(na)) { min = min(z, na.rm = TRUE) z[na] <- runif(sum(na), min = min/2, max = min) } z } #' Row-wise impute missing values and add the data as a new assay tmp <- apply(assay(res, \"raw_filled\"), MARGIN = 1, na_unidis) assays(res)$raw_filled_imputed <- t(tmp) #' Log2 transform and scale data vals <- assay(res, \"raw_filled_imputed\") |> log2() |> t() |> scale(center = TRUE, scale = TRUE) #' Perform the PCA pca_res <- prcomp(vals, scale = FALSE, center = FALSE) #' Plot the results vals_st <- cbind(vals, phenotype = res$phenotype) pca_12 <- autoplot(pca_res, data = vals_st , colour = 'phenotype', scale = 0) + scale_color_manual(values = col_phenotype) pca_34 <- autoplot(pca_res, data = vals_st, colour = 'phenotype', x = 3, y = 4, scale = 0) + scale_color_manual(values = col_phenotype) grid.arrange(pca_12, pca_34, ncol = 2) layout(mat = matrix(1:3, ncol = 1), height = c(0.2, 0.2, 0.8)) par(mar = c(0.2, 4.5, 0.2, 3)) barplot(apply(assay(res, \"raw\"), MARGIN = 2, function(x) sum(!is.na(x))), col = paste0(col_sample, 80), border = col_sample, ylab = \"# detected peaks\", xaxt = \"n\", space = 0.012) grid(nx = NA, ny = NULL) barplot(apply(assay(res, \"raw_filled\"), MARGIN = 2, function(x) sum(!is.na(x))), col = paste0(col_sample, 80), border = col_sample, ylab = \"# detected + filled peaks\", xaxt = 
\"n\", space = 0.012) grid(nx = NA, ny = NULL) vioplot(log2(assay(res, \"raw_filled\")), xaxt = \"n\", ylab = expression(log[2]~feature~abundance), col = paste0(col_sample, 80), border = col_sample) points(colMedians(log2(assay(res, \"raw_filled\")), na.rm = TRUE), type = \"b\", pch = 1) grid(nx = NA, ny = NULL) legend(\"topright\", col = col_phenotype, legend = names(col_phenotype), lty=1, lwd = 2, xpd = TRUE, ncol = 3, cex = 0.8, bty = \"n\") par(mfrow = c(1, 1), mar = c(3.5, 4.5, 2.5, 1)) boxplot(MsCoreUtils::rowRla(assay(res, \"raw_filled\"), f = res$phenotype, transform = \"log2\"), cex = 0.5, pch = 16, col = paste0(col_sample, 80), ylab = \"RLA\", border = col_sample, boxwex = 1, outline = FALSE, xaxt = \"n\", main = \"Relative log abundance\", cex.main = 1) axis(side = 1, at = seq_len(ncol(res)), labels = colData(res)$sample_name) grid(nx = NA, ny = NULL) abline(h = 0, lty=3, lwd = 1, col = \"black\") legend(\"topright\", col = col_phenotype, legend = names(col_phenotype), lty=1, lwd = 2, xpd = TRUE, ncol = 3, cex = 0.8, bty = \"n\") # Do we keep IS in normalisation ? Does not give much info... Would simplify a bit #' Creating a column within our IS table intern_standard$feature_id <- NA_character_ #' Identify features matching m/z and RT of internal standards. fdef <- featureDefinitions(lcms1) fdef$feature_id <- rownames(fdef) match_intern_standard <- matchValues( query = intern_standard, target = fdef, mzColname = c(\"mz\", \"mzmed\"), rtColname = c(\"RT\", \"rtmed\"), param = MzRtParam(ppm = 50, toleranceRt = 10)) #' Keep only matches with a 1:1 mapping standard to feature. 
param <- SingleMatchParam(duplicates = \"remove\", column = \"score_rt\", decreasing = TRUE) match_intern_standard <- filterMatches(match_intern_standard, param) intern_standard$feature_id <- match_intern_standard$target_feature_id intern_standard <- intern_standard[!is.na(intern_standard$feature_id), ] #' Compute median and generate normalization factor mdns <- apply(assay(res, \"raw_filled\"), MARGIN = 2, median, na.rm = TRUE ) nf_mdn <- mdns / median(mdns) #' divide dataset by median of median and create a new assay. assays(res)$norm <- sweep(assay(res, \"raw_filled\"), MARGIN = 2, nf_mdn, '/') assays(res)$norm_imputed <- sweep(assay(res, \"raw_filled_imputed\"), MARGIN = 2, nf_mdn, '/') #' Data before normalization vals_st <- cbind(vals, phenotype = res$phenotype) pca_raw <- autoplot(pca_res, data = vals_st, colour = 'phenotype', scale = 0) + scale_color_manual(values = col_phenotype) #' Data after normalization vals_norm <- apply(assay(res, \"norm\"), MARGIN = 1, na_unidis) |> log2() |> scale(center = TRUE, scale = TRUE) pca_res_norm <- prcomp(vals_norm, scale = FALSE, center = FALSE) vals_st_norm <- cbind(vals_norm, phenotype = res$phenotype) pca_adj <- autoplot(pca_res_norm, data = vals_st_norm, colour = 'phenotype', scale = 0) + scale_color_manual(values = col_phenotype) grid.arrange(pca_raw, pca_adj, ncol = 2) pca_raw <- autoplot(pca_res, data = vals_st , colour = 'phenotype', x = 3, y = 4, scale = 0) + scale_color_manual(values = col_phenotype) pca_adj <- autoplot(pca_res_norm, data = vals_st_norm, colour = 'phenotype', x = 3, y = 4, scale = 0) + scale_color_manual(values = col_phenotype) grid.arrange(pca_raw, pca_adj, ncol = 2) par(mfrow = c(2, 1), mar = c(3.5, 4.5, 2.5, 1)) boxplot(rowRla(assay(res, \"raw_filled\"), group = res$phenotype), cex = 0.5, pch = 16, ylab = \"RLA\", border = col_sample, col = paste0(col_sample, 80), cex.main = 1, outline = FALSE, xaxt = \"n\", main = \"Raw data\", boxwex = 1) grid(nx = NA, ny = NULL) legend(\"topright\", inset 
= c(0, -0.2), col = col_phenotype, legend = names(col_phenotype), lty=1, lwd = 2, xpd = TRUE, ncol = 3, cex = 0.7, bty = \"n\") abline(h = 0, lty=3, lwd = 1, col = \"black\") boxplot(rowRla(assay(res, \"norm\"), group = res$phenotype), cex = 0.5, pch = 16, ylab = \"RLA\", border = col_sample, col = paste0(col_sample, 80), boxwex = 1, outline = FALSE, xaxt = \"n\", main = \"Normalized data\", cex.main = 1) axis(side = 1, at = seq_len(ncol(res)), labels = res$sample_name) grid(nx = NA, ny = NULL) abline(h = 0, lty = 3, lwd = 1, col = \"black\") #' Calculate the CV values index_study <- res$phenotype %in% c(\"CTR\", \"CVD\") index_QC <- res$phenotype == \"QC\" sample_res <- cbind( QC_raw = rowRsd(assay(res, \"raw_filled\")[, index_QC], na.rm = TRUE, mad = TRUE), QC_norm = rowRsd(assay(res, \"norm\")[, index_QC], na.rm = TRUE, mad = TRUE), Study_raw = rowRsd(assay(res, \"raw_filled\")[, index_study], na.rm = TRUE, mad = TRUE), Study_norm = rowRsd(assay(res, \"norm\")[, index_study], na.rm = TRUE, mad = TRUE) ) #' Summarize the values across features res_df <- data.frame( QC_raw = quantile(sample_res[, \"QC_raw\"], na.rm = TRUE), QC_norm = quantile(sample_res[, \"QC_norm\"], na.rm = TRUE), Study_raw = quantile(sample_res[, \"Study_raw\"], na.rm = TRUE), Study_norm = quantile(sample_res[, \"Study_norm\"], na.rm = TRUE) ) kable(res_df, format = \"pipe\")"},{"path":"https://rformassspectrometry.github.io/metabonaut/articles/a-end-to-end-untargeted-metabolomics.html","id":"initial-quality-assessment","dir":"Articles","previous_headings":"","what":"Initial quality assessment","title":"Complete end-to-end LC-MS/MS Metabolomic Data analysis","text":"principal component analysis (PCA) helpful tool initial, unsupervised, visualization data also provides insights potential quality issues data. order apply PCA measured feature abundances, need however impute (still present) missing values. assume missing values (gap-filling step) represent signal detection limit.
cases, missing values can replaced random values sampled uniform distribution, ranging half smallest measured value smallest measured value specific feature. uniform distribution defined two parameters (minimum maximum) values equal probability selected. impute missing values approach add resulting data matrix new assay result object. PCA powerful tool detecting biases data. dimensionality reduction technique, enables visualization data lower-dimensional space. context LC-MS data, PCA can used identify overall biases batch, sample, injection index, etc. However, important note PCA linear method may able detect biases data. plotting PCA, apply log2 transform, center scale data. log2 transformation applied stabilize variance centering remove dependency absolute abundances. Figure 28. PCA data. PCA shows clear separation study samples (plasma) QC samples (serum) first principal component (PC1). separation based phenotype visible third principal component (PC3). cases, can better option remove imputed values evaluate PCA . especially true imputed values replacing large proportion data. Global differences feature abundances samples (e.g. due sample-specific biases) can evaluated plotting distribution log2 transformed feature abundances using boxplots violin plots. show number detected chromatographic peaks per sample distribution log2 transformed feature abundances. Figure 29. Number detected peaks feature abundances. upper part plot show gap filling steps allowed rescue substantial number NAs allowed us consistent number feature values per sample. consistency aligns assumption every sample similar amount features detected. Additionally observe , average, signal distribution individual samples similar. alternative way evaluate differences abundances samples relative log abundance (RLA) plots [@de_livera_normalizing_2012]. RLA value abundance feature sample relative median abundance feature across multiple samples.
can discriminate within group across group RLAs, depending whether abundance compared samples within sample group across samples. Within group RLA plots assess tightness replicates within groups median close zero low variation around . used across groups, allow compare behavior groups. Generally, -sample differences can easily spotted using RLA plots. calculate visualize within group RLA values using rowRla() function MsCoreUtils package defining parameter f sample groups. Figure 30. RLA plot raw data filled data. RLA plot , can observe medians samples indeed centered around 0. Exception two CVD samples. Thus, distribution signals across samples comparable, differences seem present require sample normalization. Depending added sample mixes, allow evaluation variances introduced subsequent processing analysis steps. present experiment, added original plasma samples sample extraction included also protein lipid removal steps. can therefore used evaluate variances introduced sample extraction subsequent steps, can however used infer conclusions performance differences original sample collection (blood drawing, storage, plasma creation). use matchValues() function identify features representing signal . filter matches keep match single feature using filterMatches() function combination SingleMatchParam. internal standards play crucial role guiding normalization process. Given assumption samples artificially spiked, possess known ground truth—abundance intensity internal standard consistent. difference expected due technical differences/variance. 
Consequently, normalization aims minimize variation samples internal standard, reinforcing reliability analyses.","code":"#' Load preprocessing results ## load(\"SumExp.RData\") ## loadResults(RDataParam(\"data.RData\")) #' Impute missing values using an uniform distribution na_unidis <- function(z) { na <- is.na(z) if (any(na)) { min = min(z, na.rm = TRUE) z[na] <- runif(sum(na), min = min/2, max = min) } z } #' Row-wise impute missing values and add the data as a new assay tmp <- apply(assay(res, \"raw_filled\"), MARGIN = 1, na_unidis) assays(res)$raw_filled_imputed <- t(tmp) #' Log2 transform and scale data vals <- assay(res, \"raw_filled_imputed\") |> log2() |> t() |> scale(center = TRUE, scale = TRUE) #' Perform the PCA pca_res <- prcomp(vals, scale = FALSE, center = FALSE) #' Plot the results vals_st <- cbind(vals, phenotype = res$phenotype) pca_12 <- autoplot(pca_res, data = vals_st , colour = 'phenotype', scale = 0) + scale_color_manual(values = col_phenotype) pca_34 <- autoplot(pca_res, data = vals_st, colour = 'phenotype', x = 3, y = 4, scale = 0) + scale_color_manual(values = col_phenotype) grid.arrange(pca_12, pca_34, ncol = 2) layout(mat = matrix(1:3, ncol = 1), height = c(0.2, 0.2, 0.8)) par(mar = c(0.2, 4.5, 0.2, 3)) barplot(apply(assay(res, \"raw\"), MARGIN = 2, function(x) sum(!is.na(x))), col = paste0(col_sample, 80), border = col_sample, ylab = \"# detected peaks\", xaxt = \"n\", space = 0.012) grid(nx = NA, ny = NULL) barplot(apply(assay(res, \"raw_filled\"), MARGIN = 2, function(x) sum(!is.na(x))), col = paste0(col_sample, 80), border = col_sample, ylab = \"# detected + filled peaks\", xaxt = \"n\", space = 0.012) grid(nx = NA, ny = NULL) vioplot(log2(assay(res, \"raw_filled\")), xaxt = \"n\", ylab = expression(log[2]~feature~abundance), col = paste0(col_sample, 80), border = col_sample) points(colMedians(log2(assay(res, \"raw_filled\")), na.rm = TRUE), type = \"b\", pch = 1) grid(nx = NA, ny = NULL) legend(\"topright\", col = col_phenotype, 
legend = names(col_phenotype), lty=1, lwd = 2, xpd = TRUE, ncol = 3, cex = 0.8, bty = \"n\") par(mfrow = c(1, 1), mar = c(3.5, 4.5, 2.5, 1)) boxplot(MsCoreUtils::rowRla(assay(res, \"raw_filled\"), f = res$phenotype, transform = \"log2\"), cex = 0.5, pch = 16, col = paste0(col_sample, 80), ylab = \"RLA\", border = col_sample, boxwex = 1, outline = FALSE, xaxt = \"n\", main = \"Relative log abundance\", cex.main = 1) axis(side = 1, at = seq_len(ncol(res)), labels = colData(res)$sample_name) grid(nx = NA, ny = NULL) abline(h = 0, lty=3, lwd = 1, col = \"black\") legend(\"topright\", col = col_phenotype, legend = names(col_phenotype), lty=1, lwd = 2, xpd = TRUE, ncol = 3, cex = 0.8, bty = \"n\") # Do we keep IS in normalisation ? Does not give much info... Would simplify a bit #' Creating a column within our IS table intern_standard$feature_id <- NA_character_ #' Identify features matching m/z and RT of internal standards. fdef <- featureDefinitions(lcms1) fdef$feature_id <- rownames(fdef) match_intern_standard <- matchValues( query = intern_standard, target = fdef, mzColname = c(\"mz\", \"mzmed\"), rtColname = c(\"RT\", \"rtmed\"), param = MzRtParam(ppm = 50, toleranceRt = 10)) #' Keep only matches with a 1:1 mapping standard to feature. param <- SingleMatchParam(duplicates = \"remove\", column = \"score_rt\", decreasing = TRUE) match_intern_standard <- filterMatches(match_intern_standard, param) intern_standard$feature_id <- match_intern_standard$target_feature_id intern_standard <- intern_standard[!is.na(intern_standard$feature_id), ]"},{"path":"https://rformassspectrometry.github.io/metabonaut/articles/a-end-to-end-untargeted-metabolomics.html","id":"principal-component-analysis","dir":"Articles","previous_headings":"Data normalization","what":"Principal Component Analysis","title":"Complete end-to-end LC-MS/MS Metabolomic Data analysis","text":"PCA powerful tool detecting biases data. 
dimensionality reduction technique, enables visualization data lower-dimensional space. context LC-MS data, PCA can used identify overall biases batch, sample, injection index, etc. However, important note PCA linear method may able detect biases data. plotting PCA, apply log2 transform, center scale data. log2 transformation applied stabilize variance centering remove dependency absolute abundances. Figure 28. PCA data. PCA shows clear separation study samples (plasma) QC samples (serum) first principal component (PC1). separation based phenotype visible third principal component (PC3). cases, can better option remove imputed values evaluate PCA . especially true imputed values replacing large proportion data.","code":"#' Log2 transform and scale data vals <- assay(res, \"raw_filled_imputed\") |> log2() |> t() |> scale(center = TRUE, scale = TRUE) #' Perform the PCA pca_res <- prcomp(vals, scale = FALSE, center = FALSE) #' Plot the results vals_st <- cbind(vals, phenotype = res$phenotype) pca_12 <- autoplot(pca_res, data = vals_st , colour = 'phenotype', scale = 0) + scale_color_manual(values = col_phenotype) pca_34 <- autoplot(pca_res, data = vals_st, colour = 'phenotype', x = 3, y = 4, scale = 0) + scale_color_manual(values = col_phenotype) grid.arrange(pca_12, pca_34, ncol = 2)"},{"path":"https://rformassspectrometry.github.io/metabonaut/articles/a-end-to-end-untargeted-metabolomics.html","id":"intensity-evaluation","dir":"Articles","previous_headings":"Data normalization","what":"Intensity evaluation","title":"Complete end-to-end LC-MS/MS Metabolomic Data analysis","text":"Global differences feature abundances samples (e.g. due sample-specific biases) can evaluated plotting distribution log2 transformed feature abundances using boxplots violin plots. show number detected chromatographic peaks per sample distribution log2 transformed feature abundances. Figure 29. Number detected peaks feature abundances. 
upper part plot show gap filling steps allowed rescue substantial number NAs allowed us consistent number feature values per sample. consistency aligns assumption every sample similar amount features detected. Additionally observe , average, signal distribution individual samples similar. alternative way evaluate differences abundances samples relative log abundance (RLA) plots [@de_livera_normalizing_2012]. RLA value abundance feature sample relative median abundance feature across multiple samples. can discriminate within group across group RLAs, depending whether abundance compared samples within sample group across samples. Within group RLA plots assess tightness replicates within groups median close zero low variation around . used across groups, allow compare behavior groups. Generally, -sample differences can easily spotted using RLA plots. calculate visualize within group RLA values using rowRla() function MsCoreUtils package defining parameter f sample groups. Figure 30. RLA plot raw data filled data. RLA plot , can observe medians samples indeed centered around 0. Exception two CVD samples.
Thus, distribution signals across samples comparable, differences seem present require sample normalization.","code":"layout(mat = matrix(1:3, ncol = 1), height = c(0.2, 0.2, 0.8)) par(mar = c(0.2, 4.5, 0.2, 3)) barplot(apply(assay(res, \"raw\"), MARGIN = 2, function(x) sum(!is.na(x))), col = paste0(col_sample, 80), border = col_sample, ylab = \"# detected peaks\", xaxt = \"n\", space = 0.012) grid(nx = NA, ny = NULL) barplot(apply(assay(res, \"raw_filled\"), MARGIN = 2, function(x) sum(!is.na(x))), col = paste0(col_sample, 80), border = col_sample, ylab = \"# detected + filled peaks\", xaxt = \"n\", space = 0.012) grid(nx = NA, ny = NULL) vioplot(log2(assay(res, \"raw_filled\")), xaxt = \"n\", ylab = expression(log[2]~feature~abundance), col = paste0(col_sample, 80), border = col_sample) points(colMedians(log2(assay(res, \"raw_filled\")), na.rm = TRUE), type = \"b\", pch = 1) grid(nx = NA, ny = NULL) legend(\"topright\", col = col_phenotype, legend = names(col_phenotype), lty=1, lwd = 2, xpd = TRUE, ncol = 3, cex = 0.8, bty = \"n\") par(mfrow = c(1, 1), mar = c(3.5, 4.5, 2.5, 1)) boxplot(MsCoreUtils::rowRla(assay(res, \"raw_filled\"), f = res$phenotype, transform = \"log2\"), cex = 0.5, pch = 16, col = paste0(col_sample, 80), ylab = \"RLA\", border = col_sample, boxwex = 1, outline = FALSE, xaxt = \"n\", main = \"Relative log abundance\", cex.main = 1) axis(side = 1, at = seq_len(ncol(res)), labels = colData(res)$sample_name) grid(nx = NA, ny = NULL) abline(h = 0, lty=3, lwd = 1, col = \"black\") legend(\"topright\", col = col_phenotype, legend = names(col_phenotype), lty=1, lwd = 2, xpd = TRUE, ncol = 3, cex = 0.8, bty = \"n\")"},{"path":"https://rformassspectrometry.github.io/metabonaut/articles/a-end-to-end-untargeted-metabolomics.html","id":"internal-standards","dir":"Articles","previous_headings":"Data normalization","what":"Internal standards","title":"Complete end-to-end LC-MS/MS Metabolomic Data analysis","text":"Depending added sample mixes, allow 
evaluation variances introduced subsequent processing analysis steps. present experiment, added original plasma samples sample extraction included also protein lipid removal steps. can therefore used evaluate variances introduced sample extraction subsequent steps, can however used infer conclusions performance differences original sample collection (blood drawing, storage, plasma creation). use matchValues() function identify features representing signal . filter matches keep match single feature using filterMatches() function combination SingleMatchParam. internal standards play crucial role guiding normalization process. Given assumption samples artificially spiked, possess known ground truth—abundance intensity internal standard consistent. difference expected due technical differences/variance. Consequently, normalization aims minimize variation samples internal standard, reinforcing reliability analyses.","code":"# Do we keep IS in normalisation ? Does not give much info... Would simplify a bit #' Creating a column within our IS table intern_standard$feature_id <- NA_character_ #' Identify features matching m/z and RT of internal standards. fdef <- featureDefinitions(lcms1) fdef$feature_id <- rownames(fdef) match_intern_standard <- matchValues( query = intern_standard, target = fdef, mzColname = c(\"mz\", \"mzmed\"), rtColname = c(\"RT\", \"rtmed\"), param = MzRtParam(ppm = 50, toleranceRt = 10)) #' Keep only matches with a 1:1 mapping standard to feature. 
param <- SingleMatchParam(duplicates = \"remove\", column = \"score_rt\", decreasing = TRUE) match_intern_standard <- filterMatches(match_intern_standard, param) intern_standard$feature_id <- match_intern_standard$target_feature_id intern_standard <- intern_standard[!is.na(intern_standard$feature_id), ]"},{"path":"https://rformassspectrometry.github.io/metabonaut/articles/a-end-to-end-untargeted-metabolomics.html","id":"between-sample-normalisation","dir":"Articles","previous_headings":"","what":"Between sample normalisation","title":"Complete end-to-end LC-MS/MS Metabolomic Data analysis","text":"previous RLA plot showed data biases need corrected. Therefore, implement -sample normalization using filled-features. process effectively mitigates variations influenced technical issues, differences sample preparation, processing injection methods. instance, employ commonly used technique known median scaling [@de_livera_normalizing_2012]. method involves computing median sample, followed determining median individual sample medians. ensures consistent median values sample throughout entire data set. Maintaining uniformity average total metabolite abundance across samples crucial effective implementation. process aims establish shared baseline central tendency metabolite abundance, mitigating impact sample-specific technical variations. approach fosters robust comparable analysis top features across data set. assumption normalizing based median, known lower sensitivity extreme values, enhances comparability top features ensures consistent average abundance across samples. median scaling calculated imputed non-imputed data, set stored separately within SummarizedExperiment object. 
approach facilitates testing various normalization strategies maintaining record processing steps undertaken, enabling easy regression previous stages necessary.","code":"#' Compute median and generate normalization factor mdns <- apply(assay(res, \"raw_filled\"), MARGIN = 2, median, na.rm = TRUE ) nf_mdn <- mdns / median(mdns) #' divide dataset by median of median and create a new assay. assays(res)$norm <- sweep(assay(res, \"raw_filled\"), MARGIN = 2, nf_mdn, '/') assays(res)$norm_imputed <- sweep(assay(res, \"raw_filled_imputed\"), MARGIN = 2, nf_mdn, '/')"},{"path":"https://rformassspectrometry.github.io/metabonaut/articles/a-end-to-end-untargeted-metabolomics.html","id":"median-scaling","dir":"Articles","previous_headings":"Data normalization","what":"Median scaling","title":"Complete end-to-end LC-MS/MS Metabolomic Data analysis","text":"method involves computing median sample, followed determining median individual sample medians. ensures consistent median values sample throughout entire data set. Maintaining uniformity average total metabolite abundance across samples crucial effective implementation. process aims establish shared baseline central tendency metabolite abundance, mitigating impact sample-specific technical variations. approach fosters robust comparable analysis top features across data set. assumption normalizing based median, known lower sensitivity extreme values, enhances comparability top features ensures consistent average abundance across samples. median scaling calculated imputed non-imputed data, set stored separately within SummarizedExperiment object. 
approach facilitates testing various normalization strategies maintaining record processing steps undertaken, enabling easy regression previous stages necessary.","code":"#' Compute median and generate normalization factor mdns <- apply(assay(res, \"raw_filled\"), MARGIN = 2, median, na.rm = TRUE ) nf_mdn <- mdns / median(mdns) #' divide dataset by median of median and create a new assay. assays(res)$norm <- sweep(assay(res, \"raw_filled\"), MARGIN = 2, nf_mdn, '/') assays(res)$norm_imputed <- sweep(assay(res, \"raw_filled_imputed\"), MARGIN = 2, nf_mdn, '/')"},{"path":"https://rformassspectrometry.github.io/metabonaut/articles/a-end-to-end-untargeted-metabolomics.html","id":"assessing-overall-effectiveness-of-the-normalization-approach","dir":"Articles","previous_headings":"","what":"Assessing overall effectiveness of the normalization approach","title":"Complete end-to-end LC-MS/MS Metabolomic Data analysis","text":"crucial evaluate effectiveness normalization process. can achieved comparing distribution log2 transformed feature abundances normalization. Additionally, RLA plots can used assess tightness replicates within groups compare behavior groups. Figure 31. PC1 PC2 data normalization. Normalization large impact PC1 PC2, separation study groups PC3 seems better difference QC samples lower normalization (see ). Figure 32. PC3 PC4 data normalization. PCA plots show normalization process changed overall structure data. separation study QC samples remains . expected results normalization correct biological variance technical. compare RLA plots -sample normalization evaluate impact data. Figure 33. RLA plot normalization. normalization process effectively centered data around median medians samples now closer zero. next evaluate coefficient variation (CV, also referred relative standard deviation RSD) features across samples either QC study samples. QC samples, CV represent technical noise, study samples include also expected biological differences. 
Thus, normalization reduce CV QC samples, slightly reducing CV study samples. CV calculated using rowRsd() function MetaboCoreUtils package. setting mad = TRUE use robust calculation using median absolute deviation instead standard deviation. Table 6. Distribution CV values across samples raw normalized data. table shows distribution CV raw normalized data. first column highlights % data given CV value, e.g. 25% data CV equal lower 0.04557 QC_raw data. anticipated, CV values QCs, reflect technical variance, lower compared study samples, include technical biological variance. Overall, minimal disparity exists raw normalized data, positive indication normalization process introduced bias dataset, also reflects little differences average abundances sample raw data. overall conclusion normalization process little variance present beginning, normalization however able center data around median (shown RLA plot). Given simplicity limited size example dataset, conclude normalization process stage. intricate datasets diverse biases, tailored approach devised. include also approaches adjust signal drifts batch effects. 
One possible option use linear-model based approach can example applied adjust_lm() function MetaboCoreUtils package.","code":"#' Data before normalization vals_st <- cbind(vals, phenotype = res$phenotype) pca_raw <- autoplot(pca_res, data = vals_st, colour = 'phenotype', scale = 0) + scale_color_manual(values = col_phenotype) #' Data after normalization vals_norm <- apply(assay(res, \"norm\"), MARGIN = 1, na_unidis) |> log2() |> scale(center = TRUE, scale = TRUE) pca_res_norm <- prcomp(vals_norm, scale = FALSE, center = FALSE) vals_st_norm <- cbind(vals_norm, phenotype = res$phenotype) pca_adj <- autoplot(pca_res_norm, data = vals_st_norm, colour = 'phenotype', scale = 0) + scale_color_manual(values = col_phenotype) grid.arrange(pca_raw, pca_adj, ncol = 2) pca_raw <- autoplot(pca_res, data = vals_st , colour = 'phenotype', x = 3, y = 4, scale = 0) + scale_color_manual(values = col_phenotype) pca_adj <- autoplot(pca_res_norm, data = vals_st_norm, colour = 'phenotype', x = 3, y = 4, scale = 0) + scale_color_manual(values = col_phenotype) grid.arrange(pca_raw, pca_adj, ncol = 2) par(mfrow = c(2, 1), mar = c(3.5, 4.5, 2.5, 1)) boxplot(rowRla(assay(res, \"raw_filled\"), group = res$phenotype), cex = 0.5, pch = 16, ylab = \"RLA\", border = col_sample, col = paste0(col_sample, 80), cex.main = 1, outline = FALSE, xaxt = \"n\", main = \"Raw data\", boxwex = 1) grid(nx = NA, ny = NULL) legend(\"topright\", inset = c(0, -0.2), col = col_phenotype, legend = names(col_phenotype), lty=1, lwd = 2, xpd = TRUE, ncol = 3, cex = 0.7, bty = \"n\") abline(h = 0, lty=3, lwd = 1, col = \"black\") boxplot(rowRla(assay(res, \"norm\"), group = res$phenotype), cex = 0.5, pch = 16, ylab = \"RLA\", border = col_sample, col = paste0(col_sample, 80), boxwex = 1, outline = FALSE, xaxt = \"n\", main = \"Normalized data\", cex.main = 1) axis(side = 1, at = seq_len(ncol(res)), labels = res$sample_name) grid(nx = NA, ny = NULL) abline(h = 0, lty = 3, lwd = 1, col = \"black\") #' Calculate the CV 
values index_study <- res$phenotype %in% c(\"CTR\", \"CVD\") index_QC <- res$phenotype == \"QC\" sample_res <- cbind( QC_raw = rowRsd(assay(res, \"raw_filled\")[, index_QC], na.rm = TRUE, mad = TRUE), QC_norm = rowRsd(assay(res, \"norm\")[, index_QC], na.rm = TRUE, mad = TRUE), Study_raw = rowRsd(assay(res, \"raw_filled\")[, index_study], na.rm = TRUE, mad = TRUE), Study_norm = rowRsd(assay(res, \"norm\")[, index_study], na.rm = TRUE, mad = TRUE) ) #' Summarize the values across features res_df <- data.frame( QC_raw = quantile(sample_res[, \"QC_raw\"], na.rm = TRUE), QC_norm = quantile(sample_res[, \"QC_norm\"], na.rm = TRUE), Study_raw = quantile(sample_res[, \"Study_raw\"], na.rm = TRUE), Study_norm = quantile(sample_res[, \"Study_norm\"], na.rm = TRUE) ) kable(res_df, format = \"pipe\")"},{"path":"https://rformassspectrometry.github.io/metabonaut/articles/a-end-to-end-untargeted-metabolomics.html","id":"principal-component-analysis-1","dir":"Articles","previous_headings":"Data normalization","what":"Principal Component Analysis","title":"Complete end-to-end LC-MS/MS Metabolomic Data analysis","text":"Figure 31. PC1 PC2 data normalization. Normalization large impact PC1 PC2, separation study groups PC3 seems better difference QC samples lower normalization (see ). Figure 32. PC3 PC4 data normalization. PCA plots show normalization process changed overall structure data. separation study QC samples remains . 
expected results normalization correct biological variance technical.","code":"#' Data before normalization vals_st <- cbind(vals, phenotype = res$phenotype) pca_raw <- autoplot(pca_res, data = vals_st, colour = 'phenotype', scale = 0) + scale_color_manual(values = col_phenotype) #' Data after normalization vals_norm <- apply(assay(res, \"norm\"), MARGIN = 1, na_unidis) |> log2() |> scale(center = TRUE, scale = TRUE) pca_res_norm <- prcomp(vals_norm, scale = FALSE, center = FALSE) vals_st_norm <- cbind(vals_norm, phenotype = res$phenotype) pca_adj <- autoplot(pca_res_norm, data = vals_st_norm, colour = 'phenotype', scale = 0) + scale_color_manual(values = col_phenotype) grid.arrange(pca_raw, pca_adj, ncol = 2) pca_raw <- autoplot(pca_res, data = vals_st , colour = 'phenotype', x = 3, y = 4, scale = 0) + scale_color_manual(values = col_phenotype) pca_adj <- autoplot(pca_res_norm, data = vals_st_norm, colour = 'phenotype', x = 3, y = 4, scale = 0) + scale_color_manual(values = col_phenotype) grid.arrange(pca_raw, pca_adj, ncol = 2)"},{"path":"https://rformassspectrometry.github.io/metabonaut/articles/a-end-to-end-untargeted-metabolomics.html","id":"intensity-evaluation-1","dir":"Articles","previous_headings":"Data normalization","what":"Intensity evaluation","title":"Complete end-to-end LC-MS/MS Metabolomic Data analysis","text":"compare RLA plots -sample normalization evaluate impact data. Figure 33. RLA plot normalization. 
normalization process effectively centered data around median medians samples now closer zero.","code":"par(mfrow = c(2, 1), mar = c(3.5, 4.5, 2.5, 1)) boxplot(rowRla(assay(res, \"raw_filled\"), group = res$phenotype), cex = 0.5, pch = 16, ylab = \"RLA\", border = col_sample, col = paste0(col_sample, 80), cex.main = 1, outline = FALSE, xaxt = \"n\", main = \"Raw data\", boxwex = 1) grid(nx = NA, ny = NULL) legend(\"topright\", inset = c(0, -0.2), col = col_phenotype, legend = names(col_phenotype), lty=1, lwd = 2, xpd = TRUE, ncol = 3, cex = 0.7, bty = \"n\") abline(h = 0, lty=3, lwd = 1, col = \"black\") boxplot(rowRla(assay(res, \"norm\"), group = res$phenotype), cex = 0.5, pch = 16, ylab = \"RLA\", border = col_sample, col = paste0(col_sample, 80), boxwex = 1, outline = FALSE, xaxt = \"n\", main = \"Normalized data\", cex.main = 1) axis(side = 1, at = seq_len(ncol(res)), labels = res$sample_name) grid(nx = NA, ny = NULL) abline(h = 0, lty = 3, lwd = 1, col = \"black\")"},{"path":"https://rformassspectrometry.github.io/metabonaut/articles/a-end-to-end-untargeted-metabolomics.html","id":"coefficient-of-variation","dir":"Articles","previous_headings":"Data normalization","what":"Coefficient of variation","title":"Complete end-to-end LC-MS/MS Metabolomic Data analysis","text":"next evaluate coefficient variation (CV, also referred relative standard deviation RSD) features across samples either QC study samples. QC samples, CV represent technical noise, study samples include also expected biological differences. Thus, normalization reduce CV QC samples, slightly reducing CV study samples. CV calculated using rowRsd() function MetaboCoreUtils package. setting mad = TRUE use robust calculation using median absolute deviation instead standard deviation. Table 6. Distribution CV values across samples raw normalized data. table shows distribution CV raw normalized data. first column highlights % data given CV value, e.g. 25% data CV equal lower 0.04557 QC_raw data. 
anticipated, CV values QCs, reflect technical variance, lower compared study samples, include technical biological variance. Overall, minimal disparity exists raw normalized data, positive indication normalization process introduced bias dataset, also reflects little differences average abundances sample raw data.","code":"#' Calculate the CV values index_study <- res$phenotype %in% c(\"CTR\", \"CVD\") index_QC <- res$phenotype == \"QC\" sample_res <- cbind( QC_raw = rowRsd(assay(res, \"raw_filled\")[, index_QC], na.rm = TRUE, mad = TRUE), QC_norm = rowRsd(assay(res, \"norm\")[, index_QC], na.rm = TRUE, mad = TRUE), Study_raw = rowRsd(assay(res, \"raw_filled\")[, index_study], na.rm = TRUE, mad = TRUE), Study_norm = rowRsd(assay(res, \"norm\")[, index_study], na.rm = TRUE, mad = TRUE) ) #' Summarize the values across features res_df <- data.frame( QC_raw = quantile(sample_res[, \"QC_raw\"], na.rm = TRUE), QC_norm = quantile(sample_res[, \"QC_norm\"], na.rm = TRUE), Study_raw = quantile(sample_res[, \"Study_raw\"], na.rm = TRUE), Study_norm = quantile(sample_res[, \"Study_norm\"], na.rm = TRUE) ) kable(res_df, format = \"pipe\")"},{"path":"https://rformassspectrometry.github.io/metabonaut/articles/a-end-to-end-untargeted-metabolomics.html","id":"conclusion-on-normalization","dir":"Articles","previous_headings":"Data normalization","what":"Conclusion on normalization","title":"Complete end-to-end LC-MS/MS Metabolomic Data analysis","text":"overall conclusion normalization process little variance present beginning, normalization however able center data around median (shown RLA plot). Given simplicity limited size example dataset, conclude normalization process stage. intricate datasets diverse biases, tailored approach devised. include also approaches adjust signal drifts batch effects. 
One possible option use linear-model based approach can example applied adjust_lm() function MetaboCoreUtils package.","code":""},{"path":"https://rformassspectrometry.github.io/metabonaut/articles/a-end-to-end-untargeted-metabolomics.html","id":"quality-control-feature-prefiltering","dir":"Articles","previous_headings":"","what":"Quality control: Feature prefiltering","title":"Complete end-to-end LC-MS/MS Metabolomic Data analysis","text":"normalizing data can now pre-filter clean data performing statistical analysis. general, pre-filtering samples features performed remove outliers. copy original result object also keep unfiltered data later comparisons. eliminate features exhibit high variability dataset. Repeatedly measured QC samples typically serve robust basis cleansing datasets allowing identify features excessively high noise. data set external QC samples used, .e. pooled samples different collection using slightly different sample matrix, utility filtering somewhat limited. comprehensive description guidelines data filtering untargeted metabolomic studies, please refer [@broadhurst_guidelines_2018]. first restrict data set features chromatographic peak detected least 2/3 samples least one study samples groups. ensures statistical tests carried later study samples performed reliable signal. Also, filter remove features mostly detected QC samples, study samples. filter can performed filterFeatures() function xcms package PercentMissingFilter setting. parameters filter: threshold: defines maximal acceptable percentage samples missing value(s) least one sample groups defined parameter f. f: factor defining sample groups. replacing \"QC\" sample group NA parameter f exclude QC samples evaluation consider study samples. threshold = 40 keep features peak detected 2 3 samples one sample groups. consider detected chromatographic peaks per sample, apply filter \"raw\" assay result object, contains abundance values detected chromatographic peaks (prior gap-filling). 
Following guidelines stated decided still use QC samples pre-filtering, basis represent similar bio-fluids study samples, thus, anticipate observing relatively similar metabolites affected similar measurement biases. therefore evaluate dispersion ratio (Dratio) [@broadhurst_guidelines_2018] features data set. accomplish task using function time DratioFilter parameter. filters exist function invite user explore decide best dataset. Dratio filter powerful tool identify features exhibit high variability data, relating variance observed QC samples study samples. setting threshold 0.4, remove features high degree variability QC study samples. example, feature deviation QC higher 40% (threshold = 0.4) deviation study samples removed. filtering step ensures features retained considerably lower technical biological variance. Note rowDratio() rowRsd() functions MetaboCoreUtils package used calculate actual numeric values estimates used filtering, e.g. evaluate distribution whole data set identify data set-dependent threshold values. Finally, evaluate number features left filtering steps calculate percentage features removed. dataset reduced 9068 4277 features. remove considerable amount features expected want focus reliable features analysis. rest analysis need separate QC samples study samples. store QC samples separate object later use. addition calculate CV QC samples add additional column rowData() result object. used later prioritize identified significant features e.g. low technical noise. Now data set preprocessed, normalized filtered, can start evaluate distribution data estimate variation due biology.","code":"#' Number of features before filtering nrow(res) [1] 9068 #' keep unfiltered object res_unfilt <- res #' Limit features to those with at least two detected peaks in one study group. #' Setting the value for QC samples to NA excludes QC samples from the #' calculation. 
f <- res$phenotype f[f == \"QC\"] <- NA f <- as.factor(f) res <- filterFeatures(res, PercentMissingFilter(f = f, threshold = 40), assay = \"raw\") 1808 features were removed #' Compute and filter based on the Dratio filter_dratio <- DratioFilter(threshold = 0.4, qcIndex = res$phenotype == \"QC\", studyIndex = res$phenotype != \"QC\", mad = TRUE) res <- filterFeatures(res, filter = filter_dratio, assay = \"norm_imputed\") 2983 features were removed #' Number of features after analysis nrow(res) [1] 4277 #' Percentage left: end/beginning nrow(res)/nrow(res_unfilt) * 100 [1] 47.16586 res_qc <- res[, res$phenotype == \"QC\"] res <- res[, res$phenotype != \"QC\"] #' Calculate the QC's CV and add as feature variable to the data set rowData(res)$qc_cv <- assay(res_qc, \"norm\") |> rowRsd()"},{"path":"https://rformassspectrometry.github.io/metabonaut/articles/a-end-to-end-untargeted-metabolomics.html","id":"differential-abundance-analysis","dir":"Articles","previous_headings":"","what":"Differential abundance analysis","title":"Complete end-to-end LC-MS/MS Metabolomic Data analysis","text":"normalization quality control, next step identify features differentially abundant study groups. crucial step allows us identify potential biomarkers metabolites associated study groups. various approaches methods available identification features interest. workflow use multiple linear regression analysis identify features significantly different abundances CVD CTR study group. performing tests evaluate similarities study samples using PCA (excluding QC samples avoid influencing results). Figure 34. PCA data normalization quality control. samples clearly separate study group PCA indicating differences metabolite profiles two groups. However, drives separation PC1 clear. evaluate whether explained available variable study, .e., age: Figure 35. PCA colored age data normalization quality control. According PCA , PC1 seem related age. 
Even variance data set can’t explain stage, proceed (supervised) statistical tests identify features interest. compute linear models metabolite explaining observed feature abundance available study variables. also use base R function lm(), utilize limma package conduct differential abundance analysis: moderated test statistics [@smyth_linear_2004] provided package specifically well suited experiments limited number replicates. tests use linear model ~ phenotype + age, hence explaining abundances one metabolite accounting study group assignment age individual. analysis might benefit inclusion study covariate associated PC2 explaining variance seen principal component, present analysis participant’s age disease association provided. define design study model.matrix() function fit feature-wise linear models log2-transformed abundances using lmFit() function. P-values significance association calculated using eBayes() function, also performs empirical Bayes-based robust estimation standard errors. See also excellent vignette/user guide limma package examples details linear model procedure. linear models fitted, can now proceed extract results. create data frame containing coefficients, raw adjusted p-values (applying Benjamini-Hochberg correction, .e., method = \"BH\" improved control false discovery rate), average intensity signals CVD CTR samples, indication whether feature deemed significant . consider metabolites adjusted p-value smaller 0.05 significant, also include (absolute) difference abundances cut-criteria. last, add differential abundance results result object’s rowData(). can now proceed visualize distribution raw adjusted p-values. Figure 36. Distribution raw (left) adjusted p-values (right). histograms show distribution raw adjusted p-values. Except enrichment small p-values, raw p-values (less) uniformly distributed, indicates absence strong systematic biases data. 
adjusted p-values conservative account multiple testing; important fit linear model feature therefore perform large number tests leads high chance false positive findings. see features low p-values, indicating likely significantly different two study groups. plot adjusted p-values log2 fold change (average) abundances. volcano plot allow us visualize features significantly different two study groups. highlighted blue color plot . Figure 37. Volcano plot showing analysis results. interesting features top corners volcano plot (.e., features large difference abundance groups small p-value). significant features negative coefficient (log2 fold change value) indicating abundance lower CVD samples compared CTR samples. features listed, along average difference (log2) abundance compared groups, adjusted p-values, average (log2) abundance sample group RSD (CV) QC samples table . Table 7. Features significant differences abundances. visualize EICs significant features evaluate (raw) signal. restrict MS data set study samples. Parameters keepFeatures = TRUE: ensures identified features retained subset object. peakBg: defines (background) color individual chromatographic peak EIC object. Figure 38. Extracted ion chromatograms significant features. EICs significant features show clear single peak. intensities (already observed ) much larger CTR CVD samples. exception second feature (second EIC top row), intensities significant features however generally low. 
might make challenging identify using LC-MS/MS setup.","code":"#' Define the colors for the plot col_sample <- col_phenotype[res$phenotype] #' Log transform and scale the data for PCA analysis vals <- assay(res, \"norm_imputed\") |> t() |> log2() |> scale(center = TRUE, scale = TRUE) pca_res <- prcomp(vals, scale = FALSE, center = FALSE) vals_st <- cbind(vals, phenotype = res$phenotype) autoplot(pca_res, data = vals_st , colour = 'phenotype', scale = 0) + scale_color_manual(values = col_phenotype) #' Add age to the PCA plot vals_st <- cbind(vals, age = res$age) autoplot(pca_res, data = vals_st , colour = 'age', scale = 0) #' Define the linear model to be applied to the data p.cut <- 0.05 # cut-off for significance. m.cut <- 0.5 # cut-off for log2 fold change age <- res$age phenotype <- factor(res$phenotype) design <- model.matrix(~ phenotype + age) #' Fit the linear model to the data, explaining metabolite #' concentrations by phenotype and age. fit <- lmFit(log2(assay(res, \"norm_imputed\")), design = design) fit <- eBayes(fit) #' Compile a result data frame tmp <- data.frame( coef.CVD = fit$coefficients[, \"phenotypeCVD\"], pvalue.CVD = fit$p.value[, \"phenotypeCVD\"], adjp.CVD = p.adjust(fit$p.value[, \"phenotypeCVD\"], method = \"BH\"), avg.CVD = rowMeans( log2(assay(res, \"norm_imputed\")[, res$phenotype == \"CVD\"])), avg.CTR = rowMeans( log2(assay(res, \"norm_imputed\")[, res$phenotype == \"CTR\"])) ) tmp$significant.CVD <- tmp$adjp.CVD < 0.05 #' Add the results to the object's rowData rowData(res) <- cbind(rowData(res), tmp) #' Plot the distribution of p-values par(mfrow = c(1, 2)) hist(rowData(res)$pvalue.CVD, breaks = 64, xlab = \"p value\", main = \"Distribution of raw p-values\", cex.main = 1, cex.lab = 1, cex.axis = 1) hist(rowData(res)$adjp.CVD, breaks = 64, xlab = expression(p[BH]~value), main = \"Distribution of adjusted p-values\", cex.main = 1, cex.lab = 1, cex.axis = 1) #' Plot volcano plot of the statistical results par(mfrow = c(1, 1), mar = 
c(5, 5, 5, 1)) plot(rowData(res)$coef.CVD, -log10(rowData(res)$adjp.CVD), xlab = expression(log[2]~difference), ylab = expression(-log[10]~p[BH]), pch = 16, col = \"#00000060\", cex.main = 1.5, cex.lab = 1.5, cex.axis = 1.3) grid() abline(h = -log10(0.05), col = \"#0000ffcc\") if (any(rowData(res)$significant.CVD)) { points(rowData(res)$coef.CVD[rowData(res)$significant.CVD], -log10(rowData(res)$adjp.CVD[rowData(res)$significant.CVD]), col = \"#0000ffcc\") } # Table of significant features tab <- rowData(res)[rowData(res)$significant.CVD, c(\"mzmed\", \"rtmed\", \"coef.CVD\", \"adjp.CVD\", \"avg.CTR\", \"avg.CVD\", \"qc_cv\")] |> as.data.frame() tab <- tab[order(abs(tab$coef.CVD), decreasing = TRUE), ] kable(tab, format = \"pipe\") #' Restrict the raw data to study samples. lcms1_study <- lcms1[sampleData(lcms1)$phenotype != \"QC\", keepFeatures = TRUE] #' Extract EICs for the significant features eic_sign <- featureChromatograms( lcms1_study, features = rownames(tab), expandRt = 5, filled = TRUE) #' Plot the EICs. plot(eic_sign, col = col_sample, peakBg = paste0(col_sample[chromPeaks(eic_sign)[, \"sample\"]], 40)) legend(\"topright\", col = col_phenotype, legend = names(col_phenotype), lty = 1)"},{"path":"https://rformassspectrometry.github.io/metabonaut/articles/a-end-to-end-untargeted-metabolomics.html","id":"annotation","dir":"Articles","previous_headings":"","what":"Annotation","title":"Complete end-to-end LC-MS/MS Metabolomic Data analysis","text":"now identified features significant differences abundances two study groups. provide information metabolic pathways differentiate affected healthy individuals might hence also serve biomarkers. However, stage analysis know compounds/metabolites actually represent. thus need now annotate signals. Annotation can performed different level confidence [@sumner_proposed_2007,@schymanski_identifying_2014]. lowest level annotation, highest rate false positive hits, bases features m/z ratios. 
Higher levels annotations employ fragment spectra (MS2 spectra) ions interest requiring however acquisition additional data. section, demonstrate multiple ways annotate significant features using functionality provided Bioconductor packages. Alternative approaches external software tools, may better suited, also discussed later section. data set acquired using LC-MS setup features thus characterized m/z retention times. retention time LC-setup-specific , without prior data/knowledge provide little information features’ identity. Modern MS instruments high accuracy m/z values therefore reliable estimates compound ion’s mass--charge ratio. first approach, use features’ m/z values match reference values, .e., exact masses chemical compounds provided reference database, case MassBank database. full MassBank data re-distributed Bioconductor’s AnnotationHub resource, simplifies integration reproducible R-based analysis workflows. load resource, list available MassBank data sets/releases load one . MassBank data provided self-contained SQLite database data can queried accessed CompoundDb Bioconductor package. use compounds() function extract small compound annotations database. MassBank (small compound annotation databases) provides (exact) molecular mass compound. Since almost small compounds neutral natural state, need first converted m/z values allow matching feature’s m/z. calculate m/z neutral mass, need assume ion (adduct) might generated measured metabolites employed electro-spray ionization. positive polarity, human serum samples, common ions protonated ([M+H]+), bear addition sodium ([M+Na]+) ammonium ([M+H-NH3]+) ions. match observed m/z values reference values potential ions use matchValues() function Mass2MzParam approach, allows specify types expected ions adducts parameter maximal allowed difference compared values using tolerance ppm parameters. 
first prepare data.frame significant features, set parameters matching perform comparison query features reference database. resulting Matched object shows 4 6 significant features matched ions compounds MassBank database. extract full result Matched object. Thus, total 237 ions compounds MassBank matched significant features based specified tolerance settings. Many compounds, different structure thus function/chemical property, identical chemical formula thus mass. Matching exclusively m/z features hence result many potentially false positive hits thus considered provide low confidence annotation. additional complication annotation resources, like MassBank, community maintained, contain large amount redundant information. reduce redundancy result table iterate hits feature keep matches unique compounds (identified INCHIKEY). INCHI INCHIKEY combine information compound’s chemical formula structure, different compounds can share chemical formula, different structure thus INCHI. Table 9. MS1 annotation results. table shows results MS1-based annotation process. can see four significant features matched. matches seem pretty accurate low ppm errors. deduplication performed considerably reduced number hits feature, first still matches ions large number compounds (chemical formula). Considering features’ m/z retention times MS1-based annotation increase annotation confidence, requires additional data, recording retention time pure standard compound LC setup. alternative approach might provide better insight annotations help choose different annotations feature evaluate certain chemical properties possible matches. instance, LogP value, available several databases HMDB, provides insight given compound’s polarity. property highly affects interaction analyte column, usually also directly affects separation. Therefore, comparison analyte’s retention time polarity can help rule possible misidentifications. 
low confidence, MS1-based annotation can provide first candidate annotations confirmed rejected additional analyses. MS1 annotation fast efficient method annotate features therefore give first insight compounds significantly different two study groups. However, always accurate. MS2 data can provide higher level confidence annotation process provides, observed fragmentation pattern, information structure compound. MS2 data can generated LC-MS/MS measurement MS2 spectra recorded ions either data dependent acquisition (DDA) data independent acquisition (DIA) mode. Generally, advised include LC-MS/MS runs QC samples randomly selected study samples already acquisition MS1 data used quantification signals. alternative, addition, post-hoc LC-MS/MS acquisition can performed generate MS2 data needed annotation. present experiment, separate LC-MS/MS measurement conducted QC samples selected study samples generate data using inclusion list pre-selected ions. represent features found significantly different CVD CTR samples initial analysis full experiment. use subset second LC-MS/MS data set show data can used MS2-based annotation. differential abundance analysis found features significantly higher abundances CTR samples. Consequently, utilize MS2 data obtained CTR samples annotate significant features. load LC-MS/MS data experiment restrict data acquired CTR sample. Table 10. Samples LC-MS/MS data set. total 3 LC-MS/MS data files control samples, different collision energy fragment ions. show number MS1 MS2 spectra files. Compared number MS2 spectra, far less MS1 spectra acquired. configuration MS instrument set ensure ions specified inclusion list selected fragmentation, even intensity might low. setting, however, recorded MS2 spectra represent noise. plot shows location precursor ions m/z - retention time plane three files. can see MS2 spectra recorded m/z interest along full retention time range, even actual ions eluting within certain retention time windows. 
next extract Spectra object MS data data object assign new spectra variable employed collision energy, extract data object sampleData. next filter MS data first restricting MS2 spectra removing mass peaks spectrum intensity lower 5% highest intensity spectrum, assuming low intensity peaks represent background signal. next remove also mass peaks m/z value greater equal precursor m/z ion. puts, later matching reference spectra, weight fragmentation pattern ions avoids hits based precursor m/z peak (hence similar mass compared compounds). last, restrict data spectra least two fragment peaks scale intensities sum 1 spectrum. similarity calculations affected scaling, makes visual comparison fragment spectra easier read. Finally, also speed later comparison spectra reference database, load full MS data memory (changing backend MsBackendMemory) apply processing steps performed data far. Keeping MS data memory performance benefits, generally suggested large data sets. evaluate impact present data set print addition size data object changing backend. thus moderate increase memory demand loading MS data memory (also filtered cleaned MS2 data). proceed match experimental MS2 spectra reference fragment spectra, workflow aim annotate features found significant differential abundance analysis. goal thus identify MS2 spectra second (LC-MS/MS) run represent fragments ions features data first (LC-MS) run. approach match MS2 spectra significant features determined earlier based precursor m/z retention time (given acceptable tolerance) feature’s m/z retention time. can easily done using featureArea() function effectively considers actual m/z retention time ranges features’ chromatographic peaks therefore increase chance finding correct match. however also assumes retention times first second run don’t differ much. Alternatively, need align retention times second LC-MS/MS data set first. first extract feature area, .e., m/z retention time ranges, significant features. 
next identify fragment spectra precursor m/z retention times within ranges. use filterRanges() function allows filter Spectra object using multiple ranges simultaneously. apply function separately feature (row matrix) extract MS2 spectra representing fragmentation information presumed feature’s ions. result apply() call list Spectra, element representing result one feature. exception last feature, multiple MS2 spectra identified. next combine list Spectra single Spectra object using concatenateSpectra() function add additional spectra variable containing respective feature identifier. now Spectra object fragment spectra significant features differential expression analysis. next build reference data need process way query spectra. extract fragment spectra MassBank database, restrict positive polarity data (since experiment acquired positive polarity) perform processing fragment spectra MassBank database. Note switch MsBackendMemory backend hence loading full data reference database memory. positive impact performance subsequent spectra matching, however also increase memory demand present analysis. Now Spectra object second run database spectra prepared, can proceed matching process. use matchSpectra() function MetaboAnnotation package CompareSpectraParam define settings matching. following parameters: requirePrecursor = TRUE: Limits spectra similarity calculations fragment spectra similar precursor m/z. tolerance ppm: Defines acceptable difference compared m/z values. relaxed tolerance settings ensure find matches even reference spectra acquired instruments lower accuracy. THRESHFUN: Defines matches report. , keep matches resulting spectra similarity score (calculated normalized dot product [@stein_optimization_1994], default similarity function) larger 0.6. Thus, total 315 query MS2 spectra, 16 matched (least) one reference fragment spectrum. 
restrict results matching spectra extract metadata query target spectra well similarity score (complete list available metadata information can listed colnames() function). Now, query-target pairs spectra similarity higher 0.6. Similar MS1-based annotation also result table contains redundant information: multiple fragment spectra per feature also MassBank contains several fragment spectra compound, measured using differing collision energies MS instruments, different laboratories. thus iterate feature-compound pairs select one highest score. identifier compound, use fragment spectra’s INCHI-key, since compound names MassBank accepted consensus/controlled vocabularies. Table 9.MS2 annotation results. Thus, 5 significant features, one annotated compound based MS2-based approach. many reasons failure find matches features. Although MS2 spectra selected feature, appear represent noise, features, LC-MS/MS run, low MS1 signal recorded, indicating selected sample original compound might (longer) present. Also, reference databases contain predominantly fragment spectra protonated ([M+H]+) ions compounds, features might represent signal types ions result different fragmentation pattern. Finally, fragment spectra compounds interest might also simply present used reference database. Thus, combining information MS1- MS2 based annotation can annotate one feature considerable confidence. feature m/z 195.0879 retention time 32 seconds seems ion caffeine. result somewhat disappointing also clearly shows importance proper experimental planning need control potential confounding factors. present experiment, disease-specific biomarker identified, life-style property individuals suffering disease: coffee consumption probably contraindicated patients CVD group reduce risk heart arrhythmia. plot EIC feature highlighting retention time highest scoring MS2 spectra recorded create mirror plot comparing MS2 spectra reference fragment spectra caffeine. 
plot clearly shows higher signal feature CTR compared CVD samples. QC samples exhibit lower highly consistent signal, suggesting absence strong technical noise biases raw data experiment. vertical line indicates retention time fragment spectrum best match reference spectrum. noted , since fragment spectra measured separate LC-MS/MS experiment, considered indication approximate retention time ions fragmented second experiment. fragment spectrum feature, shown upper panel right plot highly similar reference spectrum caffeine MassBank (shown lower panel). addition matching precursor m/z, two fragments (m/z intensity) present spectra. can also extract additional metadata matching reference spectrum, used collision energy, fragmentation mode, instrument type, instrument well ion (adduct) fragmented. present workflow highlights annotation performed within R using packages Bioconductor project, also excellent external softwares used alternative, SIRIUS [@duhrkop_sirius_2019], mummichog [@li_predicting_2013] GNPS [@nothias_feature-based_2020] among others. use , data need exported format supported . MS2 spectra, data easily exported required MGF file format using MsBackendMgf Bioconductor package. Integration xcms feature-based molecular networking GNPS described GNPS documentation. alternative, addition, evidence potential matching chemical formula feature derived evaluating isotope pattern full MS1 scan. provide information isotope composition. 
Also, various functions isotopologues() MetaboCoreUtils package functionality enviPat R package [@loos_accelerated_2015] used.","code":"#' load reference data ah <- AnnotationHub() #' List available MassBank data sets query(ah, \"MassBank\") AnnotationHub with 6 records # snapshotDate(): 2024-10-14 # $dataprovider: MassBank # $species: NA # $rdataclass: CompDb # additional mcols(): taxonomyid, genome, description, # coordinate_1_based, maintainer, rdatadateadded, preparerclass, tags, # rdatapath, sourceurl, sourcetype # retrieve records with, e.g., 'object[[\"AH107048\"]]' title AH107048 | MassBank CompDb for release 2021.03 AH107049 | MassBank CompDb for release 2022.06 AH111334 | MassBank CompDb for release 2022.12.1 AH116164 | MassBank CompDb for release 2023.06 AH116165 | MassBank CompDb for release 2023.09 AH116166 | MassBank CompDb for release 2023.11 #' Load one MassBank release mb <- ah[[\"AH116166\"]] downloading 1 resources retrieving 1 resource loading from cache #' Extract compound annotations cmps <- compounds(mb, columns = c(\"compound_id\", \"name\", \"formula\", \"exactmass\", \"inchikey\")) head(cmps) compound_id formula exactmass inchikey 1 1 C27H29NO11 543.1741 AOJJSUZBOXZQNB-UHFFFAOYSA-N 2 2 C40H54O4 598.4022 KFNGKYUGHHQDEE-AXWOCEAUSA-N 3 3 C10H24N2O2 204.1838 AEUTYOVWOVBAKS-UWVGGRQHSA-N 4 4 C16H27NO5 313.1889 LMFKRLGHEKVMNT-UJDVCPFMSA-N 5 5 C20H15Cl3N2OS 435.9971 JLGKQTAYUIMGRK-UHFFFAOYSA-N 6 6 C15H14O5 274.0841 BWNCKEBBYADFPQ-UHFFFAOYSA-N name 1 Epirubicin 2 Crassostreaxanthin A 3 Ethambutol 4 Heliotrine 5 Sertaconazole 6 (R)Semivioxanthin #' Prepare query data frame rowData(res)$feature_id <- rownames(rowData(res)) res_sig <- res[rowData(res)$significant.CVD, ] #' Setup parameters for the matching param <- Mass2MzParam(adducts = c(\"[M+H]+\", \"[M+Na]+\", \"[M+H-NH3]+\"), tolerance = 0, ppm = 5) #' Perform the matching. 
mtch <- matchValues(res_sig, cmps, param = param, mzColname = \"mzmed\") mtch Object of class Matched Total number of matches: 237 Number of query objects: 5 (4 matched) Number of target objects: 117732 (237 matched) #' Extracting the results mtch_res <- matchedData(mtch, c(\"feature_id\", \"mzmed\", \"rtmed\", \"adduct\", \"ppm_error\", \"target_formula\", \"target_name\", \"target_inchikey\")) mtch_res DataFrame with 238 rows and 8 columns feature_id mzmed rtmed adduct ppm_error target_formula FT0371 FT0371 138.055 148.396 [M+H]+ 2.08055 C7H7NO2 FT0371 FT0371 138.055 148.396 [M+H]+ 1.93568 C7H7NO2 FT0371 FT0371 138.055 148.396 [M+H]+ 1.93568 C7H7NO2 FT0371 FT0371 138.055 148.396 [M+H]+ 1.93568 C7H7NO2 FT0371 FT0371 138.055 148.396 [M+H]+ 2.08055 C7H7NO2 ... ... ... ... ... ... ... FT1171 FT1171 229.13 181.088 [M+Na]+ 3.07708 C12H18N2O FT1171 FT1171 229.13 181.088 [M+Na]+ 3.07708 C12H18N2O FT1171 FT1171 229.13 181.088 [M+Na]+ 3.07708 C12H18N2O FT1171 FT1171 229.13 181.088 [M+Na]+ 3.07708 C12H18N2O FT1171 FT1171 229.13 181.088 [M+Na]+ 3.07708 C12H18N2O target_name target_inchikey FT0371 Benzohydro... VDEUYMSGMP... FT0371 Trigonelli... WWNNZCOKKK... FT0371 Trigonelli... WWNNZCOKKK... FT0371 Trigonelli... WWNNZCOKKK... FT0371 Salicylami... SKZKKFZAGN... ... ... ... FT1171 Isoproturo... PUIYMUZLKQ... FT1171 Isoproturo... PUIYMUZLKQ... FT1171 Isoproturo... PUIYMUZLKQ... FT1171 Isoproturo... PUIYMUZLKQ... FT1171 Isoproturo... PUIYMUZLKQ... 
rownames(mtch_res) <- NULL #' Keep only info on features that matched - create a utility function for that mtch_res <- split(mtch_res, mtch_res$feature_id) |> lapply(function(x) { lapply(split(x, x$target_inchikey), function(z) { z[which.min(z$ppm_error), ] }) }) |> unlist(recursive = FALSE) |> do.call(what = rbind) #' Display the results kable(mtch_res, format = \"pipe\") #' Load from the MetaboLights Database param <- MetaboLightsParam(mtblsId = \"MTBLS8735\", assayName = paste0(\"a_MTBLS8735_LC-MSMS_positive_\", \"hilic_metabolite_profiling.txt\"), filePattern = \".mzML\") lcms2 <- readMsObject(MsExperiment(), param, keepOntology = FALSE, keepProtocol = FALSE, simplify = TRUE) #adjust sampleData colnames(sampleData(lcms2)) <- c(\"sample_name\", \"derived_spectra_data_file\", \"metabolite_asssignment_file\", \"source_name\", \"organism\", \"blood_sample_type\", \"sample_type\", \"age\", \"unit\", \"phenotype\") # filter samples to keep MSMS data from CTR samples: sampleData(lcms2) <- sampleData(lcms2)[sampleData(lcms2)$phenotype == \"CTR\", ] sampleData(lcms2) <- sampleData(lcms2)[grepl(\"MSMS\", sampleData(lcms2)$derived_spectra_data_file), ] # Add fragmentation data information (from filenames) sampleData(lcms2)$fragmentation_mode <- c(\"CE20\", \"CE30\", \"CES\") #let's look at the updated sample data sampleData(lcms2)[, c(\"derived_spectra_data_file\", \"phenotype\", \"sample_name\", \"age\")] |> kable(format = \"pipe\") #' Filter the data to the same RT range as the LC-MS run lcms2 <- filterRt(lcms2, c(10, 240)) Filter spectra #' check the number of spectra per ms level spectra(lcms2) |> msLevel() |> split(spectraSampleIndex(lcms2)) |> lapply(table) |> do.call(what = cbind) 1 2 3 4 5 6 7 8 9 10 11 12 1 825 186 186 186 825 186 186 186 825 185 186 185 2 825 3121 3118 3124 825 3123 3118 3120 825 3117 3117 3116 plotPrecursorIons(lcms2) ms2_ctr <- spectra(lcms2) ms2_ctr$collision_energy <- sampleData(lcms2)$fragmentation_mode[spectraSampleIndex(lcms2)] #' Remove 
low intensity peaks low_int <- function(x, ...) { x > max(x, na.rm = TRUE) * 0.05 } ms2_ctr <- filterMsLevel(ms2_ctr, 2L) |> filterIntensity(intensity = low_int) #' Remove precursor peaks and restrict to spectra with a minimum #' number of peaks ms2_ctr <- filterPrecursorPeaks(ms2_ctr, ppm = 50, mz = \">=\") ms2_ctr <- ms2_ctr[lengths(ms2_ctr) > 1] |> scalePeaks() #' Size of the object before loading into memory print(object.size(ms2_ctr), units = \"MB\") 5.1 Mb #' Load the MS data subset into memory ms2_ctr <- setBackend(ms2_ctr, MsBackendMemory()) ms2_ctr <- applyProcessing(ms2_ctr) #' Size of the object after loading into memory print(object.size(ms2_ctr), units = \"MB\") 18.2 Mb #' Define the m/z and retention time ranges for the significant features target <- featureArea(lcms1)[rownames(res_sig), ] target mzmin mzmax rtmin rtmax FT0371 138.0544 138.0552 146.32270 152.86115 FT0565 161.0391 161.0407 159.00234 164.30799 FT0732 182.0726 182.0756 32.71242 42.28755 FT0845 195.0799 195.0887 30.73235 35.67337 FT1171 229.1282 229.1335 178.01450 183.35303 #' Identify for each feature MS2 spectra with their precursor m/z and #' retention time within the feature's m/z and retention time range ms2_ctr_fts <- apply(target[, c(\"rtmin\", \"rtmax\", \"mzmin\", \"mzmax\")], MARGIN = 1, FUN = filterRanges, object = ms2_ctr, spectraVariables = c(\"rtime\", \"precursorMz\")) lengths(ms2_ctr_fts) FT0371 FT0565 FT0732 FT0845 FT1171 38 36 135 68 38 l <- lengths(ms2_ctr_fts) #' Combine the individual Spectra objects ms2_ctr_fts <- concatenateSpectra(ms2_ctr_fts) #' Assign the feature identifier to each MS2 spectrum ms2_ctr_fts$feature_id <- rep(rownames(res_sig), l) ms2_ref <- Spectra(mb) |> filterPolarity(1L) |> filterIntensity(intensity = low_int) |> filterPrecursorPeaks(ppm = 50, mz = \">=\") ms2_ref <- ms2_ref[lengths(ms2_ref) > 1] |> scalePeaks() register(SerialParam()) #' Define the settings for the spectra matching. 
prm <- CompareSpectraParam(ppm = 40, tolerance = 0.05, requirePrecursor = TRUE, THRESHFUN = function(x) which(x >= 0.6)) ms2_mtch <- matchSpectra(ms2_ctr_fts, ms2_ref, param = prm) ms2_mtch Object of class MatchedSpectra Total number of matches: 214 Number of query objects: 315 (16 matched) Number of target objects: 69561 (21 matched) #' Keep only query spectra with matching reference spectra ms2_mtch <- ms2_mtch[whichQuery(ms2_mtch)] #' Extract the results ms2_mtch_res <- matchedData(ms2_mtch) nrow(ms2_mtch_res) [1] 214 #' - split the result per feature #' - select for each feature the best matching result for each compound #' - combine the result again into a data frame ms2_mtch_res <- ms2_mtch_res |> split(f = paste(ms2_mtch_res$feature_id, ms2_mtch_res$target_inchikey)) |> lapply(function(z) { z[which.max(z$score), ] }) |> do.call(what = rbind) |> as.data.frame() #' List the best matching feature-compound pair pandoc.table(ms2_mtch_res[, c(\"feature_id\", \"target_name\", \"score\", \"target_inchikey\")], style = \"rmarkdown\", caption = \"Table 9.MS2 annotation results.\", split.table = Inf) par(mfrow = c(1, 2)) col_sample <- col_phenotype[sampleData(lcms1)$phenotype] #' Extract and plot EIC for the annotated feature eic <- featureChromatograms(lcms1, features = ms2_mtch_res$feature_id[1]) plot(eic, col = col_sample, peakCol = col_sample[chromPeaks(eic)[, \"sample\"]], peakBg = paste0(col_sample[chromPeaks(eic)[, \"sample\"]], 20)) legend(\"topright\", col = col_phenotype, legend = names(col_phenotype), lty = 1) #' Identify the best matching query-target spectra pair idx <- which.max(ms2_mtch_res$score) #' Indicate the retention time of the MS2 spectrum in the EIC plot abline(v = ms2_mtch_res$rtime[idx]) #' Get the index of the MS2 spectrum in the query object query_idx <- which(query(ms2_mtch)$.original_query_index == ms2_mtch_res$.original_query_index[idx]) query_ms2 <- query(ms2_mtch)[query_idx] #' Get the index of the MS2 spectrum in the target object 
target_idx <- which(target(ms2_mtch)$spectrum_id == ms2_mtch_res$target_spectrum_id[idx]) target_ms2 <- target(ms2_mtch)[target_idx] #' Create a mirror plot comparing the two best matching spectra plotSpectraMirror(query_ms2, target_ms2) legend(\"topleft\", legend = paste0(\"precursor m/z: \", format(precursorMz(query_ms2), 3))) spectraData(target_ms2, c(\"collisionEnergy_text\", \"fragmentation_mode\", \"instrument_type\", \"instrument\", \"adduct\")) |> as.data.frame() collisionEnergy_text fragmentation_mode instrument_type 1 55 (nominal) HCD LC-ESI-ITFT instrument adduct 1 LTQ Orbitrap XL Thermo Scientific [M+H]+"},{"path":"https://rformassspectrometry.github.io/metabonaut/articles/a-end-to-end-untargeted-metabolomics.html","id":"ms1-based-annotation","dir":"Articles","previous_headings":"","what":"MS1-based annotation","title":"Complete end-to-end LC-MS/MS Metabolomic Data analysis","text":"data set acquired using LC-MS setup features thus characterized m/z retention times. retention time LC-setup-specific , without prior data/knowledge provide little information features’ identity. Modern MS instruments high accuracy m/z values therefore reliable estimates compound ion’s mass--charge ratio. first approach, use features’ m/z values match reference values, .e., exact masses chemical compounds provided reference database, case MassBank database. full MassBank data re-distributed Bioconductor’s AnnotationHub resource, simplifies integration reproducible R-based analysis workflows. load resource, list available MassBank data sets/releases load one . MassBank data provided self-contained SQLite database data can queried accessed CompoundDb Bioconductor package. use compounds() function extract small compound annotations database. MassBank (small compound annotation databases) provides (exact) molecular mass compound. Since almost small compounds neutral natural state, need first converted m/z values allow matching feature’s m/z. 
calculate m/z neutral mass, need assume ion (adduct) might generated measured metabolites employed electro-spray ionization. positive polarity, human serum samples, common ions protonated ([M+H]+), bear addition sodium ([M+Na]+) ammonium ([M+H-NH3]+) ions. match observed m/z values reference values potential ions use matchValues() function Mass2MzParam approach, allows specify types expected ions adducts parameter maximal allowed difference compared values using tolerance ppm parameters. first prepare data.frame significant features, set parameters matching perform comparison query features reference database. resulting Matched object shows 4 5 significant features matched ions compounds MassBank database. extract full result Matched object. Thus, total 237 ions compounds MassBank matched significant features based specified tolerance settings. Many compounds, different structure thus function/chemical property, identical chemical formula thus mass. Matching exclusively m/z features hence result many potentially false positive hits thus considered provide low confidence annotation. additional complication annotation resources, like MassBank, community maintained, contain large amount redundant information. reduce redundancy result table iterate hits feature keep matches unique compounds (identified INCHIKEY). INCHI INCHIKEY combine information compound’s chemical formula structure, different compounds can share chemical formula, different structure thus INCHI. Table 9. MS1 annotation results. table shows results MS1-based annotation process. can see four significant features matched. matches seem pretty accurate low ppm errors. deduplication performed considerably reduced number hits feature, first still matches ions large number compounds (chemical formula). Considering features’ m/z retention times MS1-based annotation increase annotation confidence, requires additional data, recording retention time pure standard compound LC setup. 
alternative approach might provide better insight annotations help choose different annotations feature evaluate certain chemical properties possible matches. instance, LogP value, available several databases HMDB, provides insight given compound’s polarity. property highly affects interaction analyte column, usually also directly affects separation. Therefore, comparison analyte’s retention time polarity can help rule possible misidentifications. low confidence, MS1-based annotation can provide first candidate annotations confirmed rejected additional analyses.","code":"#' load reference data ah <- AnnotationHub() #' List available MassBank data sets query(ah, \"MassBank\") AnnotationHub with 6 records # snapshotDate(): 2024-10-14 # $dataprovider: MassBank # $species: NA # $rdataclass: CompDb # additional mcols(): taxonomyid, genome, description, # coordinate_1_based, maintainer, rdatadateadded, preparerclass, tags, # rdatapath, sourceurl, sourcetype # retrieve records with, e.g., 'object[[\"AH107048\"]]' title AH107048 | MassBank CompDb for release 2021.03 AH107049 | MassBank CompDb for release 2022.06 AH111334 | MassBank CompDb for release 2022.12.1 AH116164 | MassBank CompDb for release 2023.06 AH116165 | MassBank CompDb for release 2023.09 AH116166 | MassBank CompDb for release 2023.11 #' Load one MassBank release mb <- ah[[\"AH116166\"]] downloading 1 resources retrieving 1 resource loading from cache #' Extract compound annotations cmps <- compounds(mb, columns = c(\"compound_id\", \"name\", \"formula\", \"exactmass\", \"inchikey\")) head(cmps) compound_id formula exactmass inchikey 1 1 C27H29NO11 543.1741 AOJJSUZBOXZQNB-UHFFFAOYSA-N 2 2 C40H54O4 598.4022 KFNGKYUGHHQDEE-AXWOCEAUSA-N 3 3 C10H24N2O2 204.1838 AEUTYOVWOVBAKS-UWVGGRQHSA-N 4 4 C16H27NO5 313.1889 LMFKRLGHEKVMNT-UJDVCPFMSA-N 5 5 C20H15Cl3N2OS 435.9971 JLGKQTAYUIMGRK-UHFFFAOYSA-N 6 6 C15H14O5 274.0841 BWNCKEBBYADFPQ-UHFFFAOYSA-N name 1 Epirubicin 2 Crassostreaxanthin A 3 Ethambutol 4 Heliotrine 5 
Sertaconazole 6 (R)Semivioxanthin #' Prepare query data frame rowData(res)$feature_id <- rownames(rowData(res)) res_sig <- res[rowData(res)$significant.CVD, ] #' Setup parameters for the matching param <- Mass2MzParam(adducts = c(\"[M+H]+\", \"[M+Na]+\", \"[M+H-NH3]+\"), tolerance = 0, ppm = 5) #' Perform the matching. mtch <- matchValues(res_sig, cmps, param = param, mzColname = \"mzmed\") mtch Object of class Matched Total number of matches: 237 Number of query objects: 5 (4 matched) Number of target objects: 117732 (237 matched) #' Extracting the results mtch_res <- matchedData(mtch, c(\"feature_id\", \"mzmed\", \"rtmed\", \"adduct\", \"ppm_error\", \"target_formula\", \"target_name\", \"target_inchikey\")) mtch_res DataFrame with 238 rows and 8 columns feature_id mzmed rtmed adduct ppm_error target_formula FT0371 FT0371 138.055 148.396 [M+H]+ 2.08055 C7H7NO2 FT0371 FT0371 138.055 148.396 [M+H]+ 1.93568 C7H7NO2 FT0371 FT0371 138.055 148.396 [M+H]+ 1.93568 C7H7NO2 FT0371 FT0371 138.055 148.396 [M+H]+ 1.93568 C7H7NO2 FT0371 FT0371 138.055 148.396 [M+H]+ 2.08055 C7H7NO2 ... ... ... ... ... ... ... FT1171 FT1171 229.13 181.088 [M+Na]+ 3.07708 C12H18N2O FT1171 FT1171 229.13 181.088 [M+Na]+ 3.07708 C12H18N2O FT1171 FT1171 229.13 181.088 [M+Na]+ 3.07708 C12H18N2O FT1171 FT1171 229.13 181.088 [M+Na]+ 3.07708 C12H18N2O FT1171 FT1171 229.13 181.088 [M+Na]+ 3.07708 C12H18N2O target_name target_inchikey FT0371 Benzohydro... VDEUYMSGMP... FT0371 Trigonelli... WWNNZCOKKK... FT0371 Trigonelli... WWNNZCOKKK... FT0371 Trigonelli... WWNNZCOKKK... FT0371 Salicylami... SKZKKFZAGN... ... ... ... FT1171 Isoproturo... PUIYMUZLKQ... FT1171 Isoproturo... PUIYMUZLKQ... FT1171 Isoproturo... PUIYMUZLKQ... FT1171 Isoproturo... PUIYMUZLKQ... FT1171 Isoproturo... PUIYMUZLKQ... 
rownames(mtch_res) <- NULL #' Keep only info on features that matched - create a utility function for that mtch_res <- split(mtch_res, mtch_res$feature_id) |> lapply(function(x) { lapply(split(x, x$target_inchikey), function(z) { z[which.min(z$ppm_error), ] }) }) |> unlist(recursive = FALSE) |> do.call(what = rbind) #' Display the results kable(mtch_res, format = \"pipe\")"},{"path":"https://rformassspectrometry.github.io/metabonaut/articles/a-end-to-end-untargeted-metabolomics.html","id":"ms2-based-annotation","dir":"Articles","previous_headings":"","what":"MS2-based annotation","title":"Complete end-to-end LC-MS/MS Metabolomic Data analysis","text":"MS1 annotation fast efficient method annotate features therefore give first insight compounds significantly different two study groups. However, always accurate. MS2 data can provide higher level confidence annotation process provides, observed fragmentation pattern, information structure compound. MS2 data can generated LC-MS/MS measurement MS2 spectra recorded ions either data dependent acquisition (DDA) data independent acquisition (DIA) mode. Generally, advised include LC-MS/MS runs QC samples randomly selected study samples already acquisition MS1 data used quantification signals. alternative, addition, post-hoc LC-MS/MS acquisition can performed generate MS2 data needed annotation. present experiment, separate LC-MS/MS measurement conducted QC samples selected study samples generate data using inclusion list pre-selected ions. represent features found significantly different CVD CTR samples initial analysis full experiment. use subset second LC-MS/MS data set show data can used MS2-based annotation. differential abundance analysis found features significantly higher abundances CTR samples. Consequently, utilize MS2 data obtained CTR samples annotate significant features. load LC-MS/MS data experiment restrict data acquired CTR sample. Table 10. Samples LC-MS/MS data set. 
total 3 LC-MS/MS data files control samples, different collision energy fragment ions. show number MS1 MS2 spectra files. Compared number MS2 spectra, far less MS1 spectra acquired. configuration MS instrument set ensure ions specified inclusion list selected fragmentation, even intensity might low. setting, however, recorded MS2 spectra represent noise. plot shows location precursor ions m/z - retention time plane three files. can see MS2 spectra recorded m/z interest along full retention time range, even actual ions eluting within certain retention time windows. next extract Spectra object MS data data object assign new spectra variable employed collision energy, extract data object sampleData. next filter MS data first restricting MS2 spectra removing mass peaks spectrum intensity lower 5% highest intensity spectrum, assuming low intensity peaks represent background signal. next remove also mass peaks m/z value greater equal precursor m/z ion. puts, later matching reference spectra, weight fragmentation pattern ions avoids hits based precursor m/z peak (hence similar mass compared compounds). last, restrict data spectra least two fragment peaks scale intensities sum 1 spectrum. similarity calculations affected scaling, makes visual comparison fragment spectra easier read. Finally, also speed later comparison spectra reference database, load full MS data memory (changing backend MsBackendMemory) apply processing steps performed data far. Keeping MS data memory performance benefits, generally suggested large data sets. evaluate impact present data set print addition size data object changing backend. thus moderate increase memory demand loading MS data memory (also filtered cleaned MS2 data). proceed match experimental MS2 spectra reference fragment spectra, workflow aim annotate features found significant differential abundance analysis. goal thus identify MS2 spectra second (LC-MS/MS) run represent fragments ions features data first (LC-MS) run. 
approach match MS2 spectra significant features determined earlier based precursor m/z retention time (given acceptable tolerance) feature’s m/z retention time. can easily done using featureArea() function effectively considers actual m/z retention time ranges features’ chromatographic peaks therefore increase chance finding correct match. however also assumes retention times first second run don’t differ much. Alternatively, need align retention times second LC-MS/MS data set first. first extract feature area, .e., m/z retention time ranges, significant features. next identify fragment spectra precursor m/z retention times within ranges. use filterRanges() function allows filter Spectra object using multiple ranges simultaneously. apply function separately feature (row matrix) extract MS2 spectra representing fragmentation information presumed feature’s ions. result apply() call list Spectra, element representing result one feature. exception last feature, multiple MS2 spectra identified. next combine list Spectra single Spectra object using concatenateSpectra() function add additional spectra variable containing respective feature identifier. now Spectra object fragment spectra significant features differential expression analysis. next build reference data need process way query spectra. extract fragment spectra MassBank database, restrict positive polarity data (since experiment acquired positive polarity) perform processing fragment spectra MassBank database. Note switch MsBackendMemory backend hence loading full data reference database memory. positive impact performance subsequent spectra matching, however also increase memory demand present analysis. Now Spectra object second run database spectra prepared, can proceed matching process. use matchSpectra() function MetaboAnnotation package CompareSpectraParam define settings matching. following parameters: requirePrecursor = TRUE: Limits spectra similarity calculations fragment spectra similar precursor m/z. 
tolerance ppm: Defines acceptable difference compared m/z values. relaxed tolerance settings ensure find matches even reference spectra acquired instruments lower accuracy. THRESHFUN: Defines matches report. , keep matches resulting spectra similarity score (calculated normalized dot product [@stein_optimization_1994], default similarity function) larger 0.6. Thus, total 315 query MS2 spectra, 16 matched (least) one reference fragment spectrum. restrict results matching spectra extract metadata query target spectra well similarity score (complete list available metadata information can listed colnames() function). Now, query-target pairs spectra similarity higher 0.6. Similar MS1-based annotation also result table contains redundant information: multiple fragment spectra per feature also MassBank contains several fragment spectra compound, measured using differing collision energies MS instruments, different laboratories. thus iterate feature-compound pairs select one highest score. identifier compound, use fragment spectra’s INCHI-key, since compound names MassBank accepted consensus/controlled vocabularies. Table 9.MS2 annotation results. Thus, 5 significant features, one annotated compound based MS2-based approach. many reasons failure find matches features. Although MS2 spectra selected feature, appear represent noise, features, LC-MS/MS run, low MS1 signal recorded, indicating selected sample original compound might (longer) present. Also, reference databases contain predominantly fragment spectra protonated ([M+H]+) ions compounds, features might represent signal types ions result different fragmentation pattern. Finally, fragment spectra compounds interest might also simply present used reference database. Thus, combining information MS1- MS2 based annotation can annotate one feature considerable confidence. feature m/z 195.0879 retention time 32 seconds seems ion caffeine. 
result somewhat disappointing also clearly shows importance proper experimental planning need control potential confounding factors. present experiment, disease-specific biomarker identified, life-style property individuals suffering disease: coffee consumption probably contraindicated patients CVD group reduce risk heart arrhythmia. plot EIC feature highlighting retention time highest scoring MS2 spectra recorded create mirror plot comparing MS2 spectra reference fragment spectra caffeine. plot clearly shows higher signal feature CTR compared CVD samples. QC samples exhibit lower highly consistent signal, suggesting absence strong technical noise biases raw data experiment. vertical line indicates retention time fragment spectrum best match reference spectrum. noted , since fragment spectra measured separate LC-MS/MS experiment, considered indication approximate retention time ions fragmented second experiment. fragment spectrum feature, shown upper panel right plot highly similar reference spectrum caffeine MassBank (shown lower panel). addition matching precursor m/z, two fragments (m/z intensity) present spectra. 
can also extract additional metadata matching reference spectrum, used collision energy, fragmentation mode, instrument type, instrument well ion (adduct) fragmented.","code":"#' Load form the MetaboLights Database param <- MetaboLightsParam(mtblsId = \"MTBLS8735\", assayName = paste0(\"a_MTBLS8735_LC-MSMS_positive_\", \"hilic_metabolite_profiling.txt\"), filePattern = \".mzML\") lcms2 <- readMsObject(MsExperiment(), param, keepOntology = FALSE, keepProtocol = FALSE, simplify = TRUE) #adjust sampleData colnames(sampleData(lcms2)) <- c(\"sample_name\", \"derived_spectra_data_file\", \"metabolite_asssignment_file\", \"source_name\", \"organism\", \"blood_sample_type\", \"sample_type\", \"age\", \"unit\", \"phenotype\") # filter samples to keep MSMS data from CTR samples: sampleData(lcms2) <- sampleData(lcms2)[sampleData(lcms2)$phenotype == \"CTR\", ] sampleData(lcms2) <- sampleData(lcms2)[grepl(\"MSMS\", sampleData(lcms2)$derived_spectra_data_file), ] # Add fragmentation data information (from filenames) sampleData(lcms2)$fragmentation_mode <- c(\"CE20\", \"CE30\", \"CES\") #let's look at the updated sample data sampleData(lcms2)[, c(\"derived_spectra_data_file\", \"phenotype\", \"sample_name\", \"age\")] |> kable(format = \"pipe\") #' Filter the data to the same RT range as the LC-MS run lcms2 <- filterRt(lcms2, c(10, 240)) Filter spectra #' check the number of spectra per ms level spectra(lcms2) |> msLevel() |> split(spectraSampleIndex(lcms2)) |> lapply(table) |> do.call(what = cbind) 1 2 3 4 5 6 7 8 9 10 11 12 1 825 186 186 186 825 186 186 186 825 185 186 185 2 825 3121 3118 3124 825 3123 3118 3120 825 3117 3117 3116 plotPrecursorIons(lcms2) ms2_ctr <- spectra(lcms2) ms2_ctr$collision_energy <- sampleData(lcms2)$fragmentation_mode[spectraSampleIndex(lcms2)] #' Remove low intensity peaks low_int <- function(x, ...) 
{ x > max(x, na.rm = TRUE) * 0.05 } ms2_ctr <- filterMsLevel(ms2_ctr, 2L) |> filterIntensity(intensity = low_int) #' Remove precursor peaks and restrict to spectra with a minimum #' number of peaks ms2_ctr <- filterPrecursorPeaks(ms2_ctr, ppm = 50, mz = \">=\") ms2_ctr <- ms2_ctr[lengths(ms2_ctr) > 1] |> scalePeaks() #' Size of the object before loading into memory print(object.size(ms2_ctr), units = \"MB\") 5.1 Mb #' Load the MS data subset into memory ms2_ctr <- setBackend(ms2_ctr, MsBackendMemory()) ms2_ctr <- applyProcessing(ms2_ctr) #' Size of the object after loading into memory print(object.size(ms2_ctr), units = \"MB\") 18.2 Mb #' Define the m/z and retention time ranges for the significant features target <- featureArea(lcms1)[rownames(res_sig), ] target mzmin mzmax rtmin rtmax FT0371 138.0544 138.0552 146.32270 152.86115 FT0565 161.0391 161.0407 159.00234 164.30799 FT0732 182.0726 182.0756 32.71242 42.28755 FT0845 195.0799 195.0887 30.73235 35.67337 FT1171 229.1282 229.1335 178.01450 183.35303 #' Identify for each feature MS2 spectra with their precursor m/z and #' retention time within the feature's m/z and retention time range ms2_ctr_fts <- apply(target[, c(\"rtmin\", \"rtmax\", \"mzmin\", \"mzmax\")], MARGIN = 1, FUN = filterRanges, object = ms2_ctr, spectraVariables = c(\"rtime\", \"precursorMz\")) lengths(ms2_ctr_fts) FT0371 FT0565 FT0732 FT0845 FT1171 38 36 135 68 38 l <- lengths(ms2_ctr_fts) #' Combine the individual Spectra objects ms2_ctr_fts <- concatenateSpectra(ms2_ctr_fts) #' Assign the feature identifier to each MS2 spectrum ms2_ctr_fts$feature_id <- rep(rownames(res_sig), l) ms2_ref <- Spectra(mb) |> filterPolarity(1L) |> filterIntensity(intensity = low_int) |> filterPrecursorPeaks(ppm = 50, mz = \">=\") ms2_ref <- ms2_ref[lengths(ms2_ref) > 1] |> scalePeaks() register(SerialParam()) #' Define the settings for the spectra matching. 
prm <- CompareSpectraParam(ppm = 40, tolerance = 0.05, requirePrecursor = TRUE, THRESHFUN = function(x) which(x >= 0.6)) ms2_mtch <- matchSpectra(ms2_ctr_fts, ms2_ref, param = prm) ms2_mtch Object of class MatchedSpectra Total number of matches: 214 Number of query objects: 315 (16 matched) Number of target objects: 69561 (21 matched) #' Keep only query spectra with matching reference spectra ms2_mtch <- ms2_mtch[whichQuery(ms2_mtch)] #' Extract the results ms2_mtch_res <- matchedData(ms2_mtch) nrow(ms2_mtch_res) [1] 214 #' - split the result per feature #' - select for each feature the best matching result for each compound #' - combine the result again into a data frame ms2_mtch_res <- ms2_mtch_res |> split(f = paste(ms2_mtch_res$feature_id, ms2_mtch_res$target_inchikey)) |> lapply(function(z) { z[which.max(z$score), ] }) |> do.call(what = rbind) |> as.data.frame() #' List the best matching feature-compound pair pandoc.table(ms2_mtch_res[, c(\"feature_id\", \"target_name\", \"score\", \"target_inchikey\")], style = \"rmarkdown\", caption = \"Table 9.MS2 annotation results.\", split.table = Inf) par(mfrow = c(1, 2)) col_sample <- col_phenotype[sampleData(lcms1)$phenotype] #' Extract and plot EIC for the annotated feature eic <- featureChromatograms(lcms1, features = ms2_mtch_res$feature_id[1]) plot(eic, col = col_sample, peakCol = col_sample[chromPeaks(eic)[, \"sample\"]], peakBg = paste0(col_sample[chromPeaks(eic)[, \"sample\"]], 20)) legend(\"topright\", col = col_phenotype, legend = names(col_phenotype), lty = 1) #' Identify the best matching query-target spectra pair idx <- which.max(ms2_mtch_res$score) #' Indicate the retention time of the MS2 spectrum in the EIC plot abline(v = ms2_mtch_res$rtime[idx]) #' Get the index of the MS2 spectrum in the query object query_idx <- which(query(ms2_mtch)$.original_query_index == ms2_mtch_res$.original_query_index[idx]) query_ms2 <- query(ms2_mtch)[query_idx] #' Get the index of the MS2 spectrum in the target object 
target_idx <- which(target(ms2_mtch)$spectrum_id == ms2_mtch_res$target_spectrum_id[idx]) target_ms2 <- target(ms2_mtch)[target_idx] #' Create a mirror plot comparing the two best matching spectra plotSpectraMirror(query_ms2, target_ms2) legend(\"topleft\", legend = paste0(\"precursor m/z: \", format(precursorMz(query_ms2), 3))) spectraData(target_ms2, c(\"collisionEnergy_text\", \"fragmentation_mode\", \"instrument_type\", \"instrument\", \"adduct\")) |> as.data.frame() collisionEnergy_text fragmentation_mode instrument_type 1 55 (nominal) HCD LC-ESI-ITFT instrument adduct 1 LTQ Orbitrap XL Thermo Scientific [M+H]+"},{"path":"https://rformassspectrometry.github.io/metabonaut/articles/a-end-to-end-untargeted-metabolomics.html","id":"external-tools-or-alternative-annotation-approaches","dir":"Articles","previous_headings":"","what":"External tools or alternative annotation approaches","title":"Complete end-to-end LC-MS/MS Metabolomic Data analysis","text":"present workflow highlights annotation performed within R using packages Bioconductor project, also excellent external softwares used alternative, SIRIUS [@duhrkop_sirius_2019], mummichog [@li_predicting_2013] GNPS [@nothias_feature-based_2020] among others. use , data need exported format supported . MS2 spectra, data easily exported required MGF file format using MsBackendMgf Bioconductor package. Integration xcms feature-based molecular networking GNPS described GNPS documentation. alternative, addition, evidence potential matching chemical formula feature derived evaluating isotope pattern full MS1 scan. provide information isotope composition. 
Also , various functions isotopologues() MetaboCoreUtils package functionality envipat R package [@loos_accelerated_2015] used.","code":""},{"path":"https://rformassspectrometry.github.io/metabonaut/articles/a-end-to-end-untargeted-metabolomics.html","id":"summary","dir":"Articles","previous_headings":"","what":"Summary","title":"Complete end-to-end LC-MS/MS Metabolomic Data analysis","text":"tutorial, describe end--end workflow LC-MS-based untargeted metabolomics experiments, conducted entirely within R using packages Bioconductor project base R functionality. excellent software exists perform similar analyses, power R-based workflow lies adaptability individual data sets research questions ability build reproducible workflows documentation. Due space restrictions don’t provide comprehensive listing methodologies individual analysis steps. advanced options approaches available, e.g., normalization data, however also heavily dependent size properties analyzed data set, well annotation features. result, found present analysis set features significant abundance differences compared groups. however reliably annotate single feature, related lifestyle individuals rather pathological properties investigated disease. 
low proportion annotated signals however uncommon untargeted metabolomics experiments reflects need comprehensive reliable reference annotation libraries.","code":""},{"path":"https://rformassspectrometry.github.io/metabonaut/articles/a-end-to-end-untargeted-metabolomics.html","id":"session-information","dir":"Articles","previous_headings":"","what":"Session information","title":"Complete end-to-end LC-MS/MS Metabolomic Data analysis","text":"","code":"sessionInfo() R version 4.4.1 (2024-06-14) Platform: x86_64-pc-linux-gnu Running under: Ubuntu 22.04.5 LTS Matrix products: default BLAS: /usr/lib/x86_64-linux-gnu/openblas-pthread/libblas.so.3 LAPACK: /usr/lib/x86_64-linux-gnu/openblas-pthread/libopenblasp-r0.3.20.so; LAPACK version 3.10.0 locale: [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C [3] LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8 [5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8 [7] LC_PAPER=en_US.UTF-8 LC_NAME=C [9] LC_ADDRESS=C LC_TELEPHONE=C [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C time zone: Etc/UTC tzcode source: system (glibc) attached base packages: [1] stats4 stats graphics grDevices utils datasets methods [8] base other attached packages: [1] MetaboAnnotation_1.9.2 CompoundDb_1.9.5 [3] AnnotationFilter_1.29.0 AnnotationHub_3.13.3 [5] BiocFileCache_2.13.2 dbplyr_2.5.0 [7] gridExtra_2.3 ggfortify_0.4.17 [9] ggplot2_3.5.1 vioplot_0.5.0 [11] zoo_1.8-12 sm_2.2-6.0 [13] pheatmap_1.0.12 RColorBrewer_1.1-3 [15] pander_0.6.5 limma_3.61.12 [17] MetaboCoreUtils_1.13.0 xcms_4.3.3 [19] SummarizedExperiment_1.35.4 Biobase_2.65.1 [21] GenomicRanges_1.57.2 GenomeInfoDb_1.41.2 [23] IRanges_2.39.2 MatrixGenerics_1.17.0 [25] matrixStats_1.4.1 MsBackendMetaboLights_0.99.1 [27] Spectra_1.15.12 BiocParallel_1.39.0 [29] S4Vectors_0.43.2 BiocGenerics_0.51.3 [31] MsIO_0.0.6 MsExperiment_1.7.0 [33] ProtGenerics_1.37.1 readxl_1.4.3 [35] BiocStyle_2.33.1 quarto_1.4.4 [37] knitr_1.48 loaded via a namespace (and not attached): [1] later_1.3.2 bitops_1.0-9 [3] 
filelock_1.0.3 tibble_3.2.1 [5] cellranger_1.1.0 preprocessCore_1.67.1 [7] XML_3.99-0.17 lifecycle_1.0.4 [9] doParallel_1.0.17 processx_3.8.4 [11] lattice_0.22-6 MASS_7.3-61 [13] alabaster.base_1.5.10 MultiAssayExperiment_1.31.5 [15] magrittr_2.0.3 rmarkdown_2.28 [17] yaml_2.3.10 MsCoreUtils_1.17.2 [19] DBI_1.2.3 abind_1.4-8 [21] zlibbioc_1.51.2 purrr_1.0.2 [23] RCurl_1.98-1.16 rappdirs_0.3.3 [25] GenomeInfoDbData_1.2.13 MSnbase_2.31.1 [27] ncdf4_1.23 codetools_0.2-20 [29] DelayedArray_0.31.14 DT_0.33 [31] xml2_1.3.6 tidyselect_1.2.1 [33] UCSC.utils_1.1.0 farver_2.1.2 [35] base64enc_0.1-3 jsonlite_1.8.9 [37] iterators_1.0.14 foreach_1.5.2 [39] tools_4.4.1 progress_1.2.3 [41] Rcpp_1.0.13 glue_1.8.0 [43] SparseArray_1.5.45 xfun_0.48 [45] dplyr_1.1.4 withr_3.0.1 [47] BiocManager_1.30.25 fastmap_1.2.0 [49] rhdf5filters_1.17.0 fansi_1.0.6 [51] digest_0.6.37 R6_2.5.1 [53] mime_0.12 colorspace_2.1-1 [55] rsvg_2.6.1 RSQLite_2.3.7 [57] utf8_1.2.4 tidyr_1.3.1 [59] generics_0.1.3 prettyunits_1.2.0 [61] PSMatch_1.9.0 httr_1.4.7 [63] htmlwidgets_1.6.4 S4Arrays_1.5.11 [65] pkgconfig_2.0.3 gtable_0.3.5 [67] blob_1.2.4 impute_1.79.0 [69] MassSpecWavelet_1.71.0 XVector_0.45.0 [71] htmltools_0.5.8.1 MALDIquant_1.22.3 [73] clue_0.3-65 scales_1.3.0 [75] png_0.1-8 rstudioapi_0.17.0 [77] reshape2_1.4.4 rjson_0.2.23 [79] curl_5.2.3 cachem_1.1.0 [81] rhdf5_2.49.0 stringr_1.5.1 [83] BiocVersion_3.20.0 parallel_4.4.1 [85] AnnotationDbi_1.67.0 mzID_1.43.0 [87] vsn_3.73.0 pillar_1.9.0 [89] grid_4.4.1 alabaster.schemas_1.5.0 [91] vctrs_0.6.5 MsFeatures_1.13.0 [93] pcaMethods_1.97.0 cluster_2.1.6 [95] evaluate_1.0.1 cli_3.6.3 [97] compiler_4.4.1 rlang_1.1.4 [99] crayon_1.5.3 labeling_0.4.3 [101] QFeatures_1.15.3 ChemmineR_3.57.1 [103] ps_1.8.0 affy_1.83.1 [105] plyr_1.8.9 fs_1.6.4 [107] stringi_1.8.4 munsell_0.5.1 [109] Biostrings_2.73.2 lazyeval_0.2.2 [111] Matrix_1.7-1 hms_1.1.3 [113] bit64_4.5.2 Rhdf5lib_1.27.0 [115] KEGGREST_1.45.1 statmod_1.5.0 [117] mzR_2.39.2 igraph_2.1.1 [119] 
memoise_2.0.1 affyio_1.75.1 [121] bit_4.5.0"},{"path":[]},{"path":"https://rformassspectrometry.github.io/metabonaut/articles/a-end-to-end-untargeted-metabolomics.html","id":"appendix","dir":"Articles","previous_headings":"","what":"Appendix","title":"Complete end-to-end LC-MS/MS Metabolomic Data analysis","text":"Thanks Steffen Neumann continuous work develop maintain xcms software. … align data set using internal standards. suggested eventually enrich anchor peaks signal ions retention time regions covered internal standards.","code":""},{"path":"https://rformassspectrometry.github.io/metabonaut/articles/a-end-to-end-untargeted-metabolomics.html","id":"aknowledgment","dir":"Articles","previous_headings":"","what":"Aknowledgment","title":"Complete end-to-end LC-MS/MS Metabolomic Data analysis","text":"Thanks Steffen Neumann continuous work develop maintain xcms software. …","code":""},{"path":"https://rformassspectrometry.github.io/metabonaut/articles/a-end-to-end-untargeted-metabolomics.html","id":"alignment-using-manually-selected-anchor-peaks","dir":"Articles","previous_headings":"","what":"Alignment using manually selected anchor peaks","title":"Complete end-to-end LC-MS/MS Metabolomic Data analysis","text":"align data set using internal standards. 
suggested eventually enrich anchor peaks signal ions retention time regions covered internal standards.","code":""},{"path":"https://rformassspectrometry.github.io/metabonaut/articles/a-end-to-end-untargeted-metabolomics.html","id":"additional-informations","dir":"Articles","previous_headings":"","what":"Additional informations","title":"Complete end-to-end LC-MS/MS Metabolomic Data analysis","text":"","code":"#possible extra info: # -"},{"path":"https://rformassspectrometry.github.io/metabonaut/articles/alignment-to-external-dataset.html","id":"introduction","dir":"Articles","previous_headings":"","what":"Introduction","title":"Seamless Alignment: Merging New Data with and Existing Preprocessed Dataset","text":"certain experiments, aligning datasets recorded different times necessary. can involve comparing runs samples different laboratories matching MS2 data initial MS1 run. Variation retention time across laboratories LC systems often requires alignment step using adjustRtime() LamaParama parameter. described data description vignette, samples run twice: LC-MS mode LC-MS/MS mode. tutorial show align LC-MS/MS run preprocessed LC-MS dataset. following packages needed: Setting parallel processing improve efficiency process: First, let’s load pre-processed LC-MS object, steps get object shown End--end worflow vignette. Next, load unprocessed LC-MS/MS data MetaboLights database: adjust sampleData() LC-MS/MS object make easier access: Table 10. Samples LC-MS/MS data set. keep MS runs (MS/MS) remove pooled samples, focusing samples E common runs. alignment, ensure retention time (RT) ranges match datasets: need adjust RT range LC-MS/MS object match LC-MS data: evaluate retention time shifts, ’ll plot base peak chromatogram (BPC): Compare run1 sample run2 sample Similarly, compare BPC sample E: Perform peak detection refining alignment, detailed end--end vignette. setting applied. Now, attempt align two samples previous dataset. 
first step extract landmark features (referred lamas). achieve , identify features present every phenotype group lcms1 dataset. , categorize (using factor()) data phenotype retain QC samples. variable utilized filter features using PercentMissingFilter parameter within filterFeatures() function. , setting threshold = 0 select features present QC samples. lamas input look like alignment. terms method works, alignment algorithm matches chromatographic peaks experimental data lamas, fitting model based match adjust retention times minimize differences two datasets. Now can define param object LamaParama prepare alignment. Parameters tolerance, toleranceRt, ppm relate matching chromatographic peaks lamas. parameters related type fitting generated data points. details parameter overall method can found searching ?adjustRtime. example using default parameters. matchLamaChromPeaks() function facilitates assessment well lamas correspond chromatographic peaks file. extract matched results using matchedRtimes() function. used later evaluate alignment. Now can adjust retention time LC-MS/MS dataset using adjustRtime() function. extract base peak chromatogram (BPC) aligned object: evaluate performance alignment process, generate plots comparing alignment reference dataset (black) LC-MS data (red) (blue) adjustment. Although overall matching imperfect due initial sample issues, certain regions show significant improvement. alignment signal’s start particularly well done. Specifically, regions right 150 seconds show substantial improvement. visualization distribution chromatographic peaks matched anchor peaks (Lamas) Sample . red vertical lines represent positions matched peaks. quantitatively assess quality alignment, compute distance chromatographic peaks LC-MS data anchor peaks (Lamas) alignment. library(vioplot) Furthermore, detailed examination matching model used fitting file possible. Numerical information can obtained using summarizeLamaMatch() function. 
, percentage chromatographic peaks utilized alignment can computed relative total number peaks file. Additionally, feasible directly plot() param object file interest, showcasing distribution chromatographic peaks along fitted model line. tutorial demonstrated align LC-MS LC-MS/MS datasets correct retention time shifts, crucial handling data different runs platforms. preprocessed data, detected chromatographic peaks, used landmark features (lamas) QC samples adjust retention times via adjustRtime() function. Visual comparisons base peak chromatograms alignment, along distance calculations, showed clear improvements RT synchronization. Ultimately, aligning chromatographic data ensures subsequent analyses, feature extraction statistical comparisons, based consistent time points, improving data quality reliability. tutorial outlined end--end workflow can adapted various LC-MS-based metabolomics studies, helping researchers manage retention time variation effectively.","code":"library(MsIO) library(MsBackendMetaboLights) library(xcms) library(MsExperiment) library(Spectra) library(vioplot) #' Set up parallel processing using 2 cores if (.Platform$OS.type == \"unix\") { register(MulticoreParam(2)) } else { register(SnowParam(2)) } load(\"/shared/data/preprocessed_lcms1.RData\") #' Load form the MetaboLights Database param <- MetaboLightsParam(mtblsId = \"MTBLS8735\", assayName = paste0(\"a_MTBLS8735_LC-MSMS_positive_\", \"hilic_metabolite_profiling.txt\"), filePattern = \".mzML\") lcms2 <- readMsObject(MsExperiment(), param, keepOntology = FALSE, keepProtocol = FALSE, simplify = TRUE) #adjust sampleData colnames(sampleData(lcms2)) <- c(\"sample_name\", \"derived_spectra_data_file\", \"metabolite_asssignment_file\", \"source_name\", \"organism\", \"blood_sample_type\", \"sample_type\", \"age\", \"unit\", \"phenotype\") #let's look at the updated sample data sampleData(lcms2)[, c(\"derived_spectra_data_file\", \"phenotype\", \"sample_name\", \"age\")] |> kable(format = 
\"pipe\") # Only keep MS run lcms2 <- lcms2[!grepl(\"MSMS\", sampleData(lcms2)$derived_spectra_data_file),] range(rtime(lcms1)) [1] 9.674428 240.115311 range(rtime(lcms2)) [1] 0.275 480.176 #' Filter the data to the same RT range as the LC-MS run lcms2 <- filterRt(lcms2, range(rtime(lcms1))) idx_A <- which(sampleData(lcms1)$sample_name == \"A\") idx_E <- which(sampleData(lcms1)$sample_name == \"E\") bpc1 <-chromatogram(lcms1[c(idx_A,idx_E)], aggregationFun = \"max\", msLevel = 1) Processing chromatographic peaks bpc2 <- chromatogram(lcms2, aggregationFun = \"max\", msLevel = 1) plot(bpc1[1,1], col = \"#00000080\", main = \"BPC sample A LC-MS vs A LC-MS/MS\", lwd = 1.5, peakType = \"none\") grid() points(rtime(bpc2[1, 1]), intensity(bpc2[1, 1]), col = \"#0000ff80\", type = \"l\") legend(\"topleft\", col = c(\"#00000080\", \"#0000ff80\"), legend = c(\"LC-MS\", \"LC-MS/MS\"), lty = 1, lwd = 2, horiz = TRUE, bty = \"n\") plot(bpc1[1, 2], col = \"#00000080\", main = \"BPC sample E LC-MS vs E LC-MS/MS\", lwd = 1.5, peakType = \"none\") grid() points(rtime(bpc2[1, 2]), intensity(bpc2[1, 2]), col = \"#0000ff80\", type = \"l\") legend(\"topleft\", col = c(\"#00000080\", \"#0000ff80\"), legend = c(\"LC-MS\", \"LC-MS/MS\"), lty = 1, lwd = 2, horiz = TRUE, bty = \"n\") param <- CentWaveParam(peakwidth = c(1, 8), ppm = 15, integrate = 2) lcms2 <- findChromPeaks(lcms2, param = param, chunkSize = 2) param <- MergeNeighboringPeaksParam(expandRt = 2.5, expandMz = 0.0015, minProp = 0.75) lcms2 <- refineChromPeaks(lcms2, param = param, chunkSize = 2) f <- sampleData(lcms1)$phenotype f[f != \"QC\"] <- NA lcms1 <- filterFeatures(lcms1, PercentMissingFilter(threshold = 0, f = f), filled = FALSE) 3694 features were removed lcms1_mz_rt <- featureDefinitions(lcms1)[, c(\"mzmed\",\"rtmed\")] head(lcms1_mz_rt) mzmed rtmed FT0001 50.98979 203.6001 FT0002 51.05904 191.1675 FT0003 51.98657 203.1467 FT0004 53.02036 203.2343 FT0005 53.52080 203.1936 FT0007 54.01010 235.9032 nrow(lcms1_mz_rt) [1] 
5374 param <- LamaParama(lamas = lcms1_mz_rt, method = \"loess\", span = 0.5, outlierTolerance = 3, zeroWeight = 10, ppm =20, tolerance = 0, toleranceRt = 20, bs = \"tp\") param <- matchLamasChromPeaks(lcms2, param = param) ref_vs_obs <- matchedRtimes(param) #' input into `adjustRtime()` lcms2 <- adjustRtime(lcms2, param = param) lcms2 <- applyAdjustedRtime(lcms2) #' evaluate the results with BPC bpc2_adj <- chromatogram(lcms2, aggregationFun = \"max\", msLevel = 1) #' BPC of sample A par(mfrow = c(2, 1), mar = c(2.5, 2.5, 2.5, 0.5), mgp = c(1.5, 0.5, 0)) plot(bpc1[1, 1], col = \"#00000080\", main = \"Before Alignment\", lwd = 1.5, peakType = \"none\", xlab = NA) grid() points(rtime(bpc2[1,1]), intensity(bpc2[1,1]), type = \"l\", col = \"#0000ff80\") legend(\"topleft\", col = c(\"#00000080\", \"#0000ff80\"), legend = c(\"LC-MS\", \"LC-MS/MS\"), lty = 1, lwd = 2, horiz = TRUE, bty = \"n\") plot(bpc1[1, 1], col = \"#00000080\", main = \"After Alignment\", lwd = 1.5, peakType = \"none\", xlab = \"rtime (s)\") grid() points(rtime(bpc2_adj[1,1]), intensity(bpc2_adj[1,1]), type = \"l\", col = \"#0000ff80\") #' BPC of sample B par(mfrow = c(2, 1), mar = c(2.5, 2.5, 2.5, 0.5), mgp = c(1.5, 0.5, 0)) plot(bpc1[1, 2], col = \"#00000080\", main = \"Before Alignment\", lwd = 1.5, peakType = \"none\", xlab = NA) grid() points(rtime(bpc2[1, 2]), intensity(bpc2[1, 2]), type = \"l\", col = \"#0000ff80\") legend(\"topleft\", col = c(\"#00000080\", \"#0000ff80\"), legend = c(\"LC-MS\", \"LC-MS/MS\"), lty = 1, lwd = 2, horiz = TRUE, bty = \"n\") plot(bpc1[1, 2], col = \"#00000080\", main = \"After Alignment\", lwd = 1.5, peakType = \"none\", xlab = \"rtime (s)\") grid() points(rtime(bpc2_adj[1, 2]), intensity(bpc2_adj[1, 2]), type = \"l\", col = \"#0000ff80\") #' BPC of the first sample with matches to lamas overlay par(mfrow = c(1, 1)) plot(bpc1[1, 1], col = \"#00000080\", main = \"Distribution CP matched to Lamas\", lwd = 1.5, peakType = \"none\") points(rtime(bpc2_adj[1, 1]), 
intensity(bpc2_adj[1, 1]), type = \"l\", col = \"#0000ff80\") grid() abline(v = ref_vs_obs[[1]]$obs, col = \"#c4114510\") # Extract data for sample 3 directly ref_obs_sample_1 <- ref_vs_obs[[\"1\"]] # Calculate distances before and after alignment dist_before <- abs(ref_obs_sample_1$obs - ref_obs_sample_1$ref) dist_after <- abs(chromPeaks(lcms2)[ref_obs_sample_1$chromPeaksId, \"rt\"] - ref_obs_sample_1$ref) # Create a data frame for plotting distances <- data.frame( Distance = c(dist_before, dist_after), Alignment = rep(c(\"Before\", \"After\"), each = length(dist_before)) ) # Set factor levels for Alignment to ensure correct order distances$Alignment <- factor(distances$Alignment, levels = c(\"Before\", \"After\")) # Plot distances between anchor peaks between the two runs before and after alignment. vioplot(Distance ~ Alignment, data = distances, xlab = \"\", rectCol = \"#c4114580\", lineCol = \"white\", col=\"#17138fe8\", border = \"white\", ylab = \"Distance (s)\", main = \"Distance to Anchor Peaks: Before vs. After Alignment\") #' Access summary of matches and model information summary <- summarizeLamaMatch(param) summary Total_peaks Matched_peaks Total_lamas Model_summary 1 6832 1825 5374 1666, c(.... 2 6860 1785 5374 1617, c(.... 3 7588 2082 5374 1869, c(.... #' Coverage for each file summary$Matched_peaks / summary$Total_peaks * 100 [1] 26.71253 26.02041 27.43806 #' Access the information on the model of for the first file summary$Model_summary[[1]] Call: loess(formula = ref ~ obs, data = rt_map, weights = weights, span = span) Number of Observations: 1666 Equivalent Number of Parameters: 7.38 Residual Standard Error: 2.315 Trace of smoother matrix: 8.13 (exact) Control settings: span : 0.5 degree : 2 family : gaussian surface : interpolate cell = 0.2 normalize: TRUE parametric: FALSE drop.square: FALSE #' Plot obs vs. 
lcms1 with fitting line plot(param, index = 1L, main = \"ChromPeaks versus Lamas for sample A\", colPoint = \"red\") abline(0, 1, lty = 3, col = \"grey\") grid()"},{"path":"https://rformassspectrometry.github.io/metabonaut/articles/alignment-to-external-dataset.html","id":"load-preprocessed-lc-ms-object","dir":"Articles","previous_headings":"","what":"Load preprocessed LC-MS object","title":"Seamless Alignment: Merging New Data with and Existing Preprocessed Dataset","text":"First, let’s load pre-processed LC-MS object, steps get object shown End--end worflow vignette.","code":"load(\"/shared/data/preprocessed_lcms1.RData\")"},{"path":"https://rformassspectrometry.github.io/metabonaut/articles/alignment-to-external-dataset.html","id":"load-unprocessed-lc-msms-data","dir":"Articles","previous_headings":"","what":"Load unprocessed LC-MS/MS data","title":"Seamless Alignment: Merging New Data with and Existing Preprocessed Dataset","text":"Next, load unprocessed LC-MS/MS data MetaboLights database: adjust sampleData() LC-MS/MS object make easier access: Table 10. Samples LC-MS/MS data set. keep MS runs (MS/MS) remove pooled samples, focusing samples E common runs. 
alignment, ensure retention time (RT) ranges match datasets: need adjust RT range LC-MS/MS object match LC-MS data:","code":"#' Load form the MetaboLights Database param <- MetaboLightsParam(mtblsId = \"MTBLS8735\", assayName = paste0(\"a_MTBLS8735_LC-MSMS_positive_\", \"hilic_metabolite_profiling.txt\"), filePattern = \".mzML\") lcms2 <- readMsObject(MsExperiment(), param, keepOntology = FALSE, keepProtocol = FALSE, simplify = TRUE) #adjust sampleData colnames(sampleData(lcms2)) <- c(\"sample_name\", \"derived_spectra_data_file\", \"metabolite_asssignment_file\", \"source_name\", \"organism\", \"blood_sample_type\", \"sample_type\", \"age\", \"unit\", \"phenotype\") #let's look at the updated sample data sampleData(lcms2)[, c(\"derived_spectra_data_file\", \"phenotype\", \"sample_name\", \"age\")] |> kable(format = \"pipe\") # Only keep MS run lcms2 <- lcms2[!grepl(\"MSMS\", sampleData(lcms2)$derived_spectra_data_file),] range(rtime(lcms1)) [1] 9.674428 240.115311 range(rtime(lcms2)) [1] 0.275 480.176 #' Filter the data to the same RT range as the LC-MS run lcms2 <- filterRt(lcms2, range(rtime(lcms1)))"},{"path":"https://rformassspectrometry.github.io/metabonaut/articles/alignment-to-external-dataset.html","id":"comparing-chromatograms","dir":"Articles","previous_headings":"","what":"Comparing chromatograms","title":"Seamless Alignment: Merging New Data with and Existing Preprocessed Dataset","text":"evaluate retention time shifts, ’ll plot base peak chromatogram (BPC): Compare run1 sample run2 sample Similarly, compare BPC sample E:","code":"idx_A <- which(sampleData(lcms1)$sample_name == \"A\") idx_E <- which(sampleData(lcms1)$sample_name == \"E\") bpc1 <-chromatogram(lcms1[c(idx_A,idx_E)], aggregationFun = \"max\", msLevel = 1) Processing chromatographic peaks bpc2 <- chromatogram(lcms2, aggregationFun = \"max\", msLevel = 1) plot(bpc1[1,1], col = \"#00000080\", main = \"BPC sample A LC-MS vs A LC-MS/MS\", lwd = 1.5, peakType = \"none\") grid() 
points(rtime(bpc2[1, 1]), intensity(bpc2[1, 1]), col = \"#0000ff80\", type = \"l\") legend(\"topleft\", col = c(\"#00000080\", \"#0000ff80\"), legend = c(\"LC-MS\", \"LC-MS/MS\"), lty = 1, lwd = 2, horiz = TRUE, bty = \"n\") plot(bpc1[1, 2], col = \"#00000080\", main = \"BPC sample E LC-MS vs E LC-MS/MS\", lwd = 1.5, peakType = \"none\") grid() points(rtime(bpc2[1, 2]), intensity(bpc2[1, 2]), col = \"#0000ff80\", type = \"l\") legend(\"topleft\", col = c(\"#00000080\", \"#0000ff80\"), legend = c(\"LC-MS\", \"LC-MS/MS\"), lty = 1, lwd = 2, horiz = TRUE, bty = \"n\")"},{"path":"https://rformassspectrometry.github.io/metabonaut/articles/alignment-to-external-dataset.html","id":"peak-detection","dir":"Articles","previous_headings":"","what":"Peak detection","title":"Seamless Alignment: Merging New Data with and Existing Preprocessed Dataset","text":"Perform peak detection refining alignment, detailed end--end vignette. setting applied.","code":"param <- CentWaveParam(peakwidth = c(1, 8), ppm = 15, integrate = 2) lcms2 <- findChromPeaks(lcms2, param = param, chunkSize = 2) param <- MergeNeighboringPeaksParam(expandRt = 2.5, expandMz = 0.0015, minProp = 0.75) lcms2 <- refineChromPeaks(lcms2, param = param, chunkSize = 2)"},{"path":"https://rformassspectrometry.github.io/metabonaut/articles/alignment-to-external-dataset.html","id":"alignment","dir":"Articles","previous_headings":"","what":"Alignment","title":"Seamless Alignment: Merging New Data with and Existing Preprocessed Dataset","text":"Now, attempt align two samples previous dataset. first step extract landmark features (referred lamas). achieve , identify features present every phenotype group lcms1 dataset. , categorize (using factor()) data phenotype retain QC samples. variable utilized filter features using PercentMissingFilter parameter within filterFeatures() function. , setting threshold = 0 select features present QC samples. lamas input look like alignment. 
terms method works, alignment algorithm matches chromatographic peaks experimental data lamas, fitting model based match adjust retention times minimize differences two datasets. Now can define param object LamaParama prepare alignment. Parameters tolerance, toleranceRt, ppm relate matching chromatographic peaks lamas. parameters related type fitting generated data points. details parameter overall method can found searching ?adjustRtime. example using default parameters. matchLamaChromPeaks() function facilitates assessment well lamas correspond chromatographic peaks file. extract matched results using matchedRtimes() function. used later evaluate alignment. Now can adjust retention time LC-MS/MS dataset using adjustRtime() function.","code":"f <- sampleData(lcms1)$phenotype f[f != \"QC\"] <- NA lcms1 <- filterFeatures(lcms1, PercentMissingFilter(threshold = 0, f = f), filled = FALSE) 3694 features were removed lcms1_mz_rt <- featureDefinitions(lcms1)[, c(\"mzmed\",\"rtmed\")] head(lcms1_mz_rt) mzmed rtmed FT0001 50.98979 203.6001 FT0002 51.05904 191.1675 FT0003 51.98657 203.1467 FT0004 53.02036 203.2343 FT0005 53.52080 203.1936 FT0007 54.01010 235.9032 nrow(lcms1_mz_rt) [1] 5374 param <- LamaParama(lamas = lcms1_mz_rt, method = \"loess\", span = 0.5, outlierTolerance = 3, zeroWeight = 10, ppm =20, tolerance = 0, toleranceRt = 20, bs = \"tp\") param <- matchLamasChromPeaks(lcms2, param = param) ref_vs_obs <- matchedRtimes(param) #' input into `adjustRtime()` lcms2 <- adjustRtime(lcms2, param = param) lcms2 <- applyAdjustedRtime(lcms2)"},{"path":"https://rformassspectrometry.github.io/metabonaut/articles/alignment-to-external-dataset.html","id":"evaluation","dir":"Articles","previous_headings":"","what":"Evaluation","title":"Seamless Alignment: Merging New Data with and Existing Preprocessed Dataset","text":"extract base peak chromatogram (BPC) aligned object: evaluate performance alignment process, generate plots comparing alignment reference dataset (black) LC-MS 
data (red) (blue) adjustment. Although overall matching imperfect due initial sample issues, certain regions show significant improvement. alignment signal’s start particularly well done. Specifically, regions right 150 seconds show substantial improvement. visualization distribution chromatographic peaks matched anchor peaks (Lamas) Sample . red vertical lines represent positions matched peaks. quantitatively assess quality alignment, compute distance chromatographic peaks LC-MS data anchor peaks (Lamas) alignment. library(vioplot) Furthermore, detailed examination matching model used fitting file possible. Numerical information can obtained using summarizeLamaMatch() function. , percentage chromatographic peaks utilized alignment can computed relative total number peaks file. Additionally, feasible directly plot() param object file interest, showcasing distribution chromatographic peaks along fitted model line.","code":"#' evaluate the results with BPC bpc2_adj <- chromatogram(lcms2, aggregationFun = \"max\", msLevel = 1) #' BPC of sample A par(mfrow = c(2, 1), mar = c(2.5, 2.5, 2.5, 0.5), mgp = c(1.5, 0.5, 0)) plot(bpc1[1, 1], col = \"#00000080\", main = \"Before Alignment\", lwd = 1.5, peakType = \"none\", xlab = NA) grid() points(rtime(bpc2[1,1]), intensity(bpc2[1,1]), type = \"l\", col = \"#0000ff80\") legend(\"topleft\", col = c(\"#00000080\", \"#0000ff80\"), legend = c(\"LC-MS\", \"LC-MS/MS\"), lty = 1, lwd = 2, horiz = TRUE, bty = \"n\") plot(bpc1[1, 1], col = \"#00000080\", main = \"After Alignment\", lwd = 1.5, peakType = \"none\", xlab = \"rtime (s)\") grid() points(rtime(bpc2_adj[1,1]), intensity(bpc2_adj[1,1]), type = \"l\", col = \"#0000ff80\") #' BPC of sample B par(mfrow = c(2, 1), mar = c(2.5, 2.5, 2.5, 0.5), mgp = c(1.5, 0.5, 0)) plot(bpc1[1, 2], col = \"#00000080\", main = \"Before Alignment\", lwd = 1.5, peakType = \"none\", xlab = NA) grid() points(rtime(bpc2[1, 2]), intensity(bpc2[1, 2]), type = \"l\", col = \"#0000ff80\") legend(\"topleft\", 
col = c(\"#00000080\", \"#0000ff80\"), legend = c(\"LC-MS\", \"LC-MS/MS\"), lty = 1, lwd = 2, horiz = TRUE, bty = \"n\") plot(bpc1[1, 2], col = \"#00000080\", main = \"After Alignment\", lwd = 1.5, peakType = \"none\", xlab = \"rtime (s)\") grid() points(rtime(bpc2_adj[1, 2]), intensity(bpc2_adj[1, 2]), type = \"l\", col = \"#0000ff80\") #' BPC of the first sample with matches to lamas overlay par(mfrow = c(1, 1)) plot(bpc1[1, 1], col = \"#00000080\", main = \"Distribution CP matched to Lamas\", lwd = 1.5, peakType = \"none\") points(rtime(bpc2_adj[1, 1]), intensity(bpc2_adj[1, 1]), type = \"l\", col = \"#0000ff80\") grid() abline(v = ref_vs_obs[[1]]$obs, col = \"#c4114510\") # Extract data for sample 1 directly ref_obs_sample_1 <- ref_vs_obs[[\"1\"]] # Calculate distances before and after alignment dist_before <- abs(ref_obs_sample_1$obs - ref_obs_sample_1$ref) dist_after <- abs(chromPeaks(lcms2)[ref_obs_sample_1$chromPeaksId, \"rt\"] - ref_obs_sample_1$ref) # Create a data frame for plotting distances <- data.frame( Distance = c(dist_before, dist_after), Alignment = rep(c(\"Before\", \"After\"), each = length(dist_before)) ) # Set factor levels for Alignment to ensure correct order distances$Alignment <- factor(distances$Alignment, levels = c(\"Before\", \"After\")) # Plot distances between anchor peaks between the two runs before and after alignment. vioplot(Distance ~ Alignment, data = distances, xlab = \"\", rectCol = \"#c4114580\", lineCol = \"white\", col=\"#17138fe8\", border = \"white\", ylab = \"Distance (s)\", main = \"Distance to Anchor Peaks: Before vs. After Alignment\") #' Access summary of matches and model information summary <- summarizeLamaMatch(param) summary Total_peaks Matched_peaks Total_lamas Model_summary 1 6832 1825 5374 1666, c(.... 2 6860 1785 5374 1617, c(.... 3 7588 2082 5374 1869, c(.... 
#' Coverage for each file summary$Matched_peaks / summary$Total_peaks * 100 [1] 26.71253 26.02041 27.43806 #' Access the information on the model for the first file summary$Model_summary[[1]] Call: loess(formula = ref ~ obs, data = rt_map, weights = weights, span = span) Number of Observations: 1666 Equivalent Number of Parameters: 7.38 Residual Standard Error: 2.315 Trace of smoother matrix: 8.13 (exact) Control settings: span : 0.5 degree : 2 family : gaussian surface : interpolate cell = 0.2 normalize: TRUE parametric: FALSE drop.square: FALSE #' Plot obs vs. lcms1 with fitting line plot(param, index = 1L, main = \"ChromPeaks versus Lamas for sample A\", colPoint = \"red\") abline(0, 1, lty = 3, col = \"grey\") grid()"},{"path":"https://rformassspectrometry.github.io/metabonaut/articles/alignment-to-external-dataset.html","id":"visualizing-alignment-quality","dir":"Articles","previous_headings":"Introduction","what":"Visualizing Alignment Quality","title":"Seamless Alignment: Merging New Data with and Existing Preprocessed Dataset","text":"evaluate performance alignment process, generate plots comparing alignment reference dataset (black) LC-MS data (red) (blue) adjustment. Although overall matching imperfect due initial sample issues, certain regions show significant improvement. alignment signal’s start particularly well done. Specifically, regions right 150 seconds show substantial improvement. visualization distribution chromatographic peaks matched anchor peaks (Lamas) Sample . 
red vertical lines represent positions matched peaks.","code":"#' BPC of sample A par(mfrow = c(2, 1), mar = c(2.5, 2.5, 2.5, 0.5), mgp = c(1.5, 0.5, 0)) plot(bpc1[1, 1], col = \"#00000080\", main = \"Before Alignment\", lwd = 1.5, peakType = \"none\", xlab = NA) grid() points(rtime(bpc2[1,1]), intensity(bpc2[1,1]), type = \"l\", col = \"#0000ff80\") legend(\"topleft\", col = c(\"#00000080\", \"#0000ff80\"), legend = c(\"LC-MS\", \"LC-MS/MS\"), lty = 1, lwd = 2, horiz = TRUE, bty = \"n\") plot(bpc1[1, 1], col = \"#00000080\", main = \"After Alignment\", lwd = 1.5, peakType = \"none\", xlab = \"rtime (s)\") grid() points(rtime(bpc2_adj[1,1]), intensity(bpc2_adj[1,1]), type = \"l\", col = \"#0000ff80\") #' BPC of sample B par(mfrow = c(2, 1), mar = c(2.5, 2.5, 2.5, 0.5), mgp = c(1.5, 0.5, 0)) plot(bpc1[1, 2], col = \"#00000080\", main = \"Before Alignment\", lwd = 1.5, peakType = \"none\", xlab = NA) grid() points(rtime(bpc2[1, 2]), intensity(bpc2[1, 2]), type = \"l\", col = \"#0000ff80\") legend(\"topleft\", col = c(\"#00000080\", \"#0000ff80\"), legend = c(\"LC-MS\", \"LC-MS/MS\"), lty = 1, lwd = 2, horiz = TRUE, bty = \"n\") plot(bpc1[1, 2], col = \"#00000080\", main = \"After Alignment\", lwd = 1.5, peakType = \"none\", xlab = \"rtime (s)\") grid() points(rtime(bpc2_adj[1, 2]), intensity(bpc2_adj[1, 2]), type = \"l\", col = \"#0000ff80\") #' BPC of the first sample with matches to lamas overlay par(mfrow = c(1, 1)) plot(bpc1[1, 1], col = \"#00000080\", main = \"Distribution CP matched to Lamas\", lwd = 1.5, peakType = \"none\") points(rtime(bpc2_adj[1, 1]), intensity(bpc2_adj[1, 1]), type = \"l\", col = \"#0000ff80\") grid() abline(v = ref_vs_obs[[1]]$obs, col = \"#c4114510\")"},{"path":"https://rformassspectrometry.github.io/metabonaut/articles/alignment-to-external-dataset.html","id":"quantitative-evaluation-of-alignment","dir":"Articles","previous_headings":"Introduction","what":"Quantitative Evaluation of Alignment","title":"Seamless Alignment: Merging New 
Data with and Existing Preprocessed Dataset","text":"quantitatively assess quality alignment, compute distance chromatographic peaks LC-MS data anchor peaks (Lamas) alignment. library(vioplot) Furthermore, detailed examination matching model used fitting file possible. Numerical information can obtained using summarizeLamaMatch() function. , percentage chromatographic peaks utilized alignment can computed relative total number peaks file. Additionally, feasible directly plot() param object file interest, showcasing distribution chromatographic peaks along fitted model line.","code":"# Extract data for sample 1 directly ref_obs_sample_1 <- ref_vs_obs[[\"1\"]] # Calculate distances before and after alignment dist_before <- abs(ref_obs_sample_1$obs - ref_obs_sample_1$ref) dist_after <- abs(chromPeaks(lcms2)[ref_obs_sample_1$chromPeaksId, \"rt\"] - ref_obs_sample_1$ref) # Create a data frame for plotting distances <- data.frame( Distance = c(dist_before, dist_after), Alignment = rep(c(\"Before\", \"After\"), each = length(dist_before)) ) # Set factor levels for Alignment to ensure correct order distances$Alignment <- factor(distances$Alignment, levels = c(\"Before\", \"After\")) # Plot distances between anchor peaks between the two runs before and after alignment. vioplot(Distance ~ Alignment, data = distances, xlab = \"\", rectCol = \"#c4114580\", lineCol = \"white\", col=\"#17138fe8\", border = \"white\", ylab = \"Distance (s)\", main = \"Distance to Anchor Peaks: Before vs. After Alignment\") #' Access summary of matches and model information summary <- summarizeLamaMatch(param) summary Total_peaks Matched_peaks Total_lamas Model_summary 1 6832 1825 5374 1666, c(.... 2 6860 1785 5374 1617, c(.... 3 7588 2082 5374 1869, c(.... 
#' Coverage for each file summary$Matched_peaks / summary$Total_peaks * 100 [1] 26.71253 26.02041 27.43806 #' Access the information on the model for the first file summary$Model_summary[[1]] Call: loess(formula = ref ~ obs, data = rt_map, weights = weights, span = span) Number of Observations: 1666 Equivalent Number of Parameters: 7.38 Residual Standard Error: 2.315 Trace of smoother matrix: 8.13 (exact) Control settings: span : 0.5 degree : 2 family : gaussian surface : interpolate cell = 0.2 normalize: TRUE parametric: FALSE drop.square: FALSE #' Plot obs vs. lcms1 with fitting line plot(param, index = 1L, main = \"ChromPeaks versus Lamas for sample A\", colPoint = \"red\") abline(0, 1, lty = 3, col = \"grey\") grid()"},{"path":"https://rformassspectrometry.github.io/metabonaut/articles/alignment-to-external-dataset.html","id":"conclusion","dir":"Articles","previous_headings":"","what":"Conclusion","title":"Seamless Alignment: Merging New Data with and Existing Preprocessed Dataset","text":"tutorial demonstrated align LC-MS LC-MS/MS datasets correct retention time shifts, crucial handling data different runs platforms. preprocessed data, detected chromatographic peaks, used landmark features (lamas) QC samples adjust retention times via adjustRtime() function. Visual comparisons base peak chromatograms alignment, along distance calculations, showed clear improvements RT synchronization. Ultimately, aligning chromatographic data ensures subsequent analyses, feature extraction statistical comparisons, based consistent time points, improving data quality reliability. 
tutorial outlined end--end workflow can adapted various LC-MS-based metabolomics studies, helping researchers manage retention time variation effectively.","code":""},{"path":[]},{"path":"https://rformassspectrometry.github.io/metabonaut/articles/dataset-investigation.html","id":"introduction","dir":"Articles","previous_headings":"","what":"Introduction","title":"Dataset investigation: What to do when you get your data","text":", (amazing lab mate) finally finished data acquisition, now dataset hand. ’s next? Unfortunately, work isn’t yet. diving analysis, ’s crucial understand dataset . first step data analysis workflow, ensuring data good quality well-prepared preprocessing downstream analysis plan perform. vignette, present dataset used throughout different vignettes website. ’s far perfect dataset, actually mirrors reality datasets ’ll encounter research. issues indeed specific described dataset. However, purpose vignette encourage think critically data guide steps can help avoid spending hours analysis, realize later samples features removed flagged earlier .","code":""},{"path":"https://rformassspectrometry.github.io/metabonaut/articles/dataset-investigation.html","id":"dataset-description","dir":"Articles","previous_headings":"","what":"Dataset Description","title":"Dataset investigation: What to do when you get your data","text":"workflow, two datasets used: LC-MS-based (MS1 level ) untargeted metabolomics dataset quantify small polar metabolites human plasma samples. additional LC-MS/MS dataset selected samples former study identification annotation significant features. samples randomly selected larger study aimed identifying metabolites varying abundances individuals suffering cardiovascular disease (CVD) healthy controls (CTR). subset analyzed includes data three CVD patients, three CTR individuals, four quality control (QC) samples. 
QC samples, representing pooled serum sample large cohort, measured repeatedly throughout experiment monitor signal stability. data metadata workflow available MetaboLights database ID: MTBLS8735. detailed materials methods used sample analysis also available MetaboLights entry. particularly important understanding analysis parameters used. noted samples analyzed using ultra-high-performance liquid chromatography (UHPLC) coupled Q-TOF mass spectrometer (TripleTOF 5600+), chromatographic separation achieved using hydrophilic interaction liquid chromatography (HILIC). Consider moving visualizations end--end vignette clearer understanding dataset. Provide -depth visualizations explore understand dataset quality. Compare pool lc-ms pool lc-ms/ms show better separation second run.","code":""},{"path":"https://rformassspectrometry.github.io/metabonaut/articles/install_v0.html","id":"running-workflows-locally","dir":"Articles","previous_headings":"","what":"Running workflows locally","title":"Install","text":"install computer packages necessary workflows run code follow:","code":"install.packages(\"BiocManager\") BiocManager::install(c('RforMassSpectrometry/MsIO', 'RforMassSpectrometry/MsBackendMetaboLights'), ask = FALSE, dependencies = TRUE) BiocManager::install(\"rformassspectrometry/metabonaut\", dependencies = TRUE, ask = FALSE, update = TRUE)"},{"path":"https://rformassspectrometry.github.io/metabonaut/articles/install_v0.html","id":"docker-image","dir":"Articles","previous_headings":"","what":"Docker image","title":"Install","text":"vignettes files along R runtime environment including required packages RStudio (Posit) editor bundled docker container. installation, docker container can run computer code examples vignettes can evaluated within environment (without need install additional packages files). don’t already , install docker. Find installation information . Get docker image tutorial e.g. 
command line : Start docker container, either Docker Desktop, command line Enter http://localhost:8787 web browser log username rstudio password bioc. RStudio server version: open Quarto files vignettes folder evaluate R code blocks document.","code":"docker pull rformassspectrometry/metabonaut:latest docker run -e PASSWORD=bioc -p 8787:8787 rformassspectrometry/metabonaut:latest"},{"path":"https://rformassspectrometry.github.io/metabonaut/authors.html","id":null,"dir":"","previous_headings":"","what":"Authors","title":"Authors and Citation","text":"Philippine Louail. Author, maintainer. ORCID: 0009-0007-5429-6846 Anna Tagliaferri. Contributor. ORCID: 0009-0001-4044-4285 Vinicius Verri Hernandes. Contributor. ORCID: 0000-0002-3057-6460 Daniel Marques de Sá e Silva. Contributor. ORCID: 0000-0002-9674-042X Johannes Rainer. Author. ORCID: 0000-0002-6977-7147","code":""},{"path":"https://rformassspectrometry.github.io/metabonaut/authors.html","id":"citation","dir":"","previous_headings":"","what":"Citation","title":"Authors and Citation","text":"Philippine Louail, & Johannes Rainer. (2024). Streamlining LC-MS/MS Data Analysis R Open-Source xcms RforMassSpectrometry: End--End Workflow (Version v1).Zenodo. https://doi.org/10.5281/zenodo.11370612","code":"@Manual{, title = {Streamlining LC-MS/MS Data Analysis in R with Open-Source xcms and RforMassSpectrometry: An End-to-End Workflow}, author = {Philippine Louail and Johannes Rainer}, publisher = {Zenodo}, year = {2024}, month = {may}, version = {v1.1.0}, doi = {10.5281/zenodo.11370612}, url = {https://doi.org/10.5281/zenodo.11370612}, }"},{"path":"https://rformassspectrometry.github.io/metabonaut/index.html","id":"lets-explore-and-learn-to-analyze-untargeted-metabolomics-data","dir":"","previous_headings":"","what":"Let’s explore and learn to analyze untargeted metabolomics data","title":"Exploring and Analyzing LC-MS Data","text":"Welcome Metabonaut! 
🧑‍🚀 initiative presents series workflows based small LC-MS/MS dataset, utilizing R Bioconductor packages. Throughout workflows, demonstrate adapt various algorithms specific datasets seamlessly integrate R packages ensure efficient, reproducible processing. primary workflow “Complete End--End LC-MS/MS Metabolomic Data Analysis”. full R code examples, along detailed descriptions, available end--end-untargeted-metabolomics.qmd file. file can opened RStudio, allowing execute individual R command. vignettes website interlinked, can find detailed description dataset used throughout . strive reproducibility. workflows designed remain stable time, allowing run vignettes together one comprehensive “super-vignette”. major change document smaller updates check News","code":""},{"path":"https://rformassspectrometry.github.io/metabonaut/index.html","id":"for-r-beginners","dir":"","previous_headings":"","what":"For R beginners","title":"Exploring and Analyzing LC-MS Data","text":"tutorials provided assume users basic knowledge R RMarkdown. ’re unfamiliar either, recommend completing short tutorial help test code adapt data. vignettes written Quarto format, learn go , fairly new format, functionality shared RMarkdown format, therefore learning can useful . basic R course documentation, recommend check website try interactive course fun introduction basic R programming. cheatsheet also help.","code":""},{"path":"https://rformassspectrometry.github.io/metabonaut/index.html","id":"known-issues","dir":"","previous_headings":"","what":"Known Issues","title":"Exploring and Analyzing LC-MS Data","text":"just beginning Metabonaut journey, website still refined. ’re actively addressing ongoing issues. ’re aware problem, ’ll list . Currently, known issues code. encounter , please ensure latest versions required packages (detailed ). issue persists, please report reproducible example GitHub . 
encounter issues, don’t hesitate let us know!","code":""},{"path":"https://rformassspectrometry.github.io/metabonaut/index.html","id":"contribution","dir":"","previous_headings":"","what":"Contribution","title":"Exploring and Analyzing LC-MS Data","text":"contributions, please see RforMassSpectrometry contributions guideline.","code":""},{"path":"https://rformassspectrometry.github.io/metabonaut/index.html","id":"code-of-conduct","dir":"","previous_headings":"","what":"Code of Conduct","title":"Exploring and Analyzing LC-MS Data","text":"Please review RforMassSpectrometry Code Conduct.","code":""},{"path":[]},{"path":"https://rformassspectrometry.github.io/metabonaut/news/index.html","id":"changes-in-0-0-2","dir":"Changelog","previous_headings":"","what":"Changes in 0.0.2","title":"metabonaut 0.0.2","text":"Switch Quarto instead Rmarkdown Addition Alignment reference dataset vignette Addition Data investigation vignette Addition Install vignette","code":""},{"path":"https://rformassspectrometry.github.io/metabonaut/news/index.html","id":"changes-in-0-0-2-1","dir":"Changelog","previous_headings":"","what":"Changes in 0.0.1","title":"metabonaut 0.0.2","text":"Addition basic files workflow package. Addition end--end vignette.","code":""}]