Skip to content

Commit

Permalink
Merge branch 'main' into phili
Browse files Browse the repository at this point in the history
  • Loading branch information
philouail committed Oct 10, 2024
2 parents 6446716 + 93e47db commit 28d40aa
Showing 1 changed file with 79 additions and 0 deletions.
79 changes: 79 additions & 0 deletions vignettes/dataset-investigation.qmd
Original file line number Diff line number Diff line change
@@ -0,0 +1,79 @@
---
title: "Dataset investigation: What to do when you get your data"
format: html
editor: visual
minimal: true
date: 'Compiled: `r format(Sys.Date(), "%B %d, %Y")`'
vignette: >
%\VignetteIndexEntry{Dataset investigation: What to do when you get your data}
%\VignetteEngine{quarto::html}
%\VignetteEncoding{UTF-8}
---

## In Construction

```{r setup, include=FALSE}
library(knitr)
library(quarto)
knitr::opts_knit$set(root.dir = './')
```

# Introduction

So, you (or your amazing lab mate) have finally finished the data acquisition,
and now you have a dataset in hand. What's next? Unfortunately, the work isn't
over yet. Before diving into any analysis, it's crucial to understand the
dataset itself. This is the first step in any data analysis workflow, ensuring
that the data is of good quality and is well-prepared for preprocessing and any
downstream analysis you plan to perform.

In this vignette, we present the dataset used throughout the different vignettes
of this website. It's far from a *perfect* dataset, which actually mirrors the
reality of most datasets you'll encounter in research.

Some issues will indeed be specific to this described dataset. However, the
purpose of this vignette is to encourage you to think critically about your data
and guide you through steps that can help you avoid spending hours on an
analysis, only to realize later that some samples or features should have been
removed or flagged earlier on.

# Dataset Description

In this workflow, two datasets are used:

1. An LC-MS-based (MS1 level only) untargeted metabolomics dataset to quantify
small polar metabolites in human plasma samples.
2. An additional LC-MS/MS dataset of selected samples from the former study for
the identification and annotation of significant features.

The samples were randomly selected from a larger study aimed at identifying
metabolites with varying abundances between individuals suffering from
cardiovascular disease (CVD) and healthy controls (CTR). The subset analyzed
here includes data for three CVD patients, three CTR individuals, and four
quality control (QC) samples. The QC samples, representing a pooled serum sample
from a large cohort, were measured repeatedly throughout the experiment to
monitor signal stability.

The data and metadata for this workflow are available on the MetaboLights
database under the ID: MTBLS8735.

The detailed materials and methods used for the sample analysis are also
available in the MetaboLights entry. This is particularly important for
understanding the analysis and the parameters used. It should be noted that the
samples were analyzed using ultra-high-performance liquid chromatography (UHPLC)
coupled to a Q-TOF mass spectrometer (TripleTOF 5600+), and chromatographic
separation was achieved using hydrophilic interaction liquid chromatography
(HILIC).

- Consider moving visualizations from the end-to-end vignette to here for a
clearer understanding of the dataset.
- Provide more in-depth visualizations to explore and understand the dataset
quality.
- Compare pool lc-ms and pool lc-ms/ms and show that we have better separation
on the second run.

```{r}
getwd()
list.files()
list.dirs()
```

0 comments on commit 28d40aa

Please sign in to comment.