---
output: github_document
---
<!-- README.md is generated from README.Rmd. Please edit that file -->
```{r, echo = FALSE}
knitr::opts_chunk$set(
collapse = TRUE,
comment = "#>",
fig.path = "man/figures/"
)
```
```{r setup, include=FALSE}
library(tidyverse)
library(magrittr)
library(Tplyr)
library(knitr)
```
# **Tplyr** <img src="man/figures/logo.png" align="right" alt="" width="120" />
<!-- badges: start -->
[<img src="http://pharmaverse.org/shields/Tplyr.svg">](https://pharmaverse.org)
[<img src="https://img.shields.io/badge/Slack-RValidationHub-blue?style=flat&logo=slack">](https://RValidationHub.slack.com)
[![R build status](https://github.com/atorus-research/tplyr/workflows/R-CMD-check/badge.svg)](https://github.com/atorus-research/tplyr/actions?workflow=R-CMD-check)
[<img src="https://img.shields.io/codecov/c/github/atorus-research/tplyr">](https://app.codecov.io/gh/atorus-research/tplyr)
[<img src="https://img.shields.io/badge/License-MIT-blue.svg">](https://github.com/atorus-research/Tplyr/blob/master/LICENSE)
[![Lifecycle: stable](https://img.shields.io/badge/lifecycle-stable-brightgreen.svg)](https://lifecycle.r-lib.org/articles/stages.html#stable)
<!-- badges: end -->
Welcome to **Tplyr**! **Tplyr** is a traceability-minded grammar of data format and summary. It's designed to simplify the creation of common clinical summaries and help you focus on how you present your data rather than on programming the same redundant summaries. Furthermore, for every result **Tplyr** produces, it also produces the metadata necessary to give you traceability from source to summary.
As always, we welcome your feedback. If you spot a bug, would like to see a new feature, or if any documentation is unclear - submit an issue through GitHub right [here](https://github.com/atorus-research/Tplyr/issues).
Take a look at the [cheatsheet!](https://atorus-research.github.io/Tplyr_cheatsheet.pdf)
# Installation
You can install **Tplyr** with:
```{r install, eval=FALSE}
# Install from CRAN:
install.packages("Tplyr")
# Or install the development version:
devtools::install_github("https://github.com/atorus-research/Tplyr.git", ref="devel")
```
# What is **Tplyr**?
[dplyr](https://dplyr.tidyverse.org/) from the tidyverse is a grammar of data manipulation. So what does that allow you to do? It gives you, as a data analyst, the capability to easily and intuitively approach the problem of manipulating your data into an analysis-ready form. [dplyr](https://dplyr.tidyverse.org/) conceptually breaks things down into verbs that allow you to focus on _what_ you want to do more than _how_ you have to do it.
**Tplyr** is designed around a similar concept, but its focus is on building summary tables common within the clinical world. In the pharmaceutical industry, a great deal of the data presented in the outputs we create is very similar. Most of these tables can be broken down into a few categories:
- Counting for event based variables or categories
- Shifting, which is just counting a change in state with a 'from' and a 'to'
- Generating descriptive statistics around some continuous variable
Many of the tables that go into a clinical submission are made up of a combination of these approaches. Consider a demographics table - and let's use an example from the PHUSE project Standard Analyses & Code Sharing - [Analyses & Displays Associated with Demographics, Disposition, and Medications in Phase 2-4 Clinical Trials and Integrated Summary Documents](https://phuse.s3.eu-central-1.amazonaws.com/Deliverables/Standard+Analyses+and+Code+Sharing/Analyses+%26+Displays+Associated+with+Demographics,+Disposition+and+Medication+in+Phase+2-4+Clinical+Trials+and+Integrated+Summary+Documents.pdf).
<p align="center"><img src="vignettes/demo_table.png" width="800px"></p>
When you look at this table, you can begin breaking this output down into smaller, redundant components. These components can be viewed as 'layers', and the table as a whole is constructed by stacking the layers. The boxes in the image above represent how you can begin to conceptualize this.
- First we have Sex, which is made up of n (%) counts.
- Next we have Age as a continuous variable, where we have a number of descriptive statistics, including n, mean, standard deviation, median, quartile 1, quartile 3, min, max, and missing values.
- After that we have Age again, but broken into categories - so this is once again n (%) values.
- Race - more counting.
- Ethnicity - more counting.
- Weight - and we're back to descriptive statistics.
So we have one table, with 6 summaries (7 including the next page, not shown) - but only 2 different approaches to summaries being performed.
In the same way that [dplyr](https://dplyr.tidyverse.org/) is a grammar of data manipulation, **Tplyr** aims to be a grammar of data summary. The goal of **Tplyr** is to allow you to program a summary table the way you see it on the page, by breaking a larger problem into smaller 'layers' and then stacking those layers together.
Enough talking - let's see some code. In these examples, we will be using data from the [PHUSE Test Data Factory](https://advance.phuse.global/display/WEL/Test+Dataset+Factory) based on the [original pilot project submission package](https://github.com/atorus-research/CDISC_pilot_replication). We've packaged some subsets of that data into **Tplyr**, which you can use to replicate our examples and run our vignette code yourself. Note: You can see our replication of the CDISC pilot using the PHUSE Test Data Factory data [here](https://github.com/atorus-research/CDISC_pilot_replication).
```{r initial_demo}
tplyr_table(tplyr_adsl, TRT01P, where = SAFFL == "Y") %>%
  add_layer(
    group_desc(AGE, by = "Age (years)")
  ) %>%
  add_layer(
    group_count(AGEGR1, by = "Age Categories n (%)")
  ) %>%
  build() %>%
  kable()
```
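The example above covers descriptive statistics and counting. The third category, shifting, follows the same pattern. The sketch below is a minimal illustration and not part of the demographics table: it assumes **Tplyr** 1.0.0 or later, where the `tplyr_adlb` lab dataset is bundled, and it assumes that dataset carries the `TRTA`, `PARAMCD`, `PARAM`, `VISIT`, `BNRIND`, and `ANRIND` columns used here.

```r
library(Tplyr)
library(dplyr)  # provides %>% and vars()

# Assumption: tplyr_adlb ships with Tplyr (>= 1.0.0) and contains the
# baseline (BNRIND) and analysis (ANRIND) reference range indicators.
shift_tbl <- tplyr_table(tplyr_adlb, TRTA, where = PARAMCD == "CK") %>%
  add_layer(
    # 'row' is the "from" state and 'column' is the "to" state
    group_shift(vars(row = BNRIND, column = ANRIND), by = vars(PARAM, VISIT))
  ) %>%
  build()

head(shift_tbl)
```

A shift layer is still a count, so the output has the same shape as a count layer, with the 'to' state spread across the result columns.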
## **Tplyr** is Qualified
We understand how important documentation and testing are within the pharmaceutical world. This is why, in addition to unit testing, **Tplyr** includes an entire user-acceptance testing document, in which requirements were established, test cases were written, and tests were independently programmed and executed. We do this in the hope that you can leverage our work within a qualified programming environment, and that we save you a substantial amount of trouble in getting it there.
You can find the qualification document within this repository right [here](https://github.com/atorus-research/Tplyr/blob/master/uat/references/output/uat.pdf). The 'uat' folder additionally contains all of the raw files, programmatic tests, specifications, and test cases necessary to create this report.
## The TL;DR
Here are some of the high level benefits of using **Tplyr**:
- Easy construction of table data using an intuitive syntax
- Smart string formatting for your numbers that's easily specified by the user
- A great deal of flexibility in what is performed and how it's presented, without specifying hundreds of parameters
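As a rough illustration of the string formatting mentioned above, the sketch below reuses the demographics example and attaches format strings to each layer with `set_format_strings()` and `f_str()`. The particular row labels and format widths are arbitrary choices made for this example, not package defaults.

```r
library(Tplyr)
library(dplyr)  # provides %>%

fmt_tbl <- tplyr_table(tplyr_adsl, TRT01P, where = SAFFL == "Y") %>%
  add_layer(
    group_desc(AGE, by = "Age (years)") %>%
      set_format_strings(
        # Each name becomes a row label; the x's set integer and decimal width
        "n"         = f_str("xx", n),
        "Mean (SD)" = f_str("xx.x (xx.xx)", mean, sd),
        "Median"    = f_str("xx.x", median),
        "Min, Max"  = f_str("xx, xx", min, max)
      )
  ) %>%
  add_layer(
    group_count(AGEGR1, by = "Age Categories n (%)") %>%
      # Count layers take a single format string built from n and pct
      set_format_strings(f_str("xx (xx.x%)", n, pct))
  ) %>%
  build()

fmt_tbl
```

The format string doubles as the layout specification, so changing the presentation means editing the string rather than adding parameters.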
# Where to go from here?
There's quite a bit more to learn! And we've prepared a number of other vignettes to help you get what you need out of **Tplyr**.
- The best place to start is with our Getting Started vignette at `vignette("Tplyr")`
- Learn more about table level settings in `vignette("table")`
- Learn more about descriptive statistics layers in `vignette("desc")`
- Learn more about count layers in `vignette("count")`
- Learn more about shift layers in `vignette("shift")`
- Learn more about percentages in `vignette("denom")`
- Learn more about calculating risk differences in `vignette("riskdiff")`
- Learn more about sorting **Tplyr** tables in `vignette("sort")`
- Learn more about using **Tplyr** options in `vignette("options")`
- And finally, learn more about producing and outputting styled tables using **Tplyr** in `vignette("styled-table")`
In **Tplyr** version 1.0.0, we've packed in a number of new features. For deeper dives on the largest new additions:
- Learn about **Tplyr**'s traceability metadata in `vignette("metadata")` and about how it can be extended in `vignette("custom-metadata")`
- Learn about layer templates in `vignette("layer_templates")`
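For a quick feel of the traceability metadata before reading the vignette, here is a minimal sketch. It assumes **Tplyr** 1.0.0 or later and that the treatment column for the placebo arm is named `var1_Placebo` in the built output; the row is looked up from the `row_id` column rather than hard-coded.

```r
library(Tplyr)
library(dplyr)  # provides %>%

tbl <- tplyr_table(tplyr_adsl, TRT01P, where = SAFFL == "Y") %>%
  add_layer(
    group_count(AGEGR1, by = "Age Categories n (%)")
  )

# Building with metadata = TRUE adds a row_id column to the output
dat <- tbl %>% build(metadata = TRUE)

# Take the first result row and ask how its placebo cell was derived
first_row <- dat$row_id[1]

# The variables and filter conditions behind that cell
get_meta_result(tbl, first_row, "var1_Placebo")

# The source records that feed that cell
src <- get_meta_subset(tbl, first_row, "var1_Placebo")
head(src)
```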
# References
In building **Tplyr**, we needed resources beyond our personal experience to help guide design. PHUSE has done some great work to create guidance for standard outputs with collaboration between multiple pharmaceutical companies and the FDA. You can find some of the resources that we referenced below.
[Analysis and Displays Associated with Adverse Events](https://phuse.s3.eu-central-1.amazonaws.com/Deliverables/Standard+Analyses+and+Code+Sharing/Analyses+and+Displays+Associated+with+Adverse+Events+Focus+on+Adverse+Events+in+Phase+2-4+Clinical+Trials+and+Integrated+Summary.pdf)
[Analyses and Displays Associated with Demographics, Disposition, and Medications](https://phuse.s3.eu-central-1.amazonaws.com/Deliverables/Standard+Analyses+and+Code+Sharing/Analyses+%26+Displays+Associated+with+Demographics,+Disposition+and+Medication+in+Phase+2-4+Clinical+Trials+and+Integrated+Summary+Documents.pdf)
[Analyses and Displays Associated with Measures of Central Tendency](https://phuse.s3.eu-central-1.amazonaws.com/Deliverables/Standard+Analyses+and+Code+Sharing/Analyses+%26+Displays+Associated+with+Measures+of+Central+Tendency-+Focus+on+Vital+Sign,+Electrocardiogram+%26+Laboratory+Analyte+Measurements+in+Phase+2-4+Clinical+Trials+and+Integrated+Submissions.pdf)