---
title: "Longitudinal data organization"
output: rmarkdown::html_vignette
vignette: >
  %\VignetteIndexEntry{longitudinal-data-organization}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---
```{r, include = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>"
)
```
```{r setup}
#| message: false
library(alda)
library(dplyr)
library(tidyr)
```
## Longitudinal data formats
Longitudinal data can be organized into two distinct formats:
- A **person-level**, **wide**, or **multivariate** format, where each person has only one row of data and multiple columns containing the data from each measurement occasion.
```{r}
glimpse(deviant_tolerance_pl)
```
- A **person-period**, **long**, or **univariate** format, where each person has one row of data for each measurement occasion.
```{r}
glimpse(deviant_tolerance_pp)
```
Most R functions expect data to be in the person-period format for visualization and analysis, but it's easy to convert a longitudinal data set from one format to the other.
### Converting between formats
To convert a person-level data set to person-period format, use `tidyr::pivot_longer()`:
```{r}
pivot_longer(
  deviant_tolerance_pl,
  cols = starts_with("tolerance_"),
  names_to = "age",
  names_pattern = "([[:digit:]]+)",
  names_transform = as.integer,
  values_to = "tolerance"
)
```
To convert a person-period data set to person-level format, use `tidyr::pivot_wider()`:
```{r}
pivot_wider(
  deviant_tolerance_pp,
  names_from = age,
  names_prefix = "tolerance_",
  values_from = tolerance
)
```
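The two conversions are inverses of each other, which can be checked with a round trip. The following is a minimal sketch on a small made-up data set: `toy_pl`, its `score_*` columns, and the `wave` variable are hypothetical and not part of **alda**. It also uses `names_prefix` rather than `names_pattern` to strip the column prefix, which is a simpler alternative when the prefix is fixed.

```{r}
library(dplyr)
library(tidyr)

# Hypothetical person-level data: one row per person, one column per wave.
toy_pl <- tibble(
  id = c(1, 2),
  score_1 = c(10, 20),
  score_2 = c(11, 21)
)

# Person-level to person-period: one row per person per wave.
toy_pp <- pivot_longer(
  toy_pl,
  cols = starts_with("score_"),
  names_to = "wave",
  names_prefix = "score_",
  names_transform = as.integer,
  values_to = "score"
)

# Person-period back to person-level.
toy_pl_again <- pivot_wider(
  toy_pp,
  names_from = wave,
  names_prefix = "score_",
  values_from = score
)

# Compare the original with the round-tripped data.
all.equal(as.data.frame(toy_pl), as.data.frame(toy_pl_again))
```

If the comparison does not come back as equal for your own data, a likely cause is a time variable that was not converted back to the same type or column order.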
## Adding discrete time indicators to person-period data
To add discrete time indicators to a person-period data set, first create a temporary copy of the time variable and a column of ones, then use `tidyr::pivot_wider()`:
```{r}
deviant_tolerance_pp |>
  mutate(
    temp_age = age,
    temp_dummy = 1
  ) |>
  pivot_wider(
    names_from = temp_age,
    names_prefix = "age_",
    values_from = temp_dummy,
    values_fill = 0
  )
```
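To see why the temporary columns are needed, it can help to run the same steps on a tiny data set. This is a sketch under stated assumptions: `toy_long` and its `wave` and `score` columns are made up for illustration. Because the copy of the time variable is consumed by `names_from`, the original `wave` column survives, and `values_fill = 0` fills in every indicator that does not apply to a row.

```{r}
library(dplyr)
library(tidyr)

# Hypothetical person-period data: two people measured at two waves.
toy_long <- tibble(
  id = c(1, 1, 2, 2),
  wave = c(1, 2, 1, 2),
  score = c(10, 11, 20, 21)
)

toy_indicators <- toy_long |>
  mutate(
    temp_wave = wave,  # copy of the time variable, used up by pivot_wider()
    temp_dummy = 1     # column of ones that becomes the indicator values
  ) |>
  pivot_wider(
    names_from = temp_wave,
    names_prefix = "wave_",
    values_from = temp_dummy,
    values_fill = 0
  )

toy_indicators

# Sanity check: each row should have exactly one indicator equal to 1.
rowSums(select(toy_indicators, starts_with("wave_")))
```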
## Adding contiguous periods to person-level survival data
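To add contiguous periods to person-level data, expanding each person's single row into one row for every period up to and including their last observed period, use `dplyr::reframe()`: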
```{r}
first_sex |>
  # In order to add the event indicator, the time variable needs a different
  # name in the person-level data from the name we want to use in `reframe()`.
  # This is a temporary variable so it doesn't matter what the name is.
  rename(grades = grade) |>
  group_by(id) |>
  reframe(
    grade = 1:max(grades),
    event = if_else(grade == grades & censor == 0, 1, 0),
    # To keep predictors from the person-level data, simply list them. If there
    # are many predictors it might be more convenient to use
    # `dplyr::left_join()` after `reframe()`.
    parental_transition,
    parental_antisociality
  )
```
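The `dplyr::left_join()` alternative mentioned in the comments above could look something like the following. This is only a sketch on made-up data: `toy_surv_pl` and its `last_period`, `censor`, and `predictor` columns are hypothetical stand-ins for a person-level survival data set. It assumes `id` uniquely identifies each person.

```{r}
library(dplyr)

# Hypothetical person-level survival data: person 1 experiences the event in
# period 3; person 2 is censored after period 2.
toy_surv_pl <- tibble(
  id = c(1, 2),
  last_period = c(3, 2),
  censor = c(0, 1),
  predictor = c("a", "b")
)

toy_surv_pp <- toy_surv_pl |>
  group_by(id) |>
  reframe(
    # One row per period, from 1 up to the person's last observed period.
    period = 1:max(last_period),
    # The event occurs only in the last period, and only if not censored.
    event = if_else(period == last_period & censor == 0, 1, 0)
  ) |>
  # Bring the predictors back in from the person-level data in one step,
  # instead of listing each of them inside `reframe()`.
  left_join(select(toy_surv_pl, id, predictor), by = "id")

toy_surv_pp
```

Listing predictors inside `reframe()` is shorter when there are only one or two of them; the join scales better because the `select()` call can use tidyselect helpers to pick up many columns at once.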