vignettes/longitudinal-data-organization.Rmd

---
title: "Longitudinal data organization"
output: rmarkdown::html_vignette
vignette: >
  %\VignetteIndexEntry{longitudinal-data-organization}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---

```{r, include = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>"
)
```

```{r setup}
#| message: false
library(alda)
library(dplyr)
library(tidyr)
```

## Longitudinal data formats

Longitudinal data can be organized into two distinct formats:

- A **person-level**, **wide**, or **multivariate**, format where each person has only one row of data and multiple columns containing data from each measurement occasion.

```{r}
glimpse(deviant_tolerance_pl)
```

- A **person-period**, **long**, or **univariate**, format where each person has one row of data for each measurement occasion.

```{r}
glimpse(deviant_tolerance_pp)
```

Most R functions expect data to be in the person-period format for visualization and analysis, but it's easy to convert a longitudinal data set from one format to the other.

### Converting between formats

To convert a person-level data set to person-period format use `dplyr::pivot_longer()`:

```{r}
pivot_longer(
  deviant_tolerance_pl,
  cols = starts_with("tolerance_"),
  names_to = "age",
  names_pattern = "([[:digit:]]+)",
  names_transform = as.integer,
  values_to = "tolerance"
)
```

To convert a person-period data set to person-level format use `dplyr::pivot_wider()`:

```{r}
pivot_wider(
  deviant_tolerance_pp,
  names_from = age,
  names_prefix = "tolerance_",
  values_from = tolerance
)
```

## Adding discrete time indicators to person-period data

To add discrete time indicators to a person-period data set first create a temporary copy of the time variable and a column of ones, then use `dplyr::pivot_wider()`:

```{r}
deviant_tolerance_pp |>
  mutate(
    temp_age = age,
    temp_dummy = 1
  ) |>
  pivot_wider(
    names_from = temp_age,
    names_prefix = "age_",
    values_from = temp_dummy,
    values_fill = 0
  )
```

## Adding contiguous periods to person-level survival data

To add contiguous periods to person-level data use `dplyr::reframe()`:

```{r}
first_sex |>
  # In order to add the event indicator, the time variable needs a different
  # name in the person-level data from the name we want to use in `reframe()`.
  # This is a temporary variable so it doesn't matter what the name is.
  rename(grades = grade) |>
  group_by(id) |>
  reframe(
    grade = 1:max(grades),
    event = if_else(grade == grades & censor == 0, 1, 0),
    # To keep predictors from the person-level data, simply list them. If there
    # are many predictors it might be more convenient to use
    # `dplyr::left_join()` after `reframe()`.
    parental_transition,
    parental_antisociality
  )
```