
dajmcdon/rvdss-canada


Respiratory Virus Detections in Canada

This is a data repository for respiratory virus detections in Canada collected by the Respiratory Virus Detection Surveillance System (RVDSS) and published by the Public Health Agency of Canada (PHAC).

The data were historically published in weekly reports but moved to an interactive dashboard as of week 24 of the 2023-2024 season (week ending June 15, 2024). Historic reports and the interactive dashboard can be found here.

Directory Structure

Each season has its own folder under data/ (e.g. season_2024_2025) containing 2 or 3 files, including positive_tests.csv.

Column Names

  • epiweek: the epidemiological week, in the format YYYYWW (e.g. 201712 is the 12th epidemiological week of 2017). The list of epidemiological weeks for the current 2024-2025 season can be found here.

  • time_value: the last day of the epiweek

  • issue: the date (also the last day of an epiweek) on which the observation was revised

  • geo_type: the type of geographical location

  • geo_value: the actual geographical location

  • [virus]_tests: the total number of tests for a given virus

  • [virus]_positive_tests: the number of tests for a given virus that are positive

  • [virus]_pct_positive: the percentage of tests for a given virus that are positive
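As a rough illustration of how the date and percentage columns relate, here is a minimal Python sketch. It assumes that epiweeks follow the CDC/MMWR convention (Sunday-to-Saturday weeks, where week 1 is the first week with at least four days in the new year), which is consistent with week 24 of 2024 ending on June 15, 2024. It also assumes that [virus]_pct_positive is on a 0-100 scale. The helper names epiweek_end_date and pct_positive are hypothetical, and the published percentages may be rounded differently.

```python
from datetime import date, timedelta


def epiweek_end_date(epiweek: int) -> date:
    """Return the Saturday that ends a YYYYWW epiweek (assumes the MMWR convention)."""
    year, week = divmod(epiweek, 100)
    jan1 = date(year, 1, 1)
    # Python's weekday() has Monday=0 ... Sunday=6; shift so that Sunday=0 ... Saturday=6.
    days_since_sunday = (jan1.weekday() + 1) % 7
    # Sunday that starts the week containing January 1.
    first_sunday = jan1 - timedelta(days=days_since_sunday)
    # Week 1 needs at least 4 days in the new year, i.e. Jan 1 falls on Sunday-Wednesday;
    # otherwise week 1 starts on the following Sunday.
    if days_since_sunday > 3:
        first_sunday += timedelta(days=7)
    # Move forward (week - 1) whole weeks, then 6 days to reach that week's Saturday.
    return first_sunday + timedelta(weeks=week - 1, days=6)


def pct_positive(positive_tests: float, tests: float) -> float:
    """Percentage (0-100) of tests that are positive; NaN when no tests were run."""
    if tests == 0:
        return float("nan")
    return 100 * positive_tests / tests


print(epiweek_end_date(202424))  # 2024-06-15
print(epiweek_end_date(201712))  # 2017-03-25
print(pct_positive(25, 200))     # 12.5
```

Under these assumptions, the time_value of each row should equal epiweek_end_date(epiweek).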

Reading the data directly

For convenience, we provide some code snippets to read in this data programmatically.

In Python

Read in data for a single season

import pandas as pd

base_url = "https://raw.githubusercontent.com/dajmcdon/rvdss-canada/main/data/"
one_season = "season_2024_2025/positive_tests.csv"
positive_tests = pd.read_csv(base_url + one_season)
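The set of viruses covered by a season's file is not listed above, so it can be handy to recover it from the column names. Below is a minimal sketch that relies only on the [virus]_positive_tests / [virus]_tests naming pattern. The inline CSV is made up for illustration: its virus names (flu, rsv), geo_type and geo_value are hypothetical, and virus_names is a hypothetical helper. Passing parse_dates turns time_value and issue into proper dates.

```python
import io

import pandas as pd


def virus_names(df: pd.DataFrame) -> list[str]:
    """Infer virus prefixes from the [virus]_positive_tests / [virus]_tests naming pattern."""
    suffix = "_positive_tests"
    names = [c[: -len(suffix)] for c in df.columns if c.endswith(suffix)]
    # Keep only viruses that also have a matching total-tests column.
    return [v for v in names if f"{v}_tests" in df.columns]


# Hypothetical two-row stand-in for positive_tests.csv (virus and geo values are made up).
csv_text = """epiweek,time_value,issue,geo_type,geo_value,flu_tests,flu_positive_tests,flu_pct_positive,rsv_tests,rsv_positive_tests,rsv_pct_positive
202440,2024-10-05,2024-10-05,nation,ca,1000,20,2.0,900,9,1.0
202441,2024-10-12,2024-10-12,nation,ca,1100,33,3.0,950,19,2.0
"""
df = pd.read_csv(io.StringIO(csv_text), parse_dates=["time_value", "issue"])
print(virus_names(df))  # ['flu', 'rsv']
```

With the real data, you would call virus_names(positive_tests) on the frame read above.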

Read in data for all seasons and combine them into a single dataframe

import pandas as pd

base_url = "https://raw.githubusercontent.com/dajmcdon/rvdss-canada/main/data/"
years = range(2013, 2025)  # seasons 2013-2014 through 2024-2025
season_dirs = [f"season_{y}_{y + 1}" for y in years]
frames = [pd.read_csv(base_url + season + "/positive_tests.csv") for season in season_dirs]
# Index each season by its key columns before stacking them
key_cols = ["epiweek", "time_value", "issue", "geo_type", "geo_value"]
frames = [df.set_index(key_cols) for df in frames]

positive_tests = pd.concat(frames)
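Because the issue column records when an observation was revised, the combined data may contain more than one version of the same observation. If you only need the most recent value, a minimal sketch is to keep the row with the latest issue for each observation. The choice of key columns is an assumption, latest_issue is a hypothetical helper, and the small frame below is made-up example data shaped like the indexed frame built above.

```python
import pandas as pd

# Columns assumed to identify one observation, independent of when it was issued.
KEYS = ["epiweek", "time_value", "geo_type", "geo_value"]


def latest_issue(df: pd.DataFrame) -> pd.DataFrame:
    """Keep only the most recently issued version of each observation."""
    # Accept either the indexed frame built above or a plain frame with an issue column.
    flat = df.reset_index() if "issue" not in df.columns else df
    # A stable sort keeps the original row order among rows with the same issue date.
    flat = flat.sort_values("issue", kind="stable")
    latest = flat.drop_duplicates(subset=KEYS, keep="last")
    return latest.sort_values(KEYS).reset_index(drop=True)


# Made-up example: epiweek 202440 is reported, then revised a week later.
revisions = pd.DataFrame({
    "epiweek": [202440, 202440, 202441],
    "time_value": pd.to_datetime(["2024-10-05", "2024-10-05", "2024-10-12"]),
    "issue": pd.to_datetime(["2024-10-05", "2024-10-12", "2024-10-12"]),
    "geo_type": ["nation"] * 3,
    "geo_value": ["ca"] * 3,
    "flu_positive_tests": [20, 24, 33],
}).set_index(["epiweek", "time_value", "issue", "geo_type", "geo_value"])

latest = latest_issue(revisions)
print(latest["flu_positive_tests"].tolist())  # [24, 33]
```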

In R

Read in data for a single season

library(readr)
base_url <- "https://raw.githubusercontent.com/dajmcdon/rvdss-canada/main/data/"
one_season <- "season_2024_2025/positive_tests.csv"
positive_tests <- read_csv(paste0(base_url, one_season))

Read in data for all seasons and combine them into a single tibble

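library(readr)
library(dplyr)
base_url <- "https://raw.githubusercontent.com/dajmcdon/rvdss-canada/main/data/"
years <- 2013:2024  # seasons 2013-2014 through 2024-2025
all_seasons <- paste0("season_", years, "_", years + 1, "/positive_tests.csv")
# The \(.x) lambda shorthand requires R >= 4.1
all_seasons <- lapply(all_seasons, \(.x) read_csv(paste0(base_url, .x))) # ~ 30MB
positive_tests <- bind_rows(all_seasons)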

You can also convert this data to a more compact format using CMU Delphi Group's {epiprocess} R package.

remotes::install_github("cmu-delphi/epiprocess")
library(epiprocess)
pt <- as_epi_archive(positive_tests, other_keys = "epiweek", compactify = TRUE) # ~ 3.3MB
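The size reduction comes from compactify = TRUE, which drops versions that merely repeat the previous version of the same observation. The following is a rough pandas illustration of that idea. It is not epiprocess's implementation: compactify here is a hypothetical helper, the key columns are an assumption, missing values are treated as equal to each other, and the frame is made-up example data.

```python
import pandas as pd

# Columns assumed to identify one observation across versions.
KEYS = ["epiweek", "time_value", "geo_type", "geo_value"]


def compactify(df: pd.DataFrame, version_col: str = "issue") -> pd.DataFrame:
    """Drop versions whose values all repeat the previous version of the same observation."""
    values = [c for c in df.columns if c not in KEYS + [version_col]]
    df = df.sort_values(KEYS + [version_col]).reset_index(drop=True)
    # Previous version's values for the same observation (NaN for its first version).
    prev = df.groupby(KEYS)[values].shift(1)
    # Equal values, or both missing, count as "unchanged".
    same = (df[values] == prev) | (df[values].isna() & prev.isna())
    # Always keep the first version of each observation, even if it is all-NaN.
    first = ~df.duplicated(subset=KEYS)
    redundant = same.all(axis=1) & ~first
    return df[~redundant].reset_index(drop=True)


# Made-up example: the 2024-10-12 issue repeats the 2024-10-05 value, so it is dropped.
archive = pd.DataFrame({
    "epiweek": [202440, 202440, 202440, 202441],
    "time_value": pd.to_datetime(["2024-10-05"] * 3 + ["2024-10-12"]),
    "issue": pd.to_datetime(["2024-10-05", "2024-10-12", "2024-10-19", "2024-10-12"]),
    "geo_type": ["nation"] * 4,
    "geo_value": ["ca"] * 4,
    "flu_positive_tests": [20, 20, 24, 33],
})
compact = compactify(archive)
print(compact["flu_positive_tests"].tolist())  # [20, 24, 33]
```

No information is lost by this step, because a dropped version can be recovered by carrying forward the most recent earlier version.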
