diff --git a/footer.html b/footer.html new file mode 100644 index 0000000..966617f --- /dev/null +++ b/footer.html @@ -0,0 +1,12 @@ + +
Developed by Sam Abbott, Katharine Sherratt, and Sebastian Funk
+ + + + + + + \ No newline at end of file diff --git a/header.html b/header.html new file mode 100644 index 0000000..22c9605 --- /dev/null +++ b/header.html @@ -0,0 +1 @@ + diff --git a/library.bib b/library.bib new file mode 100644 index 0000000..cfa2251 --- /dev/null +++ b/library.bib @@ -0,0 +1,68 @@ +@Manual{R, + title = {R: A Language and Environment for Statistical Computing}, + author = {{R Core Team}}, + organization = {R Foundation for Statistical Computing}, + address = {Vienna, Austria}, + year = {2019}, + url = {https://www.R-project.org/}, + } + +@Manual{cmdstanr, + title = {cmdstanr: R Interface to 'CmdStan'}, + author = {Jonah Gabry and Rok Češnovar}, + year = {2021}, + note = {https://mc-stan.org/cmdstanr, https://discourse.mc-stan.org}, + } + +@Manual{stan, + title = {Stan Modeling Language Users Guide and Reference Manual, 2.28.1}, + author = {Stan Development Team}, + year = {2021}, + note = {https://mc-stan.org}, + } + +@Manual{scoringutils, + title = {scoringutils: A collection of proper scoring rules and metrics to assess predictions}, + author = {Nikos Bosse}, + year = {2020}, + note = {R package version 0.0.0.9000}, + url = {https://github.com/epiforecasts/scoringutils} +} + +@Article{betancourt_2017, + title={Diagnosing biased inference with divergences}, + author={Betancourt, Michael}, + year={2017}, + volume={4}, + journal={Stan Case Studies}, + url={https://mc-stan.org/users/documentation/case-studies/divergences_and_bias.html} +} + +@Article{forecast.vocs, + title = {forecast.vocs: Forecast case and sequence notifications using variant of concern strain dynamics}, + author = {Sam Abbott}, + journal = {Zenodo}, + year = {2021}, + doi = {10.5281/zenodo.5559016}, + } + + @Misc{loo, + title = {loo: Efficient leave-one-out cross-validation and WAIC for Bayesian models}, + author = {Aki Vehtari and Jonah Gabry and Mans Magnusson and Yuling Yao and Paul-Christian Bürkner and Topi Paananen and Andrew Gelman}, + year = {2020}, + note = {R package version 2.4.1}, + url = {https://mc-stan.org/loo/}, + } + + @Article{loo-paper, + title = {Practical Bayesian model evaluation using leave-one-out cross-validation and WAIC}, + author = {Aki Vehtari and Andrew Gelman and Jonah Gabry}, + year = {2017}, + journal = {Statistics and Computing}, + volume = {27}, + issue = {5}, + pages = {1413--1432}, + doi = {10.1007/s11222-016-9696-4}, + } + +To cite the \ No newline at end of file diff --git a/paper.Rmd b/paper.Rmd new file mode 100644 index 0000000..6b8610c --- /dev/null +++ b/paper.Rmd @@ -0,0 +1,237 @@ +--- +title: "Real-time estimation of the time-varying transmissibility advantage of Omicron in England using S-Gene Target Status as a Proxy" +author: Katharine Sherratt, Sam Abbott, Sebastian Funk +output: bookdown::pdf_document2 +--- + +```{r setup, include = FALSE} +knitr::opts_chunk$set(echo = FALSE, include = TRUE, + warning = FALSE, message = FALSE, + root.dir = here::here()) +options(scipen = 1, digits = 2) +library(here) +library(forecast.vocs) +library(dplyr) +library(tidyr) +library(ggplot2) +library(janitor) +library(knitr) +library(data.table) +``` + + +```{r load-results} +# load packags +library(ggplot2) +library(scales) +library(dplyr) +library(here) +library(readr) + +# load functions +source(here("R", "load-local-data.R")) + +# get latest date +date <- get_latest_date() + +# load data +daily <- load_local_data(date) + +# load results +sgtf <- load_results(date) +bias <- load_results(date, type = "bias") +``` + +## Abstract {-} + +**Background** + +**Methods** + +**Results** + +**Conclusions** + + +## Introduction {-} + +**Why** + +**Detail** + +**Aim and what we did** + +We aimed to assess competing explanations of the transmission advantage of a variant of concern (VoC), Omicron, compared to the existing dominant strain, Delta, in England. We explored the likelihood of increased transmissibility compared to immune escape, using S-gene target failure as a proxy for infection with Omicron. + +We use a model framework where we vary only the relationship between variants while holding all other parameters constant. We compare leave-one-out validation among the models and explore the estimated transmission advantage. + +When processing a positive test result for Covid-19 it is also possible to detect the S-gene target (SGT). A test for SGT can give either a failure or a positive result. The proportion of cases that are tested for SGT and result in SGT-Failure is a useful proxy for the proportion of the Omicron variant in comparison to the Delta variant. + +Only a sample of positive Covid-19 test results are also tested for SGT. The ability to test for SGT depends on type of Covid-19 test (only PCR), and laboratory resources. If these constraints are associated with factors influencing transmission, this would bias estimates of transmission made only from SGT data and confound the relationship between SGT results and all test-positive cases. + +We aim to explore bias in the availability of S-gene target results among cases testing positive for Covid-19. Here we use a branching process model to identify any difference in growth rate between cases which have an SGT result and those which do not. + +## Methods {-} + +### Data + +We used case data from England. This includes all cases tested by PCR test following symptom onset and a positive lateral flow test result. For a varying proportion of these cases, PCR test results included information on S-gene target. We used S-gene target negativity as a proxy for the Omicron variant. We excluded data before `r start_date`, to reduce overfitting to stochastic imports. Data are included up to `r max(obs$date)` and modelled at a `r parameters$timespan` day resolution. All dates are by specimen date. + +We used a two-strain branching process model to compare all test-positive cases to test-positive cases with any SGT result. We used data for all England test-positive cases. We used only the most recent three weeks of data between `r min(obs$date)` and `r max(obs$date)`. + +### Models + +We model strain dynamics using the single strain model as a starting point but with the addition of strain specific AR(P) case model and a beta binomial (or optionally binomial) observation process for sequence data. The variant of concerns growth rate is modelled either as a fixed scaling of the non-variant of concern growth rate, as an independent AR(1) process, or in a vector autoregression framework as a correlated AR(1) process. This last formulation is described in detail below along with the modifications required to define the other two formulations. Parameters related to the variant of concern (VoC) are given the $\delta$ superscript and parameters related to non-VoC cases are given the $o$ superscript. + +Mean reported cases are again defined using a AR(1) process on the log scale for each strain and then combined for overall mean reported cases. + +\begin{align} + \lambda_t &\sim \text{LogNormal}\left(\log C_t , 0.025 \times \log C_t \right) ,\ t \leq P\\ + \lambda^{\delta}_t &\sim \text{LogNormal}\left(\log C^{\delta}_t , 0.025 \times \log C^{\delta}_t \right),\ t \leq P \\ + \lambda^{o}_t &= \lambda_t - \lambda^{\delta}_t,\ t \leq P \\ + \lambda^{\delta}_t &= \text{exp}\left(r^{\delta}_t\right) \sum_{p = 1}^{P} \alpha^{\delta}_p \lambda^{\delta}_{t-p},\ t \gt P \\ + \lambda^{o}_t &= \text{exp}\left(r^{o}_t\right) \sum_{p = 1}^{P} \alpha^{o}_p \lambda^{o}_{t-p}, t \gt P \\ + \lambda_t &= \lambda^{\delta}_t + \lambda^{o}_t +\end{align} + +Where $C^{\delta}_0$ is derived by calculating the mean proportion of cases that had the VoC for the first time point using the overall number of reported cases, the number of sequenced cases, and the number of sequences that were positive for the VoC. The growth rate for both VoC and non-VoC cases ($r^{o, \delta}_t$) is again modelled as differenced AR(1) processes but now combined with a variant specific modifier ($s^{o, \delta}_0$) to growth when variant sequences are first reported ($t_{seq}$), and a correlation structure for the time and strain varying error terms ($\eta^{o, \delta}$). + + +\begin{align} + r^{o}_1 &\sim \text{Normal}\left(0, 0.25 \right),\\ + r^{\delta}_{t_{seq}} &= r^{o}_{t_{seq}} + s^{\delta} \\ + r^{o, \delta}_t &= r^{o, \delta}_{t-1} + \epsilon^{o, \delta}_t \\ + \epsilon^{o, \delta}_0 &= \eta^{o, \delta}_0 \\ + \epsilon^{o, \delta}_t &= \beta \epsilon^{o, \delta}_{t-1} + \eta^{o, \delta}_t +\end{align} + +Where, + +\begin{align} + s^{\beta} &\sim \text{Normal}(0, 0.5) \\ + \eta^{o, \delta}_t &\sim \text{MVN}\left(0, \boldsymbol \Sigma \right) \\ + \beta &\sim \text{Normal}\left(0, 0.5 \right) +\end{align} + +Where $\boldsymbol \Sigma$ is a $2 \times 2$ covariance matrix which we decompose for computational stability into a diagonal matrix containing variant specific scale parameters ($\boldsymbol \Delta$) and a symmetric correlation matrix ($\boldsymbol \Omega$) as follows [@stan], + +\begin{align} + \boldsymbol \Sigma &= \boldsymbol \Delta \boldsymbol \Omega \boldsymbol \Delta \\ + \boldsymbol \Delta &= \left[ {\begin{array}{cc} + \sigma^o & 0 \\ + 0& \sigma^{\delta} \\ + \end{array} } \right] \\ + \boldsymbol \Omega &\sim \left[ {\begin{array}{cc} + 1 & \omega \\ + \omega & 1 \\ + \end{array} } \right] \\ + \sigma^{o, \delta} &\sim \text{Normal}\left(0, 0.2 \right) +\end{align} + +Where $\boldsymbol \Omega$ has a Lewandowski-Kurowicka-Joe (LKJ) prior where +$\omega$ controls the prior expectation for correlation between variants (with $\omega = 1$ resulting in a uniform prior over all correlations, $\omega \lt 1$ placing more weight on larger correlations and $\omega \gt 1 placing more weight on small amounts of correlations). By default we set $\omega = 0.5$ giving more weight to correlations between variants dynamics. + +On top of this correlated strain dynamic model `forecast.vocs` also offers two other options that represent extremes in the correlated model. The first of these assumes that both strains evolve with independent differenced AR(1) growth rates with only a scaling factor (s^{\delta}) linking them. The second assumes that strains growth rates are completely correlated in time (i.e they are governed by a single AR(1) differenced process) and only differ based on a scaling factor ($s^{\delta}$). See the documentation for `variant_relationship` in `?forecast()` for details. + +We then assume a negative binomial observation model with overdispersion $\phi_c$ for reported cases ($C_t$) as in the single strain model, + +\begin{align} + C_{t} &\sim \text{NegBinomial}\left(\lambda_t, \phi_c\right) \\ + \frac{1}{\sqrt{\phi_c}} &\sim \text{Normal}(0, 0.5) +\end{align} + +Where $\sigma$, and $\frac{1}{\sqrt{phi_c}}$ are truncated to be greater than 0 and $\beta$ is truncated to be between -1 and 1. Again a Poisson observation model may instead be used (see the documentation for `overdispersion` in `?forecast()`). + +Finally, the mean proportion of samples that have the VoC ($p_t$) is then estimated using the mean reported cases with the VoC and the overall mean reported cases. + +\begin{equation} + p_t = \frac{\lambda^{\delta}_t}{\lambda_t} +\end{equation} + +We assume a beta binomial observation model for the number of sequences ($N_t$) that are positive ($P_t$) for the VoC with overdispersion $\phi_s$. + +\begin{align} + P_{t} &\sim \mathrm{BetaBinomial}\left(N_t, p_t \phi_s, (1 - p_t) \phi_s\right) \\ + \frac{1}{\sqrt{\phi_s}} &\sim \text{Normal}(0, 0.5) +\end{align} + +Where $\sigma^{o, \delta}$, and $\frac{1}{\sqrt{\phi_s}}$ are truncated to be greater than 0. A binomial observation model is also available (see the documentation for `overdispersion` in `?forecast()`). + + +We used the `forecast.vocs` package^[2021, Sam Abbott, forecast.vocs: Forecast case and sequence notifications using variant of concern strain dynamics, DOI:10.5281/zenodo.5559016] to model a two-strain branching process, including strain specific auto-regressive AR(1) variation, and a beta binomial observation process for the S-gene target status data. We used a weakly informative prior for a transmission advantage for the VoC vs non-VoC cases of mean `r parameters$voc_scale[1]` (standard deviation `r parameters$voc_scale[2]`), based on early work from South Africa^[2021-12-03, Carl Pearson and others, "Omicron spread in South Africa", Epidemics8]. + +We defined the relationship between variants as either fixed or independent. The fixed relationship meant the VoC differed only by its transmissibility to the non-VOC strain, with any variation over time shared between strains. The independent relationship allowed the VoC to differ with dependence determined from the data. See Appendix for full model details. + +### Transmission summary statistics + +As well as posterior predictions and forecasts for both notifications by variant and variant of concern proportion the models also return several summary statistics which may be useful for drawing inferences about underlying transmission dynamics. These include the log scale growth rate ($g^{o, \delta}_t$), the instantaneous effective reproduction number ($R^{o, \delta}_t$), and the transmission advantage of the variant of concern ($A_t$). These are calculated as follows: + +\begin{align} + g^{o, \delta}_t &= T_s r^{o, \delta}_t \\ + R^{o, \delta}_t &= \text{exp}\left(T_s r^{o, \delta}_t\right) \\ + A_t &= \text{exp}\left(T_s \left(r^{\delta}_t - r^{o}_t \right)\right)\\ +\end{align} + +$T_s$ is a user set scaling parameter that defines the timespan over which the summary metrics apply dependent on the time step of the data. It can be set using the `scale_r` and defaults to 1 which returns summary statistics scaled to the timestep of the data. Depending on the setup of the model used these summary measures will be more or less related to their epidemiological definitions. In particular, adding a weighting to past expected cases that is more complex than a simple lag may cause interpretation issues. + +### Statistical analysis + +We tested the difference between the two models by comparing out of sample predictive fit using leave-one-out cross-validation with Pareto smoothed importance sampling, and model scoring. + +### Implementation + +The models are implemented in `stan` using `cmdstanr` with no defaults altered[@stan; @cmdstanr]. Due to the complexity of the posterior it is likely that increasing the `adapt_delta` may be required to mitigate potential bias in posterior estimates [@betancourt_2017]. `forecast.vocs` incorporates additional functionality written in R[@R] to enable plotting forecasts and posterior predictions, summarising forecasts, and scoring them using `scoringutils`[@scoringutils]. All functionality is modular allowing users to extend and alter the underlying model whilst continuing to use the package framework. + + +### Reproducibility + +## Results {-} + +### Data description + +* Graph of cases by specimen date by region over time. +* Discuss trend (centred moving average to plot) +* Discussion sampling issues + potential sources of bias. + + +On average, `r round(mean(obs$share_voc, na.rm=T)*100)`% cases had an SGT result. We found a median advantage for SGT-result cases of `r exp(advantage$median)` (95% credible interval `r exp(advantage$q5)` - `r exp(advantage$q95)`), scaled against cases without an SGT result. The growth rate increased over time at a similar rate between SGT-result and no-SGT-result cases (figure \ref{fig:plot-growth}). + + +### Model comparison + +```{r compare-score} +scores <- janitor::clean_names( + select(sgtf$loo, -forecast_date), case = "sentence" +) +kable(scores) +``` + +### Growth rate of S-gene positive and negative cases + +### Proportion of cases with SGTF + +### Transmission advantage of the cases with SGTF + +### Posterior predictions of cases with an without SGTF as well as overall + +### Evidence of sampling bias in S-gene target results among Covid-19 test-positive cases + +## Discussion {-} + +**Summary** + +**Strengths and weaknesses** + +**Literature** + +**Futher work** + +**Conclusions** + +**Acknowledgements** + + +## References {-} + + + diff --git a/summary.Rmd b/summary.Rmd new file mode 100644 index 0000000..0e30f6c --- /dev/null +++ b/summary.Rmd @@ -0,0 +1,274 @@ +--- +title: "Real-time estimation of the time-varying transmissibility advantage of Omicron in England using S-Gene Target Status as a Proxy" +subtitle: "Summary report" +author: Sam Abbott (1), Katharine Sherratt (1), Sebastian Funk (1) +bibliography: library.bib +csl: https://raw.githubusercontent.com/citation-style-language/styles/master/apa-numeric-superscript-brackets.csl +date: "`r format(Sys.Date(), format = '%B %d, %Y')`" +output: + html_document: + theme: cosmo + toc: true + toc_float: true + toc_depth: 4 + includes: + before_body: header.html + after_body: footer.html +--- + +```{r setup, include=FALSE} +knitr::opts_chunk$set(echo = FALSE, include = TRUE, + warning = FALSE, message = FALSE, + root.dir = here::here(), + fig.width = 9, fig.height = 9) +options(digits = 1) +library(here) +library(forecast.vocs) +library(dplyr) +library(loo) +library(scoringutils) +library(knitr) +library(data.table) +``` + +```{r load-results} +# load packages +library(ggplot2) +library(scales) +library(dplyr) +library(here) +library(readr) +library(data.table) + +# load functions +source(here("R", "load-local-data.R")) +source(here("R", "plot-daily-cases.R")) +source(here("R", "plot-omicron-95.R")) +source(here("R", "plot-cumulative-percent.R")) + +# get latest date +date_latest <- get_latest_date() + +# load data +daily <- load_local_data(date_latest) +pop <- load_population() + +# load results +sgtf <- load_results(date_latest) +bias <- load_results(date_latest, type = "bias") + +sgtfposterior <- sgtf$posterior[, + variant_relationship := stringr::str_to_sentence(variant_relationship) +] +sgtf_posterior <- sgtf$posterior[ + variant_relationship == "Correlated" +] +``` + +```{r summarise-results} +# get cases from correlated model +cases <- summary(sgtf_posterior, type = "cases") + +# get VOC fraction from correlated model +voc_frac <- summary(sgtf_posterior, type = "voc_frac") + +# cumulative percentage with each variant +cum_per <- cumulative_percentage(cases, pop) %>% + filter(type %in% "Omicron") +``` + +```{r setup-dates} +# set up dates +date_data_start <- min(as.data.table(sgtf$data)[!is.na(seq_total)]$date) +date_forecast_start <- max(as.data.table(sgtf$data)[!is.na(seq_total)]$date) +date_forecast_end <- max(cases$date) +``` + +1. Center for the Mathematical Modelling of Infectious Diseases, London School of Hygiene & Tropical Medicine, London WC1E 7HT, United Kingdom + +## Introduction + +Since being highlighted by scientists in South Africa Omicron has spread rapidly globally. Using initial South African data, it has been estimated that Omicron may be both more transmissible and have greater immune escape than the previously dominant Delta variant. + +In this work, we use S-gene target failure (SGTF) as a proxy of variant status combined with reported case counts to explore the evidence for changes in transmission advantage over time for the Omicron variant. If present this could indicate the impact of immune escape, sampling bias in SGTF data or differences in the populations within which the variants are circulating. We also report estimates for growth rates by variant and overall, case counts overall and by variant for a 14 day forecast window assuming constant future growth, the date at which Omicron will become dominant in England and in each UKHSA region, and the estimated cumulative percentage of the population with a reported Omicron case. + +## Methods + +### Data + +We use S-gene status by specimen date sourced from UKHSA as a direct proxy for the Omicron variant with a target failure indicating a case has the Omicron variant. We augment this data with reported cases counts by date of specimen truncated by two days to account for delayed reporting. Data was available for England and for UKHSA region from both sources. + +### Models + +We consider two autoregressive models with different assumptions about the relationship between growth rates for Omicron and non-Omicron cases. Both models estimate expected cases for each variant as a combination of expected cases from the previous day and the exponential of the log growth rate. The growth rate is then itself modelled as a differenced AR(1) process. + +For the first model variants are assumed to have growth rates related by a fixed scaling (described from now on as the scaled model). In this model variant growth rates then share a single differenced AR(1) process meaning that they vary over time in the same way. This model assumes that variants differ only due to a transmissibility advantage and that there are no time-varying biases in the reported data. + +In the second model, we relax the assumption that variants co-vary using a vector autoregression structure which assume that the 1st differences of the variant growth rates are drawn from a multivariate normal distribution. This model formulation can account for variant differences other than a transmissibility advantage and can also better handle time-varying biases in data sources. + +For both models, we fit jointly to reported cases and SGTF data assuming a negative binomial and beta-binomial observation model respectively. We also initialise both models by fitting to a week of case only data where the Omicron variant is assumed to not be present (from the 17th of November to the 22nd). + +A full description of the models described here can be found in the documentation for the [`forecast.vocs` R package](https://epiforecasts.io/forecast.vocs/) [@forecast.vocs]. + +### Statistical Inference + +We first visualised our combined data sources (cases by specimen date and SGTF status by specimen date). We then fit both models separately to data for England and to each UKHSA region. Using these model fits we report posterior estimates from the best fitting model for the following summary statistics of epidemiological interest: growth rates by variant and overall, the time-varying transmission advantage for the Omicron variant, case counts overall and by variant for a 14 day forecast window assuming constant future growth, the date at which Omicron will become dominant in England and in each UKHSA region, and the estimated cumulative percentage of the population with a reported Omicron case. + +We explored the potential for bias in S-gene target sampling by comparing cases that have a reported SGTF status compared to cases with an unknown SGTF status. We fit the correlated model to case counts and S-gene tested status and reported the apparent transmission advantage for those with a S-gene status over time. Any variation in this metric from 100% may be interpreted as indicating biased sampling of those with known S-gene status, which may bias our main results. + +### Implementation + +All models were implemented using the [`forecast.vocs` R package](https://epiforecasts.io/forecast.vocs/) [@R; @forecast.vocs] and fit using `stan` [@stan] and `cmdstanr` [@cmdstanr]. Each model was fit using 2 chains with each chain having a 1000 warmup steps and 2000 sampling steps. Convergence was assessed using the Rhat diagnostic [@stan]. Models were compared using approximate leave-one-out (LOO) cross-validation [@loo; @loo-paper] where negative values indicate an improved fit for the correlated model. + + +## Limitations + + +* SGTF may be an imperfect proxy for Omicron. We make no adjustment for the background rate of SGTF observed for Delta or other factors that might lead to SGTF, or a confirmed detected S-gene observed with Omicron. + +* We make a crude adjustment for the right censoring of reported case counts by specimen date by excluding the last two days of data. However, this may be insufficient to correct for the bias and so our real-time growth rates estimates may be biased downwards. + +* Our analysis is based on reported case counts augmented with SGTF data. Growth rates estimated from reported case counts may be biased during periods when changes in testing practice occur or when available testing is at capacity. + +* We reported the log growth rate approximation to the growth rate. This approximation becomes less valid for high absolute growth rates. + +* Our models do not attempt to capture the underlying transmission process or delays in reporting. In particular, we do not consider the generation time of variants which may differ. Differences in the generation time would result in apparent differences in the growth rate (on the scale of the generation time) even in scenarios where growth rates were fixed. + +* Our forecasts assume a fixed growth rate across the forecast horizon and do not account for population susceptibility or other factors such as changes in contact patterns. + +* Our estimates are at the national or UKHSA region scale. There may be additional variation within these areas that these estimates cannot capture. + +* We used an approximate form of leave-one-out cross validation to compare the performance of the models we considered. This may be inappropriate in situations where performance on held out time-series data is the target of interest. + +## Results + +### Summary + +* A model that assumed that variants growth rates varied over time relative to each other performed better in all regions than a model where the relative difference was constant over time. + +* Prior to the introduction of Omicron the growth rate of non-Omicron cases was positive in most regions. In the last few weeks the growth rate for non-Omicron cases has become negative indicating that if nothing changes Omicron will be the only circulating variant at some point in the future. + +* The growth rate for Omicron cases has been consistently positive in all regions since the 23rd of November though with slightly different absolute values and with different variation over time. + +* In the last week both case data and the SGTF data suggest that the growth rate of Omicron in reported cases has reduced markedly though at this time still appears positive in all regions excluding London. + +* Assuming a fixed difference in transmissibility between variants indicates an advantage for Omicron in the order of 30% to 50% across all regions. Allowing this to vary (which was shown to better represent the data) implies that this advantage generally appears to have increased over time until the last week where it fell back to initially observed levels or below in all regions. + +* The variation in transmission advantage also differed by region with some regions having a fairly stable trend (such as the South West) and some (such as the North East) indicating large changes. + +* Assuming growth rates stay as currently estimated we forecast that approximately 7.5% to 10% of the English population will have reported an Omicron case by the end of 2021, although with significant regional variation. + +* We found some evidence of bias in sampling S-gene status though this varied across regions and over time. There was particularly strong evidence of biased sampling over time in the East Midlands. In other regions, the evidence of bias was substantially weaker and therefore less lilely to bias our main findings. + +### Data description + +```{r data, fig.cap = "Daily cases in England and by UKHSA region, with S-gene target result (failed, confirmed detected, or unknown), and centred 7-day moving average up to date of data truncation (dotted line). Source: UKHSA and coronavirus.gov.uk; data by specimen date"} +plot_daily_cases( + daily, truncate_date = date_forecast_start, caption = "", + start_date = date_data_start, smooth_total = TRUE +) +``` + +### Model comparison + +```{r compare-score} +scores <- janitor::clean_names( + select(sgtf$loo, -forecast_date), case = "sentence" +) +kable(scores, caption = "Estimated differences in the ELPD metric (with + standard errors) between the scaled and correlated models for England and UKHSA + regions. Negative values indicate that the correlated model is estimated to be + a better fit to the data. ") +``` + +### Growth rate of reported cases overall, with the Omicron variant, and not with the Omicron variant + +```{r growth, fig.cap = "Estimates of the time-varying daily growth rate for England and by UKHSA region (using the log scale approximation) for reported cases overall, cases with the Omicron variant, and non-Omicron cases. Growth rates are assumed to be piecewise constant with potential changes every 3 days. Growth rates are assumed to be constant beyond the forecast horizon (vertical dashed line). 90% and 60% credible intervals are shown."} +plot_growth(sgtf_posterior) + + facet_wrap(~ region) +``` + +### Proportion of cases with the Omicron variant on the logit scale + +#### Natural scale + +```{r voc-frac-natural, fig.cap = "Estimates of the proportion of reported cases with the Omicron variant for England and by UKHSA region using both the scaled model and the correlated model on the natural scale. The forecast horizon is indicated using a dashed line. 90% and 60% credible intervals are shown. Observation error is not shown meaning that the posterior predictions may look overly precise."} +plot( + sgtf$posterior, as.data.table(sgtf$data), type = "voc_frac", + fill = variant_relationship, voc_label = "Omicron variant", log = FALSE +) + + labs(fill = "Variant relationship") + + facet_wrap(~ region) +``` + +#### Logit scale + +```{r voc-frac-logit, fig.cap = "Estimates of the proportion of reported cases with the Omicron variant for England and by UKHSA region using both the scaled model and the correlated model on the logit scale. Note that scaled model is effectively assuming a linear relationship on this scale. The forecast horizon is indicated using a dashed line. 90% and 60% credible intervals are shown. Observation error is not shown meaning that the posterior predictions may look overly precise."} +plot( + sgtf$posterior, as.data.table(sgtf$data), type = "voc_frac", + fill = variant_relationship, voc_label = "Omicron variant" +) + + labs(fill = "Variant relationship") + + facet_wrap(~ region) +``` + +### Date at which Omicron estimated to account for 95% of reported cases +```{r omicron-95, fig.cap = "Estimated dates at which 95% of reported cases will have the Omicron variant for England and UKHSA regions. Points represent the median date and ranges represent the 90% credible interval.", fig.height = 6, fig.width = 6} +plot_omicron_95(voc_frac = voc_frac, + forecast_start = date_forecast_start, + forecast_end = date_forecast_end) +``` + + +### Time-varying transmission advantage of the Omicron variant + +```{r voc-advantage, fig.cap = "Estimates of the time-varying daily transmission advantage for the Omicron cases vs non-Omicron cases for England and by UKHSA region. Estimates are shown for both the scaled model and the correlated model. Variation in the advantage over time could be an indicator of biases in the SGTF data, differences in the populations in which the variants are circulating or as a result of immune escape. Transmission advantage is assumed to be constant beyond the forecasting horizon (dashed line). 90% and 60% credible intervals are shown."} +plot( + sgtf$posterior, type = "voc_advantage", + fill = variant_relationship, group = variant_relationship, + voc_label = "Omicron variant" +) + + labs(fill = "Variant relationship", + y = "Transmission advantage for the Omicron variant" + ) + + facet_wrap(~ region) +``` + +### Posterior predictions and forecasts of reported cases + +#### Natural + +```{r natural-cases, fig.cap = "Estimates of the number of reported cases by variant and overall for England and each UKHSA region. Estimates are shown both within the window supported by the data and for 14 days afterwards as a forecast where it is assumed that growth rates for each variant are stable. The forecast horizon is indicated using a dashed line. 90% and 60% credible intervals are shown.", fig.width = 12} +plot( + sgtf_posterior, as.data.table(sgtf$data), type = "cases", log = FALSE +) + + facet_wrap(~region, scales = "free_y") +``` + +#### Log + +```{r log-cases, fig.cap = "Estimates of the number of reported cases by variant and overall for England and each UKHSA region on the log 2 scale. Estimates are shown both within the window supported by the data and for 14 days afterwards as a forecast where it is assumed that growth rates for each variant are stable. The forecast horizon is indicated using a dashed line. 90% and 60% credible intervals are shown.", fig.width = 12} +plot( + sgtf_posterior, as.data.table(sgtf$data), type = "cases" +) + + facet_wrap(~region, scales = "free_y") +``` + +### Cumulative percentage of the population with a reported Omicron case + +```{r cumulative-population, fig.cap = "Estimates of the cumulative percentage of the population with reported Omicron cases in England and by UKHSA. The forecast horizon is indicated using a dashed line. 90% and 60% credible intervals are shown."} +plot_cumulative_percent( + cum_per, forecast_start = date_forecast_start, data_start = date_data_start +) +``` + +### Evidence of sampling bias in S-gene target results among Covid-19 test-positive cases + +```{r, bias-advantage, eval = FALSE, fig.cap = "Estimates of the time-varying daily transmission advantage for cases with a reported S-gene status vs those without for England and by NHS region. Deviation from 100% indicates that growth rates differ in the two populations which is itself an indication of sampling bias. If present this sampling bias could impact our main results. Transmission advantage is assumed to be constant beyond the forecasting horizon (dashed line). 90% and 60% credible intervals are shown. Note that here a 7-day piecewise constant growth rate has been used."} +plot(bias$posterior, type = "voc_advantage") + + labs(y = "Transmission advantage of cases with a reported S gene + status versus those without") + + facet_wrap(~ region) +``` + +## References diff --git a/summary.html b/summary.html new file mode 100644 index 0000000..9f8bcfc --- /dev/null +++ b/summary.html @@ -0,0 +1,604 @@ + + + + + + + + + + + + + + + +Since being highlighted by scientists in South Africa Omicron has spread rapidly globally. Using initial South African data, it has been estimated that Omicron may be both more transmissible and have greater immune escape than the previously dominant Delta variant.
+In this work, we use S-gene target failure (SGTF) as a proxy of variant status combined with reported case counts to explore the evidence for changes in transmission advantage over time for the Omicron variant. If present this could indicate the impact of immune escape, sampling bias in SGTF data or differences in the populations within which the variants are circulating. We also report estimates for growth rates by variant and overall, case counts overall and by variant for a 14 day forecast window assuming constant future growth, the date at which Omicron will become dominant in England and in each UKHSA region, and the estimated cumulative percentage of the population with a reported Omicron case.
+We use S-gene status by specimen date sourced from UKHSA as a direct proxy for the Omicron variant with a target failure indicating a case has the Omicron variant. We augment this data with reported cases counts by date of specimen truncated by two days to account for delayed reporting. Data was available for England and for UKHSA region from both sources.
+We consider two autoregressive models with different assumptions about the relationship between growth rates for Omicron and non-Omicron cases. Both models estimate expected cases for each variant as a combination of expected cases from the previous day and the exponential of the log growth rate. The growth rate is then itself modelled as a differenced AR(1) process.
+For the first model variants are assumed to have growth rates related by a fixed scaling (described from now on as the scaled model). In this model variant growth rates then share a single differenced AR(1) process meaning that they vary over time in the same way. This model assumes that variants differ only due to a transmissibility advantage and that there are no time-varying biases in the reported data.
+In the second model, we relax the assumption that variants co-vary using a vector autoregression structure which assume that the 1st differences of the variant growth rates are drawn from a multivariate normal distribution. This model formulation can account for variant differences other than a transmissibility advantage and can also better handle time-varying biases in data sources.
+For both models, we fit jointly to reported cases and SGTF data assuming a negative binomial and beta-binomial observation model respectively. We also initialise both models by fitting to a week of case only data where the Omicron variant is assumed to not be present (from the 17th of November to the 22nd).
+A full description of the models described here can be found in the documentation for the forecast.vocs
R package[1].
We first visualised our combined data sources (cases by specimen date and SGTF status by specimen date). We then fit both models separately to data for England and to each UKHSA region. Using these model fits we report posterior estimates from the best fitting model for the following summary statistics of epidemiological interest: growth rates by variant and overall, the time-varying transmission advantage for the Omicron variant, case counts overall and by variant for a 14 day forecast window assuming constant future growth, the date at which Omicron will become dominant in England and in each UKHSA region, and the estimated cumulative percentage of the population with a reported Omicron case.
+We explored the potential for bias in S-gene target sampling by comparing cases that have a reported SGTF status compared to cases with an unknown SGTF status. We fit the correlated model to case counts and S-gene tested status and reported the apparent transmission advantage for those with a S-gene status over time. Any variation in this metric from 100% may be interpreted as indicating biased sampling of those with known S-gene status, which may bias our main results.
+All models were implemented using the forecast.vocs
R package[1,2] and fit using stan
[3] and cmdstanr
[4]. Each model was fit using 2 chains with each chain having a 1000 warmup steps and 2000 sampling steps. Convergence was assessed using the Rhat diagnostic[3]. Models were compared using approximate leave-one-out (LOO) cross-validation[5,6] where negative values indicate an improved fit for the correlated model.
SGTF may be an imperfect proxy for Omicron. We make no adjustment for the background rate of SGTF observed for Delta or other factors that might lead to SGTF, or a confirmed detected S-gene observed with Omicron.
We make a crude adjustment for the right censoring of reported case counts by specimen date by excluding the last two days of data. However, this may be insufficient to correct for the bias and so our real-time growth rates estimates may be biased downwards.
Our analysis is based on reported case counts augmented with SGTF data. Growth rates estimated from reported case counts may be biased during periods when changes in testing practice occur or when available testing is at capacity.
We reported the log growth rate approximation to the growth rate. This approximation becomes less valid for high absolute growth rates.
Our models do not attempt to capture the underlying transmission process or delays in reporting. In particular, we do not consider the generation time of variants which may differ. Differences in the generation time would result in apparent differences in the growth rate (on the scale of the generation time) even in scenarios where growth rates were fixed.
Our forecasts assume a fixed growth rate across the forecast horizon and do not account for population susceptibility or other factors such as changes in contact patterns.
Our estimates are at the national or UKHSA region scale. There may be additional variation within these areas that these estimates cannot capture.
We used an approximate form of leave-one-out cross validation to compare the performance of the models we considered. This may be inappropriate in situations where performance on held out time-series data is the target of interest.
A model that assumed that variants growth rates varied over time relative to each other performed better in all regions than a model where the relative difference was constant over time.
Prior to the introduction of Omicron the growth rate of non-Omicron cases was positive in most regions. In the last few weeks the growth rate for non-Omicron cases has become negative indicating that if nothing changes Omicron will be the only circulating variant at some point in the future.
The growth rate for Omicron cases has been consistently positive in all regions since the 23rd of November though with slightly different absolute values and with different variation over time.
In the last week both case data and the SGTF data suggest that the growth rate of Omicron in reported cases has reduced markedly though at this time still appears positive in all regions excluding London.
Assuming a fixed difference in transmissibility between variants indicates an advantage for Omicron in the order of 30% to 50% across all regions. Allowing this to vary (which was shown to better represent the data) implies that this advantage generally appears to have increased over time until the last week where it fell back to initially observed levels or below in all regions.
The variation in transmission advantage also differed by region with some regions having a fairly stable trend (such as the South West) and some (such as the North East) indicating large changes.
Assuming growth rates stay as currently estimated we forecast that approximately 7.5% to 10% of the English population will have reported an Omicron case by the end of 2021, although with significant regional variation.
We found some evidence of bias in sampling S-gene status though this varied across regions and over time. There was particularly strong evidence of biased sampling over time in the East Midlands. In other regions, the evidence of bias was substantially weaker and therefore less lilely to bias our main findings.
+Daily cases in England and by UKHSA region, with S-gene target result (failed, confirmed detected, or unknown), and centred 7-day moving average up to date of data truncation (dotted line). Source: UKHSA and coronavirus.gov.uk; data by specimen date +
+Region | +Elpd diff | +Se diff | +
---|---|---|
England | +-19 | +8 | +
East Midlands | +-16 | +6 | +
East of England | +-14 | +4 | +
London | +-10 | +4 | +
North East | +-14 | +3 | +
North West | +-14 | +4 | +
South East | +-11 | +3 | +
South West | +-7 | +3 | +
West Midlands | +-32 | +6 | +
Yorkshire and Humber | +-32 | +6 | +
+Estimates of the time-varying daily growth rate for England and by UKHSA region (using the log scale approximation) for reported cases overall, cases with the Omicron variant, and non-Omicron cases. Growth rates are assumed to be piecewise constant with potential changes every 3 days. Growth rates are assumed to be constant beyond the forecast horizon (vertical dashed line). 90% and 60% credible intervals are shown. +
++Estimates of the proportion of reported cases with the Omicron variant for England and by UKHSA region using both the scaled model and the correlated model on the natural scale. The forecast horizon is indicated using a dashed line. 90% and 60% credible intervals are shown. Observation error is not shown meaning that the posterior predictions may look overly precise. +
++Estimates of the proportion of reported cases with the Omicron variant for England and by UKHSA region using both the scaled model and the correlated model on the logit scale. Note that scaled model is effectively assuming a linear relationship on this scale. The forecast horizon is indicated using a dashed line. 90% and 60% credible intervals are shown. Observation error is not shown meaning that the posterior predictions may look overly precise. +
++Estimated dates at which 95% of reported cases will have the Omicron variant for England and UKHSA regions. Points represent the median date and ranges represent the 90% credible interval. +
++Estimates of the time-varying daily transmission advantage for the Omicron cases vs non-Omicron cases for England and by UKHSA region. Estimates are shown for both the scaled model and the correlated model. Variation in the advantage over time could be an indicator of biases in the SGTF data, differences in the populations in which the variants are circulating or as a result of immune escape. Transmission advantage is assumed to be constant beyond the forecasting horizon (dashed line). 90% and 60% credible intervals are shown. +
++Estimates of the number of reported cases by variant and overall for England and each UKHSA region. Estimates are shown both within the window supported by the data and for 14 days afterwards as a forecast where it is assumed that growth rates for each variant are stable. The forecast horizon is indicated using a dashed line. 90% and 60% credible intervals are shown. +
++Estimates of the number of reported cases by variant and overall for England and each UKHSA region on the log 2 scale. Estimates are shown both within the window supported by the data and for 14 days afterwards as a forecast where it is assumed that growth rates for each variant are stable. The forecast horizon is indicated using a dashed line. 90% and 60% credible intervals are shown. +
++Estimates of the cumulative percentage of the population with reported Omicron cases in England and by UKHSA. The forecast horizon is indicated using a dashed line. 90% and 60% credible intervals are shown. +
+Developed by Sam Abbott, Katharine Sherratt, and Sebastian Funk
+ + + + + + + + + +