--- title: "Forecasting age-stratified norovirus gastroenteritis counts using `surveillance::hhh4`" author: "Sebastian Meyer" date: "`r Sys.Date()`" output: rmarkdown::html_vignette: fig_width: 6 fig_height: 4 toc: TRUE vignette: > %\VignetteIndexEntry{Forecasting age-stratified norovirus gastroenteritis counts using `surveillance::hhh4`} %\VignetteEngine{knitr::rmarkdown} %\VignetteDepends{ggplot2, surveillance, hhh4contacts, lattice, fanplot} --- ```{r setup_knitr, include = FALSE} knitr::opts_chunk$set(message = FALSE, warning = FALSE, error = FALSE, fig.align = "center", dev.args = list(pointsize = 10)) ``` ```{r setup} options(digits = 4) # for more compact numerical outputs library("HIDDA.forecasting") library("ggplot2") ``` In this vignette, we use modelling and forecasting methods provided by: ```{r} library("surveillance") ``` The corresponding software reference is: ```{r, echo = FALSE, results = "asis"} cat("
") print(citation(package = "surveillance", auto = TRUE), style = "html") cat("\n") ``` ## Data We use age-stratified norovirus surveillance data from Berlin, Germany, together with an age-structured social contact matrix from the [POLYMOD survey](http://cordis.europa.eu/project/rcn/79211_en.html), as provided in the R package [**hhh4contacts**](https://CRAN.R-project.org/package=hhh4contacts): ```{r} library("hhh4contacts") ``` These data have been originally analyzed in: > Meyer S and Held L (2017): "Incorporating social contact data in > spatio-temporal models for infectious disease spread". > *Biostatistics*, **18**(2), pp. 338--351. > DOI: [10.1093/biostatistics/kxw051](https://doi.org/10.1093/biostatistics/kxw051) Here we only consider models for spatially aggregated counts, i.e., without additional stratification by city district. More specifically, we will analyse age-stratified weekly counts $Y_{gt}$ in a similar way as for the reference model 6 in: > Held L, Meyer S and Bracher J (2017): "Probabilistic forecasting in > infectious disease epidemiology: the 13th Armitage lecture". > *Statistics in Medicine*, **36**(22), pp. 3443--3460. > DOI: [10.1002/sim.7363](https://doi.org/10.1002/sim.7363) ### Berlin norovirus counts ```{r BNV} BNV <- noroBE(by = "agegroup", agegroups = c(1, 2, 2, 4, 4, 2), timeRange = c("2011-w27", "2016-w26")) BNV (NGROUPS <- ncol(BNV)) (GROUPS <- colnames(BNV)) ``` We can plot the observed age-stratified counts using the default plot method for "sts" objects: ```{r BNV_stsplot} plot(BNV) ``` We will use the first four seasons, from week 2011/27 to week 2015/26, as training data, and assess forecasts during the following year (2015/27 to 2016/26). ```{r} TRAIN <- 2:(4*52) TEST <- max(TRAIN) + 1:52 ``` There is also an `autoplot` variant based on **ggplot2**, which we use to plot the age-specific incidence based on the population fractions contained in the `BNV` object, ```{r} (popfracsBE <- population(BNV)[1,]) ``` and Berlin's total population, ```{r} (popBE <- sum(pop2011)) # also from the "hhh4contacts" package, see ?pop2011 ``` as follows: ```{r BNV_ggplot, fig.width = 7, fig.height = 5} autoplot(BNV, population = 100000/popBE) + ## population: divides observed(BNV) by population(BNV)/(100000/popBE) ## Mod 1: highlight the Christmas period in each year geom_col(aes(fill = epochInYear %in% c(52, 1)), width = 7, show.legend = FALSE) + scale_fill_manual(values = c("black", "red")) + ## Mod 2: separate training and test periods by a vertical line geom_vline(aes(xintercept = as.numeric(date)[4*52] + .5), linetype = 2) + ylab("Incidence [per 100 000 inhabitants]") + scale_y_sqrt() ``` ### Age-structured contact matrix The `neighbourhood` slot of the `BNV` object contains a social contact matrix derived from the German subset of the [POLYMOD survey](http://cordis.europa.eu/project/rcn/79211_en.html), aggregated to match the age groups of the surveillance data: ```{r} neighbourhood(BNV) ``` Each entry gives the average number of contact persons of a certain age group a participant (of a certain age group) reports on a randomly assigned day, see Mossong et al. (2008, *PLoS Medicine*, DOI: [10.1371/journal.pmed.0050074](https://doi.org/10.1371/journal.pmed.0050074)). The "agedistri" attribute gives the age distribution of the participants. We will employ an improved estimate of this contact matrix, which ensures reciprocity on the population level, i.e., the overall number of contacts of age group *i* with age group *j* should be the same as vice versa, see Wallinga et al (2006, *American Journal of Epidemiology*, DOI: [10.1093/aje/kwj317](https://doi.org/10.1093/aje/kwj317)): ```{r} (C_reci <- contactmatrix(which = "reciprocal", grouping = c(1, 2, 2, 4, 4, 2))) ``` We can check reciprocity with respect to Berlin's age distribution (also given in the "agedistri" attribute): ```{r} is_reciprocal <- function (C, population, tol = 0.001) { Cpop <- C * population all.equal(Cpop, t(Cpop), tolerance = tol, check.attributes = FALSE) } stopifnot(is_reciprocal(C_reci, popfracsBE)) ``` The **hhh4contacts** package provides a simple plotting function for such contact matrices: ```{r C_reci, fig.width = 4, fig.height = 3.5} plotC(C_reci, scales = list(cex = 0.8, x = list(rot = 45))) ``` NB: A general implementation to extract social contact matrices from surveys, including from POLYMOD, is available via the dedicated R package [**socialmixr**](https://CRAN.R-project.org/package=socialmixr), and I recommend to use that package in future projects. ```{r, include = FALSE, eval = FALSE} C <- contactmatrix(grouping = c(1, 2, 2, 4, 4, 2)) ## reproduce that matrix using socialmixr's POLYMOD data and contact_matrix() library("socialmixr") data("polymod") cite(polymod) C2 <- contact_matrix(polymod, countries = "Germany", age.limits = c(0,5,15,25,45,65), missing.participant.age = "remove", missing.contact.age = "keep") round(C2$matrix, 2) round(C, 2) stopifnot(all.equal(C2$matrix[,-ncol(C2$matrix)], C, check.attributes = FALSE)) ## reciprocity is obtained differently in socialmixr polymod2 <- polymod ## remove contacts with unknown age for same data approach polymod2$contacts <- subset(polymod2$contacts, !is.na(cnt_age_exact) | (!is.na(cnt_age_est_min) & !is.na(cnt_age_est_max))) C2_reci <- contact_matrix(polymod2, countries = "Germany", age.limits = c(0,5,15,25,45,65), missing.participant.age = "remove", #missing.contact.age = "keep", survey.pop = data.frame(lower.age.limit = c(0,5,15,25,45,65), population = popfracsBE * popBE), symmetric = TRUE) round(C2_reci$matrix, 2) stopifnot(is_reciprocal(C2_reci$matrix, popfracsBE)) round(C_reci, 2) ``` ## Modelling Given the counts from the previous week, $Y_{.,t-1}$, we assume $Y_{gt}$ to follow a negative binomial distribution with a group-specific overdispersion parameter and mean $$ \mu_{gt} = \nu_{gt} + \phi_{gt} \sum_{g'} c_{g'g} Y_{g',t-1} . $$ The endemic log-linear predictor $\nu_{gt}$ contains group-specific intercepts, a Christmas effect (via a simple indicator for the calendar weeks 52 and 1), and group-specific seasonal effects of $\sin(\omega t)$ and $\cos(\omega t)$ terms, $\omega=2\pi/52$. The epidemic log-linear predictor $\phi_{gt}$ also contains group-specific intercept, but shared seasonality and no Christmas effect. For the contact matrix we use `C_reci` from above, normalized to a transition matrix `C_reci/rowSums(C_reci)`, and compare this to models assuming homogeneous or no mixing between age groups, and a model where we estimate a power transformation $C^\kappa$ via profile likelihood (`hhh4contacts::fitC()`) as in Meyer and Held (2017, see `demo("hhh4contacts")` for a spatially disaggregated version of the model). ```{r mg_Creci} DATAt <- list(t = epoch(BNV) - 1, christmas = as.integer(epochInYear(BNV) %in% c(52, 1))) mg_Creci <- hhh4(BNV, list( end = list(f = addSeason2formula(~0 + fe(1, unitSpecific = TRUE) + christmas, S = rep(1, NGROUPS))), ne = list(f = addSeason2formula(~0 + fe(1, unitSpecific = TRUE)), weights = matrix(1, NGROUPS, NGROUPS), scale = C_reci, normalize = TRUE), family = "NegBinM", data = DATAt, subset = TRAIN)) ``` Alternative 1: assuming homogeneous mixing between age groups ```{r mg_Chom} mg_Chom <- update(mg_Creci, ne = list(scale = NULL)) ``` Alternative 2: assuming no mixing between age groups ```{r mg_Cdiag} mg_Cdiag <- update(mg_Creci, ne = list(scale = diag(NGROUPS))) ``` Alternative 3: with a power transformation of the contact matrix ```{r mg_Cpower, results = "hide"} mg_Cpower <- fitC(mg_Creci, C_reci, normalize = TRUE, truncate = TRUE) ``` Simple AIC comparison of the model fits to the training period: ```{r} AIC(mg_Creci, mg_Chom, mg_Cdiag, mg_Cpower) ``` Parameter estimates from the model with power-adjusted contact matrix: ```{r} summary(mg_Cpower, maxEV = TRUE, reparamPsi = TRUE, amplitudeShift = TRUE, idx2Exp = TRUE) ``` ```{r fitted_components} ## plot estimated endemic-epidemic decomposition ##plot(mg_Cpower, units = NULL, pch = 20, ## legend = 2, legend.args = list(legend = c("epidemic", "endemic"))) ## additional decomposition into AR effects and effects of other age groups plotHHH4_fitted_groups(mg_Cpower, groups = GROUPS, units = NULL, pch = 20, legend = 2, legend.args = list(legend = c("from other groups", "within group", "endemic"))) ``` ```{r seasonality, fig.width = 4.5, fig.height = 3, dev.args = list(pointsize = 8)} par(mfrow = c(2,1), mar = c(0,5,1,1), las = 1) plotHHH4_season_groups(mg_Cpower, component = "end", seasonStart = 27, col = c("#D53E4F", "#FC8D59", "#FEE08B", "#E6F598", "#99D594", "#3288BD"), conf.level = NULL, xaxt = "n", xlab = "", ylim = c(0, 5), yaxs = "i", ylab = "endemic seasonal effects") par(mar = c(3,5,1,1)) with(data.frame(time = epochInYear(BNV) + (year(BNV)-2011)*52, maxEV = getMaxEV(mg_Cpower))[1:52,], plot(maxEV ~ time, ylim = c(0, 1), type = "l", lwd = 3, yaxs = "i", xlab = "", ylab = "epidemic proportion", panel.first = quote(abline(v=52.5, lty=3)))) ``` ## One-week-ahead forecasts Parameters are updated sequentially in the one-step-ahead procedure. However, the power parameter of C is held fixed at the initial estimate to reduce the runtime. For each model, we compute the `r length(TEST)` "rolling" one-week-ahead forecasts during the last season. This takes roughly 4 seconds per model (we could parallelize using the `cores` argument of `oneStepAhead()`). ```{r} fits <- list("reciprocal" = mg_Creci, "homogeneous" = mg_Chom, "no mixing" = mg_Cdiag, "power-adjusted" = mg_Cpower) ``` ```{r owas, eval = !file.exists("BNV_owa.RData"), results = "hide"} owas <- lapply(fits, oneStepAhead, tp = range(TEST)-1, type = "rolling", which.start = "final", verbose = FALSE) save(owas, file = "BNV_owa.RData") ``` ```{r, include = FALSE} load("BNV_owa.RData") ``` ```{r owas_pit, echo = -1, results = "hide"} par(mfrow = c(2,2), mar = c(5,5,1,1), las = 1) mapply(function(x, main) pit(x, plot = list(main = main, ylab = "Density")), x = owas, main = names(owas)) ``` ```{r owas_caltest} unlist(sapply(owas, calibrationTest, which = "dss")["p.value",]) unlist(sapply(owas, calibrationTest, which = "logs")["p.value",]) ``` ```{r owas_scores} owas_scores <- lapply(owas, scores, which = c("dss", "logs"), individual = TRUE, reverse = FALSE) lapply(owas_scores, function(x) cbind(apply(x, 3:2, mean), overall=apply(x, 3, mean))) ``` ```{r owas_scores_permutationTest, include = FALSE, eval = FALSE} set.seed(1) sapply(c("dss", "logs"), function (score) permutationTest(owas_scores[["reciprocal"]][,,score], owas_scores[["power-adjusted"]][,,score], nPermutation = 999)) ``` ```{r owas_plot_Cpower, echo = -1, fig.align = "default", fig.width = 4.5, fig.height = 3, dev.args = list(pointsize = 12), out.width = "31%", results = "hide"} par(las = 1, mar = c(0,4,1,0)+.5) owas_quantiles <- lapply(owas, quantile, probs = 1:99/100) .osaplot <- function (model) { sapply(seq_along(GROUPS), function (group) { osaplot(quantiles = owas_quantiles[[model]][,group,], probs = 1:99/100, observed = owas[[model]]$observed[,group], scores = owas_scores[[model]][,group,], start = TEST[1], xlab = "", main = GROUPS[group], fan.args = list(ln = c(0.1,0.9), rlab = NULL), key.args = if(group==length(GROUPS)) list(start=max(TEST)-2, rcex=0.6), scores.args = list(xaxt = "n", ylim = range(owas_scores[[model]]), lab = c(7,3,0), panel.first = grid(nx=NA,ny=NULL,lty=1)), legend.args = if(group==length(GROUPS)) list()) }) } .osaplot("no mixing") ```