Title: | Forecasting Based on Surveillance Data |
---|---|
Description: | The Handbook of Infectious Disease Data Analysis ("HIDDA") contains a chapter on "Forecasting Based on Surveillance Data". The R package 'HIDDA.forecasting' provides the data and code to reproduce results from the two applications described in that chapter (see the corresponding vignettes): Univariate forecasting of Swiss ILI counts using 'forecast', 'glarma', 'surveillance' and 'prophet', and an age-stratified analysis of norovirus gastroenteritis in Berlin using the multivariate time-series model implemented in surveillance::hhh4(). |
Authors: | Sebastian Meyer [aut, cre] , Leonhard Held [ctb] |
Maintainer: | Sebastian Meyer <[email protected]> |
License: | GPL (>= 2) |
Version: | 1.1.2 |
Built: | 2024-12-07 05:57:16 UTC |
Source: | https://github.com/HIDDA/forecasting |
The CHILI
dataset is a time series of the weekly number of
ILI cases in Switzerland from 2000 to 2016,
estimated from the Swiss Sentinella Reporting System.
data("CHILI")
data("CHILI")
a univariate time series of class zoo
,
where the time index is of class Date
and always refers to the Tuesday of the notification week
The Swiss ILI data has been received on 19 January 2017 by courtesy of:
Swiss Federal Office of Public Health
Public Health Directorate
Communicable Diseases Division
3003 Bern
SWITZERLAND
summary(CHILI) plot(CHILI)
summary(CHILI) plot(CHILI)
hhh4
-Based Forecast DistributionsThe function dhhh4sims
constructs a (non-vectorized)
probability mass function from the result of
surveillance::simulate.hhh4()
(and the corresponding
model), as a function of the time point within the simulation period.
The distribution at each time point is obtained as a mixture of
negative binomial (or Poisson) distributions based on the samples from
the previous time point.
dhhh4sims(sims, model)
dhhh4sims(sims, model)
sims |
a |
model |
the "hhh4" object underlying |
a function(x, tp = 1, log = FALSE)
, which takes a
vector of model$nUnit
counts and calculates the
(log
-)probability of observing these counts (given the
model
) at the tp
'th time point of the simulation
period (index or character string matching rownames(sims)
).
Sebastian Meyer
logs_hhh4sims()
where this function is used.
library("surveillance") CHILI.sts <- sts(observed = CHILI, epoch = as.integer(index(CHILI)), epochAsDate = TRUE) ## fit a simple hhh4 model (f1 <- addSeason2formula(~ 1, period = 365.2425)) fit <- hhh4( stsObj = CHILI.sts, control = list(ar = list(f = f1), end = list(f = f1), family = "NegBin1") ) ## simulate the last four weeks (only 200 runs, for speed) sims <- simulate(fit, nsim = 200, seed = 1, subset = 884:nrow(CHILI.sts), y.start = observed(CHILI.sts)[883,]) if (requireNamespace("fanplot")) { plot(sims, "fan", fan.args = list(ln = c(5,95)/100), observed.args = list(pch = 19), means.args = list(type = "b")) } ## derive the weekly forecast distributions dfun <- dhhh4sims(sims, fit) dfun(4000, tp = 1) dfun(4000, tp = 4) curve(sapply(x, dfun, tp = 4), 0, 30000, type = "h", main = "4-weeks-ahead forecast", xlab = "No. infected", ylab = "Probability") ## compare the forecast distributions with the simulated counts par(mfrow = n2mfrow(nrow(sims))) for (tp in 1:nrow(sims)) { MASS::truehist(sims[tp,,], xlab = "counts", ylab = "Probability") curve(sapply(x, dfun, tp = tp), add = TRUE, lwd = 2) }
library("surveillance") CHILI.sts <- sts(observed = CHILI, epoch = as.integer(index(CHILI)), epochAsDate = TRUE) ## fit a simple hhh4 model (f1 <- addSeason2formula(~ 1, period = 365.2425)) fit <- hhh4( stsObj = CHILI.sts, control = list(ar = list(f = f1), end = list(f = f1), family = "NegBin1") ) ## simulate the last four weeks (only 200 runs, for speed) sims <- simulate(fit, nsim = 200, seed = 1, subset = 884:nrow(CHILI.sts), y.start = observed(CHILI.sts)[883,]) if (requireNamespace("fanplot")) { plot(sims, "fan", fan.args = list(ln = c(5,95)/100), observed.args = list(pch = 19), means.args = list(type = "b")) } ## derive the weekly forecast distributions dfun <- dhhh4sims(sims, fit) dfun(4000, tp = 1) dfun(4000, tp = 4) curve(sapply(x, dfun, tp = 4), 0, 30000, type = "h", main = "4-weeks-ahead forecast", xlab = "No. infected", ylab = "Probability") ## compare the forecast distributions with the simulated counts par(mfrow = n2mfrow(nrow(sims))) for (tp in 1:nrow(sims)) { MASS::truehist(sims[tp,,], xlab = "counts", ylab = "Probability") curve(sapply(x, dfun, tp = tp), add = TRUE, lwd = 2) }
The function dnbmix()
constructs a (vectorized) probability mass
function from a matrix of (simulated) means
and corresponding size
parameters, as a function of the time point (row of means
) within
the simulation period. The distribution at each time point is obtained
as a mixture of negative binomial (or Poisson) distributions.
dnbmix(means, size = NULL)
dnbmix(means, size = NULL)
means |
a |
size |
the dispersion parameter of the |
a function(x, tp = 1, log = FALSE)
, which takes a vector of
counts x
and calculates the (log
-)probabilities of observing
each of these numbers at the tp
'th time point of the simulation
period (indexing the rows of means
).
Sebastian Meyer
logs_nbmix()
where this function is used.
## a GLARMA example library("glarma") y <- as.vector(CHILI) ## fit a simple NegBin-GLARMA model X <- t(sapply(2*pi*seq_along(y)/52.1775, function (x) c(sin = sin(x), cos = cos(x)))) X <- cbind(intercept = 1, X) fit <- glarma(y = y[1:883], X = X[1:883,], type = "NegBin", phiLags = 1) ## simulate the last four weeks (only 500 runs, for speed) set.seed(1) means <- replicate(500, { forecast(fit, n.ahead = 4, newdata = X[884:887,], newoffset = rep(0,4))$mu }) ## derive the weekly forecast distributions dfun <- dnbmix(means, coef(fit, type = "NB")) dfun(4000, tp = 1) dfun(4000, tp = 4) curve(dfun(x, tp = 4), 0, 30000, type = "h", main = "4-weeks-ahead forecast", xlab = "No. infected", ylab = "Probability")
## a GLARMA example library("glarma") y <- as.vector(CHILI) ## fit a simple NegBin-GLARMA model X <- t(sapply(2*pi*seq_along(y)/52.1775, function (x) c(sin = sin(x), cos = cos(x)))) X <- cbind(intercept = 1, X) fit <- glarma(y = y[1:883], X = X[1:883,], type = "NegBin", phiLags = 1) ## simulate the last four weeks (only 500 runs, for speed) set.seed(1) means <- replicate(500, { forecast(fit, n.ahead = 4, newdata = X[884:887,], newoffset = rep(0,4))$mu }) ## derive the weekly forecast distributions dfun <- dnbmix(means, coef(fit, type = "NB")) dfun(4000, tp = 1) dfun(4000, tp = 4) curve(dfun(x, tp = 4), 0, 30000, type = "h", main = "4-weeks-ahead forecast", xlab = "No. infected", ylab = "Probability")
dhhh4sims
The function logs_hhh4sims
computes the logarithmic score of the
forecast distributions based on a surveillance::hhh4()
model
and
simulations (sims
) thereof. The
forecast distributions are obtained via dhhh4sims()
as sequential
mixtures of negative binomial (or Poisson) distributions, which is
different from the kernel density estimation approach employed in
scores_sample()
.
logs_hhh4sims(observed = NULL, sims, model)
logs_hhh4sims(observed = NULL, sims, model)
observed |
a vector or matrix of observed counts during the
simulation period. By default ( |
sims |
a |
model |
the |
a vector or matrix of log-scores for the observed
counts.
Sebastian Meyer
scores_sample()
for an alternative approach of calculating
the logarithmic score from simulation-based forecasts
dnbmix
The function logs_nbmix
computes the logarithmic score of forecasts
based on mixtures of negative binomial (or Poisson) distributions via
dnbmix()
. This is different from the kernel density estimation
approach available via scores_sample()
.
logs_nbmix(observed, means, size)
logs_nbmix(observed, means, size)
observed |
a vector of observed counts during the simulation period. |
means |
a |
size |
the dispersion parameter of the |
a vector of log-scores for the observed
counts.
Sebastian Meyer
scores_sample()
for an alternative approach of calculating
the logarithmic score from simulation-based forecasts
This function produces a fan chart of sequential (one-step-ahead) forecasts
with dots for the observed values, using surveillance::fanplot()
, which
itself wraps fanplot::fan()
. A matplot()
of score
values at each time point is added below ("slicing").
osaplot(quantiles, probs, means, observed, scores, start = 1, xlab = "Time", fan.args = list(), means.args = list(), observed.args = list(), key.args = list(), ..., scores.args = list(), legend.args = list(), heights = c(0.6, 0.4))
osaplot(quantiles, probs, means, observed, scores, start = 1, xlab = "Time", fan.args = list(), means.args = list(), observed.args = list(), key.args = list(), ..., scores.args = list(), legend.args = list(), heights = c(0.6, 0.4))
quantiles |
a time x |
probs |
numeric vector of probabilities with values between 0 and 1. |
means |
(optional) numeric vector of point forecasts at each time point. |
observed |
(optional) numeric vector of observed values. |
scores |
(optional) numeric vector (or matrix) of associated scores. |
start |
time index (x-coordinate) of the first prediction. |
xlab |
x-axis label. |
fan.args |
a list of graphical parameters for the |
means.args |
a list of graphical parameters for |
observed.args |
a list of graphical parameters for |
key.args |
if a list, a color key (in |
... |
further arguments are passed to |
scores.args |
a list of graphical parameters for |
legend.args |
if a list (of parameters for |
heights |
numeric vector of length 2 specifying the relative height of the two subplots. |
Sebastian Meyer
This is a simple wrapper around functions from the scoringRules
package for predictions with a LN(meanlog
, sdlog
)
distribution. The function is vectorized and preserves the dimension of
the input.
scores_lnorm(x, meanlog, sdlog, which = c("dss", "logs"))
scores_lnorm(x, meanlog, sdlog, which = c("dss", "logs"))
x |
the observed counts. |
meanlog , sdlog
|
parameters of the log-normal distribution, i.e., mean and standard deviation of the distribution on the log scale. |
which |
a character vector specifying which scoring rules to apply. The
Dawid-Sebastiani score ( |
scores for the predictions of the observations in x
(maintaining
their dimensions).
Compute scores for discretized log-normal forecasts. The function is vectorized and preserves the dimension of the input.
scores_lnorm_discrete(x, meanlog, sdlog, which = c("dss", "logs"))
scores_lnorm_discrete(x, meanlog, sdlog, which = c("dss", "logs"))
x |
the observed counts. |
meanlog , sdlog
|
parameters of the log-normal distribution, i.e., mean and standard deviation of the distribution on the log scale. |
which |
a character vector specifying which scoring rules to apply. The
Dawid-Sebastiani score ( |
scores for the predictions of the observations in x
(maintaining
their dimensions).
This is a simple wrapper around functions from the scoringRules
package to calculate scoring rules from simulation-based forecasts.
Calculation of the logarithmic score involves kernel density estimation,
see scoringRules::logs_sample()
.
The function is vectorized and preserves the dimension of the input.
scores_sample(x, sims, which = c("dss", "logs"))
scores_sample(x, sims, which = c("dss", "logs"))
x |
a vector of observed counts. |
sims |
a matrix of simulated counts with as many rows as |
which |
a character vector specifying which scoring rules to apply. The
Dawid-Sebastiani score ( |
scores for the predictions of the observations in x
(maintaining
their dimensions).
There seems to be no function in package forecast (as of version
8.2) to re-estimate an ARIMA model on a subset of the original time series.
This update
method does exactly that.
## S3 method for class 'Arima' update(object, subset, ...)
## S3 method for class 'Arima' update(object, subset, ...)
object |
an object of class |
subset |
an integer vector selecting part of the original time series (and external regressors). |
... |
further arguments to be passed to |
the updated model.
Sebastian Meyer