| Title: | A Bayesian Framework for Real-time Infectious Disease Surveillance |
|---|---|
| Description: | A modular Bayesian framework for real-time infectious disease surveillance. Provides tools for nowcasting, reproduction number estimation, delay estimation, and forecasting from data subject to reporting delays, right-truncation, missing data, and incomplete ascertainment. Users can build models suited to their setting using a flexible formula interface supporting fixed effects, random effects, random walks, and time-varying parameters, with options including parametric and non-parametric delay distributions with optional modifiers (via discrete-time hazard models), renewal processes, observation models, missing data imputation, and stratified analyses with partial pooling. By jointly estimating disease dynamics and reporting patterns, our framework enables earlier and more reliable detection of trends. While designed with epidemiological applications in mind, the framework can be applied to any right-truncated time series count data. |
| Authors: | Sam Abbott [aut, cre] (ORCID: <https://orcid.org/0000-0001-8057-8037>), Adrian Lison [aut] (ORCID: <https://orcid.org/0000-0002-6822-8437>), Sebastian Funk [aut], Carl Pearson [aut] (ORCID: <https://orcid.org/0000-0003-0701-7860>), Hugo Gruson [aut] (ORCID: <https://orcid.org/0000-0002-4094-1476>), Felix Guenther [aut] (ORCID: <https://orcid.org/0000-0001-6582-1174>), Michael DeWitt [aut] (ORCID: <https://orcid.org/0000-0001-8940-1967>), James Mba Azam [aut] (ORCID: <https://orcid.org/0000-0001-5782-7330>), Jessalyn Sebastian [aut] (ORCID: <https://orcid.org/0000-0002-1768-3229>), Hannah Choi [ctb], Pratik Gupte [ctb] (ORCID: <https://orcid.org/0000-0001-5294-7819>), Joel Hellewell [ctb] (ORCID: <https://orcid.org/0000-0003-2683-0849>), Luis Rivas [ctb], Sang Woo Park [ctb] (ORCID: <https://orcid.org/0000-0003-2202-3361>), Nathan McIntosh [ctb], Kath Sherratt [ctb] (ORCID: <https://orcid.org/0000-0003-2049-3423>), Nikos Bosse [ctb] (ORCID: <https://orcid.org/0000-0002-7750-5280>), Adam Howes [ctb] (ORCID: <https://orcid.org/0000-0003-2386-4031>), Kaitlyn Johnson [ctb] (ORCID: <https://orcid.org/0000-0001-8011-0012>), Barbora Nemcova [ctb] (ORCID: <https://orcid.org/0009-0004-7565-4145>) |
| Maintainer: | Sam Abbott <[email protected]> |
| License: | MIT + file LICENSE |
| Version: | 0.6.0.1000 |
| Built: | 2026-06-05 11:29:39 UTC |
| Source: | https://github.com/epinowcast/epinowcast |
This function calculates the maximum observed delay for each group and reference date in the provided dataset and adds it as a new column. It first checks the validity of the observation indicator and then computes the maximum delay. If an observation indicator is provided, the maximum observed delay for unobserved data is set to -1 (indicating that no maximum has been observed).
add_max_observed_delay(new_confirm, observation_indicator = NULL)
new_confirm |
A data.table containing the columns: "reference_date",
"delay", ".group", "new_confirm", and "max_obs_delay".
As produced by |
observation_indicator |
A character string specifying the column name
in |
A data.table with the original columns of new_confirm and an
additional "max_obs_delay" column representing the maximum observed delay
for each group and reference date. If an observation indicator is provided,
unobserved data will have a "max_obs_delay" value of -1.
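As a rough illustration, the base-R sketch below mimics the described behaviour on a toy data frame. It assumes that the maximum observed delay is the largest delay present for each group and reference date, and that rows flagged as unobserved (here by a hypothetical logical column `observed`) are then set to -1. The package itself works on data.table objects and may differ in detail.

```r
# Toy input (hypothetical): one group, two reference dates
new_confirm <- data.frame(
  reference_date = as.Date(c(
    "2021-10-01", "2021-10-01", "2021-10-01", "2021-10-02", "2021-10-02"
  )),
  delay = c(0, 1, 2, 0, 1),
  .group = 1,
  new_confirm = c(5, 3, 1, 4, 2),
  observed = c(TRUE, TRUE, TRUE, TRUE, FALSE) # hypothetical indicator column
)

# Largest delay per group and reference date, broadcast back to every row
key <- paste(new_confirm$.group, new_confirm$reference_date)
new_confirm$max_obs_delay <- ave(new_confirm$delay, key, FUN = max)

# Rows flagged as unobserved get -1 ("no maximum observed")
new_confirm$max_obs_delay[!new_confirm$observed] <- -1
new_confirm
```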
Helper functions for model modules
add_pmfs(),
convolution_matrix(),
enw_dayofweek_structural_reporting(),
enw_reference_by_report(),
enw_reps_with_complete_refs(),
enw_structural_reporting_metadata(),
extract_obs_metadata(),
extract_sparse_matrix(),
latest_obs_as_matrix()
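This function adds probability mass functions (PMFs): it returns the PMF of the sum of the corresponding independent random variables, which is the convolution of their PMFs. This is useful, for example, for reporting delays made up of several independent component delays. As a check, the PMF of the sum of two independent Poisson random variables is the convolution of their PMFs (and is itself Poisson).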
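The underlying operation is discrete convolution. Below is a minimal base-R sketch of that idea. `convolve_pmfs()` is a hypothetical helper for illustration, not the package's implementation of `add_pmfs()`, and it assumes each PMF vector starts at a value of 0.

```r
# Convolve a list of PMFs (element 1 = value 0) into the PMF of their sum,
# assuming the underlying random variables are independent.
convolve_pmfs <- function(pmfs) {
  Reduce(function(p, q) {
    out <- numeric(length(p) + length(q) - 1)
    for (i in seq_along(p)) {
      idx <- i:(i + length(q) - 1)
      out[idx] <- out[idx] + p[i] * q
    }
    out
  }, pmfs)
}

coin <- c(0.5, 0.5)
convolve_pmfs(list(coin, coin)) # 0.25 0.50 0.25

# Poisson(5) + Poisson(7) is Poisson(12); truncating the inputs at 20
# leaves the first 21 entries of the convolution exact.
conv <- convolve_pmfs(list(dpois(0:20, 5), dpois(0:20, 7)))
all.equal(conv[1:21], dpois(0:20, 12))
```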
add_pmfs(pmfs)
pmfs |
A list of vectors describing the probability mass functions to |
A vector describing the probability mass function of the sum of the
Helper functions for model modules
add_max_observed_delay(),
convolution_matrix(),
enw_dayofweek_structural_reporting(),
enw_reference_by_report(),
enw_reps_with_complete_refs(),
enw_structural_reporting_metadata(),
extract_obs_metadata(),
extract_sparse_matrix(),
latest_obs_as_matrix()
# Sample and analytical PMFs for two Poisson distributions
x <- rpois(10000, 5)
xpmf <- dpois(0:20, 5)
y <- rpois(10000, 7)
ypmf <- dpois(0:20, 7)
# Add the sampled Poisson draws to get the combined distribution
z <- x + y
# Analytical convolution of PMFs
conv_pmf <- add_pmfs(list(xpmf, ypmf))
conv_cdf <- cumsum(conv_pmf)
# Empirical CDF of the summed draws
cdf <- ecdf(z)(0:42)
# Compare sampled and analytical CDFs
plot(conv_cdf)
lines(cdf, col = "black")
arima()
Thin wrapper around arima() that fixes d = 0 and q = 0. Matches
the in-formula ar() helper that brms users will be familiar with.
Equivalent to arima(time, by, p = p, d = 0, q = 0).
ar(time, by, p = 1)
time |
Time variable for the latent series; numeric. |
by |
Optional grouping variable. Each group draws an independent shock series; AR/MA parameters and the latent standard deviation are shared across groups. |
p |
Autoregressive order. Defaults to |
An enw_arima_term interpretable by construct_arima().
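To illustrate what an ar() term describes, the base-R sketch below simulates an AR(2) latent series for two groups. Each group draws its own independent shocks, while the AR coefficients and the latent standard deviation are shared. The group names, parameter values, and zero initial values are assumptions for illustration; this is not how the package's Stan code generates the series.

```r
set.seed(42)
n_time <- 20
groups <- c("north", "south") # hypothetical group levels
phi <- c(0.6, 0.2)            # shared AR(2) coefficients
sigma <- 0.1                  # shared latent standard deviation

latent <- sapply(groups, function(g) {
  shocks <- rnorm(n_time, sd = sigma) # independent shocks per group
  # Recursive filter with zero initial values:
  # x[t] = shocks[t] + phi[1] * x[t - 1] + phi[2] * x[t - 2]
  as.numeric(stats::filter(shocks, filter = phi, method = "recursive"))
})
dim(latent) # one column per group
```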
Functions used to help convert formulas into model designs
arima(),
arima_terms(),
arma(),
as_string_formula(),
construct_arima(),
construct_gp(),
construct_re(),
construct_rw(),
enw_formula(),
enw_manual_formula(),
gp(),
gp_terms(),
ma(),
parse_formula(),
re(),
remove_arima_terms(),
remove_gp_terms(),
remove_rw_terms(),
rw(),
rw_terms(),
split_formula_to_terms()
ar(time)
ar(time, location, p = 2)
A call to arima() can be used in the formula argument
of model construction functions in the epinowcast package such as
enw_formula(). It declares an ARIMA(p, d, q) latent series indexed
by time (and optionally a grouping variable by) whose value at
each observation is added to the linear predictor. As with rw(),
arguments are not evaluated; they are passed by name for use in
model construction. Setting p = d = q = 0 is not allowed; use
rw() (equivalent to arima(time, p = 0, d = 1)) for a random walk.
arima(time, by, p = 1, d = 0, q = 0)
time |
Defines the time index of the ARIMA process. |
by |
Optional grouping variable. If supplied, an independent
ARIMA series is fitted for each level of |
p |
Non-negative integer. Order of the autoregressive part. Defaults to 1. |
d |
Non-negative integer. Order of differencing ( |
q |
Non-negative integer. Order of the moving-average part. Defaults to 0. |
A list of class enw_arima_term describing the ARIMA term,
interpretable by construct_arima(). Each group draws an independent
shock series; phi, theta, and sigma are shared across groups
(per-group parameters are a planned extension).
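The role of the differencing order d can be seen in a small base-R sketch: a series with d = 1 is the cumulative sum of a stationary ARMA series, so ARIMA(0, 1, 0) is a random walk and differencing recovers the underlying series. The shock scale and phi value are arbitrary; this illustrates the orders only and is not how construct_arima() or the Stan kernel implement them.

```r
set.seed(1)
shocks <- rnorm(30, sd = 0.1)

# ARIMA(0, 1, 0): integrating white noise once gives a random walk
random_walk <- cumsum(shocks)

# ARIMA(1, 1, 0): integrate an AR(1) series once
phi <- 0.5
ar1 <- as.numeric(stats::filter(shocks, filter = phi, method = "recursive"))
arima_110 <- cumsum(ar1)

# Differencing (d = 1) undoes the integration
all.equal(diff(random_walk), shocks[-1])
all.equal(diff(arima_110), ar1[-1])
```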
Functions used to help convert formulas into model designs
ar(),
arima_terms(),
arma(),
as_string_formula(),
construct_arima(),
construct_gp(),
construct_re(),
construct_rw(),
enw_formula(),
enw_manual_formula(),
gp(),
gp_terms(),
ma(),
parse_formula(),
re(),
remove_arima_terms(),
remove_gp_terms(),
remove_rw_terms(),
rw(),
rw_terms(),
split_formula_to_terms()
arima(time)
arima(time, location)
arima(time, location, p = 2, d = 1, q = 1)
This function extracts ARIMA terms from a formula so that
they can be processed on their own. Matches all four user-facing
helpers that produce an enw_arima_term: arima(), plus the
convenience aliases ar(), ma(), and arma().
arima_terms(formula)
formula |
A model formula that may use standard fixed
effects, random effects using lme4 syntax (see |
A character vector containing the ARIMA terms identified in the supplied formula.
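One way to pick out these terms is to list the term labels of the formula and keep those whose call name is one of the four helpers. The sketch below does this in base R with the hypothetical helper `find_arima_terms()`. The package's own arima_terms() may use a different approach.

```r
# Hypothetical helper: return ARIMA-family terms from a formula
find_arima_terms <- function(formula) {
  labels <- attr(stats::terms(formula), "term.labels")
  # Anchored match so that e.g. rw(week) or re(...) are not picked up
  labels[grepl("^(arima|arma|ar|ma)\\(", labels)]
}

find_arima_terms(~ 1 + age_group + arima(week))
find_arima_terms(~ 1 + ar(week, p = 2))
find_arima_terms(~ 1 + rw(week) + arma(week, location, p = 1, q = 1))
```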
Functions used to help convert formulas into model designs
ar(),
arima(),
arma(),
as_string_formula(),
construct_arima(),
construct_gp(),
construct_re(),
construct_rw(),
enw_formula(),
enw_manual_formula(),
gp(),
gp_terms(),
ma(),
parse_formula(),
re(),
remove_arima_terms(),
remove_gp_terms(),
remove_rw_terms(),
rw(),
rw_terms(),
split_formula_to_terms()
epinowcast:::arima_terms(~ 1 + age_group + arima(week))
epinowcast:::arima_terms(~ 1 + ar(week, p = 2))
epinowcast:::arima_terms(~ 1 + arma(week, location, p = 1, q = 1))
arima()
Thin wrapper around arima() that fixes d = 0. Equivalent to
arima(time, by, p = p, d = 0, q = q). For an integrated
(random-walk) series use rw() or
arima(time, by, p = 0, d = 1, q = 0) directly.
arma(time, by, p = 1, q = 1)
time |
Time variable for the latent series; numeric. |
by |
Optional grouping variable. Each group draws an independent shock series; AR/MA parameters and the latent standard deviation are shared across groups. |
p |
Autoregressive order. Defaults to |
q |
Moving-average order. Defaults to |
An enw_arima_term interpretable by construct_arima().
Functions used to help convert formulas into model designs
ar(),
arima(),
arima_terms(),
as_string_formula(),
construct_arima(),
construct_gp(),
construct_re(),
construct_rw(),
enw_formula(),
enw_manual_formula(),
gp(),
gp_terms(),
ma(),
parse_formula(),
re(),
remove_arima_terms(),
remove_gp_terms(),
remove_rw_terms(),
rw(),
rw_terms(),
split_formula_to_terms()
arma(time)
arma(time, location, p = 1, q = 1)
This function converts an epinowcast object, as returned by
epinowcast(), to a forecast_sample object that can be scored
using the scoringutils package.
## S3 method for class 'epinowcast'
as_forecast_sample(data, latest_obs, ...)
data |
An |
latest_obs |
Latest observations to use for the true values
must contain |
... |
Additional arguments passed to
|
A forecast_sample object as returned by
scoringutils::as_forecast_sample()
library(scoringutils)
nowcast <- enw_example("nowcast")
latest_obs <- enw_example("observations")
as_forecast_sample(nowcast, latest_obs)
Converts formulas to strings
as_string_formula(formula)
formula |
A model formula that may use standard fixed
effects, random effects using lme4 syntax (see |
A character string of the supplied formula
Functions used to help convert formulas into model designs
ar(),
arima(),
arima_terms(),
arma(),
construct_arima(),
construct_gp(),
construct_re(),
construct_rw(),
enw_formula(),
enw_manual_formula(),
gp(),
gp_terms(),
ma(),
parse_formula(),
re(),
remove_arima_terms(),
remove_gp_terms(),
remove_rw_terms(),
rw(),
rw_terms(),
split_formula_to_terms()
epinowcast:::as_string_formula(~ 1 + age_group)
Build the ord_obs data.table.
build_ord_obs(obs, max_delay, internal_timestep, timestep, nowcast = NULL)
obs |
Observations as pulled from |
max_delay |
Whole number representing the maximum delay in units of the timestep. |
internal_timestep |
The internal timestep in days. |
timestep |
The timestep to be used. This can be a string ("day", "week", "month") or a numeric whole number representing the number of days. |
nowcast |
If getting posterior samples, a data frame with a ".draws" column to get the draws from, as pulled from the fit attribute of a nowcast. |
A data.table.
Functions used for postprocessing of model fits
enw_add_latest_obs_to_nowcast(),
enw_nowcast_samples(),
enw_nowcast_summary(),
enw_posterior(),
enw_pp_summary(),
enw_quantiles_to_long(),
enw_summarise_samples(),
subset_obs()
This function checks the sparsity of a design matrix and provides a recommendation if the matrix is considered sparse.
check_design_matrix_sparsity(
  matrix,
  sparsity_threshold = 0.9,
  min_matrix_size = 50,
  name = "checked"
)
matrix |
A numeric matrix to be checked for sparsity. |
sparsity_threshold |
A numeric value between 0 and 1 indicating the threshold for considering a matrix sparse. Default is 0.9. |
min_matrix_size |
An integer indicating the minimum size of the matrix for which to perform the sparsity check. Default is 50. |
name |
A character string specifying the name of the design matrix. Default is "checked". |
This function is used for its side effect of providing an informational message if the matrix is sparse. It returns NULL invisibly.
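A minimal base-R sketch of such a check is below. It assumes that sparsity is the proportion of zero entries and that "size" means the number of elements. The helper name, the message wording, and the logical return value (used here for testing, where the package returns NULL) are all hypothetical.

```r
# Hypothetical sketch: report whether a matrix exceeds a sparsity threshold
check_sparsity_sketch <- function(matrix, sparsity_threshold = 0.9,
                                  min_matrix_size = 50, name = "checked") {
  # Assumption: "size" is the number of elements; small matrices are skipped
  if (length(matrix) < min_matrix_size) {
    return(invisible(FALSE))
  }
  sparsity <- mean(matrix == 0) # proportion of zero entries
  is_sparse <- sparsity > sparsity_threshold
  if (is_sparse) {
    message(sprintf(
      "The %s design matrix is %.0f%% zeros; consider a sparse representation.",
      name, 100 * sparsity
    ))
  }
  invisible(is_sparse)
}

check_sparsity_sketch(diag(20))          # 95% zeros: prints a message
check_sparsity_sketch(matrix(1, 10, 10)) # dense: silent
```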
Functions used for checking inputs
check_group(),
check_group_date_unique(),
check_max_delay(),
check_module(),
check_modules_compatible(),
check_numeric_timestep(),
check_observation_indicator(),
check_quantiles(),
check_timestep(),
check_timestep_by_date(),
check_timestep_by_group()
Check observations for reserved grouping variables
check_group(obs)
obs |
An object that will be |
The obs object, which will be modifiable in place.
Functions used for checking inputs
check_design_matrix_sparsity(),
check_group_date_unique(),
check_max_delay(),
check_module(),
check_modules_compatible(),
check_numeric_timestep(),
check_observation_indicator(),
check_quantiles(),
check_timestep(),
check_timestep_by_date(),
check_timestep_by_group()
reference_date and report_date
This function checks that the input data is stratified by
reference_date, report_date, and .group. It does this by counting the
number of observations for each combination of these variables, and
throwing a warning if any combination has more than one observation.
check_group_date_unique(obs)
obs |
An object that will be |
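The counting step described above can be sketched in base R by building a key from the three stratifying columns and tabulating it. The helper name and warning text are hypothetical, and the sketch uses a plain data.frame rather than the package's data.table handling.

```r
# Hypothetical sketch of the uniqueness check on a plain data.frame
check_unique_sketch <- function(obs) {
  key <- paste(obs$reference_date, obs$report_date, obs$.group)
  counts <- table(key) # observations per combination
  if (any(counts > 1)) {
    warning("Some reference_date/report_date/.group combinations are duplicated.")
  }
  invisible(counts)
}

obs <- data.frame(
  reference_date = as.Date(c("2021-10-01", "2021-10-01", "2021-10-01")),
  report_date = as.Date(c("2021-10-01", "2021-10-02", "2021-10-02")),
  .group = 1
)
# The second combination appears twice, so a warning is raised
tryCatch(check_unique_sketch(obs), warning = function(w) conditionMessage(w))
```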
Functions used for checking inputs
check_design_matrix_sparsity(),
check_group(),
check_max_delay(),
check_module(),
check_modules_compatible(),
check_numeric_timestep(),
check_observation_indicator(),
check_quantiles(),
check_timestep(),
check_timestep_by_date(),
check_timestep_by_group()
Checks whether the maximum delay specified by the user is long enough and raises warnings if it may not be. This is done by computing the share of reference dates where the cumulative case count up to the maximum delay is below a target coverage.
check_max_delay(
  data,
  max_delay = data$max_delay,
  cum_coverage = 0.8,
  maxdelay_quantile_outlier = 0.97,
  warn = TRUE,
  warn_internal = FALSE
)
data |
Output from |
max_delay |
The maximum delay to model in the delay
distribution, specified in units of the timestep (e.g., if
Steps to take to determine the maximum delay:
Note that delays are zero indexed and so include the reference date
and |
cum_coverage |
The target proportion of cases that the maximum delay should cover. Defaults to 0.8 (80%). |
maxdelay_quantile_outlier |
Only reference dates sufficiently far in the past, determined based on the maximum observed delay, are included (see details). Instead of the overall maximum observed delay, a quantile of the maximum observed delay over all reference dates is used. This is more robust against outliers. Defaults to 0.97 (97%). |
warn |
Should a warning be issued if the cumulative case count is
below |
warn_internal |
Should only be |
When data is very sparse (e.g., predominantly zero counts), the
function may not be able to compute meaningful coverage statistics.
In such cases, a warning is issued and the function treats the data as
having no coverage issues.
This typically occurs when groups have very few non-zero observations or
when the specified max_delay is too large relative to available
data.
The coverage is with respect to the maximum observed case count for the corresponding reference date. As the maximum observed case count is likely smaller than the true overall case count for not yet fully observed reference dates (due to right truncation), only reference dates that are more than the maximum observed delay ago are included. Still, because we can only use the maximum observed delay, not the unknown true maximum delay, the computed coverage values should be interpreted with care, as they are only proxies for the true coverage.
A data.table with the share of reference dates where the
cumulative case count is below cum_coverage, stratified by group.
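The core coverage calculation can be sketched in base R for a single group. The sketch assumes a matrix of counts with one row per reference date (already restricted to reference dates far enough in the past) and one column per zero-indexed delay. It treats each row total as the maximum observed case count. The counts are made up for illustration.

```r
# Hypothetical counts: counts[r, k] = cases for reference date r reported
# with delay k - 1 (delays are zero indexed)
counts <- rbind(
  c(6, 3, 1, 0),
  c(2, 2, 2, 4),
  c(8, 1, 1, 0)
)
max_delay <- 1 # covers delays 0 and 1, i.e. the first two columns
cum_coverage <- 0.8

# Share of each reference date's observed total captured within max_delay
coverage <- rowSums(counts[, seq_len(max_delay + 1), drop = FALSE]) /
  rowSums(counts)
# Proportion of reference dates falling below the target coverage
share_below <- mean(coverage < cum_coverage)
share_below # only the second reference date falls short
```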
Functions used for checking inputs
check_design_matrix_sparsity(),
check_group(),
check_group_date_unique(),
check_module(),
check_modules_compatible(),
check_numeric_timestep(),
check_observation_indicator(),
check_quantiles(),
check_timestep(),
check_timestep_by_date(),
check_timestep_by_group()
pobs <- enw_example(type = "preprocessed_observations")
check_max_delay(pobs, max_delay = 20, cum_coverage = 0.8)
Check a model module contains the required components
check_module(module)
module |
A model module. For example |
Functions used for checking inputs
check_design_matrix_sparsity(),
check_group(),
check_group_date_unique(),
check_max_delay(),
check_modules_compatible(),
check_numeric_timestep(),
check_observation_indicator(),
check_quantiles(),
check_timestep(),
check_timestep_by_date(),
check_timestep_by_group()
Check that model modules have compatible specifications
check_modules_compatible(modules)
modules |
A list of model modules. |
Functions used for checking inputs
check_design_matrix_sparsity(),
check_group(),
check_group_date_unique(),
check_max_delay(),
check_module(),
check_numeric_timestep(),
check_observation_indicator(),
check_quantiles(),
check_timestep(),
check_timestep_by_date(),
check_timestep_by_group()
This function verifies if the difference in numeric dates in the provided observations corresponds to the provided timestep.
check_numeric_timestep(dates, date_var, timestep, exact = TRUE)
dates |
Vector of Date class representing dates. |
date_var |
The variable in |
timestep |
Numeric timestep for date difference. |
exact |
Logical, if |
This function is used for its side effect of stopping if the check fails. If the check passes, the function returns invisibly.
Functions used for checking inputs
check_design_matrix_sparsity(),
check_group(),
check_group_date_unique(),
check_max_delay(),
check_module(),
check_modules_compatible(),
check_observation_indicator(),
check_quantiles(),
check_timestep(),
check_timestep_by_date(),
check_timestep_by_group()
This function verifies that the observation_indicator column within the
provided new_confirm observations is logical, i.e. that it is of the
correct type.
check_observation_indicator(new_confirm, observation_indicator = NULL)
new_confirm |
A data frame containing the observations to be checked. |
observation_indicator |
A character string specifying the column name
in |
This function is used for its side effect of checking the observation
indicator in new_confirm. If the check passes, the function returns
invisibly. Otherwise, it stops and returns an error message.
Functions used for checking inputs
check_design_matrix_sparsity(),
check_group(),
check_group_date_unique(),
check_max_delay(),
check_module(),
check_modules_compatible(),
check_numeric_timestep(),
check_quantiles(),
check_timestep(),
check_timestep_by_date(),
check_timestep_by_group()
Check required quantiles are present
check_quantiles(posterior, req_probs = c(0.5, 0.95, 0.2, 0.8))
posterior |
A |
req_probs |
A numeric vector of required probabilities. Default: c(0.5, 0.95, 0.2, 0.8). |
Functions used for checking inputs
check_design_matrix_sparsity(),
check_group(),
check_group_date_unique(),
check_max_delay(),
check_module(),
check_modules_compatible(),
check_numeric_timestep(),
check_observation_indicator(),
check_timestep(),
check_timestep_by_date(),
check_timestep_by_group()
This function verifies if the difference in dates in the provided
observations corresponds to the provided timestep. If the exact argument
is set to TRUE, the function checks if all differences exactly match the
timestep; otherwise, it checks if the sum of the differences modulo the
timestep equals zero. If the check fails, the function stops and returns an
error message.
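The two modes described above can be sketched in base R for a Date vector and a numeric timestep in days. The helper name, the deduplication of repeated dates, and the TRUE/FALSE return (the package stops with an error instead) are assumptions for illustration.

```r
# Hypothetical sketch: exact mode requires every gap to equal the timestep;
# non-exact mode only requires the summed gaps to be a multiple of it.
timestep_ok <- function(dates, timestep = 1, exact = TRUE) {
  diffs <- diff(sort(as.numeric(unique(dates)))) # gaps in days
  if (exact) {
    all(diffs == timestep)
  } else {
    sum(diffs) %% timestep == 0
  }
}

weekly <- as.Date("2021-10-04") + c(0, 7, 14)
gappy <- as.Date("2021-10-04") + c(0, 7, 21)
timestep_ok(weekly, 7)               # every gap is 7 days
timestep_ok(gappy, 7)                # fails: one gap is 14 days
timestep_ok(gappy, 7, exact = FALSE) # passes: 21 %% 7 == 0
```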
check_timestep(
  obs,
  date_var,
  timestep = "day",
  exact = TRUE,
  check_nrow = TRUE
)
obs |
Any of the types supported by |
date_var |
The variable in |
timestep |
The timestep to use. This can be a string ("day", "week") or a numeric whole number representing the number of days. Note that "month" is not currently supported in user-facing functions and will throw an error if used. |
exact |
Logical, if |
check_nrow |
Logical, if |
This function is used for its side effect of stopping if the check fails. If the check passes, the function returns invisibly.
Functions used for checking inputs
check_design_matrix_sparsity(),
check_group(),
check_group_date_unique(),
check_max_delay(),
check_module(),
check_modules_compatible(),
check_numeric_timestep(),
check_observation_indicator(),
check_quantiles(),
check_timestep_by_date(),
check_timestep_by_group()
This function verifies if the difference in dates within each date in the
provided observations corresponds to the provided timestep. This check is
performed for both report_date and reference_date and for each group in
obs.
check_timestep_by_date(obs, timestep = "day", exact = TRUE)
obs |
Any of the types supported by |
timestep |
The timestep to use. This can be a string ("day", "week") or a numeric whole number representing the number of days. Note that "month" is not currently supported in user-facing functions and will throw an error if used. |
exact |
Logical, if |
This function is used for its side effect of checking the timestep
by date in obs. If the check passes for all dates, the function
returns invisibly. Otherwise, it stops and returns an error message.
Functions used for checking inputs
check_design_matrix_sparsity(),
check_group(),
check_group_date_unique(),
check_max_delay(),
check_module(),
check_modules_compatible(),
check_numeric_timestep(),
check_observation_indicator(),
check_quantiles(),
check_timestep(),
check_timestep_by_group()
This function verifies if the difference in dates within each group in the
provided observations corresponds to the provided timestep. This check is
performed for the specified date_var and for each group in obs.
check_timestep_by_group(obs, date_var, timestep = "day", exact = TRUE)
obs |
Any of the types supported by |
date_var |
The variable in |
timestep |
The timestep to use. This can be a string ("day", "week") or a numeric whole number representing the number of days. Note that "month" is not currently supported in user-facing functions and will throw an error if used. |
exact |
Logical, if |
This function is used for its side effect of checking the timestep
by group in obs. If the check passes for all groups, the function
returns invisibly. Otherwise, it stops and returns an error message.
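A per-group check amounts to splitting the dates by group and applying the timestep check to each piece. The base-R sketch below does this for an exact daily timestep on toy data. The data, the use of a message rather than an error, and the exact-mode rule are assumptions for illustration.

```r
# Toy data (hypothetical): group 2 skips a day
obs <- data.frame(
  .group = c(1, 1, 1, 2, 2),
  reference_date = as.Date("2021-10-01") + c(0, 1, 2, 0, 2)
)
timestep <- 1

# Exact check applied separately to each group's dates
ok_by_group <- vapply(
  split(obs$reference_date, obs$.group),
  function(d) all(diff(sort(as.numeric(d))) == timestep),
  logical(1)
)
ok_by_group
if (!all(ok_by_group)) {
  message(
    "Timestep check failed for group(s): ",
    paste(names(ok_by_group)[!ok_by_group], collapse = ", ")
  )
}
```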
Functions used for checking inputs
check_design_matrix_sparsity(),
check_group(),
check_group_date_unique(),
check_max_delay(),
check_module(),
check_modules_compatible(),
check_numeric_timestep(),
check_observation_indicator(),
check_quantiles(),
check_timestep(),
check_timestep_by_date()
Provides consistent coercion of inputs to IDate with error handling
coerce_date(dates = NULL)
dates |
A vector-like input, which the function attempts
to coerce via |
If any of the elements of dates cannot be coerced,
this function will result in an error, indicating all indices
which cannot be coerced to IDate.
Internal methods of epinowcast assume dates are represented as IDate.
An IDate vector.
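The error handling described above can be approximated in base R. The sketch below uses the base Date class in place of data.table's IDate and assumes ISO "%Y-%m-%d" input. It reports every index that fails to parse. The helper name is hypothetical.

```r
# Hypothetical base-R analogue using Date instead of data.table's IDate
coerce_date_sketch <- function(dates) {
  out <- as.Date(as.character(dates), format = "%Y-%m-%d")
  # Indices that were present but could not be parsed
  bad <- which(is.na(out) & !is.na(dates))
  if (length(bad) > 0) {
    stop("Failed to coerce to Date at indices: ", toString(bad), call. = FALSE)
  }
  out
}

coerce_date_sketch(c("2020-05-28", "2020-05-29"))
tryCatch(
  coerce_date_sketch(c("2020-05-28", "2020-o5-29")),
  error = function(e) conditionMessage(e) # reports index 2
)
```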
Utility functions
coerce_dt(),
date_to_numeric_modulus(),
enw_get_data(),
enw_rolling_sum(),
get_internal_timestep(),
is.Date(),
stan_fns_as_string()
# works
coerce_date(c("2020-05-28", "2020-05-29"))
# fails; the error indicates that index 2 is the problem
tryCatch(
  coerce_date(c("2020-05-28", "2020-o5-29")),
  error = function(e) {
    print(e)
  }
)
Provides consistent coercion of inputs to data.table with error handling, column checking, and optional selection.
coerce_dt(
  data,
  select = NULL,
  required_cols = select,
  forbidden_cols = NULL,
  group = FALSE,
  dates = FALSE,
  copy = TRUE,
  msg_required = "The following columns are required: ",
  msg_forbidden = "The following columns are forbidden: "
)
data |
Any of the types supported by |
select |
An optional character vector of columns to return; unchecked
n.b. it is an error to include ".group"; use |
required_cols |
An optional character vector of required columns |
forbidden_cols |
An optional character vector of forbidden columns |
group |
A logical; ensure the presence of a |
dates |
A logical; ensure the presence of |
copy |
A logical; if |
msg_required |
A character string; for |
msg_forbidden |
A character string; for |
This function provides a single-point function for getting a "local"
version of data provided by the user, in the internally used data.table
format. It also enables selectively copying versus not, as well as checking
for the presence and/or absence of various columns.
While it is intended to address garbage in from the user, it does not generally attempt to address garbage in from the developer; for example, it does not guard against overlapping required and forbidden columns (although such a request will always lead to an error).
When dates = TRUE, this function ensures that report_date and
reference_date columns are coerced to IDate class with integer storage
mode. This is necessary because some operations (such as dplyr::filter())
can convert IDate columns to double storage mode whilst preserving the
class, which violates data.table's requirements and causes errors in
subsequent date arithmetic operations.
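The required and forbidden column checks can be sketched in base R on a plain data.frame. The default messages below mirror the msg_required and msg_forbidden defaults in the usage above. The helper `check_columns()` itself is hypothetical and skips the coercion, selection, and copy handling that coerce_dt() performs.

```r
# Hypothetical sketch of required/forbidden column checks on a data.frame
check_columns <- function(data, required_cols = NULL, forbidden_cols = NULL,
                          msg_required = "The following columns are required: ",
                          msg_forbidden = "The following columns are forbidden: ") {
  missing_cols <- setdiff(required_cols, names(data))
  if (length(missing_cols) > 0) {
    stop(msg_required, toString(missing_cols), call. = FALSE)
  }
  present_forbidden <- intersect(forbidden_cols, names(data))
  if (length(present_forbidden) > 0) {
    stop(msg_forbidden, toString(present_forbidden), call. = FALSE)
  }
  invisible(data)
}

check_columns(mtcars, required_cols = c("mpg", "cyl")) # passes silently
tryCatch(
  check_columns(mtcars, required_cols = "reference_date"),
  error = function(e) conditionMessage(e)
)
```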
A data.table; the returned object will be a copy, unless
copy = FALSE, in which case modifications are made in-place
Utility functions
coerce_date(),
date_to_numeric_modulus(),
enw_get_data(),
enw_rolling_sum(),
get_internal_timestep(),
is.Date(),
stan_fns_as_string()
Takes an ARIMA term as defined by arima() and returns
the metadata required to wire the term into a Stan model. Unlike
construct_rw(), this does not modify the data or produce design
matrix columns; ARIMA latent residuals enter the linear predictor
through a parameter-dependent kernel applied to unit-normal shocks
(see inst/stan/functions/arima_kernel.stan).
construct_arima(arima, data)
arima |
An ARIMA term as defined by |
data |
A |
A list with the following elements:
time, by, p, d, q: passed through from the arima()
term.
T: number of distinct time points in the series.
G: number of groups (1 if by is unspecified).
time_idx: integer vector mapping each row of data to a
time index in 1:T.
group_idx: integer vector mapping each row of data to a
group index in 1:G.
time_vals, group_levels: lookup vectors so the indices can
be inverted.
name: a label for the term, suitable as a parameter prefix.
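The index elements of this list can be illustrated with base R's match(). The sketch below builds T, G, time_idx, and group_idx for a toy data frame and shows that the lookup vectors invert the indices. Sorting the unique values to define the lookup order is an assumption, and `T_len` stands in for T to avoid masking R's T shorthand.

```r
# Hypothetical toy data: weekly time index and a grouping variable
toy <- data.frame(
  week = c(1, 1, 2, 2, 3, 3),
  day_of_week = c("Mon", "Tue", "Mon", "Tue", "Mon", "Tue")
)

time_vals <- sort(unique(toy$week))
group_levels <- sort(unique(toy$day_of_week))
T_len <- length(time_vals)  # T: number of distinct time points
G <- length(group_levels)   # G: number of groups
time_idx <- match(toy$week, time_vals)             # each row -> 1:T
group_idx <- match(toy$day_of_week, group_levels)  # each row -> 1:G

# The lookup vectors invert the indices
identical(time_vals[time_idx], toy$week)
```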
Functions used to help convert formulas into model designs
ar(),
arima(),
arima_terms(),
arma(),
as_string_formula(),
construct_gp(),
construct_re(),
construct_rw(),
enw_formula(),
enw_manual_formula(),
gp(),
gp_terms(),
ma(),
parse_formula(),
re(),
remove_arima_terms(),
remove_gp_terms(),
remove_rw_terms(),
rw(),
rw_terms(),
split_formula_to_terms()
data <- enw_example("preproc")$metareference[[1]]
epinowcast:::construct_arima(arima(week), data)
epinowcast:::construct_arima(
  arima(week, day_of_week, p = 2, d = 1),
  data
)
Takes a Gaussian process term as defined by gp() and
returns the metadata required to wire the term into a Stan model.
Like construct_arima(), this does not modify the data or produce
design matrix columns; the Gaussian process enters the linear
predictor through a Hilbert-space reduced-rank approximation (see
inst/stan/functions/gaussian_process.stan).
construct_gp(gp, data)
gp |
A Gaussian process term as defined by |
data |
A |
A list with the following elements:
time, by, kernel, gp_type, nu, d, basis_prop,
boundary_scale: passed through from the gp() term.
T: number of distinct time points in the integrated series.
G: number of groups (1 if by is unspecified).
M: number of basis functions, ceiling(basis_prop * (T - d)).
PHI: the (T - d) x M basis matrix. For d >= 1 the basis is
built on the T - d free values that are integrated d times in
Stan; the first d values of the realisation are anchored to
zero.
time_idx, group_idx: per-observation lookup indices.
time_vals, group_levels: lookup vectors so the indices can
be inverted.
name: a label for the term, suitable as a parameter prefix.
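For intuition about PHI, the sketch below builds the standard Hilbert-space basis of sine functions on an interval [-L, L] for T - d points, with M = ceiling(basis_prop * (T - d)) columns as listed above. The centring and rescaling of the time points to [-1, 1], and the values of basis_prop and boundary_scale, are assumptions for illustration; the package's exact construction may differ.

```r
# Hypothetical sketch of a Hilbert-space GP basis matrix
hsgp_basis <- function(n_points, basis_prop = 0.2, boundary_scale = 1.5) {
  x <- seq_len(n_points)
  x <- (x - mean(x)) / max(abs(x - mean(x))) # centre and rescale to [-1, 1]
  L <- boundary_scale * max(abs(x))          # half-width of the domain
  M <- ceiling(basis_prop * n_points)        # number of basis functions
  j <- seq_len(M)
  # phi_j(x) = sin(j * pi * (x + L) / (2 * L)) / sqrt(L)
  outer(x, j, function(x, j) sin(j * pi * (x + L) / (2 * L)) / sqrt(L))
}

T_len <- 20
d <- 1
PHI <- hsgp_basis(T_len - d, basis_prop = 0.2)
dim(PHI) # (T - d) rows, M = ceiling(0.2 * 19) = 4 columns
```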
Functions used to help convert formulas into model designs
ar(),
arima(),
arima_terms(),
arma(),
as_string_formula(),
construct_arima(),
construct_re(),
construct_rw(),
enw_formula(),
enw_manual_formula(),
gp(),
gp_terms(),
ma(),
parse_formula(),
re(),
remove_arima_terms(),
remove_gp_terms(),
remove_rw_terms(),
rw(),
rw_terms(),
split_formula_to_terms()
data <- enw_example("preproc")$metareference[[1]]
epinowcast:::construct_gp(gp(week), data)
epinowcast:::construct_gp(gp(week, day_of_week, kernel = "se"), data)
Constructs random effect terms
construct_re(re, data)
re |
A random effect as defined using |
data |
A |
A list containing the transformed data ("data"),
fixed effects terms ("terms") and a data.frame specifying
the random effect structure between these terms ("effects"). Note
that if the specified random effect was not a factor it will have been
converted into one.
Functions used to help convert formulas into model designs
ar(),
arima(),
arima_terms(),
arma(),
as_string_formula(),
construct_arima(),
construct_gp(),
construct_rw(),
enw_formula(),
enw_manual_formula(),
gp(),
gp_terms(),
ma(),
parse_formula(),
re(),
remove_arima_terms(),
remove_gp_terms(),
remove_rw_terms(),
rw(),
rw_terms(),
split_formula_to_terms()
# Simple examples
form <- epinowcast:::parse_formula(~ 1 + (1 | day_of_week))
data <- enw_example("prepr")$metareference[[1]]
random_effect <- re(form$random[[1]])
epinowcast:::construct_re(random_effect, data)
# A more complex example
form <- epinowcast:::parse_formula(
  ~ 1 + disp + (1 + gear | cyl) + (0 + wt | am)
)
random_effect <- re(form$random[[1]])
epinowcast:::construct_re(random_effect, mtcars)
random_effect2 <- re(form$random[[2]])
epinowcast:::construct_re(random_effect2, mtcars)
This function takes random walks as defined
by rw(), produces the required additional variables
(denoted using a "c" prefix and constructed using
enw_add_cumulative_membership()), and then returns the
extended data.frame along with the new fixed effects and the
random effect structure.
construct_rw(rw, data)
rw |
A random walk term as defined by |
data |
A |
A list containing the following:
data: The input data.frame with the addition of the new variables
required by the specified random walk. These are added using
enw_add_cumulative_membership().
terms: A character vector of new fixed effects terms to add to a model
formula.
effects: A data.frame describing the random effect structure of the
new effects.
Functions used to help convert formulas into model designs
ar(),
arima(),
arima_terms(),
arma(),
as_string_formula(),
construct_arima(),
construct_gp(),
construct_re(),
enw_formula(),
enw_manual_formula(),
gp(),
gp_terms(),
ma(),
parse_formula(),
re(),
remove_arima_terms(),
remove_gp_terms(),
remove_rw_terms(),
rw(),
rw_terms(),
split_formula_to_terms()
data <- enw_example("preproc")$metareference[[1]]
epinowcast:::construct_rw(rw(week), data)
epinowcast:::construct_rw(rw(week, day_of_week), data)
This function allows the construction of convolution matrices, which can be combined with a vector of primary events to produce a vector of secondary events, for example in the form of a renewal equation or to simulate reporting delays. Time-varying delays are supported, as is distribution padding (to allow for use in renewal-equation-like approaches).
convolution_matrix(dist, t, include_partial = FALSE)
dist |
A vector or list of vectors describing the distribution to be convolved as a probability mass function. |
t |
Integer value indicating the number of time steps to convolve over. |
include_partial |
Logical, defaults to FALSE. If TRUE, the convolution includes partially complete secondary events. |
A matrix with each column indicating a primary event and each row indicating a secondary event.
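As an illustration of the layout described above (columns are primary events, rows are secondary events), here is a minimal base R sketch for a static delay distribution. The helper sketch_convolution_matrix() is hypothetical. It makes no attempt to reproduce the package's include_partial handling, padding, or time-varying delays.

```r
# Hypothetical helper: a static, lower-triangular convolution matrix.
# Row i is a secondary time, column j a primary time, and entry [i, j]
# is the probability of a delay of (i - j) steps, i.e. pmf[i - j + 1].
sketch_convolution_matrix <- function(pmf, t) {
  m <- matrix(0, nrow = t, ncol = t)
  for (i in seq_len(t)) {
    for (j in seq_len(i)) {
      d <- i - j + 1
      if (d <= length(pmf)) m[i, j] <- pmf[d]
    }
  }
  m
}

pmf <- c(0.5, 0.3, 0.2)           # delay PMF for delays 0, 1, 2
primary <- c(10, 20, 30, 40, 50)  # primary events per time step
m <- sketch_convolution_matrix(pmf, length(primary))
# Each secondary count is a PMF-weighted sum of earlier primary counts
secondary <- as.vector(m %*% primary)
secondary
```

The first two secondary counts here only include part of the PMF, which is the kind of partially complete convolution that the include_partial argument refers to.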
Helper functions for model modules
add_max_observed_delay(),
add_pmfs(),
enw_dayofweek_structural_reporting(),
enw_reference_by_report(),
enw_reps_with_complete_refs(),
enw_structural_reporting_metadata(),
extract_obs_metadata(),
extract_sparse_matrix(),
latest_obs_as_matrix()
# Simple convolution matrix with a static distribution
convolution_matrix(c(1, 2, 3), 10)
# Include partially reported convolutions
convolution_matrix(c(1, 2, 3), 10, include_partial = TRUE)
# Use a list of distributions
convolution_matrix(rep(list(c(1, 2, 3)), 10), 10)
# Use a time-varying list of distributions
convolution_matrix(c(rep(list(c(1, 2, 3)), 10), list(c(4, 5, 6))), 11)
This function processes a date column in a data.table, converting it to a
numeric representation and then computing the modulus with the provided
timestep.
date_to_numeric_modulus(dt, date_column, timestep)
dt |
A data.table. |
date_column |
A character string representing the name of the date column in dt. |
timestep |
An integer representing the internal timestep. |
A modified data.table with two new columns: one for the numeric representation of the date minus the minimum date and another for its modulus with the timestep.
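The arithmetic described above can be sketched in base R on a plain data.frame. The column names numeric_date and numeric_date_mod are hypothetical and may not match the names the package uses.

```r
# Hypothetical column names; this only illustrates the arithmetic.
df <- data.frame(date = as.Date("2022-10-16") + c(0, 3, 7, 10, 14))
timestep <- 7

# Days since the minimum date
df$numeric_date <- as.numeric(df$date - min(df$date))
# Position within the timestep (0 means aligned with the minimum date)
df$numeric_date_mod <- df$numeric_date %% timestep
df
```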
Utility functions
coerce_date(),
coerce_dt(),
enw_get_data(),
enw_rolling_sum(),
get_internal_timestep(),
is.Date(),
stan_fns_as_string()
Calculate cumulative reported cases from incidence of new reports
enw_add_cumulative(obs, by = NULL, copy = TRUE)
obs |
A |
by |
A character vector describing the stratification of observations. This defaults to no grouping. This should be used when modelling multiple time series in order to identify them for downstream modelling |
copy |
Should |
The input data.frame with a new variable confirm.
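Conceptually, cumulative reports are a running sum of new reports over report dates within each reference date. The following base R sketch shows that idea on hypothetical toy data. It does not handle the by stratification or the other checks the package function may perform.

```r
# Toy incidence: new reports by reference and report date (hypothetical data)
obs <- data.frame(
  reference_date = as.Date("2021-10-01") + c(0, 0, 0, 1, 1),
  report_date = as.Date("2021-10-01") + c(0, 1, 2, 1, 2),
  new_confirm = c(3, 2, 1, 4, 1)
)
# Order by report date within each reference date, then cumulate
obs <- obs[order(obs$reference_date, obs$report_date), ]
obs$confirm <- ave(
  obs$new_confirm, as.character(obs$reference_date), FUN = cumsum
)
obs
```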
Data converters
enw_add_incidence(),
enw_aggregate_cumulative(),
enw_incidence_to_linelist(),
enw_linelist_to_incidence()
# Default reconstruct incidence
dt <- germany_covid19_hosp[location == "DE"][age_group == "00+"]
dt <- enw_filter_reference_dates_by_report_start(dt)
dt <- enw_add_incidence(dt)
dt <- dt[, confirm := NULL]
enw_add_cumulative(dt)
# Make use of maximum reported to calculate empirical daily reporting
enw_add_cumulative(dt)
data.frame
This function adds a cumulative membership effect to a data
frame. This is useful for specifying models such as random walks (using
rw()) where these features can be used in the design matrix with the
appropriate formula. Supports grouping via the optional .group column.
Note that cumulative membership is indexed to start with zero (i.e. the
first observation is assigned a cumulative membership of zero).
enw_add_cumulative_membership(metaobs, feature, copy = TRUE)
metaobs |
A |
feature |
The name of the column in |
copy |
Should |
A data.frame with new columns, named with a "c" prefix followed by the
feature name and value, that contain the cumulative membership effect for
each value of feature. For example, if the original feature was week (with
numeric entries 1, 2, 3), then the new columns will be cweek1, cweek2, and cweek3.
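Cumulative membership columns pair naturally with random walks. Multiplying them by per-step increments accumulates those increments over time. The following base R sketch illustrates this under stated assumptions. The helper cumulative_membership() is hypothetical, it ignores the .group column, and its column naming (c2, c3) is chosen for the sketch and may differ from the package's.

```r
# Hypothetical helper, not epinowcast's implementation.
# Column k is 1 once the feature has reached its k-th distinct value
# (k >= 2); the first value is the baseline and is all zeros.
cumulative_membership <- function(x) {
  lv <- sort(unique(x))[-1]
  out <- vapply(lv, function(l) as.numeric(x >= l), numeric(length(x)))
  out <- matrix(out, nrow = length(x))
  colnames(out) <- paste0("c", lv)
  out
}

week <- c(1, 1, 2, 3, 3)
X <- cumulative_membership(week)
# Multiplying by per-step increments sums them up over time: a random walk
increments <- c(0.5, -0.2)
effect <- as.vector(X %*% increments)
effect
```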
Functions used to formulate models
enw_add_pooling_effect(),
enw_design(),
enw_effects_metadata(),
enw_one_hot_encode_feature()
metaobs <- data.frame(week = 1:2)
enw_add_cumulative_membership(metaobs, "week")
metaobs <- data.frame(week = 1:3, .group = c(1, 1, 2))
enw_add_cumulative_membership(metaobs, "week")
This helper function takes a data.frame or data.table of
observations and adds the delay (numeric, in days) between reference_date
and report_date for each observation.
enw_add_delay(obs, timestep = "day", copy = TRUE)
obs |
A |
timestep |
The timestep to use. This can be a string ("day", "week") or a numeric whole number representing the number of days. Note that "month" is not currently supported in user-facing functions and will throw an error if used. |
copy |
Should |
A data.table of observations with a new column delay.
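The delay is the difference between report_date and reference_date. The following base R sketch computes it in days and then, as an assumption about how a non-daily timestep might be expressed, divides by the number of days per timestep. The column names delay_days and timestep_days are hypothetical.

```r
obs <- data.frame(
  reference_date = as.Date("2021-01-01"),
  report_date = as.Date("2021-01-01") + c(0, 7, 14)
)
timestep_days <- 7  # assumed: a weekly timestep expressed in days

# Delay in days between reference and report date
obs$delay_days <- as.numeric(
  difftime(obs$report_date, obs$reference_date, units = "days")
)
# Assumed conversion to units of the timestep
obs$delay <- obs$delay_days / timestep_days
obs
```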
Preprocessing functions
enw_add_max_reported(),
enw_add_metaobs_features(),
enw_assign_group(),
enw_complete_dates(),
enw_construct_data(),
enw_extend_date(),
enw_filter_delay(),
enw_filter_reference_dates(),
enw_filter_reference_dates_by_report_start(),
enw_filter_report_dates(),
enw_flag_observed_observations(),
enw_impute_na_observations(),
enw_latest_data(),
enw_metadata(),
enw_metadata_delay(),
enw_missing_reference(),
enw_obs_at_delay(),
enw_preprocess_data(),
enw_reporting_triangle(),
enw_reporting_triangle_to_long(),
enw_retrospective()
obs <- data.frame(report_date = as.Date("2021-01-01") + -2:0)
obs$reference_date <- as.Date("2021-01-01")
enw_add_delay(obs)
Computes incident counts from cumulative
reports. Users should typically call
enw_filter_reference_dates_by_report_start() before
this function to remove reference dates that precede the
earliest report date, which would otherwise produce
spurious leading entries.
enw_add_incidence(obs, set_negatives_to_zero = TRUE, by = NULL, copy = TRUE)
obs |
A |
set_negatives_to_zero |
Logical, defaults to TRUE.
Should negative counts (for calculated incidence of
observations) be set to zero? Currently downstream
modelling does not support negative counts and so
setting must be TRUE if intending to use
|
by |
A character vector describing the stratification of observations. This defaults to no grouping. This should be used when modelling multiple time series in order to identify them for downstream modelling |
copy |
Should |
The input data.frame with a new variable
new_confirm. If max_confirm is present in the
data.frame, then the proportion reported on each day
(prop_reported) will also be added.
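Incidence is the first difference of cumulative counts within each reference date. A downward revision in the cumulative series produces a negative difference, which set_negatives_to_zero = TRUE clips to zero. The following base R sketch uses hypothetical toy data. It treats the first cumulative count for each reference date as its first incident count, which is why filtering out reference dates that precede the first report date matters.

```r
obs <- data.frame(
  reference_date = as.Date("2021-10-01") + c(0, 0, 0, 1, 1),
  report_date = as.Date("2021-10-01") + c(0, 1, 2, 1, 2),
  confirm = c(3, 5, 4, 4, 5)  # note the downward revision from 5 to 4
)
obs <- obs[order(obs$reference_date, obs$report_date), ]
# First difference within each reference date
obs$new_confirm <- ave(
  obs$confirm, as.character(obs$reference_date),
  FUN = function(x) c(x[1], diff(x))
)
# Analogue of set_negatives_to_zero = TRUE
obs$new_confirm <- pmax(obs$new_confirm, 0)
obs
```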
Data converters
enw_add_cumulative(),
enw_aggregate_cumulative(),
enw_incidence_to_linelist(),
enw_linelist_to_incidence()
# Default reconstruct incidence
dt <- germany_covid19_hosp[location == "DE"][age_group == "00+"]
dt <- enw_filter_reference_dates_by_report_start(dt)
enw_add_incidence(dt)
# Make use of maximum reported to calculate empirical
# daily reporting
dt <- germany_covid19_hosp[location == "DE"][
  age_group == "00+"
]
dt <- enw_add_max_reported(dt)
dt <- enw_filter_reference_dates_by_report_start(dt)
enw_add_incidence(dt)
Add the latest observations to the nowcast output. This is useful for plotting the nowcast against the latest observations.
enw_add_latest_obs_to_nowcast(nowcast, obs)
nowcast |
A |
obs |
An observation |
A data.frame of nowcast output with the latest observations
added.
Functions used for postprocessing of model fits
build_ord_obs(),
enw_nowcast_samples(),
enw_nowcast_summary(),
enw_posterior(),
enw_pp_summary(),
enw_quantiles_to_long(),
enw_summarise_samples(),
subset_obs()
fit <- enw_example("nowcast")
obs <- enw_example("obs")
nowcast <- summary(fit, type = "nowcast")
enw_add_latest_obs_to_nowcast(nowcast, obs)
reference_date
This is a helper function which adds the maximum (in the sense of latest observed) number of reported cases for each reference_date and computes the proportion of already reported cases for each combination of reference_date and report_date.
enw_add_max_reported(obs, copy = TRUE)
obs |
A |
copy |
Should |
A data.table with new columns max_confirm and cum_prop_reported.
max_confirm is the maximum number of cases reported for a given
reference_date. cum_prop_reported is the proportion of cases for a given
reference_date that have been reported up to a given report_date, relative
to all cases so far observed for this reference_date.
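Because "maximum" here means the latest observed count, a minimal base R sketch takes the cumulative count at the latest report date for each reference date and divides each cumulative count by it. This mirrors the description, not necessarily the package's code, and reuses the toy data from the example below.

```r
obs <- data.frame(
  report_date = as.Date("2021-01-01") + 0:2,
  reference_date = as.Date("2021-01-01"),
  confirm = 1:3
)
obs <- obs[order(obs$reference_date, obs$report_date), ]
# Latest observed cumulative count for each reference date
obs$max_confirm <- ave(
  obs$confirm, as.character(obs$reference_date),
  FUN = function(x) rep(x[length(x)], length(x))
)
# Share of the latest count already reported by each report date
obs$cum_prop_reported <- obs$confirm / obs$max_confirm
obs
```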
Preprocessing functions
enw_add_delay(),
enw_add_metaobs_features(),
enw_assign_group(),
enw_complete_dates(),
enw_construct_data(),
enw_extend_date(),
enw_filter_delay(),
enw_filter_reference_dates(),
enw_filter_reference_dates_by_report_start(),
enw_filter_report_dates(),
enw_flag_observed_observations(),
enw_impute_na_observations(),
enw_latest_data(),
enw_metadata(),
enw_metadata_delay(),
enw_missing_reference(),
enw_obs_at_delay(),
enw_preprocess_data(),
enw_reporting_triangle(),
enw_reporting_triangle_to_long(),
enw_retrospective()
obs <- data.frame(report_date = as.Date("2021-01-01") + 0:2)
obs$reference_date <- as.Date("2021-01-01")
obs$confirm <- 1:3
enw_add_max_reported(obs)
If not already present, annotates time series data with metadata commonly used in models: day of week, and days, weeks, and months since start of time series.
enw_add_metaobs_features( metaobs, holidays = NULL, holidays_to = "Sunday", datecol = "date" )
metaobs |
Raw data, coercible via |
holidays |
a (potentially empty) vector of dates (or input
coercible to such; see |
holidays_to |
A character string to assign to holidays, when |
datecol |
The column in |
Effects models often need to include covariates for time-based features, such as day of the week (e.g. to reflect different care-seeking and/or reporting behaviour).
This function is called from within enw_preprocess_data() to systematically
annotate metaobs with these commonly used metadata, if not already present.
However, it can also be used directly on other data.
A copy of the metaobs input, with additional columns:
day_of_week, a factor of values as output from weekdays(), plus the
holidays_to value if it is distinct from the weekdays values
day, numeric, 0 based from start of time series
week, numeric, 0 based from start of time series
month, numeric, 0 based from start of time series
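The features listed above can be approximated in base R as follows. This is a sketch under stated assumptions: weeks are taken as whole 7-day blocks since the start, months as calendar months elapsed since the start month, and weekday names depend on the locale. The package may define these differently.

```r
dates <- as.Date("2021-04-01") + 0:39
holidays <- as.Date(c("2021-04-04", "2021-04-05"))
holidays_to <- "Holiday"

start <- min(dates)
day_of_week <- weekdays(dates)                   # locale-dependent names
day_of_week[dates %in% holidays] <- holidays_to  # relabel holidays
start_lt <- as.POSIXlt(start)
dates_lt <- as.POSIXlt(dates)

meta <- data.frame(
  date = dates,
  day_of_week = factor(day_of_week),
  day = as.numeric(dates - start),            # 0-based days since start
  week = as.numeric(dates - start) %/% 7,     # assumed: 7-day blocks
  # Assumed definition: calendar months elapsed since the start month
  month = (dates_lt$year * 12 + dates_lt$mon) -
    (start_lt$year * 12 + start_lt$mon)
)
head(meta, 8)
```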
Preprocessing functions
enw_add_delay(),
enw_add_max_reported(),
enw_assign_group(),
enw_complete_dates(),
enw_construct_data(),
enw_extend_date(),
enw_filter_delay(),
enw_filter_reference_dates(),
enw_filter_reference_dates_by_report_start(),
enw_filter_report_dates(),
enw_flag_observed_observations(),
enw_impute_na_observations(),
enw_latest_data(),
enw_metadata(),
enw_metadata_delay(),
enw_missing_reference(),
enw_obs_at_delay(),
enw_preprocess_data(),
enw_reporting_triangle(),
enw_reporting_triangle_to_long(),
enw_retrospective()
# make some example data
nat_germany_hosp <- subset(
  germany_covid19_hosp, location == "DE" & age_group == "80+"
)[1:40]
basemeta <- enw_add_metaobs_features(
  nat_germany_hosp, datecol = "report_date"
)
basemeta
# with holidays - n.b.: holidays not found are silently ignored
holidaymeta <- enw_add_metaobs_features(
  nat_germany_hosp, datecol = "report_date",
  holidays = c(
    "2021-04-04", "2021-04-05",
    "2021-05-01", "2021-05-13",
    "2021-05-24"
  ),
  holidays_to = "Holiday"
)
holidaymeta
subset(holidaymeta, day_of_week == "Holiday")
This function adds a pooling effect to the metadata
returned by enw_effects_metadata(). It does this by updating the
fixed column to 0 for the effects that match the string argument and
adding a new column, named by var_name, that is 1 for the effects that
match the string argument and 0 otherwise.
enw_add_pooling_effect(effects, var_name = "sd", finder_fn = startsWith, ...)
effects |
A
This is the output of |
var_name |
The name of the new column that will be added to the
|
finder_fn |
A function that will be used to find the effects that
match the string. Defaults to |
... |
Additional arguments to |
A data.table with the following columns:
effects: the name of the effect
fixed: a logical indicating whether the effect is fixed (1) or random (0).
Argument supplied to var_name: a logical indicating whether the effect
should be pooled (1) or not (0).
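The update described above can be mimicked in base R on a small effects table. The table below is a hypothetical stand-in for the output of enw_effects_metadata(). As in the example further down, startsWith() is used as the finder function, with its prefix argument supplied directly.

```r
# Hypothetical effects table in the shape described above
effects <- data.frame(
  effects = c("b2", "b3", "c"), fixed = c(1, 1, 1),
  stringsAsFactors = FALSE
)
var_name <- "sd"

# Find effects whose names start with "b"
is_match <- startsWith(effects$effects, prefix = "b")
# Matching effects are no longer fixed ...
effects$fixed[is_match] <- 0
# ... and are flagged in the new pooling column
effects[[var_name]] <- as.numeric(is_match)
effects
```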
Functions used to formulate models
enw_add_cumulative_membership(),
enw_design(),
enw_effects_metadata(),
enw_one_hot_encode_feature()
data <- data.frame(a = 1:3, b = as.character(1:3), c = c(1, 1, 2))
design <- enw_design(a ~ b + c, data)$design
effects <- enw_effects_metadata(design)
enw_add_pooling_effect(effects, prefix = "b")
This function aggregates observations over a specified timestep,
ensuring alignment on the same day of week for report and reference dates.
It is useful, for example, for aggregating data to a weekly timestep, which
may be desirable when testing a model with a weekly timestep or when runtime
is a major concern. Note that the start of the timestep will be
determined by min_reference_date + a single timestep (i.e. the
first timestep will be "2022-10-23" if the minimum reference date is
"2022-10-16"). Observations where the report dates do not form a complete
timestep will be dropped from the aggregated output.
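The date arithmetic in the note above can be checked with a short base R sketch. It assumes that report dates after the last complete timestep boundary are the ones dropped. That is a reading of the description, not the package's implementation.

```r
min_reference_date <- as.Date("2022-10-16")
timestep <- 7
report_dates <- seq(min_reference_date, as.Date("2022-11-10"), by = "day")

# Boundaries start one timestep after the minimum reference date
boundaries <- seq(
  min_reference_date + timestep, max(report_dates), by = timestep
)
# Assumed rule: report dates beyond the last complete boundary are dropped
kept <- report_dates[report_dates <= max(boundaries)]
dropped <- report_dates[report_dates > max(boundaries)]
boundaries
```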
enw_aggregate_cumulative( obs, timestep = "day", by = NULL, min_reference_date = min(obs$reference_date, na.rm = TRUE), copy = TRUE )
obs |
An object coercible to a |
timestep |
The timestep to use. This can be a string ("day", "week") or a numeric whole number representing the number of days. Note that "month" is not currently supported in user-facing functions and will throw an error if used. |
by |
A character vector of variables to also aggregate by (i.e. as well
as using the |
min_reference_date |
The minimum reference date to start the
aggregation from. Note that the timestep will start from the minimum
reference date + a single time step (i.e. the first timestep will be
"2022-10-23" if the minimum reference date is "2022-10-16"). The default
is the minimum reference date in the |
copy |
Should |
A data.table with aggregated observations.
Data converters
enw_add_cumulative(),
enw_add_incidence(),
enw_incidence_to_linelist(),
enw_linelist_to_incidence()
nat_hosp <- germany_covid19_hosp[location == "DE"][age_group == "00+"]
enw_aggregate_cumulative(nat_hosp, timestep = "week")
Assign a group to each row of a data.table. If by is
specified, then each unique combination of the columns in by will
be assigned a unique group. If by is not specified, then all rows
will be assigned to the same group.
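The grouping rule above (one group per unique combination of the by columns, otherwise a single group) can be sketched in base R. The helper assign_group_sketch() is hypothetical. It numbers groups by order of first appearance and does not reproduce the package's ordering by .group and key.

```r
# Hypothetical helper, not epinowcast's implementation
assign_group_sketch <- function(obs, by = NULL) {
  if (length(by) == 0) {
    obs$.group <- 1L  # no stratification: a single group
  } else {
    # One key per row from the by columns, then number unique keys
    keys <- do.call(paste, c(obs[by], sep = "|"))
    obs$.group <- match(keys, unique(keys))
  }
  obs
}

obs <- data.frame(
  location = c("DE", "DE", "FR", "FR"),
  age_group = c("00+", "80+", "00+", "00+")
)
assign_group_sketch(obs, by = c("location", "age_group"))
```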
enw_assign_group(obs, by = NULL, copy = TRUE)
obs |
A |
by |
A character vector of column names to group by. Defaults to an empty vector. |
copy |
A logical; make a copy (default) of |
A data.table with a .group column added, ordered by .group
and the existing key of obs.
Preprocessing functions
enw_add_delay(),
enw_add_max_reported(),
enw_add_metaobs_features(),
enw_complete_dates(),
enw_construct_data(),
enw_extend_date(),
enw_filter_delay(),
enw_filter_reference_dates(),
enw_filter_reference_dates_by_report_start(),
enw_filter_report_dates(),
enw_flag_observed_observations(),
enw_impute_na_observations(),
enw_latest_data(),
enw_metadata(),
enw_metadata_delay(),
enw_missing_reference(),
enw_obs_at_delay(),
enw_preprocess_data(),
enw_reporting_triangle(),
enw_reporting_triangle_to_long(),
enw_retrospective()
obs <- data.frame(x = 1:3, y = 1:3)
enw_assign_group(obs)
enw_assign_group(obs, by = "x")
Ensures that all reference and report dates are present for
all groups based on the maximum and minimum dates found in the data.
This function may be of use to users when preprocessing their data. In
general all features that you may consider using as grouping variables
or as covariates need to be included in the by variable.
enw_complete_dates( obs, by = NULL, max_delay, min_date = min(obs$reference_date, na.rm = TRUE), max_date = max(obs$report_date, na.rm = TRUE), timestep = "day", missing_reference = TRUE, completion_beyond_max_report = FALSE, flag_observation = FALSE )
obs |
A |
by |
A character vector describing the stratification of observations. This defaults to no grouping. This should be used when modelling multiple time series in order to identify them for downstream modelling |
max_delay |
The maximum delay to model in the delay
distribution, specified in units of the timestep (e.g., if
Steps to take to determine the maximum delay:
Note that delays are zero indexed and so include the reference date
and |
min_date |
The minimum date to include in the data. Defaults to the minimum reference date found in the data. |
max_date |
The maximum date to include in the data. Defaults to the maximum report date found in the data. |
timestep |
The timestep to use. This can be a string ("day", "week") or a numeric whole number representing the number of days. Note that "month" is not currently supported in user-facing functions and will throw an error if used. |
missing_reference |
Logical, should entries for cases with missing reference date be completed as well? Default: TRUE |
completion_beyond_max_report |
Logical, should entries be completed beyond the maximum date found in the data? Default: FALSE |
flag_observation |
Logical, should observations that have been
imputed as missing be flagged as not observed? Makes use of
|
A data.table with completed entries for all combinations of
reference dates, groups and possible report dates.
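The core of the completion step is a grid of reference and report date combinations within the allowed delay range, joined to the observed rows. The following base R sketch builds that grid for the toy data used in the example below. It assumes that delays 0 to max_delay - 1 are kept. It leaves unobserved counts as NA, because how the package fills them is not shown here.

```r
obs <- data.frame(
  report_date = as.Date(c("2021-10-01", "2021-10-03")),
  reference_date = as.Date("2021-10-01"),
  confirm = 1
)
min_date <- min(obs$reference_date)
max_date <- max(obs$report_date)
max_delay <- 10  # assumed: delays 0 to max_delay - 1 are kept

# All reference/report date pairs with a non-negative, in-range delay
dates <- seq(min_date, max_date, by = "day")
grid <- expand.grid(reference_date = dates, report_date = dates)
delay <- as.numeric(grid$report_date - grid$reference_date)
grid <- grid[delay >= 0 & delay < max_delay, ]

# Join observed counts; unobserved combinations are NA in this sketch
completed <- merge(
  grid, obs, by = c("reference_date", "report_date"), all.x = TRUE
)
completed
```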
Preprocessing functions
enw_add_delay(),
enw_add_max_reported(),
enw_add_metaobs_features(),
enw_assign_group(),
enw_construct_data(),
enw_extend_date(),
enw_filter_delay(),
enw_filter_reference_dates(),
enw_filter_reference_dates_by_report_start(),
enw_filter_report_dates(),
enw_flag_observed_observations(),
enw_impute_na_observations(),
enw_latest_data(),
enw_metadata(),
enw_metadata_delay(),
enw_missing_reference(),
enw_obs_at_delay(),
enw_preprocess_data(),
enw_reporting_triangle(),
enw_reporting_triangle_to_long(),
enw_retrospective()
obs <- data.frame(
  report_date = c("2021-10-01", "2021-10-03"),
  reference_date = "2021-10-01",
  confirm = 1
)
enw_complete_dates(obs)
# Allow completion beyond the maximum date found in the data
enw_complete_dates(obs, completion_beyond_max_report = TRUE, max_delay = 10)
This function is used internally by enw_preprocess_data() to combine
various pieces of processed observed data into a single object. It
is exposed to the user in order to allow for modular data preprocessing,
though this is not currently recommended. See documentation and code
of enw_preprocess_data() for more on the expected inputs.
enw_construct_data( obs, new_confirm, latest, missing_reference, reporting_triangle, metareport, metareference, metadelay, max_delay, timestep, by )
obs |
Observations with the addition of empirical reporting proportions and restricted to the specified maximum delay. |
new_confirm |
Incidence of notifications by reference and report date. Empirical reporting distributions are also added. |
latest |
The latest available observations. |
missing_reference |
A |
reporting_triangle |
Incident observations by report and reference date in the standard reporting triangle matrix format. |
metareport |
Metadata for report dates. |
metareference |
Metadata for reference dates derived from observations. |
metadelay |
Metadata for reporting delays produced using
|
max_delay |
The maximum delay to model in the delay
distribution, specified in units of the timestep (e.g., if
Steps to take to determine the maximum delay:
Note that delays are zero indexed and so include the reference date
and |
timestep |
The timestep to use in the process model (i.e. the
reference date model). This can be a string ("day", "week", "month") or a
numeric whole number representing the number of days. If your data does not
have this timestep then you may wish to make use of
|
by |
A character vector describing the stratification of observations. This defaults to no grouping. This should be used when modelling multiple time series in order to identify them for downstream modelling |
A data.table containing processed observations as a series of nested data.frames as well as variables containing metadata. These are:
obs: Observations with the addition of empirical reporting proportions
and restricted to the specified maximum delay.
new_confirm: Incidence of notifications by reference and report date.
Empirical reporting distributions are also added.
latest: The latest available observations.
missing_reference: Observations missing reference dates.
reporting_triangle: Incident observations by report and reference date in
the standard reporting triangle matrix format.
metareference: Metadata for reference dates derived from observations.
metareport: Metadata for report dates.
metadelay: Metadata for reporting delays produced using
enw_metadata_delay().
max_delay: Maximum delay to be modelled by epinowcast.
time: Numeric, number of timepoints in the data.
snapshots: Numeric, number of available data snapshots to use for
nowcasting.
groups: Numeric, number of groups/strata in the supplied observations
(set using by).
max_date: The maximum available report date.
Preprocessing functions
enw_add_delay(),
enw_add_max_reported(),
enw_add_metaobs_features(),
enw_assign_group(),
enw_complete_dates(),
enw_extend_date(),
enw_filter_delay(),
enw_filter_reference_dates(),
enw_filter_reference_dates_by_report_start(),
enw_filter_report_dates(),
enw_flag_observed_observations(),
enw_impute_na_observations(),
enw_latest_data(),
enw_metadata(),
enw_metadata_delay(),
enw_missing_reference(),
enw_obs_at_delay(),
enw_preprocess_data(),
enw_reporting_triangle(),
enw_reporting_triangle_to_long(),
enw_retrospective()
pobs <- enw_example("preprocessed")
enw_construct_data(
  obs = pobs$obs[[1]],
  new_confirm = pobs$new_confirm[[1]],
  latest = pobs$latest[[1]],
  missing_reference = pobs$missing_reference[[1]],
  reporting_triangle = pobs$reporting_triangle[[1]],
  metareport = pobs$metareport[[1]],
  metareference = pobs$metareference[[1]],
  metadelay = pobs$metadelay[[1]],
  max_delay = pobs$max_delay,
  timestep = pobs$timestep[[1]],
  by = c()
)
Creates a structural reporting pattern for cases where reporting only
occurs on specific days of the week (e.g., Wednesday-only reporting).
This is a convenience function that builds on
enw_structural_reporting_metadata().
enw_dayofweek_structural_reporting(pobs, day_of_week)
pobs |
A preprocessed observation list from
|
day_of_week |
Character vector of weekday names when reporting
occurs (e.g., |
A data.table with columns:
.group: Group identifier
date: Reference date
report_date: Report date
report: Binary indicator (1 = reporting occurs, 0 = no reporting)
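The report indicator described above is 1 when the report date falls on one of the given weekdays. Here is a base R sketch for a single group that builds the same four columns. It assumes an English locale, because weekdays() returns locale-dependent names. It is not the package's implementation.

```r
day_of_week <- "Wednesday"  # assumes an English locale for weekdays()

# Reference dates (date) crossed with possible report dates
grid <- expand.grid(
  date = as.Date("2024-01-01") + 0:1,
  report_date = as.Date("2024-01-01") + 0:6
)
grid <- grid[grid$report_date >= grid$date, ]
grid$.group <- 1L  # single group in this sketch
# 1 if reporting occurs on this report date's weekday, 0 otherwise
grid$report <- as.integer(weekdays(grid$report_date) %in% day_of_week)
grid[, c(".group", "date", "report_date", "report")]
```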
Helper functions for model modules
add_max_observed_delay(),
add_pmfs(),
convolution_matrix(),
enw_reference_by_report(),
enw_reps_with_complete_refs(),
enw_structural_reporting_metadata(),
extract_obs_metadata(),
extract_sparse_matrix(),
latest_obs_as_matrix()
## Not run:
pobs <- enw_preprocess_data(obs, max_delay = 30)
# Wednesday-only reporting
enw_dayofweek_structural_reporting(
  pobs, day_of_week = "Wednesday"
)
# Multiple reporting days
enw_dayofweek_structural_reporting(
  pobs, day_of_week = c("Monday", "Wednesday", "Friday")
)
## End(Not run)
Categorises incidence by delay group with empirical
reporting proportions. Intended for use with the
plot.enw_preprocess_data() visualisation types that show
reporting patterns by delay.
enw_delay_categories(pobs, delay_group_thresh)
pobs |
A preprocessed data object as produced by
|
delay_group_thresh |
A numeric vector defining
left-closed interval thresholds for grouping reporting
delays. The smallest value should be zero and the largest
should exceed |
A data.table of notification incidence by reference
date and delay group, including columns prop_reported
and cum_prop_reported.
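Left-closed delay groups map directly onto cut() with right = FALSE. The following base R sketch, for a single hypothetical reference date, groups delays using the thresholds from the example below and computes the per-group and cumulative reported proportions. It is not the package's implementation.

```r
delay_group_thresh <- c(0, 2, 5, 10, 21)
delay <- c(0, 1, 2, 4, 5, 9, 10, 20)
new_confirm <- c(5, 3, 2, 2, 1, 1, 1, 0)  # hypothetical counts

# right = FALSE gives left-closed intervals such as [0,2)
delay_group <- cut(delay, breaks = delay_group_thresh, right = FALSE)
counts <- tapply(new_confirm, delay_group, sum)
prop_reported <- counts / sum(counts)
cum_prop_reported <- cumsum(prop_reported)
cum_prop_reported
```

A delay equal to or above the largest threshold would fall outside every interval (NA), which is why the largest threshold has to exceed the longest delay.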
Plotting functions
enw_delay_quantiles(),
enw_plot_delay_counts(),
enw_plot_delay_cumulative(),
enw_plot_delay_fraction(),
enw_plot_delay_quantiles(),
enw_plot_nowcast_quantiles(),
enw_plot_obs(),
enw_plot_pp_quantiles(),
enw_plot_quantiles(),
enw_plot_theme(),
plot.enw_preprocess_data(),
plot.epinowcast()
pobs <- enw_example("preprocessed_observations")
enw_delay_categories(pobs, delay_group_thresh = c(0, 2, 5, 10, 21))
Computes empirical quantiles of the reporting
delay distribution for each reference date. Intended for
use with the "delay_quantiles" plot type in
plot.enw_preprocess_data().
enw_delay_quantiles(pobs, quantiles = c(0.1, 0.5, 0.9))
pobs |
A preprocessed data object as produced by
|
quantiles |
A numeric vector of probabilities for which
quantiles are computed. Defaults to |
A data.table with columns for each quantile by
reference date.
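One common way to define an empirical delay quantile for a single reference date is the smallest delay at which the cumulative reported proportion reaches the target probability. The following base R sketch uses that definition with hypothetical counts. The package may use a different quantile definition.

```r
delay <- as.numeric(0:5)
new_confirm <- c(5, 20, 30, 20, 17, 8)  # hypothetical counts, total 100
cum_prop <- cumsum(new_confirm) / sum(new_confirm)

quantiles <- c(0.1, 0.5, 0.9)
# Smallest delay whose cumulative proportion reaches each quantile
delay_q <- vapply(
  quantiles, function(q) delay[which(cum_prop >= q)[1]], numeric(1)
)
delay_q
```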
Plotting functions
enw_delay_categories(),
enw_plot_delay_counts(),
enw_plot_delay_cumulative(),
enw_plot_delay_fraction(),
enw_plot_delay_quantiles(),
enw_plot_nowcast_quantiles(),
enw_plot_obs(),
enw_plot_pp_quantiles(),
enw_plot_quantiles(),
enw_plot_theme(),
plot.enw_preprocess_data(),
plot.epinowcast()
pobs <- enw_example("preprocessed_observations")
enw_delay_quantiles(pobs)
This function is a wrapper around stats::model.matrix() that
can optionally return a sparse design matrix, defined as the unique
rows of the full design matrix together with an index vector that
allows the full design matrix to be reconstructed. This is useful
for models whose design matrices contain many repeated rows and that
are computationally expensive to fit. This function also allows
contrasts to be specified for categorical variables.
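The sparse representation described above can be illustrated in base R. The sketch builds the full design matrix, keeps only its unique rows, and records an index that maps each original row to its unique row. Indexing the compact matrix with that index then reproduces the full matrix. This is an illustration only, not how enw_design() stores its output internally. The names key, design and index are used for illustration.

```r
data <- data.frame(c = c(1, 1, 2, 2, 1))
full <- model.matrix(~ c, data)
# Identify rows by their values so that duplicated rows share a key
key <- apply(full, 1, paste, collapse = "_")
unique_keys <- unique(key)
# Compact design matrix containing only the unique rows
design <- full[match(unique_keys, key), , drop = FALSE]
# Index linking each original row to its row in the compact matrix
index <- match(key, unique_keys)
# The full design matrix can be recovered by indexing the compact one
reconstructed <- design[index, , drop = FALSE]
```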
enw_design(formula, data, no_contrasts = FALSE, sparse = TRUE, ...)
formula |
An R formula. |
data |
A |
no_contrasts |
A vector of variable names that should not be
converted to contrasts. If |
sparse |
Logical, if TRUE return a sparse design matrix. Defaults to TRUE. |
... |
Arguments passed on to |
A list containing the formula, the design matrix, and the index.
Functions used to formulate models
enw_add_cumulative_membership(),
enw_add_pooling_effect(),
enw_effects_metadata(),
enw_one_hot_encode_feature()
data <- data.frame(a = 1:3, b = as.character(1:3), c = c(1,1,2))
enw_design(a ~ b + c, data)
enw_design(a ~ b + c, data, no_contrasts = TRUE)
enw_design(a ~ b + c, data, no_contrasts = c("b"))
enw_design(a ~ c, data, sparse = TRUE)
enw_design(a ~ c, data, sparse = FALSE)
This function extracts metadata from a design matrix and returns a data.table with the following columns:
effects: the name of the effect
fixed: a logical indicating whether the effect is fixed (1) or random (0).
It automatically drops the intercept (defined as "(Intercept)").
This function is useful for constructing a model design object for random
effects when used in combination with enw_add_pooling_effect().
enw_effects_metadata(design)
design |
A design matrix as returned by |
A data.table with the following columns:
effects: the name of the effect
fixed: a logical indicating whether the effect is fixed (1) or random (0)
Functions used to formulate models
enw_add_cumulative_membership(),
enw_add_pooling_effect(),
enw_design(),
enw_one_hot_encode_feature()
data <- data.frame(a = 1:3, b = as.character(1:3), c = c(1,1,2))
design <- enw_design(a ~ b + c, data)$design
enw_effects_metadata(design)
Loads examples of nowcasts produced using example scripts. These are used to
streamline examples and package tests, and to enable users to explore package
functionality without needing to install cmdstanr.
enw_example( type = c("nowcast", "preprocessed_observations", "observations", "script") )
type |
A character string indicating the example to load. Supported options are
|
Depending on type, a data.table of the requested output OR
the file name(s) to generate these outputs (type = "script")
Package data sets
germany_covid19_hosp
# Load the nowcast
enw_example(type = "nowcast")
# Load the preprocessed observations
enw_example(type = "preprocessed_observations")
# Load the latest observations
enw_example(type = "observations")
# Load the script used to generate these examples
# Optionally source this script to regenerate the example
readLines(enw_example(type = "script"))
Expectation model module
enw_expectation( r = ~0 + (1 | day:.group), generation_time = 1, observation = ~1, latent_reporting_delay = 1, data, ... )
r |
A formula (as implemented in |
generation_time |
A numeric vector that sums to 1 and defaults to 1. Describes the weighting to apply to previous generations (i.e. as part of a renewal equation). When set to 1 (the default) this corresponds to modelling the daily growth rate. |
observation |
A formula (as implemented in |
latent_reporting_delay |
A numeric vector that defaults to 1. Describes the weighting to apply to past and current latent expected observations (from most recent to least). This can be used both to convolve based on some assumed reporting delay and to rescale observations (by multiplying a probability mass function by some fraction) to account for ascertainment, etc. A list of PMFs can be provided to allow for time-varying PMFs. This should be the same length as the modelled time period plus the length of the generation time if supplied. |
data |
Output from |
... |
Additional parameters passed to |
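The base R sketch below shows one way a latent_reporting_delay weighting could act on latent expectations. It convolves a hypothetical series with a two-day PMF scaled by an assumed ascertainment fraction of 0.5, applying the weights from the most recent time point backwards. This is a simplified sketch, not the package's Stan code, and convolve_latent() is a hypothetical helper. The earliest time points only see part of the PMF, so their expected observations are incomplete.

```r
# Hypothetical latent expected observations
latent <- c(100, 120, 140, 160, 180)
# PMF over delays 0 and 1, scaled by an assumed ascertainment fraction of 0.5
latent_reporting_delay <- 0.5 * c(0.6, 0.4)

convolve_latent <- function(latent, pmf) {
  n <- length(pmf)
  sapply(seq_along(latent), function(t) {
    # Lags 0, 1, ... that are available at time t
    lags <- seq_len(min(n, t)) - 1
    # Weight the current and past latent values, most recent first
    sum(pmf[lags + 1] * latent[t - lags])
  })
}
convolve_latent(latent, latent_reporting_delay)
```

With the default weighting of 1, the latent series is returned unchanged.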
A list containing the supplied formulas, data passed into a list
describing the models, a data.frame describing the priors used, and a
function that takes the output data and priors and returns a function that
can be used to sample from a tightened version of the prior distribution.
Model modules
enw_fit_opts(),
enw_missing(),
enw_obs(),
enw_reference(),
enw_report()
enw_expectation(data = enw_example("preprocessed"))
Extend a time series with additional dates. This is useful when extending the report dates of a time series to include future dates for nowcasting purposes or to include additional dates for backcasting when using a renewal process as the expectation model.
enw_extend_date( metaobs, days = 20, direction = c("end", "start"), timestep = "day" )
metaobs |
A |
days |
Number of days to add to the time series. Defaults to 20. |
direction |
Should new dates be added at the beginning or end of the data. Default is "end" with "start" also available. |
timestep |
The timestep to use. This can be a string ("day", "week") or a numeric whole number representing the number of days. Note that "month" is not currently supported in user-facing functions and will throw an error if used. |
A data.table with the same columns as metaobs but with
additional rows for each date in the range of date to date + days
(or date - days if direction = "start"). An additional variable
observed is added with a value of FALSE for all new dates and TRUE
for all existing dates.
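The base R sketch below shows the idea of extending a daily date vector at either end and flagging which dates were originally present. It covers a daily timestep only. It is not the package's implementation: extend_dates_sketch() is a hypothetical helper that returns a plain data.frame rather than a data.table.

```r
extend_dates_sketch <- function(dates, days = 20, direction = c("end", "start")) {
  direction <- match.arg(direction)
  # New dates are added after the last date or before the first date
  new_dates <- if (direction == "end") {
    max(dates) + seq_len(days)
  } else {
    min(dates) - rev(seq_len(days))
  }
  out <- data.frame(
    date = c(dates, new_dates),
    # Existing dates are flagged as observed, added dates as not observed
    observed = c(rep(TRUE, length(dates)), rep(FALSE, days))
  )
  out[order(out$date), ]
}
dates <- as.Date("2021-01-01") + 0:4
extend_dates_sketch(dates, days = 2)
extend_dates_sketch(dates, days = 2, direction = "start")
```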
Preprocessing functions
enw_add_delay(),
enw_add_max_reported(),
enw_add_metaobs_features(),
enw_assign_group(),
enw_complete_dates(),
enw_construct_data(),
enw_filter_delay(),
enw_filter_reference_dates(),
enw_filter_reference_dates_by_report_start(),
enw_filter_report_dates(),
enw_flag_observed_observations(),
enw_impute_na_observations(),
enw_latest_data(),
enw_metadata(),
enw_metadata_delay(),
enw_missing_reference(),
enw_obs_at_delay(),
enw_preprocess_data(),
enw_reporting_triangle(),
enw_reporting_triangle_to_long(),
enw_retrospective()
metaobs <- data.frame(date = as.Date("2021-01-01") + 0:4)
enw_extend_date(metaobs, days = 2)
enw_extend_date(metaobs, days = 2, direction = "start")
Filter observations to have a consistent maximum delay period
enw_filter_delay(obs, max_delay, timestep = "day")
obs |
A |
max_delay |
The maximum delay to model in the delay
distribution, specified in units of the timestep (e.g., if
Steps to take to determine the maximum delay:
Note that delays are zero indexed and so include the reference date
and |
timestep |
The timestep to use in the process model (i.e. the
reference date model). This can be a string ("day", "week", "month") or a
numeric whole number representing the number of days. If your data does not
have this timestep then you may wish to make use of
|
A data.frame filtered so that report dates are less than or
equal to the reference date plus the maximum delay.
Preprocessing functions
enw_add_delay(),
enw_add_max_reported(),
enw_add_metaobs_features(),
enw_assign_group(),
enw_complete_dates(),
enw_construct_data(),
enw_extend_date(),
enw_filter_reference_dates(),
enw_filter_reference_dates_by_report_start(),
enw_filter_report_dates(),
enw_flag_observed_observations(),
enw_impute_na_observations(),
enw_latest_data(),
enw_metadata(),
enw_metadata_delay(),
enw_missing_reference(),
enw_obs_at_delay(),
enw_preprocess_data(),
enw_reporting_triangle(),
enw_reporting_triangle_to_long(),
enw_retrospective()
obs <- enw_example("preprocessed")$obs[[1]]
enw_filter_delay(obs, max_delay = 2)
This is a helper function which allows users to filter datasets
by reference date. This is useful, for example, when evaluating nowcast
performance against fully observed data. Users may wish to combine this
function with enw_filter_report_dates(). Note that by definition it is
assumed that report dates must be equal to or greater than the corresponding
reference date (i.e. a report cannot happen before the event being reported
occurs). This means that this function will also filter out any report dates
that are earlier than their corresponding reference date.
enw_filter_reference_dates( obs, earliest_date, include_days, latest_date, remove_days )
obs |
A |
earliest_date |
Date, the earliest reference date to include in the returned dataset. |
include_days |
if |
latest_date |
Date, the latest reference date to include in the returned dataset. |
remove_days |
Integer, if |
The include_days parameter filters to include exactly the specified
number of most recent reference dates. For example, if the latest
reference date is 2021-10-20 and include_days = 10, the filtered data
will contain reference dates from 2021-10-11 to 2021-10-20 (10 days
inclusive).
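The include_days arithmetic described above can be checked directly in base R. With a latest reference date of 2021-10-20 and include_days = 10, the retained range begins at the latest date minus include_days plus one.

```r
latest <- as.Date("2021-10-20")
include_days <- 10
# The most recent include_days reference dates, inclusive of the latest date
kept <- seq(latest - include_days + 1, latest, by = "day")
range(kept)
length(kept)
```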
A data.table filtered by reference date
Preprocessing functions
enw_add_delay(),
enw_add_max_reported(),
enw_add_metaobs_features(),
enw_assign_group(),
enw_complete_dates(),
enw_construct_data(),
enw_extend_date(),
enw_filter_delay(),
enw_filter_reference_dates_by_report_start(),
enw_filter_report_dates(),
enw_flag_observed_observations(),
enw_impute_na_observations(),
enw_latest_data(),
enw_metadata(),
enw_metadata_delay(),
enw_missing_reference(),
enw_obs_at_delay(),
enw_preprocess_data(),
enw_reporting_triangle(),
enw_reporting_triangle_to_long(),
enw_retrospective()
# Filter by date
enw_filter_reference_dates(
  germany_covid19_hosp,
  earliest_date = "2021-09-01",
  latest_date = "2021-10-01"
)
# Filter by days
enw_filter_reference_dates(
  germany_covid19_hosp,
  include_days = 10, remove_days = 10
)
Removes observations where the
reference_date is earlier than the minimum
report_date within each group. Rows with missing
reference_date are retained. This is useful for
ensuring that observations are only included from the
first available report date onwards.
This function is typically called before
enw_add_incidence() so that the incidence calculation
starts from a valid reporting window. Without this
step, reference dates that predate any report date
produce spurious leading entries in the incidence
output.
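The base R sketch below applies the grouped filtering rule described above to hypothetical data for two locations. Within each group, it keeps rows whose reference_date is on or after the earliest report_date, and it always keeps rows with a missing reference_date. It mirrors the documented behaviour but is not the package's data.table implementation.

```r
obs <- data.frame(
  location = c("A", "A", "A", "B", "B"),
  reference_date = as.Date(c("2021-10-01", "2021-10-02", NA, "2021-10-01", "2021-10-03")),
  report_date = as.Date(c("2021-10-02", "2021-10-02", "2021-10-03", "2021-10-01", "2021-10-03"))
)
# Earliest report date within each group, repeated for every row of the group
min_report <- ave(as.numeric(obs$report_date), obs$location, FUN = min)
# Keep rows on or after that date, plus rows with a missing reference date
keep <- is.na(obs$reference_date) | as.numeric(obs$reference_date) >= min_report
obs[keep, ]
```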
enw_filter_reference_dates_by_report_start(obs, by = NULL, copy = TRUE)
obs |
A |
by |
A character vector describing the stratification of observations. This defaults to no grouping. This should be used when modelling multiple time series in order to identify them for downstream modelling |
copy |
Should |
A data.table filtered so that each
reference_date is on or after the minimum
report_date in its group. Rows with NA
reference_date are kept.
Preprocessing functions
enw_add_delay(),
enw_add_max_reported(),
enw_add_metaobs_features(),
enw_assign_group(),
enw_complete_dates(),
enw_construct_data(),
enw_extend_date(),
enw_filter_delay(),
enw_filter_reference_dates(),
enw_filter_report_dates(),
enw_flag_observed_observations(),
enw_impute_na_observations(),
enw_latest_data(),
enw_metadata(),
enw_metadata_delay(),
enw_missing_reference(),
enw_obs_at_delay(),
enw_preprocess_data(),
enw_reporting_triangle(),
enw_reporting_triangle_to_long(),
enw_retrospective()
library(data.table)
obs <- data.table(
  reference_date = as.IDate(c(
    "2021-10-01", "2021-10-02", "2021-10-03"
  )),
  report_date = as.IDate(c(
    "2021-10-02", "2021-10-02", "2021-10-03"
  ))
)
# The first row has reference_date before the minimum
# report_date, so it is removed
enw_filter_reference_dates_by_report_start(obs)
This is a helper function which allows users to create
truncated data sets at past time points from a given larger data set.
This is useful when evaluating nowcast performance against fully
observed data. Users may wish to combine this function with
enw_filter_reference_dates().
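The base R sketch below shows how a truncated snapshot can be created from a larger dataset. It keeps only rows reported on or before a cut-off date. The sketch assumes that remove_days counts back from the latest report date in the data. The argument description is truncated, so check this against the package before relying on it. The data are hypothetical.

```r
obs <- data.frame(
  reference_date = as.Date("2021-09-01") + c(0, 0, 1, 1, 2),
  report_date = as.Date("2021-09-01") + c(0, 3, 1, 4, 2),
  confirm = c(5, 9, 4, 7, 6)
)
# Assumption: remove_days counts back from the latest report date in the data
remove_days <- 2
latest_date <- max(obs$report_date) - remove_days
# Keep only what had been reported by the cut-off date
snapshot <- obs[obs$report_date <= latest_date, ]
snapshot
```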
enw_filter_report_dates(obs, latest_date, remove_days)
obs |
A |
latest_date |
Date, the latest report date to include in the returned dataset. |
remove_days |
Integer, if |
A data.table filtered by report date
Preprocessing functions
enw_add_delay(),
enw_add_max_reported(),
enw_add_metaobs_features(),
enw_assign_group(),
enw_complete_dates(),
enw_construct_data(),
enw_extend_date(),
enw_filter_delay(),
enw_filter_reference_dates(),
enw_filter_reference_dates_by_report_start(),
enw_flag_observed_observations(),
enw_impute_na_observations(),
enw_latest_data(),
enw_metadata(),
enw_metadata_delay(),
enw_missing_reference(),
enw_obs_at_delay(),
enw_preprocess_data(),
enw_reporting_triangle(),
enw_reporting_triangle_to_long(),
enw_retrospective()
# Filter by date
enw_filter_report_dates(germany_covid19_hosp, latest_date = "2021-09-01")
# Filter by days
enw_filter_report_dates(germany_covid19_hosp, remove_days = 10)
Format model fitting options for use with Stan
enw_fit_opts( sampler = epinowcast::enw_sample, nowcast = TRUE, pp = FALSE, likelihood = TRUE, likelihood_aggregation = c("snapshots", "groups"), threads_per_chain = 1L, debug = FALSE, output_loglik = FALSE, sparse_design = FALSE, ... )
sampler |
A function that creates an object that can be used to extract
posterior samples from the specified model. By default this is |
nowcast |
Logical, defaults to |
pp |
Logical, defaults to |
likelihood |
Logical, defaults to |
likelihood_aggregation |
Character string, the aggregation over which to
stratify the likelihood when
Note that some model modules override this setting depending on model
requirements. For example, the |
threads_per_chain |
Integer, defaults to |
debug |
Logical, defaults to |
output_loglik |
Logical, defaults to |
sparse_design |
Logical, defaults to |
... |
Additional arguments to pass to the fitting function being used
by |
A list containing the specified sampler function, data as a list specifying the fitting options to use, and additional arguments to pass to the sampler function when it is called.
Model modules
enw_expectation(),
enw_missing(),
enw_obs(),
enw_reference(),
enw_report()
# Default options along with settings to pass to enw_sample
enw_fit_opts(iter_sampling = 1000, iter_warmup = 1000)
Flags observations based on the 'confirm' column.
If the '.observed' column does not exist, it is created. Observations are
flagged as observed (TRUE) if 'confirm' is not NA.
enw_flag_observed_observations(obs, copy = TRUE)
obs |
A |
copy |
A logical; if |
A data.table with an additional column '.observed' indicating
observed observations.
Preprocessing functions
enw_add_delay(),
enw_add_max_reported(),
enw_add_metaobs_features(),
enw_assign_group(),
enw_complete_dates(),
enw_construct_data(),
enw_extend_date(),
enw_filter_delay(),
enw_filter_reference_dates(),
enw_filter_reference_dates_by_report_start(),
enw_filter_report_dates(),
enw_impute_na_observations(),
enw_latest_data(),
enw_metadata(),
enw_metadata_delay(),
enw_missing_reference(),
enw_obs_at_delay(),
enw_preprocess_data(),
enw_reporting_triangle(),
enw_reporting_triangle_to_long(),
enw_retrospective()
dt <- data.frame(id = 1:3, confirm = c(NA, 1, 2))
enw_flag_observed_observations(dt)
This function allows models to be defined using a flexible formula interface that supports fixed effects, random effects (using lme4 syntax), and random walks. The formula syntax builds on standard R formula notation and extends it with lme4 style random effects and custom random walk terms. Users familiar with mixed models in lme4 or brms will recognise the syntax. Note that the returned fixed effects design matrix is sparse and so the index supplied is required to link observations to the appropriate design matrix row.
enw_formula(formula, data, sparse = TRUE)
formula |
A model formula that may use standard fixed
effects, random effects using lme4 syntax (see |
data |
A |
sparse |
Logical, defaults to |
The formula interface supports four types of model components:
Fixed effects: Standard R formula syntax as used in stats::lm() and
similar functions. For example:
~ 1: intercept only
~ age_group: intercept plus categorical predictor
~ age_group + location: multiple predictors
~ 0 + age_group: no intercept, with one coefficient per level of age_group
Random effects: Uses lme4 syntax with vertical bar notation.
Random effects allow parameters to vary by group whilst sharing information
across groups through partial pooling. Note that epinowcast assumes
independent standard deviations for random effects rather than correlated
random effects as supported by lme4. For example:
~ 1 + (1 | location): random intercepts by location
~ 1 + age_group + (1 | location): fixed age effect with random
location intercepts
~ (age_group | location): random slopes for age within each location
~ (1 + week | location:month): random intercepts and week effects
for each location-month combination (using interaction to create
independent random effects per strata)
Interactions (e.g., location:month) can be used on the right-hand side
of the vertical bar to specify independent random effects for each
combination of the interacting variables.
See the lme4 package documentation for more details on random effects syntax.
Random walks: Uses the rw() helper function to specify time-varying
effects that evolve smoothly over time. For example:
~ rw(week): a random walk over weeks
~ rw(week, location): independent random walks for each location, with
the random walk variance shared across locations (per-group variance is a
planned extension)
ARIMA residuals: Uses the arima() helper to add an ARIMA(p, d, q)
latent residual series to the linear predictor. Unlike rw(), the
kernel that maps unit-normal shocks to the latent series depends on
the autoregressive and moving-average parameters, so the term does
not produce design-matrix columns; it carries lookup metadata that
the Stan layer uses with the kernel from
inst/stan/functions/arima_kernel.stan. For example:
~ arima(week): AR(1) on weekly residuals
~ arima(week, location, p = 2, d = 1, q = 1): ARIMA(2, 1, 1)
driven by independent shocks per location, with phi, theta,
and sigma shared across locations (per-group parameters are a
planned extension)
arima(time, d = 1, p = 0, q = 0) is equivalent to rw(time)
Convenience aliases match brms's in-formula vocabulary:
ar(time, by, p) is arima(time, by, p = p, d = 0, q = 0)
ma(time, by, q) is arima(time, by, p = 0, d = 0, q = q)
arma(time, by, p, q) is arima(time, by, p = p, d = 0, q = q)
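The base R sketch below shows how an ARMA kernel can map unit-normal shocks to a latent residual series. It also shows why arima(time, d = 1, p = 0, q = 0) reduces to a random walk. The kernel weights come from stats::ARMAtoMA(). The sketch assumes zero initial conditions and R's sign convention for MA terms. It is not the Stan kernel in inst/stan/functions/arima_kernel.stan, and arma_kernel() and arima_latent() are hypothetical helpers.

```r
# Impulse-response weights psi_0 = 1, psi_1, ..., psi_{n-1} of an ARMA(p, q) process
arma_kernel <- function(ar = numeric(), ma = numeric(), n) {
  if (n == 1) return(1)
  c(1, stats::ARMAtoMA(ar = ar, ma = ma, lag.max = n - 1))
}

# Map unit-normal shocks z to a latent series, then integrate d times
arima_latent <- function(z, sigma, ar = numeric(), ma = numeric(), d = 0) {
  n <- length(z)
  psi <- arma_kernel(ar, ma, n)
  # x_t = sigma * sum_{j = 0}^{t - 1} psi_j * z_{t - j}
  x <- sigma * sapply(seq_len(n), function(t) sum(psi[seq_len(t)] * z[t:1]))
  # Each order of differencing is undone by a cumulative sum
  for (i in seq_len(d)) x <- cumsum(x)
  x
}

set.seed(1)
z <- rnorm(10)
arima_latent(z, sigma = 0.3, ar = 0.5)
```

With p = q = 0 the kernel is (1, 0, 0, ...), so setting d = 1 gives the cumulative sum of scaled shocks, which is a random walk.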
These four types of effects can be combined in a single formula,
for example: ~ 1 + age_group + (1 | location) + rw(week, location)
specifies fixed age effects, random location intercepts, and
location-specific random walks over time.
In epinowcast model specification functions (such as enw_reference(),
enw_report(), enw_expectation()), formula arguments can be set to ~0
to disable that model component entirely. This is a package-specific
convention. Note that when a formula is specified as ~0, it is typically
converted internally to ~1 (intercept only) to ensure valid model
structure, but the component is flagged as inactive.
The formula you specify controls which covariates and effects enter the
linear predictor of the model. For instance, in the reference date model
(enw_reference()), the formula determines how reporting delay parameters
vary by covariates and groups. The formula is converted to design matrices:
a fixed effects matrix (which may be sparse for computational efficiency)
and a random effects matrix that defines the hierarchical structure.
A list containing the following:
formula: The user supplied formula
parsed_formula: The formula as parsed by parse_formula()
expanded_formula: The flattened version of the formula with
both user supplied terms and terms added for the user supplied
complex model components.
fixed: A list containing the fixed effect formula, sparse design
matrix, and the index linking the design matrix with observations.
random: A list containing the random effect formula, sparse design
matrix, and the index linking the design matrix with random effects.
For users new to formula syntax in R:
Fixed effects: See ?formula and the "Statistical Models in R"
chapter of "An Introduction to R" at the URL:
https://cran.r-project.org/doc/manuals/r-release/R-intro.html#Statistical-models-in-R
Random effects: See the lme4 package documentation and vignettes.
Mixed models: Bates et al. (2015) "Fitting Linear Mixed-Effects Models Using lme4". Journal of Statistical Software, 67(1), 1-48. doi:10.18637/jss.v067.i01
Functions used to help convert formulas into model designs
ar(),
arima(),
arima_terms(),
arma(),
as_string_formula(),
construct_arima(),
construct_gp(),
construct_re(),
construct_rw(),
enw_manual_formula(),
gp(),
gp_terms(),
ma(),
parse_formula(),
re(),
remove_arima_terms(),
remove_gp_terms(),
remove_rw_terms(),
rw(),
rw_terms(),
split_formula_to_terms()
# Use metadata for reference dates from the Germany COVID-19
# hospitalisation data.
obs <- enw_filter_report_dates(
  germany_covid19_hosp[location == "DE"],
  remove_days = 40
)
obs <- enw_filter_reference_dates(obs, include_days = 40)
pobs <- enw_preprocess_data(
  obs, by = c("age_group", "location"), max_delay = 20
)
data <- pobs$metareference[[1]]
# Intercept only
enw_formula(~ 1, data)
# Fixed effect
enw_formula(~ 1 + age_group, data)
# Random intercepts
enw_formula(~ 1 + (1 | age_group), data)
# Random walk
enw_formula(~ 1 + rw(week), data)
# Model with a random effect for age group and a random walk
enw_formula(~ 1 + (1 | age_group) + rw(week), data)
# Model defined without a sparse fixed effects design matrix
enw_formula(~1, data[1:20, ], sparse = FALSE)
# Model using an interaction in the right-hand side of a random effect
# to specify an independent random effect per stratum.
enw_formula(~ (1 + day | week:month), data = data)
Format formula data for use with Stan
enw_formula_as_data_list(formula, prefix, drop_intercept = FALSE)
formula |
The output of |
prefix |
A character string indicating variable label to use as a prefix. |
drop_intercept |
Logical, defaults to |
A list defining the model formula. This includes:
prefix_fintercept: Is an intercept present for the fixed effects design
matrix.
prefix_fdesign: The fixed effects design matrix
prefix_fnrow: The number of rows of the fixed design matrix
prefix_findex: The index linking design matrix rows to observations
prefix_fnindex: The length of the index
prefix_fncol: The number of columns (i.e. effects) in the fixed effect
design matrix (minus 1 if an intercept is present).
prefix_rdesign: The random effects design matrix
prefix_rncol: The number of columns (i.e. random effects) in the random
effect design matrix (minus 1 as the intercept is dropped).
prefix_arima_present: 1 if the formula contains an arima() term,
0 otherwise.
prefix_arima_T, prefix_arima_G: ARIMA series length and group count.
prefix_arima_p, prefix_arima_d, prefix_arima_q: ARIMA orders.
prefix_arima_flat_idx: per-observation column-major index into a
(T x G) ARIMA residual matrix, used by Stan to gather residuals
with to_vector(eps)[flat_idx].
prefix_arima_n_obs: length of the lookup vectors.
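A column-major flat index for element (t, g) of a T x G matrix is (g - 1) * T + t. Both R's as.vector() and Stan's to_vector() flatten matrices column by column, so the gather described for prefix_arima_flat_idx can be checked in R. The time and group indices below are hypothetical.

```r
T_len <- 4
G <- 3
# Hypothetical T x G matrix of ARIMA residuals
eps <- matrix(seq_len(T_len * G) / 10, nrow = T_len, ncol = G)
# Hypothetical per-observation time and group indices (1-based)
time_idx <- c(1, 4, 2, 3)
group_idx <- c(1, 1, 3, 2)
# Column-major position of element (t, g)
flat_idx <- (group_idx - 1) * T_len + time_idx
as.vector(eps)[flat_idx]
```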
Functions used to help convert models into the format required for stan
enw_get_cache(),
enw_model(),
enw_pathfinder(),
enw_priors_as_data_list(),
enw_replace_priors(),
enw_sample(),
enw_set_cache(),
enw_stan_to_r(),
enw_unset_cache(),
remove_profiling(),
write_stan_files_no_profile()
f <- enw_formula(~ 1 + (1 | cyl), mtcars)
enw_formula_as_data_list(f, "mtcars")
# A missing formula produces the default list
enw_formula_as_data_list(prefix = "missing")
Retrieves the user set cache location for Stan models. This
path can be set through the enw_cache_location function call.
If no environment variable is available the output from
tempdir() will be returned.
enw_get_cache()
A string representing the file path for the cache location
Functions used to help convert models into the format required for stan
enw_formula_as_data_list(),
enw_model(),
enw_pathfinder(),
enw_priors_as_data_list(),
enw_replace_priors(),
enw_sample(),
enw_set_cache(),
enw_stan_to_r(),
enw_unset_cache(),
remove_profiling(),
write_stan_files_no_profile()
Extracts a named component from an
enw_preprocess_data() or epinowcast() object. List columns
are unwrapped automatically so you get the underlying
data.table or value directly.
enw_get_data(x, name)
x |
An |
name |
Character string naming the component to extract. |
The extracted component. For list columns this is the
first element (typically a data.table); for scalar columns
the value is returned as-is.
Utility functions
coerce_date(),
coerce_dt(),
date_to_numeric_modulus(),
enw_rolling_sum(),
get_internal_timestep(),
is.Date(),
stan_fns_as_string()
pobs <- enw_example("preprocessed_observations")
enw_get_data(pobs, "obs")
enw_get_data(pobs, "max_delay")
Imputes NA values in the 'confirm' column. NA values are replaced with the last available observation or 0.
enw_impute_na_observations(obs, by = NULL, copy = TRUE)
obs |
A |
by |
A character vector of column names to group by. Defaults to an empty vector. |
copy |
A logical; if |
A data.table with imputed 'confirm' column where NA values have
been replaced with the last available observation or, where none
exists, with zero.
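The base R sketch below applies the imputation rule described above within each group of hypothetical data. It carries the last available observation forward and uses zero where no earlier observation exists. It assumes the rows are already ordered by date within each group. This is not the package's implementation, and impute_confirm() is a hypothetical helper.

```r
impute_confirm <- function(confirm) {
  last <- NA
  for (i in seq_along(confirm)) {
    # Carry the last observed value forward over missing entries
    if (is.na(confirm[i])) confirm[i] <- last else last <- confirm[i]
  }
  # Entries with no earlier observation become zero
  confirm[is.na(confirm)] <- 0
  confirm
}
obs <- data.frame(
  group = c(1, 1, 1, 2, 2),
  confirm = c(NA, 3, NA, 5, NA)
)
obs$confirm <- ave(obs$confirm, obs$group, FUN = impute_confirm)
obs
```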
Preprocessing functions
enw_add_delay(),
enw_add_max_reported(),
enw_add_metaobs_features(),
enw_assign_group(),
enw_complete_dates(),
enw_construct_data(),
enw_extend_date(),
enw_filter_delay(),
enw_filter_reference_dates(),
enw_filter_reference_dates_by_report_start(),
enw_filter_report_dates(),
enw_flag_observed_observations(),
enw_latest_data(),
enw_metadata(),
enw_metadata_delay(),
enw_missing_reference(),
enw_obs_at_delay(),
enw_preprocess_data(),
enw_reporting_triangle(),
enw_reporting_triangle_to_long(),
enw_retrospective()
dt <- data.frame(
  id = 1:3, confirm = c(NA, 1, 2),
  reference_date = as.Date("2021-01-01")
)
enw_impute_na_observations(dt)
This function takes a data.table of aggregate counts or
something coercible to a data.table (such as a data.frame) and converts
it to a line list where each row represents a case.
enw_incidence_to_linelist( obs, reference_date = "reference_date", report_date = "report_date" )
obs |
An object coercible to a |
reference_date |
A character string of the variable name to use
for the |
report_date |
A character string of the variable name to use
for the |
A data.table with the following variables: id, reference_date,
report_date, and any other variables in the obs object. Rows in obs
will be duplicated based on the new_confirm column. reference_date and
report_date may be renamed if reference_date and report_date are
supplied.
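The core of this conversion is row duplication. The base R sketch below repeats each row of hypothetical incidence data new_confirm times, which drops rows with zero new cases, and then assigns a case id. It illustrates the idea only and is not the package's implementation.

```r
incidence <- data.frame(
  reference_date = as.Date(c("2021-10-01", "2021-10-01", "2021-10-02")),
  report_date = as.Date(c("2021-10-01", "2021-10-03", "2021-10-02")),
  new_confirm = c(2, 1, 0)
)
# Repeat each row once per new case; rows with zero new cases disappear
rows <- rep(seq_len(nrow(incidence)), incidence$new_confirm)
linelist <- incidence[rows, c("reference_date", "report_date")]
rownames(linelist) <- NULL
# One identifier per case
linelist$id <- seq_len(nrow(linelist))
linelist
```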
Data converters
enw_add_cumulative(),
enw_add_incidence(),
enw_aggregate_cumulative(),
enw_linelist_to_incidence()
incidence <- enw_filter_reference_dates_by_report_start(
  germany_covid19_hosp
)
incidence <- enw_add_incidence(incidence)
incidence <- enw_filter_reference_dates(
  incidence[location == "DE"], include_days = 10
)
enw_incidence_to_linelist(incidence, reference_date = "onset_date")
Filter observations for the latest available reported data for each reference date. Note this is not the same as filtering for the maximum report date in all cases as data may only be updated up to some maximum number of days.
enw_latest_data(obs)
obs |
A |
A data.table of observations filtered for the latest available data
for each reference date.
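A minimal base R sketch of the idea follows. The helper latest_data_sketch() is hypothetical and ignores grouping (grouped data would also need to be split by group). The toy data show why this differs from filtering on the overall maximum report date: reference date 2021-01-02 has no report on 2021-01-03, so a global filter would drop it.

```r
# Hypothetical helper (not the package implementation): keep the most
# recent report for each reference date. Grouping is ignored for brevity.
latest_data_sketch <- function(obs) {
  obs <- as.data.frame(obs)
  report_num <- as.numeric(obs$report_date)
  # Latest report date available for each reference date
  latest <- ave(report_num, obs$reference_date, FUN = max)
  out <- obs[report_num == latest, , drop = FALSE]
  rownames(out) <- NULL
  out
}

obs <- data.frame(
  reference_date = as.Date(c("2021-01-01", "2021-01-01", "2021-01-02")),
  report_date = as.Date(c("2021-01-01", "2021-01-03", "2021-01-02")),
  confirm = c(1, 4, 2)
)
res <- latest_data_sketch(obs)
res
# Filtering on the overall maximum report date loses 2021-01-02
global <- obs[obs$report_date == max(obs$report_date), ]
global
```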
Preprocessing functions
enw_add_delay(),
enw_add_max_reported(),
enw_add_metaobs_features(),
enw_assign_group(),
enw_complete_dates(),
enw_construct_data(),
enw_extend_date(),
enw_filter_delay(),
enw_filter_reference_dates(),
enw_filter_reference_dates_by_report_start(),
enw_filter_report_dates(),
enw_flag_observed_observations(),
enw_impute_na_observations(),
enw_metadata(),
enw_metadata_delay(),
enw_missing_reference(),
enw_obs_at_delay(),
enw_preprocess_data(),
enw_reporting_triangle(),
enw_reporting_triangle_to_long(),
enw_retrospective()
# Filter for latest reported data
enw_latest_data(germany_covid19_hosp)
This function takes a line list (i.e. tabular data where each
row represents a case) and aggregates it to a count (new_confirm) of cases by
user-specified reference_dates and report_dates. This enables the use
of enw_preprocess_data() and other epinowcast() preprocessing functions.
enw_linelist_to_incidence(linelist, reference_date = "reference_date", report_date = "report_date", by = NULL, max_delay, completion_beyond_max_report = FALSE, copy = TRUE)
linelist |
An object coercible to a |
reference_date |
A date or a variable that can be coerced to a date
that represents the date of interest for the case. For example, if the
|
report_date |
A date or a variable that can be coerced to a date that represents the date the case was reported. The default is "report_date". |
by |
A character vector of variables to also aggregate by (i.e. as well
as using the |
max_delay |
The maximum delay (in days) between the
|
completion_beyond_max_report |
Logical, should entries be completed beyond the maximum date found in the data? Default: FALSE |
copy |
Should |
A data.table with the following variables: reference_date,
report_date, new_confirm, confirm, delay, and
any variables specified in by.
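The aggregation can be sketched in base R as below. The helper linelist_to_incidence_sketch() is hypothetical and, unlike the package function, does not complete missing date combinations or apply max_delay; it only shows how new_confirm, delay, and the cumulative confirm column relate to each other, using the line list from the example further down.

```r
# Hypothetical helper (not the package implementation): count cases by
# reference and report date, then derive delay and cumulative counts.
linelist_to_incidence_sketch <- function(linelist,
                                         reference_date = "reference_date",
                                         report_date = "report_date") {
  ref <- as.Date(linelist[[reference_date]])
  rep_date <- as.Date(linelist[[report_date]])
  # Number of cases for each observed reference/report date pair
  counts <- aggregate(
    list(new_confirm = rep(1L, length(ref))),
    by = list(reference_date = ref, report_date = rep_date),
    FUN = sum
  )
  counts <- counts[order(counts$reference_date, counts$report_date), ]
  counts$delay <- as.integer(counts$report_date - counts$reference_date)
  # Cumulative count reported so far for each reference date
  counts$confirm <- ave(counts$new_confirm, counts$reference_date, FUN = cumsum)
  rownames(counts) <- NULL
  counts
}

linelist <- data.frame(
  onset_date = as.Date(c("2021-01-02", "2021-01-03", "2021-01-02")),
  report_date = as.Date(c("2021-01-03", "2021-01-05", "2021-01-04"))
)
inc <- linelist_to_incidence_sketch(linelist, reference_date = "onset_date")
inc
```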
Data converters
enw_add_cumulative(),
enw_add_incidence(),
enw_aggregate_cumulative(),
enw_incidence_to_linelist()
linelist <- data.frame(
  onset_date = as.Date(c("2021-01-02", "2021-01-03", "2021-01-02")),
  report_date = as.Date(c("2021-01-03", "2021-01-05", "2021-01-04"))
)
enw_linelist_to_incidence(linelist, reference_date = "onset_date")
# Specify a custom maximum delay and allow completion beyond the maximum
# observed delay
enw_linelist_to_incidence(
  linelist, reference_date = "onset_date", max_delay = 5,
  completion_beyond_max_report = TRUE
)
For most typical use cases enw_formula() should
provide sufficient flexibility to allow models to be defined. However,
there may be some instances where more manual model specification is
required. This function supports this by allowing the user to supply
vectors of fixed, random, and customised random effects (where they are
not first treated as fixed effect terms). Prior to 1.0.0 this was the
main interface for specifying models and it is still used internally to
handle some parts of the model specification process.
enw_manual_formula(data, fixed = NULL, random = NULL, custom_random = NULL, no_contrasts = FALSE, add_intercept = TRUE)
data |
A |
fixed |
A character vector of fixed effects. |
random |
A character vector of random effects. Random effects specified here will be added to the fixed effects. |
custom_random |
A vector of random effects. Random effects added here will not be added to the vector of fixed effects. This can be used to add random effects for fixed effects that only have a partial name match. |
no_contrasts |
Logical, defaults to |
add_intercept |
Logical, defaults to |
A list specifying the fixed effects (formula, design matrix, and design matrix index), and random effects (formula and design matrix).
Functions used to help convert formulas into model designs
ar(),
arima(),
arima_terms(),
arma(),
as_string_formula(),
construct_arima(),
construct_gp(),
construct_re(),
construct_rw(),
enw_formula(),
gp(),
gp_terms(),
ma(),
parse_formula(),
re(),
remove_arima_terms(),
remove_gp_terms(),
remove_rw_terms(),
rw(),
rw_terms(),
split_formula_to_terms()
data <- enw_example("prep")$metareference[[1]]
enw_manual_formula(data, fixed = "week", random = "day_of_week")
Extract metadata from raw data, either
by reference or by report date. For the target date chosen
(reference or report), confirm, max_confirm, and cum_prop_reported
are dropped and the first observation for each group and date is retained.
enw_metadata(obs, target_date = c("reference_date", "report_date"))
obs |
A |
target_date |
A character string, either "reference_date" or "report_date". The column corresponding to this string will be used as the target date for metadata extraction. |
A data.table with columns:
date, a Date column
.group, a grouping column
and the first observation for each group and date.
The data.table is sorted by .group and date.
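The retention rule (first row per group and date, sorted by group then date) can be sketched with base R's duplicated(). The helper metadata_sketch() is hypothetical: it adds a date column rather than reproducing the package's exact output columns, and it assigns a single group when no .group column is present.

```r
# Hypothetical helper (not the package implementation): keep the first row
# for each group/date combination and sort by group, then date.
metadata_sketch <- function(obs, target_date = c("reference_date", "report_date")) {
  target_date <- match.arg(target_date)
  obs <- as.data.frame(obs)
  if (!".group" %in% names(obs)) obs$.group <- 1
  obs$date <- as.Date(obs[[target_date]])
  # Drop the count columns listed in the function description
  drop_cols <- c("confirm", "max_confirm", "cum_prop_reported")
  first_row <- !duplicated(obs[, c(".group", "date")])
  meta <- obs[first_row, setdiff(names(obs), drop_cols), drop = FALSE]
  meta <- meta[order(meta$.group, meta$date), , drop = FALSE]
  rownames(meta) <- NULL
  meta
}

obs <- data.frame(
  reference_date = as.Date("2021-01-01"),
  report_date = as.Date("2022-01-01"),
  x = 1:10
)
meta <- metadata_sketch(obs, target_date = "reference_date")
meta
```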
Preprocessing functions
enw_add_delay(),
enw_add_max_reported(),
enw_add_metaobs_features(),
enw_assign_group(),
enw_complete_dates(),
enw_construct_data(),
enw_extend_date(),
enw_filter_delay(),
enw_filter_reference_dates(),
enw_filter_reference_dates_by_report_start(),
enw_filter_report_dates(),
enw_flag_observed_observations(),
enw_impute_na_observations(),
enw_latest_data(),
enw_metadata_delay(),
enw_missing_reference(),
enw_obs_at_delay(),
enw_preprocess_data(),
enw_reporting_triangle(),
enw_reporting_triangle_to_long(),
enw_retrospective()
obs <- data.frame(
  reference_date = as.Date("2021-01-01"), report_date = as.Date("2022-01-01"),
  x = 1:10
)
enw_metadata(obs, target_date = "reference_date")
Calculate delay metadata based on the supplied maximum delay, independently
of other metadata or date indexing. These data are meant to be used in
conjunction with metadata on the date of reference. Users can build
additional features with this data.frame or use this function to regenerate
it in the output of enw_preprocess_data().
enw_metadata_delay(max_delay = 20, breaks = 4, timestep = "day")
max_delay |
The maximum delay to model in the delay
distribution, specified in units of the timestep (e.g., if
Steps to take to determine the maximum delay:
Note that delays are zero indexed and so include the reference date
and |
breaks |
Numeric, defaults to 4. The number of breaks to use when constructing a categorised version of numeric delays. |
timestep |
The timestep to use. This can be a string ("day", "week") or a numeric whole number representing the number of days. Note that "month" is not currently supported in user-facing functions and will throw an error if used. |
A data.frame of delay metadata. This includes:
delay: The numeric delay from reference date to report.
delay_cat: The categorised delay. This may be useful for model building.
delay_week: The numeric week since the delay was reported. This again
may be useful for model building.
delay_head: A logical variable defining if the delay is in the lower
25% of the potential delays. This may be particularly useful when building
models that assume a parametric distribution in order to increase the weight
of the head of the reporting distribution in a pragmatic way.
delay_tail: A logical variable defining if the delay is in the upper
25% of the potential delays (i.e. above the 75th percentile). This may be
particularly useful when building models that assume a parametric
distribution in order to increase the weight of the tail of the reporting
distribution in a pragmatic way.
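The columns listed above can be sketched in base R for a daily timestep. The helper metadata_delay_sketch() is hypothetical, and several details are assumptions rather than the package's documented behaviour: categories come from equally spaced breaks on [0, max_delay), delay_week is the delay in whole weeks, and the head and tail flags use the 25th and 75th percentiles of the zero-indexed delays 0, ..., max_delay - 1.

```r
# Hypothetical sketch of delay metadata (daily timestep assumed).
metadata_delay_sketch <- function(max_delay = 20, breaks = 4) {
  delay <- 0:(max_delay - 1)  # zero-indexed delays
  # Assumed: equally spaced, left-closed category boundaries
  cuts <- unique(round(seq(0, max_delay, length.out = breaks + 1)))
  data.frame(
    delay = delay,
    delay_cat = cut(delay, breaks = cuts, right = FALSE),
    delay_week = delay %/% 7L,  # assumed: whole weeks of delay
    # Assumed thresholds: below the 25th / above the 75th percentile
    delay_head = delay < unname(quantile(delay, probs = 0.25)),
    delay_tail = delay > unname(quantile(delay, probs = 0.75))
  )
}

meta_delay <- metadata_delay_sketch(max_delay = 20, breaks = 4)
head(meta_delay)
```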
Preprocessing functions
enw_add_delay(),
enw_add_max_reported(),
enw_add_metaobs_features(),
enw_assign_group(),
enw_complete_dates(),
enw_construct_data(),
enw_extend_date(),
enw_filter_delay(),
enw_filter_reference_dates(),
enw_filter_reference_dates_by_report_start(),
enw_filter_report_dates(),
enw_flag_observed_observations(),
enw_impute_na_observations(),
enw_latest_data(),
enw_metadata(),
enw_missing_reference(),
enw_obs_at_delay(),
enw_preprocess_data(),
enw_reporting_triangle(),
enw_reporting_triangle_to_long(),
enw_retrospective()
enw_metadata_delay(max_delay = 20, breaks = 4)
Missing reference data model module
enw_missing(formula = ~1, data)
formula |
A formula (as implemented in |
data |
Output from |
A list containing the supplied formulas, data passed into a list
describing the models, a data.frame describing the priors used, and a
function that takes the output data and priors and returns a function that
can be used to sample from a tightened version of the prior distribution.
Model modules
enw_expectation(),
enw_fit_opts(),
enw_obs(),
enw_reference(),
enw_report()
# Missingness model with a fixed intercept only
enw_missing(data = enw_example("preprocessed"))
# No missingness model specified
enw_missing(~0, data = enw_example("preprocessed"))
Returns reports with missing reference dates and calculates the proportion of reports for a given report date that were missing a reference date.
enw_missing_reference(obs)
obs |
A |
A data.table of missing counts and proportions by report date and
group.
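The calculation can be sketched in base R for a single group. The helper missing_reference_sketch() is hypothetical; it assumes incidence data with a new_confirm column and treats the proportion as missing reports divided by all new reports on that report date.

```r
# Hypothetical helper (not the package implementation): per report date,
# count reports lacking a reference date and their share of all reports.
missing_reference_sketch <- function(obs) {
  obs <- as.data.frame(obs)
  is_missing <- is.na(obs$reference_date)
  total <- tapply(obs$new_confirm, obs$report_date, sum)
  missing <- tapply(obs$new_confirm * is_missing, obs$report_date, sum)
  data.frame(
    report_date = as.Date(names(total)),
    confirm = as.numeric(missing),
    prop_missing = as.numeric(missing / total)
  )
}

obs <- data.frame(
  report_date = as.Date(c("2021-10-01", "2021-10-02", "2021-10-02")),
  reference_date = as.Date(c("2021-10-01", "2021-10-01", NA)),
  new_confirm = c(2, 3, 1)
)
res <- missing_reference_sketch(obs)
res
```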
Preprocessing functions
enw_add_delay(),
enw_add_max_reported(),
enw_add_metaobs_features(),
enw_assign_group(),
enw_complete_dates(),
enw_construct_data(),
enw_extend_date(),
enw_filter_delay(),
enw_filter_reference_dates(),
enw_filter_reference_dates_by_report_start(),
enw_filter_report_dates(),
enw_flag_observed_observations(),
enw_impute_na_observations(),
enw_latest_data(),
enw_metadata(),
enw_metadata_delay(),
enw_obs_at_delay(),
enw_preprocess_data(),
enw_reporting_triangle(),
enw_reporting_triangle_to_long(),
enw_retrospective()
obs <- data.frame(
  report_date = c("2021-10-01", "2021-10-03"), reference_date = "2021-10-01",
  confirm = 1
)
obs <- rbind(
  obs,
  data.frame(report_date = "2021-10-04", reference_date = NA, confirm = 4)
)
obs <- enw_complete_dates(obs)
obs <- enw_assign_group(obs)
obs <- enw_filter_reference_dates_by_report_start(obs)
obs <- enw_add_incidence(obs)
enw_missing_reference(obs)
Load and compile the nowcasting model
enw_model(model = system.file("stan", "epinowcast.stan", package = "epinowcast"), include = system.file("stan", package = "epinowcast"), compile = TRUE, threads = TRUE, profile = FALSE, target_dir = epinowcast::enw_get_cache(), stanc_options = list(), cpp_options = list(), verbose = TRUE, ...)
model |
A character string indicating the path to the model. If not supplied the package default model is used. |
include |
A character string specifying the path to any stan files to include in the model. If missing the package default is used. |
compile |
Logical, defaults to |
threads |
Logical, defaults to |
profile |
Logical, defaults to |
target_dir |
The path to a directory in which the manipulated .stan
files without profiling statements should be stored. To avoid overriding of
the original .stan files, this should be different from the directory of the
original model and the |
stanc_options |
A list of options to pass to the |
cpp_options |
A list of options to pass to the |
verbose |
Logical, defaults to |
... |
Additional arguments passed to |
A cmdstanr model.
Functions used to help convert models into the format required for stan
enw_formula_as_data_list(),
enw_get_cache(),
enw_pathfinder(),
enw_priors_as_data_list(),
enw_replace_priors(),
enw_sample(),
enw_set_cache(),
enw_stan_to_r(),
enw_unset_cache(),
remove_profiling(),
write_stan_files_no_profile()
mod <- enw_model()
A generic wrapper around posterior::draws_df() with
opinionated defaults to extract the posterior samples for the
nowcast ("pp_inf_obs" from the stan code). The functionality of
this function can be used directly on the output of epinowcast() using
the supplied summary.epinowcast() method.
enw_nowcast_samples(fit, obs, max_delay = NULL, timestep = "day")
fit |
A |
obs |
An observation |
max_delay |
Maximum delay to which nowcasts should be extracted, in units of the timestep used during preprocessing. Must be equal to (the default) or larger than the modelled maximum delay. If it is larger, then nowcasts for unmodelled dates are added by assuming that case counts beyond the modelled maximum delay are fully observed. |
timestep |
The timestep to use. This can be a string ("day", "week") or a numeric whole number representing the number of days. Note that "month" is not currently supported in user-facing functions and will throw an error if used. |
A data.frame of posterior samples for the nowcast prediction.
This uses observed data where available and the posterior prediction
where not.
Functions used for postprocessing of model fits
build_ord_obs(),
enw_add_latest_obs_to_nowcast(),
enw_nowcast_summary(),
enw_posterior(),
enw_pp_summary(),
enw_quantiles_to_long(),
enw_summarise_samples(),
subset_obs()
fit <- enw_example("nowcast")
enw_nowcast_samples(fit$fit[[1]], fit$latest[[1]], fit$max_delay, "day")
A generic wrapper around enw_posterior() with
opinionated defaults to extract the posterior prediction for the
nowcast ("pp_inf_obs" from the stan code). The functionality of
this function can be used directly on the output of epinowcast() using
the supplied summary.epinowcast() method.
enw_nowcast_summary(fit, obs, max_delay = NULL, timestep = "day", probs = c(0.05, 0.2, 0.35, 0.5, 0.65, 0.8, 0.95))
fit |
A |
obs |
An observation |
max_delay |
Maximum delay to which nowcasts should be summarised, in units of the timestep used during preprocessing. Must be equal to (the default) or larger than the modelled maximum delay. If it is larger, then nowcasts for unmodelled dates are added by assuming that case counts beyond the modelled maximum delay are fully observed. |
timestep |
The timestep to use. This can be a string ("day", "week") or a numeric whole number representing the number of days. Note that "month" is not currently supported in user-facing functions and will throw an error if used. |
probs |
A vector of numeric probabilities to produce quantile summaries for. By default these are the 5%, 20%, 35%, 50%, 65%, 80%, and 95% quantiles, which include the 5%, 20%, 80%, and 95% quantiles that form the minimum set required for plotting functions to work. |
A data.frame summarising the model posterior nowcast prediction.
This uses observed data where available and the posterior prediction
where not.
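Turning posterior draws into quantile summaries can be sketched in base R. This is only an illustration of the kind of summary produced: the package summarises draws via enw_posterior() and the posterior package, whereas the hypothetical helper summarise_draws_sketch() below uses apply() and quantile(), and the simulated draws matrix (one column per reference date, one row per draw) is an assumption for the example.

```r
# Hypothetical sketch: quantile summaries of nowcast draws per reference date.
summarise_draws_sketch <- function(draws,
                                   probs = c(0.05, 0.2, 0.35, 0.5, 0.65, 0.8, 0.95)) {
  # One row per reference date, one column per requested quantile
  q <- t(apply(draws, 2, quantile, probs = probs))
  colnames(q) <- paste0("q", probs * 100)
  data.frame(
    mean = colMeans(draws), median = apply(draws, 2, median), q,
    check.names = FALSE
  )
}

set.seed(1)
# Simulated draws standing in for posterior samples of two reference dates
draws <- cbind(day1 = rpois(1000, 10), day2 = rpois(1000, 20))
summary_tab <- summarise_draws_sketch(draws)
summary_tab
```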
Functions used for postprocessing of model fits
build_ord_obs(),
enw_add_latest_obs_to_nowcast(),
enw_nowcast_samples(),
enw_posterior(),
enw_pp_summary(),
enw_quantiles_to_long(),
enw_summarise_samples(),
subset_obs()
fit <- enw_example("nowcast")
enw_nowcast_summary(fit$fit[[1]], fit$latest[[1]], fit$max_delay)
Setup observation model and data
enw_obs(family = c("negbin", "negbin1d", "poisson"), observation_indicator = NULL, data)
family |
Character string, the observation model to use in the
likelihood; enforced by |
observation_indicator |
A character string, the name of the column in
the data that indicates whether an observation is observed or not (using a
logical variable) and therefore whether or not it should be used in the
likelihood. This variable should be present in the data input to
|
data |
Output from |
A list as required by stan.
Model modules
enw_expectation(),
enw_fit_opts(),
enw_missing(),
enw_reference(),
enw_report()
enw_obs(data = enw_example("preprocessed"))
Filter observations to a maximum delay and then extract the latest observations. This is useful for model evaluation where you want to assess performance against the data as the model would have seen it.
enw_obs_at_delay(obs, max_delay, timestep = "day")
obs |
A |
max_delay |
The maximum delay to model in the delay
distribution, specified in units of the timestep (e.g., if
Steps to take to determine the maximum delay:
Note that delays are zero indexed and so include the reference date
and |
timestep |
The timestep to use in the process model (i.e. the
reference date model). This can be a string ("day", "week", "month") or a
numeric whole number representing the number of days. If your data does not
have this timestep then you may wish to make use of
|
A data.table of observations filtered for the
latest available data for each reference date at the
specified maximum delay.
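The two steps (filter by delay, then take the latest report) can be sketched in base R. The helper obs_at_delay_sketch() is hypothetical and ignores grouping; it assumes that, because delays are zero indexed, max_delay = 2 keeps delays 0 and 1 only.

```r
# Hypothetical helper (not the package implementation).
obs_at_delay_sketch <- function(obs, max_delay) {
  obs <- as.data.frame(obs)
  delay <- as.integer(obs$report_date - obs$reference_date)
  # Assumed: keep zero-indexed delays 0, ..., max_delay - 1
  obs <- obs[delay < max_delay, , drop = FALSE]
  # Then keep the latest remaining report for each reference date
  report_num <- as.numeric(obs$report_date)
  latest <- ave(report_num, obs$reference_date, FUN = max)
  out <- obs[report_num == latest, , drop = FALSE]
  rownames(out) <- NULL
  out
}

obs <- data.frame(
  reference_date = as.Date("2021-01-01"),
  report_date = as.Date(c("2021-01-01", "2021-01-02", "2021-01-03")),
  confirm = c(1, 3, 6)
)
res <- obs_at_delay_sketch(obs, max_delay = 2)
res
```

Here the report at delay 2 (confirm = 6) is excluded, so the data "as the model would have seen it" show confirm = 3.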
Preprocessing functions
enw_add_delay(),
enw_add_max_reported(),
enw_add_metaobs_features(),
enw_assign_group(),
enw_complete_dates(),
enw_construct_data(),
enw_extend_date(),
enw_filter_delay(),
enw_filter_reference_dates(),
enw_filter_reference_dates_by_report_start(),
enw_filter_report_dates(),
enw_flag_observed_observations(),
enw_impute_na_observations(),
enw_latest_data(),
enw_metadata(),
enw_metadata_delay(),
enw_missing_reference(),
enw_preprocess_data(),
enw_reporting_triangle(),
enw_reporting_triangle_to_long(),
enw_retrospective()
obs <- enw_example("preprocessed")$obs[[1]]
enw_obs_at_delay(obs, max_delay = 2)
This function takes a data.frame and a categorical variable, performs one-hot encoding, and column-binds the encoded variables back to the data.frame.
enw_one_hot_encode_feature(metaobs, feature, contrasts = FALSE)
metaobs |
A data.frame containing the data to be encoded. |
feature |
The name of the categorical variable to one-hot encode as a character string. |
contrasts |
Logical. If TRUE, create one-hot encoded variables with contrasts; if FALSE, create them without contrasts. Defaults to FALSE. |
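The difference between encoding with and without contrasts can be illustrated with base R's model.matrix(). The helper one_hot_sketch() and its column naming are hypothetical; the package function may name or order columns differently. With contrasts, the first level becomes the reference category and gets no indicator column.

```r
# Hypothetical sketch of one-hot encoding with base R.
one_hot_sketch <- function(metaobs, feature, contrasts = FALSE) {
  f <- factor(metaobs[[feature]])
  if (contrasts) {
    # Treatment contrasts: drop the intercept column so the first level is
    # the reference category and gets no indicator column
    enc <- model.matrix(~f)[, -1, drop = FALSE]
  } else {
    # No contrasts: one indicator column for every level
    enc <- model.matrix(~ f - 1)
  }
  # model.matrix names columns "f<level>"; rename to "<feature><level>"
  colnames(enc) <- paste0(feature, sub("^f", "", colnames(enc)))
  cbind(metaobs, as.data.frame(enc))
}

metaobs <- data.frame(week = 1:3)
a <- one_hot_sketch(metaobs, "week")
b <- one_hot_sketch(metaobs, "week", contrasts = TRUE)
a
b
```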
Functions used to formulate models
enw_add_cumulative_membership(),
enw_add_pooling_effect(),
enw_design(),
enw_effects_metadata()
metaobs <- data.frame(week = 1:2)
enw_one_hot_encode_feature(metaobs, "week")
enw_one_hot_encode_feature(metaobs, "week", contrasts = TRUE)
metaobs <- data.frame(week = 1:6)
enw_one_hot_encode_feature(metaobs, "week")
enw_one_hot_encode_feature(metaobs, "week", contrasts = TRUE)
For more information on the pathfinder algorithm see the CmdStan documentation.
enw_pathfinder(data, model = epinowcast::enw_model(), diagnostics = TRUE, init = NULL, ...)
data |
A list of data as produced by model modules (for example
|
model |
A |
diagnostics |
Logical, defaults to |
init |
A list of initial values or a function to generate initial values. If not provided, the model will attempt to generate initial values. |
... |
Additional parameters to be passed to |
Note that the threads_per_chain argument is renamed to num_threads to
match the CmdStanModel$pathfinder() method.
This fitting method is faster but more approximate than the NUTS sampler
used in enw_sample() and as such is recommended for use in exploratory
analysis and model development.
A data.table containing the fit, data, and fit_args. If diagnostics is TRUE, it also includes the run_time column with the timing information.
Functions used to help convert models into the format required for stan
enw_formula_as_data_list(),
enw_get_cache(),
enw_model(),
enw_priors_as_data_list(),
enw_replace_priors(),
enw_sample(),
enw_set_cache(),
enw_stan_to_r(),
enw_unset_cache(),
remove_profiling(),
write_stan_files_no_profile()
pobs <- enw_example("preprocessed")
nowcast <- epinowcast(pobs,
  expectation = enw_expectation(~1, data = pobs),
  fit = enw_fit_opts(enw_pathfinder, pp = TRUE),
  obs = enw_obs(family = "poisson", data = pobs)
)
summary(nowcast)
Stacked bar plot of notifications by reference date, coloured by delay group.
enw_plot_delay_counts(pobs, delay_group_thresh, facet = TRUE)
pobs |
A preprocessed data object as produced by
|
delay_group_thresh |
A numeric vector defining left-closed interval thresholds for delay groups. |
facet |
Logical. When |
A ggplot2 plot.
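One way to read "left-closed interval thresholds" is base R's cut() with right = FALSE, shown below for the thresholds used in the example. This is an illustration of the interval convention only, not the package's grouping code; in this sketch a delay at or beyond the last threshold falls outside every group and becomes NA.

```r
# Sketch: left-closed delay groups [a, b) from a threshold vector.
delay_group_thresh <- c(0, 2, 5, 10, 21)
delays <- c(0, 1, 2, 4, 5, 9, 10, 20)
groups <- cut(delays, breaks = delay_group_thresh, right = FALSE)
table(groups)
# A delay equal to the last threshold is not covered by any group here
cut(21, breaks = delay_group_thresh, right = FALSE)
```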
Plotting functions
enw_delay_categories(),
enw_delay_quantiles(),
enw_plot_delay_cumulative(),
enw_plot_delay_fraction(),
enw_plot_delay_quantiles(),
enw_plot_nowcast_quantiles(),
enw_plot_obs(),
enw_plot_pp_quantiles(),
enw_plot_quantiles(),
enw_plot_theme(),
plot.enw_preprocess_data(),
plot.epinowcast()
pobs <- enw_example("preprocessed_observations")
enw_plot_delay_counts(pobs, c(0, 2, 5, 10, 21))
Stacked ribbon plot showing the cumulative fraction reported by delay group over reference dates.
enw_plot_delay_cumulative(pobs, delay_group_thresh, facet = TRUE)
pobs |
A preprocessed data object as produced by
|
delay_group_thresh |
A numeric vector defining left-closed interval thresholds for delay groups. |
facet |
Logical. When |
A ggplot2 plot.
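For a single reference date, the fraction reported in each delay group and its cumulative sum can be sketched as follows. The counts are illustrative, and the sketch assumes the denominator is the total reported across all delays; the plotting function's exact denominator is not stated here.

```r
# Sketch for one reference date (illustrative counts).
new_confirm <- c(5, 3, 1, 1)  # counts first reported at delays 0, 1, 2, 3
delay <- 0:3
# Left-closed delay groups: [0, 1), [1, 2), [2, 4)
group <- cut(delay, breaks = c(0, 1, 2, 4), right = FALSE)
# Fraction of the total reported within each delay group
frac <- tapply(new_confirm, group, sum) / sum(new_confirm)
# Cumulative fraction reported by the end of each group
cum_frac <- cumsum(frac)
frac
cum_frac
```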
Plotting functions
enw_delay_categories(),
enw_delay_quantiles(),
enw_plot_delay_counts(),
enw_plot_delay_fraction(),
enw_plot_delay_quantiles(),
enw_plot_nowcast_quantiles(),
enw_plot_obs(),
enw_plot_pp_quantiles(),
enw_plot_quantiles(),
enw_plot_theme(),
plot.enw_preprocess_data(),
plot.epinowcast()
pobs <- enw_example("preprocessed_observations")
enw_plot_delay_cumulative(pobs, c(0, 2, 5, 10, 21))
Tile plot showing the fraction reported by delay group and reference date.
enw_plot_delay_fraction(pobs, delay_group_thresh, facet = TRUE)
pobs |
A preprocessed data object as produced by
|
delay_group_thresh |
A numeric vector defining left-closed interval thresholds for delay groups. |
facet |
Logical. When |
A ggplot2 plot.
Plotting functions
enw_delay_categories(),
enw_delay_quantiles(),
enw_plot_delay_counts(),
enw_plot_delay_cumulative(),
enw_plot_delay_quantiles(),
enw_plot_nowcast_quantiles(),
enw_plot_obs(),
enw_plot_pp_quantiles(),
enw_plot_quantiles(),
enw_plot_theme(),
plot.enw_preprocess_data(),
plot.epinowcast()
pobs <- enw_example("preprocessed_observations")
enw_plot_delay_fraction(pobs, c(0, 2, 5, 10, 21))
Line plot showing quantiles of the empirical delay distribution over reference dates.
enw_plot_delay_quantiles(pobs, quantiles = c(0.1, 0.5, 0.9), facet = TRUE)
pobs |
A preprocessed data object as produced by
|
quantiles |
A numeric vector of probabilities.
Defaults to |
facet |
Logical. When |
A ggplot2 plot.
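An empirical delay quantile for one reference date can be taken as the smallest delay at which the cumulative fraction reported reaches the requested probability. This is an assumed definition for illustration with made-up counts; the plotting function may compute its quantiles differently.

```r
# Sketch: empirical delay quantiles for one reference date.
new_confirm <- c(4, 3, 2, 1)  # counts first reported at delays 0, 1, 2, 3
delay <- 0:3
cum_frac <- cumsum(new_confirm) / sum(new_confirm)  # 0.4, 0.7, 0.9, 1.0
quantiles <- c(0.1, 0.5, 0.9)
# Smallest delay whose cumulative fraction reaches each probability
delay_q <- vapply(
  quantiles,
  function(q) delay[which(cum_frac >= q)[1]],
  integer(1)
)
delay_q
```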
Plotting functions
enw_delay_categories(),
enw_delay_quantiles(),
enw_plot_delay_counts(),
enw_plot_delay_cumulative(),
enw_plot_delay_fraction(),
enw_plot_nowcast_quantiles(),
enw_plot_obs(),
enw_plot_pp_quantiles(),
enw_plot_quantiles(),
enw_plot_theme(),
plot.enw_preprocess_data(),
plot.epinowcast()
pobs <- enw_example("preprocessed_observations")
enw_plot_delay_quantiles(pobs)
Plot nowcast quantiles
enw_plot_nowcast_quantiles(nowcast, latest_obs = NULL, log = FALSE, ...)
nowcast |
A |
latest_obs |
A |
log |
Logical, defaults to |
... |
Additional arguments passed to |
A ggplot2 plot.
Plotting functions
enw_delay_categories(),
enw_delay_quantiles(),
enw_plot_delay_counts(),
enw_plot_delay_cumulative(),
enw_plot_delay_fraction(),
enw_plot_delay_quantiles(),
enw_plot_obs(),
enw_plot_pp_quantiles(),
enw_plot_quantiles(),
enw_plot_theme(),
plot.enw_preprocess_data(),
plot.epinowcast()
nowcast <- enw_example("nowcast")
nowcast <- summary(nowcast, probs = c(0.05, 0.2, 0.8, 0.95))
enw_plot_nowcast_quantiles(nowcast)
Generic quantile plot
enw_plot_obs(obs, latest_obs = NULL, log = TRUE, ...)
obs |
A |
latest_obs |
A |
log |
Logical, defaults to |
... |
Additional arguments passed to |
A ggplot2 plot.
Plotting functions
enw_delay_categories(),
enw_delay_quantiles(),
enw_plot_delay_counts(),
enw_plot_delay_cumulative(),
enw_plot_delay_fraction(),
enw_plot_delay_quantiles(),
enw_plot_nowcast_quantiles(),
enw_plot_pp_quantiles(),
enw_plot_quantiles(),
enw_plot_theme(),
plot.enw_preprocess_data(),
plot.epinowcast()
nowcast <- enw_example("nowcast")
obs <- enw_example("obs")
# Plot observed data by reference date
enw_plot_obs(obs, x = reference_date)
# Plot observed data by reference date with more recent data
enw_plot_obs(nowcast$latest[[1]], obs, x = reference_date)
Plot posterior prediction quantiles
enw_plot_pp_quantiles(pp, log = FALSE, ...)
pp |
A |
log |
Logical, defaults to |
... |
Additional arguments passed to |
A ggplot2 plot.
Plotting functions
enw_delay_categories(),
enw_delay_quantiles(),
enw_plot_delay_counts(),
enw_plot_delay_cumulative(),
enw_plot_delay_fraction(),
enw_plot_delay_quantiles(),
enw_plot_nowcast_quantiles(),
enw_plot_obs(),
enw_plot_quantiles(),
enw_plot_theme(),
plot.enw_preprocess_data(),
plot.epinowcast()
nowcast <- enw_example("nowcast")
nowcast <- summary(
  nowcast, type = "posterior_prediction", probs = c(0.05, 0.2, 0.8, 0.95)
)
enw_plot_pp_quantiles(nowcast) +
  ggplot2::facet_wrap(ggplot2::vars(reference_date), scales = "free")
Generic quantile plot
enw_plot_quantiles(posterior, latest_obs = NULL, log = FALSE, ...)
posterior |
A |
latest_obs |
A |
log |
Logical, defaults to |
... |
Additional arguments passed to |
A ggplot2 plot.
enw_plot_nowcast_quantiles(), enw_plot_pp_quantiles()
Plotting functions
enw_delay_categories(),
enw_delay_quantiles(),
enw_plot_delay_counts(),
enw_plot_delay_cumulative(),
enw_plot_delay_fraction(),
enw_plot_delay_quantiles(),
enw_plot_nowcast_quantiles(),
enw_plot_obs(),
enw_plot_pp_quantiles(),
enw_plot_theme(),
plot.enw_preprocess_data(),
plot.epinowcast()
nowcast <- enw_example("nowcast")
nowcast <- summary(nowcast, probs = c(0.05, 0.2, 0.8, 0.95))
enw_plot_quantiles(nowcast, x = reference_date)
Package plot theme
enw_plot_theme(plot)
plot |
|
ggplot2 plot object.
Plotting functions
enw_delay_categories(),
enw_delay_quantiles(),
enw_plot_delay_counts(),
enw_plot_delay_cumulative(),
enw_plot_delay_fraction(),
enw_plot_delay_quantiles(),
enw_plot_nowcast_quantiles(),
enw_plot_obs(),
enw_plot_pp_quantiles(),
enw_plot_quantiles(),
plot.enw_preprocess_data(),
plot.epinowcast()
A generic wrapper around posterior::summarise_draws() with
opinionated defaults.
enw_posterior(fit, variables = NULL, probs = c(0.05, 0.2, 0.8, 0.95), ...)
fit |
A |
variables |
A character vector of variables to return posterior summaries for. By default summaries for all parameters are returned. |
probs |
A vector of numeric probabilities to produce quantile summaries for. By default these are the 5%, 20%, 80%, and 95% quantiles which are also the minimum set required for plotting functions to work. |
... |
Additional arguments that may be passed but will not be used. |
A data.frame summarising the model posterior.
Functions used for postprocessing of model fits
build_ord_obs(),
enw_add_latest_obs_to_nowcast(),
enw_nowcast_samples(),
enw_nowcast_summary(),
enw_pp_summary(),
enw_quantiles_to_long(),
enw_summarise_samples(),
subset_obs()
fit <- enw_example("nowcast")
enw_posterior(fit$fit[[1]], variables = "expr_beta")
This function summarises posterior predictions
for observed data (by report and reference date). The functionality of
this function can be used directly on the output of epinowcast() using
the supplied summary.epinowcast() method.
enw_pp_summary(fit, diff_obs, probs = c(0.05, 0.2, 0.35, 0.5, 0.65, 0.8, 0.95))
fit |
A |
diff_obs |
A |
probs |
A vector of numeric probabilities to produce quantile summaries for. By default these are the 5%, 20%, 35%, 50%, 65%, 80%, and 95% quantiles, which include the 5%, 20%, 80%, and 95% quantiles that form the minimum set required for plotting functions to work. |
A data.table summarising the posterior predictions.
Functions used for postprocessing of model fits
build_ord_obs(),
enw_add_latest_obs_to_nowcast(),
enw_nowcast_samples(),
enw_nowcast_summary(),
enw_posterior(),
enw_quantiles_to_long(),
enw_summarise_samples(),
subset_obs()
fit <- enw_example("nowcast")
enw_pp_summary(fit$fit[[1]], fit$new_confirm[[1]], probs = c(0.5))
This function preprocesses raw observations, assuming they are
reported as cumulative counts by reference and report date, and assigns
groups. It also constructs the data objects used by visualisation and
modelling functions, including the observed empirical probability of a
report on a given day, the cumulative probability of report, the latest
available observations, the incidence of observations, and metadata about
the reference and report dates (used to construct models). This function
wraps other preprocessing functions that may instead be used individually
if required. Note that reports beyond the user-specified delay are dropped
internally for modelling purposes; the cum_prop_reported and max_confirm
variables allow the user to check the impact this may have (if
cum_prop_reported is substantially below 1, a longer max_delay may be
appropriate). Also note that if missing reference or report dates are
suspected to occur in your data, these need to be completed with
enw_complete_dates().
enw_preprocess_data( obs, by = NULL, max_delay, timestep = "day", set_negatives_to_zero = TRUE, ..., copy = TRUE )
obs |
A |
by |
A character vector describing the stratification of observations. This defaults to no grouping. This should be used when modelling multiple time series in order to identify them for downstream modelling |
max_delay |
The maximum delay to model in the delay
distribution, specified in units of the timestep (e.g., if
Steps to take to determine the maximum delay:
Note that delays are zero indexed and so include the reference date
and |
timestep |
The timestep to use in the process model (i.e. the
reference date model). This can be a string ("day", "week", "month") or a
numeric whole number representing the number of days. If your data does not
have this timestep then you may wish to make use of
|
set_negatives_to_zero |
Logical, defaults to TRUE.
Should negative counts (for calculated incidence of
observations) be set to zero? Currently downstream
modelling does not support negative counts and so
setting must be TRUE if intending to use
|
... |
Other arguments to |
copy |
A logical; if |
If max_delay is numeric, it will be internally coerced to integer
using as.integer().
A data.table containing processed observations as a series of nested data.frames as well as variables containing metadata. These are:
obs: Observations with the addition of empirical reporting proportions,
restricted to the specified maximum delay.
new_confirm: Incidence of notifications by reference and report date.
Empirical reporting distributions are also added.
latest: The latest available observations.
missing_reference: Observations missing reference dates.
reporting_triangle: Incident observations by report and reference date in
the standard reporting triangle matrix format.
metareference: Metadata for reference dates derived from observations.
metareport: Metadata for report dates.
metadelay: Metadata for reporting delays produced using
enw_metadata_delay().
max_delay: Maximum delay to be modelled by epinowcast.
time: Numeric, number of timepoints in the data.
snapshots: Numeric, number of available data snapshots to use for
nowcasting.
groups: Numeric, number of groups/strata in the supplied observations
(set using by).
max_date: The maximum available report date.
Preprocessing functions
enw_add_delay(),
enw_add_max_reported(),
enw_add_metaobs_features(),
enw_assign_group(),
enw_complete_dates(),
enw_construct_data(),
enw_extend_date(),
enw_filter_delay(),
enw_filter_reference_dates(),
enw_filter_reference_dates_by_report_start(),
enw_filter_report_dates(),
enw_flag_observed_observations(),
enw_impute_na_observations(),
enw_latest_data(),
enw_metadata(),
enw_metadata_delay(),
enw_missing_reference(),
enw_obs_at_delay(),
enw_reporting_triangle(),
enw_reporting_triangle_to_long(),
enw_retrospective()
library(data.table)
# Filter example hospitalisation data to be national and over all ages
nat_germany_hosp <- germany_covid19_hosp[location == "DE"]
nat_germany_hosp <- nat_germany_hosp[age_group == "00+"]
# Preprocess with default settings
pobs <- enw_preprocess_data(nat_germany_hosp)
pobs
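The sketch below illustrates the first preprocessing step: turning cumulative counts into incident counts by reference and report date. It is a minimal base-R sketch, not the package's implementation. The helper cumulative_to_incidence() and the toy data are hypothetical, and scaling by the largest cumulative count per reference date is an assumption made for illustration.

```r
# Toy cumulative counts by reference and report date (hypothetical data)
obs <- data.frame(
  reference_date = as.Date(rep(c("2021-10-01", "2021-10-02"), each = 3)),
  report_date = as.Date(c(
    "2021-10-01", "2021-10-02", "2021-10-03",
    "2021-10-02", "2021-10-03", "2021-10-04"
  )),
  confirm = c(10, 25, 24, 5, 12, 20)
)

# Hypothetical helper: not the epinowcast implementation
cumulative_to_incidence <- function(obs, set_negatives_to_zero = TRUE) {
  obs <- obs[order(obs$reference_date, obs$report_date), ]
  obs$delay <- as.integer(obs$report_date - obs$reference_date)
  # Difference cumulative counts within each reference date
  obs$new_confirm <- stats::ave(
    obs$confirm, obs$reference_date,
    FUN = function(x) c(x[1], diff(x))
  )
  # Clamp downward revisions (negative increments) to zero
  if (set_negatives_to_zero) {
    obs$new_confirm <- pmax(obs$new_confirm, 0)
  }
  # Empirical reporting proportions relative to the largest cumulative count
  obs$max_confirm <- stats::ave(obs$confirm, obs$reference_date, FUN = max)
  obs$prop_reported <- obs$new_confirm / obs$max_confirm
  obs$cum_prop_reported <- obs$confirm / obs$max_confirm
  obs
}

cumulative_to_incidence(obs)
```

The first reference date has a downward revision (25 to 24). That increment is clamped to zero, so the incident counts for that date still sum to its largest cumulative count.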
Converts priors defined in a data.frame into a list
format for use by stan. In addition, it appends "_p" to all
variable names to allow them to be distinguished from
their standard usage within modelling code.
enw_priors_as_data_list(priors)
priors |
A |
A named list with each entry specifying a prior as a length two vector (specifying the mean and standard deviation of the prior).
Functions used to help convert models into the format required for stan
enw_formula_as_data_list(),
enw_get_cache(),
enw_model(),
enw_pathfinder(),
enw_replace_priors(),
enw_sample(),
enw_set_cache(),
enw_stan_to_r(),
enw_unset_cache(),
remove_profiling(),
write_stan_files_no_profile()
priors <- data.frame(variable = "x", mean = 1, sd = 2)
enw_priors_as_data_list(priors)
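Here is a minimal base-R sketch of the conversion described above. It assumes a priors data.frame with variable, mean, and sd columns. priors_as_list_sketch() is hypothetical, and the real function may store each entry in a different container (for example, an array for Stan).

```r
# Hypothetical helper: converts a priors data.frame into a named list
priors_as_list_sketch <- function(priors) {
  out <- lapply(seq_len(nrow(priors)), function(i) {
    c(priors$mean[i], priors$sd[i])  # length-two vector: mean, sd
  })
  # Suffix "_p" so prior entries do not clash with model variable names
  names(out) <- paste0(priors$variable, "_p")
  out
}

priors <- data.frame(variable = c("x", "y"), mean = c(1, 0), sd = c(2, 1))
priors_as_list_sketch(priors)
```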
Convert summarised quantiles from wide to long format
enw_quantiles_to_long(posterior)
posterior |
A |
A data.frame of quantiles in long format.
Functions used for postprocessing of model fits
build_ord_obs(),
enw_add_latest_obs_to_nowcast(),
enw_nowcast_samples(),
enw_nowcast_summary(),
enw_posterior(),
enw_pp_summary(),
enw_summarise_samples(),
subset_obs()
fit <- enw_example("nowcast")
posterior <- enw_posterior(fit$fit[[1]], variables = "expr_lelatent_int[1,1]")
enw_quantiles_to_long(posterior)
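The reshaping can be sketched in base R as follows. This sketch assumes quantile columns named like q5 and q95 (the naming used by posterior::quantile2()). The helper quantiles_to_long_sketch() and the output column names quantile and prediction are illustrative assumptions, not a description of the package's output.

```r
# Hypothetical helper: stack q* columns into quantile/prediction pairs
quantiles_to_long_sketch <- function(posterior) {
  qcols <- grep("^q[0-9.]+$", names(posterior), value = TRUE)
  id_cols <- setdiff(names(posterior), qcols)
  long <- lapply(qcols, function(qc) {
    data.frame(
      posterior[id_cols],
      quantile = as.numeric(sub("^q", "", qc)) / 100,  # "q5" -> 0.05
      prediction = posterior[[qc]]
    )
  })
  long <- do.call(rbind, long)
  rownames(long) <- NULL
  long
}

posterior <- data.frame(variable = "alpha", mean = 1, q5 = 0.2, q95 = 1.8)
quantiles_to_long_sketch(posterior)
```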
Specifies the reference date reporting delay model using parametric and/or non-parametric hazard formulations.
enw_reference( parametric = ~1, distribution = c("lognormal", "none", "exponential", "gamma", "loglogistic"), non_parametric = ~0, data )
parametric |
A formula (as implemented in |
distribution |
A character vector describing the parametric delay distribution to use. Current options are: "none", "lognormal", "gamma", "exponential", and "loglogistic", with the default being "lognormal". |
non_parametric |
A formula (as implemented in |
data |
Output from |
A list containing the supplied formulas, data passed into a list
describing the models, a data.frame describing the priors used, and a
function that takes the output data and priors and returns a function that
can be used to sample from a tightened version of the prior distribution.
Model modules
enw_expectation(),
enw_fit_opts(),
enw_missing(),
enw_obs(),
enw_report()
# Parametric model with a lognormal distribution
enw_reference(
  parametric = ~1,
  distribution = "lognormal",
  data = enw_example("preprocessed")
)
# Non-parametric model with a random effect per delay
enw_reference(
  parametric = ~0,
  non_parametric = ~ 1 + (1 | delay),
  data = enw_example("preprocessed")
)
# Combined parametric and non-parametric model
enw_reference(
  parametric = ~1,
  non_parametric = ~ 0 + (1 | delay_cat),
  data = enw_example("preprocessed")
)
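The sketch below shows what a parametric delay distribution looks like on a discrete grid by discretising a lognormal distribution into daily reporting probabilities. It assumes a simple discretisation: probability mass on [d, d + 1) for d = 0, ..., max_delay - 1, renormalised to sum to one. discretised_lognormal_pmf() is hypothetical, and the package's Stan code may discretise differently.

```r
# Hypothetical helper: discretised, truncated lognormal delay distribution
discretised_lognormal_pmf <- function(meanlog, sdlog, max_delay) {
  # Probability a report falls in [d, d + 1) for d = 0, ..., max_delay - 1
  cdf <- stats::plnorm(0:max_delay, meanlog = meanlog, sdlog = sdlog)
  pmf <- diff(cdf)
  # Renormalise so the truncated distribution sums to one
  pmf / sum(pmf)
}

pmf <- discretised_lognormal_pmf(meanlog = 1, sdlog = 0.5, max_delay = 10)
round(pmf, 3)
```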
Construct a lookup of reference dates by report
enw_reference_by_report( missing_reference, reps_with_complete_refs, metareference, max_delay )
missing_reference |
|
reps_with_complete_refs |
A |
metareference |
|
max_delay |
The maximum delay to model in the delay
distribution, specified in units of the timestep (e.g., if
Steps to take to determine the maximum delay:
Note that delays are zero indexed and so include the reference date
and |
A wide data.frame with each row being a complete report date and
the columns being the observation index for each reporting delay.
Helper functions for model modules
add_max_observed_delay(),
add_pmfs(),
convolution_matrix(),
enw_dayofweek_structural_reporting(),
enw_reps_with_complete_refs(),
enw_structural_reporting_metadata(),
extract_obs_metadata(),
extract_sparse_matrix(),
latest_obs_as_matrix()
Replaces default model priors with user specified ones
(restricted to normal priors with specified mean and standard
deviations).
A common use is extracting the posterior from a previous
epinowcast() run (using summary(nowcast, type = "fit"))
and using it as a prior for subsequent fits.
enw_replace_priors(priors, custom_priors)
priors |
A |
custom_priors |
A |
Default priors can be obtained from each model module's
$priors element, e.g. enw_reference(data = pobs)$priors.
See the priors argument of epinowcast() for a list of
available prior variable names by module.
A data.table of prior definitions (variable, mean and sd).
Functions used to help convert models into the format required for stan
enw_formula_as_data_list(),
enw_get_cache(),
enw_model(),
enw_pathfinder(),
enw_priors_as_data_list(),
enw_sample(),
enw_set_cache(),
enw_stan_to_r(),
enw_unset_cache(),
remove_profiling(),
write_stan_files_no_profile()
# Update priors from a data.frame
priors <- data.frame(variable = c("x", "y"), mean = c(1, 2), sd = c(1, 2))
custom_priors <- data.frame(variable = "x[1]", mean = 10, sd = 2)
enw_replace_priors(priors, custom_priors)
# Update priors from a previous model fit
default_priors <- enw_reference(
  distribution = "lognormal",
  data = enw_example("preprocessed")
)$priors
print(default_priors)
fit_priors <- summary(
  enw_example("nowcast"),
  type = "fit",
  variables = c("refp_mean_int", "refp_sd_int", "sqrt_phi")
)
fit_priors
enw_replace_priors(default_priors, fit_priors)
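The merging rule can be sketched in base R: defaults whose names match are replaced after any index such as [1] is stripped. replace_priors_sketch() is a hypothetical helper that only handles the mean and sd columns. It is not the package's implementation.

```r
# Hypothetical helper: replace default priors with custom ones by name
replace_priors_sketch <- function(priors, custom_priors) {
  custom <- custom_priors[, c("variable", "mean", "sd")]
  # Strip a trailing index, e.g. "x[1]" -> "x" or "x[1,2]" -> "x"
  custom$variable <- sub("\\[[0-9,]+\\]$", "", custom$variable)
  # Keep defaults that are not overridden, then append the custom priors
  kept <- priors[!(priors$variable %in% custom$variable), c("variable", "mean", "sd")]
  out <- rbind(kept, custom)
  rownames(out) <- NULL
  out
}

priors <- data.frame(variable = c("x", "y"), mean = c(1, 2), sd = c(1, 2))
custom_priors <- data.frame(variable = "x[1]", mean = 10, sd = 2)
replace_priors_sketch(priors, custom_priors)
```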
Report date logit hazard reporting model module
enw_report(non_parametric = ~0, structural = NULL, data)
non_parametric |
A formula (as implemented in |
structural |
A |
data |
Output from |
A list containing the supplied formulas, data passed into a list
describing the models, a data.frame describing the priors used, and a
function that takes the output data and priors and returns a function that
can be used to sample from a tightened version of the prior distribution.
Model modules
enw_expectation(),
enw_fit_opts(),
enw_missing(),
enw_obs(),
enw_reference()
# Basic report model
enw_report(data = enw_example("preprocessed"))
## Not run:
# With Wednesday-only reporting structure
pobs <- enw_example("preprocessed")
structural <- enw_dayofweek_structural_reporting(
  pobs,
  day_of_week = "Wednesday"
)
enw_report(structural = structural, data = pobs)
## End(Not run)
Constructs the reporting triangle with each row representing a reference date and columns being observations by reporting delay
enw_reporting_triangle(obs)
obs |
A |
A data.frame with each row being a reference date, and columns
being observations by reporting delay.
Preprocessing functions
enw_add_delay(),
enw_add_max_reported(),
enw_add_metaobs_features(),
enw_assign_group(),
enw_complete_dates(),
enw_construct_data(),
enw_extend_date(),
enw_filter_delay(),
enw_filter_reference_dates(),
enw_filter_reference_dates_by_report_start(),
enw_filter_report_dates(),
enw_flag_observed_observations(),
enw_impute_na_observations(),
enw_latest_data(),
enw_metadata(),
enw_metadata_delay(),
enw_missing_reference(),
enw_obs_at_delay(),
enw_preprocess_data(),
enw_reporting_triangle_to_long(),
enw_retrospective()
obs <- enw_example("preprocessed")$new_confirm
enw_reporting_triangle(obs)
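Here is a base-R sketch of the same wide layout. It assumes incident counts in long format with reference_date, delay, and new_confirm columns. reporting_triangle_sketch() is hypothetical. It returns a matrix, with NA for cells not yet observed, rather than the data.frame returned by enw_reporting_triangle().

```r
# Toy incident counts in long format (hypothetical data)
new_confirm <- data.frame(
  reference_date = rep(c("2021-10-01", "2021-10-02", "2021-10-03"), times = c(3, 2, 1)),
  delay = c(0, 1, 2, 0, 1, 0),
  new_confirm = c(10, 15, 3, 5, 7, 8)
)

# Hypothetical helper: rows are reference dates, columns are delays,
# and cells with no observation yet are NA
reporting_triangle_sketch <- function(new_confirm) {
  tapply(
    new_confirm$new_confirm,
    list(reference_date = new_confirm$reference_date, delay = new_confirm$delay),
    sum
  )
}

tri <- reporting_triangle_sketch(new_confirm)
tri
```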
Recast the reporting triangle from wide to long format
enw_reporting_triangle_to_long(obs)
obs |
A |
A long format reporting triangle as a data.frame with additional
variables new_confirm and delay.
Preprocessing functions
enw_add_delay(),
enw_add_max_reported(),
enw_add_metaobs_features(),
enw_assign_group(),
enw_complete_dates(),
enw_construct_data(),
enw_extend_date(),
enw_filter_delay(),
enw_filter_reference_dates(),
enw_filter_reference_dates_by_report_start(),
enw_filter_report_dates(),
enw_flag_observed_observations(),
enw_impute_na_observations(),
enw_latest_data(),
enw_metadata(),
enw_metadata_delay(),
enw_missing_reference(),
enw_obs_at_delay(),
enw_preprocess_data(),
enw_reporting_triangle(),
enw_retrospective()
obs <- enw_example("preprocessed")$new_confirm
rt <- enw_reporting_triangle(obs)
enw_reporting_triangle_to_long(rt)
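The sketch below goes the other way, recasting a wide triangle matrix into long format. In the matrix, rows are reference dates, columns are delays, and unobserved cells are NA. triangle_to_long_sketch() and the toy matrix are hypothetical, and unobserved cells are simply dropped.

```r
# Toy reporting triangle (hypothetical data); NA marks unobserved cells
tri <- matrix(
  c(10, 5, 8, 15, 7, NA, 3, NA, NA), nrow = 3,
  dimnames = list(c("2021-10-01", "2021-10-02", "2021-10-03"), as.character(0:2))
)

# Hypothetical helper: one row per observed (reference date, delay) cell
triangle_to_long_sketch <- function(tri) {
  idx <- which(!is.na(tri), arr.ind = TRUE)
  data.frame(
    reference_date = rownames(tri)[idx[, 1]],
    delay = as.integer(colnames(tri)[idx[, 2]]),
    new_confirm = tri[idx]
  )
}

long <- triangle_to_long_sketch(tri)
long[order(long$reference_date, long$delay), ]
```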
Identify report dates with complete (i.e. up to the maximum delay) reference dates
enw_reps_with_complete_refs(new_confirm, max_delay, by = NULL, copy = TRUE)
new_confirm |
|
max_delay |
The maximum delay to model in the delay
distribution, specified in units of the timestep (e.g., if
Steps to take to determine the maximum delay:
Note that delays are zero indexed and so include the reference date
and |
by |
A character vector describing the stratification of observations. This defaults to no grouping. This should be used when modelling multiple time series in order to identify them for downstream modelling |
copy |
A logical; if |
A data.frame containing a report_date variable and any specified
grouping variables, restricted to report dates that have complete reporting.
Helper functions for model modules
add_max_observed_delay(),
add_pmfs(),
convolution_matrix(),
enw_dayofweek_structural_reporting(),
enw_reference_by_report(),
enw_structural_reporting_metadata(),
extract_obs_metadata(),
extract_sparse_matrix(),
latest_obs_as_matrix()
Takes output of enw_preprocess_data() and returns a new
preprocessed dataset with max_delay = 1, suitable for
retrospective Rt estimation without delay modelling.
Observations are taken at the specified delay (or the latest
available) and treated as final counts. In the returned data,
report_date is set equal to reference_date for all rows
(i.e. all observations appear to be reported on the same day
they occurred).
enw_retrospective(data, max_delay = NULL)
data |
Output of |
max_delay |
Integer delay at which to freeze observations.
If |
A preprocessed data object (as from
enw_preprocess_data()) with max_delay = 1.
Preprocessing functions
enw_add_delay(),
enw_add_max_reported(),
enw_add_metaobs_features(),
enw_assign_group(),
enw_complete_dates(),
enw_construct_data(),
enw_extend_date(),
enw_filter_delay(),
enw_filter_reference_dates(),
enw_filter_reference_dates_by_report_start(),
enw_filter_report_dates(),
enw_flag_observed_observations(),
enw_impute_na_observations(),
enw_latest_data(),
enw_metadata(),
enw_metadata_delay(),
enw_missing_reference(),
enw_obs_at_delay(),
enw_preprocess_data(),
enw_reporting_triangle(),
enw_reporting_triangle_to_long()
pobs <- enw_example("preprocessed")
enw_retrospective(pobs)
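The freezing step can be sketched as follows. For each reference date, keep the cumulative count at the largest available delay no greater than max_delay, then set report_date equal to reference_date. freeze_at_delay_sketch() and the toy data are hypothetical, and the package may treat the delay cut-off differently.

```r
# Toy cumulative counts by reference date and delay (hypothetical data)
obs <- data.frame(
  reference_date = as.Date(c(rep("2021-10-01", 3), rep("2021-10-02", 2))),
  delay = c(0, 1, 2, 0, 1),
  confirm = c(10, 25, 30, 5, 12)
)

# Hypothetical helper: not the epinowcast implementation
freeze_at_delay_sketch <- function(obs, max_delay) {
  obs <- obs[obs$delay <= max_delay, ]
  obs <- obs[order(obs$reference_date, obs$delay), ]
  # Keep the count at the largest available delay for each reference date
  last <- !duplicated(obs$reference_date, fromLast = TRUE)
  out <- obs[last, c("reference_date", "confirm")]
  # Treat the frozen counts as reported on the reference date itself
  out$report_date <- out$reference_date
  rownames(out) <- NULL
  out
}

freeze_at_delay_sketch(obs, max_delay = 1)
```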
This function takes a data.table and applies a rolling sum over a given timestep, aggregating by the specified columns. It is useful for aggregating observations over a fixed window (for example, 7 days for weekly totals).
enw_rolling_sum(dt, internal_timestep, by = NULL, value_col = "confirm")
dt |
A |
internal_timestep |
An integer indicating the period over which to aggregate. |
by |
A character vector specifying the columns to aggregate by. |
value_col |
A character string specifying the column to aggregate. Defaults to "confirm". |
A modified data.table with aggregated observations.
Utility functions
coerce_date(),
coerce_dt(),
date_to_numeric_modulus(),
enw_get_data(),
get_internal_timestep(),
is.Date(),
stan_fns_as_string()
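Here is a minimal sketch of a trailing rolling sum. It assumes rows are sorted by date within each group, with no gaps. rolling_sum_sketch() is hypothetical. Partial windows at the start of each series are summed over whatever values are available, which may differ from how enw_rolling_sum() handles the first few timesteps.

```r
# Hypothetical helper: trailing rolling sum over the last k values,
# using partial windows at the start of the series
rolling_sum_sketch <- function(x, k) {
  cs <- cumsum(x)
  lagged <- c(rep(0, k), utils::head(cs, -k))
  cs - lagged[seq_along(cs)]
}

# Apply within groups; assumes rows are sorted by date with no gaps
dt <- data.frame(.group = rep(1:2, each = 4), confirm = 1:8)
dt$rolling <- stats::ave(
  dt$confirm, dt$.group,
  FUN = function(x) rolling_sum_sketch(x, k = 2)
)
dt
```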
Fit a CmdStan model using NUTS
enw_sample( data, model = epinowcast::enw_model(), init = NULL, init_method = c("prior", "pathfinder"), init_method_args = list(), diagnostics = TRUE, ... )
data |
A list of data as produced by model modules (for example
|
model |
A |
init |
A list of initial values or a function to generate initial values. If not provided, the model will attempt to generate initial values. |
init_method |
The method to use for initializing the model. Defaults to
"prior" which samples initial values from the prior. "pathfinder", which uses
the pathfinder algorithm ( |
init_method_args |
A list of additional arguments to pass to the initialization method. |
diagnostics |
Logical, defaults to |
... |
Additional parameters passed to the |
A data.frame containing the cmdstanr fit, the input data, the
fitting arguments, and optionally summary diagnostics.
Functions used to help convert models into the format required for stan
enw_formula_as_data_list(),
enw_get_cache(),
enw_model(),
enw_pathfinder(),
enw_priors_as_data_list(),
enw_replace_priors(),
enw_set_cache(),
enw_stan_to_r(),
enw_unset_cache(),
remove_profiling(),
write_stan_files_no_profile()
pobs <- enw_example("preprocessed")
nowcast <- epinowcast(pobs,
  expectation = enw_expectation(~1, data = pobs),
  fit = enw_fit_opts(enw_sample, pp = TRUE),
  obs = enw_obs(family = "poisson", data = pobs)
)
summary(nowcast)
# Use pathfinder initialization
nowcast_pathfinder <- epinowcast(pobs,
  expectation = enw_expectation(~1, data = pobs),
  fit = enw_fit_opts(enw_sample, pp = TRUE, init_method = "pathfinder"),
  obs = enw_obs(family = "poisson", data = pobs)
)
summary(nowcast_pathfinder)
This function allows the user to set a cache location for Stan models rather than a temporary directory. This can reduce the need for model compilation on every new model run across sessions or within a session. For R version 4.0.0 and above, it's recommended to use the persistent cache as shown in the example.
enw_set_cache(path, type = c("session", "persistent", "all"))
path |
A valid filepath representing the desired cache location. If the directory does not exist it will be created. |
type |
A character string specifying the cache type. It can be one of
"session", "persistent", or "all". Default is "session".
"session" sets the cache for the current session, "persistent" writes the
cache location to the user's |
The string of the filepath set.
Functions used to help convert models into the format required for stan
enw_formula_as_data_list(),
enw_get_cache(),
enw_model(),
enw_pathfinder(),
enw_priors_as_data_list(),
enw_replace_priors(),
enw_sample(),
enw_stan_to_r(),
enw_unset_cache(),
remove_profiling(),
write_stan_files_no_profile()
# Set to local directory
my_enw_cache <- enw_set_cache(file.path(tempdir(), "test"))
enw_get_cache()
## Not run:
# Use the package cache in R >= 4.0
if (getRversion() >= "4.0.0") {
  enw_set_cache(
    tools::R_user_dir(package = "epinowcast", "cache"),
    type = "all"
  )
}
## End(Not run)
A simple binomial simulator of missing data by reference date using simulated or observed data as an input. This function may be used to validate missing data models, as part of examples and case studies, or to explore the implications of missing data for your use case.
enw_simulate_missing_reference(obs, proportion = 0.2, by = NULL)
obs |
A |
proportion |
Numeric, the proportion of observations that are missing a reference date, indexed by reference date. Currently only a fixed proportion are supported and this defaults to 0.2. |
by |
A character vector describing the stratification of observations. This defaults to no grouping. This should be used when modelling multiple time series in order to identify them for downstream modelling |
A data.table of the same format as the input but with a simulated
proportion of observations now having a missing reference date.
# Load and filter germany hospitalisations
nat_germany_hosp <- subset(
  germany_covid19_hosp, location == "DE" & age_group == "00+"
)
nat_germany_hosp <- enw_filter_report_dates(
  nat_germany_hosp,
  latest_date = "2021-08-01"
)
# Make sure observations are complete
nat_germany_hosp <- enw_complete_dates(
  nat_germany_hosp,
  by = c("location", "age_group"), missing_reference = FALSE
)
# Simulate
enw_simulate_missing_reference(
  nat_germany_hosp,
  proportion = 0.35, by = c("location", "age_group")
)
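The binomial idea can be sketched in base R for incident counts. Each count is split into observations that keep their reference date and observations that lose it, with each case missing independently with probability proportion. simulate_missing_sketch() is hypothetical and works on a plain vector rather than the data.table input described above.

```r
# Hypothetical helper: binomial thinning of incident counts by reference date
simulate_missing_sketch <- function(confirm, proportion = 0.2) {
  # Each case independently lacks a reference date with probability `proportion`
  missing <- stats::rbinom(length(confirm), size = confirm, prob = proportion)
  data.frame(confirm = confirm - missing, missing_reference = missing)
}

set.seed(42)
incidence <- c(120, 95, 143, 80)
sim <- simulate_missing_sketch(incidence, proportion = 0.35)
sim
```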
This function exposes Stan functions from the epinowcast package in R, using the expose_functions method of cmdstanr::CmdStanModel. It is useful for developers and contributors to the epinowcast package, as well as for users interested in exploring and prototyping with model functionality.
enw_stan_to_r( files = list.files(include), include = system.file("stan", "functions", package = "epinowcast"), global = TRUE, verbose = TRUE, ... )
files |
A character vector specifying the names of Stan files to be
exposed. These must be in the |
include |
A character string specifying the directory containing Stan
files. Defaults to the 'stan/functions' directory of the |
global |
A logical value indicating whether to expose the functions
globally. Defaults to |
verbose |
Logical, defaults to |
... |
Additional arguments passed to enw_model. |
An object of class CmdStanModel with functions from the model
exposed for use in R.
Functions used to help convert models into the format required for stan
enw_formula_as_data_list(),
enw_get_cache(),
enw_model(),
enw_pathfinder(),
enw_priors_as_data_list(),
enw_replace_priors(),
enw_sample(),
enw_set_cache(),
enw_unset_cache(),
remove_profiling(),
write_stan_files_no_profile()
# Compile functions in stan/functions/hazard.stan
stan_functions <- enw_stan_to_r("hazard.stan")
# These functions can now be used in R
stan_functions$functions$prob_to_hazard(c(0.5, 0.1, 0.1))
# or exposed globally and used directly
prob_to_hazard(c(0.5, 0.1, 0.1))
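If you do not have a Stan toolchain, the discrete-time relationship between delay probabilities and hazards can be sketched in plain R. The hazard at delay d is the probability of report at d given no report before d. Both helpers are hypothetical re-derivations, not the exposed Stan functions. The package's prob_to_hazard() may treat the final delay differently (for example, by fixing the last hazard to one).

```r
# Hypothetical helper: convert delay probabilities to discrete-time hazards
prob_to_hazard_sketch <- function(p) {
  # Probability of not having been reported before each delay
  surv <- 1 - c(0, cumsum(p)[-length(p)])
  p / surv
}

# Hypothetical helper: the inverse transformation
hazard_to_prob_sketch <- function(h) {
  surv <- cumprod(c(1, 1 - h[-length(h)]))
  h * surv
}

h <- prob_to_hazard_sketch(c(0.5, 0.1, 0.1))
h                         # 0.50 0.20 0.25
hazard_to_prob_sketch(h)  # recovers 0.5 0.1 0.1
```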
Creates a base metadata grid for structural reporting patterns by generating all combinations of reference dates, delays, and report dates. This grid serves as the foundation for defining custom reporting patterns.
enw_structural_reporting_metadata(pobs)
pobs |
A preprocessed observation list from
|
A data.table with columns:
.group: Group identifier
date: Reference date
report_date: Report date (reference date + delay)
Helper functions for model modules
add_max_observed_delay(),
add_pmfs(),
convolution_matrix(),
enw_dayofweek_structural_reporting(),
enw_reference_by_report(),
enw_reps_with_complete_refs(),
extract_obs_metadata(),
extract_sparse_matrix(),
latest_obs_as_matrix()
## Not run:
pobs <- enw_preprocess_data(obs, max_delay = 30)
metadata <- enw_structural_reporting_metadata(pobs)
# Add custom reporting pattern (e.g., only report on first day of month)
metadata[, report := as.integer(format(report_date, "%d") == "01")]
## End(Not run)
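Here is a base-R sketch of the same kind of grid for a single group. It builds every combination of reference date and delay, with report_date = date + delay, and then adds a Wednesday-only reporting flag. structural_grid_sketch() is hypothetical. It uses as.POSIXlt()$wday (3 = Wednesday) rather than weekday names to avoid locale dependence.

```r
# Hypothetical helper: all reference date x delay combinations for one group
structural_grid_sketch <- function(dates, max_delay) {
  grid <- expand.grid(date = dates, delay = 0:(max_delay - 1))
  grid$report_date <- grid$date + grid$delay
  grid[order(grid$date, grid$delay), ]
}

dates <- seq(as.Date("2021-10-04"), by = "day", length.out = 7)
grid <- structural_grid_sketch(dates, max_delay = 3)
# Flag reports made on Wednesdays (POSIXlt wday: 0 = Sunday, 3 = Wednesday)
grid$report <- as.integer(as.POSIXlt(grid$report_date)$wday == 3)
grid
```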
This function summarises posterior samples for arbitrary strata. It optionally holds out the observed data (variables that are not ".draw", ".iteration", ".sample", or ".chain") and joins it to the summarised posterior.
enw_summarise_samples( samples, probs = c(0.05, 0.2, 0.35, 0.5, 0.65, 0.8, 0.95), by = c("reference_date", ".group"), link_with_obs = TRUE )
samples |
A |
probs |
A vector of numeric probabilities to produce quantile summaries for. By default these are the 5%, 20%, 35%, 50%, 65%, 80%, and 95% quantiles, which include the 5%, 20%, 80%, and 95% quantiles required as a minimum for plotting functions to work. |
by |
A character vector of variables to summarise by. Defaults to
|
link_with_obs |
Logical, should the observed data be linked to the
posterior summary? This is useful for plotting the posterior against the
observed data. Defaults to |
A data.frame summarising the posterior samples.
Functions used for postprocessing of model fits
build_ord_obs(),
enw_add_latest_obs_to_nowcast(),
enw_nowcast_samples(),
enw_nowcast_summary(),
enw_posterior(),
enw_pp_summary(),
enw_quantiles_to_long(),
subset_obs()
fit <- enw_example("nowcast")
samples <- summary(fit, type = "nowcast_sample")
enw_summarise_samples(samples, probs = c(0.05, 0.5, 0.95))
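The core of this summary is computing quantiles of the samples within each stratum. It can be sketched in base R for a single stratifying variable. summarise_samples_sketch() and the toy samples are hypothetical, and the sketch does not link the summary back to observed data.

```r
# Hypothetical helper: summarise posterior samples by reference date
summarise_samples_sketch <- function(samples, probs = c(0.05, 0.5, 0.95)) {
  groups <- split(samples$sample, samples$reference_date)
  rows <- lapply(names(groups), function(g) {
    x <- groups[[g]]
    qs <- stats::quantile(x, probs = probs, names = FALSE)
    data.frame(
      reference_date = g, mean = mean(x),
      # Quantile columns named q5, q50, q95, ...
      stats::setNames(as.list(qs), paste0("q", probs * 100))
    )
  })
  out <- do.call(rbind, rows)
  rownames(out) <- NULL
  out
}

samples <- data.frame(
  reference_date = rep(c("2021-10-01", "2021-10-02"), each = 5),
  sample = c(1:5, 11:15)
)
summarise_samples_sketch(samples)
```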
Optionally removes the enw_cache_location environment variable from
the user .Renviron file and/or removes it from the local
environment. If you unset the local cache and want to switch
back to using the persistent cache, you can reload the
.Renviron file using readRenviron("~/.Renviron").
enw_unset_cache(type = c("session", "persistent", "all"))
type |
A character string specifying the type of cache to unset.
It can be one of "session", "persistent", or "all". Default is "session".
"session" unsets the cache for the current session, "persistent" removes the
cache location from the user's |
The prior cache location if it existed, otherwise NULL.
Functions used to help convert models into the format required for stan
enw_formula_as_data_list(),
enw_get_cache(),
enw_model(),
enw_pathfinder(),
enw_priors_as_data_list(),
enw_replace_priors(),
enw_sample(),
enw_set_cache(),
enw_stan_to_r(),
remove_profiling(),
write_stan_files_no_profile()
enw_unset_cache()
Provides a user-friendly interface around package functionality to produce a nowcast from observed preprocessed data and a series of user-defined models. By default, a model that assumes a fixed parametric reporting distribution with a flexible expectation model is used. Explore the individual model components for additional documentation, and see the package case studies for example model specifications for different tasks.
epinowcast( data, reference = epinowcast::enw_reference(parametric = ~1, distribution = "lognormal", non_parametric = ~0, data = data), report = epinowcast::enw_report(non_parametric = ~0, structural = NULL, data = data), expectation = epinowcast::enw_expectation(r = ~0 + (1 | day:.group), generation_time = 1, observation = ~1, latent_reporting_delay = 1, data = data), missing = epinowcast::enw_missing(formula = ~0, data = data), obs = epinowcast::enw_obs(family = "negbin", data = data), fit = epinowcast::enw_fit_opts(sampler = epinowcast::enw_sample, nowcast = TRUE, pp = FALSE, likelihood = TRUE, debug = FALSE, output_loglik = FALSE), model = epinowcast::enw_model(), priors, ... )
data |
Output from |
reference |
The reference date indexed reporting process model
specification as defined using |
report |
The report date indexed reporting process model
specification as defined using |
expectation |
The expectation model specification as defined using
|
missing |
The missing reference date model specification as defined
using |
obs |
The observation model as defined by |
fit |
Model fit options as defined using |
model |
The model to use within |
priors |
A |
... |
Additional model modules to pass to |
Each model module defines its own default priors.
To inspect them, call the module function and access the
$priors element.
See enw_reference(), enw_report(),
enw_expectation(), enw_missing(), and enw_obs() for
the prior variables available in each module.
To replace specific defaults, pass a data.frame to the
priors argument.
Vectorised prior names (e.g. "refp_mean_int[1]") are
matched after stripping the index.
See enw_replace_priors() for details on the merging
behaviour.
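The name matching described above can be illustrated with a small base-R sketch. It is hypothetical and is not enw_replace_priors() itself: it assumes a priors table with variable, mean and sd columns, uses made-up default values, and only shows how stripping an index such as "[1]" lets a custom row replace the matching default.

```r
# Hypothetical sketch of replacing default priors by name; not the
# package's enw_replace_priors() implementation.
replace_priors_sketch <- function(defaults, custom) {
  # Strip a trailing index such as "[1]" so vectorised names match
  custom$variable <- sub("\\[[0-9]+\\]$", "", custom$variable)
  # Keep only defaults that are not overridden, then append the custom rows
  keep <- !(defaults$variable %in% custom$variable)
  out <- rbind(defaults[keep, ], custom[, names(defaults)])
  rownames(out) <- NULL
  out
}

# Made-up defaults for illustration only
defaults <- data.frame(
  variable = c("refp_mean_int", "refp_sd_int"),
  mean = c(1, 0.5),
  sd = c(1, 0.5)
)
custom <- data.frame(variable = "refp_mean_int[1]", mean = 2, sd = 0.5)
res <- replace_priors_sketch(defaults, custom)
res
```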
An object of class "epinowcast" which inherits from
enw_preprocess_data() and data.table, and combines the input data,
priors, and output from the sampler specified in enw_fit_opts().
Other epinowcast:
plot.enw_preprocess_data(),
plot.epinowcast(),
print.enw_preprocess_data(),
print.epinowcast(),
print.summary.enw_preprocess_data(),
summary.enw_preprocess_data(),
summary.epinowcast()
# Load epinowcast, data.table and ggplot2
library(epinowcast)
library(data.table)
library(ggplot2)
# Use 2 cores
options(mc.cores = 2)
# Load and filter Germany hospitalisations
nat_germany_hosp <- germany_covid19_hosp[location == "DE"][age_group == "00+"]
nat_germany_hosp <- enw_filter_report_dates(
  nat_germany_hosp,
  latest_date = "2021-10-01"
)
# Make sure observations are complete
nat_germany_hosp <- enw_complete_dates(
  nat_germany_hosp,
  by = c("location", "age_group")
)
# Make a retrospective dataset
retro_nat_germany <- enw_filter_report_dates(
  nat_germany_hosp,
  remove_days = 40
)
retro_nat_germany <- enw_filter_reference_dates(
  retro_nat_germany,
  include_days = 40
)
# Get latest observations for the same time period
latest_obs <- nat_germany_hosp |>
  enw_obs_at_delay(max_delay = 20) |>
  enw_filter_reference_dates(
    remove_days = 40, include_days = 20
  )
# Preprocess observations (note this maximum delay is likely too short)
pobs <- enw_preprocess_data(retro_nat_germany, max_delay = 20)
# Fit with custom priors
my_priors <- data.frame(
  variable = "refp_mean_int",
  mean = 2,
  sd = 0.5
)
nowcast <- epinowcast(pobs,
  priors = my_priors,
  fit = enw_fit_opts(
    save_warmup = FALSE, pp = TRUE,
    chains = 2, iter_warmup = 500, iter_sampling = 500
  )
)
nowcast
# Plot the nowcast vs latest available observations
plot(nowcast, latest_obs = latest_obs)
# Plot posterior predictions for the delay distribution by date
plot(nowcast, type = "posterior_prediction") +
  facet_wrap(vars(reference_date), scales = "free")
This function extracts metadata from the provided dataset to be used in the observation model.
extract_obs_metadata(new_confirm, observation_indicator = NULL)
new_confirm |
A data.table containing the columns: "reference_date",
"delay", ".group", "new_confirm", and "max_obs_delay".
As produced by |
observation_indicator |
A character string specifying the column name
in |
A list containing:
st: time index of each snapshot (snapshot time).
ts: snapshot index by time and group.
sl: number of reported observations per snapshot (snapshot
length).
csl: cumulative version of sl.
lsl: number of consecutive reported observations per
snapshot accounting for missing data.
clsl: cumulative version of lsl.
nsl: number of observed observations per snapshot (snapshot
length).
cnsl: cumulative version of nsl.
sg: group index of each snapshot (snapshot group).
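To make the snapshot quantities concrete, the toy example below computes sl, csl and a snapshot time index st in base R, treating each (reference date, group) pair as one snapshot and each row as one reported observation. It is a hypothetical illustration of the definitions listed above, not the package's implementation, and it ignores missing data (lsl, nsl).

```r
# Hypothetical toy data: one group and three reference dates (snapshots);
# each row is one reported observation at a given delay.
obs <- data.frame(
  reference_date = as.Date("2021-10-01") + c(0, 0, 0, 1, 1, 2),
  .group = 1L,
  delay = c(0, 1, 2, 0, 1, 0)
)
# One row per snapshot, counting its reported observations (sl)
snapshots <- aggregate(delay ~ reference_date + .group, data = obs, FUN = length)
names(snapshots)[names(snapshots) == "delay"] <- "sl"
snapshots$st <- as.integer(factor(snapshots$reference_date))  # snapshot time index
snapshots$csl <- cumsum(snapshots$sl)                          # cumulative sl
snapshots
```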
Helper functions for model modules
add_max_observed_delay(),
add_pmfs(),
convolution_matrix(),
enw_dayofweek_structural_reporting(),
enw_reference_by_report(),
enw_reps_with_complete_refs(),
enw_structural_reporting_metadata(),
extract_sparse_matrix(),
latest_obs_as_matrix()
This helper function allows the extraction of a sparse matrix from a matrix
using a similar approach to that implemented in
rstan::extract_sparse_parts() and returns these elements in a named
list for use in Stan. This function is used in the construction of the
expectation model (see enw_expectation()).
extract_sparse_matrix(mat, prefix = "")
mat |
A matrix to extract the sparse matrix from. |
prefix |
A character string to prefix the names of the returned list. |
A list representing the sparse matrix, containing:
nw: Count of non-zero elements in mat.
w: Vector of non-zero elements in mat. Equivalent to the numeric
values from mat excluding zeros.
nv: Length of v.
v: Vector of column indices corresponding to each non-zero element in w.
Indicates the column location in mat for each non-zero value.
nu: Length of u.
u: Vector indicating the starting indices in w for non-zero elements
of each row in mat. Helps identify the partition of w into different
rows of mat.
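The layout described above is the standard compressed sparse row (CSR) format consumed by Stan's csr_matrix_times_vector(). The sketch below rebuilds w, v and u in base R for the example matrix used in this entry. It is a hypothetical illustration, not the package's code, and assumes 1-based indices with u holding one entry more than the number of rows.

```r
# Hypothetical base-R sketch of a compressed sparse row (CSR) decomposition
# using the w/v/u naming above; not the package's own implementation.
csr_parts <- function(mat) {
  tm <- t(mat)                     # walking t(mat) column-wise = mat row-wise
  nz <- which(tm != 0)             # linear indices of non-zeros, row by row
  w <- tm[nz]                      # non-zero values
  v <- as.integer((nz - 1) %% ncol(mat) + 1)   # column index in mat
  rows <- (nz - 1) %/% ncol(mat) + 1           # row index in mat
  counts <- tabulate(rows, nbins = nrow(mat))  # non-zeros per row
  u <- as.integer(c(1, cumsum(counts) + 1))    # 1-based start of each row in w
  list(nw = length(w), w = w, nv = length(v), v = v, nu = length(u), u = u)
}

mat <- matrix(1:12, nrow = 4)
mat[2, 2] <- 0
mat[3, 1] <- 0
res <- csr_parts(mat)
res$v  # column index of each non-zero value
res$u  # rows start at positions 1, 4, 6, 8 of w; 11 marks the end
```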
Helper functions for model modules
add_max_observed_delay(),
add_pmfs(),
convolution_matrix(),
enw_dayofweek_structural_reporting(),
enw_reference_by_report(),
enw_reps_with_complete_refs(),
enw_structural_reporting_metadata(),
extract_obs_metadata(),
latest_obs_as_matrix()
mat <- matrix(1:12, nrow = 4)
mat[2, 2] <- 0
mat[3, 1] <- 0
extract_sparse_matrix(mat)
Hospitalisations in Germany by date of report and reference
germany_covid19_hosp
An object of class data.table (inherits from data.frame) with 1536885 rows and 5 columns.
A data.table
Package data sets
enw_example()
This function converts the string representation of the timestep to its corresponding numeric value, or returns the numeric input if it is a whole number. For "day" and "week" it returns 1 and 7, respectively. "month" is not supported and will throw an error.
get_internal_timestep(timestep)
timestep |
The timestep to use. This can be a string ("day", "week") or a numeric whole number representing the number of days. Note that "month" is not currently supported in user-facing functions and will throw an error if used. |
A numeric value representing the number of days for "day" and "week", or the input value if it is a numeric whole number.
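The documented mapping can be sketched as a small base-R function. It is a hypothetical re-implementation for illustration, not the package's get_internal_timestep(), and its error messages are invented.

```r
# Hypothetical re-implementation of the documented behaviour; not the
# package's source code.
timestep_to_days <- function(timestep) {
  if (is.character(timestep) && length(timestep) == 1) {
    # Named strings map to a fixed number of days; anything else is rejected
    days <- switch(timestep, day = 1, week = 7, NA)
    if (is.na(days)) stop("Unsupported timestep: ", timestep, call. = FALSE)
    return(days)
  }
  if (is.numeric(timestep) && length(timestep) == 1 && !is.na(timestep) &&
      timestep == round(timestep)) {
    return(timestep)  # whole numbers are returned unchanged
  }
  stop("timestep must be \"day\", \"week\" or a whole number", call. = FALSE)
}

timestep_to_days("week")
timestep_to_days(14)
```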
Utility functions
coerce_date(),
coerce_dt(),
date_to_numeric_modulus(),
enw_get_data(),
enw_rolling_sum(),
is.Date(),
stan_fns_as_string()
A call to gp() can be used in the formula argument of
model construction functions in the epinowcast package such as
enw_formula(). It declares a Hilbert-space reduced-rank
(spectral) approximate Gaussian process indexed by time (and
optionally a grouping variable by) whose value at each observation
is added to the linear predictor. As with arima(), arguments are
not evaluated; they are passed by name for use in model construction.
Like arima() and rw(), a gp() term works on every module that
takes a formula, each with its own prior prefix:
enw_expectation() — the growth rate (expr) and the latent-to-obs
proportion (expl).
enw_reference() — the parametric delay mean (refp) and the
non-parametric logit hazards (refnp).
enw_report() — report-date logit hazards (rep).
enw_missing() — the missing-reference proportion (miss).
At most one gp() term is currently supported per formula (the
multiple-term example shown for gp_terms() only illustrates term
detection, not a supported model). The default alpha (magnitude)
and length-scale priors are inherited from EpiNow2 and are set on
EpiNow2's scale; on a given module's scale (for example the log
growth rate or a logit hazard) they may need tuning with
enw_replace_priors().
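The Hilbert-space approximation referenced above can be sketched in base R. This is a hypothetical illustration, not the package's Stan code: it uses the squared exponential ("se") kernel rather than the default "matern32", and the way basis_prop and boundary_scale would map to the number of basis functions m and the boundary L is an assumption. The check compares the approximate covariance with the exact kernel.

```r
# Hypothetical sketch of a 1D Hilbert-space approximate GP (Riutort-Mayol
# et al. 2023) with a squared exponential kernel; not epinowcast's Stan code.
hsgp_basis <- function(x, m, L) {
  j <- seq_len(m)
  # Laplacian eigenfunctions on [-L, L] with zero (Dirichlet) boundaries
  phi <- sin(outer(x + L, j) * pi / (2 * L)) / sqrt(L)
  list(phi = phi, sqrt_lambda = j * pi / (2 * L))
}
# Spectral density of the 1D squared exponential kernel
se_spectral_density <- function(w, alpha, rho) {
  alpha^2 * sqrt(2 * pi) * rho * exp(-0.5 * (rho * w)^2)
}

x <- seq(-1, 1, length.out = 21)  # centred, rescaled time index (assumed)
L <- 1.5 * max(abs(x))            # boundary_scale = 1.5 (assumed mapping)
m <- 30                           # number of basis functions (assumed)
alpha <- 1
rho <- 0.2
b <- hsgp_basis(x, m, L)
spd <- se_spectral_density(b$sqrt_lambda, alpha, rho)

# Approximate vs exact covariance
K_approx <- b$phi %*% diag(spd) %*% t(b$phi)
K_exact <- alpha^2 * exp(-0.5 * outer(x, x, "-")^2 / rho^2)
max(abs(K_approx - K_exact))

# One prior draw: f = Phi %*% (sqrt(spd) * beta), with beta ~ N(0, 1)
set.seed(1)
f <- b$phi %*% (sqrt(spd) * rnorm(m))
```

Only the m coefficients beta are sampled, which is why the approximation scales well with the length of the time series.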
gp( time, by, d = 0, kernel = c("matern32", "matern52", "ou", "se", "periodic"), basis_prop = 0.2, boundary_scale = 1.5 )
time |
Defines the time index of the Gaussian process. Must be numeric. |
by |
Optional grouping variable. If supplied, an independent
Gaussian process is fitted for each level of |
d |
Non-negative integer, defaults to |
kernel |
Character string selecting the covariance kernel. One
of |
basis_prop |
Numeric in |
boundary_scale |
Numeric, defaults to |
A list of class enw_gp_term describing the Gaussian
process term, interpretable by construct_gp().
The Stan implementation of the approximate Gaussian process is
adapted from EpiNow2 (https://github.com/epiforecasts/EpiNow2,
MIT licensed). The Hilbert-space approximation follows
Riutort-Mayol et al. (2023), doi:10.1007/s11222-022-10167-2.
Functions used to help convert formulas into model designs
ar(),
arima(),
arima_terms(),
arma(),
as_string_formula(),
construct_arima(),
construct_gp(),
construct_re(),
construct_rw(),
enw_formula(),
enw_manual_formula(),
gp_terms(),
ma(),
parse_formula(),
re(),
remove_arima_terms(),
remove_gp_terms(),
remove_rw_terms(),
rw(),
rw_terms(),
split_formula_to_terms()
gp(time)
gp(time, location)
gp(time, kernel = "se", basis_prop = 0.3)
gp(time, d = 1)
This function extracts Gaussian process terms denoted
using gp() from a formula so that they can be processed on their
own.
gp_terms(formula)
formula |
A model formula that may use standard fixed
effects, random effects using lme4 syntax (see |
A character vector containing the Gaussian process terms identified in the supplied formula.
Functions used to help convert formulas into model designs
ar(),
arima(),
arima_terms(),
arma(),
as_string_formula(),
construct_arima(),
construct_gp(),
construct_re(),
construct_rw(),
enw_formula(),
enw_manual_formula(),
gp(),
ma(),
parse_formula(),
re(),
remove_arima_terms(),
remove_gp_terms(),
remove_rw_terms(),
rw(),
rw_terms(),
split_formula_to_terms()
epinowcast:::gp_terms(~ 1 + age_group + gp(week))
epinowcast:::gp_terms(~ 1 + gp(week, kernel = "se") + gp(day))
Checks that an object is a date
is.Date(x)
x |
An object |
A logical
Utility functions
coerce_date(),
coerce_dt(),
date_to_numeric_modulus(),
enw_get_data(),
enw_rolling_sum(),
get_internal_timestep(),
stan_fns_as_string()
Convert latest observed data to a matrix
latest_obs_as_matrix(latest)
latest |
|
A matrix with each column being a group and each row a reference date
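The reshaping from long format to a reference date by group matrix can be sketched in base R with tapply(). This is a hypothetical illustration, not the package's function; the input columns reference_date, .group and confirm are assumptions.

```r
# Hypothetical toy input; the column names are assumed for illustration
latest <- data.frame(
  reference_date = rep(as.Date("2021-10-01") + 0:2, times = 2),
  .group = rep(1:2, each = 3),
  confirm = c(10, 12, 9, 4, 6, 5)
)
# Rows are reference dates, columns are groups
mat <- with(latest, tapply(confirm, list(reference_date, .group), sum))
mat
```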
Helper functions for model modules
add_max_observed_delay(),
add_pmfs(),
convolution_matrix(),
enw_dayofweek_structural_reporting(),
enw_reference_by_report(),
enw_reps_with_complete_refs(),
enw_structural_reporting_metadata(),
extract_obs_metadata(),
extract_sparse_matrix()
arima()
Thin wrapper around arima() that fixes p = 0 and d = 0.
Equivalent to arima(time, by, p = 0, d = 0, q = q).
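An MA(q) latent series adds a weighted sum of the previous q shocks to the current shock. The base-R sketch below simulates an MA(2) series with made-up coefficients using stats::filter(); it only illustrates the process that a term such as ma(time, q = 2) declares and is not how epinowcast fits it.

```r
# Hypothetical illustration of an MA(2) latent series; epinowcast estimates
# such terms in Stan, this base-R version only shows the structure.
set.seed(42)
n <- 10
theta <- c(0.6, -0.3)                 # MA coefficients, q = 2 (made up)
eps <- rnorm(n + length(theta))       # shocks, including q burn-in values
# x_t = eps_t + theta_1 * eps_{t-1} + theta_2 * eps_{t-2}
x <- stats::filter(eps, c(1, theta), method = "convolution", sides = 1)
x <- as.numeric(x)[-seq_along(theta)] # drop the leading NA values
x
```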
ma(time, by, q = 1)
time |
Time variable for the latent series; numeric. |
by |
Optional grouping variable. Each group draws an independent shock series; AR/MA parameters and the latent standard deviation are shared across groups. |
q |
Moving-average order. Defaults to |
An enw_arima_term interpretable by construct_arima().
Functions used to help convert formulas into model designs
ar(),
arima(),
arima_terms(),
arma(),
as_string_formula(),
construct_arima(),
construct_gp(),
construct_re(),
construct_rw(),
enw_formula(),
enw_manual_formula(),
gp(),
gp_terms(),
parse_formula(),
re(),
remove_arima_terms(),
remove_gp_terms(),
remove_rw_terms(),
rw(),
rw_terms(),
split_formula_to_terms()
ma(time)
ma(time, location, q = 2)
This function uses a series of internal functions
to break an input formula into its component parts, each of which
can then be handled separately. Currently supported components are
fixed effects, lme4-style random effects, random walks using the rw()
helper, ARIMA terms using arima() and its wrappers, and Gaussian
process terms using gp().
parse_formula(formula)
formula |
A model formula that may use standard fixed
effects, random effects using lme4 syntax (see |
A list of formula components. These currently include:
fixed: A character vector of fixed effect terms
random: A list of lme4-style random effects
rw: A character vector of rw() random walk terms.
arima: A character vector of arima() ARIMA(p, d, q) terms.
gp: A character vector of gp() Gaussian process terms.
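The split into these components can be sketched with stats::terms(). This hypothetical helper is not the package's parser: it simply sorts term labels by their leading function name and treats any term containing | as an lme4-style random effect.

```r
# Hypothetical sketch: classify formula terms into the documented
# components; not epinowcast's parse_formula().
classify_terms <- function(formula) {
  labs <- attr(stats::terms(formula), "term.labels")
  is_random <- grepl("|", labs, fixed = TRUE)
  is_rw <- grepl("^rw\\(", labs)
  is_arima <- grepl("^(arima|ar|ma|arma)\\(", labs)
  is_gp <- grepl("^gp\\(", labs)
  list(
    fixed = labs[!(is_random | is_rw | is_arima | is_gp)],
    random = labs[is_random],
    rw = labs[is_rw],
    arima = labs[is_arima],
    gp = labs[is_gp]
  )
}

res <- classify_terms(~ 1 + age_group + (1 | location) + rw(week, location) + gp(day))
str(res)
```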
The random walk functions used internally by this function were
adapted from code written by J Scott (under an MIT license) as part of
the epidemia package (https://github.com/ImperialCollegeLondon/epidemia/).
Functions used to help convert formulas into model designs
ar(),
arima(),
arima_terms(),
arma(),
as_string_formula(),
construct_arima(),
construct_gp(),
construct_re(),
construct_rw(),
enw_formula(),
enw_manual_formula(),
gp(),
gp_terms(),
ma(),
re(),
remove_arima_terms(),
remove_gp_terms(),
remove_rw_terms(),
rw(),
rw_terms(),
split_formula_to_terms()
epinowcast:::parse_formula(~ 1 + age_group + location)
epinowcast:::parse_formula(~ 1 + age_group + (1 | location))
epinowcast:::parse_formula(~ 1 + (age_group | location))
epinowcast:::parse_formula(~ 1 + (1 | location) + rw(week, location))
plot method for preprocessed data of class
"enw_preprocess_data". Creates descriptive plots of the
empirical reporting delay distribution and notification
time series.
## S3 method for class 'enw_preprocess_data'
plot(
  x,
  type = c("obs", "delay_cumulative", "delay_fraction", "delay_quantiles", "delay_counts"),
  delay_group_thresh = NULL,
  quantiles = c(0.1, 0.5, 0.9),
  log = FALSE,
  facet = TRUE,
  ...
)
x |
A preprocessed data object as produced by
|
type |
Character string indicating the plot type;
enforced by
|
delay_group_thresh |
A numeric vector of left-closed
interval thresholds for delay grouping (use |
quantiles |
A numeric vector of probabilities for the
|
log |
Logical, defaults to |
facet |
Logical. When |
... |
Additional arguments passed to the underlying plot function. |
A ggplot2 object.
Other epinowcast:
epinowcast(),
plot.epinowcast(),
print.enw_preprocess_data(),
print.epinowcast(),
print.summary.enw_preprocess_data(),
summary.enw_preprocess_data(),
summary.epinowcast()
Plotting functions
enw_delay_categories(),
enw_delay_quantiles(),
enw_plot_delay_counts(),
enw_plot_delay_cumulative(),
enw_plot_delay_fraction(),
enw_plot_delay_quantiles(),
enw_plot_nowcast_quantiles(),
enw_plot_obs(),
enw_plot_pp_quantiles(),
enw_plot_quantiles(),
enw_plot_theme(),
plot.epinowcast()
pobs <- enw_example("preprocessed_observations")
# Latest observations
plot(pobs, type = "obs")
# Cumulative reporting delay
plot(pobs, type = "delay_cumulative")
# Reporting delay heatmap
plot(pobs, type = "delay_fraction")
# Reporting delay quantiles
plot(pobs, type = "delay_quantiles")
# Notifications by delay group
plot(pobs, type = "delay_counts")
plot method for class "epinowcast".
## S3 method for class 'epinowcast'
plot(
  x,
  latest_obs = NULL,
  type = c("nowcast", "posterior_prediction"),
  log = FALSE,
  ...
)
x |
A |
latest_obs |
A |
type |
Character string indicating the plot required; enforced by
|
log |
Logical, defaults to |
... |
Additional arguments to the plot function specified by |
A ggplot2 object.
Other epinowcast:
epinowcast(),
plot.enw_preprocess_data(),
print.enw_preprocess_data(),
print.epinowcast(),
print.summary.enw_preprocess_data(),
summary.enw_preprocess_data(),
summary.epinowcast()
Plotting functions
enw_delay_categories(),
enw_delay_quantiles(),
enw_plot_delay_counts(),
enw_plot_delay_cumulative(),
enw_plot_delay_fraction(),
enw_plot_delay_quantiles(),
enw_plot_nowcast_quantiles(),
enw_plot_obs(),
enw_plot_pp_quantiles(),
enw_plot_quantiles(),
enw_plot_theme(),
plot.enw_preprocess_data()
nowcast <- enw_example("nowcast")
latest_obs <- enw_example("obs")
# Plot nowcast
plot(nowcast, latest_obs = latest_obs, type = "nowcast")
# Plot posterior predictions by reference date
plot(nowcast, type = "posterior_prediction") +
  ggplot2::facet_wrap(ggplot2::vars(reference_date), scales = "free")
print method for class
"enw_preprocess_data".
## S3 method for class 'enw_preprocess_data'
print(x, ...)
x |
A |
... |
Additional arguments (not used). |
Invisibly returns x.
Other epinowcast:
epinowcast(),
plot.enw_preprocess_data(),
plot.epinowcast(),
print.epinowcast(),
print.summary.enw_preprocess_data(),
summary.enw_preprocess_data(),
summary.epinowcast()
pobs <- enw_example("preprocessed_observations")
pobs
print method for class "epinowcast".
## S3 method for class 'epinowcast'
print(x, ...)
x |
A |
... |
Additional arguments (not used). |
Invisibly returns x.
Other epinowcast:
epinowcast(),
plot.enw_preprocess_data(),
plot.epinowcast(),
print.enw_preprocess_data(),
print.summary.enw_preprocess_data(),
summary.enw_preprocess_data(),
summary.epinowcast()
nowcast <- enw_example("nowcast")
nowcast
print method for the output of
summary.enw_preprocess_data().
## S3 method for class 'summary.enw_preprocess_data'
print(x, ...)
x |
A |
... |
Additional arguments (not used). |
Invisibly returns x.
Other epinowcast:
epinowcast(),
plot.enw_preprocess_data(),
plot.epinowcast(),
print.enw_preprocess_data(),
print.epinowcast(),
summary.enw_preprocess_data(),
summary.epinowcast()
Defines random effect terms using the lme4 syntax
re(formula)
formula |
A random effect as returned by findbars when a random effect is defined using the lme4 syntax in formula. Currently only simplified random effects (i.e. LHS | RHS) are supported. |
A list defining the fixed and random effects of the specified random effect
Functions used to help convert formulas into model designs
ar(),
arima(),
arima_terms(),
arma(),
as_string_formula(),
construct_arima(),
construct_gp(),
construct_re(),
construct_rw(),
enw_formula(),
enw_manual_formula(),
gp(),
gp_terms(),
ma(),
parse_formula(),
remove_arima_terms(),
remove_gp_terms(),
remove_rw_terms(),
rw(),
rw_terms(),
split_formula_to_terms()
form <- epinowcast:::parse_formula(~ 1 + (1 | age_group))
re(form$random[[1]])
form <- epinowcast:::parse_formula(~ 1 + (location | age_group))
re(form$random[[1]])
This function removes ARIMA terms — arima(), ar(),
ma(), and arma() — from a formula so they can be processed on
their own.
remove_arima_terms(formula)
formula |
A model formula that may use standard fixed
effects, random effects using lme4 syntax (see |
A formula object with the ARIMA terms removed.
Functions used to help convert formulas into model designs
ar(),
arima(),
arima_terms(),
arma(),
as_string_formula(),
construct_arima(),
construct_gp(),
construct_re(),
construct_rw(),
enw_formula(),
enw_manual_formula(),
gp(),
gp_terms(),
ma(),
parse_formula(),
re(),
remove_gp_terms(),
remove_rw_terms(),
rw(),
rw_terms(),
split_formula_to_terms()
epinowcast:::remove_arima_terms(~ 1 + age_group + arima(week))
epinowcast:::remove_arima_terms(~ 1 + age_group + ar(week, p = 2))
This function removes Gaussian process terms denoted
using gp() from a formula so they can be processed on their own.
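Removing special terms and rebuilding the formula can be sketched with stats::terms() and stats::reformulate(). This is a hypothetical helper, not remove_gp_terms() itself. Because `~` binds more loosely than `|` in R, lme4-style bar terms are re-wrapped in parentheses so that, once joined with `+`, a term such as `1 | location` is not absorbed into the preceding terms.

```r
# Hypothetical sketch of dropping terms that start with a given function
# name and rebuilding the formula; not epinowcast's implementation.
drop_special_terms <- function(formula, fn = "gp") {
  tt <- stats::terms(formula)
  labs <- attr(tt, "term.labels")
  keep <- labs[!startsWith(labs, paste0(fn, "("))]
  has_intercept <- attr(tt, "intercept") == 1
  if (length(keep) == 0) {
    return(if (has_intercept) ~1 else ~0)
  }
  # Re-wrap lme4-style bar terms so `a + 1 | g` is not parsed as `(a + 1) | g`
  keep <- ifelse(grepl("|", keep, fixed = TRUE), paste0("(", keep, ")"), keep)
  stats::reformulate(keep, intercept = has_intercept)
}

drop_special_terms(~ 1 + age_group + gp(week))
drop_special_terms(~ 1 + age_group + (1 | location) + gp(week))
```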
remove_gp_terms(formula)
formula |
A model formula that may use standard fixed
effects, random effects using lme4 syntax (see |
A formula object with the Gaussian process terms removed.
Functions used to help convert formulas into model designs
ar(),
arima(),
arima_terms(),
arma(),
as_string_formula(),
construct_arima(),
construct_gp(),
construct_re(),
construct_rw(),
enw_formula(),
enw_manual_formula(),
gp(),
gp_terms(),
ma(),
parse_formula(),
re(),
remove_arima_terms(),
remove_rw_terms(),
rw(),
rw_terms(),
split_formula_to_terms()
epinowcast:::remove_gp_terms(~ 1 + age_group + gp(week))
Remove profiling statements from a character vector representing Stan code
remove_profiling(s)
s |
Character vector representing stan code |
A character vector of the stan code without profiling statements
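Stan profiling statements take the form profile("label") { ... }. The sketch below shows one way to strip them while keeping the wrapped statements, using a recursive PCRE pattern in base R. It is a hypothetical illustration, not the package's remove_profiling(), and it assumes the Stan program is passed as a single string so that braces spanning several lines can be matched.

```r
# Hypothetical sketch: unwrap profile("...") { ... } blocks, keeping their
# contents; not the package's implementation.
strip_profile_blocks <- function(s) {
  # Group 1 matches brace-balanced content via the recursive (?1) call
  pattern <- "\\bprofile\\s*\\(\"[^\"]*\"\\)\\s*\\{((?:[^{}]++|\\{(?1)\\})*+)\\}"
  # Repeat so that nested profile blocks are also unwrapped
  while (grepl(pattern, s, perl = TRUE)) {
    s <- gsub(pattern, "\\1", s, perl = TRUE)
  }
  s
}

code <- 'model { profile("lik") { y ~ normal(mu, 1); } }'
strip_profile_blocks(code)
nested <- 'profile("a") { x = 1; profile("b") { y = 2; } }'
strip_profile_blocks(nested)
```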
Functions used to help convert models into the format required for stan
enw_formula_as_data_list(),
enw_get_cache(),
enw_model(),
enw_pathfinder(),
enw_priors_as_data_list(),
enw_replace_priors(),
enw_sample(),
enw_set_cache(),
enw_stan_to_r(),
enw_unset_cache(),
write_stan_files_no_profile()
This function removes random walk terms
denoted using rw() from a formula so that they can be
processed on their own.
remove_rw_terms(formula)
formula |
A model formula that may use standard fixed
effects, random effects using lme4 syntax (see |
A formula object with the random walk terms removed.
This function was adapted from code written
by J Scott (under an MIT license) as part of
the epidemia package (https://github.com/ImperialCollegeLondon/epidemia/).
Functions used to help convert formulas into model designs
ar(),
arima(),
arima_terms(),
arma(),
as_string_formula(),
construct_arima(),
construct_gp(),
construct_re(),
construct_rw(),
enw_formula(),
enw_manual_formula(),
gp(),
gp_terms(),
ma(),
parse_formula(),
re(),
remove_arima_terms(),
remove_gp_terms(),
rw(),
rw_terms(),
split_formula_to_terms()
epinowcast:::remove_rw_terms(~ 1 + age_group + location)
epinowcast:::remove_rw_terms(~ 1 + age_group + location + rw(week, location))
A call to rw() can be used in the formula argument of model
construction functions in the epinowcast package such as
enw_formula(). Mathematically, a Gaussian random walk is exactly
an ARIMA(0, 1, 0) process, so rw(time, by) is a thin wrapper
over arima() with p = 0, d = 1, q = 0. It is kept
as a user-facing convenience because random walks are the most
common time-series structure in epinowcast formulas.
rw(time, by)
time |
Defines the random walk time period. |
by |
Defines the grouping parameter used for the random walk. If not specified no grouping is used. Currently this is limited to a single variable. Each group draws an independent shock series; the latent standard deviation is shared across groups (per-group standard deviations are a planned extension). |
Does not evaluate arguments but instead simply passes information for use in model construction.
A list of class enw_arima_term (with p = 0, d = 1,
q = 0) that can be interpreted by construct_arima().
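Because rw() is described as an ARIMA(0, 1, 0) process, a draw from it is the cumulative sum of independent normal shocks, and first differencing recovers those shocks. The base-R sketch below illustrates this with a made-up standard deviation, including independent walks for several groups that share it, as described for the by argument. It is not the package's Stan implementation.

```r
# Illustration only: a Gaussian random walk is the cumulative sum of
# i.i.d. normal shocks, i.e. an ARIMA(0, 1, 0) process.
set.seed(1)
sigma <- 0.1                          # shared latent standard deviation (made up)
shocks <- rnorm(20, mean = 0, sd = sigma)
walk <- cumsum(shocks)                # x_t = x_{t-1} + e_t, with x_0 = 0
# First differencing (d = 1) recovers the white-noise shocks
all.equal(diff(walk), shocks[-1])
# Independent walks per group: separate shocks, same sigma
groups <- 3
walks <- apply(matrix(rnorm(20 * groups, sd = sigma), ncol = groups), 2, cumsum)
dim(walks)
```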
Functions used to help convert formulas into model designs
ar(),
arima(),
arima_terms(),
arma(),
as_string_formula(),
construct_arima(),
construct_gp(),
construct_re(),
construct_rw(),
enw_formula(),
enw_manual_formula(),
gp(),
gp_terms(),
ma(),
parse_formula(),
re(),
remove_arima_terms(),
remove_gp_terms(),
remove_rw_terms(),
rw_terms(),
split_formula_to_terms()
rw(time)
rw(time, location)
This function extracts random walk terms
denoted using rw() from a formula so that they can be
processed on their own.
rw_terms(formula)
formula |
A model formula that may use standard fixed
effects, random effects using lme4 syntax (see |
A character vector containing the random walk terms that have been identified in the supplied formula.
This function was adapted from code written
by J Scott (under an MIT license) as part of
the epidemia package (https://github.com/ImperialCollegeLondon/epidemia/).
Functions used to help convert formulas into model designs
ar(),
arima(),
arima_terms(),
arma(),
as_string_formula(),
construct_arima(),
construct_gp(),
construct_re(),
construct_rw(),
enw_formula(),
enw_manual_formula(),
gp(),
gp_terms(),
ma(),
parse_formula(),
re(),
remove_arima_terms(),
remove_gp_terms(),
remove_rw_terms(),
rw(),
split_formula_to_terms()
epinowcast:::rw_terms(~ 1 + age_group + location)
epinowcast:::rw_terms(~ 1 + age_group + location + rw(week, location))
Split formula into individual terms
split_formula_to_terms(formula)
formula |
A model formula that may use standard fixed
effects, random effects using lme4 syntax (see |
A character vector of formula terms
Functions used to help convert formulas into model designs
ar(),
arima(),
arima_terms(),
arma(),
as_string_formula(),
construct_arima(),
construct_gp(),
construct_re(),
construct_rw(),
enw_formula(),
enw_manual_formula(),
gp(),
gp_terms(),
ma(),
parse_formula(),
re(),
remove_arima_terms(),
remove_gp_terms(),
remove_rw_terms(),
rw(),
rw_terms()
epinowcast:::split_formula_to_terms(~ 1 + age_group + location)
Read in a stan function file as a character string
stan_fns_as_string(files, include)
files |
A character vector specifying the names of Stan files to be
exposed. These must be in the |
include |
A character string specifying the directory containing Stan
files. Defaults to the 'stan/functions' directory of the |
A character string containing the Stan functions.
Utility functions
coerce_date(),
coerce_dt(),
date_to_numeric_modulus(),
enw_get_data(),
enw_rolling_sum(),
get_internal_timestep(),
is.Date()
Subset observations data table for either modelled dates or not-modelled earlier dates.
subset_obs(ord_obs, max_delay, internal_timestep, reference_subset)
ord_obs |
The observations |
max_delay |
Whole number representing the maximum delay in units of the timestep. |
internal_timestep |
A numeric value representing the number of days in the timestep, e.g. 7 when the timesteps are weeks. |
reference_subset |
String giving a relational operator
to subset ord_obs by reference date; e.g. |
A data.frame subset for the desired observations
Functions used for postprocessing of model fits
build_ord_obs(),
enw_add_latest_obs_to_nowcast(),
enw_nowcast_samples(),
enw_nowcast_summary(),
enw_posterior(),
enw_pp_summary(),
enw_quantiles_to_long(),
enw_summarise_samples()
summary method for class
"enw_preprocess_data". Returns a structured overview of
the preprocessed data including a preview of the latest
observations and a corner of the reporting triangle.
## S3 method for class 'enw_preprocess_data'
summary(object, n = 6, ...)
object |
A |
n |
Integer number of rows to show in previews. Defaults to 6. |
... |
Additional arguments (not used). |
A list of class "summary.enw_preprocess_data"
containing the preprocessed data object and preview
parameters, printed via
print.summary.enw_preprocess_data().
Other epinowcast:
epinowcast(),
plot.enw_preprocess_data(),
plot.epinowcast(),
print.enw_preprocess_data(),
print.epinowcast(),
print.summary.enw_preprocess_data(),
summary.epinowcast()
pobs <- enw_example("preprocessed_observations")
summary(pobs)
summary method for class "epinowcast".
## S3 method for class 'epinowcast'
summary(
  object,
  type = c("nowcast", "nowcast_samples", "fit", "posterior_prediction"),
  max_delay = object$max_delay,
  ...
)
object |
A |
type |
Character string indicating the summary to return; enforced by
|
max_delay |
Maximum delay to which nowcasts should be summarised, in units of the timestep used during preprocessing. Must be equal to (the default) or larger than the modelled maximum delay. If it is larger, nowcasts for unmodelled dates are added by assuming that case counts beyond the modelled maximum delay are fully observed. |
... |
Additional arguments passed to summary specified by |
A summary data.frame
Other epinowcast:
epinowcast(),
plot.enw_preprocess_data(),
plot.epinowcast(),
print.enw_preprocess_data(),
print.epinowcast(),
print.summary.enw_preprocess_data(),
summary.enw_preprocess_data()
nowcast <- enw_example("nowcast")
# Summarise nowcast posterior
summary(nowcast, type = "nowcast")
# Nowcast posterior samples
summary(nowcast, type = "nowcast_samples")
# Nowcast model fit
summary(nowcast, type = "fit")
# Posterior predictions
summary(nowcast, type = "posterior_prediction")
Write copies of the .stan files of a Stan model and its #include files with all profiling statements removed.
write_stan_files_no_profile(
  stan_file,
  include_paths = NULL,
  target_dir = epinowcast::enw_get_cache()
)
stan_file |
The path to a .stan file containing a Stan program. |
include_paths |
Paths to directories where Stan should look for files specified in #include directives in the Stan program. |
target_dir |
The path to a directory in which the manipulated .stan
files without profiling statements should be stored. To avoid overwriting
the original .stan files, this should be different from the directory of the
original model and the |
A list containing the path to the .stan file without profiling
statements and the include_paths for the included .stan files without
profiling statements
Functions used to help convert models into the format required for stan
enw_formula_as_data_list(),
enw_get_cache(),
enw_model(),
enw_pathfinder(),
enw_priors_as_data_list(),
enw_replace_priors(),
enw_sample(),
enw_set_cache(),
enw_stan_to_r(),
enw_unset_cache(),
remove_profiling()