This vignette covers the maths behind epinowcast’s
Gaussian process latent terms, how to use them in any module’s formula,
and how to set their priors. It explains the Hilbert-space spectral
approximation, the available kernels, and the regularisation choices,
with the trade-offs made explicit. For worked usage and a comparison to
rw(), arima(), and lme4-style random effects,
see the latent processes vignette.
For the full model definition see the model
vignette.
The implementation was ported from EpiNow2
(epiforecasts/EpiNow2, MIT licensed), where the same
Hilbert-space approximate Gaussian process is used as the default latent
process for the reproduction number. The Stan functions in
inst/stan/functions/gaussian_process.stan carry an
attribution comment to that effect. The maths below follows the
EpiNow2 implementation notes and the underlying method of
Riutort-Mayol et al.[1].
A gp(time, by, kernel, basis_prop) term in a formula
declares a smooth latent function of time that is added to the relevant
linear predictor at every observation. Writing the linear predictor of a
module as \(\eta_i\) for observation
\(i\) at time \(t_i\) and group \(g_i\):
\[\eta_i = X_i \beta + Z_i b + f_{g_i}(t_i),\]
where \(X_i \beta\) are the fixed
effects, \(Z_i b\) the random effects
(including any rw() contributions, which are handled
through the design matrix), and \(f_{g_i}(t_i)\) is the value of the Gaussian
process for group \(g_i\) at time \(t_i\). The process evolves over the time
axis declared in gp(...); each observation just looks up
the value at its own (time, group) index.
The one-dimensional Gaussian processes used here have a zero mean function and a stationary covariance kernel \(k\),
\[f \sim \mathrm{GP}(0,\, k(\Delta t)), \qquad k(t, t') = k(|t - t'|) = k(\Delta t),\]
so the prior correlation between two time points depends only on
their separation \(\Delta t\). Where
the arima() term builds memory step by step from a
recursion, the Gaussian process places a smooth prior over the whole
trajectory at once, with a length scale \(\rho\) that controls how quickly the
function can change and a magnitude \(\alpha\) that controls how far it can
depart from zero.
Add a term to any module’s formula with gp():
gp(time) — a Gaussian process over time
with the default Matérn 3/2 kernel.gp(time, by) — an independent realisation per level of
by, sharing the length scale and magnitude.gp(time, d = 1) — an integrated (non-stationary)
process (see Differencing order).gp(time, kernel = "se") — a different kernel (see Kernels).gp(time, basis_prop = 0.3) — more basis functions,
trading speed for approximation accuracy.time is the column the process evolves over, and the
optional by gives each group its own realisation under
shared hyperparameters. The latent
processes vignette works through gp() on the growth
rate and compares it to rw(), arima(), and
lme4-style random effects; the Priors section
below covers tuning their regularisation.
A Gaussian process with a dense covariance matrix costs \(O(T^3)\) to evaluate, which is prohibitive inside a sampler. We instead use the Hilbert-space reduced-rank (spectral) approximation of Riutort-Mayol et al.[1], which represents the process as a finite weighted sum of fixed basis functions,
\[f(t) \approx \sum_{j=1}^{m} \left( S_k(\sqrt{\lambda_j}) \right)^{\frac{1}{2}} \phi_j(t)\, \beta_j,\]
with \(m\) the number of basis functions. This turns the Gaussian process into a linear model in \(m\) standard-normal weights \(\beta_j\), so the cost is \(O(T m)\) rather than \(O(T^3)\).
The number of basis functions is set as a proportion \(b\) of the number of time points \(t_\mathrm{GP}\) to which the process is applied, rounded up[1],
\[m = \lceil b\, t_\mathrm{GP} \rceil.\]
The eigenvalues of the Laplace operator on the boundary-scaled interval are
\[\lambda_j = \left( \frac{j \pi}{2 L} \right)^2,\]
with \(L\) a positive boundary factor, and the eigenfunctions are
\[\phi_j(t) = \frac{1}{\sqrt{L}} \sin\!\left( \sqrt{\lambda_j}\, (t^* + L) \right),\]
with time rescaled linearly to lie in \([-1, 1]\),
\[t^* = \frac{t - \tfrac{1}{2} t_\mathrm{GP}}{\tfrac{1}{2} t_\mathrm{GP}}.\]
The weights have a standard-normal prior \(\beta_j \sim \mathcal{N}(0, 1)\), so the approximation is non-centred: all of the kernel’s structure enters through the spectral densities \(S_k(\sqrt{\lambda_j})\), which scale each basis function by how much power the kernel places at that frequency.
The basis matrix \(\Phi \in \mathbb{R}^{T
\times m}\) (with entries \(\phi_j(t)\)) is data-only: it does not
depend on \(\rho\) or \(\alpha\), so it is built once on the R side
in construct_gp() and passed to Stan as data. The spectral
densities are parameter-dependent and rebuilt every gradient evaluation.
Putting the pieces together, the latent process for one group is
S <- diagSPD(alpha, rho, L, M) # length-M (or 2M) spectral density
f <- PHI * (S .* eta) # length-T latent, eta ~ Normal(0, 1)
The Stan implementation of these steps lives in
inst/stan/functions/gaussian_process.stan, which was
adapted from EpiNow2.
The kernel choice enters only through the spectral density \(S_k\). For a Matérn kernel of order \(\nu\),
\[S_k(\omega) = \alpha^2\, \frac{2 \sqrt{\pi}\, \Gamma(\nu + 1/2)\, (2\nu)^{\nu}}{\Gamma(\nu)\, \rho^{2\nu}} \left( \frac{2\nu}{\rho^2} + \omega^2 \right)^{-\left(\nu + \frac{1}{2}\right)}.\]
The kernel argument selects among the following special
cases.
Matérn 3/2 (kernel = "matern32",
the default), with covariance \(k(\Delta t) =
\alpha^2 \left( 1 + \sqrt{3}\,\Delta t / \rho \right)
\exp(-\sqrt{3}\,\Delta t / \rho)\) and spectral density
\[S_k(\omega) = \left( \frac{2 \alpha\, (\sqrt{3}/\rho)^{3/2}}{(\sqrt{3}/\rho)^2 + \omega^2} \right)^2.\]
Matérn 5/2 (kernel = "matern52"),
with
\[S_k(\omega) = \alpha^2\, \frac{16\, (\sqrt{5}/\rho)^5}{3 \left( (\sqrt{5}/\rho)^2 + \omega^2 \right)^3}.\]
Ornstein-Uhlenbeck (kernel = "ou"),
the Matérn 1/2 kernel, with \(k(\Delta t) =
\alpha^2 \exp(-\Delta t / \rho)\) and
\[S_k(\omega) = \alpha^2\, \frac{2}{\rho\, (1/\rho^2 + \omega^2)}.\]
Squared exponential
(kernel = "se"), with \(k(\Delta
t) = \alpha^2 \exp\!\left( -\tfrac{1}{2}\, \Delta t^2 / \rho^2
\right)\) and
\[S_k(\omega) = \alpha^2 \sqrt{2\pi}\, \rho\, \exp\!\left( -\tfrac{1}{2}\, \rho^2 \omega^2 \right).\]
Periodic (kernel = "periodic"),
which repeats on a fixed cycle rather than decaying with separation. It
uses a fundamental-frequency basis (cosine/sine pairs) rather than the
boundary-scaled basis above, so boundary_scale has no
effect for this kernel.
The smoother kernels (squared exponential, higher Matérn order) place less power at high frequencies, so their spectral densities decay faster and the approximation needs fewer basis functions for a given accuracy.
When by is supplied each group has its own column of
standard-normal spectral weights \(\eta_{\cdot, g}\), but the length scale
\(\rho\) and magnitude \(\alpha\) are shared across groups. The
basis matrix \(\Phi\) is built once and
applied to each group’s weight column in turn, so every group’s
realisation is an independent draw from a Gaussian process with common
hyperparameters. The length scale \(\rho\) and magnitude \(\alpha\) are shared across groups, and
gp() does not take a type argument.
The latent process produces \(f \in
\mathbb{R}^{T \times G}\), but observations live in a flat
vector. Each observation carries its own time and group index, so the
contribution to the linear predictor is a single matrix lookup per
observation — negligible next to building the spectral densities. On
modules with a sparse design (the parametric and non-parametric
reference, the report-time hazards, and the missing-reference
proportion) the lookup index is built from the same joint (covariate row
\(\times\) time \(\times\) group) deduplication used for
arima(), so the gather stays aligned with the deduplicated
design rows.
The d argument integrates the realisation \(d\) times before it enters the predictor,
matching the d of arima(). It controls whether the
Gaussian process models the level of the latent quantity or its
change.
d = 0 (the default) is a stationary
process: \(f\) is the realisation
described above, fluctuating around zero with no preferred direction. On
the growth rate this corresponds to EpiNow2’s gp_on = "R0",
where the process models stationary deviations from a fixed reproduction
number.
d = 1 integrates the process once,
so it is the increments of the trajectory that are smooth
rather than the trajectory itself. The result drifts like a random walk
but with correlated, smoothly varying steps, which is often the more
natural prior for an epidemic trend. This matches EpiNow2’s default
gp_on = "R_t-1", where the Gaussian process is placed on
the first difference of the log reproduction number.
d \ge 2 integrates further,
generalising beyond EpiNow2 to smoother non-stationary trends (for
example \(d = 2\) models curvature with
a smoothly varying slope).
Concretely, for \(d \ge 1\) the spectral approximation generates the \(T - d\) free values \(\tilde f\), which are placed in positions \((d+1), \dots, T\) of a length-\(T\) vector whose first \(d\) entries are zero, and the vector is then cumulatively summed \(d\) times:
\[f = \underbrace{C \cdots C}_{d}\, \begin{pmatrix} \mathbf{0}_d \\ \tilde f \end{pmatrix}, \qquad C \text{ the cumulative-sum operator}.\]
Because the leading \(d\) entries
start at zero they remain zero through every integration pass, so \(f_1 = \dots = f_d = 0\). This anchoring is
the identifiability fix: it leaves the free level (for \(d = 1\)) and additionally the free slope
(for \(d \ge 2\)) to the module’s fixed
effects rather than the Gaussian process, which would otherwise be
confounded with the intercept. This is the same convention EpiNow2 uses
for its non-stationary reproduction-number process
(gp[2:(gp_n + 1)] = noise; gp = cumulative_sum(gp)),
generalised to arbitrary \(d\).
Differencing is meaningful for the latent expectation modules (the
growth rate expr and the latent-to-obs proportion
expl), where a drifting non-stationary trend is the usual
model. It is permitted on the logit-hazard modules (refnp,
rep, miss) for API consistency with
arima(), but integrating a logit hazard is unusual and
rarely the intended model.
The default priors follow EpiNow2[1]. The
length scale \(\rho\) and magnitude
\(\alpha\) can both be overridden
through the standard priors argument of the relevant
module; the spectral weights are fixed.
<prefix>_gp_alpha
prior.<prefix>_gp_rho prior.Here <prefix> is the module the term sits on
(expr for the growth rate, refp for the
parametric reference mean, and so on; see Where you can
use it). The EpiNow2 defaults were chosen on the scale
of the log reproduction number; on a given module’s scale (for example a
logit hazard) they may need tightening or loosening. For example, to put
a tighter length scale on the growth-rate process:
library(epinowcast)
rho_prior <- data.frame(variable = "expr_gp_rho", mean = log(3), sd = 0.25)
# epinowcast(..., priors = rho_prior)The magnitude prior on \(\alpha\) is the most consequential in practice; tighten it if the latent process is competing with the rest of the linear predictor for variance. The length-scale prior on \(\rho\) is the lever for how wiggly the function is allowed to be: a smaller \(\rho\) permits faster changes, a larger \(\rho\) forces a smoother trajectory.
gp() works on every module that takes a formula, each
with its own prior prefix:
enw_expectation() — the growth rate (expr)
and the latent-to-obs proportion (expl).enw_reference() — the parametric mean
(refp) and the non-parametric logit hazards
(refnp).enw_report() — report-date logit hazards
(rep).enw_missing() — the missing-reference proportion
(miss).Putting a gp() term on the reference delay mean, for
instance, models a reporting delay that drifts smoothly over time, while
a gp() on the growth rate models the epidemic trend
itself.
gp() does not take a type argument.basis_prop;
too few basis functions oversmooth short-length-scale structure, while
too many slow the fit. The EpiNow2 default of \(b = 0.2\) is a reasonable starting point
but may need raising for processes with fast variation.model_refp > 1), it shares the Gaussian process basis,
length scale, and spectral weights of the mean but carries an
independent magnitude (refp_gp_sd_alpha). The two latent
processes are therefore perfectly correlated up to that magnitude rather
than driven by separate weights.