| Title: | Social Mixing Matrices for Infectious Disease Modelling |
|---|---|
| Description: | Methods for sampling contact matrices from diary data for use in infectious disease modelling, as discussed in Mossong et al. (2008) <doi:10.1371/journal.pmed.0050074>. |
| Authors: | Sebastian Funk [aut, cre], Lander Willem [aut], Hugo Gruson [aut], Nicholas Tierney [aut] (ORCID: <https://orcid.org/0000-0003-1460-8722>), Maria Bekker-Nielsen Dunbar [ctb], Carl A. B. Pearson [ctb], Sam Clifford [ctb], Christopher Jarvis [ctb], Alexis Robert [ctb], Niel Hens [ctb], Pietro Coletti [col, dtm], Lloyd Chapman [ctb] |
| Maintainer: | Sebastian Funk <[email protected]> |
| License: | MIT + file LICENSE |
| Version: | 0.6.0.9000 |
| Built: | 2026-05-13 19:54:18 UTC |
| Source: | https://github.com/epiforecasts/socialmixr |
Filters a contact_survey object using an expression. The expression is
evaluated against whichever table(s) contain the referenced columns
(participants, contacts, or both). When participants are filtered, contacts
are automatically pruned to matching part_ids.
## S3 method for class 'contact_survey'
x[i, ...]
| Argument | Description |
|---|---|
| x | a contact_survey object |
| i | an expression to evaluate as a row filter (e.g. country == "United Kingdom") |
| ... | ignored |
a filtered contact_survey object
data(polymod)
polymod[country == "United Kingdom"]
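The pruning rule described above (filter participants, then keep only contacts whose part_id survives) can be illustrated with plain data frames. This is a minimal base-R sketch, not the package's implementation: the toy tables and the helper filter_and_prune() are hypothetical, and only filtering on a participant column is covered.

```r
# Hypothetical sketch: filter participants, then prune contacts to surviving part_ids.
participants <- data.frame(
  part_id = 1:4,
  country = c("United Kingdom", "Belgium", "United Kingdom", "Italy")
)
contacts <- data.frame(
  part_id = c(1, 1, 2, 3, 4, 4),
  cnt_age = c(5, 40, 30, 70, 12, 33)
)

filter_and_prune <- function(participants, contacts, keep) {
  # 'keep' is a logical vector over participant rows
  participants <- participants[keep, , drop = FALSE]
  # drop contacts reported by participants that were filtered out
  contacts <- contacts[contacts$part_id %in% participants$part_id, , drop = FALSE]
  list(participants = participants, contacts = contacts)
}

uk <- filter_and_prune(participants, contacts,
                       participants$country == "United Kingdom")
```

Filtering on a contact column would work the other way round (subset contacts only), which is why the description distinguishes which table a referenced column lives in.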
Inverse of limits_to_agegroups(). Extracts lower age limits from age group
labels.
agegroups_to_limits(x)
| Argument | Description |
|---|---|
| x | age groups (a factor, as produced by limits_to_agegroups()) |
a numeric vector of lower age limits
agegroups_to_limits(limits_to_agegroups(c(0, 5, 10), notation = "brackets"))
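One way to see what "extracting lower limits from labels" means is to take the first number in each label. The sketch below is illustrative only; the helper labels_to_lower_limits() is hypothetical, and it assumes labels in the dash ("0-4") or bracket ("[0,5)") notation, with the open-ended last group written as "10+".

```r
# Hypothetical sketch: the lower limit is the first number appearing in each label.
labels_to_lower_limits <- function(x) {
  labels <- as.character(x)  # accepts factors as well as character vectors
  as.numeric(sub("^\\[?([0-9.]+).*$", "\\1", labels))
}

from_brackets <- labels_to_lower_limits(factor(c("[0,5)", "[5,10)", "10+")))
from_dashes <- labels_to_lower_limits(c("0-4", "5-9", "10+"))
```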
Checks that a survey fulfills all the requirements to work with the 'contact_matrix' function
as_contact_survey(x, id_column = "part_id", country_column = NULL, year_column = NULL, ..., id.column = deprecated(), country.column = deprecated(), year.column = deprecated())
| Argument | Description |
|---|---|
| x | list containing the participants and contacts tables |
| id_column | the column in both the participants and contacts tables that links contacts to participants |
| country_column | the column in the participants table that indicates the country |
| year_column | the column in the participants table that indicates the year |
| ... | additional arguments (currently ignored) |
| id.column, country.column, year.column | deprecated; use id_column, country_column and year_column instead |
invisibly returns a character vector of the relevant columns
data(polymod)
as_contact_survey(polymod)
This function processes age data in a survey object. It imputes ages from ranges, handles missing values, and assigns age groups.
assign_age_groups(survey, age_limits = NULL, estimated_participant_age = c("mean", "sample", "missing"), estimated_contact_age = c("mean", "sample", "missing"), missing_participant_age = c("remove", "keep"), missing_contact_age = c("remove", "sample", "keep", "ignore"))
| Argument | Description |
|---|---|
| survey | a contact_survey object |
| age_limits | lower limits of the age groups over which to construct the matrix. Defaults to NULL. If NULL, age limits are inferred from participant and contact ages. |
| estimated_participant_age | if set to "mean" (default), people whose ages are given as a range (in columns named "..._est_min" and "..._est_max") but not exactly (in a column named "..._exact") will have their age set to the mid-point of the range; if set to "sample", the age will be sampled from the range; if set to "missing", age ranges will be treated as missing |
| estimated_contact_age | if set to "mean" (default), contacts whose ages are given as a range (in columns named "..._est_min" and "..._est_max") but not exactly (in a column named "..._exact") will have their age set to the mid-point of the range; if set to "sample", the age will be sampled from the range; if set to "missing", age ranges will be treated as missing |
| missing_participant_age | if set to "remove" (default), participants without age information are removed; if set to "keep", participants with missing age are kept and treated as a separate age group |
| missing_contact_age | if set to "remove" (default), participants that have contacts without age information are removed; if set to "keep", contacts with missing age are kept and treated as a separate age group; if set to "ignore", contacts with missing age are ignored in the contact analysis. The "sample" option is defunct (errors). For contacts that have only an age range (rather than a truly missing age), use estimated_contact_age = "sample" instead. |
The survey object with processed age data.
polymod_grouped <- assign_age_groups(polymod)
polymod_grouped
polymod_custom <- assign_age_groups(polymod, age_limits = c(0, 5, 10, 15))
polymod_custom
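The "mean" and "sample" options for age ranges, followed by the default "remove" handling of participants whose age is still unknown, can be sketched in base R. This is an illustration under assumptions, not the package's code: the helper impute_age() is hypothetical, and the column names part_age_exact, part_age_est_min and part_age_est_max are assumed to follow the "..._exact" / "..._est_min" / "..._est_max" pattern described above.

```r
# Hypothetical sketch: impute ages given only as ranges, then drop rows still missing.
impute_age <- function(exact, est_min, est_max, method = c("mean", "sample")) {
  method <- match.arg(method)
  age <- exact
  use_range <- is.na(exact) & !is.na(est_min) & !is.na(est_max)
  if (method == "mean") {
    # mid-point of the reported range
    age[use_range] <- (est_min[use_range] + est_max[use_range]) / 2
  } else {
    # uniform draw within each reported range
    age[use_range] <- runif(sum(use_range), est_min[use_range], est_max[use_range])
  }
  age
}

participants <- data.frame(
  part_id = 1:4,
  part_age_exact = c(34, NA, NA, 8),
  part_age_est_min = c(NA, 20, NA, NA),
  part_age_est_max = c(NA, 30, NA, NA)
)
participants$part_age <- with(
  participants,
  impute_age(part_age_exact, part_age_est_min, part_age_est_max)
)
# mirrors missing_participant_age = "remove": participant 3 has no age information
participants <- participants[!is.na(participants$part_age), ]
```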
Checks that a survey fulfills all the requirements to work with the 'contact_matrix' function
## S3 method for class 'contact_survey'
check(x, id.column = "part_id", participant.age.column = "part_age", country.column = "country", year.column = "year", contact.age.column = "cnt_age", ...)
| Argument | Description |
|---|---|
| x | A contact_survey object |
| id.column | the column in both the participants and contacts tables that links contacts to participants |
| participant.age.column | the column in the participants table containing participant ages |
| country.column | the column in the participants table indicating the country |
| year.column | the column in the participants table indicating the year |
| contact.age.column | the column in the contacts table containing contact ages |
| ... | ignored |
invisibly returns a character vector of the relevant columns
data(polymod)
try(check(polymod))
Cleans survey data to work with the 'contact_matrix' function
## S3 method for class 'contact_survey'
clean(x, participant_age_column = "part_age", ..., participant.age.column = deprecated())
| Argument | Description |
|---|---|
| x | A contact_survey object |
| participant_age_column | the column in the participants table containing participant ages |
| ... | ignored |
| participant.age.column | deprecated; use participant_age_column instead |
a cleaned survey in the correct format
data(polymod)
cleaned <- clean(polymod) # not really necessary, polymod is clean
Computes a contact matrix from a contact_survey that has been processed
by assign_age_groups() and optionally weigh(). This is the final step
in the pipeline workflow.
For post-processing, pipe the result into symmetrise(),
split_matrix(), or per_capita(). These post-processing functions
currently support single-grouping (age-only) matrices.
compute_matrix(survey, by = "age", counts = FALSE, weight_threshold = NULL)
| Argument | Description |
|---|---|
| survey | a contact_survey object |
| by | character vector or list of grouping specifications. Each entry is either the string |
| counts | whether to return counts instead of means |
| weight_threshold | numeric; if provided, weights above this threshold are capped to the threshold value and then re-normalised (default NULL) |
a contact_matrix object with elements matrix (a rank-2K array) and participants (a long table with one row per grouping combination)
Passing more than one entry to by produces a matrix of rank 2K,
where K = length(by). The first K dimensions index participants and
the last K dimensions index contacts, in the order given to by.
For example, by = c("age", "gender") returns an array with dimensions
(age, gender, age, gender) — age and gender of the participant
first, then of the contact. Dim names carry the levels of each grouping.
data(polymod)
# Single grouping (age), the default
polymod |>
  assign_age_groups(age_limits = c(0, 5, 15)) |>
  compute_matrix()
# Two groupings (age x gender)
polymod |>
  assign_age_groups(age_limits = c(0, 5, 15)) |>
  compute_matrix(by = c("age", "gender"))
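For the single-grouping case, a matrix of mean contacts can be thought of as "contacts reported by participants in group i with contacts in group j, divided by the number of participants in group i". The base-R sketch below illustrates that idea on toy tables with already-grouped ages. It ignores weights and is not the package's implementation; the column names part_group and cnt_group are hypothetical.

```r
# Hypothetical sketch: unweighted mean contact matrix (rows = participants, cols = contacts).
participants <- data.frame(part_id = 1:4, part_group = c(0, 0, 5, 15))
contacts <- data.frame(
  part_id   = c(1, 1, 2, 3, 3, 3, 4),
  cnt_group = c(0, 5, 0, 5, 15, 15, 15)
)
levels_age <- c(0, 5, 15)

# attach each contact's participant group
contacts$part_group <- participants$part_group[match(contacts$part_id, participants$part_id)]

# counts[i, j]: contacts reported by group-i participants with group-j contacts
counts <- table(factor(contacts$part_group, levels_age),
                factor(contacts$cnt_group, levels_age))
# divide each row by the number of participants in that group
# (participants with no contacts still count in the denominator)
n_part <- table(factor(participants$part_group, levels_age))
mean_matrix <- unclass(counts) / as.vector(n_part)
```

Passing more factors to table() (for example participant age, participant gender, contact age, contact gender) yields a higher-rank array, analogous to the rank-2K layout described above.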
Returns a data.frame of (age, proportion) pairs representing how
contact ages are distributed in the survey. This can be passed to
assign_age_groups() as estimated_contact_age to impute ages
from ranges using this distribution rather than uniform sampling.
contact_age_distribution(survey)
| Argument | Description |
|---|---|
| survey | a contact_survey object |
a data.frame with columns age (integer) and proportion (numeric, summing to 1)
data(polymod)
dist <- contact_age_distribution(polymod)
head(dist)
plot(dist$age, dist$proportion, type = "h", xlab = "Age", ylab = "Proportion")
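The (age, proportion) table can be understood as the empirical distribution of exact contact ages. The base-R sketch below builds such a table from a toy vector and then draws ages from it, which is the idea behind sampling "using this distribution rather than uniform sampling". The helper age_distribution() is hypothetical, and it uses exact ages only; how the package treats range-only ages when building the distribution is not shown here.

```r
# Hypothetical sketch: empirical distribution of exact contact ages.
cnt_age <- c(3, 3, 10, 25, 25, 25, NA, 40)

age_distribution <- function(ages) {
  ages <- ages[!is.na(ages)]         # drop missing ages
  counts <- table(as.integer(ages))  # frequency of each observed age
  data.frame(
    age = as.integer(names(counts)),
    proportion = as.vector(counts) / sum(counts)
  )
}

dist <- age_distribution(cnt_age)
# draw ages in proportion to how often they were observed
imputed <- sample(dist$age, size = 5, replace = TRUE, prob = dist$proportion)
```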
Samples a contact survey
contact_matrix(survey, countries = NULL, survey_pop = NULL, age_limits = NULL, filter = NULL, counts = FALSE, symmetric = FALSE, split = FALSE, sample_participants = FALSE, estimated_participant_age = c("mean", "sample", "missing"), estimated_contact_age = c("mean", "sample", "missing"), missing_participant_age = c("remove", "keep"), missing_contact_age = c("remove", "sample", "keep", "ignore"), weights = NULL, weigh_dayofweek = FALSE, weigh_age = FALSE, weight_threshold = NA, symmetric_norm_threshold = 2, sample_all_age_groups = FALSE, sample_participants_max_tries = 1000, return_part_weights = FALSE, return_demography = NA, per_capita = FALSE, ..., survey.pop = deprecated(), age.limits = deprecated(), sample.participants = deprecated(), estimated.participant.age = deprecated(), estimated.contact.age = deprecated(), missing.participant.age = deprecated(), missing.contact.age = deprecated(), weigh.dayofweek = deprecated(), weigh.age = deprecated(), weight.threshold = deprecated(), symmetric.norm.threshold = deprecated(), sample.all.age.groups = deprecated(), sample.participants.max.tries = deprecated(), return.part.weights = deprecated(), return.demography = deprecated(), per.capita = deprecated())
| Argument | Description |
|---|---|
| survey | a contact_survey object |
| countries | limit to one or more countries; if NULL (default), will use all countries in the survey; these can be given as country names or 2-letter (ISO Alpha-2) country codes. |
| survey_pop | survey population: a data frame with columns lower.age.limit and population |
| age_limits | lower limits of the age groups over which to construct the matrix. If NULL (default), age limits are inferred from participant and contact ages. |
| filter | any filters to apply to the data, given as list of the form (column=filter_value) - only contacts that have 'filter_value' in 'column' will be considered. If multiple filters are given, they are all applied independently and in the sequence given. Default value is NULL; no filtering performed. |
| counts | whether to return counts (instead of means). |
| symmetric | whether to make the matrix symmetric, such that $c_{ij} N_i = c_{ji} N_j$, where $c_{ij}$ is the (i, j) entry and $N_i$ is the population of age group i |
| split | whether to split the contact matrix into the mean number of contacts in each age group (split further into the product of the mean number of contacts across the whole population (mean.contacts), a normalisation constant (normalisation) and age-specific contact rates (contacts)) and an assortativity matrix |
| sample_participants | whether to sample participants randomly (with replacement); done multiple times this can be used to assess uncertainty in the generated contact matrices. See the "Bootstrapping" section in the vignette for how to do this. |
| estimated_participant_age | if set to "mean" (default), people whose ages are given as a range (in columns named "..._est_min" and "..._est_max") but not exactly (in a column named "..._exact") will have their age set to the mid-point of the range; if set to "sample", the age will be sampled from the range; if set to "missing", age ranges will be treated as missing |
| estimated_contact_age | if set to "mean" (default), contacts whose ages are given as a range (in columns named "..._est_min" and "..._est_max") but not exactly (in a column named "..._exact") will have their age set to the mid-point of the range; if set to "sample", the age will be sampled from the range; if set to "missing", age ranges will be treated as missing. |
| missing_participant_age | if set to "remove" (default), participants without age information are removed; if set to "keep", participants with missing age are kept and will appear in the contact matrix in a row labelled "NA". |
| missing_contact_age | if set to "remove" (default), participants that have contacts without age information are removed; if set to "keep", contacts with missing age are kept and will appear in the contact matrix in a column labelled "NA"; if set to "ignore", contacts without age information are removed from the analysis (but the participants that made them are kept). The "sample" option is defunct (errors). |
| weights | column name(s) of the participant data of the survey object whose values are used as participant weights |
| weigh_dayofweek | whether to weigh social contacts data by the day of the week (weight (5/7 / N_week / N) for weekdays and (2/7 / N_weekend / N) for weekends). |
| weigh_age | whether to weigh social contacts data by the age of the participants (vs. the population's age distribution). |
| weight_threshold | threshold value for the standardized weights before running an additional standardisation (default 'NA' = no cutoff). |
| symmetric_norm_threshold | threshold value for the normalization weights when symmetric = TRUE before issuing a warning |
| sample_all_age_groups | what to do if sampling participants (with sample_participants = TRUE) fails to sample participants from one or more age groups |
| sample_participants_max_tries | maximum number of attempts when sample_all_age_groups = TRUE |
| return_part_weights | boolean to return the participant weights. |
| return_demography | boolean to explicitly return demography data that corresponds to the survey data (default 'NA' = if demography data is requested by other function parameters). |
| per_capita | whether to return a matrix with contact rates per capita (default is FALSE and not possible if 'counts=TRUE' or 'split=TRUE'). |
| ... | further arguments to pass to |
| survey.pop, age.limits, sample.participants, estimated.participant.age, estimated.contact.age, missing.participant.age, missing.contact.age, weigh.dayofweek, weigh.age, weight.threshold, symmetric.norm.threshold, sample.all.age.groups, sample.participants.max.tries, return.part.weights, return.demography, per.capita | deprecated; use the corresponding snake_case arguments (e.g. survey_pop, age_limits) instead |
a contact matrix, and the underlying demography of the surveyed population
Sebastian Funk
data(polymod)
contact_matrix(
  survey = polymod,
  countries = "United Kingdom",
  age_limits = c(0, 1, 5, 15)
)
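The day-of-week weighting described under weigh_dayofweek gives weekday participants, taken together, 5/7 of the total weight and weekend participants 2/7. The sketch below reads the formula as (5/7) / (N_week / N) and (2/7) / (N_weekend / N), which makes the mean weight equal to 1. A strict left-to-right reading of "5/7 / N_week / N" differs only by a constant factor shared by all participants, so relative weights are unchanged. The helper dayofweek_weights() is hypothetical, and codes 1 to 5 are assumed to denote Monday to Friday.

```r
# Hypothetical sketch: weekday/weekend weights with mean 1.
dayofweek_weights <- function(dayofweek) {
  n <- length(dayofweek)
  is_weekday <- dayofweek %in% 1:5  # assumption: 1-5 = Monday-Friday
  n_week <- sum(is_weekday)
  n_weekend <- n - n_week
  w <- numeric(n)
  w[is_weekday] <- (5 / 7) / (n_week / n)
  w[!is_weekday] <- (2 / 7) / (n_weekend / n)
  w
}

dow <- c(1, 2, 3, 4, 5, 6, 0, 2)  # six weekday and two weekend participants
w <- dayofweek_weights(dow)
```

Weekdays are over-represented in this toy sample (6 of 8 rather than 5 of 7), so weekday participants get slightly less than 1 and weekend participants slightly more.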
download_survey() is defunct. Use contactsurveys::download_survey()
instead.
download_survey() downloads survey data from Zenodo.
download_survey(survey, dir = NULL, sleep = 1)
| Argument | Description |
|---|---|
| survey | a URL (see list_surveys()) |
| dir | a directory to save the files to; if not given, will save to a temporary directory |
| sleep | time to sleep between requests to avoid overloading the server (passed on to Sys.sleep()) |
Always errors.
See also: load_survey()
# we recommend using the contactsurveys package for download_survey()
## Not run:
# if needed, discover surveys with:
contactsurveys::list_surveys()
peru_survey <- download_survey("https://doi.org/10.5281/zenodo.1095664")
# -->
peru_survey <- contactsurveys::download_survey(
  "https://doi.org/10.5281/zenodo.1095664"
)
## End(Not run)
get_citation() is defunct. Use contactsurveys::get_citation() instead.
get_citation(x)
| Argument | Description |
|---|---|
| x | a character vector of surveys to cite |
Always errors.
# we recommend using the contactsurveys package for get_citation()
## Not run:
data(polymod)
citation <- contactsurveys::get_citation(polymod)
print(citation)
print(citation, style = "bibtex")
## End(Not run)
get_survey() is defunct. Use contactsurveys::download_survey() and then
load_survey() instead.
Downloads survey data, or extracts them from files, and returns a clean data set. If a survey URL is accessed multiple times, the data will be cached (unless clear_cache is set to TRUE) to avoid repeated downloads.
To reuse survey objects between sessions, save and reload them with base::saveRDS() and base::readRDS(), or keep the individual survey files downloaded with download_survey() and load them again with load_survey().
get_survey(survey, clear_cache = FALSE, ...)
| Argument | Description |
|---|---|
| survey | a DOI or url to get the survey from, or a survey object |
| clear_cache | logical, whether to clear the cache before downloading the survey; by default, the cache is not cleared and so multiple calls of this function to access the same survey will not result in repeated downloads. |
| ... | currently unused |
Always errors.
## Not run:
list_surveys()
peru_doi <- "https://doi.org/10.5281/zenodo.1095664"
peru_survey <- get_survey(peru_doi)
## --> We now recommend:
peru_survey <- contactsurveys::download_survey(peru_doi)
peru_data <- load_survey(peru_survey)
## End(Not run)
Checks if a character string is a DOI
is_doi(x)
| Argument | Description |
|---|---|
| x | Character vector; the string or strings to check |
Logical; TRUE if x is a DOI, FALSE otherwise
Sebastian Funk
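A DOI check is usually a pattern match on the "10.<registrant>/<suffix>" form. The sketch below uses a commonly used pattern for modern DOIs. This pattern is an assumption and not necessarily the rule is_doi() applies; in particular, it rejects strings with a "https://doi.org/" prefix, which the package may or may not accept. The helper looks_like_doi() is hypothetical.

```r
# Hypothetical sketch: match the bare "10.<4-9 digits>/<suffix>" DOI form.
looks_like_doi <- function(x) {
  grepl("^10\\.[0-9]{4,9}/[-._;()/:A-Za-z0-9]+$", x)
}

checked <- looks_like_doi(c(
  "10.5281/zenodo.1095664",
  "https://doi.org/10.5281/zenodo.1095664",
  "not a doi"
))
```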
Converts lower age limits into age-group labels; mostly used for plot labelling.
limits_to_agegroups(x, limits = sort(unique(x)), notation = c("dashes", "brackets"))
| Argument | Description |
|---|---|
| x | age limits to transform |
| limits | lower age limits; if not given, will use all limits in x |
| notation | whether to use bracket notation, e.g. [0,5), or dash notation, e.g. 0-4 |
Age groups as specified in notation
limits_to_agegroups(c(0, 5, 10))
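The two notations can be produced from sorted lower limits by pairing each limit with the next one. This base-R sketch is illustrative and not the package's implementation: the helper limits_to_labels() is hypothetical, dash labels assume integer ages (upper bound = next limit minus 1), and writing the open-ended last group as "<limit>+" is an assumption.

```r
# Hypothetical sketch: age-group labels from lower limits.
limits_to_labels <- function(limits, notation = c("dashes", "brackets")) {
  notation <- match.arg(notation)
  limits <- sort(unique(limits))
  upper <- c(limits[-1], NA)  # next lower limit; NA for the last group
  labels <- if (notation == "dashes") {
    paste0(limits, "-", upper - 1)        # integer ages: 0-4, 5-9, ...
  } else {
    paste0("[", limits, ",", upper, ")")  # half-open intervals: [0,5), ...
  }
  labels[length(labels)] <- paste0(limits[length(limits)], "+")  # open-ended group
  labels
}

dash_labels <- limits_to_labels(c(0, 5, 10))
bracket_labels <- limits_to_labels(c(0, 5, 10), notation = "brackets")
```

Both forms denote the same groups: "0-4" and "[0,5)" each cover ages 0 up to (but not including) 5 when ages are whole years.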
list_surveys() is defunct. Use contactsurveys::list_surveys() instead.
list_surveys(clear_cache = FALSE)
| Argument | Description |
|---|---|
| clear_cache | logical, whether to clear the cache before downloading the survey; by default, the cache is not cleared and so multiple calls of this function to access the same survey will not result in repeated downloads. |
Always errors.
# we recommend using the contactsurveys package now for listing surveys.
## Not run:
contactsurveys::list_surveys()
## End(Not run)
Loads a survey from a local file system. Tables are expected as csv files, and a reference (if present) as JSON.
load_survey(files, participant_key = NULL, ...)
| Argument | Description |
|---|---|
| files | a vector of file names as returned by download_survey() |
| participant_key | character vector specifying columns that uniquely identify participant observations. For cross-sectional surveys this is typically just part_id. |
| ... | options for |
a survey in the correct format. For longitudinal surveys with multiple observations per participant, the returned object includes an observation_key field containing the column names (excluding part_id) that distinguish observations for the same participant.
## Not run:
list_surveys()
peru_files <- download_survey("https://doi.org/10.5281/zenodo.1095664")
peru_survey <- load_survey(peru_files)
# For longitudinal surveys, specify the unique key explicitly:
france_files <- download_survey("https://doi.org/10.5281/zenodo.1157918")
france_survey <- load_survey(france_files,
  participant_key = c("part_id", "wave", "studyDay")
)
## End(Not run)
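What a participant key must satisfy, and how the observation key relates to it, can be checked on a toy longitudinal table. The sketch below is a base-R illustration, not the package's validation code; the helper is_unique_key() and the toy data are hypothetical, with column names borrowed from the example above.

```r
# Hypothetical sketch: a valid key identifies each participant observation once.
participants <- data.frame(
  part_id  = c(1, 1, 2, 2),
  wave     = c(1, 2, 1, 2),
  studyDay = c(1, 1, 1, 1)
)

is_unique_key <- function(df, key) {
  anyDuplicated(df[, key, drop = FALSE]) == 0
}

key <- c("part_id", "wave", "studyDay")
part_id_only <- is_unique_key(participants, "part_id")  # two rows per participant
full_key <- is_unique_key(participants, key)
# columns other than part_id that distinguish observations of one participant
observation_key <- setdiff(key, "part_id")
```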
This function combines the image.plot() function from the fields package with numeric contact rates printed in the matrix cells.
matrix_plot(mij, min.legend = 0, max.legend = NA, num.digits = 2, num.colors = 50, main, xlab, ylab, legend.width, legend.mar, legend.shrink, cex.lab, cex.axis, cex.text, color.palette = heat.colors)
| Argument | Description |
|---|---|
| mij | a contact matrix containing contact rates between participants of age i (rows) with contacts of age j (columns). This is the default matrix format of contact_matrix(). |
| min.legend | the color scale minimum (default = 0). Set to NA to use the minimum value of mij. |
| max.legend | the color scale maximum (default = NA). Set to NA to use the maximum value of mij. |
| num.digits | the number of digits when rounding the contact rates (default = 2). Use NA to disable this. |
| num.colors | the number of color breaks (default = 50) |
| main | the figure title |
| xlab | a title for the x axis (default: "Age group (year)") |
| ylab | a title for the y axis (default: "Contact age group (year)") |
| legend.width | width of the legend strip in characters. Default is 1. |
| legend.mar | width in characters of legend margin. Default is 5.1. |
| legend.shrink | amount to shrink the size of legend relative to the full height or width of the plot. Default is 0.9. |
| cex.lab | size of the x and y labels (default: 1.2) |
| cex.axis | size of the axis labels (default: 0.8) |
| cex.text | size of the numeric values in the matrix (default: 1) |
| color.palette | the color palette to use (default: heat.colors) |
This is a function using basic R graphics to visualise a social contact matrix.
Lander Willem
## Not run:
data(polymod)
mij <- contact_matrix(
  polymod,
  countries = "United Kingdom",
  age_limits = c(0, 18, 65)
)$matrix
matrix_plot(mij)
## End(Not run)
Divides each column of the contact matrix by the population of the corresponding age group, giving the contact rate of age group i with one individual of age group j.
per_capita(x, survey_pop, ...)
| Argument | Description |
|---|---|
| x | a list as returned by compute_matrix() |
| survey_pop | a data frame with columns lower.age.limit and population |
| ... | passed to |
x with $matrix replaced by the per-capita version
data(polymod)
pop <- data.frame(
  lower.age.limit = c(0, 5, 15),
  population = c(3500000, 6000000, 50000000)
)
polymod |>
  (\(s) s[country == "United Kingdom"])() |>
  assign_age_groups(age_limits = c(0, 5, 15)) |>
  compute_matrix() |>
  per_capita(survey_pop = pop)
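The per-capita step divides each column j by the population of contact age group j. A minimal base-R sketch of that arithmetic on a toy 2 x 2 matrix follows. It shows only the division; matching survey_pop to the matrix's age groups, which the package handles, is assumed to be done already.

```r
# Sketch: divide column j of a mean contact matrix by the population of group j.
m <- matrix(c(2, 1, 4, 6), nrow = 2,
            dimnames = list(participant = c("0", "15"), contact = c("0", "15")))
pop <- c(1000, 4000)  # population of each contact age group, in column order

m_per_capita <- sweep(m, 2, pop, "/")
```

If the matrix has first been symmetrised so that $c_{ij} N_i = c_{ji} N_j$, the per-capita matrix is itself symmetric, because dividing both sides by $N_i N_j$ gives $c_{ij} / N_j = c_{ji} / N_i$.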
A dataset containing social mixing diary data from 8 European countries: Belgium, Germany, Finland, Great Britain, Italy, Luxembourg, the Netherlands and Poland. The data are fully described in Mossong J, Hens N, Jit M, Beutels P, Auranen K, Mikolajczyk R, et al. (2008) Social Contacts and Mixing Patterns Relevant to the Spread of Infectious Diseases. PLoS Med 5(3): e74.
polymod
A list of two data frames:
participants: the study participants, with age, country, year and day of the week (starting with 1 = Monday)
contacts: reported contacts of the study participants. The variable phys_contact has two levels (1 denotes physical contact while 2 denotes non-physical contact), duration_multi has five levels (1 is less than 5 minutes while 5 is more than 4 hours, increasing in the order found in Figure 1 in Mossong et al.), and frequency_multi has five levels (1 is daily, 2 is weekly, 3 is monthly, 4 is less often, and 5 is first time)
All other variables are described on the Zenodo repository of the data, available at doi:10.5281/zenodo.1043437
doi:10.1371/journal.pmed.0050074
This changes population data to have age groups with the given age_limits, interpolating linearly within age groups (if more are requested than available) and summing populations (if fewer are requested than available).
pop_age(pop, age_limits = NULL, pop_age_column = "lower.age.limit", pop_column = "population", ..., age.limits = deprecated(), pop.age.column = deprecated(), pop.column = deprecated())
| Argument | Description |
|---|---|
| pop | a data frame with columns indicating lower age limits and population sizes (see 'pop_age_column' and 'pop_column') |
| age_limits | lower age limits of age groups to extract; if NULL (default), the population data is returned unchanged |
| pop_age_column | column in the 'pop' data frame indicating the lower age group limit |
| pop_column | column in the 'pop' data frame indicating the population size |
| ... | ignored |
| age.limits, pop.age.column, pop.column | deprecated; use age_limits, pop_age_column and pop_column instead |
data frame of age-specific population data
# 5-year age bands for a population of 70 million
it_pop <- data.frame(
  lower.age.limit = seq(0, 80, by = 5),
  population = c(rep(2.5e6, 4), rep(3.5e6, 4), rep(5e6, 6), 5e6, 7e6, 4e6)
)
# Modify the age data.frame to get age groups of 10 years instead of 5
pop_age(it_pop, age_limits = seq(0, 100, by = 10))
# The function will also automatically interpolate if necessary
pop_age(it_pop, age_limits = c(0, 18, 40, 65))
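Linear interpolation within bands amounts to assuming people are spread evenly inside each original band, so cumulative population is piecewise linear in age. Regrouping then reduces to reading the cumulative curve at the new limits and taking differences, which also covers plain summation when the new limits coincide with old ones. The base-R sketch below illustrates this. It is not pop_age() itself; the helper regroup_population() is hypothetical, and it assumes sorted new limits that start at the first original limit and do not exceed the last original lower limit (the open-ended top band is never split).

```r
# Hypothetical sketch: regroup populations assuming uniform spread within each band.
regroup_population <- function(lower, population, new_limits) {
  total <- sum(population)
  # cumulative population below each original lower limit
  cum_at_lower <- c(0, cumsum(population))[seq_along(lower)]
  # linearly interpolate the cumulative population at each new lower limit
  cum_at_new <- approx(lower, cum_at_lower, xout = new_limits)$y
  data.frame(
    lower.age.limit = new_limits,
    population = diff(c(cum_at_new, total))
  )
}

it_pop <- data.frame(
  lower.age.limit = seq(0, 80, by = 5),
  population = c(rep(2.5e6, 4), rep(3.5e6, 4), rep(5e6, 6), 5e6, 7e6, 4e6)
)
regrouped <- regroup_population(it_pop$lower.age.limit, it_pop$population,
                                c(0, 18, 40, 65))
```

Here the 15-19 band (2.5 million) is split 3:2 at age 18, so the 0-17 group receives 7.5 + 1.5 = 9 million.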
Maps age limits onto a reduced set of age groups. Operates on lower limits.
reduce_agegroups(x, limits)
| Argument | Description |
|---|---|
| x | vector of limits |
| limits | new limits |
vector with the new age groups
reduce_agegroups(seq_len(20), c(0, 5, 10))
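Mapping values onto a coarser set of lower limits means replacing each value with the largest new limit that does not exceed it. base::findInterval() does exactly this lookup. The sketch below is an illustration, not necessarily how reduce_agegroups() is implemented; the helper reduce_to_limits() is hypothetical, and returning NA for values below the first limit is a choice made here.

```r
# Hypothetical sketch: largest new lower limit that is <= each value.
reduce_to_limits <- function(x, limits) {
  limits <- sort(limits)
  idx <- findInterval(x, limits)
  idx[which(idx == 0)] <- NA  # values below the first limit have no group
  limits[idx]
}

reduced <- reduce_to_limits(seq_len(20), c(0, 5, 10))
```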
Splits the contact matrix into the mean number of contacts across the whole
population (mean.contacts), a normalisation constant (normalisation),
age-specific contact rates (contacts), and an assortativity matrix
(replacing $matrix). For details, see the "Getting Started" vignette.
split_matrix(x, survey_pop, ...)
| Argument | Description |
|---|---|
| x | a list as returned by compute_matrix() |
| survey_pop | a data frame with columns lower.age.limit and population |
| ... | passed to |
x with $matrix replaced by the assortativity matrix, plus additional elements $mean.contacts, $normalisation, and $contacts
data(polymod)
pop <- data.frame(
  lower.age.limit = c(0, 5, 15),
  population = c(3500000, 6000000, 50000000)
)
polymod |>
  (\(s) s[country == "United Kingdom"])() |>
  assign_age_groups(age_limits = c(0, 5, 15)) |>
  compute_matrix() |>
  split_matrix(survey_pop = pop)
survey() is defunct. Use as_contact_survey() instead.
survey(participants, contacts, reference = NULL)
| Argument | Description |
|---|---|
| participants | a data frame of participant data |
| contacts | a data frame of contact data |
| reference | a list with reference information (optional) |
Always errors.
Sebastian Funk
survey_countries(survey, country.column = "country", ...)
| Argument | Description |
|---|---|
| survey | a DOI or url to get the survey from, or a survey object |
| country.column | column in the survey indicating the country |
| ... | further arguments for get_survey() |
survey_countries() is defunct. Use contactsurveys::download_survey()
and load_survey() and then explore the country column yourself.
Always errors.
## Not run:
data(polymod)
survey_countries(polymod)
## End(Not run)
## --> we now recommend
## Not run:
doi_peru <- "10.5281/zenodo.1095664" # nolint
# download the data with the contactsurveys package
peru_survey <- contactsurveys::download_survey(doi_peru)
# load the survey with socialmixr
peru_data <- socialmixr::load_survey(peru_survey)
# find the unique country - assuming your data has a "country" column:
unique(peru_data$participants$country)
## End(Not run)
This function is deprecated alongside wpp_age(), which it wraps. The
underlying wpp2017 data is outdated. Construct a data.frame with
columns lower.age.limit and population from a current source (e.g.
the wpp2024 package from GitHub) and pass it to contact_matrix()
via the survey_pop argument instead.
survey_country_population(survey, countries = NULL)
| Argument | Description |
|---|---|
| survey | A contact_survey object |
| countries | Optional. A character vector of country names. If specified, this will be used instead of the potential "country" column in "participants". |
A data table with population data by age group for the survey countries, aggregated by lower age limit. The function will error if no country information is available from either the survey or countries argument.
if (requireNamespace("wpp2017", quietly = TRUE)) {
  survey_country_population(polymod, countries = "Belgium")
}
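Makes a contact matrix symmetric so that $c_{ij} N_i = c_{ji} N_j$,
where $c_{ij}$ is the (i, j) entry and $N_i$ is the population
of age group i. This is done by replacing each pair of total contacts $c_{ij} N_i$ and $c_{ji} N_j$ with half their sum,
so that $c'_{ij} = \frac{c_{ij} N_i + c_{ji} N_j}{2 N_i}$.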
symmetrise(x, survey_pop, symmetric_norm_threshold = 2, ...)
| Argument | Description |
|---|---|
| x | a list as returned by |
| survey_pop | a data frame with columns |
| symmetric_norm_threshold | threshold for the normalisation factor before issuing a warning (default 2) |
| ... | passed to |
x with $matrix replaced by the symmetrised version
data(polymod)
pop <- data.frame(
  lower.age.limit = c(0, 5, 15),
  population = c(3500000, 6000000, 50000000)
)
polymod |>
  (\(s) s[country == "United Kingdom"])() |>
  assign_age_groups(age_limits = c(0, 5, 15)) |>
  compute_matrix() |>
  symmetrise(survey_pop = pop)
weigh() multiplies participant weights by values looked up from a
target. The existing weight column is multiplied in place, so
multiple calls compose; if no weight column is present, one is created
with value 1.
weigh_by_dayofweek() and weigh_by_age() are thin convenience
wrappers around the two most common recipes — the weekly weekday/weekend
split and age post-stratification against a reference population. See
the dedicated sections below for what they compute exactly.
weigh(survey, by, target = NULL, groups = NULL, ...)

weigh_by_dayofweek(survey)

weigh_by_age(survey, pop, ...)
| Argument | Description |
|---|---|
| survey | a |
| by | column name in the participant data to join on |
| target | see Target shapes accepted by |
| groups | a list of value sets mapping column values to groups (used with an unnamed numeric |
| ... | further arguments passed to |
| pop | a data frame with columns |
the survey object with updated participant weights
Target shapes accepted by weigh()

- target = NULL (the default): multiply the numeric column named in by
  directly into weight. Useful when participants already carry a
  precomputed weight column.
- a two-column data frame whose key column is named by: a pure discrete
  join that multiplies the value column into weight where the key
  matches. Unmatched values get NA (with a warning).
- an unnamed numeric vector together with groups: each element of
  target is the total weight assigned across participants matching
  the corresponding entry in groups. The per-participant factor is
  target[g] / n_in_group.
- a named numeric vector: same as above, but names(target) are
  matched against values of the by column.

A data frame target that does not have a column named by but does
have lower.age.limit and population triggers a deprecation warning
and falls back to the old hidden age post-stratification path; use
weigh_by_age() instead.
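The arithmetic behind the data-frame and named-vector shapes can be sketched in base R on a made-up participant table. This only illustrates the rules listed above under the assumption of a plain data.frame with a weight column; it is not socialmixr's implementation.

```r
# Toy participant table (made-up data, for illustration only)
participants <- data.frame(
  part_id = 1:5,
  agecat = c("child", "adult", "adult", "adult", "child"),
  weight = 1
)

# Data-frame target: a discrete join on the `by` column ("agecat").
# match() returns NA for keys that are absent from the target.
target_df <- data.frame(agecat = c("child", "adult"), value = c(2, 0.5))
factor_join <- target_df$value[match(participants$agecat, target_df$agecat)]

# Named-vector target: target[g] is the *total* weight for group g,
# so each participant in g receives target[g] / n_in_group.
target_vec <- c(child = 0.25, adult = 0.75)
n_in_group <- table(participants$agecat)[names(target_vec)]
per_person <- target_vec / as.numeric(n_in_group)
factor_total <- unname(per_person[participants$agecat])

# Weights are multiplied in place, so successive calls compose.
participants$weight <- participants$weight * factor_join
participants$weight <- participants$weight * factor_total
participants
```

With the total-weight semantics the per-participant factors of each group add up to the corresponding target value (here 0.25 for the two children and 0.75 for the three adults), regardless of group size.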
weigh_by_dayofweek()

Rescales weights so that weekday participants together carry a total
weight of 5 and weekend participants a total weight of 2: the weekly
5/2 split that corrects for the typical over-representation of weekdays
in diary surveys. Concretely, each weekday participant gets
5 / n_weekday and each weekend participant 2 / n_weekend;
participants with NA day-of-week get the neutral average 7 / N.
The dayofweek column is taken to use 0 = Sunday through 6 = Saturday
(the POLYMOD convention).
Equivalent to:
weigh(survey, "dayofweek", target = c(5, 2), groups = list(1:5, c(0, 6)))
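A base-R sketch of the factors described above. It assumes N counts all participants, including those whose day of week is unknown; dayofweek_factor() is a hypothetical helper, not part of socialmixr.

```r
# Hypothetical sketch of the weekday/weekend factors (not package code).
# dayofweek uses 0 = Sunday ... 6 = Saturday; NA means unknown.
dayofweek_factor <- function(dayofweek) {
  weekday <- dayofweek %in% 1:5 # NA %in% 1:5 is FALSE
  weekend <- dayofweek %in% c(0, 6)
  n_total <- length(dayofweek) # assumption: N includes NA days
  f <- rep(7 / n_total, n_total) # neutral average for unknown days
  f[weekday] <- 5 / sum(weekday) # weekdays share a total of 5
  f[weekend] <- 2 / sum(weekend) # weekend days share a total of 2
  f
}

dow <- c(1, 2, 3, 0, 6, 6, NA)
f <- dayofweek_factor(dow)
f
```

Each weekday participant here receives 5/3 and each weekend participant 2/3, so the two groups sum to 5 and 2 even though the sample has as many weekend as weekday diaries.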
weigh_by_age()

Convenience wrapper for age post-stratification. The main thing it
adds over a raw weigh() call is interpolation: the reference
pop is expanded to single-year ages with pop_age(), so it can be
supplied at any age resolution (e.g. 5-year bands).
For each single-year age $a$ the weight then becomes
$w_a = \frac{P_a / P}{n_a / n}$,
where $P_a$ is the target population at age $a$, $P$ the
total, and $n_a$, $n$ the corresponding sample counts.
survey must already have been processed by assign_age_groups() so
that a part_age column is available for the join.
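A base-R sketch of the post-stratification weight (P_a / P) / (n_a / n), where P_a is the reference population at age a, P its total, n_a the number of participants aged a and n the sample size. It assumes the reference population is already given per single year of age (the step the package performs with pop_age()). poststrat_weights() is a hypothetical name and the numbers are made up.

```r
# Hypothetical sketch of single-year age post-stratification (not package code).
# pop_by_age: data.frame with columns age (single years) and population
poststrat_weights <- function(part_age, pop_by_age) {
  # sample counts n_a for every age in the reference population
  n_a <- as.numeric(table(factor(part_age, levels = pop_by_age$age)))
  P_a <- pop_by_age$population
  # population share divided by sample share; ages without participants
  # give Inf here but are never looked up below
  w <- (P_a / sum(P_a)) / (n_a / sum(n_a))
  # assign each participant the weight of their age
  w[match(part_age, pop_by_age$age)]
}

pop_by_age <- data.frame(age = 0:3, population = c(100, 100, 200, 600))
part_age <- c(0, 0, 1, 2, 3, 3)
w <- poststrat_weights(part_age, pop_by_age)
w
```

When every age in the reference population has at least one participant, the weights sum to the number of participants, so over-represented ages (here age 0) are down-weighted and under-represented ages (here age 3) are up-weighted.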
data(polymod)
uk <- polymod[country == "United Kingdom"] |>
  assign_age_groups(age_limits = c(0, 5, 15))

# ── target = NULL ────────────────────────────────────────────────
# Multiply an existing numeric column directly into the weight:
uk |> weigh("hh_size")

# ── data-frame target (discrete join) ────────────────────────────
# The key column of `target` must match `by`. Each participant
# has its weight multiplied by the matching value column.
age_target <- data.frame(
  age.group = c("[0,5)", "[5,15)", "[15,Inf)"),
  p = c(0.06, 0.12, 0.82)
)
uk |> weigh("age.group", target = age_target)

# Same idea, joining on `country` to pool participants across studies
# by a target population share:
country_target <- data.frame(
  country = c("United Kingdom", "Germany", "Italy"),
  p = c(0.3, 0.4, 0.3)
)
polymod |>
  assign_age_groups(age_limits = c(0, 5, 15)) |>
  weigh("country", target = country_target)

# ── unnamed vector + groups (total-weight semantics) ─────────────
# Each `target[g]` is the *total* weight assigned to participants in
# `groups[[g]]`. Here weekdays together carry weight 5, weekend days
# together carry weight 2:
uk |> weigh("dayofweek", target = c(5, 2), groups = list(1:5, c(0, 6)))

# The same is available as the convenience:
uk |> weigh_by_dayofweek()

# ── named vector ─────────────────────────────────────────────────
# `names(target)` are matched against `by` values; each value is the
# total weight for participants with that key.
uk$participants[, agecat := ifelse(part_age < 18, "child", "adult")]
uk |> weigh("agecat", target = c(child = 0.25, adult = 0.75))

# ── age post-stratification ──────────────────────────────────────
uk_pop <- data.frame(
  lower.age.limit = c(0, 5, 15, 65),
  population = c(3500000, 6000000, 40000000, 10000000)
)
uk |> weigh_by_age(uk_pop)
This function is deprecated in favour of passing population data directly
to contact_matrix() via the survey_pop argument. Additionally, the
underlying wpp2017 data is outdated. For more recent population data,
use the wpp2024 package from GitHub.
wpp_age(countries, years)
| Argument | Description |
|---|---|
| countries | countries to return; all countries are returned if not given |
| years | years to return; all years are returned if not given |
This uses data from the wpp2017 package but combines male and female,
and converts age groups to lower age limits. If the requested
year is not present in the historical data, WPP projections
are used.
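The two transformations mentioned above (summing male and female counts, and turning age-group labels into lower age limits) can be sketched on made-up numbers. The label format ("0-4", ..., "100+") is an assumption about how five-year WPP tables label their age groups, and the two vectors merely stand in for the package's male and female tables.

```r
# Made-up stand-ins for male and female five-year population tables
age_label <- c("0-4", "5-9", "10-14", "100+")
male <- c(300, 310, 320, 1)
female <- c(290, 305, 315, 3)

# Strip everything from the first "-" or "+" to keep the lower age limit
lower_limit <- as.numeric(sub("[-+].*$", "", age_label))

wpp_style <- data.frame(
  lower.age.limit = lower_limit,
  population = male + female # both sexes combined
)
wpp_style
```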
data frame of age-specific population data
if (requireNamespace("wpp2017", quietly = TRUE)) {
  wpp_age("Italy", c(1990, 2000))
}
# For more recent data, use wpp2024 from GitHub:
# remotes::install_github("PPgp/wpp2024")
# library(wpp2024)
# data(popAge1dt)
# uk_pop <- popAge1dt[name == "United Kingdom" & year == 2020,
#   .(lower.age.limit = age, population = pop * 1000)]
# contact_matrix(polymod, countries = "United Kingdom", survey_pop = uk_pop)
This function is deprecated in favour of passing population data directly
to contact_matrix() via the survey_pop argument, which removes the need
for a country list. Additionally, the underlying wpp2017 data is outdated.
For countries available in more recent WPP editions, use the wpp2024
package from GitHub.
wpp_countries()
Uses the World Population Prospects data from the wpp2017 package.
list of countries
if (requireNamespace("wpp2017", quietly = TRUE)) {
  wpp_countries()
}