| Title: | Model Infectious Disease Parameters from Serosurveys |
|---|---|
| Description: | An easy-to-use and efficient tool to estimate infectious diseases parameters using serological data. Implemented models include SIR models (basic_sir_model(), static_sir_model(), mseir_model(), sir_subpops_model()), parametric models (polynomial_model(), fp_model()), nonparametric models (lp_model()), semiparametric models (penalized_splines_model()), hierarchical models (hierarchical_bayesian_model()). The package is based on the book "Modeling Infectious Disease Parameters Based on Serological and Social Contact Data: A Modern Statistical Perspective" (Hens, Niel & Shkedy, Ziv & Aerts, Marc & Faes, Christel & Damme, Pierre & Beutels, Philippe., 2013) <doi:10.1007/978-1-4614-4072-7>. |
| Authors: | Anh Phan Truong Quynh [aut, cre] (ORCID: <https://orcid.org/0009-0000-2129-435X>), Nguyen Pham Nguyen The [aut] (ORCID: <https://orcid.org/0000-0002-0356-2776>), Long Bui Thanh [aut], Tuyen Huynh [aut], Thinh Ong [aut] (ORCID: <https://orcid.org/0000-0001-6772-9291>), Marc Choisy [aut] (ORCID: <https://orcid.org/0000-0002-5187-6390>) |
| Maintainer: | Anh Phan Truong Quynh <[email protected]> |
| License: | MIT + file LICENSE |
| Version: | 1.2.0.9000 |
| Built: | 2026-06-01 06:48:31 UTC |
| Source: | https://github.com/oucru-modelling/serosv |
An easy-to-use and efficient tool to estimate infectious diseases parameters using serological data. Implemented models include SIR models (basic_sir_model(), static_sir_model(), mseir_model(), sir_subpops_model()), parametric models (polynomial_model(), fp_model()), nonparametric models (lp_model()), semiparametric models (penalized_splines_model()), hierarchical models (hierarchical_bayesian_model()). The package is based on the book "Modeling Infectious Disease Parameters Based on Serological and Social Contact Data: A Modern Statistical Perspective" (Hens, Niel & Shkedy, Ziv & Aerts, Marc & Faes, Christel & Damme, Pierre & Beutels, Philippe., 2013) doi:10.1007/978-1-4614-4072-7.
Maintainer: Anh Phan Truong Quynh [email protected] (ORCID)
Authors:
Nguyen Pham Nguyen The [email protected] (ORCID)
Long Bui Thanh
Tuyen Huynh [email protected]
Thinh Ong [email protected] (ORCID)
Marc Choisy [email protected] (ORCID)
Useful links:
Report bugs at https://github.com/OUCRU-Modelling/serosv/issues
Visualize positive threshold at different dilution factors
add_thresholds(dilution_factors, positive_threshold = 0.1, shift_text = 0.15)add_thresholds(dilution_factors, positive_threshold = 0.1, shift_text = 0.15)
dilution_factors |
dilution factors to be visualized |
positive_threshold |
titer threshold for sample to be considered positive |
shift_text |
adjust how much the text is shifted along the x-axis (relative to the threshold line) |
Fit age-stratified seroprevalence across multiple time points. Also try to monotonize age (or birth cohort) - specific seroprevalence.
age_time_model( data, age_col = "age", status_col = "status", pos_col = "pos", tot_col = "tot", time_col = "date", grouping_col = "group", age_correct = F, le = 512, ci = 0.95, monotonize_method = "pava" )age_time_model( data, age_col = "age", status_col = "status", pos_col = "pos", tot_col = "tot", time_col = "date", grouping_col = "group", age_correct = F, le = 512, ci = 0.95, monotonize_method = "pava" )
data |
input data, must have age, status, time, group columns, where group column determines how data is aggregated |
age_col |
name of the 'age' column (default age_col="age"). |
status_col |
name of the 'status' column (default status_col="status"). |
pos_col |
name of the 'pos' column (default pos_col="pos"). |
tot_col |
name of the 'tot' column (default tot_col="tot"). |
time_col |
name of the column for time (default to "date") |
grouping_col |
name of the column for time (default to "group") |
age_correct |
a boolean, if 'TRUE', monotonize age-specific prevalence. Monotonize birth cohort-specific seroprevalence otherwise. |
le |
number of bins to generate age grid, used when monotonizing data |
ci |
confidence interval for smoothing |
monotonize_method |
either "pava" or "scam" |
a list of class time_age_model with 4 items
out |
a data.frame with dimension n_group x 9, where columns 'info', 'sp', 'foi' store output for non-monotonized data and 'monotonized_info', 'monotonized_sp', 'monotonized_foi', 'monotonized_ci_mod' store output for monotonized data |
grouping_col |
name of the column for grouping |
age_correct |
a boolean indicating whether the data is monotonized across age or cohort |
datatype |
whether the input data is aggregated or line-listing data |
Generate table of metrics for model comparison
compare_models(data, method = "AIC/BIC", ...)compare_models(data, method = "AIC/BIC", ...)
data |
input data to fit into the models |
method |
method to compare models. Can be one of the built-in methods or a function to compute the returned metrics (see Details). |
... |
models to be compared. Must be models created by serosv. If models' names are not provided, indices will be used instead for the 'model' column in the returned data.frame. |
Built-in comparison methods include:
computing AIC and BIC, which returns AIC, BIC values of the model if available
cross validation (perform k-fold validation), which returns MSE and logloss (negative log Binomial likelihood) for aggregated data, or AUC and logloss (negative log Bernoulli likelihood) for linelisting data
a data.frame with the following columns
label |
name or index of the model |
type |
model type of the given model (a serosv model name) |
metrics columns |
the columns for metrics of comparison, the number of which depends on the function that generate these metrics |
comparison_table <- suppressWarnings( compare_models( data = hav_bg_1964, method = "CV", polynomial_mod = ~polynomial_model(.x, k=1), penalized_spline = penalized_spline_model, farrington = ~farrington_model(.x, start=list(alpha=0.3,beta=0.1,gamma=0.03)) ) ) # view table of metrics comparison_table # view the model fitted with the whole dataset comparison_table$plotscomparison_table <- suppressWarnings( compare_models( data = hav_bg_1964, method = "CV", polynomial_mod = ~polynomial_model(.x, k=1), penalized_spline = penalized_spline_model, farrington = ~farrington_model(.x, start=list(alpha=0.3,beta=0.1,gamma=0.03)) ) ) # view table of metrics comparison_table # view the model fitted with the whole dataset comparison_table$plots
Compute confidence interval for time age model
## S3 method for class 'age_time_model' compute_ci(x, ci = 0.95, le = 100, ...)## S3 method for class 'age_time_model' compute_ci(x, ci = 0.95, le = 100, ...)
x |
serosv models |
ci |
confidence interval |
le |
number of data for computing confidence interval |
... |
arbitrary argument |
confidence interval dataframe with n_group x 3 cols, the columns are 'group', 'sp_df', 'foi_df'
Compute confidence interval for a model of serosv
## Default S3 method: compute_ci(x, ci = 0.95, le = 100, ...)## Default S3 method: compute_ci(x, ci = 0.95, le = 100, ...)
x |
serosv models |
ci |
confidence interval |
le |
number of data for computing confidence interval |
... |
arbitrary argument |
confidence interval dataframe with 4 variables, x and y for the fitted values and ymin and ymax for the confidence interval
Compute confidence interval for fractional polynomial model
## S3 method for class 'fp_model' compute_ci(x, ci = 0.95, le = 100, ...)## S3 method for class 'fp_model' compute_ci(x, ci = 0.95, le = 100, ...)
x |
serosv models |
ci |
confidence interval |
le |
number of data for computing confidence interval |
... |
arbitrary argument |
confidence interval dataframe with 4 variables, x and y for the fitted values and ymin and ymax for the confidence interval
Compute 95% credible interval for hierarchical Bayesian model
## S3 method for class 'hierarchical_bayesian_model' compute_ci(x, ...)## S3 method for class 'hierarchical_bayesian_model' compute_ci(x, ...)
x |
serosv models |
... |
arbitrary arguments |
list of confidence interval for seroprevalence and foi. Each confidence interval dataframe with 4 variables, x and y for the fitted values and ymin and ymax for the confidence interval
Compute confidence interval for local polynomial model
## S3 method for class 'lp_model' compute_ci(x, ci = 0.95, ...)## S3 method for class 'lp_model' compute_ci(x, ci = 0.95, ...)
x |
serosv models |
ci |
confidence interval |
... |
arbitrary arguments |
confidence interval dataframe with 4 variables, x and y for the fitted values and ymin and ymax for the confidence interval
Compute confidence interval for mixture model
## S3 method for class 'mixture_model' compute_ci(x, ci = 0.95, ...)## S3 method for class 'mixture_model' compute_ci(x, ci = 0.95, ...)
x |
serosv mixture_model object |
ci |
confidence interval |
... |
arbitrary arguments |
list of confidence interval for susceptible and infected. Each confidence interval is a list with 2 items for lower and upper bound of the interval.
Compute confidence interval for penalized_spline_model
## S3 method for class 'penalized_spline_model' compute_ci(x, ci = 0.95, ...)## S3 method for class 'penalized_spline_model' compute_ci(x, ci = 0.95, ...)
x |
serosv models |
ci |
confidence interval |
... |
arbitrary arguments |
list of confidence interval for seroprevalence and foi Each confidence interval dataframe with 4 variables, x and y for the fitted values and ymin and ymax for the confidence interval
Compute confidence interval for Weibull model
## S3 method for class 'weibull_model' compute_ci(x, ci = 0.95, ...)## S3 method for class 'weibull_model' compute_ci(x, ci = 0.95, ...)
x |
serosv models |
ci |
confidence interval |
... |
arbitrary argument |
confidence interval dataframe with 4 variables, x and y for the fitted values and ymin and ymax for the confidence interval
Estimate the true sero prevalence using Frequentist/Bayesian estimation
correct_prevalence( data, bayesian = TRUE, age_col = "age", pos_col = "pos", tot_col = "tot", status_col = "status", init_se = 0.95, init_sp = 0.8, study_size_se = 1000, study_size_sp = 1000, chains = 1, warmup = 1000, iter = 2000 )correct_prevalence( data, bayesian = TRUE, age_col = "age", pos_col = "pos", tot_col = "tot", status_col = "status", init_se = 0.95, init_sp = 0.8, study_size_se = 1000, study_size_sp = 1000, chains = 1, warmup = 1000, iter = 2000 )
data |
the input data frame, must either have columns for 'age', 'pos', 'tot' (for aggregated data) OR 'age', 'status' (for linelisting data) |
bayesian |
whether to adjust sero-prevalence using the Bayesian or frequentist approach. If set to 'TRUE', true sero-prevalence is estimated using MCMC. |
age_col |
name of the 'age' column (default age_col="age"). |
pos_col |
name of the 'pos' column (default pos_col="pos"). |
tot_col |
name of the 'tot' column (default tot_col="tot"). |
status_col |
name of the 'status' column (default status_col="status"). |
init_se |
sensitivity of the serological test |
init_sp |
specificity of the serological test |
study_size_se |
(applicable when 'bayesian=TRUE') study size for sensitivity validation study (i.e., number of confirmed infected patients in the study) |
study_size_sp |
(applicable when 'bayesian=TRUE') study size for specificity validation study (i.e., number of confirmed non-infected patients in the study) |
chains |
(applicable when 'bayesian=TRUE') number of Markov chains |
warmup |
(applicable when 'bayesian=TRUE') number of warm up runs |
iter |
(applicable when 'bayesian=TRUE') number of iterations |
a list of 3 items
info |
estimated parameters (when 'bayesian = TRUE') or formula to compute corrected prevalence (when 'bayesian = FALSE') |
df |
data.frame of input data (in aggregated form) with the 95% confidence interval for apparent (i.e. observed) seroprevalence |
corrected_sero |
data.frame containing age, the corresponding estimated seroprevalance with 95% confidence/credible interval, and adjusted tot and pos |
data <- rubella_uk_1986_1987 correct_prevalence(data)data <- rubella_uk_1986_1987 correct_prevalence(data)
Estimate force of infection
est_foi(t, sp)est_foi(t, sp)
t |
time (in this case age) vector |
sp |
seroprevalence vector |
computed foi vector
Estimate age-specific seroprevalence and FOI given a fitted mixture model (generated by [serosv::mixture_model()])
estimate_from_mixture( age, antibody_level, threshold_status = NULL, mixture_model, s = "ps", sp = 83, monotonize = TRUE )estimate_from_mixture( age, antibody_level, threshold_status = NULL, mixture_model, s = "ps", sp = 83, monotonize = TRUE )
age |
vector of age |
antibody_level |
vector of the corresponding raw antibody level |
threshold_status |
sero status using threshold approach in line listing (optional, for visualization and comparison only) |
mixture_model |
mixture_model object generated by serosv::mixture_model() |
s |
smoothing basis used to fit antibody level |
sp |
smoothing parameter |
monotonize |
whether to monotonize seroprevalence (default to TRUE) |
Antibody level (denoted ) is modeled using a 2-component Gaussian
mixture model. Each component () represents the
antibody level of the latent Infected and Susceptible sub-populations, following density
Let denotes the age-dependent mixing probability
(i.e., the true prevalence), the density of the mixture is formulated as
The mean thus equals
From which true prevalence can be computed as
And FOI can then be inferred as
Function [serosv::mixture_model()] fits antibody level data to and
Function [serosv::estimate_mixture()] will then estimate age-specific antibody level
and infer the estimation for and
Refer to section 11.3. of the the book by Hens et al. (2012) for further details.
a list of class estimated_from_mixture with the following items
df |
the dataframe used for fitting the model |
info |
a fitted "gam" model for mu(a) |
sp |
seroprevalence |
foi |
force of infection |
threshold_status |
serostatus using threshold method only if provided |
Hens, Niel, Ziv Shkedy, Marc Aerts, Christel Faes, Pierre Van Damme, and Philippe Beutels. 2012. Modeling Infectious Disease Parameters Based on Serological and Social Contact Data: A Modern Statistical Perspective. tatistics for Biology and Health. Springer New York. doi:10.1007/978-1-4614-4072-7.
[mgcv::gam()] for more information about the fitted gam object
Fit age-stratified seroprevalence data using the Farrington (1990) model, which assumes the force of infection increases linearly with age and subsequently decreases exponentially.
farrington_model( data, start, fixed = list(), age_col = "age", pos_col = "pos", tot_col = "tot", status_col = "status" )farrington_model( data, start, fixed = list(), age_col = "age", pos_col = "pos", tot_col = "tot", status_col = "status" )
data |
the input data frame, must either have columns for 'age', 'pos', 'tot' (for aggregated data) OR 'age', 'status' (for linelisting data) |
start |
Named list of vectors or single vector. Initial values for optimizer. |
fixed |
Named list of vectors or single vector. Parameter values to keep fixed during optimization. |
age_col |
name of the 'age' column (default age_col="age"). |
pos_col |
name of the 'pos' column (default pos_col="pos"). |
tot_col |
name of the 'tot' column (default tot_col="tot"). |
status_col |
name of the 'status' column (default status_col="status"). |
The force of infection is defined as followed
Where is called the long term residual for FOI,
as ,
The seroprevalence can thus be estimated using the non-linear model
Refer to section 6.1.2. of the the book by Hens et al. (2012) for further details.
a list of class farrington_model with 5 items
datatype |
type of datatype used for model fitting (aggregated or linelisting) |
df |
the dataframe used for fitting the model |
info |
fitted "mle" object |
sp |
seroprevalence |
foi |
force of infection |
Hens, Niel, Ziv Shkedy, Marc Aerts, Christel Faes, Pierre Van Damme, and Philippe Beutels. 2012. Modeling Infectious Disease Parameters Based on Serological and Social Contact Data: A Modern Statistical Perspective. tatistics for Biology and Health. Springer New York. doi:10.1007/978-1-4614-4072-7.
[stats4::mle()] for more information on the fitted mle object
df <- rubella_uk_1986_1987 model <- farrington_model( df, start=list(alpha=0.07,beta=0.1,gamma=0.03) ) plot(model)df <- rubella_uk_1986_1987 model <- farrington_model( df, start=list(alpha=0.07,beta=0.1,gamma=0.03) ) plot(model)
Return the best powers for a given degree
find_best_fp_powers(data, p, mc, degree, link = "logit")find_best_fp_powers(data, p, mc, degree, link = "logit")
data |
the input data frame, must either have columns for 'age', 'pos', 'tot' (for aggregated data) OR 'age', 'status' (for linelisting data) |
p |
a powers sequence to be tested. |
mc |
indicates if the returned model should be monotonic. |
degree |
the maximum degree (i.e. number of power terms) to search for the best model. Recommended to be <= 2. |
link |
the link function. Defaulted to "logit". |
list of 3 elements:
p |
The best power for fp model. |
deviance |
Deviance of the best fitted model. |
model |
The best model fitted |
Fractional polynomial model is a generalization of polynomial models where the power of the terms can be fractions, allowing more flexibility and better fit for data where asymptotic behavior is expected.
fp_model( data, p, monotonic = FALSE, link = "logit", age_col = "age", pos_col = "pos", tot_col = "tot", status_col = "status" )fp_model( data, p, monotonic = FALSE, link = "logit", age_col = "age", pos_col = "pos", tot_col = "tot", status_col = "status" )
data |
the input data frame, must either have 'age', 'pos', 'tot' columns (for aggregated data) OR 'age', 'status' for (linelisting data) |
p |
is either:
|
monotonic |
whether the returned model should be monotonic (if a search is specified) |
link |
the link function for model. Defaulted to "logit". |
age_col |
name of the 'age' column (default age_col="age"). |
pos_col |
name of the 'pos' column (default pos_col="pos"). |
tot_col |
name of the 'tot' column (default tot_col="tot"). |
status_col |
name of the 'status' column (default status_col="status"). |
Instead of a polynomial, the linear predictor is now defined as
Where is an integer, is a sequence of powers,
and is a transformation given by
Refers to section 6.2. of the the book by Hens et al. (2012) for further details.
a list of class fp_model with 5 items
datatype |
type of data used for fitting model (aggregated or linelisting) |
df |
the dataframe used for fitting the model |
info |
a fitted glm model |
sp |
seroprevalence |
foi |
force of infection |
Hens, Niel, Ziv Shkedy, Marc Aerts, Christel Faes, Pierre Van Damme, and Philippe Beutels. 2012. Modeling Infectious Disease Parameters Based on Serological and Social Contact Data: A Modern Statistical Perspective. tatistics for Biology and Health. Springer New York. doi:10.1007/978-1-4614-4072-7.
[stats::glm()] for more information on glm object
[polynomial_models()]
df <- hav_be_1993_1994 model <- fp_model( df, p=c(1.5, 1.6), link="cloglog") plot(model)df <- hav_be_1993_1994 model <- fp_model( df, p=c(1.5, 1.6), link="cloglog") plot(model)
A study of the prevalence of HAV antibodies conducted in the Flemish Community of Belgium in 1993 and early 1994
hav_be_1993_1994hav_be_1993_1994
A data frame with 3 variables:
Age group
Number of seropositive individuals
Total number of individuals surveyed
Beutels, M., Van Damme, P., Aelvoet, W. et al. Prevalence of hepatitis A, B and C in the Flemish population. Eur J Epidemiol 13, 275-280 (1997). doi:10.1023/A:1007393405966
with(hav_be_1993_1994, plot( age, pos / tot, pty = "s", cex = 0.34 * sqrt(tot), pch = 16, xlab = "age", ylab = "seroprevalence", xlim = c(0, 86), ylim = c(0, 1) ) )with(hav_be_1993_1994, plot( age, pos / tot, pty = "s", cex = 0.34 * sqrt(tot), pch = 16, xlab = "age", ylab = "seroprevalence", xlim = c(0, 86), ylim = c(0, 1) ) )
A subset of the serological dataset of Varicella-Zoster Virus (VZV) and Parvovirus B19 in Belgium where only individuals living in Flanders were selected
hav_be_2002hav_be_2002
A data frame with 2 variables:
Age of individual
If the individual is seropositive or not
Thiry, N., Beutels, P., Shkedy, Z. et al. The seroepidemiology of primary varicella-zoster virus infection in Flanders (Belgium). Eur J Pediatr 161, 588-593 (2002). doi:10.1007/s00431-002-1053-2
library(dplyr) df <- hav_be_2002 %>% group_by(age) %>% summarise(pos = sum(seropositive), tot = n()) with( df, plot( age, pos / tot, pty = "s", cex = 0.34 * sqrt(tot), pch = 16, xlab = "age", ylab = "seroprevalence", xlim = c(0, 86), ylim = c(0, 1) ) )library(dplyr) df <- hav_be_2002 %>% group_by(age) %>% summarise(pos = sum(seropositive), tot = n()) with( df, plot( age, pos / tot, pty = "s", cex = 0.34 * sqrt(tot), pch = 16, xlab = "age", ylab = "seroprevalence", xlim = c(0, 86), ylim = c(0, 1) ) )
A cross-sectional survey conducted in 1964 in Bulgaria. Samples were collected from schoolchildren and blood donors.
hav_bg_1964hav_bg_1964
A data frame with 3 variables:
Age group
Number of seropositive individuals
Total number of individuals surveyed
Keiding, Niels. "Age-Specific Incidence and Prevalence: A Statistical Perspective." Journal of the Royal Statistical Society. Series A (Statistics in Society) 154, no. 3 (1991): 371-412. doi:10.2307/2983150
with( hav_bg_1964, plot( age, pos / tot, pty = "s", cex = 0.6 * sqrt(tot), pch = 16, xlab = "age", ylab = "seroprevalence", xlim = c(0, 86), ylim = c(0, 1) ) )with( hav_bg_1964, plot( age, pos / tot, pty = "s", cex = 0.6 * sqrt(tot), pch = 16, xlab = "age", ylab = "seroprevalence", xlim = c(0, 86), ylim = c(0, 1) ) )
A seroprevalence study conducted in St. Petersburg (more information in the book)
hbv_ru_1999hbv_ru_1999
A data frame with 4 variables:
Age group
Number of seropositive individuals
Total number of individuals surveyed
Gender of cohort (unsure what 1 and 2 means)
Mukomolov, S., L. Shliakhtenko, I. Levakova, and E. Shargorodskaya. Viral hepatitis in Russian federation. An analytical overview. Technical Report 213 (3), 3rd edn. St Petersburg Pasteur Institute, St Petersburg, 2000.
library(dplyr) hbv_ru_1999$age <- trunc(hbv_ru_1999$age / 1) * 1 hbv_ru_1999$age[hbv_ru_1999$age > 40] <- trunc( hbv_ru_1999$age[hbv_ru_1999$age > 40] / 5 ) * 5 df <- hbv_ru_1999 %>% group_by(age) %>% summarise(pos = sum(pos), tot = sum(tot)) plot( df$age, df$pos / df$tot, cex = 0.32 * sqrt(df$tot), pch = 16, xlab = "age", ylab = "seroprevalence", xlim = c(0, 72) )library(dplyr) hbv_ru_1999$age <- trunc(hbv_ru_1999$age / 1) * 1 hbv_ru_1999$age[hbv_ru_1999$age > 40] <- trunc( hbv_ru_1999$age[hbv_ru_1999$age > 40] / 5 ) * 5 df <- hbv_ru_1999 %>% group_by(age) %>% summarise(pos = sum(pos), tot = sum(tot)) plot( df$age, df$pos / df$tot, cex = 0.32 * sqrt(df$tot), pch = 16, xlab = "age", ylab = "seroprevalence", xlim = c(0, 72) )
A study of HCV infection among injecting drug users. All injecting drug users were interviewed by means of a standardized face-to-face interview and information on their socio-demographic status, drug use history, drug use, and related risk behavior was recorded
hcv_be_2006hcv_be_2006
A data frame with 3 variables:
Duration of injection/Exposure time (years)
If the individual is seropositive or not
Mathei, C., Shkedy, Z., Denis, B., Kabali, C., Aerts, M., Molenberghs, G., Van Damme, P. and Buntinx, F. (2006), Evidence for a substantial role of sharing of injecting paraphernalia other than syringes/needles to the spread of hepatitis C among injecting drug users. Journal of Viral Hepatitis, 13: 560-570. doi:10.1111/j.1365-2893.2006.00725.x
library(dplyr) # snapping age to aggregated age group # (credit: https://stackoverflow.com/a/12861810) groups <- c(0.5:24.5) range <- 0.5 low <- findInterval(hcv_be_2006$dur, groups) high <- low + 1 low_diff <- hcv_be_2006$dur - groups[ifelse(low == 0, NA, low)] high_diff <- groups[ifelse(high == 0, NA, high)] - hcv_be_2006$dur mins <- pmin(low_diff, high_diff, na.rm = TRUE) pick <- ifelse(!is.na(low_diff) & mins == low_diff, low, high) hcv_be_2006$dur <- ifelse( mins <= range + .Machine$double.eps, groups[pick], hcv_be_2006$dur ) hcv_be_2006 <- hcv_be_2006 %>% group_by(dur) %>% summarise(tot = n(), pos = sum(seropositive)) with(hcv_be_2006, plot( dur, pos / tot, cex = 0.42 * sqrt(tot), pch = 16, xlab = "duration of injection (years)", ylab = "seroprevalence", xlim = c(0, 25), ylim = c(0, 1) ) )library(dplyr) # snapping age to aggregated age group # (credit: https://stackoverflow.com/a/12861810) groups <- c(0.5:24.5) range <- 0.5 low <- findInterval(hcv_be_2006$dur, groups) high <- low + 1 low_diff <- hcv_be_2006$dur - groups[ifelse(low == 0, NA, low)] high_diff <- groups[ifelse(high == 0, NA, high)] - hcv_be_2006$dur mins <- pmin(low_diff, high_diff, na.rm = TRUE) pick <- ifelse(!is.na(low_diff) & mins == low_diff, low, high) hcv_be_2006$dur <- ifelse( mins <= range + .Machine$double.eps, groups[pick], hcv_be_2006$dur ) hcv_be_2006 <- hcv_be_2006 %>% group_by(dur) %>% summarise(tot = n(), pos = sum(seropositive)) with(hcv_be_2006, plot( dur, pos / tot, cex = 0.42 * sqrt(tot), pch = 16, xlab = "duration of injection (years)", ylab = "seroprevalence", xlim = c(0, 25), ylim = c(0, 1) ) )
Fit age-stratified seroprevalence to parametric hierarchical Bayesian models. Supported models including Farrington model (2 and 3 parameters variants) and Log Logistic model
hierarchical_bayesian_model( data, age_col = "age", pos_col = "pos", tot_col = "tot", status_col = "status", type = "far3", chains = 1, warmup = 1500, iter = 5000 )hierarchical_bayesian_model( data, age_col = "age", pos_col = "pos", tot_col = "tot", status_col = "status", type = "far3", chains = 1, warmup = 1500, iter = 5000 )
data |
the input data frame, must either have columns for 'age', 'pos', 'tot' (for aggregated data) OR 'age', 'status' (for linelisting data) |
age_col |
name of the 'age' column (default age_col="age"). |
pos_col |
name of the 'pos' column (default pos_col="pos"). |
tot_col |
name of the 'tot' column (default tot_col="tot"). |
status_col |
name of the 'status' column (default status_col="status"). |
type |
type of model ("far2", "far3" or "log_logistic") |
chains |
number of Markov chains |
warmup |
number of warmup runs |
iter |
number of iterations |
Consider a model for prevalence that has a parametric form
where is a parameter vector
Under a Bayesian framework, we can constraint the parameter space of the prior distribution
to achieve monotonicity of the posterior distribution
Where:
- and is the sample size at age
- and is the number of infected individual from the sampled subjects
For Farrington model with 3 parameters, prevalence is formulated as follow
The likelihood model is defined as
The constraint on the parameter space can be incorporated by assuming
truncated normal distribution for the components of ,
in
The flat hyperpriors are defined as and
For Farrington model with 2 parameters, it is equivalent to the previous model with
For Log logistic model, seroprevalence is instead defined as
The likelihood is similarly defined as
The prior model of is specified as
with flat hyperpriors as in Farrington model
is constrained to be positive by specifying
Refer to section Chapter 10.3 of the the book by Hens et al. (2012) for further details.
a list of class hierarchical_bayesian_model with 6 items
datatype |
type of datatype used for model fitting (aggregated or linelisting) |
df |
the dataframe used for fitting the model |
type |
type of bayesian model far2, far3 or log_logistic |
info |
parameters for the fitted model |
sp |
seroprevalence |
foi |
force of infection |
sp_func |
function to compute seroprevalence given age and model parameters |
foi |
function to compute force of infection given age and model parameters |
Hens, Niel, Ziv Shkedy, Marc Aerts, Christel Faes, Pierre Van Damme, and Philippe Beutels. 2012. Modeling Infectious Disease Parameters Based on Serological and Social Contact Data: A Modern Statistical Perspective. tatistics for Biology and Health. Springer New York. doi:10.1007/978-1-4614-4072-7.
df <- mumps_uk_1986_1987 model <- hierarchical_bayesian_model(df, type="far3") model$info plot(model)df <- mumps_uk_1986_1987 model <- hierarchical_bayesian_model(df, type="far3") model$info plot(model)
Fit the age-specific seroprevalence to a local polynomial model, where the linear predictor is approximated locally at one particular age.
lp_model( data, kern = "tcub", nn = 0, h = 0, deg = 2, age_col = "age", pos_col = "pos", tot_col = "tot", status_col = "status" )lp_model( data, kern = "tcub", nn = 0, h = 0, deg = 2, age_col = "age", pos_col = "pos", tot_col = "tot", status_col = "status" )
data |
the input data frame, must either have columns for 'age', 'pos', 'tot' (for aggregated data) OR 'age', 'status' (for linelisting data) |
kern |
Weight function, default = "tcub". Other choices are "rect", "trwt", "tria", "epan", "bisq" and "gauss". Choices may be restricted when derivatives are required; e.g. for confidence bands and some bandwidth selectors. |
nn |
Nearest neighbor component of the smoothing parameter. Default value is 0.7, unless either h is provided, in which case the default is 0. |
h |
The constant bandwidth of the smoothing parameter. Default: 0. |
deg |
Degree of polynomial to use. Default: 2. |
age_col |
name of the 'age' column (default age_col="age"). |
pos_col |
name of the 'pos' column (default pos_col="pos"). |
tot_col |
name of the 'tot' column (default tot_col="tot"). |
status_col |
name of the 'status' column (default status_col="status"). |
Consider a linear predictor approximated locally at one particular value .
For a general degree , the linear predictor for a neighbor of , labeled is equivalent to the Taylor approximation
can be estimated by maximizing
The estimator for the -th derivative of , for
(degree of local polynomial) is thus:
The estimator for the prevalence at age is then given by
Where is the link function
The estimator for the force of infection at age by assuming is as followed
Where
Refer to section 7.1 and 7.2. of the the book by Hens et al. (2012) for further details.
a list of class lp_model with 6 items
datatype |
type of datatype used for model fitting (aggregated or linelisting) |
df |
the dataframe used for fitting the model |
pi |
fitted locfit object for pi |
eta |
fitted locfit object for eta |
sp |
seroprevalence |
foi |
force of infection |
Hens, Niel, Ziv Shkedy, Marc Aerts, Christel Faes, Pierre Van Damme, and Philippe Beutels. 2012. Modeling Infectious Disease Parameters Based on Serological and Social Contact Data: A Modern Statistical Perspective. tatistics for Biology and Health. Springer New York. doi:10.1007/978-1-4614-4072-7.
[locfit::locfit()] for more information on the fitted locfit object
df <- mumps_uk_1986_1987 model <- lp_model( df, nn=0.7, kern="tcub" ) plot(model)df <- mumps_uk_1986_1987 model <- lp_model( df, nn=0.7, kern="tcub" ) plot(model)
Fit the antibody level data to a 2-component Gaussian mixture model
mixture_model( antibody_level, breaks = 40, pi = c(0.2, 0.8), mu = c(2, 6), sigma = c(0.5, 1) )mixture_model( antibody_level, breaks = 40, pi = c(0.2, 0.8), mu = c(2, 6), sigma = c(0.5, 1) )
antibody_level |
vector of the corresponding raw antibody level |
breaks |
number of intervals which the antibody_level are grouped into |
pi |
proportion of susceptible, infected |
mu |
a vector of means of component distributions (vector of 2 numbers in ascending order) |
sigma |
a vector of standard deviations of component distributions (vector of 2 number) |
Antibody level (denoted ) is modeled using a 2-component Gaussian
mixture model. Each component () represents the
antibody level of the latent Infected and Susceptible sub-populations, following density
Let denotes the age-dependent mixing probability
(i.e., the true prevalence), the density of the mixture is formulated as
The mean thus equals
From which true prevalence can be computed as
And FOI can then be inferred as
Function [serosv::mixture_model()] fits antibody level data to and
Function [serosv::estimate_mixture()] will then estimate age-specific antibody level
and infer the estimation for and
Refer to section 11.3. of the the book by Hens et al. (2012) for further details.
a list of class mixture_model with the following items
df |
the dataframe used for fitting the model |
info |
list of 3 items parameters, distribution and constraints for the fitted model |
susceptible |
fitted distribution for susceptible |
infected |
fitted distribution for infected |
Hens, Niel, Ziv Shkedy, Marc Aerts, Christel Faes, Pierre Van Damme, and Philippe Beutels. 2012. Modeling Infectious Disease Parameters Based on Serological and Social Contact Data: A Modern Statistical Perspective. tatistics for Biology and Health. Springer New York. doi:10.1007/978-1-4614-4072-7.
df <- vzv_be_2001_2003[vzv_be_2001_2003$age < 40.5,] data <- df$VZVmIUml[order(df$age)] model <- mixture_model(antibody_level = data) model$info plot(model)df <- vzv_be_2001_2003[vzv_be_2001_2003$age < 40.5,] data <- df$VZVmIUml[order(df$age)] model <- mixture_model(antibody_level = data) model$info plot(model)
a large survey of prevalence of antibodies to mumps and rubella viruses in the UK. The survey, covering subjects from 1 to over 65 years of age, provides information on the prevalence of antibody by age
mumps_uk_1986_1987mumps_uk_1986_1987
A data frame with 3 variables:
Midpoint of the age group (e.g. 1.5 = 1-2 years old, 2.5 = 2-3 years old)
Number of seropositive individuals
Total number of individuals surveyed
Morgan-Capner P, Wright J, Miller C L, Miller E. Surveillance of antibody to measles, mumps, and rubella by age. British Medical Journal 1988; 297 :770 doi:10.1136/bmj.297.6651.770
with(mumps_uk_1986_1987, plot(age, pos / tot, cex = 0.1 * sqrt(tot), pch = 16, xlab = "age", ylab = "seroprevalence", xlim = c(0, 45), ylim = c(0, 1) ) )with(mumps_uk_1986_1987, plot(age, pos / tot, cex = 0.1 * sqrt(tot), pch = 16, xlab = "age", ylab = "seroprevalence", xlim = c(0, 45), ylim = c(0, 1) ) )
A seroprevalence survey testing for parvovirus B19 IgG antibody, performed on large representative national serum banks in Belgium, England and Wales, Finland, Italy, and Poland. The sera were collected between 1995 and 2004 and were obtained from residual sera submitted for routine laboratory testing.
parvob19_be_2001_2003parvob19_be_2001_2003
A data frame with 5 variables:
Age of individual
If the individual is seropositive or not
Year surveyed
Gender of individual
Parvo B19 antibody units per ml
MOSSONG, J., N. HENS, V. FRIEDERICHS, I. DAVIDKIN, M. BROMAN, B. LITWINSKA, J. SIENNICKA, et al. "Parvovirus B19 Infection in Five European Countries: Seroepidemiology, Force of Infection and Maternal Risk of Infection." Epidemiology and Infection 136, no. 8 (2008): 1059-68. doi:10.1017/S0950268807009661
library(dplyr) df <- parvob19_be_2001_2003 %>% group_by(age) %>% summarise(pos = sum(seropositive), tot = n()) plot(df$age, df$pos / df$tot, cex = 0.3 * sqrt(df$tot), pch = 16, xlab = "age", ylab = "seroprevalence", xlim = c(0, 82), ylim = c(0, 1) )library(dplyr) df <- parvob19_be_2001_2003 %>% group_by(age) %>% summarise(pos = sum(seropositive), tot = n()) plot(df$age, df$pos / df$tot, cex = 0.3 * sqrt(df$tot), pch = 16, xlab = "age", ylab = "seroprevalence", xlim = c(0, 82), ylim = c(0, 1) )
A seroprevalence survey testing for parvovirus B19 IgG antibody, performed on large representative national serum banks in Belgium, England and Wales, Finland, Italy, and Poland. The sera were collected between 1995 and 2004 and were obtained from residual sera submitted for routine laboratory testing.
parvob19_ew_1996parvob19_ew_1996
A data frame with 5 variables:
Age of individual
If the individual is seropositive or not
Year surveyed
Gender of individual
Parvo B19 antibody units per ml
MOSSONG, J., N. HENS, V. FRIEDERICHS, I. DAVIDKIN, M. BROMAN, B. LITWINSKA, J. SIENNICKA, et al. "Parvovirus B19 Infection in Five European Countries: Seroepidemiology, Force of Infection and Maternal Risk of Infection." Epidemiology and Infection 136, no. 8 (2008): 1059-68. doi:10.1017/S0950268807009661
# Note: This figure will look different to that of in the book, since we # believe that the original authors has made some errors in specifying # the sample size of the dots. library(dplyr) df <- parvob19_ew_1996 %>% mutate(age = round(age)) %>% group_by(age) %>% summarise(pos = sum(seropositive), tot = n()) plot(df$age, df$pos / df$tot, cex = 0.3 * sqrt(df$tot), pch = 16, xlab = "age", ylab = "seroprevalence", xlim = c(0, 82), ylim = c(0, 1) )# Note: This figure will look different to that of in the book, since we # believe that the original authors has made some errors in specifying # the sample size of the dots. library(dplyr) df <- parvob19_ew_1996 %>% mutate(age = round(age)) %>% group_by(age) %>% summarise(pos = sum(seropositive), tot = n()) plot(df$age, df$pos / df$tot, cex = 0.3 * sqrt(df$tot), pch = 16, xlab = "age", ylab = "seroprevalence", xlim = c(0, 82), ylim = c(0, 1) )
A seroprevalence survey testing for parvovirus B19 IgG antibody, performed on large representative national serum banks in Belgium, England and Wales, Finland, Italy, and Poland. The sera were collected between 1995 and 2004 and were obtained from residual sera submitted for routine laboratory testing.
parvob19_fi_1997_1998parvob19_fi_1997_1998
A data frame with 5 variables:
Age of individual
If the individual is seropositive or not
Year surveyed
Gender of individual
Parvo B19 antibody units per ml
MOSSONG, J., N. HENS, V. FRIEDERICHS, I. DAVIDKIN, M. BROMAN, B. LITWINSKA, J. SIENNICKA, et al. "Parvovirus B19 Infection in Five European Countries: Seroepidemiology, Force of Infection and Maternal Risk of Infection." Epidemiology and Infection 136, no. 8 (2008): 1059-68. doi:10.1017/S0950268807009661
# Note: This figure will look different to that of in the book, since we # believe that the original authors has made some errors in specifying # the sample size of the dots. library(dplyr) df <- parvob19_fi_1997_1998 %>% mutate(age = round(age)) %>% group_by(age) %>% summarise(pos = sum(seropositive), tot = n()) plot(df$age, df$pos / df$tot, cex = 0.4 * sqrt(df$tot), pch = 16, xlab = "age", ylab = "seroprevalence", xlim = c(0, 82), ylim = c(0, 1) )# Note: This figure will look different to that of in the book, since we # believe that the original authors has made some errors in specifying # the sample size of the dots. library(dplyr) df <- parvob19_fi_1997_1998 %>% mutate(age = round(age)) %>% group_by(age) %>% summarise(pos = sum(seropositive), tot = n()) plot(df$age, df$pos / df$tot, cex = 0.4 * sqrt(df$tot), pch = 16, xlab = "age", ylab = "seroprevalence", xlim = c(0, 82), ylim = c(0, 1) )
A seroprevalence survey testing for parvovirus B19 IgG antibody, performed on large representative national serum banks in Belgium, England and Wales, Finland, Italy, and Poland. The sera were collected between 1995 and 2004 and were obtained from residual sera submitted for routine laboratory testing.
parvob19_it_2003_2004parvob19_it_2003_2004
A data frame with 5 variables:
Age of individual
If the individual is seropositive or not
Year surveyed
Gender of individual
Parvo B19 antibody units per ml
MOSSONG, J., N. HENS, V. FRIEDERICHS, I. DAVIDKIN, M. BROMAN, B. LITWINSKA, J. SIENNICKA, et al. "Parvovirus B19 Infection in Five European Countries: Seroepidemiology, Force of Infection and Maternal Risk of Infection." Epidemiology and Infection 136, no. 8 (2008): 1059-68. doi:10.1017/S0950268807009661
library(dplyr) df <- parvob19_it_2003_2004 %>% group_by(age) %>% summarise(pos = sum(seropositive), tot = n()) plot(df$age, df$pos / df$tot, cex = 0.32 * sqrt(df$tot), pch = 16, xlab = "age", ylab = "seroprevalence", xlim = c(0, 82), ylim = c(0, 1) )library(dplyr) df <- parvob19_it_2003_2004 %>% group_by(age) %>% summarise(pos = sum(seropositive), tot = n()) plot(df$age, df$pos / df$tot, cex = 0.32 * sqrt(df$tot), pch = 16, xlab = "age", ylab = "seroprevalence", xlim = c(0, 82), ylim = c(0, 1) )
A seroprevalence survey testing for parvovirus B19 IgG antibody, performed on large representative national serum banks in Belgium, England and Wales, Finland, Italy, and Poland. The sera were collected between 1995 and 2004 and were obtained from residual sera submitted for routine laboratory testing.
parvob19_pl_1995_2004parvob19_pl_1995_2004
A data frame with 5 variables:
Age of individual
If the individual is seropositive or not
Year surveyed
Gender of individual
Parvo B19 antibody units per ml
MOSSONG, J., N. HENS, V. FRIEDERICHS, I. DAVIDKIN, M. BROMAN, B. LITWINSKA, J. SIENNICKA, et al. "Parvovirus B19 Infection in Five European Countries: Seroepidemiology, Force of Infection and Maternal Risk of Infection." Epidemiology and Infection 136, no. 8 (2008): 1059-68. doi:10.1017/S0950268807009661
# Note: This figure will look different to that of in the book, since we # believe that the original authors has made some errors in specifying # the sample size of the dots. library(dplyr) df <- parvob19_pl_1995_2004 %>% mutate(age = round(age)) %>% group_by(age) %>% summarise(pos = sum(seropositive), tot = n()) plot(df$age, df$pos / df$tot, cex = 0.32 * sqrt(df$tot), pch = 16, xlab = "age", ylab = "seroprevalence", xlim = c(0, 82), ylim = c(0, 1) )# Note: This figure will look different to that of in the book, since we # believe that the original authors has made some errors in specifying # the sample size of the dots. library(dplyr) df <- parvob19_pl_1995_2004 %>% mutate(age = round(age)) %>% group_by(age) %>% summarise(pos = sum(seropositive), tot = n()) plot(df$age, df$pos / df$tot, cex = 0.32 * sqrt(df$tot), pch = 16, xlab = "age", ylab = "seroprevalence", xlim = c(0, 82), ylim = c(0, 1) )
Monotonize seroprevalence
pava(pos = pos, tot = rep(1, length(pos)))pava(pos = pos, tot = rep(1, length(pos)))
pos |
the positive count vector. |
tot |
the total count vector. |
computed list of 2 items pai1 for original values and pai2 for monotonized value
Fit age-specific seroprevalence to a semi-parametric model where predictor is modeled with penalized splines. The penalized splines can be estimated by either (1) penalized likelihood framework or (2) mixed model framework
penalized_spline_model( data, age_col = "age", pos_col = "pos", tot_col = "tot", status_col = "status", s = "bs", link = "logit", framework = "pl", sp = NULL )penalized_spline_model( data, age_col = "age", pos_col = "pos", tot_col = "tot", status_col = "status", s = "bs", link = "logit", framework = "pl", sp = NULL )
data |
the input data frame, must either have columns for 'age', 'pos', 'tot' (for aggregated data) OR columns for 'age', 'status' (for linelisting data) |
age_col |
name of the 'age' column (default age_col="age"). |
pos_col |
name of the 'pos' column (default pos_col="pos"). |
tot_col |
name of the 'tot' column (default tot_col="tot"). |
status_col |
name of the 'status' column (default status_col="status"). |
s |
smoothing basis to use |
link |
link function to use |
framework |
which approach to fit the model ("pl" for penalized likelihood framework, "glmm" for generalized linear mixed model framework) |
sp |
smoothing parameter |
In the semi-parametric model, the predictor is formulated as a penalized spline
with truncated power basis functions of degree
and fixed knots as followed
- Where:
FOI can then be derived by
Where is determined by the link function used in the model
In matrix annotation, the mean structure model for becomes
Where , ,
and are the regression with corresponding design matrices
Under penalized likelihood framework, the model is fitted by maximizing the following likelihood
Where:
- is the predictor
- is a known semi-definite penalty matrix [@Wahba1978], [@Green1993]
- is the response vector
- the unit vector, is determined by the link function used
- is the smoothing parameter (larger values –> smoother curves)
- is the overdispersion parameter and equals 1 if there is no overdispersion
Under the mixed model framework,
the model instead treats the coefficients in the likelihood formulation
as random effects with
Refer to section 8.1 and 8.2 of the the book by Hens et al. (2012) for further details.
a list of class penalized_spline_model with 6 attributes
datatype |
type of datatype used for model fitting (aggregated or linelisting) |
df |
the dataframe used for fitting the model |
framework |
either pl or glmm |
info |
fitted "gam" model when framework is pl or "gamm" model when framework is glmm |
sp |
seroprevalence |
foi |
force of infection |
Hens, Niel, Ziv Shkedy, Marc Aerts, Christel Faes, Pierre Van Damme, and Philippe Beutels. 2012. Modeling Infectious Disease Parameters Based on Serological and Social Contact Data: A Modern Statistical Perspective. tatistics for Biology and Health. Springer New York. doi:10.1007/978-1-4614-4072-7.
[mgcv::gam()], [mgcv::gamm()] for more information the fitted gam and gamm model
data <- parvob19_be_2001_2003 data$status <- data$seropositive model <- penalized_spline_model(data, framework="glmm") model$info$gam plot(model)data <- parvob19_be_2001_2003 data$status <- data$seropositive model <- penalized_spline_model(data, framework="glmm") model$info$gam plot(model)
Plot output for corrected_prevalence
plot_corrected_prev(x, y = NULL, facet = FALSE)plot_corrected_prev(x, y = NULL, facet = FALSE)
x |
- the output of 'correct_prevalence()' function |
y |
- another output of 'correct_prevalence()' function (optional, for comparison only) |
facet |
- whether to plot as facets or on the same plot (only when y is provided) |
ggplot object
Refers to section 7.2.
plot_gcv(age, pos, tot, nn_seq, h_seq, kern = "tcub", deg = 2)plot_gcv(age, pos, tot, nn_seq, h_seq, kern = "tcub", deg = 2)
age |
the age vector. |
pos |
the pos vector. |
tot |
the tot vector.#' |
nn_seq |
Nearest neighbor sequence. |
h_seq |
Smoothing parameter sequence. |
kern |
Weight function, default = "tcub". Other choices are "rect", "trwt", "tria", "epan", "bisq" and "gauss". Choices may be restricted when derivatives are required; e.g. for confidence bands and some bandwidth selectors. |
deg |
Degree of polynomial to use. Default: 2. |
plot of gcv value
df <- mumps_uk_1986_1987 plot_gcv( df$age, df$pos, df$tot, nn_seq = seq(0.2, 0.8, by=0.1), h_seq = seq(5, 25, by=1) )df <- mumps_uk_1986_1987 plot_gcv( df$age, df$pos, df$tot, nn_seq = seq(0.2, 0.8, by=0.1), h_seq = seq(5, 25, by=1) )
Visualize standard curves for each plate
plot_standard_curve( x, facet = TRUE, xlab = "log10(concentration)", ylab = "Optical density", datapoint_size = 2 )plot_standard_curve( x, facet = TRUE, xlab = "log10(concentration)", ylab = "Optical density", datapoint_size = 2 )
x |
output of 'to_titer()' |
facet |
whether to faceted by plates or plot all standard curves on a single plot |
xlab |
label of the x axis |
ylab |
label of the y axis |
datapoint_size |
size of the data point (only applicable when 'facet=TRUE') |
Visualize for each sample, whether titer estimates can be computed at the tested dilutions.
plot_titer_qc(x, n_plates = 18, n_samples = 22, n_dilutions = 3)plot_titer_qc(x, n_plates = 18, n_samples = 22, n_dilutions = 3)
x |
output of 'to_titer()' |
n_plates |
maximum number of plates to plot |
n_samples |
maximum number of samples per plate to plot |
n_dilutions |
number of dilutions used for testing |
Each sample is represented by a 'n_estimates x n_dilutions' grid where cell color indicate estimate availability (green = estimate available, orange = result too low, red = result too high)
These sample grids are arranged in columns where each column represent samples from a plate
The figure below demonstrates the interpretation of the plot.
Plot output for age_time_model
## S3 method for class 'age_time_model' plot(x, ...)## S3 method for class 'age_time_model' plot(x, ...)
x |
- a 'age_time_model' object |
... |
arbitrary params. Supported options include:
|
ggplot object
plot() overloading for result of estimate_from_mixture
## S3 method for class 'estimate_from_mixture' plot(x, ...)## S3 method for class 'estimate_from_mixture' plot(x, ...)
x |
the mixture_model |
... |
arbitrary params. |
ggplot object
plot() overloading for Farrington model
## S3 method for class 'farrington_model' plot(x, ...)## S3 method for class 'farrington_model' plot(x, ...)
x |
the Farrington model object. |
... |
arbitrary params. |
ggplot object
plot() overloading for fractional polynomial model
## S3 method for class 'fp_model' plot(x, ...)## S3 method for class 'fp_model' plot(x, ...)
x |
the fractional polynomial model object. |
... |
arbitrary params. |
ggplot object
plot() overloading for hierarchical_bayesian_model
## S3 method for class 'hierarchical_bayesian_model' plot(x, ...)## S3 method for class 'hierarchical_bayesian_model' plot(x, ...)
x |
hierarchical_bayesian_model object created by serosv. |
... |
arbitrary params. |
ggplot object
plot() overloading for local polynomial model
## S3 method for class 'lp_model' plot(x, ...)## S3 method for class 'lp_model' plot(x, ...)
x |
the local polynomial model object. |
... |
arbitrary params. |
ggplot object
plot() overloading for mixture model
## S3 method for class 'mixture_model' plot(x, ...)## S3 method for class 'mixture_model' plot(x, ...)
x |
the mixture_model |
... |
arbitrary params. |
ggplot object
plot() overloading for penalized spline
## S3 method for class 'penalized_spline_model' plot(x, ...)## S3 method for class 'penalized_spline_model' plot(x, ...)
x |
the penalized_spline_model object |
... |
arbitrary params. |
ggplot object
plot() overloading for polynomial model
## S3 method for class 'polynomial_model' plot(x, ...)## S3 method for class 'polynomial_model' plot(x, ...)
x |
the polynomial model object |
... |
arbitrary params. |
ggplot object
plot() overloading for Weibull model
## S3 method for class 'weibull_model' plot(x, ...)## S3 method for class 'weibull_model' plot(x, ...)
x |
the Weibull model object. |
... |
arbitrary params. |
ggplot object
Fit age-stratified seroprevalence data to serocatalytic models formulated as polynomials.
polynomial_model( data, k, link = "log", age_col = "age", pos_col = "pos", tot_col = "tot", status_col = "status" )polynomial_model( data, k, link = "log", age_col = "age", pos_col = "pos", tot_col = "tot", status_col = "status" )
data |
the input data frame, must either have columns for 'age', 'pos', 'tot' (for aggregated data) OR 'age', 'status' (for linelisting data) |
k |
degree of the polynomial. (k=1 for Muench model, k=2 for Griffith model, k=3 for Grenfell model). |
link |
link function (default link="log"). |
age_col |
name of the 'age' column (default age_col="age"). |
pos_col |
name of the 'pos' column (default pos_col="pos"). |
tot_col |
name of the 'tot' column (default tot_col="tot"). |
status_col |
name of the 'status' column (default status_col="status"). |
The seroprevalence is assumed to follow the general format
Which implies the force of infection to be
Where:
- is the seroprevalence at age
- is the variable age
- is the degree of the polynomial
The seroprevalence is fitted using a GLM with log link with
the linear predictor
Muench (1934) model is equivalent to a degree 1 () linear predictor
Griffith model is equivalent to a degree 2 () linear predictor
Grenfell & Anderson (1985) suggested a higher order polynomials ()
Refer to section 6.1.1. of the the book by Hens et al. (2012) for further details.
a list of class polynomial_model with 5 items
datatype |
type of datatype used for model fitting (aggregated or linelisting) |
df |
the dataframe used for fitting the model |
info |
fitted "glm" object |
sp |
seroprevalence |
foi |
force of infection |
Hens, Niel, Ziv Shkedy, Marc Aerts, Christel Faes, Pierre Van Damme, and Philippe Beutels. 2012. Modeling Infectious Disease Parameters Based on Serological and Social Contact Data: A Modern Statistical Perspective. tatistics for Biology and Health. Springer New York. doi:10.1007/978-1-4614-4072-7.
Grenfell, B. T., and R. M. Anderson. 1985. “The Estimation of Age-Related Rates of Infection from Case Notifications and Serological Data.” The Journal of Hygiene 95 (2): 419–36. doi:10.1017/s0022172400062859.
Muench, Hugo. 1934. “Derivation of Rates from Summation Data by the Catalytic Curve.” Journal of the American Statistical Association 29 (185): 25–38. doi:10.1080/01621459.1934.10502684.
data <- parvob19_fi_1997_1998[order(parvob19_fi_1997_1998$age), ] aggregated <- transform_data(data, stratum_col = "age", status_col="seropositive") # fit with aggregated data model <- polynomial_model(aggregated, k = 1) # fit with linelisting data model <- polynomial_model(data, status_col = "seropositive", k = 1) plot(model)data <- parvob19_fi_1997_1998[order(parvob19_fi_1997_1998$age), ] aggregated <- transform_data(data, stratum_col = "age", status_col="seropositive") # fit with aggregated data model <- polynomial_model(aggregated, k = 1) # fit with linelisting data model <- polynomial_model(data, status_col = "seropositive", k = 1) plot(model)
Predict from the age_time_mdoel
## S3 method for class 'age_time_model' predict(object, newdata, modtype = "monotonized", ...)## S3 method for class 'age_time_model' predict(object, newdata, modtype = "monotonized", ...)
object |
serosv models |
newdata |
data.frame with age column to generate prediction |
modtype |
either "monotonized" (to predict using monotonized model) or "non-monotonized" |
... |
arbitrary argument |
confidence interval dataframe with n_group x 3 cols, the columns are 'group', 'sp_df', 'foi_df'
Prediction for serosv Farrington model
## S3 method for class 'farrington_model' predict(object, newdata = NULL, ...)## S3 method for class 'farrington_model' predict(object, newdata = NULL, ...)
object |
serosv models |
newdata |
data.frame with age column to generate prediction |
... |
arbitrary argument |
prediction output
Prediction for serosv fractional polynomial model
## S3 method for class 'fp_model' predict(object, newdata = NULL, ...)## S3 method for class 'fp_model' predict(object, newdata = NULL, ...)
object |
serosv models |
newdata |
data.frame with age column to generate prediction |
... |
arbitrary argument |
prediction output
[stats::predict.glm()] for more information on the predict function
Predict from an hierarchical bayesian model
## S3 method for class 'hierarchical_bayesian_model' predict(object, newdata = NULL, ...)## S3 method for class 'hierarchical_bayesian_model' predict(object, newdata = NULL, ...)
object |
serosv models |
newdata |
data.frame with age column to generate prediction |
... |
arbitrary arguments |
list of confidence interval for seroprevalence and foi. Each confidence interval dataframe with 4 variables, x and y for the fitted values and ymin and ymax for the confidence interval
Prediction for serosv local polynomial model
## S3 method for class 'lp_model' predict(object, newdata = NULL, ...)## S3 method for class 'lp_model' predict(object, newdata = NULL, ...)
object |
serosv models |
newdata |
data.frame with age column to generate prediction |
... |
arbitrary argument |
prediction output
Prediction for serosv penalized spline model
## S3 method for class 'penalized_spline_model' predict(object, newdata = NULL, ...)## S3 method for class 'penalized_spline_model' predict(object, newdata = NULL, ...)
object |
serosv models |
newdata |
data.frame with age column to generate prediction |
... |
arbitrary argument |
prediction output
[mgcv::predict.gam()] for more information on the predict function
A wrapper of predict.glm for direct prediction from polynomial_model object
## S3 method for class 'polynomial_model' predict(object, newdata = NULL, ...)## S3 method for class 'polynomial_model' predict(object, newdata = NULL, ...)
object |
serosv models |
newdata |
data.frame with age column to generate prediction |
... |
arbitrary argument |
prediction output
[stats::predict.glm()] for more information on the predict function
Prediction for serosv Weibull model
## S3 method for class 'weibull_model' predict(object, newdata = NULL, ...)## S3 method for class 'weibull_model' predict(object, newdata = NULL, ...)
object |
serosv models |
newdata |
data.frame with age column to generate prediction |
... |
arbitrary argument |
prediction output
[stats::predict.glm()] for more information on the predict function
Rubella - Mumps data from the UK (aggregated)
rubella_mumps_ukrubella_mumps_uk
A data frame with 5 variables:
Age group
Number of individuals negative to rubella and mumps
Number of individuals negative to rubella and positive to mumps
Number of individuals positive to rubella and negative to mumps
Number of individuals positive to rubella and mumps
Morgan-Capner P, Wright J, Miller C L, Miller E. Surveillance of antibody to measles, mumps, and rubella by age. British Medical Journal 1988; 297 :770 doi:10.1136/bmj.297.6651.770
Prevalence of rubella in the UK, obtained from a large survey of prevalence of antibodies to both mumps and rubella viruses.
rubella_uk_1986_1987rubella_uk_1986_1987
A data frame with 3 variables:
Midpoint of the age group (e.g. 1.5 = 1-2 years old, 2.5 = 2-3 years old)
Number of seropositive individuals
Total number of individuals surveyed
Morgan-Capner P, Wright J, Miller C L, Miller E. Surveillance of antibody to measles, mumps, and rubella by age. British Medical Journal 1988; 297 :770 doi:10.1136/bmj.297.6651.770
with(rubella_uk_1986_1987, plot(age, pos / tot, cex = 0.2 * sqrt(tot), pch = 16, xlab = "age", ylab = "seroprevalence", xlim = c(0, 45), ylim = c(0, 1) ) )with(rubella_uk_1986_1987, plot(age, pos / tot, cex = 0.2 * sqrt(tot), pch = 16, xlab = "age", ylab = "seroprevalence", xlim = c(0, 45), ylim = c(0, 1) ) )
Helper to adjust styling of a plot
set_plot_style( sero = "blueviolet", ci = "royalblue1", foi = "#fc0328", sero_line = "solid", foi_line = "dashed", xlabel = "Age" )set_plot_style( sero = "blueviolet", ci = "royalblue1", foi = "#fc0328", sero_line = "solid", foi_line = "dashed", xlabel = "Age" )
sero |
color for seroprevalence line |
ci |
color for confidence interval |
foi |
color for force of infection line |
sero_line |
linetype for seroprevalence line |
foi_line |
linetype for force of infection line |
xlabel |
x label |
list of updated aesthetic values
Validate and prepare raw serological test results for use with 'to_titer()'
standardize_data( df, plate_id_col = "PLATE_ID", id_col = "SAMPLE_ID", result_col = "RESULT", dilution_fct_col = "DILUTION_FACTORS", antitoxin_label = "Anti_toxin", negative_col = "^NEGATIVE_*" )standardize_data( df, plate_id_col = "PLATE_ID", id_col = "SAMPLE_ID", result_col = "RESULT", dilution_fct_col = "DILUTION_FACTORS", antitoxin_label = "Anti_toxin", negative_col = "^NEGATIVE_*" )
df |
data.frame with columns for plate id, sample id, result, dilution factor, and (optionally) negative controls |
plate_id_col |
name of the column storing plates id |
id_col |
name of the column storing sample id |
result_col |
name of the column storing result |
dilution_fct_col |
name of the column storing dilution factors |
antitoxin_label |
how antitoxin is label in the sample id column |
negative_col |
regex for columns for negative controls, assumed to be a label followed by the dilution factor (e.g. NEGATIVE_50, NEGATIVE_100) |
a standardized data.frame that can be passed to 'to_titer()'
A study of tuberculosis conducted in the Netherlands. Schoolchildren, aged between 6 and 18 years, were tested using the tuberculin skin test.
tb_nl_1966_1973tb_nl_1966_1973
A data frame with 5 variables:
Age group
Number of seropositive individuals
Total number of individuals surveyed
Gender of cohort (unsure what 0 and 1 means)
Birth year of cohort
Nagelkerke, N., Heisterkamp, S., Borgdorff, M., Broekmans, J. and Van Houwelingen, H. (1999), Semi-parametric estimation of age-time specific infection incidence from serial prevalence data. Statist. Med., 18: 307-320. doi:10.1002/(SICI)1097-0258(19990215)18:3<307::AID-SIM15>3.0.CO;2-Z
with(tb_nl_1966_1973, plot(age, pos / tot, pch = 16, cex = 0.01 * sqrt(tot), xlab = "age", ylab = "prevalence", xlim = c(6, 18) ) ) with(tb_nl_1966_1973, plot(birthyr, pos / tot, pch = 16, cex = 0.01 * sqrt(tot), xlab = "year", ylab = "prevalence" ) )with(tb_nl_1966_1973, plot(age, pos / tot, pch = 16, cex = 0.01 * sqrt(tot), xlab = "age", ylab = "prevalence", xlim = c(6, 18) ) ) with(tb_nl_1966_1973, plot(birthyr, pos / tot, pch = 16, cex = 0.01 * sqrt(tot), xlab = "year", ylab = "prevalence" ) )
to_titer() converts raw assay readings (e.g., OD, fluorescence intensity) to titer by fitting a calibrating model
to_titer( df, model = "4PL", positive_threshold = NULL, ci = 0.95, negative_control = TRUE )to_titer( df, model = "4PL", positive_threshold = NULL, ci = 0.95, negative_control = TRUE )
df |
a standardized data.frame returned by'standardize_data()' |
model |
either:
|
positive_threshold |
if not NULL, processed_data will have the serostatus labeled |
ci |
confidence interval for the titer estimates (default is .95 i.e., 95% CI) |
negative_control |
if TRUE, output tibble will include the result for negative controls |
a data.frame with 8 columns
plate_id |
id of the plate |
data |
list of 'data.frame's containing the raw sample results from each plate |
antitoxin_df |
list of 'data.frame's containing the raw results for antitoxins from each plate |
standard_curve_func |
list of functions mapping from assay reading to titer for each plate |
std_crv_midpoint |
midpoint of the standard curve, for qualitative analysis |
processed_data |
list of 'tibble's containing samples with titer estimates (lower, median, upper) |
negative_control |
list of 'tibble's containing negative control check results (if 'negative_control=TRUE') |
Generate a dataframe with 't', 'pos' and 'tot' columns from 't' and 'seropositive' vectors.
transform_data(data, stratum_col = "age", status_col = "status")transform_data(data, stratum_col = "age", status_col = "status")
data |
a data frame with columns for age and serostatus |
stratum_col |
name of the column to stratify by (default to "age") |
status_col |
name of the column for serostatus |
dataframe in aggregated format
df <- hcv_be_2006 hcv_df <- transform_data(df, stratum_col="dur", status_col="seropositive") hcv_dfdf <- hcv_be_2006 hcv_df <- transform_data(df, stratum_col="dur", status_col="seropositive") hcv_df
Age-specific seroprevalence of VZV antibodies, assessed in Flanders (Belgium) between October 1999 and April 2000. This population was stratified by age in order to obtain approximately 100 observations per age group.
vzv_be_1999_2000vzv_be_1999_2000
A data frame with 3 variables:
Age group
Number of seropositive individuals
Total number of individuals surveyed
Thiry, N., Beutels, P., Shkedy, Z. et al. The seroepidemiology of primary varicella-zoster virus infection in Flanders (Belgium). Eur J Pediatr 161, 588-593 (2002). doi:10.1007/s00431-002-1053-2
with(vzv_be_1999_2000, plot(age, pos / tot, cex = 0.1 * sqrt(tot), pch = 19, xlab = "age", ylab = "seroprevalence", xlim = c(0, 45), ylim = c(0, 1) ) )with(vzv_be_1999_2000, plot(age, pos / tot, cex = 0.1 * sqrt(tot), pch = 19, xlab = "age", ylab = "seroprevalence", xlim = c(0, 45), ylim = c(0, 1) ) )
The survey is the same as the one used to study the seroprevalence of parvovirus B19 in Belgium, as described above.
vzv_be_2001_2003vzv_be_2001_2003
A data frame with 4 variables:
Age of individual
If the individual is seropositive or not
Gender of individual
VZV milli international units per ml
MOSSONG, J., N. HENS, V. FRIEDERICHS, I. DAVIDKIN, M. BROMAN, B. LITWINSKA, J. SIENNICKA, et al. "Parvovirus B19 Infection in Five European Countries: Seroepidemiology, Force of Infection and Maternal Risk of Infection." Epidemiology and Infection 136, no. 8 (2008): 1059-68. doi:10.1017/S0950268807009661
library(dplyr) df <- vzv_be_2001_2003 %>% mutate(age = round(age)) %>% group_by(age) %>% summarise(pos = sum(seropositive), tot = n()) plot(df$age, df$pos / df$tot, cex = 0.1 * sqrt(df$tot), pch = 19, xlab = "age", ylab = "seroprevalence", xlim = c(0, 45), ylim = c(0, 1) )library(dplyr) df <- vzv_be_2001_2003 %>% mutate(age = round(age)) %>% group_by(age) %>% summarise(pos = sum(seropositive), tot = n()) plot(df$age, df$pos / df$tot, cex = 0.1 * sqrt(df$tot), pch = 19, xlab = "age", ylab = "seroprevalence", xlim = c(0, 45), ylim = c(0, 1) )
VZV and Parvovirus B19 serological data in Belgium (line listing)
vzv_parvo_bevzv_parvo_be
A data frame with 7 variables:
ID of individual
Age of individual
Gender of individual
Parvo B19 antibody units per ml
If an individual is positive for parvovirus B19
VZV milli international units per ml
If an individual is positive for VZV
MOSSONG, J., N. HENS, V. FRIEDERICHS, I. DAVIDKIN, M. BROMAN, B. LITWINSKA, J. SIENNICKA, et al. "Parvovirus B19 Infection in Five European Countries: Seroepidemiology, Force of Infection and Maternal Risk of Infection." Epidemiology and Infection 136, no. 8 (2008): 1059-68. doi:10.1017/S0950268807009661
Model seroprevalence as a function of duration since vaccination using the Weibull model, where the force of infection is assumed to vary monotonically with duration.
weibull_model( data, t_lab = "t", pos_col = "pos", tot_col = "tot", status_col = "status" )weibull_model( data, t_lab = "t", pos_col = "pos", tot_col = "tot", status_col = "status" )
data |
the input data frame, must either have columns for 't', 'pos', 'tot' (for aggregated data) OR 't', 'status' (for linelisting data) |
t_lab |
name of the 't' column (default t_lab="t"). |
pos_col |
name of the 'pos' column (default pos_col="pos"). |
tot_col |
name of the 'tot' column (default tot_col="tot"). |
status_col |
name of the 'status' column (default status_col="status"). |
For a Weibull model, the prevalence is given by
Where is exposure time (difference between age of vaccination and age at test)
Which implies the force of infection to be the monotonic function
Refer to section 6.1.2. of the the book by Hens et al. (2012) for further details.
list of class weibull_model with the following items
datatype |
type of datatype used for model fitting (aggregated or linelisting) |
df |
the dataframe used for fitting the model |
info |
fitted "glm" object |
sp |
seroprevalence |
foi |
force of infection |
Hens, Niel, Ziv Shkedy, Marc Aerts, Christel Faes, Pierre Van Damme, and Philippe Beutels. 2012. Modeling Infectious Disease Parameters Based on Serological and Social Contact Data: A Modern Statistical Perspective. tatistics for Biology and Health. Springer New York. doi:10.1007/978-1-4614-4072-7.
[stats::glm()] for more information on the fitted "glm" object
df <- hcv_be_2006[order(hcv_be_2006$dur), ] df$t <- df$dur df$status <- df$seropositive model <- weibull_model(df, t_lab="dur", status_col="seropositive") plot(model)df <- hcv_be_2006[order(hcv_be_2006$dur), ] df$t <- df$dur df$status <- df$seropositive model <- weibull_model(df, t_lab="dur", status_col="seropositive") plot(model)