Function correct_prevalence()
is used for estimating the
true prevalence if the serological test used is imperfect
Arguments:
age
the age vector
pos
the positive count vector (optional if status is
provided).
tot
the total count vector (optional if status is
provided).
status
the serostatus vector (optional if pos &
tot are provided).
init_se
sensitivity of the serological test (default
value 0.95
)
init_sp
specificity of the serological test (default
value 0.8
)
study_size_se
sample size for sensitivity validation
study (default value 1000
)
study_size_sp
sample size for specificity validation
study (default value 1000
)
chains
number of Markov chains (default to
1
)
warmup
number of warm up runs (default value
1000
)
iter
number of iterations (default value
2000
)
The function will return a list of 2 items:
info
contains estimated values for se, sp and
corrected seroprevalence
corrected_sero
return a data.frame with
age
, sero
(corrected sero) and
pos
, tot
(adjusted based on corrected
prevalence)
# estimate real prevalence
data <- rubella_uk_1986_1987
output <- correct_prevalence(data$age, pos = data$pos, tot = data$tot, warmup = 1000, iter = 4000, init_se=0.85, init_sp = 0.8, study_size_se=1000, study_size_sp=3000)
#>
#> SAMPLING FOR MODEL 'prevalence_correction' NOW (CHAIN 1).
#> Chain 1:
#> Chain 1: Gradient evaluation took 2.3e-05 seconds
#> Chain 1: 1000 transitions using 10 leapfrog steps per transition would take 0.23 seconds.
#> Chain 1: Adjust your expectations accordingly!
#> Chain 1:
#> Chain 1:
#> Chain 1: Iteration: 1 / 4000 [ 0%] (Warmup)
#> Chain 1: Iteration: 400 / 4000 [ 10%] (Warmup)
#> Chain 1: Iteration: 800 / 4000 [ 20%] (Warmup)
#> Chain 1: Iteration: 1001 / 4000 [ 25%] (Sampling)
#> Chain 1: Iteration: 1400 / 4000 [ 35%] (Sampling)
#> Chain 1: Iteration: 1800 / 4000 [ 45%] (Sampling)
#> Chain 1: Iteration: 2200 / 4000 [ 55%] (Sampling)
#> Chain 1: Iteration: 2600 / 4000 [ 65%] (Sampling)
#> Chain 1: Iteration: 3000 / 4000 [ 75%] (Sampling)
#> Chain 1: Iteration: 3400 / 4000 [ 85%] (Sampling)
#> Chain 1: Iteration: 3800 / 4000 [ 95%] (Sampling)
#> Chain 1: Iteration: 4000 / 4000 [100%] (Sampling)
#> Chain 1:
#> Chain 1: Elapsed Time: 0.454 seconds (Warm-up)
#> Chain 1: 0.858 seconds (Sampling)
#> Chain 1: 1.312 seconds (Total)
#> Chain 1:
# check fitted value
output$info[1:2, ]
#> mean se_mean sd 2.5% 25% 50%
#> est_se 0.9027715 1.079489e-04 0.006406352 0.8896699 0.8985944 0.9028616
#> est_sp 0.8026270 8.917098e-05 0.006748365 0.7895613 0.7980551 0.8026941
#> 75% 97.5% n_eff Rhat
#> est_se 0.9072652 0.9151745 3521.965 0.9997603
#> est_sp 0.8072461 0.8155073 5727.302 1.0002271
# compare original prevalence and corrected prevalence
ggplot()+
geom_point(aes(x = data$age, y = data$pos/data$tot, color="apparent prevalence")) +
geom_point(aes(x = output$corrected_se$age, y = output$corrected_se$sero, color="esimated prevalence" )) +
scale_color_manual(
values = c("apparent prevalence" = "royalblue1", "esimated prevalence" = "blueviolet")
)+
labs(x = "Age", y = "Prevalence")
Data after seroprevalence correction
suppressWarnings(
corrected_data <- farrington_model(
age = output$corrected_se$age, pos = output$corrected_se$pos, tot = output$corrected_se$tot,
start=list(alpha=0.07,beta=0.1,gamma=0.03))
)
plot(corrected_data)
#> Warning: No shared levels found between `names(values)` of the manual scale and the
#> data's fill values.
Original data