Imperfect serological test

library(serosv)
library(ggplot2)

Imperfect test

Function correct_prevalence() is used for estimating the true prevalence if the serological test used is imperfect

Arguments:

  • age the age vector

  • pos the positive count vector (optional if status is provided).

  • tot the total count vector (optional if status is provided).

  • status the serostatus vector (optional if pos & tot are provided).

  • init_se sensitivity of the serological test (default value 0.95)

  • init_sp specificity of the serological test (default value 0.8)

  • study_size_se sample size for sensitivity validation study (default value 1000)

  • study_size_sp sample size for specificity validation study (default value 1000)

  • chains number of Markov chains (default to 1)

  • warmup number of warm up runs (default value 1000)

  • iter number of iterations (default value 2000)

The function will return a list of 2 items:

  • info contains estimated values for se, sp and corrected seroprevalence

  • corrected_sero return a data.frame with age, sero (corrected sero) and pos, tot (adjusted based on corrected prevalence)

# estimate real prevalence
data <- rubella_uk_1986_1987
output <- correct_prevalence(data$age, pos = data$pos, tot = data$tot, warmup = 1000, iter = 4000, init_se=0.85, init_sp = 0.8, study_size_se=1000, study_size_sp=3000)
#> 
#> SAMPLING FOR MODEL 'prevalence_correction' NOW (CHAIN 1).
#> Chain 1: 
#> Chain 1: Gradient evaluation took 2.3e-05 seconds
#> Chain 1: 1000 transitions using 10 leapfrog steps per transition would take 0.23 seconds.
#> Chain 1: Adjust your expectations accordingly!
#> Chain 1: 
#> Chain 1: 
#> Chain 1: Iteration:    1 / 4000 [  0%]  (Warmup)
#> Chain 1: Iteration:  400 / 4000 [ 10%]  (Warmup)
#> Chain 1: Iteration:  800 / 4000 [ 20%]  (Warmup)
#> Chain 1: Iteration: 1001 / 4000 [ 25%]  (Sampling)
#> Chain 1: Iteration: 1400 / 4000 [ 35%]  (Sampling)
#> Chain 1: Iteration: 1800 / 4000 [ 45%]  (Sampling)
#> Chain 1: Iteration: 2200 / 4000 [ 55%]  (Sampling)
#> Chain 1: Iteration: 2600 / 4000 [ 65%]  (Sampling)
#> Chain 1: Iteration: 3000 / 4000 [ 75%]  (Sampling)
#> Chain 1: Iteration: 3400 / 4000 [ 85%]  (Sampling)
#> Chain 1: Iteration: 3800 / 4000 [ 95%]  (Sampling)
#> Chain 1: Iteration: 4000 / 4000 [100%]  (Sampling)
#> Chain 1: 
#> Chain 1:  Elapsed Time: 0.454 seconds (Warm-up)
#> Chain 1:                0.858 seconds (Sampling)
#> Chain 1:                1.312 seconds (Total)
#> Chain 1:

# check fitted value 
output$info[1:2, ]
#>             mean      se_mean          sd      2.5%       25%       50%
#> est_se 0.9027715 1.079489e-04 0.006406352 0.8896699 0.8985944 0.9028616
#> est_sp 0.8026270 8.917098e-05 0.006748365 0.7895613 0.7980551 0.8026941
#>              75%     97.5%    n_eff      Rhat
#> est_se 0.9072652 0.9151745 3521.965 0.9997603
#> est_sp 0.8072461 0.8155073 5727.302 1.0002271

# compare original prevalence and corrected prevalence
ggplot()+
  geom_point(aes(x = data$age, y = data$pos/data$tot, color="apparent prevalence")) + 
  geom_point(aes(x = output$corrected_se$age, y = output$corrected_se$sero, color="esimated prevalence" )) +
  scale_color_manual(
    values = c("apparent prevalence" = "royalblue1", "esimated prevalence" = "blueviolet")
  )+ 
  labs(x = "Age", y = "Prevalence")

Fitting corrected data

Data after seroprevalence correction

suppressWarnings(
  corrected_data <- farrington_model(
  age = output$corrected_se$age, pos = output$corrected_se$pos, tot = output$corrected_se$tot,
  start=list(alpha=0.07,beta=0.1,gamma=0.03))
)

plot(corrected_data)
#> Warning: No shared levels found between `names(values)` of the manual scale and the
#> data's fill values.

Original data

suppressWarnings(
  original_data <- farrington_model(
  age = data$age, pos = data$pos, tot = data$tot,
  start=list(alpha=0.07,beta=0.1,gamma=0.03))
)
plot(original_data)
#> Warning: No shared levels found between `names(values)` of the manual scale and the
#> data's fill values.