Hierarchical Bayesian models

library(serosv)

Parametric Bayesian framework

Currently, serosv only has models under parametric Bayesian framework

Proposed approach

Prevalence has a parametric form π(ai, α) where α is a parameter vector

One can constraint the parameter space of the prior distribution P(α) in order to achieve the desired monotonicity of the posterior distribution P(π1, π2, ..., πm|y, n)

Where:

  • n = (n1, n2, ..., nm) and ni is the sample size at age ai
  • y = (y1, y2, ..., ym) and yi is the number of infected individual from the ni sampled subjects

Farrington

Refer to Chapter 10.3.1

Proposed model

The model for prevalence is as followed

$$ \pi (a) = 1 - exp\{ \frac{\alpha_1}{\alpha_2}ae^{-\alpha_2 a} + \frac{1}{\alpha_2}(\frac{\alpha_1}{\alpha_2} - \alpha_3)(e^{-\alpha_2 a} - 1) -\alpha_3 a \} $$

For likelihood model, independent binomial distribution are assumed for the number of infected individuals at age ai

yi ∼ Bin(ni, πi),  for i = 1, 2, 3, ...m

The constraint on the parameter space can be incorporated by assuming truncated normal distribution for the components of α, α = (α1, α2, α3) in πi = π(ai, α)

αj ∼ truncated 𝒩(μj, τj),  j = 1, 2, 3

The joint posterior distribution for α can be derived by combining the likelihood and prior as followed

$$ P(\alpha|y) \propto \prod^m_{i=1} \text{Bin}(y_i|n_i, \pi(a_i, \alpha)) \prod^3_{i=1}-\frac{1}{\tau_j}\text{exp}(\frac{1}{2\tau^2_j} (\alpha_j - \mu_j)^2) $$

  • Where the flat hyperprior distribution is defined as followed:

    • μj ∼ 𝒩(0, 10000)

    • τj−2 ∼ Γ(100, 100)

The full conditional distribution of αi is thus $$ P(\alpha_i|\alpha_j,\alpha_k, k, j \neq i) \propto -\frac{1}{\tau_i}\text{exp}(\frac{1}{2\tau^2_i} (\alpha_i - \mu_i)^2) \prod^m_{i=1} \text{Bin}(y_i|n_i, \pi(a_i, \alpha)) $$

Fitting data

To fit Farrington model, use hierarchical_bayesian_model() and define type = "far2" or type = "far3" where

  • type = "far2" refers to Farrington model with 2 parameters (α3 = 0)

  • type = "far3" refers to Farrington model with 3 parameters (α3 > 0)

df <- mumps_uk_1986_1987
model <- hierarchical_bayesian_model(age = df$age, pos = df$pos, tot = df$tot, type="far3")
#> 
#> SAMPLING FOR MODEL 'fra_3' NOW (CHAIN 1).
#> Chain 1: Rejecting initial value:
#> Chain 1:   Log probability evaluates to log(0), i.e. negative infinity.
#> Chain 1:   Stan can't start sampling from this initial value.
#> Chain 1: Rejecting initial value:
#> Chain 1:   Log probability evaluates to log(0), i.e. negative infinity.
#> Chain 1:   Stan can't start sampling from this initial value.
#> Chain 1: Rejecting initial value:
#> Chain 1:   Log probability evaluates to log(0), i.e. negative infinity.
#> Chain 1:   Stan can't start sampling from this initial value.
#> Chain 1: Rejecting initial value:
#> Chain 1:   Log probability evaluates to log(0), i.e. negative infinity.
#> Chain 1:   Stan can't start sampling from this initial value.
#> Chain 1: 
#> Chain 1: Gradient evaluation took 4.9e-05 seconds
#> Chain 1: 1000 transitions using 10 leapfrog steps per transition would take 0.49 seconds.
#> Chain 1: Adjust your expectations accordingly!
#> Chain 1: 
#> Chain 1: 
#> Chain 1: Iteration:    1 / 5000 [  0%]  (Warmup)
#> Chain 1: Iteration:  500 / 5000 [ 10%]  (Warmup)
#> Chain 1: Iteration: 1000 / 5000 [ 20%]  (Warmup)
#> Chain 1: Iteration: 1500 / 5000 [ 30%]  (Warmup)
#> Chain 1: Iteration: 1501 / 5000 [ 30%]  (Sampling)
#> Chain 1: Iteration: 2000 / 5000 [ 40%]  (Sampling)
#> Chain 1: Iteration: 2500 / 5000 [ 50%]  (Sampling)
#> Chain 1: Iteration: 3000 / 5000 [ 60%]  (Sampling)
#> Chain 1: Iteration: 3500 / 5000 [ 70%]  (Sampling)
#> Chain 1: Iteration: 4000 / 5000 [ 80%]  (Sampling)
#> Chain 1: Iteration: 4500 / 5000 [ 90%]  (Sampling)
#> Chain 1: Iteration: 5000 / 5000 [100%]  (Sampling)
#> Chain 1: 
#> Chain 1:  Elapsed Time: 16.491 seconds (Warm-up)
#> Chain 1:                72.793 seconds (Sampling)
#> Chain 1:                89.284 seconds (Total)
#> Chain 1:
#> Warning: There were 211 divergent transitions after warmup. See
#> https://mc-stan.org/misc/warnings.html#divergent-transitions-after-warmup
#> to find out why this is a problem and how to eliminate them.
#> Warning: There were 1191 transitions after warmup that exceeded the maximum treedepth. Increase max_treedepth above 10. See
#> https://mc-stan.org/misc/warnings.html#maximum-treedepth-exceeded
#> Warning: Examine the pairs() plot to diagnose sampling problems

model$info
#>                       mean      se_mean           sd           2.5%
#> alpha1        4.449809e+74 2.878431e+74 1.643811e+76   1.547477e+01
#> alpha2        6.444939e+75 5.570341e+75 2.741063e+77   5.660714e+03
#> alpha3        1.720783e-01 9.236575e-05 2.968328e-03   1.663489e-01
#> tau_alpha1    2.048404e-02 6.584981e-03 1.781424e-01   2.195344e-92
#> tau_alpha2    2.833880e-06 2.669611e-06 9.236579e-05  3.313683e-140
#> tau_alpha3    7.049813e+00 7.660644e-01 2.432968e+01   1.508735e-05
#> mu_alpha1     8.869548e+00 3.409819e+00 9.718034e+01  -1.868570e+02
#> mu_alpha2     1.679071e+00 3.384614e+00 1.021654e+02  -2.074010e+02
#> mu_alpha3    -8.929460e-01 1.910311e+00 2.646694e+01  -6.554132e+01
#> sigma_alpha1  6.490047e+74 4.853921e+74 2.863878e+76   2.921112e+00
#> sigma_alpha2  7.674661e+75 6.502497e+75 3.314432e+77   9.090962e+03
#> sigma_alpha3  5.261952e+01 1.425853e+01 5.172260e+02   1.192143e-01
#> lp__         -2.657165e+03 2.018145e-01 3.201023e+00  -2.663914e+03
#>                        25%           50%           75%         97.5%     n_eff
#> alpha1        7.185393e+03  1.109143e+09  1.324032e+17  7.795475e+45 3261.3100
#> alpha2        6.044549e+11  1.812547e+21  7.236422e+35  3.426315e+69 2421.4446
#> alpha3        1.700336e-01  1.719750e-01  1.741919e-01  1.777599e-01 1032.7655
#> tau_alpha1    2.472892e-35  1.152570e-19  4.656607e-09  1.172000e-01  731.8561
#> tau_alpha2    1.614807e-73  5.456231e-44  4.899085e-25  1.210013e-08 1197.0890
#> tau_alpha3    2.379951e-03  9.443712e-02  2.618162e+00  7.036289e+01 1008.6542
#> mu_alpha1    -5.455842e+01  1.303527e+01  7.333611e+01  1.871266e+02  812.2582
#> mu_alpha2    -6.574291e+01  8.199330e-01  6.927783e+01  2.085530e+02  911.1489
#> mu_alpha3    -1.669513e+00  1.697070e-01  1.798237e+00  4.968048e+01  191.9549
#> sigma_alpha1  1.465545e+04  2.946399e+09  2.010933e+17  6.890293e+45 3481.1585
#> sigma_alpha2  1.428726e+12  4.289575e+21  2.488950e+36  5.493787e+69 2598.1116
#> sigma_alpha3  6.180236e-01  3.254083e+00  2.049822e+01  2.574732e+02 1315.8638
#> lp__         -2.659285e+03 -2.656942e+03 -2.654870e+03 -2.651406e+03  251.5782
#>                   Rhat
#> alpha1       0.9997812
#> alpha2       1.0000255
#> alpha3       0.9997180
#> tau_alpha1   1.0005484
#> tau_alpha2   1.0005600
#> tau_alpha3   1.0049987
#> mu_alpha1    1.0016656
#> mu_alpha2    1.0007662
#> mu_alpha3    1.0010564
#> sigma_alpha1 0.9998103
#> sigma_alpha2 0.9999960
#> sigma_alpha3 1.0001212
#> lp__         1.0059378
plot(model)
#> Warning: No shared levels found between `names(values)` of the manual scale and the
#> data's fill values.

Log-logistic

Proposed approach

The model for seroprevalence is as followed

$$ \pi(a) = \frac{\beta a^\alpha}{1 + \beta a^\alpha}, \text{ } \alpha, \beta > 0 $$

The likelihood is specified to be the same as Farrington model (yi ∼ Bin(ni, πi)) with

logit(π(a)) = α2 + α1log (a)

  • Where α2 = log(β)

The prior model of α1 is specified as α1 ∼ truncated 𝒩(μ1, τ1) with flat hyperprior as in Farrington model

β is constrained to be positive by specifying α2 ∼ 𝒩(μ2, τ2)

The full conditional distribution of α1 is thus

$$ P(\alpha_1|\alpha_2) \propto -\frac{1}{\tau_1} \text{exp} (\frac{1}{2 \tau_1^2} (\alpha_1 - \mu_1)^2) \prod_{i=1}^m \text{Bin}(y_i|n_i,\pi(a_i, \alpha_1, \alpha_2) ) $$

And α2 can be derived in the same way

Fitting data

To fit Log-logistic model, use hierarchical_bayesian_model() and define type = "log_logistic"

df <- rubella_uk_1986_1987
model <- hierarchical_bayesian_model(age = df$age, pos = df$pos, tot = df$tot, type="log_logistic")
#> 
#> SAMPLING FOR MODEL 'log_logistic' NOW (CHAIN 1).
#> Chain 1: 
#> Chain 1: Gradient evaluation took 1.1e-05 seconds
#> Chain 1: 1000 transitions using 10 leapfrog steps per transition would take 0.11 seconds.
#> Chain 1: Adjust your expectations accordingly!
#> Chain 1: 
#> Chain 1: 
#> Chain 1: Iteration:    1 / 5000 [  0%]  (Warmup)
#> Chain 1: Iteration:  500 / 5000 [ 10%]  (Warmup)
#> Chain 1: Iteration: 1000 / 5000 [ 20%]  (Warmup)
#> Chain 1: Iteration: 1500 / 5000 [ 30%]  (Warmup)
#> Chain 1: Iteration: 1501 / 5000 [ 30%]  (Sampling)
#> Chain 1: Iteration: 2000 / 5000 [ 40%]  (Sampling)
#> Chain 1: Iteration: 2500 / 5000 [ 50%]  (Sampling)
#> Chain 1: Iteration: 3000 / 5000 [ 60%]  (Sampling)
#> Chain 1: Iteration: 3500 / 5000 [ 70%]  (Sampling)
#> Chain 1: Iteration: 4000 / 5000 [ 80%]  (Sampling)
#> Chain 1: Iteration: 4500 / 5000 [ 90%]  (Sampling)
#> Chain 1: Iteration: 5000 / 5000 [100%]  (Sampling)
#> Chain 1: 
#> Chain 1:  Elapsed Time: 0.607 seconds (Warm-up)
#> Chain 1:                0.567 seconds (Sampling)
#> Chain 1:                1.174 seconds (Total)
#> Chain 1:
#> Warning: There were 549 divergent transitions after warmup. See
#> https://mc-stan.org/misc/warnings.html#divergent-transitions-after-warmup
#> to find out why this is a problem and how to eliminate them.
#> Warning: Examine the pairs() plot to diagnose sampling problems
#> Warning: Bulk Effective Samples Size (ESS) is too low, indicating posterior means and medians may be unreliable.
#> Running the chains for more iterations may help. See
#> https://mc-stan.org/misc/warnings.html#bulk-ess
#> Warning: Tail Effective Samples Size (ESS) is too low, indicating posterior variances and tail quantiles may be unreliable.
#> Running the chains for more iterations may help. See
#> https://mc-stan.org/misc/warnings.html#tail-ess

model$type
#> [1] "log_logistic"
plot(model)
#> Warning: No shared levels found between `names(values)` of the manual scale and the
#> data's fill values.