Semiparametric model

library(serosv)

Penalized splines

Proposed model

Penalized splines

A general model relating the prevalence to age can be written as a GLM

g(P(Y_i = 1|a_i)) = g(π(a_i)) = η(a_i)

Where g is the link function and η is the linear predictor

The linear predictor can be estimated semiparametrically using a penalized spline with truncated power basis functions of degree p and fixed knots κ₁, ..., κ_k, as follows

$$ \eta(a_i) = \beta_0 + \beta_1 a_i + \dots + \beta_p a_i^p + \sum_{j=1}^{k} u_j (a_i - \kappa_j)_+^p $$

Where

$$ (a_i - \kappa_k)^p_+ = \begin{cases} 0, & a_i \le \kappa_k \\ (a_i - \kappa_k)^p, & a_i > \kappa_k \end{cases} $$

In matrix notation, the mean structure model for η(a_i) becomes

η = Xβ + Zu

Where η = [η(a_1) ... η(a_N)]^T is the vector of linear predictors, and β = [β₀ β₁ ... β_p]^T and u = [u₁ u₂ ... u_k]^T are the regression coefficients, with corresponding design matrices

$$ X = \begin{bmatrix} 1 & a_1 & a_1^2 & \dots & a_1^p \\ 1 & a_2 & a_2^2 & \dots & a_2^p \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ 1 & a_N & a_N^2 & \dots & a_N^p \end{bmatrix}, \quad Z = \begin{bmatrix} (a_1 - \kappa_1)_+^p & (a_1 - \kappa_2)_+^p & \dots & (a_1 - \kappa_k)_+^p \\ (a_2 - \kappa_1)_+^p & (a_2 - \kappa_2)_+^p & \dots & (a_2 - \kappa_k)_+^p \\ \vdots & \vdots & \ddots & \vdots \\ (a_N - \kappa_1)_+^p & (a_N - \kappa_2)_+^p & \dots & (a_N - \kappa_k)_+^p \end{bmatrix} $$
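As a rough illustration of the two design matrices, the base R sketch below builds X and Z for a handful of ages. The ages, knots, degree and the helper name `trunc_power` are hypothetical values chosen for the example, not anything provided by serosv.

```r
# Truncated power basis of degree p (assumes p >= 1): (a - kappa)_+^p
trunc_power <- function(a, kappa, p) {
  pmax(a - kappa, 0)^p
}

# Hypothetical inputs, for illustration only
age   <- c(1, 5, 10, 20, 40)
knots <- c(4, 15, 30)
p     <- 2

# X has columns 1, a, a^2, ..., a^p, so it is N x (p + 1)
X <- outer(age, 0:p, `^`)

# Z has one column per knot, (a - kappa)_+^p, so it is N x (number of knots)
Z <- sapply(knots, function(kappa) trunc_power(age, kappa, p))

dim(X)
dim(Z)
Z
```

Each column of Z is zero up to its knot and grows polynomially after it, which is what lets the coefficients u bend the curve locally.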

The force of infection (FOI) can then be derived as

$$ \hat{\lambda}(a_i) = \left[\hat{\beta}_1 + 2\hat{\beta}_2 a_i + \dots + p \hat{\beta}_p a_i^{p-1} + \sum_{j=1}^{k} p \hat{u}_j (a_i - \kappa_j)^{p-1}_+\right] \delta(\hat{\eta}(a_i)) $$

Where δ(.) is determined by the link function used in the model
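To make the formula concrete, the sketch below evaluates the FOI under the assumption of a logit link. In that case π = e^η / (1 + e^η), and since the FOI is π′(a) / (1 − π(a)), it reduces to η′(a)π(a), so δ(η) = e^η / (1 + e^η). The coefficients, knots and function names are hypothetical and chosen only for illustration.

```r
# Truncated power term (a - kappa)_+^q, written so that it is 0 at or below the knot
tp_term <- function(a, knots, q) {
  ifelse(a > knots, (a - knots)^q, 0)
}

# Linear predictor eta(a) for polynomial coefficients beta and spline coefficients u
eta_fun <- function(a, beta, u, knots, p) {
  sapply(a, function(ai) sum(beta * ai^(0:p)) + sum(u * tp_term(ai, knots, p)))
}

# Derivative eta'(a): the bracketed term in the FOI formula
eta_deriv <- function(a, beta, u, knots, p) {
  sapply(a, function(ai) {
    sum((1:p) * beta[-1] * ai^(0:(p - 1))) + sum(p * u * tp_term(ai, knots, p - 1))
  })
}

# FOI under a logit link (assumption): eta'(a) * expit(eta(a))
foi_logit <- function(a, beta, u, knots, p) {
  eta_deriv(a, beta, u, knots, p) * plogis(eta_fun(a, beta, u, knots, p))
}

# Hypothetical coefficients, for illustration only
beta  <- c(-2, 0.3, -0.005)
u     <- c(0.004, 0.001)
knots <- c(10, 25)
p     <- 2

foi_logit(c(5, 12, 30), beta, u, knots, p)
```

A different link function would change only the last factor, δ(η), while the bracketed derivative stays the same.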

Penalized likelihood framework

Refer to Chapter 8.2.1

Proposed approach

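A first approach to fitting the model is to maximize the following penalized likelihood

$$ \phi^{-1}\left[y^T(X\beta + Zu) - 1^Tc(X\beta + Zu)\right] - \frac{1}{2}\lambda^2 \begin{bmatrix} \beta \\ u \end{bmatrix}^T D \begin{bmatrix} \beta \\ u \end{bmatrix} $$

Where: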

Xβ + Zu is the linear predictor
D is a known positive semi-definite penalty matrix (Wahba 1978; Green and Silverman 1993)
y is the response vector
1 is the unit vector
c(.) is determined by the link function used
λ is the smoothing parameter (larger values give smoother curves)
ϕ is the overdispersion parameter and equals 1 if there is no overdispersion
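The terms listed above can be combined into a single number for given coefficients. The sketch below does this for a Bernoulli response with a logit link, where c(η) = log(1 + e^η). It makes two assumptions for illustration: the penalty has the form ½λ²[β; u]ᵀD[β; u], and D = diag(0, 1), so that only the spline coefficients u are penalized. The data, coefficients and the function name `pen_loglik` are hypothetical.

```r
# Penalized log-likelihood for a Bernoulli response with logit link (assumed form)
pen_loglik <- function(beta, u, X, Z, y, lambda, phi = 1) {
  eta <- as.vector(X %*% beta + Z %*% u)
  # c(eta) = log(1 + exp(eta)) for the logit link
  c_eta <- log1p(exp(eta))
  loglik <- (sum(y * eta) - sum(c_eta)) / phi
  # With D = diag(0, 1) the quadratic form reduces to sum(u^2)
  penalty <- 0.5 * lambda^2 * sum(u^2)
  loglik - penalty
}

# Hypothetical toy data and coefficients, for illustration only
age   <- c(2, 6, 12, 25, 40)
y     <- c(0, 0, 1, 1, 1)
knots <- c(10, 20)
p     <- 1
X <- outer(age, 0:p, `^`)
Z <- sapply(knots, function(kappa) pmax(age - kappa, 0)^p)
beta <- c(-1.5, 0.1)
u    <- c(0.05, -0.02)

pen_loglik(beta, u, X, Z, y, lambda = 0)  # no penalty: plain log-likelihood
pen_loglik(beta, u, X, Z, y, lambda = 2)  # penalty pulls the value down
```

With λ = 0 the value is the ordinary Bernoulli log-likelihood. Increasing λ makes large u more costly, which is why larger values give smoother fits.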

Fitting data

To fit the data using the penalized likelihood framework, specify framework = "pl"

The basis function is set via the s parameter. Some possible values for s include:

"tp" thin plate regression splines
"cr" cubic regression splines
"ps" P-splines proposed by Eilers and Marx (1996)
"ad" adaptive smoothers

For more options, refer to the mgcv documentation (Wood 2017)

data <- parvob19_be_2001_2003
# Fit a thin plate regression spline under the penalized likelihood framework
pl <- penalized_spline_model(data$age, status = data$seropositive, s = "tp", framework = "pl")
pl$info
#> 
#> Family: binomial 
#> Link function: logit 
#> 
#> Formula:
#> spos ~ s(age, bs = s, sp = sp)
#> 
#> Estimated degrees of freedom:
#> 6.16  total = 7.16 
#> 
#> UBRE score: 0.1206458

plot(pl)
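The printed formula and UBRE score suggest that the fit is an mgcv GAM, so the basis choices listed above can also be explored by calling `mgcv::gam` directly. The sketch below does this on simulated data. The data-generating model (a constant force of infection of 0.05 per year) and all object names are assumptions made for illustration, and the code is not the package's own implementation.

```r
library(mgcv)

set.seed(1)
# Simulated serological data (hypothetical, for illustration only)
n    <- 500
age  <- runif(n, 1, 80)
prev <- 1 - exp(-0.05 * age)          # prevalence under a constant FOI
spos <- rbinom(n, size = 1, prob = prev)
sim  <- data.frame(age = age, spos = spos)

# Same binomial model, two different spline bases
fit_tp <- gam(spos ~ s(age, bs = "tp"), family = binomial(link = "logit"), data = sim)
fit_cr <- gam(spos ~ s(age, bs = "cr"), family = binomial(link = "logit"), data = sim)

# Total effective degrees of freedom under each basis
c(tp = sum(fit_tp$edf), cr = sum(fit_cr$edf))

# Fitted seroprevalence at a few ages
predict(fit_tp, newdata = data.frame(age = c(5, 20, 60)), type = "response")
```

Different bases usually give similar fitted curves when the data are informative. The choice matters more in sparse regions of the age range.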

Generalized Linear Mixed Model framework

Refer to Chapter 8.2.2

Proposed approach

Looking back at the penalized likelihood above, a constraint for u would be Σ_k u_k² < C for some positive value C

This is equivalent to choosing (β, u) to maximize the penalized likelihood with D = diag(0, 1), where 0 denotes the zero vector of length p + 1 and 1 denotes the unit vector of length K, the number of knots

For a fixed value of λ, this is equivalent to fitting the following generalized linear mixed model (Ngo and Wand 2004)

$$ f(y|u) = \exp\{ \phi^{-1} [y^T(X\beta + Zu) - 1^Tc(X\beta + Zu)] + 1^Tc(y)\}, \qquad u \sim N(0, G) $$

With the same notation as before and G = σ_u² I_{K×K}

Thus Z is penalized by assuming the corresponding coefficients u are random effects with u ∼ N(0, σ_u²I).

Fitting data

To fit the data using the generalized linear mixed model framework, specify framework = "glmm"

data <- parvob19_be_2001_2003
# Fit the same thin plate spline, now treating the spline coefficients as random effects
glmm <- penalized_spline_model(data$age, status = data$seropositive, s = "tp", framework = "glmm")
#> 
#>  Maximum number of PQL iterations:  20
#> iteration 1
#> iteration 2
#> iteration 3
#> iteration 4
glmm$info$gam
#> 
#> Family: binomial 
#> Link function: logit 
#> 
#> Formula:
#> spos ~ s(age, bs = s, sp = sp)
#> 
#> Estimated degrees of freedom:
#> 6.45  total = 7.45

plot(glmm)
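The PQL iteration log and the `$gam` component above match the pattern of `mgcv::gamm`, which fits a smooth through its mixed model representation. The sketch below calls `gamm` directly on simulated data to show the two views of one fit. The simulated data and object names are assumptions made for illustration, and this is not the package's own code.

```r
library(mgcv)

set.seed(1)
# Simulated serological data (hypothetical, for illustration only)
n    <- 500
age  <- runif(n, 1, 80)
spos <- rbinom(n, size = 1, prob = 1 - exp(-0.05 * age))
sim  <- data.frame(age = age, spos = spos)

# gamm() treats the wiggly part of the smooth as random effects and,
# for a non-Gaussian family, estimates the model by PQL
fit <- gamm(spos ~ s(age, bs = "tp"), family = binomial(link = "logit"),
            data = sim, verbosePQL = FALSE)

# The result holds a mixed model view and a GAM view of the same fit
names(fit)
fit$gam

# Fitted seroprevalence from the GAM view
predict(fit$gam, newdata = data.frame(age = c(5, 20, 60)), type = "response")
```

The smoothing parameter is not chosen by a separate criterion here. It follows from the estimated variance of the random effects, which is the practical appeal of the mixed model formulation.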

Eilers, Paul H. C., and Brian D. Marx. 1996. “Flexible Smoothing with B-Splines and Penalties.” Statistical Science 11 (2). https://doi.org/10.1214/ss/1038425655.

Green, P. J., and Bernard W. Silverman. 1993. Nonparametric Regression and Generalized Linear Models: A Roughness Penalty Approach. Chapman and Hall/CRC. https://doi.org/10.1201/b15710.

Ngo, Long, and Matthew P. Wand. 2004. “Smoothing with Mixed Model Software.” Journal of Statistical Software 9 (1). https://doi.org/10.18637/jss.v009.i01.

Ruppert, David, M. P. Wand, and R. J. Carroll. 2003. Semiparametric Regression. Cambridge University Press. https://doi.org/10.1017/cbo9780511755453.

Wahba, Grace. 1978. “Improper Priors, Spline Smoothing and the Problem of Guarding Against Model Errors in Regression.” Journal of the Royal Statistical Society Series B: Statistical Methodology 40 (3): 364–72. https://doi.org/10.1111/j.2517-6161.1978.tb01050.x.

Wand, M. P. 2003. “Smoothing and Mixed Models.” Computational Statistics 18 (2): 223–49. https://doi.org/10.1007/s001800300142.

Wood, Simon N. 2017. Generalized Additive Models: An Introduction with R. Chapman and Hall/CRC. https://doi.org/10.1201/9781315370279.
