Addressing survey fatigue bias in longitudinal social contact studies to improve pandemic preparedness

Categories: Disease & Virus

May 25, 2025

The COVIMOD study

The COVIMOD study is a repeated survey from April 2020 to December 2021 comprising 33 waves. Participants were recruited through email invitations sent to existing panel members of the online market research platform IPSOS i-Say³⁹. To ensure the sample’s broad representativeness of the German population, quota sampling was conducted by age, sex, and region. A subset of adult participants living with children under the age of 18 were selected to answer the survey on behalf of their children. Similar to the CoMix study^6,13, individuals were invited to participate in multiple waves to track changes in behaviour and attitude toward COVID-19 across time. When the participant size did not meet the sampling quota due to dropout, new participants were recruited into the study.

The COVIMOD questionnaire was based on the questionnaire of the CoMix study^6,13, and includes questions on demographics, the presence of a household member belonging to a high-risk group, attitudes towards COVID-19 as well as related government measures ^14,37. Participants were asked to provide information about their social contacts between 5 a.m. the previous day to 5 a.m. the day of answering the survey. A contact was defined as either a skin-to-skin contact (e.g., kiss, handshake, hug), or the exchange of three or more words in the presence of another person. Participants reported information on the sex and age band of each contact, their relationship, the contact setting (e.g., home, school, workplace, place of entertainment), and whether the contact was a household member.

For survey waves 1 and 2, participants were asked to report information on each contact separately. However, from wave 3 participants were provided with the option to report the total number of contacts in the event that they could not list all of them separately. A copy of the COVIMOD questionnaire may be found in Additional file 1 of Tomori et al. 2021¹⁴. COVIMOD was approved by the ethics committee of the Medical Board Westfalen-Lippe and the University of Münster, reference number 2020-473-fs. Written informed consent was obtained from all participants and/or their legal guardians. All methods were carried out in accordance to relevant guidelines and regulations. As only anonymised COVIMOD data was used in this work, an institutional review was not required for reanalysis.

Other datasets

To better contextualise the changes in social contact patterns in relation to non-pharmaceutical interventions, we acquired data from the Oxford Covid-19 Government Response Tracker (OxCGRT) which collected information on policy measures that tackle COVID-19 through 2020 to 2021 ²⁶. Specifically, we extract the OxCGRT compact dataset for Germany and use the stringency index which measures containment, closure policies, and public information campaigns. The urban-rural typologies of where participants lived were determined by matching residence information to nomenclature of territorial units for statistics (NUTS) level 3 regions and using the classification defined by Eurostat. Specifically, predominantly urban regions are where at least 80% of the population live in urban clusters (a urban-rural typology of at least 300 inhabitants per $\text {km}^2$ and a minimum population of at least 5000 inhabitants); intermediate regions are defined as regions where between 50% but to 80% of the population live in urban clusters; and predominantly rural regions are defined as regions where least 50% of the population live in rural areas (all areas outside urban clusters) ⁴⁰. NUTS data for 2021 were downloaded from the Eurostat website ⁴¹. To construct post-stratification weights we obtained population size estimates by age and sex for 2019 (Table code: 12411-0041) and household size estimates for 2019 (Table code: 12211-9020) from the GENESIS-online database ⁴², and population size by NUTS level 3 region for 2019 (Table code: demo_r_pjangrp3) from Eurostat ⁴³.

Software

All analysis in this work was performed using R version 4.4.1 ⁴⁴. Bayesian inference was performed using the Stan⁴⁵ probabilistic programming language through the cmdstanr⁴⁶ package version 2.34.1 as front-end. A number of analyses were performed on high performance computing clusters maintained by the research computing service at Imperial College to reduce experiment time but all analysis can be run on modern laptops. The data and code to replicate the results in this study can be found at: https://github.com/ShozenD/contact-survey-fatigue.

Data processing

In all analyses, we omitted data for participants who did not disclose their age or sex, information which is required in all of our models. Adhering to ethical standards, the age of participants under 18 years (children) were reported in distinct age groups, namely, 0–4, 5–9, 10–14, and 15–18 years. To ascertain detailed age data for those under 18, we imputed their age by selecting from a discrete uniform distribution framed by the minimum and maximum ages of the participant’s age group. The total number of contacts reported by individuals were truncated to 50 (99th percentile) to mitigate the effects of extreme outliers on statistical estimates.

Features associated with pandemic social contacts

The number of contacts by an individual within a given time-frame can influenced by a variety of individual-level factors including age ^{6,14,17,19,21,29}, sex ^14,17,19, household size ^6,14,17,21, employment status ^6,17, and day of week effects ¹⁴. We perform a feature selection analysis with a focus on employment status (full-time employed, part-time employed, stay-at-home parent, long-term sick, retired, self-employed, student (above age 15), unemployed and seeking job, unemployed and not seeking job), day of week effects (weekday, weekend), presence of COVID-like symptoms (yes, no), and urban-rural typology (rural, intermediate, urban) while controlling for the effects of age, sex, and household size (Table 1). To prevent survey fatigue effects from contaminating the results, only records from first-time participants were used in the analysis.

Let $Y^{(0)}_i$ for $i = 1,2,\ldots , n^{(0)}$ denote the number of contacts by a first-time participant. We model $Y^{(0)}_i$ using a negative binomial distribution, $Y^{(0)}_i \sim \operatorname {NegBinomial}(\lambda _i, \varphi )$ ($i = 1,2,\ldots ,n^{(0)}$), where $\mathbb {E}[Y^{(0)}_i] = \lambda _i$ and $\operatorname {Var}(Y^{(0)}_i) = \lambda _i + \lambda ^2_i/\varphi$. The mean $\lambda _i$ is modelled on the log scale as

$$\begin{aligned} \log (\lambda _i) = \beta _0&+ \underbrace{ {\varvec{u}}_{i,\text {age}}^\top {\varvec{\alpha }}_{\text {age}} + {\varvec{u}}_{i,\text {sex}}^\top {\varvec{\alpha }}_{\text {sex}} + {\varvec{u}}_{i,\text {hhsize}}^\top {\varvec{\alpha }}_{\text {hhsize}} }_{\text {control features}} \nonumber \\&+ \underbrace{ {\varvec{v}}_{i,\text {employ}}^\top {\varvec{\beta }}_{\text {employ}} + {\varvec{v}}_{i,\text {symptom}}^\top {\varvec{\beta }}_{\text {symptom}} + {\varvec{v}}_{i,\text {dow}}^\top {\varvec{\beta }}_{\text {dow}} + {\varvec{v}}_{i,\text {urban}}^\top {\varvec{\beta }}_{\text {urban}} }_{\text {test features}}. \end{aligned}$$

(1)

The control features ${\varvec{u}}_{i,\text {age}}, {\varvec{u}}_{i,\text {sex}}, {\varvec{u}}_{i,\text {hhsize}}$ are one-hot encodings of age (14 categories), sex (2 categories), and household size (5 categories), respectively. These are standard predictors in contact behaviour studies and are always included in the model without selection. The test features ${\varvec{v}}_{i,\text {employ}}, {\varvec{v}}_{i,\text {symptom}}, {\varvec{v}}_{i,\text {dow}}, {\varvec{v}}_{i,\text {urban}}$ are one-hot encodings of employment status (9 categories), presence of COVID-like symptoms (2 categories), day of week (2 categories), and urban-rural classification (3 categories), respectively. We assign the following priors:

$$\begin{aligned} \beta _0&\sim \operatorname {Normal}(0, 100), \\ {\varvec{\alpha }}_{\text {age}}&\sim \operatorname {SZMVNormal}(14/13), \\ {\varvec{\alpha }}_{\text {sex}}&\sim \operatorname {SZMVNormal}(2), \\ {\varvec{\alpha }}_{\text {hhsize}}&\sim \operatorname {SZMVNormal}(3/2), \\ {\varvec{\beta }}_{\text {employ}}&\sim \operatorname {SZRHS}_{3, 2, 4}(2, 8/\sqrt{n^{(0)}}), \\ {\varvec{\beta }}_{\text {symptom}}&\sim \operatorname {SZRHS}_{3, 2, 4}(2, 1/\sqrt{n^{(0)}}), \\ {\varvec{\beta }}_{\text {dow}}&\sim \operatorname {SZRHS}_{3, 2, 4}(2, 1/\sqrt{n^{(0)}}), \\ {\varvec{\beta }}_{\text {urban}}&\sim \operatorname {SZRHS}_{3, 2, 4}(2, 2/\sqrt{n^{(0)}}). \end{aligned}$$

Sum-to-zero multivariate normal (SZMVNormal) priors²⁸ enforce identifiability by ensuring that the coefficients sum to zero. This centres the intercept $\beta _0$ as a global average, with each coefficient representing a deviation from this baseline. These priors also mitigate multicollinearity inherent in one-hot encoding without a reference category. For variable selection, we use sum-to-zero regularised horseshoe (SZRHS) priors²⁸, which combine the identifiability constraint with shrinkage to induce sparsity. Further details are provided in the Supplementary Materials.

Following inference stage, variable selection is performed given a relevance threshold. We set an optimistic threshold where if the value of the median of marginal posterior of the parameter is outside the range $(-0.0513, 0.0488)$ which corresponds to a $\pm 5\%$ change from the baseline parameter $\beta _0$. Additionally, features associated with social contacts may evolve over time with the introduction or the conclusion of non-pharmaceutical interventions, vaccination, public information campaigns, and other external factors. Therefore, in this and the following analysis, we applied the model independently to waves 4 and 21 which comprised of a balanced number of first-time and repeat participants (Fig. 1), allowing us to compare the two groups at the same point in time. The final set of selected features was the union of the features selected in each wave.

Features associated with reductions in reported contacts in association to repeat participation

Individual features associated with survey fatigue have been identified as potential concerns in several studies ^17,21, but have not been examined in detail. We investigate associations between survey fatigue and the following variables: schooling/age (14 categories), sex (2), household size (5), employment status (9), and urban-rural typology (3) (see Table 1).

Let $Y^{(1)}_i$ denote the number of contacts reported by the ith repeat participant, for $i = 1,2,\ldots ,n^{(1)}$. We model $Y^{(1)}_i$ using a negative binomial distribution, $Y^{(1)} \sim \operatorname {NegBinomial}(\lambda _i, \varphi )$ $(i = 1,2,\ldots ,n^{(1)})$. The mean $\lambda _i$ is modelled on the log scale as,

$$\begin{aligned} \log (\lambda _i) = \hat{\beta }_0 + \underbrace{{\varvec{u}}^\top _i \hat{{\varvec{\alpha }}}}_{\text {control}} + \underbrace{{\varvec{v}}^{*\top }_i \hat{{\varvec{\beta }}}^*}_{\text {selected}} + \underbrace{ {\varvec{w}}^\top _{i,\text {age}} {\varvec{\gamma }}_{\text {age}} + {\varvec{w}}^\top _{i,\text {sex}} {\varvec{\gamma }}_{\text {sex}} + {\varvec{w}}^\top _{i,\text {hhsize}} {\varvec{\gamma }}_{\text {hhsize}} + {\varvec{w}}^\top _{i,\text {employ}} {\varvec{\gamma }}_{\text {employ}} + {\varvec{w}}^\top _{i,\text {urban}} {\varvec{\gamma }}_{\text {urban}} }_{\text {survey fatigue test features}}. \end{aligned}$$

(2)

The vector ${\varvec{u}}_i$ (dimension 19) contains one-hot encodings of the control features: schooling/age (14 categories), sex (2), and household size (3). The vector ${\varvec{v}}_i$ includes one-hot encodings of the features selected in the prior variable selection model. $\hat{\beta }_0$, $\hat{{\varvec{\alpha }}}$, and $\hat{{\varvec{\beta }}}^*$ are the posterior medians of the intercept term, the coefficients on the control features, and the coefficients corresponding to the selected test features from model (1). To test for survey fatigue effects, we include one-hot encodings ${\varvec{w}}_{i,\cdot }$ for age/schooling, sex, household size, employment status, and urban-rural typology. Let ${\varvec{\gamma }} = ({\varvec{\gamma }}_{\text {age}}, {\varvec{\gamma }}_{\text {sex}}, {\varvec{\gamma }}_{\text {hhsize}}, {\varvec{\gamma }}_{\text {employ}}, {\varvec{\gamma }}_{\text {urban}})$ denote the concatenated vector of associated coefficients. Because survey fatigue is expected to reduce reported contact intensity, we place a constrained prior on ${\varvec{\gamma }}$ that limits effects to the negative domain, ${\varvec{\gamma }} \sim \operatorname {half-RHS}^-_{3,2,4}(2,32/\sqrt{n^{(1)}})$, where $\operatorname {half-RHS}^-$ is the negative-domain variant of the regularised horseshoe prior (see Supplemental Text S1). Features were selected if the posterior median reduction in expected contact intensity exceeded 5%. We applied this model separately to survey waves 4 and 21, and used the union of selected features from both to construct the final set.

Functional form of survey fatigue effects over the number of repeat participations

Previous studies showed that survey fatigue build up over the number of times participants contribute to consecutive survey waves ^17,19. We leverage this empirical observation to identify a simple yet effective functional form for modelling longitudinal survey fatigue. We focused on data from waves 3-12 (May 28^th, 2020 to July 1^st, 2020), as these waves fell into a period of homogeneous, relaxed contact restriction measures in Germany, thereby minimising the potential confounding effects of non-pharmaceutical interventions on our analysis. We also preferred to analyse data from relatively early period of the pandemic as interventions became more heterogeneous across states with time. We analysed a total 129,831 records from 2,492 unique participants with a maximum of 11 repeats.

Let $Y_{i,r}$ denote the number of contacts reported by individual i in their r-th participation, for $i = 1,2,\ldots ,2492$ and $r = 0,1,\ldots ,11$, where $r = 0$ corresponds to first-time participation. Let $t_{i,r}$ denote the calendar date of the r-th participation by individual i. The longitudinal model is specified as:

$$\begin{aligned} Y_{i,r} \sim \text {NegBinomial}(\lambda _i, \varphi ), \quad \log (\lambda _i) = \beta _0 + \underbrace{{\varvec{x}}^\top _i{\varvec{\beta }}}_{\text {control}} + \underbrace{\tau (t_{i,r})}_{\text {time}} + \underbrace{\rho (r)}_{\text {fatigue}} \end{aligned}$$

(3)

The covariate vector ${\varvec{x}}_i$ includes the dummy encoding for age, sex, household size, and other relevant features identified during the previous variable selection stage. Temporal variation in contact intensity is captured by $\tau (t_{i,r})$, which represents calendar-time effects as deviations from the global intercept $\beta _0$ using a zero-mean Gaussian process characterised by the Matérn 3/2 covariance kernel

$$\begin{aligned} k(t, t’) = \sigma ^2_{\tau } \left( 1 + \frac{\sqrt{3}|t – t’|}{\ell _\tau } \right) \exp \left( -\frac{\sqrt{3}|t – t’|}{\ell _\tau } \right) \end{aligned}$$

The term $\rho (r)$ captures the reduction in reported contacts associated with repeated participation, representing accumulating survey fatigue. We assessed 4 choices for modelling $\rho (r)$, which differed in the degree of assumption on the functional form of accumulating survey fatigue. To ensure the identifiability of time trends, we assumed that survey fatigue effects do not vary over calendar time and that the magnitude and shape of fatigue are consistent across participants.

The first model assumes that the reduction in reported contacts is constant regardless of the number of participations:

$$\begin{aligned} \rho ^{\text {Identical}}(r) = \rho \sim \operatorname {Normal}(0,1) \end{aligned}$$

(4)

We refer to this model as the identical fixed effects model. This is the adjustment strategy taken by Loedy et al.²¹

The second model assumes that reduction in reported contacts varies independently by the number of participations:

$$\begin{aligned} \rho ^{\text {Independent}}(r) = \rho _r \sim \text {Normal}(0, 1) \quad r \in \{1,2,\ldots ,R\}, \end{aligned}$$

(5)

with $\rho _0 = 0$ for identifiability. We refer to this model as the independent fixed effects model. This is the adjustment strategy taken by Dan et al.¹⁹

The third model assumes that the reduction in reported contacts is a smooth non-parametric function of the number of participations. To infer this function, we place a zero-mean Gaussian process prior with the squared exponential kernel depending on the rescaled number of participations $\tilde{r} = (r – \operatorname {mean}(r))/\operatorname {sd}(r)$

$$\begin{aligned} \rho ^{\text {GP}}(r)&= \rho (\tilde{r}) \sim \operatorname {GP}(0, k_{\sigma _\rho ,\ell _\rho }), \quad k_{\sigma _\rho , \ell _\rho }(\tilde{r},\tilde{r}’) = \sigma ^2_\rho \exp \left( -\frac{(\tilde{r}-\tilde{r}’)^2}{2 \ell _\rho ^2} \right) \nonumber \\ \sigma _{\rho }&\sim \text {inv-Gamma}(5, 5), \quad \ell _{\rho } \sim \text {inv-Gamma}(5, 5) \end{aligned}$$

(6)

for $r=1,\dotsc ,R$, and $\rho (0)=0$ to ensure identifiability. We refer to this model as the Gaussian process model.

For the fourth model, we take inspiration from dose-response modelling in biochemistry ³¹ and model survey fatigue with a three-parameter Hill function:

$$\begin{aligned} \rho ^{\text {Hill}}(r) = -\gamma \frac{e^\zeta r^\eta }{1 + e^\zeta r^\eta } \quad \text {where} \quad \gamma , \eta \in \mathbb {R}^+, \zeta \in \mathbb {R} \end{aligned}$$

(7)

where $\gamma$ is a scale parameter which control the effect size at large values of r, while $\zeta$ and $\eta$ in tandem controls how quickly the function approaches its minimum value. Figure S3 shows the functional form of the Hill function under different parameter values. This model is computationally more efficient than GP models, and renders, due to its parametric form, survey fatigue effects strongly predictable. Given their constraints, we assign the following priors to each parameter:

$$\begin{aligned} \gamma \sim \text {half-Normal}^+(0,1), \quad \zeta \sim \text {Normal}(0,1), \quad \eta \sim \text {Exponential}(1). \end{aligned}$$

We refer to this model as the Hill model.

Throughout, we used the following generic priors to the parameters in the components common across all models:

$$\begin{aligned} 1/\varphi&\sim \text {Exponential}(1), \\ \beta _0&\sim \text {Normal}(0, 5), \\ \beta _j&\sim \text {Normal}(0, 1) \quad j = 1,\ldots ,J \\ \sigma _{\tau }&{\sim } \text {inv-Gamma}(3, 1), \\ \ell _{\tau }&{\sim } \text {inv-Gamma}(5, 1). \end{aligned}$$

Correcting for long-term survey fatigue effects with statistical contact models

Contact intensity estimates gained from statistical models fitted to data with only first-time participants avoids the bias induced by survey fatigue. However, this may mean that estimates are based on substantially smaller sample sizes (Fig. 1). Here, we describe our approach to mitigating the effects of survey fatigue. The analysis is conducted on wave 21 which contains a large number of newly recruited participants as well as repeat participants with varying numbers of prior participations.

Let $a_i$ denote the age for the i-th participant. The covariate vector ${\varvec{x}}_i = (x_{i1}, \ldots , x_{iP})^\top \in \{0,1\}^P$ contains the dummy encodings for sex, household size variables, and other control features identified in the preceding feature selection analysis. The vector ${\varvec{w}}_i = (w_{i1}, \ldots , w_{iQ})^\top \in \{0,1\}^Q$ includes the one-hot-encodings of features associated with survey fatigue. The contact count of participant i at participation r is modelled as:

$$\begin{aligned} Y_{i,r} \sim \text {NegBinomial}\left( \lambda _{i,r}, \varphi \right) , \quad \log (\lambda _{i,r}) = \beta _0 + \underbrace{f_{\sigma ,\ell }(a_i)}_{\text {age}} + \underbrace{{\varvec{x}}_i^\top {\varvec{\beta }}}_{\text {control}} + \underbrace{{\varvec{w}}^\top _i \rho (r)}_{\text {fatigue}} \end{aligned}$$

(8)

where $\beta _0$ is a global intercept, and $\rho (r) \in \mathbb {R}_{\le 0}^Q$ models the fatigue effect at participation r for each of the Q selected variables:

$$\begin{aligned} \left[\rho (r)\right]_q = -\gamma _q \exp (\zeta _q)r^{\eta _q}/(1+\exp (\zeta _q)r^{\eta _q}) \end{aligned}$$

The covariate coefficients ${\varvec{\beta }} = (\beta _1, \ldots , \beta _P)^\top$ and intercept $\beta _0$ are assigned weakly informative priors, and the inverse of the overdispersion parameter $\varphi$ is given an exponential prior:

$$\begin{aligned} \beta _0 \sim \text {Normal}(0, 5), \quad \beta _p \sim \text {Normal}(0, 1), \quad 1/\varphi \sim \text {Exp}(1). \end{aligned}$$

Let $\hat{\gamma }$, $\hat{\zeta }$, and $\hat{\eta }$ denote posterior median estimates from the longitudinal model in Eq. (3). To incorporate prior knowledge in the Bayesian framework, we assign the following weakly informative priors to the Hill function parameters:

$$\begin{aligned} \gamma _q \sim \operatorname {half-Normal}^+(\hat{\gamma }, 1), \quad \zeta _q \sim \operatorname {Normal}(\hat{\zeta }, 1), \quad \eta _q \sim \operatorname {Exponential}(\hat{\eta }), \quad \text {for } q = 1,2,\dots Q. \end{aligned}$$

The function $f_{\sigma ,\ell }(a_i): \mathbb {R} \rightarrow \mathbb {R}$ represents the non-linear effect of age on contact intensity. For computational efficiency, $f_{\sigma ,\ell }$ is modelled using a zero-mean Hilbert space approximate Gaussian process (HSGP) ^47,48, parametrised by $\sigma , \ell$ and based on the squared exponential covariance kernel:

$$\begin{aligned} k(x, x’) = \sigma ^2 \exp \left( -\frac{(x – x’)^2}{2\ell ^2} \right). \end{aligned}$$

For a quantity $\theta$, let $\theta ^{(s)}$ denote its value at the s-th MCMC iteration, for $s = 1, \ldots , S$. Let $v_g, v_h, v_u$ and $v_j$ denote the post-stratification weights for sex, household size, urban-rural typography, and employment-status respectively, calculated as

$$\begin{aligned} v_g = \frac{P_g}{\sum _{g} P_g}, \quad v_h = \frac{P_h}{\sum _{h} P_h}, \quad v_u = \frac{P_u}{\sum _{u} P_u}, \quad v_j = \frac{n_j}{\sum _{j} n_j} \end{aligned}$$

where P denotes the population size estimates and n denotes sample sizes. We calculate the post-stratification weights for employment status from the sample as we could not obtain accurate population estimates at the level we are stratifying at. While this means that the estimates may not be completely representative of the population, it facilitates comparisons with non-model based approaches such as bootstrapping. For each MCMC sample s, we compute the population-weighted contact intensity as

$$\begin{aligned} \lambda ^{(s)}(a) = \sum _{g} \sum _{h} \sum _{u} \sum _{j} v_g v_h v_u v_j \exp \left( \beta _0^{(s)} + f^{(s)}_{\sigma ,\ell }(a) + \beta ^{(s)}_g + \beta ^{(s)}_h + \beta ^{(s)}_u + \beta ^{(s)}_j \right) , \end{aligned}$$

We then summarize the posterior distribution of $\lambda ^{(s)}(a)$ using the median and 95% credible interval:

$$\begin{aligned} \lambda _M(a)&:= \operatorname {median}\left( \{ \lambda ^{(s)}(a) \mid s = 1,\ldots , S \} \right) , \end{aligned}$$

(9a)

$$\begin{aligned} \lambda _L(a)&:= \operatorname {quantile}_{2.5\%}\left( \{ \lambda ^{(s)}(a) \mid s = 1,\ldots , S \} \right) , \end{aligned}$$

(9b)

$$\begin{aligned} \lambda _U(a)&:= \operatorname {quantile}_{97.5\%}\left( \{ \lambda ^{(s)}(a) \mid s = 1,\ldots , S \} \right) . \end{aligned}$$

(9c)

We begin by fitting the model to the subset of data consisting only of first-time participants and treat the resulting median contact intensity, $\lambda ^0_M(a)$, as the baseline. Next, we sequentially include participants with one, two, …, up to 20 repeat survey responses. At each step $r = 1,\ldots ,20$, we re-fit the model using all participants with at most r repeats and compute the preceding quantities. For comparison, we also obtained estimates in the same manner from a model without the survey fatigue adjustment term ${\varvec{w}}^\top _i \rho (r)$ in Eq. (8).

To quantify deviations from the baseline, we compute the mean absolute percentage error (MAPE) and the empirical 95% coverage of the baseline within the posterior credible intervals:

$$\begin{aligned} \operatorname {MAPE}(r) = 100 \times \frac{1}{85} \sum _{a = 0}^{84} \frac{|\lambda ^r_M(a) – \lambda ^0_M(a)|}{|\lambda ^0_M(a)|}, \quad \operatorname {Coverage}_{95\%}(r) = \frac{1}{85} \sum _{a = 0}^{84} 1\left\{ \lambda ^0_M(a) \in [\lambda ^r_L(a), \lambda ^r_U(a)] \right\} , \end{aligned}$$

where $1\{\cdot \}$ is the indicator function.

Longitudinal and subgroup specific contact intensity estimates

To obtain survey fatigue–adjusted estimates, we applied the model defined in Eq. (8) to all 33 waves of the COVIMOD surveys as follows. Let $t = 1,2,\ldots ,33$ index the survey waves. For the first wave ($t = 1$), we used the same priors as in the previous section. From the second wave onward, we incorporated information from the previous wave by setting the priors for the individual-level feature coefficients and Hill function parameters as:

$$\begin{aligned} \beta _{t,p}&\sim \text {Normal}(\hat{\beta }_{t-1,p}, 0.3) \quad \text {for } p = 1,2,\ldots ,P, \\ \gamma _{t,q}&\sim \text {half-Normal}^+(\hat{\gamma }_{t-1,q}, 0.3), \\ \zeta _{t,q}&\sim \text {Normal}(\hat{\zeta }_{t-1,q}, 0.1), \\ \eta _{t,q}&\sim \text {half-Normal}^+\left( 1/\hat{\eta }_{t-1,q}, 0.1\right) \quad \text {for } q = 1,2,\ldots ,Q, \end{aligned}$$

where $\hat{\beta }_{t-1,p}$, $\hat{\gamma }_{t-1,q}$, $\hat{\zeta }_{t-1,q}$, and $\hat{\eta }_{t-1,q}$ denote the posterior means from wave $t-1$. This sequential fitting procedure allows the model to leverage prior information to partially resolve the identifiability issue between $\beta _{t,p}$ and the Hill function parameters when no first-time participants are present.

For each MCMC sample s, we compute the population-weighted contact intensity as:

$$\begin{aligned} \lambda ^{(s)} = \sum _a \sum _g \sum _h \sum _j v_{a,g} v_h v_u v_j \exp \left( \beta _0^{(s)} + f^{(s)}_{\sigma ,\ell }(a) + \beta ^{(s)}_g + \beta ^{(s)}_h + \beta ^{(s)}_{u} + \beta ^{(s)}_j \right) , \end{aligned}$$

(10)

where $v_{a,g} = P_{a,g} / \sum _a \sum _g P_{a,g}$ and $v_h, v_u, v_j$ is defined as in the previous section. Subgroup-specific contact intensities are obtained by including additional covariates and restricting the summation to relevant subpopulations. For example, the contact intensity among stay-at-home parents aged 19–64 is computed as:

$$\begin{aligned} \lambda _{\text {stay-at-home, 19–64}}^{(s)} = \sum _{a=19}^{64} \sum _g \sum _h \sum _u\tilde{v}_{a,g} v_h v_u \exp \left( \beta _0^{(s)} + f^{(s)}_{\sigma ,\ell }(a) + \beta ^{(s)}_g + \beta ^{(s)}_h + \beta ^{(s)}_{u} + \beta ^{(s)}_{\text {stay-at-home}} \right) , \end{aligned}$$

where the weights $\tilde{v}_{a,g}$ are normalized to sum to 1 within the 19–64 age range:

$$\begin{aligned} \tilde{v}_{a,g} = \frac{P_{a,g}}{\sum _{a=19}^{64} \sum _g P_{a,g}}. \end{aligned}$$

We summarize the posterior distribution of $\lambda ^{(s)}$ using the median, 2.5% quantile, and 97.5% quantile, as in Eq. (9a-c). The resulting estimates were compared against (i) those from a simple bootstrap procedure ²⁵, (ii) model-based estimates without the survey fatigue adjustment term ${\varvec{w}}^\top _i \rho (r)$, and (iii) estimates based on data from first-time participants only, in waves with at least 300 such participants.

Source link