Network-based SEIR model and data collection
Our study builds on the findings of Thakur et al.6. The authors modeled potential trajectories of measles resurgence in Virginia for a relatively short time duration (365 days) under two distinct scenarios: (1) pre-COVID-19 and (2) post-lockdown conditions where social activities return to pre-COVID-19 levels, but MMR immunization rates decline. In this study, we use the same scenarios but focus on estimating the economic burden of resurgence in each of the scenarios and the spillover effects. The outline of the analyses is illustrated through flowchart in Fig. 1.
Flowchart of the outline of data analysis methodology. HIQ_C\(\gamma\), Home isolation/quarantine compliance; HI_D, Home isolation delay; HQ_D, Home quarantine of delay; HI_P, Home isolation adherence probability; MMR_E, Effectiveness of MMR vaccine; HI_T, Duration of home isolation; HQ_T, Duration of home quarantine; TR\(\tau\), Transmissibility rate; MMR\(\alpha\), MMR level reduction.
An agent-based model simulates the spread of measles in Virginia (VA). The disease spreads across a synthetic social contact network of VA, where nodes represent individuals and edges represent contacts between individuals. There are approximately 7.6 million nodes and 371.9 million edges in the network. These individuals have demographic attributes as available in the US Census and are placed in households that are geo-located20,21,22. Their daily activities are determined from time-use surveys and geolocation sources, forming the basis for person-to-person contacts and disease transmission. Using this network, the model simulated the transmission of measles with an SEIR (Susceptible-Exposed-Infected-Recovered) model with several critical parameters influencing the outcomes. Each simulation begins with one random infected node of age 5–17 years and is run for 365 time-steps (365 days). For more details on this agent-based model and the construction of the synthetic contact network, see Thakur et al.6,20,21,22.
For our study, simulated outcomes on measles infection counts and the number of days individuals spend in home isolation or quarantine are derived from the above agent-based model. There are a total of 30 distinct scenarios, as outlined in Table 1. For each scenario, 300 replicates are generated, resulting in a total of 9000 simulations. The disease starts from a single index case that is randomly selected in each replicate. The economic evaluation is based on the integration of disease outcomes, cost estimates, along with the demographic characteristics of each household.
Cost estimation
We assess the economic impact of measles across 133 counties in Virginia, which include 38 independent cities, from a societal perspective. For each county, we evaluate the cost of measles outbreak for each of the 9000 simulation runs. Following Pike et al.24, costs are valued on an incremental resource-use basis and comprise treatment (direct medical) treatment, productivity losses, and MMR vaccination cost. Our objective is to quantify the direct societal burden attributable to cases and vaccination. Further assumptions and parameterization are detailed in subsequent sections. All costs are stated in 2021 US dollars (USD). All dollar amounts were modeled in 2021 USD to preserve internal consistency with simulation inputs; readers can convert any reported total to later-year USD. This currency rescaling does not affect comparative or spatial conclusions.
Treatment costs
The treatment costs are a function of the incidence of measles. Based on the historical data, the US Center for Disease Control and Prevention (CDC) has estimated that approximately 1 in 4 cases in the US culminate in hospitalization, with 1 in 1000 escalating to fatalities25. For the purpose of computation, the treatment cost per case, denoted as \({C}^{m}\), is calculated using the formula:
$$\:\begin{array}{c}{C}^{m}=\frac{1}{4}\times\:{C}^{hosp}+\:\frac{3}{4}\times\:{C}^{outpatient},\end{array}$$
(1)
where the cost of hospitalization is assumed to vary between $4,032 to $46,060 per case, and the cost of outpatient visit varies from $88 to $526 per case, based on data (in 2013 US$) from Whitney et al.26,27. These unit-cost values are derived from U.S. insurance claims databases and reflect realized reimbursements from payers to providers for measles-related care. We use them as a proxy for direct medical costs from a societal perspective, that is, medical services and supplies associated with outpatient visits and hospitalizations, including laboratory services and prescription drugs. Non-medical costs such as patient travel/time, money spent on food and lodging etc. are not consistently captured in claims and are therefore excluded. Our focus remains on estimating the spatial spillover effects. For our computations, we use the median of the reported ranges for our baseline estimates and inflated to 2021 US$: \({C}^{hosp}=\$\text{29,132.92}\) and \({C}^{outpatient}=\$357.10\). The median offers a more robust central estimate under such skewed distributions and ensures that the results are not disproportionately influenced by extreme values.
Productivity losses
Productivity loss is estimated to occur when working adults must stay home from work either to take care of themselves or a child, due to sickness. If an individual is working, the wage loss is equal to daily wages multiplied by the number of missed working days. To calculate daily wages, we use household’s annual income and divide it by the number of working days in a year (i.e. 261), and then divide it by the number of working individuals in the household.
Productivity loss due to a child staying home is estimated in the following manner. If a non-working individual aged 13 or above is available within the household, that individual is designated as the caregiver for the child, and there is no loss in wages. In the absence of a non-working family member, the role of a caregiver is assigned to a working family member, and the wage loss for this member is calculated the same way as before.
To methodically quantify the daily productivity loss for each working individual \(i\) within household \(j\), denoted as (\({C}_{i,j}^{p}\)), we employ the following formulation:
$$\:\begin{array}{c}{C}_{i,j}^{p}=\frac{\text{H}\text{o}\text{u}\text{s}\text{e}\text{h}\text{o}\text{l}\text{d}\:\text{i}\text{n}\text{c}\text{o}\text{m}\text{e}}{\#\text{W}\text{o}\text{r}\text{k}\text{e}\text{r}\text{s}\:\text{i}\text{n}\:\text{h}\text{o}\text{u}\text{s}\text{e}\text{h}\text{o}\text{l}\text{d}}\times\:\frac{1}{261}, \end{array}$$
(2)
where the denominator 261 represents the average number of working days in a year. In the absence of data on individual income, we use households’ annual income and the number of employed family members to determine the productivity loss.
MMR vaccination cost
We assume MMR doses are co-administered during routine well-child visits, so no incremental administration, travel, or additional-visit costs are attributed in the base case. We use a transparent retail price per MMR dose because CDC VFC contract prices apply to awardee programs and are not directly available to private providers. Since programs like VFC can procure at discounted contract rates, our retail assumption is conservative. Drawing from the data provided by GoodRx28, each dose of the MMR vaccine is priced at $114. Given the CDC’s recommendation for two doses per individual—with the initial dose administered between 12 and 15 months and the subsequent dose between 4 and 6 years29—the total cost for the full vaccination schedule per individual amounts to $228, i.e., \({C}^{v}=\$228\). Occasional stand-alone vaccination visits that could generate extra administration or non-medical costs are assumed rare and negligible in this setting and are excluded.
Total cost
We assume that every measles infection is reported and treated as either an inpatient or outpatient. Using the above parameters, the total cost for county \(s\), denoted by \({T}_{s}\), is calculated as:
$$\:\begin{array}{c}\:{T}_{s}={C}^{m}\times\:{I}_{s}+{C}^{v}\times\:\:{V}_{s}+\:\sum\:_{j=1}^{{N}_{s}}\sum\:_{i=1}^{{n}_{j}}{C}_{i,j}^{p}\times\:{t}_{i,j},\end{array}$$
(3)
where \({I}_{s}\) is the number of infected measles case in county \(\:s\); \(\:{V}_{s}\) is the number of vaccinated individuals in county \(s\); \({t}_{i,j}\) is the number of days of home stay of working individual \(i\) in household \(j\); \({n}_{j}\) is the total number of working individuals in household \(j\); \({N}_{s}\) is the total number of households in county \(s\).
Spatial analysis
We use a spatial econometric model to examine the epidemiological and economic cost of measles outbreak in the county of origin as well as its impact in the neighboring regions due to spatial interdependencies between counties. We measure (A) the mean total measles cases to estimate the average epidemiological impact and (B) the mean total cost per capita to estimate the average financial burden. We also evaluate the 90th percentiles of measles cases and total cost per capita, to understand the variability in outcomes across runs. See details in the Supplementary Appendix S4, and results in the Table S6 and Figure S5–S6.
Generalized spatial autoregressive model (GSAR)
To understand the spatial complexities inherent in the spread of measles, it is imperative to acknowledge the spatial interdependence between regions. An alteration in one region can instigate changes in its neighboring regions, a cascading effect often referred to as the ‘spillover effect’. Addressing this requires a modeling approach that can capture such spatial dependence.
The response variable represents measles cases or standardized costs across counties \(\mathcal{S}=(\text{1,2},\dots\:,s)\), denoted as \(\varvec{y}=({y}_{1},\dots\:{y}_{s})\). The outcome \({y}_{i}\) is assumed to come from a distribution of the Exponential family with mean parameter \({\mu\:}_{i}\). The relationship between \({\mu\:}_{i}\) and the linear predictor on a vector of covariates \({\varvec{x}}_{\varvec{i}}\) is established through a link function \(g(\cdot\:)\):
$$\:\begin{array}{c}g\left({\mu\:}_{i}\right)={\eta\:}_{i}={{\varvec{x}}_{\varvec{i}}}^{T}\beta\:. \end{array}$$
(4)
To examine the spatial relationships between regions, we use a spatial weight metric constructed from the commuter flows between counties in Virginia. We use county-to-county commuting flows from 2016-2020 5-Year ACS Census data30. The commute flow based spatial weight matrix, \(\varvec{W}\), is formulated with each element, \({w}_{ij}\), defined as:
$$\:\begin{array}{c}{w}_{ij}=\left\{\begin{array}{c}\frac{{c}_{ij}}{\sqrt{{N}_{i}{N}_{j}}},\:\:i\ne\:j\\\:0,\:\:i=j,\end{array},\right.\: \end{array}$$
(5)
where \({c}_{ij}\) is the number of individuals commuting from one specific region \(i\) to another region \(j\); \({N}_{i}\) and \({N}_{j}\) indicate the population of region \(i\) and \(j\), respectively. Dividing \({c}_{ij}\) by the square root of \({N}_{i}{N}_{j}\) normalizes commuter flows relative to the populations of the regions, ensuring the weights represent proportional interactions rather than being skewed by absolute population sizes. Notably, \(\varvec{W}\:\)is standardized such that the sum of all its elements equals 1, maintaining uniformity in the weight matrix.
Under the Bayesian perspective, the Generalized Spatial Autoregressive Model (GSAR) is used to account for spatial dependence in data where outcomes in one region may be influenced by outcomes or predictors in neighboring regions. This is particularly relevant in infectious disease modeling, where disease spread is not confined to administrative boundaries. In the GSAR framework, the dependent variable \(\varvec{y}\) is assumed to follow a distribution from the Exponential family. For example, if the dependent variable is assumed to follow a Poisson distribution, then
$$\:\varvec{y}\sim\text{P}\text{o}\text{i}\text{s}\left(\varvec{\mu\:}\right),\:$$
where \(\varvec{\mu\:}\) is an \(n\times\:1\) vector representing the expectation of \(\varvec{y}\), i.e., \(\varvec{\mu\:}=E\left(\varvec{y}\right)\). A link function \(g\left(\varvec{\mu\:}\right)\) relates the expected outcome to a linear predictor \(\varvec{\eta\:}\) via:
$$\:\begin{array}{c}g\left(\varvec{\mu\:}\right)=\varvec{\eta}\:={\left({\varvec{I}}_{\varvec{n}}-\rho\:\varvec{W}\right)}^{-1}\varvec{X}\varvec{\beta}\:, \end{array}$$
(6)
where \(\varvec{X}\) denotes an \(n\times\:p\) matrix of independent variables, while \(\varvec{\beta\:}\) signifies the \(p\times\:1\) coefficient vector of the independent variables to be estimated. The matrix \(\varvec{W}\) indicates the \(n\times\:n\) spatial weight structure. The spatial lag coefficient, represented by \(\rho\), is to be estimated and characterizes the spatial autocorrelation coefficient. The term \({\left({\varvec{I}}_{\varvec{n}}-\rho\:\varvec{W}\right)}^{-1}\)introduces spatial spillover effects, meaning that a change in a covariate in one region can influence outcomes in neighboring regions through the spatial structure defined by the weight matrix \(\varvec{W}\).
To derive approximate inference for this model, Goméz-Rubio et al.31 developed a latent class “slm” in “INLA” R-package32. For an in-depth exploration of the estimation process and its mathematical foundations, see Goméz-Rubio et al. (2021)31.
In our study, we employ the GSAR to independently analyze the incidences of measles cases and the associated measles costs. Specifically, the measles cases are modeled using a Poisson distribution with a logarithmic link function (GSAR-Poisson), reflecting their discrete nature and right-skewed characteristics. Concurrently, the cost data are modeled using a Gamma distribution, also integrated with a logarithmic link function (GSAR-Gamma), to accommodate their non-negative and skewed distributional properties.
Direct effect and spillover effect
It is crucial to distinguish the definitions of direct and spillover (indirect) effects within the Spatial Autoregressive (SAR) model from the conventional usage of these terms in the vaccine literature33. In the SAR model, these terms are defined differently due to the spatial interdependencies inherent in the model. Specifically, the SAR framework typically identifies two types of marginal effects: direct effects and spillover (or indirect) effects. The direct effect refers to the change in the dependent variable in a particular region resulting from a unit change in the explanatory variable within that same region. Conversely, the spillover effect, often termed the indirect effect, reflects the alteration in the dependent variable in one region due to a unit change in the explanatory variable in a neighboring region34.
The full detailed derivations for the direct and spillover effects are extensively documented in the Supplementary Appendix S1.1.
Variable selection
In the GSAR model, three experimental setting parameters serve as the primary explanatory variables: MMR vaccination reduction rate, transmissibility, and compliance with home isolation/quarantine directives. The MMR vaccination reduction rate is conceptualized as a continuous variable, while transmissibility and adherence directives are delineated as factor variables: {0.4, 0.5, 0.6} for measles transmissibility levels (low, baseline, high) and {75%, 85%, 95%} for home isolation/quarantine compliance levels (low, median, high), respectively.
To account for the potential confounders affecting the epidemiological and the economic impact of measles, the model integrates four demographic attributes of each county as covariates. These include average annual household income, the proportion of males, the proportion of children below 5 years of age, and Proportion of employed population. The selection of these confounders is informed by a rigorous assessment of multicollinearity and individual variable significance.
Statistical analysis
From the 9000 simulation runs, we obtain simulated data on measles incidence, isolation/quarantine days, MMR vaccine costs, medical treatment costs, and productivity losses. To carry out the spatial analysis, the simulated results are aggregated by 133 counties and 30 unique experimental scenarios. To ascertain the presence of spatial dependency, we compare the GSAR model against a non-spatial generalized linear model (GLM), with additional diagnostic results reported in Supplementary Appendix S2.2 (Tables S2 and S3). As suggested by Kelejian and Prucha (2001)35, Morans’ I (Moran’s Index) statistic is utilized as a measure of spatial dependence. This statistic allows for the quantification of the linear relationship between observations and their spatially lagged counterparts. The value of Moran’s I tends to fall within the range of [− 1, 1]. A positive value is indicative of a positive spatial correlation, while a negative value suggests a negative spatial correlation. On the other hand, values close to 0 are associated with a lack of spatial autocorrelation. The observed Moran’s index value in our model indicates a significant spatial correlation, indicating strong spatial interdependencies within the region. Certain model fit indices, including Deviance Information Criterion (DIC), and Watanabe-Akaike Information Criterion (WAIC) are used to evaluate the fitness of the model. DIC is a generalization of AIC, developed especially for Bayesian model comparison36, and WAIC can be seen as an improvement over the DIC37.
To assess the predictive accuracy of the model, the Leave-One-Out Cross-Validation (LOOCV) approach is implemented to ensure consistency in representation. Note that we have 133 counties, and each county has 30 experimental scenarios, resulting in a total of 3990 observations. In this approach, each of the 133 counties is sequentially left out once for testing, while the model is trained on the remaining counties. This ensures that predictions are grounded on a comprehensive range of experimental settings from multiple counties. We compute the predictive Root Mean Square Error (PRMSE) around our forecasts. Furthermore, we also obtain the predictive intervals from the posterior predictive distribution which represents the probability of future observations given the observed data. These not only offer a clear measure of the prediction performance, but also provide insights into the range of possible outcomes.
Data analysis in this paper is performed using software R with Version 4.3.0. We use “spdep” package38 and “INLA” package32 to conduct the spatial correlation test and spatial econometric model, respectively.
