We proceeded in three steps. First, we developed a dynamic model of pneumococcal carriage transmission (Fig. 1, Supplementary Table 1). The parameter values were based on empirical data, taken from literature, or assumed (Table 2). We verified this model’s adequacy against the observed prevalence of VT carriers among children from pre- to post-PCV era in France (1999–2008), the UK (2002–2016), Alaska (2000–2009), and Massachusetts (2001–2008) (Table 1). Second, we simulated the dynamics of pneumococcal carriage transmission after PCV introduction using contact matrices from 34 countries16 and assessed the impact of social contact patterns on the dynamics of VT carriage decline. Third, we changed one key parameter (such as vaccine efficacy and population susceptibility) at a time in the simulations investigate the effect of each key parameter on VT elimination. We describe the Data, the Model, and the details of these three steps in the Analyses below.
Data
Contact matrices and demography
We used the inferred contact matrices \(\:{M}_{ij}\) from16. The contacts, stratified by age yearly from 0 to 84, were derived from synthetic networks built using population census data and socio-demographic surveys. The overall contact matrix for a location is a weighted sum of the setting-specific (i.e., household, school, workplace, and community) contact matrices. \(\:{M}_{ij}\) gives the total number of daily contacts between age groups \(\:i\) and \(\:j\) per person in age group \(\:i\), we applied reciprocity correction on \(\:{M}_{ij}\) and transformed it into \(\:{\stackrel{\sim}{m}}_{ij}\), which gives the total annual contacts between age groups \(\:i\) and \(\:j\) per person in age group \(\:i\) and per person in age group \(\:j\) (density scale, as defined in44; see Supplementary Fig. 3). To elucidate the effect of social contact patterns, we used a common population structure for all contact matrices in our simulations. As sensitivity analyses, we repeated the simulations using country-specific empirical demography from16 and birth rate from the World Bank Open Data45.
Carriage duration and prevalence
We extracted data about age-specific carriage duration and carriage prevalence from published studies identified through a scoping literature search.
Among the identified culture-based studies, we included the studies that reported median durations (n = 8) (Supplementary Fig. 1, Supplementary Data 1), because the duration of carriage has a left-skewed distribution, with few individuals showing lasting carriage. For the model of carriage duration with age, we used non-linear least squares regression to estimate the parameters in the equation (Supplementary Fig. 2):
$$\:Duration=a+(b-a)\:\times\:\:exp(-c\:\times\:\:Age)$$
where \(\:a=21\) (standard error: 5.8), \(\:b=62\:\left(14.6\right)\), and \(\:c=0.45\:\left(0.4\right)\).
In the main analysis, we fixed the age-specific initial carriage prevalence \(\:{f}_{C}^{\left(i\right)}\left(0\right)\) based on26 and the age-specific susceptibility parameter \(\:{\beta\:}\!^{\left(i\right)}\) based on47. As sensitivity analyses, we used two other distributions of \(\:{\beta\:}\!^{\left(i\right)}\) over age, considering a lower carriage prevalence in age 0 and a higher carriage prevalence in all ages, to reflect the observed data from the identified culture-based pre-PCV carriage prevalence studies (n = 17) (Supplementary Figs. 5 and 6, Supplementary Data 2). After calibrating \(\:{\beta\:}\!^{\left(i\right)}\) for the assumed carriage prevalences, we re-simulated the time-to-elimination for all countries.
VT-carrier prevalence in children in the real world
We extracted the VT-carrier prevalence in children from the pre- to post-PCV era in 4 locations—France20, the UK21, Alaska22, and Massachusetts23 (Supplementary Data 3)—to verify our model’s ability to reproduce the decline in VT-carrier prevalence following PCV introduction. The observed data come from cross-sectional surveys among children attending daycare centers or primary care clinics. In all four included studies, the detection of S. pneumoniae was culture-based, and the serotyping was either by traditional Quellung reaction or molecular methods. For point estimates of carriage reported without uncertainty, we calculated the standard error (SE) for proportion and indicated the uncertainty limits as 1.96×SE from the mean.
Model
We formulated a deterministic model that simulates the transmission of VT and NVT carriage (Fig. 1) based on the neutral null model proposed by17. Assuming a stable population (i.e., birth rate = death rate), susceptible individuals (\(\:S\)) become VT-carriers (\(\:{C}_{V}\)) at the rate \(\:{\lambda\:}_{V}\), or NVT-carriers (\(\:{C}_{N}\)) at the rate \(\:{\lambda\:}_{N}\). Mono-carriers \(\:{C}_{V}\) (or \(\:{C}_{N}\)) can be colonized by the other serotype at rate \(\:{{k}_{N}\times\:\lambda\:}_{N}\) (or \(\:{{k}_{V}\times\:\lambda\:}_{V}\)) and become co-carriers (\(\:{C}_{VN}\)), and co-carriers return to mono-carriers \(\:{C}_{V}\) (or \(\:{C}_{N}\)) at rate \(\:{{c\times\:k}_{V}\times\:\lambda\:}_{V}\) (or \(\:{{c\times\:k}_{N}\times\:\lambda\:}_{N}\)). We assumed the inter-serotype competition parameter, \(\:k\), to be 0.5 in the main analysis and tested a range of values (0.1, 0.25, 0.75) based on published estimates18,46,47. When \(\:{k}_{V}\)=0.5, VT is half as likely to colonize an individual already colonized by NVT. We further assumed this competition to be symmetrical (\(\:{k}_{V}={k}_{N}\)) to ensure neutrality at initiation. The parameter \(\:c\), representing the fraction of co-carriers returning to \(\:{C}_{V}\) (or \(\:{C}_{N}\)) upon re-infection with VT (or NVT), was fixed to 0.5 to ensure neutrality48.
The vaccine was introduced at time \(\:{t}_{V}\) and had a coverage of \(\:{p}_{V}\). Therefore, \(\:{p}_{V}\) was zero before time \(\:{t}_{V}\) and equal to \(\:{p}_{V}\) starting from time \(\:{t}_{V}\).
In our age-structured model, individuals moved from one age to the next year of age at an aging rate \(\:{\delta\:}\!_{i}\)=1 per year. The whole population of newborns was unvaccinated. As individuals moved from age 0 to age 1, a fraction (\(\:{p}_{V}\)) of the population was vaccinated and partially protected from pneumococcal colonization (superscript “\(\:(V,1)\)”). The rest (\(\:1-{p}_{V})\) of age 0 stayed unvaccinated as they reached age 1 (superscript “\(\:(N,1)\)”).
For the dynamics of the vaccinated individuals, the rate of VT carriage acquisition \(\:{\lambda\:}_{V}\) was reduced by a factor \(\:VE\), where \(\:VE\) represents the vaccine efficacy against acquisition of VT carriage. Vaccine-conferred immunity was assumed to wane at a rate \(\:{\alpha\:}\!_{V}\), so that \(\:1/{\alpha\:}\!_{V}\) represents the average duration of vaccine protection.
The age-specific carriage acquisition rate, \(\:{\lambda\:}\!^{\left(i\right)}\), depends on \(\:{\beta\:}\!^{\left(i\right)}\), the cumulative number of carriers in the contactee age groups, \(\:C{C}^{\left(j\right)}\), and the per capita contact matrix, \(\:{\stackrel{\sim}{m}}_{ij}\). The carriage acquisition rates for VT and NVT were expressed as:
$$\:{\lambda\:}_{V}^{\left(i\right)}=\:{\beta\:}_{V}^{\left(i\right)}\:{\sum\:}_{j=0}^{A-1}{\stackrel{\sim}{m}}_{ij}{CC}_{V}^{\left(j\right)}$$
$$\:{\lambda\:}_{N}^{\left(i\right)}=\:{\beta\:}_{N}^{\left(i\right)}\:{\sum\:}_{j=0}^{A-1}{\stackrel{\sim}{m}}_{ij}{CC}_{N}^{\left(j\right)}$$
where
$$\:{CC}_{V}^{\left(i\right)}={C}_{V}^{(V,i)}+{C}_{V}^{(N,i)}+q({C}_{VN}^{\left(V,i\right)}+{C}_{VN}^{(N,i)})$$
$$\:{CC}_{N}^{\left(i\right)}={C}_{N}^{(V,i)}+{C}_{N}^{(N,i)}+q({C}_{VN}^{\left(V,i\right)}+{C}_{VN}^{(N,i)})$$
Here, \(\:q\) refers to the relative infectiousness of each serotype for co-carriers.
Table 2 summarizes the parameters used in this study.
Outcome definition
In a neutral null model, one serotype is not assumed to have a fitness advantage over the other; therefore, co-carriers transmit either VT or NVT at equal probability. The relative infectiousness with each serotype for co-carriers, \(\:q\), is set to 0.5, such that co-carriers are equally infectious as mono-carriers17. To ensure neutrality in the null model, we checked if \(\:F\) was stable over time in the model without an effective vaccine, as suggested by17 (Supplementary Fig. 4). \(\:F\) is given by:
$$\:F=\:\frac{{C}_{V}^{\left(V\right)}+{C}_{V}^{\left(N\right)}+q({C}_{VN}^{\left(V\right)}+{C}_{VN}^{\left(N\right)})}{{C}_{V}^{\left(V\right)}+{C}_{V}^{\left(N\right)}{+\:C}_{N}^{\left(V\right)}+{C}_{N}^{\left(N\right)}+2q({C}_{VN}^{\left(V\right)}+{C}_{VN}^{\left(N\right)})}$$
We defined time-to-elimination as the duration between vaccine introduction and the time when \(\:F\) dropped to 5% of its initial value in age 0, representing a fully unvaccinated population and reflecting the indirect effect of PCV introduction. As time-to-elimination in all ages were highly correlated in each country, the choice of age had a negligible effect on the analyses comparing countries.
Analyses
Model assessment
To verify the model, we used location-specific contact matrices and parameter sets to simulate the VT-carrier prevalence in children from the pre- to post-PCV era in 4 locations (Table 1) and compared the simulated values to the observed ones qualitatively.
We calibrated the model for each location by estimating the parameters \(\:{\beta\:}\!^{\left(i\right)}\). First, we performed a global search on 1000 values between − 10 and 10, corresponding to \(\:\beta\:\) values between 0 and 1 on the logit-transformed scale. The values were sampled using Sobol’s sequence49, a quasi-random sampling method, to ensure the global parameter space was searched thoroughly. The global search sought a set of \(\:{\beta\:}\!^{\left(i\right)}\) that minimized the total squared difference in simulated versus observed pre-PCV VT-carrier prevalence on the logarithmic scale in all age groups. Here, the age groups were defined based on the observed prevalences as 0 y, 1–4 y, 5–17 y, 18–39 y, 40–59 y, and 60–84 y. The best five solutions from the global search were used as the starting value for a local search using the Subplex algorithm50 until the total squared difference was minimized or could not be further reduced after a maximum of 1000 evaluations. Given the uncertainty around this parameter, we simulated the VT-carrier prevalence in children in each location using a range of vaccine efficacy against colonization51.
Effect of contact features
After model assessment, we moved on to investigate the effect of social contact structure on the replacement dynamics. We first summarized the contact matrices using two age group-specific features—contact rate and assortativity—and then explored the relationship between these features and the time-to-elimination. Here, the age groups were defined as 0–4 y, 5–9 y, 10–19 y, 20–39 y, 40–59 y, and 60–84 y, to be consistent with the parameter value assignment (Table 2). We defined contact rate as the average total daily contacts in an age group and assortativity as the fraction of contacts from within the age group out of total contacts for each age in the age group.
We described the distribution of total contact and assortativity over age in all 34 contact matrices (Fig. 5A, B) and explored the association between time-to-elimination and these contact features in all age groups (Supplementary Fig. 13). We also looked at this association in the sensitivity analyses, where we used empirical demography (Supplementary Fig. 14), and where we assumed different carriage prevalences across ages (Supplementary Fig. 15).
Based on the strong negative correlation between the two contact features and time-to-elimination portrayed in children under 5 (Fig. 5C), we performed a regression analysis on time-to-elimination with standardized contact rate and standardized assortativity in this age group as predictors in a GLM with a log link (Fig. 5D). We reported the effect estimates with 95%CI for both variables and assessed the goodness-of-fit with \(\:{R}^{2}\). We further tested the out-of-sample prediction performance of the GLM containing only these two predictors by leaving 4 contact matrices out as the test set and using the remaining 30 contact matrices as the training set. We repeated this procedure 10 times and reported the MRAE for each iteration (Supplementary Table 2).
Effect of key parameters
To delineate the individual effect of the key parameters— vaccine efficacy, vaccine coverage, immunity waning, the initial proportions of VT and NVT carriers, and population susceptibility—on time-to-elimination, we varied them one at a time using a range of values and measured the time-to-elimination.
Vaccine efficacy and coverage were considered key parameters because these contributed to the selective pressure that drives serotype replacement. We varied vaccine efficacy between 33 and 77% based on the observed efficacy with uncertainty in a community randomized trial51, consistent with the findings of a systematic review52. Other than no waning, we tested a range of durations of vaccine-conferred immunity, ranging from 3 to 10 years53,54. Evidence suggests pre-PCV serotype distribution in carriage and diseases as important predictors of vaccine impact55; therefore, we tested a range of initial proportions of VT-carriers (\(\:{f}_{V}\left(0\right)\)), NVT-carriers (\(\:{f}_{N}\left(0\right)\)), and implicitly, co-carriers (1–\(\:{f}_{V}\left(0\right)\)–\(\:{f}_{N}\left(0\right)\)), either allowing the proportion of VT among colonizing serotypes (\(\:F\)) to fluctuate or be fixed at 0.65. For constant \(\:F\), we further considered a range of competition levels (\(\:{k}_{V}\)=\(\:{k}_{N}\)=0.1, 0.25, 0.75) in the sensitivity analysis.
In these model experiments, the age-specific overall carriage prevalence remained constant. Lastly, to investigate the dynamics under different population susceptibilities, we changed the age-specific susceptibility parameter, \(\:{\beta\:}\!^{\left(i\right)}\), by ± 20% compared to the baseline value, which led to higher and lower overall carriage, respectively. In each simulation, we used 34 contact matrices from16 to see if the effect of each key parameter differs by social contact structure.
Numerical implementation
All analyses were conducted in RStudio with R version 4.5.1) and the non-linear model fitting was performed using the base package “stats”56. The transmission model was implemented using the package “pomp” version 4.657. All optimization procedures were implemented using the algorithms available in the package “nloptr” version 2.0.358.