molFOI is not associated with age
We define the molFOI as the number of genetically distinct strains that infected a given participant over a period of time12, where a newly incident infection by a genetically distinct strain was defined as observation of a novel amplicon haplotype (not seen in the previous two samples from the participant) at one or more of the four genotyped loci (Supplemental Fig. 1)12. We calculated the molFOI for each participant over the entirety of the 2011 season (Fig. 2a). The molFOI range was 0–55 (mean: 11, median: 10, standard deviation: 9). We found no overall correlation between participant age and molFOI (quasi-Poisson regression, age coded as a continuous variable, P = 0.077), though infection rate may be lower in the youngest and oldest participants We also examined the number of clinical disease episodes per participant in the same 2011 cohort, using the number of malaria treatments as a proxy for clinical disease diagnoses (Fig. 2b). The range in number of treatments per participant is smaller than molFOI, ranging from 0 to 6 (mean: 2, median: 2, standard deviation: 1). We found a negative correlation between participant age and number of treatments (quasi-Poisson regression, P = 0.00073, β = −0.02). We directly compared molFOI and number of treatments per participant (Supplemental Fig. 3a). The number of treatments was significantly associated with molFOI (quasi-Poisson model, t(462) = 3.07, P = 0.0023), with inclusion of treatments accounting for 2.02% of deviance in 2011 molFOI values when compared to the base model, implying that disease rate explains only a small fraction of the observed variation in infection rate (deviance explained calculated as the difference between the deviance observed by this model and the intercept-only model, divided by the intercept-only model deviance). These observations suggest that the NAI that accumulates with age in this population9 reduces the risk of clinical disease but does not provide measurable protection against infection.
a Each point represents a molFOI value for each of the 464 participants over the 2011 season. The x-axis groups participants by age, aiming for similar sized age groups for comparisons. We found no significant association between age and molFOI (quasi-Poisson regression, two-sided t test; t(462) = 1.78, P = 0.077, β = 0.014 [95% CI: −0.0014 to 0.029]). b Each point represents the number of treatments a participant received for symptomatic disease over the 2011 season for each of the 464 participants. The x-axis matches that in part a. We found a significant negative association between age and number of treatments (quasi-Poisson regression, two-sided t test; t(462) = −3.40, P = 0.00073, β = −0.02 [95% CI: −0.036 to −0.0096]). In both plots, violin plots represent the density distributions, with horizontal lines at the 25th, 50th, and 75th percentiles. CI confidence interval, molFOI molecular force of infection.
Intra-individual molFOI is consistent across seasons
We next compared individual molFOI across transmission seasons to determine whether molFOI variation in 2011 was driven by stochastic factors, or whether differences in infection rate among participants are consistent over time. We hypothesized that while heterogeneity in infection rate might obscure age-associated differences when comparing age bins cross-sectionally, these differences might be evident when profiling individuals longitudinally. We estimated the cumulative molFOI over the subsequent five seasons (2012–2016) and compared that value to the value from 2011 for individual participants (Fig. 3a). We rejected the null hypothesis that variation in 2011 molFOI is stochastic; participants with high molFOI in 2011 generally had high molFOI in 2012-2016 as well (Spearman’s rank correlation, ρ(111) = 0.53 (95% CI = [0.38, 0.66]), P = 1.51e−9; linear regression, P = 3.4e−13, R2 = 0.38). We performed the same analysis using symptomatic disease incidence (as estimated by number of malaria treatments) in place of molFOI in Fig. 3b. In contrast to molFOI, while the linear regression (adjusted R2 = 0.03, P = 0.041) between number of treatments in 2011 and 2012–2016 was significant, Spearman’s rank correlation (ρ(111) = 0.17 (95% CI = [−0.02, 0.35]), P = 0.073) did not indicate the presence a significant correlation in symptomatic disease incidence across time periods. This contrast between infection and treatment rates suggests that molFOI may be a more durable estimator of individual infection risk, as it does not appear to be modified by age-dependent NAI.
a Each point represents a single participant, showing molFOI from the 2011 season (x-axis) and cumulative molFOI across the 2012–2016 seasons (y-axis). The axis scales vary due to the difference in time periods and sampling strategy between the two datasets (see Fig. 1). The line represents a linear regression (adjusted R2 = 0.38, P = 3.4e−13). We also computed the two-sided Spearman’s rank correlation between molFOI in 2011 and 2012–2016; we found a highly significant, positive correlation (⍴(111) = 0.53 (95% CI = [0.38, 0.66]), P = 1.51e−9). b Replicates the figure in (a), but with number of treatments for symptomatic disease in 2011 on the x-axis and number of treatments for symptomatic disease in 2012–2016 on the y-axis. The line represents a linear regression (adjusted R2 = 0.03, P = 0.04), but a two-sided Spearman’s rank correlation did not identify a significant association between number of treatments in 2011 and 2012–2016 (⍴(111) = 0.17 (95% CI = [−0.02, 0.35]), P = 0.073). c Average molFOI of all participants of a given age (on the x-axis) during a given season (on the y-axis). d Average number of treatments of all participants of a given age (on the x-axis) during a given season (on the y-axis). e molFOI from the 2011 season, stratified by infection status at enrollment in May 2011. molFOI for participants who were infected at enrollment is significantly higher than for those not infected at enrollment (Kruskal–Wallis test, χ2(1) = 78.42, P = 8.33e−19). f Replicates the analysis in (c), but with number of treatments for symptomatic disease in 2011 on the y-axis. Participants who were infected at enrollment had significantly more treatments than those who were not infected at enrollment (Kruskal–Wallis test, χ2(1) = 7.85, P = 0.0051). Violin plots in (e, f) represent the density of the distributions, with horizontal lines at the 25th, 50th, and 75th percentiles. CI confidence interval, molFOI molecular force of infection.
We also stratified molFOI and treatment data for each participant by season, resulting in five molFOI values per participant (Supplemental Fig. 2). The individual molFOI-per-season values ranged from 0 to 106 (mean: 10, median: 6, standard deviation: 12). We found no significant differences in the overall distribution of molFOI values stratified by season, suggesting no major temporal or weather factors impacted any one season (Kruskal–Wallis test, χ2(4) = 4.36, P = 0.36). We visualized the patterns of average molFOI per age in each season (Fig. 3c). We found that a specific sub-cohort of participants that were age nine in 2012 consistently exhibited high average molFOI values over time, despite their increasing age and cumulative parasite exposure. We similarly visualized average treatment counts per age (Fig. 3d), but did not observe the same sub-cohort trend as with molFOI.
We further explored consistency in individual infection rate over time by fitting a linear mixed-effects model (Table 1) to these data, with participant age as a fixed effect and participant identity as a random effect, aiming to capture intra-individual changes in molFOI. In this model, fixed effects (participant age) had a pseudo-R2 value of 0.00, while the pseudo-R2 value for all effects was 0.50, further demonstrating the lack of relationship between age and infection rate. The mixed-effects model fit the data better than an age-only linear model (linear regression, adjusted R2 = 0.076, P = 4.4e−12; models compared via likelihood rank test, χ2(1) = 170.69, P = 5.24e−39), suggesting that participant identity is a more significant driver of variation in infection rate than age. We also fit linear mixed-effects models to subsets of this cohort by age (Table 1). In all of these models, age is not significantly associated with molFOI, and the pseudo-R2 value for all effects is notably larger than the pseudo-R2 value for age alone (models for participants of all ages, 0–4 years, 4–7 years, and 8+ years, respectively; pseudo-R2 for age as a fixed effect: 0.00, 0.00, 0.01, 0.00 in each model, respectively, and pseudo-R2 for all effects, including participant identity as a random effect: 0.50, 0.71, 0.37, 0.52, respectively).
We also examined infection status in May 2011 (at study enrollment) as a predictor of infection rate; 44% of participants (n = 202) were asymptomatically infected at enrollment. We found that molFOI from the 2011 season was significantly greater for participants who were infected at enrollment (Fig. 3e, mean molFOI for infected = 15.9; uninfected = 7.6; Kruskal–Wallis test, χ2(1) = 78.42, P = 8.33e−19). Again, we replicated this analysis with symptomatic disease incidence (as estimated by treatment counts) in place of molFOI (Fig. 3f). We found a significant difference for disease incidence as well (mean number of treatments for infected = 2.2, for uninfected = 1.9; Kruskal–Wallis test, χ2(1) = 7.85, P = 0.0051), though the range of this metric (0–6) is narrower than that of molFOI (0–55).
We compared infection rate to rates of symptomatic disease by evaluating molFOI with respect to the number of treatments per participant within a transmission season or the parasitemia of individual samples (Supplemental Fig. 3). We also hypothesized that lower density infections would correlate with lower molFOI, if participants had parasitemia near the level of detection of the genotyping assay. However, participants grouped by low, middle, or high molFOI in 2011 exhibited no significant differences in parasite densities (Kruskal–Wallis test, χ2(2) = 4.68, P = 0.096). In fact, participants with high molFOI in 2012-2016 had slightly lower densities (mean densities: 2.5e4 (low molFOI), 2.1e4 (mid molFOI), 2.4e4 (high molFOI); Kruskal–Wallis test, χ2(2) = 10.52, P = 0.0052; Dunn test with Holm correction, P = 0.0037 (low vs. high)), rejecting our null hypothesis. We speculate that individuals with more NAI may have lower parasite densities, and may be more likely to exhibit high molFOI due to reduced incidence of symptomatic disease and antimalarial treatment.
Host factors explain some heterogeneity in molFOI
We next hypothesized that host-specific factors could affect infection rate. We found that participant sex was not associated with molFOI in 2011 (Supplemental Fig. 4a, mean molFOI for females = 11.6, males = 11.1; Kruskal–Wallis test, χ2(1) = 2.47, P = 0.12) or 2012–2016 (Supplemental Fig. 4c, mean cumulative molFOI for females = 47.7, males = 54.0; Kruskal–Wallis test, χ2(1) = 0.77, P = 0.38). Similarly, we stratified symptomatic disease incidence by participant sex, finding that participant sex was also not associated with symptomatic disease counts in either 2011 (Supplemental Fig. 4b, mean cumulative number of treatments for females = 2.1, males = 1.9; Kruskal–Wallis test, χ2(1) = 1.17, P = 0.28) or 2012–2016 (Supplemental Fig. 4d, mean cumulative number of treatments for females = 4.4, males = 4.7; Kruskal–Wallis test, χ2(1) = 0.15, P = 0.70).
We also considered participant ethnicity, bednet usage, and socioeconomic factors19 as potential explanations for molFOI heterogeneity and temporal consistency20. However, neither ethnicity nor bednet usage varied significantly among participants. The majority of participants in both the 2011 and 2012–2016 cohorts are of the Bambara ethnic group (418/464 participants or 90% in 2011 and 105/120 participants in 2012–2016, or 87.5%). Participants nearly universally reported daily bednet usage when surveyed in 2013 (561/563 households owned bednets and 559/563 households reported daily usage). We used roof type as a proxy for socioeconomic status, as in prior studies of this cohort17, where metal roofs are a proxy for relative wealth. In the 2011 cohort, we found a significant difference in molFOI between participants living in homes with flat roofs (n = 143, mean molFOI = 14.1) vs. those from homes with metal roofs (n = 269, mean molFOI = 10.5; Kruskal–Wallis test, χ2(1) = 8.12, P = 0.004). We observed a similar trend in the 2012-2016 data between participants from homes with flat roofs (n = 37, mean molFOI = 58.6) vs. participants from homes with metal roofs (n = 76, mean molFOI = 49.8), but the difference was not statistically significant (Kruskal–Wallis test, χ2(1) = 1.64, P = 0.20).
Given that many individuals in this population carry malaria-protective variant alleles at the hemoglobin subunit beta (HBB) locus17, we examined molFOI stratified by HBB genotype (Fig. 4, stratifications of cohorts by HBB genotype and age or sex in Supplemental Table 1). Of the 464 participants in our 2011 cohort, 372 were homozygous for the ancestral allele, A (HbAA genotype). A total of 43 participants were heterozygous for the S allele (HbAS genotype), which confers the sickle cell trait and has been associated with lower risk of both uncomplicated and severe malaria disease21,22,23. A total of 47 participants were heterozygous for the C allele (HbAC), a genotype that has not been as clearly associated with protection from uncomplicated malaria as the HbAS genotype21 Due to the low sample size, we excluded two participants with HbCC and HbSC genotypes from the analysis in Fig. 4a only. We found significant differences in the molFOI in 2011 for participants with HbAS compared to both HbAA and HbAC genotypes (Fig. 4a, mean molFOI for HbAA (11.8), for HbAC (13.1), for HbAS (5.3). Kruskal–Wallis test, χ2(2) = 23.53, P = 7.8e−6, ε2 = 0.05. Dunn test with Holm adjustment, P = 7.6e−6 (HbAA vs. HbAS), 9.1e−5 (HbAC vs. HbAS), 0.51 (HbAA vs. HbAC)). In the 2012–2016 cohort, which had fewer participants (n = 120) and thus lower statistical power compared to the 2011 cohort, fewer participants had variant genotypes (6 HbAC and 17 HbAS); molFOI values were lower in HbAC (mean molFOI: 45.5) and HbAS (mean molFOI: 34.3) participants than in HbAA participants (mean molFOI: 54.3), but they did not reach the threshold of significant difference (Fig. 4c; Kruskal–Wallis test, χ2(2) = 0.96, P = 0.62). We also examined symptomatic disease stratified by HBB genotype (Fig. 4). We found significant differences in the number of treatments for individuals with HbAS genotype compared to HbAA in both the 2011 cohort (Fig. 4b, Kruskal-Wallis test, χ2(2) = 6.55, P = 0.038, ε2 = 0.014. Dunn test with Holm adjustment, P = 0.032 (HbAA vs. HbAS), 0.12 (HbAC vs. HbAS), 0.93 (HbAA vs. HbAC) and the 2012-2016 cohort (Fig. 4d, (Kruskal–Wallis test, χ2(2) = 12.58, P = 0.0019, ε2 = 0.11. Dunn test with Holm adjustment, P = 0.0016 (HbAA vs. HbAS), 0.36 (HbAC vs. HbAS), 0.53 (HbAA vs. HbAC)).
a molFOI values from 2011, stratified by participant HBB genotype. molFOI is significantly different between some groups (Kruskal–Wallis test, χ2(2) = 23.53, p value = 7.8e−6, ε2 = 0.05. Dunn test with Holm adjustment, p values = 7.6e−6 (HbAA vs. HbAS), 9.1e−5 (HbAC vs. HbAS), 0.51 (HbAA vs. HbAC)). b Number of treatments for symptomatic disease in 2011, stratified by HBB genotype. Number of treatments is significantly different between some groups (Kruskal–Wallis test, χ2(2) = 6.55, p value = 0.038, ε2 = 0.014. Dunn test with Holm adjustment, p values = 0.032 (HbAA vs. HbAS), 0.12 (HbAC vs. HbAS), 0.93 (HbAA vs. HbAC)). c molFOI values from 2012 to 2016, stratified by participant HBB genotype. No significant differences between groups (Kruskal–Wallis test, χ2(2) = 0.96, p value = 0.62). d Number of treatments for symptomatic disease in 2012–2016 stratified by participant HBB genotype. Number of treatments is significantly different between some groups (Kruskal–Wallis test, χ2(2) = 12.58, p value = 0.0019, ε2 = 0.11. Dunn test with Holm adjustment, p values = 0.0016 (HbAA vs. HbAS), 0.36 (HbAC vs. HbAS), 0.53 (HbAA vs. HbAC)). In all panels, horizontal lines represent significant differences between groups, and colors represent the same categories as the x-axis. Violin plots represent the density of the distribution, with horizontal lines representing the 25th, 50th, and 75th percentiles. HBB hemoglobin subunit beta locus, molFOI molecular force of infection.
Additionally, we considered other potential effects that could be associated with HBB genotype (Supplemental Fig. 5). Heterozygous HBB genotypes did not cluster into a particular region of Kalifabougou (Supplemental Fig. 5a), nor did they have a different range in participant ages than HbAA (Supplemental Fig. 5b, Kruskal–Wallis test, χ2(2) = 1.84, P = 0.40). We also did not find a difference in parasite density for HbAS individuals, which could have potentially biased our ability to detect molFOI (Supplemental Fig. 5c, Kruskal–Wallis test, χ2(2) = 2.45, P = 0.29), though the sample size of available parasitemia data after stratification by HBB genotype is small.
We fit an additional linear mixed-effects model to the 2012–2016 molFOI data incorporating these additional host factors in addition to age, with participant identity as a fixed effect (Table 1). In this model, none of the fixed effects have a significant effect on molFOI on their own (P = 0.26, 0.47, 0.60, 0.09, 0.41 for age, roof type, HbAC genotype, HbAS genotype, and sex, respectively). As a whole, the fixed effects contributed a small amount to the model’s total explanatory power; pseudo-R2 for fixed effects alone was 0.03, and pseudo-R2 for all effects was 0.52.
Geographic features may explain some heterogeneity in molFOI
We next considered whether geographic factors could affect infection rate. First, we analyzed the spatial autocorrelation of the 2011 molFOI values (Fig. 5a), to look for general associations between household location and molFOI. We found a very small negative dispersion of molFOI (Moran’s I = −0.023, pseudo-P = 0.002, two-sided hypothesis, compared to a Monte-Carlo simulation of permutations of the data), suggesting that micro-geographic features do not drive locally similar molFOI values. We found a positive correlation between the distance from participants’ homes to the Kalifabougou study clinic and their molFOI in 2011 (Fig. 5b, linear regression, adjusted R2 = 0.089, P = 4.3e−8; quasi-Poisson regression, t(312) = 5.71, P = 2.68e−8). We also found a positive correlation between participants’ distance to the closest part of the river system and their 2011 molFOI (Fig. 5c, linear regression, adjusted R2 = 0.025, P = 0.0031; quasi-Poisson regression, t(312) = 3.00, P = 0.003). Taken together, these spatial analyses suggest that geographic differences may have a slight effect, but they do not explain the majority of the observed heterogeneity in infection rate in this cohort (2.45% and 8.30% of deviance explained by distance to clinic and rivers, respectively).
a Visualization of homes and molFOI in Kalifabougou, Mali. Each dot represents the geographic coordinates of one participant’s home, colored by their molFOI in 2011. Points are slightly jittered, for visibility. Black lines represent local rivers, geospatial data from Global Map of Mali © ISCGM/IGM. b Scatter plot of each participant’s molFOI in 2011 (x-axis) vs. the distance from their home to the Kalifabougou clinic (y-axis, labeled in kilometers) (quasi-Poisson regression, two-sided t test; t(312) = 5.71, P = 2.68e−8, β = 0.00021 [95% CI: 0.00014–0.00028]). The line shows the linear regression (adjusted R2 = 0.089, P = 4.3e−8). c A similar scatter plot to (b), except that the y-axis shows the distance from each participant’s home to the nearest river (quasi-Poisson regression, two-sided t test; t(312) = 3.00, P = 0.003, β = 0.19 [95% CI: 0.066–0.32]). Linear regression shown as described above (adjusted R2 = 0.025, P = 0.0031). CI confidence interval, molFOI molecular force of infection.
Several serological features distinguish between low and high infection rate participants
Systems serological profiling was previously performed on serum from 201 participants included in our sequencing study18. These data were generated using samples from May 2011, at enrollment into the study and before the transmission season began that year, with the objective of defining potential correlates of protection. Antigen-specific IgG1, IgG2, IgG3, IgG4, IgA1, and IgM to AMA1, full-length CSP, MSP1, RH5 and the N- and C-terminal domains of CSP were quantified. The functional potential of these antigen-specific antibodies was also captured, including the ability of the antigen-specific antibodies to bind to Fc receptors (FcRn, FcγRIIAH, FcγRIIAR, FcγRIIB, FcγRIIIAF, FcγRIIIAV, and FcγRIIIB) and recruit antibody-dependent complement deposition, antibody-dependent cellular phagocytosis, and antibody-dependent neutrophil phagocytosis. While the evolution of antigen-specific neutrophil activation was previously found to be correlated with rates of disease severity in this cohort18, here we aimed to define whether malaria-specific serological markers were positively or negatively associated with infection rate, as estimated by molFOI.
We set thresholds for molFOI in 2011 to create “low molFOI” (<4) and “high molFOI” (>13) groups, which correspond to the lowest and highest 12% of the molFOI distribution (Fig. 6a). To explore the sensitivity of the analysis to threshold used, we defined two additional different sets of thresholds, to consider the upper and lower 33% and 25% of the molFOI distribution (Supplemental Table 2). We primarily discuss results from the most stringent set of thresholds here; the others are highly concordant and are described in Supplemental Figs. 6 and 7. For each set of molFOI cutoffs, we compared each serological feature between low and high molFOI groups using Mann–Whitney tests with Benjamini–Hochberg corrections. In all three approaches to molFOI stratification, Fcγ receptor binding to the highly polymorphic c-terminal region of CSP (specifically, Fcγ receptor IIIAV, or FcγRIIIAV), as well as Fcγ receptor binding to RH5 (FcγRIIIAV), were enriched in participants with low molFOI (Mann–Whitney test with Benjamini–Hochberg correction, P = 0.015 and 0.030, respectively; Fig. 6c, d), suggesting that these may be protective against infection. While only these two features were significant in all three molFOI stratifications, 13 additional features were enriched in individuals with high molFOI in two of the three stratifications, including IgG1 specific to AMA1, similar to previous studies suggesting that it may be a marker of infection history24,25,26.
a Distribution of molFOI in the 2011 season for the 201 participants for whom we had both genetic and serology data. We compared low and high molFOI participants in the following analyses, who are colored in orange and teal, respectively (see Supplemental Table 2 for details on molFOI ranges used here and in other versions of this analysis). b We identified serological features with significant differences between high and low molFOI groups, using two-sided Mann–Whitney tests with Benjamini–Hochberg corrections for multiple comparisons. Each point here represents a serological feature in the analysis. The x-axis shows the difference between the average value of that feature in the high molFOI vs. low molFOI groups; points greater than zero were expressed more in the high molFOI group and points less than zero were expressed more in the low molFOI group. The y-axis shows the −log10 of the adjusted p values; points above the dashed line were significant. Significant features are labeled. c Raw data from FcγRIIIAV binding to the c-terminal of CSP, one of the features that was significant in this analysis (two-sided Mann–Whitney test with Benjamini–Hochberg correction for multiple comparisons; W = 112, uncorrected P = 0.00018, corrected P = 0.015; confidence interval not applicable). d Raw data from FcγRIIIAV binding to RH5, the other significant feature in this analysis (two-sided Mann–Whitney test with Benjamini–Hochberg correction for multiple comparisons; W = 127, uncorrected P = 0.00068, corrected P = 0.030; confidence interval not applicable). In (c, d), each point represents a single participant. Violin plots represent the density of the distribution, with horizontal lines representing the 25th, 50th, and 75th percentiles. CSP circumsporozoite protein, molFOI molecular force of infection, RH5 reticulocyte-binding protein homolog 5.




