Evolution and spillover dynamics of yellow fever at the forest–urban interface in Brazil

Categories: Disease & Virus

March 12, 2026

Ethics approval

Only naturally deceased NHP carcasses were sampled for YFV surveillance. The surveillance protocol for dead NHPs was approved by the Ethics Committee for the use of Animals in Research, Instituto Adolfo Lutz (nos. 0135D/2012 and 020 G/2014), and includes work in protected environmental areas.

Study site

Parque Estadual Alberto Löfgren (PEAL; 186 hectares, 40% public use) in an Atlantic Forest fragment in northern São Paulo municipality (775–850 m elevation) contiguous with Cantareira State Park (PEC). It sits within São Paulo Green Belt Biosphere Reserve (UNESCO), part of the Atlantic Forest Biosphere Reserve. Public-use sectors include Horto Florestal, Olaria, Polo Ecocultural and Arboreto Vila Amália, which also contains ~256 residential dwellings and staff housing. Annual visitation exceeded 1.6 million in 2019. For spatial analyses, we grouped management sectors into three zones (A–C, Fig. 1a).

Entomological surveys

Following YFV confirmation at PEAL, we conducted targeted mosquito surveys in 43 days from 22 Dec 2017 to 02 Oct 2018 (40 epidemiological weeks) at 39 points near NHP carcasses. Using personal protective equipment, teams applied a standard protocol with: (1) hand nets and (2) Nasci vacuum aspirators at ground level and (3) CDC light traps baited with CO₂ (dry ice) at ground and canopy levels. Traps operated from 09:00 to 15:00 and aspirator collections were performed for a fixed duration per point within the same period. To capture mosquito–host interface zones, forest edges, peridomestic areas and forest trails were sampled.

Captured mosquitoes were frozen alive in liquid nitrogen and stored at −70 °C in labelled cryotubes. For each collection we recorded method, stratum (ground/canopy), coordinates, date, number of tubes and collection period. Specimens were identified on a cold table with standard taxonomic keys^62,63,64. Females from the same place/date were pooled (1–10) by species (or genus if needed). Raw, georeferenced records are provided in Supplementary Table 1. Temporal and spatial sampling layouts, genus composition and relative abundance are shown in Supplementary Fig. 1. Abundance summaries stratified by collection method and height are shown in Extended Data Fig. 1.

All statistical analyses, data wrangling and visualization were performed in R (v.4.3.2)⁶⁵ using tidyverse, ggplot2, lubridate and related packages unless otherwise specified. Additional packages used for specialized tasks are cited at first mention.

Meteorological data for São Paulo municipality

Hourly total precipitation (tp), 2-m air temperature (T) and 2-m dewpoint temperature (d) for 1 Jan 2017–31 Dec 2018 were retrieved from the ERA5-Land reanalysis dataset provided by Copernicus Climate Change Service (C3S) (0.1⁰ resolution)⁶⁶. ERA5 temperatures (Kelvin) were converted to °C and precipitation (m) to mm. Relative humidity (rh) was computed using the Magnus approximation (parameters a = 17.625, b = 243.04; temperatures in °C):

$$\mathrm{rh}=100\times \exp (d\times a)/(d+b)\exp (T\times a)/(T+b)$$

(1)

ERA5-Land was overlaid on LandScan annual population rasters (1/120° resolution^67,68) using geobr (v.1.8)⁶⁹ municipality boundaries (majority-area tule when borders intersected a cell). Each LandScan cell inherited the nearest ERA5-Land grid-point values (nearest-neighbour mapping), yielding a per-cell table with administration unit, population and hourly meteorological variables for each pixel. Population-weighted hourly series were computed for São Paulo municipality and summarized into daily and then monthly indices (temperature minima/means/maxima; relative humidity minima/means/maxima; precipitation sum of daily totals) (Supplementary Table 4). These indices approximate meteorological conditions experienced across São Paulo municipality (including PEAL) during the study period.

Association between vector abundance and climate factors

Monthly counts of H. leucocelaenus were modelled with negative-binomial GLMs. Covariates were z standardized, and predictors included contemporaneous and 1-month-lagged temperature (min/mean/max), relative humidity (min/mean/max) and cumulative rainfall. Unless specified, all hypotheses were two-sided and interpreted at alpha = 0.05. Model selection used the Akaike information criterion (AIC). Results are reported as rate ratios (RR) with two-sided 95% confidence intervals (CIs) and pseudo-R². Collinearity (particularly between temperature and rainfall) (Supplementary Fig. 2) was assessed and, where indicated, addressed via interaction/quadratic terms. Models were fitted using MASS::glm.b, and correlations were summarized with corrplot⁷⁰ and regression tables generated with stargazer⁷¹ (Supplementary Tables 5 and 6). An identical framework was applied to the four most frequent genera at PEAL (Haemagogus, Aedes, Limatus and Culex) (Supplementary Table 7). Coefficients are reported as RR per 1 °C and per unit rainfall, with 95% CIs and pseudo-R².

NHP carcass collection and notification

On Oct 9, 2017, the first confirmed YFV case at PEAL was detected in a southern brown howler monkey (A. guariba). During the ensuing outbreak, carcasses were collected under routine wildlife surveillance and samples were collected for virological testing. For each carcass, we recorded detection date, coordinates and decomposition stage (early-stage ≤48 h, medium/advanced >48 h post mortem) based on external morphology and soft-tissue preservation; no carcasses in desiccated/remains stages were collected (≥144 h post mortem) (Supplementary Table 10). Notifications were filed by the Wild Animal Management Center (CeMaCas) and reported to the São Paulo State Health Secretariat via Information System for Notifiable Diseases (SINAN), following Brazilian Ministry of Health guidelines²⁴.

Nucleic acid isolation and YFV RT–qPCR

Specimens were tested for YFV by RT–qPCR in accordance with the YFV Brazilian National Surveillance Program²⁴. For NHPs, liver fragments were homogenized with magnetic beads (Magna Lyser, Roche), and RNA was extracted from 700 μl of homogenate (QIAamp RNA Blood mini kit, QIAGEN) following manufacturer instructions. Mosquito pools were homogenized in 700 μl phosphate-buffered saline (PBS, 0.75% bovine albumin, penicillin 100 U ml⁻¹ and streptomycin 100 μg ml⁻¹) using the MagNA Lyser (Roche) and RNA extracted (QIAamp Viral RNA mini kit, QIAGEN). Viral detection was conducted using the GoTaq 1-Step RT–qPCR (Promega) at Instituto Adolfo Lutz Virology Centre as previously described⁷², with standard RT–qPCR cycling conditions: reverse transcription (45 °C for 10 min), enzyme activation (95 °C for 10 min), followed by 40 cycles (95 °C for 15 s and 60 °C for 45 s for annealing/extension) on an ABI7500 Real-Time PCR (Thermo Fisher).

Species‑specific C
_t analysis across mosquito species

To compare viral RNA abundances across mosquito species, we analysed C_t values from YFV-positive pools of non-blood-fed females, combining 9 observations from this study with 109 observations from three Brazilian datasets^5,73,74. Species names were harmonized (grouping H. janthinomys and H. capricornii as females are morphologically indistinguishable⁷⁵); assay protocols^72,76,77, date, municipality and pool size were recorded. We compared H. leucocelaenus against other species within each assay using two-sided Mann–Whitney/Wilcoxon tests with Benjamini–Hochberg FDR adjustment across pairs. To account for assay and pool size, we fitted a linear model with C_t ≈ pecies + assay + pool size (Extended Data Fig. 2). Statistical tests were conducted in R and results tidied with broom⁷⁸. The source data (species, assay, pool size, pool-level C_t, references) are provided as Supplementary Table 2.

H. leucocelaenus YFV natural infection rates

We estimated mosquito infection rates using (1) minimum infection rate (MIR; per 1,000) and (2) a pooled-binomial maximum-likelihood approach accounting for variable pool sizes⁷⁹. To contextualize PEAL, we compiled Brazilian studies testing ≥10 pools per species and extracted MIRs for H. leucocelaenus (n = 7), H. janthinomys (n = 5) and ‘other species’ (n = 4, including Sabethes soperi, Sabethes chloropterus, Aedes scapularis and Aedes taeniorhynchus)^5,32,80 (Extended Data Fig. 3). Species-level MIR distributions were compared using Kruskal–Wallis followed by Benjamini–Hochberg FDR-adjusted two-sided pairwise Wilcoxon rank-sum tests. The source data (species, municipality, number of pools, total mosquitoes and pool sizes (when available), MIR, dates and references) are provided as Supplementary Table 3.

Viral metagenomics

A schematic of laboratory and phylogenetic analyses is shown in Supplementary Fig. 4. YFV-positive NHP tissues and mosquito pools were sequenced with a validated SMART-9N metagenomic sequencing protocol^9,10,18. Briefly, 44 μl of extracted viral RNA were DNase treated (TURBO DNase, Thermo Fisher), cleaned up and concentrated to 10 μl (RNA Clean & Concentrator-5, Zymo). Extraction blanks and no-template controls were included at each step to monitor contamination. MinION libraries were prepared from 50 ng of double-tagged cDNA per sample, barcoded and pooled equimolarly using the EXP-NBD104 (1–12) and EXP-NBD114 (13–24) Native Barcoding kits (Oxford Nanopore Technologies (ONT)), and sequenced with the SQK-LSK109 sequencing kit on fresh FLO-MIN106 (R.9.4.1) flowcells (ONT). Three 24-plex libraries were run on a GridION (ONT) under MinKNOW 1.15.1 (ONT) using a standard 48-h script.

Tiled‑amplicon sequencing and cross‑method benchmarking

A subset of 56 YFV-positive NHP and mosquito samples was sequenced with a validated multiplex tiled-amplicon protocol (YFV500/V1)⁸¹ designed during the early stages of the outbreak (April 2017)⁹. Overlapping 500 bp amplicons spanning the coding region of the YFV South American genotype I outbreak clade were generated and sequenced on GridION (ONT). We compared C_t values with paired differences in consensus sequence coverage between tiled-amplicon vs viral metagenomics, fitting a linear model (stats v.3.6.2 package) to calculate the C_t at which the methods perform equivalently (Supplementary Fig. 3). Source data can be found in Supplementary Table 8. Coverage profiles and recurrent amplicon dropouts for the YFV500/V1 multiplex primer scheme are shown in Extended Data Fig. 4.

Consensus genome generation

FASTQ files were demultiplexed and adapter trimmed (Guppy v.5.0.16, ONT), mapped to YFV South American genotype I (BeH655417; GenBank accession number JF912190) with minimap2 (v.2.28)⁸², then processed with SAMtools (v.1.20)⁸³. Raw read counts and fragment N₅₀ were summarized with NanoStat (v.1.1.2)⁸⁴. Genome visualization, mapped-read counts and depth profiles were inspected with Tablet (v.1.17.08.17)⁸⁵ and SAMtools (v.1.20)⁸³. Variants were called using Medaka v.1.12.1 (ONT) and consensus sequences generated with margin_cons Medaka v.1.12.1 (ONT); regions with C_t values and metadata are provided in Supplementary Tables 8 and 9.

Host and vector identification and detection of other viral pathogens from metagenomic sequencing data

Despite DNAse treatment to remove background DNA in the SMART-9N protocol, sufficient host and vector reads remained to confirm species. Demultiplexed FASTQs were mapped with minimap2 (v.2.28)⁸² to H. leucocelaenus (GenBank accession number NC057212.1) and A. guariba (GenBank accession number NC_064186.1). Alignments were processed with SAMtools (v.1.20)⁸³ and consensus fragments were generated as described above. Species identities were verified using BLASTn v.216.0 against NCBI GenBank⁸⁶. To screen for non-YFV viruses, reads were classified with Kraken2 (v.2.1.3)⁸⁷ (RefSeq complete viral genomes⁸⁸), inspected in pavian⁸⁹ and validated by reference mapping (NCBI Viral Genome Resource⁹⁰). We considered a detection as corroborated when mapping showed coherent coverage across the expected genomic region (breadth and depth consistent with library yield) and the signal was absent from concurrent negative controls.

HAV consensus and phylogeny

Reads from NP2754 were mapped to GenBank accession number MG049743.1 and a consensus was generated using the same pipeline (positions with depth 91. For context, we retrieved the closest 100 sequences to NP2754 HAV via BLASTn (v.216.0)⁸⁶ and NCBI Virus (Brazilian sequences >500 bp, Oct 2025)⁹². After excluding outliers with very long branch lengths (AF268396.1, MG181943.1, MZ557007.1) and removing records without country information, sequences were aligned with MAFFT v.7 (–auto) and end trimmed to the longest region shared with the NP2754 HAV consensus. After additional QC with TempEst (v.1.5.3)⁹³, a 97-sequence alignment (6,638 bp, sampling range = 68 years) was used to estimate a maximum-likelihood tree using IQ-TREE2 (v.2.3.66)⁹⁴ with ModelFinder Plus⁹⁵ and node support from UFBoot2 (1,000) and SH-aLRT (1,000) replicates⁹⁶. The final tree was midpoint rooted and is shown in Extended Data Fig. 5.

YFV nucleotide data collation, curation and ML phylogenies

Public YFV sequences were retrieved from GenBank and filtered to include Brazil-origin samples with >70% genome coverage (based on submitter metadata or computed coverage) and exclude laboratory strains, chimaeras and records lacking a verifiable collection date/location. For samples represented by multiple assemblies, we kept the highest coverage or earliest submission and removed duplicates. PEAL sequences from this study were added after the same QC. Coding regions were aligned with MAFFT v.7 (–auto)⁹⁷ and manually curated in AliView (v.1.28)⁹⁸. Sequences with >30% ambiguous sites or obvious frame disruptions were excluded.

From the curated alignment (10,221 bp) we defined three datasets: (1) Brazil (2008–2023; n = 1,063, 13 states, including PEAL; Supplementary Fig. 5); (2) São Paulo (a subset focused on the previously described YFV_SP clade²⁵, including sequences from Goiás, Minas Gerais, São Paulo and Rio de Janeiro collected between 2015–2019; n = 450, including PEAL; Supplementary Fig. 6); and (3) PEAL (n = 88 near-complete and complete PEAL genomes from groups I–III; Fig. 3a). Per-sequence QC metrics and accession numbers of the PEAL dataset are provided in Supplementary Table 9. We inferred ML phylogenies using IQ-TREE (v.2.2.2.6)⁹⁴ with the ModelFinder Plus model selection⁹⁵ and 1,000 ultrafast bootstrap replicates⁹⁶. Clock-like signal was assessed with TempEst (v.1.5.3)⁹³. For the PEAL dataset, we additionally applied Bayesian Evaluation of Temporal Signal (BETS)⁹⁹, computing four independent analyses: strict vs relaxed molecular clock (with log-normally distributed rate variation among branches¹⁰⁰), and exponential vs Bayesian skygrid tree priors¹⁰¹. Including sampling dates in BETS provided strong support for the presence of temporal signal in the PEAL dataset (log Bayes factors > 18) when including sampling dates vs the null model (no sampling dates) in all four clock and tree prior model combinations. Moreover, the relaxed clock with a skygrid tree prior outperformed the relaxed clock with an exponential tree prior (log Bayes factors = 15.5). Alignments, trees and XMLs are provided in Data availability.

Bayesian phylogenetic inference

Time-scaled phylogenies for the São Paulo and PEAL datasets were inferred using BEAST X¹⁰² with BEAGLE v.3 acceleration¹⁰³ under an autocorrelated relaxed clock (log‑normal among‑branch rate variation¹⁰⁰), Bayesian skygrid demographic model (using 47 grid points and cut-off of 4 years for the São Paulo dataset, and 35 grid points and a cut-off of 3 years for the PEAL dataset)¹⁰¹, HKY substitution with among‑site heterogeneity^104,105 and a CTMC reference prior on the clock rate¹⁰⁶. Tip dates were set at the reported precision (day, month or year). Two independent MCMC chains (100 million steps, sampling every 10,000 steps) were run per dataset. Convergence and mixing of the MCMC chains were assessed in Tracer (v.1.7.2)¹⁰⁷ (effective sample size > 200 for all key parameters and visual agreement across duplicate runs). After 10–25% burn-in, runs were combined (LogCombiner v.1.10.5) and summarized as maximum credibility (MCC) trees (TreeAnnotator (v.1.10.5)¹⁰⁸). Time-dependence of evolutionary rates in YFV South American genotype 1 was explored by comparing posterior rates inferred from the PEAL and São Paulo datasets with published data^{9,109,110,111,112} (Supplementary Fig. 7).

Epizootic cluster definition

On PEAL phylogenies, a cluster was defined as a monophyletic group of ≥2 PEAL genomes (NHP or mosquito) with strong support (ML bootstrap >90 and/or posterior probability >0.90). We verified monophyly in the São Paulo and Brazil‑wide ML trees; all remained strongly supported (bootstrap >95) except Cluster D (2 H. leucocelaenus sequences), which we retained on the basis of support within the PEAL analysis (bootstrap = 81). Given the observation that YFV evolutionary rates can be time dependent (Supplementary Fig. 7), cluster dating used the single-season PEAL dataset (root-to-tip r = 0.39). For each cluster we recorded the (1) median date of the cluster most recent common ancestor (MRCA); (2) median date of the parent of the MRCA; (3) midpoint between (1) and (2); and (4) cluster duration, defined as the number of days from the last sampling date to (3) (Fig. 4b).

Modelling YFV transmission dynamics

Model framework

We simulated the PEAL epizootic in A. guariba using an IBM implemented in R (package ‘individual’ (v.1.1.17)¹¹³). Each NHP transitions through susceptible (S), exposed (E), infectious (I) and dead (D) states, assuming 100% lethality in A. guariba at PEAL (approximating observed lethality; Fig. 1b). Upon death, individuals are detected and sampled with sampling probability p_obs, reflecting incomplete ascertainment of all deceased A. guariba. A summary of the model parameterization and sources for the parameters can be found in Extended Data Table 1.

The infection process and the force of infection (FOI)

We assumed density-dependent transmission, such that each susceptible A. guariba experiences a force of infection (FOI) proportional to the number of infectious individuals at time t, I(t). Thus:

$$\lambda (t)=\beta I(t)$$

(2)

where β is the transmission rate. If γ is the average duration of infectiousness, the basic reproduction number R₀ is:

$${R}_{0}=\beta S(t)\gamma$$

(3)

representing the total number of infected A. guariba that a single infected A. guariba infects during the course of their infection, in a population the size of the susceptible A. guariba population in PEAL. Because the model does not explicitly represent mosquito populations, R₀ reflects the combined host–vector–host process (that is, the product of the number of vectors infected by an infectious A. guariba and the number of A. guariba subsequently infected by infectious vectors).

IIP in A. guariba

Following infection, individuals enter an exposed (E) state before becoming infectious. The time spent in the E state is drawn from a gamma distribution and reflects both the IIP of YFV in NHPs which was estimated from ref. ¹¹⁴ (Fig. 4c,d) and the EIP in the vector (see below). In the case of the incubation period of YFV in A. guariba, a gamma distribution was fitted to this data using the Stan (v.2.36.0) probabilistic programming language¹¹⁵. Weakly informative gamma priors were placed on both the shape (a) and scale (b) parameters of the gamma distribution for the IIP, as follows: a ≈ Gamma(1, 0.5); b ≈ Gamma(1, 2). The model was run with 4 chains of 2,000 iterations each, of which the first 1,000 iterations were used as warm up for adaptation, resulting in a total of 4,000 posterior samples. Convergence was confirmed by standard diagnostics (R-hat

EIP in H. leucocelaenus

We reanalysed the experimental data of ref. ¹¹⁶ (Fig. 4a,b) to estimate the EIP for H. leucocelaenus. Briefly, Haemagogus mosquitoes were inoculated with YFV and then made to feed on mice some number of days following inoculation. The proportion of mice that go on to die following Haemagogus feeding is therefore indicative of the proportion of mosquitoes that have completed the EIP and are thus able to successfully transmit YFV upon feeding. A gamma distribution was fitted to the data presented in the paper for the 25 °C experiment (representing the most complete experiment that is nearest to the temperatures measured in PEAL as shown in Fig. 1c). Uniform, minimally informative priors were placed on both the shape (a) and scale (b) parameters of the gamma distribution, with each specified as uniform(0.1, 10). We excluded the results associated with days 0 and 1, which are associated with high mortality and probably represent infection occurring from residual viable YFV from injection into the vector, rather than new infectious YFV virions generated by replicative cycles inside the vector. As above, the model was run with 4 chains of 2,000 iterations each, of which the first 1,000 iterations were used as warm up for adaptation, resulting in a total of 4,000 posterior samples, and convergence was confirmed across chains (R-hat A. guariba becoming infected and (indirectly, through infectious Haemagogus vectors) generating subsequent infections in other NHPs were accurately captured. Following this intrinsic and extrinsic incubation period, the infected A. guariba were then presumed to begin contributing to onward transmission (that is, the FOI term described above).

NHP infectious period, time to death and observation process

We assumed that YFV is 100% lethal in A. guariba and thus, once infectious, an individual NHP eventually goes on to die due to YFV. In our IBM, this delay between infection and death was drawn from a gamma distribution, the parameters of which were estimated from reanalysis of experimental A. guariba infection data presented in ref. ¹¹⁴. As previously, gamma distributions were fitted to this data using the Stan (v.2.36.0) probabilistic programming language. Weakly informative gamma priors were placed on both the shape (a) and scale (b) parameters of the gamma distribution, as follows: a ≈ Gamma(1, 0.5); b ≈ Gamma(1, 2). The model was run with 4 chains of 2,000 iterations each, of which the first 1,000 iterations were used as warm up for adaptation, resulting in a total of 4,000 posterior samples. Convergence of the chains was confirmed (R-hat

Upon death, each NHP was reported with probability p_obs, which represents the probability that a dead NHP was observed, and a notification sent that enabled enumeration and sampling to occur. In practice, we implemented this by drawing a Bernoulli random variable at the time of death for each individual, with parameter p_obs representing the chance the death was recorded:

$${D}_{\mathrm{reported}} \sim \mathrm{Binomial}({D}_{\mathrm{obs}},{p}_{\mathrm{obs}})$$

(4)

Given confirmation of local extinction of the NHP population of PEAL (see also Fig. 1b), this probability was estimated empirically from the collated data for PEAL, specifically by dividing the number of enumerated A. guariba deaths over the modelled time period by the total A. guariba population of PEAL immediately preceding the modelled time period. Detection and reporting of deaths occurred with some delay following death; this delay was estimated empirically from local veterinarian team estimates of the stage of corpse decomposition at detection (that is, the number of days since the NHP had died, see also Supplementary Table 8), and its estimation is described in further detail below. Specifically, a mixture of exponential and gamma distributions was fitted to the data using the Stan (v.2.36.0) probabilistic programming language¹¹⁵. Weakly informative gamma priors were placed on both the shape (a) and scale (b) parameters of the gamma distribution as follows: a ≈ Gamma(1, 1); b ≈ Gamma(2, 2). For the exponential distribution, we placed a uniform(0.1, 10) prior on the rate parameter. As above, the model was run with 4 chains of 2,000 iterations each, of which the first 1,000 iterations were used as warm up for adaptation, resulting in a total of 4,000 posterior samples. Convergence of the chains was confirmed (R-hat A. guariba death and that death being reported and enumerated are shown in Fig. 2c.

Inference of R
₀ and fitting to PEAL timeseries data

To infer the basic reproduction number (R₀) of YFV-infected A. guariba in PEAL during the outbreak, we fitted the IBM to the timeseries of NHP YFV deaths in PEAL. We placed an informative prior on the timing and intensity of importations based on the dated phylogenetic analysis of YFV PEAL dataset (see Fig. 3), constraining possible start dates using the posterior distribution of the parental node of Cluster A (sensitivity analyses using the midpoint of Clade A’s parental node and the MRCA of Clade A as the basis for the prior on the start data are shown in Supplementary Fig. 8). Because the first detected case (NP2067) was phylogenetically unrelated to subsequent transmission (a scenario that was consistent in all three phylogenetic datasets) and to the subsequent sustained outbreak that led to local extinction, we excluded deaths before 15 Nov 2017 (Fig. 1b) from model fitting, treating this early event as a separate, self‑limited introduction.

Daily incidence of reported NHP deaths y(t) were modelled as the Poisson realizations of the expected number of detected deaths, D_reported(t), reported on day t:

$$Y(t) \sim \mathrm{Poisson}({D}_{\mathrm{reported}}(t))$$

(5)

The likelihood for the entire timeseries is therefore given by:

$$\mathrm{Poisson}(Y(t){|}_{D\mathrm{reported}}(t))$$

(6)

A closed-form expression of the likelihood of the observed data given the model and its parameters was not analytically tractable, so we used particle filtering methods to obtain unbiased likelihood estimates¹¹⁷. To generate an estimate of the marginal model likelihood for each parameter combination, we conducted a parameter scan across different parameter combinations, utilizing a bootstrap particle filter with 250 particles. If the expected values of count distributions derived from the modelling framework are equal to 0 when empirically observed data are not 0, this results in particles of 0 weight, which can lead to the particle filter estimating the marginal likelihood to be 0 for all particles, and prevent the bootstrap particle filter from sampling efficiently or appropriately. To mitigate this, we followed the approach of ref. ¹¹⁸ and added a small but non-zero weight for each particle at every observation. This was achieved by adding a small amount of noise (exponentially distributed with mean 10⁻²) to modelled incident death values of 0. For each parameter combination, we ran 10 independent replicates with a bootstrap particle filter (to generate 10 independent estimates of the model likelihood for that parameter combination) and calculated the mean likelihood from these independent replicates. Model fitting was carried out within a Bayesian framework, with weakly informative priors used for the R₀ and epidemic start date. For the prior on R₀, we compiled 11 field-based estimates of R₀ reported in a recent review²⁶. A truncated normal was fitted to the data reported in this review to construct a prior distribution for R₀ (mean ≈ 4.6, s.d. ≈ 2.7). The prior on the epidemic start date was constructed on the basis of the PEAL molecular clock analysis. Specifically, the start date of the outbreak was bounded by Clade A’s MRCA posterior distribution, fitting a Weibull distribution to this posterior and using it as the prior in the transmission model inferential framework.

Transmission generations and epidemic amplification

To interpret cluster sizes in terms of underlying transmission dynamics, we simulated a negative-binomial branching process (offspring mean R, dispersion k) with combined generation time G = EIP (H. leucocelaenus) + EIP (A. guariba). This allowed us to summarize the number of host–mosquito–host generations required to reach each observed cluster size. For the dominant Cluster A, the inferred size was consistent with 1–3 generations. Full details of the inferential framework and its implementation are available in Data availability.