Data and model agents
Six datasets were used to characterize the domestic pig and wild boar populations and outbreak progression. Outbreak records—which included dates, coordinates, host classification (swine or wild boar), and farm type (backyard or commercial)—were retrieved from the WOAH Animal Disease Information System database, covering the period from first detection (10 June 2018) to the end of the first epidemic wave (31 December 2018)42. The study region was restricted to the six southeastern counties (Braila, Calarasi, Constanta, Galati, Ialomita, and Tulcea) where the initial epidemic wave was observed. Village spatial data was retrieved from the Romanian National Agency for Mapping and Real Estate43. Locations of industrial production sites were retrieved from data available through the county-level directorates of the National Sanitary Veterinary and Food Safety Authority44. Village census data was retrieved from the National Institute of Statistics (INSSE)45.
Domestic pig farms were categorized into backyard village herds and commercial sites. Though the two classes differ in scale and structure, commercial sites (n = 55 of 1041 total farms) were not considered separately, given both their limited number and their being principally large integrative herds with low connectivity to backyard herds. Conclusions from field investigations recommended that outbreaks be counted in the form of infected and cleared villages37, resulting in a village being considered as a single large backyard farm and represented through its centroid.
Wild boar habitat patches were characterized through both CORINE Land Cover satellite imagery46 and ENETWILD wild boar density estimates47. Interconnected hexagonal patches, spanning the full study area, were sized to match estimated Romanian wild boar home ranges (25 km2)32. For each patch, forest coverage was computed as the proportion of the total area classified as forest, while wild boar density was computed as the mean density across all 4 km2 density grid cells contained within each patch.
Infectious periods were defined through both field expertise and temporal clustering of cases. Sequential cases within a village were considered to be part of the same outbreak if they occurred less than n weeks apart, where n represents the assumed duration of post-detection infectiousness. Expert opinion from field investigations suggested a period of outbreak infectiousness of 2–4 weeks, where cases detected within that period within a village were considered to be part of the same outbreak23. Therefore, a village could have multiple outbreaks when gaps between detections exceeded this threshold. To identify an appropriate assumption for infectiousness duration that minimizes spurious re-infections (i.e. those artificially created from definition constraints) while remaining sensitive to true re-infection occurrences, we evaluated a range of candidate durations and applied an elbow method heuristic to select the value prior to diminishing returns of increasing infectiousness duration (Supplementary Fig. 12). Here, a value of three weeks was identified and subsequently used to compute the individual infectious periods for each village. Wild boar patches were assumed to be infectious for six weeks following case detection48.
To assess the potential influence of within-village heterogeneity, we explored the association between village population size and infectious period duration, considering village population a proxy for family counts and thus farm counts. Only a weak relationship was identified (Spearman’s = 0.10), suggesting that village size did not explain the variation observed in the infectious period durations. Given the absence of additional within-village farm data, within-village dynamics were not explicitly modeled.
In the model, each agent is either a domestic pig farm (denoted i if infectious or j if susceptible) or a wild boar patch (denoted p if infectious or q if susceptible). Units are generically referred to as source unit \(a\in \{i,p\}\) and target unit \(b\in \{j,{q}\}\), depending on the infection state and host class.
Infection process
The infection state process for each agent followed a modified SIR (susceptible, infectious, recovered) framework. To reflect the surveillance process, the infectious state was partitioned into undetected and detected compartments. Domestic pig farms transited through susceptible, infectious, detected (and still infectious), and recovered states, while wild boar patches transited through susceptible, infectious, and detected states. Latency was not considered as the model used weekly timesteps and the mean incubation period for the ASFV genotype II strain-of-interest was less than one week49,50,51. Re-susceptibility was not considered, as re-infection events were uncommon (10% of farms), and accounting for this was unsupported by the data.
Each susceptible unit \(b\in \{\,j,{q}\}\) received infectious pressure from internal sources \(a\in \{i,p\}\) plus external sources (representing long-distance movements or cross-border introductions), yielding a total of six forces of infection. The total force of infection on susceptible unit b at time t, given by \({\lambda }_{b}\left(t\right)\), was computed as:
$$\begin{array}{c}{\lambda }_{b}\left(t\right)={\phi }_{b}\left({\sum }_{a}{\lambda }_{{ab}}(t)+{\lambda }_{{xb}}(t)\right)\end{array}$$
(1)
Where \({\phi }_{b}\) represented the relative susceptibility of target unit b, which was given by the assumed timing of the winter holiday slaughter period for farms and by habitat suitability for patches; \({\lambda }_{{ab}}\left(t\right)\) represents the total force of infection from all units a onto unit b at time t; and \({\lambda }_{{xb}}\left(t\right)\) represents the total force of infection from external sources onto unit b at time t. The individual infectious pressure from each source of infection a onto susceptible unit b at time (t) was computed as:
$$\begin{array}{c}{{{{\rm{\lambda }}}}}_{{ab}}\left(t\right)={{\mathbb{1}}}_{{a}_{{\mathrm{inf}}}}\left(t\right)\cdot {{\mathbb{1}}}_{{b}_{{\mathrm{sus}}}}\left(t\right)\cdot {{\mathbb{1}}}_{a{b}_{{\mathrm{contact}}}}\cdot {\psi }_{a}\cdot {\beta }_{{ab}}\end{array}$$
(2)
Where indicators \({{\mathbb{1}}}_{{a}_{{\mathrm{inf}}}}\left(t\right)\), \({{\mathbb{1}}}_{{b}_{{{\mathrm{sus}}}}}\left(t\right)\), and \({{\mathbb{1}}}_{a{b}_{{{\mathrm{contact}}}}}\) indicated if unit a was infectious at time t, unit b was susceptible at time t, and units a and b were in contact, respectively; \({\psi }_{a}\) represented the relative infectivity of source unit a, modeled in the same manner as \({\phi }_{b}\); and \({\beta }_{{ab}}\) represented the weekly transmission rate from source unit a onto target unit b for unit class pair \({ab}\). For farms, both the relative infectivity and susceptibility were adjusted to reflect the shift in transmission dynamics that would be presumed to occur from a mass cull. This parameter took effect in model week 24, corresponding to the start of December, to capture both the official slaughter date (December 20) and account for preceding preparatory activities.
The external force of infection from external sources x upon unit b was computed as:
$$\begin{array}{c}{\lambda }_{{xb}}\left(t\right)={\beta }_{{xb}}\left(t\right)\end{array}$$
(3)
where \({\beta }_{{xb}}\left(t\right)\) represented the weekly transmission rate from external sources onto unit b at time t. The external force of infection was applied dynamically per county, determined by the estimated first week of infection in a given county and adjusted by the assumed detection delay for the host class.
Transitions among infection states were simulated using a classical τ-leap approximation52, in which discrete-time transition probabilities were derived from exponential waiting times and applied at fixed time steps. For susceptible unit \(b\in \{\,j,{q}\}\), infection occurred with the probability:
$${p}_{{\mathrm{inf}} \!,b}\left(t\right)=1-\exp \left(-{{{{\rm{\lambda }}}}}_{b}\left(t\right)\cdot {dt}\right)$$
(4)
Where \({\lambda }_{b}\left(t\right)\) was the total force of infection on unit b at time t (as defined in Eq. 1), and \({dt}\) represents the fixed simulation time step (set at 0.1, one-tenth of a week). To account for multiple potential sources of infection, a Bernoulli trial was independently evaluated for each transmission route at each time step. If infection of the target unit was successful, the responsible transmission route was sampled proportionally to the normalized probabilities of successful transmission. Upon infection, the unit entered the infectious-undetected state.
Transition from the infectious-undetected to infectious-detected state was governed by the detection rate \(\sigma\), and assumed to follow an exponential distribution. The corresponding probability of passive detection at each τ-leap time step for unit a, given its unit class, was:
$${p}_{{\mathrm{det}},{{\mathrm{passive}}}}=1-\exp \left(-{{{{\rm{\sigma }}}}}_{{a}}\cdot {dt}\right)$$
(5)
For farms under active surveillance, defined as those within a 10 km radius of a detected farm as per national regulation24, the detection probability was increased by a multiplicative parameter \({{{\rm{\alpha }}}}\). The resulting probability of active detection at each time step was:
$$\begin{array}{c}{p}_{{\mathrm{det}}\!,{{\mathrm{active}}}}=1-\exp \left(-{{{{\rm{\sigma }}}}}_{a}^{{{\mathrm{farm}}}}\cdot \alpha \cdot {dt}\right)\end{array}$$
(6)
Assuming that all cases are eventually detected, transition to the recovered state for detected units was governed by the recovery rate \(\gamma\), specific for unit class and also assumed to follow an exponential distribution, and given by the probability:
$$\begin{array}{c}{p}_{{{\mathrm{rec}}}}=1-\exp \left(-{{{{\rm{\gamma }}}}}_{{{a}}}\cdot {dt}\right)\end{array}$$
(7)
No recovery was assumed for wild boar patches, based on evidence regarding carcass persistence, imperfect carcass detection, and environmental contamination durations53,54.
Model structure and variant formulations
Variations of multiple key structural parameters were tested to identify which parameter formulation best-captured the observed epidemic, including contact structure, transmission pathway formulation, and transmission rate formulation (Supplementary Table 2).
Contact from farms to patches was defined as either zero-order (contacting only to the patch containing the farm) or first-order (contacting both the patch containing the farm and its immediate neighbors). Similarly, contact from patches to farms was defined as either zero-order (contacting farms within the patch only) or first-order (including farms in adjacent patches). Contact among farms remained fixed at 10 km, and patch-to-patch contact was limited to first-order adjacency.
Each transmission pathway (ij, pj, iq) was evaluated under both frequency-dependent and density-dependent assumptions, with the exception of patch-to-patch (pq) transmission, which was always frequency-dependent due to the near-uniform six-neighbor configuration of the hexagonal patch network.
Transmission rate structures were formulated to reflect the spatial heterogeneity seen in the observed county-level trajectories. For farm-to-farm transmission (\({{{{\rm{\beta }}}}}_{{ij}}\)), both a uniform transmission rate across all counties and a transmission rate stratified into high, medium, and low tiers were used. The tiers corresponded to the total number of infected farms in the observed data, with Tulcea (n = 106) and Braila (n = 87) classified as high-transmission, Constanta (n = 51) and Ialomita (n = 42) as medium, and Calarasi (n = 20) and Galati (n = 8) as low.
For patch-to-patch transmission (\({{{{\rm{\beta }}}}}_{{pq}}\)), models took either a uniform transmission rate across all counties or followed a two-tier system where counties with a high incidence among wild boar (Tulcea, n = 37; Ialomita, n = 11) were assigned a separate rate.
Lastly, the transmission rate from external sources onto wild boar patches \(({\beta }_{{xq}})\) was either active, where its application varied temporally and was applied to patches within counties aligned to start with the estimated date of first infection, or inactive, where its effect was removed from the model.
These combinations resulted in 256 total models that contained between 9 and 13 estimated parameters, depending on whether specific transmission rates were stratified by county-level incidence and whether an explicit external force of infection upon wild boar was included.
All models used the same state transition framework and initial conditions.
Model initialization
Simulations were initialized by seeding infections in the two wild boar habitat patches that contained the earliest confirmed wild boar cases, which were also the two patches that contained the first domestic pig farm cases. These detections, having occurred on 12 and 16 June 2018 in Tulcea county, were backdated by 6 weeks to account for the estimated detection delay for wild boar carcasses55. The simulation, therefore, started in early May 2018.
Model calibration
To select the most appropriate habitat suitability proxy for wild boar transmission risk, a ROC curve for predicting the occurrence of at least one case in a patch was constructed for both forest coverage and host density estimates.
Following the determination of the metric for habitat suitability, the array of candidate models reflecting all combinations of parameter structures was calibrated to identify which parameter combinations and structural assumptions best replicated the observed epidemic dynamics. Each model estimated the posterior distributions for a set of unobservable transmission parameters, consisting of transmission rates \(\left({\beta }_{{ij}}\,{\beta }_{{pj}}\,{\beta }_{{xj}}\,{\beta }_{{iq}}\,{\beta }_{{pq}}\,{\beta }_{{xq}}\right)\) and relative modifiers for host infectivity \(\left({\psi }_{i}{\psi }_{p},\,\right)\) and susceptibility \(\left({\phi }_{j}{\phi }_{q}\right)\). Contingent on model structure, certain parameters were stratified by county incidence or omitted altogether. Detection and recovery rates were informed from the literature and observed data. Detection in farms was estimated at 1/3 weeks-1 based on outbreak investigations in the Russian Federation56, while detection in patches was set at 1/6 weeks-1 based on estimates for a large wild boar population55. Recovery rates for farms were computed empirically from the observed epidemic data and defined per county, per time period (split between the first and second halves of the model period), while patches (if recoverable, as in the alternate scenario) were assigned a fixed rate of 1/3 weeks-1.
Numerical parameterization occurred through approximate Bayesian methods using a sequential Monte Carlo algorithm (ABC-SMC), implemented via the adaptive population Monte Carlo (APMC) methodology57.
At each calibration step, 200 parameter sets (“particles”) were drawn from uniform prior distributions. Epidemic trajectories were generated per particle for each host population and summarized as the incidence counts per host, per county, per week—for a total of 432 summary statistics. To avoid over-weighting statistics with high variance, the contribution of each statistic to the overall distance was normalized by its variance, ensuring compatibility among the different magnitudes of the individual summary statistics. The final distance between the simulated and observed data for the particle was then computed as the maximum absolute distance across all summary statistics.
Particles having simulated distances below a dynamically-updated tolerance threshold were retained, with the proportion of particles accepted at each step used to determine convergence. Retained particles were assigned weights based on their proximity to the observed data and perturbed to augment exploration of the parameter space, and a new set of particles was generated. At each successive generation, the tolerance threshold was reduced, forcing the algorithm to generate simulations increasingly closer to the observed summary statistics. This process continued until the particle acceptance rate fell below 5%, the recommended termination threshold, where limited improvement in convergence would be obtained from further iteration.
Model selection and structural evaluation
All models were ranked by their final Euclidean distance to the observed summary statistics, and a filter was applied to retain only those that succeeded in capturing the observed epidemic trajectory in at least 4 counties. This ensured that models that closely fit the overall summary statistics but failed to reproduce important localized peaks in incidence (a product of the sparsity of the observed incidence in some counties) were excluded. The retained model with the lowest summary statistic distance was then selected as the best-fitting model.
To evaluate the contribution of each structural parameter to overall model fit, a random forest regression model was used with final distance as the outcome variable and the structural parameters as predictors. Corresponding variable importance scores were used to quantify the influence of each parameter. To determine which variables associated with each structural parameter yielded better-fitting models, empirical cumulative distribution functions of the final distance across the values of each structural parameter were visually assessed.
Baseline and alternative management scenarios
To evaluate the impact of enhanced control strategies on epidemic outcomes, both a baseline simulation and four alternative management scenarios—that targeted either improved detection or reduced transmission—were explored through their effect on final epidemic size. For each scenario, five replicates of each conserved particle were run, resulting in a total of 500 simulations per scenario. Results were summarized by the median and 95% credible intervals of the final epidemic size.
Improved passive surveillance among farms was simulated by reducing the average duration of undetected circulation from three to two weeks. Reactive culling was implemented through shortening the infectious period of detected farms to one week, reflecting immediate depopulation upon detection. Preventive culling in response to nearby wild boar case detection was simulated through transitioning all farms within a detected patch to the recovered state upon detection. Environmental sanitation, reflecting clearance of wild boar carcasses, was simulated through patches transitioning to the recovered state following a two-week period of circulation—reflecting the time needed to disinfect the area—rather than remaining infectious indefinitely.
Performance-weighted ensemble modeling
To account for structural uncertainty and to avoid reliance on a single best-fitting model, performance-weighted ensembles from the top model variants (defined as those that successfully captured the observed epidemic trajectory in at least four counties, clustered at the lowest distance, and provided a satisfactory visual epidemic fit) were constructed via a model-averaging approach58. Model weights were computed as the normalized proportion of the predictive error of each model, with the final Euclidean distance between simulated and observed summary statistics serving as the performance marker. This allowed models with a better fit to have a proportionally greater influence on ensemble estimates.
The derived weights were applied to the summary outputs of each model (weekly incidence medians and credible interval bounds, proportion of infections attributable to each transmission pathway, and distributions of final epidemic size under baseline and alternative scenarios), and ensemble estimates accounting for the relative influence of each model were obtained.
Data processing, model implementation, parameterization and analysis were performed in R statistical software version 4.3.3 “Angel Food Cake”59. The sequential Monte Carlo algorithm for model calibration was implemented via the EasyABC package60, and random forest feature selection occurred via the ranger package61. Data manipulation and visualization were performed via the tidyverse suite62.
Reporting summary
Further information on research design is available in the Nature Portfolio Reporting Summary linked to this article.