Predicting schistosome transmission in rural Uganda using water contact data from wearable GPS devices


May 22, 2026

Ethical approval

Data collection and use were reviewed and approved by the Oxford Tropical Research Ethics Committee (509-21), Vector Control Division Research Ethics Committee of the Uganda Ministry of Health (VCDREC146) and Uganda National Council for Science and Technology (HS 1664ES). Written informed consent was obtained for all study participants, with adults consenting on behalf of children and children providing verbal assent.

Study context

In sub-Saharan Africa—the region with the lowest access to safe drinking water—408 million people rely on unsafe sources such as rivers, lakes and ponds⁴⁴. This study focuses on Uganda, where 44 and where the national prevalence of schistosomiasis has been estimated to be 26% using point-of-care circulating cathodic antigen tests¹⁸. Despite over 13 rounds of MDA since 2003, repeated MDA in Uganda has failed to achieve the WHO targets of morbidity control and elimination as a public health problem⁴⁵. The 2022 WHO schistosomiasis guideline highlights the need for multifaceted control, such as combining MDA with behaviour change and WASH, to achieve the WHO 2030 control targets¹³. SchistoTrack collected comprehensive environmental, behavioural and spatial data to inform more integrated, focal schistosomiasis control programmes. The study focused on moderate- and high-endemicity villages in western and eastern Uganda, where the prevalence of schistosomiasis, as measured by stool microscopy, was 43%². The study took place across three Ugandan districts, Pakwach, Buliisa and Mayuge, which are located along the River Nile, Lake Albert and Lake Victoria, respectively²⁵. These areas were selected to represent diverse climates (tropical rainforest and tropical savannah climates), waterbody types (lake and river settings) and tribal and religious groups. All villages were 2.

Study design and participant sampling

The GPS logger study was nested within the SchistoTrack cohort study baseline, which took place between January and February 2022 and enrolled 1,459 households across 38 villages (n = 2,885), approximately 40 households per village². Participants were randomly selected from village registers or MDA records (see Puthur et al.²⁵ for details). Of those 38 villages, 12 (four in each district) were selected for the GPS logger study based on their levels of open-water contact and occupational fishing, determined predominantly by the presence of a beach or landing site. None of the selected villages had piped water (that is, individuals had no access to safe drinking water within their households). A maximum of 50 participants from 25 households per village were selected from among SchistoTrack participants, with a target of one adult–child pair per household (adults defined as people ≥18 years of age and children as those aged 5–17 years)²⁵. This study excluded children aged

Household survey data

Before the GPS logger study, questionnaires covering sociodemographic, biomedical, WASH and environmental variables and self-reported open-water contact patterns were administered to the household heads of all 1,459 households in the larger SchistoTrack study. Questionnaires were translated into local languages and administered digitally by two surveyors (one recruited centrally from Kampala and one recruited from within the district) trained by G.F.C. and M.T.E. Data were collected on a tablet (Lenovo TB-8505F) using the Open Data Kit platform (Open Data Kit Collect application version 2022.4). To ensure data quality, questionnaires were piloted through field tests and mock interviews; response constraints were implemented to prevent implausible values as far as possible; and standardized interview protocols ensured that surveyors followed consistent procedures. At the end of the household interview, the household head selected one adult and one child per household for clinical assessments. A subset of these clinical participants was subsequently selected for the GPS logger study.

Community sensitization, consent and remuneration

G.F.C., M.T.E. and F.R. conducted community workshops in each village to explain the study objectives and address any questions or concerns. As part of the standard written informed consent process for recruitment into SchistoTrack, consent specific to the GPS logger study was obtained in the local language from adult participants and from the parents or guardians of children under 18 years. Children provided verbal assent and, where possible, written assent was also collected. Upon returning their GPS loggers at the end of the observation period, participants received a nominal remuneration equivalent to about one day of wages (10,000 UGX; ~US$2.73), in addition to any other study incentive.

GPS logger settings and procedures

Wearable GPS loggers (i-gotU GT-600; Mobile Action Technology, Taiwan) were used to record participants' locations at two-minute intervals between 05:00 and 20:00 local time over approximately ten consecutive days. This time window was chosen to capture daytime activities. To save battery, loggers were switched off from 20:00 to 05:00 using scheduled control, as loggers were not recharged in the field by participants or the study team. To further save battery, all loggers were set to motion activation so that they switched off when participants were not moving. Button controls on the devices were disabled so that participants could not switch them off. Devices were placed in waterproof pouches and worn around the neck on a lanyard (Supplementary Fig. 17). Children and adults in the same household always received lanyards of different colours to minimize the risk of accidentally mixing up the loggers. Adults were asked to help children with the day-to-day wear of the loggers. Participants were instructed to wear the devices during all daily activities, including open-water contact, and to remove them only at night.

Participant flow

A participant flowchart is shown in Supplementary Fig. 18. A total of 215 loggers were purchased for the study, with the aim of distributing 200 loggers in each of the three districts sequentially (that is, 600 in total). The total number of recruited participants was 585, slightly below the target of 600, mainly because of device failure and unreturned loggers in Pakwach. The first day of recording was excluded for all participants to reduce the Hawthorne effect⁴⁶; after further excluding participants with fewer than two complete days of GPS data, the final analytic sample comprised 452 participants.

GPS logger testing and validation

We tested the accuracy of the GPS loggers in situ. During fieldwork, ten loggers were placed next to each other on the ground for 10 min in an unobstructed area in Buliisa to estimate GPS position error in the field. GPS precision was estimated from these data using the uere function in the ctmm package in R⁴⁷; based on this experiment, we estimated that the GPS loggers were accurate to within 8.6 m. Our estimate of 8.6 m aligned with a previous study, which estimated the accuracy of the i-gotU GT-120 under different levels of obstruction and found that the mean location error was 48.
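As a rough illustration of the idea behind a stationary-logger test, the Python sketch below computes a simpler quantity than the one used in the study: the pooled two-dimensional root-mean-square deviation of fixes from each logger's own mean position. It assumes the fixes have already been projected to planar coordinates in metres, treats each logger's mean position as a stand-in for its unknown true location, and is not a reimplementation of ctmm's UERE estimation. The function name and example data are hypothetical.

```python
import math


def rms_position_error(fixes_by_logger):
    """Pooled 2D RMS deviation (metres) of stationary GPS fixes.

    fixes_by_logger maps a logger id to a list of (x, y) fixes in a projected
    coordinate system (metres). Each logger's own mean position stands in for
    its unknown true location. Simplified stand-in, not ctmm::uere().
    """
    squared_sum, n_fixes = 0.0, 0
    for fixes in fixes_by_logger.values():
        cx = sum(x for x, _ in fixes) / len(fixes)
        cy = sum(y for _, y in fixes) / len(fixes)
        for x, y in fixes:
            squared_sum += (x - cx) ** 2 + (y - cy) ** 2
            n_fixes += 1
    return math.sqrt(squared_sum / n_fixes)


# Hypothetical fixes from two loggers lying side by side.
example = {
    "logger_A": [(0.0, 0.0), (6.0, 8.0)],
    "logger_B": [(10.0, 10.0), (10.0, 10.0)],
}
print(round(rms_position_error(example), 3))
```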

GPS logger data retrieval

For 94 participants, no GPS data could be retrieved because the loggers had suffered water damage, despite the use of protective pouches and the manufacturer's indication that the devices were fully waterproof. Multiple retrieval strategies were used: metal contacts were cleaned with ethanol in the field; loggers were transported to Oxford for a second cleaning attempt; and 36 devices with persistent data loss were opened at the University of Oxford Materials Science Department, where a research technician resoldered the GPS chips onto functioning logger boards. This recovered data from 24 of the 36 affected loggers.

Self-reported wear compliance

Wear comfort was assessed when participants returned their GPS loggers. Because not all devices were returned by the participants themselves immediately after the end of the observation period, these data were available for 87% of participants (395/452). Among those participants, 83% (328/396) reported that wearing the logger was comfortable. Other participants reported that the loggers caused perceived chest pain (n = 34), body weakness (n = 5), nausea (n = 1) or headaches (n = 1); these concerns were addressed through sensitization.

Mapping of village infrastructure, open-water sites and taps and boreholes

As part of the study, F.R., M.T.E. and (initially) G.F.C., assisted by the village chairman or a village health team member, comprehensively mapped the locations of all public taps and boreholes used by the study communities that were relevant for schistosome transmission. As part of this mapping, the type (tap or borehole), GPS location and state of each facility were recorded. Trained malacologists from the Division of Vector-Borne and Neglected Tropical Diseases at the Uganda Ministry of Health, helped by local guides such as village chairmen or community health workers, mapped all open-water sites used by the study communities, including sites outside village perimeters that were accessed by community members. This comprehensive mapping approach ensured the inclusion of all relevant open-water sites used by the study population, regardless of their proximity to residential areas, as described in detail by Iacovidou et al.²⁶.

Deriving open-water contact and tap or borehole usage patterns

A contact event with an open-water site or a tap or borehole was defined as the participant being within a 20 m buffer of the recorded GPS location of that open-water site or tap or borehole. We chose 20 m to account for possible GPS position errors while minimizing overestimation of open-water contact. This buffer was conservative compared with previous GPS logger studies: Eyre et al.²⁷ used a 30 m buffer around the shoreline, whereas Seto et al.³¹ used a 100 m buffer around the shoreline. As a robustness check, we used a larger buffer of 30 m, with the results reported in Supplementary Fig. 14. GPS locations of sites and taps or boreholes were recorded using a tablet (Lenovo Tab M8 (3rd Gen); Lenovo Group; China) with a manufacturer-reported accuracy of 5–10 m, and GPS logger points had an estimated accuracy of 8.6 m, as described above. All contact events with open-water sites and taps or boreholes were identified using recursion analysis, a well-established method in movement ecology^49,50, implemented in the recurse package in R⁵¹. The output of the recursion analysis was a list of all distinct open-water contact and tap or borehole usage events, each with an associated timestamp, number of revisits, contact duration and open-water site or tap or borehole identifier (Fig. 1e). Compared with simply counting the number of GPS points within each buffer, recursion analysis has the advantage of reconstructing an approximate linear trajectory between consecutive points. It can therefore identify visits that are shorter than the GPS logger sampling interval, which is particularly important given that water contacts are typically short²².
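The advantage of interpolating between fixes can be illustrated with a short Python sketch. It is not the recurse package's code; it only shows the same underlying idea: the straight line between two consecutive fixes is intersected with the circular buffer, so entry and exit times can be interpolated even when neither fix falls inside the buffer. It assumes a time-ordered track in planar metres with timestamps in seconds; the names `Visit`, `_inside_fraction` and `recursion_visits` are hypothetical, and recurse's options for merging revisits separated by short absences are not reproduced.

```python
import math
from dataclasses import dataclass


@dataclass
class Visit:
    entry: float  # interpolated time of entering the buffer (s)
    exit: float   # interpolated time of leaving the buffer (s)

    @property
    def duration(self):
        return self.exit - self.entry


def _inside_fraction(p0, p1, centre, radius):
    """Fractions (s_in, s_out) of segment p0->p1 lying inside the circle, or None."""
    (x0, y0), (x1, y1), (cx, cy) = p0, p1, centre
    dx, dy = x1 - x0, y1 - y0
    fx, fy = x0 - cx, y0 - cy
    a = dx * dx + dy * dy
    b = 2.0 * (fx * dx + fy * dy)
    c = fx * fx + fy * fy - radius * radius
    if a == 0.0:  # no movement between the two fixes
        return (0.0, 1.0) if c <= 0.0 else None
    disc = b * b - 4.0 * a * c
    if disc < 0.0:  # the straight line never reaches the circle
        return None
    root = math.sqrt(disc)
    s_in = max((-b - root) / (2.0 * a), 0.0)
    s_out = min((-b + root) / (2.0 * a), 1.0)
    return (s_in, s_out) if s_in < s_out else None


def recursion_visits(track, centre, radius):
    """track: time-ordered list of (t, x, y). Returns a list of Visit objects."""
    visits, current = [], None
    for (t0, x0, y0), (t1, x1, y1) in zip(track, track[1:]):
        seg = _inside_fraction((x0, y0), (x1, y1), centre, radius)
        if seg is None:
            if current is not None:  # left exactly on the boundary
                visits.append(current)
                current = None
            continue
        t_in = t0 + seg[0] * (t1 - t0)
        t_out = t0 + seg[1] * (t1 - t0)
        if current is None:
            current = Visit(t_in, t_out)
        else:
            current.exit = t_out  # still inside from the previous segment
        if seg[1] < 1.0:  # the track exits the buffer within this segment
            visits.append(current)
            current = None
    if current is not None:  # the track ends inside the buffer
        visits.append(current)
    return visits


# Two fixes 2 min apart, passing straight through a 20 m buffer at 1 m/s.
track = [(0, -60.0, 0.0), (120, 60.0, 0.0)]
visits = recursion_visits(track, centre=(0.0, 0.0), radius=20.0)
```

Neither fix in the usage example lies inside the buffer, so point counting would record no contact, whereas the interpolated trajectory yields one visit of about 40 s.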

Because the SchistoTrack mapping protocol did not allow a single site to extend beyond 15 m, directly adjacent locations were sometimes recorded as separate sites. Following Iacovidou et al.²⁶, we grouped such nearby locations into a single open-water site (or a single tap or borehole for adjacent taps and boreholes) for the analysis. The density-based spatial clustering of applications with noise (DBSCAN) algorithm was used, and the optimal number of clusters was determined separately for each district on the basis of the gap statistic⁵². This collapsed the 143 open-water sites and 63 taps and boreholes into 69 and 32 clusters, respectively, and each cluster was assigned the mean GPS coordinates of its member sites.
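The clustering step can be illustrated with a compact, dependency-free DBSCAN in Python. This is a sketch under stated assumptions: site coordinates are in planar metres, and the neighbourhood radius `eps` is passed in directly, whereas the study selected the number of clusters using the gap statistic, which is not shown. Setting `min_samples=1` means no site is discarded as noise, so every mapped site ends up in a cluster; `cluster_centroids` then returns the mean coordinates of each cluster. Function names and the example coordinates are hypothetical.

```python
import math


def dbscan(points, eps, min_samples=1):
    """Plain DBSCAN on (x, y) points; returns one label per point (-1 = noise)."""
    n = len(points)
    labels = [None] * n  # None = not yet visited

    def neighbours(i):
        return [j for j in range(n) if math.dist(points[i], points[j]) <= eps]

    cluster = 0
    for i in range(n):
        if labels[i] is not None:
            continue
        nb = neighbours(i)
        if len(nb) < min_samples:
            labels[i] = -1
            continue
        labels[i] = cluster
        queue = [j for j in nb if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == -1:
                labels[j] = cluster  # former noise point becomes a border point
            if labels[j] is not None:
                continue
            labels[j] = cluster
            nb_j = neighbours(j)
            if len(nb_j) >= min_samples:  # only core points expand the cluster
                queue.extend(nb_j)
        cluster += 1
    return labels


def cluster_centroids(points, labels):
    """Mean (x, y) of the member points of each cluster (noise excluded)."""
    groups = {}
    for p, lab in zip(points, labels):
        if lab != -1:
            groups.setdefault(lab, []).append(p)
    return {
        lab: (sum(x for x, _ in ps) / len(ps), sum(y for _, y in ps) / len(ps))
        for lab, ps in groups.items()
    }


# Hypothetical site coordinates (m): three adjacent shoreline points and a pair.
sites = [(0.0, 0.0), (10.0, 0.0), (20.0, 0.0), (500.0, 500.0), (505.0, 500.0)]
labels = dbscan(sites, eps=15.0)
centroids = cluster_centroids(sites, labels)
```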

Sanitation infrastructure

This analysis did not focus on the use of public latrines, even though it would have been possible to derive this information from the GPS data using the same methods as for open-water site and tap or borehole usage. There were two reasons for this. First, field visits to all public latrines by M.T.E. and F.R. (and initially by G.F.C.) revealed that most were locked (usually a community member held the keys) and were not regularly used by the community. Second, among the 452 GPS logger participants, 343 (75.9%) lived in households that had a private latrine (that is, a flush or pour-flush toilet, a covered pit latrine with or without privacy or a composting toilet). Thus, most households could rely on private latrines, which provide more privacy and convenience than public latrines^53,54. However, private latrine usage events could not be derived from the GPS logger data because the precise locations of these latrines were unknown.

Dataset and variables

Dataset

The dataset used to estimate the spatial decay models was generated by calculating the Euclidean distance from each household to every open-water site and tap or borehole (both used and unused) within the same district (Fig. 1e). This covered distances of up to 11, 17 and 26 km for Pakwach, Buliisa and Mayuge, respectively.
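A minimal Python sketch of how such a household-by-site distance table could be assembled. It assumes coordinates already projected to metres (for example, in a UTM zone) and uses hypothetical dictionaries keyed by identifier. The optional `max_distance` argument is included only to show where a distance restriction, such as the 3,000 m cut-off used later for model fitting, could be applied; by default no restriction is applied.

```python
import math
from itertools import product


def household_site_distances(households, sites, max_distance=None):
    """households, sites: dict id -> (district, x_m, y_m) in a projected CRS.

    Returns (household_id, site_id, distance_m) for every household-site pair
    in the same district, covering used and unused sites alike.
    """
    rows = []
    for (h_id, (h_district, hx, hy)), (s_id, (s_district, sx, sy)) in product(
        households.items(), sites.items()
    ):
        if h_district != s_district:  # only pairs within the same district
            continue
        d = math.hypot(hx - sx, hy - sy)
        if max_distance is None or d <= max_distance:
            rows.append((h_id, s_id, d))
    return rows


households = {"h1": ("Pakwach", 0.0, 0.0), "h2": ("Mayuge", 0.0, 0.0)}
sites = {"s1": ("Pakwach", 3.0, 4.0), "s2": ("Mayuge", 300.0, 400.0)}
rows = household_site_distances(households, sites)
```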

Outcomes

For the models estimating any usage of open-water sites or taps or boreholes, the outcome was a binary indicator of whether an individual used a given open-water site or tap or borehole. For the models estimating duration of usage, the outcome was a continuous variable giving the number of minutes per day that each individual had contact with each mapped open-water site or tap or borehole. This duration was calculated by dividing the total duration of an individual's contact with each open-water site or tap or borehole by the number of distinct calendar days with GPS data for that individual.
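Both outcomes can be derived from the list of contact events with a few lines of Python. The sketch assumes hypothetical inputs: contact events as (individual, site, minutes) tuples, as produced by a recursion analysis, and, for each individual, the set of calendar dates with GPS data. Every individual is paired with every candidate site, so unused sites contribute zeros to both outcomes.

```python
from collections import defaultdict


def usage_outcomes(events, days_with_data, site_ids):
    """Build the binary and duration outcomes for every individual-site pair.

    events: list of (individual_id, site_id, duration_min) contact events.
    days_with_data: dict individual_id -> set of calendar dates with GPS data.
    Returns dict (individual_id, site_id) -> (used, minutes_per_day).
    """
    total_minutes = defaultdict(float)
    visited = set()
    for ind, site, minutes in events:
        total_minutes[(ind, site)] += minutes
        visited.add((ind, site))
    outcomes = {}
    for ind, days in days_with_data.items():
        n_days = len(days)  # distinct calendar days with GPS data
        for site in site_ids:
            used = int((ind, site) in visited)
            outcomes[(ind, site)] = (used, total_minutes[(ind, site)] / n_days)
    return outcomes


events = [("p1", "s1", 10.0), ("p1", "s1", 5.0), ("p1", "s2", 3.0)]
days = {"p1": {"2022-01-10", "2022-01-11", "2022-01-12"}}
outcomes = usage_outcomes(events, days, ["s1", "s2", "s3"])
```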

Covariates

The main predictor variable was household distance to each open-water site or tap or borehole. We selected additional predictor variables based on their relevance for open-water contact. The following individual-level variables were used: age, gender, occupation, self-reported open-water contact, water contact activity match and mobility tercile. Age was coded as a binary variable (2. We also used gender (male or female), as reported by the household head, as a covariate. Individual-level occupation was also included because occupation is an important determinant of open-water contact². The household head (aged ≥18 years) reported the occupation of all household members, categorized as fishing, fishmongering, farming or other. A self-reported open-water contact variable, described in detail in Reitzug et al.², was also included. A binary open-water contact activity match variable was generated for each individual–site pair: an activity match was defined as an individual conducting any of the 11 domestic, recreational or occupational open-water contact activities that the local guide reported as being performed at that specific site. The 11 open-water contact activities were: collecting drinking water; washing clothes with soap; washing clothes without soap; bathing with soap; bathing without soap; washing jerry cans or household items; collecting papyrus; fishing; fishmongering; collecting shells; and swimming or playing. All individuals who reported collecting drinking water were also assigned an activity match with taps or boreholes, as this activity can be conducted at a tap or borehole. Apart from the activity match, the open-water contact activities of individuals were not considered in this analysis, as the GPS data contained no definitive information on the specific open-water contact activity an individual engaged in.
We also used an individual’s mobility tercile as a covariate (see below). At the household level, in addition to distance to the closest sites and taps or boreholes, we used a self-reported variable indicating whether the household used a safe drinking water source. Taps and boreholes were counted as safe, whereas open freshwater sources such as swamps or lakes were counted as unsafe. To account for geographic, cultural, behavioural and environmental factors that differed across study locations, a district-level categorical variable was included in some models. Because households paid different prices for taps than for boreholes (see Results), we also generated a variable indicating whether each public water supply was a tap or a borehole.
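The activity match rule described above reduces to a set intersection for open-water sites and a single check for taps or boreholes. The Python sketch below assumes activities are stored as lower-case strings; the label `DRINKING_WATER` and the function name are hypothetical encodings, not the study's variable names.

```python
DRINKING_WATER = "collecting drinking water"  # hypothetical activity label


def activity_match(individual_activities, site_activities=(), is_tap_or_borehole=False):
    """1 if the individual reports an activity also reported at the site, else 0.

    individual_activities: activities the individual reports (any of the 11).
    site_activities: activities the local guide reported at this open-water site.
    Taps and boreholes count as a match only for collecting drinking water.
    """
    if is_tap_or_borehole:
        return int(DRINKING_WATER in individual_activities)
    return int(bool(set(individual_activities) & set(site_activities)))


person = {"fishing", "bathing with soap"}
print(activity_match(person, {"fishing", "swimming or playing"}))
```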

Schistosome reinfection

To test associations between water contact and (re)infection intensity, we used schistosome infection data from the SchistoTrack study. All participants were tested for schistosome infection using Kato–Katz microscopy at baseline, four to five weeks later following treatment with praziquantel, and at one-year follow-up. Baseline testing took place before the GPS loggers were handed out, and the study team was unaware of the Kato–Katz results. Treatment with praziquantel was given irrespective of infection status, and participants remained unaware of their infection status throughout the study. Infection intensity was measured in EPG of stool. The one-year post-treatment measurement was chosen because this is the MDA interval in endemic areas and is frequently used as the time frame for assessing reinfection^13,55,56. Kato–Katz results were missing for 0.4% (2/452) of participants at baseline, 3.1% (14/452) at treatment follow-up and 20% (91/452) at one-year follow-up. For these individuals, we imputed reinfection intensity from baseline infection intensity using multivariate imputation by chained equations (MICE), implemented in the mice package in R⁵⁷.
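The mice package fits a sequence of conditional models and produces multiple imputed datasets. The Python sketch below is a deliberately simplified, single-imputation analogue for one incomplete variable: stochastic regression imputation of one-year EPG from baseline EPG. The log(EPG + 1) scale, the single imputation and the function name are assumptions made for this illustration, not details taken from the study. The sketch needs at least three complete pairs with some variation in baseline EPG.

```python
import math
import random


def impute_followup_epg(baseline, followup, seed=2022):
    """Single stochastic regression imputation of follow-up EPG (simplified).

    Fits log1p(follow-up) ~ log1p(baseline) on complete pairs, then fills each
    missing follow-up value with the prediction plus Gaussian residual noise.
    Values with missing baseline stay None (full MICE would impute both).
    """
    rng = random.Random(seed)
    pairs = [(math.log1p(b), math.log1p(f))
             for b, f in zip(baseline, followup) if b is not None and f is not None]
    n = len(pairs)
    mean_x = sum(x for x, _ in pairs) / n
    mean_y = sum(y for _, y in pairs) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in pairs)
    slope = sum((x - mean_x) * (y - mean_y) for x, y in pairs) / sxx
    intercept = mean_y - slope * mean_x
    resid_sd = math.sqrt(
        sum((y - intercept - slope * x) ** 2 for x, y in pairs) / (n - 2))
    imputed = []
    for b, f in zip(baseline, followup):
        if f is not None or b is None:
            imputed.append(f)  # observed, or not imputable in this sketch
            continue
        draw = intercept + slope * math.log1p(b) + rng.gauss(0.0, resid_sd)
        imputed.append(max(0.0, math.expm1(draw)))  # back-transform; EPG >= 0
    return imputed


baseline = [0, 24, 48, 96, 120, None]
followup = [0, 24, None, 72, 96, None]
completed = impute_followup_epg(baseline, followup)
```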

Human mobility measures

We aimed to compare human mobility patterns across different activity types: visits to open-water sites or taps or boreholes, and overall movement. To do so, we selected the radius of gyration R_g,i as our metric, as it measures the typical Euclidean displacement of individual i from their household location and is widely used in human mobility modelling^58,59,60. This choice ensured consistency with our use of Euclidean distances for calculating household distances to visited open-water sites and taps or boreholes. R_g,i was computed using the following formula:

$${R}_{{\rm{g}},i}=\sqrt{\frac{1}{N}{\sum }_{l=1}^{N}{({\vec{r}}_{l}-{\vec{r}}_{{\rm{h}}})}^{2}}$$

(1)

where ${({\vec{r}}_{l}-{\vec{r}}_{{\rm{h}}})}^{2}$ is the squared Euclidean distance between recorded location l and the household location h, and N is the number of recorded locations for individual i.
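Equation (1) translates directly into code. Note that, like equation (1), this version is centred on the household rather than on the individual's centre of mass, as in the more common definition of the radius of gyration. The Python sketch assumes GPS fixes and the household location are given as planar coordinates in metres; the function name is hypothetical.

```python
import math


def radius_of_gyration(points, home):
    """Equation (1): root-mean-square distance (m) of an individual's GPS fixes
    from the household location. points: list of (x, y); home: (x, y)."""
    hx, hy = home
    squared = [(x - hx) ** 2 + (y - hy) ** 2 for x, y in points]
    return math.sqrt(sum(squared) / len(squared))


fixes = [(0.0, 0.0), (6.0, 8.0)]  # hypothetical fixes: at home, then 10 m away
print(radius_of_gyration(fixes, home=(0.0, 0.0)))
```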

Spatial decay models

We modelled open-water site and tap or borehole usage as a function of Euclidean household distance, motivated by evidence that open-water contact and schistosome infection depend on distance to waterbodies^2,17,23. Our previous work using self-reported open-water contact and assignment of individuals to their closest site suggested that the relationship between water contact and household distance was approximately linear². The modelling framework proposed here differs from previous studies in that it estimates the usage of each open-water site and tap or borehole, rather than only the closest site. Our spatial decay models take the general form:

$$P({\mathrm{response}}_{i,k}=1)={b}_{0}\times f({d}_{i,k})$$

(2)

where P is the probability that individual i uses a specific open-water site or tap or borehole k, and f(d_i,k) is the spatial decay function, with d_i,k the distance from the household of individual i to k. The binary outcome (response_i,k) is modelled using a Bernoulli distribution with an identity link function so that probabilities are represented directly. All spatial decay models were fitted as Bayesian nonlinear regression models.
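A Python sketch of equation (2) and the Bernoulli log-likelihood it implies, for three candidate decay shapes. The exact parameterizations used in the study are listed in Supplementary Table 7; the forms below (exp(-b1 * d), (1 + d)^(-b1), and a Hill curve with hypothetical parameters `half_dist` and `n`) are assumptions for illustration. The study fitted these models in brms with priors and compared them using ELPD; this code only shows how a given parameter set maps distances to usage probabilities and to a likelihood.

```python
import math


# Assumed decay shapes f(d); the study's exact forms are in Supplementary Table 7.
def exponential_decay(d, b1):
    return math.exp(-b1 * d)


def power_law_decay(d, b1):
    return (1.0 + d) ** (-b1)  # the +1 avoids a singularity at d = 0


def hill_decay(d, half_dist, n):
    return 1.0 / (1.0 + (d / half_dist) ** n)  # equals 0.5 at d = half_dist


def usage_probability(d, b0, decay, **params):
    """Equation (2) with an identity link: P(response = 1) = b0 * f(d)."""
    p = b0 * decay(d, **params)
    if not 0.0 <= p <= 1.0:
        raise ValueError("b0 * f(d) must lie in [0, 1]")
    return p


def bernoulli_log_lik(outcomes, distances, b0, decay, **params):
    """Log-likelihood of binary site-usage outcomes given household distances."""
    ll = 0.0
    for y, d in zip(outcomes, distances):
        p = min(max(usage_probability(d, b0, decay, **params), 1e-12), 1 - 1e-12)
        ll += y * math.log(p) + (1 - y) * math.log(1.0 - p)
    return ll


# Hypothetical parameters: P of using a site 250 m from home.
p_250 = usage_probability(250.0, b0=0.3, decay=exponential_decay, b1=0.002)
```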

The sequential model-building process was as follows. First, we determined a suitable spatial decay function f(d) for equation (2). We tested which of an exponential decay or a power-law decay (two decay functions commonly used in human mobility modelling⁵⁸) or a Hill function (which includes additional parameters controlling the curvature) best fit the data (Supplementary Table 7). As the models were fitted as Bayesian nonlinear models, the best decay function was determined using ELPD, a Bayesian leave-one-out measure of out-of-sample predictive fit⁶¹ (see below for more details). Second, we tested whether the inclusion of covariates, added one variable (j in equation (3)) at a time (from district, gender, age (2)).

$$P({\mathrm{response}}_{i,j,k}=1)={b}_{0,j}\times f_{j}({d}_{i,j,k}).$$

(3)

We then extended equations (2) and (3) to evaluate the influence of spatial distance, human mobility and usage of safe water infrastructure on human contact with open-water sites, as shown in Fig. 3. For competing opportunities (Fig. 3b), we drew on Stouffer’s intervening opportunities framework⁶² to develop a spatially explicit model that estimates the decay separately for the first, second, third, …, nth closest site. As 3c) were assessed by estimating decay separately per mobility tercile, represented by j in equation (3). Crowding out (Fig. 3d) was assessed using equation (5).

Analogously to the model predicting the probability of open-water-site usage in equation (2), we used the same modelling approach to predict tap or borehole usage, with P(response = 1) representing the probability of using a tap or borehole. We also used spatial decay models to predict the duration of open-water site and tap or borehole usage. These models took the following form:

$$\,E({t}_{i,k})={b}_{0}\times f({d}_{i,k}).$$

(4)

We modelled equation (4) using a zero-inflated negative binomial model:

$${t}_{i,k} \sim \left\{\begin{array}{cc}0, & \,\mathrm{with\; probability}\,{\pi }_{i,k},\\ \,\mathrm{NegBin}\,({\mu }_{i,k},\phi ), & \,\mathrm{with\; probability}\,1-{\pi }_{i,k},\end{array}\right.$$

where μ_i,k = b₀ × f(d_i,k) is the mean duration for individual i at site k, and ϕ is the dispersion parameter, which allows the negative binomial to model strong overdispersion and a heavy right tail in visit durations. The zero-inflation probability π_i,k accounts for excess zeros beyond those produced by the negative binomial component. Because a very small number of extreme duration values can disproportionately influence the dispersion parameter ϕ, and to obtain parameter estimates more representative of typical open-water contact behaviour, we capped all durations above the 99.9th percentile at that percentile value (durations >139 min d⁻¹ were set to 139 min d⁻¹). This transformation affected only 0.1% of observations and preserved the rank order of almost all data while limiting the undue influence of extreme values on the zero-inflated negative binomial fit. For exponential duration decay models, we used an uninformative prior, Unif(0, 1), for b₁ and a normal prior, Normal(1, 9), truncated to the interval [0, 30], for b₀; the prior on b₀ was chosen to match the mean and standard deviation of the data.
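Two pieces of the duration model can be made concrete in Python: capping durations at the 99.9th percentile and the zero-inflated negative binomial probability mass function. The sketch assumes the NB2 parameterization (variance mu + mu²/phi), which is the one used by brms's negative binomial families, and assumes durations have been rounded to whole minutes, because the negative binomial is defined on non-negative integers. The percentile uses linear interpolation (R's default, type 7). Function names are hypothetical.

```python
import math


def percentile(values, q):
    """Linear-interpolation percentile (R's default, type 7); q in [0, 100]."""
    xs = sorted(values)
    pos = (len(xs) - 1) * q / 100.0
    lo = math.floor(pos)
    hi = min(lo + 1, len(xs) - 1)
    return xs[lo] + (xs[hi] - xs[lo]) * (pos - lo)


def cap_at_percentile(values, q=99.9):
    """Cap the upper tail: values above the q-th percentile are set to it."""
    cap = percentile(values, q)
    return [min(v, cap) for v in values]


def zinb_logpmf(y, mu, phi, pi):
    """log P(Y = y) for a zero-inflated NB2 distribution.

    mu > 0: NB mean (b0 * f(d) in the duration model); phi > 0: dispersion,
    with variance mu + mu**2 / phi; pi: probability of a structural zero.
    y must be a non-negative integer (durations rounded to whole minutes).
    """
    log_nb = (math.lgamma(y + phi) - math.lgamma(phi) - math.lgamma(y + 1)
              + phi * math.log(phi / (mu + phi)) + y * math.log(mu / (mu + phi)))
    if y == 0:  # zeros arise from both the structural and the NB component
        return math.log(pi + (1.0 - pi) * math.exp(log_nb))
    return math.log(1.0 - pi) + log_nb


capped = cap_at_percentile(list(range(1001)))
```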

We also built spatial decay models in which the usage of taps or boreholes reduces the probability and frequency of open-water-site usage (crowding-out models) and compared the predictive performance with that of models without crowding-out effects. For the probability of open-water-site usage with crowding out, the model took the following form:

$$\begin{array}{l}P(\mathrm{water}\,\mathrm{site}\,{\mathrm{usage}}_{i,k}=1)\\ ={b}_{0}\times f_{j}({d}_{i,k})\times \left(1-\alpha \times P(\mathrm{tap}\,\mathrm{or}\,\mathrm{borehole}\,{\mathrm{usage}}_{i})\right)\end{array}$$

(5)

where α quantifies the magnitude of crowding out, such that the probability of open-water-site usage decreases as the probability of tap or borehole usage increases. Again, this probability is estimated for each pair of individual i and open-water site k. A model analogous to equation (5) was used for the duration of open-water-site usage.
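Equation (5) can be written as a one-line function. The Python sketch below assumes an exponential decay, f(d) = exp(-b1 * d), purely for illustration; `alpha`, `p_tap` and the other parameter values are hypothetical. With α and the tap or borehole usage probability both in [0, 1], the crowding-out factor keeps the result between 0 and the no-crowding-out probability b0 × f(d), and α = 0 recovers equation (2).

```python
import math


def p_open_water_crowding_out(d, b0, b1, alpha, p_tap):
    """Equation (5) with an assumed exponential decay f(d) = exp(-b1 * d).

    p_tap: the individual's probability of tap or borehole usage; alpha in
    [0, 1] scales how strongly that usage crowds out open-water-site usage.
    """
    return b0 * math.exp(-b1 * d) * (1.0 - alpha * p_tap)


# Hypothetical values: a site 500 m from home, with and without crowding out.
without = p_open_water_crowding_out(500.0, b0=0.4, b1=0.002, alpha=0.0, p_tap=0.8)
with_crowding = p_open_water_crowding_out(500.0, b0=0.4, b1=0.002, alpha=0.5, p_tap=0.8)
```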

Supplementary Table 7 provides a summary of all of the relevant spatial decay functions and models presented in this study.

Estimating spatial decay models

We used Bayesian nonlinear regression models, implemented using the brms package⁶³ in R version 4.1.0, to fit spatial decay models. Bayesian modelling was used because it allowed us to estimate distributions of the decay parameters and quantify the uncertainty of these parameter estimates by taking posterior draws from these distributions to obtain 95% CrIs. Due to convergence issues in the Bayesian modelling, we restricted our data to all sites within 3,000 m of the households, which captured over 99% of all open-water sites and taps or boreholes used. For all Bayesian models estimated via brms, a Hamiltonian Monte Carlo sampling algorithm via the Stan backend, employing the No-U-Turn Sampler, was used^63,64. Four Markov chains with 2,000 iterations per chain (1,000 warm-up iterations and 1,000 sampling iterations) were employed. Convergence was assessed based on whether the potential scale reduction factor ($\widehat{R}$) was close to 1. Within brms, the plot function was used to assess chain mixing visually and the pp_check function was used to assess agreement between predicted and observed response values.

As far as possible, we used non-informative priors. For instance, in the exponential decay model, we used uniform priors Unif(0, 1) for b₀ and b₁, the most conservative priors possible, as b₀ and b₁ can only take values between 0 and 1. Model fitting enforced P(response_i,j,k = 1) ∈ [0, 1] throughout. Where model outputs were used to assign individuals to open-water sites, predicted probabilities were converted to binary outcomes using the optimal cut-point probability determined by the cutpointr package with the F1 score as the criterion⁶⁵.
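The cut-point search can be sketched as a brute-force scan that, in the spirit of cutpointr's F1 criterion, tries each observed predicted probability as a threshold and keeps the one with the highest F1 score. This Python version is not cutpointr's implementation (which offers further options, such as bootstrapping and tie handling); classifying p ≥ cut-point as a predicted visit is the assumed direction, and the function name is hypothetical.

```python
def best_f1_cutpoint(probs, labels):
    """Return (cutpoint, f1) maximizing F1 when predicting a visit for
    prob >= cutpoint. Candidate cut-points are the observed probabilities."""
    best = (None, -1.0)
    for c in sorted(set(probs)):
        tp = sum(1 for p, y in zip(probs, labels) if p >= c and y == 1)
        fp = sum(1 for p, y in zip(probs, labels) if p >= c and y == 0)
        fn = sum(1 for p, y in zip(probs, labels) if p < c and y == 1)
        f1 = 2 * tp / (2 * tp + fp + fn) if tp else 0.0
        if f1 > best[1]:
            best = (c, f1)
    return best


# Hypothetical predicted probabilities for four individual-site pairs.
cut, f1 = best_f1_cutpoint([0.1, 0.4, 0.35, 0.8], [0, 0, 1, 1])
```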

Selecting the best-fitting models

To determine which decay function best fit the data, we used Bayesian leave-one-out cross-validation. The best fit was identified based on the ELPD for a new dataset, a Bayesian leave-one-out measure of out-of-sample predictive fit⁶¹. When a model is compared against a reference model, a negative ELPD difference signifies lower predictive density, indicating that the model performs worse than the reference model. For the decay function, ELPD was compared between global Hill, exponential and power-law decay models (Supplementary Table 7). We additionally report AUROCs and their 95% CIs from fivefold cross-validation to evaluate the ability of the models to discriminate correctly between used and unused open-water sites or taps or boreholes for each individual. When computing AUROCs, folds were stratified by district to ensure that each district was properly represented. In models in which one district at a time was held out to evaluate predictive performance on that district using a model trained on the other two districts (Supplementary Fig. 12), the training data comprised 50 open-water sites per individual, sampled with replacement from the full data to obtain a more balanced dataset across districts when calculating the AUROC.
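The AUROC used for evaluation has a simple probabilistic reading: the chance that a used site receives a higher predicted probability than an unused one. The Python sketch below computes it in that Mann–Whitney form over individual–site pairs, together with a district-stratified assignment of units to five folds. It is a generic illustration rather than the study's code; the function names are hypothetical, and the unit of fold assignment (here, for example, individuals) is an assumption.

```python
import random
from collections import defaultdict


def auroc(scores, labels):
    """Mann-Whitney form of the AUROC: probability that a used site (label 1)
    scores higher than an unused site (label 0); ties count one half."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def stratified_folds(strata, k=5, seed=0):
    """Assign each unit (e.g. an individual) to one of k folds, spreading the
    units of every stratum (e.g. district) as evenly as possible across folds."""
    rng = random.Random(seed)
    members = defaultdict(list)
    for idx, stratum in enumerate(strata):
        members[stratum].append(idx)
    fold = [0] * len(strata)
    for idxs in members.values():
        rng.shuffle(idxs)
        for rank, idx in enumerate(idxs):
            fold[idx] = rank % k
    return fold


strata = ["Pakwach"] * 10 + ["Mayuge"] * 5
folds = stratified_folds(strata)
```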

Importantly, we evaluated the predictive performance of the spatial decay models at the individual level, which is a substantially more stringent approach than how gravity models (which are akin to our spatial decay models) are conventionally evaluated. As gravity models do not operate at the individual level, Pearson correlation coefficients or R² values between predicted and observed aggregate mobility flows are typically used for evaluation^66,67,68. For fine-scale predictions, gravity models have performed poorly, with R² values of between 7% and 22% for predicting commuting behaviour in London⁶⁷. Here, we are not primarily interested in measures of aggregate performance but in the more difficult problem of predicting individual-level usage patterns. This problem is more challenging because we aim to predict not merely whether any open-water contact occurs but the specific open-water site(s) that individuals visit. For example, if an individual visits a different open-water site from the one predicted by the model, this would not count as a classification mistake when the goal is to predict the aggregate number of individuals with open-water contact. In our individual-level evaluation, however, it introduces two classification mistakes: one open-water site incorrectly predicted as not visited and another incorrectly predicted as visited.

Out-of-sample predictions of open-water contact

The ability of spatial decay models to predict population-wide patterns of open-water contact was demonstrated using the publicly available Global Google–Microsoft Open Buildings Dataset (V3), which has mapped over two billion structures globally⁴². Footprints of all buildings in a 10 km × 10 km area around the study villages in Pakwach were extracted. As there were few commercial or public buildings in this rural area, we assumed that each building represented a household.

To generate population-wide predictions of open-water contact, we calculated the centroid location of each building and input the centroid GPS coordinates into equation (3), using the district-specific decay model for Pakwach. This produced out-of-sample predictions of open-water-site usage for all households in the study villages in Pakwach.
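One possible way to summarise such building-level predictions is to sum, for each open-water site, the predicted usage probabilities over all buildings, giving an expected number of user households per site. The Python sketch below does this under stated assumptions: building centroids and sites are in planar metres, each building is one household (as assumed in the text), and the decay is an assumed exponential form with placeholder parameters `b0` and `b1` standing in for the district-specific estimates. The function name is hypothetical.

```python
import math


def expected_households_per_site(buildings, sites, b0, b1):
    """Expected number of households using each site under equation (2).

    buildings: list of (x, y) building centroids in metres, one household each.
    sites: dict site_id -> (x, y). Uses an assumed exponential decay
    f(d) = exp(-b1 * d) with placeholder district-specific parameters.
    """
    totals = {site_id: 0.0 for site_id in sites}
    for bx, by in buildings:
        for site_id, (sx, sy) in sites.items():
            d = math.hypot(bx - sx, by - sy)
            totals[site_id] += b0 * math.exp(-b1 * d)
    return totals


buildings = [(0.0, 0.0), (0.0, 0.0)]
totals = expected_households_per_site(buildings, {"s1": (0.0, 0.0)}, b0=0.25, b1=0.001)
```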

Sensitivity and robustness analyses

Several sensitivity and robustness analyses were conducted to account for the potential for open-water contact misclassification, systematic differences in compliance across demographic groups, and differential logger failure across districts.

As most open-water contact events are brief (of the order of minutes), there is a risk of misclassification, for instance when a participant passes by an open-water site without having actual open-water contact. To evaluate the influence of such brief, potentially misclassified contact events, we removed events shorter than a duration threshold that was increased successively from 0 to 10 min. We then refitted the spatial decay model predicting open-water-site usage, including only open-water contact events above each threshold, to understand how the model coefficients b₀ and b₁ changed. This thresholding exercise also indicated the minimum GPS logger sampling frequency needed to reproduce the patterns observed with GPS data at two-minute intervals.
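The filtering step of this sensitivity analysis is easy to express in Python. The sketch below (hypothetical names; events as (individual, site, minutes) tuples) reports, for each threshold, the share of individual–site pairs that still count as used once shorter events are removed. In the study, the spatial decay model was refitted on each filtered outcome, which is not shown here.

```python
def usage_by_threshold(events, n_pairs, thresholds=range(0, 11)):
    """Share of individual-site pairs with at least one contact event lasting
    >= threshold minutes, for each threshold in minutes."""
    result = {}
    for t in thresholds:
        used = {(ind, site) for ind, site, minutes in events if minutes >= t}
        result[t] = len(used) / n_pairs
    return result


events = [("p1", "s1", 0.5), ("p1", "s2", 4.0), ("p2", "s1", 12.0)]
shares = usage_by_threshold(events, n_pairs=6)
```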

Comparisons were made between the demographic characteristics of individuals with sufficient GPS data (≥2 days), those with insufficient data (

This study was conducted across three different districts, and behaviour patterns from one district may not be generalizable to others. To address this, data from two districts at a time were used to predict water site usage in the held-out district using the global spatial decay model of open-water contact (equation (2)). The numbers of individuals and observation days were downsampled to determine the minimum numbers of individuals and observation days required to achieve good model performance, as assessed by the AUROC. Although participants were required to contribute a minimum of two days of GPS logger data to be included in this analysis, we also assessed, among included participants, whether a single day of GPS logger data per participant yielded good model performance based on the AUROC.

Spatially explicit transmission model

We constructed a modular, spatially explicit IBM to simulate the dynamics of reinfection in our study population using Julia (version 1.11.3). The model built on existing IBMs: snail dynamics were adapted from the SchiSTOP model²¹ and implemented at the site level rather than in aggregate, and human infection dynamics were adapted from the SCHISTOX model²⁰. Parameter values were derived from these two models or chosen on the basis of biological realism, but were not directly fitted to the data. Initial snail states were informed by Iacovidou et al.²⁶, using aggregated information for the water site clusters, and initial worm burdens in humans were informed by the EPG data of the 452 individuals who participated in the GPS logger study.

Spatial explicitness in the model was introduced using weighted bipartite networks (one for each district), where nodes represent households and open-water sites and edges were weighted according to the probability, P, of site usage, as calculated using the spatial decay function (equation (2)).

We considered four decision rules regarding site usage, where each rule represents an independent visitation decision per individual per time step:

Visit the nearest open-water site with 100% probability.
Visit the nearest open-water site with probability P.
Each site is given a success (1) or failure (0) score based on Bernoulli trials with probability P. Individuals visit a single site with a success score chosen at random. If no successes are recorded, individuals stay home.
Same as the previous rule, but individuals visit all sites that return a success score from the Bernoulli trials.

The first three rules only allow individuals to visit a maximum of one site per day, with the first two rules restricting visitation to their nearest site. All rules, except the first one, are informed by the spatial decay function.
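The four decision rules can be sketched for one individual and one time step. The study's IBM was written in Julia; this Python version only illustrates the rule logic. The dictionary `probs` is assumed to hold the spatial decay probability P for each open-water site from the individual's household (the bipartite network edge weights), and the function name is hypothetical.

```python
import random


def choose_sites(rule, probs, nearest, rng):
    """Sites visited by one individual in one time step under rules 1-4.

    probs: dict site_id -> P from the spatial decay model for this household.
    nearest: id of the household's nearest open-water site.
    """
    if rule == 1:  # always visit the nearest site
        return [nearest]
    if rule == 2:  # visit the nearest site with probability P
        return [nearest] if rng.random() < probs[nearest] else []
    # Rules 3 and 4: one Bernoulli trial per site with probability P.
    successes = [site for site, p in probs.items() if rng.random() < p]
    if rule == 3:  # a single randomly chosen successful site, else stay home
        return [rng.choice(successes)] if successes else []
    if rule == 4:  # every successful site
        return successes
    raise ValueError(f"unknown rule {rule}")


rng = random.Random(1)
probs = {"a": 1.0, "b": 1.0, "c": 0.0}
visited = choose_sites(3, probs, nearest="a", rng=rng)
```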

We simulated 100 replicates for each decision rule over 365 days, considering the study population of 452 individuals. The aims were to investigate whether information from the spatial decay models was useful for predicting infection prevalence one year after treatment and to compare the site-specific visitation distributions produced by the different rules. Site visits were calculated by counting, for each replicate, how many times individuals visited a given site, and then averaging these counts across all replicates to obtain mean visit frequencies per site. Using the cumulative miracidial input at the site level, we computed the relative contamination of each site as a percentage of the most contaminated site within each district.
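The two site-level summaries described above can be sketched in Python as follows. Inputs are hypothetical: one list of visited site identifiers per replicate, cumulative miracidial input per site, and a site-to-district mapping. Sites that are never visited in any replicate simply do not appear in the visit summary.

```python
from collections import Counter


def mean_visits_per_site(replicates):
    """replicates: one list of visited site ids per simulation replicate
    (each visit by any individual on any day appears once)."""
    totals = Counter()
    for visits in replicates:
        totals.update(visits)
    return {site: count / len(replicates) for site, count in totals.items()}


def relative_contamination(miracidial_input, district_of_site):
    """Cumulative miracidial input per site as % of the district maximum."""
    district_max = {}
    for site, value in miracidial_input.items():
        district = district_of_site[site]
        district_max[district] = max(district_max.get(district, 0.0), value)
    return {
        site: (100.0 * value / district_max[district_of_site[site]]
               if district_max[district_of_site[site]] > 0 else 0.0)
        for site, value in miracidial_input.items()
    }


mean_visits = mean_visits_per_site([["a", "a", "b"], ["a"]])
contamination = relative_contamination(
    {"a": 50.0, "b": 25.0, "c": 10.0}, {"a": "X", "b": "X", "c": "Y"})
```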

We performed a sensitivity analysis on the aggregation parameter (as was also done by Graham et al.²⁰ using the SCHISTOX model) by reporting predicted prevalence for lower values of this parameter. However, we believe that a higher value is more consistent with the observed prevalence in our study area.

Inclusion

This study was co-designed and implemented by researchers based in Uganda and the United Kingdom. Ugandan investigators from the Division of Vector-Borne and Neglected Tropical Diseases at the Ministry of Health and local partners contributed to the study design, community engagement, field implementation, data curation and manuscript preparation, and all listed authors meet journal authorship criteria, with additional contributors acknowledged. Characterizing fine-scale open-water contact behaviours and identifying key transmission sites emerged as a local priority through local community engagement activities as part of SchistoTrack, and the research results are shared with communities during annual sessions. Research ethics approval was obtained from the Oxford Tropical Research Ethics Committee, Vector Control Division Research Ethics Committee of the Uganda Ministry of Health and Uganda National Council for Science and Technology. All participants received remuneration (10,000 UGX, equivalent to ~US$2.73 or one day of wages) upon enrolment into SchistoTrack and an equal amount upon completion of the GPS logger sub-study. Those with confirmed S. mansoni infection received praziquantel in line with Ugandan national guidelines. No biological specimens, cultural artefacts or associated traditional knowledge were exported outside the country. Training and capacity-building workshops for household survey data collection, environmental sampling and spatial mapping using geographical information software were conducted as part of SchistoTrack.