Probabilistic classification of late treatment failure in uncomplicated falciparum malaria

Categories: Disease & Virus

November 10, 2025

Data from¹³

We re-analysed data from a six-arm therapeutic efficacy study conducted between February and July in 2021 across 3 provinces in Angola¹³. For completeness, we provide a summary of the study design and genotyping approach of ref. ¹³ here. Outpatients presenting to urban clinics situated in provincial capitals were screened for inclusion in the study. Enrolment criteria included uncomplicated P. falciparum monoinfection and either a history of fever or an axillary temperature reading ≥37.5 °C. To avoid confounding by different transmission intensities, some inclusion criteria varied by province. In Benguela (low to moderate transmission), children between 6 and 143 months of age with 1000—100,000 parasites/μL at baseline were enrolled. In Lunda Sul (moderate to high transmission) and Zaire (moderate to very high transmission), children were enrolled with a narrower age range of 6 to 59 months, and higher baseline parasite densities of 2000–200,000 parasites/μL. Parasitaemia was quantified using microscopy. 622 patients were enrolled across the study arms.

Study participants were treated with 3-day regimens of either artemether-lumefantrine or artesunate-amodiaquine (Lunda Sul and Zaire), and dihydroartemsinin-piperaquine or artesunate-pyronaridine (Benguela). Dosing was determined by weight bands in accordance with manufacturer’s guidelines. Antimalarial treatment was largely supervised. Follow-up, entailing clinical examination and slide microscopy, occurred on days 1 (clinical examination only), 2, 3, 7, 14, 21 and 28 in addition to days 35 and 42 for patients treated with dihydroartemsinin-piperaquine and artesunate-pyronaridine, with the convention that enrolment (baseline) was designated day 0. Recurrent infections, characterised by microscopy-detected asexual P. falciparum parasitaemia occurring between day 7 and the end of follow-up, were identified in 71 patients.

Genotyping was performed for 70 pairs of baseline and recurrent samples, and an additional 42 baseline samples. DNA was extracted from dried blood spots and P. falciparum diagnosis was confirmed by PCR. For a panel of 7 neutral microsatellite markers (M2490, M313, M383, PfPK2, POLYA, TA1, TA109), fragment lengths were then assessed using capillary electrophoresis. In the original study¹³, classification of reinfection versus recrudescence was performed using the Bayesian CDC model⁸, and a simple match-counting algorithm (at least 4/7 matches^13,24,25, where a per-marker observation is a match if some alleles are the same at enrolment and recurrence), stratified by study site. Here, we also perform classification using our novel probabilistic approach, PfRecur and a 5/7 matching algorithm.

Molecular marker cardinality and parasitaemia in¹³

We explored the relationship between MOI (defined as the maximum cardinality across the genotyped panel of 7 neutral microsatellites) and ${\log }_{10}$ parasitemia for 182 samples with genotyping data available, using the Jonckheere-Terpstra test³⁴, with p-values derived under a normal approximation using the R function PMCMRplus::jonckheereTest (V1.9.6)³⁵. We also examined the within-host diversity of each marker, as a function of ${\log }_{10}$ parasite density, performing linear regression using the R functions ggplot2::geom_smooth (V3.5.1)³⁶ and stats::lm (V4.2.1)³⁷. We defined the within-host diversity of each marker for each sample as the complement of Nei’s gene identity metric³⁸, taking a uniform distribution over compatible allelic configurations at that locus (i.e., all possible ways of allocating alleles observed at that locus to the number of clones given by the MOI). Nei’s gene identity metric is formulated as the sum of squares of allele frequencies; the complement can be interpreted as the probability of differing alleles when a pair of clones is sampled (with replacement) from the sample (see Supplementary Note 3.1).

PfRecur

PfRecur classifies a recurrent infection as either a recrudescence or a reinfection based on either the posterior probability of there being at least one recrudescent clone (M1); or the posterior expected proportion of recrudescent clones (M2) (Fig. 5). A complete description of the framework is provided in Supplementary Note 1; below, we provide a brief outline, using the terms marker and locus interchangeably. Classification of recurrent infections is done for each pair in a group of patients, where the grouping is determined by the user (usually by study or study site, depending on the context). The model is not fully Bayesian in that there is no way of specifying multilevel groupings.

Overview of the statistical model

PfRecur classifies infections based on an analytically tractable Bayesian model. Under the model, each sample is treated as a set of genetically-distinct clones; the cardinality of this set is referred to hereafter as the multiplicity of infection (MOI). For each sample, we observe a set of alleles G_ℓ at loci ℓ ∈ {1, …, L}, with locus-wise cardinality M_ℓ = ∣G_ℓ∣. We take the MOI to be the maximum cardinality over all loci: $M={\max }_{1\le \ell \le L}{M}_{\ell }$.

The indices r, b and i are used to distinguish samples that are handled differently under the model. Index r corresponds to the recurrent parasite sample for which classification is being performed; index b corresponds to the paired baseline sample (i.e., it is from the same study participant as sample r). Index i = 1, …, n iterates over n baseline samples that are not paired with sample r: they are from different study participants.

The recurrent sample r (of MOI M^(r)) is modelled as a mixture of M_C recrudescent clones and M_I = M^(r) − M_C newly-inoculated clones. For each recurrent sample r, the target of inference under the PfRecur statistical model is the posterior distribution over M_C, with state space {0, …, min(M^(b), M^(r))}:

$${\mathbb{P}}\left({M}_{C}=m\,| \,{G}^{(r)}\right)=\frac{{\mathbb{P}}\left({M}_{C}=m\right){\prod }_{\ell=1}^{L}{\mathbb{P}}\left({G}_{\ell }^{(r)}\,| \,{M}_{C}=m,{M}_{I}={M}^{(r)}-m\right)}{\mathop{\sum }_{m=0}^{\min ( {M}^{(b)},M^{(r)})}{\mathbb{P}}\left({M}_{C}=m\right)\mathop{\prod }_{\ell=1}^{L}{\mathbb{P}}\left({G}_{\ell }^{(r)}\,| \,{M}_{C}=m,{M}_{I}={M}^{(r)}-m\right)}.$$

(1)

The model is predicated on the following set of assumptions:

Unlinked loci: valid if neutral loci lie on different chromosomes (because chromosomes assort independently in meiosis) or inter-locus distances are large (because then loci are more likely to be separated by one or more recombination break points in meiosis).
Clones within samples are independent as are non-recrudescent clones between samples: violated by the presence of sibling parasites within samples³¹; additionally violated if the average population-level relatedness is high (although sample allele frequencies for the baseline samples 1, …, n partially encode average relatedness³³) and/or if the population is structured (e.g., by geographic barriers in study site, or by household effects among study participants).
Uniform distribution of allelic states for each sample (i.e., any configuration of alleles compatible with the number of successfully genotyped clones and the set of genotypes observed at a given locus is modelled to be equally likely): enforced in the absence of quantitative data on relative allelic abundance (i.e., assuming bulk genotypic data with no quantitative information on the intra-sample amount of each allele), given measures of allelic abundance are not readily available for length polymorphic markers that are genotyped using electrophoresis and currently used to perform molecular correction; however, we note that amplicon sequencing, which is recognised by the WHO as a potential future standard for molecular correction, generates read count data on multi-allelic markers (microhaplotypes), and read count data can be used to estimate the relative abundance of within-sample alleles.
Reinfection and recrudescence are not mutually-exclusive: the recurrent sample r comprises a mixture of recrudescent clones drawn from the paired baseline sample b, and newly-inoculated clones drawn from the contemporaneous population at large, which is approximated by the remaining baseline samples 1, …, n.
Non-parametric genotyping error: we accommodate non-parametric genotyping error in baseline samples relative to the recurrent sample, specified by the probability that each allele called in a baseline sample matches an allele called in the recurrent sample r.
No undetected clones in the baseline samples 1, …, n: we assume that the consequences of ungenotyped clones (if any) ‘average out’ over samples 1, …, n from which baseline population allele frequencies are derived.
Ungenotyped clones in the paired baseline sample b: the allelic states of undetected clones in the paired baseline sample are imputed using allele frequencies derived over the unpaired baseline samples 1, …, n, with a marker-wise truncated binomial model for the number of clones genotyped per locus.
Ungenotyped clones in the recurrent sample r: observed alleles in the recurrent sample are allocated over successfully genotyped clones only, with a marker-wise truncated multinomial model for the number of clones genotyped per locus.

Model structure

Figure 6 depicts the model structure. The likelihood of the locus-wise recurrent data, ${G}_{\ell }^{(r)}$, is conditional on the random variables M_C and M_I = M^(r) − M_C. It is also conditional on the locus-wise baseline data, ${G}_{\ell }^{(b)}$ and ${\{{G}_{\ell }^{(i)}\}}_{i=1..n}^{i\ne b}$, and on the recurrent and baseline MOIs, which are derived from the complete recurrent and baseline data. Although these variables are data derived, they are not treated as random variables under the model. The likelihood also features other user-defined inputs that are not treated as random variables. These include a genotyping detection rate, ω, and a locus-wise genotyping error matrix, δ_ℓ. The prior on M_C is conditional on M^(r) and on a prior parameter β. There is no prior on M_I in Equation (1) since M_I is deterministic given M^(r) and a realisation of M_C. Thanks to an analytically tractable likelihood (next section), Equation (1) is analytically tractable. As such, inference under the PfRecur framework does not require a numerical sampler or an optimisation algorithm.

**Fig. 6: Graphical representation of the Bayesian statistical model within the PfRecur framework.**

Model likelihood

We derive multiple expressions in Supplementary Note 1 (see Supplementary Equations (15), (17), (18) and (19)), which together can be used to evaluate the likelihood:

$${\mathbb{P}}\left({G}_{\ell }^{(r)}\,\left\vert \,{M}_{c}=m,{M}_{I}={M}^{(r)}-m\right.\right).$$

(2)

The various steps used to conceptualize the likelihood are sketched out below. Throughout, newly-inoculated clones are drawn from population I (the contemporaneous population at large), which is approximated under the model by the baseline samples i = 1, …, n. The recrudescent clones are drawn from population C, comprised of clones in the paired baseline sample b. In an intermediate step, the probability of ${G}_{\ell }^{(r)}$ is modelled conditional on allele frequencies, whereby the probability that a clone drawn from population S ∈ {I, C} harbours allele α at locus ℓ is equated to the population allele frequency ${\theta }_{\ell }^{(S)}(\alpha )$. In later steps, allele counts supersede allele frequencies.

Modelling the number of clones genotyped per locus

In the allelic data of a given sample, some loci have cardinalities lower than the MOI. The lower cardinality could occur because multiple clones within the sample share identical alleles at this locus, because the MOI is spuriously elevated by genotying errors, or because some clones are not genotyped at this locus. We model the number of genotyped clones per locus in the recurrent sample r at loci with cardinalities strictly less than the observed MOI (i.e., \(| {G}_{\ell }^{(r)}| ), thereby allowing some clones in the recurrent sample to evade detection at some loci (multiple clones sharing identical alleles is also addressed — see next section). A spuriously elevated MOI is not accounted for.

We construct a model of the number of clones genotyped per locus in the recurrent sample r, i.e., the number of clones that contribute to the observation ${G}_{\ell }^{(r)}$, as follows. Each clone in sample r, irrespective of whether it is drawn from population I or C, is detected with probability ω at locus ℓ. Then, given M_C = m and M_I = M^(r) − m, the number of detected clones ${Q}_{\ell }^{(r,S)}$ derived from populations S ∈ {I, C} at locus ℓ follow the truncated multinomial distribution

$$ {\mathbb{P}}\left({Q}_{\ell }^{(r,C)}={q}_{C},{Q}_{\ell }^{(r,I)}={q}_{I},\,| \,{M}_{C}=m,{M}_{I}={M}^{(r)}-m\right)\\ =\frac{\left(\begin{array}{c}m\\ {q}_{C}\end{array}\right)\left(\begin{array}{c}{M}^{(r)}-m\\ {q}_{I}\end{array}\right){\omega }^{{q}_{I}+{q}_{C}}{(1-\omega )}^{{M}^{(r)}-{q}_{I}-{q}_{C}}}{\mathop{\sum }_{j={M}_{\ell }^{(r)}}^{{M}^{(r)}}\left(\begin{array}{c}{M}^{(r)}\\ j\end{array}\right){\omega }^{j}{(1-\omega )}^{{M}^{(r)}-j}}\,{{\mbox{for}}}\,\,{q}_{I}+{q}_{C}\ge {M}_{\ell }^{(r)},\,{q}_{C}\le m,\,{q}_{I}\le {M}^{(r)}-m.$$

(3)

Allocating observed alleles to genotyped clones

We average analytically over compatible allelic configurations in sample r using the inclusion-exclusion principle. Suppose there are ${Q}_{\ell }^{(r,S)}={q}_{S}$ genotyped clones drawn from populations S ∈ {I, C} at locus ℓ in sample r. Then, the probability of observing the set of alleles ${G}_{\ell }^{(r)}$ at locus ℓ takes the form

$$ {\mathbb{P}}\left({G}_{\ell }^{(r)}\left\vert \,{{{{\boldsymbol{\theta }}}}}_{\ell }^{(C)},\,{{{{\boldsymbol{\theta }}}}}_{\ell }^{(I)},\,{Q}_{\ell }^{(r,C)}={q}_{C},\ {Q}_{\ell }^{(r,I)}={q}_{I}\right.\right)\\ =\mathop{\sum}_{A\in {{{\mathcal{P}}}}\left({G}_{\ell }^{(r)}\right)}{(-1)}^{| A| }{\left(\mathop{\sum}_{\alpha \in {G}_{\ell }^{(r)}\setminus A}{\theta }_{\ell }^{(C)}(\alpha )\right)}^{{q}_{C}}{\left({\sum}_{\alpha \in {G}_{\ell }^{(r)}\setminus A}{\theta }_{\ell }^{(I)}(\alpha )\right)}^{{q}_{I}}$$

(4)

where ${{{\mathcal{P}}}}({G}_{\ell }^{(r)})$ denotes the power set of ${G}_{\ell }^{(r)}$; that is, the set of all subsets of ${G}_{\ell }^{(r)}$ encompassing the empty set. Convolving Equation (4) over the locus-wise truncated multinomial model of genotyped clones (3) yields the probability of the observation ${G}_{\ell }^{(r)}$ at locus ℓ for the recurrent sample r, formulated with respect to the population allele frequencies ${{{{\boldsymbol{\theta }}}}}_{\ell }^{(S)}$, S ∈ {C, I}.

Modelling allele frequencies

The derivation of population allele frequencies under our model is two-fold. Denote by H_ℓ the set of possible alleles at locus ℓ (equipped with an arbitrary ordering) and by ${{{{\boldsymbol{\theta }}}}}_{\ell }^{(S)}$ the vector of population allele frequencies over H_ℓ. We begin by deriving allele frequencies ${{{{\boldsymbol{\theta }}}}}_{\ell }^{(S)}$ conditional on the vector of per-sample allele counts C_ℓ for each baseline sample b or 1, …, n over H_ℓ; and later address the distribution of per-sample allele counts C_ℓ, which are not directly observed using bulk genotypic data.

For population I, we derive allele frequencies under a Bayesian multinomial-Dirichlet model of baseline samples i = 1, …, n (which excludes the paired baseline sample b). Allele frequencies are formulated over clones in the baseline samples i = 1, …, n, whereby each sample is effectively weighted by its MOI (i.e., high MOI samples are more informative). Under the assumption of clone-wise independence (both within and across samples), we obtain the multinomial likelihood

$$\left.{\sum}_{i=1}^{n}{{{{\bf{C}}}}}_{\ell }^{(i)}\,\right\vert \,{{{{\boldsymbol{\theta }}}}}_{\ell }^{(I)} \sim \,{{\mbox{Multinomial}}}\,\,\left({\sum}_{i=1}^{n}{M}^{(i)},\,{{{{\boldsymbol{\theta }}}}}_{\ell }^{(I)}\right).$$

Taking a uniform prior for ${{{{\boldsymbol{\theta }}}}}_{\ell }^{(I)}$ over the ∣H_ℓ∣ − 1 simplex yields the posterior

$$\left.{{{{\boldsymbol{\theta }}}}}_{\ell }^{(I)}\,\right\vert \,{\sum}_{i=1}^{n}{{{{\bf{C}}}}}_{\ell }^{(i)} \sim \,{{\mbox{Dirichlet}}}\,\,\left({\sum}_{i=1}^{n}{{{{\bf{C}}}}}_{\ell }^{(i)}+{{{\bf{1}}}}\right).$$

(5)

For population C, allele frequencies are largely informed by the paired baseline sample b, but not exclusively because the alleles of ungenotyped clones in sample b are imputed using population I allele frequencies, which are based on samples i = 1, …, n (Equation (5)). This imputation explains the omission of sample b in the formulation of allele frequencies for population I. Akin to the model of genotyped clones for sample r (Equation (3)), the number of genotyped clones ${Q}_{\ell }^{(b)}$ in sample b that contribute to the observation ${G}_{\ell }^{(b)}$ is modelled to follow a truncated binomial distribution, with support $| {G}_{\ell }^{(b)}|,\ldots,{M}^{(b)}$ and success probability ω. Given ${Q}_{\ell }^{(b)}=q$, we denote by ${{{{\bf{C}}}}}_{\ell }^{(b,q)}$ the vector of allele counts over the q clones in sample b that are genotyped at locus ℓ. We model ${{{{\boldsymbol{\theta }}}}}_{\ell }^{(C)}$ as a deterministic function of ${Q}_{\ell }^{(b)}=q$, the per-sample allele counts ${{{{\bf{C}}}}}_{\ell }^{(b,q)}$ of the genotyped clones in sample b, and ${{{{\boldsymbol{\theta }}}}}_{\ell }^{(I)}$,

$${{{{\boldsymbol{\theta }}}}}_{\ell }^{(C)}=\frac{{{{{\bf{C}}}}}_{\ell }^{(b,q)}}{{M}^{(b)}}+\left(1-\frac{q}{{M}^{(b)}}\right)\cdot {{{{\boldsymbol{\theta }}}}}_{\ell }^{(I)}.$$

(6)

Modelling genotyping errors

We account for genotyping error when modelling the per-sample baseline allele counts ${{{{\bf{C}}}}}_{\ell }^{(i)}$ and ${{{{\bf{C}}}}}_{\ell }^{(b,q)}$. Genotyping errors are modelled using a user-specified right-stochasic error matrix δ_ℓ for each locus ℓ, with the interpretation that ${\delta }_{\ell }(\alpha,{\alpha }^{{\prime} })$ yields the probability that an allele called as α in a baseline sample b or i = 1, …, n matches an allele called as ${\alpha }^{{\prime} }$ in a recurrent sample r at locus ℓ. The PfRecur framework is marker-agnostic, in that the non-parametric error model δ_ℓ can be adapted to different marker types. In the present study, we consider a normalised geometric model adapted to length polymorphic microsatellite markers (akin to ref. ⁸), parametrised by a genotyping error probability ε where, for allele lengths i, j

$${\delta }_{\ell }(i,j)=\left\{\begin{array}{ll}\frac{{\varepsilon }^{| i-j|+1}}{{\sum }_{k\ne i}{\varepsilon }^{| i-k| }}\quad &\,{\mbox{if}}\,\,j\,\ne \,i\\ 1-\varepsilon \quad &\,{\mbox{if}}\,\,j=i\end{array}\right.,$$

(7)

where the sum in the denominator is taken over the allelic lengths in the set H_ℓ.

Deriving moments of allele counts

The locus-wise probability (4) of observed genotypes for recurrent sample r is formulated as a multinomial expression of the population allele frequencies ${{{{\boldsymbol{\theta }}}}}_{\ell }^{(S)}$. Convolving Equation (4) over the modelled distributions of ${{{{\boldsymbol{\theta }}}}}_{\ell }^{(I)}$(5) and ${{{{\boldsymbol{\theta }}}}}_{\ell }^{(C)}$(6), which are conditioned on per-sample allele counts, yields a multinomial expression of baseline allele counts (Supplementary Equation (12)).

Because individual clones and the alleles they carry are not directly observable in multiclonal samples genotyped using standard methods (e.g., not single-cell genotyping), allele counts for multiclonal samples must be derived under an appropriate model^{16,17,18,19,20}. By Jensen’s inequality, the expected per-sample allele counts ${\mathbb{E}}[{{{{\bf{C}}}}}_{\ell }^{(z)}]$ (which are straightforward to compute) cannot be substituted directly for ${{{{\bf{C}}}}}_{\ell }^{(z)}$ in Supplementary Equation (12). Convolving the locus-wise probability given by Supplementary Equation (12) over compatible allelic configurations in baseline samples b and i = 1, …, n requires calculating moments of the per-sample allele counts

$${\mathbb{E}}\left[{\left({\sum}_{\alpha \in A}{C}_{\ell }^{(b,q)}(\alpha )\right)}^{m}\right]={\sum}_{k=1}^{q}{k}^{m}\cdot {\mathbb{P}}\left({\sum}_{\alpha \in A}{C}_{\ell }^{(b,q)}(\alpha )=k\right)$$

and the pooled-sample allele counts (i.e., per-sample allele counts summed over one or more baseline samples)

$${\mathbb{E}}\left[{\left({\sum}_{i=1}^{n}{\sum}_{\alpha \in A}{C}_{\ell }^{(i)}(\alpha )\right)}^{m}\right]={\sum}_{k=1}^{{M}^{(1)}+\cdots+{M}^{(n)}}{k}^{m}\cdot {\mathbb{P}}\left({\sum}_{i=1}^{n}{\sum}_{\alpha \in A}{C}_{\ell }^{(i)}(\alpha )=k\right)$$

aggregated over allelic subsets A ⊆ H_ℓ (Supplementary Note 1.2).

In Supplementary Note 1.3, we derive these moments from first principles under our framework. In brief, we use a simple combinatorial argument based on ordered partitions to derive probability mass functions and consequently cumulants of the per-sample allele counts. To adjust per-sample allele counts for genotyping error, we adopt a Poisson binomial model: the number of distinct alleles in a given set A ⊂ H_ℓ that are harboured by clones in a baseline sample is modelled as a sum of ∣G_ℓ∣ independent, but not identically distributed, Bernoulli random variables with respective success probabilities ∑_α∈Aδ_ℓ(g, α), $g\in {G}_{\ell }^{(q)}$.

To compute moments of the pooled-sample allele counts (under the assumed independence of clones within and between baseline samples), we first sum cumulants of the per-sample allele counts to recover cumulants of the pooled-sample allele counts. We then exploit complete exponential Bell polynomials to map cumulants of the pooled-sample allele counts to moments of the pooled-sample allele counts, as required. We adopt this construction because the order of these moments (up to M^(r)) is likely, in practical settings, to be smaller than the number of baseline samples n over which allele counts are aggregated.

Prior of the statistical model

Since mixtures of newly-inoculated and recrudescent parasite clones are comparatively unlikely in low transmission transmission settings, we take a symmetric prior, which weights pure reinfections and pure recrudescences more heavily than intermediate mixtures. We implement this through a symmetric beta binomial distribution

$${M}_{C} \sim \,{{\mbox{BetaBinomial}}}\,({M}^{(r)},\beta,\beta )$$

with 0 β ≤ 1. In the case β = 1, we recover a uniform distribution over the breakdown of newly-inoculated vs recrudescent clones. In the limit β → 0, the prior probability of all intermediate mixtures approaches zero, whereby reinfection and recrudescence constitute mutually exclusive categories.

Posterior metrics

We generate two posterior metrics for each recurrence: the posterior probability of at least one recrudescent clone

$${{\mbox{M1}}}: \!\!=1-{\mathbb{P}}\left({M}_{C}=0\,| \,{G}^{(r)},{G}^{(b)},{G}^{(1)},\ldots,{G}^{(n)},\omega,\delta,\beta \right)$$

(8)

and the posterior expected proportion of recrudescent clones

$$\,{{\mbox{M2}}}: \!\!=\frac{1}{{M}^{(r)}}{\sum}_{m=0}^{\min({M}^{(b)},M^{(r)})}m\cdot {\mathbb{P}}\left({M}_{C}=m\,| \,{G}^{(r)},{G}^{(b)},{G}^{(1)},\ldots,{G}^{(n)},\omega,\delta,\beta \right).$$

(9)

Application of posterior metrics

For conceptual consistency with the CDC model⁸, we perform classification using metric M1 of PfRecur: a recurrent sample r is classified as a recrudescence if M1 > 0.5, and as a reinfection otherwise. Downstream efficacy estimates are computed using metric M1, rather than dichotomised classifications, in line with recommendations from the CDC^8,13. Metric M2 serves as a supplementary descriptor for recurrences comprising a mixture of newly-inoculated and recrudescent clones, and is used to validate the model against simulated data for which these mixtures are known (see below).

R software

PfRecur is implemented as an R package (available at https://github.com/somyamehra/PfRecur)³⁹. This largely relies on base R functionality³⁷, with additional dependencies on copula::Stirling2 and copula::Stirling1 (V1.1-1)⁴⁰ (to evaluate Stirling numbers of the second and unsigned first kind respectively); PDQutils::cumulant2moment and PDQutils::moment2cumulant (V0.1.6)⁴¹ (to map between cumulants and moments respectively); poisbinom::dpoisbinom (V1.0.1)⁴² (to evaluate the density function of the Poisson binomial distribution); and VGAM::dbetabinom.ab (V1.1-9)⁴³ (to evaluate the density function of the beta binomial prior). The package accommodates samples with a MOI of up to 9 (to avoid numerical instability when evaluating the posterior).

As input, the PfRecur package requires categorical (presence/absence) genotypic data in a list of binary matrices, where each matrix corresponds to a marker; named matrix columns correspond to alleles; named matrix rows correspond to samples; and matrix elements are set to 1 if the corresponding allele has been detected in the relevant sample, and 0 otherwise. Additional user-specified parameters include the per-clone marker-wise probability of detection ω, and a list of marker-wise row-stochastic genotyping error matrices δ_ℓ.

Given a recurrent sample r, paired baseline sample b, and baseline samples 1, …, n, the function PfRecur::evaluate_posterior returns the discrete posterior distribution for M_C over the state space {0, …, M^(r)} in addition to metrics M1 (8) and M2 (9), in the form of a named list.

Simulation study

To validate PfRecur, we simulate recurrent samples as mixtures of newly-inoculated and recrudescent clones (Supplementary Note 2). In brief, we consider samples with MOIs up to 9 (mean baseline MOI ≈ 3), genotyped at 7 unlinked multi-allelic markers (each with between 10 and 30 distinct alleles). We permit siblings within samples, violating the assumed independence of clones under PfRecur. Each simulated clone is detected at each marker with probability ω = 0.9. Genotyping error is applied to the set of alleles harboured by detected clones in each baseline sample with probability ε = 0.05, in accordance with the length-dependent normalised geometric model (7). We simulate 40 baseline datasets, each comprising 25 samples. For a baseline sample of MOI M^(b), we simulate paired recurrences with MOI M^(r) = 1, …, 9 and $m=0,\ldots,\min \{{M}^{(b)},{M}^{(r)}\}$ recrudescent clones. We apply our probabilistic classifier PfRecur to recover the posterior distribution for the number of newly-inoculated vs recrudescent clones within each simulated recurrent sample under a uniform prior (β = 1); the underlying parameters ω and ε are assumed to be known. The results in the main text, pertaining to metrics M1 and M2, are aggregated across 29077 simulated recurrences.

Reanalysis of¹³

Using PfRecur, we perform a re-analysis of ref. ¹³, with baseline samples stratified by study site (whereby allele frequencies for newly-inoculated clones are modelled to be site-specific). By default, we set β = 0.25 for the prior; ω = 0.9 for the per-clone marker-wise probability of detection in each baseline/recurrent pair; and ε = 0.05 for the genotyping error probability, under the normalised geometric model (7). We additionally perform a sensitivity analysis for ω ∈ [0.75, 1] and ε ∈ [0, 0.25]. Classification is performed with both the entire 7 microsatellite marker set, and also omitting the TA109 marker that appears to generate artefacts.

We compare posterior metric M1 of PfRecur against the gold-standard CDC model⁸, which was used originally to analyse¹³. In the present study, we have re-run code provided openly by Plucinksi and colleagues³⁰ (with 100,000 iterations for the Gibbs sampler); posterior probabilities based on all 7 microsatellite markers may differ from those reported in¹³ due to the stochastic nature of the MCMC algorithm.

False positive recrudescence rates

To estimate false positive rates for calling recrudescence, we generate 500 artificial ‘not-recrudescence’ datasets from¹³ by generating random derangements of baseline study participants labels within each study site, whereby permuted baseline/recurrent pairs cannot be derived from the same individual and therefore cannot represent recrudescences. The generation of permuted datasets, rather than permuted pairs, is necessitated by the construction of the CDC model⁸. We perform classification for these permuted datasets using both PfRecur (metric M1) and the CDC model^8,30 (with 10,000 iterations for the Gibbs sampler due to computational time constraints). For each permuted dataset, we compute the false positive recrudescence rate by averaging the posterior probability of recrudescence (under the CDC model) or metric M1 (under PfRecur) over recurrent samples. In addition, we consider a match-counting approach, treating the presence of one or more shared alleles at 4 or 5 markers (or more) as evidence of recrudescence. For the panel of 7 neutral microsatellites, no WHO-endorsed guidelines are available but^13,24,25 support the ≥4/7 rule. We note that current WHO guidelines, tailored to a three marker panel, stipulate a strict 3/3 match-counting rule for primary analysis⁹. We do not consider the strict 7/7 match-counting rule for the 7 neutral microsatellite panel used in¹³, given that the microsatellite marker TA109 appears to be problematic. There are also several recurrences in¹³ (namely, ZL21-292, ZQ21-103 and ZL21-245) where mismatch at a single marker (with a length difference plausibly attributable to either genotyping or human error) supports the use of a relaxed match-counting rule. To avoid ambiguity, we restrict match-counting classification to baseline/recurrent pairs with at least one allele call at each of the 7 markers.