Microhaplotype deep sequencing assays to capture Plasmodium vivax infection lineages

A rhAmpSeq multiplex of 93 microhaplotypes with high sensitivity and specificity

A two-step multiplex PCR-based library preparation assay was established using rhAmpSeq chemistry (Integrated DNA Technologies, IDT). The multiplex assay amplifies one species confirmation marker, four putative markers of drug resistance, and 93 microhaplotypes distributed relatively uniformly across the P. vivax genome (Supplementary Data 1; Supplementary Fig. 1). To evaluate the assay's technical and analytical performance, a total of 805 P. vivax infections were assessed, including 128 paired P. vivax infections from a randomized controlled trial, two P. vivax serial dilution panels, and 15 non-P. vivax control samples (Supplementary Data 2, Supplementary Fig. 2).

The specificity of the assay was evaluated using a selection of 8 non-vivax Plasmodium spp. and 4 uninfected human DNA controls. Examination of amplicon coverage revealed that 86 markers (89% [86/97] after excluding the mitochondrial species marker) exhibited amplicon coverage <0.9 in all negative controls (Supplementary Fig. 3a). Amongst the 11 markers with coverage >0.9 in one or more negative controls, 6 had read counts <25 in all controls tested, with only 5 markers exceeding 25 reads in the P. knowlesi controls (Supplementary Fig. 3b).

The assay sensitivity was assessed using serial dilutions of two P. vivax-infected patient blood samples (KV3 and KV5) under high sample multiplexing (library pooled across 384 samples). Two conditions were tested: the standard PCR step 1 DNA input and reaction volumes (referred to as full chemistry, with a 20 μl reaction volume) and half the PCR step 1 DNA input and reaction volumes (referred to as half chemistry, with a 10 μl reaction volume). Under both full and half chemistry conditions, >84 (87%) markers were successfully genotyped (>25 reads) in the 70 and 96 parasites/μl sample preparations, as well as at higher densities (Supplementary Figs. 4a and b). Further details on read counts can be found in Supplementary Figs. 4c–f.

Parasite density defined by microscopic blood film examination may not correlate directly with parasite DNA yield owing to the presence of multiple life cycle stages in P. vivax infections (schizonts, for example, contain more DNA than ring stages). To address this, we evaluated the threshold cycle (Ct) values of the KV5 serial dilution using real-time PCR, targeting the pvmtcox1 gene. Samples with a Ct <30 (KV5 96 parasites/μl) had successful genotyping at 87 (89.6%) markers (Supplementary Fig. 5, Supplementary Table 1).

Our serial dilutions were prepared using blood collected into anticoagulant-coated tubes, which generally yield higher quality DNA than dried blood spots. We therefore determined how the number of successfully genotyped markers correlated with parasitemia in dried blood spot samples from a study in Ethiopia. There was a weak correlation between parasitemia and read count (rho = 0.1873, p = 0.0217); 148 cases (98.67%) with parasitemia ranging from 240 to 42,280 parasites/μl could be genotyped successfully at ≥80 (86%) microhaplotype markers (Supplementary Fig. 6).

Geographically diverse application potential

A total of 745 P. vivax isolates from 8 countries across the globe were used to evaluate amplification efficacy. Successful genotyping was defined as ≥25 reads for >80% of the 97 nuclear genome amplicons and was achieved in 705 (94.6%) isolates. 564 (83%) independent (non-recurrent) samples were taken forward for evaluation of individual marker efficacy and country-specific amplification patterns. Aside from 248 samples, data from all isolates were generated from a single 384-well run (run 3). Four microhaplotype markers (64,721, 354,590, 419,038 and 466,426) exhibited lower read-pair counts than read counts, but none failed consistently across populations (Fig. 1b).
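The sample-level pass criterion above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' pipeline: the function name, the dictionary layout and the amplicon IDs are hypothetical, and only the thresholds (≥25 reads, >80% of 97 amplicons) come from the text.

```python
MIN_READS = 25        # per-amplicon read threshold stated in the text
MIN_FRACTION = 0.80   # a sample must exceed this fraction of amplicons
N_AMPLICONS = 97      # nuclear genome amplicons in the panel


def passes_genotyping(read_counts):
    """Return True if more than 80% of the 97 amplicons have >= 25 reads.

    read_counts: dict mapping a (hypothetical) amplicon ID to its read count.
    Amplicons missing from the dict are treated as having zero reads.
    """
    covered = sum(1 for n in read_counts.values() if n >= MIN_READS)
    return covered / N_AMPLICONS > MIN_FRACTION


# Toy sample: 80 amplicons with 120 reads each, 17 amplicons with 3 reads each.
sample = {f"amp{i}": 120 for i in range(80)}
sample.update({f"amp{i}": 3 for i in range(80, 97)})
print(passes_genotyping(sample))  # 80/97 is about 0.82, above the 0.80 cut-off
```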

Fig. 1: P. vivax proportion of samples with more than 25 reads or more than 10 read pairs by country.
Heat maps illustrate the proportion of samples with more than 25 reads (a) and proportion of samples with more than 10 read pairs derived from DADA2 (b) for each marker by country. Markers (x-axis) were ordered by chromosome and coordinate. All samples with genotype failures (<25 reads or <10 read pairs) are presented in black. Microhaplotype markers 64721, 354590, 419038 and 466426 displayed consistently low read-pair counts. Data are presented on a total of 564 independent (not including replicates or recurrences) infections that passed genotyping. The data from Ethiopia are split into dried blood spots (DBS) from a clinical trial conducted in East Shewa Zone and whole blood (WB) extracts from a therapeutic efficacy survey conducted in Gamo Zone for comparative assessment. The DBS versus WB comparisons reveal similar read and read pair counts at all loci.

Accuracy in P. vivax variant calling

Using a set of 110 isolates with both whole genome sequencing (WGS) and amplicon sequencing data, the accuracy of SNP variant calling (applying the default 10% minor allele threshold, minor allele depth of 2, and minimum depth of 5 and 25 for WGS and amplicon sequencing data respectively) was confirmed at the 425 SNPs within the 97 markers (only excluding mitochondria). Concordance was observed at 96.4% (42,805/44,386) of the genotype calls (Supplementary Table 2, Supplementary Data 3). Homozygous reference versus homozygous alternate allele discordances contributed 0.33% (148/44,386) of all calls. Most discordant calls reflected heterozygous versus homozygous calls (90.6% (1433/1581)), likely reflecting differences in the limit of genotyping of minor clones between the datasets. After applying a 1% minor allele threshold to both datasets, a notable difference was more than doubling of heterozygous amplicon sequencing calls that were homozygous in the WGS dataset (from 227 calls with 10% threshold to 697 with 1% threshold; Supplementary Table 3). As illustrated in Supplementary Fig. 7, a large proportion of discordant heterozygote genotype calls appear to reflect a greater limit of genotyping in the deep sequencing (amplicon) data.
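To make the effect of the calling thresholds concrete, the sketch below applies them to allele depths at a single biallelic SNP. It is an illustration under stated assumptions rather than the authors' code: the function name and the exact order in which the filters are applied are hypothetical, while the threshold values (10% or 1% minor allele frequency, minor allele depth of 2, minimum depth of 5 for WGS and 25 for amplicon data) are those given in the text.

```python
def call_genotype(ref_depth, alt_depth, min_depth,
                  min_minor_freq=0.10, min_minor_depth=2):
    """Call a biallelic SNP as 'hom_ref', 'hom_alt', 'het' or 'missing'.

    Assumed logic: a site is missing below min_depth, and heterozygous only
    if the minor allele passes both the depth and the frequency threshold.
    """
    total = ref_depth + alt_depth
    if total < min_depth:
        return "missing"
    minor = min(ref_depth, alt_depth)
    if minor >= min_minor_depth and minor / total >= min_minor_freq:
        return "het"
    return "hom_ref" if ref_depth >= alt_depth else "hom_alt"


# A 3% minor clone in deep amplicon data (made-up depths, minimum depth 25):
print(call_genotype(970, 30, min_depth=25))                       # hom_ref
print(call_genotype(970, 30, min_depth=25, min_minor_freq=0.01))  # het

# The same clone at typical WGS depth (minimum depth 5) is not detected,
# because a single alternate read fails the minor allele depth of 2:
print(call_genotype(30, 1, min_depth=5, min_minor_freq=0.01))     # hom_ref
```

Lowering the frequency threshold only converts calls to heterozygous where the depth is high enough to sample the minor clone, which is one way to read the rise in amplicon-only heterozygous calls at the 1% threshold.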

High potential to capture within-host infection complexity

Using a set of 104 isolates with both WGS and microhaplotype data, within-host infection complexity was explored using the Fws and effective multiplicity of infection (eMOI). As illustrated in Fig. 2a, b, there was a strong negative correlation between the genome-wide Fws and the microhaplotype-based eMOI in each geographic region assessed (Spearman’s rank correlation, rho = −0.61, p = 5.664e-12) (Supplementary Data 4).

Fig. 2: Microhaplotype-based within-host diversity trends.
a, b illustrate the level of concordance between genomic (as measured by the Fws) and microhaplotype (as measured by eMOI) data in estimation of within-host P. vivax diversity in 104 independent cases. The boxplots present the median, interquartile range and minimum and maximum value. Overall, high concordance is observed between the two datasets. c presents the eMOI distributions at the country level using data on 562 independent cases. The boxplots present the median, interquartile range and minimum and maximum value. Each country is presented in a different colour.

The eMOI distributions in Fig. 2c illustrate trends in within-host diversity at the country level in a larger panel of 562 isolates from 6 populations which had over 30 isolates available: Afghanistan (n = 157), Bangladesh (n = 32), Colombia (n = 32), Indonesia (n = 38), Ethiopia (n = 214), and Vietnam (n = 89). The lowest eMOI distribution was observed in Sumatra, Indonesia, suggestive of low endemicity relative to the other sites. Provincial distributions are provided in Supplementary Fig. 8.

Effective IBD capture

To assess the ability of the microhaplotype panel to capture IBD accurately, we used paneljudge (an R package to judge the performance of a panel of genetic markers using simulated data) to simulate the root mean square error (RMSE) in estimation of nine pairwise IBD states: IBD = 0.01, 0.05, 0.1, 0.15, 0.2, 0.25, 0.5, 0.75 and 0.99. The simulations were run in the 6 major populations using data from 91 assayable microhaplotypes (markers 354,590 and 419,038 excluded), revealing moderately high diversity (mean diversity range 0.44–0.67; Supplementary Fig. 9) and low RMSE (<0.12 for all pairwise IBD states) in each population (Supplementary Fig. 10, Fig. 3).

Fig. 3: Simulations of IBD estimation in different geographic areas using the microhaplotype panel.
Root mean square error (RMSE) of relatedness estimates based on data simulated using nine different data-generating relatedness estimates, r (specifically IBD of 0.01 [essentially unrelated], 0.05 [very low related], 0.1 [low related], 0.15 [low related], 0.2 [low related], 0.25 [half-sibling], 0.5 [sibling], 0.75 [highly related], and 0.99 [essentially clonally identical]) with switch rate parameter k set to 5. Data were generated using paneljudge software on 91 high performance microhaplotypes in independent infections from each population (see sample sizes within the figure). In all populations, half-siblings and siblings had the highest RMSE, but this remained below 0.12 in all cases. Each country is presented in a different colour.
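For reference, the RMSE summarises how far simulated relatedness estimates fall from the data-generating value r. The sketch below shows the standard definition in Python; it does not call paneljudge (an R package), and the estimates are made-up numbers used only to illustrate the arithmetic.

```python
import math


def rmse(estimates, true_r):
    """Root mean square error of relatedness estimates around true_r."""
    squared_errors = [(e - true_r) ** 2 for e in estimates]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


# Hypothetical estimates from four simulated sibling pairs (true r = 0.5).
estimates = [0.40, 0.55, 0.62, 0.45]
print(round(rmse(estimates, 0.5), 4))  # below the 0.12 bound quoted in the text
```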

Potential of microhaplotype-based IBD to inform on recurrence

To demonstrate the potential of the assay to inform on the origin of recurrences, the microhaplotype-based IBD was determined using DCifer on data from 128 pairs of initial and recurrent infections (Supplementary Data 5). These isolates came from individuals enrolled into a randomized controlled trial (RCT) at two sites in Ethiopia and treated with either chloroquine (CQ), CQ plus primaquine (PQ), artemether-lumefantrine (AL) or AL plus PQ with one-year follow-up to assess the efficacy of PQ regimens for radical cure (ref. 19). In the original RCT, genotyping data at 1–7 microsatellites were generated in 46 pairs of infections in which recurrence occurred within 42 days of follow-up. The microsatellite number was too low to enable accurate IBD determination; hence, pairs were defined as heterologous (reinfection/relapse) or homologous (recrudescence/relapse). Of the 46 pairs, 34 had microhaplotype data from our study. Amongst the 34 pairs, 44% (15/34) were defined as heterologous using the microsatellite data compared to 18% (6/34) defined as strangers (arbitrary threshold of IBD < 25%) with the microhaplotype-based IBD estimate (Supplementary Data 5). The relative overestimation of strangers with microsatellite data underscores the improved insight from IBD analysis.

Using the full set of 128 microhaplotype-genotyped pairs from the RCT, we used IBD distributions and thresholds to inform on probable relapse/recrudescence versus reinfection risks in each treatment arm under the assumption that highly related pairs are more likely to reflect relapses than reinfections (an assumption suitable for low-inbreeding populations). Our results demonstrate that patients treated without PQ had a higher median IBD (1.0, IQR 0.66–1.0) compared to those treated with PQ (0.50, IQR 0.05–0.99) (p = 0.004, Mann-Whitney U test), consistent with a greater risk of relapsing and recrudescent infections. In the CQ arm, 4% (2/46) of genotyped recurrences occurred by day 28, implying that the majority of highly related pairs were likely relapses. The median IBD in the CQ arm (0.98, IQR 0.57–1) was significantly higher than in the CQ + PQ arm (0.36, IQR 0–0.98) (p = 0.008, Mann-Whitney U test) (Fig. 4a). Using the IBD ≥ 25% threshold, a higher proportion of highly related pairs was observed in the CQ (84%, 38/45) versus CQ + PQ arm (54%, 7/13) (χ2 = 3.8, p = 0.051).

Fig. 4: IBD distributions by treatment and time to recurrence in a randomized controlled trial conducted in Ethiopia.
a presents the IBD distributions in initial and recurrent infection pairs across all pairs and grouped by treatment arm; AL (Artemether-Lumefantrine), CQ (Chloroquine), AL + PQ (AL + Primaquine) and CQ + PQ. Each treatment is presented with a different colour. b presents the same IBD data grouped by recurrences occurring less versus more than 120 days after the initial infection. Each boxplot presents the median, interquartile range and minimum and maximum value. Data are presented on recurrence pairs from 128 independent patients, with only one infection pair presented per patient to avoid potential bias (n = 128 patients, 256 samples). The majority (90%, 115/128) of pairs reflect day 0 and recurrence 1 time points. Where the day 0 or recurrence 1 infections failed genotyping or had inconclusive clinical metadata, consecutive pairs from recurrences 2 to 4 were used instead, as the patients received the same treatment up to recurrence 4.

In the AL arm, 41% (23/56) of recurrences occurred by day 42 (range 21–42 days) with less clear distinction of relapse versus recrudescence. The AL arm had higher IBD (1, IQR 0.71–1) than the AL + PQ arm (0.86, IQR 0.26–1) but the difference was not as large as observed with CQ and was not statistically significant (p = 0.155, Mann-Whitney U test) (Fig. 4a). There was also no significant difference between the AL (91%, 51/56) and AL + PQ (71%, 10/14) arms with the IBD ≥ 25% threshold (χ2 = 2.3, p = 0.129). Owing to the differences in PQ impact amongst the CQ and AL arms, we compared the median IBD of the CQ + PQ (0.36, IQR 0–0.98) and AL + PQ (0.86, IQR 0.26–1) arms, but the difference was not significant (p = 0.116, Mann-Whitney U test).

We assessed the distribution of IBD over time using a 120-day threshold in line with the typical relapse periodicity of 4 months for African P. vivax infections (ref. 20). Recurrences within 120 days of treatment had higher median IBD (0.98, IQR 0.58–1.0) compared to those after 120 days (0.73, IQR 0.12–1.0). The results are consistent with a rising proportion of reinfections at later time points, but the difference was not significant (p = 0.083, Mann-Whitney U test) (Fig. 4b). Using the IBD ≥ 25% threshold, significantly more highly related infections were observed pre (89%, 85/96) versus post (66%, 21/32) day 120 (χ2 = 7.32, p = 0.007).
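The χ2 values reported for the IBD ≥ 25% comparisons are numerically consistent with a Pearson chi-square test with Yates' continuity correction on a 2×2 table. That is an inference from the reported numbers, not something the text states. The sketch below reproduces the pre versus post day 120 comparison, taking 89% of 96 pairs as 85 highly related and 11 not, and 66% of 32 pairs as 21 highly related and 11 not. The function names are illustrative.

```python
import math


def yates_chi2(a, b, c, d):
    """Yates-corrected chi-square statistic for the 2x2 table [[a, b], [c, d]]."""
    n = a + b + c + d
    row_totals = (a + b, c + d)
    col_totals = (a + c, b + d)
    observed = ((a, b), (c, d))
    chi2 = 0.0
    for i in range(2):
        for j in range(2):
            expected = row_totals[i] * col_totals[j] / n
            # Continuity correction: shrink each |O - E| by 0.5, floored at 0.
            diff = max(abs(observed[i][j] - expected) - 0.5, 0.0)
            chi2 += diff ** 2 / expected
    return chi2


def chi2_p_value_1df(chi2):
    """Upper-tail p-value of a chi-square statistic with 1 degree of freedom."""
    return math.erfc(math.sqrt(chi2 / 2.0))


# Rows: recurrence before / after day 120; columns: IBD >= 25% / IBD < 25%.
stat = yates_chi2(85, 11, 21, 11)
print(round(stat, 2), round(chi2_p_value_1df(stat), 3))  # 7.32 0.007
```

The same function applied to the treatment-arm tables, `yates_chi2(38, 7, 7, 6)` for CQ versus CQ + PQ and `yates_chi2(51, 5, 10, 4)` for AL versus AL + PQ, gives approximately 3.8 and 2.3, matching the values reported for those comparisons.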

Spatial transmission potential

The utility of the marker panel to define spatial transmission dynamics was explored using the amplicon sequencing data and genotypes derived from whole genome sequencing data. As illustrated in the neighbour-joining tree in Fig. 5, SNP-based identity-by-state (IBS) analyses on 728 low-complexity infections demonstrated a high degree of differentiation by geographic location. In areas with both amplicon sequencing and genomic data, there was no evident differentiation by methodology, supporting the robustness of merging the data sources. IBD analyses, which rely on accurate population allele frequency estimates, were focused on the amplicon sequencing data from the large sample sets (n ≥ 30) and used both clonal and polyclonal infections. The IBD analyses revealed evidence of local and inter-site transmission networks even at IBD thresholds exceeding ~25% (Fig. 6). For example, moderate differentiation was observed between Gia Lai and Binh Phuoc Province in Vietnam, and between Gamo Zone and East Shewa Zone in Ethiopia (Fig. 6d, f). Supplementary Fig. 10 illustrates the connectivity patterns at thresholds as low as 5%, further demonstrating the differentiation within Ethiopia and Vietnam. IBD analysis also revealed clonal clustering in Sumatra, consistent with the low eMOI distribution (Fig. 6e).

Fig. 5: Identity-by-state-based spatial patterns using AmpSeq and WGS data.
The plot presents an unrooted neighbour-joining tree derived from a distance matrix on the microhaplotype calls using genotypes derived from microhaplotype and WGS data at the microhaplotype marker positions only. The neighbour-joining tree illustrates largely distinct clustering by country, except for neighbouring Cambodia and Vietnam, and Bangladesh and Thailand. In countries with both WGS (triangles) and AmpSeq data (circles), no evidence of clustering by sequencing method is observed; although separation is observed in Indonesia, the WGS data from this region derive from Papua Province, whilst the AmpSeq data derive from Sumatra Province, located on different islands nearly 2500 km apart. The plots were generated using data on 728 independent, monoclonal infections.

Fig. 6: IBD-based spatial patterns using microhaplotype data.
a–f present networks illustrating IBD-based connectivity between infections in Afghanistan (a), Bangladesh (b), Colombia (c), Ethiopia (d), Indonesia (e) and Vietnam (f). Each shape reflects an infection, colour-coded by site, and with shapes reflecting monoclonal (circle) versus polyclonal (square) infections. For each country, connectivity (illustrated by connecting lines on a grey scale between shapes) is presented at IBD thresholds ranging from ≥0.24 (thin, light grey lines) to ≥0.95 (thick, black lines). The baseline sample positions are based on the ~25% (0.24) IBD output. IBD measures were calculated on the microhaplotype calls using DCifer software. At the ~25% IBD threshold, large networks (10 or more connected infections) are largely confined to cases from the same site, with one or two cases connecting cases or other networks from different sites; these networks shrink with increasing IBD in all sites except for Sumatra, Indonesia, where the networks appear to reflect clonal clusters (retained at IBD ≥ 95%). All plots were generated using data on independent infections with sample sizes shown within the plots.
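The network construction described in this caption can be sketched as thresholding pairwise IBD estimates and taking connected components. The Python below is a minimal illustration, not the plotting code used for the figure: the sample IDs and IBD values are invented, and in practice the pairwise estimates would come from DCifer output.

```python
from collections import defaultdict


def ibd_clusters(pairwise_ibd, threshold):
    """Connected components of the graph linking pairs with IBD >= threshold.

    pairwise_ibd: dict mapping (sample_a, sample_b) tuples to an IBD estimate.
    Returns a list of clusters, each a sorted list of sample IDs.
    """
    neighbours = defaultdict(set)
    for (a, b), ibd in pairwise_ibd.items():
        neighbours[a]  # register both samples, even if they stay unconnected
        neighbours[b]
        if ibd >= threshold:
            neighbours[a].add(b)
            neighbours[b].add(a)

    seen, clusters = set(), []
    for start in sorted(neighbours):
        if start in seen:
            continue
        stack, component = [start], set()
        while stack:  # depth-first walk over the thresholded graph
            node = stack.pop()
            if node in component:
                continue
            component.add(node)
            stack.extend(neighbours[node] - component)
        seen |= component
        clusters.append(sorted(component))
    return clusters


# Hypothetical pairwise IBD estimates for four infections.
pairs = {("S1", "S2"): 0.97, ("S1", "S3"): 0.26, ("S2", "S3"): 0.30,
         ("S1", "S4"): 0.00, ("S2", "S4"): 0.01, ("S3", "S4"): 0.02}
print(ibd_clusters(pairs, 0.24))  # [['S1', 'S2', 'S3'], ['S4']]
print(ibd_clusters(pairs, 0.95))  # [['S1', 'S2'], ['S3'], ['S4']]
```

Raising the threshold from 0.24 to 0.95 breaks up the larger network and leaves only the near-clonal pair connected, which mirrors the shrinking networks described for most sites.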

Effective Plasmodium species confirmation

In addition to the microhaplotypes, a previously described mitochondrial amplicon was included in the assay to confirm Plasmodium spp. (ref. 21). The assay amplifies coordinates PvP01_MIT_V2:2904-3149, which include species-specific SNPs and indels. Using P. vivax samples from a range of countries, and P. falciparum, P. malariae, P. ovale spp. and P. knowlesi negative controls, we confirmed amplification of the mitochondrial marker at moderate to high depth and coverage in all Plasmodium species (range 28–5236) (Supplementary Table 4). The method also allows for detection of mixed-species infection, which was tested in 3 artificially mixed samples with P. vivax and P. falciparum. Concordance between PCR-based and mitochondrial species classification was confirmed for each of the Plasmodium samples, with 2/3 artificially mixed samples detected successfully.

Drug resistance candidates

Our assay included a non-exhaustive selection of amplicons encompassing candidate markers of antimalarial drug resistance, including the multidrug resistance 1 (pvmdr1) 976 and 1076 loci, and dihydropteroate synthase (pvdhps) 383 and 553 loci that have been associated with ex vivo or clinical phenotypes (see ref. 22). The dihydrofolate reductase (pvdhfr) 57, 58, 61 and 117 loci did not multiplex effectively with the other markers based on bioinformatic predictions and were thus not included, which limits detection of full sulphadoxine-pyrimethamine resistance. The prevalence of the variants by population is summarized in Fig. 7a and Supplementary Table 5. The prevalence of the pvmdr1 Y976F variant, the most widely characterized candidate marker of chloroquine resistance (CQR), ranged from 0% in Afghanistan to 100% in Sumatra, Indonesia (ref. 23). The F1076L variant, which has also been implicated in CQR, exceeded 90% frequency in all countries except Colombia (4%). The prevalence of the A383G mutation in pvdhps, a marker of antifolate resistance, varied widely, ranging from 2% in Afghanistan to 86% in Colombia. The pvdhps A553G mutation was observed at 3% in Vietnam and 20% in Bangladesh but was absent in other populations.

Fig. 7: Amino acid frequencies at P. vivax drug resistance candidates.
The plots present frequency and corresponding upper and lower 95% confidence intervals (CIs) for the given amino acid changes in baseline population samples from each country. All frequencies reflect the suspected drug resistance-conferring amino acid. All plots were generated using independent, monoclonal samples (n = 372). Each country is presented in a different colour.
