Enhanced blood parasite species identification using V4–V9 18S rDNA barcoding by universal primers on a nanopore platform

Categories: Disease & Virus

November 21, 2025

Primer design

To cover blood parasites from several taxonomic lineages while retaining species-level resolution, we selected universal primers that match a high proportion of eukaryotic organisms and amplify an 18S rDNA barcode of > 1 kb (Table 1; Fig. 1A). The target region spans variable regions 4 (V4) to 9 (V9), following the naming scheme of a previous publication²⁰. To simulate the classification of parasite species from error-prone portable NGS data, one thousand error-containing sequences for the V9 and V4–V9 regions were generated by introducing random mutations at various per-base error rates into the reference 18S rDNA sequences of P. falciparum, P. knowlesi, P. malariae, P. ovale and P. vivax. When these sequences were classified by blastn with modified parameters (-task blastn, intended for somewhat similar sequences), up to 1.7% of the top hits for the V9 region were assigned to the wrong species, depending on the error rate (Table 2). When the sequences were classified with the Ribosomal Database Project (RDP) naive Bayesian classifier, the proportion of V9 sequences that could not be classified above the bootstrap threshold (> 50%) increased in an error rate-dependent manner (Fig. 1B).
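The simulation step described above can be illustrated with a minimal Python sketch. The text only states that random mutations were introduced at a given per-base error rate; the equal mix of substitutions, insertions and deletions, the function names `mutate` and `simulate_reads`, and the fixed random seed are assumptions made for illustration and are not taken from the study.

```python
import random

BASES = "ACGT"

def mutate(seq, error_rate, rng):
    """Return a copy of seq in which each base is altered with probability error_rate.

    Assumption: an altered base is substituted, followed by an inserted base,
    or deleted with equal probability (the source does not specify the mix).
    """
    out = []
    for base in seq:
        if rng.random() < error_rate:
            kind = rng.choice(("sub", "ins", "del"))
            if kind == "sub":
                out.append(rng.choice([b for b in BASES if b != base]))
            elif kind == "ins":
                out.append(base)
                out.append(rng.choice(BASES))
            # "del": the base is simply skipped
        else:
            out.append(base)
    return "".join(out)

def simulate_reads(reference, error_rate, n=1000, seed=0):
    """Generate n error-containing copies of a reference barcode."""
    rng = random.Random(seed)  # seeded so that a simulation can be repeated
    return [mutate(reference, error_rate, rng) for _ in range(n)]
```

Each simulated set would then be passed to the classifier under test, once per error rate and per barcode region.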

Table 1 Primer sequences.

Table 2 BLASTn-based number of misclassified sequences out of one thousand simulated error-containing 18S rDNA barcodes.

Adjusting the BLAST search parameters was critical for retrieving similar sequences with blastn from error-prone sequence data. With the default blastn settings (-task megablast), more than 50% of the V9 sequences were classified as “no hit”, depending on the error rate, even when searching the large NCBI nt database (Supplemental Fig. 1).
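As a hedged illustration of how such a comparison could be scripted, the sketch below builds a blastn command line with an explicit -task value and summarizes tabular output (-outfmt 6) into "no hit" and misassigned fractions. The helper names, the choice of -max_target_seqs 5, and the selection of the top hit by highest bit score are assumptions for this example; the study's actual scripts are not described in the text.

```python
import csv
import io

def blastn_command(query_fasta, db, task="blastn"):
    """Build a blastn argument list; task="megablast" reproduces the default."""
    return ["blastn", "-task", task, "-query", query_fasta, "-db", db,
            "-outfmt", "6", "-max_target_seqs", "5"]

def top_hits(tabular_text):
    """Map each query ID to the subject ID with the highest bit score.

    Expects the standard 12-column -outfmt 6 layout, in which column 1 is the
    query ID, column 2 the subject ID and column 12 the bit score.
    """
    best = {}
    for row in csv.reader(io.StringIO(tabular_text), delimiter="\t"):
        if not row:
            continue
        query, subject, bitscore = row[0], row[1], float(row[11])
        if query not in best or bitscore > best[query][1]:
            best[query] = (subject, bitscore)
    return {query: subject for query, (subject, _) in best.items()}

def summarize(query_ids, hits, expected_subject):
    """Fractions of queries with no hit or with a top hit other than expected."""
    n = len(query_ids)
    no_hit = sum(1 for q in query_ids if q not in hits)
    wrong = sum(1 for q in query_ids if q in hits and hits[q] != expected_subject)
    return {"no_hit": no_hit / n, "misassigned": wrong / n}
```

The argument list can be passed to `subprocess.run` where BLAST+ is installed; the parsing functions work on the resulting text and do not depend on BLAST itself.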

The F566 primer targets the fourth conserved region upstream of V4 (Figs. 1A and 2A), and the 1776R primer targets the tenth conserved region downstream of V9 (Figs. 1A and 2B). Both primers cover a wide range of eukaryotic organisms, including representative eukaryotic pathogens from the kingdom Fungi and the phyla Nematoda, Platyhelminthes, Apicomplexa and Euglenozoa (Fig. 2A and B). Of note, the six nucleotides (nts) at the 5’ terminus of F566 had one mismatch with the 18S rDNA of Trypanosoma cruzi, T. brucei, and Leishmania donovani.

Next, an unbiased set of ribosomal RNA small subunit (SSU) sequences from all domains (bacteria, archaea, and eukaryotes) was analyzed for primer annealing sites. SSU sequences were retrieved from the SILVA database²¹, and the number of organisms having SSU sequence(s) with primer annealing sites was counted. The universal primers (F566 and 1776R; Table 1) had annealing sites with fewer than three total mismatches in over 60% of SSU entries from eukaryotic organisms and in less than 1% of entries from non-eukaryotic organisms (Supplemental Fig. 2A). This primer pair covers a reasonable number of database entries in the following taxonomic lineages that include blood parasites: the orders Haemosporida and Piroplasmida in the phylum Apicomplexa, the order Trypanosomatida in the phylum Euglenozoa, the order Rhabditida in the phylum Nematoda, and the order Strigeidida in the phylum Platyhelminthes (Supplemental Fig. 2B).
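The in silico counting of annealing sites can be sketched as an ungapped sliding-window search that tolerates IUPAC ambiguity codes. This is a simplified illustration, not the authors' pipeline: the function names are hypothetical, gapped matches are ignored, and any primer passed in is a placeholder because the Table 1 sequences are not reproduced here. The default `max_total=2` corresponds to "fewer than three total mismatches" across both primers.

```python
# Bases matched by each IUPAC nucleotide code.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}
COMPLEMENT = str.maketrans("ACGTRYSWKMBDHVN", "TGCAYRSWMKVHDBN")

def revcomp(seq):
    """Reverse complement, including IUPAC ambiguity codes."""
    return seq.translate(COMPLEMENT)[::-1]

def mismatches(primer, site):
    """Count positions where the site base is not allowed by the primer code."""
    return sum(1 for p, s in zip(primer, site) if s not in IUPAC[p])

def best_site(primer, target):
    """Return (fewest mismatches, position) over all ungapped windows, or None."""
    k = len(primer)
    if len(target) < k:
        return None
    return min((mismatches(primer, target[i:i + k]), i)
               for i in range(len(target) - k + 1))

def has_amplicon_sites(fwd, rev, target, max_total=2):
    """True if both primers anneal in the right order within max_total mismatches."""
    f = best_site(fwd, target)
    r = best_site(revcomp(rev), target)  # reverse primer anneals to the other strand
    if f is None or r is None:
        return False
    return f[1] < r[1] and f[0] + r[0] <= max_total
```

Applying `has_amplicon_sites` to every SSU entry and grouping the results by taxonomic lineage would give per-lineage coverage fractions of the kind reported above.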

Blocking primers to suppress the overwhelming host DNA amplification by universal primers

To suppress the overwhelming amplification of host DNA by the pan-eukaryotic universal primers, we designed two blocking primers (Table 1 and Fig. 1A). 3SpC3_Hs1829R was designed according to a previously validated blocking primer design scheme²²: it overlaps with the universal reverse primer 1776R, carries a C3 spacer modification at its 3’ terminus to block polymerase extension by competitive clamping, and has a host mammalian 18S rDNA-specific sequence at the 3’ end with a central bubble of six inosines. Sequence alignment shows that mammalian host sequences have the blocking primer annealing site with fewer than two mismatches, whereas other eukaryotic organisms have more than ten mismatches (Fig. 2B). An in silico search for this blocking primer sequence among the SSU sequences in the SILVA database also showed that it is selective for mammalian hosts and does not match the other eukaryotic lineages containing known blood pathogens (Supplemental Fig. 2C). The other blocking primer, PNA_Hs733F, was designed to match mammalian sequences perfectly in the V4 region (Figs. 1A and 2C) and was selected so that its sequence appears as rarely as possible in non-mammalian SSU sequences between the universal primers F566 and 1776R (Supplemental Fig. 1D). PNA_Hs733F was synthesized with a peptide backbone to suppress polymerase extension by elongation arrest. To observe the effect of the designed blocking primers, DNA extracted from human whole blood spiked with T. brucei rhodesiense parasites was amplified with the universal primers in the presence of various concentrations of blocking primers (Fig. 3). Both 3SpC3_Hs1829R (Fig. 3A) and PNA_Hs733F (Fig. 3B) suppressed the amplification of human 18S rDNA in a concentration-dependent manner, whereas the amplification of parasite 18S rDNA was not affected. Combining the two blocking primers suppressed human 18S rDNA amplification further (Supplemental Fig. 3). These qualitative results were further validated by quantitative analysis, including portable nanopore sequencing, in the following experiments.

Sensitivity of the designed parasite targeted NGS test using universal primers

Next, we combined DNA barcode amplification using the universal and blocking primers with NGS analysis on a portable nanopore sequencing device to establish a parasite-targeted NGS test. As a proof of concept, various numbers of parasites were spiked into healthy human whole blood, and the extracted DNA was used for PCR amplification of 18S rDNA barcodes. PCR with the universal primers but without blocking primers amplified human 18S rDNA, detected as dominant bands around 1.2 kb across all T. brucei rhodesiense-spiked samples (mock control and 10, 10², 10³, and 10⁴ parasites/mL) (Fig. 4A, lanes 2–6). When the blocking primers were added to the PCR, Trypanosoma-derived 18S rDNA amplicons (~ 1.5 kb) were visible down to the sample spiked with 10³ parasites/mL (Fig. 4A, lane 11). NGS analysis of those amplicons on the portable nanopore sequencing device also confirmed that Trypanosoma sequences were detectable in the samples with 10³ and 10⁴ parasites/mL (Fig. 4A, lanes 11 and 12, respectively). In this experiment, a substantial proportion of the sequences obtained by portable nanopore sequencing were nonspecifically amplified regions of the host genome (host non-18S) (Fig. 4A). To reduce such nonspecific amplification, which is more likely with many PCR cycles, we decreased the number of PCR cycles from 40 to 35 in the following experiments. When P. falciparum or B. bovis was spiked into healthy human whole blood (mock control and 4 × 10³ to 1.5 × 10⁶ parasites/mL; lanes 1 to 6), PCR with blocking primers successfully amplified detectable 18S rDNA barcodes (Fig. 4B and C, respectively). As is common among other eukaryotic organisms, the B. bovis 18S rDNA amplicon (Fig. 4C) is the same size as the human one (~ 1.2 kb), so DNA electrophoresis alone cannot establish the origin of the amplicon. NGS analysis of those amplicons on the portable nanopore sequencer confirmed that parasite-derived sequences were dominant in samples from 2 × 10⁴ parasites/mL for P. falciparum (lane 3) and from 4 × 10³ parasites/mL for B. bovis (lane 2). Of note, some reads from the amplicon in lane 6 of the B. bovis experiment were classified as Trichosporon insectorum, suggesting that environmental contamination may have occurred during the PCR steps.

A validation of the test with field cattle blood samples

Finally, we tested the established parasite-targeted NGS test on field samples, using blood DNA from cattle in an area with a high prevalence of parasite infection. We analyzed three cattle samples: two were already known to be positive for Theileria mutans and T. velifera, and one was negative for those Theileria parasites by the conventional PCR test²³.

18S rDNA barcodes were amplified with the universal primers and various concentrations of blocking primers, and the amplicons were sequenced to identify the organisms present (Fig. 5). To speed up the analysis, we first removed host sequences by a BLAST search against the 18S rDNA database and then clustered the remaining sequences to obtain accurate 18S rDNA barcodes (Fig. 5). Without any blocking primers, host 18S rDNA sequences accounted for more than 90% of the amplicons from all three samples tested (Fig. 5). The addition of 2 µM 3SpC3_Hs1829R suppressed the amplification of host 18S rDNA, and the combination with PNA_Hs733F (final concentrations of 1.25 µM and 5 µM, labeled PNA1 and PNA5, respectively) suppressed it further in a concentration-dependent manner (Fig. 5, Supplemental Fig. 4). As a result, the sequence reads assigned to the order Piroplasmida increased from < 1.1% to > 37.5% (Supplemental Fig. 4). After clustering the error-containing sequences, we could estimate accurate 18S rDNA barcodes [Accession numbers LC878354 and LC878355] that were identical to the sequences deposited in NCBI GenBank (KU206307: T. velifera, and KU206320: T. mutans) from all three samples (Fig. 5). A 1 bp deletion in the T. mutans sequence from the BA48 + PNA5 condition and a 1 bp insertion in the T. velifera sequence from the NN96 + condition were observed (Supplemental Table 1). Because the other clustered sequences from the same samples under different PCR conditions were identical to the deposited 18S rDNA barcodes, these discrepancies in the consensus sequences probably reflect sequencing errors of the portable nanopore sequencing device. From the BA38 amplicon generated with the universal 18S primers, we detected a 16S rDNA barcode for Anaplasma marginale [Accession number: LC878356] and another genomic DNA sequence [Accession number: LC878357] (Supplemental Table 1). The 16S rDNA barcode was 100% identical to the reported 16S rRNA sequence of A. marginale KU686794.1, and the genomic DNA sequence [LC878357] had 95.48% identity to the whole genome sequence of A. marginale strain Florida CP001079.1. The presence of this pathogen was further validated by conventional PCR targeting the gltA gene [Accession number: LC878358], which was 99.74% identical to the reported A. marginale gltA sequence OQ185251.1. A. marginale is a causative bacterial agent of anaplasmosis, and the universal primers we used had several mismatches to the A. marginale SSU (F566: two mismatches, at one and three nts from the 3’ terminus; 1776R: no mismatches with the A. marginale SSU sequence KU686778.1).
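Why clustering error-containing reads can recover an accurate barcode may be illustrated with a column-wise majority vote. This is only a conceptual sketch under strong assumptions: the reads are taken to be already clustered and aligned to equal length with "-" as the gap character, and the function name `consensus` is hypothetical. The clustering and consensus tools actually used in the study are not specified in this section.

```python
from collections import Counter

def consensus(aligned_reads, gap="-"):
    """Majority-vote consensus of pre-aligned, equal-length reads.

    Columns where the gap character wins the vote are dropped, so an insertion
    seen in only a minority of reads does not enter the consensus.
    """
    if not aligned_reads:
        raise ValueError("no reads given")
    length = len(aligned_reads[0])
    if any(len(read) != length for read in aligned_reads):
        raise ValueError("reads must be aligned to the same length")
    out = []
    for column in zip(*aligned_reads):
        base, _count = Counter(column).most_common(1)[0]
        if base != gap:
            out.append(base)
    return "".join(out)
```

Random errors fall at different positions in different reads and are outvoted, whereas an error shared by most reads in a cluster would survive into the consensus, which is consistent with the single-base discrepancies being attributed to sequencing error.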