Enhanced blood parasite species identification using V4–V9 18S rDNA barcoding by universal primers on a nanopore platform

Primer design

To cover blood parasites from several taxonomic lineages while retaining species-level resolution, universal primers that match a high proportion of eukaryotic organisms and amplify an 18S rDNA barcode of > 1 kb were selected (Table 1; Fig. 1A). The target region spans variable area 4 (V4) to V9, following the naming scheme of a previous publication20. To simulate the classification of parasite species with error-prone portable NGS sequence data, one thousand error-containing sequences for the V9 and V4-9 regions were generated by introducing random mutations at various per-base error rates into the reference 18S rDNA sequences of P. falciparum, P. knowlesi, P. malariae, P. ovale and P. vivax. When these sequences were classified by blastn with modified parameters (-task blastn, for somewhat similar sequences), up to 1.7% of the top hits for the V9 region were assigned to the wrong species, depending on the error rate (Table 2). When the same sequences were classified by the ribosomal database project (RDP) naive Bayesian classifier, the proportion of V9 sequences without a classification above the bootstrap threshold (> 50%) increased in an error rate-dependent manner (Fig. 1B).
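The simulation described above can be illustrated with a minimal sketch. The text states only that random mutations were introduced at a given per-base error rate; the equal mix of substitutions, insertions and deletions, the function names, and the toy reference string below are assumptions for illustration, not the procedure actually used in this study.

```python
import random

BASES = "ACGT"

def mutate(seq: str, error_rate: float, rng: random.Random) -> str:
    """Return a copy of seq in which each base is altered with probability error_rate.

    Assumption: an error is a substitution, an insertion or a deletion,
    chosen with equal probability. The source does not specify the mix.
    """
    out = []
    for base in seq:
        if rng.random() >= error_rate:
            out.append(base)  # base is kept unchanged
            continue
        kind = rng.choice(("sub", "ins", "del"))
        if kind == "sub":
            out.append(rng.choice([b for b in BASES if b != base]))
        elif kind == "ins":
            out.append(base)
            out.append(rng.choice(BASES))
        # "del": the base is simply dropped
    return "".join(out)

def simulate_reads(reference: str, error_rate: float, n: int = 1000, seed: int = 0) -> list:
    """Generate n error-containing copies of one reference barcode."""
    rng = random.Random(seed)
    return [mutate(reference, error_rate, rng) for _ in range(n)]

# Hypothetical toy reference; a real run would load the 18S rDNA sequence from a FASTA file.
toy_reference = "ACGT" * 50
reads = simulate_reads(toy_reference, error_rate=0.10, n=1000, seed=1)
```

Fixing the seed keeps the simulated set reproducible, so the same reads can be passed to both classifiers being compared.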

Table 1 Primer sequences.
Fig. 1

18S rDNA barcode and primers in this study. (A) Schematic illustration of the primers used in this study. Universal primers (F566 and 1776R, Table 1) that amplify the V4-9 region are shown as white arrows. Blocking primers (PNA_Hs733F and 3SpC3_Hs1829R) are shown as black arrows. Variable areas of the 18S rDNA gene are shown as white boxes. Universal primers (1391f and EukBr) targeting the V9 region are shown as grey arrows. The sequence and the primer arrows are not drawn to scale. 3SpC3_Hs1829R and 1776R overlap by four bases. (B) Classification results of the simulated error-containing 18S rDNA barcodes by the RDP naive Bayesian classifier. Proportions of the species-level assignments are shown. NA means that the taxonomic assignment did not exceed the bootstrap value threshold (> 50%).

Table 2 Number of sequences misclassified by BLASTn out of one thousand simulated error-containing 18S rDNA barcodes.

Adjusting the blast search parameters was critical for retrieving similar sequences by blastn from error-prone sequence data. With the default blastn setting (-task megablast), more than 50% of the V9 sequences were classified as “no hit”, depending on the error rate, even when searched against the large NCBI nt database (Supplemental Fig. 1).
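For illustration, the difference between the two searches comes down to the -task option of the BLAST+ blastn program. The sketch below only builds the command line; the query file name, the tabular output format and the hit limit are hypothetical choices, not settings reported in this study.

```python
import shlex

def blastn_command(query_fasta: str, database: str, task: str = "blastn", max_hits: int = 5) -> list:
    """Build a BLAST+ blastn command line as an argument list.

    task="megablast" is the blastn default and is tuned for highly similar sequences;
    task="blastn" is the more sensitive mode referred to in the text.
    """
    return [
        "blastn",
        "-task", task,
        "-query", query_fasta,               # hypothetical file of simulated reads
        "-db", database,
        "-outfmt", "6",                      # tabular output (an assumption, for easy parsing)
        "-max_target_seqs", str(max_hits),   # assumption: keep a handful of hits per read
    ]

sensitive = blastn_command("simulated_v9.fasta", "nt")
default = blastn_command("simulated_v9.fasta", "nt", task="megablast")
print(shlex.join(sensitive))
```

Returning an argument list rather than a single string lets the command be passed directly to subprocess.run without shell quoting issues.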

The F566 primer targets the fourth conserved area before V4 (Figs. 1A and 2A), the 1776R primer targets the 10th conserved area after V9 (Figs. 1A and 2B), and both primers cover a wide range of eukaryotic organisms, including representative eukaryotic pathogens from kingdom Fungi and phyla Nematoda, Platyhelminthes, Apicomplexa and Euglenozoa (Fig. 2A and B). Of note, six nucleotides (nts) from the 5’ terminus of F566 had one mismatch with the 18S rDNA of Trypanosoma cruzi, T. brucei, and Leishmania donovani.

Fig. 2

Multiple sequence alignment of the primer annealing sites of 18S rDNA barcodes. Regions for universal primer F566 (A), 1776R and blocking primer 3SpC3_Hs1829R (B), and PNA_Hs733F (C) are shown. Bases in human 18S rDNA are shown in color background and bases different from human sequence in the corresponding regions are shown in color background. Black boxes show universal primer annealing sites and red boxes show blocking primer annealing sites. Reverse primers are shown as the reverse complement of the actual primer sequences. The reverse complements of the inosine bases in 3SpC3_Hs1829R are shown as N for visualization.

Next, non-biased ribosomal RNA small subunit (SSU) sequences from all domains (bacteria, archaea, and eukaryotes) were searched for the primer annealing sites. SSU sequences were retrieved from the SILVA database21, and the number of organisms having SSU sequence(s) with primer annealing sites was counted. The universal primers (F566 and 1776R, Table 1) had annealing sites with fewer than three total mismatches in over 60% of SSU entries from eukaryotic organisms and in less than 1% of SSU entries from non-eukaryotic organisms (Supplemental Fig. 2A). This primer pair covers a reasonable number of the organisms deposited in the database for the following taxonomic lineages of blood parasites: orders Haemosporida and Piroplasmida in phylum Apicomplexa, order Trypanosomatida in phylum Euglenozoa, order Rhabditida in phylum Nematoda, and order Strigeidida in phylum Platyhelminthes (Supplemental Fig. 2B).
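The coverage count described above can be sketched as an ungapped scan of each SSU sequence for the best-matching site of each primer. The primer and SSU strings below are hypothetical toy sequences (Table 1 is not reproduced here), the function names are illustrative, and an ungapped scan is an assumption about how annealing sites were located.

```python
# IUPAC nucleotide codes: each primer symbol maps to the bases it may pair with.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}

def mismatches(primer: str, site: str) -> int:
    """Count positions where the target base is not allowed by the primer symbol.

    Ambiguous bases in the target (for example N) are counted as mismatches.
    """
    return sum(1 for p, s in zip(primer, site) if s not in IUPAC[p])

def best_site(primer: str, target: str):
    """Return (start, mismatch_count) of the best ungapped site, or None if target is too short."""
    best = None
    for start in range(len(target) - len(primer) + 1):
        n = mismatches(primer, target[start:start + len(primer)])
        if best is None or n < best[1]:
            best = (start, n)
    return best

def has_annealing_sites(forward: str, reverse_rc: str, ssu: str, max_total: int = 2) -> bool:
    """True if both sites exist in the right order with at most max_total mismatches in total.

    max_total=2 corresponds to "fewer than three total mismatches" in the text.
    reverse_rc is the reverse complement of the reverse primer, read on the same strand as ssu.
    """
    ssu = ssu.upper().replace("U", "T")  # SSU rRNA entries may be stored as RNA
    fwd = best_site(forward, ssu)
    rev = best_site(reverse_rc, ssu)
    if fwd is None or rev is None:
        return False
    return fwd[1] + rev[1] <= max_total and fwd[0] < rev[0]

# Hypothetical toy example, not real primer or SSU sequences.
toy_ssu = "GGGACGTACTTTTTTTTCCGGAAGG"
covered = has_annealing_sites("ACGTRC", "CCGGAT", toy_ssu)
```

Counting the entries for which has_annealing_sites is true, grouped by domain or order, would give coverage proportions of the kind summarized in Supplemental Fig. 2.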

Blocking primers to suppress the overwhelming host DNA amplification by universal primers

To suppress overwhelming host DNA amplification by the pan-eukaryotic universal primers, we designed two blocking primers (Table 1 and Fig. 1A). 3SpC3_Hs1829R was designed to overlap with the universal reverse primer 1776R, to carry a C3 spacer modification at the 3’ terminus that blocks polymerase extension in a competitive clamping manner, and to have a host mammalian 18S rDNA-specific sequence at the 3’ end with a stretch of six inosines in the middle, according to a previously validated blocking primer design scheme22. Sequence alignment shows that mammalian host sequences have the blocking primer annealing site with fewer than two mismatches, whereas the other eukaryotic organisms have more than ten mismatches (Fig. 2B). An in silico search for this blocking primer sequence in the SSU sequences of the SILVA database also showed that it is selective for mammalian hosts and does not match the other eukaryotic lineages that contain known blood pathogens (Supplemental Fig. 2C). The other blocking primer, PNA_Hs733F, was designed to match mammalian sequences perfectly in the V4 region (Figs. 1A and 2C) and was selected so that its sequence appears as rarely as possible in non-mammalian SSU sequences between the universal primers F566 and 1776R (Supplemental Fig. 1D). PNA_Hs733F was synthesized with a peptide backbone to suppress polymerase extension in an elongation arrest manner. To observe the effect of the designed blocking primers, DNA extracted from human whole blood spiked with T. brucei rhodesiense parasites was amplified with the universal primers and various concentrations of blocking primers (Fig. 3). Both 3SpC3_Hs1829R (Fig. 3A) and PNA_Hs733F (Fig. 3B) suppressed the amplification of human 18S rDNA in a concentration-dependent manner, whereas the amplification of parasite 18S rDNA was not affected. The combination of the two blocking primers further suppressed human 18S rDNA amplification (Supplemental Fig. 3). These qualitative results were further validated by quantitative analysis, including portable nanopore NGS sequencing, in the following experiments.

Fig. 3

Concentration-dependent effects of blocking primers. Various concentrations of blocking primers were added while amplifying the 18S rDNA barcode from T. brucei rhodesiense-spiked human blood samples (10⁵ parasites/mL) for 3SpC3_Hs1829R (A) and PNA_Hs733F (B). Bands above 1500 bp are Trypanosoma-derived 18S rDNA and ~ 1200 bp amplicons are human-derived 18S rDNA. When 3SpC3_Hs1829R was added to the PCR reaction, primer dimers were observed below 250 bp.

Sensitivity of the designed parasite targeted NGS test using universal primers

Next, we combined DNA barcode amplification using the universal and blocking primers with NGS analysis on a portable nanopore sequencing device to establish a parasite-targeted NGS test. As a proof of concept, various numbers of parasites were spiked into healthy human whole blood, and the extracted DNA was used for PCR amplification of 18S rDNA barcodes. PCR with the universal primers but without blocking primers amplified human 18S rDNA, detected as dominant bands around 1.2 kb in all of the T. brucei rhodesiense-spiked samples (mock control and 10, 10², 10³ and 10⁴ parasites/mL) (Fig. 4A, lanes 2–6). When the blocking primers were added to the PCR reaction, Trypanosoma-derived 18S rDNA amplicons (~ 1.5 kb) were visible in spiked samples down to 10³ parasites/mL (Fig. 4A, lane 11). NGS analysis of those amplicons on the portable nanopore sequencing device also confirmed that Trypanosoma sequences were detectable in the samples with 10³ and 10⁴ parasites/mL (Fig. 4A, lanes 11 and 12, respectively). In this experiment, a significant proportion of the sequences obtained by portable nanopore sequencing were nonspecifically amplified regions of the host genome (host non-18S) (Fig. 4A). Thus, to reduce such nonspecific amplification at high cycle numbers, we decreased the number of PCR cycles from 40 to 35 in the following experiments. When P. falciparum and B. bovis were spiked into healthy human whole blood (mock control and from 4 × 10³ to 1.5 × 10⁶ parasites/mL, lanes 1 to 6), PCR with blocking primers successfully amplified detectable 18S rDNA barcodes (Fig. 4B and C, respectively). As is common among other eukaryotic organisms, the 18S rDNA amplicon of B. bovis (Fig. 4C) is the same size as that of human 18S rDNA (~ 1.2 kb); thus DNA electrophoresis alone cannot ascertain the origin of the amplicon. NGS analysis of those amplicons on the portable nanopore sequencer confirmed that parasite-derived sequences were dominant in the samples from 2 × 10⁴ and 4 × 10³ parasites/mL upward (for P. falciparum [lane 3] and B. bovis [lane 2], respectively). Of note, some reads from the B. bovis amplicon in lane 6 were classified as Trichosporon insectorum, suggesting that environmental contamination may have occurred during the PCR steps.

Fig. 4

Selective enrichment of parasite 18S rDNA from parasite-spiked samples. (A) PCR products from human blood samples spiked with various numbers of T. brucei rhodesiense are shown. Lanes 1 and 7, no template control. Lanes 2–6 and 8–12, parasite-spiked whole blood samples (0, 10, 100, 1000 and 10000 parasites/mL, respectively). Forty cycles of PCR were used for this experiment. (B, C) PCR products from blood samples spiked with various numbers of ring-stage P. falciparum (B) or B. bovis (C) are shown. Lane 1: no template control. Lanes 2–6: parasite-spiked whole blood samples (0, 4000, 20000, 10⁵, 5 × 10⁵, 1.5 × 10⁶ parasites/mL, respectively). Thirty-five cycles of PCR were used to obtain these results. (A, B, and C) Bottom panels show the portable nanopore sequencing results of the PCR amplicons. Up to five thousand reads were analyzed. Host 18S rDNA sequences were first removed by a blast search against the eukaryotic 18S rDNA database. All sequences belonging to phylum Chordata were considered host-derived barcodes and are shown as a red box: “host 18S”. Non-host 18S sequences were further clustered to give an accurate estimate of the sequence of origin and classified by a blast search against the NCBI nt database. All PCR reactions were performed with 2 µM 3SpC3_Hs1829R and 5 µM PNA_Hs733F blocking primers, except panel A, lanes 1–6, which are the control reactions without any blocking primers.

A validation of the test with field cattle blood samples

Finally, we tested the established parasite-targeted NGS test with field samples. For this purpose, blood DNA from cattle in an area with a high prevalence of parasite infection was used. We analyzed three cattle samples, of which two were already known to be positive for Theileria mutans and T. velifera, and one was negative for those Theileria parasites by the conventional PCR test23.

18S rDNA barcodes were amplified with the universal primers in the presence of various concentrations of blocking primers, and the amplicons were sequenced to identify the organisms present (Fig. 5). To speed up the analysis, we first removed host sequences by a blast search against the 18S rDNA database and then clustered the remaining sequences to obtain accurate 18S rDNA barcodes (Fig. 5). Without any blocking primers, host 18S rDNA sequences accounted for more than 90% of the amplicons from all three samples tested (Fig. 5). The addition of 2 µM 3SpC3_Hs1829R suppressed the amplification of host 18S rDNA, and the combination with PNA_Hs733F (1.25 µM and 5 µM final concentration for labels PNA1 and PNA5, respectively) further suppressed host 18S rDNA amplification in a concentration-dependent manner (Fig. 5, Supplemental Fig. 4). As a result, the sequence reads assigned to the order Piroplasmida increased from < 1.1% to > 37.5% (Supplemental Fig. 4). After clustering the error-containing sequences, we could estimate accurate 18S rDNA barcodes [Accession numbers LC878354 and LC878355], identical to the sequences deposited in NCBI GenBank (KU206307: T. velifera, and KU206320: T. mutans), from all three samples (Fig. 5). A 1 bp deletion in the T. mutans sequence from the BA48 + PNA5 condition and a 1 bp insertion in the T. velifera sequence from the NN96 + condition were observed (Supplemental Table 1). Because the other clustered sequences from the same samples under different PCR conditions were identical to the deposited 18S rDNA barcodes, these differences likely reflect sequencing errors of the portable nanopore sequencing device. From the BA38 amplicon generated with the universal 18S primers, we detected a 16S rDNA barcode for Anaplasma marginale [Accession number: LC878356] and another genomic DNA sequence [Accession number: LC878357] (Supplemental Table 1). The 16S rDNA barcode was 100% identical to the reported 16S rRNA sequence of A. marginale KU686794.1, and the genomic DNA sequence [LC878357] had 95.48% identity to the whole genome sequence of A. marginale strain Florida CP001079.1. The presence of this pathogen was further validated by conventional PCR targeting the gltA gene; the amplified sequence [Accession number: LC878358] was 99.74% identical to the reported A. marginale gltA sequence OQ185251.1. A. marginale is the causative bacterial agent of anaplasmosis, and the universal primers we used had mismatches to the A. marginale SSU only in F566 (two mismatches, at one and three nts from the 3’ terminus), with no mismatches in 1776R (A. marginale SSU sequence: KU686778.1).

Fig. 5

Identification of co-infection with multiple pathogens in cattle blood samples. The analytical pipeline is shown in the left panel. After standard quality filtering of the portable nanopore sequencing data, host 18S rDNA sequences were first removed by a blast search against the eukaryotic 18S rDNA database. All sequences belonging to phylum Chordata were considered host-derived barcodes and are shown as a white box: “host”. Non-host 18S sequences were further clustered to give an accurate estimate of the sequence of origin and classified by a blast search against the NCBI nt database. Sequences that were not clustered were classified as “not Clustered” and are shown in black. When fewer than 80 sequences were obtained as non-host 18S reads, the number of reads was considered insufficient for accurate clustering, and those reads were classified as “not Clustered”.
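The labelling rules in this legend can be summarized in a short sketch. It assumes that each read has already been given a phylum by the first blast search and, where clustering succeeded, a cluster label from the second search; the tuple layout and the function name are hypothetical, and the blast and clustering steps themselves are not reproduced.

```python
from collections import Counter

MIN_NON_HOST_READS = 80  # threshold stated in the legend

def summarize_reads(reads):
    """Tally reads into "host", per-cluster labels and "not Clustered".

    reads: iterable of (phylum, cluster_label) tuples, where cluster_label is None
    for a read that did not fall into any cluster (hypothetical input layout).
    """
    reads = list(reads)
    non_host = [r for r in reads if r[0] != "Chordata"]
    counts = Counter()
    counts["host"] = len(reads) - len(non_host)
    # Below the threshold, clustering is not trusted and every non-host read is pooled.
    enough = len(non_host) >= MIN_NON_HOST_READS
    for _phylum, label in non_host:
        counts[label if (enough and label) else "not Clustered"] += 1
    return counts

# Hypothetical example: 10 host reads, 85 reads in one cluster and 5 unclustered reads.
example = (
    [("Chordata", None)] * 10
    + [("Apicomplexa", "Theileria mutans")] * 85
    + [("Apicomplexa", None)] * 5
)
summary = summarize_reads(example)
```

Dividing each count by the total number of reads gives proportions of the kind plotted in the bar charts.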
