A targeted tiled amplicon sequencing approach for clade and subclade level differentiation of monkeypox virus from wastewater

Categories: Disease & Virus

August 11, 2025

Wastewater sample collection

Twenty-four hour composite influent wastewater from four different wastewater treatment plant sites in a large Canadian city with a combined catchment population nearing three million were used in this study (Fig. 1). Composite wastewater influent samples from each site were collected using an autosampler on a biweekly basis from September to October of 2024 as a total composite volume of 400–500 ml. Sterile polyethylene terephthalate bottles were used for collection and storage. Samples were shipped to the JC Wilt Infectious Diseases Research Centre in Winnipeg, Manitoba on ice and stored at 4 °C upon receipt until processing.

Capture and concentration of MPXV

Initial attempts to concentrate MPXV from wastewater samples were performed using Nanotrap^® microbiome A particles (Ceres Nanosciences, Virginia, USA) according to manufacturer described methods using 40 mL of wastewater, a procedure currently in use within our laboratory for SARS-CoV-2. Due to a lack of recovery of MPXV nucleic acid, the starting volume of wastewater influent was increased to 120 mL which similarly did not capture sufficient material for a sequencing reaction. In response, a review of the published literature identified additional concentration methods for capture of MPXV from wastewater, including polyethylene glycol (PEG) precipitation of total suspended solids and extraction of the pellet fraction^41,42. Several attempts to rapidly identify a suitable wastewater processing method were unsuccessful, prompting a shift from processing of the liquid fraction or suspended solids to capture of MPXV from total wastewater solids. This method, adapted from Mejia et al. (2024)³⁰, consisted of centrifuging 120 mL of wastewater for 30 min at 12,000 x g to generate a more solid pellet. The supernatant was removed and the pellet was lysed by addition of 800 µL of CD buffer at room temperature followed by nucleic acid extraction using the Qiagen MagAttract PowerMicrobiome DNA/RNA KF extraction kit (Cat. No. 27600-4-KF) with a final elution into 100 µl of buffer EB. To improve the sequencing yield, a clean-up step was added using a 1:1 ratio of AMPure XP beads for DNA cleanup (Beckman Coulter, Brea, California, USA, Cat. No. A63881), 75% ethanol for washes, and 100 µl of Qiagen buffer EB for the final elution step. Detailed protocols of each of the above methods is shown in Figure S1. All results described in this study were obtained using the method designated as Method F (Figure S1) for wastewater processing and nucleic acid extraction prior to amplicon sequencing using the newly designed tiled primer schema presented in this study targeting a 4.2-kb genomic ITR region.

Real time quantitative polymerase chain reaction (RT-qPCR)

To determine the presence of MPXV in wastewater, samples were screened prior to sequencing using previously published RT-qPCR assays (G2R_G⁴³, G2R_NML³⁰ and F3L⁴⁴. PrimeTime Probes (Integrated DNA Technologies) were synthesized with a 5’ 6-FAM fluorophore and 3’ ZEN-Iowa Black FQ quencher. Primer and probe concentrations are as described in the literature^30,31,43. Each PCR reaction was performed in triplicate in a total volume of 20 µL using the QuantiNova Multiplex PCR Kit (Qiagen, 208452) according to the manufacturer’s specifications. The QuantStudio 5 Real-Time PCR System (Applied Biosystems, A28138) was used with the following conditions: 95 °C for 2 min, followed by 42 cycles of amplification at 95 °C for 5 s and 60 °C for 30 s. Samples with a cycle threshold value (C_T) of less than 38 for any one of the three assays were selected for sequencing. PCR assay positive controls consisted of both a high and low concentration of gBlocks^® Gene Fragments (Integrated DNA Technologies) for each PCR target. Positive controls included in all runs consisted of gBlock^® Gene Fragments (Integrated DNA Technologies) containing the fragment of interest (gBlock sequences and other information available in Table S1) at final concentrations of 10 cp/µL and 1000 cp/µL and diluted AMPLIRUN Monkeypox Virus DNA Control at a final concentration of 35 cp/ml (Vircell Molecular, MBC146-R).

Quantification of DNA standards

The initial concentration of each gBlocks^® Gene Fragment (Integrated DNA Technologies) was quantified by digital PCR according to the manufacturer’s recommendations. Briefly, each gene fragment was diluted in nuclease-free water to a calculated concentration of 10,000 cp/ µL and tested in quadruplicate. A reaction mix volume of 9 µL was prepared using Absolute Q DNA Digital PCR MasterMix (5X) (Applied Biosystems, A52490), primer and probe concentrations of 100 µM, and 2 µL of each diluted gene fragment. Absolute quantification was performed using the QuantStudio Absolute Q Digital PCR System (Applied Biosystems, A52864) under the following conditions: activation for 10 min at 96 °C followed by 40 cycles of 96 °C for 5 s and 60 °C for 15 s.

Generation of standard curves and determination of PCR assay sensitivity

A six-point standard curve from 5 to 500,000 copies per reaction was generated for each qPCR assay using quantified, serially diluted gBlock Gene Fragments (Integrated DNA Technologies) containing the target sequence of the G2R_G, G2R_NML and F3L RT-qPCR assays. Six replicates of each concentration were used per target. The sensitivity of each PCR assay was assessed using serially diluted gBlock Gene Fragments (Integrated DNA Technologies) containing 0.78 to 200 copies per reaction. Twenty replicates of each concentration were tested and a cycle threshold value of less than 40 was interpreted as a positive detection. To calculate the limit of quantification (LOQ), only concentrations in which all replicates were positive were used. The coefficient of variation (CV) was calculated for each concentration tested and fit to a linear regression model linking it to the template concentration⁴⁵. The LOQ was defined as the concentration at which a CV equal to 35% was achieved as predicted by linear regression⁴⁵. To determine the limit of detection (LOD), results were converted into binary values (i.e. positive and negative detection only) and a probit model was generated to predict the relationship between template concentration and positive detection. The LOD was defined as the lowest concentration at which a positivity rate of 95% was maintained as predicted by the probit model. All modeling was performed in RStudio using the tidyverse set of packages⁴⁶.

Tiled MPXV amplicon scheme design

Initial analysis was performed on a database of complete MPXV genomes downloaded from GISAID which were selected to represent the genetic diversity of the virus, including both recent outbreak and historical sequences. To reduce the redundancy and overall size of the dataset for processing, CD-HIT⁴⁷ was used to cluster and remove sequences with more than 95% sequence identity, resulting in a database of 49 MPXV genome sequences. The sequences were aligned with MAFFT⁴⁸ and the resulting alignment was manually investigated for the presence of variable regions suitable for design of a tiled amplicon scheme allowing for clade and subclade differentiation. This process resulted in the identification of an approximately 4.2 kb portion of the ITR region that was extracted from the alignment for further analysis. CD-HIT⁴⁷ was used to remove redundant sequences with greater than 99% identity, resulting in a final database of nine sequences spanning the target region which was used for subsequent assay design. A tiled amplicon scheme was designed with PrimalScheme v.1.3.2⁴⁹ using the final database of nine MPXV sequences. The resulting scheme consisted of 11 primer pairs generating PCR products ranging from 490 to 516 bp (Table 1). The primers were mapped to a whole genome database of representative MPXV sequences using Geneious Prime v2025.0.2⁵⁰ to screen for mismatches that could impact primer binding efficiency. This database consisted of recent whole genome sequences representing MPXV subclades from all global regions with collection dates starting January 1, 2023, downloaded from GISAID⁵¹ on November 20, 2024 with the “complete” and “low coverage excl” filters enabled. The only exception was clade IIa which had no sequences in the GISAID repository within the specified collection dates, so two historical sequences from the USA and Liberia isolated in 1962 and 1970 were used for analysis (GISAID accession EPI_ISL_13056556 and EPI_ISL_13058405, respectively). The sequences included are available on GISAID at https://doi.org/10.55876/gis8.250624tz or under ID EPI_SET_250624tz. A table of dataset information provided by GISAID as well as a list of GISAID accession numbers for all sequences used at the time of analysis are available as supplementary materials.

Table 1 MPXV 4.2 kb ITR tiled amplicon sequencing primers and amplicons. For each primer, two primer binding location ranges (with position values relative to reference NC_063383.1) are shown since the MPXV genome contains two ITRs and therefore each primer has two separate binding locations.

In Silico tiled sequencing primer scheme evaluation

The database of recent MPXV whole genome sequences described in the previous section for screening of primer binding efficiency was used for subsequent in silico analysis. The sequences were aligned using MAFFT⁴⁸ and the whole target region of the tiled amplicon scheme, excluding the terminal primer binding sites, was extracted from the alignments using Geneious Prime v2025.0.2⁵⁰. The resulting sequences were filtered using Cutadapt⁵² in order to remove sequences with more than 10% ambiguous (N) bases. The filtered sequences were then analyzed using Nextclade (v3.10.0, Mpox virus (All clades) reference dataset)⁵³ and the resulting clade assignments were compared to those from GISAID metadata in order to assess the clade differentiation capability and accuracy of the tiled amplicon scheme.

Similar analysis was also performed on each of the 11 individual amplicon regions to assess their capacity to differentiate clades with incomplete target region coverage. Using the MPXV alignments generated previously for whole target region in silico analysis, the primers targeting each amplicon were mapped and each amplicon region, excluding the primer binding sites (coordinates show in Table 1), was extracted using Geneious⁵⁰. The resulting sequences were filtered to remove sequences with any ambiguous bases (N) using Cutadapt⁵². This ambiguous base filtering cut-off was selected because due to the nature of PCR amplification, any amplicons that amplified efficiently should have complete coverage across their entire length. Similar to the whole genome sequence analysis, filtered sequences were analyzed using Nextclade (v3.10.0, Mpox virus (All clades) reference dataset)⁵³ and clade assignments were compared to those from GISAID metadata.

PCR amplification and next generation sequencing

Amplicons were generated using the Q5 Hot Start High-Fidelity DNA Polymerase kit (NEB, Massachusetts, USA; M0493S). For each sample, two PCR reactions were prepared, using either primer pool 1 (containing primers for odd-numbered amplicons) or primer pool 2 (containing primers for even-numbered amplicons) and the products of each primer pool were combined post-amplification. PCR master mixes were prepared using 5 µL of extracted nucleic acid, 12.5 µL of 5X Q5 Reaction Buffer, 1.44 µM of either pool 1 or 2 (3.6µL of a 10 µM pooled primer stock) and nuclease-free water up to a total volume of 25 µL. PCR was carried out at 98 °C for 3 min, followed by 35 cycles of 98 °C for 15 s and 61 °C for 5 min, with a final hold at 4 °C using an Eppendorf Mastercycler Nexus Gradient GSX1 Thermal Cycler (Fisher Scientific, E6332000029). All runs included a MPXV clade IIb positive control sample consisting of diluted AMPLIRUN Monkeypox Virus DNA Control (Vircell Molecular, MBC146-R). Additionally, non-MPXV Orthopoxvirus positive control DNA (vaccinia virus Western Reserve (NML collection, initially provided by Dr. David Evans, University of Alberta), was amplified and sequenced to confirm that sequencing results could be differentiated from MPXV.

The PCR products were run on a 4200 Tapestation System (Aglient Technologies, G2991BA) with a D5000 ScreenTape System (Agilent Technologies, 5067–5588, 5067–5589, 5067–5590) for confirmation of successful amplification following the standard manufacturer’s protocol. Quantification of each sample was done with a Qubit dsDNA Quantification Assay Kit (ThermoFisher Scientific, Q32851) on a Qubit Flex Fluorometer (ThermoFisher Scientific, Q33327) and Picogreen dsDNA Reagent (Invitrogen, P7581) using a FilterMax F5 Multi-Mode Microplate Reader (Molecular Devices, F5). Sequencing library preparation was performed on prepared amplicons using the Illumina Nextera XT library preparation kit as per manufacturer’s instructions. DNA libraries were quantified, pooled and sequenced on an Illumina NextSeq 2000 instrument using a P2 600 cycle kit (Illumina). All sequencing runs included a no template control sample, which underwent PCR amplification and sequencing in parallel with other samples but with water instead of a nucleic acid template.

Sequence assembly and analysis

Quality and primer trimming, read mapping, consensus sequence generation and clade assignment were performed using the nf-core/viralrecon v2.6.0 pipeline⁵⁴ using MPXV isolate M5312_HM12_Rivers (NCBI accession NC_063383.1), trimmed to the target PCR region excluding external primer binding regions, as the reference sequence (reference FASTA used included in supplementary materials). The pipeline was run using the viralrecon amplicon protocol⁵⁴ and a custom primer BED file containing the positions of the primers relative to the reference in order to enable trimming of primer sequences (primer BED file used included as supplementary materials). Clade assignment using Nextclade was enabled using the hMPXV nextclade dataset included in the pipeline at the time of analysis (v2.12.0). Due to the version of Nextclade included in the pipeline not being the most current, assembled consensus sequences were also analyzed using the browser version of Nextclade (v3.10.0, Mpox virus (All clades) reference dataset) to verify clade and sub-clade assignment calls. To further confirm the presence or absence of MPXV, the resulting consensus sequences were queried against the NCBI core nucleotide database using blastn⁵⁵. Additionally, raw sequencing reads were taxonomically classified using Kraken2⁵⁶ and subsequently visualized using Pavian⁵⁷.

Phylogenetic analysis

The MPXV genome database of extracted genome sequences corresponding to the 4.2 kb tiled amplicon target region used previously for in silico primer specificity testing, was used for phylogenetic analysis. To remove sequences with any ambiguous bases, the database was filtered with Cutadapt⁵². The resulting sequence database was further filtered with CD-HIT⁴⁷ using a sequence identity threshold of 0.9995 in order to remove redundant or highly similar sequences, resulting in a final database of 19 representative MPXV sequences. These sequences were aligned to wastewater and positive control derived consensus sequences generated by in-house sequencing using MAFFT⁴⁸. A phylogenetic tree was generated using IQ-TREE⁵⁸ on the find best model setting with ModelFinder⁵⁹ (Best-fit model selected: K3Pu + F) with 1000 ultrafast bootstraps⁶⁰ and then visualized using iTOL⁶¹ with a vaccinia Western Reserve virus consensus sequence (NML collection) chosen as the outgroup for rooting. Subsequent phylogenetic analysis was carried out using only amplicon 4 by extracting the target region from the previously generated alignment of 19 representative MPXV sequences (excluding primer binding regions) followed by the process described above for phylogenetic tree generation (Best-fit model selected: HKY + F) and visualization. Sequences covering the 4.2-kb region were generated using purified DNA from MPXV clade Ib (Pathoplexus accession # PP_0014F6Y.1) provided by the PHAC-NML Special Pathogens unit, MPXV clade IIb (AMPLIRUN Monkeypox Virus DNA Control, Vircell, reference # MBC146-R) and vaccinia virus Western Reserve (provided to PHAC-NML by Dr. David Evans, University of Alberta, Edmonton, Canada).

SNP analysis to assess single amplicon clade and sub-clade differentiation

To investigate the clade and sub-clade classification ability of amplicon 4, the single nucleotide polymorphisms (SNPs) from the nucleotide alignment used for phylogenetic analysis were visualized using snipit⁶². Based on these results, a list of potential clade or subclade specific SNPs within amplicon 4 was generated. The specificity of the SNPs to each clade or subclade was further investigated by looking at the nucleotide composition at the associated positions using the sequence database of recent MPXV sequences downloaded from GISAID and previously used in this study for assay design and in silico analysis.

Source link