Overview of MPXV genomes dataset
This study included 1138 MPXV specimens collected between May 2022 and February 2023 and sequenced by the NYC Public Health Lab (PHL) (Supplementary Data 1). The spatiotemporal distribution (Supplementary Fig. 1A and 1B) shows a peak of specimen collection in July 2022 with the highest frequency of genomes sequenced from residents of Manhattan. Of 758 individuals in this study, 390 had single MPXV genomes collected from one lesion (34% of genomes), 361 had two genomes collected from two lesions (62% of genomes), and the remainder had three or more genomes collected from 3-4 lesions (4% of the genomes). Of the individuals with one genome, most of the sequenced specimens were from the genital area (62%, 241/390) (Supplementary Fig. 1C). Similarly, for individuals with multiple genomes, the majority of sequenced specimens (54%, 404/748) were from the genital area, 30% (228/748) were from upper body, and 10% (72/748) were from lower body sites. Specimens were collected from two lesions from the same anatomical site (e.g., genital area) for 133 sequences and from different anatomical sites (e.g., one lesion from the hand and one lesion from the genital area) for 468 sequences (Supplementary Fig. 1D).
MPXV global genome sequences (n = 2967, excluding NYC sequences) included in this study were collected from 1985 through 2023, with representative sequences from five continents (all inhabited continents except Oceania). In this dataset, 2877 genomes belong to lineage B of clade IIb, 63 genomes belong to lineage A of clade IIb, and 27 genomes from clade Ia serve as the outgroup (Supplementary Data 2).
Lineage assignment and phylogenetic analysis of MPXV genomes
In the combined NYC and global dataset, Nextclade designated 4,015 of the 4105 sequences as lineage B.1 of MPXV clade IIb. In NYC, the most frequently observed B.1 sublineages were B.1.2, B.1.12, B.1.3 and B.1.7 ( > 50 sequences per sublineage) (Fig. 1A). Nextclade lineage assignment had 99% and 98% agreement with the observed phylogenetic clades for sublineages observed in the global and NYC phylogenies, respectively (Figs. 1B, 2A). The phylogenetic clade that belonged to sublineage B.1.12 included predominantly sequences from NYC (94%) (Figs. 1B and 2A, red asterisk). For genomes designated as B.1 lineage by Nextclade, at least nine distinct clusters were observed in the MPXV phylogeny with sequences in each cluster predominantly from NYC and/or North America (Fig. 2B, Supplementary Fig. 2, Table 1).
A) B.1 lineage and sublineage assignment of the 2022 MPXV genome sequences from NYC, North America (excluding NYC), and Europe. Sequences from Africa, Asia, and South America were excluded from this figure due to small sample sizes (Africa: n = 2; Asia: n = 4; South America: n = 75). Lineage and sequence count per source are shown on the x-axis and y-axis, respectively. Apart from the parent lineage B.1, the most frequently observed B.1 sublineages in Europe were B.1.1, B.1.2, B.1.7, B.1.3 and B.1.5 (>50 sequences per sublineage). In North America (excluding NYC), the most frequently observed B.1 sublineages were B.1.2, B.1.11, B.1.4, B.1.3, and B.1.1 (>50 sequences per sublineage). The most frequently observed NYC B.1 sublineages (>50 sequences per sublineage) were B.1.2, B.1.12, B.1.3 and B.1.7. B) Global phylogeny of MPXV genome sequences (including NYC sequences). The tree was visualized with 4078 sequences, of which 63 and 4,015 were from lineages A and B, respectively. The collection dates ranged from 10/09/2017 to 01/09/2023. In the phylogeny, the branches were colored by Nextclade lineage assignment. The outer ring was colored by geographical region. Sublineages associated with specific geographical regions are marked with colored asterisks. All the B.1 sublineages were placed as distinct clades and had 98.9% agreement with the Nextclade lineage assignment. Most sequences (93.6%) from sublineage B.1.12 were from NYC (red asterisk). Sequences from sublineages B.1.11, B.1.13, B.1.2, and B.1.4 were primarily from North America (excluding NYC) (orange asterisks), and sequences from sublineages B.1.1, B.1.7 and B.1.9 were primarily from Europe (blue asterisks). The B.1.6 sublineage was predominantly from South America (purple asterisk). The source data files for Fig. 1 are available in the following GitHub repository: https://github.com/sakther-NYCDOHMH/nyc_mpox_genomic_epidemiology/.
A) A subtree of the global MPXV tree from Fig. 1 containing 1138 NYC sequences. The branches were colored by Nextclade lineage assignment. The B.1 sublineages in the phylogeny were in 98.0% agreement with the Nextclade lineage assignment. The B.1.12 sublineage was predominantly NYC specific (red asterisk). B) Clusters emerged within B.1 assigned sequences. Lineage assignment was based on Nextclade. Six distinct clusters were observed in the NYC genome phylogeny (see also Supplementary Fig. 2) where sequences in each cluster were predominantly NYC specific and had at least one cluster-specific nonsynonymous mutation except for Cluster #6 (Table 1, Supplementary Fig. 3). The source data files for Fig. 2 are available in the following GitHub repository: https://github.com/sakther-NYCDOHMH/nyc_mpox_genomic_epidemiology/.
MPXV lineage-defining mutations, deletions, and the rate of evolution
A total of 58 MPXV lineage-defining mutations were observed in lineage B, of which eight were intergenic, 22 were synonymous, and 28 were nonsynonymous (Supplementary Table 1). Most of these B lineage-specific mutations (91%) had APOBEC3 signatures (Supplementary Table 1). Genomic analyses showed that some B.1 sublineages harbored additional high-frequency mutations that were not considered for defining lineage calls by Nextclade. For example, the B.1.13 sublineage observed in the MPXV global phylogeny had one high-frequency intergenic mutation at position 132,520 (allele frequency = 0.88) along with the lineage-defining mutation at position 175,093 (Fig. 1B, Supplementary Fig. 3, Table 1). Additionally, the NYC-specific sublineage B.1.12 had two high-frequency nonsynonymous mutations at positions 98,233 (allele frequency = 0.93) and 98,455 (allele frequency = 0.93) along with the lineage-defining mutation at position 182,950 (Supplementary Fig. 3, Table 1). Genome sequences from six of the nine phylogenetic clusters that were predominantly from NYC and/or North America had at least one cluster-specific nonsynonymous mutation (Table 1). The presence of these cluster-specific mutations may qualify the sequences in those clusters to be designated as MPXV B.1 sublineages.
No deletions were found in surface glycoprotein B21R13,19 and the TNF receptor crmB in the genomes sequenced by the NYC PHL. Other large-scale deletions were also rare in NYC sequences. Five occurrences of deletions were observed in 11 sequences collected from 10 individuals (Table 2). Sequences with a deletion from position 11,326:12,237/8 in the OPG023 gene were grouped into two distinct clades in the NYC phylogeny, likely two independent mutational events due to convergent evolution (Supplementary Fig. 4).
The MPXV genome evolution rate was estimated to be 4.28e-5 subs/site/year using MPXV sequences collected from 2021 to 2023 (Supplementary Fig. 5). Analysis of the amino acid changes in the phylogeny indicated that certain genes (OPG109, OPG110, OPG048) were more likely to have mutations (Supplementary Fig. 6).
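To put the estimated rate in more tangible terms, the short sketch below converts a per-site rate into an expected number of substitutions per genome per year. The rate is the value reported above; the genome length of 197,000 bp is an approximate figure assumed here for illustration and is not taken from this study, and the function name is hypothetical.

```python
# Estimated evolution rate reported in the text (substitutions/site/year).
RATE_SUBS_PER_SITE_PER_YEAR = 4.28e-5

# ASSUMPTION: approximate MPXV genome length in base pairs, used only for
# illustration; the exact length of the reference used in the study may differ.
ASSUMED_GENOME_LENGTH_BP = 197_000


def expected_substitutions(rate: float, genome_length_bp: int, years: float = 1.0) -> float:
    """Expected substitutions per genome over `years`, assuming a constant rate."""
    return rate * genome_length_bp * years


per_year = expected_substitutions(RATE_SUBS_PER_SITE_PER_YEAR, ASSUMED_GENOME_LENGTH_BP)
print(f"{per_year:.1f} substitutions per genome per year")
```

Under these assumptions the rate corresponds to roughly eight substitutions per genome per year; a different assumed genome length scales the result proportionally.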
APOBEC3 mutations in MPXV genomes
We assessed the prevalence of putative APOBEC3 signatures by comparing the 2022 outbreak with previous years. The MPXV dataset was divided into three groups: (i) pre-2022 outbreak sequences retrieved from NCBI (1985-2021, n = 81); (ii) 2022 global outbreak sequences, excluding NYC, retrieved from NCBI (n = 2,877); (iii) 2022 NYC outbreak sequences sequenced by the NYC PHL (n = 1138). Compared to pre-outbreak sequences, both global and NYC MPXV sequences had significantly higher numbers of APOBEC3 signatures (p.adj = 9.18e−24 and 1.99e−24, respectively; Fig. 3). MPXV sequences in NYC also had slightly higher numbers of APOBEC3 signatures when compared to the global sequences during the 2022 outbreak (p.adj = 1.61e−114; Fig. 3).
In the box plots, data points represent the total number of mutations identified with (red) and without (blue) APOBEC3 signatures in individual MPXV genome sequences. The values are shown on the y-axis for: (i) sequences retrieved from NCBI for MPXV genomes prior to the 2022 outbreak (n = 81); (ii) 2022 outbreak global sequences excluding NYC (n = 2877); and (iii) 2022 NYC outbreak MPXV genomes sequenced by NYC PHL (n = 1138). The center line of each box denotes the median number of mutations per MPXV genome sequence. The whiskers extend to the minimum and maximum number of mutations per sequence within 1.5 times the interquartile range from the first and third quartiles. Mutation counts outside this range are shown as individual outlier points, representing genomes with unusually high or low numbers of mutations. The number of mutations with APOBEC3 signatures was significantly higher in the 2022 outbreak sequences (both global and NYC MPXV sequences) compared to the pre-outbreak sequences, as determined by a two-sample t-test (Pre-Outbreak Global vs Outbreak Global: p.adj = 9.18e−24; Pre-Outbreak Global vs Outbreak NYC: p.adj = 1.99e−24; Outbreak Global vs Outbreak NYC: p.adj = 1.61e−114). In the figure, “****” indicates a statistically significant p-value. The source data files for Fig. 3 are available in the following GitHub repository: https://github.com/sakther-NYCDOHMH/nyc_mpox_genomic_epidemiology/.
We screened 1138 NYC MPXV genomes and found 455 (42%), out of the total of 1076 mutations, had the “GA > AA” APOBEC3 signature. Four hundred twenty-nine mutations (40%) had the “TC > TT” APOBEC3 signature, and 192 (18%) did not have any APOBEC signatures (Supplementary Data 3). Amongst the 884 APOBEC3 mutations, 6% APOBEC3 signatures were MPXV B lineage-defining, 44% occurred in two or more genomes and the remaining APOBEC3 signatures (51%) occurred only once in a genome (Supplementary Table 1 & Supplementary Data 3).
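The two dinucleotide contexts named above ("GA > AA" and "TC > TT") can be checked mechanically for any single-nucleotide substitution. The following is a minimal sketch of such a check, not the pipeline used in this study; the function names and the toy reference string are hypothetical, and positions are 0-based.

```python
from typing import Dict, Iterable, Optional, Tuple


def apobec3_signature(ref: str, pos: int, alt: str) -> Optional[str]:
    """Classify a substitution at 0-based `pos` as 'TC>TT', 'GA>AA', or None."""
    ref = ref.upper()
    alt = alt.upper()
    base = ref[pos]
    # C -> T where the base immediately 5' of the mutated C is a T.
    if base == "C" and alt == "T" and pos > 0 and ref[pos - 1] == "T":
        return "TC>TT"
    # G -> A where the base immediately 3' of the mutated G is an A
    # (the same event observed on the opposite strand).
    if base == "G" and alt == "A" and pos + 1 < len(ref) and ref[pos + 1] == "A":
        return "GA>AA"
    return None


def tally_signatures(ref: str, substitutions: Iterable[Tuple[int, str]]) -> Dict[str, int]:
    """Count substitutions per signature class; unclassified ones go to 'none'."""
    counts = {"GA>AA": 0, "TC>TT": 0, "none": 0}
    for pos, alt in substitutions:
        counts[apobec3_signature(ref, pos, alt) or "none"] += 1
    return counts


# Toy reference and substitutions, invented purely for illustration.
toy_ref = "ATCGGATTCA"
toy_subs = [(2, "T"), (4, "A"), (3, "A"), (8, "T"), (0, "G")]
print(tally_signatures(toy_ref, toy_subs))
```

Applied to a real list of substitutions, dividing each count by the total gives proportions of the kind reported above (for example, 455/1076 for "GA > AA").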
F13L mutations in NYC MPXV genome sequences
Mutations in the MPXV homolog of the F13L gene (which encodes the VP37 protein that forms the membrane surrounding the mature virus) have been previously reported to be associated with tecovirimat (TPOXX) resistance. The NYC MPXV genomes showed a low frequency of F13L mutations (seven different mutations), four of which were previously confirmed to be associated with TPOXX resistance33 (Table 3). MPXV sequences with a TPOXX-associated mutation were from four individuals who had advanced HIV disease and were immunocompromised. These patients had been previously treated with TPOXX and were regarded as severe mpox cases. However, the risk of TPOXX resistance after treatment in the general population could not be evaluated due to a lack of information available on TPOXX treatment and outcomes in NYC.
Infections with genetically distinct MPXV strains
Nearly half of individuals in this study (n = 360) had two or more MPXV genomes collected from two or more lesions (66% of genomes). When analyzing individuals for intra-host variation, we observed individuals that had multiple sequences with a high degree of variation between sequences (>10 SNPs). Amongst the individuals that had more than one lesion sequenced, 94% of individuals (337/360) had the same lineage assignment, 6% of individuals (20/360) had lineage assignments that were sublineages of the other sequence(s), and only
Of the 360 individuals with MPXV genomes from multiple lesions, we identified 15 individuals whose genomes were polyphyletic in the phylogeny (i.e., at least two distinct viral genomes mapped to disparate clades diverged at the root of the phylogeny; see Methods for details) (Fig. 4). Eight additional sets of genomes were found to be distantly related but did not pass through the root (Supplementary Fig. 7). The remaining 337 individuals had sequences that were monophyletic, direct ancestors (i.e., closely related, and consistent with potential intra-host variation), or closely related (i.e., genomes did not have enough variation to be considered due to multiple infections). When considering only individual genomes where the branches separating the sequences pass through the root of the tree, we estimated 4.2% (15/360) of mpox cases had multiple infections with distinct MPXV strains in NYC. Expanding the definition to include the eight individuals with a node distance of at least four, the estimate for infection with multiple MPXV strains was 6.4% of mpox cases. These cases occurred in July 2022, the height of the outbreak in NYC (Supplementary Fig. 8).
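The two criteria used above (sequences whose connecting path passes through the root, and sequences separated by a minimum node distance) can be illustrated on a small rooted tree. This is a sketch under stated assumptions rather than the procedure from the Methods: the tree is stored as a hypothetical child-to-parent dictionary, and "node distance" is assumed here to mean the number of internal nodes on the path between two tips.

```python
from typing import Dict, List, Optional

Tree = Dict[str, Optional[str]]  # child -> parent; the root maps to None


def path_to_root(parent: Tree, node: str) -> List[str]:
    """Return the list of nodes from `node` up to and including the root."""
    path = [node]
    while parent[node] is not None:
        node = parent[node]
        path.append(node)
    return path


def mrca(parent: Tree, a: str, b: str) -> str:
    """Most recent common ancestor of tips `a` and `b`."""
    ancestors_a = set(path_to_root(parent, a))
    for node in path_to_root(parent, b):
        if node in ancestors_a:
            return node
    raise ValueError("tips do not share a root")


def passes_through_root(parent: Tree, a: str, b: str) -> bool:
    """True if the path connecting the two tips goes through the root."""
    return parent[mrca(parent, a, b)] is None


def node_distance(parent: Tree, a: str, b: str) -> int:
    """ASSUMED definition: number of internal nodes on the path from a to b."""
    m = mrca(parent, a, b)
    return path_to_root(parent, a).index(m) + path_to_root(parent, b).index(m) - 1


# Toy tree invented for illustration: root -> (n1 -> (t1, n3 -> (t2, t3)), n2 -> t4)
toy = {"root": None, "n1": "root", "n2": "root", "t1": "n1",
       "n3": "n1", "t2": "n3", "t3": "n3", "t4": "n2"}
print(passes_through_root(toy, "t2", "t4"), node_distance(toy, "t2", "t4"))
print(passes_through_root(toy, "t2", "t3"), node_distance(toy, "t2", "t3"))
```

With per-individual flags of this kind, the reported proportions follow directly: 15/360 is about 4.2%, and (15 + 8)/360 is about 6.4%.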
This NYC MPXV tree was inferred from 1114 NYC sequences by masking the regions with low depth of coverage (Supplementary Table 2). In the figure, PID stands for an individual. Links in the figure were used to connect the sequences from the same individual that diverged at root using their phylogenetic placement. For Individual PID 187, one out of four sequences collected was found in a separate clade. The remaining three were found within the same clade, but one was distantly related by at least 4 nodes. The source data files for Fig. 4 are available in the following GitHub repository: https://github.com/sakther-NYCDOHMH/nyc_mpox_genomic_epidemiology/.
Intra-host variation in NYC mpox outbreak specimens
The intra-host variation of NYC MPXV sequences was evaluated in 344 individuals with sequenced MPXV genomes from multiple specimens (i.e., more than one lesion, the majority sampled on the same day) whose sequence differences were not attributed to multiple infections with distinct MPXV strains. We found that MPXV sequences from individuals with multiple lesions were identical in 172 individuals. Of the remaining 172 individuals, 119 individuals had multi-lesion sequences differing by only 1–2 mutations and 53 individuals had multi-lesion sequences differing by three or more mutations. Most intra-host variation had APOBEC3 signatures (Fig. 5A). All the intra-host variation in 66% (114/172) of the individuals had APOBEC3 signatures, and at least 50% of the intra-host variation showed APOBEC3 signatures in 22% (38/172) of the individuals. Only 9% (16/172) of the individuals had intra-host variation that did not have APOBEC3 signatures (Fig. 5B).
A The frequency distribution of the maximum total SNP distance with (red) and without (blue) APOBEC3 signatures between sequences from the same individual. Out of 172 individuals, sequences from 119 (69%) individuals were closely related with 1-2 total SNPs between sequences, and 53 individuals had sequences that were divergent with >=3 SNPs. The majority of these mutational differences were due to APOBEC3. B The proportion of intra-host SNP variation due to APOBEC3. For 66% (114/172) of individuals, the observed mutational difference between sequences was completely due to APOBEC3. For 22% (38/172) of individuals, APOBEC3 contributed to 50% or more (but not 100%) of the mutational differences. Only 9% (16/172) of individuals had observed mutational differences that could not be attributed to APOBEC3. C Intra-host phylogeny. Sequences from the 17 individuals that had the greatest phylogenetic distance between them were annotated with colored tips and plotted on the NYC-only phylogeny. Two individuals’ sequences (J2665014/light blue and NG930408/light green) resulted in non-monophyletic placements of sequences. The source data files for Fig. 5 are available in the following GitHub repository: https://github.com/sakther-NYCDOHMH/nyc_mpox_genomic_epidemiology/.
When these samples were evaluated on the NYC phylogeny, 21 individuals had samples with a genetic distance greater than 0.000021 substitutions/site. Sequences from the 17 individuals with the greatest within-host genetic distance are shown in Fig. 5C, and the remaining four are shown in Supplementary Fig. 9. Sequences from three individuals (two in Fig. 5C and one in Supplementary Fig. 9) were also found in different clades on the NYC phylogeny, illustrating that MPXV evolution within the same individual can be divergent enough to obscure the phylogenetic relationships between sequences. Although a rare occurrence (0.9% of individuals), this observation contrasts with Rueca et al.16 wherein longitudinally sampled individual sequences clustered together on a phylogenetic tree.
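The per-individual SNP comparisons in this section (identical, 1-2 SNPs, or three or more SNPs between lesions) can be sketched as a pairwise count over aligned sequences. The code below is an illustrative assumption about how such a count could be made, not the study's pipeline: it presumes the sequences are already aligned to equal length, skips positions where either sequence has an ambiguous or gap character, and uses hypothetical function names and toy sequences.

```python
from itertools import combinations
from typing import Sequence

VALID_BASES = set("ACGT")


def snp_distance(seq_a: str, seq_b: str) -> int:
    """Count mismatches between two aligned sequences, ignoring non-ACGT sites."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(
        1
        for a, b in zip(seq_a.upper(), seq_b.upper())
        if a in VALID_BASES and b in VALID_BASES and a != b
    )


def max_intrahost_distance(sequences: Sequence[str]) -> int:
    """Largest pairwise SNP distance among one individual's sequences."""
    return max((snp_distance(a, b) for a, b in combinations(sequences, 2)), default=0)


def distance_category(distance: int) -> str:
    """Bin a maximum SNP distance into the three groups used in the text."""
    if distance == 0:
        return "identical"
    if distance <= 2:
        return "1-2 SNPs"
    return ">=3 SNPs"


# Toy aligned sequences from one hypothetical individual (not real data).
lesions = ["AAAA", "AAAT", "TTAT"]
d = max_intrahost_distance(lesions)
print(d, distance_category(d))
```

Taking the maximum over all pairs mirrors the "maximum total SNP distance" described for Fig. 5A when an individual has more than two sequenced lesions.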
Epidemiologically linked mpox cases in NYC
Contact tracing identified 43 mpox cases who were part of epidemiologically linked pairs where both patients were tested at the NYC PHL; these case-pairs were then further divided into 17 groups (Table 4, Supplementary Data 4). MPXV sequences from groups #1, #3, #5, #8, and #15 were genetically related (Fig. 6, Table 4). Two MPXV sequences were genetically related in group #2, and five MPXV sequences were genetically related in group #6. The genetic relationship for the rest of the sequences from these two groups could not be resolved due to their placement in the basal part of the phylogenetic tree (Fig. 6, Table 4). MPXV sequences from groups #7, #9, #11, #16, and #17 were not monophyletic but shared a common ancestor in the same clade in the phylogeny and were therefore considered potentially genetically related (Fig. 6, Table 4). MPXV sequences from groups #10, #12, #13 and #14 were placed in different clades in the phylogeny and were not genetically related. Intra-host sequence pairs from groups #10, #12, and #13 were categorized as “Distantly Related” based on the multiple infection analysis performed in the previous section (Fig. 6, Table 4). For example, group #12 included sequences from two lesions belonging to one individual. One of these intra-host sequences was assigned to the B.1 lineage (Accession #OQ469282) while the other sequence was assigned to B.1.7 (Accession #OQ469283) (Table 4, Supplementary Data 4). This observation suggests that sequencing only one lesion can result in inaccurate reconstruction of genomic transmission networks. Overall, 13% of sequences from patients with epidemiological links were not genetically linked based on the phylogeny.
The symbols on the branches were colored by putative groups of epidemiologically linked cases. The outer ring (Sequence Type) in the figure was colored by the genome sequences that belonged to the same or different individual of epidemiologically linked cases in a particular group. The middle ring (Phylogenomic Category) is colored by the phylogenetic placement of the genome sequences of epidemiologically linked cases within a group. Epidemiologically linked cases in groups with sequences placed on the same or sister branches were designated as “monophyletic”. Epidemiologically linked cases in groups for which genome sequences were in the same clade in the phylogeny and shared a common ancestor with other NYC sequences in this clade were designated as “Shared Ancestor”. Epidemiologically linked cases in groups for which most of the genome sequences were placed in the basal part of the phylogenetic tree were designated as “inconclusive”. Epidemiologically linked cases in groups for which genome sequences were in different clades in the phylogeny were designated as “Not-linked”. The inner ring (Intra-host category) was colored by the categories that were assigned using the phylogenetic placement of the genome sequences from the same individual (See multiple infection result section: Fig. 4 and Supplementary Fig. 7). The source data files for Fig. 6 are available in the following GitHub repository: https://github.com/sakther-NYCDOHMH/nyc_mpox_genomic_epidemiology/.





