Genome graphs reveal the importance of structural variation in Mycobacterium tuberculosis evolution and drug resistance

Categories: Disease & Virus

November 29, 2025

Construction of a M. tuberculosis pangenome reference graph

We obtained 1363 long-read datasets (ONT or PB) from public repositories and performed quality control for read length and decontamination of human DNA before assembling genomes with Flye¹⁸. After also incorporating 176 long-read assemblies from NCBI to enlarge the dataset, all assemblies were evaluated with QUAST¹⁹, with 834 remaining after filtering (see Methods; Supplementary Fig. 1). To ensure phylogenetic coverage, we sequenced 25 isolates from underrepresented lineages with ONT, achieving representation of lineages 1–9, including phenotypically drug-resistant isolates without known resistance mutations (Supplementary Table 1). Sub-lineage 2.2.1 (Beijing) was the most frequent of the dataset (23.4%), while lineage 4 and its sub-lineages dominated overall (48%; Supplementary Fig. 2). The final cohort of 859 assemblies, representing both genetic and geographical diversity (Supplementary Data 1 and Supplementary Fig. 3), had a mean N50 of 2.13 Mb (range: 11.2 kb–4.45 Mb) and mean coverage depth of 125× (range: 7 × -859 × ; full metrics in Supplementary Data 2).

To construct the Mtb-PRG, we used minigraph²⁰, which efficiently incorporates sequences that are larger than 49 bp and missing from the reference genome while preserving reference coordinates. Starting with the H37Rv reference genome (accession: NC_000962.3) as the backbone, additional genomes were added to the Mtb-PRG in lexicographic order by sample name. Newly added sequences formed nodes, which represent distinct segments of genomic sequences that are connected by paths, resulting in 9350 nodes in total. Nodes connected to each other but not part of the reference formed 848 bubbles, with the largest spanning 97,162 bp (H37Rv coordinates: 3,386,328–3,483,490; Supplementary Fig. 4).

Genotyping SVs using the Mtb-PRG and miniwalk is more precise than using a traditional short-read SV caller

PRG-involving approaches have been shown to call SVs more accurately from short-read sequencing data compared to traditional single-reference genome approaches¹⁶. Since most Mtb genomic data are derived from Illumina short-read sequencing, accurately identifying SVs using such data is valuable. However, no existing method genotypes SVs directly from graphs generated using minigraph. To address this, we developed miniwalk, a tool that identifies SVs by comparing the graph traversal paths of an assembly or long reads, to a reference genome mapped to a minigraph PRG (Supplementary Methods 1). Miniwalk outputs insertions (INSs), deletions (DELs), duplications (DUPs), and inversions (INVs).

We benchmarked miniwalk using 16 samples, each of which had a high-quality, polished long-read assembly to act as truth. To avoid bias, the 16 samples were removed from the PRG for the benchmarking²¹. Truth SVs were called using SVIM-ASM²² on assemblies mapped to the H37Rv reference genome, while miniwalk called/queried SVs on assemblies mapped to the Mtb-PRG. Precision and recall, calculated using the miniwalk bench module (Supplementary Methods 2), averaged 0.85 and 0.82, respectively, showing comparable genotyping performance to SVIM-ASM (Fig. 1a). Nevertheless, precision varied across SV types, with DELs of size 500–1000 bp having the lowest precision (0.5; Fig. 1b), though fewer SVs were found in that size range. Further analysis revealed that this drop was largely driven by a single complex SV present in 11 assemblies, consistently detected by miniwalk but missed by SVIM-ASM (Supplementary Fig. 5). We also constructed an additional PRG excluding assemblies with N50 <90,000 bp and coverage <25 × to determine whether a more stringent approach should have been taken. The average precision and recall using the more stringent PRG was 0.84 and 0.83, respectively, indicating that our initial PRG had better precision. To test whether assembly ordering introduced bias, we rebuilt the Mtb-PRG with assemblies ordered by lineage rather than lexicographic filename and re-evaluated SV genotyping performance. Precision and recall did not differ significantly (two-sided t-tests; precision p = 0.96, recall p = 0.29), indicating that graph construction order had no measurable effect on genotyping accuracy (Supplementary Fig. 6). To complement real-data benchmarking, we simulated an assembly by spiking real SVs into the H37Rv reference. Miniwalk achieved 0.92 precision and 0.82 recall, outperforming results on real data (0.78/0.81), highlighting the reduced complexity of simulations.

**Fig. 1: Benchmark of SV genotyping using the *Mtb*-PRG with miniwalk against the linear genome with SVIM-ASM and manta.**

To assess short-read data performance, Illumina reads used for polishing the 16 assemblies were analyzed using manta²³, a leading short-read SV caller²⁴. Reads were also assembled with shovill²⁵, mapped to the Mtb-PRG, and SVs genotyped with miniwalk. To assess precision and recall of SV genotyping using these two approaches (linear H37Rv/manta and Mtb-PRG/miniwalk), we compared both to SVIM-ASM truth SVs as in Fig. 1a. We found miniwalk demonstrated significantly higher precision (0.7 vs. 0.46 for manta; two-sided t-test, p = 1e-6) at the cost of slightly lower recall (0.34 vs. 0.42 for manta; two-sided t-test, p = 0.007; Fig. 1c). Breaking down by SV type, manta achieved a precision and recall of 0.42 and 0.63, respectively, for DELs and 0.42 and 0.36, respectively, for INSs (Fig. 1d). Meanwhile, miniwalk achieved a precision and recall of 0.64 and 0.54, respectively, for DELs, and 0.86 and 0.19, respectively, for INSs. DELs sized >1000 bp had precision exceeding 0.8 (Fig. 1e). Considering SVs genotyped with miniwalk from the long-read assemblies mapped to the Mtb-PRG as the truth (SVs from Fig. 1a, b; miniwalk-long-read truth SV), precision and recall decreased for Illumina data mapped to the H37Rv linear reference (Fig. 1f, g) but improved slightly for short-read assemblies mapped to the Mtb-PRG (Fig. 1f, h). These results highlight miniwalk’s precision advantage when using a PRG compared to a linear genome-based method. To explore whether recall could be improved, we created another Mtb-PRG, retaining SVs ≥20bp, and evaluated it against the short-read assemblies. We observed, on average, a 2% increase in recall at the cost of a 2% drop in precision (Supplementary Fig. 7).

To further validate the use of miniwalk, we used partial credit from vcfdist²⁶ to evaluate the similarity between our variant calls and the SVIM-ASM truth set. Vcfdist assigns each pair of variants a partial credit score ranging from 0 to 1, based on edit distance (sequence content and position), where 1 means identical variants and 0 means no overlap. We applied growing partial credit (ct) thresholds starting from 0.001. Lower ct thresholds allow called SVs with partial similarity to the truth to be considered true positives, while higher ct thresholds impose stricter criteria, requiring close matches in SV type, position, and sequence content. This evaluation included the previous long- and short-read assemblies mapped to the Mtb-PRG as well as the variants called from manta. For miniwalk-long-read SVs, precision and recall showed a gradual decline as the threshold increased, reflecting differences in sequence content (e.g., long-read sequencing errors, SNPs) and positions between called and truth SVs. Nonetheless, precision (0.75) and recall (0.74) remained moderately high for long-read assemblies mapped to the Mtb-PRG at a ct threshold of 0.5, demonstrating the accuracy of SVs called using our graph-based approach. For short-read data, miniwalk outperformed manta across all ct thresholds up to 0.985 for both precision and recall (Fig. 1i, j).

Further analysis revealed that SVs in complex regions were the ones driving the decrease in precision and recall in short-read data (Supplementary Note 1).

These results suggest using a genome graph as a superior alternative to a linear reference genome for genotyping SVs. With this in mind, we proceeded to learn about SVs in Mtb using the Mtb-PRG and miniwalk.

Landscape of structural variation across the M. tuberculosis genome and phylogeny

We sought to improve the characterization of SVs across diverse Mtb lineages using high-quality long-read data. Assemblies used to construct the Mtb-PRG were mapped back to the PRG. The 38 most fragmented assemblies were excluded from this analysis as they could not be mapped using minigraph’s default parameters (see Methods). SVs in the remaining assemblies (821) were genotyped against the H37Rv reference genome using miniwalk. To further define the diversity of SVs in the Mtb genome, DELs and DUPs in close proximity were grouped as copy number variants (CNVs). SVs within ±25bp proximity, ±50 bp size, and of the same type were considered identical across isolates.

A total of 3077 unique SVs were identified relative to H37Rv, broken down into 1127 INSs, 1694 DELs, 59 DUPs, 58 INVs, and 139 CNVs. The largest SV was a 66,582bp region (H37Rv coordinates: 2,901,855–2,968,437) that was inverted and translocated to a novel genomic position (H37Rv coordinates: 415,370) in lineage 9 (Supplementary Fig. 8). The translocation implicated 70 genes and truncated Rv2577 and Rv0345. This rearrangement highlights the complexity of SV patterns observed in Mtb, though most isolates presented small DELs, CNVs and numerous IS6110 elements (~1358bp; Fig. 2a). We found various INS-DEL pairs of similar size (≥90%) and high identity (≥90%) across each genome (range of 0–3 per genome), likely stemming from IS6110 transposition. A principal component analysis (PCA) performed with the SV genotypes revealed that SVs contribute to population structure, especially along PC2 which separates the ancient (L1, L5–9) from the modern (L2–4) lineages (Fig. 2b). Notably, the “TbD1” SV at position 1,761,789 explains 2% of the variability in PC2 (Supplementary Fig. 9). The allele frequency spectrum of SVs revealed that over half (1617/3077) were singletons (Supplementary Fig. 10a). Importantly, singleton SVs were not enriched in fragmented assemblies, suggesting they represent genuine rare variation rather than artifacts (Supplementary Fig. 10b).

**Fig. 2: Structural variation genotyped from 821 long-read *Mtb* assemblies covering L1–L9.**

To understand the acquisition of SVs across lineages over time, SVs were genotyped against an ancestrally reconstructed sequence (MTBC0²⁷) and mapped to a phylogenetic tree from the genotyped isolates (Fig. 2c). This showed that all isolates have acquired a similar amount of SVs across the phylogeny, the least being L4 with an average of 67 SVs per isolate and the most being L8 with 82 SVs (though 1 isolate was used). Interestingly, Beijing lineage isolates segregated into two branches, one with an average of 86 SVs per isolate and the other with an average of 68 SVs per isolate, indicating distinct evolutionary trajectories of SV accumulation within this sub-lineage. When analyzing L4 sub-lineages, the globally spread generalists L4.1.2 and L4.3²⁸ had an average of 60 and 63 SVs per isolate, respectively, whereas the specialists L4.1.3 and L4.6.2²⁸ had an average of 74. From these results, generalist lineages have significantly less SVs than specialist lineages (two-sided Wilcoxon test, p value = 0.03). SVs are generally deleterious in nature, so lineages with more SVs could have suffered more deleterious mutations than those with less SVs, hindering their geographical spread²⁸. Alternatively, the higher number of SVs in specialist sub-lineages could reflect the process of host-population adaptation via reductive genomic evolution²⁹.

To identify SV hotspots, we scanned the genome using 50,000 bp windows with 5000 bp steps and calculated fold-enrichment relative to the average SV density. Results indicated an uneven SV distribution, with a prominent hotspot between 3,200,000–3,600,000 bp enriched in PE/PPE genes (Fig. 2d). We found that 75% of SV breakpoints fall in genes. This is lower than expected by chance (mean random breakpoints = 92%, absolute z-score = 6.42, p value = 1e-10), likely due to the deleterious effect of SVs (Supplementary Fig. 11). Moreover, SV breakpoints were significantly associated with GC-rich sequences compared to randomly sampled genome regions (two-sided Wilcoxon test, p = 5e-8; Supplementary Fig. 12). PE/PPE genes are known for their GC-rich repetitive regions, which serve as an SV hotspot. These findings highlight that SVs in Mtb are not randomly distributed but are enriched in specific genomic features and regions.

Pangenomic insights into structural variation in the ESX loci of M. tuberculosis

The Mtb genome encodes five type-7 secretion systems, which are known to play an important role in pathogenesis and virulence, the most studied being ESX-1^30,31. ESX-5 is important for full virulence³² and has been involved with various functions, such as the secretion of PE/PPE proteins, the activation of the host cell inflammasome response or macrophage death induction³¹. Its paralogs’ (ESX-5a, ESX-5b, and ESX-5c) functions, however, are less understood. To determine the presence of SVs in the ESX-5 loci and which are more conserved, we analyzed the haplotypes from the 821 long-read assemblies mapped to the Mtb-PRG.

All ESX-5 loci had one SV bubble spanning their locus (Fig. 3a). Most isolates lacked SVs across these loci, with most non-reference haplotypes being present solely in 1–6 isolates. Nevertheless, the most frequent non-reference haplotype was a ppe25–ppe27 deletion, present in 9% of all isolates (Fig. 3b). The deletion was found in all sub-lineage 4.4 isolates and sporadically in other lineages (L4.1, L4.3, L4.5, L1, L3, and L8). The deletion can most likely be attributed to non-allelic homologous recombination, as the 3’ sequences of ppe25 and ppe27 are highly homologous (Supplementary Fig. 13), that is, the deletion spans the ppe25-ppe27 locus and removes ppe25, pe18, and ppe26 while leaving ppe27 intact. The deletion was found to provide higher persistence in mouse models and showed convergent evolution in M. canettii³³, which could explain its homoplasic effect across the Mtb phylogeny (Supplementary Fig. 14). A previous study documenting this deletion hypothesized that the loss of immunomodulatory PE/PPE genes could help the bacteria escape the immune system³⁴.

**Fig. 3: Visualization of the structural variation in the ESX-5 loci through the pangenome reference graph.**

We extended the analysis to assess the impact of SVs across all of the ESX genes. To understand the probability of observing SVs in a gene, we calculated the trimmed (10%; removal of outlier genes with high SV counts) mean number of SVs in essential (0) and non-essential (0.76) genes^35,36, with 33% of genes being overlapped by an SV. Genes prefixed with the letters “ecc” and “mycP” in the ESX loci encode proteins which form the secretion system responsible for translocating proteins involved in virulence³⁷. Consistent with their essential role in virulence³⁸, we found 0 isolates from our collection of 821 to have SVs impacting structural components of the ESX-3 and ESX-5 loci, and only one with an SV impacting ESX-1 genes eccE1 and mycP1. In contrast, the ESX-2 and ESX-4 loci, which are non-essential and have yet to be ascribed a function³⁸, had between 1 and 16 isolates with SVs overlapping their gene members³⁸ (Supplementary Fig. 15a). Among all genes, espI from ESX-1 had the highest number of SVs, with 125 isolates exhibiting overlaps (Supplementary Fig. 15b). EspI acts as a negative regulator of the ESX-1 secretion system; thus, the high frequency of SVs in this gene may suggest positive selection for maintaining constant expression of the system³⁹. Genes within the ESX loci prefixed with the letters “esx” encode proteins that are secreted through the ESX apparatus and thought to be implicated in host-pathogen interaction. Across our collection of 821 isolates, no esx genes in the ESX-1 and ESX-5 loci were found to have any SVs (Supplementary Fig. 15c), however esx genes in the ESX-2 locus displayed SVs in three isolates. Of the ESX-5 paralogs, ESX-5a was the most strongly conserved (1 isolate with an SV falling in a gene), suggesting deletions of this locus are less well tolerated than deletions in ESX-5b and ESX-5c (Fig. 3b and Supplementary Fig. 15c, d). Interestingly, one instance of a full ESX-5c deletion was found in the geographically restricted African lineages (L6 and L9).

A genome and phylogeny-wide screen of SV positive selection in M. tuberculosis

We performed a systematic screen for genes which evolved SVs repeatedly across the phylogeny, following the rationale that this may reveal genes targeted by positive selection⁴⁰. The criteria for our screen stipulated that the gene had to be affected by an SV in at least ten isolates, belonging to more than one Mtb lineage. We further excluded genes in or ±3000 bp of low-mappability regions¹⁰, as those regions are more prone to structural rearrangements, which could be falsely perceived as positive selection. This analysis resulted in 59 homoplasic gene candidates. The gene with the strongest homoplasic effect was Rv3177, with SVs in 158 isolates from L1.2.1, L1.1.1, L2.2.1 (82% of all isolates), L2.2.2, L3.1.2, L3.1.3, various L4 sub-lineages and L5 in 15 different breakpoints across the gene (Fig. 4). Rv3177 is a probable peroxidase involved in detoxification of reactive oxygen species (ROS). In a previous study, variants in Rv3177 were associated with isolates harboring the isoniazid resistance-conferring katG-S315T mutation, suggesting a possible compensatory role in drug resistance⁴¹. Another gene that acquired SVs in different branches of the phylogeny was ctpG with six different DEL breakpoints across the gene (Supplementary Fig. 16) in L3, L4.3.4 and L5 (Fig. 4). CtpG is a zinc (Zn) exporter whose deletion has been associated with Zn accumulation⁴². During host infection, Mtb goes from Zn-rich to Zn-limited environments⁴³. A non-functional Zn exporter could possibly provide bacteria in Zn-limited conditions a selective advantage. Other interesting genes were mmpS1, with SVs in various L4 sub-lineages with a prominence in L4.1 as well as a few isolates in L2 and L3, and glnA3, with SVs in all L2.2.2 isolates and sporadically in L2.2.1, L3, and L4.1 (Fig. 4).

**Fig. 4: Homoplasic effect of SVs overlapping genes across the phylogeny.**

We sought to validate these results within a larger cohort. To do so, we downloaded a list of 44,709 high-quality Illumina sequences grouped in a previous study¹⁷. After assembly, we ended with 41,134 SV VCF files. We found a few isolates with SVs in Rv3177 and mmpS1, as Rv3177 is in a large bubble, likely hard to genotype using short-read data, and the mmpS1 SVs were all INSs, variants with low genotyping recall using short-read data. We were able to confirm the same distribution of SVs for ctpG and glnA3 (Supplementary Fig. 17). A similar distribution was observed for SVs in ethA, a gene whose deletion is associated with ethionamide (ETO) resistance, with a main cluster located in the Beijing lineage. For pncA, a pyrazinamide (PZA) resistance-associated gene, we found less isolates with SVs and the distribution was not as clustered, though mainly isolates in L2 had SVs in that gene. These results showcase the added benefit of using a pangenome to identify key genes implicated in Mtb evolution, and highlight four candidates for prioritization in future research.

Characterization of a M. tuberculosis lineage 1.2.1-specific copper exporter deletion

We expanded the previous screen to detect genes with multiple lineage-specific SVs. This analysis revealed diverse deletions of the ctpV gene, unique to the L1.2.1 sub-lineage (excluding its deepest branch split; Fig. 5a and Supplementary Fig. 18). These deletions increased in size further from the branch split (Fig. 5b) and overlapped the gene’s start codon, indicating a loss-of-function mutation (Fig. 5c). SNPs could not be reliably called for all the isolates as long-read data were used, nevertheless, we found no loss-of-function variants in this gene in other L1 sub-lineages. CtpV is a putative P-type ATPase copper (Cu) exporter protein, with the main Cu binding site located in the first 66 amino acids. A multiple sequence alignment of the protein sequence among diverse mycobacteria showed >80% pairwise identity across the Mycobacterium genus (Supplementary Fig. 19), aside from Mycobacterium leprae, highlighting the importance of this protein not only in Mtb.

**Fig. 5: Discovery of growing *ctpV* deletions across the L1.2.1 phylogeny and a transcriptomics analysis.**

L1.2.1 is predominant in Southeast Asia, especially the Philippines, the fourth country in the world with the highest burden of active tuberculosis¹, accounting for ~65–80% of all lineages in the country^44,45. Moreover, this sub-lineage shows high drug resistance rates, high transmissibility and is the most prevalent L1 sub-lineage globally^46,47. Given these characteristics, we hypothesized that this deletion impacts the expression of copper-associated genes, potentially contributing to the unique adaptive traits of this sub-lineage. To determine this, we compared the transcriptional profile of an L1.2.1 isolate (297bp ctpV confirmed deletion) against L1.1.1 and L1.1.2 isolates under normal growth conditions⁴⁸ (Supplementary Table 2 and Supplementary Fig. 20). This analysis revealed 147 differentially expressed genes (Fig. 5d). Functional categorization revealed that the L1.2.1 isolate showed enrichment of genes involved in virulence, detoxification and adaptation (Fig. 5e). ctpG and Rv0968 were among the differentially expressed genes (Fig. 5f). Rv0968 is part of the cso operon, where ctpV lies. This operon is controlled by CsoR, which inhibits its own expression in normal conditions. However, when Cu is present, it attaches to CsoR and inhibits its function, enabling Rv0968 and ctpV expression^49,50. The higher expression of both Rv0968 and ctpV (nonsignificant; adjusted p value = 0.053) was indicative of Cu accumulation inside these isolates (Supplementary Fig. 21). Moreover, the enrichment of ctpG, another P-type ATPase metal transport protein, has been previously associated with Cu accumulation in the cell, as well as the depletion of ctpB, a P-type ATPase Cu transport protein associated with Cu uptake in Cu-replete medium (nonsignificant; adjusted p value = 0.059)^50,51. Nevertheless, genes associated with oxidative stress (i.e., sigE,furA,nuoB) or protein stress (i.e., rpsR,rplI,rpsQ) were not differentially expressed, indicating that Cu was not present at toxic levels (Supplementary Data 3)⁵⁰. Altogether, these results are indicative of higher Cu accumulation in an L1.2.1 isolate compared to other L1 sub-lineages.

Discovery of DR-associated SVs in 41,134 M. tuberculosis isolates

Using the Mtb-PRG and miniwalk, we aimed to discover DR-associated SVs across various first and second-line drugs for treating TB. The previous list of 41,134 isolates also contained phenotypic information for at least one drug, with first-line drugs being vastly more represented than second-line drugs (Fig. 6a). To search for associations, we used two parallel methods: Firth logistic regression and a supervised machine learning model (xgboost), testing each drug independently and accounting for population structure and TB-Profiler predictions⁵². Analyses were conducted at both the SV and gene levels (SV-gene burden test).

**Fig. 6: Resistance-conferring SVs across 41,134 isolates and 22 drugs.**

Logistic regression revealed 14 non-canonical SVs (post-filtering) significantly associated (adj. p value <0.05) with seven DR phenotypes (Supplementary Data 4). Associations in PE/PPE genes were excluded due to their SV hotspot nature (list of discovered associations: see Supplementary Data 5). Xgboost outputs the estimated importance of each SV and TB-Profiler prediction to the analysis (ranging between 0–100%), indicating the relevance of these factors in classifying DR phenotypes. This analysis highlighted the contribution of the current list of WHO mutations, as identified by TB-Profiler, in predicting DR, with first-line drugs as well as fluoroquinolones and aminoglycosides having high importance scores (importance >50%). However, newer drugs such as bedaquiline, delamanid or clofazimine showed lower importance scores (Fig. 6b). The SV-gene burden test revealed 27 non-canonical genes associated with drug resistance across ten drugs (Supplementary Data 6). To increase confidence in our findings, we performed, for each associated gene and SV, a permutation analysis to estimate the likelihood of our observed p values under a null distribution. This showed that under the null hypothesis, there’s <5% chance of observing these associations (absolute z-score range 1.65–2.57), suggesting statistical significance for all variants.

For isoniazid (INH), we found two SVs and three genes associated with DR (Fig. 6c and Supplementary Figs. 22, 23). The most significant SV was a deletion overlapping Rv3434c (Fig. 6c), a gene previously associated with kanamycin (KAN) and amikacin (AMK) DR⁵³. An overwhelming 99% of isolates with a mutation in this gene presented an isoniazid resistance phenotype, the vast majority from L2. Nevertheless, only five isolates did not present TB-Profiler predictions for INH DR. Another deletion spanning Rv0738-Rv0741 was predominantly found in resistant isolates (97%), with an Rv0740 knock-out having previously been associated with INH resistance⁵⁴. Five isolates with SVs in Rv0740 did not have TB-Profiler predictions for INH DR. These findings underscore the potential of SVs in identifying additional resistance mechanisms, particularly in a widely distributed and commonly used drug like INH, not captured by TB-Profiler. For rifampicin, no non-canonical SVs or genes with SVs were detected in our analysis. For PZA, we found four SVs and six genes. The highest-scoring gene was pncA, with 2.3% of DR isolates having SVs overlapping that gene (Supplementary Fig. 23). An operon with SVs and also highly associated with PZA-DR was Rv1505c-Rv1507c, which plays a role in diacyltrehalose biosynthesis, a lipid that forms part of the cell wall⁵⁵. The only gene with SVs associated with ethambutol (EMB) DR was accE5, whose encoded protein is found in the cell wall⁵⁶ (Supplementary Fig. 23).

For second-line drugs, we found non-canonical SVs and genes with SVs for AMK, bedaquiline (BDQ), capreomycin (CAP), ciprofloxacin (CIP), clofazimine (CFZ), delamanid (DLM), ETO and streptomycin (STR) (Fig. 6c, Supplementary Fig. 23, and Supplementary Note 2). Our results highlight the importance of different types of SVs in DR, such as an insertion associated with STR DR, as well as the role of non-coding regions, such as a deletion upstream Rv2258c associated with CAP DR. The low importance scores of second-line, recently introduced drugs (BDQ, CFZ, and DLM) highlight the need to find more DR variants.

Source link