
CarbaDetector: a machine learning model for detecting carbapenemase-producing Enterobacterales from disk diffusion tests

Strain collection

This study comprised 385 non-duplicate clinical Enterobacterales isolates collected during routine diagnostics at the University Hospital Cologne and Klinikum Oldenburg from 2012 to 2021. Species identity was determined using MALDI-TOF mass spectrometry and confirmed by whole genome sequencing (WGS). Of all isolates, 238 (61.8%) were carbapenemase producers and 147 (38.2%) were carbapenemase-negative. Molecular characterization of all isolates was performed by WGS on the Illumina platform, as previously described21. Briefly, DNA was extracted from pure bacterial cultures using the DNeasy UltraClean Microbial Kit (Qiagen, Hilden, Germany), and whole genome sequencing was performed by Novogene (Beijing, China). Genomic DNA libraries were prepared with the Novogene NGS DNA Library Prep Set with an average insert size of 350 bp, followed by paired-end 150 bp sequencing on an Illumina NovaSeq platform (Illumina, San Diego, CA, USA). The presence or absence of carbapenemase genes was confirmed using ResFinder (v4.7.2)22,23. The results of molecular characterization served as the reference standard for evaluating algorithm performance. Six species accounted for 88.8% of the isolates: K. pneumoniae, E. coli, C. freundii, E. cloacae, P. mirabilis and S. marcescens. The most frequent carbapenemase group was blaOXA-48-like (46.6%). Detailed characteristics of the isolates and datasets are provided in the Supplementary Information.

Susceptibility testing

Susceptibility testing was performed at the Institute of Medical Microbiology and Virology, University Oldenburg according to EUCAST standards20, employing disks containing meropenem, ertapenem, imipenem, meropenem-vaborbactam, ceftazidime-avibactam, ceftolozane-tazobactam, temocillin (Oxoid, Basingstoke, UK), and imipenem-relebactam (Mast Group, Merseyside, UK) on Mueller-Hinton agar (Oxoid, Basingstoke, UK). Inhibition zones were measured manually.

Assessing the performance of the novel CA-SFM algorithm and the EUCAST screening process

To establish a baseline for our model, we assessed the CA-SFM algorithm and the EUCAST screening algorithm for carbapenemase detection by applying both to all three datasets, using WGS results as ground truth. To develop a universal algorithm in R (rpart (4.1.24) and randomForest (4.7.1.2) packages24,25), we built (i) a decision tree and a random forest model using species and the standard-scaled inhibition zone diameters and (ii) an additional random forest model using the scaled differences between inhibition zone diameters. The differences in inhibition zone diameters (rather than only the raw diameters) were included once per antibiotic pair to compensate for laboratory-specific differences between measurements. To increase sensitivity, several classification cutoffs for the random forest model (0.5, 0.6, 0.7, 0.75) were assessed. The final cutoff was 0.6: an isolate was predicted as “negative” only if its predicted probability of being carbapenemase-negative exceeded 60%, as opposed to the default 50%.
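
The feature construction and decision rule described above can be sketched as follows. This is a minimal, stdlib-only Python illustration under stated assumptions, not the authors' R code: the EUCAST-style antibiotic short names, the use of the population standard deviation for scaling, and all function names are hypothetical. It builds one difference per antibiotic pair, standard-scales features with training-set statistics, applies the 0.6 cutoff for a “negative” call, and scores calls against WGS-based reference labels.

```python
from itertools import combinations
from statistics import mean, pstdev

# Hypothetical EUCAST-style short names for the eight disks used in the study.
ANTIBIOTICS = ["MEM", "ETP", "IPM", "MEV", "CZA", "C/T", "TEM", "IMR"]


def pairwise_differences(zones):
    """Return one difference (a - b) per unordered antibiotic pair: 28 features for 8 disks."""
    return {f"{a}-{b}": zones[a] - zones[b] for a, b in combinations(ANTIBIOTICS, 2)}


def fit_scaler(rows):
    """Per-feature mean and population SD from training rows (assumption: ddof=0)."""
    stats = {}
    for key in rows[0]:
        col = [r[key] for r in rows]
        sd = pstdev(col)
        stats[key] = (mean(col), sd if sd > 0 else 1.0)  # guard against constant columns
    return stats


def scale(row, stats):
    """Standard-scale one row with statistics learned on the training data."""
    return {k: (v - stats[k][0]) / stats[k][1] for k, v in row.items()}


def classify(p_negative, cutoff=0.6):
    """Call an isolate negative only if P(negative) exceeds the cutoff; else flag it positive."""
    return "negative" if p_negative > cutoff else "positive"


def sensitivity_specificity(truth, predicted):
    """Compare calls against reference labels (here: WGS) for the positive class."""
    pairs = list(zip(truth, predicted))
    tp = sum(t == "positive" and p == "positive" for t, p in pairs)
    fn = sum(t == "positive" and p == "negative" for t, p in pairs)
    tn = sum(t == "negative" and p == "negative" for t, p in pairs)
    fp = sum(t == "negative" and p == "positive" for t, p in pairs)
    return tp / (tp + fn), tn / (tn + fp)
```

With this rule, an isolate scored at P(negative) = 0.55 is still flagged as positive; raising the cutoff above 0.5 moves borderline isolates into the positive class, trading specificity for sensitivity.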

To estimate model performance, we employed nested cross-validation with 10 outer and 10 inner folds using the nestedcv R package26. Where possible, sampling was stratified by species and presence of carbapenemase genes. Class weights were applied to address the imbalanced distribution between carbapenemase-negative and carbapenemase-positive samples.
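
A rough sketch of this resampling scheme is given below, assuming stratification on the combined (species, carbapenemase status) label and “balanced” class weights of the form n / (n_classes × n_c). Neither detail is confirmed beyond the text above: the nestedcv package's own fold logic and the class-weight formula used in the study may differ, and all function names here are hypothetical.

```python
import random
from collections import Counter, defaultdict


def stratified_folds(species, labels, k=10, seed=1):
    """Assign each sample to one of k folds, dealing each (species, label) stratum
    round-robin so that folds receive similar species and class mixes."""
    rng = random.Random(seed)
    strata = defaultdict(list)
    for i, key in enumerate(zip(species, labels)):
        strata[key].append(i)
    folds = [0] * len(labels)
    offset = 0
    for key in sorted(strata):
        members = strata[key]
        rng.shuffle(members)
        for j, i in enumerate(members):
            folds[i] = (offset + j) % k
        offset += len(members)  # keep dealing where the previous stratum stopped
    return folds


def nested_splits(species, labels, outer_k=10, inner_k=10):
    """Yield (train, test, inner_folds): hyperparameters would be tuned on inner_folds
    (fold ids for the train indices), while test is held out for performance estimation only."""
    outer = stratified_folds(species, labels, outer_k)
    for f in range(outer_k):
        train = [i for i, g in enumerate(outer) if g != f]
        test = [i for i, g in enumerate(outer) if g == f]
        inner = stratified_folds([species[i] for i in train],
                                 [labels[i] for i in train], inner_k)
        yield train, test, inner


def balanced_class_weights(labels):
    """w_c = n / (n_classes * n_c): each class contributes equally in total."""
    counts = Counter(labels)
    n, m = len(labels), len(counts)
    return {c: n / (m * cnt) for c, cnt in counts.items()}
```

Under this assumed formula, the 238/147 split reported above would give weights of roughly 0.81 for carbapenemase-positive and 1.31 for carbapenemase-negative isolates.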

After estimating the performance on our own dataset (Supplementary Data 1), the final model was trained on the whole dataset with hyperparameter tuning via 10-fold cross-validation and applied to the external datasets for additional validation.

Validation of our algorithm using external datasets

To further validate the model's prediction of carbapenemase-producing Enterobacterales (CPE), the resulting model (CarbaDetector) was first used to predict carbapenemase production in a set of 282 Enterobacterales isolates with and without carbapenemase production from Switzerland (University of Zurich) (external dataset A, included in Supplementary Data 2). For this dataset, inhibition zone diameters were determined for all eight antibiotics used in the algorithm.

Second, prediction of carbapenemase production on incomplete datasets (in which not all eight recommended antibiotic disks were used) was tested on a different, previously published dataset containing the disk diffusion diameters of 518 Enterobacterales isolates submitted for carbapenemase testing to the French reference laboratory for multidrug-resistant Gram-negatives (external dataset B, included in Supplementary Data 3, originally used to assess the CA-SFM algorithm16). In this dataset, diameters were measured using SIRscan and verified manually. Based on the available inhibition zone diameters for ertapenem, meropenem, imipenem, temocillin, and ceftazidime-avibactam, we imputed the missing values for imipenem-relebactam, meropenem-vaborbactam, and ceftolozane-tazobactam using the missRanger R package27, with our own dataset as the reference. The trained model was then used to predict the presence or absence of carbapenemase production. Information on statistical analyses and the development of the app can be found in the Supplementary Information.
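
missRanger performs chained random-forest imputation (optionally with predictive mean matching), which is too involved for a short example. As a plainly simpler stand-in, the sketch below uses k-nearest-neighbour imputation: for an isolate from dataset B, it averages the missing diameters over the k isolates of a complete reference dataset whose five observed diameters are closest. The short names, k = 5 and the Euclidean distance are illustrative assumptions, not the study's settings.

```python
import math

# Hypothetical short names: disks available in dataset B vs. disks that were imputed.
OBSERVED = ["ETP", "MEM", "IPM", "TEM", "CZA"]
MISSING = ["IMR", "MEV", "C/T"]


def knn_impute(query, reference, k=5):
    """Fill the MISSING diameters of `query` with the mean over the k `reference`
    isolates closest in OBSERVED diameters (Euclidean distance).
    Stand-in for missRanger, not a reimplementation of it."""
    def distance(ref):
        return math.sqrt(sum((query[a] - ref[a]) ** 2 for a in OBSERVED))

    nearest = sorted(reference, key=distance)[:k]
    filled = dict(query)  # copy so the caller's record is left unchanged
    for a in MISSING:
        filled[a] = sum(ref[a] for ref in nearest) / len(nearest)
    return filled
```

The completed record can then be passed to the trained classifier exactly like an isolate tested with all eight disks; the key difference from missRanger is that each missing value here comes from a local average rather than from iteratively refitted random forests.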

Ethics approval

The bacterial strains were isolated during routine diagnostics and anonymized. As no patient data were analyzed, ethical approval was not required for this type of study according to §15 of the professional code for physicians.

Reporting summary

Further information on research design is available in the Nature Portfolio Reporting Summary linked to this article.
