Limitations of estimating antibiotic resistance using German hospital consumption data – a comprehensive computational analysis

For the present work we used quarterly consumption and resistance data over a 5-year period (2015–2019). We obtained antibiotic consumption data at the department, specialty and ward level from a Germany-wide antibiotic surveillance project, the ADKA-if-DGI-project12, a cooperation between the German Hospital Pharmacists Association (ADKA), the Division of Infectious Diseases of the university hospital Freiburg (if) and the German Society of Infectious Diseases (DGI) that includes more than 300 acute care hospitals. Consumption data on antibacterial drugs are expressed as recommended daily doses (RDD) per 100 patient days13,14. Furthermore, we used antibiotic resistance data for E. coli from the Antibiotika-Resistenz-Surveillance (ARS) database15 provided by the Robert Koch-Institute (RKI). The ARS database collects routine data on antibiotic susceptibility testing of all clinical pathogens and sample types from approximately 80 participating laboratories, which support around 600 hospitals and 24,000 institutions of primary care. For our analysis, only data from hospital-based (inpatient) settings were retrieved from the ARS database. Routine laboratory testing data are reported as proportions of R (resistant), I (intermediate) and S (sensitive) strains according to national quality standards (MIQ, Mikrobiologisch-infektiologische Qualitätsstandards16). Resistance data are provided only at the federal state level and are not available for individual participating hospitals. If fewer than 50 isolates were tested, no information on the proportion of resistant pathogens is given. Because the average number of isolates is low in some federal states, we aggregated these states into larger regions (Thuringia & Saxony-Anhalt, Brandenburg & Mecklenburg-Western Pomerania, Schleswig-Holstein & Hamburg, Lower Saxony & Bremen, Rhineland-Palatinate & Saarland) to increase data availability.
Please note that the hospital cohorts of the two data sources do not fully overlap, as the ARS database of the RKI does not disclose information about its participating institutions. To evaluate the results, in-house consumption and resistance data with the same structure from the university hospital Carl Gustav Carus in Dresden (UKD) were used. No information about individual patients can be identified in any of the data sets used.
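
The regional aggregation and the 50-isolate reporting threshold described above can be sketched as follows. This is a minimal illustration, not the authors' code: it assumes that isolate counts per federal state and quarter are available, and the record layout and the names `MERGED_REGIONS`, `MIN_ISOLATES` and `regional_resistance` are hypothetical. The example numbers are invented.

```python
# Hypothetical mapping of federal states to the merged regions named in the text.
MERGED_REGIONS = {
    "Thuringia": "Thuringia & Saxony-Anhalt",
    "Saxony-Anhalt": "Thuringia & Saxony-Anhalt",
    "Brandenburg": "Brandenburg & Mecklenburg-Western Pomerania",
    "Mecklenburg-Western Pomerania": "Brandenburg & Mecklenburg-Western Pomerania",
    "Schleswig-Holstein": "Schleswig-Holstein & Hamburg",
    "Hamburg": "Schleswig-Holstein & Hamburg",
    "Lower Saxony": "Lower Saxony & Bremen",
    "Bremen": "Lower Saxony & Bremen",
    "Rhineland-Palatinate": "Rhineland-Palatinate & Saarland",
    "Saarland": "Rhineland-Palatinate & Saarland",
}

# Below this number of tested isolates no resistance proportion is reported.
MIN_ISOLATES = 50


def regional_resistance(records):
    """Pool isolate counts per (region, quarter) and return resistant proportions.

    records: iterable of (state, quarter, n_tested, n_resistant) tuples
             (assumed layout, for illustration only).
    Returns a dict mapping (region, quarter) to a proportion, or to None
    when fewer than MIN_ISOLATES isolates were tested in total.
    """
    totals = {}
    for state, quarter, n_tested, n_resistant in records:
        region = MERGED_REGIONS.get(state, state)  # unmerged states keep their name
        tested, resistant = totals.get((region, quarter), (0, 0))
        totals[(region, quarter)] = (tested + n_tested, resistant + n_resistant)
    return {
        key: (resistant / tested if tested >= MIN_ISOLATES else None)
        for key, (tested, resistant) in totals.items()
    }


# Invented example: neither Hamburg nor Schleswig-Holstein reaches 50 isolates
# alone, but the merged region does.
example = regional_resistance([
    ("Hamburg", "2015Q1", 30, 6),
    ("Schleswig-Holstein", "2015Q1", 40, 8),
    ("Saxony", "2015Q1", 20, 5),
])
```

In this invented example the merged region pools 70 isolates and therefore receives a proportion, whereas the single state with 20 isolates stays unreported.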

Ethical approval was not required, because the project was based on epidemiological data. Research involving human subjects, human material or specific human or personalised data was not carried out. The antibiotic consumption data were anonymised with respect to hospital names. The antibiotic resistance data were collected by the ARS database at the federal state level and do not include any personal information either.

First, we used descriptive statistics and visualizations to identify possible patterns and trends in the data.

Second, we applied computational models to relate antibiotic use to the development of resistance and to uncover possible interactions. To this end, we considered and compared a wide variety of methods with the goal of obtaining robust results, as different approaches focus on different characteristics of the data.

In all models, the antibiotic consumption data of the individual hospitals were aggregated to their respective regions to match the resistance data. We worked with recommended daily doses (RDD) rather than defined daily doses (DDD), because RDD match the actually prescribed dosages more closely17,18. As mentioned before, the proportions of resistant pathogens are derived from different sample types. Because the number of blood samples was low, we used all sample types to obtain the target values for the models. We also included various quarterly time shifts between consumption and the resulting resistance in all presented models. With only 20 time points in total (five years with four quarters each), most methods of time series analysis do not seem suitable1. Therefore, we considered both a classical multiple linear regression and a linear mixed-effects model, which complements the fixed effects with additional random effects that account for dependency among data points. Within the mixed-effects model, the regions served as random effects, because we are not interested in the effect of a particular region but in the regional effect as a general source of heterogeneity. As fixed effects, we used the department (internal vs. surgery), the distinction between general ward and ICU, and the consumption in RDD per 100 patient days of selected antimicrobial substances. Additionally, we incorporated the number of patient-days as well as time in years and quarters into the model to account for potential confounding effects, where patient-days denote the total number of days on which patients occupied a hospital bed. The dependent variable was the proportion of E. coli isolates reported as resistant to ciprofloxacin or cefotaxime, and we fitted a separate model for each of the two substances.
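
The quarterly time shifts between consumption and resistance mentioned above can be illustrated with a short Python sketch. It is only an assumption about how such a shift might be built; the function name `lagged_pairs` and the series values are hypothetical and not taken from the study.

```python
def lagged_pairs(consumption, resistance, lag):
    """Pair consumption in quarter t - lag with resistance in quarter t.

    consumption, resistance: equally long lists ordered by quarter.
    lag: non-negative shift in quarters (0 means no shift).
    """
    if lag < 0:
        raise ValueError("lag must be non-negative")
    if len(consumption) != len(resistance):
        raise ValueError("series must have the same length")
    return [
        (consumption[t - lag], resistance[t])
        for t in range(lag, len(resistance))
    ]


# Invented series with 20 quarterly values (five years, four quarters each).
consumption_series = list(range(20))
resistance_series = [q / 100 for q in range(20)]

# A shift of two quarters leaves 18 usable pairs out of 20 time points.
shifted = lagged_pairs(consumption_series, resistance_series, lag=2)
```

Each additional quarter of shift removes one observation per series, which matters with only 20 time points.
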
All linear regression and mixed-effects models were calculated using the statistical software R19. To determine the antibiotics used as predictors, we first included ciprofloxacin and cefotaxime in the respective models as important representatives of the fluoroquinolones and third-generation cephalosporins, respectively. Second, based on similar biochemical functionality, we added all substances of the corresponding antibiotic class that were used during the considered period. Third, in a stepwise model selection based on the Akaike information criterion (AIC)20, we added or removed substances as predictors and, where appropriate, transformed them according to their distributions to generate candidate regression model configurations. Substances of all antibiotic classes (e.g. penicillins, cephalosporins) were possible candidates. To check for and avoid overfitting artefacts, we randomly split the data into a training (90%) and a test (10%) data set; the choice of this ratio is justified in the description of the neural networks below. We fitted the regression models to the training data and then applied them to the test data to evaluate and compare their accuracy. Because we conjecture that E. coli shows no resistance to the two investigated substances in the absence of any antibiotic consumption, we assumed a zero intercept (i.e. the regression line passes through the origin)21. Note that this assumption might not be entirely true in practice and will hold only approximately.
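
The combination of a random 90%/10% split and a regression line forced through the origin can be sketched in Python as follows. The study itself used R and several predictors; this simplified single-predictor version, the function names and the synthetic noise-free data are assumptions made purely for illustration.

```python
import random


def split_train_test(rows, test_fraction=0.1, seed=0):
    """Randomly split rows into a training and a test list."""
    rows = list(rows)
    random.Random(seed).shuffle(rows)  # seeded for reproducibility
    n_test = max(1, round(len(rows) * test_fraction))
    return rows[n_test:], rows[:n_test]


def fit_through_origin(pairs):
    """Least-squares slope b for the zero-intercept model y = b * x."""
    sum_xx = sum(x * x for x, _ in pairs)
    sum_xy = sum(x * y for x, y in pairs)
    return sum_xy / sum_xx


def mean_squared_error(pairs, slope):
    """Mean squared prediction error of y = slope * x on the given pairs."""
    return sum((y - slope * x) ** 2 for x, y in pairs) / len(pairs)


# Synthetic data: resistance is exactly 0.02 times consumption (no noise).
data = [(float(x), 0.02 * x) for x in range(1, 41)]

train, test = split_train_test(data)          # 36 training and 4 test rows
slope = fit_through_origin(train)             # fitted on training data only
test_error = mean_squared_error(test, slope)  # evaluated on held-out data
```

Evaluating the error only on the held-out rows is what makes overfitting visible: a model that merely memorises the training data would show a much larger test error than training error.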

In addition to this classical statistical modelling framework, we applied several machine learning methods to our problem. In terms of performance, artificial neural networks (i.e., deep learning methods) outperformed the other methods tested, namely random forests (RF) and support vector machines (SVM)22. In the following, we therefore limit ourselves to describing the artificial neural networks, which we implemented in Python using the tensorflow and keras packages23.

We used the consumption of all antibiotic substances as the input and the E. coli resistance to ciprofloxacin and cefotaxime as the output, with both resistances fitted in the same model. Additionally, we incorporated information about the respective regions, departments and general ward / ICU into the model as dummy variables using so-called one-hot encoding24, in which each categorical variable is transformed into several binary indicator variables. Using a keras hyperparameter optimization framework, we obtained a model with three hidden layers of 50, 50 and 30 neurons and the ReLU activation function25. Adam, an adaptive stochastic gradient descent method26, served as the optimizer. To improve the quality of the results, we also incorporated the zero-intercept concept from the linear models (see above). Technically, this was done by adding a fixed number of zero rows to the data. To compare the different models with each other, we used the mean squared error (MSE). As artificial neural networks require a lot of data to be trained properly, we chose the split into 90% training and 10% test data mentioned above.
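
The input preparation described above, one-hot encoding of the categorical variables and the added zero rows, might look roughly like the following Python sketch. It covers only the data preparation, not the network itself. The function names, the category levels and the numbers are hypothetical, and it is an assumption that a zero row has zeros in every input and output column, including the dummy variables.

```python
def one_hot(value, categories):
    """Return a binary indicator list with a single 1.0 at the position of value."""
    if value not in categories:
        raise ValueError(f"unknown category: {value!r}")
    return [1.0 if value == category else 0.0 for category in categories]


def encode_sample(consumption, region, department, ward, levels):
    """Concatenate consumption values with one-hot encoded categorical variables."""
    return (
        list(consumption)
        + one_hot(region, levels["region"])
        + one_hot(department, levels["department"])
        + one_hot(ward, levels["ward"])
    )


def add_zero_rows(inputs, targets, n_zero):
    """Append n_zero all-zero rows so that zero input is paired with zero output."""
    zero_inputs = [[0.0] * len(inputs[0]) for _ in range(n_zero)]
    zero_targets = [[0.0] * len(targets[0]) for _ in range(n_zero)]
    return inputs + zero_inputs, targets + zero_targets


# Hypothetical category levels and one invented sample with two substances.
levels = {
    "region": ["Saxony", "Bavaria"],
    "department": ["internal", "surgery"],
    "ward": ["general", "ICU"],
}
row = encode_sample([1.5, 0.3], "Bavaria", "surgery", "ICU", levels)

# Targets: resistance proportions for ciprofloxacin and cefotaxime (invented).
inputs, targets = add_zero_rows([row], [[0.25, 0.10]], n_zero=2)
```

The zero rows act as soft evidence that zero consumption goes with zero resistance, since a neural network has no intercept term that could simply be fixed at zero as in a linear model.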

To evaluate the in-house data of the UKD on ciprofloxacin and cefotaxime resistance of E. coli, we used linear models only, because neural networks do not produce meaningful results with such a small sample size. Starting from the same models as for the main data described above, we removed the random effects, as there are no regional differences within a single hospital. Furthermore, we performed model selection in the same way and used the coefficient of determination R² to compare this linear model with the regional linear model, as the two models were trained on different datasets.
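
For reference, the coefficient of determination used for this comparison can be computed as in the sketch below. It uses the common centred definition, R² = 1 − SS_res / SS_tot. Which variant the analysis used is not stated (for zero-intercept models an uncentred variant is also in use), so this choice, the function name and the example values are assumptions.

```python
def r_squared(observed, predicted):
    """Centred coefficient of determination: 1 - SS_res / SS_tot."""
    mean_observed = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_observed) ** 2 for o in observed)
    return 1.0 - ss_res / ss_tot


# Invented resistance proportions for four quarters.
observed = [0.10, 0.20, 0.30, 0.40]

perfect_fit = r_squared(observed, observed)       # predictions equal observations
mean_only_fit = r_squared(observed, [0.25] * 4)   # always predicting the mean
```

Unlike the MSE, R² is scaled by the variance of the observed values, which makes it easier to compare models fitted to datasets with different spreads.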
