Clinical validation of an AI-based blood testing device for diagnosis and prognosis of acute infection and sepsis


September 30, 2025

Between March 2020 and May 2024, 1,441 adult patients with suspected acute infection or suspected sepsis and (1) at least one abnormal vital sign or (2) at least two vital sign changes with a blood culture order were enrolled from 22 emergency departments (Fig. 1 and Supplementary Table 1). After excluding screen failures and withdrawals, 1,222 patients had valid TriVerity results, and 729 of these were clinically adjudicated as consensus for the presence of a bacterial and/or viral infection (Yes or No adjudication status; Methods). We present these consensus-adjudicated patients as the main population (primary diagnostic endpoint); the secondary outcome of forced adjudication includes all 1,222 patients but with less certain adjudication status (Yes plus Probable and Unlikely plus No adjudications are grouped together; Methods). For the prognostic endpoint, 1,120 patients were evaluable (Fig. 1).

Characteristics of study participants at emergency department presentations

Mean age, sex, race and ethnicity were representative of the US emergency department population (Extended Data Table 1). Among patients evaluable for the diagnostic endpoint, the mean age was 50.6 years, and 47.3% of the patients were female. Most patients were White (63.2%), followed by Black (30.6%) and Hispanic/Latino (13.3%). Similar percentages were observed among those evaluated for the prognostic endpoint (Extended Data Table 1). Metabolic/endocrinological, respiratory and cardiovascular diseases were the most prevalent medical conditions. There were no marked differences between patients evaluable for the diagnostic endpoint versus those evaluable for the prognostic endpoint. Overall, 132 patients (18.1%) evaluable for the diagnostic endpoint were immunosuppressed compared to 206 patients (18.4%) evaluable for the prognostic endpoint. Malignancies were the most frequently found type of immunosuppression (approximately 10% of patients), followed by solid organ transplantation, steroid treatment and HIV/AIDS (Extended Data Table 1).

The mean leukocyte count in patients eligible for the diagnostic endpoint was 11.8 × 10⁹ per liter, the mean neutrophil percentage was 76.9 and the mean lymphocyte percentage was 12.5; similar numbers were observed in patients eligible for the prognostic endpoint (Supplementary Table 2a). Mean concentrations of biomarkers (CRP, procalcitonin and lactate) are shown in Supplementary Table 2b.

Among 1,120 patients eligible for the prognostic endpoint, 24.3% were discharged to home, 55.8% were admitted to a regular ward and 13.8% were admitted to an ICU (Extended Data Table 2a). The mean length of hospital stay was 5 days and of ICU stay was 6.1 days; there was no significant difference in consensus-adjudicated patients. Out of the 1,120 patients, 122 (10.9%) met the primary severity endpoint, which included mechanical ventilation (n = 63, 51.6%), vasopressor use (n = 99, 81.2%) and/or RRT (n = 23, 18.9%) (Extended Data Table 2b). Among 147 patients who were transferred to the ICU, 98 (66.7%) received the ‘ICU-level care’ interventions (composite primary severity endpoint). Extended Data Table 2b also shows differences in clinical outcomes stratified by clinically adjudicated infection status.

Infection status and anatomical location of infection

Of the 729 consensus-adjudicated patients evaluable for the diagnostic endpoint, 448 (61.5%), 165 (22.6%) and 12 (1.6%) had bacterial, viral and bacterial–viral co-infection, respectively, whereas 104 (14.3%) were adjudicated to not have an infection (Table 1). Among 460 patients adjudicated to have bacterial infections, urinary tract infections (n = 142, 30.9%) and skin or soft tissue infections (n = 133, 28.9%) were most frequent, followed by bloodstream (n = 92, 20%), gastrointestinal tract (n = 69, 15%) and respiratory tract (n = 58, 12.6%) infections. Among 177 patients adjudicated to have viral infections, 169 (95.5%) had respiratory tract infections, followed by gastrointestinal tract infections (n = 4, 2.3%). Three parasitic infections were diagnosed (Giardia lamblia enterocolitis, n = 1; Trichomonas vaginalis vulvovaginitis, n = 2). The percentage of bacterial infections was relatively high (63.1%), driven by seasonal epidemiology and the broad inclusion criteria allowing for enrollment of patients with all types of suspected infections (not only respiratory infections). Infection status for the forced adjudication cohort is shown in Supplementary Table 3a.
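The group sizes of 460 and 177 and the 63.1% bacterial prevalence can be recovered from the four mutually exclusive adjudication categories. The following is a minimal sketch of that arithmetic using only the counts reported in this paragraph; the variable and function names are illustrative and not from the paper.

```python
# Consensus-adjudicated counts reported in the text (mutually exclusive categories).
counts = {"bacterial": 448, "viral": 165, "coinfection": 12, "no_infection": 104}

total = sum(counts.values())

# Co-infected patients count toward both the bacterial and the viral group.
any_bacterial = counts["bacterial"] + counts["coinfection"]
any_viral = counts["viral"] + counts["coinfection"]


def pct(numerator, denominator):
    """Percentage rounded to one decimal place."""
    return round(100 * numerator / denominator, 1)


print(total, any_bacterial, any_viral)
print(pct(any_bacterial, total), pct(any_viral, total))
```

The two percentages correspond to the prevalences used later when likelihood ratios are converted to post-test probabilities.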

Table 1 Clinically adjudicated infection status and anatomical localization of infection

TriVerity result output

TriVerity provides three scores: Bacterial, Viral and Severity. Each score ranges from 0 to 50 and is divided into five interpretation bands (Very Low (0–10), Low (11–20), Moderate (21–30), High (31–40) and Very High (41–50)) that reflect increasing likelihoods of the corresponding infection type or severity. The suggested clinical interpretation of the two highest bands (‘Very High’ and ‘High’) is ‘rule-in’, whereas, for the two lowest bands (‘Very Low’ and ‘Low’), it is ‘rule-out’. Accuracy of TriVerity scores was evaluated by comparison with post hoc clinical adjudications (Methods and ref. ²²). For each of the three scores, when considered separately, 80–86% of the patients were assigned to one of the rule-in or rule-out bands. When the scores were considered together, almost all TriVerity results (99.6% and 99.3% of patients in the consensus and forced adjudication cohorts, respectively) fell into one of the four clinically actionable interpretation bands (that is, Very Low, Low, High and Very High) for at least one of the diagnostic and prognostic scores.
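The band scheme described above can be expressed as a simple lookup. This is a sketch only: it assumes scores are reported as integers (suggested by the band boundaries but not stated here), the function names are hypothetical, and the "indeterminate" label for the Moderate band is our own placeholder, since the text assigns no interpretation to it.

```python
# Upper bound of each band, as stated in the text. Integer scores are an assumption.
BANDS = [
    (10, "Very Low"),
    (20, "Low"),
    (30, "Moderate"),
    (40, "High"),
    (50, "Very High"),
]

INTERPRETATION = {
    "Very Low": "rule-out",
    "Low": "rule-out",
    "Moderate": "indeterminate",  # placeholder label, not from the paper
    "High": "rule-in",
    "Very High": "rule-in",
}


def band_for(score: int) -> str:
    """Map a 0-50 score to its interpretation band."""
    if not 0 <= score <= 50:
        raise ValueError(f"score must be between 0 and 50, got {score}")
    for upper, name in BANDS:
        if score <= upper:
            return name
    raise AssertionError("unreachable: score already range-checked")


def interpret(score: int) -> tuple[str, str]:
    """Return (band, suggested interpretation) for a score."""
    band = band_for(score)
    return band, INTERPRETATION[band]
```

For example, `interpret(35)` returns the High band with a rule-in interpretation, and `interpret(8)` returns the Very Low band with a rule-out interpretation.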

Accuracy of TriVerity for diagnosis of bacterial infection

The Bacterial score had an AUROC of 0.83 (80% CI: 0.81–0.85) for detecting bacterial infections in the consensus population. Probability of bacterial infection, as measured by likelihood ratio, ranged over 100-fold and increased monotonically by interpretation band (likelihood ratios: Very Low, 0.08 (80% CI: 0.05–0.11); Low, 0.54 (0.45–0.63); Moderate, 1.14 (0.94–1.42); High, 2.50 (1.97–3.31); Very High, 8.04 (5.66–12.43)) (Table 2). The Bacterial score had specificity of 95.5% and 90.7% for the Very High and High bands, respectively, and sensitivity of 97.2% and 81.5% for the Very Low and Low bands, respectively (Table 2). Notably, 81.3% of the patients with consensus adjudication (and 80.4% of those with forced adjudication) fell into one of the clinically actionable Very Low, Low, High and Very High interpretation bands for bacterial infection. At a prevalence of 63.1% for bacterial infections, the probability of having a bacterial infection for the Very High interpretation band was 93.2% and for Very Low was 12.1%. Using forced adjudication (entire population including ‘uncertain’ adjudication, that is, probable and unlikely cases), the area under the curve (AUC) for the detection of bacterial infections was 0.76 (80% CI: 0.75–0.78) (Supplementary Table 3b); sensitivity, specificity and likelihood ratio for the accuracy of TriVerity Bacterial scores are shown in Extended Data Table 3a.
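The post-test probabilities quoted above are consistent with the standard odds form of Bayes' theorem applied to the band-specific likelihood ratios. As a worked check (our own arithmetic, not a reproduction of the paper's Methods), take the prevalence $p = 0.631$ and the Very High likelihood ratio $\mathrm{LR} = 8.04$:

```latex
\[
\text{odds}_{\text{pre}} = \frac{p}{1-p} = \frac{0.631}{0.369} \approx 1.71,
\qquad
\text{odds}_{\text{post}} = \text{odds}_{\text{pre}} \times \mathrm{LR}
  \approx 1.71 \times 8.04 \approx 13.75,
\]
\[
P_{\text{post}} = \frac{\text{odds}_{\text{post}}}{1 + \text{odds}_{\text{post}}}
  \approx \frac{13.75}{14.75} \approx 0.932 .
\]
```

The same calculation with the Very Low likelihood ratio of 0.08 gives approximately 0.12, in line with the reported 12.1% given that the published likelihood ratio is rounded.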

Table 2 Accuracy of TriVerity Bacterial score for the diagnosis of bacterial and viral infections

Accuracy of TriVerity for the diagnosis of viral infection

The Viral score had an AUROC of 0.91 (80% CI: 0.89–0.93) for the detection of viral infections. Likelihood of viral infection increased monotonically more than 400-fold by interpretation band (likelihood ratios: Very Low, 0.09 (80% CI: 0.05–0.14); Low, 0.32 (80% CI: 0.22–0.41); Moderate, 0.87 (80% CI: 0.64–1.13); High, 2.36 (80% CI: 1.68–3.25) and Very High, 40.93 (80% CI: 27.73–72.16)). The Viral score had specificity of 98.6% and 94.0% for the Very High and High bands, respectively, and sensitivity of 95.5% and 90% for the Very Low and Low bands, respectively (Table 2). Notably, 86.1% of patients with consensus adjudication (and 81.3% of those with forced adjudication) fell into the clinically actionable Very High, High, Low and Very Low bands. At a prevalence of 24.3% for viral infections, the probability of having a viral infection in the Very High band was 92.9% and for Very Low was 2.9%. In the forced adjudication, the Viral score had an AUROC of 0.83 (0.81–0.85) for the detection of viral infections; sensitivity, specificity and likelihood ratio for the accuracy of TriVerity Viral scores are shown in Extended Data Table 3b.

The accuracy of the Viral score was robust in patients diagnosed with SARS-CoV-2 (Extended Data Table 4a), demonstrating its applicability and generalizability to emerging pathogens. Median Viral scores were highest in patients with infection from influenza A/B and SARS-CoV-2; patients diagnosed with human metapneumovirus and respiratory syncytial virus had intermediate Viral scores, whereas those with adenovirus and rhinovirus/enterovirus had the lowest Viral scores (Extended Data Table 4b).

Cross-classifications of both Bacterial and Viral scores for patients clinically adjudicated as bacterial infection, viral infection, co-infection and non-infected are shown in Supplementary Table 4a–d.

Overall, the Bacterial and Viral scores were strongly associated with increasing likelihoods of bacterial and viral infections, respectively. Notably, the Bacterial and Viral scores increased monotonically, and the 80% CIs for adjacent bands did not overlap in any of the analyses (Supplementary Fig. 1a,b).
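The monotonicity and non-overlap statements can be checked directly against the consensus-cohort likelihood ratios quoted in the two preceding sections. The sketch below covers only those quoted values, not every analysis in Supplementary Fig. 1; the helper names are hypothetical.

```python
# (likelihood ratio, 80% CI lower, 80% CI upper) per band, in the order
# Very Low, Low, Moderate, High, Very High, copied from the text.
BACTERIAL = [
    (0.08, 0.05, 0.11),
    (0.54, 0.45, 0.63),
    (1.14, 0.94, 1.42),
    (2.50, 1.97, 3.31),
    (8.04, 5.66, 12.43),
]
VIRAL = [
    (0.09, 0.05, 0.14),
    (0.32, 0.22, 0.41),
    (0.87, 0.64, 1.13),
    (2.36, 1.68, 3.25),
    (40.93, 27.73, 72.16),
]


def is_monotonic(rows):
    """True if the likelihood ratio strictly increases from band to band."""
    return all(a[0] < b[0] for a, b in zip(rows, rows[1:]))


def adjacent_cis_disjoint(rows):
    """True if each band's CI upper bound lies below the next band's lower bound."""
    return all(a[2] < b[1] for a, b in zip(rows, rows[1:]))


def fold_range(rows):
    """Ratio of the highest-band to the lowest-band likelihood ratio."""
    return rows[-1][0] / rows[0][0]


for name, rows in (("Bacterial", BACTERIAL), ("Viral", VIRAL)):
    print(name, is_monotonic(rows), adjacent_cis_disjoint(rows), round(fold_range(rows)))
```

The fold range computed this way is what underlies the "over 100-fold" and "more than 400-fold" descriptions of the Bacterial and Viral scores.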

Accuracy of TriVerity compared to commonly used biomarkers

The AUROC of the Bacterial score (0.83, 80% CI: 0.81–0.85) was significantly higher than those of commonly used biomarkers for the diagnosis of infections, including procalcitonin (AUROC = 0.71, 80% CI: 0.68–0.73), CRP (AUROC = 0.74, 80% CI: 0.72–0.77) and white blood cell (WBC) counts (0.76, 80% CI: 0.73–0.78) (P 5a). Supplementary Fig. 2 shows the correlation of WBC, CRP and procalcitonin concentrations with TriVerity Bacterial interpretation bands; Extended Data Table 2d shows the AUROCs for these biomarkers in the forced adjudication cohort.

In addition, the diagnostic accuracy of TriVerity generalized across races better than other biomarkers. Specifically, although the overall AUROC for procalcitonin was 0.71 for diagnosis of bacterial infection, it was substantially lower in Blacks (AUROC = 0.66) and other races (AUROC = 0.62) compared to Whites (AUROC = 0.74), highlighting lower clinical utility of procalcitonin in non-White populations. By contrast, the Bacterial score’s overall AUROC of 0.83 remained virtually identical in Whites (0.82) and Blacks (0.83) and was higher in other races (0.91) (Extended Data Table 5b).

Because TriVerity measures the host immune response to infection, we investigated whether the Bacterial and Viral scores maintained their diagnostic accuracy in immunocompromised patients, who also have an increased risk of infection. The AUROCs for TriVerity Bacterial and Viral scores were not significantly different between immunocompromised and immunocompetent patients (0.80, 80% CI: 0.75–0.85 versus 0.83, 80% CI: 0.81–0.85 for Bacterial scores; 0.89, 80% CI: 0.86–0.94 versus 0.91, 80% CI: 0.89–0.94 for Viral scores) (Extended Data Table 6a,b). Lastly, when considering median Bacterial and Viral scores for specific anatomical sites of infection, patients adjudicated as positive for bacterial infection in the bloodstream had the highest median Bacterial score, followed by patients with bacterial infections of the respiratory and urinary tracts (Extended Data Table 7a,b). Only AUCs for the Bacterial scores in patients with bloodstream infections were significantly (P

Prognostic accuracy of the TriVerity Severity score

The Severity score predicted the need for ‘ICU-level care’, defined as an acute need for mechanical ventilation, vasopressor use and/or RRT within 7 days, with an AUROC of 0.78 (80% CI: 0.75–0.81). Risk of requiring ICU-level care monotonically increased by interpretation band over 50-fold (likelihood ratios: Very Low, 0.22 (80% CI: 0.14–0.31); Low, 0.43 (80% CI: 0.30–0.57); Moderate, 1.63 (80% CI: 1.35–1.97); High, 2.41 (80% CI: 1.96–2.91); Very High, 11.33 (80% CI: 7.07–17.75)). The Severity score demonstrated specificity of 98.7% and 86.1% for the Very High and High bands, respectively, and sensitivity of 91.8% and 87.7% for the Very Low and Low bands, respectively (Table 3). Most patients (79.6%) were in the clinically actionable Very High, High, Low and Very Low interpretation bands. At a prevalence of 10.9%, the probability of requiring ICU-level care within 7 days was 58.1% for the Very High band and 2.7% for the Very Low band. Likelihood ratios of the Severity scores for the three different individual components of ‘ICU-level care’ were similar to the overall likelihood ratios presented above (Extended Data Table 8). Kaplan–Meier survival analysis also found significantly increasing hazard ratios for the need of ‘ICU-level care’ between days 0 and 7 from ‘Very Low’ to ‘Very High’ Severity bands (Supplementary Fig. 3).

Table 3 Accuracy of TriVerity Severity score for the prediction of the need of mechanical ventilation, vasopressor use and/or RRT within 7 days

Lactate, a mandated biomarker commonly used to estimate severity as part of the SEP-1 bundle, had an AUROC of 0.76 (80% CI: 0.73–0.80) for predicting the need for ICU-level care. In the same patients, the Severity score had an AUROC of 0.78 (80% CI: 0.75–0.80). Lactate higher than 4 mmol l⁻¹ demonstrated specificity of 95.8%, similar to the Very High and High bands for TriVerity Severity score. By contrast, sensitivity in patients with lactate lower than 2 mmol l⁻¹ was 66.7% (Extended Data Table 9a), which was substantially lower than the sensitivity for Very Low and Low bands for the Severity score (>87%). In total, 213 patients had indeterminate lactate concentrations (2–4 mmol l⁻¹), of whom 58 needed ‘ICU-level care’ or died (Extended Data Table 9b). Of these 58, TriVerity identified 46 (79.3%) as Moderate to Very High risk of severe illness, substantially reducing the uncertainty in identifying patients at higher risk of severe illness compared to lactate (Extended Data Table 9b). Hence, despite similar AUROCs, TriVerity demonstrated significantly lower false-negative and higher true-positive rates in patients with indeterminate lactate concentrations.

The severity of a patient’s clinical condition at the time of presentation in the emergency department can be assessed using clinical scores, such as the qSOFA score. We investigated whether integrating qSOFA with the TriVerity Severity score would further improve the accuracy of predicting ‘ICU-level care’. We determined pre-test and post-test probabilities of sequential qSOFA plus Severity scores (Fig. 2 and Supplementary Table 5). ICU-level care requirement among patients with low-risk qSOFA scores (0–1) was 7.7% and for those with high-risk qSOFA scores (2–3) was 46.3% (Fig. 2). However, stratifying patients by TriVerity Severity score interpretation bands markedly increased predictive accuracy in both the low and high clinical risk patients. For instance, a patient with qSOFA 0–1 (7.7% risk overall) would increase to a 52% risk with a ‘Very High’ TriVerity Severity score, whereas a patient with qSOFA 2–3 (46% risk overall) would decrease to an 18–25% risk with a ‘Low’ or ‘Very Low’ Severity score. The overall sensitivity of predicting ‘ICU-level care’ using qSOFA scores alone was 33.93%. When we combined qSOFA with the Severity score, the sensitivity increased significantly to 84.82% (P

**Fig. 2: TriVerity Severity scores used in combination with qSOFA to predict the need for mechanical ventilation, vasopressor use and/or RRT (‘ICU-level care’) within 7 days.**

Finally, the Severity score also predicted the composite need for ‘ICU-level care’ and/or 28-day mortality with a specificity of 99.0% for the rule-in Very High band and a sensitivity of 95.0% for the rule-out Very Low band, which further supports its predictive value for severe illness (Supplementary Table 6).

Potential clinical utility

To examine the potential clinical utility of TriVerity, we performed several preliminary analyses. We note that these analyses were not part of the preplanned analyses. First, we investigated whether the Bacterial scores could help reduce inappropriate antibiotic treatment using post hoc clinically adjudicated infection status as the gold standard (Fig. 3a). Thirty-three patients were adjudicated to have a bacterial infection but did not receive antibiotics on the day of presentation, of whom 10, 11 and 3 (overall 24 (72.7%)) had a Bacterial score of Moderate, High or Very High, respectively (Fig. 3a), suggesting that TriVerity Bacterial score could have helped emergency department providers avoid delays in antibiotic administration. On the other hand, 103 patients were adjudicated to not have a bacterial infection but received antibiotics on the day of presentation, of whom 62 (60.2%) had a Bacterial score of Low or Very Low (Fig. 3a), suggesting that TriVerity results could have helped emergency department providers avoid antibiotic overprescription.

**Fig. 3: Potential diagnostic and prognostic clinical utility of TriVerity.**

Second, we applied likelihood ratios associated with TriVerity Bacterial bands to calculate the theoretical post-test probability of bacterial infection for a range of hypothetical pre-test probabilities (from 0 to 1 in 0.1-increment bins; Fig. 3b). This reflects that physicians will use the Bacterial score in a variety of patients, including those in whom they judge infection to be unlikely (for example, pre-test probability of 10%) or very likely (for example, pre-test probability of 90%). Using a threshold of 69% probability of bacterial infection for providers to prescribe antibiotics²³, the post-test probability crosses the threshold in patients with ‘Very High’ Bacterial scores and pre-test probability as low as 0.3. Similarly, a ‘High’ Bacterial score results in a post-test probability of at least 0.69 in patients with a pre-test probability greater than 0.5. Conversely, Bacterial scores in the Very Low and Low bands reduce post-test probability below the 0.69 threshold for patients with a pre-test probability less than 90% and 80%, respectively (Fig. 3b). This analysis demonstrates that TriVerity Bacterial scores can assist emergency department providers in providing antibiotics to patients most likely to have bacterial infections and in avoiding antibiotics in patients less likely to have bacterial infections.
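A calculation of this kind can be sketched as follows. The likelihood ratios are the consensus-cohort values reported earlier for the Bacterial score; whether Fig. 3b uses exactly these inputs is an assumption, so small differences from the figure are possible. The function names are hypothetical, and the grid omits 0 and 1 because the odds are undefined or degenerate there.

```python
# Likelihood ratios for the Bacterial score bands, as reported in the text.
BACTERIAL_LR = {
    "Very Low": 0.08,
    "Low": 0.54,
    "Moderate": 1.14,
    "High": 2.50,
    "Very High": 8.04,
}
TREATMENT_THRESHOLD = 0.69  # prescribing threshold cited in the text


def post_test_probability(pre_test: float, lr: float) -> float:
    """Convert a pre-test probability to a post-test probability via odds."""
    if not 0 < pre_test < 1:
        raise ValueError("pre_test must be strictly between 0 and 1")
    post_odds = pre_test / (1 - pre_test) * lr
    return post_odds / (1 + post_odds)


def lowest_pretest_crossing(lr: float, threshold: float = TREATMENT_THRESHOLD):
    """Smallest pre-test probability on the 0.1 grid whose post-test
    probability reaches the threshold, or None if none does."""
    for tenths in range(1, 10):
        pre = tenths / 10
        if post_test_probability(pre, lr) >= threshold:
            return pre
    return None


if __name__ == "__main__":
    grid = [t / 10 for t in range(1, 10)]
    for band, lr in BACTERIAL_LR.items():
        row = " ".join(f"{post_test_probability(p, lr):.2f}" for p in grid)
        print(f"{band:>9}: {row}")
```

With these inputs, the Very High band first reaches the threshold at a pre-test probability of 0.3 and the High band at 0.5, while the Low band stays below the threshold at a pre-test probability of 0.8.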

Finally, we explored whether the combination of high Severity and Bacterial scores would identify cases of bacterial sepsis. Because sepsis was not an adjudicated endpoint in the study, we used the definition of adjudicated bacterial infection and a change in SOFA score of ≥2 or requiring ICU-level care as a surrogate and examined the percent of patients with each combination of Bacterial and Severity score who met this definition for sepsis (Fig. 3c). We found that 68% of patients with a ‘Very High’ Bacterial score and a ‘Very High’ Severity score met this definition of sepsis. In addition, 33% of patients with a ‘High’ Bacterial score and a ‘Very High’ Severity score, or vice versa, met this definition of sepsis. In contrast, no patients with a ‘Very Low’ Bacterial score and a ‘Low’ or ‘Very Low’ Severity score had sepsis. Less than 3% of patients with a ‘Low’ Bacterial score and a ‘Low’ or ‘Very Low’ Severity score had sepsis.
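The surrogate definition used here can be written as a small predicate. This sketch assumes the definition parses as adjudicated bacterial infection AND (SOFA change of at least 2 OR ICU-level care); the text does not spell out the precedence, and the `Patient` fields and function names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Patient:
    bacterial_infection: bool  # clinically adjudicated bacterial infection
    sofa_change: int           # change in SOFA score
    icu_level_care: bool       # mechanical ventilation, vasopressors and/or RRT


def meets_surrogate_sepsis(p: Patient) -> bool:
    """Surrogate sepsis: bacterial infection plus organ dysfunction or ICU-level care.

    The grouping of 'and'/'or' is an assumed reading of the text.
    """
    return p.bacterial_infection and (p.sofa_change >= 2 or p.icu_level_care)


def fraction_with_sepsis(patients: list[Patient]) -> float:
    """Share of a group (for example, one Bacterial x Severity cell) meeting the surrogate."""
    if not patients:
        raise ValueError("empty group")
    return sum(meets_surrogate_sepsis(p) for p in patients) / len(patients)
```

Applied to the patients in each combination of Bacterial and Severity band, `fraction_with_sepsis` would yield cell values of the kind reported for Fig. 3c.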

In a logistic regression model for infection status that included vital signs, laboratory values and TriVerity scores (even though the adjudicators were blinded to all TriVerity results), the Bacterial score was the most significant variable associated with infection status, followed by WBC and procalcitonin (Supplementary Table 7).
