Dolutegravir restores gut microbiota in late-stage HIV-1 unlike darunavir: an open-label, randomized clinical trial

Categories: Disease & Virus

March 4, 2026

Study design

This substudy was nested within the ADVANZ-4 trial (EudraCT: 2014-002281-70, ClinicalTrials.gov: NCT02337322), a prospective, open-label, two-arm randomized clinical trial enrolling ART-naïve adults presenting with CD4+ T cell counts below 100 cells/mm³ at HIV diagnosis²⁷. Participants were randomized 1:1 to receive either dolutegravir (50 mg QD) or ritonavir-boosted darunavir (800/100 mg QD), each combined with a fixed-dose formulation of lamivudine (300 mg QD) and abacavir (150 mg QD). All participants received cotrimoxazole prophylaxis until achieving CD4+ counts >200 cells/mm³ for 3 consecutive months. Follow-up visits were conducted at baseline and weeks 24, 48, and 96. Fecal samples for microbiome analysis were collected at these time points. We employed a modified intention-to-treat 2 (mITT2) approach, restricting the microbiome analysis to participants providing fecal samples at baseline or at least two follow-up time points.

This study adheres to the STORMS reporting guidelines for human microbiome research, and the completed checklist is provided as Supplementary Table 1.

Endpoints and visit schedule

According to the Statistical Analysis Plan (SAP) of the ADVANZ-4 trial, the primary endpoint was the change in absolute CD4⁺ T-cell count from baseline to week 48. Secondary endpoints included: (i) plasma HIV-1 RNA levels and the proportion of participants achieving viral suppression (

Per protocol and SAP specifications, clinical assessments, blood sampling, and virological measurements were performed at baseline and weeks 4, 12, 24, 36, and 48, with extended follow-up to week 96 in this substudy. Immunological and inflammation biomarkers were assessed at baseline, week 24, and week 48, consistent with the trial protocol.

Participant exclusions

A total of 45 screened individuals were excluded prior to randomization. Most exclusions were due to medical decision (n = 31), primarily involving severe AIDS-defining conditions that compromised clinical stability (n = 11; including opportunistic infections requiring ICU admission [n = 5, of whom two died during admission], primary CNS lymphoma [n = 3], non-Hodgkin lymphoma [n = 2], and CNS toxoplasmosis [n = 1]). Additional medical reasons included severe potential drug–drug interactions (n = 8; mainly tuberculosis or psychiatric comorbidities), absence of social support (homelessness; n = 4), anticipated lack of adherence (n = 3), liver disease (n = 3), and psychiatric disorders (n = 2). Seven individuals did not meet the inclusion criteria (positive HLA-B*5701 test, n = 4; confirmatory CD4+ T cell count >100 cells/mm³, n = 3), and none had baseline antiretroviral resistance. Finally, seven individuals declined to participate.

Sample collection and processing

Stool samples were collected either at clinical sites or at home using the OMNIgene. GUT collection kit (DNA Genotek, Ottawa, Canada). The kit stabilizes microbial DNA at ambient temperatures, minimizing degradation during transport. Upon receipt at the central laboratory (IrsiCaixa, Badalona, Spain), samples were immediately frozen at −80 °C. DNA extraction was performed using the PowerSoil DNA Isolation Kit (MO BIO Laboratories, Carlsbad, CA, USA) following the Earth Microbiome Project standard protocols⁵⁶, ensuring reproducibility and minimizing technical biases. Mechanical lysis was achieved via bead-beating.

Sequencing libraries were prepared using the Nextera XT DNA Library Prep Kit (Illumina Inc., San Diego, CA, USA) targeting an average insert size of 300 bp. Paired-end sequencing (2 × 150 bp) was performed on the Illumina HiSeq 2500 platform. A sequencing depth target of 20 million reads per sample was established to ensure sufficient coverage for robust metagenomic profiling.

Flow cytometry

Peripheral blood mononuclear cells (PBMCs) were isolated by density-gradient centrifugation using Ficoll-Hypaque (Biomedics-Biomérieux, Madrid, Spain). Immunophenotyping included quantification of CD4⁺ and CD8⁺ T-cell subsets and characterization of naïve, effector, and memory populations based on CCR7 and CD45RA expression. Markers of immune activation (CD38, HLA-DR), senescence (CD57, CD28), and apoptosis (annexin V, CD95) were assessed in both CD4⁺ and CD8⁺ compartments.

Three-color flow cytometry was performed following previously validated protocols from the parent ADVANZ-4 trial. Data were acquired on a standard flow cytometer and analyzed using FlowJo software. For apoptosis assays, PBMCs were cultured for 18 h at 1 × 10⁶ cells/mL in 24-well plates in the presence or absence of 1% phytohemagglutinin, stained with annexin V-PE (BD PharMingen, San Jose, CA) in buffer containing 5 mM CaCl₂, followed by CD95 detection.

Measurement of inflammatory and immune activation markers by ELISA

Plasma concentrations of inflammatory and immune activation markers were quantified using commercially available enzyme-linked immunosorbent assays (ELISA), performed in accordance with the manufacturers’ protocols. All samples were analyzed under standardized experimental conditions, and measurements were conducted by trained personnel blinded to clinical group allocation.

Baseline plasma samples were available for all participants for all analytes included in the study, and no samples were missing at baseline. Samples were assayed in duplicate, and mean values were used for downstream analyses. Intra- and inter-assay coefficients of variation were within the ranges specified by the manufacturers.

For each analyte, concentration values below the assay-specific lower limit of detection (LLOD) were reported by the assay and retained in the dataset rather than treated as missing values. These values were included in both descriptive and inferential analyses. Apparent gaps in graphical representations therefore reflect measurements below the LLOD and not missing data. As a reference, the LLOD for IL-6 was

All raw concentration values for IL-6 and the remaining inflammatory and immune activation markers measured by ELISA are provided in Supplementary Table 2.

Microbiome data analysis

Raw reads were processed with Trimmomatic (v0.39)⁵⁷, removing Illumina adapters, trimming reads with a sliding window of 30 nucleotides, and discarding reads with average Phred scores below 20. Human reads were removed by mapping against the hg19 reference genome using Bowtie2 (v2.4.2)⁵⁸ with end-to-end alignment mode and a maximum mismatch penalty of 6. Only unmapped reads were retained.

Taxonomic profiling was performed using MetaPhlAn3 (v3.0)⁵⁹ with default parameters, based on clade-specific marker genes from the CHOCOPhlAn database (version 201901). Reads were additionally mapped to the Integrated Gene Catalog (IGC)⁶⁰ using Bowtie2 for gene-level quantification. Rarefaction curves were generated by subsampling reads to increasing coverage thresholds, and gene richness was defined at the minimum point where curves plateaued. Samples falling below the 2nd percentile of the maximum coverage distribution were excluded to mitigate depth-related biases.

Microbiome data processing

Post-taxonomic assignments, abundance matrices were imported into R as Phyloseq objects (phyloseq v1.42.0)⁶¹ for preprocessing. Samples were filtered for completeness of metadata and microbiome data. Taxonomic abundance, taxonomy tables, and sample metadata were integrated into a Tree Summarized Experiment (TreeSE) object (tidyTreeSummarizedExperiment v1.10.0), facilitating consistent downstream analysis.

Alpha diversity analysis

Alpha diversity was evaluated using the mia R package (v1.8.0). Metrics computed included Shannon diversity, Gini dominance, and observed richness (number of detected taxa). To account for uneven sequencing depth, a rarefaction strategy was implemented at the 2nd percentile of library sizes, repeated over 10 random iterations to ensure robustness.

Group and longitudinal differences were analyzed with linear mixed-effects models with the following formula:

$$ {\mathrm{diversity}}\,{\mathrm{metric}} \sim {\mathrm{treatment}}*{\mathrm{time}}\_{\mathrm{point}}+{\mathrm{gender}}\\ +{\mathrm{age}}+{\mathrm{risk}}\_{\mathrm{group}}+(1|{\mathrm{participant}}\_{\mathrm{id}})$$

This model included fixed effects for treatment, time point, and their interaction, and a random intercept for participant. Multiple testing was controlled using the Benjamini–Hochberg FDR, and model-based effect sizes with 95% confidence intervals are reported. Distributions were visualized with boxplots overlaid with jittered datapoints and estimated marginal means for interpretability.

To characterize within-arm consistency of longitudinal trajectories, we additionally estimated a subject-level slope of change from baseline (Δ per visit) for each diversity metric and compared the dispersion of slopes between treatment arms. Equality of variances was assessed using Brown–Forsythe and Fligner–Killeen tests, and variance ratios (dolutegravir/darunavir/ritonavir) with bootstrap confidence intervals (5000 resamples) were computed to quantify differences in intra-group variability. As a confirmatory analysis within the mixed-model framework, we fitted location–scale linear mixed models allowing arm-specific residual variances (varIdent structure in the nlme package) and compared these to homoscedastic models using likelihood ratio tests.

Gene richness was separately assessed by counting the number of genes mapped above a dynamic threshold determined per sample, ensuring comparability across the cohort. This dynamic threshold was derived based on the rarefaction analysis against the IGC, selecting for each sample the coverage depth at which the rarefaction curve plateaued, thus avoiding biases due to differences in sequencing depth.

A detailed overview of all statistical analyses, including the family of tests and the number of comparisons used for FDR control in each analysis, is provided in Supplementary Table 3.

Correlation analyses

Repeated-measures correlation effects (rmcorr), which account for within-participant dependency across longitudinal samples, were calculated to explore associations between microbiome alpha diversity indices (Shannon diversity, Gini dominance, observed richness, and gene richness) and a comprehensive panel of clinical parameters. The clinical variables included circulating immune cell counts (CD4+, CD8+, CD4 nadir), immune activation markers (expression of CD38 and HLA-DR on CD4+ and CD8+ T cells), systemic inflammatory markers (C-reactive protein [CRP], interleukin-6 [IL-6], tumor necrosis factor-alpha [TNF-α], and soluble CD14 [sCD14]), and body mass index (BMI).

To normalize the distribution of cytokine levels and mitigate the influence of extreme values, cytokine data were log2(x + 1) transformed prior to analysis. Repeated-measures correlations were computed both across the full cohort and within each treatment arm to capture treatment-specific host–microbiome associations.

Two-sided p values from correlation tests were adjusted using the Benjamini–Hochberg method to control for the FDR. Only associations with q r) and visualized in heatmaps, with color encoding the strength and direction of associations. Selected relationships were further illustrated with scatterplots showing participant-level trends and fitted regression lines.

Longitudinal evolution of clinical marker levels

Longitudinal changes in clinical markers were analyzed using linear mixed-effects models with the specification:

$$ {\mathrm{marker}} \sim {\mathrm{treatment}}*{\mathrm{time}}\_{\mathrm{point}}+{\mathrm{gender}}+{\mathrm{age}}\\ +{\mathrm{risk}}\_{\mathrm{group}}+(1|{\mathrm{participant}}\_{\mathrm{id}})$$

(1)

Here, time_point was modeled as a categorical factor to allow non-linear trajectories, and a random intercept for participant_id accounted for within-individual correlation. The treatment-by-time interaction enabled inference on both between-group differences at each time point and within-group longitudinal changes. Model coefficients (β) and 95% confidence intervals were estimated, and planned contrasts (pairwise time comparisons within treatment and between-treatment comparisons at each time point) were derived from the fitted models. Multiple testing was controlled using the Benjamini–Hochberg FDR. Visualization included boxplots of raw marker distributions by time and treatment with jittered points, forest plots summarizing model-based contrasts with FDR-adjusted p values, and longitudinal effect plots showing time on the x-axis and estimated effects (with 95% CIs) on the y-axis to highlight temporal trends.

Beta diversity analysis

Beta diversity was computed with Bray–Curtis dissimilarities from relative-abundance profiles. Community structure was visualized by NMDS (monoMDS in vegan). To test associations while accounting for repeated measurements, we fit PERMANOVA models with adonis2 using:

$$ {\mathrm{Distance}} \sim {\mathrm{treatment}}*{\mathrm{time}}\_{\mathrm{point}}+{\mathrm{gender}} \\ +{\mathrm{age}}+{\mathrm{aids}}\_{\mathrm{event}}+{\mathrm{cmv}}\_{\mathrm{negative}}+{\mathrm{risk}}\_{\mathrm{group}}+{\mathrm{HIV}}\_{\mathrm{VL}}$$

(2)

Constraining permutations within participants (strata = record_id). Only complete cases for model covariates were included.

To summarize within-group dispersion, we calculated each sample’s Euclidean distance in NMDS space to its leave-one-out centroid for the corresponding treatment × time_point group (“centroid distance”). Centroid distances were analyzed with linear mixed-effects models (lmerTest) to estimate (i) longitudinal changes within treatment arms versus baseline (week 0) and (ii) between-treatment differences at each time point, using:

$$ {\mathrm{centroid}}\_{\mathrm{dist}} \sim {\mathrm{treatment}}*{\mathrm{time}}\_{\mathrm{point}}+{\mathrm{gender}} \\ +{\mathrm{age}}+{\mathrm{risk}}\_{\mathrm{group}}+(1|{\mathrm{participant}}\_{\mathrm{id}})$$

(3)

Planned contrasts were obtained with emmeans; we report estimates, 95% CIs, and Benjamini–Hochberg FDR-adjusted p values. Results are shown as forest plots (between-treatment contrasts) and longitudinal trajectory plots (within-treatment changes vs baseline).

Associations between community variation and clinical/immunologic markers (CD4, CD8, activated T cell subsets, CRP, IL-6, TNF-α, sCD14, BMI) were assessed using repeated-measures correlation (rmcorr), reported as correlation coefficients (r) with FDR-adjusted p values, both globally and stratified by treatment. For compositional summaries, genus-level stacked barplots (top 25 genera; remainder “Other”) were ordered by NMDS1 and annotated with treatment, risk group, time point, and centroid distance. Figures were generated with ggplot2 and arranged with patchwork; PERMANOVA tables were rendered with gt. All analyses were performed in R.

Differential abundance analysis

We first benchmarked microbiome differential-abundance software using the dar R package to guide method selection⁶². Differentially abundant taxa were then identified with ANCOMBC2 (v2.1.2) (REF) on TreeSummarizedExperiment objects at Genus and Species resolution. The model specification included fixed effects for treatment, time, and covariates, and a subject-level random intercept to account for repeated measures:

$$ {\mathrm{Abundance}} \sim {\mathrm{treatment}}*{\mathrm{time}}\_{\mathrm{point}}+{\mathrm{gender}}+{\mathrm{age}} \\ +{\mathrm{risk}}\_{\mathrm{group}}+(1|{\mathrm{record}}\_{\mathrm{id}})$$

(4)

Pairwise treatment contrasts at each time point were obtained by setting group = “treatment” and pairwise = TRUE. Analyses applied a prevalence filter of ≥10% across samples and used Benjamini–Hochberg FDR control. We considered taxa significant if the FDR-adjusted q value

To relate differential taxa to host immunologic and clinical markers (CD4+ and CD8+ T cell counts, activated T cell subsets, CRP, IL-6, TNF-α, sCD14, BMI), we computed repeated-measures correlations (rmcorr) between CLR-transformed taxon abundances and each marker. Cytokines were log2-transformed with a +1 offset prior to analysis. Repeated-measures correlation coefficients (r) and FDR-adjusted p values were summarized overall and by treatment arm and displayed as dotmaps and per-marker rmcorr plots.

Significance criteria were: |log2 fold-change| ≥ 0.5, FDR-adjusted p value 20%. Differential features were visualized using volcano plots and annotated barplots. Significant taxa were additionally correlated with clinical parameters using Spearman’s method.

Functional analysis

Functional profiling focused on MetaCyc pathways derived from HUMAnN3 outputs⁵⁹. We retained non–taxa-stratified pathways and excluded “UNMAPPED/UNINTEGRATED” entries. Pathway abundance tables were aligned to the study metadata and assembled into TreeSummarizedExperiment objects. Differential pathway abundance was tested with ANCOMBC2 using the model:

$$ {\mathrm{Abundance}} \sim {\mathrm{treatment}}*{\mathrm{time}}\_{\mathrm{point}}+{\mathrm{gender}}+{\mathrm{age}} \\ +{\mathrm{risk}}\_{\mathrm{group}}+(1|{\mathrm{record}}\_{\mathrm{id}})$$

(5)

With a subject-level random intercept (record_id) to account for repeated measures, group = “treatment”, Benjamini–Hochberg FDR control, and a prevalence filter of ≥10%. We considered effects significant when the FDR-adjusted q

To relate functional variation to host markers (CD4+ and CD8+ T cell counts, activated T cell subsets, CRP, IL-6, TNF-α, sCD14, BMI), we computed repeated-measures correlations (rmcorr) between CLR-transformed pathway abundances and each marker; cytokines were log2-transformed with a +1 offset. Correlations were summarized globally and by treatment arm, and multiple testing was controlled with FDR.

External cohort integration and comparative dissimilarity analysis (MetaHIV)

To contextualize ADVANZ-4 microbiome profiles against external HIV phenotypes, we integrated samples from the MetaHIV study and reprocessed both datasets with a harmonized pipeline. Short-read profiles were generated with nf-core/taxprofiler using MetaPhlAn4 to obtain species-level relative abundances. Sample metadata were standardized (common IDs, time points, and HIV phenotype labels) and merged. A TreeSummarizedExperiment object was constructed at the species rank, keeping non-missing samples with complete metadata.

Beta diversity (Bray–Curtis) was computed from species-level relative abundances. For each sample iii, we quantified its dissimilarity to HIV-negative microbiomes as the mean Bray–Curtis distance to all HIV-negative samples:

$${di}=\frac{1}{\left|N\right|}{\sum }_{j\in N}{{\rm{BC}}}\left(i,j\right)$$

(6)

where N indexes HIV-negative samples. Group differences in diversity were tested with a linear mixed-effects model including a subject-level random intercept:

$$di\sim {\mathrm{Profile}}+\left(1,|,{\mathrm{patient}}\,{\mathrm{id}}\right)$$

(7)

followed by estimated marginal means and treatment-vs-control contrasts with HIV-negative as reference (FDR control via Benjamini–Hochberg). Results are presented as effect estimates with 95% CIs in forest plots.

For ordination, we performed NMDS (Bray–Curtis; monoMDS in vegan) on the integrated species matrix to visualize separation across HIV phenotypes and ADVANZ-4 time points. As a complementary summary, we computed each sample’s Euclidean distance in NMDS space to the HIV-negative centroid:

$${{\mathrm{dist}}}_{i}^{\left({\mathrm{neg}}\right)}=\sqrt{{\left({x}_{i}-{\bar{x}}_{{\mathrm{neg}}}\right)}^{2}+{\left({y}_{i}-{\bar{y}}_{{\mathrm{neg}}}\right)}^{2}}$$

(8)

and assessed differences versus HIV-negative with the same mixed-model framework and FDR-adjusted inference.

Network analysis

To explore genus-level co-occurrence patterns under dolutegravir (dolutegravir) versus darunavir/ritonavir (darunavir/ritonavir) therapy, we began with the pre-processed SingleCellExperiment object containing CLR- and log-transformed assays. Samples were subset by time point (0, 24, 48, and 96 weeks) and, within each subset, counts were agglomerated at the genus rank. Relative-abundance profiles were generated, and taxa present in fewer than 15% of samples were removed. The remaining genera were then transformed by centered log-ratio (pseudocount = 1) to stabilize variance. For each time point, the filtered genus-level data were split by treatment group, yielding separate dolutegravir and darunavir/ritonavir matrices for network inference.

Networks were constructed with the SPRING method in NetCoMi’s netConstruct function (NetCoMi v1.1.0)^63,64, restricting to the 200 most prevalent genera and exploring 20 values of the regularization parameter (nlambda = 20, rep.num = 10, thresh = 0.05, Rmethod = “approx”) on signed dissimilarities. The resulting microNet objects were characterized using netAnalyze, applying Louvain community detection, identifying hub taxa via eigenvector centrality, reporting unnormalized degree centrality on the LCC, and computing graphlet orbit counts together with the Graphlet Correlation Matrix (GCM) for that component. To highlight treatment-specific shifts in network micro-architecture, we calculated the difference between the dolutegravir and darunavir/ritonavir GCMs (ΔGCM = GCM_dolutegravir – GCM_darunavir/ritonavir) and visualized this differential matrix as a single heatmap, with statistical significance of Δρ values assessed by FDR-adjusted Spearman tests.

Software and reproducibility

All analyses were conducted in R (v4.5.0) using packages from the Bioconductor (v3.21) and CRAN repositories. Scripted workflows were version-controlled with Git, and data processing pipelines were documented using Quarto notebooks, ensuring transparency. Parallel computing was leveraged via the BiocParallel or future R packages. R Session Info and packages are provided in each Quarto notebook to facilitate reproducibility.

Ethics approval and consent to participate

This study was approved by the institutional ethical review board of the participating institutions (Reference HCB/2014/0905) and by the Spanish Regulatory Authorities (EudraCT 2014-002281-70). All participants provided written informed consent in accordance with the principles expressed in the Declaration of Helsinki and local personal data protection law (LOPD 15/1999).