Identification of novel therapeutic targets in hepatitis-B virus-associated membranous nephropathy using scRNA-seq and machine learning

Categories: Disease & Virus

May 30, 2025

Cell culture and transfection

Human podocytes were acquired from BeNa Culture Collection (Henan, China) and cultured in RPMI 1640 medium (Pronoza, Cat.No PM150110) supplemented with 10% fetal bovine serum (GIBCO, Cat.No 10099141) and 1% penicillin/streptomycin solution (Meilune, Cat.No MAO110), in a CO2 incubator (PHCBI, MCO-18 AC) at 37 °C with 5% CO2. The plasmid encoding the HBx gene (pHBx) and the corresponding negative control plasmid were constructed and purified by Biogenetech (Anhui, China), ensuring high purity and concentration suitable for transfection. Podocytes were seeded onto 6-well plates at a density of 5 × 10⁴ cells/cm² to allow for optimal growth and confluency. Cells were grown until they reached approximately 70% confluence, which is the optimal condition for transfection. Transfection was carried out using Lipofectamine 3000 (Invitrogen, USA) according to the manufacturer’s instructions. The experiment included four groups: control group: cells with no treatment; empty plasmid group: cells transfected with empty plasmid; HBx group: cells transfected with the pHBx plasmid; HBx + miR-223-3p mimic group: cells co-transfected with pHBx and miR-223-3p mimic to study the interactive effects on cell behavior and gene regulation. Transfection efficiency was assessed using quantitative polymerase chain reaction (qPCR) to detect the expression levels of specific genes associated with the plasmid vectors and miR-223-3p mimic.

ROS level determination

Podocytes were washed with PBS and incubated with 10 µM DCFDA from the ROS ELISA Kit (Shanghai Kexing, Cat.No F9689-B) for 30 min at 37 °C in darkness. Post-incubation, cells were washed to remove excess DCFDA, lysed, and centrifuged at 12,000×g for 15 min at 4 °C (Neofuge 13R, Heal Force). 100 µl of supernatants were added to a 96-well plate and processed as per the kit’s instructions. Fluorescence was measured at 450 nm using a Multiskan Sky microplate reader (Thermo Fisher). The optical density(OD) values obtained were normalized to the control group to assess ROS levels.

LDH level determination

Podocytes underwent a PBS wash and subsequent lysis. Following this, cell lysates were subjected to centrifugation at 12,000×g for 15 min at a temperature of 4 °C using a Neofuge 13R centrifuge (Heal Force). The supernatants were then carefully collected, with 100 µl transferred into each well of a 96-well plate, prepared in accordance with the protocol provided with the LDH ELISA Kit (Shanghai Kexing, Cat.No F0350-B). Absorbance levels were then assessed at 450 nm using the Multiskan Sky microplate reader (Thermo Fisher). To gauge cellular damage, the OD measurements were standardized against the control group, focusing on the LDH levels indicated.

Cell viability assessment using MTT Assay

Podocytes were trypsinized using 0.25% trypsin (Biosharp, Cat.No PGY0021) and were then seeded in a 96-well plate (JET Biofil, Cat.No TCP011096) at a density of 5,000 cells per well and allowed to adhere overnight. After treatment according to the experimental groups, 20 µl of MTT solution (Sangon Biotech, Cat.No CB0331) was added to each well and incubated for 4 h at 37 °C. Post-incubation, the medium was removed, and 150 µl of dimethyl sulfoxide (DMSO) was added to dissolve the formazan crystals. The plate was gently shaken for 10 min to ensure complete dissolution. The absorbance was measured at 490 nm using a Multiskan Sky microplate reader (Thermo Fisher). OD values obtained were used to calculate the cell proliferation rate across different treatment groups. Results were expressed as OD490/fold, which represents the proliferation fold increase from 24 to 96 h relative to the 24-h measurement for each group.

Quantitative real-time PCR (qPCR)

48 h post-transfection, cells were lysed using a high-speed low-temperature tissue grinder (Servicebio, KZ-III-F). RNA was extracted using TRIzon Reagent (Kangwei Century, CW0580S) and purified using UltraPure RNA Kit (Kangwei Century, CW0581M), followed by centrifugation in a tabletop high-speed refrigerated centrifuge (Hettichi, 0004219-11). Reverse transcription was performed with Evo M-MLV RT Premix (Acrobiosystems, AG11706) inside a clean bench (Suzhou Purification, SW-CJ-1 FD). The qPCR was conducted on a CFX Connect Real-Time PCR system (Bio-Rad, CFX connect), using SYBR Green Pro Taq HS qPCR Kit (Acrobiosystems, AG11701). The reaction consisted of 12.5 µL SYBR Green mix, 1 µL each of forward and reverse primers (Beijing Dingguo Biotechnology), and cDNA in a total volume of 25 µL. Cycling conditions were 95 °C for 2 min, followed by 40 cycles at 95 °C for 3 s, 60 °C for 30 s, and 70 °C for 10 s. A melting curve was analyzed post-amplification. The internal reference for miRNA-223-3p was U6, and for other mRNAs it was GAPDH. The primer sequences are shown in Table 1. Gene expression was quantified using the 2^-ΔΔCT method and analyzed on the mµLtiskan Sky microplate reader (Thermo Fisher). Primer sequence information for the PCR experiments is provided in Table 1. Each experiment was performed in triplicate.

Table 1 Primer sequences used for quantitative real-time PCR (qPCR).

Western blot

Cells were lysed with enhanced RIPA buffer (Biosharp, AR0102) containing 100 mM PMSF (Biosharp, AR1192). Protein concentrations were determined using a BCA Kit (Beyotime, P0012). Lysates were mixed with 5 × loading buffer (Biosharp, AR1112-10) and heated for denaturation. Proteins were separated by 12.5% SDS-PAGE (Yeasen, PG113) and transferred to PVDF membranes, 0.45 µm for high (Millipore, IPVH00010) and 0.22 µm for low molecular weight proteins (Millipore, ISEQ00010). Membranes were blocked with 5% milk in TBS with 0.1% Tween 20 (Solarbio, T8220) and incubated overnight at 4 °C with primary antibodies against CRIM1 (Thermo Fisher, PA5-34410, 1:1000) and GAPDH (Proteintech, 60004–1-Ig, 1:10000). After washing, they were incubated with HRP-conjugated secondary antibodies (Zhongshan Golden Bridge, ZB-2305 for mouse, ZB-2301 for rabbit, 1:10000). Chemiluminescent detection was performed using an ECL kit (Servicebio, G2014). Band intensity was quantified using ImageJ, with GAPDH as the loading control.

Dual-luciferase reporter (DLR) assay

293 T cells were revived and cultured under standard conditions. For co-transfection, 293 T cells were seeded in 96-well plates at 2 × 10^4 cells/well in DMEM with 10% FBS. The next day, cells were co-transfected with plasmids containing the CRIM1 3’UTR and miR-223-3p using Lipofectamine 2000. After 48 h, luciferase activity was measured using the Dual-Luciferase Reporter Assay System (Promega). Luminescence was recorded by first adding LAR II reagent, followed by Stop & Glo® Reagent, and reading the signal on a luminometer.

Data collection

The total of 2 single-cell transcriptomics datasets, including 1 HBV-MN samples and 2 health sample, were collected from public dataset Gene Expression Omnibus (GEO). The GSE199850 dataset contained one sample from human kidney from patient with HBV-associated MN⁹. The GSE171458 dataset analyzed kidney samples from 6 patients with IMN and 2 healthy control subjects using single-cell RNA sequencing, healthy control samples were selected by our study¹⁰. All the datasets were processed in R (V.4.2.3), and the results were showed using ggplot2 R package (V.3.4.3) except where mentioned.

Data quality control and preprocessing

Quality control was rigorously applied following the generation of the expression matrix using Seurat V4.4.0¹¹..For downstream analysis, we selected cells with unique molecular identifiers (UMIs) ranging from 0 to 30,000. Cells with fewer than 200 or more than 5,000 unique expressed genes were excluded to ensure high data quality. Doublet cells were identified and removed using DoubletFinder, enhancing the accuracy of differential gene expression analysis. Additionally, cells were excluded if over 50% of their transcriptomes were comprised of mitochondrial genes, which is indicative of cellular stress or damage. After these stringent quality control measures and further corrections for batch effects, a total of 12,521 cells were analyzed, including 8,982 from healthy controls and 3,539 from HBV-MN patients. The data underwent natural log transformation and normalization to facilitate robust downstream analyses.

Cell type classification and marker genes analysis

The Seurat package was utilized to analyze RNA-Sequencing data, facilitating cell type identification and clustering analysis. We employed the shared nearest neighbor (SNN) model for clustering and visualized cellular distribution through dimension reduction techniques including PCA, tSNE, and UMAP. Differences among clusters were assessed using the Wilcoxon rank-sum test, and marker genes for each cluster were identified through the FindAllMarkers function, which implements a likelihood-ratio test to refine the differential gene list. Further, sub-clustering analysis of podocytes was conducted using Seurat’s SubsetData function. This comprehensive approach enabled a nuanced exploration of the cellular heterogeneity within the dataset.

Differentially expressed genes analysis

To identify potential differentially expressed genes (DEGs), we employed the Seurat FindMarkers function, which utilizes the Wilcoxon likelihood-ratio test with default parameters. Criteria for selecting DEGs included expression in over 10% of cells within a cluster, an average log-fold change greater than 0.25, and a p-value less than 0.05. Additionally, we characterized the DEGs by comparing the transcriptional profile of HBV-MN with that of healthy donor subjects, enabling a focused investigation into the molecular distinctions driving pathological changes.

Cell type annotation

Cell types were initially annotated automatically using SingleR(V.2.0.0), followed by manual verification based on canonical marker genes¹². The top DEGs in each cluster were determined by the expression patterns of these canonical markers to accurately define cell types. Seurat was utilized to visualize the expression of these markers through heatmaps and dot plots. Additionally, cells identified as doublets, expressing markers for multiple cell types, were manually excluded from the analysis to ensure the accuracy of the cell type classification.

High dimensional WGCNA (hdWGCNA)

In this study, we applied the hdWGCNA R package for weighted gene co-expression network analysis (WGCNA) of scRNA-seq data, focusing on podocytes identified by Seurat¹³. We filtered genes expressed in fewer than 5% of cells and normalized data using the ‘sctransform’ function. Pearson correlations were calculated to assess gene interconnectivity. Using the ‘pickSoftThreshold’ function, we established a scale-free network, generating an adjacency and topological overlap matrix (TOM). Co-expression modules were delineated using the ‘dynamicTreeCut’ algorithm, assigning each a unique color. Hub genes within modules were evaluated for connectivity using eigengene-based connectivity (kME) values from the ‘ModuleConnectivity’ function. High kME values identified central genes to each module. Module eigengenes (MEs) were computed with the ‘GetMEs’ function, summarizing gene expression profiles. The top 25 hub genes, identified by their kME values through the ‘GetHubGenes’ function, were selected for further analysis, including functional enrichment and clinical correlations¹⁴. This streamlined approach with hdWGCNA version 0.2.14 ensured accurate and reproducible findings.

Trajectory analysis

To unravel the dynamic gene expression changes across single-cell pseudotime trajectories, we employed the Monocle 2 (version 2.26.0) specifically for podocyte subtypes¹⁵. Monocle’s single-cell trajectory analysis technique uses an algorithm to map the sequence of gene expression changes each cell undergoes during a biological process, placing each cell at its respective position along the trajectory. This method allows for a comprehensive visualization of cellular transitions in gene expression. Criteria for selecting genes to order cells included a presence in at least 10 cells, a mean expression value of at least 0.05, and a dispersion empirical value of at least 2. Cells were subsequently ordered along this trajectory, and their progression was visualized in a reduced dimensional space to illustrate the developmental continuum. Significant changes in gene expression along the pseudotime trajectory were identified using Monocle’s differential Gene Test function, applying a stringent q-value cutoff of less than 0.01. These changes were depicted through branched-heatmap and scatter plot images, providing a detailed representation of gene expression dynamics that define cell state transitions within podocytes.

Cell–cell interaction analysis

After cell type identification, we explored cell–cell communication using CellChat (version 1.6.1)¹⁶. Initially, a new CellChat object was established based on the merged Seurat object. We utilized the paracrine/autocrine signaling interaction dataset from CellChatDB as the reference database for analysis. Communication probabilities were calculated using the ‘computeCommunProb’ function with the”truncatedMean”type, setting a trimming parameter of 20% (trim = 0.2). This step was crucial for accurately estimating the likelihood of communication based on a robust statistical approach. Subsequently, cell–cell communication was inferred, and the network was aggregated using the default settings within CellChat. This process synthesized the individual communication events into a comprehensive network, enhancing the interpretability of the data. The aggregation and analysis culminated in the visualization of the number of interactions, which illustrated the extensive cell–cell communication network.

Prediction of target genes

To predict the downstream target genes of miR-223-3p, we utilized three established databases: miRDB, miRnet, and TargetScan^17,18,19. These databases are instrumental for identifying potential regulatory relationships based on existing biological data. To enhance the reliability of our predictions, we applied the VennDiagram package(V.1.7.3) in R to determine the intersections among the gene sets predicted by each database. This intersection approach helped refine our results, ensuring that only the most likely target genes, consistently identified across multiple predictive platforms, were considered in our analysis.

Machine learning

To prioritize the most significant genes, we employed three machine learning techniques and divided the data into a testing set (70%) and a training set (30%). To safeguard against overfitting, each model was subjected to fivefold cross-validation. First, we utilized the RandomForest(RF) algorithm(V.4.7.1.1), employing the randomForest package²⁰. RF is an ensemble learning method that constructs a multitude of decision trees, each trained on a random subset of features. Importantly, RF calculates the average contribution of each feature, which allows us to rank genes based on their importance in the model—making it particularly suitable for our high-dimensional data. By assessing the contribution values, we ranked the genes based on their importance in the model. Next, we used the xgboost package (V.1.7.5.1) for implementing extreme gradient boosting (XGBoost). XGBoost is a powerful integrated learning algorithm known for its scalability and efficiency in handling diverse data types, enhancing both model visualization and optimization²¹. It uses a gradient boosting framework to iteratively refine the model by minimizing the prediction error, thereby capturing intricate interactions among features. This iterative optimization not only enhances model performance but also provides a clear visualization of feature importance, complementing the insights gained from RF. Lastly, we applied the k-nearest neighbors (KNN) algorithm using the kknn package (V.1.3.1). KNN is a supervised learning algorithm that classifies data points based on the proximity of similar data in the training set, utilizing a majority voting rule among the k-nearest neighbors to determine the classification²². Although KNN is less complex than the ensemble methods, it serves as an independent validation tool, ensuring that the gene ranking and classification achieved by RF and XGBoost are robust and consistent. To evaluate the diagnostic performance of each model, we analyzed the receiver operating characteristic (ROC) and calculated the area under the curve (AUC).

Statistical analysis

All data visualization and statistical analyses were performed using R software. We assessed differences between two groups for continuous variables using the independent t-test and for categorical variables using the Chi-squared (X^2) test. A two-sided p-value of less than 0.05 was deemed to indicate statistical significance. This approach ensured a rigorous evaluation of the data, facilitating robust conclusions from our study.

Declaration of generative AI and AI-assisted technologies in the writing process

During the preparation of this work, the authors used ChatGPT to assist in drafting and refining the manuscript content. After using this tool, the authors reviewed and edited the content as needed and takes full responsibility for the content of the publication.

Source link