Monitoring viral evolution and epidemiological characteristics of SARS-CoV-2 during 2022–2023 using Integrated Genomic Surveillance

May 27, 2026

At the RKI, we developed the IMSSC2 laboratory network, a geographically balanced nationwide laboratory network comprising 24 primary diagnostic laboratories from 14 of 16 federal states of Germany. Laboratories contributing to the SARS-CoV-2 genomic surveillance submit five randomly selected SARS-CoV-2-positive samples per week throughout the entire observation period. This number of samples corresponds to the minimum requirements for detecting SARS-CoV-2 lineages with a prevalence of 3% in the pool of all circulating variants, given an incidence of 50 cases per 100,000 people. SARS-CoV-2-positive patient materials are sent to the RKI for genome sequencing and phylogenetic analyses. Prototypic viruses are isolated from swab material via passaging in human Caco-2 cells and phenotypically assessed through replication analyses on human respiratory epithelial cell culture systems differentiated at the air-liquid-interface (ALI).
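The sampling target can be sanity-checked with a simple binomial model. The sketch below assumes that samples are independent random draws and that all 24 laboratories submit five samples in a given week (120 samples in total). The source does not state which formula underlies the minimum requirement, so this is an illustration under those assumptions rather than the authors' calculation.

```python
import math

def detection_probability(prevalence: float, n_samples: int) -> float:
    """Probability that at least one of n independent random samples
    belongs to a lineage circulating at the given prevalence."""
    return 1.0 - (1.0 - prevalence) ** n_samples

def samples_needed(prevalence: float, confidence: float) -> int:
    """Smallest n for which the detection probability reaches `confidence`."""
    return math.ceil(math.log(1.0 - confidence) / math.log(1.0 - prevalence))

# Assumption: 24 laboratories x 5 samples per week = 120 samples
weekly_samples = 24 * 5
p_detect = detection_probability(0.03, weekly_samples)
n_95 = samples_needed(0.03, 0.95)
```

Under these assumptions, 120 weekly samples give a detection probability of roughly 97% for a lineage at 3% prevalence, and 99 samples would be enough for 95%.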

Sample Selection and RNA extraction

SARS-CoV-2-positive patient samples were selected as described previously¹². In brief, IMSSC2 network laboratories routinely test samples from outpatient health centers and hospitals. For representative sampling, the laboratories randomly select SARS-CoV-2-positive samples each week that meet the following criteria: (i) different zip codes, to capture geographically diverse cases and minimize the likelihood of samples being collected from the same cluster, and (ii) Ct values below 23, which are associated with more accurate whole genome sequencing (WGS) results. To ensure complete traceability of the samples, the RKI delivers the sample labels and shipping material to the network laboratories before shipment.
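The two selection criteria can be sketched as a small filter followed by a random draw. This is a minimal illustration, not the laboratories' actual procedure: the record fields (`id`, `zip`, `ct`), the function name, and the example values are hypothetical.

```python
import random

def select_weekly_samples(candidates, n=5, ct_max=23.0, seed=None):
    """Randomly pick up to n samples with Ct below ct_max,
    taking at most one sample per zip code."""
    rng = random.Random(seed)
    eligible = [c for c in candidates if c["ct"] < ct_max]
    rng.shuffle(eligible)  # random order, so the first hit per zip code is random
    chosen, seen_zip = [], set()
    for sample in eligible:
        if sample["zip"] in seen_zip:
            continue  # criterion (i): different zip codes
        chosen.append(sample)
        seen_zip.add(sample["zip"])
        if len(chosen) == n:
            break
    return chosen

# Hypothetical positive samples from one laboratory in one week
candidates = [
    {"id": "S1", "zip": "10115", "ct": 18.2},
    {"id": "S2", "zip": "10115", "ct": 20.1},
    {"id": "S3", "zip": "20095", "ct": 25.4},  # fails criterion (ii)
    {"id": "S4", "zip": "50667", "ct": 21.0},
    {"id": "S5", "zip": "80331", "ct": 19.5},
    {"id": "S6", "zip": "70173", "ct": 22.9},
    {"id": "S7", "zip": "04109", "ct": 17.3},
    {"id": "S8", "zip": "01067", "ct": 16.8},
]
selected = select_weekly_samples(candidates, n=5, seed=1)
```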

For the extraction of total RNA from upper respiratory tract (URT) specimens (nasal, nasopharyngeal, or oropharyngeal swabs) at RKI, the Magna Pure 96 DNA and Viral RNA Small Volume kit (Roche Life Science, Mannheim, Germany) and the Magna Pure instrument (Roche Life Science, Mannheim, Germany) were used according to the manufacturer’s instructions. For approximately 10% of the sequenced samples, RNA had already been extracted on-site at the network laboratories, and the extracted RNA was sent to RKI.

Sequencing and genome reconstruction

Nanopore libraries for SARS-CoV-2 sequencing were prepared using the NEBNext® ARTIC SARS-CoV-2 Companion kit (New England Biolabs, Frankfurt am Main, Germany) according to the manufacturer’s protocol, employing Mosquito HV Genomics and Dragonfly (SPT Labtech, Melbourne, UK) liquid handling instruments. We used the ARTIC V4 to V5.3.2 primer sets for amplicon generation (dates, sets, and primer sequences are provided in Zenodo¹⁵). Samples were barcoded using the Native Barcoding Expansion kit (EXP-NBD196) and the Ligation Sequencing kit (SQK-LSK109) from Oxford Nanopore Technologies. The prepared libraries, consisting of 24 to 96 samples, were pooled and loaded onto an R9.4.1 flow cell (from 20.01.2023 onward, an R10.4.1 flow cell) and sequenced for 12 to 24 h, depending on the number of samples per run, resulting in an average of 116k reads per sample that could be attributed to SARS-CoV-2. Consensus genomes were then promptly reconstructed from basecalled FASTQ files with the most current poreCov version available at the respective time¹⁶. In brief, read quality is summarized by NanoPlot¹⁷, and taxonomic classification is performed with Kraken2¹⁸ and krona¹⁹. The raw FASTQ files are filtered by length, and genomes are reconstructed by the ARTIC pipeline (https://github.com/artic-network/fieldbioinformatics) using medaka as variant caller (https://github.com/nanoporetech/medaka). Thereafter, the lineage of the reconstructed genomes is assigned with pangolin (https://github.com/cov-lineages/pangolin), mutations are analyzed with Nextclade²⁰, and the genome quality is assessed by PRESIDENT.
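The length-filtering step can be illustrated with a few lines of plain Python. This is only a sketch of the idea, not the ARTIC pipeline's implementation: the parser assumes well-formed four-line FASTQ records, and the length bounds in the example are hypothetical placeholders (the source does not state the bounds used).

```python
from typing import Iterable, Iterator, Tuple

Record = Tuple[str, str, str]  # (header, sequence, quality)

def parse_fastq(lines: Iterable[str]) -> Iterator[Record]:
    """Yield (header, sequence, quality) from four-line FASTQ records."""
    it = iter(lines)
    for header in it:
        if not header.strip():
            continue  # tolerate blank lines between records
        seq = next(it).strip()
        next(it)  # '+' separator line
        qual = next(it).strip()
        yield header.strip(), seq, qual

def filter_by_length(records: Iterable[Record], min_len: int, max_len: int) -> Iterator[Record]:
    """Keep only reads whose length lies within [min_len, max_len]."""
    for header, seq, qual in records:
        if min_len <= len(seq) <= max_len:
            yield header, seq, qual

# Toy reads; real amplicon reads are several hundred bases long
raw = """@read1
ACGTACGTAC
+
IIIIIIIIII
@read2
ACG
+
III
@read3
ACGTACGTACGTACGT
+
IIIIIIIIIIIIIIII
""".splitlines()

kept = list(filter_by_length(parse_fastq(raw), min_len=5, max_len=12))
```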

Curation and quality control of genomic sequences

IMSSC2 laboratory network dataset

To prepare the IMSSC2 dataset, all samples collected in the IMSSC2 laboratory network between December 1st, 2021, and April 30th, 2023, were selected. The final dataset contained a total of 4595 randomly sampled sequences that met the quality control (QC) criteria. We re-ran poreCov v1.9.2 in FASTA mode with ‘--n_threshold 0.2’ on the final dataset (n = 4595 sequences) to obtain uniform QC output and lineage assignments. For sequence QC, poreCov uses PRESIDENT (v0.6.8) (https://github.com/rki-mf1/president) to compare the reconstructed sequences to the Wuhan reference sequence (NC_045512.2) using a sequence similarity threshold of 90% and allowing up to 20% N bases (‘--n_threshold 0.2’). The N content across the IMSSC2 dataset has a median of 1.47% (0.42% min., 1.75% mean, 9.71% max.). The overall IMSSC2 dataset sequence identity to the Wuhan reference has a median of 98.01% (89.90% min., 97.83% mean, 99.42% max.). PANGO lineages were assigned using pangolin version 4.3 with pangolin-data version 1.23.1 (https://github.com/cov-lineages/pangolin)²¹.
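The two QC thresholds can be expressed as a simple check. The sketch below is not PRESIDENT's implementation: it takes the identity to the reference as a precomputed input (PRESIDENT derives it from a pairwise alignment), and the function names are hypothetical.

```python
def n_fraction(sequence: str) -> float:
    """Fraction of ambiguous N bases in a consensus sequence."""
    seq = sequence.upper()
    return seq.count("N") / len(seq) if seq else 1.0

def passes_qc(sequence: str, identity: float,
              min_identity: float = 0.90, max_n: float = 0.20) -> bool:
    """Apply the thresholds named in the text: at least 90% identity
    to the reference and at most 20% N bases."""
    return identity >= min_identity and n_fraction(sequence) <= max_n

# Toy sequences of length 100 (real genomes are about 30 kb)
good = "ACGT" * 24 + "N" * 4      # 4% N
too_many_n = "ACGT" * 19 + "N" * 24  # 24% N
```

With these toy inputs, `passes_qc(good, 0.98)` is accepted, while `passes_qc(too_many_n, 0.98)` and `passes_qc(good, 0.85)` are rejected.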

DESH dataset

The DESH dataset contains the nationwide collection of all samples sequenced in Germany during the pandemic’s acute phase and in compliance with the CorSurV. Reconstructed full-genome consensus sequences were submitted to the central collection platform DESH. The RKI published the DESH sequences on GitHub and Zenodo⁹. Duplicates of full-genome sequences were rejected during submission based on the transaction ID, submitting lab, sampling date, FASTA sequence, and header. Only QC-filtered, randomly sampled sequences with a sampling date between December 1st, 2021, and April 30th, 2023, were extracted. The resulting dataset contained n = 511,533 sequences. Lineages were assigned as described above.
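Duplicate rejection on a composite key can be sketched as follows. The source lists the fields that were compared but not how they were combined, so this example assumes that a submission counts as a duplicate only when all five fields match. The field names and records are hypothetical.

```python
import hashlib

def submission_key(sub: dict) -> tuple:
    """Composite key built from the five fields named in the text.
    The sequence is hashed so that the key stays small."""
    digest = hashlib.sha256(sub["sequence"].upper().encode()).hexdigest()
    return (sub["transaction_id"], sub["lab"], sub["sampling_date"],
            sub["header"], digest)

def reject_duplicates(submissions):
    """Split submissions into accepted (first occurrence) and rejected (repeat)."""
    seen, accepted, rejected = set(), [], []
    for sub in submissions:
        key = submission_key(sub)
        if key in seen:
            rejected.append(sub)
        else:
            seen.add(key)
            accepted.append(sub)
    return accepted, rejected

# Hypothetical submissions; the second record repeats the first
subs = [
    {"transaction_id": "T1", "lab": "LAB-A", "sampling_date": "2022-03-01",
     "header": ">s1", "sequence": "ACGT"},
    {"transaction_id": "T1", "lab": "LAB-A", "sampling_date": "2022-03-01",
     "header": ">s1", "sequence": "ACGT"},
    {"transaction_id": "T2", "lab": "LAB-B", "sampling_date": "2022-03-02",
     "header": ">s2", "sequence": "ACGT"},
]
accepted, rejected = reject_duplicates(subs)
```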

Phylogenetic tree inference

Genomes that belong to the randomly sampled IMSSC2 dataset (n = 4595) were aligned with MAFFT v7.490 using default parameters²². Phylogenetic inference was performed with IQ-TREE v2.2.0.3²³ under the GTR + F + R2 evolutionary model, using 1000 ultra-fast bootstrap replicates²⁴, and the resulting tree was visualized and colored in Iroki²⁵. Phylogenetic analysis revealed five long branch attractions, which were subsequently removed with treeshrink v1.3.9 using default parameters²⁶.
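The alignment and tree inference steps correspond to two command-line calls. The sketch below only assembles the argument lists and does not run the tools; the file names are hypothetical, and the exact invocation used by the authors is not given in the source. It assumes the IQ-TREE 2 executable is named `iqtree2` and uses its `-s` (alignment), `-m` (model), and `-B` (ultrafast bootstrap replicates) options; MAFFT writes the alignment to standard output.

```python
from pathlib import Path

def build_commands(fasta: str, workdir: str = "phylo"):
    """Return the MAFFT command, the alignment path, and the IQ-TREE 2 command."""
    alignment = Path(workdir) / "aligned.fasta"
    # MAFFT with default parameters; redirect stdout to `alignment` when running
    mafft_cmd = ["mafft", fasta]
    # GTR+F+R2 model with 1000 ultrafast bootstrap replicates
    iqtree_cmd = ["iqtree2", "-s", str(alignment), "-m", "GTR+F+R2", "-B", "1000"]
    return mafft_cmd, alignment, iqtree_cmd

mafft_cmd, alignment, iqtree_cmd = build_commands("imssc2_genomes.fasta")

# To execute (requires both tools to be installed):
#   import subprocess
#   alignment.parent.mkdir(exist_ok=True)
#   with open(alignment, "w") as out:
#       subprocess.run(mafft_cmd, stdout=out, check=True)
#   subprocess.run(iqtree_cmd, check=True)
```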

Analysis of COVID-19 case epidemiological data

Epidemiological data of COVID-19 cases with virus genomic information were retrospectively analyzed. Data for laboratory-confirmed COVID-19 cases were provided through the mandatory German national surveillance system by public health authorities. From December 1st, 2021, to April 30th, 2023, 32,468,122 COVID-19 cases were notified to the national surveillance (Fig. 1 and Table S1).

**Fig. 1: Selection of COVID-19 cases.**

As of December 9th, 2023, 516,128 genome sequences of SARS-CoV-2 lineages were available, originating from both the IMSSC2 laboratory network and DESH, and were submitted as randomly selected samples. The integrated dataset was generated by merging consensus SARS-CoV-2 sequences, assigned PANGO lineages, and epidemiological data on the case level using a unique identifier provided in the data from the German national surveillance system and in the metadata of SARS-CoV-2 genomes. This unique identifier was also utilized to deduplicate COVID-19 cases recorded more than once. Additionally, COVID-19 cases with implausible data for age and sampling date were excluded. The resulting cleaned integrated database contained 272,770 COVID-19 cases (Fig. 1 and Table S1). Cases were further selected for retrospective analysis based on inclusion/exclusion criteria outlined in Fig. 1. First, only cases infected with the most prevalent SARS-CoV-2 variants over a continuous period of at least 8 weeks in Germany were included. Based on this criterion, individuals infected with lineages BA.2, BA.5.1, BQ.1.1, and recombinant lineage XBB.1.5 were selected. In order to provide a comprehensive description of the reporting period, cases infected with BA.1, as the earliest Omicron lineage, as well as cases infected with XBB.1.9.1 and XBB.1.9.2, classified as Variants under Monitoring (VUM) by the WHO in spring 2023, were included in the analysis. Sublineages of the aforementioned lineages and other SARS-CoV-2 variants occurring during the study period were excluded. The final dataset comprised 84,639 COVID-19 cases (Fig. 1). To investigate the distribution of infections among different age groups, individuals were categorized into six age groups: 0–4, 5–14, 15–34, 35–59, 60–79, and 80 years and older. 
Second, to analyze the association between hospitalization and independent variables using logistic regression, a subset of the data was generated by excluding COVID-19 cases with missing information on hospitalization status, sex, age, or month of diagnostic sampling. The final hospitalization dataset comprised 33,632 COVID-19 cases (Fig. 1).
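The integration steps described above (linking on a shared identifier, deduplication, plausibility filtering, age grouping, and the complete-case subset) can be sketched with plain dictionaries. This is an illustration under assumptions: the field names, the 0 to 120 year plausibility range, and the example records are hypothetical, and the authors performed their analyses in R.

```python
from bisect import bisect_right

AGE_LOWER_BOUNDS = [0, 5, 15, 35, 60, 80]
AGE_LABELS = ["0-4", "5-14", "15-34", "35-59", "60-79", "80+"]

def age_group(age: int) -> str:
    """Map an age in years to one of the six groups used in the analysis."""
    if age < 0:
        raise ValueError("age must be non-negative")
    return AGE_LABELS[bisect_right(AGE_LOWER_BOUNDS, age) - 1]

def integrate(cases, genomes, max_age=120):
    """Link cases to lineages on a shared identifier, keep the first record
    per identifier, and drop cases with an implausible age (assumed range)."""
    lineage_by_id = {g["case_id"]: g["lineage"] for g in genomes}
    merged = {}
    for case in cases:
        cid = case["case_id"]
        if cid in merged or cid not in lineage_by_id:
            continue  # duplicate notification, or no genome for this case
        age = case.get("age")
        if age is not None and not 0 <= age <= max_age:
            continue  # implausible age
        group = age_group(age) if age is not None else None
        merged[cid] = {**case, "lineage": lineage_by_id[cid], "age_group": group}
    return list(merged.values())

def complete_cases(records, required=("hospitalized", "sex", "age", "sampling_month")):
    """Subset without missing values in the regression variables."""
    return [r for r in records if all(r.get(k) is not None for k in required)]

# Hypothetical records
cases = [
    {"case_id": "C1", "age": 3, "sex": "f", "hospitalized": False, "sampling_month": "2022-01"},
    {"case_id": "C1", "age": 3, "sex": "f", "hospitalized": False, "sampling_month": "2022-01"},
    {"case_id": "C2", "age": 82, "sex": "m", "hospitalized": None, "sampling_month": "2022-07"},
    {"case_id": "C3", "age": 250, "sex": "m", "hospitalized": True, "sampling_month": "2022-08"},
    {"case_id": "C4", "age": 40, "sex": "f", "hospitalized": False, "sampling_month": "2022-09"},
]
genomes = [
    {"case_id": "C1", "lineage": "BA.1"},
    {"case_id": "C2", "lineage": "BA.5.1"},
    {"case_id": "C3", "lineage": "BA.2"},
]
integrated = integrate(cases, genomes)
hospitalization_subset = complete_cases(integrated)
```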

Cell culture

For cultivation of Caco-2 (ATCC HTB-37) and Vero E6 (ATCC CRL-1586) cells, Dulbecco’s Modified Eagle Medium (DMEM, Thermo Fisher Scientific, Darmstadt, Germany, 52100) containing 10% fetal bovine serum (FBS, Merck, Darmstadt, Germany, F7524) supplemented with 2 mM L-glutamine (Carl Roth, Karlsruhe, Germany, HN08.3), 100 U/ml penicillin, 100 μg/ml streptomycin (PAN-Biotech, Aidenbach, Germany, P06-07100), 1x non-essential amino acids (Carl Roth, Karlsruhe, Germany, 9185.1), and 1 mM sodium pyruvate (Carl Roth, Karlsruhe, Germany, 9182.1) was used.

Human Alveolar Epithelial Lentivirus immortalized (hAELVi, inSCREENex, Braunschweig, Germany, INS-CI-1015) cells were cultivated in huAEC Medium (inSCREENex, Braunschweig, Germany, INS-ME-1013-500 ml). Prior to application of the cells, the culture flasks were coated with huAEC Coating solution (inSCREENex, Braunschweig, Germany, INS-SU-1018-100 ml). For polarization, approximately 9 × 10⁵ cells were seeded into the apical chamber of a pre-coated filter insert. Cells were initially incubated under liquid-liquid-conditions for 3 days, followed by cultivation under ALI conditions for up to 28 days. For infection experiments, polarized hAELVi cells were used following incubation under ALI conditions for at least 21 days.

Reconstructed human nasal and bronchial epithelium cultures were obtained from Epithelix (Plan-les-Ouates, Switzerland, MucilAir™, EP01MD) and further cultivated in MucilAir™ culture medium (Epithelix, Plan-les-Ouates, Switzerland, EP05MM) under ALI conditions upon arrival, according to the manufacturer’s instructions. Nasal and bronchial cultures were derived from 3 healthy single donors each. All cells were incubated in a humidified atmosphere at 37 °C with 5% CO₂.

Isolation of SARS-CoV-2 viruses from patient samples

For isolation of primary SARS-CoV-2 isolates, selected samples were sterile filtered (0.2 μm) and subsequently used to inoculate approximately 2 × 10⁵ Caco-2 cells after the presence of a particular virus lineage had been determined by WGS. After incubation at 37 °C and 5% CO₂ for 72 h, the supernatant was harvested and used for high-titer stock production. For preparation of virus stocks, approximately 1 × 10⁷ Caco-2 cells were infected at a multiplicity of infection (MOI) of 0.001 and incubated at 37 °C and 5% CO₂ for 48 h. After incubation, cell debris was removed by centrifugation, and aliquots of stock solution were stored at -80 °C. The absence of second-site mutations was confirmed by WGS. Virus isolation and subsequent assays with infectious material were performed under biosafety level (BSL) 3 conditions at the RKI, Berlin.

Infection of human respiratory cells and virus titration on Vero E6 cells

Cells were infected with SARS-CoV-2 D614G (hCoV-19/Germany/BW-RKI-N-0001/2020, GISAID accession: EPI_ISL_481253), SARS-CoV-2 Delta B.1.617.2 (ENA project PRJEB50616; sequence ID IMSSC2-206-2021-00148), SARS-CoV-2 Omicron BA.2 (ENA project PRJEB55524; accession ID ERS12788649), SARS-CoV-2 Omicron BA.5.1 (GISAID accession: EPI_ISL_14419656), SARS-CoV-2 Omicron BQ.1.1 (GISAID accession: EPI_ISL_16883461), SARS-CoV-2 Omicron XBB.1.5 (GISAID accession: EPI_ISL_18530775), SARS-CoV-2 Omicron XBB.1.9.1 (GISAID accession: EPI_ISL_17006863), or SARS-CoV-2 Omicron XBB.1.9.2 (GISAID accession: EPI_ISL_17069408), respectively.

Cells were washed once with PBS (Vero E6) or D-PBS (ALI human cell cultures) and then inoculated with virus diluted in D-PBS/0.3% BA. For ALI cultures, the virus solution was applied to the apical chamber of the filter insert. After incubation for 1 h at 37 °C, cells were washed twice with PBS or D-PBS, as appropriate, and fresh medium was added to the cells. For ALI cultures, medium was added to the basolateral compartment of the filter insert.

To perform replication analysis, supernatants were harvested at the indicated time points and stored at -80 °C until titration by standard plaque assay on Vero E6 cells to quantify infectious virus particles. For replication analysis on Vero E6 cells, 10% of the supernatant was harvested and replaced with fresh culture medium. To collect samples from ALI cultures, 50 μl (MucilAir™) or 250 μl (hAELVi) of D-PBS was used for apical washes at 37 °C for 30 min. The increase in viral titers during the early infection phase was calculated from the replication analyses using linear regression between 0 and 16 h post infection (p.i.).
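The early-phase calculation amounts to fitting a straight line to the titers measured up to 16 h p.i. and reading off the slope. The sketch below assumes that titers are log10-transformed before fitting, which the source does not state; the time points and titers are hypothetical.

```python
import math

def ols_slope_intercept(x, y):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

def early_growth_rate(hours, titers_pfu_per_ml, t_max=16.0):
    """Slope of log10 titer versus time for samples between 0 and t_max hours."""
    points = [(t, math.log10(v)) for t, v in zip(hours, titers_pfu_per_ml)
              if 0 <= t <= t_max and v > 0]
    if len(points) < 2:
        raise ValueError("need at least two time points in the window")
    xs, ys = zip(*points)
    slope, _ = ols_slope_intercept(xs, ys)
    return slope  # log10(PFU/ml) per hour

# Hypothetical replication curve
hours = [0, 8, 16, 24, 48]
titers = [1e2, 1e3, 1e4, 1e5, 1e6]
rate = early_growth_rate(hours, titers)
```

In this toy curve the titer rises by one log10 step every 8 h within the window, so the fitted slope is 0.125 log10(PFU/ml) per hour.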

Plaque reduction neutralization test

The plaque reduction neutralization test (PRNT) was performed as described previously¹². Briefly, 1.6 × 10⁵ Vero E6 cells were plated in 24-well plates the day before. WHO reference serum panels NIBSC 21/338 (a pool of 265 SARS-CoV-2 seropositive donors) or NIBSC 20/142 (a pool of SARS-CoV-2 negative human plasma) were 2-fold serially diluted and incubated with 50 PFU of SARS-CoV-2 isolates in a total volume of 200 μl for 1 h at 37 °C. The mixture was then used to infect the cells for 1 h at 37 °C. After aspiration of the inoculum, cells were grown for three days in Avicel plaque medium and stained with crystal violet. The PRNT50 titer represents the reciprocal value of the highest serum dilution that reduces the plaque number by at least 50% compared to untreated infection.
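The PRNT50 definition translates directly into a small lookup. The function name and the plaque counts below are hypothetical and serve only to illustrate the definition given in the text.

```python
def prnt50_titer(plaques_by_dilution: dict, control_plaques: float):
    """Return the reciprocal of the highest serum dilution that reduces the
    plaque count by at least 50% relative to the untreated control.

    plaques_by_dilution maps the reciprocal dilution (20 for 1:20) to the
    plaque count observed at that dilution. Returns None if no dilution
    reaches 50% reduction.
    """
    threshold = 0.5 * control_plaques
    neutralizing = [dilution for dilution, count in plaques_by_dilution.items()
                    if count <= threshold]
    return max(neutralizing) if neutralizing else None

# Hypothetical 2-fold dilution series with 50 plaques in the untreated control
counts = {20: 0, 40: 5, 80: 18, 160: 25, 320: 34, 640: 47}
titer = prnt50_titer(counts, control_plaques=50)
```

Here the 1:160 dilution is the highest one that still halves the plaque count, so the PRNT50 titer is 160.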

Cytokine ELISA

Basolateral supernatants of infected cells were used to quantify immune activation. Samples were analyzed according to the manufacturer’s instructions using the Human IFN-beta DuoSet ELISA Kit (R&D Systems Inc., Minneapolis, USA, DY814) and the Human IL-29/IL-28B (IFN-lambda 1/3) DuoSet ELISA Kit (R&D Systems Inc., Minneapolis, USA, DY1598B).

Statistics and reproducibility

Statistical analyses of epidemiological data were performed using R version 4.3.0²⁷. Categorical variables are presented as numbers and percentages of patients. Percentages were calculated based on all observations, including missing values, to reflect data completeness. To assess the distribution of infections across sex, age groups, hospitalized cases, mortality, and vaccination status within individual SARS-CoV-2 lineages (BA.1, BA.2, BA.5.1, XBB.1.5, XBB.1.9.1, and XBB.1.9.2), a χ² test was applied. The category “missing” was excluded from the analysis when a variable had less than 5% missing values. For variables with more than 20% missing values, the analysis was conducted both with and without these values. To compare disease severity between SARS-CoV-2 variants, hospitalization was used as the outcome, and regression analyses were performed. To this end, a data subset excluding COVID-19 cases with missing values in any of the covariates was generated (Fig. 1). Independent variables included sex assigned at birth, age group, SARS-CoV-2 lineage, and the month of the sampling date of the diagnostic sample. Univariate logistic regression models were fitted to examine the association between each independent variable and the dichotomous outcome (hospitalization or non-hospitalization), and unadjusted odds ratios (OR) are presented. To account for potential confounding, multivariable logistic regression was used to analyze the association between multiple independent variables and the dichotomous outcome, with adjusted ORs (adjOR) presented. To assess the robustness of the results, a sensitivity analysis was performed, including all sublineages of BA.1, BA.2, BA.5, BQ.1, XBB.1.5, XBB.1.9.1, and XBB.1.9.2 in the analyses of infection distribution and odds of hospitalization.
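For a single binary predictor, the unadjusted OR from a univariate logistic regression equals the cross-product ratio of the corresponding 2 × 2 table, and its Wald confidence interval can be computed in closed form. The sketch below illustrates this special case with hypothetical counts. It is not the authors' R code, and adjusted ORs require fitting the multivariable model.

```python
import math

def odds_ratio_wald(exposed_hosp, exposed_not, ref_hosp, ref_not, z=1.96):
    """Unadjusted odds ratio of hospitalization (exposed vs. reference group)
    with a Wald confidence interval on the log-odds scale (z=1.96 for 95%)."""
    or_ = (exposed_hosp * ref_not) / (exposed_not * ref_hosp)
    se_log_or = math.sqrt(1 / exposed_hosp + 1 / exposed_not
                          + 1 / ref_hosp + 1 / ref_not)
    lower = math.exp(math.log(or_) - z * se_log_or)
    upper = math.exp(math.log(or_) + z * se_log_or)
    return or_, lower, upper

# Hypothetical counts: 30 of 100 hospitalized in one lineage group,
# 20 of 100 hospitalized in the reference group
or_, ci_low, ci_high = odds_ratio_wald(30, 70, 20, 80)
```

With these counts the OR is 12/7 (about 1.71), and the 95% interval includes 1, so such a difference would not be statistically significant at the 5% level.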

The non-parametric, two-tailed Spearman correlation test was used to analyze the statistical correlation between the SARS-CoV-2 lineage distributions captured by the IMSSC2 laboratory network and the DESH platform.
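Spearman's coefficient is the Pearson correlation of the ranked values, with tied values given their average rank. The sketch below computes only the coefficient, not the two-tailed p-value, and the weekly lineage proportions are hypothetical.

```python
import math

def average_ranks(values):
    """1-based ranks; tied values receive the mean of their rank positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks

def pearson(x, y):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for constant input")
    return sxy / math.sqrt(sxx * syy)

def spearman_rho(x, y):
    """Spearman rank correlation coefficient."""
    return pearson(average_ranks(x), average_ranks(y))

# Hypothetical weekly proportions of one lineage in the two datasets
imssc2 = [0.05, 0.20, 0.45, 0.70, 0.90]
desh = [0.04, 0.22, 0.40, 0.75, 0.88]
rho = spearman_rho(imssc2, desh)
```

Because both toy series rise in the same order, the coefficient is 1 even though the proportions themselves differ.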

Ethical declaration

All investigations were carried out in accordance with the principles set forth in the Helsinki Declaration. Only pseudo-anonymized surveillance data were analyzed. For the analysis of surveillance data from the mandatory notification system, an ethical statement is not required according to the German Infection Protection Act. The linkage and processing procedures of epidemiological, clinical, and genomic data were conducted in compliance with §13(3) of the German Infection Protection Act, which permits the transfer of pathogen material and associated pseudonymized case data to designated institutions such as RKI, for surveillance and further epidemiological analyses. Epidemiological analyses were conducted in compliance with the STrengthening the Reporting of OBservational studies in Epidemiology (STROBE) guidelines.
