Evaluating the association between COVID-19 transmission and mobility in omicron outbreaks in China

May 21, 2025

Case data

Case data from 1 January 2022 to 27 November 2022 were obtained from the daily notification of COVID-19 on the National Health Commission of the People’s Republic of China website²². Cases were reported based on the date of detection. It was anticipated that the report delay was minimal due to the stringent city-wide measures implemented in China during the outbreaks²³. In China, cases were classified as either local (domestic) cases or imported cases. Local cases, including both symptomatic and asymptomatic cases, were used in our study. Cities were coded according to Baidu cityCode²⁴. Case data for cities in Yunnan and Xinjiang provinces were missing and therefore not included in our analysis. In total, data from 336 cities were included in our study.

Mobility index data

The daily mobility index data used in this study were sourced from Baidu mobility Big Data²⁵,²⁶, which is based on Baidu Maps, a mapping service widely used in China and similar to Google Maps. Baidu mobility data are collected through Baidu’s location-based service technology and offer insights into city-specific and temporal migration patterns. Using location-aware devices, Baidu mobility data capture the spatial-temporal trajectories of daily population movements within communities.

While Baidu mobility data may not capture all migrations, they remain valuable for analyzing population flow patterns across different cities and times. To access the Baidu mobility data, hypertext transfer protocol (HTTP) requests were sent to the Baidu mobility platform (http://qianxi.baidu.com/). To protect user privacy, the platform records daily travel flows for cities and aggregates this information into an index for cross-city comparisons.

For our study, Baidu mobility data were collected for 366 prefecture-level cities, including three mobility indices: within-city movement, inter-city inflow, and inter-city outflow. The inter-city inflow and outflow indices reflect the magnitude of population migration between cities. The within-city movement index is calculated from the ratio of daily intra-city trips to the resident population²⁷. The mobility indices can be compared across cities. To avoid weekly fluctuations induced by the work-leisure shift, the daily mobility index was smoothed using a moving average over a 7-day window.
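The 7-day smoothing can be sketched as follows. This is a minimal illustration, not the study's code: it assumes a trailing window (the text does not say whether the window is trailing or centered) and uses a shorter window for the first six days. The function name and the index values are hypothetical.

```python
def moving_average(values, window=7):
    """Trailing moving average; early days use whatever shorter window exists.

    Assumption: trailing (not centered) window, which the source does not specify.
    """
    smoothed = []
    for i in range(len(values)):
        start = max(0, i - window + 1)
        chunk = values[start:i + 1]
        smoothed.append(sum(chunk) / len(chunk))
    return smoothed


# Hypothetical within-city index with a weekend dip (illustrative values only).
daily_index = [5.0, 5.0, 5.0, 5.0, 5.0, 3.0, 3.0] * 3
smoothed_index = moving_average(daily_index)
```

Once a full week is available, every window contains five weekday values and two weekend values, so the weekly cycle is averaged out.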

Government response index (GRI)

The daily government response index (GRI) was obtained from the publicly available Oxford COVID-19 Government Response Tracker (OxCGRT)²⁸. The OxCGRT is a comprehensive dataset that captures the diverse government policies implemented in response to the global COVID-19 pandemic, spanning more than 180 countries. Within the OxCGRT, the GRI stands out as a reliable and thorough index, effectively portraying the wide range of policy modifications enacted by governments. The GRI comprised 13 indicators²⁹, including containment and closure indicators, economic response indicators, and health systems indicators (Supplementary Method and Supplementary Table 1). The GRI was constructed at the provincial level.

To construct the GRI at the city level, we first screened the textual notes about implementation and cessation of various public health measures at the city level, to ensure that those cities were included in the consideration of GRI. Then, on the days when the daily case count for a city accounted for 80% or more of the total cases in the corresponding province, the GRI at the province level was taken as the GRI for that city. We conducted a sensitivity analysis with different thresholds, including 70%, 90%, and 100%.
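The threshold rule can be illustrated with a short sketch. It covers only the case-share step, not the screening of textual notes, and the function name, the handling of days with zero provincial cases, and the example numbers are assumptions made for illustration.

```python
def city_gri(city_cases, province_cases, province_gri, threshold=0.8):
    """Assign the provincial GRI to a city on days when the city accounts for
    at least `threshold` of the province's cases; otherwise return None.

    Assumption: days with zero provincial cases get no city-level GRI.
    """
    assigned = []
    for city, province, gri in zip(city_cases, province_cases, province_gri):
        if province > 0 and city / province >= threshold:
            assigned.append(gri)
        else:
            assigned.append(None)
    return assigned


# Hypothetical daily counts and provincial GRI values.
city = [0, 8, 50, 30]
province = [0, 10, 55, 60]
gri = [40.0, 55.0, 70.0, 72.0]

main_analysis = city_gri(city, province, gri)               # 80% threshold
sensitivity_90 = city_gri(city, province, gri, threshold=0.9)
```

Changing `threshold` to 0.7, 0.9, or 1.0 reproduces the structure of the sensitivity analysis described above.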

Definition of outbreaks

An outbreak was defined as 20 or more cases occurring in a single day³⁰. The start date of an outbreak was defined as the date on which the first case (symptomatic or asymptomatic) occurred, going backward from the date on which there were 20 or more cases in a single day. The end of an outbreak was defined as the day with no new cases for 7 consecutive days after the peak of the outbreak. Outbreaks with a duration longer than 14 days were included in the study.
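One possible reading of these rules is sketched below. The text leaves some details open, so the sketch makes explicit assumptions: the start is found by walking backward from the first day with 20 or more cases until a zero-case day is reached, and the end is the last day with cases before the first run of 7 consecutive zero-case days after that trigger day. The function name and the case series are hypothetical.

```python
def find_outbreak(cases, threshold=20, quiet_days=7, min_duration=14):
    """Return (start, end) day indices of the first qualifying outbreak, or None."""
    trigger = next((i for i, c in enumerate(cases) if c >= threshold), None)
    if trigger is None:
        return None

    # Assumption: the "first case" is found by walking back to the last zero-case day.
    start = trigger
    while start > 0 and cases[start - 1] > 0:
        start -= 1

    # Assumption: the outbreak ends on the last case day before the first run
    # of `quiet_days` zero-case days; if no such run exists, it runs to the end.
    end = len(cases) - 1
    zeros = 0
    for i in range(trigger, len(cases)):
        zeros = zeros + 1 if cases[i] == 0 else 0
        if zeros == quiet_days:
            end = i - quiet_days
            break

    duration = end - start + 1
    return (start, end) if duration > min_duration else None


# Hypothetical daily local case counts for one city.
series = [0, 0, 1, 3, 8, 25, 40, 60, 45, 30, 20, 12, 9, 6, 4, 3, 2, 1, 1, 0, 1] + [0] * 9
outbreak = find_outbreak(series)
```

In this example a single zero-case day (day 19) does not end the outbreak, because it is followed by another case before 7 quiet days have passed.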

Estimation of time-varying effective reproduction number (R_t)

Since there was pre-symptomatic transmission for SARS-CoV-2³¹, reconstructing the epidemic curve by the date of infection could provide a more accurate estimation of R_t. As the case data were recorded based on the report date, we first reconstructed the epidemic curve by infection date from the epidemic curve by report date using a deconvolution approach³², with the distribution of the delay from infection to report (Supplementary Method). Then, we estimated R_t based on the Poisson framework in Cori et al.¹.
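For orientation, the core of the Cori et al. estimator can be sketched as below. Under a Poisson likelihood and a gamma prior, the posterior mean of R_t over a sliding window is (prior shape + cases in the window) divided by (1 / prior scale + total infectiousness in the window). The sketch omits the deconvolution step, and the window length, prior parameters, and serial interval weights are assumptions chosen for illustration rather than the study's settings.

```python
def cori_rt(incidence, si_weights, window=7, prior_shape=1.0, prior_scale=5.0):
    """Posterior mean of R_t for each day (None where the window is incomplete).

    incidence  : daily infections, indexed by day
    si_weights : discretized serial interval, si_weights[k-1] = P(interval = k days)
    """
    n = len(incidence)
    # Total infectiousness on day t: past incidence weighted by the serial interval.
    pressure = [
        sum(incidence[t - k] * w
            for k, w in enumerate(si_weights, start=1) if t - k >= 0)
        for t in range(n)
    ]
    rt = [None] * n
    for t in range(window, n):
        cases_in_window = sum(incidence[t - window + 1:t + 1])
        pressure_in_window = sum(pressure[t - window + 1:t + 1])
        rt[t] = (prior_shape + cases_in_window) / (1.0 / prior_scale + pressure_in_window)
    return rt


# Hypothetical serial interval and incidence curves (illustrative only).
serial_interval = [0.2, 0.5, 0.2, 0.1]
flat_rt = cori_rt([100.0] * 30, serial_interval)
growing_rt = cori_rt([10.0 * 1.2 ** t for t in range(30)], serial_interval)
```

A flat epidemic curve gives an estimate close to 1, while a growing curve gives an estimate above 1, which is the behavior expected of a reproduction number.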

Relationship between transmission and mobility

We computed the cross-correlation between R_t and mobility indices for each identified outbreak using the Pearson correlation and selected the optimal lag day based on the highest correlation. We combined the correlation coefficients from each city, weighting them by the standard errors of the estimates, to generate a weighted average (Supplementary Method).
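The lag selection step might look like the following sketch. It assumes that only non-negative lags are searched (mobility leading R_t) and that the search stops at 14 days; neither choice is stated in the text. The helper names and the synthetic series are hypothetical.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def best_lag(rt, mobility, max_lag=14):
    """Return (lag, r) with the highest Pearson r between R_t on day t
    and mobility on day t - lag. Assumption: lags 0..max_lag only."""
    best = None
    for lag in range(max_lag + 1):
        r = pearson(rt[lag:], mobility[:len(mobility) - lag])
        if best is None or r > best[1]:
            best = (lag, r)
    return best


# Synthetic example: R_t follows mobility with a 3-day delay.
mobility = [math.sin(0.3 * i) + 0.05 * i for i in range(40)]
rt = [1.0] * 3 + [0.5 + 0.1 * m for m in mobility[:-3]]
lag, r = best_lag(rt, mobility)
```

Because the synthetic R_t is a linear function of mobility three days earlier, the search recovers a lag of 3 with a correlation of essentially 1.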

Furthermore, we adopted rolling correlation between R_t and three mobility indices to measure and visualize short-term but potentially time-varying correlations²⁰,³³. A detailed comparison of cross-correlation and rolling correlation is presented in Supplementary Table 2. We performed biweekly rolling correlation analysis, where the correlation on day t was estimated based on the two time series data from day t − 13 to day t, covering a period of 14 days. We only included city outbreaks with a duration of at least 42 days to ensure a sufficient amount of data for estimation. We also conducted triweekly rolling correlation analyses to further explore the sensitivity of our results (Supplementary Method).
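A biweekly rolling correlation as described, using days t − 13 to t, can be sketched in a few lines. The function names and the two synthetic series are illustrative assumptions; setting `window=21` would give the triweekly variant.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def rolling_correlation(x, y, window=14):
    """Correlation on day t computed from days t - window + 1 .. t.
    Days without a full window are returned as None."""
    out = [None] * len(x)
    for t in range(window - 1, len(x)):
        out[t] = pearson(x[t - window + 1:t + 1], y[t - window + 1:t + 1])
    return out


# Synthetic 42-day outbreak: the two series move together early on
# and in opposite directions later.
mobility = [float(i) for i in range(42)]
rt = [float(i) for i in range(21)] + [float(42 - i) for i in range(21, 42)]
rolling = rolling_correlation(rt, mobility)
```

In this example the rolling correlation moves from +1 in the first window to −1 in the last, a change that a single cross-correlation coefficient for the whole outbreak would not show.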

To determine whether rolling correlation was necessary, i.e., whether the magnitude of the correlation changed during outbreaks rather than remaining constant, we employed the non-linear least squares method to fit the rolling correlation for each outbreak. We fitted five models (constant, linear, quadratic, sine, and cosine) and chose the optimal one based on the smallest Akaike information criterion value. The constant model was included because of its frequent adoption in prior analyses³⁴,³⁵, where a consistent relationship between R_t and mobility is assumed. We also compared the cross-correlation coefficient with the minimum and maximum values of the rolling correlation coefficient.
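The model comparison can be illustrated with a reduced sketch that covers only the constant and linear candidates, both of which have closed-form least-squares fits; the quadratic, sine, and cosine models would need a numerical optimizer and are left out. The AIC is written in its Gaussian least-squares form, n·ln(RSS/n) + 2k, up to an additive constant, with k counting the fitted coefficients plus the error variance. That form, the function names, and the synthetic data are assumptions.

```python
import math


def aic(rss, n, k):
    """Least-squares AIC up to an additive constant (assumed form)."""
    return n * math.log(rss / n) + 2 * k


def rss_constant(t, y):
    """Residual sum of squares of the best constant fit (the mean)."""
    mean = sum(y) / len(y)
    return sum((v - mean) ** 2 for v in y)


def rss_linear(t, y):
    """Residual sum of squares of the ordinary least-squares line."""
    n = len(t)
    mt, my = sum(t) / n, sum(y) / n
    slope = sum((a - mt) * (b - my) for a, b in zip(t, y)) / sum((a - mt) ** 2 for a in t)
    intercept = my - slope * mt
    return sum((b - (intercept + slope * a)) ** 2 for a, b in zip(t, y))


def choose_model(t, y):
    """Return the name of the candidate with the smallest AIC."""
    n = len(t)
    scores = {
        "constant": aic(rss_constant(t, y), n, k=2),  # mean + error variance
        "linear": aic(rss_linear(t, y), n, k=3),      # intercept, slope + error variance
    }
    return min(scores, key=scores.get)


# Synthetic rolling correlations with a small alternating disturbance.
days = list(range(30))
declining = [0.8 - 0.04 * d + 0.05 * (-1) ** d for d in days]
stable = [0.5 + 0.05 * (-1) ** d for d in days]
```

With these series, `choose_model(days, declining)` selects the linear model and `choose_model(days, stable)` selects the constant model, since the extra slope parameter does not reduce the residuals enough to offset its penalty.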

Factors affecting cross-correlation and rolling correlation

We aimed to investigate what factors may impact the cross-correlation and rolling correlation. Regarding cross-correlation, we conducted Pearson correlation tests on several potential factors, including outbreak duration, peak value of R_t, and GRI (Supplementary Method).

Regarding rolling correlation, we investigated whether rolling correlations were different by stages of outbreaks and level of GRI (Supplementary Method). Outbreaks were divided into two stages, namely the pre-peak and post-peak stages, based on the peak value of R_t. We employed a mixed-effect regression to assess the impact of different stages and GRI on the rolling correlation between R_t and the mobility index. Analysis stratified by outbreak stage was also conducted. The mixed-effect regression model included a random intercept term to account for variations among different outbreak cities. The rolling correlation for each outbreak was used as the outcome variable. The city-level GRI and different stages served as the predictor variables, respectively. We applied a Fisher transformation to the rolling correlation before fitting the models. In addition, we employed k-means clustering to identify patterns in rolling correlations across outbreaks and subsequently evaluated the associations between the resulting clusters and key urban characteristics (Supplementary Method and Supplementary Figs. 11–13).
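The Fisher transformation mentioned above maps a correlation r in (−1, 1) to z = artanh(r), which is unbounded and better suited as a regression outcome. A minimal sketch follows; the clipping of values at exactly ±1 is an assumption added to keep z finite and is not described in the text.

```python
import math


def fisher_z(r, eps=1e-6):
    """Fisher z-transformation, z = artanh(r).

    Assumption: r is clipped to [-1 + eps, 1 - eps] so that z stays finite.
    """
    clipped = max(-1.0 + eps, min(1.0 - eps, r))
    return math.atanh(clipped)


def inverse_fisher_z(z):
    """Back-transform a z value to the correlation scale."""
    return math.tanh(z)


# Hypothetical rolling correlations transformed before model fitting.
z_values = [fisher_z(r) for r in [-0.9, 0.0, 0.5, 1.0]]
```

Model estimates obtained on the z scale can be mapped back with `inverse_fisher_z` for interpretation as correlations.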

Statistical analyses

Python 3.8.6 (Python Software Foundation) and related libraries were used to collect the required data. All analyses were performed using R software version 3.6.3 (R Foundation for Statistical Computing, Austria). Uncertainty in the effective reproduction number (R_t) was quantified as mean ± standard deviation. For correlation analyses, 95% confidence intervals were generated through 200 iterations of nonparametric bootstrap resampling. The significance level for all tests was set at p < 0.05.
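A nonparametric bootstrap interval for a correlation coefficient can be sketched as follows. The sketch assumes a percentile interval and resampling of paired daily observations with replacement; the text specifies only the number of iterations (200). The function names, the seed, and the synthetic data are hypothetical.

```python
import math
import random


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def bootstrap_ci(x, y, n_boot=200, alpha=0.05, seed=0):
    """Percentile bootstrap confidence interval for Pearson r.

    Assumption: (x, y) pairs are resampled with replacement and the interval
    is taken from the empirical quantiles of the resampled coefficients.
    """
    rng = random.Random(seed)
    n = len(x)
    stats = []
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]
        stats.append(pearson([x[i] for i in idx], [y[i] for i in idx]))
    stats.sort()
    lower = stats[round(alpha / 2 * n_boot)]
    upper = stats[round((1 - alpha / 2) * n_boot) - 1]
    return lower, upper


# Synthetic, strongly correlated series (illustrative only).
xs = [float(i) for i in range(30)]
ys = [v + 3.0 * math.sin(1.7 * i) for i, v in enumerate(xs)]
ci_lower, ci_upper = bootstrap_ci(xs, ys)
```

Resampling individual days ignores autocorrelation in the time series, so a block bootstrap may be preferable in practice; which variant the study used is not stated here.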

Informed consent was not required for this study since the data used were obtained from publicly available sources.