Assessing the influencing factors of dengue fever in Chinese mainland based on causal analysis

Data on cities and reported dengue fever cases

This study uses the monthly reported dengue fever cases in the provinces of China from February 2005 to December 2019. The data are sourced from The Data-center of China Public Health Science (https://www.phsciencedata.cn/Share/), managed by the Chinese Center for Disease Control and Prevention. The study involves 93,699 reported dengue fever cases, covering 30 provincial-level administrative regions in mainland China (data from the Tibet Autonomous Region were excluded because of high missing rates). Among these, eight provinces accumulated more than 1,000 reported cases; Guangdong Province had the most, with 63,510 reported cases, followed by Yunnan Province with 13,917. In addition, 14 provinces accumulated more than 200 reported cases: Guangdong, Yunnan, Fujian, Zhejiang, Guangxi, Chongqing, Jiangxi, Hunan, Sichuan, Henan, Hainan, Jiangsu, Hubei, and Shandong. These 14 provinces are the primary focus of this study. The spatial and temporal distribution shows that reported cases are concentrated in southern China (Fig. 8a) and show an overall increasing trend over the years (Fig. 8b).

Fig. 8

Spatiotemporal distribution of reported dengue fever cases, population, and GDP Data. (a) Cumulative number of reported dengue fever cases in each province from February 2005 to December 2019. (b) Annual case counts (log-transformed) of dengue fever for each province from February 2005 to December 2019. Provinces were sorted by latitude from high to low. (c) Population of each province as of December 2019. (d) Cumulative gross production value for each province from February 2005 to December 2019.

Data on population and GDP

The population and GDP data are sourced from the National Bureau of Statistics (https://www.stats.gov.cn/) and comprise the year-end total population and the quarterly GDP of each province. The year-end total population was interpolated to obtain the monthly population of each region, and the quarterly GDP was averaged over the months of each quarter to derive the monthly GDP. The data indicate that regions with dense populations and high GDP are mainly distributed in the southern and coastal areas of China (Fig. 8c, d).
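
The two resampling steps can be illustrated with a minimal sketch. It assumes linear interpolation between consecutive year-end values and an even split of each quarterly GDP figure across its three months; the text does not specify either detail, and the function names are hypothetical.

```python
import numpy as np

def monthly_population(year_end_pop):
    """Interpolate consecutive year-end (December) populations to monthly values.

    Assumption: linear interpolation between one December and the next.
    Returns one value per month, from the first December to the last, inclusive.
    """
    year_end_pop = np.asarray(year_end_pop, dtype=float)
    known_months = np.arange(len(year_end_pop)) * 12   # positions of the Decembers
    all_months = np.arange(known_months[-1] + 1)
    return np.interp(all_months, known_months, year_end_pop)

def monthly_gdp(quarterly_gdp):
    """Derive monthly GDP from quarterly GDP.

    Assumption: each quarter's GDP is split evenly across its three months.
    """
    quarterly_gdp = np.asarray(quarterly_gdp, dtype=float)
    return np.repeat(quarterly_gdp / 3.0, 3)
```

For example, `monthly_gdp([30, 60])` yields three months at 10 followed by three months at 20.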

Meteorological and mosquito data

All meteorological data come from the website Weather in 241 Countries Worldwide (https://rp5.ru/). For the provincial-level analysis, data from one meteorological station were used for each of the four directly controlled municipalities (Beijing, Shanghai, Tianjin, and Chongqing). For provinces with more than five prefecture-level administrative regions, the average of five meteorological stations was used. Because Qinghai and Hainan provinces have no more than five prefecture-level administrative regions, the average of three meteorological stations was used for each. In total, daily data from 130 meteorological stations, collected from February 2005 to December 2019, were used. For the analysis within Guangdong Province, the meteorological stations corresponding to nine prefecture-level cities (Shenzhen, Guangzhou, Heyuan, Shantou, Yangjiang, Shanwei, Meizhou, Shaoguan, and Zhaoqing) were selected; the meteorological indicators and their processing were the same as in the provincial-level analysis.

The meteorological indicators are: Average Temperature (T), the atmospheric temperature at 2 m above the ground (unit: °C). Minimum Temperature (Tn), the lowest temperature in the past period (not exceeding 12 h) (unit: °C). Maximum Temperature (Tx), the highest temperature in the past period (not exceeding 12 h) (unit: °C). Dew Point Temperature (Td), the temperature at which air becomes saturated and water vapor begins to condense into dew when cooled (unit: °C). Atmospheric Pressure (P), the atmospheric pressure at mean sea level (unit: mmHg). Horizontal Visibility (VV), the maximum distance at which an object can be clearly seen and identified in the horizontal direction (unit: km). Precipitation (RRR), the amount of rainfall (unit: mm). Relative Humidity (U), the relative humidity at 2 m above the ground (unit: %). Wind Speed (Ff), the average wind speed at 10–12 m above the ground over the 10 min preceding the observation (unit: m/s). Missing values in the meteorological indicators were filled using linear interpolation. Daily precipitation records that were missing, that indicated precipitation without a measured amount, or that reported no precipitation were set to 0 mm. Time-varying trends of the nine meteorological factors were visualized for the 14 provinces with cumulative case numbers exceeding 200 during the study period (see supplementary information Fig. S1 to Fig. S14).
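
The gap-filling rules above can be sketched as follows. This is an illustration under stated assumptions, not the authors' pipeline: the function names are hypothetical, the series is assumed to be evenly spaced in time, and leading or trailing gaps are held at the nearest observed value (the behaviour of `np.interp`), which the text does not specify.

```python
import numpy as np

def fill_linear(values):
    """Fill NaN gaps in an evenly spaced series by linear interpolation.

    Assumption: gaps at the start or end take the nearest observed value.
    """
    values = np.asarray(values, dtype=float)
    idx = np.arange(len(values))
    known = ~np.isnan(values)
    return np.interp(idx, idx[known], values[known])

def clean_precipitation(raw):
    """Map precipitation records to millimetres.

    Missing entries and non-numeric notes (for example a text flag meaning
    'precipitation observed but not measured') are set to 0 mm.
    """
    cleaned = []
    for record in raw:
        try:
            amount = float(record)
        except (TypeError, ValueError):
            amount = np.nan
        cleaned.append(0.0 if np.isnan(amount) else amount)
    return np.array(cleaned)
```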

The data representing the adult Aedes mosquito population, the Mosquito Oviposition Index (MOI), and the data representing the Aedes larvae population, the Breteau Index (BI), are sourced from the Guangdong Provincial Health Commission (https://wsjkw.gd.gov.cn). Mosquito data were selected for nine prefecture-level cities in Guangdong Province (Shenzhen, Guangzhou, Heyuan, Shantou, Yangjiang, Shanwei, Meizhou, Shaoguan, and Zhaoqing). According to the monitoring guidelines for dengue vectors published by the Guangdong Provincial Health Commission, monitoring staff establish multiple monitoring points in each city to assess the populations of Aedes mosquitoes, including both Aedes albopictus and Aedes aegypti. The Commission reports monitoring results every half month. These reports give the proportions of four categories of monitoring points in each city (meeting epidemic prevention and control standards, low density, medium density, and high density), with the corresponding MOI and BI values for each category. Based on the distribution of monitoring points across these categories, we calculated the data for 100 monitoring points in each city. We excluded months for which no detection results were reported. Furthermore, considering the differences in data reporting methods before 2015 and the potential impact of the COVID-19 pandemic on the population dynamics of Aedes mosquitoes after 2020, the study period for the mosquito data was restricted to 2016 to 2019.

Causal analysis

The causal analysis employs convergent cross mapping (CCM) [58], a method specifically designed to determine causal relationships in ecological time series. This method can detect causal relationships between variables in nonlinear dynamical systems. According to Takens’ embedding theorem, the dynamic characteristics of a multidimensional dynamical system can be captured through the time series embedding of a single variable. In CCM, the causal relationship between variables is inferred by assessing whether the historical time series of one variable can reliably predict or explain the state of another variable. That is, in a pair of time series \((X, Y)\), if Y has high predictive power for X, then a causal relationship in the direction \(X \to Y\) can be detected. First, the state space \(M_Y\) is reconstructed from Y using its embedding dimension \(E_Y\). Then, in a leave-one-out manner, X is predicted from \(M_Y\) using the simplex projection method, and the prediction is labeled \(\widehat{X}^{Y}\). The Pearson correlation coefficient between \(X\) and \(\widehat{X}^{Y}\), denoted \(\rho_{X \to Y}^{CCM} = \left| Corr(X, \widehat{X}^{Y}) \right|\), quantifies the CCM skill. A CCM skill approaching 1 indicates that X can be predicted more accurately from the state space of Y, which means that the causal relationship from X to Y is stronger.

Specifically, the CCM method involves the following steps. First, the optimal embedding dimension is determined using the simplex projection method; the optimal embedding dimension is the one with the highest prediction skill. Second, the S-map (sequentially locally weighted global linear map) technique is used to test for the presence of nonlinear relationships. Finally, once the optimal embedding dimension has been chosen and nonlinearity confirmed, the CCM skill is calculated. If the CCM skill improves and converges as the time series length increases, it can be inferred that there is a causal relationship between the variables. With limited or noisy field data, causality is indicated by an increase in predictability as the time series lengthens. We performed a causal analysis between all meteorological factors and case numbers for the 14 provinces with the highest number of reported cases and also conducted a significance test for the causal relationships.
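
The cross-mapping step can be sketched in a few lines of NumPy. This is a simplified illustration, not the implementation used in the study: the function names, the lag `tau = 1`, the choice of \(E + 1\) nearest neighbours, and the exponential distance weighting are assumptions taken from the usual description of simplex projection, and the embedding-dimension search, S-map test, and convergence check are omitted.

```python
import numpy as np

def delay_embed(series, E, tau=1):
    """Rows are lagged vectors (s_t, s_{t-tau}, ..., s_{t-(E-1)tau})."""
    series = np.asarray(series, dtype=float)
    start = (E - 1) * tau
    n = len(series)
    return np.column_stack(
        [series[start - j * tau: n - j * tau] for j in range(E)]
    )

def ccm_skill(x, y, E, tau=1):
    """Cross-map x from the state space M_Y reconstructed from y.

    Returns |Corr(x, x_hat)|, the skill used as evidence for x -> y.
    """
    x = np.asarray(x, dtype=float)
    M_y = delay_embed(y, E, tau)
    x_target = x[(E - 1) * tau:]          # x values aligned with rows of M_y
    n = len(M_y)
    x_hat = np.empty(n)
    for i in range(n):
        dist = np.linalg.norm(M_y - M_y[i], axis=1)
        dist[i] = np.inf                  # leave-one-out: exclude the point itself
        nn = np.argsort(dist)[:E + 1]     # E + 1 nearest neighbours in M_Y
        d_min = max(dist[nn[0]], 1e-12)   # guard against division by zero
        w = np.exp(-dist[nn] / d_min)     # closer neighbours get larger weights
        x_hat[i] = np.sum(w * x_target[nn]) / np.sum(w)
    return abs(np.corrcoef(x_target, x_hat)[0, 1])
```

To inspect convergence, `ccm_skill` can be evaluated on progressively longer prefixes of the two series and the resulting skills compared.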

The causal relationships detected by CCM are transitive, so CCM alone cannot distinguish whether the causal relationship between a factor and the case numbers is direct or is mediated by intermediate variables. We therefore also used partial cross mapping (PCM) [47], a method that can separate direct from indirect causality.

PCM is an extension of CCM. The key idea is to exclude the influence of a third variable when checking the consistency between the cross-mapping predictions of one time series with another. To test for a direct causal relationship from X to Y in the presence of an indirect pathway through Z, the partial correlation coefficient \(\rho_{X \to Y | Z}^{PCM} = \left| PCC(X, \widehat{X}^{Y} \mid \widehat{X}^{\widehat{Z}^{Y}}) \right|\) is calculated. Here, \(\widehat{X}^{\widehat{Z}^{Y}}\) is obtained by successive simplex projections (\(\widehat{X}^{\widehat{Z}^{Y}}\) is X predicted from \(M_{\widehat{Z}^{Y}}\), where \(\widehat{Z}^{Y}\) is Z predicted from \(M_Y\)). When the computed partial correlation coefficient exceeds a given threshold, a direct causal relationship from X to Y is considered to exist. On this basis, the high-order PCM can be derived as \(\rho_{X \to Y | \{Z_i\}}^{PCM} = \left| PCC(X, \widehat{X}^{Y} \mid \{ \widehat{X}^{\widehat{Z_i}^{Y}} \mid i = 1, 2, \cdots \}) \right|\). \(PCC(x, y \mid z)\) is the partial correlation coefficient, which describes the degree of association between x and y after removing the effect of variable z: \(PCC(x, y \mid z) = \frac{Corr(x, y) - Corr(x, z)\,Corr(y, z)}{\sqrt{(1 - Corr(x, z)^{2})(1 - Corr(y, z)^{2})}}\). This definition can be extended recursively to the case of multiple intervening variables between X and Y. For example, with two intervening variables \(Z_1\) and \(Z_2\), the partial correlation coefficient is given by \(PCC(X, Y \mid Z_1, Z_2) = \frac{PCC(X, Y \mid Z_1) - PCC(X, Z_2 \mid Z_1)\,PCC(Y, Z_2 \mid Z_1)}{\sqrt{(1 - PCC(X, Z_2 \mid Z_1)^{2})(1 - PCC(Y, Z_2 \mid Z_1)^{2})}}\). If the mediating variable is irrelevant, including it in the partial correlation coefficient formula will not affect the final calculation result.
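
The recursive definition of the partial correlation coefficient translates directly into code. The sketch below is illustrative only; the function name `pcc` and its list-of-controls interface are assumptions. In a PCM calculation the arguments would be \(X\), \(\widehat{X}^{Y}\), and the successive cross-map predictions \(\widehat{X}^{\widehat{Z_i}^{Y}}\).

```python
import numpy as np

def pcc(x, y, controls):
    """Partial correlation of x and y given a list of control series.

    Applies PCC(x, y | rest, z) =
        (PCC(x, y | rest) - PCC(x, z | rest) * PCC(y, z | rest))
        / sqrt((1 - PCC(x, z | rest)^2) * (1 - PCC(y, z | rest)^2)),
    and falls back to the Pearson correlation when no controls remain.
    """
    if not controls:
        return np.corrcoef(x, y)[0, 1]
    *rest, z = controls
    r_xy = pcc(x, y, rest)
    r_xz = pcc(x, z, rest)
    r_yz = pcc(y, z, rest)
    return (r_xy - r_xz * r_yz) / np.sqrt((1 - r_xz**2) * (1 - r_yz**2))
```

When two series are related only through a shared driver, their Pearson correlation can be large while `pcc` conditioned on that driver is close to zero, which is the behaviour PCM relies on to discard indirect links.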

First, we conducted a PCM analysis of the relationships among meteorological factors, GDP, and the number of reported cases in Guangdong Province, which has the highest number of reported cases. Because dengue fever is a mosquito-borne disease, and previous literature indicates that meteorological factors such as temperature and rainfall can indirectly affect dengue fever by influencing mosquitoes [59], we also performed a PCM analysis on the average values of the meteorological factors, MOI, and BI collected from nine cities in Guangdong Province. Following the approach proposed by Leng and colleagues [47], we determined whether there is a direct causal relationship between the factors, the number of reported cases, and the mosquito indices by calculating the partial correlation coefficients and comparing them with a threshold of 0.4. In addition, we tested the significance of the PCM values using the t-statistic.

Principal component analysis

PCA was employed to identify the more important and representative variables [60]. PCA decomposes the total variance of the 11 original variables \(x_1, x_2, \cdots, x_{11}\), denoted \(\sum_{i=1}^{11} D(x_i)\), into the sum of the variances of 11 mutually uncorrelated variables \(y_1, y_2, \cdots, y_{11}\), denoted \(\sum_{i=1}^{11} D(y_i)\). The contribution rate of the \(k\)th principal component \(y_k\) is \(\phi_k = \frac{\lambda_k}{\sum_{j=1}^{11} \lambda_j}\), where \(\lambda_1 \ge \lambda_2 \ge \cdots \ge \lambda_{11}\) are the eigenvalues of the covariance matrix of the original variables. The cumulative contribution rate of the principal components \(y_1, \ldots, y_m\) is \(\psi_m = \frac{\sum_{k=1}^{m} \lambda_k}{\sum_{k=1}^{11} \lambda_k}\), where \(m < 11\). The lower a principal component ranks, the less of the original information it explains.
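
As a minimal sketch, the contribution rates \(\phi_k\) and cumulative contribution rates \(\psi_m\) can be computed from the eigenvalues of the sample covariance matrix. The function name is hypothetical, and whether the variables were standardised beforehand is not stated in the text, so no scaling is applied here.

```python
import numpy as np

def pca_contributions(X):
    """X has shape (n_samples, n_variables).

    Returns (phi, psi): the contribution rate of each principal component,
    ordered from largest to smallest eigenvalue, and the cumulative rates.
    """
    cov = np.cov(X, rowvar=False)
    eigvals = np.linalg.eigvalsh(cov)[::-1]   # eigvalsh is ascending; reverse it
    phi = eigvals / eigvals.sum()
    psi = np.cumsum(phi)
    return phi, psi
```

A common use is to keep the smallest \(m\) for which `psi[m - 1]` exceeds a chosen share of the total variance.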

Regression analysis

The regression analysis explores the nonlinear relationships and lag effects between meteorological factors (temperature and pressure) and the number of reported cases in different provinces. To this end, a distributed lag nonlinear model (DLNM) [61] combined with a generalized additive model (GAM) [62] was used to capture the nonlinear effects of meteorological factors and their time-lagged characteristics. Because temperature and pressure are strongly correlated, separate GAMs were fitted for temperature and for pressure in each province, with GDP included as a covariate. The number of reported cases is assumed to follow a Poisson distribution, and a log link function relates the predictor variables to the number of reported cases. The average values of the meteorological variables are used as the reference values for calculating relative risks. The model is:

$$\log\left[E\left(Y_t\right)\right] = \alpha + cb(M) + s(GDP).$$

Here, \(Y_t\) is the number of reported cases in month t, \(s(\cdot)\) is a natural spline smoothing function, and \(\alpha\) is the intercept term. M represents the meteorological factor (temperature or pressure). \(cb(M)\) denotes the cross-basis function, which is obtained using DLNM to model the nonlinear and distributed lag effects of the meteorological factor. Based on the lag effects of temperature and pressure reported in previous research [11, 27, 63, 64, 65], 6 months was chosen as the maximum lag. When constructing the cross-basis, the lag window was set from 0 to 6 months using the crossbasis() function, and natural splines were used to flexibly estimate the shape of the lag effects. The degrees of freedom were determined by the automatic smoothing selection mechanism in the “mgcv” package in R.
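
To make the 0 to 6 month lag window concrete, the sketch below builds a lagged exposure matrix in which column \(l\) holds the exposure \(l\) months earlier. It is only an illustration of the lag structure under stated assumptions: it is written in Python rather than R, the function name is hypothetical, and it does not reproduce crossbasis(), which additionally applies basis functions over both the exposure and the lag dimensions.

```python
import numpy as np

def lag_matrix(exposure, max_lag=6):
    """Return an (n_months, max_lag + 1) matrix of lagged exposures.

    Entry [t, l] is the exposure in month t - l; months without enough
    history are left as NaN.
    """
    exposure = np.asarray(exposure, dtype=float)
    n = len(exposure)
    Q = np.full((n, max_lag + 1), np.nan)
    for lag in range(max_lag + 1):
        Q[lag:, lag] = exposure[:n - lag]
    return Q
```

With a maximum lag of 6, the first six months of a series have incomplete rows, so they would typically be dropped or handled as missing when a model is fitted.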
