Association of COVID-19 outcomes with measures of institutional and interpersonal trust: an ecological analysis using national data from 61 countries

Data sources

We used data from the WVS Wave 7 as our source for all trust-related covariates. The WVS Wave 7 is the seventh iteration of the cross-national, cross-sectional social survey periodically conducted by the World Values Survey, a non-commercial research program. For Wave 7, over 90,000 adult respondents across 66 countries and regions were interviewed (primarily via face-to-face interviews, with some phone interviews for remote regions) with a standardized questionnaire covering social, ethical, religious, and economic values. Respondents were selected via random probability representative samples within each country surveyed, with a minimum sample size of 1,000 for all countries. Data collection began in mid-2017 and, due to COVID-19-related delays, was completed in July 2023 (ref. 30). Further details of how we analyzed the WVS dataset are below.

Our primary COVID-19 outcomes of interest were confirmed COVID-19 deaths per million from January 1, 2020 to December 31, 2022, sourced from JHU CSSE COVID-19 Data31; the vaccination rate (at least one dose per 100 people) from 2020 to 2023; and excess deaths per million in 2020 and 2021. The latter two outcomes were both sourced from the WHO COVID-19 Surveillance Database32,50,61.

Our confounding variables, GDP per capita (in current $USD) and total life expectancy (in years), were sourced from the World Bank38; educational attainment (average years of schooling for individuals aged 15–64) was sourced from the Barro-Lee dataset, processed by Our World In Data39; urbanicity (percentage of the country’s population living in urban areas) was sourced from Our World In Data40; Freedom Score, a measure of democracy, was sourced from Freedom House41. The 2019 metrics were used to capture the status of each country when the COVID-19 pandemic first began.

These covariate, outcome, and country-level feature data yielded a complete dataset for 61 of the 66 countries and regions surveyed by WVS. For Taiwan, different data sources were necessary: its excess deaths estimate was sourced from the Economist62. Its GDP per capita and life expectancy data, missing from the World Bank, were instead sourced from the International Monetary Fund63 and the CIA World Factbook64; educational attainment was sourced from a report from Taiwan’s Ministry of Education65; and urbanicity was sourced from Worldometer66. Four surveyed regions (Hong Kong, Macau, Puerto Rico, and Northern Ireland) are accounted for in their country-level outcome data (China, the United States, and the United Kingdom, respectively). The final excluded country was Egypt, due to its high levels of nonresponse to WVS survey questions.

Defining and extracting trust covariates from WVS Wave 7

We sought to define trust based on the questions in the WVS Wave 7 survey that included the key words “trust” or “confidence” and were part of the standard questionnaire given to all participating countries. In social science theory, confidence and trust are widely recognized as adjacent, though not equivalent, concepts. Confidence is the belief that an entity will do what it is expected to do, whereas trust is a thornier belief made up not only of confidence in an entity but also faith in that entity’s integrity and competence4,5. In WVS Wave 7, “trust” is used exclusively when asking about interpersonal relations and “confidence” is used exclusively when asking about institutions. We will use the term “trust” to refer to all of the WVS questions, with “interpersonal trust” corresponding to the questions actually labeled “trust” in WVS and “institutional trust” corresponding to the questions actually labeled “confidence.” This decision is motivated by the similarity between the definitions of these two concepts and by existing research based on the WVS that also uses the term “trust” to refer to both items labeled “trust” and “confidence”17.

Thirty-three questions met these criteria, coded \(t=57,\dots,89\), all from the “Social Capital, Trust and Organizational Membership” subsection of the survey. Thirty-two of the 33 questions followed a parallel response structure of four ordinal and four nominal responses, while one question (Q57, “Most people can be trusted”) had two ordinal and four nominal response categories. These three data structures, alongside an example of what the data looks like for one country’s individual responses to one question, are shown in Fig. 3.

Fig. 3. The ordinal and nominal response options for our survey questions of interest, alongside an example of this data for the United States and the question “How much confidence do you have in the government?”

Not all of the 33 eligible questions from the WVS were deemed relevant to our research question of interest, primarily because the mechanism through which trust in this entity would act upon our COVID-19 outcomes appeared conceptually ambiguous, such as with trust in the women’s movement (question 80). These thematically irrelevant questions run the risk of introducing non-informative noise to our model, as they do not pertain to our research aim. In order to clearly operationalize our research goal and remove these spurious questions, we conducted a qualitative analysis of the 33 available items. This process identified key thematic groups through which trust could act upon COVID-19 outcomes: trust in “in” interpersonal groups, such as one’s family; trust in “out” interpersonal groups, such as those of another nationality; trust in media, such as the press; trust in bodies of science, such as universities; trust in domestic governance, such as parliament; and trust in international governance, such as the United Nations. These thematic groups were used to identify questions to include in our analysis; some other individual questions were also selected based on prior literature suggesting a correlation between trust in these entities and trust in public health and medicine. For example, environmental protection is frequently classified as part of public health67 leading to the inclusion of question 79 (“Confidence: The Environmental Protection Movement”). In the end, 23 of the 33 questions were selected for our analysis. See Supplementary Table S1 for a full list of the available WVS questions and those used in our analysis.

Each country (\(j=1,\dots,61\)) in Wave 7 has some \(n_{j,t}\) individual responses to each question \(t\). As our outcomes of interest are at a population level, we sought to aggregate our trust item data from the individual level to the population level so that each country had one data point for each item. For our aggregation, we excluded the nominal non-responses and focused on the ordinal categorical data.

The total \(n_{j,t}\) given by the WVS dataset includes counts of non-responses. For our sample proportions, we will use an adjusted \(n\):

$$n_{j,t}^{*}=n_{j,t}-\left(n_{-1,j,t}+n_{-2,j,t}+n_{-4,j,t}+n_{-5,j,t}\right)$$

where \(n_{i,j,t}\) with \(i\in\{-5,-4,-2,-1,1,2,3,4\}\) is the count for non-response or response category \(i\) in country \(j\) at question \(t\). This adjusted \(n_{j,t}^{*}\) thus counts only substantive response values.

Each country’s count data \(\left(n_{1,j,t},n_{2,j,t},n_{3,j,t},n_{4,j,t}\right)\) follow a multinomial distribution with parameters \(n_{j,t}^{*}\) and \(\left(\pi_{1},\pi_{2},\pi_{3},\pi_{4}\right)\) (country and question subscripts suppressed), where \(n_{j,t}^{*}=\sum_{k}n_{k,j,t}\) is the fixed number of responses (we assume each individual response is independent and identically distributed) and there are four categories of response \(k=1,\dots,4\), such that \(\pi_{k}\) is the probability of observing response \(k\) and \(\sum_{k}\pi_{k}=1\). The multinomial log-likelihood is:

$$L\left(\pi_{j,t}\right)=\sum_{k}n_{k,j,t}\log\pi_{k,j,t}$$

Our maximum likelihood estimates are the sample proportions, calculated as \(p_{j,t}^{k}=\frac{n_{k,j,t}}{n_{j,t}^{*}}\) with \(n_{j,t}^{*}=\sum_{k}n_{k,j,t}\). Thus

$$\sum_{k=1}^{4}p_{j,t}^{k}=\sum_{k=1}^{4}\frac{n_{k,j,t}}{n_{j,t}^{*}}=1.$$

Using these sample proportions, we can calculate a weighted question score as follows:

$$TS_{j,t}=4\,p_{j,t}^{1}+3\,p_{j,t}^{2}+2\,p_{j,t}^{3}+1\,p_{j,t}^{4}$$

where the weights reverse the recorded coding, such that the higher the score, the higher the level of trust. This procedure is used for all questions \(t\) with a four-category ordinal structure, regardless of the wording of the responses, such that the “trust” and “confidence” questions are comparable. Question 57 was similarly aggregated according to its two-category structure, with a weight of 2 for \(p^{1}\) (“Most people can be trusted”) and a weight of 1 for \(p^{2}\) (“Need to be very careful”):

$$TS_{j,57}=2\,p_{j,57}^{1}+1\,p_{j,57}^{2}$$

This resulted in a \(61\times 23\) matrix of countries and question scores used for analysis.

There is substantial within-question variation of country trust scores, particularly for questions about both domestic and international governmental organizations (see Supplementary Figure S3). The most stable questions are those about charities and interpersonal trust, particularly trust in members of one’s community or family. Our exploratory analysis also revealed a series of complexities around missing data in the survey. Country-level non-response was rather low: only a small number of questions were entirely omitted from surveys in a small number of countries. However, individual item non-response, where an individual was asked a question but chose not to respond, varied considerably. While mean individual non-response was below 10% across countries for over three-quarters of the 23 questions, this was not always the case. This was seen particularly in Egypt, where the mean non-response was over 30% and went as high as 74.7%. This led to Egypt’s responses being excluded from further analysis. No other countries were excluded for missingness, and the few remaining question scores with non-response rates above 40% were replaced via single imputation with the median score from countries with missingness below 40%.
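The single-imputation rule for high-missingness scores can be illustrated as follows. This is a hedged sketch: the function name `impute_high_missingness` and the numbers are hypothetical, and treating a non-response rate of exactly 40% as acceptable is an assumption, since the text only specifies "above" and "below" 40%.

```python
from statistics import median


def impute_high_missingness(scores, nonresponse, threshold=0.40):
    """Replace scores whose non-response rate exceeds `threshold` with the
    median score of the remaining countries (single imputation)."""
    donors = [s for s, m in zip(scores, nonresponse) if m <= threshold]
    if not donors:
        raise ValueError("no country has acceptable missingness for this question")
    fill = median(donors)
    return [s if m <= threshold else fill for s, m in zip(scores, nonresponse)]


# Hypothetical scores for one question across five countries.
scores = [2.1, 2.8, 3.0, 1.5, 2.4]
nonresponse = [0.05, 0.10, 0.45, 0.02, 0.08]  # third country exceeds 40%
imputed = impute_high_missingness(scores, nonresponse)
print(imputed)
```

Only the third score is replaced here; the donor median is taken over the four countries with acceptable missingness.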

Statistical analysis: single item regression

To understand how individual trust items were associated with COVID-19 outcomes, we conducted ordinary least-squares regression. The associations between trust and our COVID-19 outcomes of interest were assessed by a series of single item regression models, adjusting for our five country-level features. For each trust item and each outcome of interest, the following model was used:

$$\begin{aligned} \text{E}\left(Y_{r}\right) & =\beta_{0,r}+\beta_{1,r}X+\beta_{2,r}\,\text{GDP}+\beta_{3,r}\,\text{Life expectancy}+\beta_{4,r}\,\text{Education} \\ & \quad +\beta_{5,r}\,\text{Urbanicity}+\beta_{6,r}\,\text{Freedom Score} \end{aligned}$$

where the subscript r denotes that the model is built with respect to the r-th COVID-19 outcome of interest (confirmed deaths per million, COVID-19 vaccination rate per 100 people, estimated excess deaths per million), and X denotes the question of interest (any of the 23 items included in our analysis).

None of the outcomes were transformed for regression. Measures of GDP, life expectancy, education, urbanicity, and democracy were included to control for potential confounding from healthcare infrastructure, population health, and overall country development. While a multiple linear regression model was considered, the high levels of collinearity between the trust items, coupled with the relatively low sample size (\(N=61\)) made even sparse regression techniques like elastic net regression unstable and ultimately uninformative.
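To make the single item procedure concrete, the sketch below fits one adjusted ordinary least-squares model per trust item and collects the coefficient on the item. It is an illustration under stated assumptions, not the analysis code: the helper names `ols` and `single_item_regressions` and the synthetic data are hypothetical, only one stand-in country-level feature is used, and the p-values reported in the paper are not computed here.

```python
def ols(X, y):
    """Ordinary least squares via the normal equations (X'X) b = X'y.

    X is a list of rows without an intercept; an intercept column is added here.
    The system is solved by Gauss-Jordan elimination with partial pivoting.
    """
    rows = [[1.0] + list(r) for r in X]
    p = len(rows[0])
    # Augmented matrix [X'X | X'y].
    A = [[sum(r[a] * r[b] for r in rows) for b in range(p)]
         + [sum(r[a] * yi for r, yi in zip(rows, y))]
         for a in range(p)]
    for col in range(p):
        pivot = max(range(col, p), key=lambda r: abs(A[r][col]))
        if abs(A[pivot][col]) < 1e-12:
            raise ValueError("design matrix is rank deficient")
        A[col], A[pivot] = A[pivot], A[col]
        for r in range(p):
            if r != col:
                factor = A[r][col] / A[col][col]
                A[r] = [a - factor * b for a, b in zip(A[r], A[col])]
    return [A[i][p] / A[i][i] for i in range(p)]


def single_item_regressions(trust_items, features, y):
    """Fit one adjusted model per trust item and return each item's coefficient."""
    coefs = {}
    for name, column in trust_items.items():
        X = [[x] + list(f) for x, f in zip(column, features)]
        coefs[name] = ols(X, y)[1]  # index 0 is the intercept, index 1 the trust item
    return coefs


# Synthetic data for six hypothetical countries: y = 10 - 2 * trust + 0.5 * feature.
trust = [1.0, 2.0, 3.0, 4.0, 2.5, 3.5]
feature = [3.0, 1.0, 4.0, 1.0, 5.0, 9.0]
y = [10.0 - 2.0 * t + 0.5 * g for t, g in zip(trust, feature)]
coefs = single_item_regressions({"Q_example": trust}, [[g] for g in feature], y)
print(coefs)
```

Because the synthetic outcome is an exact linear function of the inputs, the recovered coefficient on the trust item is approximately -2. In practice a statistics package would also supply standard errors and p-values.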

Statistical analysis: deterministic and probabilistic trust clustering and regression

We sought to classify countries based on their level of trust and analyze whether cluster membership impacted COVID-19 outcomes (i.e. whether being a “high trust” country was associated with better COVID-19 outcomes than being a “low trust” country). We used a two-step process that first identified clusters based on trust item scores and then regressed our COVID-19 outcomes of interest on cluster membership, adjusting for our country-level features. The first clustering approach used was k-means clustering, which minimizes within-cluster variances for a pre-specified number \(k\) of clusters. A variety of visual and analytical tools, including the Akaike information criterion (AIC), the Bayesian information criterion (BIC), gap statistics, and elbow method results, were used to select \(k=3\) clusters based on this algorithm.
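For readers unfamiliar with k-means, the following is a minimal sketch of Lloyd's algorithm, the standard iterative procedure for this objective. It is not the implementation used in the analysis: the function name `kmeans`, the toy two-dimensional points, and the choice to pass initial centroids explicitly (which makes the run deterministic) are all assumptions made for illustration, and the number of clusters is simply taken from the number of initial centroids rather than selected by the criteria named above.

```python
def kmeans(points, centroids, max_iter=100):
    """Lloyd's algorithm: alternate assignment and centroid updates until labels stop changing."""
    centroids = [list(c) for c in centroids]
    labels = None
    for _ in range(max_iter):
        # Assignment step: nearest centroid by squared Euclidean distance.
        new_labels = [
            min(range(len(centroids)),
                key=lambda k: sum((a - b) ** 2 for a, b in zip(p, centroids[k])))
            for p in points
        ]
        if new_labels == labels:
            break
        labels = new_labels
        # Update step: each centroid moves to the mean of its members.
        for k in range(len(centroids)):
            members = [p for p, l in zip(points, labels) if l == k]
            if members:  # keep the previous centroid if a cluster becomes empty
                centroids[k] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids


# Toy trust scores for six hypothetical countries on two items.
points = [[1.0, 1.2], [1.1, 0.9], [2.4, 2.5], [2.6, 2.4], [3.8, 3.9], [3.9, 3.7]]
labels, centroids = kmeans(points, [points[0], points[2], points[4]])
print(labels, centroids)
```

With these toy points the three pairs of nearby countries end up in three separate clusters, which could then be ordered by centroid and labeled low, medium and high trust.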

The second two-stage approach was clustering using finite Gaussian mixture modeling. Here, we assume that the datapoints for each country are generated by a distribution made up of \(K\) groups with \(K\) corresponding components. This distribution is of the form

$$f\left(x_{i};\Psi\right)=\sum_{k=1}^{K}\pi_{k}f_{k}\left(x_{i};\beta_{k}\right),$$

where \(\Psi=\left(\pi_{1},\dots,\pi_{K-1},\beta_{1},\dots,\beta_{K}\right)\) is the vector of all model parameters, \(f_{k}\left(x_{i};\beta_{k}\right)\) is the \(k\)-th component’s density function for \(x_{i}\) with parameters \(\beta_{k}\), and \(\pi_{k}\) is the mixing probability of belonging to component \(k\) such that \(\sum_{k=1}^{K}\pi_{k}=1\). The data’s (complete-data) log-likelihood is then

$$\log L\left(\Psi\right)=\sum_{k=1}^{K}\sum_{i=1}^{n}z_{ki}\left(\log\pi_{k}+\log f_{k}\left(x_{i};\beta_{k}\right)\right),$$

where \(z_{ki}\) is the indicator for whether \(x_{i}\) belongs to component \(k\). As we assume each component’s density is Gaussian, \(f_{k}\left(x_{i};\beta_{k}\right)\) is the density of \(\text{N}\left(\mu_{k},\Sigma_{k}\right)\), with \(\beta_{k}=\left(\mu_{k},\Sigma_{k}\right)\). The expectation-maximization algorithm can then be used to obtain estimates for \(\mu_{k},\Sigma_{k},\) and \(\pi_{k}\). This is the approach taken in the R package ‘mclust’, which was used for this analysis68. Through an analysis of BIC, ICL (integrated completed likelihood), and log-likelihood values, \(K=3\) components were selected.
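For reference, the textbook expectation-maximization updates for an unconstrained Gaussian mixture are given below. This is a general sketch rather than a description of the exact ‘mclust’ fit, which may additionally constrain the covariance structure. Here \(\hat{\tau}_{ki}\) (a symbol introduced for this illustration) denotes the conditional expectation of the component indicator for observation \(x_{i}\) and component \(k\), and \(\phi\) denotes the multivariate Gaussian density.

$$\begin{aligned} \text{E-step:}\quad & \hat{\tau}_{ki}=\frac{\pi_{k}\,\phi\left(x_{i};\mu_{k},\Sigma_{k}\right)}{\sum_{l=1}^{K}\pi_{l}\,\phi\left(x_{i};\mu_{l},\Sigma_{l}\right)} \\ \text{M-step:}\quad & \pi_{k}=\frac{1}{n}\sum_{i=1}^{n}\hat{\tau}_{ki},\qquad \mu_{k}=\frac{\sum_{i=1}^{n}\hat{\tau}_{ki}\,x_{i}}{\sum_{i=1}^{n}\hat{\tau}_{ki}},\qquad \Sigma_{k}=\frac{\sum_{i=1}^{n}\hat{\tau}_{ki}\left(x_{i}-\mu_{k}\right)\left(x_{i}-\mu_{k}\right)^{T}}{\sum_{i=1}^{n}\hat{\tau}_{ki}} \end{aligned}$$

The two steps are iterated until the log-likelihood stabilizes. Each country can then be assigned to the component with the largest \(\hat{\tau}_{ki}\), which is what makes this clustering probabilistic rather than deterministic.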

For both methods, ordinary least-squares linear regression was used to assess the statistical significance of cluster membership in relation to our three COVID-19 outcomes of interest and trust subdomain scores. Our outcomes were individually regressed on a categorical variable for cluster membership, adjusting for country-level features, as shown below:

$$\begin{aligned} \text{E}\left(Y_{r}\right) & =\beta_{0,r}+\beta_{1,r}I\left(C_{i}=\text{Low}\right)+\beta_{2,r}I\left(C_{i}=\text{Medium}\right) \\ & \quad +\beta_{3,r}\,\text{GDP}+\beta_{4,r}\,\text{Life expectancy}+\beta_{5,r}\,\text{Education}+\beta_{6,r}\,\text{Urbanicity}+\beta_{7,r}\,\text{Freedom Score} \end{aligned}$$

with the subscript \(r\) denoting that the model is built with respect to the \(r\)-th COVID-19 outcome of interest (confirmed deaths per million, excess deaths per million, COVID-19 vaccination rate per 100 people), \(C_{i}\) referring to cluster membership of country \(i\), and \(\beta_{0,r}\) representing the intercept, which corresponds to the reference “high trust” cluster.

Statistical analysis: Bayesian profile regression

After our two-step clustering, we sought to employ a method that could perform simultaneous clustering and regression and thus better incorporate uncertainty in cluster membership into the model. To do this, we used Bayesian profile regression as implemented in the R package ‘PReMiuM’69. Based on Dirichlet process mixture modeling (DPMM), this framework uses Markov Chain Monte Carlo methods to non-parametrically link the outcome and covariates through cluster membership while adjusting for country-level features48,70. Though this DPMM-based method is not a direct joint analog to our two-step methods (which use finite mixture models for GMM), we chose to leverage this gold standard method to be able to preserve the continuous nature of our covariates. As the outcome variable is considered in the clustering produced by this method, we ran three distinct joint profile regression models, one for each of our COVID-19 outcomes of interest, so that, in contrast to our two-step findings, we had different clustering assignments for each outcome.

For each COVID-19 outcome of interest \(y_{i}\), our model is \(y_{i}=\theta_{Z_{i}}+\beta^{T}W_{i}+\epsilon_{i}\), where \(\theta_{Z_{i}}\) is the random effect coefficient of cluster membership, \(\beta^{T}W_{i}\) is the term encapsulating the fixed effects (our five country-level features), whose coefficients \(\beta\) do not change with cluster membership, and \(\epsilon_{i}\) is the normally distributed error term. As both our covariates (the trust item scores) and outcomes are continuous, we assumed a mixture of Gaussian distributions for covariates and a Gaussian distribution for our response \(y_{i}\), whose mean \(\mu_{i}\) is given by \(\mu_{i}=\theta_{Z_{i}}+\beta^{T}W_{i}\). A blocked Gibbs sampler was then used to generate estimates from \(h\) runs. From these runs, partitioning around medoids on the dissimilarity matrix was used to identify the ‘best’ partition for each candidate number of clusters, and the candidate with the highest average silhouette width was chosen as the overall ‘best’ and representative partition. For further details, consult the ‘PReMiuM’ manual69.

To run the models for each outcome, we scaled and centered all covariates and country-level features. Because the number of parameters (23 trust items and five country-level features) was large relative to our sample size, we chose a subset of the trust items to use in our analysis to mitigate multicollinearity. We selected these subsets by ranking the single item regression outputs by p-value and magnitude of association to identify the most important trust items for each outcome. For the two deaths per million outcomes, the top ten questions were used; for the vaccination rate, only the top five were used to obtain a stable model. The distribution of vaccination rate across countries (already more ambiguous than death rates when clustered by trust, as seen in our two-step analysis) likely contributed to the difficulty in finding a stable model for this outcome-influenced regression. Once the questions were selected, we used a burn-in period of \(h=5000\) runs with 10,000 recorded sweeps, the default priors provided by ‘PReMiuM’, and an initial number of clusters of two. The number of clusters was not fixed and changed throughout the model runs; the final output for the two death rates identified the optimal number of clusters as two, but the final output for vaccination rate identified the optimal number as three. Postprocessing returned representative clusters summarizing all clustering explored throughout the algorithm. Using these representative clusters, we calculated highest posterior density intervals for all trust subdomain scores and the empirical mean outcomes for each cluster. Finally, we back-transformed these results from their scaling and centering for ease of comparison with our two-step findings.
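The interval and back-transformation steps can be illustrated with a short sketch. It assumes a unimodal posterior summarized by a list of MCMC draws and uses the common shortest-interval definition of a highest posterior density interval. The function names `hpd_interval` and `back_transform` and all numbers are hypothetical, and ‘PReMiuM’ or other R tooling may compute these quantities differently.

```python
import math


def hpd_interval(samples, mass=0.95):
    """Shortest interval containing at least `mass` of the posterior draws."""
    draws = sorted(samples)
    n = len(draws)
    if n == 0:
        raise ValueError("no posterior draws supplied")
    window = min(n, max(1, math.ceil(mass * n)))  # draws the interval must cover
    # Slide a window of that size over the sorted draws and keep the narrowest one.
    best = min(range(n - window + 1),
               key=lambda i: draws[i + window - 1] - draws[i])
    return draws[best], draws[best + window - 1]


def back_transform(z, mean, sd):
    """Undo scaling and centering: map a standardized value back to original units."""
    return z * sd + mean


# Made-up posterior draws for illustration only.
draws = [0.0, 4.0, 5.0, 6.0, 7.0, 9.0, 14.0, 20.0, 30.0, 50.0]
interval = hpd_interval(draws, mass=0.5)
print(interval)
print(back_transform(1.5, mean=10.0, sd=2.0))
```

For a skewed posterior like this toy example, the shortest interval sits in the dense region of the draws rather than being symmetric around the mean, which is the reason for preferring it over an equal-tailed interval.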
