Data source
The data analyzed are from the 2019/2020 Uganda National Household Survey (UNHS), which collected socio-economic and behavioral data on various indicators to inform and monitor the development policies of national and international frameworks14. The UNHS is conducted every 5 years and the 2019/2020 UNHS is the seventh in the series since its inception in 1999/2000. Household demographic, social, and economic data were collected from a nationally representative sample of urban and rural areas in 15 geographical regions: 14,480 households from 1448 sample clusters. The UNHS used the same sampling frame as the Uganda Population and Housing Census conducted in August 2014. The sampling frame was the complete list of enumeration areas created for the national census, comprising 78,692 enumeration areas; it excluded refugees, forests and forest reserves, and institutions such as schools and universities. Each of the 129 districts at the time of the survey was subdivided into sub-counties, the sub-counties into parishes, the parishes into villages, and finally the villages into enumeration areas. Each enumeration area had a designated residence type, urban or rural. Enumeration areas with fewer than 50 households were linked to other enumeration areas using a geographical information system so that the primary sampling units were not too small. Samples were selected independently from each stratum, based on enumeration areas and socioeconomic status, using a probability proportional to size approach.
The survey employed a two-stage stratified sampling design. In the first stage, enumeration areas were grouped by districts of similar socio-economic characteristics and by rural versus urban location. In the second stage, households were sampled using a systematic random sampling approach. Overall, 1651 enumeration areas were selected from the 2014 National Population and Housing Census list to constitute the sampling frame, grouped into sub-regions considering the standard errors required for the estimation of poverty indicators. The districts were stratified into 15 sub-regions based on similarity in socio-demographic characteristics. Because some districts in Uganda lie in mountainous areas where access is difficult, districts were also divided into mountainous versus non-mountainous. Overall, 15,786 households were selected for the survey and 13,732 households were successfully interviewed, giving a 90% response rate (93% rural vs. 84% urban). Two rounds of random data collection were conducted, one before the COVID-19 pandemic and another during the pandemic, with response rates of 93% and 89%, respectively. Funding for the survey was provided by the Government of Uganda, and the survey included 65,080 individuals.
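The two selection stages described above can be illustrated with a minimal Python sketch. It is not the survey's actual selection procedure: the enumeration-area sizes, the random seed, the number of areas drawn, and the function names pps_systematic and systematic_sample are all hypothetical. The take of 10 households per enumeration area follows from the reported 14,480 households in 1448 clusters.

```python
import random
from itertools import accumulate

def pps_systematic(sizes, n_select, rng):
    """Stage 1: pick n_select units with probability proportional to size.

    Uses systematic PPS: a random start, then equally spaced points laid
    over the cumulative size totals. Returns indices of selected units.
    """
    total = sum(sizes)
    interval = total / n_select
    start = rng.random() * interval
    cum = list(accumulate(sizes))
    picks, j = [], 0
    for k in range(n_select):
        point = start + k * interval
        # Advance to the unit whose cumulative range contains the point.
        while j < len(cum) - 1 and cum[j] <= point:
            j += 1
        picks.append(j)
    return picks

def systematic_sample(units, n_select, rng):
    """Stage 2: systematic random sample (random start, fixed step)."""
    step = len(units) / n_select
    start = rng.random() * step
    last = len(units) - 1
    return [units[min(int(start + k * step), last)] for k in range(n_select)]

rng = random.Random(2020)  # hypothetical seed, for reproducibility only
# Hypothetical household counts for 40 enumeration areas (EAs) in one stratum.
ea_households = [rng.randint(50, 300) for _ in range(40)]

selected_eas = pps_systematic(ea_households, 5, rng)
sample = {
    ea: systematic_sample(list(range(ea_households[ea])), 10, rng)
    for ea in selected_eas
}
```

In this sketch, larger enumeration areas are more likely to be drawn in the first stage, while the fixed take of 10 households in the second stage keeps the workload per cluster constant.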
Ethical considerations
The 2019/2020 Uganda National Household Survey (UNHS) dataset is publicly available at https://microdata.worldbank.org/index.php/catalog/390214. The data are de-identified and therefore, according to the Uganda National research guidelines, no ethical approval was required. For this analysis, a request to conduct an analysis on the UNHS dataset was submitted to the Uganda Bureau of Statistics (UBOS), the agency in charge of the survey. UBOS provided a dataset containing variables necessary to complete the analysis. Therefore, all methods were carried out in accordance with relevant guidelines and regulations.
Study design and variable measurements
We designed a quasi-experimental study using nationally representative data, with the COVID-19 pandemic as the primary exposure. The exposed group comprised individuals surveyed during the COVID-19 pandemic, while the unexposed group comprised those surveyed before the pandemic. Because exposure was not randomized, the exposed and unexposed groups were not comparable on several measured covariates, which justified a quasi-experimental approach15. To achieve comparability in measured covariates between the two groups, we applied propensity-score weighting, thus emulating a randomized controlled trial16. Propensity-score weighting reduced the systematic differences in covariate distribution between the exposed and unexposed groups, enabling an unbiased measure of the effect of the primary exposure on the study outcomes17. The propensity score ranges from 0 to 1 and is the probability of being in the exposed group given the observed covariates17.
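In symbols, the propensity score described above can be written as follows. The notation is introduced here for illustration only and does not appear in the source: Z_i is the exposure indicator (1 if individual i was surveyed during the pandemic, 0 if before) and X_i is the vector of observed covariates.

```latex
\[
e(x_i) = \Pr(Z_i = 1 \mid X_i = x_i), \qquad 0 \le e(x_i) \le 1
\]
```

For weighting to be well defined, the score must in practice lie strictly between 0 and 1 for every individual, so that each person has some chance of belonging to either group.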
The outcomes of interest were the frequency of smoking, alcohol consumption, and substance use. Each outcome was measured on an ordinal scale. In the survey, individuals were asked to state how frequently they smoked tobacco products, consumed alcohol, or used drugs, with the response options being none or not at all, less than daily, and daily. The tobacco products investigated in the survey included smoked tobacco, such as cigarettes, cigars, pipes full of tobacco, shisha, and others, and smokeless tobacco, such as snuff, chewed tobacco, betel quid with tobacco, and others. The substances asked about in the survey were opium, marijuana, and cannabis. For alcohol consumption, participants reported any form of alcohol consumed. The covariates included the individual's place of residence (region, sub-region, and district), whether the individual was from a mountainous region, and whether the setting was rural or urban.
Other covariates included the individual’s sex, age, level of education, literacy levels, wealth quintile, labor (employment) status, marital status, and whether the individual was an internet or phone user. Age was initially measured in absolute years but later categorized as 15–24 years, 25–59 years, and ≥ 60 years to depict young, middle-aged, and older persons. The mobile phone technology considered was any portable telephone, both basic phones and smartphones, subscribing to a mobile telephone service provider and allowing financial transactions. This study hypothesized that mobile phone owners versus non-owners, and internet users versus non-users may differ systematically, introducing potential selection bias. The observed differences could correlate with the study outcomes, making the inclusion of mobile phone ownership and internet use as covariates important for improving the validity of the effect estimates.
Statistical analysis
The analysis was performed in R (R version 4.2.1 2022-06-23 ucrt). We performed exploratory data analysis in which we summarized categorical data using frequencies and percentages, and numerical data using the mean and standard deviation if normally distributed, or the median and interquartile range otherwise. We cross-tabulated the covariates by the COVID-19 pandemic period (exposed vs. unexposed) and assessed differences in covariate distribution using tests of statistical significance at the 5% level. The chi-square test was used to assess differences in the distribution of categorical variables by COVID-19 exposure, while the Student's t-test was used to assess mean differences in numerical data between the exposed and unexposed groups.
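As an illustration of the chi-square comparison described above, the following Python sketch computes the Pearson chi-square statistic for a cross-tabulation of a categorical covariate by exposure group. It is not the authors' R code: the counts are hypothetical, no continuity correction is applied, and the p-value helper p_value_df1 covers only the one-degree-of-freedom case (a 2 × 2 table).

```python
import math

def chi_square_statistic(table):
    """Pearson chi-square statistic and degrees of freedom for an r x c table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count under independence of row and column variables.
            expected = row_totals[i] * col_totals[j] / n
            stat += (observed - expected) ** 2 / expected
    df = (len(table) - 1) * (len(table[0]) - 1)
    return stat, df

def p_value_df1(stat):
    """Upper-tail p-value of a chi-square statistic with 1 degree of freedom."""
    return math.erfc(math.sqrt(stat / 2))

# Hypothetical counts: rows = covariate level (e.g. urban, rural),
# columns = exposure group (unexposed, exposed).
table = [[30, 20],
         [20, 30]]
stat, df = chi_square_statistic(table)
p = p_value_df1(stat)
significant = p < 0.05  # 5% significance level, as in the text
```

With these hypothetical counts, every expected cell count is 25, so the statistic is 4.0 on 1 degree of freedom.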
We fitted a generalized boosted model (GBM) of the exposure as a function of the covariates, adjusting for robust standard errors, and used the fitted model to predict propensity scores. Compared with a binary logistic regression model, a GBM is more appropriate for generating propensity scores because it automatically accommodates polynomial and interaction terms where needed, rather than relying on manual specification. A GBM uses an iterative algorithm to combine many simple decision trees into a complex model, hence producing a better fit18. To achieve covariate comparability between the exposed and unexposed groups, we created a pseudo-population by weighting the exposed group by the inverse of the propensity score (1/propensity score) and the unexposed group by the inverse of one minus the propensity score (1/[1 − propensity score])19.
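The weighting rule described above can be sketched in a few lines of Python. The sketch takes the propensity scores as given and does not reproduce the GBM fit; the function name ipw_weights and the example values are hypothetical.

```python
def ipw_weights(exposed, propensity):
    """Inverse probability weights from exposure indicators and propensity scores.

    Exposed individuals (1) get 1 / e; unexposed individuals (0) get 1 / (1 - e).
    """
    weights = []
    for z, e in zip(exposed, propensity):
        if not 0.0 < e < 1.0:
            # A score of exactly 0 or 1 would give an undefined weight.
            raise ValueError("propensity scores must lie strictly between 0 and 1")
        weights.append(1.0 / e if z == 1 else 1.0 / (1.0 - e))
    return weights

# Hypothetical data: two exposed and two unexposed individuals.
exposed = [1, 1, 0, 0]
propensity = [0.8, 0.5, 0.2, 0.5]
weights = ipw_weights(exposed, propensity)
```

An exposed individual with a low propensity score, or an unexposed individual with a high one, receives a large weight, because such individuals are under-represented in their own group relative to the combined population.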
We then assessed covariate balance (comparability) between the exposed and unexposed groups graphically, using a back-to-back (mirror) histogram of the propensity scores, and statistically, using the absolute standardized mean difference (SMD), with an SMD < 0.1 considered suggestive of balanced covariates20. After achieving covariate comparability between the two groups, we estimated the effect of the COVID-19 pandemic on the study outcomes using ordered logistic regression, as the outcomes were ordinal, applying the propensity score weights. The results were reported as weighted proportional odds ratios (weighted pOR) with 95% confidence intervals (CI).
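The balance check can be illustrated with a minimal Python sketch of the absolute SMD for a single covariate. It is a sketch under stated assumptions rather than the authors' implementation: the denominator pools the weighted variances of the two groups, which is one of several conventions used by software, and the data, weights, and function names are hypothetical.

```python
import math

def weighted_mean(x, w):
    return sum(wi * xi for xi, wi in zip(x, w)) / sum(w)

def weighted_var(x, w):
    m = weighted_mean(x, w)
    return sum(wi * (xi - m) ** 2 for xi, wi in zip(x, w)) / sum(w)

def abs_smd(x, exposed, w):
    """Absolute standardized mean difference of covariate x between groups."""
    x1 = [xi for xi, z in zip(x, exposed) if z == 1]
    w1 = [wi for wi, z in zip(w, exposed) if z == 1]
    x0 = [xi for xi, z in zip(x, exposed) if z == 0]
    w0 = [wi for wi, z in zip(w, exposed) if z == 0]
    # Pool the two group variances for the denominator.
    pooled_sd = math.sqrt((weighted_var(x1, w1) + weighted_var(x0, w0)) / 2)
    return abs(weighted_mean(x1, w1) - weighted_mean(x0, w0)) / pooled_sd

# Hypothetical binary covariate (1 = urban) for four exposed, four unexposed.
urban = [1, 1, 1, 0, 1, 0, 0, 0]
exposed = [1, 1, 1, 1, 0, 0, 0, 0]

smd_unweighted = abs_smd(urban, exposed, [1.0] * 8)
# Hypothetical weights that up-weight the under-represented individuals.
smd_weighted = abs_smd(urban, exposed, [1, 1, 1, 3, 3, 1, 1, 1])
balanced = smd_weighted < 0.1  # threshold used in the text
```

In this toy example the unweighted groups differ markedly (75% vs. 25% urban), whereas after weighting both groups are 50% urban and the SMD falls to zero.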
Additional analyses
We assessed whether the propensity score model (GBM) was correctly specified using a propensity score model specification test, with the null hypothesis being that the model was correctly specified. We assessed the robustness of the effect estimates by comparing them with the results of unadjusted and adjusted ordered logistic regression models. We also performed a sub-group analysis comparing the outcomes during and before the COVID-19 pandemic separately for men, women, rural residents, and urban residents. All results from the unadjusted and adjusted ordered logistic regression analyses were reported in the supplementary material as unweighted proportional odds ratios (unweighted pOR). We tested the proportional odds assumption for ordered logistic regression using the Brant test. The null hypothesis was that the parallel regression assumption holds and the alternative hypothesis was that it does not. Additionally, we conducted both non-causal and causal analyses to examine the exposure-outcome relationship comprehensively. The non-causal analysis described the raw association, while the causal analysis adjusted for confounding to support more robust causal inferences. Comparing the results of the two approaches showed the impact of the adjustments made in the causal model and its value in addressing confounding bias.
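For reference, the proportional odds model underlying these analyses can be written as follows. The notation is introduced here for illustration and is not taken from the source: Y_i is the ordinal outcome coded 1 (none), 2 (less than daily), or 3 (daily), and Z_i is the exposure indicator. The sign convention is one common parametrization and may differ between software packages.

```latex
\[
\mathrm{logit}\,\Pr(Y_i \le j \mid Z_i) = \alpha_j - \beta Z_i,
\qquad j = 1, 2,
\qquad \mathrm{pOR} = e^{\beta}
\]
```

Under this parametrization, a pOR above 1 corresponds to higher odds of being in a more frequent use category during the pandemic. The proportional odds (parallel regression) assumption is that the same β applies at both cut-points j = 1 and j = 2, which is the null hypothesis assessed by the Brant test.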
Reporting of findings
We followed the propensity score analysis guidelines21 and the guidelines for Improving the Reporting Quality of Nonrandomized Evaluations of Behavioral and Public Health Interventions: The TREND statement22 in reporting the findings.