Data
Aggregated Scottish vaccination data were provided to the authors by the Electronic Data Research and Innovation Service (eDRIS). eDRIS serves as the point of contact for access to Public Health Scotland’s administrative data for research. These data were provided to the authors under a data sharing agreement, approved by the NHS Scotland Public Benefit and Privacy Panel for Health and Social Care (HSC-PBPP). Data were provided at an aggregated level and no individuals are cited, so there was no requirement for informed consent for this work. As these data were used for research only, no additional approval was required to conduct this study.
Individuals are grouped by sex, age range (0–19, 20–29, 30–39, 40–49, 50–59, 60–69, 70+), and residing datazone (DZ; census areas of the order of 500–1,000 individuals, each with an area as low as 0.15–0.4 km² in densely populated areas). For each of these groups, the data give the number of individuals who have received exactly 1 dose, exactly 2 doses, exactly 3 doses, and exactly 4 doses. When the number of individuals is fewer than 5, the exact number is not provided, and we take an estimate (see Supplementary Methods 1).
Population denominators are taken from the census table UV102b, giving small-area populations by age and sex as of 22 March 2022 (ref. 32). Data on small-area population breakdown by ethnicity are also obtained from the 2022 census data, table UV201b (ref. 33).
Measures of deprivation are taken from the Scottish Index of Multiple Deprivation (SIMD) dataset (ref. 23). These data are publicly available. The SIMD contains measures of different indicators of deprivation at the DZ level (e.g., the percentage of residents living in overcrowded housing). The SIMD also ranks DZs by deprivation in each of Access (e.g., broadband speeds, travel time to public services), Income, Employment, Education, Health, Crime and Housing. These ranks are derived from a weighted average of individual deprivation measures (see Supplementary Methods 1 for further details). A DZ with rank 1 is considered to have the highest relative deprivation, and a DZ with rank 6,976 (out of 6,976) the lowest. An overall deprivation rank and decile are also given from a weighted average of all measures.
COVID-19 vaccination in Scotland began with the administration of a first primary dose, followed by a second primary dose from eight weeks later. Three months after this initial course, adults became eligible for a first booster dose, with first boosters commencing in Autumn 2021. A second round of booster vaccination was available to over-75s and those otherwise considered vulnerable to severe COVID-19 disease in Spring 2022. A further round of booster vaccination was then available to over-50s and those otherwise vulnerable in Autumn 2022.
These broad-scale trends in the vaccine data used are summarised in Fig. 1.
Fig. 1. a First boosters. b Second boosters. Overall uptake is the proportion of all individuals to have received a booster (the denominator being the population size). Decile 1 contains the most deprived DZs, and Decile 10 the least deprived. Second boosters were not widely available to those aged below 50.
We distinguish between two characterisations of booster uptake. Overall uptake is the proportion of individuals to have received a booster vaccination. The denominator is the population. Returning uptake is the proportion of individuals who have received at least one dose and have returned for a booster. The denominator is the number of individuals to have received at least one dose. The product of returning booster uptake and overall first dose uptake is then the overall booster uptake. Our model will be fit to returning uptake, but we report in terms of overall uptake where appropriate.
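As a minimal illustration of these two definitions, the following Python sketch computes both measures for a single cohort from exact-dose counts. The function name, argument names and example counts are hypothetical and are not taken from the data. First booster recipients are assumed to be those with exactly 3 or exactly 4 doses, consistent with treating the first booster as the third dose.

```python
def first_booster_uptake(population, n1, n2, n3, n4):
    """Return (first_dose, returning, overall) uptake for one cohort.

    n1..n4 are the numbers of individuals with exactly 1, 2, 3 and 4 doses.
    """
    vaccinated = n1 + n2 + n3 + n4  # received at least one dose
    boosted = n3 + n4               # received at least three doses
    if population <= 0 or vaccinated <= 0:
        raise ValueError("need a positive population and at least one vaccinee")
    first_dose = vaccinated / population  # overall first-dose uptake
    returning = boosted / vaccinated      # returning booster uptake
    overall = boosted / population        # overall booster uptake
    return first_dose, returning, overall


# Hypothetical cohort of 80 people (illustrative numbers only).
first_dose, returning, overall = first_booster_uptake(80, n1=6, n2=10, n3=40, n4=8)

# Overall uptake is the product of first-dose uptake and returning uptake.
assert abs(first_dose * returning - overall) < 1e-12
```

Because the two denominators differ, a cohort can have high returning uptake but low overall uptake when few of its members received a first dose.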
We exclude the 0–19 age bracket, which includes many very young individuals who were not eligible for any vaccine or booster. Finally, a small fraction of individuals with severely weakened immune systems are eligible for additional primary doses, on top of boosters (ref. 34). Due to the structure of the data used here, we define first booster uptake for all individuals as uptake of the third available dose, of any type, and second booster uptake as uptake of the fourth available dose.
A model for first booster uptake
The 6,976 DZs, 6 age ranges, and 2 sexes divide the population into 83,712 subpopulations, each with ~0–100 individuals, which we term cohorts. We fit a Random Forest regression to cohort-level returning booster uptake. The model is informed by age range, sex, ethnicity (% population not identifying as white British), and DZ-level deprivation ranks by Access (e.g., broadband speeds, travel time to public services), Income, Employment, Education, Health, Crime and Housing.
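The cohort count can be checked with a short enumeration. This is only a sketch: the integer DZ identifiers and the sex labels are placeholders assumed for illustration, not the codes used in the data.

```python
from itertools import product

N_DATAZONES = 6976
# The 0-19 bracket is excluded, leaving six age ranges.
AGE_RANGES = ("20-29", "30-39", "40-49", "50-59", "60-69", "70+")
SEXES = ("female", "male")  # labels are an assumption

# Placeholder identifiers; real datazone codes are not reproduced here.
datazones = range(N_DATAZONES)

# One cohort per (datazone, age range, sex) combination.
cohorts = list(product(datazones, AGE_RANGES, SEXES))
assert len(cohorts) == 83_712
```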
To keep the model simple as well as suitably defined for generating scenarios with lower uptake later, we use DZ-level ranks rather than individual measures of deprivation as explanatory variables (these themselves have a strong degree of correlation, see Supplementary Fig. 1). Details of the individual deprivation ranks that feed into the model are given in Supplementary Methods 1.
Statistics and reproducibility
We trained a set of models with different hyperparameters, testing across: number of variables tested per tree split (2, 3, 4, 5), maximum node size (2000, 3000, 4000), random number seed (7 seeds tested each), and training proportion (70%, 80%). Each model had 1000 trees. We chose the hyperparameters and seed of the model (node size 5000, training proportion 80%, max node size 4000) that explained the most DZ-level variation (R-squared) in the DZs the model was not trained on. Fixing the random number seed ensures the same model is produced in each execution of the code.
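The search described here can be sketched in Python as follows. This is an illustrative outline rather than the code used in the study: it assumes the grid is fully crossed, the seed values are placeholders (only the number of seeds is stated), and `select_best` expects a user-supplied scoring function that trains a model and returns its held-out R-squared.

```python
from itertools import product

VARS_PER_SPLIT = (2, 3, 4, 5)
MAX_NODE_SIZE = (2000, 3000, 4000)
SEEDS = tuple(range(7))         # placeholder seed values (assumption)
TRAIN_PROPORTION = (0.7, 0.8)

# Every combination of settings, assuming a fully crossed design.
grid = list(product(VARS_PER_SPLIT, MAX_NODE_SIZE, SEEDS, TRAIN_PROPORTION))


def r_squared(observed, predicted):
    """Proportion of variance in `observed` explained by `predicted`."""
    mean = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean) ** 2 for o in observed)
    return 1.0 - ss_res / ss_tot


def select_best(candidates, score):
    """Return the candidate setting with the highest score.

    `score` is a hypothetical callable mapping a setting to held-out R-squared.
    """
    return max(candidates, key=score)
```

Under the fully crossed assumption the grid contains 4 × 3 × 7 × 2 = 168 candidate models, each scored only on DZs withheld from training.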
Predicting future distributions of uptake
It is important to account for population heterogeneity when modelling diseases with high variation in susceptibility from person to person. Similarly, it is important that the amount of vaccine-induced protection in demographics especially vulnerable to disease is accurately calibrated. For modelling scenarios where vaccine uptake has fallen, a simple approach would be to take the uptake from data at some time in the past, and reduce it proportionately across all demographics. However, changes in the observed patterns of drop-off in Fig. 1 suggest a more complex relationship, and in this case, at least, a different approach is required. With limited long-term data, then, we propose a method for redistributing uptake in a non-arbitrary manner, by identifying which fine-scale population groups may be prone to disproportionately larger falls in uptake.
Our regression model fit to first booster uptake takes input data on deprivation and population structure, and reproduces detailed spatial patterns with high accuracy despite not being informed by spatial data explicitly (such as where DZs are located or nearest neighbours). With our model, we are free to feed in data that have been modified in some way, such as data where population structure is unchanged, but the profile of deprivation is different. In doing this, the model will output uptake values that may differ from those using the original data. Such adjustments of the input data form the basis of standard methods of probing machine learning models (ref. 35), such as feature importance (where a variable is randomly shuffled to assess its influence on model performance) and partial dependencies (where one variable is modified to assess its influence on a model outcome).
Along these lines, we propose a method for assessing the risk of different population groups suffering disproportionate falls in uptake in the future. Our hypothesis is that deprivation is the key driver for spatial differences in vaccine uptake, and that population groups whose fitted values are more sensitive to small changes in community deprivation are prone to suffer disproportionately higher falls in uptake. This model is equivalent to proposing that, as the underlying impetus to vaccinate declines, the drop-off in uptake across deprivation cohorts will follow a consistent pattern: low-deprivation cohorts under low vaccination uptake will follow uptake trajectories similar to those of high-deprivation cohorts under high vaccination uptake, rather than all cohorts following similar uptake trajectories at a given level of vaccine uptake.
We therefore adjust input data in a manner that reduces predicted uptake and assess the resulting distribution. The methodology is similar to a partial dependency analysis but differs in two ways. First, instead of adjusting a single variable, we adjust all deprivation measures (in this case, ranks) in parallel. Second, Random Forest models perform poorly when presented with values outside the range to which they were fit (i.e., Random Forest models alone are poor at extrapolation), so we need a means of extending the model prediction for when a counterfactual deprivation rank falls below 1.
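The parallel adjustment can be pictured with the following Python sketch. It assumes, for illustration only, that the adjustment is an additive shift applied equally to every domain rank, and that the usable range of the shift is the one keeping all ranks between 1 and 6,976; the function names and example ranks are hypothetical.

```python
N_DATAZONES = 6976  # ranks run from 1 (most deprived) to 6,976 (least deprived)


def shift_ranks(ranks, delta):
    """Shift every deprivation-domain rank by the same amount `delta`."""
    return {domain: rank + delta for domain, rank in ranks.items()}


def delta_bounds(ranks, n_datazones=N_DATAZONES):
    """Smallest and largest shift keeping all ranks within 1..n_datazones."""
    floor = 1 - min(ranks.values())
    ceiling = n_datazones - max(ranks.values())
    return floor, ceiling


# Hypothetical domain ranks for one DZ (illustrative values only).
example_ranks = {
    "Access": 900, "Income": 150, "Employment": 120, "Education": 310,
    "Health": 200, "Crime": 640, "Housing": 410,
}
floor, ceiling = delta_bounds(example_ranks)
most_deprived_in_range = shift_ranks(example_ranks, floor)
```

Within these bounds the Random Forest is only asked to predict for rank values it could have seen in training; shifts beyond the floor are where an extrapolating function is needed.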
The approach is summarised in Fig. 2 with examples in Fig. 3, and detailed in Supplementary Methods 2. For each cohort, we adjust the deprivation ranks of its associated DZ over a range of (positive and negative) values of Δ to give a set of predictions of uptake, to estimate how sensitive the prediction is to changes in deprivation. The parameter Δ here is abstract, without a physical analogue. We then fit a curve (a sigmoid function of Δ) to each cohort's set of modelled values. By taking different values of Δ, we then extrapolate distributions of uptake under these counterfactual data. The effect of this functional description is to smoothly extrapolate the deprivation relationships to consider some nominal community with highly severe deprivation (i.e., one that would rank below any existing DZ across all deprivation indices). As Δ becomes more negative, uptake over all cohorts will fall, but more sharply in those more sensitive to changes in deprivation rank. Conversely, cohorts that are less sensitive to changes will have lower than average falls.
For each cohort, the green box bounds the floor and ceiling values of deprivation rank shift Δ. Within this range, the projected uptake (blue points) falls for decreasing Δ (increasing level of deprivation). A sigmoid function (black, dotted) is fit to these projected values, and is then shifted to match the actual returning first booster uptake (red circle) at Δ = 0 (vertical dashed line).
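The shifting step can be sketched in Python as below. The four-parameter sigmoid form, the use of a vertical offset to match the observed value at Δ = 0, and the clamping of results to the interval from 0 to 1 are all assumptions made for illustration; the fitting step itself is omitted and the parameter values are hypothetical. The exact procedure is the one given in Supplementary Methods 2.

```python
import math


def sigmoid(delta, lower, upper, midpoint, scale):
    """Sigmoid in delta rising from `lower` to `upper` (assumes scale > 0)."""
    z = (delta - midpoint) / scale
    if z < -700.0:  # exp(-z) would overflow; the curve is at its lower limit
        return lower
    return lower + (upper - lower) / (1.0 + math.exp(-z))


def anchored_curve(params, observed_at_zero):
    """Return the fitted sigmoid shifted vertically to pass through the
    observed uptake at delta = 0 (vertical shift is an assumption)."""
    offset = observed_at_zero - sigmoid(0.0, *params)

    def curve(delta):
        # Clamp so that the extrapolated uptake remains a valid proportion.
        return min(1.0, max(0.0, sigmoid(delta, *params) + offset))

    return curve


# Hypothetical fitted parameters and observed uptake for one cohort.
curve = anchored_curve((0.2, 0.9, -1000.0, 800.0), observed_at_zero=0.8)
```

Evaluating `curve` at increasingly negative Δ then gives that cohort's extrapolated uptake; cohorts with steeper fitted curves fall faster for the same shift.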
Reporting summary
Further information on research design is available in the Nature Portfolio Reporting Summary linked to this article.


