Summary
We produced forecasts of the number of COVID-19 cases in hospital ward and ICU beds (i.e. the ward and ICU occupancies) on a weekly basis using a bespoke clinical forecasting pipeline (Fig. 1a). We simulated the pathways taken by COVID-19 cases through a hospital as flow through a compartmental model (Fig. 2a). Our clinical forecasting pipeline takes in three primary inputs: an ensemble case forecast, time-varying estimates of key epidemiological parameters (the age distribution of cases, the probability of hospital admission, and the probability of ICU admission), and estimates of patient length of stay. The model outputs are fit to reported occupancy counts across a seven-day window prior to the forecast start date using Approximate Bayesian Computation (ABC)16. We reported the resultant 21-day forecasted counts of ward and ICU occupancy to public health committees on a weekly basis.
a The compartmental model. The probability of transition between Case and Ward and between Ward and ICU was informed by time-varying age-specific estimates, all other probabilities were specified according to age-specific estimates from the multi-state length of stay analysis. The number of occupied ward beds reported by the model is the sum of the individuals in the Ward and Post-ICU ward compartments, and the number of occupied ICU beds is the number of individuals in the ICU compartment.
Compartmental pathways model
Our compartmental model simulates the progression of severe COVID-19 disease and corresponding pathways taken through a hospital (Fig. 2a). The design of this model was informed by COVID-19 clinical progression models previously developed for the Australian health system context22,23,24. In our model, new COVID-19 cases start in the Case compartment according to their date of symptom onset (inferred where not recorded). From this compartment, some fraction of cases are admitted to hospital, according to a (time-varying) probability of case hospitalisation. Hospitalisations start in the Ward compartment, from which a patient can then develop further severe disease and be admitted to ICU, according to a (time-varying) probability of ICU admission. Patients in the ICU compartment can then move to the Post-ICU ward compartment. In addition, across each of the Ward, ICU, and Post-ICU ward compartments, we assume patients have some probability of dying or being discharged. We count the number of occupied ward beds as the number of patients in the Ward and Post-ICU ward compartments, and the number of occupied ICU beds as the number of patients in the ICU compartment.
Length of stay estimates
To simulate the flow of patients through the compartmental model, we need to specify distributional estimates of the duration of time they will spend within a compartment before a transition occurs (i.e. their length of stay), and the probabilities of each particular transition occurring (i.e. transition probabilities). We produced estimates of length of stay and transition probabilities using a multi-state survival analysis approach15. This survival analysis framework allowed us to produce estimates across our compartmental model in near-real-time while accounting for right-censoring, such that we could rapidly incorporate any changes in length of stay or transition probabilities when necessary. Changes in these quantities may have arisen as a consequence of factors such as a new variant exhibiting different clinical severity, changes in clinical practice, or vaccination. Although we did not include these factors as covariates in the survival model, their net effect on length of stay statistics during the study period was captured by producing our length of stay estimates over only recently admitted patients. We estimated length of stay and transition probabilities using hospital data from the state of New South Wales (see ref. 15, Supplementary Methods). We were not able to produce similar estimates for the other states and territories of Australia as the requisite line-listed hospital stay data were not accessible to us or did not exist. The delay distribution for Case to Ward was informed by estimates (not described here) from the FluCAN sentinel hospital surveillance network study25, as appropriate data were not available to estimate this delay in the New South Wales dataset (Supplementary Methods), noting that this delay only affects the relative timing of the occupancy time series. The transition probabilities from the multi-state survival model were used across all transitions in the compartmental model except for the Case to Ward and the Ward to ICU transitions. The transition probabilities for these two transitions were estimated as time-varying (described later), given their substantial impact upon the net occupancy counts. The length of stay and transition probability estimates were provided to the simulation model as bootstrapped samples of gamma distribution shape and scale parameters and multinomial probabilities of transition.
Case incidence
In our compartmental model (Fig. 2a), cases of COVID-19 begin in the Case compartment. As such, we must inform the model with the number of new cases entering this compartment each day: we achieve this through use of a time series of historically reported case incidence concatenated with a trajectory of forecasted case incidence.
We received time series of historical case incidence indexed by date of symptom onset from an external model26. This external model performs imputation of symptom onset dates where they have not been recorded in the data, with the final time series being the count of cases with a (reported or imputed) onset date on each given date. Because this external model did not perform multiple imputation of the symptom onset date, we added noise to capture uncertainty in the case counts via sampling from a negative binomial distribution with a mean of the historical case count and a dispersion of k = 25. This uncertainty was expected to assist the subsequent inference stage by increasing the prior predictive uncertainty. A dispersion value of 25 was selected through visual inspection such that the expected variability in incidence by symptom onset date was captured (noting that the subsequent inference stage was able to further refine this uncertainty where necessary, e.g. rejecting samples where the uncertainty in case incidence was too great or too small).
Our method is agnostic to the case forecasting approach used as input, thus allowing us to couple it with any independently produced forecast of case incidence. Here we used outputs from an ensemble forecast of case incidence, which varied in model composition during the study period (methodologies and summary outputs for the ensemble forecast are publicly available14). A total of four different models were used at various stages: two mechanistic compartmental models, one mechanistic branching process model, and a non-mechanistic time series model (see in refs. 14,27,28 for details). Models within the ensemble received ongoing development across the study period in response to changes in our understanding of the epidemiology and biology of the virus14.
Estimation of time-varying parameters
We specified three parameters in the compartmental model of clinical progression as time-varying. For each forecast, we produced estimates stratified by age group a and varying with time t of: the probability of a case being within a certain age group, page(a, t); the probability of a case being hospitalised, phosp(a, t); and the probability of a hospitalised case being admitted to ICU, pICU(a, t). These parameters were chosen to capture phenomena such as changes in case age distribution, changes in case ascertainment, differences in variant virulence and outbreaks of the disease within populations subgroups. We defined age groups as 10-year groups from age 0 to 80, followed by a final age group comprising individuals of age 80 and above (i.e. 0–9, 10–19, …, 80+).
The time-varying parameters were estimated using case data from the National Notifiable Disease Surveillance System (NNDSS), which collates information on COVID-19 cases across the eight state and territories of Australia. For each case in this dataset, we extracted the date of case notification, the recorded symptom onset date, the age of the case, and whether or not the case had been admitted to hospital or ICU. Where symptom onset date was not available, we assumed it to be one day prior to the date of notification (where this was the median delay observed in the data).
For each of the three time-varying parameters, we constructed estimates using a one-week moving-window average, with estimates for time t including all cases with a symptom onset date within the period (t − 7, t]. To capture uncertainty in these time-varying parameters, estimates were produced using bootstrapping (sampling with replacement) from the line-listed data. A total of 50 bootstrapped samples were produced, with each sample consisting of three parameter time series (each stratified by nine age groups for a total of 27 time series). At the simulation and inference stage, each simulation received a single such sample as input, such that correlation between the 27 time series was preserved. We calculated the first parameter page(a, t), which defines the multinomial age distribution of cases over time, as the proportion of cases within each age group for an estimation window:
$${p}_{{{\rm{age}}}}(a,t)=\frac{\mathop{\sum }_{\tau = t-6}^{t}{n}_{a}(t)}{\mathop{\sum }_{\tau = t-6}^{t}{\sum }_{i}{n}_{i}(\tau )},$$
(1)
where na(τ) is the number of cases in age group a with symptom onset at time τ. To calculate the probability of a case being hospitalised and the probability of a hospitalised case being admitted to ICU, we produced estimates with adjustment for right-truncation. Here, right-truncation was present as we used near-real-time epidemiological data and indexed our estimates by date of symptom onset. The most recent symptom onset dates in our estimates thus included cases that would eventually be (but had not yet been) hospitalised (and similarly for cases admitted to hospital, but not yet admitted to ICU). Had we not accounted for this right-truncation, we would have consistently underestimated the probabilities of hospitalisation and ICU admission for the most recent dates. We describe the maximum-likelihood estimation of the hospitalisation and ICU admission parameters in the Supplementary Methods.
If in a given reporting week we identified a jurisdiction as having unreliable data on hospitalised cases (most often, missing data on cases admitted to hospital or ICU due to data entry delays), we replaced the local estimates with estimates produced from pooled data across all other (reliable) jurisdictions. Changes made in this regard during the study period are listed in the Supplementary Methods.
Simulation and inference
To simulate a single trajectory of ward and ICU occupancy, we sampled: a trajectory of case incidence from the ensemble; a sample of the bootstrapped time-varying parameters (each comprising three time series across nine age groups); and a sample from the bootstrapped length of stay and transition probability estimates. Using these inputs, we performed simulations across the compartmental model (Fig. 2a) independently across each age group and then summed across all age groups to produce total ward and ICU counts for each day. The compartmental model simulates the pathways of patients through the hospital at the population scale with an efficient agent-based approach; we provide details on this algorithm in the Supplementary Methods.
To ensure that trajectories simulated from the clinical pathways model aligned with reported occupancy counts, we introduce a simple rejection-sampling approximate Bayesian method, rejecting trajectories that did not match the true reported occupancy counts within a relative tolerance ϵ across a one-week calibration window. For each simulation with a simulated ward occupancy count \(\hat{W}(t)\) and simulated ICU occupancy count \(\hat{I}(t)\), simulations were rejected where either:
$$\left\vert W(t)-\hat{W}(t)\right\vert > \max (\epsilon W(t),10)\,{{\mbox{or}}}\,\left\vert I(t)-\hat{I}(t)\right\vert > \max (2\epsilon I(t),10),$$
(2)
where W(t) and I(t) were the true reported occupancy counts for each date t in the fitting window, with these counts retrieved from the covid19data.com.au project13. We selected ϵ using a simple stepped threshold algorithm, initialising ϵ at a small value, and continued to sample simulations until 1000 trajectories had been accepted by the model. If 1000 trajectories were not accepted by the time that 100, 000 simulations had been performed (i.e. 100 rejections per target number of output trajectories), we increased ϵ in sequence from [0.1, 0.2, 0.3, 0.5, 1, 10] and restarted the sampling procedure. This behaviour was chosen to achieve a good degree of predictive performance while ensuring that reporting deadlines were met (typically less than 24 h from receipt of ensemble forecasts and relevant hospital data), even where the model was otherwise unlikely to capture hospital occupancy at tighter degrees of tolerance.
We fit simulation outputs over a calibration window defined as the seven days following the start of the 28-day case forecast. This was chosen such that the most up-to-date occupancy data could be used in fitting (typically data as of, or a day prior to, the date clinical forecasts were produced). We could fit the clinical forecast over occupancy data points which were seven days in the future relative to the start of the case forecast for two reasons: the case forecasts were indexed by date of symptom onset and began at the date where a majority (>90%) of cases had experienced symptom onset, adding a delay of 2–3 days; and case forecasts were affected by reporting delays of 3–4 days (whereas occupancy data was not lagged). We did not fit over a larger window as the seven-day window was expected to be sufficient for our purposes, and the computational requirements of model fitting would increase exponentially with a larger window. The forecasts we reported on a weekly basis and examine here are the model outputs across the 21 days following this seven-day fitting window.
We introduced two additional parameters to improve the ability of the model to fit to the reported occupancy counts. These parameters increased variance in the magnitude of the output ward and ICU occupancy count trajectories, reducing the probability of a substantial mismatch between these trajectories and the reported occupancy counts. The first parameter added was H, a modifier on the probability of hospitalisation acting linearly across logit-transformed values:
$${p}_{\,{\mbox{h}}}^{* }={{\mbox{logit}}}^{-1}({{\mbox{logit}}}({p}_{{{\rm{h}}}})+H),H \sim N(0,{\sigma }_{{\mbox{hosp}}\,}^{2}).$$
(3)
The second parameter added was L, which modified the shape of the length of stay distributions across the transitions out of Case, Ward, and Post-ICU Ward, acting linearly across log-transformed values:
$${{\mbox{shape}}}_{i}^{* }={{\mbox{exp}}}({{\mbox{log}}}({{\mbox{shape}}}_{i})+L),L \sim N(0,{\sigma }_{{\mbox{los}}}^{2}).$$
(4)
The values of H and L were sampled from normal distribution priors with means of zero and standard deviations of \({\sigma }_{\,{\mbox{hosp}}\,}^{2}=0.8\) and \({\sigma }_{\,{\mbox{los}}\,}^{2}=0.5\) respectively. We specified these values to reduce the computational time required while ensuring the output model trajectories had good coverage over the reported occupancy counts. These parameters were changed for some jurisdictions during the study period; see the Supplementary Methods for details.
To illustrate the effect of the H and L parameters, we simulated model outputs for an example forecast with and without these parameters set to zero (Supplementary Methods). This demonstrates that output trajectories without the effect of H and L may already align with the reported occupancy counts, but where this does not occur, they enable the recent reported occupancy counts to be well captured by the fitted model outputs (Supplementary Methods).
Performance evaluation
We consider the performance of our forecasts produced between March and September 2022. We produced plots for the visual assessment of forecast performance (Fig. 3a, b and Supplementary Figs. 9–24) which depict all forecasts across the study period with the same presentation of uncertainty as was used in official reporting of the forecasts (with pointwise credible intervals ranging from 20% through to 90% by steps of 10%, and reported occupancy counts overlaid).
a Forecasts of ward occupancy. b Forecasts of ICU occupancy. Credible intervals from 20% through to 90% in 10% increments are displayed in progressively lighter shading. Reported occupancy counts are overlaid. As we produced our forecasts on a weekly basis and each forecast spans three weeks, forecasts are plotted interleaved across three rows; reported occupancy counts are repeated across each row. Forecast start dates are displayed as vertical dashed lines. Note that forecast start date was dependent upon that of the case forecast, and this varied slightly over time (see forecasts 5, 9, 12, and 19). The second week for each forecast (days 8–14) has background shaded in light blue. An identifier for each forecast, 1 through 21, is displayed above each forecast start, and a ^ is displayed where the upper credible intervals of a forecast exceed the y-axis limits. Forecasts for other states and territories are provided in the Supplementary Materials.
To evaluate the overall performance of our forecasts, we calculated continuous ranked probability scores (CRPS) across log-transformed counts of occupancy. The CRPS measures the distributional accuracy of a set of forecasts against the eventual observations18. The CRPS is a proper scoring rule: in the limit, where a forecast reports the true probabilities of the underlying process, it will receive the greatest score. We calculated CRPS over log-transformed counts (specifically, \({x}^{* }={\log }_{e}(x+1)\)) rather than over raw counts, as this has been argued to be more meaningful given the exponential nature of epidemic growth19. This transformation also allows us to interpret the resultant CRPS values as a relative error19, enabling comparison of the forecast performance between different settings. We also calculated skill scores of our forecasting model in comparison to a naive random walk model (Supplementary Methods), with results presented in Supplementary Figs. 7 and 8.
We calculated forecast bias to examine where the overall performance of our forecast was reduced due to consistent overprediction or underprediction20 (Fig. 5a–h). Forecast bias (as opposed to, for example, estimator bias29) ranges between −1 and 1, with a bias greater than zero indicating overprediction and less than zero indicating underprediction. Bias values of approximately zero are ideal, indicating a forecast that overpredicts as often as it underpredicts (or vice versa).
We produced plots demonstrating the association between the performance of our ward and ICU occupancy forecasts and the underlying case forecasts used as input. Specifically, we compared the case forecast performance calculated using CRPS to the bias of the ward occupancy forecasts (Fig. 6a–h) and ICU occupancy forecasts (Supplementary Fig. 5), and the bias of the case forecasts to that of the ward occupancy forecasts (Supplementary Fig. 6). These values were calculated across the whole horizon of the respective forecasts; it should be noted that such comparisons are inherently limited due to the lag between onset of symptoms and admission to hospital, i.e. the performance of the case forecast at the 28 day horizon would be expected to be of lesser influence given these cases are less likely to be hospitalised within the time-frame of our simulation.
We produced probability integral transform (PIT) plots to evaluate the calibration of the forecast (Supplementary Fig. 3). Calibration refers to the concordance between the distribution of our forecasts and the eventual distribution of observations21; for example, in a well calibrated forecast, each decile across the distribution of all forecast predictions should contain ~10% of the eventual observations. Where overlapping intervals contained the eventual observation (typically due to small integer counts, e.g. in smaller population size jurisdictions), we have counted each overlapping interval as containing the observation, with these down-weighted such that any given observation only contributed a total count of one.
Version control repositories are available on GitHub for the simulation and inference steps (http://github.com/ruarai/curvemush), the forecasting pipeline (http://github.com/ruarai/clinical_forecasts), and performance evaluation and manuscript figure plotting code (http://github.com/ruarai/clinical_forecasting_paper). Analysis was performed in the R statistical computing environment (version 4.3.2)30. The forecasting pipeline was implemented using the targets package31, with tidyverse packages used for data manipulation32, pracma for numerical solutions of the maximum-likelihood estimates33, and Rcpp for interfacing with the stochastic simulation C++ code. Forecasting performance was evaluated using the fabletools, tsibble, and distributional packages34,35,36.
Ethics
The study was undertaken as urgent public health action to support Australia’s COVID-19 pandemic response. The study used data from the Australian National Notifiable Disease Surveillance System (NNDSS) provided to the Australian Government Department of Health and Aged Care under the National Health Security Agreement for the purposes of national communicable disease surveillance. Non-identifiable data from the NNDSS were supplied to the investigator team for the purposes of provision of epidemiological advice to government; data were securely managed to ensure patient privacy and to ensure the study’s compliance with the National Health and Medical Research Council’s Ethical Considerations in Quality Assurance and Evaluation Activities. Contractual obligations established strict data protection protocols agreed between the University of Melbourne and sub-contractors and the Australian Government Department of Health and Aged Care, with oversight and approval for use in supporting Australia’s pandemic response and for publication provided by the data custodians represented by the Communicable Diseases Network of Australia. The use of these data for these purposes, including publication, was agreed by the Department of Health with the Communicable Diseases Network of Australia. Ethical approval for this study was also provided by The University of Melbourne’s Human Research Ethics Committee (2024-26949-50575-3). Further, as part of this ethics approval, the University of Melbourne’s Human Research Ethics Committee provided waiver of consent for the use of the case data, as it was believed to be impracticable to contact each individual included in this routinely collected surveillance data and that there was no likely reason that individuals would not consent if asked.
The study used routinely collected patient administration data from the New South Wales (NSW) Patient Flow Portal (PFP). De-identified PFP data were securely managed to ensure patient privacy and to ensure the study’s compliance with the National Health and Medical Research Council’s Ethical Considerations in Quality Assurance and Evaluation Activities. These data were provided for use in this study to support public health response under the governance of Health Protection NSW. The NSW Public Health Act (2010) allows for such release of data to identify and monitor risk factors for diseases and conditions that have a substantial adverse impact on the population and to improve service delivery. Following review, the NSW Ministry of Health determined that this study met that threshold and therefore provided approval for the study to proceed. Approval for publication was provided by the NSW Ministry of Health.
Reporting summary
Further information on research design is available in the Nature Portfolio Reporting Summary linked to this article.


