Attwood, S. W., Hill, S. C., Aanensen, D. M., Connor, T. R. & Pybus, O. G. Phylogenetic and phylodynamic approaches to understanding and combating the early SARS-CoV-2 pandemic. Nat. Rev. Genet. 23, 547–562 (2022).
Mboowa, G. et al. Africa in the era of pathogen genomics: unlocking data barriers. Cell 187, 5146–5150 (2024).
Chang, S. et al. Mobility network models of COVID-19 explain inequities and inform reopening. Nature 589, 82–87 (2021).
Manna, A., Koltai, J. & Karsai, M. Importance of social inequalities to contact patterns, vaccine uptake, and epidemic dynamics. Nat. Commun. 15, 4137 (2024).
Tsui, J. L.-H. et al. Genomic assessment of invasion dynamics of SARS-CoV-2 Omicron BA.1. Science 381, 336–343 (2023).
du Plessis, L. et al. Establishment and lineage dynamics of the SARS-CoV-2 epidemic in the UK. Science 371, 708–712 (2021).
Khurana, M. P. et al. High-resolution epidemiological landscape from ~290,000 SARS-CoV-2 genomes from Denmark. Nat. Commun. 15, 7123 (2024).
European Centre for Disease Prevention and Control & European Food Safety Authority. Rapid outbreak assessment—prolonged cross-border multi-serovar Salmonella outbreak linked to consumption of sprouted seeds. ECDC https://www.ecdc.europa.eu/en/publications-data/rapid-outbreak-assessment-prolonged-cross-border-multi-serovar-salmonella (2025).
Hill, V. et al. Toward a global virus genomic surveillance network. Cell Host Microbe 31, 861–873 (2023).
Ladner, J. T. & Sahl, J. W. Towards a post-pandemic future for global pathogen genome sequencing. PLoS Biol. 21, e3002225 (2023).
Yozwiak, N. L., Schaffner, S. F. & Sabeti, P. C. Data sharing: make outbreak research open access. Nature 518, 477–479 (2015).
Modjarrad, K. et al. Developing global norms for sharing data and results during public health emergencies. PLoS Med. 13, e1001935 (2016).
Kairouz, P. et al. Advances and open problems in federated learning. Found. Trends Mach. Learn. 14, 1–210 (2021).
Teo, Z. L. et al. Federated machine learning in healthcare: a systematic review on clinical applications and technical architecture. Cell Rep. Med. 5, 101419 (2024).
Crowson, M. G. et al. A systematic review of federated learning applications for biomedical data. PLOS Digit. Health 1, e0000033 (2022).
Dayan, I. et al. Federated learning for predicting clinical outcomes in patients with COVID-19. Nat. Med. 27, 1735–1743 (2021).
Brisimi, T. S. et al. Federated learning of predictive models from federated Electronic Health Records. Int. J. Med. Inform. 112, 59–67 (2018).
Sarma, K. V. et al. Federated learning improves site performance in multicenter deep learning without data sharing. J. Am. Med. Inform. Assoc. 28, 1259–1264 (2021).
Rieke, N. et al. The future of digital health with federated learning. NPJ Digit. Med. 3, 119 (2020).
McMahan, H. B., Moore, E., Ramage, D., Hampson, S. & Agüera y Arcas, B. Communication-efficient learning of deep networks from decentralized data. Preprint at https://doi.org/10.48550/arXiv.1602.05629 (2023).
Frieden, T. R., Lee, C. T., Bochner, A. F., Buissonnière, M. & McClelland, A. 7–1–7: an organising principle, target, and accountability metric to make the world safer from pandemics. Lancet 398, 638–640 (2021).
Zwiers, L. C., Grobbee, D. E., Uijl, A. & Ong, D. S. Y. Federated learning as a smart tool for research on infectious diseases. BMC Infect. Dis. 24, 1327 (2024).
Lyu, R., Rosenfeld, R. & Wilder, B. Federated epidemic surveillance. PLoS Comput. Biol. 21, e1012907 (2025).
Chen, Z. et al. Global landscape of SARS-CoV-2 genomic surveillance and data sharing. Nat. Genet. 54, 499–507 (2022).
Halabi, S., Wilder, R., Gostin, L. O. & Hurtado, M. L. Sharing pathogen genomic sequence data—toward effective pandemic prevention, preparedness, and response. N. Engl. J. Med. 388, 2401–2404 (2023).
Tegally, H. et al. Dispersal patterns and influence of air travel during the global expansion of SARS-CoV-2 variants of concern. Cell 186, 3277–3290 (2023).
McCrone, J. T. et al. Context-specific emergence and growth of the SARS-CoV-2 Delta variant. Nature 610, 154–160 (2022).
WHO Guiding Principles for Pathogen Genome Data Sharing (World Health Organization, 2022); https://www.who.int/publications/i/item/9789240061743
Attributes and Principles of Genomic Data-Sharing Platforms Supporting Surveillance of Pathogens with Epidemic and Pandemic Potential (World Health Organization, 2025); https://www.who.int/publications/b/80650
Britton, T. & Scalia Tomba, G. Estimation in emerging epidemics: biases and remedies. J. R. Soc. Interface 16, 20180670 (2019).
Li, R. et al. Substantial undocumented infection facilitates the rapid dissemination of novel coronavirus (SARS-CoV-2). Science 368, 489–493 (2020).
Kraemer, M. U. G. et al. Tracking the 2022 monkeypox outbreak with epidemiological data in real-time. Lancet Infect. Dis. 22, 941–942 (2022).
Kalkauskas, A. et al. Sampling bias and model choice in continuous phylogeography: getting lost on a random walk. PLoS Comput. Biol. 17, e1008561 (2021).
Lemey, P. et al. Accommodating individual travel history and unsampled diversity in Bayesian phylogeographic inference of SARS-CoV-2. Nat. Commun. 11, 5110 (2020).
Taylor, B. P. & Hanage, W. P. Founder effects arising from gathering dynamics systematically bias emerging pathogen surveillance. Preprint at eLife https://doi.org/10.7554/eLife.104201.1 (2025).
Vecchia, E. D. Pathoplexus: towards fair and transparent sequence sharing. Lancet Microbe 5, 100995 (2024).
Xu, B. et al. Epidemiological data from the COVID-19 outbreak, real-time case information. Sci. Data 7, 106 (2020).
Demonstrator video showcasing 1+MG federated analysis infrastructure—paving the way to federated learning. European Genomic Data Infrastructure https://gdi.onemilliongenomes.eu/news/federated-analysis-infrastructure (2025).
European Health Data Space Regulation (EHDS). European Commission https://health.ec.europa.eu/ehealth-digital-health-and-care/european-health-data-space-regulation-ehds_en (2025).
Busch-Moreno, S. & Kraemer, M. U. G. Sequential federated analysis of early outbreak data applied to incubation period estimation. Epidemics 54, 100890 (2026).
Deisenroth, M. & Ng, J. W. Distributed Gaussian processes. In Proc. 32nd International Conference on Machine Learning Vol. 37 (eds Bach, F. & Blei, D.) 1481–1490 (PMLR, 2015).
Achituve, I., Shamsian, A., Navon, A., Chechik, G. & Fetaya, E. Personalized federated learning with Gaussian processes. In Proc. 35th International Conference on Neural Information Processing Systems (eds Ranzato, M. et al.) 8392–8406 (Curran Associates, 2021).
Hripcsak, G. et al. Observational Health Data Sciences and Informatics (OHDSI): opportunities for observational researchers. Stud. Health Technol. Inform. 216, 574–578 (2015).
Schuemie, M. et al. Health-Analytics Data to Evidence Suite (HADES): open-source software for observational research. Stud. Health Technol. Inform. 310, 966–970 (2024).
Voss, E. A. et al. Feasibility and utility of applications of the common data model to multiple, disparate observational health databases. J. Am. Med. Inform. Assoc. 22, 553–564 (2015).
Khera, R. et al. Comparative effectiveness of second-line antihyperglycemic agents for cardiovascular outcomes: a multinational, federated analysis of LEGEND-T2DM. J. Am. Coll. Cardiol. 84, 904–917 (2024).
Suchard, M. A. et al. Comprehensive comparative effectiveness and safety of first-line antihypertensive drug classes: a systematic, multinational, large-scale analysis. Lancet 394, 1816–1826 (2019).
Schuemie, M. J., Chen, Y., Madigan, D. & Suchard, M. A. Combining Cox regressions across a heterogeneous distributed research network facing small and zero counts. Stat. Methods Med. Res. 31, 438–450 (2022).
Xia, T., Ghosh, A., Qiu, X. & Mascolo, C. FLea: addressing data scarcity and label skew in federated learning via privacy-preserving feature augmentation. In Proc. 30th ACM SIGKDD Conference on Knowledge Discovery and Data Mining 3484–3494 (ACM, 2024).
Yang, Q., Liu, Y., Chen, T. & Tong, Y. Federated machine learning: concept and applications. ACM Trans. Intell. Syst. Technol. 10, 1–19 (2019).
Abdallah, R., Abdelgaber, S. & Sayed, H. A. Leveraging AHP and transfer learning in machine learning for improved prediction of infectious disease outbreaks. Sci. Rep. 14, 32163 (2024).
Coelho, F. C., de Holanda, N. L. & Coimbra, B. Transfer learning applied to the forecast of mosquito-borne diseases. Preprint at medRxiv https://doi.org/10.1101/2020.02.03.20020164 (2020).
Gautam, Y. Transfer learning for COVID-19 cases and deaths forecast using LSTM network. ISA Trans. 124, 41–56 (2022).
Liu, Y., Kang, Y., Xing, C., Chen, T. & Yang, Q. A secure federated transfer learning framework. IEEE Intell. Syst. 35, 70–82 (2020).
Roster, K., Connaughton, C. & Rodrigues, F. A. Forecasting new diseases in low-data settings using transfer learning. Chaos Solitons Fractals 161, 112306 (2022).
Saha, S. & Ahmad, T. Federated transfer learning: concept and applications. Intell. Artif. 15, 35–44 (2021).
Ye, Y. & Gu, A. Deep transfer learning for infectious disease case detection using electronic medical records. Preprint at https://doi.org/10.48550/arXiv.2103.06710 (2021).
Zhu, H., Xu, J., Liu, S. & Jin, Y. Federated learning on non-IID data: a survey. Neurocomputing 465, 371–390 (2021).
Hsieh, K., Phanishayee, A., Mutlu, O. & Gibbons, P. B. The non-IID data quagmire of decentralized machine learning. In Proc. 37th International Conference on Machine Learning Vol. 119 (eds Daumé, H. & Singh, A.) 4387–4398 (PMLR, 2020).
Li, X., Jiang, M., Zhang, X., Kamp, M. & Dou, Q. FedBN: federated learning on non-IID features via local batch normalization. Preprint at https://doi.org/10.48550/arXiv.2102.07623 (2021).
Wang, J. et al. On the unreasonable effectiveness of federated averaging with heterogeneous data. Preprint at https://doi.org/10.48550/arXiv.2206.04723 (2022).
Reddi, S. et al. Adaptive federated optimization. Preprint at https://doi.org/10.48550/arXiv.2003.00295 (2021).
Karimireddy, S. P. et al. SCAFFOLD: stochastic controlled averaging for federated learning. In Proc. 37th International Conference on Machine Learning Vol. 119 (eds Daumé, H. & Singh, A.) 5132–5143 (PMLR, 2020).
Li, T. et al. Federated optimization in heterogeneous networks. In Proc. Machine Learning and Systems Vol. 2 (eds Dhillon, I. et al.) 429–450 (MLSys.org, 2020).
Collins, L., Hassani, H., Mokhtari, A. & Shakkottai, S. Exploiting shared representations for personalized federated learning. In Proc. 38th International Conference on Machine Learning Vol. 139 (eds Meila, M. & Zhang, T.) 2089–2099 (PMLR, 2021).
Arivazhagan, M. G., Aggarwal, V., Singh, A. K. & Choudhary, S. Federated learning with personalization layers. Preprint at https://doi.org/10.48550/arXiv.1912.00818 (2019).
Deng, Y., Kamani, M. M. & Mahdavi, M. Adaptive personalized federated learning. Preprint at https://doi.org/10.48550/arXiv.2003.13461 (2020).
Zhu, H. et al. FedWeight: mitigating covariate shift of federated learning on electronic health records data through patients re-weighting. NPJ Digit. Med. 8, 286 (2025).
Li, F., Lam, H. & Prusty, S. Robust importance weighting for covariate shift. In Proc. 23rd International Conference on Artificial Intelligence and Statistics Vol. 108 (eds Chiappa, S. & Calandra, R.) 352–362 (PMLR, 2020).
Wang, V. H.-C., Lei, J., Shi, T. & Pagán, J. A. Weighting the United States All of Us Research Program data to known population estimates using raking. Prev. Med. Rep. 43, 102795 (2024).
Yap, S. et al. Raking of data from a large Australian cohort study improves generalisability of estimates of prevalence of health and behaviour characteristics and cancer incidence. BMC Med. Res. Methodol. 22, 140 (2022).
Chopra, A., Subramanian, J., Krishnamurthy, B. & Raskar, R. flame: a framework for learning in agent-based ModEls. In Proc. 23rd International Conference on Autonomous Agents and Multiagent Systems 391–399 (ACM, 2024).
Chopra, A. et al. Differentiable agent-based epidemiology. In Proc. 2023 International Conference on Autonomous Agents and Multiagent Systems 1848–1857 (ACM, 2023).
Quera-Bofarull, A. et al. Don’t simulate twice: one-shot sensitivity analyses via automatic differentiation. In Proc. 2023 International Conference on Autonomous Agents and Multiagent Systems 1867–1876 (ACM, 2023).
Farkas, K. et al. Wastewater-based monitoring of SARS-CoV-2 at UK airports and its potential role in international public health surveillance. PLoS Glob. Public Health 3, e0001346 (2023).
Li, J. et al. A global aircraft-based wastewater genomic surveillance network for early warning of future pandemics. Lancet Glob. Health 11, e791–e795 (2023).
Gudde, A. et al. Predicting hospital admissions due to COVID-19 in Denmark using wastewater-based surveillance. Sci. Total Environ. 966, 178674 (2025).
O’Reilly, K. et al. Analysis insights to support the use of wastewater and environmental surveillance data for infectious diseases and pandemic preparedness. Epidemics 51, 100825 (2025).
Chopra, A. et al. On the limits of agency in agent-based models. In Proc. 24th International Conference on Autonomous Agents and Multiagent Systems 500–509 (ACM, 2025).
Garg, A. & Chopra, A. Distributed calibration of agent-based models. Preprint at https://openreview.net/pdf?id=tgBrJUWon5 (2024).
Wymant, C. et al. The epidemiological impact of the NHS COVID-19 app. Nature 594, 408–412 (2021).
Kendall, M. et al. Drivers of epidemic dynamics in real time from daily digital COVID-19 measurements. Science 385, eadm8103 (2024).
Baker, A. et al. Epidemic mitigation by statistical inference from contact tracing data. Proc. Natl Acad. Sci. USA 118, e2106548118 (2021).
Chopra, A., Quera-Bofarull, A., Giray-Kuru, N., Wooldridge, M. & Raskar, R. Private agent-based modeling. In Proc. 23rd International Conference on Autonomous Agents and Multiagent Systems 381–390 (International Foundation for Autonomous Agents and Multiagent Systems, 2024).
Chopra, A. et al. AgentTorch: large population models. GitHub https://github.com/AgentTorch/AgentTorch (2024).
Tegally, H. et al. The evolving SARS-CoV-2 epidemic in Africa: insights from rapidly expanding genomic surveillance. Science 378, eabq5358 (2022).
Rambaut, A. et al. A dynamic nomenclature proposal for SARS-CoV-2 lineages to assist genomic epidemiology. Nat. Microbiol. 5, 1403–1407 (2020).
Hill, V. et al. A new lineage nomenclature to aid genomic surveillance of dengue virus. PLoS Biol. 22, e3002834 (2024).
Suster, C. J. E., Pham, D., Kok, J. & Sintchenko, V. Emerging applications of artificial intelligence in pathogen genomics. Front. Bacteriol. 3, 1326958 (2024).
Qammar, A., Karim, A., Ning, H. & Ding, J. Securing federated learning with blockchain: a systematic literature review. Artif. Intell. Rev. 56, 3951–3985 (2023).
Jackson, C., Presanis, A., Conti, S. & De Angelis, D. Value of information: sensitivity analysis and research design in Bayesian evidence synthesis. J. Am. Stat. Assoc. 114, 1436–1449 (2019).
Fawkes, J., Ter-Minassian, L., Ivanova, D., Shalit, U. & Holmes, C. Is merging worth it? Securely evaluating the information gain for causal dataset acquisition. Preprint at https://doi.org/10.48550/arXiv.2409.07215 (2024).
Malik, A. J., Poole, A. M. & Allison, J. R. Structural phylogenetics with confidence. Mol. Biol. Evol. 37, 2711–2726 (2020).
Mifsud, J. C. O. et al. Mapping glycoprotein structure reveals Flaviviridae evolutionary history. Nature 633, 695–703 (2024).
Gutierrez, B. et al. Routes of importation and spatial dynamics of SARS-CoV-2 variants during localized interventions in Chile. PNAS Nexus 3, pgae483 (2024).
Tsui, J. L.-H. et al. Impacts of climate change-related human migration on infectious diseases. Nat. Clim. Chang. 14, 793–802 (2024).
Carlson, C. J. et al. Pathogens and planetary change. Nat. Rev. Biodivers. 1, 32–49 (2025).
Crawford, F. W. et al. Impact of close interpersonal contact on COVID-19 incidence: evidence from 1 year of mobile device data. Sci. Adv. 8, eabi5499 (2022).
Brittain, J.-S. et al. GRAPEVNE—graphical analytical pipeline development environment for infectious diseases. Wellcome Open Res. 10, 279 (2025).
Copernicus Climate Change Service, Climate Data Store. ERA5 hourly data on single levels from 1940 to present. Copernicus Climate Change Service (C3S) Climate Data Store (CDS) https://doi.org/10.24381/cds.adbb2d47 (2018).
Moukheiber, D. et al. A multimodal framework for extraction and fusion of satellite images and public health data. Sci. Data 11, 634 (2024).
Suel, E., Bhatt, S., Brauer, M., Flaxman, S. & Ezzati, M. Multimodal deep learning from satellite and street-level imagery for measuring income, overcrowding, and environmental deprivation in urban areas. Remote Sens. Environ. 257, 112339 (2021).
Dasgupta, A. et al. Scalable, open-access and multidisciplinary data integration pipeline for climate-sensitive diseases. Wellcome Open Res. 10, 467 (2025).
Kuhn, M., Kunkel, J. & Ludwig, T. Data compression for climate data. Supercomput. Front. Innov. 3, 75–94 (2016).
Klöwer, M., Razinger, M., Dominguez, J. J., Düben, P. D. & Palmer, T. N. Compressing atmospheric data into its real information content. Nat. Comput. Sci. 1, 713–724 (2021).
Berahmand, K., Daneshfar, F., Salehi, E. S., Li, Y. & Xu, Y. Autoencoders and their applications in machine learning: a survey. Artif. Intell. Rev. 57, 28 (2024).
Bacchus, P., Fraisse, R., Roumy, A. & Guillemot, C. Quasi lossless satellite image compression. In 2022 IEEE International Geoscience and Remote Sensing Symposium 1532–1535 (IEEE, 2022).
Liu, Y., Ponce, C., Brunton, S. L. & Kutz, J. N. Multiresolution convolutional autoencoders. J. Comput. Phys. 474, 111801 (2023).
Zhang, C. et al. A survey on federated learning. Knowl.-Based Syst. 216, 106775 (2021).
Gruson, H. & Jombart, T. linelist: tagging and validating epidemiological data. Zenodo https://doi.org/10.5281/zenodo.11954901 (2024).
Griffiths, E. J. et al. The PHA4GE Microbial Data-Sharing Accord: establishing baseline consensus microbial data-sharing norms to facilitate cross-sectoral collaboration. BMJ Glob. Health 9, e016474 (2024).
Brittain, J.-S., Liggins, P. & Dasgupta, A. globaldothealth/InsightBoard. GitHub https://github.com/globaldothealth/InsightBoard (2024).
Ayaz, M., Pasha, M. F., Alzahrani, M. Y., Budiarto, R. & Stiawan, D. The Fast Health Interoperability Resources (FHIR) standard: systematic literature review of implementations, applications, challenges and opportunities. JMIR Med. Inform. 9, e21929 (2021).
Rehm, H. L. et al. GA4GH: international policies and standards for data sharing across genomic research and healthcare. Cell Genom. 1, 100029 (2021).
Thorogood, A. et al. International federation of genomic medicine databases using GA4GH standards. Cell Genom. 1, 100032 (2021).
Fiume, M. et al. Federated discovery and sharing of genomic data using Beacons. Nat. Biotechnol. 37, 220–224 (2019).
Wilkinson, M. D. et al. The FAIR Guiding Principles for scientific data management and stewardship. Sci. Data 3, 160018 (2016).
White, T. Hadoop: The Definitive Guide (O’Reilly, 2012).
Zaharia, M. et al. Apache Spark: a unified engine for big data processing. Commun. ACM 59, 56–65 (2016).
Brewer, E. A. Kubernetes and the path to cloud native. In Proc. 6th ACM Symposium on Cloud Computing 167 (ACM, 2015).
Liu, Z. et al. Monolith: real time recommendation system with collisionless embedding table. Preprint at https://doi.org/10.48550/arXiv.2209.07663 (2022).
Dwork, C., McSherry, F., Nissim, K. & Smith, A. Calibrating noise to sensitivity in private data analysis. In Theory of Cryptography (eds Halevi, S. & Rabin, T.) 265–284 (Springer, 2006).
Gong, R. Exact inference with approximate computation for differentially private data via perturbations. J. Priv. Confid. https://doi.org/10.29012/jpc.797 (2022).
Rivest, R. L., Adleman, L. & Dertouzos, M. L. On data banks and privacy homomorphisms. In Foundations of Secure Computation 169–179 (Academic, 1978).
Bonawitz, K., Kairouz, P., McMahan, B. & Ramage, D. Federated learning and privacy: building privacy-preserving systems for machine learning and data science on decentralized data. Queue 19, 87–114 (2021).
Wieland, S. C., Cassa, C. A., Mandl, K. D. & Berger, B. Revealing the spatial distribution of a disease while preserving privacy. Proc. Natl Acad. Sci. USA 105, 17608–17613 (2008).
Sweeney, L. k-anonymity: a model for protecting privacy. Int. J. Uncertain. Fuzziness Knowl.-Based Syst. 10, 557–570 (2002).
Jones, D., Snider, C., Nassehi, A., Yon, J. & Hicks, B. Characterising the digital twin: a systematic literature review. CIRP J. Manuf. Sci. Technol. 29, 36–52 (2020).
Li, T. Scalable and trustworthy learning in heterogeneous networks. Proc. AAAI Conf. Artif. Intell. 39, 28715 (2025).
Choudhury, O. et al. Differential privacy-enabled federated learning for sensitive health data. Preprint at https://doi.org/10.48550/arXiv.1910.02578 (2020).
Hartmann, F. & Kairouz, P. Distributed differential privacy for federated learning. Google Research https://research.google/blog/distributed-differential-privacy-for-federated-learning/ (2023).
Abowd, J. M. The U.S. Census Bureau adopts differential privacy. In Proc. 24th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining 2867 (ACM, 2018).
Dinur, I. & Nissim, K. Revealing information while preserving privacy. In Proc. 22nd ACM SIGMOD–SIGACT–SIGART Symposium on Principles of Database Systems 202–210 (ACM, 2003).
Shokri, R., Stronati, M., Song, C. & Shmatikov, V. Membership inference attacks against machine learning models. In 2017 IEEE Symposium on Security and Privacy 3–18 (IEEE, 2017).
Fisher, A. A. et al. Scalable Bayesian phylogenetics. Philos. Trans. R. Soc. Lond. B Biol. Sci. 377, 20210242 (2022).
Varilly, P. et al. Delphy: scalable, near-real-time Bayesian phylogenetics for outbreaks. Preprint at bioRxiv https://doi.org/10.1101/2025.03.25.645253 (2025).
Brito, A. F. et al. Global disparities in SARS-CoV-2 genomic surveillance. Nat. Commun. 13, 7003 (2022).
Onywera, H., Mulder, N., Kebede, Y. & Tessema, S. K. How to sustain a public-health genomics and bioinformatics workforce in Africa. Nat. Med. 31, 2480–2484 (2025).
Mfuh, K. O., Abanda, N. N. & Titanji, B. K. Strengthening diagnostic capacity in Africa as a key pillar of public health and pandemic preparedness. PLOS Glob. Public Health 3, e0001998 (2023).
Shadbolt, N. et al. The challenges of data in future pandemics. Epidemics 40, 100612 (2022).
Wong, B. L. H. et al. Harnessing the digital potential of the next generation of health professionals. Hum. Resour. Health 19, 50 (2021).
Kaduru, C. et al. Strengthening local capacity for mathematical modelling in low- and middle-income countries: the process and lessons learnt in implementing the first cohort of Nigeria malaria modelling fellowships. Malar. J. 24, 116 (2025).
Africa CDC launches AGARI, a continent-wide genomic data platform to strengthen outbreak response. Africa CDC https://africacdc.org/news-item/africa-cdc-launches-agari-a-continent-wide-genomic-data-platform-to-strengthen-outbreak-response/ (2025).
Lewis, P. et al. Retrieval-augmented generation for knowledge-intensive NLP tasks. In Advances in Neural Information Processing Systems Vol. 33 (eds Larochelle, H. et al.) 9459–9474 (Curran Associates, 2020).
Hou, X., Zhao, Y., Wang, S. & Wang, H. Model Context Protocol (MCP): landscape, security threats, and future research directions. Preprint at https://doi.org/10.48550/arXiv.2503.23278 (2025).
Introducing the Model Context Protocol. Anthropic https://www.anthropic.com/news/model-context-protocol (2024).
Henke, E. et al. Conceptual design of a generic data harmonization process for OMOP common data model. BMC Med. Inform. Decis. Mak. 24, 58 (2024).
Gottweis, J. & Natarajan, V. Accelerating scientific breakthroughs with an AI co-scientist. Google Research https://research.google/blog/accelerating-scientific-breakthroughs-with-an-ai-co-scientist/ (2025).
Karimireddy, S. P., Guo, W. & Jordan, M. I. Mechanisms that incentivize data sharing in federated learning. Preprint at https://doi.org/10.48550/arXiv.2207.04557 (2022).
Evertsz, N., Bull, S. & Pratt, B. What constitutes equitable data sharing in global health research? A scoping review of the literature on low-income and middle-income country stakeholders’ perspectives. BMJ Glob. Health 8, e010157 (2023).
Serwadda, D., Ndebele, P., Grabowski, M. K., Bajunirwe, F. & Wanyenze, R. K. Open data sharing and the Global South—who benefits? Science 359, 642–643 (2018).
Viana, R. et al. Rapid epidemic expansion of the SARS-CoV-2 Omicron variant in southern Africa. Nature 603, 679–686 (2022).
Tegally, H. et al. Detection of a SARS-CoV-2 variant of concern in South Africa. Nature 592, 438–443 (2021).
Butera, Y. et al. Genomic and transmission dynamics of the 2024 Marburg virus outbreak in Rwanda. Nat. Med. 31, 422–426 (2025).
The Americas seek to expand genomic surveillance for dengue, chikungunya and other mosquito-borne viruses. Pan American Health Organization https://www.paho.org/en/news/16-8-2023-americas-seek-expand-genomic-surveillance-dengue-chikungunya-and-other-mosquito (2023).
Giovanetti, M. et al. Genomic and epidemiological surveillance of Zika virus in the Amazon region. Cell Rep. 30, 2275–2283 (2020).
Madewell, Z. J., Yang, Y., Longini, I. M. Jr., Halloran, M. E. & Dean, N. E. Household secondary attack rates of SARS-CoV-2 by variant and vaccination status: an updated systematic review and meta-analysis. JAMA Netw. Open 5, e229317 (2022).
Cuomo-Dannenburg, G. et al. Marburg virus disease outbreaks, mathematical models, and disease parameters: a systematic review. Lancet Infect. Dis. 24, e307–e317 (2024).
Marchello, C. S. et al. Complications and mortality of non-typhoidal salmonella invasive disease: a global systematic review and meta-analysis. Lancet Infect. Dis. 22, 692–705 (2022).
Deeks, J. J. et al. Analysing data and undertaking meta-analyses. In Cochrane Handbook for Systematic Reviews of Interventions (eds Higgins, J. P. T. et al.) 241–284 (Wiley, 2019).
Lison, A., Abbott, S., Huisman, J. & Stadler, T. Generative Bayesian modeling to nowcast the effective reproduction number from line list data with missing symptom onset dates. PLoS Comput. Biol. 20, e1012021 (2024).
Biazzo, I., Braunstein, A., Dall’Asta, L. & Mazza, F. A Bayesian generative neural network framework for epidemic inference problems. Sci. Rep. 12, 19673 (2022).
Semenova, E., Mishra, S., Bhatt, S., Flaxman, S. & Unwin, H. J. T. Deep learning and MCMC with aggVAE for shifting administrative boundaries: mapping malaria prevalence in Kenya. In Epistemic Uncertainty in Artificial Intelligence Vol. 14523 (eds Cuzzolin, F. & Sultana, M.) 13–27 (Springer, 2024).
Williams, R., Hosseinichimeh, N., Majumdar, A. & Ghaffarzadegan, N. Epidemic modeling with generative agents. Preprint at https://doi.org/10.48550/arXiv.2307.04986 (2023).
Zhang, C. & Matsen, F. A. IV. A variational approach to Bayesian phylogenetic inference. J. Mach. Learn. Res. 25, 6890–6945 (2024).
Ki, C. & Terhorst, J. Variational phylodynamic inference using pandemic-scale data. Mol. Biol. Evol. 39, msac154 (2022).
Chatzilena, A., van Leeuwen, E., Ratmann, O., Baguelin, M. & Demiris, N. Contemporary statistical inference for infectious disease models using Stan. Epidemics 29, 100367 (2019).
Blei, D. M., Kucukelbir, A. & McAuliffe, J. D. Variational inference: a review for statisticians. J. Am. Stat. Assoc. 112, 859–877 (2017).
Kucukelbir, A., Tran, D., Ranganath, R., Gelman, A. & Blei, D. M. Automatic differentiation variational inference. J. Mach. Learn. Res. 18, 430–474 (2017).
el Mekkaoui, K., Mesquita, D., Blomstedt, P. & Kaski, S. Federated stochastic gradient Langevin dynamics. In Proc. Conference on Uncertainty in Artificial Intelligence Vol. 161 (eds de Campos, C. & Maathuis, M.) 1703–1712 (PMLR, 2020).
Nemeth, C. & Fearnhead, P. Stochastic gradient Markov chain Monte Carlo. J. Am. Stat. Assoc. 116, 433–450 (2021).
Voznica, J. et al. Deep learning from phylogenies to uncover the epidemiological dynamics of outbreaks. Nat. Commun. 13, 3896 (2022).
Asher, M., Lomax, N., Morrissey, K., Spooner, F. & Malleson, N. Dynamic calibration with approximate Bayesian computation for a microsimulation of disease spread. Sci. Rep. 13, 8637 (2023).
Frazier, P. I. A tutorial on Bayesian optimization. Preprint at https://doi.org/10.48550/arXiv.1807.02811 (2018).
Liu, D. & Sopasakis, A. A combined neural ODE–Bayesian optimization approach to resolve dynamics and estimate parameters for a modified SIR model with immune memory. Heliyon 10, e38276 (2024).
Reiker, T. et al. Emulator-based Bayesian optimization for efficient multi-objective calibration of an individual-based model of malaria. Nat. Commun. 12, 7212 (2021).
He, X., Zhao, K. & Chu, X. AutoML: a survey of the state-of-the-art. Knowl.-Based Syst. 212, 106622 (2021).
Moraga, P. et al. Bayesian spatial modelling of geostatistical data using INLA and SPDE methods: a case study predicting malaria risk in Mozambique. Spat. Spatiotemporal Epidemiol. 39, 100440 (2021).
Lindgren, F. & Rue, H. Bayesian spatial modelling with R-INLA. J. Stat. Softw. https://doi.org/10.18637/jss.v063.i19 (2015).
Li, W., Chen, H., Jiang, X. & Harmanci, A. FedGMMAT: federated generalized linear mixed model association tests. PLoS Comput. Biol. 20, e1012142 (2024).
Li, W. et al. Federated learning algorithms for generalized mixed-effects model (GLMM) on horizontally partitioned data from distributed sources. BMC Med. Inform. Decis. Mak. 22, 269 (2022).
Limpoco, M. A. A., Faes, C. & Hens, N. Linear mixed modeling of federated data when only the mean, covariance, and sample size are available. Stat. Med. 44, e10300 (2025).
Yan, Z., Zachrison, K. S., Schwamm, L. H., Estrada, J. J. & Duan, R. A privacy-preserving and computation-efficient federated algorithm for generalized linear mixed models to analyze correlated electronic health records data. PLoS ONE 18, e0280192 (2023).
Chang, H. & Shokri, R. Bias propagation in federated learning. Preprint at https://doi.org/10.48550/arXiv.2309.02160 (2023).
Fowl, L., Geiping, J., Czaja, W., Goldblum, M. & Goldstein, T. Robbing the Fed: directly obtaining private data in federated learning with modified models. Preprint at https://doi.org/10.48550/arXiv.2110.13057 (2022).
Almodóvar, A., Parras, J. & Zazo, S. Propensity weighted federated learning for treatment effect estimation in distributed imbalanced environments. Comput. Biol. Med. 178, 108779 (2024).
Almodóvar, A., Parras, J. & Zazo, S. Federated learning for causal inference using deep generative disentangled models. Preprint at https://openreview.net/pdf?id=r7qL5vM3Aa (2023).
Meurisse, M. et al. Federated causal inference based on real-world observational data sources: application to a SARS-CoV-2 vaccine effectiveness assessment. BMC Med. Res. Methodol. 23, 248 (2023).
Vo, T. V., Lee, Y. & Leong, T.-Y. Federated causal inference from observational data. Preprint at https://doi.org/10.48550/arXiv.2308.13047 (2023).
Xiong, R. et al. Federated causal inference in heterogeneous observational data. Stat. Med. 42, 4418–4439 (2023).
Bagdasaryan, E., Veit, A., Hua, Y., Estrin, D. & Shmatikov, V. How to backdoor federated learning. Preprint at https://doi.org/10.48550/arXiv.1807.00459 (2019).
He, M., Tang, S. & Xiao, Y. Combining the dynamic model and deep neural networks to identify the intensity of interventions during COVID-19 pandemic. PLoS Comput. Biol. 19, e1011535 (2023).
Fu, W. et al. Privacy-preserving individual-level COVID-19 infection prediction via federated graph learning. ACM Trans. Inf. Syst. 42, 82 (2024).
Liu, Z., Wan, G., Prakash, B. A., Lau, M. S. Y. & Jin, W. A review of graph neural networks in epidemic modeling. In Proc. 30th ACM SIGKDD Conference on Knowledge Discovery and Data Mining 6577–6587 (ACM, 2024).
Panja, M., Chakraborty, T., Kumar, U. & Liu, N. Epicasting: an ensemble wavelet neural network for forecasting epidemics. Neural Netw. 165, 185–212 (2023).
Wu, Y., Yang, Y., Nishiura, H. & Saitoh, M. Deep learning for epidemiological predictions. In Proc. 41st International ACM SIGIR Conference on Research & Development in Information Retrieval 1085–1088 (ACM, 2018).
Wood, D. et al. A unified theory of diversity in ensemble learning. J. Mach. Learn. Res. 24, 17302–17350 (2023).
Mnih, V. et al. Asynchronous methods for deep reinforcement learning. Preprint at https://doi.org/10.48550/arXiv.1602.01783 (2016).
Samsami, M. R. & Alimadad, H. Distributed deep reinforcement learning: an overview. Preprint at https://doi.org/10.48550/arXiv.2011.11012 (2020).
Yin, Q. et al. Distributed deep reinforcement learning: a survey and a multi-player multi-agent learning toolbox. Mach. Intell. Res. 21, 411–430 (2024).
Nicholls, S. M. et al. CLIMB-COVID: continuous integration supporting decentralised sequencing for SARS-CoV-2 genomic surveillance. Genome Biol. 22, 196 (2021).
BroadE: introduction to Terra: a scalable platform for biomedical research. Broad Institute https://www.broadinstitute.org/videos/broade-introduction-terra-scalable-platform-biomedical-research (2021).
Hadfield, J. et al. Nextstrain: real-time tracking of pathogen evolution. Bioinformatics 34, 4121–4123 (2018).
Turakhia, Y. et al. Ultrafast Sample placement on Existing tRees (UShER) enables real-time phylogenetics for the SARS-CoV-2 pandemic. Nat. Genet. 53, 809–816 (2021).
De Maio, N. et al. Maximum likelihood pandemic-scale phylogenetics. Nat. Genet. 55, 746–752 (2023).
Kramer, A. M. et al. Online phylogenetics with matOptimize produces equivalent trees and is dramatically more efficient for large SARS-CoV-2 phylogenies than de novo and maximum-likelihood implementations. Syst. Biol. 72, 1039–1051 (2023).
Zachariasen, T. et al. MAGinator enables accurate profiling of de novo MAGs with strain-level phylogenies. Nat. Commun. 15, 5734 (2024).
Zhang, C., Nielsen, R. & Mirarab, S. CASTER: direct species tree inference from whole-genome alignments. Science 387, eadk9688 (2025).
Benoit, P. et al. Seven-year performance of a clinical metagenomic next-generation sequencing test for diagnosis of central nervous system infections. Nat. Med. 30, 3522–3533 (2024).
Wang, S. et al. PathoTracker: an online analytical metagenomic platform for Klebsiella pneumoniae feature identification and outbreak alerting. Commun. Biol. 7, 1038 (2024).
Ko, K. K. K., Chng, K. R. & Nagarajan, N. Metagenomics-enabled microbial surveillance. Nat. Microbiol. 7, 486–496 (2022).
Kent, C. et al. PrimalScheme: open-source community resources for low-cost viral genome sequencing. Preprint at bioRxiv https://doi.org/10.1101/2024.12.20.629611 (2024).
Pan American Health Organization/World Health Organization. Informative note: update cases of pneumonia due to Legionella—Tucumán, Argentina. Pan American Health Organization https://www.paho.org/sites/default/files/2023-07/20223septemberphetechnicalnotepneumonia-due-legionellaargen.pdf (2022).
Venkatesan, P. UK launch metagenomic pathogen surveillance programme. Lancet Microbe https://doi.org/10.1016/j.lanmic.2025.101143 (2025).
Arita, I. et al. Role of a sentinel surveillance system in the context of global surveillance of infectious diseases. Lancet Infect. Dis. 4, 171–177 (2004).
Anker, K. M. et al. Exploring genetic signatures of zoonotic influenza A virus at the swine–human interface with phylogenetic and ancestral sequence reconstruction. Virus Evol. 11, veaf028 (2025).
Mollentze, N., Babayan, S. A. & Streicker, D. G. Identifying and prioritizing potential human-infecting viruses from their genome sequences. PLoS Biol. 19, e3001390 (2021).
Mollentze, N. & Streicker, D. G. Predicting zoonotic potential of viruses: where are we? Curr. Opin. Virol. 61, 101346 (2023).
Wille, M., Geoghegan, J. L. & Holmes, E. C. How accurately can we assess zoonotic risk? PLoS Biol. 19, e3001135 (2021).
Mollentze, N. & Streicker, D. G. Viral zoonotic risk is homogenous among taxonomic orders of mammalian and avian reservoir hosts. Proc. Natl Acad. Sci. USA 117, 9423–9430 (2020).
Pandit, P. S. et al. Predicting the potential for zoonotic transmission and host associations for novel viruses. Commun. Biol. 5, 844 (2022).