Healthcare decision-making depends on knowledge of a patient’s past and present health conditions in order to project, and potentially modify, the patient’s future course. Artificial intelligence (AI) methods show promise for supporting this effort by learning patterns of disease progression from large collections of medical records. Delphi-2M forecasts the rates of over 1,000 diseases conditional on each person’s medical history, with accuracy comparable to that of current single-disease models. Human disease progression with age is characterized by acute episodes and chronic debilitation, frequently occurring in the form of comorbidities.
Patterns of multimorbidity affect individuals differently, influenced by socioeconomic status, lifestyle, and heritable characteristics. Understanding and forecasting patterns of illness progression is critical in aging populations, where the mix of morbidities shifts with demographic context. Predicting the burden of illness is essential for healthcare and economic planning, as well as for the ongoing monitoring of disease incidence.
Disease progression modelling is analogous to large language models (LLMs): both learn from sequences of previous occurrences and use the relationships among them to predict future outcomes. Based on a person’s previous medical history, Delphi-2M estimates the rates of over 1,000 diseases, with accuracy comparable to that of current single-disease models.
The study used International Classification of Diseases (ICD-10) codes recorded at the time of initial diagnosis, together with death. All records for the 471,057 individuals who were still alive as of July 1, 2020 were included; data from 100,639 participants were held out for tuning and validation. In the first modification to the generative pretrained transformer (GPT) architecture, the positional encoding was replaced by a continuous age encoding based on sine and cosine basis functions.
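To make the idea of a continuous age encoding concrete, the following is a minimal sketch, not the authors' implementation. It borrows the geometric frequency schedule of standard transformer sinusoidal encodings and applies it to age in years instead of token position. The function name `age_encoding` and the parameters `dim` and `max_period` are illustrative assumptions; the frequencies actually used by Delphi-2M are not stated here.

```python
import math


def age_encoding(age_years, dim=8, max_period=100.0):
    """Encode a continuous age as interleaved sine/cosine features.

    Assumption: frequencies follow the geometric schedule of standard
    transformer positional encodings; `dim` and `max_period` are
    hypothetical values chosen for illustration only.
    """
    if dim % 2 != 0:
        raise ValueError("dim must be even")
    features = []
    for i in range(dim // 2):
        # Frequency decreases geometrically from 1 down toward 1 / max_period.
        freq = 1.0 / (max_period ** (2 * i / dim))
        features.append(math.sin(age_years * freq))
        features.append(math.cos(age_years * freq))
    return features
```

Because the input is a real number, two diagnoses recorded at ages 45.5 and 45.6 receive distinct but nearby encodings, which an integer position index cannot express.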
A detailed screen of architectural parameters confirmed that model performance increases with the quantity of datapoints, consistent with documented empirical scaling laws. Sex- and age-stratified incidence served as the epidemiological baseline for assessing how accurately Delphi-2M forecasts outcomes in the validation cohort. The average age-stratified area under the receiver operating characteristic (ROC) curve (AUC) confirmed Delphi-2M’s ability to predict the next diagnostic token across human diseases.
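As an illustration of what an age-stratified AUC measures, the sketch below computes the AUC separately within each age bin and then averages across bins, so that age alone cannot drive the score. It is a simplified example under stated assumptions, not the evaluation code of the study: the 5-year bin width, the unweighted average, and the function names `auc` and `age_stratified_auc` are all hypothetical.

```python
from collections import defaultdict


def auc(scores, labels):
    """Probability that a random case outranks a random non-case (ties count 0.5)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        return None  # AUC is undefined without both classes
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def age_stratified_auc(ages, scores, labels, bin_width=5):
    """Unweighted mean of per-age-bin AUCs (bin width is an assumption)."""
    bins = defaultdict(lambda: ([], []))
    for age, score, label in zip(ages, scores, labels):
        key = int(age // bin_width)
        bins[key][0].append(score)
        bins[key][1].append(label)
    per_bin = [auc(s, y) for s, y in bins.values()]
    valid = [a for a in per_bin if a is not None]
    return sum(valid) / len(valid) if valid else None
```

Bins that contain only cases or only non-cases are skipped, since the AUC is undefined there.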
Researchers took participants’ health trajectories from the UK Biobank validation data up to age 60 and evaluated the model’s predictions of their future health. Synthetic biomedical data were also employed to safeguard privacy, so that no personal identities were revealed.
Comorbidity patterns frequently clustered within specific ICD-10 chapters: a diagnosis was associated with increased rates of other diseases in the same chapter. The UK Biobank is a prospective cohort study that recruited participants from 2006 to 2010 and includes data from 500,000 individuals representing diverse demographic groups.
Findings showed that the effects of pregnancy-related disorders were confined to a limited timeframe and disappeared entirely after 10 years. The AI methods provided time-dependent predictions across multiple disease conditions, improving the understanding of individualized health risks and informing better medical practice.
Reference: Shmatko A, Jung AW, Gaurav K, et al. Learning the natural history of human disease with generative transformers. Nature. 2025. doi:10.1038/s41586-025-09529-3



