Accurate mortality data are essential during public health emergencies, yet COVID-19 death reporting in the United States was often delayed or incomplete, which can distort population health assessments and policy decisions. Determining the cause of death is complex and is affected by systemic, social, and institutional factors, raising concerns about the undercounting of deaths, particularly across sociodemographic groups. Previous evidence has relied largely on excess mortality models, which estimated higher death counts than the officially reported figures. However, these methods cannot clearly distinguish deaths caused directly by infection from those resulting from indirect pandemic effects. Emerging approaches use detailed data and machine learning to improve death estimates. By leveraging reliable in-hospital data, researchers can better detect unrecognized COVID-19 deaths, especially in out-of-hospital settings, and assess disparities in underreporting across populations. This approach is illustrated in a study published in Science Advances.
In this study, machine learning models were trained and validated on inpatient deaths using cross-validation. The selected model was then used to predict COVID-19 deaths in out-of-hospital settings. Researchers estimated the number of unrecognized deaths and calculated an adjusted reporting ratio (ARR) to assess underreporting across populations, followed by fairness and misclassification analyses. The model incorporated temporal (seasonality and month of death), clinical (nursing home, inpatient hospital, outpatient hospital or emergency department, hospice, dead on arrival at hospital), demographic (age, race, education, sex, ethnicity), and geographic (country, state, residence of death) factors drawn from national mortality data from March 2020 to December 2021. XGBoost showed the best performance and was used to estimate underreported COVID-19 deaths and disparities across sociodemographic groups.
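The workflow described above (validate on inpatient deaths, then apply the model to out-of-hospital deaths) can be sketched roughly as follows. This is only an illustration under stated assumptions, not the study's implementation: the data are synthetic, the per-cell rate table is a simple stand-in for XGBoost, the Brier score is an assumed validation metric, and summing predicted probabilities to estimate deaths is an assumed aggregation step. All function and field names are hypothetical.

```python
import random
from collections import defaultdict


def synthetic_deaths(n, rng, recorded_share=1.0):
    """Generate made-up death records; recorded_share < 1 mimics underreporting."""
    rows = []
    for _ in range(n):
        age = rng.choice(["<65", "65-84", "85+"])
        month = rng.choice(["2020-04", "2020-08", "2021-01"])
        p = 0.10 + (0.15 if age != "<65" else 0.0) + (0.20 if month == "2021-01" else 0.0)
        covid = int(rng.random() < p * recorded_share)
        rows.append({"age_group": age, "month": month, "covid": covid})
    return rows


def fit_rate_table(records, alpha=1.0):
    """Stand-in model (not XGBoost): smoothed COVID share per (age_group, month) cell."""
    counts = defaultdict(lambda: [0, 0])  # cell -> [COVID deaths, all deaths]
    for r in records:
        cell = (r["age_group"], r["month"])
        counts[cell][0] += r["covid"]
        counts[cell][1] += 1
    overall = sum(r["covid"] for r in records) / len(records)

    def predict(r):
        covid, total = counts.get((r["age_group"], r["month"]), (0, 0))
        # Shrink the cell rate toward the overall rate; unseen cells get the overall rate.
        return (covid + alpha * overall) / (total + alpha)

    return predict


def cross_validated_brier(records, k=5, seed=0):
    """Mean Brier score over k folds (lower is better)."""
    shuffled = records[:]
    random.Random(seed).shuffle(shuffled)
    folds = [shuffled[i::k] for i in range(k)]
    scores = []
    for i, held_out in enumerate(folds):
        train = [r for j, fold in enumerate(folds) if j != i for r in fold]
        predict = fit_rate_table(train)
        errors = [(predict(r) - r["covid"]) ** 2 for r in held_out]
        scores.append(sum(errors) / len(errors))
    return sum(scores) / k


rng = random.Random(42)
inpatient = synthetic_deaths(2000, rng)                    # assumed reliably coded
at_home = synthetic_deaths(500, rng, recorded_share=0.5)   # assumed undercounted

brier = cross_validated_brier(inpatient)   # validate on inpatient deaths only
model = fit_rate_table(inpatient)          # refit on all inpatient deaths
estimated = sum(model(r) for r in at_home) # expected COVID deaths at home
recorded = sum(r["covid"] for r in at_home)

print(f"cross-validated Brier score: {brier:.3f}")
print(f"recorded: {recorded}, estimated: {estimated:.1f}")
```

The key design point is that the model only ever learns from the setting where cause-of-death coding is assumed reliable, and is then transferred to settings where it is not.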
Results showed that total COVID-19 deaths were estimated at 995,787 (95% uncertainty interval [UI]: 990,313-1,001,363), exceeding the 840,251 officially reported deaths by 19%, with an ARR of 1.19 (95% UI: 1.18-1.19). This suggests about 155,536 (95% UI: 150,062-161,112) unrecognized deaths, mostly in out-of-hospital settings. Deaths at home were the most underreported, with 111,245 (95% UI: 108,372-114,210) unrecognized deaths and an ARR of 2.60 (95% UI: 2.56-2.65), followed by hospice (17,346 deaths, 95% UI: 16,832-17,873), emergency/outpatient settings (14,832 deaths, 95% UI: 14,364-15,315), and other settings (9,452 deaths, 95% UI: 9,011-9,926).
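The headline figures can be checked with simple arithmetic. The sketch below assumes the ARR is the ratio of estimated total deaths to officially reported deaths, which is consistent with the numbers quoted above but is not a definition given in this summary. Dividing the interval bounds by the fixed reported count is a simplification for illustration; the function name is hypothetical.

```python
def adjusted_reporting_ratio(estimated_total, reported):
    """Assumed definition: estimated total deaths divided by officially reported deaths."""
    if reported <= 0:
        raise ValueError("reported deaths must be positive")
    return estimated_total / reported


reported = 840_251                        # officially reported COVID-19 deaths
estimated = 995_787                       # model-based estimate of total deaths
ui_low, ui_high = 990_313, 1_001_363      # 95% UI of the estimate

unrecognized = estimated - reported
arr = adjusted_reporting_ratio(estimated, reported)
arr_low = adjusted_reporting_ratio(ui_low, reported)
arr_high = adjusted_reporting_ratio(ui_high, reported)

print(unrecognized)                              # 155536
print(ui_low - reported, ui_high - reported)     # 150062 161112
print(round(arr, 2), round(arr_low, 2), round(arr_high, 2))  # 1.19 1.18 1.19
```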
Geographically, underreporting was highest in the West South Central (ARR 1.31; 95% UI: 1.29-1.33), Middle Atlantic (ARR 1.26; 95% UI: 1.24-1.27), and East South Central (ARR 1.25; 95% UI: 1.23-1.28) regions. Among states, Alabama (ARR 1.67; 95% UI: 1.61-1.74) and Oklahoma (ARR 1.51; 95% UI: 1.46-1.57) showed the largest gaps.
Disparities were also evident. Underreporting was higher among individuals aged 65 to 84 years (ARR 1.21-1.22), among males (ARR 1.22; 95% UI: 1.21-1.23) than among females (ARR 1.14; 95% UI: 1.13-1.15), among individuals with lower educational attainment (ARR 1.29; 95% UI: 1.28-1.31), and among Hispanic populations (ARR 1.31; 95% UI: 1.30-1.32). Counties with lower income, poorer health, and higher diabetes prevalence also exhibited greater underreporting, with an ARR of 1.33 (95% UI: 1.30-1.35).
In conclusion, this study found that COVID-19 deaths in the U.S. were underreported by approximately 19%, with around 155,536 unrecognized deaths, mainly occurring outside hospital settings. Undercounting was highest for deaths occurring at home and varied across populations and regions. These disparities highlight inequities in death reporting and underscore the need for improved and more equitable public health surveillance systems.
Reference: Kiang MV, Li ZR, Wrigley-Field E, et al. Applying machine learning to identify unrecognized COVID-19 deaths recorded as other causes of death in the United States. Science Advances. 2026;12:eaef5697. doi:10.1126/sciadv.aef5697


