AI’s 72% Accuracy in Clinical Decisions Sparks Controversy

As AI technology advances into complex medical scenarios, it is also stirring controversy within the medical community. Doctors face critical questions about what success rates are acceptable for AI-assisted diagnoses and how reliably AI's promising performance in controlled research settings will carry over to real-world practice.

A recent study by researchers at Mass General Brigham, published in the Journal of Medical Internet Research, examines the performance of the AI model ChatGPT in clinical decision-making. Rather than evaluating a single task, the study assessed ChatGPT's capabilities across a wide range of clinical care scenarios. The findings revealed that ChatGPT achieved an overall clinical decision-making accuracy of 72% across tasks ranging from identifying potential diagnoses to making final diagnoses and care management decisions.

The significance of this research lies in the potential of AI to enhance the efficiency and accuracy of medical diagnoses, particularly in a U.S. healthcare system that is becoming increasingly complex and costly due to an aging population. Despite hosting some of the world’s top physicians and medical facilities, the United States allocated approximately 18% of its GDP to healthcare in 2021, nearly double the average of advanced economies. 

Notably, the Mass General Brigham study stands out as one of the first comprehensive assessments of the capabilities of large language models in the realm of clinical care. It evaluates ChatGPT’s decision support from the initial patient interaction through the entire care continuum, including post-diagnosis care management. 

While ChatGPT demonstrated impressive performance, making correct final diagnoses 77% of the time, its accuracy dropped to 60% on differential diagnoses, the task of generating the full list of possible conditions that could explain a patient's initial symptoms. This underscores the difficulty AI faces in comprehensively accounting for every condition a set of symptoms might suggest.

In parallel, another study conducted across 171 hospitals in the United States and the Netherlands explored the effectiveness of a machine learning model called ELDER-ICU. This model excelled at identifying the illness severity of older adults admitted to intensive care units, offering valuable assistance to clinicians in prioritizing geriatric ICU patients who require heightened or earlier attention. 

Despite these promising developments, it is important to acknowledge that while AI has outperformed medical professionals in specific tasks, such as cancer detection in medical imaging, translating AI research into real-world clinical practice remains a complex endeavor. Critics argue that many AI studies do not align with genuine clinical needs. One notable distinction is that AI tested in a research setting carries no risk of a malpractice lawsuit, a risk human clinicians face in actual clinical environments whether they work independently or with AI assistance.

According to Marc Succi, executive director at Mass General Brigham’s innovation incubator and co-author of the study, there is more work to be done to bridge the gap between a useful machine learning model and its practical application in clinical practice. Succi highlights that AI’s value to doctors is most apparent in the early stages of patient care when limited information is available, and a list of potential diagnoses is needed. 

Succi also notes that large language models like ChatGPT need improvement in differential diagnosis before they can be considered ready for widespread clinical use. In the meantime, researchers should explore applying AI to hospital tasks that do not require a final diagnosis, such as emergency room triage. Assessing AI's value relative to doctors of varying seniority is also difficult, since there are no clear benchmarks for success rates among physicians at different stages of their careers.

Looking ahead, Succi emphasizes that for ChatGPT and similar AI models to be effectively deployed in hospitals, more benchmark research and regulatory guidance are necessary. Additionally, diagnostic success rates must increase to a range of 80% to 90% to ensure patient safety and reliable clinical outcomes. 

While AI continues to make strides in medical contexts, its integration into everyday clinical practice remains a multifaceted challenge. The Mass General Brigham study demonstrates both the potential and limitations of AI in clinical decision-making, highlighting the need for further research, refinement, and regulatory considerations as healthcare systems evolve to meet the demands of an aging population. 
