GPT-4 Surpasses ChatGPT and Human Benchmarks in USMLE Soft Skill Assessments

In the rapidly evolving world of medical technology, artificial intelligence (AI) stands out as a transformative force. Large language models (LLMs) such as ChatGPT and GPT-4 are at the forefront of this revolution, garnering significant attention for their potential applications in the medical field. One area of interest is their performance in the United States Medical Licensing Examination (USMLE), a pivotal series of tests that evaluate a wide array of skills essential for practicing medicine. 

The USMLE is not just a measure of a candidate’s medical knowledge. It delves deeper, assessing the ability to navigate intricate interpersonal situations, uphold patient safety standards, and exercise professional, legal, and ethical judgment. These ‘soft skills’ form the bedrock of effective medical practice, enabling physicians to connect with their patients and ensuring not just clinical efficiency but also a humane touch in patient care.

Given the importance of these skills, a study was conducted to evaluate how ChatGPT and GPT-4 fare in answering USMLE-style questions that test these soft skills, particularly empathy, human judgment, and other interpersonal abilities. The research was comprehensive, utilizing a set of 80 questions that mirror the USMLE’s style and standards.

These questions were sourced from two reputable platforms: the official USMLE website and the AMBOSS question bank, a recognized resource for medical practitioners and students. The results of the study were illuminating. GPT-4, the newer model, outshone ChatGPT by a significant margin. It answered a whopping 90% of the questions correctly, compared to ChatGPT’s 62.5%.

But accuracy was not the only noteworthy finding. When re-prompted and invited to reconsider, GPT-4 displayed unwavering confidence in its responses, never revising or second-guessing its original answers. In contrast, ChatGPT exhibited a high rate of self-revision, adjusting its initial answers 82.5% of the time when given the opportunity.
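To make the setup concrete, the sketch below shows one way such an evaluation could be scripted: each multiple-choice question is posed to a model through the OpenAI chat API, the model is re-prompted once and invited to revise, and overall accuracy and revision rate are tallied. This is a minimal illustration under assumed inputs (the question file, prompt wording, and scoring rules are placeholders), not the authors’ actual protocol or code.

```python
# Minimal sketch of an MCQ evaluation loop in the spirit of the study
# (illustrative only; file name, prompts, and scoring are assumptions).
import json
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

def ask(model: str, prompt: str) -> str:
    """Send one prompt and return the model's text reply."""
    resp = client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": prompt}],
    )
    return resp.choices[0].message.content.strip()

def evaluate(model: str, questions: list[dict]) -> tuple[float, float]:
    """Return (accuracy, revision rate) for one model over the question set."""
    correct = revised = 0
    for q in questions:
        options = "\n".join(f"{k}. {v}" for k, v in q["options"].items())
        prompt = (f"{q['stem']}\n{options}\n"
                  "Answer with the single letter of the best option.")
        first = ask(model, prompt)
        # Offer one chance to revise, mirroring the study's re-prompting step.
        follow_up = (f"{prompt}\nYour previous answer was {first}. "
                     "Are you sure? Reply with your final letter.")
        final = ask(model, follow_up)
        if final != first:
            revised += 1
        if final.upper().startswith(q["answer"].upper()):
            correct += 1
    n = len(questions)
    return correct / n, revised / n

if __name__ == "__main__":
    # Hypothetical question file: [{"stem": ..., "options": {"A": ...}, "answer": "B"}, ...]
    with open("usmle_soft_skill_questions.json") as f:
        questions = json.load(f)
    accuracy, revision_rate = evaluate("gpt-4", questions)
    print(f"accuracy={accuracy:.1%}  revision rate={revision_rate:.1%}")
```

In practice, scoring free-text replies against an answer key is more delicate than this letter-matching heuristic suggests; the study relied on structured multiple-choice formats, which keeps grading unambiguous.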

When pitted against human performance, GPT-4’s capabilities became even more evident. For the subset of questions drawn from AMBOSS, the platform’s user statistics served as a human benchmark, reporting an average correct response rate of 78%. On those questions, ChatGPT lagged behind the human benchmark with an accuracy of 61%, while GPT-4 surpassed it with an impressive 86.4%. This suggests that GPT-4 not only holds its own against human test-takers but, in certain scenarios, even exceeds them.

The implications of these findings are profound. If an AI model like GPT-4 can demonstrate such a high degree of empathy, ethical judgment, and professionalism, it opens up a world of possibilities for its integration into the medical field. From patient interactions to ethical consultations, the potential applications are vast. However, as with all technological advancements, there are challenges and considerations.

The study, while comprehensive, had its limitations. The question pool, though diverse, was limited to 80 multiple-choice questions. This might not fully encapsulate the vast range of soft skills essential to medical practice. Moreover, while GPT-4’s consistency is commendable, it’s crucial to understand the underlying reasons for its unwavering confidence and how it might translate to real-world scenarios. 

In conclusion, the rise of AI in medicine, as exemplified by models like GPT-4, offers a promising future. Their potential to augment human capacity, especially in areas that demand empathy and judgment, can revolutionize patient care. However, as we tread this new path, it’s essential to approach it with caution, ensuring that the human touch in medicine remains irreplaceable.  

Journal Reference  

Brin, D., Sorin, V., Vaid, A., Soroush, A., Glicksberg, B. S., Charney, A. W., … Klang, E. (2023). Comparing ChatGPT and GPT-4 performance in USMLE soft skill assessments. Scientific Reports. https://www.nature.com/articles/s41598-023-43436-9
