Artificial intelligence, particularly in the form of chatbots like OpenAI’s ChatGPT, has been lauded for its potential to assist in medical decision-making. However, a recent study conducted by researchers at Brigham and Women’s Hospital, affiliated with Harvard Medical School, has raised concerns about the accuracy and safety of AI-generated cancer treatment recommendations.
The study, published recently in JAMA Oncology, focused on evaluating ChatGPT’s ability to provide treatment advice that aligns with guidelines set forth by the National Comprehensive Cancer Network (NCCN). While the findings revealed some promising aspects, they also highlighted significant shortcomings.
According to the study, as reported by the New York Post, every one of ChatGPT’s responses included at least one recommendation concordant with NCCN guidelines, suggesting the chatbot has a basic grasp of established treatment protocols. However, a concerning 34% of the responses also contained at least one incorrect treatment recommendation, potentially endangering patients who rely on this information.
Even more alarming was the discovery that approximately 12% of ChatGPT’s responses contained outright false information, described as “hallucinations,” with no basis in accepted cancer treatments. These hallucinations typically pertained to localized treatment of advanced disease, targeted therapy, or immunotherapy, indicating a significant gap in the chatbot’s medical knowledge.
Danielle Bitterman, an oncologist at the Artificial Intelligence in Medicine program of the Mass General Brigham health system, expressed her concerns about ChatGPT’s mixed accuracy: “ChatGPT speaks oftentimes in a very sure way that seems to make sense, and the way that it can mix incorrect and correct information is potentially dangerous.”
This study’s results underscore a prevailing concern among critics, including entrepreneur Elon Musk, who has warned of the potential for advanced AI tools to disseminate misinformation if adequate safeguards are not in place. While AI language models like ChatGPT possess impressive capabilities, their limitations and potential for errors cannot be ignored, especially in contexts where human lives are at stake.
The researchers conducted the study by prompting ChatGPT to generate treatment recommendations for breast, prostate, and lung cancer. While AI language models have shown proficiency in medical knowledge and diagnostic tasks, this study found that ChatGPT struggled to provide consistently accurate cancer treatment recommendations, revealing its vulnerability to generating false or potentially harmful information. The general shape of such a query is sketched below.
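The article does not reproduce the study’s prompts or settings, so the following is only an illustrative sketch of how a templated treatment query of this kind might be sent to OpenAI’s chat API in Python. The prompt wording, the gpt-3.5-turbo model choice, and the prompt_for helper are assumptions for illustration, not the researchers’ actual protocol.

```python
# Illustrative sketch only: the study's actual prompts and model settings are
# not reproduced in the article. Requires the `openai` Python package and an
# OPENAI_API_KEY environment variable.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment


def prompt_for(cancer_type: str) -> str:
    # Hypothetical prompt template for a guideline-style treatment question.
    return (
        f"What is a recommended treatment approach for {cancer_type}? "
        "Please list specific treatment modalities."
    )


for cancer_type in ["breast cancer", "prostate cancer", "lung cancer"]:
    response = client.chat.completions.create(
        model="gpt-3.5-turbo",  # assumed model; the study evaluated ChatGPT
        messages=[{"role": "user", "content": prompt_for(cancer_type)}],
    )
    # Print a preview of each answer; in the study, the full responses were
    # judged against NCCN guidelines rather than taken at face value.
    print(cancer_type, "->", response.choices[0].message.content[:200])
```

As the article describes, the critical step came after the queries: the generated recommendations were checked against NCCN guidelines, a human review that the query itself cannot perform.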
OpenAI, the company behind ChatGPT, has been forthright about the limitations of its technology. In a blog post from March, it acknowledged that GPT-4, the current iteration of the chatbot available to the public, is “still not fully reliable” and prone to “hallucinating” facts and making reasoning errors.
OpenAI has emphasized the need for caution when using language model outputs, particularly in high-stakes situations. The appropriate protocol, it noted, should match the specific use case and may involve human review, grounding with additional context, or avoiding high-stakes applications altogether. Developers also bear a responsibility to distribute technologies that do not cause harm, and both patients and healthcare providers must be aware of the limitations of these AI tools.
The scrutiny surrounding ChatGPT has intensified as its popularity has grown. Earlier this month, UK-based researchers revealed that ChatGPT displayed a “significant” bias toward liberal political viewpoints, highlighting concerns about the potential for AI to perpetuate biases in decision-making processes. Inaccurate responses are not unique to ChatGPT; Google’s AI model, Bard, has also been known to generate false information in response to user queries.
The implications of AI-generated misinformation extend beyond healthcare. Some experts have raised concerns that chatbots and other AI products could disrupt the upcoming 2024 presidential election, emphasizing the need for robust oversight and regulation in the development and deployment of AI technologies.
While AI language models like ChatGPT hold promise in many domains, including healthcare, this study reveals significant shortcomings in their ability to provide accurate and reliable medical advice. Addressing these limitations and implementing appropriate safeguards is crucial to ensuring the safe and responsible use of AI in critical decision-making contexts. As AI continues to evolve, weighing its capabilities against its limitations remains a pressing challenge for researchers, developers, and society at large.