Artificial intelligence can assist in interpreting diagnostic images, offering valuable support to healthcare professionals and researchers. Where human eyes may miss a fracture or other abnormality in an X-ray, AI models can detect it, improving the accuracy of medical imaging.
However, a new study published in Scientific Reports highlights a hidden blind spot in AI-based medical imaging research: the phenomenon of ‘shortcut learning,’ which can lead to results that are highly accurate but potentially misleading.
The researchers analyzed more than 25,000 knee X-rays and found that AI models could ‘predict’ medically implausible traits, such as whether a patient had refrained from eating refried beans or drinking beer. These predictions have no medical basis, yet the models achieved high accuracy by exploiting unintended patterns in the data.
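To make the idea concrete, here is a minimal sketch (in Python; it is not the study’s code, and the embedding shape, label, and all names are illustrative assumptions) of this kind of probe: train a simple classifier to predict a medically implausible label from image features. With pure noise, as here, the expected AUC is about 0.5; an AUC well above that on a label with no physiological basis is a red flag for shortcut learning.

```python
# Hypothetical "implausibility probe": if a model can predict a label with no
# medical basis (e.g., beer consumption) from X-ray features, the signal must
# come from confounds in the data, not from physiology.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(0)

# Stand-in for image embeddings from a pretrained encoder (assumed shape).
X = rng.normal(size=(2000, 128))
# A label with no causal link to anatomy; here it is pure noise, so the
# expected AUC is ~0.5. In the study's setting, an AUC well above 0.5 on such
# a label is evidence of shortcut learning.
y = rng.integers(0, 2, size=2000)

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)
probe = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)
auc = roc_auc_score(y_te, probe.predict_proba(X_te)[:, 1])
print(f"Probe AUC: {auc:.2f}  (~0.5 expected when no shortcut exists)")
```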
Artificial intelligence has the potential to revolutionize medical imaging, but caution is necessary, says Dr. Peter Schilling, the study’s senior author and an orthopedic surgeon at Dartmouth Hitchcock Medical Center. Not all patterns identified by these models are visible to humans, and not all of them are reliable. Schilling emphasizes that understanding these risks is essential to avoid drawing misleading conclusions and to preserve the integrity of scientific research.
The researchers examined how AI algorithms often latch onto confounding variables, such as differences in X-ray equipment or clinical site markers, rather than medically relevant features. Even when individual biases were corrected for, the models simply learned other hidden patterns in the data.
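This dynamic is easy to reproduce synthetically. The sketch below (Python; purely illustrative, not the study’s code or data) builds noise ‘X-rays’ in which a bright corner tag, standing in for a clinical site marker, correlates with the label, alongside a subtler global brightness shift. A linear classifier scores well by reading the tag; mask the tag and retrain, and it simply picks up the brightness confound instead, staying above chance.

```python
# Minimal synthetic sketch of shortcut learning via a "site marker" confound.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

rng = np.random.default_rng(1)
n, side = 2000, 16
y = rng.integers(0, 2, size=n)           # pretend clinical label
imgs = rng.normal(size=(n, side, side))  # noise "X-rays": no real signal

# Confound 1: one site stamps a bright corner tag and happens to see more
# positive cases (tag probability 0.25 for y=0, 0.75 for y=1).
marker = rng.random(n) < 0.5 * y + 0.25
imgs[marker, :2, :2] += 3.0

# Confound 2: a subtler global brightness shift also correlated with the
# label (e.g., detector age), which survives masking the corner.
imgs[y == 1] += 0.05

X = imgs.reshape(n, -1)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=1)

clf = LogisticRegression(max_iter=2000).fit(X_tr, y_tr)
print("with marker:", accuracy_score(y_te, clf.predict(X_te)))

# "Fix" the known bias by zeroing the corner, then retrain: the model just
# latches onto the remaining brightness confound instead.
def mask(X):
    im = X.reshape(-1, side, side).copy()
    im[:, :2, :2] = 0.0
    return im.reshape(len(X), -1)

clf2 = LogisticRegression(max_iter=2000).fit(mask(X_tr), y_tr)
print("marker masked:", accuracy_score(y_te, clf2.predict(mask(X_te))))
```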
The bias extends beyond obvious clues such as race or gender, where one group may simply be overrepresented in the data, says Brandon Hill, co-author of the study and a machine learning scientist at Dartmouth Hitchcock. “We also found that the algorithm could predict the year an X-ray was taken. This is problematic because if you stop the model from learning one pattern, it will simply learn another. These risks can lead to misleading claims, and researchers must be aware of how easily this can occur when using this method.”
The findings underscore the importance of rigorous evaluation standards in AI-based medical research. If standard algorithms are relied on without deeper scrutiny, clinical insights and treatment pathways may end up built on error. “When you use models for discovering new patterns in medicine, the burden of proof just goes way up,” Schilling says. Part of the problem is our own bias: it is all too easy to fall into the trap of presuming that the model sees the way we do.
The literature search identified 11,921 abstracts, of which 9,484 were screened after duplicates were removed. After 8,721 were excluded on title and abstract criteria, 763 full manuscripts were assessed individually, and 260 of these were excluded. A total of 503 papers, each reporting sensitivity, specificity, or AUC, met the criteria for the systematic review. The meta-analysis covered 279 of these studies: ophthalmology (n = 82), respiratory medicine (n = 115), and breast cancer (n = 82).
Qualitative synthesis was also undertaken for the remaining 224 studies, which were conducted in other medical specialties and reported the diagnostic accuracy of deep learning (DL) algorithms for identifying disease. These included large numbers of studies in neurology/neurosurgery (n = 78), gastroenterology/hepatology (n = 24), and urology (n = 25). Only 55 of the 224 studies compared the algorithm’s performance with that of healthcare professionals.
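As a quick arithmetic check, the screening numbers above reconcile exactly; the short script below (illustrative only, using the counts reported by Aggarwal et al.) verifies the flow and shows that the meta-analyzed fields sum to 279.

```python
# Sanity check of the review's screening flow as reported above (the
# duplicate count is inferred by subtraction, not stated directly).
identified  = 11_921
screened    = 9_484   # implies 11_921 - 9_484 = 2_437 duplicates removed
excluded_ta = 8_721   # excluded on title/abstract criteria
full_text   = screened - excluded_ta   # 763 full manuscripts assessed
included    = full_text - 260          # 503 papers in the systematic review

by_field = {"ophthalmology": 82, "respiratory": 115,
            "breast cancer": 82, "other specialties": 224}
assert full_text == 763 and included == 503
assert sum(by_field.values()) == included  # 82 + 115 + 82 + 224 = 503
print(included - by_field["other specialties"])  # 279 studies meta-analyzed
```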
Hill adds, “AI is almost like dealing with an alien intelligence.” One might be tempted to say the model is ‘cheating,’ but that anthropomorphizes the technology: it finds a way to solve the problem it is given, though not necessarily the way a human would. “As we know, AI lacks logic or reasoning,” Hill explains.
Reference: Aggarwal R, Sounderajah V, Martin G, et al. Diagnostic accuracy of deep learning in medical imaging: a systematic review and meta-analysis. NPJ Digit Med. 2021;4(1):65. doi:10.1038/s41746-021-00438-z


