New research has challenged the notion that generative artificial intelligence (AI) can effectively mitigate burnout among healthcare professionals, pointing instead to potential risks to patient safety and clinical decision-making.
Despite hopes that AI, particularly large language models (LLMs), could ease the burdens of electronic health record (EHR) systems and administrative tasks, recent investigations at U.S. health systems paint a different picture.
A 2023 observational study at Brigham and Women’s Hospital in Boston, Massachusetts, examined the use of AI for electronic patient messaging.
Researchers tasked an LLM with responding to simulated questions from cancer patients, comparing its output to responses crafted by board-certified radiation oncologists.
Published in The Lancet Digital Health, the study uncovered concerning findings: left unedited, LLM-generated responses posed risks of severe harm, including one response that could have resulted in death.
Most of the harmful responses stemmed from the model misjudging the urgency of a scenario and, in turn, the actions it recommended.
The study concluded that LLM-assisted responses, once reviewed and edited by medical professionals, could reduce physician workload while still ensuring that patients receive accurate information.
However, it emphasized the critical need for a thorough evaluation of LLMs in clinical contexts, considering the specific task and level of human oversight.
A separate study, conducted at New York’s Mount Sinai Health System and published in NEJM AI, assessed how well four LLMs generated medical billing codes. The research revealed significant shortcomings, with the models often producing imprecise or fabricated codes.
Funded by the AGA Research Foundation and the National Institutes of Health (NIH), the study cautioned against the use of LLMs in medical coding tasks without further research.
Although the models could approximate the meaning of various codes, they exhibited a concerning lack of precision and a propensity to fabricate codes.
The implications of these findings extend to billing, clinical decision-making, quality improvement, research, and health policy.
Researchers underscored the urgent need for additional research and evaluation to address the limitations of LLMs and ensure patient safety in healthcare settings.