Are AI Chatbots Ready to Aid in Clinical Decision-Making?


Researchers have been studying whether artificial intelligence (AI) chatbots like ChatGPT can be a useful tool in clinical decision-making. Early findings indicate that AI chatbots have demonstrated some capabilities in this area, but the consensus has been that these tools are not yet reliable enough to trust in the fast-paced and complex decision-making that goes on in hospitals and health systems.

In a research letter published in JAMA Internal Medicine in December, physician-scientists at Beth Israel Deaconess Medical Center (BIDMC) in Boston compared a chatbot’s large language model (LLM) reasoning abilities directly against human performance using standards developed to assess physicians.

Researchers found that the chatbot earned the highest scores, with a median of 10 out of 10 for the LLM, 9 for attending physicians and 8 for residents. The comparison was closer to a draw between the humans and the bot on diagnostic accuracy (how high the correct diagnosis appeared on the list of possible diagnoses each provided) and on correct clinical reasoning.

But another key point from that research didn’t seem to garner nearly as many headlines: the bots were “just plain wrong” significantly more often than the residents, with more instances of incorrect reasoning in their answers. “The finding underscores the notion that AI will likely be most useful as a tool to augment, not replace, the human reasoning process,” noted a BIDMC follow-up report in April.

How Chatbots Are Being Used in Decision-Making

The rapid proliferation of publicly available AI tools that can be used in clinical decision-making raises the question of whether doctors are using chatbots for this purpose.

The answer appears to be ‘yes,’ according to survey data from more than 100 physicians recently polled by Fierce Healthcare and Sermo, a physician social network. Only physicians who reported using general-purpose LLMs could participate in the survey. The report states that some doctors are turning to tools intended for nonclinical uses to make clinical decisions.

Among the survey’s findings:

  • 76% of respondents reported using general-purpose LLMs in clinical decision-making.
  • More than 60% of physicians reported using LLMs like ChatGPT to check drug interactions.
  • More than half use LLMs for diagnosis support.
  • Nearly half use LLMs to generate clinical documentation.
  • 70% use LLMs for patient education.

Importantly, nearly all the respondents (97%) said they perform some vetting of LLM outputs: three out of four use clinical decision support tools, 60% use Google searches and peer-reviewed studies, and nearly half consult peers or colleagues.

4 Takeaways from the Survey

1 | Let the User Beware

Part of the appeal of publicly available AI chatbots is that they are easy to access and to query. A general-purpose LLM such as ChatGPT is trained on publicly available information online, may reference inaccurate, AI-generated content, and is not updated in real time. This means its outputs may be unreliable. Firewalled databases, including medical knowledge databases and scientific journals that typically charge a fee, are excluded from chatbot results.

2 | Refinements Still Are Needed

That physicians are exploring the potential benefits and limitations of AI chatbots in clinical decision-making is not surprising. But as Sara Farag, M.D., a gynecologic surgeon on Sermo’s medical advisory board, noted in the Fierce Healthcare report, LLMs need to be refined specifically for medical decision-making to be useful for patient management.

3 | Detail and Context Are Essential

Missing details or context in a chatbot query can lead to results that are imprecise or even dangerous. Experimenting with Microsoft’s Copilot, an AI-powered digital assistant, Peter Bonis, M.D., chief medical officer with Wolters Kluwer, asked how to treat a hypothetical patient with a urinary tract infection but intentionally left out a key detail: that the patient was pregnant. The chatbot recommended antibiotics that would have posed a risk to the fetus.

4 | Don't Lose Sight of the Human Element

Technology like AI is increasingly core to our experience of health care, notes Chris DeRienzo, chief physician executive of the American Hospital Association. “However, it's also true that when we leave technology to its own devices, we run the risk of technology being an orchestra without a conductor. . . . Health care is, and will always be, a uniquely human experience. That's why we need to [discuss these issues]. Because we cannot fail to thread the needle of technology with the fibers of our humanity,” DeRienzo says.

