Open wide and say AI, + H2O from fog and more

Measuring progress in health care AI relies heavily on question-and-answer tests, researchers argue, and not enough on evaluating performance on real-world medical tasks.

Open wide and say AI: Your doctor’s new assistant

😶 Human guinea pigs

If you haven’t been to the doctor recently, good for you! Meanwhile, the rest of us have become guinea pigs for the AI tools now commonly used in medical examinations. It’s a mixed blessing, writes Ananya in SN’s recent article, “Medical AI tools are growing, but are they being tested properly?”

Integrating AI into health care holds potential for more efficient record-keeping, improved diagnoses, better treatment decision-making, and easier reporting. Scads of new AI tools, designed to summarize patient encounters and generate clinical documentation, are now available to medical practitioners.

🤥 Unreliable narrators

But there are drawbacks: We already know that AI is only as good as the data it’s trained on, and one review of studies evaluating health care AI models, specifically large language models, found that only 5 percent used real patient data. Few evaluations focused on real-world tasks like summarizing patient encounters. Worse, such tools can introduce a hazard unknown in the pre-AI era: hallucinations. OpenAI’s Whisper, a speech recognition model used in ambient listening tools that transcribe and summarize clinical interactions, has landed in widely reported hot water for hallucinating racial commentary and imagined medical treatments.
