Paging Dr. Google … Dr. GPT is in the house


State-of-the-art AI chatbots didn't perform well when real people asked for help assessing a medical problem.

Peresmeh/Creatas Video/Getty Images Plus

Back when I was in grad school, I learned how to properly search the web in a mandatory course taught by the university librarians. The core message was simple: the better the query, the better the results. The same principle holds today when engaging with chatbots. Good prompts are key to unlocking the value of large language models (LLMs) — they steer the model toward the right sources and higher-quality answers. Remember that the next time you fire up ChatGPT with a health question. As SN’s Tina Hesman Saey reports, new research suggests that while AI chatbots may hold the knowledge needed to diagnose, the average user lacks the prompting skill to extract it accurately.

👩‍💻 The prompt is the procedure

The study assessed how well popular LLMs diagnose complex medical cases. Researchers provided volunteers with expert-crafted clinical vignettes — detailed descriptions of patient symptoms and histories. They then randomly assigned the volunteers to use various LLMs or other methods (most people in the “other methods” group used Google or another search engine) to see which approach more accurately identified the correct condition and what to do about it. Unlike a static search, the chatbot experiments played out as interactive conversations.

The results? Human volunteers got less accurate diagnoses from the bots than the bots produced in controlled lab conditions, where each model was fed the entire scenario at once.
