
1 in 2 AI Medical Responses Flagged As Problematic In New Study

Author: Ava Durgin, Assistant Health Editor
April 28, 2026
Image by Pexels: Ketut Subiyanto

“Why do I wake up at 3 a.m. every night for no reason?”

“Why do I keep getting bloated every time I eat?” 

“What could cause pain in my right hip flexor?” 

These are the kinds of questions that used to sit in the back of your mind (or your notes app) until a doctor’s appointment. Now, they go straight into AI, for better and for worse.

We’re starting to treat AI less like a tool and more like a health resource. It’s fast, available, and surprisingly articulate. But a new analysis suggests that what sounds authoritative isn’t always accurate, and in some cases it can steer people in the wrong direction.

Testing AI’s health advice

Researchers set out to test how reliable popular AI chatbots are when answering everyday health questions, particularly in areas already flooded with misinformation. They evaluated five widely used models and asked each one 50 questions spanning cancer, vaccines, stem cells, nutrition, and athletic performance.

The questions weren’t random. Some were straightforward, with clear, evidence-based answers. Others were intentionally open-ended or designed to nudge the models toward gray areas where misinformation tends to thrive. Think the kinds of questions people actually ask when they’re trying to make sense of their health, not prompts with one obvious answer.

Each response was scored by experts using a structured system that flagged whether the information was accurate, incomplete, or potentially harmful if someone acted on it without guidance. They also looked at the quality of citations and how easy the answers were to understand.

This wasn’t about catching a few obvious mistakes. It was about stress-testing how these systems behave under real-world conditions.

Half the answers were problematic

About 50% of responses were flagged as problematic. Around 30% were missing context, oversimplifying, or presenting weak evidence as stronger than it is. Nearly 20% were rated highly problematic, meaning the advice could plausibly lead someone toward ineffective or even harmful decisions.

But the more interesting finding isn’t just the error rate. It’s how those errors showed up.

Open-ended questions were the biggest trouble spot. When the chatbot had more freedom to generate a broad answer, it was significantly more likely to be misleading. Closed questions with clear right-or-wrong answers fared better. But the problem is that most people don’t ask tightly framed medical questions. They ask things like, “What’s the best diet for hormone balance?” or “Should I be worried about this symptom?”

The topic also made a difference. The models performed relatively well on vaccines and cancer, where there’s a large body of consistent, structured research. They struggled more with nutrition, fitness, and emerging therapies like stem cells, areas where advice is often nuanced, evolving, or influenced by trends.

Then there is the problem of confidence. The chatbots rarely expressed uncertainty. They didn’t say “this is still being studied” or “you should check with a professional” nearly as often as you’d expect. Instead, they delivered answers with a level of certainty that can easily be mistaken for expertise.

Even the citations, which are supposed to anchor claims in evidence, were unreliable. Many were incomplete or outright fabricated. And the language itself tended to be complex, often written at a level that assumes a college-educated reader. Ironically, that complexity can make answers feel more credible, even when they’re not.

How to reduce medical AI misinformation

This doesn’t mean you need to stop using AI for health questions. It means you need to change how you use it.

Start with how you ask. Narrow, specific questions tend to produce more reliable answers than broad, open-ended ones. Instead of asking for “the best” approach, ask about risks, trade-offs, or what evidence supports a specific claim.

Pay attention to tone. If an answer sounds overly certain, especially on a nuanced topic, that’s a reason to pause. Real health science is rarely (if ever) black and white. A lack of caveats isn’t a sign of clarity. It’s often a sign of oversimplification.

Be skeptical of citations that you can’t verify. If a chatbot references studies, take an extra minute to check whether they actually exist or are being accurately represented. Fabricated references aren’t always obvious at first glance.

Most importantly, know when AI has reached its limit. It can help you understand a concept, generate questions to ask your doctor, or translate complex information into something more digestible. What it can’t do is evaluate your individual health context, weigh competing evidence, or make judgment calls that require clinical experience.

The takeaway

The gap here isn’t just about accuracy. It’s about expectation. People are using AI as if it’s a source of truth, when it’s really a system built to predict what a good answer sounds like.

If you think of AI as a starting point, it can be really useful. It can nudge you in the right direction and make complex topics feel more approachable. The problem is when you use it as your sole source of information. That’s when you risk making decisions based on something that just sounds convincing, not something that’s actually grounded in science.