Top AI models fail spectacularly when faced with slightly altered medical questions
Artificial intelligence systems often perform impressively on standardized medical exams—but new research suggests these test scores may be misleading. A study published in JAMA Network Open indicates that large language models, or LLMs, might not actually “reason” through clinical questions.