Washington State University Study Finds AI Accuracy Fails to Meet High Reliability Standards in Scientific Analysis

Washington State University researchers give AI a "D" for accuracy. Discover why ChatGPT struggles to identify false statements and remains inconsistent.

By: AXL Media

Published: Mar 16, 2026, 12:07 PM EDT

Source: Information for this report was sourced from Washington State University


The Gap Between Linguistic Fluency and Conceptual Intelligence

While the rapid adoption of generative artificial intelligence has led to high expectations for its reasoning capabilities, recent research from Washington State University suggests a significant disconnect between fluency and factual reliability. Led by associate professor Mesut Cicek, the study fed more than 700 scientific hypotheses into ChatGPT and asked whether existing research supported them. The findings, published in the Rutgers Business Review, indicate that the AI’s ability to "think" is vastly overstated. According to Cicek, the tools currently function as sophisticated memorization engines rather than entities with genuine understanding, highlighting a critical intelligence gap that users must navigate with extreme caution.

Struggling With the Nuance of Scientific Truth

The experiment utilized 719 hypotheses from peer-reviewed business journals published since 2021, testing the AI's ability to handle complex, nuanced reasoning. When asked to determine if a statement was true or false, the AI achieved an accuracy rate of 76.5% in 2024, which improved slightly to 80% by 2025. However, the researchers noted that when adjusted for random chance, the performance was only about 60% better than a blind guess. This performance level is comparable to a low "D" grade in an academic setting, suggesting that the AI lacks the logical depth required for high-stakes scientific or professional analysis.
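The chance-adjusted figure can be reproduced with simple arithmetic. The sketch below assumes a 50% baseline for blind guessing on a binary true/false task (the baseline is our assumption; the study does not spell out its adjustment method):

```python
# Chance-adjusted performance on a binary (true/false) task.
# Assumption: a blind guess is correct 50% of the time.

def relative_improvement(accuracy: float, chance: float = 0.5) -> float:
    """How much better than blind guessing, as a fraction of the chance rate."""
    return accuracy / chance - 1

def beyond_chance(accuracy: float, chance: float = 0.5) -> float:
    """Agreement beyond chance, scaled to the room above chance
    (the same form as Cohen's kappa against a fixed baseline)."""
    return (accuracy - chance) / (1 - chance)

# The 2025 accuracy of 80% works out to roughly 0.60 under either reading,
# consistent with "about 60% better than a blind guess."
print(relative_improvement(0.80))
print(beyond_chance(0.80))
```

At a 50% baseline the two readings happen to coincide; at any other baseline they diverge, which is why the assumed chance rate matters.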

The Persistence of Contradictory and Inconsistent Outputs

One of the most troubling aspects of the study was the AI’s lack of consistency when presented with identical information. Researchers repeated each query 10 times with the exact same wording and found that the AI frequently changed its mind. In several instances, ChatGPT provided a "true" answer five times and a "false" answer the other five times for the same hypothesis. Cicek emphasized that this inconsistency is a major hurdle for users who require reliable data, as the AI’s response can vary based on nothing more than the randomness of its next word prediction rather than a shift in available evidence.
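The repeat-query protocol the researchers describe can be sketched as a simple consistency score over the answers returned for identical prompts. This is a minimal illustration, not the study's actual scoring code:

```python
from collections import Counter

def consistency(answers: list[str]) -> float:
    """Fraction of repeated runs that agree with the most common answer.
    1.0 means the model always gave the same verdict; 0.5 on a binary
    task means the answers were a perfect coin flip."""
    if not answers:
        return 0.0
    most_common_count = Counter(answers).most_common(1)[0][1]
    return most_common_count / len(answers)

# Mirrors the worst case reported in the study: five "true" and five
# "false" verdicts for ten runs of the exact same query.
split_answers = ["true"] * 5 + ["false"] * 5
print(consistency(split_answers))  # 0.5
```

A score of 0.5 on a true/false question means the model's verdict carries no more signal than its sampling randomness, which is the failure mode Cicek highlights.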
