Frontier AI Reasoning Reaches New Heights as Grok and GPT Tie for Intelligence Crown

Grok-4.20 and GPT 5.4 Pro lead the 2026 AI intelligence rankings with IQ scores of 145 on the Mensa Norway benchmark, signaling a massive leap in reasoning.

By: AXL Media

Published: Apr 24, 2026, 1:10 PM EDT

Source: Visual Capitalist

The Rapid Convergence of Frontier Artificial Intelligence

The landscape of high-level AI reasoning has shifted markedly in 2026, according to the latest Mensa Norway benchmark results from TrackingAI. The most striking development is how tightly the top of the leaderboard has compressed: models from competing developers are now separated by only a few points. Grok-4.20 Expert Mode and OpenAI’s GPT 5.4 Pro (Vision) currently lead with identical scores of 145, a result that would place a human test-taker in the upper percentiles on these visual pattern-recognition tasks. The clustering suggests that the major AI laboratories may be converging on similar architectural advances, making the race for the top spot unusually close.

Benchmarking Abstract Reasoning via Visual Pattern Tests

TrackingAI's methodology uses the public Mensa Norway test, which consists of 35 visual-pattern puzzles designed to measure fluid intelligence and abstract reasoning. Vision-capable models are shown the original images directly, while non-vision models receive verbalized descriptions of the puzzles. The benchmark offers a familiar way to compare performance over time, but researchers note that it captures only a narrow slice of intelligence. An IQ-style score is a useful proxy for reasoning depth, yet it does not account for real-world factors such as coding proficiency, factual reliability, or the ability to execute multi-step, tool-based workflows in professional environments.

A Historic Leap in Performance Since 2025

Comparing the current results with data from just one year ago shows how quickly the field is moving. In early 2025, the highest recorded score on the same benchmark was 135; today, the leading tier has pushed that ceiling to 145. This ten-point jump on an IQ-style scale suggests that frontier models are not just getting larger but are also becoming better at solving logic patterns they have not encountered during training. This improvement in "zero-shot" reasoning is also visible in models like Gemini 3.1 Pro Preview, which trails the two leaders with a score of 141.
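To put these scores in perspective, the short Python sketch below converts an IQ-style score into an approximate population percentile. It assumes the conventional IQ scale, a normal distribution with a mean of 100 and a standard deviation of 15; the article's source does not state how TrackingAI scales its results, so the figures are illustrative rather than official. The function name iq_to_percentile and the score labels are hypothetical, chosen only for this example.

```python
from statistics import NormalDist

# Assumption: scores follow the conventional IQ scale (mean 100, SD 15).
# The source does not confirm that TrackingAI uses exactly this scaling.
IQ_SCALE = NormalDist(mu=100, sigma=15)


def iq_to_percentile(score: float) -> float:
    """Return the percentage of the reference population at or below `score`."""
    return 100 * IQ_SCALE.cdf(score)


# Scores quoted in the article.
scores = {
    "Early 2025 top score": 135,
    "Gemini 3.1 Pro Preview": 141,
    "Grok-4.20 / GPT 5.4 Pro": 145,
}

for label, score in scores.items():
    print(f"{label}: {score} -> {iq_to_percentile(score):.2f}th percentile")
```

Under that assumption, a score of 145 sits three standard deviations above the mean, or roughly the 99.9th percentile. The step from 135 to 145 therefore covers a small but very sparsely populated stretch at the top of the distribution.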
