Ranked: The Smartest AI Models Of 2026

Grok-4.20 Expert Mode and OpenAI’s GPT 5.4 Pro (Vision) tied for the top spot in the 2026 AI intelligence rankings, each with a record score of 145.

Key Takeaways

  • Grok-4.20 Expert Mode and OpenAI GPT 5.4 Pro (Vision) tie for the top spot in TrackingAI’s April 2026 Mensa Norway benchmark, each scoring 145.

  • The top tier is getting crowded, with several leading models now separated by only a few points.

  • Scores have risen sharply from 2025, highlighting how quickly frontier AI reasoning has improved on visual pattern-recognition tests.

The race to build smarter AI models is getting tighter at the top.

This visualization, part of Visual Capitalist’s AI Week, sponsored by Terzo, ranks leading systems using April 2026 data from TrackingAI, which benchmarks models on the Mensa Norway IQ test.

The results show both who leads today and how little now separates the top contenders, with multiple frontier models clustered near the top of the leaderboard.

A Tie at the Top

The ranking offers a snapshot of how today’s leading AI models perform on abstract pattern-recognition tasks, and just how close the race has become.

As the table below shows, only a small gap now separates the top models:

| Model | Mensa Norway IQ (April 2026) |
| --- | --- |
| Grok-4.20 Expert Mode | 145 |
| OpenAI GPT 5.4 Pro (Vision) | 145 |
| Gemini 3.1 Pro Preview | 141 |
| OpenAI GPT 5.4 Thinking (Vision) | 139 |
| OpenAI GPT 5.3 | 136 |
| Grok-4.20 Expert Mode (Vision) | 133 |
| OpenAI GPT 5.4 Thinking | 133 |
| Meta Muse Spark | 133 |
| Gemini 3.1 Pro Preview (Vision) | 132 |
| Qwen 3.5 | 130 |
| Claude-4.6 Opus | 130 |
| Kimi K2.5 | 127 |
| Manus | 115 |
| DeepSeek R1 | 112 |
| DeepSeek V3 | 111 |
| Gemini 3.1 Flash Preview | 110 |
| Llama 4 Maverick | 110 |
| OpenAI GPT 5.3 (Vision) | 109 |
| Claude-4.6 Sonnet | 106 |
| Bing Copilot | 101 |
| Perplexity | 97 |
| Mistral Medium 3.1 | 96 |
| Claude-4.6 Sonnet (Vision) | 94 |
| Claude-4.6 Opus (Vision) | 82 |
| Llama 4 Maverick (Vision) | 79 |
| OpenAI GPT 5.4 Pro | 73 |

The biggest takeaway is how compressed the top of the leaderboard has become. Grok-4.20 Expert Mode and OpenAI GPT 5.4 Pro (Vision) are tied for first at 145, while Gemini 3.1 Pro Preview follows closely at 141.

That narrow spread suggests frontier AI models are increasingly converging at the top, where a difference of just a few points can shift the rankings.

The gains from 2025 are also notable. Last year’s top score was 135, compared with 145 in this year’s results, highlighting the speed at which leading models are improving on this benchmark.
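
For readers who want to check the arithmetic, the gaps described above can be reproduced in a few lines of Python. This is a minimal sketch: the 2026 scores are the top five rows of the table, the 2025 figure is the top score quoted in this article, and the variable names are our own.

```python
# Top five scores copied from the table above (Mensa Norway IQ, April 2026).
top_scores_2026 = {
    "Grok-4.20 Expert Mode": 145,
    "OpenAI GPT 5.4 Pro (Vision)": 145,
    "Gemini 3.1 Pro Preview": 141,
    "OpenAI GPT 5.4 Thinking (Vision)": 139,
    "OpenAI GPT 5.3": 136,
}
TOP_SCORE_2025 = 135  # last year's top score, as quoted in the article

best = max(top_scores_2026.values())
leaders = sorted(name for name, score in top_scores_2026.items() if score == best)

# Gap between the tied leaders and the next model down.
gap_to_third = best - top_scores_2026["Gemini 3.1 Pro Preview"]
# Spread across the whole top five.
top5_spread = best - min(top_scores_2026.values())
# Improvement in the top score from 2025 to 2026.
year_over_year_gain = best - TOP_SCORE_2025

print("Leaders:", ", ".join(leaders))
print("Gap to third place:", gap_to_third)
print("Top-five spread:", top5_spread)
print("Gain over 2025 top score:", year_over_year_gain)
```

On these numbers, the leaders sit 4 points ahead of third place, the top five span 9 points, and the top score is up 10 points on 2025.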

Not all developers are keeping pace. Among the major AI developers in this dataset, Mistral has the lowest-scoring top model: Mistral Medium 3.1 scored 96, well below the leading group.

How TrackingAI Runs the Test

TrackingAI uses the public Mensa Norway test, a set of 35 visual-pattern puzzles. For non-vision models, the questions are verbalized, while vision models receive the original images directly.

These results are therefore best understood as a benchmark comparison, not a definitive measure of overall intelligence. Because the test is fundamentally visual, a model’s score can vary depending on how the questions are presented.
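
The effect of presentation shows up in the table itself, where several models appear twice. The sketch below pairs each model’s two entries and computes the difference. It assumes that the entry without a "(Vision)" label is the verbalized run of the same model, which is our reading of the table rather than something TrackingAI states here. The scores come from the table, and the names in the code are ours.

```python
# (verbalized score, vision score) pairs copied from the table above.
# Assumption: the unlabelled entry is the text-only run of the same model.
paired_scores = {
    "Grok-4.20 Expert Mode": (145, 133),
    "OpenAI GPT 5.4 Pro": (73, 145),
    "OpenAI GPT 5.4 Thinking": (133, 139),
    "OpenAI GPT 5.3": (136, 109),
    "Gemini 3.1 Pro Preview": (141, 132),
    "Claude-4.6 Opus": (130, 82),
    "Claude-4.6 Sonnet": (106, 94),
    "Llama 4 Maverick": (110, 79),
}

# Positive delta: the vision run scored higher than the verbalized run.
deltas = {model: vision - text for model, (text, vision) in paired_scores.items()}

# Print from the largest drop to the largest gain.
for model, delta in sorted(deltas.items(), key=lambda item: item[1]):
    print(f"{model:26s} {delta:+d}")
```

Only two of the eight pairs score higher with images, and the swings run from a 48-point drop for Claude-4.6 Opus to a 72-point gain for OpenAI GPT 5.4 Pro.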

Why This Benchmark Matters

TrackingAI’s leaderboard is useful because it offers a simple, familiar way to compare reasoning performance over time. The site also notes that if a model refuses to answer, it is asked the same question up to 10 times, and the most recent answer is used for scoring.
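
The retry rule can be sketched in a few lines of Python. This is an illustration of the rule as described above, not TrackingAI’s actual code: `ask`, `is_refusal`, and `answer_with_retries` are hypothetical names, and the sketch assumes that retries stop as soon as the model gives a non-refusal answer.

```python
from typing import Callable

MAX_ATTEMPTS = 10  # the article says a refused question is asked up to 10 times


def answer_with_retries(
    ask: Callable[[str], str],
    is_refusal: Callable[[str], bool],
    question: str,
    max_attempts: int = MAX_ATTEMPTS,
) -> str:
    """Ask `question` until the model answers or the attempts run out.

    Returns the most recent reply. If every attempt is refused, that reply
    is the final refusal; how such a case is scored is not stated in the
    article, so the sketch leaves it to the caller.
    """
    reply = ""
    for _ in range(max_attempts):
        reply = ask(question)
        if not is_refusal(reply):
            break  # assumption: stop retrying once the model answers
    return reply


# Demo with a fake model that refuses twice and then answers "C".
replies = iter(["I can't help with that.", "I can't help with that.", "C"])
final = answer_with_retries(
    ask=lambda question: next(replies),
    is_refusal=lambda reply: reply.startswith("I can't"),
    question="Which option completes the pattern?",
)
print(final)
```

Passing the model call and the refusal check in as functions keeps the sketch independent of any particular model API.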

Still, an IQ-style benchmark captures only one slice of capability. It does not measure everything that matters in real-world AI use, such as coding ability, factual reliability, tool use, or performance in professional domains.
