Ranked: The Smartest AI Models Of 2026

Key Takeaways

Grok-4.20 Expert Mode and OpenAI GPT 5.4 Pro (Vision) tie for the top spot in TrackingAI’s April 2026 Mensa Norway benchmark, each scoring 145.
The top tier is getting crowded, with several leading models now separated by only a few points.
Scores have risen sharply from 2025, highlighting how quickly frontier AI reasoning has improved on visual pattern-recognition tests.

The race to build smarter AI models is getting tighter at the top.

This visualization, part of Visual Capitalist’s AI Week, sponsored by Terzo, ranks leading systems using data from TrackingAI, which benchmarks models on the Mensa Norway IQ test as of April 2026.

The results show both who leads today and how little now separates the top contenders, with multiple frontier models clustered near the top of the leaderboard.

A Tie at the Top

The ranking offers a snapshot of how today’s leading AI models perform on abstract pattern-recognition tasks, and just how close the race has become.

As the table below shows, only a small gap now separates the top models:

Model	Mensa Norway IQ (April 2026)
Grok-4.20 Expert Mode	145
OpenAI GPT 5.4 Pro (Vision)	145
Gemini 3.1 Pro Preview	141
OpenAI GPT 5.4 Thinking (Vision)	139
OpenAI GPT 5.3	136
Grok-4.20 Expert Mode (Vision)	133
OpenAI GPT 5.4 Thinking	133
Meta Muse Spark	133
Gemini 3.1 Pro Preview (Vision)	132
Qwen 3.5	130
Claude-4.6 Opus	130
Kimi K2.5	127
Manus	115
DeepSeek R1	112
DeepSeek V3	111
Gemini 3.1 Flash Preview	110
Llama 4 Maverick	110
OpenAI GPT 5.3 (Vision)	109
Claude-4.6 Sonnet	106
Bing Copilot	101
Perplexity	97
Mistral Medium 3.1	96
Claude-4.6 Sonnet (Vision)	94
Claude-4.6 Opus (Vision)	82
Llama 4 Maverick (Vision)	79
OpenAI GPT 5.4 Pro	73

The biggest takeaway is how compressed the top of the leaderboard has become. Grok-4.20 Expert Mode and OpenAI GPT 5.4 Pro (Vision) are tied for first at 145, while Gemini 3.1 Pro Preview follows closely at 141.

That narrow spread suggests frontier AI models are increasingly converging at the top, where a difference of just a few points can shift the rankings.

The gains from 2025 are also notable. Last year’s top score was 135, compared with 145 in this year’s results, highlighting the speed at which leading models are improving on this benchmark.

Not all models are keeping pace. Among major AI developers, Mistral’s top model ranks lowest in this dataset: Mistral Medium 3.1 scores 96, well below the leading group.

How TrackingAI Runs the Test

TrackingAI uses the public Mensa Norway test, a set of 35 visual-pattern puzzles. For non-vision models, the questions are verbalized, while vision models receive the original images directly.

As a result, these scores are best understood as a benchmark comparison, not a definitive measure of overall intelligence. Because the test is fundamentally visual, a model’s score can vary depending on how the questions are presented.
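The table gives a rough sense of how much presentation matters. The Python sketch below pairs each model that appears in both a text-only row and a "(Vision)" row and computes the difference. The scores are copied from the table; the names `SCORES` and `vision_gaps` are illustrative and are not part of TrackingAI's tooling.

```python
# Scores copied from the April 2026 table: (text-only, vision).
SCORES = {
    "Grok-4.20 Expert Mode": (145, 133),
    "OpenAI GPT 5.4 Pro": (73, 145),
    "Gemini 3.1 Pro Preview": (141, 132),
    "OpenAI GPT 5.4 Thinking": (133, 139),
    "OpenAI GPT 5.3": (136, 109),
    "Claude-4.6 Sonnet": (106, 94),
    "Claude-4.6 Opus": (130, 82),
    "Llama 4 Maverick": (110, 79),
}


def vision_gaps(scores):
    """Return (model, vision minus text) pairs, largest absolute gap first."""
    gaps = [(model, vision - text) for model, (text, vision) in scores.items()]
    return sorted(gaps, key=lambda pair: abs(pair[1]), reverse=True)


for model, gap in vision_gaps(SCORES):
    # A positive gap means the vision variant scored higher.
    print(f"{model}: {gap:+d}")
```

On these figures, six of the eight paired models score lower with images than with verbalized questions, and the gaps run from +72 (OpenAI GPT 5.4 Pro) to -48 (Claude-4.6 Opus).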

Why This Benchmark Matters

TrackingAI’s leaderboard is useful because it offers a simple, familiar way to compare reasoning performance over time. The site also notes that if a model refuses to answer, it is asked the same question up to 10 times, and the most recent answer is used for scoring.
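The retry rule can be sketched in a few lines of Python. This is a minimal illustration, not TrackingAI's code: `ask` and `is_refusal` are hypothetical callables, and the sketch assumes that "up to 10 times" means 10 attempts in total and that retries stop at the first non-refusal.

```python
def ask_with_retries(ask, question, is_refusal, max_attempts=10):
    """Ask the same question until the model answers or attempts run out.

    ask        -- hypothetical callable: question -> answer string
    is_refusal -- hypothetical callable: answer -> True if the model refused
    Returns the most recent answer, which may still be a refusal.
    """
    answer = None
    for _ in range(max_attempts):
        answer = ask(question)
        if not is_refusal(answer):
            break  # assumption: stop retrying once an answer is given
    return answer


# Usage with a stand-in model that refuses twice, then answers "C".
replies = iter(["I can't help with that.", "I can't help with that.", "C"])
result = ask_with_retries(
    ask=lambda q: next(replies),
    question="Which tile completes the pattern?",
    is_refusal=lambda a: a.startswith("I can't"),
)
print(result)
```

Under this reading, a model that refuses all 10 times is scored on its last refusal, which would count as a wrong answer.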

Still, an IQ-style benchmark captures only one slice of capability. It does not measure everything that matters in real-world AI use, such as coding ability, factual reliability, tool use, or performance in professional domains.
