The Great AI Model Race Intensifies

The AI model race accelerated dramatically over the past two weeks, with three major releases and a competitive crisis at OpenAI reshaping enterprise AI strategy. The gap between leading and lagging models is narrowing from quarters to weeks.
Google launched Gemini 3 on November 18, achieving a record 1,501 Elo score on LMArena, while the Gemini app reached 650 million monthly active users. The model scored 37.4% on Humanity’s Last Exam, the highest performance on record. Google released Antigravity, a new agentic development platform, alongside the model.
Six days later, Anthropic released Claude Opus 4.5. The company tested the model on the same difficult take-home exam it gives prospective performance engineers, and Claude Opus 4.5 outscored every human candidate who has ever taken it. The model achieved 80.9% on SWE-bench Verified, currently the state of the art on that coding benchmark.
These back-to-back launches triggered an internal crisis at OpenAI. On December 2, CEO Sam Altman issued a “Code Red” memo to employees, calling for an immediate surge to improve ChatGPT. The company delayed planned features including advertising, shopping, health agents, and a personal assistant codenamed “Pulse.” According to reporting by The Information, OpenAI is developing a model called “Garlic” that Chief Research Officer Mark Chen told colleagues shows “strong performance” on coding and reasoning benchmarks in internal tests.
The competitive pressure is measurable. ChatGPT maintains approximately 800 million weekly active users, while Gemini has surged from 450 million to 650 million monthly active users since July.
DeepSeek, a Hangzhou-based startup, released V3.2 on December 1, a 685-billion-parameter open-source model. The company claims performance on par with GPT-5 and Gemini 3 Pro, though those claims have not been independently benchmarked. The model achieved gold-medal-level performance on the IMO, CMO, IOI, and ICPC competitions. I tested V3.2 for several days, and the performance held up. The cost advantage is substantial: processing 50 million input tokens and 20 million output tokens per month costs approximately $245 with DeepSeek V3.2, compared to $880 with Claude and $1,100 with GPT-4 Turbo.
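For readers who want to sanity-check these numbers against current price lists, the arithmetic is straightforward: multiply each month's token volume, in millions, by the vendor's per-million-token rate for input and output. The sketch below uses GPT-4 Turbo's roughly $10 input / $30 output list price per million tokens as a worked example; the DeepSeek and Claude figures above likewise depend on whatever rates those vendors publish when you run the comparison.

```python
# Monthly API cost for the usage profile above: tokens (in millions)
# multiplied by each vendor's price per million tokens.
MONTHLY_INPUT_TOKENS = 50_000_000
MONTHLY_OUTPUT_TOKENS = 20_000_000

def monthly_cost(input_price_per_m: float, output_price_per_m: float) -> float:
    """Bill = input millions * input rate + output millions * output rate."""
    input_millions = MONTHLY_INPUT_TOKENS / 1_000_000
    output_millions = MONTHLY_OUTPUT_TOKENS / 1_000_000
    return input_millions * input_price_per_m + output_millions * output_price_per_m

# Example: at $10 per 1M input tokens and $30 per 1M output tokens
# (roughly GPT-4 Turbo's published list price), this profile works out to:
print(f"~${monthly_cost(10.00, 30.00):,.0f}/month")  # ~$1,100/month

# Swap in any vendor's current rates to reproduce the other figures.
```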
In practice, your model garden will need lifecycle management that can accommodate 90-day evaluation cycles. Your agent catalog will need governance protocols that scale with release velocity. The practical solution is a well-organized AI Ops team. We’ve been helping our clients stand up these frameworks. If you want to discuss how this applies to your organization, please reach out.
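As one illustration of what that lifecycle management might look like, the sketch below tracks a catalog entry per model with a 90-day re-evaluation window. The CatalogEntry structure, field names, and dates are assumptions for the sake of the example, not a prescribed framework; adapt them to whatever model garden or registry tooling you actually run.

```python
# Minimal sketch of a model-catalog entry with a fixed re-evaluation cadence.
from dataclasses import dataclass, field
from datetime import date, timedelta

EVAL_CYCLE = timedelta(days=90)  # re-benchmark each model roughly once a quarter

@dataclass
class CatalogEntry:
    name: str
    provider: str
    last_evaluated: date
    benchmarks: dict = field(default_factory=dict)

    def review_due(self, today: date | None = None) -> bool:
        """True when the entry has gone a full cycle without re-evaluation."""
        today = today or date.today()
        return today - self.last_evaluated >= EVAL_CYCLE

catalog = [
    CatalogEntry("gemini-3-pro", "Google", date(2025, 11, 18)),
    CatalogEntry("claude-opus-4.5", "Anthropic", date(2025, 11, 24)),
    CatalogEntry("deepseek-v3.2", "DeepSeek", date(2025, 12, 1)),
]

stale = [m.name for m in catalog if m.review_due()]
print("due for re-evaluation:", stale)
```

The same pattern extends to the agent catalog: attach a review date to each agent and surface the stale entries automatically, rather than relying on ad hoc audits.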
More By This Author:
Vibe Revenue And The AI Bubble
Free ChatGPT for Teachers
Google’s Gemini 3 Release Signals The End Of Cautious AI Rollouts
Disclosure: This is not a sponsored post. I am the author of this article and it expresses my own opinions. I am not, nor is my company, receiving compensation for it.