Elon Musk’s AI company, xAI, is turning heads again with the release of its next-gen Grok 4 models, and they’re setting some serious records. The two new models, Grok 4 and Grok 4 Heavy, are both focused on advanced reasoning and problem-solving — and the benchmark numbers are wild.
Alongside the models, xAI dropped a premium subscription tier called SuperGrok Heavy, priced at $300 per month. This new plan gives users full access to Grok 4 Heavy, the company’s most powerful model to date.

Let’s talk numbers — because Grok 4 isn’t just keeping up with the competition, it’s leaving them in the dust:
- 🧠 GPQA benchmark: Grok 4 scored 87.5%, while Grok 4 Heavy hit 88.9%
- 🧮 AIME 2025 exam: Grok 4 Heavy nailed it with a perfect 100%
- 💀 Humanity’s Last Exam (with tools): Grok 4 Heavy got 44.4%, and Grok 4 scored 38.6% — both significantly ahead of Gemini 2.5 Pro (26.9%) and OpenAI’s o3 (24.9%)
But the real headline is the ARC-AGI-2 benchmark, a new test designed to challenge the reasoning abilities of top-tier models. Grok 4 scored 15.9%, the highest score recorded at the time of the announcement and roughly double the scores of Claude Opus 4 and OpenAI’s o3. On the older ARC-AGI-1 benchmark, Grok 4 still led the pack with 66.7%.
xAI says Grok 4 Heavy is its largest and most powerful model yet, capable of parallel problem-solving using multiple agents. Basically, it’s not just smart; it’s built to think like a team.
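For the curious, here is a toy Python sketch of what "multiple agents working in parallel" can look like in general. xAI has not published how Grok 4 Heavy coordinates its agents, so everything here is an assumption made for illustration: the `make_agent` and `solve_in_parallel` names are hypothetical, the "agents" are stand-in functions that return canned answers, and majority voting is just one simple way to combine results.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List

# Hypothetical stand-in type: an agent takes a question and returns an answer.
# A real system would call a model here; this is NOT xAI's implementation.
Agent = Callable[[str], str]


def make_agent(answer: str) -> Agent:
    """Build a toy agent that always returns a fixed answer (illustration only)."""
    def agent(question: str) -> str:
        return answer
    return agent


def solve_in_parallel(question: str, agents: List[Agent]) -> str:
    """Run every agent on the same question concurrently, then majority-vote."""
    with ThreadPoolExecutor(max_workers=len(agents)) as pool:
        # pool.map keeps the answers in the same order as the agents list.
        answers = list(pool.map(lambda agent: agent(question), agents))
    # The most common answer wins; on a tie, the answer seen first wins.
    winner, _count = Counter(answers).most_common(1)[0]
    return winner


# Three toy agents, two of which agree.
agents = [make_agent("42"), make_agent("41"), make_agent("42")]
result = solve_in_parallel("What is 6 * 7?", agents)
print(result)
```

In this sketch the two agents that agree outvote the third. A production system would likely do something more sophisticated than a simple vote, such as having the agents compare notes before settling on an answer.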
Looking ahead, the roadmap is packed. Musk announced:
- An AI coding model coming in August
- A multi-modal agent dropping in September
- And possibly a video generation model by October
All this solidifies xAI as a serious contender in the AI world, right up there with OpenAI, Google, and Anthropic — and maybe even a step ahead. With Grok 4’s performance, xAI just planted its flag as a leading force in building foundational AI models.