Muse Spark: Meta's New AI Model Is Good. But Not Open Source.

Okay so Meta actually did it.
Nine months after Mark Zuckerberg rage-assembled one of the most expensive AI teams on the planet, Meta Superintelligence Labs just dropped their first real public model.
It’s called Muse Spark.
And honestly? The benchmarks are looking promising. Not “hype train” promising. Like, actually promising.
But there’s something sitting in the corner of this launch that nobody seems to want to say too loudly.
So let me say it.
What Is Muse Spark and Where Did It Come From?
Meta Superintelligence Labs (MSL) launched in June 2025 after Zuckerberg got frustrated with Llama 4’s performance. That model had landed in April to a lukewarm reception, benchmark-gaming allegations, and an unusual Saturday drop that felt like a release being quietly buried. So Zuckerberg did what Zuckerberg does: he reorganized everything, hired a bunch of people for absurd amounts of money, and stood up a new lab specifically aimed at building frontier AI.
Nine months later. Here we are. Muse Spark.
The model scores 52 on the Artificial Analysis Intelligence Index - competing directly against GPT 5.4, Gemini 3.1, Grok 4.2, and Claude Opus 4.6. After nine months of work, Meta Superintelligence Labs has a model that belongs in the same conversation as the best in the world.
That’s not nothing.
The Benchmarks - Let’s Actually Look at Them
I’ll spare you the part where I list every number. Here’s what actually matters.
Where Muse Spark genuinely wins:
On HealthBench Hard, Muse Spark Thinking scores 42.8. GPT 5.4 xhigh gets 40.1. Gemini 3.1 Pro? 20.6. Bruh. That gap on health reasoning is wild.
On Humanity’s Last Exam (the “are you actually smart” benchmark) with no tools - Muse Spark Contemplating hits 50.2. Gemini 3.1 Deep Think gets 48.4. GPT 5.4 Pro? 43.9.
That’s a real win. Not a marginal one.
On FrontierScience Research, Muse Spark scores 38.3 vs Gemini’s 23.3. Another big gap.
Where it’s more competitive than dominant:
On the overall AA Intelligence Index, Muse Spark sits at 52. That puts it 4th overall.
Gemini 3.1 Pro Preview and GPT 5.4 both hit 57.
Claude Opus 4.6 (max) sits at 53.
So it’s in the pack. It’s not running away from anyone. But for a team that literally didn’t exist nine months ago?
Yeah. That’s a solid result.
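Just to sanity-check that “4th overall” claim, here are the article’s own index scores dropped into a quick sort (the model names and numbers are exactly the ones cited above, nothing more):

```python
# Overall AA Intelligence Index scores as quoted in this post.
scores = {
    "Gemini 3.1 Pro Preview": 57,
    "GPT 5.4": 57,
    "Claude Opus 4.6 (max)": 53,
    "Muse Spark": 52,
}

# Sort descending by score; Python's sort is stable, so the two 57s
# keep their listed order.
ranking = sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
for rank, (model, score) in enumerate(ranking, start=1):
    print(f"{rank}. {model}: {score}")
# Muse Spark lands in 4th place.
```

However you break the tie at the top, Muse Spark falls out fourth.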
Hype Check: Is This a Clean Win?
Not entirely, no.
On ARC-AGI-2 - the abstract reasoning puzzle benchmark - Muse Spark scores 42.5. Gemini 3.1 gets 76.5. That’s a significant gap, not a rounding error.
On Terminal-Bench 2.0 (agentic terminal coding), Muse Spark gets 59.0. GPT 5.4 gets 75.1.
On GDPval-AA Elo (office tasks), Muse Spark scores 1444. GPT 5.4 gets 1672.
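For a sense of what a 1444-vs-1672 gap means: if GDPval-AA uses the standard Elo expected-score formula (an assumption on my part - Artificial Analysis may compute its Elo differently), the gap translates roughly like this:

```python
def win_probability(elo_a: float, elo_b: float) -> float:
    """Standard Elo expected score: P(A is preferred over B)."""
    return 1 / (1 + 10 ** ((elo_b - elo_a) / 400))

# Muse Spark (1444) vs GPT 5.4 (1672), per the scores quoted above.
p = win_probability(1444, 1672)
print(f"Muse Spark preferred in ~{p:.0%} of head-to-head comparisons")
```

Under that formula, a 228-point gap means Muse Spark wins only about a fifth of head-to-head office-task comparisons - GPT 5.4 is favored roughly 4-to-1.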
So the model has real strengths. It also has real gaps.
I genuinely don’t know if those gaps matter for most people’s day-to-day use. But if you’re comparing raw benchmark scores, Muse Spark isn’t the universal winner the announcement framing implies.
It’s a strong model with a specific profile. Which, tbh, is more interesting than being generically good at everything.
The Open-Source Thing. Let’s Talk About It.
Here’s my honest take.
Muse Spark is not open source.
Not open weights. Not community license. Closed. Proprietary. API only.
Which is… fine? Like it’s a business decision and I get it. But this is Meta. The company that literally built its AI credibility on LLaMA. The whole “we’re the good guys because we share our models” brand that they spent years building.
And now their most capable model, built by their brand-new superintelligence lab, is locked behind a wall.
Open source used to be Meta’s competitive strategy. Now it looks like it’s becoming their geopolitical chess move.
And that’s the part that’s a little uncomfortable if you actually pay attention.
Because think about who open source AI benefits most right now. It’s companies in China. DeepSeek runs on ideas that trace back to open LLaMA weights. Meta going proprietary with their serious models while keeping the open-source stuff for the Llama family? That pattern isn’t random. The US government has been quietly applying pressure on frontier labs to keep their best stuff closed. Export controls. Compute restrictions. The whole vibe.
So Meta gets to have both: open source credibility for the developer community (Llama 4 is still out there), and a closed proprietary frontier model that doesn’t accidentally hand anyone an advantage.
Which is smart. And also a bit of a values question.
Meta positioning itself as the “open” AI company was always partly marketing. But it was useful marketing that genuinely helped the ecosystem. Closing off Muse Spark while keeping Llama open is them splitting the difference. One is a product strategy. The other is geopolitics.
I’m not mad about it. But let’s at least name it.
FAQ
Is Muse Spark better than GPT 5.4 and Gemini 3.1?
In some areas, yes. On health benchmarks and Humanity’s Last Exam, Muse Spark beats both. On abstract reasoning and agentic tasks, it trails. It’s competitive, not dominant. Call it a strong 4th or 5th place overall depending on what you care about.
What is Meta Superintelligence Labs?
MSL is Meta’s internal AI division, launched June 30, 2025, focused on building frontier and eventually superintelligent AI. It’s led by Alexandr Wang as chief AI officer, has about 3,000 employees, and was born after Zuckerberg got frustrated with the Llama 4 rollout. Meta invested $14.3 billion into Scale AI as part of the effort.
Why isn’t Muse Spark open source like Llama?
That’s the question. Meta hasn’t given a clean public answer, but the pattern is clear: Llama models stay open weight as a developer ecosystem play, while MSL’s frontier models are proprietary. Some of it is competitive strategy. Some of it is almost certainly geopolitical pressure from the US government to keep top-tier AI capabilities closed.
PS: Nine months from “we’re building a lab” to a model that legitimately competes with GPT 5.4 on Humanity’s Last Exam is not a slow pace. Whatever Zuckerberg paid those people, they showed up.
Alright, bye!