Cursor 3 Rewrites the Rules — Not Everyone Is Convinced

On April 2, Anysphere shipped Cursor 3 under the internal codename "Glass" and simultaneously declared the previous version of its product obsolete. That takes confidence. Jonas Nelle, Cursor's head of engineering, told WIRED: "In the last few months, our profession has completely changed. A lot of the product that got Cursor here is not as important going forward anymore."

He's not wrong. The question is whether what replaces it is ready.

What Actually Changed

Cursor 3 wasn't updated — it was rebuilt. The traditional VS Code editor remains available, but the default experience is now the Agents Window: a full-screen workspace where you run multiple AI agents in parallel across local machines, cloud sandboxes, SSH environments, and mobile. You dispatch tasks, close your laptop, and come back to a PR with screenshots and a video recording. More than a third of Cursor's own internal PRs are already authored by agents running in cloud sandboxes.

Design Mode lets frontend developers select UI elements directly in the browser and describe changes in natural language. /worktree isolates experiments in git worktrees. /best-of-n runs the same prompt against multiple models simultaneously. The cloud agent infrastructure supports up to 50 workers per team, with Kubernetes and fleet management APIs, and self-hosted enterprise deployment that keeps code, builds, and secrets entirely on-premise.

The vision is coherent. Developers as architects, agents as builders. I've been using Cursor since before most people had heard of it, and I won't pretend this isn't impressive engineering.

The Model Powering It — and the Transparency Problem

The engine is Composer 2, launched March 19. The benchmarks are strong: 61.3 on CursorBench-3 (up 39% from Composer 1.5), 73.7 on SWE-bench Multilingual, 61.7 on Terminal-Bench 2.0. Pricing undercuts competitors significantly: $0.50 per million input tokens, roughly a tenth of Claude Opus 4.6's rate. It has a 200,000-token context window and a real-time reinforcement learning pipeline that ships improved model checkpoints every five hours based on actual user interactions.

Within 24 hours of launch, a developer found the internal model ID: kimi-k2p5-rl-0317-s515-fast. Composer 2 is built on Moonshot AI's Kimi K2.5, a one-trillion-parameter mixture-of-experts model. Moonshot's head of pretraining confirmed the tokenizer was "completely identical." Cursor co-founder Aman Sanger called the missing attribution a "miss." VP Lee Robinson acknowledged the base model but said only about a quarter of the training compute came from Kimi K2.5, with the rest from Cursor's own training.

Here's what bothers me about this: Cursor is a $50 billion company used by over half the Fortune 500. Kimi K2.5's modified MIT license explicitly requires prominent display of the Kimi name for products exceeding $20 million in monthly revenue. Cursor surpasses that threshold by roughly 8x. This wasn't a startup cutting a corner in a garage. This was a deliberate product decision by a company that knows exactly how big it is.

"Developer trust, once cracked, does not heal on the same timeline as a product roadmap." I didn't write that line — an analyst did. But I'm keeping it because it's exactly right.

The Pricing Conversation Nobody Wants to Have

Six tiers. Free. Pro at $20/month. Pro+ at $60/month. Ultra at $200/month — 20x usage, effectively $4,000 worth of API capacity. Teams at $40/user/month. Enterprise at custom pricing.

Reports have surfaced of developers spending up to $2,000 in two days on agent-intensive workflows. I believe it. Several prominent developers told WIRED they shifted to Claude Code or Codex because of more generous usage limits on those subsidized subscriptions. Claude Code commands 54% of the AI coding market, according to Menlo Ventures data.

Let's be direct: the Ultra tier at $200/month exists for developers who need agents running continuously. If you're using Cursor 3 as Anysphere intends (multiple parallel agents, cloud sandboxes, overnight autonomous tasks), you're on Ultra. That's $2,400 a year, before your other tools, before your other subscriptions, before anything else.

That's not inherently wrong. But it means the "agent-first development" vision has a price floor that excludes a significant portion of the developers Cursor is claiming to liberate.
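The arithmetic behind those figures is easy to check. Here is a minimal sketch in Rust that uses only the prices quoted above; the names TIERS, annual_cost, and months_covered are mine, invented for illustration, and nothing here reflects how Cursor actually meters or bills usage.

```rust
/// Monthly list prices in USD, copied from the tiers quoted in this article.
/// Free and Enterprise are left out because they have no fixed figure to multiply.
const TIERS: [(&str, u32); 4] = [
    ("Pro", 20),
    ("Pro+", 60),
    ("Ultra", 200),
    ("Teams (per user)", 40),
];

/// Twelve months at the monthly list price.
fn annual_cost(monthly_usd: u32) -> u32 {
    monthly_usd * 12
}

/// How many whole months of a subscription a one-off spend would have paid for.
fn months_covered(spend_usd: u32, monthly_usd: u32) -> u32 {
    spend_usd / monthly_usd
}

fn main() {
    for (name, monthly) in TIERS {
        println!("{name}: ${} per year", annual_cost(monthly));
    }
    // The reported worst case: $2,000 spent in two days.
    println!("$2,000 equals {} months of Ultra", months_covered(2_000, 200));
}
```

Put differently, the reported two-day spend would have paid for ten months of Ultra at list price. That is a comparison of sticker prices only, not a claim about what those developers were actually billed for.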

Where It Actually Breaks

I'll say this plainly: autonomous agents in complex codebases are not ready to be trusted without rigorous review, and Cursor 3 does not change that.

Cursor handles Rust lifetime annotations correctly about 80% of the time. A 20% failure rate on Rust's core safety mechanism is not a minor footnote. An engineering team at ilert had to add explicit rule files forbidding AI-generated patterns like holding std::sync::Mutex across .await points — because the agent kept introducing them. Academic research shows ownership and borrowing violations account for over 40% of compilation errors in AI-generated Rust code.
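For readers who don't write async Rust, here is roughly what the Mutex problem looks like. This is a minimal sketch using only the standard library, not ilert's actual rule file or code: YieldOnce, NoopWake, block_on, and increment are hypothetical helpers I made up so the example runs without an async runtime.

```rust
use std::future::Future;
use std::pin::Pin;
use std::sync::{Arc, Mutex};
use std::task::{Context, Poll, Wake, Waker};

/// Hypothetical stand-in for real I/O: returns Pending once, then completes.
struct YieldOnce(bool);

impl Future for YieldOnce {
    type Output = ();
    fn poll(mut self: Pin<&mut Self>, cx: &mut Context<'_>) -> Poll<()> {
        if self.0 {
            Poll::Ready(())
        } else {
            self.0 = true;
            cx.waker().wake_by_ref(); // ask to be polled again
            Poll::Pending
        }
    }
}

/// Waker that does nothing; the toy executor below simply polls in a loop.
struct NoopWake;

impl Wake for NoopWake {
    fn wake(self: Arc<Self>) {}
}

/// Toy single-future executor, for illustration only.
fn block_on<F: Future>(fut: F) -> F::Output {
    let mut fut = Box::pin(fut);
    let waker = Waker::from(Arc::new(NoopWake));
    let mut cx = Context::from_waker(&waker);
    loop {
        if let Poll::Ready(out) = fut.as_mut().poll(&mut cx) {
            return out;
        }
    }
}

// The anti-pattern, kept as a comment so nothing in this file can block:
//
//     let mut guard = counter.lock().unwrap();
//     *guard += 1;
//     YieldOnce(false).await; // guard is still held while the task is suspended
//
// Any other task on the same thread that calls lock() now blocks that thread,
// so the suspended task may never resume. std's MutexGuard is also not Send,
// which makes the whole future not Send.

/// Safer shape: confine the guard to a block that ends before the `.await`.
async fn increment(counter: &Mutex<u64>) -> u64 {
    let value = {
        let mut guard = counter.lock().unwrap();
        *guard += 1;
        *guard
    }; // guard dropped here, lock released
    YieldOnce(false).await;
    value
}

fn main() {
    let counter = Mutex::new(0);
    let first = block_on(increment(&counter));
    let second = block_on(increment(&counter));
    println!("{first} {second}");
}
```

The two versions differ by a pair of braces, which is exactly why this kind of bug is easy for an agent to introduce and easy for a reviewer to miss.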

It gets worse at scale. A CodeRabbit analysis of 470 open-source PRs found AI-authored code contained 1.7x more bugs than human-written code, including higher rates of critical issues. An engineer using Claude Code — Cursor's closest competitor — had an agent destroy a live production environment: network, services, and a database with years of data. Amazon's own internal documents cited "Gen-AI assisted changes" as a contributing factor in incidents including a December AWS outage. Apiiro found developers using AI introduced roughly 10x more security issues than those who did not.

OWASP published its first Top 10 for Agentic Applications in 2026. A December 2025 audit found 30+ vulnerabilities across all major AI IDEs — Cursor, Windsurf, GitHub Copilot, and others — with 24 assigned CVEs. Autonomous agent features can be turned into data exfiltration and remote code execution vectors.

Stack Overflow's 2025 Developer Survey found that 29% of developers trust AI outputs to be accurate, down 11 points from 2024. The top Hacker News reaction to Cursor 3's launch: "I wish they'd keep the old philosophy of letting the developer drive and the agent assist… I still want to code, not vibe my way through tickets."

That reaction isn't nostalgia. It's signal.

April 2 Was a Bad Day to Launch Alone

Cursor 3 shipped on the same day GitHub dropped its Copilot SDK into public preview, Google released Gemma 4 under Apache 2.0 (the first time the Gemma family has been fully open for commercial use), and Google's Antigravity IDE continued its free public preview. Antigravity, built on a VS Code fork following Google's $2.4 billion acquisition of Windsurf's talent, scores 76.2% on SWE-bench Verified, the highest published score for any coding agent currently available.

That's a crowded day. And Google's Antigravity is free.

The Honest Verdict

Cursor 3 is the most technically sophisticated agent-first IDE available today. The Composer 2 benchmarks are real. The real-time RL pipeline is genuinely novel — a model that improves every five hours based on actual production usage is a different kind of product than anything that existed two years ago. The cloud agent infrastructure and enterprise self-hosting are serious engineering.

I use Cursor. I'll keep using it. The Tab completions alone justify the Pro subscription for daily work.

But the "agent-first" era will arrive more slowly and unevenly than this launch suggests. The transparency stumble with Kimi K2.5 was avoidable. The $200/month price floor for agent-first work is real. The trust deficit, the 71% of developers who don't trust AI outputs to be accurate, didn't appear from nowhere. It was earned.

The developers most likely to benefit from Cursor 3 are those working in well-typed, well-tested codebases with clear architectural boundaries. Everyone else should use it as a powerful assistant, not an autonomous colleague.

The question isn't whether Cursor 3 is impressive. It is. The question is whether "you are the architect, agents are the builders" describes where we actually are — or where Anysphere needs us to believe we are to justify a $50 billion valuation.

Those are different questions with different answers.
