
Anthropic Leaked Its Own Source Code. Twice. In One Week.

Sun, 05 Apr 2026 00:00:00 GMT

Let me explain why I'm writing this at all.

I've been following Anthropic for a while. Not obsessively, but I work with this stuff, so I keep up. When the first leak dropped, my first thought was: okay, it happens. Then came the second one, the npm package. That's when I started taking notes.

This isn't a clean investigative report. I'm writing it because it's been bothering me, and because I can't shake the feeling that most of the coverage is missing the actual point.

What Happened

On March 31, 2026, someone at Anthropic forgot to configure a .npmignore file correctly. The official Claude Code npm package shipped with a source map that exposed the complete, unobfuscated TypeScript source code. Security researcher Chaofan Shou found it. A GitHub repository mirroring the code was forked more than 41,500 times within hours. Gone is gone.
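For anyone wondering how a source map can leak a whole codebase: in the Source Map v3 format, the `sources` array lists original file paths, and the optional `sourcesContent` array can hold a verbatim copy of each file. If the bundler fills in `sourcesContent`, the `.map` file is effectively the source tree in JSON. Here's a minimal sketch of reading it back out, assuming a map with inline `sourcesContent`. The function and file names are my own illustration; nothing here is taken from the actual Claude Code package.

```typescript
// The subset of the Source Map v3 format that matters for leaks.
interface SourceMapV3 {
  version: number;
  sources: string[];                  // original file paths
  sourcesContent?: (string | null)[]; // optional verbatim copies of those files
}

interface RecoveredFile {
  path: string;
  content: string;
}

// Pairs each original path with its embedded source text.
// Entries whose content is null or missing are skipped.
function recoverSources(mapJson: string): RecoveredFile[] {
  const map = JSON.parse(mapJson) as SourceMapV3;
  if (map.version !== 3) {
    throw new Error("unsupported source map version: " + map.version);
  }
  const contents = map.sourcesContent ?? [];
  const recovered: RecoveredFile[] = [];
  map.sources.forEach((path, i) => {
    const content = contents[i];
    if (typeof content === "string") {
      recovered.push({ path, content });
    }
  });
  return recovered;
}

// Usage with a tiny hand-written map; the file names are hypothetical.
const demoMap = JSON.stringify({
  version: 3,
  sources: ["src/main.ts", "src/flags.ts"],
  sourcesContent: ["export const main = () => 1;\n", null],
  mappings: "",
});
recoverSources(demoMap).forEach((f) => console.log(f.path + ":\n" + f.content));
```

If `sourcesContent` is omitted, the map still exposes file paths and the mappings back to them, just not the full text.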

The npm leak was the second in a week. Five days earlier, Fortune had reported that Anthropic accidentally made nearly 3,000 files publicly accessible, including a draft blog post about an internal model they call "Mythos" and "Capybara."

Two leaks. Five days. One company that has spent years telling the world it's the most safety-conscious lab in AI.

The Part That Actually Bothers Me

According to a technical breakdown by software engineer Gabriel Anhaia, a single correctly configured .npmignore file — or a correct files field in package.json — would have been enough to prevent all of this. This isn't some obscure edge case. It's the first thing covered in every npm release tutorial I've ever read.
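To make that concrete, here's a sketch of a pre-publish check. It assumes the JSON that `npm pack --dry-run --json` prints on npm 7 and later (an array of package entries, each with a `files` list); verify the shape against your own npm version before relying on it. The deny-list patterns and the `findLeakyFiles` name are my own illustration, not anything from Anhaia's breakdown.

```typescript
// The parts of `npm pack --dry-run --json` output used here.
// Assumption: npm 7+ prints a JSON array with one entry per package,
// each listing the files that would go into the tarball.
interface PackEntry {
  name: string;
  files: { path: string }[];
}

// Illustrative deny-list, not an exhaustive one.
const FORBIDDEN: RegExp[] = [
  /\.map$/,            // source maps, which can embed original sources
  /(^|\/)\.env($|\.)/, // .env, .env.local, and similar
];

// Returns every path that would be published and matches a forbidden pattern.
function findLeakyFiles(packJson: string): string[] {
  const entries = JSON.parse(packJson) as PackEntry[];
  const leaky: string[] = [];
  for (const entry of entries) {
    for (const file of entry.files) {
      if (FORBIDDEN.some((re) => re.test(file.path))) {
        leaky.push(file.path);
      }
    }
  }
  return leaky;
}

// Usage with a hand-written sample of the JSON shape assumed above.
const demoPack = JSON.stringify([
  { name: "demo-cli", files: [{ path: "package.json" }, { path: "dist/cli.js" }, { path: "dist/cli.js.map" }] },
]);
console.log(findLeakyFiles(demoPack)); // logs the one offending path, dist/cli.js.map
```

In a release pipeline, you would feed this the real stdout of `npm pack --dry-run --json` and fail the build on any hit. It complements a `files` allowlist rather than replacing it: an allowlist like `"files": ["dist"]` still publishes any `.map` file that lands in dist.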

Anthropic's official response: "This was a release packaging issue caused by human error, not a security breach."

Technically accurate. But "human error" sounds like a company that has already moved on before it fully understood what happened.

What was inside the leak isn't trivial either. There were dozens of feature flags for capabilities that are fully built but haven't shipped — including something internally called "KAIROS": an autonomous daemon mode that lets Claude Code operate as an always-on background agent. There's a process inside it called "autoDream" that consolidates memory while the user is idle. Anthropic never intended to publish any of that. Every competitor has it now.

The Timing Nobody Is Talking About

In the early hours of March 31, simultaneous with the source code leak, there was a supply chain attack on the axios npm package. Axios is a core dependency of Claude Code. Anyone who installed or updated Claude Code between 00:21 and 03:29 UTC may have pulled a compromised axios version carrying a remote access trojan.
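If you installed or updated around that time and want a quick first check, the comparison is easy once everything is in UTC. This sketch assumes you already have an install timestamp from somewhere (CI logs, shell history); how you get it is up to you. It hard-codes only the window stated above. A hit means "go inspect the installed axios version", not proof of compromise, and a miss may not rule out a cached copy installed later.

```typescript
// Reported exposure window for the compromised axios release, in UTC.
// Treating 03:29 as running through 03:29:59.999 is an assumption.
const WINDOW_START_MS = Date.UTC(2026, 2, 31, 0, 21, 0, 0); // month index 2 = March
const WINDOW_END_MS = Date.UTC(2026, 2, 31, 3, 29, 59, 999);

// True if the install timestamp falls inside the window (both ends inclusive).
function installedDuringWindow(installedAt: Date): boolean {
  const t = installedAt.getTime();
  return t >= WINDOW_START_MS && t <= WINDOW_END_MS;
}

console.log(installedDuringWindow(new Date("2026-03-31T01:05:00Z"))); // true
console.log(installedDuringWindow(new Date("2026-03-31T03:45:00Z"))); // false
```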

I'm not claiming these are connected. Coincidences happen. But people should know about it.

What's happened since is less ambiguous: the leak is being actively used as a social engineering lure to distribute malicious payloads via GitHub, and there's typosquatting on internal npm package names — traps set for developers trying to compile the leaked Claude Code source themselves. The original mistake was human. What's being built on top of it isn't.
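Typosquatting works because a name one or two keystrokes away from a real one looks right at a glance. A common defensive heuristic is to compare each dependency against names you expect and flag anything within a small edit distance that isn't an exact match. Here's a sketch using Levenshtein distance with a threshold of 2; the package names are made up and are not the actual internal names from the leak. Two caveats: legitimate packages can sit within that distance of each other, so treat hits as "look closer", and if an attacker registers the exact internal name, edit distance won't help at all. That case needs the scope pinned to a private registry.

```typescript
// Classic dynamic-programming Levenshtein distance
// (insert, delete, substitute, each costing 1).
function levenshtein(a: string, b: string): number {
  let prev: number[] = [];
  for (let j = 0; j <= b.length; j++) prev.push(j);
  for (let i = 1; i <= a.length; i++) {
    const curr: number[] = [i];
    for (let j = 1; j <= b.length; j++) {
      const cost = a[i - 1] === b[j - 1] ? 0 : 1;
      curr.push(Math.min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost));
    }
    prev = curr;
  }
  return prev[b.length];
}

// Flags dependencies that are near-misses of a known name but not identical to one.
function findSuspects(deps: string[], known: string[], maxDistance = 2): string[] {
  return deps.filter((dep) =>
    known.indexOf(dep) === -1 &&
    known.some((k) => levenshtein(dep, k) <= maxDistance)
  );
}

const knownNames = ["@example/internal-tools", "@example/agent-core"]; // hypothetical names
const installed = ["@example/agent-core", "@example/agent-c0re", "left-pad"];
console.log(findSuspects(installed, knownNames)); // logs the single suspect "@example/agent-c0re"
```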

What This Means for Anthropic

Claude Code is running at an annualized revenue of over $2.5 billion, with enterprise as the dominant channel. These aren't forgiving hobbyist users — these are CTOs with long procurement checklists.

Anthropic has built its entire positioning on a single promise: we are the adults in the room. We take safety seriously while everyone else chases market share. That was never just marketing. It's why regulators take their calls, why certain talent chooses them, why enterprise deals close.

And then this happens. Twice. In five days.

Anthropic will survive this. The products are good, and enterprise buyers have short memories when the tool keeps working. But every CTO in the middle of a procurement decision now has a new question on their list, and Anthropic doesn't have a good answer for it yet.

The next safety promise is going to cost a little more to sell than the last one. That's not a dramatic take. It's just what happens when you stumble twice in a week and your public response amounts to: "Yeah, our mistake, moving on."

I'm not writing this with any satisfaction. I use their products. I want them to be good. But honest criticism shouldn't have to feel like an attack to be worth saying.

Cursor 3 Rewrites the Rules — Not Everyone Is Convinced

Sun, 05 Apr 2026 00:00:00 GMT

On April 2, Anysphere shipped Cursor 3 under the internal codename "Glass" — and simultaneously declared the previous version of their product obsolete. That takes confidence. Jonas Nelle, Cursor's head of engineering, told WIRED: "In the last few months, our profession has completely changed. A lot of the product that got Cursor here is not as important going forward anymore."

He's not wrong. The question is whether what replaces it is ready.

What Actually Changed

Cursor 3 wasn't updated — it was rebuilt. The traditional VS Code editor remains available, but the default experience is now the Agents Window: a full-screen workspace where you run multiple AI agents in parallel across local machines, cloud sandboxes, SSH environments, and mobile. You dispatch tasks, close your laptop, and come back to a PR with screenshots and a video recording. More than a third of Cursor's own internal PRs are already authored by agents running in cloud sandboxes.

Design Mode lets frontend developers select UI elements directly in the browser and describe changes in natural language. /worktree isolates experiments in git worktrees. /best-of-n runs the same prompt against multiple models simultaneously. The cloud agent infrastructure supports up to 50 workers per team, with Kubernetes and fleet management APIs, and self-hosted enterprise deployment that keeps code, builds, and secrets entirely on-premise.

The vision is coherent. Developers as architects, agents as builders. I've been using Cursor since before most people had heard of it, and I won't pretend this isn't impressive engineering.

The Model Powering It — and the Transparency Problem

The engine is Composer 2, launched March 19. The benchmarks are strong: 61.3 on CursorBench-3 (up 39% from Composer 1.5), 73.7 on SWE-bench Multilingual, 61.7 on Terminal-Bench 2.0. Pricing undercuts competitors significantly — $0.50 per million input tokens versus Claude Opus 4.6's roughly 10x higher rate. 200,000-token context window. A real-time reinforcement learning pipeline that ships improved model checkpoints every five hours based on actual user interactions.
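Per call, the gap looks small. The sketch below just does the arithmetic for one full 200,000-token context at $0.50 per million input tokens versus a rate 10x higher. The 10x rate is an approximation taken from the claim above, not an official price sheet, and output tokens and caching discounts are ignored.

```typescript
// Prices in dollars per million input tokens.
const COMPOSER_2_INPUT = 0.5;                   // from the text
const COMPARISON_INPUT = COMPOSER_2_INPUT * 10; // "roughly 10x", an approximation
const FULL_CONTEXT_TOKENS = 200000;             // Composer 2's stated context window

// Dollar cost of sending `tokens` input tokens at `pricePerMillion`.
function inputCost(tokens: number, pricePerMillion: number): number {
  return (tokens / 1e6) * pricePerMillion;
}

console.log(inputCost(FULL_CONTEXT_TOKENS, COMPOSER_2_INPUT)); // 0.1
console.log(inputCost(FULL_CONTEXT_TOKENS, COMPARISON_INPUT)); // 1
```

For long agent sessions, the number of full-context calls is what multiplies these per-call figures.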

Within 24 hours of launch, a developer found the internal model ID: kimi-k2p5-rl-0317-s515-fast. Composer 2 is built on Moonshot AI's Kimi K2.5, a one-trillion-parameter mixture-of-experts model. Moonshot's head of pretraining confirmed the tokenizer was "completely identical." Cursor co-founder Aman Sanger called the omission a "miss." VP Lee Robinson acknowledged the base but claimed only about a quarter of compute came from Kimi K2.5, with the rest from Cursor's own training.

Here's what bothers me about this: Cursor is a $50 billion company used by over half the Fortune 500. Kimi K2.5's modified MIT license explicitly requires prominent display of the Kimi name for products exceeding $20 million in monthly revenue. Cursor surpasses that threshold by roughly 8x. This wasn't a startup cutting a corner in a garage. This was a deliberate product decision by a company that knows exactly how big it is.

"Developer trust, once cracked, does not heal on the same timeline as a product roadmap." I didn't write that line — an analyst did. But I'm keeping it because it's exactly right.

The Pricing Conversation Nobody Wants to Have

Six tiers. Free. Pro at $20/month. Pro+ at $60/month. Ultra at $200/month — 20x usage, effectively $4,000 worth of API capacity. Teams at $40/user/month. Enterprise at custom pricing.

Reports have surfaced of developers spending up to $2,000 in two days on agent-intensive workflows. I believe it. Several prominent developers told WIRED they shifted to Claude Code or Codex because of more generous usage limits on those subsidized subscriptions. Menlo Ventures data reportedly puts Claude Code at 54% of the AI coding market.

Let's be direct: the Ultra tier at $200/month exists for developers who need agents running continuously. If you're using Cursor 3 as Anysphere intends (multiple parallel agents, cloud sandboxes, overnight autonomous tasks), you're on Ultra. That's $2,400 a year, before your other subscriptions, before anything else.

That's not inherently wrong. But it means the "agent-first development" vision has a price floor that excludes a significant portion of the developers Cursor is claiming to liberate.

Where It Actually Breaks

I'll say this plainly: autonomous agents in complex codebases are not ready to be trusted without rigorous review, and Cursor 3 does not change that.

Cursor handles Rust lifetime annotations correctly about 80% of the time. A 20% failure rate on Rust's core safety mechanism is not a minor footnote. An engineering team at ilert had to add explicit rule files forbidding AI-generated patterns like holding std::sync::Mutex across .await points — because the agent kept introducing them. Academic research shows ownership and borrowing violations account for over 40% of compilation errors in AI-generated Rust code.

It gets worse at scale. A CodeRabbit analysis of 470 open-source PRs found AI-authored code contained 1.7x more bugs than human-written code, including higher rates of critical issues. An engineer using Claude Code — Cursor's closest competitor — had an agent destroy a live production environment: network, services, and a database with years of data. Amazon's own internal documents cited "Gen-AI assisted changes" as a contributing factor in incidents including a December AWS outage. Apiiro found developers using AI introduced roughly 10x more security issues than those who did not.

OWASP published its first Top 10 for Agentic Applications in 2026. A December 2025 audit found 30+ vulnerabilities across all major AI IDEs — Cursor, Windsurf, GitHub Copilot, and others — with 24 assigned CVEs. Autonomous agent features can be turned into data exfiltration and remote code execution vectors.

Stack Overflow's 2025 Developer Survey: 29% of developers trust AI outputs to be accurate. Down 11 points from 2024. The top Hacker News reaction to Cursor 3's launch: "I wish they'd keep the old philosophy of letting the developer drive and the agent assist… I still want to code, not vibe my way through tickets."

That reaction isn't nostalgia. It's signal.

April 2 Was a Bad Day to Launch Alone

Cursor 3 shipped on the same day GitHub dropped its Copilot SDK into public preview, Google released Gemma 4 under Apache 2.0 (the first time the Gemma family has been fully open for commercial use), and Google's Antigravity IDE continued its free public preview. Antigravity, built on a VS Code fork following Google's $2.4 billion acquisition of Windsurf's talent, scores 76.2% on SWE-bench Verified, among the highest published results for any coding agent currently available.

That's a crowded week. And Google's Antigravity is free.

The Honest Verdict

Cursor 3 is the most technically sophisticated agent-first IDE available today. The Composer 2 benchmarks are real. The real-time RL pipeline is genuinely novel — a model that improves every five hours based on actual production usage is a different kind of product than anything that existed two years ago. The cloud agent infrastructure and enterprise self-hosting are serious engineering.

I use Cursor. I'll keep using it. The Tab completions alone justify the Pro subscription for daily work.

But the "agent-first" era will arrive more slowly and unevenly than this launch suggests. The transparency stumble with Kimi K2.5 was avoidable and unnecessary. The $200/month price floor for agent-first work is real. The trust deficit (71% of developers don't trust AI output to be accurate) didn't appear from nowhere. It was earned.

The developers most likely to benefit from Cursor 3 are those working in well-typed, well-tested codebases with clear architectural boundaries. Everyone else should use it as a powerful assistant, not an autonomous colleague.

The question isn't whether Cursor 3 is impressive. It is. The question is whether "you are the architect, agents are the builders" describes where we actually are — or where Anysphere needs us to believe we are to justify a $50 billion valuation.

Those are different questions with different answers.