Simon Willison admits his line between ‘vibe coding’ and ‘agentic engineering’ is collapsing
Simon Willison spent the early days of agentic AI carefully drawing a line. Vibe coding was acceptable for personal tools where bugs hurt nobody. Agentic engineering — meaning code shipped to other people, in production — required professional rigor, security awareness, and human review. The two were not the same activity.
In a recent essay titled “Vibe coding and agentic engineering are getting closer than I’d like,” Willison admits the line is collapsing in his own practice. As Claude Code has become more reliable, he finds himself skipping code review for production systems. He writes: “if I haven’t reviewed the code, is it really responsible for me to use this in production?”
He answers the question by continuing to do it anyway, which is the most honest part of the essay.
The accountability gap, named
Willison’s specific framing: Claude Code does not have a professional reputation! It can’t take accountability for what it’s done. Human engineering teams build reputations and face consequences for poor work. The agent doesn’t. The agent generates code; the human merges the PR; the human’s name is on the deploy. Yet the human is increasingly treating the agent’s output the way they’d treat a colleague’s, without the social and professional structures that make that trust earned.
This is the normalization-of-deviance pattern from the incident analysis literature: each skipped step that goes unpunished makes skipping the next one feel safer, until the system fails in a way that wipes out all the trust banked along the way.
Why this Willison post matters more than most
Willison is not an AI critic. He’s an enthusiastic, deeply technical practitioner who has been bullish on these tools since well before they were good. When he admits the line is moving, he’s not concern-trolling. He’s flagging an empirical pattern he’s observing in himself. That makes the post unusually trustworthy as a signal.
The pattern he names (agent output normalized to colleague-level trust, despite no accountability structure) applies to every developer using Claude Code, Cursor, Codex, or any other agentic tool in production. It’s not solved by “you should review more code.” It’s solved (if at all) by either rebuilding the accountability layer or restructuring the trust expectation. Neither move is on any vendor’s roadmap.
The actionable read for an indie founder
The gap Willison names is your liability surface if you ship code where bugs cost real users real money. The vendor cannot be held accountable for what their agent wrote into your codebase. You can. That asymmetry exists regardless of how good the code looks at PR review time. The only durable defense is to keep a categorical line in your own workflow: agent-generated code touching X (payments, auth, data export) gets human review every time, no exceptions. Agent-generated code touching Y (internal tooling, scripts, docs) doesn’t need that gate.
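One way to keep that line “in writing” is to encode it as a small check that runs against the list of files a change touches. The sketch below is illustrative only: the directory patterns, the names REVIEW_REQUIRED and NO_GATE, and the function needs_human_review are all hypothetical and do not come from Willison’s essay. It also makes one assumption the categories above leave open: any path that matches neither list is treated as gated.

```python
from fnmatch import fnmatchcase

# Hypothetical path patterns; adjust these to your own repository layout.
REVIEW_REQUIRED = ["payments/*", "auth/*", "export/*"]   # category X
NO_GATE = ["tools/*", "scripts/*", "docs/*"]             # category Y


def needs_human_review(changed_paths):
    """Return the changed paths that must get human review before merge.

    Assumption: a path that matches neither list is gated, so a new
    directory falls into the stricter category by default.
    """
    gated = []
    for path in changed_paths:
        in_x = any(fnmatchcase(path, pattern) for pattern in REVIEW_REQUIRED)
        in_y = any(fnmatchcase(path, pattern) for pattern in NO_GATE)
        # Category X always wins; category Y is exempt only when X does not match.
        if in_x or not in_y:
            gated.append(path)
    return gated


if __name__ == "__main__":
    changed = ["payments/stripe/webhook.py", "docs/readme.md", "src/app.py"]
    for path in needs_human_review(changed):
        print(f"human review required: {path}")
```

Defaulting unlisted paths to the gated side is a deliberate choice here: when convenience pushes on the boundary, the rule fails toward more review instead of less. A check like this could run in CI or a pre-merge hook, but how it is wired in depends on your setup.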
Willison’s essay reads as confessional but it’s actually a recommendation: maintain the line in writing, with categories you commit to, before the convenience erodes it for you.