Karpathy's 'Software 3.0' Framework: The Difference Between Vibe Coding and Agentic Engineering
By EndOfCoding
Andrej Karpathy, the researcher who coined the term 'vibe coding' in February 2025, has updated his mental model. In a widely discussed talk and essay in May 2026, Karpathy introduced the 'Software 3.0' framework, which draws a sharp conceptual line between two modes of AI-assisted development that are often conflated: vibe coding and agentic engineering.
The distinction matters in practice. Vibe coding, the mode most developers have adopted, is interactive and conversational: you describe what you want, the AI generates it, and you review and iterate. Agentic engineering is different in kind: you specify outcomes with formal exit criteria, deploy autonomous agents to accomplish them, and govern the process rather than producing the code yourself. Karpathy's argument is that Software 3.0 encompasses both, but that conflating them leads to mismatched expectations, wrong tooling choices, and predictable failures.
This post breaks down the Software 3.0 framework, what the vibe coding vs. agentic engineering distinction means for your workflow, and how to decide which mode to use for which tasks.
What You'll Learn
By the end of this post, you'll understand:
- The Software 3.0 framework Karpathy introduced in May 2026 and how it extends his original vibe coding concept
- The precise distinction between vibe coding (interactive, conversational) and agentic engineering (autonomous, outcome-specified)
- How to identify which mode is appropriate for a given task, and what the failure modes are when you choose wrong
- What 'agentic engineering' skills look like in practice and how they differ from prompt engineering
- What Karpathy's framework predicts about where AI-assisted development is heading in the next 12-24 months
The Software 3.0 Framework
Karpathy's Software 3.0 framework positions AI-native development as the third paradigm of software engineering:
Karpathy's Software Paradigm History:
Software 1.0 (Classical):
├── Humans write explicit rules in formal programming languages
├── Program behavior is fully determined by the code humans write
├── Debugging means reading code and reasoning about control flow
└── Limits: Only works when humans can fully specify the rules
Software 2.0 (Neural Networks / ML):
├── Humans design architectures and training processes
├── Program behavior emerges from optimization over data
├── 'Code' is weights, not source files — not human-readable
└── Limits: Requires large labeled datasets; black-box behavior
Software 3.0 (AI-Native / LLM-Driven):
├── Humans describe intent in natural language
├── LLMs translate intent into executable software artifacts
├── Development starts from stated intent rather than hand-written source code
└── Encompasses two distinct modes: vibe coding and agentic engineering
The key insight in Software 3.0 is that natural language is now a programming interface. The skill is no longer (only) writing code in formal languages — it's specifying intent clearly enough that LLMs can translate it correctly.
Vibe Coding vs. Agentic Engineering: The Distinction
Karpathy's framework makes a distinction that's subtle but has large practical consequences:
Vibe Coding (Software 3.0, Mode 1):
├── Mode: Interactive and conversational
├── Loop: Human describes → AI generates → Human reviews → iterate
├── Control: Human maintains moment-to-moment control of the process
├── Output: Code that the human validates at each step
├── Timeframe: Fast iterations, minutes to hours per task
├── Error recovery: Human catches and corrects errors in the review step
├── Primary skill: Prompt engineering — communicating intent precisely
└── Best for: Well-scoped tasks with clear, human-verifiable outputs
Examples of vibe coding:
- 'Build me a login form with email/password validation'
- 'Refactor this function to use async/await'
- 'Generate unit tests for this service class'
- 'Debug why this API endpoint returns 500 on null input'
---
Agentic Engineering (Software 3.0, Mode 2):
├── Mode: Autonomous and outcome-specified
├── Loop: Human specifies exit criteria → Agent executes autonomously →
│ Human governs and evaluates outcomes
├── Control: Human sets constraints and evaluates results, not individual steps
├── Output: Deliverables the human evaluates against exit criteria
├── Timeframe: Long-running (hours to days for complex tasks)
├── Error recovery: Agent self-corrects within constraints; escalates on blockers
├── Primary skill: Task specification — defining clear exit criteria and constraints
└── Best for: Complex, multi-step tasks where delegating the process is valuable
Examples of agentic engineering:
- 'Migrate the authentication system from JWT to session-based auth.
Exit criteria: all existing tests pass, no regressions in auth flows,
documentation updated, no hardcoded secrets in new code.'
- 'Audit the entire codebase for SQL injection vulnerabilities and
generate a prioritized remediation plan. Exit criteria: every
database query reviewed, OWASP SQL injection checklist verified,
findings documented with severity and fix recommendations.'
- 'Build and deploy the new pricing page matching the design spec.
Exit criteria: design matches Figma at ±2px, all pricing tiers
correctly displayed, Paddle integration working in sandbox.'
The critical difference: in vibe coding, the human is in the loop at each generation step. In agentic engineering, the human is governing the process — defining success and reviewing outcomes — while the agent handles the execution steps autonomously.
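The difference in where the human sits can be sketched as two control loops. This is a minimal illustration, not code from Karpathy's talk: the function names `vibe_loop` and `agentic_loop`, the callback parameters, and the `max_steps` budget are all assumptions made for this example, and the "AI" is a stand-in callable rather than a real model call.

```python
from typing import Callable, List, Tuple


def vibe_loop(prompts: List[str],
              generate: Callable[[str], str],
              human_approves: Callable[[str], bool]) -> List[str]:
    """Mode 1: a human reviews every generation before it is kept."""
    accepted = []
    for prompt in prompts:
        draft = generate(prompt)
        if human_approves(draft):  # human in the loop at each step
            accepted.append(draft)
    return accepted


def agentic_loop(goal: str,
                 step: Callable[[str, int], str],
                 exit_criteria_met: Callable[[str], bool],
                 max_steps: int = 10) -> Tuple[str, bool]:
    """Mode 2: the agent iterates alone until the exit criteria pass
    or the step budget (a hypothetical constraint) runs out."""
    result = ""
    for attempt in range(1, max_steps + 1):
        result = step(goal, attempt)
        if exit_criteria_met(result):  # checked by code, not by a human
            return result, True
    return result, False  # budget exhausted: hand back to the human


if __name__ == "__main__":
    # Stand-in callables; a real setup would call an LLM or agent here.
    print(vibe_loop(["login form"], lambda p: f"code for: {p}", lambda d: True))
    print(agentic_loop("migrate auth",
                       lambda goal, n: f"{goal} (attempt {n})",
                       lambda r: "attempt 3" in r))
```

In the first loop the human's judgment is called once per generation; in the second it appears only in how `exit_criteria_met` was written and in what happens when the loop returns.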
Why the Distinction Matters: Failure Modes When You Get It Wrong
Confusing the two modes leads to predictable failures in both directions:
Failure Mode 1: Treating agentic tasks as vibe coding
Pattern: You try to accomplish a complex, multi-step task interactively,
stepping through each piece conversationally.
What breaks:
├── Context fragmentation — long interactive sessions lose context from
│ early steps, causing inconsistency across the codebase
├── No systematic verification: without exit criteria, neither you nor the
│ AI has a shared definition of 'done' to check against
├── Scope creep — interactive sessions drift as new sub-problems appear
└── Result: Slow, inconsistent, hard-to-validate output
Example: Trying to migrate an entire auth system through a conversational
session — 50+ turns in, the AI has forgotten the constraints from turn 1.
Better approach: Write a spec with exit criteria, deploy an agentic session.
---
Failure Mode 2: Treating vibe coding tasks as agentic engineering
Pattern: You write elaborate exit criteria and agent specs for tasks that
were well-suited for a quick interactive conversation.
What breaks:
├── Over-engineering — spending 30 minutes writing a spec for a 5-minute task
├── Reduced control — you lose the moment-to-moment human judgment that
│ catches issues in real time
├── Autonomy mismatch — the agent makes decisions you'd have made differently
│ if you'd been in the loop
└── Result: Unnecessary complexity, slower iteration for simple tasks
Example: Writing a full agentic spec to add a delete button to a UI — when a
30-second vibe coding prompt would have been the right tool.
The Agentic Engineering Skill Set
The skills Karpathy identifies as central to agentic engineering are different from — and in some ways harder than — traditional prompt engineering:
Agentic Engineering Core Skills:
1. Exit Criteria Specification
Not: 'Make the tests pass'
But: 'All unit tests pass. No regressions in integration tests.
No functions with cyclomatic complexity > 10 introduced.
No new dependencies without justification in commit message.'
The exit criteria must be:
├── Verifiable (the agent can check them programmatically)
├── Complete (covers all dimensions of 'done')
└── Unambiguous (no room for plausible misinterpretation)
2. Constraint Design
├── What the agent may NOT touch (blast radius control)
├── What tools and resources the agent may use
├── What to do when blocked (escalate vs. attempt workaround)
└── Checkpointing strategy for long-running tasks
3. Outcome Evaluation
├── Reviewing agent deliverables against exit criteria
├── Identifying what the agent did correctly vs. incorrectly
├── Distinguishing 'agent error' from 'spec error'
└── Providing feedback that improves future agent specs
4. Process Governance
└── Monitoring agent progress without micromanaging
(the 'manager, not reviewer of every line' posture)
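To make 'verifiable' and 'blast radius control' concrete, here is a small sketch of exit criteria expressed as yes/no checks, plus a check for files the agent was told not to touch. The names `ExitCriterion`, `evaluate`, and `blast_radius_violations` are hypothetical, and the lambdas stand in for real checks such as running a test suite or a complexity linter.

```python
from dataclasses import dataclass
from fnmatch import fnmatch
from typing import Callable, Dict, List


@dataclass
class ExitCriterion:
    name: str
    check: Callable[[], bool]  # must answer a plain yes/no


def evaluate(criteria: List[ExitCriterion]) -> Dict[str, bool]:
    """Run every check and report each result separately,
    so a failure points at one specific criterion."""
    return {c.name: bool(c.check()) for c in criteria}


def blast_radius_violations(changed_files: List[str],
                            forbidden_globs: List[str]) -> List[str]:
    """Return the changed files that match a path the agent may NOT touch."""
    return [f for f in changed_files
            if any(fnmatch(f, g) for g in forbidden_globs)]


if __name__ == "__main__":
    report = evaluate([
        ExitCriterion("all unit tests pass", lambda: True),          # stand-in
        ExitCriterion("no new dependencies added", lambda: False),   # stand-in
    ])
    print(report)
    print("done:", all(report.values()))
    print(blast_radius_violations(["src/app.py", "infra/prod.tf"], ["infra/*"]))
```

Keeping one boolean per criterion, rather than a single pass/fail, also helps with outcome evaluation: it is easier to tell an agent error from a spec error when you can see which criterion failed.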
Choosing the Right Mode for a Given Task
A practical decision framework for choosing between vibe coding and agentic engineering:
Choose VIBE CODING when:
├── Task scope: Clear and contained (< 5 files, < 2 hours)
├── Verifiability: You can verify correctness at each generation step
├── Iteration: Fast back-and-forth improves the result
├── Human judgment: The task benefits from your real-time decisions
└── Examples: Implementing a feature, writing tests, debugging a specific bug
Choose AGENTIC ENGINEERING when:
├── Task scope: Complex and multi-step (5+ files, multi-hour execution)
├── Parallelization: The task can be decomposed into parallelizable sub-tasks
├── Exit criteria: 'Done' can be specified formally and verified systematically
├── Autonomy: The execution steps are lower-judgment than the outcome evaluation
└── Examples: Large refactors, codebase audits, migration projects, build pipelines
Red flags for each:
├── Vibe coding red flag: You're 20+ turns in and the session feels unfocused
│ → Stop and write an agentic spec instead
└── Agentic engineering red flag: You keep intervening in the agent's execution
→ The task needed more human judgment than agentic autonomy; switch modes
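The decision framework above can be written down as a small function. This is one possible encoding, not a rule from Karpathy: the `TaskProfile` fields, the choice to treat a task as large when it crosses either the 5-file or the 2-hour threshold, and the `vibe_session_red_flag` helper are assumptions made for illustration.

```python
from dataclasses import dataclass
from enum import Enum


class Mode(Enum):
    VIBE_CODING = "vibe coding"
    AGENTIC_ENGINEERING = "agentic engineering"


@dataclass
class TaskProfile:
    files_touched: int
    estimated_hours: float
    done_is_formally_verifiable: bool  # can 'done' be checked systematically?
    needs_realtime_judgment: bool      # does the task benefit from live decisions?


def choose_mode(task: TaskProfile) -> Mode:
    # Assumption: crossing either threshold counts as "complex and multi-step".
    large = task.files_touched >= 5 or task.estimated_hours >= 2
    if large and task.done_is_formally_verifiable and not task.needs_realtime_judgment:
        return Mode.AGENTIC_ENGINEERING
    return Mode.VIBE_CODING


def vibe_session_red_flag(turns: int, feels_unfocused: bool) -> bool:
    """Red flag from the list above: 20+ turns in and the session has drifted."""
    return turns >= 20 and feels_unfocused


if __name__ == "__main__":
    print(choose_mode(TaskProfile(1, 0.1, True, True)))     # e.g. add a delete button
    print(choose_mode(TaskProfile(40, 8.0, True, False)))   # e.g. an auth migration
```

The thresholds are starting points. A large task whose 'done' cannot be specified formally still falls back to vibe coding here, which matches the idea that agentic delegation depends on exit criteria you can verify.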
What Software 3.0 Predicts About Where Development Is Heading
Karpathy's framework carries predictions about the next phase of AI-assisted development:
Karpathy's Software 3.0 predictions (May 2026):
├── The 'developer' role bifurcates:
│ → Vibe coders: Rapid interactive implementation, high human-in-loop
│ → Agentic engineers: System specification, agent orchestration, outcome evaluation
│ → Most developers will do both; the ratio shifts by seniority and task type
│
├── 'Programming' increasingly means specifying outcomes, not writing code:
│ → The most valuable skill becomes clarity of specification, not typing speed
│ → Formal methods and systems thinking become more valued, not less
│
├── Context management becomes a core engineering discipline:
│ → Codebase context files (CLAUDE.md, .cursorrules, agent specs) become
│ first-class engineering artifacts maintained with the same rigor as tests
│
├── Quality gates shift from code review to exit criteria verification:
│ → PR review of AI-generated code is insufficient — the spec that generated
│ the PR is where quality is actually determined
│
└── The ratio of agentic to interactive work increases over 12-24 months:
→ As agents become more capable and reliable, the appropriate scope of
agentic delegation grows; interactive vibe coding handles the long tail
Common Challenges
'How do I know when my exit criteria are good enough to run an agentic task?' A useful test: if you read the exit criteria cold tomorrow, would you know unambiguously whether each criterion is satisfied? If there's any ambiguity, the agent will fill it with its own judgment, which may not match yours. Add specificity until each criterion has a clear yes/no check.
'Do I need special tools for agentic engineering vs. vibe coding?' The same tools can operate in both modes: Claude Code, Cursor, and Aider all support both interactive sessions and autonomous agentic runs. What differs is how you invoke them and what context you provide. For agentic runs, you provide a formal spec file as context; for vibe coding, you interact conversationally.
'Isn't agentic engineering just project management for AI agents?' Yes, in some sense, and that's Karpathy's point. The skill of clear outcome specification, constraint design, and outcome evaluation is more similar to technical project management than to traditional coding. This is a significant identity shift for developers who define themselves by their ability to write code.
'What if the agent hits a blocker mid-task?' This is a key design decision in your agent spec: explicitly define what the agent should do when blocked. Options: escalate to human (most common for ambiguous situations), attempt an alternative approach with defined constraints, or checkpoint and stop with a status report. Leaving this undefined is the most common agentic engineering mistake.
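The three blocker options can be made explicit in a spec rather than left to the agent's discretion. The sketch below is illustrative only: `BlockerPolicy`, `BlockerDecision`, and `handle_blocker` are hypothetical names, and a real agent harness would wire the returned action into its own control flow.

```python
from dataclasses import dataclass
from enum import Enum


class BlockerPolicy(Enum):
    ESCALATE = "escalate"
    TRY_ALTERNATIVE = "try_alternative"
    CHECKPOINT_AND_STOP = "checkpoint_and_stop"


@dataclass
class BlockerDecision:
    action: str   # "retry", "stop", or "escalate"
    message: str


def handle_blocker(policy: BlockerPolicy, blocker: str,
                   alternatives_left: int = 0) -> BlockerDecision:
    """Map the policy written in the spec to a concrete next action."""
    if policy is BlockerPolicy.TRY_ALTERNATIVE and alternatives_left > 0:
        return BlockerDecision("retry", f"Trying another approach for: {blocker}")
    if policy is BlockerPolicy.CHECKPOINT_AND_STOP:
        return BlockerDecision("stop", f"Checkpointed; status report: blocked on {blocker}")
    # ESCALATE, or TRY_ALTERNATIVE with no attempts left, goes to a human.
    return BlockerDecision("escalate", f"Needs human input: {blocker}")


if __name__ == "__main__":
    print(handle_blocker(BlockerPolicy.TRY_ALTERNATIVE, "missing API key", alternatives_left=1))
    print(handle_blocker(BlockerPolicy.TRY_ALTERNATIVE, "missing API key", alternatives_left=0))
```

Note the fallback: when the allowed workarounds run out, this sketch escalates rather than letting the agent improvise, which is one way to keep the behavior defined in every case.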
Advanced Tips
Write agent specs as if you're handing the task to a junior engineer who takes everything literally. The ambiguity that a senior engineer would resolve with judgment becomes an agent error. Every assumption you leave implicit becomes a potential failure mode. This is uncomfortable for developers used to the flexibility of vibe coding, but the discipline produces dramatically better agentic outcomes.
Use the failed spec as a feedback artifact. When an agentic run produces output that doesn't match your intent, the failure is almost always in the spec, not the agent. Before re-running, update the spec with the constraint that would have caught the error; the spec becomes more complete with each iteration.
Build a spec library for recurring task types. Agentic specs for common task categories (auth migrations, API integrations, test suite generation, security audits) are largely reusable. A well-written spec for 'migrate from library A to library B' generalizes across projects with minor modifications. This library is one of the highest-leverage productivity artifacts a vibe coding team can build.
The Vibe Coding Academy Advanced Track covers agentic engineering skills in depth: Module 11 (Multi-Agent Development) and Module 14 (Scaling AI-Built Products) are the most directly relevant to the Software 3.0 framework Karpathy describes. The Vibe Coding Ebook Chapter 6 (The Agent Revolution) and Chapter 16 (What Comes Next) have been updated to incorporate the Software 3.0 framework as of this month's subscriber update.
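As a sketch of the spec-library tip in this section, a reusable 'migrate from library A to library B' spec can be a parameterized template. The template text, the placeholder names, and the `render_spec` helper are all made up for illustration; only `string.Template` from the Python standard library is real.

```python
from string import Template

# Hypothetical reusable spec; the wording and placeholders are illustrative.
MIGRATION_SPEC = Template(
    "Task: migrate from $old_lib to $new_lib.\n"
    "Exit criteria:\n"
    "- All existing tests pass\n"
    "- No remaining imports of $old_lib\n"
    "Constraints:\n"
    "- Do not modify files under $protected_path\n"
    "- If blocked: $on_blocked\n"
)


def render_spec(template: Template, **params: str) -> str:
    # substitute() raises KeyError if a placeholder is left unfilled,
    # so an incomplete spec fails loudly instead of reaching the agent.
    return template.substitute(**params)


if __name__ == "__main__":
    print(render_spec(
        MIGRATION_SPEC,
        old_lib="requests",
        new_lib="httpx",
        protected_path="infra/",
        on_blocked="escalate to a human",
    ))
```

Using `substitute` rather than `safe_substitute` is deliberate here: a missing parameter is exactly the kind of implicit assumption the first tip warns about.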
Conclusion
Karpathy's Software 3.0 framework is the clearest conceptual map yet of where AI-assisted development is and where it's heading. The vibe coding vs. agentic engineering distinction isn't a hierarchy; neither mode is better. It's a tool selection problem. The developers who will produce the highest-quality output fastest are those who recognize which mode fits a given task and shift fluidly between them.
Right now, most developers with AI coding experience are primarily vibe coders, comfortable with interactive, conversational AI assistance. The agentic engineering skill set (exit criteria specification, constraint design, outcome evaluation) is less common and increasingly valuable as agents become capable of longer autonomous runs. Building that skill set now, before it becomes table stakes, is the highest-ROI investment in your AI-assisted development practice.
The Vibe Coding Academy curriculum is built around both modes, teaching you to recognize the right tool for each task and develop proficiency in both. Follow the latest developments in agentic engineering at EndOfCoding.