Spec-driven development
Building AI systems from explicit specifications.
Definition
Spec-driven development is an approach to building AI systems — agents, pipelines, tools, and workflows — where behavior is grounded in explicit, readable specifications rather than being encoded entirely in model weights or hand-crafted prompt strings. A specification defines what the system should do, what outputs are allowed, which actions are permitted, what constraints must hold, and what success looks like. These specs can take many forms: natural language documents, JSON schemas, OpenAPI definitions, formal rules, or structured requirement sets — and they are treated as first-class artifacts that are versioned, tested against, and retrieved at runtime.
The core idea is to separate the definition of behavior from its implementation. Instead of baking all rules into a monolithic system prompt or fine-tuned model, you maintain a live specification that can be updated, audited, and retrieved independently. In the RDD (Retrieval-Driven Development) pattern, specs are indexed in a vector store or document repository; at runtime the agent retrieves the relevant spec fragments for the current task and grounds its decisions in them. This makes behavior auditable, correctable without retraining, and aligned with the specification that domain experts or compliance teams can read and approve.
Spec-driven development is especially valuable for agents operating in regulated or safety-critical domains, where the cost of misaligned behavior is high and compliance teams need to verify what the system is allowed to do. It also complements prompt engineering — specs provide the stable semantic content; prompts orchestrate how the model reasons about and applies them. The approach contrasts with vibe coding, where behavior emerges iteratively from loose intent rather than from explicit requirements.
How it works
Spec authoring and indexing
Specifications are written in a structured but human-readable format. For an agent, a spec might define allowed tool calls, required output format, constraints on what information can be disclosed, and success criteria. These specs are chunked and indexed — in a vector store for semantic retrieval, or in a structured database for exact lookup — so that relevant fragments can be retrieved at inference time.
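The chunk-and-index step can be sketched in pure Python. A real system would use an embedding model and a vector database; the toy bag-of-words index below (all names and the sample spec text are illustrative) only shows the shape of the pipeline — chunk the spec, embed each chunk, rank chunks by similarity to the task:

```python
import math
import re
from collections import Counter

def chunk_spec(text: str, max_chars: int = 300) -> list[str]:
    # Split on paragraph boundaries, then pack paragraphs into ~max_chars chunks
    paras = [p.strip() for p in text.split("\n\n") if p.strip()]
    chunks, current = [], ""
    for p in paras:
        if current and len(current) + len(p) > max_chars:
            chunks.append(current)
            current = p
        else:
            current = f"{current}\n\n{p}".strip()
    if current:
        chunks.append(current)
    return chunks

def embed(text: str) -> Counter:
    # Toy bag-of-words "embedding"; a real system would call an embedding model
    return Counter(re.findall(r"[a-z]+", text.lower()))

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

# Index: pair each chunk with its embedding (illustrative spec content)
spec_doc = """Allowed tools: search_kb, create_ticket.

Refund constraints: refunds over $500 require human approval.

Output format: respond with JSON containing category and summary."""
index = [(chunk, embed(chunk)) for chunk in chunk_spec(spec_doc, max_chars=80)]

def retrieve(query: str, k: int = 2) -> list[str]:
    # Return the k spec fragments most similar to the query
    q = embed(query)
    ranked = sorted(index, key=lambda item: cosine(q, item[1]), reverse=True)
    return [chunk for chunk, _ in ranked[:k]]

print(retrieve("customer wants a refund of $800", k=1))
```

Swapping `embed` for a real embedding call and `index` for a vector store changes nothing about the structure: specs in, ranked fragments out.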
Retrieval, generation, and validation
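At inference time, the agent retrieves the spec fragments most relevant to the current task, injects them into the prompt, and generates an output grounded in them; the result then flows into validation. A minimal sketch, assuming a hypothetical `retrieve_spec` helper over the spec index (the function names and fragment text are illustrative):

```python
def retrieve_spec(task: str) -> list[str]:
    # Hypothetical retriever: look up the indexed spec fragments most
    # relevant to the task (vector search or exact lookup).
    # Stubbed here with a fixed fragment for illustration.
    return ["Refunds over $500 require human approval."]

def build_messages(task: str, fragments: list[str]) -> list[dict]:
    # Ground the model in the retrieved spec: fragments go into the
    # system prompt, the task stays in the user turn
    system = (
        "Follow these specification fragments exactly:\n\n"
        + "\n---\n".join(fragments)
    )
    return [
        {"role": "system", "content": system},
        {"role": "user", "content": task},
    ]

def run_with_spec(task: str) -> str:
    from openai import OpenAI  # requires the openai package and OPENAI_API_KEY
    messages = build_messages(task, retrieve_spec(task))
    client = OpenAI()
    response = client.chat.completions.create(model="gpt-4o-mini", messages=messages)
    return response.choices[0].message.content
```

Keeping prompt assembly (`build_messages`) separate from the API call makes the grounding step testable without a model in the loop.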
Validation and correction
The validator checks that the generated output or action conforms to the spec: schema validation for structured outputs (JSON Schema, Pydantic), rules-based checks for constraints, or a secondary model call that verifies compliance. If validation fails, the system can retry with the violation description added to context, escalate to a human, or surface a structured error. This closed loop keeps behavior aligned with the spec even when the model would otherwise deviate.
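The retry branch of that loop can be sketched with Pydantic. Here `generate` stands in for any LLM call, and the model fields and error messages are illustrative, not a fixed API:

```python
import json
from typing import Callable

from pydantic import BaseModel, ValidationError, field_validator

class TicketResult(BaseModel):
    category: str
    summary: str

    @field_validator("category")
    @classmethod
    def check_category(cls, v: str) -> str:
        if v not in {"billing", "technical", "account", "other"}:
            raise ValueError(f"category {v!r} not allowed by spec")
        return v

def generate_validated(
    generate: Callable[[str], str], prompt: str, max_retries: int = 2
) -> TicketResult:
    # Closed loop: generate, validate against the spec, and on failure
    # retry with the violation description appended to the context
    for attempt in range(max_retries + 1):
        raw = generate(prompt)
        try:
            return TicketResult(**json.loads(raw))
        except (json.JSONDecodeError, ValidationError) as err:
            if attempt == max_retries:
                raise  # escalate: surface a structured error to the caller
            prompt += (
                f"\n\nYour previous output violated the spec: {err}\n"
                "Fix it and output valid JSON only."
            )
    raise AssertionError("unreachable")
```

A stub `generate` that returns an invalid category once shows the loop in action: the first failure triggers a retry with the violation in context, and the second attempt validates cleanly.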
When to use / When NOT to use
| Use when | Avoid when |
|---|---|
| Agent behavior must be auditable and match documented requirements | Requirements are entirely unknown and need to be discovered iteratively |
| Compliance or safety teams need to approve and review system behavior | The task is exploratory prototyping where the spec would change every iteration |
| Behavior needs to be updated without retraining (by changing the spec) | The spec is too complex or ambiguous for a model to reliably apply at runtime |
| Output format and constraints must be enforced reliably | Latency from spec retrieval + validation is unacceptable for the use case |
Comparisons
| Approach | Behavior defined by | Updatable without retraining | Auditable |
|---|---|---|---|
| Spec-driven (RDD) | Explicit specs retrieved at runtime | Yes | Yes |
| Prompt engineering | System prompt and examples | Partial (prompt changes) | Limited |
| Fine-tuning | Model weights | No | Hard |
| Vibe coding | Iterative user-model dialogue | N/A (exploratory) | No |
Pros and cons
| Pros | Cons |
|---|---|
| Behavior is auditable and human-readable without inspecting weights | Spec retrieval and validation add latency and infrastructure complexity |
| Specs can be updated by domain experts without retraining | Model may misinterpret or incompletely apply retrieved spec fragments |
| Enables compliance review and signoff on system behavior | Requires discipline to maintain spec quality and coverage as requirements evolve |
| Validation catches spec violations before they reach users | Not suitable for tasks where requirements are inherently fuzzy or emergent |
Code examples
Structured output with spec validation using Pydantic and OpenAI (Python)
```python
from pydantic import BaseModel, field_validator
from openai import OpenAI
import json

client = OpenAI()

# Define the output spec as a Pydantic model
class SupportResponse(BaseModel):
    category: str  # "billing", "technical", "account", "other"
    priority: str  # "low", "medium", "high"
    summary: str
    suggested_action: str

    @field_validator("category")
    @classmethod
    def validate_category(cls, v: str) -> str:
        allowed = {"billing", "technical", "account", "other"}
        if v not in allowed:
            raise ValueError(f"category must be one of {allowed}")
        return v

    @field_validator("priority")
    @classmethod
    def validate_priority(cls, v: str) -> str:
        allowed = {"low", "medium", "high"}
        if v not in allowed:
            raise ValueError(f"priority must be one of {allowed}")
        return v

# System spec retrieved at runtime
spec = """
You are a support ticket classifier. Classify the ticket according to these rules:
- category: billing (payment issues), technical (bugs/errors), account (login/access), other
- priority: high (data loss, service outage), medium (degraded functionality), low (cosmetic/minor)
- summary: one sentence describing the issue
- suggested_action: one sentence recommending next steps
Output ONLY valid JSON matching the schema.
"""

ticket = "I can't log in to my account and my subscription payment failed this morning."

response = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[
        {"role": "system", "content": spec},
        {"role": "user", "content": ticket},
    ],
    response_format={"type": "json_object"},
)

raw = response.choices[0].message.content
parsed = SupportResponse(**json.loads(raw))
print(parsed.model_dump_json(indent=2))
```
Practical resources
- OpenAI – Structured outputs — Native JSON schema enforcement in the API
- LangChain – Output parsers — Parsing and validating LLM outputs against schemas
- Pydantic documentation — Data validation and schema definition in Python
- Instructor library — Structured LLM outputs with Pydantic, retry logic, and validation
- Guardrails AI — Framework for spec-driven output validation and correction