Prompt Chaining
Also known as: Pipeline, Sequential Prompting, LLM Pipeline
Decomposes a task into a fixed chain of LLM calls, each consuming the previous output.
Claude Code
- Define each pipeline stage as a named step in `CLAUDE.md` so the agent follows the same decomposition every run.
- Use a hook to validate structured output between stages; exit non-zero to retry the failing stage.
- Pin the stage order in `settings.json` via an allowed tool sequence, which prevents the agent from skipping gates under time pressure.
- Keep each stage in a separate subagent when stages need clean context isolation from one another.
Cursor
- Add a `.cursor/rules/*.mdc` file listing each pipeline stage and its acceptance criteria.
- Use Agent mode with Plan mode first, and review the generated step list before execution begins.
- Reference the output schema of each stage with `@file` so the next stage prompt sees the contract.
- Set `alwaysApply: false` on stage rules and trigger them via `@rule-name` only when that stage is active.
Decision
| Use when ✓ | Avoid when ✗ |
|---|---|
| Use this when the task decomposes cleanly into 2–5 fixed stages whose order is known up front (extract, normalize, draft, review). | When the path through the work is data-dependent and changes per request, a fixed chain forces a wrong shape on the task. Route or plan instead. |
| Each stage should produce a parseable artifact (JSON object, list, scored shortlist) so a deterministic gate can validate the handoff between calls. | Without inter-stage validation, a chain becomes a longer monolith: each stage launders the previous stage's errors into the next, and end-to-end accuracy falls below the single-prompt baseline. |
| A good fit when accuracy outranks latency and a single monolithic prompt has started to drop instructions or hallucinate intermediate state. | When stages are mutually independent, sequencing them wastes wall-clock time the parallelization pattern would recover. |
| Reach for it as the first agentic shape on a new problem. It ships in a day, runs deterministically, and exposes which stage is the real source of error before any planner is introduced. | |
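The warning about ungated chains can be made concrete with a back-of-the-envelope calculation. The sketch below assumes stage failures are independent and that nothing downstream repairs an upstream error; real chains rarely satisfy either assumption exactly. The helper name `ungatedChainAccuracy` and the 95% figure are illustrative, not measurements.

```typescript
// Hypothetical helper: end-to-end success rate of a chain with no gates,
// assuming each stage fails independently and errors are never corrected.
function ungatedChainAccuracy(stageAccuracies: number[]): number {
  return stageAccuracies.reduce((acc, p) => acc * p, 1)
}

// Four stages at 95% each (illustrative numbers only).
const fourStages = ungatedChainAccuracy([0.95, 0.95, 0.95, 0.95])
console.log(fourStages.toFixed(4)) // 0.8145
```

Under those assumptions, four individually dependable stages lose almost a fifth of their runs end to end, which is the gap a gate with retries is meant to close.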
In the wild
| Source | Claim |
|---|---|
| github.com → | Anthropic's public agent cookbook ships a runnable prompt-chain workflow that drafts marketing copy, gates the result with a programmatic check, then translates it. The decomposition matches the essay exactly. |
| docs.langchain.com → | LangGraph documents prompt chaining as one of the canonical workflow shapes, modelled as a small graph of typed nodes whose edges carry the validated artifact between stages. |
| openai.com → | OpenAI's Deep Research product runs a multi-stage pipeline that plans sub-questions, retrieves and reads sources for each, then synthesises a report. The chain is long-running, and its stages are visible to the user as a streaming progress trace. |
Reader gotcha
Anthropic's essay names the failure mode bluntly: the trade is latency for accuracy, and the win evaporates when the chain has no programmatic gate between stages. A malformed JSON object or an off-topic draft at stage one rides through and the chain returns a confidently wrong final artifact. The gate is the pattern; the chain without it is just a longer prompt. source
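As a rough illustration of a gate that catches both failure modes named above, the sketch below rejects malformed JSON, wrong shapes, and outlines whose topic shares no keyword with the brief. Everything here is an assumption for illustration: the `Outline` shape, the `gateOutline` name, and especially the keyword-overlap test, which is a deliberately crude stand-in for whatever topicality check a real pipeline would use.

```typescript
type GateResult<T> = { ok: true; value: T } | { ok: false; reason: string }

interface Outline { topic: string; sections: string[] }

// Hypothetical gate for a stage-one artifact. Returns a reason instead of
// throwing so the caller can decide whether to retry or fail closed.
function gateOutline(raw: string, brief: string): GateResult<Outline> {
  let parsed: unknown
  try {
    parsed = JSON.parse(raw)
  } catch {
    return { ok: false, reason: 'malformed JSON' }
  }
  if (typeof parsed !== 'object' || parsed === null) {
    return { ok: false, reason: 'not an object' }
  }
  const candidate = parsed as { topic?: unknown; sections?: unknown }
  const topic = candidate.topic
  const sections = candidate.sections
  if (typeof topic !== 'string') return { ok: false, reason: 'missing topic' }
  if (!Array.isArray(sections) || !sections.every((s) => typeof s === 'string')) {
    return { ok: false, reason: 'sections must be strings' }
  }
  // Crude off-topic check: the topic must share one longer word with the brief.
  const briefWords = new Set(brief.toLowerCase().split(/\W+/).filter((w) => w.length > 3))
  const onTopic = topic.toLowerCase().split(/\W+/).some((w) => briefWords.has(w))
  if (!onTopic) return { ok: false, reason: 'off-topic' }
  return { ok: true, value: { topic, sections: sections as string[] } }
}
```

Returning a tagged result rather than throwing keeps the retry policy in the caller, so the same gate can back a retry loop in one chain and a hard failure in another.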
Implementation sketch
import { generateText, generateObject } from 'ai'
import { openai } from '@ai-sdk/openai'
import { z } from 'zod'

// Contract for the stage-one artifact: a topic plus at least three sections.
const Outline = z.object({ topic: z.string(), sections: z.array(z.string()).min(3) })

async function draftReport(brief: string): Promise<string> {
  // Stage 1: produce a structured outline validated against the schema.
  const { object: outline } = await generateObject({
    model: openai('gpt-4o-mini'),
    schema: Outline,
    prompt: `Produce an outline for: ${brief}`,
  })

  // Gate: fail closed if the outline is too thin to draft from.
  if (outline.sections.length < 3) throw new Error('outline gate: too few sections')

  // Stage 2: draft the report from the validated outline.
  const { text: draft } = await generateText({
    model: openai('gpt-4o'),
    prompt: `Write the report. Topic: ${outline.topic}. Sections: ${outline.sections.join(', ')}`,
  })

  // Stage 3: polish the draft for tone and clarity.
  const { text: polished } = await generateText({
    model: openai('gpt-4o-mini'),
    prompt: `Edit for tone and clarity. Return only the edited text.\n\n${draft}`,
  })
  return polished
}

export {}
- LangChain
- LangGraph
- Vercel AI SDK
- Mastra
Prompt chaining decomposes a task into a fixed sequence of LLM calls in which each step's output becomes the next step's input. The decomposition is editorial: a human picks the seams, names the intermediate artifacts, and writes one prompt per stage so each prompt is small enough to be reliable on its own. Between stages the program holds the artifact in plain memory, and often runs a deterministic gate (a schema validator, a regex, a length check) that decides whether to advance, retry, or fail closed.
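The advance, retry, or fail-closed decision described above can be sketched as a small generic runner. This is a minimal sketch rather than any framework's API: the `Stage` interface, `runChain`, and the `maxAttempts` field are hypothetical names, and each `run` function stands in for a real LLM call.

```typescript
// A stage turns the previous artifact into the next one. In a real chain,
// `run` would wrap an LLM call; here it is any async function.
interface Stage {
  name: string
  run: (input: unknown) => Promise<unknown>
  gate: (output: unknown) => boolean // deterministic check on the artifact
  maxAttempts: number
}

async function runChain(stages: Stage[], input: unknown): Promise<unknown> {
  let artifact = input // held in plain memory between stages
  for (const stage of stages) {
    let passed = false
    for (let attempt = 1; attempt <= stage.maxAttempts && !passed; attempt++) {
      const output = await stage.run(artifact)
      if (stage.gate(output)) {
        artifact = output // advance: the validated output feeds the next stage
        passed = true
      }
      // otherwise loop again: retry the same stage with the same input
    }
    // fail closed: never hand an unvalidated artifact to the next stage
    if (!passed) throw new Error(`gate failed at stage "${stage.name}"`)
  }
  return artifact
}
```

The stage order lives in the array passed to `runChain`, so it is fixed when the code is written; nothing in the loop chooses which stage comes next.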
Background · context and trade-offs
The pattern is the first thing the Anthropic taxonomy reaches for when a task can be split cleanly. The trade is latency for accuracy: two or three smaller prompts produce more dependable answers than one heroic prompt that has to plan, retrieve, draft, and proofread inside one inference. Structured output earns the chain its reliability: each stage emits JSON or some other parseable shape so the next stage receives an object, not a paragraph, and the seams stay machine-readable. Without the gates, errors at stage one quietly ride through the rest of the chain.
Prompt chaining sits underneath the more elaborate topologies. Routing picks a chain at runtime; parallelization fans out independent stages; orchestrator-workers and ReAct add a planner that the chain itself does not have. The chain is fixed at author time, which is why it ships the soonest and breaks the latest. The cost is editorial debt: when the task changes shape, the seams have to be redrawn by hand, because no part of the chain decides for itself whether the next call is the right one.
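The relationship between a chain and routing can be illustrated with a short sketch: each chain is composed once, at author time, and a separate lookup picks which one to run. The names `chain`, `route`, and the stub steps are hypothetical, and the steps return tagged strings in place of real model output.

```typescript
type Step = (input: string) => Promise<string>

// Compose steps into one fixed chain; the order is decided here, by the author.
const chain = (...steps: Step[]): Step => async (input) => {
  let artifact = input
  for (const step of steps) artifact = await step(artifact)
  return artifact
}

// Stand-ins for LLM calls.
const extract: Step = async (s) => `extracted(${s})`
const draft: Step = async (s) => `drafted(${s})`
const translate: Step = async (s) => `translated(${s})`

// Two fixed chains. Neither one inspects its input to choose the next call.
const chains: Record<'summary' | 'translation', Step> = {
  summary: chain(extract, draft),
  translation: chain(extract, translate),
}

// Routing sits one layer up: it selects a chain at runtime, then steps aside.
const route = (kind: 'summary' | 'translation', input: string): Promise<string> =>
  chains[kind](input)
```

When the task changes shape, the fix in this sketch is an edit to the `chain(...)` calls, which is the editorial debt the paragraph describes.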