Handoffs / Swarm
Also known as: Agent Handoff, Swarm, Triage-and-Transfer
Lets one agent own the dialogue and hand control off via a transfer tool call.
Claude Code
- Define each specialist as a separate SKILL.md with its own tool surface and trigger condition.
- Encode the handoff decision as a structured output step; the triage subagent emits the target skill name as a JSON field.
- Pass a compact written context packet (not the full transcript) to the specialist at handoff — prevents the new agent re-interpreting prior framing.
- Cap handoff depth in the orchestrator's instructions — declare a maximum count and treat exceeding it as a task-completion error.
Primitives
Related patterns
Cursor
- Create one
.cursor/rules/*.mdcper specialist withalwaysApply: falseand activate via@rule-namewhen handing off. - Use Agent mode as the triage layer — it decides when to switch context; activate the specialist rule after the classification.
- Pass a summary of the conversation to the specialist at handoff via
@filerather than relying on the raw chat history. - Use Plan mode to draft the handoff sequence upfront for complex multi-specialist tasks.
Primitives
Related patterns
Decision
| Use when ✓ | Avoid when ✗ |
|---|---|
| +Right call when a single conversation spans multiple narrow specialisations and the right specialist is only knowable after the user has spoken (customer support triage, multi-domain copilots, intake-then-treatment flows). | −When the work decomposes into independent parallel sub-tasks that fan out and synthesise, an orchestrator-workers pipeline produces results faster and with fewer round trips than a sequential chain of handoffs. |
| +Justified where each specialist needs a different system prompt, tool surface, or model tier, and you want the boundary between roles to be visible in the trace as a named transfer rather than buried inside one prompt. | −Without per-handoff scope guards and a turn budget, two specialists trained on adjacent domains will ping-pong the same request, and the swarm will spend tokens electing not to answer. |
| +A good fit when authority and audit matter: handoffs make the moment a conversation crossed from a generalist to a privileged tool surface (refunds, account changes, code-write access) an explicit, logged event. | −When the routing decision can be made once from the initial request and never revisited, a Routing classifier is a cheaper and more debuggable substitute than a triage agent that runs every turn. |
| +Useful when conversational continuity is the product: the user is talking to one assistant whose voice happens to shift, not filing tickets that get routed and replied to asynchronously. |
In the wild
| Source | Claim |
|---|---|
| github.com → | OpenAI Swarm ships a customer-service starter where a triage agent receives the message and calls `transfer_to_sales_agent` or `transfer_to_refunds_agent`; the chosen agent inherits the thread and either resolves the issue or hands back to triage. |
| openai.github.io → | The OpenAI Agents SDK exposes `handoff()` as a first-class primitive in both the Python and TypeScript runtimes; the docs show a triage agent listing specialist agents in its `handoffs` array, with each transfer recorded as a tool call in the trace. |
| langchain-ai.github.io → | LangGraph documents a Multi-Agent Supervisor pattern in which a supervisor node decides which worker agent runs next based on conversation state, then routes back to itself when the worker finishes. The wiring is the same handoff-and-return loop expressed as graph edges. |
Reader gotcha
Cognition argues from production experience that conversation-spanning handoffs reliably break because each new agent reads the same message history but reconstructs a different latent context, then makes decisions inconsistent with the prior agent's. The fix is not better prompts on each specialist; it is reducing the number of handoffs, or sharing a compact written context that travels with the transfer rather than relying on the raw message log. source
Implementation sketch
import { generateText, tool } from 'ai'
import { openai } from '@ai-sdk/openai'
import { z } from 'zod'
type AgentName = 'triage' | 'billing' | 'refunds'
const agents: Record<AgentName, { system: string }> = {
triage: { system: 'Triage. Answer general questions; transfer billing or refund requests.' },
billing: { system: 'Billing specialist. Resolve billing questions; hand back if out of scope.' },
refunds: { system: 'Refunds specialist. Process refund requests; hand back if out of scope.' },
}
export async function runSwarm(messages: { role: 'user' | 'assistant'; content: string }[]) {
let active: AgentName = 'triage'
const handoff = tool({
description: 'Transfer the conversation to another agent.',
parameters: z.object({ to: z.enum(['triage', 'billing', 'refunds']) }),
execute: async ({ to }: { to: AgentName }) => { active = to; return `transferred to ${to}` },
})
for (let turn = 0; turn < 6; turn++) {
const prev: AgentName = active
const { text } = await generateText({ model: openai('gpt-4o'), system: agents[active].system, messages, tools: { handoff }, maxSteps: 4 })
messages.push({ role: 'assistant', content: text })
if (active === prev) return text
}
throw new Error('Swarm turn budget exhausted')
}
- OpenAI Agents
- LangGraph
- Vercel AI SDK
- AutoGen
References
Handoffs treat a multi-agent system as a single conversation with a moving owner. One agent holds the dialogue at a time; when its scope runs out, it emits a transfer (a tool call whose return value is another agent rather than a string) and the runner switches the active agent for the next turn. The new agent inherits the message history and continues from where the previous one stopped, so the user sees one coherent thread even though several prompts and tool sets handled it. OpenAI shipped this as Swarm in 2024 and folded it into the Agents SDK as `handoff()`; both stay small because the runtime work is just a loop over messages and a pointer to the current agent.
Background · context and trade-offs
The pattern only earns its name when the transfer is the agent's own decision and the chosen specialist takes the conversation forward without supervision. A triage agent inspects the request, picks among a registry of specialists, and yields control; the specialist may hand back, hand to a peer, or finish. Roles are encoded as system prompts plus narrow tool surfaces, and the handoff vocabulary as the names of the transfer tools. Every transition is then a recorded tool call with arguments, replayable from the trace and editable by changing one specialist's instructions without touching the others.
Handoffs sit next to but distinct from Routing, Orchestrator-Workers, and Multi-Agent Debate. Routing is a one-shot classifier that fires before any agent runs and never re-routes; orchestrator-workers keeps a central planner that fans sub-tasks out in parallel and synthesises; multi-agent debate runs peer agents in parallel and cross-critiques. Handoffs are sequential and peer-to-peer: exactly one agent is live at any moment and the context is the message history itself. The cost is operational: deciding which agent owns scope creep, preventing two specialists from ping-ponging the same request, and bounding how many turns the swarm can spend electing not to finish.