
Migrating from Claude Sub-agents

Claude Code’s sub-agent system is powerful. You define specialized agents with focused prompts, restricted tools, and independent contexts. Claude decides when to delegate, spawns sub-agents in foreground or background, and synthesizes results. It works.

But there’s a design choice buried in the architecture that matters more than any individual feature: who decides what happens next? In Claude Sub-agents, the answer is the LLM. The parent agent reads your request, evaluates sub-agent descriptions, and decides which one to spawn. The routing logic lives in inference, not in config.

This article explores why that matters, when it becomes a problem, and how duckflux offers an alternative where the orchestration is deterministic while the work inside each step stays as creative as the LLM needs to be.


Claude Sub-agents are markdown files with YAML frontmatter that define specialized AI assistants. Each sub-agent has its own system prompt, tool restrictions, model choice, and permission mode.

---
name: code-reviewer
description: Reviews code for quality and best practices
tools: Read, Grep, Glob, Bash
model: sonnet
---
You are a senior code reviewer. When invoked, analyze the code
and provide specific, actionable feedback on quality, security,
and best practices.

At runtime, Claude reads the description field of each available sub-agent and decides whether to delegate. You can nudge this with natural language (“use the code-reviewer agent”) or force it with @-mentions, but the routing is fundamentally an LLM decision.

Sub-agents run in their own context window. They can’t spawn other sub-agents. Results return to the parent, which synthesizes them. For parallel work, you can run multiple sub-agents in the background, or use agent teams for cross-session coordination.

Key capabilities:

  • Isolation. Each sub-agent has its own context, tools, and permissions.
  • Model routing. Haiku for cheap exploration, Opus for complex reasoning, Sonnet as default.
  • Worktree isolation. isolation: worktree gives a sub-agent a temporary git worktree.
  • Persistent memory. Sub-agents can accumulate learnings across sessions.
  • Hooks. PreToolUse, PostToolUse, Stop hooks for lifecycle control.
  • Background execution. Sub-agents run concurrently while you keep working.

Here’s the thing: Claude Sub-agents are orchestrated by inference. The LLM decides:

  1. Whether to delegate at all.
  2. Which sub-agent to spawn.
  3. What prompt to write for the sub-agent.
  4. When to synthesize results vs. spawn more agents.
  5. Whether to chain sub-agents or return to you.

Each of these decisions is a probabilistic inference. On a good day, Claude makes the right calls. On a bad day, it forgets to delegate, picks the wrong sub-agent, writes a vague task prompt, or synthesizes prematurely.

This is fine for interactive, exploratory work. You’re in the loop, you can redirect, you can say “no, use the reviewer agent.” But the moment you want a repeatable pipeline (plan, code, test, review, deploy), you’re asking the LLM to be a reliable router. And LLMs are unreliable routers. They forget steps, miscount iterations, and silently skip transitions.

The sub-agent docs themselves acknowledge this: sub-agents cannot spawn other sub-agents, so chaining requires the parent to orchestrate. But the parent’s orchestration logic is just… its next token prediction.

Compare this to how we treat human workflows. Nobody says “here are five specialists, figure out the order.” We define processes, assign roles to steps, and execute deterministically. The specialists bring creativity; the process brings structure.


duckflux is a declarative, YAML-based workflow DSL. The execution order is defined in config, not inferred by an LLM. The runtime handles sequencing, loops, parallelism, retries, events, and tracing.

flow:
  - type: exec
    run: npm test

The key difference: duckflux separates orchestration from execution. The workflow file defines what happens in what order. Each step can invoke an LLM, run a shell command, call an HTTP API, or trigger a sub-workflow. The LLM does creative work inside each step. The workflow DSL handles the plumbing between steps.
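That separation can be shown with nothing but the exec step type: the flow declares the order, and each run line is where the LLM (or a plain command) does its work. A minimal sketch (PROMPT_SUMMARIZE.md is a hypothetical prompt file, and $AGENT stands in for whatever agent CLI you use):

```yaml
participants:
  summarize:
    type: exec
    run: cat PROMPT_SUMMARIZE.md | $AGENT   # creative work happens inside the step
  verify:
    type: exec
    run: npm test                           # deterministic plumbing between steps
flow:
  - summarize
  - verify
```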


It’s not binary. Different parts of a pipeline need different levels of determinism.

| Concern | Needs determinism? | Why |
| --- | --- | --- |
| Step ordering | Yes | Plan before code, test before deploy. Not negotiable. |
| Retry logic | Yes | “Retry 3 times with backoff” is a policy, not a creative decision. |
| Quality gates | Yes | Tests pass or they don’t. Exit codes, not vibes. |
| Error handling | Yes | “If deploy fails, notify Slack” is a business rule. |
| Code generation | No | The LLM should be creative here. |
| Code review | No | The LLM should reason freely about quality. |
| Planning | No | Breaking tasks into subtasks is inherently creative. |

Claude Sub-agents put everything on the non-deterministic side. duckflux lets you draw the line where it makes sense: deterministic orchestration, non-deterministic execution.
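That line shows up directly in config: retry policy and quality gates live in declared keys, while the creative work stays inside the command an exec step runs. A minimal sketch using only constructs that appear elsewhere in this article:

```yaml
participants:
  generate:
    type: exec
    run: cat PROMPT_CODE.md | $AGENT   # non-deterministic: the LLM is creative here
    onError: retry
    retry:
      max: 3                           # deterministic: "retry 3 times" is policy
      backoff: 2s
  gate:
    type: exec
    run: npm test                      # deterministic: exit code, not vibes
flow:
  - generate
  - gate
```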


| Claude Sub-agents | duckflux | Notes |
| --- | --- | --- |
| Sub-agent (markdown file) | Participant | A unit of work. In duckflux, not limited to LLM invocations. |
| description (LLM-routed) | Flow position / when guard | Explicit placement replaces LLM routing decisions. |
| Parent decides delegation | flow array | Ordering is declared, not inferred. |
| maxTurns | retry.max / loop.max | Iteration caps per step, not per agent context. |
| isolation: worktree | cwd per participant | Working directory isolation per step. |
| Background sub-agents | parallel: construct | Concurrent execution declared in config. |
| Chained sub-agents | Sequential flow | No LLM needed to decide “run B after A.” |
| Sub-agent hooks | onError, when, emit/wait | Lifecycle control in the DSL, not in hook scripts. |
| Persistent memory | execution.context / set | Workflow-scoped state. Cross-session memory is outside duckflux scope. |
| Model routing | N/A (bring your own agent CLI) | duckflux orchestrates commands; model choice is per-agent. |
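One of those mapped constructs, the when guard, replaces the LLM deciding whether a step should run at all. A sketch of what that could look like (the placement of when on a participant is an assumption; the expression style is modeled on the loop condition used later in this article):

```yaml
participants:
  notify:
    type: exec
    run: ./notify-slack.sh
    when: review.output.approved == true   # assumed: step runs only if the guard holds
```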

In Claude Sub-agents, chaining requires the parent to decide the sequence:

Use the code-reviewer subagent to find performance issues,
then use the optimizer subagent to fix them

The parent LLM interprets “then” and decides to spawn the optimizer after the reviewer. If it misunderstands, it might run them in parallel, skip the optimizer, or synthesize prematurely.

duckflux:

participants:
  review:
    type: exec
    run: cat PROMPT_REVIEW.md | $AGENT
  optimize:
    type: exec
    run: cat PROMPT_OPTIMIZE.md | $AGENT
flow:
  - review
  - optimize

“Then” is a line break in the YAML. No inference needed.

Claude Sub-agents can run research in parallel via background tasks:

Research the authentication, database, and API modules
in parallel using separate subagents

Again, the parent decides whether to actually parallelize, which sub-agents to use, and how to synthesize.

duckflux:

flow:
  - parallel:
      - as: auth-research
        type: exec
        run: cat PROMPT_AUTH.md | $AGENT
      - as: db-research
        type: exec
        run: cat PROMPT_DB.md | $AGENT
      - as: api-research
        type: exec
        run: cat PROMPT_API.md | $AGENT

Parallelism is declared. All three run concurrently. The outputs are collected in an array for the next step. No LLM routing decision required.
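One way to consume those collected outputs is simply a follow-up step after the parallel block, which runs only once all branches finish. A sketch reusing the same exec pattern (PROMPT_SYNTH.md is a hypothetical prompt file; exactly how the collected array is exposed to the next step is not shown here):

```yaml
flow:
  - parallel:
      - as: auth-research
        type: exec
        run: cat PROMPT_AUTH.md | $AGENT
      - as: db-research
        type: exec
        run: cat PROMPT_DB.md | $AGENT
      - as: api-research
        type: exec
        run: cat PROMPT_API.md | $AGENT
  - as: synthesize
    type: exec
    run: cat PROMPT_SYNTH.md | $AGENT   # runs after all three branches complete
```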

A common Claude Sub-agents pattern: code, then review, then fix if needed. The parent decides when to stop.

Use the coder subagent to implement the feature,
then use the reviewer subagent to check it.
If there are issues, have the coder fix them.
Repeat until the reviewer approves.

The parent LLM manages the iteration. It decides whether to loop, how many times, and when to stop. If it loses track, the loop might run forever (capped by maxTurns) or stop too early.

duckflux:

participants:
  code:
    type: exec
    run: cat PROMPT_CODE.md | $AGENT
    onError: retry
    retry:
      max: 3
      backoff: 2s
  test:
    type: exec
    run: npm test
  lint:
    type: exec
    run: npm run lint
  review:
    type: exec
    run: cat PROMPT_REVIEW.md | $AGENT
flow:
  - loop:
      until: review.output.approved == true
      max: 5
      steps:
        - code
        - test
        - lint
        - review

The loop condition, iteration cap, and quality gates are all in the config. The LLM does creative work inside code and review. The DSL handles the loop, the exit condition, and the gates. test and lint are real commands with real exit codes, not prompt instructions asking the agent to self-report.

Claude Sub-agents excel at tool restriction. A reviewer gets Read, Grep, Glob but not Write, Edit. This is enforced by the framework.

duckflux doesn’t enforce tool restrictions because it doesn’t manage agent sessions. Each exec step runs a shell command. If that command invokes an agent CLI, the agent’s own config handles tool restrictions.

participants:
  review:
    type: exec
    run: claude --agent code-reviewer --print "Review the auth module"
    # The code-reviewer agent definition restricts its own tools
This is a tradeoff. duckflux gives you orchestration without coupling to a specific agent runtime. Tool isolation is the agent’s responsibility, not the workflow’s.
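The agent-side restriction lives in the sub-agent definition itself, in the same markdown-plus-frontmatter format shown at the top of this article. A sketch of a read-only reviewer (note the absence of Write and Edit in tools):

```markdown
---
name: code-reviewer
description: Reviews code for quality and best practices
tools: Read, Grep, Glob
model: sonnet
---
You are a senior code reviewer. Analyze the code and report
specific, actionable issues. Do not modify any files.
```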

Claude Sub-agents have no event system. If sub-agent A needs to signal sub-agent B, the parent synthesizes A’s output and writes B’s prompt. The coordination happens in the parent’s inference.

duckflux has native emit + wait for cases where steps genuinely need to signal each other:

flow:
  - parallel:
      - as: data-prep
        steps:
          - type: exec
            run: ./prepare-data.sh
          - emit:
              event: "data.ready"
      # This branch waits for data-prep to signal readiness
      - as: process
        steps:
          - wait:
              event: "data.ready"
              timeout: 5m
          - type: exec
            run: ./process.sh

Events work across parallel branches, across parent/child workflows, and with external event hubs (NATS, Redis). This is coordination infrastructure that the sub-agent model lacks entirely.


Sub-agents are the right tool when:

  • You’re working interactively. Typing in Claude Code, exploring a codebase, asking questions. The LLM-routed delegation is exactly right here because you’re in the loop.
  • The workflow is genuinely emergent. You don’t know the steps upfront. The agent needs to figure out what to do based on what it finds.
  • Context preservation matters. Each sub-agent’s isolated context window prevents pollution of the main conversation. This is a real advantage for high-volume operations.
  • You need model routing. Sending cheap tasks to Haiku and expensive tasks to Opus within a single session is built into the sub-agent model.

Switch when:

  • The workflow is repeatable. If you’ve typed the same chaining instructions more than twice, it should be a config file.
  • You need guaranteed step ordering. Plan, code, test, review, deploy. Always in that order. No exceptions.
  • You need real quality gates. Not “please run the tests”, but npm test as an actual step with an exit code.
  • You need audit trails. Structured JSON traces per step, visible in the web server UI.
  • You need cross-agent events. Steps signaling each other, waiting for external events, publishing to message queues.
  • You want provider independence. duckflux orchestrates $AGENT, not Claude specifically. Swap agents per step.

| Concern | Claude Sub-agents | duckflux |
| --- | --- | --- |
| Routing | LLM decides (probabilistic) | Config declares (deterministic) |
| Step ordering | Parent LLM inference | flow array, top to bottom |
| Quality gates | Prompt instructions | Real commands with exit codes |
| Retry | maxTurns (global per agent) | retry.max with backoff (per step) |
| Parallel | Background sub-agents (LLM decides) | parallel: construct (declared) |
| Events | None | emit + wait (cross-branch, cross-workflow) |
| Tracing | Transcript files | Structured JSON + web server UI |
| Provider lock-in | Claude Code only | Any agent CLI, any runtime |
What you give up by moving to duckflux:

  • Interactive delegation. The natural “use the reviewer agent” UX in Claude Code. duckflux is a runner, not an interactive assistant.
  • Context isolation. Sub-agents protect the parent’s context window. duckflux steps are independent commands, but they don’t share a conversation context across steps.
  • Tool restriction enforcement. Sub-agents have framework-level tool control. In duckflux, that’s the agent’s responsibility.
  • Model routing within the workflow. Sub-agents can use different models per agent. In duckflux, each exec step invokes whatever CLI you point it at.
  • Persistent memory. Sub-agents accumulate learnings across sessions. duckflux has execution.context for within-workflow state, but cross-session memory is outside scope.
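The within-workflow state mentioned above can still carry information between steps. A sketch of how that might look (the exact set syntax and how later steps read execution.context are assumptions based on the mapping table, not documented behavior):

```yaml
flow:
  - as: plan
    type: exec
    run: cat PROMPT_PLAN.md | $AGENT
  - set:
      plan_summary: plan.output   # assumed: store a step's output in execution.context
  - as: report
    type: exec
    run: ./report.sh              # later steps can draw on workflow-scoped state
```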

You don’t have to choose one or the other. The most practical architecture uses both:

ci-pipeline.duck.yaml
participants:
  plan:
    type: exec
    run: claude --agent planner --print "$(cat SPEC.md)"
  code:
    type: exec
    run: claude --agent coder --print "Implement the plan in PLAN.md"
    onError: retry
    retry:
      max: 3
  test:
    type: exec
    run: npm test
  lint:
    type: exec
    run: npm run lint
  review:
    type: exec
    run: claude --agent reviewer --print "Review the implementation"
flow:
  - plan
  - loop:
      until: review.output.approved == true
      max: 5
      steps:
        - code
        - test
        - lint
        - review

Each claude --agent step is a Claude Sub-agent invocation. The sub-agent gets its isolated context, restricted tools, and specialized prompt. But the orchestration (ordering, looping, gating, retrying) is declarative. The LLM does creative work. The YAML handles plumbing.

This is the core argument: decouple what the LLM is good at (reasoning, generation, analysis) from what config files are good at (sequencing, retrying, branching, gating). Don’t ask the LLM to be a router when you already know the route.


  1. Install the runtime:

bun add -g @duckflux/runner

  2. Identify your repeatable workflows. Which sub-agent chains do you run the same way every time?

  3. Extract the ordering into a .duck.yaml. Each sub-agent becomes a participant. The chain becomes the flow.

  4. Add real quality gates. Replace “please run the tests” with actual npm test steps.

  5. Run it:

quack run my-pipeline.duck.yaml

  6. Observe via quack server --trace-dir ./traces for a visual trace of every step.

Claude Sub-agents represent a real step forward in AI-assisted development. The isolated contexts, tool restrictions, and model routing are well-designed primitives.

But the orchestration layer, where the LLM decides what to delegate, when, and in what order, is the weak link. Not because Claude is bad at it, but because orchestration is fundamentally a deterministic problem being solved with a probabilistic tool.

duckflux doesn’t replace the agents. It replaces the part of the system that shouldn’t be guessing.