Skip to content
← Blog

Master All 6 Claude Code Workflows: Decision Guide

16 min read

Most engineers run Claude Code the same way they once used autocomplete: type a prompt, read the reply, fix what's wrong by hand, repeat. That loop is fine for a rename or a throwaway regex. For anything heavier, it quietly caps your output and still bills you for every round trip. Claude Code workflows give you six distinct ways to structure work, and picking the wrong one costs tokens, wall-clock time, or a merge conflict nobody saw coming. What follows is a decision map built from real task characteristics: when each pattern earns its place, what it actually costs, and how to wire several together without redundancy.

Why the Agentic Loop Beats Prompt-and-Paste Development

The chat API returns one completion and stops. You read it, decide what's wrong, rephrase, and send another prompt, which means you are the loop. That works when the task fits one context window and you already know enough to evaluate the answer. When it doesn't, you spend more time on the seams between turns than on the work itself.

Claude Code's agentic execution loop changes the unit of work. Claude reads context, forms a plan, calls a tool, inspects the tool's output, and iterates against that result rather than against your original prompt. A failing test doesn't send Claude back to you; it sends Claude back into the code. A grep that returns nothing causes the plan to revise itself. The loop catches the classes of error that a single completion can't see, because seeing them requires running something and checking what comes back.

What "dynamic" means in practice is that the plan is not a document; it's a state that updates mid-task based on tool output. Paste-and-prompt leaves you holding that state in your head across multiple browser tabs. The agentic loop holds it in the session, and that's where the real throughput difference lives.

The Agentic Foundation All Six Workflows Share

Strip the branding, and all six patterns run the same engine at different RPMs. Claude reads context, forms a plan, calls a tool, checks the tool's output, and iterates against that output instead of against your original prompt.

The boundary governing how much autonomy Claude holds isCLAUDE.md. Treat it as the onboarding document a new hire reads on day one: persistent, project-level, loaded at session start. Keep it under 200 lines. Anything longer burns context on every single turn for instructions most tasks never touch, which is why language-specific or workflow-specific rules belong in skills that load on demand rather than in the always-on file. Tool permissions are the other half of the boundary, and they bite later in this guide, because sub-agents don't respect the parent session's permission mode the way most people assume.

Workflow 1: Explore, Plan, Code, Commit

This is the safe default; reach for it roughly nine times out of ten. The four phases run in a deliberate order: explore reads relevant files without touching them, plan proposes a change you approve before a single edit lands, code implements only the approved plan, and commit stages a reviewable diff. Run them out of order, and you get the failure everyone complains about, where Claude rewrites three files before you realize it misread the first one.

The gate is plan mode. Launch with claude --permission-mode plan, and Claude reads and proposes but edits nothing until you say go;Shift+Tab toggles the mode mid-session when you decide a task needs the leash. Encode this once and stop re-explaining it every session:

# CLAUDE.md
## Default workflow: Explore -> Plan -> Code -> Commit
1. Explore: read the files I name plus their direct imports. Do not edit.
2. Plan: in plan mode, return a numbered diff outline. Stop for approval.
3. Code: implement only the approved plan; run the test suite per file.
4. Commit: stage, write a conventional-commit message, show the diff first.
## Guardrails
-   Never run migrations, deploys, or `git push` without explicit approval.
-   Keep this file under 200 lines; push language rules into skills.

I ran exactly this against a 500-line scheduling module that needed its retry logic pulled into a separate policy object. Explore and plan were clean: Claude mapped the call sites and proposed a six-step diff I approved with one edit. Where it stalled was the commit phase: a test that depended on wall-clock time kept the loop churning on the implementation instead of flagging the test itself. The lesson generalizes: the loop is only as honest as your verification step, which is the entire point of the next pattern.

Workflow 2: Test-First Development

Write the failing spec first, then let Claude build the implementation that turns it green. Done well, this is the strongest accuracy lever Claude Code offers, because the test becomes an oracle the model can't argue with. Done carelessly, it teaches Claude to cheat.

Here is the trap, concretely. Ask for a function that "returns the active users" and assertresult.length === 3 against a fixture, and a model under pressure will happily writereturn fixtures.slice(0, 3). The test passes; the implementation is a lie. Assertions that pin a shape but not the logic are satisfiable by faking the result, and Claude'sacceptEdits-mode eagerness finds those shortcuts fast.

Write tests that exercise behavior across inputs the fixture doesn't pre-bake: property-based checks, an edge case the hardcoded answer would fail, a second assertion on a different input set. The spec has to be a wall, not a turnstile. When it is, test-first beats Explore-Plan-Code-Commit for any change where correctness is sharply defined and you already know what "right" looks like. The test-driven development workflow in the official docs covers the setup in more detail.

Workflow 3: Multi-Claude Orchestration with Agent Teams

One caveat goes before anything else: Agent Teams is experimental and off by default. Enable it withCLAUDE_CODE_EXPERIMENTAL_AGENT_TEAMS=1 and accept its limitations going in./resume and/rewind don't restore in-process teammates after a session resumes; you get one team per session; and teams can't nest. Build a production pipeline on those assumptions, and a dropped session resumption will cost you a debugging afternoon.

What you get in exchange is genuine coordination. A lead orchestrates teammates who share a task list, claim work through file locking so two agents never grab the same task, and message each other peer-to-peer. That last point is the real difference from plain sub-agents, which only report back up to the orchestrator and never sideways. Anthropic's recommended shape is three to five teammates carrying five or six tasks each, every teammate with its own independent context window. A concrete fit: reviewing open PRs across three microservices simultaneously.

export CLAUDE_CODE_EXPERIMENTAL_AGENT_TEAMS=1
claude --permission-mode plan
# in-session:
> Form a team of 3 teammates, one per service: auth-svc, billing-svc,
> notify-svc. Each reviews its open PRs and posts findings to the shared
> task list. Flag cross-service contract breaks to the team. Do not merge.

That works because the three reviews are independent reads. The instant the task involves a shared schema or a contract both services own, peer messaging stops being convenience and starts being the only thing keeping them coherent.

Workflow 4: Parallel Sub-Agent Fan-Out

Independence is the entire gating criterion. If two tasks can touch the same file, they don't fan out; they collide, and you spend the time you saved resolving merge conflicts. When tasks are genuinely disjoint, like running test suites across four separate repos, parallel sub-agent execution cuts wall-clock time by 60 to 80 percent, while token cost scales linearly with agent count.

Do the arithmetic before you commit, because the linear-token part is what bites. Say each repo takes 30 minutes and 50,000 tokens to test.

ApproachWall clockTokens
Sequential, 1 agent, 4 repos4 × 30 = 120 min4 × 50k = 200k
4-worker parallel fan-out~32 min~200k
4 teammates in plan mode (Agent Teams)~32 min~1.4M

The fan-out row buys roughly 88 minutes back at no extra token cost. The Agent Teams row is the one engineers walk into blind: teammates running in plan mode use about 7x the tokens of a standard session, landing near 1.4 million tokens for the same four test runs. You pay 7x for coordination you didn't need, since the repos never had to talk to each other. Stay under the ceilings too: four agents are nowhere near the 16-concurrent cap, but naively batching a 500-file migration into one agent per file would blow through the 1,000-agent-per-run limit. See the parallel tasks documentation for how to structure the fan-out invocation.

Workflow 5: Extended Thinking for Irreversible Calls

First, a correction that breaks code written from older docs:budget_tokens is deprecated on Claude Opus 4.8, Opus 4.7, and Sonnet 4.6. On those models, you no longer hand-set a thinking budget; adaptive thinking via the /effort setting is the replacement, andMAX_THINKING_TOKENS only applies to fixed-budget models that still honor a manual number. If your automation passesbudget_tokens to a current model, the value is being silently ignored.

Extended thinking spends real output tokens to reason before answering, so it earns its cost only when the price of a wrong answer dwarfs the price of the tokens. As a rule of thumb, raise the effort when a decision touches more than five files or crosses a service boundary; below that, standard completion is already accurate enough that additional reasoning is pure spend.

The published ranges track this logic: 1,000 to 5,000 thinking tokens cover simple tasks, 5,000 to 15,000 handle moderate ones, and 15,000 to 32,000-plus suit complex analysis, with returns flattening sharply above 32,000. I reach for it on calls a senior engineer would pause on rather than answer reflexively, evaluating three competing schema-migration paths, each with a different rollback story and lock-contention profile. Small scope, high blast radius: that's the exact quadrant where the extra reasoning pays.

Workflow 6: Human-in-the-Loop Approval Gates

The most common mistake here is also the most expensive in your own time: gating every file write. It feels safe, and it is mostly theater, for two reasons. A pipeline that pauses on each edit reduces you to a rubber stamp, and the gate often isn't running where you think it is. Sub-agents always execute inacceptEdits mode regardless of the parent session's permission mode. A file-write gate placed at the sub-agent level is architecturally bypassed; it never fires.

So place gates around reversibility, not anxiety. A revert undoes a bad edit in seconds, which is why edits don't need a checkpoint. A deploy, a schema migration, or an outbound API call doesn't revert cleanly, which is why those do. Think of the gate as a fire door: useless on every interior doorway, essential on the one exit that locks behind you. In practice, the gate belongs at workflow launch, plan mode's approval is exactly this, or around the irreversible shell and API commands in your allowlist, never around the routine writes in between. The human-in-the-loop workflow docs cover the permission-mode flags in detail.

A Selection Matrix from Real Task Characteristics

No product page publishes this because it requires an opinion. I rate each task on four axes: scope (how many files it touches), reversibility (can a clean revert undo it), risk (blast radius if it ships wrong), and context size (does the relevant code fit one context window). Those four inputs place almost every task in one cell.

Task shapeScopeReversibilityRiskContext sizeWorkflow
One-file fix, known answerTinyEasyLowFits one windowPlain prompt loop
Multi-file feature, one repoMediumEasyMediumFits one windowExplore-Plan-Code-Commit
New behavior with a hard specMediumEasyMediumFits one windowTest-first
N independent repos or tasksLargeEasyLowMultiple windowsParallel fan-out
Coupled change across servicesLargeMediumHighMultiple windowsMulti-Claude orchestration
Architecture or schema choiceSmallHardHighFits one windowExtended thinking
Deploy, migration, external callAnyHardHighAnyHuman-in-the-loop gate

The axes are the part worth stealing. Swap my thresholds for yours: if your team's revert story is weaker than mine, more tasks slide toward gates and extended thinking; if your repos are smaller, more slide toward fan-out. The matrix is a starting calibration, not a constitution.

Composing Workflows into One Pipeline

The patterns aren't mutually exclusive, and the highest-leverage setups stack two or three. A real schema change might open with extended thinking to pick the migration path, drop into Explore-Plan-Code-Commit to write the migration and its tests, and end on a single human gate around themigrate command itself. Three workflows, one pipeline, zero redundancy, because each handles a different axis: thinking owns the irreversible decision, EPCC owns the reversible code, the gate owns the irreversible action.

The trap when composing is double-gating. If extended thinking already forced a deliberate choice and you reviewed the plan in plan mode, a third approval on the same logic is friction with no new information. Gate each irreversible step once, at the latest moment you can still course-correct, and let the reversible middle run unattended.

Four Ways These Workflows Break

Context exhaustion is the first and most insidious. A long autonomous run fills the window, and once it does, the model begins forgetting its own earlier decisions mid-task. Checkpoint before you hit the wall: Dynamic Workflows store intermediate results outside the context window precisely so a run can survive its own length, and a deliberate "summarize state and write it to a file" step accomplishes the same thing manually.

Over-broad tool permissions are the quiet failure mode. Grant blanket shell access, and Claude can reach a goal by deleting the failing test rather than fixing the code, and you won't see it happen. Scope the allowlist so the destructive shortcuts aren't reachable.

The other two follow from earlier sections. Fanning out tasks that share a file produces downstream merge conflicts that erase the parallelism you paid for, so test task independence before you parallelize, not after. And gating reversible actions while leaving irreversible ones ungated inverts the risk model entirely: it slows you on edits that revert for free and waves through the deploy that doesn't.

Workflow Cheat-Sheet

WorkflowBest-fit taskKey config or flagPrimary watch-out
Explore-Plan-Code-CommitMost multi-file changes--permission-mode plan,Shift+TabVerification step must be honest
Test-firstBehavior with a sharp specFailing spec before codeAssertions Claude can satisfy by faking the result
Multi-Claude orchestrationCoupled cross-service workCLAUDE_CODE_EXPERIMENTAL_AGENT_TEAMS=1Experimental; no session resumption
Parallel fan-outIndependent large-scope tasks16-agent concurrency capShared files cause merge conflicts
Extended thinkingIrreversible architecture calls/effort (notbudget_tokens)Diminishing returns above 32k thinking tokens
Human-in-the-loop gateDeploys, migrations, API callsPlan-mode approval at launchSub-agents bypass file-write gates

The single move that compounds: pick the workflow you run most, encode it in aCLAUDE.md template like the EPCC block above, and make it repeatable. Reinventing the approach every task is the actual tax, not the token bill.

Subscribe to get practitioner breakdowns of agentic AI patterns as they evolve, written by a software engineer building with these tools week to week.

Common Questions About Claude Code Workflows

What are the six dynamic workflows available in Claude Code?

The six usage patterns are Explore-Plan-Code-Commit, test-first development, multi-Claude orchestration, parallel sub-agent fan-out, extended thinking, and human-in-the-loop approval gates. These differ from the six script primitives inside the capital-D "Dynamic Workflows" feature: classify-and-act, fan-out-and-synthesize, adversarial verification, generate-and-filter, tournament, and loop-until-done. Those are lower-level building blocks the runtime executes. The usage patterns map loosely onto those primitives but don't match them one for one.

How does Claude Code's agentic loop differ from the API?

The API returns one completion per request and stops. The agentic loop reads context, plans, calls a tool, verifies that tool's output, and iterates against the result, so the plan can rewrite itself when a test fails or a search comes up empty. You get self-correction the raw API leaves you to build yourself.

When should I use multi-Claude orchestration?

Use it only when tasks are coupled and must coordinate, a change spanning services that share a contract, for instance. For independent tasks, a single agent fanning out sub-agents is cheaper and simpler. Agent Teams also costs roughly 7x the tokens of a standard plan-mode session and is still experimental, so reserve it for work that genuinely needs peer-to-peer messaging between agents.

Does parallel sub-agent fan-out cost significantly more in API tokens?

No. Token cost scales linearly with agent count, so four workers cost about the same total tokens as running the four tasks sequentially. What changes is wall-clock time, which drops 60 to 80 percent for independent tasks. The expensive variant is Agent Teams in plan mode at roughly 7x per session; that's a coordination cost, not a parallelism cost, and it only makes sense when agents actually need to coordinate.

What is extended thinking, and when is the cost worth it?

Extended thinking spends additional output tokens on reasoning before answering, billed at output-token rates. It justifies the cost only for decisions that are high-risk and hard to reverse: a schema migration, an API contract change, or a choice between architectural paths. A practical trigger: any decision touching more than five files or crossing a service boundary. Below that, standard completion is accurate enough. On Opus 4.8, Opus 4.7, and Sonnet 4.6, set effort with/effort;budget_tokens is deprecated on those models and is silently ignored.

How do I add approval gates without slowing the pipeline?

Gate irreversible actions only: deploys, migrations, and external API calls. Don't gate file writes: a revert undoes those for free and, at the sub-agent level, the gate is bypassed anyway, since sub-agents always run inacceptEdits mode. Place the checkpoint at workflow launch (plan mode's approval does this) or around the irreversible shell and API commands in your allowlist. Everything reversible in between should run unattended.

Can I combine multiple workflows in one pipeline?

Yes, and the strongest pipelines do. A schema change can open with extended thinking to choose the migration path, run Explore-Plan-Code-Commit to write and test the migration, and then end on one human gate around the migrate command itself. The rule: gate each irreversible step exactly once and let the reversible middle run unattended. Double-approving the same decision is friction with no new information.

Working through something like this? I help teams ship AI and cloud systems that hold up — and cost what they should.