Drydock: from one coding agent to a 19-agent product team

4 min read · AI · agents · Claude · open source

AI writes the code now. That was supposed to be the hard part. Then you ship one of those features to production and find out the code was maybe 10% of the job — the rest is architecture, tests, a real security pass, CI/CD, observability, docs, compliance, and a hundred unglamorous decisions nobody puts in a demo. A single "do everything" agent will happily skip most of it and hand you a confident draft.

So I built Drydock — an open-source plugin that turns Claude Code into a coordinated 19-agent product team instead of one generalist doing impressions of ten specialists.

One agent is a generalist; production needs specialists

The tempting design is one big agent with a long prompt. It breaks for two reasons.

First, context drift. A single context holding the architecture, the threat model, and the deploy plan at once loses fidelity as the run grows: by hour two, its memory of the API contract from hour one is a lossy summary. It's the same property that makes agents expensive to run. The transcript is the workload, and a long one both costs more and gets less accurate.

Second, generalist output for specialist work. The same model "doing security" in passing is not a security review. Depth comes from a role that does one thing, reads upstream artifacts, and is judged on one thing.

The fix isn't a smarter prompt. It's structure: specialized agents, each with one job, coordinated by an orchestrator, handing verified artifacts to each other.

What Drydock is

You describe what you want in plain English. An orchestrator routes the work through six phases and pauses at three gates for your approval:

DEFINE ─▶ BUILD ─▶ HARDEN ─▶ SHIP ─▶ LAUNCH ─▶ SUSTAIN
  ◆ requirements   ◆ architecture        ◆ production-readiness

Product Manager and Architect scope it; a UX Designer specs the design system. Backend and Frontend Engineers build it. QA, Security, and Code Review harden it. DevOps and SRE ship it. Then Growth, Sales, and Customer Success take it to market. You stay in the strategist's seat and approve at three checkpoints; the agents do the work in between.
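To make the phase-and-gate flow concrete, here is a minimal Python sketch of an orchestrator loop. It is not Drydock's code. `run_pipeline`, `run_phase`, `approve`, and `GATES_AFTER` are hypothetical names, and the placement of each gate after a particular phase is my assumption for illustration; only the six phase names and three gate names come from the description above.

```python
PHASES = ["DEFINE", "BUILD", "HARDEN", "SHIP", "LAUNCH", "SUSTAIN"]

# Assumption: which phase each approval gate follows. The gate names are
# from the post; this placement is illustrative, not a confirmed layout.
GATES_AFTER = {
    "DEFINE": ["requirements", "architecture"],
    "HARDEN": ["production-readiness"],
}


def run_pipeline(run_phase, approve, gates_after=GATES_AFTER):
    """Run phases in order and pause at the first gate that is not approved.

    run_phase(phase) does the agents' work for one phase.
    approve(gate) asks the human and returns True or False.
    Returns (completed_phases, gate_paused_at_or_None).
    """
    completed = []
    for phase in PHASES:
        run_phase(phase)
        completed.append(phase)
        for gate in gates_after.get(phase, []):
            if not approve(gate):
                return completed, gate  # stop here until a human approves
    return completed, None
```

The point of the shape is that approval is a hard stop in the control flow: later phases cannot start while a gate is unapproved.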

The parts most AI tools skip

  • Receipts. Every agent writes a JSON proof of what it produced, and a gate won't open until those artifacts are verified on disk. No "done" without evidence — which is also how you stop an agent from confidently lying about work it didn't do.
  • Re-anchoring. At every phase the orchestrator re-reads the spec from disk instead of trusting its own compressed memory. The same context tax that makes long agent runs expensive also makes them wrong; re-anchoring fights both.
  • A production-readiness gate that doesn't take your word for it. The final gate re-derives test counts, coverage, and performance from the actual JUnit and coverage artifacts. A receipt that claims 90% coverage when the artifact says 60% is a blocking failure, not a pass.
  • Security by default, not as a bolt-on. OWASP 2025 / ASVS / API & LLM Top 10 controls are written at build time and audited in HARDEN — with a VAPT mode for live testing and per-product compliance mapping (SOC 2 / GDPR / HIPAA / PCI-DSS).
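The receipt and production-readiness ideas above can be sketched in a few lines of Python. This is a hedged illustration, not Drydock's implementation: the receipt fields (`artifacts`, `junit_report`, `coverage_report`, `claims`) and the function names are hypothetical, and I am assuming a JUnit XML test report plus a Cobertura-style coverage report that carries a `line-rate` attribute on its root element.

```python
import json
import xml.etree.ElementTree as ET
from pathlib import Path


def junit_totals(path):
    """Re-derive (total, failed) test counts from a JUnit XML report."""
    root = ET.parse(path).getroot()
    # The root is either a single <testsuite> or a <testsuites> wrapper.
    suites = [root] if root.tag == "testsuite" else root.findall("testsuite")
    total = sum(int(s.get("tests", 0)) for s in suites)
    failed = sum(int(s.get("failures", 0)) + int(s.get("errors", 0)) for s in suites)
    return total, failed


def coverage_rate(path):
    """Read overall line coverage (0.0 to 1.0) from a Cobertura-style report."""
    return float(ET.parse(path).getroot().get("line-rate", 0.0))


def check_receipt(receipt_path, workdir, tolerance=0.005):
    """Return a list of blocking problems; an empty list means the gate may open."""
    workdir = Path(workdir)
    # Hypothetical receipt schema: artifacts, junit_report, coverage_report, claims.
    receipt = json.loads(Path(receipt_path).read_text())
    claims = receipt.get("claims", {})
    problems = []

    # 1. Every file the receipt names must actually exist on disk.
    reports = [receipt["junit_report"], receipt["coverage_report"]]
    for rel in receipt.get("artifacts", []) + reports:
        if not (workdir / rel).is_file():
            problems.append(f"missing artifact: {rel}")
    if problems:
        return problems  # nothing trustworthy to re-derive numbers from

    # 2. Claimed numbers must match what the reports themselves say.
    total, failed = junit_totals(workdir / receipt["junit_report"])
    if claims.get("tests") != total:
        problems.append(f"claimed {claims.get('tests')} tests, report has {total}")
    if failed:
        problems.append(f"{failed} failing tests")
    actual = coverage_rate(workdir / receipt["coverage_report"])
    if claims.get("coverage", 0.0) > actual + tolerance:
        problems.append(
            f"claimed coverage {claims['coverage']:.0%}, artifact says {actual:.0%}"
        )
    return problems
```

With a receipt that claims 0.90 coverage next to a report whose `line-rate` is 0.60, `check_receipt` returns one problem and the gate stays closed. The design choice is that the receipt is treated as a claim to check against the artifacts, never as the source of truth.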

What you can actually do with it

Drydock isn't all-or-nothing. Point it at the job in front of you:

  • Build a production SaaS greenfield — idea to deployed.
  • Harden an existing codebase: security + QA + review, then the fixes.
  • Pentest a running target (behind an explicit authorization gate) or map controls to a compliance framework.
  • Ship code you already have — Dockerfiles, CI/CD, IaC, SLOs, runbooks.
  • Design the UX or launch the go-to-market as standalone phases.
  • Or call any single agent directly: /drydock:security-engineer audit my API for OWASP Top 10.

Install it from inside Claude Code — the repo is the marketplace, no clone required:

/plugin marketplace add sundarshahi/drydock
/plugin install drydock@drydock

Before you trust any agent's "production ready," pressure-test the claim yourself. The Production-Readiness Scorecard grades the same dimensions Drydock's final gate enforces — tests, coverage, security, observability, rollback — so you can see exactly what "ready" should mean.

The takeaway

Code generation was the part we already solved. The leverage now is in everything around the code — and that's a coordination problem, not a bigger-model problem. A team of specialized agents with receipts, gates, and re-anchoring ships software you can defend in production, where one mega-agent ships a draft that looks finished. Drydock is that team: open-source, one install away, idea to launch.

Want the pipeline run against something real — or your current stack audited the way Drydock's HARDEN phase would? That's exactly what a production engagement is for.