Coding · 2026-02-23

OpenClaw + Codex Agent Swarm: The Complete One-Person Dev Team Setup

Author: Elvis (@elvissun)

I don't use Codex or Claude Code directly anymore.

I use OpenClaw as the orchestration layer. My orchestrator agent Zoe spawns sub-agents, writes prompts, picks the right model for each task, monitors progress, and pings me on Telegram when a PR is ready to merge.

4 weeks of data:

  • 94 commits in a single day. My most productive day — I had 3 client calls and never opened my editor. Daily average is around 50 commits.
  • 7 PRs in 30 minutes. Idea to production is fast because coding and validation are almost fully automated.
  • Commits → MRR. I'm using this to build a real B2B SaaS — with founder-led sales, I deliver most feature requests the same day. Speed converts prospects into paying customers.

My git history looks like I hired a dev team. It's just me — I've evolved from managing Claude Code directly to having an OpenClaw agent manage a fleet of Claude Code and Codex agents.

Success rate: The system completes almost all small-to-medium tasks in one shot, no human intervention needed.

Cost: ~$100/month for Claude + ~$90/month for Codex. You can start at $20/month.


Why One AI Can't Do Two Things at Once

The context window is a zero-sum game. You have to choose what goes in.

Fill it with code → no room for business context. Fill it with customer history → no room for the codebase. That's why a two-layer system works: each AI only loads what it needs.

Specialization through context, not through different models.


The Complete 8-Step Workflow

Step 1: Customer Request → Scope with Zoe

After a client call, I talk through the requirements with Zoe. Because all my meeting notes auto-sync to my Obsidian vault, there's nothing extra to explain. Zoe then does three things:

  • Top up credits to unblock the customer immediately — she has admin API access
  • Pull customer config from the production database — read-only prod DB access for existing settings
  • Spawn a Codex agent — with a detailed prompt containing all the context
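The "detailed prompt" in that last bullet can be assembled mechanically from the synced vault. A minimal sketch — the vault path, note name, and prompt filename here are illustrative assumptions, not from the post:

```shell
# Illustrative only: paths and filenames are assumptions.
NOTES_DIR="./vault/meetings"
mkdir -p "$NOTES_DIR"
echo "Customer wants custom templates per workspace." > "$NOTES_DIR/acme-call.md"

PROMPT_FILE="agent-prompt.md"
{
  echo "## Task: custom templates"
  echo "## Meeting context"
  cat "$NOTES_DIR/acme-call.md"     # meeting notes go straight into the prompt
} > "$PROMPT_FILE"
cat "$PROMPT_FILE"
```

Because the notes are already in the vault, building the prompt is concatenation, not explanation.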

Step 2: Spawn the Agent

Each agent gets its own worktree (isolated branch) and tmux session:

# Create worktree + spawn agent
git worktree add ../feat-custom-templates -b feat/custom-templates origin/main
cd ../feat-custom-templates && pnpm install
tmux new-session -d -s "codex-templates" \
  -c "/path/to/worktrees/feat-custom-templates" \
  "$HOME/.codex-agent/run-agent.sh templates gpt-5.3-codex high"
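run-agent.sh itself isn't shown in the post. Here's one plausible reconstruction; the prompt path is an assumption, and the codex invocation is echoed rather than executed so the sketch runs anywhere:

```shell
# Hypothetical reconstruction of run-agent.sh — not the author's actual script.
run_agent() {
  task="$1"; model="$2"; effort="$3"
  prompt_file="$HOME/.codex-agent/prompts/$task.md"   # assumed layout
  # A real wrapper would exec codex; echo the command for illustration:
  echo codex --model "$model" \
    -c "model_reasoning_effort=$effort" \
    --dangerously-bypass-approvals-and-sandbox \
    "$(cat "$prompt_file" 2>/dev/null || echo '<prompt missing>')"
}
run_agent templates gpt-5.3-codex high
```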

Commands to start agents:

# Codex
codex --model gpt-5.3-codex \
  -c "model_reasoning_effort=high" \
  --dangerously-bypass-approvals-and-sandbox \
  "Your prompt here"

# Claude Code
claude --model claude-opus-4.5 \
  --dangerously-skip-permissions \
  -p "Your prompt here"

The key advantage of tmux is mid-task redirection:

# Wrong direction — correct it live:
tmux send-keys -t codex-templates "Stop. Focus on the API layer first, not the UI." Enter

# Needs more context:
tmux send-keys -t codex-templates "The schema is in src/types/template.ts. Use that." Enter

Task state is tracked in .clawdbot/active-tasks.json:

{
  "id": "feat-custom-templates",
  "tmuxSession": "codex-templates",
  "agent": "codex",
  "status": "running",
  "notifyOnComplete": true
}

Step 3: The Monitor Loop

A cron job runs every 10 minutes, reads the JSON registry, and does the following:

  • Is the tmux session still alive?
  • Are there pending PRs on the tracking branch?
  • What's the CI status via gh CLI?
  • Auto-restart failed agents (up to 3 times)
  • Alert only when human intervention is needed

I don't watch terminals. The system tells me when to look.
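A stripped-down sketch of that monitor loop, assuming the registry format shown above. Field extraction uses sed for portability; the tmux liveness check is guarded so the sketch runs even where tmux is absent:

```shell
# Minimal monitor sketch (cron: */10 * * * *). Not the author's actual script.
REGISTRY=".clawdbot/active-tasks.json"
mkdir -p .clawdbot
cat > "$REGISTRY" <<'EOF'
{ "tmuxSession": "codex-templates", "status": "running", "restarts": 0 }
EOF

# Pull the session name out of the registry:
session=$(sed -n 's/.*"tmuxSession": "\([^"]*\)".*/\1/p' "$REGISTRY")

if command -v tmux >/dev/null && tmux has-session -t "$session" 2>/dev/null; then
  echo "alive: $session"
else
  echo "down: $session (would restart, up to 3 times)"
  # A real loop would also run `gh pr checks` here for CI status.
fi
```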

Step 4: Agent Creates the PR

The agent commits, pushes, and opens a PR via gh pr create --fill. Completion criteria:

  • PR created + branch synced to main
  • CI passes (lint, type check, unit tests, E2E)
  • Codex review passes
  • Claude Code review passes
  • Gemini review passes
  • Screenshots included if there are any UI changes
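Those criteria amount to a small completion gate. A hedged sketch — `finish_task` is a hypothetical helper, and the commands are echoed rather than executed so it runs anywhere:

```shell
# Hypothetical completion gate; commands are echoed, not run.
finish_task() {
  branch="$1"
  echo "git push -u origin $branch"
  echo "gh pr create --fill"
  echo "gh pr checks --watch"   # blocks until lint, type check, unit, and E2E finish
}
finish_task feat/custom-templates
```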

Step 5: Automated Code Review

Every PR gets reviewed by three AI models:

  • Codex Reviewer — great at edge cases, logic errors, race conditions. Very low false positive rate.
  • Gemini Code Assist — free, catches security issues and scalability problems. Must-have.
  • Claude Code Reviewer — tends to be overly cautious. Skip unless flagged as critical.

Step 6: Automated Testing

CI pipeline: Lint + TypeScript → unit tests → E2E tests → Playwright tests against preview environment.

New rule: if a PR touches any UI, it must include screenshots or CI fails.
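The screenshot rule is easy to enforce in CI. A sketch under stated assumptions: the file patterns are guesses at what counts as "UI", and the real inputs would come from `git diff --name-only` and `gh pr view` (hardcoded here so the sketch is self-contained):

```shell
# Hypothetical CI step for the screenshot rule; patterns are assumptions.
needs_screenshot() {
  changed="$1"   # in CI: $(git diff --name-only origin/main...HEAD)
  echo "$changed" | grep -qE '\.(tsx|css)$'
}
has_screenshot() {
  body="$1"      # in CI: $(gh pr view --json body -q .body)
  echo "$body" | grep -q '!\['   # markdown image syntax
}

if needs_screenshot "src/components/Button.tsx" && ! has_screenshot "Adds a button"; then
  echo "FAIL: UI change without screenshot"
fi
```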

Step 7: Human Review

This is when I get the Telegram notification: "PR #341 ready for review."

My review takes 5–10 minutes. I merge many PRs without reading the code — the screenshots tell me everything I need to know.

Step 8: Merge

PR merged. A daily cron job cleans up orphaned worktrees and the task registry JSON.
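The cleanup job can be sketched like this; the worktree layout is an assumption, and the destructive commands are echoed as a plan rather than executed so the sketch runs outside a real repo:

```shell
# Hypothetical daily cleanup (e.g. cron: 0 3 * * *); not the author's script.
cleanup_plan() {
  for wt in "$@"; do
    echo "git worktree remove --force $wt"
  done
  echo "git worktree prune"                  # drop stale worktree metadata
  echo "rm -f .clawdbot/active-tasks.json"   # reset the task registry
}
cleanup_plan ../feat-custom-templates ../feat-old-task
```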


The Ralph Loop V2: When Agents Fail

When an agent fails, Zoe doesn't just restart it with the same prompt. She analyzes the failure with full business context:

  • Agent ran out of context? "Focus only on these three files."
  • Agent went in the wrong direction? "Stop. The customer wants X, not Y."
  • Agent needs clarification? "Here's the customer's email and their business context."

Zoe also proactively finds work:

  • Morning: Scan Sentry → find 4 new errors → spawn 4 agents to fix them
  • After meetings: Scan meeting notes → spawn agents for feature requests
  • Evening: Scan git log → spawn Claude Code to update changelog and docs

How to Pick the Right Agent

  • Codex — backend logic, complex bugs, multi-file refactors (90% of usage)
  • Claude Code — frontend, git operations, faster turnaround (~8%)
  • Gemini — design-aware tasks: generate an HTML/CSS spec first, then Claude Code implements (~2%)
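The breakdown above reduces to a routing heuristic. `pick_agent` is a hypothetical helper mirroring those categories, not something from the post:

```shell
# Hypothetical routing heuristic; task labels are assumptions.
pick_agent() {
  case "$1" in
    frontend|git) echo "claude" ;;   # frontend, git ops, faster turnaround
    design)       echo "gemini" ;;   # HTML/CSS spec first, then Claude implements
    *)            echo "codex"  ;;   # backend logic, complex bugs, refactors
  esac
}
pick_agent backend
```

In practice Zoe makes this call with full context; the point is that the default is Codex and the exceptions are narrow.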

How to Set This Up Fast

Copy this entire post into OpenClaw and say: "Implement this agent swarm setup for my codebase."

It'll read the architecture, create the scripts, set up the directory structure, and configure cron monitoring. Done in 10 minutes.


The Unexpected Bottleneck: RAM

Each agent needs its own worktree + node_modules. My 16GB Mac Mini started swapping at 4–5 concurrent agents. I bought a Mac Studio M4 Max with 128GB RAM ($3,500) to support the full setup.

If you're just starting out, 4–5 concurrent agents is already seriously powerful.


Original: https://x.com/elvissun/status/2025920521871716562 | Author: @elvissun | Stats: 👍 10,127 | 👁 3,919,563
