OpenClaw + Codex Agent Swarm: The Complete One-Person Dev Team Setup
I don't use Codex or Claude Code directly anymore.
I use OpenClaw as the orchestration layer. My orchestrator agent Zoe spawns sub-agents, writes prompts, picks the right model for each task, monitors progress, and pings me on Telegram when a PR is ready to merge.
4 weeks of data:
- 94 commits in a single day. My most productive day — I had 3 client calls and never opened my editor. Daily average is around 50 commits.
- 7 PRs in 30 minutes. Idea to production is fast because coding and validation are almost fully automated.
- Commits → MRR. I'm using this to build a real B2B SaaS — with founder-led sales, I deliver most feature requests the same day. Speed converts prospects into paying customers.
My git history looks like I hired a dev team. It's just me — evolved from managing Claude Code directly to using an OpenClaw agent to manage a fleet of Claude Code and Codex agents.
Success rate: The system completes almost all small-to-medium tasks in one shot, no human intervention needed.
Cost: ~$100/month Claude + $90/month Codex. You can start from $20.
Why One AI Can't Do Two Things at Once
The context window is a zero-sum game. You have to choose what goes in.
Fill it with code → no room for business context. Fill it with customer history → no room for the codebase. That's why a two-layer system works: each AI only loads what it needs.
Specialization through context, not through different models.
The Complete 8-Step Workflow
Step 1: Customer Request → Scope with Zoe
After a client call, I talk through the requirements with Zoe. Because all my meeting notes auto-sync to my Obsidian vault, there's nothing extra to explain. Zoe then does three things:
- Top up credits to unblock the customer immediately — she has admin API access
- Pull customer config from the production database — read-only prod DB access for existing settings
- Spawn a Codex agent — with a detailed prompt containing all the context
Step 2: Spawn the Agent
Each agent gets its own worktree (isolated branch) and tmux session:
```bash
# Create worktree + spawn agent
git worktree add ../feat-custom-templates -b feat/custom-templates origin/main
cd ../feat-custom-templates && pnpm install

tmux new-session -d -s "codex-templates" \
  -c "/path/to/worktrees/feat-custom-templates" \
  "$HOME/.codex-agent/run-agent.sh templates gpt-5.3-codex high"
```
Commands to start agents:
```bash
# Codex
codex --model gpt-5.3-codex \
  -c "model_reasoning_effort=high" \
  --dangerously-bypass-approvals-and-sandbox \
  "Your prompt here"

# Claude Code
claude --model claude-opus-4.5 \
  --dangerously-skip-permissions \
  -p "Your prompt here"
```
The key advantage of tmux is mid-task redirection:
```bash
# Wrong direction — correct it live:
tmux send-keys -t codex-templates "Stop. Focus on the API layer first, not the UI." Enter

# Needs more context:
tmux send-keys -t codex-templates "The schema is in src/types/template.ts. Use that." Enter
```
Task state is tracked in .clawdbot/active-tasks.json:
```json
{
  "id": "feat-custom-templates",
  "tmuxSession": "codex-templates",
  "agent": "codex",
  "status": "running",
  "notifyOnComplete": true
}
```
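The post doesn't show how those entries get written, so here's a minimal sketch of a hypothetical register_task helper run at spawn time. The helper name and the python3 shim are my assumptions; only the JSON fields come from the post.

```shell
#!/usr/bin/env bash
# Hypothetical register_task: append an entry to the task registry at spawn
# time so the monitor loop can find the session later.
set -euo pipefail
export REGISTRY="${REGISTRY:-.clawdbot/active-tasks.json}"
mkdir -p "$(dirname "$REGISTRY")"
[ -s "$REGISTRY" ] || echo "[]" > "$REGISTRY"

register_task() {  # usage: register_task <task-id> <tmux-session> <agent>
  python3 - "$1" "$2" "$3" <<'PY'
import json, os, sys
path = os.environ["REGISTRY"]
tasks = json.load(open(path))
tasks.append({"id": sys.argv[1], "tmuxSession": sys.argv[2],
              "agent": sys.argv[3], "status": "running",
              "notifyOnComplete": True})
json.dump(tasks, open(path, "w"), indent=2)
PY
}

register_task feat-custom-templates codex-templates codex
```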
Step 3: The Monitor Loop
A cron job runs every 10 minutes, reads the JSON registry, and checks:
- Is the tmux session still alive?
- Are there pending PRs on the tracking branch?
- What's the CI status via the gh CLI?
- Auto-restart failed agents (up to 3 times)
- Alert only when human intervention is needed
I don't watch terminals. The system tells me when to look.
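One pass of that loop could look like the sketch below, assuming the registry format from step 2. The demo entry and the alive/dead handling are placeholders; the restart counter and Telegram alert are elided.

```shell
#!/usr/bin/env bash
# Sketch of one 10-minute monitor pass over the task registry.
set -euo pipefail
export REGISTRY="${REGISTRY:-.clawdbot/active-tasks.json}"
mkdir -p "$(dirname "$REGISTRY")"
# Seed a demo entry so the sketch runs standalone (step 2 writes real ones).
[ -s "$REGISTRY" ] || cat > "$REGISTRY" <<'JSON'
[{"id": "feat-custom-templates", "tmuxSession": "codex-templates",
  "agent": "codex", "status": "running", "notifyOnComplete": true}]
JSON

# For every task the registry thinks is running, check its tmux session.
python3 -c '
import json, os
for t in json.load(open(os.environ["REGISTRY"])):
    if t["status"] == "running":
        print(t["tmuxSession"])' |
while read -r s; do
  if tmux has-session -t "$s" 2>/dev/null; then
    echo "alive: $s"   # still working; check PRs and CI status next
  else
    echo "dead: $s"    # restart here (up to 3x), then alert via Telegram
  fi
done
```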
Step 4: Agent Creates the PR
The agent commits, pushes, and opens a PR via gh pr create --fill. Completion criteria:
- PR created + branch synced to main
- CI passes (lint, type check, unit tests, E2E)
- Codex review passes
- Claude Code review passes
- Gemini review passes
- Screenshots included if there are any UI changes
Step 5: Automated Code Review
Every PR gets reviewed by three AI models:
- Codex Reviewer — great at edge cases, logic errors, race conditions. Very low false positive rate.
- Gemini Code Assist — free, catches security issues and scalability problems. Must-have.
- Claude Code Reviewer — tends to be overly cautious. Skip unless flagged as critical.
Step 6: Automated Testing
CI pipeline: Lint + TypeScript → unit tests → E2E tests → Playwright tests against preview environment.
New rule: if a PR touches any UI, it must include screenshots or CI fails.
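A gate like that fits in a small CI script. The file globs and the markdown-image check below are my assumptions, not the author's actual rule; in real CI the inputs would come from `gh pr diff --name-only` and `gh pr view --json body` rather than the demo values.

```shell
#!/usr/bin/env bash
# Hypothetical screenshot gate: fail UI PRs that ship without a screenshot.
set -euo pipefail

needs_screenshot() {  # true if any changed file looks like UI
  printf '%s\n' "$@" | grep -Eq '\.(tsx|css)$|^src/components/'
}

has_screenshot() {    # true if the PR body embeds a markdown image
  printf '%s' "$1" | grep -q '!\['
}

# Demo inputs standing in for the gh CLI calls mentioned above.
CHANGED=("src/components/TemplatePicker.tsx" "src/api/templates.ts")
BODY='Adds custom templates. ![before/after](shot.png)'

if needs_screenshot "${CHANGED[@]}" && ! has_screenshot "$BODY"; then
  echo "FAIL: UI change without screenshots"
  exit 1
fi
echo "PASS"   # the demo PR body does include an image, so this prints PASS
```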
Step 7: Human Review
This is when I get the Telegram notification: "PR #341 ready for review."
My review takes 5–10 minutes. A lot of PRs I merge without reading the code — the screenshots tell me everything I need to know.
Step 8: Merge
PR merged. A daily cron job cleans up orphaned worktrees and the task registry JSON.
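That cleanup job could be sketched like this. The demo registry entries exist only so the sketch runs standalone; `git worktree prune` drops git's bookkeeping for worktree directories that have already been deleted.

```shell
#!/usr/bin/env bash
# Daily cleanup sketch: prune stale worktree records and drop finished
# tasks from the registry (field names follow the JSON in step 2).
set -euo pipefail
export REGISTRY="${REGISTRY:-.clawdbot/active-tasks.json}"
mkdir -p "$(dirname "$REGISTRY")"
[ -s "$REGISTRY" ] || cat > "$REGISTRY" <<'JSON'
[{"id": "feat-custom-templates", "tmuxSession": "codex-templates",
  "agent": "codex", "status": "merged", "notifyOnComplete": true},
 {"id": "fix-login-bug", "tmuxSession": "codex-login",
  "agent": "codex", "status": "running", "notifyOnComplete": true}]
JSON

# Drop git's records of worktrees whose directories were already removed.
git worktree prune 2>/dev/null || true

# Keep only tasks that are still running.
python3 -c '
import json, os
path = os.environ["REGISTRY"]
tasks = [t for t in json.load(open(path)) if t["status"] == "running"]
json.dump(tasks, open(path, "w"), indent=2)'
```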
The Ralph Loop V2: When Agents Fail
When an agent fails, Zoe doesn't just restart it with the same prompt. She analyzes the failure with full business context:
- Agent ran out of context? "Focus only on these three files."
- Agent went in the wrong direction? "Stop. The customer wants X, not Y."
- Agent needs clarification? "Here's the customer's email and their business context."
Zoe also proactively finds work:
- Morning: Scan Sentry → find 4 new errors → spawn 4 agents to fix them
- After meetings: Scan meeting notes → spawn agents for feature requests
- Evening: Scan git log → spawn Claude Code to update changelog and docs
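The morning and evening passes map naturally onto cron. A hypothetical crontab (times and script names are made up; the after-meeting pass is presumably triggered by the notes sync rather than a fixed time):

```
# Hypothetical schedule for the proactive passes
0 8 * * *    $HOME/.codex-agent/scan-sentry.sh        # morning: errors -> fix agents
*/10 * * * * $HOME/.codex-agent/monitor.sh            # the step-3 monitor loop
0 22 * * *   $HOME/.codex-agent/update-changelog.sh   # evening: changelog + docs
```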
How to Pick the Right Agent
| Agent | Best for | Usage share |
|---|---|---|
| Codex | Backend logic, complex bugs, multi-file refactors | 90% |
| Claude Code | Frontend, git operations, faster turnaround | ~8% |
| Gemini | Design-aware tasks — generate HTML/CSS spec first, then Claude Code implements | ~2% |
How to Set This Up Fast
Copy this entire post into OpenClaw and say: "Implement this agent swarm setup for my codebase."
It'll read the architecture, create the scripts, set up the directory structure, and configure cron monitoring. Done in 10 minutes.
The Unexpected Bottleneck: RAM
Each agent needs its own worktree + node_modules. My 16GB Mac Mini started swapping at 4–5 concurrent agents. I bought a Mac Studio M4 Max with 128GB RAM ($3,500) to support the full setup.
If you're just starting out, 4–5 concurrent agents is already seriously powerful.
Original: https://x.com/elvissun/status/2025920521871716562 | Author: @elvissun | Stats: 👍 10,127 | 👁 3,919,563