How I Write Software with LLMs: A Practical Multi-Agent Workflow
Start with the Right Goal
A lot of us learned programming because we liked the craft. But in production engineering, the real goal is rarely “write beautiful code.” The goal is to ship useful systems that stay reliable under change.
LLMs change the leverage point. If the model can produce syntactically correct code quickly, your value shifts upward:
- defining the right problem
- setting constraints and tradeoffs
- choosing architecture
- catching product and operational failures early
You stop being a code typist and become a systems editor.
Why This Works Better Than One Big Agent
The core workflow uses three roles:
- architect: turns intent into an implementation plan
- developer: executes against that plan
- reviewers: independently critique plan-vs-diff quality
This split works for three concrete reasons.
- You pay premium model costs where reasoning matters most, not on every token of implementation.
- Independent reviewers catch different classes of mistakes.
- Capability boundaries become explicit (read-only reviewers, write-enabled implementer, etc.).
Running one model end-to-end can produce velocity, but it also tends to hide mistakes until late. Role separation gives you deliberate friction in the right places.
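The role split above can be made concrete as configuration. This is a minimal sketch, not tied to any particular harness; the `AgentRole` shape and the model names are assumptions for illustration. The key property it encodes is the capability boundary: exactly one role may write, everyone else is read-only.

```python
from dataclasses import dataclass

# Hypothetical role configuration. Model identifiers are placeholders,
# not real provider names.
@dataclass(frozen=True)
class AgentRole:
    name: str
    model: str        # which provider/model backs this role
    can_write: bool   # whether the agent may modify the working tree

ROLES = [
    AgentRole("architect", model="premium-reasoning-model", can_write=False),
    AgentRole("developer", model="cheap-fast-model", can_write=True),
    AgentRole("reviewer-a", model="provider-a-model", can_write=False),
    AgentRole("reviewer-b", model="provider-b-model", can_write=False),
]

# Only the implementer holds write access; reviewers critique read-only.
writers = [r.name for r in ROLES if r.can_write]
assert writers == ["developer"]
```

Making the boundary data, rather than convention, is what lets the harness enforce it.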
What “Good Harness” Means in Practice
Your coding harness does not need to be fancy, but it does need two hard requirements:
- support for multiple model providers
- agents that can call each other without manual copy/paste relay
Without multi-provider support, you lose model diversity in review. Without inter-agent calls, you become a human message queue and throughput collapses.
Everything else is secondary: sessions, worktrees, task persistence, and custom tools help, but they are optimizations, not fundamentals.
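The second hard requirement, inter-agent calls, amounts to agents being able to invoke each other through a shared registry instead of a human relaying output between chat windows. A minimal sketch, with illustrative agent names and stub behaviors:

```python
from typing import Callable

# Registry mapping agent names to callables. Real harnesses wrap model
# calls here; stubs stand in for them in this sketch.
AGENTS: dict[str, Callable[[str], str]] = {}

def register(name: str):
    def wrap(fn):
        AGENTS[name] = fn
        return fn
    return wrap

def call(agent: str, message: str) -> str:
    # Any agent can invoke any other by name -- no copy/paste relay.
    return AGENTS[agent](message)

@register("reviewer")
def reviewer(diff: str) -> str:
    return f"review of: {diff}"

@register("developer")
def developer(plan: str) -> str:
    # The developer hands its diff straight to the reviewer.
    return call("reviewer", f"diff for {plan}")
```

Without this, every hand-off passes through you, and you become the message queue the section warns about.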
Architect Phase: Design Before Diff
The architect phase is where reliability is won.
A strong model is used here because the task is not raw code generation; it is design pressure-testing:
- clarify exact behavior
- surface edge cases
- choose implementation boundaries
- lock in non-goals
This phase should feel like a technical design review, not a single prompt.
A practical pattern that works well:
- State a narrow feature objective.
- Let the model ask clarifying questions.
- Push on tradeoffs until the plan is concrete at file/function granularity.
- Require explicit approval text before implementation starts.
That last gate matters. Models are often eager to “start coding” before the plan is fully shaped.
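The approval gate can be enforced mechanically rather than left to discipline. A sketch, assuming a fixed approval phrase (the phrase itself is an arbitrary choice, not a standard):

```python
# The gate checks the human's messages for an explicit approval phrase
# before the developer agent is allowed to start. Phrase is an assumption.
APPROVAL_PHRASE = "APPROVED: implement plan"

def may_implement(human_messages: list[str]) -> bool:
    # Scanning only human messages matters: the model's own eagerness
    # to "start coding" must not satisfy the gate.
    return any(APPROVAL_PHRASE in msg for msg in human_messages)
```

With this in place, "looks good" or a model's self-declared readiness does not open the gate; only the explicit phrase does.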
Developer Phase: Execute with Minimal Ambiguity
The developer agent should be a cheaper, faster model. Its job is to implement the approved plan, not reinterpret product strategy.
A good plan keeps developer variance low:
- target files are named
- expected flow is clear
- interface decisions are already made
- out-of-scope areas are explicit
When the developer finishes, it hands the diff to reviewers.
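A plan that keeps developer variance low has a recognizable shape. The schema below mirrors the bullets above; the field names and example values are illustrative assumptions, not a real harness API:

```python
from dataclasses import dataclass, field

# Illustrative plan schema: each field corresponds to one of the
# low-variance properties listed above.
@dataclass
class Plan:
    objective: str
    target_files: list[str]          # target files are named
    flow: str                        # expected flow is clear
    interfaces: dict[str, str]       # interface decisions already made
    out_of_scope: list[str] = field(default_factory=list)

plan = Plan(
    objective="add email support",
    target_files=["channels/email.py", "config.py"],
    flow="webhook -> parse -> route -> SMTP reply",
    interfaces={"EmailChannel.send": "(to: str, body_markdown: str) -> None"},
    out_of_scope=["calendar invites", "OAuth mailbox sync"],
)
```

If the architect cannot fill every field, the plan is not yet concrete enough to hand off.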
Reviewer Phase: Independent Critique, Not Rubber Stamp
Reviewer agents inspect two artifacts together:
- the approved plan
- the implementation diff
This prevents shallow feedback. The question is not “is this code plausible?” The question is “did we implement the intended architecture safely and cleanly?”
Different models catch different defects. In practice this often means one reviewer catches correctness bugs, another catches overengineering, and another catches security or UX traps.
If reviewers agree, changes are integrated. If they conflict, escalate to architect arbitration.
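The integrate/escalate rule is simple enough to state as code. A sketch, where the verdict vocabulary ("approve" / "request-changes") is an assumption:

```python
# Decide the next step from independent reviewer verdicts:
# unanimous approval integrates, unanimous objection revises,
# and any disagreement escalates to the architect.
def next_step(verdicts: list[str]) -> str:
    if verdicts and all(v == "approve" for v in verdicts):
        return "integrate"
    if verdicts and all(v == "request-changes" for v in verdicts):
        return "revise"
    return "escalate-to-architect"
```

The point of routing conflicts upward is that disagreement between reviewers usually signals a design question, not a code question.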
Real Session Anatomy: Email Support in One Feature Cycle
The most useful part of the original story is a full real-world session: adding email support to an existing assistant.
The session follows a repeatable arc.
1) High-level intent
The feature starts broad: “add email support.” The model responds with a structured decision tree:
- inbound channel design (webhook vs polling vs SMTP receiver)
- outbound transport (SMTP vs API)
- threading semantics
- attachments and HTML handling
- trust and authentication at public webhook boundaries
2) Constraint shaping
The human chooses direction:
- webhook inbound
- SMTP outbound
- in-process channel
- markdown conversion
- attachment support
This is where architecture becomes yours, not generic model output.
3) Detailed plan + implementation
The architect creates task-level steps, then delegates implementation. The implementation includes channel wiring, parsing, allowlist updates, config updates, and tests.
4) QA uncovers reality gaps
After initial delivery, QA finds a routing bug. The system drops owner emails due to missing owner identity wiring in one path.
This is important: the initial implementation looked complete, tests were green, and the bug still existed. Real QA loops are non-negotiable.
5) Refactor for bug-class elimination
A second pass identifies a structural issue: channel handling is hardcoded in multiple places. Fixing one bug is not enough; the fix is consolidating channel lists to reduce future omission risk.
6) Product nuance and security hardening
Email wildcard behavior is added for practical routing (*@domain.com, user+*@domain.com) with careful matching rules so wildcards cannot cross @ boundaries and accidentally authorize crafted addresses.
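The matching rule described above can be sketched with a wildcard-to-regex translation where `*` is forbidden from crossing the `@` boundary. This is an illustration of the constraint, not the original implementation; the pattern syntax is an assumption:

```python
import re

def pattern_to_regex(pattern: str) -> re.Pattern:
    """Compile an allowlist pattern like '*@domain.com' into a regex."""
    local, _, domain = pattern.partition("@")
    def part(p: str) -> str:
        # "*" matches anything EXCEPT "@", so a crafted address cannot
        # smuggle an extra "@" through the wildcard to get authorized.
        return "".join("[^@]*" if ch == "*" else re.escape(ch) for ch in p)
    return re.compile(rf"^{part(local)}@{part(domain)}$")

def allowed(address: str, patterns: list[str]) -> bool:
    return any(pattern_to_regex(p).match(address) for p in patterns)
```

Anchoring with `^`/`$` and escaping every literal character matter as much as the `[^@]*` restriction: without them, dots in the domain become regex wildcards of their own.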
That final part is exactly what mature AI-assisted development looks like: not “generate code,” but repeated cycles of behavior validation, threat modeling, and tightening.
The Biggest Failure Mode
The workflow fails when you do not understand the underlying stack well enough to steer architecture.
In that state, you can still get rapid output, but you lose correction authority. Bad decisions stack, patches become brittle, and each “fix” digs deeper.
You can usually detect this early when sessions become repetitive:
- “I know why it broke”
- another patch lands
- the system regresses elsewhere
When this happens, slow down and re-enter architect mode. Rebuild a clean plan, narrow scope, and restore control.
A Practical Blueprint You Can Adopt Tomorrow
If you want to apply this model immediately, use this setup:
- One strong planning model (architect).
- One cost-efficient implementation model (developer).
- Two independent review models (reviewers).
- Explicit approval gate before any implementation.
- Mandatory QA cycle on real behavior, not just tests.
- Escalation rule when reviewer feedback conflicts.
You do not need a perfect stack to start. You need role clarity and discipline.
Final Takeaway
The most useful mental shift is this:
- LLMs are not replacing engineering judgment.
- They are amplifying whatever process you already have.
If your process is fuzzy, LLMs scale confusion. If your process is explicit, LLMs scale output.
The teams that win with AI coding are not the ones with the cleverest prompts. They are the ones with the clearest architecture, fastest feedback loops, and strict quality gates.