How I Write Software with LLMs: A Practical Multi-Agent Workflow
Start with the Right Goal
A lot of us learned programming because we liked the craft. But in production engineering, the real goal is rarely “write beautiful code.” The goal is to ship useful systems that stay reliable under change.
LLMs change the leverage point. If the model can produce syntactically correct code quickly, your value shifts upward:
- defining the right problem
- setting constraints and tradeoffs
- choosing architecture
- catching product and operational failures early
You stop being a code typist and become a systems editor.
Why This Works Better Than One Big Agent
The core workflow uses three roles:
- architect: turns intent into an implementation plan
- developer: executes against that plan
- reviewers: independently critique plan-vs-diff quality
This split works for three concrete reasons.
- You pay premium model costs where reasoning matters most, not on every token of implementation.
- Independent reviewers catch different classes of mistakes.
- Capability boundaries become explicit (read-only reviewers, write-enabled implementer, etc.).
Running one model end-to-end can produce velocity, but it also tends to hide mistakes until late. Role separation gives you deliberate friction in the right places.
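The role split above can be made concrete as configuration. This is a minimal sketch, not tied to any particular harness; the `AgentRole` shape and the model names are assumptions for illustration. The key property it encodes is the capability boundary: exactly one role may write, everyone else is read-only.

```python
from dataclasses import dataclass

# Hypothetical role configuration. Model identifiers are placeholders,
# not real provider names.
@dataclass(frozen=True)
class AgentRole:
    name: str
    model: str        # which provider/model backs this role
    can_write: bool   # whether the agent may modify the working tree

ROLES = [
    AgentRole("architect", model="premium-reasoning-model", can_write=False),
    AgentRole("developer", model="cheap-fast-model", can_write=True),
    AgentRole("reviewer-a", model="provider-a-model", can_write=False),
    AgentRole("reviewer-b", model="provider-b-model", can_write=False),
]

# Only the implementer holds write access; reviewers critique read-only.
writers = [r.name for r in ROLES if r.can_write]
assert writers == ["developer"]
```

Making the boundary data, rather than convention, is what lets the harness enforce it.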
What “Good Harness” Means in Practice
Your coding harness does not need to be fancy, but it does need two hard requirements:
- support for multiple model providers
- agents that can call each other without manual copy/paste relay
Without multi-provider support, you lose model diversity in review. Without inter-agent calls, you become a human message queue and throughput collapses.
Everything else is secondary: sessions, worktrees, task persistence, and custom tools help, but they are optimizations, not fundamentals.
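The second hard requirement, inter-agent calls, amounts to agents being able to invoke each other through a shared registry instead of a human relaying output between chat windows. A minimal sketch, with illustrative agent names and stub behaviors:

```python
from typing import Callable

# Registry mapping agent names to callables. Real harnesses wrap model
# calls here; stubs stand in for them in this sketch.
AGENTS: dict[str, Callable[[str], str]] = {}

def register(name: str):
    def wrap(fn):
        AGENTS[name] = fn
        return fn
    return wrap

def call(agent: str, message: str) -> str:
    # Any agent can invoke any other by name -- no copy/paste relay.
    return AGENTS[agent](message)

@register("reviewer")
def reviewer(diff: str) -> str:
    return f"review of: {diff}"

@register("developer")
def developer(plan: str) -> str:
    # The developer hands its diff straight to the reviewer.
    return call("reviewer", f"diff for {plan}")
```

Without this, every hand-off passes through you, and you become the message queue the section warns about.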
Architect Phase: Design Before Diff
The architect phase is where reliability is won.
A strong model is used here because the task is not raw code generation; it is design pressure-testing:
- clarify exact behavior
- surface edge cases
- choose implementation boundaries
- lock in non-goals
This phase should feel like a technical design review, not a single prompt.
A practical pattern that works well:
- State a narrow feature objective.
- Let the model ask clarifying questions.
- Push on tradeoffs until the plan is concrete at file/function granularity.
- Require explicit approval text before implementation starts.
That last gate matters. Models are often eager to “start coding” before the plan is fully shaped.
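The approval gate can be enforced mechanically rather than left to discipline. A sketch, assuming a fixed approval phrase (the phrase itself is an arbitrary choice, not a standard):

```python
# The gate checks the human's messages for an explicit approval phrase
# before the developer agent is allowed to start. Phrase is an assumption.
APPROVAL_PHRASE = "APPROVED: implement plan"

def may_implement(human_messages: list[str]) -> bool:
    # Scanning only human messages matters: the model's own eagerness
    # to "start coding" must not satisfy the gate.
    return any(APPROVAL_PHRASE in msg for msg in human_messages)
```

With this in place, "looks good" or a model's self-declared readiness does not open the gate; only the explicit phrase does.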
Developer Phase: Execute with Minimal Ambiguity
The developer agent should be a cheaper, faster model. Its job is to implement the approved plan, not reinterpret product strategy.
A good plan keeps developer variance low:
- target files are named
- expected flow is clear
- interface decisions are already made
- out-of-scope areas are explicit
When the developer finishes, it hands the diff to reviewers.
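A plan that keeps developer variance low has a recognizable shape. The schema below mirrors the bullets above; the field names and example values are illustrative assumptions, not a real harness API:

```python
from dataclasses import dataclass, field

# Illustrative plan schema: each field corresponds to one of the
# low-variance properties listed above.
@dataclass
class Plan:
    objective: str
    target_files: list[str]          # target files are named
    flow: str                        # expected flow is clear
    interfaces: dict[str, str]       # interface decisions already made
    out_of_scope: list[str] = field(default_factory=list)

plan = Plan(
    objective="add email support",
    target_files=["channels/email.py", "config.py"],
    flow="webhook -> parse -> route -> SMTP reply",
    interfaces={"EmailChannel.send": "(to: str, body_markdown: str) -> None"},
    out_of_scope=["calendar invites", "OAuth mailbox sync"],
)
```

If the architect cannot fill every field, the plan is not yet concrete enough to hand off.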
Reviewer Phase: Independent Critique, Not Rubber Stamp
Reviewer agents inspect two artifacts together:
- the approved plan
- the implementation diff
This prevents shallow feedback. The question is not “is this code plausible?” The question is “did we implement the intended architecture safely and cleanly?”
Different models catch different defects. In practice this often means one reviewer catches correctness bugs, another catches overengineering, and another catches security or UX traps.
If reviewers agree, changes are integrated. If they conflict, escalate to architect arbitration.
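The integrate/escalate rule is simple enough to state as code. A sketch, where the verdict vocabulary ("approve" / "request-changes") is an assumption:

```python
# Decide the next step from independent reviewer verdicts:
# unanimous approval integrates, unanimous objection revises,
# and any disagreement escalates to the architect.
def next_step(verdicts: list[str]) -> str:
    if verdicts and all(v == "approve" for v in verdicts):
        return "integrate"
    if verdicts and all(v == "request-changes" for v in verdicts):
        return "revise"
    return "escalate-to-architect"
```

The point of routing conflicts upward is that disagreement between reviewers usually signals a design question, not a code question.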
Real Session Anatomy: Email Support in One Feature Cycle
The most useful part of the original story is a full real-world session: adding email support to an existing assistant.
The session follows a repeatable arc.
1) High-level intent
The feature starts broad: “add email support.” The model responds with a structured decision tree:
- inbound channel design (webhook vs polling vs SMTP receiver)
- outbound transport (SMTP vs API)
- threading semantics
- attachments and HTML handling
- trust and authentication at public webhook boundaries
2) Constraint shaping
The human chooses direction:
- webhook inbound
- SMTP outbound
- in-process channel
- markdown conversion
- attachment support
This is where architecture becomes yours, not generic model output.
3) Detailed plan + implementation
The architect creates task-level steps, then delegates implementation. The implementation includes channel wiring, parsing, allowlist updates, config updates, and tests.
4) QA uncovers reality gaps
After initial delivery, QA finds a routing bug. The system drops owner emails due to missing owner identity wiring in one path.
This is important: the initial implementation looked complete, tests were green, and the bug still existed. Real QA loops are non-negotiable.
5) Refactor for bug-class elimination
A second pass identifies a structural issue: channel handling is hardcoded in multiple places. Fixing one bug is not enough; the fix is consolidating channel lists to reduce future omission risk.
6) Product nuance and security hardening
Email wildcard behavior is added for practical routing (*@domain.com, user+*@domain.com) with careful matching rules so wildcards cannot cross @ boundaries and accidentally authorize crafted addresses.
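The matching rule described above can be sketched with a wildcard-to-regex translation where `*` is forbidden from crossing the `@` boundary. This is an illustration of the constraint, not the original implementation; the pattern syntax is an assumption:

```python
import re

def pattern_to_regex(pattern: str) -> re.Pattern:
    """Compile an allowlist pattern like '*@domain.com' into a regex."""
    local, _, domain = pattern.partition("@")
    def part(p: str) -> str:
        # "*" matches anything EXCEPT "@", so a crafted address cannot
        # smuggle an extra "@" through the wildcard to get authorized.
        return "".join("[^@]*" if ch == "*" else re.escape(ch) for ch in p)
    return re.compile(rf"^{part(local)}@{part(domain)}$")

def allowed(address: str, patterns: list[str]) -> bool:
    return any(pattern_to_regex(p).match(address) for p in patterns)
```

Anchoring with `^`/`$` and escaping every literal character matter as much as the `[^@]*` restriction: without them, dots in the domain become regex wildcards of their own.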
That final part is exactly what mature AI-assisted development looks like: not “generate code,” but repeated cycles of behavior validation, threat modeling, and tightening.
The Biggest Failure Mode
The workflow fails when you do not understand the underlying stack well enough to steer architecture.
In that state, you can still get rapid output, but you lose correction authority. Bad decisions stack, patches become brittle, and each “fix” digs deeper.
You can usually detect this early when sessions become repetitive:
- “I know why it broke”
- another patch lands
- the system regresses elsewhere
When this happens, slow down and re-enter architect mode. Rebuild a clean plan, narrow scope, and restore control.
A Practical Blueprint You Can Adopt Tomorrow
If you want to apply this model immediately, use this setup:
- One strong planning model (architect).
- One cost-efficient implementation model (developer).
- Two independent review models (reviewers).
- Explicit approval gate before any implementation.
- Mandatory QA cycle on real behavior, not just tests.
- Escalation rule when reviewer feedback conflicts.
You do not need a perfect stack to start. You need role clarity and discipline.
Final Takeaway
The most useful mental shift is this:
- LLMs are not replacing engineering judgment.
- They are amplifying whatever process you already have.
If your process is fuzzy, LLMs scale confusion. If your process is explicit, LLMs scale output.
The teams that win with AI coding are not the ones with the cleverest prompts. They are the ones with the clearest architecture, fastest feedback loops, and strict quality gates.