GPT-5.4 Mini and Nano: OpenAI's Bet on the Subagent Era


OpenAI just released GPT-5.4 mini and GPT-5.4 nano — their “most capable small models yet.” Less than two weeks after launching the GPT-5.4 flagship, and just days after GPT-5.3, the company dropped two more models into the stack. The pace is relentless.

But these are not just cheaper reruns of the big model. They signal a structural shift in how AI systems are being designed: the subagent pattern, where a large model acts as the brain and delegates chunks of work to smaller, faster, cheaper models running in parallel.

What Shipped

GPT-5.4 mini is available in ChatGPT, Codex, and the API. It supports text and image inputs, tool use, function calling, web search, file search, computer use, and a 400k context window.

  • Pricing: $0.75 per 1M input tokens, $4.50 per 1M output tokens
  • SWE-bench Pro: 54.4% (only 3 points behind the full GPT-5.4)
  • OSWorld-Verified: 72.1% (vs. the flagship’s 75.0%)
  • GPQA Diamond: 88.0%
  • Runs 2x faster than GPT-5 mini
  • In ChatGPT, available to Free and Go users via the “Thinking” toggle
  • In Codex, uses only 30% of the GPT-5.4 quota

GPT-5.4 nano is API-only. The smallest, cheapest model in the 5.4 family, built for tasks where speed and cost dominate.

  • Pricing: $0.20 per 1M input tokens, $1.25 per 1M output tokens
  • SWE-Bench Pro: 52.4%
  • OSWorld-Verified: 39.0%
  • Cheaper than Google’s Gemini 3.1 Flash-Lite
  • Recommended for classification, data extraction, ranking, and coding subagents

The Numbers in Context

The benchmark picture is hard to compare cleanly across vendors because everyone tests on slightly different variants. But here is a rough pricing landscape for the “small model” tier as of today:

Model                    Input (per 1M)   Output (per 1M)   Notes
GPT-5.4 mini             $0.75            $4.50             400k context
GPT-5.4 nano             $0.20            $1.25             API-only
Gemini 3 Flash           $0.50            $3.00
Gemini 3.1 Flash-Lite    $0.25            $1.50             1M context, 381 tok/s
Claude Haiku 4.5         $1.00            $5.00

GPT-5.4 nano undercuts every model in this tier on both input and output cost, including Gemini 3.1 Flash-Lite. GPT-5.4 mini slots in between Gemini 3 Flash and Claude Haiku 4.5.
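List prices alone don't settle which model is cheapest for a given workload, since input and output tokens are billed at different rates. A quick sketch of a blended cost-per-million-tokens comparison, using the prices quoted above; the model-name keys and the default 75% input share are illustrative assumptions, not official identifiers:

```python
# Blended cost per 1M tokens for the small-model tier, at the list
# prices quoted in this post. The input_ratio default (75% of tokens
# being input) is an assumption -- adjust for your own workload.
PRICES = {  # model -> (input $/1M tokens, output $/1M tokens)
    "gpt-5.4-mini": (0.75, 4.50),
    "gpt-5.4-nano": (0.20, 1.25),
    "gemini-3-flash": (0.50, 3.00),
    "gemini-3.1-flash-lite": (0.25, 1.50),
    "claude-haiku-4.5": (1.00, 5.00),
}

def blended_cost(model: str, input_ratio: float = 0.75) -> float:
    """Dollars per 1M tokens, weighted by the input/output traffic split."""
    inp, out = PRICES[model]
    return inp * input_ratio + out * (1 - input_ratio)

# Rank the tier from cheapest to most expensive at this traffic mix.
for model in sorted(PRICES, key=blended_cost):
    print(f"{model:24s} ${blended_cost(model):.4f} per 1M tokens")
```

Shifting `input_ratio` changes the ranking in the middle of the table, but at these prices nano stays cheapest at any mix.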

On GPQA Diamond, GPT-5.4 nano reportedly scores 9.8 percentage points higher than Claude Haiku 4.5. Coding is murkier: Anthropic reports Haiku 4.5 at 73.3% on SWE-bench Verified, while OpenAI reports mini and nano on the harder SWE-bench Pro variant, so the numbers are not directly comparable.

The honest read: these models are all converging. The meaningful differentiation is less about raw benchmark points and more about latency, cost, context window, and how well they integrate into agentic workflows.

The Subagent Pattern Is the Real Story

What makes mini and nano interesting is not that they are small. It is what they are small for.

The emerging architecture in AI-powered development looks like this:

  1. A flagship model (GPT-5.4, Claude Opus 4.6, Gemini 3.1 Pro) acts as the orchestrator
  2. It breaks a complex task into subtasks
  3. It delegates those subtasks to smaller, faster models running in parallel
  4. Results flow back to the orchestrator for synthesis
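The four steps above can be sketched in a few lines. Everything here is illustrative: `call_model` is a hypothetical stand-in for a real provider client, and the planning/splitting logic is stubbed; only the shape of the pattern (flagship plans, cheap model fans out in parallel, flagship synthesizes) is the point:

```python
# Minimal sketch of the orchestrator/subagent pattern. `call_model` is a
# hypothetical placeholder for an actual chat-completion API call.
from concurrent.futures import ThreadPoolExecutor

def call_model(model: str, prompt: str) -> str:
    # Placeholder: swap in a real API client here.
    return f"[{model}] {prompt[:40]}"

def orchestrate(task: str) -> str:
    # Step 1-2: the flagship plans and splits the task (split stubbed).
    plan = call_model("gpt-5.4", f"Plan subtasks for: {task}")
    subtasks = [f"{task} :: part {i}" for i in range(3)]

    # Step 3: delegate subtasks to the cheap model, in parallel.
    with ThreadPoolExecutor() as pool:
        results = list(pool.map(
            lambda sub: call_model("gpt-5.4-mini", sub), subtasks))

    # Step 4: results flow back to the flagship for synthesis.
    return call_model("gpt-5.4", plan + "\n" + "\n".join(results))
```

The parallel fan-out is where the economics come from: three mini calls running concurrently cost a fraction of one flagship call and finish in roughly the time of the slowest subtask.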

This is exactly the pattern that Codex now uses natively. The big model plans, the mini model executes simpler coding tasks at 30% of the cost. Nano handles classification, file review, and codebase navigation.

As The New Stack put it, these models are “built for the subagent era.” They are not designed to be used alone. They are designed to be delegated to.

This pattern is everywhere now. Anthropic’s Claude Code delegates to Haiku for exploration tasks. Google’s agent frameworks route between Gemini Pro and Flash. The small model is becoming the worker thread of AI systems.

What This Means for Developers

Cost curves are collapsing. GPT-5.4 nano can describe 76,000 photos for $52, as Simon Willison calculated. Tasks that were prohibitively expensive a year ago are now commodity operations.
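The arithmetic behind that kind of figure is worth internalizing. A sketch at nano's list prices; the per-photo token counts below are assumptions chosen only to illustrate the order of magnitude, not Willison's actual numbers:

```python
# Rough bulk-captioning cost at GPT-5.4 nano list prices
# ($0.20 input / $1.25 output, per 1M tokens). The per-photo token
# counts are illustrative assumptions.
INPUT_PER_M, OUTPUT_PER_M = 0.20, 1.25

def caption_cost(photos: int,
                 in_tokens: int = 2_500,   # assumed image + prompt tokens
                 out_tokens: int = 150) -> float:  # assumed caption length
    """Dollar cost of captioning `photos` images."""
    per_photo = (in_tokens * INPUT_PER_M + out_tokens * OUTPUT_PER_M) / 1e6
    return photos * per_photo

print(f"${caption_cost(76_000):.2f}")  # tens of dollars, not thousands
```

At these assumed token counts the whole 76,000-photo job lands in the low-$50 range, which is the collapse the paragraph above describes.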

The free tier keeps getting better. GPT-5.4 mini in ChatGPT Free means that anyone with a browser now has access to a model that scores 54% on SWE-bench Pro. A year ago, that would have been frontier performance.

Agentic design is becoming the default. If you are building AI-powered tools and still routing every request to a single model, you are overpaying. The playbook is clear: use the biggest model for planning and hard reasoning, delegate everything else to the cheapest model that can handle it.

Last year’s flagship is this year’s free tier. This is the pattern that keeps repeating. GPT-5.4 nano outperforms GPT-5 mini. The rate of capability depreciation in AI models has no precedent in consumer technology. Build systems that can swap models easily.
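One way to make both habits concrete (route by task tier, swap models without touching code) is to put the routing table in a single config dict. Tier names and model IDs here are illustrative:

```python
# Route by task tier through one config dict, so swapping a model is a
# config change rather than a code change. Tier names are illustrative.
ROUTES = {
    "plan": "gpt-5.4",           # hard reasoning stays on the flagship
    "code": "gpt-5.4-mini",      # routine coding subtasks
    "classify": "gpt-5.4-nano",  # cheap commodity work
}

def pick_model(tier: str) -> str:
    """Resolve a task tier to whichever model currently fills that slot."""
    return ROUTES[tier]

# When a cheaper model wins the worker slot, swapping it in is one line:
ROUTES["classify"] = "gemini-3.1-flash-lite"
```

Keeping model identity out of call sites is what makes "last year's flagship is this year's free tier" an opportunity rather than a rewrite.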

The Competitive Landscape

The flagship race in March 2026 is remarkably tight. On SWE-bench Verified, Gemini 3.1 Pro sits at 80.6%, Claude Opus 4.6 at 80.8%, and GPT-5.4 is in the same neighborhood. On Chatbot Arena, Claude Opus 4.6 holds the #1 Elo on both the text and code leaderboards.

But the small model tier is where the real competition is heating up. Google’s Gemini 3.1 Flash-Lite, OpenAI’s GPT-5.4 nano, and Anthropic’s Haiku 4.5 are all fighting for the “worker model” slot in agentic architectures. The winner is whichever model offers the best performance-per-dollar for delegated subtasks.

This is a fundamentally different competition from the flagship race. It is not about who scores highest on a benchmark. It is about who can do reliable commodity work at the lowest cost and lowest latency. A model that is 2% worse but 3x cheaper and 2x faster will win the subagent slot every time.

Looking Forward

The direction is clear. Model providers are no longer just shipping bigger, smarter models. They are shipping model families designed to work together in hierarchical architectures. The big model thinks. The small model does.

For anyone building on top of these models, the implication is straightforward: design your systems with multiple model tiers from the start. Route by task complexity, not by habit. And expect the cost floor to keep dropping.

The most interesting question is not which small model is best today. It is how quickly the orchestration layer — the part that decides what to delegate and to whom — becomes the real differentiator.
