OpenAI GPT-5.4: The Most Capable Model for Professional Work and Autonomous Agents


OpenAI has released GPT-5.4, positioning it as “our most capable and efficient frontier model for professional work.” The release marks a significant consolidation of OpenAI’s model lineup, combining advanced reasoning, frontier coding capabilities, and native computer-use abilities into a single model family. Available in three variants — GPT-5.4, GPT-5.4 Thinking, and GPT-5.4 Pro — the new model takes direct aim at enterprise customers and professional developers.

Three Variants, One Goal: Professional Excellence

GPT-5.4 ships in three distinct configurations, each optimized for different use cases:

GPT-5.4 (Standard)

The base model delivers strong general-purpose performance with improved accuracy and token efficiency. It’s the default for everyday professional tasks — writing, analysis, and code generation.

GPT-5.4 Thinking

The reasoning-focused variant replaces GPT-5.2 Thinking across ChatGPT. When activated, the model outlines its approach before generating a response, allowing users to redirect it mid-process. On OpenAI’s internal investment banking benchmark, performance rose from 43.7% with GPT-5 to 87.3% with GPT-5.4 Thinking, a near doubling that suggests real-world applicability for complex financial analysis.

GPT-5.4 Pro

The highest-performance tier, optimized for demanding workloads that require maximum accuracy and depth. In ChatGPT, it is available only on Pro and Enterprise plans.

Native Computer Use: A First for OpenAI

Perhaps the most significant addition is native computer-use capabilities — a first for any OpenAI model. GPT-5.4 can autonomously operate computers and software, issuing mouse and keyboard commands and navigating desktop environments. This isn’t bolted-on functionality; it’s built into the model from the ground up.

The model set record scores on key computer-use benchmarks:

  • OSWorld-Verified: New state-of-the-art performance
  • WebArena Verified: Best-in-class autonomous web navigation

This capability enables multi-app task automation without requiring developers to build supporting infrastructure. The model can search for and deploy external tools on demand, handling intricate multi-step tasks independently.
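
The general shape of a computer-use loop (observe the screen, let the model pick an action, execute it, repeat) can be sketched as follows. This is not OpenAI’s API: every name here (Action, FakeDesktop, scripted_policy, run_agent) is hypothetical, and the "model" is a scripted stand-in so the example runs on its own without touching a real desktop.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class Action:
    """A hypothetical UI action; real computer-use APIs define their own schema."""
    kind: str          # "click", "type", or "done"
    x: int = 0
    y: int = 0
    text: str = ""


@dataclass
class FakeDesktop:
    """Stand-in environment that records actions instead of driving a real UI."""
    log: list[str] = field(default_factory=list)

    def screenshot(self) -> str:
        return f"screen after {len(self.log)} actions"

    def execute(self, action: Action) -> None:
        if action.kind == "click":
            self.log.append(f"click({action.x}, {action.y})")
        elif action.kind == "type":
            self.log.append(f"type({action.text!r})")
        else:
            raise ValueError(f"unsupported action: {action.kind}")


def scripted_policy(observation: str, step: int) -> Action:
    """Placeholder for a model call: returns a fixed sequence of actions."""
    script = [
        Action("click", x=120, y=48),
        Action("type", text="Q3 report"),
        Action("done"),
    ]
    return script[min(step, len(script) - 1)]


def run_agent(
    env: FakeDesktop,
    policy: Callable[[str, int], Action],
    max_steps: int = 10,
) -> list[str]:
    """Observe, decide, act; stop on 'done' or when the step budget runs out."""
    for step in range(max_steps):
        action = policy(env.screenshot(), step)
        if action.kind == "done":
            break
        env.execute(action)
    return env.log


if __name__ == "__main__":
    print(run_agent(FakeDesktop(), scripted_policy))
```

The max_steps cap is a deliberate design choice in this sketch: an autonomous loop needs a hard budget so that a policy that never signals completion cannot run indefinitely.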

For context, Anthropic introduced computer use with Claude back in October 2024. OpenAI entering this space signals that autonomous computer operation is becoming a standard capability for frontier AI models, not a niche experiment.

1 Million Token Context Window

The API version of GPT-5.4 supports context windows up to 1 million tokens, far larger than its immediate predecessors. This opens the door to processing entire codebases, lengthy legal documents, or extensive financial datasets in a single pass.

For comparison, GPT-5.2 offered 256K tokens and GPT-5.3 Instant offered 400K tokens. The jump to 1M tokens puts OpenAI in direct competition with Google’s Gemini models, which have offered large context windows for some time.
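
To gauge whether a body of text would fit in a single pass, a rough estimate is often enough. The sketch below is illustrative only: the 4-characters-per-token ratio is a common rule of thumb rather than a property of any GPT-5.4 tokenizer, the reserved output budget is an arbitrary assumption, "256K" and "400K" are read as round thousands, and the function names are hypothetical. The window sizes are the figures quoted above.

```python
# Context window sizes as quoted in the article (in tokens).
CONTEXT_WINDOWS = {
    "GPT-5.2": 256_000,
    "GPT-5.3 Instant": 400_000,
    "GPT-5.4": 1_000_000,
}

# Assumption: ~4 characters per token, a rough heuristic for English text.
CHARS_PER_TOKEN = 4


def estimate_tokens(text: str) -> int:
    """Rough token estimate from character count (ceiling division)."""
    return -(-len(text) // CHARS_PER_TOKEN)


def fits(texts: list[str], window: int, reserved_for_output: int = 8_000) -> bool:
    """True if the estimated input plus a reserved output budget fits the window."""
    total = sum(estimate_tokens(t) for t in texts)
    return total + reserved_for_output <= window


if __name__ == "__main__":
    # Stand-in for roughly 2 MB of source files.
    corpus = ["x" * 2_000_000]
    for model, window in CONTEXT_WINDOWS.items():
        print(model, fits(corpus, window))
```

For anything close to the limit, a real tokenizer count should replace the heuristic, since code and non-English text can tokenize very differently from plain English prose.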

Coding: Absorbing GPT-5.3-Codex’s Capabilities

GPT-5.4 is OpenAI’s first mainline reasoning model to incorporate the frontier coding capabilities previously exclusive to GPT-5.3-Codex. This means developers no longer need to choose between a model that reasons well and one that codes well — GPT-5.4 does both.

The model is rolling out across ChatGPT, the API, and Codex (OpenAI’s agentic coding tool). For development teams, this consolidation simplifies model selection and deployment.

Accuracy and Efficiency Improvements

OpenAI claims GPT-5.4 is their most factual and reliable model to date:

  • 33% fewer errors in individual claims compared to GPT-5.2
  • 18% fewer errors across complete responses
  • 47% reduction in total token usage in tool-search configurations, with no loss of accuracy

The token efficiency gains are particularly notable. Per-token costs are higher: input tokens now cost $2.50 per million versus $1.75 for GPT-5.2, an increase of about 43%. Where the reduced token consumption applies, though, a workload can still cost less to run.
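
The trade-off can be checked with a little arithmetic. The sketch below uses only the input prices and the 47% figure quoted above. It assumes, for illustration, that the token reduction applies uniformly to billed input tokens, which the source does not state, and the function names are hypothetical.

```python
# Input prices quoted in the article, in USD per million tokens.
GPT_5_2_INPUT_PRICE = 1.75
GPT_5_4_INPUT_PRICE = 2.50


def break_even_reduction(old_price: float, new_price: float) -> float:
    """Fraction of tokens that must be saved for the new cost to equal the old."""
    return 1.0 - old_price / new_price


def relative_cost(old_price: float, new_price: float, token_reduction: float) -> float:
    """New cost as a fraction of the old cost, given a fractional token reduction."""
    return (new_price / old_price) * (1.0 - token_reduction)


if __name__ == "__main__":
    needed = break_even_reduction(GPT_5_2_INPUT_PRICE, GPT_5_4_INPUT_PRICE)
    ratio = relative_cost(GPT_5_2_INPUT_PRICE, GPT_5_4_INPUT_PRICE, 0.47)
    print(f"Break-even token reduction: {needed:.0%}")
    print(f"Cost relative to GPT-5.2 at 47% fewer tokens: {ratio:.0%}")
```

Under that assumption, a workload needs roughly 30% fewer tokens to break even on input pricing, and a 47% reduction would put input spend at about 76% of the GPT-5.2 figure. Output pricing is not given here, so a full bill may look different.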

Spreadsheets and Data Analysis

GPT-5.4 shows particular strength in coding and data analysis tasks, especially spreadsheet generation. This is a strategically important capability: Microsoft previously added Anthropic’s Claude to Microsoft 365 Copilot, reportedly because Claude outperformed OpenAI’s models in this area. GPT-5.4 appears designed to close that gap.

The model also produces presentations with stronger, more varied aesthetics and improved integration of image generation tools, making it more useful for business-facing deliverables.

Safety: Chain-of-Thought Controllability

OpenAI introduced a new open-source safety evaluation called CoT controllability, which measures whether models can deliberately obfuscate their reasoning to evade monitoring. The key finding: GPT-5.4 Thinking’s ability to control its chain-of-thought is low.

This is a positive result for safety. A model that struggles to disguise its reasoning is easier to monitor and audit, which matters more as models gain agentic capabilities and operate with greater autonomy.

Availability and Migration Timeline

Plan              Access
ChatGPT Plus      GPT-5.4 Thinking
ChatGPT Team      GPT-5.4 Thinking
ChatGPT Pro       GPT-5.4 Thinking + GPT-5.4 Pro
Enterprise        GPT-5.4 Thinking + GPT-5.4 Pro
API               GPT-5.4 (all variants, up to 1M context)

GPT-5.2 Thinking will remain available for three months in the Legacy Models section of the model picker. It will be retired on June 5, 2026. Teams relying on GPT-5.2 should begin planning their migration.
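
For teams tracking the cutoff, the remaining runway is simple date arithmetic. The sketch below takes the June 5, 2026 retirement date from the text; the 30-day warning threshold and the function names are illustrative assumptions, not anything OpenAI has specified.

```python
from datetime import date

# Retirement date for GPT-5.2 Thinking, as stated in the article.
RETIREMENT_DATE = date(2026, 6, 5)


def days_until_retirement(today: date) -> int:
    """Days remaining before retirement; negative once the date has passed."""
    return (RETIREMENT_DATE - today).days


def migration_status(today: date, warn_within_days: int = 30) -> str:
    """Classify urgency using an arbitrary warning threshold (an assumption)."""
    remaining = days_until_retirement(today)
    if remaining < 0:
        return "retired"
    if remaining <= warn_within_days:
        return "urgent"
    return "plan"


if __name__ == "__main__":
    today = date.today()
    print(days_until_retirement(today), migration_status(today))
```

A check like this could run in CI or a scheduled job so the deadline surfaces before it arrives rather than after.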

The Competitive Landscape

GPT-5.4’s release is a direct shot at Anthropic, which has historically held the advantage with enterprise customers. The competition has intensified across several fronts:

  • Computer use: Anthropic pioneered this with Claude; OpenAI now matches it natively
  • Coding: Both companies are pushing the boundaries of AI-assisted development
  • Financial services: Both offer specialized integrations, with Anthropic launching Claude for Financial Services in July 2025
  • Agentic workflows: The race to build reliable autonomous AI agents is accelerating

Mario Rodriguez, GitHub’s Chief Product Officer, praised the model’s logical reasoning and complex workflow execution capabilities — an endorsement that carries weight given GitHub’s central role in the developer ecosystem.

What This Means for Developers

For developers evaluating GPT-5.4, here’s a practical breakdown:

Upgrade if you need:

  • Native computer-use capabilities for automation
  • Large context windows (500K+ tokens) for processing extensive codebases or documents
  • A single model that handles both reasoning and coding without compromises
  • Enterprise-grade accuracy for professional deliverables

Wait if you’re:

  • Happy with GPT-5.3 Instant for lightweight, fast tasks
  • Cost-sensitive and the per-token price increase matters more than efficiency gains
  • Not using agentic workflows or computer-use features

Consider alternatives if:

  • You need the absolute best coding performance (Claude Opus 4.5 still leads on SWE-bench Verified)
  • You’re already deeply integrated into Anthropic’s ecosystem
  • You prefer Anthropic’s safety-first approach and alignment track record

The Bottom Line

GPT-5.4 represents OpenAI’s most cohesive model release in a while. Rather than spreading capabilities across multiple specialized models, they’ve consolidated their best features into a unified family. The native computer-use abilities, 1M token context window, and improved accuracy make it a genuinely compelling option for professional work.

The AI model landscape continues to move at breakneck speed. With Anthropic, Google, and OpenAI all pushing the boundaries, developers have never had better options for integrating AI into their workflows. The real winner here is the developer community — more capable models, better pricing efficiency, and an expanding toolkit for building the next generation of AI-powered applications.
