GPT-5.3-Codex-Spark: OpenAI's Bet on Real-Time AI Coding Hits 1,000 Tokens Per Second
OpenAI has released a research preview of GPT-5.3-Codex-Spark, a smaller, speed-optimized variant of GPT-5.3-Codex and the company's first model designed specifically for real-time coding. The headline claim: over 1,000 tokens per second, achieved by running on Cerebras Wafer-Scale Engine 3 (WSE-3) hardware instead of traditional NVIDIA GPUs.
This is the first tangible result of the OpenAI-Cerebras partnership announced in January 2026, and it signals a clear strategic shift—speed as a first-class feature for coding models, not just an afterthought.
Why Speed Matters for Coding
The argument for Codex-Spark is straightforward: when a model responds fast enough, you can stay in a flow state. Instead of context-switching while waiting for a 15-minute agentic run to complete, you get near-instant feedback that enables rapid iteration.
This is a different design philosophy from the larger GPT-5.3-Codex, which prioritizes thoroughness and accuracy over latency. Spark doesn’t replace the flagship—it complements it by targeting a different workflow: real-time collaboration rather than long-horizon autonomous execution.
OpenAI envisions this as the beginning of a dual-mode Codex system:
- GPT-5.3-Codex: Longer-horizon reasoning and execution for complex, multi-step tasks
- GPT-5.3-Codex-Spark: Real-time collaboration for rapid iteration and interactive development
The Cerebras Hardware Advantage
Codex-Spark is OpenAI’s first model to run on the Cerebras WSE-3, a wafer-scale chip featuring more than 4 trillion transistors on what’s been described as a “dinner plate-sized piece of silicon.” The architecture eliminates data bottlenecks by using wafer-scale memory, enabling the extreme throughput numbers.
The infrastructure improvements go beyond the chip itself:
- 80% reduction in client-server roundtrip overhead
- 30% reduction in per-token overhead
- 50% reduction in time-to-first-token
- Persistent WebSocket connections enabled by default
These optimizations collectively make the model feel near-instant in practice, not just on paper.
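To see how those reductions compound, here is a toy latency model for one streamed completion. Only the percentage reductions come from the announcement; the baseline figures (200 ms roundtrip, 600 ms time-to-first-token, 2 ms per-token overhead) are illustrative assumptions, not published numbers.

```python
# Toy model: total wall-clock time for one streamed completion.
# Baseline numbers are assumed for illustration; the 80%/50%/30%
# reductions are the ones OpenAI reports.

def completion_latency_ms(roundtrip_ms, ttft_ms, per_token_ms, n_tokens):
    """Roundtrip setup + time-to-first-token + per-token overhead."""
    return roundtrip_ms + ttft_ms + per_token_ms * n_tokens

# Assumed baseline for a 500-token interactive response.
before = completion_latency_ms(200, 600, 2.0, 500)

# Announced reductions: -80% roundtrip, -50% TTFT, -30% per-token.
after = completion_latency_ms(200 * 0.2, 600 * 0.5, 2.0 * 0.7, 500)

print(f"before: {before:.0f} ms, after: {after:.0f} ms")
```

Under these assumptions the overhead drops from 1.8 s to about 1.0 s, and the gains would be proportionally larger for short responses where fixed costs dominate.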
Benchmarks: The Speed-Accuracy Trade-off
Codex-Spark is honest about what it is: a smaller model optimized for speed, not a flagship killer. The benchmark numbers reflect this trade-off clearly.
Terminal-Bench 2.0
| Model | Score |
|---|---|
| GPT-5.3-Codex | 77.3% |
| GPT-5.3-Codex-Spark | 58.4% |
| GPT-5.1-Codex-mini | 46.1% |
Spark scores roughly 19 points below the flagship on Terminal-Bench 2.0, the benchmark measuring agentic terminal-based coding. That’s a meaningful gap—but Spark completes tasks in a fraction of the time.
SWE-Bench Pro
On SWE-Bench Pro, the story is more interesting. Codex-Spark reportedly achieves similar accuracy to the flagship, but completes tasks in 2-3 minutes compared to 15-17 minutes for GPT-5.3-Codex. For tasks where the smaller model is capable enough, you’re getting roughly equivalent results 5-8x faster.
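The 5-8x figure follows directly from the reported wall-clock ranges:

```python
# Back-of-the-envelope check on the reported speedup range:
# 2-3 minute Spark runs vs. 15-17 minute flagship runs.

spark_minutes = (2, 3)
flagship_minutes = (15, 17)

slowest_speedup = flagship_minutes[0] / spark_minutes[1]  # 15 / 3
fastest_speedup = flagship_minutes[1] / spark_minutes[0]  # 17 / 2

print(f"speedup range: {slowest_speedup:.1f}x to {fastest_speedup:.1f}x")
```

Worst case against worst case gives 5.0x; best against best gives 8.5x, consistent with the "roughly 5-8x" framing.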
Where Spark Fits
The benchmarks suggest a clear division of labor:
- Routine coding tasks (bug fixes, small features, refactoring): Spark handles these at near-instant speed with sufficient accuracy
- Complex multi-file architecture changes: The flagship GPT-5.3-Codex remains the better choice
- Interactive debugging and iteration: Spark’s speed makes it ideal for rapid back-and-forth
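That division of labor lends itself to a simple routing heuristic. The sketch below is hypothetical: the `Task` type and its thresholds are invented for illustration, not part of any OpenAI tooling.

```python
# Hypothetical model router mirroring the division of labor above.
# The Task shape and the file-count threshold are assumptions.

from dataclasses import dataclass

@dataclass
class Task:
    files_touched: int
    interactive: bool      # user is iterating live
    architectural: bool    # cross-cutting design change

def pick_model(task: Task) -> str:
    if task.architectural or task.files_touched > 5:
        return "gpt-5.3-codex"          # thoroughness over latency
    if task.interactive or task.files_touched <= 2:
        return "gpt-5.3-codex-spark"    # speed for tight feedback loops
    return "gpt-5.3-codex"              # default to the safer choice

print(pick_model(Task(files_touched=1, interactive=True, architectural=False)))
```

In practice, a router like this would err toward the flagship whenever task scope is unclear, since a wrong answer costs more than a slow one.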
How GPT-5.3-Codex-Spark Compares to the Competition
The broader competitive picture is worth noting. GPT-5.3-Codex (the flagship) currently leads Terminal-Bench 2.0 at 77.3%, surpassing Claude Opus 4.6 by roughly 5 percentage points. On SWE-Bench Pro, it scores 56.8% versus 56.4% for GPT-5.2-Codex.
Spark doesn’t compete with these flagships on raw accuracy. Instead, it occupies a new category: ultra-fast coding models where responsiveness is the primary value proposition. As models across the industry converge on similar capability levels, speed and developer experience become key differentiators.
The broader GPT-5.3-Codex family also showed a massive jump on OSWorld-Verified, from 38.2% (GPT-5.2-Codex) to 64.7%, a 26.5 percentage point improvement that signals growing capability in real-world computer use tasks.
Technical Specifications
- Context window: 128k tokens
- Modality: Text-only (at launch)
- Speed: 1,000+ tokens per second
- Compared to flagship: 15x faster throughput
- Behavior: Makes minimal, targeted edits by default; doesn’t auto-run tests unless instructed
Codex-Spark is the first in what OpenAI calls a family of ultra-fast models. The roadmap includes larger model variants, longer context windows, and multimodal input support.
Self-Bootstrapping Development
One notable detail from the announcement: early versions of GPT-5.3-Codex-Spark were instrumental in building the model itself. OpenAI used earlier iterations to debug training code, manage deployment infrastructure, diagnose failing tests, and run evaluations. This kind of recursive self-improvement in the development pipeline is becoming more common across labs, but it's still worth noting as a sign of where AI-assisted AI development is heading.
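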
Safety Evaluation
OpenAI evaluated Codex-Spark through their standard deployment process and determined it does not reach their Preparedness Framework threshold for high capability in cybersecurity or biology. The model includes the same safety training as OpenAI’s mainline models with additional cyber-related safeguards.
Availability
Codex-Spark is rolling out as a research preview for ChatGPT Pro subscribers across:
- Codex app (latest version)
- CLI: `codex --model gpt-5.3-codex-spark`
- VS Code extension
During the preview period, Spark operates under separate rate limits that don’t count toward standard ChatGPT usage limits. Peak demand may result in queuing. API access is coming soon, though pricing has not been announced.
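Since queuing is expected at peak demand during the preview, clients will likely want a retry loop. The pattern below is a generic jittered exponential backoff, not an official client; `send_request` is a stand-in for whatever call your integration makes, and `TimeoutError` stands in for a queued or rate-limited response.

```python
# Generic retry-with-backoff pattern for queued requests.
# Nothing here is an official OpenAI client; send_request is a stand-in.

import random
import time

def with_backoff(send_request, max_attempts=5, base_delay=1.0):
    """Retry a queued/rate-limited call with jittered exponential backoff."""
    for attempt in range(max_attempts):
        try:
            return send_request()
        except TimeoutError:  # stand-in for a "queued, try again" signal
            if attempt == max_attempts - 1:
                raise
            # Exponential backoff with jitter to avoid thundering herds.
            delay = base_delay * (2 ** attempt) * (0.5 + random.random() / 2)
            time.sleep(delay)
```

The jitter term matters: if many clients retry on the same schedule after a queue event, synchronized retries can re-create the spike that caused the queuing.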
What This Means
Codex-Spark is an interesting strategic move. Rather than chasing the next benchmark record, OpenAI is exploring a different axis of improvement: making AI coding assistants fast enough that they feel like a natural extension of your thought process rather than a tool you invoke and wait for.
The Cerebras partnership is key here. By moving to purpose-built inference hardware, OpenAI is decoupling from the GPU bottleneck that constrains most model serving. If the approach scales, it could fundamentally change how fast AI coding tools operate across the industry.
The trade-off is real—Spark isn’t as capable as the flagship for complex tasks. But for the majority of day-to-day coding interactions where speed matters more than maximum capability, that trade-off may be exactly right.
Learn More
- Official announcement: openai.com/index/introducing-gpt-5-3-codex-spark
- GPT-5.3-Codex: openai.com/index/introducing-gpt-5-3-codex
- Cerebras partnership: openai.com
- Codex CLI: available via `codex --model gpt-5.3-codex-spark`