GPT-5.3-Codex-Spark: OpenAI's Bet on Real-Time AI Coding Hits 1,000 Tokens Per Second
OpenAI has released a research preview of GPT-5.3-Codex-Spark, a smaller, speed-optimized variant of GPT-5.3-Codex and the company's first model designed specifically for real-time coding. The headline claim: over 1,000 tokens per second, achieved by running on Cerebras Wafer-Scale Engine 3 (WSE-3) hardware instead of traditional NVIDIA GPUs.
This is the first tangible result of the OpenAI-Cerebras partnership announced in January 2026, and it signals a clear strategic shift—speed as a first-class feature for coding models, not just an afterthought.
Why Speed Matters for Coding
The argument for Codex-Spark is straightforward: when a model responds fast enough, you can stay in a flow state. Instead of context-switching while waiting for a 15-minute agentic run to complete, you get near-instant feedback that enables rapid iteration.
This is a different design philosophy from the larger GPT-5.3-Codex, which prioritizes thoroughness and accuracy over latency. Spark doesn’t replace the flagship—it complements it by targeting a different workflow: real-time collaboration rather than long-horizon autonomous execution.
OpenAI envisions this as the beginning of a dual-mode Codex system:
- GPT-5.3-Codex: Longer-horizon reasoning and execution for complex, multi-step tasks
- GPT-5.3-Codex-Spark: Real-time collaboration for rapid iteration and interactive development
The Cerebras Hardware Advantage
Codex-Spark is OpenAI’s first model to run on the Cerebras WSE-3, a wafer-scale chip featuring more than 4 trillion transistors on what’s been described as a “dinner plate-sized piece of silicon.” The architecture eliminates data bottlenecks by using wafer-scale memory, enabling the extreme throughput numbers.
The infrastructure improvements go beyond the chip itself:
- 80% reduction in client-server roundtrip overhead
- 30% reduction in per-token overhead
- 50% reduction in time-to-first-token
- Persistent WebSocket connections enabled by default
These optimizations collectively make the model feel near-instant in practice, not just on paper.
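To see how those reductions compound, here is a toy latency model for one streamed completion. Only the percentage reductions come from the announcement; the baseline figures (200 ms roundtrip, 600 ms time-to-first-token, 2 ms per-token overhead) are illustrative assumptions, not published numbers.

```python
# Toy model: total wall-clock time for one streamed completion.
# Baseline numbers are assumed for illustration; the 80%/50%/30%
# reductions are the ones OpenAI reports.

def completion_latency_ms(roundtrip_ms, ttft_ms, per_token_ms, n_tokens):
    """Roundtrip setup + time-to-first-token + per-token overhead."""
    return roundtrip_ms + ttft_ms + per_token_ms * n_tokens

# Assumed baseline for a 500-token interactive response.
before = completion_latency_ms(200, 600, 2.0, 500)

# Announced reductions: -80% roundtrip, -50% TTFT, -30% per-token.
after = completion_latency_ms(200 * 0.2, 600 * 0.5, 2.0 * 0.7, 500)

print(f"before: {before:.0f} ms, after: {after:.0f} ms")
```

Under these assumptions the overhead drops from 1.8 s to about 1.0 s, and the gains would be proportionally larger for short responses where fixed costs dominate.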
Benchmarks: The Speed-Accuracy Trade-off
Codex-Spark is honest about what it is: a smaller model optimized for speed, not a flagship killer. The benchmark numbers reflect this trade-off clearly.
Terminal-Bench 2.0
| Model | Score |
|---|---|
| GPT-5.3-Codex | 77.3% |
| GPT-5.3-Codex-Spark | 58.4% |
| GPT-5.1-Codex-mini | 46.1% |
Spark scores roughly 19 points below the flagship on Terminal-Bench 2.0, the benchmark measuring agentic terminal-based coding. That’s a meaningful gap—but Spark completes tasks in a fraction of the time.
SWE-Bench Pro
On SWE-Bench Pro, the story is more interesting. Codex-Spark reportedly achieves similar accuracy to the flagship, but completes tasks in 2-3 minutes compared to 15-17 minutes for GPT-5.3-Codex. For tasks where the smaller model is capable enough, you’re getting roughly equivalent results 5-8x faster.
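The 5-8x figure follows directly from the reported wall-clock ranges:

```python
# Back-of-the-envelope check on the reported speedup range:
# 2-3 minute Spark runs vs. 15-17 minute flagship runs.

spark_minutes = (2, 3)
flagship_minutes = (15, 17)

slowest_speedup = flagship_minutes[0] / spark_minutes[1]  # 15 / 3
fastest_speedup = flagship_minutes[1] / spark_minutes[0]  # 17 / 2

print(f"speedup range: {slowest_speedup:.1f}x to {fastest_speedup:.1f}x")
```

Worst case against worst case gives 5.0x; best against best gives 8.5x, consistent with the "roughly 5-8x" framing.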
Where Spark Fits
The benchmarks suggest a clear division of labor:
- Routine coding tasks (bug fixes, small features, refactoring): Spark handles these at near-instant speed with sufficient accuracy
- Complex multi-file architecture changes: The flagship GPT-5.3-Codex remains the better choice
- Interactive debugging and iteration: Spark’s speed makes it ideal for rapid back-and-forth
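That division of labor lends itself to a simple routing heuristic. The sketch below is hypothetical: the `Task` type and its thresholds are invented for illustration, not part of any OpenAI tooling.

```python
# Hypothetical model router mirroring the division of labor above.
# The Task shape and the file-count threshold are assumptions.

from dataclasses import dataclass

@dataclass
class Task:
    files_touched: int
    interactive: bool      # user is iterating live
    architectural: bool    # cross-cutting design change

def pick_model(task: Task) -> str:
    if task.architectural or task.files_touched > 5:
        return "gpt-5.3-codex"          # thoroughness over latency
    if task.interactive or task.files_touched <= 2:
        return "gpt-5.3-codex-spark"    # speed for tight feedback loops
    return "gpt-5.3-codex"              # default to the safer choice

print(pick_model(Task(files_touched=1, interactive=True, architectural=False)))
```

In practice, a router like this would err toward the flagship whenever task scope is unclear, since a wrong answer costs more than a slow one.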
How GPT-5.3-Codex-Spark Compares to the Competition
The broader competitive picture is worth noting. GPT-5.3-Codex (the flagship) currently leads Terminal-Bench 2.0 at 77.3%, surpassing Claude Opus 4.6 by roughly 5 percentage points. On SWE-Bench Pro, it scores 56.8% versus 56.4% for GPT-5.2-Codex.
Spark doesn’t compete with these flagships on raw accuracy. Instead, it occupies a new category: ultra-fast coding models where responsiveness is the primary value proposition. As models across the industry converge on similar capability levels, speed and developer experience become key differentiators.
The broader GPT-5.3-Codex family also showed a massive jump on OSWorld-Verified, from 38.2% (GPT-5.2-Codex) to 64.7%, a 26.5 percentage point improvement that signals growing capability in real-world computer use tasks.
Technical Specifications
- Context window: 128k tokens
- Modality: Text-only (at launch)
- Speed: 1,000+ tokens per second
- Compared to flagship: 15x faster throughput
- Behavior: Makes minimal, targeted edits by default; doesn’t auto-run tests unless instructed
Codex-Spark is the first in what OpenAI calls a family of ultra-fast models. The roadmap includes larger model variants, longer context windows, and multimodal input support.
Self-Bootstrapping Development
One notable detail from the announcement: early versions of GPT-5.3-Codex-Spark were instrumental in building the model itself. OpenAI used earlier iterations to debug training code, manage deployment infrastructure, diagnose failing tests, and run evaluations. This kind of recursive self-improvement in the development pipeline is becoming more common across labs, but it's still worth noting as a sign of where AI-assisted AI development is heading.
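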
Safety Evaluation
OpenAI evaluated Codex-Spark through their standard deployment process and determined it does not reach their Preparedness Framework threshold for high capability in cybersecurity or biology. The model includes the same safety training as OpenAI’s mainline models with additional cyber-related safeguards.
Availability
Codex-Spark is rolling out as a research preview for ChatGPT Pro subscribers across:
- Codex app (latest version)
- CLI: `codex --model gpt-5.3-codex-spark`
- VS Code extension
During the preview period, Spark operates under separate rate limits that don’t count toward standard ChatGPT usage limits. Peak demand may result in queuing. API access is coming soon, though pricing has not been announced.
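Since queuing is expected at peak demand during the preview, clients will likely want a retry loop. The pattern below is a generic jittered exponential backoff, not an official client; `send_request` is a stand-in for whatever call your integration makes, and `TimeoutError` stands in for a queued or rate-limited response.

```python
# Generic retry-with-backoff pattern for queued requests.
# Nothing here is an official OpenAI client; send_request is a stand-in.

import random
import time

def with_backoff(send_request, max_attempts=5, base_delay=1.0):
    """Retry a queued/rate-limited call with jittered exponential backoff."""
    for attempt in range(max_attempts):
        try:
            return send_request()
        except TimeoutError:  # stand-in for a "queued, try again" signal
            if attempt == max_attempts - 1:
                raise
            # Exponential backoff with jitter to avoid thundering herds.
            delay = base_delay * (2 ** attempt) * (0.5 + random.random() / 2)
            time.sleep(delay)
```

The jitter term matters: if many clients retry on the same schedule after a queue event, synchronized retries can re-create the spike that caused the queuing.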
What This Means
Codex-Spark is an interesting strategic move. Rather than chasing the next benchmark record, OpenAI is exploring a different axis of improvement: making AI coding assistants fast enough that they feel like a natural extension of your thought process rather than a tool you invoke and wait for.
The Cerebras partnership is key here. By moving to purpose-built inference hardware, OpenAI is decoupling from the GPU bottleneck that constrains most model serving. If the approach scales, it could fundamentally change how fast AI coding tools operate across the industry.
The trade-off is real—Spark isn’t as capable as the flagship for complex tasks. But for the majority of day-to-day coding interactions where speed matters more than maximum capability, that trade-off may be exactly right.
Learn More
- Official announcement: openai.com/index/introducing-gpt-5-3-codex-spark
- GPT-5.3-Codex: openai.com/index/introducing-gpt-5-3-codex
- Cerebras partnership: openai.com
- Codex CLI: available via `codex --model gpt-5.3-codex-spark`