OpenAI has taken a notable step away from Nvidia's hardware dominance by launching GPT-5.3-Codex-Spark, its first production AI model designed to run on an alternative chip architecture. The new coding-focused model operates on Cerebras processors and achieves speeds of over 1,000 tokens per second, roughly 15 times faster than its predecessors.
The strategic significance of this move extends beyond raw performance metrics. By diversifying its hardware partnerships, OpenAI is positioning itself to reduce reliance on Nvidia's GPU ecosystem, which currently dominates AI infrastructure. Sachin Katti, OpenAI's head of compute, emphasized the importance of this partnership, describing Cerebras as a valuable engineering collaborator in expanding the platform's capabilities.
Codex-Spark represents a focused approach to AI model development, prioritizing speed and efficiency for coding tasks over the broad knowledge base of larger models. The model features a 128,000-token context window and operates exclusively with text input at launch. This specialization allows for optimized performance in its target domain while maintaining practical utility for developers.
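To give a concrete sense of what a 128,000-token context window means for developers, the sketch below estimates whether a piece of source code fits in the window, using the common rule-of-thumb of roughly 4 characters per token. The 4:1 ratio and the output headroom figure are illustrative assumptions, not properties of the model's actual tokenizer:

```python
# Rough estimate of token usage for a source file against a 128K context window.
# Assumes ~4 characters per token (a common heuristic; real tokenizers vary by
# model and by content, so treat these numbers as ballpark figures only).

CONTEXT_WINDOW = 128_000
CHARS_PER_TOKEN = 4  # assumed heuristic, not the model's real tokenizer ratio

def estimated_tokens(text: str) -> int:
    """Approximate token count from character length."""
    return len(text) // CHARS_PER_TOKEN

def fits_in_context(text: str, reserve_for_output: int = 8_000) -> bool:
    """Check whether a prompt fits, reserving headroom for the model's reply."""
    return estimated_tokens(text) <= CONTEXT_WINDOW - reserve_for_output

sample = "x = 1\n" * 10_000  # 60,000 characters of toy code
print(estimated_tokens(sample))  # ~15,000 tokens
print(fits_in_context(sample))
```

Under these assumptions, a 128K window comfortably holds on the order of half a million characters of code, which is why a text-only, coding-focused model can still be practically useful despite its narrower scope.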
Access to the new model is currently limited to ChatGPT Pro subscribers, who pay $200 monthly for premium features. Users can interact with Codex-Spark through multiple interfaces including the dedicated Codex application, command-line tools, and Visual Studio Code extensions. OpenAI is also gradually expanding API access to selected design partners, suggesting a measured rollout strategy.
Performance comparisons reveal the substantial improvements achieved through this hardware transition. OpenAI's existing models running on Nvidia infrastructure deliver significantly lower token rates: GPT-4o processes approximately 147 tokens per second, o3-mini reaches 167 tokens per second, and GPT-4o mini operates at around 52 tokens per second. The 1,000 tokens per second achieved by Codex-Spark represents a dramatic leap in processing speed.
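To put these throughput figures in perspective, the arithmetic below estimates how long each model would take to emit a 2,000-token response at the rates quoted above, and the resulting speedup relative to GPT-4o. This is a simplification: it ignores time-to-first-token and other latency sources, so real-world responsiveness will differ:

```python
# Wall-clock time to emit a fixed-size response at the reported throughput
# rates (tokens per second), plus the speedup relative to GPT-4o.
# Rates are the figures quoted in the article; this simple model ignores
# time-to-first-token and network overhead.

rates_tps = {
    "Codex-Spark (Cerebras)": 1000,
    "o3-mini (Nvidia)": 167,
    "GPT-4o (Nvidia)": 147,
    "GPT-4o mini (Nvidia)": 52,
}

response_tokens = 2_000
baseline = rates_tps["GPT-4o (Nvidia)"]
for model, tps in rates_tps.items():
    seconds = response_tokens / tps
    speedup = tps / baseline
    print(f"{model:>24}: {seconds:5.1f} s  ({speedup:.1f}x vs GPT-4o)")
```

Even on this crude model, the gap is the difference between a two-second reply and a wait of well over ten seconds, which matters for interactive coding workflows.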
Interestingly, Cerebras hardware has demonstrated even higher capabilities in other contexts. The company has recorded 2,100 tokens per second with Meta's Llama 3.1 70B model and achieved 3,000 tokens per second with OpenAI's own open-weight gpt-oss-120B model. These figures suggest that the current Codex-Spark performance may represent conservative utilization of the available hardware capabilities.
The competitive landscape for AI coding assistance has intensified significantly, with speed becoming a crucial differentiator. Previous evaluations highlighted Codex's performance disadvantages, particularly in head-to-head comparisons with Anthropic's Claude Code, which completed coding tasks in roughly half the time. The new Spark model appears specifically designed to address these competitive weaknesses.
OpenAI claims superior performance on industry-standard benchmarks including SWE-Bench Pro and Terminal-Bench 2.0, which evaluate software engineering capabilities. However, the company has not yet released the underlying results for independent review, so these claims await third-party verification.
The broader implications of this development extend to the AI industry's infrastructure evolution. As companies seek to optimize costs and performance while reducing vendor dependency, alternative hardware solutions like Cerebras chips may gain increased adoption. This trend could accelerate innovation in specialized AI processors and create more competitive dynamics in the hardware market.
The focus on specialized, task-specific models also reflects a maturing approach to AI development. Rather than relying exclusively on large, general-purpose models, companies are increasingly creating optimized solutions for specific use cases. This specialization strategy can deliver better performance and efficiency for targeted applications while potentially reducing computational costs.
Looking forward, the success of Codex-Spark could influence broader industry trends toward hardware diversification and model specialization. As AI workloads continue to grow and evolve, the ability to deploy models on diverse hardware platforms may become increasingly valuable for maintaining competitive advantage and operational flexibility.
Note: This analysis was compiled by AI Power Rankings based on publicly available information. Metrics and insights are extracted to provide quantitative context for tracking AI tool developments.