OpenAI and Broadcom unveil Jalapeño

OpenAI and Broadcom have unveiled a custom LLM-optimized inference chip, the first of its kind co-developed by the two companies, as part of a multi-generation silicon platform aimed at cutting inference costs and latency for frontier AI models.

Dubbed Jalapeño, the accelerator is a clean-sheet design built specifically for LLM inference workloads, rather than a general-purpose AI chip adapted from earlier training or inference designs. Early testing indicates it delivers substantially better performance per watt than current state-of-the-art inference silicon, and engineering samples are already running OpenAI’s GPT-5.3-Codex-Spark model at production target frequency and power levels.

Jalapeño’s architecture is optimized for LLM inference-specific workloads

Unlike general-purpose AI accelerators, Jalapeño’s architecture is optimized around the specific kernel, memory movement, networking, and serving patterns that drive frontier LLM performance, per OpenAI hardware lead Richard Ho. The design cuts unnecessary data movement across compute, memory, and networking resources to push realized utilization far closer to theoretical peak performance than existing inference silicon, and integrates Broadcom’s Tomahawk networking silicon for large-scale production deployment.

OpenAI led the architectural design for Jalapeño, with Broadcom handling silicon implementation and Celestica supporting board, rack, and system integration alongside high-performance networking and scalable production ramp. The entire development cycle from initial design to manufacturing tape-out took just nine months, a timeline OpenAI and Broadcom state is the fastest ever achieved for a high-performance advanced semiconductor ASIC.

OpenAI also used its own consumer-facing AI models to accelerate parts of the design and optimization process, a recursive use case in which the tools sold to end users help build the infrastructure that runs those same tools.

2026 gigawatt-scale deployment to support growing AI workload demand

Jalapeño is the first entry in a multi-generation custom silicon roadmap from OpenAI and Broadcom, with plans to deploy the platform at gigawatt scale in data centers built with Microsoft and other partners starting in 2026, per Broadcom CEO Hock Tan.

The chip is a core component of OpenAI’s full-stack infrastructure strategy, which spans model development, product design, and now custom silicon, kernel, and systems optimization. By controlling more of the stack, OpenAI aims to create a self-reinforcing flywheel: more efficient infrastructure lowers inference costs, which enables more capable model training and cheaper serving for users, driving higher adoption and revenue to fund the next generation of hardware.

For end users, the chip’s efficiency gains are expected to translate to lower inference costs and reduced latency for ChatGPT, the OpenAI API, and Codex. The gigawatt-scale deployment target also aligns with growing demand for agentic AI workloads, which require far more sustained compute than single-turn chatbot interactions.

Bottom line: Jalapeño gives OpenAI direct control over its inference hardware stack, with a stated nine-month development timeline, early-test performance-per-watt gains over current state-of-the-art inference silicon, and a 2026 gigawatt-scale deployment plan that is expected to lower inference costs and reduce latency for ChatGPT, the OpenAI API, and Codex users.

Last updated: Jun 25, 2026.
Aira

Founding Editor and Publisher of ZBrandCo, covering artificial intelligence, open-source software, and the developer tools people actually use. Signal over hype: every story starts from a primary source and explains why it matters. ZBrandCo runs no paid reviews and no affiliate links. Tips and corrections: editorial@zbrandco.com.