Google Just Built Two AI Chips That Challenge Nvidia — Here's the Full Story
TPU 8t and TPU 8i. One for training, one for inference. Both designed to make AI faster, cheaper, and more powerful than ever before.
Jyotsna
4/25/2026 · 6 min read
For a decade, Nvidia has ruled the AI chip market almost unchallenged. But on April 22, 2026, at Google Cloud Next in Las Vegas, something shifted. Google announced its 8th generation of Tensor Processing Units — and this time, they didn't just build a better chip. They built two entirely different ones, each purpose-built for a different job. And the AI industry is paying very close attention.
Let's start with the basics, because "AI chips" can sound abstract. Every AI model you've ever used — ChatGPT, Google Gemini, Claude — was built in two stages. First, it was trained on massive amounts of data (a process that takes weeks or months and costs millions of dollars). Then, it was deployed so that users like you and me can ask it questions and get answers in real time. That second stage is called inference.
For years, companies used the same type of chip — mostly Nvidia's GPUs — for both jobs. But as AI has grown more complex, the needs of training and inference have become very different. Training needs raw, brute-force computation. Inference needs speed and low latency. Trying to do both with one chip is like using a bulldozer to parallel park.
Google's answer? Stop trying. Build a separate chip for each job. Meet the TPU 8t (training) and the TPU 8i (inference).
Key Numbers at a Glance
2.7× — Faster Training vs Last Gen
80% — Better Inference per Dollar
1M+ — TPUs in One Cluster
The TPU 8t: Built for Training AI Faster Than Ever
The TPU 8t is Google's heavy-duty training workhorse. It was designed with one goal: cut the time it takes to train massive AI models from months down to weeks. That might sound like a minor improvement, but in the AI industry, that's the difference between winning and losing the race.
What's under the hood?
Each TPU 8t chip packs 12.6 petaFLOPS of FP4 compute, 216 GB of high-bandwidth memory (HBM) with 6.5 TB/s of bandwidth, and up to 19.2 Tbps of chip-to-chip bandwidth. But the real magic happens when you chain them together. Google can cluster 9,600 of these chips into a single superpod, delivering a staggering 121 exaFLOPS of total compute. That's roughly three times the compute of the previous generation.
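The superpod figure follows directly from the per-chip number. Here is a quick sanity check using only the specs quoted above. The function name is illustrative, not part of any Google tooling, and the calculation assumes perfect linear scaling, which real training jobs will not reach.

```python
# Sanity-check the superpod figure from the per-chip specs quoted in the article.
PFLOPS_PER_CHIP_FP4 = 12.6   # petaFLOPS per TPU 8t chip (FP4), as quoted
HBM_GB_PER_CHIP = 216        # GB of HBM per chip, as quoted
CHIPS_PER_SUPERPOD = 9_600   # chips per superpod, as quoted


def superpod_totals(chips, pflops_per_chip, hbm_gb_per_chip):
    """Return (total exaFLOPS, total HBM in petabytes) for a pod of `chips` chips.

    Assumes perfect linear scaling (an idealization).
    """
    total_exaflops = chips * pflops_per_chip / 1_000       # 1 exaFLOPS = 1,000 petaFLOPS
    total_hbm_pb = chips * hbm_gb_per_chip / 1_000_000      # 1 PB = 1,000,000 GB (decimal)
    return total_exaflops, total_hbm_pb


exaflops, hbm_pb = superpod_totals(CHIPS_PER_SUPERPOD, PFLOPS_PER_CHIP_FP4, HBM_GB_PER_CHIP)
print(f"{exaflops:.1f} exaFLOPS, {hbm_pb:.2f} PB of HBM")
```

So 9,600 chips at 12.6 petaFLOPS each come to just under 121 exaFLOPS, with roughly 2 petabytes of HBM across the pod.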
Google has also replaced the traditional x86 host processors with its own Arm-based Axion CPUs, which are meant to keep the TPUs fed with data rather than sitting idle. The result, according to Google: up to 2× better performance per watt, which means lower electricity bills and a smaller carbon footprint for running these massive data centres.
"Our eighth-generation TPUs are the culmination of more than a decade of development." — Amin Vahdat, SVP & Chief Technologist for AI, Google Cloud
The TPU 8i: Making AI Responses Lightning Fast (and Cheap)
If the TPU 8t is the factory that builds the AI, the TPU 8i is the customer service counter — the part that talks to millions of users every second and has to respond instantly. This chip is all about speed, scale, and cost efficiency for inference.
The "memory wall" problem — and how Google solved it
Here's a problem most people don't know about: when an AI model is answering your question, it needs to constantly move data back and forth between its memory and its processor. If memory is slow or limited, the chip ends up sitting idle, waiting. Google calls this the "waiting room" effect, and it wastes time and money.
The TPU 8i tackles this with a much larger memory pool: 288 GB of HBM plus 384 MB of fast on-chip SRAM (3× more SRAM than the previous generation). This keeps far more of the model's active data close to the processor, so it spends less time waiting. Google's headline numbers: 5× lower collective communication latency and 80% better performance per dollar on large AI models.
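To see why memory bandwidth matters so much for inference, here is a rough back-of-the-envelope sketch. It assumes a deliberately simplified picture in which generating each token requires reading every model weight from HBM once, and it ignores batching, SRAM caching, and other memory traffic. The 70-billion-parameter model stored at 1 byte per parameter is a hypothetical input chosen for illustration. Only the bandwidth and compute figures come from the specs in this article.

```python
# Rough roofline-style estimate of why inference is often memory-bound.
HBM_BANDWIDTH_TBPS = 8.6     # TB/s of HBM bandwidth per TPU 8i, as quoted
PEAK_PFLOPS_FP4 = 10.1       # petaFLOPS (FP4) per TPU 8i, as quoted

# Hypothetical model, NOT from the article.
PARAMS_BILLIONS = 70
BYTES_PER_PARAM = 1.0


def memory_bound_tokens_per_second(params_billions, bytes_per_param, bandwidth_tbps):
    """Ceiling on single-stream tokens/s if every weight is read once per token."""
    bytes_per_token = params_billions * 1e9 * bytes_per_param
    return bandwidth_tbps * 1e12 / bytes_per_token


def ridge_point_flops_per_byte(peak_pflops, bandwidth_tbps):
    """FLOPs per byte fetched above which compute, not memory, becomes the limit."""
    return peak_pflops * 1e15 / (bandwidth_tbps * 1e12)


tokens_per_s = memory_bound_tokens_per_second(
    PARAMS_BILLIONS, BYTES_PER_PARAM, HBM_BANDWIDTH_TBPS
)
ridge = ridge_point_flops_per_byte(PEAK_PFLOPS_FP4, HBM_BANDWIDTH_TBPS)
print(f"~{tokens_per_s:.0f} tokens/s ceiling; ridge point ~{ridge:.0f} FLOPs/byte")
```

Under these assumptions, a single request stream does only about 2 FLOPs per weight byte it reads, far below the ridge point. The chip would spend most of its time waiting on memory. That is the gap that larger, faster memory and serving many requests together aim to close.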
Sundar Pichai himself said the TPU 8i can deliver the massive throughput and low latency needed to concurrently run millions of agents cost-effectively. In plain English: you can run far more AI assistants for far less money.
TPU 8t vs TPU 8i: Side-by-Side Comparison
Specification | TPU 8t (Training) | TPU 8i (Inference)
Primary Use | Model Training | Real-time Inference
Compute Power | 12.6 petaFLOPS (FP4) | 10.1 petaFLOPS (FP4)
HBM Memory | 216 GB @ 6.5 TB/s | 288 GB @ 8.6 TB/s ✓
On-chip SRAM | 128 MB | 384 MB (3× more) ✓
Max Cluster Size | 9,600 chips (121 exaFLOPS) ✓ | 1,152 chips (11.6 exaFLOPS FP8)
Performance Gain | 2.7× vs last gen (same price) ✓ | 80% better per dollar ✓
Power Efficiency | 2× better per watt ✓ | 2× better per watt ✓
Chip Partner | Broadcom ("Sunfish") | MediaTek ("Zebrafish")
Availability | Late 2026 | Late 2026
Who's Already Buying Google's New Chips?
The best signal of a chip's quality isn't the spec sheet — it's who's willing to pay for it. And some of the world's most demanding AI companies have already signed up.
Anthropic — Multi-gigawatt TPU commitment
OpenAI — Now taking TPU capacity (huge signal!)
Meta — Multi-billion, multi-year deal
Apple — Testing AI workloads on Google TPUs
Mistral AI — Signed on for Rubin + TPU mix
Perplexity — Part of broad TPU adoption wave
The most significant name on that list? OpenAI. Historically, OpenAI has trained its models entirely on Nvidia GPUs, through a close partnership with Microsoft. The fact that OpenAI is now taking Google TPU capacity is, as one analyst put it, "the first visible crack in the single-vendor AI substrate." That's a very big deal.
Google vs Nvidia: Who Wins?
Before you start writing Nvidia's obituary: don't. Nvidia is still the dominant force in AI chips, with over $193 billion in data centre revenue in fiscal year 2026, and its new Rubin chips are already sold out. Google itself also plans to deploy Nvidia Rubin GPUs in its cloud, even as it launches its own competing chips.
Google TPU 8 Advantages
Purpose-built for training OR inference
80% better inference cost efficiency
Tightly integrated with Google Cloud
2× better performance per watt
Yearly release cadence planned
Nvidia's Strengths
Massive CUDA software ecosystem
Works with any cloud or on-premise setup
Rubin: 35 petaFLOPS raw compute
Broadest third-party support
Decades of developer trust and tools
The real story here isn't "Google beats Nvidia." It's that the AI chip market is finally becoming competitive. For years, Nvidia had what analysts called a "monopoly premium" — companies paid Nvidia prices because there was no real alternative. Now there is. And when there's real competition, prices fall and capabilities improve — which is good for everyone building AI products.
What Does This Mean for Regular People?
If you're not a data centre engineer, you might be wondering why any of this matters to you. Here's why: every AI tool you use — your AI assistant, your smart search, your code autocomplete — costs money to run. Those costs are paid by the companies that build those tools, and they eventually get passed on to you in the form of subscription prices.
When inference delivers 80% more performance per dollar, the same workload costs roughly 44% less to run, so AI companies can offer better AI at lower prices. Or they can reinvest those savings into building smarter AI. Either way, the end user wins. Faster chips mean faster AI responses. Cheaper chips mean more companies can afford to build AI products. And more competition between chip makers means no single company can hold the entire AI industry hostage to its pricing.
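It is worth being precise about what an 80% gain in performance per dollar implies for cost, because it is not the same as an 80% price cut. Here is a minimal sketch of the conversion, assuming a fixed workload and that the quoted gain applies uniformly (the function name is illustrative):

```python
def cost_reduction_from_perf_per_dollar_gain(gain):
    """Fractional cost reduction for a fixed workload, given a perf-per-dollar gain.

    gain=0.80 means 1.8x as much work per dollar, so the same work costs 1/1.8 as much.
    """
    return 1.0 - 1.0 / (1.0 + gain)


reduction = cost_reduction_from_perf_per_dollar_gain(0.80)
print(f"Same workload costs about {reduction:.0%} less")
```

In other words, a bill of $100 for a given amount of inference would drop to about $56, not $20.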
The TPU 8i, in particular, was explicitly designed to run millions of AI agents concurrently — meaning the era of AI that works on your behalf in the background, doing research, booking appointments, managing tasks, is about to get a lot more affordable to operate.
⭐ Key Takeaways — Google TPU 8t & 8i
Google split its 8th-generation TPU into two chips, one for training (TPU 8t) and one for inference (TPU 8i), committing to this specialization at scale.
TPU 8t is 2.7× faster than last year's chip at the same price, and can run 121 exaFLOPS in a 9,600-chip cluster.
TPU 8i is 80% more cost-efficient for running AI, with 3× more on-chip memory and 5× lower communication latency.
OpenAI, Anthropic, and Meta have signed on for Google TPU capacity, and Apple is testing AI workloads on TPUs: a major crack in Nvidia's near-monopoly on AI chips.
Both chips will be generally available in late 2026, with early access starting Q3 2026.
Google still plans to use Nvidia chips too — this is about diversification, not replacing Nvidia entirely.
The Bottom Line
Google just made its boldest move yet in the AI hardware war. By splitting its 8th generation TPU into two specialized chips, it has done something Nvidia has been reluctant to do: fully commit to purpose-built silicon for training and inference as separate disciplines. The numbers back it up: 2.7× faster training, 80% better inference performance per dollar, 2× better power efficiency.
The fact that OpenAI, which has historically relied on Nvidia GPUs, is now taking Google TPU capacity says more than any benchmark. The AI chip market just became a real competition. And the ultimate winner of that competition is everyone who uses AI.
Both the TPU 8t and TPU 8i arrive in general availability later in 2026. The AI race just got a whole lot more interesting.



