Trillium, the sixth iteration of Google's Tensor Processing Unit (TPU), delivers nearly five times the peak compute performance and twice the memory bandwidth of its predecessor, TPU v5e, Google said.

Google unveiled a new chip, Trillium, for training and running foundation large language models such as Gemma and Gemini at its annual I/O conference on Tuesday. Trillium is the sixth iteration of Google's Tensor Processing Unit (TPU) and is 67% more energy efficient and nearly five times as fast as its predecessor, TPU v5e, according to the company. Google plans to use Trillium in its AI Hypercomputer, a supercomputing architecture designed for cutting-edge AI workloads, and will make the chips available to enterprises by the end of the year.

"Trillium TPUs achieve an impressive 4.7X increase in peak compute performance per chip compared to TPU v5e. We doubled the High Bandwidth Memory (HBM) capacity and bandwidth, and also doubled the Interchip Interconnect (ICI) bandwidth over TPU v5e," Amin Vahdat, general manager of systems and cloud AI at Google, wrote in a blog post.

The increase in compute performance, according to Vahdat, comes from expanding the size of the matrix multiply units (MXUs) and raising the clock speed, which in turn makes it possible to train the next wave of foundation models faster and run them with lower latency and cost.

MXUs are part of the TPU chip architecture: a TPU chip typically contains one or more TensorCores, and each TensorCore consists of one or more MXUs, a vector unit, and a scalar unit. Trillium chips can scale up to 256 TPUs in a single high-bandwidth, low-latency pod, Vahdat added.

Other Trillium features include dataflow processors that accelerate models relying on embeddings, such as recommendation models, and support for more high-bandwidth memory (HBM) to handle larger models with more weights and larger key-value caches.
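To see why a larger MXU raises peak compute, consider how a matrix unit works through a multiplication tile by tile. The sketch below is purely illustrative, not Google's implementation: real MXU tile dimensions are an assumption here, and the point is only that doubling the tile's side quadruples the multiply-accumulates done per pass, so fewer passes are needed for the same matrices.

```python
# Illustrative tiled matrix multiply: a systolic matrix unit processes
# fixed-size sub-matrices (tiles) per pass. Tile size here is tiny for
# readability; actual MXU dimensions are not taken from the article.

def matmul_tiled(a, b, tile=2):
    """Multiply square matrices a and b by accumulating tile x tile
    blocks, the way a matrix unit consumes fixed-size sub-matrices."""
    n = len(a)
    c = [[0.0] * n for _ in range(n)]
    for i0 in range(0, n, tile):            # row block of the output
        for j0 in range(0, n, tile):        # column block of the output
            for k0 in range(0, n, tile):    # reduction (inner) block
                for i in range(i0, min(i0 + tile, n)):
                    for j in range(j0, min(j0 + tile, n)):
                        acc = 0.0
                        for k in range(k0, min(k0 + tile, n)):
                            acc += a[i][k] * b[k][j]
                        c[i][j] += acc
    return c
```

With a larger tile, each pass covers more of the output at once; that is the lever Vahdat describes when he attributes the per-chip gain to "expanding the size of matrix multiply units" alongside a higher clock speed.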
More slices

Further, Trillium comes with Google's multislice technology, which the company first introduced in preview when it unveiled TPU v5e last August. Multislice, according to the company, lets enterprise users easily scale AI models beyond the boundaries of physical TPU pods, to tens of thousands of Cloud TPU v5e or TPU v4 chips. Before this technology, training jobs using TPUs were limited to a single slice of TPU chips, capping the largest jobs at a maximum slice size of 3,072 chips for TPU v4.

"With Multislice, developers can scale workloads up to tens of thousands of chips over inter-chip interconnect (ICI) within a single pod, or across multiple pods over a data center network," Vahdat explained last year in a blog post co-written with his colleague Mark Lohmeyer.

Open source support

Trillium will support open source libraries such as JAX, PyTorch/XLA, and Keras 3, Vahdat said. "Support for JAX and XLA means that declarative model description written for any previous generation of TPUs maps directly to the new hardware and network capabilities of Trillium TPUs," he wrote, adding that Google has partnered with Hugging Face on Optimum-TPU for streamlined model training and serving.

Google launched the first iteration of its TPU in 2016. Most hyperscalers, including Microsoft, AWS, and IBM, have started developing their own chips for AI workloads as they face surging demand on one hand and a shortage of Nvidia GPUs on the other. While AWS has been iterating on its Trainium and Inferentia accelerators, Microsoft last year released its Cobalt CPU and Maia accelerator chips.
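The multislice scaling described earlier comes down to simple arithmetic: without it a job is capped at one slice (3,072 chips on TPU v4, per the article), while with it a job spans many slices over the data center network. The sketch below illustrates that arithmetic only; the function names and batch numbers are assumptions for illustration, not a Google API.

```python
# Illustrative sketch of the multislice scaling idea: a training job
# can use every chip in every slice, instead of being capped at one
# slice. The 3,072-chip cap is the TPU v4 figure quoted in the article;
# everything else is hypothetical.

def chips_available(num_slices, chips_per_slice, max_slice_chips=3072):
    """Total chips a job can use across slices; each slice is itself
    bounded by the maximum physical slice size."""
    chips_per_slice = min(chips_per_slice, max_slice_chips)
    return num_slices * chips_per_slice

def per_chip_batch(global_batch, num_slices, chips_per_slice):
    """Shard a global batch evenly across every chip in every slice."""
    chips = chips_available(num_slices, chips_per_slice)
    if global_batch % chips:
        raise ValueError("global batch must divide evenly across chips")
    return global_batch // chips
```

A single-slice job tops out at 3,072 chips, while ten full slices yield 30,720, consistent with the "tens of thousands of chips" Vahdat describes.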