At the Hot Chips 2024 conference, IBM announced its Telum II processor and previewed its Spyre accelerator for AI and other high-performance Big Iron workloads.
IBM is outfitting the next generation of its Z and LinuxONE mainframes with its latest Telum processor and a new accelerator aimed at boosting performance of AI and other data-intensive workloads.
The new processor, the IBM Telum II, has greater memory and cache capacity than the previous generation, and it integrates a new data processing unit (DPU) specialized for I/O acceleration along with enhanced on-chip AI acceleration capabilities.
Developed using Samsung 5nm technology, Telum II has eight high-performance cores running at 5.5GHz, according to IBM. It includes a 40% increase in on-chip cache capacity with virtual L3 and virtual L4 growing to 360MB and 2.88GB, respectively.
“The compute power of each accelerator is expected to be improved by 4x, reaching 24 trillion operations per second (TOPS). But TOPS alone don’t tell the whole story,” wrote Christian Jacobi, IBM Fellow and CTO, IBM Systems Development, and Elpida Tzortzatos, IBM Fellow and CTO of z/OS and AI on IBM Z and LinuxONE, in a blog about the new processor.
“It is all about the accelerator’s architectural design plus optimization of the AI ecosystem that sits on top of the accelerator. When it comes to AI acceleration in production enterprise workloads, a fit-for-purpose architecture matters. Telum II is engineered to enable model runtimes to sit side by side with the most demanding enterprise workloads, while delivering high throughput, low-latency inferencing.”
In a maximum configuration, future IBM Z systems can be equipped with up to 32 Telum II processors and 12 I/O cages. Each cage can accommodate up to 16 PCIe slots, allowing the system to support up to 192 PCIe cards. Custom I/O protocols will enhance availability, error checking, and virtualization to meet massive bandwidth requirements and provide redundancy and multi-pathing for protection against simultaneous multi-failure scenarios.
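The configuration figures above can be checked with a few lines of arithmetic. The Python sketch below is illustrative only: the constant and function names are hypothetical, the inputs are the numbers quoted in this article, and the aggregate TOPS value is a naive upper bound that assumes one on-chip AI accelerator per Telum II processor and perfect scaling, which IBM has not claimed.

```python
# Figures quoted in the article; all names here are illustrative, not IBM's.
IO_CAGES = 12                # maximum I/O cages per system
PCIE_SLOTS_PER_CAGE = 16     # PCIe slots per cage
PROCESSORS = 32              # maximum Telum II processors per system
TOPS_PER_ACCELERATOR = 24    # expected TOPS per on-chip AI accelerator


def max_pcie_cards(cages: int, slots_per_cage: int) -> int:
    """Total PCIe cards when every slot in every cage is populated."""
    return cages * slots_per_cage


def naive_peak_tops(processors: int, tops_each: int) -> int:
    """Simple sum of per-accelerator TOPS.

    Assumption: one accelerator per processor and perfect scaling,
    so this is an upper bound, not a measured or promised figure.
    """
    return processors * tops_each


if __name__ == "__main__":
    print("Max PCIe cards:", max_pcie_cards(IO_CAGES, PCIE_SLOTS_PER_CAGE))
    print("Naive peak TOPS:", naive_peak_tops(PROCESSORS, TOPS_PER_ACCELERATOR))
```

The first result reproduces the 192-card figure IBM cites. The second is only a back-of-the-envelope ceiling, in line with IBM's own caution that TOPS alone do not tell the whole story.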
“New compute primitives have also been incorporated to better support large language models within the accelerator. They are designed to support an increasingly broader range of AI models for a comprehensive analysis of both structured and textual data,” Jacobi and Tzortzatos wrote.
New DPU on the Telum II processor chip
IBM’s first Telum processor, introduced in 2021, included an on-chip AI accelerator for inferencing. Telum II significantly enhances that accelerator and adds a new specialized DPU for I/O acceleration. The DPU simplifies system operations and can improve key component performance, according to IBM.
From a networking and I/O perspective, one benefit of this approach is the move from a 2-port Fibre Connection (FICON) card to a 4-port card. Another is the consolidation, at the system level, of the Open Systems Adapter (OSA) Express, the mainframe’s package for networking via a variety of protocols, and the RDMA over Converged Ethernet (RoCE) Express offerings, according to Michael Becht, chief engineer and architect for IBM Z I/O channels, and Susan M. Eickhoff, director, IBM Z processor development.
“This change, available beginning with the next-generation IBM Z in the first half of 2025, will allow clients to maintain the same I/O configuration in a smaller footprint, to reduce data center floorspace as they upgrade and modernize their infrastructure,” Becht and Eickhoff wrote in a blog.
IBM Spyre Accelerator
A complement to the Telum II processor is the new Spyre Accelerator, which provides additional AI compute capabilities.
The Spyre Accelerator will contain 1TB of memory and 32 AI accelerator cores that will share a similar architecture to the AI accelerator integrated into the Telum II chip, according to Jacobi and Tzortzatos: “Multiple IBM Spyre Accelerators can be connected into the I/O Subsystem of IBM Z via PCIe. Combining these two technologies can result in a substantial increase in the amount of available acceleration.”
Taken together, the IBM Telum II and the Spyre Accelerator represent a key inflection point for the mainframe, according to Steven Dickens, chief technology advisor with The Futurum Group.
“The fact that all of this chip and AI innovation is coming from IBM and being deployed on mainframes is just about as innovative and important as it gets for enterprise customers,” Dickens said.
Mainframe workloads beyond AI
While the new processor technologies take aim at AI development and workload handling, Big Iron has other transaction-heavy applications and use cases that will also see a boost in performance and energy efficiency, according to Tina Tarquinio, vice president, product management for IBM Z and LinuxONE.
“The use cases for Spyre and the accelerator really span every business use case you can think of,” Tarquinio said. “IBM uses it to accelerate and use AI to help with our internal HR inquiries and functions, for example. The next generation of IBM Z will maintain its leadership in resiliency, plus we have eight nines of availability security in addition to being the only quantum-safe system out there.”
Analysts, too, said the new Telum II-based mainframes will have an impact not only on enterprise AI development but also on other applications, such as database management and distributed cloud or hybrid cloud environments.
“Just look at the core characteristics of these things as servers – they will just be I/O beasts,” Dickens said. “Customers could run big Oracle, or MongoDBs or other mission critical applications much more efficiently.”
Customers will be able to offload transactional workloads from the main CPU to the accelerator for further machine learning, AI or generative AI evaluation and handling, an approach that makes sense both operationally and for scaling, Dickens said.
“In addition to code generation, this scalable mainframe AI platform (chip/card/software) would be good for a number of applications, including credit ratings, fraud detection, compliance, financial settlements, and document processing and simulation,” said Patrick Moorhead, founder, CEO and chief analyst of Moor Insights & Strategy.
“If you’re an enterprise and have a mainframe, you likely are using it for mission-critical apps that require the highest level of resilience and security. Previously in AI, enterprises would move the data off the mainframe to a GPU server, do work on it, then send it back on the mainframe,” Moorhead said. “That’s not efficient or fast and less secure for apps like credit ratings, fraud detection, and compliance.”
IBM’s Jacobi also talked about how code security and compliance will benefit from the new AI support.
“Many clients run dozens of millions of lines of code, or hundreds of millions of lines of code in their applications, and they are very security concerned and sensitive about the code base,” Jacobi said. “The code base itself is sort of a codified business process of how to run an insurance company, or how to run a bank. So, of course, that is very valuable IP to them.”
“When customers do AI on those kinds of code structures, they would prefer to do that directly within the secure environment of the mainframe, rather than doing that analysis elsewhere. And now they can,” Jacobi said.
“With the Spyre, we can cluster up to eight cards together to get to the memory size and compute capacity to run generative workloads on that code. And we’ll be integrating that with our higher-level stack products like Watson Code Assistant for Z with optimized models that are trained and tuned to have the knowledge that is necessary to do kind of mainframe code refactoring and mainframe code explanation,” Jacobi said.
The Telum II processor will be the central processor powering IBM’s next-generation IBM Z and IBM LinuxONE platforms and is expected to be available in 2025, according to IBM. The Spyre Accelerator, currently in tech preview, is also expected to be available in 2025.