by Michael Cooney

Senior Editor

Cisco sets a foundation for AI network infrastructure

News Analysis

Jun 20, 2023

4 mins

Cisco Systems, Networking

Cisco adds two new high-end programmable Silicon One devices that can support massive GPU clusters for AI/ML workloads.

Cisco is taking the wraps off new high-end programmable Silicon One processors aimed at underpinning large-scale Artificial Intelligence (AI)/Machine Learning (ML) infrastructure for enterprises and hyperscalers.

The company has added the 5nm 51.2Tbps Silicon One G200 and 25.6Tbps G202 to its now 13-member Silicon One family that can be customized for routing or switching from a single chipset, eliminating the need for different silicon architectures for each network function. This is accomplished with a common operating system, P4 programmable forwarding code, and an SDK.

The new devices, positioned at the top of the Silicon One family, bring networking enhancements that make them ideal for demanding AI/ML deployments or other highly distributed applications, according to Rakesh Chopra, a Cisco Fellow in the vendor’s Common Hardware Group.

“We are going through this huge shift in the industry where we used to build these sorts of reasonably small high-performance compute clusters that seemed large at the time but nothing compared to the absolutely huge deployments required for AI/ML,” Chopra said. AI/ML models have grown from needing a few GPUs to needing tens of thousands linked in parallel and in series. “The number of GPUs and the scale of the network is unheard of.”

The new Silicon One enhancements include a P4-programmable parallel-packet processor capable of launching more than 435 billion lookups per second.

“We have a fully shared packet buffer where every port has full access to the packet buffer regardless of what’s going on,” Chopra said. This contrasts with designs that allocate buffers to individual input and output ports, where the buffer space available to a packet depends on which port it is headed to. “That means that you’re less capable of riding through traffic bursts and more likely to drop a packet, which really decreases AI/ML performance,” he said.
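The difference Chopra describes can be illustrated with a toy model. The sketch below is not Cisco's buffer design; it assumes a fixed pool of buffer cells, a single instantaneous burst per port, and hypothetical function names, and it simply counts how many cells would be dropped under each allocation scheme.

```python
def drops_partitioned(bursts, total_cells):
    """Each port owns an equal, private slice of the buffer (toy assumption)."""
    per_port = total_cells // len(bursts)
    # A burst larger than the port's own slice is dropped, even if
    # the slices belonging to other ports are sitting empty.
    return sum(max(0, burst - per_port) for burst in bursts)


def drops_shared(bursts, total_cells):
    """Every port can draw on the whole buffer (toy assumption)."""
    # Drops happen only when the combined demand exceeds the pool.
    return max(0, sum(bursts) - total_cells)


# Hypothetical workload: one port sees a large burst, the rest are quiet.
bursts = [900, 50, 30, 20]   # cells arriving per port
total_cells = 1000           # total buffer size in cells

partitioned = drops_partitioned(bursts, total_cells)
shared = drops_shared(bursts, total_cells)
print(f"partitioned drops: {partitioned}, shared drops: {shared}")
```

In this made-up workload the partitioned buffer drops most of the large burst because the busy port is limited to its 250-cell slice, while the shared pool absorbs all 1,000 cells.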

In addition, each Silicon One device can support 512 Ethernet ports, letting customers build a 32K 400G GPU AI/ML cluster with 40% fewer switches than other silicon devices would need to support the same cluster, Chopra said.
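The 32K figure is consistent with simple topology arithmetic. The sketch below assumes a non-blocking two-tier leaf-spine fabric built from identical 512-port switches, with the 51.2Tbps of capacity split evenly into 100G ports and each 400G GPU attached over four 100G links. Those topology choices and the function name are illustrative assumptions, not details confirmed by Cisco.

```python
def two_tier_fabric(radix: int) -> dict:
    """Size a non-blocking two-tier leaf-spine fabric of identical switches."""
    down = radix // 2      # leaf ports facing endpoints
    up = radix - down      # leaf ports facing spines (equal, so non-blocking)
    spines = up            # every leaf has one uplink to every spine
    leaves = radix         # every spine port connects to a different leaf
    return {
        "leaves": leaves,
        "spines": spines,
        "switches": leaves + spines,
        "endpoint_ports": leaves * down,
    }


PORTS = 512                    # Ethernet ports per device, from the article
PORT_GBPS = 51_200 // PORTS    # 51.2Tbps spread over 512 ports -> 100G each
GPU_GBPS = 400                 # per-GPU bandwidth, from the article

fabric = two_tier_fabric(PORTS)
gpus_400g = fabric["endpoint_ports"] * PORT_GBPS // GPU_GBPS

print(f"port speed: {PORT_GBPS}G")
print(f"switches: {fabric['switches']} "
      f"({fabric['leaves']} leaves + {fabric['spines']} spines)")
print(f"400G GPUs supported: {gpus_400g}")
```

Under these assumptions the two-tier fabric tops out at 32,768 GPUs. The calculation does not reproduce the 40% comparison, which depends on the port count of the competing device and the number of tiers it would need.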

Core to the Silicon One system is its support for enhanced Ethernet features such as improved flow control, congestion awareness, and congestion avoidance.

The system also includes advanced load-balancing capabilities and “packet-spraying” that spreads traffic across multiple GPUs or switches to avoid congestion and improve latency. Hardware-based link-failure recovery also helps ensure the network operates at peak efficiency, the company stated.
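To see why spraying helps, compare it with conventional per-flow load balancing, where every packet of a flow follows the same path. The sketch below is a simplified model, not Cisco's algorithm: it assumes a handful of hypothetical flows sharing four parallel links, uses a CRC32 hash to pin flows to links, and uses plain round-robin as a stand-in for packet spraying.

```python
import zlib


def per_flow_loads(flows, n_links):
    """Pin every packet of a flow to one link chosen by hashing the flow ID."""
    loads = [0] * n_links
    for flow_id, packets in flows.items():
        link = zlib.crc32(flow_id.encode()) % n_links
        loads[link] += packets
    return loads


def sprayed_loads(flows, n_links):
    """Spread packets across all links round-robin, ignoring flow boundaries."""
    loads = [0] * n_links
    next_link = 0
    for packets in flows.values():
        for _ in range(packets):
            loads[next_link] += 1
            next_link = (next_link + 1) % n_links
    return loads


# Hypothetical traffic: a few very large flows, typical of GPU-to-GPU transfers.
flows = {"gpu0->gpu7": 8000, "gpu1->gpu4": 6000, "gpu2->gpu5": 500, "gpu3->gpu6": 300}

hashed = per_flow_loads(flows, n_links=4)
sprayed = sprayed_loads(flows, n_links=4)
print("per-flow hashing:", hashed, "busiest link:", max(hashed))
print("packet spraying: ", sprayed, "busiest link:", max(sprayed))
```

With per-flow hashing, the busiest link always carries at least the largest flow in full, and two large flows can collide on the same link. Spraying keeps the links within one packet of each other in this model, at the cost of packets from one flow arriving over different paths.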

Combining these enhanced Ethernet technologies and taking them a step further ultimately lets customers set up what Cisco calls a Scheduled Fabric.

In a Scheduled Fabric, the physical components—chips, optics, switches—are tied together like one big modular chassis and communicate with each other to provide optimal scheduling behavior, Chopra said. “Ultimately what it translates to is much higher bandwidth throughput, especially for flows like AI/ML, which lets you get much lower job-completion time, which means that your GPUs run much more efficiently.”

With Silicon One devices and software, customers can deploy as many or as few of these features as they need, Chopra said.

Cisco is part of a growing AI networking market that includes Broadcom, Marvell, Arista and others. That market is expected to hit $10B by 2027, up from the $2B it is worth today, according to a recent blog from the 650 Group.

“AI networks have already been thriving for the past two years. In fact, we have been tracking AI/ML networking for nearly two years and see AI/ML as a massive opportunity for networking and one of the main drivers for data-center networking growth in our forecasts,” the 650 blog stated. “The key to AI/ML’s impact on networking is the tremendous amount of bandwidth AI models need to train, new workloads, and the powerful inference solutions that appear in the market. In addition, many verticals will go through multiple digitization efforts because of AI during the next 10 years.”

The Cisco Silicon One G200 and G202 are being tested by unidentified customers now and are available on a sampled basis, according to Chopra.

Michael Cooney is a Senior Editor with Network World who has written about the IT world for more than 25 years. He can be reached at michael_cooney@foundryco.com.
