Enfabrica looks to accelerate GPU communication

News

Sep 23, 2024 · 4 mins

CPUs and Processors, Data Center

SmartNICs are designed around CPU-to-CPU communication, making them less optimal for GPUs, the company argues.

Networking startup Enfabrica is making the rounds at trade shows to demonstrate its new networking products, which are specifically targeted to handle the heavy data throughput required for AI.

Enfabrica’s Accelerated Compute Fabric SuperNIC (ACF-S) silicon is designed to deliver higher bandwidth, greater resiliency, lower latency and greater programmatic control to data center operators running data-intensive AI and HPC.

The company came out of stealth mode last year, announcing a $125 million funding round led by Atreides Management with support from Nvidia – which is also in the SmartNIC business with its BlueField line – as well as several venture firms.

Shrijeet Mukherjee, who previously headed up networking platforms and architecture at Google, started the company in 2020 with CEO Rochan Sankar, previously a director of engineering at Broadcom. The two zeroed in on what they say is a problem with networking hardware: that it is built on 20-year-old designs that are just fine for CPUs but not adequate for GPU networking.

“If you look at what happened with data center networking, it sort of evolved into this kind of design where you had traffic that was coming in from one direction, and what you wanted is to be able to share it and distribute it to a whole bunch of nodes. But AI and ML systems break the mold a little bit,” said Mukherjee, chief development officer.

Enfabrica contends that in traditional data center environments, there’s a problem with server networking component sprawl and stovepipe connections that limit bandwidth and fault tolerance. In an AI environment, data movement across GPUs requires multiple hops, is prone to congestion, and results in unpredictable load distribution. Failure of a GPU link stalls the entire job.

“The design of today’s supercomputers is not very fault tolerant, and they have to really go through a lot of effort to handle failures correctly,” Mukherjee said.

Enfabrica's answer is to build fault tolerance into the network design. Rather than relying on point-to-point connections, it provides multiple paths from any point to any other point, so the load can be distributed. If a link fails, the system redistributes the load across the smaller number of remaining links.
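The redistribution idea can be illustrated with a minimal Python sketch. This is not Enfabrica's algorithm, which the company has not detailed here; the link names, the 800 Gbps figure, the even-split policy, and the `distribute_load` helper are all hypothetical choices made for illustration.

```python
def distribute_load(total_gbps, links):
    """Split total_gbps evenly across the links currently marked healthy.

    `links` maps a (hypothetical) link name to True if the link is up.
    An even split is an assumption; real fabrics may weight paths differently.
    """
    healthy = [name for name, is_up in links.items() if is_up]
    if not healthy:
        # With no surviving path the traffic cannot be carried at all.
        raise RuntimeError("no healthy links available")
    share = total_gbps / len(healthy)
    return {name: share for name in healthy}


# Four parallel paths between two endpoints (illustrative values only).
links = {"link0": True, "link1": True, "link2": True, "link3": True}
before = distribute_load(800.0, links)

# Simulate one link failing: the same traffic is spread over three links.
links["link2"] = False
after = distribute_load(800.0, links)

print(before)
print(after)
```

In this toy model a failure raises the per-link share rather than stalling the job, which is the contrast the company draws with point-to-point designs.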

“If you look at data centers today, it’s built around this model that a two-socket system is your working set. If things fit in that two-socket server, life is great. The moment it’s outside [those boundaries], it’s not that efficient,” said Mukherjee.

“We finally concluded that the architecture itself needs to change, and the way you solve that problem needs to be addressed,” Mukherjee said. “We said it has to be a silicon company. It has to be something that builds around this idea of what the modern system needs to look like and enables that in a fast and complete way.”

ACF-S delivers multi-terabit switching and bridging between heterogeneous compute and memory resources in a single silicon die without changing physical interfaces, protocols or software layers above device drivers. It reduces the device count, I/O latency hops, and power that today’s AI clusters devote to top-of-rack network switches, RDMA-over-Ethernet NICs, InfiniBand HCAs, PCIe/CXL switches, and CPU-attached DRAM.

CXL memory bridging allows it to deliver headless memory scaling to any accelerator, enabling a single GPU rack to have direct, low-latency, uncontended access to local CXL.mem DDR5 DRAM with more than 50 times the memory capacity of GPU-native High-Bandwidth Memory (HBM).

Enfabrica displayed its technology at a number of recent tech conferences, including Hot Chips, AI Summit, AI Hardware & Edge AI Summit, and Gestalt IT AI Tech Field Day. Next up is SuperComputing 2024, being held Nov. 17-22 in Atlanta.

Enfabrica has not said when it will ship its products.

by Andy Patrizio

Andy Patrizio is a freelance journalist based in southern California who has covered the computer industry for 20 years and has built every x86 PC he’s ever owned, laptops not included.

The opinions expressed in this blog are those of the author and do not necessarily represent those of ITworld, Network World, its parent, subsidiary or affiliated companies.
