The GPU is the mainstay of AI processing, but several companies think they have a better option.
Twenty years ago, Nvidia made a calculated decision to expand its focus from gaming to high-performance computing (HPC). So much of HPC is mathematics, and the GPU is, by design, a massive math coprocessor with thousands of cores operating in parallel.
That decision has played out well for Nvidia. In its most recent quarter, Nvidia posted record-high data center revenue of $14.5 billion, which is up 41% from the prior quarter and 279% from the year-ago quarter. Its GPUs are now the standard in AI processing, even more so than they are in gaming.
Naturally, there are plenty of companies coming for Nvidia’s crown. It’s not only the obvious competitors like AMD and Intel but also several startups that purport to have created better ways to process large language models (LLMs) and other elements of AI. These companies include SambaNova, Cerebras, Graphcore, Groq, xAI, and more. At the same time, Intel is pursuing a GPU alternative as well with its Gaudi3 processor (along with having the Max GPU line for data centers).
These vendors are chasing a massive opportunity: Precedence Research puts the AI hardware market at $43 billion in 2022 and rising to $240 billion by 2030.
Limitations of legacy GPU technology
The CPU is not ideal for dedicated processing like AI because it is a general-purpose processor, which means it’s doing a lot of things that it might not need to do, such as powering the system, says Glenn O’Donnell, senior vice president and analyst with Forrester Research.
“It’s burning power and using circuitry that’s not really necessary. So what if you could have a chip that’s optimized for a specific thing?” he said. “Google’s TensorFlow processor is probably one of the most glaring examples of that. It’s optimized for that tensor flow algorithm and the processing necessary to do tensor flow analytics. It’s not a compromise. It’s built for that purpose.”
The GPU has the same problem. The GPU was architected in the 1990s for 3-D gaming acceleration, and like the CPU, it could also stand to be more efficient, notes Daniel Newman, principal analyst with Futurum Research.
“In the general construct, the architecture is still kind of a kernel model, which [means] you do one thing at a time, and then you require a host chip to orchestrate all of the models, or other parts of the models, that need to be computed. And so there is a lot of intercommunication that has to happen between the processors, disassembling the model to break into pieces to feed each of those GPUs and reassembling it in order to actually construct the foundation models,” he said.
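The disassemble-and-reassemble pattern Newman describes can be sketched in a few lines. This is a toy illustration under stated assumptions, not how any vendor's stack actually works: plain Python lists stand in for GPUs and tensors, and the names `split_columns`, `device_matmul`, and `host_forward` are hypothetical helpers invented for this example.

```python
# Toy illustration only: plain Python lists stand in for GPUs and tensors.

def split_columns(weights, num_devices):
    """Split a weight matrix (a list of rows) into column shards, one per device."""
    num_cols = len(weights[0])
    base, extra = divmod(num_cols, num_devices)
    shards, start = [], 0
    for d in range(num_devices):
        width = base + (1 if d < extra else 0)
        shards.append([row[start:start + width] for row in weights])
        start += width
    return shards

def device_matmul(x, shard):
    """Work done on one 'device': multiply the input vector by its shard."""
    return [sum(xi * row[j] for xi, row in zip(x, shard))
            for j in range(len(shard[0]))]

def host_forward(x, weights, num_devices):
    """Host-side orchestration: scatter shards, run each device, gather results."""
    partials = [device_matmul(x, shard)
                for shard in split_columns(weights, num_devices)]
    # Reassemble the per-device partial outputs into one output vector.
    return [value for partial in partials for value in partial]

x = [1.0, 2.0]
w = [[1.0, 0.0, 2.0, 1.0],
     [0.0, 1.0, 1.0, 3.0]]
sharded = host_forward(x, w, num_devices=2)  # split across two "devices"
single = device_matmul(x, w)                 # same math on one "device"
```

Both paths produce the same numbers; the sharded one simply adds a scatter step and a gather step around the arithmetic, which is the intercommunication overhead the quote refers to.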
Elmer Morales, founder, CEO and head of engineering at Ainstein.com, a platform that allows individuals and businesses to create their own autonomous assistant, said that in the early days of AI and HPC, the industry started using these GPUs because they were already available and “sort of like the low hanging fruit.”
The pitch the GPU-alternative vendors are making is that they have built a better mousetrap.
“You will find that the GPU does a good job as far as general training for a broad range of things, and you can learn how to deploy them very, very quickly,” said Rodrigo Liang, co-founder and CEO of SambaNova Systems. “As you get into these really, really large models, you start to see some deficiencies. When you get to the size of GPT, you’re needing to run thousands of these chips. And ultimately, those chips are not running at great efficiency.”
James Wang, senior product marketing manager at Cerebras Systems, echoes the legacy design sentiment and says that the GPU chip is simply too small. Cerebras’ chip, the Wafer-Scale Engine-2 (WSE-2), is the size of an album cover. Whereas the Hopper GPU has a few thousand cores, WSE-2 has 850,000 cores, and the company claims 9,800 times the memory bandwidth of the GPU.
“The amount of memory determines how large-scale of a model you can train,” said Wang. “So if your starting point is a GPU, the maximum you can have is geared toward the size of the GPU and the accompanying memory. If you want to go larger, that problem becomes much more difficult. And you basically have to program around all the weak points of the GPU.”
Morales also said that the GPU is just too small for massive models, and the model has to be split among thousands of GPU chips for processing. “Latency aside, it’s just too small if the model doesn’t fit.” Eighty gigabytes – which is the amount of memory in an Nvidia H100 GPU – “is not enough for a large model,” he said.
By making a physically larger chip with more cores and more memory, more of a large language model can be processed on a per-chip basis, which means fewer chips overall are required to do the work. This translates to lower power draw, and power is a major concern when it comes to processor-intensive AI workloads.
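A rough back-of-the-envelope calculation shows why 80GB runs out quickly. The sketch below is an illustration under assumptions that are not from the article: weights are stored at 2 bytes per parameter (16-bit precision), a gigabyte is counted as 10^9 bytes, and the 175-billion-parameter figure is the publicly reported size of GPT-3, used only as an example. The function name `gpus_needed` is hypothetical.

```python
import math

def gpus_needed(num_params, bytes_per_param=2, gpu_memory_gb=80):
    """Minimum number of GPUs needed just to hold a model's weights.

    Assumes 16-bit weights by default and ignores activations, gradients,
    and optimizer state, so real deployments need more than this.
    """
    weight_bytes = num_params * bytes_per_param
    return math.ceil(weight_bytes / (gpu_memory_gb * 1e9))

# 175 billion parameters at 2 bytes each is 350GB of weights.
large_model = gpus_needed(175_000_000_000)
# A 7-billion-parameter model (14GB of weights) fits on a single 80GB GPU.
small_model = gpus_needed(7_000_000_000)
```

Under these assumptions the larger model needs at least five 80GB GPUs for its weights alone. Training typically needs considerably more memory than that, since gradients and optimizer state have to be held as well, which is part of why large models end up spread across so many chips.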
Chipmaker ecosystems bundle hardware and software
While the emphasis is on their processors, startups like Cerebras and SambaNova are more than just chipmakers; they are complete system developers. They provide the server hardware and a software stack to run the applications. Then again, so do Intel, AMD, and Nvidia. All three are known for their silicon, but they have major software efforts around AI.
The software ecosystems have served two purposes: to support the hardware and to lock customers into their respective platforms. “A GPU or even a CPU by itself is pretty useless,” said O’Donnell. “One of the reasons that Nvidia has become the juggernaut of this business is because of the moat, as everybody likes to call it, that they built around their CUDA platform. So replacing Nvidia GPU hardware with Intel hardware is not going to be all that straightforward because of the software ecosystem.”
Wang says that the AI industry as a whole, from Nvidia to Cerebras, is now embracing open-source software. Because that software is cross-platform, it helps customers avoid the kind of vendor or platform lock-in associated with Nvidia’s CUDA. Customers can choose the hardware they want rather than being forced to pick a platform based on the software available.
“The shift to open source is a very new phenomenon,” Wang said. “And it’s been very helpful for the industry, because the end result is that one person paid for it, but everyone else in the world gets to benefit from it.”
“We want to make sure that startups and our customers have choices, and that they can use multiple vendors and mix and match things and reprogram things as they see fit to avoid the network lock in,” said Ainstein’s Morales. Ainstein uses Grok systems from the Elon Musk-backed xAI, but its AI agents work on all platforms.
Evolving processor designs eye programmability
O’Donnell believes that the next step in the evolution of AI processing will be the emergence of custom, programmable chips, “FPGAs on steroids,” he said. “In FPGA you can reprogram it to do different things. And it’ll do those things pretty well. I think we’re going to see some real headway there, probably in the latter half of this decade.”
Morales concurs, stating that hardware vendors can’t be locked into one type of model. “Hardware manufacturers are going to have to offer similar chips that are programmable, that can be repurposed to run different models,” he said. “Consumers will have that choice where they can use a device for whatever, against whatever model they wish. So I feel like that is definitely a direction that the industry is going to be trending towards.”
O’Donnell doesn’t believe that most of these startups have much of a chance to dominate, especially up against monsters like Nvidia and Intel. “But I think some of them will find their niche and do well within that niche. I don’t know that any of them are going to explode on the scene. But who knows? Some of them may be acquired just to get some of their intellectual property,” he said.