Anirban Ghoshal
Senior Writer

Nvidia expands partnership with hyperscalers to boost AI training and development

News
Mar 19, 2024 | 4 mins
Cloud Computing, Generative AI, GPUs

As part of its extended collaboration with AWS, GCP, Microsoft, IBM, and Oracle, the chip designer will share its new Blackwell GPU platform and foundational models, and integrate its software across the hyperscalers’ platforms.

Nvidia Blackwell
Credit: Nvidia

Nvidia is extending its existing partnerships with hyperscalers Amazon Web Services (AWS), Google Cloud Platform, Microsoft Azure, and Oracle Cloud Infrastructure, to make available its latest GPUs and foundational large language models (LLMs), and to integrate its software across their platforms.

AWS, for instance, will offer Nvidia’s Blackwell GPU platform in its cloud, featuring the latest GB200 NVL72 server rack, which comes with 72 Blackwell GPUs and 36 Grace CPUs interconnected by NVLink, Nvidia’s high-speed GPU interconnect.

“When connected with Amazon’s powerful networking (EFA), and supported by advanced virtualization (AWS Nitro System) and hyper-scale clustering (Amazon EC2 UltraClusters), enterprises can scale to thousands of GB200 Superchips,” the companies said in a joint statement.
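The rack figures and the “thousands of GB200 Superchips” in the statement can be related with some simple arithmetic. The sketch below assumes that one GB200 Superchip pairs one Grace CPU with two Blackwell GPUs; that ratio comes from Nvidia’s published description of the part, not from this article. The 2,000-Superchip target is purely hypothetical, chosen only to illustrate the scale.

```python
import math

# Figures quoted in the article for one GB200 NVL72 rack.
GPUS_PER_RACK = 72
CPUS_PER_RACK = 36

# Assumption (not stated in the article): one GB200 Superchip
# pairs one Grace CPU with two Blackwell GPUs.
GPUS_PER_SUPERCHIP = 2
CPUS_PER_SUPERCHIP = 1


def superchips_per_rack() -> int:
    """Return the Superchip count implied by the rack's GPU and CPU totals."""
    by_gpu = GPUS_PER_RACK // GPUS_PER_SUPERCHIP
    by_cpu = CPUS_PER_RACK // CPUS_PER_SUPERCHIP
    if by_gpu != by_cpu:
        raise ValueError("GPU and CPU counts imply different Superchip totals")
    return by_gpu


def racks_needed(target_superchips: int) -> int:
    """Return the number of whole racks needed to reach a Superchip target."""
    return math.ceil(target_superchips / superchips_per_rack())


if __name__ == "__main__":
    print("Superchips per NVL72 rack:", superchips_per_rack())
    # Hypothetical cluster size, used only for illustration.
    print("Racks for 2,000 Superchips:", racks_needed(2000))
```

Under that assumption, each NVL72 rack holds 36 Superchips, so a cluster in the low thousands of Superchips would span several dozen racks.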

Further, the companies said they expect the availability of Nvidia’s Blackwell platform on AWS to speed up inference workloads for multi-trillion parameter LLMs.

Nvidia will also make the Blackwell GB200 GPUs available in the AWS cloud via its own DGX Cloud AI training service, which is hosted in other vendors’ clouds. DGX Cloud was initially only available in Microsoft Azure and Oracle Cloud Infrastructure, but last November AWS said it would begin offering it too.

Another feature of the expanded partnership is that Nvidia will offer its NIM microservices inside Amazon SageMaker, AWS’ machine learning platform, to help enterprises deploy foundational LLMs that are pre-compiled and optimized to run on Nvidia GPUs. This will reduce the time-to-market for generative AI applications, the companies said.

Other collaborations between AWS and Nvidia include making Nvidia’s BioNeMo foundational model available via AWS’ HealthOmics offering for generative chemistry, protein structure prediction, and understanding how drug molecules interact with targets. The two companies’ healthcare teams are also working together to launch generative AI microservices to advance drug discovery, medtech, and digital health, they said.

Google Cloud to get Blackwell-powered DGX Cloud

Google Cloud Platform, like AWS, will be getting the new Blackwell GPU platform and integrating Nvidia’s NIM suite of microservices into Google Kubernetes Engine (GKE) to speed up AI inferencing and deployment. In addition, Nvidia DGX Cloud is now generally available on Google Cloud A3 VM instances powered by Nvidia H100 Tensor Core GPUs, Google and Nvidia said in a joint statement.

The two companies are also extending their partnership to bring JAX, Google’s machine learning framework for transforming numerical functions, to Nvidia’s GPUs. This means that enterprises will be able to use JAX for LLM training on Nvidia’s H100 GPUs via MaxText and Accelerated Processing Kit (XPK), the companies said.

To help enterprises with data science and analytics, Google said that its Vertex AI machine learning platform will now support Google Cloud A3 VMs powered by Nvidia’s H100 GPUs and G2 VMs powered by Nvidia’s L4 Tensor Core GPUs.

“This provides MLops teams with scalable infrastructure and tooling to manage and deploy AI applications. Dataflow has also expanded support for accelerated data processing on Nvidia GPUs,” the companies said.

Oracle and Microsoft too

Other hyperscalers, such as Microsoft and Oracle, have also partnered with Nvidia to integrate the chipmaker’s hardware and software to beef up their offerings.

Not only are both companies adopting the Blackwell GPU platform across their services, they are also expected to adopt Blackwell-powered DGX Cloud.

IBM, on the other hand, said nothing about Nvidia hardware, but its consulting team will integrate Nvidia software components such as the NIM microservices suite to help enterprises on their AI development journeys.