As part of its extended collaboration with AWS, GCP, Microsoft, IBM, and Oracle, the chip designer will share its new Blackwell GPU platform and foundational models, and will integrate its software across the hyperscalers' platforms.
Nvidia is extending its existing partnerships with hyperscalers Amazon Web Services (AWS), Google Cloud Platform, Microsoft Azure, and Oracle Cloud Infrastructure to make its latest GPUs and foundational large language models (LLMs) available, and to integrate its software across their platforms.
AWS, for instance, will offer Nvidia's Blackwell GPU platform on its cloud, featuring the latest GB200 NVL72 server rack, which pairs 72 Blackwell GPUs with 36 Grace CPUs interconnected by NVLink, Nvidia's high-speed GPU interconnect.
“When connected with Amazon’s powerful networking (EFA), and supported by advanced virtualization (AWS Nitro System) and hyper-scale clustering (Amazon EC2 UltraClusters), enterprises can scale to thousands of GB200 Superchips,” the companies said in a joint statement.
Further, the companies said they expect the availability of Nvidia's Blackwell platform on AWS to speed up inference workloads for multi-trillion-parameter LLMs.
Nvidia will also make the Blackwell GB200 GPUs available in the AWS cloud via DGX Cloud, its own AI training service hosted in other vendors' clouds. DGX Cloud was initially available only on Microsoft Azure and Oracle Cloud Infrastructure, but last November AWS said it would begin offering it too.
Another feature of the expanded partnership is that Nvidia will offer its NIM microservices inside Amazon SageMaker, AWS’ machine learning platform, to help enterprises deploy foundational LLMs that are pre-compiled and optimized to run on Nvidia GPUs. This will reduce the time-to-market for generative AI applications, the companies said.
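NIM microservices package models behind an industry-standard, OpenAI-compatible REST API, which is what makes them portable across platforms such as SageMaker. As a rough illustration (the endpoint URL and model name below are placeholders, not details of the SageMaker integration), querying a deployed NIM service looks something like this:

```python
import requests

# Placeholder address of an already-deployed NIM container; in practice
# this would sit behind a SageMaker endpoint or a load balancer.
NIM_URL = "http://localhost:8000/v1/chat/completions"

payload = {
    "model": "meta/llama3-8b-instruct",  # illustrative NIM model id
    "messages": [
        {"role": "user", "content": "Summarize NVLink in one sentence."}
    ],
    "max_tokens": 128,
}

resp = requests.post(NIM_URL, json=payload, timeout=60)
resp.raise_for_status()
print(resp.json()["choices"][0]["message"]["content"])
```

Because the API mirrors the familiar chat-completions format, existing generative AI applications can be pointed at a NIM endpoint with minimal changes, which is the time-to-market argument the companies are making.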
Other collaborations between AWS and Nvidia include offering BioNeMo, Nvidia's foundational model for generative chemistry, protein structure prediction, and understanding how drug molecules interact with targets, through AWS' HealthOmics offering. The two companies' healthcare teams are also working together to launch generative AI microservices to advance drug discovery, medtech, and digital health, they said.
Google Cloud to get Blackwell-powered DGX Cloud
Google Cloud Platform, like AWS, will be getting the new Blackwell GPU platform and integrating Nvidia’s NIM suite of microservices into Google Kubernetes Engine (GKE) to speed up AI inferencing and deployment. In addition, Nvidia DGX Cloud is now generally available on Google Cloud A3 VM instances powered by NVIDIA H100 Tensor Core GPUs, Google and Nvidia said in a joint statement.
The two companies are also extending their partnership to bring JAX, Google's machine learning framework built around composable transformations of numerical functions, to Nvidia's GPUs. This means that enterprises will be able to use JAX for LLM training on Nvidia's H100 GPUs via MaxText and Accelerated Processing Kit (XPK), the companies said.
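JAX's portability is the point here: model code is written as pure functions, and the XLA compiler lowers them to whatever accelerator is available. A minimal sketch of that pattern (a toy regression step, not MaxText itself; every name here is illustrative):

```python
import jax
import jax.numpy as jnp

# Toy loss: mean-squared error of a linear model. Real LLM training via
# MaxText follows the same pattern: a pure loss function transformed by
# jax.grad and compiled by jax.jit.
def loss(params, x, y):
    w, b = params
    pred = x @ w + b
    return jnp.mean((pred - y) ** 2)

# XLA compiles this step for whatever backend is present, so the same
# code runs unchanged on CPUs, TPUs, or Nvidia H100 GPUs.
@jax.jit
def update(params, x, y):
    lr = 0.1  # illustrative learning rate
    grads = jax.grad(loss)(params, x, y)
    return jax.tree_util.tree_map(lambda p, g: p - lr * g, params, grads)

kx, kw = jax.random.split(jax.random.PRNGKey(0))
x = jax.random.normal(kx, (32, 8))
y = x @ jnp.ones((8, 1))  # synthetic targets
params = (jax.random.normal(kw, (8, 1)), jnp.zeros((1,)))

for _ in range(200):
    params = update(params, x, y)
print("final loss:", loss(params, x, y))  # approaches zero
```

MaxText applies the same idea at LLM scale, while XPK is the companion tooling for provisioning and scheduling such workloads on accelerator clusters.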
To help enterprises with data science and analytics, Google said that its Vertex AI machine learning platform will now support Google Cloud A3 VMs powered by Nvidia's H100 GPUs and G2 VMs powered by Nvidia's L4 Tensor Core GPUs.
“This provides MLops teams with scalable infrastructure and tooling to manage and deploy AI applications. Dataflow has also expanded support for accelerated data processing on Nvidia GPUs,” the companies said.
Oracle and Microsoft too
Other hyperscalers, such as Microsoft and Oracle, have also partnered with Nvidia to integrate the chipmaker's hardware and software to beef up their offerings.
Not only are both companies adopting the Blackwell GPU platform across their services, but both are also expected to offer Blackwell-powered DGX Cloud.
IBM, on the other hand, said nothing about Nvidia hardware — but its consulting team will integrate Nvidia software components such as the NIM microservices suite to help enterprises on their AI development journeys.