Azure powers intelligent services that have recently caught our attention, such as Microsoft Copilot, Bing, and Azure OpenAI Service.
Large language models (LLMs) are the secret sauce behind these services, enabling generative AI in a plethora of applications such as Microsoft Office 365, chatbots, and search engines.
How Microsoft uses LLMs to their fullest potential

But developing new LLMs, or improving the accuracy of existing ones, is a difficult task.
The most recent MLPerf 3.1 Training results demonstrate Azure's ongoing dedication to building high-caliber, high-performance cloud platforms that achieve unmatched efficiency in large-scale LLM training.
Azure trained the GPT-3 LLM model, with its 175 billion parameters, on 1,344 ND H100 v5 virtual machines (VMs), representing 10,752 NVIDIA H100 Tensor Core GPUs connected by the NVIDIA Quantum-2 InfiniBand networking infrastructure.
The workload stresses the H100 GPUs' Tensor Cores, the direct-attached Non-Volatile Memory Express (NVMe) disks, and the NVLink interconnect.
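The figures above fit together with simple arithmetic: each ND H100 v5 VM carries 8 H100 GPUs, so 1,344 VMs yield the 10,752 GPUs cited. A minimal sketch of that sizing calculation (the per-VM GPU count comes from the ND H100 v5 VM specification; the constant name is our own):

```python
# Cluster sizing arithmetic from the figures in this article.
# Each ND H100 v5 VM carries 8 NVIDIA H100 GPUs (per the Azure VM spec).
GPUS_PER_ND_H100_V5_VM = 8
vm_count = 1_344

total_gpus = vm_count * GPUS_PER_ND_H100_V5_VM
print(total_gpus)  # 10752, matching the GPU count cited above
```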
Azure has made remarkable progress in scaling up training, as evidenced by the largest submission in the history of MLPerf Training.
Compared to the NVIDIA bare-metal submission, this translates to just a 2 percent increase in training time in Azure, delivering best-in-class virtual machine performance among cloud HPC instance offerings.

For more details visit govindhtech.com
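To make the 2 percent figure concrete, a brief sketch of what that overhead means relative to a bare-metal baseline. The baseline duration below is hypothetical, chosen only for illustration; MLPerf publishes the actual wall-clock results:

```python
# What a "2 percent increase in training time" means versus bare metal.
bare_metal_minutes = 100.0  # hypothetical baseline, not from the article
azure_overhead = 0.02       # the ~2% figure cited above

azure_minutes = bare_metal_minutes * (1 + azure_overhead)
print(azure_minutes)  # 102.0
```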