New AMD ROCm 6.3 Release Expands AI and HPC Horizons

Introducing SGLang, a new runtime optimized for inference on state-of-the-art generative models such as LLMs and VLMs on AMD Instinct GPUs

Research shows SGLang can deliver up to 6x higher throughput on LLM inference than current systems, enabling companies to support AI applications at scale
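
To make this concrete, here is a minimal sketch of driving SGLang from Python. It assumes a server has already been launched separately (for example with python -m sglang.launch_server), and the endpoint URL and question are placeholders for your own deployment:

```python
# Minimal SGLang frontend sketch; assumes a server is already running, e.g.:
#   python -m sglang.launch_server --model-path <your-model> --port 30000
import sglang as sgl

# Point the SGLang frontend at the running server (placeholder endpoint).
sgl.set_default_backend(sgl.RuntimeEndpoint("http://localhost:30000"))

@sgl.function
def answer(s, question):
    s += sgl.user(question)                               # user turn
    s += sgl.assistant(sgl.gen("reply", max_tokens=128))  # generated model turn

state = answer.run(question="What does ROCm 6.3 add for LLM inference?")
print(state["reply"])
```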

With Python pre-installed and pre-configured in the ROCm Docker containers, developers can quickly build scalable cloud backends
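
As one illustration of such a backend, the sketch below calls an SGLang server through its OpenAI-compatible API from inside the container; the port and model name are placeholder assumptions for whatever the server is actually serving:

```python
# Minimal cloud-backend sketch: query the SGLang server's OpenAI-compatible
# endpoint. Port 30000 and the model name are placeholders for your deployment.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:30000/v1", api_key="EMPTY")

resp = client.chat.completions.create(
    model="placeholder-model",  # whichever model the server was launched with
    messages=[{"role": "user", "content": "Hello from a ROCm container!"}],
    max_tokens=64,
)
print(resp.choices[0].message.content)
```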

AMD addresses the memory and compute bottlenecks of standard attention with a FlashAttention-2 build optimized for ROCm 6.3, enabling faster, more efficient training and inference
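
For a flavor of the developer experience, here is a short sketch using the flash-attn Python package, assuming the ROCm build exposes the usual flash_attn_func interface:

```python
# Fused attention via flash-attn; assumes the ROCm build of FlashAttention-2
# provides the standard flash_attn_func interface.
import torch
from flash_attn import flash_attn_func

# Shapes: (batch, seqlen, num_heads, head_dim); fp16/bf16 tensors on the GPU.
# Note: PyTorch on ROCm reuses the "cuda" device name for AMD GPUs.
q = torch.randn(2, 1024, 16, 64, dtype=torch.float16, device="cuda")
k = torch.randn_like(q)
v = torch.randn_like(q)

out = flash_attn_func(q, k, v, causal=True)  # one fused kernel, no full S x S matrix
print(out.shape)  # torch.Size([2, 1024, 16, 64])
```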

ROCm 6.3 also introduces AMD's next-generation Fortran compiler, which delivers:

Direct GPU Offloading: Use OpenMP offloading to target AMD Instinct GPUs and accelerate key scientific applications

Backward Compatibility: Build on existing Fortran code while taking advantage of AMD's next-generation GPU capabilities

Streamlined Integrations: Seamlessly connect to ROCm libraries and HIP kernels, removing the need for complex code rewrites