Intel GPU Multi-Layer Perceptrons with SYCL

Intel proudly introduces the first SYCL implementation of fully-fused Multi-Layer Perceptrons (MLPs) on Intel GPUs that support Intel XMX (Xe Matrix Extensions) instructions
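"Fully-fused" means the whole stack of layers runs inside a single kernel, so intermediate activations stay in registers or shared local memory instead of round-tripping through global memory between layers. A minimal sketch of the computation being fused, with NumPy standing in for the SYCL/XMX kernels (the function name and shapes are illustrative, not the project's API):

```python
import numpy as np

def mlp_forward(x, weights):
    """Forward pass of an MLP with ReLU hidden layers.

    An unfused implementation writes each layer's result to global
    memory; the fully-fused kernel keeps `h` on-chip across all
    layers. This NumPy version only illustrates the math.
    """
    h = x
    for i, W in enumerate(weights):
        h = h @ W
        if i < len(weights) - 1:      # no activation after the output layer
            h = np.maximum(h, 0.0)    # ReLU
    return h

rng = np.random.default_rng(0)
# batch of 8 inputs, four width-64 layers (narrow fixed widths are
# typical for fused MLPs, since the weights must fit on-chip)
weights = [rng.standard_normal((64, 64)) * 0.1 for _ in range(4)]
x = rng.standard_normal((8, 64))
y = mlp_forward(x, weights)
print(y.shape)  # (8, 64)
```

Fusing pays off precisely because these layers are small: the matmuls are memory-bound when each one is a separate kernel, so keeping activations on-chip removes the dominant cost.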

It outperforms the CUDA PyTorch version running on Nvidia's H100 GPU by up to a factor of 19, and it beats the off-the-shelf Intel Extension for PyTorch (IPEX) implementation running on the same Intel GPU by up to a factor of 30

This work presents a SYCL implementation of Multi-Layer Perceptrons (MLPs) optimised for the Intel Data Center GPU Max 1550

Compared with an equivalent CUDA implementation running on Nvidia's H100 GPU, the Intel Data Center GPU Max 1550 implementation is up to 2.84 times faster in inference and 1.75 times faster in training

Across all tested settings, Intel's implementation surpasses CUDA PyTorch on Nvidia's H100 GPU by up to a factor of 19 and the off-the-shelf Intel Extension for PyTorch (IPEX) on the same Intel GPU by up to a factor of 30

Intel's method enables high-throughput training and inference thanks to its efficient use of Intel Data Center GPUs

The approach also provides Python bindings that elegantly integrate GPU-accelerated MLPs into PyTorch applications
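The text does not spell out the binding API, so the following is a hypothetical sketch of the usual pattern such bindings follow: a thin Python class hands tensors to a compiled device kernel. The class name `FusedMLP` and the kernel function are assumptions, not the project's actual API, and NumPy stands in for the compiled SYCL kernel so the sketch runs:

```python
import numpy as np

def _fused_forward_kernel(x, weights):
    # Stand-in for the compiled SYCL kernel; a real binding would
    # dispatch to device code through a pybind11/torch extension
    h = x
    for W in weights[:-1]:
        h = np.maximum(h @ W, 0.0)  # hidden layers with ReLU
    return h @ weights[-1]          # linear output layer

class FusedMLP:
    """Hypothetical wrapper mirroring how a GPU-accelerated MLP
    might be exposed to Python (all names are illustrative)."""

    def __init__(self, width, n_hidden, seed=0):
        rng = np.random.default_rng(seed)
        self.weights = [rng.standard_normal((width, width)) * 0.1
                        for _ in range(n_hidden + 1)]

    def __call__(self, x):
        return _fused_forward_kernel(x, self.weights)

mlp = FusedMLP(width=64, n_hidden=3)
out = mlp(np.ones((4, 64)))
print(out.shape)  # (4, 64)
```

In a real integration the wrapper would subclass `torch.nn.Module` and register a custom autograd function so the fused forward and backward kernels slot into PyTorch's training loop.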

Intel benchmarked its SYCL implementation running on an Intel Data Center GPU Max 1550 against the CUDA implementation and PyTorch running on an Nvidia H100 GPU