Intel AMX Speeds Up PyTorch Training and Inference

PyTorch, a deep learning framework based on Torch, is used mostly in computer vision and natural language processing.

Each core of a 4th Gen Intel Xeon Scalable processor includes a built-in accelerator, Intel Advanced Matrix Extensions (Intel AMX), which speeds up deep learning training and inference workloads.
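
On Linux, you can confirm that the hardware feature is actually present before benchmarking anything. A minimal, Linux-specific check (the amx_tile, amx_bf16, and amx_int8 flag names are what the kernel reports in /proc/cpuinfo):

```python
# Linux-specific sketch: the kernel advertises Intel AMX capability
# through the amx_tile, amx_bf16, and amx_int8 flags in /proc/cpuinfo.
def amx_flags():
    with open("/proc/cpuinfo") as f:
        cpuinfo = f.read()
    return {flag: flag in cpuinfo for flag in ("amx_tile", "amx_bf16", "amx_int8")}

print(amx_flags())
```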

3rd Gen Intel Xeon processors handle FP32 workloads via Intel AVX-512 instructions, whereas Intel AMX operates on the lower-precision BF16 and INT8 data types.

Bfloat16 (BF16) is a floating-point format that covers approximately the same dynamic range as 32-bit floating-point numbers while using only 16 bits of memory.
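
The range claim is easy to verify from PyTorch itself: torch.finfo shows that BF16's maximum representable value is of the same order as FP32's, while FP16's range is far smaller:

```python
import torch

# BF16 keeps FP32's 8-bit exponent, so its representable range matches
# FP32; only the mantissa (precision) is reduced. FP16 trades the other
# way: more mantissa bits, but a much smaller exponent range.
print(torch.finfo(torch.float32).max)   # ~3.4e38
print(torch.finfo(torch.bfloat16).max)  # ~3.4e38 -- same order as FP32
print(torch.finfo(torch.float16).max)   # 65504.0 -- far smaller range
```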

After instantiating the ResNet50 model, apply the Intel Extension for PyTorch optimize() method to the model and the chosen training optimizer.
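
A minimal sketch of that flow, assuming intel_extension_for_pytorch is installed; the SGD hyperparameters and the random batch are illustrative stand-ins for a real training pipeline:

```python
import torch
import torchvision
import intel_extension_for_pytorch as ipex

# Instantiate ResNet50 and a training optimizer, then hand both to
# ipex.optimize(), which returns versions prepared for BF16 execution.
model = torchvision.models.resnet50()
criterion = torch.nn.CrossEntropyLoss()
optimizer = torch.optim.SGD(model.parameters(), lr=0.001, momentum=0.9)
model.train()

model, optimizer = ipex.optimize(model, optimizer=optimizer, dtype=torch.bfloat16)

# One illustrative training step on random data; under CPU autocast,
# eligible ops run in BF16 and dispatch to Intel AMX kernels when the
# hardware supports them.
data = torch.rand(32, 3, 224, 224)
target = torch.randint(0, 1000, (32,))
with torch.autocast(device_type="cpu", dtype=torch.bfloat16):
    output = model(data)
    loss = criterion(output, target)
loss.backward()
optimizer.step()
optimizer.zero_grad()
```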

Intel AMX BF16 and INT8 deliver superior performance compared with FP32.

Intel AMX INT8 is compared against Intel AVX-512 Vector Neural Network Instructions (VNNI) INT8, the prior instruction set for INT8 operations.
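
One way to run such a comparison on a single 4th Gen Xeon machine is to cap the instruction set oneDNN may use via its ONEDNN_MAX_CPU_ISA environment variable. The sketch below is a rough proxy, not the article's methodology: it uses PyTorch dynamic quantization as the INT8 workload and routes it through the oneDNN engine where the build supports it; whether the ISA cap governs a given INT8 path depends on the PyTorch and oneDNN versions.

```python
import os
# Cap the ISA oneDNN may use *before* importing torch; run once with the
# VNNI cap and once with "AVX512_CORE_AMX" (or no cap) to compare timings.
os.environ.setdefault("ONEDNN_MAX_CPU_ISA", "AVX512_CORE_VNNI")

import time
import torch

# Route PyTorch's quantized kernels through oneDNN where the build allows,
# so the ISA cap above actually governs the INT8 path.
if "onednn" in torch.backends.quantized.supported_engines:
    torch.backends.quantized.engine = "onednn"

# Dynamically quantize a large linear layer to INT8 as a proxy workload.
model = torch.nn.Sequential(torch.nn.Linear(4096, 4096))
qmodel = torch.ao.quantization.quantize_dynamic(
    model, {torch.nn.Linear}, dtype=torch.qint8
)

x = torch.randn(256, 4096)
qmodel(x)  # warm-up
start = time.time()
for _ in range(100):
    qmodel(x)
print(f"{os.environ['ONEDNN_MAX_CPU_ISA']}: {time.time() - start:.3f} s")
```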

For the INT8 run cases, the Intel Extension for PyTorch quantization capability is used to quantize the original FP32 model.
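
A sketch of that static quantization flow, following the prepare/calibrate/convert recipe from the Intel Extension for PyTorch documentation (exact API names have shifted across IPEX releases, and the random calibration batches stand in for a real calibration set):

```python
import torch
import torchvision
import intel_extension_for_pytorch as ipex
from intel_extension_for_pytorch.quantization import prepare, convert

# Start from the original FP32 model and a representative example input.
model = torchvision.models.resnet50().eval()
example_input = torch.rand(1, 3, 224, 224)

# Insert observers using IPEX's default static quantization config.
qconfig = ipex.quantization.default_static_qconfig
prepared = prepare(model, qconfig, example_inputs=example_input, inplace=False)

# Calibrate on a handful of batches so the observers collect ranges.
with torch.no_grad():
    for _ in range(10):
        prepared(torch.rand(1, 3, 224, 224))

# Convert to INT8, then trace and freeze the graph for deployment.
quantized = convert(prepared)
with torch.no_grad():
    traced = torch.jit.trace(quantized, example_input)
    traced = torch.jit.freeze(traced)
    output = traced(example_input)
```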