Intel AMX Speeds Up PyTorch Training and Inference
PyTorch, a deep learning framework based on Torch, is used mostly in computer vision and natural language processing.
Each 4th generation Intel Xeon processor core has an integrated accelerator called Intel AMX (Advanced Matrix Extensions), which speeds up deep learning training and inference workloads.
3rd generation Intel Xeon processors provide FP32 operations via Intel AVX-512 instructions, whereas Intel AMX supports only the BF16 and INT8 data types.
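On Linux, one quick way to confirm that a given machine exposes Intel AMX is to look for the amx_* flags the kernel reports in /proc/cpuinfo; a minimal sketch:

```python
# A minimal sketch: on Linux, AMX support shows up as CPU feature flags,
# so we can check /proc/cpuinfo for the amx_* entries.
with open("/proc/cpuinfo") as f:
    cpu_flags = f.read()

for flag in ("amx_tile", "amx_bf16", "amx_int8"):
    print(flag, "supported" if flag in cpu_flags else "not found")
```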
Bfloat16 is a floating-point format that covers approximately the same dynamic range as 32-bit floating-point numbers while using only 16 bits of computer memory.
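A small illustration of that trade-off: converting FP32 values to BF16 in PyTorch preserves magnitude across the range but drops significant digits.

```python
import torch

# BF16 keeps FP32's 8-bit exponent (hence roughly the same dynamic range)
# but truncates the mantissa from 23 bits to 7, halving the memory per value.
x = torch.tensor([1e-30, 3.14159265, 1e38], dtype=torch.float32)
print(x.to(torch.bfloat16))
# Very large and very small magnitudes survive the conversion;
# only the number of significant digits drops.
```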
After instantiating the ResNet-50 model, call the optimize() method from Intel Extension for PyTorch on the model and the preferred training optimizer.
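A minimal sketch of that step, assuming the standard ipex.optimize() API from Intel Extension for PyTorch (the batch size, learning rate, and random data below are illustrative placeholders):

```python
import torch
import torchvision.models as models
import intel_extension_for_pytorch as ipex

# Instantiate ResNet-50 and a training optimizer, then hand both to
# ipex.optimize(). Passing dtype=torch.bfloat16 lets Intel AMX BF16
# kernels be used where the hardware supports them.
model = models.resnet50()
criterion = torch.nn.CrossEntropyLoss()
optimizer = torch.optim.SGD(model.parameters(), lr=0.001)
model.train()

model, optimizer = ipex.optimize(model, optimizer=optimizer, dtype=torch.bfloat16)

# One training step with CPU mixed precision (random placeholder data).
data = torch.rand(16, 3, 224, 224)
labels = torch.randint(0, 1000, (16,))
with torch.cpu.amp.autocast(dtype=torch.bfloat16):
    loss = criterion(model(data), labels)
loss.backward()
optimizer.step()
optimizer.zero_grad()
```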
Intel AMX BF16 and INT8 deliver better performance than FP32.
Intel AMX INT8 is compared against Intel AVX-512 Vector Neural Network Instructions (VNNI) INT8, the prior instruction set for INT8 operations.
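One way to see which of these instruction sets the underlying oneDNN library actually dispatches to is its verbose mode: setting ONEDNN_VERBOSE=1 before the first operation prints one log line per primitive, including an ISA tag (for example, an avx512_core_vnni kernel versus an amx-flavored one). A hedged sketch:

```python
import os
# Must be set before the first oneDNN call, i.e. before running any op.
os.environ["ONEDNN_VERBOSE"] = "1"

import torch

# Run a small convolution under BF16 autocast; each verbose line names the
# kernel that executed, so the ISA in use (AVX-512 vs. AMX) is visible.
conv = torch.nn.Conv2d(3, 16, 3)
with torch.no_grad(), torch.cpu.amp.autocast(dtype=torch.bfloat16):
    conv(torch.rand(1, 3, 32, 32))
```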
For the INT8 run cases, the quantization capability of Intel Extension for PyTorch is used to quantize the original FP32 model.
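A minimal sketch of that INT8 flow, assuming Intel Extension for PyTorch's static quantization API (helper names such as default_static_qconfig can vary across IPEX releases, and the calibration batch here is a random placeholder):

```python
import torch
import torchvision.models as models
import intel_extension_for_pytorch as ipex
from intel_extension_for_pytorch.quantization import prepare, convert

# Start from the original FP32 model in eval mode.
model = models.resnet50().eval()
example_input = torch.rand(1, 3, 224, 224)

# Insert observers for static quantization.
qconfig = ipex.quantization.default_static_qconfig
prepared = prepare(model, qconfig, example_inputs=example_input, inplace=False)

# Calibrate with representative data (a single random batch here).
with torch.no_grad():
    prepared(example_input)

# Convert to INT8, then trace and freeze for inference.
quantized = convert(prepared)
with torch.no_grad():
    traced = torch.jit.trace(quantized, example_input)
    traced = torch.jit.freeze(traced)
    output = traced(example_input)
```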