PyTorch 2.5: Leveraging Intel AMX For Faster FP16 Inference
Intel AMX improves your AI performance and makes it simpler to achieve. It is an integrated accelerator on Intel Xeon Scalable CPUs, designed to meet the computational demands of deep learning applications
cuDNN Backend for SDPA: SDPA users with H100 or newer GPUs can benefit from speedups by default thanks to the cuDNN backend for SDPA
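The backend can also be requested explicitly through the torch.nn.attention context manager. A minimal sketch, assuming a CUDA build of PyTorch 2.5 and an H100-class GPU (the tensor shapes are illustrative):

```python
import torch
import torch.nn.functional as F
from torch.nn.attention import SDPBackend, sdpa_kernel

# Force the cuDNN SDPA backend explicitly; on supported hardware,
# PyTorch 2.5 can also select it by default.
q, k, v = (torch.randn(2, 8, 1024, 64, device="cuda", dtype=torch.float16)
           for _ in range(3))

with sdpa_kernel(SDPBackend.CUDNN_ATTENTION):
    out = F.scaled_dot_product_attention(q, k, v)
```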
Increased GPU Support: PyTorch 2.5 now supports Intel GPUs and adds tooling to enhance AI programming on both client and data center hardware
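A minimal sketch of targeting an Intel GPU through the "xpu" device, assuming a PyTorch 2.5 build with Intel GPU support (the layer and shapes are illustrative):

```python
import torch

# Fall back to CPU if no Intel GPU is available in this build.
device = "xpu" if torch.xpu.is_available() else "cpu"

model = torch.nn.Linear(512, 512).to(device)
x = torch.randn(32, 512, device=device)
print(model(x).device)
```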
torch.compile Improvements: torch.compile has been improved to deliver better inference and training performance across a variety of deep learning tasks
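A short sketch of compiling a small model with the default TorchInductor backend (the MLP here is illustrative, not from the release notes):

```python
import torch

class MLP(torch.nn.Module):
    def __init__(self):
        super().__init__()
        self.net = torch.nn.Sequential(
            torch.nn.Linear(128, 256),
            torch.nn.ReLU(),
            torch.nn.Linear(256, 10),
        )

    def forward(self, x):
        return self.net(x)

model = MLP().eval()
compiled = torch.compile(model)  # default TorchInductor backend
with torch.no_grad():
    out = compiled(torch.randn(8, 128))
```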
TorchInductor C++ Backend: The TorchInductor C++ backend is now available on Windows, improving the experience for AI developers working in Windows environments
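In practice this means the same torch.compile workflow now works for CPU code on Windows. A minimal sketch, assuming PyTorch 2.5 on a Windows machine with a working C++ toolchain installed:

```python
import torch

# Compiling a CPU function on Windows now lowers through the
# TorchInductor C++ backend rather than being unsupported.
fn = torch.compile(lambda x: torch.sin(x) + torch.cos(x))
print(fn(torch.randn(4)))
```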
FP16 Datatype Optimization: The FP16 datatype is now enabled and optimized via Intel Advanced Matrix Extensions in both TorchInductor and eager mode, improving inference performance on the newest Intel data center CPU architectures
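A hedged sketch of FP16 inference on CPU in both modes; the model and shapes are illustrative, and the AMX acceleration applies only on CPUs with AMX FP16 support:

```python
import torch

# FP16 inference on CPU; on AMX-FP16-capable Xeon parts the heavy
# matrix ops can run on AMX tiles.
model = torch.nn.Sequential(
    torch.nn.Linear(1024, 1024),
    torch.nn.ReLU(),
).eval().half()

x = torch.randn(16, 1024).half()
with torch.no_grad():
    eager_out = model(x)                    # eager-mode FP16 path
    compiled_out = torch.compile(model)(x)  # TorchInductor FP16 path
```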
SYCL Kernels: SYCL kernels expand ATen operator coverage and execution on Intel GPUs, boosting PyTorch eager-mode performance
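A small sketch, assuming an "xpu"-enabled PyTorch 2.5 build: the eager-mode elementwise and reduction ops below dispatch to SYCL-backed ATen kernels on the Intel GPU:

```python
import torch

# Ordinary eager-mode tensor ops, executed on the Intel GPU.
a = torch.randn(1024, 1024, device="xpu")
b = torch.randn(1024, 1024, device="xpu")
c = (a * b + a).clamp(min=0).sum()  # elementwise ops plus a reduction
print(c.item())
```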