PyTorch 2.7 introduces significant performance improvements for Intel GPUs, supporting both eager mode and graph mode (torch.compile) on Windows and Linux
Includes compatibility with Intel Arc A-Series and B-Series GPUs, Intel Core Ultra processors, and the Intel Data Center GPU Max Series, with simplified installation via torch-xpu PIP wheels
Intel GPUs are the first accelerators to support torch.compile on Windows, enabling graph mode compilation for improved inference and training performance
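As a rough illustration, compiling a model on an Intel GPU looks the same as on other accelerators; this sketch assumes a PyTorch 2.7 build with XPU support (for example, installed from the torch-xpu wheels) and falls back to CPU otherwise:

```python
import torch

# Pick the Intel GPU ("xpu") device when available; CPU fallback keeps
# the script runnable on machines without an XPU build.
device = "xpu" if torch.xpu.is_available() else "cpu"

model = torch.nn.Sequential(
    torch.nn.Linear(512, 512),
    torch.nn.ReLU(),
    torch.nn.Linear(512, 10),
).to(device)

# torch.compile switches from eager execution to graph-mode compilation;
# as of PyTorch 2.7 this also works for Intel GPUs on Windows.
compiled_model = torch.compile(model)

x = torch.randn(8, 512, device=device)
with torch.no_grad():
    out = compiled_model(x)
print(out.shape)  # torch.Size([8, 10])
```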
Scaled dot-product attention inference performance is optimized for bfloat16 and float16 data types, achieving up to 3x speed improvements for Stable Diffusion on Intel GPUs
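A minimal sketch of SDPA inference in bfloat16, one of the reduced-precision data types whose kernels were optimized (the tensor shapes here are illustrative):

```python
import torch
import torch.nn.functional as F

device = "xpu" if torch.xpu.is_available() else "cpu"

# Query/key/value in (batch, heads, seq_len, head_dim) layout, bfloat16.
q, k, v = (
    torch.randn(2, 8, 1024, 64, device=device, dtype=torch.bfloat16)
    for _ in range(3)
)

# The optimized attention kernels are selected automatically by SDPA.
with torch.no_grad():
    out = F.scaled_dot_product_attention(q, k, v)
print(out.shape, out.dtype)  # torch.Size([2, 8, 1024, 64]) torch.bfloat16
```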
PyTorch 2 Export (PT2E) quantization is enabled on Intel GPUs, providing a full graph mode quantization pipeline with enhanced computational efficiency
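The PT2E flow follows the same export/prepare/calibrate/convert steps on every backend; the sketch below uses the x86 Inductor quantizer, whose import path is well established, and the Intel GPU path swaps in an analogous XPU quantizer:

```python
import torch
from torch.ao.quantization.quantize_pt2e import prepare_pt2e, convert_pt2e
from torch.ao.quantization.quantizer.x86_inductor_quantizer import (
    X86InductorQuantizer,
    get_default_x86_inductor_quantization_config,
)

model = torch.nn.Sequential(torch.nn.Linear(64, 64), torch.nn.ReLU()).eval()
example_inputs = (torch.randn(1, 64),)

# 1. Export the model into a full graph.
exported = torch.export.export_for_training(model, example_inputs).module()

# 2. Prepare the graph with a quantizer, then run calibration data through it.
quantizer = X86InductorQuantizer()
quantizer.set_global(get_default_x86_inductor_quantization_config())
prepared = prepare_pt2e(exported, quantizer)
prepared(*example_inputs)  # calibration pass

# 3. Convert to a quantized graph and lower it with torch.compile.
quantized = convert_pt2e(prepared)
optimized = torch.compile(quantized)
print(optimized(*example_inputs).shape)
```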
Developers can now analyze model performance on both Linux and Windows using the built-in profiler
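A minimal profiling sketch; ProfilerActivity.XPU assumes an XPU-enabled build, and the same code works on Windows and Linux:

```python
import torch
from torch.profiler import profile, ProfilerActivity

device = "xpu" if torch.xpu.is_available() else "cpu"
model = torch.nn.Linear(1024, 1024).to(device)
x = torch.randn(64, 1024, device=device)

# Record CPU activity always, and XPU kernel activity when available.
activities = [ProfilerActivity.CPU]
if device == "xpu":
    activities.append(ProfilerActivity.XPU)

with profile(activities=activities) as prof:
    with torch.no_grad():
        model(x)

# Summarize the hottest operators.
print(prof.key_averages().table(sort_by="cpu_time_total", row_limit=5))
```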
PyTorch 2.7 adds compatibility with NVIDIA’s Blackwell architecture, including pre-built CUDA 12.8 wheels for Linux x86 and arm64 architectures
Mega Cache, a beta feature offering portable caching for torch.compile, allowing users to save compiler artifacts and reuse them across machines for faster warm compilations
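A sketch of the intended workflow, assuming the torch.compiler.save_cache_artifacts and torch.compiler.load_cache_artifacts APIs described for this feature:

```python
import torch

# Compile once so there are artifacts to serialize.
model = torch.compile(torch.nn.Linear(128, 128))
model(torch.randn(4, 128))

# Machine A: serialize every compiler artifact produced so far.
result = torch.compiler.save_cache_artifacts()
if result is not None:
    artifact_bytes, cache_info = result
    with open("compile_cache.bin", "wb") as f:
        f.write(artifact_bytes)

# Machine B: pre-populate the compiler caches before compiling the
# same model, avoiding a cold compilation.
with open("compile_cache.bin", "rb") as f:
    torch.compiler.load_cache_artifacts(f.read())
```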
Improved support for LLM inference on x86 CPUs through FlexAttention, including a new decoding backend, trainable attention biases, and nested jagged tensor (NJT) layouts for optimized performance
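As a rough FlexAttention sketch, a trainable bias can be folded into the attention scores via score_mod (the head_bias parameter below is illustrative, not a library API):

```python
import torch
from torch.nn.attention.flex_attention import flex_attention

B, H, S, D = 1, 4, 256, 64
q, k, v = (torch.randn(B, H, S, D) for _ in range(3))

# A learnable per-head offset; gradients flow through score_mod.
head_bias = torch.zeros(H, requires_grad=True)

def score_mod(score, b, h, q_idx, kv_idx):
    # Add the per-head bias to every attention score.
    return score + head_bias[h]

out = flex_attention(q, k, v, score_mod=score_mod)
print(out.shape)  # torch.Size([1, 4, 256, 64])
```

In practice the flex_attention call is typically wrapped in torch.compile so that a fused kernel is generated for the chosen score_mod.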
Enables context parallelism for scaled dot-product attention via the Context Parallel API, supporting the cuDNN, Efficient, and Flash attention backends; this is particularly useful for LLM training on long sequences
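A minimal sketch of the experimental Context Parallel API; this must run under a multi-process launcher, for example `torchrun --nproc-per-node=2 cp_sdpa.py` on CUDA GPUs:

```python
import torch
import torch.distributed as dist
import torch.nn.functional as F
from torch.distributed.device_mesh import init_device_mesh
from torch.distributed.tensor.experimental import context_parallel

dist.init_process_group("nccl")
torch.cuda.set_device(dist.get_rank())
mesh = init_device_mesh("cuda", (dist.get_world_size(),))

B, H, S, D = 2, 8, 4096, 64
q, k, v = (
    torch.randn(B, H, S, D, device="cuda", dtype=torch.bfloat16)
    for _ in range(3)
)

# Inside the context, SDPA shards the sequence dimension (dim 2 of each
# buffer) across ranks and coordinates the attention computation.
with context_parallel(mesh, buffers=[q, k, v], buffer_seq_dims=[2, 2, 2]):
    out = F.scaled_dot_product_attention(q, k, v, is_causal=True)

dist.destroy_process_group()
```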