TorchDynamo: A Method for Improving PyTorch Code Performance
TorchDynamo works by hooking into Python's frame evaluation process, made possible by PEP 523, and examining Python bytecode just before each frame executes.
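The graphs Dynamo captures this way can be inspected by passing a custom backend to torch.compile; the sketch below (my_backend is an illustrative name, not part of the API) simply prints the FX graph Dynamo extracted from the bytecode and then runs it unchanged:

    import torch

    def my_backend(gm, example_inputs):
        # Dynamo hands the captured FX GraphModule to the backend
        gm.graph.print_tabular()
        return gm.forward  # run the captured graph as-is, no optimization

    @torch.compile(backend=my_backend)
    def fn(x):
        return torch.relu(x) + 1

    fn(torch.randn(4))  # first call triggers bytecode analysis and graph capture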
TorchDynamo and related technologies represent a major step forward in PyTorch's capacity to capture and optimize machine learning models effectively.
It's important to note that while the name TorchDynamo originally described the whole feature, the most recent PyTorch documentation refers to it by its user-facing API name, torch.compile.
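In practice, torch.compile is the single entry point that drives all of the components described below; wrapping a model or function is usually all that is required:

    import torch
    import torch.nn as nn

    model = nn.Sequential(nn.Linear(16, 16), nn.ReLU())
    compiled_model = torch.compile(model)      # returns an optimized callable
    out = compiled_model(torch.randn(2, 16))   # compilation happens on the first call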
GPU Support via Triton: OpenAI's Triton is a Python-embedded domain-specific language (DSL) used to write GPU-accelerated machine learning kernels.
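For a flavor of the DSL, here is a sketch of an element-wise addition kernel in the style of Triton's introductory vector-add tutorial (names like add_kernel and the BLOCK_SIZE of 1024 are illustrative choices):

    import torch
    import triton
    import triton.language as tl

    @triton.jit
    def add_kernel(x_ptr, y_ptr, out_ptr, n_elements, BLOCK_SIZE: tl.constexpr):
        pid = tl.program_id(axis=0)                            # one program per block
        offsets = pid * BLOCK_SIZE + tl.arange(0, BLOCK_SIZE)
        mask = offsets < n_elements                            # guard the ragged tail
        x = tl.load(x_ptr + offsets, mask=mask)
        y = tl.load(y_ptr + offsets, mask=mask)
        tl.store(out_ptr + offsets, x + y, mask=mask)

    def add(x, y):
        out = torch.empty_like(x)
        n = out.numel()
        grid = (triton.cdiv(n, 1024),)
        add_kernel[grid](x, y, out, n, BLOCK_SIZE=1024)
        return out

    x = torch.randn(4096, device="cuda")
    y = torch.randn(4096, device="cuda")
    print(torch.allclose(add(x, y), x + y))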
CPU Optimizations: Intel has contributed vectorization using the AVX2 and AVX-512 instruction sets, covering more than 94% of inference and training kernels in PyTorch models.
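Assuming a recent PyTorch build, the detected vector instruction set can be queried before compiling for CPU (the availability of get_cpu_capability may vary by version):

    import torch

    # Reports the instruction set PyTorch detected, e.g. "AVX2" or "AVX512"
    print(torch.backends.cpu.get_cpu_capability())

    @torch.compile  # on CPU, Inductor emits vectorized C++/OpenMP kernels
    def fn(x):
        return torch.nn.functional.gelu(x) * x

    fn(torch.randn(1024))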
PrimTorch: Reduces PyTorch's full operator surface to a set of around 250 primitive operators, which shrinks and simplifies the set of operators that backend compilers must implement.
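The effect of such decompositions can be observed with PyTorch's tracing utilities; the sketch below relies on torch._decomp, an internal module, so treat it as illustrative rather than a stable recipe:

    import torch
    from torch._decomp import get_decompositions
    from torch.fx.experimental.proxy_tensor import make_fx

    def f(x):
        return torch.nn.functional.silu(x)

    # Pull the registered decomposition for silu (internal API, may change)
    decomps = get_decompositions([torch.ops.aten.silu.default])
    gm = make_fx(f, decomposition_table=decomps)(torch.randn(4))
    print(gm.graph)  # silu appears rewritten in terms of simpler primitive ops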
TorchInductor: The backend compiler that converts captured computational graphs into optimized machine code. TorchInductor supports both CPU and GPU optimizations, generating Triton kernels for GPUs and C++/OpenMP code for CPUs.
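TorchInductor is the default backend for torch.compile, but it can be requested explicitly, and compilation modes trade compile time for kernel quality:

    import torch

    # "inductor" is the default; mode="max-autotune" searches harder for fast kernels
    @torch.compile(backend="inductor", mode="max-autotune")
    def fused(x, y):
        return torch.softmax(x @ y, dim=-1)

    out = fused(torch.randn(128, 64), torch.randn(64, 128))
    # Running with the env var TORCH_LOGS=output_code dumps the code Inductor emits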
AOTAutograd: Enhances training and inference performance by tracing both the forward and backward computational graphs ahead of time.
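The ahead-of-time tracing can be seen directly through the functorch.compile.aot_function helper, sketched here with a pass-through "compiler" (print_graph is an illustrative name) that prints each graph it receives:

    import torch
    from functorch.compile import aot_function

    def fn(x):
        return torch.sin(x).sum()

    def print_graph(gm, example_inputs):
        gm.graph.print_tabular()  # show the traced graph
        return gm  # hand the graph back unmodified as the "compiled" function

    aot_fn = aot_function(fn, fw_compiler=print_graph, bw_compiler=print_graph)
    x = torch.randn(8, requires_grad=True)
    aot_fn(x).backward()  # both the forward and backward graphs are printed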