TorchDynamo Method For Improving PyTorch Code Performance

TorchDynamo works by tying into the frame evaluation process of Python, which is made possible by PEP 523, and examining Python bytecode while it is running

TorchDynamo and related technologies provide a major step forward in PyTorch’s capacity to effectively aggregate and optimize machine learning models

It’s important to note that while TorchDynamo was originally used to describe the whole functionality, it is now known by its API name “torch.compile” in the most recent PyTorch documentation

GPU Support via Triton: OpenAI’s Triton is a domain-specific language (DSL) for Python that is used to write GPU-accelerated machine learning kernels

CPU Optimizations: For more than 94% of inference and training kernels in PyTorch models, Intel has provided vectorization utilizing the AVX2 and AVX512 instruction sets

PrimTorch: Reduces the original PyTorch operations to a set of around 250 primitive operators, hence simplifying and reducing the number of operators that backend compilers must implement

TorchInductor: The backend compiler that converts the computational graphs that are recorded into machine code that is optimized. Both CPU and GPU optimizations are supported by TorchInductor

AOTAutograd: Enhances training and inference performance by concurrently tracing forward and backward computational graphs in advance