Running PyTorch models on Habana HPUs required only minor adjustments.
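To give a sense of how small those adjustments are, here is a minimal sketch of a training step on an HPU. It assumes the standard `habana_frameworks` PyTorch bridge; the toy linear model and synthetic data are stand-ins for illustration, not the models benchmarked here:

```python
import torch
import torch.nn.functional as F
import habana_frameworks.torch.core as htcore  # registers the "hpu" device with PyTorch

device = torch.device("hpu")

# A toy classifier stands in for a real model; the HPU-specific changes are
# only the device string and the mark_step() calls below.
model = torch.nn.Linear(128, 10).to(device)
optimizer = torch.optim.SGD(model.parameters(), lr=1e-3)

for _ in range(10):  # synthetic training loop
    inputs = torch.randn(32, 128, device=device)
    labels = torch.randint(0, 10, (32,), device=device)

    optimizer.zero_grad()
    loss = F.cross_entropy(model(inputs), labels)
    loss.backward()
    htcore.mark_step()   # lazy mode: execute the accumulated graph
    optimizer.step()
    htcore.mark_step()
```

Everything else, from the optimizer to the loss function, is stock PyTorch.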
Lazy and eager modes ran without a hitch during training, and all three modes performed well during inference.
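For reference, switching between modes requires almost no code. The sketch below follows the documented `PT_HPU_LAZY_MODE` environment switch and the `hpu_backend` backend for `torch.compile`; the model and input are again hypothetical placeholders:

```python
import os

# Mode selection is read at import time of the Habana bridge:
#   PT_HPU_LAZY_MODE=1 -> lazy mode (ops are queued, executed at mark_step)
#   PT_HPU_LAZY_MODE=0 -> eager mode (ops dispatch immediately)
os.environ.setdefault("PT_HPU_LAZY_MODE", "0")

import torch
import habana_frameworks.torch.core  # noqa: F401  (registers the "hpu" device)

model = torch.nn.Linear(128, 10).to("hpu")

# The third mode: eager execution plus torch.compile with the Gaudi backend.
compiled = torch.compile(model, backend="hpu_backend")

with torch.no_grad():
    out = compiled(torch.randn(32, 128, device="hpu"))
```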
We observed near-linear scaling for distributed training and linear performance scaling for inference.
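As a rough sketch of what a multi-card run looks like (assuming the documented HCCL process group backend and a `torchrun`-style launcher; this is an illustration, not the exact setup behind the scaling numbers):

```python
import torch
import torch.distributed as dist
import habana_frameworks.torch.core as htcore
import habana_frameworks.torch.distributed.hccl  # noqa: F401  (registers the "hccl" backend)

# One process per HPU; rank and world size come from the
# environment variables set by the launcher (e.g. torchrun).
dist.init_process_group(backend="hccl")

model = torch.nn.Linear(128, 10).to("hpu")
ddp_model = torch.nn.parallel.DistributedDataParallel(model)

inputs = torch.randn(32, 128, device="hpu")
loss = ddp_model(inputs).sum()
loss.backward()      # gradients are all-reduced across workers over HCCL
htcore.mark_step()   # lazy mode: execute the accumulated graph
```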