Google Trillium’s Cost-Effective Breakthrough In MLPerf 4.1

To meet the needs of next-generation models, Google introduced Trillium, its sixth-generation Tensor Processing Unit (TPU).

Google Trillium offers 99% scaling efficiency and up to 1.8x greater performance-per-dollar than the previous-generation Cloud TPU v5p.
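To make the performance-per-dollar claim concrete, the sketch below shows how such a ratio can be derived from training throughput and hourly chip pricing. The throughput and price values are hypothetical placeholders chosen only to illustrate the arithmetic; they are not Google's published figures.

```python
# Illustrative sketch: relative performance-per-dollar between two accelerators.
# All throughput and hourly-price values are hypothetical placeholders,
# not published Trillium or Cloud TPU v5p numbers.

def perf_per_dollar(tokens_per_second: float, price_per_chip_hour: float) -> float:
    """Training throughput normalized by hourly cost."""
    return tokens_per_second / price_per_chip_hour

# Hypothetical per-chip inputs.
v5p = perf_per_dollar(tokens_per_second=1_000.0, price_per_chip_hour=4.20)
trillium = perf_per_dollar(tokens_per_second=1_500.0, price_per_chip_hour=3.50)

print(f"Relative perf/$ (Trillium vs. v5p): {trillium / v5p:.2f}x")
```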

This analysis compares Google Trillium to Cloud TPU v5p, evaluating scaling efficiency and hardware utilization in addition to performance per dollar.

The ultimate goal is effective model convergence, while the hardware-utilization and scaling metrics provide valuable insights into system behavior.
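Hardware utilization of this kind is commonly reported as model FLOPs utilization (MFU): the achieved model FLOP/s divided by the cluster's peak FLOP/s. The sketch below illustrates that calculation under the widely used ~6 x parameters x tokens approximation for dense transformer training; the formula choice and every numeric input are assumptions for illustration, not values from the MLPerf submission.

```python
# Hedged sketch of a model-FLOPs-utilization (MFU) calculation.
# Uses the common ~6 * parameters * tokens approximation for dense
# transformer training FLOPs; all inputs are hypothetical placeholders.

def mfu(params: float, tokens_per_second: float,
        num_chips: int, peak_flops_per_chip: float) -> float:
    """Fraction of the cluster's peak FLOP/s spent on model math."""
    achieved_flops_per_second = 6.0 * params * tokens_per_second
    peak_flops_per_second = num_chips * peak_flops_per_chip
    return achieved_flops_per_second / peak_flops_per_second

# Example: a GPT-3-scale model (175B parameters) on a hypothetical cluster.
utilization = mfu(
    params=175e9,
    tokens_per_second=2.0e6,     # assumed aggregate training throughput
    num_chips=6144,              # assumed cluster size
    peak_flops_per_chip=9.2e14,  # assumed peak bf16 FLOP/s per chip
)
print(f"MFU: {utilization:.1%}")
```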

Google submitted GPT-3 (175B) training results for three distinct Cloud TPU v5p configurations and four distinct Google Trillium configurations.

MaxText, Google's high-performance reference implementation for Cloud TPUs and GPUs, provides the foundation for all of the findings in this analysis.

Google Trillium achieves 99% scaling efficiency, surpassing the 94% scaling efficiency of the Cloud TPU v5p cluster, which operates within a single ICI domain.
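Scaling efficiency in this context is typically measured as the achieved speedup relative to ideal linear scaling when the chip count grows from a base configuration. A minimal sketch of that calculation follows; the chip counts and throughputs are placeholders, not the submitted MLPerf 4.1 results.

```python
# Sketch of scaling efficiency: achieved speedup divided by the ideal
# (linear) speedup when scaling up from a base configuration.
# Throughput numbers are hypothetical placeholders.

def scaling_efficiency(base_chips: int, base_throughput: float,
                       scaled_chips: int, scaled_throughput: float) -> float:
    """Achieved speedup relative to perfectly linear scaling."""
    ideal_speedup = scaled_chips / base_chips
    achieved_speedup = scaled_throughput / base_throughput
    return achieved_speedup / ideal_speedup

# Hypothetical example: quadrupling the chip count.
eff = scaling_efficiency(base_chips=1024, base_throughput=1.0,
                         scaled_chips=4096, scaled_throughput=3.96)
print(f"Scaling efficiency: {eff:.0%}")  # -> 99%
```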