JetStream: LLM inference efficiency

In the fast-moving field of artificial intelligence, there is a growing need for high-performance, low-cost AI inference (serving).

Google is pleased to share its latest MLPerf Inference v4.0 performance results.

Among them: Google developed JetStream, an inference engine that delivers up to three times more inferences per dollar than earlier Cloud TPU inference engines.

Designed specifically for LLMs, JetStream marks a major advancement in both performance and cost efficiency.

JetStream supports your favorite framework, whether you're using PyTorch or JAX.

JetStream delivers up to 4,783 tokens/second on open models such as Gemma (in MaxText) and Llama 2.
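Tokens/second is the standard throughput metric for LLM serving: total generated tokens divided by wall-clock time. The snippet below is a minimal, hypothetical JAX sketch of how such a measurement works. The toy one-layer "model", the decode_step function, and all sizes are illustrative assumptions, not JetStream's actual API; a real deployment would be running a full MaxText model behind JetStream.

```python
import time

import jax
import jax.numpy as jnp

# Toy stand-in for a real model's decode step: map the last generated token
# to logits over the vocabulary and pick the next token greedily. A real
# JetStream deployment would invoke a full MaxText model here instead.
@jax.jit
def decode_step(params, last_token):
    hidden = params["embed"][last_token]          # (batch, d_model)
    logits = hidden @ params["unembed"]           # (batch, vocab)
    return jnp.argmax(logits, axis=-1)            # greedy next token

# Illustrative sizes only; not tuned to any real model.
batch, d_model, vocab = 8, 256, 32_000
k1, k2 = jax.random.split(jax.random.PRNGKey(0))
params = {
    "embed": jax.random.normal(k1, (vocab, d_model)),
    "unembed": jax.random.normal(k2, (d_model, vocab)),
}
token = jnp.zeros((batch,), dtype=jnp.int32)

steps = 128
decode_step(params, token).block_until_ready()    # warm-up: trigger compilation
start = time.perf_counter()
for _ in range(steps):
    token = decode_step(params, token)            # fixed shapes: no retracing
token.block_until_ready()
elapsed = time.perf_counter() - start

# Throughput: every step generates one token per batch element.
print(f"{batch * steps / elapsed:.0f} tokens/second")
```

Keeping the per-step input shapes fixed means the jitted step compiles once and is reused on every iteration, the same property production decode loops rely on for steady-state throughput.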

This is why Google standardized on Cloud TPU v5e with MaxText, JAX, and JetStream.