NVIDIA L4 GPUs for Cloud Run

Google is introducing NVIDIA L4 GPU support for Cloud Run, available in preview today.

With support for NVIDIA GPUs, you can run fast, on-demand online AI inference using your preferred LLMs.

With 24 GB of vRAM, open models such as Llama 3.1, Mistral, and Gemma 2 with up to 9 billion parameters can deliver fast token generation rates.

Attaching a GPU to Cloud Run functions makes event-driven AI inference easy.

With NVIDIA GPUs, Cloud Run combines strong performance with simple operations.

Cloud Run instances with an attached L4 GPU and pre-installed drivers start in about 5 seconds, after which container processes can begin using the GPU.

Together, these capabilities enable secure, reliable deployment of high-performance AI model inference on Cloud Run with NVIDIA L4 GPUs.
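As a rough sketch of what deploying such a service might look like, the gcloud command below attaches one L4 GPU to a Cloud Run service. The service name, container image, resource sizes, and region here are illustrative assumptions, and the GPU flags reflect the preview-era CLI, so check the current Cloud Run documentation before using them.

```shell
# Hypothetical example: deploy a Cloud Run service with one NVIDIA L4 GPU.
# Service name, image path, region, and resource values are placeholders;
# --gpu / --gpu-type flag names are based on the preview and may change.
gcloud beta run deploy my-llm-service \
  --image=us-docker.pkg.dev/my-project/my-repo/my-llm-image:latest \
  --region=us-central1 \
  --gpu=1 \
  --gpu-type=nvidia-l4 \
  --cpu=8 \
  --memory=32Gi \
  --no-cpu-throttling
```

GPU-attached instances typically need higher CPU and memory allocations than default Cloud Run services, and CPU throttling is disabled so the model server keeps running between requests.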