Hex-LLM: High-Efficiency LLM Serving for TPUs on Vertex AI

Google Cloud aims to deliver efficient, cost-optimized ML workflow recipes through Vertex AI Model Garden.

In Vertex AI Model Garden, Google first introduced vLLM, the popular open-source LLM serving stack, running on GPUs.

Hex-LLM, Vertex AI's LLM serving framework, was developed specifically for Google Cloud TPU hardware, which is available as part of AI Hypercomputer.

Google is committed to keeping Hex-LLM current with the latest foundation models and advanced serving techniques as the LLM field evolves.

Hex-LLM is benchmarked on a sample of the ShareGPT dataset, a widely used dataset containing prompts and outputs of varying lengths.
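A serving benchmark over such a sample typically reports request-length statistics and token throughput. The sketch below is illustrative only, using hypothetical field shapes rather than the actual harness used to benchmark Hex-LLM:

```python
import statistics

def summarize_lengths(samples):
    """Summarize token lengths for a ShareGPT-style benchmark sample.

    `samples` is a list of (prompt_tokens, output_tokens) pairs; this
    shape is an assumption for illustration, not the real harness input.
    """
    prompts = [p for p, _ in samples]
    outputs = [o for _, o in samples]
    return {
        "num_requests": len(samples),
        "mean_prompt_len": statistics.mean(prompts),
        "mean_output_len": statistics.mean(outputs),
    }

def output_throughput(samples, wall_clock_seconds):
    """Output tokens generated per second across the whole run."""
    total_output = sum(o for _, o in samples)
    return total_output / wall_clock_seconds

# A tiny made-up sample: short and long prompts mixed, as in ShareGPT.
sample = [(32, 128), (512, 64), (128, 256), (64, 512)]
stats = summarize_lengths(sample)
tput = output_throughput(sample, 10.0)  # 960 output tokens / 10 s = 96.0
```

Reporting throughput in output tokens per second (alongside latency percentiles) is the common way to compare serving stacks on mixed-length workloads.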

Benchmarks measure the performance of Llama 2 70B (int8 weight-quantized) and Gemma 7B on eight TPU v5e chips.
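Int8 weight quantization shrinks a model's memory footprint by storing weights as 8-bit integers plus a scale factor. The sketch below shows the basic idea with symmetric per-tensor quantization; it is a minimal illustration, and the post does not detail Hex-LLM's actual quantization scheme:

```python
def quantize_int8(weights):
    """Symmetric per-tensor int8 quantization (illustrative only).

    Maps floats into [-128, 127] using a single scale derived from the
    largest absolute weight.
    """
    scale = max(abs(w) for w in weights) / 127.0
    q = [max(-128, min(127, round(w / scale))) for w in weights]
    return q, scale

def dequantize(q, scale):
    """Recover approximate float weights from int8 values and the scale."""
    return [v * scale for v in q]

w = [0.5, -1.27, 0.02, 1.0]
q, s = quantize_int8(w)      # e.g. [50, -127, 2, 100] with scale 0.01
approx = dequantize(q, s)    # close to the original weights
```

For a 70B-parameter model, halving each weight from 16 to 8 bits is what makes it feasible to serve on a single eight-chip TPU v5e slice.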

The Vertex AI Model Garden playground is a pre-deployed Vertex AI Prediction endpoint integrated directly into the user interface.
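Behind the playground, a Vertex AI Prediction endpoint accepts a JSON body with an "instances" list. The sketch below builds such a request body; the per-instance fields ("prompt", "max_tokens", "temperature") are assumptions about the serving container and may differ between models:

```python
import json

def build_predict_request(prompt, max_tokens=128, temperature=0.7):
    """Build a Vertex AI Prediction request body as a JSON string.

    The top-level "instances" list is the standard Vertex AI request
    shape; the per-instance field names here are illustrative and
    depend on the deployed serving container.
    """
    return json.dumps({
        "instances": [{
            "prompt": prompt,
            "max_tokens": max_tokens,
            "temperature": temperature,
        }]
    })

body = build_predict_request("What is a TPU?")
```

Such a body would be POSTed to the endpoint's `predict` URL with an OAuth bearer token; the playground handles that plumbing for you in the UI.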