Hex-LLM: High-Efficiency LLM Serving for TPUs on Vertex AI
Google Cloud aims to provide efficient, cost-optimized ML serving recipes through Vertex AI Model Garden.
Google debuted vLLM, the popular open-source LLM serving stack, on GPUs in Vertex AI Model Garden.
Hex-LLM, Vertex AI's LLM serving framework, was developed specifically for Google Cloud TPU hardware, which is part of the AI Hypercomputer.
Google is committed to equipping Hex-LLM with the latest foundation models and advanced techniques as LLM technology develops.
Hex-LLM is benchmarked on a sample of the ShareGPT dataset, a commonly used dataset containing prompts and outputs of varying lengths. The benchmarks measure the performance of Llama 2 70B (int8 weight-quantized) and Gemma 7B on eight TPU v5e chips.
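As a rough illustration of how such a serving benchmark can be scored, the sketch below computes aggregate output-token throughput from per-request timing records. The `RequestRecord` structure and the single-window throughput formula are assumptions for illustration, not the actual Hex-LLM benchmark harness.

```python
from dataclasses import dataclass

@dataclass
class RequestRecord:
    """Timing and token counts for one benchmark request (illustrative)."""
    prompt_tokens: int
    output_tokens: int
    start_s: float  # wall-clock time the request was sent
    end_s: float    # wall-clock time the last token arrived

def throughput_tokens_per_sec(records: list[RequestRecord]) -> float:
    """Aggregate output-token throughput over the whole benchmark window."""
    window_start = min(r.start_s for r in records)
    window_end = max(r.end_s for r in records)
    total_output = sum(r.output_tokens for r in records)
    return total_output / (window_end - window_start)

# Two overlapping requests: 384 output tokens over a 3-second window.
records = [
    RequestRecord(prompt_tokens=32, output_tokens=128, start_s=0.0, end_s=2.0),
    RequestRecord(prompt_tokens=64, output_tokens=256, start_s=0.5, end_s=3.0),
]
print(throughput_tokens_per_sec(records))  # 384 / 3.0 -> 128.0
```

Measuring over the shared window (rather than per request) is what captures the benefit of continuous batching, since concurrent requests share the same wall-clock time.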
Vertex AI Model Garden's playground is a pre-deployed Vertex AI Prediction endpoint integrated into the user interface.
For maximum flexibility, use the Vertex AI Python SDK to deploy a Vertex Prediction endpoint with Hex-LLM, following the Colab Enterprise notebook examples.
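A minimal deployment sketch with the Vertex AI Python SDK is shown below. The container image URI, serving arguments, project, and region are placeholders/assumptions, not official values; the Model Garden Colab Enterprise notebooks provide the exact ones. The `ct5lp-hightpu-8t` machine type corresponds to eight TPU v5e chips, matching the benchmark setup above.

```python
# Hypothetical sketch: deploy a Hex-LLM-served model via the Vertex AI SDK.
# Requires GCP credentials and a project; this will not run as-is.
from google.cloud import aiplatform

aiplatform.init(project="my-project", location="us-west1")  # assumed project/region

# Upload a model backed by a Hex-LLM serving container (image URI is a placeholder).
model = aiplatform.Model.upload(
    display_name="gemma-7b-hexllm",
    serving_container_image_uri="us-docker.pkg.dev/example/hex-llm-serve:latest",  # placeholder
    serving_container_args=["--model=google/gemma-7b"],  # assumed flag
)

# Deploy to a TPU v5e endpoint (ct5lp-hightpu-8t = 8 TPU v5e chips).
endpoint = model.deploy(
    machine_type="ct5lp-hightpu-8t",
    min_replica_count=1,
)

# Send a test prompt to the deployed endpoint.
response = endpoint.predict(instances=[{"prompt": "Hello"}])
print(response)
```

The SDK route trades the one-click playground experience for control over machine type, replica counts, and serving arguments.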
For more details, visit govindhtech.com.