Implicit caching in the Gemini API automatically passes on cost savings when a request reuses cached tokens; developers do not need to create a cache explicitly
A cache hit occurs when a request shares a common prefix with a previous request; the cached tokens are then billed at a 75% discount, which is passed back to the developer automatically
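As a rough illustration of the discount arithmetic, the sketch below computes how many input tokens are effectively billed at the full rate when part of a prompt is served from cache. The function name and the example token counts are hypothetical; only the 75% figure comes from the text above.

```python
def billed_input_tokens(prompt_tokens: int, cached_tokens: int,
                        discount: float = 0.75) -> float:
    """Return the full-rate-equivalent number of input tokens billed.

    Uncached tokens count in full; cached tokens count at (1 - discount).
    """
    if not 0 <= cached_tokens <= prompt_tokens:
        raise ValueError("cached_tokens must be between 0 and prompt_tokens")
    uncached = prompt_tokens - cached_tokens
    return uncached + cached_tokens * (1 - discount)


# Hypothetical request: 10,000 prompt tokens, 8,000 of them a cache hit.
print(billed_input_tokens(10_000, 8_000))  # 4000.0
```

In this example the request is billed as if it contained 4,000 full-price input tokens instead of 10,000.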
To increase cache hit likelihood, start prompts with large, common content and include varying context (e.g., user inquiries) at the end of the request
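The ordering advice above can be sketched as follows. This is a minimal illustration, not the API's actual matching logic: it compares prompts character by character as a stand-in for token-level prefix matching, and `COMMON_CONTEXT`, `build_prompt`, and `build_prompt_query_first` are hypothetical names.

```python
import os

# Placeholder for a large block of content shared by every request.
COMMON_CONTEXT = "Product manual.\n" + "Section text. " * 200


def build_prompt(common_context: str, user_query: str) -> str:
    """Stable content first, varying content last."""
    return f"{common_context}\n\nUser question: {user_query}"


def build_prompt_query_first(common_context: str, user_query: str) -> str:
    """Counter-example: the varying content comes first."""
    return f"User question: {user_query}\n\n{common_context}"


def shared_prefix_len(a: str, b: str) -> int:
    """Length of the common leading substring of two prompts."""
    return len(os.path.commonprefix([a, b]))


q1, q2 = "How do I reset my password?", "What is the refund policy?"

good = shared_prefix_len(build_prompt(COMMON_CONTEXT, q1),
                         build_prompt(COMMON_CONTEXT, q2))
bad = shared_prefix_len(build_prompt_query_first(COMMON_CONTEXT, q1),
                        build_prompt_query_first(COMMON_CONTEXT, q2))

print(good > bad)  # True
```

With the common content first, the two prompts share the entire context as a prefix; with the query first, they diverge after only the short fixed label.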
For implicit caching, Gemini 2.5 Flash requires a minimum of 1,024 tokens, while Gemini 2.5 Pro requires 2,048 tokens
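A small pre-flight check can make these thresholds concrete. The sketch below assumes the prompt's token count has already been obtained elsewhere; the dictionary keys and the function name are illustrative, and the minimums are the ones stated above.

```python
# Minimum prompt sizes for implicit caching, as stated in the text above.
MIN_IMPLICIT_CACHE_TOKENS = {
    "gemini-2.5-flash": 1024,
    "gemini-2.5-pro": 2048,
}


def meets_implicit_cache_minimum(model: str, prompt_tokens: int) -> bool:
    """Return True if the prompt is large enough to be eligible for a cache hit."""
    try:
        minimum = MIN_IMPLICIT_CACHE_TOKENS[model]
    except KeyError:
        raise ValueError(f"no known minimum for model {model!r}") from None
    return prompt_tokens >= minimum


print(meets_implicit_cache_minimum("gemini-2.5-flash", 1500))  # True
print(meets_implicit_cache_minimum("gemini-2.5-pro", 1500))    # False
```

Meeting the minimum only makes a request eligible; whether a hit actually occurs still depends on the request sharing a prefix with an earlier one.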
With explicit caching, developers can manually cache input tokens for guaranteed cost savings and specify a time-to-live (TTL) for the cached tokens (default: 1 hour)
Cached tokens are charged at a reduced price, with explicit caching ensuring predictable cost reductions for repeated token usage
Free for up to 500 requests per day (RPD); $35 per 1,000 requests beyond 1,500 RPD for paid tiers
Explicit caching costs depend on the number of input tokens cached and the TTL, which makes it best suited to large token sets that are reused many times
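To show how token count and TTL interact, the sketch below estimates a storage charge under the assumption that storage is billed linearly per million tokens per hour. The function name and the rate used in the example are hypothetical placeholders, not published prices.

```python
def cache_storage_cost(cached_tokens: int, ttl_hours: float,
                       price_per_million_token_hours: float) -> float:
    """Estimate the storage cost of keeping tokens cached for a given TTL.

    Assumes a linear model: cost = (tokens / 1M) * hours * rate.
    """
    if cached_tokens < 0 or ttl_hours < 0:
        raise ValueError("cached_tokens and ttl_hours must be non-negative")
    return cached_tokens / 1_000_000 * ttl_hours * price_per_million_token_hours


# Hypothetical: 500,000 tokens cached for 2 hours at a placeholder rate of 1.0.
print(cache_storage_cost(500_000, 2, 1.0))  # 1.0
```

Under this model, doubling either the cached token count or the TTL doubles the storage charge, so the cache pays off only if it is reused often enough within its lifetime.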
Implicit caching for Gemini 2.5 models became available on May 8, 2025, simplifying cost savings for developers without additional setup