OpenAI’s GPT-4o Realtime API: Low-Latency Voice Interface

The Realtime API enables natural speech-to-speech conversations using the six preset voices already supported by the API, much like ChatGPT’s Advanced Voice Mode.
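Speech-to-speech sessions run over a WebSocket, with the client configuring the conversation (including which preset voice to use) via JSON events. As a minimal sketch, the snippet below builds a `session.update` client event; the endpoint URL, event shape, and the `"alloy"` voice name follow OpenAI's published examples but are assumptions here, not details from this announcement:

```python
import json

# Assumed Realtime API WebSocket endpoint (per OpenAI's docs, not this article).
REALTIME_URL = "wss://api.openai.com/v1/realtime?model=gpt-4o-realtime-preview"

def build_session_update(voice: str = "alloy") -> str:
    """Build a session.update client event selecting one of the preset voices.

    The event shape mirrors OpenAI's published examples; treat field
    names here as assumptions rather than a definitive schema.
    """
    event = {
        "type": "session.update",
        "session": {
            "modalities": ["text", "audio"],  # speech in, speech (and text) out
            "voice": voice,
            "instructions": "You are a friendly voice assistant.",
        },
    }
    return json.dumps(event)

# After opening a WebSocket to REALTIME_URL (authenticated with an
# Authorization: Bearer header), the client would send this JSON as an
# early message to configure the session:
payload = build_session_update()
```

In practice a real client would then stream microphone audio to the server and play back the audio deltas it receives over the same connection.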

OpenAI is excited to see how developers use these new capabilities to craft engaging audio experiences for consumers in a range of contexts, including education, customer support, translation, and accessibility.

As of now, Tier 5 developers can use the API with up to 100 simultaneous sessions, while Tiers 1–4 are subject to lower rate limits. OpenAI will gradually raise these limits to accommodate larger deployments.

Additional modalities: The Realtime API will initially handle voice, and OpenAI intends to gradually add more modalities, including vision and video.

Prompt Caching: OpenAI will add support for Prompt Caching, so that earlier conversation turns can be reprocessed at a discount.

Official SDK support: Realtime API functionality will be included in the official OpenAI Python and Node.js SDKs.

Increased model compatibility: The Realtime API will also support future versions of GPT-4o mini.