The Realtime API enables natural speech-to-speech conversations using the six preset voices already supported by the API, much like ChatGPT's Advanced Voice Mode.
OpenAI is excited to see how developers use these new capabilities to craft engaging audio experiences for consumers in a range of contexts, including education, customer support, translation, and accessibility.
As of now, Tier 5 developers can use the API for up to 100 simultaneous sessions, while Tiers 1-4 are subject to lower rate limits. OpenAI will gradually raise these limits to accommodate larger deployments.
Additional modalities: The Realtime API will initially handle voice, and OpenAI intends to gradually add more modalities, including vision and video.
Prompt Caching: OpenAI will add support for Prompt Caching so that earlier conversation turns can be reprocessed at a discount.
Official SDK support: Realtime API functionality will be integrated into the official OpenAI Python and Node.js SDKs.
Increased model compatibility: The Realtime API will also support future versions of GPT-4o mini.
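To make the speech-to-speech flow above concrete, here is a minimal sketch of the kind of JSON client events an application sends to the Realtime API over its WebSocket connection. The event names (`session.update`, `response.create`) follow OpenAI's published event types, but the exact field set is an assumption and may change as the API evolves; the voice name `"alloy"` is one of the preset voices.

```python
import json

def session_update(voice="alloy", instructions="You are a helpful assistant."):
    """Build a session.update event: configure one of the preset voices,
    system-style instructions, and the response modalities for the session.
    Field names beyond "type" are a sketch of the announced API, not a spec."""
    return json.dumps({
        "type": "session.update",
        "session": {
            "voice": voice,
            "instructions": instructions,
            "modalities": ["text", "audio"],
        },
    })

def response_create():
    """Build a response.create event: ask the model to generate a
    spoken (and text) response to the conversation so far."""
    return json.dumps({"type": "response.create"})

# In a real client these strings would be sent over a WebSocket to the
# Realtime API endpoint; here we just inspect the payloads locally.
configure = json.loads(session_update(voice="alloy"))
respond = json.loads(response_create())
print(configure["type"], respond["type"])
```

In practice, a client opens the WebSocket once, sends `session.update` to pick a voice before the conversation starts, then streams audio input and issues `response.create` each time it wants the model to reply.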