OpenAI launches new real-time voice API with GPT-Realtime-2, real-time translation across more than 70 languages, and live transcription

OpenAI has announced three new voice features in its API suite aimed at developers building applications for real-time conversation, translation, and transcription.

New API voice capabilities

The latest model, GPT-Realtime-2, is designed to generate natural-sounding speech and interact with users directly. OpenAI says it includes reasoning capabilities comparable to GPT-5, enabling it to handle more complex requests than models that primarily follow pre-scripted prompts.

GPT-Realtime-Translate is a real-time translation feature intended to match the pace of natural conversation. It supports more than 70 input languages and 13 output languages.

GPT-Realtime-Whisper provides live transcription during a conversation, producing speech-to-text as the dialogue unfolds.

Why the update matters

OpenAI positions the new models as a step beyond basic question-and-answer voice interactions, toward a more complete voice interface that can listen, reason, translate, transcribe, and act throughout a conversation.

Target users and security measures

OpenAI says businesses looking to expand customer-service capabilities are among the most obvious target users. It also notes the features are suitable for education, media, live events, and content-creation platforms.

On misuse risk, OpenAI says safeguards are in place to prevent spam, scams, and other forms of online abuse. It also states that the system can automatically disconnect a conversation when it detects a policy violation.

Pricing and availability

All three models are part of OpenAI’s Realtime API. OpenAI says GPT-Realtime-Translate and GPT-Realtime-Whisper are priced per minute, while GPT-Realtime-2 is priced per token.
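
For developers budgeting a deployment, the practical difference between the two billing schemes is what the cost scales with: audio duration in one case, tokens processed in the other. The sketch below illustrates this under stated assumptions. The rates are placeholders invented for the example, not OpenAI's published prices, and the function names are hypothetical. The announcement as summarized here also does not say whether partial minutes are rounded up, so the sketch bills fractional minutes proportionally.

```python
# Placeholder rates for illustration only. These are NOT OpenAI's prices.
HYPOTHETICAL_RATE_PER_MINUTE = 0.01     # USD per minute of audio (assumed)
HYPOTHETICAL_RATE_PER_1K_TOKENS = 0.05  # USD per 1,000 tokens (assumed)


def per_minute_cost(audio_seconds: float, rate_per_minute: float) -> float:
    """Per-minute billing: cost depends only on how long the audio runs."""
    return audio_seconds / 60.0 * rate_per_minute


def per_token_cost(tokens: int, rate_per_1k_tokens: float) -> float:
    """Per-token billing: cost depends on tokens processed, not elapsed time."""
    return tokens / 1000.0 * rate_per_1k_tokens


if __name__ == "__main__":
    # A two-minute translated or transcribed call under the assumed minute rate.
    print(per_minute_cost(120, HYPOTHETICAL_RATE_PER_MINUTE))
    # A conversation that consumed 2,000 tokens under the assumed token rate.
    print(per_token_cost(2000, HYPOTHETICAL_RATE_PER_1K_TOKENS))
```

Under per-minute pricing, a long call with little speech costs the same as a dense one of equal length. Under per-token pricing, cost follows how much the model actually processes and generates.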