gpt-realtime-whisper
Streaming speech-to-text model for realtime transcription

GPT Realtime Whisper is a streaming speech-to-text model for applications that need low-latency transcript deltas from live audio. It is designed for realtime use cases where developers need to tune latency and accuracy. GPT Realtime Whisper is priced by audio duration rather than text tokens.
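As a rough illustration of what consuming transcript deltas can look like on the client side, the sketch below stitches incremental text fragments into per-utterance transcripts. The event shape used here (`type`, `item_id`, `text`) is hypothetical and chosen only for illustration; it is not taken from this page, so check the Realtime API reference for the actual event names and fields.

```python
from collections import defaultdict
from typing import Dict, Iterable, List


def assemble_transcripts(events: Iterable[dict]) -> Dict[str, str]:
    """Merge streamed transcript events into one string per audio item.

    Assumed (hypothetical) event shape:
      {"type": "delta",     "item_id": str, "text": str}  # incremental fragment
      {"type": "completed", "item_id": str, "text": str}  # final full transcript
    """
    partial: Dict[str, List[str]] = defaultdict(list)
    final: Dict[str, str] = {}

    for event in events:
        item_id = event["item_id"]
        if event["type"] == "delta":
            # Append the fragment in arrival order.
            partial[item_id].append(event["text"])
        elif event["type"] == "completed":
            # Prefer the final transcript over the stitched deltas.
            final[item_id] = event["text"]

    # Items that never completed fall back to whatever deltas arrived.
    for item_id, pieces in partial.items():
        final.setdefault(item_id, "".join(pieces))
    return final
```

A UI would typically render the stitched deltas immediately and swap in the final transcript once the completion event arrives.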

16,000 context window
2,000 max output tokens
Sep 30, 2024 knowledge cutoff
Pricing
Pricing is based on the number of tokens used, or other metrics based on the model type. For tool-specific models, like search and computer use, there’s a fee per tool call. See details in the pricing page.
Realtime audio duration: $0.017 per minute
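Because billing is per minute of audio, a cost estimate is simple arithmetic. The sketch below assumes cost scales linearly with audio duration; the page does not state how partial minutes are rounded, so treat the result as an approximation. The function name is illustrative only.

```python
PRICE_PER_MINUTE_USD = 0.017  # listed price per minute of realtime audio


def estimate_cost_usd(audio_seconds: float,
                      price_per_minute: float = PRICE_PER_MINUTE_USD) -> float:
    """Estimate transcription cost for a given amount of audio.

    Assumption: linear proration by the second. Actual billing
    granularity is not specified on this page.
    """
    if audio_seconds < 0:
        raise ValueError("audio_seconds must be non-negative")
    return audio_seconds / 60.0 * price_per_minute


if __name__ == "__main__":
    # One hour of audio is 60 minutes at $0.017 per minute.
    print(f"${estimate_cost_usd(3600):.2f}")
```

Under this assumption, one hour of audio costs about $1.02 and an eight-hour stream about $8.16.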


Modalities
Text: Input and output
Image: Not supported
Audio: Input only
Video: Not supported
Endpoints
Chat Completions: v1/chat/completions
Responses: v1/responses
Realtime: v1/realtime
Realtime translation: v1/realtime/translations
Realtime transcription: v1/realtime/transcription_sessions
Assistants: v1/assistants
Batch: v1/batch
Fine-tuning: v1/fine-tuning
Embeddings: v1/embeddings
Image generation: v1/images/generations
Videos: v1/videos
Image edit: v1/images/edits
Speech generation: v1/audio/speech
Transcription: v1/audio/transcriptions
Translation: v1/audio/translations
Moderation: v1/moderations
Completions (legacy): v1/completions
Features
Streaming: Supported
Function calling: Not supported
Structured outputs: Not supported
Fine-tuning: Not supported
Predicted outputs: Not supported
Snapshots
Snapshots let you lock in a specific version of the model so that performance and behavior remain consistent. Below is a list of all available snapshots and aliases for gpt-realtime-whisper.
gpt-realtime-whisper
Rate limits
Rate limits ensure fair and reliable access to the API by placing specific caps on requests, tokens, audio duration, or other usage within a given time period. Your usage tier determines how high these limits are set and automatically increases as you send more requests and spend more on the API.
Tier      Minutes of audio per minute
Free      Not supported
Tier 1    100
Tier 2    350
Tier 3    650
Tier 4    1,000
Tier 5    1,300
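For capacity planning, the limits above can be turned into a rough ceiling on simultaneous streams. The sketch below assumes that a live stream sent at normal speed consumes one minute of audio per minute of wall-clock time, so the limit is approximately the number of concurrent streams a tier can sustain. This is an inference from the unit, not a documented guarantee, and the `realtime_factor` parameter is a hypothetical knob for audio sent faster than realtime.

```python
from typing import Dict, Optional

# Minutes of audio per minute, from the table above (None = not supported).
TIER_LIMITS: Dict[str, Optional[int]] = {
    "Free": None,
    "Tier 1": 100,
    "Tier 2": 350,
    "Tier 3": 650,
    "Tier 4": 1000,
    "Tier 5": 1300,
}


def max_concurrent_streams(tier: str, realtime_factor: float = 1.0) -> int:
    """Approximate how many streams a tier can sustain at once.

    realtime_factor is the speed at which each stream sends audio:
    1.0 for live audio, 2.0 for audio uploaded at twice realtime.
    """
    if realtime_factor <= 0:
        raise ValueError("realtime_factor must be positive")
    limit = TIER_LIMITS[tier]  # raises KeyError for unknown tiers
    if limit is None:
        return 0
    return int(limit // realtime_factor)


if __name__ == "__main__":
    for tier in TIER_LIMITS:
        print(tier, max_concurrent_streams(tier))
```

Under these assumptions, Tier 1 supports roughly 100 live streams, and sending audio at twice realtime halves that figure.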