GPT-4o mini TTS
Default
Text-to-speech model powered by GPT-4o mini
Performance: Higher
Speed: Fast
Input: Text
Output: Audio
GPT-4o mini TTS is a text-to-speech model built on GPT-4o mini, a fast and powerful language model. Use it to convert text into natural-sounding speech. The maximum input length is 2,000 tokens.
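Because input is capped at 2,000 tokens, longer text has to be split before it is sent. The sketch below assumes a rough heuristic of about four characters per token for English text (an approximation; a real tokenizer gives exact counts) and splits at sentence boundaries. The helper names `estimate_tokens` and `chunk_for_tts` are hypothetical.

```python
import re

# Documented input limit for gpt-4o-mini-tts.
MAX_INPUT_TOKENS = 2000

# Assumption: roughly 4 characters per token for English text.
# Swap in a real tokenizer if you need exact counts.
CHARS_PER_TOKEN = 4


def estimate_tokens(text: str) -> int:
    """Crude token estimate from character count, rounded up."""
    return -(-len(text) // CHARS_PER_TOKEN)


def chunk_for_tts(text: str, max_tokens: int = MAX_INPUT_TOKENS) -> list[str]:
    """Split text at sentence boundaries so each chunk fits the token budget.

    A single sentence longer than the budget is hard-split by characters.
    """
    max_chars = max_tokens * CHARS_PER_TOKEN
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    chunks: list[str] = []
    current = ""
    for sentence in sentences:
        # Hard-split oversized sentences into max_chars pieces.
        while len(sentence) > max_chars:
            if current:
                chunks.append(current)
                current = ""
            chunks.append(sentence[:max_chars])
            sentence = sentence[max_chars:]
        candidate = f"{current} {sentence}" if current else sentence
        if estimate_tokens(candidate) <= max_tokens:
            current = candidate
        else:
            # The current chunk is full; start a new one with this sentence.
            chunks.append(current)
            current = sentence
    if current:
        chunks.append(current)
    return chunks
```

Each returned chunk can then be sent as a separate request and the resulting audio files played or concatenated in order.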
Modalities
Text: Input only
Image: Not supported
Audio: Output only
Video: Not supported
Endpoints
Chat Completions: v1/chat/completions
Responses: v1/responses
Realtime: v1/realtime
Assistants: v1/assistants
Batch: v1/batch
Fine-tuning: v1/fine-tuning
Embeddings: v1/embeddings
Image generation: v1/images/generations
Videos: v1/videos
Image edit: v1/images/edits
Speech generation: v1/audio/speech
Transcription: v1/audio/transcriptions
Translation: v1/audio/translations
Moderation: v1/moderations
Completions (legacy): v1/completions
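Of the endpoints listed, speech generation (`v1/audio/speech`) is the one that turns text into audio. The following sketch builds, without sending, a raw HTTP request for that endpoint using only the Python standard library. It assumes the base URL `https://api.openai.com`, bearer-token authentication, and the request fields `model`, `input`, `voice`, `instructions`, and `response_format`; verify them against the current API reference before relying on them. `build_speech_request` is a hypothetical helper, and `"alloy"` is one voice name used for illustration.

```python
import json
import urllib.request
from typing import Optional

# Assumed full URL for the speech generation endpoint.
API_URL = "https://api.openai.com/v1/audio/speech"


def build_speech_request(text: str, api_key: str,
                         voice: str = "alloy",
                         instructions: Optional[str] = None,
                         response_format: str = "mp3") -> urllib.request.Request:
    """Build (but do not send) a POST request for the speech endpoint."""
    payload = {
        "model": "gpt-4o-mini-tts",
        "input": text,
        "voice": voice,
        "response_format": response_format,
    }
    if instructions:
        # Optional natural-language guidance on tone and delivery.
        payload["instructions"] = instructions
    return urllib.request.Request(
        API_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )

# Sending requires a valid key and network access, for example:
# req = build_speech_request("Hello!", api_key)
# with urllib.request.urlopen(req) as resp, open("speech.mp3", "wb") as f:
#     f.write(resp.read())
```

Keeping request construction separate from sending makes the payload easy to inspect and test offline.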
Snapshots
Snapshots let you lock in a specific version of the model so that performance and behavior remain consistent. Below is a list of all available snapshots and aliases for GPT-4o mini TTS.
gpt-4o-mini-tts
gpt-4o-mini-tts-2025-12-15
gpt-4o-mini-tts-2025-03-20
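Pinning matters because the bare alias can point to a newer snapshot over time. A small guard like the one below, a hypothetical helper rather than part of any SDK, can check that a configured model ID is a dated snapshot and pick the newest of the IDs listed on this page. It assumes snapshot IDs always follow the `gpt-4o-mini-tts-YYYY-MM-DD` pattern.

```python
import re

# Alias and snapshot IDs as listed on this page.
ALIAS = "gpt-4o-mini-tts"
SNAPSHOTS = ("gpt-4o-mini-tts-2025-03-20", "gpt-4o-mini-tts-2025-12-15")

# Assumed naming pattern: alias name followed by an ISO date.
_SNAPSHOT_RE = re.compile(r"^gpt-4o-mini-tts-(\d{4}-\d{2}-\d{2})$")


def is_pinned(model_id: str) -> bool:
    """True if model_id names a dated snapshot rather than the alias."""
    return _SNAPSHOT_RE.match(model_id) is not None


def latest_snapshot(snapshots=SNAPSHOTS) -> str:
    """Pick the newest snapshot; ISO dates sort correctly as strings."""
    return max(snapshots, key=lambda s: _SNAPSHOT_RE.match(s).group(1))
```

A deployment check could call `is_pinned` on its configured model ID and refuse to start if it is the alias.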
Rate limits
Rate limits ensure fair and reliable access to the API by capping the number of requests or tokens used within a given time period. Your usage tier determines how high these limits are set, and the tier increases automatically as you send more requests and spend more on the API.
| Tier | Requests per minute (RPM) | Tokens per minute (TPM) |
|---|---|---|
| Free | Not supported | Not supported |
| Tier 1 | 500 | 50,000 |
| Tier 2 | 2,000 | 150,000 |
| Tier 3 | 5,000 | 600,000 |
| Tier 4 | 10,000 | 2,000,000 |
| Tier 5 | 10,000 | 8,000,000 |
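To stay under these caps on the client side, a caller can track requests and tokens over a rolling 60-second window. The sketch below is a sliding-window approximation assumed for illustration; it is not necessarily how the API enforces limits on the server, and `SlidingWindowLimiter` and `TIER_LIMITS` are hypothetical names. The clock is injectable so the behavior can be tested without waiting.

```python
import time
from collections import deque
from typing import Callable, Deque, Tuple

# (RPM, TPM) per paid tier, taken from the rate-limit table.
TIER_LIMITS = {
    "tier1": (500, 50_000),
    "tier2": (2_000, 150_000),
    "tier3": (5_000, 600_000),
    "tier4": (10_000, 2_000_000),
    "tier5": (10_000, 8_000_000),
}

WINDOW_SECONDS = 60.0


class SlidingWindowLimiter:
    """Client-side guard keeping requests and tokens within a 60 s window."""

    def __init__(self, rpm: int, tpm: int,
                 clock: Callable[[], float] = time.monotonic):
        self.rpm = rpm
        self.tpm = tpm
        self.clock = clock
        self.events: Deque[Tuple[float, int]] = deque()  # (timestamp, tokens)
        self.tokens_in_window = 0

    def _evict(self, now: float) -> None:
        # Drop events that have aged out of the window.
        while self.events and now - self.events[0][0] >= WINDOW_SECONDS:
            _, tokens = self.events.popleft()
            self.tokens_in_window -= tokens

    def try_acquire(self, tokens: int) -> bool:
        """Record a request of `tokens` tokens if it fits; report whether it did."""
        now = self.clock()
        self._evict(now)
        if len(self.events) >= self.rpm:
            return False  # request cap reached
        if self.tokens_in_window + tokens > self.tpm:
            return False  # token cap would be exceeded
        self.events.append((now, tokens))
        self.tokens_in_window += tokens
        return True


# Usage: limiter = SlidingWindowLimiter(*TIER_LIMITS["tier1"])
```

When `try_acquire` returns `False`, the caller can wait briefly and retry instead of sending a request that is likely to be rejected.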