gpt-oss-20b
Medium-sized open-weight model for low latency
- Reasoning: Higher
- Speed: Medium
- Input: Text
- Output: Text
gpt-oss-20b is our medium-sized open-weight model for low-latency, local, or
specialized use cases (21B total parameters, 3.6B active).
Download gpt-oss-20b on Hugging Face.
Key features
- Permissive Apache 2.0 license: Build freely without copyleft restrictions or patent risk—ideal for experimentation, customization, and commercial deployment.
- Configurable reasoning effort: Easily adjust the reasoning effort (low, medium, high) based on your specific use case and latency needs.
- Full chain-of-thought: Gain complete access to the model's reasoning process, facilitating easier debugging and increased trust in outputs.
- Fine-tunable: Fully customize models to your specific use case through parameter fine-tuning.
- Agentic capabilities: Use the models' native capabilities for function calling, web browsing, Python code execution, and structured outputs.
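Because gpt-oss-20b is open-weight, it is typically served behind an OpenAI-compatible API (for example via vLLM or Ollama). The sketch below shows how a per-request reasoning-effort setting might look; the `reasoning_effort` field is an assumption here, since some serving stacks instead read the effort level from the system prompt.

```python
import json


def build_request(prompt: str, effort: str = "medium") -> dict:
    """Build a Chat Completions payload that selects a reasoning effort.

    `effort` must be one of the three supported levels: low, medium, high.
    NOTE: the `reasoning_effort` parameter is an assumption; check how your
    serving stack expects the effort level to be passed.
    """
    if effort not in ("low", "medium", "high"):
        raise ValueError(f"unsupported reasoning effort: {effort}")
    return {
        "model": "gpt-oss-20b",
        "reasoning_effort": effort,
        "messages": [{"role": "user", "content": prompt}],
    }


payload = build_request("Summarize the Apache 2.0 license in one sentence.", effort="high")
print(json.dumps(payload, indent=2))
```

Higher effort levels trade latency for more thorough chain-of-thought, so it is worth exposing this as a per-request knob rather than a deployment-wide constant.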
- Context window: 131,072 tokens
- Max output tokens: 131,072
- Knowledge cutoff: Jun 01, 2024
- Reasoning token support
Modalities
- Text: input and output
- Image: Not supported
- Audio: Not supported
- Video: Not supported
Endpoints
- Chat Completions: v1/chat/completions
- Responses: v1/responses
- Realtime: v1/realtime
- Assistants: v1/assistants
- Batch: v1/batch
- Fine-tuning: v1/fine-tuning
- Embeddings: v1/embeddings
- Image generation: v1/images/generations
- Videos: v1/videos
- Image edit: v1/images/edits
- Speech generation: v1/audio/speech
- Transcription: v1/audio/transcriptions
- Translation: v1/audio/translations
- Moderation: v1/moderations
- Completions (legacy): v1/completions
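When the model is hosted behind an OpenAI-compatible server, the Chat Completions endpoint above can be called directly over HTTP. A minimal standard-library sketch follows; the base URL and port are assumptions for a local deployment and must be adjusted to match your setup.

```python
import json
import urllib.request

# Assumed address of a local OpenAI-compatible server (e.g. vLLM's default
# port); change this to wherever your gpt-oss-20b deployment is listening.
BASE_URL = "http://localhost:8000/v1"


def chat(prompt: str) -> str:
    """Send a single-turn request to the Chat Completions endpoint and
    return the assistant's reply text."""
    body = json.dumps({
        "model": "gpt-oss-20b",
        "messages": [{"role": "user", "content": prompt}],
    }).encode("utf-8")
    req = urllib.request.Request(
        f"{BASE_URL}/chat/completions",
        data=body,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)["choices"][0]["message"]["content"]
```

Because the wire format matches the Chat Completions API, existing OpenAI client libraries can also be pointed at the local server by overriding their base URL.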
Features
- Streaming: Supported
- Function calling: Supported
- Structured outputs: Supported
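Function calling works by declaring tools in the request; the model then emits a structured tool call instead of free text when a tool is relevant. A sketch of a request payload with one hypothetical tool (`get_weather` is an illustrative name, not a built-in):

```python
import json

# Hypothetical tool definition, following the Chat Completions `tools`
# schema: a JSON Schema description of the function's parameters.
get_weather_tool = {
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Look up the current weather for a city.",
        "parameters": {
            "type": "object",
            "properties": {
                "city": {"type": "string", "description": "City name"},
            },
            "required": ["city"],
        },
    },
}

payload = {
    "model": "gpt-oss-20b",
    "messages": [{"role": "user", "content": "What's the weather in Paris?"}],
    "tools": [get_weather_tool],
    # "auto" lets the model decide whether to call the tool or answer directly.
    "tool_choice": "auto",
}
print(json.dumps(payload, indent=2))
```

Structured outputs use the same mechanism in reverse: supplying a JSON Schema constrains the model's reply to parse as a value of that schema.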
Snapshots
Snapshots let you lock in a specific version of the model so that performance and behavior remain consistent. Below is a list of all available snapshots and aliases for gpt-oss-20b.
gpt-oss-20b
Rate limits
Rate limits ensure fair and reliable access to the API by placing specific caps on requests or tokens used within a given time period. Your usage tier determines how high these limits are set and automatically increases as you send more requests and spend more on the API.
| Tier | RPM | TPM | Batch queue limit |
|---|---|---|---|
| Free | Not supported | - | - |
| Tier 1 | 0 | 0 | 0 |
| Tier 2 | 0 | 0 | 0 |
| Tier 3 | 0 | 0 | 0 |
| Tier 4 | 0 | 0 | 0 |
| Tier 5 | 0 | 0 | 0 |