gpt-oss-20b
Medium-sized open-weight model for low latency
- Reasoning: Higher
- Speed: Medium
- Input: Text
- Output: Text
gpt-oss-20b is our medium-sized open-weight model for low-latency, local, or
specialized use cases (21B total parameters, 3.6B active).
Download gpt-oss-20b on Hugging Face.
Key features
- Permissive Apache 2.0 license: Build freely without copyleft restrictions or patent risk—ideal for experimentation, customization, and commercial deployment.
- Configurable reasoning effort: Easily adjust the reasoning effort (low, medium, high) based on your specific use case and latency needs.
- Full chain-of-thought: Gain complete access to the model's reasoning process, facilitating easier debugging and increased trust in outputs.
- Fine-tunable: Fully customize models to your specific use case through parameter fine-tuning.
- Agentic capabilities: Use the models' native capabilities for function calling, web browsing, Python code execution, and structured outputs.
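Because gpt-oss-20b is open-weight, it is typically served behind an OpenAI-compatible API (for example via vLLM or Ollama). The sketch below shows how a per-request reasoning-effort setting might look; the `reasoning_effort` field is an assumption here, since some serving stacks instead read the effort level from the system prompt.

```python
import json


def build_request(prompt: str, effort: str = "medium") -> dict:
    """Build a Chat Completions payload that selects a reasoning effort.

    `effort` must be one of the three supported levels: low, medium, high.
    NOTE: the `reasoning_effort` parameter is an assumption; check how your
    serving stack expects the effort level to be passed.
    """
    if effort not in ("low", "medium", "high"):
        raise ValueError(f"unsupported reasoning effort: {effort}")
    return {
        "model": "gpt-oss-20b",
        "reasoning_effort": effort,
        "messages": [{"role": "user", "content": prompt}],
    }


payload = build_request("Summarize the Apache 2.0 license in one sentence.", effort="high")
print(json.dumps(payload, indent=2))
```

Higher effort levels trade latency for more thorough chain-of-thought, so it is worth exposing this as a per-request knob rather than a deployment-wide constant.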
- Context window: 131,072 tokens
- Max output tokens: 131,072
- Knowledge cutoff: Jun 01, 2024
- Reasoning token support
Modalities
- Text: input and output
- Image: Not supported
- Audio: Not supported
- Video: Not supported
Endpoints
- Chat Completions: v1/chat/completions
- Responses: v1/responses
- Realtime: v1/realtime
- Assistants: v1/assistants
- Batch: v1/batch
- Fine-tuning: v1/fine-tuning
- Embeddings: v1/embeddings
- Image generation: v1/images/generations
- Videos: v1/videos
- Image edit: v1/images/edits
- Speech generation: v1/audio/speech
- Transcription: v1/audio/transcriptions
- Translation: v1/audio/translations
- Moderation: v1/moderations
- Completions (legacy): v1/completions
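When the model is hosted behind an OpenAI-compatible server, the Chat Completions endpoint above can be called directly over HTTP. A minimal standard-library sketch follows; the base URL and port are assumptions for a local deployment and must be adjusted to match your setup.

```python
import json
import urllib.request

# Assumed address of a local OpenAI-compatible server (e.g. vLLM's default
# port); change this to wherever your gpt-oss-20b deployment is listening.
BASE_URL = "http://localhost:8000/v1"


def chat(prompt: str) -> str:
    """Send a single-turn request to the Chat Completions endpoint and
    return the assistant's reply text."""
    body = json.dumps({
        "model": "gpt-oss-20b",
        "messages": [{"role": "user", "content": prompt}],
    }).encode("utf-8")
    req = urllib.request.Request(
        f"{BASE_URL}/chat/completions",
        data=body,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)["choices"][0]["message"]["content"]
```

Because the wire format matches the Chat Completions API, existing OpenAI client libraries can also be pointed at the local server by overriding their base URL.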
Features
- Streaming: Supported
- Function calling: Supported
- Structured outputs: Supported
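Function calling works by declaring tools in the request; the model then emits a structured tool call instead of free text when a tool is relevant. A sketch of a request payload with one hypothetical tool (`get_weather` is an illustrative name, not a built-in):

```python
import json

# Hypothetical tool definition, following the Chat Completions `tools`
# schema: a JSON Schema description of the function's parameters.
get_weather_tool = {
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Look up the current weather for a city.",
        "parameters": {
            "type": "object",
            "properties": {
                "city": {"type": "string", "description": "City name"},
            },
            "required": ["city"],
        },
    },
}

payload = {
    "model": "gpt-oss-20b",
    "messages": [{"role": "user", "content": "What's the weather in Paris?"}],
    "tools": [get_weather_tool],
    # "auto" lets the model decide whether to call the tool or answer directly.
    "tool_choice": "auto",
}
print(json.dumps(payload, indent=2))
```

Structured outputs use the same mechanism in reverse: supplying a JSON Schema constrains the model's reply to parse as a value of that schema.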
Snapshots
Snapshots let you lock in a specific version of the model so that performance and behavior remain consistent. Below is a list of all available snapshots and aliases for gpt-oss-20b.
gpt-oss-20b
Rate limits
Rate limits ensure fair and reliable access to the API by placing specific caps on requests or tokens used within a given time period. Your usage tier determines how high these limits are set and automatically increases as you send more requests and spend more on the API.
| Tier | RPM | TPM | Batch queue limit |
|---|---|---|---|
| Free | Not supported | - | - |
| Tier 1 | 0 | 0 | 0 |
| Tier 2 | 0 | 0 | 0 |
| Tier 3 | 0 | 0 | 0 |
| Tier 4 | 0 | 0 | 0 |
| Tier 5 | 0 | 0 | 0 |