OpenAI for Developers in 2025

2025 wasn’t about a single model launch–it was the year AI got easier to run in production. As models improved at planning, tool use, and longer-horizon tasks, more teams shifted from “prompting step-by-step” to delegating work to agents.

For developers, that shift showed up in a few concrete places:

Reasoning became a core dial and increasingly converged with general-purpose chat models.
Multimodality (docs, audio, images, video) became a first-class citizen in the API.
Agent building blocks (Responses API, Agents SDK, AgentKit) made multi-step workflows easier to ship and operate.
Codex made it possible to build faster and better than ever.

TL;DR

The big shift was agent-native APIs plus better models that can perform more complex tasks, requiring reasoning and tool use.
Codex matured across both models and tooling, pairing GPT-5.2-Codex’s repo-scale reasoning with a production-ready CLI, web, and IDE workflows for long-horizon coding tasks.
Improved tooling made it easier to connect models to real systems with fewer rough edges.
Multimodal inputs and outputs (PDFs, images, audio, video) became a practical default in end-to-end workflows.
Evals, graders, and tuning features matured into a more repeatable “measure -> improve -> ship” loop.

Read on for a roundup of major model, API, and platform updates in 2025, and learn how it can help you ship production-grade agents.

Reasoning: from separate models to a unified line

After we first introduced the reasoning paradigm at the end of 2024, where we started giving models “time to think”, early 2025 was the era of reasoning models as a distinct family. Models like o1, o3, and o4-mini made it clear that spending extra compute to think before answering could dramatically improve reliability on complex, multi-step work.

It’s also worth calling out that o3-mini was one of the first signals that reasoning wouldn’t just be a frontier-only feature; it could be delivered in cost-efficient, developer-friendly form factors.

By mid-late 2025, the big trend was convergence: reasoning depth, tool use, and conversational quality increasingly lived inside the same flagship model line (for most teams, “pick a model” became more about cost/latency/quality tradeoffs than choosing between fundamentally different families).

Reasoning-first releases like o1, o3 / o4-mini, and o3-mini helped make “think harder vs. respond faster” a tunable developer decision. As the year progressed, those ideas were increasingly absorbed into the GPT-5.x family, unifying general intelligence, reasoning depth, coding specialization, and multimodality under a single model line.

Multimodality: audio, vision, images, and video

By the end of 2025, multimodal stopped meaning “it can accept an image input” and started meaning “you can build an end-to-end product across modalities”—often in a single workflow.

Audio + realtime

Next-generation audio models improved speech-to-text accuracy and added more controllable text-to-speech, supporting production-grade voice pipelines.
The Realtime API went GA and enabled low-latency, bidirectional audio streaming, making production-grade live voice agents and conversational interfaces viable.

Images

GPT Image 1 introduced a new generation of image generation models, producing high-quality images and structured edits with a strong understanding of the world and better instruction following.
High input fidelity made it possible to preserve details like faces and logos more consistently when editing images.
GPT Image 1 mini made native image generation more cost efficient.
GPT Image 1.5, our most advanced generation model, marked a step change in image quality and edit consistency.
Image generation as a tool in the Responses API enabled image creation as part of multi-turn conversations, in combination with other tools.

Video

Sora 2 & Sora 2 Pro models introduced higher-fidelity video generation with stronger temporal coherence and remixing support.
The Video API exposed video generation and editing via v1/videos, making video a first-class modality in the API alongside text, images, and audio.

PDFs and documents

PDF inputs enabled document-heavy workflows directly in the API.
PDF-by-URL reduced friction by referencing documents without upload.

Why it matters: you can now rely on the OpenAI platform for not only text & vision but also your image and video generation workflows as well as speech-to-speech use cases.

Codex

In 2025, Codex moved beyond being just a coding model and became your Software Engineer teammate: connecting models, local tooling, and cloud to help developers tackle longer, more complex coding tasks.

Models

Early reasoning models demonstrated strong gains on complex coding tasks (multi-file edits, debugging, planning). By mid-late 2025, these capabilities were consolidated into the GPT-5 family, with GPT-5.2-Codex becoming the latest default choice for code generation, review, and repo-scale reasoning—no longer separate from general-purpose models, but specialized within them.

CLI

The open-source Codex CLI (GitHub) brought agent-style coding directly into local environments, enabling developers to run Codex over real repositories, iteratively review changes, and apply edits to files with human oversight. This made long-horizon coding tasks practical in day-to-day workflows.

Codex also became easier to operationalize beyond interactive use, with built-in support for repeatable automation patterns like scripting Codex.

Safety, control, and integrations

Codex leaned into the realities of shipping: sandboxing and approval modes made it easier to keep humans in the loop. At the same time, support for AGENTS.md and MCP made Codex easier to adapt to your repo, extend with third-party tools and context, and even orchestrate Codex via the Agents SDK (by running the CLI as an MCP server).

Web, cloud, and IDE

Beyond the CLI, Codex expanded support for longer sessions and iterative problem solving across the web + cloud and the IDE extension, tightening the loop between conversational reasoning and concrete code changes. Teams could also automate parts of the workflow with Codex Autofix in CI.

Why it matters: by the end of 2025, Codex functioned less as “a model you prompt” and more as a coding surface–combining reasoning-capable models with tools developers already use.

Platform shift: Responses API and agentic building blocks

One of the most important platform changes in 2025 was the move toward agent-native APIs.

The Responses API made it easier to build for the new generation of models:

Support for multiple inputs and outputs, including different modalities
Support for reasoning controls and summaries
Better support for tool calling, including during reasoning

On top of that foundation, 2025 also brought higher-level building blocks like the open-source Agents SDK and AgentKit, making it easier to build and orchestrate agents.

State and persistence also became easier to manage:

Conversation state (plus the Conversations API) for durable threads and replayable state
Connectors and MCP servers for incorporating external context and taking actions through trusted tool surfaces

Why it matters: building multi-step agents and long-running workflows now requires less custom glue code and state management.

Alongside strong primitives, we introduced a set of powerful built-in tools to maximize the utility of models.

Tools: from web search to workflows

In 2025, we launched a set of standardized, composable capabilities that let agents do useful work safely.

Web search provided a simple retrieval primitive for agents that need up-to-date information and citations.
File search (vector stores) provided a default hosted RAG primitive that composes cleanly with Responses + Structured Outputs.
Code Interpreter ran Python in sandboxed containers for data work, file transforms, and iterative debugging.
Computer use enabled “click/type/scroll” automation loops (best paired with sandboxing and human-in-the-loop).

Why it matters: agents can reliably retrieve, compute, and act without every team reinventing a custom tool runtime.

Run and scale: async, events, and cost controls

Once agents moved from “single request” to “multi-step jobs,” production teams needed primitives for cost, latency, and reliability.

Prompt caching reduced latency and input costs when prompts share long, repeated prefixes (system prompts, tools, schemas).
Background mode enabled long-running responses without holding a client connection open.
Webhooks turned “polling everything” into event-driven systems (batch completion, background completion, fine-tuning completion).
Rate limits and workload optimization guidance matured as usage tiers and model families expanded.

Why it matters: building agents became as much about system design (async + events + budgets) as prompting.

Open standards and open-source agent building blocks

Alongside API consolidation, 2025 emphasized interoperability and composability for agentic systems.

The open-source Agents SDK for Python (GitHub) and TypeScript (GitHub) established practical building blocks for tool use, handoffs, guardrails, and tracing—and is provider-agnostic, with documented paths for using non-OpenAI models.
AgentKit added higher-level tooling around agent development (including Agent Builder, ChatKit, Connector Registry, and evaluation loops) for teams that want to ship and iterate faster.
On the standards side, OpenAI pushed AGENTS.md (spec) and participated in the AAIF (Agentic AI Foundation) alongside other ecosystem standards like Model Context Protocol (MCP) and Skills. The value for developers: more portable agent tooling and fewer one-off integrations as the ecosystem converges on shared conventions.

In addition to our work on agents and related standards, we introduced the Apps SDK—an open-source framework that extends the Model Context Protocol (MCP) to let developers build UIs alongside their MCP servers, defining both the logic and interactive interface of applications that can run in clients like ChatGPT.

Why it matters: developers can build agents that are less tightly coupled to a single runtime or UI surface, and more easily integrate OpenAI-powered agents into heterogeneous systems.

Open-weight models

In addition to hosted APIs, OpenAI released open-weight models designed for transparency, research, and on-prem or self-hosted deployment while retaining strong reasoning and instruction-following capabilities.

gpt-oss 120b & 20b reasoning models designed for self-hosting and on-prem deployments.
gpt-oss-safeguard 120b & 20b safety and policy models intended to run alongside gpt-oss.

Evaluation, tuning, and shipping safely

Evals API for eval-driven development.
Reinforcement fine-tuning (RFT) using programmable graders.
Supervised fine-tuning / distillation for pushing quality down into smaller, cheaper models once you’ve validated a task with a larger one.
Graders and the Prompt optimizer helped teams run a tighter “eval → improve → re-eval” loop.

Wrapping up

Throughout 2025, we focused on a few consistent themes aimed at making it easier for developers to build and ship on our platform:

Scaled, controllable reasoning as a core capability
A unified, agent-native API surface
Open building blocks and emerging interoperability standards
Deep multimodal support across text, images, audio, video, and documents
Stronger production tooling for evaluation, tuning, and deployment

Recommended models by task (end of 2025)

If you’re starting a new build or modernizing an integration, these are reasonable “default picks” for your task.

General-purpose (text + multimodal): GPT-5.2 for chat, long-context work, and multimodal inputs.
Deeper reasoning / reliability-sensitive workloads: GPT-5.2 Pro for planning and tasks where quality is worth additional compute.
Coding and software engineering: GPT-5.2-Codex for code generation, review, repo-scale reasoning, and tool-driven coding agents.
Image generation and editing: GPT Image 1.5 for higher-fidelity image generation and iterative edits.
Realtime voice: gpt-realtime for low-latency speech-to-speech and live voice agents.

For up-to-date availability and tiering, see the official model comparison page.

These updates set the foundation for what comes next. Thank you for building with us in 2025—we’re looking forward to what you’ll create in 2026.

Links and resources

Prompt Optimizer
Model comparison (current names, availability, and tiering)
Agents SDK (Python) and Agents SDK (TypeScript)
Codex docs and Codex CLI GitHub
Image Playground
Platform changelog (what shipped, when)

Search the blog

Categories

Topics

Getting Started

Using Codex

Configuration

Administration

Automation

Learn

Releases

Recent