Scaling

Shipping with Codex
DevDay talk on building, testing, and delivering products with Codex.

Rate limits guide
Guide to understanding and managing rate limits.

Balance accuracy, latency, and cost
Talk on optimizing AI systems for accuracy, speed, and cost.

DevDay — optimization breakout
DevDay session discussing optimization of models and prompts.

Evals Best Practices
Best practices for designing and running evals.

Getting Started with Evals
Step-by-step guide to setting up your first eval.

Graders
Guide to using graders for evaluations.

Keep costs low & accuracy high
Guide on balancing cost efficiency with model accuracy.

Latency optimization guide
Best practices for reducing model response latency.

Launch apps with evaluations
Video on incorporating evals when deploying AI products.

LLM correctness and consistency
Best practices for achieving accurate and consistent model outputs.

Model optimization guide
Guide on optimizing OpenAI models for performance and cost.

Predicted outputs guide
Guide to understanding and using predicted outputs.

Production best practices
Guide on best practices for running AI applications in production.

Prompt Optimizer
Guide to refining prompts with the Prompt Optimizer.

Reinforcement fine-tuning overview
Guide on reinforcement learning-based fine-tuning techniques.

Working with the Evals API
Guide to building evaluations with the Evals API.

Eval Driven System Design - From Prototype to Production
Cookbook for eval-driven design of a receipt parsing automation workflow.

Reinforcement Fine-Tuning for Conversational Reasoning with the OpenAI API
Cookbook on reinforcement fine-tuning for conversational reasoning, using HealthBench evaluations.

Evals API Use-case - Responses Evaluation
Cookbook for evaluating new models against stored Responses API logs.