Evals

Build, deploy, and optimize agentic workflows with AgentKit
Cookbook walkthrough for building, deploying, and optimizing agentic workflows with AgentKit.

Building resilient prompts using an evaluation flywheel
Cookbook for creating an evaluation flywheel to improve prompts.

Eval Driven System Design - From Prototype to Production
Cookbook demonstrating eval-driven system design, from prototype to production.

Evals API Use-case - MCP Evaluation
Cookbook example demonstrating MCP evaluation with the Evals API.

Evals API Use-case - Structured Outputs Evaluation
Cookbook example demonstrating structured outputs evaluation with the Evals API.

Evals API Use-case - Tools Evaluation
Cookbook example demonstrating tool evaluation with the Evals API.

Evals API Use-case - Web Search Evaluation
Cookbook example demonstrating web search evaluation with the Evals API.

Evals Best Practices
Best practices for designing and running evals.

Getting Started with Evals
Step-by-step guide to setting up your first eval.

Graders
Guide to using graders for evaluations.

Graders for Reinforcement Fine-Tuning
Cookbook on how to use graders for RFT tasks, showing different approaches to evaluating models with the OpenAI API.

Launch apps with evaluations
Video on incorporating evals when deploying AI products.

Model optimization guide
Guide on optimizing OpenAI models for performance and cost.

Prompt Optimizer
Guide to refining prompts with the Prompt Optimizer.

Using Evals API on Audio Input
Cookbook example showing evals on audio inputs.

Using Evals API on Image Inputs
Cookbook example showing evals on image inputs.

Working with the Evals API
Guide to building evaluations with the Evals API.