Evals

Build, deploy, and optimize agentic workflows with AgentKit

Cookbook walkthrough for building and deploying agentic workflows with AgentKit.

cookbook
Building resilient prompts using an evaluation flywheel

Cookbook for creating an evaluation flywheel to improve prompts.

cookbook
Eval Driven System Design - From Prototype to Production

Cookbook demonstrating eval-driven system design workflows.

cookbook
Evals API Use-case - MCP Evaluation

Cookbook example demonstrating MCP evaluation with the Evals API.

cookbook
Evals API Use-case - Structured Outputs Evaluation

Cookbook example demonstrating structured outputs evaluation with the Evals API.

cookbook
Evals API Use-case - Tools Evaluation

Cookbook example demonstrating tool evaluation with the Evals API.

cookbook
Evals API Use-case - Web Search Evaluation

Cookbook example demonstrating web search evaluation with the Evals API.

cookbook
Evals Best Practices

Best practices for designing and running evals.

guide
Getting Started with Evals

Step-by-step guide to setting up your first eval.

guide
Graders

Guide to using graders for evaluations.

guide
Graders for Reinforcement Fine-Tuning

Cookbook on using graders for RFT tasks, showing different approaches to evaluating models with the OpenAI API.

cookbook
Launch apps with evaluations

Video on incorporating evals when deploying AI products.

video
Model optimization guide

Guide on optimizing OpenAI models for performance and cost.

guide
Prompt Optimizer

Guide to refining prompts with the Prompt Optimizer.

guide
Using Evals API on Audio Input

Cookbook example showing evals on audio inputs.

cookbook
Using Evals API on Image Inputs

Cookbook example showing evals on image inputs.

cookbook
Working with the Evals API

Guide to building evaluations with the Evals API.

guide