Evals

Improve your LLM integrations with evals.

All recipes (21)

Building Governed AI Agents - A Practical Guide to Agentic Scaffolding
Evals, Guardrails
Feb 23, 2026
Image Evals for Image Generation and Editing Use Cases
Evals, Images, Vision
Jan 29, 2026
Realtime Eval Guide
Audio, Evals, Responses, Speech
Jan 25, 2026
Self-Evolving Agents - A Cookbook for Autonomous Agent Retraining
Evals
Nov 4, 2025
Build, deploy, and optimize agentic workflows with AgentKit
Evals
Oct 17, 2025
Building resilient prompts using an evaluation flywheel
Evals
Oct 6, 2025
Using Evals API on Audio Inputs
Audio, Evals
Aug 13, 2025
Using Evals API on Image Inputs
Evals, Images
Jul 15, 2025
Evals API Use-case - MCP Evaluation
Evals, Responses
Jun 9, 2025
Evals API Use-case - Structured Outputs Evaluation
Evals, Responses
Jun 9, 2025
Evals API Use-case - Tools Evaluation
Evals, Responses
Jun 9, 2025
Evals API Use-case - Web Search Evaluation
Evals, Responses
Jun 9, 2025
Eval Driven System Design - From Prototype to Production
Completions, Evals, Functions, Responses
Jun 2, 2025
Selecting a Model Based on Stripe Conversion – A Practical Eval for Startups
Evals
Jun 2, 2025
Reinforcement Fine-Tuning for Conversational Reasoning with the OpenAI API
Evals, Fine-tuning
May 21, 2025
Evals API Use-case - Responses Evaluation
Evals, Responses
May 13, 2025
Evals API Use-case - Detecting prompt regressions
Completions, Evals
Apr 8, 2025
Evals API Use-case - Bulk model and prompt experimentation
Completions, Evals
Apr 8, 2025
Evals API Use-case - Monitoring stored completions
Completions, Evals
Apr 8, 2025
Evaluating Agents with Langfuse
Agents SDK, Evals
Mar 31, 2025
Custom LLM as a Judge to Detect Hallucinations with Braintrust
Completions, Evals
Oct 14, 2024