Evals
Improve your LLM integrations with evals.
All recipes (20)

Image Evals for Image Generation and Editing Use Cases
Evals, Images, Vision
Jan 29, 2026

Realtime Eval Guide
Audio, Evals, Responses, Speech
Jan 25, 2026

Self-Evolving Agents - A Cookbook for Autonomous Agent Retraining
Evals
Nov 4, 2025

Build, deploy, and optimize agentic workflows with AgentKit
Evals
Oct 17, 2025

Building resilient prompts using an evaluation flywheel
Evals
Oct 6, 2025

Using Evals API on Audio Inputs
Audio, Evals
Aug 13, 2025

Using Evals API on Image Inputs
Evals, Images
Jul 15, 2025

Evals API Use-case - MCP Evaluation
Evals, Responses
Jun 9, 2025

Evals API Use-case - Structured Outputs Evaluation
Evals, Responses
Jun 9, 2025

Evals API Use-case - Tools Evaluation
Evals, Responses
Jun 9, 2025

Evals API Use-case - Web Search Evaluation
Evals, Responses
Jun 9, 2025

Eval Driven System Design - From Prototype to Production
Completions, Evals, Functions, Responses
Jun 2, 2025

Selecting a Model Based on Stripe Conversion – A Practical Eval for Startups
Evals
Jun 2, 2025

Reinforcement Fine-Tuning for Conversational Reasoning with the OpenAI API
Evals, Fine-tuning
May 21, 2025

Evals API Use-case - Responses Evaluation
Evals, Responses
May 13, 2025

Evals API Use-case - Detecting prompt regressions
Completions, Evals
Apr 8, 2025

Evals API Use-case - Bulk model and prompt experimentation
Completions, Evals
Apr 8, 2025

Evals API Use-case - Monitoring stored completions
Completions, Evals
Apr 8, 2025

Evaluating Agents with Langfuse
Agents SDK, Evals
Mar 31, 2025

Custom LLM as a Judge to Detect Hallucinations with Braintrust
Completions, Evals
Oct 14, 2024