Evals
Improve your LLM integrations with evals.
All recipes (16)
SchemaFlow: Agentic Database Change Impact Analysis, SQL Generation, and Eval Guardrails
Agents SDK, Evals
Jun 5, 2026

Moving from OpenAI Evals to Promptfoo
Evals
Jun 3, 2026

Macro Evals for Agentic Systems
Evals
May 19, 2026

Build an Agent Improvement Loop with Traces, Evals, and Codex
Agents SDK, Codex, Evals
May 12, 2026

Evaluating Grounded Spatial Reasoning with GPT-5.5
Evals, Images, Reasoning, Vision
May 11, 2026

Build iterative repair loops with Codex
Codex, Evals
May 11, 2026

Migrate a Legacy Codebase with Sandbox Agents
Agents SDK, Evals
Apr 7, 2026

Building Governed AI Agents - A Practical Guide to Agentic Scaffolding
Evals, Guardrails
Feb 23, 2026

Image Evals for Image Generation and Editing Use Cases
Evals, Images, Vision
Jan 29, 2026

Realtime Eval Guide
Audio, Evals, Responses, Speech
Jan 25, 2026

Self-Evolving Agents - A Cookbook for Autonomous Agent Retraining
Evals
Nov 4, 2025

Build, deploy, and optimize agentic workflows with AgentKit
Evals
Oct 17, 2025

Building resilient prompts using an evaluation flywheel
Evals
Oct 6, 2025

Eval Driven System Design - From Prototype to Production
Completions, Evals, Functions, Responses
Jun 2, 2025

Reinforcement Fine-Tuning for Conversational Reasoning with the OpenAI API
Evals, Fine-tuning
May 21, 2025

Evaluating Agents with Langfuse
Agents SDK, Evals
Mar 31, 2025