Moving from OpenAI Evals to Promptfoo

OpenAI is winding down the Evals product and recommends Promptfoo for continuing and extending your evaluation workflows. OpenAI Evals lets you export supported evaluations as runnable Promptfoo configs and separately preserve results from past runs.

Promptfoo is an open-source CLI and library for evaluating and red-teaming AI apps and agents. It gives you a more flexible way to run, maintain, and extend your evals locally or in CI.

What Changes When You Move to Promptfoo

OpenAI Evals and Promptfoo share the same core building blocks: test data, prompts, providers/models, and criteria for scoring outputs. For supported evals, the exported Promptfoo configuration carries these pieces into a runnable evaluation, so you can continue testing the same behavior in Promptfoo.

The main difference is the workflow. OpenAI Evals is managed in the OpenAI Platform, where evaluations, runs, results, and related dataset workflows are available in the dashboard. Promptfoo is centered on a portable configuration file and CLI workflow. You can keep evaluations alongside your application code, run them locally or in CI, and adapt them as your testing needs evolve.

OpenAI Evals	Promptfoo
Evaluations and results are managed in the OpenAI Platform dashboard.	Evaluations are defined in a config file and run with the CLI or in CI.
Graders are configured as testing criteria in an eval.	Grading behavior is configured as assertions and metrics.
Runs are tied to the Platform evaluation workflow.	Runs can be integrated into development, testing, and deployment workflows.
Platform features such as dataset review and prompt iteration remain in OpenAI.	Provider selection and evaluation customization are managed in Promptfoo.

Promptfoo does not simply reproduce the OpenAI Evals interface in another tool. It provides a more portable, code-oriented workflow for maintaining, running, and extending evaluations over time.

Migrate an Evaluation to Promptfoo

OpenAI Evals lets you export supported evaluations as runnable Promptfoo configurations, so you can continue running and extending your evals in Promptfoo. Historical result exports are separate: you can import them when you want previous OpenAI Evals runs available in Promptfoo.

Before You Start

Install Promptfoo.
An existing OpenAI Evals evaluation.
An API key for the model provider used in your eval, such as OPENAI_API_KEY.

Export a Runnable Promptfoo Configuration

Open the evaluation in the OpenAI Platform dashboard.
Select a completed run to use as the basis for the exported configuration. If your evaluation does not have a completed run yet, run it once before exporting.
Open the Export menu and select Download runnable Promptfoo config.
Download the Promptfoo configuration file and review any migration warnings before running it.

Screenshot placeholder: OpenAI Evals export menu with Download runnable Promptfoo config visible.

Run the Exported Evaluation in Promptfoo

Run a fresh evaluation and open the results viewer:

promptfoo eval -c <downloaded-config-file> --no-cache
promptfoo view

eval creates a new Promptfoo evaluation run, while view opens its results locally. This new run is separate from any previously completed OpenAI Evals runs.

If you modify the exported configuration or encounter a configuration error, validate it before retrying:

promptfoo validate config -c <downloaded-config-file>

Promptfoo results view after running the exported evaluation

Verify the Migrated Evaluation

After your first Promptfoo run, review the results to confirm that the evaluation behaves as expected. If the export included warnings or you manually changed the configuration, compare the affected prompts, assertions, and outputs before using the eval in an ongoing workflow.

Similarity-based scoring may not produce identical numerical results across systems. Any manually recreated grader, especially an LLM-as-a-judge grader, should be validated before you rely on it for regression decisions.

Preserve Historical Results

If you want past OpenAI Evals runs visible in Promptfoo, download your results from the export menu and import the downloaded file:

promptfoo import <downloaded-results-file>
promptfoo view

This preserves past run results for reference. It does not create the runnable Promptfoo configuration used for future evaluation runs.

Cases That Require Manual Setup

Some evaluations may require additional setup in Promptfoo.

Situation	What to do
A runnable configuration cannot be exported.	Recreate the evaluation in Promptfoo. Import historical results separately, if needed.
A grader is not included in the exported configuration.	Recreate it as a Promptfoo assertion and verify its behavior before relying on it.
A tool, agent, or custom provider workflow is not reproduced.	Configure that workflow in Promptfoo and test it before using the migrated eval.

Customize Your Workflow in Promptfoo

Once your migrated evaluation is working as expected, you can use Promptfoo to expand your test coverage, run evals in CI, and add red-team testing.

If you use Codex, see Add evals to your AI application to build and maintain Promptfoo eval suites with the Promptfoo Codex plugin.

Suggested

Get started

Core concepts

SDKs and CLI

Resources

Legacy APIs

Choose a model

Text and code

Prompting

Reasoning

Images and video

Realtime and audio

Specialized models

Agents SDK

ChatKit

Search and retrieval

Connect tools and data

Build tool workflows

Computer and code

Media

Get started

Audio

Connection methods

Sessions and operations

Go live

Performance and quality

Cost and throughput

Safety and governance

Infrastructure and access

Operations

Core Concepts

Plan

Build

Deploy

Conversion apps

Guides

Resources

Get started

Guides

File Upload

API

Measurement

Advertiser API

API Reference

Get started

Foundations

Explore

Available on

Releases

Workflows

Capabilities

Reference

Customization

Config file

Agent configuration

Extend ChatGPT and Codex

Windows

Development workflows

Extend and automate

Environments

Build with Codex

Third-party integrations

Reference

Permissions

Codex Security

Safety

Getting started

Identity and authentication

Workspace access, policy, and models

Plugin and connector controls

Usage, governance, and compliance

Deployment and model providers

Community

Blog

Community

Blog

Recent

Topics

Topics

Contribute