Skip to content
Primary navigation

Get eval run output items

client.evals.runs.outputItems.list(stringrunID, OutputItemListParams { eval_id, after, limit, 2 more } params, RequestOptionsoptions?): CursorPage<OutputItemListResponse { id, created_at, datasource_item, 7 more } >
GET/evals/{eval_id}/runs/{run_id}/output_items

Get a list of output items for an evaluation run.

ParametersExpand Collapse
runID: string
params: OutputItemListParams { eval_id, after, limit, 2 more }
eval_id: string

Path param: The ID of the evaluation to retrieve runs for.

after?: string

Query param: Identifier for the last output item from the previous pagination request.

limit?: number

Query param: Number of output items to retrieve.

order?: "asc" | "desc"

Query param: Sort order for output items by timestamp. Use asc for ascending order or desc for descending order. Defaults to asc.

One of the following:
"asc"
"desc"
status?: "fail" | "pass"

Query param: Filter output items by status. Use failed to filter by failed output items or pass to filter by passed output items.

One of the following:
"fail"
"pass"
ReturnsExpand Collapse
OutputItemListResponse { id, created_at, datasource_item, 7 more }

A schema representing an evaluation run output item.

id: string

Unique identifier for the evaluation run output item.

created_at: number

Unix timestamp (in seconds) when the evaluation run was created.

datasource_item: Record<string, unknown>

Details of the input data source item.

datasource_item_id: number

The identifier for the data source item.

eval_id: string

The identifier of the evaluation group.

object: "eval.run.output_item"

The type of the object. Always "eval.run.output_item".

results: Array<Result>

A list of grader results for this output item.

name: string

The name of the grader.

passed: boolean

Whether the grader considered the output a pass.

score: number

The numeric score produced by the grader.

sample?: Record<string, unknown> | null

Optional sample or intermediate data produced by the grader.

type?: string

The grader type (for example, "string-check-grader").

run_id: string

The identifier of the evaluation run associated with this output item.

sample: Sample { error, finish_reason, input, 7 more }

A sample containing the input and output of the evaluation run.

error: EvalAPIError { code, message }

An object representing an error response from the Eval API.

code: string

The error code.

message: string

The error message.

finish_reason: string

The reason why the sample generation was finished.

input: Array<Input>

An array of input messages.

content: string

The content of the message.

role: string

The role of the message sender (e.g., system, user, developer).

max_completion_tokens: number

The maximum number of tokens allowed for completion.

model: string

The model used for generating the sample.

output: Array<Output>

An array of output messages.

content?: string

The content of the message.

role?: string

The role of the message (e.g. "system", "assistant", "user").

seed: number

The seed used for generating the sample.

temperature: number

The sampling temperature used.

top_p: number

The top_p value used for sampling.

usage: Usage { cached_tokens, completion_tokens, prompt_tokens, total_tokens }

Token usage details for the sample.

cached_tokens: number

The number of tokens retrieved from cache.

completion_tokens: number

The number of completion tokens generated.

prompt_tokens: number

The number of prompt tokens used.

total_tokens: number

The total number of tokens used.

status: string

The status of the evaluation run.

Get eval run output items

import OpenAI from "openai";

const openai = new OpenAI();

const outputItems = await openai.evals.runs.outputItems.list(
  "egroup_67abd54d9b0081909a86353f6fb9317a",
  "erun_67abd54d60ec8190832b46859da808f7"
);
console.log(outputItems);
{
  "object": "list",
  "data": [
    {
      "object": "eval.run.output_item",
      "id": "outputitem_67e5796c28e081909917bf79f6e6214d",
      "created_at": 1743092076,
      "run_id": "evalrun_67abd54d60ec8190832b46859da808f7",
      "eval_id": "eval_67abd54d9b0081909a86353f6fb9317a",
      "status": "pass",
      "datasource_item_id": 5,
      "datasource_item": {
        "input": "Stock Markets Rally After Positive Economic Data Released",
        "ground_truth": "Markets"
      },
      "results": [
        {
          "name": "String check-a2486074-d803-4445-b431-ad2262e85d47",
          "sample": null,
          "passed": true,
          "score": 1.0
        }
      ],
      "sample": {
        "input": [
          {
            "role": "developer",
            "content": "Categorize a given news headline into one of the following topics: Technology, Markets, World, Business, or Sports.\n\n# Steps\n\n1. Analyze the content of the news headline to understand its primary focus.\n2. Extract the subject matter, identifying any key indicators or keywords.\n3. Use the identified indicators to determine the most suitable category out of the five options: Technology, Markets, World, Business, or Sports.\n4. Ensure only one category is selected per headline.\n\n# Output Format\n\nRespond with the chosen category as a single word. For instance: \"Technology\", \"Markets\", \"World\", \"Business\", or \"Sports\".\n\n# Examples\n\n**Input**: \"Apple Unveils New iPhone Model, Featuring Advanced AI Features\"  \n**Output**: \"Technology\"\n\n**Input**: \"Global Stocks Mixed as Investors Await Central Bank Decisions\"  \n**Output**: \"Markets\"\n\n**Input**: \"War in Ukraine: Latest Updates on Negotiation Status\"  \n**Output**: \"World\"\n\n**Input**: \"Microsoft in Talks to Acquire Gaming Company for $2 Billion\"  \n**Output**: \"Business\"\n\n**Input**: \"Manchester United Secures Win in Premier League Football Match\"  \n**Output**: \"Sports\" \n\n# Notes\n\n- If the headline appears to fit into more than one category, choose the most dominant theme.\n- Keywords or phrases such as \"stocks\", \"company acquisition\", \"match\", or technological brands can be good indicators for classification.\n",
            "tool_call_id": null,
            "tool_calls": null,
            "function_call": null
          },
          {
            "role": "user",
            "content": "Stock Markets Rally After Positive Economic Data Released",
            "tool_call_id": null,
            "tool_calls": null,
            "function_call": null
          }
        ],
        "output": [
          {
            "role": "assistant",
            "content": "Markets",
            "tool_call_id": null,
            "tool_calls": null,
            "function_call": null
          }
        ],
        "finish_reason": "stop",
        "model": "gpt-4o-mini-2024-07-18",
        "usage": {
          "total_tokens": 325,
          "completion_tokens": 2,
          "prompt_tokens": 323,
          "cached_tokens": 0
        },
        "error": null,
        "temperature": 1.0,
        "max_completion_tokens": 2048,
        "top_p": 1.0,
        "seed": 42
      }
    }
  ],
  "first_id": "outputitem_67e5796c28e081909917bf79f6e6214d",
  "last_id": "outputitem_67e5796c28e081909917bf79f6e6214d",
  "has_more": true
}
Returns Examples
{
  "object": "list",
  "data": [
    {
      "object": "eval.run.output_item",
      "id": "outputitem_67e5796c28e081909917bf79f6e6214d",
      "created_at": 1743092076,
      "run_id": "evalrun_67abd54d60ec8190832b46859da808f7",
      "eval_id": "eval_67abd54d9b0081909a86353f6fb9317a",
      "status": "pass",
      "datasource_item_id": 5,
      "datasource_item": {
        "input": "Stock Markets Rally After Positive Economic Data Released",
        "ground_truth": "Markets"
      },
      "results": [
        {
          "name": "String check-a2486074-d803-4445-b431-ad2262e85d47",
          "sample": null,
          "passed": true,
          "score": 1.0
        }
      ],
      "sample": {
        "input": [
          {
            "role": "developer",
            "content": "Categorize a given news headline into one of the following topics: Technology, Markets, World, Business, or Sports.\n\n# Steps\n\n1. Analyze the content of the news headline to understand its primary focus.\n2. Extract the subject matter, identifying any key indicators or keywords.\n3. Use the identified indicators to determine the most suitable category out of the five options: Technology, Markets, World, Business, or Sports.\n4. Ensure only one category is selected per headline.\n\n# Output Format\n\nRespond with the chosen category as a single word. For instance: \"Technology\", \"Markets\", \"World\", \"Business\", or \"Sports\".\n\n# Examples\n\n**Input**: \"Apple Unveils New iPhone Model, Featuring Advanced AI Features\"  \n**Output**: \"Technology\"\n\n**Input**: \"Global Stocks Mixed as Investors Await Central Bank Decisions\"  \n**Output**: \"Markets\"\n\n**Input**: \"War in Ukraine: Latest Updates on Negotiation Status\"  \n**Output**: \"World\"\n\n**Input**: \"Microsoft in Talks to Acquire Gaming Company for $2 Billion\"  \n**Output**: \"Business\"\n\n**Input**: \"Manchester United Secures Win in Premier League Football Match\"  \n**Output**: \"Sports\" \n\n# Notes\n\n- If the headline appears to fit into more than one category, choose the most dominant theme.\n- Keywords or phrases such as \"stocks\", \"company acquisition\", \"match\", or technological brands can be good indicators for classification.\n",
            "tool_call_id": null,
            "tool_calls": null,
            "function_call": null
          },
          {
            "role": "user",
            "content": "Stock Markets Rally After Positive Economic Data Released",
            "tool_call_id": null,
            "tool_calls": null,
            "function_call": null
          }
        ],
        "output": [
          {
            "role": "assistant",
            "content": "Markets",
            "tool_call_id": null,
            "tool_calls": null,
            "function_call": null
          }
        ],
        "finish_reason": "stop",
        "model": "gpt-4o-mini-2024-07-18",
        "usage": {
          "total_tokens": 325,
          "completion_tokens": 2,
          "prompt_tokens": 323,
          "cached_tokens": 0
        },
        "error": null,
        "temperature": 1.0,
        "max_completion_tokens": 2048,
        "top_p": 1.0,
        "seed": 42
      }
    }
  ],
  "first_id": "outputitem_67e5796c28e081909917bf79f6e6214d",
  "last_id": "outputitem_67e5796c28e081909917bf79f6e6214d",
  "has_more": true
}