Get eval run output items

evals.runs.output_items.list(run_id, **kwargs) -> CursorPage<OutputItemListResponse { id, created_at, datasource_item, 7 more }>
GET /evals/{eval_id}/runs/{run_id}/output_items

Get a list of output items for an evaluation run.

Parameters
eval_id: String

The ID of the evaluation the run belongs to.

run_id: String

The ID of the run to retrieve output items for.

after: String

Identifier for the last output item from the previous pagination request.

limit: Integer

Number of output items to retrieve.

order: :asc | :desc

Sort order for output items by timestamp. Use asc for ascending order or desc for descending order. Defaults to asc.

Accepts one of the following:
:asc
:desc
status: :fail | :pass

Filter output items by status. Use fail to filter by failed output items or pass to filter by passed output items.

Accepts one of the following:
:fail
:pass
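
For example, a filtered, paginated request might look like the following (a minimal sketch; the eval and run IDs are placeholders, and the next-page call assumes the page exposes its items as data, matching the response sample shown later):

require "openai"

openai = OpenAI::Client.new(api_key: "My API Key")

# Fetch up to 20 failed output items, newest first.
page = openai.evals.runs.output_items.list(
  "run_id",
  eval_id: "eval_id",
  status: :fail,
  order: :desc,
  limit: 20
)

# Request the next page by passing the last item's ID as `after`.
next_page = openai.evals.runs.output_items.list(
  "run_id",
  eval_id: "eval_id",
  status: :fail,
  order: :desc,
  limit: 20,
  after: page.data.last.id
)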
Returns
class OutputItemListResponse { id, created_at, datasource_item, 7 more }

A schema representing an evaluation run output item.

id: String

Unique identifier for the evaluation run output item.

created_at: Integer

Unix timestamp (in seconds) when the evaluation run output item was created.

datasource_item: Hash[Symbol, untyped]

Details of the input data source item.

datasource_item_id: Integer

The identifier for the data source item.

eval_id: String

The identifier of the evaluation group.

object: :"eval.run.output_item"

The type of the object. Always "eval.run.output_item".

results: Array[{ name, passed, score, 2 more }]

A list of grader results for this output item.

name: String

The name of the grader.

passed: bool

Whether the grader considered the output a pass.

score: Float

The numeric score produced by the grader.

sample: Hash[Symbol, untyped]

Optional sample or intermediate data produced by the grader.

type: String

The grader type (for example, "string-check-grader").

run_id: String

The identifier of the evaluation run associated with this output item.

sample: { error, finish_reason, input, 7 more }

A sample containing the input and output of the evaluation run.

error: EvalAPIError { code, message }

An object representing an error response from the Eval API.

code: String

The error code.

message: String

The error message.

finish_reason: String

The reason why the sample generation was finished.

input: Array[{ content, role}]

An array of input messages.

content: String

The content of the message.

role: String

The role of the message sender (e.g., system, user, developer).

max_completion_tokens: Integer

The maximum number of tokens allowed for completion.

model: String

The model used for generating the sample.

output: Array[{ content, role}]

An array of output messages.

content: String

The content of the message.

role: String

The role of the message (e.g. "system", "assistant", "user").

seed: Integer

The seed used for generating the sample.

temperature: Float

The sampling temperature used.

top_p: Float

The top_p value used for sampling.

usage: { cached_tokens, completion_tokens, prompt_tokens, total_tokens }

Token usage details for the sample.

cached_tokens: Integer

The number of tokens retrieved from cache.

completion_tokens: Integer

The number of completion tokens generated.

prompt_tokens: Integer

The number of prompt tokens used.

total_tokens: Integer

The total number of tokens used.

status: String

The status of the evaluation run output item.
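
As a sketch of consuming this schema, the snippet below tallies grader passes and token usage across one page of output items (hypothetical IDs; field access follows the shape documented above):

require "openai"

openai = OpenAI::Client.new(api_key: "My API Key")

page = openai.evals.runs.output_items.list("run_id", eval_id: "eval_id")

passed_items = 0
total_tokens = 0

page.data.each do |item|
  # Count an item as passed when every grader result for it passed.
  passed_items += 1 if item.results.all? { |result| result.passed }
  # Accumulate token usage from the sample that produced this item.
  total_tokens += item.sample.usage.total_tokens
end

puts("#{passed_items}/#{page.data.length} items passed; #{total_tokens} tokens used")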

Get eval run output items

require "openai"

openai = OpenAI::Client.new(api_key: "My API Key")

page = openai.evals.runs.output_items.list("run_id", eval_id: "eval_id")

puts(page)
{
  "data": [
    {
      "id": "id",
      "created_at": 0,
      "datasource_item": {
        "foo": "bar"
      },
      "datasource_item_id": 0,
      "eval_id": "eval_id",
      "object": "eval.run.output_item",
      "results": [
        {
          "name": "name",
          "passed": true,
          "score": 0,
          "sample": {
            "foo": "bar"
          },
          "type": "type"
        }
      ],
      "run_id": "run_id",
      "sample": {
        "error": {
          "code": "code",
          "message": "message"
        },
        "finish_reason": "finish_reason",
        "input": [
          {
            "content": "content",
            "role": "role"
          }
        ],
        "max_completion_tokens": 0,
        "model": "model",
        "output": [
          {
            "content": "content",
            "role": "role"
          }
        ],
        "seed": 0,
        "temperature": 0,
        "top_p": 0,
        "usage": {
          "cached_tokens": 0,
          "completion_tokens": 0,
          "prompt_tokens": 0,
          "total_tokens": 0
        }
      },
      "status": "status"
    }
  ],
  "first_id": "first_id",
  "has_more": true,
  "last_id": "last_id",
  "object": "list"
}
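
To walk every output item rather than a single page, the cursor page can be iterated item by item; a minimal sketch, assuming the auto_paging_each helper the generated Ruby client provides for cursor pagination:

openai.evals.runs.output_items.list("run_id", eval_id: "eval_id").auto_paging_each do |item|
  # Subsequent pages are fetched on demand as the cursor advances.
  puts(item.id)
end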