Output Items

Manage and run evals in the OpenAI platform.

Get eval run output items
evals.runs.output_items.list(run_id: str, **kwargs: OutputItemListParams) -> SyncCursorPage[OutputItemListResponse]
GET /evals/{eval_id}/runs/{run_id}/output_items
Get an output item of an eval run
evals.runs.output_items.retrieve(output_item_id: str, **kwargs: OutputItemRetrieveParams) -> OutputItemRetrieveResponse
GET /evals/{eval_id}/runs/{run_id}/output_items/{output_item_id}
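
The two methods above might be wrapped roughly as follows. This is a minimal sketch, not the SDK's own code: the helper names `list_output_item_ids` and `get_output_item` are hypothetical, `client` is assumed to be an already-constructed SDK client, and passing `eval_id` (and `run_id` for retrieval) as keyword arguments is an assumption based on the path parameters in the URLs. It also assumes the returned page can be iterated directly.

```python
from typing import Any, List


def list_output_item_ids(client: Any, eval_id: str, run_id: str) -> List[str]:
    """Collect the id of every output item in one eval run.

    `client` is assumed to expose the `evals.runs.output_items` attribute
    chain shown in the method signatures above.
    """
    # run_id is positional per the signature; eval_id as a keyword argument
    # is an assumption drawn from the {eval_id} path parameter.
    page = client.evals.runs.output_items.list(run_id, eval_id=eval_id)
    return [item.id for item in page]


def get_output_item(client: Any, eval_id: str, run_id: str, output_item_id: str) -> Any:
    """Fetch a single output item by its id."""
    return client.evals.runs.output_items.retrieve(
        output_item_id, eval_id=eval_id, run_id=run_id
    )
```

Taking the client as a parameter keeps the helpers free of credentials and makes them easy to exercise against a stub.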
Models
class OutputItemListResponse:

A schema representing an evaluation run output item.

id: str

Unique identifier for the evaluation run output item.

created_at: int

Unix timestamp (in seconds) when the evaluation run was created.

format: unixtime
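
Because `created_at` is a Unix timestamp in seconds, it can be turned into a timezone-aware datetime with the standard library. The helper name `created_at_utc` is hypothetical and the sample value is illustrative, not taken from a real response.

```python
from datetime import datetime, timezone


def created_at_utc(created_at: int) -> datetime:
    """Convert a Unix timestamp in seconds to a timezone-aware UTC datetime."""
    return datetime.fromtimestamp(created_at, tz=timezone.utc)


# Illustrative value only.
print(created_at_utc(1_700_000_000).isoformat())
```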
datasource_item: Dict[str, object]

Details of the input data source item.

datasource_item_id: int

The identifier for the data source item.

eval_id: str

The identifier of the evaluation group.

object: Literal["eval.run.output_item"]

The type of the object. Always “eval.run.output_item”.

results: List[Result]

A list of grader results for this output item.

name: str

The name of the grader.

passed: bool

Whether the grader considered the output a pass.

score: float

The numeric score produced by the grader.

sample: Optional[Dict[str, object]]

Optional sample or intermediate data produced by the grader.

type: Optional[str]

The grader type (for example, “string-check-grader”).
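
One way to use the grader fields above is to reduce an item's `results` list to a short summary. The sketch below works on plain dictionaries shaped like those fields; the function name `summarize_results` and the example grader names are made up for illustration.

```python
from typing import Any, Dict, List


def summarize_results(results: List[Dict[str, Any]]) -> Dict[str, Any]:
    """Summarize one output item's grader results."""
    failed = [r["name"] for r in results if not r["passed"]]
    scores = [r["score"] for r in results]
    return {
        "all_passed": not failed,
        "failed_graders": failed,
        # None when there are no grader results to average.
        "mean_score": sum(scores) / len(scores) if scores else None,
    }


# Hypothetical grader results, not real API output.
example = [
    {"name": "exact-match", "passed": True, "score": 1.0,
     "sample": None, "type": "string-check-grader"},
    {"name": "tone", "passed": False, "score": 0.5,
     "sample": None, "type": None},
]
print(summarize_results(example))
```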

run_id: str

The identifier of the evaluation run associated with this output item.

sample: Sample

A sample containing the input and output of the evaluation run.

error: EvalAPIError

An object representing an error response from the Eval API.

finish_reason: str

The reason sample generation finished.

input: List[SampleInput]

An array of input messages.

content: str

The content of the message.

role: str

The role of the message sender (e.g., system, user, developer).

max_completion_tokens: int

The maximum number of tokens allowed for completion.

model: str

The model used for generating the sample.

output: List[SampleOutput]

An array of output messages.

content: Optional[str]

The content of the message.

role: Optional[str]

The role of the message (e.g. “system”, “assistant”, “user”).
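
The input and output message lists can be flattened into a readable transcript. This is a sketch over plain dictionaries; `render_transcript` is a hypothetical helper, and the fallbacks for output messages exist because their `content` and `role` are typed as optional.

```python
from typing import Any, Dict, List


def render_transcript(sample: Dict[str, Any]) -> str:
    """Render a sample's input and output messages as 'role: content' lines."""
    lines: List[str] = []
    for msg in sample.get("input", []):
        # Input messages always carry both role and content.
        lines.append(f"{msg['role']}: {msg['content']}")
    for msg in sample.get("output", []):
        # Output role and content may be None, so substitute placeholders.
        role = msg.get("role") or "unknown"
        content = msg.get("content") or ""
        lines.append(f"{role}: {content}")
    return "\n".join(lines)
```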

seed: int

The seed used for generating the sample.

temperature: float

The sampling temperature used.

top_p: float

The top_p value used for sampling.

usage: SampleUsage

Token usage details for the sample.

cached_tokens: int

The number of tokens retrieved from cache.

completion_tokens: int

The number of completion tokens generated.

prompt_tokens: int

The number of prompt tokens used.

total_tokens: int

The total number of tokens used.
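
Token usage can be summed across many output items to estimate the cost of a run. The sketch below assumes each item is a plain dictionary with the `sample.usage` fields listed above; `total_usage` and `USAGE_FIELDS` are hypothetical names.

```python
from typing import Any, Dict, Iterable

USAGE_FIELDS = ("cached_tokens", "completion_tokens", "prompt_tokens", "total_tokens")


def total_usage(items: Iterable[Dict[str, Any]]) -> Dict[str, int]:
    """Sum each usage counter over a collection of output items."""
    totals = {field: 0 for field in USAGE_FIELDS}
    for item in items:
        usage = item["sample"]["usage"]
        for field in USAGE_FIELDS:
            totals[field] += usage[field]
    return totals
```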

status: str

The status of the evaluation run.
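
Since `status` is a plain string, a quick tally per status gives an overview of a run. This section does not list the allowed values, so the sketch below makes no assumption about them; `count_by_status` is a hypothetical helper operating on plain dictionaries.

```python
from collections import Counter
from typing import Any, Dict, Iterable


def count_by_status(items: Iterable[Dict[str, Any]]) -> Dict[str, int]:
    """Count output items per `status` string, whatever values appear."""
    return dict(Counter(item["status"] for item in items))
```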

class OutputItemRetrieveResponse:

A schema representing an evaluation run output item.

id: str

Unique identifier for the evaluation run output item.

created_at: int

Unix timestamp (in seconds) when the evaluation run was created.

format: unixtime
datasource_item: Dict[str, object]

Details of the input data source item.

datasource_item_id: int

The identifier for the data source item.

eval_id: str

The identifier of the evaluation group.

object: Literal["eval.run.output_item"]

The type of the object. Always “eval.run.output_item”.

results: List[Result]

A list of grader results for this output item.

name: str

The name of the grader.

passed: bool

Whether the grader considered the output a pass.

score: float

The numeric score produced by the grader.

sample: Optional[Dict[str, object]]

Optional sample or intermediate data produced by the grader.

type: Optional[str]

The grader type (for example, “string-check-grader”).

run_id: str

The identifier of the evaluation run associated with this output item.

sample: Sample

A sample containing the input and output of the evaluation run.

error: EvalAPIError

An object representing an error response from the Eval API.

finish_reason: str

The reason sample generation finished.

input: List[SampleInput]

An array of input messages.

content: str

The content of the message.

role: str

The role of the message sender (e.g., system, user, developer).

max_completion_tokens: int

The maximum number of tokens allowed for completion.

model: str

The model used for generating the sample.

output: List[SampleOutput]

An array of output messages.

content: Optional[str]

The content of the message.

role: Optional[str]

The role of the message (e.g. “system”, “assistant”, “user”).

seed: int

The seed used for generating the sample.

temperature: float

The sampling temperature used.

top_p: float

The top_p value used for sampling.

usage: SampleUsage

Token usage details for the sample.

cached_tokens: int

The number of tokens retrieved from cache.

completion_tokens: int

The number of completion tokens generated.

prompt_tokens: int

The number of prompt tokens used.

total_tokens: int

The total number of tokens used.

status: str

The status of the evaluation run.