Evals
Manage and run evals in the OpenAI platform.
resource openai_eval
required Expand Collapse
data_source_config: AttributesThe configuration for the data source used for the evaluation runs. Dictates the schema of the data used in the evaluation.
The configuration for the data source used for the evaluation runs. Dictates the schema of the data used in the evaluation.
testing_criteria: List[Attributes]A list of graders for all eval runs in this group. Graders can reference variables in the data source using double curly braces notation, like {{item.variable_name}}. To reference the model’s output, use the sample namespace (ie, {{sample.output_text}}).
A list of graders for all eval runs in this group. Graders can reference variables in the data source using double curly braces notation, like {{item.variable_name}}. To reference the model’s output, use the sample namespace (ie, {{sample.output_text}}).
input?: List[Attributes]A list of chat messages forming the prompt or context. May include variable references to the item namespace, ie {{item.name}}.
A list of chat messages forming the prompt or context. May include variable references to the item namespace, ie {{item.name}}.
The evaluation metric to use. One of cosine, fuzzy_match, bleu,
gleu, meteor, rouge_1, rouge_2, rouge_3, rouge_4, rouge_5,
or rouge_l.
sampling_params?: AttributesThe sampling parameters for the model.
The sampling parameters for the model.
The maximum number of tokens the grader model may generate in its response.
Constrains effort on reasoning for
reasoning models.
Currently supported values are none, minimal, low, medium, high, and xhigh. Reducing
reasoning effort can result in faster responses and fewer tokens used
on reasoning in a response.
gpt-5.1defaults tonone, which does not perform reasoning. The supported reasoning values forgpt-5.1arenone,low,medium, andhigh. Tool calls are supported for all reasoning values in gpt-5.1.- All models before
gpt-5.1default tomediumreasoning effort, and do not supportnone. - The
gpt-5-promodel defaults to (and only supports)highreasoning effort. xhighis supported for all models aftergpt-5.1-codex-max.
optional Expand Collapse
Set of 16 key-value pairs that can be attached to an object. This can be useful for storing additional information about the object in a structured format, and querying for objects via API or the dashboard.
Keys are strings with a maximum length of 64 characters. Values are strings with a maximum length of 512 characters.
data openai_eval
optional Expand Collapse
computed Expand Collapse
Set of 16 key-value pairs that can be attached to an object. This can be useful for storing additional information about the object in a structured format, and querying for objects via API or the dashboard.
Keys are strings with a maximum length of 64 characters. Values are strings with a maximum length of 512 characters.
data_source_config: AttributesConfiguration of data sources used in runs of the evaluation.
Configuration of data sources used in runs of the evaluation.
The json schema for the run data source items. Learn how to build JSON schemas here.
Set of 16 key-value pairs that can be attached to an object. This can be useful for storing additional information about the object in a structured format, and querying for objects via API or the dashboard.
Keys are strings with a maximum length of 64 characters. Values are strings with a maximum length of 512 characters.
testing_criteria: List[Attributes]A list of testing criteria.
A list of testing criteria.
input: List[Attributes]
The evaluation metric to use. One of cosine, fuzzy_match, bleu,
gleu, meteor, rouge_1, rouge_2, rouge_3, rouge_4, rouge_5,
or rouge_l.
sampling_params: AttributesThe sampling parameters for the model.
The sampling parameters for the model.
The maximum number of tokens the grader model may generate in its response.
Constrains effort on reasoning for
reasoning models.
Currently supported values are none, minimal, low, medium, high, and xhigh. Reducing
reasoning effort can result in faster responses and fewer tokens used
on reasoning in a response.
gpt-5.1defaults tonone, which does not perform reasoning. The supported reasoning values forgpt-5.1arenone,low,medium, andhigh. Tool calls are supported for all reasoning values in gpt-5.1.- All models before
gpt-5.1default tomediumreasoning effort, and do not supportnone. - The
gpt-5-promodel defaults to (and only supports)highreasoning effort. xhighis supported for all models aftergpt-5.1-codex-max.
data openai_evals
optional Expand Collapse
computed Expand Collapse
items: List[Attributes]The items returned by the data source
The items returned by the data source
data_source_config: AttributesConfiguration of data sources used in runs of the evaluation.
Configuration of data sources used in runs of the evaluation.
The json schema for the run data source items. Learn how to build JSON schemas here.
Set of 16 key-value pairs that can be attached to an object. This can be useful for storing additional information about the object in a structured format, and querying for objects via API or the dashboard.
Keys are strings with a maximum length of 64 characters. Values are strings with a maximum length of 512 characters.
Set of 16 key-value pairs that can be attached to an object. This can be useful for storing additional information about the object in a structured format, and querying for objects via API or the dashboard.
Keys are strings with a maximum length of 64 characters. Values are strings with a maximum length of 512 characters.
testing_criteria: List[Attributes]A list of testing criteria.
A list of testing criteria.
input: List[Attributes]
The evaluation metric to use. One of cosine, fuzzy_match, bleu,
gleu, meteor, rouge_1, rouge_2, rouge_3, rouge_4, rouge_5,
or rouge_l.
sampling_params: AttributesThe sampling parameters for the model.
The sampling parameters for the model.
The maximum number of tokens the grader model may generate in its response.
Constrains effort on reasoning for
reasoning models.
Currently supported values are none, minimal, low, medium, high, and xhigh. Reducing
reasoning effort can result in faster responses and fewer tokens used
on reasoning in a response.
gpt-5.1defaults tonone, which does not perform reasoning. The supported reasoning values forgpt-5.1arenone,low,medium, andhigh. Tool calls are supported for all reasoning values in gpt-5.1.- All models before
gpt-5.1default tomediumreasoning effort, and do not supportnone. - The
gpt-5-promodel defaults to (and only supports)highreasoning effort. xhighis supported for all models aftergpt-5.1-codex-max.
EvalsRuns
Manage and run evals in the OpenAI platform.
resource openai_eval_run
required Expand Collapse
data_source: AttributesDetails about the run’s data source.
Details about the run’s data source.
source: AttributesDetermines what populates the item namespace in the data source.
Determines what populates the item namespace in the data source.
Set of 16 key-value pairs that can be attached to an object. This can be useful for storing additional information about the object in a structured format, and querying for objects via API or the dashboard.
Keys are strings with a maximum length of 64 characters. Values are strings with a maximum length of 512 characters.
Optional string to search the ‘instructions’ field. This is a query parameter used to select responses.
Constrains effort on reasoning for
reasoning models.
Currently supported values are none, minimal, low, medium, high, and xhigh. Reducing
reasoning effort can result in faster responses and fewer tokens used
on reasoning in a response.
gpt-5.1defaults tonone, which does not perform reasoning. The supported reasoning values forgpt-5.1arenone,low,medium, andhigh. Tool calls are supported for all reasoning values in gpt-5.1.- All models before
gpt-5.1default tomediumreasoning effort, and do not supportnone. - The
gpt-5-promodel defaults to (and only supports)highreasoning effort. xhighis supported for all models aftergpt-5.1-codex-max.
input_messages?: AttributesUsed when sampling from a model. Dictates the structure of the messages passed into the model. Can either be a reference to a prebuilt trajectory (ie, item.input_trajectory), or a template with variable references to the item namespace.
Used when sampling from a model. Dictates the structure of the messages passed into the model. Can either be a reference to a prebuilt trajectory (ie, item.input_trajectory), or a template with variable references to the item namespace.
template?: List[Attributes]A list of chat messages forming the prompt or context. May include variable references to the item namespace, ie {{item.name}}.
A list of chat messages forming the prompt or context. May include variable references to the item namespace, ie {{item.name}}.
Text, image, or audio input to the model, used to generate a response. Can also contain previous assistant responses.
Labels an assistant message as intermediate commentary (commentary) or the final answer (final_answer).
For models like gpt-5.3-codex and beyond, when sending follow-up requests, preserve and resend
phase on all assistant messages — dropping it can degrade performance. Not used for user messages.
sampling_params?: Attributes
Constrains effort on reasoning for
reasoning models.
Currently supported values are none, minimal, low, medium, high, and xhigh. Reducing
reasoning effort can result in faster responses and fewer tokens used
on reasoning in a response.
gpt-5.1defaults tonone, which does not perform reasoning. The supported reasoning values forgpt-5.1arenone,low,medium, andhigh. Tool calls are supported for all reasoning values in gpt-5.1.- All models before
gpt-5.1default tomediumreasoning effort, and do not supportnone. - The
gpt-5-promodel defaults to (and only supports)highreasoning effort. xhighis supported for all models aftergpt-5.1-codex-max.
response_format?: AttributesAn object specifying the format that the model must output.
Setting to { "type": "json_schema", "json_schema": {...} } enables
Structured Outputs which ensures the model will match your supplied JSON
schema. Learn more in the Structured Outputs
guide.
Setting to { "type": "json_object" } enables the older JSON mode, which
ensures the message the model generates is valid JSON. Using json_schema
is preferred for models that support it.
An object specifying the format that the model must output.
Setting to { "type": "json_schema", "json_schema": {...} } enables
Structured Outputs which ensures the model will match your supplied JSON
schema. Learn more in the Structured Outputs
guide.
Setting to { "type": "json_object" } enables the older JSON mode, which
ensures the message the model generates is valid JSON. Using json_schema
is preferred for models that support it.
json_schema?: AttributesStructured Outputs configuration options, including a JSON Schema.
Structured Outputs configuration options, including a JSON Schema.
The name of the response format. Must be a-z, A-Z, 0-9, or contain underscores and dashes, with a maximum length of 64.
A description of what the response format is for, used by the model to determine how to respond in the format.
The schema for the response format, described as a JSON Schema object. Learn how to build JSON schemas here.
Whether to enable strict schema adherence when generating the output.
If set to true, the model will always follow the exact schema defined
in the schema field. Only a subset of JSON Schema is supported when
strict is true. To learn more, read the Structured Outputs
guide.
tools?: List[Attributes]A list of tools the model may call. Currently, only functions are supported as a tool. Use this to provide a list of functions the model may generate JSON inputs for. A max of 128 functions are supported.
A list of tools the model may call. Currently, only functions are supported as a tool. Use this to provide a list of functions the model may generate JSON inputs for. A max of 128 functions are supported.
function?: Attributes
The name of the function to be called. Must be a-z, A-Z, 0-9, or contain underscores and dashes, with a maximum length of 64.
A description of what the function does, used by the model to choose when and how to call the function.
The parameters the functions accepts, described as a JSON Schema object. See the guide for examples, and the JSON Schema reference for documentation about the format.
Omitting parameters defines a function with an empty parameter list.
Whether to enable strict schema adherence when generating the function call. If set to true, the model will follow the exact schema defined in the parameters field. Only a subset of JSON Schema is supported when strict is true. Learn more about Structured Outputs in the function calling guide.
A description of the function. Used by the model to determine whether or not to call the function.
filters?: AttributesA filter to apply.
A filter to apply.
Specifies the comparison operator: eq, ne, gt, gte, lt, lte, in, nin.
eq: equalsne: not equalgt: greater thangte: greater than or equallt: less thanlte: less than or equalin: innin: not in
The value to compare against the attribute key; supports string, number, or boolean types.
filters?: List[Attributes]Array of filters to combine. Items can be ComparisonFilter or CompoundFilter.
Array of filters to combine. Items can be ComparisonFilter or CompoundFilter.
The maximum number of results to return. This number should be between 1 and 50 inclusive.
ranking_options?: AttributesRanking options for search.
Ranking options for search.
High level guidance for the amount of context window space to use for the search. One of low, medium, or high. medium is the default.
user_location?: AttributesThe approximate location of the user.
The approximate location of the user.
The two-letter ISO country code of the user, e.g. US.
The IANA timezone of the user, e.g. America/Los_Angeles.
An OAuth access token that can be used with a remote MCP server, either with a custom MCP server URL or a service connector. Your application must handle the OAuth authorization flow and provide the token here.
Identifier for service connectors, like those available in ChatGPT. One of
server_url or connector_id must be provided. Learn more about service
connectors here.
Currently supported connector_id values are:
- Dropbox:
connector_dropbox - Gmail:
connector_gmail - Google Calendar:
connector_googlecalendar - Google Drive:
connector_googledrive - Microsoft Teams:
connector_microsoftteams - Outlook Calendar:
connector_outlookcalendar - Outlook Email:
connector_outlookemail - SharePoint:
connector_sharepoint
Optional HTTP headers to send to the MCP server. Use for authentication or other purposes.
require_approval?: AttributesSpecify which of the MCP server’s tools require approval.
Specify which of the MCP server’s tools require approval.
always?: AttributesA filter object to specify which tools are allowed.
A filter object to specify which tools are allowed.
Indicates whether or not a tool modifies data or is read-only. If an
MCP server is annotated with readOnlyHint,
it will match this filter.
never?: AttributesA filter object to specify which tools are allowed.
A filter object to specify which tools are allowed.
Indicates whether or not a tool modifies data or is read-only. If an
MCP server is annotated with readOnlyHint,
it will match this filter.
The code interpreter container. Can be a container ID or an object that
specifies uploaded file IDs to make available to your code, along with an
optional memory_limit setting.
Background type for the generated image. One of transparent,
opaque, or auto. Default: auto.
Control how much effort the model will exert to match the style and features, especially facial features, of input images. This parameter is only supported for gpt-image-1 and gpt-image-1.5 and later models, unsupported for gpt-image-1-mini. Supports high and low. Defaults to low.
input_image_mask?: AttributesOptional mask for inpainting. Contains image_url
(string, optional) and file_id (string, optional).
Optional mask for inpainting. Contains image_url
(string, optional) and file_id (string, optional).
The output format of the generated image. One of png, webp, or
jpeg. Default: png.
Number of partial images to generate in streaming mode, from 0 (default value) to 3.
The quality of the generated image. One of low, medium, high,
or auto. Default: auto.
The size of the generated image. One of 1024x1024, 1024x1536,
1536x1024, or auto. Default: auto.
text?: AttributesConfiguration options for a text response from the model. Can be plain
text or structured JSON data. Learn more:
Configuration options for a text response from the model. Can be plain text or structured JSON data. Learn more:
format?: AttributesAn object specifying the format that the model must output.
Configuring { "type": "json_schema" } enables Structured Outputs,
which ensures the model will match your supplied JSON schema. Learn more in the
Structured Outputs guide.
The default format is { "type": "text" } with no additional options.
Not recommended for gpt-4o and newer models:
Setting to { "type": "json_object" } enables the older JSON mode, which
ensures the message the model generates is valid JSON. Using json_schema
is preferred for models that support it.
An object specifying the format that the model must output.
Configuring { "type": "json_schema" } enables Structured Outputs,
which ensures the model will match your supplied JSON schema. Learn more in the
Structured Outputs guide.
The default format is { "type": "text" } with no additional options.
Not recommended for gpt-4o and newer models:
Setting to { "type": "json_object" } enables the older JSON mode, which
ensures the message the model generates is valid JSON. Using json_schema
is preferred for models that support it.
The name of the response format. Must be a-z, A-Z, 0-9, or contain underscores and dashes, with a maximum length of 64.
The schema for the response format, described as a JSON Schema object. Learn how to build JSON schemas here.
A description of what the response format is for, used by the model to determine how to respond in the format.
Whether to enable strict schema adherence when generating the output.
If set to true, the model will always follow the exact schema defined
in the schema field. Only a subset of JSON Schema is supported when
strict is true. To learn more, read the Structured Outputs
guide.
optional Expand Collapse
Set of 16 key-value pairs that can be attached to an object. This can be useful for storing additional information about the object in a structured format, and querying for objects via API or the dashboard.
Keys are strings with a maximum length of 64 characters. Values are strings with a maximum length of 512 characters.
computed Expand Collapse
per_testing_criteria_results: List[Attributes]Results per testing criteria applied during the evaluation run.
Results per testing criteria applied during the evaluation run.
data openai_eval_run
optional Expand Collapse
computed Expand Collapse
Set of 16 key-value pairs that can be attached to an object. This can be useful for storing additional information about the object in a structured format, and querying for objects via API or the dashboard.
Keys are strings with a maximum length of 64 characters. Values are strings with a maximum length of 512 characters.
data_source: AttributesInformation about the run’s data source.
Information about the run’s data source.
source: AttributesDetermines what populates the item namespace in the data source.
Determines what populates the item namespace in the data source.
Set of 16 key-value pairs that can be attached to an object. This can be useful for storing additional information about the object in a structured format, and querying for objects via API or the dashboard.
Keys are strings with a maximum length of 64 characters. Values are strings with a maximum length of 512 characters.
Optional string to search the ‘instructions’ field. This is a query parameter used to select responses.
Constrains effort on reasoning for
reasoning models.
Currently supported values are none, minimal, low, medium, high, and xhigh. Reducing
reasoning effort can result in faster responses and fewer tokens used
on reasoning in a response.
gpt-5.1defaults tonone, which does not perform reasoning. The supported reasoning values forgpt-5.1arenone,low,medium, andhigh. Tool calls are supported for all reasoning values in gpt-5.1.- All models before
gpt-5.1default tomediumreasoning effort, and do not supportnone. - The
gpt-5-promodel defaults to (and only supports)highreasoning effort. xhighis supported for all models aftergpt-5.1-codex-max.
input_messages: AttributesUsed when sampling from a model. Dictates the structure of the messages passed into the model. Can either be a reference to a prebuilt trajectory (ie, item.input_trajectory), or a template with variable references to the item namespace.
Used when sampling from a model. Dictates the structure of the messages passed into the model. Can either be a reference to a prebuilt trajectory (ie, item.input_trajectory), or a template with variable references to the item namespace.
template: List[Attributes]A list of chat messages forming the prompt or context. May include variable references to the item namespace, ie {{item.name}}.
A list of chat messages forming the prompt or context. May include variable references to the item namespace, ie {{item.name}}.
Text, image, or audio input to the model, used to generate a response. Can also contain previous assistant responses.
Labels an assistant message as intermediate commentary (commentary) or the final answer (final_answer).
For models like gpt-5.3-codex and beyond, when sending follow-up requests, preserve and resend
phase on all assistant messages — dropping it can degrade performance. Not used for user messages.
sampling_params: Attributes
Constrains effort on reasoning for
reasoning models.
Currently supported values are none, minimal, low, medium, high, and xhigh. Reducing
reasoning effort can result in faster responses and fewer tokens used
on reasoning in a response.
gpt-5.1defaults tonone, which does not perform reasoning. The supported reasoning values forgpt-5.1arenone,low,medium, andhigh. Tool calls are supported for all reasoning values in gpt-5.1.- All models before
gpt-5.1default tomediumreasoning effort, and do not supportnone. - The
gpt-5-promodel defaults to (and only supports)highreasoning effort. xhighis supported for all models aftergpt-5.1-codex-max.
response_format: AttributesAn object specifying the format that the model must output.
Setting to { "type": "json_schema", "json_schema": {...} } enables
Structured Outputs which ensures the model will match your supplied JSON
schema. Learn more in the Structured Outputs
guide.
Setting to { "type": "json_object" } enables the older JSON mode, which
ensures the message the model generates is valid JSON. Using json_schema
is preferred for models that support it.
An object specifying the format that the model must output.
Setting to { "type": "json_schema", "json_schema": {...} } enables
Structured Outputs which ensures the model will match your supplied JSON
schema. Learn more in the Structured Outputs
guide.
Setting to { "type": "json_object" } enables the older JSON mode, which
ensures the message the model generates is valid JSON. Using json_schema
is preferred for models that support it.
json_schema: AttributesStructured Outputs configuration options, including a JSON Schema.
Structured Outputs configuration options, including a JSON Schema.
The name of the response format. Must be a-z, A-Z, 0-9, or contain underscores and dashes, with a maximum length of 64.
A description of what the response format is for, used by the model to determine how to respond in the format.
The schema for the response format, described as a JSON Schema object. Learn how to build JSON schemas here.
Whether to enable strict schema adherence when generating the output.
If set to true, the model will always follow the exact schema defined
in the schema field. Only a subset of JSON Schema is supported when
strict is true. To learn more, read the Structured Outputs
guide.
tools: List[Attributes]A list of tools the model may call. Currently, only functions are supported as a tool. Use this to provide a list of functions the model may generate JSON inputs for. A max of 128 functions are supported.
A list of tools the model may call. Currently, only functions are supported as a tool. Use this to provide a list of functions the model may generate JSON inputs for. A max of 128 functions are supported.
function: Attributes
The name of the function to be called. Must be a-z, A-Z, 0-9, or contain underscores and dashes, with a maximum length of 64.
A description of what the function does, used by the model to choose when and how to call the function.
The parameters the functions accepts, described as a JSON Schema object. See the guide for examples, and the JSON Schema reference for documentation about the format.
Omitting parameters defines a function with an empty parameter list.
Whether to enable strict schema adherence when generating the function call. If set to true, the model will follow the exact schema defined in the parameters field. Only a subset of JSON Schema is supported when strict is true. Learn more about Structured Outputs in the function calling guide.
A description of the function. Used by the model to determine whether or not to call the function.
filters: AttributesA filter to apply.
A filter to apply.
Specifies the comparison operator: eq, ne, gt, gte, lt, lte, in, nin.
eq: equalsne: not equalgt: greater thangte: greater than or equallt: less thanlte: less than or equalin: innin: not in
The value to compare against the attribute key; supports string, number, or boolean types.
filters: List[Attributes]Array of filters to combine. Items can be ComparisonFilter or CompoundFilter.
Array of filters to combine. Items can be ComparisonFilter or CompoundFilter.
The maximum number of results to return. This number should be between 1 and 50 inclusive.
ranking_options: AttributesRanking options for search.
Ranking options for search.
High level guidance for the amount of context window space to use for the search. One of low, medium, or high. medium is the default.
user_location: AttributesThe approximate location of the user.
The approximate location of the user.
The two-letter ISO country code of the user, e.g. US.
The IANA timezone of the user, e.g. America/Los_Angeles.
An OAuth access token that can be used with a remote MCP server, either with a custom MCP server URL or a service connector. Your application must handle the OAuth authorization flow and provide the token here.
Identifier for service connectors, like those available in ChatGPT. One of
server_url or connector_id must be provided. Learn more about service
connectors here.
Currently supported connector_id values are:
- Dropbox:
connector_dropbox - Gmail:
connector_gmail - Google Calendar:
connector_googlecalendar - Google Drive:
connector_googledrive - Microsoft Teams:
connector_microsoftteams - Outlook Calendar:
connector_outlookcalendar - Outlook Email:
connector_outlookemail - SharePoint:
connector_sharepoint
Optional HTTP headers to send to the MCP server. Use for authentication or other purposes.
require_approval: AttributesSpecify which of the MCP server’s tools require approval.
Specify which of the MCP server’s tools require approval.
always: AttributesA filter object to specify which tools are allowed.
A filter object to specify which tools are allowed.
Indicates whether or not a tool modifies data or is read-only. If an
MCP server is annotated with readOnlyHint,
it will match this filter.
never: AttributesA filter object to specify which tools are allowed.
A filter object to specify which tools are allowed.
Indicates whether or not a tool modifies data or is read-only. If an
MCP server is annotated with readOnlyHint,
it will match this filter.
The code interpreter container. Can be a container ID or an object that
specifies uploaded file IDs to make available to your code, along with an
optional memory_limit setting.
Background type for the generated image. One of transparent,
opaque, or auto. Default: auto.
Control how much effort the model will exert to match the style and features, especially facial features, of input images. This parameter is only supported for gpt-image-1 and gpt-image-1.5 and later models, unsupported for gpt-image-1-mini. Supports high and low. Defaults to low.
input_image_mask: AttributesOptional mask for inpainting. Contains image_url
(string, optional) and file_id (string, optional).
Optional mask for inpainting. Contains image_url
(string, optional) and file_id (string, optional).
The output format of the generated image. One of png, webp, or
jpeg. Default: png.
Number of partial images to generate in streaming mode, from 0 (default value) to 3.
The quality of the generated image. One of low, medium, high,
or auto. Default: auto.
The size of the generated image. One of 1024x1024, 1024x1536,
1536x1024, or auto. Default: auto.
text: AttributesConfiguration options for a text response from the model. Can be plain
text or structured JSON data. Learn more:
Configuration options for a text response from the model. Can be plain text or structured JSON data. Learn more:
format: AttributesAn object specifying the format that the model must output.
Configuring { "type": "json_schema" } enables Structured Outputs,
which ensures the model will match your supplied JSON schema. Learn more in the
Structured Outputs guide.
The default format is { "type": "text" } with no additional options.
Not recommended for gpt-4o and newer models:
Setting to { "type": "json_object" } enables the older JSON mode, which
ensures the message the model generates is valid JSON. Using json_schema
is preferred for models that support it.
An object specifying the format that the model must output.
Configuring { "type": "json_schema" } enables Structured Outputs,
which ensures the model will match your supplied JSON schema. Learn more in the
Structured Outputs guide.
The default format is { "type": "text" } with no additional options.
Not recommended for gpt-4o and newer models:
Setting to { "type": "json_object" } enables the older JSON mode, which
ensures the message the model generates is valid JSON. Using json_schema
is preferred for models that support it.
The name of the response format. Must be a-z, A-Z, 0-9, or contain underscores and dashes, with a maximum length of 64.
The schema for the response format, described as a JSON Schema object. Learn how to build JSON schemas here.
A description of what the response format is for, used by the model to determine how to respond in the format.
Whether to enable strict schema adherence when generating the output.
If set to true, the model will always follow the exact schema defined
in the schema field. Only a subset of JSON Schema is supported when
strict is true. To learn more, read the Structured Outputs
guide.
per_testing_criteria_results: List[Attributes]Results per testing criteria applied during the evaluation run.
Results per testing criteria applied during the evaluation run.
data openai_eval_runs
optional Expand Collapse
computed Expand Collapse
items: List[Attributes]The items returned by the data source
The items returned by the data source
data_source: AttributesInformation about the run’s data source.
Information about the run’s data source.
source: AttributesDetermines what populates the item namespace in the data source.
Determines what populates the item namespace in the data source.
Set of 16 key-value pairs that can be attached to an object. This can be useful for storing additional information about the object in a structured format, and querying for objects via API or the dashboard.
Keys are strings with a maximum length of 64 characters. Values are strings with a maximum length of 512 characters.
Optional string to search the ‘instructions’ field. This is a query parameter used to select responses.
Constrains effort on reasoning for
reasoning models.
Currently supported values are none, minimal, low, medium, high, and xhigh. Reducing
reasoning effort can result in faster responses and fewer tokens used
on reasoning in a response.
gpt-5.1defaults tonone, which does not perform reasoning. The supported reasoning values forgpt-5.1arenone,low,medium, andhigh. Tool calls are supported for all reasoning values in gpt-5.1.- All models before
gpt-5.1default tomediumreasoning effort, and do not supportnone. - The
gpt-5-promodel defaults to (and only supports)highreasoning effort. xhighis supported for all models aftergpt-5.1-codex-max.
input_messages: AttributesUsed when sampling from a model. Dictates the structure of the messages passed into the model. Can either be a reference to a prebuilt trajectory (ie, item.input_trajectory), or a template with variable references to the item namespace.
Used when sampling from a model. Dictates the structure of the messages passed into the model. Can either be a reference to a prebuilt trajectory (ie, item.input_trajectory), or a template with variable references to the item namespace.
template: List[Attributes]A list of chat messages forming the prompt or context. May include variable references to the item namespace, ie {{item.name}}.
A list of chat messages forming the prompt or context. May include variable references to the item namespace, ie {{item.name}}.
Text, image, or audio input to the model, used to generate a response. Can also contain previous assistant responses.
Labels an assistant message as intermediate commentary (commentary) or the final answer (final_answer).
For models like gpt-5.3-codex and beyond, when sending follow-up requests, preserve and resend
phase on all assistant messages — dropping it can degrade performance. Not used for user messages.
sampling_params: Attributes
Constrains effort on reasoning for
reasoning models.
Currently supported values are none, minimal, low, medium, high, and xhigh. Reducing
reasoning effort can result in faster responses and fewer tokens used
on reasoning in a response.
gpt-5.1defaults tonone, which does not perform reasoning. The supported reasoning values forgpt-5.1arenone,low,medium, andhigh. Tool calls are supported for all reasoning values in gpt-5.1.- All models before
gpt-5.1default tomediumreasoning effort, and do not supportnone. - The
gpt-5-promodel defaults to (and only supports)highreasoning effort. xhighis supported for all models aftergpt-5.1-codex-max.
response_format: AttributesAn object specifying the format that the model must output.
Setting to { "type": "json_schema", "json_schema": {...} } enables
Structured Outputs which ensures the model will match your supplied JSON
schema. Learn more in the Structured Outputs
guide.
Setting to { "type": "json_object" } enables the older JSON mode, which
ensures the message the model generates is valid JSON. Using json_schema
is preferred for models that support it.
An object specifying the format that the model must output.
Setting to { "type": "json_schema", "json_schema": {...} } enables
Structured Outputs which ensures the model will match your supplied JSON
schema. Learn more in the Structured Outputs
guide.
Setting to { "type": "json_object" } enables the older JSON mode, which
ensures the message the model generates is valid JSON. Using json_schema
is preferred for models that support it.
json_schema: AttributesStructured Outputs configuration options, including a JSON Schema.
Structured Outputs configuration options, including a JSON Schema.
The name of the response format. Must be a-z, A-Z, 0-9, or contain underscores and dashes, with a maximum length of 64.
A description of what the response format is for, used by the model to determine how to respond in the format.
The schema for the response format, described as a JSON Schema object. Learn how to build JSON schemas here.
Whether to enable strict schema adherence when generating the output.
If set to true, the model will always follow the exact schema defined
in the schema field. Only a subset of JSON Schema is supported when
strict is true. To learn more, read the Structured Outputs
guide.
tools: List[Attributes]A list of tools the model may call. Currently, only functions are supported as a tool. Use this to provide a list of functions the model may generate JSON inputs for. A max of 128 functions are supported.
A list of tools the model may call. Currently, only functions are supported as a tool. Use this to provide a list of functions the model may generate JSON inputs for. A max of 128 functions are supported.
function: Attributes
The name of the function to be called. Must be a-z, A-Z, 0-9, or contain underscores and dashes, with a maximum length of 64.
A description of what the function does, used by the model to choose when and how to call the function.
The parameters the functions accepts, described as a JSON Schema object. See the guide for examples, and the JSON Schema reference for documentation about the format.
Omitting parameters defines a function with an empty parameter list.
Whether to enable strict schema adherence when generating the function call. If set to true, the model will follow the exact schema defined in the parameters field. Only a subset of JSON Schema is supported when strict is true. Learn more about Structured Outputs in the function calling guide.
A description of the function. Used by the model to determine whether or not to call the function.
filters: AttributesA filter to apply.
A filter to apply.
Specifies the comparison operator: eq, ne, gt, gte, lt, lte, in, nin.
eq: equalsne: not equalgt: greater thangte: greater than or equallt: less thanlte: less than or equalin: innin: not in
The value to compare against the attribute key; supports string, number, or boolean types.
filters: List[Attributes]Array of filters to combine. Items can be ComparisonFilter or CompoundFilter.
Array of filters to combine. Items can be ComparisonFilter or CompoundFilter.
The maximum number of results to return. This number should be between 1 and 50 inclusive.
ranking_options: AttributesRanking options for search.
Ranking options for search.
High level guidance for the amount of context window space to use for the search. One of low, medium, or high. medium is the default.
user_location: AttributesThe approximate location of the user.
The approximate location of the user.
The two-letter ISO country code of the user, e.g. US.
The IANA timezone of the user, e.g. America/Los_Angeles.
An OAuth access token that can be used with a remote MCP server, either with a custom MCP server URL or a service connector. Your application must handle the OAuth authorization flow and provide the token here.
Identifier for service connectors, like those available in ChatGPT. One of
server_url or connector_id must be provided. Learn more about service
connectors here.
Currently supported connector_id values are:
- Dropbox:
connector_dropbox - Gmail:
connector_gmail - Google Calendar:
connector_googlecalendar - Google Drive:
connector_googledrive - Microsoft Teams:
connector_microsoftteams - Outlook Calendar:
connector_outlookcalendar - Outlook Email:
connector_outlookemail - SharePoint:
connector_sharepoint
Optional HTTP headers to send to the MCP server. Use for authentication or other purposes.
require_approval: AttributesSpecify which of the MCP server’s tools require approval.
Specify which of the MCP server’s tools require approval.
always: AttributesA filter object to specify which tools are allowed.
A filter object to specify which tools are allowed.
Indicates whether or not a tool modifies data or is read-only. If an
MCP server is annotated with readOnlyHint,
it will match this filter.
never: AttributesA filter object to specify which tools are allowed.
A filter object to specify which tools are allowed.
Indicates whether or not a tool modifies data or is read-only. If an
MCP server is annotated with readOnlyHint,
it will match this filter.
The code interpreter container. Can be a container ID or an object that
specifies uploaded file IDs to make available to your code, along with an
optional memory_limit setting.
Background type for the generated image. One of transparent,
opaque, or auto. Default: auto.
Control how much effort the model will exert to match the style and features, especially facial features, of input images. This parameter is only supported for gpt-image-1 and gpt-image-1.5 and later models, unsupported for gpt-image-1-mini. Supports high and low. Defaults to low.
input_image_mask: AttributesOptional mask for inpainting. Contains image_url
(string, optional) and file_id (string, optional).
Optional mask for inpainting. Contains image_url
(string, optional) and file_id (string, optional).
The output format of the generated image. One of png, webp, or
jpeg. Default: png.
Number of partial images to generate in streaming mode, from 0 (default value) to 3.
The quality of the generated image. One of low, medium, high,
or auto. Default: auto.
The size of the generated image. One of 1024x1024, 1024x1536,
1536x1024, or auto. Default: auto.
text: AttributesConfiguration options for a text response from the model. Can be plain
text or structured JSON data. Learn more:
Configuration options for a text response from the model. Can be plain text or structured JSON data. Learn more:
format: AttributesAn object specifying the format that the model must output.
Configuring { "type": "json_schema" } enables Structured Outputs,
which ensures the model will match your supplied JSON schema. Learn more in the
Structured Outputs guide.
The default format is { "type": "text" } with no additional options.
Not recommended for gpt-4o and newer models:
Setting to { "type": "json_object" } enables the older JSON mode, which
ensures the message the model generates is valid JSON. Using json_schema
is preferred for models that support it.
An object specifying the format that the model must output.
Configuring { "type": "json_schema" } enables Structured Outputs,
which ensures the model will match your supplied JSON schema. Learn more in the
Structured Outputs guide.
The default format is { "type": "text" } with no additional options.
Not recommended for gpt-4o and newer models:
Setting to { "type": "json_object" } enables the older JSON mode, which
ensures the message the model generates is valid JSON. Using json_schema
is preferred for models that support it.
The name of the response format. Must be a-z, A-Z, 0-9, or contain underscores and dashes, with a maximum length of 64.
The schema for the response format, described as a JSON Schema object. Learn how to build JSON schemas here.
A description of what the response format is for, used by the model to determine how to respond in the format.
Whether to enable strict schema adherence when generating the output.
If set to true, the model will always follow the exact schema defined
in the schema field. Only a subset of JSON Schema is supported when
strict is true. To learn more, read the Structured Outputs
guide.
Set of 16 key-value pairs that can be attached to an object. This can be useful for storing additional information about the object in a structured format, and querying for objects via API or the dashboard.
Keys are strings with a maximum length of 64 characters. Values are strings with a maximum length of 512 characters.
per_testing_criteria_results: List[Attributes]Results per testing criteria applied during the evaluation run.
Results per testing criteria applied during the evaluation run.
EvalsRunsOutput Items
Manage and run evals in the OpenAI platform.
data openai_eval_run_output_item
computed Expand Collapse
data openai_eval_run_output_items
optional Expand Collapse
computed Expand Collapse
items: List[Attributes]The items returned by the data source
The items returned by the data source