Completions

Create completion
POST /completions
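
A minimal request sketch, assuming the official OpenAI Python SDK (openai v1.x) and a completions-capable model such as gpt-3.5-turbo-instruct; the client reads the API key from the OPENAI_API_KEY environment variable.

    from openai import OpenAI

    client = OpenAI()  # uses the OPENAI_API_KEY environment variable

    # Create a completion for a short prompt.
    response = client.completions.create(
        model="gpt-3.5-turbo-instruct",
        prompt="Say this is a test.",
        max_tokens=16,
    )
    print(response.choices[0].text)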
Models
Completion = object { id, choices, created, model, object, system_fingerprint, usage }

Represents a completion response from the API. Note: both the streamed and non-streamed response objects share the same shape (unlike the chat endpoint).

id: string

A unique identifier for the completion.

choices: array of CompletionChoice { finish_reason, index, logprobs, text }

The list of completion choices the model generated for the input prompt.

finish_reason: "stop" or "length" or "content_filter"

The reason the model stopped generating tokens. This will be stop if the model hit a natural stop point or a provided stop sequence, length if the maximum number of tokens specified in the request was reached, or content_filter if content was omitted due to a flag from our content filters.

Accepts one of the following:
"stop"
"length"
"content_filter"
index: number
logprobs: object { text_offset, token_logprobs, tokens, top_logprobs }
text_offset: optional array of number
token_logprobs: optional array of number
tokens: optional array of string
top_logprobs: optional array of map[number]
text: string
created: number

The Unix timestamp (in seconds) of when the completion was created.

model: string

The model used for completion.

object: "text_completion"

The object type, which is always "text_completion".

system_fingerprint: optional string

This fingerprint represents the backend configuration that the model runs with.

Can be used in conjunction with the seed request parameter to understand when backend changes have been made that might impact determinism.

usage: optional CompletionUsage { completion_tokens, prompt_tokens, total_tokens, completion_tokens_details, prompt_tokens_details }

Usage statistics for the completion request.
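
To make the field layout concrete, here is a sketch of reading a Completion object, again assuming the openai Python SDK (v1.x); attribute names mirror the schema above, and the seed request parameter is included only to illustrate its pairing with system_fingerprint.

    completion = client.completions.create(
        model="gpt-3.5-turbo-instruct",
        prompt="Write a haiku about autumn.",
        max_tokens=32,
        seed=1234,  # optional; compare system_fingerprint across runs to spot backend changes
    )

    print(completion.id)                  # unique identifier for the completion
    print(completion.created)             # Unix timestamp (seconds)
    print(completion.model)               # model used for the completion
    print(completion.object)              # always "text_completion"
    print(completion.system_fingerprint)  # backend configuration fingerprint (may be None)
    for choice in completion.choices:
        print(choice.index, choice.finish_reason, choice.text)
    if completion.usage is not None:
        print(completion.usage.total_tokens)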

CompletionChoice = object { finish_reason, index, logprobs, text }
finish_reason: "stop" or "length" or "content_filter"

The reason the model stopped generating tokens. This will be stop if the model hit a natural stop point or a provided stop sequence, length if the maximum number of tokens specified in the request was reached, or content_filter if content was omitted due to a flag from our content filters.

Accepts one of the following:
"stop"
"length"
"content_filter"
index: number
logprobs: object { text_offset, token_logprobs, tokens, top_logprobs }
text_offset: optional array of number
token_logprobs: optional array of number
tokens: optional array of string
top_logprobs: optional array of map[number]
text: string
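
A sketch of requesting and reading per-token log probabilities for a choice, assuming the openai Python SDK (v1.x). For this endpoint the logprobs request parameter is an integer giving the number of most-likely alternatives to include per token; the response fields are the optional arrays listed above.

    completion = client.completions.create(
        model="gpt-3.5-turbo-instruct",
        prompt="The capital of France is",
        max_tokens=5,
        logprobs=3,
    )

    choice = completion.choices[0]
    if choice.finish_reason == "length":
        print("generation stopped at max_tokens")

    lp = choice.logprobs
    if lp is not None and lp.tokens and lp.token_logprobs and lp.text_offset:
        # tokens, token_logprobs and text_offset are parallel arrays
        for token, logprob, offset in zip(lp.tokens, lp.token_logprobs, lp.text_offset):
            print(offset, repr(token), logprob)
        # top_logprobs holds one map per position: candidate token -> log probability
        if lp.top_logprobs:
            print(lp.top_logprobs[0])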
CompletionUsage = object { completion_tokens, prompt_tokens, total_tokens, completion_tokens_details, prompt_tokens_details }

Usage statistics for the completion request.

completion_tokens: number

Number of tokens in the generated completion.

prompt_tokens: number

Number of tokens in the prompt.

total_tokens: number

Total number of tokens used in the request (prompt + completion).

completion_tokens_details: optional object { accepted_prediction_tokens, audio_tokens, reasoning_tokens, rejected_prediction_tokens }

Breakdown of tokens used in a completion.

accepted_prediction_tokens: optional number

When using Predicted Outputs, the number of tokens in the prediction that appeared in the completion.

audio_tokens: optional number

Audio tokens generated by the model.

reasoning_tokens: optional number

Tokens generated by the model for reasoning.

rejected_prediction_tokens: optional number

When using Predicted Outputs, the number of tokens in the prediction that did not appear in the completion. However, like reasoning tokens, these tokens are still counted in the total completion tokens for purposes of billing, output, and context window limits.

prompt_tokens_details: optional object { audio_tokens, cached_tokens }

Breakdown of tokens used in the prompt.

audio_tokens: optional number

Audio input tokens present in the prompt.

cached_tokens: optional number

Cached tokens present in the prompt.
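
A sketch of inspecting the usage breakdown on a returned completion, assuming the openai Python SDK (v1.x). The detail objects and most of their fields are optional, so each access is guarded.

    usage = completion.usage
    if usage is not None:
        print("prompt tokens:    ", usage.prompt_tokens)
        print("completion tokens:", usage.completion_tokens)
        print("total tokens:     ", usage.total_tokens)

        out_details = usage.completion_tokens_details
        if out_details is not None:
            print("reasoning tokens:          ", out_details.reasoning_tokens)
            print("accepted prediction tokens:", out_details.accepted_prediction_tokens)
            print("rejected prediction tokens:", out_details.rejected_prediction_tokens)

        prompt_details = usage.prompt_tokens_details
        if prompt_details is not None:
            print("cached prompt tokens:", prompt_details.cached_tokens)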