Translation server events

These are events emitted from the OpenAI Realtime Translation WebSocket server to the client.

error

Returned when an error occurs, which could be a client problem or a server problem. Most errors are recoverable and the session will stay open. We recommend that implementors monitor and log error messages by default.

error: RealtimeError { message, type, code, 2 more }

Details of the error.

message: string

A human-readable error message.

type: string

The type of error (e.g., "invalid_request_error", "server_error").

code: optional string

Error code, if any.

event_id: optional string

The event_id of the client event that caused the error, if applicable.

param: optional string

Parameter related to the error, if any.

event_id: string

The unique ID of the server event.

type: "error"

The event type, must be error.
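As an illustration of the logging recommendation above, the following is a minimal sketch of an error-event handler. It assumes the event arrives as a JSON text frame; the function name `handle_error_event` and the logger name are hypothetical, and only the fields documented above are read.

```python
import json
import logging
from typing import Optional

logger = logging.getLogger("translation_client")  # hypothetical logger name


def handle_error_event(raw: str) -> Optional[dict]:
    """Log a server `error` event and return its details.

    Returns None when the frame is not an `error` event.
    """
    event = json.loads(raw)
    if event.get("type") != "error":
        return None

    err = event.get("error", {})
    # `code`, `param`, and the inner `event_id` are optional, so use .get().
    logger.warning(
        "server error %s (code=%s, param=%s, client_event_id=%s): %s",
        err.get("type"),
        err.get("code"),
        err.get("param"),
        err.get("event_id"),
        err.get("message"),
    )
    return err
```

Because most errors leave the session open, this sketch only logs and returns the details; whether to retry or close the connection is left to the caller.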

session.created

Returned when a translation session is created. It is emitted automatically as the first server event when a new connection is established, and it contains the default translation session configuration.

event_id: string

The unique ID of the server event.

session: RealtimeTranslationSession { id, audio, expires_at, 2 more }

The translation session configuration.

id: string

Unique identifier for the session that looks like sess_1234567890abcdef.

audio: object { input, output }

Configuration for translation input and output audio.

input: optional object { noise_reduction, transcription }
noise_reduction: optional object { type }

Optional input noise reduction.

Type of noise reduction. near_field is for close-talking microphones such as headphones, far_field is for far-field microphones such as laptop or conference room microphones.

One of the following:
"near_field"
"far_field"
transcription: optional object { model }

Optional source-language transcription. When configured, the server emits session.input_transcript.delta events. Translation itself still runs from the input audio stream.

model: string

The transcription model used for source transcript deltas.

output: optional object { language }
language: optional string

Target language for translated output audio and transcript deltas.

expires_at: number

Expiration timestamp for the session, as a Unix timestamp (seconds since epoch).

model: string

The Realtime translation model used for this session. This field is set at session creation and cannot be changed with session.update.

type: "translation"

The session type. Always translation for Realtime translation sessions.

type: "session.created"

The event type, must be session.created.
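Since expires_at is expressed in seconds since epoch, a client may want to convert it before scheduling a reconnect. The sketch below assumes the session.created payload has already been parsed into a dict; both helper names are hypothetical.

```python
from datetime import datetime, timezone


def expiry_as_datetime(session: dict) -> datetime:
    """Convert session.expires_at (seconds since epoch) to an aware UTC datetime."""
    return datetime.fromtimestamp(session["expires_at"], tz=timezone.utc)


def seconds_until_expiry(session: dict, now: float) -> float:
    """Seconds remaining before the session expires, given `now` in epoch seconds."""
    return session["expires_at"] - now
```

In practice `now` would typically come from `time.time()`; it is passed in here so the helper stays easy to test.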

session.updated

Returned when a translation session is updated with a session.update event, unless there is an error.

event_id: string

The unique ID of the server event.

session: RealtimeTranslationSession { id, audio, expires_at, 2 more }

The translation session configuration.

id: string

Unique identifier for the session that looks like sess_1234567890abcdef.

audio: object { input, output }

Configuration for translation input and output audio.

input: optional object { noise_reduction, transcription }
noise_reduction: optional object { type }

Optional input noise reduction.

Type of noise reduction. near_field is for close-talking microphones such as headphones, far_field is for far-field microphones such as laptop or conference room microphones.

One of the following:
"near_field"
"far_field"
transcription: optional object { model }

Optional source-language transcription. When configured, the server emits session.input_transcript.delta events. Translation itself still runs from the input audio stream.

model: string

The transcription model used for source transcript deltas.

output: optional object { language }
language: optional string

Target language for translated output audio and transcript deltas.

expires_at: number

Expiration timestamp for the session, as a Unix timestamp (seconds since epoch).

model: string

The Realtime translation model used for this session. This field is set at session creation and cannot be changed with session.update.

type: "translation"

The session type. Always translation for Realtime translation sessions.

type: "session.updated"

The event type, must be session.updated.

session.closed

Returned when a Realtime translation session is closed.

event_id: string

The unique ID of the server event.

type: "session.closed"

The event type, must be session.closed.

session.input_transcript.delta

Returned when optional source-language transcript text is available. This event is emitted only when audio.input.transcription is configured.

Transcript deltas are append-only text fragments. Clients should not insert unconditional spaces between deltas.

delta: string

Append-only source-language transcript text.

event_id: string

The unique ID of the server event.

type: "session.input_transcript.delta"

The event type, must be session.input_transcript.delta.

elapsed_ms: optional number

Timing metadata for stream alignment, derived from the translation frame when available. It advances in 200 ms increments, but multiple transcript deltas may share the same elapsed_ms. Treat it as alignment metadata, not a unique transcript-delta identifier.
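Because transcript deltas are append-only fragments, one way to rebuild the running text is to concatenate them exactly as received, with no separator. The following is a minimal sketch under that reading; the class name `TranscriptCollector` and the stream labels "source" and "translated" are hypothetical.

```python
from typing import Dict, List

# Event types from this reference that carry append-only transcript text,
# mapped to hypothetical labels used only by this sketch.
TRANSCRIPT_EVENTS = {
    "session.input_transcript.delta": "source",
    "session.output_transcript.delta": "translated",
}


class TranscriptCollector:
    """Accumulates transcript deltas without inserting any separators."""

    def __init__(self) -> None:
        self._parts: Dict[str, List[str]] = {"source": [], "translated": []}

    def feed(self, event: dict) -> None:
        stream = TRANSCRIPT_EVENTS.get(event.get("type"))
        if stream is None:
            return  # not a transcript delta; ignore
        # Append the fragment verbatim: any spacing is already part of the delta.
        self._parts[stream].append(event["delta"])

    def text(self, stream: str) -> str:
        # Join with the empty string, never with " ".
        return "".join(self._parts[stream])
```

Joining with an empty string keeps word fragments such as "Bon" + "jour" intact, which an unconditional space would break.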

session.output_transcript.delta

Returned when translated transcript text is available.

Transcript deltas are append-only text fragments. Clients should not insert unconditional spaces between deltas.

delta: string

Append-only transcript text for the translated output audio.

event_id: string

The unique ID of the server event.

type: "session.output_transcript.delta"

The event type, must be session.output_transcript.delta.

elapsed_ms: optional number

Timing metadata for stream alignment, derived from the translation frame when available. It advances in 200 ms increments, but multiple transcript deltas may share the same elapsed_ms. Treat it as alignment metadata, not a unique transcript-delta identifier.
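Since several transcript deltas may share one elapsed_ms value, a client that aligns text with audio can group deltas by that value rather than keying on it as a unique ID. The sketch below is one possible approach; the function name is hypothetical, and deltas without elapsed_ms are collected under the key None.

```python
from collections import defaultdict
from typing import Dict, Iterable, Optional


def group_output_deltas_by_elapsed_ms(events: Iterable[dict]) -> Dict[Optional[int], str]:
    """Concatenate translated transcript deltas that share the same elapsed_ms."""
    grouped: Dict[Optional[int], str] = defaultdict(str)
    for event in events:
        if event.get("type") != "session.output_transcript.delta":
            continue
        # elapsed_ms is optional; .get() yields None when it is absent.
        grouped[event.get("elapsed_ms")] += event["delta"]
    return dict(grouped)
```

Each resulting entry holds all text that belongs to one alignment point, in arrival order.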

session.output_audio.delta

Returned when translated output audio is available. Output audio deltas are 200 ms frames of PCM16 audio.

delta: string

Base64-encoded translated audio data.

event_id: string

The unique ID of the server event.

type: "session.output_audio.delta"

The event type, must be session.output_audio.delta.

channels: optional number

Number of audio channels.

elapsed_ms: optional number

Timing metadata for stream alignment, derived from the translation frame when available. Treat elapsed_ms as alignment metadata, not a unique event identifier.

format: optional "pcm16"

Audio encoding for delta.

sample_rate: optional number

Sample rate of the audio delta.
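To make the audio fields concrete, here is a minimal sketch that decodes a session.output_audio.delta payload and checks its duration. It assumes PCM16 means 2 bytes per sample and that a missing format field can be treated as "pcm16"; the function names are hypothetical, and the sample rate must come from the event or from your own configuration.

```python
import base64

BYTES_PER_PCM16_SAMPLE = 2  # 16-bit samples


def decode_audio_delta(event: dict) -> bytes:
    """Return the raw PCM16 bytes carried by a session.output_audio.delta event."""
    if event.get("type") != "session.output_audio.delta":
        raise ValueError("not an output audio delta")
    # Assumption: an absent `format` is treated as pcm16.
    if event.get("format", "pcm16") != "pcm16":
        raise ValueError("unsupported audio format")
    return base64.b64decode(event["delta"])


def frame_duration_ms(pcm: bytes, sample_rate: int, channels: int = 1) -> float:
    """Duration of a PCM16 buffer in milliseconds."""
    samples_per_channel = len(pcm) / (BYTES_PER_PCM16_SAMPLE * channels)
    return 1000 * samples_per_channel / sample_rate
```

For example, a mono frame at an illustrative 24000 Hz sample rate holds 4800 samples (9600 bytes) for 200 ms, so `frame_duration_ms` can serve as a sanity check on incoming frames.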