Overview
To support long-running interactions, you can use compaction to reduce context size while preserving state needed for subsequent turns.
Compaction helps you balance quality, cost, and latency as conversations grow.
Server-side compaction
You can enable server-side compaction in a Responses create request
(POST /responses or client.responses.create) by setting
context_management with compact_threshold.
- When the rendered token count crosses the configured threshold, the server runs server-side compaction.
- No separate /responses/compact call is required in this mode.
- The response stream includes the encrypted compaction item.
- ZDR note: server-side compaction is ZDR-friendly when you set store=false on your Responses create requests.
The returned compaction item carries forward key prior state and reasoning into the next run using fewer tokens. It is opaque and not intended to be human-interpretable.
For stateless input-array chaining, append output items as usual. If you are
using previous_response_id, pass only the new user message each turn. In both
cases, the compaction item carries context needed for the next window.
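As a sketch of the previous_response_id pattern (the input-array pattern appears in the example flow below), a single turn might look like the following; prev_id is a hypothetical variable you carry across turns, and get_next_user_input stands in for your own input source:
# Hypothetical sketch: chain turns by response ID instead of replaying items.
response = client.responses.create(
    model="gpt-5.2-codex",
    previous_response_id=prev_id,  # ID returned by the previous turn
    input=[
        {
            "type": "message",
            "role": "user",
            "content": get_next_user_input(),  # your own input source
        }
    ],
    context_management=[{"type": "compaction", "compact_threshold": 1000}],
)
prev_id = response.id  # carry the ID forward for the next turn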
Latency tip: After appending output items to the previous input items, you can
drop items that came before the most recent compaction item to keep requests
smaller and reduce long-tail latency. The latest compaction item carries the
necessary context to continue the conversation. If you use
previous_response_id chaining, do not manually prune.
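A minimal sketch of that pruning, assuming the transcript is a list of dicts and that compaction items surface with type "compaction" (verify the exact item type your SDK emits):
def prune_before_last_compaction(items):
    # Hypothetical helper: keeps the most recent compaction item and
    # everything after it. Assumes items are dicts with a "type" field
    # and that compaction items use type == "compaction".
    last = None
    for i, item in enumerate(items):
        if item.get("type") == "compaction":
            last = i
    if last is None:
        return items  # no compaction yet; keep the full window
    return items[last:]  # the latest compaction item carries prior context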
User journey
- Call /responses as usual, but include context_management with compact_threshold to enable server-side compaction.
- As the response streams, if the context size crosses the threshold, the server triggers a compaction pass, emits a compaction output item in the same stream, and prunes context before continuing inference.
- Continue your loop with one pattern: stateless input-array chaining (append output, including compaction items, to your next input array) or previous_response_id chaining (pass only the new user message each turn and carry that ID forward).
Example user flow
from openai import OpenAI  # assumes the official OpenAI Python SDK

client = OpenAI()

conversation = [
    {
        "type": "message",
        "role": "user",
        "content": "Let's begin a long coding task.",
    }
]

while keep_going:  # your own loop condition
    response = client.responses.create(
        model="gpt-5.2-codex",
        input=conversation,
        store=False,
        context_management=[{"type": "compaction", "compact_threshold": 1000}],
    )
    # Stateless chaining: append all output items, including any compaction item.
    conversation.extend(response.output)
    conversation.append(
        {
            "type": "message",
            "role": "user",
            "content": get_next_user_input(),  # your own input source
        }
    )

Standalone compact endpoint
For explicit control over when compaction runs, use the standalone compact endpoint in long-running workflows.
The endpoint is fully stateless and ZDR-friendly.
You send a full context window (messages, tools, and other items), and the
endpoint returns a new compacted context window you can pass to your next
/responses call.
The returned compacted window includes an encrypted compaction item that carries forward key prior state and reasoning using fewer tokens. It is opaque and not intended to be human-interpretable.
Note: the compacted window generally contains more than just the compaction item. It can also include retained items from the previous window.
Output handling: do not prune /responses/compact output. The returned window
is the canonical next context window, so pass it into your next /responses
call as-is.
User journey for standalone compaction
- Use /responses normally, sending input items that include user messages, assistant outputs, and tool interactions.
- When your context window grows large, call /responses/compact to generate a new compacted context window. The window you send to /responses/compact must still fit within your model’s context window.
- For subsequent /responses calls, pass the returned compacted window (including the compaction item) as input instead of the full transcript.
Example user flow
# Full window collected from prior turns
long_input_items_array = [...]

# 1) Compact the current window
compacted = client.responses.compact(
    model="gpt-5.2",
    input=long_input_items_array,
)

# 2) Start the next turn by appending a new user message
next_input = [
    *compacted.output,  # Use compact output as-is
    {
        "type": "message",
        "role": "user",
        "content": user_input_message(),
    },
]

next_response = client.responses.create(
    model="gpt-5.2",
    input=next_input,
    store=False,  # Keep the flow ZDR-friendly
)