The shell tool allows the model to interact with your local computer through a controlled command-line interface. The model proposes shell commands; your integration executes them and returns the outputs. This creates a simple plan-execute loop that lets models inspect the system, run utilities, and gather data until they can finish the task.
Shell is available through the Responses API for use with GPT-5.1, GPT-5.2, GPT-5.1-codex, and GPT-5.2-codex. It is not available via the Chat Completions API.
Running arbitrary shell commands can be dangerous. Always sandbox execution or add strict allow/deny lists before forwarding a command to the system shell.
See Codex CLI for a reference implementation.
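As one illustration of the allow/deny idea, a pre-execution filter can reject any command that mentions a known-dangerous program before it ever reaches the system shell. This is a minimal sketch, not a complete security boundary: the deny-list contents and the `is_allowed` helper are hypothetical, and a real integration should combine this with sandboxing.

```python
import shlex

# Hypothetical deny-list of program names considered too risky to forward.
DENIED = {"rm", "curl", "wget", "ssh", "sudo", "dd", "mkfs"}


def is_allowed(command: str) -> bool:
    """Return False if any token in the command matches the deny-list.

    Tokenizing with shlex catches denied names anywhere in a pipeline
    or chained command, not just at the start.
    """
    try:
        tokens = shlex.split(command)
    except ValueError:
        return False  # reject commands that cannot be parsed
    return bool(tokens) and not any(tok in DENIED for tok in tokens)
```

A deny-list like this only blocks the obvious cases (it will not catch `rm` hidden behind an alias or a script), which is why sandboxed execution remains the primary defense.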
When to use
- Automating filesystem or process diagnostics — for example, “find the largest PDF under ~/Documents” or “show running gunicorn processes.”
- Extending the model’s capabilities — using built-in UNIX utilities, the Python runtime, and other CLIs in your environment.
- Running multi-step build and test flows — chaining commands like `pip install` and `pytest`.
- Complex agentic coding workflows — using other tools such as `apply_patch` to complete workflows that involve complex file operations.
Use the shell tool with the Responses API

1. Add `shell` as a tool to your request:

   - Include the `shell` tool in the `tools` param: `tools=[{"type": "shell"}]`.
   - You can add other tools to the `tools` list; `shell` can be used alongside them.
   - It’s helpful to specify the shell environment, so the model can provide compatible commands.

   Call the shell tool:

   ```python
   from openai import OpenAI

   client = OpenAI()

   response = client.responses.create(
       model="gpt-5.1",
       instructions="The local bash shell environment is on Mac.",
       input="find me the largest pdf file in ~/Documents",
       tools=[{"type": "shell"}],
   )
   print(response)
   ```
2. Handle the `shell_call` output items.

   When the model decides to use the `shell` tool, the response includes one or more `shell_call` output items. Each `shell_call` output item has the following fields:

   - `call_id`: generated by the API; pass it back in `shell_call_output` in the next request.
   - `status`: indicates whether the tool call is `completed` or `in_progress` (in streaming mode).
   - `action`: contains the information needed for shell tool execution:
     - `commands`: a list of shell commands that developers can execute concurrently in the local computer environment.
     - `timeout_ms`: an optional timeout that should be honored during shell execution.
     - `max_output_length`: used to truncate the output when it is too large. Developers need to pass it back with the raw output in `shell_call_output`, but do not need to do the actual truncation.
   Example `shell_call` output:

   ```json
   {
     "type": "shell_call",
     "call_id": "...",
     "action": {
       "commands": ["ls -l"],
       "timeout_ms": 120000,
       "max_output_length": 4096
     },
     "status": "in_progress"
   }
   ```
3. Execute the shell commands.

   Run the shell tool inside a sandboxed environment with security guards: dedicated containers, minimum privileges, filesystem allowlists, and audit logging are all recommended.

   For conversations with multiple requests, run the commands inside a persistent shell session, capturing stdout, stderr, and the exit code.

   One response can contain multiple `shell_call` items, and one `shell_call` can contain a list of commands. The commands can all be executed concurrently. Below is an example shell command executor.
   Important: running arbitrary shell commands can be dangerous. Always sandbox execution or add strict allow/deny lists before forwarding a command to the system shell. The example below does not perform any sandboxing or security checks. See Codex CLI for a reference implementation.
   Shell executor example:

   ```python
   import subprocess
   from dataclasses import dataclass


   @dataclass
   class CmdResult:
       stdout: str
       stderr: str
       exit_code: int | None
       timed_out: bool


   class ShellExecutor:
       def __init__(self, default_timeout: float = 60):
           self.default_timeout = default_timeout

       def run(self, cmd: str, timeout: float | None = None) -> CmdResult:
           t = timeout or self.default_timeout
           p = subprocess.Popen(
               cmd,
               shell=True,
               stdout=subprocess.PIPE,
               stderr=subprocess.PIPE,
               text=True,
           )
           try:
               out, err = p.communicate(timeout=t)
               return CmdResult(out, err, p.returncode, False)
           except subprocess.TimeoutExpired:
               p.kill()
               out, err = p.communicate()
               return CmdResult(out, err, p.returncode, True)
   ```
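The executor above starts a fresh process for every command, so state such as the working directory is lost between calls. For the persistent-session approach mentioned earlier, one minimal sketch is to keep a single long-lived `bash` process and delimit each command's output with a unique marker. The marker protocol here is an illustration, not part of the API, and it assumes commands end their output with a newline; stderr is merged into stdout for simplicity.

```python
import subprocess
import uuid


class PersistentShell:
    """Keep one long-lived bash process so state (cwd, env vars)
    persists across commands within a conversation."""

    def __init__(self):
        self.proc = subprocess.Popen(
            ["bash"],
            stdin=subprocess.PIPE,
            stdout=subprocess.PIPE,
            stderr=subprocess.STDOUT,  # merge stderr into stdout for simplicity
            text=True,
            bufsize=1,
        )

    def run(self, cmd: str) -> tuple[str, int]:
        # A unique marker line signals that the command finished and
        # carries its exit code.
        marker = f"__DONE_{uuid.uuid4().hex}__"
        self.proc.stdin.write(f"{cmd}\nprintf '%s %s\\n' {marker} $?\n")
        self.proc.stdin.flush()
        lines = []
        for line in self.proc.stdout:
            if line.startswith(marker):
                exit_code = int(line.split()[1])
                return "".join(lines), exit_code
            lines.append(line)
        raise RuntimeError("shell process exited unexpectedly")
```

Because the same process handles every command, `cd` or exported variables in one `shell_call` remain in effect for the next one, which matches what the model expects in multi-request conversations.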
4. Send the shell tool response back to the model.

   Package one or more shell execution outputs in `shell_call_output` and send it back as input items in the next model request. Each shell execution output should have:

   - `stdout`: the raw stdout.
   - `stderr`: the raw stderr.
   - `outcome`: the outcome of the shell execution:
     - If the command timed out: `{"type": "timeout"}`
     - If the command completed: `{"type": "exit", "exit_code": EXIT_CODE}`

   If the `shell_call` has `max_output_length`, always pass it back in `shell_call_output`.
   Example `shell_call_output` payload:

   ```json
   {
     "type": "shell_call_output",
     "call_id": "...",
     "max_output_length": 4096,
     "output": [
       {
         "stdout": "...",
         "stderr": "...",
         "outcome": { "type": "exit", "exit_code": 0 }
       },
       {
         "stdout": "...",
         "stderr": "...",
         "outcome": { "type": "timeout" }
       }
     ]
   }
   ```
Use the shell tool with the Agents SDK

Alternatively, you can use the Agents SDK to use the shell tool. You still have to implement your own shell executor to provide to the tool, but the Agents SDK handles the lifecycle of the execution.
```typescript
import {
  Agent,
  run,
  withTrace,
  Shell,
  ShellAction,
  ShellResult,
  shellTool,
} from '@openai/agents';

/**
 * Implement this with your own shell execution environment.
 */
class LocalShell implements Shell {
  async run(action: ShellAction): Promise<ShellResult> {
    // Implement your own shell execution here
    return {
      output: [
        {
          stdout: 'Shell is not available. Needs to be implemented first.',
          stderr: '',
          outcome: {
            type: 'exit',
            exitCode: 1,
          },
        },
      ],
      maxOutputLength: action.maxOutputLength,
    };
  }
}

const shell = new LocalShell();

const agent = new Agent({
  name: 'Shell Assistant',
  model: 'gpt-5.1',
  instructions:
    'You can execute shell commands to inspect the repository. Keep responses concise and include command output when helpful.',
  tools: [
    shellTool({
      shell,
      // could also be a function for you to determine if approval is needed
      needsApproval: true,
      onApproval: async (_ctx, _approvalItem) => {
        // Implement your own approval logic
        return { approve: true };
      },
    }),
  ],
});

await withTrace('shell-tool-example', async () => {
  const result = await run(agent, 'Show the Node.js version.');
  console.log(`\nFinal response:\n${result.finalOutput}`);
});
```

You can find full working examples on GitHub.
- Example of how to use the shell tool with the Agents SDK in TypeScript
- Example of how to use the shell tool with the Agents SDK in Python
Handling common errors
- Many CLI tools return non-zero exit codes for warnings; still capture stdout/stderr so the model can interpret the failure.
- If a command exceeds the timeout limit, emit `{"type": "timeout"}` and include any partial stdout and stderr that were captured.
- If `max_output_length` exists in `shell_call`, always pass it back in `shell_call_output`; otherwise the API returns a 400 error for the missing parameter.
- There is no need to truncate `stderr` and `stdout`; the Responses API truncates the output on the server side.
Common patterns and limitations
- Run shell tool in sandbox or container. Consider using Docker, firejail, or a jailed user account.
- Filter or scrutinize high-risk commands (e.g. `rm`, `curl`, network utilities).
- Log every command and its output for auditability and debugging.
- Commands are non-interactive: avoid tools that prompt for passwords or open full-screen editors.
- We recommend executing the `commands` concurrently, although sometimes the model may output commands that require sequential execution. In that case, the model can adjust and recover in its next response.
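The concurrent-execution recommendation can be sketched with a thread pool. The `run_one` and `run_all` helpers below are hypothetical names for illustration; a real integration would reuse its executor and feed each result into the `shell_call_output` payload.

```python
import subprocess
from concurrent.futures import ThreadPoolExecutor


def run_one(cmd: str, timeout: float = 60.0) -> tuple[str, str, int]:
    """Run a single command, returning (stdout, stderr, exit_code).

    subprocess.run raises TimeoutExpired on timeout; the caller should
    catch it and report a {"type": "timeout"} outcome instead.
    """
    p = subprocess.run(cmd, shell=True, capture_output=True, text=True,
                       timeout=timeout)
    return p.stdout, p.stderr, p.returncode


def run_all(commands: list[str]) -> list[tuple[str, str, int]]:
    """Execute all commands from one shell_call concurrently."""
    # One worker per command; pool.map preserves the original order.
    with ThreadPoolExecutor(max_workers=len(commands) or 1) as pool:
        return list(pool.map(run_one, commands))
```

Because `pool.map` preserves order, the results line up with the `commands` list, so each entry in the `output` array of `shell_call_output` corresponds to the command at the same index.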
Usage notes
| API Availability | Supported models |
|---|---|
| Responses API | GPT-5.1, GPT-5.2, GPT-5.1-codex, GPT-5.2-codex |