Shell

Allow models to run shell commands through your integration.

The shell tool allows the model to interact with your local computer through a controlled command-line interface. The model proposes shell commands; your integration executes them and returns the outputs. This creates a simple plan-execute loop that lets models inspect the system, run utilities, and gather data until they can finish the task.

Shell is available through the Responses API for use with GPT-5.1, GPT-5.2, GPT-5.1-codex, and GPT-5.2-codex. It is not available via the Chat Completions API.

Running arbitrary shell commands can be dangerous. Always sandbox execution or add strict allow/deny lists before forwarding a command to the system shell.

See Codex CLI for a reference implementation.

When to use

  • Automating filesystem or process diagnostics — For example, “find the largest PDF under ~/Documents” or “show running gunicorn processes.”
  • Extending the model’s capabilities — Using built-in UNIX utilities, the Python runtime, and other CLIs available in your environment.
  • Running multi-step build and test flows — Chaining commands like pip install and pytest.
  • Complex agentic coding workflows — Using other tools like apply_patch to complete workflows that involve complex file operations.

Use shell tool with Responses API

  1. Add shell as a tool to your request:

    • Include the shell tool in the tools param: tools=[{"type": "shell"}]
    • You can add other tools to the tools list, and the model can use them alongside shell.
    • It’s helpful to describe the shell environment so the model can propose compatible commands.
    Call the shell tool
    from openai import OpenAI
    client = OpenAI()
    
    response = client.responses.create(
        model="gpt-5.1",
        instructions="The local bash shell environment is on Mac.",
        input="find me the largest pdf file in ~/Documents",
        tools=[{"type": "shell"}],
    )
    print(response)
  2. Response includes shell_call output items

    When the model decides to use the shell tool, the response includes one or more shell_call output items. Each shell_call output item has the following fields.

    • call_id is generated by the API and should be passed back in shell_call_output in the next request.
    • status indicates whether the tool call is completed or in_progress (in streaming mode).
    • action contains information for shell tool execution
      • commands: A list of shell commands that your integration can execute concurrently in the local computer environment.
      • timeout_ms: An optional timeout that should be honored during shell execution.
      • max_output_length: Used to truncate the output when it’s too large. Pass it back with the raw output in shell_call_output; you do not need to perform the truncation yourself.
    Example shell_call output
    {
        "type": "shell_call",
        "call_id": "...",
        "action": {
            "commands": ["ls -l"],
            "timeout_ms": 120000,
            "max_output_length": 4096
        },
        "status": "in_progress"
    }
  3. Execute shell commands

    Run the shell tool inside a sandboxed environment with security guardrails: dedicated containers, minimum privileges, file-system allowlists, and audit logging are all recommended.

    For conversations spanning multiple requests, run the commands inside a persistent shell session, capturing stdout, stderr, and the exit code.

    One response can contain multiple shell_call items, and one shell_call can contain a list of commands. The commands can all be executed concurrently.

    Below is an example shell command executor.

    Important:

    Running arbitrary shell commands can be dangerous, and the example below does not perform any sandboxing or security checks. Always sandbox execution or add strict allow/deny lists before forwarding a command to the system shell.

    See Codex CLI for a reference implementation.

    Shell executor example
    import subprocess
    from dataclasses import dataclass

    @dataclass
    class CmdResult:
        stdout: str
        stderr: str
        exit_code: int | None
        timed_out: bool
    
    class ShellExecutor:
        def __init__(self, default_timeout: float = 60):
            self.default_timeout = default_timeout
    
        def run(self, cmd: str, timeout: float | None = None) -> CmdResult:
            t = timeout or self.default_timeout
            p = subprocess.Popen(
                cmd,
                shell=True,
                stdout=subprocess.PIPE,
                stderr=subprocess.PIPE,
                text=True,
            )
            try:
                out, err = p.communicate(timeout=t)
                return CmdResult(out, err, p.returncode, False)
            except subprocess.TimeoutExpired:
                p.kill()
                out, err = p.communicate()
                return CmdResult(out, err, p.returncode, True)
  4. Send the shell tool response to the model

    Package one or more shell execution outputs in a shell_call_output item and send it back as an input item in the next model request.

    Each shell execution output should have:

    • stdout: the raw stdout
    • stderr: the raw stderr
    • outcome: the outcome of shell execution
      • If the command timed out, it should be {"type": "timeout"}
      • If the command completed, it should be {"type": "exit", "exit_code": EXIT_CODE}

    If the shell_call includes max_output_length, always pass it back in shell_call_output.

    Below is an example.

    Example shell_call_output payload
    {
        "type": "shell_call_output",
        "call_id": "...",
        "max_output_length": 4096,
        "output": [
            {
                "stdout": "...",
                "stderr": "...",
                "outcome": {
                    "type": "exit",
                    "exit_code": 0
                }
            },
            {
                "stdout": "...",
                "stderr": "...",
                "outcome": {
                    "type": "timeout"
                }
            }
        ]
    }

Use the shell tool with the Agents SDK

Alternatively, you can use the shell tool through the Agents SDK. You’ll still have to implement your own shell executor to provide to the tool, but the Agents SDK handles the lifecycle of the execution.

Use the shell tool with the Agents SDK
import {
  Agent,
  run,
  withTrace,
  Shell,
  ShellAction,
  ShellResult,
  shellTool,
} from '@openai/agents';


/**
 * Implement this with your own shell execution environment.
 */
class LocalShell implements Shell {
  async run(action: ShellAction): Promise<ShellResult> {
    // Implement your own shell execution here
    return {
      output: [
        {
          stdout: 'Shell is not available. Needs to be implemented first.',
          stderr: '',
          outcome: {
            type: 'exit',
            exitCode: 1,
          },
        },
      ],
      maxOutputLength: action.maxOutputLength,
    };
  }
}

const shell = new LocalShell();

const agent = new Agent({
  name: 'Shell Assistant',
  model: 'gpt-5.1',
  instructions:
    'You can execute shell commands to inspect the repository. Keep responses concise and include command output when helpful.',
  tools: [
    shellTool({
      shell,
      // could also be a function for you to determine if approval is needed
      needsApproval: true,
      onApproval: async (_ctx, _approvalItem) => {
        // Implement your own approval logic
        return { approve: true };
      },
    }),
  ],
});

await withTrace('shell-tool-example', async () => {
  const result = await run(agent, 'Show the Node.js version.');

  console.log(`\nFinal response:\n${result.finalOutput}`);
});

You can find full working examples on GitHub.

Shell tool example - TypeScript

Example of how to use the shell tool with the Agents SDK in TypeScript

Shell tool example - Python

Example of how to use the shell tool with the Agents SDK in Python

Handling common errors

  • Many CLI tools return non-zero exit codes for warnings; still capture stdout and stderr so the model can interpret the failure.
  • If a command exceeds the timeout limit, emit {"type": "timeout"} and include any partial stdout and stderr that were captured.
  • If max_output_length exists in shell_call, always pass it back in shell_call_output; otherwise the API returns a 400 error for the missing parameter.
  • There is no need to truncate stdout and stderr; the Responses API truncates the output on the server side.

Common patterns and limitations

  • Run shell tool in sandbox or container. Consider using Docker, firejail, or a jailed user account.
  • Filter or scrutinize high-risk commands (e.g. rm, curl, network utilities).
  • Log every command and its output for auditability and debugging.
  • Commands are non-interactive: avoid tools that prompt for passwords or open full-screen editors.
  • We recommend executing the commands concurrently, although the model may sometimes output commands that need sequential execution. In that case, the model can adjust and recover in the next response.
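A minimal first-pass filter for high-risk commands might look like the sketch below. DENYLIST and is_allowed are illustrative names; a first-token check like this catches only the simplest cases (it misses pipelines, subshells, and flags) and is no substitute for real sandboxing.

```python
# Sketch: a rough first-token deny-list check before executing a command.
# DENYLIST and is_allowed are illustrative; this does not replace sandboxing.
import shlex

DENYLIST = {"rm", "curl", "wget", "ssh", "sudo", "dd", "mkfs"}


def is_allowed(command: str) -> bool:
    """Reject commands whose first token is on the deny list."""
    try:
        tokens = shlex.split(command)
    except ValueError:
        return False  # unparseable command (e.g. unbalanced quotes): reject
    return bool(tokens) and tokens[0] not in DENYLIST
```

In practice, combine a check like this with container isolation and audit logging rather than relying on it alone, since commands such as `sh -c "rm -rf ..."` would pass a first-token test.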
