The shell tool allows the model to interact with your local computer through a controlled command-line interface. The model proposes shell commands; your integration executes them and returns the outputs. This creates a simple plan-execute loop that lets models inspect the system, run utilities, and gather data until they can finish the task.
Shell is available through the Responses API for use with GPT-5.1, GPT-5.2, GPT-5.1-codex, and GPT-5.2-codex. It is not available via the Chat Completions API.
Running arbitrary shell commands can be dangerous. Always sandbox execution or add strict allow/deny lists before forwarding a command to the system shell.
See Codex CLI for a reference implementation.
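As one illustration of the allow/deny idea, a pre-execution filter can reject any command that mentions a known-dangerous program before it ever reaches the system shell. This is a minimal sketch, not a complete security boundary: the deny-list contents and the `is_allowed` helper are hypothetical, and a real integration should combine this with sandboxing.

```python
import shlex

# Hypothetical deny-list of program names considered too risky to forward.
DENIED = {"rm", "curl", "wget", "ssh", "sudo", "dd", "mkfs"}


def is_allowed(command: str) -> bool:
    """Return False if any token in the command matches the deny-list.

    Tokenizing with shlex catches denied names anywhere in a pipeline
    or chained command, not just at the start.
    """
    try:
        tokens = shlex.split(command)
    except ValueError:
        return False  # reject commands that cannot be parsed
    return bool(tokens) and not any(tok in DENIED for tok in tokens)
```

A deny-list like this only blocks the obvious cases (it will not catch `rm` hidden behind an alias or a script), which is why sandboxed execution remains the primary defense.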
When to use
- Automating filesystem or process diagnostics — for example, “find the largest PDF under ~/Documents” or “show running gunicorn processes.”
- Extending the model’s capabilities — using built-in UNIX utilities, the Python runtime, and other CLIs in your environment.
- Running multi-step build and test flows — chaining commands like `pip install` and `pytest`.
- Complex agentic coding workflows — using other tools such as `apply_patch` to complete workflows that involve complex file operations.
Use the shell tool with the Responses API

1. Add `shell` as a tool to your request:

   - Include the `shell` tool in the `tools` param: `tools=[{"type": "shell"}]`.
   - You can add other tools to the `tools` list; `shell` can be used alongside them.
   - It’s helpful to specify the shell environment, so the model can provide compatible commands.

   Call the shell tool:

   ```python
   from openai import OpenAI

   client = OpenAI()

   response = client.responses.create(
       model="gpt-5.1",
       instructions="The local bash shell environment is on Mac.",
       input="find me the largest pdf file in ~/Documents",
       tools=[{"type": "shell"}],
   )
   print(response)
   ```
2. Handle the `shell_call` output items.

   When the model decides to use the `shell` tool, the response includes one or more `shell_call` output items. Each `shell_call` output item has the following fields:

   - `call_id`: generated by the API; pass it back in `shell_call_output` in the next request.
   - `status`: indicates whether the tool call is `completed` or `in_progress` (in streaming mode).
   - `action`: contains the information needed for shell tool execution:
     - `commands`: a list of shell commands that developers can execute concurrently in the local computer environment.
     - `timeout_ms`: an optional timeout that should be honored during shell execution.
     - `max_output_length`: used to truncate the output when it is too large. Developers need to pass it back with the raw output in `shell_call_output`, but do not need to do the actual truncation.
   Example `shell_call` output:

   ```json
   {
     "type": "shell_call",
     "call_id": "...",
     "action": {
       "commands": ["ls -l"],
       "timeout_ms": 120000,
       "max_output_length": 4096
     },
     "status": "in_progress"
   }
   ```
3. Execute the shell commands.

   Run the shell tool inside a sandboxed environment with security guards: dedicated containers, minimum privileges, filesystem allowlists, and audit logging are all recommended.

   For conversations with multiple requests, run the commands inside a persistent shell session, capturing stdout, stderr, and the exit code.

   One response can contain multiple `shell_call` items, and one `shell_call` can contain a list of commands. The commands can all be executed concurrently. Below is an example shell command executor.
   Important: running arbitrary shell commands can be dangerous. Always sandbox execution or add strict allow/deny lists before forwarding a command to the system shell. The example below does not perform any sandboxing or security checks. See Codex CLI for a reference implementation.
   Shell executor example:

   ```python
   import subprocess
   from dataclasses import dataclass


   @dataclass
   class CmdResult:
       stdout: str
       stderr: str
       exit_code: int | None
       timed_out: bool


   class ShellExecutor:
       def __init__(self, default_timeout: float = 60):
           self.default_timeout = default_timeout

       def run(self, cmd: str, timeout: float | None = None) -> CmdResult:
           t = timeout or self.default_timeout
           p = subprocess.Popen(
               cmd,
               shell=True,
               stdout=subprocess.PIPE,
               stderr=subprocess.PIPE,
               text=True,
           )
           try:
               out, err = p.communicate(timeout=t)
               return CmdResult(out, err, p.returncode, False)
           except subprocess.TimeoutExpired:
               p.kill()
               out, err = p.communicate()
               return CmdResult(out, err, p.returncode, True)
   ```
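The executor above starts a fresh process for every command, so state such as the working directory is lost between calls. For the persistent-session approach mentioned earlier, one minimal sketch is to keep a single long-lived `bash` process and delimit each command's output with a unique marker. The marker protocol here is an illustration, not part of the API, and it assumes commands end their output with a newline; stderr is merged into stdout for simplicity.

```python
import subprocess
import uuid


class PersistentShell:
    """Keep one long-lived bash process so state (cwd, env vars)
    persists across commands within a conversation."""

    def __init__(self):
        self.proc = subprocess.Popen(
            ["bash"],
            stdin=subprocess.PIPE,
            stdout=subprocess.PIPE,
            stderr=subprocess.STDOUT,  # merge stderr into stdout for simplicity
            text=True,
            bufsize=1,
        )

    def run(self, cmd: str) -> tuple[str, int]:
        # A unique marker line signals that the command finished and
        # carries its exit code.
        marker = f"__DONE_{uuid.uuid4().hex}__"
        self.proc.stdin.write(f"{cmd}\nprintf '%s %s\\n' {marker} $?\n")
        self.proc.stdin.flush()
        lines = []
        for line in self.proc.stdout:
            if line.startswith(marker):
                exit_code = int(line.split()[1])
                return "".join(lines), exit_code
            lines.append(line)
        raise RuntimeError("shell process exited unexpectedly")
```

Because the same process handles every command, `cd` or exported variables in one `shell_call` remain in effect for the next one, which matches what the model expects in multi-request conversations.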
4. Send the shell tool response back to the model.

   Package one or more shell execution outputs in `shell_call_output` and send it back as input items in the next model request. Each shell execution output should have:

   - `stdout`: the raw stdout.
   - `stderr`: the raw stderr.
   - `outcome`: the outcome of the shell execution:
     - If the command timed out: `{"type": "timeout"}`
     - If the command completed: `{"type": "exit", "exit_code": EXIT_CODE}`

   If the `shell_call` has `max_output_length`, always pass it back in `shell_call_output`.
   Example `shell_call_output` payload:

   ```json
   {
     "type": "shell_call_output",
     "call_id": "...",
     "max_output_length": 4096,
     "output": [
       {
         "stdout": "...",
         "stderr": "...",
         "outcome": { "type": "exit", "exit_code": 0 }
       },
       {
         "stdout": "...",
         "stderr": "...",
         "outcome": { "type": "timeout" }
       }
     ]
   }
   ```
Use the shell tool with the Agents SDK

Alternatively, you can use the Agents SDK to use the shell tool. You still have to implement your own shell executor to provide to the tool, but the Agents SDK handles the lifecycle of the execution.
```typescript
import {
  Agent,
  run,
  withTrace,
  Shell,
  ShellAction,
  ShellResult,
  shellTool,
} from '@openai/agents';

/**
 * Implement this with your own shell execution environment.
 */
class LocalShell implements Shell {
  async run(action: ShellAction): Promise<ShellResult> {
    // Implement your own shell execution here
    return {
      output: [
        {
          stdout: 'Shell is not available. Needs to be implemented first.',
          stderr: '',
          outcome: {
            type: 'exit',
            exitCode: 1,
          },
        },
      ],
      maxOutputLength: action.maxOutputLength,
    };
  }
}

const shell = new LocalShell();

const agent = new Agent({
  name: 'Shell Assistant',
  model: 'gpt-5.1',
  instructions:
    'You can execute shell commands to inspect the repository. Keep responses concise and include command output when helpful.',
  tools: [
    shellTool({
      shell,
      // could also be a function for you to determine if approval is needed
      needsApproval: true,
      onApproval: async (_ctx, _approvalItem) => {
        // Implement your own approval logic
        return { approve: true };
      },
    }),
  ],
});

await withTrace('shell-tool-example', async () => {
  const result = await run(agent, 'Show the Node.js version.');
  console.log(`\nFinal response:\n${result.finalOutput}`);
});
```

You can find full working examples on GitHub.
- Example of how to use the shell tool with the Agents SDK in TypeScript
- Example of how to use the shell tool with the Agents SDK in Python
Handling common errors
- Many CLI tools return non-zero exit codes for warnings; still capture stdout/stderr so the model can interpret the failure.
- If a command exceeds the timeout limit, emit `{"type": "timeout"}` and include any partial stdout and stderr that were captured.
- If `max_output_length` exists in `shell_call`, always pass it back in `shell_call_output`; otherwise the API returns a 400 error for the missing parameter.
- There is no need to truncate `stderr` and `stdout`; the Responses API truncates the output on the server side.
Common patterns and limitations
- Run shell tool in sandbox or container. Consider using Docker, firejail, or a jailed user account.
- Filter or scrutinize high-risk commands (e.g. `rm`, `curl`, network utilities).
- Log every command and its output for auditability and debugging.
- Commands are non-interactive: avoid tools that prompt for passwords or open full-screen editors.
- We recommend executing the `commands` concurrently, although sometimes the model may output commands that require sequential execution. In that case, the model can adjust and recover in its next response.
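The concurrent-execution recommendation can be sketched with a thread pool. The `run_one` and `run_all` helpers below are hypothetical names for illustration; a real integration would reuse its executor and feed each result into the `shell_call_output` payload.

```python
import subprocess
from concurrent.futures import ThreadPoolExecutor


def run_one(cmd: str, timeout: float = 60.0) -> tuple[str, str, int]:
    """Run a single command, returning (stdout, stderr, exit_code).

    subprocess.run raises TimeoutExpired on timeout; the caller should
    catch it and report a {"type": "timeout"} outcome instead.
    """
    p = subprocess.run(cmd, shell=True, capture_output=True, text=True,
                       timeout=timeout)
    return p.stdout, p.stderr, p.returncode


def run_all(commands: list[str]) -> list[tuple[str, str, int]]:
    """Execute all commands from one shell_call concurrently."""
    # One worker per command; pool.map preserves the original order.
    with ThreadPoolExecutor(max_workers=len(commands) or 1) as pool:
        return list(pool.map(run_one, commands))
```

Because `pool.map` preserves order, the results line up with the `commands` list, so each entry in the `output` array of `shell_call_output` corresponds to the command at the same index.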
Usage notes
| API Availability | Supported models |
|---|---|
| Responses API | GPT-5.1, GPT-5.2, GPT-5.1-codex, GPT-5.2-codex |