Apr 19, 2026

Computer Use Agents in Daytona Sandboxes

Plenty of useful work still lives behind browser UIs with no public API: third-party dashboards, admin panels, form-heavy workflows. The Agents SDK’s Computer Use tool lets an agent see and control a desktop. In this cookbook, we use a Daytona sandbox as the source of that desktop.

The Computer Use tool needs only a handful of primitives to drive a desktop: take a screenshot, click, type, scroll, and press keys. A Daytona sandbox wraps a Linux desktop (browser included) in a Python SDK that exposes exactly those primitives. A thin adapter implementing the Agents SDK’s AsyncComputer interface plugs the sandbox into the tool.
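To make the adapter idea concrete, here is a minimal, self-contained sketch of the pattern. It does not import the Agents SDK or the Daytona SDK: `DesktopBackend`, `FakeDesktop`, and `SandboxComputer` are hypothetical names invented for illustration, and the backend method names are assumptions rather than the real Daytona API. In a real notebook the adapter would presumably implement the SDK’s AsyncComputer interface and delegate to the sandbox instead of an in-memory fake. Returning the screenshot as a base64 string is also an assumption about what the tool expects.

```python
import asyncio
import base64
from typing import Protocol


class DesktopBackend(Protocol):
    """Hypothetical stand-in for the sandbox's desktop API (not the real Daytona SDK)."""

    def take_screenshot(self) -> bytes: ...
    def click(self, x: int, y: int, button: str) -> None: ...
    def type_text(self, text: str) -> None: ...
    def scroll(self, x: int, y: int, dx: int, dy: int) -> None: ...
    def press_keys(self, keys: list[str]) -> None: ...


class FakeDesktop:
    """In-memory backend that records calls, so the sketch runs without a sandbox."""

    def __init__(self) -> None:
        self.calls: list[tuple] = []

    def take_screenshot(self) -> bytes:
        self.calls.append(("screenshot",))
        return b"fake-png-bytes"

    def click(self, x: int, y: int, button: str) -> None:
        self.calls.append(("click", x, y, button))

    def type_text(self, text: str) -> None:
        self.calls.append(("type", text))

    def scroll(self, x: int, y: int, dx: int, dy: int) -> None:
        self.calls.append(("scroll", x, y, dx, dy))

    def press_keys(self, keys: list[str]) -> None:
        self.calls.append(("keypress", tuple(keys)))


class SandboxComputer:
    """Thin async adapter: one method per primitive, each delegating to the backend."""

    def __init__(self, backend: DesktopBackend) -> None:
        self._backend = backend

    async def screenshot(self) -> str:
        # Run the blocking backend call in a worker thread, then base64-encode
        # the image bytes (assumed format, see the note above).
        png = await asyncio.to_thread(self._backend.take_screenshot)
        return base64.b64encode(png).decode("ascii")

    async def click(self, x: int, y: int, button: str = "left") -> None:
        await asyncio.to_thread(self._backend.click, x, y, button)

    async def type(self, text: str) -> None:
        await asyncio.to_thread(self._backend.type_text, text)

    async def scroll(self, x: int, y: int, scroll_x: int, scroll_y: int) -> None:
        await asyncio.to_thread(self._backend.scroll, x, y, scroll_x, scroll_y)

    async def keypress(self, keys: list[str]) -> None:
        await asyncio.to_thread(self._backend.press_keys, keys)
```

The adapter holds no state of its own beyond the backend handle, which is what keeps it thin: swapping `FakeDesktop` for a real sandbox client should only change the object passed to the constructor.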

The agent loop runs in this notebook while the sandbox does the actual clicking and typing. As a demo, the agent fills out a web form served inside the sandbox on localhost:8080, and the whole session is recorded to an .mp4 embedded below.
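The split between "loop in the notebook" and "actions in the sandbox" can be pictured with a small conceptual sketch. This is not how the Agents SDK is implemented; in practice the SDK’s runner drives the loop. `Action`, `RecordingComputer`, `ScriptedModel`, and `run_loop` are hypothetical names, and the scripted model simply replays fixed actions in place of a real model call.

```python
import asyncio
from dataclasses import dataclass, field


@dataclass
class Action:
    """Hypothetical action record: kind is "click", "type", or "done"."""
    kind: str
    args: dict = field(default_factory=dict)


class RecordingComputer:
    """Stand-in for the sandbox desktop; logs each primitive instead of executing it."""

    def __init__(self) -> None:
        self.log: list[str] = []

    async def screenshot(self) -> str:
        self.log.append("screenshot")
        return "<base64 png placeholder>"

    async def click(self, x: int, y: int) -> None:
        self.log.append(f"click({x}, {y})")

    async def type(self, text: str) -> None:
        self.log.append(f"type({text!r})")


class ScriptedModel:
    """Replays a fixed list of actions; a real agent would ask a model here."""

    def __init__(self, script: list) -> None:
        self._script = iter(script)

    async def next_action(self, screenshot: str) -> Action:
        # Once the script is exhausted, signal that the task is finished.
        return next(self._script, Action("done"))


async def run_loop(model: ScriptedModel, computer: RecordingComputer, max_steps: int = 20) -> int:
    """Observe, decide, act; returns the number of actions executed."""
    for step in range(max_steps):
        shot = await computer.screenshot()       # observe (would happen in the sandbox)
        action = await model.next_action(shot)   # decide (would happen in the notebook)
        if action.kind == "done":
            return step
        if action.kind == "click":               # act (would happen in the sandbox)
            await computer.click(**action.args)
        elif action.kind == "type":
            await computer.type(**action.args)
        else:
            raise ValueError(f"unsupported action: {action.kind}")
    return max_steps
```

The `max_steps` cap is a design choice for the sketch: it guarantees the loop ends even if the model never reports that it is done.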

The same pattern works for any task you’d describe as “open an app, navigate somewhere, interact with the screen”: testing UI flows end-to-end, driving legacy desktop software, or any workflow that only exists as a human-facing interface.

Below you can watch an agent drive the sandbox to fill out a complex multi-page form. The rest of this cookbook walks through the machinery that makes it run.