Agent-browser for cloud agents (Platform)
Status: Proposed (not in flight)
Audience: Platform team, Cursor Cloud / agent infrastructure
Reference: agent-browser — browser automation CLI for AI agents (Rust, CDP, ref-based snapshots)
Summary
Evaluate and optionally standardize agent-browser as the primary browser driver for cloud agents (shell-first, low-token observe→act loops). Keep Playwright for product e2e and existing repo workflows unless evaluation proves a single stack is enough.
This doc complements AGENTS.md (Playwright CLI under tools/playwright-cli) by defining when to prefer agent-browser and what to spike before adoption.
Why consider it (vs Playwright and IDE MCP)
| Dimension | agent-browser | Playwright (CLI / tests) | Cursor Browser MCP |
|---|---|---|---|
| Output for LLMs | Compact a11y tree + stable refs (@e1); docs cite ~200–400 tokens vs large DOM dumps | Powerful but often heavier context (traces, verbose logs) | Snapshot + refs; IDE-bound, not for headless cloud-only agents |
| Integration model | Shell + optional daemon; works anywhere commands run | Node ecosystem; repo already pins Playwright CLI for some flows | MCP in Cursor, not a generic cloud primitive |
| Maturity / CI | Newer; growing command set and providers | Default for e2e, CI, playwright.config.ts | N/A for CI agents |
| Determinism | Act on refs from latest snapshot | Strong selectors + auto-wait | Ref-based |
Default policy (proposal): agent-browser for autonomous agent loops where observation token cost matters; Playwright for formal e2e and existing expert-network-style automation until a spike says otherwise.
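To make the proposed default concrete, the policy can be sketched as a small decision function. This is only an illustration: `BrowserTask`, its fields, and `choose_driver` are hypothetical names, and the final fallback to Playwright for unclassified tasks is an assumption rather than something the proposal states.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BrowserTask:
    """Hypothetical description of a browser job; field names are illustrative."""
    autonomous_loop: bool           # agent picks each step from a fresh observation
    token_sensitive: bool           # observation size counts against the context budget
    formal_e2e: bool                # part of the product e2e suite or a CI gate
    existing_playwright_flow: bool  # already scripted with the Playwright CLI


def choose_driver(task: BrowserTask) -> str:
    """Return "agent-browser" or "playwright" under the proposed default policy."""
    # Formal e2e and already-scripted flows stay on Playwright until a spike says otherwise.
    if task.formal_e2e or task.existing_playwright_flow:
        return "playwright"
    # Autonomous loops where observation token cost matters prefer agent-browser.
    if task.autonomous_loop and task.token_sensitive:
        return "agent-browser"
    # Assumption: anything unclassified falls back to the stack the repo already pins.
    return "playwright"
```

A skill could call a function like this once per task and record the chosen driver in its run log, which would also give Phase 0 a rough count of how many flows fall on each side.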
Goals
- Cloud agents can open a URL, snapshot, click/fill by ref, screenshot, and close without Cursor Browser MCP.
- Documented install and verify path on the same VM image used for Cursor Cloud agents (Node 22 per AGENTS.md).
- Clear handoff to skills (e.g. expert-network flows): when to use agent-browser vs Playwright CLI.
Non-goals
- Replacing Playwright for apps/platform e2e or CI gates without a separate decision.
- Committing secrets or real portal credentials (env / 1Password only).
Phased plan
Phase 0 — Scope and metrics (short)
- List target flows (e.g. portal automation, internal smoke checks).
- Success metrics: tokens per step (rough), cold-start time, flake rate vs Playwright on one shared scenario.
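One way the three metrics might be computed from recorded runs is sketched below. The names `RunRecord`, `rough_tokens`, and `summarize` are hypothetical, the 4-characters-per-token figure is a common rule of thumb rather than a real tokenizer, and flake rate is assumed here to mean the share of failed runs on a scenario that is expected to pass.

```python
from dataclasses import dataclass
from statistics import mean


def rough_tokens(text: str) -> int:
    """Rough token estimate: about 4 characters per token (heuristic, not a tokenizer)."""
    return (len(text) + 3) // 4


@dataclass
class RunRecord:
    """One end-to-end run of the shared scenario (hypothetical record shape)."""
    step_outputs: list[str]  # stdout captured for each observe/act step
    cold_start_s: float      # seconds from launch to first usable observation
    passed: bool             # whether the scenario completed as expected


def summarize(runs: list[RunRecord]) -> dict[str, float]:
    """Aggregate tokens per step, mean cold-start time, and flake rate."""
    steps = [rough_tokens(out) for run in runs for out in run.step_outputs]
    return {
        "tokens_per_step": mean(steps) if steps else 0.0,
        "cold_start_s": mean(r.cold_start_s for r in runs) if runs else 0.0,
        "flake_rate": sum(not r.passed for r in runs) / len(runs) if runs else 0.0,
    }
```

Running the same scenario through both drivers and comparing the two summaries would give the side-by-side numbers the spike needs.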
Phase 1 — Spike on Cloud VM
- Install per the agent-browser installation docs (npm i -g, brew, or pinned binary; align with VM constraints).
- Ensure headless Chrome via documented first-run (agent-browser install or equivalent).
- Minimal script: open → snapshot -i → click @eX → screenshot → close.
- Side-by-side with existing Playwright CLI path for the same URL (latency + stdout size).
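The minimal script and the latency / stdout-size comparison could be prototyped with a thin wrapper like the one below. It is a sketch under assumptions: the subcommand names come from the list above, passing an output path to `screenshot` is assumed rather than confirmed here, `minimal_flow` and `run_flow` are hypothetical helpers, and the ref is a placeholder that a real loop would read from the snapshot output. `dry_run=True` only echoes the commands, so the wrapper can be exercised on a machine without agent-browser installed.

```python
import subprocess
import time


def minimal_flow(url: str, ref: str, shot_path: str) -> list[list[str]]:
    """Build the open, snapshot -i, click, screenshot, close sequence as argv lists."""
    return [
        ["agent-browser", "open", url],
        ["agent-browser", "snapshot", "-i"],
        ["agent-browser", "click", ref],                 # ref is a placeholder, e.g. "@e1"
        ["agent-browser", "screenshot", shot_path],      # path argument is an assumption
        ["agent-browser", "close"],
    ]


def run_flow(commands: list[list[str]], dry_run: bool = True) -> list[dict]:
    """Run each command, recording latency and stdout size; stop at the first failure."""
    results = []
    for argv in commands:
        started = time.perf_counter()
        if dry_run:
            # Echo instead of executing so the sketch works without the CLI present.
            stdout, returncode = " ".join(argv), 0
        else:
            done = subprocess.run(argv, capture_output=True, text=True, check=False)
            stdout, returncode = done.stdout, done.returncode
        results.append({
            "argv": argv,
            "returncode": returncode,
            "seconds": time.perf_counter() - started,
            "stdout_bytes": len(stdout.encode("utf-8")),
        })
        if returncode != 0:
            break
    return results
```

For the comparison, the same `run_flow` could be pointed at the equivalent Playwright CLI commands for the same URL, so both paths are timed and sized by identical code. Note that stopping at the first failure skips `close`; a production wrapper would probably want to close the session in a cleanup step.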
Phase 2 — Standards doc
- Add standards/03-knowledge/engineering/setup/agent-browser-setup.md: install, verify, env, security notes (allowlists, no secrets in URLs), stale-ref rule (re-snapshot after navigation).
Phase 3 — Skills and defaults
- Update or add a skill so cloud agents default to agent-browser when appropriate; keep Playwright as fallback for scripted flows.
- Document staleness: always refresh snapshot after navigation or large DOM changes before using refs.
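The staleness rule can also be enforced in code rather than only documented. The sketch below is one possible guard, not part of agent-browser itself: `RefTracker` and `StaleRefError` are hypothetical names, and the caller is assumed to know when a navigation or large DOM change has happened.

```python
class StaleRefError(RuntimeError):
    """Raised when a ref is used without a fresh snapshot behind it."""


class RefTracker:
    """Tracks whether refs from the latest snapshot are still safe to act on."""

    def __init__(self) -> None:
        self._refs: set[str] = set()
        self._fresh = False

    def record_snapshot(self, refs) -> None:
        """Store the refs parsed from the most recent snapshot output."""
        self._refs = set(refs)
        self._fresh = True

    def mark_changed(self) -> None:
        """Call after navigation or any action likely to change the DOM substantially."""
        self._fresh = False

    def require(self, ref: str) -> str:
        """Return the ref if it is usable, otherwise raise StaleRefError."""
        if not self._fresh:
            raise StaleRefError("snapshot is stale; re-snapshot before using refs")
        if ref not in self._refs:
            raise StaleRefError(f"{ref} was not in the latest snapshot")
        return ref
```

A skill would wrap every click or fill in `require(...)` and call `mark_changed()` after each navigation, so a stale ref fails loudly instead of acting on the wrong element.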
Phase 4 — Security and ops
- Session isolation, domain allowlists where feasible, audit logging for production cloud runs.
- If using hosted browsers (Browserbase, Browserless, etc.), wire secrets via env / 1Password per repo policy.
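A domain allowlist and a no-secrets-in-URLs check could run before any URL is handed to the browser. The following is a minimal sketch using only the standard library; `ALLOWED_HOSTS`, `SENSITIVE_PARAMS`, and `check_url` are hypothetical, and the parameter names treated as sensitive are illustrative guesses that the real policy would need to define.

```python
from urllib.parse import parse_qsl, urlsplit

ALLOWED_HOSTS = {"example.com"}  # hypothetical allowlist; subdomains are accepted too
SENSITIVE_PARAMS = {"token", "password", "api_key", "secret"}  # illustrative names only


def check_url(url: str, allowed_hosts=ALLOWED_HOSTS) -> list[str]:
    """Return a list of policy violations for the URL; an empty list means it passes."""
    parts = urlsplit(url)
    problems = []
    host = (parts.hostname or "").lower()
    # Allowlist: exact host match or a subdomain of an allowed host.
    if not any(host == h or host.endswith("." + h) for h in allowed_hosts):
        problems.append(f"host {host!r} is not on the allowlist")
    # Credentials in the userinfo part would end up in logs and history.
    if parts.username or parts.password:
        problems.append("credentials embedded in URL")
    # Query parameters whose names suggest a secret value.
    names = {key.lower() for key, _ in parse_qsl(parts.query, keep_blank_values=True)}
    leaked = sorted(names & SENSITIVE_PARAMS)
    if leaked:
        problems.append("sensitive query parameters: " + ", ".join(leaked))
    return problems
```

Returning a list of violations instead of a boolean keeps the result usable for audit logging: the wrapper can refuse the navigation and record why.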
Phase 5 — CI (optional)
- Only if agent-browser flows become regression assets; otherwise keep CI on Playwright.
Related
- Repo browser guidance: root AGENTS.md (Playwright CLI), standards/03-knowledge/engineering/setup/playwright-cli-setup.md
- agent-browser documentation
Open questions
- Pin version under tools/ vs global install on VM image?
- Single “browser automation” skill vs per-workflow overrides?
- Linear epic/project ownership for Phase 1–2 (spike + standards)?
Last updated: 2026-03-31