Agent-browser for cloud agents (Platform)

Status: Proposed (not in flight)
Audience: Platform team, Cursor Cloud / agent infrastructure
Reference: agent-browser — browser automation CLI for AI agents (Rust, CDP, ref-based snapshots)

Summary

Evaluate and optionally standardize agent-browser as the primary browser driver for cloud agents (shell-first, low-token observe→act loops). Keep Playwright for product e2e and existing repo workflows unless evaluation proves a single stack is enough.

This doc complements AGENTS.md (Playwright CLI under tools/playwright-cli) by defining when to prefer agent-browser and what to spike before adoption.

Why consider it (vs Playwright and IDE MCP)

| Dimension | agent-browser | Playwright (CLI / tests) | Cursor Browser MCP |
| --- | --- | --- | --- |
| Output for LLMs | Compact a11y tree + stable refs (@e1); docs cite ~200–400 tokens vs large DOM dumps | Powerful but often heavier context (traces, verbose logs) | Snapshot + refs; IDE-bound, not for headless cloud-only agents |
| Integration model | Shell + optional daemon; works anywhere commands run | Node ecosystem; repo already pins Playwright CLI for some flows | MCP in Cursor, not a generic cloud primitive |
| Maturity / CI | Newer; growing command set and providers | Default for e2e, CI, playwright.config.ts | N/A for CI agents |
| Determinism | Act on refs from the latest snapshot | Strong selectors + auto-wait | Ref-based |

Default policy (proposal): agent-browser for autonomous agent loops where observation token cost matters; Playwright for formal e2e and existing expert-network-style automation until a spike says otherwise.
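The default policy can be made mechanical so that skills route tasks consistently. The following is a minimal sketch in Python. `Flow`, `Driver` and `choose_driver` are hypothetical names and are not part of agent-browser or Playwright. Any flow the policy does not explicitly hand to agent-browser falls back to Playwright.

```python
from dataclasses import dataclass
from enum import Enum


class Driver(Enum):
    AGENT_BROWSER = "agent-browser"
    PLAYWRIGHT = "playwright"


@dataclass(frozen=True)
class Flow:
    """Hypothetical description of a browser task an agent is about to run."""

    name: str
    autonomous: bool  # the agent chooses each step from what it observes
    token_sensitive: bool  # per-step observation cost matters
    e2e_or_ci: bool  # part of a formal e2e suite or a CI gate
    scripted_playwright: bool  # already automated with Playwright (e.g. expert-network)


def choose_driver(flow: Flow) -> Driver:
    """Apply the proposed default policy; anything not matched stays on Playwright."""
    # Formal e2e, CI gates and existing scripted flows keep Playwright until a spike says otherwise.
    if flow.e2e_or_ci or flow.scripted_playwright:
        return Driver.PLAYWRIGHT
    # Autonomous observe->act loops where token cost matters go to agent-browser.
    if flow.autonomous and flow.token_sensitive:
        return Driver.AGENT_BROWSER
    return Driver.PLAYWRIGHT


portal = Flow("portal check", autonomous=True, token_sensitive=True, e2e_or_ci=False, scripted_playwright=False)
print(choose_driver(portal).value)  # agent-browser
```

Making Playwright the fallback matches the doc's stance that Playwright stays the default until evaluation says otherwise.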

Goals

  1. Cloud agents can open a URL, snapshot, click/fill by ref, screenshot, and close without Cursor Browser MCP.
  2. Documented install and verify path on the same VM image used for Cursor Cloud agents (Node 22 per AGENTS.md).
  3. Clear handoff to skills (e.g. expert-network flows): when to use agent-browser vs Playwright CLI.
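For goal 2, the verify step could start as a preflight check. It confirms that the VM runs Node 22 and has an `agent-browser` binary on PATH. This is a sketch under those assumptions. `preflight` and `node_major` are hypothetical helpers, and the check does not confirm that a headless browser has already been downloaded.

```python
import re
import shutil
import subprocess

REQUIRED_NODE_MAJOR = 22  # VM image baseline per AGENTS.md


def node_major(version_output: str) -> int:
    """Parse the major version from `node --version` output such as 'v22.11.0'."""
    match = re.match(r"v?(\d+)\.", version_output.strip())
    if match is None:
        raise ValueError(f"unrecognised node version: {version_output!r}")
    return int(match.group(1))


def preflight() -> list[str]:
    """Return a list of problems; an empty list means the basics are in place."""
    problems: list[str] = []
    node = shutil.which("node")
    if node is None:
        problems.append("node not found on PATH")
    else:
        out = subprocess.run([node, "--version"], capture_output=True, text=True, check=False).stdout
        try:
            major = node_major(out)
        except ValueError as exc:
            problems.append(str(exc))
        else:
            if major != REQUIRED_NODE_MAJOR:
                problems.append(f"expected Node {REQUIRED_NODE_MAJOR}.x, found {out.strip()}")
    # Only checks that the CLI is installed, not that a browser has been downloaded.
    if shutil.which("agent-browser") is None:
        problems.append("agent-browser not found on PATH")
    return problems
```

Run `preflight()` at agent start, or during the VM image build, and fail fast if the result is non-empty.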

Non-goals

  • Replacing Playwright for apps/platform e2e or CI gates without a separate decision.
  • Committing secrets or real portal credentials; credentials come only from env / 1Password.

Phased plan

Phase 0 — Scope and metrics (short)

  • List target flows (e.g. portal automation, internal smoke checks).
  • Success metrics: tokens per step (rough), cold-start time, flake rate vs Playwright on one shared scenario.
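The Phase 0 metrics can be aggregated with a small script. This sketch assumes three inputs per run: captured stdout for each step, a measured cold-start time, and a pass/fail result. `ScenarioRun` and `summarize` are hypothetical names. `rough_tokens` uses a heuristic of ~4 characters per token rather than a real tokenizer, so its numbers are only meaningful as relative comparisons between drivers.

```python
from dataclasses import dataclass
from statistics import median


def rough_tokens(text: str) -> int:
    """Rough token estimate at ~4 characters per token (heuristic, not a tokenizer)."""
    return (len(text) + 3) // 4


@dataclass
class ScenarioRun:
    """One run of the shared scenario with one driver (hypothetical record shape)."""

    driver: str
    step_outputs: list[str]  # stdout captured for each observe/act step
    cold_start_s: float  # seconds from launch to first usable page
    passed: bool


def summarize(runs: list[ScenarioRun]) -> dict[str, float]:
    """Aggregate Phase 0 metrics for repeated runs of one driver on one scenario."""
    if not runs:
        raise ValueError("need at least one run")
    per_step = [rough_tokens(out) for run in runs for out in run.step_outputs]
    # The scenario is expected to pass, so every failed run counts toward flakiness.
    failures = sum(1 for run in runs if not run.passed)
    return {
        "median_tokens_per_step": float(median(per_step)) if per_step else 0.0,
        "median_cold_start_s": float(median(run.cold_start_s for run in runs)),
        "flake_rate": failures / len(runs),
    }
```

Run `summarize` once per driver on the same scenario and compare the two dictionaries.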

Phase 1 — Spike on Cloud VM

  • Install by following the agent-browser installation docs (npm i -g, brew, or a pinned binary; pick whichever fits the VM constraints).
  • Ensure headless Chrome is available via the documented first-run step (agent-browser install or equivalent).
  • Minimal script: open → snapshot -i → click @eX → screenshot → close.
  • Run the existing Playwright CLI path against the same URL side by side and compare latency and stdout size.
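A minimal Phase 1 harness could run the open → snapshot -i → click → screenshot → close sequence and record latency and stdout size for each step. The subcommands come from this doc. Passing the screenshot path as a positional argument is an assumption; check it against the CLI help for the pinned version.

The click ref is hard-coded, which only works on a fixed spike page whose ref was read from an earlier snapshot. In a real loop, the agent picks the ref after reading the snapshot output. `minimal_sequence`, `measure` and `run_sequence` are hypothetical helpers.

```python
import subprocess
import time
from dataclasses import dataclass


@dataclass
class StepResult:
    argv: list[str]
    seconds: float
    stdout_bytes: int
    returncode: int


def minimal_sequence(url: str, ref: str, screenshot_path: str) -> list[list[str]]:
    """open -> snapshot -i -> click <ref> -> screenshot -> close, as argv lists.

    `ref` (e.g. "@e2") must come from a snapshot of the same page state.
    """
    return [
        ["agent-browser", "open", url],
        ["agent-browser", "snapshot", "-i"],
        ["agent-browser", "click", ref],
        ["agent-browser", "screenshot", screenshot_path],  # positional path: assumption
        ["agent-browser", "close"],
    ]


def measure(argv: list[str], timeout_s: float = 60.0) -> StepResult:
    """Run one command, recording wall-clock latency and stdout size in bytes."""
    start = time.perf_counter()
    proc = subprocess.run(argv, capture_output=True, timeout=timeout_s, check=False)
    elapsed = time.perf_counter() - start
    return StepResult(argv, elapsed, len(proc.stdout), proc.returncode)


def run_sequence(steps: list[list[str]]) -> list[StepResult]:
    """Run steps in order and stop at the first non-zero exit code."""
    results: list[StepResult] = []
    for argv in steps:
        result = measure(argv)
        results.append(result)
        if result.returncode != 0:
            break
    return results
```

The same `measure` helper can time the existing Playwright CLI command for the side-by-side comparison. That keeps latency and stdout size measured in the same way for both tools.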

Phase 2 — Standards doc

  • Add standards/03-knowledge/engineering/setup/agent-browser-setup.md: install, verify, env, security notes (allowlists, no secrets in URLs), stale-ref rule (re-snapshot after navigation).
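The notes on allowlists and on keeping secrets out of URLs could be backed by a check that agents run before `open`. This is a sketch only:

- The hosts in `ALLOWED_HOSTS` and the secret-like query keys are placeholders.
- Hostname matching is exact, with no wildcard or subdomain matching.
- Only https URLs are accepted.

`check_url` is a hypothetical helper.

```python
from urllib.parse import parse_qsl, urlsplit

# Placeholder allowlist; the real hosts belong in the standards doc or env config.
ALLOWED_HOSTS = frozenset({"example.com", "staging.example.com"})

# Query keys that suggest a credential is travelling in the URL (illustrative, not exhaustive).
SECRET_KEYS = frozenset({"token", "access_token", "api_key", "apikey", "password", "secret"})


def check_url(url: str, allowed_hosts: frozenset[str] = ALLOWED_HOSTS) -> list[str]:
    """Return reasons to refuse `url`; an empty list means it passes these checks."""
    parts = urlsplit(url)
    reasons: list[str] = []
    if parts.scheme != "https":
        reasons.append(f"scheme {parts.scheme!r} is not https")
    host = (parts.hostname or "").lower()
    # Exact hostname match only: no wildcard or subdomain matching.
    if host not in allowed_hosts:
        reasons.append(f"host {host!r} is not on the allowlist")
    if parts.username or parts.password:
        reasons.append("credentials embedded in URL")
    query_keys = {key.lower() for key, _ in parse_qsl(parts.query, keep_blank_values=True)}
    leaked = sorted(query_keys & SECRET_KEYS)
    if leaked:
        reasons.append("secret-like query parameters: " + ", ".join(leaked))
    return reasons
```

Returning reasons instead of a bare boolean lets the agent log exactly why a navigation was refused.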

Phase 3 — Skills and defaults

  • Update or add a skill so cloud agents default to agent-browser for autonomous, token-sensitive loops (per the default policy); keep Playwright as the fallback for scripted flows.
  • Document the staleness rule: always take a fresh snapshot after navigation or large DOM changes, before acting on any ref.
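The stale-ref rule can be enforced in a wrapper rather than left to agent discipline. This sketch makes two assumptions, both to be confirmed against the pinned version:

- Snapshot lines carry refs as `[ref=e2]`, the form shown in agent-browser's README.
- Commands take those refs as `@e2`.

`RefTracker` and `StaleRefError` are hypothetical names.

```python
import re

# Assumed snapshot line shape:  - button "Submit" [ref=e2]
REF_PATTERN = re.compile(r"\[ref=(e\d+)\]")


class StaleRefError(RuntimeError):
    """Raised when an agent tries to act on a ref that may no longer be valid."""


class RefTracker:
    """Hands out refs only from the most recent snapshot."""

    def __init__(self) -> None:
        self._refs: set[str] = set()
        self._fresh = False

    def record_snapshot(self, snapshot_text: str) -> None:
        """Replace the known refs with those found in a new snapshot."""
        self._refs = {"@" + ref for ref in REF_PATTERN.findall(snapshot_text)}
        self._fresh = True

    def invalidate(self) -> None:
        """Call after navigation, form submits, or any large DOM change."""
        self._fresh = False

    def require(self, ref: str) -> str:
        """Return `ref` if it is safe to act on, else raise StaleRefError."""
        if not self._fresh:
            raise StaleRefError("snapshot is stale; take a new snapshot before using refs")
        if ref not in self._refs:
            raise StaleRefError(f"{ref} is not in the latest snapshot")
        return ref
```

The conservative choice is to call `invalidate()` after every action, since a click can navigate or re-render the page. This costs one extra snapshot per step, in exchange for never acting on a ref from an outdated page.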

Phase 4 — Security and ops

  • Session isolation, domain allowlists where feasible, audit logging for production cloud runs.
  • If using hosted browsers (Browserbase, Browserless, etc.), wire secrets via env / 1Password per repo policy.
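For session isolation and audit logging, one option is to give each agent run its own named session and write one redacted JSON line per command. The `--session` flag follows agent-browser's documented session support, but it should be confirmed for the pinned version. The following are assumptions: which subcommands carry sensitive values (`fill`, `type`), the record fields, and the session naming scheme. All helper names are hypothetical.

```python
import json
import time
import uuid

# Subcommands whose value after the ref may carry user data or secrets (assumed set).
REDACT_VALUE_AFTER_REF = frozenset({"fill", "type"})


def new_session_name(prefix: str = "cloud-agent") -> str:
    """Unique session name per agent run (hypothetical naming scheme)."""
    return f"{prefix}-{uuid.uuid4().hex[:12]}"


def isolated_argv(session: str, args: list[str]) -> list[str]:
    """Prefix a command with a per-run session so runs don't share browser state."""
    return ["agent-browser", "--session", session, *args]


def redact(args: list[str]) -> list[str]:
    """Hide typed values: `fill @e3 hunter2` is logged as `fill @e3 <redacted>`."""
    if len(args) >= 3 and args[0] in REDACT_VALUE_AFTER_REF:
        return [args[0], args[1], "<redacted>"]
    return list(args)


def audit_record(session: str, args: list[str], returncode: int) -> str:
    """One JSON line per command, suitable for an append-only audit log."""
    return json.dumps(
        {"ts": time.time(), "session": session, "args": redact(args), "returncode": returncode},
        sort_keys=True,
    )
```

Redacting at the logging layer keeps credentials injected from env / 1Password out of audit storage, even when the agent types them into a form.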

Phase 5 — CI (optional)

  • Only if agent-browser flows become regression assets; otherwise keep CI on Playwright.

References

  • Repo browser guidance: root AGENTS.md (Playwright CLI), standards/03-knowledge/engineering/setup/playwright-cli-setup.md
  • agent-browser documentation

Open questions

  • Pin version under tools/ vs global install on VM image?
  • Single “browser automation” skill vs per-workflow overrides?
  • Linear epic/project ownership for Phase 1–2 (spike + standards)?

Last updated: 2026-03-31