Alternative Architecture: CLI-Native Agent (Shell Exec)

Date: 2026-03-25
Author: Sam (Brainforge)
Status: Draft (comparison against custom-tools approach)
Related: spike-command-center-data-access.md, linear-tickets-data-access-chat-integration.md


The question

The current plan (custom-tools approach) builds 9 individual Mastra tool functions that each wrap specific GWS CLI commands and Slack API calls. That works, but the GWS CLI was already designed for agent use — structured JSON output, auto-pagination, schema introspection, 40+ built-in skills. Instead of writing search_drive(), get_drive_activity(), search_gmail(), etc. by hand, we could give the agent a generic shell-exec tool that runs gws commands directly and skip the wrapper layer entirely.

This document maps out what that architecture looks like, what changes, and what the effort comparison is.

MCP server mode — removed

GWS CLI had an MCP server mode (gws mcp -s drive,gmail,calendar) through v0.7.0. It was removed in v0.8.0 via PR #275 (merged March 6, 2026). The removal was a breaking change with no replacement — the maintainers cited context window bloat, tool-name parsing bugs, state management issues, and security concerns. The current version (0.22.1) does not have gws mcp.

Third-party projects that depended on gws mcp have migrated to a subprocess-per-call bridge pattern — spawning a short-lived gws CLI process for each tool call instead. That’s the pattern Architecture B uses.

The Slack MCP server (mcp.slack.com) is still live and maintained by Slack.


Architecture A: Custom Tools (current plan)

┌─────────────────────────────────────────────────────┐
│  Next.js + Mastra (Cloud Run, Eden GCP)             │
│                                                      │
│  ┌──────────────┐    ┌───────────────────────────┐  │
│  │  Chat UI     │───▶│  Mastra Agent              │  │
│  │  (React)     │    │  (TypeScript)              │  │
│  └──────────────┘    │                            │  │
│                      │  Tools (hand-built):       │  │
│                      │  - search_drive()          │  │
│                      │  - get_drive_activity()    │  │
│                      │  - get_file_comments()     │  │
│                      │  - search_gmail()          │  │
│                      │  - search_calendar()       │  │
│                      │  - get_user_directory()    │  │
│                      │  - search_slack()          │  │
│                      │  - read_slack_thread()     │  │
│                      │  - get_slack_channel_stats()│  │
│                      └───────────┬────────────────┘  │
│                                  │                    │
│                      ┌───────────▼────────────────┐  │
│                      │  PII Redaction Middleware   │  │
│                      └───────────┬────────────────┘  │
│                                  │                    │
│              ┌───────────────────┼──────────────┐    │
│              ▼                   ▼              ▼    │
│         GWS CLI            Slack API     Vertex AI  │
│        (shell exec)         (HTTP)      Gemini API  │
└─────────────────────────────────────────────────────┘

Each tool function is ~50–150 lines of TypeScript that:

  1. Constructs the right gws CLI command or Slack API call
  2. Parses the JSON response
  3. Runs it through PII redaction
  4. Returns structured data to the agent

Observation: Both the custom tool functions and the raw gws CLI commands do the same thing — call a Google API and return JSON. The custom tools add a translation layer that’s mostly mechanical: build command string → exec → parse JSON → redact → return. The GWS CLI already handles the API call, auth, pagination, and JSON formatting.


Architecture B: CLI-Native Agent (shell exec)

┌──────────────────────────────────────────────────────┐
│  Next.js (Cloud Run, Eden GCP)                       │
│                                                       │
│  ┌──────────────┐    ┌────────────────────────────┐  │
│  │  Chat UI     │───▶│  Mastra Agent               │  │
│  │  (React)     │    │  (TypeScript)               │  │
│  └──────────────┘    │                             │  │
│                      │  Tools:                     │  │
│                      │  - run_gws(command)          │  │
│                      │  - gws_schema(method)        │  │
│                      │  - slack_mcp (MCP client)    │  │
│                      │  - pii_redact()              │  │
│                      │  - project_registry()        │  │
│                      └──────┬──────────┬───────────┘  │
│                             │          │              │
│                    ┌────────▼──┐  ┌────▼───────────┐  │
│                    │ GWS CLI   │  │ Slack MCP      │  │
│                    │ (child    │  │ Server         │  │
│                    │ process)  │  │ (mcp.slack.com)│  │
│                    └────┬──┬──┘  └────────┬───────┘  │
│                         │  │              │           │
│              Google Workspace APIs    Slack APIs      │
│                                                       │
│  PII redaction runs on ALL tool outputs before they   │
│  enter the LLM context window (Mastra processor)      │
│                                                       │
│  LLM: Vertex AI Gemini API (BAA-covered)              │
└──────────────────────────────────────────────────────┘

The agent has two generic tools for Google Workspace instead of 6+ specific ones:

run_gws(command, params?)

A single Mastra tool that:

  1. Receives a gws CLI command string from the agent (e.g. drive files list, gmail users messages list)
  2. Spawns a short-lived child process: gws <command> --params '<json>' --format json
  3. Captures stdout (structured JSON)
  4. Runs the output through redact() (PII anonymization)
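  5. Returns the cleaned JSON to the agent

import { execFile as execFileCb } from "node:child_process";
import { promisify } from "node:util";
import { createTool } from "@mastra/core/tools";
import { z } from "zod";

// execFile is callback-based; promisify it so it can be awaited.
const execFile = promisify(execFileCb);

// redact() is the PII redaction function from the identity + PII layer.
const runGws = createTool({
  id: "run_gws",
  description: "Run a Google Workspace CLI command. Returns JSON.",
  inputSchema: z.object({
    command: z.string().describe("gws CLI command (e.g. 'drive files list')"),
    params: z.record(z.string(), z.any()).optional().describe("API parameters as key-value pairs"),
  }),
  execute: async ({ command, params }) => {
    // Split on whitespace; execFile passes each token as its own argv entry (no shell).
    const args = command.trim().split(/\s+/);
    if (params) args.push("--params", JSON.stringify(params));
    args.push("--format", "json");

    const result = await execFile("gws", args, {
      timeout: 10_000, // 10-second limit so a hung CLI call cannot block the agent
      env: {
        ...process.env,
        GOOGLE_WORKSPACE_CLI_CREDENTIALS_FILE: "/secrets/sa-key.json",
        GOOGLE_WORKSPACE_CLI_IMPERSONATED_USER: "admin@eden.com",
      },
    });

    // Redact before anything reaches the LLM context.
    const raw = JSON.parse(result.stdout);
    return redact(raw);
  },
});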
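
The five steps above can also be exercised without a real gws binary if the process runner and the redactor are passed in as arguments. The following is a minimal sketch under that assumption; ExecFn, RedactFn, buildGwsArgs, and runGwsWith are hypothetical names, not part of Mastra or the GWS CLI, and the flags are the ones already shown in this document.

```typescript
// Hypothetical types: in production the runner would be a promisified execFile.
type ExecFn = (file: string, args: string[]) => Promise<{ stdout: string }>;
type RedactFn = (value: unknown) => unknown;

// Step 2: turn the agent's command string and params into an argv array.
function buildGwsArgs(command: string, params?: Record<string, unknown>): string[] {
  const args = command.trim().split(/\s+/);
  if (params) args.push("--params", JSON.stringify(params));
  args.push("--format", "json");
  return args;
}

// Steps 2-5: spawn, capture stdout, parse, redact, return.
async function runGwsWith(
  exec: ExecFn,
  redact: RedactFn,
  command: string,
  params?: Record<string, unknown>,
): Promise<unknown> {
  const { stdout } = await exec("gws", buildGwsArgs(command, params));
  let parsed: unknown;
  try {
    parsed = JSON.parse(stdout);
  } catch {
    // Surface a clear error instead of leaking raw CLI output to the agent.
    throw new Error("gws returned non-JSON output");
  }
  return redact(parsed);
}
```

In a test, exec can be a stub that returns canned JSON, which keeps the argument-building and redaction paths covered without network access or credentials.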

gws_schema(method)

Lets the agent discover the shape of any API method at runtime:

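// Assumes execFile is the promisified version (util.promisify(child_process.execFile)).
const gwsSchema = createTool({
  id: "gws_schema",
  description: "Get the request/response schema for a gws API method.",
  inputSchema: z.object({
    method: z.string().describe("API method (e.g. 'drive.files.list')"),
  }),
  execute: async ({ method }) => {
    // The method id is passed as a single argv entry, so no shell parsing is involved.
    const result = await execFile("gws", ["schema", method], { timeout: 10_000 });
    return result.stdout;
  },
});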

The agent can call gws_schema("gmail.users.messages.list") to learn what parameters are available, then construct the right run_gws call. This is self-discovery — no hardcoded tool schemas needed.
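
The examples in this document write the same API method two ways: dotted for gws_schema ('drive.files.list') and space-separated for run_gws ('drive files list'). Assuming that correspondence holds for every method (an assumption drawn only from these examples, not confirmed against the CLI), a pair of small helpers could convert between the two forms so the agent only has to track one. methodToCommand and commandToMethod are hypothetical names.

```typescript
// Hypothetical helpers; assume dotted method ids map 1:1 to space-separated commands.
function methodToCommand(method: string): string {
  const parts = method.trim().split(".");
  // Reject ids such as "" or "drive..list" that would yield empty tokens.
  if (parts.some((p) => p.length === 0)) {
    throw new Error(`Malformed method id: '${method}'`);
  }
  return parts.join(" ");
}

function commandToMethod(command: string): string {
  // Collapse any run of whitespace so "drive  files list" still maps cleanly.
  return command.trim().split(/\s+/).join(".");
}
```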

Slack: MCP client

The Slack MCP server (mcp.slack.com) is still live. Mastra connects to it as an MCP client. The agent gets search, history, thread reading, and user profile tools from Slack’s server directly.

If the Slack MCP server proves unreliable, we fall back to 3 custom Slack tool functions (same as Architecture A).


What stays the same

| Component | Notes |
| --- | --- |
| Cloud Run in Eden’s GCP | Same. BAA-covered. |
| Vertex AI Gemini API | Same. BAA-covered LLM endpoint. |
| Next.js 15 + shadcn/ui | Same chat UI and dashboard. |
| PII redaction layer | Still required. Identity mapping, redact(), test suite. |
| Identity mapping table | Same schema, same GCP Secret Manager storage. |
| Service account + DWD | Same. GWS CLI inherits this auth via env vars. |
| Slack app + OAuth token | Same. Slack MCP server needs the token. |
| Google OAuth for Danny | Same. COO authenticates to the web app. |
| Project registry | Same. Maps project names to channels/folders. |

What changes

| Component | Custom Tools (A) | CLI-Native (B) |
| --- | --- | --- |
| GWS data access | 6 hand-built TypeScript tool functions, each wrapping a specific gws command | 1 generic run_gws tool + 1 gws_schema tool; agent constructs commands dynamically |
| Slack data access | 3 hand-built TypeScript tool functions wrapping Slack API calls | Slack MCP server (hosted by Slack) provides tools directly; fallback to custom tools if needed |
| Tool count we build | 9 data access tools + PII + registry = 11 custom tools | 2 GWS tools + PII + registry = 4 custom tools (+ Slack MCP or 3 custom Slack tools) |
| Agent’s GWS knowledge | Hardcoded in each tool’s description and input schema | Agent uses gws_schema to self-discover; system prompt provides a command reference |
| PII enforcement | In each tool function (pre-return) | In run_gws tool (post-exec, pre-return) + Mastra processor on Slack MCP outputs |
| Child processes | One gws exec per tool call (same as now, just hidden inside each tool) | One gws exec per run_gws call (identical: the child process pattern is the same) |
| New GWS API coverage | New tool function per API = new ticket | Agent already has access via run_gws; it just needs the command in its system prompt |
| Cloud Run config | Single process | Single process (no sidecar; gws is spawned per-call, not persistent) |

PII redaction in Architecture B

PII redaction is simpler than the (now-removed) MCP approach because run_gws is a tool we control:

  1. GWS: run_gws calls redact() on the CLI output before returning it to the agent. Same enforcement point as Architecture A — tool-level, pre-return. The difference is one redact() call in one tool vs. the same call copy-pasted into 6 tools.

  2. Slack MCP: Mastra processor intercepts all MCP tool outputs and runs redact() before they enter the LLM context. Or, if we use custom Slack tools instead of MCP, same per-tool enforcement as Architecture A.

  3. The redact() function itself is identical in both architectures. It needs to handle arbitrary JSON shapes either way — the GWS CLI returns different schemas for Drive vs Gmail vs Calendar regardless of whether we wrap them in custom tools or pass them through run_gws.
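
To illustrate what "arbitrary JSON shapes" means in practice, here is a minimal sketch of a shape-agnostic redactor that walks any parsed JSON value and rewrites strings in place. It is not the project's redact(): redactJson, IdentityMap, and the token formats ("USER_001", "[EMAIL]") are hypothetical, and it covers only email addresses, leaving names and phone numbers to whatever the real identity mapping defines.

```typescript
// Hypothetical identity map: lower-cased real email -> stable pseudonymous token.
type IdentityMap = Map<string, string>;

const EMAIL_RE = /[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}/g;

function redactString(text: string, ids: IdentityMap): string {
  // Known identities get their token; unknown addresses get a generic placeholder.
  return text.replace(EMAIL_RE, (email) => ids.get(email.toLowerCase()) ?? "[EMAIL]");
}

function redactJson(value: unknown, ids: IdentityMap): unknown {
  if (typeof value === "string") return redactString(value, ids);
  if (Array.isArray(value)) return value.map((v) => redactJson(v, ids));
  if (value !== null && typeof value === "object") {
    const out: Record<string, unknown> = {};
    for (const [k, v] of Object.entries(value as Record<string, unknown>)) {
      out[k] = redactJson(v, ids); // keys are left as-is; only values are scanned
    }
    return out;
  }
  return value; // numbers, booleans, and null pass through unchanged
}
```

Because the walk never looks at field names, the same function applies to Drive, Gmail, and Calendar responses, which is why the enforcement point can sit in one place in either architecture.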


Effort comparison

Architecture A: Custom Tools (current tickets)

| Step | Tickets | Points |
| --- | --- | --- |
| 1. Source auth | 1, 2, 3 | 7 |
| 2. Identity + PII | 4a, 4b, 5a, 5b | 14 |
| 3. Slack tools | 6, 7, 8a, 8b | 14 |
| 4. Agent + UI | 9a, 9b | 8 |
| 5. GWS tools | 10-15, 16 | 23 |
| 6. Orchestration + deploy | 17, 18 | 9 |
| Total | 18 tickets | 75 pts |

Architecture B: CLI-Native (shell exec)

| Step | Work | Points |
| --- | --- | --- |
| 1. Source auth (same) | GCP project, service account, DWD request, Slack app | 7 |
| 2. Identity + PII (same) | Mapping schema, resolve_identity, redact(), test suite | 14 |
| 3. run_gws + gws_schema tools | Build the 2 generic tools, test against Drive/Gmail/Calendar/Admin, validate PII redaction on each response shape | 5 |
| 4. Slack data access | Either: connect Slack MCP (3 pts) or build 3 custom tools (14 pts) | 3–14 |
| 5. Agent + chat UI | Mastra agent with run_gws + Slack tools + project registry, system prompt with GWS command reference, Next.js chat interface, Cloud Run deploy | 10 |
| 6. Cross-platform orchestration | System prompt for multi-source reasoning, project registry, parallel query execution | 5 |
| 7. Validation + deploy | End-to-end testing across all GWS + Slack surfaces, anonymization audit, production deploy | 4 |
| Total (with Slack MCP) | ~12 tickets | ~48 pts |
| Total (with custom Slack) | ~15 tickets | ~59 pts |

Delta

| | Arch A (custom) | Arch B (Slack MCP) | Arch B (custom Slack) |
| --- | --- | --- | --- |
| Tickets | 18 | ~12 | ~15 |
| Points | 75 | ~48 | ~59 |
| Savings | | ~27 pts (36%) | ~16 pts (21%) |
| Custom GWS tool code | ~600-900 lines (6 tools) | ~80 lines (2 generic tools) | ~80 lines |
| Custom Slack tool code | ~300-450 lines (3 tools) | ~0 (MCP) | ~300-450 lines |

The savings come from replacing 6 specific GWS tool functions (Tickets 10-16) with 2 generic tools. The auth, PII, UI, orchestration, and deployment work is roughly the same. Agent prompt engineering effort increases slightly (the system prompt needs a GWS command reference instead of relying on typed tool schemas).
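
As a check on the arithmetic, the totals and savings percentages can be recomputed from the per-step points listed in the two effort tables. The sketch below only restates numbers already in this document; the variable names are illustrative.

```typescript
// Per-step points from the Architecture A effort table.
const archA = [7, 14, 14, 8, 23, 9];
// Architecture B steps excluding Slack data access (steps 1, 2, 3, 5, 6, 7).
const archBShared = [7, 14, 5, 10, 5, 4];

const sum = (xs: number[]): number => xs.reduce((a, b) => a + b, 0);

const totalA = sum(archA);                  // 75
const totalBMcp = sum(archBShared) + 3;     // 48 (Slack MCP: 3 pts)
const totalBCustom = sum(archBShared) + 14; // 59 (custom Slack tools: 14 pts)

// Savings relative to Architecture A, rounded to a whole percent.
const savingsPct = (b: number): number => Math.round(((totalA - b) / totalA) * 100);

console.log(totalA - totalBMcp, savingsPct(totalBMcp));       // 27 36
console.log(totalA - totalBCustom, savingsPct(totalBCustom)); // 16 21
```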


What we gain

  1. ~16-27 points less work. No hand-built tool functions for Drive, Gmail, Calendar, Admin SDK, etc.

  2. Instant coverage of the full GWS API surface. The agent can run any gws command — Drive Activity, Comments, Tasks, Keep, Sheets, Docs — without new tool functions. If the COO asks about Google Tasks tomorrow, the agent can already access it. With custom tools, each new API surface is a new ticket.

  3. The CLI evolves; we don’t maintain wrappers. When gws adds new features, the agent can use them immediately. Our custom tools would need manual updates per API change.

  4. Self-discovery via gws_schema. The agent can inspect any API method’s schema at runtime and construct the right command. No hardcoded input schemas to keep in sync.

  5. Simpler codebase. Two tool functions instead of nine. Less code to test, review, and maintain.

  6. No sidecar process. Unlike the (removed) MCP server mode, run_gws spawns a child process per call — same pattern as Architecture A already uses inside each custom tool. No persistent sidecar to manage.


What we lose / risk

  1. Less structured tool interface. Custom tools have typed input schemas (search_drive(query, folder_id?, owner_token?)) that guide the LLM. run_gws accepts a freeform command string — the agent must know the right gws syntax. Mitigation: system prompt includes a command reference with examples for each API.

  2. Prompt engineering replaces code. Instead of encoding knowledge in typed tool functions, we encode it in the agent’s system prompt. This is less testable and more brittle — a prompt change could break tool routing. Mitigation: comprehensive integration tests that validate the agent calls the right gws commands for each query type.

  3. PII redaction must handle arbitrary shapes. With custom tools, we know exactly which fields to redact in each response. With run_gws, the redaction function sees whatever the CLI returns. Mitigation: redact() already needs to handle nested JSON generically — the same email/name/phone regex patterns work regardless of response shape.

  4. Token budget risk. The agent may request more data than needed (e.g. drive files list without --params '{"fields":"files(id,name,modifiedTime)"}'). Custom tools request only the fields they need. Mitigation: system prompt instructs the agent to use fields parameters; run_gws could enforce a default fields mask.

  5. Command injection surface. The agent constructs the gws command. If the LLM hallucinates a malicious or destructive command, run_gws would execute it. Mitigation: whitelist the allowed gws services (drive, gmail, calendar, admin, driveactivity:v2, plus any others listed under "Security: command whitelist") and reject anything else. Never pass raw shell strings; use execFile (not exec) so that no shell interprets the input.

  6. Harder to unit test. Custom tools are pure functions: input → output. run_gws requires mocking the CLI subprocess. Mitigation: mock execFile in tests; test the redact() layer independently with fixtures.
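
The "default fields mask" mitigation from the token budget item could look roughly like the sketch below: if the agent omits fields for a command that has a known default, run_gws fills it in before spawning the CLI. withDefaultFields and DEFAULT_FIELDS are hypothetical names, and the only mask shown is the Drive example quoted above; masks for other commands would have to be chosen per API.

```typescript
type Params = Record<string, unknown>;

// Hypothetical defaults keyed by command; only the Drive mask comes from this document.
const DEFAULT_FIELDS: Record<string, string> = {
  "drive files list": "files(id,name,modifiedTime)",
};

function withDefaultFields(command: string, params: Params = {}): Params {
  // Normalize whitespace so lookups do not depend on how the agent spaced the command.
  const key = command.trim().split(/\s+/).join(" ");
  const mask = DEFAULT_FIELDS[key];
  // Leave params untouched when no default exists or the agent already chose fields.
  if (mask === undefined || "fields" in params) return params;
  return { ...params, fields: mask };
}
```

An explicit fields value from the agent always wins here, so the default only caps the worst case; a stricter variant could reject requests that ask for more than the default.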


Security: command whitelist

run_gws must NOT be a general shell-exec tool. It should enforce:

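// Services the agent may call; anything else is rejected before a process is spawned.
const ALLOWED_SERVICES = [
  "drive", "gmail", "calendar", "admin",
  "driveactivity:v2", "sheets", "docs",
];

// The first whitespace-separated token of the command is the service name.
const command = input.command.trim().split(/\s+/);
const service = command[0];
if (!ALLOWED_SERVICES.includes(service)) {
  throw new Error(`Service '${service}' not allowed`);
}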

Additionally:

  • Use execFile (not exec) — prevents shell metacharacter injection
  • Read-only operations only — no delete, update, send, insert subcommands unless explicitly allowed
  • Timeout on child process (10 seconds) to prevent hangs
  • Log every command for audit trail
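
The service and read-only checks above could be combined into one validation step that runs before any process is spawned. This is a minimal sketch, not a complete policy: validateGwsCommand is a hypothetical name, the blocked verbs are just the four named in the list above, and rejecting agent-supplied flags is an extra assumption (on the basis that run_gws appends --params and --format itself).

```typescript
const ALLOWED_SERVICES = new Set([
  "drive", "gmail", "calendar", "admin", "driveactivity:v2", "sheets", "docs",
]);

// Write-style verbs named in the requirements above.
const BLOCKED_VERBS = new Set(["delete", "update", "send", "insert"]);

// Returns the argv tokens for execFile, or throws if the command is not permitted.
function validateGwsCommand(command: string): string[] {
  const tokens = command.trim().split(/\s+/).filter((t) => t.length > 0);
  const service = tokens[0] ?? "";
  if (service === "") throw new Error("Empty command");
  if (!ALLOWED_SERVICES.has(service)) {
    throw new Error(`Service '${service}' not allowed`);
  }
  for (const token of tokens.slice(1)) {
    // Assumption: flags are added by the tool, never by the agent.
    if (token.startsWith("-")) throw new Error(`Flag '${token}' not allowed`);
    if (BLOCKED_VERBS.has(token.toLowerCase())) {
      throw new Error(`Operation '${token}' is not read-only`);
    }
  }
  return tokens;
}
```

A denylist like this only catches the verbs it names. An allowlist of read verbs would be stricter, at the cost of needing an update whenever a new read operation is wanted.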

Deployment

Identical to Architecture A — single Cloud Run service, no sidecar:

FROM node:22-slim AS base
RUN npm install -g @googleworkspace/cli
 
# Service account key mounted from GCP Secret Manager at runtime
# GWS CLI reads GOOGLE_WORKSPACE_CLI_CREDENTIALS_FILE env var
 
# Single process: next start (port 8080)
# gws is spawned per-call by run_gws tool, not a persistent process

Cloud Run config (same as Architecture A):

  • Memory: 512MB–1GB
  • CPU: 1 vCPU
  • Min instances: 1 (avoid cold start)
  • Secrets: Service account key, Slack OAuth token, identity mapping — all from GCP Secret Manager

Decision matrix

| Criterion | A: Custom Tools | B: CLI-Native (shell exec) |
| --- | --- | --- |
| Total effort | 75 pts | ~48-59 pts |
| GWS API coverage | 6 specific tools | Full GWS surface (any gws command) |
| Slack control | Full (custom tools) | Depends on MCP / can fall back to custom |
| PII enforcement | Per-tool (tight) | Per-tool in run_gws (same enforcement point) |
| Tool interface quality | Typed schemas, clear inputs | Freeform command string + system prompt |
| Maintenance burden | High (6+ GWS tools to keep in sync) | Low (2 generic tools + prompt updates) |
| Token efficiency | High (curated fields) | Medium (agent must learn to request minimal fields) |
| Testing | Unit tests per tool | Integration tests + redaction unit tests |
| Operational complexity | Simple (one process) | Simple (same; no sidecar) |
| Future extensibility | New ticket per API | Update system prompt |
| Security surface | Minimal (hardcoded commands) | Command whitelist required |
| Time to M3 | ~4 weeks | ~3–3.5 weeks |

Recommendation

Architecture B is worth considering, but the savings are more modest than originally estimated: ~16-27 pts rather than the earlier, incorrect figure of ~33 pts. The tradeoff is clear:

  • Pick A if you want typed tool interfaces, straightforward unit testing, and minimal prompt engineering risk. The extra ~16-27 pts is mostly mechanical work (build and test each GWS tool function). It’s tedious but safe.

  • Pick B if you want fewer tickets, instant full-API coverage, and less code to maintain long-term. The tradeoff is more reliance on prompt engineering and integration testing. The security whitelist and PII redaction on arbitrary shapes add some complexity, but both are solvable.

  • Hybrid (B for GWS, A for Slack) is the pragmatic middle ground. The 6 GWS custom tools are the most mechanical to build — they’re all the same pattern (construct gws command → exec → parse → redact). Replacing those with run_gws saves the most effort with the least risk. Keeping custom Slack tools preserves rate-limit control and caching.

Next step: If we go with B or hybrid, update the ticket file to reflect the new architecture and re-estimate.