Alternative Architecture: CLI-Native Agent (Shell Exec)
Date: 2026-03-25
Author: Sam (Brainforge)
Status: Draft — comparison against custom-tools approach
Related: spike-command-center-data-access.md, linear-tickets-data-access-chat-integration.md
The question
The current plan (custom-tools approach) builds 9 individual Mastra tool functions that each wrap specific GWS CLI commands and Slack API calls. That works, but the GWS CLI was already designed for agent use — structured JSON output, auto-pagination, schema introspection, 40+ built-in skills. Instead of writing search_drive(), get_drive_activity(), search_gmail(), etc. by hand, we could give the agent a generic shell-exec tool that runs gws commands directly and skip the wrapper layer entirely.
This document maps out what that architecture looks like, what changes, and what the effort comparison is.
MCP server mode — removed
GWS CLI had an MCP server mode (gws mcp -s drive,gmail,calendar) through v0.7.0. It was removed in v0.8.0 via PR #275 (merged March 6, 2026). The removal was a breaking change with no replacement — the maintainers cited context window bloat, tool-name parsing bugs, state management issues, and security concerns. The current version (0.22.1) does not have gws mcp.
Third-party projects that depended on gws mcp have migrated to a subprocess-per-call bridge pattern — spawning a short-lived gws CLI process for each tool call instead. That’s the pattern Architecture B uses.
The Slack MCP server (mcp.slack.com) is still live and maintained by Slack.
Architecture A: Custom Tools (current plan)
┌─────────────────────────────────────────────────────┐
│ Next.js + Mastra (Cloud Run, Eden GCP) │
│ │
│ ┌──────────────┐ ┌───────────────────────────┐ │
│ │ Chat UI │───▶│ Mastra Agent │ │
│ │ (React) │ │ (TypeScript) │ │
│ └──────────────┘ │ │ │
│ │ Tools (hand-built): │ │
│ │ - search_drive() │ │
│ │ - get_drive_activity() │ │
│ │ - get_file_comments() │ │
│ │ - search_gmail() │ │
│ │ - search_calendar() │ │
│ │ - get_user_directory() │ │
│ │ - search_slack() │ │
│ │ - read_slack_thread() │ │
│ │ - get_slack_channel_stats()│ │
│ └───────────┬────────────────┘ │
│ │ │
│ ┌───────────▼────────────────┐ │
│ │ PII Redaction Middleware │ │
│ └───────────┬────────────────┘ │
│ │ │
│ ┌───────────────────┼──────────────┐ │
│ ▼ ▼ ▼ │
│ GWS CLI Slack API Vertex AI │
│ (shell exec) (HTTP) Gemini API │
└─────────────────────────────────────────────────────┘
Each tool function is ~50–150 lines of TypeScript that:
- Constructs the right gws CLI command or Slack API call
- Parses the JSON response
- Runs it through PII redaction
- Returns structured data to the agent
Observation: Both the custom tool functions and the raw gws CLI commands do the same thing — call a Google API and return JSON. The custom tools add a translation layer that’s mostly mechanical: build command string → exec → parse JSON → redact → return. The GWS CLI already handles the API call, auth, pagination, and JSON formatting.
Architecture B: CLI-Native Agent (shell exec)
┌──────────────────────────────────────────────────────┐
│ Next.js (Cloud Run, Eden GCP) │
│ │
│ ┌──────────────┐ ┌────────────────────────────┐ │
│ │ Chat UI │───▶│ Mastra Agent │ │
│ │ (React) │ │ (TypeScript) │ │
│ └──────────────┘ │ │ │
│ │ Tools: │ │
│ │ - run_gws(command) │ │
│ │ - gws_schema(method) │ │
│ │ - slack_mcp (MCP client) │ │
│ │ - pii_redact() │ │
│ │ - project_registry() │ │
│ └──────┬──────────┬───────────┘ │
│ │ │ │
│ ┌────────▼──┐ ┌────▼───────────┐ │
│ │ GWS CLI │ │ Slack MCP │ │
│ │ (child │ │ Server │ │
│ │ process) │ │ (mcp.slack.com)│ │
│ └────┬──┬──┘ └────────┬───────┘ │
│ │ │ │ │
│ Google Workspace APIs Slack APIs │
│ │
│ PII redaction runs on ALL tool outputs before they │
│ enter the LLM context window (Mastra processor) │
│ │
│ LLM: Vertex AI Gemini API (BAA-covered) │
└──────────────────────────────────────────────────────┘
The agent has two generic tools for Google Workspace instead of 6+ specific ones:
run_gws(command, params?)
A single Mastra tool that:
- Receives a gws CLI command string from the agent (e.g. drive files list, gmail users messages list)
- Spawns a short-lived child process: gws <command> --params '<json>' --format json
- Captures stdout (structured JSON)
- Runs the output through redact() (PII anonymization)
- Returns the cleaned JSON to the agent
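import { execFile as execFileCb } from "node:child_process";
import { promisify } from "node:util";

// Promisified so that awaiting execFile(...) resolves to { stdout, stderr }
const execFile = promisify(execFileCb);

const runGws = createTool({
  id: "run_gws",
  description: "Run a Google Workspace CLI command. Returns JSON.",
  inputSchema: z.object({
    command: z.string().describe("gws CLI command (e.g. 'drive files list')"),
    params: z.record(z.string(), z.any()).optional().describe("API parameters as key-value pairs"),
  }),
  execute: async ({ command, params }) => {
    // Build an argv array; execFile hands it to gws without going through a shell
    const args = ["gws", ...command.split(" ")];
    if (params) args.push("--params", JSON.stringify(params));
    args.push("--format", "json");
    const result = await execFile(args[0], args.slice(1), {
      env: {
        ...process.env,
        GOOGLE_WORKSPACE_CLI_CREDENTIALS_FILE: "/secrets/sa-key.json",
        GOOGLE_WORKSPACE_CLI_IMPERSONATED_USER: "admin@eden.com",
      },
    });
    // Parse the CLI's JSON output, then redact before anything reaches the agent
    const raw = JSON.parse(result.stdout);
    return redact(raw);
  },
});
gws_schema(method)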
Lets the agent discover the shape of any API method at runtime:
const gwsSchema = createTool({
  id: "gws_schema",
  description: "Get the request/response schema for a gws API method.",
  inputSchema: z.object({
    method: z.string().describe("API method (e.g. 'drive.files.list')"),
  }),
  execute: async ({ method }) => {
    const result = await execFile("gws", ["schema", method]);
    return result.stdout;
  },
});
The agent can call gws_schema("gmail.users.messages.list") to learn what parameters are available, then construct the right run_gws call. This is self-discovery: no hardcoded tool schemas needed.
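Both tools above begin by turning agent input into an argument list for the child process. As a minimal sketch of that step (buildGwsArgv is a hypothetical helper name, not part of the gws CLI or Mastra), the argv construction could be pulled out into a pure function so it can be tested without spawning anything. It splits on any run of whitespace, which avoids the empty arguments that a plain split(" ") produces when the agent emits a double space.

```typescript
// Sketch only: builds the argv array that run_gws would pass to execFile.
// buildGwsArgv is a hypothetical helper, not an API of gws or Mastra.
function buildGwsArgv(
  command: string,
  params?: Record<string, unknown>,
): string[] {
  // Split on any whitespace and drop empty tokens.
  const argv = command
    .trim()
    .split(/\s+/)
    .filter((token) => token.length > 0);
  if (argv.length === 0) {
    throw new Error("Empty gws command");
  }
  if (params !== undefined) {
    // The JSON travels as a single argv element; no shell quoting is needed
    // because execFile does not involve a shell.
    argv.push("--params", JSON.stringify(params));
  }
  argv.push("--format", "json");
  return argv;
}

console.log(buildGwsArgv("drive files list", { pageSize: 5 }));
```

With a helper like this, run_gws would call execFile("gws", buildGwsArgv(command, params), ...) and the unit tests for argument handling would need no subprocess mock.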
Slack: MCP client
The Slack MCP server (mcp.slack.com) is still live. Mastra connects to it as an MCP client. The agent gets search, history, thread reading, and user profile tools from Slack’s server directly.
If the Slack MCP server proves unreliable, we fall back to 3 custom Slack tool functions (same as Architecture A).
What stays the same
| Component | Notes |
|---|---|
| Cloud Run in Eden’s GCP | Same. BAA-covered. |
| Vertex AI Gemini API | Same. BAA-covered LLM endpoint. |
| Next.js 15 + shadcn/ui | Same chat UI and dashboard. |
| PII redaction layer | Still required. Identity mapping, redact(), test suite. |
| Identity mapping table | Same schema, same GCP Secret Manager storage. |
| Service account + DWD | Same. GWS CLI inherits this auth via env vars. |
| Slack app + OAuth token | Same. Slack MCP server needs the token. |
| Google OAuth for Danny | Same. COO authenticates to the web app. |
| Project registry | Same. Maps project names to channels/folders. |
What changes
| Component | Custom Tools (A) | CLI-Native (B) |
|---|---|---|
| GWS data access | 6 hand-built TypeScript tool functions, each wrapping a specific gws command | 1 generic run_gws tool + 1 gws_schema tool; agent constructs commands dynamically |
| Slack data access | 3 hand-built TypeScript tool functions wrapping Slack API calls | Slack MCP server (hosted by Slack) provides tools directly; fallback to custom tools if needed |
| Tool count we build | 9 data access tools + PII + registry = 11 custom tools | 2 GWS tools + PII + registry = 4 custom tools (+ Slack MCP or 3 custom Slack tools) |
| Agent’s GWS knowledge | Hardcoded in each tool’s description and input schema | Agent uses gws_schema to self-discover; system prompt provides a command reference |
| PII enforcement | In each tool function (pre-return) | In run_gws tool (post-exec, pre-return) + Mastra processor on Slack MCP outputs |
| Child processes | One gws exec per tool call (same as now, just hidden inside each tool) | One gws exec per run_gws call (identical — the child process pattern is the same) |
| New GWS API coverage | New tool function per API = new ticket | Agent already has access via run_gws — just needs the command in its system prompt |
| Cloud Run config | Single process | Single process (no sidecar — gws is spawned per-call, not persistent) |
PII redaction in Architecture B
PII redaction is simpler than the (now-removed) MCP approach because run_gws is a tool we control:
- GWS: run_gws calls redact() on the CLI output before returning it to the agent. Same enforcement point as Architecture A: tool-level, pre-return. The difference is one redact() call in one tool vs. the same call copy-pasted into 6 tools.
- Slack MCP: A Mastra processor intercepts all MCP tool outputs and runs redact() before they enter the LLM context. Or, if we use custom Slack tools instead of MCP, same per-tool enforcement as Architecture A.
- The redact() function itself is identical in both architectures. It needs to handle arbitrary JSON shapes either way: the GWS CLI returns different schemas for Drive vs Gmail vs Calendar regardless of whether we wrap them in custom tools or pass them through run_gws.
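To illustrate what "handle arbitrary JSON shapes" could mean in practice, here is a rough sketch of a shape-agnostic walker. It is not the project's redact() implementation: the name redactSketch, the [PERSON_n] token format, and the email-only pattern are all assumptions made for the example, and a real version would resolve identities through the identity mapping table and cover names and phone numbers as well.

```typescript
type Json = null | boolean | number | string | Json[] | { [key: string]: Json };

// Assumption: only email addresses are matched in this sketch.
const EMAIL_PATTERN = /[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}/g;

// Recursively walks any JSON value and replaces emails inside string values
// with stable placeholder tokens. Object keys are left untouched.
function redactSketch(
  value: Json,
  tokens: Map<string, string> = new Map(),
): Json {
  if (typeof value === "string") {
    return value.replace(EMAIL_PATTERN, (email) => {
      const key = email.toLowerCase();
      let token = tokens.get(key);
      if (token === undefined) {
        // Hypothetical token format; the same address always maps to the same token.
        token = `[PERSON_${tokens.size + 1}]`;
        tokens.set(key, token);
      }
      return token;
    });
  }
  if (Array.isArray(value)) {
    return value.map((item) => redactSketch(item, tokens));
  }
  if (value !== null && typeof value === "object") {
    const out: { [key: string]: Json } = {};
    for (const [k, v] of Object.entries(value)) {
      out[k] = redactSketch(v, tokens);
    }
    return out;
  }
  return value; // numbers, booleans, and null pass through unchanged
}

console.log(
  JSON.stringify(
    redactSketch({ files: [{ owners: [{ emailAddress: "ana@eden.com" }] }] }),
  ),
);
```

Because the walker never looks at field names, the same function would apply to Drive, Gmail, and Calendar responses alike, which is the property both architectures rely on.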
Effort comparison
Architecture A: Custom Tools (current tickets)
| Step | Tickets | Points |
|---|---|---|
| 1. Source auth | 1, 2, 3 | 7 |
| 2. Identity + PII | 4a, 4b, 5a, 5b | 14 |
| 3. Slack tools | 6, 7, 8a, 8b | 14 |
| 4. Agent + UI | 9a, 9b | 8 |
| 5. GWS tools | 10-15, 16 | 23 |
| 6. Orchestration + deploy | 17, 18 | 9 |
| Total | 18 tickets | 75 pts |
Architecture B: CLI-Native (shell exec)
| Step | Work | Points |
|---|---|---|
| 1. Source auth (same) | GCP project, service account, DWD request, Slack app | 7 |
| 2. Identity + PII (same) | Mapping schema, resolve_identity, redact(), test suite | 14 |
| 3. run_gws + gws_schema tools | Build the 2 generic tools, test against Drive/Gmail/Calendar/Admin, validate PII redaction on each response shape | 5 |
| 4. Slack data access | Either: connect Slack MCP (3 pts) or build 3 custom tools (14 pts) | 3–14 |
| 5. Agent + chat UI | Mastra agent with run_gws + Slack tools + project registry, system prompt with GWS command reference, Next.js chat interface, Cloud Run deploy | 10 |
| 6. Cross-platform orchestration | System prompt for multi-source reasoning, project registry, parallel query execution | 5 |
| 7. Validation + deploy | End-to-end testing across all GWS + Slack surfaces, anonymization audit, production deploy | 4 |
| Total (with Slack MCP) | ~12 tickets | ~48 pts |
| Total (with custom Slack) | ~15 tickets | ~59 pts |
Delta
| | Arch A (custom) | Arch B (Slack MCP) | Arch B (custom Slack) |
|---|---|---|---|
| Tickets | 18 | ~12 | ~15 |
| Points | 75 | ~48 | ~59 |
| Savings | — | ~27 pts (36%) | ~16 pts (21%) |
| Custom GWS tool code | ~600-900 lines (6 tools) | ~80 lines (2 generic tools) | ~80 lines |
| Custom Slack tool code | ~300-450 lines (3 tools) | ~0 (MCP) | ~300-450 lines |
The savings come from replacing 6 specific GWS tool functions (Tickets 10-16) with 2 generic tools. The auth, PII, UI, orchestration, and deployment work is roughly the same. Agent prompt engineering effort increases slightly (the system prompt needs a GWS command reference instead of relying on typed tool schemas).
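The totals and savings in the tables above can be re-derived from the per-step points. The short check below uses only the numbers already listed in this document; the variable and function names are illustrative.

```typescript
// Per-step points copied from the two effort tables above.
const archASteps = [7, 14, 14, 8, 23, 9];
const archBWithoutSlack = [7, 14, 5, 10, 5, 4];
const slackViaMcp = 3;
const slackCustomTools = 14;

const sum = (xs: number[]): number => xs.reduce((a, b) => a + b, 0);

function savings(baseline: number, alternative: number) {
  const points = baseline - alternative;
  return { points, percent: Math.round((points / baseline) * 100) };
}

const totalA = sum(archASteps); // 75
const totalBMcp = sum(archBWithoutSlack) + slackViaMcp; // 48
const totalBCustom = sum(archBWithoutSlack) + slackCustomTools; // 59

console.log(totalA, totalBMcp, totalBCustom); // 75 48 59
console.log(savings(totalA, totalBMcp)); // { points: 27, percent: 36 }
console.log(savings(totalA, totalBCustom)); // { points: 16, percent: 21 }
```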
What we gain
- ~16-27 points less work. No hand-built tool functions for Drive, Gmail, Calendar, Admin SDK, etc.
- Instant coverage of the full GWS API surface. The agent can run any gws command (Drive Activity, Comments, Tasks, Keep, Sheets, Docs) without new tool functions, subject to the run_gws service allowlist described under Security. If the COO asks about Google Tasks tomorrow, enabling it is an allowlist entry and a prompt update. With custom tools, each new API surface is a new ticket.
- The CLI evolves; we don’t maintain wrappers. When gws adds new features, the agent can use them immediately. Our custom tools would need manual updates per API change.
- Self-discovery via gws_schema. The agent can inspect any API method’s schema at runtime and construct the right command. No hardcoded input schemas to keep in sync.
- Simpler codebase. Two GWS tool functions instead of six (plus no custom Slack tools if Slack MCP works out). Less code to test, review, and maintain.
- No sidecar process. Unlike the (removed) MCP server mode, run_gws spawns a child process per call, the same pattern Architecture A already uses inside each custom tool. No persistent sidecar to manage.
What we lose / risk
- Less structured tool interface. Custom tools have typed input schemas (search_drive(query, folder_id?, owner_token?)) that guide the LLM. run_gws accepts a freeform command string, so the agent must know the right gws syntax. Mitigation: system prompt includes a command reference with examples for each API.
- Prompt engineering replaces code. Instead of encoding knowledge in typed tool functions, we encode it in the agent’s system prompt. This is less testable and more brittle: a prompt change could break tool routing. Mitigation: comprehensive integration tests that validate the agent calls the right gws commands for each query type.
- PII redaction must handle arbitrary shapes. With custom tools, we know exactly which fields to redact in each response. With run_gws, the redaction function sees whatever the CLI returns. Mitigation: redact() already needs to handle nested JSON generically, and the same email/name/phone regex patterns work regardless of response shape.
- Token budget risk. The agent may request more data than needed (e.g. drive files list without --params '{"fields":"files(id,name,modifiedTime)"}'). Custom tools request only the fields they need. Mitigation: system prompt instructs the agent to use fields parameters; run_gws could enforce a default fields mask.
- Command injection surface. The agent constructs shell commands. If the LLM hallucinates a malicious command, run_gws would execute it. Mitigation: whitelist allowed gws services (e.g. drive, gmail, calendar, admin, driveactivity:v2; the full list is under Security below). Reject anything else. Never pass raw shell strings; use execFile (not exec) to prevent injection.
- Harder to unit test. Custom tools are pure functions: input → output. run_gws requires mocking the CLI subprocess. Mitigation: mock execFile in tests; test the redact() layer independently with fixtures.
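The "default fields mask" mitigation for the token budget risk could look roughly like the sketch below. The helper name withDefaultFields and the lookup table are hypothetical; the only mask shown is the Drive example already quoted above, and masks for other methods would have to be chosen per API.

```typescript
// Hypothetical per-command defaults, keyed by the normalized command string.
// Only the Drive mask comes from this document; others would be added as needed.
const DEFAULT_FIELDS: Record<string, string> = {
  "drive files list": "files(id,name,modifiedTime)",
};

// Returns params with a default `fields` mask applied when the agent did not
// supply one. An explicit `fields` value from the agent always wins.
function withDefaultFields(
  command: string,
  params: Record<string, unknown> = {},
): Record<string, unknown> {
  const key = command.trim().split(/\s+/).join(" ");
  const fallback = DEFAULT_FIELDS[key];
  if (fallback === undefined || "fields" in params) {
    return params;
  }
  return { ...params, fields: fallback };
}

console.log(withDefaultFields("drive files list", { pageSize: 10 }));
```

Applying this inside run_gws, before the params are serialized, would keep the enforcement in code rather than relying only on the system prompt.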
Security: command whitelist
run_gws must NOT be a general shell-exec tool. It should enforce:
const ALLOWED_SERVICES = [
  "drive", "gmail", "calendar", "admin",
  "driveactivity:v2", "sheets", "docs",
];
const command = input.command.split(" ");
const service = command[0];
if (!ALLOWED_SERVICES.includes(service)) {
  throw new Error(`Service '${service}' not allowed`);
}
Additionally:
- Use execFile (not exec), which prevents shell metacharacter injection
- Read-only operations only: no delete, update, send, insert subcommands unless explicitly allowed
- Timeout on child process (10 seconds) to prevent hangs
- Log every command for audit trail
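The service check and the read-only rule above could be combined into one validation step that runs before any process is spawned. The following is a sketch under assumptions: validateGwsCommand is a hypothetical name, and the blocked-verb set simply mirrors the four subcommands listed above rather than an exhaustive list of mutating gws operations.

```typescript
const ALLOWED_GWS_SERVICES = new Set([
  "drive", "gmail", "calendar", "admin",
  "driveactivity:v2", "sheets", "docs",
]);

// Mirrors the subcommands named in the list above; not exhaustive.
const BLOCKED_VERBS = new Set(["delete", "update", "send", "insert"]);

// Returns the tokenized command if it passes both checks, otherwise throws.
function validateGwsCommand(command: string): string[] {
  const tokens = command
    .trim()
    .split(/\s+/)
    .filter((token) => token.length > 0);
  if (tokens.length === 0) {
    throw new Error("Empty gws command");
  }
  const [service, ...rest] = tokens;
  if (!ALLOWED_GWS_SERVICES.has(service)) {
    throw new Error(`Service '${service}' not allowed`);
  }
  const blocked = rest.find((token) => BLOCKED_VERBS.has(token.toLowerCase()));
  if (blocked !== undefined) {
    throw new Error(`Subcommand '${blocked}' is not read-only`);
  }
  return tokens;
}

console.log(validateGwsCommand("drive files list"));
```

A denylist of verbs is the weaker of the two possible designs, since any mutating subcommand not on the list would pass. A stricter variant would allow only known read verbs and reject everything else.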
Deployment
Identical to Architecture A — single Cloud Run service, no sidecar:
FROM node:22-slim AS base
RUN npm install -g @googleworkspace/cli
# Service account key mounted from GCP Secret Manager at runtime
# GWS CLI reads GOOGLE_WORKSPACE_CLI_CREDENTIALS_FILE env var
# Single process: next start (port 8080)
# gws is spawned per-call by run_gws tool, not a persistent process
Cloud Run config (same as Architecture A):
- Memory: 512MB–1GB
- CPU: 1 vCPU
- Min instances: 1 (avoid cold start)
- Secrets: Service account key, Slack OAuth token, identity mapping — all from GCP Secret Manager
Decision matrix
| Criterion | A: Custom Tools | B: CLI-Native (shell exec) |
|---|---|---|
| Total effort | 75 pts | ~48-59 pts |
| GWS API coverage | 6 specific tools | Full GWS surface (any gws command) |
| Slack control | Full (custom tools) | Depends on MCP / can fall back to custom |
| PII enforcement | Per-tool (tight) | Per-tool in run_gws (same enforcement point) |
| Tool interface quality | Typed schemas, clear inputs | Freeform command string + system prompt |
| Maintenance burden | High (6+ GWS tools to keep in sync) | Low (2 generic tools + prompt updates) |
| Token efficiency | High (curated fields) | Medium (agent must learn to request minimal fields) |
| Testing | Unit tests per tool | Integration tests + redaction unit tests |
| Operational complexity | Simple (one process) | Simple (same — no sidecar) |
| Future extensibility | New ticket per API | Update system prompt |
| Security surface | Minimal (hardcoded commands) | Command whitelist required |
| Time to M3 | ~4 weeks | ~3–3.5 weeks |
Recommendation
Architecture B is worth considering, but the savings are more modest than the earlier, incorrect estimate (~16-27 pts rather than ~33 pts). The tradeoff is clear:
- Pick A if you want typed tool interfaces, straightforward unit testing, and minimal prompt engineering risk. The extra ~16-27 pts is mostly mechanical work (build and test each GWS tool function). It’s tedious but safe.
- Pick B if you want fewer tickets, instant full-API coverage, and less code to maintain long-term. The tradeoff is more reliance on prompt engineering and integration testing. The security whitelist and PII redaction on arbitrary shapes add some complexity, but both are solvable.
- Hybrid (B for GWS, A for Slack) is the pragmatic middle ground. The 6 GWS custom tools are the most mechanical to build: they’re all the same pattern (construct gws command → exec → parse → redact). Replacing those with run_gws saves the most effort with the least risk. Keeping custom Slack tools preserves rate-limit control and caching.
Next step: If we go with B or hybrid, update the ticket file to reflect the new architecture and re-estimate.