Brainforge Work — Rolling PRD

Owner: Uttam Kumaran
Created: 2026-04-29
Last updated: 2026-04-29
Status: active
Sources: Miranda Wen’s Product Plan (brainforge-platform PR #660), OpenWork Server PRD, OpenWork Orchestrator PRD, 10x Quality Pass, Usability Refactor Plan, Skill Governance Plan, 2026-04-29 ideation session, 2026-04-29 doc review


1. Product Vision

Brainforge Work provides a unified surface where AI skills become usable products — easy to run, easy to share, easy to govern, and reliable enough for everyday work across teams. It abstracts away underlying complexity so people can execute repeatable workflows faster, more consistently, and at scale.

AI capability is no longer the scarce thing; operational use is. Teams can access strong models and agent workflows, but usage concentrates among experts. Brainforge Work packages expert capability into guided, repeatable, trustworthy workflows that more people can run confidently.

Core Principles

| Principle | Definition |
|---|---|
| Self-explanatory by default | No formal training required; product guidance is built into the UI |
| Repeatable workflow execution | Run the same workflow again without rebuilding |
| Low-support adoption | Pilot users succeed without live walkthroughs |
| Governed delegation | Approvals, sharing controls, audit visibility |
| Workflow reuse across teams | Expert-built workflows used by multiple people |
| Clear operator experience | No specialist toolchains required |

2. Architecture

| Layer | Technology | Notes |
|---|---|---|
| Desktop shell | Tauri 2.x (Rust) | macOS/Windows/Linux, protocol handlers, sidecar management |
| Frontend | SolidJS + TailwindCSS | CUPID domain architecture, DLS tokens mandatory |
| Orchestrator | Node/Bun CLI | Lifecycle supervisor for OpenCode, host/client mode |
| Server | Node/Bun HTTP API | Filesystem-backed API for remote client config |
| OpenCode integration | CLI spawn or embedded binary | Per-workspace isolation, aks key, proxy header routing |
| Auth | demo (dev) / real (production) / agent (CI) | Azure-first providers (Kimi K2.5 + GPT-4o via brainforge-openai) |
| Skills | 200+ skills in .agents/skills/ | File-based, hierarchical precedence (see §4.4), hot-reload |

Runtime Modes

| Mode | Description |
|---|---|
| Desktop-hosted | App runs locally, hosts server + OpenCode engine |
| CLI-hosted | Server surfaces provided by orchestrator on a trusted machine |
| Hosted Cloud | Brainforge-hosted infrastructure provisions workers |

3. Phases & Tracker

Legend: ☐ Not started | ◐ In progress | ★ Deferred | ✓ Complete | ✗ Cancelled

Phase 0: Foundation (Target: 2026-04)

| ID | Deliverable | Status | Notes |
|---|---|---|---|
| P0.1 | Rename to “Brainforge Work” (Tauri config, shell defaults) | | PRD created, renames applied |
| P0.2 | Landing surface explains what Work is for, who it’s for | | |
| P0.3 | Basic navigation and startup flow reliable for pilot users | | |
| P0.4 | Guided First-Run Experience (auto-detect env, zero-config) | | Merged from ideation I2. Prerequisite: P0.5 must ship first — credentials must be diagnosable before users see the app |
| P0.5 | Model fallback diagnostics (Azure unreachable → clear error) | | Merged from ideation I7. Blocks P0.4 — credential health check is prerequisite for any user-facing milestone |
| P0.6 | Testing Foundation — Playwright E2E framework + CI matrix | | See §4.5. Prerequisite: audit CI path filters (`packages/**, apps/**`) — current filters don’t match monorepo source layout |
| P0.7 | Server unit test coverage audit + gap fill | | Existing: Bun test, 9 files. Gaps: MCP execution, file ops edge cases. Prerequisite: add bun test job to ci.yml — server tests currently not run in CI |
| P0.8 | Knowledge Repo Viewer — Quartz (MIT) static site for knowledge/, docs/, apps/app/pr/ hosted at knowledge.brainforge.ai via Railway | | See I16. Supporting infra — does not gate user-facing milestones. GitHub Action rebuild on push to main |

Phase 1: Workflow Productization (Target: 2026-04 to 2026-05)

| ID | Deliverable | Status | Notes |
|---|---|---|---|
| P1.1 | Entry surface reachable without developer-only setup | | |
| P1.2 | Small fixed set of high-confidence workflows run end-to-end | | |
| P1.3 | First-run experience does not assume IDE-native behavior | | |
| P1.4 | Repeat runs of same workflow without rebuilding | | |
| P1.5 | 10x quality gates met (all Q1-Q10) | | See §4.3 |
| P1.6 | Skill Conflict Detection — surface silent trigger overwrites | | Merged from ideation I1 |
| P1.7 | Run Dossiers Completion — search, filter, compare, share | | Merged from ideation I3 |
| P1.8 | Token Estimate Display — show estimated token count/cost before skill execution | | Stripped from I4. Full dry-run (sandbox, diff preview) moved to P2.9 |
| P1.9 | Sequential Skill Chaining — “run skill A, then skill B” with output→input passthrough | | Merged from ideation I5. DAG editor moved to P2.7, gated on P3.2 governance |
| P1.10 | Core Functionality Test Suite — skill use, artifact creation, MCPs, logging | | See §4.5 |
| P1.11 | Browser walkthrough tests for key user flows | | See §4.5.3 |
| P1.12 | Pre-push test gate (husky/lefthook) to block regressions | | |

Phase 2: Internal Adoption (Target: 2026-05 to 2026-06)

| ID | Deliverable | Status | Notes |
|---|---|---|---|
| P2.1 | Initial GTM pilot cohort confirmed | | |
| P2.2 | First workflow set live with clear use cases | | |
| P2.3 | Self-explanatory first use (no live walkthrough needed) | | |
| P2.4 | Support burden manageable for pilot scale | | |
| P2.5 | Strategy/CSO/Delivery workflows identified and scoped | | |
| P2.6 | Workflow Authoring — skill selection + parameter config + bundle + share as reusable workflow package | | Addresses P0 gap: product premise requires a creation path |
| P2.7 | Multi-Skill Composition DAG Editor — DAG-based chaining with cue points, fallback branches, handoff contracts | | Moved from P1.9. Gated on P3.2 governance |
| P2.8 | Workflow Sharing — share-via-link with preview card, team workflow catalog, discover-from-team-members | | Pairs with P2.6 authoring. Required for workflow reuse metric |
| P2.9 | Skill Preview & Dry-Run Mode — sandboxed execution, diff preview, mock credentials | | Moved from P1.8. Full dry-run gated on P3.2 governance |

Phase 3: Trust & Governance (Target: 2026-05 to 2026-07)

| ID | Deliverable | Status | Notes |
|---|---|---|---|
| P3.1 | Success metrics observable consistently for pilot users | | |
| P3.2 | Skill governance: deterministic precedence + conflict surfacing | | |
| P3.3 | Skill audit tooling + CI guardrails | | |
| P3.4 | Minimum viable governance: approvals + sharing controls | | |
| P3.5 | Users report higher confidence in Work vs generic chat | | |
| P3.6 | Agent Explainability Panel — tool-call transparency | | Merged from ideation I6 |

Phase 4: Client Delivery (Target: 2026-07 to 2026-09)

| ID | Deliverable | Status | Notes |
|---|---|---|---|
| P4.1 | Internal adoption strong enough for credible case study | | |
| P4.2 | Services packaging: Work value clear for delivery quality | | |
| P4.3 | Candidate client workflow categories identified | | |
| P4.4 | CSO/GTM positioning clear for productized future | | |

4. Detailed Requirements

4.1 Server API

Functional:

  • Expose workspace config read/write APIs for .opencode and opencode.json
  • List installed skills, plugins, MCPs without direct FS access
  • Allow saving new skills, plugins, MCP entries from remote clients
  • Host mode auto-starts OpenWork server alongside OpenCode engine
  • Surface pairing URL + tokens in Settings for host mode
  • Show remote-config origin, last updated time, and change attribution

API surface: GET /health, GET /workspaces, GET/PATCH /workspace/:id/config, GET/POST /workspace/:id/skills, GET /workspace/:id/plugins, GET /capabilities

Auth model: Write endpoints (PATCH, POST) require a valid pairing token (see §4.2) or session JWT. Read endpoints require a workspace-scoped client identity header. Demo mode (dev) skips auth; real mode enforces. Agent mode uses VITE_AUTH_BRIDGE_SESSION_TOKEN.
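The auth rules above can be sketched as a single request guard. This is an illustrative sketch only: the helper name (requireAuth), the client identity header name, and the shape of RequestLike are assumptions, not the shipped API, and agent mode (session token via VITE_AUTH_BRIDGE_SESSION_TOKEN) is omitted for brevity.

```typescript
// Sketch of the §4.1 auth model as a request guard.
// Assumptions: header names and helper signatures are illustrative.
type AuthMode = "demo" | "real"; // agent mode (session token) omitted from this sketch

interface RequestLike {
  method: "GET" | "POST" | "PATCH";
  headers: Record<string, string | undefined>;
}

function requireAuth(req: RequestLike, mode: AuthMode, validTokens: Set<string>): boolean {
  if (mode === "demo") return true; // dev mode skips auth entirely
  const isWrite = req.method === "PATCH" || req.method === "POST";
  if (isWrite) {
    // Write endpoints need a valid pairing token (or session JWT, not shown)
    const token = (req.headers["authorization"] ?? "").replace(/^Bearer /, "");
    return validTokens.has(token);
  }
  // Read endpoints need a workspace-scoped client identity header (name assumed)
  return Boolean(req.headers["x-openwork-client"]);
}
```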

Non-goals: Replacing OpenCode’s server, arbitrary filesystem access, multi-tenant hosting

4.2 Orchestrator

Host lifecycle: User picks workspace → OpenWork starts OpenCode → starts server → registers workspace → Settings shows pairing URL + token

Client flow: User enters host URL + token → client calls /health + /workspaces → host returns OpenCode base URL + directory → client connects via SDK

Fallback: Non-OpenWork URLs connect directly to OpenCode. UI shows “Connected via OpenCode (not OpenWork).”

Data model: WorkspaceInfo { id, name, path, workspaceType, remoteType: "openwork" | "opencode", openworkHostUrl?, openworkWorkspaceId?, opencodeBaseUrl?, opencodeDirectory? }
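The data model and the fallback rule above can be expressed together as a small sketch: a host that does not answer the OpenWork health probe is classified as a plain OpenCode endpoint. The classifyRemote helper is hypothetical; only the WorkspaceInfo fields come from the PRD.

```typescript
// The §4.2 data model as a TypeScript type, plus a sketch of the fallback rule.
interface WorkspaceInfo {
  id: string;
  name: string;
  path: string;
  workspaceType: string;
  remoteType: "openwork" | "opencode";
  openworkHostUrl?: string;
  openworkWorkspaceId?: string;
  opencodeBaseUrl?: string;
  opencodeDirectory?: string;
}

// classifyRemote is illustrative: the caller is assumed to have already probed
// the host's /health endpoint and passes the result in.
function classifyRemote(
  hostUrl: string,
  openworkHealthOk: boolean,
): Pick<WorkspaceInfo, "remoteType" | "openworkHostUrl" | "opencodeBaseUrl"> {
  if (openworkHealthOk) {
    return { remoteType: "openwork", openworkHostUrl: hostUrl };
  }
  // Fallback path: connect directly to OpenCode; the UI then shows
  // "Connected via OpenCode (not OpenWork)."
  return { remoteType: "opencode", opencodeBaseUrl: hostUrl };
}
```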

4.3 UX Standards

10x Quality Gates:

| Gate | Requirement | Status |
|---|---|---|
| Q1 | Dashboard data refresh TTL + manual refresh | |
| Q2 | No debug logs in production (gated behind developerMode()) | |
| Q3 | Step cluster collapse works consistently in MessageList | |
| Q4 | Session sidebar shows “Show more” affordance | |
| Q5 | Context menus never render off-screen | |
| Q6 | Visibility-aware polling (pause when hidden) | |
| Q7 | No browser-native prompts (window.confirm/prompt) | |
| Q8 | Mention search never shows stale results | |
| Q9 | Inactive workspace session freshness | |
| Q10 | OpenWork server check backoff when disconnected | |

Design System: All UI must use DLS tokens. No hardcoded colors, shadows, border radius, or fonts. WCAG 2.1 AA contrast minimum.

Completed UX (from prior refactor):

  • ✓ company-dev skips chooser when remote worker env present
  • ✓ Skills renders as discovery catalog, not package manager
  • ✓ Remote workspaces only show cloud-compatible integrations
  • ✓ Dashboard shell reachable even when backend disconnected
  • ✓ Completed runs expose shareable dossier summaries (80%)

4.4 Skill Governance (Planned, Not Implemented)

Precedence model: Project-local (.opencode/skills, .claude/skills, .cursor/skills/*/.opencode/skills) > Repo-discovered (.agents/skills) > Cursor top-level (.cursor/skills/*/SKILL.md) > Global (~/.config/opencode/skills, ~/.claude/skills, ~/.agents/skills)

Conflict handling: Default API returns active skills only (backwards compatible). Opt-in includeConflicts=true returns conflict groups with candidate metadata.

Audit tooling: Scan .agents/skills/**/SKILL.md for duplicates, missing metadata, broken links. Machine-readable JSON + human summary for PR checks.
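The deterministic precedence model above can be sketched as a pure resolver: candidates that share a name form a conflict group, and the winner is the candidate from the highest-precedence tier, with path order as a stable tiebreak. Function and field names here are illustrative, not the planned implementation.

```typescript
// Sketch of §4.4 deterministic precedence. The tier list mirrors the
// precedence order above (earlier = higher precedence).
const TIERS = [
  "project-local",    // .opencode/skills, .claude/skills, .cursor/skills/*/.opencode/skills
  "repo-discovered",  // .agents/skills
  "cursor-top-level", // .cursor/skills/*/SKILL.md
  "global",           // ~/.config/opencode/skills, ~/.claude/skills, ~/.agents/skills
] as const;

interface SkillCandidate {
  name: string;
  tier: (typeof TIERS)[number];
  path: string;
}

function resolveSkills(candidates: SkillCandidate[]) {
  const groups = new Map<string, SkillCandidate[]>();
  for (const c of candidates) {
    groups.set(c.name, [...(groups.get(c.name) ?? []), c]);
  }
  const active: SkillCandidate[] = [];
  const conflicts: SkillCandidate[][] = [];
  for (const group of groups.values()) {
    // Deterministic winner: lowest tier index, then lexicographic path as tiebreak
    const sorted = [...group].sort(
      (a, b) => TIERS.indexOf(a.tier) - TIERS.indexOf(b.tier) || a.path.localeCompare(b.path),
    );
    active.push(sorted[0]);
    if (group.length > 1) conflicts.push(sorted); // returned only with includeConflicts=true
  }
  return { active, conflicts };
}
```

The default API response would correspond to `active` alone (backwards compatible); `conflicts` is the opt-in `includeConflicts=true` payload.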

4.5 Testing Architecture

Current state (audit 2026-04-29):

| Layer | Framework | Test count | Coverage |
|---|---|---|---|
| Server (apps/server) | Bun test | 9 files | Validators, tokens, skills, sessions, MCP, file ops, workflows |
| Router (apps/opencode-router) | Node node:test | 6 files | Bridge, Slack, Telegram, DB store |
| Orchestrator (apps/orchestrator) | Raw scripts | 2 files | CLI routing, file sessions |
| Auth (services/openwork-auth) | Vitest | 1 file | Session cookies, middleware |
| Share (services/openwork-share) | Node node:test | 3 files | Bundle rendering, packaging |
| Web UI (apps/app) | None | 0 files | P0 gap — no unit or component tests |
| Desktop (apps/desktop) | None | 0 files | P0 gap — no Rust or integration tests |
| Browser E2E | Puppeteer (scripts only) | ~15 scripts | Health, sessions, chat, UI suite |
| Playwright E2E | Not configured | N/A | Referenced in docs but not implemented |

Key gaps identified:

  • No SolidJS component or unit tests (P0)
  • No skill execution end-to-end tests (P0)
  • No artifact creation/verification tests (P1)
  • No MCP tool execution or MCP server management tests (P1)
  • No logging infrastructure tests (P1)
  • No test coverage measurement (P1)
  • tests/e2e/ directory referenced in docs does not exist (P1)
  • No pre-push test hooks (P3)

4.5.1 Phase 0: Testing Foundation (P0.6)

Goal: Standardize test framework, establish Playwright E2E, wire CI matrix.

Deliverables:

  • Playwright config at repo root with desktop + web + mobile projects
  • tests/e2e/ directory tree with README per AGENTS.md spec
  • CI matrix (ci-tests.yml) runs Playwright on ubuntu-22.04 + macos-14
  • Coverage measurement (v8/c8) wired to server tests
  • Test script naming convention standardized across packages
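A root Playwright config covering the deliverables above might look like the following sketch. Project names, the `WORK_BASE_URL` env var, and the desktop test-match pattern are assumptions for illustration, not the shipped configuration.

```typescript
// playwright.config.ts — minimal sketch of the proposed root config (P0.6).
import { defineConfig, devices } from "@playwright/test";

export default defineConfig({
  testDir: "./tests/e2e",
  retries: process.env.CI ? 2 : 0,
  use: {
    baseURL: process.env.WORK_BASE_URL ?? "http://localhost:3000", // env var name is hypothetical
    trace: "retain-on-failure",       // trace on failure per the framework decisions below
    screenshot: "only-on-failure",
    video: "retain-on-failure",
  },
  projects: [
    { name: "web", use: { ...devices["Desktop Chrome"] } },
    { name: "mobile", use: { ...devices["iPhone 14"] } },
    // The desktop project would launch the Tauri binary via a custom fixture (P1 work)
    { name: "desktop", testMatch: /desktop\/.*\.spec\.ts/ },
  ],
});
```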

Framework decisions:

  • Browser E2E (Layer 1 — Structural): Playwright (replacing ad-hoc Puppeteer scripts). Screenshot, trace, video on failure. Deterministic pass/fail for CI.
  • Server unit: Bun test (keep). Fast, native TypeScript, already established.
  • Router integration: Node node:test (keep). Simpler for CLI/network tests.
  • Frontend component: Vitest + Solid Testing Library (add). Matches SolidJS ecosystem.
  • Desktop: cargo test for Rust. Playwright for desktop E2E (launch Tauri binary).

Dual-layer testing model:

| Layer | Test type | Framework | What it verifies | Runs |
|---|---|---|---|---|
| Layer 1 — Structural | Deterministic E2E | Playwright | Did the button render? Did the route load? Are DLS tokens applied? Is the MCP connected? | CI pre-merge (fast) |
| Layer 2 — Semantic | Agent-driven quality | browser-use / Playwright MCP / Cloud MCP | Did the skill produce a correct artifact? Was the output well-reasoned? Is the dossier readable? | Pre-release, nightly, or manual gate |

Layer 2 rationale: Traditional E2E can verify that a skill run completed, but it cannot assess whether the output is correct. Agent-based browsers (browser-use, Playwright MCP, Claude Computer Use) can inspect the result page and apply judgment — “does this legal review flag the right risks?” — in a way static assertions cannot. This directly enables the Adversarial Output Review ideation idea (I9).

Layer 2 candidate tools:

| Tool | Type | Strengths | Weaknesses |
|---|---|---|---|
| browser-use | OSS agent (~50K stars) | Python, task-driven (“review this page”), no test writing | LLM latency, non-deterministic, young |
| Playwright MCP | MCP server | Any LLM can drive Playwright, reuse existing page objects | Token-heavy (~114K/session), Microsoft recommends CLI over MCP |
| Cloud-browser MCP (peta.io) | Token-efficient MCP | Cuts tokens from ~114K to ~5K per session | Single vendor, new |
| Claude Computer Use | First-party Anthropic | Production-grade, built into API | Expensive per action, API-only |
| QA Studio | Record-to-code | OSS, press record, generates Playwright tests | Early, not semantic judgment |

Recommendation: Start with browser-use for Layer 2 (OSS, active community, Python — easy to script as a pre-release gate on the orchestrator stack). Graduate to Playwright MCP + Cloud MCP for tighter integration if token cost becomes an issue.

4.5.2 Phase 1: Core Functionality Test Suite (P1.10)

| Test Category | What to Test | Framework | Layer | Priority |
|---|---|---|---|---|
| Skill use | Install a skill → list skills → trigger skill → verify completion | Playwright E2E + server unit | L1 | P0 |
| Skill quality | Run skill → agent inspects output for correctness | browser-use / Playwright MCP | L2 | P1 |
| Artifact creation | Run creates dossier → Playwright verifies dossier appears, is searchable | Playwright E2E | L1 | P1 |
| Artifact quality | Agent inspects dossier readability, completeness, format correctness | browser-use | L2 | P2 |
| Core MCPs | Connect MCP → list MCP tools → invoke tool → verify response returned | Playwright E2E + server integration | L1 | P0 |
| MCP output quality | Agent verifies MCP tool response is valid | browser-use | L2 | P2 |
| Logging | Trigger error paths → verify structured log output → verify no secrets in logs | Server unit + smoke | L1 | P1 |
| Session CRUD | Create session → send prompt → receive response → list messages → delete | Playwright E2E + existing scripts | L1 | P0 |
| Auth flow | demo mode → real mode → agent mode → verify token lifecycle | Vitest (existing) + Playwright | L1 | P1 |
| Desktop (Tauri) | Launch app → connect worker → run skill → verify window state | Playwright desktop E2E | L1 | P1 |
| Connectors | Slack → Telegram → verify adapter message flow | Node test (existing) + integration | L1 | P1 |
| Orchestrator | Host mode startup → client connect → workspace switch → fallback path | Orchestrator scripts + Playwright | L1 | P1 |

4.5.3 Browser Walkthrough Tests (P1.11)

| Flow | Steps | Framework |
|---|---|---|
| First launch | Open app → verify dashboard renders → verify no errors in console → verify connection status | Playwright |
| Skill discovery | Navigate to Skills → verify catalog renders → search → verify results → click skill → verify detail view | Playwright |
| Run a skill | Select skill → fill inputs → click run → verify streaming response → verify “complete” state → verify dossier entry created | Playwright |
| MCP management | Navigate to Integrations → add MCP → verify in list → connect → disconnect → remove | Playwright |
| Settings flow | Open settings → change model → verify persists → change theme → verify DLS applied | Playwright |
| Error recovery | Kill server → verify diagnostic shows → restart server → verify auto-reconnect | Playwright |
| Desktop-specific | Open Tauri window → verify title “Brainforge Work” → verify menu actions → verify Cmd+Q → verify re-open | Playwright desktop |

4.5.4 CI Gates

  • Pre-merge (Layer 1 only): ci.yml runs server Bun tests + auth Vitest tests + Playwright structural E2E + security guards. Must pass in <5 min.
  • E2E matrix: ci-tests.yml runs Playwright suite on ubuntu-22.04 + macos-14
  • Desktop build: build-desktop.yml adds cargo test before build
  • Pre-push hook (P1.12): Run server tests + lint on git push. Full E2E on CI only.
  • Pre-release / nightly (Layer 2): Agent-driven quality gate runs skill quality + artifact quality + MCP output quality tests via browser-use CLI. Results posted as PR comment or Slack. Not a merge block (non-deterministic), but a quality signal.

4.6 Feature Spec Inventory

Note: 11 unstarted items need triage — each should be assigned to a phase or marked cancelled. Items that duplicate phase tracker deliverables should be merged in.

| Spec | Description | Status | Triage |
|---|---|---|---|
| plugin-endpoints.md | Plugin config via /config API endpoint | | |
| reload-toast-persist.md | Reload toast persists across navigation | | |
| browser-entry-button.md | Browser entry button in UI | | |
| skill-creator-triggers.md | Trigger rules for skill creator | | |
| notion-connection-fix.md | Notion connection fix | | |
| steps-composer-docked.md | Docked steps composer UX | | |
| cmdk-session-model-thinking.md | CMD+K for model/thinking settings | | |
| OpenCode Server.md | OpenCode server integration | | |
| session-view-sans-font.md | Session view font specification | | |
| always-open-new-session.md | Always open new session on boot | | |
| context-panel-ux.md | Context panel UX improvements | | |
| session-flow-humanized.md | Session flow human narrative pass | | |
| orbita-layout-ui.md | Orbita layout UI refresh | | |
| multi-workspace-config.md | Multi-workspace config | | |
| openwork-orchestrator-multi-workspace.md | Orchestrator multi-workspace | | |
| telegram-private-bot-pairing.md | Telegram private bot pairing | | |

4.7 Design Decisions (Resolved from Review 2026-04-29)

Interaction states per surface:

| Surface | Loading | Empty | Error | Populated |
|---|---|---|---|---|
| Skills catalog | Skeleton card grid | CTA: “Browse recommended skills” → curated list | Retry button + “Couldn’t load skills” | Searchable card grid |
| MCP/Integrations list | Skeleton rows | CTA: “Add your first integration” | Red dot + retry + “Connection failed” | List with status dots (green/yellow/red) |
| Dashboard | Skeleton widgets | CTA: “Run your first workflow” | Banner: “Worker not reachable — check connection” | Active sessions + metrics |
| Session composer | Idle (empty prompt area) | N/A — always available | Inline: “Message failed to send — retry?” | Streaming → complete |
| Dossiers | Skeleton rows | “No dossiers yet — run a skill to create one” | Retry button | Searchable + filterable list |

Resolved design decisions:

| Decision | Resolution | Rationale |
|---|---|---|
| Dashboard auto-focus after startup | Focus on active sessions if any exist; otherwise recommended skills | Most users return to ongoing work; new users need discovery |
| Dossier share target (v1) | In-app link with preview card; Markdown export as v2 | Link is zero-friction for internal teams; export is optional |
| Discovery card deep-linking | Cards link to skill detail page (not integration page root) | Users click cards to learn about the skill, not manage infrastructure |
| Self-explanatory mechanism | Inline tooltips on first visit + empty-state CTAs | Simpler than guided overlay; pairs with P0.4 first-run checklist |
| First-run flow | 3-step: Welcome → Scan (“Checking your setup…”) → Ready (“Everything looks good — start here”) | Single-screen with progress; no multi-page wizard |
| Multi-skill composition UX | Deferred to Phase 2 exploration. Phase 1: sequential chaining only (“run skill A, then skill B”) | DAG editor is High complexity; sequence-first lets single-skill workflows prove value |
| Dossier search + filter interaction | Dropdown chips: outcome + date range + skill-name autocomplete. Comparison: side-by-side diff | Resolves I3 remaining 20% |
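The Phase 1 "sequential chaining only" decision reduces to a very small contract: run skill A, then skill B, passing each output as the next input (P1.9). A minimal sketch, assuming a hypothetical SkillRun signature:

```typescript
// Illustrative sketch of sequential skill chaining (P1.9).
// SkillRun is an assumed signature, not the real runner API.
type SkillRun = (input: string) => Promise<string>;

async function runChain(skills: SkillRun[], initialInput: string): Promise<string> {
  let current = initialInput;
  for (const skill of skills) {
    current = await skill(current); // output→input passthrough
  }
  return current;
}
```

The DAG editor (P2.7) generalizes this loop to branches and fallbacks, which is why it is gated on P3.2 governance rather than shipped in Phase 1.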

5. Success Metrics

| Metric | Target | Current |
|---|---|---|
| First-run workflow completion | >=40% of pilot users complete meaningful workflow on first run | |
| Return usage | >=70% return to run a second workflow | |
| Time-to-first-success | Median < 4 minutes from first visit to completion | |
| Self-service adoption | >=80% complete first workflow without live walkthrough | |
| Trust vs generic chat | >=70% report Work feels more trustworthy than generic chat | |
| Workflow reuse | >=2 expert-built workflows reused by 3+ users | |
| Weekly active users | >=80% of target pilot users are weekly active | |
| Input-to-feedback latency | < 100ms on session and dashboard actions | |
| Stale-dashboard regressions | 0 in repeated navigation | |

6. Risk Register

| Risk | Severity | Mitigation |
|---|---|---|
| Workflow quality inconsistent early on | Medium | Narrow set of high-confidence workflows; quality bar before expanding |
| Product feels too technical for non-IDE users | High | Prioritize onboarding clarity, guided inputs, simpler outputs |
| Governance friction slows adoption | Medium | Minimum viable governance first; increase controls only where needed |
| Success depends on narrow workflow set | Medium | Focused pilot catalog; expand only after reuse signals clear |
| Too much live explanation needed | High | Low-training usability as product requirement; support effort as core signal |
| Skill precedence change alters active resolution | Low | Ship with conflict surfacing; temporary allowlist for known duplicates |
| Remote startup conflicts with persisted local preference | Medium | Centralize remote-worker signal; prefer env-driven path in company-dev |
| Azure credential fragility (silent dead app) | High | Model fallback diagnostics (P0.5); credential health check on startup |

7. Running Ideas

| ID | Idea | Conviction | Complexity | Phase | Status |
|---|---|---|---|---|---|
| I1 | Skill Conflict Detection & Resolution — surface silent trigger overwrites, conflict dashboard, dead skill janitor | High | Med | 1→3 | Unexplored |
| I2 | Guided First-Run Experience — auto-detect env, zero-config cold start, one-screen checklist | High | Low-Med | 0 | Unexplored |
| I3 | Run Dossiers Completion — search, filter by outcome, compare runs, share via link | High | Low | 1→2 | Explored — 80% implemented |
| I4 | Skill Preview & Dry-Run Mode — sandboxed execution, cost estimate, diff preview, mock credentials | High | Med-High | 1→3 | Unexplored |
| I5 | Multi-Skill Composition — now scoped as sequential chaining (P1.9); DAG editor moved to P2.7 | Med | High | 1→2 | Explored — scoped down from review |
| I6 | Agent Explainability Panel — tool-call transparency, decision trace, failure visibility | High | Med | 3 | Unexplored |
| I7 | Model Fallback & Graceful Degradation — Azure-unreachable diagnostics, provider fallback hints, credential health | High | Low | 0→1 | Unexplored |
| I8 | Session-to-Skill Compiler — observe manual runs, auto-generate reusable skill from trace | Med | Med | 2 | Unexplored |
| I9 | Adversarial Output Review — second-agent critique on every output, confidence scoring | Med | High | 3 | Deferred |
| I10 | Skill Telemetry & Scorecard — per-skill analytics | Med | Med | 3 | Deferred |
| I11 | Universal Deep-Link Handshake — one-click workspace provisioning | Med | Med | 2 | Deferred |
| I12 | Adaptive Onboarding Engine — role-aware skill suggestions | Low | High | 2 | Deferred |
| I13 | Continuous Work Stream — no sessions, infinite timeline | Low | High | 3+ | Deferred |
| I14 | Offline-First Peer-to-Peer — local models, zero cloud | Low | High | 4+ | Deferred |
| I15 | Voice/Mobile-First — WhatsApp/Telegram connector | Low | High | 4+ | Deferred |
| I16 | Knowledge Repo Viewer — Quartz (MIT, 12K★) static site at knowledge.brainforge.ai via Railway | High | Low-Med | 0 | Decided — Quartz on Railway (P0.8) |

8. Source Documents

| Document | Location | Notes |
|---|---|---|
| Miranda Wen Product Plan | brainforge-platform: knowledge/engineering/openwork-platform-integration/brainforge-work-product-plan.md | Strategic vision, 4 initiatives, milestones |
| OpenWork Server PRD | apps/app/pr/openwork-server.md | Server API design (804 lines) |
| OpenWork Orchestrator PRD | apps/app/pr/openwork-orchestrator.md | Host/client mode, fallback, data model |
| 10x Quality Pass | apps/app/pr/openwork-10x.md | P0 quality gates (Q1-Q10) |
| 10x Quality Audit | apps/app/pr/openwork-10x-audit.md | Companion audit findings |
| Usability Refactor Plan | docs/plans/2026-04-18-001-refactor-openwork-usability-and-design-plan.md | 6-unit UX refactor (complete) |
| Skill Governance Plan | docs/plans/2026-04-20-001-feat-skill-governance-and-discovery-plan.md | 5 units (not implemented) |
| Design Refresh DLS | docs/plans/2026-04-24-design-refresh-dls-normalization.md | Token normalization notes |
| Work Guides Overlay | docs/plans/2026-04-24-002-brainforge-work-guides-overlay-implementation-plan.md | Overlay walkthrough V1 |
| + 12 feature specs | apps/app/pr/*.md | Individual feature-level specs |

9. Deferred / Open Questions

From 2026-04-29 doc review (P1 — security gaps):

| # | Finding | Why deferred |
|---|---|---|
| Q1 | Pairing token security — generation algorithm, entropy, expiry, rotation, revocation not specified | Requires dedicated security design pass beyond PRD scope |
| Q2 | Credential management — no secrets storage strategy, rotation policy, or compromise procedure | Depends on infrastructure decisions (env vars vs vault vs managed secrets) |
| Q3 | Data classification — no policy for sensitive data handling, retention/deletion, or trust boundary crossings | Needs stakeholder agreement on data sensitivity tiers |
| Q4 | Input validation — POST endpoints lack schema enforcement, path traversal prevention, rate limiting | Implementable as standard controls; deferred to implementation phase with security review gate |

Resolution: Address in Phase 1 implementation with a dedicated security review gate before any write endpoint ships to users.
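For Q4 specifically, the path-traversal control is a standard pattern and can be sketched now: resolve each requested path against the workspace root and reject anything that escapes it. The function name and layout are illustrative, assuming POSIX-style paths.

```typescript
// Sketch of a path-traversal guard for the file-backed write endpoints (Q4).
import * as path from "node:path";

function safeWorkspacePath(workspaceRoot: string, requested: string): string {
  const root = path.resolve(workspaceRoot);
  const resolved = path.resolve(root, requested);
  // The path is inside the root only if it equals the root or extends it
  // past a path separator (prevents "/tmp/ws-evil" matching "/tmp/ws").
  if (resolved !== root && !resolved.startsWith(root + path.sep)) {
    throw new Error(`path escapes workspace root: ${requested}`);
  }
  return resolved;
}
```

Schema enforcement and rate limiting would sit alongside this guard in the same middleware layer, per the security review gate above.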