Ideation: Brainforge Slack Assistant — Next Feature Set
Grounding Context
Codebase Context
The Brainforge Slack Assistant V2 is a production Slack bot (Bolt.js + Vercel AI SDK + Azure OpenAI GPT-5.2) with 6 parallel integrations (HubSpot CRM, Exa web search, GitHub repo search, Platform repo vector search, client meeting transcripts, Linear ticket creation). It uses keyword-based intent routing (regex, not LLM) and fires ALL integrations in parallel via Promise.all. Live on Railway, internal-only at Brainforge AI.
Key pain points: Regex gates miss semantic queries (“latest AI news” doesn’t match Exa keywords); all 6 integrations fire regardless of relevance, wasting API calls; no streaming — responses are single-message text; no persistent memory; no feedback loop; no per-channel context; no proactive behavior.
Roadmap (last updated 2026-02-24): Phase 1 (Approval Flow, Reminders, Testing) planned but not started. Phase 2 (Knowledge RAG, Proactive Habits). Phase 3 (Feedback Loop, Evals, Multi-Agent Router). Known gaps include no knowledge RAG, no approval flow, no reminders, no proactive behavior, no feedback loop, no evals.
Past Learnings
7 architecture docs in docs/solutions/architecture-patterns/ document the V2 pipeline: uniform IntegrationResult contract, config-driven integration enablement, parallel Promise.all dispatch, keyword intent gates, identity resolution via Slack → HubSpot, three-phase streaming status updates. Key finding: adding a new integration follows a 3-step pattern with zero LLM prompt changes.
External Context
- Slack’s 2026 API: streaming (chat.startStream/appendStream/stopStream), Thinking Steps with Task Cards, Card/Carousel blocks, feedback_buttons, MCP Server, Real-Time Search (RTS) API
- HITL patterns: Red/Yellow/Green classification (Claude Lab), propose-then-commit architecture (StackAI), approval timeout layers
- Proactive agent patterns: event-driven with tiered filters (AuraHQ), dual-trigger scheduled + on-demand (AgentWatch)
- Eval-driven development: treat evals like TDD, LLM-as-judge, continuous production monitoring (ZOI framework)
- Multi-agent: SlackAgents framework (Salesforce Research), coordinator/dispatcher with structured memory
- Market: Agentforce in Slack (per-channel, configurable), Claude for Slack (MCP, native context), Slackbot rebuilt on Claude
Ranked Ideas
1. LLM-Based Smart Routing (Replace Regex Gates)
Description: Replace keyword-based regex intent routing with a lightweight LLM classification step that determines which integrations to invoke before the expensive parallel fetch. A ~50-token GPT-4.1-mini call classifies the query, blocking irrelevant integrations from wasting API calls and fixing the silent-failure case where semantic queries don’t match regex patterns.
Warrant: direct: ARCHITECTURE.md:233 documents the Exa keyword gate failure mode; assistant.ts:47-50 is the shouldSearchWeb regex that misses “latest AI news.” All 6 integrations fire via Promise.all (assistant.ts:162-179) regardless of relevance.
Rationale: Simultaneously fixes the most common silent-failure mode AND cuts API costs by 60-80% per query. Prerequisite for multi-agent routing (roadmap Phase 3). User perceives a smarter assistant; platform saves money.
Downsides: Adds ~100ms latency; LLM classification can be non-deterministic; requires regex fallback for classification failures.
Confidence: 85% Complexity: Medium Status: Unexplored
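A minimal sketch of the routing gate described above, assuming a pluggable `classify` function standing in for the ~50-token GPT-4.1-mini call; the integration names and the `parseClassification`/`route` helpers are illustrative, not the actual assistant.ts code. The key design point is the fallback: if classification fails or returns nothing, fire all six integrations, so behavior is never worse than today's Promise.all.

```typescript
type IntegrationId =
  | "hubspot" | "exa" | "github" | "platform" | "transcripts" | "linear";

const ALL: IntegrationId[] = [
  "hubspot", "exa", "github", "platform", "transcripts", "linear",
];

// Parse the model's comma-separated answer; unrecognized tokens are dropped.
function parseClassification(raw: string): IntegrationId[] {
  const picked = raw
    .toLowerCase()
    .split(/[,\s]+/)
    .filter((t): t is IntegrationId => (ALL as string[]).includes(t));
  return [...new Set(picked)];
}

// Route a query: try the LLM classifier, fall back to firing everything
// (the current behavior) if classification throws or returns nothing.
async function route(
  query: string,
  classify: (q: string) => Promise<string>, // wraps the lightweight LLM call
): Promise<IntegrationId[]> {
  try {
    const picked = parseClassification(await classify(query));
    return picked.length > 0 ? picked : ALL;
  } catch {
    return ALL; // classification failure degrades to today's parallel fetch
  }
}
```

The downstream dispatch then filters the integration list before the parallel fetch instead of always passing all six.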
2. Rich Responses: Streaming + Block Kit
Description: Replace single-message text-only responses with Slack’s Streaming API showing Thinking Steps as the assistant works, and compose deal lists, pipeline summaries, and ticket prompts as interactive Block Kit components (Cards, tables, confirmation modals). A 50-deal pipeline becomes a rich table with clickable HubSpot links; multi-step reasoning becomes visible Task Cards.
Warrant: direct: messages.ts:28 is text.trim().slice(0, 35000) — no blocks, no actions, no rich formatting. assistant.ts:101,230 always uses chat.update with text: only. external: Slack’s 2026 API provides streaming, Thinking Steps, Card/Carousel blocks as native primitives.
Rationale: Streaming transforms “why is this taking so long?” into “oh, it’s searching HubSpot and transcripts.” Block Kit turns raw text data into actionable, clickable information. UX transformation that shifts the assistant from “neat” to “I use this 5x/day.”
Downsides: Streaming API + Bolt.js wiring may need workarounds; Block Kit adds format complexity; don’t over-block simple responses.
Confidence: 80% Complexity: Medium-High Status: Unexplored
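A sketch of the Block Kit half of this idea: rendering a deal list as blocks instead of flat text. The `Deal` shape and field names are illustrative, not the actual hubspot.ts types; the block structure itself (header, section with mrkdwn) follows Slack's standard Block Kit schema.

```typescript
interface Deal {
  name: string;
  stage: string;
  amount: number;
  url: string; // clickable HubSpot link
}

// Build a Block Kit payload for a deal list; passed as `blocks` to
// chat.postMessage / chat.update alongside a plain-text fallback.
function dealBlocks(deals: Deal[]): object[] {
  const header = {
    type: "header",
    text: { type: "plain_text", text: `Pipeline (${deals.length} deals)` },
  };
  const rows = deals.map((d) => ({
    type: "section",
    text: {
      type: "mrkdwn",
      text: `*<${d.url}|${d.name}>* | ${d.stage} | $${d.amount.toLocaleString()}`,
    },
  }));
  return [header, ...rows];
}
```

Per the "don't over-block" downside, a length check can route short answers back to plain text.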
3. Universal Approval & Feedback Loop
Description: Generalize the existing 👍-reaction-to-ticket pattern into a universal propose-then-commit system. Every response carries feedback buttons (✅/❌ via Slack’s feedback_buttons). Any integration proposing an action follows the same approve-before-execute contract. Negative feedback feeds into a weekly prompt improvement analysis.
Warrant: direct: Linear ticket 👍 reaction gate (assistant.ts:289-317) proves the pattern works. Only ticket creation has approval; all other responses have zero feedback. external: Claude Lab’s Red/Yellow/Green classification; Slack’s feedback_buttons block element is table stakes for AI bots.
Rationale: Without feedback, the assistant is a black box — no one knows if it’s improving. A feedback loop creates a quality signal that feeds into evals, prompt tuning, and trust.
Downsides: Approval fatigue risk; requires review discipline; negative feedback volume may be too low initially.
Confidence: 90% Complexity: Low Status: Unexplored
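A sketch of the approve-before-execute contract, generalizing the existing Linear 👍 gate. The `Proposal`/`PendingStore` names are illustrative; in practice the pending map would live in durable storage and the approve/reject calls would be wired to Slack button action handlers.

```typescript
interface Proposal {
  id: string;
  summary: string;              // shown to the user for approval
  execute: () => Promise<void>; // runs only after explicit approval
}

class PendingStore {
  private pending = new Map<string, Proposal>();

  // Any integration proposing an action registers it here; the caller posts
  // the summary with ✅/❌ buttons that carry the proposal id.
  propose(p: Proposal): string {
    this.pending.set(p.id, p);
    return p.id;
  }

  // ✅ handler: resolve exactly once, then execute the committed action.
  async approve(id: string): Promise<boolean> {
    const p = this.pending.get(id);
    if (!p) return false; // expired or already resolved
    this.pending.delete(id);
    await p.execute();
    return true;
  }

  // ❌ handler: discard, optionally logging it as a negative feedback signal
  // for the weekly prompt improvement analysis.
  reject(id: string): void {
    this.pending.delete(id);
  }
}
```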
4. Prompt Polymorphism (Composable System Prompts)
Description: Replace the single static systemPrompt string with a library of composable prompt fragments assembled at query time based on which integrations returned data. CRM context → deal-formatting instructions. Transcript context → summarization instructions. Achieves specialist behavior without multi-agent infrastructure.
Warrant: direct: assistant.ts:30-38 — a single hardcoded systemPrompt applied to every query. ROADMAP.md:223 defers multi-agent to Phase 3. Prompt polymorphism captures specialist value at the prompt-engineering level, shippable in days.
Rationale: Specialization is the value of multi-agent, not the agent count. Creates a natural test matrix: does the CRM variant improve deal accuracy? Shippable in days, not quarters.
Downsides: Risk of prompt bloat; requires A/B testing per variant; doesn’t solve tool-selection (that’s LLM Router).
Confidence: 75% Complexity: Low Status: Unexplored
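A sketch of query-time prompt assembly, assuming a fragment map keyed by integration id; the fragment text and `buildSystemPrompt` helper are illustrative, not the actual prompt.

```typescript
const BASE = "You are the Brainforge Slack assistant.";

// Illustrative fragments: each one only ships when its integration returned data.
const FRAGMENTS: Record<string, string> = {
  hubspot: "Format deals as: name, stage, amount. Never invent pipeline numbers.",
  transcripts: "Summarize meeting excerpts in 2-3 bullets and cite the meeting date.",
  exa: "Attribute web findings to their source URLs.",
};

// Compose the system prompt from the base plus fragments for integrations
// that actually returned context; unknown ids are silently skipped.
function buildSystemPrompt(integrationsWithData: string[]): string {
  const parts = [
    BASE,
    ...integrationsWithData.flatMap((id) => FRAGMENTS[id] ?? []),
  ];
  return parts.join("\n\n");
}
```

Because fragments are keyed by integration, the A/B test matrix falls out naturally: toggle one fragment, measure one behavior.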
5. Structured Context Engine
Description: Replace parts.join('\n\n---\n\n') context concatenation with structured, scored, and token-budgeted assembly. Each integration result classified by relevance, deduplicated, assigned a priority bucket, and formatted with type markers. Irrelevant results dropped; citations surfaced as structured metadata.
Warrant: direct: assistant.ts:181-205 — all context flattened into one string with no scoring, dedup, or budget. Individual integrations already return structured results (hubspot.ts:219-223). external: RAG frameworks demonstrate chunking + relevance scoring improves LLM accuracy 20-40%.
Rationale: Structured, scored context means higher-quality answers with fewer hallucinations. Citations that are currently computed but lost in concatenation become visible. Cheapest architectural improvement: change the format, not the flow.
Downsides: Adds classification step per invocation; requires per-integration relevance heuristics; token budgeting is non-trivial.
Confidence: 80% Complexity: Low-Medium Status: Unexplored
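A sketch of scored, token-budgeted assembly replacing the flat join. The `ScoredResult` shape, the 0.3 relevance floor, and the 4-chars-per-token estimate are all illustrative assumptions, not values from the codebase.

```typescript
interface ScoredResult {
  source: string;    // e.g. "hubspot", "transcripts" — becomes the type marker
  text: string;
  relevance: number; // 0..1, from a per-integration heuristic or classifier
}

// Drop irrelevant results, then greedily pack the highest-relevance results
// into the token budget, each prefixed with a source marker instead of "---".
function assembleContext(results: ScoredResult[], tokenBudget: number): string {
  const sorted = [...results]
    .filter((r) => r.relevance >= 0.3) // relevance floor: drop noise outright
    .sort((a, b) => b.relevance - a.relevance);

  const parts: string[] = [];
  let spent = 0;
  for (const r of sorted) {
    const cost = Math.ceil(r.text.length / 4); // crude token estimate
    if (spent + cost > tokenBudget) continue;
    spent += cost;
    parts.push(`[${r.source}]\n${r.text}`);
  }
  return parts.join("\n\n");
}
```

The source markers double as the structured citation metadata the current concatenation loses.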
6. Proactive Pulse System (Event-Driven, Channel-Opt-In)
Description: Evolve from purely reactive (@mention/DM) to selectively proactive using an event-driven, channel-opt-in model. Channel admins pin an “Assistant: watch” message. The assistant surfaces: unresponded threads, deal stage changes, Monday-morning pipeline snapshots. Configurable per channel, kill-switchable, bounded — not ambient everywhere.
Warrant: direct: assistant.ts:269-287 — only listens for app_mention and message.im. Roadmap Phase 1 #2 (Reminders) + Phase 2 #5 (Proactive Habits). external: AgentWatch dual-trigger architecture; AuraHQ event-driven channel monitoring with tiered filters.
Rationale: Opt-in per channel makes proactivity safe, testable, and reversible. Proactive delivery solves adoption by making the assistant unavoidably useful — it shows up where you already are.
Downsides: Higher operational complexity (scheduling, event listeners); notification fatigue risk if thresholds aren’t tuned; requires per-channel config discipline.
Confidence: 75% Complexity: High Status: Unexplored
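A sketch of the tiered-filter gate implied above: an event only produces a proactive post when the channel opted in and the event clears a per-type threshold. The event types and threshold values are illustrative assumptions.

```typescript
type PulseEvent =
  | { type: "deal_stage_change"; amount: number }
  | { type: "unresponded_thread"; ageHours: number };

// Gate a proactive notification: channel opt-in acts as the kill switch,
// then a per-event-type threshold filters out low-signal noise.
function shouldNotify(
  event: PulseEvent,
  optedInChannels: Set<string>,
  channel: string,
): boolean {
  if (!optedInChannels.has(channel)) return false; // not opted in: never post
  switch (event.type) {
    case "deal_stage_change":
      return event.amount >= 10_000; // ignore small deals (tunable)
    case "unresponded_thread":
      return event.ageHours >= 24;   // only surface day-old threads (tunable)
  }
}
```

Keeping the thresholds in per-channel config is what makes the notification-fatigue risk tunable rather than structural.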
7. Persistent Conversation Memory
Description: Transform the interaction log from write-only to read-back. Every conversation stored with full context becomes retrievable: “What was that deal I asked about last Tuesday?” Beyond recall, builds a lightweight user model: pipeline-by-client preference, bullet-point format adaptation.
Warrant: direct: assistant.ts:233-246 — fireAndForgetLog writes to Supabase but nothing reads back. state.ts:20-38 — memory store evaporates on restart. external: ChatGPT, Claude, Gemini demonstrate cross-session memory as the #1 retention feature.
Rationale: Every conversation today is like meeting a stranger. The first 30% of every interaction is re-establishing context. Persistent memory makes the assistant feel like a teammate, not a search box.
Downsides: Requires schema migration; privacy concerns (who can query whose history?); risk of surfacing stale past answers.
Confidence: 70% Complexity: Medium Status: Unexplored
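A sketch of the read-back path: a keyword-plus-time-window lookup over the interaction log as a first pass, before any vector search. The `LogEntry` shape is illustrative, not the actual Supabase schema; the per-user filter is the simplest answer to the privacy downside noted above.

```typescript
interface LogEntry {
  userId: string;
  ts: number; // epoch ms
  query: string;
  response: string;
}

// "What was that deal I asked about last Tuesday?" becomes: this user's
// entries, within the time window, matching the keyword, most recent first.
function recall(
  log: LogEntry[],
  userId: string,
  keyword: string,
  sinceMs: number,
): LogEntry[] {
  const needle = keyword.toLowerCase();
  return log
    .filter(
      (e) =>
        e.userId === userId && // only the asker's own history
        e.ts >= sinceMs &&
        (e.query + " " + e.response).toLowerCase().includes(needle),
    )
    .sort((a, b) => b.ts - a.ts);
}
```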
Rejection Summary
| # | Idea | Reason Rejected |
|---|---|---|
| 1 | Knowledge-Base RAG (standalone) | Already roadmap Phase 2 #4; only novel angle was implementation specifics |
| 2 | Multi-Agent Router | Already roadmap Phase 3 #8; revisit when Phase 3 starts |
| 3 | Client-Facing Gateway Mode | Subject-replacement: ROADMAP.md explicitly scopes the assistant as internal-only |
| 4 | Semantic Thread Collision Detection | Too expensive relative to value; requires cross-channel vector infra |
| 5 | Decision Collision Warning | Too expensive; depends on collision detection |
| 6 | Information Arbitrage Agent | Depends on RAG being in place first |
| 7 | Anticipatory Cueing | High complexity; better as brainstorm variant after Proactive Pulse ships |
| 8 | Assisted Recall Retrospective | Niche; insufficient signal to warrant Phase 1/2 prioritization |
| 9 | Attention Decay Scoring | Better as brainstorm variant within Reminders roadmap item |
| 10 | Triage Card Protocol | Adds complexity before basic capabilities are solid |
| 11 | Self-Service Integration Onboarding | High complexity (MCP dependency); not warranted at current integration count |
| 12 | Selective Integration Firing | Duplicates LLM Router |
| 13 | Scheduled Linear Standup Digest | Duplicates Proactive Pulse; subset of same capability |
| 14 | Linear Ticket to PR Loop | Niche; better as brainstorm variant |
| 15 | Convergent Folgezettel | Too speculative without interaction log queryability |
| 16 | Open-Source / BYO-Model Mode | Below meeting-test threshold; dev-only concern |
| 17 | Channel-Pinned Context (standalone) | Merged into Proactive Pulse System |
| 18 | Conversational Queryability (standalone) | Merged into Persistent Conversation Memory |
| 19 | Eval-Driven Development Pipeline | Merged into Approval & Feedback Loop (feedback data IS the eval signal) |