Ideation: Brainforge Slack Assistant — Next Feature Set

Grounding Context

Codebase Context

The Brainforge Slack Assistant V2 is a production Slack bot (Bolt.js + Vercel AI SDK + Azure OpenAI GPT-5.2) with 6 parallel integrations (HubSpot CRM, Exa web search, GitHub repo search, Platform repo vector search, client meeting transcripts, Linear ticket creation). It uses keyword-based intent routing (regex, not LLM) and fires ALL integrations in parallel via Promise.all. Live on Railway, internal-only at Brainforge AI.

Key pain points: Regex gates miss semantic queries (“latest AI news” doesn’t match Exa keywords); all 6 integrations fire regardless of relevance, wasting API calls; no streaming — responses are single-message text; no persistent memory; no feedback loop; no per-channel context; no proactive behavior.

Roadmap (last updated 2026-02-24): Phase 1 (Approval Flow, Reminders, Testing) planned but not started. Phase 2 (Knowledge RAG, Proactive Habits). Phase 3 (Feedback Loop, Evals, Multi-Agent Router). Known gaps include no knowledge RAG, no approval flow, no reminders, no proactive behavior, no feedback loop, no evals.

Past Learnings

7 architecture docs in docs/solutions/architecture-patterns/ document the V2 pipeline: uniform IntegrationResult contract, config-driven integration enablement, parallel Promise.all dispatch, keyword intent gates, identity resolution via Slack → HubSpot, three-phase streaming status updates. Key finding: adding a new integration follows a 3-step pattern with zero LLM prompt changes.

External Context

  • Slack’s 2026 API: streaming (chat.startStream/appendStream/stopStream), Thinking Steps with Task Cards, Card/Carousel blocks, feedback_buttons, MCP Server, Real-Time Search (RTS) API
  • HITL patterns: Red/Yellow/Green classification (Claude Lab), propose-then-commit architecture (StackAI), approval timeout layers
  • Proactive agent patterns: event-driven with tiered filters (AuraHQ), dual-trigger scheduled + on-demand (AgentWatch)
  • Eval-driven development: treat evals like TDD, LLM-as-judge, continuous production monitoring (ZOI framework)
  • Multi-agent: SlackAgents framework (Salesforce Research), coordinator/dispatcher with structured memory
  • Market: Agentforce in Slack (per-channel, configurable), Claude for Slack (MCP, native context), Slackbot rebuilt on Claude

Ranked Ideas

1. LLM-Based Smart Routing (Replace Regex Gates)

Description: Replace keyword-based regex intent routing with a lightweight LLM classification step that determines which integrations to invoke before the expensive parallel fetch. A ~50-token GPT-4.1-mini call classifies the query, blocking irrelevant integrations from wasting API calls and fixing the silent-failure case where semantic queries don’t match regex patterns.

Warrant: direct: ARCHITECTURE.md:233 documents the Exa keyword gate failure mode; assistant.ts:47-50 is the shouldSearchWeb regex that misses “latest AI news.” All 6 integrations fire via Promise.all (assistant.ts:162-179) regardless of relevance.

Rationale: Simultaneously fixes the most common silent-failure mode AND cuts integration API calls by an estimated 60-80% per query. Prerequisite for multi-agent routing (roadmap Phase 3). The user perceives a smarter assistant; the platform saves money.

Downsides: Adds ~100ms latency; LLM classification can be non-deterministic; requires regex fallback for classification failures.

Confidence: 85% | Complexity: Medium | Status: Unexplored
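A minimal sketch of the fallback half of this router. The LLM call itself is elided; the point is parsing the classifier's output defensively and falling back to fire-everything when classification fails. All names here (INTEGRATIONS, parseRouterOutput) are illustrative, not from the codebase.

```typescript
// Hypothetical integration names; the real list lives in the config layer.
const INTEGRATIONS = ["hubspot", "exa", "github", "platform", "transcripts", "linear"] as const;
type Integration = (typeof INTEGRATIONS)[number];

// Parse the classifier's comma/space-separated answer. Anything the model
// emits that isn't a known integration name is dropped; if nothing
// recognizable survives, fall back to firing all integrations (today's
// behavior), so a bad classification can never lose an answer.
function parseRouterOutput(raw: string): Integration[] {
  const picked = raw
    .toLowerCase()
    .split(/[,\s]+/)
    .filter((t): t is Integration => (INTEGRATIONS as readonly string[]).includes(t));
  return picked.length > 0 ? picked : [...INTEGRATIONS];
}

console.log(parseRouterOutput("exa, hubspot")); // → ["exa", "hubspot"]
console.log(parseRouterOutput("unsure").length); // → 6 (fallback fires everything)
```

The fail-open fallback is the important design choice: a misbehaving classifier degrades to the current cost profile, never to a missing answer.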


2. Rich Responses: Streaming + Block Kit

Description: Replace single-message text-only responses with Slack’s Streaming API showing Thinking Steps as the assistant works, and compose deal lists, pipeline summaries, and ticket prompts as interactive Block Kit components (Cards, tables, confirmation modals). A 50-deal pipeline becomes a rich table with clickable HubSpot links; multi-step reasoning becomes visible Task Cards.

Warrant: direct: messages.ts:28 is text.trim().slice(0, 35000) — no blocks, no actions, no rich formatting. assistant.ts:101,230 always uses chat.update with text: only. external: Slack’s 2026 API provides streaming, Thinking Steps, Card/Carousel blocks as native primitives.

Rationale: Streaming transforms “why is this taking so long?” into “oh, it’s searching HubSpot and transcripts.” Block Kit turns raw text data into actionable, clickable information. UX transformation that shifts the assistant from “neat” to “I use this 5x/day.”

Downsides: Streaming API + Bolt.js wiring may need workarounds; Block Kit adds format complexity; don’t over-block simple responses.

Confidence: 80% | Complexity: Medium-High | Status: Unexplored
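A minimal sketch of the Block Kit half, assuming a hypothetical Deal shape with a hubspotUrl field. Block Kit blocks are plain JSON, so the rendering logic can be built and tested apart from the Slack client.

```typescript
// Hypothetical deal shape; the real one comes from the HubSpot integration.
interface Deal {
  name: string;
  stage: string;
  amount: number;
  hubspotUrl: string;
}

// Render a deal list as Block Kit blocks: a header block, then one
// mrkdwn section per deal with a clickable HubSpot link (<url|label>).
function dealBlocks(deals: Deal[]): object[] {
  return [
    {
      type: "header",
      text: { type: "plain_text", text: `Pipeline (${deals.length} deals)` },
    },
    ...deals.map((d) => ({
      type: "section",
      text: {
        type: "mrkdwn",
        text: `*<${d.hubspotUrl}|${d.name}>* · ${d.stage} · $${d.amount.toLocaleString("en-US")}`,
      },
    })),
  ];
}
```

The array would be passed as the `blocks` field of a chat.postMessage/chat.update call, keeping a plain-text `text` fallback alongside it for notifications.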


3. Universal Approval & Feedback Loop

Description: Generalize the existing 👍-reaction-to-ticket pattern into a universal propose-then-commit system. Every response carries feedback buttons (✅/❌ via Slack’s feedback_buttons). Any integration proposing an action follows the same approve-before-execute contract. Negative feedback feeds into a weekly prompt improvement analysis.

Warrant: direct: Linear ticket 👍 reaction gate (assistant.ts:289-317) proves the pattern works. Only ticket creation has approval; all other responses have zero feedback. external: Claude Lab’s Red/Yellow/Green classification; Slack’s feedback_buttons block element is table stakes for AI bots.

Rationale: Without feedback, the assistant is a black box — no one knows if it’s improving. A feedback loop creates a quality signal that feeds into evals, prompt tuning, and trust.

Downsides: Approval fatigue risk; requires review discipline; negative feedback volume may be too low initially.

Confidence: 90% | Complexity: Low | Status: Unexplored
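The approve-before-execute contract can be sketched as a tiny state machine. Proposal, approve, and every field name here are hypothetical; the shape just makes the "timeout layer" from the HITL patterns concrete.

```typescript
// A proposed action waiting for human approval. Nothing executes until
// status reaches "approved"; stale proposals auto-expire.
type ProposalStatus = "proposed" | "approved" | "rejected" | "expired";

interface Proposal {
  id: string;
  action: string;      // e.g. "create_linear_ticket" (illustrative)
  payload: unknown;    // the data the action would commit
  status: ProposalStatus;
  expiresAt: number;   // epoch ms; the approval timeout layer
}

// Only a pending, unexpired proposal may transition to approved;
// every other state is terminal, so double-clicks are harmless.
function approve(p: Proposal, now: number): Proposal {
  if (p.status !== "proposed") return p;
  if (now > p.expiresAt) return { ...p, status: "expired" };
  return { ...p, status: "approved" };
}
```

The existing 👍-to-ticket gate already is this machine for one action type; generalizing means every integration returns a Proposal instead of executing directly.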


4. Prompt Polymorphism (Composable System Prompts)

Description: Replace the single static systemPrompt string with a library of composable prompt fragments assembled at query time based on which integrations returned data. CRM context → deal-formatting instructions. Transcript context → summarization instructions. Achieves specialist behavior without multi-agent infrastructure.

Warrant: direct: assistant.ts:30-38 — a single hardcoded systemPrompt applied to every query. ROADMAP.md:223 defers multi-agent to Phase 3. Prompt polymorphism captures specialist value at the prompt-engineering level, shippable in days.

Rationale: Specialization is the value of multi-agent, not the agent count. Creates a natural test matrix: does the CRM variant improve deal accuracy? Shippable in days, not quarters.

Downsides: Risk of prompt bloat; requires A/B testing per variant; doesn’t solve tool-selection (that’s LLM Router).

Confidence: 75% | Complexity: Low | Status: Unexplored
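A sketch of fragment assembly, with an invented FRAGMENTS library keyed by integration name — the fragment texts are placeholders, not real prompt content.

```typescript
// Hypothetical fragment library: one base prompt plus one specialist
// fragment per integration.
const FRAGMENTS: Record<string, string> = {
  base: "You are the Brainforge assistant.",
  hubspot: "Format deals as: name, stage, amount, close date.",
  transcripts: "Summarize meeting context in 3 bullets with speaker attribution.",
};

// Assemble the system prompt from the base plus one fragment per
// integration that actually returned data; unknown keys are ignored.
function buildSystemPrompt(returnedIntegrations: string[]): string {
  const keys = ["base", ...returnedIntegrations.filter((k) => k !== "base" && k in FRAGMENTS)];
  return keys.map((k) => FRAGMENTS[k]).join("\n\n");
}
```

Because assembly happens after the parallel fetch resolves, the prompt specializes to what the query actually touched — no router or extra LLM call required.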


5. Structured Context Engine

Description: Replace parts.join('\n\n---\n\n') context concatenation with structured, scored, and token-budgeted assembly. Each integration result classified by relevance, deduplicated, assigned a priority bucket, and formatted with type markers. Irrelevant results dropped; citations surfaced as structured metadata.

Warrant: direct: assistant.ts:181-205 — all context flattened into one string with no scoring, dedup, or budget. Individual integrations already return structured results (hubspot.ts:219-223). external: RAG frameworks demonstrate chunking + relevance scoring improves LLM accuracy 20-40%.

Rationale: Structured, scored context means higher-quality answers with fewer hallucinations. Citations that are currently computed but lost in concatenation become visible. Cheapest architectural improvement: change the format, not the flow.

Downsides: Adds classification step per invocation; requires per-integration relevance heuristics; token budgeting is non-trivial.

Confidence: 80% | Complexity: Low-Medium | Status: Unexplored
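One way to sketch the scored, budgeted assembly. The chars/4 token estimate and the 0.3 relevance floor are placeholder assumptions, as are the ContextItem shape and the `[source]` type markers.

```typescript
// One scored result from an integration; score in [0, 1] would come
// from a per-integration relevance heuristic.
interface ContextItem {
  source: string; // e.g. "hubspot", "transcripts"
  text: string;
  score: number;
}

// Keep highest-scoring items until the token budget is spent; drop
// anything below the relevance floor. Replaces the flat join with a
// bounded, prioritized context string carrying type markers.
function assembleContext(items: ContextItem[], tokenBudget: number, minScore = 0.3): string {
  let used = 0;
  const kept: string[] = [];
  for (const item of [...items].sort((a, b) => b.score - a.score)) {
    const cost = Math.ceil(item.text.length / 4); // crude token estimate
    if (item.score < minScore || used + cost > tokenBudget) continue;
    used += cost;
    kept.push(`[${item.source}] ${item.text}`);
  }
  return kept.join("\n\n");
}
```

The flow is unchanged — integrations still resolve in parallel — only the final formatting step gains scoring, a floor, and a budget.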


6. Proactive Pulse System (Event-Driven, Channel-Opt-In)

Description: Evolve from purely reactive (@mention/DM) to selectively proactive using an event-driven, channel-opt-in model. Channel admins pin an "Assistant: watch" message. The assistant surfaces unresponded threads, deal stage changes, and Monday-morning pipeline snapshots. Configurable per channel, kill-switchable, bounded — not ambient everywhere.

Warrant: direct: assistant.ts:269-287 — only listens for app_mention and message.im. Roadmap Phase 1 #2 (Reminders) + Phase 2 #5 (Proactive Habits). external: AgentWatch dual-trigger architecture; AuraHQ event-driven channel monitoring with tiered filters.

Rationale: Opt-in per channel makes proactivity safe, testable, and reversible. Proactive delivery solves adoption by making the assistant unavoidably useful — it shows up where you already are.

Downsides: Higher operational complexity (scheduling, event listeners); notification fatigue risk if thresholds aren’t tuned; requires per-channel config discipline.

Confidence: 75% | Complexity: High | Status: Unexplored
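The opt-in gate might look like this. ChannelConfig, mayPost, the quiet-hours shape, and the kill-switch parameter are all assumptions, not existing code — the sketch just shows the "bounded, reversible" property as a single predicate every proactive post must pass.

```typescript
// Hypothetical per-channel opt-in config, toggled by the pinned
// "Assistant: watch" message.
interface ChannelConfig {
  watch: boolean;                 // thread/deal-change alerts
  digests: boolean;               // scheduled pipeline snapshots
  quietHours?: [number, number];  // UTC hours [start, end), non-wrapping
}

// Every proactive post runs through this gate: global kill switch first,
// then channel opt-in for that kind of post, then quiet hours.
function mayPost(
  cfg: ChannelConfig | undefined,
  kind: "watch" | "digests",
  hourUtc: number,
  killSwitch = false, // would be fed from an env var or admin command
): boolean {
  if (killSwitch || !cfg || !cfg[kind]) return false;
  if (cfg.quietHours) {
    const [start, end] = cfg.quietHours;
    if (hourUtc >= start && hourUtc < end) return false;
  }
  return true;
}
```

Defaulting to silence (no config, no post) is what makes the rollout reversible: unpinning the watch message, or flipping the kill switch, stops everything.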


7. Persistent Conversation Memory

Description: Transform the interaction log from write-only to read-back. Every conversation stored with full context becomes retrievable: “What was that deal I asked about last Tuesday?” Beyond recall, builds a lightweight user model: pipeline-by-client preference, bullet-point format adaptation.

Warrant: direct: assistant.ts:233-246 — fireAndForgetLog writes to Supabase but nothing reads back. state.ts:20-38 — memory store evaporates on restart. external: ChatGPT, Claude, Gemini demonstrate cross-session memory as the #1 retention feature.

Rationale: Every conversation today is like meeting a stranger. The first 30% of every interaction is re-establishing context. Persistent memory makes the assistant feel like a teammate, not a search box.

Downsides: Requires schema migration; privacy concerns (who can query whose history?); risk of surfacing stale past answers.

Confidence: 70% | Complexity: Medium | Status: Unexplored
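A sketch of the read-back path over a hypothetical in-memory log — the real store is Supabase, and LogEntry and recall are invented names. It shows the minimal query that turns the write-only log into "what was that deal I asked about last Tuesday?".

```typescript
// Shape of one logged interaction; the real schema is whatever
// fireAndForgetLog currently writes.
interface LogEntry {
  userId: string;
  ts: number; // epoch ms
  query: string;
  answer: string;
}

// Return a user's most recent interactions mentioning a term, newest
// first. Scoping to userId is the minimum privacy boundary: you can
// only recall your own history.
function recall(log: LogEntry[], userId: string, term: string, limit = 3): LogEntry[] {
  const needle = term.toLowerCase();
  return log
    .filter((e) => e.userId === userId && `${e.query} ${e.answer}`.toLowerCase().includes(needle))
    .sort((a, b) => b.ts - a.ts)
    .slice(0, limit);
}
```

In production this becomes a SQL/vector query against the interaction table plus a time filter, but the contract — user-scoped, term-matched, recency-ordered — stays the same.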

Rejection Summary

| # | Idea | Reason Rejected |
|---|------|-----------------|
| 1 | Knowledge-Base RAG (standalone) | Already roadmap Phase 2 #4; only novel angle was implementation specifics |
| 2 | Multi-Agent Router | Already roadmap Phase 3 #8; revisit when Phase 3 starts |
| 3 | Client-Facing Gateway Mode | Subject-replacement: ROADMAP.md explicitly says internal-only is out of scope |
| 4 | Semantic Thread Collision Detection | Too expensive relative to value; requires cross-channel vector infra |
| 5 | Decision Collision Warning | Too expensive; depends on collision detection |
| 6 | Information Arbitrage Agent | Depends on RAG being in place first |
| 7 | Anticipatory Cueing | High complexity; better as brainstorm variant after Proactive Pulse ships |
| 8 | Assisted Recall Retrospective | Niche; insufficient signal to warrant Phase 1/2 prioritization |
| 9 | Attention Decay Scoring | Better as brainstorm variant within Reminders roadmap item |
| 10 | Triage Card Protocol | Adds complexity before basic capabilities are solid |
| 11 | Self-Service Integration Onboarding | High complexity (MCP dependency); not warranted at current integration count |
| 12 | Selective Integration Firing | Duplicates LLM Router |
| 13 | Scheduled Linear Standup Digest | Duplicates Proactive Pulse; subset of same capability |
| 14 | Linear Ticket to PR Loop | Niche; better as brainstorm variant |
| 15 | Convergent Folgezettel | Too speculative without interaction log queryability |
| 16 | Open-Source / BYO-Model Mode | Below meeting-test threshold; dev-only concern |
| 17 | Channel-Pinned Context (standalone) | Merged into Proactive Pulse System |
| 18 | Conversational Queryability (standalone) | Merged into Persistent Conversation Memory |
| 19 | Eval-Driven Development Pipeline | Merged into Approval & Feedback Loop (feedback data IS the eval signal) |