Agentic Workflow for Data Products & Analyses
Translating AI-enabled PM best practices into reusable data consulting templates
Philosophy: From Vibe Coding to Vibe Analytics
The Core Insight:
“Effective AI coding is rooted in good planning—not jumping into building.”
Translation for Data:
“Effective AI-powered data work is rooted in good problem framing—not jumping into querying.”
The 7 Best Practices → Data Work Translation
1. Structured Planning Over Eager Execution
Original PM Practice:
Create issue in Linear → expand idea with AI → build detailed plan → execute → multiple code reviews using different AI models
Data Work Translation:
📋 Data Request → 🔍 Problem Discovery → 📐 Analysis Plan →
🛠️ Execute Analysis → 👥 Peer Review (Multi-Model) →
📚 Document Insights → 🎓 Learning Capture
Why This Matters for Data:
- AI tools (Claude, ChatGPT, Cursor) are eager to write SQL/Python immediately
- Without planning, you get:
- Wrong metrics (measuring the wrong thing)
- Bad assumptions (didn’t validate data quality)
- Technical debt (unmaintainable queries)
- Misaligned insights (solving wrong problem)
The Problem:
❌ BAD WORKFLOW:
User: "Show me customer churn"
AI: *immediately writes complex SQL with assumptions*
Result: Wrong definition of churn, bad joins, unusable query
✅ GOOD WORKFLOW:
User: "Show me customer churn"
AI (with structured prompt): "Let me understand the problem first:
- How do you define 'churn'? (canceled, inactive 90 days, etc.)
- What time period are we analyzing?
- What segments do we care about?
- What decision will this inform?
- What's our source of truth for customer data?"
Result: Aligned metric, validated assumptions, maintainable solution
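To make the stakes concrete, here is a minimal SQL sketch (table and column names are hypothetical, and both columns are assumed to be DATEs) of how two common churn definitions turn into very different queries:

```sql
-- Hypothetical schema: subscriptions(customer_id, canceled_at, last_active_at)

-- Definition A: churned = explicitly canceled during Q4
SELECT COUNT(DISTINCT customer_id) AS churned_customers
FROM subscriptions
WHERE canceled_at BETWEEN '2025-10-01' AND '2025-12-31';

-- Definition B: churned = no activity in the last 90 days
SELECT COUNT(DISTINCT customer_id) AS churned_customers
FROM subscriptions
WHERE last_active_at < DATE_SUB(CURRENT_DATE(), INTERVAL 90 DAY);
```

Same request, two different populations. That's why the definition questions come before the query.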
2. Slash Commands = Reusable Data Workflows
Original PM Practice:
Slash commands (/create issue, /explore, /create plan, /execute, /peer review) encode best practices. When Claude makes a mistake, ask what caused it, then update the command.
Data Work Translation:
Core Data Slash Commands:
- /intake - Capture data request quickly
- /discover - Deep problem exploration with stakeholder
- /plan_analysis - Create analysis plan (metrics, methods, assumptions)
- /execute_analysis - Build queries/code with best practices
- /peer_review_analysis - Multi-model code review
- /document_insights - Create stakeholder-ready deliverable
- /learning_moment - Extract teaching from mistakes
- /deslop_analysis - Remove AI-generated verbosity
Example Slash Command Structure:
# /discover
You are a senior data analyst helping scope a data request.
## Context
- User is mid-sprint and thought of a data need
- We need to deeply understand the problem before building
- Challenge assumptions, don't be a people pleaser
## Your Role
1. Understand the business problem (not the data request)
2. Clarify what decision this informs
3. Ask about:
- Metric definitions
- Time periods
- Segments of interest
- Data quality concerns
- Success criteria
4. Identify if this is:
- Ad hoc exploration (do once)
- Recurring report (needs automation)
- Strategic analysis (needs deep dive)
## Output Format
- TLDR of the request
- Business context (what decision does this inform?)
- Open questions (what we need to clarify)
- Proposed approach (how we'd tackle this)
- Risks/assumptions (what could go wrong)
## Tone
- Curious, not judgmental
- Challenge vague requests
- Push for clarity on "why"
- Don't assume you know what they mean

How to Build Your Slash Command Library:
- Start with base templates (see below)
- Every time AI makes a mistake:
- Ask: “What in your system prompt caused this error?”
- Update the slash command to prevent it
- Document the fix in your knowledge base
- Iterate weekly:
- What requests came in this week?
- What patterns emerged?
- What can be templated?
The Compounding Effect:
- Week 1: Basic /discover command
- Week 5: /discover now includes 12 common pitfalls to avoid
- Week 10: /discover knows your company's metric definitions
- Week 20: /discover is a battle-tested framework that prevents 90% of misalignment issues
3. Learning Faster > Analyzing Faster
Original PM Practice:
Use a "learning opportunity" slash command that says "I'm a technical PM in the making with mid-level engineering knowledge; explain what you did." It turns every bug into a teaching moment.
Data Work Translation:
The /learning_moment Command:
# /learning_moment
You are teaching a data analyst who wants to become a data engineering expert.
## Student Profile
- Strong business/product sense
- Mid-level SQL knowledge
- Basic Python (can read, can't architect)
- Understands statistics conceptually
- Learning data engineering, system design, advanced analytics
## Teaching Approach
1. Use 80/20 rule (core concepts, skip edge cases)
2. Explain WHY not just HOW
3. Connect to business impact
4. Use analogies from non-technical world
5. Highlight common pitfalls
6. Recommend what to learn next
## When to Use This
- Just fixed a complex data quality issue → explain root cause
- Built sophisticated query → explain query optimization
- Used new statistical method → explain when/why to use it
- Made architectural decision → explain trade-offs
## Output Format
- **What we did:** Plain English summary
- **Why this approach:** Decision rationale
- **How it works:** Technical explanation (not too deep)
- **Common mistakes:** What people get wrong
- **When to use:** Generalizable pattern
- **Further learning:** What to explore next

Example Usage:
You: [After Claude writes complex window function query]
/learning_moment - explain this window function approach
Claude: **What we did:**
We used a window function to calculate 30-day rolling retention
without doing a self-join, which would be much slower.
**Why this approach:**
Window functions let you do calculations across rows without
collapsing them (like GROUP BY does). This means one pass through
the data instead of multiple passes.
**How it works:**
- PARTITION BY: Groups data (like GROUP BY)
- ORDER BY: Defines sequence within each group
- ROWS BETWEEN: Defines the "window" to look at
**Common mistakes:**
- Forgetting ORDER BY (gives unpredictable results)
- Using window function when GROUP BY would be clearer
- Not considering performance on large datasets
**When to use:**
- Running totals
- Moving averages
- Ranking within groups
- Comparing row to previous/next row
**Further learning:**
- Read about different window frame types (ROWS vs RANGE)
- Practice on leetcode SQL problems
- Learn about query execution plans
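For reference, here is a minimal sketch of the kind of query being explained above, simplified to a 30-day rolling average over a hypothetical daily_activity table (the actual retention query isn't shown here):

```sql
-- Hypothetical table: daily_activity(segment, activity_date, active_users)
-- Assumes exactly one row per segment per day.
SELECT
  segment,
  activity_date,
  active_users,
  AVG(active_users) OVER (
    PARTITION BY segment                       -- one window per segment
    ORDER BY activity_date                     -- sequence within each window
    ROWS BETWEEN 29 PRECEDING AND CURRENT ROW  -- this row plus the 29 prior days
  ) AS rolling_30d_avg
FROM daily_activity;
```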
The Learning Multiplier Effect:
Instead of just shipping analyses, you’re also:
- Building deep technical understanding
- Creating mental models for future problems
- Developing intuition for data quality issues
- Learning when to use advanced techniques vs. simple solutions
4. Graduated Tool Adoption (Exposure Therapy)
Original PM Practice:
Start with ChatGPT projects (comfortable UI) → graduate to Lovable/Bolt (low-code) → move to Cursor (full IDE). Don’t jump straight to intimidating tools.
Data Work Translation:
The Learning Ladder for AI-Powered Data Work:
Level 1: ChatGPT Projects (Comfortable UI)
├─ Use: Problem framing, metric definitions, SQL review
├─ Goal: Get comfortable asking AI for help
└─ Duration: 1-2 weeks
Level 2: Claude Projects + Artifacts (Interactive)
├─ Use: Write SQL queries, create visualizations, draft analyses
├─ Goal: See AI generate code in real-time, learn to iterate
└─ Duration: 2-4 weeks
Level 3: Cursor + Claude (Light Mode)
├─ Use: Query IDE with AI assistance, basic Python scripts
├─ Goal: Work in code editor but with lots of guardrails
└─ Duration: 1-2 months
Level 4: Cursor + Multiple Models (Dark Mode)
├─ Use: Full data engineering workflows, complex pipelines
├─ Goal: Orchestrate multiple AI models for different tasks
└─ Duration: Ongoing mastery
Level 5: Agentic Workflows
├─ Use: AI agents autonomously execute entire analyses
├─ Goal: Strategic oversight while AI handles execution
└─ Duration: Future state (but starting to be possible)
Phase 1: ChatGPT Project Setup
Create a “Data CTO” Project:
Custom Instructions:
You are the Chief Data Officer of a data consulting firm.
Your role:
- You own ALL technical decisions about data
- Challenge my thinking (don't be a people pleaser)
- Explain trade-offs clearly
- Push back on vague requests
- Teach me to be a better data analyst
My role:
- I own the business problem
- I own stakeholder relationships
- I define success criteria
- You own how we technically solve it
When I ask for analysis:
1. First, understand the business problem (not just data request)
2. Ask clarifying questions
3. Propose 2-3 approaches with trade-offs
4. Only then build the solution
When you review my work:
- Be direct about mistakes
- Explain why something is wrong
- Show me the better way
- Help me learn, don't just fix it
Knowledge Base:
[Upload: company metric definitions, data dictionary, common queries]

Phase 2: Claude Artifacts for Analysis
- Use Claude’s Artifacts to draft SQL queries
- Iterate in real-time (see query update as you chat)
- Export to your SQL client when ready
- Great for: exploratory analysis, one-off queries
Phase 3: Cursor for Production Work
- Start with Light Mode (less scary)
- Use Composer for full file edits
- Use Chat for questions
- Gradually move to Dark Mode terminal work
Phase 4: Multi-Model Orchestration
- Claude for planning & complex logic
- GPT-4 for peer review
- Gemini for visualization/dashboard design
- Each model has strengths - use them strategically
5. Slop is a People Problem, Not an AI Problem
Original PM Practice:
If you use AI to generate outputs and ship without review, that’s human error. Own every output completely. Guide AI with context on your writing style, problem, constraints.
Data Work Translation:
The Anti-Slop Framework for Data:
❌ SLOP:
User: "Analyze customer churn"
AI: *generates 500-line SQL query with 15 CTEs*
User: *copies into Slack* "Here's the churn analysis"
Result:
- Stakeholder doesn't understand query
- Assumptions aren't validated
- Results are wrong (but look impressive)
- No one can maintain this
✅ NO SLOP:
User: "Analyze customer churn"
AI: *generates query*
User:
1. Reviews logic line by line
2. Validates assumptions against data dictionary
3. Tests edge cases
4. Runs on small sample first
5. Compares to known baseline
6. Simplifies query (removes unnecessary complexity)
7. Adds comments explaining business logic
8. Documents in analysis plan
9. Peer reviews with another model
10. THEN shares with stakeholder
Result:
- Correct analysis
- Maintainable code
- Clear documentation
- Stakeholder trust
The Ownership Principle:
“If I share an analysis with a stakeholder and it’s wrong, that’s MY mistake, not Claude’s.”
How to Own Your AI-Generated Work:
1. Pre-Flight Checklist (Before Running Query)
## Query Review Checklist
Business Logic:
- [ ] Metric definition matches stakeholder understanding
- [ ] Time period is correct
- [ ] Segments/filters align with business rules
- [ ] Exclusions are documented
Technical Correctness:
- [ ] Joins are correct (INNER vs LEFT vs OUTER)
- [ ] No duplicate rows (test with COUNT vs COUNT DISTINCT)
- [ ] NULL handling is intentional
- [ ] Date logic accounts for timezones
- [ ] Window functions have correct partitions
Data Quality:
- [ ] Source tables are trusted/validated
- [ ] Known data issues are handled
- [ ] Sample size is sufficient
- [ ] Outliers are investigated, not just filtered
Performance:
- [ ] Query won't timeout on production data
- [ ] Indexes are used appropriately
- [ ] Not scanning unnecessary data
Maintainability:
- [ ] CTEs have clear names
- [ ] Complex logic has comments
- [ ] Someone else could understand this in 6 months
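Here is what the duplicate-row check from the list above looks like in practice (a sketch; final_output and customer_id are hypothetical names):

```sql
-- Fan-out test: total rows vs. distinct keys
SELECT
  COUNT(*) AS total_rows,
  COUNT(DISTINCT customer_id) AS distinct_customers,
  COUNT(*) - COUNT(DISTINCT customer_id) AS duplicate_rows  -- expect 0
FROM final_output;

-- If duplicate_rows > 0, find which keys a join fanned out
SELECT customer_id, COUNT(*) AS copies
FROM final_output
GROUP BY customer_id
HAVING COUNT(*) > 1
ORDER BY copies DESC
LIMIT 20;
```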
2. Results Validation

## Results Sanity Check
Does this pass the "smell test"?
- [ ] Order of magnitude feels right
- [ ] Trends match known business patterns
- [ ] Totals reconcile to source data
- [ ] Segments add up correctly
- [ ] No unexpected nulls/zeros
Compare to baselines:
- [ ] How does this compare to last month/quarter/year?
- [ ] Are there known events that explain changes?
- [ ] Do other teams have similar numbers?
Edge case testing:
- [ ] What happens with test accounts?
- [ ] What happens with refunds/cancellations?
- [ ] What happens at month boundaries?

3. The "Explain to a 5-Year-Old" Test
Before sharing ANY analysis:
- Can you explain the metric in plain English?
- Can you explain why the number changed?
- Can you defend your assumptions?
- Can you explain what data is excluded and why?
If you can’t explain it simply, you don’t understand it well enough.
4. Context-Rich Prompting
❌ Lazy Prompting:
"Write SQL to calculate MRR"
✅ Context-Rich Prompting:
"Write SQL to calculate MRR for our SaaS product.
Context:
- We define MRR as: recurring subscription revenue normalized to monthly
- One-time charges should be excluded
- Refunds should reduce MRR in the month they occur
- Customers can have multiple subscriptions
- We bill monthly and annually (annual should be divided by 12)
- Source table: subscriptions (has plan_type, billing_frequency, amount)
- Known issues: Some test accounts have is_test = true flag
Requirements:
- Break down by plan tier (starter, pro, enterprise)
- Show month-over-month change
- Include new MRR, expansion, contraction, churn
- Results should reconcile to finance team's numbers
Output should:
- Use clear CTE names
- Comment any complex logic
- Be readable by someone who isn't a SQL expert"
Result: AI generates correct, maintainable, aligned solution.
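For illustration, here is a minimal sketch of the core normalization step that prompt should produce (refunds and the new/expansion/contraction/churn breakdown are omitted; billing_month and the literal billing_frequency values are assumptions beyond what the prompt states):

```sql
WITH normalized AS (
  SELECT
    billing_month,
    plan_type,
    -- Normalize annual billing to a monthly recurring amount
    CASE
      WHEN billing_frequency = 'annual'  THEN amount / 12.0
      WHEN billing_frequency = 'monthly' THEN amount
    END AS mrr
  FROM subscriptions
  WHERE is_test = FALSE                             -- exclude test accounts
    AND billing_frequency IN ('monthly', 'annual')  -- exclude one-time charges
)
SELECT
  billing_month,
  plan_type,  -- starter / pro / enterprise
  SUM(mrr) AS mrr
FROM normalized
GROUP BY billing_month, plan_type
ORDER BY billing_month, plan_type;
```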
6. 10x Learner > 10x Analyst
Original PM Practice:
After a failed product review at Wix, he realized the team expected him to learn rapidly, not to have all the answers. He identified each teammate's strength and used them as specialized mentors.
Data Work Translation:
The Learning-First Mindset for Data:
Reframe: Junior Data Analyst Expectations
❌ What Juniors Think Teams Want:
- Know all the answers
- Never make mistakes
- Be the smartest person in the room
- Ship perfect analyses on first try
✅ What Teams Actually Want:
- Learn rapidly from feedback
- Ask clarifying questions upfront
- Own your mistakes and improve
- Turn failures into learning moments
- Make the team better through curiosity
The Specialized Mentor Framework:
Instead of trying to learn everything from everyone, identify:
1. The Metrics Expert
- Person who deeply understands business metrics
- Use them for: Validating definitions, understanding “why” behind metrics
- Learning opportunities: Metric design, stakeholder alignment
2. The SQL Wizard
- Person who writes elegant, performant queries
- Use them for: Query optimization, data modeling questions
- Learning opportunities: Advanced SQL, query plans, indexing
3. The Statistics Brain
- Person who knows when to use regression vs. cohort analysis
- Use them for: Method selection, interpreting results
- Learning opportunities: When to use which statistical test
4. The Storyteller
- Person who turns data into compelling narratives
- Use them for: Presentation feedback, insight framing
- Learning opportunities: Data communication, executive summaries
5. The Systems Thinker
- Person who sees second/third order effects
- Use them for: Impact assessment, unintended consequences
- Learning opportunities: Strategic thinking, trade-off analysis
How to Leverage Specialists (Without Being Annoying):
❌ BAD:
"Hey can you review my entire analysis?"
✅ GOOD:
"Hey [Metrics Expert], quick question on our churn definition.
I'm seeing different numbers depending on whether I use
'subscription_canceled_at' vs 'last_payment_date + 30 days'.
Which one aligns with how Finance calculates churn?
Context: Building Q1 retention dashboard for ELT.
Timeline: Need to align by Friday."
Result:
- Specific ask
- Shows you tried to figure it out
- Clear context
- Respects their time
Turn Failures Into Shared Wins:
When you mess up an analysis:
❌ Bad Response:
"Sorry, I'll fix it."
*Fixes alone*
*Shares fixed version*
✅ Good Response:
"I messed up - I used the wrong churn definition.
Before I fix it, can I walk through my thinking with you?
I want to understand where my mental model was wrong so
I don't make this mistake again.
[10 min meeting]
[Fix analysis]
[Document learning in /learning_moment]
[Update your knowledge base]
Result: Next time, you don't make this mistake.
And you showed you're a 10x learner.
The Feedback Loop:
1. Do Analysis → 2. Get Feedback → 3. Update Mental Model →
4. Document Learning → 5. Share with Team → 6. Improve Next Time
Each failure becomes a teaching moment. Each teaching moment becomes shared knowledge. Your growth becomes team growth.
7. AI Makes Junior Analysts More Valuable
Original PM Practice:
Junior PMs can now handle strategy, marketing, messaging, and full product implementation—responsibilities typically reserved for senior roles. These reps accelerate learning.
Data Work Translation:
The Expanded Scope for AI-Enabled Data Analysts:
Traditional Junior Analyst:
- Run queries senior analysts write
- Make dashboards from specs
- Answer tactical questions
- Limited to execution
AI-Enabled Junior Analyst:
- Design entire analysis from scratch
- Build production data pipelines
- Advise on metric strategy
- Own end-to-end data products
What’s Now Possible:
| Traditional Junior Role | AI-Enabled Role |
|---|---|
| "Pull this metric" | "Here's the metric strategy for this product" |
| "Make this dashboard" | "Here's a full self-serve analytics platform" |
| "Run this SQL query" | "Here's the data architecture we should build" |
| "Summarize these results" | "Here's the strategic recommendation with supporting analysis" |
The Reps Advantage:
Without AI:
- Build 1 dashboard → Learn dashboard design
- 1 month timeline
- 12 dashboards per year
- Limited scope (can only do what you’re assigned)
With AI:
- Build 1 dashboard (2 days)
- Build experimentation framework (1 week)
- Build attribution model (1 week)
- Build customer segmentation (1 week)
- Build LTV prediction model (2 weeks)
- Build self-serve analytics tool (3 weeks)
- Total: 2 months, 6 major projects
- Learned: Dashboard design, experiment design, attribution, ML, self-serve tools
- Result: 6x more reps and 10x broader skills in the same timeframe
The Constraint Isn’t Ideas or Technical Ability Anymore:
Old Constraints:
- ❌ “I don’t know how to code”
- ❌ “I don’t know how to build dashboards”
- ❌ “I don’t know how to do statistics”
New Constraint:
- ✅ Curiosity (Are you willing to learn?)
- ✅ Judgment (Can you tell good analysis from bad?)
- ✅ Communication (Can you tell the story?)
- ✅ Business sense (Are you solving the right problem?)
These are LEARNABLE SKILLS that AI accelerates.
The Complete Agentic Data Workflow
Now let’s put it all together into a reusable system.
Workflow Overview
Request Intake → Problem Discovery → Analysis Planning →
Execution → Multi-Model Review → Documentation → Learning Capture
Phase 1: Request Intake (5 minutes)
Tool: Linear, Notion, or Claude Chat
Slash Command: /intake
Goal: Quickly capture request without losing flow
# /intake
You are mid-analysis and someone just asked for data.
Capture this FAST so you can keep working.
Ask 3 questions:
1. What's the data request?
2. What decision does this inform?
3. When do they need it?
Create Linear ticket with:
- Title: [Clear description of request]
- Context: [Business problem]
- Priority: [High/Medium/Low based on timeline]
- Labels: [ad-hoc / recurring / strategic]
Keep it under 2 minutes.

Example:
You: /intake - Mitesh wants to know affiliate vs. paid search ROI
Claude: Got it. Three quick questions:
1. What's the core question? (Comparing channels for budget allocation?)
2. What's the decision? (Where to shift Q2 spend?)
3. When needed? (Next week's planning meeting?)
You: Yes to all
Claude: Created Linear ticket STU-89:
"Compare LTV:CAC for affiliates vs. paid search channels to inform Q2 budget allocation. Due: Friday for planning meeting."
Phase 2: Problem Discovery (30-60 minutes)
Tool: Claude Projects or Cursor Chat
Slash Command: /discover [Linear ticket ID]
Goal: Deeply understand problem before touching data
# /discover
You are a senior data strategist helping scope an analysis.
## Your Role
1. Understand the BUSINESS problem (not just data request)
2. Challenge vague requests
3. Identify assumptions
4. Propose 2-3 approaches
5. Surface risks early
## Discovery Questions
**Business Context:**
- What decision does this inform?
- What's the hypothesis?
- What would "success" look like?
- Who's the audience? (ELT, team lead, self?)
**Metric Definitions:**
- How do we define [key metric]?
- What's included/excluded?
- What time period matters?
- What segments do we care about?
**Data Landscape:**
- What's our source of truth?
- Any known data quality issues?
- What's already been analyzed in this area?
- Can we repurpose existing work?
**Success Criteria:**
- What level of precision matters? (directional vs. exact)
- What format? (dashboard, slide, number)
- How will this be used? (one-time vs. ongoing)
## Output Format
**Problem Statement:**
[One paragraph: business context, decision to inform, success criteria]
**Proposed Approach:**
[2-3 options with trade-offs]
**Risks & Assumptions:**
[What could go wrong, what we're assuming]
**Open Questions:**
[What we need to clarify before starting]
**Next Steps:**
[What happens next]

Example Output:
**Problem Statement:**
Marketing team needs to compare affiliate vs. paid search channels
to decide Q2 budget allocation. Current spend is 50/50 but we suspect
affiliates have better LTV:CAC. Need directional answer by Friday
to inform $200K budget decision.
**Proposed Approach:**
Option 1: Full LTV:CAC Analysis (2 days)
- Calculate true LTV (cohort-based, 180-day)
- Include all costs (platform fees, commissions)
- Show payback curves
- Pro: Most accurate, defensible
- Con: Time-intensive, may have data gaps
Option 2: Simplified ROI Comparison (4 hours)
- 30-day revenue per customer by channel
- Basic CAC (ad spend / new customers)
- Pro: Fast, good enough for directional decision
- Con: Doesn't capture long-term value
Option 3: Repurpose Existing + Update (1 day)
- Use December LTV:CAC analysis
- Update with Jan data
- Add affiliate channel (new)
- Pro: Builds on validated work
- Con: May miss recent changes
**Recommended: Option 3**
Fastest path to defensible answer.
**Risks & Assumptions:**
- Assuming affiliate attribution is accurate (recent improvements)
- Assuming 30-day LTV is proxy for long-term (validate with cohorts)
- May miss seasonal effects (Dec ≠ Jan behavior)
**Open Questions:**
1. Do we have affiliate CAC broken out (or bundled with other costs)?
2. Is Mitesh okay with 30-day LTV proxy?
3. Do we need breakdown by affiliate partner?
**Next Steps:**
1. Confirm approach with Mitesh (15 min)
2. If approved, create analysis plan
3. Execute (1 day)
4. Review + document
Phase 3: Analysis Planning (15-30 minutes)
Tool: Cursor with markdown file
Slash Command: /plan_analysis
Goal: Create executable plan before writing queries
# /plan_analysis
Create a detailed analysis plan that serves as:
1. Roadmap for execution
2. Contract with stakeholder
3. Documentation for future
## Plan Structure
### TLDR
[One sentence: what we're analyzing and why]
### Business Context
- Problem we're solving
- Decision this informs
- Success criteria
### Metrics & Definitions
- Primary metrics (with formulas)
- Segments/filters
- Time periods
- Exclusions
### Data Sources
- Tables/datasets
- Known quality issues
- Assumptions we're making
### Methodology
- Step-by-step approach
- Key analyses (with expected output)
- Validation steps
### Deliverables
- What format (dashboard, slide, doc)
- What's included
- What's NOT included (out of scope)
### Timeline & Status
[ ] Data extraction
[ ] Data validation
[ ] Core analysis
[ ] Peer review
[ ] Documentation
[ ] Stakeholder presentation
### Risks
- What could go wrong
- Mitigation plans

Why This Step is Critical:
- Forces you to think through edge cases
- Surfaces assumptions early
- Creates shared understanding with stakeholder
- Gives AI clear roadmap (no hallucinations)
- Makes analysis reproducible
Phase 4: Execute Analysis (Varies)
Tool: Cursor Composer + Claude Code
Slash Command: /execute_analysis
Goal: Build queries/code following best practices
# /execute_analysis
You are executing a data analysis following the approved plan.
## Execution Principles
1. **Build incrementally**
- Start with smallest dataset
- Validate each step
- Only then scale to full data
2. **Write maintainable code**
- Clear CTE names (what, not how)
- Comments explain "why"
- No magic numbers (use variables)
3. **Validate assumptions**
- Check for nulls
- Test edge cases
- Compare to known baselines
4. **Follow the plan**
- Reference analysis plan sections
- Flag if plan needs updating
- Document deviations
## Code Structure
For SQL:
```sql
-- Business Context: [Link to plan]
-- Author: Robert Tseng
-- Date: 2026-01-20
-- Last Modified: 2026-01-20
-- Step 1: Define time period and filters
WITH date_spine AS (
-- Create date range for analysis
...
),
-- Step 2: Get customer data
customers AS (
-- Pull from dim_customers
-- Exclude test accounts
...
),
-- Step 3: Calculate primary metric
...
-- Final output
SELECT
...
FROM ...
```

For Python:

```python
"""
Business Context: [Link to plan]
Author: Robert Tseng
Date: 2026-01-20
Analysis: Affiliate vs Paid Search LTV:CAC Comparison
See: analysis_plan.md for full details
"""
import pandas as pd
import numpy as np
# Constants (no magic numbers)
ANALYSIS_START = "2025-01-01"
ANALYSIS_END = "2026-01-20"
MIN_COHORT_SIZE = 100
# Step 1: Load data
def load_customer_data():
"""
Load customer acquisition data from BigQuery
Returns:
DataFrame with columns: customer_id, channel, acquisition_date, ltv_30d
"""
...
```

## Output
- Annotated code (SQL/Python)
- Validation checks passed
- Results preview
- Any deviations from plan
Execution Tips:
1. Use Composer for speed (full file edits)
2. Use Claude Chat for logic questions
3. Test on small sample first (don't run full query until validated)
4. Save intermediate results (easier to debug)
5. Version control everything (git commit each step)
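Since the SQL template above opens with a date_spine CTE (a pattern that also shows up in the learnings captured later), here is a minimal BigQuery-flavored sketch of it, with hypothetical table names:

```sql
WITH date_spine AS (
  -- One row per day in the analysis window, including days with no events
  SELECT day
  FROM UNNEST(GENERATE_DATE_ARRAY('2025-12-01', '2026-01-20')) AS day
)
SELECT
  s.day,
  COUNT(o.order_id) AS orders  -- 0 on quiet days instead of a missing row
FROM date_spine AS s
LEFT JOIN orders AS o
  ON DATE(o.created_at) = s.day
GROUP BY s.day
ORDER BY s.day;
```

The spine gives you gap-free time series without the self-joins that make rolling metrics slow.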
Phase 5: Multi-Model Peer Review (30-60 minutes)
Tool: Cursor + Claude Code + ChatGPT + Gemini
Slash Commands: /review then /peer_review
Goal: Catch mistakes through diverse AI perspectives
# /review
You are reviewing your own analysis for errors.
## Review Checklist
### Business Logic
- [ ] Metrics match stakeholder definitions
- [ ] Filters align with business rules
- [ ] Time periods are correct
- [ ] Segments make business sense
### Technical Correctness
- [ ] Joins don't create duplicates
- [ ] NULL handling is intentional
- [ ] Aggregations are correct
- [ ] Edge cases are handled
### Data Quality
- [ ] Source tables are validated
- [ ] Known issues are addressed
- [ ] Sample sizes are sufficient
- [ ] Outliers are investigated
### Code Quality
- [ ] Readable by non-expert
- [ ] Well-commented
- [ ] No magic numbers
- [ ] Reusable/maintainable
## Output
- List of issues found (Critical/High/Medium/Low)
- Suggested fixes
- Questions to investigate
The Peer Review Process:
1. Self-Review (Claude reviews own code)
↓
2. GPT-4 Review (different perspective)
↓
3. Gemini Review (catches different issues)
↓
4. /peer_review (reconcile all feedback)
# /peer_review
You are the lead analyst on this project.
Other team leads have reviewed your code and found issues:
**Data Lead (GPT-4) found:**
[paste GPT review]
**SQL Expert (Gemini) found:**
[paste Gemini review]
Your job:
1. Evaluate each issue (is it valid?)
2. Either fix it or explain why it's not an issue
3. Be rigorous (you have more context than reviewers)
4. Don't take feedback at face value
If reviewers are wrong, explain why.
If they're right, fix it.
Respond with:
- Issues you agree with (and fixes)
- Issues you disagree with (and rationale)
- Final updated code

Why Multi-Model Review Works:
- Claude: Best at business logic, metric definitions
- GPT-4: Best at catching edge cases, SQL bugs
- Gemini: Best at visualization logic, data storytelling
Each model has blind spots. Together, they catch 90%+ of issues.
Phase 6: Document Insights (30 minutes)
Tool: Claude Artifacts or Google Slides
Slash Command: /document_insights
Goal: Create stakeholder-ready deliverable
# /document_insights
Transform analysis into stakeholder-ready deliverable.
## Know Your Audience
**For ELT:**
- Lead with business impact
- Show decision clarity
- Include risks/caveats
- Keep technical details in appendix
**For Team Leads:**
- Show methodology
- Include validation steps
- Provide reproducible code
- Document assumptions
**For Self:**
- Full technical detail
- Future improvement ideas
- Lessons learned
## Deliverable Structure
### Executive Summary (1 slide/section)
- TLDR: What we found
- So What: Why it matters
- Now What: Recommended action
### Key Findings (2-3 slides/sections)
- Finding 1: [Insight + supporting data]
- Finding 2: [Insight + supporting data]
- Finding 3: [Insight + supporting data]
### Supporting Analysis (appendix)
- Full methodology
- Data sources
- Assumptions
- Validation steps
- Code/queries
### Next Steps
- Immediate actions
- Follow-up analyses
- Open questions
## Quality Standards
Before sharing:
- [ ] Passes "5-year-old" explanation test
- [ ] Numbers are validated against baselines
- [ ] Recommendations are actionable
- [ ] Risks/caveats are clear
- [ ] You can defend every claim

Example Output:
**Executive Summary**
**TLDR:** Affiliates have a 2.1x better LTV:CAC ratio than paid search (4.5 vs 2.1), driven by higher conversion rates and better
($450 vs $210), driven by higher conversion rates and better
long-term retention.
**So What:** Shifting $100K from paid search to affiliates
could generate $200K more in customer lifetime value.
**Now What:** Recommend 70/30 budget split (Affiliate/Paid Search)
for Q2, pending approval from Mitesh.
---
**Key Finding 1: Affiliate Customers Are More Valuable**
[Chart: LTV by Channel]
- Affiliate LTV: $630 (30-day realized)
- Paid Search LTV: $320
- Difference driven by higher AOV and repeat purchase rate
---
**Key Finding 2: Affiliate CAC Is Lower**
[Chart: CAC by Channel]
- Affiliate CAC: $140 (including platform fees)
- Paid Search CAC: $152
- Savings despite higher commission rate (20% vs 15%)
---
**Key Finding 3: Payback Period Favors Affiliates**
[Chart: Cumulative Profit by Day]
- Affiliates: Payback in 18 days
- Paid Search: Payback in 32 days
- Cash flow advantage for affiliates
---
**Appendix: Methodology**
- Data Source: customers table, ad_spend_by_channel
- Time Period: Dec 1, 2025 - Jan 20, 2026
- Segments: New customers only (excluding reactivations)
- LTV Definition: First 30 days of revenue
- CAC Definition: All acquisition costs / new customers
**Assumptions:**
- 30-day LTV is proxy for long-term value (validated against cohorts)
- Attribution is accurate (recent improvements to tracking)
- No major seasonal effects between Dec/Jan
**Validation:**
- Numbers reconcile to finance team (within 2%)
- Edge cases tested (refunds, test accounts excluded)
- Peer reviewed by 3 AI models
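A sketch of the headline calculation behind this deliverable (the customers columns follow the docstring in Phase 4; the ad_spend_by_channel columns and the date filters are assumptions):

```sql
WITH ltv AS (
  SELECT
    channel,
    COUNT(*) AS new_customers,
    AVG(ltv_30d) AS avg_ltv_30d
  FROM customers
  -- Reactivation and test-account exclusions omitted for brevity
  WHERE acquisition_date BETWEEN '2025-12-01' AND '2026-01-20'
  GROUP BY channel
),
spend AS (
  SELECT channel, SUM(spend) AS total_spend
  FROM ad_spend_by_channel
  WHERE spend_date BETWEEN '2025-12-01' AND '2026-01-20'
  GROUP BY channel
)
SELECT
  l.channel,
  l.avg_ltv_30d,
  s.total_spend / l.new_customers AS cac,
  l.avg_ltv_30d / (s.total_spend / l.new_customers) AS ltv_to_cac
FROM ltv AS l
JOIN spend AS s USING (channel);
```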
Phase 7: Learning Capture (15 minutes)
Tool: Notion, Markdown file, or wiki
Slash Command: /learning_moment
Goal: Extract reusable knowledge from this analysis
# /learning_moment
What did we learn from this analysis that we can apply next time?
## Capture Learning
1. **Technical Learning**
- New SQL patterns
- Data quality issues discovered
- Performance optimizations
- Tools/methods used
2. **Business Learning**
- How metrics are defined
- Stakeholder preferences
- Common questions asked
- Decision-making process
3. **Process Learning**
- What went well
- What went poorly
- Time estimates (actual vs planned)
- Future improvements
## Update Knowledge Base
Based on this analysis:
- [ ] Update metric definitions
- [ ] Document data quality issues
- [ ] Add to query library
- [ ] Update slash commands (if mistakes found)
- [ ] Share learnings with team
## Output Format
**What We Learned:**
[Key insights from this project]
**What to Do Differently:**
[Changes for next time]
**What to Remember:**
[Reusable patterns/knowledge]

Example:
**What We Learned:**
Technical:
- LTV queries run 10x faster with date spine approach
- Affiliate attribution requires 7-day lookback window
- Our BigQuery cluster scales better with partitioned tables
Business:
- Finance defines CAC differently than Marketing (platform fees)
- Mitesh prefers 30-day LTV over 90-day (faster decision cycle)
- ELT wants payback curves, not just static LTV:CAC
Process:
- Discovery phase saved 1 day (almost went down wrong path)
- Multi-model review caught 3 critical bugs
- Stakeholder check-in halfway through prevented rework
**What to Do Differently:**
- Add affiliate lookback to `/discover` checklist
- Update CAC definition in metric glossary
- Create reusable payback curve template
- Block 30 min halfway through for stakeholder sync
**What to Remember:**
- Always validate attribution window assumptions
- Finance and Marketing may define costs differently
- Partitioned tables matter at scale
- Peer review catches 90% of bugs
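On the partitioned-tables learning above, a minimal BigQuery sketch (dataset and table names are hypothetical):

```sql
-- Rebuild a large table partitioned by event date
CREATE TABLE analytics.orders_partitioned
PARTITION BY DATE(created_at) AS
SELECT * FROM analytics.orders;

-- Filters on the partition column now prune to matching partitions
-- instead of scanning the entire table
SELECT COUNT(*)
FROM analytics.orders_partitioned
WHERE DATE(created_at) BETWEEN '2025-12-01' AND '2026-01-20';
```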
The Compounding Knowledge Effect:
- Week 1: Basic analysis
- Week 4: 10 analyses, 10 lessons captured
- Week 12: 30 analyses, your knowledge base is gold
- Week 24: You rarely make the same mistake twice
- Week 52: You’re a data expert (through accumulated learnings)
Implementation Guide
Week 1: Setup
Day 1-2: Build Your Data CTO
Create ChatGPT Project:
- Name: “Data CTO”
- Custom Instructions: [Use template above]
- Knowledge Base: Upload company data dictionary, metric definitions
Day 3-4: Create Your Slash Commands
Start with these 5 essential commands:
- /intake - Quick request capture
- /discover - Problem exploration
- /plan_analysis - Analysis blueprint
- /execute_analysis - Code generation with best practices
- /learning_moment - Knowledge capture
Day 5: Test Drive
- Pick one real analysis request
- Run through full workflow
- Time each phase
- Document what works/doesn’t work
Week 2-4: Iterate & Improve
Every Analysis:
- Use the workflow
- Note what breaks
- Update slash commands
- Document learnings
By Week 4, You Should Have:
- 10+ analyses completed
- Battle-tested slash commands
- Growing knowledge base
- Faster execution (2x speed vs Week 1)
Month 2-3: Scale
Add Advanced Commands:
- /deslop_analysis - Remove AI verbosity
- /peer_review - Multi-model review process
- /explain_to_stakeholder - Simplify technical content
- /metric_definition - Standardize definitions
Build Reusable Assets:
- Query templates library
- Analysis plan templates
- Visualization standards
- Stakeholder communication templates
Month 4-6: Mastery
Agentic Workflows:
- AI autonomously executes full analyses
- You provide strategic oversight
- Focus on problem framing, not execution
Knowledge Compounding:
- Your slash commands are bulletproof
- Your knowledge base is comprehensive
- You rarely make the same mistake twice
- You’re teaching others your workflow
Success Metrics
Team Effectiveness
- Time per analysis (should decrease 50% by Month 3)
- Rework rate (should decrease 70% by Month 3)
- Stakeholder satisfaction (should increase)
Learning Velocity
- New skills acquired per month
- Complexity of analyses you can handle
- Breadth of problems you can solve
Business Impact
- Analyses per month (should 3x)
- Decision velocity (faster time to insight)
- Quality of recommendations (measured by adoption rate)
Common Pitfalls & Solutions
Pitfall 1: Skipping Discovery
Symptom: Building wrong thing, lots of rework
Solution:
- Never skip the /discover phase
- Spending 30 minutes upfront saves 3 hours later
- Confirm approach with stakeholder before executing
Pitfall 2: Trusting AI Output Blindly
Symptom: Sharing wrong analyses, losing stakeholder trust
Solution:
- Always validate results against known baselines
- Use multi-model peer review
- Run /deslop_analysis before sharing
- Own every output (it's your mistake, not AI's)
Pitfall 3: Not Documenting Learnings
Symptom: Making same mistakes repeatedly, slow growth
Solution:
- Run /learning_moment after EVERY analysis
- Update knowledge base weekly
- Share learnings with team
- Review monthly (what patterns emerged?)
Pitfall 4: Over-Automating
Symptom: Losing touch with the data, becoming a button-pusher
Solution:
- Manually QA every analysis
- Understand edge cases
- Keep stakeholder relationships human
- Use AI for execution, not thinking
Adapting for Different Use Cases
For Eden (Current Client):
High-Priority Workflows:
- Affiliate Performance Analysis
- Experiment Results Analysis
- LTV:CAC by Channel
- Customer Segmentation
- Self-Service Analytics Enablement
Custom Slash Commands Needed:
- /experiment_results - Analyze A/B test outcomes
- /channel_attribution - Compare channel performance
- /segment_analysis - Create customer segments
- /elt_story - Package for executive audience
For Future Clients:
Discovery Questions:
- What analyses do you run monthly?
- What takes the most time?
- What causes the most rework?
- What do stakeholders ask for most?
Build Custom Commands Based On:
- Their metric definitions
- Their data sources
- Their stakeholder preferences
- Their decision-making process
The Future State
In 6 Months:
- You’ve built 50+ analyses using this workflow
- Your slash commands are battle-tested
- Your knowledge base is comprehensive
- You rarely make the same mistake twice
What Changes:
- 5x faster execution
- 10x broader skillset
- Handling strategic work, not just tactical
- Teaching others this workflow
The Unlock:
- You’re not just a data analyst anymore
- You’re a data strategist who can execute
- You can take on senior-level scope as a junior
- You're a 10x learner, which makes you 10x more valuable
Next Steps
Today:
- Create your Data CTO in ChatGPT Projects
- Copy these slash commands into Cursor
- Pick one analysis and run through workflow
This Week:
- Complete 2-3 analyses using this workflow
- Document what works/doesn’t work
- Start your learning capture process
This Month:
- Build your knowledge base
- Iterate on slash commands
- Share learnings with team
- Track speed improvements
This Quarter:
- Master the agentic workflow
- Train others on your system
- Build reusable templates library
- Measure business impact
Created: January 20, 2026
Based on: Zevi Arnovitz's AI-enabled PM workflow
Adapted by: Robert Tseng for data consulting
Status: Living document - update after every learning
Appendix: Slash Command Library
[See separate file: cursor_slash_commands_data.md for full command library]