Breezy - Data Foundation & Product Analytics Engagement
Client: Breezy (Real Estate Agent Operating System)
Brainforge Team: Uttam Kumaran, Awaish Kumar
Engagement Type: Data Infrastructure & Product Analytics
Status: Discovery & Scoping Phase
Project Overview
Breezy is building an AI-powered operating system for real estate agents, with features including:
- MLS Comps: Accurate property comparable generation using CoreLogic/Cotality data
- Underbuilt: Building code and zoning analysis from municipal sources
- AI Note-taker, Pipeline Management, and more
This engagement focuses on establishing foundational data infrastructure and product analytics to support:
- Product Analytics: Understanding user retention, feature adoption, and growth metrics
- MLS Data Pipeline: Processing 160M+ records daily for sub-1-second comp generation
- Underbuilt Intelligence: Expansion prioritization and extraction automation exploration
Repository Structure
breezy/
├── resources/
│   ├── readme.md                        # This file - project overview and navigation
│   ├── MEETING_WORKFLOW.md              # Meeting management workflow
│   ├── SOW_UPDATE_SUMMARY_2024-12-12.md
│   ├── WORKFLOW_COMPLETE_SUMMARY.md
│   └── running_slack_messages.md        # Quick-reference decisions log
│
├── sows/
│   └── SOW-Breezy-BrainforgeAI.md       # Current Statement of Work (living document)
│
└── transcripts/                         # All call artifacts (transcripts, notes, agendas)
    ├── README.md
    ├── brainforge_breezy_initial_discussion.md
    ├── brainforge_breezy_webinar_demo_12_2_2025.md
    ├── brainforge_breezy_technical_discussion_12_10_25.md
    └── 2024-12-10_technical_deep_dive_notes.md
Quick Links
Core Documents
- Statement of Work (SOW) - Full engagement scope, deliverables, timeline
- SOW Update Summary - What changed after Dec 10 technical deep-dive
- Meeting Workflow Guide - How we manage meetings and documentation
Latest meeting materials (transcripts/)
- Technical deep-dive notes (Dec 10) - MLS architecture, performance requirements
- Technical deep-dive transcript (Dec 10) - Full conversation record
Previous discovery
- Webinar demo (Dec 2) - Product walkthrough with beta users
- Initial discussion - First stakeholder conversation
Engagement Phases
Phase 1: Analytics Foundation (Weeks 1-4 | Early January 2026)
Status: Not Started (Waiting for Android launch completion)
Focus:
- Audit MixPanel/Statsig event taxonomy
- Build D0/D7/D30 retention dashboards
- Establish DAU/MAU tracking and cohort analysis
- Create feature adoption matrix
Key Deliverable: Product analytics playbook and core measurement dashboards
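As a reference point for the retention dashboards, the following is a minimal sketch of how D0/D7/D30 retention could be computed from raw activity data. It assumes the "exact day N" definition (a user counts as retained on DN if they were active exactly N days after signup); MixPanel reports may use a different definition, so treat this as a sanity check rather than the dashboard logic. The function name and input shapes are hypothetical.

```python
from datetime import date


def retention_rates(signups, activity, offsets=(0, 7, 30)):
    """Return {offset: share of users active exactly `offset` days after signup}.

    signups:  {user_id: signup_date}            (hypothetical input shape)
    activity: iterable of (user_id, active_date) (hypothetical input shape)
    """
    # Collect the distinct active days per user.
    active_days = {}
    for user_id, day in activity:
        active_days.setdefault(user_id, set()).add(day)

    rates = {}
    for offset in offsets:
        retained = sum(
            1
            for user_id, signup in signups.items()
            if any((d - signup).days == offset for d in active_days.get(user_id, ()))
        )
        rates[offset] = retained / len(signups) if signups else 0.0
    return rates


if __name__ == "__main__":
    signups = {"a": date(2026, 1, 1), "b": date(2026, 1, 1)}
    activity = [("a", date(2026, 1, 1)), ("a", date(2026, 1, 8)), ("b", date(2026, 1, 1))]
    print(retention_rates(signups, activity))
```

Note that this sketch divides by the whole cohort, including users who signed up fewer than N days ago; a production version would likely restrict each DN rate to users old enough to have reached day N.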
Phase 2: MLS Data Infrastructure (Weeks 3-8 | January-February 2026)
Status: Architecture Planning
Focus:
- Process 160M records daily from CoreLogic/Cotality
- Achieve sub-1-second query performance for comp generation
- Data lake-first architecture (S3 → PySpark → Postgres) recommended
- Data quality monitoring and alerting
Key Deliverable: Production-ready comp generation pipeline
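One common way to keep a table queryable while a daily refresh lands is to load into a staging table and swap it in with a rename inside a single transaction. The sketch below illustrates that pattern using Python's built-in `sqlite3` as a stand-in for Postgres (which also supports transactional `ALTER TABLE ... RENAME`). The table names, columns, and function name are hypothetical, and the actual refresh mechanism for this engagement has not been decided.

```python
import sqlite3


def refresh_with_swap(conn, rows):
    """Load `rows` into a staging table, then swap it in atomically."""
    # Build the new data set off to the side; readers still see `properties`.
    conn.execute("DROP TABLE IF EXISTS properties_staging")
    conn.execute("CREATE TABLE properties_staging (id INTEGER PRIMARY KEY, price REAL)")
    conn.executemany("INSERT INTO properties_staging (id, price) VALUES (?, ?)", rows)

    # Swap in one transaction so readers see either the old or the new table.
    conn.execute("BEGIN")
    try:
        conn.execute("DROP TABLE IF EXISTS properties_old")
        conn.execute("ALTER TABLE properties RENAME TO properties_old")
        conn.execute("ALTER TABLE properties_staging RENAME TO properties")
        conn.execute("COMMIT")
    except Exception:
        conn.execute("ROLLBACK")
        raise


if __name__ == "__main__":
    # isolation_level=None puts sqlite3 in autocommit mode so BEGIN/COMMIT are explicit.
    conn = sqlite3.connect(":memory:", isolation_level=None)
    conn.execute("CREATE TABLE properties (id INTEGER PRIMARY KEY, price REAL)")
    conn.execute("INSERT INTO properties VALUES (1, 100.0)")
    refresh_with_swap(conn, [(1, 110.0), (2, 200.0)])
    print(conn.execute("SELECT id, price FROM properties ORDER BY id").fetchall())
```

Keeping the previous table around as `properties_old` gives a quick rollback path if the new load fails validation after the swap.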
Technical Approach:
CoreLogic (160M records, 300 cols)
↓ (SFTP/S3)
S3 Data Lake
↓ (PySpark parallel processing)
Postgres (PostGIS geo-spatial indexes)
↓ (<1s query latency)
Comp Generation API
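To make the last hop of the flow above concrete, here is a small pure-Python sketch of the kind of radius lookup a comp query performs. In the proposed architecture this filtering would be done by a PostGIS geo-spatial index rather than application code; the field names (`id`, `lat`, `lon`), the 1 km radius, and the result limit are assumptions for illustration, not agreed comp criteria.

```python
import math

EARTH_RADIUS_KM = 6371.0


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two lat/lon points, in kilometres."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def nearby_comps(subject, properties, radius_km=1.0, limit=5):
    """Return ids of properties within `radius_km` of the subject, nearest first."""
    scored = []
    for prop in properties:
        dist = haversine_km(subject["lat"], subject["lon"], prop["lat"], prop["lon"])
        if dist <= radius_km:
            scored.append((dist, prop["id"]))
    return [prop_id for _, prop_id in sorted(scored)[:limit]]


if __name__ == "__main__":
    subject = {"lat": 30.2672, "lon": -97.7431}
    candidates = [
        {"id": "p1", "lat": 30.2680, "lon": -97.7431},
        {"id": "p3", "lat": 30.4000, "lon": -97.7431},
    ]
    print(nearby_comps(subject, candidates))
```

A linear scan like this does not scale to 160M records, which is the reason the architecture relies on spatial indexes to reach the sub-1-second target.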
Open Questions:
- Full dumps vs. incremental deltas from CoreLogic?
- Schema documentation availability?
- Daily delivery timing?
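If CoreLogic turns out to deliver full dumps only, deltas can still be derived on our side by fingerprinting rows and comparing consecutive snapshots. The sketch below shows the idea in plain Python on small in-memory lists; the key name `property_id` and the function names are hypothetical, and at 160M rows the same comparison would be run as a distributed join (for example in PySpark) rather than with dictionaries.

```python
import hashlib
import json


def row_fingerprint(row):
    """Stable hash of a row's contents, independent of key order."""
    payload = json.dumps(row, sort_keys=True, default=str)
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def diff_snapshots(previous, current, key="property_id"):
    """Classify keys as inserted, updated, or deleted between two full dumps."""
    prev = {row[key]: row_fingerprint(row) for row in previous}
    curr = {row[key]: row_fingerprint(row) for row in current}
    return {
        "inserted": sorted(k for k in curr if k not in prev),
        "updated": sorted(k for k in curr if k in prev and curr[k] != prev[k]),
        "deleted": sorted(k for k in prev if k not in curr),
    }


if __name__ == "__main__":
    yesterday = [{"property_id": 1, "price": 100}, {"property_id": 2, "price": 200}]
    today = [{"property_id": 1, "price": 100}, {"property_id": 2, "price": 250}]
    print(diff_snapshots(yesterday, today))
```

Only the inserted and updated keys would then need to be written to Postgres, which reduces daily write volume when most records are unchanged.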
Phase 3: Underbuilt Expansion Intelligence (Weeks 5-8 | February 2026)
Status: Scope Defined
Focus:
- Analyze address search logs for expansion prioritization
- Build demand heatmap (top 100 cities)
- POC for LLM/OCR-based extraction automation
- Evaluate modern extraction tools
Key Deliverable: Expansion roadmap + automation feasibility report
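The expansion prioritization above boils down to counting address searches per city and ranking the cities Underbuilt does not yet cover. A minimal sketch follows, assuming each log entry is a dict with a `city` field; the real log schema has not been reviewed yet, so the field name and function name are hypothetical.

```python
from collections import Counter


def rank_search_demand(search_log, covered_cities, top_n=100):
    """Return [(city, search_count)] for uncovered cities, highest demand first."""
    covered = {city.strip().lower() for city in covered_cities}
    # Normalize city names so "Dallas" and "dallas " count as the same city.
    counts = Counter(
        entry["city"].strip().lower()
        for entry in search_log
        if entry.get("city")
    )
    uncovered = [(city, n) for city, n in counts.most_common() if city not in covered]
    return uncovered[:top_n]


if __name__ == "__main__":
    log = [{"city": "Austin"}, {"city": "Houston"}, {"city": "Houston"}, {"city": None}]
    print(rank_search_demand(log, covered_cities=["Austin"]))
```

The resulting ranked list is the raw input for a demand heatmap; weighting by user count or recency could be layered on once the log contents are known.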
Key Technical Requirements
MLS Data Pipeline
- Scale: 160 million records/day, ~300 columns
- Performance: Sub-1-second comp queries (95th percentile)
- Availability: Production-grade, zero downtime during refreshes
- Data Quality: 95%+ of properties have sale date, listing date, and price
- Source: CoreLogic/Cotality (gold standard for MLS data)
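The data quality requirement above can be expressed as a simple completeness gate. The sketch below assumes records are dicts and that the three required fields are named `sale_date`, `listing_date`, and `price`; the actual CoreLogic column names are not yet documented, so these names are placeholders.

```python
REQUIRED_FIELDS = ("sale_date", "listing_date", "price")  # placeholder column names


def completeness(records, required=REQUIRED_FIELDS):
    """Share of records in which every required field is present and non-empty."""
    records = list(records)
    if not records:
        return 0.0
    complete = sum(
        1
        for record in records
        if all(record.get(field) not in (None, "") for field in required)
    )
    return complete / len(records)


def passes_quality_gate(records, threshold=0.95):
    """True when completeness meets the 95% requirement (or a custom threshold)."""
    return completeness(records) >= threshold


if __name__ == "__main__":
    good = {"sale_date": "2025-06-01", "listing_date": "2025-04-15", "price": 450000}
    bad = {"sale_date": None, "listing_date": "2025-04-15", "price": 450000}
    print(passes_quality_gate([good] * 19 + [bad]))
```

A check like this could run on each daily load and feed the monitoring and alerting called for in Phase 2.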
Product Analytics
- Current Stack: MixPanel (events streaming), Statsig (experimentation)
- User Base: 200-400 beta users; 300+ waitlist signups expected at the January 20, 2026 launch
- Key Metrics: D0/D7/D30 retention, DAU/MAU, feature adoption funnels
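For reference, DAU/MAU (often called stickiness) is the number of distinct users active on a given day divided by the number of distinct users active in the trailing window ending on that day. The sketch below assumes a 30-day window and events shaped as `(user_id, date)` pairs; both are illustrative choices, not MixPanel's export format.

```python
from datetime import date, timedelta


def dau_mau(events, as_of, window_days=30):
    """Return (DAU, MAU, DAU/MAU ratio) for the day `as_of`.

    events: iterable of (user_id, active_date) pairs (hypothetical shape)
    """
    events = list(events)
    # Trailing window is inclusive of `as_of`, so it spans `window_days` days.
    window_start = as_of - timedelta(days=window_days - 1)
    dau = {user for user, day in events if day == as_of}
    mau = {user for user, day in events if window_start <= day <= as_of}
    ratio = len(dau) / len(mau) if mau else 0.0
    return len(dau), len(mau), ratio


if __name__ == "__main__":
    events = [("a", date(2026, 1, 30)), ("b", date(2026, 1, 15))]
    print(dau_mau(events, as_of=date(2026, 1, 30)))
```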
Underbuilt
- Current State: Manual extraction from PDFs/Word docs/county websites
- Coverage: ~500 cities currently
- Opportunity: LLM-based automation for scalability
- Competitive Moat: “As far as I know, we’re the only ones doing this” - Greg
Timeline & Milestones
Current Phase: Discovery & Scoping (Complete)
Next Phase: Phase 1 Analytics Foundation
- Start Date: Early January 2026 (post-Android launch)
- Duration: 4 weeks
- End Date: Late January 2026
Critical Deadline: January 20, 2026 - Waitlist launch (300+ signups expected)
Key Dates:
- End of December 2025: Android launch (Breezy team focus)
- Early January 2026: Phase 1 kickoff
- Mid-February 2026: MLS pipeline operational
- Late February 2026: Full engagement complete
Team & Communication
Brainforge Team
- Uttam Kumaran - Strategist, Lead (Austin, TX)
- Awaish Kumar - Data Architect (Pakistan)
- [Analytics Engineer TBD]
Breezy Stakeholders
- James Harris (Jimsy) - Co-founder, Product Context (2-3 hrs/week)
- Sigal Bareket - Co-founder, Growth/Marketing (2 hrs/week)
- Xiaojie Zhang - Engineering Lead, China Team (AI/Pipeline focus)
- Greg - Founding Engineer, Backend (MLS project lead, 5+ hrs/week Phase 2)
Communication Cadence
- Daily: Slack async standups (blockers, updates)
- Weekly: 60-min sync with founder(s) + technical SMEs
- Bi-weekly: Technical deep-dives as needed
- Ad-hoc: Loom videos for demos when schedules conflict
Channels
- Slack: Primary async communication
- Linear: Milestone tracking, task management
- GitHub: Code, documentation (if applicable)
- This repo: transcripts/ (calls), sows/, resources/ (this overview + Slack log)
Document Management Process
We follow a structured workflow for all meetings:
Pre-Meeting (Agenda) → Meeting (Recording) → Post-Meeting Processing
↓
`transcripts/` (transcript + notes + optional agenda)
↓
SOW Updates + Slack Notes
See MEETING_WORKFLOW.md for complete process.
Key Principles:
- All meetings recorded (with consent) for transcript generation
- Meeting notes created within 24 hours while context is fresh
- SOW updated within 48 hours of scope-impacting discussions
- Link all related documents for easy navigation
- Track changes via git for SOW evolution visibility
Recent Updates
December 12, 2024
- Created: Meeting Workflow Guide, Technical Deep-Dive Notes, SOW Update Summary
- Updated: SOW with MLS architecture details, performance requirements, Underbuilt automation scope
- Key Insights: 160M record scale, sub-1s query requirement, data lake-first approach recommended
- Timeline: Phase 1 start aligned with Android launch completion (early January)
December 10, 2024
- Meeting: Technical deep-dive with Greg and Xiaojie
- Major Findings:
- MLS scale significantly larger than initially understood
- CoreLogic provides flexible delivery options (S3, SFTP, Snowflake)
- No current analytics use case for MLS data (production-first)
- Underbuilt currently very manual, exploring LLM automation
December 2, 2024
- Attended: Breezy webinar demo with beta users
- Observations: Product walkthrough, agent feedback, excitement about the Underbuilt feature
Key Decisions & Open Questions
Decisions Made
✅ Phase 1 starts after Android launch (early January 2026)
✅ Data lake-first architecture (Option A) recommended for MLS pipeline
✅ Skip Snowflake unless an analytics use case is confirmed
✅ Underbuilt automation is POC/exploration, not production implementation in this engagement
✅ Use existing MixPanel/Statsig; audit and enhance rather than rebuild
Open Questions
❓ What is the exact Android launch date?
❓ CoreLogic delivery mechanism: full dumps vs. deltas?
❓ What is daily data delivery timing and SLA?
❓ Do we need Postgres optimization or an alternative query engine (e.g., Elasticsearch)?
❓ When does the Breezy team have bandwidth for Phase 1 kickoff?
Success Criteria
Phase 1: Analytics Foundation
- Breezy team independently runs retention queries in MixPanel
- Feature usage data validates/challenges product hypotheses
- Dashboards load in <3 seconds with 90 days of data
- Event taxonomy documented and adopted by engineering team
Phase 2: MLS Data Pipeline
- Comp queries return in <1 second (95th percentile)
- Pipeline processes 160M records within daily SLA
- Data quality tests pass: 95%+ of properties have required fields
- Zero downtime during daily refreshes
- Engineering team can query any property in covered markets
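The latency criterion above is stated at the 95th percentile, so it helps to pin down how that percentile is computed. The sketch below uses the nearest-rank method on a list of measured query latencies in milliseconds; other percentile definitions (for example, linear interpolation) give slightly different values, and the function names are hypothetical.

```python
import math


def percentile_nearest_rank(values, pct):
    """Nearest-rank percentile: smallest value with at least pct% of samples at or below it."""
    if not values:
        raise ValueError("values must not be empty")
    ordered = sorted(values)
    rank = math.ceil(pct * len(ordered) / 100)
    return ordered[max(rank, 1) - 1]


def meets_latency_target(latencies_ms, target_ms=1000.0, pct=95):
    """True when the chosen percentile is strictly below the target latency."""
    return percentile_nearest_rank(latencies_ms, pct) < target_ms


if __name__ == "__main__":
    samples = [200.0] * 18 + [900.0, 2500.0]
    print(percentile_nearest_rank(samples, 95), meets_latency_target(samples))
```

With this definition, a single slow outlier in 20 samples does not fail the check, but two do, which matches the intent of a p95 target.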
Phase 3: Underbuilt Intelligence
- Expansion prioritization is validated against 3+ known high-demand markets
- Extraction POC demonstrates >80% accuracy on structured data
- Breezy has clear roadmap for Underbuilt scaling
- ROI model informs product decisions on expansion velocity
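The >80% extraction accuracy criterion needs an agreed scoring method. One simple option, sketched below, is field-level exact-match accuracy against a hand-labeled set: every expected field counts once, and a field is correct only if the extracted value equals the labeled value. This scoring rule and the example field names (`max_height_ft`, `far`, `zoning`) are assumptions for illustration; the SOW does not define how accuracy is measured.

```python
def field_accuracy(predicted, expected):
    """Share of labeled fields whose extracted value exactly matches the label.

    predicted / expected: {doc_id: {field_name: value}} (hypothetical shape)
    """
    total = 0
    correct = 0
    for doc_id, truth in expected.items():
        guess = predicted.get(doc_id, {})
        for field, value in truth.items():
            total += 1
            # Missing fields and missing documents count as incorrect.
            if guess.get(field) == value:
                correct += 1
    return correct / total if total else 0.0


if __name__ == "__main__":
    expected = {"doc1": {"max_height_ft": 35, "far": 0.5, "zoning": "R-1"}}
    predicted = {"doc1": {"max_height_ft": 35, "far": 0.5, "zoning": "R-2"}}
    print(field_accuracy(predicted, expected))
```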
Reference Materials
Breezy Product Context
- Market: Real estate agents (residential & commercial)
- Value Prop: AI-powered tools for property analysis, comp generation, building code research
- Competitors: Traditional CRMs (lacking AI/data depth), Zillow/Redfin (not agent-focused)
- Pricing Strategy: 15-20% premium vs. legacy CRMs based on accuracy advantage
- Stage: Beta with 200-400 users, preparing for January 2026 waitlist launch
Data Sources
- CoreLogic/Cotality: MLS bulk data (160M records, “gold standard”)
- Trestle: CoreLogic real-time API (California only, requires per-MLS broker agreements)
- MixPanel/Statsig: Product usage events
- County/Municipal: Building codes, zoning (PDFs, Word docs, some APIs)
Tech Stack
- Production DB: Postgres
- Analytics: MixPanel, Statsig
- Marketing: Customer.io (integrating)
- Backend: Not specified (Greg’s domain)
- Mobile: iOS live, Android launching end of December
Contact & Access
Project Lead: Uttam Kumaran (uttam@brainforge.ai)
Access Needed for Kickoff:
- MixPanel (read access)
- Statsig (configuration review)
- Customer.io (event schema)
- CoreLogic credentials (Phase 2)
- Codebase (event instrumentation review)
- Address search logs (Phase 3)
Questions? Reach out via Slack or email.
Appendix: Meeting History
| Date | Type | Attendees | Key Topics | Artifacts |
|---|---|---|---|---|
| Dec 10, 2024 | Technical Deep-Dive | Greg, Xiaojie, Uttam, Awaish | MLS architecture, scale, performance; Underbuilt automation; Timeline | Notes, Transcript |
| Dec 2, 2024 | Product Demo | Breezy team, Beta users, Uttam (observer) | Webinar walkthrough, feature feedback | Transcript |
| [TBD] | Initial Discovery | [TBD] | Project overview, initial requirements | Transcript |
Last Updated: December 12, 2024