Breezy - Data Foundation & Product Analytics Engagement

Client: Breezy (Real Estate Agent Operating System)
Brainforge Team: Uttam Kumaran, Awaish Kumar
Engagement Type: Data Infrastructure & Product Analytics
Status: Discovery & Scoping Phase


Project Overview

Breezy is building an AI-powered operating system for real estate agents, with features including:

  • MLS Comps: Accurate property comparable generation using CoreLogic/Cotality data
  • Underbuilt: Building code and zoning analysis from municipal sources
  • AI Note-taker, Pipeline Management, and more

This engagement focuses on establishing foundational data infrastructure and product analytics to support:

  1. Product Analytics: Understanding user retention, feature adoption, and growth metrics
  2. MLS Data Pipeline: Processing 160M+ records daily for sub-1-second comp generation
  3. Underbuilt Intelligence: Expansion prioritization and extraction automation exploration

Repository Structure

breezy/
├── resources/
│   ├── readme.md                      # This file - project overview and navigation
│   ├── MEETING_WORKFLOW.md            # Meeting management workflow
│   ├── SOW_UPDATE_SUMMARY_2024-12-12.md
│   ├── WORKFLOW_COMPLETE_SUMMARY.md
│   └── running_slack_messages.md      # Quick-reference decisions log
│
├── sows/
│   └── SOW-Breezy-BrainforgeAI.md    # Current Statement of Work (living document)
│
└── transcripts/                       # All call artifacts (transcripts, notes, agendas)
    ├── README.md
    ├── brainforge_breezy_initial_discussion.md
    ├── brainforge_breezy_webinar_demo_12_2_2025.md
    ├── brainforge_breezy_technical_discussion_12_10_25.md
    └── 2024-12-10_technical_deep_dive_notes.md

Core Documents

Latest meeting materials (transcripts/)

Previous discovery


Engagement Phases

Phase 1: Analytics Foundation (Weeks 1-4 | Early January 2026)

Status: Not Started (Waiting for Android launch completion)

Focus:

  • Audit MixPanel/Statsig event taxonomy
  • Build D0/D7/D30 retention dashboards
  • Establish DAU/MAU tracking and cohort analysis
  • Create feature adoption matrix

Key Deliverable: Product analytics playbook and core measurement dashboards
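
As a reference point for the retention dashboards, the sketch below computes D0/D7/D30 retention from raw activity data. It is a minimal illustration, not the MixPanel implementation: the `signups` and `activity` structures are hypothetical, and it assumes the classic "active exactly N days after signup" definition (dashboard tools often offer other variants, so the audit should confirm which one Breezy reports).

```python
from datetime import date

def retention(signups, activity, day_offsets=(0, 7, 30)):
    """Return {offset: share of signed-up users active exactly `offset` days after signup}.

    signups:  {user_id: signup_date}          (hypothetical structure)
    activity: {user_id: set of active dates}  (hypothetical structure)
    """
    result = {}
    for offset in day_offsets:
        retained = 0
        for user, signup_day in signups.items():
            active_days = activity.get(user, set())
            # A user counts as retained if any active day is exactly `offset` days after signup.
            if any((day - signup_day).days == offset for day in active_days):
                retained += 1
        result[offset] = retained / len(signups) if signups else 0.0
    return result

# Illustrative data only, not Breezy usage numbers.
signups = {
    "a": date(2026, 1, 1),
    "b": date(2026, 1, 1),
    "c": date(2026, 1, 2),
}
activity = {
    "a": {date(2026, 1, 1), date(2026, 1, 8), date(2026, 1, 31)},
    "b": {date(2026, 1, 1), date(2026, 1, 8)},
    "c": {date(2026, 1, 3)},
}
rates = retention(signups, activity)
```

Here users "a" and "b" count toward D0 and D7, and only "a" counts toward D30.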


Phase 2: MLS Data Infrastructure (Weeks 3-8 | January 2026)

Status: Architecture Planning

Focus:

  • Process 160M records daily from CoreLogic/Cotality
  • Achieve sub-1-second query performance for comp generation
  • Data lake-first architecture (S3 → PySpark → Postgres) recommended
  • Data quality monitoring and alerting

Key Deliverable: Production-ready comp generation pipeline

Technical Approach:

CoreLogic (160M records, 300 cols) 
    ↓ (SFTP/S3)
S3 Data Lake
    ↓ (PySpark parallel processing)
Postgres (PostGIS geo-spatial indexes)
    ↓ (<1s query latency)
Comp Generation API
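
To make the last hop of the flow above concrete, here is a rough sketch of how the Comp Generation API might phrase a radius search against PostGIS. Everything schema-related is an assumption: the `properties` table, its `geog` geography column, and the `property_id`, `sale_price`, and `sale_date` columns are hypothetical names, and the default radius and limit are placeholders. The function only builds the query and parameters so it can be read without a database connection.

```python
def build_comp_query(lon, lat, radius_m=1600, limit=10):
    """Build a parameterized PostGIS radius search for comparable properties.

    Assumes a hypothetical `properties` table with a `geog` column of type
    geography. On geography values, ST_DWithin takes its distance in meters
    and can use a GiST index on the column.
    """
    sql = """
        SELECT property_id, sale_price, sale_date
        FROM properties
        WHERE ST_DWithin(
            geog,
            ST_SetSRID(ST_MakePoint(%(lon)s, %(lat)s), 4326)::geography,
            %(radius_m)s
        )
        ORDER BY sale_date DESC
        LIMIT %(limit)s
    """
    params = {"lon": lon, "lat": lat, "radius_m": radius_m, "limit": limit}
    return sql, params

# Example: recent sales within roughly one mile of a point in Austin, TX.
sql, params = build_comp_query(lon=-97.7431, lat=30.2672)
```

The `%(name)s` placeholders follow the psycopg parameter style; the pair returned here would be passed to a cursor's `execute` call. Whether a plain radius search is enough for comp quality is a product question this sketch does not answer.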

Open Questions:

  • Full dumps vs. incremental deltas from CoreLogic?
  • Schema documentation availability?
  • Daily delivery timing?

Phase 3: Underbuilt Expansion Intelligence (Weeks 5-8 | January 2026)

Status: Scope Defined

Focus:

  • Analyze address search logs for expansion prioritization
  • Build demand heatmap (top 100 cities)
  • POC for LLM/OCR-based extraction automation
  • Evaluate modern extraction tools

Key Deliverable: Expansion roadmap + automation feasibility report
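
The first pass at expansion prioritization could be as simple as counting searches per city and dropping cities Underbuilt already covers. The sketch below assumes each search log entry exposes a `city` field, which is a hypothetical shape until we see the real logs, and excluding covered cities is our assumption about what "prioritization" should mean here.

```python
from collections import Counter

def rank_expansion_candidates(search_log, covered_cities, top_n=100):
    """Rank cities not yet covered by how often users searched for them."""
    counts = Counter(
        entry["city"].strip().lower()
        for entry in search_log
        if entry.get("city")  # skip entries with a missing or empty city
    )
    covered = {city.strip().lower() for city in covered_cities}
    uncovered = [(city, n) for city, n in counts.most_common() if city not in covered]
    return uncovered[:top_n]

# Illustrative log entries only.
search_log = [
    {"city": "Austin"},
    {"city": "austin "},
    {"city": "Dallas"},
    {"city": "Houston"},
    {"city": "Houston"},
    {"city": "Houston"},
    {"city": None},
]
candidates = rank_expansion_candidates(search_log, covered_cities=["Dallas"])
```

Real address data will need more careful normalization (state, county, spelling variants) than the lowercase-and-trim step used here.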


Key Technical Requirements

MLS Data Pipeline

  • Scale: 160 million records/day, ~300 columns
  • Performance: Sub-1-second comp queries (95th percentile)
  • Availability: Production-grade, zero downtime during refreshes
  • Data Quality: 95%+ properties have sale date, listing date, price
  • Source: CoreLogic/Cotality (gold standard for MLS data)
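
The data quality requirement can be expressed as a simple completeness gate. The sketch below is one interpretation, assuming that a property counts only if all three fields are present together; the field names are hypothetical, and the requirement could also be read as 95% per field, which should be settled with Greg.

```python
REQUIRED_FIELDS = ("sale_date", "listing_date", "price")  # hypothetical column names

def completeness(records, required=REQUIRED_FIELDS):
    """Share of records in which every required field is present and non-empty."""
    if not records:
        return 0.0
    complete = sum(
        1 for record in records
        if all(record.get(field) not in (None, "") for field in required)
    )
    return complete / len(records)

def passes_quality_gate(records, threshold=0.95):
    """True when the completeness share meets the threshold."""
    return completeness(records) >= threshold

# Illustrative batch: 19 complete records and 1 missing its sale date.
good = {"sale_date": "2025-06-01", "listing_date": "2025-04-15", "price": 450000}
bad = {"sale_date": None, "listing_date": "2025-04-15", "price": 450000}
batch = [good] * 19 + [bad]
batch_ok = passes_quality_gate(batch)
```

A batch with two incomplete records out of twenty would fall to 90% and fail the gate.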

Product Analytics

  • Current Stack: MixPanel (events streaming), Statsig (experimentation)
  • User Base: 200-400 beta users, 300+ waitlist signups expected Jan 20
  • Key Metrics: D0/D7/D30 retention, DAU/MAU, feature adoption funnels
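
For DAU/MAU, a minimal sketch of the stickiness ratio is shown below. It assumes events arrive as `(user_id, date)` pairs and that "monthly" means a trailing 30-day window ending on the reporting date; both are assumptions to confirm against how MixPanel is configured.

```python
from datetime import date, timedelta

def dau_mau(events, as_of, window_days=30):
    """Ratio of users active on `as_of` to users active in the trailing window."""
    window_start = as_of - timedelta(days=window_days - 1)
    daily = {user for user, day in events if day == as_of}
    monthly = {user for user, day in events if window_start <= day <= as_of}
    return len(daily) / len(monthly) if monthly else 0.0

# Illustrative events only.
events = [
    ("a", date(2026, 1, 30)),
    ("b", date(2026, 1, 30)),
    ("a", date(2026, 1, 29)),
    ("c", date(2026, 1, 10)),
    ("d", date(2026, 1, 5)),
    ("e", date(2025, 12, 1)),  # outside the 30-day window
]
stickiness = dau_mau(events, as_of=date(2026, 1, 30))
```

In this example two of the four users active in the window were active on the reporting date, giving a ratio of 0.5.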

Underbuilt

  • Current State: Manual extraction from PDFs/Word docs/county websites
  • Coverage: ~500 cities currently
  • Opportunity: LLM-based automation for scalability
  • Competitive Moat: “As far as I know, we’re the only ones doing this” - Greg

Timeline & Milestones

Current Phase: Discovery & Scoping (Complete)

Next Phase: Phase 1 Analytics Foundation

  • Start Date: Early January 2026 (post-Android launch)
  • Duration: 4 weeks
  • End Date: Late January 2026

Critical Deadline: January 20, 2026 - Waitlist launch (300+ signups expected)

Key Dates:

  • End of December 2025: Android launch (Breezy team focus)
  • Early January 2026: Phase 1 kickoff
  • Mid-February 2026: MLS pipeline operational
  • Late February 2026: Full engagement complete

Team & Communication

Brainforge Team

  • Uttam Kumaran - Strategist, Lead (Austin, TX)
  • Awaish Kumar - Data Architect (Pakistan)
  • [Analytics Engineer TBD]

Breezy Stakeholders

  • James Harris (Jimsy) - Co-founder, Product Context (2-3 hrs/week)
  • Sigal Bareket - Co-founder, Growth/Marketing (2 hrs/week)
  • Xiaojie Zhang - Engineering Lead, China Team (AI/Pipeline focus)
  • Greg - Founding Engineer, Backend (MLS project lead, 5+ hrs/week Phase 2)

Communication Cadence

  • Daily: Slack async standups (blockers, updates)
  • Weekly: 60-min sync with founder(s) + technical SMEs
  • Bi-weekly: Technical deep-dives as needed
  • Ad-hoc: Loom videos for demos when schedules conflict

Channels

  • Slack: Primary async communication
  • Linear: Milestone tracking, task management
  • GitHub: Code, documentation (if applicable)
  • This repo: transcripts/ (calls), sows/, resources/ (this overview + Slack log)

Document Management Process

We follow a structured workflow for all meetings:

Pre-Meeting (Agenda) → Meeting (Recording) → Post-Meeting Processing
                                                    ↓
                                            `transcripts/` (transcript + notes + optional agenda)
                                                    ↓
                                            SOW Updates + Slack Notes

See MEETING_WORKFLOW.md for complete process.

Key Principles:

  1. All meetings recorded (with consent) for transcript generation
  2. Meeting notes created within 24 hours while context is fresh
  3. SOW updated within 48 hours of scope-impacting discussions
  4. Link all related documents for easy navigation
  5. Track changes via git for SOW evolution visibility

Recent Updates

December 12, 2025

  • Created: Meeting Workflow Guide, Technical Deep-Dive Notes, SOW Update Summary
  • Updated: SOW with MLS architecture details, performance requirements, Underbuilt automation scope
  • Key Insights: 160M record scale, sub-1s query requirement, data lake-first approach recommended
  • Timeline: Phase 1 start aligned with Android launch completion (early January)

December 10, 2025

  • Meeting: Technical deep-dive with Greg and Xiaojie
  • Major Findings:
    • MLS scale significantly larger than initially understood
    • CoreLogic provides flexible delivery options (S3, SFTP, Snowflake)
    • No current analytics use case for MLS data (production-first)
    • Underbuilt currently very manual, exploring LLM automation

December 2, 2025

  • Attended: Breezy webinar demo with beta users
  • Observations: Product walkthrough, agent feedback, underbuilt feature excitement

Key Decisions & Open Questions

Decisions Made

✅ Phase 1 starts after Android launch (early January 2026)
✅ Data lake-first architecture (Option A) recommended for MLS pipeline
✅ Skip Snowflake unless analytics use case confirmed
✅ Underbuilt automation is POC/exploration, not production implementation in this engagement
✅ Use existing MixPanel/Statsig, audit and enhance vs. rebuild

Open Questions

❓ What is the exact Android launch date?
❓ CoreLogic delivery mechanism: full dumps vs. deltas?
❓ What is daily data delivery timing and SLA?
❓ Do we need Postgres optimization or an alternative query engine (e.g., Elasticsearch)?
❓ When does the Breezy team have bandwidth for Phase 1 kickoff?


Success Criteria

Phase 1: Analytics Foundation

  • Breezy team independently runs retention queries in MixPanel
  • Feature usage data validates/challenges product hypotheses
  • Dashboards load in <3 seconds with 90 days of data
  • Event taxonomy documented and adopted by engineering team

Phase 2: MLS Data Pipeline

  • Comp queries return in <1 second (95th percentile)
  • Pipeline processes 160M records within daily SLA
  • Data quality tests pass: 95%+ properties have required fields
  • Zero downtime during daily refreshes
  • Engineering team can query any property in covered markets
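
To check the latency criterion, the 95th percentile can be computed from measured query times. The sketch below uses the nearest-rank method, which is one of several percentile definitions, and the sample latencies are made up for illustration.

```python
import math

def percentile_nearest_rank(samples, pct):
    """Nearest-rank percentile: the smallest sample with at least pct% of samples at or below it."""
    if not samples:
        raise ValueError("samples must not be empty")
    ordered = sorted(samples)
    rank = math.ceil(pct * len(ordered) / 100)
    return ordered[max(rank, 1) - 1]

# Illustrative latencies in milliseconds: 19 ordinary queries and one slow outlier.
latencies_ms = [200 + 40 * i for i in range(19)] + [5000]
p95_ms = percentile_nearest_rank(latencies_ms, 95)
meets_target = p95_ms < 1000
```

With 20 samples the 95th percentile is the 19th value in sorted order, so a single 5-second outlier does not fail the target on its own, while two such outliers would.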

Phase 3: Underbuilt Intelligence

  • Expansion prioritization validates against 3+ known high-demand markets
  • Extraction POC demonstrates >80% accuracy on structured data
  • Breezy has clear roadmap for Underbuilt scaling
  • ROI model informs product decisions on expansion velocity
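
The extraction accuracy criterion needs an agreed scoring method. One option, sketched below, is field-level accuracy against a hand-labeled sample: every expected field counts once, and a field is correct only on an exact match. The document IDs and zoning field names are hypothetical, and exact matching is an assumption that may be too strict for free-text fields.

```python
def field_accuracy(predicted, expected):
    """Share of expected fields whose extracted value exactly matches the labeled value."""
    total = 0
    correct = 0
    for doc_id, truth in expected.items():
        guess = predicted.get(doc_id, {})  # a missing document scores zero for all its fields
        for field, value in truth.items():
            total += 1
            if guess.get(field) == value:
                correct += 1
    return correct / total if total else 0.0

# Illustrative labels and extractions only.
expected = {
    "doc1": {"max_height_ft": 35, "zone": "R-1", "far": 0.5},
    "doc2": {"max_height_ft": 45, "zone": "C-2"},
}
predicted = {
    "doc1": {"max_height_ft": 35, "zone": "R-1", "far": 0.6},
    "doc2": {"max_height_ft": 45, "zone": "C-2"},
}
accuracy = field_accuracy(predicted, expected)
```

This example scores 4 of 5 fields correct, which sits exactly at 80% and so would not clear a strict ">80%" bar.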

Reference Materials

Breezy Product Context

  • Market: Real estate agents (residential & commercial)
  • Value Prop: AI-powered tools for property analysis, comp generation, building code research
  • Competitors: Traditional CRMs (lacking AI/data depth), Zillow/Redfin (not agent-focused)
  • Pricing Strategy: 15-20% premium vs. legacy CRMs based on accuracy advantage
  • Stage: Beta with 200-400 users, preparing for January 2026 waitlist launch

Data Sources

  • CoreLogic/Cotality: MLS bulk data (160M records, “gold standard”)
  • Trestle: CoreLogic real-time API (California only, requires per-MLS broker agreements)
  • MixPanel/Statsig: Product usage events
  • County/Municipal: Building codes, zoning (PDFs, Word docs, some APIs)

Tech Stack

  • Production DB: Postgres
  • Analytics: MixPanel, Statsig
  • Marketing: Customer.io (integrating)
  • Backend: Not specified (Greg’s domain)
  • Mobile: iOS live, Android launching end of December

Contact & Access

Project Lead: Uttam Kumaran (uttam@brainforge.ai)

Access Needed for Kickoff:

  • MixPanel (read access)
  • Statsig (configuration review)
  • Customer.io (event schema)
  • CoreLogic credentials (Phase 2)
  • Codebase (event instrumentation review)
  • Address search logs (Phase 3)

Questions? Reach out via Slack or email.


Appendix: Meeting History

| Date | Type | Attendees | Key Topics | Artifacts |
|------|------|-----------|------------|-----------|
| Dec 10, 2025 | Technical Deep-Dive | Greg, Xiaojie, Uttam, Awaish | MLS architecture, scale, performance; Underbuilt automation; Timeline | Notes, Transcript |
| Dec 2, 2025 | Product Demo | Breezy team, Beta users, Uttam (observer) | Webinar walkthrough, feature feedback | Transcript |
| [TBD] | Initial Discovery | [TBD] | Project overview, initial requirements | Transcript |

Last Updated: December 12, 2025