CTA Data Operations Q1 2026 — Statement of Work

Date: December 19, 2025
Version: 1.1
Client: Consumer Technology Association (CTA)
Author: Brainforge AI


1. Overview

This Statement of Work defines Brainforge’s continued engagement to scale and expand CTA’s modern DataOps platform throughout Q1 2026. Building on the foundation established in 2025 (dbt staging models, Snowflake infrastructure, and initial marts), this SOW focuses on expanding data coverage, building business-ready analytics datasets, and enabling self-service analytics across the organization. The work continues scaling data ingestion, expanding the marts layer, and making data more accessible to meet CTA’s growing analytics needs.


2. Objectives

  • Expand Data Coverage: Complete staging models for all priority data sources and establish pipelines for new data sources
  • Build Business-Ready Marts: Create dimensional models and fact tables that enable self-service analytics for key business entities
  • Enable Team Productivity: Onboard CTA team members (Kyle, Kai, and others) to dbt and Snowflake, enabling them to build and maintain models independently
  • Improve Data Accessibility: Establish role-based access control, documentation, and training to make data discoverable and usable
  • Support CES 2026: Deliver analytics-ready datasets and reporting capabilities for CES event operations and post-event analysis
  • Establish Operational Excellence: Complete CI/CD pipelines, testing frameworks, and monitoring to ensure reliable data operations

3. Scope of Work

3.1 In-Scope

Data Source Expansion (Phase 1)

  • Complete Remembers staging models for all remaining modules (exhibit, speaker, award, app, etc.)
  • Build staging models for Salesforce Marketing Cloud (SFMC) data
  • Build staging models for Salesforce CRM data (post-CES, Feb+)
  • Ingest historical S3 archive data (~300-400 tables from legacy SQL Server for entity resolution)
  • Establish data pipelines for new sources as identified (e.g., Polytomic event data, Formstack data, scanner data)
  • Create monthly backup automation for Remembers data (drop and clone process)
  • Migrate webhooks database data to proper Snowflake structures
  • Katherine’s SFTP Workflow Migration: Convert daily CES invite Python/Postgres workflow to dbt (8-10 flat files → transformations → 5 views → FTP upload to Marketing Cloud)
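The monthly backup for Remembers data can leverage Snowflake’s zero-copy cloning. A minimal sketch of the intended “drop and clone” process, with placeholder database names (the actual naming convention will be agreed with CTA):

```sql
-- Illustrative monthly "drop and clone" backup for Remembers data.
-- Database names are placeholders, not final conventions.
CREATE OR REPLACE DATABASE remembers_backup_2026_01  -- replaces the backup if re-run
  CLONE remembers_raw;                               -- zero-copy clone of the source

-- Optionally drop the oldest retained backup to cap storage growth
DROP DATABASE IF EXISTS remembers_backup_2025_10;
```

Because clones are zero-copy, storage is only consumed as the source and clone diverge, keeping the retention cost low.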

Marts Layer Development (Phase 2)

  • Build dimensional models (dim_member, dim_organization, dim_event, dim_country, etc.)
  • Create fact tables (fct_registrations, fct_purchases, fct_email_engagement, etc.)
  • Develop business reports:
    • Active membership report (replaces the membership team’s full-day manual Excel process)
    • Member engagement report
    • Event performance reports
  • Build CES-specific analytics datasets:
    • Registration funnels (track 29+ min avg registration time)
    • Badge scan analytics (lead retrieval + foot traffic)
    • Session scanner data (session attendance patterns)
    • Exhibitor ROI analysis
  • Entity Resolution Implementation: Build DataOps ID as canonical identifier across systems, starting with CES vendor integration
  • Create intermediate models for complex business logic and reusable transformations
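As a rough sketch of the marts pattern described above, a dim_member model might join Remembers staging models roughly as follows. Staging model and column names here are assumptions, to be confirmed against the actual dbt project:

```sql
-- models/marts/dim_member.sql (illustrative sketch; names are placeholders)
with members as (
    select * from {{ ref('stg_remembers__members') }}
),

organizations as (
    select * from {{ ref('stg_remembers__organizations') }}
)

select
    members.member_id,
    members.dataops_id,              -- canonical ID from entity resolution work
    members.full_name,
    members.email,
    organizations.organization_name,
    members.membership_status,
    members.joined_at
from members
left join organizations
    on members.organization_id = organizations.organization_id
```

Keeping joins and business logic in intermediate and mart models, with staging models as thin source-conformed layers, is the structure the team enablement work will teach.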

Infrastructure & Operations (Phase 3)

  • ETL Platform Finalization: Complete Polytomic evaluation and setup, or establish alternative approach for P0 data sources (SFMC, Merits FTP, etc.)
  • Orchestration Implementation: Evaluate and implement Snowflake native task orchestration (preferred) vs GitHub Actions for dbt execution
  • Complete GitHub Actions CI/CD pipeline for dbt (automated testing on PRs)
  • Set up Snowflake CLI integration for automated repo refresh on merges
  • Implement comprehensive dbt testing framework (data quality tests, schema tests, custom tests)
  • Establish Snowflake role-based access control (RBAC) with functional roles
  • Set up Snowflake warehouses for different workload types (ETL, Transform, BI/Reporting)
  • Create monitoring and alerting for pipeline health
  • Document Snowflake grants and access patterns
  • FTP Integration: Establish S3 → Snowflake → dbt → FTP workflow for Marketing Cloud data delivery
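If the Snowflake-native option is selected for orchestration, a scheduled task could drive the dbt run. This is a hedged sketch only: object names are placeholders, and the EXECUTE DBT PROJECT syntax (part of Snowflake’s “dbt Projects on Snowflake” capability) should be verified against current Snowflake documentation during the evaluation:

```sql
-- Illustrative Snowflake-native scheduling for dbt execution
-- (names and EXECUTE DBT PROJECT syntax are assumptions to verify).
CREATE OR REPLACE TASK dataops.admin.run_dbt_nightly
  WAREHOUSE = transform_wh
  SCHEDULE  = 'USING CRON 0 6 * * * America/New_York'  -- daily, before business hours
AS
  EXECUTE DBT PROJECT dataops.admin.cta_dbt_project ARGS = 'run';

ALTER TASK dataops.admin.run_dbt_nightly RESUME;  -- tasks are created suspended
```

The GitHub Actions alternative would run the same dbt commands from CI on a schedule; the evaluation will weigh Snowflake-native consolidation against CI flexibility.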

Team Enablement (Phase 4)

  • Onboard Kyle to dbt and Snowflake (training, access setup, workflow documentation)
  • Onboard Kai (new BI analyst) to Snowflake and reporting workflows
  • Record dbt/Snowflake onboarding sessions for future team members and “citizen data engineers”
  • Cursor Demo: Provide training on Cursor for Katherine and Jay (both expressed interest)
  • Create team training materials and documentation
  • Establish code review process and best practices
  • Enable team members to build marts from intermediate models
  • Support “citizen data engineers” (Anna P/K/R, Chris Deathloff, Quinn, JC) with a safe environment in which to learn

Documentation & Knowledge Management (Phase 5)

  • Complete schema.yml documentation for all models
  • Create data dictionary and business glossary
  • Document naming conventions and project structure
  • Establish data lineage documentation
  • Create runbooks for common operations
  • Integrate Snowflake catalog with Glean (exploration and implementation)

3.2 Out-of-Scope

  • Okta authentication optimization (separate discovery workstream - identified as causing 80% of customer support requests)
  • Shopify digital asset store evaluation (separate discovery workstream - authentication loop and download failures)
  • CES Registration Security Fix (DataOps ID vendor integration for lead retrieval - Katherine managing separately)
  • Full BI tool implementation (deferred to Q2 - Power BI available for interim use)
  • Long-term managed services beyond Q1 2026
  • Custom application development outside of data workflows
  • Data source migrations or system replacements
  • Marketing content creation or business strategy
  • Email deliverability fixes (DMARC/DKIM/SPF configuration)

4. Requirements & Inputs

Access & Permissions

  • Continued access to Snowflake, dbt Cloud, and GitHub repository
  • Access to new data sources as they are identified (Polytomic, Formstack, etc.)
  • Administrative permissions for Snowflake RBAC setup
  • Access to Glean for integration exploration

Documentation

  • Data source documentation and API specifications
  • Business requirements for marts and reports
  • Existing data dictionaries and business glossaries
  • Historical data quality issues and known data problems

Stakeholder Availability

  • Katherine Bayless (Data Operations): Bi-weekly strategic alignment, ad-hoc questions
  • Kyle (Data Analyst): Weekly onboarding sessions, model review, requirements gathering
  • Kai (Data Analyst): Training sessions, requirements for member engagement reports
  • Jay Heavner (IT): RBAC setup coordination, infrastructure approvals
  • Other CTA team members: Ad-hoc requirements gathering, testing, feedback

Data & Systems

  • Access to Remembers data (via Snowflake Share) - ✅ Active
  • Access to SFMC data (API key available in AWS Secrets Manager)
  • Access to Salesforce CRM data (post-CES, Feb+, currently in “do not disturb” mode)
  • Historical S3 archive access (CTA-DataOps-Archive bucket with ~300-400 tables from legacy SQL Server)
  • Historical CES scan data (S3 bucket access - session scanner files “too big to import manually”)
  • Formstack/webhooks data for migration - ✅ Working (webhook → S3 → Snowflake)
  • Merits registration data (FTP access, flat files only - no APIs)
  • EventPoint data (good APIs available, post-CES priority)
  • Polytomic connector availability (or alternative ETL approach if Polytomic doesn’t proceed)

5. Deliverables

Phase 1: Data Source Expansion

  • Complete staging models for all Remembers modules (accounting, app, award, crm, crm_v2, exhibit, purchase, shopping, speaker)
  • SFMC staging models with documentation (sends, opens, clicks, bounces, unsubscribes, jobs, lists)
  • Salesforce CRM staging models with documentation (post-CES, Feb+)
  • Historical S3 archive ingestion (one-time load of ~300-400 tables for entity resolution)
  • Monthly backup automation script and documentation
  • Webhooks data migration to proper Snowflake structures
  • Katherine’s SFTP workflow migrated to dbt (daily CES invite process automation)
  • New data source pipelines: Merits FTP, session scanners, lead retrieval data (as identified)

Phase 2: Marts Layer Development

  • Dimensional models (dim_member, dim_organization, dim_event, dim_country, etc.)
  • Fact tables (fct_registrations, fct_purchases, fct_email_engagement, fct_lead_scans, fct_session_attendance, etc.)
  • Business reports:
    • Active membership report (6-column spreadsheet replacement - “P Negative-1” priority per Katherine)
    • Member engagement report
    • Event performance reports
  • CES analytics datasets:
    • Registration funnels (track and optimize 29+ min registration time)
    • Badge scan analytics (lead retrieval + foot traffic for exhibitor ROI)
    • Session scanner data (attendance patterns, popular sessions)
    • Exhibitor engagement analysis
  • Entity resolution: DataOps ID implementation across historical + current data
  • Intermediate models for complex transformations and reusable business logic

Phase 3: Infrastructure & Operations

  • ETL platform setup (Polytomic or alternative approach) for P0 sources
  • Orchestration implementation (Snowflake native tasks preferred, or GitHub Actions backup)
  • GitHub Actions CI/CD pipeline (automated testing on PRs, repo → S3 deployment)
  • Snowflake CLI integration for automated repo refresh on merge
  • Comprehensive dbt testing framework (unique, not_null, relationships, custom tests)
  • Snowflake RBAC implementation with functional roles (data analyst, data engineer, read-only, etc.)
  • Snowflake warehouse configuration (ETL, Transform, BI/Reporting workloads)
  • FTP integration for Marketing Cloud data delivery workflow
  • Monitoring and alerting setup
  • Grants documentation
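The functional-role RBAC deliverable might take roughly the following shape. Role, database, and warehouse names below are illustrative placeholders to be finalized with Jay during infrastructure coordination:

```sql
-- Illustrative functional-role RBAC sketch (all names are placeholders)
CREATE ROLE IF NOT EXISTS data_engineer;
CREATE ROLE IF NOT EXISTS data_analyst;
CREATE ROLE IF NOT EXISTS reporter_ro;

-- Read-only reporting access to the marts layer
GRANT USAGE ON DATABASE analytics TO ROLE reporter_ro;
GRANT USAGE ON SCHEMA analytics.marts TO ROLE reporter_ro;
GRANT SELECT ON ALL TABLES IN SCHEMA analytics.marts TO ROLE reporter_ro;
GRANT SELECT ON FUTURE TABLES IN SCHEMA analytics.marts TO ROLE reporter_ro;

-- Analysts inherit read access and get a dedicated reporting warehouse
GRANT ROLE reporter_ro TO ROLE data_analyst;
GRANT USAGE ON WAREHOUSE reporting_wh TO ROLE data_analyst;
```

FUTURE grants keep access current as dbt creates new mart tables, which avoids re-granting after every deployment.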

Phase 4: Team Enablement

  • Kyle onboarding complete (access, training, first models built independently)
  • Kai onboarding complete (Snowflake access, reporting workflows)
  • Cursor training delivered for Katherine and Jay
  • Recorded onboarding sessions for future team members and “citizen data engineers”
  • Training materials and documentation (dbt, Snowflake, best practices)
  • Code review process documentation
  • Best practices guide for building marts and maintaining models
  • “Calculated neglect” approach: enable team independence through a safe environment to learn

Phase 5: Documentation & Knowledge Management

  • Complete schema.yml documentation for all models
  • Data dictionary and business glossary
  • Naming conventions and project structure documentation
  • Data lineage documentation
  • Operational runbooks
  • Glean integration assessment and implementation plan (if feasible)

6. Project Timeline

Q1 2026 (January - March 2026)

January 2026: Foundation & Expansion

  • Weeks 1-2: Complete Remembers staging models, begin SFMC staging
  • Weeks 3-4: SFMC and Salesforce CRM staging models, Kyle onboarding begins

February 2026: Marts & Infrastructure

  • Weeks 1-2: Begin dimensional models, complete RBAC setup, CI/CD pipeline
  • Weeks 3-4: Fact tables development, testing framework, team training

March 2026: Reports & Enablement

  • Weeks 1-2: Business reports, CES datasets, documentation completion
  • Weeks 3-4: Team enablement completion, Glean integration exploration, Q2 planning

Total Duration: 12 weeks (Q1 2026)


7. Assumptions

  • CTA stakeholders (Katherine, Kyle, Kai, Jay) are available for scheduled meetings and ad-hoc questions
  • Data sources remain accessible and stable throughout Q1
  • Remembers contract and data sharing agreement continues as expected
  • New data sources (Polytomic, Formstack) can be accessed within Q1 timeline
  • CTA team members can dedicate time for training and onboarding
  • Business requirements for marts and reports can be gathered within Q1
  • No major organizational changes that would impact data operations
  • Snowflake capacity and performance remain adequate for growing data volumes
  • GitHub repository access and permissions remain stable

8. Risks

  • Data source access delays (Impact: Medium). Mitigation: Identify access requirements early; establish backup plans for critical sources; prioritize sources with existing access; Salesforce CRM blocked until post-CES (Feb+)
  • Polytomic evaluation delays (Impact: Medium). Mitigation: Maintain alternative approach (AWS Glue, manual pipelines) if Polytomic doesn’t proceed; have backup ETL plan ready
  • Team onboarding slower than expected (Impact: Medium). Mitigation: Start onboarding early; provide recorded sessions; create self-service documentation; schedule regular check-ins; Katherine’s “calculated neglect” approach
  • Business requirements unclear (Impact: Medium). Mitigation: Schedule regular requirements gathering sessions; start with high-priority use cases (“P Negative-1” = Remembers); iterate based on feedback
  • CES timeline constraints (Impact: High). Mitigation: No major changes before CES (Jan 2026); Katherine’s SFTP workflow must keep running; post-CES window for changes (Feb+)
  • Data quality issues discovered (Impact: Medium). Mitigation: Build comprehensive testing framework; document known issues (e.g., Remembers dashboard showing 1,658 vs 1,100 actual members); create data quality reports; prioritize critical fixes
  • Entity resolution complexity (Impact: Medium). Mitigation: Start with historical data profiling; build DataOps ID as canonical identifier; test with one vendor before expanding
  • Infrastructure capacity constraints (Impact: Low). Mitigation: Monitor Snowflake usage; optimize queries; scale warehouses as needed; plan for growth; Katherine wants to be “poster child for leveraging all Snowflake features”
  • Scope creep from other workstreams (Impact: Medium). Mitigation: Maintain clear boundaries with Okta/Shopify discovery work; Katherine’s security fix separate; prioritize Q1 deliverables; document dependencies
  • Finance scrutiny on costs (Impact: Medium). Mitigation: Demonstrate ROI early; leverage frugal approach (“squeeze value out of tools we have”); consolidate into Snowflake where possible; use AWS Marketplace for procurement ease
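Known discrepancies like the Remembers member-count mismatch can be caught systematically with singular dbt tests in the testing framework. A hedged sketch (model name, column name, and the sanity band are assumptions to confirm with the membership team):

```sql
-- tests/assert_active_member_count_in_range.sql (illustrative; names are placeholders)
-- A singular dbt test fails when it returns rows. This one flags the active
-- member count drifting outside an agreed sanity band, to catch
-- dashboard-vs-source mismatches like 1,658 reported vs ~1,100 actual.
select count(*) as active_members
from {{ ref('dim_member') }}
where membership_status = 'active'
having count(*) not between 900 and 1300  -- band to be agreed with membership team
```

Tests like this run in CI on every PR, so regressions surface before they reach reports.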

9. Acceptance Criteria

Phase 1 Acceptance:

  • All Remembers staging models complete with tests and documentation
  • SFMC and Salesforce CRM staging models operational
  • Monthly backup automation running successfully
  • Webhooks data migrated to proper structures

Phase 2 Acceptance:

  • Minimum 5 dimensional models delivered and tested
  • Minimum 3 fact tables delivered and tested
  • Active membership report and member engagement report operational
  • CES datasets available for event operations

Phase 3 Acceptance:

  • GitHub Actions CI/CD pipeline running on all PRs
  • Snowflake RBAC implemented with functional roles
  • Comprehensive dbt testing framework in place
  • Monitoring and alerting operational

Phase 4 Acceptance:

  • Kyle successfully onboarded and building models independently
  • Onboarding materials and recordings available
  • Code review process documented and in use

Phase 5 Acceptance:

  • All models have complete schema.yml documentation
  • Data dictionary and business glossary published
  • Operational runbooks available
  • Glean integration assessed (implementation may extend beyond Q1)

10. Communication Plan

Regular Meetings:

  • Weekly Technical Working Session (60 min): Kyle + Brainforge team (dbt development, model review)
  • Bi-weekly Strategic Alignment (30 min): Katherine Bayless + Uttam Kumaran
  • Monthly Infrastructure Review (30 min): Jay Heavner + Brainforge team (RBAC, infrastructure)
  • Ad-hoc sessions: As needed for requirements gathering, training, and issue resolution

Async Communication:

  • Dedicated Slack channel for daily questions and updates
  • GitHub repository for all code, documentation, and issues
  • Shared workspace (Google Drive or equivalent) for documentation and deliverables
  • Weekly status email summarizing progress, blockers, and upcoming milestones

Escalation Path:

  • Technical blockers escalated to Ashwini Sharma or Uttam Kumaran
  • Business requirements questions escalated to Katherine Bayless
  • Infrastructure/access issues escalated to Jay Heavner
  • Timeline risks communicated within 24 hours of identification

11. Open Questions

These questions will be addressed during Q1 execution:

  1. Polytomic Integration: Has Polytomic responded with evaluation results? If not proceeding, what is the backup ETL approach for SFMC, Merits FTP, EventPoint?
  2. Orchestration Decision: Snowflake native task orchestration vs GitHub Actions? Need to finalize for Katherine’s SFTP workflow.
  3. Historical S3 Data Priority: Which of the ~300-400 tables should be ingested first for entity resolution? What is the timeline?
  4. CES Scanner Data: Session scanner files “too big to import manually” - what is file size/format? When will data be available?
  5. BI Tool Selection: Deferred to Q2 per Katherine. Power BI available for interim use. When to revisit Sigma evaluation?
  6. Glean Integration: What are the technical requirements for Snowflake catalog integration? Priority post-Q1 based on Dec discussions.
  7. DataOps ID Scope: Is Katherine’s vendor integration proceeding? How does this affect broader entity resolution work?
  8. Citizen Data Engineers: Which additional team members (Anna P/K/R, Chris Deathloff, Quinn, JC, Tom Moschello) should be onboarded in Q1?
  9. CES Post-Event Analysis: What are the specific reporting needs for post-CES analysis (Feb-Mar)? Lead gen, foot traffic, exhibitor ROI?
  10. Salesforce CRM Timeline: When post-CES (Feb? March?) will CRM data access be available? What are integration priorities?

12. Pricing

Pricing for this SOW follows the hourly rate structure established in the Brainforge CTA Agreement dated November 12, 2025.

Estimated Effort for Q1 2026:

  • Phase 1: Data Source Expansion (100-130 hours): Staging models, pipelines, data migration, historical S3 ingestion, Katherine’s SFTP workflow
  • Phase 2: Marts Layer Development (140-170 hours): Dimensional models, fact tables, reports, entity resolution, CES analytics
  • Phase 3: Infrastructure & Operations (80-100 hours): ETL/orchestration setup, CI/CD, RBAC, testing, monitoring, FTP integration
  • Phase 4: Team Enablement (50-70 hours): Training, onboarding, Cursor demo, citizen data engineer support, documentation
  • Phase 5: Documentation & Knowledge (40-60 hours): Documentation, data dictionary, runbooks, Glean assessment
  • Total Estimated Hours: 410-530 hours for the Q1 2026 engagement

Updated Estimate Rationale:

  • Added historical S3 archive ingestion (~300-400 tables for entity resolution)
  • Added Katherine’s SFTP workflow migration (daily CES invite process)
  • Expanded CES analytics scope (session scanners, lead retrieval, exhibitor ROI)
  • Added orchestration decision and implementation (Snowflake native tasks evaluation)
  • Added Cursor training for Katherine and Jay
  • Expanded entity resolution scope (DataOps ID implementation)

Billing Structure:

  • Hours will be billed monthly based on actual time spent
  • Invoices will include detailed time logs by phase and deliverable
  • Any hours exceeding the estimated range will be discussed with Katherine Bayless before proceeding

Payment Terms:

  • Net 30 payment terms as per existing agreement
  • Monthly invoices submitted by the 5th of the following month

13. Sign-Off

By signing below, both parties acknowledge understanding and agreement with the scope, deliverables, timeline, and approach outlined in this Statement of Work.

Client (CTA):

Name: ___________________________
Title: ___________________________
Date: ___________________________
Signature: _______________________

Brainforge AI:

Name: Uttam Kumaran
Title: Managing Lead
Date: December 16, 2025
Signature: _______________________


Appendix A: Key Stakeholders

  • Katherine Bayless (Senior Director, Data Engineering): Strategic sponsor, bi-weekly alignment, executive liaison, decision maker, “P Negative-1” prioritization
  • Kyle (Data Analyst, Market Research → Data): Primary model builder, weekly working sessions, requirements gathering, R/Python skills, eager to learn dbt
  • Kai (Business Intelligence Analyst, new hire): Member engagement reports, training sessions, requirements gathering, strong governance background
  • Jay Heavner (VP of IT): 20+ years tenure, infrastructure approvals, RBAC coordination, access management, Okta/systems owner
  • Ashwini Sharma (Data Engineer, Brainforge): Primary technical lead, dbt development, infrastructure setup, orchestration implementation
  • Samuel Roberts (Full-Stack Engineer, Brainforge): Okta/Shopify discovery support, integration work
  • Uttam Kumaran (Managing Lead, Brainforge): Client POC, strategic alignment, project oversight

Additional Stakeholders (Ad-hoc):

  • Anna P, Anna K, Anna R (Membership): Requirements for active member report
  • Chris Deathloff (Market Research): Analytics needs, data literacy champion
  • Quinn, JC (Business Intelligence): Survey analysis, reporting needs
  • Tom Moschello (ExpoCAD): Show floor data, data quality

Appendix B: Success Metrics

Brainforge will track these metrics during Q1 to measure progress and success:

Data Coverage Metrics:

  • Number of staging models completed
  • Number of data sources integrated
  • Percentage of Remembers modules with staging models
  • Data freshness (time from source to staging)

Marts Development Metrics:

  • Number of dimensional models delivered
  • Number of fact tables delivered
  • Number of business reports operational
  • Model test coverage percentage

Team Enablement Metrics:

  • Number of team members onboarded
  • Number of models built by CTA team members
  • Time to first model (for new team members)
  • Code review participation rate

Operational Excellence Metrics:

  • CI/CD pipeline success rate
  • Average time for PR review and merge
  • Number of data quality issues caught by tests
  • Pipeline uptime percentage

Business Impact Metrics:

  • Number of dashboards/reports using marts data
  • Number of active Snowflake users
  • Number of questions answered by data team
  • CES reporting capabilities delivered

Appendix C: Deliverable Checklist

Phase 1: Data Source Expansion

  • All Remembers staging models complete (accounting, app, award, crm, crm_v2, exhibit, purchase, shopping, speaker)
  • SFMC staging models complete
  • Salesforce CRM staging models complete (post-CES, Feb+)
  • Historical S3 archive ingested (~300-400 tables)
  • Katherine’s SFTP workflow migrated to dbt (daily CES invite process)
  • Monthly backup automation operational
  • Webhooks data migrated
  • Merits FTP pipeline established
  • Session scanner data ingestion
  • Lead retrieval data pipeline
  • New data source pipelines (as identified)

Phase 2: Marts Layer Development

  • dim_member model (unified member view)
  • dim_organization model
  • dim_event model
  • dim_country model
  • fct_registrations model
  • fct_purchases model
  • fct_email_engagement model
  • fct_lead_scans model (exhibitor lead retrieval)
  • fct_session_attendance model (session scanner data)
  • Active membership report (6-column spreadsheet replacement - “P Negative-1”)
  • Member engagement report
  • Event performance reports
  • CES registration funnel analysis (track 29+ min registration time)
  • Exhibitor ROI analysis (lead scans + foot traffic)
  • Session popularity analysis
  • Entity resolution: DataOps ID implementation

Phase 3: Infrastructure & Operations

  • ETL platform decision and setup (Polytomic or alternative)
  • Orchestration implementation (Snowflake native tasks or GitHub Actions)
  • GitHub Actions CI/CD pipeline (testing on PRs, repo → S3)
  • Snowflake CLI integration for automated refresh
  • dbt testing framework (unique, not_null, relationships, custom tests)
  • Snowflake RBAC implementation (functional roles)
  • Warehouse configuration (ETL, Transform, BI/Reporting)
  • FTP integration for Marketing Cloud delivery
  • Monitoring and alerting
  • Grants documentation

Phase 4: Team Enablement

  • Kyle onboarding complete (dbt, Snowflake, building models independently)
  • Kai onboarding complete (Snowflake, reporting workflows)
  • Cursor training for Katherine and Jay
  • Onboarding recordings available for future team members
  • Training materials published (dbt, Snowflake, best practices)
  • Code review process documented
  • Best practices guide
  • “Citizen data engineer” support framework established

Phase 5: Documentation & Knowledge

  • All models have schema.yml
  • Data dictionary published
  • Business glossary published
  • Naming conventions documented
  • Data lineage documented
  • Operational runbooks available
  • Glean integration assessment