MotherDuck: Default (Calendly-style) export pipeline
Last updated: 2026-04-09
Audience: Data / analytics engineers loading Default product exports into MotherDuck for Omni and downstream marts.
See also: MotherDuck: Brainforge warehouse overview (full org map: raw_data, other databases).
Purpose
Operational exports from Default (events, forms, teams, meetings, submissions, etc.) are landed in MotherDuck, staged in my_db, then merged into raw_export for modeling and BI. This document records where data lives, how loads are sequenced, and where the SQL scripts live.
Other sources (e.g. Plain, CRM) may live under raw_data or other databases — not loaded by this CSV pipeline; see the warehouse overview.
MotherDuck layout (this pipeline only)
| Database | Role |
|---|---|
| my_db | Staging and scratch. Time-bounded dumps use a schema like stg_YYYYMMDD_YYYYMMDD (inclusive window). Legacy reference schemas (e.g. 12-16) may still exist for comparison. |
| raw_export | Production-style main.raw_* tables consumed by dbt/Omni (e.g. raw_teams, raw_events, raw_meetings, raw_submissions). |
Upsert pattern: delete by id overlap with staging, then INSERT … SELECT * (or an explicit column list when types differ).
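A minimal sketch of that pattern, using raw_teams and the 2025-12-28 to 2026-03-20 staging schema as the example; the queues_members column names below are illustrative, not confirmed against the real table:

```sql
-- Delete rows the staging window will replace, then re-insert from staging.
DELETE FROM raw_export.main.raw_teams
WHERE id IN (SELECT id FROM my_db.stg_20251228_20260320.teams);

INSERT INTO raw_export.main.raw_teams
SELECT * FROM my_db.stg_20251228_20260320.teams;

-- When types differ, switch to an explicit column list with casts, e.g. the
-- deleted_at -> TIMESTAMP cast noted for queues_members (column names assumed):
INSERT INTO raw_export.main.raw_queues_members (queue_id, member_id, deleted_at)
SELECT queue_id, member_id, CAST(deleted_at AS TIMESTAMP)
FROM my_db.stg_20251228_20260320.queues_members;
```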
Export bundle (typical folder)
Exports are often delivered as a folder named with the date window, for example:
12-28-2025 to 03-20-2026 (latest)/
Smaller CSVs (underscore-prefixed, timestamped):
_teams_*.csv, _members_*.csv, _queues_*.csv, _queues_members_*.csv, _events_*.csv, _forms_*.csv
Large CSVs (may be multi-part):
Meetings *.csv (single file, often 10+ GB)
Submissions * of 4 *.csv (four parts, each often 10+ GB)
Always confirm all expected files are present before a long load; meetings/submissions are easy to miss in automation.
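A quick presence check, as a sketch (path is the example folder above):

```sql
-- List every CSV DuckDB can see in the export folder and eyeball the count:
-- six dimension files, one meetings file, four submission parts.
SELECT file
FROM glob('12-28-2025 to 03-20-2026 (latest)/*.csv')
ORDER BY file;
```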
Extended columns (meetings & submissions)
Newer exports append attribution / landing fields after the historical column set. These were added to raw_export.main.raw_meetings and raw_export.main.raw_submissions (see playbook DDL):
Meetings (after meeting_link):
utm_name, utm_source, utm_medium, utm_campaign, utm_term, utm_content, referrer, submission_url, landing_url, landing_referrer, gclid
Submissions (after submission_id):
has_errors (boolean), then the same UTM/URL block as above (no duplicate referrer in the extension block; base table already has referrer).
Loads should use SELECT * from CSV once these columns exist so staging and raw_export stay aligned with CSV column order.
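A hedged sketch of both halves; the authoritative DDL lives in scripts/add_utm_landing_columns_raw_export.sql, and the VARCHAR types here are assumptions:

```sql
-- DDL side: idempotent column additions on the production tables.
ALTER TABLE raw_export.main.raw_meetings ADD COLUMN IF NOT EXISTS utm_source VARCHAR;
ALTER TABLE raw_export.main.raw_meetings ADD COLUMN IF NOT EXISTS utm_medium VARCHAR;
ALTER TABLE raw_export.main.raw_meetings ADD COLUMN IF NOT EXISTS gclid VARCHAR;
-- (repeat for the remaining UTM/URL columns listed above)
ALTER TABLE raw_export.main.raw_submissions ADD COLUMN IF NOT EXISTS has_errors BOOLEAN;

-- Load side: SELECT * keeps staging aligned with the CSV column order.
CREATE OR REPLACE TABLE my_db.stg_20251228_20260320.meetings AS
SELECT * FROM read_csv_auto('12-28-2025 to 03-20-2026 (latest)/Meetings *.csv');
```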
SQL scripts (source of truth)
Scripts live in the Default client / brainforge-bi repo (paths relative to that checkout):
| Script | What it does |
|---|---|
| scripts/add_utm_landing_columns_raw_export.sql | ALTER TABLE … ADD COLUMN IF NOT EXISTS for extended meetings/submissions columns on raw_export. |
| scripts/load_my_db_stg_20251228_20260320.sql | Load small dimension files from CSV into my_db.stg_* (excludes CSV-only columns not in legacy shape). |
| scripts/merge_stg_into_raw_export.sql | Upsert teams, members, queues, events, forms, queues_members into raw_export. Staging queues_members.deleted_at is cast to TIMESTAMP to match raw_export. |
| scripts/load_my_db_meetings_submissions.sql | Load full meetings + all submission parts into my_db staging, then upsert raw_meetings / raw_submissions (long-running). |
| scripts/merge_stg_meetings_submissions_into_raw_export.sql | Merge only meetings + submissions if staging is already populated. |
When the date window or folder path changes, copy the scripts under a new staging schema name and update the read_csv_auto paths inside them (or parameterize them in a wrapper).
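For example, a hypothetical next window (schema name and folder path are placeholders):

```sql
-- New inclusive window 2026-03-21 .. 2026-06-10: new schema, updated paths.
CREATE SCHEMA IF NOT EXISTS my_db.stg_20260321_20260610;

CREATE OR REPLACE TABLE my_db.stg_20260321_20260610.teams AS
SELECT * FROM read_csv_auto('03-21-2026 to 06-10-2026 (latest)/_teams_*.csv');
```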
Runtime expectations
Large exports (tens of GB) are dominated by disk read + parse + upload to MotherDuck. Expect hours, not minutes; the DuckDB CLI may print little until each step finishes. See the playbook for monitoring and ballpark timing.
Related
- CLI setup (install, token, no repeated SSO): MotherDuck CLI setup
- Playbook (step-by-step load): MotherDuck — Default export load
- Org layout: MotherDuck: Brainforge warehouse overview
- CLI / connect: DuckDB CLI: Connecting to MotherDuck
- Auth / token: Authenticating to MotherDuck
Open follow-ups
- Backfill extended UTM/landing columns for rows loaded before the DDL migration (NULL until backfilled); a sketch follows this list.
- Optionally mirror extended columns or staging conventions in my_db reference schemas used for QA.
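For the backfill item, one possible shape, strictly a sketch: it assumes the staging table still holds the extended columns and that id is a reliable join key for meetings.

```sql
-- Backfill extended columns on rows loaded before the DDL migration, using a
-- staging table that carries the full column set (join key is an assumption).
UPDATE raw_export.main.raw_meetings
SET utm_source = s.utm_source,
    utm_medium = s.utm_medium,
    gclid      = s.gclid
FROM my_db.stg_20251228_20260320.meetings AS s
WHERE raw_meetings.id = s.id
  AND raw_meetings.utm_source IS NULL;
```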