MotherDuck: Default (Calendly-style) export pipeline

Last updated: 2026-04-09
Audience: Data / analytics engineers loading Default product exports into MotherDuck for Omni and downstream marts.

See also: MotherDuck: Brainforge warehouse overview (full org map: raw_data, other databases).

Purpose

Operational exports from Default (events, forms, teams, meetings, submissions, etc.) are landed in MotherDuck, staged in my_db, then merged into raw_export for modeling and BI. This document records where data lives, how loads are sequenced, and where the SQL scripts live.

Other sources (e.g. Plain, CRM) may live under raw_data or other databases — not loaded by this CSV pipeline; see the warehouse overview.

MotherDuck layout (this pipeline only)

Database roles:

  • my_db: Staging and scratch. Time-bounded dumps use a schema like stg_YYYYMMDD_YYYYMMDD (inclusive window). Legacy reference schemas (e.g. 12-16) may still exist for comparison.
  • raw_export: Production-style main.raw_* tables consumed by dbt/Omni (e.g. raw_teams, raw_events, raw_meetings, raw_submissions).

Upsert pattern: delete rows in raw_export whose ids overlap the staging batch, then INSERT … SELECT * (or an explicit column list when types differ).
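As a sketch, the pattern looks like the following (schema and table names are illustrative; the real statements live in the merge scripts under scripts/):

```sql
-- Illustrative upsert: delete overlapping ids, then insert the staging batch.
-- Schema/table names are examples, not the exact script contents.
DELETE FROM raw_export.main.raw_events
WHERE id IN (SELECT id FROM my_db.stg_20251228_20260320.events);

INSERT INTO raw_export.main.raw_events
SELECT * FROM my_db.stg_20251228_20260320.events;
-- Use an explicit column list instead of SELECT * when column types differ.
```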

Export bundle (typical folder)

Exports are often delivered as a folder named with the date window, for example:

12-28-2025 to 03-20-2026 (latest)/

Smaller CSVs (underscore-prefixed, timestamped):

  • _teams_*.csv, _members_*.csv, _queues_*.csv, _queues_members_*.csv
  • _events_*.csv, _forms_*.csv

Large CSVs (may be multi-part):

  • Meetings *.csv: single file, often 10+ GB
  • Submissions * of 4 *.csv: four parts, each often 10+ GB

Always confirm all expected files are present before a long load; meetings/submissions are easy to miss in automation.
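One quick presence check is DuckDB's glob() table function, run against the delivered bundle before any load starts (folder path is an example):

```sql
-- List the CSVs DuckDB can actually see in the export folder before loading.
-- The path is illustrative; point it at the delivered bundle.
SELECT file
FROM glob('12-28-2025 to 03-20-2026 (latest)/*.csv')
ORDER BY file;
-- Expect the underscore-prefixed dimension files plus the Meetings file and
-- all Submissions parts; a missing part here is cheaper than a failed merge.
```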

Extended columns (meetings & submissions)

Newer exports append attribution / landing fields after the historical column set. These were added to raw_export.main.raw_meetings and raw_export.main.raw_submissions (see playbook DDL):

Meetings (after meeting_link):
utm_name, utm_source, utm_medium, utm_campaign, utm_term, utm_content, referrer, submission_url, landing_url, landing_referrer, gclid

Submissions (after submission_id):
has_errors (boolean), then the same UTM/URL block as above (no duplicate referrer in the extension block; base table already has referrer).
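The playbook DDL follows this shape; the sketch below shows a subset only, and the column types other than has_errors are assumptions (the full list is in scripts/add_utm_landing_columns_raw_export.sql):

```sql
-- Extended attribution columns (subset; VARCHAR types are assumed here).
ALTER TABLE raw_export.main.raw_meetings ADD COLUMN IF NOT EXISTS utm_source VARCHAR;
ALTER TABLE raw_export.main.raw_meetings ADD COLUMN IF NOT EXISTS utm_campaign VARCHAR;
ALTER TABLE raw_export.main.raw_meetings ADD COLUMN IF NOT EXISTS gclid VARCHAR;

ALTER TABLE raw_export.main.raw_submissions ADD COLUMN IF NOT EXISTS has_errors BOOLEAN;
```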

Once these columns exist, loads should use SELECT * from the CSV so staging and raw_export stay aligned with the CSV column order.
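A staging load that preserves the CSV column order can be as simple as the following sketch (path and schema name are illustrative):

```sql
-- Create/replace the staging table straight from the CSV so its columns
-- match the file's order; read_csv_auto accepts a glob for multi-part files.
CREATE OR REPLACE TABLE my_db.stg_20251228_20260320.meetings AS
SELECT * FROM read_csv_auto('12-28-2025 to 03-20-2026 (latest)/Meetings *.csv');
```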

SQL scripts (source of truth)

Scripts live in the Default client / brainforge-bi repo (paths relative to that checkout):

  • scripts/add_utm_landing_columns_raw_export.sql: ALTER TABLE … ADD COLUMN IF NOT EXISTS for the extended meetings/submissions columns on raw_export.
  • scripts/load_my_db_stg_20251228_20260320.sql: Loads small dimension files from CSV into my_db.stg_* (excludes CSV-only columns not in the legacy shape).
  • scripts/merge_stg_into_raw_export.sql: Upserts teams, members, queues, events, forms, and queues_members into raw_export; raw_queues_members.deleted_at is cast to TIMESTAMP to match raw_export.
  • scripts/load_my_db_meetings_submissions.sql: Loads the full meetings file plus all submission parts into my_db staging, then upserts raw_meetings / raw_submissions (long-running).
  • scripts/merge_stg_meetings_submissions_into_raw_export.sql: Merges only meetings + submissions when staging is already populated.

When the date window or folder path changes, copy the SQL under a new staging schema name and update the read_csv_auto paths inside the scripts (or parameterize them in a wrapper).
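One lightweight way to parameterize the path, assuming a DuckDB version that supports SET VARIABLE / getvariable() (1.1+); the folder and schema names below are illustrative:

```sql
-- Set the export folder once at the top of a wrapper script, then reference
-- it everywhere instead of hard-coding paths (requires DuckDB 1.1+ variables).
SET VARIABLE export_dir = '12-28-2025 to 03-20-2026 (latest)';

CREATE OR REPLACE TABLE my_db.stg_20251228_20260320.teams AS
SELECT * FROM read_csv_auto(getvariable('export_dir') || '/_teams_*.csv');
```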

Runtime expectations

Large exports (tens of GB) are dominated by disk read + parse + upload to MotherDuck. Expect hours, not minutes; the DuckDB CLI may print little until each step finishes. See the playbook for monitoring and ballpark timing.

Open follow-ups

  • Backfill extended UTM/landing columns for rows loaded before the DDL migration (NULL until backfilled).
  • Optionally mirror extended columns or staging conventions in my_db reference schemas used for QA.
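For the backfill follow-up, DuckDB's UPDATE … FROM is one plausible shape; the staging schema and the join key (id) below are assumptions to confirm against the real tables:

```sql
-- Backfill one extended column for pre-migration rows from a re-exported
-- staging table; staging schema and join key (id) are assumptions.
UPDATE raw_export.main.raw_meetings
SET utm_source = s.utm_source
FROM my_db.stg_20251228_20260320.meetings AS s
WHERE raw_meetings.id = s.id
  AND raw_meetings.utm_source IS NULL;
```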