Playbook: MotherDuck CLI — load Default export into my_db and raw_export
Last updated: 2026-04-09
Prereqs:
- MotherDuck CLI setup — DuckDB CLI installed, MOTHERDUCK_TOKEN set, and `duckdb "md:"` verified (avoids repeated browser auth).
- Access to the MotherDuck org, export files on local disk, DuckDB CLI with MotherDuck support.
- SQL scripts from the Default brainforge-bi repo (see engineering pipeline note).
1. Know your paths
- Export folder: e.g. `…/12-28-2025 to 03-20-2026 (latest)/` — confirm meetings + all submission parts exist before starting long jobs.
- SQL scripts: Default brainforge-bi repo, `scripts/` directory (see engineering note: MotherDuck Default export pipeline).
2. Schema DDL (extended columns) — run once per environment
If raw_meetings / raw_submissions need UTM + landing columns:
`duckdb "md:" -f scripts/add_utm_landing_columns_raw_export.sql`

(Idempotent: uses ADD COLUMN IF NOT EXISTS.)
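A minimal sketch of the idempotent pattern such a script uses — the column names below (utm_source, landing_page) are assumptions for illustration; check the script for the real list:

```sql
-- Sketch only: column names are assumed, not taken from the actual script.
-- ADD COLUMN IF NOT EXISTS makes re-runs safe (no error if the column already exists).
ALTER TABLE raw_export.main.raw_meetings    ADD COLUMN IF NOT EXISTS utm_source   VARCHAR;
ALTER TABLE raw_export.main.raw_meetings    ADD COLUMN IF NOT EXISTS landing_page VARCHAR;
ALTER TABLE raw_export.main.raw_submissions ADD COLUMN IF NOT EXISTS utm_source   VARCHAR;
ALTER TABLE raw_export.main.raw_submissions ADD COLUMN IF NOT EXISTS landing_page VARCHAR;
```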
3. Load small tables → my_db staging
Edit load_my_db_stg_*.sql if the folder path or staging schema name changed, then:
`duckdb "md:" -f scripts/load_my_db_stg_20251228_20260320.sql`

4. Merge small tables → raw_export
`duckdb "md:" -f scripts/merge_stg_into_raw_export.sql`

Note: raw_queues_members merge uses CAST(deleted_at AS TIMESTAMP) because raw_export uses TIMESTAMP for that column while staging may be TIMESTAMPTZ.
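The shape of that cast is roughly the following — the column list here is a hypothetical subset, and the staging names assume step 3's schema pattern; the real merge script drives the full list:

```sql
-- Illustrative only: id / queue_id / member_id are assumed columns.
INSERT INTO raw_export.main.raw_queues_members (id, queue_id, member_id, deleted_at)
SELECT id, queue_id, member_id,
       CAST(deleted_at AS TIMESTAMP)  -- staging may be TIMESTAMPTZ; raw_export is TIMESTAMP
FROM my_db.stg_20251228_20260320.raw_queues_members;
```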
5. Load meetings + submissions (large — hours)
Single script (full CSV columns + upsert to raw_export):
`duckdb "md:" -f scripts/load_my_db_meetings_submissions.sql`

Options for long runs
- Run in tmux/screen or under nohup so disconnects do not kill the job.
- Redirect output to a log file and tail -f it; DuckDB may stay quiet until a step completes.
Ballpark duration: Tens of GB often means several hours, highly dependent on upload bandwidth and row width.
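The nohup-plus-log pattern above looks like this; `sleep 1 && echo …` stands in for the real duckdb command so the snippet runs anywhere:

```shell
# Launch the load so a dropped SSH session cannot kill it.
# Real job: nohup duckdb "md:" -f scripts/load_my_db_meetings_submissions.sql > load.log 2>&1 &
nohup sh -c 'sleep 1 && echo "load finished"' > load.log 2>&1 &
JOB_PID=$!

# While it runs: tail -f load.log (DuckDB may print nothing until a step completes).
wait "$JOB_PID"
cat load.log
```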
Merge only (staging already loaded):
`duckdb "md:" -f scripts/merge_stg_meetings_submissions_into_raw_export.sql`

6. Verify
USE my_db;
SELECT COUNT(*) FROM stg_20251228_20260320.meetings;
SELECT COUNT(*) FROM stg_20251228_20260320.submissions;
USE raw_export;
SELECT COUNT(*) FROM main.raw_meetings;
SELECT COUNT(*) FROM main.raw_submissions;

Compare counts to expectations from the export manifest or source system.
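If you prefer a single result set, the same checks can be written with fully qualified database.schema.table names instead of USE — note raw_* may legitimately exceed staged counts if it holds rows merged from earlier exports:

```sql
-- One-shot comparison of staged vs merged row counts.
SELECT 'meetings' AS tbl,
       (SELECT COUNT(*) FROM my_db.stg_20251228_20260320.meetings)    AS staged,
       (SELECT COUNT(*) FROM raw_export.main.raw_meetings)            AS in_raw
UNION ALL
SELECT 'submissions',
       (SELECT COUNT(*) FROM my_db.stg_20251228_20260320.submissions),
       (SELECT COUNT(*) FROM raw_export.main.raw_submissions);
```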
7. Troubleshooting
| Symptom | Likely cause | What to do |
|---|---|---|
| Browser SSO / auth every run | No token in shell | MotherDuck CLI setup §2 — set MOTHERDUCK_TOKEN. |
| EXCLUDE / column not found | CSV schema changed vs script | Stop using EXCLUDE for that file or refresh the exclude list; prefer full column DDL on raw_export + SELECT * once aligned. |
| Type mismatch on INSERT | Inference differs from raw_* types | Use explicit INSERT (cols…) SELECT … with CAST. |
| Job “stuck” with no log output | Normal for huge read_csv_auto | Check process CPU/disk; wait for step boundary. |
| Meetings/submissions “missing” | Export incomplete | Re-export; confirm multi-part submissions and meetings file in folder. |
8. After this playbook
- Backfill old rows for new UTM columns if needed (separate project).
- Update dbt docs / Omni model YAML if new columns should be exposed to analysts.