Magic Spoon Data Platform — Vendor Transition Prep
Date: February 17, 2026
From: Magic Spoon (Mary Burke) + Brainforge (incoming data team)
To: Orchard Analytics (outgoing data team)
Purpose: Prepare for the Tuesday knowledge-transfer session. This document summarizes what the incoming team has learned from reviewing the repos and working with the Magic Spoon team over the past several weeks. We’ve organized our open questions at the end so Tuesday’s call can focus on the gaps.
Part 1: What We Understand So Far
We’ve spent time in both the magicspoon-dbt and magicspoon-prefect2 repositories and have been working with the Magic Spoon team on active modeling work (SPINS pipeline, MMM mart). Below is our current understanding of the platform. Please let us know where anything is incorrect or incomplete.
Stack Overview
| Layer | Tool | Role |
|---|---|---|
| Data Warehouse | Amazon Redshift | Central store for all raw and transformed data |
| Transformations | dbt (Cloud) | SQL-based modeling across four layers |
| Orchestration | Prefect 2 (Cloud) | Scheduling and running ingestion flows |
| Compute (Prefect) | AWS ECS / Fargate | Runs Prefect agent and flow tasks in containers |
| Container Registry | Amazon ECR | Stores Docker images for Prefect flow execution |
| Object Storage | Amazon S3 | Intermediate staging (e.g., SPINS data exports) |
| BI / Analytics | Omni Analytics | Dashboarding and exploration for the Magic Spoon team |
| Reverse ETL | Klaviyo syncs via dbt | Profile property syncs (RFM data, discount history) |
Data Flow
Source APIs / Files
|
v
Prefect Flows (extract -> transform -> load)
|
v
Redshift (raw schemas per source, e.g. business_central.*, shopify.*)
|
v
dbt Models
base -> derived/facts -> output (marts)
|
v
Omni Dashboards + Klaviyo Syncs + Haus Uploads
dbt Repository (magicspoon-dbt)
Project: magicspoon_dbt (profile: magicspoon-dbt)
Package dependencies: dbt-labs/dbt_utils v0.8.6
Model layers:
- models/base/ (~270 models) — Raw table references with dtype casting and aliasing. No joins. Organized by source: shopify/ (US + Canada stores), business_central/, google_analytics/, klaviyo/, tiktok/, tatari/, stay_ai/, amazon/, dcl/, gorgias/, okendo/, impact/, recharge/, refersion/, parabola/, kroger_retail/, target_retail/, walmart_retail/, wholefoods_retail/, shopper_marketing/, spins/, and utility models.
- models/derived/ (~10 models) — Brainforge/Orchard-created concepts: subscriptions, daily marketing spend, COGS, date spines, GA4 events.
- models/facts/ (~82 models) — Joins and aggregations on top of base tables. Heavy Shopify fact layer (~55 models covering orders, customers, line items, attribution, subscriptions, RFM, bundling, discounts). Also Amazon, Business Central, DCL, Gorgias, Impact, Google Analytics, manual spend, Recharge, RJW.
- models/output/ (~117 marts) — Final models exposed to Omni and downstream consumers. Key domains include:
- Shopify: orders, customers, line items, line item units (v1 + v2), refunds, DCL-aged-orders report
- Business Central: item ledger entries, value entries, sales orders/invoices, purchase orders, GL entries, transfer orders, production BOMs, reservation entries (~20 marts)
- Retail: Kroger, Target, Walmart, Whole Foods (daily + weekly, standard + Haus format)
- Marketing/Spend: daily marketing performance (standard + custom attribution), cohort payback, manual spend, agency budget performance
- MMM: mmm_daily_mart, mmm_weekly_mart
- Channels: Google Analytics (multiple GA4 + UA marts), TikTok, Tatari, Pinterest/Criteo (via spend), Impact
- CX: Gorgias tickets/users/tags, Okendo reviews, Enquire responses
- Subscriptions: Stay AI, Recharge
- Syncs: Klaviyo profile property syncs (RFM, discount history, customer properties)
- snapshots/ — One snapshot: shopify_customer_rfm_snapshot
- tests/, macros/, data/ (seeds) — Standard dbt directories
Notes:
- profiles.yml is not in the repo; we assume connection configuration is managed in dbt Cloud.
- Schema tests are defined across schema.yml files in each model layer.
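For reference on Tuesday, this is the kind of schema.yml test layout we expect to find; the model and column names here are illustrative, not copied from the repo:

```yaml
# Hypothetical schema.yml sketch -- actual model/column names will differ
version: 2

models:
  - name: base_shopify_orders
    description: Raw Shopify orders with dtype casting, no joins
    columns:
      - name: order_id
        tests:
          - unique
          - not_null
      - name: store
        tests:
          - accepted_values:
              values: ["us", "ca"]
```

If the actual project deviates from this standard dbt pattern (e.g., custom generic tests in macros/), please flag that during the walkthrough.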
Prefect Repository (magicspoon-prefect2)
Prefect version: 2.20.4
Python version: 3.11
Agent infrastructure: ECS on Fargate (managed via Terraform in terraform/)
Flows (~114 flow files across 22 source folders):
| Source | Flow count | Examples |
|---|---|---|
| Amazon (Seller Central + Ads) | ~12 | orders, fulfilled shipments, campaign performance (SP/SD/SB) |
| Applovin | 1 | ad performance hourly |
| BigQuery | 1 | GA4 events export |
| Business Central | ~18 | item ledger entries (3 variants), sales orders/invoices, GL entries, purchase orders, transfer orders, dimension sets, production BOMs |
| Criteo | 1 | campaign performance daily |
| DCL | 3 | orders, items, items grouped by inventory type |
| Enquire | 2 | questions, responses |
| Google Analytics | 5 | daily performance (overall, by source/medium, landing page, device, page title) |
| Gorgias | 3 | tickets, users, custom fields |
| Haus | ~9 | daily + weekly uploads for Kroger, Target, Walmart, Whole Foods |
| Impact | 2 | partners, partner performance daily |
| Klaviyo | 7 | campaigns, flows, reports (campaign + flow performance), profile property syncs (3) |
| Okendo | 1 | reviews |
| Parabola | 4 | retailer daily sales (Kroger, Target, Walmart, Whole Foods) |
| Perpetua DSP | 1 | performance daily |
| Pinterest | 3 | campaigns, ad groups, ad group performance daily |
| RJW | 2 | orders, billing charges |
| SPINS | 1 | market insights (GraphQL API) |
| Stay AI | 3 | subscriptions, orders, events |
| Tatari | 4 | linear + streaming spend/impressions, linear + streaming metrics |
| TikTok | ~11 | campaigns, ads, ad groups, performance hourly, GMV Max, shop campaigns, tertiary accounts |
| XPO | 2 | order status events, customer invoices |
Config-driven architecture:
Each flow is defined by a YAML config in flows/configs/ that specifies:
- Deployment: flow module, entrypoint, cron schedule, tags, infrastructure size (default / large / extra_large)
- Tasks: extract, transform, load — each with module, function, and task_args (columns, endpoints, incremental settings, primary keys, load type)
Example pattern (Business Central item ledger entries):
- Extract: API call with incremental filter on lastModifiedDateTime
- Transform: column selection and dtype handling via common.transform
- Load: incremental upsert to Redshift with primary key deduplication
Infrastructure blocks (from admin/blocks/block_config.yaml):
- Storage: GitHub (pulls flow code from main branch)
- Infrastructure: Three ECS task sizes:
- Default: 512 MB / 0.25 vCPU
- Large: 2 GB / 1 vCPU
- Extra Large: 10 GB / 2 vCPU
- Notifications: Slack webhook block for failure alerts
Automations (from admin/automations/automations_config.yaml):
- Flow run failure notifications (Slack + email on Crashed / Failed / Late / TimedOut)
- Auto-cancel for flows running longer than 2 hours
- Unhealthy work queue alerts (Slack + email after 30 minutes unhealthy)
Credentials / Secrets:
- Production: stored as Prefect Secret blocks
- Local development: .env file
- Environment detection: IS_PRODUCTION env var set in Dockerfile
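The credential-resolution pattern we infer from this setup is sketched below. To be clear, `is_production` and `get_credential` are our hypothetical names, not the actual helpers in lib/utils/; only the IS_PRODUCTION flag and the Secret-block-vs-.env split come from the repo:

```python
import os


def is_production() -> bool:
    # IS_PRODUCTION is set in the Dockerfile for deployed flows (per the repo);
    # locally it is absent, so flows fall back to .env values.
    return os.getenv("IS_PRODUCTION", "").lower() in {"1", "true"}


def get_credential(name: str) -> str:
    """Resolve a credential by name (hypothetical helper).

    In production this would read a Prefect 2 Secret block; locally it reads
    the environment, which python-dotenv would have populated from .env.
    """
    if is_production():
        # Deferred import so local runs don't need Prefect Cloud auth.
        from prefect.blocks.system import Secret  # Prefect 2 Secret block API
        return Secret.load(name).get()
    value = os.getenv(name.upper().replace("-", "_"))
    if value is None:
        raise KeyError(f"Credential {name!r} not found in local environment")
    return value
```

We'd like to verify on Tuesday whether the real helper behaves this way, and whether any flows bypass it.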
CI/CD (GitHub Actions):
- On push to main: Run tests, detect changed flow files, deploy only changed flows to Prefect Cloud. If lib/, admin/, requirements.txt, or Dockerfile changed, rebuild and push ECR image + redeploy all flows.
- Manual dispatch: Full ECR rebuild + deploy-all-flows workflow.
- Slack notification: Posts workflow status to Slack on completion.
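Our reading of the changed-file routing is sketched below; the path rules come from the description above, but the function itself is hypothetical (the real logic lives in deploy.yml and the deploy/ CLI, and its exact globs may differ):

```python
from pathlib import PurePosixPath

# Paths that trigger a full image rebuild + redeploy of all flows,
# per the CI/CD description above (assumed, not copied from deploy.yml).
FULL_REBUILD_PREFIXES = ("lib/", "admin/")
FULL_REBUILD_FILES = ("requirements.txt", "Dockerfile")


def plan_deploy(changed_files: list[str]) -> tuple[str, list[str]]:
    """Decide between a full rebuild and a selective flow deploy.

    Returns ("full", []) when shared code/infra changed, otherwise
    ("selective", [changed flow files to redeploy]).
    """
    if any(
        f.startswith(FULL_REBUILD_PREFIXES) or f in FULL_REBUILD_FILES
        for f in changed_files
    ):
        return ("full", [])
    # Only Python flow files are considered here; how changes to
    # flows/configs/*.yaml are handled is an open question for Tuesday.
    flows = [
        f for f in changed_files
        if f.startswith("flows/") and PurePosixPath(f).suffix == ".py"
    ]
    return ("selective", flows)
```

One thing we'd like confirmed: whether editing a YAML config alone (no .py change) triggers a redeploy of that flow.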
Alerting on task failure:
The TaskSlackAlert class in lib/alert/slack.py sends Slack messages with the flow name, task name, state, traceback, and a link to the Prefect Cloud flow run. It only fires in production.
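For orientation, a minimal sketch of what we assume the alert does is below. This is not the actual TaskSlackAlert implementation; only the message fields and the production-only gate come from the repo, and the formatting/function names are ours:

```python
import json
import os
import urllib.request


def build_alert_payload(flow: str, task: str, state: str,
                        traceback_text: str, run_url: str) -> dict:
    # Mirrors the fields TaskSlackAlert reportedly includes; the real
    # formatting in lib/alert/slack.py will differ.
    text = (
        f":rotating_light: Task failure in *{flow}*\n"
        f"Task: `{task}`  State: `{state}`\n"
        f"<{run_url}|Open flow run in Prefect Cloud>\n"
        f"```{traceback_text[:2800]}```"  # keep under Slack message limits
    )
    return {"text": text}


def send_alert(webhook_url: str, payload: dict) -> None:
    # Production-only gate, matching the described behavior.
    if os.getenv("IS_PRODUCTION", "").lower() not in {"1", "true"}:
        return
    req = urllib.request.Request(
        webhook_url,
        data=json.dumps(payload).encode(),
        headers={"Content-Type": "application/json"},
    )
    urllib.request.urlopen(req)
```

On Tuesday we'd like to see where in the task runner this hook is invoked, and whether the webhook URL lives in a Prefect notification block or a Secret block.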
Part 2: Questions for Orchard
We’ve grouped these by area so you can prepare answers in advance. We don’t need exhaustive written answers before Tuesday — bullet points or “let’s screenshare that part” is fine. The goal is to make the hour productive.
dbt Cloud and Transformations
- Job configuration: How many dbt Cloud jobs are currently active? What are their schedules and run order? Are there dependencies between jobs (e.g., does one job need to finish before another starts)?
- Failure patterns: Which models or tests tend to fail most often? Are there any known flaky tests that can be safely retriggered?
- Recent failures (Feb 3-5): We noticed a period of continuous dbt failures over approximately 36 hours in early February. Can you walk us through what happened and how it was resolved? Is this a known recurring pattern?
- Shopify modeling: The Shopify layer is the largest part of the dbt project (~55 fact models + multiple output marts). Can you walk us through the bundling logic and how revenue is attributed to individual line items? Where does the core logic live?
- Business Central models: Are there specific Business Central models that require special attention or have known edge cases (e.g., custom web service endpoints vs. standard API endpoints)?
- Schema ownership: Are there any Redshift schemas or tables that are written to outside of dbt (e.g., direct loads from other tools, manual uploads)?
- dbt Cloud environment: What environment variables or profile-level settings are configured in dbt Cloud that aren’t in the repo? Who are the current dbt Cloud admins?
- Testing strategy: Is there a particular approach to how tests are organized? Are there areas of the project that are well-tested vs. areas with gaps?
Prefect Orchestration
- Common failure modes: Which flows fail most frequently? What are the typical root causes and how have you been resolving them?
- Business Central API: We’ve heard about timeout and connection issues with the Business Central endpoints. How have you been handling these? Are there workarounds currently in place (e.g., retry logic, reduced batch sizes)?
- Deployment specifics: Is there anything non-obvious about the current ECS agent setup, work queues, or infrastructure blocks that we should be aware of?
- Secret management: Are all Prefect Secret blocks current? Are any API tokens or credentials approaching expiration or needing rotation?
- Local development: What’s the typical workflow for developing and testing a new flow locally before deploying to production?
- Work pool / queue configuration: How are work queues configured in Prefect Cloud? Is there a single default queue or multiple queues with different concurrency settings?
Source-Specific
- SPINS: What are the specific rate limits for the GraphQL API? How have backfills been handled historically? Any known issues with the mutation request timing or data delivery schedule?
- Parabola / Retail data: The pipeline currently pulls retailer data (Kroger, Target, Walmart, Whole Foods) via Parabola. How reliable has this been? Are there manual steps involved (e.g., triggering Parabola scenarios, downloading files)?
- Klaviyo syncs: The reverse-ETL syncs (RFM, discount history, customer properties) push data back to Klaviyo. How are these triggered? Are there any known issues with Klaviyo API rate limits or data freshness expectations?
- Haus uploads: The Haus flows upload retail data for multiple retailers. What’s the expected format and delivery cadence? Any known issues?
- Other gotchas: Are there any other source-specific quirks, tribal knowledge, or “watch out for this” items that aren’t documented?
Operational
- Monitoring beyond Prefect: Beyond the Prefect automations, is there any other monitoring in place? For example, data freshness checks, row-count alerts, Metaplane, or manual spot-checks?
- Runbooks: Are there any internal runbooks or incident-response documents for common failure scenarios?
- Access and accounts: Can you confirm all the accounts and access we’ll need to take over? (Prefect Cloud, dbt Cloud, AWS, GitHub org, Redshift, any third-party API portals)
- Follow-up contacts: Who should we reach out to if we have follow-up questions after the formal transition period ends?
Part 3: Proposed Agenda for Tuesday
We want to make the most of our time together. Here’s a suggested agenda — happy to adjust based on your priorities.
| Time | Topic | Goal |
|---|---|---|
| 5 min | Introductions and confirm transition timeline | Align on how many sessions we have and the handoff deadline |
| 15 min | dbt Cloud walkthrough | Job schedules, known failure patterns, Shopify model walkthrough |
| 15 min | Prefect walkthrough | Deployment setup, common failures, Business Central handling |
| 10 min | Source-specific gotchas | SPINS, Parabola/retail, Klaviyo syncs, anything else |
| 10 min | Operational handoff | Access/credentials inventory, monitoring, runbooks |
| 5 min | Follow-up plan | Schedule additional sessions if needed |
Appendix: Repository Quick Reference
magicspoon-dbt
magicspoon-dbt/
dbt_project.yml # Project config (name: magicspoon_dbt, profile: magicspoon-dbt)
packages.yml # dbt_utils v0.8.6
models/
base/ # ~270 models -- raw table refs, dtype casting, no joins
shopify/ # US + Canada stores (orders, line items, customers, products, etc.)
business_central/ # ~30 base models (ledger entries, sales, purchases, GL, etc.)
google_analytics/ # GA4 + UA base tables
klaviyo/ # Campaigns, flows, reporting API
tiktok/ # Campaigns, ads, ad groups, performance
amazon/ # Seller Central + Advertising
[+ 15 more sources]
derived/ # ~10 models -- subscriptions, spend, COGS, date spines
facts/ # ~82 models -- joins/aggregations (heavy Shopify, Amazon, BC)
output/ # ~117 marts -- final models for Omni, syncs, exports
retail/ # Kroger, Target, Walmart, Whole Foods (daily/weekly, Haus format)
mmm/ # mmm_daily_mart, mmm_weekly_mart
syncs/klaviyo/ # Reverse ETL to Klaviyo
snapshots/ # shopify_customer_rfm_snapshot
macros/
data/ # Seeds
tests/
magicspoon-prefect2
magicspoon-prefect2/
flows/
[22 source folders]/ # Python flow files (extract/transform/load pattern)
configs/ # YAML configs defining schedules, tasks, and task_args
business_central.yaml # ~18 flows, mostly hourly incremental
tiktok.yaml # ~11 flows
klaviyo.yaml # 7 flows including syncs
google_analytics.yaml # 5 flows
spins.yml # 1 flow, daily at 2am UTC
[+ 17 more configs]
lib/
config/ # FlowConfig, TaskConfig, DeploymentConfig dataclasses + validation
tasks/ # Core task runner + artifact creation
utils/ # Credentials helper, config utilities
alert/ # TaskSlackAlert (Slack webhook on failure)
connect/ # Prefect Cloud connection helper
logging/ # Structured logging
admin/
blocks/ # Infrastructure + storage + notification block definitions
automations/ # Failure notifications, long-run cancellation, unhealthy queue alerts
secrets/ # Secret upload utilities
deploy/ # Flow deployer CLI (used by GitHub Actions)
terraform/ # ECS cluster, agent, ECR, IAM roles
.github/workflows/
deploy.yml # CI/CD on push to main
deploy_manaul.yml # Manual full deploy
pull_request.yml # PR checks