Magic Spoon Data Platform — Vendor Transition Prep

Date: February 17, 2026
From: Magic Spoon (Mary Burke) + Brainforge (incoming data team)
To: Orchard Analytics (outgoing data team)
Purpose: Prepare for the Tuesday knowledge-transfer session. This document summarizes what the incoming team has learned from reviewing the repos and working with the Magic Spoon team over the past several weeks. We’ve organized our open questions at the end so Tuesday’s call can focus on the gaps.


Part 1: What We Understand So Far

We’ve spent time in both the magicspoon-dbt and magicspoon-prefect2 repositories and have been working with the Magic Spoon team on active modeling work (SPINS pipeline, MMM mart). Below is our current understanding of the platform. Please let us know where anything is incorrect or incomplete.

Stack Overview

Layer               Tool                    Role
Data Warehouse      Amazon Redshift         Central store for all raw and transformed data
Transformations     dbt (Cloud)             SQL-based modeling across four layers
Orchestration       Prefect 2 (Cloud)       Scheduling and running ingestion flows
Compute (Prefect)   AWS ECS / Fargate       Runs Prefect agent and flow tasks in containers
Container Registry  Amazon ECR              Stores Docker images for Prefect flow execution
Object Storage      Amazon S3               Intermediate staging (e.g., SPINS data exports)
BI / Analytics      Omni Analytics          Dashboarding and exploration for the Magic Spoon team
Reverse ETL         Klaviyo syncs via dbt   Profile property syncs (RFM data, discount history)

Data Flow

Source APIs / Files
        |
        v
  Prefect Flows (extract -> transform -> load)
        |
        v
  Redshift (raw schemas per source, e.g. business_central.*, shopify.*)
        |
        v
  dbt Models
    base -> derived/facts -> output (marts)
        |
        v
  Omni Dashboards  +  Klaviyo Syncs  +  Haus Uploads

dbt Repository (magicspoon-dbt)

Project: magicspoon_dbt (profile: magicspoon-dbt)
Package dependencies: dbt-labs/dbt_utils v0.8.6

Model layers:

  • models/base/ (~270 models) — Raw table references with dtype casting and aliasing. No joins. Organized by source: shopify/ (US + Canada stores), business_central/, google_analytics/, klaviyo/, tiktok/, tatari/, stay_ai/, amazon/, dcl/, gorgias/, okendo/, impact/, recharge/, refersion/, parabola/, kroger_retail/, target_retail/, walmart_retail/, wholefoods_retail/, shopper_marketing/, spins/, and utility models.
  • models/derived/ (~10 models) — Brainforge/Orchard-created concepts: subscriptions, daily marketing spend, COGS, date spines, GA4 events.
  • models/facts/ (~82 models) — Joins and aggregations on top of base tables. Heavy Shopify fact layer (~55 models covering orders, customers, line items, attribution, subscriptions, RFM, bundling, discounts). Also Amazon, Business Central, DCL, Gorgias, Impact, Google Analytics, manual spend, Recharge, RJW.
  • models/output/ (~117 marts) — Final models exposed to Omni and downstream consumers. Key domains include:
    • Shopify: orders, customers, line items, line item units (v1 + v2), refunds, DCL-aged-orders report
    • Business Central: item ledger entries, value entries, sales orders/invoices, purchase orders, GL entries, transfer orders, production BOMs, reservation entries (~20 marts)
    • Retail: Kroger, Target, Walmart, Whole Foods (daily + weekly, standard + Haus format)
    • Marketing/Spend: daily marketing performance (standard + custom attribution), cohort payback, manual spend, agency budget performance
    • MMM: mmm_daily_mart, mmm_weekly_mart
    • Channels: Google Analytics (multiple GA4 + UA marts), TikTok, Tatari, Pinterest/Criteo (via spend), Impact
    • CX: Gorgias tickets/users/tags, Okendo reviews, Enquire responses
    • Subscriptions: Stay AI, Recharge
    • Syncs: Klaviyo profile property syncs (RFM, discount history, customer properties)
  • snapshots/ — One snapshot: shopify_customer_rfm_snapshot
  • tests/, macros/, data/ (seeds) — Standard dbt directories

Notes:

  • profiles.yml is not in the repo; we assume connection configuration is managed in dbt Cloud.
  • Schema tests are defined across schema.yml files in each model layer.

Prefect Repository (magicspoon-prefect2)

Prefect version: 2.20.4
Python version: 3.11
Agent infrastructure: ECS on Fargate (managed via Terraform in terraform/)

Flows (~114 flow files across 22 source folders):

Source                         Flow count  Examples
Amazon (Seller Central + Ads)  ~12         orders, fulfilled shipments, campaign performance (SP/SD/SB)
Applovin                       1           ad performance hourly
BigQuery                       1           GA4 events export
Business Central               ~18         item ledger entries (3 variants), sales orders/invoices, GL entries, purchase orders, transfer orders, dimension sets, production BOMs
Criteo                         1           campaign performance daily
DCL                            3           orders, items, items grouped by inventory type
Enquire                        2           questions, responses
Google Analytics               5           daily performance (overall, by source/medium, landing page, device, page title)
Gorgias                        3           tickets, users, custom fields
Haus                           ~9          daily + weekly uploads for Kroger, Target, Walmart, Whole Foods
Impact                         2           partners, partner performance daily
Klaviyo                        7           campaigns, flows, reports (campaign + flow performance), profile property syncs (3)
Okendo                         1           reviews
Parabola                       4           retailer daily sales (Kroger, Target, Walmart, Whole Foods)
Perpetua DSP                   1           performance daily
Pinterest                      3           campaigns, ad groups, ad group performance daily
RJW                            2           orders, billing charges
SPINS                          1           market insights (GraphQL API)
Stay AI                        3           subscriptions, orders, events
Tatari                         4           linear + streaming spend/impressions, linear + streaming metrics
TikTok                         ~11         campaigns, ads, ad groups, performance hourly, GMV Max, shop campaigns, tertiary accounts
XPO                            2           order status events, customer invoices

Config-driven architecture: Each flow is defined by a YAML config in flows/configs/ that specifies:

  • Deployment: flow module, entrypoint, cron schedule, tags, infrastructure size (default / large / extra_large)
  • Tasks: extract, transform, load — each with module, function, and task_args (columns, endpoints, incremental settings, primary keys, load type)
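To make the shape of these configs concrete, here is a hypothetical sketch of one entry, using the Business Central item ledger example described below. Field names and values are illustrative only, assembled from the deployment/task structure listed above, not copied from the repo:

```yaml
# Illustrative flow config sketch (not from flows/configs/)
deployment:
  flow_module: flows.business_central.item_ledger_entries
  entrypoint: item_ledger_entries_flow
  schedule: "0 * * * *"          # hourly cron
  tags: [business_central]
  infrastructure: large          # default / large / extra_large

tasks:
  extract:
    module: flows.business_central.extract
    function: api_extract
    task_args:
      endpoint: itemLedgerEntries
      incremental_field: lastModifiedDateTime
  transform:
    module: common.transform
    function: select_and_cast
    task_args:
      columns: [entry_no, item_no, posting_date, quantity]
  load:
    module: lib.tasks.load
    function: redshift_upsert
    task_args:
      load_type: incremental
      primary_key: entry_no
```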

Example pattern (Business Central item ledger entries):

  • Extract: API call with incremental filter on lastModifiedDateTime
  • Transform: column selection and dtype handling via common.transform
  • Load: incremental upsert to Redshift with primary key deduplication
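The load step's upsert semantics can be sketched in plain Python. The real implementation presumably runs as SQL against Redshift (delete staged keys from the target, then insert the staged rows); the function name and row representation here are illustrative:

```python
def incremental_upsert(existing, incoming, primary_key):
    """Merge incoming rows into existing rows, deduplicating on primary_key.

    Mirrors the usual Redshift staging pattern: rows whose key already
    exists are replaced by the incoming version; new keys are appended.
    """
    merged = {row[primary_key]: row for row in existing}
    for row in incoming:
        merged[row[primary_key]] = row  # incoming wins on key collision
    return list(merged.values())
```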

Infrastructure blocks (from admin/blocks/block_config.yaml):

  • Storage: GitHub (pulls flow code from main branch)
  • Infrastructure: Three ECS task sizes:
    • Default: 512 MB / 0.25 vCPU
    • Large: 2 GB / 1 vCPU
    • Extra Large: 10 GB / 2 vCPU
  • Notifications: Slack webhook block for failure alerts

Automations (from admin/automations/automations_config.yaml):

  • Flow run failure notifications (Slack + email on Crashed / Failed / Late / TimedOut)
  • Auto-cancel for flows running longer than 2 hours
  • Unhealthy work queue alerts (Slack + email after 30 minutes unhealthy)

Credentials / Secrets:

  • Production: stored as Prefect Secret blocks
  • Local development: .env file
  • Environment detection: IS_PRODUCTION env var set in Dockerfile
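A minimal sketch of how this resolution might look, assuming the Prefect 2 `Secret` block API (`Secret.load(...).get()`); the function name and block-naming convention are our assumptions, not code from the repo:

```python
import os


def get_credential(name: str) -> str:
    """Resolve a credential by name.

    In production (IS_PRODUCTION set in the Dockerfile) this loads a
    Prefect Secret block; locally it falls back to environment variables
    populated from .env. Block naming here is illustrative.
    """
    if os.environ.get("IS_PRODUCTION"):
        # Lazy import so local tooling works without Prefect installed.
        from prefect.blocks.system import Secret
        return Secret.load(name.lower().replace("_", "-")).get()
    return os.environ[name]
```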

CI/CD (GitHub Actions):

  • On push to main: Run tests, detect changed flow files, deploy only changed flows to Prefect Cloud. If lib/, admin/, requirements.txt, or Dockerfile changed, rebuild and push ECR image + redeploy all flows.
  • Manual dispatch: Full ECR rebuild + deploy-all-flows workflow.
  • Slack notification: Posts workflow status to Slack on completion.
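The branching logic of the deploy pipeline can be sketched as a small decision function (a sketch of the rules described above; the function name and return shape are illustrative, not taken from the workflow files):

```python
def plan_deploy(changed_files):
    """Decide what CI should do for a set of changed file paths.

    Returns ("rebuild_all", []) when shared code or the image definition
    changed (lib/, admin/, requirements.txt, Dockerfile); otherwise
    ("deploy_changed", [changed flow files]).
    """
    if any(
        f.startswith(("lib/", "admin/")) or f in ("requirements.txt", "Dockerfile")
        for f in changed_files
    ):
        return "rebuild_all", []
    flows = [f for f in changed_files if f.startswith("flows/") and f.endswith(".py")]
    return "deploy_changed", flows
```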

Alerting on task failure:

  • TaskSlackAlert class in lib/alert/slack.py sends Slack messages with flow name, task name, state, traceback, and link to Prefect Cloud flow run. Only fires in production.
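A sketch of the payload this kind of alert would build, based on the fields listed above; the message layout and function name are illustrative, not copied from lib/alert/slack.py:

```python
def build_task_alert(flow_name, task_name, state, traceback, run_url,
                     is_production=True):
    """Build a Slack webhook payload for a failed task.

    Returns None outside production so local runs never page anyone,
    matching the production-only behaviour described above.
    """
    if not is_production:
        return None
    text = (
        f":rotating_light: *{flow_name}* / `{task_name}` entered *{state}*\n"
        f"```{traceback}```\n"
        f"<{run_url}|View flow run in Prefect Cloud>"
    )
    return {"text": text}
```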

Part 2: Questions for Orchard

We’ve grouped these by area so you can prepare answers in advance. We don’t need exhaustive written answers before Tuesday — bullet points or “let’s screenshare that part” is fine. The goal is to make the hour productive.

dbt Cloud and Transformations

  1. Job configuration: How many dbt Cloud jobs are currently active? What are their schedules and run order? Are there dependencies between jobs (e.g., does one job need to finish before another starts)?
  2. Failure patterns: Which models or tests tend to fail most often? Are there any known flaky tests that can be safely retriggered?
  3. Recent failures (Feb 3-5): We noticed a period of continuous dbt failures over approximately 36 hours in early February. Can you walk us through what happened and how it was resolved? Is this a known recurring pattern?
  4. Shopify modeling: The Shopify layer is the largest part of the dbt project (~55 fact models + multiple output marts). Can you walk us through the bundling logic and how revenue is attributed to individual line items? Where does the core logic live?
  5. Business Central models: Are there specific Business Central models that require special attention or have known edge cases (e.g., custom web service endpoints vs. standard API endpoints)?
  6. Schema ownership: Are there any Redshift schemas or tables that are written to outside of dbt (e.g., direct loads from other tools, manual uploads)?
  7. dbt Cloud environment: What environment variables or profile-level settings are configured in dbt Cloud that aren’t in the repo? Who are the current dbt Cloud admins?
  8. Testing strategy: Is there a particular approach to how tests are organized? Are there areas of the project that are well-tested vs. areas with gaps?

Prefect Orchestration

  1. Common failure modes: Which flows fail most frequently? What are the typical root causes and how have you been resolving them?
  2. Business Central API: We’ve heard about timeout and connection issues with the Business Central endpoints. How have you been handling these? Are there workarounds currently in place (e.g., retry logic, reduced batch sizes)?
  3. Deployment specifics: Is there anything non-obvious about the current ECS agent setup, work queues, or infrastructure blocks that we should be aware of?
  4. Secret management: Are all Prefect Secret blocks current? Are any API tokens or credentials approaching expiration or needing rotation?
  5. Local development: What’s the typical workflow for developing and testing a new flow locally before deploying to production?
  6. Work pool / queue configuration: How are work queues configured in Prefect Cloud? Is there a single default queue or multiple queues with different concurrency settings?

Source-Specific

  1. SPINS: What are the specific rate limits for the GraphQL API? How have backfills been handled historically? Any known issues with the mutation request timing or data delivery schedule?
  2. Parabola / Retail data: The pipeline currently pulls retailer data (Kroger, Target, Walmart, Whole Foods) via Parabola. How reliable has this been? Are there manual steps involved (e.g., triggering Parabola scenarios, downloading files)?
  3. Klaviyo syncs: The reverse-ETL syncs (RFM, discount history, customer properties) push data back to Klaviyo. How are these triggered? Are there any known issues with Klaviyo API rate limits or data freshness expectations?
  4. Haus uploads: The Haus flows upload retail data for multiple retailers. What’s the expected format and delivery cadence? Any known issues?
  5. Other gotchas: Are there any other source-specific quirks, tribal knowledge, or “watch out for this” items that aren’t documented?

Operational

  1. Monitoring beyond Prefect: Beyond the Prefect automations, is there any other monitoring in place? For example, data freshness checks, row-count alerts, Metaplane, or manual spot-checks?
  2. Runbooks: Are there any internal runbooks or incident-response documents for common failure scenarios?
  3. Access and accounts: Can you confirm all the accounts and access we’ll need to take over? (Prefect Cloud, dbt Cloud, AWS, GitHub org, Redshift, any third-party API portals)
  4. Follow-up contacts: Who should we reach out to if we have follow-up questions after the formal transition period ends?

Part 3: Proposed Agenda for Tuesday

We want to make the most of our time together. Here’s a suggested agenda — happy to adjust based on your priorities.

Time    Topic                                           Goal
5 min   Introductions and confirm transition timeline   Align on how many sessions we have and the handoff deadline
15 min  dbt Cloud walkthrough                           Job schedules, known failure patterns, Shopify model walkthrough
15 min  Prefect walkthrough                             Deployment setup, common failures, Business Central handling
10 min  Source-specific gotchas                         SPINS, Parabola/retail, Klaviyo syncs, anything else
10 min  Operational handoff                             Access/credentials inventory, monitoring, runbooks
5 min   Follow-up plan                                  Schedule additional sessions if needed

Appendix: Repository Quick Reference

magicspoon-dbt

magicspoon-dbt/
  dbt_project.yml           # Project config (name: magicspoon_dbt, profile: magicspoon-dbt)
  packages.yml              # dbt_utils v0.8.6
  models/
    base/                   # ~270 models -- raw table refs, dtype casting, no joins
      shopify/              # US + Canada stores (orders, line items, customers, products, etc.)
      business_central/     # ~30 base models (ledger entries, sales, purchases, GL, etc.)
      google_analytics/     # GA4 + UA base tables
      klaviyo/              # Campaigns, flows, reporting API
      tiktok/               # Campaigns, ads, ad groups, performance
      amazon/               # Seller Central + Advertising
      [+ 15 more sources]
    derived/                # ~10 models -- subscriptions, spend, COGS, date spines
    facts/                  # ~82 models -- joins/aggregations (heavy Shopify, Amazon, BC)
    output/                 # ~117 marts -- final models for Omni, syncs, exports
      retail/               # Kroger, Target, Walmart, Whole Foods (daily/weekly, Haus format)
      mmm/                  # mmm_daily_mart, mmm_weekly_mart
      syncs/klaviyo/        # Reverse ETL to Klaviyo
  snapshots/                # shopify_customer_rfm_snapshot
  macros/
  data/                     # Seeds
  tests/

magicspoon-prefect2

magicspoon-prefect2/
  flows/
    [22 source folders]/    # Python flow files (extract/transform/load pattern)
    configs/                # YAML configs defining schedules, tasks, and task_args
      business_central.yaml # ~18 flows, mostly hourly incremental
      tiktok.yaml           # ~11 flows
      klaviyo.yaml          # 7 flows including syncs
      google_analytics.yaml # 5 flows
      spins.yml             # 1 flow, daily at 2am UTC
      [+ 17 more configs]
  lib/
    config/                 # FlowConfig, TaskConfig, DeploymentConfig dataclasses + validation
    tasks/                  # Core task runner + artifact creation
    utils/                  # Credentials helper, config utilities
    alert/                  # TaskSlackAlert (Slack webhook on failure)
    connect/                # Prefect Cloud connection helper
    logging/                # Structured logging
  admin/
    blocks/                 # Infrastructure + storage + notification block definitions
    automations/            # Failure notifications, long-run cancellation, unhealthy queue alerts
    secrets/                # Secret upload utilities
    deploy/                 # Flow deployer CLI (used by GitHub Actions)
  terraform/                # ECS cluster, agent, ECR, IAM roles
  .github/workflows/
    deploy.yml              # CI/CD on push to main
    deploy_manaul.yml       # Manual full deploy
    pull_request.yml        # PR checks