LMNT ETL Platform Assessment & Recommendation

Prepared for: Shivani Amar, Jason Wu, Phil McKeating
Prepared by: Uttam Kumaran, Awaish Kumar (Brainforge)
Date: December 9, 2025
Engagement Path: Data Foundations Sprint (DFS)
Decision needed by: December 16, 2025 (to meet January ingestion timeline)


Executive Summary

The Problem

Today, LMNT’s commercial data is trapped in silos across 8+ disconnected systems.

  • Carlos manually exports Shopify orders from SourceMedium into spreadsheets.
  • Amazon settlement reports arrive days late and require manual reconciliation.
  • Laura’s wholesale data lives in email attachments and PDFs.
  • Finance spends X+ days each month reconciling revenue across channels.

This creates three critical problems:

  1. Manual burden: 15+ hours/week spent copying data between systems
  2. Delayed insights: The October 2025 Shopify conversion rate drop (34.7% decline) took weeks to diagnose because data wasn’t readily available
  3. No cross-channel view: Can’t answer questions like “Is Target cannibalizing DTC sales?” because the data doesn’t connect

What ETL Solves

ETL (Extract, Transform, Load) is the automated data plumbing that moves information from source systems (Shopify, Amazon, Ware2Go, etc.) into Snowflake, where it can be cleaned, combined, and analyzed.

Think of it like plumbing in a house. ETL is the foundation that everything else (dbt transformations, BI dashboards, AI analysis) is built upon.

Why This Matters Now

  • NetSuite migration (2026): Need clean data lineage and definitions locked before ERP migration
  • Seasonal urgency: Q4 revenue decline needs rapid diagnosis; can’t wait weeks for manual data pulls
  • Team capacity: Carlos, Laura, and team should focus on strategy, not data entry
  • Omnichannel expansion: Retail growth (Target, Costco) requires unified revenue view

The Recommendation

Implement a hybrid ETL architecture using:

  • Fivetran for mission-critical, high-reliability sources (Amazon, Walmart)
  • Polytomic ETL for high-volume sources to take advantage of cost savings (Shopify, Recharge, Ware2Go)
  • Polytomic Reverse ETL for marketing activation (Klaviyo)
  • Snowpipe for retail analytics (SPINS) if real-time needs arise
  • Snowflake Private Share for Walmart 3P (Emerson) — no ETL needed

Investment

ComponentAnnual Cost
Fivetran (Amazon + Walmart eComm)12,060
Polytomic ETL (Shopify + Recharge + Ware2Go)2,278
Polytomic Reverse ETL (Klaviyo)TBD – Contact Polytomic
Snowpipe (SPINS, if needed)1,800
Total Annual Investment (ETL only)16,138

What We’ve Learned About CPG Data Challenges

  1. Retail data lags: POS data from Target/Walmart arrives 2–3 weeks late and requires special handling
  2. Promotional spikes: Black Friday and product launches can 10x normal order volumes; pipelines must handle surges
  3. Revenue definition chaos: “Sales” means different things to e-commerce, finance, and retail teams. Must lock definitions early
  4. Settlement complexity: Amazon settlement reports don’t match order dates; requires reconciliation logic
  5. Wholesale fragmentation: B2B partners send data in wildly inconsistent formats

Our ETL Philosophy

Don’t Over-Engineer

  • Use managed services (Fivetran, Polytomic) for 90% of ingestion
  • Custom code only when absolutely necessary
  • Prioritize reliability over cost optimization

Document Everything

  • Every data source has a lineage diagram
  • Every metric has a signed definition
  • Change control through pull requests and stakeholder review

Build Incrementally

  • Start with highest-value sources (Shopify, Amazon = 80% of revenue)
  • Validate before expanding to wholesale and retail
  • Quick wins build trust for larger initiatives

Why ETL is Non-Negotiable

Without reliable ETL:

  • ❌ Dashboards show stale or incomplete data
  • ❌ dbt models fail when source data is missing
  • ❌ Team loses trust in the platform
  • ❌ Manual processes continue (defeating the entire project)

With reliable ETL:

  • ✅ Data updates automatically every day (or hour if needed)
  • ✅ Historical data available for trend analysis
  • ✅ Foundation for advanced analytics (cohort analysis, LTV predictions)
  • ✅ Enables real-time marketing activation (Reverse ETL to Klaviyo)

Assumptions

Source System Assumptions

  1. Shopify, Amazon, Walmart, Ware2Go, and Recharge row volumes are estimated based on current order activity
  2. SPINS volumes (500k–1M rows/month) are assumed until validated with SPINS team
  3. Emerson (Walmart 3P) data is delivered via Snowflake Private Share (no ETL required)
  4. Order volumes remain stable; growth forecasts are not applied to cost estimates

ETL Pricing Assumptions

  1. Fivetran pricing = $50 per 100k Monthly Active Rows (MAR)
  2. Polytomic ETL pricing = $500/month flat rate (up to ~6M rows)
  3. Polytomic Reverse ETL pricing = $400 per connector/month (Klaviyo confirmed)
  4. Snowpipe cost for SPINS ingestion = 150/month (if real-time is required)

Architectural Assumptions

  1. All sources land in Snowflake as the central data warehouse
  2. No new retail or distribution partners are assumed beyond those listed
  3. Snowflake compute costs for downstream transformation (dbt) are excluded from ETL cost analysis

LMNT Data Ingestion Volume Overview

LMNT ingests data from multiple retail, marketplace, DTC, and fulfillment systems. Order-driven systems like Shopify and Amazon produce predictable monthly row volumes. Operational systems like Ware2Go fluctuate with shipping schedules. Emerson provides large, high-value static datasets directly through Snowflake sharing. SPINS data will arrive as file drops requiring Snowpipe ingestion.

Ingestion Volume Summary

SourceEst. Monthly RowsEst. Annual RowsPrimary Use CaseData Lag
Emerson (Walmart 3P)250k–260k~3MPOS + Omni reporting2–3 weeks
Shopify DTC700k–1M8M–12MDTC orders, customers, productsReal-time
Shopify Wholesale20k–60k250k–700kB2B ordersReal-time
Recharge230k–350k2.8M–4.2MSubscription managementReal-time
Ware2Go360k–1M4.2M–12MFulfillment trackingDaily
Amazon Marketplace1.5M–2M16M–20MMarketplace orders, settlements1–2 days
Walmart eCommerce~10k~120k3P marketplace orders1–2 days
SPINS500k–1M6M–12MRetail analytics (Target, etc.)2–3 weeks

Notes

  • Walmart 3P (Emerson): ~36M rows of static historical data delivered via Snowflake Private Share (no ETL required)
  • Volumes are based on current order rates and typical connector table expansion
  • SPINS volumes are estimated pending validation from SPINS team

ETL Platform Selection & Architecture

We evaluated ETL platforms based on:

  1. Reliability: Uptime, data quality, error handling
  2. Connector coverage: Native support for LMNT’s sources
  3. Cost efficiency: Per-row pricing vs. flat-rate models
  4. Speed to value: Time from sign-up to first data flowing
  5. Maintenance burden: Engineering time required for ongoing operations
  6. Scalability: Ability to handle growth (new sources, higher volumes)
  7. Security & compliance: SOC2, data encryption, access controls

Component Breakdown

1) Fivetran (Primary ETL for Mission-Critical Sources)

Use for

  • Amazon Seller Central (1.5M–2M rows/month)
  • Walmart Seller Central (10k rows/month)

Why Fivetran

  • Highest reliability: 99.9% uptime SLA, battle-tested at enterprise scale
  • Zero maintenance: Automatic schema drift detection, connector updates
  • Rich historical data: Amazon settlement reports go back 18+ months automatically
  • Complex logic handled: Reconciles Amazon’s settlement structure out-of-the-box

Pricing

  • $50 per 100,000 Monthly Active Rows (MAR)
  • Amazon: ~1,000/month
  • Walmart: ~$5/month
  • Total: 12,060/year

2) Polytomic ETL (High-Volume & Flexible Sources)

Use for

  • Shopify DTC (700k–1M rows/month)
  • Shopify Wholesale (20k–60k rows/month)
  • Recharge Subscriptions (230k–350k rows/month)
  • Ware2Go Fulfillment (360k–1M rows/month)

Why Polytomic

  • Cost efficiency: Usage-based pricing cheaper than Fivetran for LMNT’s volumes
    • LMNT’s 1.31M–2.41M rows/month = 190/month
    • Same volume on Fivetran would cost ~1,900/month
    • Savings: ~20K/year vs. Fivetran for these sources
  • Fast connector development: Build custom connectors in ~1 week if needed
  • High volume optimized: Designed for large datasets with predictable scaling
  • Flexibility: API-first design allows custom mappings

Pricing

  • Usage-based tiered model: $78.756 per 1M rows (Tier 1: 0–10M rows/month)
  • LMNT’s estimated cost: 190/month
  • Total: 2,278/year

3) Polytomic Reverse ETL (Marketing Activation)

Use for

  • Klaviyo (customer profiles, segments, purchase behavior)

Why Polytomic Reverse ETL

  • Native Klaviyo integration
  • Incremental syncs (only updates changed records)
  • Field mapping UI for non-engineers
  • Real-time: Sub-5-minute latency for triggered campaigns

Use case Sync enriched customer data from Snowflake → Klaviyo to enable segmentation based on:

  • Lifetime value (LTV)
  • Cross-channel behavior (bought on Amazon, not Shopify)
  • Subscription status (Recharge)
  • Wholesale affiliation

Pricing

  • TBD (requires separate quote)
  • Typically priced per destination connector
  • Added to totals once quote is received

4) Snowpipe (Retail Analytics Files)

Use for

  • SPINS (retail POS data, file drops to S3)

Why Snowpipe

  • Event-driven ingestion (auto-ingests files as they land in S3)
  • Cost-efficient (pay only for actual ingestion)
  • Low latency (near real-time ingestion, if needed)

Alternative

  • Use Polytomic if real-time ingestion isn’t required (batch daily)

Pricing

  • 150/month
  • Total: 1,800/year

Decision point Assess with SPINS team whether real-time is needed or daily batch is sufficient.


5) Snowflake Private Share (No ETL Required)

Use for

  • Emerson (Walmart 3P POS + Omni data)

Why Private Share

  • Zero ETL cost
  • Always current (Emerson controls refresh schedule)
  • 36M historical rows (instant access to years of data)

Pricing

  • $0 (included in Emerson relationship)
  • Only pay Snowflake compute costs when querying

ETL Platform Comparison Matrix

CriteriaFivetranPolytomic (ETL)Polytomic (Reverse ETL)AirbyteDagster
ReliabilityVery High (99.9% SLA)High (99.5%)HighMedium (self-managed)Variable
MaintenanceVery LowLowLowHighVery High
Connector CoverageBroad (300+)Good (150+)Marketing destinationsCustomFull custom
New Connector SpeedSlow (3–6 months)Fast (~1 week)N/AMedium (community)Engineering-built
Cost ModelPer 100k rows ($50)Usage-based tiered ($78.76/1M rows at Tier 1)Per connector (TBD)Infrastructure + laborLabor only
Setup Time3–5 days1–3 days1–3 days1–2 weeks4–8 weeks
SecuritySOC2, GDPR, HIPAASOC2, GDPRSOC2Self-managedSelf-managed
Best ForMission-critical sourcesHigh-volume or unsupportedMarketing activationDev resources availableInternal APIs

Why We Eliminated Other Options

ToolReason for Exclusion
AirbyteHigh maintenance burden; unstable for multi-million row retail datasets; requires dedicated engineering
Hevo DataLimited track record with complex omnichannel retail loads
MatillionHeavy ETL architecture; requires extensive engineering; expensive
Stitch / TalendDeprecated or low investment in new connectors; limited support
RudderstackStrong CDP focus, weaker ETL ingestion for structured analytics use cases

Cost Summary & ROI Analysis

ETL Cost Breakdown

SourceMonthly RowsETL MethodMonthly CostAnnual Cost
Emerson (Walmart 3P)250k–260kSnowflake Private Share$0$0
Shopify DTC700k–1MPolytomic ETL~79~948
Shopify Wholesale20k–60kPolytomic ETL~5~57
Recharge230k–350kPolytomic ETL~28~331
Ware2Go360k–1MPolytomic ETL~79~948
Amazon Marketplace1.5M–2MFivetran1,00012,000
Walmart eCommerce~10kFivetran$5$60
SPINS500k–1MSnowpipe (or Polytomic)1501,800

Polytomic ETL pricing note: Usage-based at $78.756 per 1M rows (Tier 1: 0–10M rows/month)

Total ETL Investment

TypeMonthlyAnnual
Fivetran ETL1,00512,060
Polytomic ETL1902,278
Polytomic Reverse ETL (Klaviyo)TBDTBD
Snowpipe (Optional)1501,800
Emerson Share$0$0
Total (excluding Reverse ETL)1,34516,138

Primary ETL → Fivetran

Use for revenue-critical, daily ingestion:

  • Amazon Seller Central
  • Walmart Seller Central

Why: Strongest reliability & connector quality.

Supporting ETL → Polytomic (ETL)

Use for high-volume or unsupported datasets:

  • Shopify
  • Ware2Go
  • Recharge
  • Any connector missing in Fivetran

Why: Cost-efficient, flexible, rapid connector turnaround.

Activation → Polytomic (Reverse ETL)

Synchronize processed and enriched customer data from Snowflake back into Klaviyo for lifecycle marketing, segmentation, and personalization.


Procurement & Timeline

Procurement Process (Optimized for Speed)

Why this is fast Unlike enterprise software with lengthy RFPs, modern ETL tools are designed for quick sign-up.

Fivetran

  • Trial: 14-day free trial (start immediately, no contract)
  • Contract: Standard MSA, no negotiation needed (SOW within 24 hours)
  • Payment: Credit card or invoice (net-30 terms available)
  • Setup: Go-live within 3–5 days of contract signature
  • SOC2 Type II report available for IT review

Polytomic

  • Trial: 14-day free trial (can load full data during trial)
  • Contract: Standard terms, no customization needed
  • Payment: Credit card (annual prepay for discount) or invoice
  • Setup: Go-live within 1–3 days
  • Dedicated customer success engineer assigned

Total procurement time: 3–5 business days (vs. 4–8 weeks for traditional enterprise tools)

LMNT Procurement Steps

StepOwnerTimelineNotes
1. Review & approve memoShivani, Jason, PhilDec 9–13This document
2. Vendor SOC2 reviewJason (IT)Dec 13–16Brainforge provides SOC2 reports
3. Sign MSAsJason or FinanceDec 16–18Standard contracts, no negotiation
4. Provision API keysJason (IT)Dec 18–19Shopify, Amazon, Recharge, etc.
5. Brainforge setup beginsAwaishDec 19Start with Shopify + Amazon
Go-live (first data flowing)Dec 23–26Week 2 of project

Decision needed by: December 16, 2025 (to meet January backfill timeline)


Implementation Timeline

Week 1: Setup & Configuration (Dec 19–26)

Fivetran

  • Create Fivetran account
  • Connect Amazon Seller Central API
  • Connect Walmart Seller Central API
  • Start incremental sync (30 days back)

Polytomic

  • Create Polytomic account
  • Connect Shopify DTC API
  • Connect Recharge API
  • Start incremental sync

Snowflake

  • Create raw data schemas (raw_shopify, raw_amazon, etc.)
  • Set up user permissions
  • Configure dbt connection

Deliverable: First data flowing into Snowflake (30 days of recent history)

Week 2–3: Historical Backfills (Dec 26 – Jan 9)

Fivetran

  • Amazon: Backfill 24 months (settlement reports, orders, inventory)
  • Walmart: Backfill 12 months

Polytomic

  • Shopify DTC: Backfill 24 months (orders, customers, products)
  • Recharge: Backfill 24 months (subscriptions, charges)

Validation

  • Reconcile row counts: Snowflake vs. source system UI
  • Spot-check sample orders (10–20 per source)
  • Compare revenue totals: Snowflake vs. finance records

Deliverable: 2 years of historical data loaded, validated

Week 4–5: Remaining Sources (Jan 9 – Jan 23)

Polytomic

  • Ware2Go: Connect fulfillment API, backfill 12 months
  • Shopify Wholesale: Connect, backfill 12 months

Snowpipe (or Polytomic)

  • SPINS: Set up S3 bucket, configure Snowpipe, test file ingestion

Emerson

  • Validate Snowflake Private Share access
  • Document table schemas and refresh schedule

Deliverable: All sources connected, backfills complete

Week 6: Reverse ETL (Jan 23 – Jan 30)

Polytomic Reverse ETL

  • Connect Klaviyo destination
  • Map Snowflake customer table → Klaviyo profiles
  • Set up incremental syncs (daily or hourly)
  • Test with small segment before full rollout

Deliverable: Customer data syncing to Klaviyo

Week 7–8: Monitoring & Handoff (Jan 30 – Feb 6)

Monitoring setup

  • Slack alerts for pipeline failures
  • Weekly data quality reports (automated)
  • Dashboard for row counts and freshness

Documentation

  • Connector runbooks (how to troubleshoot each source)
  • Data lineage diagrams
  • Access control documentation

Training

  • 2-hour session with Jason’s IT team or Phil’s analytics team
  • Q&A and knowledge transfer

Deliverable: ETL running autonomously, LMNT team trained

Go-Live: January 2026 (Week 3–4)

What “Go-Live” means

  • All sources flowing daily into Snowflake
  • Historical data complete (2+ years)
  • Monitoring and alerting operational
  • dbt models ready to build on top of raw data

What happens next (out of scope for this memo)

  • Phase 2: dbt transformations (unified fct_orders table)
  • Phase 3: BI dashboards and reports

Final Recommendation

LMNT should implement a hybrid ETL architecture using:

  • Fivetran for Amazon and Walmart
  • Polytomic ETL for Shopify, Recharge, and Ware2Go
  • Polytomic Reverse ETL for Klaviyo
  • Snowpipe for SPINS if realtime needs (otherwise Polytomic)
  • Snowflake Private Share for Emerson Walmart 3P

This structure balances cost efficiency, connector coverage, scalability, and long-term maintainability while supporting LMNT’s operational and analytical requirements.


Appendix

Snowflake Schema Design

Proposed structure LMNT_DATA_WAREHOUSE ├── RAW_SHOPIFY (Fivetran or Polytomic land here) │ ├── orders │ ├── customers │ ├── products ├── RAW_AMAZON (Fivetran lands here) │ ├── orders │ ├── settlements ├── RAW_RECHARGE (Polytomic lands here) │ ├── subscriptions ├── RAW_WARE2GO (Polytomic lands here) │ ├── shipments ├── RAW_SPINS (Snowpipe lands here) │ ├── pos_data ├── RAW_EMERSON (Private Share, read-only) │ ├── walmart_pos │ ├── walmart_omni └── DBT_MODELS (Transformations in Phase 2) ├── staging ├── intermediate └── marts

Data Retention

ETL tools

  • Fivetran: Retains sync logs for 30 days (data lives in Snowflake permanently)
  • Polytomic: Retains sync logs for 90 days

Snowflake

  • Raw data: Retained indefinitely (LMNT controls)
  • Time Travel: 1–90 days (configurable, for accidental deletion recovery)
  • Fail-safe: 7 days (Snowflake backup for disaster recovery)

Monitoring & Alerting

Pipeline health

  • Slack notifications for:
    • Pipeline failures (immediate)
    • Row count anomalies (daily summary)
    • Schema changes detected (immediate)

Data quality

  • dbt tests run after each pipeline sync
  • Alerts for:
    • Null values in critical fields (e.g., order_id, revenue)
    • Referential integrity violations (e.g., order without customer)
    • Statistical anomalies (e.g., 50% drop in daily orders)

Dashboard

  • Snowflake-based monitoring dashboard showing:
    • Sync status by source (last successful sync time)
    • Row counts and growth trends
    • Data freshness (how old is the most recent record?)