LMNT ETL Platform Assessment & Recommendation
Prepared for: Shivani Amar, Jason Wu, Phil McKeating
Prepared by: Uttam Kumaran, Awaish Kumar (Brainforge)
Date: December 9, 2025
Engagement Path: Data Foundations Sprint (DFS)
Decision needed by: December 16, 2025 (to meet January ingestion timeline)
Executive Summary
The Problem
Today, LMNT’s commercial data is trapped in silos across 8+ disconnected systems.
- Carlos manually exports Shopify orders from SourceMedium into spreadsheets.
- Amazon settlement reports arrive days late and require manual reconciliation.
- Laura’s wholesale data lives in email attachments and PDFs.
- Finance spends X+ days each month reconciling revenue across channels.
This creates three critical problems:
- Manual burden: 15+ hours/week spent copying data between systems
- Delayed insights: The October 2025 Shopify conversion rate drop (34.7% decline) took weeks to diagnose because data wasn’t readily available
- No cross-channel view: Can’t answer questions like “Is Target cannibalizing DTC sales?” because the data doesn’t connect
What ETL Solves
ETL (Extract, Transform, Load) is the automated data plumbing that moves information from source systems (Shopify, Amazon, Ware2Go, etc.) into Snowflake, where it can be cleaned, combined, and analyzed.
Think of it like plumbing in a house. ETL is the foundation that everything else (dbt transformations, BI dashboards, AI analysis) is built upon.
Why This Matters Now
- NetSuite migration (2026): Need clean data lineage and definitions locked before ERP migration
- Seasonal urgency: Q4 revenue decline needs rapid diagnosis; can’t wait weeks for manual data pulls
- Team capacity: Carlos, Laura, and team should focus on strategy, not data entry
- Omnichannel expansion: Retail growth (Target, Costco) requires unified revenue view
The Recommendation
Implement a hybrid ETL architecture using:
- Fivetran for mission-critical, high-reliability sources (Amazon, Walmart)
- Polytomic ETL for high-volume sources to take advantage of cost savings (Shopify, Recharge, Ware2Go)
- Polytomic Reverse ETL for marketing activation (Klaviyo)
- Snowpipe for retail analytics (SPINS) if real-time needs arise
- Snowflake Private Share for Walmart 3P (Emerson) — no ETL needed
Investment
| Component | Annual Cost |
|---|---|
| Fivetran (Amazon + Walmart eComm) | 12,060 |
| Polytomic ETL (Shopify + Recharge + Ware2Go) | 2,278 |
| Polytomic Reverse ETL (Klaviyo) | TBD – Contact Polytomic |
| Snowpipe (SPINS, if needed) | 1,800 |
| Total Annual Investment (ETL only) | 16,138 |
What We’ve Learned About CPG Data Challenges
- Retail data lags: POS data from Target/Walmart arrives 2–3 weeks late and requires special handling
- Promotional spikes: Black Friday and product launches can 10x normal order volumes; pipelines must handle surges
- Revenue definition chaos: “Sales” means different things to e-commerce, finance, and retail teams. Must lock definitions early
- Settlement complexity: Amazon settlement reports don’t match order dates; requires reconciliation logic
- Wholesale fragmentation: B2B partners send data in wildly inconsistent formats
Our ETL Philosophy
Don’t Over-Engineer
- Use managed services (Fivetran, Polytomic) for 90% of ingestion
- Custom code only when absolutely necessary
- Prioritize reliability over cost optimization
Document Everything
- Every data source has a lineage diagram
- Every metric has a signed definition
- Change control through pull requests and stakeholder review
Build Incrementally
- Start with highest-value sources (Shopify, Amazon = 80% of revenue)
- Validate before expanding to wholesale and retail
- Quick wins build trust for larger initiatives
Why ETL is Non-Negotiable
Without reliable ETL:
- ❌ Dashboards show stale or incomplete data
- ❌ dbt models fail when source data is missing
- ❌ Team loses trust in the platform
- ❌ Manual processes continue (defeating the entire project)
With reliable ETL:
- ✅ Data updates automatically every day (or hour if needed)
- ✅ Historical data available for trend analysis
- ✅ Foundation for advanced analytics (cohort analysis, LTV predictions)
- ✅ Enables real-time marketing activation (Reverse ETL to Klaviyo)
Assumptions
Source System Assumptions
- Shopify, Amazon, Walmart, Ware2Go, and Recharge row volumes are estimated based on current order activity
- SPINS volumes (500k–1M rows/month) are assumed until validated with SPINS team
- Emerson (Walmart 3P) data is delivered via Snowflake Private Share (no ETL required)
- Order volumes remain stable; growth forecasts are not applied to cost estimates
ETL Pricing Assumptions
- Fivetran pricing = $50 per 100k Monthly Active Rows (MAR)
- Polytomic ETL pricing = $500/month flat rate (up to ~6M rows)
- Polytomic Reverse ETL pricing = $400 per connector/month (Klaviyo confirmed)
- Snowpipe cost for SPINS ingestion = 150/month (if real-time is required)
Architectural Assumptions
- All sources land in Snowflake as the central data warehouse
- No new retail or distribution partners are assumed beyond those listed
- Snowflake compute costs for downstream transformation (dbt) are excluded from ETL cost analysis
LMNT Data Ingestion Volume Overview
LMNT ingests data from multiple retail, marketplace, DTC, and fulfillment systems. Order-driven systems like Shopify and Amazon produce predictable monthly row volumes. Operational systems like Ware2Go fluctuate with shipping schedules. Emerson provides large, high-value static datasets directly through Snowflake sharing. SPINS data will arrive as file drops requiring Snowpipe ingestion.
Ingestion Volume Summary
| Source | Est. Monthly Rows | Est. Annual Rows | Primary Use Case | Data Lag |
|---|---|---|---|---|
| Emerson (Walmart 3P) | 250k–260k | ~3M | POS + Omni reporting | 2–3 weeks |
| Shopify DTC | 700k–1M | 8M–12M | DTC orders, customers, products | Real-time |
| Shopify Wholesale | 20k–60k | 250k–700k | B2B orders | Real-time |
| Recharge | 230k–350k | 2.8M–4.2M | Subscription management | Real-time |
| Ware2Go | 360k–1M | 4.2M–12M | Fulfillment tracking | Daily |
| Amazon Marketplace | 1.5M–2M | 16M–20M | Marketplace orders, settlements | 1–2 days |
| Walmart eCommerce | ~10k | ~120k | 3P marketplace orders | 1–2 days |
| SPINS | 500k–1M | 6M–12M | Retail analytics (Target, etc.) | 2–3 weeks |
Notes
- Walmart 3P (Emerson): ~36M rows of static historical data delivered via Snowflake Private Share (no ETL required)
- Volumes are based on current order rates and typical connector table expansion
- SPINS volumes are estimated pending validation from SPINS team
ETL Platform Selection & Architecture
We evaluated ETL platforms based on:
- Reliability: Uptime, data quality, error handling
- Connector coverage: Native support for LMNT’s sources
- Cost efficiency: Per-row pricing vs. flat-rate models
- Speed to value: Time from sign-up to first data flowing
- Maintenance burden: Engineering time required for ongoing operations
- Scalability: Ability to handle growth (new sources, higher volumes)
- Security & compliance: SOC2, data encryption, access controls
Component Breakdown
1) Fivetran (Primary ETL for Mission-Critical Sources)
Use for
- Amazon Seller Central (1.5M–2M rows/month)
- Walmart Seller Central (10k rows/month)
Why Fivetran
- Highest reliability: 99.9% uptime SLA, battle-tested at enterprise scale
- Zero maintenance: Automatic schema drift detection, connector updates
- Rich historical data: Amazon settlement reports go back 18+ months automatically
- Complex logic handled: Reconciles Amazon’s settlement structure out-of-the-box
Pricing
- $50 per 100,000 Monthly Active Rows (MAR)
- Amazon: ~1,000/month
- Walmart: ~$5/month
- Total: 12,060/year
2) Polytomic ETL (High-Volume & Flexible Sources)
Use for
- Shopify DTC (700k–1M rows/month)
- Shopify Wholesale (20k–60k rows/month)
- Recharge Subscriptions (230k–350k rows/month)
- Ware2Go Fulfillment (360k–1M rows/month)
Why Polytomic
- Cost efficiency: Usage-based pricing cheaper than Fivetran for LMNT’s volumes
- LMNT’s 1.31M–2.41M rows/month = 190/month
- Same volume on Fivetran would cost ~1,900/month
- Savings: ~20K/year vs. Fivetran for these sources
- Fast connector development: Build custom connectors in ~1 week if needed
- High volume optimized: Designed for large datasets with predictable scaling
- Flexibility: API-first design allows custom mappings
Pricing
- Usage-based tiered model: $78.756 per 1M rows (Tier 1: 0–10M rows/month)
- LMNT’s estimated cost: 190/month
- Total: 2,278/year
3) Polytomic Reverse ETL (Marketing Activation)
Use for
- Klaviyo (customer profiles, segments, purchase behavior)
Why Polytomic Reverse ETL
- Native Klaviyo integration
- Incremental syncs (only updates changed records)
- Field mapping UI for non-engineers
- Real-time: Sub-5-minute latency for triggered campaigns
Use case Sync enriched customer data from Snowflake → Klaviyo to enable segmentation based on:
- Lifetime value (LTV)
- Cross-channel behavior (bought on Amazon, not Shopify)
- Subscription status (Recharge)
- Wholesale affiliation
Pricing
- TBD (requires separate quote)
- Typically priced per destination connector
- Added to totals once quote is received
4) Snowpipe (Retail Analytics Files)
Use for
- SPINS (retail POS data, file drops to S3)
Why Snowpipe
- Event-driven ingestion (auto-ingests files as they land in S3)
- Cost-efficient (pay only for actual ingestion)
- Low latency (near real-time ingestion, if needed)
Alternative
- Use Polytomic if real-time ingestion isn’t required (batch daily)
Pricing
- 150/month
- Total: 1,800/year
Decision point Assess with SPINS team whether real-time is needed or daily batch is sufficient.
5) Snowflake Private Share (No ETL Required)
Use for
- Emerson (Walmart 3P POS + Omni data)
Why Private Share
- Zero ETL cost
- Always current (Emerson controls refresh schedule)
- 36M historical rows (instant access to years of data)
Pricing
- $0 (included in Emerson relationship)
- Only pay Snowflake compute costs when querying
ETL Platform Comparison Matrix
| Criteria | Fivetran | Polytomic (ETL) | Polytomic (Reverse ETL) | Airbyte | Dagster |
|---|---|---|---|---|---|
| Reliability | Very High (99.9% SLA) | High (99.5%) | High | Medium (self-managed) | Variable |
| Maintenance | Very Low | Low | Low | High | Very High |
| Connector Coverage | Broad (300+) | Good (150+) | Marketing destinations | Custom | Full custom |
| New Connector Speed | Slow (3–6 months) | Fast (~1 week) | N/A | Medium (community) | Engineering-built |
| Cost Model | Per 100k rows ($50) | Usage-based tiered ($78.76/1M rows at Tier 1) | Per connector (TBD) | Infrastructure + labor | Labor only |
| Setup Time | 3–5 days | 1–3 days | 1–3 days | 1–2 weeks | 4–8 weeks |
| Security | SOC2, GDPR, HIPAA | SOC2, GDPR | SOC2 | Self-managed | Self-managed |
| Best For | Mission-critical sources | High-volume or unsupported | Marketing activation | Dev resources available | Internal APIs |
Why We Eliminated Other Options
| Tool | Reason for Exclusion |
|---|---|
| Airbyte | High maintenance burden; unstable for multi-million row retail datasets; requires dedicated engineering |
| Hevo Data | Limited track record with complex omnichannel retail loads |
| Matillion | Heavy ETL architecture; requires extensive engineering; expensive |
| Stitch / Talend | Deprecated or low investment in new connectors; limited support |
| Rudderstack | Strong CDP focus, weaker ETL ingestion for structured analytics use cases |
Cost Summary & ROI Analysis
ETL Cost Breakdown
| Source | Monthly Rows | ETL Method | Monthly Cost | Annual Cost |
|---|---|---|---|---|
| Emerson (Walmart 3P) | 250k–260k | Snowflake Private Share | $0 | $0 |
| Shopify DTC | 700k–1M | Polytomic ETL | ~79 | ~948 |
| Shopify Wholesale | 20k–60k | Polytomic ETL | ~5 | ~57 |
| Recharge | 230k–350k | Polytomic ETL | ~28 | ~331 |
| Ware2Go | 360k–1M | Polytomic ETL | ~79 | ~948 |
| Amazon Marketplace | 1.5M–2M | Fivetran | 1,000 | 12,000 |
| Walmart eCommerce | ~10k | Fivetran | $5 | $60 |
| SPINS | 500k–1M | Snowpipe (or Polytomic) | 150 | 1,800 |
Polytomic ETL pricing note: Usage-based at $78.756 per 1M rows (Tier 1: 0–10M rows/month)
Total ETL Investment
| Type | Monthly | Annual |
|---|---|---|
| Fivetran ETL | 1,005 | 12,060 |
| Polytomic ETL | 190 | 2,278 |
| Polytomic Reverse ETL (Klaviyo) | TBD | TBD |
| Snowpipe (Optional) | 150 | 1,800 |
| Emerson Share | $0 | $0 |
| Total (excluding Reverse ETL) | 1,345 | 16,138 |
Recommended ETL Architecture
Primary ETL → Fivetran
Use for revenue-critical, daily ingestion:
- Amazon Seller Central
- Walmart Seller Central
Why: Strongest reliability & connector quality.
Supporting ETL → Polytomic (ETL)
Use for high-volume or unsupported datasets:
- Shopify
- Ware2Go
- Recharge
- Any connector missing in Fivetran
Why: Cost-efficient, flexible, rapid connector turnaround.
Activation → Polytomic (Reverse ETL)
Synchronize processed and enriched customer data from Snowflake back into Klaviyo for lifecycle marketing, segmentation, and personalization.
Procurement & Timeline
Procurement Process (Optimized for Speed)
Why this is fast Unlike enterprise software with lengthy RFPs, modern ETL tools are designed for quick sign-up.
Fivetran
- Trial: 14-day free trial (start immediately, no contract)
- Contract: Standard MSA, no negotiation needed (SOW within 24 hours)
- Payment: Credit card or invoice (net-30 terms available)
- Setup: Go-live within 3–5 days of contract signature
- SOC2 Type II report available for IT review
Polytomic
- Trial: 14-day free trial (can load full data during trial)
- Contract: Standard terms, no customization needed
- Payment: Credit card (annual prepay for discount) or invoice
- Setup: Go-live within 1–3 days
- Dedicated customer success engineer assigned
Total procurement time: 3–5 business days (vs. 4–8 weeks for traditional enterprise tools)
LMNT Procurement Steps
| Step | Owner | Timeline | Notes |
|---|---|---|---|
| 1. Review & approve memo | Shivani, Jason, Phil | Dec 9–13 | This document |
| 2. Vendor SOC2 review | Jason (IT) | Dec 13–16 | Brainforge provides SOC2 reports |
| 3. Sign MSAs | Jason or Finance | Dec 16–18 | Standard contracts, no negotiation |
| 4. Provision API keys | Jason (IT) | Dec 18–19 | Shopify, Amazon, Recharge, etc. |
| 5. Brainforge setup begins | Awaish | Dec 19 | Start with Shopify + Amazon |
| Go-live (first data flowing) | — | Dec 23–26 | Week 2 of project |
Decision needed by: December 16, 2025 (to meet January backfill timeline)
Implementation Timeline
Week 1: Setup & Configuration (Dec 19–26)
Fivetran
- Create Fivetran account
- Connect Amazon Seller Central API
- Connect Walmart Seller Central API
- Start incremental sync (30 days back)
Polytomic
- Create Polytomic account
- Connect Shopify DTC API
- Connect Recharge API
- Start incremental sync
Snowflake
- Create raw data schemas (raw_shopify, raw_amazon, etc.)
- Set up user permissions
- Configure dbt connection
Deliverable: First data flowing into Snowflake (30 days of recent history)
Week 2–3: Historical Backfills (Dec 26 – Jan 9)
Fivetran
- Amazon: Backfill 24 months (settlement reports, orders, inventory)
- Walmart: Backfill 12 months
Polytomic
- Shopify DTC: Backfill 24 months (orders, customers, products)
- Recharge: Backfill 24 months (subscriptions, charges)
Validation
- Reconcile row counts: Snowflake vs. source system UI
- Spot-check sample orders (10–20 per source)
- Compare revenue totals: Snowflake vs. finance records
Deliverable: 2 years of historical data loaded, validated
Week 4–5: Remaining Sources (Jan 9 – Jan 23)
Polytomic
- Ware2Go: Connect fulfillment API, backfill 12 months
- Shopify Wholesale: Connect, backfill 12 months
Snowpipe (or Polytomic)
- SPINS: Set up S3 bucket, configure Snowpipe, test file ingestion
Emerson
- Validate Snowflake Private Share access
- Document table schemas and refresh schedule
Deliverable: All sources connected, backfills complete
Week 6: Reverse ETL (Jan 23 – Jan 30)
Polytomic Reverse ETL
- Connect Klaviyo destination
- Map Snowflake customer table → Klaviyo profiles
- Set up incremental syncs (daily or hourly)
- Test with small segment before full rollout
Deliverable: Customer data syncing to Klaviyo
Week 7–8: Monitoring & Handoff (Jan 30 – Feb 6)
Monitoring setup
- Slack alerts for pipeline failures
- Weekly data quality reports (automated)
- Dashboard for row counts and freshness
Documentation
- Connector runbooks (how to troubleshoot each source)
- Data lineage diagrams
- Access control documentation
Training
- 2-hour session with Jason’s IT team or Phil’s analytics team
- Q&A and knowledge transfer
Deliverable: ETL running autonomously, LMNT team trained
Go-Live: January 2026 (Week 3–4)
What “Go-Live” means
- All sources flowing daily into Snowflake
- Historical data complete (2+ years)
- Monitoring and alerting operational
- dbt models ready to build on top of raw data
What happens next (out of scope for this memo)
- Phase 2: dbt transformations (unified
fct_orderstable) - Phase 3: BI dashboards and reports
Final Recommendation
LMNT should implement a hybrid ETL architecture using:
- Fivetran for Amazon and Walmart
- Polytomic ETL for Shopify, Recharge, and Ware2Go
- Polytomic Reverse ETL for Klaviyo
- Snowpipe for SPINS if realtime needs (otherwise Polytomic)
- Snowflake Private Share for Emerson Walmart 3P
This structure balances cost efficiency, connector coverage, scalability, and long-term maintainability while supporting LMNT’s operational and analytical requirements.
Appendix
Snowflake Schema Design
Proposed structure LMNT_DATA_WAREHOUSE ├── RAW_SHOPIFY (Fivetran or Polytomic land here) │ ├── orders │ ├── customers │ ├── products ├── RAW_AMAZON (Fivetran lands here) │ ├── orders │ ├── settlements ├── RAW_RECHARGE (Polytomic lands here) │ ├── subscriptions ├── RAW_WARE2GO (Polytomic lands here) │ ├── shipments ├── RAW_SPINS (Snowpipe lands here) │ ├── pos_data ├── RAW_EMERSON (Private Share, read-only) │ ├── walmart_pos │ ├── walmart_omni └── DBT_MODELS (Transformations in Phase 2) ├── staging ├── intermediate └── marts
Data Retention
ETL tools
- Fivetran: Retains sync logs for 30 days (data lives in Snowflake permanently)
- Polytomic: Retains sync logs for 90 days
Snowflake
- Raw data: Retained indefinitely (LMNT controls)
- Time Travel: 1–90 days (configurable, for accidental deletion recovery)
- Fail-safe: 7 days (Snowflake backup for disaster recovery)
Monitoring & Alerting
Pipeline health
- Slack notifications for:
- Pipeline failures (immediate)
- Row count anomalies (daily summary)
- Schema changes detected (immediate)
Data quality
- dbt tests run after each pipeline sync
- Alerts for:
- Null values in critical fields (e.g., order_id, revenue)
- Referential integrity violations (e.g., order without customer)
- Statistical anomalies (e.g., 50% drop in daily orders)
Dashboard
- Snowflake-based monitoring dashboard showing:
- Sync status by source (last successful sync time)
- Row counts and growth trends
- Data freshness (how old is the most recent record?)