Amazon Raw Models Missing Data Report

Scope: dbt_project/models/raw/amazon/
Files reviewed:

  • raw_amazon__orders.sql
  • raw_amazon__order_items.sql
  • raw_amazon__financial_events.sql
  • raw_amazon__fulfillment_orders.sql
  • raw_amazon__catalog_items.sql
  • raw_amazon__listings_items.sql
  • raw_amazon__marketplace_participations.sql

Goal: highlight key business data that is not present in the current raw Amazon models.


Executive Summary

The Amazon raw layer is strong on identifiers and marketplace metadata, but it is missing several fields needed for revenue, customer, and fulfillment analysis.

Highest-priority gaps

  1. No order total in raw_amazon__orders

    • The orders model has order identity and status fields, but no explicit order amount / grand total.
    • Missing related financial fields also include tax, shipping, discount, and currency at the order header level.
  2. No line revenue in raw_amazon__order_items

    • The order items model has quantity and SKU details, but no unit price, line subtotal, line discount, tax, or line revenue.
    • That prevents item-level revenue reporting and makes order-level reconciliation harder.
  3. No dedicated customer model in the Amazon raw layer

    • The only customer-like fields in raw_amazon__orders are BUYER_EMAIL and BUYER_NAME.
    • There is no customer table or richer customer profile data such as customer ID, shipping address, phone, or lifecycle metadata.
  4. Financial events are not normalized

    • raw_amazon__financial_events exposes event lists, but not exploded event rows with amount, currency, event date, or linked order/item identifiers.
    • This makes refunds, fees, and settlement analysis difficult.

Model-by-Model Review

raw_amazon__orders

What it has:

  • Order identifiers and status fields
  • Buyer email/name
  • Shipment timing and channel metadata

Missing or incomplete data:

  • Order total / grand total
  • Tax amount
  • Shipping amount
  • Discount amount
  • Currency code
  • Customer ID
  • Shipping address / billing address
  • Phone number or richer customer profile fields

Impact:

  • Cannot calculate revenue directly from the order header.
  • Cannot build a complete customer dimension from this table alone.

raw_amazon__order_items

What it has:

  • Item identifiers
  • SKU / ASIN
  • Quantity ordered and shipped
  • Item attributes like condition, gift wrap, promotion IDs

Missing or incomplete data:

  • Unit price
  • Line subtotal
  • Line revenue
  • Line discount amount
  • Line tax amount
  • Shipping allocation at line level
  • Currency code

Impact:

  • Cannot produce item-level revenue or margin analysis.
  • Cannot reconcile order total to item-level totals.

raw_amazon__financial_events

What it has:

  • Event-type groupings for refunds, fees, shipments, reimbursements, and settlement-related activity

Missing or incomplete data:

  • Flattened transaction rows
  • Event amount
  • Currency code
  • Event timestamp
  • Order ID / order item ID linkage
  • Fee type and settlement detail in a normalized structure

Impact:

  • Financial reconciliation and net revenue analysis are constrained.
  • The table is difficult to join cleanly to orders and order items.

raw_amazon__fulfillment_orders

What it has:

  • Fulfillment order identifiers and status metadata
  • Fulfillment policy and shipping speed category

Missing or incomplete data:

  • Shipment tracking number
  • Carrier / service details
  • Shipment cost
  • Package-level line item revenue impact
  • Delivery confirmation detail

Impact:

  • Good for status tracking, but weak for logistics cost and delivery analytics.

raw_amazon__catalog_items

What it has:

  • Product identifiers and catalog metadata

Missing or incomplete data:

  • Customer data
  • Sales transaction data
  • Inventory quantity / stock on hand
  • Pricing history

Impact:

  • Useful as a product reference layer, but not enough for demand or revenue analysis by itself.

raw_amazon__listings_items

What it has:

  • Listing metadata, offer structure, availability, and issues

Missing or incomplete data:

  • Customer data
  • Order data
  • Line revenue
  • Historical price and discount snapshots

Impact:

  • Useful for catalog/listing operations, but not enough for sales reporting.

raw_amazon__marketplace_participations

What it has:

  • Marketplace identity, locale, and participation flags

Missing or incomplete data:

  • Business transaction data
  • Customer data
  • Order and item measures

Impact:

  • This is a lookup table only, not a transactional source.

Cross-Cutting Missing Data

The Amazon raw layer is missing these broader entities or measures:

  • Customer dimension: no dedicated customer model or customer ID
  • Order financial summary: no order total, tax, shipping, discount, or currency at the order header level
  • Order item financials: no unit price, line revenue, or discount allocation
  • Normalized financial ledger: financial events are nested lists instead of row-level facts
  • Logistics detail: no tracking, carrier, or shipping cost fields in fulfillment data

If these fields exist upstream in Amazon SP-API or Polytomic, add them to the raw layer first.

Recommended additions:

  • ORDER_TOTAL, ITEM_PRICE, LINE_TOTAL, LINE_REVENUE, TAX_AMOUNT, SHIPPING_AMOUNT, DISCOUNT_AMOUNT, CURRENCY_CODE
  • CUSTOMER_ID, CUSTOMER_EMAIL, SHIPPING_ADDRESS, BILLING_ADDRESS, PHONE
  • EVENT_AMOUNT, EVENT_DATE, EVENT_TYPE, CURRENCY_CODE, ORDER_ID, ORDER_ITEM_ID
  • TRACKING_NUMBER, CARRIER_NAME, SHIPPING_COST

If Amazon does not expose these directly, create intermediate models that derive what can be inferred from the available source data.

Interim Implementation Update

The shipment-event portion of the financial events feed is now normalized in dbt_project/models/intermediate/amazon/int_amazon__financial_shipment_events.sql. It flattens shipment_event_list into row-level shipment item charge records while preserving the raw nested payloads for traceability.