[Client] — Modeling design doc — [Model / domain name]

About this document (Brainforge)

Internal conventions for how this file works in the repo. This section is internal: strip it, or export without it, when sharing a client-only artifact.

Titling and filename

Use [Client] — Modeling design doc — [Model or domain name] for the document title. Examples: LMNT — Modeling design doc — Omnichannel revenue foundation · Acme — Modeling design doc — Wholesale order-to-cash.

Filename: {client}-modeling-design-{model-slug}.md under knowledge/clients/{client}/resources/.

Single source of truth

Repo markdown under knowledge/clients/{client}/resources/ is the canonical design doc once merged. Google Docs may support stakeholder review; this file wins on scope and model definitions.

  • This doc (repo markdown) is the version-of-record for the model design — what was built, at what grain, with what tests. Update it when the model changes; do not maintain parallel “v2” labels.
  • Metric definitions live in Data Platform Documentation and the Metrics Definition Document. This doc links and may reference them but is not the definition authority.

When to use this template

Use this template when:

  • a new data model is being designed and stakeholders need to agree on grain, columns, and tests before code is written
  • an existing model is being refactored and the new design needs documentation
  • the ask is design / specification, not implementation code

Do not use this template when:

  • the deliverable is a full schema audit (use the Discovery Memo Appendix C)
  • the deliverable is a warehouse architecture recommendation (use Warehouse Architecture Assessment template)
  • the work is a one-time ad-hoc analysis with no reusable model

Document metadata

Status: [Draft / In review / Approved / Locked]
Warehouse: [Snowflake / BigQuery / other] — Account/region: [details]
Database / project: [database or project name]
Transformation layer: [dbt / Dataform / custom SQL / other]
Prepared for: [Client stakeholder names and roles]
Prepared by: Brainforge
Last updated: [YYYY-MM-DD]


| Artifact | Link / path | Notes |
| --- | --- | --- |
| Data Platform Documentation | [Google Sheet link] | Source catalog, metric definitions |
| Discovery Memo | [path to A1 memo] | Source profiling, SLAs, schema reference |
| Linear project | [Linear project URL] | Build tickets and acceptance criteria |
| Schema Audit | [Appendix C of A1 memo] | Column-level reference for input tables |

1. Model purpose and scope

1.1 Why this model exists

[2–4 sentences. What business question or decision does this model serve? What was the problem before this model existed?]

1.2 Scope boundaries

In scope:

  • [Domain, subject area, or source group]
  • [Specific data included]

Out of scope:

  • [Domain excluded from this model]
  • [Reason for exclusion]

2. Input sources

Every source table used by this model, with grain, freshness, and join references.

2.1 [Source table name]

| Attribute | Value |
| --- | --- |
| Warehouse path | [database.schema.table] |
| Source system | [e.g., Shopify via Polytomic] |
| Grain | [one row per …] |
| Freshness SLA | [see §2.3 of discovery memo] |
| Key columns used | [column(s)] |

2.2 [Source table name]

(Repeat for each input source.)
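
Each source's freshness can be probed directly against its SLA. A minimal sketch, assuming a loader-written ingestion timestamp (here the hypothetical `_loaded_at`; Snowflake syntax shown, BigQuery would use `TIMESTAMP_DIFF`):

```sql
-- Hours since the newest record landed in the source table;
-- `_loaded_at` is a placeholder for the loader's ingestion timestamp
SELECT
  MAX(_loaded_at) AS latest_load,
  DATEDIFF('hour', MAX(_loaded_at), CURRENT_TIMESTAMP()) AS hours_stale
FROM [database.schema.source_table];
-- Compare hours_stale against the SLA documented in §2.3 of the discovery memo
```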


3. Entity design

One subsection per output model. Each model represents one business entity or analytical surface.

3.1 [Model name]

Business entity: [e.g., Order line — one row per line item on an invoice]

Grain: [one row per …]

Source tables: [table 1], [table 2]

Key columns:

| Column | Source | Transformation | Business meaning |
| --- | --- | --- | --- |
| [column] | [source.table.column] | [pass-through / aggregation / case when / join] | [what it represents] |
| [column] | [source.table.column] | [pass-through / aggregation / case when / join] | [...] |

Filters applied:

  • [e.g., status = 'active' — excludes cancelled orders]
  • [e.g., created_at >= '2025-01-01' — scoped to post-migration data]

Tests / assertions:

  • [e.g., grain uniqueness — no duplicate order line IDs]
  • [e.g., referential integrity — every customer_id exists in customer master]
  • [e.g., not null — order_id is never null]
  • [e.g., accepted range — revenue >= 0]
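
Assertions like these translate directly into SQL checks. A sketch of the first two, with all table and column names as illustrative placeholders:

```sql
-- Grain uniqueness: should return zero rows
SELECT order_line_id, COUNT(*) AS n
FROM [database.schema.output_model]
GROUP BY order_line_id
HAVING COUNT(*) > 1;

-- Referential integrity: should return zero rows
SELECT o.customer_id
FROM [database.schema.output_model] o
LEFT JOIN [database.schema.customer_master] c
  ON o.customer_id = c.customer_id
WHERE c.customer_id IS NULL;
```

If the transformation layer is dbt, these correspond to the built-in `unique` and `relationships` tests.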

3.2 [Model name]

(Repeat for each output model.)


4. Join graph and lineage

[Source A] ──┐
             ├──> [Staging A] ──┐
[Source B] ──┘                  │
                                ├──> [Intermediate] ──> [Mart]
[Source C] ─────────────────────┘

[Brief narrative of how sources connect to outputs. Use the diagram above or replace with a more specific flow.]

Key join relationships:

| Left table | Join key | Right table | Join type | Cardinality impact |
| --- | --- | --- | --- | --- |
| [table] | [column(s)] | [table] | [inner / left / full] | [e.g., may fan out on order_id if right side is multi-row] |
| [table] | [column(s)] | [table] | [inner] | [1:1] |
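
Any fan-out flagged in the cardinality column can be verified before the join is built. A sketch against a placeholder right-side table and key:

```sql
-- If any join key appears more than once on the right side,
-- an inner or left join on that key will multiply left-side rows
SELECT order_id, COUNT(*) AS rows_per_key
FROM [database.schema.right_table]
GROUP BY order_id
HAVING COUNT(*) > 1
ORDER BY rows_per_key DESC
LIMIT 10;
```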

5. Incremental / snapshot strategy

| Concern | Approach |
| --- | --- |
| Load strategy | [Full refresh / Incremental on timestamp / Snapshot merge] |
| Incremental key (if used) | [column] |
| Backfill window | [e.g., look back 3 days on each run] |
| Historical rebuild | [e.g., full refresh required when grain changes — coordination needed] |
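
If the project's transformation layer happens to be dbt (check the metadata block; do not assume), the strategy above can be sketched as an incremental model. Model and column names (`stg_orders`, `order_id`, `updated_at`) are placeholders:

```sql
-- Illustrative dbt incremental model; lookback mirrors the backfill window
{{ config(materialized='incremental', unique_key='order_id') }}

SELECT *
FROM {{ ref('stg_orders') }}
{% if is_incremental() %}
  -- 3-day lookback so late-arriving updates are re-merged on each run
  WHERE updated_at >= (SELECT DATEADD('day', -3, MAX(updated_at)) FROM {{ this }})
{% endif %}
```

A grain change still requires a coordinated full rebuild (`dbt run --full-refresh`), as noted under Historical rebuild.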

6. Dependencies and scheduling

| Downstream consumer | Dependency | Expected timing |
| --- | --- | --- |
| [Dashboard / report] | [this model must complete before] | [e.g., daily by 08:00 UTC] |
| [BI semantic layer] | [this model must complete before] | [e.g., before business hours] |

7. QA and reconciliation

7.1 Row count verification

```sql
-- Compare output row count vs source row counts
SELECT 'output_model' AS object, COUNT(*) AS row_count FROM [database.schema.output_model]
UNION ALL
SELECT 'source_table', COUNT(*) FROM [database.schema.source_table];
```

7.2 Key metric parity

```sql
-- Verify a key metric matches the source after transformation
SELECT SUM(revenue) AS total_revenue FROM [database.schema.output_model];
-- Expected: $[X]M, measured YYYY-MM-DD
```

7.3 What passing looks like

  • Row counts are within expected range from source volumes
  • Grain uniqueness test passes (no duplicate keys)
  • Referential integrity tests pass for all foreign keys
  • Key metric parity within 1% of source (allowing for time-window differences)
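
The 1% parity bound can be checked mechanically rather than by eye. A sketch with placeholder table names, assuming the same `revenue` column exists on both sides:

```sql
-- Flags the metric comparison when drift exceeds 1%
WITH m AS (
  SELECT SUM(revenue) AS model_rev FROM [database.schema.output_model]
), s AS (
  SELECT SUM(revenue) AS source_rev FROM [database.schema.source_table]
)
SELECT
  model_rev,
  source_rev,
  ABS(model_rev - source_rev) / NULLIF(source_rev, 0) AS rel_diff,
  CASE WHEN ABS(model_rev - source_rev) / NULLIF(source_rev, 0) <= 0.01
       THEN 'PASS' ELSE 'FAIL' END AS parity_check
FROM m, s;
```

Time-window differences between model and source should be normalized (same date filter on both sides) before reading the result.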

Appendix A — Agent guardrails (anti-fluff)

Do:

  • Name grain explicitly (one row per order line vs one row per invoice).
  • Use “transformation layer” or the project’s actual tool name; do not default to “dbt” if the stack is different.
  • Prefer tables over prose for column catalogs and test lists.
  • Flag any join fan-out risk with a note on cardinality.

Do not:

  • Model the data before the design doc is approved — design first.
  • Use placeholders for columns that are known and settled.
  • Add tests that the implementation agent would not be able to write.
  • Say “comprehensive” or “all-encompassing” — name what is and is not in scope.

Appendix B — Pre-handoff QA checklist

  • Every output model has a declared grain
  • Key columns section includes source, transformation, and business meaning
  • Tests / assertions are concrete (not “should be correct”)
  • Join graph shows all sources → outputs
  • Incremental strategy is documented (even if full refresh)
  • QA queries would pass if run against the built model
  • All placeholders are filled or marked as intentional TBD