Playbook: Data source discovery memo (new share or major new coverage)

Version: 0.2
Last updated: 2026-04-10
Audience: LLM agents and humans drafting a leadership-ready discovery memo for a new share or schema in Snowflake, BigQuery, or another agreed warehouse—plus a large net-new set of tables to profile.


Purpose

Produce a single, linear document that explains what landed in the warehouse, why it matters, and how to use it—at a table-aware level of detail (not a one-pager). Standardize structure, depth, and tone so outputs read like a concise executive memo written by a human, not a generic AI overview.


When to use

Use this playbook when:

  • a new private share or new schema appears for a client and leadership needs a written baseline
  • many new tables (for example dozens) need to be introduced in one memo with consistent subsections
  • the ask is discovery / profiling / interpretation, not a full dimensional model or dbt project design

Do not use this playbook when:

  • the user only needs a one-page overview (use data-source-memo-executive-style-playbook.md instead)
  • the task is a technical schema audit with full column dictionaries and no narrative (use a schema audit doc pattern; link from the memo instead)

Service line / subservice

| Field | Value |
| --- | --- |
| Service line | Data Platform |
| Primary subservice | Analytics and BI (memo narrative); Data Infrastructure (warehouse validation) |

Approval-before-execution pipeline

  • Drafting: No approval required to draft in knowledge/clients/{client}/resources/ (or the client repo if that is where deliverables live).
  • Publishing externally: Get owner sign-off on facts, figures, and any client-facing framing.
  • Source systems of record: Do not edit the client’s or vendor’s canonical Google Doc / wiki unless explicitly asked; create a new file or doc for Brainforge-led memos when that is the engagement rule.

Scope

In scope

  • Standard memo skeleton (sections 1–7 style: executive summary → relationship to data → table documentation → cross-table → deep dive → recommendations → appendix-style notes as needed)
  • Per-table blocks for each profiled object: metrics table, business description, interpretation, questions
  • Warehouse validation steps on the confirmed platform (Snowflake, BigQuery, etc.) to ground counts and grains
  • Executive style rules and agent guardrails (anti-fluff)

Out of scope

  • Building dbt models, dashboards, or row-level QA fixes
  • Replacing a data contract or legal agreement

Prerequisites

| Requirement | Notes |
| --- | --- |
| Platform and location confirmed | Ask explicitly: Snowflake vs BigQuery (or other). Record account / region / project, database or dataset, and schema (or equivalent) for discovery. |
| Auth method agreed | User or engagement owner chooses how to authenticate (see Warehouse platform and authentication gate below). Never assume credentials. |
| Access verified | Read access proven with a minimal query before drafting metrics—see gate section. |
| Scope list | Which tables are in scope; avoid profiling tables nobody asked for. |
| Snapshot discipline | Record the query date and whether counts are estimates (row estimates vs exact counts). |

Inputs

| Input | Example | Required |
| --- | --- | --- |
| Client / program name | LMNT | Yes |
| Warehouse platform | Snowflake / BigQuery / other | Yes |
| Discovery scope | Snowflake: database + schema (and share name if applicable); BigQuery: project + dataset | Yes |
| Prior art | Vault schema audit markdown, vendor deck, stakeholder call notes | As available |
| Audience line | Leadership, Analytics, Engineering | Yes |

Warehouse platform and authentication gate (do this first)

Do not draft §3 metrics or claim live row counts until this section is satisfied.

1) Ask where the data lives

Ask the user (or read the ticket) explicitly:

  • Which system: Snowflake, BigQuery, or something else?
  • Exact scope for discovery: database + schema (Snowflake), or project + dataset (BigQuery), including environment (prod vs dev) if both exist.

Write the answers into working notes (ticket or draft memo metadata). Do not guess from filenames alone.

2) Confirm how to authenticate (user’s choice)

The user or engagement owner decides the method—confirm in chat or ticket before running tools:

| Platform | Common options (examples—not an exhaustive list) |
| --- | --- |
| Snowflake | Snow CLI (snow connection test / snow sql -c <connectionName>), browser SSO through the CLI, key-pair user, service / automation user per client policy |
| BigQuery | gcloud auth application-default login, service account JSON (path or 1Password reference), workload identity—per client policy |
| Other | Follow the client runbook; if none exists, ask. |

Rules

  • Do not paste secrets, private keys, or full JSON key contents into the memo or chat logs.
  • Prefer 1Password CLI or env vars per knowledge/standards/03-knowledge/engineering/setup/1password-cli-setup.md when the engagement stores credentials there.

3) Verify auth and reachability

Run a minimal read that proves the identity can see the target objects, for example:

  • Snowflake: SHOW DATABASES / SHOW SCHEMAS IN DATABASE … / SHOW TABLES IN SCHEMA …, or SELECT 1 from one known table; use the same connection the profiling will use.
  • BigQuery: INFORMATION_SCHEMA.TABLES for the dataset, or SELECT 1 from an allowed table in bq or client-approved SQL workspace.

If the user cannot run commands locally, they must confirm access another way (screenshot of successful query, or another engineer attests)—record that confirmation rather than inventing access.

4) Confirm access to the discovery database/schema (and data)

Checklist before moving to memo drafting:

  • Platform (Snowflake vs BigQuery vs other) is written down.
  • Database/schema or project/dataset matches what the memo will describe.
  • Authentication succeeded for the chosen method.
  • At least one SELECT or metadata listing succeeded on the target scope (not a different account or sandbox by mistake).

If any step fails: stop. Document the error (permission denied, wrong project, MFA required). Ask the user to fix IAM/roles, network, or connection profile. Do not fabricate row counts or table lists.
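The reachability checks above can be kept consistent across engagements with a small helper. This is a minimal sketch, not a client standard: the function name and the scope-dict keys are illustrative, and the returned statements are meant to be pasted into the connection already verified in this gate.

```python
# Sketch: build the minimal metadata/read checks for the confirmed platform.
# Scope keys ("database", "schema", "project", "dataset") are illustrative.

def reachability_checks(platform: str, scope: dict) -> list[str]:
    """Return statements that prove the identity can see the target scope."""
    if platform == "snowflake":
        db, schema = scope["database"], scope["schema"]
        return [
            f"SHOW SCHEMAS IN DATABASE {db}",
            f"SHOW TABLES IN SCHEMA {db}.{schema}",
        ]
    if platform == "bigquery":
        project, dataset = scope["project"], scope["dataset"]
        return [
            f"SELECT table_name FROM `{project}.{dataset}`.INFORMATION_SCHEMA.TABLES",
        ]
    # "Other" platforms follow the client runbook; refuse rather than guess.
    raise ValueError(f"No runbook for platform: {platform}")

checks = reachability_checks("snowflake", {"database": "RAW", "schema": "VENDOR_SHARE"})
print(checks)
```

If a check fails, record the exact error in working notes instead of rewriting the scope until something succeeds.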


Standard document structure

Use this order unless the engagement template forbids it:

  1. Title — {Client} — Data source discovery memo (or engagement-specific title; avoid “Part 1 / Part 2” unless the client asked for a split).
  2. Metadata block — Audience, source (share + access path), date. If the memo mixes time periods (for example December narrative plus April profiling), state that explicitly on one or two lines.
  3. §1 Executive summary — Short subsections such as: What this is, Why this matters now, Key findings, What you’ll find in this memo. Keep tight; no table dumps here.
  4. §2 Relationship & access — How data arrives (for example Private Share), implications for ETL, continuity risks if the vendor relationship changes.
  5. §3 Table-level documentation — Group by source domain (for example Source: Walmart, Source: Geodis 3PL, Source: LMNT ERP). Within each domain, numbered subsections (3.1, 3.2, …) one per table.
  6. §3 subsection pattern (repeat)
    • ### Summary metrics — small markdown table: row counts, distinct keys, date ranges, grain. Label query snapshot date where counts are time-sensitive.
    • ### Business description — what the object represents in business terms.
    • ### Interpretation — bullets: grain warnings, join keys, known duplication (for example snapshot-stacked history).
    • ### Questions — business questions this table helps answer.
  7. §4 Cross-table insights — Synthesis across domains (distribution, channel mix, etc.) only using tables already profiled.
  8. §5+ Deep dives / comparisons — Optional, engagement-specific.
  9. §6 Recommendations / next steps — Action-oriented, minimal hype.
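The repeating §3 subsection pattern lends itself to generation from profiling output. A minimal sketch, assuming a flat metrics dict; the metric names shown are illustrative, not required fields:

```python
# Sketch: render the "### Summary metrics" block for one §3 subsection.
# The snapshot date is stamped into the header row so time-sensitive counts
# stay labeled, per the §3 subsection pattern.

def summary_metrics_block(number: str, table: str, metrics: dict, queried: str) -> str:
    lines = [
        f"### {number} {table}",
        "",
        f"| Metric | Value (queried {queried}) |",
        "| --- | --- |",
    ]
    for name, value in metrics.items():
        lines.append(f"| {name} | {value} |")
    return "\n".join(lines)

block = summary_metrics_block(
    "3.1", "orders",
    {"Row count": "1,204,311 (exact)", "Grain": "order line"},
    "2026-04-10",
)
print(block)
```

Business description, Interpretation, and Questions stay hand-written; only the metrics table is worth templating.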

Level of detail

  • Executive summary: outcomes and scale, not every metric duplicated from §3.
  • §3: enough metrics to justify interpretations; avoid reproducing full INFORMATION_SCHEMA column lists.
  • Interpretation: call out multi-retailer, snapshot vs fact, dedupe rules when row counts are huge.
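Snapshot-stacked history is the most common reason row counts look inflated, so the Interpretation bullet should name the dedupe rule. A toy sketch of the keep-latest rule; the column names (order_id, snapshot_date) are illustrative:

```python
# Sketch: three snapshot rows describe only two orders. Raw COUNT(*)
# overstates the entity count; keep the latest snapshot per key to dedupe.

rows = [
    {"order_id": 1, "snapshot_date": "2026-04-01", "status": "open"},
    {"order_id": 1, "snapshot_date": "2026-04-08", "status": "shipped"},
    {"order_id": 2, "snapshot_date": "2026-04-08", "status": "open"},
]

latest = {}
for r in rows:
    key = r["order_id"]
    if key not in latest or r["snapshot_date"] > latest[key]["snapshot_date"]:
        latest[key] = r

print(len(rows), len(latest))  # 3 2
```

In the memo, state both numbers: raw rows and deduped entities.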

Workflow

Step 1 — Warehouse platform and authentication gate

  • Complete Warehouse platform and authentication gate above: confirm Snowflake vs BigQuery (or other), auth method, verified login/list/read, and correct database/schema or project/dataset.

Step 2 — Freeze scope and numbering

  • Build a table list with stable §3 numbers. If net-new tables arrive later, append 3.12, 3.13, … (do not renumber existing subsections in a published memo without an explicit versioning decision).
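The append-only rule can be mechanized so late-arriving tables never trigger a renumber. A sketch under the assumption that subsection numbers are simple "3.N" strings:

```python
# Sketch: pick the next §3 subsection number from the frozen list.
# Existing numbers are never changed; new tables only append.

def next_subsection(existing: list[str], section: int = 3) -> str:
    minors = [int(n.split(".")[1]) for n in existing if n.startswith(f"{section}.")]
    return f"{section}.{max(minors, default=0) + 1}"

print(next_subsection(["3.1", "3.2", "3.11"]))  # 3.12
```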

Step 3 — Validate in the warehouse

  • Confirm schema exists; sample row counts, distinct keys, and date columns using the approved and tested connection from Step 1.
  • Prefer one snapshot date in the memo for all new profiling (for example “queried YYYY-MM-DD”) to avoid mixed stale metrics.
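One profiling query per table keeps all §3 metrics on a single snapshot. A hedged sketch: identifiers are illustrative, and the generated SQL should run on the connection verified in Step 1, not a new one.

```python
# Sketch: build the standard profiling query (row count, distinct key,
# date span) for one in-scope table.

def profile_sql(table: str, key: str, date_col: str) -> str:
    return (
        f"SELECT COUNT(*) AS row_count, "
        f"COUNT(DISTINCT {key}) AS distinct_{key}, "
        f"MIN({date_col}) AS min_{date_col}, "
        f"MAX({date_col}) AS max_{date_col} "
        f"FROM {table}"
    )

print(profile_sql("RAW.VENDOR_SHARE.ORDERS", "order_id", "order_date"))
```

Record the run date alongside the results so the memo can say “queried YYYY-MM-DD” in every §3 block.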

Step 4 — Cross-check with internal audit notes

  • If knowledge/clients/{client}/resources/*SCHEMA_AUDIT*.md (or similar) exists, reconcile table names and domains; fix the audit doc or the memo if they disagree—do not ship contradictions.

Step 5 — Draft the narrative first, then metrics

  • Write Business description and Interpretation before polishing numbers; numbers should support the story, not replace it.

Step 6 — Executive summary last (or tighten last)

  • After §3 is stable, write Key findings and What you’ll find as faithful previews of §3–§4, not generic AI summaries.

Step 7 — Self-review with guardrails

  • Run the Agent guardrails (anti-fluff) checklist below before sending.

Agent guardrails (executive style, human tone)

Do

  • Prefer short sentences, neutral voice, and specific numbers with dates.
  • Use markdown tables for metric bundles; use bullets for interpretations and questions.
  • Name grain explicitly (order line vs invoice vs snapshot history).
  • Flag caveats (estimate vs exact count, repeated snapshots inflating rows).
  • Match the client’s vocabulary when known (program names, retailer names).

Do not

  • Use filler openers (“In today’s data landscape…”, “It is important to note…”, “This comprehensive overview…”).
  • Stack synonyms for importance (critical, robust, powerful, holistic) without adding fact.
  • Add generic value claims (“drives actionable insights”) without tying to a decision or metric.
  • Use em dashes as a crutch for long, meandering clauses—prefer periods.
  • Invent row counts, schemas, or join keys; if unknown, write TBD and list what query will answer it.
  • Rewrite the whole memo in a “friendlier” tone when the ask was factual discovery.

Quality check

  • Would a director skim §1 in two minutes and know what landed and why it matters?
  • Could an analyst use §3 to find grain and caveats without reading marketing language?

Failure modes / gotchas

  • Skipped auth gate—memo cites metrics from a warehouse the author never reached; verify Step 1 before profiling.
  • Wrong platform or project—Snowflake account vs BigQuery project confused; confirm in writing with the user.
  • Profiled the wrong environment (dev vs prod schema)—confirm database/schema in the memo header.
  • Huge row counts without interpretation—readers assume duplication or bugs; explain snapshot stacking or late-arriving rows.
  • Renumbering §3 after release—breaks links and audit trails; append new subsections instead.
  • Mixing vendor PDF language with warehouse facts—label each clearly.

Example implementation

  • knowledge/clients/lmnt/resources/EMERSON_DATA_SOURCE_DISCOVERY_MEMO.md — full-structure discovery memo with multiple source domains and appended net-new §3.xx blocks.

  • knowledge/standards/02-writing/data-source-memo-executive-style-playbook.md — one-page executive memo (lighter than this playbook).
  • knowledge/standards/02-writing/data-source-discovery-memo-update-playbook.md — additive updates to an existing discovery memo.
  • knowledge/standards/02-writing/PLAYBOOK_SCAFFOLD.md
  • Human tone patterns: .cursor/skills/humanizer/SKILL.md (optional pass for prose-heavy sections—not a substitute for factual discipline).