Playbook: Data source discovery memo (new share or major new coverage)
Version: 0.2
Last updated: 2026-04-10
Audience: LLM agents and humans drafting a leadership-ready discovery memo for a new share or schema in Snowflake, BigQuery, or another agreed warehouse—plus a large net-new set of tables to profile.
Purpose
Produce a single, linear document that explains what landed in the warehouse, why it matters, and how to use it—at a table-aware level of detail (not a one-pager). Standardize structure, depth, and tone so outputs read like a concise executive memo written by a human, not a generic AI overview.
When to use
Use this playbook when:
- a new private share or new schema appears for a client and leadership needs a written baseline
- many new tables (for example dozens) need to be introduced in one memo with consistent subsections
- the ask is discovery / profiling / interpretation, not a full dimensional model or dbt project design
Do not use this playbook when:
- the user only needs a one-page overview (use `data-source-memo-executive-style-playbook.md` instead)
- the task is a technical schema audit with full column dictionaries and no narrative (use a schema audit doc pattern; link from the memo instead)
Service line / subservice
| Field | Value |
|---|---|
| Service line | Data Platform |
| Primary subservice | Analytics and BI (memo narrative); Data Infrastructure (warehouse validation) |
Approval-before-execution pipeline
- Drafting: No approval required to draft in `knowledge/clients/{client}/resources/` (or the client repo if that is where deliverables live).
- Publishing externally: Get owner sign-off on facts, figures, and any client-facing framing.
- Source systems of record: Do not edit the client’s or vendor’s canonical Google Doc / wiki unless explicitly asked; create a new file or doc for Brainforge-led memos when that is the engagement rule.
Scope
In scope
- Standard memo skeleton (sections 1–7 style: executive summary → relationship to data → table documentation → cross-table → deep dive → recommendations → appendix-style notes as needed)
- Per-table blocks for each profiled object: metrics table, business description, interpretation, questions
- Warehouse validation steps on the confirmed platform (Snowflake, BigQuery, etc.) to ground counts and grains
- Executive style rules and agent guardrails (anti-fluff)
Out of scope
- Building dbt models, dashboards, or row-level QA fixes
- Replacing a data contract or legal agreement
Prerequisites
| Requirement | Notes |
|---|---|
| Platform and location confirmed | Ask explicitly: Snowflake vs BigQuery (or other). Record account / region / project, database or dataset, and schema (or equivalent) for discovery. |
| Auth method agreed | User or engagement owner chooses how to authenticate (see Warehouse platform and authentication gate below). Never assume credentials. |
| Access verified | Read access proven with a minimal query before drafting metrics—see gate section. |
| Scope list | Which tables are in scope; avoid profiling tables nobody asked for |
| Snapshot discipline | Record query date and whether counts are estimates (row estimates vs exact counts) |
Inputs
| Input | Example | Required |
|---|---|---|
| Client / program name | LMNT | Yes |
| Warehouse platform | Snowflake / BigQuery / other | Yes |
| Discovery scope | Snowflake: database + schema (and share name if applicable); BigQuery: project + dataset | Yes |
| Prior art | Vault schema audit markdown, vendor deck, stakeholder call notes | As available |
| Audience line | Leadership, Analytics, Engineering | Yes |
Warehouse platform and authentication gate (do this first)
Do not draft §3 metrics or claim live row counts until this section is satisfied.
1) Ask where the data lives
Ask the user (or read the ticket) explicitly:
- Which system: Snowflake, BigQuery, or something else?
- Exact scope for discovery: database + schema (Snowflake), or project + dataset (BigQuery), including environment (prod vs dev) if both exist.
Write the answers into working notes (ticket or draft memo metadata). Do not guess from filenames alone.
2) Confirm how to authenticate (user’s choice)
The user or engagement owner decides the method—confirm in chat or ticket before running tools:
| Platform | Common options (examples—not an exhaustive list) |
|---|---|
| Snowflake | Snow CLI (`snow connection test` / `snow sql -c <connectionName>`), browser SSO through the CLI, key-pair user, service / automation user per client policy |
| BigQuery | `gcloud auth application-default login`, service account JSON (path or 1Password reference), workload identity—per client policy |
| Other | Follow client runbook; if none exists, ask |
Rules
- Do not paste secrets, private keys, or full JSON key contents into the memo or chat logs.
- Prefer 1Password CLI or env vars per `knowledge/standards/03-knowledge/engineering/setup/1password-cli-setup.md` when the engagement stores credentials there.
3) Verify auth and reachability
Run a minimal read that proves the identity can see the target objects, for example:
- Snowflake: `SHOW DATABASES` / `SHOW SCHEMAS IN DATABASE …` / `SHOW TABLES IN SCHEMA …`, or `SELECT 1` from one known table; use the same connection the profiling will use.
- BigQuery: `INFORMATION_SCHEMA.TABLES` for the dataset, or `SELECT 1` from an allowed table in `bq` or a client-approved SQL workspace.
If the user cannot run commands locally, they must confirm access another way (screenshot of successful query, or another engineer attests)—record that confirmation rather than inventing access.
4) Confirm access to the discovery database/schema (and data)
Checklist before moving to memo drafting:
- Platform (Snowflake vs BigQuery vs other) is written down.
- Database/schema or project/dataset matches what the memo will describe.
- Authentication succeeded for the chosen method.
- At least one `SELECT` or metadata listing succeeded on the target scope (not a different account or sandbox by mistake).
If any step fails: stop. Document the error (permission denied, wrong project, MFA required). Ask the user to fix IAM/roles, network, or connection profile. Do not fabricate row counts or table lists.
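The gate checklist above can be sketched as a simple pre-flight check. This is an illustrative helper, not a required tool; the field names are assumptions introduced here, not part of any real API.

```python
# Hypothetical pre-flight check mirroring the gate checklist above.
# Field names are illustrative, not a real API.
GATE_FIELDS = [
    "platform_confirmed",   # Snowflake vs BigQuery vs other, written down
    "scope_matches_memo",   # database/schema or project/dataset recorded
    "auth_succeeded",       # chosen authentication method verified
    "read_verified",        # one SELECT or metadata listing on the target scope
]

def gate_passed(checklist: dict) -> tuple:
    """Return (ok, missing) — missing lists the unmet gate items."""
    missing = [f for f in GATE_FIELDS if not checklist.get(f, False)]
    return (not missing, missing)

ok, missing = gate_passed({
    "platform_confirmed": True,
    "scope_matches_memo": True,
    "auth_succeeded": True,
    "read_verified": False,  # e.g. SELECT failed with permission denied
})
# ok is False and missing == ["read_verified"]: stop and document the error.
```

If any field is unmet, the answer is the same as the prose rule: stop, record the error, and ask the user to fix access before profiling.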
Standard document structure
Use this order unless the engagement template forbids it:
- Title — `{Client} — Data source discovery memo` (or engagement-specific title; avoid “Part 1 / Part 2” unless the client asked for a split).
- Metadata block — Audience, source (share + access path), date. If the memo mixes time periods (for example December narrative plus April profiling), state that explicitly on one or two lines.
- §1 Executive summary — Short subsections such as: What this is, Why this matters now, Key findings, What you’ll find in this memo. Keep tight; no table dumps here.
- §2 Relationship & access — How data arrives (for example Private Share), implications for ETL, continuity risks if the vendor relationship changes.
- §3 Table-level documentation — Group by source domain (for example `Source: Walmart`, `Source: Geodis 3PL`, `Source: LMNT ERP`). Within each domain, numbered subsections (`3.1`, `3.2`, …), one per table.
- §3 subsection pattern (repeat per table)
  - `### Summary metrics` — small markdown table: row counts, distinct keys, date ranges, grain. Label the query snapshot date where counts are time-sensitive.
  - `### Business description` — what the object represents in business terms.
  - `### Interpretation` — bullets: grain warnings, join keys, known duplication (for example snapshot-stacked history).
  - `### Questions` — business questions this table helps answer.
- §4 Cross-table insights — Synthesis across domains (distribution, channel mix, etc.) only using tables already profiled.
- §5+ Deep dives / comparisons — Optional, engagement-specific.
- §6 Recommendations / next steps — Action-oriented, minimal hype.
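The per-table §3 pattern above is mechanical enough to template. A minimal sketch, assuming the metrics arrive as a dict — the function name and signature are invented here for illustration:

```python
def render_table_block(num, name, metrics, description, interpretation, questions):
    """Render one §3.x per-table block in the standard pattern
    (illustrative helper, not a required tool)."""
    lines = [f"## 3.{num} {name}", "", "### Summary metrics", "",
             "| Metric | Value |", "|---|---|"]
    lines += [f"| {k} | {v} |" for k, v in metrics.items()]
    lines += ["", "### Business description", "", description,
              "", "### Interpretation", ""]
    lines += [f"- {b}" for b in interpretation]
    lines += ["", "### Questions", ""]
    lines += [f"- {q}" for q in questions]
    return "\n".join(lines)

block = render_table_block(
    1, "ORDERS",
    metrics={"Row count": "1.2M (queried 2026-04-10)", "Grain": "order line"},
    description="One row per order line shipped to the retailer.",
    interpretation=["Grain: order line, not invoice."],
    questions=["What is weekly order volume by retailer?"],
)
```

Whether you template or hand-write, the point is consistency: every table gets the same four headings in the same order.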
Level of detail
- Executive summary: outcomes and scale, not every metric duplicated from §3.
- §3: enough metrics to justify interpretations; avoid reproducing full `INFORMATION_SCHEMA` column lists.
- Interpretation: call out multi-retailer, snapshot vs fact, dedupe rules when row counts are huge.
Workflow
Step 1 — Warehouse platform and authentication gate
- Complete Warehouse platform and authentication gate above: confirm Snowflake vs BigQuery (or other), auth method, verified login/list/read, and correct database/schema or project/dataset.
Step 2 — Freeze scope and numbering
- Build a table list with stable §3 numbers. If net-new tables arrive later, append `3.12`, `3.13`, … (do not renumber existing subsections in a published memo without an explicit versioning decision).
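The append-only numbering rule can be sketched in a few lines — a hedged illustration; the helper name is invented here:

```python
import re

def next_subsection_numbers(existing, count):
    """Given existing §3.x labels, return the next `count` labels
    without renumbering anything already published."""
    used = [int(m.group(1)) for s in existing
            if (m := re.fullmatch(r"3\.(\d+)", s))]
    start = max(used, default=0) + 1
    return [f"3.{start + i}" for i in range(count)]

next_subsection_numbers(["3.1", "3.2", "3.11"], 2)  # → ["3.12", "3.13"]
```

Gaps in the existing sequence are fine; the rule only guarantees that published numbers never change.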
Step 3 — Validate in the warehouse
- Confirm schema exists; sample row counts, distinct keys, and date columns using the approved and tested connection from Step 1.
- Prefer one snapshot date in the memo for all new profiling (for example “queried YYYY-MM-DD”) to avoid mixed stale metrics.
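The profiling query Step 3 describes follows one shape per table. A minimal sketch that builds it as a string — table, key, and date column names are placeholders, and the SQL may need dialect tweaks per platform:

```python
def profiling_sql(table, key_col, date_col, snapshot_date):
    """Build one profiling query covering row count, distinct keys,
    and date range. Identifiers are placeholders; adjust per dialect."""
    return (
        f"-- queried {snapshot_date}\n"
        f"SELECT COUNT(*) AS row_count,\n"
        f"       COUNT(DISTINCT {key_col}) AS distinct_keys,\n"
        f"       MIN({date_col}) AS min_date,\n"
        f"       MAX({date_col}) AS max_date\n"
        f"FROM {table};"
    )

sql = profiling_sql("ANALYTICS.SALES.ORDERS", "order_id", "order_date",
                    "2026-04-10")
```

Stamping the snapshot date into the query comment keeps the memo and the evidence on the same date.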
Step 4 — Cross-check with internal audit notes
- If `knowledge/clients/{client}/resources/*SCHEMA_AUDIT*.md` (or similar) exists, reconcile table names and domains; fix the audit doc or the memo if they disagree—do not ship contradictions.
Step 5 — Draft the narrative first, then metrics
- Write Business description and Interpretation before polishing numbers; numbers should support the story, not replace it.
Step 6 — Executive summary last (or tighten last)
- After §3 is stable, write Key findings and What you’ll find as faithful previews of §3–§4, not generic AI summaries.
Step 7 — Self-review with guardrails
- Run the Agent guardrails (anti-fluff) checklist below before sending.
Agent guardrails (executive style, human tone)
Do
- Prefer short sentences, neutral voice, and specific numbers with dates.
- Use markdown tables for metric bundles; use bullets for interpretations and questions.
- Name grain explicitly (order line vs invoice vs snapshot history).
- Flag caveats (estimate vs exact count, repeated snapshots inflating rows).
- Match the client’s vocabulary when known (program names, retailer names).
Do not
- Use filler openers (“In today’s data landscape…”, “It is important to note…”, “This comprehensive overview…”).
- Stack synonyms for importance (critical, robust, powerful, holistic) without adding fact.
- Add generic value claims (“drives actionable insights”) without tying to a decision or metric.
- Use em dashes as a crutch for long, meandering clauses—prefer periods.
- Invent row counts, schemas, or join keys; if unknown, write TBD and list what query will answer it.
- Rewrite the whole memo in a “friendlier” tone when the ask was factual discovery.
Quality check
- Would a director skim §1 in two minutes and know what landed and why it matters?
- Could an analyst use §3 to find grain and caveats without reading marketing language?
Failure modes / gotchas
- Skipped auth gate—memo cites metrics from a warehouse the author never reached; verify Step 1 before profiling.
- Wrong platform or project—Snowflake account vs BigQuery project confused; confirm in writing with the user.
- Profiled the wrong environment (dev vs prod schema)—confirm database/schema in the memo header.
- Huge row counts without interpretation—readers assume duplication or bugs; explain snapshot stacking or late-arriving rows.
- Renumbering §3 after release—breaks links and audit trails; append new subsections instead.
- Mixing vendor PDF language with warehouse facts—label each clearly.
Example implementation
- `knowledge/clients/lmnt/resources/EMERSON_DATA_SOURCE_DISCOVERY_MEMO.md` — full-structure discovery memo with multiple source domains and appended net-new §3.xx blocks.
Related
- `knowledge/standards/02-writing/data-source-memo-executive-style-playbook.md` — one-page executive memo (lighter than this playbook).
- `knowledge/standards/02-writing/data-source-discovery-memo-update-playbook.md` — additive updates to an existing discovery memo.
- `knowledge/standards/02-writing/PLAYBOOK_SCAFFOLD.md`
- Human tone patterns: `.cursor/skills/humanizer/SKILL.md` (optional pass for prose-heavy sections—not a substitute for factual discipline).