Playbook: Data source discovery memo (new share or major new coverage)

Version: 0.2
Last updated: 2026-04-10
Audience: LLM agents and humans drafting a leadership-ready discovery memo for a new share or schema in Snowflake, BigQuery, or another agreed warehouse—plus a large net-new set of tables to profile.


Purpose

Produce a single, linear document that explains what landed in the warehouse, why it matters, and how to use it—at a table-aware level of detail (not a one-pager). Standardize structure, depth, and tone so outputs read like a concise executive memo written by a human, not a generic AI overview.


When to use

Use this playbook when:

  • a new private share or new schema appears for a client and leadership needs a written baseline
  • many new tables (for example dozens) need to be introduced in one memo with consistent subsections
  • the ask is discovery / profiling / interpretation, not a full dimensional model or dbt project design

Do not use this playbook when:

  • the user only needs a one-page overview (use data-source-memo-executive-style-playbook.md instead)
  • the task is a technical schema audit with full column dictionaries and no narrative (use a schema audit doc pattern; link from the memo instead)

Service line / subservice

| Field | Value |
| --- | --- |
| Service line | Data Platform |
| Primary subservice | Analytics and BI (memo narrative); Data Infrastructure (warehouse validation) |

Approval-before-execution pipeline

  • Drafting: No approval required to draft in knowledge/clients/{client}/resources/ (or the client repo if that is where deliverables live).
  • Publishing externally: Get owner sign-off on facts, figures, and any client-facing framing.
  • Source systems of record: Do not edit the client’s or vendor’s canonical Google Doc / wiki unless explicitly asked; create a new file or doc for Brainforge-led memos when that is the engagement rule.

Scope

In scope

  • Standard memo skeleton (sections 1–7 style: executive summary → relationship to data → table documentation → cross-table → deep dive → recommendations → appendix-style notes as needed)
  • Per-table blocks for each profiled object: metrics table, business description, interpretation, questions
  • Warehouse validation steps on the confirmed platform (Snowflake, BigQuery, etc.) to ground counts and grains
  • Executive style rules and agent guardrails (anti-fluff)

Out of scope

  • Building dbt models, dashboards, or row-level QA fixes
  • Replacing a data contract or legal agreement

Prerequisites

| Requirement | Notes |
| --- | --- |
| Platform and location confirmed | Ask explicitly: Snowflake vs BigQuery (or other). Record account / region / project, database or dataset, and schema (or equivalent) for discovery. |
| Auth method agreed | User or engagement owner chooses how to authenticate (see Warehouse platform and authentication gate below). Never assume credentials. |
| Access verified | Read access proven with a minimal query before drafting metrics—see gate section. |
| Scope list | Which tables are in scope; avoid profiling tables nobody asked for. |
| Snapshot discipline | Record the query date and whether counts are estimates (row estimates vs exact counts). |

Inputs

| Input | Example | Required |
| --- | --- | --- |
| Client / program name | LMNT | Yes |
| Warehouse platform | Snowflake / BigQuery / other | Yes |
| Discovery scope | Snowflake: database + schema (and share name if applicable); BigQuery: project + dataset | Yes |
| Prior art | Vault schema audit markdown, vendor deck, stakeholder call notes | As available |
| Audience line | Leadership, Analytics, Engineering | Yes |

Warehouse platform and authentication gate (do this first)

Do not draft §3 metrics or claim live row counts until this section is satisfied.

1) Ask where the data lives

Ask the user (or read the ticket) explicitly:

  • Which system: Snowflake, BigQuery, or something else?
  • Exact scope for discovery: database + schema (Snowflake), or project + dataset (BigQuery), including environment (prod vs dev) if both exist.

Write the answers into working notes (ticket or draft memo metadata). Do not guess from filenames alone.

2) Confirm how to authenticate (user’s choice)

The user or engagement owner decides the method—confirm in chat or ticket before running tools:

| Platform | Common options (examples—not an exhaustive list) |
| --- | --- |
| Snowflake | Snow CLI (snow connection test / snow sql -c <connectionName>), browser SSO through the CLI, key-pair user, service / automation user per client policy |
| BigQuery | gcloud auth application-default login, service account JSON (path or 1Password reference), workload identity—per client policy |
| Other | Follow the client runbook; if none exists, ask. |

Rules

  • Do not paste secrets, private keys, or full JSON key contents into the memo or chat logs.
  • Prefer 1Password CLI or env vars per knowledge/standards/03-knowledge/engineering/setup/1password-cli-setup.md when the engagement stores credentials there.

3) Verify auth and reachability

Run a minimal read that proves the identity can see the target objects, for example:

  • Snowflake: SHOW DATABASES / SHOW SCHEMAS IN DATABASE … / SHOW TABLES IN SCHEMA …, or SELECT 1 from one known table; use the same connection the profiling will use.
  • BigQuery: INFORMATION_SCHEMA.TABLES for the dataset, or SELECT 1 from an allowed table in bq or client-approved SQL workspace.

If the user cannot run commands locally, they must confirm access another way (screenshot of successful query, or another engineer attests)—record that confirmation rather than inventing access.

4) Confirm access to the discovery database/schema (and data)

Checklist before moving to memo drafting:

  • Platform (Snowflake vs BigQuery vs other) is written down.
  • Database/schema or project/dataset matches what the memo will describe.
  • Authentication succeeded for the chosen method.
  • At least one SELECT or metadata listing succeeded on the target scope (not a different account or sandbox by mistake).

If any step fails: stop. Document the error (permission denied, wrong project, MFA required). Ask the user to fix IAM/roles, network, or connection profile. Do not fabricate row counts or table lists.
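The reachability checks above can be kept consistent across engagements with a small helper. This is a minimal sketch, not a client standard: the function name and the scope-dict keys are illustrative, and the returned statements are meant to be pasted into the connection already verified in this gate.

```python
# Sketch: build the minimal metadata/read checks for the confirmed platform.
# Scope keys ("database", "schema", "project", "dataset") are illustrative.

def reachability_checks(platform: str, scope: dict) -> list[str]:
    """Return statements that prove the identity can see the target scope."""
    if platform == "snowflake":
        db, schema = scope["database"], scope["schema"]
        return [
            f"SHOW SCHEMAS IN DATABASE {db}",
            f"SHOW TABLES IN SCHEMA {db}.{schema}",
        ]
    if platform == "bigquery":
        project, dataset = scope["project"], scope["dataset"]
        return [
            f"SELECT table_name FROM `{project}.{dataset}`.INFORMATION_SCHEMA.TABLES",
        ]
    # "Other" platforms follow the client runbook; refuse rather than guess.
    raise ValueError(f"No runbook for platform: {platform}")

checks = reachability_checks("snowflake", {"database": "RAW", "schema": "VENDOR_SHARE"})
print(checks)
```

If a check fails, record the exact error in working notes instead of rewriting the scope until something succeeds.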


Standard document structure

Use this order unless the engagement template forbids it:

  1. Title — {Client} — Data source discovery memo (or engagement-specific title; avoid “Part 1 / Part 2” unless the client asked for a split).
  2. Metadata block — Audience, source (share + access path), date. If the memo mixes time periods (for example December narrative plus April profiling), state that explicitly on one or two lines.
  3. §1 Executive summary — Short subsections such as: What this is, Why this matters now, Key findings, What you’ll find in this memo. Keep tight; no table dumps here.
  4. §2 Relationship & access — How data arrives (for example Private Share), implications for ETL, continuity risks if the vendor relationship changes.
  5. §3 Table-level documentation — Group by source domain (for example Source: Walmart, Source: Geodis 3PL, Source: LMNT ERP). Within each domain, numbered subsections (3.1, 3.2, …) one per table.
  6. §3 subsection pattern (repeat)
    • ### Summary metrics — small markdown table: row counts, distinct keys, date ranges, grain. Label query snapshot date where counts are time-sensitive.
    • ### Business description — what the object represents in business terms.
    • ### Interpretation — bullets: grain warnings, join keys, known duplication (for example snapshot-stacked history).
    • ### Questions — business questions this table helps answer.
  7. §4 Cross-table insights — Synthesis across domains (distribution, channel mix, etc.) only using tables already profiled.
  8. §5+ Deep dives / comparisons — Optional, engagement-specific.
  9. §6 Recommendations / next steps — Action-oriented, minimal hype.
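The repeating §3 subsection pattern lends itself to generation from profiling output. A minimal sketch, assuming a flat metrics dict; the metric names shown are illustrative, not required fields:

```python
# Sketch: render the "### Summary metrics" block for one §3 subsection.
# The snapshot date is stamped into the header row so time-sensitive counts
# stay labeled, per the §3 subsection pattern.

def summary_metrics_block(number: str, table: str, metrics: dict, queried: str) -> str:
    lines = [
        f"### {number} {table}",
        "",
        f"| Metric | Value (queried {queried}) |",
        "| --- | --- |",
    ]
    for name, value in metrics.items():
        lines.append(f"| {name} | {value} |")
    return "\n".join(lines)

block = summary_metrics_block(
    "3.1", "orders",
    {"Row count": "1,204,311 (exact)", "Grain": "order line"},
    "2026-04-10",
)
print(block)
```

Business description, Interpretation, and Questions stay hand-written; only the metrics table is worth templating.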

Level of detail

  • Executive summary: outcomes and scale, not every metric duplicated from §3.
  • §3: enough metrics to justify interpretations; avoid reproducing full INFORMATION_SCHEMA column lists.
  • Interpretation: call out multi-retailer, snapshot vs fact, dedupe rules when row counts are huge.
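Snapshot-stacked history is the most common reason row counts look inflated, so the Interpretation bullet should name the dedupe rule. A toy sketch of the keep-latest rule; the column names (order_id, snapshot_date) are illustrative:

```python
# Sketch: three snapshot rows describe only two orders. Raw COUNT(*)
# overstates the entity count; keep the latest snapshot per key to dedupe.

rows = [
    {"order_id": 1, "snapshot_date": "2026-04-01", "status": "open"},
    {"order_id": 1, "snapshot_date": "2026-04-08", "status": "shipped"},
    {"order_id": 2, "snapshot_date": "2026-04-08", "status": "open"},
]

latest = {}
for r in rows:
    key = r["order_id"]
    if key not in latest or r["snapshot_date"] > latest[key]["snapshot_date"]:
        latest[key] = r

print(len(rows), len(latest))  # 3 2
```

In the memo, state both numbers: raw rows and deduped entities.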

Workflow

Step 1 — Warehouse platform and authentication gate

  • Complete Warehouse platform and authentication gate above: confirm Snowflake vs BigQuery (or other), auth method, verified login/list/read, and correct database/schema or project/dataset.

Step 2 — Freeze scope and numbering

  • Build a table list with stable §3 numbers. If net-new tables arrive later, append 3.12, 3.13, … (do not renumber existing subsections in a published memo without an explicit versioning decision).
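The append-only rule can be mechanized so late-arriving tables never trigger a renumber. A sketch under the assumption that subsection numbers are simple "3.N" strings:

```python
# Sketch: pick the next §3 subsection number from the frozen list.
# Existing numbers are never changed; new tables only append.

def next_subsection(existing: list[str], section: int = 3) -> str:
    minors = [int(n.split(".")[1]) for n in existing if n.startswith(f"{section}.")]
    return f"{section}.{max(minors, default=0) + 1}"

print(next_subsection(["3.1", "3.2", "3.11"]))  # 3.12
```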

Step 3 — Validate in the warehouse

  • Confirm schema exists; sample row counts, distinct keys, and date columns using the approved and tested connection from Step 1.
  • Prefer one snapshot date in the memo for all new profiling (for example “queried YYYY-MM-DD”) to avoid mixed stale metrics.
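One profiling query per table keeps all §3 metrics on a single snapshot. A hedged sketch: identifiers are illustrative, and the generated SQL should run on the connection verified in Step 1, not a new one.

```python
# Sketch: build the standard profiling query (row count, distinct key,
# date span) for one in-scope table.

def profile_sql(table: str, key: str, date_col: str) -> str:
    return (
        f"SELECT COUNT(*) AS row_count, "
        f"COUNT(DISTINCT {key}) AS distinct_{key}, "
        f"MIN({date_col}) AS min_{date_col}, "
        f"MAX({date_col}) AS max_{date_col} "
        f"FROM {table}"
    )

print(profile_sql("RAW.VENDOR_SHARE.ORDERS", "order_id", "order_date"))
```

Record the run date alongside the results so the memo can say “queried YYYY-MM-DD” in every §3 block.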

Step 4 — Cross-check with internal audit notes

  • If knowledge/clients/{client}/resources/*SCHEMA_AUDIT*.md (or similar) exists, reconcile table names and domains; fix the audit doc or the memo if they disagree—do not ship contradictions.

Step 5 — Draft the narrative first, then metrics

  • Write Business description and Interpretation before polishing numbers; numbers should support the story, not replace it.

Step 6 — Executive summary last (or tighten last)

  • After §3 is stable, write Key findings and What you’ll find as faithful previews of §3–§4, not generic AI summaries.

Step 7 — Self-review with guardrails

  • Run the Agent guardrails (anti-fluff) checklist below before sending.

Agent guardrails (executive style, human tone)

Do

  • Prefer short sentences, neutral voice, and specific numbers with dates.
  • Use markdown tables for metric bundles; use bullets for interpretations and questions.
  • Name grain explicitly (order line vs invoice vs snapshot history).
  • Flag caveats (estimate vs exact count, repeated snapshots inflating rows).
  • Match the client’s vocabulary when known (program names, retailer names).

Do not

  • Use filler openers (“In today’s data landscape…”, “It is important to note…”, “This comprehensive overview…”).
  • Stack synonyms for importance (critical, robust, powerful, holistic) without adding fact.
  • Add generic value claims (“drives actionable insights”) without tying to a decision or metric.
  • Use em dashes as a crutch for long, meandering clauses—prefer periods.
  • Invent row counts, schemas, or join keys; if unknown, write TBD and list what query will answer it.
  • Rewrite the whole memo in a “friendlier” tone when the ask was factual discovery.

Quality check

  • Would a director skim §1 in two minutes and know what landed and why it matters?
  • Could an analyst use §3 to find grain and caveats without reading marketing language?

Failure modes / gotchas

  • Skipped auth gate—memo cites metrics from a warehouse the author never reached; verify Step 1 before profiling.
  • Wrong platform or project—Snowflake account vs BigQuery project confused; confirm in writing with the user.
  • Profiled the wrong environment (dev vs prod schema)—confirm database/schema in the memo header.
  • Huge row counts without interpretation—readers assume duplication or bugs; explain snapshot stacking or late-arriving rows.
  • Renumbering §3 after release—breaks links and audit trails; append new subsections instead.
  • Mixing vendor PDF language with warehouse facts—label each clearly.

Example implementation

  • knowledge/clients/lmnt/resources/EMERSON_DATA_SOURCE_DISCOVERY_MEMO.md — full-structure discovery memo with multiple source domains and appended net-new §3.xx blocks.

  • knowledge/standards/02-writing/data-source-memo-executive-style-playbook.md — one-page executive memo (lighter than this playbook).
  • knowledge/standards/02-writing/data-source-discovery-memo-update-playbook.md — additive updates to an existing discovery memo.
  • knowledge/standards/02-writing/PLAYBOOK_SCAFFOLD.md
  • Human tone patterns: .cursor/skills/humanizer/SKILL.md (optional pass for prose-heavy sections—not a substitute for factual discipline).