# Marts Documentation Creation Workflow

## Purpose

Create comprehensive documentation for dbt marts by extracting metadata from Snowflake tables. This workflow generates structured CSV documentation that includes table definitions, source mappings, key metrics, dimensions, and operational details.
## Prerequisites

- Snowflake access with permissions to query `INFORMATION_SCHEMA`
- 1Password CLI installed and signed in (see `setup/1password-cli-setup.md`)
- Python 3.8+ with required packages: `pip install snowflake-connector-python pyyaml`
- Access to dbt project (for understanding source tables and model structure)
- Network access to Snowflake instance
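The 1Password CLI prerequisite can be wired directly into the connection step. The sketch below assumes `op read` with a secret reference; the account, user, and vault path are placeholders, and `get_snowflake_connection` matches the helper name the Step 6 script assumes:

```python
import subprocess


def op_ref(vault: str, item: str, field: str) -> str:
    """Build a 1Password secret reference, e.g. op://Engineering/Snowflake/password."""
    return f"op://{vault}/{item}/{field}"


def op_read(secret_ref: str) -> str:
    """Read a secret via the `op` CLI (requires an active 1Password session)."""
    result = subprocess.run(
        ["op", "read", secret_ref],
        capture_output=True, text=True, check=True,
    )
    return result.stdout.strip()


def get_snowflake_connection(account: str, user: str, secret_ref: str):
    """Open a Snowflake connection with a password pulled from 1Password."""
    import snowflake.connector  # requires snowflake-connector-python
    return snowflake.connector.connect(
        account=account,
        user=user,
        password=op_read(secret_ref),
        database="PROD_MARTS",
    )


# Example (account, user, and vault path are placeholders; use your own):
# conn = get_snowflake_connection(
#     "myorg-myaccount", "ANALYTICS_BOT",
#     op_ref("Engineering", "Snowflake", "password"),
# )
```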
## Workflow Overview

1. Query Snowflake for table metadata (columns, row counts, sizes)
2. Map tables to dbt source definitions
3. Extract key metrics and dimensions from table schemas
4. Generate CSV documentation with structured format
5. Review and validate output
## Step-by-Step Procedure

### 1. Query Snowflake for Mart Tables

Query `INFORMATION_SCHEMA` to get table metadata:

```sql
SELECT
    TABLE_SCHEMA,
    TABLE_NAME,
    ROW_COUNT,
    BYTES,
    LAST_ALTERED
FROM PROD_MARTS.INFORMATION_SCHEMA.TABLES
WHERE TABLE_SCHEMA = 'WHOLESALE_CUSTOMERS'
ORDER BY TABLE_NAME;
```

### 2. Get Column Details
For each table, query column information:

```sql
SELECT
    COLUMN_NAME,
    DATA_TYPE,
    IS_NULLABLE,
    COLUMN_DEFAULT
FROM PROD_MARTS.INFORMATION_SCHEMA.COLUMNS
WHERE TABLE_SCHEMA = 'WHOLESALE_CUSTOMERS'
  AND TABLE_NAME = '<table_name>'
ORDER BY ORDINAL_POSITION;
```

### 3. Map to dbt Sources
Review the dbt project to understand:

- Which source tables feed into each mart
- Intermediate models (`int_*`) used in transformations
- Business logic and transformations applied

Check `dbt_project/models/raw/sources.yml` for source definitions.
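The source mapping can be started programmatically by parsing `sources.yml` with PyYAML. This is a sketch; the sample YAML below is hypothetical but follows the standard dbt sources schema:

```python
import yaml


def list_source_tables(sources_yaml: str) -> list:
    """Return `source_name.table_name` entries from a dbt sources.yml document."""
    doc = yaml.safe_load(sources_yaml)
    tables = []
    for source in doc.get("sources", []):
        for table in source.get("tables", []):
            tables.append(f"{source['name']}.{table['name']}")
    return tables


# Hypothetical sources.yml content, for illustration only:
sample = """
version: 2
sources:
  - name: shopify
    database: RAW
    schema: SHOPIFY
    tables:
      - name: orders
      - name: customers
"""
print(list_source_tables(sample))  # ['shopify.orders', 'shopify.customers']
```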
### 4. Categorize Columns

For each table, identify:

- Key Metrics: Numeric measures (revenue, counts, averages)
- Key Dimensions: Descriptive attributes (customer_id, date, segment)
- Grain: Level of detail (one row per customer, per order, etc.)
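One way to sanity-check a suspected grain is to confirm the candidate key is unique in a sample of rows. The helper and sample rows below are illustrative, not part of the workflow's tooling; in practice the rows would come from a Snowflake `DictCursor`:

```python
from typing import Dict, List


def is_unique_key(rows: List[Dict], column: str) -> bool:
    """True if `column` has no duplicate values in the sampled rows.

    Uniqueness in a sample is evidence (not proof) that the table's
    grain is one row per value of this column.
    """
    values = [row[column] for row in rows]
    return len(values) == len(set(values))


# Illustrative sample rows:
sample_rows = [
    {"CUSTOMER_ID": 1, "SEGMENT": "retail"},
    {"CUSTOMER_ID": 2, "SEGMENT": "retail"},
    {"CUSTOMER_ID": 3, "SEGMENT": "wholesale"},
]
print(is_unique_key(sample_rows, "CUSTOMER_ID"))  # True -> plausible grain
print(is_unique_key(sample_rows, "SEGMENT"))      # False -> not the grain
```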
### 5. Create Documentation CSV

Generate a CSV with the following columns:

| Column | Description | Example |
|---|---|---|
| Marts | Table name | WHOLESALE_DIM_CUSTOMERS |
| Definition | Business purpose and description | Final wholesale customer dimension table... |
| Source Tables | dbt source/intermediate tables | int_shopify__wholesale_dim_customer, int_gs__wholesaler_detail |
| Key Metrics | Comma-separated metric names | TOTAL_REVENUE, TOTAL_ORDERS, AVERAGE_ORDER_VALUE |
| Key Dimensions | Comma-separated dimension names | CUSTOMER_ID, EMAIL, SEGMENT, CUSTOMER_STATE |
| Grain | Level of detail | One row per wholesale customer |
| Update Frequency | How often the table is refreshed | Table (or Daily, Hourly) |
| Schema | Full schema identifier | PROD_MARTS.WHOLESALE_CUSTOMERS |
| Related Marts | Other related marts | (optional) |
| Business Purpose | High-level business use case | Master dimension table for all wholesale customer analysis |
| Usage Notes | Additional context | Row count: 14,217, Size: 10.87 MB |
### 6. Python Script Example

```python
import csv
from typing import List

import snowflake.connector


def get_table_metadata(conn, schema: str) -> List[tuple]:
    """Query Snowflake for table metadata."""
    cursor = conn.cursor()
    query = """
        SELECT
            TABLE_NAME,
            ROW_COUNT,
            BYTES,
            LAST_ALTERED
        FROM PROD_MARTS.INFORMATION_SCHEMA.TABLES
        WHERE TABLE_SCHEMA = %s
        ORDER BY TABLE_NAME
    """
    cursor.execute(query, (schema,))  # bind parameters rather than f-strings
    return cursor.fetchall()


def get_table_columns(conn, schema: str, table: str) -> List[str]:
    """Get column names for a table, in ordinal order."""
    cursor = conn.cursor()
    query = """
        SELECT COLUMN_NAME
        FROM PROD_MARTS.INFORMATION_SCHEMA.COLUMNS
        WHERE TABLE_SCHEMA = %s
          AND TABLE_NAME = %s
        ORDER BY ORDINAL_POSITION
    """
    cursor.execute(query, (schema, table))
    return [row[0] for row in cursor.fetchall()]


def categorize_columns(columns: List[str]) -> tuple:
    """Categorize columns into metrics and dimensions."""
    # Simple heuristic: metric-sounding names are metrics
    metric_hints = ['COUNT', 'TOTAL', 'SUM', 'AVG', 'REVENUE', 'SPEND']
    metrics = [c for c in columns if any(x in c.upper() for x in metric_hints)]
    # Everything else is a dimension, minus housekeeping columns
    excluded = {'DBT_UPDATED_AT', '_POLYTOMIC_'}
    dimensions = [c for c in columns if c not in metrics and c not in excluded]
    return metrics, dimensions


# Main workflow (get_snowflake_connection is your own connection helper)
conn = get_snowflake_connection(...)
tables = get_table_metadata(conn, 'WHOLESALE_CUSTOMERS')

with open('marts_documentation.csv', 'w', newline='') as f:
    writer = csv.DictWriter(f, fieldnames=[
        'Marts', 'Definition', 'Source Tables', 'Key Metrics',
        'Key Dimensions', 'Grain', 'Update Frequency', 'Schema',
        'Related Marts', 'Business Purpose', 'Usage Notes',
    ])
    writer.writeheader()
    for table_name, row_count, bytes_size, last_altered in tables:
        columns = get_table_columns(conn, 'WHOLESALE_CUSTOMERS', table_name)
        metrics, dimensions = categorize_columns(columns)
        writer.writerow({
            'Marts': table_name,
            'Definition': '[Define based on dbt code and business context]',
            'Source Tables': '[Map from dbt sources.yml]',
            'Key Metrics': ', '.join(metrics),
            'Key Dimensions': ', '.join(dimensions),
            'Grain': '[Determine from table structure]',
            'Update Frequency': 'Table',
            'Schema': 'PROD_MARTS.WHOLESALE_CUSTOMERS',
            'Related Marts': '',
            'Business Purpose': '[Business use case]',
            'Usage Notes': f'Row count: {row_count:,}, Size: {bytes_size / 1024 / 1024:.2f} MB',
        })
```

### 7. Review and Enhance
- Review definitions: Add business context from dbt model code
- Verify source tables: Cross-reference with dbt `sources.yml` and model files
- Validate metrics/dimensions: Ensure categorization is accurate
- Add business purpose: Document how each mart is used
- Include usage notes: Add row counts, sizes, update frequencies
## Output Format
The CSV should be structured for easy consumption:
- Spreadsheet-friendly: Opens cleanly in Excel/Google Sheets
- Searchable: Column names enable filtering and sorting
- Complete: All relevant metadata included
- Validated: Data accuracy verified against Snowflake
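Part of that validation can be automated by flagging rows with blank required fields. A minimal sketch; the required-field set and sample CSV below are assumptions to adapt to your own standards:

```python
import csv
import io

REQUIRED = ["Marts", "Definition", "Grain", "Schema"]  # assumed minimum set


def find_incomplete_rows(csv_text: str, required=REQUIRED) -> list:
    """Return (row number, missing fields) for rows with blank required fields."""
    problems = []
    reader = csv.DictReader(io.StringIO(csv_text))
    for i, row in enumerate(reader, start=2):  # row 1 is the header
        missing = [f for f in required if not (row.get(f) or "").strip()]
        if missing:
            problems.append((i, missing))
    return problems


# Hypothetical two-row CSV, the second row missing its Definition:
sample = (
    "Marts,Definition,Grain,Schema\n"
    "WHOLESALE_DIM_CUSTOMERS,Customer dimension,One row per customer,PROD_MARTS.WHOLESALE_CUSTOMERS\n"
    "WHOLESALE_FCT_ORDERS,,One row per order,PROD_MARTS.WHOLESALE_CUSTOMERS\n"
)
print(find_incomplete_rows(sample))  # [(3, ['Definition'])]
```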
## Best Practices
- Query actual Snowflake metadata - Don’t rely on assumptions
- Review dbt code - Understand transformations and business logic
- Categorize carefully - Metrics vs dimensions affects downstream usage
- Document grain clearly - Critical for understanding table usage
- Include operational details - Size, row counts help with performance planning
- Keep definitions concise - But include enough context for understanding
## Common Patterns

### Dimension Tables
- Grain: One row per entity (customer, product, etc.)
- Key Dimensions: ID, name, attributes
- Key Metrics: Aggregated measures per entity
### Fact Tables
- Grain: One row per event (order, transaction, etc.)
- Key Dimensions: Foreign keys, date, status
- Key Metrics: Amounts, counts, quantities
### Summary Tables
- Grain: Aggregated level (daily, weekly, by segment)
- Key Dimensions: Grouping attributes
- Key Metrics: Aggregated measures
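If your marts follow common naming conventions (`DIM_`, `FCT_`, summary suffixes; this is an assumption about your environment, adjust the hints to match yours), a first-pass classifier can pre-fill the pattern for each table:

```python
def classify_mart(table_name: str) -> str:
    """Guess a mart's pattern from assumed naming conventions."""
    name = table_name.upper()
    if "_DIM_" in name or name.startswith("DIM_"):
        return "dimension"
    if "_FCT_" in name or name.startswith("FCT_") or "_FACT_" in name:
        return "fact"
    if any(hint in name for hint in ("_DAILY", "_WEEKLY", "_SUMMARY", "_AGG")):
        return "summary"
    return "unknown"  # fall back to manual review


print(classify_mart("WHOLESALE_DIM_CUSTOMERS"))  # dimension
print(classify_mart("WHOLESALE_FCT_ORDERS"))     # fact
print(classify_mart("ORDERS_DAILY_SUMMARY"))     # summary
```

Treat the result as a starting point only; the grain and metric/dimension review in steps 4 and 7 remain the source of truth.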
## Troubleshooting

### “No tables found”

- Verify schema name is correct (case-sensitive)
- Check that your role has access to the schema
- Query `SHOW SCHEMAS IN DATABASE PROD_MARTS;` to list visible schemas

### “Cannot determine grain”

- Review table structure and sample data
- Check dbt model code for comments
- Query `SELECT * FROM <table_name> LIMIT 10` to understand structure

### “Source tables unclear”

- Review dbt `sources.yml` for source definitions
- Trace `source()` → `ref()` → mart dependencies
- Check intermediate model files (`int_*`)
## Related Documentation

- Snowflake Table Profiling: See `snowflake-table-profiling.md`
- Metrics Dictionary Creation: See `metrics-dictionary-creation.md`
- 1Password CLI Setup: See `setup/1password-cli-setup.md`