Marts Documentation Creation Workflow

Purpose

Create comprehensive documentation for dbt marts by extracting metadata from Snowflake tables. This workflow generates structured CSV documentation that includes table definitions, source mappings, key metrics, dimensions, and operational details.

Prerequisites

  • Snowflake access with permissions to query INFORMATION_SCHEMA
  • 1Password CLI installed and signed in (see setup/1password-cli-setup.md)
  • Python 3.8+ with required packages:
    pip install snowflake-connector-python pyyaml
  • Access to dbt project (for understanding source tables and model structure)
  • Network access to Snowflake instance

Workflow Overview

  1. Query Snowflake for table metadata (columns, row counts, sizes)
  2. Map tables to dbt source definitions
  3. Extract key metrics and dimensions from table schemas
  4. Generate CSV documentation with structured format
  5. Review and validate output

Step-by-Step Procedure

1. Query Snowflake for Mart Tables

Query INFORMATION_SCHEMA to get table metadata:

SELECT 
    TABLE_SCHEMA,
    TABLE_NAME,
    ROW_COUNT,
    BYTES,
    LAST_ALTERED
FROM PROD_MARTS.INFORMATION_SCHEMA.TABLES
WHERE TABLE_SCHEMA = 'WHOLESALE_CUSTOMERS'
ORDER BY TABLE_NAME;

2. Get Column Details

For each table, query column information:

SELECT 
    COLUMN_NAME,
    DATA_TYPE,
    IS_NULLABLE,
    COLUMN_DEFAULT
FROM PROD_MARTS.INFORMATION_SCHEMA.COLUMNS
WHERE TABLE_SCHEMA = 'WHOLESALE_CUSTOMERS'
  AND TABLE_NAME = '<table_name>'
ORDER BY ORDINAL_POSITION;

3. Map to dbt Sources

Review dbt project to understand:

  • Which source tables feed into each mart
  • Intermediate models (int_*) used in transformations
  • Business logic and transformations applied

Check dbt_project/models/raw/sources.yml for source definitions.
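
If you want to script this lookup, the pyyaml package from the prerequisites can list the defined sources. A minimal sketch, assuming the dbt_project/models/raw/sources.yml path above (your file's keys may differ slightly):

import yaml

def list_dbt_sources(path: str = 'dbt_project/models/raw/sources.yml') -> dict:
    """Return {source name: [table names]} from a dbt sources.yml file."""
    with open(path) as f:
        spec = yaml.safe_load(f)
    return {
        src['name']: [t['name'] for t in src.get('tables', [])]
        for src in spec.get('sources', [])
    }

# Print every source table so marts can be mapped back to them
for source, tables in list_dbt_sources().items():
    print(source, '->', ', '.join(tables))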

4. Categorize Columns

For each table, identify:

  • Key Metrics: Numeric measures (revenue, counts, averages)
  • Key Dimensions: Descriptive attributes (customer_id, date, segment)
  • Grain: Level of detail (one row per customer, per order, etc.)
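
A quick way to confirm a candidate grain is to check whether the candidate key is unique across the table. A minimal sketch, reusing the Snowflake connection from this workflow; CUSTOMER_ID below is only an illustrative key:

def is_unique_key(conn, schema: str, table: str, column: str) -> bool:
    """True if the column uniquely identifies rows, i.e. it matches the table's grain."""
    cursor = conn.cursor()
    cursor.execute(
        f"SELECT COUNT(*) = COUNT(DISTINCT {column}) "
        f"FROM PROD_MARTS.{schema}.{table}"
    )
    return bool(cursor.fetchone()[0])

# e.g. is each row one wholesale customer?
# is_unique_key(conn, 'WHOLESALE_CUSTOMERS', 'WHOLESALE_DIM_CUSTOMERS', 'CUSTOMER_ID')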

5. Create Documentation CSV

Generate CSV with the following columns:

Column           | Description                       | Example
Marts            | Table name                        | WHOLESALE_DIM_CUSTOMERS
Definition       | Business purpose and description  | Final wholesale customer dimension table...
Source Tables    | dbt source/intermediate tables    | int_shopify__wholesale_dim_customer, int_gs__wholesaler_detail
Key Metrics      | Comma-separated metric names      | TOTAL_REVENUE, TOTAL_ORDERS, AVERAGE_ORDER_VALUE
Key Dimensions   | Comma-separated dimension names   | CUSTOMER_ID, EMAIL, SEGMENT, CUSTOMER_STATE
Grain            | Level of detail                   | One row per wholesale customer
Update Frequency | How often the table is refreshed  | Table (or Daily, Hourly)
Schema           | Full schema identifier            | PROD_MARTS.WHOLESALE_CUSTOMERS
Related Marts    | Other related marts               | (optional)
Business Purpose | High-level business use case      | Master dimension table for all wholesale customer analysis
Usage Notes      | Additional context                | Row count: 14,217, Size: 10.87 MB

6. Python Script Example

import csv
from typing import List, Tuple

import snowflake.connector

def get_table_metadata(conn, schema: str) -> List[Tuple]:
    """Query Snowflake for table metadata (name, row count, bytes, last altered)."""
    cursor = conn.cursor()
    
    query = f"""
    SELECT 
        TABLE_NAME,
        ROW_COUNT,
        BYTES,
        LAST_ALTERED
    FROM PROD_MARTS.INFORMATION_SCHEMA.TABLES
    WHERE TABLE_SCHEMA = '{schema}'
    ORDER BY TABLE_NAME
    """
    
    cursor.execute(query)
    return cursor.fetchall()
 
def get_table_columns(conn, schema: str, table: str) -> List[str]:
    """Get column names for a table."""
    cursor = conn.cursor()
    
    query = f"""
    SELECT COLUMN_NAME
    FROM PROD_MARTS.INFORMATION_SCHEMA.COLUMNS
    WHERE TABLE_SCHEMA = '{schema}'
      AND TABLE_NAME = '{table}'
    ORDER BY ORDINAL_POSITION
    """
    
    cursor.execute(query)
    return [row[0] for row in cursor.fetchall()]
 
def categorize_columns(columns: List[str]) -> Tuple[List[str], List[str]]:
    """Categorize columns into metrics and dimensions (simple name heuristic; review manually in step 7)."""
    # Metric-sounding names become metrics; everything else is a dimension,
    # except housekeeping columns (dbt timestamps, _POLYTOMIC_* sync fields).
    metrics = [c for c in columns if any(x in c.upper() for x in ['COUNT', 'TOTAL', 'SUM', 'AVG', 'REVENUE', 'SPEND'])]
    dimensions = [c for c in columns if c not in metrics
                  and c != 'DBT_UPDATED_AT' and not c.startswith('_POLYTOMIC_')]
    
    return metrics, dimensions
 
# Main workflow
conn = get_snowflake_connection(...)  # helper returning a snowflake.connector connection (see sketch after this script)
tables = get_table_metadata(conn, 'WHOLESALE_CUSTOMERS')
 
with open('marts_documentation.csv', 'w', newline='') as f:
    writer = csv.DictWriter(f, fieldnames=[
        'Marts', 'Definition', 'Source Tables', 'Key Metrics', 
        'Key Dimensions', 'Grain', 'Update Frequency', 'Schema',
        'Related Marts', 'Business Purpose', 'Usage Notes'
    ])
    writer.writeheader()
    
    for table_name, row_count, bytes_size, last_altered in tables:
        columns = get_table_columns(conn, 'WHOLESALE_CUSTOMERS', table_name)
        metrics, dimensions = categorize_columns(columns)
        
        writer.writerow({
            'Marts': table_name,
            'Definition': '[Define based on dbt code and business context]',
            'Source Tables': '[Map from dbt sources.yml]',
            'Key Metrics': ', '.join(metrics),
            'Key Dimensions': ', '.join(dimensions),
            'Grain': '[Determine from table structure]',
            'Update Frequency': 'Table',
            'Schema': 'PROD_MARTS.WHOLESALE_CUSTOMERS',
            'Related Marts': '',
            'Business Purpose': '[Business use case]',
            'Usage Notes': f'Row count: {row_count:,}, Size: {bytes_size / 1024 / 1024:.2f} MB'
        })
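
The script assumes a get_snowflake_connection helper. One way to build it, sketched here assuming credentials live in 1Password and are read with op read (the op://Data/Snowflake/... item paths and the warehouse name are placeholders; use whatever was configured in setup/1password-cli-setup.md):

import subprocess
import snowflake.connector

def op_read(reference: str) -> str:
    """Read a secret from the 1Password CLI, e.g. op_read('op://Data/Snowflake/password')."""
    result = subprocess.run(['op', 'read', reference], capture_output=True, text=True, check=True)
    return result.stdout.strip()

def get_snowflake_connection():
    return snowflake.connector.connect(
        account=op_read('op://Data/Snowflake/account'),    # placeholder item path
        user=op_read('op://Data/Snowflake/username'),      # placeholder item path
        password=op_read('op://Data/Snowflake/password'),  # placeholder item path
        warehouse='ANALYTICS_WH',                          # placeholder warehouse
        database='PROD_MARTS',
    )

After the script runs, fill in the bracketed placeholders (Definition, Source Tables, Grain, Business Purpose) by hand as part of step 7.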

7. Review and Enhance

  1. Review definitions - Add business context from dbt model code
  2. Verify source tables - Cross-reference with dbt sources.yml and model files
  3. Validate metrics/dimensions - Ensure categorization is accurate
  4. Add business purpose - Document how each mart is used
  5. Include usage notes - Add row counts, sizes, update frequencies

Output Format

The CSV should be structured for easy consumption:

  • Spreadsheet-friendly: Opens cleanly in Excel/Google Sheets
  • Searchable: Column names enable filtering and sorting
  • Complete: All relevant metadata included
  • Validated: Data accuracy verified against Snowflake

Best Practices

  1. Query actual Snowflake metadata - Don’t rely on assumptions
  2. Review dbt code - Understand transformations and business logic
  3. Categorize carefully - Metrics vs dimensions affects downstream usage
  4. Document grain clearly - Critical for understanding table usage
  5. Include operational details - Size, row counts help with performance planning
  6. Keep definitions concise - But include enough context for understanding

Common Patterns

Dimension Tables

  • Grain: One row per entity (customer, product, etc.)
  • Key Dimensions: ID, name, attributes
  • Key Metrics: Aggregated measures per entity

Fact Tables

  • Grain: One row per event (order, transaction, etc.)
  • Key Dimensions: Foreign keys, date, status
  • Key Metrics: Amounts, counts, quantities

Summary Tables

  • Grain: Aggregated level (daily, weekly, by segment)
  • Key Dimensions: Grouping attributes
  • Key Metrics: Aggregated measures

Troubleshooting

“No tables found”

  • Verify schema name is correct (case-sensitive)
  • Check role has access to schema
  • Query SHOW SCHEMAS IN DATABASE PROD_MARTS;

“Cannot determine grain”

  • Review table structure and sample data
  • Check dbt model code for comments
  • Query SELECT * FROM table LIMIT 10 to understand structure

“Source tables unclear”

  • Review dbt sources.yml for source definitions
  • Trace source() → ref() → mart dependencies
  • Check intermediate model files (int_*)

References

  • Snowflake Table Profiling: See snowflake-table-profiling.md
  • Metrics Dictionary Creation: See metrics-dictionary-creation.md
  • 1Password CLI Setup: See setup/1password-cli-setup.md