Standard Operating Procedure (SOP)
Title: Snowflake Governance, RBAC, and DBT Setup with GitHub Actions
Version: 1.0
Date: February 3, 2026
Owner: CTA Data Team + Brainforge Data Engineering
1. Purpose
This SOP standardizes the governance structure, role-based access control (RBAC), schema organization, warehouse configuration, and CI/CD deployment workflows for Snowflake data infrastructure at CTA. It establishes patterns for DBT project organization and GitHub Actions automation to ensure consistent, maintainable, and secure data operations.
The document defines how to:
- Structure database roles and manage access permissions
- Organize schemas and warehouses
- Configure DBT projects for multiple environments
- Automate deployments using GitHub Actions
- Monitor and audit access patterns
2. Scope
Applies to:
- Snowflake RBAC configuration and user provisioning
- Schema design and ownership (RAW, STAGING, MARTS layers)
- Warehouse sizing and configuration
- DBT project structure and deployment
- GitHub Actions CI/CD pipelines for data transformations
- Access provisioning and revocation workflows
Does not apply to:
- Source system data extraction pipelines (handled separately)
- BI tool configurations beyond Snowflake access
- Ad-hoc analysis queries (covered by read-only access)
3. Definitions
- RBAC (Role-Based Access Control): Permission model where access is granted through roles rather than directly to users
- Environment: Isolated database context (DEV, PREP, PROD) for different stages of data development
- Schema: Logical grouping of database objects (tables, views) within a database
- Warehouse: Compute cluster in Snowflake that executes queries
- DBT (data build tool): Transformation framework for building data models using SQL
- Target: DBT configuration that determines which environment to deploy to
- Service Account: Non-human user for automated processes (GitHub Actions, ETL tools)
- Least Privilege: Security principle of granting minimum permissions necessary
4. Prerequisites
Before implementing this governance structure:
- Snowflake account admin access for role and warehouse creation
- GitHub repository access with permissions to configure Actions
- 1Password or equivalent for secure credential storage
- DBT Core installed locally for development (version 1.5+)
- Service account created in Snowflake for GitHub Actions
- RSA key pair generated for service account authentication
5. Responsibilities
CTA Data Team:
- Request access to appropriate roles and schemas
- Develop and test DBT models in DEV environment
- Submit pull requests following review process
- Monitor data quality and pipeline health
- Update documentation for schema changes
Brainforge Data Engineering:
- Configure and maintain RBAC role hierarchy
- Provision new users and service accounts
- Review and merge infrastructure changes
- Monitor warehouse usage and costs
- Support escalations for access or deployment issues
Shared:
- Code review for DBT model changes
- Data quality testing and validation
- Documentation of schema and business logic
6. Step-by-Step Procedure
6.1 RBAC Model and Role Hierarchy
Role Structure
Snowflake roles form a hierarchy; the tree below runs from most privileged (top) to least privileged (bottom):
ACCOUNTADMIN (top-level admin, use sparingly)
└── SYSADMIN (system administration)
└── DATA_ENGINEER (cross-environment admin)
├── PROD_WRITE (production write access)
│ └── PROD_READ (production read access)
├── PREP_WRITE (staging write access)
│ └── PREP_READ (staging read access)
└── DEV_WRITE (development write access)
└── DEV_READ (development read access)
Role Definitions
Environment-Level Roles (per environment: DEV, PREP, PROD):
- {ENV}_READ: Read-only access to all schemas in the environment
  - Granted: SELECT on all tables/views in the database
  - Use case: Analysts, BI tools, downstream consumers
- {ENV}_WRITE: Full CRUD access within the environment
  - Granted: CREATE, INSERT, UPDATE, DELETE on schemas
  - Use case: DBT service accounts, data engineers during development
Team Roles:
- ANALYST: Aggregate role with read access to PREP and PROD
  - Reduces friction for team members who need broad visibility
  - No write permissions, to prevent accidental changes
- DATA_ENGINEER: Aggregate role with write access to DEV and PREP, read access to PROD
  - Development and testing in lower environments
  - Read-only production access for troubleshooting
Service Roles:
- DBT_SERVICE: Automated transformation execution
  - WRITE access to PREP and PROD (environment-specific service accounts recommended)
  - Used by GitHub Actions for deployments
- ETL_SERVICE: Data ingestion processes
  - WRITE access to the RAW schema only
  - Kept separate from transformation roles for security
Role Grant Script
-- Create environment-specific read roles
CREATE ROLE IF NOT EXISTS DEV_READ;
CREATE ROLE IF NOT EXISTS PREP_READ;
CREATE ROLE IF NOT EXISTS PROD_READ;
-- Create environment-specific write roles
CREATE ROLE IF NOT EXISTS DEV_WRITE;
CREATE ROLE IF NOT EXISTS PREP_WRITE;
CREATE ROLE IF NOT EXISTS PROD_WRITE;
-- Create team aggregate roles
CREATE ROLE IF NOT EXISTS ANALYST;
CREATE ROLE IF NOT EXISTS DATA_ENGINEER;
-- Create service roles
CREATE ROLE IF NOT EXISTS DBT_SERVICE;
CREATE ROLE IF NOT EXISTS ETL_SERVICE;
-- Grant role hierarchy (read roles to write roles)
GRANT ROLE DEV_READ TO ROLE DEV_WRITE;
GRANT ROLE PREP_READ TO ROLE PREP_WRITE;
GRANT ROLE PROD_READ TO ROLE PROD_WRITE;
-- Grant to aggregate roles
GRANT ROLE PREP_READ TO ROLE ANALYST;
GRANT ROLE PROD_READ TO ROLE ANALYST;
GRANT ROLE DEV_WRITE TO ROLE DATA_ENGINEER;
GRANT ROLE PREP_WRITE TO ROLE DATA_ENGINEER;
GRANT ROLE PROD_READ TO ROLE DATA_ENGINEER;
-- Grant service role permissions
GRANT ROLE PREP_WRITE TO ROLE DBT_SERVICE;
GRANT ROLE PROD_WRITE TO ROLE DBT_SERVICE;
-- Grant to SYSADMIN for admin management
GRANT ROLE DATA_ENGINEER TO ROLE SYSADMIN;
GRANT ROLE DBT_SERVICE TO ROLE SYSADMIN;
GRANT ROLE ETL_SERVICE TO ROLE SYSADMIN;
6.2 Schema Strategy and Organization
Three-Layer Architecture
Following DBT best practices, organize schemas into three primary layers:
1. RAW Layer
- Purpose: Source-conformed data, minimal transformation
- Schema naming: RAW.{SOURCE_SYSTEM} (e.g., RAW.REMEMBERS, RAW.POLYTOMIC_SHOPIFY)
- Ownership: ETL_SERVICE role
- Access: READ for all team roles, WRITE for ETL_SERVICE only
- Materialization: Tables (loaded by external tools)
2. STAGING Layer
- Purpose: Cleaned, renamed, type-cast data from sources
- Schema naming: {ENV}.STAGING, or {ENV}.STG_{SOURCE} for source-specific staging
- Ownership: DBT_SERVICE role
- Access: WRITE for DBT_SERVICE, READ for team roles
- Materialization: Views or ephemeral (lightweight transformations)
3. MARTS Layer
- Purpose: Business-conformed, wide tables for analytics
- Schema naming: {ENV}.{DOMAIN} (e.g., PROD.FINANCE, PROD.MEMBERSHIP; see the macro sketch after this list)
- Ownership: DBT_SERVICE role
- Access: WRITE for DBT_SERVICE, READ for team roles
- Materialization: Tables or incremental models (for large datasets)
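Note that dbt's default schema-name logic concatenates the target schema with any custom schema (e.g., STAGING_FINANCE rather than FINANCE). To get the clean {ENV}.{DOMAIN} names above, dbt projects commonly override the generate_schema_name macro. A minimal sketch of that common override (this SOP does not prescribe it; adapt to the project):
-- macros/generate_schema_name.sql (common override pattern; a sketch, not CTA-specific)
{% macro generate_schema_name(custom_schema_name, node) -%}
    {%- set default_schema = target.schema -%}
    {%- if custom_schema_name is none -%}
        {{ default_schema }}
    {%- else -%}
        {#- use the custom schema name as-is rather than prefixing it with the target schema -#}
        {{ custom_schema_name | trim }}
    {%- endif -%}
{%- endmacro %}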
Schema-Level Grants
-- RAW layer: ETL writes, everyone reads
GRANT USAGE ON DATABASE RAW TO ROLE ANALYST;
GRANT USAGE ON SCHEMA RAW.REMEMBERS TO ROLE ANALYST;
GRANT SELECT ON ALL TABLES IN SCHEMA RAW.REMEMBERS TO ROLE ANALYST;
GRANT SELECT ON FUTURE TABLES IN SCHEMA RAW.REMEMBERS TO ROLE ANALYST;
-- PREP layer: DBT writes, team reads
GRANT USAGE ON DATABASE PREP TO ROLE ANALYST;
GRANT USAGE ON ALL SCHEMAS IN DATABASE PREP TO ROLE ANALYST;
GRANT SELECT ON ALL TABLES IN DATABASE PREP TO ROLE ANALYST;
GRANT SELECT ON FUTURE TABLES IN DATABASE PREP TO ROLE ANALYST;
-- PROD layer: DBT writes, team reads
GRANT USAGE ON DATABASE PROD TO ROLE ANALYST;
GRANT USAGE ON ALL SCHEMAS IN DATABASE PROD TO ROLE ANALYST;
GRANT SELECT ON ALL TABLES IN DATABASE PROD TO ROLE ANALYST;
GRANT SELECT ON FUTURE TABLES IN DATABASE PROD TO ROLE ANALYST;
6.3 Warehouse Strategy and Configuration
Warehouse Sizing
Create dedicated warehouses per environment and workload type:
| Warehouse Name | Size | Use Case | Auto-Suspend | Max Clusters |
|---|---|---|---|---|
| DEV_XS | X-Small | Development, testing | 5 min | 1 |
| PREP_S | Small | Staging builds, CI checks | 5 min | 1 |
| PROD_M | Medium | Production transformations | 10 min | 2 |
| PROD_L | Large | Large model builds (as needed) | 5 min | 1 |
| LOADING_XS | X-Small | Data extraction, small loads | 2 min | 1 |
Warehouse Creation Script
-- Development warehouse
CREATE WAREHOUSE IF NOT EXISTS DEV_XS
WAREHOUSE_SIZE = 'X-SMALL'
AUTO_SUSPEND = 300
AUTO_RESUME = TRUE
INITIALLY_SUSPENDED = TRUE
COMMENT = 'Development and testing warehouse';
-- Staging warehouse
CREATE WAREHOUSE IF NOT EXISTS PREP_S
WAREHOUSE_SIZE = 'SMALL'
AUTO_SUSPEND = 300
AUTO_RESUME = TRUE
INITIALLY_SUSPENDED = TRUE
COMMENT = 'Staging builds and CI pipeline';
-- Production warehouse
CREATE WAREHOUSE IF NOT EXISTS PROD_M
WAREHOUSE_SIZE = 'MEDIUM'
AUTO_SUSPEND = 600
AUTO_RESUME = TRUE
INITIALLY_SUSPENDED = TRUE
MIN_CLUSTER_COUNT = 1
MAX_CLUSTER_COUNT = 2
SCALING_POLICY = 'STANDARD'
COMMENT = 'Production transformation runs';
-- Loading warehouse
CREATE WAREHOUSE IF NOT EXISTS LOADING_XS
WAREHOUSE_SIZE = 'X-SMALL'
AUTO_SUSPEND = 120
AUTO_RESUME = TRUE
INITIALLY_SUSPENDED = TRUE
COMMENT = 'ETL ingestion processes';
Warehouse Access Grants
-- Grant warehouse usage to appropriate roles
GRANT USAGE ON WAREHOUSE DEV_XS TO ROLE DEV_WRITE;
GRANT USAGE ON WAREHOUSE PREP_S TO ROLE PREP_WRITE;
GRANT USAGE ON WAREHOUSE PROD_M TO ROLE PROD_WRITE;
GRANT USAGE ON WAREHOUSE LOADING_XS TO ROLE ETL_SERVICE;
-- Service accounts need warehouse usage
GRANT USAGE ON WAREHOUSE PREP_S TO ROLE DBT_SERVICE;
GRANT USAGE ON WAREHOUSE PROD_M TO ROLE DBT_SERVICE;
Cost Control Recommendations
- Start with smaller warehouses and scale up only when needed
- Monitor query times and spilling (bytes spilled to local/remote storage; see the check after this list)
- Use auto-suspend aggressively (2-5 minutes for infrequent workloads)
- Review warehouse usage weekly using the WAREHOUSE_METERING_HISTORY view
- Consider scheduling large jobs during off-peak hours
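A sketch of a spill check against ACCOUNT_USAGE (the 7-day window is a suggestion, not a CTA standard):
-- Queries spilling to local/remote storage in the last 7 days
-- (regular spilling suggests the warehouse is undersized for the workload)
SELECT
    warehouse_name,
    COUNT(*) AS spilling_queries,
    ROUND(SUM(bytes_spilled_to_local_storage) / POWER(1024, 3), 2) AS gb_spilled_local,
    ROUND(SUM(bytes_spilled_to_remote_storage) / POWER(1024, 3), 2) AS gb_spilled_remote
FROM SNOWFLAKE.ACCOUNT_USAGE.QUERY_HISTORY
WHERE start_time >= DATEADD('day', -7, CURRENT_DATE())
  AND (bytes_spilled_to_local_storage > 0
       OR bytes_spilled_to_remote_storage > 0)
GROUP BY 1
ORDER BY 4 DESC, 3 DESC;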
6.4 User and Service Account Provisioning
Adding a New User
- Create Snowflake user:
CREATE USER IF NOT EXISTS john_smith
PASSWORD = '<temp-password>'
DEFAULT_ROLE = ANALYST
DEFAULT_WAREHOUSE = PREP_S
MUST_CHANGE_PASSWORD = TRUE
COMMENT = 'CTA Data Analyst';
- Grant appropriate team role:
-- For analysts (read-only)
GRANT ROLE ANALYST TO USER john_smith;
-- For data engineers (dev + prep write, prod read)
GRANT ROLE DATA_ENGINEER TO USER john_smith;
- Provide onboarding information:
- Snowflake account URL
- Temporary password (via secure channel)
- Link to this SOP for context
- Snowflake CLI setup instructions
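A quick smoke test the new user can run after first login (the table name below is the one referenced in Section 7; substitute any table their role can read):
-- Confirm session defaults resolved and read access works
SELECT CURRENT_USER(), CURRENT_ROLE(), CURRENT_WAREHOUSE();
SELECT COUNT(*) FROM PROD.MEMBERSHIP.DIM_ORGANIZATIONS;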
Creating a Service Account
- Generate RSA key pair:
mkdir -p ~/.snowflake-keys
cd ~/.snowflake-keys
# Generate private key
openssl genrsa 2048 | openssl pkcs8 -topk8 -inform PEM -out dbt_service_key.p8 -nocrypt
# Generate public key
openssl rsa -in dbt_service_key.p8 -pubout -out dbt_service_key.pub
- Extract public key content:
# Remove header/footer and join lines
grep -v "BEGIN PUBLIC KEY" dbt_service_key.pub | grep -v "END PUBLIC KEY" | tr -d '\n'- Create service user with public key:
CREATE USER IF NOT EXISTS dbt_service
DEFAULT_ROLE = DBT_SERVICE
DEFAULT_WAREHOUSE = PREP_S
RSA_PUBLIC_KEY = '<public-key-content>'
COMMENT = 'DBT GitHub Actions service account';
GRANT ROLE DBT_SERVICE TO USER dbt_service;
- Store private key securely:
- Add to 1Password as secure note
- Add to GitHub repository secrets (for Actions)
- Never commit private key to version control
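To verify the key registration, compare the fingerprint Snowflake stores with the local key's fingerprint (a standard check from Snowflake's key-pair authentication docs):
-- Check the RSA_PUBLIC_KEY_FP property for the service user
DESC USER dbt_service;
-- Locally, compute the same SHA-256 fingerprint for comparison:
--   openssl rsa -pubin -in dbt_service_key.pub -outform DER | openssl dgst -sha256 -binary | openssl enc -base64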
Removing Access
When a user leaves or changes roles:
- Revoke role grants:
-- Revoke all roles
REVOKE ROLE ANALYST FROM USER john_smith;
REVOKE ROLE DATA_ENGINEER FROM USER john_smith;
- Disable or drop user:
-- Disable (preserves audit history)
ALTER USER john_smith SET DISABLED = TRUE;
-- Or drop entirely (after audit period)
DROP USER IF EXISTS john_smith;
6.5 DBT Project Structure and Configuration
Project Organization
Following DBT best practices, organize the project using a modular, layered approach:
cta-dbt/
├── dbt_project.yml
├── packages.yml
├── profiles.yml (local only, not committed)
├── models/
│ ├── staging/
│ │ ├── remembers/
│ │ │ ├── _remembers__sources.yml
│ │ │ ├── _remembers__models.yml
│ │ │ ├── stg_remembers__customers.sql
│ │ │ └── stg_remembers__organizations.sql
│ │ └── shopify/
│ │ ├── _shopify__sources.yml
│ │ ├── stg_shopify__orders.sql
│ │ └── stg_shopify__customers.sql
│ ├── intermediate/
│ │ └── membership/
│ │ ├── _int_membership__models.yml
│ │ └── int_member_engagement.sql
│ └── marts/
│ ├── membership/
│ │ ├── _membership__models.yml
│ │ ├── dim_organizations.sql
│ │ └── fct_member_engagement.sql
│ └── finance/
│ ├── _finance__models.yml
│ └── fct_payments.sql
├── macros/
│ └── generate_warehouse_name.sql
├── tests/
│ └── assert_organization_matching.sql
└── seeds/
└── organization_aliases.csv
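The tree lists macros/generate_warehouse_name.sql without showing its contents. A minimal sketch of the idea, with warehouse names taken from Section 6.3 and selection logic assumed (the real macro may differ):
-- macros/generate_warehouse_name.sql (sketch; actual logic is not defined in this SOP)
{% macro generate_warehouse_name() -%}
    {%- if target.name == 'prod' -%}
        PROD_M
    {%- elif target.name == 'prep' -%}
        PREP_S
    {%- else -%}
        DEV_XS
    {%- endif -%}
{%- endmacro %}
A model could then opt in via its config block, e.g. {{ config(snowflake_warehouse=generate_warehouse_name()) }}, assuming the project wires it up that way.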
Environment Configuration (dbt_project.yml)
name: 'cta_analytics'
version: '1.0.0'
config-version: 2
profile: 'cta_snowflake'
model-paths: ["models"]
analysis-paths: ["analyses"]
test-paths: ["tests"]
seed-paths: ["seeds"]
macro-paths: ["macros"]
snapshot-paths: ["snapshots"]
target-path: "target"
clean-targets: ["target", "dbt_packages"]
models:
cta_analytics:
+materialized: view
staging:
+materialized: view
+schema: staging
intermediate:
+materialized: ephemeral
+schema: intermediate
marts:
+materialized: table
+schema: marts
membership:
+schema: membership
finance:
+schema: finance
vars:
# Environment-specific variables
dev_db_object_expiration: 80 # Days until dev objects are dropped
# Source system variables
current_date: "CURRENT_DATE()"
Profile Configuration (profiles.yml)
Store locally in ~/.dbt/profiles.yml:
cta_snowflake:
target: dev
outputs:
dev:
type: snowflake
account: YOUR_ACCOUNT_ID
user: "{{ env_var('SNOWFLAKE_USER') }}"
authenticator: externalbrowser # SSO authentication
role: DEV_WRITE
database: DEV
warehouse: DEV_XS
schema: staging
threads: 4
prep:
type: snowflake
account: YOUR_ACCOUNT_ID
user: "{{ env_var('SNOWFLAKE_USER') }}"
authenticator: externalbrowser
role: PREP_WRITE
database: PREP
warehouse: PREP_S
schema: staging
threads: 8
prod:
type: snowflake
account: YOUR_ACCOUNT_ID
user: dbt_service
role: DBT_SERVICE
database: PROD
warehouse: PROD_M
schema: staging
threads: 8
private_key_path: "{{ env_var('SNOWFLAKE_PRIVATE_KEY_PATH') }}"
private_key_passphrase: "{{ env_var('SNOWFLAKE_PRIVATE_KEY_PASSPHRASE') }}" # omit if the key was generated with -nocrypt (Section 6.4)
Model Naming Conventions
Follow consistent naming to improve discoverability:
Staging Models:
- Format: stg_{source}__{entity}.sql
- Example: stg_remembers__customers.sql, stg_shopify__orders.sql
- Purpose: Source-conformed, cleaned, renamed columns
Intermediate Models:
- Format: int_{domain}__{description}.sql
- Example: int_membership__engagement.sql
- Purpose: Logical transformations preparing for marts
Mart Models:
- Format: {type}_{domain}__{entity}.sql
- Types: dim_ (dimensions), fct_ (facts), rpt_ (reports)
- Example: dim_organizations.sql, fct_member_engagement.sql
- Purpose: Business-conformed, wide tables for analytics
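For illustration, a staging model following these conventions might look like the sketch below (column names are hypothetical, not the actual REMEMBERS schema):
-- stg_remembers__customers.sql (illustrative columns only)
WITH source AS (
    SELECT * FROM {{ source('remembers', 'customers') }}
),
renamed AS (
    SELECT
        customer_id,
        LOWER(TRIM(email)) AS customer_email,  -- normalized for domain matching
        organization_name,
        created_at::TIMESTAMP_NTZ AS created_at
    FROM source
)
SELECT * FROM renamed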
Schema and Source Documentation
Every source and model folder should have YAML files:
# _remembers__sources.yml
version: 2
sources:
- name: remembers
description: CTA membership management system
database: RAW
schema: REMEMBERS
tables:
- name: customers
description: Customer records from REMEMBERS
columns:
- name: customer_id
description: Primary key
tests:
- not_null
- unique
# _remembers__models.yml
version: 2
models:
- name: stg_remembers__customers
description: Cleaned customer data from REMEMBERS
columns:
- name: customer_id
description: Primary key
tests:
- not_null
- unique
- name: customer_email
description: Customer email address
tests:
- not_null
6.6 GitHub Actions CI/CD Configuration
Repository Secrets Setup
Add the following secrets to GitHub repository settings:
- SNOWFLAKE_ACCOUNT: Account identifier (e.g., abc12345.us-east-1)
- SNOWFLAKE_USER: Service account username (e.g., dbt_service)
- SNOWFLAKE_PRIVATE_KEY: Private key content (base64 encoded)
- SNOWFLAKE_ROLE: Service role name (e.g., DBT_SERVICE)
- SNOWFLAKE_DATABASE_PROD: Production database name
- SNOWFLAKE_DATABASE_PREP: Staging database name
- SNOWFLAKE_WAREHOUSE_PROD: Production warehouse name
- SNOWFLAKE_WAREHOUSE_PREP: Staging warehouse name
CI/CD Workflow Configuration
Create .github/workflows/dbt-ci.yml:
name: DBT CI
on:
pull_request:
branches: [main]
paths:
- 'models/**'
- 'macros/**'
- 'tests/**'
- 'dbt_project.yml'
jobs:
dbt-test:
runs-on: ubuntu-latest
steps:
- name: Checkout code
uses: actions/checkout@v3
- name: Setup Python
uses: actions/setup-python@v4
with:
python-version: '3.11'
- name: Install DBT
run: |
pip install dbt-snowflake==1.7.0
- name: Setup DBT profile
run: |
mkdir -p ~/.dbt
cat << EOF > ~/.dbt/profiles.yml
cta_snowflake:
target: prep
outputs:
prep:
type: snowflake
account: ${{ secrets.SNOWFLAKE_ACCOUNT }}
user: ${{ secrets.SNOWFLAKE_USER }}
role: ${{ secrets.SNOWFLAKE_ROLE }}
database: ${{ secrets.SNOWFLAKE_DATABASE_PREP }}
warehouse: ${{ secrets.SNOWFLAKE_WAREHOUSE_PREP }}
schema: staging
threads: 8
private_key_path: /tmp/snowflake_key.p8
EOF
- name: Decode private key
run: |
echo "${{ secrets.SNOWFLAKE_PRIVATE_KEY }}" | base64 -d > /tmp/snowflake_key.p8
chmod 600 /tmp/snowflake_key.p8
- name: Run DBT deps
run: dbt deps
- name: Run DBT build (changed models only)
run: |
# state:modified+ compares against a previous manifest; restore a production
# manifest into ./target (e.g., from a prior run's artifact) before this step
dbt build --select state:modified+ --defer --state ./target
- name: Run DBT test
run: dbt test --select state:modified+ --defer --state ./target
Create .github/workflows/dbt-deploy.yml:
name: DBT Deploy
on:
push:
branches: [main]
paths:
- 'models/**'
- 'macros/**'
- 'tests/**'
- 'dbt_project.yml'
jobs:
dbt-prod-deploy:
runs-on: ubuntu-latest
steps:
- name: Checkout code
uses: actions/checkout@v3
- name: Setup Python
uses: actions/setup-python@v4
with:
python-version: '3.11'
- name: Install DBT
run: |
pip install dbt-snowflake==1.7.0
- name: Setup DBT profile
run: |
mkdir -p ~/.dbt
cat << EOF > ~/.dbt/profiles.yml
cta_snowflake:
target: prod
outputs:
prod:
type: snowflake
account: ${{ secrets.SNOWFLAKE_ACCOUNT }}
user: ${{ secrets.SNOWFLAKE_USER }}
role: ${{ secrets.SNOWFLAKE_ROLE }}
database: ${{ secrets.SNOWFLAKE_DATABASE_PROD }}
warehouse: ${{ secrets.SNOWFLAKE_WAREHOUSE_PROD }}
schema: staging
threads: 8
private_key_path: /tmp/snowflake_key.p8
EOF
- name: Decode private key
run: |
echo "${{ secrets.SNOWFLAKE_PRIVATE_KEY }}" | base64 -d > /tmp/snowflake_key.p8
chmod 600 /tmp/snowflake_key.p8
- name: Run DBT deps
run: dbt deps
- name: Run DBT build (full refresh)
run: dbt build --full-refresh
- name: Run DBT test
run: dbt test
- name: Generate DBT docs
run: dbt docs generate
- name: Upload docs artifact
uses: actions/upload-artifact@v4  # v3 is deprecated on GitHub-hosted runners
with:
name: dbt-docs
path: target/
Workflow Triggers and Behavior
Pull Request (CI):
- Runs on PR to main branch
- Deploys to PREP environment
- Tests only changed models (state-based selection)
- Blocks merge if tests fail
- Uses smaller PREP warehouse for cost efficiency
Main Branch Push (CD):
- Runs on merge to main
- Deploys to PROD environment
- Runs full build and test suite
- Generates documentation
- Uses production warehouse
- Notifications on failure (configure via GitHub Actions)
6.7 Monitoring and Audit
Access Review Process
Perform quarterly access reviews:
- Generate user access report:
SELECT
grantee_name AS user_name,
role AS granted_role,
created_on AS granted_on,
granted_by
FROM SNOWFLAKE.ACCOUNT_USAGE.GRANTS_TO_USERS
WHERE deleted_on IS NULL
ORDER BY grantee_name, role;
- Review with team leads:
  - Confirm each user still requires their assigned roles
  - Remove access for departed team members
  - Adjust roles for changed responsibilities
- Document review:
  - Record review date and findings
  - Track any access changes made
  - Update this SOP if role definitions change
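The role-to-role hierarchy itself can be reviewed the same way. A sketch against the standard ACCOUNT_USAGE view (the filters reflect how Snowflake records role grants):
-- Current role-to-role grants (the hierarchy from Section 6.1)
SELECT
    grantee_name AS parent_role,
    name AS granted_role,
    granted_by,
    created_on
FROM SNOWFLAKE.ACCOUNT_USAGE.GRANTS_TO_ROLES
WHERE granted_on = 'ROLE'
  AND privilege = 'USAGE'
  AND deleted_on IS NULL
ORDER BY parent_role, granted_role;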
Query and Warehouse Monitoring
Monitor usage patterns weekly:
-- Warehouse credit usage by day
SELECT
DATE_TRUNC('day', start_time) AS usage_date,
warehouse_name,
SUM(credits_used) AS total_credits
FROM SNOWFLAKE.ACCOUNT_USAGE.WAREHOUSE_METERING_HISTORY
WHERE start_time >= DATEADD('day', -7, CURRENT_DATE())
GROUP BY 1, 2
ORDER BY 1 DESC, 3 DESC;
-- Long-running queries
SELECT
query_id,
user_name,
warehouse_name,
execution_time / 1000 AS execution_seconds,
ROUND(bytes_scanned / POWER(1024, 3), 2) AS gb_scanned,
query_text
FROM SNOWFLAKE.ACCOUNT_USAGE.QUERY_HISTORY
WHERE execution_time > 60000 -- Over 1 minute
AND start_time >= DATEADD('day', -1, CURRENT_DATE())
ORDER BY execution_time DESC
LIMIT 20;
DBT Run Monitoring
Track DBT execution and model performance:
-- Failed DBT runs (check logs)
SELECT
query_id,
user_name,
start_time,
error_message,
query_text
FROM SNOWFLAKE.ACCOUNT_USAGE.QUERY_HISTORY
WHERE user_name = 'DBT_SERVICE'
AND error_code IS NOT NULL
AND start_time >= DATEADD('day', -7, CURRENT_DATE())
ORDER BY start_time DESC;
Set up alerts for:
- Failed GitHub Actions workflows (native GitHub notifications)
- Unusual warehouse credit consumption (see the resource monitor sketch after this list)
- Query timeouts or errors in production
- Schema changes in source systems
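For the credit-consumption alert, a Snowflake resource monitor can notify and enforce a ceiling. A sketch (the quota value is illustrative; creating monitors requires ACCOUNTADMIN):
-- Notify at 80% and suspend the warehouse at 100% of a monthly quota
CREATE RESOURCE MONITOR IF NOT EXISTS PROD_CREDIT_MONITOR
  WITH CREDIT_QUOTA = 100  -- placeholder; set from observed monthly usage
  FREQUENCY = MONTHLY
  START_TIMESTAMP = IMMEDIATELY
  TRIGGERS
    ON 80 PERCENT DO NOTIFY
    ON 100 PERCENT DO SUSPEND;
ALTER WAREHOUSE PROD_M SET RESOURCE_MONITOR = PROD_CREDIT_MONITOR;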
7. Quality Checks
Before considering setup complete, verify:
- ✅ All roles created and hierarchy properly configured
- ✅ Test users can connect with appropriate permissions
- ✅ Service account can authenticate using key pair
- ✅ DBT runs successfully in all three environments (dev, prep, prod)
- ✅ GitHub Actions CI workflow passes on test PR
- ✅ GitHub Actions CD workflow deploys to production on merge
- ✅ Warehouse auto-suspend is working (check after 5-10 min idle)
- ✅ Access grants follow least-privilege principle
- ✅ Documentation is generated and accessible
- ✅ Monitoring queries return expected results
Testing Checklist:
- RBAC Testing:
-- As test user with ANALYST role
USE ROLE ANALYST;
SELECT * FROM PROD.MEMBERSHIP.DIM_ORGANIZATIONS LIMIT 10; -- Should succeed
-- Try to write (should fail)
CREATE TABLE PROD.MEMBERSHIP.TEST_TABLE AS SELECT 1; -- Should fail
- DBT Local Development:
# In local environment
dbt debug                                  # Verify connection
dbt run --select stg_remembers__customers  # Run single model
dbt test --select stg_remembers__customers # Test validation
- GitHub Actions Testing:
- Create test PR with minor model change
- Verify CI workflow runs and passes
- Merge PR and verify CD workflow deploys to production
8. Escalation Path
Access Issues:
- User cannot connect: Verify role grants and warehouse permissions
- MFA/SSO issues: Check Snowflake authenticator configuration
- Service account errors: Verify key pair and RSA_PUBLIC_KEY configuration
- Contact: Brainforge Data Engineering team via Slack or email
DBT/GitHub Actions Issues:
- CI workflow failures: Check logs in GitHub Actions tab
- Deployment errors: Verify secrets configuration in GitHub
- Model build failures: Check Snowflake query history for errors
- Contact: CTA Data Team lead or Brainforge support
Performance Issues:
- Slow queries: Review warehouse size and query optimization
- High costs: Audit warehouse usage and auto-suspend settings
- Failed builds: Check for data quality issues or schema changes
- Contact: Brainforge Data Engineering for optimization review
Emergency Contacts:
- CTA Data Team Lead: [Contact info]
- Brainforge Support: [Contact info]
- Snowflake Account Admin: [Contact info]
9. Version History
- v1.0 (February 3, 2026) - Initial SOP creation
- Defined RBAC role hierarchy and provisioning workflows
- Documented schema and warehouse strategy
- Established DBT project structure and naming conventions
- Configured GitHub Actions CI/CD pipelines
- Created monitoring and audit procedures
Appendix A: Common Commands
Snowflake RBAC Management
-- Show current role and user
SELECT CURRENT_ROLE();
SELECT CURRENT_USER();
-- For your own grants, use SHOW GRANTS TO USER <username>;
-- SHOW statements do not accept function calls like CURRENT_USER()
-- Switch roles
USE ROLE DATA_ENGINEER;
-- Show all users and their roles
SHOW GRANTS TO USER john_smith;
-- Show role hierarchy
SHOW GRANTS OF ROLE DATA_ENGINEER;
DBT Commands
# Development
dbt run --select model_name # Run single model
dbt run --select +model_name # Run model and upstream
dbt run --select model_name+ # Run model and downstream
dbt test --select model_name # Test single model
# Environment targeting
dbt run --target prep # Deploy to PREP
dbt run --target prod # Deploy to PROD
# Documentation
dbt docs generate # Generate docs
dbt docs serve # Serve docs locally
# Debugging
dbt debug # Test connection
dbt compile --select model_name # Compile without running
dbt ls --select tag:tdf          # List models by tag
Warehouse Management
-- Show warehouse status
SHOW WAREHOUSES;
-- Suspend warehouse manually
ALTER WAREHOUSE DEV_XS SUSPEND;
-- Resume warehouse
ALTER WAREHOUSE DEV_XS RESUME;
-- Modify warehouse size
ALTER WAREHOUSE PROD_M SET WAREHOUSE_SIZE = 'LARGE';
-- Check warehouse usage
SELECT * FROM SNOWFLAKE.ACCOUNT_USAGE.WAREHOUSE_METERING_HISTORY
WHERE warehouse_name = 'PROD_M'
AND start_time >= DATEADD('day', -7, CURRENT_DATE());
Appendix B: Entity Resolution and Data Matching
Based on the January 26 DataOps meeting, one critical challenge is matching organizations across systems (REMEMBERS, Exhibitor/EXPO CAD, CES registration).
Current Approach:
- Use REMEMBERS customers and customer_link tables as the backbone
- REMEMBERS already maintains EXPO CAD IDs and domain mappings
- Match primarily on domain (website, email domain)
- Fall back to organization name matching where domain is unavailable
DBT Implementation Pattern:
-- Example: int_membership__organization_matching.sql
WITH remembers_orgs AS (
SELECT
customer_id,
organization_name,
primary_domain,
expo_cad_ids -- Array or delimited string
FROM {{ ref('stg_remembers__customers') }}
),
ces_registrations AS (
SELECT
registration_id,
company_name,
email_domain,
website_domain
FROM {{ ref('stg_ces__registrations') }}
)
SELECT
r.customer_id,
c.registration_id,
CASE
WHEN r.primary_domain = c.website_domain THEN 'domain_match'
WHEN r.primary_domain = c.email_domain THEN 'email_domain_match'
WHEN LOWER(r.organization_name) = LOWER(c.company_name) THEN 'name_match'
ELSE 'no_match'
END AS match_type
FROM ces_registrations c
LEFT JOIN remembers_orgs r
ON r.primary_domain = c.website_domain
OR r.primary_domain = c.email_domain
OR LOWER(r.organization_name) = LOWER(c.company_name)
Recommendations:
- Create dim_organizations as the single source of truth
- Use DBT seeds for manual match overrides
- Build broken join reports to surface unmatched records (see the sketch after this list)
- Require source system updates (not manual matching by data team)
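A sketch of such a broken-join report, built on the matching model above (model and column names follow that example and are assumptions):
-- rpt_unmatched_registrations.sql (illustrative)
-- Surfaces CES registrations with no REMEMBERS match, for source-system follow-up
SELECT
    registration_id,
    match_type
FROM {{ ref('int_membership__organization_matching') }}
WHERE match_type = 'no_match'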
Appendix C: Cost Optimization Guidelines
Warehouse Sizing:
- Start with X-Small for development
- Use Small for staging/CI
- Use Medium for production (scale to Large only if needed)
- Monitor bytes spilled to local/remote storage (indicates undersized warehouse)
Auto-Suspend Configuration:
- Development: 2-5 minutes (infrequent use)
- Staging/CI: 5 minutes (burst workloads)
- Production: 10 minutes (regular scheduled jobs)
- Never set auto-suspend > 15 minutes without justification
Query Optimization:
- Use LIMIT for development queries
- Filter on partition keys where available (e.g., date columns)
- Avoid SELECT * in production models
- Use incremental models for large tables (>100M rows); see the sketch after this list
- Materialize intermediate steps only when reused
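A minimal incremental-model sketch (the unique_key, timestamp column, and model names are illustrative):
-- Example incremental model (hypothetical names)
{{ config(
    materialized='incremental',
    unique_key='event_id'
) }}

SELECT *
FROM {{ ref('stg_source__events') }}
{% if is_incremental() %}
  -- on incremental runs, only process rows newer than what is already loaded
  WHERE updated_at > (SELECT MAX(updated_at) FROM {{ this }})
{% endif %}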
Monitoring and Alerts:
- Set budget alerts in Snowflake (Account → Cost Management)
- Review weekly cost reports
- Investigate warehouses with >50% idle time
- Optimize or remove models with consistently poor efficiency
References
- Snowflake RBAC Best Practices
- DBT Best Practices: Project Structure
- GitLab Data Team DBT Guide
- Snowflake Warehouse Sizing
- DBT Documentation
- CTA DataOps Planning Meeting (January 26, 2026)