Data Platform Projects Technical Design Document Template (TDD)

0. Document Control

Version | Date | Author | Reviewer(s) | Linear Project | Status
01 | 2025-07-23 | Caio | | |

Artifact | Link | Owner | Purpose in this TDD
Data Platform Documentation | Here | | Canonical platform context, current-state diagrams, data contracts
Project Charter | | | Business goals, scope, success metrics, stakeholders
Project Management Plan | | | Timeline, resourcing, risks, comms/rituals
Linear Project / Team | | | Ticket breakdown, execution tracking, status

1. Stakeholders

Name | Department

2. Overview

  • Purpose – why does this project exist?
  • Scope & Goals – what’s in / out, success metrics.
  • Stakeholders & Roles – engineering, product, client, ops.
  • Context diagram – one-slide “where this fits” view.

3. Problem Statement & Requirements

  • Business Problem / User Stories – narrative of current pain; tie to measurable outcomes.
  • Analytics Use Cases – questions dashboards/AI agents must answer; link to PRDs/OKRs.
  • Functional Requirements – e.g., ingest events, model customer identity, expose metrics API.
  • Non-Functional Requirements (NFRs) – latency/throughput, security, compliance, observability, cost constraints, SLAs.

4. Current State Assessment

  • Data Sources & Ingestion – existing ETL/ELT (Fivetran, Airbyte, custom), frequencies, volumes.
  • Warehouses / Lakes – current tech (Snowflake, BigQuery, Redshift, Databricks, etc.), schemas, pain points.
  • Modeling Layer – dbt project structure, test coverage, naming conventions.
  • BI / Activation Layer – tools (Looker, Power BI, Tableau, Hightouch, Segment, Rudderstack, mParticle, etc.) and gaps.
  • Data Quality & Governance – checks, lineage, ownership, documentation gaps.
  • Known Constraints / Debt – legacy models, brittle pipelines, permissions issues.
  • [ARCHITECTURE DIAGRAM]

5. Research & Discovery

Goal: expose all due diligence so reviewers trust the design decisions.

Link | Type (spike, RFC, benchmark, article) | Key takeaway
  • Competitive / prior-art summary
  • Assumptions & constraints
  • Open questions

6. Target Architecture & Design

  • High-level Architecture Diagram – ingestion → storage → modeling → serving/activation layers.
  • Component Breakdown – purpose, inputs/outputs, dependencies (ETL jobs, dbt models, marts, APIs).
  • Data Model – ERD / star schemas / semantic layer definitions (link to dbt docs).
  • Sequence / Activity Diagrams – critical flows (identity stitching, metric computation).
  • Interfaces & Contracts – schemas (JSON, Avro), API specs (OpenAPI/GraphQL), file formats/locations.
  • Deployment View – environments, IaC repos, CI/CD for dbt/ETL, feature flags.
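The interface-and-contract bullets above can be backed by an executable check. The sketch below validates records against a hypothetical `orders` event contract; the field names and types are illustrative placeholders, not a real source system's schema.

```python
# Minimal data-contract check: verify required fields and types per record.
# The ORDERS_CONTRACT fields are illustrative, not from a real source system.
ORDERS_CONTRACT = {
    "order_id": str,
    "customer_id": str,
    "amount_usd": float,
    "created_at": str,  # ISO-8601 timestamp, validated downstream
}

def violations(record: dict, contract: dict) -> list[str]:
    """Return a list of human-readable contract violations for one record."""
    errors = []
    for field, expected_type in contract.items():
        if field not in record:
            errors.append(f"missing field: {field}")
        elif not isinstance(record[field], expected_type):
            errors.append(f"{field}: expected {expected_type.__name__}, "
                          f"got {type(record[field]).__name__}")
    return errors

good = {"order_id": "o1", "customer_id": "c1", "amount_usd": 9.99,
        "created_at": "2025-07-23T00:00:00Z"}
bad = {"order_id": "o2", "amount_usd": "9.99",
       "created_at": "2025-07-23T00:00:00Z"}

print(violations(good, ORDERS_CONTRACT))  # []
print(violations(bad, ORDERS_CONTRACT))   # missing customer_id, wrong type
```

The same rules can be enforced in CI or at ingestion time; a schema registry or dbt contracts can replace this hand-rolled check once tooling is selected.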

A. Key Questions & Answers (KQA) Matrix

Business Question | Metric(s) / Output | Needed Grain & Dimensions | Data Path (Src → Model → Surface) | Freshness / Latency SLA | Accuracy Rule(s) & Test(s)

7. Data Modeling Strategy

  • Modeling Paradigm – dimensional (Kimball), Data Vault, medallion/lakehouse, semantic layer, etc.
  • Naming & Folder Conventions – e.g., staging/intermediate/marts in dbt.
  • Metric Definitions – canonical list tied to business goals.
Metric | Business Definition | Formula / SQL Logic | Grain | Source of Truth Table
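The canonical metric list can also live in code so every surface reads the same definition. This is an illustrative sketch: the metric names, formulas, and table names are placeholders, not real project definitions.

```python
# Canonical metric registry sketch. Metric names, formulas, and source tables
# are hypothetical placeholders; replace with the project's real definitions.
METRICS = {
    "net_revenue": {
        "definition": "Gross revenue minus refunds",
        "sql": "SUM(amount_usd) - SUM(refund_usd)",
        "grain": "order",
        "source_of_truth": "marts.fct_orders",
    },
    "active_customers": {
        "definition": "Distinct customers with at least one order in period",
        "sql": "COUNT(DISTINCT customer_id)",
        "grain": "day",
        "source_of_truth": "marts.fct_orders",
    },
}

def metric_sql(name: str) -> str:
    """Render a SELECT for one canonical metric against its source of truth."""
    m = METRICS[name]
    return f"SELECT {m['sql']} AS {name} FROM {m['source_of_truth']}"

print(metric_sql("net_revenue"))
```

A semantic layer (MetricFlow, LookML) would supersede this registry; the point is one definition per metric, referenced everywhere.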

8. Tooling & Technology Decisions Matrix

Evaluate and justify tools for:

  • Ingestion / ETL
  • Vendor lock-in (assess as a criterion for each domain)
  • Transformation / Modeling (dbt, Python notebooks)
  • Storage / Warehouse / Lakehouse
  • BI / Visualization
  • CDP / Activation / Reverse ETL
  • Orchestration & Scheduling
  • Observability & Cost Monitoring
Domain | Option(s) Considered | Selected | Criteria / Score | Notes

9. Decision Log & Rationale (ADR Index)

Record each significant trade-off as an Architecture Decision Record (ADR). Link them here or inline.

# | Choice | Options Considered | Decision | Rationale | Impact / Risks

10. Implementation & Migration Plan

  • Milestones / phases (POC → pilot → full migration)
  • Workstream ownership
  • Linear linkage to Project
  • Migration / cutover strategy – data backfill, parallel run, validation windows, rollback plans
  • Decommission plan – what legacy assets retire and when

11. Testing, Data Quality & Validation

  • Test matrix – unit (dbt tests), integration (pipeline), end-to-end (dashboards), load/perf, security
  • Data quality rules – null checks, schema drift, freshness, anomalies (define monitors/alerts)
  • Validation plan – sampling queries, reconciliations vs. legacy numbers
  • Observability – logs, metrics, lineage tools (OpenLineage, Monte Carlo, Databand, etc.)
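The freshness and null-check rules above can be prototyped in a few lines before they become dbt tests or monitor configs. This sketch assumes illustrative thresholds and column names.

```python
from datetime import datetime, timedelta, timezone

# Data-quality monitor sketch: freshness and null-rate rules like those listed
# above. Thresholds and column names are illustrative assumptions.
def freshness_ok(latest_loaded_at: datetime, max_lag: timedelta,
                 now: datetime) -> bool:
    """True if the newest row is within the allowed lag; otherwise alert."""
    return now - latest_loaded_at <= max_lag

def null_rate(rows: list[dict], column: str) -> float:
    """Fraction of rows where `column` is missing or None."""
    if not rows:
        return 0.0
    nulls = sum(1 for r in rows if r.get(column) is None)
    return nulls / len(rows)

now = datetime(2025, 7, 23, 12, 0, tzinfo=timezone.utc)
assert freshness_ok(now - timedelta(minutes=30), timedelta(hours=1), now)

rows = [{"email": "a@x.com"}, {"email": None}, {}, {"email": "b@x.com"}]
print(null_rate(rows, "email"))  # 0.5
```

In production these rules map directly to dbt `not_null` tests and source-freshness checks, with alerts routed per the observability tooling chosen in Section 8.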


B. Before / After Impact Scorecard

Dimension | Current State | Target State | % / Δ Improvement | Evidence / Benchmark Plan
Time to answer “X?” | 2 days manual SQL | <15 min self-serve | 90% faster | Query log baseline + post-rollout tracking
Accuracy of metric Y | ±5% variance vs. finance | <±0.5% | 10× better | Reconciliation scripts, dbt tests
Data freshness | Daily | Hourly | 24× | Pipeline SLA

15. Operational Considerations

  • Runbooks / on-call playbooks
  • SLA/SLO targets
  • Cost estimates & governors (if cloud usage matters)

16. Review Meetings, RFC & Sign-offs

  • Design Review Meeting – date/time, attendees, key outcomes
  • RFC Process – distribution list, comment window, approval criteria (link to RFC doc/thread)
  • Sign-off Checklist – required approvals (Architecture Lead, Data Lead, Security, Product)

17. Meeting Notes

Paste minutes or link to recordings for every design review.

Action-item table drives accountability.

18. Agent Handoff Brief

Purpose: Condense the TDD into agent-ready context for AI coding agents (Cursor Plan Mode, Codex, etc.). Fill this out after the design is approved. For each implementation workstream or milestone, create one of these briefs. You can paste just this section as agent context, or pair it with the relevant architecture section.

Codebase Orientation

  • Repo(s): <repo name and path>
  • Key files / entry points:
    • path/to/relevant/file – what it does
    • path/to/another/file – what it does
  • Reference implementations: Point to existing features or patterns in the codebase that this work should follow.

Scope for This Workstream

What specifically is being built in this phase. Reference the milestone from the Implementation Plan.

Implementation Sequence

Ordered list of what to build, in what order. Each step should be independently testable.

Acceptance Criteria

Concrete, verifiable statements. An agent (or a human) should be able to check each one off.

Key Decisions to Carry Forward

Pull the most important decisions from the ADR Index and Tooling Matrix that the agent needs to respect. Don’t make the agent read the whole TDD — surface the decisions that matter for implementation.

Constraints and Guardrails

  • Don’t touch: files, services, or areas that are out of scope for this workstream
  • Must use: specific libraries, frameworks, patterns, or conventions required
  • Must not: hard constraints (e.g., don’t modify production schemas without migration, don’t change shared dbt macros)
  • Style / conventions: naming patterns, file organization, dbt/SQL style to follow
  • Data-specific: warehouse to target, schema/dataset naming, test coverage requirements

19. Appendices

  • Glossary
  • Reference material
  • Change log

Appendix A — Section 6 Variants (Target Architecture & Design)

Replace your standard Section 6 with one of the variants below, whichever best matches the project type.

6A. Data Warehouse / Lakehouse Migration or Re-Architecture

  • Context & Drivers: Current platform, pain points (cost, performance, governance).
  • Target Stack Diagram: Source → Ingestion → Staging → Modeling → Serving/BI.
  • Environment Strategy: Dev / QA / Prod, branch strategy, CI/CD for dbt/ETL.
  • Migration Path: Parallel run vs. big bang, data backfill plan, cutover criteria.
  • Workload Segmentation: What runs where (compute clusters, warehouses, job queues).
  • Cost/Performance Guardrails: Warehouse sizing, auto-suspend rules, caching strategy.
  • Decommission Plan: Legacy assets, timelines, owners.

6B. CDP / Activation (Reverse ETL) Implementation

  • Identity & Audience Architecture: Identity graph, stitching logic, primary keys.
  • Event & Profile Schemas: Required attributes, PII handling, consent flags.
  • Activation Flows: Data paths from warehouse → CDP → downstream tools (email, ads).
  • Segmentations & Audiences: Definition storage (dbt, CDP UI), refresh cadence.
  • Governance & Opt-Out: Compliance hooks, audit logs, data retention.
  • Performance & Sync SLAs: Latency windows, retry/backoff patterns.
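The retry/backoff pattern noted above for activation syncs is commonly implemented as exponential backoff with jitter. This is a sketch with illustrative defaults (1 s base, 60 s cap), not a prescribed configuration.

```python
import random

# Exponential backoff with full jitter for activation syncs. The base delay,
# cap, and attempt count are illustrative defaults, not requirements.
def backoff_schedule(attempts: int, base: float = 1.0, cap: float = 60.0,
                     seed: int = 0) -> list[float]:
    """Return per-attempt sleep times: uniform(0, min(cap, base * 2^n))."""
    rng = random.Random(seed)  # seeded here only so the sketch is repeatable
    delays = []
    for attempt in range(attempts):
        ceiling = min(cap, base * (2 ** attempt))
        delays.append(rng.uniform(0, ceiling))  # full jitter
    return delays

print(backoff_schedule(5))
```

Jitter matters because many failed syncs retrying in lockstep can re-overload the destination; randomizing within the growing ceiling spreads the load.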

6C. dbt-Centric Modeling & Semantic Layer Build

  • Project Structure: Staging / intermediate / marts folders, packages, macros.
  • Model Dependency Graph: Key lineage blocks; link to dbt docs site.
  • CI/CD & Testing: Gate merges on dbt test success, code owners.
  • Semantic Layer / Metrics: Tooling (MetricFlow, dbt metrics, LookML), how metrics are exposed.
  • Deployment & Scheduling: Orchestrator, run ordering, recovery on failure.

6D. ETL/ELT Pipeline (New or Redesign)

  • Ingestion Patterns: Batch vs. streaming, connectors (Fivetran, Airbyte, custom).
  • Transformation Strategy: Where transforms occur (warehouse vs. Spark vs. Python).
  • Error Handling / Replay: Dead-letter queues, idempotency, late-arriving data.
  • Data Contracts & Schemas: Versioning, schema drift detection, contract enforcement.
  • Observability: Metrics (freshness, row counts), alert routing, lineage capture.
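Idempotency and late-arriving data, called out above, usually reduce to a keyed merge where the newest version wins, so replays are no-ops. The record shape (`id`, `updated_at`) is illustrative.

```python
# Idempotent merge sketch for late-arriving data: re-delivered or out-of-order
# records converge to one row per key, keeping the latest update time.
# Field names (`id`, `updated_at`) are illustrative.
def merge(target: dict, batch: list[dict]) -> dict:
    """Upsert batch rows into target keyed by `id`; newest `updated_at` wins."""
    for row in batch:
        current = target.get(row["id"])
        if current is None or row["updated_at"] >= current["updated_at"]:
            target[row["id"]] = row
    return target

target = {}
batch = [
    {"id": "a", "updated_at": 2, "status": "shipped"},
    {"id": "a", "updated_at": 1, "status": "pending"},  # late arrival, ignored
    {"id": "b", "updated_at": 5, "status": "pending"},
]
merge(target, batch)
merge(target, batch)  # replaying the same batch leaves the state unchanged
print(target["a"]["status"])  # shipped
```

In a warehouse this is a `MERGE` statement or dbt incremental model with a unique key; the property to preserve is that reprocessing a batch never changes the result.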

6E. BI / Visualization Tool Rollout or Migration

  • Semantic Layer Strategy: Central model vs. tool-native models.
  • Dashboard & Report Taxonomy: Core dashboards, ownership, refresh SLAs.
  • Access & Governance: Roles, row-level security, certified vs. ad-hoc content.
  • Performance Optimization: Extracts, caching, aggregate tables, query governance.
  • Change Management & Enablement: Training plan, office hours, documentation hub.

6F. Large Technical Migration / Platform Consolidation

  • Scope of Migration: Systems/processes impacted; what stays vs. moves.
  • Interim Architecture: Transitional states, feature flags, phased cutovers.
  • Data Parity Strategy: Reconciliation processes, golden datasets, sign-off criteria.
  • Risk Matrix: Technical/operational risks, mitigations, contingency plans.
  • Sunset Timeline: Dependencies, communication plan, audit requirements.

6.X Question-Driven Design Lens

  • Primary Questions Enabled: (reference KQA matrix rows this section covers)
  • Speed/Accuracy Impact: What improves and how it’s measured (link to Scorecard).
  • Failure Mode on Questions: If this component fails/degrades, which questions break and what’s the fallback?

6.Y Question-to-Component Mapping

Component / Service | Questions Unlocked | Data / Metric Dependencies | SLA / Perf Target | Owner

Appendix B — Section 7 Variants (Data Modeling Strategy)

Pick the paradigm that matches the project. You can also combine (e.g., Kimball marts on top of a Data Vault).

7A. Dimensional (Kimball) / Star Schema Refresh

  • Design Principles: Conformed dimensions, slowly changing dimensions (SCD type selection).
  • Grain Definition: For each fact table (event, order, session), specify the grain explicitly.
  • Dim & Fact Mapping Table:
Fact/Dim | Business Purpose | Grain | Keys | Source Tables | Notes
  • SCD Handling: Type 1/2 logic, change capture sources.
  • Metric Layer Alignment: How facts feed canonical metrics.
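The Type 2 logic above (close the current row, open a new one) can be sketched compactly. Column names (`valid_from`, `valid_to`, `tier`) are illustrative; `valid_to = None` marks the current version.

```python
from datetime import date

# SCD Type 2 sketch: on an attribute change, close the current row and open a
# new one. `valid_to=None` marks the current row. Column names are illustrative.
def apply_scd2(history: list[dict], key: str, new_attrs: dict,
               effective: date) -> list[dict]:
    """Append-only dimension update; returns the updated history list."""
    for row in history:
        if row["key"] == key and row["valid_to"] is None:
            if all(row.get(k) == v for k, v in new_attrs.items()):
                return history  # no attribute change -> no new version
            row["valid_to"] = effective  # close the current version
            break
    history.append({"key": key, **new_attrs,
                    "valid_from": effective, "valid_to": None})
    return history

hist = []
apply_scd2(hist, "c1", {"tier": "free"}, date(2025, 1, 1))
apply_scd2(hist, "c1", {"tier": "pro"}, date(2025, 6, 1))
current = [r for r in hist if r["valid_to"] is None]
print(current)  # one open row, tier 'pro'
```

dbt snapshots implement the same behavior declaratively; this sketch just makes the close/open mechanics explicit for review.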

7D. Metrics Layer / Semantic Model First

  • Canonical Metrics Inventory: Table of metric name, owner, formula, grain, dimensions.
Metric | Owner | Formula | Grain | Dimensions | Source | Test(s)
  • Tooling Choice & Governance: dbt metrics, Transform, MetricFlow, LookML, etc.
  • Change Control: Versioning metrics, review process, backfill impact.
  • Exposure Mechanisms: SQL, APIs, BI tools, AI agents.

7E. Identity Graph / Customer 360 (CDP/Data Activation)

  • Entity Definitions: Person, account, device, session; how they relate.
  • Resolution Rules: Deterministic vs. probabilistic matching, priority of identifiers.
  • Golden Record Strategy: Where the truth lives, refresh cadence, conflict resolution.
  • Privacy Bucketing & Consent Flags: Data classification embedded in the model.
  • Downstream Contract: What fields are required by activation tools, update SLAs.

7F. Event-Centric Product Analytics Model

  • Event Taxonomy: Naming, required properties, common dimensions (user_id, session_id).
  • Sessionization & Attribution Logic: Windowing rules, referrers, campaign joins.
  • Aggregations & Rollups: Daily active users, retention cohorts, funnels.
  • Schema Evolution Plan: Adding properties, deprecating events, versioning.
  • Testing & QA: Event coverage, volume anomalies, property null rates.
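The windowing rule behind sessionization can be sketched directly. The 30-minute inactivity window below is a common default, not a requirement; pick the window that matches the product.

```python
from datetime import datetime, timedelta

# Sessionization sketch using a 30-minute inactivity window (a common default;
# tune per product). Events must be sorted by timestamp.
def sessionize(timestamps: list[datetime],
               gap: timedelta = timedelta(minutes=30)) -> list[int]:
    """Assign a session index to each event: a new session starts whenever
    the gap since the previous event exceeds the inactivity window."""
    sessions = []
    session_id = 0
    for i, ts in enumerate(timestamps):
        if i > 0 and ts - timestamps[i - 1] > gap:
            session_id += 1
        sessions.append(session_id)
    return sessions

t0 = datetime(2025, 7, 23, 9, 0)
events = [t0, t0 + timedelta(minutes=10), t0 + timedelta(hours=2),
          t0 + timedelta(hours=2, minutes=5)]
print(sessionize(events))  # [0, 0, 1, 1]
```

The warehouse equivalent is a window function (`LAG` over timestamp, cumulative sum of gap flags); testing both against the same fixture events catches drift between definitions.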

Appendix C — Section 6 Variants (Target Architecture & Design for AI Projects)

Pick one of the variants below.

6A. Retrieval-Augmented Generation (RAG) Pipeline

  • Context & Drivers – why pure LLM won’t meet accuracy/compliance; need grounded answers.
  • High-Level Diagram – Query → Retriever → Ranker → LLM (w/ context) → Post-processor.
  • Knowledge Store – vector DB choice, chunking strategy, embeddings model, update cadence.
  • Latency Budget – e.g., 300 ms retrieval, 700 ms generation (90-percentile).
  • Grounding & Citations – how sources are surfaced, confidence scoring, fallback if < threshold.
  • Failure Modes on Questions – stale index, long-tail queries, retrieval miss; mitigation.
  • Eval Hooks – automated factuality/hallucination tests every nightly index build.
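The "fallback if < threshold" behavior above is the key grounding decision: answer only when retrieval confidence clears a bar. This sketch assumes illustrative similarity scores and threshold values.

```python
# Grounding-threshold sketch: use retrieved context only when the best
# similarity score clears a confidence bar, otherwise signal the fallback
# path. Scores, chunk texts, and the 0.75 threshold are illustrative.
def select_context(scored_chunks: list[tuple[str, float]], k: int = 3,
                   threshold: float = 0.75):
    """Return top-k chunk texts if the best score clears the threshold,
    else None (caller routes to the fallback response)."""
    ranked = sorted(scored_chunks, key=lambda c: c[1], reverse=True)
    if not ranked or ranked[0][1] < threshold:
        return None
    return [text for text, _ in ranked[:k]]

chunks = [("refund policy...", 0.91), ("pricing tiers...", 0.64),
          ("sso setup...", 0.80), ("release notes...", 0.40)]
print(select_context(chunks))                   # top-3 grounded context
print(select_context(chunks, threshold=0.95))   # None -> fallback path
```

The threshold itself should come out of the eval harness (recall vs. hallucination trade-off), not be hand-picked.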

6B. Agentic System with Tool Invocation

  • Agent Loop Diagram – Planner → Tool-calling → Memory Update → Critic → Next Action.
  • Tool Registry – JSON schema of actions (name, args, guardrails, auth scope).
  • Memory & Long-Term Context – vector store vs. relational store; eviction strategy.
  • Orchestration Runtime – LangChain, LlamaIndex, custom; concurrency, timeout rules.
  • Safety Layer – output filtering, rate limiting, red-teaming hooks.
  • Question-to-Component Map – which tools answer which business questions.
  • Observability – token/sec, cost per call, success/rollback metrics.
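The tool registry above (name, args, guardrails, auth scope) can be validated before any call executes. The tool names and argument specs below are hypothetical examples, not a real registry.

```python
# Tool-registry sketch matching the "JSON schema of actions" bullet above.
# Tool names, argument specs, and auth scopes are hypothetical examples.
REGISTRY = {
    "lookup_order": {
        "description": "Fetch one order by id",
        "args": {"order_id": str},
        "auth_scope": "orders:read",
    },
    "refund_order": {
        "description": "Issue a refund (guarded action)",
        "args": {"order_id": str, "amount_usd": float},
        "auth_scope": "orders:write",
    },
}

def validate_call(tool: str, args: dict) -> list[str]:
    """Check a proposed tool call against the registry before execution."""
    if tool not in REGISTRY:
        return [f"unknown tool: {tool}"]
    spec = REGISTRY[tool]["args"]
    errors = [f"missing arg: {a}" for a in spec if a not in args]
    errors += [f"unexpected arg: {a}" for a in args if a not in spec]
    errors += [f"bad type for {a}" for a, t in spec.items()
               if a in args and not isinstance(args[a], t)]
    return errors

print(validate_call("refund_order", {"order_id": "o1", "amount_usd": 5.0}))  # []
print(validate_call("refund_order", {"order_id": "o1"}))  # missing amount_usd
```

Rejecting malformed calls at this layer, before auth and execution, keeps guardrail logic in one auditable place.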

6C. Embedded AI Copilot in SaaS Product

  • User Journey Diagram – entry point, context capture, backend calls, UI surfacing.
  • Context Window Assembly – telemetry, recent actions, role/permissions.
  • Personalization Logic – per-user embeddings, on-device vs. server storage.
  • Real-Time Constraints – ≤ 1 s perceived latency; streaming vs. single shot.
  • Fallback UX – graceful degradation when model confidence low.
  • Telemetry for Product Questions – how “Time-to-Answer” & adoption are logged.

6D. Fine-Tuned / Custom LLM Service

  • Base Model Selection – criteria (license, capability, cost).
  • Fine-Tuning Dataset Flow – collection, filtering, dedup, weighting, holdout split.
  • Training Infra – GPUs, parameter-efficient tuning (LoRA, QLoRA), MLOps pipeline.
  • Serving Stack – quantization, tensor parallelism, autoscaling triggers.
  • Versioning & Rollback – traffic shadowing, canary %, safety nets.
  • Evaluation Harness – regression suite on KQA, toxicity, bias, latency, cost.

6E. Multi-Modal AI Assistant

  • Input Modalities – text, image, voice; pre-processing pipelines.
  • Fusion Strategy – encoder sharing, late fusion, routing logic.
  • Real-Time Transcription / OCR – latency and cost gates.
  • Output Rendering – rich snippets, highlighted source images.
  • Accessibility Compliance – captions, alt-text generation.
  • Question Failure Map – how missing modality affects specific questions.

Appendix D — Section 7 Variants (Model & Knowledge Strategy for AI Projects)

Pick one of the variants below.

7A. RAG Knowledge-Base Design

  • Corpus Scope & Owners – docs, tickets, wikis, db records; TTL rules.
  • Chunking & Embeddings Strategy – overlap, window size, model choice, re-embed frequency.
  • Metadata Schema – source, confidence, permissions tag.
  • Index Update Workflow – CDC, webhook triggers, nightly rebuild.
  • Evaluation – recall@k, answer faithfulness, citation coverage.
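The recall@k metric above is worth pinning down precisely: the share of relevant documents that appear in the top-k retrieved results. Document ids here are illustrative.

```python
# recall@k sketch for the evaluation bullet above: fraction of relevant
# documents that appear among the top-k retrieved results.
def recall_at_k(retrieved: list[str], relevant: set[str], k: int) -> float:
    """Size of (relevant intersect top-k retrieved) divided by size of relevant."""
    if not relevant:
        return 0.0
    hits = len(set(retrieved[:k]) & relevant)
    return hits / len(relevant)

retrieved = ["d3", "d1", "d7", "d2", "d9"]
relevant = {"d1", "d2", "d5"}
print(recall_at_k(retrieved, relevant, 3))  # 0.333... (only d1 in top 3)
print(recall_at_k(retrieved, relevant, 5))  # 0.666... (d1 and d2 found)
```

Tracking recall@k per index rebuild catches regressions from chunking or embedding-model changes before they surface as answer-quality complaints.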

7B. Prompt & System-Message Library

  • Prompt Catalog – table of prompt ID, purpose, target model, guardrails.
  • Parameterization Rules – slots, variable escape/escaping, defaults.
  • Versioning & A/B Testing – traffic split strategy, success metrics.
  • Observability – prompt-level latency, token usage, cost.

7C. Tool / Action Schema for Agents

  • JSON/YAML Contract – name, description, args, required auth, rate limits.
  • Registration & Discovery – dynamic loading vs. static registry.
  • Safety Scopes – sandboxing, dry-run, audit logging.
  • Mapping to Business Questions – which tool enables which KQA row.

7D. Evaluation & Feedback Dataset Design

  • Golden Dataset – representative user questions, expected answers, edge cases.
  • Scoring Rubrics – factuality, helpfulness, brevity, style.
  • Automated Graders – LLM-as-judge vs. deterministic scripts.
  • Human-in-the-Loop Pipeline – sampling %, UI for raters, adjudication.
  • Continuous Learning Loop – when data promotes to fine-tuning set.

7E. Vector-Store & Embedding Governance

  • Embedding Model Lifecycle – upgrade cycle, backfill policy.
  • Similarity Metrics – cosine vs. IP vs. dot-product; threshold tuning.
  • Namespace & ACL Strategy – per-tenant isolation, encryption at rest.
  • Cost Controls – shard sizing, cold-storage tiers, deletion hooks.
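For the similarity-metric bullet above, one useful fact for threshold tuning: cosine similarity is the dot product divided by vector magnitudes, so on unit-normalized embeddings the two metrics agree exactly. The vectors below are toy values for illustration.

```python
import math

# Similarity-metric sketch: cosine similarity equals the dot product once
# vectors are unit-normalized, so both metrics rank results identically on
# normalized embeddings. Vectors are toy examples.
def dot(a: list[float], b: list[float]) -> float:
    return sum(x * y for x, y in zip(a, b))

def cosine(a: list[float], b: list[float]) -> float:
    return dot(a, b) / (math.sqrt(dot(a, a)) * math.sqrt(dot(b, b)))

def unit(v: list[float]) -> list[float]:
    norm = math.sqrt(dot(v, v))
    return [x / norm for x in v]

a, b = [3.0, 4.0], [4.0, 3.0]
print(cosine(a, b))           # 24 / 25 = 0.96
print(dot(unit(a), unit(b)))  # same 0.96 once vectors are normalized
```

Practical consequence: if the embedding model emits normalized vectors, a cheaper dot-product index gives the same ranking as cosine, and thresholds transfer between the two.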

7F. Fine-Tuning / RLHF Dataset Curation

  • Source Mix – customer chats, docs, synthetic Q&A.
  • Filtering – PII removal, toxicity filter, language detection.
  • Label Schema – preference pairs, scorecards, multi-choice.
  • Data Weighting & Sampling – boost rare intents, down-weight low quality.
  • Ethics & Bias Review – checklists, external audit steps.

Key Questions & Answers (KQA) Matrix & Impact Scorecard

(Mandatory in every variant – paste near the top of Section 6 or 7)

Business Question | Metric / Output | Latency SLA | Accuracy Target | Component Path | Eval Test

Dimension | Current | Target | Δ Improvement | Evidence Plan
Time-to-Answer | | | |
Answer Accuracy | | | |
Cost / 1k Tokens | | | |