AI, Automation & Platform Engineering Projects Technical Design Document Template (TDD)

0. Document Control

Version | Date | Author | Reviewer(s) | Linear Project | Status
0.1 | YYYY_MM_DD | | | |

1. Overview

  • Purpose — why does this project exist?
  • Scope & Goals — what’s in / out, success metrics.
  • Stakeholders & Roles — engineering, product, client, ops.
  • Context diagram — one-slide “where this fits” view.

2. Problem Statement & Requirements

  • Business Problem / User Stories — narrative of current pain; tie to measurable outcomes.
  • Use Cases — what tasks the system must perform: questions it must answer, workflows it must orchestrate, services it must expose, user experiences it must deliver; link to PRDs/OKRs.
  • Functional Requirements — e.g., answer user queries from a knowledge base, expose a REST/GraphQL API, build a customer-facing feature, execute multi-step workflows, integrate with external services via MCP.
  • Non-Functional Requirements (NFRs) — latency/throughput, cost budgets (tokens, infrastructure), security/compliance, observability, scalability, SLAs.

3. Current State Assessment

Include sections relevant to the project. Not every project will touch every area.

  • Application Services & APIs — existing backend services, APIs, frontend apps, languages/frameworks, database layer.
  • Existing AI Capabilities — models in use, fine-tuned models, current accuracy/quality baselines.
  • Agent Frameworks & Orchestration — current tooling (LangChain, LlamaIndex, custom), runtime patterns.
  • Workflow Automation — existing automations (n8n, Windmill, cron jobs, custom scripts), trigger patterns.
  • MCP Servers & Integrations — deployed MCP servers, tool registries, API integrations.
  • LLM Providers & API Access — which providers, rate limits, spend, model versions pinned.
  • Vector Stores & Knowledge Bases — current retrieval infrastructure, embedding models, index sizes.
  • Prompt Management — how prompts are versioned, stored, and deployed today.
  • Infrastructure & Deployment — hosting platforms (Railway, Heroku, GCP, AWS, etc.), CI/CD, environments, domain/DNS, secrets management.
  • Observability & Monitoring — tracing (LangFuse, LangSmith), logging, APM, alerting, cost dashboards.
  • Known Constraints / Debt — brittle systems, missing tests/evals, provider lock-in, cost overruns, legacy code.
  • [ARCHITECTURE DIAGRAM]

4. Research & Discovery

Goal: expose all due diligence so reviewers trust the design decisions.

Link | Type (spike, RFC, benchmark, eval, article) | Key takeaway
  • Model benchmarks & eval results (if AI project)
  • Library / framework comparisons, proof-of-concept results
  • Competitive / prior-art summary
  • Assumptions & constraints
  • Open questions

5. Target Architecture & Design

  • High-Level Architecture Diagram — show the full system: clients/UIs → services/APIs → AI/model layer → integrations/tools → data stores.
  • Component Breakdown — purpose, inputs/outputs, dependencies for each component (services, agents, RAG pipelines, MCP servers, workflow automations, databases, frontend apps).
  • Integration Architecture — how components connect (MCP protocol, REST/GraphQL APIs, webhooks, event buses, message queues, database connections).
  • Sequence / Activity Diagrams — critical flows (request lifecycles, agent reasoning loops, RAG retrieval-to-response, workflow trigger chains, multi-turn conversations, auth flows).
  • Interfaces & Contracts — API specs (OpenAPI/GraphQL), MCP tool schemas, webhook payloads, database schemas, prompt templates with variable contracts.
  • Deployment View — environments, hosting platform selection (Railway, Heroku, GCP, AWS, etc.), CI/CD pipelines, feature flags, secrets management, domain/DNS.
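Interface contracts are easiest to review when the TDD includes a concrete shape. Below is a minimal sketch of what an MCP tool contract plus a toy argument check could look like; the tool name, fields, and validation logic are all illustrative placeholders, not a prescribed schema:

```python
# Hypothetical MCP tool contract: name, description, and a JSON-Schema-style
# argument spec. A real server would register this with its MCP runtime.
SEARCH_TICKETS_TOOL = {
    "name": "search_tickets",
    "description": "Search support tickets by keyword and status.",
    "input_schema": {
        "type": "object",
        "properties": {
            "query": {"type": "string"},
            "status": {"type": "string", "enum": ["open", "closed", "any"]},
            "limit": {"type": "integer", "minimum": 1, "maximum": 50},
        },
        "required": ["query"],
    },
}


def validate_args(tool, args):
    """Minimal check of required keys and primitive types against the spec."""
    schema = tool["input_schema"]
    type_map = {"string": str, "integer": int, "object": dict}
    for key in schema.get("required", []):
        if key not in args:
            return False, f"missing required argument: {key}"
    for key, value in args.items():
        spec = schema["properties"].get(key)
        if spec is None:
            return False, f"unknown argument: {key}"
        if not isinstance(value, type_map[spec["type"]]):
            return False, f"wrong type for {key}"
    return True, "ok"
```

Pinning contracts like this in the TDD lets reviewers catch mismatches (missing auth scopes, loose types) before implementation starts.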

A. Key Questions & Answers (KQA) Matrix

Business Question | Metric(s) / Output | Required Context & Data | System Path (Input → Agent/Model → Tool → Output) | Latency / Freshness SLA | Accuracy Target & Eval(s)

6. AI Model & Knowledge Strategy

Include this section when the project involves AI/ML components. For purely traditional dev projects, note “N/A — no AI components” and move on.

  • Model Selection Strategy — which LLMs/embedding models for which tasks; cost vs. capability tradeoffs; fallback chains.
  • Prompt Engineering Approach — system prompts, few-shot examples, chain-of-thought, structured output schemas.
  • Knowledge Management — RAG architecture, vector store design, chunking strategy, embedding model selection, index update cadence.
  • Agent Design Patterns — tool calling conventions, memory management (short-term / long-term), planning strategies, error recovery.
  • Workflow Automation Design — trigger logic, branching/conditional paths, retry/error handling, human-in-the-loop gates.
Capability | Model / Approach | Why | Fallback | Cost Tier
Primary reasoning
Embeddings
Classification / routing
Code generation
Summarization
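The "Fallback" column above implies an ordered chain at runtime. One way to sketch it, with model names and call functions as placeholders for real provider clients:

```python
def call_with_fallback(prompt, chain):
    """Try each (model_name, call_fn) in order; return the first success.

    `call_fn` is any callable that takes a prompt and returns text, raising
    on provider errors. Names and callables here are illustrative.
    """
    failures = []
    for model_name, call_fn in chain:
        try:
            return {"model": model_name, "output": call_fn(prompt)}
        except Exception as exc:  # timeout, rate limit, provider outage, etc.
            failures.append((model_name, repr(exc)))
    raise RuntimeError(f"all models in chain failed: {failures}")


# Example wiring (replace with real client calls):
# result = call_with_fallback("Summarize ...", [
#     ("primary-large", call_primary_model),
#     ("fallback-small", call_cheap_model),
# ])
```

Recording which model actually served each request also feeds the cost-attribution dashboards in Section 13.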

7. Tooling & Technology Decisions Matrix

Evaluate and justify tools for each domain relevant to the project:

Application & Infrastructure

  • Languages & frameworks (Python, TypeScript/Node, Next.js, FastAPI, etc.)
  • Databases (Postgres, Supabase, Redis, MongoDB, etc.)
  • Deployment & hosting (Railway, Heroku, GCP, AWS, Vercel, etc.)
  • CI/CD & DevOps (GitHub Actions, Docker, Terraform, etc.)
  • Auth & identity (Supabase Auth, Clerk, Auth0, custom)

AI & Automation (when applicable)

  • LLM providers (OpenAI, Anthropic, Google, open-source / self-hosted)
  • Agent frameworks & orchestration (LangChain, LlamaIndex, CrewAI, custom)
  • Vector stores & retrieval (TurboPuffer, Pinecone, Weaviate, pgvector, Qdrant)
  • Workflow automation (n8n, Windmill, Temporal)
  • MCP framework & runtime
  • Observability & tracing (LangFuse, LangSmith, custom)
  • Evaluation frameworks (Ragas, DeepEval, custom harness)
  • Prompt management & versioning
Domain | Option(s) Considered | Selected | Criteria / Score | Notes

8. Decision Log & Rationale (ADR Index)

Record each significant trade-off as an Architecture Decision Record (ADR). Link them here or inline.

# | Choice | Options Considered | Decision | Rationale | Impact / Risks

9. Security, Safety & Guardrails

General Security (all projects)

  • Authentication & Authorization — auth strategy, role-based access, API key management, OAuth flows.
  • Input Validation & Sanitization — injection prevention (SQL, XSS, etc.), request validation.
  • PII & Data Handling — how sensitive data is handled in transit, at rest, and in logs; redaction strategy.
  • Rate Limiting & Abuse Prevention — per-user/per-tenant limits, escalation rules.
  • Compliance & Regulatory — applicable regulations, data residency, audit trail requirements.
  • Secrets Management — how credentials, API keys, and tokens are stored and rotated.

AI-Specific Safety (when AI components are involved)

  • Output Validation — content filtering, format enforcement, confidence thresholds, hallucination detection.
  • Prompt Injection Defenses — input guardrails, context length management, system prompt protection.
  • Bias & Fairness — assessment methodology, testing cadence, mitigation strategies.
  • Human-in-the-Loop Requirements — which actions require human approval, escalation flows, override mechanisms.
  • Red-Teaming Plan — adversarial testing scope, cadence, responsible disclosure.
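The output-validation bullet above can be made concrete with a small gate that runs before a model response reaches users or downstream tools. This is a sketch under assumed conventions (JSON responses carrying a `confidence` field, a 0.7 threshold); the real checks and thresholds belong in Section 11's eval criteria:

```python
import json


def validate_model_output(raw, required_keys, min_confidence=0.7):
    """Gate a model response: returns (payload, None) on success or
    (None, reason) on rejection, so the caller can retry, fall back,
    or escalate to a human."""
    try:
        payload = json.loads(raw)
    except json.JSONDecodeError:
        return None, "malformed_json"
    missing = [k for k in required_keys if k not in payload]
    if missing:
        return None, "missing_keys:" + ",".join(missing)
    if payload.get("confidence", 0.0) < min_confidence:
        return None, "low_confidence"
    return payload, None
```

Rejection reasons should be logged and counted; a spike in `low_confidence` or `malformed_json` is often the first signal of a prompt or model regression.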

10. Implementation & Migration Plan

  • Milestones / phases (POC → pilot → production rollout)
  • Workstream ownership
  • Linear linkage to Project
  • Migration / cutover strategy — prompt migration, model swaps, A/B traffic shifting, rollback plans
  • Decommission plan — what legacy systems/workflows retire and when

B. Before / After Impact Scorecard

Dimension | Current State | Target State | % / Delta Improvement | Evidence / Benchmark Plan
Time to answer “X?” | Manual research, hours/days | < 30 sec self-serve | 95%+ faster | Query log baseline + post-rollout tracking
Task completion accuracy | Manual, ~80% consistency | > 95% eval accuracy | ~15 pt improvement | Golden dataset evals, human review sample
Workflow cycle time | 4 hrs manual, 3 handoffs | < 5 min automated | 50x faster | Timing instrumentation
Cost per interaction | $X (manual labor) | $Y (tokens + infra) | Z% reduction | LangFuse cost tracking

11. Testing & Validation

Standard Testing (all projects)

  • Unit Tests — coverage targets, framework (pytest, Jest, Vitest, etc.), mocking strategy.
  • Integration Tests — API contract tests, database integration, service-to-service calls.
  • End-to-End Tests — critical user paths, browser automation (if frontend), workflow paths.
  • Performance / Load Testing — response time benchmarks, throughput under load, stress testing.
  • Regression Testing — test suites that run on dependency updates, code changes, deploys.

AI Evaluation (when AI components are involved)

  • Eval Framework — tooling (Ragas, DeepEval, custom), golden datasets, automated vs. human grading.
  • Accuracy & Faithfulness Testing — factuality checks, hallucination detection, citation verification.
  • Safety Testing — red-teaming results, adversarial input testing, guardrail validation.
  • Model Regression — test suites that run on model upgrades or prompt changes.
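The golden-dataset and model-regression bullets above reduce to a simple gate: score the current model/prompt against fixed (question, expected) pairs and block the deploy if accuracy drops below threshold. A minimal sketch, with the predict function and thresholds as placeholders:

```python
def eval_accuracy(golden, predict_fn):
    """Fraction of golden (question, expected) pairs the model answers exactly.

    Real harnesses usually add fuzzy matching or LLM-as-judge scoring;
    exact match keeps this sketch deterministic.
    """
    correct = sum(1 for question, expected in golden if predict_fn(question) == expected)
    return correct / len(golden)


def gate_deploy(golden, predict_fn, threshold=0.95):
    """True only if accuracy clears the release threshold from the test matrix."""
    return eval_accuracy(golden, predict_fn) >= threshold
```

Run the same gate on every prompt change and model upgrade, not just code deploys, so regressions surface before users see them.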

Validation & Acceptance

  • User Acceptance Testing — pilot user feedback loops, A/B test design, success criteria.
  • Staging / QA Environments — how pre-production mirrors production, data seeding strategy.
Test Type | Scope | Tool / Method | Pass Criteria | Cadence
Unit | Functions, modules | | > X% coverage | Every commit
Integration | API contracts, service calls | | All contracts pass | Every PR
E2E | Critical user paths | | All paths green | Every deploy
Eval suite | Golden dataset, N questions | | > X% accuracy | Every deploy
Latency | P50 / P90 / P99 | | < Y ms | Every deploy
Safety | Adversarial inputs, N scenarios | | 0 critical failures | Weekly / release
Regression | Prompt + model + code changes | | No degradation > Z% | Every change

12. Operational Considerations

  • Runbooks / on-call playbooks (service outage, model degradation, provider outage, cost spike)
  • SLA/SLO targets (availability, latency, accuracy, error rates)
  • Incident response — how to roll back a deploy, revert a model/prompt, disable a feature/tool
  • Scaling strategy — autoscaling triggers, queue management, circuit breakers, database connection pooling
  • Backup & recovery — database backups, disaster recovery, data export strategy
  • Logging & alerting — structured logging, alert routing, escalation paths

13. Cost Management

Infrastructure Costs

  • Hosting & Compute — platform costs (Railway, Heroku, GCP, AWS), instance sizing, autoscaling impact.
  • Database & Storage — managed database costs, storage tiers, backup costs.
  • Third-Party Services — SaaS/API costs, per-call pricing, volume commitments.

AI-Specific Costs (when applicable)

  • Token Usage Projections — estimated tokens per interaction, daily/monthly volume, growth assumptions.
  • Cost Per Interaction Targets — budget per query/task, broken down by model tier.
  • Model Tier Strategy — routing cheap tasks to smaller/cheaper models, reserving expensive models for complex reasoning.
  • Caching Strategy — semantic caching, response caching, embedding cache; expected hit rates.
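Token projections are simple arithmetic, but writing the formula down avoids disagreements later. A sketch; all prices and volumes below are placeholders to replace with the provider's rate card and your own traffic estimates:

```python
def monthly_token_cost(interactions_per_day, input_tokens, output_tokens,
                       input_price_per_1k, output_price_per_1k, days=30):
    """Project monthly LLM spend from per-interaction token counts.

    Separate input/output prices matter: output tokens are typically
    several times more expensive than input tokens.
    """
    per_interaction = (input_tokens / 1000) * input_price_per_1k \
                    + (output_tokens / 1000) * output_price_per_1k
    return round(interactions_per_day * days * per_interaction, 2)


# Illustrative: 1,000 interactions/day, 2k input + 500 output tokens each,
# at hypothetical $0.003 / $0.015 per 1k tokens.
estimate = monthly_token_cost(1000, 2000, 500, 0.003, 0.015)
```

Multiply out the same formula per model tier to justify the routing strategy above (cheap models for cheap tasks).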

Governance

  • Cost Monitoring & Alerting — dashboards, alert thresholds, per-tenant/per-feature cost attribution.
  • Budget Governance — hard caps, soft warnings, escalation when thresholds are breached.
Cost Component | Unit | Estimated Volume | Unit Cost | Monthly Estimate | Notes
Hosting / compute | per instance/hour | | | |
Database | per plan / GB | | | |
LLM inference (primary) | per 1k tokens | | | |
LLM inference (fallback) | per 1k tokens | | | |
Embedding generation | per 1k tokens | | | |
Vector store | per GB / per query | | | |
Third-party APIs / SaaS | per call / month | | | |

14. Review, RFC & Sign-offs

  • Design Review Meeting — date/time, attendees, key outcomes.
  • RFC Process — distribution list, comment window, approval criteria (link to RFC doc/thread).
  • Sign-off Checklist — required approvals (AI Lead, Engineering Lead, Security, Product).

15. Meeting Notes

Paste minutes or link to recordings for every design review.

Action-item table drives accountability.

16. Agent Handoff Brief

Purpose: Condense the TDD into agent-ready context for AI coding agents (Cursor Plan Mode, Codex, etc.). Fill this out after the design is approved. For each implementation workstream or milestone, create one of these briefs. You can paste just this section as agent context, or pair it with the relevant architecture section.

Codebase Orientation

  • Repo(s): <repo name and path>
  • Key files / entry points:
    • path/to/relevant/file — what it does
    • path/to/another/file — what it does
  • Reference implementations: Point to existing features or patterns in the codebase that this work should follow.

Scope for This Workstream

What specifically is being built in this phase. Reference the milestone from Section 10.

Implementation Sequence

Ordered list of what to build, in what order. Each step should be independently testable.

Acceptance Criteria

Concrete, verifiable statements. An agent (or a human) should be able to check each one off.

Key Decisions to Carry Forward

Pull the most important decisions from the ADR Index (Section 8) and Tooling Matrix (Section 7) that the agent needs to respect. Don’t make the agent read the whole TDD — surface the decisions that matter for implementation.

Constraints and Guardrails

  • Don’t touch: files, services, or areas that are out of scope for this workstream
  • Must use: specific libraries, frameworks, patterns, or conventions required
  • Must not: hard constraints (e.g., no new dependencies without approval, don’t modify shared schemas)
  • Style / conventions: naming patterns, file organization, code style to follow
  • AI-specific: model provider to use, prompt patterns to follow, eval thresholds that must pass

17. Appendices

  • Glossary
  • Reference material
  • Change log

Appendix A — Section 5 Variants (Target Architecture & Design)

Replace your standard Section 5 with one (or combine relevant pieces) of the following variants based on project type.

5A. Retrieval-Augmented Generation (RAG) Pipeline

  • Context & Drivers — why pure LLM won’t meet accuracy/compliance; need grounded answers.
  • High-Level Diagram — Query → Retriever → Ranker → LLM (w/ context) → Post-processor.
  • Knowledge Store — vector DB choice, chunking strategy, embeddings model, update cadence.
  • Latency Budget — e.g., 300 ms retrieval, 700 ms generation (90th percentile).
  • Grounding & Citations — how sources are surfaced, confidence scoring, fallback if below threshold.
  • Failure Modes on Questions — stale index, long-tail queries, retrieval miss; mitigation.
  • Eval Hooks — automated factuality/hallucination tests every nightly index build.
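The grounding-and-fallback bullets above can be sketched as a single control-flow function. The retrieval/generation callables, score field, and 0.75 threshold are all assumptions standing in for the real pipeline:

```python
def grounded_answer(query, retrieve_fn, generate_fn, min_score=0.75):
    """Answer only from retrieved context; fall back when retrieval is weak.

    `retrieve_fn(query)` is assumed to return dicts with "text", "source",
    and a similarity "score"; `generate_fn(query, context)` returns text.
    """
    hits = [h for h in retrieve_fn(query) if h["score"] >= min_score]
    if not hits:
        # Retrieval miss: surface a fallback instead of letting the
        # model answer ungrounded (the failure mode called out above).
        return {"answer": None, "citations": [], "fallback": "escalate"}
    context = "\n\n".join(h["text"] for h in hits)
    return {
        "answer": generate_fn(query, context),
        "citations": [h["source"] for h in hits],
        "fallback": None,
    }
```

The `citations` list is what the UI surfaces as sources, and the `fallback` branch is what the eval hooks should exercise with long-tail queries.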

5B. Agentic System with Tool Invocation

  • Agent Loop Diagram — Planner → Tool-calling → Memory Update → Critic → Next Action.
  • Tool Registry — JSON schema of actions (name, args, guardrails, auth scope).
  • Memory & Long-Term Context — vector store vs. relational store; eviction strategy.
  • Orchestration Runtime — LangChain, LlamaIndex, custom; concurrency, timeout rules.
  • Safety Layer — output filtering, rate limiting, red-teaming hooks.
  • Question-to-Component Map — which tools answer which business questions.
  • Observability — token/sec, cost per call, success/rollback metrics.

5C. Embedded AI Copilot in SaaS Product

  • User Journey Diagram — entry point, context capture, backend calls, UI surfacing.
  • Context Window Assembly — telemetry, recent actions, role/permissions.
  • Personalization Logic — per-user embeddings, on-device vs. server storage.
  • Real-Time Constraints — < 1 s perceived latency; streaming vs. single shot.
  • Fallback UX — graceful degradation when model confidence is low.
  • Telemetry for Product Questions — how “Time-to-Answer” and adoption are logged.

5D. Fine-Tuned / Custom LLM Service

  • Base Model Selection — criteria (license, capability, cost).
  • Fine-Tuning Dataset Flow — collection, filtering, dedup, weighting, holdout split.
  • Training Infra — GPUs, parameter-efficient tuning (LoRA, QLoRA), MLOps pipeline.
  • Serving Stack — quantization, tensor parallelism, autoscaling triggers.
  • Versioning & Rollback — traffic shadowing, canary %, safety nets.
  • Evaluation Harness — regression suite on KQA, toxicity, bias, latency, cost.

5E. Multi-Modal AI Assistant

  • Input Modalities — text, image, voice; pre-processing pipelines.
  • Fusion Strategy — encoder sharing, late fusion, routing logic.
  • Real-Time Transcription / OCR — latency and cost gates.
  • Output Rendering — rich snippets, highlighted source images.
  • Accessibility Compliance — captions, alt-text generation.
  • Question Failure Map — how missing modality affects specific questions.

5F. MCP Server & Tool Platform

  • MCP Server Architecture — transport layer, session management, capability negotiation.
  • Tool Schema Design — JSON schema definitions, argument validation, auth scoping.
  • Resource & Prompt Exposure — what resources are served, prompt templates provided.
  • Multi-Tenant Isolation — per-client/per-user scoping, credential management.
  • Composability — how multiple MCP servers are composed for a single agent.
  • Deployment & Versioning — server lifecycle, backward compatibility, discovery mechanisms.

5G. Workflow Automation & Integration Platform

  • Workflow Engine — n8n, Windmill, Temporal; selection rationale.
  • Trigger Taxonomy — webhooks, schedules, event-driven, manual; trigger-to-workflow mapping.
  • Step Design — idempotency, retry logic, timeout handling, error routing.
  • Integration Registry — external systems connected, auth methods, rate limits per integration.
  • Human-in-the-Loop Gates — approval steps, escalation paths, notification channels.
  • Monitoring & Alerting — execution dashboards, failure alerts, SLA tracking.
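The step-design bullets (idempotency, retries, error routing) can be condensed into one executor sketch. The in-memory `completed` set stands in for a durable idempotency store, and the delays are shortened for illustration:

```python
import time


def run_step(step_fn, payload, idempotency_key, completed,
             max_retries=3, base_delay=0.01):
    """Execute one workflow step with an idempotency check and
    exponential backoff; exhausted retries route to a dead letter.

    Returns "skipped", "done", or "dead_letter" so the engine can
    branch (the error-routing bullet above).
    """
    if idempotency_key in completed:
        return "skipped"  # replayed trigger: do not re-execute side effects
    for attempt in range(max_retries):
        try:
            step_fn(payload)
            completed.add(idempotency_key)
            return "done"
        except Exception:
            if attempt == max_retries - 1:
                return "dead_letter"  # hand off to error routing / alerting
            time.sleep(base_delay * (2 ** attempt))
```

Engines like n8n, Windmill, and Temporal provide these primitives natively; the sketch is only to make the required semantics explicit for review.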

5H. API / Backend Service

  • Service Architecture — monolith vs. microservices, request lifecycle, middleware stack.
  • API Design — REST vs. GraphQL, versioning strategy, pagination, error contract.
  • Database Design — schema (ERD), ORM/query layer, migrations strategy, indexing plan.
  • Auth & Permissions — authentication method, authorization model (RBAC, ABAC), token management.
  • Background Jobs / Queues — async processing, job scheduling, retry/dead-letter handling.
  • Caching Layer — what’s cached, TTL strategy, invalidation rules.
  • Third-Party Integrations — external APIs consumed, webhook handling, circuit breakers.
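For the API-design bullet, the error contract is worth pinning down early so every endpoint fails the same way. A minimal sketch of one reasonable envelope convention (field names are illustrative, not a fixed standard):

```python
def error_response(status_code, error_code, message, details=None):
    """Uniform error envelope: a machine-readable code for clients to
    branch on, a human-readable message, and optional structured details."""
    return {
        "status": status_code,
        "error": {
            "code": error_code,
            "message": message,
            "details": details or {},
        },
    }
```

Documenting this once in the OpenAPI/GraphQL spec keeps client retry logic and frontend error handling consistent across endpoints.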

5I. Frontend / Product Feature

  • User Journey & Wireframes — key screens, user flows, interaction patterns.
  • Component Architecture — framework (Next.js, React, etc.), component hierarchy, state management.
  • API Integration Layer — how the frontend talks to backend(s), data fetching strategy, optimistic updates.
  • Performance Budget — bundle size targets, Core Web Vitals targets, lazy loading strategy.
  • Accessibility — WCAG compliance level, testing approach.
  • Deployment & CDN — hosting (Vercel, Railway, etc.), edge caching, preview deployments.

5.X Question-Driven Design Lens

  • Primary Questions Enabled — (reference KQA matrix rows this section covers).
  • Speed/Accuracy Impact — what improves and how it’s measured (link to Scorecard).
  • Failure Mode on Questions — if this component fails/degrades, which questions break and what’s the fallback?

5.Y Question-to-Component Mapping

Component / ServiceQuestions UnlockedModel / Tool DependenciesSLA / Perf TargetOwner

Appendix B — Section 6 Variants (AI Model & Knowledge Strategy)

Pick the paradigm that matches the project. You can combine approaches (e.g., RAG knowledge base with a prompt library and eval dataset).

6A. RAG Knowledge-Base Design

  • Corpus Scope & Owners — docs, tickets, wikis, DB records; TTL rules.
  • Chunking & Embeddings Strategy — overlap, window size, model choice, re-embed frequency.
  • Metadata Schema — source, confidence, permissions tag.
  • Index Update Workflow — CDC, webhook triggers, nightly rebuild.
  • Evaluation — recall@k, answer faithfulness, citation coverage.
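The chunking bullet above is where most RAG quality is won or lost, so the TDD should state the exact windowing rule. A fixed-window sketch with overlap; the 500/100 character sizes are starting points to tune, not recommendations:

```python
def chunk_text(text, window=500, overlap=100):
    """Split text into fixed-size windows that overlap by `overlap`
    characters, so sentences straddling a boundary appear in both chunks.

    Production pipelines often chunk on token or sentence boundaries
    instead; characters keep this sketch dependency-free.
    """
    if overlap >= window:
        raise ValueError("overlap must be smaller than window")
    step = window - overlap
    return [text[start:start + window]
            for start in range(0, max(len(text) - overlap, 1), step)]
```

Record the chosen window/overlap in the metadata schema so re-embedding runs are reproducible.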

6B. Prompt & System-Message Library

  • Prompt Catalog — table of prompt ID, purpose, target model, guardrails.
  • Parameterization Rules — slots, variable escaping, defaults.
  • Versioning & A/B Testing — traffic split strategy, success metrics.
  • Observability — prompt-level latency, token usage, cost.
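The catalog, slot, and default rules above can be sketched with the standard library's `string.Template`. The prompt ID, text, and defaults below are hypothetical catalog entries, not a mandated format:

```python
from string import Template

# Hypothetical catalog entry; real catalogs would live in versioned storage.
PROMPT_CATALOG = {
    "support_answer_v2": {
        "template": Template(
            "You are a support assistant for $product.\n"
            "Answer only from the context below; say so if it is missing.\n\n"
            "Context:\n$context\n\nQuestion: $question"
        ),
        "defaults": {"product": "Acme"},
    },
}


def render_prompt(prompt_id, **slots):
    """Fill a catalog prompt; a missing required slot raises KeyError,
    so bad call sites fail fast instead of sending a broken prompt."""
    entry = PROMPT_CATALOG[prompt_id]
    return entry["template"].substitute({**entry["defaults"], **slots})
```

Versioned IDs like `support_answer_v2` are what make the A/B-testing and per-prompt observability bullets above possible: every trace can be tagged with the exact prompt version that produced it.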

6C. Tool / Action Schema for Agents

  • JSON/YAML Contract — name, description, args, required auth, rate limits.
  • Registration & Discovery — dynamic loading vs. static registry.
  • Safety Scopes — sandboxing, dry-run, audit logging.
  • Mapping to Business Questions — which tool enables which KQA row.

6D. Evaluation & Feedback Dataset Design

  • Golden Dataset — representative user questions, expected answers, edge cases.
  • Scoring Rubrics — factuality, helpfulness, brevity, style.
  • Automated Graders — LLM-as-judge vs. deterministic scripts.
  • Human-in-the-Loop Pipeline — sampling %, UI for raters, adjudication.
  • Continuous Learning Loop — when data promotes to fine-tuning set.

6E. Vector-Store & Embedding Governance

  • Embedding Model Lifecycle — upgrade cycle, backfill policy.
  • Similarity Metrics — cosine vs. dot product vs. Euclidean distance; threshold tuning.
  • Namespace & ACL Strategy — per-tenant isolation, encryption at rest.
  • Cost Controls — shard sizing, cold-storage tiers, deletion hooks.

6F. Fine-Tuning / RLHF Dataset Curation

  • Source Mix — customer chats, docs, synthetic Q&A.
  • Filtering — PII removal, toxicity filter, language detection.
  • Label Schema — preference pairs, scorecards, multi-choice.
  • Data Weighting & Sampling — boost rare intents, down-weight low quality.
  • Ethics & Bias Review — checklists, external audit steps.

6G. Workflow & Automation Recipe Design

  • Recipe Catalog — table of automation name, trigger, steps, owner, frequency.
  • Input/Output Contracts — expected payloads, schema validation, error shapes.
  • Dependency Map — which automations depend on which services/APIs/models.
  • Testing Strategy — dry-run modes, mock integrations, snapshot testing.
  • Versioning & Rollback — recipe version control, staged rollouts, instant rollback.

Key Questions & Answers (KQA) Matrix & Impact Scorecard

(Mandatory in every variant — paste near the top of Section 5 or 6)

Business Question | Metric / Output | Latency SLA | Accuracy Target | Component Path | Eval Test

Dimension | Current | Target | Delta Improvement | Evidence Plan
Time-to-Answer | | | |
Answer Accuracy | | | |
Cost / 1k Tokens | | | |
Workflow Cycle Time | | | |