Data Engineering Assessment — Audit Checklist

Use this checklist when auditing the data-engineering-assessment repo. Complete it with Awaish/Demi during or after the repo review. Reference: the AI challenge README for the target structure.


1. Inventory (what exists today)

| Item | Present? | Notes |
| --- | --- | --- |
| README with overview and objective | ☐ Yes ✓ No | README is minimal (one line + links). |
| Clear instructions for the candidate | ✓ Yes ☐ No | Full instructions in CHALLENGE.md (ingestion, RBAC, dbt, orchestration). |
| Tasks (GitHub issues or ISSUE_TASKS.md) | ✓ Yes ☐ No | CHALLENGE.md = task list (sections 2–5 + deliverables checklist). |
| Sample data or code / seed files | ✓ Yes ☐ No | DATA/: 3 JSONL (customers, orders, products) + 3 METADATA_*.md. |
| Run/validation (e.g. DuckDB, tests, script) | ✓ Yes ☐ No | Candidate runs Airbyte, dbt, GitHub Actions; dbt test required. |
| Time expectation stated | ☐ Yes ✓ No | Not stated. |
| Submission steps (fork → branch → PR) | ☐ Yes ✓ No | Not in README. |
| Evaluation criteria table (like AI challenge) | ☐ Yes ✓ No | Not present. |
| Contact/support note for candidates | ☐ Yes ✓ No | Not present. |
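The inventory above can be spot-checked mechanically. A minimal sketch of an audit helper, assuming the file layout described in the Notes column (exact paths such as `DATA/customers.jsonl` are inferred from this checklist, not confirmed against the repo):

```python
from pathlib import Path

# Expected artifacts, inferred from the inventory table above.
# These paths are assumptions; adjust to the repo's actual layout.
EXPECTED = [
    "README.md",
    "CHALLENGE.md",
    "DATA/customers.jsonl",
    "DATA/orders.jsonl",
    "DATA/products.jsonl",
]

def audit_inventory(repo_root: str) -> dict:
    """Return {relative path: present?} for each expected artifact."""
    root = Path(repo_root)
    return {rel: (root / rel).is_file() for rel in EXPECTED}

if __name__ == "__main__":
    for rel, present in audit_inventory(".").items():
        print(f"{'OK     ' if present else 'MISSING'} {rel}")
```

This only checks presence, not quality; the subjective rows (clarity of instructions, evaluation criteria) still need a human pass.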

Other contents (list):

  • CHALLENGE.md: Ingestion (Airbyte → Postgres), Postgres roles/RBAC, dbt (staging + order summary mart), GitHub Actions (PR → staging, schedule → prod).
  • Out-of-scope section (no cloud deploy, no BI tool setup).
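Since the DATA/ seeds feed the ingestion task but the repo ships no standalone validation step, a pre-ingestion sanity check is easy to sketch. A minimal, hedged example (the required key names passed by the caller are hypothetical; the actual schemas live in the METADATA_*.md files):

```python
import json

def validate_jsonl(lines, required_keys):
    """Check that every non-blank line parses as a JSON object
    carrying all of `required_keys` (a set of field names).

    Returns a list of (line_number, error) tuples; empty means valid.
    """
    errors = []
    for i, line in enumerate(lines, start=1):
        line = line.strip()
        if not line:
            continue  # tolerate blank lines
        try:
            record = json.loads(line)
        except json.JSONDecodeError as exc:
            errors.append((i, f"invalid JSON: {exc}"))
            continue
        if not isinstance(record, dict):
            errors.append((i, "line is valid JSON but not an object"))
            continue
        missing = required_keys - record.keys()
        if missing:
            errors.append((i, f"missing keys: {sorted(missing)}"))
    return errors
```

A reviewer could run this over each of the three JSONL files before the Airbyte → Postgres step; the same check could double as a candidate-facing smoke test if we decide to add one.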

2. Stage 3 mapping

Map repo tasks to Stage 3 focus areas: role-specific outcomes, functional depth, problem-solving in real client contexts, learning velocity and quality bar.

| Stage 3 focus | How the current assessment signals this | Gap? |
| --- | --- | --- |
| Role-specific outcomes | Airbyte ingestion, Postgres RBAC, dbt staging/marts, GitHub Actions — all DE outcomes. | ✓ None |
| Functional depth (pipelines, warehouse, tooling) | End-to-end: raw → staging → mart; 4 roles; staging vs prod targets; CI + schedule. | ✓ None |
| Problem-solving in realistic scenario | Shopify-style JSONL, nested structures, schema/namespace choices, docs required. | ✓ None |
| Learning velocity and quality bar | Deliverables checklist + docs; candidate must document choices and run steps. | ✓ None |

3. Comparison to AI challenge pattern

| Element | AI challenge | DE repo today | Action |
| --- | --- | --- | --- |
| Overview + objective in README | ✓ | ✗ README minimal | Add to README |
| Repo structure section | ✓ | ✗ | Add to README |
| Functional + technical requirements | ✓ | ✓ In CHALLENGE.md | Keep; README can summarize + link |
| Time expectation (~5 h) | ✓ | ✗ Not stated | Add to README |
| Submission: fork, branch name, PR | ✓ | ✗ | Add to README |
| PR description: Loom? setup? assumptions? | ✓ | ✗ | Add to README |
| Evaluation criteria table in README | ✓ | ✗ Not present | Add to README |
| Contact (no public issues) | ✓ | ✗ Not present | Add to README |

4. Gaps to fix (summary)

  • README: Add overview, objective, repo structure, time expectation (~5–8 h), submission steps (fork → branch → PR, what to include in PR), evaluation criteria table, contact. Keep CHALLENGE.md as the detailed task spec.
  • Evaluation criteria: Add table in README (data/pipeline design, code quality, system design, completeness, optional presentation).
  • Contact and owner: Add “For questions contact your Brainforge contact; do not open public issues.” Optional: “Content owner: Awaish.”

5. Suggested DE evaluation criteria table (for README)

Use or adapt this in the DE repo README so it matches the AI challenge style. Align with the Interview Scorecard and Rubrics for the Data track.

| Area | Description |
| --- | --- |
| Data / pipeline design | How well the solution handles data ingestion, transformation, or storage (as relevant to the task). |
| Code quality | Structure, readability, maintainability, and adherence to common DE practices. |
| System design | Logical architecture, clarity of choices, and ability to generalize or extend. |
| Completeness | Meets submission and documentation requirements; run/validation succeeds. |
| Presentation | If Loom is required: clarity and professionalism of walkthrough. |

6. After audit