Data Engineering Assessment — Audit Checklist
Use this checklist when auditing the data-engineering-assessment repo. Fill it in with Awaish/Demi during or after the repo review. Reference: the AI challenge README for the target structure.
1. Inventory (what exists today)
| Item | Present? | Notes |
|---|---|---|
| README with overview and objective | ☐ Yes ✓ No | README is minimal (one line + links). |
| Clear instructions for the candidate | ✓ Yes ☐ No | Full instructions in CHALLENGE.md (ingestion, RBAC, dbt, orchestration). |
| Tasks (GitHub issues or ISSUE_TASKS.md) | ✓ Yes ☐ No | CHALLENGE.md = task list (sections 2–5 + deliverables checklist). |
| Sample data or code / seed files | ✓ Yes ☐ No | DATA/: 3 JSONL (customers, orders, products) + 3 METADATA_*.md. |
| Run/validation (e.g. DuckDB, tests, script) | ✓ Yes ☐ No | Candidate runs Airbyte, dbt, GitHub Actions; dbt test required. |
| Time expectation stated | ☐ Yes ✓ No | Not stated. |
| Submission steps (fork → branch → PR) | ☐ Yes ✓ No | Not in README. |
| Evaluation criteria table (like AI challenge) | ☐ Yes ✓ No | Not present. |
| Contact/support note for candidates | ☐ Yes ✓ No | Not present. |
Other contents (list):
- CHALLENGE.md: Ingestion (Airbyte → Postgres), Postgres roles/RBAC, dbt (staging + order summary mart), GitHub Actions (PR → staging, schedule → prod).
- Out-of-scope section (no cloud deploy, no BI tool setup).
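The run/validation row above notes that candidates run Airbyte, dbt, and GitHub Actions; the repo itself ships only the three JSONL seed files. As a point of comparison for what a lightweight pre-flight check on those files could look like, here is a minimal stdlib Python sketch (file and field names are illustrative, not taken from the repo):

```python
import json
import io

def validate_jsonl(stream):
    """Parse a JSONL stream; return (record_count, set of top-level keys seen).

    Raises json.JSONDecodeError on any malformed line, which is the point:
    a seed file should fail loudly before it reaches ingestion.
    """
    count, keys = 0, set()
    for line in stream:
        line = line.strip()
        if not line:  # tolerate blank lines
            continue
        record = json.loads(line)
        keys.update(record)
        count += 1
    return count, keys

# Inline sample standing in for e.g. DATA/customers.jsonl (hypothetical fields):
sample = io.StringIO(
    '{"id": 1, "email": "a@example.com"}\n'
    '{"id": 2, "email": "b@example.com"}\n'
)
count, keys = validate_jsonl(sample)
print(count, sorted(keys))  # 2 ['email', 'id']
```

In the actual repo this would be run against `DATA/*.jsonl`; it is a sketch of the idea, not a script the assessment currently includes.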
2. Stage 3 mapping
Map repo tasks to Stage 3 focus areas: role-specific outcomes, functional depth, problem-solving in real client contexts, learning velocity and quality bar.
| Stage 3 focus | How the current assessment signals this | Gap? |
|---|---|---|
| Role-specific outcomes | Airbyte ingestion, Postgres RBAC, dbt staging/marts, GitHub Actions — all DE outcomes. | ✓ None |
| Functional depth (pipelines, warehouse, tooling) | End-to-end: raw → staging → mart; 4 roles; staging vs prod targets; CI + schedule. | ✓ None |
| Problem-solving in realistic scenario | Shopify-style JSONL, nested structures, schema/namespace choices, docs required. | ✓ None |
| Learning velocity and quality bar | Deliverables checklist + docs; candidate must document choices and run steps. | ✓ None |
3. Comparison to AI challenge pattern
| Element | AI challenge | DE repo today | Action |
|---|---|---|---|
| Overview + objective in README | ✓ | ✗ README minimal | Add to README |
| Repo structure section | ✓ | ✗ | Add to README |
| Functional + technical requirements | ✓ | ✓ In CHALLENGE.md | Keep; README can summarize + link |
| Time expectation (~5 h) | ✓ | ✗ | Add to README |
| Submission: fork, branch name, PR | ✓ | ✗ | Add to README |
| PR description contents (Loom link, setup steps, assumptions) | ✓ | ✗ | Add to README |
| Evaluation criteria table in README | ✓ | ✗ | Add to README |
| Contact (no public issues) | ✓ | ✗ | Add to README |
4. Gaps to fix (summary)
- README: Add overview, objective, repo structure, time expectation (~5–8 h), submission steps (fork → branch → PR, what to include in PR), evaluation criteria table, contact. Keep CHALLENGE.md as the detailed task spec.
- Evaluation criteria: Add table in README (data/pipeline design, code quality, system design, completeness, optional presentation).
- Contact and owner: Add a note along the lines of “For questions, contact your Brainforge contact; do not open public issues.” Optional: “Content owner: Awaish.”
5. Suggested DE evaluation criteria table (for README)
Use or adapt this in the DE repo README so it matches the AI challenge style. Align with the Interview Scorecard and Rubrics for the Data track.
| Area | Description |
|---|---|
| Data / pipeline design | How well the solution handles data ingestion, transformation, or storage (as relevant to the task). |
| Code quality | Structure, readability, maintainability, and adherence to common DE practices. |
| System design | Logical architecture, clarity of choices, and ability to generalize or extend. |
| Completeness | Meets submission and documentation requirements; run/validation succeeds. |
| Presentation | If Loom is required: clarity and professionalism of walkthrough. |
6. After audit
- Update tech-assessment-plan-de-ae-ai.md Part 1.2 with any repo-specific notes.
- Add “Data Engineering Assessment” to the Interview Exercises database in Notion with link to the public repo.
- Ensure repo is public and README is candidate-ready; owner (Awaish) documented.