Data engineering focus areas
Observability
- We should be able to zoom in/out on our data landscape.
What is running in production?
- How can I find relevant data?
- How can I discover new insights?
- What are the dependencies across datasets?
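
One way to answer the dependency question is to keep lineage as a plain adjacency map and walk it. The sketch below is illustrative only: the dataset names and the `DOWNSTREAM` map are hypothetical, and a real setup would likely read lineage from whatever orchestrator or catalog is in use.

```python
from collections import deque

# Hypothetical lineage: each dataset maps to the datasets built directly from it.
DOWNSTREAM = {
    "raw_orders": ["stg_orders"],
    "stg_orders": ["fct_orders", "fct_revenue"],
    "fct_orders": ["dash_sales"],
    "fct_revenue": ["dash_sales"],
    "dash_sales": [],
}


def downstream_of(dataset, edges):
    """Return every dataset reachable from `dataset`, in breadth-first order."""
    found = []
    visited = {dataset}
    queue = deque([dataset])
    while queue:
        current = queue.popleft()
        for child in edges.get(current, []):
            if child not in visited:
                visited.add(child)
                found.append(child)
                queue.append(child)
    return found


if __name__ == "__main__":
    print(downstream_of("stg_orders", DOWNSTREAM))
```

Breadth-first order lists the nearest dependents first, which tends to be the order a person wants to check them in.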
What is the state of our system?
- When were things last loaded?
- What is seeing the most usage?
- How long are our build times?
- How do we know something is down?
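
A minimal sketch of a freshness check for the "last loaded" question, assuming load timestamps are already collected somewhere. The function name, the dataset names, and the 24-hour threshold are all assumptions made for illustration.

```python
from datetime import datetime, timedelta, timezone


def stale_datasets(last_loaded, now, max_age):
    """Return the names of datasets whose last load is older than max_age, sorted."""
    return sorted(name for name, ts in last_loaded.items() if now - ts > max_age)


if __name__ == "__main__":
    # Hypothetical load times (UTC); in practice these would come from load metadata.
    last_loaded = {
        "orders": datetime(2024, 1, 2, 11, 0, tzinfo=timezone.utc),
        "customers": datetime(2024, 1, 1, 6, 0, tzinfo=timezone.utc),
    }
    now = datetime(2024, 1, 2, 12, 0, tzinfo=timezone.utc)
    print(stale_datasets(last_loaded, now, timedelta(hours=24)))
```

Passing `now` in explicitly, rather than calling the clock inside the function, keeps the check easy to test.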
Development workflow
- How do we get things done?
- How do we structure our project?
- Where should these files go?
- How should I name this thing?
How do we set up issues?
- I found a bug.
- What info should I include?
Writing clearly
- What is a “good” commit message?
- Reviewing pull requests.
Documentation
- How do I run this?
- What do I do when … ?
Agent learning and autonomy
- What should the agent be allowed to do without approval?
- Which actions always require human confirmation?
- How do we capture run traces and turn them into safe improvements?
- How do we score learned behavior quality before promotion?
- How do we roll back a learned behavior quickly?
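
The approval, scoring, and rollback questions above could be prototyped along these lines. This is a sketch under stated assumptions, not a prescribed design: the action names, the `Decision` enum, the `BehaviorRegistry` class, and the score threshold are all hypothetical.

```python
from enum import Enum


class Decision(Enum):
    ALLOW = "allow"
    REQUIRE_APPROVAL = "require_approval"


# Hypothetical action names, for illustration only.
AUTO_ALLOWED = frozenset({"read_table", "run_select_query", "open_draft_pr"})
ALWAYS_CONFIRM = frozenset({"drop_table", "write_production", "merge_pr"})


def decide(action):
    """Default-deny: anything not explicitly auto-allowed needs a human."""
    if action in ALWAYS_CONFIRM:
        return Decision.REQUIRE_APPROVAL
    if action in AUTO_ALLOWED:
        return Decision.ALLOW
    return Decision.REQUIRE_APPROVAL


class BehaviorRegistry:
    """Tracks promoted behavior versions so the latest one can be rolled back."""

    def __init__(self, min_score):
        self.min_score = min_score
        self.history = []  # promoted versions, newest last

    @property
    def active(self):
        return self.history[-1] if self.history else None

    def promote(self, version, score):
        """Promote only if the evaluation score meets the threshold."""
        if score < self.min_score:
            return False
        self.history.append(version)
        return True

    def rollback(self):
        """Drop the active version and return the one that is active afterwards."""
        if self.history:
            self.history.pop()
        return self.active
```

Unknown actions fall through to `REQUIRE_APPROVAL`, so a newly learned action never runs unattended just because nobody listed it.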
Data PR trust and review quality
- Can every data PR show measurable impact automatically?
- Are we detecting duplicates, null spikes, and row-count regressions?
- How do reviewers see blast radius without manual SQL compare work?
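
As a rough illustration of an automatic impact summary, the sketch below compares a base snapshot with a PR snapshot, each held as a list of dict rows. The function names, thresholds, and report shape are assumptions. A real check would more likely run as SQL against the warehouse.

```python
from collections import Counter


def null_rate(rows, column):
    """Fraction of rows where `column` is missing or None (0.0 when there are no rows)."""
    if not rows:
        return 0.0
    return sum(1 for r in rows if r.get(column) is None) / len(rows)


def pr_impact(base_rows, pr_rows, key, columns,
              max_row_drop=0.05, max_null_increase=0.02):
    """Summarize the impact of a data PR as a plain dict a bot could post on the PR."""
    # Duplicates: key values that appear more than once in the PR snapshot.
    key_counts = Counter(r[key] for r in pr_rows)
    duplicate_keys = sorted(k for k, n in key_counts.items() if n > 1)

    # Null spikes: columns whose null rate rose by more than the allowed amount.
    null_spikes = {}
    for col in columns:
        delta = null_rate(pr_rows, col) - null_rate(base_rows, col)
        if delta > max_null_increase:
            null_spikes[col] = round(delta, 4)

    # Row-count regression: relative drop versus the base snapshot.
    base_n, pr_n = len(base_rows), len(pr_rows)
    row_drop = (base_n - pr_n) / base_n if base_n else 0.0

    return {
        "row_count": {"base": base_n, "pr": pr_n},
        "row_count_regression": row_drop > max_row_drop,
        "duplicate_keys": duplicate_keys,
        "null_spikes": null_spikes,
    }
```

Returning a plain dict keeps the result easy to render as a PR comment or to fail a CI step on.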
Analyst cloud workflow quality
- Can we go from question to evidence-backed insight in one cloud flow?
- Does each claim link to a query or artifact?
- How do we preserve narrative quality while keeping technical traceability?
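
The "does each claim link to a query or artifact" question can be checked mechanically if claims are stored as structured records. A minimal sketch, assuming a hypothetical `Claim` record whose `evidence` field holds query IDs or artifact paths:

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Claim:
    text: str
    # Hypothetical evidence references: query IDs or artifact paths.
    evidence: List[str] = field(default_factory=list)


def unsupported_claims(claims):
    """Return the text of every claim that has no linked evidence."""
    return [c.text for c in claims if not c.evidence]
```

A check like this only verifies that a link exists. Whether the evidence actually supports the claim still needs a reviewer.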