Data engineering focus areas

Raw schema profiles

Profiled schemas for transformation layer design: raw-schema-profiles/


Observability

  • We should be able to zoom in/out on our data landscape.

What is running in production?

  • How can I find relevant data?
  • How can I discover new insights?
  • What are the dependencies across datasets?
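The dependency question is essentially a graph traversal. If lineage is recorded as edges from each dataset to the datasets that read from it, then everything downstream of a dataset can be found with a breadth-first walk. This is a minimal sketch. It assumes such a mapping exists, and the structure and dataset names are hypothetical rather than any catalog tool's API:

```python
from collections import deque


def downstream(lineage: dict[str, list[str]], start: str) -> list[str]:
    """Return every dataset that depends, directly or transitively, on `start`.

    `lineage` is a hypothetical mapping from a dataset name to the datasets
    that read from it. Results come back in breadth-first order.
    """
    seen = {start}  # includes start so a cycle never lists it as its own dependent
    order: list[str] = []
    queue = deque(lineage.get(start, []))
    while queue:
        node = queue.popleft()
        if node in seen:
            continue  # already reached via another path
        seen.add(node)
        order.append(node)
        queue.extend(lineage.get(node, []))
    return order


if __name__ == "__main__":
    # Hypothetical lineage: raw orders feed a staging model, which feeds two marts.
    lineage = {
        "raw.orders": ["stg.orders"],
        "stg.orders": ["mart.revenue", "mart.customers"],
        "mart.customers": ["mart.revenue"],
    }
    print(downstream(lineage, "raw.orders"))
    # ['stg.orders', 'mart.revenue', 'mart.customers']
```

Walking the same mapping in reverse, from each dataset to its inputs, answers the complementary question of what a dataset is built from.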

What is the state of our system?

  • When were things last loaded?
  • What is seeing the most usage?
  • How long are our build times?
  • How do we know something is down?
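The first and last questions can share one mechanism: record when each dataset last loaded, give each dataset a maximum acceptable age, and flag anything older. This is a minimal sketch. It assumes load timestamps are already available, for example from orchestrator run metadata or a load-timestamp column, but that source is itself an assumption. The dataset names and age limits are hypothetical:

```python
from datetime import datetime, timedelta, timezone


def stale_datasets(
    last_loaded: dict[str, datetime],
    max_age: dict[str, timedelta],
    now: datetime,
) -> list[str]:
    """Return datasets whose most recent load is older than their allowed age.

    Datasets with no recorded load are treated as stale, because "never
    loaded" is usually as alarming as "loaded too long ago".
    """
    stale = []
    for name, limit in sorted(max_age.items()):
        loaded_at = last_loaded.get(name)
        if loaded_at is None or now - loaded_at > limit:
            stale.append(name)
    return stale


if __name__ == "__main__":
    now = datetime(2024, 1, 2, 12, 0, tzinfo=timezone.utc)
    last_loaded = {
        "stg.orders": datetime(2024, 1, 2, 11, 30, tzinfo=timezone.utc),
        "mart.revenue": datetime(2024, 1, 1, 6, 0, tzinfo=timezone.utc),
    }
    # Hypothetical per-dataset freshness expectations.
    max_age = {
        "stg.orders": timedelta(hours=1),
        "mart.revenue": timedelta(hours=24),
        "mart.customers": timedelta(hours=24),
    }
    print(stale_datasets(last_loaded, max_age, now))
    # ['mart.customers', 'mart.revenue']
```

Keeping every timestamp timezone-aware (UTC here) avoids comparing naive and aware datetimes, which Python rejects.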

Development workflow

  • How do we get things done?
  • How do we structure our project?
  • Where should these files go?
  • How should I name this thing?

How do we set up issues?

  • I found a bug.
  • What info should I include?

Writing clearly

  • What is a “good” commit message?
  • Reviewing pull requests.

Documentation

  • How do I run this?
  • What do I do when … ?

Agent learning and autonomy

  • What should the agent be allowed to do without approval?
  • Which actions always require human confirmation?
  • How do we capture run traces and turn them into safe improvements?
  • How do we score learned behavior quality before promotion?
  • How do we roll back a learned behavior quickly?
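One possible framing of these questions combines two pieces. The first is an explicit allowlist of autonomous actions that falls back to asking a human. The second is a registry that promotes a learned behavior only when its evaluation score clears a bar, and keeps earlier versions so one step of rollback is immediate. This is a minimal sketch. The action names, the score threshold, and the `BehaviorRegistry` class are all hypothetical:

```python
from dataclasses import dataclass, field
from typing import Optional

# Hypothetical action names; deciding the real lists is the point of this section.
AUTONOMOUS_ACTIONS = frozenset({"read_table", "run_select_query", "open_draft_pr"})
ALWAYS_CONFIRM = frozenset({"merge_pr", "backfill_table", "drop_table"})


def requires_confirmation(action: str) -> bool:
    """Return True if a human must approve `action` before the agent runs it."""
    # Checked first so an action accidentally listed in both sets still needs a human.
    if action in ALWAYS_CONFIRM:
        return True
    # Default to asking: only explicitly allowlisted actions run unattended.
    return action not in AUTONOMOUS_ACTIONS


@dataclass
class BehaviorRegistry:
    """Tracks promoted versions of one learned behavior so the latest can be undone."""

    min_score: float  # hypothetical quality bar a candidate must clear
    history: list[str] = field(default_factory=list)

    @property
    def active(self) -> Optional[str]:
        """The currently promoted version, or None if nothing is promoted."""
        return self.history[-1] if self.history else None

    def promote(self, version: str, score: float) -> bool:
        """Activate `version` only if its evaluation score meets the bar."""
        if score < self.min_score:
            return False
        self.history.append(version)
        return True

    def rollback(self) -> Optional[str]:
        """Retire the active version and fall back to the previous one."""
        if self.history:
            self.history.pop()
        return self.active


if __name__ == "__main__":
    print(requires_confirmation("drop_table"))   # True
    print(requires_confirmation("read_table"))   # False
    print(requires_confirmation("new_action"))   # True (unlisted, so ask)

    registry = BehaviorRegistry(min_score=0.8)
    registry.promote("v1", score=0.9)
    registry.promote("v2", score=0.7)   # rejected: below the bar
    registry.promote("v3", score=0.85)
    print(registry.active)              # v3
    print(registry.rollback())          # v1
```

With a default of asking, a newly added action cannot run unattended until someone deliberately allowlists it.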

Data PR trust and review quality

  • Can every data PR show measurable impact automatically?
  • Are we detecting duplicates, null spikes, and row-count regressions?
  • How do reviewers see blast radius without manual SQL compare work?
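The three regressions named above can be detected by profiling the base and PR versions of a table and then diffing the two profiles. This is a minimal sketch. In practice the profiles would likely come from SQL aggregates run against both builds, but here in-memory lists of rows stand in for them. The key column and both thresholds are hypothetical:

```python
def profile(rows: list[dict], key: str) -> dict:
    """Summarize one version of a table: row count, duplicate keys, null rates."""
    n = len(rows)
    distinct_keys = {row.get(key) for row in rows}
    columns = sorted({col for row in rows for col in row})
    null_rate = {
        col: (sum(row.get(col) is None for row in rows) / n) if n else 0.0
        for col in columns
    }
    return {
        "row_count": n,
        "duplicate_keys": n - len(distinct_keys),
        "null_rate": null_rate,
    }


def compare(base: dict, head: dict,
            max_row_drop: float = 0.05,
            max_null_increase: float = 0.10) -> list[str]:
    """Return human-readable findings where `head` regresses against `base`."""
    findings = []
    if head["duplicate_keys"] > base["duplicate_keys"]:
        findings.append(
            f"duplicate keys: {base['duplicate_keys']} -> {head['duplicate_keys']}")
    if base["row_count"]:
        drop = (base["row_count"] - head["row_count"]) / base["row_count"]
        if drop > max_row_drop:
            findings.append(f"row count: {base['row_count']} -> {head['row_count']}")
    for col, rate in head["null_rate"].items():
        before = base["null_rate"].get(col, 0.0)
        if rate - before > max_null_increase:
            findings.append(f"null rate for {col}: {before:.2f} -> {rate:.2f}")
    return findings


if __name__ == "__main__":
    base = [{"id": i, "email": f"u{i}@example.com"} for i in range(1, 11)]
    # PR build: ids 9-10 are missing, ids 1-2 lost their email,
    # and id 8 appears twice (once with a null email).
    head = [{"id": i, "email": None if i <= 2 else f"u{i}@example.com"}
            for i in range(1, 9)] + [{"id": 8, "email": None}]
    for finding in compare(profile(base, "id"), profile(head, "id")):
        print(finding)
    # duplicate keys: 0 -> 1
    # row count: 10 -> 9
    # null rate for email: 0.00 -> 0.33
```

This only compares columns present in the PR build. Dropped or renamed columns would need a separate schema comparison.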

Analyst cloud workflow quality

  • Can we go from question to evidence-backed insight in one cloud flow?
  • Does each claim link to a query or artifact?
  • How do we preserve narrative quality while keeping technical traceability?