[Client] — Golden dataset spec — [Domain / scope]

About this document (Brainforge)

Internal conventions for how this file works in the repo. Strip this section, or export a copy without it, when sharing with a client.

Titling and filename

Use [Client] — Golden dataset spec — [Domain or use case] for the document title. Examples: LMNT — Golden dataset spec — Omnichannel revenue · Acme — Golden dataset spec — Wholesale order-to-cash.

Filename: {client}-golden-dataset-{domain}.md under knowledge/clients/{client}/resources/.

When to use this template

Use this when building an evaluation dataset for AI-powered natural-language querying (e.g., Cortex Analyst, custom NL2SQL). The golden dataset defines a set of questions with verified correct answers that serve as the acceptance test for the semantic layer.

Do not use this template when:

  • designing the semantic view itself (use the Semantic View Design Doc)
  • profiling a new data source (use the Discovery Memo)

Document metadata

Status: [Draft / In review / Approved / Locked]
Warehouse: [platform]
Account/region: [details]
Semantic view: [view name or path to C2 doc]
Version: [1.0 / increment when questions are added or answers change]
Prepared by: Brainforge
Last updated: [YYYY-MM-DD]


| Artifact | Link / path | Notes |
|---|---|---|
| Semantic View Design Doc | [path to C2 doc] | The semantic view this dataset tests |
| Discovery Memo | [path to A1 memo] | Source profiling reference |
| Data Platform Documentation | [Google Sheet link] | Source catalog, metric definitions |

1. Dataset purpose

[2–4 sentences. What questions should this golden dataset cover? What use cases or user personas does it represent? What makes a passing vs. failing result?]


2. Question catalog

Each row is one test case. Placeholder values are shown; replace with actual questions and answers.

| # | Natural language question | Expected SQL logic | Expected result type | Result value | Tolerance | Status |
|---|---|---|---|---|---|---|
| 1 | [e.g., "What was total revenue last month?"] | SUM(revenue) WHERE month = CURRENT_MONTH | [Scalar / Row / Table] | [$X] | [±% or exact] | [Active] |
| 2 | [e.g., "Which product had the most growth?"] | TOP 1 product ORDER BY growth DESC | [Row] | [Product name] | [exact] | [Active] |
| 3 | [e.g., "Show me revenue by state for last quarter"] | SELECT state, SUM(revenue) WHERE quarter = PREVIOUS | [Table] | [State: X, Revenue: Y; ...] | [±5%] | [Active] |
| 4 | [Edge case: "What was revenue last month?" when table is empty] | | [Scalar] | [null or 0] | [exact] | [Active] |
| 5 | [Edge case: date range with no data] | | [Scalar] | [0 or empty] | [exact] | [Active] |
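If the catalog is also kept in a machine-readable file next to this doc, the sketch below shows one way a single row could be encoded. The field names and the dataclass are illustrative assumptions, not the schema that scripts/golden_audit.py actually reads; align them with whatever the runner expects.

```python
# Illustrative sketch only -- field names are assumptions, not the actual
# schema expected by scripts/golden_audit.py.
from dataclasses import dataclass, asdict
import json

@dataclass
class GoldenQuestion:
    id: int
    question: str            # natural-language question, in the client's vocabulary
    expected_sql_logic: str  # directional, not executable (see QA checklist)
    result_type: str         # "scalar" | "row" | "table"
    expected_value: object   # verified answer pulled from the warehouse
    tolerance: str           # e.g. "exact", or a fraction such as "0.05" for ±5%
    status: str = "active"

example = GoldenQuestion(
    id=1,
    question="What was total revenue last month?",
    expected_sql_logic="SUM(revenue) WHERE month = CURRENT_MONTH",
    result_type="scalar",
    expected_value=123456.78,  # placeholder value, not a real answer
    tolerance="exact",
)

print(json.dumps(asdict(example), indent=2))
```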

3. Source tables

| Table | Role | Verified answers query source |
|---|---|---|
| [database.schema.table] | [fact / dimension / reference] | [SQL used to generate verified answers] |
| [database.schema.table] | [fact / dimension / reference] | [SQL used to generate verified answers] |
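The "Verified answers query source" column is what keeps expected values auditable: each answer should come from a query run against the warehouse, not be hand-crafted (see the QA checklist). A minimal sketch of recording one verified scalar answer through a standard DB-API 2.0 cursor follows; the connection setup, function name, and table name are assumptions for illustration and depend on the platform listed in the metadata.

```python
# Sketch: record a verified answer by running the source SQL directly
# against the warehouse. Connection setup is omitted and assumed to be a
# standard DB-API 2.0 connection for the platform named in the metadata.

def record_verified_answer(cursor, question_id: int, verification_sql: str):
    """Run the verification query and return the value to store as the
    expected result for the given question."""
    cursor.execute(verification_sql)
    value = cursor.fetchone()[0]
    print(f"question {question_id}: verified answer = {value}")
    return value

# Example call (placeholder table name, as in the Source tables section):
# record_verified_answer(
#     cursor,
#     question_id=1,
#     verification_sql="SELECT SUM(revenue) FROM database.schema.table "
#                      "WHERE month = DATE_TRUNC('month', CURRENT_DATE)",
# )
```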

4. Edge cases

  • [Edge case]: [What the edge case is. How the golden dataset handles it. What the expected behavior of the NLQ system should be.]
  • [Edge case]: [...]

5. Known limitations

  • [Limitation]: [What the golden dataset does not cover. Why. What follow-up work would address it.]

6. Runner manifest

| Attribute | Value |
|---|---|
| Question count | [N] active questions |
| Last run | [YYYY-MM-DD] |
| Pass rate | [N / N] (XX%) |
| Runner command | [e.g., python scripts/golden_audit.py --dataset {path}] |
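For reference, the sketch below shows the kind of per-question comparison and pass-rate calculation a runner could apply. The tolerance encoding ("exact", or a fractional value such as 0.05 for ±5%) and the function names are assumptions made for illustration, not the actual behavior of scripts/golden_audit.py.

```python
# Sketch of per-question comparison; tolerance encoding is an assumption
# ("exact", or a fraction such as 0.05 for ±5% of the expected value).

def result_matches(expected, actual, tolerance="exact") -> bool:
    # Edge cases: an expected null/empty result must match exactly.
    if expected is None or actual is None:
        return expected == actual
    if tolerance == "exact":
        return expected == actual
    # Numeric comparison within ±tolerance of the expected value.
    return abs(actual - expected) <= abs(expected) * float(tolerance)

def pass_rate(results: list[bool]) -> str:
    passed = sum(results)
    return f"{passed} / {len(results)} ({100 * passed / len(results):.0f}%)"

# e.g. result_matches(123456.78, 123401.20, tolerance=0.05) -> True
#      pass_rate([True, True, False, True, True]) -> "4 / 5 (80%)"
```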

Appendix — Pre-handoff QA checklist

  • Every question has a verified answer from the warehouse (not hand-crafted)
  • Questions cover: simple aggregations, filters, group-bys, time ranges, comparisons
  • Edge cases are included (empty results, null handling, ambiguous phrasing)
  • Tolerance is defined per question and justified
  • Expected SQL logic is documented (directional, not executable — the NLQ engine may generate different SQL for the same answer)
  • Runner manifest tracks pass rate over time
  • Questions are written in the vocabulary the client actually uses