Meeting Title: Brainforge Interview with Austin Whitaker
Date: 2026-05-08
Meeting participants: Awaish Kumar, austinW


WEBVTT

1 00:00:19.620 00:00:21.649 austinW: Hey, Austin Whitaker, nice to meet you.

2 00:00:22.480 00:00:23.540 Awaish Kumar: Hi, how you doing?

3 00:00:23.830 00:00:24.880 austinW: I’m doing well, how are you?

4 00:00:25.750 00:00:27.000 Awaish Kumar: I’m good as well.

5 00:00:27.940 00:00:29.050 Awaish Kumar: Nice to meet you too.

6 00:00:29.500 00:00:36.439 Awaish Kumar: Yeah, for this interview, we are just going to talk about, your background and,

7 00:00:36.960 00:00:40.939 Awaish Kumar: what you have been working on so far, and maybe…

8 00:00:41.370 00:00:46.999 Awaish Kumar: Yeah, and if you have any questions about Brainforge, I would be… happy to answer them.

9 00:00:47.480 00:00:48.150 austinW: I gotcha.

10 00:00:48.690 00:00:52.860 Awaish Kumar: Okay, yeah, let’s get started with your introduction.

11 00:00:53.340 00:01:00.200 austinW: Yeah, sure. So, Austin Whitaker, I’ve been working in data and analytics in corporate settings for about 13 years.

12 00:01:00.250 00:01:13.700 austinW: I got started toward business intelligence with the presentation layer, but worked backwards into the SQL layer, and then into data warehousing opportunities, ETL on top of that warehousing; that was, like, the SQL Server and SSIS days.

13 00:01:13.860 00:01:28.619 austinW: I saw the transition to cloud, so I’ve worked in different environments, and more recently, in probably the last 6 or 7 years, I’ve more so been working, you know, in the cloud stacks. I did 4 years at Amazon, where it was kind of an abstraction from AWS services.

14 00:01:28.620 00:01:34.450 austinW: But that was kind of where I transitioned from a lot of the relational database considerations towards cluster database and cluster compute.

15 00:01:34.840 00:01:47.220 austinW: And then, probably in the last 3 or 4 years, I was chasing, you know, the new edge of AI opportunities. My current contract opportunity is actually in an AI engineering capacity.

16 00:01:47.310 00:02:05.050 austinW: But along with that, I also took a lot of time to get into the open source ETL orchestration tooling. So I’ve kind of, you know, worked across all of it. I’ve seen, you know, I was working the BI title when that was, you know, I guess the DBA title was more prominent. I saw the data engineering title come up, analytics engineering appeared.

17 00:02:05.190 00:02:21.089 austinW: So yeah, you know, I’ve just, you know, given that we’ve all been in this for a long time, you know, we just chase the changing tools, but stick to the fundamentals that are always kind of there. So I’ve got a really strong basis of SQL, I’ve got an appreciation for ETL patterns and data warehousing patterns.

18 00:02:21.240 00:02:39.519 austinW: And, you know, really with the AI tooling, I’ve been able to venture into more generalist stuff, building services, taking dependencies on APIs, and really with the benefit of the AI tooling, my current project was really based on making a full app, reaching into, you know, databases and API dependencies.

19 00:02:39.520 00:02:45.510 austinW: But to create, you know, a hub for these folks to kind of access the data that they have on hand.

20 00:02:45.510 00:02:53.120 austinW: but incorporate some AI, generative AI features. So, been kind of generalist and broad across the whole spread, yeah.

21 00:02:53.120 00:03:00.659 Awaish Kumar: Okay, yeah. So, like, what does one of your recent projects look like, in terms of tool stack and,

22 00:03:01.440 00:03:03.850 Awaish Kumar: And deliverables.

23 00:03:04.150 00:03:09.380 austinW: Gotcha. Yeah, so my current, really the late last project on this freelance bit,

24 00:03:09.500 00:03:28.820 austinW: I got brought in… our end users, rather the company, they do an LMS for nurses, so the content that they're putting into this learning management system is medical, healthcare material, and this company curates, you know, thousands of questions, but also course content.

25 00:03:28.990 00:03:38.370 austinW: And so, we needed to make an application that would help their content editors manage the existing questions, but also generate new questions.

26 00:03:38.520 00:03:53.420 austinW: So, we ended up creating what was a React app, and I chose Next.js, as opposed to Vite, which they use all the time, because I knew that it would be, you know, more than a single-page app, and I actually liked the page routing that Next.js offered.

27 00:03:53.570 00:04:05.930 austinW: But otherwise, you know, we made a data store, actually, that, you know, unified course materials and question materials, and took dependencies from the front end to that data store.

28 00:04:06.140 00:04:16.700 austinW: to make their questions and course content searchable. Their ERP systems were not very searchable: you could look at one thing at a time, but you couldn't, you know, deduce summaries or aggregates on associations.

29 00:04:17.019 00:04:28.710 austinW: And we had, basically, content search, you could do metadata association, so we’d make API calls to associate, call it a question to a competency and build libraries that way.

30 00:04:28.820 00:04:31.009 austinW: And as far as the AI functionality.

31 00:04:31.150 00:04:37.950 austinW: We had a… in this app, we made an editor’s experience where they could recall an existing question.

32 00:04:38.220 00:04:44.180 austinW: Pull it in, and it’s, you know, you’re looking at the question stem, the answer, and then typically it’s multiple choice, so 3 distractors.

33 00:04:44.310 00:05:01.680 austinW: And so it would do three things. It would kind of enforce editorial requirements to say, hey, the question needs to be structured in this way, and it can’t include this language. So we could run a service that would bring back a generative response based on prompting and rules. Say, hey, this is how far you’re off from the mark.

34 00:05:01.830 00:05:12.899 austinW: It could also, if we deleted what were called distractors, those are the wrong answers, we could, have new and viable, like, believable distractors generated on request.
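
A minimal sketch of what these two generative features might look like as a service, assuming an OpenAI-style chat completions API; the function names, rule text, and model choice are illustrative assumptions, not the actual HealthStream implementation:

```python
# Hypothetical editorial-check service backed by a chat model.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

EDITORIAL_RULES = """\
- The stem must be a single, complete question.
- Avoid absolute terms such as "always" or "never".
- Exactly one correct answer and three distractors."""

def check_editorial_rules(stem: str, answer: str, distractors: list[str]) -> str:
    """Return a generative critique of how far the item is from the style guide."""
    item = f"Stem: {stem}\nAnswer: {answer}\nDistractors: {', '.join(distractors)}"
    resp = client.chat.completions.create(
        model="gpt-4o",  # placeholder model name
        messages=[
            {"role": "system",
             "content": f"You review exam items against these rules:\n{EDITORIAL_RULES}"},
            {"role": "user",
             "content": f"Critique this item and list any rule violations:\n{item}"},
        ],
    )
    return resp.choices[0].message.content

def generate_distractors(stem: str, answer: str, n: int = 3) -> str:
    """Ask the model for n plausible-but-wrong answer options on request."""
    resp = client.chat.completions.create(
        model="gpt-4o",
        messages=[{"role": "user", "content":
                   f"Write {n} believable but incorrect options for this question.\n"
                   f"Question: {stem}\nCorrect answer: {answer}"}],
    )
    return resp.choices[0].message.content
```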

35 00:05:13.150 00:05:20.539 austinW: And then, as a part of that, too, you know, I took two stabs at, kind of, RAG or generative stuff on,

36 00:05:20.780 00:05:29.819 austinW: creating questions from zero. So one was to create RAG based on transcriptions of course material, which is kind of limited. It was kind of,

37 00:05:30.110 00:05:31.090 austinW: I’d say…

38 00:05:31.380 00:05:45.369 austinW: tough data formats, but separately, I created, based on, basically, a Microsoft template that was out there on GitHub, but I adapted an existing example of deep research. We had some Azure, deep research model capacity.

39 00:05:45.380 00:05:59.889 austinW: And I took what was a headed application, proof of concept, and I turned it into a headless service. And the idea was that, you know, for anything that a nurse would need to know, you could type in a topic, and then request deep research, it would return probably, like, a 3-page report.

40 00:06:00.010 00:06:19.189 austinW: it was far more reliable than model training, because we didn’t have hallucinations, because the deep research was reliably pulling, you know, the resources. We could track the URLs, and I got it also to, you know, determine what the publication dates were, and then also do inline citations. So we could do generative AI off of that as well, and create questions and answers.

41 00:06:19.260 00:06:20.839 austinW: Pretty big end-to-end, yep.

42 00:06:21.200 00:06:26.179 Awaish Kumar: Okay, so it’s more of an AI… like, a lot of AI part in there. Yeah.

43 00:06:26.180 00:06:28.189 austinW: This position was more AI-leaning, yep.

44 00:06:28.860 00:06:35.380 Awaish Kumar: So, like, for this project, how did you design the database or data layer?

45 00:06:35.810 00:06:39.970 austinW: Gotcha. Yeah, for this… for this project, we did have, so…

46 00:06:41.120 00:06:56.830 austinW: we had limited access to the source systems, so we could request, you know, instances of items on an API basis, like, individually, but it was very tough to stitch together, so we had an abstraction process where we did run ETL to create a… it was, like,

47 00:06:57.210 00:07:02.020 austinW: the AWS, ER… rather.

48 00:07:02.620 00:07:07.409 austinW: Not DynamoDB. We did it on the relational database product that AWS puts out.

49 00:07:07.540 00:07:12.979 austinW: So we had an offline process where daily we would scrape down the questions and course material and model that.

50 00:07:13.310 00:07:25.689 austinW: And, because that database was maintained, like, we maintained the SQL for that database creation, Cursor was actually able to help us, you know, very quickly. I was surprised by this, because I've worked in databases; I had, you know, my reservations.

51 00:07:25.770 00:07:34.380 austinW: But basically, we made services which would run the SQL to request data across these tables, to, you know, bring related data to the UIs.
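
A sketch of what one of those read services might look like, assuming the mirror was a Postgres-flavored RDS instance; table and column names are invented for illustration:

```python
# Hypothetical read path: join a question to its associated competencies
# in the mirror database and return rows the UI can render.
import psycopg2

def fetch_question_with_competencies(question_id: int) -> list[dict]:
    conn = psycopg2.connect(host="mirror-db.example.internal",
                            dbname="content", user="app", password="...")
    try:
        with conn.cursor() as cur:
            cur.execute("""
                SELECT q.stem, q.answer, c.name AS competency
                FROM questions q
                LEFT JOIN question_competency qc ON qc.question_id = q.id
                LEFT JOIN competencies c ON c.id = qc.competency_id
                WHERE q.id = %s
            """, (question_id,))
            cols = [d[0] for d in cur.description]
            return [dict(zip(cols, row)) for row in cur.fetchall()]
    finally:
        conn.close()
```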

52 00:07:34.860 00:07:37.300 Awaish Kumar: I mean, like, in the…

53 00:07:38.150 00:07:43.540 Awaish Kumar: Using Cursor, you can actually… yeah, during development, we can… we can speed it up, but…

54 00:07:43.540 00:07:44.640 austinW: It expedited outcomes.

55 00:07:44.640 00:07:47.280 Awaish Kumar: Right? But once this is deployed, like, how you…

56 00:07:47.630 00:07:55.630 Awaish Kumar: Like, how, basically, was it interacting with the customer? And then, were you reading from your pre…

57 00:07:55.830 00:08:02.439 Awaish Kumar: …filled database, SQL Server, or… or directly requesting from APIs?

58 00:08:02.930 00:08:08.570 austinW: Gotcha, gotcha. So, we had a, it was deployed on Lambdas, and…

59 00:08:08.710 00:08:22.090 austinW: the end users, you know, we… our build process would use CodeBuild, and then Lambdas to host, necessarily. And then, so our end users would go to the React app, the Next.js app.

60 00:08:22.390 00:08:30.519 austinW: They’d go to different screens, and based on those screens, services would call our database layer, and so that would present the data.

61 00:08:30.660 00:08:44.159 austinW: And then separately, we took dependencies on the source system. There was an application which existed to, basically… it was a malleable framework where you could apply metadata relationships.

62 00:08:44.240 00:08:50.199 austinW: So we had a data store which had just the questions and answers kind of flat, so that we weren't running so many,

63 00:08:50.250 00:09:07.510 austinW: we’d only do reads too, basically. But then, based on having the application keys from the source available to us, we would make API calls to allow for transactions. So to say that if you could identify a concept, and then identify a question, you could then, you know, in our application.

64 00:09:07.550 00:09:11.990 austinW: you know, apply the necessary keys to make the API call to associate them in the source system.

65 00:09:12.180 00:09:23.990 austinW: There were refreshes that would happen, so when we wrote to the source of truth on these relationships, we would also make writes and updates into our database that we could control.

66 00:09:24.040 00:09:32.900 austinW: And that kept it in sync for the most part. There were cases where this was still proof of concept, so we did put up with, like, a nightly sync, where we would more or less have some drop and replace.

67 00:09:33.030 00:09:38.070 austinW: But we did our best, you know, when we made live updates to reflect the state as closely as we could.
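
A sketch of the write-through pattern being described, with hypothetical endpoint and table names: commit the association to the source system first, and only mirror it locally once the source accepts the write, so the two stay in sync until the nightly drop and replace:

```python
# Hypothetical write-through: source-of-truth API call, then local mirror update.
import requests

def associate(question_id: str, competency_id: str,
              api_base: str, api_key: str, conn) -> None:
    # 1) Source of truth: make the transactional API call.
    resp = requests.post(
        f"{api_base}/associations",
        json={"question_id": question_id, "competency_id": competency_id},
        headers={"Authorization": f"Bearer {api_key}"},
        timeout=30,
    )
    resp.raise_for_status()

    # 2) Mirror: only update our database once the source accepted the write.
    with conn.cursor() as cur:
        cur.execute(
            "INSERT INTO question_competency (question_id, competency_id) "
            "VALUES (%s, %s) ON CONFLICT DO NOTHING",
            (question_id, competency_id),
        )
    conn.commit()
```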

68 00:09:38.680 00:09:39.320 Awaish Kumar: Okay.

69 00:09:39.450 00:09:43.000 Awaish Kumar: So, yeah, as far as I understand, the project is, like.

70 00:09:43.280 00:09:50.169 Awaish Kumar: your application might read from source systems, like, via calls, but then,

71 00:09:50.790 00:10:02.039 Awaish Kumar: there will be interaction between the LLM and the user, and based on that, if you have any updated data, that will go to the system as well, but also will be stored as a…

72 00:10:02.350 00:10:05.130 Awaish Kumar: As a tracking, on your side.

73 00:10:05.390 00:10:21.519 austinW: Yeah, that's correct. So, when we would read from… we had basically a snapshot, like a transient database that had an image of the latest questions-and-answers data. We would render that to the front end, and then in session, that's where we would manage data changes.

74 00:10:21.650 00:10:28.100 austinW: And then make API calls to say that if there was a change, it would write to the system if you want to commit that change, so…

75 00:10:28.170 00:10:42.019 austinW: it was kind of a soft editing thing. We didn’t want every event to write back to a production system. We wanted to allow for session state to manage changes, and then once they were sure and they wanted to commit, you know, then they could make the API calls that would lock it in.

76 00:10:42.100 00:10:47.350 austinW: The generative output results, they would all be managed in session.

77 00:10:48.650 00:10:53.049 Awaish Kumar: Okay, got it. So in that case, like, the middle…

78 00:10:53.200 00:10:55.390 Awaish Kumar: Database that you were managing, like.

79 00:10:55.390 00:10:55.950 austinW: Yeah.

80 00:10:55.950 00:10:59.469 Awaish Kumar: It won't be that big or complex, it's…

81 00:10:59.470 00:11:00.540 austinW: No, yeah.

82 00:11:00.540 00:11:05.560 Awaish Kumar: sessions and… ongoing things in a session. It is tracking that, right?

83 00:11:05.810 00:11:17.700 austinW: That’s correct. We didn’t track so much of what was happening in the session. It was really, you know, I would say, it was session-scoped as far as people’s client and their web session.

84 00:11:17.700 00:11:28.759 austinW: If they closed the app and they left, they wouldn't be able to recall their stuff. There was a separate implementation we did: we stored, like, a JSON file on S3, so that they could restore their session, or at least what they were editing.

85 00:11:28.850 00:11:42.679 austinW: But separately is that, yeah, we designed that middleware… that middle database to be very simple, but also to be one that you could try… you could, you know, delete it and quickly restore it, just so that in case any state got away from the source system, we weren’t attached to it.
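
A minimal sketch of that session-restore piece, using boto3; the bucket name and key scheme are assumptions:

```python
# Persist the editor's in-flight state as JSON on S3, keyed by user,
# so a closed browser doesn't lose their edits.
import json
import boto3

s3 = boto3.client("s3")
BUCKET = "editor-session-state"  # hypothetical bucket

def save_session(user_id: str, state: dict) -> None:
    s3.put_object(Bucket=BUCKET, Key=f"sessions/{user_id}.json",
                  Body=json.dumps(state).encode("utf-8"),
                  ContentType="application/json")

def restore_session(user_id: str) -> dict | None:
    try:
        obj = s3.get_object(Bucket=BUCKET, Key=f"sessions/{user_id}.json")
        return json.loads(obj["Body"].read())
    except s3.exceptions.NoSuchKey:
        return None  # nothing saved for this user yet
```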

86 00:11:42.680 00:11:46.269 Awaish Kumar: We’re maintaining this state during the session.

87 00:11:46.560 00:11:47.100 austinW: Yeah.

88 00:11:47.750 00:11:48.070 Awaish Kumar: Okay.

89 00:11:48.070 00:11:49.429 austinW: Yeah, yeah,

90 00:11:50.440 00:11:57.140 Awaish Kumar: Okay, moving on to your experience with the AI engineering work that you've been doing.

91 00:11:57.830 00:11:58.580 austinW: Mm-hmm.

92 00:11:58.580 00:12:03.060 Awaish Kumar: Like, can you elaborate more on that? Like, what kind of things have you worked on?

93 00:12:03.320 00:12:17.159 austinW: Yeah, so this opportunity that I've been on most recently, this is with HealthStream, this was the first one where I was really making production AI solutions for a client. Previously, I was kind of just learning on the side, is that,

94 00:12:17.810 00:12:37.169 austinW: I'd say back in 2023, '24, I was doing a lot more home labbing, necessarily. I was using Docker. I was actually inclined to approach these LLMs from a local-LLM perspective, rather than, you know, just using Claude or OpenAI or something like that on the web.

95 00:12:37.350 00:12:53.299 austinW: So, I was really… I was using Ollama, I was using LM Studio, I was kind of following along with the model releases that were changing, and just evaluating for myself, you know, how is Qwen, or how is DeepSeek, and, you know, what is Sonnet doing, what is Llama doing?

96 00:12:53.700 00:13:11.279 austinW: And so I kind of saw, I guess, the leaps and bounds of things, but granted, I was limited to the capacity of my laptop, so there were cases where the models were just too big, but I was learning about quantization, you know, where you have, like, the 8 billion versus the 16B, you know, variants, and taking those dependencies, but exploring, you know, just…

97 00:13:11.360 00:13:21.080 austinW: LLM responses. With this last opportunity, though, this was more where I really, you know, I leveraged my service, you know, development experience.

98 00:13:21.120 00:13:25.869 austinW: to adapt and incorporate more of, I guess, what I regard as qualitative responses.

99 00:13:25.890 00:13:43.159 austinW: So when you're getting paragraphs back, or if you wanted to do a single, like, a one-shot prompting thing, or you wanted to carry that context to subsequent calls. I evaluated things like CrewAI, which was, you know, very quick and out of the box to define agent scope and tasks, and tell them if they could collaborate.

100 00:13:43.160 00:13:53.089 austinW: And I went as far as developing… well, in some cases, taking available tools, so you'd give the AI a tool to do a thing, but in other cases, I had to develop some tools.

101 00:13:53.130 00:14:07.930 austinW: And, separately, it was more so getting towards LangChain. I found that CrewAI had a lot of features, but it was actually kind of slow. So just going back to LangChain did help to string together subsequent prompt calls, from a service perspective.
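
A minimal sketch of stringing prompt calls together with LangChain's expression language, roughly the pattern described; the model, prompts, and two-step flow are illustrative assumptions:

```python
# Two chained prompt calls: the first call's output feeds the second,
# carrying context forward, service-style.
from langchain_openai import ChatOpenAI
from langchain_core.prompts import ChatPromptTemplate
from langchain_core.output_parsers import StrOutputParser

llm = ChatOpenAI(model="gpt-4o-mini")  # placeholder model
parser = StrOutputParser()

summarize = ChatPromptTemplate.from_template(
    "Summarize the key clinical points in this passage:\n{passage}") | llm | parser
draft_question = ChatPromptTemplate.from_template(
    "Write one multiple-choice question testing these points:\n{summary}") | llm | parser

summary = summarize.invoke({"passage": "..."})
question = draft_question.invoke({"summary": summary})
print(question)
```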

102 00:14:09.000 00:14:09.690 Awaish Kumar: Okay.

103 00:14:10.820 00:14:14.790 Awaish Kumar: yeah, in terms of warehousing, what…

104 00:14:15.280 00:14:19.590 Awaish Kumar: Yeah. What experience do you have with the latest warehouses?

105 00:14:20.240 00:14:32.789 austinW: Yeah, sure. So, I got started again with the SQL Server on-prem stuff, so it was TFS days, and I was responsible for dev, UAT, and production back in those days. Pretty large company, pretty large data, necessarily.

106 00:14:32.870 00:14:46.350 austinW: I've seen the transition to cloud. At Geared, a company… two companies ago, we built, really, those data lake and, like, medallion, you know, practices against BigQuery.

107 00:14:46.620 00:14:52.419 austinW: And then the clients that I was working for, really, my current contract, they are a Snowflake customer.

108 00:14:52.490 00:15:09.870 austinW: They've got some on-prem stuff, some Snowflake stuff, but I've run into the likes, basically, of Snowflake and Databricks. I've gone for interviews for Databricks implementation partners before. So I look at them as, you know, it's definitely a pay-as-you-go thing. You have to be really careful about what compute you're firing off.

109 00:15:10.020 00:15:26.259 austinW: And then specifically cases around, you know, your cloud dependencies. So, if you're doing things in a region, you know, making sure that your copies are going to like-to-like regions, or else you get surprised by bills. So, you know, I see the carryover. You know, SQL's always been the lingua franca, basically. You can kind of use it everywhere.

110 00:15:26.390 00:15:40.019 austinW: And I’m otherwise just following the product offerings for Snowflake and Databricks, for the most part in those terms. I’ve liked, you know, the incorporation of their MCP servers and their AI agents in-app. That’s been fun to watch.

111 00:15:40.120 00:15:43.079 austinW: And then how they’ve approached things. Like, I think that,

112 00:15:43.430 00:15:57.020 austinW: between Databricks and Snowflake, like, I like Databricks' AI/BI product, where you're still looking at, kind of, blessed and reviewed queries as it's defined in that semantic layer. So I'm comfortable using those tools for sure, too.

113 00:15:57.320 00:16:01.780 Awaish Kumar: So, like, how would you, for example.

114 00:16:02.370 00:16:05.180 Awaish Kumar: like, optimize your cost on Snowflake.

115 00:16:05.870 00:16:25.790 austinW: Thank you. Yeah, so, I mean, in general, it's… in a lot of SQL cases, it's still cluster compute in a way. It's not like you're optimizing Glue all the time; you know, BigQuery and Snowflake have different ways that they store and recall this data, but in general, you know, I'd say anything that's, like, long-running queries,

116 00:16:26.670 00:16:34.660 austinW: Queries pulling unnecessary data, largely taking statistics, kind of, on what are the most trafficked queries of an existing production system.

117 00:16:34.930 00:16:37.180 austinW: Making sure they’re not pulling extra data.

118 00:16:37.330 00:16:44.069 austinW: And then, you know, in the cases where, you know, they just don’t maybe have the refinement, so there’s definitely times where…

119 00:16:44.070 00:17:00.169 austinW: you'll have bad views, where views are taking dependencies on views, and they're just going back to sources when they don't have to. So in some cases, you could make, like, a dbt model and just kind of get to a higher level of distillation, so that you're hitting a smaller fact table as opposed to a larger original source.
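
A sketch of that distillation idea in Snowflake-style SQL, issued here through a DB-API connection: materialize a small, pre-aggregated fact table once, so downstream queries hit it instead of re-running a chain of views against the raw source. Table and column names are invented:

```python
# Hypothetical refresh job for a distilled fact table.
DISTILL_SQL = """
CREATE OR REPLACE TABLE fct_daily_usage AS
SELECT
    customer_id,
    CAST(event_ts AS DATE) AS event_date,
    COUNT(*)               AS events
FROM raw_events                     -- the large original source
WHERE event_type <> 'heartbeat'     -- drop known noise early
GROUP BY customer_id, CAST(event_ts AS DATE);
"""

def refresh_distilled_fact(conn) -> None:
    cur = conn.cursor()
    cur.execute(DISTILL_SQL)
    conn.commit()
```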

120 00:17:00.900 00:17:02.210 Awaish Kumar: Yeah, so…

121 00:17:02.380 00:17:08.429 Awaish Kumar: Okay, yeah, these are, like, some of the things that you can do, like, in terms of best practices.

122 00:17:08.700 00:17:09.210 austinW: Mm-hmm.

123 00:17:09.210 00:17:14.250 Awaish Kumar: the query, but what, like, warehousing techniques can you apply for optimization?

124 00:17:14.680 00:17:19.210 austinW: Sure. So in terms of warehousing, so…

125 00:17:19.950 00:17:25.839 austinW: Warehousing in general, I think that when you’re… when you’re simplifying complicated ERPs… go ahead.

126 00:17:26.359 00:17:34.809 Awaish Kumar: Yeah, if you want to talk about a single one, like BigQuery or Snowflake, where you have the most experience, like, you can just give examples of that.

127 00:17:34.970 00:17:40.210 austinW: Okay. Yeah, so, Amazon is a good example.

128 00:17:40.440 00:17:44.059 austinW: We were… necessarily, we’re using,

129 00:17:44.270 00:17:51.390 austinW: it was pay-as-you-go, it was really a large-scale Glue instance, but it was compute on demand.

130 00:17:51.490 00:17:55.620 austinW: So we needed to make sure that we matched compute to the workloads, and in these cases.

131 00:17:55.680 00:18:14.939 austinW: We would do case… like, our granularity of source data was so huge, but what we needed to get to were insights about a customer, about a volume of… I was working at Alexa Shopping, so the idea was that an Alexa user is using so many… you'd have monthly active users, certainly, but I was reporting at a grain of different,

132 00:18:15.270 00:18:31.980 austinW: it was features that they were using, so shopping-related features. So because the original, you know, the number of utterances you’d have, you’d have, like, you know, call it 10 service calls that are back behind one 30-second interaction with Alexa. So we did a lot of work, I guess, to,

133 00:18:33.230 00:18:47.100 austinW: standardize that fact- and dimension-level data. So we would do pre-computed aggregates at certain granularities, so you could look at customer per day, certainly filtering out a lot of, you know, production data noise.
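
A sketch of that pre-computed aggregate pattern in PySpark, against invented table and column names: roll utterance-level events up to a customer-per-day grain and filter the noise before anyone queries it:

```python
# Hypothetical daily rollup: raw service calls -> customer/feature/day grain.
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("daily_feature_rollup").getOrCreate()

rollup = spark.sql("""
    SELECT customer_id,
           feature,
           to_date(event_ts) AS usage_date,
           COUNT(*)          AS interactions
    FROM raw.service_calls
    WHERE is_test_traffic = false      -- filter production noise early
    GROUP BY customer_id, feature, to_date(event_ts)
""")

# Materialize the aggregate so dashboards hit this, not the raw source.
rollup.write.mode("overwrite").saveAsTable("analytics.feature_usage_daily")
```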

134 00:18:47.280 00:18:55.199 austinW: But in terms of that, you know, doing materialized views, or physical views necessarily for past data that was locked in.

135 00:18:55.350 00:18:57.070 austinW: We would.

136 00:18:57.250 00:18:58.540 Awaish Kumar: Yeah.

137 00:18:58.640 00:19:06.510 Awaish Kumar: This is still, like, at a query level, that… these things you would do while you are modeling things, like,

138 00:19:06.900 00:19:12.910 Awaish Kumar: Aggregate early, filter out early, so that at the end, you have a smaller amount of data.

139 00:19:13.440 00:19:14.280 austinW: Yeah.

140 00:19:14.280 00:19:18.259 Awaish Kumar: What about, like, techniques like partitioning, clustering?

141 00:19:18.260 00:19:32.680 austinW: Absolutely. Yeah, so partition strategies were really important, especially with our Glue catalog, in that we would encounter tables where the fact data was not partitioned. So there's cases where, you know, instead of running a presentation-layer query repeatedly,

142 00:19:32.680 00:19:39.119 austinW: We did have the ability to create our own S3, you know, bucket destinations and manage files that way.

143 00:19:39.120 00:19:47.539 austinW: So I did take it upon myself at different times to do different distribution partition strategies to persist, you know, the granularities of data across S3.

144 00:19:47.630 00:19:48.640 austinW: And then…

145 00:19:48.850 00:19:54.470 austinW: using… it was really Spark SQL syntax, but to request data across those partitions for days or weeks.
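
A sketch of that partition strategy in PySpark, with an invented bucket and schema: write one folder per day to S3, so reads that filter on the partition column only touch the matching folders:

```python
# Hypothetical partitioned persist-and-read pattern.
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.appName("partitioned_persist").getOrCreate()

df = spark.table("raw.service_calls").withColumn("dt", F.to_date("event_ts"))

# One folder per day: s3://my-curated-bucket/service_calls/dt=2026-05-01/...
(df.write.mode("overwrite")
   .partitionBy("dt")
   .parquet("s3://my-curated-bucket/service_calls/"))

# Filtering on the partition column prunes to just the requested days.
week = (spark.read.parquet("s3://my-curated-bucket/service_calls/")
        .where(F.col("dt").between("2026-05-01", "2026-05-07")))
```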

146 00:19:55.300 00:20:01.319 Awaish Kumar: Glue is, like, kind of not a warehouse. I understand you worked with S3, and you tried to implement,

147 00:20:01.460 00:20:12.899 Awaish Kumar: partitioning on S3, but then there are some techniques in the warehouse itself. Like, if you go to Redshift, there are different optimization techniques. If you use Snowflake, then…

148 00:20:13.340 00:20:21.659 austinW: That’s right. It’s still mostly on the basis of, like, distribution, distribute by and partition by, like, distribute by and sort by, order by, really.

149 00:20:21.660 00:20:34.270 Awaish Kumar: how they handle it, and what options you have, is kind of limited. Yeah. Obviously, there are different ways, different conventions, different names, but the concept is the same, so… but, yeah, the thing is that, like,

150 00:20:34.410 00:20:38.420 Awaish Kumar: Okay, apart from that, like,

151 00:20:38.960 00:20:43.570 Awaish Kumar: Also, if we use a database, like, are you familiar with indexing?

152 00:20:43.860 00:21:03.850 austinW: Yeah. Yeah, so, going even as far back as the SQL Server days, there were times where I was actually using Python to define my indexes, too. I would go across schemas that didn't have indexes, and based on different primary key or foreign key relationships, go through and generate indexes and submit that through the build process, yeah.
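
A sketch of that index-generation approach, assuming SQL Server's INFORMATION_SCHEMA views; it emits CREATE INDEX DDL for every foreign-key column so the statements can be reviewed and submitted through the build process:

```python
# Hypothetical generator: find FK columns without relying on manual audits,
# then emit nonclustered index DDL for each.
import pyodbc

FK_COLUMNS_SQL = """
SELECT kcu.TABLE_SCHEMA, kcu.TABLE_NAME, kcu.COLUMN_NAME
FROM INFORMATION_SCHEMA.KEY_COLUMN_USAGE kcu
JOIN INFORMATION_SCHEMA.TABLE_CONSTRAINTS tc
  ON tc.CONSTRAINT_NAME = kcu.CONSTRAINT_NAME
WHERE tc.CONSTRAINT_TYPE = 'FOREIGN KEY'
"""

def generate_index_ddl(conn_str: str) -> list[str]:
    conn = pyodbc.connect(conn_str)
    ddl = []
    for schema, table, column in conn.cursor().execute(FK_COLUMNS_SQL):
        ddl.append(
            f"CREATE NONCLUSTERED INDEX IX_{table}_{column} "
            f"ON [{schema}].[{table}] ([{column}]);"
        )
    return ddl  # reviewed, then checked in through the build process
```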

153 00:21:04.060 00:21:09.259 Awaish Kumar: How would you identify where… what columns we should apply indexes on?

154 00:21:09.700 00:21:25.820 austinW: Yeah. So generally speaking, it's those that you're most often grouping by. Date and datetime columns are often ones where timestamps are kind of too low-level, and you want to get to a denser grain, so you're looking at days, you're looking at months sometimes on that partition strategy.

155 00:21:26.000 00:21:31.730 austinW: Specifically, we ran into this with… what is it?

156 00:21:32.160 00:21:35.439 austinW: GDPR, but yeah, go ahead. Like, regions data, go ahead.

157 00:21:35.440 00:21:38.019 Awaish Kumar: We are not talking about partitioning right now, it’s…

158 00:21:38.020 00:21:39.139 austinW: Okay. Yeah.

159 00:21:39.140 00:21:46.160 Awaish Kumar: I'm talking about indexing. How would you identify your columns? What columns would you select for indexing?

160 00:21:46.340 00:21:48.949 Awaish Kumar: For a table which is non-performant.

161 00:21:49.320 00:22:01.480 austinW: Gotcha. Yeah, so, again, still on columns that are typically grouped, so, major dimensions, customers, locations, geographies sometimes, but most often, like, dates.

162 00:22:01.770 00:22:03.749 austinW: Or not… yeah, yeah.

163 00:22:04.980 00:22:07.380 Awaish Kumar: Okay, and then.

164 00:22:08.080 00:22:15.119 austinW: Because you don't want to scan a heap. You need to, you know… those that are qualified in WHERE statements, those that are typically part of joins, those cases.

165 00:22:15.510 00:22:16.570 Awaish Kumar: Yeah, okay.

166 00:22:16.730 00:22:25.119 Awaish Kumar: And, far… So, okay, so you mentioned a dbt model. Do you have experience using dbt?

167 00:22:25.280 00:22:33.169 austinW: In a lab setting. I did a GCP build-out with a startup where I was, like, the first capacity suited to build a lot of things, like, right into GCP.

168 00:22:33.350 00:22:41.650 austinW: But as far as dbt goes, I do take a lot of, like, the Kimball and Inmon practices of doing, like, an SCD pattern or a drop-and-replace pattern.

169 00:22:41.650 00:22:54.660 austinW: But, not in a production setting have I worked with dbt. But I appreciate that, you know, it takes that SDLC code, or that, rather, the DDL and DML, and makes it into a, you know, a code base that you can do Git commits with.

170 00:22:54.750 00:23:05.730 austinW: simplify the build process with, but also incorporate, you know, sanity tests. I’ve done so many manual sanity tests, so to have a mature dbt instance, that’s definitely an area of growth that I want to get involved in.

171 00:23:06.120 00:23:10.220 Awaish Kumar: Okay, how would you implement SCD using dbt?

172 00:23:10.740 00:23:23.849 austinW: Yeah, so with the defined models and taking, like, reference linkages, you know, it’s the snapshot data, so I’d be kind of conver… you know, translating. In my case, I did a lot of SCD stuff using,

173 00:23:23.970 00:23:30.490 austinW: MERGE upsert kind of statements, and you could do kind of complicated things where you're,

174 00:23:30.670 00:23:34.739 austinW: We achieved this with hashing on select columns and comparing a computed hash, but go ahead.
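
A sketch of that hash-compare upsert in SQL Server-flavored SQL, with invented table and column names: hash the tracked columns on both sides and only touch rows whose hash changed. Shown here as a Type 1 overwrite; a Type 2 variant would close out the old row and insert a new one:

```python
# Hypothetical SCD merge: compare computed hashes, update only changed rows.
MERGE_SQL = """
MERGE dim_customer AS tgt
USING staging_customer AS src
    ON tgt.customer_id = src.customer_id
WHEN MATCHED AND
     HASHBYTES('MD5', CONCAT(tgt.name, '|', tgt.segment, '|', tgt.region))
  <> HASHBYTES('MD5', CONCAT(src.name, '|', src.segment, '|', src.region))
    THEN UPDATE SET name = src.name, segment = src.segment, region = src.region,
                    updated_at = SYSUTCDATETIME()
WHEN NOT MATCHED BY TARGET
    THEN INSERT (customer_id, name, segment, region, updated_at)
         VALUES (src.customer_id, src.name, src.segment, src.region, SYSUTCDATETIME());
"""

def run_scd_merge(conn) -> None:
    cur = conn.cursor()
    cur.execute(MERGE_SQL)
    conn.commit()
```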

175 00:23:34.740 00:23:43.549 Awaish Kumar: But that you can do using SQL, right? We don't need dbt for that. So, is there anything in dbt that we can use for…

176 00:23:45.420 00:23:45.910 austinW: Yeah.

177 00:23:45.910 00:23:47.099 Awaish Kumar: Running SCD.

178 00:23:47.660 00:23:58.359 austinW: I'm aware that, I guess, there are out-of-the-box patterns. So this is one that I'm not fresh on, but I'm aware that, you know, dbt has key syntax where you can actually get the SCD behaviors.

179 00:23:59.830 00:24:00.530 Awaish Kumar: Okay.

180 00:24:01.050 00:24:08.249 Awaish Kumar: Okay, I think I’m good with my questions. We are left with a few more minutes, if you have any questions for me.

181 00:24:08.720 00:24:20.509 austinW: Yeah, yeah, no. I’m aware that, you know, y’all are doing, you know, it’s like end-to-end data, so from source to value, and so there’s analytics engineering in there, there’s data engineering and AI applications, too.

182 00:24:20.650 00:24:22.880 austinW: Can you tell me about, I guess.

183 00:24:23.640 00:24:34.870 austinW: you know, if the company has a sweet spot, basically. I know that y'all are maybe around 20 to 25 heads strong, so are there areas that you're trying to build up more often than others?

184 00:24:36.550 00:24:42.549 Awaish Kumar: So, we are a company of 20-25 people, which also includes marketing, sales, operations.

185 00:24:42.850 00:24:43.490 austinW: Hmm.

186 00:24:44.640 00:24:50.120 Awaish Kumar: In the data team, we are maybe… 5, 6…

187 00:24:50.970 00:24:55.479 Awaish Kumar: Maybe 6 to 8 people, and then also the AI, 3-4 people.

188 00:24:58.050 00:24:58.720 austinW: Sure.

189 00:24:58.950 00:25:09.099 Awaish Kumar: Most of… the majority of work right now is in the data world. Yeah. Although we have AI clients, and that is also growing.

190 00:25:10.430 00:25:18.140 Awaish Kumar: But the majority of the work, and the enterprise clients we have, is for the data work right now.

191 00:25:18.520 00:25:35.299 austinW: Okay. And then, I know that y’all are, like, I think, implementation partners for Snowflake, and that I think you guys got one of your first Fortune 500 clients. Can you tell me about some of the past projects? Are they typically going from 0 to 1, or are you coming into existing, you know, implementations of some of these tools?

192 00:25:36.050 00:25:38.449 Awaish Kumar: Yeah, so, yeah, I…

193 00:25:39.300 00:25:47.360 Awaish Kumar: So, for our process that we go through with the client, if it's 0 to 1, we go in with

194 00:25:47.680 00:25:56.440 Awaish Kumar: Our discovery phase, meet with the stakeholders, try to understand the business, the business domains.

195 00:25:57.180 00:26:10.000 Awaish Kumar: where data lives, what are the different departments, who are the different heads of the departments, and then what reports they are looking for, what are their KPIs.

196 00:26:10.260 00:26:11.840 Awaish Kumar: Different kinds of data.

197 00:26:12.030 00:26:14.530 Awaish Kumar: And where they are getting this data from.

198 00:26:14.690 00:26:23.490 Awaish Kumar: So, starting with, like, this is the discovery where we get to know about the company, business domain, their data, and,

199 00:26:23.650 00:26:27.780 Awaish Kumar: basically, what they measure for, like, their success.

200 00:26:28.150 00:26:32.490 Awaish Kumar: For example, if someone is a chief retail officer

201 00:26:34.790 00:26:37.190 Awaish Kumar: in a company, then…

202 00:26:37.400 00:26:48.800 Awaish Kumar: managing retail… retailers, like Walmart, Target. So for him, like, what are the KPIs? Like, how does he measure his success? And then we try to come up with

203 00:26:49.180 00:26:51.090 Awaish Kumar: a goal… and then go back,

204 00:26:51.620 00:26:54.040 Awaish Kumar: track back to the data, and then…

205 00:26:54.240 00:26:57.449 Awaish Kumar: Come up with a standardized way of,

206 00:26:57.660 00:27:01.000 Awaish Kumar: doing it. So, after this discovery, we go…

207 00:27:01.440 00:27:04.129 Awaish Kumar: Come up with the architecture diagram.

208 00:27:04.500 00:27:05.040 austinW: Right.

209 00:27:05.040 00:27:11.569 Awaish Kumar: We can show them, like, this is how we see your business, and this is how, from every part of the company,

210 00:27:12.170 00:27:12.830 Awaish Kumar: Mmm.

211 00:27:12.960 00:27:19.900 Awaish Kumar: We’ll bring the data, and we’ll unify it, and finally how it will be landed in the warehouse.

212 00:27:20.080 00:27:27.480 Awaish Kumar: So it's… it's a general-purpose architecture diagram, which just shows you the first step of,

213 00:27:28.010 00:27:35.820 Awaish Kumar: your organization, how data flows through your organization. Then comes the second step of discovery, like,

214 00:27:36.260 00:27:42.780 Awaish Kumar: like, looking at the data volumes, how frequently they look at their KPIs, if there's a need for batch processing, or…

215 00:27:43.000 00:27:49.840 Awaish Kumar: Real-time streaming, and then, looking at the cost, and then you come up with different memos, right?

216 00:27:50.650 00:27:54.409 Awaish Kumar: For each of the…

217 00:27:56.380 00:28:13.549 Awaish Kumar: sources and tools that you want to propose. We come up with a competitive analysis for the different tools, comparing them with the actual client scenarios, and then, basically, get their decisions on it, and yeah, after that, we start…

218 00:28:13.550 00:28:14.420 austinW: implementation.

219 00:28:14.700 00:28:28.080 austinW: Gotcha. And in terms of those kind of proofs of concept or otherwise, I’d say specifically if you’re building, call it a Snowflake instance, is it the type of deal where you’ll, you know, you might set up a warehouse in Snowflake, in…

220 00:28:28.340 00:28:37.130 austinW: a capacity that's managed by Brainforge and kind of port those things over, or are you often going into, like, the customer's production, you know, Snowflake environments?

221 00:28:37.130 00:28:41.109 Awaish Kumar: We have trial versions, right? Every tool now offers trial versions.

222 00:28:42.410 00:28:43.070 austinW: Yeah.

223 00:28:43.090 00:28:45.130 Awaish Kumar: You know, 4 weeks, or…

224 00:28:45.660 00:28:50.539 Awaish Kumar: 6 weeks' time we get as a trial, like, we can show them, like,

225 00:28:50.680 00:28:52.999 Awaish Kumar: whether it's something you want to use or not.

226 00:28:53.340 00:28:54.000 austinW: Okay.

227 00:28:54.110 00:29:08.230 austinW: And then, you know, I know that you guys are using… you help the client pick their choice of a lot of these composable tools, so… are there particular, like, orchestration tools that you guys implement most often? I'm thinking, like, Airbyte, or Dagster, or Mage.

228 00:29:09.580 00:29:21.279 Awaish Kumar: Yeah, right now, we don’t… none of our clients right now have their own orchestrator. It’s mostly ingestion tools, then dbt running through…

229 00:29:21.280 00:29:21.830 austinW: Yeah.

230 00:29:21.830 00:29:24.240 Awaish Kumar: like, CI/CD or whatever, and then BI, like… There you go.

231 00:29:24.550 00:29:28.090 Awaish Kumar: So, we don’t… like, right now, people don’t need…

232 00:29:28.340 00:29:31.870 Awaish Kumar: orchestrator, right? They just want this data.

233 00:29:31.990 00:29:38.880 Awaish Kumar: It can be transformed through SQL, and then finally that. But we do have some clients where we need it, but then it is… we have internal…

234 00:29:39.010 00:29:41.650 Awaish Kumar: Brainforge-managed Action, where we…

235 00:29:41.770 00:29:46.480 Awaish Kumar: use it for some of our clients where we need custom scripts to run.

236 00:29:46.820 00:29:54.189 austinW: There you go. No, I like that so much. I know that for me, dbt is a gap, but it's one where, you know, I want to find the right, you know, big…

237 00:29:54.230 00:30:08.670 austinW: company or group that is working with it, frankly, so I know what it is they're doing. And I imagine that it makes it so much easier to make these kinds of proposals to a client when the schema and everything can then be lifted and deployed into a target, you know, Snowflake instance, so…

238 00:30:08.670 00:30:22.239 austinW: you know, the ability to seed, you know, things like, you know, forecasts or what have you, it’s great. And then, yeah, I’m one that’s used to kind of scraping schemas that exist, and then, you know, doing that data evaluation. So, I like that discovery, the way it sounds.

239 00:30:23.080 00:30:23.710 Awaish Kumar: Okay.

240 00:30:24.100 00:30:24.840 austinW: Yeah.

241 00:30:25.790 00:30:29.210 Awaish Kumar: Okay, yeah, thank you, thank you for your time today.

242 00:30:29.870 00:30:31.159 Awaish Kumar: Yeah, see you soon.

243 00:30:31.500 00:30:33.600 austinW: Thank you, Awaish. Have a great weekend. Thanks.