Meeting Title: Brainforge Interview w- Awaish Date: 2026-04-09 Meeting participants: Rhodes, Awaish Kumar


WEBVTT

1 00:00:13.450 00:00:14.060 Awaish Kumar: Hi.

2 00:00:15.000 00:00:15.920 Rhodes: Hey there.

3 00:00:16.239 00:00:16.870 Rhodes: How you doing?

4 00:00:17.780 00:00:19.720 Awaish Kumar: I’m good, how about you?

5 00:00:20.330 00:00:22.350 Rhodes: Yeah, I’m good. Good to meet you.

6 00:00:23.140 00:00:25.319 Awaish Kumar: Nice to meet you, too. Where are you located?

7 00:00:26.000 00:00:27.460 Rhodes: I’m in Portland, Oregon.

8 00:00:29.220 00:00:29.980 Awaish Kumar: Okay.

9 00:00:32.520 00:00:39.010 Awaish Kumar: Now, so just… like, this session, we are just going to… talk about,

10 00:00:40.480 00:00:51.980 Awaish Kumar: a little bit about, like, Brainforge, if you have any questions about Brainforge, and then we are just going to learn more about your background.

11 00:00:52.930 00:00:54.779 Awaish Kumar: What you have been doing so far?

12 00:00:55.710 00:00:56.450 Rhodes: Cool.

13 00:01:00.850 00:01:03.830 Awaish Kumar: Yeah, if you can start with your introduction.

14 00:01:04.209 00:01:06.299 Rhodes: Sure, yeah, so…

15 00:01:06.439 00:01:26.429 Rhodes: As you know, I’m Rhodes professionally. I’m a data engineer. I’ve spent the past few years working in consulting environments, where the core challenge was both building pipelines, but also making data reliable and taking fragmented data and making it usable for decision making.

16 00:01:26.489 00:01:35.879 Rhodes: A lot of the systems I’ve worked in had issues like inconsistent metric definitions, non-deterministic transformations.

17 00:01:35.879 00:01:48.539 Rhodes: Or pipelines that would technically run, but just kind of produced results that teams didn’t trust. So, my role was typically to step into those environments and rebuild those in Spark.

18 00:01:48.639 00:01:54.679 Rhodes: And put validation and structures in place so that it became consistent and explainable.

19 00:01:54.909 00:02:02.889 Rhodes: So I’ve worked across a few large-scale migrations from SAS to Spark-based platforms, like Databricks and Fabric.

20 00:02:03.079 00:02:14.449 Rhodes: And I was responsible for both rewriting logic and making sure that outputs were deterministic, performant, and diagnosable when something went wrong, so…

21 00:02:14.649 00:02:24.499 Rhodes: I guess, what I found is that the biggest unlock for teams wasn’t just better pipelines, but systems that are quickly understandable when something breaks.

22 00:02:26.159 00:02:27.199 Rhodes: Okay.

23 00:02:28.730 00:02:29.450 Awaish Kumar: And.

24 00:02:32.280 00:02:33.250 Rhodes: Okay.

25 00:02:33.250 00:02:41.379 Awaish Kumar: Okay, let’s talk about, like, one of your projects that you think, like, was the most complex project.

26 00:02:41.650 00:02:45.039 Awaish Kumar: And we can talk about, like, the…

27 00:02:46.180 00:02:51.069 Awaish Kumar: End-to-end delivery, and what tools and technologies were used, and why.

28 00:02:51.690 00:02:55.430 Rhodes: Okay, yeah, happy to. So I’d say from…

29 00:02:55.560 00:03:05.940 Rhodes: an end-to-end perspective, when I worked at Sun Life, that was probably the most complex. So, I owned the dental pricing unit for an insurance analytics system.

30 00:03:06.060 00:03:21.079 Rhodes: Where there was constantly, kind of, inconsistent metrics for quarterly reviews. Pipelines weren’t failing, but the outputs basically were dependent on execution timing, and

31 00:03:21.100 00:03:33.140 Rhodes: When I dug into it, the root cause was that different product groups had implemented their own SAS transformations over time, and then those were being aggregated together.

32 00:03:33.200 00:03:38.300 Rhodes: And that led… that alone led to a lot of non-deterministic behavior.

33 00:03:38.380 00:03:41.510 Rhodes: There’s also a lot of…

34 00:03:41.620 00:03:50.039 Rhodes: Sort orders that weren’t handled deterministically, inconsistent joins, and different definitions of key metrics, so…

35 00:03:50.270 00:04:06.690 Rhodes: my… what I had to do is approach it by first isolating what the highest impact metrics were, rebuilding those in Spark in a deterministic way, working with stakeholders to define those things, and then adding validation checks to compare outputs.

36 00:04:06.830 00:04:18.220 Rhodes: And also in parallel, right, so I was working with stakeholders to standardize those definitions across product groups. A lot of the inconsistencies were just how metrics were…

37 00:04:19.890 00:04:27.819 Rhodes: you know, defined, themselves. So, yeah, the result was stable pipelines, but that was probably the most difficult.

38 00:04:29.510 00:04:34.309 Awaish Kumar: Okay, but can you deep dive into what tools and technologies were used?

39 00:04:34.450 00:04:37.780 Rhodes: Oh, sure. Yeah, so in that… Sorry.

40 00:04:37.780 00:04:40.430 Awaish Kumar: What were the data sources.

41 00:04:40.920 00:04:42.040 Rhodes: Alright.

42 00:04:42.390 00:04:42.970 Awaish Kumar: Yeah.

43 00:04:43.380 00:04:55.350 Rhodes: Yeah, so we were modernizing from SAS, Oracle Server, and, like, MySQL Server into DataIQ directly.

44 00:04:55.370 00:05:10.160 Rhodes: That was one case where, right, DataIQ isn’t necessarily always considered an end-to-end platform, but they were trying to move quickly. So, as far as tools and technologies, I helped set up Spark clusters.

45 00:05:10.160 00:05:23.440 Rhodes: I, you know, took, like, both external files and cleaned them, made sure that schemas and delimiters were handled appropriately on ingestion, and then also.

46 00:05:23.450 00:05:42.999 Rhodes: I had to make a lot of things performant, especially when it came to bringing in SQL-type queries, so, like, directly querying it on ingestion. I orchestrated, multiple workflows together in Spark library functions into the notebook, and…

47 00:05:43.000 00:05:47.829 Rhodes: diagnosed it via, like, Spark UI, that kind of thing.

48 00:05:54.900 00:05:56.159 Rhodes: Oh, God.

49 00:05:58.500 00:06:03.640 Awaish Kumar: Can you rate yourself, like… for example, in the tools that you mentioned.

50 00:06:04.260 00:06:05.320 Awaish Kumar: Out of 10.

51 00:06:06.230 00:06:17.559 Rhodes: Out of 10? Yeah, so I would say SQL specifically, I’m pretty comfortable in. I guess… I mean, I think there’s always room to grow, but maybe, like, a 9?

52 00:06:17.690 00:06:22.480 Rhodes: And then, PySpark, kind of same thing, I would say about a 9.

53 00:06:23.690 00:06:32.199 Rhodes: And then in terms of, like, lake houses, so, like, DataIQ is probably the most simplest, but I also worked in Databricks and Fabric.

54 00:06:32.370 00:06:48.670 Rhodes: I’d say Databricks is probably the most, like, complicated, so maybe, like, maybe, like, a 7 or 8, like, maybe at the level of the associate data engineer certificate, like, pushing the professional one.

55 00:06:50.690 00:06:51.610 Rhodes: Okay.

56 00:06:53.350 00:06:56.960 Awaish Kumar: And did you also have experience with Snowflake?

57 00:06:58.100 00:07:09.400 Rhodes: I haven’t had direct experience in Snowflake, but similar kind of data warehousing, concepts, so, for example.

58 00:07:10.060 00:07:13.030 Rhodes: Let’s see, I’m trying to think of an example, but

59 00:07:13.340 00:07:20.390 Rhodes: Yeah, like, like, a lot of times, like, within lake house architectures.

60 00:07:20.790 00:07:23.540 Awaish Kumar: Slowly changing dimensions?

61 00:07:24.540 00:07:25.762 Rhodes: Yeah, yeah, I’m…

62 00:07:26.530 00:07:42.139 Rhodes: Yeah, so I have worked in slowly changing dimensions. So, like, type 0 is fixed. The main types of slowly changing dimensions I worked with were, like, 1, 2, and 4. So, for example, when I was working with FedEx.

63 00:07:42.270 00:07:53.309 Rhodes: They were analyzing different product combinations, so we implemented a Type 2 so that we could keep all the historical data, and then…

64 00:07:53.390 00:08:03.309 Rhodes: And then, when I worked with University of Texas, they needed all historical data for possible reporting reasons, but were only using

65 00:08:03.390 00:08:16.239 Rhodes: analysis on current data, so we implemented a Type 4, to append the current notebook, and then… or append the current table, and then kick the old data to a historical table.
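
Note: the Type 4 pattern described above (a current table plus a separate historical table) can be sketched roughly as follows. This is a minimal illustration using Python's built-in sqlite3 module, not the implementation used on the project; the table names `customer_current` and `customer_history`, the columns, and the helper `apply_type4_update` are all hypothetical.

```python
import sqlite3


def apply_type4_update(conn, customer_id, new_plan, changed_at):
    """Type 4 sketch: keep only the latest row in the current table and
    move the row it replaces into a separate history table."""
    cur = conn.cursor()
    row = cur.execute(
        "SELECT customer_id, plan, valid_from FROM customer_current "
        "WHERE customer_id = ?",
        (customer_id,),
    ).fetchone()
    if row is None:
        # First time we see this key: nothing to archive.
        cur.execute(
            "INSERT INTO customer_current (customer_id, plan, valid_from) "
            "VALUES (?, ?, ?)",
            (customer_id, new_plan, changed_at),
        )
    else:
        # Archive the old version, closing its validity window.
        cur.execute(
            "INSERT INTO customer_history "
            "(customer_id, plan, valid_from, valid_to) VALUES (?, ?, ?, ?)",
            (row[0], row[1], row[2], changed_at),
        )
        # Overwrite the current row in place.
        cur.execute(
            "UPDATE customer_current SET plan = ?, valid_from = ? "
            "WHERE customer_id = ?",
            (new_plan, changed_at, customer_id),
        )
    conn.commit()


# Hypothetical schema and data, for illustration only.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE customer_current (
        customer_id INTEGER PRIMARY KEY, plan TEXT, valid_from TEXT);
    CREATE TABLE customer_history (
        customer_id INTEGER, plan TEXT, valid_from TEXT, valid_to TEXT);
    """
)
apply_type4_update(conn, 1, "basic", "2024-01-01")
apply_type4_update(conn, 1, "premium", "2024-06-01")

current_rows = conn.execute("SELECT * FROM customer_current").fetchall()
history_rows = conn.execute("SELECT * FROM customer_history").fetchall()
```

Analysis queries read only `customer_current`, which stays small, while `customer_history` keeps every superseded version for reporting.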

66 00:08:18.780 00:08:19.640 Rhodes: Okay. Okay.

67 00:08:22.760 00:08:28.370 Awaish Kumar: And… And then… Okay, good.

68 00:08:28.980 00:08:34.610 Awaish Kumar: Let’s talk about a scenario where we have a table

69 00:08:34.720 00:08:40.789 Awaish Kumar: Which has a lot of different… Data columns, string data columns.

70 00:08:40.890 00:08:42.010 Awaish Kumar: Okay. Time.

71 00:08:43.600 00:08:49.800 Awaish Kumar: Like, maybe there is a record of some transaction or event that is happening.

72 00:08:49.990 00:08:51.899 Awaish Kumar: But it has all… all…

73 00:08:52.950 00:08:54.720 Rhodes: The string columns in there.

74 00:08:55.200 00:08:57.819 Rhodes: And it is huge.

75 00:08:57.870 00:09:04.330 Awaish Kumar: So, whenever I’m trying to query it, maybe using… I mean, I have some, like, WHERE clauses and…

76 00:09:04.520 00:09:07.010 Rhodes: Very close.

77 00:09:07.570 00:09:09.630 Rhodes: What are some of the things?

78 00:09:10.150 00:09:12.550 Rhodes: You can optimize the query.

79 00:09:12.550 00:09:14.630 Awaish Kumar: Or optimize the table itself.

80 00:09:14.830 00:09:17.540 Awaish Kumar: So, we can improve the query performance.

81 00:09:18.280 00:09:18.990 Rhodes: Okay.

82 00:09:19.590 00:09:27.210 Rhodes: Yeah, I would say… well, for one, a lot of my experience… Has showed me that

83 00:09:27.440 00:09:42.299 Rhodes: kind of not carrying data too far downstream, especially with joins, can be really expensive, so maybe implementing WHERE clauses before you’re joining those tables, to filter them.

84 00:09:43.700 00:09:44.640 Rhodes: I’m sorry?

85 00:09:45.560 00:09:46.630 Awaish Kumar: It’s just one table.

86 00:09:47.020 00:09:50.310 Rhodes: Okay. Yeah, so we can look at…

87 00:09:51.310 00:09:53.509 Rhodes: Okay, let me think for a second.

88 00:09:55.450 00:10:04.339 Rhodes: So I guess, yeah, so I guess one of the things we can do is we can select columns that we need, early.

89 00:10:04.670 00:10:08.859 Rhodes: We can filter rows that we need early as well.

90 00:10:09.230 00:10:12.810 Rhodes: Another thing we can do is…

91 00:10:13.140 00:10:21.200 Rhodes: We can… if there’s, for example, acronyms, that we can use, Sometimes that…

92 00:10:21.490 00:10:25.140 Rhodes: Or, like, a yes or no, changing that to binary.

93 00:10:25.300 00:10:27.490 Rhodes: Can also help.

94 00:10:30.000 00:10:30.900 Rhodes: Okay.

95 00:10:31.550 00:10:32.030 Awaish Kumar: They’re gonna give me.

96 00:10:32.030 00:10:39.790 Rhodes: Thanks, you can… Yeah, let me… I guess, let me think for a second,

97 00:10:40.750 00:10:44.589 Rhodes: So, we have one table, they’re all string columns.

98 00:10:51.040 00:10:58.450 Rhodes: And then… Like, we’re not thinking of taking it downstream, we’re just saying, like.

99 00:10:58.790 00:11:07.870 Rhodes: how it is, right? So, okay, so first, I guess, understanding how it’s being used, like, is this for, like, ingestion?

100 00:11:08.020 00:11:11.380 Rhodes: Or transformation, or analytics, maybe?

101 00:11:13.090 00:11:19.759 Awaish Kumar: Next, I’m trying to generate… And my query is really slow while I’m querying on that table.

102 00:11:20.040 00:11:30.600 Awaish Kumar: And I want to optimize my… the query that I’m running. I want to optimize the… the time that it takes to run those queries, so…

103 00:11:31.470 00:11:32.350 Rhodes: Okay.

104 00:11:32.990 00:11:37.390 Awaish Kumar: what are the things that I can do, what I can do in my query, or what I can…

105 00:11:37.760 00:11:41.640 Awaish Kumar: Do it at the table, so it can… yeah.

106 00:11:41.640 00:11:42.510 Rhodes: Yeah.

107 00:11:43.230 00:11:49.430 Rhodes: Okay, so… Yeah, you said that they’re all string columns, so…

108 00:11:49.680 00:12:00.029 Rhodes: strings are definitely going to be heavier than numeric or dates, so, for example, if you have a date that’s in the form of a string, I guess.

109 00:12:00.700 00:12:01.070 Awaish Kumar: You don’t know.

110 00:12:01.970 00:12:02.980 Rhodes: You can think…

111 00:12:02.980 00:12:08.950 Awaish Kumar: String columns are a string, because these are strings, like, names, email address.

112 00:12:10.090 00:12:10.910 Rhodes: Okay.

113 00:12:12.180 00:12:16.110 Rhodes: Gotcha. Okay, so…

114 00:12:23.460 00:12:33.420 Rhodes: Okay, yeah, just give me one second to think, nope. I guess… Depending on the system…

115 00:12:33.560 00:12:38.369 Rhodes: We could partition by commonly filtered columns?

116 00:12:38.670 00:12:46.360 Rhodes: We could… if it’s, like, a warehouse-specific type system, for example.

117 00:12:47.210 00:12:51.859 Rhodes: like, in AWS, you can use, like, Redshift to…

118 00:12:52.190 00:12:56.169 Rhodes: Like, sort or distribute keys,

119 00:12:57.130 00:12:59.809 Rhodes: I’m trying to think of other things.

120 00:13:00.350 00:13:04.249 Rhodes: Yeah, you mentioned…

121 00:13:04.290 00:13:06.299 Awaish Kumar: our distribution keys.

122 00:13:06.710 00:13:13.840 Awaish Kumar: Or maybe let’s talk about partitioning first, so… Like…

123 00:13:14.320 00:13:17.290 Awaish Kumar: What columns would you take for partitioning on?

124 00:13:18.050 00:13:22.980 Rhodes: Oh, yeah, definitely anything…

125 00:13:23.310 00:13:33.910 Rhodes: that… I would say, like, date or region, anything that’s commonly filtered. I guess, like, in my… in my experience, I’m usually partitioning on

126 00:13:34.030 00:13:41.869 Rhodes: dates, or, like, regions, like, anything where you’re kind of wanting to cluster, like…

127 00:13:42.080 00:13:46.219 Awaish Kumar: how it is going to work here, because there is… This data isn’t…

128 00:13:46.480 00:13:49.659 Awaish Kumar: About date… like, there’s no date column.

129 00:13:53.330 00:13:55.040 Rhodes: Gotcha, because it’s all strings.

130 00:13:58.200 00:14:08.660 Rhodes: Okay, so… In that case…

131 00:14:10.270 00:14:15.410 Rhodes: Okay, so, right, so we’re not choosing on data type, we want to choose it on query pattern.

132 00:14:15.740 00:14:25.169 Rhodes: I mean, how, like, how is this data being queried, I guess, would be my first question.

133 00:14:25.170 00:14:27.880 Awaish Kumar: The query, for example, I don’t have any…

134 00:14:31.060 00:14:32.060 Rhodes: Any…

135 00:14:32.110 00:14:37.470 Awaish Kumar: Filter on any, any date, or any set of, like.

136 00:14:37.860 00:14:44.480 Awaish Kumar: Any column that can identify a group of columns where you can partition. For example, you can partition by

137 00:14:44.990 00:14:48.900 Awaish Kumar: Date, or you can maybe partition by… Some other…

138 00:14:49.110 00:15:01.950 Awaish Kumar: fields, but the problem is the way I’m using it, or the way people query it is, like, maybe I’m querying it by, okay, I have names of 10 people, return me… return all the rows for those 10 people.

139 00:15:03.990 00:15:04.730 Rhodes: Okay.

140 00:15:07.170 00:15:13.310 Rhodes: So… Okay, so… In that case…

141 00:15:16.990 00:15:17.890 Rhodes: Whoa.

142 00:15:21.980 00:15:28.180 Rhodes: Well, I’m just thinking, like… Well, I’m just thinking, I guess I’ve never…

143 00:15:29.760 00:15:35.979 Rhodes: I’ve always kind of thought partitioning makes more sense when Like, the query regularly filter…

144 00:15:36.110 00:15:41.780 Rhodes: It kind of filters, like, large scans, I mean…

145 00:15:42.100 00:15:50.829 Rhodes: doesn’t things like name, email, like, strings like that kind of have, like, high cardinality? Doesn’t that kind of create too many small partitions?

146 00:15:51.150 00:15:56.269 Rhodes: Like, with that… Maybe in that case, that wouldn’t be the right.

147 00:15:56.270 00:15:56.830 Awaish Kumar: Do you think?

148 00:15:56.830 00:15:57.709 Rhodes: I like it.

149 00:15:58.280 00:15:59.989 Awaish Kumar: Partitioning might not be the best.

150 00:16:00.670 00:16:03.000 Awaish Kumar: solution here, right? Because… Yeah.

151 00:16:03.560 00:16:06.080 Awaish Kumar: These are high-cardinality columns.

152 00:16:06.850 00:16:13.130 Rhodes: So… Okay, so I’m just trying to think of some other things, maybe.

153 00:16:13.800 00:16:22.419 Rhodes: So, there’s no, like, meaningful filter column. The table’s used, like, for maybe, like, lookups on those different string columns,

154 00:16:23.310 00:16:30.720 Rhodes: I guess… Okay, so they all are meant to be strings, so that’s one

155 00:16:31.660 00:16:36.320 Rhodes: Standardizing or cleaning names or values, maybe trimming.

156 00:16:36.800 00:16:39.349 Awaish Kumar: The existing companies are always changing.

157 00:16:40.080 00:16:45.250 Awaish Kumar: How you are going to… Support, or how you are going to optimize it,

158 00:16:45.460 00:16:49.429 Awaish Kumar: If how you want to structure it, this totally depends.

159 00:16:49.600 00:16:51.120 Awaish Kumar: On… on… on you.

160 00:16:55.830 00:16:57.290 Rhodes: Whoa, okay.

161 00:16:57.470 00:17:02.550 Rhodes: Oh, well, okay, okay, hold up, well, first off…

162 00:17:02.860 00:17:05.480 Rhodes: We can look to deduplicate, like.

163 00:17:06.630 00:17:16.220 Rhodes: We can look to deduplicate, possibly. I mean, right, you said that there’s a large number of transactions, so maybe this is being used for…

164 00:17:16.400 00:17:23.719 Rhodes: Like, unique, like, unique customer names, and then, like, we want, like, a count or something, so we could look at…

165 00:17:23.849 00:17:29.569 Rhodes: the uniqueness of keys, maybe that’s something we can do.

166 00:17:29.830 00:17:30.770 Rhodes: Hey, welcome.

167 00:17:31.130 00:17:32.270 Rhodes: We…

168 00:17:32.620 00:17:33.800 Awaish Kumar: I don’t interject.

169 00:17:34.550 00:17:34.890 Awaish Kumar: Hold on.

170 00:17:34.890 00:17:36.230 Rhodes: And then…

171 00:17:36.230 00:17:36.830 Awaish Kumar: cable.

172 00:17:37.610 00:17:43.720 Awaish Kumar: with the… Integer column, like, and then call it a primary key.

173 00:17:45.700 00:17:47.449 Awaish Kumar: Doesn’t it make it faster?

174 00:17:48.960 00:17:51.180 Rhodes: Oh, yeah. Yeah, I guess it would.

175 00:17:51.180 00:17:54.100 Awaish Kumar: Because primary keys have indexes on it.

176 00:17:55.700 00:17:59.250 Rhodes: Yeah, that makes sense. Okay.

177 00:18:01.760 00:18:11.080 Rhodes: So… So if we were to do that and join it to… A, like, numeric index.

178 00:18:13.020 00:18:21.920 Rhodes: then… So, so the question is still, like, So we want… to keep.

179 00:18:24.040 00:18:27.359 Rhodes: Okay, sorry, I’m just trying to think through this, cleanly.

180 00:18:28.070 00:18:34.710 Rhodes: Okay, so… We can clean and standardize, like, the string columns.

181 00:18:34.850 00:18:41.370 Rhodes: And then… Right, if we’re going to… join it.

182 00:18:41.980 00:18:45.339 Rhodes: Maybe, like… So, the table don’t…

183 00:18:45.490 00:18:53.030 Awaish Kumar: key right now, right? You are trying to add a primary key with an integer column that has indexes on top of it.

184 00:18:53.150 00:18:58.699 Awaish Kumar: Previously, when there was no primary key, you were trying to filter on a name, right?

185 00:18:59.060 00:18:59.510 Rhodes: Yeah.

186 00:19:00.030 00:19:08.030 Awaish Kumar: Okay, you can obviously get those IDs, right, in your system, and then query on IDs instead of names.

187 00:19:08.320 00:19:10.910 Awaish Kumar: That is one way we can be faster, but…

188 00:19:12.700 00:19:20.879 Awaish Kumar: But having the one big table, along with a lot of different fields, is still…

189 00:19:21.000 00:19:26.970 Awaish Kumar: kind of bad choice, even with adding… even if we add a primary key. So what else we can do?

190 00:19:29.460 00:19:31.140 Rhodes: Okay,

191 00:19:35.490 00:19:44.380 Rhodes: Yeah, no, that’s a good question. I guess… Could we…

192 00:19:46.320 00:19:53.409 Rhodes: So have we… so have we already… we’ve already assigned an integer key to each distinct string ID?

193 00:19:53.640 00:19:56.870 Rhodes: Is that correct? Yeah. Okay.

194 00:19:57.060 00:20:05.520 Rhodes: Gotcha, so then we’ve… and we’ve already joined it back to the original rows?

195 00:20:06.270 00:20:07.590 Rhodes: Yeah, so tonight…

196 00:20:07.590 00:20:09.270 Awaish Kumar: I can imagine that now to have a

197 00:20:09.890 00:20:15.180 Awaish Kumar: It has a primary key, it is a primary key, and it has all other columns, string columns.

198 00:20:15.700 00:20:18.689 Awaish Kumar: Maybe 30, 40 columns.

199 00:20:18.800 00:20:23.340 Awaish Kumar: And just a lot of rows. So now,

200 00:20:23.490 00:20:29.570 Awaish Kumar: What else you can do to optimize the… Beautiful family.

201 00:20:30.500 00:20:31.520 Rhodes: Okay.

202 00:20:31.520 00:20:32.660 Awaish Kumar: Great time to do today.

203 00:20:33.130 00:20:37.909 Rhodes: Okay, gotcha, gotcha. Alright, so I think we’re on the same page now, so that means…

204 00:20:39.840 00:20:50.210 Rhodes: Okay, so that means it kind of depends on what the cost is. We could…

205 00:20:50.660 00:20:58.159 Rhodes: Well, one… one thing that I remember seeing was that… Like, once that…

206 00:20:58.340 00:21:01.180 Rhodes: Kind of, like, surrogate key exists.

207 00:21:01.620 00:21:03.819 Rhodes: We need to change…

208 00:21:05.020 00:21:15.499 Rhodes: that, like, we need to change what’s being joined, like, downstream to that integer key, like I’ve seen before where, like.

209 00:21:15.720 00:21:21.389 Rhodes: Business groups were still using, like, the old key to join.

210 00:21:21.550 00:21:25.359 Rhodes: So making sure that that’s in place is one thing.

211 00:21:25.570 00:21:32.710 Rhodes: Another thing… is that… we could…

212 00:21:33.830 00:21:37.300 Rhodes: Well, I’ve also seen… I’ve also seen that sometimes

213 00:21:37.460 00:21:47.350 Rhodes: One of the biggest issues is bad matching, so… Maybe, like, as we… Well…

214 00:21:48.600 00:22:06.820 Rhodes: Maybe as, like, we rebuild, like, the mapping of the runs, we could make sure that, like, white spaces are trimmed, we could lowercase, we could standardize, like, nulls and blanks, or remove, like, any formatting inconsistencies, and then…
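
Note: the cleanup steps listed above (trimming whitespace, lowercasing, standardizing nulls and blanks) can be sketched as a small normalization function applied before building a key mapping. This is a minimal sketch; the name `normalize_key` and the set of tokens treated as null are assumptions for illustration, not something specified in the conversation.

```python
# Tokens treated as "no value" after cleaning. This list is an assumption;
# a real system would choose it from the data it actually sees.
NULL_TOKENS = {"", "null", "none", "n/a"}


def normalize_key(value):
    """Return a cleaned, comparable form of a string key, or None if blank."""
    if value is None:
        return None
    # split() with no arguments drops leading/trailing whitespace and
    # collapses internal runs of whitespace into single spaces.
    cleaned = " ".join(value.split()).lower()
    if cleaned in NULL_TOKENS:
        return None
    return cleaned


examples = ["  Alice   SMITH ", "alice smith", "   ", "N/A", None]
normalized = [normalize_key(v) for v in examples]
```

With this in place, `"  Alice   SMITH "` and `"alice smith"` map to the same key, so they would receive the same surrogate ID instead of two mismatched ones.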

215 00:22:07.730 00:22:10.609 Awaish Kumar: So, it is a flat table, right now.

216 00:22:10.840 00:22:15.609 Awaish Kumar: So, can’t we… Can’t we do anything about it? Like, can’t we…

217 00:22:16.650 00:22:20.289 Rhodes: You said it’s a flat file? Or what?

218 00:22:20.290 00:22:24.480 Awaish Kumar: Just one flat table with a lot of rows and a lot of columns.

219 00:22:25.020 00:22:25.770 Rhodes: Okay.

220 00:22:27.760 00:22:29.590 Rhodes: Okay, so…

221 00:22:34.180 00:22:35.120 Rhodes: Okay.

222 00:22:35.410 00:22:42.610 Rhodes: Yeah, I mean, I think, again, just, like, avoiding wide string columns unless needed, and then…

223 00:22:43.350 00:22:46.859 Rhodes: I guess, like, in a tool-specific way.

224 00:22:47.180 00:22:54.160 Rhodes: like, for example, in Databricks, you could, like, Z-order, like, cluster, Possibly.

225 00:22:55.440 00:22:56.429 Awaish Kumar: What about dimensional modeling?

226 00:22:58.830 00:23:00.130 Rhodes: Sorry, what was that?

227 00:23:00.770 00:23:02.540 Awaish Kumar: What about dimensional modeling?

228 00:23:04.580 00:23:06.300 Rhodes: dimensional movement.

229 00:23:06.590 00:23:07.750 Rhodes: Boom.

230 00:23:07.990 00:23:08.660 Awaish Kumar: modeling.

231 00:23:09.280 00:23:12.800 Awaish Kumar: Breaking this table into multiple fact and dim tables.

232 00:23:13.070 00:23:14.430 Rhodes: Oh, yeah.

233 00:23:14.750 00:23:22.060 Rhodes: Yeah, definitely. Yeah, right, because everything kind of multiplies on itself, that would make sense.

234 00:23:22.410 00:23:29.839 Rhodes: In that case… Yeah, I mean, I think that that would work,

235 00:23:32.770 00:23:35.490 Rhodes: So, if we broke it up…

236 00:23:39.730 00:23:42.520 Rhodes: Okay, so you broke it up into fact and dim.

237 00:23:43.170 00:23:50.279 Rhodes: Then that means that, right, joins work better, and then for small dimensions, for example, you could use, like, broadcast join.

238 00:23:50.410 00:23:54.110 Rhodes: That would definitely speed it up,

239 00:23:54.340 00:23:59.450 Rhodes: Maybe, like, null inclusion versus null exclusion in joins.

240 00:23:59.880 00:24:04.530 Rhodes: So… I guess, yeah.

241 00:24:04.980 00:24:06.820 Rhodes: That would be kind of where my…

242 00:24:07.180 00:24:09.669 Rhodes: Where my, head goes there.
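
Note: the fact and dim split discussed above can be sketched as follows, again with Python's built-in sqlite3 module. The repeated string attributes move into a dimension table with an integer surrogate key, and the fact table keeps only that key plus the event data. All names here (`flat_events`, `dim_person`, `fact_event`, `events_for_names`) and the sample rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    -- The original wide, all-string table (tiny hypothetical sample).
    CREATE TABLE flat_events (name TEXT, email TEXT, event_type TEXT);
    INSERT INTO flat_events VALUES
        ('Ann', 'ann@example.com', 'login'),
        ('Ann', 'ann@example.com', 'purchase'),
        ('Bob', 'bob@example.com', 'login');

    -- Dimension: one row per distinct person, with an integer surrogate key.
    CREATE TABLE dim_person (
        person_key INTEGER PRIMARY KEY,
        name TEXT NOT NULL,
        email TEXT NOT NULL,
        UNIQUE (name, email)
    );
    INSERT INTO dim_person (name, email)
        SELECT DISTINCT name, email FROM flat_events ORDER BY name, email;

    -- Fact: one row per event, referencing the person by integer key.
    CREATE TABLE fact_event (
        event_id INTEGER PRIMARY KEY,
        person_key INTEGER NOT NULL REFERENCES dim_person (person_key),
        event_type TEXT NOT NULL
    );
    INSERT INTO fact_event (person_key, event_type)
        SELECT d.person_key, f.event_type
        FROM flat_events f
        JOIN dim_person d ON d.name = f.name AND d.email = f.email;

    -- Indexes for the name lookup and for the integer join.
    CREATE INDEX idx_dim_person_name ON dim_person (name);
    CREATE INDEX idx_fact_event_person ON fact_event (person_key);
    """
)


def events_for_names(conn, names):
    """Return (name, event_type) rows for the given names.

    The string comparison happens only against the small dimension table;
    the large fact table is reached through the integer key.
    """
    placeholders = ",".join("?" for _ in names)
    sql = f"""
        SELECT d.name, f.event_type
        FROM dim_person d
        JOIN fact_event f ON f.person_key = d.person_key
        WHERE d.name IN ({placeholders})
        ORDER BY d.name, f.event_type
    """
    return conn.execute(sql, list(names)).fetchall()


result = events_for_names(conn, ["Ann"])
dim_count = conn.execute("SELECT COUNT(*) FROM dim_person").fetchone()[0]
```

The "names of 10 people" lookup from earlier becomes a filter on the dimension followed by an integer-key join. In Spark, the equivalent join against a small dimension is where a broadcast join would typically apply.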

243 00:24:11.880 00:24:12.280 Awaish Kumar: Okay.

244 00:24:13.090 00:24:18.379 Awaish Kumar: Just have the last few minutes left. Do you have any other questions to ask from me?

245 00:24:19.540 00:24:22.949 Rhodes: Yeah, yeah, so one thing I was wondering is, like.

246 00:24:23.610 00:24:37.929 Rhodes: I know in your site, you put a lot of… on the website, it puts a lot of emphasis on MVPs, and I was just kind of wondering how you guys generally balance, maybe, like, the fast pace of that with…

247 00:24:38.380 00:24:41.699 Rhodes: Like, engineering, like, standards, like, testing.

248 00:24:44.270 00:24:45.600 Awaish Kumar: Okay, so…

249 00:24:45.780 00:24:54.620 Awaish Kumar: Like, now that MVPs are being… like, now with the use of AI, things are obviously getting faster to deliver.

250 00:24:54.720 00:25:02.850 Awaish Kumar: Creating MVPs is… has become really fast. So… but we have defined the guardrails.

251 00:25:02.950 00:25:04.310 Awaish Kumar: So…

252 00:25:06.620 00:25:20.149 Awaish Kumar: like, for our workflows where we are using AI for development, we have defined playbooks, guidelines, and the guardrails for what we are doing. For example, if I am writing a data.

253 00:25:20.550 00:25:21.580 Rhodes: Wow.

254 00:25:22.420 00:25:29.039 Rhodes: like, if I’m writing a data pipeline, I will have a label.

255 00:25:29.420 00:25:38.740 Awaish Kumar: For example, I’m using Python to write the pipeline, I’m using Airflow, so I will have a… so we have defined our guides.

256 00:25:38.970 00:25:51.989 Awaish Kumar: like, when you are about to create a DAG for your flow, what are the best practices to follow, what not to do, and what to do? Everything is defined there. Similarly, for if you’re writing a Python script using

257 00:25:52.730 00:25:56.740 Awaish Kumar: pandas, then what are the best practices?

258 00:25:57.430 00:26:02.440 Awaish Kumar: I’m… And there’s, like, this test suite.

259 00:26:02.670 00:26:05.879 Awaish Kumar: What type of tests need to be implemented?

260 00:26:06.030 00:26:08.609 Awaish Kumar: And what… and what not to do.

261 00:26:08.780 00:26:13.890 Awaish Kumar: And with that, it speeds up the development for us.

262 00:26:14.500 00:26:15.510 Rhodes: Okay, cool.

263 00:26:15.620 00:26:19.759 Awaish Kumar: That, ensuring that, like, it meets our standards.

264 00:26:20.450 00:26:22.710 Rhodes: Awesome. Yeah,

265 00:26:22.820 00:26:38.069 Rhodes: Yeah, I also saw that on the job description, it mentioned, documentation, and that’s something that I’ve, like, done a lot of work in, whether it was, like, using AI-assisted, kind of, like, post-client engagement type of stuff, or,

266 00:26:38.320 00:26:49.019 Rhodes: yeah, like, like, using it for, like, DAG-type, like, mappings, that kind of stuff, so I’m just kind of curious, like, how you guys use documentation, what your philosophy on it is.

267 00:26:50.740 00:26:53.840 Awaish Kumar: Like, we have a big… we maintain a…

268 00:26:54.210 00:26:57.800 Awaish Kumar: A repository of documentation, which includes internal

269 00:26:58.250 00:27:04.550 Awaish Kumar: comms, communication with clients. We have our meeting recordings.

270 00:27:04.780 00:27:13.779 Awaish Kumar: And, and the work we do is also already documented with the GitHub.

271 00:27:15.830 00:27:16.750 Awaish Kumar: Okay.

272 00:27:16.890 00:27:21.700 Awaish Kumar: Obviously, the docstrings we have for each file.

273 00:27:22.150 00:27:29.040 Awaish Kumar: And, with that, like, the event helped us generate that documentation, right?

274 00:27:29.930 00:27:31.499 Rhodes: We’re gonna ask Jan.

275 00:27:32.080 00:27:36.540 Awaish Kumar: And based on that, we have our extended defined, we have…

276 00:27:36.910 00:27:44.450 Awaish Kumar: playbooks to run, so, like, you can anytime go into our workspace and ask AI, like.

277 00:27:44.950 00:27:46.979 Awaish Kumar: What are… what are the best practices?

278 00:27:47.170 00:27:54.210 Awaish Kumar: for… building ABC, or doing XYZ, and it will give you the…

279 00:27:54.760 00:27:56.870 Awaish Kumar: Or whatever the guidelines we have.

280 00:27:57.560 00:28:00.550 Rhodes: Cool. Do you guys do stuff like link Jira?

281 00:28:00.660 00:28:06.089 Rhodes: like, notebooks, or do you guys use, like, microcommits, or anything like that?

282 00:28:06.730 00:28:11.040 Awaish Kumar: We use Linear, and that is also linked to our…

283 00:28:11.200 00:28:18.369 Awaish Kumar: development environment. So basically, you are in that development environment, you can ask, I’m working on this issue.

284 00:28:18.990 00:28:26.450 Awaish Kumar: help me create the file, whatever, the pipeline, and then it can create PRs for you. It is connected to GitHub as well.

285 00:28:26.590 00:28:29.730 Awaish Kumar: It can automatically link Linear with GitHub and everything.

286 00:28:30.050 00:28:47.479 Rhodes: Oh, that’s really cool. Okay. Yeah, and okay, one more question. How… I was kind of curious how hands-on the engineering team is with clients, when building solutions, or, is the engineering team more just back-end?

287 00:28:48.640 00:28:50.529 Rhodes: Yeah, me too.

288 00:28:50.530 00:28:57.990 Awaish Kumar: It’s all back-end, engineering is on the back end, but it’s like, we do have some face time with clients.

289 00:28:58.170 00:29:03.580 Awaish Kumar: From time to time, we meet them, meet with them, like, maybe once a week, or… Hold on…

290 00:29:03.970 00:29:07.679 Awaish Kumar: On call basis, like, if somebody wants us to be in the meeting.

291 00:29:07.900 00:29:19.069 Awaish Kumar: But, like, yeah, initially, we all were, like, into meetings with the client, but now we are trying to… it’s a lot of work to handle the client, and then also…

292 00:29:21.460 00:29:23.900 Awaish Kumar: Maybe a layer of people which can just be…

293 00:29:24.230 00:29:26.630 Awaish Kumar: Handle the engagement with the client.

294 00:29:26.790 00:29:31.060 Awaish Kumar: Sharing… curating the documentation, presentations, and things like that.

295 00:29:31.230 00:29:38.829 Awaish Kumar: But then on the back end, we are the ones… engineers are basically delivering the work, so they know more about

296 00:29:39.040 00:29:41.079 Awaish Kumar: Christopher than anyone else, so…

297 00:29:41.470 00:29:41.850 Rhodes: Yep.

298 00:29:41.850 00:29:45.489 Awaish Kumar: Yeah, we are… the engineers are needed sometimes.

299 00:29:45.830 00:29:52.530 Awaish Kumar: To have comms with client. If things are stuck, then obviously you have to Huh.

300 00:29:52.730 00:29:56.379 Awaish Kumar: Talk to the client and let them know what we need.

301 00:29:56.570 00:30:01.469 Awaish Kumar: And they also have, like, engineers from their side, so you can actually talk to them.

302 00:30:01.710 00:30:07.730 Awaish Kumar: But mainly the… To handle the engagement itself is more of a…

303 00:30:07.910 00:30:09.690 Awaish Kumar: We have a separate person for that.

304 00:30:10.350 00:30:14.250 Rhodes: Okay, that makes sense. Yeah, it kind of matches up to what I was thinking, but cool.

305 00:30:14.580 00:30:15.859 Rhodes: Well…

306 00:30:15.860 00:30:16.180 Awaish Kumar: You know.

307 00:30:16.180 00:30:18.139 Rhodes: Interested, son. Oh, sorry, go ahead.

308 00:30:18.140 00:30:20.759 Awaish Kumar: Yeah, I have a hard stop here, so…

309 00:30:20.940 00:30:27.800 Awaish Kumar: Thank you for your time today, and I will submit my feedback today, and after that, our recruiter will reach out to you.

310 00:30:28.140 00:30:31.669 Rhodes: Awesome, good talking to you, and looking forward to next steps. Have a good one.