Meeting Title: Brainforge Interview w- Awaish Date: 2026-04-10 Meeting participants: Yeshwanth C Mineni, Awaish Kumar, Yeshwanth Chowdhary Mineni


WEBVTT

1 00:02:25.860 00:02:27.940 Awaish Kumar: Hello, Yeshwanth, how are you?

2 00:02:35.660 00:02:36.650 Awaish Kumar: Hello?

3 00:02:37.050 00:02:38.540 Yeshwanth C Mineni: Hi, Awaish, how are you?

4 00:02:39.440 00:02:40.600 Awaish Kumar: I’m good. How about you?

5 00:02:41.460 00:02:44.239 Yeshwanth C Mineni: I’m good as well. Just one moment,

6 00:02:44.620 00:02:49.060 Yeshwanth C Mineni: Can I rejoin the call? Unfortunately, my camera is not working.

7 00:02:49.330 00:02:50.929 Awaish Kumar: Yeah, sure, sure, no worries.

8 00:02:50.930 00:02:51.719 Yeshwanth C Mineni: One moment.

9 00:08:34.429 00:08:36.459 Yeshwanth Chowdhary Mineni: Hi, Awaish. I’m so sorry about it.

10 00:08:37.760 00:08:41.979 Awaish Kumar: Yeah, no worries. How are you doing?

11 00:08:42.280 00:08:43.690 Yeshwanth Chowdhary Mineni: I’m doing good, how are you?

12 00:08:44.670 00:08:48.250 Awaish Kumar: So, Yeshwanth, tell me about yourself.

13 00:08:49.210 00:09:07.499 Yeshwanth Chowdhary Mineni: Myself, I’m a data engineer. I’ve been working with, Upstart in my most recent role, where I was building pipelines on a daily basis, and I’ve been handling around, like, large amounts of data, like, they have millions of transactions happening on a daily basis.

14 00:09:07.680 00:09:11.139 Yeshwanth Chowdhary Mineni: And I’m the person, that makes sure that

15 00:09:11.240 00:09:19.090 Yeshwanth Chowdhary Mineni: I transfer the data with the due diligence, where I handle, like, at least 30 DB on a daily basis to the necessary teams.

16 00:09:19.190 00:09:31.240 Yeshwanth Chowdhary Mineni: Wherein our firm, it’s basically a credit lending firm. They operate on, several ML models, like, to make the decisions and,

17 00:09:31.240 00:09:39.999 Yeshwanth Chowdhary Mineni: I evaluate the customer portfolio, and then they give out the loans and several other portfolios that they have.

18 00:09:40.260 00:09:50.510 Yeshwanth Chowdhary Mineni: So, basically, this is all that I was working on in my previous role, and I also had experience working with several data vendors as well over there.

19 00:09:50.900 00:09:53.649 Yeshwanth Chowdhary Mineni: So, this is how my typical day looks like.

20 00:09:53.650 00:10:01.080 Awaish Kumar: What your overall career looks like, how much years of experience you have, and in what domains and industries?

21 00:10:01.420 00:10:06.749 Yeshwanth Chowdhary Mineni: So, I have around 8 years of experience as a data engineer itself, and

22 00:10:06.970 00:10:13.400 Yeshwanth Chowdhary Mineni: So, coming to the domain part, I was working with, financial services domain itself.

23 00:10:13.610 00:10:31.019 Yeshwanth Chowdhary Mineni: Since the beginning of it. I started with HCL as a data engineer, and then with… I worked with Cognizant, wherein, we have several clients over there, so it, particularly, they were, like, local India-based clients, not,

24 00:10:31.420 00:10:36.670 Yeshwanth Chowdhary Mineni: Not overseas, but in later projects, I got to work with overseas clients as well.

25 00:10:37.110 00:10:37.690 Yeshwanth Chowdhary Mineni: For depth.

26 00:10:37.690 00:10:39.090 Awaish Kumar: Right now?

27 00:10:40.200 00:10:40.920 Yeshwanth Chowdhary Mineni: Sorry?

28 00:10:41.370 00:10:43.100 Awaish Kumar: Where are you based in right now?

29 00:10:43.380 00:10:45.430 Yeshwanth Chowdhary Mineni: I’m based out of Jersey City right now.

30 00:10:45.800 00:10:49.500 Yeshwanth Chowdhary Mineni: New Jersey. I’m based in Jersey City, New Jersey, right now.

31 00:10:49.500 00:10:54.150 Awaish Kumar: Your experience of 10 years is in… is in U.S, or it’s more…

32 00:10:54.400 00:10:57.329 Awaish Kumar: of an experience in India, and then you moved.

33 00:10:57.810 00:11:13.089 Yeshwanth Chowdhary Mineni: So, I’ve done Masters over here. So, initially, I started working back in India after my bachelor’s, then I have come here for my master’s, that’s when I got this contract, and I started working with Upstart for 4 years.

34 00:11:14.140 00:11:16.390 Awaish Kumar: Upstart. Yes. Okay.

35 00:11:17.530 00:11:22.380 Awaish Kumar: Yeah, so what should… In what subject you did the master’s?

36 00:11:22.960 00:11:24.500 Yeshwanth Chowdhary Mineni: In data science itself.

37 00:11:25.620 00:11:26.710 Awaish Kumar: Data science.

38 00:11:26.710 00:11:27.070 Yeshwanth Chowdhary Mineni: Yes.

39 00:11:27.070 00:11:30.890 Awaish Kumar: And… so why you didn’t chose to become a data scientist?

40 00:11:32.110 00:11:40.519 Yeshwanth Chowdhary Mineni: I mean, that’s not my traditional plan in achieving it, but, like, I’m looking forward to it.

41 00:11:40.520 00:11:50.220 Awaish Kumar: spending, like, 2 years of his… his… his, like, time, his life, like, learning data science.

42 00:11:50.550 00:11:55.830 Awaish Kumar: Yes. Not choosing it as a career, like, I’m curious to know, like, what…

43 00:11:56.270 00:11:56.780 Yeshwanth Chowdhary Mineni: Fair enough.

44 00:11:56.780 00:12:02.460 Awaish Kumar: Or what made you to work in engineering, and not, you know, As a data scientist.

45 00:12:02.460 00:12:11.529 Yeshwanth Chowdhary Mineni: So, it’s typically hard to enter as a data scientist directly, because there are a lot of constraints in case of my visa and other parts.

46 00:12:11.610 00:12:25.339 Yeshwanth Chowdhary Mineni: But, generally, I’ve been applying for data scientist roles as well, and also I’ve been applying to AI engineer roles as well. Those are my typical target roles. But then again, to break into those, I need a certain set of

47 00:12:25.430 00:12:30.350 Yeshwanth Chowdhary Mineni: Experience with certain, okay. Second client speaker.

48 00:12:31.960 00:12:32.620 Awaish Kumar: Okay.

49 00:12:35.560 00:12:42.529 Awaish Kumar: So… So you have, like, 8 years of experience working as a data engineer, or is it…

50 00:12:42.860 00:12:45.480 Awaish Kumar: Overall, 8 years of experience, where…

51 00:12:45.640 00:12:51.719 Awaish Kumar: some experience in data engineering, some in software engineering, or… is it a mix of different things, or is it just data engineering?

52 00:12:51.720 00:13:02.370 Yeshwanth Chowdhary Mineni: I would say I started as a software engineer. I mean, like, even though my title says as a data engineer, I would say typically it involves, like, software engineering skills itself, initially.

53 00:13:02.520 00:13:12.470 Yeshwanth Chowdhary Mineni: Back in India, while I was working at HCL. But then again, like, while I was working with Cognizant, I… I… I was fully working as a data engineer.

54 00:13:12.710 00:13:15.380 Yeshwanth Chowdhary Mineni: And with Upstart as well as, yeah.

55 00:13:15.820 00:13:18.120 Awaish Kumar: Okay, since you mentioned that you have

56 00:13:18.550 00:13:25.620 Awaish Kumar: Okay, let’s first talk about, like, as a data engineer, what different tools and technologies you have worked with so far?

57 00:13:26.830 00:13:34.579 Yeshwanth Chowdhary Mineni: So typically, I’ve, I’ve, extensive experience in, tools such as, like.

58 00:13:34.900 00:13:41.499 Yeshwanth Chowdhary Mineni: I mean, I’m efficient in SQL, like, writing, like, advanced SQL,

59 00:13:41.610 00:13:53.519 Yeshwanth Chowdhary Mineni: queries and stuff, but it’s pretty generic to say that, but, like, I’m an expert in Databricks as well. I’m a certified Databricks architect and solution expert as well.

60 00:13:53.780 00:13:59.320 Yeshwanth Chowdhary Mineni: And also, for our data processing part, I did use Apache Spark as well.

61 00:13:59.430 00:14:05.709 Yeshwanth Chowdhary Mineni: And also, I’m good with, Spark structured streaming, and also,

62 00:14:06.010 00:14:10.580 Yeshwanth Chowdhary Mineni: I’m good with, Delta Lake features such as, ACID or, like.

63 00:14:10.730 00:14:15.140 Yeshwanth Chowdhary Mineni: Transactions, or, like, schema enforcement, or, like… .

64 00:14:15.140 00:14:15.690 Awaish Kumar: Okay.

65 00:14:15.820 00:14:16.730 Yeshwanth Chowdhary Mineni: Any such?

66 00:14:16.880 00:14:17.660 Yeshwanth Chowdhary Mineni: Yes.

67 00:14:18.410 00:14:27.349 Awaish Kumar: Okay, and like, okay, so you worked in a mostly in an Azure environment?

68 00:14:27.750 00:14:34.650 Awaish Kumar: Is it, like, everything on Databricks, all the orchestration, pipelines, Spark jobs?

69 00:14:34.810 00:14:35.410 Yeshwanth Chowdhary Mineni: No.

70 00:14:35.410 00:14:35.870 Awaish Kumar: Not exactly.

71 00:14:36.630 00:14:44.489 Yeshwanth Chowdhary Mineni: But, like, I have, have experience working significantly with Azure and, Databricks as the ETL tool.

72 00:14:44.990 00:14:48.909 Yeshwanth Chowdhary Mineni: And data warehousing solution platform, generally.

73 00:14:49.080 00:14:55.539 Yeshwanth Chowdhary Mineni: like, our data is on-prem, and that’s where we landed on ADLS.

74 00:14:56.100 00:14:57.340 Yeshwanth Chowdhary Mineni: And,

75 00:14:58.740 00:15:09.290 Yeshwanth Chowdhary Mineni: We have… I mean, I used Databricks exten… I would say I used Databricks extensively for building, like, PySpark pipelines, like.

76 00:15:09.750 00:15:12.750 Yeshwanth Chowdhary Mineni: Or, any orchestration purposes?

77 00:15:12.890 00:15:26.850 Yeshwanth Chowdhary Mineni: But, like, it pretty much depends on the approach that we take, like… like, depends on the team’s requirement. Like, we have several verticals in our team. For example, we have, ML teams.

78 00:15:27.120 00:15:32.819 Yeshwanth Chowdhary Mineni: We also have, credit scoring teams, and, we also have…

79 00:15:33.350 00:15:41.480 Awaish Kumar: My question is more like, like, what your experience is about, right? So, for example.

80 00:15:41.750 00:15:45.430 Awaish Kumar: If, like, somebody… someone uses multiple set of tools.

81 00:15:45.690 00:15:46.160 Yeshwanth Chowdhary Mineni: Yes.

82 00:15:46.160 00:15:54.769 Awaish Kumar: Like, for example, in my case, I would say I’m writing Python scripts, I’m using open source tools like Airflow, I’m using,

83 00:15:56.530 00:16:01.960 Awaish Kumar: warehouses like Snowflake, BigQuery, Redshift, and then also using

84 00:16:02.200 00:16:05.520 Awaish Kumar: some open source tools for the BI tools, and then a mix of

85 00:16:05.830 00:16:08.879 Awaish Kumar: Tools that are being used to actually run a pipeline.

86 00:16:08.980 00:16:15.629 Awaish Kumar: So if somebody… now that Databricks is kind of a platform, like the data platform, which can… which provides you

87 00:16:15.860 00:16:18.529 Awaish Kumar: More than one thing, like, it, it, it…

88 00:16:18.840 00:16:34.159 Awaish Kumar: you can have a warehouse there, you can have a… you can run your Spark jobs in the same cluster. This is kind of a platform, not a single tool. So my question was more, like, if you have experience more working in Databricks itself.

89 00:16:34.290 00:16:38.929 Awaish Kumar: Or it’s, like, also, apart from Databricks, do you have…

90 00:16:39.150 00:16:40.919 Awaish Kumar: Experience with other tools as well.

91 00:16:42.000 00:16:45.819 Yeshwanth Chowdhary Mineni: I do have experience with other tools as well, as I said, like,

92 00:16:46.120 00:16:50.370 Yeshwanth Chowdhary Mineni: Primarily, I would be using Azure and Databricks.

93 00:16:50.500 00:16:52.810 Yeshwanth Chowdhary Mineni: But then again, we also have… yes.

94 00:16:53.230 00:16:56.570 Awaish Kumar: the, like, a cloud platform, right? Yes.

95 00:16:56.830 00:16:58.639 Awaish Kumar: And it also gives you the…

96 00:16:59.920 00:17:04.380 Awaish Kumar: use Waze… services to use the Databricks as well.

97 00:17:04.750 00:17:05.310 Yeshwanth Chowdhary Mineni: Yes.

98 00:17:06.540 00:17:13.699 Awaish Kumar: So, like, not an Azure as a cloud provider, I mean tools and services that you use inside of it.

99 00:17:14.260 00:17:16.400 Awaish Kumar: like, Azure services, exactly that.

100 00:17:16.680 00:17:29.219 Yeshwanth Chowdhary Mineni: Yes, so, coming to that part, we have ADLS for storage, right? So, we have Gen 2 ADLS blob storage. So, we… where we manage, like, certain regions and stuff.

101 00:17:29.420 00:17:38.110 Yeshwanth Chowdhary Mineni: And I did use, Azure Cosmos DB for, non-relational database purposes.

102 00:17:38.260 00:17:44.560 Yeshwanth Chowdhary Mineni: And also, I, I used other services, such as.

103 00:17:46.050 00:17:46.630 Awaish Kumar: Okay.

104 00:17:47.040 00:17:49.670 Awaish Kumar: Yes. But, yeah, in any way, it’s more…

105 00:17:49.960 00:17:55.470 Awaish Kumar: All the services that you have used, kind of, from Azure, like, as a…

106 00:17:56.120 00:17:56.670 Yeshwanth Chowdhary Mineni: Yes.

107 00:17:56.670 00:18:06.280 Awaish Kumar: Yes. I was just trying to understand how… Of your experiences, but… Okay, apart from that,

108 00:18:06.520 00:18:11.139 Awaish Kumar: For the recent, experience, you mentioned that

109 00:18:11.330 00:18:15.079 Awaish Kumar: You are processing millions of transactions? Yes.

110 00:18:15.730 00:18:22.260 Awaish Kumar: So, how that pipeline is working? Like, is it a real-time pipeline, is it a batch pipeline, or…

111 00:18:22.920 00:18:25.219 Awaish Kumar: And what is the purpose of that pipeline?

112 00:18:25.610 00:18:28.110 Awaish Kumar: Yeah, if you can elaborate more on…

113 00:18:28.650 00:18:36.590 Yeshwanth Chowdhary Mineni: So, I mean, I built, real-time as well as a batch pipeline also, but, like,

114 00:18:37.460 00:18:43.410 Yeshwanth Chowdhary Mineni: I mean, we have, we have data coming through several vendors that we have.

115 00:18:43.710 00:18:47.780 Yeshwanth Chowdhary Mineni: So, for real-time streaming, I did,

116 00:18:48.290 00:18:54.259 Yeshwanth Chowdhary Mineni: Instant insights based on the subscriber activity, and also…

117 00:18:54.460 00:19:02.100 Yeshwanth Chowdhary Mineni: I did, batch processing for, like, when we have, complex, historical analysis or something like that.

118 00:19:02.370 00:19:03.330 Yeshwanth Chowdhary Mineni: So basically…

119 00:19:03.450 00:19:06.589 Awaish Kumar: So here, end-to-end project.

120 00:19:07.320 00:19:15.079 Awaish Kumar: either it’s a batch or a stream. Let’s take a streaming project example, or a single pipeline example of a single pipeline.

121 00:19:15.470 00:19:24.460 Awaish Kumar: walk me through the end-to-end process. What is your source system? How it subscribes? How the data streams into

122 00:19:24.820 00:19:29.889 Awaish Kumar: Services, and how you process it, and finally, how it gets loaded to the warehouse.

123 00:19:30.880 00:19:34.940 Yeshwanth Chowdhary Mineni: Yes, so, for, like…

124 00:19:35.210 00:19:40.049 Yeshwanth Chowdhary Mineni: if I have to explain, like, the end-to-end pipeline that I manage, I would say that,

125 00:19:40.720 00:19:43.440 Yeshwanth Chowdhary Mineni: The source was the,

126 00:19:43.620 00:19:54.120 Yeshwanth Chowdhary Mineni: Data that we have on-prem, for example, where we have millions of transactions happening where the customer portfolios are being generated.

127 00:19:54.460 00:20:05.789 Yeshwanth Chowdhary Mineni: And, these events include, like, several data, such as their, their location, or their credit scores, and several other, criteria of data.

128 00:20:06.190 00:20:13.120 Yeshwanth Chowdhary Mineni: And when these events are, ingested, right? Like, we usually… I used to use,

129 00:20:13.230 00:20:24.480 Yeshwanth Chowdhary Mineni: Apache Kafka Topics, which acted as the ingestion layer, where, where it will also ensure high… high event stream.

130 00:20:24.840 00:20:30.120 Yeshwanth Chowdhary Mineni: And then, once the data is, yes.

131 00:20:30.120 00:20:33.689 Awaish Kumar: How… how the producer works for the Kafka.

132 00:20:35.510 00:20:38.609 Yeshwanth Chowdhary Mineni: So Coming to the…

133 00:20:38.730 00:20:48.629 Yeshwanth Chowdhary Mineni: producer part, it was responsible for sending, like, data from source system into topics, just for the downstream consumption.

134 00:20:48.630 00:20:50.250 Awaish Kumar: But I want to understand how…

135 00:20:51.100 00:20:53.870 Awaish Kumar: How that was working, like, how it was implemented.

136 00:20:54.330 00:20:59.080 Yeshwanth Chowdhary Mineni: So, it usually serializes the event data, like,

137 00:20:59.620 00:21:06.919 Yeshwanth Chowdhary Mineni: It is implemented on a microservice within the source system, like, to capture the real-time events.

138 00:21:07.290 00:21:13.730 Yeshwanth Chowdhary Mineni: Like, yeah. So, it was built, using a Kafka Client Library.

139 00:21:14.060 00:21:17.729 Yeshwanth Chowdhary Mineni: like, in… I mean, I used to use Python for that.

140 00:21:18.100 00:21:29.319 Yeshwanth Chowdhary Mineni: So, usually we can… it depends on the environment. The… then the producer serializes the event data into consistent format, like JSON or Avro.

141 00:21:29.660 00:21:33.340 Yeshwanth Chowdhary Mineni: Just to maintain the schema uniformity across the pipeline.

142 00:21:33.650 00:21:41.100 Yeshwanth Chowdhary Mineni: Then, it was, configured using parameters such as, acks or, retries.

143 00:21:41.280 00:21:44.870 Yeshwanth Chowdhary Mineni: Or, batch, based on the batch size.

144 00:21:45.500 00:21:48.329 Yeshwanth Chowdhary Mineni: And it also included, like,

145 00:21:48.500 00:21:52.399 Yeshwanth Chowdhary Mineni: Logging as well. Like, where we have,

146 00:21:52.520 00:22:02.880 Yeshwanth Chowdhary Mineni: Where it… where we can capture, like, if there are any failures or metrics while… while there is any latency or, issues while we’re monitoring.

147 00:22:03.240 00:22:11.939 Yeshwanth Chowdhary Mineni: And then it is deployed as a containerized service with several environment variables and the secrets that we have.
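
A minimal sketch of the producer setup described above, assuming the kafka-python client (the interview only says "a Kafka client library" used from Python). The broker address, the parameter values, and the event fields are hypothetical; only the parameter names (acks, retries, batch size) and the JSON serialization come from the discussion.

```python
import json

# Hypothetical settings: the interview names acks, retries, and batch size
# but gives no values, and the broker address is an assumption.
PRODUCER_CONFIG = {
    "bootstrap_servers": "localhost:9092",
    "acks": "all",        # wait for all in-sync replicas to acknowledge
    "retries": 5,         # retry transient send failures
    "batch_size": 16384,  # bytes buffered per partition before a send
}


def serialize_event(event: dict) -> bytes:
    """Serialize an event to UTF-8 JSON with sorted keys for a stable layout."""
    return json.dumps(event, sort_keys=True).encode("utf-8")


def build_producer():
    """Create a producer; needs the kafka-python package and a reachable broker."""
    # Imported lazily so the rest of the sketch loads without the package.
    from kafka import KafkaProducer

    return KafkaProducer(value_serializer=serialize_event, **PRODUCER_CONFIG)


# Hypothetical event shape, loosely based on the fields mentioned
# (location, credit score).
sample_event = {"customer_id": 42, "location": "NJ", "credit_score": 710}
payload = serialize_event(sample_event)
```

With a broker available, `build_producer().send("some-topic", sample_event)` would publish the serialized event; the topic name here is also a placeholder.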

148 00:22:12.040 00:22:18.700 Awaish Kumar: Kafka was self-managed, self-hosted, or it was a managed version of Kafka?

149 00:22:18.860 00:22:20.119 Awaish Kumar: Not true, what you’re saying.

150 00:22:21.370 00:22:24.369 Yeshwanth Chowdhary Mineni: So, it was typically self-managed.

151 00:22:24.880 00:22:27.869 Yeshwanth Chowdhary Mineni: And, it’s on the cloud.

152 00:22:27.980 00:22:29.130 Yeshwanth Chowdhary Mineni: on Azure.

153 00:22:30.500 00:22:31.550 Yeshwanth Chowdhary Mineni: Because.

154 00:22:32.140 00:22:34.570 Awaish Kumar: That is… that is, like, managed service.

155 00:22:34.780 00:22:37.690 Awaish Kumar: Self-hosted means that you are hosting it.

156 00:22:38.990 00:22:39.770 Awaish Kumar: Right.

157 00:22:40.080 00:22:43.300 Awaish Kumar: if you’re using Azure, if it is hosted on Azure.

158 00:22:43.500 00:22:47.830 Awaish Kumar: Cloud, that means it’s kind of a managed service that you are using, right?

159 00:22:49.390 00:22:51.359 Awaish Kumar: is hosted by Azure.

160 00:22:53.180 00:23:01.230 Yeshwanth Chowdhary Mineni: Yes, but then again, like, we used to use, VM for that, so I don’t know how to correlate that.

161 00:23:02.340 00:23:12.670 Awaish Kumar: No, I mean, that’s… that’s my point, like, was it a managed service? Was it… was… were you only using Azure compute instance and then hosting it yourself on top of it?

162 00:23:13.570 00:23:16.569 Awaish Kumar: How was it? I don’t know, like, I’m just asking.

163 00:23:17.130 00:23:25.819 Yeshwanth Chowdhary Mineni: So… like, I guess it was managed service, my bad, sorry about it.

164 00:23:26.620 00:23:30.099 Yeshwanth Chowdhary Mineni: We leverage the managed service itself.

165 00:23:30.550 00:23:32.140 Yeshwanth Chowdhary Mineni: And,

166 00:23:32.930 00:23:38.100 Yeshwanth Chowdhary Mineni: But, like, I was just confused, because we used to host it using a VM that we have.

167 00:23:38.250 00:23:43.410 Yeshwanth Chowdhary Mineni: through Azure, so… Yeah, I don’t have,

168 00:23:44.140 00:23:48.440 Yeshwanth Chowdhary Mineni: in-depth idea about it, but, like, I used to use that.

169 00:23:48.440 00:23:50.550 Awaish Kumar: My question is, were you part of that?

170 00:23:51.090 00:23:59.339 Awaish Kumar: like, you don’t need to, ex… like, explain, or things that you were not part of. Like, you can tell me, okay, that was not…

171 00:23:59.480 00:24:04.829 Awaish Kumar: not the… not… I’m… I was not the one that… was responsible for Kafka, right?

172 00:24:04.830 00:24:13.490 Yeshwanth Chowdhary Mineni: No, I was not responsible for Kafka, I used to handle, like, the ingestion part and, like, to land it into Databricks, that’s it.

173 00:24:13.490 00:24:24.309 Awaish Kumar: Yeah, but my question is, again, like, in Kafka, there are two parts. One, there’s the producer, and then there’s a consumer. Were you only on the consumer side, or were you also on the producer side?

174 00:24:25.730 00:24:29.029 Yeshwanth Chowdhary Mineni: I was on the producer side, I would say.

175 00:24:29.690 00:24:32.190 Yeshwanth Chowdhary Mineni: I was on the… Yes.

176 00:24:32.560 00:24:40.589 Awaish Kumar: Yeah, producer means the app, that is the events that are happening in an app, which is sending to Kafka.

177 00:24:40.700 00:24:42.179 Awaish Kumar: That is a producer.

178 00:24:42.580 00:24:48.240 Yeshwanth Chowdhary Mineni: No, no, I was on the consumer end, like, I was on the user end of Kafka. I’m so sorry, yeah, my bad.

179 00:24:49.010 00:24:51.409 Awaish Kumar: That is the consumer side. That’s why I asked.

180 00:24:51.410 00:24:54.559 Yeshwanth Chowdhary Mineni: Yeah, I was on the consumer end, yes.

181 00:24:56.340 00:25:01.739 Awaish Kumar: So… Okay, if you were on the consumer end, then,

182 00:25:03.150 00:25:08.919 Awaish Kumar: Do you know any, like, different patterns of, like, consuming patterns?

183 00:25:09.610 00:25:13.340 Awaish Kumar: from Kafka Topics, like, how… different,

184 00:25:15.870 00:25:22.199 Awaish Kumar: There are multiple consumers of the same topic, and… Got a different bike.

185 00:25:22.490 00:25:30.220 Yeshwanth Chowdhary Mineni: Yeah, I mean… Something that comes up to my mind is that, like, we… we have,

186 00:25:31.770 00:25:34.079 Yeshwanth Chowdhary Mineni: I mean, usually,

187 00:25:35.010 00:25:52.010 Yeshwanth Chowdhary Mineni: each, each partition that we have in Kafka is, consumed, like, to a particular group, or one particular user, so this is something that comes up to my mind, like, the consumer group pattern, or, like, the broadcast pattern, or,

188 00:25:52.160 00:25:57.470 Yeshwanth Chowdhary Mineni: Com- or, we have, when we have multiple consumers,

189 00:25:57.800 00:26:01.800 Yeshwanth Chowdhary Mineni: We have, competing, consumers pattern.

190 00:26:02.530 00:26:11.260 Yeshwanth Chowdhary Mineni: And, this is most common… I mean, like, this is most common and, like, efficient way for, like, when we are trying to build pipelines.

191 00:26:11.260 00:26:19.419 Awaish Kumar: what kind of… writing… so mainly your job was writing Spark jobs? Yes. That, from some Kafka topic.

192 00:26:19.780 00:26:20.120 Yeshwanth Chowdhary Mineni: Yes.

193 00:26:20.120 00:26:22.990 Awaish Kumar: Right? Can you do some transformation?

194 00:26:23.160 00:26:23.550 Yeshwanth Chowdhary Mineni: Yes.

195 00:26:23.550 00:26:27.840 Awaish Kumar: And store it, into some warehouse.

196 00:26:28.520 00:26:29.639 Yeshwanth Chowdhary Mineni: Yes, true.

197 00:26:30.320 00:26:33.660 Awaish Kumar: Right? So, were you also part of,

198 00:26:33.930 00:26:37.220 Awaish Kumar: Setting up clusters and things like that.

199 00:26:37.710 00:26:42.810 Awaish Kumar: Like, the Spark clusters, managed Spark clusters.

200 00:26:43.030 00:26:53.120 Yeshwanth Chowdhary Mineni: Yes, I did configure clusters, like, based on, like, high concurrency, or, the standard, or… like…

201 00:26:53.430 00:26:59.149 Yeshwanth Chowdhary Mineni: I used to select the particular instance types and sizes based on the load, actually.

202 00:26:59.750 00:27:02.920 Awaish Kumar: Okay, how that broadcast join works?

203 00:27:04.560 00:27:09.389 Yeshwanth Chowdhary Mineni: So… Like, in broadcast joins, like…

204 00:27:10.000 00:27:14.309 Yeshwanth Chowdhary Mineni: It is usually used to optimize the join operations.

205 00:27:14.930 00:27:16.440 Yeshwanth Chowdhary Mineni: like.

206 00:27:16.440 00:27:17.710 Awaish Kumar: Oh my goodness.

207 00:27:17.710 00:27:18.440 Yeshwanth Chowdhary Mineni: Sorry.

208 00:27:18.900 00:27:21.050 Awaish Kumar: How does it optimizes the join operation?

209 00:27:22.710 00:27:31.780 Yeshwanth Chowdhary Mineni: So… Like, by minimizing the data shuffling that we have across the cluster, which is often the…

210 00:27:32.230 00:27:36.170 Yeshwanth Chowdhary Mineni: most expensive part in a join, I would say.

211 00:27:36.760 00:27:38.959 Yeshwanth Chowdhary Mineni: Especially when it comes to, like,

212 00:27:38.960 00:27:41.060 Awaish Kumar: That’s the question.

213 00:27:42.050 00:27:54.960 Yeshwanth Chowdhary Mineni: So… Usually, like, it reduces the CPU overhead, or, like, it speeds up the join operation significantly.

214 00:27:55.290 00:27:59.580 Yeshwanth Chowdhary Mineni: Like, the network input or output is, drastically reduced.

215 00:28:00.020 00:28:00.620 Yeshwanth Chowdhary Mineni: album.

216 00:28:00.620 00:28:02.870 Awaish Kumar: I understand, like.

217 00:28:03.150 00:28:10.570 Awaish Kumar: the purpose is to optimize… optimize it, right? But how does it do it, right? What exactly…

218 00:28:12.200 00:28:13.910 Awaish Kumar: It does. What is the technique?

219 00:28:14.080 00:28:17.810 Awaish Kumar: That makes it to minimize the…

220 00:28:18.280 00:28:20.729 Awaish Kumar: Shuffling or minimize the network cost.

221 00:28:22.040 00:28:29.120 Yeshwanth Chowdhary Mineni: It will… I mean, it will align according to the matching keys, I would say. That’s the…

222 00:28:29.440 00:28:34.300 Yeshwanth Chowdhary Mineni: way that it works. It has in-memory hash map, like.

223 00:28:34.770 00:28:44.009 Yeshwanth Chowdhary Mineni: across the broadcasted, small clusters, that’s how it will, look for, like, the key lookups during the join.

224 00:28:44.240 00:28:47.519 Yeshwanth Chowdhary Mineni: That’s how it will work efficiently to make the…

225 00:28:48.690 00:28:49.969 Awaish Kumar: Okay, so…

226 00:28:54.410 00:28:59.499 Awaish Kumar: So, like, is there any way that you… how do you do it? Like, do you remember the…

227 00:28:59.840 00:29:03.740 Awaish Kumar: Syntax or something, like, how you apply broadcast join?

228 00:29:05.510 00:29:07.820 Yeshwanth Chowdhary Mineni: So, for example, like,

229 00:29:09.210 00:29:16.330 Yeshwanth Chowdhary Mineni: any, any, any name of the dataset, for example, I can say, like, DataDF,

230 00:29:16.810 00:29:23.750 Yeshwanth Chowdhary Mineni: Is equal to, Like, we write our broadcast.

231 00:29:24.020 00:29:31.779 Yeshwanth Chowdhary Mineni: the particular, Data frame that we want to join, and I do a…

232 00:29:32.010 00:29:40.850 Yeshwanth Chowdhary Mineni: casting of a join key that we have, then, then we choose the inner or, outer. So that’s…
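
For reference, in PySpark a broadcast join is usually written as `large_df.join(broadcast(small_df), on="cust_id", how="inner")`, with `broadcast` imported from `pyspark.sql.functions`. The plain-Python sketch below illustrates the mechanism being asked about: the small table is copied to every worker as an in-memory hash map, so each partition of the large table is joined where it already sits and the large side is not shuffled. The table contents and column names are hypothetical.

```python
def broadcast_hash_join(large_partitions, small_rows, key):
    """Inner join in which the small side is replicated as a hash map."""
    # Build the lookup once from the small side; in Spark this is the copy
    # that would be shipped to every executor.
    lookup = {}
    for row in small_rows:
        lookup.setdefault(row[key], []).append(row)

    joined = []
    # Each partition of the large side is probed locally, with no shuffle.
    for partition in large_partitions:
        for row in partition:
            for match in lookup.get(row[key], []):
                joined.append({**row, **match})
    return joined


transactions = [
    [{"cust_id": 1, "amount": 50}, {"cust_id": 2, "amount": 75}],
    [{"cust_id": 1, "amount": 20}, {"cust_id": 3, "amount": 10}],
]
customers = [{"cust_id": 1, "region": "NJ"}, {"cust_id": 2, "region": "NY"}]

result = broadcast_hash_join(transactions, customers, "cust_id")
```

The row for `cust_id` 3 has no match in the small table, so an inner join drops it.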

233 00:29:41.810 00:29:43.330 Awaish Kumar: What is a bucket join?

234 00:29:44.130 00:29:44.860 Yeshwanth Chowdhary Mineni: Sorry?

235 00:29:46.040 00:29:47.460 Awaish Kumar: What is the bucket join?

236 00:29:49.460 00:29:55.890 Yeshwanth Chowdhary Mineni: So… Bucket join is, like, when you…

237 00:29:56.150 00:30:05.739 Yeshwanth Chowdhary Mineni: Like, we… when we… when we have the tables which are, like, hashed or divided based on the join key, like, into the number of buckets.

238 00:30:06.180 00:30:08.330 Yeshwanth Chowdhary Mineni: That’s when we do the bucket join.

239 00:30:08.630 00:30:11.420 Awaish Kumar: What is the difference between broadcast versus bucket join?

240 00:30:12.540 00:30:21.249 Yeshwanth Chowdhary Mineni: So, the major difference is that broadcasts usually is entirely executed on, like,

241 00:30:21.570 00:30:32.700 Yeshwanth Chowdhary Mineni: small datasets, like, across all nodes. Like, each node joins the small datasets locally within its, partition of the large dataset.

242 00:30:32.960 00:30:35.330 Yeshwanth Chowdhary Mineni: Whereas, bucket…

243 00:30:35.900 00:30:43.610 Yeshwanth Chowdhary Mineni: Bucket join operates such as, like, we have, pre-bucketed, the dataset is entirely pre-bucketed, like, hashed and divided.

244 00:30:43.710 00:30:46.790 Yeshwanth Chowdhary Mineni: On the join key into a fixed number of buckets.

245 00:30:47.080 00:30:48.440 Yeshwanth Chowdhary Mineni: And,

246 00:30:48.560 00:30:56.230 Yeshwanth Chowdhary Mineni: Each bucket is aligned, so matching keys fall into, like, the same bucket number, or into both datasets.
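
A rough plain-Python illustration of the bucket join idea described above: both tables are divided ahead of time into the same number of buckets on the join key, so matching keys land in the same bucket number and bucket i only needs to be joined with bucket i. The modulo function stands in for Spark's hash function, and the data is hypothetical. In Spark the pre-bucketing step is typically done at write time with `df.write.bucketBy(n, "key").saveAsTable(...)`.

```python
NUM_BUCKETS = 4  # both sides must use the same bucket count


def bucketize(rows, key, num_buckets=NUM_BUCKETS):
    """Place each row in a bucket chosen from its integer join key."""
    buckets = [[] for _ in range(num_buckets)]
    for row in rows:
        buckets[row[key] % num_buckets].append(row)
    return buckets


def bucket_join(left_buckets, right_buckets, key):
    """Inner join that pairs bucket i on the left with bucket i on the right."""
    joined = []
    for left, right in zip(left_buckets, right_buckets):
        lookup = {}
        for row in right:
            lookup.setdefault(row[key], []).append(row)
        for row in left:
            for match in lookup.get(row[key], []):
                joined.append({**row, **match})
    return joined


orders = [
    {"cust_id": 1, "amount": 50},
    {"cust_id": 5, "amount": 75},
    {"cust_id": 2, "amount": 20},
]
customers = [
    {"cust_id": 1, "region": "NJ"},
    {"cust_id": 2, "region": "NY"},
    {"cust_id": 5, "region": "CA"},
]

result = bucket_join(
    bucketize(orders, "cust_id"), bucketize(customers, "cust_id"), "cust_id"
)
```

Unlike the broadcast approach, neither side has to fit in memory on every node; the cost is paid once when the tables are written in bucketed form.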

247 00:30:56.520 00:30:57.570 Awaish Kumar: And.

248 00:30:57.570 00:30:58.180 Yeshwanth Chowdhary Mineni: Yes.

249 00:30:58.180 00:31:00.249 Awaish Kumar: Sorry, sorry to cut you off, but…

250 00:31:00.470 00:31:10.080 Awaish Kumar: Yeah, it was nice meeting you. We are on time. We are just… it’s just one minute left, so… did you have any questions for me?

251 00:31:11.640 00:31:21.029 Yeshwanth Chowdhary Mineni: I just want to know, like, what is the stack that we use at Brainforge, and, like, how does a typical day look like for a data engineer?

252 00:31:21.800 00:31:26.809 Awaish Kumar: So, for a data engineer, it looks like… it looks like we have… we use Linear.

253 00:31:26.950 00:31:27.909 Awaish Kumar: As a word.

254 00:31:28.510 00:31:33.439 Awaish Kumar: Project management tool, you use that to see what different

255 00:31:35.590 00:31:39.460 Awaish Kumar: What different tickets are there, and what you have to work on on each client.

256 00:31:39.900 00:31:43.279 Awaish Kumar: And, you might be assigned for…

257 00:31:43.400 00:31:46.729 Awaish Kumar: More than 2 or 3… like, more than 1 or 2 clients.

258 00:31:47.530 00:31:47.850 Yeshwanth Chowdhary Mineni: Okay.

259 00:31:47.850 00:31:54.360 Awaish Kumar: time, and you have to divide your time between multiple clients, and then you have to look at Linear and Slack.

260 00:31:54.640 00:31:55.870 Awaish Kumar: I’m gonna…

261 00:31:57.580 00:32:09.440 Awaish Kumar: basically, yeah, work through it. We don’t have regular stand-ups, but you have to work through ways of, okay, if you think your ticket is blocked, you are responsible to escalate it.

262 00:32:09.640 00:32:16.189 Awaish Kumar: In Slack, ask peers, if you are not getting responses, like, huddle with your team members.

263 00:32:16.790 00:32:22.130 Awaish Kumar: Me, anyone in the team, or even outside of, like, the…

264 00:32:23.540 00:32:38.700 Awaish Kumar: like, the BI, people, analysts, wherever it is blocked, like, it’s our duty to actually go and try to unblock it. If it is not possible, write a comment as a status that, okay, it’s not possible to unblock it right now for so-and-so reason, and that’s why we are putting it in block.

265 00:32:38.920 00:32:44.720 Awaish Kumar: And then… Apart from that, we don’t have regular stand-ups, as I mentioned, so we don’t, like, every day…

266 00:32:45.090 00:32:53.169 Awaish Kumar: Go with that, but we have quite a few meetings where you have to read more and actually give updates, get the updates, get things unblocked.

267 00:32:53.370 00:32:57.210 Awaish Kumar: And, then actually work on… on the tickets.

268 00:32:58.530 00:33:02.509 Awaish Kumar: We use Cursor, we use AI in our development.

269 00:33:02.830 00:33:05.920 Awaish Kumar: To speed up our, yeah, development.

270 00:33:07.260 00:33:08.680 Awaish Kumar: And basically.

271 00:33:09.810 00:33:14.810 Awaish Kumar: Yeah, that’s it. So we basically have the task, which we have tickets, and then we work to…

272 00:33:14.940 00:33:17.939 Awaish Kumar: Work on the tickets, push the changes.

273 00:33:18.110 00:33:21.620 Awaish Kumar: Send the updates to the team’s team and the…

274 00:33:22.300 00:33:24.219 Awaish Kumar: And the clients, and that’s all.

275 00:33:25.340 00:33:33.040 Awaish Kumar: Used, yeah, different tools, like, and in terms of tech stack, we have, like, tools like Polytomic, Fivetran.

276 00:33:33.180 00:33:35.729 Awaish Kumar: That do for… that are used for data ingestion.

277 00:33:35.870 00:33:42.379 Awaish Kumar: Sometimes we have Dagster as our orchestration tool, so when we don’t find any

278 00:33:42.850 00:33:50.629 Awaish Kumar: connector in, for example, Polytomic or Fivetran, we have to write our own scripts that are hosted on Dagster.

279 00:33:54.080 00:34:02.960 Awaish Kumar: Dagster is a tool that basically is shared between the DE and AI teams, so the AI team has their own automation. They are also… they also run on…

280 00:34:03.180 00:34:05.499 Awaish Kumar: on Dagster as well.

281 00:34:06.020 00:34:14.870 Awaish Kumar: And apart from that, the other stack depends on the client, so sometimes clients use Snowflake.

282 00:34:15.389 00:34:22.080 Awaish Kumar: Sometimes clients use BigQuery as their warehouse, sometimes it is Redshift.

283 00:34:22.540 00:34:23.389 Yeshwanth Chowdhary Mineni: Yes.

284 00:34:24.639 00:34:28.979 Awaish Kumar: And then we use dbt on top of it, for, yeah…

285 00:34:29.290 00:34:30.370 Awaish Kumar: Processing.

286 00:34:32.370 00:34:38.420 Awaish Kumar: Some clients prefer dbt Cloud, but many of them are just on dbt Core, so we just…

287 00:34:38.639 00:34:46.250 Awaish Kumar: use Core, and our deployment for that happens on GitHub Actions, so we run our dbt jobs on…

288 00:34:46.400 00:34:47.730 Awaish Kumar: GitHub Actions.

289 00:34:48.080 00:34:50.650 Awaish Kumar: In staging and production as well.

290 00:34:50.960 00:34:53.619 Awaish Kumar: And… yeah, this is mainly it.

291 00:34:53.949 00:34:56.489 Awaish Kumar: We don’t have right now, like, we…

292 00:34:56.780 00:35:03.089 Awaish Kumar: Some of our clients have, like, streaming requirements that we do using Snowflake itself.

293 00:35:03.330 00:35:07.470 Awaish Kumar: So if data is coming into S3, lands in some storage,

294 00:35:07.890 00:35:11.459 Awaish Kumar: it’s piped into Snowflake, and then we use dbt

295 00:35:11.570 00:35:15.559 Awaish Kumar: to transform and show the insights that AWI do.

296 00:35:16.110 00:35:18.839 Awaish Kumar: That’s the normal… that’s the flow for now.

297 00:35:19.090 00:35:24.969 Awaish Kumar: We have… we don’t have any clients yet, right now, that uses Databricks or something.

298 00:35:25.220 00:35:28.170 Awaish Kumar: But, yeah, that is something we can offer.

299 00:35:28.960 00:35:31.490 Awaish Kumar: Sell as a service, but…

300 00:35:33.340 00:35:36.889 Yeshwanth Chowdhary Mineni: I mean, the architecture is pretty much the same when it comes to the…

301 00:35:37.220 00:35:43.780 Yeshwanth Chowdhary Mineni: Architecture-wise, both are the same, Databricks and Snowflake, but, like, yeah.

302 00:35:43.780 00:35:48.360 Awaish Kumar: I understand that. There’s a little bit of, like,

303 00:35:50.110 00:35:56.000 Awaish Kumar: Difference of… of taste, and the difference of, like, use case, sometimes.

304 00:35:56.920 00:36:02.950 Awaish Kumar: But normally, yeah, it’s like, everything can be done in any of the tools, so…

305 00:36:04.170 00:36:07.690 Awaish Kumar: It’s not a big problem, okay? And,

306 00:36:08.570 00:36:11.529 Awaish Kumar: Yeah, these are the tech stack right now.

307 00:36:12.120 00:36:17.539 Awaish Kumar: We use AI a lot in our company, not just as engineers, as a developer, but

308 00:36:17.810 00:36:21.990 Awaish Kumar: everyone in the company uses AI to improve our workflow.

309 00:36:22.280 00:36:27.500 Awaish Kumar: Even the sales and marketing teams, they use AI to speed up their… day-to-day work.

310 00:36:27.500 00:36:28.530 Yeshwanth Chowdhary Mineni: Makes sense.

311 00:36:29.400 00:36:29.980 Awaish Kumar: Yeah.

312 00:36:30.340 00:36:35.430 Awaish Kumar: So, we use Cursor a lot, we have, like, Arabic server… It’s our,

313 00:36:36.040 00:36:53.889 Awaish Kumar: it has become a workspace now, it’s not… just an IDE, but it’s more like… it’s connected to all sorts of tools. Google Docs, GitHub, Slack, everything. So you… you don’t have to go out of Cursor, basically, to do anything. You do development, get PRs, support on Slack.

314 00:36:54.320 00:36:55.840 Awaish Kumar: Everything from… Cursor.

315 00:36:56.700 00:36:57.650 Yeshwanth Chowdhary Mineni: Makes sense.

316 00:36:58.000 00:37:00.419 Yeshwanth Chowdhary Mineni: Thanks a lot for sharing all the detail.

317 00:37:00.830 00:37:02.240 Awaish Kumar: Yeah, no worries, thank you.

318 00:37:02.390 00:37:03.220 Awaish Kumar: Okay.

319 00:37:03.730 00:37:06.590 Yeshwanth Chowdhary Mineni: Yeah, thanks again, Awaish, for taking up the interview.

320 00:37:06.940 00:37:14.660 Awaish Kumar: Yeah, so after I submit my feedback, our conglar will come back to you with the update.

321 00:37:14.850 00:37:20.969 Awaish Kumar: And, I think you… you might have heard from Kayla about our recruiting process.

322 00:37:21.370 00:37:23.270 Yeshwanth Chowdhary Mineni: Yes, yes, I did, actually.

323 00:37:23.270 00:37:25.719 Awaish Kumar: So, yeah. Okay, thank you.

324 00:37:25.880 00:37:27.570 Yeshwanth Chowdhary Mineni: Yeah, thank you so much.

325 00:37:28.080 00:37:28.880 Awaish Kumar: You can play.

326 00:37:29.150 00:37:29.869 Yeshwanth Chowdhary Mineni: You too, bye.