Meeting Title: Data Engineer Interview (Shravya Yermal)
Date: 2025-07-25
Meeting participants: Shravya Yermal, Awaish Kumar


WEBVTT

1 00:01:16.060 00:01:17.360 Awaish Kumar: Hello. Yeah.

2 00:01:19.195 00:01:21.120 Shravya Yermal: Hello! Hi! Am I audible?

3 00:01:22.505 00:01:23.050 Awaish Kumar: Yes.

4 00:01:23.950 00:01:25.229 Shravya Yermal: Hi! Good morning!

5 00:01:25.840 00:01:27.189 Awaish Kumar: Good morning. How are you doing?

6 00:01:27.190 00:01:31.049 Shravya Yermal: I’m great. How about you? How has your day started?

7 00:01:31.050 00:01:31.690 Awaish Kumar: Right?

8 00:01:32.726 00:01:36.409 Awaish Kumar: Yeah, I’m good as well so like, where are you located?

9 00:01:36.460 00:01:38.979 Shravya Yermal: Right now. I’m located in Austin, Texas.

10 00:01:39.820 00:01:40.560 Awaish Kumar: Okay.

11 00:01:41.060 00:01:42.140 Shravya Yermal: How about you?

12 00:01:43.120 00:01:44.380 Awaish Kumar: Yeah, I’m in.

13 00:01:44.600 00:01:47.739 Awaish Kumar: I’m from Pakistan, currently in Azerbaijan.

14 00:01:47.850 00:01:54.973 Awaish Kumar: So yeah, I will share the agenda of this meeting first, and then we can

15 00:01:55.500 00:01:58.180 Awaish Kumar: start from there. So yeah.

16 00:01:59.040 00:02:11.144 Awaish Kumar: So after the introductions, we are going to discuss more about your past experiences, the kind of projects you have worked on, and

17 00:02:11.680 00:02:15.240 Awaish Kumar: then we’ll deep dive into those projects, too.

18 00:02:15.410 00:02:19.440 Awaish Kumar: We’ll see the technical background for each project.

19 00:02:19.770 00:02:20.075 Shravya Yermal: Okay.

20 00:02:20.642 00:02:26.069 Awaish Kumar: Yeah. So my name is Awaish Kumar, and I’m an engineering manager and

21 00:02:27.439 00:02:34.820 Awaish Kumar: acting as a technical lead as well. Brainforge is a consultancy, and we are

22 00:02:35.190 00:02:39.200 Awaish Kumar: providing data and AI consulting services to

23 00:02:39.900 00:02:43.679 Awaish Kumar: clients across different industries.

24 00:02:44.200 00:02:44.970 Awaish Kumar: And

25 00:02:45.910 00:02:56.188 Awaish Kumar: We are mainly a data services company, but now we are providing a lot of AI services as well, and we have got a lot of AI clients.

26 00:02:57.610 00:03:06.889 Awaish Kumar: So yeah, that’s a brief introduction of Brainforge and what we are doing. The kind of environment we work in is

27 00:03:08.531 00:03:17.080 Awaish Kumar: that we hire people from all over the world, so they can work in their own time zones.

28 00:03:17.230 00:03:23.129 Awaish Kumar: And they just have to manage their time. But yeah, we are flexible with that. And we also work

29 00:03:24.334 00:03:35.119 Awaish Kumar: in different work streams here, like full time or part time,

30 00:03:35.360 00:03:39.480 Awaish Kumar: whatever is suitable for each candidate and the company requirements.

31 00:03:40.130 00:03:47.929 Awaish Kumar: So yeah, now you can introduce yourself, and then we can move on.

32 00:03:48.190 00:04:12.959 Shravya Yermal: Yeah, sure. So let me tell you my background. My name is Shravya Yermal, and I am a data engineer with almost 4 years of experience in data engineering. I did my bachelor’s in India and worked for one of the biggest consulting companies, Infosys, for 2 years. Then

33 00:04:12.970 00:04:19.500 Shravya Yermal: I came over to the US, pursued my Master’s, and worked for Visa for a while,

34 00:04:19.500 00:04:45.720 Shravya Yermal: and right now I’m with Vcloud, on a contract basis. I work part time for them, so they pay me on an hourly basis. The reason for me to actively look out for opportunities is that 2 months ago I was sick, and

35 00:04:45.740 00:04:52.559 Shravya Yermal: I did not get paid time off; it was a pay loss. So

36 00:04:52.560 00:05:17.019 Shravya Yermal: then I realized I need a full-time job, so that I have the leverage of that benefit, and full time also has other benefits like health insurance. Health insurance is a must in the US; that’s what I realized. So that is the reason I’m proactively looking for a job and reaching out to people. I reached out to Uttam in that

37 00:05:17.250 00:05:41.409 Shravya Yermal: scenario, and then Uttam said he would schedule me with one of his managers, and that is how this interview came about. So yeah, that’s the background of how I landed here. As for technical skills, at Infosys and at Visa I have been doing data modeling and am familiar with data warehousing tools.

38 00:05:41.410 00:05:55.340 Shravya Yermal: Python and SQL are my go-to languages right now, but earlier, when I used to work in India, I worked with Java microservices a lot in the data engineering stack.

39 00:05:55.560 00:05:57.410 Shravya Yermal: Also.

40 00:05:57.550 00:06:24.849 Shravya Yermal: I have worked on end-to-end ETL projects and pipelines, and I have a good understanding of how data flows across multiple projects. I also have knowledge of the banking domain, because I have worked in the banking domain as well as the payments network at Visa. So overall, all 4 years are in fintech,

41 00:06:25.360 00:06:30.120 Shravya Yermal: if you see it as a whole. So that’s my background.

42 00:06:30.120 00:06:34.550 Awaish Kumar: Okay. So you worked with Visa full time?

43 00:06:34.550 00:06:39.320 Shravya Yermal: Yeah, full time for around 15-16 months.

44 00:06:39.490 00:06:40.230 Shravya Yermal: Yeah.

45 00:06:40.850 00:06:43.889 Awaish Kumar: Okay. And what was the reason to leave?

46 00:06:43.890 00:07:00.990 Shravya Yermal: Yeah. So there were elections around the end of 2024, before 2025 started, and they had implemented policies in the middle of 2024 that meant

47 00:07:00.990 00:07:17.369 Shravya Yermal: they had to do restructuring. A reorg was happening inside, and they did not want to sponsor visas, like the H-1B visa. I am a candidate who needs H-1B visa sponsorship, so they had to let me go for that reason,

48 00:07:18.250 00:07:19.720 Shravya Yermal: during the restructuring.

49 00:07:20.270 00:07:22.250 Awaish Kumar: Okay, coming back, like,

50 00:07:22.740 00:07:25.869 Awaish Kumar: Can you share a single project

51 00:07:27.440 00:07:32.910 Awaish Kumar: if you have one, which you are most proud of, a data pipeline project?

52 00:07:33.840 00:07:45.459 Shravya Yermal: Yeah, sure. I will tell you from scratch: when I started data engineering, I did not know Kafka. So I was

53 00:07:45.490 00:08:06.679 Shravya Yermal: not aware of Kafka, a big data technology which I learned from scratch while working on the project. The client was a banking client who wanted to do a cloud migration from on-prem to the cloud. They wanted to move

54 00:08:06.710 00:08:26.460 Shravya Yermal: from the old Oracle DB to the new cloud. This was around the COVID time, and they wanted to go live during COVID. At that time we used AWS as the cloud platform. And how did we move?

55 00:08:27.000 00:08:32.319 Shravya Yermal: The reason for the cloud migration was that they wanted to

56 00:08:33.498 00:08:47.500 Shravya Yermal: save cost, because whenever a user came to their platform, suppose just to check a bank statement, they used to

57 00:08:47.640 00:09:10.419 Shravya Yermal: query their old Oracle DB again and again, and each transaction used to cost them, suppose, 20 bucks. So going and querying every time was a very costly thing for them. What they wanted was to save cost, and implementing this cloud migration project

58 00:09:10.420 00:09:31.460 Shravya Yermal: would save their cost. Eventually we built the database in such a way that the data was available at very low latency, even when they hit the database again and again. It’s not like...

59 00:09:31.460 00:09:35.340 Awaish Kumar: My question is, what was your role in that project?

60 00:09:35.580 00:09:58.670 Shravya Yermal: My role was making the data ingestion available. I learned and used the Kafka technology to prepare back-end APIs, where data was fetched from the Oracle DB and then loaded into AWS Redshift. So I...

61 00:09:58.670 00:10:01.660 Awaish Kumar: Moved from Oracle to Redshift,

62 00:10:02.223 00:10:05.775 Awaish Kumar: so this is kind of a data migration task?

63 00:10:06.627 00:10:18.452 Shravya Yermal: Yeah, that was the data ingestion part of the whole data lifecycle, and I was mainly involved in that ingestion lifecycle, where

64 00:10:18.990 00:10:44.769 Shravya Yermal: the APIs, like the Kafka streaming APIs, were the part which I did. The APIs were made using Java Spring Boot, primarily in a microservice architecture. So eventually we made individual, independent microservices for each of the modules and functionalities, which pulled out the data.

65 00:10:45.690 00:10:49.590 Awaish Kumar: In the ingestion part of it. So Kafka is a tool.

66 00:10:49.750 00:10:55.040 Awaish Kumar: We just have to utilize its SDKs to push data

67 00:10:55.260 00:11:01.560 Awaish Kumar: to Kafka in the cloud or Kafka on-prem, whatever we are using. So,

68 00:11:02.040 00:11:07.199 Awaish Kumar: why were you building APIs?

69 00:11:08.300 00:11:26.389 Shravya Yermal: So I was building APIs to pull the data, so that those APIs would get the live streaming. The whole project was migrating data. Let me tell you...

70 00:11:26.390 00:11:40.297 Awaish Kumar: Let me just clarify. Are you talking about the migration of data which is already in some database and we want to migrate to some other one, or are we talking about

71 00:11:41.430 00:11:49.559 Awaish Kumar: not the historical data migration, but the new data which is coming in, so it doesn’t land in the Oracle DB but goes directly to the new

72 00:11:50.300 00:11:53.320 Awaish Kumar: database? So what...

73 00:11:53.871 00:12:00.860 Awaish Kumar: Of these two, which part did you work on, and how did you implement it?

74 00:12:01.170 00:12:26.469 Shravya Yermal: Sure. There were both parts; cloud migration was a very big project, and we had to transfer the historical as well as the transactional data. I was mainly involved in the transactional data. Historical data too, but only briefly; my teammate was doing that. In Kafka, in the real-time transactional part,

75 00:12:26.470 00:12:43.849 Shravya Yermal: I was involved in setting up the Confluent platform and pulling out the data, setting up the APIs, making the APIs work, basically writing the...

76 00:12:43.850 00:12:45.399 Awaish Kumar: Where’s the data coming from?

77 00:12:45.580 00:12:47.369 Awaish Kumar: What is the data about?

78 00:12:47.760 00:12:55.550 Awaish Kumar: You can structure your answer thinking about: what were my data sources, where is the data coming from?

79 00:12:55.940 00:12:58.499 Awaish Kumar: What did you implement? What kind of

80 00:12:59.242 00:13:16.360 Awaish Kumar: project did you make? Did you write Python scripts, or SQL, whatever you did? And then what SDKs did you use to put your data on Confluent Cloud, and from there where did it land, how was it processed, things like that.

81 00:13:17.183 00:13:18.330 Shravya Yermal: Yeah, sure.

82 00:13:18.790 00:13:22.180 Shravya Yermal: So what happened was that

83 00:13:23.120 00:13:39.269 Shravya Yermal: my APIs used to pull the data not directly from the Oracle database; it used to first land in an intermediate layer.

84 00:13:39.910 00:13:57.099 Shravya Yermal: For me, while doing the data warehousing, the APIs were basically for transformations, for changing the unstructured data, whatever raw data...

85 00:13:57.100 00:13:59.430 Awaish Kumar: Do you know why we write APIs?

86 00:14:01.380 00:14:04.840 Shravya Yermal: Yeah, to pull the data from, like...

87 00:14:06.440 00:14:10.879 Shravya Yermal: yeah, to pull the data.

88 00:14:10.880 00:14:19.980 Awaish Kumar: An API is for when we want to provide services to someone; an API is a way to serve someone, right?

89 00:14:19.980 00:14:20.670 Shravya Yermal: Yes. Yeah.

90 00:14:20.670 00:14:25.019 Awaish Kumar: To serve my platform, some website, whatever it is.

91 00:14:25.020 00:14:46.358 Shravya Yermal: There were front-end APIs and there were back-end APIs. The front-end APIs were used by the other functional teams for other reasons. They were making some REST API calls, and to fulfill those REST API calls I used to create APIs. The APIs used to pull data from the...

92 00:14:46.720 00:14:51.620 Awaish Kumar: You know, RESTful APIs are the back end of

93 00:14:52.080 00:15:11.450 Awaish Kumar: a front-end platform. A front-end platform just makes calls; for example, if I’m using React or Vue.js or whatever, React is just going to render the data, and it is going to call the back end to get the data. And that back end is implemented using RESTful APIs.

94 00:15:13.064 00:15:16.449 Awaish Kumar: That is the back end already. So

95 00:15:16.620 00:15:32.439 Awaish Kumar: you are in the back end. You have RESTful APIs which are being used by the front end. But why are you building APIs to serve your RESTful APIs? You can just...

96 00:15:32.650 00:15:33.080 Shravya Yermal: Oh!

97 00:15:33.080 00:15:49.990 Awaish Kumar: write some Python scripts to transform your data, so the APIs can directly read it. I am trying to understand your use case. I don’t really get why APIs are being used to serve some other APIs; the front end can directly call your APIs as well.

98 00:15:50.470 00:16:18.660 Shravya Yermal: Yeah. So that’s how the microservice architecture was built: every service, the back-end services and the front-end services, had their own APIs. Each microservice was kind of an API which interacted with the others whenever required, and that is how it was built. And each microservice was...

99 00:16:18.660 00:16:20.059 Shravya Yermal: okay. So you are saying.

100 00:16:21.430 00:16:30.700 Awaish Kumar: Okay, so you’re saying there’s one front end which calls a master API, which basically redirects to the actual microservices. Right?

101 00:16:31.130 00:16:33.760 Shravya Yermal: Yeah. Yeah. Yes.

102 00:16:33.760 00:16:40.480 Awaish Kumar: One master for the front end, and this master is going to decide what service the user needs;

103 00:16:40.850 00:16:43.289 Awaish Kumar: if he needs something, it will

104 00:16:43.390 00:16:50.500 Awaish Kumar: call microservice A, microservice B, C, whatever. So this was your architecture, right?

105 00:16:50.500 00:16:51.110 Shravya Yermal: Yeah.

106 00:16:51.640 00:16:53.990 Awaish Kumar: So you are saying you built some APIs,

107 00:16:54.420 00:17:10.360 Awaish Kumar: right? And now I want to understand the microservice you built, which is basically a RESTful API plus some transformation, right? So what exactly were you doing in that microservice?

108 00:17:10.730 00:17:15.816 Shravya Yermal: So in that microservice, what was happening is:

109 00:17:16.869 00:17:44.820 Shravya Yermal: there was one ODS layer acting as a source of truth. There was huge data in the history, and even in real time; I was building a pipeline for the real-time data which would be incoming, but that was based on the history. From the historical data we used to get the Avro file format, and based on that,

110 00:17:44.820 00:18:07.720 Shravya Yermal: using that Avro file as a dummy file, I used to get the raw data and perform some transformations in my APIs, like writing and trying out some lambda functions, and using

111 00:18:07.900 00:18:31.840 Shravya Yermal: the Kafka functions too. I used to perform a few manipulations, data manipulation like standardizing the columns, or removing missing points like null values or missing spaces, maybe. And then those basic transformations, like data preparation, were

112 00:18:31.930 00:18:37.830 Shravya Yermal: done, and then we used to put that data back into Redshift.
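
(A side note for context: the load step described here, Avro records landing in Redshift, is commonly done by staging the files in S3 and COPYing them in. The bucket, table, and IAM role below are hypothetical; this is a minimal sketch, not the project's actual code.)

    COPY analytics.transactions_staging
    FROM 's3://example-bucket/cdc/transactions/'                -- hypothetical S3 prefix
    IAM_ROLE 'arn:aws:iam::123456789012:role/RedshiftCopyRole'  -- hypothetical role
    FORMAT AS AVRO 'auto';                                      -- map Avro fields to columns by name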

113 00:18:37.910 00:18:57.390 Shravya Yermal: That was my window, a very small window, but I used to do those data transformations. And also, there were the phases when these microservices used to go live; this is a different part from what I described until now.

114 00:18:57.390 00:19:08.400 Shravya Yermal: When they used to go live, many times there used to be a plan that they needed an ad hoc analysis based on

115 00:19:08.440 00:19:32.300 Shravya Yermal: the current day’s data. Suppose today there is some X or Y; based on the date, I used to apply partitions over the data, and there used to be an ad hoc analysis which we used to show to the executive leaders while the transformation was going on. Basically, I did monitoring

116 00:19:32.420 00:19:38.010 Shravya Yermal: for those phases. So that was another responsibility I had in the project.

117 00:19:38.420 00:19:42.110 Shravya Yermal: So there were these 2 basic responsibilities, but...

118 00:19:42.110 00:19:47.419 Awaish Kumar: I didn’t understand the ingestion part yet. So it is like,

119 00:19:47.990 00:19:57.069 Awaish Kumar: I understand you built an API which maybe accepts some request and does some transformation

120 00:19:57.070 00:19:57.440 Shravya Yermal: Yes.

121 00:19:57.940 00:20:00.250 Awaish Kumar: and sends back the response, right?

122 00:20:02.880 00:20:07.619 Awaish Kumar: But what is the ingestion part here?

123 00:20:08.130 00:20:12.899 Awaish Kumar: It’s: I’m getting a request, I’m processing it, I send some data back.

124 00:20:13.700 00:20:14.080 Awaish Kumar: Thank you.

125 00:20:14.080 00:20:23.289 Shravya Yermal: So I am getting the request and sending it back; that was one part. And the other part, the data ingestion, was:

126 00:20:23.300 00:20:47.839 Shravya Yermal: when this got successful, we used to apply that to the actual transformation. The pipeline used to pull the data from CDC; we used to capture the data through CDC onto the ODS layer, and from the ODS layer we used to do the data warehousing part,

127 00:20:47.840 00:20:54.690 Shravya Yermal: and when the data warehousing part was done, the data used to

128 00:20:54.700 00:21:07.170 Shravya Yermal: get pulled from, actually, the DMS service, the Database Migration Service, which we used in AWS.

129 00:21:07.170 00:21:09.410 Awaish Kumar: You’ve named quite a lot of technologies.

130 00:21:10.450 00:21:15.269 Awaish Kumar: I just want to focus on one part which you really

131 00:21:15.400 00:21:22.660 Awaish Kumar: worked on, are proud of, and have real expertise in, so we can evaluate that part.

132 00:21:22.770 00:21:28.267 Awaish Kumar: It’s just a lot of things you are describing.

133 00:21:29.300 00:21:32.846 Awaish Kumar: We can keep our scope

134 00:21:33.840 00:21:38.802 Awaish Kumar: focused on some single project, a single kind of

135 00:21:39.510 00:21:44.200 Awaish Kumar: task or pipeline you’ve worked on, not everything.

136 00:21:44.330 00:22:10.200 Awaish Kumar: I’m more focused on the analytical part. For the transactional systems, a request can come; I know, in a banking system, a transaction is coming, you call a microservice, it can store it in some database, or S3, or Postgres, or whatever, and something is happening in the transactional system. But

137 00:22:10.380 00:22:13.971 Awaish Kumar: that is not what I’m asking about;

138 00:22:17.410 00:22:27.079 Awaish Kumar: that’s a transactional system. But what about the data warehousing, the analytical part of it, how that data is going to the data warehouse? I’m more interested in that part.

139 00:22:27.480 00:22:36.060 Awaish Kumar: And then how it is being processed, how you are managing changes in the data,

140 00:22:36.410 00:22:40.460 Awaish Kumar: like what SCD types you are using, and

141 00:22:41.610 00:22:46.400 Awaish Kumar: when it lands in the warehouse, do you process it beforehand

142 00:22:46.510 00:22:49.600 Awaish Kumar: or do you process it after? What kind of

143 00:22:50.370 00:22:55.030 Awaish Kumar: ETL/ELT patterns are you using for that, things like that.
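
(A side note on the SCD question: a minimal Type 2 slowly-changing-dimension sketch in Postgres-flavored SQL, expiring the old row and inserting the new version. The dim_customer/stg_customer schema is hypothetical, not the project's actual code.)

    -- Step 1: expire the current row when a tracked attribute changed
    UPDATE dim_customer d
    SET    valid_to   = CURRENT_DATE,
           is_current = FALSE
    FROM   stg_customer s
    WHERE  d.customer_id = s.customer_id
      AND  d.is_current
      AND  d.address IS DISTINCT FROM s.address;

    -- Step 2: insert a fresh current row for changed and brand-new customers
    INSERT INTO dim_customer (customer_id, address, valid_from, valid_to, is_current)
    SELECT s.customer_id, s.address, CURRENT_DATE, NULL, TRUE
    FROM   stg_customer s
    LEFT JOIN dim_customer d
           ON d.customer_id = s.customer_id AND d.is_current
    WHERE  d.customer_id IS NULL;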

144 00:22:55.030 00:23:13.110 Shravya Yermal: Okay, regarding this, I would like to address the ELT part. What used to happen was, we used to extract the data through CDC, and when the data was extracted, there was a mini...

145 00:23:13.110 00:23:15.029 Awaish Kumar: Where is the data coming from?

146 00:23:15.540 00:23:23.400 Shravya Yermal: The data is coming from the source of truth, that is, the ODS layer, which is in Redshift.

147 00:23:25.300 00:23:26.010 Awaish Kumar: Okay.

148 00:23:26.010 00:23:39.390 Shravya Yermal: It’s an ODS layer on top of the dims and facts tables in Redshift, like I mentioned. So it’s coming from Redshift. And

149 00:23:39.520 00:23:49.700 Shravya Yermal: what used to happen was, the CDC used to send the Avro files. So those Avro files,

150 00:23:49.890 00:24:10.879 Shravya Yermal: based on those Avro files, I used to go over and scan what type of changes were there, because every Avro file was always different depending upon the functionality and the module, at least while I was on that project.

151 00:24:10.880 00:24:28.060 Awaish Kumar: Yeah, let’s just break it down. You mentioned Kafka streaming; I thought there was some front end, some API, some app or whatever, and it is capturing some events from the app,

152 00:24:28.340 00:24:32.430 Awaish Kumar: and then they are going somewhere in real time.

153 00:24:32.430 00:24:40.169 Shravya Yermal: This is the part which we were trying to implement from the old system. So I learned...

154 00:24:40.170 00:24:45.529 Awaish Kumar: That is just migration of data. But where’s the real-time part?

155 00:24:45.640 00:24:55.929 Awaish Kumar: Where is the real time? The migration thing is moving historical data; it’s not real time. We are talking about data migration. You utilized Kafka for some

156 00:24:56.783 00:25:03.220 Awaish Kumar: caching or some storage. But

157 00:25:03.740 00:25:09.850 Awaish Kumar: how do you handle the real-time pipelines? Real-time pipelines are like...

158 00:25:09.850 00:25:10.480 Shravya Yermal: So this.

159 00:25:10.480 00:25:19.540 Awaish Kumar: For example, on Netflix, a user searches for something in real time and we give recommendations. So the data is going

160 00:25:21.840 00:25:26.260 Awaish Kumar: to some back end Api, or maybe some

161 00:25:26.580 00:25:31.019 Awaish Kumar: Kafka layer, and it is getting processed. We are getting

162 00:25:31.200 00:25:40.479 Awaish Kumar: ML recommendations for different movies, and then we are serving them back through the RESTful APIs. So

163 00:25:41.070 00:25:44.940 Awaish Kumar: this part, where in real time it goes from an app

164 00:25:45.180 00:25:48.690 Awaish Kumar: to somewhere in some database, or in some warehouse.

165 00:25:48.690 00:25:50.420 Shravya Yermal: Correct. Yes, you are totally correct.

166 00:25:50.420 00:25:53.239 Awaish Kumar: then it gets processed and it comes back.

167 00:25:53.580 00:25:57.359 Shravya Yermal: Yeah, you are totally correct on that part. So

168 00:25:57.560 00:26:03.939 Shravya Yermal: The point I am trying to explain is that

169 00:26:04.330 00:26:33.210 Shravya Yermal: first, the API was a trial and test in the development area, which I did by getting that Avro file and writing some transformations, and it worked on the static data. Now we had to test this in the live environment. How Kafka works is: Kafka is a messaging system; it works on messages. If we give it data, or logs, it stores them in an event-based system. It’s event-driven,

170 00:26:33.582 00:26:45.869 Shravya Yermal: like we have to create an event, suppose for each hour; for each hour it will create an event and take the data for that window,

171 00:26:45.870 00:26:57.719 Shravya Yermal: and then it will store it in partitions within the Kafka ecosystem, based on

172 00:26:57.720 00:27:17.460 Shravya Yermal: the keys. There will be a key at the front of that message, and then it will be stored in a binary format for encapsulation purposes, and all these things happen within the Kafka ecosystem. And that is how...

173 00:27:17.640 00:27:25.720 Awaish Kumar: Yeah, but what was your role in it? Did you process Avro files? Did you

174 00:27:26.353 00:27:30.569 Awaish Kumar: set up the Kafka? Did you set up the partitions, or did you

175 00:27:31.096 00:27:46.550 Awaish Kumar: set up where it is going, the destination? Did you write the transformations, and did you write them in SQL? What kind of data warehouse was being used; from Kafka, where was it going,

176 00:27:46.660 00:27:49.840 Awaish Kumar: what was the final destination?

177 00:27:49.840 00:27:57.040 Shravya Yermal: So Kafka was just streaming; it was streaming from that ODS,

178 00:27:57.450 00:28:10.730 Shravya Yermal: making the transformations, making the changes, and then going back to another table. In development it was going to Cassandra DB,

179 00:28:10.780 00:28:21.990 Shravya Yermal: like, in development it was going to Cassandra DB, but I was trying to make the transformations for ScyllaDB, because Cassandra DB did not support...

180 00:28:21.990 00:28:51.970 Shravya Yermal: There were a few things from Cassandra which weren’t supported by ScyllaDB, but that was not in my scope. In my scope, I used to apply the transformations for Cassandra DB, not for production. In production many issues used to come up while doing it for ScyllaDB, so we used to scan them; there used to be requirements saying these are the errors coming up, and they used to send me those errors.

181 00:28:51.970 00:28:57.280 Awaish Kumar: Have you worked only in the AWS environment, or

182 00:28:58.010 00:29:02.980 Awaish Kumar: have you used open-source tools like Airflow or dbt?

183 00:29:03.690 00:29:21.870 Shravya Yermal: No. Recently I have been trying to learn Airflow at my current job. They are trying to introduce data lineage; we are trying to establish a data lineage. There is

184 00:29:22.050 00:29:34.609 Shravya Yermal: Spark already existing, but they want to use Airflow, so I’m learning the Airflow part. And dbt, I’m just aware of the concepts; I haven’t worked with it in real time.

185 00:29:35.520 00:29:38.489 Awaish Kumar: So what kind of warehouses have you worked with?

186 00:29:39.040 00:29:48.840 Shravya Yermal: Right now I am working with GCP; I’m getting reskilled while working. But I have worked with AWS in the past.

187 00:29:48.840 00:29:52.930 Awaish Kumar: I mean the warehouse; AWS is a cloud platform.

188 00:29:52.930 00:30:21.730 Shravya Yermal: Redshift, BigQuery. And Visa had its own warehouse; it has its own computing tools, named Cloud View. It’s for internal purposes and the confidentiality they want to maintain, so they do not use Azure or AWS or GCP, anything. But the cloud computing is really parallel to AWS or GCP, you can say;

189 00:30:21.790 00:30:23.920 Shravya Yermal: the things go hand in hand.

190 00:30:23.920 00:30:24.710 Awaish Kumar: Good.

191 00:30:25.710 00:30:35.080 Awaish Kumar: What are the differences between BigQuery and Redshift, like key differences?

192 00:30:37.140 00:30:44.890 Shravya Yermal: BigQuery and Redshift key differences... I am not, like, oh...

193 00:30:46.360 00:30:47.219 Awaish Kumar: You can just.

194 00:30:47.220 00:30:48.660 Shravya Yermal: Purpose, like.

195 00:30:49.220 00:30:49.840 Awaish Kumar: Sorry.

196 00:30:50.510 00:30:55.239 Awaish Kumar: Like pros and cons, in terms of the architecture of both.

197 00:30:56.580 00:31:07.010 Shravya Yermal: Okay. In AWS, I feel it’s really easy; for example, for auditing we use CloudTrail.

198 00:31:07.380 00:31:13.999 Shravya Yermal: But the issue which I am facing right now in GCP is,

199 00:31:14.140 00:31:30.930 Shravya Yermal: there are lots of issues; we are not able to use Cloud Audit Logs, the service which GCP provides. We are not able to use that. They have already established RDDs and Spark, and

200 00:31:31.040 00:31:39.039 Shravya Yermal: there is no go-to function in GCP for audit logs, or even...

201 00:31:39.040 00:31:41.389 Awaish Kumar: That’s about different services in GCP.

202 00:31:41.750 00:31:47.890 Awaish Kumar: I’m more focused on just the architecture of a warehouse. Just tell me about BigQuery:

203 00:31:49.200 00:31:52.939 Awaish Kumar: how does it work, and why is it so fast?

204 00:31:53.490 00:31:59.599 Awaish Kumar: And what are the pros and cons of using BigQuery, like when to use it and when not to?

205 00:32:02.170 00:32:05.779 Shravya Yermal: Okay, that’s a bit of a tricky question.

206 00:32:05.780 00:32:08.989 Awaish Kumar: How would you decide? Like, you are a data engineer;

207 00:32:09.120 00:32:16.700 Awaish Kumar: you are here, and a client comes in. He has some requirements,

208 00:32:17.130 00:32:21.909 Awaish Kumar: and he comes to you with these requirements. How are you going to decide

209 00:32:22.020 00:32:26.289 Awaish Kumar: what data warehouse to use, what tools to use?

210 00:32:26.840 00:32:28.180 Shravya Yermal: Oh, good!

211 00:32:28.410 00:32:39.059 Shravya Yermal: Honestly, to tell you, I was never at that level to decide the architecture, like what data warehouse to use. I...

212 00:32:39.060 00:32:39.490 Awaish Kumar: Well, that’s.

213 00:32:39.490 00:32:43.110 Shravya Yermal: I was always at the developer level.

214 00:32:43.110 00:32:55.040 Awaish Kumar: I understand that, but the thing is, as developers we have to know how BigQuery is working. What is the columnar storage, if you know?

215 00:32:56.530 00:33:02.979 Shravya Yermal: Yes, yeah, that’s true. We have to know that.

216 00:33:02.980 00:33:07.490 Awaish Kumar: How do you optimize the cost in BigQuery?

217 00:33:08.560 00:33:11.943 Shravya Yermal: Okay? So basically,

218 00:33:13.230 00:33:28.750 Shravya Yermal: it’s a bit about using a relational database or a columnar database. For example, apart from BigQuery, in GCP there is HDFS, RDD also; HDFS is also used.

219 00:33:29.310 00:33:33.839 Awaish Kumar: Let’s just stay on this part,

220 00:33:34.140 00:33:46.579 Awaish Kumar: like BigQuery, only BigQuery. For example, BigQuery charges by the amount of data we process. So if you process,

221 00:33:46.940 00:33:49.070 Awaish Kumar: like,

222 00:33:49.410 00:34:00.270 Awaish Kumar: for example, if you process 1 TB, you are getting charged $5; if you process 100 TB, it increases with the amount of data that gets processed. So how do you make sure that

223 00:34:00.750 00:34:13.369 Awaish Kumar: your data... or say I have a table which I’m querying,

224 00:34:13.889 00:34:19.399 Awaish Kumar: and the table size, in terms of files, is like 100 TB,

225 00:34:19.510 00:34:24.140 Awaish Kumar: but not everything is useful for me. How am I going to optimize it?
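
(A side note for context: the textbook answer here is partition pruning; BigQuery bills by bytes scanned, so partitioning the table on the filter column means a query reads only the partitions it touches. Dataset and table names below are hypothetical.)

    -- BigQuery SQL: partition by date so queries can prune; require a filter
    -- so nobody accidentally scans the full table
    CREATE TABLE demo_ds.purchases (
      customer_name STRING,
      product_name  STRING,
      purchase_date DATE
    )
    PARTITION BY purchase_date
    OPTIONS (require_partition_filter = TRUE);

    -- Billed only for the week of partitions touched, not the full 100 TB
    SELECT customer_name, product_name
    FROM   demo_ds.purchases
    WHERE  purchase_date BETWEEN DATE '2025-07-01' AND DATE '2025-07-07';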

226 00:34:25.719 00:34:32.464 Shravya Yermal: So basically, we can optimize it with

227 00:34:34.089 00:34:38.549 Shravya Yermal: different solutions, like scaling up,

228 00:34:38.869 00:34:47.889 Shravya Yermal: and scaling has different kinds: we can do horizontal scaling or vertical scaling depending upon the data sets,

229 00:34:48.059 00:34:49.009 Shravya Yermal: and.

230 00:34:49.010 00:34:52.910 Awaish Kumar: That is going to optimize my cost.

231 00:34:55.348 00:34:58.889 Shravya Yermal: So, optimizing the cost...

232 00:34:59.899 00:35:06.179 Shravya Yermal: I have not come across such a situation, like optimizing the cost

233 00:35:07.083 00:35:10.349 Shravya Yermal: in GCP, like optimizing cost.

234 00:35:10.350 00:35:11.650 Awaish Kumar: It’s not about GCP as a whole;

235 00:35:12.000 00:35:17.059 Awaish Kumar: GCP is the Google Cloud platform, which has hundreds of

236 00:35:17.710 00:35:23.250 Awaish Kumar: different services. We are focusing on just one, which is

237 00:35:23.500 00:35:30.949 Awaish Kumar: integral to a data engineering role: you are going to work with some database, some data warehouse.

238 00:35:30.950 00:35:41.080 Shravya Yermal: In that case, what I think is, if there is a set of data which is being queried again and again, for example.

239 00:35:41.080 00:36:06.420 Shravya Yermal: in a columnar database... From what I remember from that banking project which I did, there was a use case where the customer used to come and hit the statement; he wanted to see the statement balance again and again. So there was one table which was getting hit again and again, and it was really

240 00:36:06.420 00:36:26.440 Shravya Yermal: costing the client, hitting the database again and again. So we created a view; basically we staged the data, or replicated the small amount of data which was getting hit again and again, based on partitions. Suppose there are partitions based on...

241 00:36:26.440 00:36:31.350 Awaish Kumar: You created what, caching?

242 00:36:31.350 00:36:51.579 Shravya Yermal: Yeah, caching is one of the techniques. And we even applied a sharding technique: we sharded the data based on the customers who were coming recently, on a date basis. Suppose in the last 7 days a

243 00:36:51.580 00:37:16.980 Shravya Yermal: person is hitting the database 2 or 3 times; that person was sharded into a separate database, as a frequent user of online banking. So that is how the use case went: we used the sharding technique, or the partitioning...
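
(A side note for context: the "view that stages the hot data" described here is often implemented as a materialized view. A minimal Postgres sketch with hypothetical names; the refresh cadence would be scheduled separately.)

    -- Precompute the hot statement-balance lookup once, read it many times
    CREATE MATERIALIZED VIEW recent_statement_balances AS
    SELECT customer_id,
           SUM(amount) AS balance
    FROM   transactions
    WHERE  statement_date >= CURRENT_DATE - INTERVAL '7 days'
    GROUP BY customer_id;

    -- Repeated reads hit the small precomputed copy, not the big table
    SELECT balance FROM recent_statement_balances WHERE customer_id = 42;

    -- Refresh on whatever cadence the business tolerates
    REFRESH MATERIALIZED VIEW recent_statement_balances;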

244 00:37:16.980 00:37:17.960 Awaish Kumar: And this partitioning?

245 00:37:19.290 00:37:27.270 Shravya Yermal: Partitioning, like we partition based on the columns, based on the...

246 00:37:27.270 00:37:30.159 Awaish Kumar: On what column? Like, how does it work?

247 00:37:31.759 00:37:34.559 Shravya Yermal: Okay, how it works is

248 00:37:36.360 00:37:49.349 Shravya Yermal: Partitioning is, suppose there are 2 tables and there is a key, and we apply the key. And

249 00:37:50.060 00:38:03.530 Shravya Yermal: we apply the key, and the data, suppose a large data set, is on a distributed model, like 5 different machines. So on the...

250 00:38:03.530 00:38:04.100 Awaish Kumar: No

251 00:38:05.010 00:38:12.770 Awaish Kumar: We are not going to distribute. Just say we have a database, a Postgres database you’ve worked with,

252 00:38:14.560 00:38:16.090 Awaish Kumar: and we have a table

253 00:38:16.200 00:38:19.760 Awaish Kumar: right? A very big table. I just want to partition it.

254 00:38:20.010 00:38:22.300 Shravya Yermal: So how should I partition it?

255 00:38:25.110 00:38:39.479 Awaish Kumar: I have a big table, for example, a table which has hundreds of terabytes of data, and the columns are, for example, the customer name, the date,

256 00:38:39.730 00:38:44.670 Awaish Kumar: and purchases, for example the product names, things like that.

257 00:38:44.800 00:38:47.339 Awaish Kumar: Now I want to partition it.

258 00:38:47.810 00:38:53.399 Awaish Kumar: What column should I use to partition? How is it going to save

259 00:38:54.036 00:38:55.740 Awaish Kumar: querying time, the cost,

260 00:38:55.870 00:39:02.009 Awaish Kumar: and how will the insertions work, and how will the retrieval work?

261 00:39:02.800 00:39:07.440 Awaish Kumar: Just simply describe the answer to my question.

262 00:39:08.730 00:39:10.620 Shravya Yermal: Okay, so

263 00:39:14.310 00:39:20.580 Shravya Yermal: how the partitioning works is, when we...

264 00:39:20.830 00:39:26.460 Shravya Yermal: we apply joins and queries when we pull the data...

265 00:39:26.460 00:39:29.771 Awaish Kumar: That’s not partitioning; joins are a different thing.

266 00:39:30.730 00:39:39.890 Shravya Yermal: I have done it, but I’m unable to recall it right now. Truly speaking, I am unable to recall it, but I have done that sharding when...

267 00:39:39.890 00:39:46.520 Awaish Kumar: Sharding is a different thing, and partitioning is a different technique.
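
(A side note for context: for the table he describes, the usual answer is declarative range partitioning on the date column, since date is the common filter. A minimal Postgres sketch with hypothetical names.)

    -- Parent table partitioned by range on the date column
    CREATE TABLE purchases (
      customer_name text,
      product_name  text,
      purchase_date date NOT NULL
    ) PARTITION BY RANGE (purchase_date);

    -- One child partition per month; inserts route to the right child automatically
    CREATE TABLE purchases_2025_07 PARTITION OF purchases
      FOR VALUES FROM ('2025-07-01') TO ('2025-08-01');

    -- The planner prunes to the single July partition instead of scanning everything
    SELECT customer_name, product_name
    FROM   purchases
    WHERE  purchase_date >= DATE '2025-07-01' AND purchase_date < DATE '2025-07-08';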

268 00:39:47.260 00:40:02.440 Awaish Kumar: Okay, so do you know what the difference is between normalization and denormalization, and when to use

269 00:40:03.020 00:40:03.870 Awaish Kumar: each?

270 00:40:05.100 00:40:10.569 Shravya Yermal: So denormalization is a wide, flat table.

271 00:40:10.750 00:40:16.893 Shravya Yermal: Denormalization is like,

272 00:40:18.150 00:40:25.630 Shravya Yermal: a lot bigger table, like, when...

273 00:40:26.390 00:40:55.580 Shravya Yermal: Denormalization is used when we have to retrieve the data really fast, because everything is all together in one big table; that’s denormalization. Usually we apply it when the schemas or data models are built. And normalization is when we have the specific small tables; they are normalized, basically into the different normal forms, 1NF, 2NF...

274 00:40:55.580 00:41:05.990 Awaish Kumar: Yeah, but why? In normalization, if we have a table, we will split it up into multiple tables.

275 00:41:06.470 00:41:12.209 Awaish Kumar: Yes, that happens in normalization. But why do we split them; what do we want to achieve?

276 00:41:12.370 00:41:14.620 Awaish Kumar: What are the pros of doing that?

277 00:41:16.660 00:41:21.190 Shravya Yermal: So basically, the pro of that is to

278 00:41:21.440 00:41:30.720 Shravya Yermal: reduce the time of the query which we are running against that database. So the...

279 00:41:30.720 00:41:33.190 Awaish Kumar: That’s what you said for denormalization.

280 00:41:33.730 00:41:35.330 Shravya Yermal: So even in a denorm...

281 00:41:35.330 00:41:35.975 Awaish Kumar: This.

282 00:41:37.080 00:41:52.730 Awaish Kumar: This benefit, this pro, you mentioned for the opposite technique. Denormalization is the opposite of normalization, and you mentioned that it is going to give us faster querying time. And now you’re saying the same for normalization.

283 00:41:54.280 00:41:59.449 Shravya Yermal: Yeah, because the denormalization which I recall is.

284 00:42:00.350 00:42:03.979 Awaish Kumar: That’s okay, that’s correct. I’m now talking about normalization.

285 00:42:04.780 00:42:07.969 Shravya Yermal: Normal. Yeah, in normalization.

286 00:42:08.340 00:42:30.200 Shravya Yermal: I really don’t remember the exact use case, what pros it gives, but it is used when we don’t want a lot of data to be scanned in a query, like when we want to write a simple query. At that point we use a normalized table.
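
(A side note for context: the contrast being probed is that normalization removes duplication so writes and updates stay cheap and consistent, while denormalization duplicates data so reads need no joins. A minimal sketch with hypothetical tables.)

    -- Normalized: the customer's name lives in exactly one row,
    -- so an update touches one place and cannot go inconsistent
    CREATE TABLE customers (
      customer_id   int PRIMARY KEY,
      customer_name text NOT NULL
    );
    CREATE TABLE orders (
      order_id    int PRIMARY KEY,
      customer_id int NOT NULL REFERENCES customers (customer_id),
      order_date  date NOT NULL
    );

    -- Denormalized: one wide table, reads need no join,
    -- but the name is duplicated on every order row
    CREATE TABLE orders_wide (
      order_id      int PRIMARY KEY,
      customer_id   int NOT NULL,
      customer_name text NOT NULL,
      order_date    date NOT NULL
    );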

287 00:42:30.200 00:42:33.980 Awaish Kumar: Okay, and how familiar are you with SQL?

288 00:42:36.540 00:42:37.070 Awaish Kumar: Like the.

289 00:42:37.070 00:42:39.679 Shravya Yermal: I can write window functions, the...

290 00:42:39.680 00:42:42.000 Awaish Kumar: How do you rate yourself?

291 00:42:42.320 00:42:42.900 Shravya Yermal: Hmm.

292 00:42:43.460 00:42:54.769 Shravya Yermal: If you ask me right now, it will be 7 to 8. But for a SQL problem, like, if you give me a...

293 00:42:54.770 00:42:57.450 Awaish Kumar: For example, if I have a query,

294 00:42:57.720 00:43:04.050 Awaish Kumar: what is the execution sequence? If I have, like, a SELECT statement on some table,

295 00:43:04.050 00:43:05.319 Shravya Yermal: It starts from.

296 00:43:05.320 00:43:08.440 Awaish Kumar: then what is the sequence of execution?

297 00:43:08.690 00:43:18.690 Shravya Yermal: FROM, then WHERE, then HAVING, then ORDER BY, and then it goes to the SELECT statement.

298 00:43:21.140 00:43:25.160 Shravya Yermal: And sorry, after WHERE there is a GROUP BY also.

299 00:43:29.570 00:43:33.780 Awaish Kumar: So you are saying it orders first and then returns the data?

300 00:43:35.594 00:43:44.549 Shravya Yermal: That is how the SQL query goes: it takes the tables first, then it applies WHERE, then it applies GROUP BY, then HAVING...

301 00:43:44.550 00:43:46.029 Awaish Kumar: Does it do joins as well?

302 00:43:47.340 00:43:50.290 Shravya Yermal: Yeah. Sorry.

303 00:43:50.940 00:43:53.250 Awaish Kumar: So the query also has some joins. So...

304 00:43:53.250 00:43:54.370 Shravya Yermal: Yeah, from

305 00:43:55.040 00:44:14.760 Shravya Yermal: In the FROM clause itself we apply the joins, and then WHERE, then GROUP BY, then HAVING, then ORDER BY, and then it goes to the SELECT statement. And if the SELECT statement has window functions or subqueries, the subqueries are evaluated... it depends.
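
(A side note for reference: the logical evaluation order being described, annotated on a small query; the tables reuse the hypothetical customers/orders schema sketched above.)

    SELECT   c.customer_name,                    -- 5. SELECT (then window functions)
             COUNT(*) AS order_count
    FROM     orders o                            -- 1. FROM and JOINs
    JOIN     customers c ON c.customer_id = o.customer_id
    WHERE    o.order_date >= DATE '2025-01-01'   -- 2. WHERE (row filter)
    GROUP BY c.customer_name                     -- 3. GROUP BY
    HAVING   COUNT(*) > 3                        -- 4. HAVING (group filter)
    ORDER BY order_count DESC                    -- 6. ORDER BY
    LIMIT    10;                                 -- 7. LIMIT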

306 00:44:14.760 00:44:17.109 Awaish Kumar: So I have this query:

307 00:44:17.851 00:44:26.040 Awaish Kumar: some fields, some joins with some tables, and I also have some filters. But this query is very simple.

308 00:44:26.220 00:44:28.839 Awaish Kumar: It’s just taking a few columns from one table,

309 00:44:29.330 00:44:37.270 Awaish Kumar: a few from another, and 1-2 columns from a 3rd table. It joins 3 tables, has some filters, and

310 00:44:37.730 00:44:38.740 Awaish Kumar: returns the data to the user.

311 00:44:38.740 00:44:39.899 Shravya Yermal: To simplify.

312 00:44:39.900 00:44:46.680 Awaish Kumar: It’s a very, very simple query, but it is taking a very long time to execute. I want to optimize that.

313 00:44:48.320 00:44:50.959 Awaish Kumar: What would you suggest I should do first?

314 00:44:52.008 00:45:05.759 Shravya Yermal: You should first write that simple query and put it into a CTE, a common table expression, and then use that CTE to again apply the main

315 00:45:05.950 00:45:09.779 Shravya Yermal: clause, the main clause which you are trying to filter, like...

316 00:45:10.630 00:45:17.119 Awaish Kumar: Yeah, my query is very simple; it is the main clause. I just select

317 00:45:17.420 00:45:29.060 Awaish Kumar: A, B, C, D, E, 5 columns, but to get those 5 columns, say the E and F columns are coming from 2 different tables; that’s why I needed to join 2 other tables, right?

318 00:45:29.060 00:45:31.150 Shravya Yermal: So we can write a CASE statement for that.

319 00:45:31.550 00:45:35.169 Awaish Kumar: SELECT, FROM, and JOIN, and WHERE, and that’s all.

320 00:45:35.644 00:45:42.110 Shravya Yermal: A CASE statement. So when we write a... so that...

321 00:45:42.110 00:45:45.110 Awaish Kumar: You think that’s going to optimize my query?

322 00:45:45.980 00:45:52.550 Shravya Yermal: So a CASE statement can be written in another subquery, so that the subquery runs faster, and then...

323 00:45:52.550 00:45:59.230 Awaish Kumar: I want to optimize my query, but you can think of anything; just don’t focus on the query itself, right?

324 00:45:59.916 00:46:03.179 Awaish Kumar: To optimize your queries, you can

325 00:46:03.300 00:46:09.870 Awaish Kumar: think about your database: maybe the architecture is not good, maybe something is wrong with the table,

326 00:46:10.080 00:46:12.189 Awaish Kumar: something like the table being bloated.

327 00:46:13.100 00:46:24.900 Awaish Kumar: What different things can I do here so that my query becomes fast? The query is really simple: there are no extra aggregations, no window functions, nothing,

328 00:46:25.530 00:46:37.369 Awaish Kumar: just selecting data. But because the table is a very big table, it takes like a minute to execute, and I want it to be a few seconds.
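
(A side note for context: a reasonable first move for this question is to read the query plan before changing anything. Postgres syntax; the table and column names are hypothetical stand-ins for the three-table join being described.)

    -- Shows the chosen plan plus actual timings and buffer usage, revealing
    -- whether the minute goes to sequential scans, the join strategy, or I/O
    EXPLAIN (ANALYZE, BUFFERS)
    SELECT a.col_a, a.col_b, a.col_c, b.col_e, c.col_f
    FROM   t1 a
    JOIN   t2 b ON b.id = a.b_id
    JOIN   t3 c ON c.id = a.c_id
    WHERE  a.created_at >= DATE '2025-01-01';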

329 00:46:40.240 00:46:45.470 Shravya Yermal: In a few seconds. So at this time I realize, if you...

330 00:46:45.690 00:46:50.780 Shravya Yermal: at this time, the direction I see is, if the data is

331 00:46:51.230 00:46:58.089 Shravya Yermal: clean data and normalized data, maybe that will bring out the data faster, like...

332 00:46:59.430 00:47:03.950 Awaish Kumar: Let’s say the table is already kind of normalized.

333 00:47:04.400 00:47:04.860 Shravya Yermal: Hmm.

334 00:47:04.860 00:47:12.520 Awaish Kumar: It’s clean as well. It’s just that the amount of data is a lot.

335 00:47:14.160 00:47:22.310 Shravya Yermal: Okay. So at that point we can group by the actual amount...

336 00:47:22.310 00:47:30.199 Awaish Kumar: What are the different strategies in database design to optimize your read,

337 00:47:30.460 00:47:31.909 Awaish Kumar: the access time?

338 00:47:33.610 00:47:39.580 Shravya Yermal: Different strategies to read the data, the access time, or...

339 00:47:39.580 00:47:45.899 Awaish Kumar: No, no, I mean to optimize your access time, data retrieval time.

340 00:47:46.010 00:47:47.130 Shravya Yermal: Right.

341 00:47:47.680 00:47:53.009 Awaish Kumar: To optimize your data retrieval time, what are the different strategies

342 00:47:53.740 00:47:56.860 Awaish Kumar: that you can use in database design?

343 00:47:57.370 00:47:59.060 Shravya Yermal: In the database design.

344 00:47:59.300 00:48:00.090 Shravya Yermal: So.

345 00:48:00.090 00:48:03.580 Awaish Kumar: Yeah, like, I want to design a database, right?

346 00:48:03.730 00:48:08.490 Awaish Kumar: I have a table.

347 00:48:09.366 00:48:14.580 Awaish Kumar: What kinds of operations can we do? We can do write, read, right?

348 00:48:15.290 00:48:16.290 Shravya Yermal: Write, read, or...

349 00:48:16.290 00:48:18.710 Awaish Kumar: Update, confirm.

350 00:48:19.320 00:48:35.919 Awaish Kumar: and delete. These are the operations we normally perform. So I have a table, and I want to optimize my table so that my read time gets improved. I’m not bothered by write time;

351 00:48:36.450 00:48:49.539 Awaish Kumar: I have very few insertions every day, I don’t care about them, they happen at midnight. But I’m very concerned about the read time, because my table is getting accessed

352 00:48:50.180 00:48:56.530 Awaish Kumar: every few seconds, for example. I want to make it

353 00:48:56.810 00:49:02.329 Awaish Kumar: as optimal as possible for read queries.

354 00:49:04.830 00:49:05.540 Shravya Yermal: So

355 00:49:06.710 00:49:21.550 Shravya Yermal: I will look into the amount of data it is pulling at that point; if it is doing a lot of scanning, it will eventually take time. So at that point I will,

356 00:49:21.710 00:49:26.390 Shravya Yermal: like, put a GROUP BY, maybe, so that

357 00:49:27.140 00:49:30.920 Shravya Yermal: it will just scan the required data.

358 00:49:30.920 00:49:43.110 Awaish Kumar: Yeah, that can be GROUP BY, if my requirement is that I want to group the data on some columns, so that my data...

359 00:49:43.360 00:49:48.440 Shravya Yermal: Or I will apply the indexes in that phase, like a...

360 00:49:49.420 00:49:55.820 Awaish Kumar: That’s the second strategy: here we can use indexing. But where to apply indexing,

361 00:49:56.200 00:50:00.140 Awaish Kumar: like, how do you choose on what columns you will apply the indexing?

362 00:50:00.840 00:50:07.840 Shravya Yermal: So indexing will be applied based on the

363 00:50:08.190 00:50:12.900 Shravya Yermal: columns I need to partition on. So, using some window...

364 00:50:12.900 00:50:14.300 Awaish Kumar: Partitioning is different.

365 00:50:14.430 00:50:14.830 Shravya Yermal: Yeah.

366 00:50:14.830 00:50:16.370 Awaish Kumar: Partitioning is different.

367 00:50:17.030 00:50:25.309 Awaish Kumar: We don’t have to index the partition field; the partition is already in a different,

368 00:50:25.640 00:50:30.240 Awaish Kumar: in a different physical space. So

369 00:50:30.790 00:50:35.599 Awaish Kumar: it’s already optimized by that. Right? Partitioning is a

370 00:50:35.940 00:50:38.710 Awaish Kumar: strategy to optimize. But it’s a different

371 00:50:38.890 00:50:41.369 Awaish Kumar: strategy. Right? It’s 1 of these strategies.

372 00:50:41.660 00:50:51.649 Awaish Kumar: But we are now focusing on indexing. I want to index my table; there you are correct. But now suggest to me: on what field should I put an index?

373 00:50:53.390 00:50:57.689 Shravya Yermal: I will put it on the date and time first, because...

374 00:50:58.160 00:51:14.819 Awaish Kumar: That you are going to decide based on the fields which are being used in joins and WHERE clauses: the fields which are being used in filters and in joins are the fields you should be indexing first.

375 00:51:14.820 00:51:19.029 Shravya Yermal: Also, indexing depends on the business logic, what they want exactly.

376 00:51:19.480 00:51:29.600 Awaish Kumar: No, the business logic is how you join the data, how you filter the data, how you group the data; that really depends on the business requirement.

377 00:51:29.850 00:51:42.850 Awaish Kumar: We translate business requirements into the query; we have some FROM, WHERE, joins. So for the indexing, you can see: okay, I have a WHERE clause, I want to see the data only from

378 00:51:44.320 00:51:45.460 Awaish Kumar: the state

379 00:51:45.620 00:52:02.940 Awaish Kumar: Texas. So I know that the state is being used as a business requirement, and they want to see the data from each state. So that’s the column I want to index on, because it is going to be used in queries as a filter a lot.
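
(A side note for reference: the rule being stated is to index the columns that appear in WHERE filters and join conditions, then verify with the plan. Table and column names are hypothetical.)

    -- Index the filter column the business queries by
    CREATE INDEX idx_sales_state ON sales (state);

    -- Index the join key on the many side of a frequent join
    CREATE INDEX idx_orders_customer_id ON orders (customer_id);

    -- Confirm the planner actually uses the index
    EXPLAIN SELECT customer_name FROM sales WHERE state = 'TX';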

380 00:52:03.230 00:52:05.739 Shravya Yermal: Correct. Yes, you’re correct for that.

381 00:52:07.112 00:52:12.599 Awaish Kumar: Okay, so enough of the technical questions; I don’t have any more.

382 00:52:12.860 00:52:16.230 Awaish Kumar: Now I just have some questions on,

383 00:52:17.290 00:52:19.709 Awaish Kumar: like, what are your career goals?

384 00:52:21.990 00:52:22.630 Shravya Yermal: So.

385 00:52:22.630 00:52:26.980 Awaish Kumar: And how do you see yourself in the next 2 to 5 years?

386 00:52:28.150 00:52:32.320 Shravya Yermal: So right now,

387 00:52:32.420 00:52:40.960 Shravya Yermal: I want to work for a small-scale company, because I feel that I know the

388 00:52:40.960 00:53:05.250 Shravya Yermal: concepts but haven’t gotten that much hands-on experience in the data engineering space until now, and I’m really interested in upskilling myself and getting stronger, working on each part. We have discussed so many things in the last hour; there are things which I know, but

389 00:53:05.260 00:53:23.269 Shravya Yermal: there are so many things I want to learn. So I see my personal growth in learning and getting properly reskilled in each of them and getting stronger technically.

390 00:53:23.450 00:53:33.520 Shravya Yermal: That’s my short-term goal, and in parallel I have been trying to learn technologies, even adjacent technologies, as they come along.

391 00:53:33.540 00:53:57.060 Shravya Yermal: In the long term, after 5 years, I see myself at your position, a lead position. I want to transform myself: I’m still at a junior level, like a junior data engineer, so I want to go for a lead engineer role. Maybe after 5 years I see myself at that position.

392 00:53:57.940 00:54:07.180 Awaish Kumar: Okay. So if I ask a few of your bosses,

393 00:54:07.790 00:54:15.300 Awaish Kumar: like the past 2 or 3 bosses, about you,

394 00:54:15.910 00:54:24.940 Awaish Kumar: as a person, the technical capabilities, everything, how are they going to rate Shravya

395 00:54:26.040 00:54:27.970 Awaish Kumar: out of, like, 5.0?

396 00:54:28.510 00:54:30.600 Shravya Yermal: They are going to rate me

397 00:54:31.090 00:54:33.569 Shravya Yermal: above 4.5, to tell you.

398 00:54:33.630 00:54:57.179 Shravya Yermal: I have always been on time, on pace, whenever the team required me. The work environment was a little bit different at every organization I have worked at, so I will tell you about the Visa environment. I was working with the infrastructure team, so on a day-to-day basis my work was different,

399 00:54:57.180 00:55:21.479 Shravya Yermal: but in parallel I used to do different projects. Whenever they required me, I was there for each and every project: for an infrastructure project, a development project, or an automation project, whenever they wanted a helping hand, I was already there. And if you go and ask them right now, they will count on me; they will vouch for my accountability and credibility. Then, yes, I can

400 00:55:21.480 00:55:51.080 Shravya Yermal: go and work; I am a resource that you can count on, who will go learn about the projects and make things move forward. I won’t always give the final solutions, because I’m still learning how architectures work and how they are made, what the pros and cons are that are decided by the architecture; I’m still learning why they did that. I go and ask questions, like hundreds of questions, and then I get to know

401 00:55:51.080 00:56:03.589 Shravya Yermal: some of the picture, and I try to analyze things on my own. I’m not fully up to the mark, but I know how to make things move, at least from one point to another.

402 00:56:07.765 00:56:11.319 Awaish Kumar: Okay, yeah. Now, do you have any questions for us?

403 00:56:12.204 00:56:25.155 Shravya Yermal: Yes, I would like to know more about this position, what it is going to be like. You said earlier that Brainforge is a consultancy, so

404 00:56:25.580 00:56:47.290 Shravya Yermal: what domain clients are there, and whom am I getting interviewed for, and what is it going to look like? Will I be working on in-house projects, or will I be directly working with the client? And will it be on an hourly basis for a client? Is Brainforge...

405 00:56:47.290 00:56:51.530 Awaish Kumar: So yeah, that is going to be decided

406 00:56:52.310 00:57:00.709 Awaish Kumar: by the team after the interviews: how we want you to be in the company, on a full-time basis, or

407 00:57:00.820 00:57:05.836 Awaish Kumar: whether we can offer full time, part time, or whatever it is.

408 00:57:06.770 00:57:24.050 Awaish Kumar: What we are hiring for is a data engineer and analytics engineer kind of role, where we have some data engineering work internally, which is basically supporting the clients’ work,

409 00:57:25.440 00:57:30.169 Awaish Kumar: and then you have the direct client work as well, which is

410 00:57:31.031 00:57:33.890 Awaish Kumar: mostly the analytics engineering.

411 00:57:34.331 00:57:58.989 Awaish Kumar: Sometimes it can be the data engineering part, some ingestions. But what we are looking for is someone who is capable enough to work on the data engineering part of it, and also on the analytics part of it, which is like writing transformations in SQL and putting them in a dbt project, for example,

412 00:57:59.493 00:58:07.659 Awaish Kumar: so we favor more the transformation on top of the warehouse. So...

413 00:58:07.660 00:58:08.020 Shravya Yermal: Interesting.

414 00:58:08.020 00:58:11.670 Awaish Kumar: Yeah, someone who is an expert in warehouse

415 00:58:11.790 00:58:24.150 Awaish Kumar: architectures, dbt, and the data engineering work; that’s what we are hiring for. And then we do have internal work, as I mentioned, and we have some clients,

416 00:58:24.570 00:58:25.850 Awaish Kumar: and the

417 00:58:26.542 00:58:33.439 Awaish Kumar: we are going to pair you with the clients as well. So

418 00:58:34.740 00:58:47.370 Awaish Kumar: on the clients, it really depends on how big they are. It can be only one person working on a client, or it can require a team of 3-4 people: an analytics engineer, an analyst,

419 00:58:48.047 00:58:52.110 Awaish Kumar: project manager working together for the client.

420 00:58:52.230 00:59:08.819 Awaish Kumar: So it really depends on the client. And for the client engagements, it’s up to Brainforge how they engage with the client; that’s decided by the Brainforge leadership team or the team who deals with the client.

421 00:59:08.960 00:59:14.560 Awaish Kumar: But then, on the other side, the engineering team is mostly about executing,

422 00:59:14.700 00:59:19.585 Awaish Kumar: executing on the given roadmap, and the roadmap can be like

423 00:59:20.210 00:59:24.339 Awaish Kumar: creating a dbt project, or it can be a lot bigger

424 00:59:25.150 00:59:28.119 Awaish Kumar: than just that. So

425 00:59:29.100 00:59:38.679 Awaish Kumar: And yeah, for each client we are going to decide the hours for each person, like how many hours you want to put in

426 00:59:39.690 00:59:43.819 Awaish Kumar: for this client, right? So you will be asked, like, okay,

427 00:59:44.080 00:59:46.630 Awaish Kumar: we have hired you for maybe

428 00:59:47.100 01:00:01.089 Awaish Kumar: 40 hours per week, right, as full time. But if you are going to work on this client for 10 hours, the other 30 hours we can split up between multiple clients, or it can be client and internal work.

429 01:00:01.090 01:00:10.750 Shravya Yermal: Interesting. Also, I would like to know how the interview process is going to go, like how many rounds there are.

430 01:00:10.750 01:00:20.052 Awaish Kumar: Yeah, I think it’s one or 2 more interviews. And Rico from operations is going to let you know

431 01:00:21.390 01:00:27.139 Awaish Kumar: about the decision to move forward, and he will just

432 01:00:27.978 01:00:32.459 Awaish Kumar: keep the communication going with you on that part.

433 01:00:32.700 01:00:38.100 Awaish Kumar: And yeah, I hope next week he’s going to give a response back.

434 01:00:39.030 01:00:43.500 Shravya Yermal: Okay, okay, interesting, yeah, so do you have.

435 01:00:43.500 01:00:44.989 Awaish Kumar: Any other questions.

436 01:00:46.000 01:00:48.809 Shravya Yermal: Yeah, does Brainforge sponsor...

437 01:00:52.080 01:00:52.920 Shravya Yermal: the H-1B?

438 01:00:53.920 01:00:59.680 Awaish Kumar: I’m not really sure about it right now, at the moment. But I can ask

439 01:01:00.000 01:01:02.029 Awaish Kumar: about that, like,

440 01:01:03.520 01:01:06.849 Awaish Kumar: whether the company sponsors or not. Yeah.

441 01:01:08.095 01:01:11.170 Shravya Yermal: Okay, I guess I’m good on my part.

442 01:01:11.470 01:01:12.170 Shravya Yermal: Oh.

443 01:01:12.480 01:01:30.830 Awaish Kumar: Okay, we are on time, so it was nice meeting you. Rico is going to let you know next week whatever the decision is, and if we move forward, he’s going to tell you everything: who you are meeting with next, and what the agenda is.

444 01:01:32.120 01:01:33.870 Awaish Kumar: okay, yeah, thank you.

445 01:01:33.870 01:01:35.800 Shravya Yermal: Thank you. Have a nice one.

446 01:01:36.040 01:01:37.400 Awaish Kumar: You too, bye.