Meeting Title: BF Interview: Awaish <> Ashwini Date: 2025-11-13 Meeting participants: Ashwini Sharma, Awaish Kumar
WEBVTT
1 00:04:06.010 ⇒ 00:04:06.890 Awaish Kumar: Hello.
2 00:04:09.760 ⇒ 00:04:11.279 Ashwini Sharma: Hi. Hi, Awaish.
3 00:04:11.600 ⇒ 00:04:12.870 Ashwini Sharma: Awaish, right?
4 00:04:13.100 ⇒ 00:04:14.310 Awaish Kumar: I mean, sh…
5 00:04:14.310 ⇒ 00:04:15.770 Ashwini Sharma: Awaish, Awaish, yeah, okay.
6 00:04:15.950 ⇒ 00:04:16.579 Awaish Kumar: Oh, yo.
7 00:04:17.249 ⇒ 00:04:18.809 Awaish Kumar: How are you? I’m good.
8 00:04:18.959 ⇒ 00:04:19.769 Ashwini Sharma: I’m good.
9 00:04:20.820 ⇒ 00:04:21.350 Awaish Kumar: Okay.
10 00:04:21.660 ⇒ 00:04:23.399 Ashwini Sharma: Let me increase my volume a little bit.
11 00:04:23.700 ⇒ 00:04:24.410 Ashwini Sharma: Yep.
12 00:04:25.580 ⇒ 00:04:28.170 Awaish Kumar: Okay, and this, like, I just want to…
13 00:04:28.310 ⇒ 00:04:33.690 Awaish Kumar: Summarize the agenda for this meeting, and that includes, like, this…
14 00:04:34.020 ⇒ 00:04:43.359 Awaish Kumar: Introductions, and then, like, getting to know a bit more about your experience and the projects you have worked on.
15 00:04:44.250 ⇒ 00:04:45.060 Awaish Kumar: Good.
16 00:04:45.650 ⇒ 00:04:48.799 Awaish Kumar: After that, maybe deep dive into the… some of the…
17 00:04:48.980 ⇒ 00:04:52.109 Awaish Kumar: projects, and learn more about it.
18 00:04:52.640 ⇒ 00:04:53.280 Ashwini Sharma: Sure.
19 00:04:54.450 ⇒ 00:05:00.220 Awaish Kumar: Yeah, if you can… yeah, I’m Awaish Kumar, I’m a data engineer at Brainforge.
20 00:05:00.420 ⇒ 00:05:05.410 Awaish Kumar: I’ve been working for a year, and… I have, like, overall…
21 00:05:05.600 ⇒ 00:05:12.499 Awaish Kumar: Years of experience working as a data engineer in various companies.
22 00:05:12.770 ⇒ 00:05:13.780 Awaish Kumar: So, yeah.
23 00:05:14.090 ⇒ 00:05:16.650 Awaish Kumar: Now you can, like, introduce yourself.
24 00:05:16.650 ⇒ 00:05:23.459 Ashwini Sharma: Sure, sure, yeah. So, my name is Ashwini, I’m working as a data architect at BSI Financial Services India.
25 00:05:23.550 ⇒ 00:05:32.490 Ashwini Sharma: In my resume, it must be mentioned as Entra Solutions, because that was the name when I created the resume. Recently, it got changed a few days back.
26 00:05:32.560 ⇒ 00:05:37.700 Ashwini Sharma: Yeah, so basically, as a part of the current role, what I do is,
27 00:05:37.710 ⇒ 00:05:51.629 Ashwini Sharma: I ingest data from a variety of different sources, and it’s a small company, we cannot afford to buy tools like Fivetran or Airbyte Cloud, things like that, right? So, we write our own ingestion scripts based on PySpark.
28 00:05:51.630 ⇒ 00:05:59.870 Ashwini Sharma: We’re ingesting all the data in Databricks. Once the data is there in Databricks, I’m also working on data transformation using dbt scripts.
29 00:05:59.940 ⇒ 00:06:19.109 Ashwini Sharma: And, transforming it into the medallion, following the medallion architecture, and then exposing the data marts, to, to our consumers, which, which consume the data either through a visualization tool, like Sigma, or, they just run some analysis on, on top of those data sets.
30 00:06:19.310 ⇒ 00:06:38.360 Ashwini Sharma: So that’s, what I’ve been doing. I’ve worked in the data domain since almost 2018, with, starting with Fivetran, where I was, leading, I joined as IC, worked on several connectors, right? If you have used Fivetran, I’m sure you have used some of my connectors that
31 00:06:38.450 ⇒ 00:06:42.420 Ashwini Sharma: That, that are there in Fivetran. And,
32 00:06:42.700 ⇒ 00:06:52.390 Ashwini Sharma: Yeah, after Fivetran, I joined Shopify, where it was more of a data reliability, right? And then it was my tenure at Entra Solutions.
33 00:06:52.700 ⇒ 00:06:59.780 Awaish Kumar: Okay, so what exactly, you have done in terms of data reliability?
34 00:07:01.020 ⇒ 00:07:03.739 Awaish Kumar: What processes? So, yeah, in terms…
35 00:07:04.720 ⇒ 00:07:22.850 Ashwini Sharma: Right, in terms of data reliability, right, so we didn’t use any custom tools at that point. In Shopify, right, we were building our own tools, and our tools were basically monitoring our data sets for freshness, you know, the clusters for, if it is going down.
36 00:07:22.990 ⇒ 00:07:36.289 Ashwini Sharma: If, you know, data sets didn’t refresh, during a certain interval of time, creating data lineages to understand the impact, downstream impact, upstream impact, right?
37 00:07:36.510 ⇒ 00:07:43.749 Ashwini Sharma: And, it was mainly, like, an on-call duty, right, for the data reliability.
38 00:07:43.750 ⇒ 00:07:56.540 Ashwini Sharma: So, in case something goes down, somebody pushes a bad PR, things don’t work as expected, that’s where, you know, me and my team was coming into action, and then taking the necessary steps to recover from that failure.
39 00:07:57.190 ⇒ 00:08:06.879 Awaish Kumar: Yeah, well, like, my question will be more like, how are you maintaining the data, like, the data quality, in Shopify?
40 00:08:08.470 ⇒ 00:08:15.100 Ashwini Sharma: So, yeah, so in terms of data quality, we had some of the basic dbt tests that, sorry, you’re not audible.
41 00:08:16.640 ⇒ 00:08:30.030 Awaish Kumar: I was saying the freshness… the freshness is part of it, like, you are looking for table if it is refreshed, but apart from that, like, there are a lot of things can happen in data quality.
42 00:08:30.030 ⇒ 00:08:34.090 Ashwini Sharma: Right, right, right, right, so… so we had some of the tests.
43 00:08:34.530 ⇒ 00:08:38.069 Ashwini Sharma: Yeah, I’m explaining that, yeah. Yeah, like…
44 00:08:38.070 ⇒ 00:08:44.209 Awaish Kumar: finish my sentence, like, sometimes you get spam messages as well, so, like, how you…
45 00:08:44.320 ⇒ 00:08:45.570 Awaish Kumar: Handle all of these things.
46 00:08:46.230 ⇒ 00:08:58.169 Ashwini Sharma: Yeah, so some of the tests we had included in our dbt models that used to capture a lot of data quality issues, right? For example, null detection, right? Out-of-range detections.
47 00:08:58.320 ⇒ 00:09:02.679 Ashwini Sharma: you know, anomalies in data volume, right?
48 00:09:02.900 ⇒ 00:09:12.650 Ashwini Sharma: those kind of things that we were mainly interested in. Referential integrities, right? These are the things that we use to monitor through our custom tools.
49 00:09:13.530 ⇒ 00:09:15.230 Awaish Kumar: So we’re not…
50 00:09:15.480 ⇒ 00:09:30.640 Awaish Kumar: These are, like, very standard things, like, everybody wants to go for that. My point is, like, I have, like, almost 8 years, 9 years of experience in this field, and I’ve been doing exactly what you said.
51 00:09:30.860 ⇒ 00:09:41.049 Awaish Kumar: For data quality also. So things are, like, sometimes you get… get spammed, like… like, if I set a set of values, or ranges for…
52 00:09:41.520 ⇒ 00:09:46.220 Awaish Kumar: Price, or some other revenue, or something like that.
53 00:09:46.580 ⇒ 00:09:46.920 Ashwini Sharma: That’s right.
54 00:09:46.920 ⇒ 00:09:49.690 Awaish Kumar: And most… 50% of the time, like…
55 00:09:49.890 ⇒ 00:09:55.270 Awaish Kumar: Like, you get the… when you check the issue, it is generally…
56 00:09:55.470 ⇒ 00:10:13.599 Awaish Kumar: like, accepted value, right? So how, like, even, like, if I can say, like, if value is 20% higher than this, it’s an outlier. Sometimes it’s, like, maybe 1% higher, and it’s not an outlier, but it… it still gets detected.
57 00:10:15.030 ⇒ 00:10:25.310 Ashwini Sharma: Yeah, yeah, so, that happens a lot. I think our outliers were, you know, more than, basically 6 times of the standard deviation.
58 00:10:25.850 ⇒ 00:10:45.760 Ashwini Sharma: So, if anything is more than, you know, basically based on the… on the existing values, if something deviates from the average by 6 sigma, then that’s an outlier. And, generally, we used to catch that, right? I don’t have a statistical information on…
59 00:10:45.930 ⇒ 00:10:52.189 Ashwini Sharma: You know, how often we used to catch it, but whenever we caught it, it was definitely an outlier.
60 00:10:52.900 ⇒ 00:10:53.620 Awaish Kumar: Okay.
61 00:10:55.160 ⇒ 00:11:04.120 Ashwini Sharma: Yeah, but the metric was that, right? So, anything that is more than, you know, 6 times the standard deviation of the distribution.
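The 6-sigma rule described above can be sketched in a few lines. This is a minimal illustration, not the tooling used at Shopify: the function name, the threshold parameter `k`, and the sample values are all assumptions made for the example.

```python
from statistics import mean, stdev

def six_sigma_outliers(history, candidates, k=6.0):
    """Return the candidates that deviate from the historical mean
    by more than k sample standard deviations (hypothetical helper)."""
    mu = mean(history)
    sigma = stdev(history)
    return [x for x in candidates if abs(x - mu) > k * sigma]

# Made-up historical values; mean is 100 and sample standard deviation is 2.
history = [100, 102, 98, 101, 99, 103, 97, 100]

# A value 1% above the mean is not flagged; values far outside the band are.
flagged = six_sigma_outliers(history, [101, 120, 250])
print(flagged)
```

A fixed percentage threshold ignores how much the metric normally varies, which is one way the false alerts raised in the question can arise; a band based on standard deviation scales with the observed spread.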
62 00:11:04.350 ⇒ 00:11:08.759 Awaish Kumar: Have you used any of the, like, latest tools which are coming in the market?
63 00:11:08.760 ⇒ 00:11:10.770 Ashwini Sharma: No.
64 00:11:10.910 ⇒ 00:11:20.320 Ashwini Sharma: Dater, no, Monte Carlo guys, if a lot of things were there, but yeah, we were not authorized to purchase any of those things, and didn’t use…
65 00:11:20.990 ⇒ 00:11:26.020 Ashwini Sharma: Okay. And the same case is repeating in Entra solutions, right? No money to purchase, so…
66 00:11:26.190 ⇒ 00:11:30.870 Ashwini Sharma: Just developing custom tools to catch those, quality.
67 00:11:30.870 ⇒ 00:11:31.550 Awaish Kumar: Okay.
68 00:11:33.290 ⇒ 00:11:42.329 Awaish Kumar: So apart from that, like, you mentioned about writing, scripts, in PySpark, so how…
69 00:11:43.010 ⇒ 00:11:45.000 Awaish Kumar: Like, are you running it on,
70 00:11:46.300 ⇒ 00:11:51.920 Awaish Kumar: Like, are you running open source version of it, or is it more, like, cloud version?
71 00:11:52.190 ⇒ 00:11:59.319 Ashwini Sharma: No, we’re in Databricks, right? We’re in Databricks, so it runs on a Spark cluster.
72 00:12:00.630 ⇒ 00:12:05.659 Awaish Kumar: Understood, yeah. I thought maybe if the company is more cost-conscious, you know, maybe running.
73 00:12:06.170 ⇒ 00:12:09.380 Ashwini Sharma: No, no, no, we’re not doing that, so…
74 00:12:09.550 ⇒ 00:12:19.910 Ashwini Sharma: everything is within Databricks itself, right? The dbt models, they also run within Databricks, and ingestion also runs on Databricks. Only the visualization happens outside.
75 00:12:21.080 ⇒ 00:12:26.220 Awaish Kumar: Okay, and, like, the Fivetran, like, you mentioned you are using Fivetran, or not?
76 00:12:26.220 ⇒ 00:12:28.120 Ashwini Sharma: I’ve worked in Fivetran.
77 00:12:28.930 ⇒ 00:12:38.659 Awaish Kumar: Okay, so here, everything is in Databricks ecosystem, and you run the Python scripts application there, dbt, and everything.
78 00:12:38.660 ⇒ 00:12:40.640 Ashwini Sharma: Yes. At Entra, yeah.
79 00:12:41.490 ⇒ 00:12:42.340 Awaish Kumar: Okay.
80 00:12:44.130 ⇒ 00:12:49.750 Awaish Kumar: Okay, how would you rate yourself, in Python and SQL?
81 00:12:50.820 ⇒ 00:12:53.110 Ashwini Sharma: In SQL, I would say I’m…
82 00:12:53.250 ⇒ 00:12:58.670 Ashwini Sharma: I’m good. I would rate myself maybe more than… 4 out of 5?
83 00:12:58.910 ⇒ 00:13:05.779 Ashwini Sharma: In, in Python, I would rate myself, like a 3 kind of thing, right?
84 00:13:06.020 ⇒ 00:13:23.409 Ashwini Sharma: Like, I know how coding works. I have coded for a very long period of time, right? But yeah, now and then, I still have to look into syntax and stuff like that, right? But a lot of, you know, coding work is done by AI these days, right? I leverage it a lot in order to write my…
85 00:13:23.790 ⇒ 00:13:25.150 Ashwini Sharma: So…
86 00:13:25.150 ⇒ 00:13:25.470 Awaish Kumar: Yeah.
87 00:13:25.470 ⇒ 00:13:25.890 Ashwini Sharma: Yeah.
88 00:13:25.890 ⇒ 00:13:35.969 Awaish Kumar: Like, my question was more like, not just understanding, like, not about the syntax, it’s more like understanding the core concepts of it.
89 00:13:35.970 ⇒ 00:13:36.419 Ashwini Sharma: Oh, yeah.
90 00:13:36.420 ⇒ 00:13:36.980 Awaish Kumar: Sweet.
91 00:13:37.160 ⇒ 00:13:47.240 Awaish Kumar: Like, this is how… For example, like, how you structure your code, like, the… modularity.
92 00:13:47.240 ⇒ 00:13:48.429 Ashwini Sharma: Right? Do you use it?
93 00:13:48.600 ⇒ 00:13:49.410 Awaish Kumar: Yeah.
94 00:13:49.410 ⇒ 00:13:50.530 Ashwini Sharma: Yeah.
95 00:13:50.730 ⇒ 00:13:57.370 Ashwini Sharma: So, yeah, yeah, yeah, I mean, like, I would say 4 in that case, for Python also. SQL is more than 4.
96 00:13:58.610 ⇒ 00:14:06.919 Awaish Kumar: Okay if… like… For example, like, if,
97 00:14:08.700 ⇒ 00:14:12.899 Awaish Kumar: In terms of data warehouses, what are the data warehouses you’ve worked on?
98 00:14:13.220 ⇒ 00:14:18.149 Ashwini Sharma: I’ve worked, on Databricks, Snowflake, right?
99 00:14:18.290 ⇒ 00:14:21.120 Ashwini Sharma: I’ve also worked on BigQuery.
100 00:14:21.630 ⇒ 00:14:23.860 Ashwini Sharma: I’ve worked partially on Redshift.
101 00:14:24.270 ⇒ 00:14:32.200 Ashwini Sharma: That’s, that’s all. I’ve not… yeah, I’ve explored a little bit of, you know, ClickHouse.
102 00:14:32.380 ⇒ 00:14:39.530 Ashwini Sharma: ran some queries on top of that, but yeah, that’s all, like, not… Not more than that.
103 00:14:40.550 ⇒ 00:14:50.179 Awaish Kumar: So, have you worked on any data, like, data pipeline where you have optimized the existing transformations…
104 00:14:50.540 ⇒ 00:14:52.439 Awaish Kumar: Data warehouse architecture.
105 00:14:52.750 ⇒ 00:15:01.119 Awaish Kumar: And stuff like that, to, like, to optimize the cost, or to optimize the, the processing speed of the data pipeline.
106 00:15:02.390 ⇒ 00:15:18.079 Ashwini Sharma: Yeah, so basically, like, initially when I started with this, Entra, right, all the data was ingested, and, you know, the transformation was running on all that ingested data, and over a period of time, as we ingested more and more data, the
107 00:15:18.270 ⇒ 00:15:25.579 Ashwini Sharma: transformations started to become slower, so I adopted a partitioning strategy where, you know, all the
108 00:15:26.430 ⇒ 00:15:31.040 Ashwini Sharma: Data became partitioned, based on day, sorry, year, month, and day.
109 00:15:31.060 ⇒ 00:15:47.419 Ashwini Sharma: As well as, some of the models which were really complicated and had a huge amount of data, that got converted, I converted them into incremental models, so that we process on very small amounts of data. So, yeah, that’s… that’s how I’ve improved, some of the
110 00:15:47.790 ⇒ 00:15:48.890 Ashwini Sharma: Pipelines.
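The incremental approach described above, where each run touches only data newer than what was already processed, can be sketched in plain Python. This is an illustration only: the row layout, the `day` field, and the watermark handling are assumptions for the example, and the actual models discussed here were dbt models on Databricks.

```python
from datetime import date

def incremental_batch(rows, last_processed_day):
    """Keep only rows from day partitions newer than the watermark
    (hypothetical helper; the watermark is the last day already processed)."""
    return [r for r in rows if r["day"] > last_processed_day]

# Made-up rows, each tagged with its day partition.
rows = [
    {"day": date(2025, 11, 10), "amount": 5},
    {"day": date(2025, 11, 11), "amount": 7},
    {"day": date(2025, 11, 12), "amount": 9},
    {"day": date(2025, 11, 13), "amount": 4},
]

# Only the partitions after 2025-11-11 are transformed on this run.
new_rows = incremental_batch(rows, date(2025, 11, 11))
new_watermark = max(r["day"] for r in new_rows)
print(len(new_rows), new_watermark)
```

Partitioning by date helps the same filter at the storage level, since an engine that prunes partitions can skip the older days without reading them.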
111 00:15:50.730 ⇒ 00:15:52.380 Awaish Kumar: Okay, is that, like…
112 00:15:54.690 ⇒ 00:16:02.159 Awaish Kumar: like, I’m… not just Entra, like, it was an overall question, like, if any of you… if you have any experience working on any kind of…
113 00:16:03.040 ⇒ 00:16:08.670 Awaish Kumar: Project where you have optimized the pipelines and… As code, and…
114 00:16:08.990 ⇒ 00:16:14.819 Awaish Kumar: Like, basically everything, from… you… you got something, that’s legacy.
115 00:16:15.010 ⇒ 00:16:16.939 Awaish Kumar: Then you have to optimize.
116 00:16:18.750 ⇒ 00:16:30.090 Ashwini Sharma: No, so I have not got, like, at Entra, it was all development, right? So, any code that I modified was my code only, which was not optimal at the point where I wrote the code.
117 00:16:30.090 ⇒ 00:16:39.020 Ashwini Sharma: I have worked on a very short period as a contractor with a UK-based firm, right? And they had dbt and Snowflake.
118 00:16:39.180 ⇒ 00:16:43.609 Ashwini Sharma: So, some of the code, optimizations were done there. It was written in a, in a…
119 00:16:43.760 ⇒ 00:16:55.849 Ashwini Sharma: In a wrong way, right? And, wrong way means, proper dimensional modeling was not there. So some of the optimizations, I did over there,
120 00:16:55.850 ⇒ 00:16:56.210 Awaish Kumar: What?
121 00:16:56.210 ⇒ 00:17:00.269 Ashwini Sharma: It was… but that was, yeah, I mean, like, it wasn’t worth mentioning, right?
122 00:17:00.860 ⇒ 00:17:07.190 Awaish Kumar: Okay, but how would you go for it? For, like, let’s have a hypothetical scenario that
123 00:17:07.869 ⇒ 00:17:13.770 Awaish Kumar: If you come into Brainforge, for example, we have a project which has a data pipeline.
124 00:17:13.770 ⇒ 00:17:14.300 Ashwini Sharma: Yeah.
125 00:17:14.690 ⇒ 00:17:18.270 Awaish Kumar: Which is not optimal, which is slow, costly.
126 00:17:18.390 ⇒ 00:17:29.030 Awaish Kumar: And we don’t know where, like, I’m not saying that my code is bad, or my architecture is bad, or my, source systems are bad, like.
127 00:17:29.090 ⇒ 00:17:38.519 Awaish Kumar: I don’t know, like, how would you… like, you have to figure out where the problem is, and then how to solve it. How would you, like, go for it? How would you approach that?
128 00:17:39.910 ⇒ 00:17:57.199 Ashwini Sharma: No, break it down, right? What is the cost involved in ingestion, right? Look at that. What is the cost involved in transformation, right? These are the two main areas where… and then maybe there is some cost involved in analysis and, you know, visualization. So break it down and then find out, like, where is the most,
129 00:17:57.200 ⇒ 00:18:05.699 Ashwini Sharma: expensive queries running, right? I doubt… I mean, like, for the ingestion part, I’m not much bothered, because
130 00:18:05.930 ⇒ 00:18:29.790 Ashwini Sharma: you know, the ingestion, you might have been using a tool like Fivetran or something else, Airbyte Cloud or something like that, right? And doing the ingestion. Maybe, yeah, it is possible that the tool that you have chosen to ingest data, like, for example, Fivetran, is much more expensive than the other options that are there in the market, right? So maybe that is one area that you can look at, right? Okay, instead of using Fivetran, maybe we should use something else.
131 00:18:29.990 ⇒ 00:18:48.320 Ashwini Sharma: But that’s just one aspect, right? Now let’s come… let’s finish with the ingestion. Now we are there with the transformation and visualization, right? Let’s look into the cost involved in each of these, right? Why are the costs high? Is the visualization cost high? Because, you know, the models are really inefficient. Then maybe we should re-look at how the models
132 00:18:48.320 ⇒ 00:18:58.280 Ashwini Sharma: Have been built, and expose a model on the serving layer, which is really efficient for visualization purpose, or analysis purpose, right?
133 00:18:58.280 ⇒ 00:19:10.659 Ashwini Sharma: And the other thing is, in the core transformation layer, where most of the transformation is happening, you might want to re-look at the queries that you are running over there. Maybe there are queries that are written in a really inefficient manner.
134 00:19:11.100 ⇒ 00:19:27.239 Ashwini Sharma: mainly duplicate queries, or things like that, right? That happens a lot, as you’re… I mean, on legacy systems, right, you don’t really think about what you’re doing, and you just keep doing, and ultimately, a lot of tech debt accumulates, which
135 00:19:27.430 ⇒ 00:19:32.890 Ashwini Sharma: results into a lot of unnecessary code, which could have been totally avoided. So…
136 00:19:33.240 ⇒ 00:19:35.880 Ashwini Sharma: So, for example, my… I mean…
137 00:19:36.200 ⇒ 00:19:38.090 Awaish Kumar: Yeah, let’s say you figure out…
138 00:19:38.760 ⇒ 00:19:40.990 Awaish Kumar: There is a problem in the…
139 00:19:41.100 ⇒ 00:19:48.910 Awaish Kumar: and transformations. There’s a problem with database architecture itself. That’s not optimal.
140 00:19:49.290 ⇒ 00:19:55.440 Awaish Kumar: The data… the… Transformations written, the queries written are not optimal.
141 00:19:55.550 ⇒ 00:20:01.099 Awaish Kumar: So, how would you, like… Like, go for it, like…
142 00:20:01.100 ⇒ 00:20:10.810 Ashwini Sharma: Yeah, you’ll have to see the execution plan of the query, right? How’s the, you know, query getting executed, and then see how you can improve it, right?
143 00:20:11.040 ⇒ 00:20:21.309 Ashwini Sharma: A lot of times, like, when, you know, inexperienced people will write the query, it’s really inefficient, so you’ll have to change that. I mean, see the plan, and then decide where you have to make it.
144 00:20:21.610 ⇒ 00:20:25.339 Awaish Kumar: You can just look at the query, and you’ve… You can tell.
145 00:20:25.340 ⇒ 00:20:31.709 Ashwini Sharma: No, you have to look at the execution plan, mainly. Like, looking at the query, if it is very big, you might not be able.
146 00:20:32.630 ⇒ 00:20:36.500 Awaish Kumar: There are a few things, like the best practices, if they are not.
147 00:20:36.500 ⇒ 00:20:37.589 Ashwini Sharma: Oh, yeah, yeah.
148 00:20:38.000 ⇒ 00:20:44.960 Ashwini Sharma: Yeah, yeah, some of them are there, yeah, you can probably look at it and figure it out also. Like, for example, like, some things, right,
149 00:20:45.450 ⇒ 00:20:55.270 Ashwini Sharma: that come to the top of my mind is, like, people do select star, right, which can be totally avoided, right? They do, select distinct, and then, you know.
150 00:20:55.320 ⇒ 00:21:02.339 Ashwini Sharma: give a list of columns. I don’t like that, personally, right? I don’t… because that distinct does a lot of set operations, and…
151 00:21:02.340 ⇒ 00:21:15.930 Ashwini Sharma: instead of distinct, what I do is I do a select group of columns that I want to be distinct, and then do an aggregate function on… at the end, and then do a group by. That’s more… much more efficient than…
152 00:21:15.940 ⇒ 00:21:17.530 Ashwini Sharma: Doing a select distinct.
153 00:21:17.600 ⇒ 00:21:20.870 Ashwini Sharma: It, it returns you the same thing.
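The rewrite described above, replacing SELECT DISTINCT with a GROUP BY plus an aggregate, can be checked for equivalence with a small sketch. It uses Python's built-in sqlite3 module purely for illustration; the `orders` table and its rows are made up, and the example only shows that both forms return the same keys. Whether the GROUP BY form is actually faster depends on the engine, so that is best confirmed from the execution plan on the warehouse in question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (customer TEXT, country TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?, ?)",
    [("a", "IN", 10.0), ("a", "IN", 15.0), ("b", "US", 7.0),
     ("b", "US", 7.0), ("c", "IN", 3.0)],
)

# Form 1: SELECT DISTINCT over the key columns.
distinct_rows = conn.execute(
    "SELECT DISTINCT customer, country FROM orders "
    "ORDER BY customer, country"
).fetchall()

# Form 2: GROUP BY the same columns, with an aggregate at the end.
grouped_rows = conn.execute(
    "SELECT customer, country, COUNT(*) FROM orders "
    "GROUP BY customer, country ORDER BY customer, country"
).fetchall()

# Dropping the aggregate column leaves the same set of keys.
grouped_keys = [(customer, country) for customer, country, _ in grouped_rows]
print(distinct_rows == grouped_keys)
```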
154 00:21:21.160 ⇒ 00:21:27.560 Ashwini Sharma: The other thing is, looking at the kind of joins that are happening, right?
155 00:21:29.460 ⇒ 00:21:34.720 Ashwini Sharma: Basically, yeah, avoiding cross joins, right?
156 00:21:36.180 ⇒ 00:21:37.840 Ashwini Sharma: And what else? .
157 00:21:37.840 ⇒ 00:21:39.020 Awaish Kumar: For example.
158 00:21:39.020 ⇒ 00:21:39.580 Ashwini Sharma: Really funny.
159 00:21:40.060 ⇒ 00:21:43.759 Awaish Kumar: There’s a… there are two tables, which are joining on… on a column.
160 00:21:43.980 ⇒ 00:21:52.820 Awaish Kumar: Right? That’s… that is needed. That… that is a left join, or you can say inner join, and the problems…
161 00:21:53.730 ⇒ 00:21:57.080 Awaish Kumar: There’s a join on one column, and…
162 00:21:57.600 ⇒ 00:22:03.029 Awaish Kumar: That’s required, like… like, I would say I can’t just, remove it.
163 00:22:03.030 ⇒ 00:22:04.840 Ashwini Sharma: You can’t avoid it, yeah. Okay.
164 00:22:05.070 ⇒ 00:22:06.639 Awaish Kumar: But it’s very slow.
165 00:22:06.970 ⇒ 00:22:08.479 Awaish Kumar: What could be the…
166 00:22:11.070 ⇒ 00:22:18.110 Ashwini Sharma: Well, in terms of an OLTP database, maybe that join column should be indexed, right? I would say that.
167 00:22:18.390 ⇒ 00:22:23.459 Ashwini Sharma: In terms of warehouse, I don’t think that that would matter, right?
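Both points made so far, reading the execution plan and indexing the lookup column in an OLTP database, can be seen together in a small sketch. It uses SQLite through Python's sqlite3 module as a stand-in for an OLTP engine; the `customers` table, the index name `idx_customers_name`, and the query are assumptions for the example, and plan wording differs between engines and versions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (name TEXT, region TEXT)")
conn.executemany(
    "INSERT INTO customers VALUES (?, ?)",
    [("asha", "south"), ("ravi", "north"), ("meera", "west")],
)

def plan_details(sql):
    """Return the human-readable detail column of EXPLAIN QUERY PLAN."""
    return [row[3] for row in conn.execute("EXPLAIN QUERY PLAN " + sql)]

query = "SELECT region FROM customers WHERE name = 'asha'"

# Without an index, the lookup on the string column is a full table scan.
plan_before = plan_details(query)

# With a B-tree index on the string column, the plan switches to an index search.
conn.execute("CREATE INDEX idx_customers_name ON customers (name)")
plan_after = plan_details(query)

print(plan_before)
print(plan_after)
```

The same habit applies on a warehouse, although there the plan would typically be read for things like partition pruning and data shuffling rather than index usage.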
168 00:22:25.640 ⇒ 00:22:26.510 Awaish Kumar: Okay.
169 00:22:27.900 ⇒ 00:22:28.770 Awaish Kumar: What?
170 00:22:28.960 ⇒ 00:22:35.279 Awaish Kumar: What, like, indexing technique we can apply on that column, if that column is a string?
171 00:22:36.830 ⇒ 00:22:39.870 Ashwini Sharma: Sorry, what… sorry, I didn’t follow your question.
172 00:22:39.870 ⇒ 00:22:44.379 Awaish Kumar: Data type, what indexing strategy we can apply on that column?
173 00:22:45.600 ⇒ 00:23:00.130 Ashwini Sharma: So, like, you could just create the index in such a way that it distributes the data in a equal fashion, right? So, what that would lead is, like, it will not,
174 00:23:00.330 ⇒ 00:23:02.709 Ashwini Sharma: It’ll ensure that the workload does not
175 00:23:02.990 ⇒ 00:23:06.149 Ashwini Sharma: Overload a single node that is processing it.
176 00:23:06.260 ⇒ 00:23:12.269 Ashwini Sharma: See, all these warehouse queries are distributed computing in the backend, right? So when you execute.
177 00:23:12.950 ⇒ 00:23:19.899 Awaish Kumar: Yeah, yeah, I’m asking, like, we are applying indexing on, for example, it’s Postgres database. We are applying indexing…
178 00:23:20.390 ⇒ 00:23:21.830 Ashwini Sharma: Dinner…
179 00:23:21.850 ⇒ 00:23:27.280 Awaish Kumar: there are a bunch of indexing techniques you can apply, right?
180 00:23:27.430 ⇒ 00:23:28.909 Awaish Kumar: On the varchar column.
181 00:23:29.090 ⇒ 00:23:32.630 Ashwini Sharma: I’m not aware of that,
182 00:23:33.730 ⇒ 00:23:35.620 Ashwini Sharma: No, I’m not aware of that, yeah.
183 00:23:36.760 ⇒ 00:23:40.590 Awaish Kumar: Like, do you know how indexing works?
184 00:23:40.590 ⇒ 00:23:41.740 Ashwini Sharma: Yeah, yeah, I know how…
185 00:23:42.470 ⇒ 00:23:42.920 Awaish Kumar: Okay.
186 00:23:42.920 ⇒ 00:23:51.910 Ashwini Sharma: basically, like, it creates an inverted tree-like structure in the backend, right? Which is called a B+ tree, or a trie structure, maybe, in some…
187 00:23:51.950 ⇒ 00:23:54.200 Awaish Kumar: Documents also, yeah, tree, yeah.
188 00:23:54.200 ⇒ 00:23:56.669 Ashwini Sharma: And normally, like, it’s,
189 00:23:56.850 ⇒ 00:24:06.519 Ashwini Sharma: Basically, you… it’s like searching a binary tree, right? You kind of follow the pointers and then look at the posting list when you identify the terms.
190 00:24:07.430 ⇒ 00:24:08.200 Awaish Kumar: Okay.
191 00:24:08.900 ⇒ 00:24:14.570 Awaish Kumar: We added index, and of that column.
192 00:24:17.530 ⇒ 00:24:26.680 Awaish Kumar: And then, like, the… So then there are, like, two different types of indexes as well. One is called
193 00:24:26.800 ⇒ 00:24:32.130 Awaish Kumar: Clustered index, and one is called… non-clustered index, so…
194 00:24:32.620 ⇒ 00:24:35.290 Awaish Kumar: Can you explain what are the differences?
195 00:24:36.700 ⇒ 00:24:40.789 Ashwini Sharma: Clustered index and non-clustered, that’s a DBA concept, right? Okay.
196 00:24:41.920 ⇒ 00:24:49.510 Ashwini Sharma: What’s a cluster? I do not know what’s a cluster index. I’ve read it somewhere, but yeah, I never had to utilize that thing.
197 00:24:50.020 ⇒ 00:24:54.400 Ashwini Sharma: Okay. But are these applicable on the warehouses? No, right?
198 00:24:56.570 ⇒ 00:24:59.889 Awaish Kumar: It depends, like, there are different…
199 00:25:00.370 ⇒ 00:25:02.809 Awaish Kumar: Like, if you are using Redshift.
200 00:25:02.950 ⇒ 00:25:03.910 Ashwini Sharma: Then…
201 00:25:03.910 ⇒ 00:25:10.359 Awaish Kumar: The… have similar concepts, like dist keys, they call it.
202 00:25:11.320 ⇒ 00:25:25.550 Awaish Kumar: the strategy is called using dist key. That is, like, a key which basically distributes the data uniquely across different nodes. That’s, like, indexing, but it’s not called indexing in Redshift.
203 00:25:25.870 ⇒ 00:25:26.690 Ashwini Sharma: Okay.
204 00:25:29.020 ⇒ 00:25:30.440 Ashwini Sharma: Cool, yeah.
205 00:25:30.440 ⇒ 00:25:38.079 Awaish Kumar: Okay, so, yeah, I think, yeah, we are almost on the time, so I would leave this time to…
206 00:25:38.200 ⇒ 00:25:41.670 Awaish Kumar: I’ll let you, I’ll let you ask questions.
207 00:25:41.930 ⇒ 00:25:44.149 Awaish Kumar: Like, about Brainforge, or about anything?
208 00:25:45.260 ⇒ 00:25:52.549 Ashwini Sharma: Oh, okay. So, yeah, I mean, like, you know, what does a typical workflow look like, I mean…
209 00:25:52.780 ⇒ 00:25:54.330 Ashwini Sharma: Typical day.
210 00:25:54.590 ⇒ 00:25:59.820 Ashwini Sharma: for an engineer, a consultant who is working on… at Brainforge.
211 00:26:00.720 ⇒ 00:26:04.450 Awaish Kumar: Typical days, like, we have stand-ups.
212 00:26:04.660 ⇒ 00:26:08.010 Awaish Kumar: Probably at the beginning of your day.
213 00:26:09.340 ⇒ 00:26:11.730 Awaish Kumar: Or the, like, first half of your…
214 00:26:12.290 ⇒ 00:26:16.270 Awaish Kumar: day, like, according to Eastern Time Zone, mostly you have a few meetings.
215 00:26:16.500 ⇒ 00:26:20.370 Awaish Kumar: Stand-ups, maybe planning, grooming, or things like that.
216 00:26:20.520 ⇒ 00:26:24.889 Awaish Kumar: Where you plan for your project with your team.
217 00:26:25.920 ⇒ 00:26:28.600 Awaish Kumar: That includes, like, project manager.
218 00:26:29.460 ⇒ 00:26:40.649 Awaish Kumar: that maybe include, like, some… like, someone like, Utam or Robert, like, as a strategic, person, like, account manager for that project.
219 00:26:40.820 ⇒ 00:26:45.130 Awaish Kumar: And then… once…
220 00:26:46.190 ⇒ 00:27:00.599 Awaish Kumar: That is off, like, you basically can just go and work on your tickets, which are assigned… we use Linear for that, so there are tickets in Linear, you can go and work on that, and obviously, we have Slack for communication, you are… you will be included in the
221 00:27:00.940 ⇒ 00:27:08.470 Awaish Kumar: channels, you… client channels where you are working, and you can ask questions, you can ask… we have some general, like,
222 00:27:08.500 ⇒ 00:27:27.829 Awaish Kumar: team, specific channels, like data team, and for AI team, and things like that. So, you can ask questions if… if you need help, or something, or something like that. And… but the main… also, like, you have to do some context switching in a day, maybe, because we are working… like, one engineer might be working on multiple clients, and that’s also…
223 00:27:29.300 ⇒ 00:27:39.019 Awaish Kumar: Because, like, the clients we get are… are… sometimes, like, doesn’t need a full-time analytics engineer or a data engineer, they need, like.
224 00:27:39.290 ⇒ 00:27:48.689 Awaish Kumar: like, 20 hours per week from a data engineer, and 20 hours per week from a analyst. So we distribute it, like, the needs of…
225 00:27:49.160 ⇒ 00:27:50.060 Awaish Kumar: So you will…
226 00:27:50.560 ⇒ 00:27:59.009 Awaish Kumar: Maybe if you are full-time here, well, 20 hours… for the 20 hours, you will be working on one client, maybe the other 20 hours will be divided
227 00:27:59.520 ⇒ 00:28:01.120 Awaish Kumar: Some other projects.
228 00:28:01.120 ⇒ 00:28:03.110 Ashwini Sharma: Other clients, okay.
229 00:28:03.110 ⇒ 00:28:04.390 Awaish Kumar: project, whatever.
230 00:28:04.720 ⇒ 00:28:08.180 Ashwini Sharma: And you’ve been here for almost, like, over a year now, right?
231 00:28:09.090 ⇒ 00:28:13.219 Ashwini Sharma: Okay, and what has the experience been, like.
232 00:28:14.040 ⇒ 00:28:18.079 Awaish Kumar: Yeah, my experience is… good, like,
233 00:28:18.630 ⇒ 00:28:20.990 Awaish Kumar: What I would say is that, like.
234 00:28:22.420 ⇒ 00:28:27.120 Awaish Kumar: It’s a startup, obviously, you have to… it is fast-paced.
235 00:28:27.240 ⇒ 00:28:27.840 Awaish Kumar: To round?
236 00:28:27.840 ⇒ 00:28:28.789 Ashwini Sharma: Exactly, yeah, yeah.
237 00:28:28.790 ⇒ 00:28:29.380 Awaish Kumar: Oh.
238 00:28:29.550 ⇒ 00:28:32.870 Awaish Kumar: It’s a fast-paced environment, you have to work really…
239 00:28:33.050 ⇒ 00:28:37.789 Awaish Kumar: Fast and hard, and if you are… like, there’s no…
240 00:28:38.320 ⇒ 00:28:45.390 Awaish Kumar: People are collaborative, you can ask the questions, and come, like, bring your solution to the finish line.
241 00:28:45.440 ⇒ 00:29:02.089 Awaish Kumar: But it’s a startup environment, and a lot of context switching. As I mentioned, you might… if you are working on multiple clients, some clients, maybe you’re working on one client, but you are getting questions for another client on some… something else.
242 00:29:02.150 ⇒ 00:29:11.749 Awaish Kumar: You have to switch your mind between answering them and then working on some other stuff and things like that. And then maybe helping team members, like, maybe I asked something…
243 00:29:11.820 ⇒ 00:29:16.360 Awaish Kumar: Yeah. Which is entirely different, yeah, like, so that, that can happen.
244 00:29:16.660 ⇒ 00:29:23.090 Ashwini Sharma: Got it, got it. Yeah, yeah, I mean, like, any startup is like that only. Fivetran was like that when we started here.
245 00:29:23.280 ⇒ 00:29:25.090 Ashwini Sharma: So, cool.
246 00:29:25.090 ⇒ 00:29:25.710 Awaish Kumar: That’s…
247 00:29:26.070 ⇒ 00:29:28.519 Ashwini Sharma: Yeah, I don’t have any more questions.
248 00:29:29.450 ⇒ 00:29:31.460 Awaish Kumar: Okay, thank you.
249 00:29:31.460 ⇒ 00:29:32.110 Ashwini Sharma: Yeah.
250 00:29:32.110 ⇒ 00:29:35.269 Awaish Kumar: I’ll go back on some of my feedback, and then Rico
251 00:29:36.000 ⇒ 00:29:38.240 Awaish Kumar: Come back with the next steps.
252 00:29:38.240 ⇒ 00:29:40.780 Ashwini Sharma: Sure, alright. Thanks. Thanks, Awaish.