Meeting Title: Brainforge Interview w- Demilade Date: 2026-02-25 Meeting participants: Kaela Gallagher, Anish Gupta, Demilade Agboola
WEBVTT
1 00:00:14.750 ⇒ 00:00:15.740 Anish Gupta: Hello.
2 00:00:20.660 ⇒ 00:00:32.509 Kaela Gallagher: Hi, Anish. It’s Kaela. I am kind of on the recruiting side of the house, so I’m not going to be interviewing you today, just sitting in and listening.
3 00:00:32.790 ⇒ 00:00:34.220 Anish Gupta: Sure, sure, sounds good. Okay.
4 00:00:38.900 ⇒ 00:00:42.849 Demilade Agboola: Hi, Anish. Hey, hi. My name is Demilade.
5 00:00:42.990 ⇒ 00:00:44.719 Demilade Agboola: Nice to meet you.
6 00:00:45.010 ⇒ 00:00:46.100 Anish Gupta: Nice to meet you too.
7 00:00:46.890 ⇒ 00:00:49.899 Demilade Agboola: So, I know you’ve had one interview already.
8 00:00:49.970 ⇒ 00:00:55.689 Demilade Agboola: And so the context of this interview will just be, like, us talking through…
9 00:00:55.980 ⇒ 00:00:59.800 Demilade Agboola: Like, technical things, and just kind of having an understanding of how you…
10 00:01:00.740 ⇒ 00:01:06.830 Demilade Agboola: problem-solve, and how you think about these technical things as we walk through the interview.
11 00:01:07.010 ⇒ 00:01:08.390 Anish Gupta: Okay, yeah, sounds good.
12 00:01:08.780 ⇒ 00:01:19.819 Demilade Agboola: I’m aware you might have some questions about Brainforge or anything, so towards the end, I will also give you room to ask any questions, or, you know, anything that you might want to ask about…
13 00:01:20.170 ⇒ 00:01:22.329 Demilade Agboola: Brainforge and just, like, the company.
14 00:01:22.660 ⇒ 00:01:24.080 Anish Gupta: Sure, sounds good, yeah, yeah.
15 00:01:24.690 ⇒ 00:01:33.539 Demilade Agboola: Okay, I think we can start off with the basics, which would just be, like, please, can you tell me about yourself and, like, your work experience and technical experience?
16 00:01:33.900 ⇒ 00:01:40.399 Anish Gupta: Yeah, yeah. So, my name’s Anish. I’m a software engineer currently at Juniper Networks. I’ve been there for about…
17 00:01:40.680 ⇒ 00:01:46.019 Anish Gupta: 2 years now, or a year and a half now. I graduated from UC Berkeley before that.
18 00:01:46.180 ⇒ 00:02:04.820 Anish Gupta: with a data science and chemical engineering double major. So, I’ve been working at Juniper as an AI/ML engineer, a lot more on the AI side than ML, but I’ve been working on our main product, which is, like,
19 00:02:04.860 ⇒ 00:02:23.720 Anish Gupta: an agent to help our users, our customers, to be able to track and predict network traffic, because Juniper’s working as a routing-focused company, working with Wi-Fi routers and modems, things like that. So, this product is so we can automatically track network traffic and predict errors before they occur, and then
20 00:02:23.950 ⇒ 00:02:33.099 Anish Gupta: start to allocate and divert network traffic as needed to try to prevent network downtimes and improve overall customer experience, so…
21 00:02:33.170 ⇒ 00:02:48.999 Anish Gupta: Yeah, most of my work recently on that end has been on some model development. I’ve been working a lot with managing our current ETL pipelines that we have set up for this system, because we’re transferring a lot of data at once, obviously, so working with a lot of
22 00:02:49.000 ⇒ 00:02:53.209 Anish Gupta: Airflow, working with a lot of Kafka topics as well, so…
23 00:02:53.210 ⇒ 00:03:03.459 Anish Gupta: Yeah, it’s been a pretty extensive and pretty varied work experience so far. Pretty customer-facing as well, since we’re working with our customers that are using these routers.
24 00:03:03.650 ⇒ 00:03:08.609 Anish Gupta: So, you know, like, getting a lot of experience working with clients and client meetings and things like that. So, yeah.
25 00:03:09.270 ⇒ 00:03:10.860 Demilade Agboola: Okay, sounds good, sounds good.
26 00:03:13.470 ⇒ 00:03:26.710 Demilade Agboola: So, given your experience, what would you say the most complex part, or the most complex data pipeline you’ve worked on was, and what made it complex, and how did you, like, handle and navigate the complexity?
27 00:03:27.160 ⇒ 00:03:45.790 Anish Gupta: Yeah, yeah, sure. So, I think the current project I’m working on is probably the most complex type of pipeline that I’ve been working with. So, since we’re using these network devices to be able to track all these different types of data, right, since it’s not just data, like,
28 00:03:45.940 ⇒ 00:03:56.549 Anish Gupta: you know, like, information received here and information outputted there. We’re receiving a lot of status of individual devices, overall device summaries,
29 00:03:56.650 ⇒ 00:04:16.039 Anish Gupta: constant time-series-dependent logs. So, a lot of my work recently has been actually using these individual data pieces and data types to try to create our actual… we call them entity types, but they’re different error categories for this error tracking and error prediction. So,
30 00:04:16.070 ⇒ 00:04:31.369 Anish Gupta: for example, if we see something like a high temperature reading, or a group of routers in a similar geographical location giving higher CPU outputs than normal for that type of device, we need to automatically translate that as
31 00:04:31.370 ⇒ 00:04:48.710 Anish Gupta: a potential high CPU error, and then also try to correlate other errors that are given at the same time from that device to try to predict what the overall problem is. So, not just on the individual device side, but on our overall, like, grand system, it’s been a lot of
32 00:04:48.820 ⇒ 00:05:08.130 Anish Gupta: trying to predict, trying to correlate individual error types into one specific recommendation. So that’s been what I’ve been working on so far, and it’s a lot of multi-threaded work as well, because this has to scale up very well for large amounts of devices and large amounts of networks. So the scalability definitely has been the most challenging aspect there.
33 00:05:09.240 ⇒ 00:05:14.720 Demilade Agboola: And how do you handle the scalability? What sort of architecture do you use?
34 00:05:15.030 ⇒ 00:05:17.629 Demilade Agboola: And how do you just handle that sort of volume?
35 00:05:18.040 ⇒ 00:05:31.510 Anish Gupta: Yeah, yeah, for the scalability, we’ve been working on trying to develop, like, a multi-container approach. Since we’re using these individual Airflow
36 00:05:31.530 ⇒ 00:05:32.860 Anish Gupta: pipelines
37 00:05:32.860 ⇒ 00:05:51.619 Anish Gupta: and these individual topics, these Kafka topics, to stream our data. We’re trying to run a multi-server approach and, like, horizontally scaling our system, not just vertically scaling our system, so developing multiple servers running simultaneously, working on implementing load balancers for our system.
38 00:05:51.680 ⇒ 00:05:59.399 Anish Gupta: We’ve been using a lot of Redis caches as well for that work, so when we get more frequent errors, like temperature readings or CPU
39 00:05:59.400 ⇒ 00:06:16.610 Anish Gupta: issues, we can cache those more commonly read errors and more commonly stored error types, and then we can pull them up much faster and implement that sort of faster caching approach. So that’s been a very interesting field that we’ve been working on more recently, a more interesting technology we’ve been working with recently.
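[Editor’s note: the caching idea described above can be sketched as a cache-aside lookup with expiry. This is only an illustration, not the system discussed in the interview. `TTLCache` is a hypothetical in-memory stand-in for a Redis cache, and `lookup_error_type` and `slow_lookup` are assumed names.]

```python
import time
from typing import Callable, Dict, Optional, Tuple


class TTLCache:
    """Hypothetical in-memory stand-in for a Redis cache with per-key expiry."""

    def __init__(self, ttl_seconds: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.ttl_seconds = ttl_seconds
        self.clock = clock
        self._store: Dict[str, Tuple[float, str]] = {}

    def get(self, key: str) -> Optional[str]:
        entry = self._store.get(key)
        if entry is None:
            return None
        expires_at, value = entry
        if self.clock() >= expires_at:
            # Entry is stale: drop it so the caller falls back to the source.
            del self._store[key]
            return None
        return value

    def set(self, key: str, value: str) -> None:
        self._store[key] = (self.clock() + self.ttl_seconds, value)


def lookup_error_type(code: str, cache: TTLCache,
                      slow_lookup: Callable[[str], str]) -> str:
    """Cache-aside: try the cache first, hit the slow source only on a miss."""
    cached = cache.get(code)
    if cached is not None:
        return cached
    value = slow_lookup(code)
    cache.set(code, value)
    return value
```

[With this shape, frequently requested error types such as a high CPU reading are served from memory until their entry expires, and only misses reach the slower store.]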
40 00:06:17.360 ⇒ 00:06:18.819 Demilade Agboola: Okay, that’s fair, that’s fair.
41 00:06:19.000 ⇒ 00:06:31.329 Demilade Agboola: What would you… what would you… I mean, I know it tends to be more extensive than what you can always list, but I’m just curious what your… what you would consider your tech stack to be, so what tools, and what, like…
42 00:06:31.600 ⇒ 00:06:35.259 Demilade Agboola: technologies would you say you have on lock?
43 00:06:35.610 ⇒ 00:06:39.420 Anish Gupta: Yeah, so I think for me, like, my… like, my, my…
44 00:06:39.420 ⇒ 00:07:02.800 Anish Gupta: My main language of experience is Python. It’s what I’ve been using for most of my coding and most of my engineering career, but I’ve been more recently working a lot with Go, since our network devices and our network tracking have been working with Golang, so I’ve been using Go there. Java has always been a more developed language I’ve been working with. And then, on the more tech stack side,
45 00:07:02.900 ⇒ 00:07:17.650 Anish Gupta: been using a lot of Docker containers. S3 is always, like, recommended, and obviously something I’ve been working with a lot. We’re using S3 to store all of our individual data files. Airflow, like I mentioned before, and Kafka topics as well.
46 00:07:18.050 ⇒ 00:07:24.400 Anish Gupta: Also, we’ve been using a lot of PyTorch for our model prediction and model development.
47 00:07:24.620 ⇒ 00:07:41.919 Anish Gupta: So, a lot of, like, PyTorch, PySpark, that kind of implementation there. Since with these data types, it’s a lot of data storage, so we’re using PySpark instead of, like, Pandas for our data manipulation, just for faster and larger scalability.
48 00:07:41.970 ⇒ 00:07:45.859 Anish Gupta: Additionally, I’ve been working with a lot of
49 00:07:46.380 ⇒ 00:07:53.419 Anish Gupta: different… sorry, a lot of different… I mentioned Docker already, right? What’d I miss.
50 00:07:53.650 ⇒ 00:08:13.160 Anish Gupta: Yeah, I’ve been doing a lot of model work with, like, libraries such as, you know, Seaborn, Pandas, NumPy, like, all these more commonly used Python libraries, TensorFlow, obviously, for machine learning as well. So, yeah, and gotten some work on, like, some more popular NLP libraries as well, not just for…
51 00:08:13.220 ⇒ 00:08:23.289 Anish Gupta: our normal natural language processing, but since we want to convert these error types to be more user-friendly and more user-usable, I’ve been using a lot of,
52 00:08:23.770 ⇒ 00:08:27.420 Anish Gupta: like, LangChain for that AI agent work, and…
53 00:08:27.670 ⇒ 00:08:39.830 Anish Gupta: Yeah, LangChain’s been, like, my workflow for agent work there, but we’ve been using that more recently for our customer-facing error modeling and error detection as well.
54 00:08:40.690 ⇒ 00:08:42.950 Demilade Agboola: That’s fair, that’s a pretty good
55 00:08:44.600 ⇒ 00:08:51.289 Demilade Agboola: set of tools to know and have. Just one question, because I noticed you didn’t mention it, I just wanted to be sure if
56 00:08:51.700 ⇒ 00:08:55.439 Demilade Agboola: [unclear] if it was just… do you write SQL?
57 00:08:56.050 ⇒ 00:08:58.650 Anish Gupta: Oh, yeah, yeah, yeah, oh, I forgot to say SQL, yeah.
58 00:08:58.650 ⇒ 00:09:00.200 Demilade Agboola: Yeah, just wanted to be sure, because that…
59 00:09:00.200 ⇒ 00:09:05.210 Anish Gupta: Yeah, for sure, for sure, for sure. Yeah. SQL, that’s good.
60 00:09:06.830 ⇒ 00:09:09.279 Demilade Agboola: Alright, so let’s… let’s come up with,
61 00:09:11.410 ⇒ 00:09:21.210 Demilade Agboola: we’re going to start, like, system design, so it’s just kind of, we’re going to think through a system. Just kind of want to have this… this would be kind of the kind of thing you would come across.
62 00:09:21.880 ⇒ 00:09:22.520 Anish Gupta: Sure.
63 00:09:23.160 ⇒ 00:09:30.189 Demilade Agboola: So the idea is I just want to hear and see how you would walk through solving this, system design.
64 00:09:30.190 ⇒ 00:09:31.999 Anish Gupta: Okay, that sounds good.
65 00:09:32.650 ⇒ 00:09:36.150 Demilade Agboola: Alright, so let’s say we have a client who has a…
66 00:09:36.550 ⇒ 00:09:39.389 Demilade Agboola: daily revenue mart that they want to get built out.
67 00:09:39.530 ⇒ 00:09:42.820 Demilade Agboola: Right? So they want to see their revenue every day.
68 00:09:43.780 ⇒ 00:09:48.830 Demilade Agboola: And they have 3 main data sources. So they have Salesforce, Yeah, Stripe.
69 00:09:49.120 ⇒ 00:09:54.090 Demilade Agboola: If you don’t know any of this or what’s going on there, you can always ask clarifying questions.
70 00:09:54.090 ⇒ 00:09:54.930 Anish Gupta: Yeah, that’s cool.
71 00:09:54.930 ⇒ 00:09:55.530 Demilade Agboola: with us.
72 00:09:55.550 ⇒ 00:09:58.749 Anish Gupta: So we have Salesforce, they have Stripe, and they have Google Ads.
73 00:09:59.560 ⇒ 00:10:01.469 Demilade Agboola: And so they want to gather into a…
74 00:10:02.260 ⇒ 00:10:06.769 Demilade Agboola: a cloud warehouse, any one of your choice, we’re not restricting it here.
75 00:10:08.450 ⇒ 00:10:15.899 Demilade Agboola: Off the top of your head, and you can take some seconds to think about it, doesn’t have to be, like, instantaneous, how would you design this solution?
76 00:10:17.340 ⇒ 00:10:34.539 Anish Gupta: Yeah, so I just want to kind of clarify our question here. So we want to be able to gather up our data from these individual services and store them in our data warehouse just for employee lookup, or is it for user… so, like, users can look at their information through a dashboard? Like, what is our main, end use case here?
77 00:10:35.270 ⇒ 00:10:43.679 Demilade Agboola: Great question. These are the kind of things we want people to ask. Okay, so the end goal is we want to have our C-suite, so, like, the CEOs and the…
78 00:10:43.960 ⇒ 00:10:47.969 Demilade Agboola: CFOs, they need a dashboard to be looking at every day.
79 00:10:47.970 ⇒ 00:10:50.010 Anish Gupta: Cool. Okay. Sounds good. Okay.
80 00:10:50.130 ⇒ 00:11:09.479 Anish Gupta: So, that would most likely affect how we’re gonna actually eventually store our data, so we can make it easier to translate into a dashboard, or just a more visually understandable output there. So, yeah. And on the client side, so in this case, our client is our C-suite here, is that right?
81 00:11:09.480 ⇒ 00:11:10.960 Demilade Agboola: Yeah, the stakeholders, yeah.
82 00:11:10.960 ⇒ 00:11:12.730 Anish Gupta: Stakeholders, yeah, okay, cool.
83 00:11:13.630 ⇒ 00:11:21.760 Anish Gupta: Cool, cool. And, for our main data sources, so these are… these are gathering just customer…
84 00:11:22.030 ⇒ 00:11:33.110 Anish Gupta: customer results, like, Stripe for our payments, Salesforce, I guess, for our individual services that the customers are going to be paying for, I believe? Am I understanding that correctly?
85 00:11:33.250 ⇒ 00:11:34.560 Anish Gupta: Would that be what Salesforce is storing?
86 00:11:34.560 ⇒ 00:11:39.779 Demilade Agboola: We also have, like, information around, like, the accounts, the team members within the accounts.
87 00:11:39.780 ⇒ 00:11:40.319 Anish Gupta: Yeah, yeah.
88 00:11:40.320 ⇒ 00:11:47.229 Demilade Agboola: different stages that they have to go through to, for instance, pay, or like… So you can kind of see if you have a flow, for instance.
89 00:11:49.000 ⇒ 00:11:56.439 Demilade Agboola: where the people are dropping off, and where people finally make the payments. You can also use that if you want to model, like, other stuff around that.
90 00:11:56.820 ⇒ 00:12:04.610 Anish Gupta: Okay, cool, cool, that makes sense. So yeah, I think… so, one thing I would start off with is just…
91 00:12:06.340 ⇒ 00:12:24.619 Anish Gupta: kind of trying to figure out where our individual users will be kind of grouped in, so we can try to quantize our main user groups here. So, our customers that this group will be looking at, that the clients want to analyze, would most likely be grouped as
92 00:12:26.240 ⇒ 00:12:45.709 Anish Gupta: like, people that are going to be buying our individual separate services and their main account storages, but do we know if there is, like, there are more details on specific user groups or buckets that we can group our users into? Or the users that the customer might be interested in, or is that just up to our individual analysis?
93 00:12:45.710 ⇒ 00:12:50.079 Demilade Agboola: So let’s not get too deep into the data of it.
94 00:12:50.970 ⇒ 00:12:54.249 Demilade Agboola: Let’s think more of the system of it.
95 00:12:54.430 ⇒ 00:12:55.270 Anish Gupta: Okay, okay.
96 00:12:55.520 ⇒ 00:12:57.750 Demilade Agboola: Don’t focus on, like, the data.
97 00:12:57.750 ⇒ 00:12:58.400 Anish Gupta: Yeah, yeah.
98 00:12:58.630 ⇒ 00:13:01.399 Demilade Agboola: Just focus on the system.
99 00:13:01.550 ⇒ 00:13:06.780 Demilade Agboola: How would you go about designing and building that system? What are the parameters you’d be looking out for?
100 00:13:07.050 ⇒ 00:13:09.240 Demilade Agboola: And how would you ensure that
101 00:13:10.330 ⇒ 00:13:15.870 Demilade Agboola: whatever system you’re building will meet the needs of the stakeholders.
102 00:13:17.180 ⇒ 00:13:23.749 Demilade Agboola: So that’s just basically… so let’s not go too granular, just kind of stay high level, but we’re talking about the systems that you’ll be…
103 00:13:24.980 ⇒ 00:13:39.480 Anish Gupta: Great, okay, okay, that makes sense then. Okay, so we can maybe start to look at our main requirements that we’re going to be looking for here, which seems to be that we want to gather up all our data into a visual dashboard.
104 00:13:39.480 ⇒ 00:13:56.509 Anish Gupta: Most likely split it up into different user groups and different user types, per individual service. So, like, if you’re looking at Salesforce, you want to see specific types of accounts. If you’re looking at Stripe, you want to see specific payment options and payment plans that the users might have set up, things like that. So, different user grouping.
105 00:13:56.510 ⇒ 00:14:05.290 Anish Gupta: And we want to make sure that this is a pretty low latency system, I’m assuming, because our clients will want to be able to quickly look up information and make sure it’s readily available, so…
106 00:14:05.470 ⇒ 00:14:08.440 Anish Gupta: just keeping that in mind.
107 00:14:08.610 ⇒ 00:14:11.490 Demilade Agboola: When you say low latency, are you referring to
108 00:14:11.870 ⇒ 00:14:16.789 Demilade Agboola: data latency, or are you referring to… the dashboard latency?
109 00:14:17.680 ⇒ 00:14:21.880 Anish Gupta: Oh, I mean, I mean database latency, right? Because as…
110 00:14:21.990 ⇒ 00:14:33.200 Anish Gupta: as a customer gets updated on a payment plan, we want to be able to quickly reflect that into our database so that we can see it on our end, relatively quickly. But…
111 00:14:33.600 ⇒ 00:14:38.689 Anish Gupta: Speaking of the low latencies, on the latency side, I believe what most likely would be more important
112 00:14:38.800 ⇒ 00:14:48.489 Anish Gupta: is… we want to make sure that we have more relevant information, more so than, I guess, more quickly gathered information, so consistency might be more important there, actually, than latency, now that I’m thinking about it.
113 00:14:48.770 ⇒ 00:14:50.060 Anish Gupta: Because we want to have…
114 00:14:50.400 ⇒ 00:15:04.220 Anish Gupta: Does that make sense? I think I’m thinking on the client side, we want to be able to gather not just necessarily the most up-to-date… not just… sorry… not gathering up the information as fast as possible, but gathering the most relevant and most up-to-date information.
115 00:15:04.740 ⇒ 00:15:15.070 Anish Gupta: Because if we get a big client, we want to see how that would impact the overall portfolio, the overall performance of our products, things like that. Does that make sense?
116 00:15:15.070 ⇒ 00:15:16.870 Demilade Agboola: Yeah. Yeah, it does. Yeah, it does.
117 00:15:16.870 ⇒ 00:15:23.410 Anish Gupta: Okay, cool. So, yeah, starting from there, I think my next step would be that
118 00:15:23.800 ⇒ 00:15:34.959 Anish Gupta: we want to be able to connect our client to these individual services, and we want to be able to gather up the data from these individual services. So to do that,
119 00:15:35.330 ⇒ 00:15:49.699 Anish Gupta: I’d like to implement some dashboard service that our client can directly interact with. So this would be something like a… like a Power BI, like an Excel, like, something like… one of these, like, core dashboard services that a user can visualize and…
120 00:15:50.510 ⇒ 00:15:52.230 Demilade Agboola: Sorry, sorry to interrupt,
121 00:15:52.620 ⇒ 00:15:59.670 Demilade Agboola: Before we get to the dashboard, how do we get the data into the warehouse? So, kind of like, again, like I said, this is just kind of system design.
122 00:15:59.840 ⇒ 00:16:05.480 Demilade Agboola: So how would you want to get the different… from the three different sources, how would you want to get it into the…
123 00:16:05.620 ⇒ 00:16:07.410 Demilade Agboola: cloud warehouse.
124 00:16:07.670 ⇒ 00:16:12.460 Anish Gupta: Oh, oh, oh, sorry, I thought… I thought we were going from the client side first to the database. Okay, okay, I think I missed something.
125 00:16:13.420 ⇒ 00:16:14.360 Demilade Agboola: But, like, upstream.
126 00:16:14.360 ⇒ 00:16:28.620 Anish Gupta: Okay, okay, cool, cool. Sounds good, sounds good. I think I was misunderstanding that in the question there, yeah. So, from our individual data sources, we’d probably want to create, sorry, individual, like, topics, individual Kafka topics for our services. I think…
127 00:16:28.760 ⇒ 00:16:37.940 Anish Gupta: Kafka topics could make the most sense here, because we want to create an individual service to gather this information and be able to process this information
128 00:16:37.940 ⇒ 00:16:49.079 Anish Gupta: individually, based off of our different services. So for something like Stripe, if we’re connecting to our Stripe service, for example, we’d want to gather up our specific payment options and be able to transport that into our warehouse.
129 00:16:49.130 ⇒ 00:16:51.920 Anish Gupta: And store that into our warehouse.
130 00:16:52.070 ⇒ 00:16:57.599 Anish Gupta: per timeframes. I think, like, if we want to have a monthly payment option for our customer versus
131 00:16:57.600 ⇒ 00:17:14.419 Anish Gupta: something like an individual one-time purchase option for a particular service, this could be processed individually through this Kafka topic, and then this topic can operate on this imported data, and then send that to our warehouse.
132 00:17:14.420 ⇒ 00:17:25.570 Anish Gupta: So I think we could set up 3 individual topics for our 3 individual data services, and have these running continuously as we’re inputting new data from our services here.
133 00:17:25.680 ⇒ 00:17:32.310 Anish Gupta: From our… yeah, from our Google Ads, from our Salesforce, and from our Stripe services. So to actually facilitate that
134 00:17:32.460 ⇒ 00:17:52.170 Anish Gupta: transition from our data sources to our warehouse, we can use Airflow for that. I think we can set up these individual pipelines here through Airflow, pass that into our Kafka topics, and then from Kafka topics, have our processed data go into our warehouse with an additional Airflow
135 00:17:52.180 ⇒ 00:17:54.680 Anish Gupta: pipeline there, so we can have that, kind of,
136 00:17:54.890 ⇒ 00:18:10.470 Anish Gupta: that chain set up and running at batch intervals, just for simplicity’s sake, so we’re not continuously streaming data and potentially overloading our system. Maybe, like, I think based off these individual services, about every 5 to 10 minutes makes the most sense.
137 00:18:10.540 ⇒ 00:18:25.820 Anish Gupta: Because then, if our server does end up going down for whatever reason, if one of these services ends up going down, we don’t lose all of our data, we still have a batch that we can process and gather up, and then we can stop individual Kafka topics as need be
138 00:18:25.900 ⇒ 00:18:27.350 Anish Gupta: for our service.
139 00:18:27.760 ⇒ 00:18:29.860 Anish Gupta: So I think that makes the most sense.
140 00:18:30.340 ⇒ 00:18:34.270 Anish Gupta: yeah, to have, like, an Airflow and Kafka combination there.
141 00:18:34.420 ⇒ 00:18:42.500 Anish Gupta: And then to actually store our data, do we know the actual potential formatting of this data? I know,
142 00:18:42.750 ⇒ 00:18:53.649 Anish Gupta: I know since it’s 3 different services, the data is probably going to be in different formats, but do we have, like, an idea of a general structure that this data will follow, or is it kind of freeform, depending on the service?
143 00:18:54.540 ⇒ 00:19:02.970 Demilade Agboola: Let’s just say it would be structured. It’ll be regular tables, structured…
144 00:19:03.600 ⇒ 00:19:09.309 Demilade Agboola: And we can assume, just for ease of simplicity, like, no JSON, so we’re not unnesting anything.
145 00:19:09.410 ⇒ 00:19:24.829 Anish Gupta: It’s just structured tables… Okay. Sounds good. So yeah, with the table approach, I would probably use something like Postgres for our actual data warehouse, just because we can scale that up vertically as much as we need to, and we can handle some pretty quick lookup
146 00:19:24.910 ⇒ 00:19:38.419 Anish Gupta: for our individual data structures. So we can have 3 different tables for 3 different services, and, you know, based off of what the client needs, we can split that up further in our warehouse later on as need be. So that’s how to approach that first step.
147 00:19:39.170 ⇒ 00:19:39.830 Demilade Agboola: Okay.
148 00:19:40.370 ⇒ 00:19:47.120 Demilade Agboola: Alright, so let’s try and see… Quick questions…
149 00:19:47.870 ⇒ 00:19:53.110 Demilade Agboola: So, let’s kind of go into the modeling approach for, like, the data.
150 00:19:53.270 ⇒ 00:19:53.680 Anish Gupta: Hmm.
151 00:19:53.680 ⇒ 00:19:55.709 Demilade Agboola: How would you…
152 00:19:55.880 ⇒ 00:20:01.249 Demilade Agboola: want to model the data? Would you want to use the star schema? Would you want to use the normalized schema?
153 00:20:01.920 ⇒ 00:20:04.670 Demilade Agboola: And why would you choose one over the other?
154 00:20:06.030 ⇒ 00:20:10.300 Anish Gupta: Yeah, that’s a good question. Can I take a… maybe a minute or two to think about that?
155 00:20:10.600 ⇒ 00:20:11.540 Anish Gupta: I think that’s a…
156 00:20:11.540 ⇒ 00:20:12.150 Demilade Agboola: Sure.
157 00:20:12.150 ⇒ 00:20:15.180 Anish Gupta: Yeah, I kinda wanna, like, just walk through my potential process here.
158 00:20:15.860 ⇒ 00:20:17.050 Demilade Agboola: Okay, sure.
159 00:20:38.680 ⇒ 00:20:56.760 Anish Gupta: Okay, yeah, I think I got an idea of what we would probably do. I think to approach these, since we have 3 different data sources, and it’s kind of organized for individual customers or individual purchase options, I think a normalized approach makes more sense here, just because,
160 00:20:56.760 ⇒ 00:20:57.220 Demilade Agboola: Okay.
161 00:20:57.220 ⇒ 00:21:01.970 Anish Gupta: since we have differently structured data from different services,
162 00:21:01.990 ⇒ 00:21:14.379 Anish Gupta: we can kind of normalize our structure around our users, since we can have foreign keys for our user IDs be the same between all our services, since the same user will be paying for the service
163 00:21:14.400 ⇒ 00:21:30.489 Anish Gupta: in Salesforce, and they’ll be using Stripe to pay for said service, and they can also, you know, Google Ads, the individual user would be implementing whatever ads they want to get. So that can be used with multiple foreign keys there, most likely user ID, so we can normalize around that, and…
164 00:21:30.490 ⇒ 00:21:37.899 Anish Gupta: Kind of structure our data around this more normalized approach, so we can have a consistent formatting across our services.
165 00:21:37.940 ⇒ 00:21:41.719 Anish Gupta: And it makes it easier to store in Postgres individually with a more normal structure.
166 00:21:42.970 ⇒ 00:21:43.590 Demilade Agboola: Okay.
167 00:21:44.690 ⇒ 00:21:45.490 Anish Gupta: Alright.
168 00:21:49.400 ⇒ 00:21:52.129 Demilade Agboola: So, I think my follow-up question to that would be…
169 00:21:52.800 ⇒ 00:21:55.519 Demilade Agboola: You prefer a normalized approach, which, fine.
170 00:21:57.560 ⇒ 00:22:01.609 Demilade Agboola: My question to you would be, so… would you…
171 00:22:08.220 ⇒ 00:22:11.239 Demilade Agboola: Would you not be able to have some sort of…
172 00:22:13.640 ⇒ 00:22:19.580 Demilade Agboola: Like, when you use a normalized approach, you run the risk of potentially… having…
173 00:22:19.740 ⇒ 00:22:22.859 Demilade Agboola: Just one. Again, there’s no right or wrong, I’m just trying to pick up.
174 00:22:22.860 ⇒ 00:22:25.290 Anish Gupta: Yeah, yeah, yeah, of course, of course, of course, yeah.
175 00:22:25.290 ⇒ 00:22:28.990 Demilade Agboola: You run the risk of, because of a lack of…
176 00:22:30.990 ⇒ 00:22:35.630 Demilade Agboola: You run the risk of heavy queries, because you’re doing a lot of things in one big.
177 00:22:36.070 ⇒ 00:22:38.590 Anish Gupta: Right. Okay, yeah, that makes sense.
178 00:22:38.750 ⇒ 00:22:48.580 Demilade Agboola: How do you mitigate against that risk? Or how do you ensure… like, at what point do you decide, hey, I’m going to… like, I think it’s a bit better as a star schema?
179 00:22:49.580 ⇒ 00:22:50.190 Anish Gupta: Hmm.
180 00:22:50.810 ⇒ 00:22:58.609 Anish Gupta: Yeah, okay, I see what you mean. Yeah, because normalizing, you have the approach of… you have the reasoning, like, if you do, like, a select…
181 00:22:58.900 ⇒ 00:23:03.289 Anish Gupta: Huge select query, then you can potentially overload your system, and that can just…
182 00:23:03.490 ⇒ 00:23:07.740 Anish Gupta: crash your system, or potentially just delay your system by a lot.
183 00:23:07.930 ⇒ 00:23:11.659 Anish Gupta: That is a good point. I think…
184 00:23:12.390 ⇒ 00:23:22.140 Anish Gupta: And with the normalized approach, you’d want to probably use a different type of database to handle that, maybe not a Postgres approach then, maybe, like, a NoSQL approach, so you can handle larger queries, but…
185 00:23:22.220 ⇒ 00:23:36.239 Anish Gupta: if you… if you want to stick with a more traditional database, with a more traditional SQL database, since our data type is tables, and that usually works well with the SQL type, a star schema probably makes more sense then, yeah. So we can have…
186 00:23:36.390 ⇒ 00:23:38.889 Anish Gupta: larger amounts of queries, so I think it kind of depends on
187 00:23:39.010 ⇒ 00:23:57.620 Anish Gupta: it’s kind of a trade-off of if you want to handle… if you want to have a more standardized SQL approach, then Postgres would make more sense, and then we can use a star schema, but if you’re okay with using a more document-based structure, allow multiple different data types to enter our system, and kind of have that potential flexibility that a normalized approach would work there.
188 00:23:57.770 ⇒ 00:24:00.410 Anish Gupta: Because then we can have faster queries with a NoSQL approach.
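[Editor’s note: for readers unfamiliar with the star schema raised in this exchange, the following is a minimal sketch of one central fact table joined to small dimension tables. It uses an in-memory SQLite database purely as a stand-in for a warehouse. All table names, column names, and sample rows are hypothetical and are not taken from the interview.]

```python
import sqlite3


def build_star_schema() -> sqlite3.Connection:
    """Create a tiny star schema: one fact table plus two dimensions."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE dim_customer (
            customer_key INTEGER PRIMARY KEY,
            customer_name TEXT,
            segment TEXT
        );
        CREATE TABLE dim_date (
            date_key INTEGER PRIMARY KEY,
            calendar_date TEXT
        );
        CREATE TABLE fact_payment (
            payment_id INTEGER PRIMARY KEY,
            customer_key INTEGER REFERENCES dim_customer (customer_key),
            date_key INTEGER REFERENCES dim_date (date_key),
            amount_cents INTEGER
        );
        """
    )
    conn.executemany(
        "INSERT INTO dim_customer VALUES (?, ?, ?)",
        [(1, "Acme", "enterprise"), (2, "Globex", "smb")],
    )
    conn.executemany(
        "INSERT INTO dim_date VALUES (?, ?)",
        [(20260101, "2026-01-01"), (20260102, "2026-01-02")],
    )
    conn.executemany(
        "INSERT INTO fact_payment VALUES (?, ?, ?, ?)",
        [(1, 1, 20260101, 5000), (2, 2, 20260101, 1500), (3, 1, 20260102, 7000)],
    )
    return conn


def daily_revenue(conn: sqlite3.Connection) -> list:
    """Daily revenue needs only one join from the fact table to dim_date."""
    return conn.execute(
        """
        SELECT d.calendar_date, SUM(f.amount_cents)
        FROM fact_payment AS f
        JOIN dim_date AS d ON d.date_key = f.date_key
        GROUP BY d.calendar_date
        ORDER BY d.calendar_date
        """
    ).fetchall()


def revenue_by_segment(conn: sqlite3.Connection) -> list:
    """Slicing by a customer attribute is again a single join to one dimension."""
    return conn.execute(
        """
        SELECT c.segment, SUM(f.amount_cents)
        FROM fact_payment AS f
        JOIN dim_customer AS c ON c.customer_key = f.customer_key
        GROUP BY c.segment
        ORDER BY c.segment
        """
    ).fetchall()
```

[The point of the layout is that each dashboard question touches the fact table plus one or two small dimensions, which is the usual argument for a star schema when the concern is heavy multi-join queries.]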
189 00:24:01.620 ⇒ 00:24:07.409 Demilade Agboola: Okay, fair enough. I have one last question, and then if you have any questions, you can ask me.
190 00:24:07.670 ⇒ 00:24:12.849 Demilade Agboola: So say we have… Models that have been built out.
191 00:24:13.760 ⇒ 00:24:15.400 Demilade Agboola: And… it’s…
192 00:24:15.540 ⇒ 00:24:21.829 Demilade Agboola: say, 400 million rows, billion rows, whatever, like, the amount doesn’t really matter. We just have a very large table.
193 00:24:22.470 ⇒ 00:24:31.460 Demilade Agboola: And it’s taking a long time for it to run. What would your debugging and optimization process look like? What would you be looking out for?
194 00:24:31.830 ⇒ 00:24:34.289 Demilade Agboola: And what would you be trying to see?
195 00:24:35.000 ⇒ 00:24:38.399 Demilade Agboola: Like, just assume the worst possible job was done.
196 00:24:38.730 ⇒ 00:24:39.200 Anish Gupta: Mmm.
197 00:24:39.200 ⇒ 00:24:41.939 Demilade Agboola: What would you be trying to look at to optimize that query?
198 00:24:42.820 ⇒ 00:24:46.540 Anish Gupta: So this is a query that’s gonna be run on our main data warehouse?
199 00:24:46.560 ⇒ 00:25:04.839 Anish Gupta: Right? Not in our… okay, okay, yeah. So, if we’re having such a long delay in our… from our query, the first thing I would look for is if there’s any just large branching select statements, because usually a quick workaround approach that people sometimes use for queries, they just do, like, a select star approach, and then they start to filter out that
200 00:25:04.970 ⇒ 00:25:21.340 Anish Gupta: individual query, but when you have 400 million-ish rows or whatever, something really huge, just doing an initial select all really can slow down your system. So there’s no pre-filtering there before you start doing your select statements. So I would look for, like, in our initial queries, if there are no
201 00:25:21.440 ⇒ 00:25:23.149 Anish Gupta: pre-filtering steps.
202 00:25:23.510 ⇒ 00:25:28.259 Anish Gupta: We would look for that. I think another thing is if we’re carrying… if we’re…
203 00:25:28.660 ⇒ 00:25:32.089 Anish Gupta: doing multiple joins with a potential single query.
204 00:25:32.140 ⇒ 00:25:46.190 Anish Gupta: These multiple joins, instead of using subqueries in our SQL approaches, if we’re doing single joins, or single-table joins at a time, that leads to a lot of potential overhead for no reason.
205 00:25:46.190 ⇒ 00:26:01.760 Anish Gupta: So, we can look at making subqueries and making smaller individual joins if needed. You know, again, applying some filtering as need be, for doing, like, an inner join, for example, for a system, for a query. But if this inner join is with the whole table, querying with the whole table, again,
206 00:26:01.760 ⇒ 00:26:10.239 Anish Gupta: that’s just gonna cause such a huge delay that the query is completely useless. So at that point, we’d probably start to look at, you know, if we can…
207 00:26:10.310 ⇒ 00:26:30.020 Anish Gupta: take our smaller individual tables and maybe join those together, instead of just querying our individual large database. Or we can maybe filter out, first, doing a subquery to filter out… query, filter out data types, data rows that are only from a certain timeframe, and then doing joins on those, for example.
208 00:26:30.020 ⇒ 00:26:34.359 Anish Gupta: That would be a… that would be a very simple debugging step right there.
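A minimal sketch of the filter-before-join step described above, assuming an in-memory SQLite database and hypothetical `events` and `users` tables (the engine, table names, columns, and rows are all illustrative assumptions, not details from the interview). It runs the same aggregation two ways: joining the full table and filtering afterwards, and filtering to a timeframe in a subquery before joining.

```python
import sqlite3

# Hypothetical schema and rows; nothing here comes from the interview.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE users (user_id INTEGER PRIMARY KEY, region TEXT);
    CREATE TABLE events (
        event_id   INTEGER PRIMARY KEY,
        user_id    INTEGER,
        event_date TEXT,
        amount     REAL
    );
""")
conn.executemany(
    "INSERT INTO users VALUES (?, ?)",
    [(1, "EU"), (2, "US"), (3, "EU")],
)
conn.executemany(
    "INSERT INTO events VALUES (?, ?, ?, ?)",
    [
        (1, 1, "2025-12-30", 10.0),
        (2, 1, "2026-01-05", 20.0),
        (3, 2, "2026-01-10", 5.0),
        (4, 3, "2025-11-01", 7.5),
        (5, 3, "2026-02-01", 2.5),
    ],
)

# Shape 1: join against the whole events table, filter afterwards.
join_then_filter = """
    SELECT u.region, SUM(e.amount) AS total
    FROM events AS e
    JOIN users AS u ON u.user_id = e.user_id
    WHERE e.event_date >= '2026-01-01'
    GROUP BY u.region
    ORDER BY u.region
"""

# Shape 2: restrict events to the timeframe (and to the needed columns)
# in a subquery, then join only the reduced row set.
filter_then_join = """
    SELECT u.region, SUM(r.amount) AS total
    FROM (
        SELECT user_id, amount
        FROM events
        WHERE event_date >= '2026-01-01'
    ) AS r
    JOIN users AS u ON u.user_id = r.user_id
    GROUP BY u.region
    ORDER BY u.region
"""

broad = conn.execute(join_then_filter).fetchall()
narrow = conn.execute(filter_then_join).fetchall()
print(broad == narrow, narrow)
```

Both shapes return the same rows. Whether the second is actually faster depends on the engine, since many query planners already push filters below joins on their own, so checking the query plan on the real warehouse is the safer test.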
209 00:26:34.670 ⇒ 00:26:43.909 Anish Gupta: So yeah, just, like, our subqueries are gonna be our main thing that we’ll be using, most likely, and then reducing the number of joins as well.
210 00:26:44.120 ⇒ 00:26:56.600 Anish Gupta: Additionally, I think just, like, breaking up queries into smaller steps in general, just because a lot of times, I think with data engineering and a lot of data querying, data analysis, we want to try to do the work with the fewest queries possible, just because it…
211 00:26:57.520 ⇒ 00:27:17.490 Anish Gupta: can be faster sometimes, but if we are dealing with such large data, sometimes multiple smaller queries is just the right way to approach it. So breaking up our… breaking up our system into smaller, breaking up our main problem into smaller individual problems, and then storing these eventual results as smaller tables would also be a good approach.
212 00:27:17.520 ⇒ 00:27:18.560 Anish Gupta: Bro.
213 00:27:18.810 ⇒ 00:27:28.439 Anish Gupta: And… yeah, like, so mainly subquerying and reducing the number of joins, and then breaking our problem into smaller individual components would be how I would approach that.
214 00:27:28.550 ⇒ 00:27:29.160 Anish Gupta: Mainly.
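The "smaller steps, stored as smaller tables" idea from this answer can be sketched as follows, assuming an in-memory SQLite database and hypothetical `orders` and `customers` tables (the engine, names, and rows are illustrative assumptions only). Step 1 reduces the large table to a small staged result; step 2 joins that staged result instead of the original table.

```python
import sqlite3

# Hypothetical tables and rows, for illustration only.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (customer_id INTEGER PRIMARY KEY, segment TEXT);
    CREATE TABLE orders (
        order_id    INTEGER PRIMARY KEY,
        customer_id INTEGER,
        order_date  TEXT,
        amount      REAL
    );
""")
conn.executemany(
    "INSERT INTO customers VALUES (?, ?)",
    [(1, "retail"), (2, "wholesale")],
)
conn.executemany(
    "INSERT INTO orders VALUES (?, ?, ?, ?)",
    [
        (1, 1, "2025-12-15", 100.0),
        (2, 1, "2026-01-03", 40.0),
        (3, 2, "2026-01-20", 60.0),
        (4, 2, "2026-02-02", 15.0),
    ],
)

# Step 1: filter and aggregate the large table once, and keep the
# much smaller result as a temporary table.
conn.execute("""
    CREATE TEMP TABLE recent_totals AS
    SELECT customer_id, SUM(amount) AS total
    FROM orders
    WHERE order_date >= '2026-01-01'
    GROUP BY customer_id
""")
staged_rows = conn.execute("SELECT COUNT(*) FROM recent_totals").fetchone()[0]

# Step 2: later queries join the staged table, not the original one.
by_segment = conn.execute("""
    SELECT c.segment, SUM(t.total) AS total
    FROM recent_totals AS t
    JOIN customers AS c ON c.customer_id = t.customer_id
    GROUP BY c.segment
    ORDER BY c.segment
""").fetchall()
print(staged_rows, by_segment)
```

Staging trades extra storage and a refresh step for cheaper downstream queries, so it tends to pay off when several queries reuse the same reduced result.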
215 00:27:30.320 ⇒ 00:27:32.890 Demilade Agboola: That’s fair, I mean, those are really good strategies.
216 00:27:35.260 ⇒ 00:27:45.109 Demilade Agboola: Two things that could also just help, just random, would also be things around, like, indexing. Indexing would always be one of those things that really helps, especially if you’re able to…
217 00:27:45.420 ⇒ 00:27:52.279 Demilade Agboola: Because that helps with filtering, and if you also have, like, things around, like, distribution and keys, depending on… but that’s warehouse specific.
218 00:27:52.420 ⇒ 00:28:02.040 Demilade Agboola: If you have, like, distribution on keys, that would help with things like, your joins, so that would help significantly, and that can also speed up your process.
219 00:28:02.040 ⇒ 00:28:03.710 Anish Gupta: Right, yeah, that makes sense, okay.
220 00:28:03.900 ⇒ 00:28:04.600 Demilade Agboola: That’s fair.
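A minimal sketch of the indexing point, assuming an in-memory SQLite database and a hypothetical `events` table and index name (illustrative assumptions, not details from the interview). It compares the query plan for a filtered query before and after an index is created on the filter column. Distribution keys are warehouse-specific and are not shown here.

```python
import sqlite3

# Hypothetical table, index name, and rows, for illustration only.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE events (event_id INTEGER PRIMARY KEY, user_id INTEGER, amount REAL)"
)
conn.executemany(
    "INSERT INTO events (user_id, amount) VALUES (?, ?)",
    [(i % 100, float(i)) for i in range(1000)],
)

query = "SELECT amount FROM events WHERE user_id = 42"


def plan(sql):
    """Return the readable plan steps (last column of EXPLAIN QUERY PLAN)."""
    return [row[-1] for row in conn.execute("EXPLAIN QUERY PLAN " + sql)]


# Without an index, the filter has to read every row of the table.
plan_before = plan(query)

# With an index on the filter column, the engine can look up matches directly.
conn.execute("CREATE INDEX idx_events_user ON events (user_id)")
plan_after = plan(query)

matches = conn.execute(query).fetchall()
print(plan_before)
print(plan_after)
print(len(matches))
```

The exact plan wording varies between SQLite versions, but the first plan reports a scan of `events` and the second names `idx_events_user`.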
221 00:28:05.590 ⇒ 00:28:12.430 Demilade Agboola: Okay, do you have any questions about, like, Brainforge? We have about… a couple more minutes left.
222 00:28:12.810 ⇒ 00:28:25.300 Anish Gupta: Yeah, yeah, I think just one question is, like, how do you enjoy your time working at Brainforge? I’ve talked to, I think, two people now about the company, and just asked about their experience, because I was curious. How’s it been for you so far?
223 00:28:26.010 ⇒ 00:28:30.599 Demilade Agboola: I mean, I’m… I become… I’m going to be one year here.
224 00:28:30.600 ⇒ 00:28:32.049 Anish Gupta: Oh, wow, nice, congrats.
225 00:28:32.050 ⇒ 00:28:33.449 Demilade Agboola: On 2nd of March, so…
226 00:28:33.450 ⇒ 00:28:34.469 Anish Gupta: Oh, wow, okay.
227 00:28:34.470 ⇒ 00:28:42.090 Demilade Agboola: That’s around the corner. I’ll say, like, I enjoy Brainforge. It’s consulting, so it’s fast-paced.
228 00:28:42.090 ⇒ 00:28:42.440 Anish Gupta: Amazing.
229 00:28:42.440 ⇒ 00:28:45.029 Demilade Agboola: Exposed to a lot of, like, clients.
230 00:28:45.470 ⇒ 00:28:48.819 Demilade Agboola: Different use cases, you have to think on your feet, you have to.
231 00:28:48.820 ⇒ 00:28:49.260 Anish Gupta: Beautiful.
232 00:28:49.260 ⇒ 00:28:51.520 Demilade Agboola: To figure out how you want to deliver.
233 00:28:51.730 ⇒ 00:28:55.360 Demilade Agboola: And also good client communication, basically.
234 00:28:55.570 ⇒ 00:28:57.860 Demilade Agboola: So you have to be able to know
235 00:28:58.050 ⇒ 00:29:08.239 Demilade Agboola: what… because you can deliver stuff, and the client is unhappy with what you delivered, even though you worked and broke your back making it, so you have to be able to, like, balance all of that together.
236 00:29:08.370 ⇒ 00:29:16.249 Demilade Agboola: And I think, so far it’s been a pretty good experience, like, balancing all of that together, and being able to work with a team that
237 00:29:16.730 ⇒ 00:29:23.069 Demilade Agboola: People on the team are very hardworking, very supportive, and also very open to each other.
238 00:29:23.830 ⇒ 00:29:32.109 Anish Gupta: Okay, that’s really cool. And I guess it’s, like, with the client, client consulting and the consulting approach in general, yeah.
239 00:29:32.110 ⇒ 00:29:46.920 Anish Gupta: what has been your… I guess, like, what… so, like, your typical workflow is… is it more… way more on the client meeting side, or is it, like, a couple client meetings, and then you dive into the technical work there? How does your… how does it typically work at Brainforge? Or is it depending on the client?
240 00:29:47.890 ⇒ 00:29:52.260 Demilade Agboola: Yes and no, but… because I say no because it depends. So, someone like Otam.
241 00:29:52.550 ⇒ 00:30:04.469 Demilade Agboola: but the CEOs are very busy, so they have a lot of meetings, right? For the engineers, we try our best to ensure that your meetings happen early in the morning, and then so you can have the afternoon to code, and so, like, you don’t have too much…
242 00:30:04.710 ⇒ 00:30:08.450 Demilade Agboola: Interruption with your meetings.
243 00:30:08.450 ⇒ 00:30:09.210 Anish Gupta: I see. Okay.
244 00:30:09.210 ⇒ 00:30:17.630 Demilade Agboola: To be fair, like, I’m… as you rise higher up within the organizations, you might have a bit more… you might have a couple more meetings here and there.
245 00:30:18.310 ⇒ 00:30:24.669 Demilade Agboola: Generally, that’s… that would be the approach. So you have stand-up in the morning, you would have,
246 00:30:24.850 ⇒ 00:30:26.389 Demilade Agboola: Stand-up is, like, 30 minutes.
247 00:30:27.940 ⇒ 00:30:32.360 Demilade Agboola: And then you might have, like, one or two specific client meetings to attend in a week.
248 00:30:32.910 ⇒ 00:30:35.639 Demilade Agboola: No, usually, like, one, to be fair.
249 00:30:35.810 ⇒ 00:30:37.740 Anish Gupta: Okay. Not bad.
250 00:30:37.740 ⇒ 00:30:39.040 Demilade Agboola: And yeah, so it’s…
251 00:30:39.160 ⇒ 00:30:41.919 Demilade Agboola: But you might be on multiple clients, so that’s why you might have two of those.
252 00:30:41.920 ⇒ 00:30:51.430 Anish Gupta: Yeah, I see, I see, I see. So, so typically, do you get assigned to, like, probably, like, 2 to 3 clients a person, or is it depending on the weeks, sometimes more, sometimes less?
253 00:30:51.630 ⇒ 00:30:54.829 Demilade Agboola: So usually it’s 2 to 3. Very, very few, like…
254 00:30:55.240 ⇒ 00:30:58.910 Demilade Agboola: You coming in will probably be, like, 1, and then you’ll ramp up to 2.
255 00:30:58.910 ⇒ 00:31:00.090 Anish Gupta: Yeah, that makes sense.
256 00:31:00.090 ⇒ 00:31:10.719 Demilade Agboola: Based off, like, ops and the feedback they’re getting from you, they might then decide 3. So I get 3 because, like, you know, I’m… in the team, I’m more senior, so I kind of…
257 00:31:11.370 ⇒ 00:31:12.769 Demilade Agboola: For a lot of other things as well.
258 00:31:12.770 ⇒ 00:31:13.300 Anish Gupta: Hmm.
259 00:31:13.570 ⇒ 00:31:16.549 Demilade Agboola: But, yeah, most people are two.
260 00:31:17.110 ⇒ 00:31:18.620 Anish Gupta: Okay, pretty cool.
261 00:31:19.220 ⇒ 00:31:25.570 Anish Gupta: Yeah, I think that’s all the questions I have on my end. Appreciate taking the time to talk, it was a really, really fun conversation.
262 00:31:25.570 ⇒ 00:31:26.680 Demilade Agboola: Easier? Yeah.
263 00:31:26.680 ⇒ 00:31:27.650 Anish Gupta: Welcome to you, too.
264 00:31:27.650 ⇒ 00:31:28.950 Demilade Agboola: Alright then. Bye.
265 00:31:28.950 ⇒ 00:31:30.089 Anish Gupta: Alright, see you. Bye.