Meeting Title: Brainforge Interview w- Demilade Date: 2026-02-25 Meeting participants: Kaela Gallagher, Anish Gupta, Demilade Agboola


WEBVTT

1 00:00:14.750 00:00:15.740 Anish Gupta: Hello.

2 00:00:20.660 00:00:32.509 Kaela Gallagher: Hi, Anish. It’s Kaela. I am kind of on the recruiting side of the house, so I’m not going to be interviewing you today, just sitting in and listening.

3 00:00:32.790 00:00:34.220 Anish Gupta: Sure, sure, sounds good. Okay.

4 00:00:38.900 00:00:42.849 Demilade Agboola: Hi, Anish. Hey, hi. My name is Demilade.

5 00:00:42.990 00:00:44.719 Demilade Agboola: Nice to meet you.

6 00:00:45.010 00:00:46.100 Anish Gupta: Nice to meet you too.

7 00:00:46.890 00:00:49.899 Demilade Agboola: So, I know you’ve had one interview already.

8 00:00:49.970 00:00:55.689 Demilade Agboola: And so the context of this interview will just be, like, us talking through…

9 00:00:55.980 00:00:59.800 Demilade Agboola: Like, technical things, and just kind of having an understanding of how you…

10 00:01:00.740 00:01:06.830 Demilade Agboola: Problem solve, and how you think about these technical things as we walk through the interview.

11 00:01:07.010 00:01:08.390 Anish Gupta: Okay, yeah, sounds good.

12 00:01:08.780 00:01:19.819 Demilade Agboola: I’m aware you might have some questions, or some things about Brainforge or anything, so towards the end, I will also give you room to ask any questions, or, you know, anything that you might want to ask about…

13 00:01:20.170 00:01:22.329 Demilade Agboola: Brainforge and just, like, the company.

14 00:01:22.660 00:01:24.080 Anish Gupta: Sure, sounds good, yeah, yeah.

15 00:01:24.690 00:01:33.539 Demilade Agboola: Okay, I think we can start off with the basics, which would just be, like, please, can you tell me about yourself and, like, your work experience and technical experience?

16 00:01:33.900 00:01:40.399 Anish Gupta: Yeah, yeah. So, my name’s Anish. I’m a software engineer currently at Juniper Networks. I’ve been there for about…

17 00:01:40.680 00:01:46.019 Anish Gupta: 2 years now, or a year and a half now. I graduated from UC Berkeley before that.

18 00:01:46.180 00:02:04.820 Anish Gupta: With a data science and chemical engineering double major. So, I’ve been… I’ve been working at Juniper as an AI/ML engineer, a lot more on the… on the AI side more so than ML, but I’ve been, you know, working on our main product, which is, like,

19 00:02:04.860 00:02:23.720 Anish Gupta: an agent to help our users, our customers, to be able to track and predict network traffic, because Juniper’s working as a routing-focused company, working with Wi-Fi routers and modems, things like that. So, this product is so we can automatically track network traffic and predict errors before they occur, and then

20 00:02:23.950 00:02:33.099 Anish Gupta: start to allocate and divert network traffic as needed to try to prevent network downtimes and improve overall customer experience, so…

21 00:02:33.170 00:02:48.999 Anish Gupta: Yeah, most of my work recently on that end has been on some model development. I’ve been working a lot with managing our current ETL pipelines that we have set up for this system, because we’re transferring a lot of data at once, obviously, so working with a lot of

22 00:02:49.000 00:02:53.209 Anish Gupta: Airflow, working with a lot of Kafka topics as well, so…

23 00:02:53.210 00:03:03.459 Anish Gupta: Yeah, it’s been a pretty extensive and pretty varied work experience so far. Pretty customer-facing as well, since we’re working with our customers that are using these routers.

24 00:03:03.650 00:03:08.609 Anish Gupta: So, you know, like, getting a lot of experience working with clients and client meetings and things like that. So, yeah.

25 00:03:09.270 00:03:10.860 Demilade Agboola: Okay, sounds good, sounds good.

26 00:03:13.470 00:03:26.710 Demilade Agboola: So, given your experience, what would you say the most complex part, or the most complex data pipeline you’ve worked on was, and what made it complex, and how did you, like, handle and navigate the complexity?

27 00:03:27.160 00:03:45.790 Anish Gupta: Yeah, yeah, sure. So, I think the current… the current project I’m working on is probably my most complex type of… type of pipeline that I’ve been working with. So, since we’re using these network devices to be able to track all these different types of data, right, since it’s not just… it’s not just data, like,

28 00:03:45.940 00:03:56.549 Anish Gupta: you know, like, information received here and information outputted there. We’re receiving a lot of status of individual devices, overall device summaries,

29 00:03:56.650 00:04:16.039 Anish Gupta: constant time… time-series-dependent logs. So, a lot of my work recently has been actually using these individual data… data pieces and data types to try to create our actual… we call them entity types, but they’re different error categories for this… for this error tracking and error prediction. So,

30 00:04:16.070 00:04:31.369 Anish Gupta: For example, if we see something like a high temperature reading, or a group of routers in a similar geographical location giving higher CPU outputs than normal for that type of device, we need to automatically translate that as

31 00:04:31.370 00:04:48.710 Anish Gupta: a potential high CPU error, and then also try to correlate other errors that are given at the same time from that device to try to predict what the overall problem is. So, not just on the individual device side, but on our overall, like, grand system, it’s been a lot of

32 00:04:48.820 00:05:08.130 Anish Gupta: trying to predict, trying to correlate individual error types into one specific recommendation. So that’s been what I’ve been working on so far, and it’s a lot of multi-threaded work as well, because this has to scale up very well for large numbers of devices and large numbers of networks. So the scalability definitely has been the most challenging aspect there.

33 00:05:09.240 00:05:14.720 Demilade Agboola: And how do you handle the scalability? What sort of architecture do you use?

34 00:05:15.030 00:05:17.629 Demilade Agboola: And how do you just handle that sort of volume?

35 00:05:18.040 00:05:31.510 Anish Gupta: Yeah, yeah, for… for the scalability, we’ve been working on trying to develop, like, a multi… multi-container approach. Since we’re using these individual Airflow… Airflow

36 00:05:31.530 00:05:32.860 Anish Gupta: pipelines.

37 00:05:32.860 00:05:51.619 Anish Gupta: and these individual topics, these Kafka topics, to stream our data. We’re trying to run a multi-server approach and, like, horizontally scaling our system, not just vertically scaling our system, so developing multiple servers running simultaneously, working on implementing load balancers for our system.

38 00:05:51.680 00:05:59.399 Anish Gupta: We’ve been using a lot of Redis caches as well for that work, so when we get more frequent errors, like temperature readings or CPU

39 00:05:59.400 00:06:16.610 Anish Gupta: issues, we can cache those more commonly read errors and more commonly stored error types, and then we can pull them up much faster and implement that sort of faster caching approach. So that’s been a very interesting field that we’ve been working on more recently. More interesting technology we’ve been working with recently.
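
Note: the caching idea described above (keep frequently read error types in a fast store, fall back to the slow source on a miss) can be illustrated with a minimal sketch. This is not the system discussed in the interview: it uses a plain in-memory dictionary as a stand-in for Redis, and the names `TTLCache`, `classify_error`, and the error codes are hypothetical.

```python
import time
from typing import Callable, Dict, Tuple


class TTLCache:
    """Tiny in-memory read-through cache with per-entry expiry (a stand-in for Redis)."""

    def __init__(self, ttl_seconds: float, clock: Callable[[], float] = time.monotonic):
        self.ttl = ttl_seconds
        self.clock = clock
        self._store: Dict[str, Tuple[float, str]] = {}  # key -> (expires_at, value)
        self.hits = 0
        self.misses = 0

    def get(self, key: str, loader: Callable[[str], str]) -> str:
        now = self.clock()
        entry = self._store.get(key)
        if entry is not None and entry[0] > now:
            self.hits += 1
            return entry[1]
        # Missing or expired: fall back to the slow source and remember the result.
        self.misses += 1
        value = loader(key)
        self._store[key] = (now + self.ttl, value)
        return value


def classify_error(code: str) -> str:
    # Hypothetical slow lookup, e.g. a database query mapping a code to a category.
    return {"TEMP_HIGH": "thermal", "CPU_HIGH": "compute"}.get(code, "unknown")


cache = TTLCache(ttl_seconds=60)
first = cache.get("CPU_HIGH", classify_error)   # miss: calls the loader
second = cache.get("CPU_HIGH", classify_error)  # hit: served from memory
```

With a real Redis deployment the expiry would typically be handled by the server rather than by application code; the read-through pattern itself stays the same.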

40 00:06:17.360 00:06:18.819 Demilade Agboola: Okay, that’s fair, that’s fair.

41 00:06:19.000 00:06:31.329 Demilade Agboola: What would you… what would you… I mean, I know it tends to be more extensive than what you can always list, but I’m just curious what your… what you would consider your tech stack to be, so what tools, and what, like…

42 00:06:31.600 00:06:35.259 Demilade Agboola: Technologies would you say you have, on lock?

43 00:06:35.610 00:06:39.420 Anish Gupta: Yeah, so I think for me, like, my… like, my, my…

44 00:06:39.420 00:07:02.800 Anish Gupta: My main language of experience is Python. It’s what I’ve been using most of my… most of my coding and most of my engineering career, but I’ve been more recently working a lot with Go, since our network devices and our network tracking have been working with Golang, so I’ve been using Go there. Java has always been a more developed language I’ve been working with. And then, on the more tech stack side,

45 00:07:02.900 00:07:17.650 Anish Gupta: Been using a lot of Docker containers. S3 is always, like, recommended, and obviously something I’ve been working with a lot. We’re using S3 to store all of our individual data files. Airflow, like I mentioned before, and Kafka topics as well.

46 00:07:18.050 00:07:24.400 Anish Gupta: Also, we’ve been using a lot of PyTorch for our model prediction and model development.

47 00:07:24.620 00:07:41.919 Anish Gupta: So, a lot of, like, PyTorch, PySpark, that kind of implementation there. Since with these, with these data types, it’s a lot of data storage, so we’re using PySpark instead of, like, instead of Pandas for our data manipulation, just for faster and larger scalability.

48 00:07:41.970 00:07:45.859 Anish Gupta: Additionally, I’ve been working with a lot of

49 00:07:46.380 00:07:53.419 Anish Gupta: different… sorry, a lot of different… I mentioned Docker already, right? What’d I miss.

50 00:07:53.650 00:08:13.160 Anish Gupta: Yeah, I’ve been doing a lot of model work with, like, libraries such as, you know, Seaborn, Pandas, NumPy, like, all these more commonly used Python libraries, TensorFlow, obviously, for machine learning as well. So, yeah, and gotten some work on, like, some more popular NLP libraries as well for, not just for…

51 00:08:13.220 00:08:23.289 Anish Gupta: our normal natural language processing, but since we want to convert these error types to be more user-friendly and more user-usable, I’ve been using a lot of,

52 00:08:23.770 00:08:27.420 Anish Gupta: like, LangChain for that AI agent work, and…

53 00:08:27.670 00:08:39.830 Anish Gupta: Yeah, LangChain’s been my agent work of, like, my workflow there, but we’ve been using that more recently for our customer-facing error modeling and error detection as well.

54 00:08:40.690 00:08:42.950 Demilade Agboola: That’s fair, that’s a pretty good,

55 00:08:44.600 00:08:51.289 Demilade Agboola: set of, tools to know and have. Just one question, because I noticed you didn’t mention it, I just wanted to be sure if

56 00:08:51.700 00:08:55.439 Demilade Agboola: Misha and Ari, if it was just… do you… do you write SQL?

57 00:08:56.050 00:08:58.650 Anish Gupta: Oh, yeah, yeah, yeah, oh, I forgot to say SQL, yeah.

58 00:08:58.650 00:09:00.200 Demilade Agboola: Yeah, just wanted to be sure, because that…

59 00:09:00.200 00:09:05.210 Anish Gupta: Yeah, for sure, for sure, for sure. Yeah. SQL, that’s good.

60 00:09:06.830 00:09:09.279 Demilade Agboola: Alright, so let’s… let’s come up with,

61 00:09:11.410 00:09:21.210 Demilade Agboola: we’re going to start, like, system design, so it’s just kind of, we’re going to think through a system. Just kind of want to have this… this would be kind of the kind of thing you would come across.

62 00:09:21.880 00:09:22.520 Anish Gupta: Sure.

63 00:09:23.160 00:09:30.189 Demilade Agboola: So the idea is I just want to hear and see how you would walk through solving this, system design.

64 00:09:30.190 00:09:31.999 Anish Gupta: Okay, that sounds good.

65 00:09:32.650 00:09:36.150 Demilade Agboola: Alright, so let’s say we have a client who has a…

66 00:09:36.550 00:09:39.389 Demilade Agboola: Daily revenue mart that they want to get built out.

67 00:09:39.530 00:09:42.820 Demilade Agboola: Right? So they want to see their revenue every day.

68 00:09:43.780 00:09:48.830 Demilade Agboola: And they have 3 main data sources. So they have Salesforce, Yeah, Stripe.

69 00:09:49.120 00:09:54.090 Demilade Agboola: If you don’t know any of this or what’s going on there, you can always ask clarifying questions.

70 00:09:54.090 00:09:54.930 Anish Gupta: Yeah, that’s cool.

71 00:09:54.930 00:09:55.530 Demilade Agboola: with us.

72 00:09:55.550 00:09:58.749 Demilade Agboola: So we have Salesforce, they have Stripe, and they have Google Ads.

73 00:09:59.560 00:10:01.469 Demilade Agboola: And so they want to gather into a…

74 00:10:02.260 00:10:06.769 Demilade Agboola: A cloud warehouse, any one of your choice; I’m not restricting it here.

75 00:10:08.450 00:10:15.899 Demilade Agboola: Off the top of your head, and you can take some seconds to think about it, doesn’t have to be, like, instantaneous, how would you design this solution?

76 00:10:17.340 00:10:34.539 Anish Gupta: Yeah, so I just want to kind of clarify our question here. So we want to be able to gather up our data from these individual services and store them in our data warehouse just for employee lookup, or is it for user… so, like, users can look at their information through a dashboard? Like, what is our main, end use case here?

77 00:10:35.270 00:10:43.679 Demilade Agboola: Great question. These are the kind of things we want people to ask. Okay, so the end goal is we want to have our C-suite, so, like, the CEOs and the…

78 00:10:43.960 00:10:47.969 Demilade Agboola: CFOs, they need a dashboard to be looking at every day.

79 00:10:47.970 00:10:50.010 Anish Gupta: Cool. Okay. Sounds good. Okay.

80 00:10:50.130 00:11:09.479 Anish Gupta: So, that would most likely affect how we’re gonna actually eventually store our data, so we can make it easier to translate into a dashboard, or just a more visually understandable output there. So, yeah. And on the client side, so in this case, our client is our C-suite here, is that right?

81 00:11:09.480 00:11:10.960 Demilade Agboola: Yeah, the stakeholders, yeah.

82 00:11:10.960 00:11:12.730 Anish Gupta: Stakeholders, yeah, okay, cool.

83 00:11:13.630 00:11:21.760 Anish Gupta: Cool, cool. And, for our main data sources, so these are… these are gathering just customer…

84 00:11:22.030 00:11:33.110 Anish Gupta: customer results of our individuals, like, Stripe for our payments, Salesforce, I guess, for our individual services that the customers are going to be paying for, I believe? Is that… I’m understanding that correctly?

85 00:11:33.250 00:11:34.560 Anish Gupta: Would that be what Salesforce is storing?

86 00:11:34.560 00:11:39.779 Demilade Agboola: We also have, like, information around, like, the accounts, the team members within the accounts.

87 00:11:39.780 00:11:40.319 Anish Gupta: Yeah, yeah.

88 00:11:40.320 00:11:47.229 Demilade Agboola: different stages that they have to go through to, for instance, pay, or like… So you can kind of see if you have a flow, for instance.

89 00:11:49.000 00:11:56.439 Demilade Agboola: where the people are dropping off, and where people finally make the payments. You can also use that if you want to model, like, other stuff around that.

90 00:11:56.820 00:12:04.610 Anish Gupta: Okay, cool, cool, that makes sense. So yeah, I think… so, one thing I would start off with is just…

91 00:12:06.340 00:12:24.619 Anish Gupta: kind of trying to figure out where our individual users, will be kind of grouped in, so we can try to quantize our main user groups here. So, our customers that this group will be looking at, that the clients want to analyze, would most likely be grouped as

92 00:12:26.240 00:12:45.709 Anish Gupta: like, people that are going to be buying our individual separate services and their main account storages, but do we know if there is, like, there are more details on specific user groups or buckets that we can group our users into? Or the users that the customer might be interested in, or is that just up to our individual analysis?

93 00:12:45.710 00:12:50.079 Demilade Agboola: So let’s not get too deep into the data of it.

94 00:12:50.970 00:12:54.249 Demilade Agboola: Let’s think more of the system of it.

95 00:12:54.430 00:12:55.270 Anish Gupta: Okay, okay.

96 00:12:55.520 00:12:57.750 Demilade Agboola: Don’t focus on, like, the data.

97 00:12:57.750 00:12:58.400 Anish Gupta: Yeah, yeah.

98 00:12:58.630 00:13:01.399 Demilade Agboola: Just focus on the system.

99 00:13:01.550 00:13:06.780 Demilade Agboola: How would you go about designing and building that system? What are the parameters you’d be looking out for?

100 00:13:07.050 00:13:09.240 Demilade Agboola: And how would you ensure that

101 00:13:10.330 00:13:15.870 Demilade Agboola: Whatever system you’re building will meet the needs Of the stakeholders.

102 00:13:17.180 00:13:23.749 Demilade Agboola: So that’s just basically… so let’s not go too granular, just kind of stay high level, but we’re talking about the systems that you’ll be…

103 00:13:24.980 00:13:39.480 Anish Gupta: Great, okay, okay, that makes sense then. Okay, so… so we can maybe start to look at our, like, our main requirements that we’re going to be looking for here, which seems to be that we want to gather up our individual, gather up all our data into a visual dashboard.

104 00:13:39.480 00:13:56.509 Anish Gupta: Most likely split it up into different user groups and different user types, per individual service. So, like, if you’re looking at Salesforce, you want to see specific types of accounts. If you’re looking at Stripe, you want to see specific payment options and payment plans that the users might have set up, things like that. So, different user grouping.

105 00:13:56.510 00:14:05.290 Anish Gupta: And we want to make sure that this is a pretty low latency system, I’m assuming, because our clients will want to be able to quickly look up information and make sure it’s readily available, so…

106 00:14:05.470 00:14:08.440 Anish Gupta: Keep… just keeping that in mind.

107 00:14:08.610 00:14:11.490 Demilade Agboola: When you say low latency, are you referring to

108 00:14:11.870 00:14:16.789 Demilade Agboola: Data latency, or are you referring to… the dashboard latency?

109 00:14:17.680 00:14:21.880 Anish Gupta: Oh, I mean, I mean database latency, right? Because as…

110 00:14:21.990 00:14:33.200 Anish Gupta: as a customer gets updated on a payment plan, we want to be able to quickly reflect that into our database so that we can see it on our end, relatively quickly. But…

111 00:14:33.600 00:14:38.689 Anish Gupta: Speaking of the low latencies, on the latency side, I believe what most likely would be more important

112 00:14:38.800 00:14:48.489 Anish Gupta: is… we want to make sure that we have more relevant information, more so than, I guess, more quickly gathered information, so consistency might be more important there, actually, than latency, now that I’m thinking about it.

113 00:14:48.770 00:14:50.060 Anish Gupta: Because we want to have…

114 00:14:50.400 00:15:04.220 Anish Gupta: Does that make sense? I think I’m thinking on the client side, we want to be able to gather not just necessarily the most up-to-date… not just… sorry… not gathering up the information as fast as possible, but gathering the most relevant and most up-to-date information.

115 00:15:04.740 00:15:15.070 Anish Gupta: Because if we get a big client, we want to see how that would impact the overall portfolio, the overall performance of our products, things like that. Does that make sense?

116 00:15:15.070 00:15:16.870 Demilade Agboola: Yeah. Yeah, it does. Yeah, it does.

117 00:15:16.870 00:15:23.410 Anish Gupta: Okay, cool. So, yeah, starting from there, I think my next step would be that

118 00:15:23.800 00:15:34.959 Anish Gupta: we want to be able to connect our client to these individual services, and we want to be able to gather up the data from these individual services. So to do that,

119 00:15:35.330 00:15:49.699 Anish Gupta: I’d like to implement some dashboard service that our client can directly interact with. So this would be something like a… like a Power BI, like an Excel, like, something like… one of these, like, core dashboard services that a user can visualize and…

120 00:15:50.510 00:15:52.230 Demilade Agboola: Sorry, sorry to interrupt,

121 00:15:52.620 00:15:59.670 Demilade Agboola: Before we get to the dashboard, how do we get the data into the warehouse? So, kind of like, again, like I said, this is just kind of system design.

122 00:15:59.840 00:16:05.480 Demilade Agboola: So how would you want to get the different… from the three different sources, how would you want to get it into the…

123 00:16:05.620 00:16:07.410 Demilade Agboola: cloud warehouse.

124 00:16:07.670 00:16:12.460 Anish Gupta: Oh, oh, oh, sorry, I thought… I thought we were going from the client side first to the database. Okay, okay, I think I missed something.

125 00:16:13.420 00:16:14.360 Demilade Agboola: But, like, upstream.

126 00:16:14.360 00:16:28.620 Anish Gupta: Okay, okay, cool, cool. Sounds good, sounds good. I think I was misunderstanding that in the question there, yeah. So, from our individual data sources, we’d probably want to create, sorry, individual, like, topics, individual Kafka topics for our services. I think…

127 00:16:28.760 00:16:37.940 Anish Gupta: a Kafka… Kafka topics could make the most sense here, because we want to create individual service to gather this information and be able to process this information

128 00:16:37.940 00:16:49.079 Anish Gupta: individually, based off of our different services. So for something like Stripe, if we’re connecting to our Stripe service, for example, we’d want to gather up our specific payment options and be able to transport that into our warehouse.

129 00:16:49.130 00:16:51.920 Anish Gupta: And store that into our warehouse.

130 00:16:52.070 00:16:57.599 Anish Gupta: per timeframes. I think, like, if we want to have a monthly payment option for our customer versus

131 00:16:57.600 00:17:14.419 Anish Gupta: something like a individually one-time purchase option for a particular service, this could be processed individually through this Kafka topic, and then this topic… this topic can operate on this, imported data, and then send that to our warehouse.

132 00:17:14.420 00:17:25.570 Anish Gupta: So I think we could set up 3 individual topics for our 3 individual data services, and have these running continuously as we’re inputting new data from our services here.

133 00:17:25.680 00:17:32.310 Anish Gupta: From our… yeah, from our Google Ads, from our Salesforce, and from our Stripe services. So to actually facilitate that

134 00:17:32.460 00:17:52.170 Anish Gupta: that trans… transition from our data sources to our warehouse, we can use Airflow for that. I think we can set up these individual pipelines here through Airflow, pass that into our Kafka topics, and then from Kafka topics, have our processed data that goes into our warehouse with an additional Airflow

135 00:17:52.180 00:17:54.680 Anish Gupta: Pipeline there, so we can have that, kind of.

136 00:17:54.890 00:18:10.470 Anish Gupta: that chain set up and running at batch intervals, just for simplicity’s sake, so we’re not continuously streaming data and potentially overloading our system. Maybe, like, I think based off these individual services, about every 5 to 10 minutes makes the most sense.

137 00:18:10.540 00:18:25.820 Anish Gupta: Because then, because… so that if our server does end up going down for whatever reason, if one of these services ends up going down, then we don’t lose all of our data, we still have a batch that we can process and gather up, and then we can stop individual Kafka topics as need be.

138 00:18:25.900 00:18:27.350 Anish Gupta: For our service.

139 00:18:27.760 00:18:29.860 Anish Gupta: So I think that makes the most sense.

140 00:18:30.340 00:18:34.270 Anish Gupta: yeah, to have, like, an Airflow and Kafka combination there.

141 00:18:34.420 00:18:42.500 Anish Gupta: And then to actually store our data, do we know the actual potential formatting of this data? I know,

142 00:18:42.750 00:18:53.649 Anish Gupta: I know since it’s 3 different services, the data is probably going to be in different formats, but do we have, like, an idea of a general structure that this data will follow, or is it kind of freeform, depending on the service?

143 00:18:54.540 00:19:02.970 Demilade Agboola: Let’s just say it would be structured. It’ll be regular tables, structured…

144 00:19:03.600 00:19:09.309 Demilade Agboola: And we can assume, just for ease of simplicity, like, no JSON, so we’re not unnesting anything.

145 00:19:09.410 00:19:24.829 Anish Gupta: It’s just structured tables… Okay. Sounds good. So yeah, with the table approach, I would probably use something like Postgres, for our actual data warehouse, just because we can scale that up vertically as much as we need to, and handle… we can handle some pretty quick lookup.

146 00:19:24.910 00:19:38.419 Anish Gupta: For our individual data structures. So we can have 3 different tables for 3 different services, and, you know, based off of what the client needs, we can split that up further in our warehouse later on as need be. So that’s how I’d approach that first step.

147 00:19:39.170 00:19:39.830 Demilade Agboola: Okay.

148 00:19:40.370 00:19:47.120 Demilade Agboola: Alright, so let’s try and see… Quick questions…

149 00:19:47.870 00:19:53.110 Demilade Agboola: So, let’s kind of go into the modeling approach for, like, the data.

150 00:19:53.270 00:19:53.680 Anish Gupta: Hmm.

151 00:19:53.680 00:19:55.709 Demilade Agboola: How would you…

152 00:19:55.880 00:20:01.249 Demilade Agboola: want to model the data? Would you want to use the star schema? Would you want to use the normalized schema?

153 00:20:01.920 00:20:04.670 Demilade Agboola: And why would you choose one over the other?

154 00:20:06.030 00:20:10.300 Anish Gupta: Yeah, that’s a good question. Can I take a… maybe a minute or two to think about that?

155 00:20:10.600 00:20:11.540 Anish Gupta: I think that’s a…

156 00:20:11.540 00:20:12.150 Demilade Agboola: Sure.

157 00:20:12.150 00:20:15.180 Anish Gupta: Yeah, I kinda wanna, like, just walk through my potential process here.

158 00:20:15.860 00:20:17.050 Demilade Agboola: Okay, sure.

159 00:20:38.680 00:20:56.760 Anish Gupta: Okay, yeah, I think we can… I think I got an idea of what we would probably do. I think to approach these, since we have 3 different data sources, and it’s kind of organized for individual customers or individual, purchase options, I think a normalized approach makes more sense here, just because.

160 00:20:56.760 00:20:57.220 Demilade Agboola: Okay.

161 00:20:57.220 00:21:01.970 Anish Gupta: Since we have differently structured data from different services,

162 00:21:01.990 00:21:14.379 Anish Gupta: we can kind of normalize our structure around our users, since we can have foreign keys for our user IDs be the same between all our services, since the same user will be paying for the service

163 00:21:14.400 00:21:30.489 Anish Gupta: in Salesforce, and they’ll be using Stripe to pay for said service, and they can also, you know, Google Ads, the individual user would be implementing whatever ads they want to get. So that can be used with multiple foreign keys there, most likely user ID, so we can normalize around that, and…

164 00:21:30.490 00:21:37.899 Anish Gupta: Kind of structure our data around this more normalized approach, so we can have a consistent formatting across our services.

165 00:21:37.940 00:21:41.719 Anish Gupta: And it makes it easier to store in Postgres individually with a more normal structure.

166 00:21:42.970 00:21:43.590 Demilade Agboola: Okay.

167 00:21:44.690 00:21:45.490 Anish Gupta: Alright.

168 00:21:49.400 00:21:52.129 Demilade Agboola: So, I think my follow-up question to that would be…

169 00:21:52.800 00:21:55.519 Demilade Agboola: You prefer a normalized approach, which, fine.

170 00:21:57.560 00:22:01.609 Demilade Agboola: My question to you would be, so… would you…

171 00:22:08.220 00:22:11.239 Demilade Agboola: Would you not be able to have some sort of…

172 00:22:13.640 00:22:19.580 Demilade Agboola: Like, when you use a normalized approach, you run the risk of potentially… having…

173 00:22:19.740 00:22:22.859 Demilade Agboola: Just one. Again, there’s no right or wrong, I’m just trying to pick up.

174 00:22:22.860 00:22:25.290 Anish Gupta: Yeah, yeah, yeah, of course, of course, of course, yeah.

175 00:22:25.290 00:22:28.990 Demilade Agboola: You run the risk of, because of a lack of…

176 00:22:30.990 00:22:35.630 Demilade Agboola: You run the risk of heavy queries, because you’re doing a lot of things in one big.

177 00:22:36.070 00:22:38.590 Anish Gupta: Right. Okay, yeah, that makes sense.

178 00:22:38.750 00:22:48.580 Demilade Agboola: How do you mitigate against that risk? Or how do you ensure… like, at what point do you decide, hey, I’m going to… like, I think it’s a bit better as a star schema?

179 00:22:49.580 00:22:50.190 Anish Gupta: Hmm.

180 00:22:50.810 00:22:58.609 Anish Gupta: Yeah, okay, I see what you mean. Yeah, because normalizing, you have the approach of… you have the reasoning, like, if you do, like, a select…

181 00:22:58.900 00:23:03.289 Anish Gupta: Huge select query, then you can potentially overload your system, and that can just…

182 00:23:03.490 00:23:07.740 Anish Gupta: Crash your system, or potentially just delay your… System by a lot.

183 00:23:07.930 00:23:11.659 Anish Gupta: That is a good point. I think…

184 00:23:12.390 00:23:22.140 Anish Gupta: And with the normalized approach, you’d want to probably use a different type of database to handle that, maybe not a Postgres approach then, maybe, like, a NoSQL approach, so you can handle larger queries, but…

185 00:23:22.220 00:23:36.239 Anish Gupta: if you… if you want to stick with a more traditional database, with a more traditional SQL database, since our data type is tables, and that usually works well with the SQL type, a star schema probably makes more sense then, yeah. So we can have…

186 00:23:36.390 00:23:38.889 Anish Gupta: larger amounts of queries, so I think it kind of depends on

187 00:23:39.010 00:23:57.620 Anish Gupta: it’s kind of a trade-off of if you want to handle… if you want to have a more standardized SQL approach, then Postgres would make more sense, and then we can use a star schema, but if you’re okay with using a more document-based structure, allowing multiple different data types to enter our system, and kind of having that potential flexibility, then a normalized approach would work there.

188 00:23:57.770 00:24:00.410 Anish Gupta: Because then we can have faster queries with a NoSQL approach.
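
Note: for readers unfamiliar with the star schema raised in the question, the following is a minimal sketch of what one could look like for the daily revenue scenario. It is an illustration only, not a design proposed by either speaker: SQLite stands in for the cloud warehouse, and every table name, column name, and row (`fact_revenue`, `dim_customer`, `dim_date`, the amounts) is hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # in-memory stand-in for the warehouse
conn.executescript("""
-- Dimension tables: descriptive attributes, one row per entity.
CREATE TABLE dim_customer (customer_key INTEGER PRIMARY KEY, customer_name TEXT);
CREATE TABLE dim_date (date_key TEXT PRIMARY KEY, month TEXT);

-- Fact table: one row per revenue event, pointing at the dimensions.
CREATE TABLE fact_revenue (
    customer_key INTEGER REFERENCES dim_customer(customer_key),
    date_key TEXT REFERENCES dim_date(date_key),
    source TEXT,
    amount_usd REAL
);

INSERT INTO dim_customer VALUES (1, 'Acme'), (2, 'Globex');
INSERT INTO dim_date VALUES ('2026-02-24', '2026-02'), ('2026-02-25', '2026-02');
INSERT INTO fact_revenue VALUES
    (1, '2026-02-24', 'stripe', 100.0),
    (2, '2026-02-24', 'stripe', 50.0),
    (1, '2026-02-25', 'stripe', 75.0);
""")

# The dashboard query needs a single join from the fact table to a dimension.
daily_revenue = conn.execute("""
    SELECT d.date_key, SUM(f.amount_usd) AS revenue_usd
    FROM fact_revenue AS f
    JOIN dim_date AS d ON d.date_key = f.date_key
    GROUP BY d.date_key
    ORDER BY d.date_key
""").fetchall()
```

The point of the layout is that reporting queries join a central fact table to a few small dimension tables, rather than chaining joins across many normalized tables.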

189 00:24:01.620 00:24:07.409 Demilade Agboola: Okay, fair enough. I have one last question, and then if you have any questions, you can ask me.

190 00:24:07.670 00:24:12.849 Demilade Agboola: So say we have… Models that have been built out.

191 00:24:13.760 00:24:15.400 Demilade Agboola: And… it’s…

192 00:24:15.540 00:24:21.829 Demilade Agboola: say, 400 million rows, billion rows, whatever, like, the amount doesn’t really matter. We just have a very large table.

193 00:24:22.470 00:24:31.460 Demilade Agboola: And it’s taking a long time for it to run. What would your debugging and optimization process look like? What would you be looking out for?

194 00:24:31.830 00:24:34.289 Demilade Agboola: And what would you be trying to see?

195 00:24:35.000 00:24:38.399 Demilade Agboola: Like, just assume the worst possible job was done.

196 00:24:38.730 00:24:39.200 Anish Gupta: Mmm.

197 00:24:39.200 00:24:41.939 Demilade Agboola: What would you be trying to look at to optimize that query?

198 00:24:42.820 00:24:46.540 Anish Gupta: So this is a query that’s gonna be run on our main data warehouse?

199 00:24:46.560 00:25:04.839 Anish Gupta: Right? Not in our… okay, okay, yeah. So, if we’re having such a long delay in our… from our query, the first thing I would look for is if there’s any just large branching select statements, because usually a quick workaround approach that people sometimes use for queries, they just do, like, a select star approach, and then they start to filter out that

200 00:25:04.970 00:25:21.340 Anish Gupta: individual query, but when you have 400 million-ish rows or whatever, something really huge, just doing an initial select all really can slow down your system. So there’s no pre-filtering there before you start doing your select statements. So I would look for, like, in our initial queries, if there are no

201 00:25:21.440 00:25:23.149 Anish Gupta: Pre-filtering steps.

202 00:25:23.510 00:25:28.259 Anish Gupta: we would look for that. I think another thing is if we’re carrying… if we’re…

203 00:25:28.660 00:25:32.089 Anish Gupta: Doing multiple joins with a potential single query.

204 00:25:32.140 00:25:46.190 Anish Gupta: these multiple joins, instead of using subqueries in our SQL approaches, if we’re doing single joins, or single table joins at a time, that leads to a lot of potential overhead for no reason.

205 00:25:46.190 00:26:01.760 Anish Gupta: So, we can look at making subqueries and making smaller individual joins if needed. You know, again, applying some filtering as might need be, for doing, like, an inner join, for example, for a system, for a query. But this inner join is with the whole table, querying with the whole table, again.

206 00:26:01.760 00:26:10.239 Anish Gupta: that’s just gonna cause such a huge delay that the query is completely useless. So at that point, we’d probably start to look at, you know, if we can…

207 00:26:10.310 00:26:30.020 Anish Gupta: take our smaller individual tables and maybe join those together, instead of just querying our individual large database. Or we can maybe filter out, first, doing a subquery to filter out… query, filter out data types, data rows that are only from a certain timeframe, and then doing joins on those, for example.

208 00:26:30.020 00:26:34.359 Anish Gupta: That would be a… that would be a very simple debugging step right there.

209 00:26:34.670 00:26:43.909 Anish Gupta: So yeah, just, like, our subqueries are gonna be the main thing that we’ll most likely be using, and then to reduce the number of joins as well.

210 00:26:44.120 00:26:56.600 Anish Gupta: Additionally, I think just, like, breaking up queries into smaller steps in general, just because a lot of times, I think with data engineering and a lot of data querying, data analysis, we want to try to do the work with as few queries as possible, just because it…

211 00:26:57.520 00:27:17.490 Anish Gupta: can be faster sometimes, but if we are doing such large data, sometimes multiple smaller queries is just the right way to approach it. So breaking up our… breaking up our system into smaller, breaking up our main problem to smaller individual problems, and then storing these eventual results as smaller tables would also be a good approach.

212 00:27:17.520 00:27:18.560 Anish Gupta: Bro.

213 00:27:18.810 00:27:28.439 Anish Gupta: And… yeah, like, so mainly subquerying and reducing the number of joins, and then breaking our problem into smaller individual components would be how I would approach that.

214 00:27:28.550 00:27:29.160 Anish Gupta: Mainly.
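
Note: the pre-filtering idea described in this answer (restrict the large table to a timeframe in a subquery, then join) can be sketched roughly as below. This is only an illustration under stated assumptions: the `events` and `users` tables, their columns, and the helper names are hypothetical, and Python's built-in `sqlite3` stands in for the data warehouse. Many query planners already push filters below joins on their own, so the actual gain depends on the warehouse.

```python
import sqlite3

def build_demo_db():
    """Create a tiny in-memory database (hypothetical schema, for illustration only)."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE users (user_id INTEGER PRIMARY KEY, region TEXT);
        CREATE TABLE events (event_id INTEGER PRIMARY KEY, user_id INTEGER,
                             event_date TEXT, amount REAL);
    """)
    conn.executemany("INSERT INTO users VALUES (?, ?)",
                     [(1, "east"), (2, "west"), (3, "east")])
    conn.executemany("INSERT INTO events VALUES (?, ?, ?, ?)",
                     [(1, 1, "2025-12-30", 10.0),
                      (2, 1, "2026-01-05", 20.0),
                      (3, 2, "2026-01-10", 5.0),
                      (4, 3, "2026-02-01", 7.5),
                      (5, 2, "2025-11-11", 99.0)])
    return conn

def totals_naive(conn, since):
    """SELECT * across the full join, then filter and aggregate in application code."""
    rows = conn.execute(
        "SELECT * FROM events e JOIN users u ON u.user_id = e.user_id"
    ).fetchall()
    totals = {}
    for _event_id, _user_id, event_date, amount, _uid, region in rows:
        if event_date >= since:  # ISO dates compare correctly as strings
            totals[region] = totals.get(region, 0.0) + amount
    return sorted(totals.items())

def totals_prefiltered(conn, since):
    """Filter the large table in a subquery first, select only needed columns, then join."""
    sql = """
        SELECT u.region, SUM(recent.amount) AS total
        FROM (
            SELECT user_id, amount
            FROM events
            WHERE event_date >= ?
        ) AS recent
        JOIN users u ON u.user_id = recent.user_id
        GROUP BY u.region
        ORDER BY u.region
    """
    return conn.execute(sql, (since,)).fetchall()
```

Both functions return the same totals per region; the second one only pulls the rows and columns it needs before the join, and leaves the aggregation to the database.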

215 00:27:30.320 00:27:32.890 Demilade Agboola: That’s fair, I mean, those are really good strategies.

216 00:27:35.260 00:27:45.109 Demilade Agboola: Two things that could also just help, just random, would also be things around, like, indexing. Indexing would always be one of those things that really helps, especially if you’re able to…

217 00:27:45.420 00:27:52.279 Demilade Agboola: Because that helps with filtering, and if you also have, like, things around, like, distribution and keys, depending on… but that’s warehouse specific.

218 00:27:52.420 00:28:02.040 Demilade Agboola: If you have, like, distribution on keys, that would help with things like your joins, so that would help significantly, and that can also speed up your process.

219 00:28:02.040 00:28:03.710 Anish Gupta: Right, yeah, that makes sense, okay.

220 00:28:03.900 00:28:04.600 Demilade Agboola: That’s fair.
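
Note: the point about indexing helping with filtering can be checked on a small scale by comparing query plans before and after adding an index. The sketch below is an assumption-laden illustration, not a warehouse recipe: it uses Python's built-in `sqlite3`, a hypothetical `events` table, and a hypothetical index name `idx_events_user`. Distribution keys are warehouse specific, as mentioned above, and are not shown.

```python
import sqlite3

def query_plan(conn, sql, params=()):
    """Return SQLite's EXPLAIN QUERY PLAN detail text for a statement."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    # The last column of each plan row is the human-readable detail string.
    return " | ".join(row[-1] for row in rows)

def plans_before_and_after_index():
    """Show how the plan for a filtered query changes once an index exists."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE events (event_id INTEGER PRIMARY KEY, user_id INTEGER, amount REAL)"
    )
    conn.executemany(
        "INSERT INTO events (user_id, amount) VALUES (?, ?)",
        [(i % 100, float(i)) for i in range(1000)],
    )
    sql = "SELECT amount FROM events WHERE user_id = ?"

    before = query_plan(conn, sql, (42,))  # no index yet: the table is scanned
    conn.execute("CREATE INDEX idx_events_user ON events (user_id)")
    after = query_plan(conn, sql, (42,))   # the planner can now search via the index
    return before, after

if __name__ == "__main__":
    before, after = plans_before_and_after_index()
    print("before:", before)
    print("after: ", after)
```

The exact wording of the plan text varies between SQLite versions, so the sketch prints it instead of assuming a fixed string.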

221 00:28:05.590 00:28:12.430 Demilade Agboola: Okay, do you have any questions about, like, Brainforge? We have about… a couple more minutes left.

222 00:28:12.810 00:28:25.300 Anish Gupta: Yeah, yeah, I think just one question is, like, how do you enjoy your time working at Brainforge? I’ve talked to, I think, two people now about the company, and just asked about their experience, because I was curious. How’s it been for you so far?

223 00:28:26.010 00:28:30.599 Demilade Agboola: I mean, I’m… I become… I’m going to be one year here.

224 00:28:30.600 00:28:32.049 Anish Gupta: Oh, wow, nice, congrats.

225 00:28:32.050 00:28:33.449 Demilade Agboola: On 2nd of March, so…

226 00:28:33.450 00:28:34.469 Anish Gupta: Oh, wow, okay.

227 00:28:34.470 00:28:42.090 Demilade Agboola: That’s around the corner. I’ll say, like, I enjoy Brainforge. It’s consulting, so it’s fast-paced.

228 00:28:42.090 00:28:42.440 Anish Gupta: Amazing.

229 00:28:42.440 00:28:45.029 Demilade Agboola: Exposed to a lot of, like, clients.

230 00:28:45.470 00:28:48.819 Demilade Agboola: Different use cases, you have to think on your feet, you have to.

231 00:28:48.820 00:28:49.260 Anish Gupta: Beautiful.

232 00:28:49.260 00:28:51.520 Demilade Agboola: To figure out how you want to deliver.

233 00:28:51.730 00:28:55.360 Demilade Agboola: And also good client communication, basically.

234 00:28:55.570 00:28:57.860 Demilade Agboola: So you have to be able to know

235 00:28:58.050 00:29:08.239 Demilade Agboola: what… because you can deliver stuff, and the client is unhappy with what you delivered, even though you worked and broke your back making it, so you have to be able to, like, balance all of that together.

236 00:29:08.370 00:29:16.249 Demilade Agboola: And I think, so far it’s been a pretty good experience, like, balancing all of that together, and being able to work with a team that

237 00:29:16.730 00:29:23.069 Demilade Agboola: People on the team are very hardworking, very supportive, and also very open to each other.

238 00:29:23.830 00:29:32.109 Anish Gupta: Okay, that’s really cool. And I guess it’s, like, with the client, client consulting and the consulting approach in general, yeah.

239 00:29:32.110 00:29:46.920 Anish Gupta: what has been your… I guess, like, what… so, like, your typical workflow is… is it more… way more on the client meeting side, or is it, like, a couple client meetings, and then you dive into the technical work there? How does your… how does it typically work at Brainforge? Or is it depending on the client?

240 00:29:47.890 00:29:52.260 Demilade Agboola: Yes and no, but… because I say no because it depends. So, someone like Otam.

241 00:29:52.550 00:30:04.469 Demilade Agboola: but the CEOs are very busy, so they have a lot of meetings, right? For the engineers, we try our best to ensure that your meetings happen early in the morning, and then so you can have the afternoon to code, and so, like, you don’t have too much…

242 00:30:04.710 00:30:08.450 Demilade Agboola: Interruption with your meetings.

243 00:30:08.450 00:30:09.210 Anish Gupta: I see. Okay.

244 00:30:09.210 00:30:17.630 Demilade Agboola: To be fair, like, I’m… as you rise higher up within the organizations, you might have a bit more… you might have a couple more meetings here and there.

245 00:30:18.310 00:30:24.669 Demilade Agboola: Generally, that’s… that would be the approach. So you have stand-up in the morning, you would have,

246 00:30:24.850 00:30:26.389 Demilade Agboola: Stand-up is, like, 30 minutes.

247 00:30:27.940 00:30:32.360 Demilade Agboola: And then you might have, like, one or two specific client meetings to attend in a week.

248 00:30:32.910 00:30:35.639 Demilade Agboola: No, usually, like, one, to be fair.

249 00:30:35.810 00:30:37.740 Anish Gupta: Okay. Not bad.

250 00:30:37.740 00:30:39.040 Demilade Agboola: And yeah, so it’s…

251 00:30:39.160 00:30:41.919 Demilade Agboola: But you might be on multiple clients, so that’s why you might have two of those.

252 00:30:41.920 00:30:51.430 Anish Gupta: Yeah, I see, I see, I see. So, so typically, do you get assigned to, like, probably, like, 2 to 3 clients a person, or is it depending on the weeks, sometimes more, sometimes less?

253 00:30:51.630 00:30:54.829 Demilade Agboola: So usually it’s 2 to 3. Very, very few, like…

254 00:30:55.240 00:30:58.910 Demilade Agboola: You coming in will probably be, like, 1, and then you’ll ramp up to 2.

255 00:30:58.910 00:31:00.090 Anish Gupta: Yeah, that makes sense.

256 00:31:00.090 00:31:10.719 Demilade Agboola: based off, like, ops and the feedback they’re getting from you, they might then decide 3. So I get 3 because, like, you know, I’m… in the team, I’m more senior, so I kind of…

257 00:31:11.370 00:31:12.769 Demilade Agboola: For a lot of other things as well.

258 00:31:12.770 00:31:13.300 Anish Gupta: Hmm.

259 00:31:13.570 00:31:16.549 Demilade Agboola: but, yeah, most people are two.

260 00:31:17.110 00:31:18.620 Anish Gupta: Okay, pretty cool.

261 00:31:19.220 00:31:25.570 Anish Gupta: Yeah, I think that’s all the questions I have on my end. Appreciate taking the time to talk, it was a really, really fun conversation.

262 00:31:25.570 00:31:26.680 Demilade Agboola: Easier? Yeah.

263 00:31:26.680 00:31:27.650 Anish Gupta: Welcome to you, too.

264 00:31:27.650 00:31:28.950 Demilade Agboola: Alright then. Bye.

265 00:31:28.950 00:31:30.089 Anish Gupta: Alright, see you. Bye.