Meeting Title: Brainforge Final Interview
Date: 2026-03-13
Meeting participants: Uttam Kumaran, Godwin Ekainu, Awaish Kumar, Demilade Agboola


WEBVTT

1 00:00:16.570 00:00:17.760 Uttam Kumaran: Hello!

2 00:00:23.050 00:00:24.150 Godwin Ekainu: Hi, Uttam.

3 00:00:24.470 00:00:26.449 Uttam Kumaran: Hey, how are you? Good to see you.

4 00:00:28.160 00:00:30.170 Godwin Ekainu: I’m doing great. How are you doing?

5 00:00:30.600 00:00:31.320 Uttam Kumaran: Good.

6 00:00:31.440 00:00:33.329 Uttam Kumaran: How’s the… how’s the week going?

7 00:00:34.170 00:00:39.479 Godwin Ekainu: Yeah, it’s been fine so far. The week went great. How was yours?

8 00:00:40.890 00:00:43.909 Uttam Kumaran: It’s good. It’s busy. Yeah. Busy.

9 00:00:45.420 00:00:52.270 Uttam Kumaran: It’s good, though. It’s, good busy. Yeah, we’re… we’re… team is growing, and…

10 00:00:53.630 00:00:57.260 Uttam Kumaran: Yeah, team is growing, and I feel like it’s… it’s just been,

11 00:00:57.390 00:01:01.179 Uttam Kumaran: Good to start to work with some new clients and some new capabilities, so…

12 00:01:02.150 00:01:03.650 Godwin Ekainu: That’s on this 18.

13 00:01:04.060 00:01:04.730 Uttam Kumaran: Yeah…

14 00:01:04.730 00:01:06.280 Godwin Ekainu: Obviously, lots is going on.

15 00:01:06.650 00:01:07.290 Uttam Kumaran: Yeah.

16 00:01:07.490 00:01:09.990 Uttam Kumaran: What’s a… what’s a weekend plan, tell me?

17 00:01:12.330 00:01:20.390 Godwin Ekainu: So my weekend plan is, I’ve been working on with the… so I shut down my home lab. Not really shut down, but I’m trying to automate it.

18 00:01:20.890 00:01:27.879 Godwin Ekainu: So I… I’m trying to, have you had… do you know what’s called PXE boot?

19 00:01:28.470 00:01:36.099 Godwin Ekainu: No. So there’s something… there’s something called PXE boot, where, you can… boot a new machine without…

20 00:01:36.100 00:01:36.989 Uttam Kumaran: PX reboot?

21 00:01:37.900 00:01:40.630 Godwin Ekainu: PXE boot, so it’s a network boot.

22 00:01:41.030 00:01:42.170 Godwin Ekainu: PXE.

23 00:01:42.910 00:01:44.270 Godwin Ekainu: Okay, okay, okay, okay.

24 00:01:45.150 00:01:55.119 Godwin Ekainu: So I’m trying to set that up for my home lab so that when I add a new machine to it, it automatically starts up the machine, assigns an IP address to it, then installs Ubuntu.

25 00:01:55.400 00:02:01.710 Godwin Ekainu: Set up the server, set up the network, then, joins the… my… my cluster.

26 00:02:02.630 00:02:04.660 Uttam Kumaran: Compared to that, this weekend.

27 00:02:05.430 00:02:08.660 Uttam Kumaran: Nice! I didn’t know you were doing a lot of, like, networking stuff.

28 00:02:09.440 00:02:17.480 Godwin Ekainu: Yeah, I play around with it. So I have a home lab, a three-node home lab, where I set up Kubernetes, and I host some stuff on it.

29 00:02:17.620 00:02:19.870 Godwin Ekainu: Play around with infrastructures and all.

30 00:02:21.030 00:02:23.370 Uttam Kumaran: Oh, interesting, great. Nice.

31 00:02:25.010 00:02:25.530 Godwin Ekainu: Kevin.

32 00:02:26.410 00:02:27.020 Awaish Kumar: Aye.

33 00:02:27.710 00:02:30.160 Godwin Ekainu: Hi, Awaish. How are you doing?

34 00:02:30.640 00:02:31.980 Awaish Kumar: I’m good, how about you?

35 00:02:32.600 00:02:33.880 Godwin Ekainu: I’m doing great.

36 00:02:34.000 00:02:34.820 Godwin Ekainu: Thank you.

37 00:02:36.700 00:02:38.309 Uttam Kumaran: Awaish, is it just me and you?

38 00:02:40.370 00:02:41.850 Awaish Kumar: Demi’s coming.

39 00:02:42.790 00:02:43.949 Awaish Kumar: Might be coming, yeah.

40 00:02:46.220 00:02:47.799 Uttam Kumaran: Okay, let’s go ahead and get started.

41 00:02:48.040 00:02:53.959 Uttam Kumaran: Yeah, I think, Godwin, I don’t know if you… if Awaish, you guys already met once before.

42 00:02:55.120 00:02:55.780 Godwin Ekainu: Yes.

43 00:02:55.890 00:02:57.200 Awaish Kumar: Pretty much.

44 00:02:58.400 00:03:01.720 Uttam Kumaran: Cool, then yeah, I think we can get into the exercise. Yeah, feel free.

45 00:03:02.620 00:03:03.830 Godwin Ekainu: Okay,

46 00:03:04.730 00:03:10.800 Godwin Ekainu: So where should I start from? I don’t know if you guys have seen… gone through the…

47 00:03:11.130 00:03:12.240 Godwin Ekainu: SSAs.

48 00:03:13.370 00:03:14.880 Godwin Ekainu: The solution, rather.

49 00:03:15.940 00:03:18.520 Uttam Kumaran: Yeah, I think one thing that would be… Awaish, go ahead.

50 00:03:19.730 00:03:27.450 Awaish Kumar: Yeah, I think I… we have reviewed the submitted challenge, but we want you to kind of give a demo of…

51 00:03:27.570 00:03:31.530 Awaish Kumar: What you have worked on, and how it looks like, what…

52 00:03:31.690 00:03:36.060 Awaish Kumar: How you made the choices, or… While communicating.

53 00:03:36.400 00:03:37.090 Awaish Kumar: Yep.

54 00:03:37.880 00:03:40.430 Godwin Ekainu: So let me share my window.

55 00:03:43.580 00:03:45.450 Godwin Ekainu: I’m just meant, screen.

56 00:03:47.230 00:03:48.899 Godwin Ekainu: I’m sorry, can you see my screen?

57 00:03:50.130 00:03:50.930 Uttam Kumaran: Yes.

58 00:03:52.380 00:03:55.669 Godwin Ekainu: Yeah. So,

59 00:04:00.800 00:04:04.619 Godwin Ekainu: Sorry, Awaish. You were saying something?

60 00:04:06.130 00:04:11.300 Awaish Kumar: No, no, I just… I was just saying, I can see your screen, you can start.

61 00:04:12.220 00:04:12.990 Godwin Ekainu: Okay.

62 00:04:13.330 00:04:15.669 Godwin Ekainu: So for the challenge,

63 00:04:16.410 00:04:22.500 Godwin Ekainu: It was really straightforward for me, so it had a lot of information in it, basically.

64 00:04:22.710 00:04:26.370 Godwin Ekainu: I think it’s… you stated that I actually…

65 00:04:26.550 00:04:31.590 Godwin Ekainu: I was going to use Airbyte to set up the, for the ingestion.

66 00:04:31.810 00:04:36.199 Uttam Kumaran: Yeah, and Godwin, we’re still seeing… we’re seeing ourselves on the Zoom. I don’t know if you’re sharing something else.

67 00:04:37.430 00:04:39.469 Godwin Ekainu: I’m sharing my screen.

68 00:04:43.100 00:04:44.360 Godwin Ekainu: Sorry.

69 00:04:49.110 00:04:49.960 Godwin Ekainu: Okay.

70 00:04:49.960 00:04:52.779 Uttam Kumaran: I was seeing your screen, but it was just, like, us on the Zoom.

71 00:04:53.710 00:04:59.109 Godwin Ekainu: Yeah, I think I made a mistake. Can I see my, setup?

72 00:04:59.330 00:05:01.960 Godwin Ekainu: Yes.

73 00:05:02.180 00:05:03.120 Uttam Kumaran: Yeah, yeah, yeah.

74 00:05:04.600 00:05:06.369 Godwin Ekainu: I’m sharing my VS Code.

75 00:05:06.780 00:05:07.940 Awaish Kumar: Yes.

76 00:05:08.300 00:05:14.320 Godwin Ekainu: So, for the, Airbyte installation, that didn’t take much time.

77 00:05:14.510 00:05:18.679 Godwin Ekainu: One thing I noticed was that Airbyte did not have a Docker Compose setup.

78 00:05:18.890 00:05:24.929 Godwin Ekainu: a Docker setup, from the last time I used it, it had, but it was quite shocking when I went in and…

79 00:05:25.150 00:05:29.019 Godwin Ekainu: It had, like, a command line to set it up locally.

80 00:05:29.270 00:05:39.369 Godwin Ekainu: And so I used their command line. I noticed that it installed, Kubernetes, and installs Airbyte on top of the Kubernetes kind cluster.

81 00:05:39.620 00:05:42.159 Godwin Ekainu: Which is what I did, so…

82 00:05:42.640 00:05:48.899 Godwin Ekainu: You can see that, if I do Airbyte, abctl, sorry, let me do that again.

83 00:05:53.510 00:05:57.709 Godwin Ekainu: When you do this, it checks and sees that Airbyte is installed.

84 00:05:57.810 00:06:09.770 Godwin Ekainu: To get my credentials, I just run abctl local credentials, and I get them on my command line. I’m exposing this, though, because it’s localhost, I don’t think anyone has access to it.

85 00:06:10.050 00:06:16.730 Godwin Ekainu: So when… once Airbyte is… has… was installed, kind of,

86 00:06:18.130 00:06:24.359 Godwin Ekainu: The next part was to, try out AirByte, basically, and see how it works.

87 00:06:24.650 00:06:28.120 Godwin Ekainu: So I thought about ingestion,

88 00:06:29.930 00:06:32.810 Godwin Ekainu: So let me, do this.

89 00:06:39.250 00:06:41.249 Godwin Ekainu: So, the setup was like this.

90 00:06:44.680 00:06:46.950 Godwin Ekainu: And bytes the,

91 00:06:53.650 00:06:54.780 Godwin Ekainu: Who’s Chris?

92 00:06:55.900 00:07:03.140 Godwin Ekainu: for the, destination… I buy things, they tend to…

93 00:07:06.860 00:07:08.280 Godwin Ekainu: Gcs…

94 00:07:08.870 00:07:16.159 Godwin Ekainu: So the reason I used GCS was that I needed a way to, host the data, and I tried doing that locally.

95 00:07:17.210 00:07:23.219 Godwin Ekainu: But, I could not find a way to, to, like,

96 00:07:23.460 00:07:28.609 Godwin Ekainu: link, mount the data directory to a directory on my local machine.

97 00:07:28.860 00:07:37.609 Godwin Ekainu: Airbyte has a command where if you’re setting up Airbyte, the installation, you do, like, a volume path directory. I tried that, it didn’t work.

98 00:07:37.760 00:07:48.750 Godwin Ekainu: So I decided to just upload the files to GCS, and the way… the way I uploaded them was, this… into, what do you call it? Call them? Directories, basically.

99 00:07:48.960 00:07:51.220 Godwin Ekainu: You can’t see it here, so…

100 00:07:51.360 00:07:57.249 Godwin Ekainu: We have the customers, we have the orders, and we have the products, and each file went into a separate directory.

101 00:07:57.520 00:08:03.269 Godwin Ekainu: And in a normal production setting, each file, we have, like, a partitioning.

102 00:08:03.500 00:08:06.880 Godwin Ekainu: So for each day’s data, it goes into a separate partition.

103 00:08:07.030 00:08:15.850 Godwin Ekainu: Where you have, like, your file name, the partition date, and some random, codes or numbers, basically.
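
[Editor's sketch, not part of the recording: the partitioned layout described above (file name, partition date, random suffix) could look roughly like the following. The `dt=` prefix, the `.jsonl` extension, and the function name are assumptions made for illustration; the submission's actual bucket layout is not shown in the transcript.]

```python
import uuid
from datetime import date
from typing import Optional


def partitioned_object_path(stream: str, partition_date: date,
                            suffix: Optional[str] = None) -> str:
    """Build a hypothetical object key: <stream>/dt=<YYYY-MM-DD>/<stream>_<suffix>.jsonl"""
    # A short random suffix keeps repeated uploads for the same day from colliding.
    suffix = suffix or uuid.uuid4().hex[:8]
    return f"{stream}/dt={partition_date.isoformat()}/{stream}_{suffix}.jsonl"


# Fixed suffix here only so the printed key is reproducible.
print(partitioned_object_path("orders", date(2026, 3, 8), suffix="a1b2c3d4"))
```

[With one key per stream per day, a backfill only has to add new `dt=` prefixes rather than overwrite existing files.]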

104 00:08:16.310 00:08:24.889 Godwin Ekainu: So I uploaded the files into, GCS, then I used Airbyte, I set up, like, an Airbyte connection, so set up the source connection.

105 00:08:26.100 00:08:28.289 Godwin Ekainu: So this is it.

106 00:08:29.220 00:08:37.690 Godwin Ekainu: For the source connection, source name GCS, I created a service account, and the service account just had, the…

107 00:08:38.049 00:08:46.319 Godwin Ekainu: GCS, data viewer role, and GCS bucket object role. I don’t know if I’m calling that correctly. And I set up my streams.

108 00:08:46.590 00:08:54.179 Godwin Ekainu: For each… for each file, I have a separate stream, so for customers, I’m looking into,

109 00:08:54.350 00:08:55.680 Godwin Ekainu: a file

110 00:08:56.040 00:08:58.849 Godwin Ekainu: name, basically, depending on…

111 00:08:59.330 00:09:07.760 Godwin Ekainu: how it’s arranged in the bucket. I think for production use case, when you have multiple files, you probably just use an asterisk to

112 00:09:08.150 00:09:10.280 Godwin Ekainu: get all the files in that directory.

113 00:09:10.450 00:09:18.380 Godwin Ekainu: This was just, optional if, had multiple days backfilled for 3 days.
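
[Editor's sketch, not part of the recording: the difference between pointing a stream at one exact file name and using an asterisk can be illustrated with Python's standard `fnmatch` module. The object keys below are made up, and Airbyte's own glob matching may differ in detail (for example, `fnmatch`'s `*` also matches `/`), so treat this only as an approximation of the idea.]

```python
from fnmatch import fnmatchcase

# Hypothetical object keys, one directory per stream as described in the interview.
objects = [
    "customers/customers.jsonl",
    "customers/customers_2026-03-08.jsonl",
    "orders/orders.jsonl",
    "products/products.jsonl",
]


def match_stream(pattern, keys):
    """Return the keys a stream would pick up for a given glob pattern."""
    return [key for key in keys if fnmatchcase(key, pattern)]


# Exact file name: only that one file is read.
print(match_stream("customers/customers.jsonl", objects))
# Asterisk: every JSONL file under the customers/ prefix is read.
print(match_stream("customers/*.jsonl", objects))
```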

114 00:09:18.560 00:09:23.589 Godwin Ekainu: Then, the rest were… The rest follow the same pattern.

115 00:09:23.840 00:09:30.530 Godwin Ekainu: I chose the JSONL format here, to ensure that, it follows the format of the files

116 00:09:30.740 00:09:31.720 Godwin Ekainu: themselves.

117 00:09:32.070 00:09:38.739 Godwin Ekainu: Then, I tested the connection and made sure it, was working.

118 00:09:38.930 00:09:43.540 Godwin Ekainu: After setting up my source, I went to set up my destination.

119 00:09:43.670 00:09:50.179 Godwin Ekainu: I have a lot of destinations, because I’m using two currently, and the local PG to connect to the local Postgres.

120 00:09:50.400 00:10:02.070 Godwin Ekainu: for local run, and I have a, a Postgres instance running on PlanetScale, that I use to test for the, GitHub action and production instance.

121 00:10:02.180 00:10:06.240 Godwin Ekainu: So for the local PG, I used,

122 00:10:07.470 00:10:11.650 Godwin Ekainu: So, the host is my Docker bridge network.

123 00:10:11.820 00:10:18.140 Godwin Ekainu: So I’m using this because when you do… when you set up

124 00:10:18.780 00:10:22.459 Godwin Ekainu: what do you call it? So AirByte is set up inside a Docker container.

125 00:10:22.610 00:10:27.010 Godwin Ekainu: Postgres is set up using Docker Compose.

126 00:10:27.130 00:10:36.139 Godwin Ekainu: And for them to communicate, you have to use a bridge, so Airbyte can’t really access, what do you call it, Postgres, because of…

127 00:10:36.310 00:10:46.069 Godwin Ekainu: Because, they are not directly in the same network, so you have to bridge them. So, when you do run stuff on Docker, you can do,

128 00:10:47.420 00:10:49.390 Godwin Ekainu: What was it called, the command again?

129 00:10:54.400 00:10:57.110 Godwin Ekainu: It’s in my docs somewhere, can’t really remember that.

130 00:10:58.300 00:11:07.430 Godwin Ekainu: then use the IP address, basically, for this. Then, my port of 5432, database is Shopify, schema raw…

131 00:11:07.660 00:11:10.749 Godwin Ekainu: Then the role was role ingestion.

132 00:11:11.040 00:11:13.180 Godwin Ekainu: So no sacrifice or so.

133 00:11:13.740 00:11:16.790 Godwin Ekainu: Configured for this, then my database password.

134 00:11:17.450 00:11:24.120 Godwin Ekainu: then, this was just, optional, or default setting, basically. So it has a connection.

135 00:11:25.590 00:11:27.680 Godwin Ekainu: made sure it was connecting before I…

136 00:11:27.850 00:11:30.519 Godwin Ekainu: If I, trigger the job.

137 00:11:30.820 00:11:33.319 Godwin Ekainu: So after doing that,

138 00:11:34.000 00:11:36.549 Godwin Ekainu: Created… created, like, a connection to…

139 00:11:36.940 00:11:42.729 Godwin Ekainu: sync data from the look… from the storage location into the Postgres instance.

140 00:11:43.580 00:11:44.070 Awaish Kumar: Okay.

141 00:11:45.350 00:11:45.930 Godwin Ekainu: Wow.

142 00:11:46.560 00:11:47.610 Godwin Ekainu: So, okay.

143 00:11:47.610 00:11:53.119 Awaish Kumar: Sorry, I have a question. If I have to read… instead of reading from Google Cloud Storage, if I just have to read

144 00:11:53.260 00:11:54.720 Awaish Kumar: From local machine.

145 00:11:55.640 00:11:58.520 Awaish Kumar: Can you reproduce the flow?

146 00:12:00.130 00:12:07.819 Godwin Ekainu: Yes, if I’m able to get, the local, Look after fans, look after…

147 00:12:07.820 00:12:08.350 Awaish Kumar: Pretty?

148 00:12:10.470 00:12:11.190 Godwin Ekainu: Sorry?

149 00:12:11.730 00:12:16.700 Awaish Kumar: The files are already, like, they’re… you can… you’re able to download it, right?

150 00:12:16.870 00:12:17.910 Awaish Kumar: Yeah, so…

151 00:12:17.910 00:12:24.469 Godwin Ekainu: Yes, I downloaded them, so I had issues with, mounting the files on my local machine.

152 00:12:24.820 00:12:32.660 Godwin Ekainu: Basically, to enable, AirByte to read the file, you have to mount the file to AirByte.

153 00:12:33.330 00:12:48.560 Godwin Ekainu: Into the Airbyte, data directory. When I did that, AirByte wasn’t reading… actually seeing the file. I suspect it has to do with some networking issue, because I tried, other, approaches. I tried other approaches, and that didn’t work, so…

154 00:12:48.660 00:12:53.679 Godwin Ekainu: Airbyte has a… Let me see if I can look at the documentation.

155 00:12:56.790 00:12:58.810 Awaish Kumar: Okay, that’s okay, we can move on.

156 00:12:59.370 00:12:59.950 Godwin Ekainu: Okay.

157 00:13:00.070 00:13:10.939 Godwin Ekainu: But basically, it’s… you can reproduce it using the local directory, so you just need to set up your source to point to, so if you go to a source and create a new…

158 00:13:11.190 00:13:14.819 Godwin Ekainu: source, you have to… can just select File.

159 00:13:16.110 00:13:22.240 Godwin Ekainu: And you create a file, source, you create your schema, you format, JSONL for this case.

160 00:13:22.440 00:13:25.909 Godwin Ekainu: If you’re doing local, local system.

161 00:13:26.110 00:13:29.029 Godwin Ekainu: And it paths to your directory, basically.

162 00:13:30.500 00:13:42.389 Godwin Ekainu: Yeah, but for me, I could not, get the URL, so I was mounting the files, but it wasn’t really seeing the files when I do this, so I just decided to go with the storage location.

163 00:13:45.080 00:13:48.949 Godwin Ekainu: Yeah, so, back to the connection,

164 00:13:49.740 00:13:53.549 Godwin Ekainu: for this, it creates a connection, I just…

165 00:13:53.870 00:14:02.199 Godwin Ekainu: Click on Sync now and it syncs. So you can see that I did this 5 days ago, when I… when I was working on this. So it syncs…

166 00:14:02.200 00:14:02.950 Awaish Kumar: Don’t worry.

167 00:14:03.550 00:14:10.189 Awaish Kumar: Yeah, on these connections, do you know how to set up the monitoring, so if any of the syncs fail,

168 00:14:10.340 00:14:14.409 Awaish Kumar: it just sends an alert to Slack?

169 00:14:15.160 00:14:17.159 Godwin Ekainu: I didn’t get to that.

170 00:14:20.450 00:14:20.780 Awaish Kumar: No.

171 00:14:21.470 00:14:22.770 Godwin Ekainu: I was doing these.

172 00:14:23.180 00:14:26.740 Awaish Kumar: Do you have an idea, like, if it can be done, or not, or whatever?

173 00:14:27.430 00:14:30.450 Godwin Ekainu: For now, I’m not sure, but I think it can be done.

174 00:14:31.130 00:14:38.739 Godwin Ekainu: I’m not really sure, but it should be something that should be able… it can be done, basically, but I’m not really sure, because I’ve not really tried it.

175 00:14:39.380 00:14:40.060 Awaish Kumar: Okay.

176 00:14:42.030 00:14:54.300 Godwin Ekainu: So, this is just for the Airbyte, basically. Then, for the Postgres, Postgres, postgres setup,

177 00:14:57.040 00:14:59.810 Godwin Ekainu: I have the Docker Compose here.

178 00:15:00.890 00:15:07.490 Godwin Ekainu: Just the basic Docker Compose for setting up a Postgres 18 instance, setting up the Docker name,

179 00:15:07.740 00:15:12.959 Godwin Ekainu: restart policy and the environment. I’m using an .env file, so I set my

180 00:15:13.200 00:15:16.559 Godwin Ekainu: database and user and password and DB name.

181 00:15:16.930 00:15:19.659 Godwin Ekainu: Then to be able to access my…

182 00:15:20.030 00:15:28.180 Godwin Ekainu: Docker, my Postgres instance externally, that’s for Airbyte to be able to access the instance, I had to set up this,

183 00:15:28.700 00:15:33.750 Godwin Ekainu: IP address, localhost IP address with, the ports.

184 00:15:34.110 00:15:38.259 Godwin Ekainu: Then I’m mounting a volume, where my data is going to be stored.

185 00:15:38.410 00:15:42.699 Godwin Ekainu: Then I have an initialization script for the, RBAC roles.

186 00:15:42.820 00:15:47.029 Godwin Ekainu: for the… so once you start… start up the Docker Compose, it…

187 00:15:47.190 00:15:55.270 Godwin Ekainu: Initializes the scripts, creates the databases, creates the roles, basically, for each of the, schemas.

188 00:15:55.890 00:16:00.580 Godwin Ekainu: Then, this is more of a health check to check that, by one.

189 00:16:01.500 00:16:05.200 Awaish Kumar: So, we have a volume here called Postgres Data.

190 00:16:05.400 00:16:09.699 Awaish Kumar: Is it part of Local Compose, and it, like, if I…

191 00:16:10.440 00:16:15.029 Awaish Kumar: like, it will get deleted if I just run Docker Compose.

192 00:16:15.820 00:16:16.990 Awaish Kumar: down or something.

193 00:16:17.100 00:16:20.549 Awaish Kumar: So how we can make it something which

194 00:16:20.970 00:16:23.959 Awaish Kumar: Is not deleted as part of that command.

195 00:16:27.190 00:16:30.679 Godwin Ekainu: So, I know if you do docker compose down,

196 00:16:30.890 00:16:39.460 Godwin Ekainu: -v, it deletes the storage and resets everything. I assume to not delete it, you have to set up, like, a,

197 00:16:39.940 00:16:44.299 Godwin Ekainu: a delete policy, I don’t know if, or you set up a backup, basically.

198 00:16:44.640 00:16:54.589 Godwin Ekainu: You can do, like, a backup for your instance, to make sure it’s not deleted, and if it’s deleted, I can easily restore the backup, or you can set up, like, a delete

199 00:16:54.700 00:17:01.680 Godwin Ekainu: a non-delete, what do you call it, policy on your Docker Compose. So your storage location, your storage is not really deleted.

200 00:17:03.550 00:17:05.130 Awaish Kumar: Okay. Okay.

201 00:17:06.869 00:17:11.419 Godwin Ekainu: Yeah, and use the volume mounts, to ensure that the volume is mounted.

202 00:17:12.349 00:17:15.009 Godwin Ekainu: So for the RBAC roles,

203 00:17:15.459 00:17:18.749 Godwin Ekainu: It’s basically just, creating the schemas first.

204 00:17:19.339 00:17:37.349 Godwin Ekainu: So I have my raw schema for the ingestion, Airbyte sends data to the raw schema, and I have my dev staging, dev intermediate, and dev marts for my dbt dev environment, and I have my staging, intermediate, and marts layer for my

205 00:17:37.539 00:17:39.219 Godwin Ekainu: dbt production layer.

206 00:17:39.729 00:17:44.719 Godwin Ekainu: Then I’m revoking, access, so that,

207 00:17:45.649 00:17:50.129 Godwin Ekainu: Users will not have access to this database unless they are granted.

208 00:17:50.449 00:17:53.369 Godwin Ekainu: That access, or that permission, basically.

209 00:17:53.759 00:17:56.919 Godwin Ekainu: And yeah, I’m doing… I’m doing a for loop to…

210 00:17:57.139 00:18:06.839 Godwin Ekainu: Create the roles, check if the role exists. If it doesn’t exist, create the role with the login password, basically. For production instance, this will be changed.

211 00:18:07.400 00:18:10.820 Awaish Kumar: Is the password hard-coded in the script, or…

212 00:18:11.220 00:18:14.930 Godwin Ekainu: Yeah, in the script… in the script it’s hard-coded,

213 00:18:15.330 00:18:18.639 Godwin Ekainu: But in… for a production setting, this should be…

214 00:18:18.950 00:18:22.880 Godwin Ekainu: set up using an environment variable, basically.

215 00:18:23.030 00:18:25.890 Awaish Kumar: But, like, it’s already on GitHub now.

216 00:18:28.480 00:18:34.019 Godwin Ekainu: So, in the documentation, you see that I said you should change this, so these are, like, default passwords.

217 00:18:37.490 00:18:38.100 Godwin Ekainu: Yeah.

218 00:18:38.380 00:18:42.439 Godwin Ekainu: So, for the, role ingestion,

219 00:18:42.810 00:18:50.120 Godwin Ekainu: I’m granting, database Shopify to the role ingestion, so this is the role that Airbyte uses.

220 00:18:50.730 00:18:55.950 Godwin Ekainu: Basically, to ingest it, then, according to the Airbyte documentation, you have to grant this, too.

221 00:18:56.100 00:18:59.720 Godwin Ekainu: To the role you are using, or the user you are using.

222 00:19:00.710 00:19:07.620 Godwin Ekainu: then I’m granting access to the raw database so that it can create, initial tables on the…

223 00:19:07.800 00:19:09.829 Godwin Ekainu: Raw, schema too.

224 00:19:10.000 00:19:16.319 Godwin Ekainu: then I’m granting the ability to, to also create, insert, update, and delete.

225 00:19:16.870 00:19:22.619 Godwin Ekainu: on all tables in the schema, in the raw schema, too, so that every time you’re trying to…

226 00:19:22.960 00:19:27.309 Godwin Ekainu: Ingest data or the schema updates, Airbyte needs access to all this.

227 00:19:27.590 00:19:28.810 Godwin Ekainu: that,

228 00:19:30.990 00:19:43.810 Godwin Ekainu: So we have, similar setup for the, role transformation. So this is the role, dbt uses. The dbt user uses, so I’m giving it access to read.

229 00:19:44.260 00:19:45.420 Godwin Ekainu: From the raw

230 00:19:46.110 00:19:49.420 Godwin Ekainu: schema, so that it can, read from the raw schema and…

231 00:19:49.600 00:19:52.359 Godwin Ekainu: And get the data from it, basically.

232 00:19:52.650 00:19:54.650 Godwin Ekainu: Then, also…

233 00:19:55.460 00:19:55.870 Awaish Kumar: So.

234 00:19:55.870 00:19:56.359 Godwin Ekainu: Giving that.

235 00:19:57.780 00:20:01.729 Awaish Kumar: Yeah, like, in the challenge, we asked to, like,

236 00:20:01.970 00:20:05.550 Awaish Kumar: Also, to have, two different, like,

237 00:20:05.930 00:20:15.590 Awaish Kumar: workflows for run… to run on GitHub Actions. One on PR validations, which basically points to staging, right? I don’t see any,

238 00:20:15.960 00:20:18.569 Awaish Kumar: Databases, for staging.

239 00:20:19.640 00:20:21.310 Godwin Ekainu: Yeah, so this is us.

240 00:20:21.530 00:20:22.220 Awaish Kumar: comes?

241 00:20:23.350 00:20:25.569 Awaish Kumar: There are no schemas for a staging environment.

242 00:20:25.570 00:20:27.019 Godwin Ekainu: This is cause teaching.

243 00:20:28.720 00:20:33.939 Godwin Ekainu: So, basically, what I did was just, for dev and staging, using the same schema.

244 00:20:34.010 00:20:36.629 Godwin Ekainu: And for production, write to this schema.

245 00:20:38.220 00:20:39.930 Godwin Ekainu: We also keep it simple, too.

246 00:20:43.010 00:20:45.950 Godwin Ekainu: Yes, well, thanks for stitching, and for…

247 00:20:46.360 00:20:53.649 Godwin Ekainu: transformation… role transformation, which is what dbt uses, granting it access to… write to the dev

248 00:20:53.970 00:20:56.520 Godwin Ekainu: And the production, schemas.

249 00:20:56.660 00:20:58.780 Godwin Ekainu: Production schemas.

250 00:20:59.030 00:21:03.100 Godwin Ekainu: Then, for the, role developer.

251 00:21:03.440 00:21:09.020 Godwin Ekainu: Just granting it read access to the entire schemas, so that it can view the data in there.

252 00:21:09.270 00:21:11.259 Godwin Ekainu: And see what’s in there.

253 00:21:11.380 00:21:15.490 Godwin Ekainu: And for the BI role, it’s just given access to view the marts layer.

254 00:21:15.960 00:21:21.200 Godwin Ekainu: Basically, just only the marts, no access to any other… any of the other databases.
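
[Editor's sketch, not part of the recording: the role-to-schema access described above can be summarised as a small mapping that expands into PostgreSQL GRANT statements. The role names (`role_ingestion`, `role_transformation`, `role_developer`, `role_bi`) and schema names are assumptions reconstructed from the conversation; the actual init script in the submission is not shown here and may grant a different set of privileges.]

```python
# Schemas mentioned in the interview: raw, three dev layers, three production layers.
SCHEMAS = ["raw", "dev_staging", "dev_intermediate", "dev_marts",
           "staging", "intermediate", "marts"]

READ, WRITE = "read", "write"

# Hypothetical role names; access levels follow what was described verbally.
ROLE_ACCESS = {
    "role_ingestion": {"raw": WRITE},                        # Airbyte writes raw data
    "role_transformation": {"raw": READ,                     # dbt reads raw ...
                            **{s: WRITE for s in SCHEMAS if s != "raw"}},  # ... writes the rest
    "role_developer": {s: READ for s in SCHEMAS},            # read-only everywhere
    "role_bi": {"marts": READ},                              # BI sees only the marts layer
}


def grant_statements(role, access):
    """Expand a {schema: level} mapping into GRANT statements for one role."""
    statements = []
    for schema, level in access.items():
        if level == WRITE:
            statements.append(f"GRANT USAGE, CREATE ON SCHEMA {schema} TO {role};")
            statements.append(
                f"GRANT SELECT, INSERT, UPDATE, DELETE ON ALL TABLES IN SCHEMA {schema} TO {role};")
        else:
            statements.append(f"GRANT USAGE ON SCHEMA {schema} TO {role};")
            statements.append(f"GRANT SELECT ON ALL TABLES IN SCHEMA {schema} TO {role};")
    return statements


for role, access in ROLE_ACCESS.items():
    print("\n".join(grant_statements(role, access)))
```

[Keeping the mapping in one place makes it easy to check at a glance that the BI role touches nothing outside the marts schema.]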

255 00:21:21.650 00:21:25.870 Godwin Ekainu: Then, for the DBT,

256 00:21:26.500 00:21:33.259 Godwin Ekainu: I’m using, what do you call it? uv to install my dbt. I can see my dbt projects.

257 00:21:33.630 00:21:42.650 Godwin Ekainu: So that’s all for the Postgres setup. So, to run it, you just run your docker compose up -d to set up. I have it running already, so…

258 00:21:42.900 00:21:45.220 Godwin Ekainu: I don’t want to do that, so…

259 00:21:46.230 00:21:48.100 Godwin Ekainu: More than the step you follow.

260 00:21:48.280 00:21:58.460 Godwin Ekainu: And for the dbt setup, I have my projects, dbt project YAML file, and I can, it’s just the default dbt settings, basically.

261 00:21:58.630 00:22:03.610 Godwin Ekainu: But for my model setup, I’m using this, Shopify staging. For staging.

262 00:22:03.880 00:22:05.210 Godwin Ekainu: I am.

263 00:22:05.710 00:22:10.359 Godwin Ekainu: Setting it as a view, because, you want…

264 00:22:10.460 00:22:12.759 Godwin Ekainu: To have fresh data each time you run.

265 00:22:13.110 00:22:20.729 Godwin Ekainu: Your staging environment, and… and you want… you don’t want to create a static one, and you also save on storage costs, basically.

266 00:22:20.880 00:22:27.970 Godwin Ekainu: For the intermediate layer, I’m using the same, reason, setting up… setting it up as a view, too.

267 00:22:28.090 00:22:29.690 Godwin Ekainu: Managers…

268 00:22:30.600 00:22:39.510 Awaish Kumar: Yeah, I have a question here. For the warehouses we have right now in the market, do you think compute cost

269 00:22:39.630 00:22:43.330 Awaish Kumar: Is higher, or the storage cost is higher.

270 00:22:44.560 00:22:47.370 Godwin Ekainu: I would say compute cost is higher,

271 00:22:47.770 00:22:51.600 Godwin Ekainu: Now, I’m aware of BigQuery and Snowflake.

272 00:22:51.820 00:22:54.590 Godwin Ekainu: The compute cost is higher than storage costs.

273 00:22:55.700 00:22:59.440 Awaish Kumar: What do you think, then, it makes sense to create as views or tables?

274 00:23:00.560 00:23:14.550 Godwin Ekainu: I mean, for me, I usually just create my intermediate layers, or my staging layers, views. Intermediate layer can also be set as ephemeral, but I don’t use that, I don’t use that at all.

275 00:23:15.190 00:23:22.210 Godwin Ekainu: The only time I set up my intermediate layers as tables is when I want to save on costs.

276 00:23:22.690 00:23:28.940 Godwin Ekainu: So, for example, my current company, we dig that it costs

277 00:23:29.180 00:23:47.750 Godwin Ekainu: We went on reducing our cost or maintaining our costs to a lower level. So, for large tables where we arranged them as views in the intermediate layer, I… we converted them to tables. So, static table, so no matter how many times someone calls a table or runs… queries a table, you know this, the static…

278 00:23:47.930 00:24:03.909 Godwin Ekainu: The bytes processed for that table is static, and it’s fixed. So, yeah, it’s smaller than when you use… you leave the table as a view, and you’re querying that view. So, for example, if I have an orders table now, and when you query the view, you’re querying about,

279 00:24:04.400 00:24:07.710 Godwin Ekainu: 500 gigabytes worth of data.

280 00:24:08.040 00:24:15.280 Godwin Ekainu: When you convert to the table, to a table, you can see that when you convert to a table, it’s about 100 gigabytes.

281 00:24:15.480 00:24:18.349 Godwin Ekainu: Because the storage cost is cheap, and it’s…

282 00:24:18.810 00:24:23.250 Godwin Ekainu: for BigQuery, it compresses the data size to a lesser amount.

283 00:24:23.480 00:24:28.880 Godwin Ekainu: And then when other individuals are trying to call or write a query against that table.

284 00:24:29.050 00:24:39.520 Godwin Ekainu: They are only querying about 100, depending on how they write their query. If you are doing… if they are, in their select statement, they are calling the columns rather than doing a select star.

285 00:24:39.670 00:24:44.779 Godwin Ekainu: It’s cheaper, so… So that’s how BigQuery works, basically, so…

286 00:24:44.960 00:24:53.230 Godwin Ekainu: for that reason, we set up our table as… as… our intermediate layer as tables, rather than as views. But initially.

287 00:24:53.570 00:24:59.629 Godwin Ekainu: in the starting stage, we usually do a view, basically. I usually do the view.
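
[Editor's sketch, not part of the recording: the view-versus-table trade-off above is simple arithmetic on data scanned. The 500 GB and 100 GB figures are the ones quoted in the conversation; the query count and the assumption that each table rebuild scans as much as one query against the view are illustrative only, and real warehouse billing has more factors than this.]

```python
def scanned_via_view(view_scan_gb, queries):
    """Every query against the view re-runs the underlying logic and scans it in full."""
    return view_scan_gb * queries


def scanned_via_table(view_scan_gb, table_scan_gb, queries, rebuilds=1):
    """Each rebuild pays the view's scan once; every query then scans only the table."""
    return view_scan_gb * rebuilds + table_scan_gb * queries


queries = 20  # hypothetical number of downstream queries between rebuilds
print(scanned_via_view(500, queries))        # GB scanned when the model stays a view
print(scanned_via_table(500, 100, queries))  # GB scanned when it is materialized as a table
```

[Under these assumptions the table only loses when it is queried about once or less per rebuild, which is consistent with starting with views and converting the heavily queried models later.]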

288 00:24:59.880 00:25:04.220 Godwin Ekainu: Then for the marts layer, I… I prefer the table, because it’s…

289 00:25:04.480 00:25:08.820 Godwin Ekainu: The marts data is fixed, except you’re updating it regularly.

290 00:25:09.900 00:25:11.879 Godwin Ekainu: So, for my…

291 00:25:12.190 00:25:18.430 Godwin Ekainu: profiles.yml, I’m using environment variables, so I have two… two layers,

292 00:25:19.010 00:25:23.139 Godwin Ekainu: My dev layer, and my prod layer.

293 00:25:24.120 00:25:33.469 Godwin Ekainu: So in the .env file, you set up your, database and users for each of these layers, and the schemas you want to refer for each of these layers.

294 00:25:33.930 00:25:37.220 Godwin Ekainu: Then for the, GitHub action workflow.

295 00:25:37.330 00:25:39.340 Godwin Ekainu: I have my staging area here.

296 00:25:42.500 00:25:45.250 Godwin Ekainu: I have my, yes, my staging workflow right here.

297 00:25:46.290 00:25:48.230 Godwin Ekainu: You can see…

298 00:25:48.570 00:25:59.009 Godwin Ekainu: I have my environment variables set here. I’m setting up the GitHub secrets using, I use command line. I have my .env file, so when I do gh,

299 00:25:59.230 00:26:09.230 Godwin Ekainu: secret set -f .env, it sets up the secrets for me automatically in my GitHub action repository, or my repository, rather.

300 00:26:09.540 00:26:17.839 Godwin Ekainu: You can see I’m calling, running on ubuntu-latest, checking out the code, setting up my Python, my Python version.

301 00:26:18.140 00:26:20.899 Godwin Ekainu: So, I’m using Python 3.12.

302 00:26:21.190 00:26:28.080 Godwin Ekainu: and installing dbt Postgres, so dbt Core and dbt Postgres, libraries.

303 00:26:28.360 00:26:29.630 Godwin Ekainu: then,

304 00:26:30.490 00:26:39.059 Godwin Ekainu: checking my dependencies. For this repo, for this project, I didn’t use any external dependencies, so it’s… just keeping it simple.

305 00:26:39.310 00:26:43.049 Godwin Ekainu: And I’m compiling my code, and then running against the staging

306 00:26:43.350 00:26:49.059 Godwin Ekainu: environment. Once this one is completed, I do my tests.

307 00:26:49.470 00:26:59.829 Godwin Ekainu: So, once I… merge, the production workflow runs. It follows the same format, you can see.

308 00:26:59.960 00:27:03.430 Godwin Ekainu: It runs on a schedule, every 6 hours.

309 00:27:03.860 00:27:07.040 Godwin Ekainu: It sets up my environment here.

310 00:27:07.240 00:27:12.110 Godwin Ekainu: Still following the same, flow.

311 00:27:12.940 00:27:16.420 Godwin Ekainu: Except for where you’re running against production, so yeah.

312 00:27:16.620 00:27:19.749 Godwin Ekainu: Pointing to your production environment, rather.

313 00:27:21.450 00:27:24.879 Awaish Kumar: What is that workflow_dispatch keyword doing?

314 00:27:25.090 00:27:26.790 Awaish Kumar: Like, what is the purpose of that?

315 00:27:27.450 00:27:28.060 Awaish Kumar: On the top.

316 00:27:28.060 00:27:28.640 Godwin Ekainu: H.

317 00:27:29.230 00:27:30.360 Awaish Kumar: Top of this file.

318 00:27:30.880 00:27:33.629 Awaish Kumar: There’s a keyword called workflow_dispatch.

319 00:27:35.670 00:27:38.010 Godwin Ekainu: I’m not really sure, to be honest.

320 00:27:40.790 00:27:41.520 Awaish Kumar: Okay.

321 00:27:42.640 00:27:46.900 Godwin Ekainu: So, I also forgot to mention my macros.

322 00:27:47.310 00:27:50.939 Godwin Ekainu: So, when you are running… when you set up a…

323 00:27:51.100 00:27:56.980 Godwin Ekainu: dbt project, it gives the name… Should I put this?

324 00:27:57.250 00:28:08.529 Godwin Ekainu: So, when you create your dbt project, it automatically assigns schema names to your files, or your schemas in your database. So, based on your name, you can get,

325 00:28:08.630 00:28:26.530 Godwin Ekainu: let’s say, Awaish dev intermediate, something like that. So, to prevent that, I really wanted the development environment to have dev staging, dev intermediate, and for the production environment, it should just use the schema name to make it cleaner. So, instead of doing prod…

326 00:28:28.000 00:28:38.709 Godwin Ekainu: prod intermediate, prod staging, prod marts, just use the name: marts, intermediate, analytics, staging, rather, for those environments.
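
The naming rule described here can be sketched in a few lines. In dbt this kind of rule is usually implemented by overriding the `generate_schema_name` macro; the Python below only mirrors the logic as described, and the function name `resolve_schema`, its parameters, and the `prod` target name are assumptions for illustration, not the candidate's actual macro.

```python
from typing import Optional


def resolve_schema(target_name: str, default_schema: str,
                   custom_schema: Optional[str]) -> str:
    """Return the schema a model should be built in (hypothetical helper).

    - No custom schema configured: fall back to the target's default schema.
    - Production target: use the custom schema name as-is (e.g. "marts").
    - Any other target: prefix with the target name (e.g. "dev_staging").
    """
    if custom_schema is None:
        return default_schema
    if target_name == "prod":  # assumed name of the production target
        return custom_schema
    return f"{target_name}_{custom_schema}"
```

Under these assumptions, a `staging` model built against the dev target would land in `dev_staging`, while the same model in production would land in plain `staging`.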

327 00:28:39.040 00:28:41.500 Godwin Ekainu: Am I missing anything here?

328 00:28:42.440 00:28:44.210 Godwin Ekainu: Kia.

329 00:28:45.800 00:28:48.029 Awaish Kumar: Yeah, we can look at the dbt part.

330 00:28:49.370 00:28:50.220 Godwin Ekainu: Sorry?

331 00:28:51.170 00:28:53.939 Awaish Kumar: Yeah, like, we can look at the dbt models you have created.

332 00:28:54.480 00:28:55.910 Godwin Ekainu: Okay, okay, that’s true.

333 00:28:58.680 00:29:07.229 Godwin Ekainu: So, for these staging schemas, staging orders and staging products, these are… so for the staging

334 00:29:07.460 00:29:21.699 Godwin Ekainu: environment, or staging layer, it is more of a one-to-one with your raw layer. Just, yeah, just doing basic type changes, column renames, typing, and all. So the, what do you call it, the…

335 00:29:22.150 00:29:32.819 Godwin Ekainu: the… you are just refining a little to make your data readable, make your data, presentable at the staging layer. So that… which is what I did here, so…

336 00:29:33.040 00:29:36.940 Godwin Ekainu: For each of these columns, all capital letter.

337 00:29:37.140 00:29:45.839 Godwin Ekainu: I am renaming them to lowercase letters, and also adding… not too much.

338 00:29:46.000 00:29:50.680 Godwin Ekainu: So this is just for Airbyte’s metadata management.

339 00:29:51.070 00:29:55.790 Godwin Ekainu: I’m calling these source products, raw products, basically, for these products.

340 00:29:55.940 00:29:59.399 Godwin Ekainu: For the staging orders, it’s doing the same thing.

341 00:29:59.650 00:30:02.330 Godwin Ekainu: For my created at, on…

342 00:30:02.520 00:30:09.350 Godwin Ekainu: my time columns, basically, my datetime columns, I changed the type to timestamp.

343 00:30:10.470 00:30:14.229 Godwin Ekainu: Then, for these columns, so we have some

344 00:30:14.550 00:30:19.400 Godwin Ekainu: orders without canceled at or closed at.

345 00:30:19.640 00:30:24.660 Godwin Ekainu: Then I’m checking, I’m converting them to null if empty, then assigning a timestamp type to it.

346 00:30:24.780 00:30:28.710 Godwin Ekainu: So then the others are just renaming,

347 00:30:29.270 00:30:35.009 Godwin Ekainu: Same thing here. I’m setting them to null if they are empty, if they’re empty, rather.

348 00:30:36.840 00:30:46.260 Godwin Ekainu: Changing the type to text, changing the type to text. And this is just the Airbyte metadata in the staging layer.

349 00:30:46.580 00:30:53.509 Godwin Ekainu: Same thing for the customers model, too. Both type… just the type inference, but…

350 00:30:56.590 00:31:02.410 Godwin Ekainu: This address was a JSON blob.

351 00:31:04.720 00:31:10.840 Godwin Ekainu: And I’m leaving… I left it as this… JSON blob in the… staging layer, so…

352 00:31:11.050 00:31:17.209 Godwin Ekainu: For the schema YAML, the schema that I… normally, I don’t usually combine

353 00:31:17.630 00:31:21.240 Godwin Ekainu: the sources… in the same file as the schema

354 00:31:21.450 00:31:22.330 Godwin Ekainu: file.

355 00:31:22.520 00:31:30.150 Godwin Ekainu: Normally, I have a sources YAML file, and I call this here. Because I’m not doing much for this, I’m just adding them in a single file.

356 00:31:31.330 00:31:36.050 Godwin Ekainu: then I have my mod… my… each table description.

357 00:31:36.540 00:31:43.319 Godwin Ekainu: and simple tests, so I’m just testing for not null and unique in each of these tables.

358 00:31:43.440 00:31:47.070 Godwin Ekainu: Then, here I’m referring, there’s a foreign key.

359 00:31:47.290 00:31:52.419 Godwin Ekainu: So… checking the relationships for the foreign keys, basically.

360 00:31:53.300 00:32:01.809 Godwin Ekainu: Same thing here, description, just simple tests of unique and not null on your ID, the product ID column.
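
The three kinds of checks mentioned for the staging layer (not null, unique, and a foreign-key relationship) can be sketched as plain Python over lists of dictionaries. This is only an illustration of what the checks assert, not dbt's implementation; the function names and the column names used in the usage note are assumptions. Each function returns the failing rows, so an empty list means the check passes.

```python
def not_null(rows, column):
    """Rows where the column is missing or None."""
    return [r for r in rows if r.get(column) is None]


def unique(rows, column):
    """Rows whose non-null value in the column was already seen."""
    seen, duplicates = set(), []
    for r in rows:
        value = r.get(column)
        if value is None:
            continue  # nulls are left to the not_null check
        if value in seen:
            duplicates.append(r)
        seen.add(value)
    return duplicates


def relationships(child_rows, column, parent_rows, parent_column):
    """Child rows whose non-null key has no matching parent row."""
    parent_keys = {p[parent_column] for p in parent_rows}
    return [
        r for r in child_rows
        if r.get(column) is not None and r[column] not in parent_keys
    ]
```

For example, an orders row with a hypothetical `customer_id` that does not appear in the customers table would be returned by `relationships(orders, "customer_id", customers, "id")`.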

361 00:32:03.240 00:32:08.749 Godwin Ekainu: For the intermediate layer… let me open this.

362 00:32:11.150 00:32:18.509 Godwin Ekainu: So, for the line item, which was a blob, the idea was to build it like an order summary, which

363 00:32:18.920 00:32:23.330 Godwin Ekainu: And for my own, for what I talked about, it was, just be, like, a simple…

364 00:32:23.450 00:32:27.139 Godwin Ekainu: order summary that shows, like, basic details about an order.

365 00:32:27.390 00:32:43.860 Godwin Ekainu: And to do that, the line… I decided to use the line item, and the line item is a blob, JSON blob, and I have to… I have to flatten that JSON blob. I decided to do that in the intermediate layer. So the intermediate layer contains where… the layer where you have your complex

366 00:32:44.100 00:32:47.910 Godwin Ekainu: Transformations, data processing.

367 00:32:48.400 00:32:53.789 Godwin Ekainu: Basically, trying to fit that data into a particular use case, a business use case, a business.

368 00:32:53.900 00:32:57.639 Godwin Ekainu: Yeah, business metrics. Metric.

369 00:32:58.230 00:33:11.579 Godwin Ekainu: So, I’m first flattening… okay, I’m referring to my staging orders here, I’m referring to my products here, and I’m flattening the data and assigning some types to the columns, some of the columns.

370 00:33:11.980 00:33:15.760 Godwin Ekainu: And then, after flattening it, I’m joining it

371 00:33:15.900 00:33:19.079 Godwin Ekainu: with the product table, based on the product ID.

372 00:33:19.350 00:33:22.230 Godwin Ekainu: To get, some other product information.
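
The flatten-then-join step described here can be sketched as follows. This is a minimal illustration, assuming the line items arrive as a JSON array string on each order; every field name (`line_items`, `product_id`, `quantity`, `price`, `title`) is hypothetical and not taken from the candidate's project.

```python
import json


def flatten_line_items(orders):
    """Turn each order's JSON line-item blob into one typed row per item."""
    rows = []
    for order in orders:
        # An empty or missing blob yields no line-item rows.
        for item in json.loads(order.get("line_items") or "[]"):
            rows.append({
                "order_id": order["order_id"],
                "product_id": int(item["product_id"]),
                "quantity": int(item["quantity"]),
                "price": float(item["price"]),
            })
    return rows


def join_products(line_rows, products):
    """Left join line-item rows to products on product_id."""
    by_id = {p["product_id"]: p for p in products}
    return [
        {**row, "product_title": by_id.get(row["product_id"], {}).get("title")}
        for row in line_rows
    ]
```

A left join is used so that a line item survives even when its product is missing from the product table; the product columns are simply left empty in that case.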

373 00:33:23.230 00:33:24.500 Godwin Ekainu: And casting also,

374 00:33:24.810 00:33:30.369 Godwin Ekainu: like, assigning types to the numeric columns, numeric types and boolean types to…

375 00:33:30.610 00:33:32.429 Godwin Ekainu: Some of the fields, too.

376 00:33:32.830 00:33:36.840 Godwin Ekainu: After joining, I’m doing my select order join,

377 00:33:37.290 00:33:44.249 Godwin Ekainu: you’ll notice in this project, I did not set up, like, SQL linting or parsing.

378 00:33:44.560 00:33:49.579 Godwin Ekainu: I didn’t want to complicate it that much. Normally, I’ll use SQLFluff

379 00:33:49.730 00:33:52.189 Godwin Ekainu: to Chan’s, to Chan.

380 00:33:52.380 00:33:54.510 Godwin Ekainu: Set up, what do you call it?

381 00:33:54.690 00:34:02.850 Godwin Ekainu: SQL lint rules, so that if… if my projects, if my…

382 00:34:03.660 00:34:08.070 Godwin Ekainu: my, what do you call it, my SQL script doesn’t follow a particular format and all,

383 00:34:08.219 00:34:14.239 Godwin Ekainu: I won’t be able to push to my GitHub repository, it will fail that and tell me to go back and fix.

384 00:34:14.780 00:34:16.800 Godwin Ekainu: The parsing error, really quick.

385 00:34:17.000 00:34:21.219 Godwin Ekainu: For the int orders, I’m just basically calling the orders,

386 00:34:22.239 00:34:39.269 Godwin Ekainu: I did this… I’m doing a basic join, so I did this because I did not want to do, like, a complex transformation in my summary layer, that is, the marts layer. I prefer to just do, like, a basic join, calling the tables directly in the marts layer, to keep it simple.

387 00:34:39.679 00:34:43.870 Godwin Ekainu: Sorry, let me… I’ve missed this.

388 00:34:46.469 00:34:54.460 Godwin Ekainu: So in my YAML file, my int YAML file, I’m following the same principle, basic description of the tables.

389 00:34:54.750 00:35:00.760 Godwin Ekainu: Then, basic testing on some tables, so… as follows.

390 00:35:02.080 00:35:09.469 Godwin Ekainu: So, you have a not… not null test, and you have a not null and unique test on the order ID, and a not null test on the customer ID here.

391 00:35:09.670 00:35:13.690 Godwin Ekainu: For the int order line items, this is me,

392 00:35:14.120 00:35:18.970 Godwin Ekainu: like I said, flattening the line item to, like, a single row.

393 00:35:19.320 00:35:22.800 Godwin Ekainu: Line item, and joining that to the,

394 00:35:23.220 00:35:28.120 Godwin Ekainu: What do you call it, the orders table… no, the customers… the product table.

395 00:35:28.380 00:35:32.520 Godwin Ekainu: So I’m also doing basic tests on some fields, the order ID,

396 00:35:32.790 00:35:37.239 Godwin Ekainu: not null test. The line item ID, not null test.

397 00:35:37.640 00:35:40.070 Godwin Ekainu: Then basic description.

398 00:35:40.330 00:35:47.470 Godwin Ekainu: for… I was checking, the refuge.

399 00:35:48.860 00:35:52.680 Godwin Ekainu: Down to my summary table.

400 00:35:52.940 00:35:58.070 Godwin Ekainu: I have, my orders here. I’m calling my reference, my int orders.

401 00:35:58.390 00:36:01.740 Godwin Ekainu: Then, also calling.

402 00:36:01.940 00:36:10.449 Godwin Ekainu: G… line item, metrics, some aggregating metrics on the… aggregating metrics on the line item, and intermediate table.

403 00:36:10.790 00:36:16.999 Godwin Ekainu: And I’m calling the, doing, like, a join, combining them together to give me, like, an order summary.

404 00:36:17.940 00:36:21.190 Godwin Ekainu: So, in practice, let me see…

405 00:36:21.540 00:36:23.479 Godwin Ekainu: Let me show you how it looks like.

406 00:36:37.880 00:36:41.160 Awaish Kumar: Yeah, let’s talk about, like, for example.

407 00:36:42.320 00:36:47.580 Awaish Kumar: In dbt, we have created a marts table, marts summary, yeah,

408 00:36:48.340 00:36:52.320 Awaish Kumar: like, if… order summary, sorry. So, if…

409 00:36:52.760 00:36:59.160 Awaish Kumar: If, like, that table grows to, like, hundreds of millions of rows, and…

410 00:36:59.490 00:37:01.870 Awaish Kumar: And it becomes really slow to execute it.

411 00:37:01.990 00:37:05.050 Awaish Kumar: What changes you would make, to optimize it?

412 00:37:06.850 00:37:12.070 Godwin Ekainu: Normally I would break down that table into multiple…

413 00:37:12.510 00:37:18.530 Godwin Ekainu: So, I follow three approaches. First, I partition the table to ensure…

414 00:37:18.940 00:37:24.490 Godwin Ekainu: Partition and cluster the table. So this is the approach I follow on BigQuery. I’m not sure

415 00:37:25.100 00:37:31.959 Godwin Ekainu: about Snowflake, but I think it’s a general practice to partition and cluster your table, so that you ensure that,

416 00:37:32.370 00:37:41.620 Godwin Ekainu: when you’re… when you’re querying that table, you… you also enforce that partition filter. So, what I mean by that is, when you’re querying a partitioned table,

417 00:37:41.720 00:37:58.020 Godwin Ekainu: you have to query by… you have to filter by the partition column. If not, the query is not going to run. I don’t know if that works for Snowflake, but in BigQuery, we do that. So if I partition, like, an orders table accordingly, and someone is going to query that table, you have to put… filter by the partition field.

418 00:37:58.270 00:38:05.529 Godwin Ekainu: To ensure that the query runs smoothly. That shortens the query time and increases the… increases the speed.

419 00:38:05.730 00:38:19.719 Godwin Ekainu: Basically, for the query, and it also reduces the cost, reduces the processing time, so… so instead of querying or scanning the entire table, it just picks the data from the particular partition you’re interested in, and…

420 00:38:19.830 00:38:32.859 Godwin Ekainu: gives that to you. So partitioning, and of course clustering, works. If that doesn’t work, I break down the table into multiple tables. Basically, I reduce the… the… instead of doing… when creating a table, instead of doing, like, a…

421 00:38:33.110 00:38:45.690 Godwin Ekainu: what do you call it? Let’s say you have 100 million records. I archive part of the data into, like, an archive layer, then just query the latest layer.

422 00:38:46.090 00:39:02.969 Godwin Ekainu: view the table using the latest data, basically. So, data for the last few years, instead of data for the last 20 years, I reduce that table to maybe query data for the last 5 years or so, but I don’t do this without discussing with the stakeholders who depend on this data.
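
The partitioning idea in this answer can be sketched as a toy in-memory table. This is an illustration only, not how BigQuery works internally: the class name, the month-based partition key, and the `order_date` field are assumptions. The mandatory filter mirrors the behaviour described in the answer, which in BigQuery corresponds to tables created with the `require_partition_filter` option.

```python
from collections import defaultdict
from datetime import date


class PartitionedTable:
    """Toy table partitioned by month of a hypothetical order_date column."""

    def __init__(self):
        self._partitions = defaultdict(list)

    def insert(self, row):
        # Route each row to the partition for its month, e.g. "2026-03".
        self._partitions[row["order_date"].strftime("%Y-%m")].append(row)

    def query(self, month=None):
        # Refuse to run without a partition filter, so no full-table scans.
        if month is None:
            raise ValueError("partition filter on month is required")
        # Only the requested partition is read; other months are never touched.
        return list(self._partitions.get(month, []))

    def total_rows(self):
        return sum(len(rows) for rows in self._partitions.values())
```

Reading one month touches only that month's rows, which is where the saving in scan cost and query time comes from; archiving old data, as described above, shrinks the table further in the same spirit.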

423 00:39:07.100 00:39:09.650 Godwin Ekainu: Hmm, I hope that answers the question.

424 00:39:14.960 00:39:17.099 Awaish Kumar: Yeah, Demi, you have any follow-up?

425 00:39:17.650 00:39:23.680 Demilade Agboola: I think my question would just be around, ensure… how do we ensure that the tests

426 00:39:24.470 00:39:30.749 Demilade Agboola: On the data, like, if anything goes wrong with the data, we’re the first people to know before any stakeholders.

427 00:39:33.390 00:39:40.950 Godwin Ekainu: So normally… normally, you, you… in dbt Core,

428 00:39:41.250 00:39:45.860 Godwin Ekainu: I don’t know how to do that on dbt Core. In dbt Cloud, you usually do, like, a

429 00:39:46.010 00:39:50.019 Godwin Ekainu: data quality check on your data, so, like, your source freshness.

430 00:39:50.150 00:39:54.090 Godwin Ekainu: And you do, like, your data quality checks in your dbt YAML file, too.

431 00:39:54.250 00:40:02.489 Godwin Ekainu: So if there’s anything wrong with the data during the job run, it’s, we set up, like, an alert to your Slack channel.

432 00:40:02.690 00:40:21.239 Godwin Ekainu: dbt has a way of doing that easily. So, once there’s something wrong with the data, based on your data quality, your source freshness check, it sends it… it sends the alert basically to a Slack channel, or your email, some… wherever you decide you want to receive your alert, and when you get that, you go there and fix it,

433 00:40:21.640 00:40:32.300 Godwin Ekainu: Before… so it enforces, I… it enforces that the, the, what do you call it? So this is usually done on the staging layer, before it gets to the production layer.

434 00:40:32.570 00:40:41.469 Godwin Ekainu: Basically, so once you see that, you go back and fix it, and you run your jobs, and it sends the data to the, run the production job, rather.

435 00:40:53.570 00:41:01.889 Awaish Kumar: Okay, yeah, can you, like… Name, what is the materialization in dbt, and what are different materializations?

436 00:41:03.020 00:41:07.780 Godwin Ekainu: So in dbt, we have the…

437 00:41:08.180 00:41:13.440 Godwin Ekainu: We have the table, the view, we have the incremental, we have the ephemeral.

438 00:41:13.690 00:41:21.369 Godwin Ekainu: I think it also depends on the, so those are materializations.

439 00:41:21.930 00:41:24.680 Godwin Ekainu: And I think there’s something…

440 00:41:25.100 00:41:30.449 Godwin Ekainu: Those are, like, the forms I’m familiar with, basically. I’m not sure if there are others.

441 00:41:31.300 00:41:36.269 Uttam Kumaran: Are you familiar… yeah, I guess, are you familiar with, like, in what situation you would use…

442 00:41:36.400 00:41:38.429 Uttam Kumaran: Like, incremental, for example.

443 00:41:39.890 00:41:49.379 Godwin Ekainu: So for incremental is when you don’t want to, do, like, a full run on your entire data, because when you’re inserting into your… when you’re creating your…

444 00:41:49.500 00:41:51.729 Godwin Ekainu: your models. So, for example.

445 00:41:51.940 00:41:59.920 Godwin Ekainu: When you do, like, an incremental run, it checks your source table, your destination table, right, and compares against yours.

446 00:42:00.200 00:42:07.810 Godwin Ekainu: The source, and checks that, and based on the, based on the field, basically.

447 00:42:08.030 00:42:23.789 Godwin Ekainu: It checks and sees that, if there’s no data in a particular… if data is missing from a particular partition, a particular row, it inserts that data. So instead of doing, like, a full run, where you rerun your whole table, or you create… recreate the whole table, if you… you recreate the whole table, it’s…

448 00:42:23.940 00:42:29.739 Godwin Ekainu: Only inserts what’s not… what doesn’t… what’s not existing in that particular table.
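
The difference between a full run and an incremental run, as described in this answer, can be sketched like this. It follows the spoken description (insert only what is not already in the destination) rather than dbt's actual implementation; real dbt incremental models typically filter new rows inside an `is_incremental()` block or merge on a configured `unique_key`. The function names and the `order_id` key are assumptions.

```python
def full_refresh(source_rows):
    """Rebuild the destination from scratch: every source row is reprocessed."""
    return list(source_rows)


def incremental_run(source_rows, target_rows, key="order_id"):
    """Append only source rows whose key is not yet in the destination.

    Returns the newly inserted rows so the caller can see how little work
    was done compared with a full refresh.
    """
    existing_keys = {row[key] for row in target_rows}
    new_rows = [row for row in source_rows if row[key] not in existing_keys]
    target_rows.extend(new_rows)
    return new_rows
```

Running `incremental_run` twice against the same source inserts nothing the second time, which is the property that makes scheduled runs cheap on large tables.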

449 00:42:31.400 00:42:31.960 Uttam Kumaran: Okay.

450 00:42:35.130 00:42:39.110 Godwin Ekainu: So everybody’s gonna work.

451 00:42:39.650 00:42:43.400 Awaish Kumar: Yeah, I think that’s it for me. Uttam, Demi, if you have anything else.

452 00:42:44.610 00:42:48.129 Uttam Kumaran: Yeah, I guess, Godwin, any questions for us?

453 00:42:48.670 00:42:54.859 Uttam Kumaran: like, anything as part of this process, or any questions while you have the three of us, about Brainforge, or anything you’d like to ask?

454 00:42:56.890 00:42:59.110 Godwin Ekainu: So…

455 00:42:59.250 00:43:09.949 Godwin Ekainu: Don’t have any questions. So for this interview, I didn’t prepare any questions, because I know I… discussing with Awaish and Uttam… I already asked a lot of the questions I was…

456 00:43:10.380 00:43:12.980 Godwin Ekainu: I was particularly interested in asking.

457 00:43:13.650 00:43:21.200 Godwin Ekainu: My question will just be basically on the project, so, why,

458 00:43:21.830 00:43:27.610 Godwin Ekainu: How did you find the project for the entire, solution,

459 00:43:28.140 00:43:33.139 Godwin Ekainu: How did you see it? What was your feedback based on the entire solution?

460 00:43:35.060 00:43:49.979 Uttam Kumaran: Yeah, I mean, I think I always love to see, like, a broader depth in, like, data engineering, so I think you have, like, a lot of depth there in, like, setting up Airbyte, and sort of how you’re thinking about, like, grants, and I think you have a pretty good understanding of dbt, so that’s probably my feedback.

461 00:43:54.550 00:44:00.099 Godwin Ekainu: Thank you, okay, so, I guess…

462 00:44:00.730 00:44:07.959 Godwin Ekainu: I didn’t show you guys… I don’t know if you guys want to see the run. I left this to be… to run for…

463 00:44:09.010 00:44:10.630 Godwin Ekainu: visits.

464 00:44:10.930 00:44:11.850 Godwin Ekainu: Thank you.

465 00:44:11.960 00:44:14.569 Godwin Ekainu: So it has been running since I deployed it, so…

466 00:44:16.070 00:44:23.710 Godwin Ekainu: So I set up a… what do you call it? A production instance on PlanetScale to test out the production on…

467 00:44:23.820 00:44:26.630 Godwin Ekainu: Basically. It’s been running for a while.

468 00:44:27.310 00:44:31.449 Godwin Ekainu: I didn’t show you guys the data, but I guess, I’m not sure about the…

469 00:44:31.740 00:44:33.919 Godwin Ekainu: Credentials I use for contacts now.

470 00:44:36.070 00:44:38.889 Godwin Ekainu: In case that was all.

471 00:44:43.720 00:44:44.560 Uttam Kumaran: Right.

472 00:44:49.910 00:44:53.330 Godwin Ekainu: So, any questions at handoffs?

473 00:44:54.360 00:44:56.139 Uttam Kumaran: Yeah, I think that’s it from my side.

474 00:45:00.090 00:45:01.659 Demilade Agboola: Yeah, that’s it from my side, too.

475 00:45:02.070 00:45:02.660 Uttam Kumaran: Okay.

476 00:45:03.210 00:45:06.720 Uttam Kumaran: Perfect. Alright, thank you, everyone. Thank you, Godwin. Appreciate it.

477 00:45:07.580 00:45:08.490 Godwin Ekainu: Thank you, everyone.

478 00:45:09.040 00:45:10.109 Awaish Kumar: Okay, nice.

479 00:45:10.110 00:45:11.230 Godwin Ekainu: Chatting with you.

480 00:45:11.710 00:45:13.019 Uttam Kumaran: Yeah, appreciate it.

481 00:45:13.020 00:45:14.170 Godwin Ekainu: Right. Talk to you soon.

482 00:45:14.280 00:45:14.850 Uttam Kumaran: Bye.