Meeting Title: Brainforge Final Interview Date: 2026-03-13 Meeting participants: Uttam Kumaran, Godwin Ekainu, Awaish Kumar, Demilade Agboola
WEBVTT
1 00:00:16.570 ⇒ 00:00:17.760 Uttam Kumaran: Hello!
2 00:00:23.050 ⇒ 00:00:24.150 Godwin Ekainu: Hi, Uttam.
3 00:00:24.470 ⇒ 00:00:26.449 Uttam Kumaran: Hey, how are you? Good to see you.
4 00:00:28.160 ⇒ 00:00:30.170 Godwin Ekainu: I’m doing great. How are you doing?
5 00:00:30.600 ⇒ 00:00:31.320 Uttam Kumaran: Good.
6 00:00:31.440 ⇒ 00:00:33.329 Uttam Kumaran: How’s the… how’s the week going?
7 00:00:34.170 ⇒ 00:00:39.479 Godwin Ekainu: Yeah, it’s been fine so far. The week went great. How was yours?
8 00:00:40.890 ⇒ 00:00:43.909 Uttam Kumaran: It’s good. It’s busy. Yeah. Busy.
9 00:00:45.420 ⇒ 00:00:52.270 Uttam Kumaran: It’s good, though. It’s good busy. Yeah, we’re… we’re… the team is growing, and…
10 00:00:53.630 ⇒ 00:00:57.260 Uttam Kumaran: Yeah, team is growing, and I feel like it’s… it’s just been,
11 00:00:57.390 ⇒ 00:01:01.179 Uttam Kumaran: Good to start to work with some new clients and some new capabilities, so…
12 00:01:02.150 ⇒ 00:01:03.650 Godwin Ekainu: That’s on this 18.
13 00:01:04.060 ⇒ 00:01:04.730 Uttam Kumaran: Yeah…
14 00:01:04.730 ⇒ 00:01:06.280 Godwin Ekainu: Obviously, lots is going on.
15 00:01:06.650 ⇒ 00:01:07.290 Uttam Kumaran: Yeah.
16 00:01:07.490 ⇒ 00:01:09.990 Uttam Kumaran: What’s the… what’s the weekend plan, tell me?
17 00:01:12.330 ⇒ 00:01:20.390 Godwin Ekainu: So my weekend plan is, I’ve been working on the… so I shut down my home lab. Not really shut down, but I’m trying to automate it.
18 00:01:20.890 ⇒ 00:01:27.879 Godwin Ekainu: So I… I’m trying to… have you heard… do you know what’s called PXE boot?
19 00:01:28.470 ⇒ 00:01:36.099 Godwin Ekainu: No. So there’s something… there’s something called PXE boot, where you can… boot a new machine without…
20 00:01:36.100 ⇒ 00:01:36.989 Uttam Kumaran: PX reboot?
21 00:01:37.900 ⇒ 00:01:40.630 Godwin Ekainu: PXE boot, so it’s a network boot.
22 00:01:41.030 ⇒ 00:01:42.170 Godwin Ekainu: PXE.
23 00:01:42.910 ⇒ 00:01:44.270 Godwin Ekainu: Okay, okay, okay, okay.
24 00:01:45.150 ⇒ 00:01:55.119 Godwin Ekainu: So I’m trying to set that up for my home lab so that when I add a new machine to it, it automatically starts up the machine, assigns an IP address to it, then installs Ubuntu.
25 00:01:55.400 ⇒ 00:02:01.710 Godwin Ekainu: Sets up the server, sets up the network, then joins the… my… my cluster.
26 00:02:02.630 ⇒ 00:02:04.660 Uttam Kumaran: Compared to that, this weekend.
27 00:02:05.430 ⇒ 00:02:08.660 Uttam Kumaran: Nice! I didn’t know you were doing a lot of, like, networking stuff.
28 00:02:09.440 ⇒ 00:02:17.480 Godwin Ekainu: Yeah, I play around with it. So I have a home lab, a three-node home lab, where I set up Kubernetes, and I host some stuff on it.
29 00:02:17.620 ⇒ 00:02:19.870 Godwin Ekainu: I play around with infrastructure and all.
30 00:02:21.030 ⇒ 00:02:23.370 Uttam Kumaran: Oh, interesting, great. Nice.
31 00:02:25.010 ⇒ 00:02:25.530 Godwin Ekainu: Kevin.
32 00:02:26.410 ⇒ 00:02:27.020 Awaish Kumar: Aye.
33 00:02:27.710 ⇒ 00:02:30.160 Godwin Ekainu: Hi, Awaish. How are you doing?
34 00:02:30.640 ⇒ 00:02:31.980 Awaish Kumar: I’m good, how about you?
35 00:02:32.600 ⇒ 00:02:33.880 Godwin Ekainu: I’m doing great.
36 00:02:34.000 ⇒ 00:02:34.820 Godwin Ekainu: Thank you.
37 00:02:36.700 ⇒ 00:02:38.309 Uttam Kumaran: Awaish, is it just me and you?
38 00:02:40.370 ⇒ 00:02:41.850 Awaish Kumar: Demi’s coming.
39 00:02:42.790 ⇒ 00:02:43.949 Awaish Kumar: Might be coming, yeah.
40 00:02:46.220 ⇒ 00:02:47.799 Uttam Kumaran: Okay, let’s go ahead and get started.
41 00:02:48.040 ⇒ 00:02:53.959 Uttam Kumaran: Yeah, I think, Godwin, I don’t know if you… if Awaish, you guys already met once before.
42 00:02:55.120 ⇒ 00:02:55.780 Godwin Ekainu: Yes.
43 00:02:55.890 ⇒ 00:02:57.200 Awaish Kumar: Pretty much.
44 00:02:58.400 ⇒ 00:03:01.720 Uttam Kumaran: Cool, then yeah, I think we can get into the exercise. Yeah, feel free.
45 00:03:02.620 ⇒ 00:03:03.830 Godwin Ekainu: Okay,
46 00:03:04.730 ⇒ 00:03:10.800 Godwin Ekainu: So where should I start from? I don’t know if you guys have seen… gone through the…
47 00:03:11.130 ⇒ 00:03:12.240 Godwin Ekainu: SSAs.
48 00:03:13.370 ⇒ 00:03:14.880 Godwin Ekainu: The solution, rather.
49 00:03:15.940 ⇒ 00:03:18.520 Uttam Kumaran: Yeah, I think one thing that would be… Awaish, go ahead.
50 00:03:19.730 ⇒ 00:03:27.450 Awaish Kumar: Yeah, I think I… we have reviewed the submitted challenge, but we want you to kind of give a demo of…
51 00:03:27.570 ⇒ 00:03:31.530 Awaish Kumar: What you have worked on, and how it looks like, what…
52 00:03:31.690 ⇒ 00:03:36.060 Awaish Kumar: How you made the choices, or… While communicating.
53 00:03:36.400 ⇒ 00:03:37.090 Awaish Kumar: Yep.
54 00:03:37.880 ⇒ 00:03:40.430 Godwin Ekainu: So let me share my window.
55 00:03:43.580 ⇒ 00:03:45.450 Godwin Ekainu: I’m just meant, screen.
56 00:03:47.230 ⇒ 00:03:48.899 Godwin Ekainu: I’m sorry, can you see my screen?
57 00:03:50.130 ⇒ 00:03:50.930 Uttam Kumaran: Yes.
58 00:03:52.380 ⇒ 00:03:55.669 Godwin Ekainu: Yeah. So,
59 00:04:00.800 ⇒ 00:04:04.619 Godwin Ekainu: Sorry, Awaish. You were saying something.
60 00:04:06.130 ⇒ 00:04:11.300 Awaish Kumar: No, no, I just… I was just saying, I can see your screen, you can start.
61 00:04:12.220 ⇒ 00:04:12.990 Godwin Ekainu: Okay.
62 00:04:13.330 ⇒ 00:04:15.669 Godwin Ekainu: So for the challenge,
63 00:04:16.410 ⇒ 00:04:22.500 Godwin Ekainu: It was really straightforward for me, so it had a lot of information in it, basically.
64 00:04:22.710 ⇒ 00:04:26.370 Godwin Ekainu: I think it’s… you stated that I actually…
65 00:04:26.550 ⇒ 00:04:31.590 Godwin Ekainu: I was going to use Airbyte to set up the… for the ingestion.
66 00:04:31.810 ⇒ 00:04:36.199 Uttam Kumaran: Yeah, and Godwin, we’re still seeing… we’re seeing ourselves on the Zoom. I don’t know if you’re sharing something else.
67 00:04:37.430 ⇒ 00:04:39.469 Godwin Ekainu: I’m sharing my screen.
68 00:04:43.100 ⇒ 00:04:44.360 Godwin Ekainu: Sorry.
69 00:04:49.110 ⇒ 00:04:49.960 Godwin Ekainu: Okay.
70 00:04:49.960 ⇒ 00:04:52.779 Uttam Kumaran: I was seeing your screen, but it was just, like, us on the Zoom.
71 00:04:53.710 ⇒ 00:04:59.109 Godwin Ekainu: Yeah, I think I made a mistake. Can you see my setup?
72 00:04:59.330 ⇒ 00:05:01.960 Godwin Ekainu: Yes.
73 00:05:02.180 ⇒ 00:05:03.120 Uttam Kumaran: Yeah, yeah, yeah.
74 00:05:04.600 ⇒ 00:05:06.369 Godwin Ekainu: I’m sharing my VS Code.
75 00:05:06.780 ⇒ 00:05:07.940 Awaish Kumar: Yes.
76 00:05:08.300 ⇒ 00:05:14.320 Godwin Ekainu: So, for the Airbyte installation, that didn’t take much time.
77 00:05:14.510 ⇒ 00:05:18.679 Godwin Ekainu: One thing I noticed was that Airbyte did not have a Docker Compose setup,
78 00:05:18.890 ⇒ 00:05:24.929 Godwin Ekainu: a Docker setup. The last time I used it, it had one, so it was quite shocking when I went in and…
79 00:05:25.150 ⇒ 00:05:29.019 Godwin Ekainu: it had, like, a command-line tool to set it up locally.
80 00:05:29.270 ⇒ 00:05:39.369 Godwin Ekainu: And so I used their command-line tool. I noticed that it installs Kubernetes, and installs Airbyte on top of the Kubernetes kind cluster.
81 00:05:39.620 ⇒ 00:05:42.159 Godwin Ekainu: Which is what I did, so…
82 00:05:42.640 ⇒ 00:05:48.899 Godwin Ekainu: You can see that, if I do Airbyte… abctl, sorry, let me do that again.
83 00:05:53.510 ⇒ 00:05:57.709 Godwin Ekainu: When you do this, it checks and sees that Airbyte is installed.
84 00:05:57.810 ⇒ 00:06:09.770 Godwin Ekainu: To get my credentials, I just run abctl local credentials, and I get them on my command line. I’m exposing this, though, because it’s localhost, I don’t think anyone has access to it.
85 00:06:10.050 ⇒ 00:06:16.730 Godwin Ekainu: So once Airbyte was installed,
86 00:06:18.130 ⇒ 00:06:24.359 Godwin Ekainu: the next part was to try out Airbyte, basically, and see how it works.
87 00:06:24.650 ⇒ 00:06:28.120 Godwin Ekainu: So I thought about ingestion,
88 00:06:29.930 ⇒ 00:06:32.810 Godwin Ekainu: So let me, do this.
89 00:06:39.250 ⇒ 00:06:41.249 Godwin Ekainu: So, the setup was like this.
90 00:06:44.680 ⇒ 00:06:46.950 Godwin Ekainu: And Airbyte, the…
91 00:06:53.650 ⇒ 00:06:54.780 Godwin Ekainu: Postgres
92 00:06:55.900 ⇒ 00:07:03.140 Godwin Ekainu: for the, destination… I buy things, they tend to…
93 00:07:06.860 ⇒ 00:07:08.280 Godwin Ekainu: GCS…
94 00:07:08.870 ⇒ 00:07:16.159 Godwin Ekainu: So the reason I used GCS was that I needed a way to host the data, and I tried doing that locally.
95 00:07:17.210 ⇒ 00:07:23.219 Godwin Ekainu: But I could not find a way to… to…
96 00:07:23.460 ⇒ 00:07:28.609 Godwin Ekainu: link, mount the data directory into the Airbyte directory on my local machine.
97 00:07:28.860 ⇒ 00:07:37.609 Godwin Ekainu: Airbyte has a command where if you’re setting up Airbyte, the installation, you do, like, a volume path directory. I tried that, it didn’t work.
98 00:07:37.760 ⇒ 00:07:48.750 Godwin Ekainu: So I decided to just upload the files to GCS, and the way… the way I uploaded them was into, what do you call them? Directories, basically.
99 00:07:48.960 ⇒ 00:07:51.220 Godwin Ekainu: You can’t see it here, so…
100 00:07:51.360 ⇒ 00:07:57.249 Godwin Ekainu: We have the customers, we have the orders, and we have the products, and each file went into a separate directory.
101 00:07:57.520 ⇒ 00:08:03.269 Godwin Ekainu: And in a normal production setting, each file would have, like, a partitioning.
102 00:08:03.500 ⇒ 00:08:06.880 Godwin Ekainu: So each day’s data goes into a separate partition,
103 00:08:07.030 ⇒ 00:08:15.850 Godwin Ekainu: where you have, like, your file name, the partition date, and some random codes or numbers, basically.
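A minimal sketch of the partitioned layout described in the cues above. The `dt=<date>` folder convention, the `.jsonl` extension in the key, and the eight-character random suffix are assumptions for illustration; the actual bucket layout was not shown in the meeting.

```python
from datetime import date
from typing import Optional
from uuid import uuid4


def partitioned_key(entity: str, day: date, suffix: Optional[str] = None) -> str:
    """Build an object key: <entity>/dt=<YYYY-MM-DD>/<entity>_<suffix>.jsonl.

    The naming scheme is hypothetical; it only mirrors the three parts
    mentioned in the discussion (file name, partition date, random code).
    """
    # Fall back to a short random code when no suffix is supplied.
    suffix = suffix or uuid4().hex[:8]
    return f"{entity}/dt={day.isoformat()}/{entity}_{suffix}.jsonl"


print(partitioned_key("orders", date(2026, 3, 8), "a1b2c3d4"))
```

Keeping the date in its own path segment means one day's data can be listed or reloaded without touching the other partitions.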
104 00:08:16.310 ⇒ 00:08:24.889 Godwin Ekainu: So I uploaded the files into GCS, then I used Airbyte. I set up, like, an Airbyte connection, so I set up the source connection.
105 00:08:26.100 ⇒ 00:08:28.289 Godwin Ekainu: So this is it.
106 00:08:29.220 ⇒ 00:08:37.690 Godwin Ekainu: For the source connection, source name GCS, I created a service account, and the service account just had the…
107 00:08:38.049 ⇒ 00:08:46.319 Godwin Ekainu: GCS data viewer role and GCS bucket object role. I don’t know if I’m calling that correctly. And I set up my streams.
108 00:08:46.590 ⇒ 00:08:54.179 Godwin Ekainu: For each… for each file, I have a separate stream, so for customers, I’m looking for
109 00:08:54.350 ⇒ 00:08:55.680 Godwin Ekainu: a file
110 00:08:56.040 ⇒ 00:08:58.849 Godwin Ekainu: name, basically, depending on…
111 00:08:59.330 ⇒ 00:09:07.760 Godwin Ekainu: how it’s arranged in the bucket. I think for a production use case, when you have multiple files, you’d probably just use an asterisk
112 00:09:08.150 ⇒ 00:09:10.280 Godwin Ekainu: for all the files in that directory.
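A small sketch of the asterisk idea mentioned above: one glob per stream picks up every file in that stream's directory. It uses Python's standard-library `PurePosixPath.match` as a stand-in, which is an assumption; Airbyte's own glob matcher may behave differently, and the object keys below are made up.

```python
from pathlib import PurePosixPath


def select_keys(keys, pattern):
    """Return the object keys whose path matches the glob pattern."""
    # A single * matches within one path segment only.
    return [k for k in keys if PurePosixPath(k).match(pattern)]


# Hypothetical bucket listing for illustration.
keys = [
    "customers/customers.jsonl",
    "orders/orders_2026-03-08.jsonl",
    "orders/orders_2026-03-09.jsonl",
    "orders/README.txt",
    "products/products.jsonl",
]

print(select_keys(keys, "orders/*.jsonl"))
```

With an exact file name the stream sees only that one file; with the glob, newly uploaded daily files are picked up without editing the source configuration.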
113 00:09:10.450 ⇒ 00:09:18.380 Godwin Ekainu: This was just optional, if you had multiple days, to backfill for 3 days.
114 00:09:18.560 ⇒ 00:09:23.589 Godwin Ekainu: Then, the rest were… The rest follow the same pattern.
115 00:09:23.840 ⇒ 00:09:30.530 Godwin Ekainu: I chose the JSONL format here, to ensure that it follows the format of the files
116 00:09:30.740 ⇒ 00:09:31.720 Godwin Ekainu: themselves.
117 00:09:32.070 ⇒ 00:09:38.739 Godwin Ekainu: Then I tested the connection and made sure it was working.
118 00:09:38.930 ⇒ 00:09:43.540 Godwin Ekainu: After setting up my source, I went to set up my destination.
119 00:09:43.670 ⇒ 00:09:50.179 Godwin Ekainu: I have a lot of destinations, because I’m using two currently: the local PG, to connect to the local Postgres
120 00:09:50.400 ⇒ 00:10:02.070 Godwin Ekainu: for local runs, and I have a Postgres instance running on PlanetScale that I use to test for the GitHub Actions and production instance.
121 00:10:02.180 ⇒ 00:10:06.240 Godwin Ekainu: So for the local PG, I used,
122 00:10:07.470 ⇒ 00:10:11.650 Godwin Ekainu: So, the host is my Docker bridge network.
123 00:10:11.820 ⇒ 00:10:18.140 Godwin Ekainu: So I’m using this because when you do… when you set up
124 00:10:18.780 ⇒ 00:10:22.459 Godwin Ekainu: what do you call it? So Airbyte is set up inside a Docker container.
125 00:10:22.610 ⇒ 00:10:27.010 Godwin Ekainu: Postgres is set up using Docker Compose.
126 00:10:27.130 ⇒ 00:10:36.139 Godwin Ekainu: And for them to communicate, you have to use a bridge, so Airbyte can’t really access, what do you call it, Postgres, because of…
127 00:10:36.310 ⇒ 00:10:46.069 Godwin Ekainu: because they are not directly in the same network, so you have to bridge them. So, when you run stuff on Docker, you can do…
128 00:10:47.420 ⇒ 00:10:49.390 Godwin Ekainu: What was the command again?
129 00:10:54.400 ⇒ 00:10:57.110 Godwin Ekainu: It’s in my docs somewhere, I can’t really remember it.
130 00:10:58.300 ⇒ 00:11:07.430 Godwin Ekainu: Then I use the IP address, basically, for this. Then my port of 5432, the database is Shopify, schema raw…
131 00:11:07.660 ⇒ 00:11:10.749 Godwin Ekainu: Then the role was the ingestion role.
132 00:11:11.040 ⇒ 00:11:13.180 Godwin Ekainu: So no SSL certificate or so
133 00:11:13.740 ⇒ 00:11:16.790 Godwin Ekainu: Configured for this, then my database password.
134 00:11:17.450 ⇒ 00:11:24.120 Godwin Ekainu: Then this was just optional, or default settings, basically. So I tested the connection,
135 00:11:25.590 ⇒ 00:11:27.680 Godwin Ekainu: made sure it was connecting before I…
136 00:11:27.850 ⇒ 00:11:30.519 Godwin Ekainu: before I triggered the job.
137 00:11:30.820 ⇒ 00:11:33.319 Godwin Ekainu: So after doing that,
138 00:11:34.000 ⇒ 00:11:36.549 Godwin Ekainu: Created… created, like, a connection to…
139 00:11:36.940 ⇒ 00:11:42.729 Godwin Ekainu: sync data from the look… from the storage location into the Postgres instance.
140 00:11:43.580 ⇒ 00:11:44.070 Awaish Kumar: Okay.
141 00:11:45.350 ⇒ 00:11:45.930 Godwin Ekainu: Wow.
142 00:11:46.560 ⇒ 00:11:47.610 Godwin Ekainu: So, okay.
143 00:11:47.610 ⇒ 00:11:53.119 Awaish Kumar: Sorry, I have a question. If I have to read… instead of reading from Google Cloud Storage, if I just have to read
144 00:11:53.260 ⇒ 00:11:54.720 Awaish Kumar: From local machine.
145 00:11:55.640 ⇒ 00:11:58.520 Awaish Kumar: Can you reproduce the flow?
146 00:12:00.130 ⇒ 00:12:07.819 Godwin Ekainu: Yes, if I’m able to get the local… local path, the local path…
147 00:12:07.820 ⇒ 00:12:08.350 Awaish Kumar: Pretty?
148 00:12:10.470 ⇒ 00:12:11.190 Godwin Ekainu: Sorry?
149 00:12:11.730 ⇒ 00:12:16.700 Awaish Kumar: The files are already, like, they’re… you can… you’re able to download it, right?
150 00:12:16.870 ⇒ 00:12:17.910 Awaish Kumar: Yeah, so…
151 00:12:17.910 ⇒ 00:12:24.469 Godwin Ekainu: Yes, I downloaded them, so I had issues with mounting the files on my local machine.
152 00:12:24.820 ⇒ 00:12:32.660 Godwin Ekainu: Basically, to enable Airbyte to read the file, you have to mount the file to Airbyte,
153 00:12:33.330 ⇒ 00:12:48.560 Godwin Ekainu: into the Airbyte data directory. When I did that, Airbyte wasn’t reading… actually seeing the file. I suspect it has to do with some networking issue, because I tried other approaches, and that didn’t work, so…
154 00:12:48.660 ⇒ 00:12:53.679 Godwin Ekainu: Airbyte has a… Let me see if I can look at the documentation.
155 00:12:56.790 ⇒ 00:12:58.810 Awaish Kumar: Okay, that’s okay, we can move on.
156 00:12:59.370 ⇒ 00:12:59.950 Godwin Ekainu: Okay.
157 00:13:00.070 ⇒ 00:13:10.939 Godwin Ekainu: But basically, you can reproduce it using the local directory, so you just need to set up your source to point to it. So if you go to sources and create a new…
158 00:13:11.190 ⇒ 00:13:14.819 Godwin Ekainu: source, you can just select File.
159 00:13:16.110 ⇒ 00:13:22.240 Godwin Ekainu: And you create a File source, you create your schema, your format, JSONL in this case.
160 00:13:22.440 ⇒ 00:13:25.909 Godwin Ekainu: If you’re doing local, you choose the local filesystem.
161 00:13:26.110 ⇒ 00:13:29.029 Godwin Ekainu: And you add the path to your directory, basically.
162 00:13:30.500 ⇒ 00:13:42.389 Godwin Ekainu: Yeah, but for me, I could not get the URL, so I was mounting the files, but it wasn’t really seeing the files when I did this, so I just decided to go with the storage location.
163 00:13:45.080 ⇒ 00:13:48.949 Godwin Ekainu: Yeah, so, back to the connection,
164 00:13:49.740 ⇒ 00:13:53.549 Godwin Ekainu: for this, once you create a connection, you just…
165 00:13:53.870 ⇒ 00:14:02.199 Godwin Ekainu: click on Sync now and it syncs. So you can see that I did this 5 days ago, when I… when I was working on this. So it syncs…
166 00:14:02.200 ⇒ 00:14:02.950 Awaish Kumar: Don’t worry.
167 00:14:03.550 ⇒ 00:14:10.189 Awaish Kumar: Yeah, on these connections, do you know how to set up the monitoring, so if any of the syncs fail,
168 00:14:10.340 ⇒ 00:14:14.409 Awaish Kumar: it just sends an alert to Slack?
169 00:14:15.160 ⇒ 00:14:17.159 Godwin Ekainu: I didn’t get to that.
170 00:14:20.450 ⇒ 00:14:20.780 Awaish Kumar: No.
171 00:14:21.470 ⇒ 00:14:22.770 Godwin Ekainu: I was doing these.
172 00:14:23.180 ⇒ 00:14:26.740 Awaish Kumar: Do you have an idea, like, if it can be done, or not, or whatever?
173 00:14:27.430 ⇒ 00:14:30.450 Godwin Ekainu: For now, I’m not sure, but I think it can be done.
174 00:14:31.130 ⇒ 00:14:38.739 Godwin Ekainu: I’m not really sure, but it should be something that should be able… it can be done, basically, but I’m not really sure, because I’ve not really tried it.
175 00:14:39.380 ⇒ 00:14:40.060 Awaish Kumar: Okay.
176 00:14:42.030 ⇒ 00:14:54.300 Godwin Ekainu: So, this is just for Airbyte, basically. Then, for the Postgres… the Postgres setup,
177 00:14:57.040 ⇒ 00:14:59.810 Godwin Ekainu: I have the Docker Compose here.
178 00:15:00.890 ⇒ 00:15:07.490 Godwin Ekainu: Just the basic Docker Compose for setting up a Postgres 18 instance, setting up the Docker name,
179 00:15:07.740 ⇒ 00:15:12.959 Godwin Ekainu: the restart policy, and the environment. I’m using an .env file, so I load my
180 00:15:13.200 ⇒ 00:15:16.559 Godwin Ekainu: database and user and password and DB name.
181 00:15:16.930 ⇒ 00:15:19.659 Godwin Ekainu: Then, to be able to access my…
182 00:15:20.030 ⇒ 00:15:28.180 Godwin Ekainu: Docker, my Postgres instance externally, that’s for Airbyte to be able to access the instance, I had to set up this
183 00:15:28.700 ⇒ 00:15:33.750 Godwin Ekainu: IP address, the localhost IP address, with the ports.
184 00:15:34.110 ⇒ 00:15:38.259 Godwin Ekainu: Then the mounted volume, where my data is going to be stored.
185 00:15:38.410 ⇒ 00:15:42.699 Godwin Ekainu: Then I have an initialization script for the RBAC roles.
186 00:15:42.820 ⇒ 00:15:47.029 Godwin Ekainu: So once you start… start up the Docker Compose, it…
187 00:15:47.190 ⇒ 00:15:55.270 Godwin Ekainu: runs the initialization scripts, creates the databases, creates the roles, basically, for each of the schemas.
188 00:15:55.890 ⇒ 00:16:00.580 Godwin Ekainu: Then this is more of a health check, to check that it’s running.
189 00:16:01.500 ⇒ 00:16:05.200 Awaish Kumar: So, we have a volume here called Postgres Data.
190 00:16:05.400 ⇒ 00:16:09.699 Awaish Kumar: Is it part of Local Compose, and it, like, if I…
191 00:16:10.440 ⇒ 00:16:15.029 Awaish Kumar: like, it will get deleted if I just run docker compose
192 00:16:15.820 ⇒ 00:16:16.990 Awaish Kumar: down or something.
193 00:16:17.100 ⇒ 00:16:20.549 Awaish Kumar: So how we can make it something which
194 00:16:20.970 ⇒ 00:16:23.959 Awaish Kumar: Is not deleted as part of that command.
195 00:16:27.190 ⇒ 00:16:30.679 Godwin Ekainu: So, I know if you do docker compose down
196 00:16:30.890 ⇒ 00:16:39.460 Godwin Ekainu: -v, it deletes the storage and resets everything. I assume to not delete it, you have to set up, like, a,
197 00:16:39.940 ⇒ 00:16:44.299 Godwin Ekainu: a delete policy, I don’t know if, or you set up a backup, basically.
198 00:16:44.640 ⇒ 00:16:54.589 Godwin Ekainu: You can do, like, a backup for your instance, to make sure it’s not deleted, and if it’s deleted, I can easily restore the backup, or you can set up, like, a delete
199 00:16:54.700 ⇒ 00:17:01.680 Godwin Ekainu: a non-delete, what do you call it, policy on your Docker Compose. So your storage location, your storage is not really deleted.
200 00:17:03.550 ⇒ 00:17:05.130 Awaish Kumar: Okay. Okay.
201 00:17:06.869 ⇒ 00:17:11.419 Godwin Ekainu: Yeah, and use the volume mounts, to ensure that the volume is mounted.
202 00:17:12.349 ⇒ 00:17:15.009 Godwin Ekainu: So, to the RBAC roles.
203 00:17:15.459 ⇒ 00:17:18.749 Godwin Ekainu: It’s basically just, creating the schemas first.
204 00:17:19.339 ⇒ 00:17:37.349 Godwin Ekainu: So I have my raw schema for the ingestion, Airbyte sends data to the raw schema, and I have my dev staging, dev intermediate, and dev marts for my dbt dev environment, and I have my staging, intermediate, and marts layers for my
205 00:17:37.539 ⇒ 00:17:39.219 Godwin Ekainu: dbt production layer.
206 00:17:39.729 ⇒ 00:17:44.719 Godwin Ekainu: Then I’m revoking access, so that
207 00:17:45.649 ⇒ 00:17:50.129 Godwin Ekainu: Users will not have access to this database unless they are granted.
208 00:17:50.449 ⇒ 00:17:53.369 Godwin Ekainu: That access, or that permission, basically.
209 00:17:53.759 ⇒ 00:17:56.919 Godwin Ekainu: And yeah, I’m doing… I’m doing a for loop to…
210 00:17:57.139 ⇒ 00:18:06.839 Godwin Ekainu: create the roles, check if the role exists. If it doesn’t exist, create the role with the login password, basically. For a production instance, this will be changed.
211 00:18:07.400 ⇒ 00:18:10.820 Awaish Kumar: Is the password hard-coded in the script, or…
212 00:18:11.220 ⇒ 00:18:14.930 Godwin Ekainu: Yeah, in the script it’s… in the script it’s hard-coded,
213 00:18:15.330 ⇒ 00:18:18.639 Godwin Ekainu: but in… for a production setting, this should be…
214 00:18:18.950 ⇒ 00:18:22.880 Godwin Ekainu: set up using an environment variable, basically.
215 00:18:23.030 ⇒ 00:18:25.890 Awaish Kumar: But, like, it’s already on GitHub now.
216 00:18:28.480 ⇒ 00:18:34.019 Godwin Ekainu: So, in the documentation, you see that I said you should change this, so these are, like, default passwords.
217 00:18:37.490 ⇒ 00:18:38.100 Godwin Ekainu: Yeah.
218 00:18:38.380 ⇒ 00:18:42.439 Godwin Ekainu: So, for the role ingestion,
219 00:18:42.810 ⇒ 00:18:50.120 Godwin Ekainu: I’m granting database Shopify to the role ingestion, so this is the role that Airbyte uses,
220 00:18:50.730 ⇒ 00:18:55.950 Godwin Ekainu: basically, to ingest. Then, according to the Airbyte documentation, you have to grant this, too,
221 00:18:56.100 ⇒ 00:18:59.720 Godwin Ekainu: to the role you are using, or the user you are using.
222 00:19:00.710 ⇒ 00:19:07.620 Godwin Ekainu: Then I’m granting access to the raw schema so that it can create its initial tables on the…
223 00:19:07.800 ⇒ 00:19:09.829 Godwin Ekainu: raw schema, too.
224 00:19:10.000 ⇒ 00:19:16.319 Godwin Ekainu: Then I’m granting the ability to also create, insert, update, and delete
225 00:19:16.870 ⇒ 00:19:22.619 Godwin Ekainu: on all tables in the schema, in the raw schema, too, so that every time you’re trying to…
226 00:19:22.960 ⇒ 00:19:27.309 Godwin Ekainu: ingest data into the schema or update it, Airbyte needs access to all this.
227 00:19:27.590 ⇒ 00:19:28.810 Godwin Ekainu: that,
228 00:19:30.990 ⇒ 00:19:43.810 Godwin Ekainu: So we have a similar setup for the role transformation. So this is the role dbt uses. The dbt user uses it, so I’m giving it access to read
229 00:19:44.260 ⇒ 00:19:45.420 Godwin Ekainu: from the raw
230 00:19:46.110 ⇒ 00:19:49.420 Godwin Ekainu: schema, so that it can read from the raw schema and…
231 00:19:49.600 ⇒ 00:19:52.359 Godwin Ekainu: and get the data from it, basically.
232 00:19:52.650 ⇒ 00:19:54.650 Godwin Ekainu: Then, also…
233 00:19:55.460 ⇒ 00:19:55.870 Awaish Kumar: So.
234 00:19:55.870 ⇒ 00:19:56.359 Godwin Ekainu: Giving that.
235 00:19:57.780 ⇒ 00:20:01.729 Awaish Kumar: Yeah, like, in the challenge, we asked to, like,
236 00:20:01.970 ⇒ 00:20:05.550 Awaish Kumar: Also, to have, two different, like,
237 00:20:05.930 ⇒ 00:20:15.590 Awaish Kumar: workflows for run… to run on GitHub Actions. One on PR validations, which basically points to staging, right? I don’t see any,
238 00:20:15.960 ⇒ 00:20:18.569 Awaish Kumar: Databases, for staging.
239 00:20:19.640 ⇒ 00:20:21.310 Godwin Ekainu: Yeah, so this is us.
240 00:20:21.530 ⇒ 00:20:22.220 Awaish Kumar: comes?
241 00:20:23.350 ⇒ 00:20:25.569 Awaish Kumar: There are no schemas for a staging environment.
242 00:20:25.570 ⇒ 00:20:27.019 Godwin Ekainu: This is for staging.
243 00:20:28.720 ⇒ 00:20:33.939 Awaish Kumar: So, basically, what I did was just, for dev and staging, using the same schema.
244 00:20:34.010 ⇒ 00:20:36.629 Godwin Ekainu: And for production, it writes to this schema.
245 00:20:38.220 ⇒ 00:20:39.930 Godwin Ekainu: Just to keep it simple, too.
246 00:20:43.010 ⇒ 00:20:45.950 Godwin Ekainu: Yes, well, that’s for staging, and for…
247 00:20:46.360 ⇒ 00:20:53.649 Godwin Ekainu: transformation… role transformation, which is what dbt uses, I’m granting it access to… write to the dev
248 00:20:53.970 ⇒ 00:20:56.520 Godwin Ekainu: and the production schemas.
249 00:20:56.660 ⇒ 00:20:58.780 Godwin Ekainu: Production schemas.
250 00:20:59.030 ⇒ 00:21:03.100 Godwin Ekainu: Then, for the role developer,
251 00:21:03.440 ⇒ 00:21:09.020 Godwin Ekainu: I’m just granting it read access to all the schemas, so that it can view the data in there
252 00:21:09.270 ⇒ 00:21:11.259 Godwin Ekainu: and see what’s in there.
253 00:21:11.380 ⇒ 00:21:15.490 Godwin Ekainu: And for the BI role, I’m just granting it access to view the marts layer.
254 00:21:15.960 ⇒ 00:21:21.200 Godwin Ekainu: Basically, just only the marts, no access to any other… any of the other databases.
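A rough sketch of the role layout described in this part of the walkthrough, rendered as PostgreSQL GRANT statements from Python. The role names (`role_ingestion`, `role_transformation`, `role_developer`, `role_bi`) and the schema names are assumptions inferred from the conversation; the actual initialization script was only shown on screen and may differ.

```python
READ, WRITE = "read", "write"

# Hypothetical schema names: raw landing schema, dev schemas, production schemas.
DEV = ["dev_staging", "dev_intermediate", "dev_marts"]
PROD = ["staging", "intermediate", "marts"]

# Hypothetical role-to-schema access matrix, following the discussion.
GRANTS = {
    "role_ingestion": {"raw": WRITE},
    "role_transformation": {"raw": READ, **{s: WRITE for s in DEV + PROD}},
    "role_developer": {s: READ for s in ["raw"] + DEV + PROD},
    "role_bi": {"marts": READ},
}


def render_grants(role, schemas):
    """Render the GRANT statements for one role as a list of SQL strings."""
    stmts = []
    for schema, level in schemas.items():
        if level == WRITE:
            # Writers may create tables and modify existing rows.
            stmts.append(f"GRANT USAGE, CREATE ON SCHEMA {schema} TO {role};")
            stmts.append(
                f"GRANT SELECT, INSERT, UPDATE, DELETE ON ALL TABLES IN SCHEMA {schema} TO {role};"
            )
        else:
            # Readers only need schema usage plus SELECT.
            stmts.append(f"GRANT USAGE ON SCHEMA {schema} TO {role};")
            stmts.append(f"GRANT SELECT ON ALL TABLES IN SCHEMA {schema} TO {role};")
    return stmts


for statement in render_grants("role_bi", GRANTS["role_bi"]):
    print(statement)
```

Keeping the matrix in one place makes it easy to see that the BI role touches only the marts schema while the ingestion role touches only raw.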
255 00:21:21.650 ⇒ 00:21:25.870 Godwin Ekainu: Then, for dbt,
256 00:21:26.500 ⇒ 00:21:33.259 Godwin Ekainu: I’m using, what do you call it? uv to install dbt. You can see my dbt project.
257 00:21:33.630 ⇒ 00:21:42.650 Godwin Ekainu: So that’s all for the Postgres setup. So, to run it, you just run docker compose up -d to set it up. I have it running already, so…
258 00:21:42.900 ⇒ 00:21:45.220 Godwin Ekainu: I don’t want to do that, so…
259 00:21:46.230 ⇒ 00:21:48.100 Godwin Ekainu: More than the step you follow.
260 00:21:48.280 ⇒ 00:21:58.460 Godwin Ekainu: And for the dbt setup, I have my project’s dbt_project.yml file, and it’s just the default dbt settings, basically.
261 00:21:58.630 ⇒ 00:22:03.610 Godwin Ekainu: But for my model setup, I’m using this, Shopify staging. For staging.
262 00:22:03.880 ⇒ 00:22:05.210 Godwin Ekainu: I am
263 00:22:05.710 ⇒ 00:22:10.359 Godwin Ekainu: setting it as a view, because you want…
264 00:22:10.460 ⇒ 00:22:12.759 Godwin Ekainu: to have fresh data each time you run
265 00:22:13.110 ⇒ 00:22:20.729 Godwin Ekainu: your staging environment, and… and you don’t want to create a static one, and you save on storage costs, basically.
266 00:22:20.880 ⇒ 00:22:27.970 Godwin Ekainu: For the intermediate layer, I’m using the same reasoning, setting up… setting it up as a view…
267 00:22:28.090 ⇒ 00:22:29.690 Godwin Ekainu: Managers…
268 00:22:30.600 ⇒ 00:22:39.510 Awaish Kumar: Yeah, I have a question here. For the warehouses we have right now in the market, do you think compute cost
269 00:22:39.630 ⇒ 00:22:43.330 Awaish Kumar: Is higher, or the storage cost is higher.
270 00:22:44.560 ⇒ 00:22:47.370 Godwin Ekainu: I would say compute cost is higher.
271 00:22:47.770 ⇒ 00:22:51.600 Godwin Ekainu: Now, I’m aware of BigQuery and Snowflake.
272 00:22:51.820 ⇒ 00:22:54.590 Godwin Ekainu: The compute cost is higher than storage costs.
273 00:22:55.700 ⇒ 00:22:59.440 Awaish Kumar: Then what do you think makes sense, to create them as views or tables?
274 00:23:00.560 ⇒ 00:23:14.550 Godwin Ekainu: I mean, for me, I usually just create my intermediate layers, or my staging layers, as views. The intermediate layer can also be set as ephemeral, but I don’t use that, I don’t use that at all.
275 00:23:15.190 ⇒ 00:23:22.210 Godwin Ekainu: The only time I set up my intermediate layers as tables is when I want to save on costs.
276 00:23:22.690 ⇒ 00:23:28.940 Godwin Ekainu: So, for example, at my current company, we did that for costs…
277 00:23:29.180 ⇒ 00:23:47.750 Godwin Ekainu: We were bent on reducing our costs or maintaining our costs at a lower level. So, for large tables where we had arranged them as views in the intermediate layer, I… we converted them to tables. So, static tables, so no matter how many times someone calls a table or runs… queries a table, you know this, the static…
278 00:23:47.930 ⇒ 00:24:03.909 Godwin Ekainu: The bytes processed for that table is static, and it’s fixed. So, yeah, it’s smaller than when you leave the table as a view, and you’re querying that view. So, for example, if I have an orders table now, and when you query the view, you’re querying about
279 00:24:04.400 ⇒ 00:24:07.710 Godwin Ekainu: 500 gigabytes worth of data.
280 00:24:08.040 ⇒ 00:24:15.280 Godwin Ekainu: When you convert to the table, to a table, you can see that when you convert to a table, it’s about 100 gigabytes.
281 00:24:15.480 ⇒ 00:24:18.349 Godwin Ekainu: Because the storage cost is cheap, and it’s…
282 00:24:18.810 ⇒ 00:24:23.250 Godwin Ekainu: for BigQuery, it compresses the data size to a lesser amount.
283 00:24:23.480 ⇒ 00:24:28.880 Godwin Ekainu: And then when other individuals are trying to call or write a query against that table,
284 00:24:29.050 ⇒ 00:24:39.520 Godwin Ekainu: they are only querying about 100, depending on how they write their query. If in their select statement they are calling the columns rather than doing a select star,
285 00:24:39.670 ⇒ 00:24:44.779 Godwin Ekainu: it’s cheaper, so… So that’s how BigQuery works, basically, so…
286 00:24:44.960 ⇒ 00:24:53.230 Godwin Ekainu: for that reason, we set up our tables as… our intermediate layer as tables, rather than as views. But initially,
287 00:24:53.570 ⇒ 00:24:59.629 Godwin Ekainu: in the starting stage, we usually do a view, basically. I usually do the view.
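A back-of-the-envelope sketch of the view-versus-table trade-off described above, using the 500 GB and 100 GB figures from the conversation. The price per terabyte scanned and the number of queries are placeholders, not real BigQuery prices, and the model ignores storage cost and caching.

```python
def view_cost(scan_gb, queries, price_per_tb):
    """Every query against a view re-scans the underlying data."""
    return scan_gb * queries / 1000 * price_per_tb


def table_cost(rebuild_gb, rebuilds, table_gb, queries, price_per_tb):
    """A table pays for each rebuild once, then each query scans the smaller table."""
    return (rebuild_gb * rebuilds + table_gb * queries) / 1000 * price_per_tb


PRICE_PER_TB = 5.0  # placeholder price, an assumption for illustration only
QUERIES = 10        # hypothetical number of downstream queries per refresh

as_view = view_cost(500, QUERIES, PRICE_PER_TB)
as_table = table_cost(500, 1, 100, QUERIES, PRICE_PER_TB)
print(f"view: {as_view}, table: {as_table}")
```

Under these assumptions the table wins once it is queried more than a couple of times between rebuilds; with very few readers, the view can still be the cheaper choice.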
288 00:24:59.880 ⇒ 00:25:04.220 Godwin Ekainu: Then for the marts layer, I… I prefer the table, because it’s…
289 00:25:04.480 ⇒ 00:25:08.820 Godwin Ekainu: The marts data is fixed, except you’re updating it regularly.
290 00:25:09.900 ⇒ 00:25:11.879 Godwin Ekainu: So, for my…
291 00:25:12.190 ⇒ 00:25:18.430 Godwin Ekainu: profiles.yml, I’m using environment variables, so I have two… two layers,
292 00:25:19.010 ⇒ 00:25:23.139 Godwin Ekainu: my dev layer and my prod layer.
293 00:25:24.120 ⇒ 00:25:33.469 Godwin Ekainu: So in the .env file, you set up your database and users for each of these layers, and the schemas you want to refer to for each of these layers.
294 00:25:33.930 ⇒ 00:25:37.220 Godwin Ekainu: Then for the, GitHub action workflow.
295 00:25:37.330 ⇒ 00:25:39.340 Godwin Ekainu: I have my staging area here.
296 00:25:42.500 ⇒ 00:25:45.250 Godwin Ekainu: I have my, yes, my staging workflow right here.
297 00:25:46.290 ⇒ 00:25:48.230 Godwin Ekainu: You can see…
298 00:25:48.570 ⇒ 00:25:59.009 Godwin Ekainu: I have my environment variables set here. I’m setting up the GitHub secrets using… I use the command line. I have my .env file, so when I do gh
299 00:25:59.230 ⇒ 00:26:09.230 Godwin Ekainu: secret set -f .env, it sets up the secrets for me automatically in my GitHub Actions repository, or my repository, rather.
300 00:26:09.540 ⇒ 00:26:17.839 Godwin Ekainu: You can see I’m running on ubuntu-latest, checking out the code, setting up my Python, my Python version.
301 00:26:18.140 ⇒ 00:26:20.899 Godwin Ekainu: So, I’m using Python 3.12.
302 00:26:21.190 ⇒ 00:26:28.080 Godwin Ekainu: and installing dbt Postgres, so the dbt Core and dbt Postgres libraries.
303 00:26:28.360 ⇒ 00:26:29.630 Godwin Ekainu: Then,
304 00:26:30.490 ⇒ 00:26:39.059 Godwin Ekainu: checking my dependencies. For this project, I didn’t use any external dependencies, so it’s just keeping it simple.
305 00:26:39.310 ⇒ 00:26:43.049 Godwin Ekainu: And I’m compiling my code, and then running against the staging.
306 00:26:43.350 ⇒ 00:26:49.059 Godwin Ekainu: environments once this one is completed, I do my tests.
307 00:26:49.470 ⇒ 00:26:59.829 Godwin Ekainu: So, once I… merge, the production environment runs. It follows the same format, you can see.
308 00:26:59.960 ⇒ 00:27:03.430 Godwin Ekainu: It runs on a schedule, every 6 hours.
309 00:27:03.860 ⇒ 00:27:07.040 Godwin Ekainu: It sets up my environment here.
310 00:27:07.240 ⇒ 00:27:12.110 Godwin Ekainu: Still following the same, flow.
311 00:27:12.940 ⇒ 00:27:16.420 Godwin Ekainu: Except for where you’re running against production, so yeah.
312 00:27:16.620 ⇒ 00:27:19.749 Godwin Ekainu: Pointing to your production environment, rather.
313 00:27:21.450 ⇒ 00:27:24.879 Awaish Kumar: What is that workflow_dispatch keyword doing?
314 00:27:25.090 ⇒ 00:27:26.790 Awaish Kumar: Like, what is the purpose of that?
315 00:27:27.450 ⇒ 00:27:28.060 Awaish Kumar: On the top.
316 00:27:28.060 ⇒ 00:27:28.640 Godwin Ekainu: H.
317 00:27:29.230 ⇒ 00:27:30.360 Awaish Kumar: Top of this file.
318 00:27:30.880 ⇒ 00:27:33.629 Awaish Kumar: There’s a keyword called workflow_dispatch.
319 00:27:35.670 ⇒ 00:27:38.010 Godwin Ekainu: I’m not really sure, to be honest.
320 00:27:40.790 ⇒ 00:27:41.520 Awaish Kumar: Okay.
321 00:27:42.640 ⇒ 00:27:46.900 Godwin Ekainu: So, I also forgot to mention my macros.
322 00:27:47.310 ⇒ 00:27:50.939 Godwin Ekainu: So, when you are running… when you set up a…
323 00:27:51.100 ⇒ 00:27:56.980 Godwin Ekainu: dbt project, it gives the name… Should I put this?
324 00:27:57.250 ⇒ 00:28:08.529 Godwin Ekainu: So, when you create your dbt project, it automatically assigns schema names to your files, or your schema in your database. So, based on your name, it can assign,
325 00:28:08.630 ⇒ 00:28:26.530 Godwin Ekainu: let’s say, Awaish dev intermediate, something like that. So, to prevent that, I really wanted the development environment to have dev staging, dev intermediate, and for the production environment, you should just use the schema name to make it cleaner. So, instead of doing prod…
326 00:28:28.000 ⇒ 00:28:38.709 Godwin Ekainu: prod intermediate, prod staging, prod marts, just use the name: marts, intermediate, analytics… staging, rather, for those environments.
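The schema naming rule described here can be sketched in Python. In dbt itself this kind of rule is usually implemented by overriding the `generate_schema_name` macro in Jinja; the function below only mirrors the logic as described in the interview. The function name, the `public` default schema, and the target names `dev` and `prod` are assumptions for illustration.

```python
from typing import Optional


def resolve_schema_name(
    target_name: str,
    default_schema: str,
    custom_schema: Optional[str],
) -> str:
    """Return the schema a model should be built in.

    Hypothetical helper that mirrors the rule described in the interview:
    - no custom schema configured -> fall back to the target's default schema
    - prod target                 -> use the custom schema name as-is
    - any other target (e.g. dev) -> prefix the custom schema with the target
    """
    if custom_schema is None:
        return default_schema
    custom_schema = custom_schema.strip()
    if target_name == "prod":
        return custom_schema
    return f"{target_name}_{custom_schema}"


if __name__ == "__main__":
    for layer in ("staging", "intermediate", "marts"):
        dev_schema = resolve_schema_name("dev", "public", layer)
        prod_schema = resolve_schema_name("prod", "public", layer)
        print(f"{layer}: dev={dev_schema} prod={prod_schema}")
```

With this rule, development builds land in schemas such as `dev_staging`, while production builds use the plain layer names.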
327 00:28:39.040 ⇒ 00:28:41.500 Godwin Ekainu: Am I missing anything here?
328 00:28:42.440 ⇒ 00:28:44.210 Godwin Ekainu: Kia.
329 00:28:45.800 ⇒ 00:28:48.029 Awaish Kumar: Yeah, we can look at the dbt part.
330 00:28:49.370 ⇒ 00:28:50.220 Godwin Ekainu: Sorry?
331 00:28:51.170 ⇒ 00:28:53.939 Awaish Kumar: Yeah, like, we can look at the dbt models you have created.
332 00:28:54.480 ⇒ 00:28:55.910 Godwin Ekainu: Okay, okay, that’s true.
333 00:28:58.680 ⇒ 00:29:07.229 Godwin Ekainu: So, for these staging schemas, staging orders and staging products, these are… so for the staging
334 00:29:07.460 ⇒ 00:29:21.699 Godwin Ekainu: environment, or staging layer, it’s more of a one-to-one of your raw layer. Just, yeah, just doing basic type changes, column renames, typing, and all. So the, what do you call it, the…
335 00:29:22.150 ⇒ 00:29:32.819 Godwin Ekainu: the… you are just refining a little to make your data readable, make your data presentable at the staging layer. So that… which is what I did here, so…
336 00:29:33.040 ⇒ 00:29:36.940 Godwin Ekainu: For each of these columns, all capital letters,
337 00:29:37.140 ⇒ 00:29:45.839 Godwin Ekainu: I am calling them, renaming them as lowercase letters, and also adding… not too much.
338 00:29:46.000 ⇒ 00:29:50.680 Godwin Ekainu: So this is just for Airbyte’s metadata management.
339 00:29:51.070 ⇒ 00:29:55.790 Godwin Ekainu: I’m calling these source products, raw products, basically, for these products.
340 00:29:55.940 ⇒ 00:29:59.399 Godwin Ekainu: For the staging orders, it’s doing the same thing.
341 00:29:59.650 ⇒ 00:30:02.330 Godwin Ekainu: For my created_at, and…
342 00:30:02.520 ⇒ 00:30:09.350 Godwin Ekainu: my time columns, basically, my datetime columns, I change the type to timestamp.
343 00:30:10.470 ⇒ 00:30:14.229 Godwin Ekainu: Then, for these columns, so we have some
344 00:30:14.550 ⇒ 00:30:19.400 Godwin Ekainu: orders without cancelled_at or closed_at.
345 00:30:19.640 ⇒ 00:30:24.660 Godwin Ekainu: Then I’m checking, I’m converting them to null if empty, then assigning a timestamp type to it.
346 00:30:24.780 ⇒ 00:30:28.710 Godwin Ekainu: So then the others are just renaming,
347 00:30:29.270 ⇒ 00:30:35.009 Godwin Ekainu: Same thing here. I’m setting them to null if they are empty… if they’re empty, rather.
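The staging clean-up described here (lowercase renames, empty strings turned into null, and datetime columns cast to timestamps) can be sketched as follows. This is a minimal Python illustration of the logic, not the candidate's dbt SQL; the column names `ID`, `CREATED_AT`, `CANCELLED_AT` and `CLOSED_AT` and the ISO 8601 input format are assumptions.

```python
from datetime import datetime
from typing import Any, Dict, Optional


def null_if_empty(value: Optional[str]) -> Optional[str]:
    """Treat empty or whitespace-only strings as missing values."""
    if value is None:
        return None
    stripped = value.strip()
    return stripped if stripped else None


def to_timestamp(value: Optional[str]) -> Optional[datetime]:
    """Convert an ISO 8601 string to a datetime, keeping missing values as None."""
    cleaned = null_if_empty(value)
    if cleaned is None:
        return None
    return datetime.fromisoformat(cleaned)


def stage_order(raw: Dict[str, Any]) -> Dict[str, Any]:
    """One-to-one staging of a raw order row: rename to lowercase and fix types.

    The uppercase source column names are hypothetical.
    """
    return {
        "id": raw["ID"],
        "created_at": to_timestamp(raw.get("CREATED_AT")),
        "cancelled_at": to_timestamp(raw.get("CANCELLED_AT")),
        "closed_at": to_timestamp(raw.get("CLOSED_AT")),
    }


if __name__ == "__main__":
    raw_row = {
        "ID": 1,
        "CREATED_AT": "2026-01-05T10:30:00",
        "CANCELLED_AT": "",
        "CLOSED_AT": None,
    }
    print(stage_order(raw_row))
```

Orders that were never cancelled or closed keep a null in those columns instead of an empty string, so the timestamp cast does not fail on them.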
348 00:30:36.840 ⇒ 00:30:46.260 Godwin Ekainu: Changing the type to text, changing the type to text. And this is just for Airbyte’s metadata in the staging layer.
349 00:30:46.580 ⇒ 00:30:53.509 Godwin Ekainu: Same thing for the customers model, too. Both type… just the type inference, but…
350 00:30:56.590 ⇒ 00:31:02.410 Godwin Ekainu: This address was a JSON blob.
351 00:31:04.720 ⇒ 00:31:10.840 Godwin Ekainu: And I’m leaving… I left it as this JSON blob in the… staging layer, so…
352 00:31:11.050 ⇒ 00:31:17.209 Godwin Ekainu: For the schema that I am, the schema that I… normally, I don’t usually combine
353 00:31:17.630 ⇒ 00:31:21.240 Godwin Ekainu: the sources… in the same file with the schema
354 00:31:21.450 ⇒ 00:31:22.330 Godwin Ekainu: file.
355 00:31:22.520 ⇒ 00:31:30.150 Godwin Ekainu: Normally, I have a sources YAML file, and I call this here. Because I’m not doing much for this, I’m just adding them in a single file.
356 00:31:31.330 ⇒ 00:31:36.050 Godwin Ekainu: then I have my mod… my… each table description,
357 00:31:36.540 ⇒ 00:31:43.319 Godwin Ekainu: and simple tests, so I’m just testing for not null and unique in each of these tables.
358 00:31:43.440 ⇒ 00:31:47.070 Godwin Ekainu: Then, here I’m referring… there’s a foreign key.
359 00:31:47.290 ⇒ 00:31:52.419 Godwin Ekainu: So… checking the relationship in the code for the foreign keys, basically.
360 00:31:53.300 ⇒ 00:32:01.809 Godwin Ekainu: Same thing here, description, just simple tests of unique and not null on your ID… the product ID column.
361 00:32:03.240 ⇒ 00:32:08.749 Godwin Ekainu: For the intermediate layer, let’s see this.
362 00:32:11.150 ⇒ 00:32:18.509 Godwin Ekainu: So, for the line item, which was a blob, the idea was to build it like an order summary, which
363 00:32:18.920 ⇒ 00:32:23.330 Godwin Ekainu: And for my own, for what I talked about, it was, just be, like, a simple…
364 00:32:23.450 ⇒ 00:32:27.139 Godwin Ekainu: order summary that shows, like, basic details about an order.
365 00:32:27.390 ⇒ 00:32:43.860 Godwin Ekainu: And to do that, the line… I decided to use the line item, and the line item is a blob, JSON blob, and I have to… I have to flatten that JSON blob. I decided to do that in the intermediate layer. So the intermediate layer is where… the layer where you have your complex
366 00:32:44.100 ⇒ 00:32:47.910 Godwin Ekainu: transformations, data processing.
367 00:32:48.400 ⇒ 00:32:53.789 Godwin Ekainu: Basically, trying to fit that data into a particular use case, a business use case, a business.
368 00:32:53.900 ⇒ 00:32:57.639 Godwin Ekainu: Yeah, business metrics. Metric.
369 00:32:58.230 ⇒ 00:33:11.579 Godwin Ekainu: So, I’m first flattening… okay, I’m referring to my staging orders here, I’m referring to my products here, and I’m flattening the data and assigning some types to the columns, some of the columns.
370 00:33:11.980 ⇒ 00:33:15.760 Godwin Ekainu: And then, after flattening it, I’m joining it
371 00:33:15.900 ⇒ 00:33:19.079 Godwin Ekainu: with the product table, based on the product ID.
372 00:33:19.350 ⇒ 00:33:22.230 Godwin Ekainu: To get some other product information.
373 00:33:23.230 ⇒ 00:33:24.500 Godwin Ekainu: And I’m also,
374 00:33:24.810 ⇒ 00:33:30.369 Godwin Ekainu: like, assigning types to the numeric columns, numeric types and boolean types to…
375 00:33:30.610 ⇒ 00:33:32.429 Godwin Ekainu: Some of the fields, too.
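The intermediate-layer step described here (flatten the JSON line-item blob to one row per line item, cast the fields, then join to products on the product ID) can be sketched in Python. This is an illustration under assumptions, not the candidate's SQL: the field names (`line_items`, `product_id`, `quantity`, `price`, `taxable`, `title`) are hypothetical, and the join is written as a left join so line items without a matching product are kept.

```python
import json
from decimal import Decimal
from typing import Any, Dict, List


def flatten_line_items(
    orders: List[Dict[str, Any]],
    products_by_id: Dict[int, Dict[str, Any]],
) -> List[Dict[str, Any]]:
    """Explode each order's JSON line-item array into one row per line item.

    Each row is typed (int, Decimal, bool) and enriched with product
    attributes looked up by product ID (left-join behaviour).
    """
    rows: List[Dict[str, Any]] = []
    for order in orders:
        # An empty or missing blob is treated as an order with no line items.
        items = json.loads(order.get("line_items") or "[]")
        for item in items:
            product = products_by_id.get(item["product_id"], {})
            rows.append(
                {
                    "order_id": order["id"],
                    "line_item_id": item["id"],
                    "product_id": item["product_id"],
                    "quantity": int(item["quantity"]),
                    "price": Decimal(str(item["price"])),
                    "taxable": bool(item.get("taxable", False)),
                    "product_title": product.get("title"),
                }
            )
    return rows


if __name__ == "__main__":
    sample_orders = [
        {
            "id": 1,
            "line_items": '[{"id": 10, "product_id": 100, "quantity": "2", '
            '"price": "9.99", "taxable": true}]',
        },
        {"id": 2, "line_items": ""},
    ]
    sample_products = {100: {"title": "Mug"}}
    for row in flatten_line_items(sample_orders, sample_products):
        print(row)
```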
376 00:33:32.830 ⇒ 00:33:36.840 Godwin Ekainu: After joining, I’m doing my select order join,
377 00:33:37.290 ⇒ 00:33:44.249 Godwin Ekainu: you notice in this project, I did not set up, like, SQL linting or parsing.
378 00:33:44.560 ⇒ 00:33:49.579 Godwin Ekainu: I didn’t want to complicate it that much. Normally, I’ll use SQLFluff
379 00:33:49.730 ⇒ 00:33:52.189 Godwin Ekainu: to… to…
380 00:33:52.380 ⇒ 00:33:54.510 Godwin Ekainu: set up, what do you call it?
381 00:33:54.690 ⇒ 00:34:02.850 Godwin Ekainu: SQL lint rules, so that if… if my projects, if my…
382 00:34:03.660 ⇒ 00:34:08.070 Godwin Ekainu: my, what do you call, my SQL script doesn’t follow a particular format or rule,
383 00:34:08.219 ⇒ 00:34:14.239 Godwin Ekainu: I won’t be able to push to my GitHub repository. It will fail that and tell me to go back and fix
384 00:34:14.780 ⇒ 00:34:16.800 Godwin Ekainu: the parsing error, really quick.
385 00:34:17.000 ⇒ 00:34:21.219 Godwin Ekainu: For the int orders, I’m just basically calling the orders,
386 00:34:22.239 ⇒ 00:34:39.269 Godwin Ekainu: I did this… I’m doing a basic join, so I did this because I did not want to do, like, a complex transformation in my summary layer, that is, in the marts layer. I prefer to just do, like, a basic join, calling the tables directly in the marts layer, to keep it simple.
387 00:34:39.679 ⇒ 00:34:43.870 Godwin Ekainu: Sorry, let me… I’ve missed this.
388 00:34:46.469 ⇒ 00:34:54.460 Godwin Ekainu: So in my YAML file, my int YAML file, I’m following the same principle, basic description about the tables.
389 00:34:54.750 ⇒ 00:35:00.760 Godwin Ekainu: Then, basic testing on some tables, so… And follows.
390 00:35:02.080 ⇒ 00:35:09.469 Godwin Ekainu: So, you have a not… not null test, and you have a not null and unique test on the order ID, and a not null test on the customer ID here.
391 00:35:09.670 ⇒ 00:35:13.690 Godwin Ekainu: For the int order line item, this is me,
392 00:35:14.120 ⇒ 00:35:18.970 Godwin Ekainu: like I said, flattening the line item to, like, a single row.
393 00:35:19.320 ⇒ 00:35:22.800 Godwin Ekainu: Line item, and joining that to the,
394 00:35:23.220 ⇒ 00:35:28.120 Godwin Ekainu: what do you call it, orders table… oh, the customers… the product table.
395 00:35:28.380 ⇒ 00:35:32.520 Godwin Ekainu: So I’m also doing basic tests on some fields, the order ID,
396 00:35:32.790 ⇒ 00:35:37.239 Godwin Ekainu: not null test. The line item ID, not null test.
397 00:35:37.640 ⇒ 00:35:40.070 Godwin Ekainu: Then basic description.
398 00:35:40.330 ⇒ 00:35:47.470 Godwin Ekainu: for… I was checking, the refuge.
399 00:35:48.860 ⇒ 00:35:52.680 Godwin Ekainu: Down to my summary table.
400 00:35:52.940 ⇒ 00:35:58.070 Godwin Ekainu: I have, my orders here. I’m calling my reference, my int orders.
401 00:35:58.390 ⇒ 00:36:01.740 Godwin Ekainu: Then, also calling.
402 00:36:01.940 ⇒ 00:36:10.449 Godwin Ekainu: G… line item, metrics, some aggregating metrics on the… aggregating metrics on the line item, and intermediate table.
403 00:36:10.790 ⇒ 00:36:16.999 Godwin Ekainu: And I’m calling the, doing, like, a join, combining them together to give me, like, an order summary.
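The order summary described here (aggregate metrics over the flattened line items, then join them back to the orders) can be sketched in Python. The metric names `item_count`, `total_quantity` and `gross_amount`, and the row shapes, are assumptions for illustration; the interview does not list the actual columns.

```python
from collections import defaultdict
from typing import Any, Dict, List


def build_order_summary(
    orders: List[Dict[str, Any]],
    line_items: List[Dict[str, Any]],
) -> List[Dict[str, Any]]:
    """Aggregate line-item metrics per order and join them onto the orders.

    Orders without line items are kept with zeroed metrics (left join).
    """
    totals: Dict[Any, Dict[str, float]] = defaultdict(
        lambda: {"item_count": 0, "total_quantity": 0, "gross_amount": 0.0}
    )
    for item in line_items:
        agg = totals[item["order_id"]]
        agg["item_count"] += 1
        agg["total_quantity"] += item["quantity"]
        agg["gross_amount"] += item["quantity"] * item["price"]

    empty = {"item_count": 0, "total_quantity": 0, "gross_amount": 0.0}
    summary = []
    for order in orders:
        # dict.get does not trigger the defaultdict factory, so orders
        # without line items fall back to the explicit zeroed metrics.
        metrics = totals.get(order["order_id"], empty)
        summary.append({**order, **metrics})
    return summary


if __name__ == "__main__":
    sample_orders = [
        {"order_id": 1, "customer_id": 7},
        {"order_id": 2, "customer_id": 8},
    ]
    sample_items = [
        {"order_id": 1, "quantity": 2, "price": 10.0},
        {"order_id": 1, "quantity": 1, "price": 5.5},
    ]
    for row in build_order_summary(sample_orders, sample_items):
        print(row)
```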
404 00:36:17.940 ⇒ 00:36:21.190 Godwin Ekainu: So, in practice, let me see…
405 00:36:21.540 ⇒ 00:36:23.479 Godwin Ekainu: Let me show you how it looks like.
406 00:36:37.880 ⇒ 00:36:41.160 Awaish Kumar: Yeah, let’s talk about, like, for example.
407 00:36:42.320 ⇒ 00:36:47.580 Awaish Kumar: In dbt, we have created a marts table, marts summary, yeah.
408 00:36:48.340 ⇒ 00:36:52.320 Awaish Kumar: like, if… order summary, sorry. So, if…
409 00:36:52.760 ⇒ 00:36:59.160 Awaish Kumar: If, like, that table grows to, like, hundreds of millions of rows, and…
410 00:36:59.490 ⇒ 00:37:01.870 Awaish Kumar: And it becomes really slow to execute it.
411 00:37:01.990 ⇒ 00:37:05.050 Awaish Kumar: What changes would you make to optimize it?
412 00:37:06.850 ⇒ 00:37:12.070 Godwin Ekainu: Normally I would break down that table into multiple,
413 00:37:12.510 ⇒ 00:37:18.530 Godwin Ekainu: So, I follow three approaches. First, I partition the table to ensure…
414 00:37:18.940 ⇒ 00:37:24.490 Godwin Ekainu: Partition and cluster the table. So this is the approach I follow on BigQuery. I’m not sure
415 00:37:25.100 ⇒ 00:37:31.959 Godwin Ekainu: about Snowflake, but I think it’s a general practice to partition and cluster your table, so that you ensure that,
416 00:37:32.370 ⇒ 00:37:41.620 Godwin Ekainu: when you’re… when you’re querying that table, you… you also enforce that partition. So, what I mean by that is, when you’re querying a partitioned table,
417 00:37:41.720 ⇒ 00:37:58.020 Godwin Ekainu: you have to query by… you have to filter by the partition column. If not, the query is not going to run. I don’t know if that works for Snowflake, but in BigQuery, we do that. So if I partition, like, another table name accurately, and someone is going to query that table, you have to put… filter by the partition field.
418 00:37:58.270 ⇒ 00:38:05.529 Godwin Ekainu: To ensure that the data runs smoothly. That speeds up the query time and increases the… increases the speed,
419 00:38:05.730 ⇒ 00:38:19.719 Godwin Ekainu: basically, for the query, and it also reduces the cost, reduces the processing time, so… so instead of querying or scanning the entire table, it just picks the data from the particular partition you’re interested in, and…
420 00:38:19.830 ⇒ 00:38:32.859 Godwin Ekainu: gives that to… so partitioning and, of course, clustering works. If that doesn’t work, I break down the table into multiple tables. Basically, I reduce the… the… instead of doing… when creating a table, instead of doing, like, a…
421 00:38:33.110 ⇒ 00:38:45.690 Godwin Ekainu: what do you call it? Let’s say you have 100 million records. I archive part of the data into, like, an archive layer, then just query the latest layer,
422 00:38:46.090 ⇒ 00:39:02.969 Godwin Ekainu: view the table using the latest data, basically. So, data for the last few years, instead of data for the last 20 years, I reduce that table to maybe query data for the last 5 years or so, but I don’t do this without discussing with the stakeholders who depend on this data.
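The partition pruning idea described here can be illustrated with a toy in-memory table in Python. It is a sketch of the concept only, not how BigQuery is implemented. In BigQuery, a query without a partition filter is rejected only when the table has the `require_partition_filter` option enabled; the class and attribute names below are hypothetical.

```python
from collections import defaultdict
from datetime import date
from typing import Any, Dict, List, Optional


class PartitionedTable:
    """Toy table that stores rows in one bucket per partition date."""

    def __init__(self, partition_column: str, require_partition_filter: bool = True):
        self.partition_column = partition_column
        self.require_partition_filter = require_partition_filter
        self._partitions: Dict[date, List[Dict[str, Any]]] = defaultdict(list)
        self.rows_scanned = 0  # running count of rows read by queries

    def insert(self, row: Dict[str, Any]) -> None:
        self._partitions[row[self.partition_column]].append(row)

    def query(
        self, start: Optional[date] = None, end: Optional[date] = None
    ) -> List[Dict[str, Any]]:
        """Return rows whose partition date falls in [start, end].

        Without both bounds, the query is rejected if a partition filter is
        required, otherwise it falls back to scanning every partition.
        """
        if start is None or end is None:
            if self.require_partition_filter:
                raise ValueError(
                    f"query must filter on partition column {self.partition_column!r}"
                )
            selected = list(self._partitions)
        else:
            selected = [d for d in self._partitions if start <= d <= end]

        result: List[Dict[str, Any]] = []
        for partition_date in selected:
            result.extend(self._partitions[partition_date])
        self.rows_scanned += len(result)
        return result


if __name__ == "__main__":
    table = PartitionedTable("order_date")
    for day in (1, 2, 3):
        table.insert({"order_id": day, "order_date": date(2026, 3, day)})
    recent = table.query(start=date(2026, 3, 3), end=date(2026, 3, 3))
    print(len(recent), table.rows_scanned)
```

Only the matching partition is read, which is where the cost and latency savings come from on a table with hundreds of millions of rows.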
423 00:39:07.100 ⇒ 00:39:09.650 Godwin Ekainu: Hmm, I hope that answers the question.
424 00:39:14.960 ⇒ 00:39:17.099 Awaish Kumar: Yeah, Demi, you have any follow-up?
425 00:39:17.650 ⇒ 00:39:23.680 Demilade Agboola: I think my question would just be around, ensure… how do we ensure that the tests
426 00:39:24.470 ⇒ 00:39:30.749 Demilade Agboola: On the data, like, if anything goes wrong with the data, we’re the first people to know before any stakeholders.
427 00:39:33.390 ⇒ 00:39:40.950 Godwin Ekainu: So normally… normally, you… in dbt C…
428 00:39:41.250 ⇒ 00:39:45.860 Godwin Ekainu: I don’t know how to do that on dbt Core. In dbt Cloud, you usually do, like, a
429 00:39:46.010 ⇒ 00:39:50.019 Godwin Ekainu: data quality check on your data, so you have, like, your source freshness.
430 00:39:50.150 ⇒ 00:39:54.090 Godwin Ekainu: And you do, like, your data quality checks in your dbt YAML file, too.
431 00:39:54.250 ⇒ 00:40:02.489 Godwin Ekainu: So if there’s anything wrong with the data during the job run, it’s… we set up, like, an alert to your Slack channel.
432 00:40:02.690 ⇒ 00:40:21.239 Godwin Ekainu: dbt has a way of doing that easily. So, once there’s something wrong with the data, based on your data quality, your source freshness check, it sends it… it sends the alert basically to a Slack channel, or your email, some… wherever you decide to want to receive your alert, and when you get that, you go there and fix it,
433 00:40:21.640 ⇒ 00:40:32.300 Godwin Ekainu: Before… so it enforces, I… it enforces that the, the, what do you call it? So this is usually done on the staging layer, before it gets to the production layer.
434 00:40:32.570 ⇒ 00:40:41.469 Godwin Ekainu: Basically, so once you see that, you go back and fix it, and you run your jobs, and it sends the data to the, run the production job, rather.
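The freshness check and alerting flow described here can be sketched in Python. dbt's source freshness configuration uses `warn_after` and `error_after` thresholds; the functions below only mirror that threshold logic and build an alert message. Sending the message to Slack or email is left out, and the function names and message format are hypothetical.

```python
from datetime import datetime, timedelta
from typing import Optional


def freshness_status(
    latest_loaded_at: datetime,
    now: datetime,
    warn_after: timedelta,
    error_after: timedelta,
) -> str:
    """Classify how stale a source is: 'pass', 'warn' or 'error'."""
    age = now - latest_loaded_at
    if age > error_after:
        return "error"
    if age > warn_after:
        return "warn"
    return "pass"


def build_alert(source_name: str, status: str) -> Optional[str]:
    """Return an alert message for a failing check, or None when it passes."""
    if status == "pass":
        return None
    return f"[{status.upper()}] source '{source_name}' failed its freshness check"


if __name__ == "__main__":
    now = datetime(2026, 3, 13, 12, 0)
    last_load = datetime(2026, 3, 12, 9, 0)  # loaded 27 hours earlier
    status = freshness_status(
        last_load, now, warn_after=timedelta(hours=12), error_after=timedelta(hours=24)
    )
    print(build_alert("raw_orders", status))
```

Running a check like this before the production job is what lets the data team see a problem before stakeholders do.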
435 00:40:53.570 ⇒ 00:41:01.889 Awaish Kumar: Okay, yeah, can you, like… name, what is materialization in dbt, and what are the different materializations?
436 00:41:03.020 ⇒ 00:41:07.780 Godwin Ekainu: So in dbt, we have the…
437 00:41:08.180 ⇒ 00:41:13.440 Godwin Ekainu: We have the table, the view, we have the incremental, we have the ephemeral.
438 00:41:13.690 ⇒ 00:41:21.369 Godwin Ekainu: I think it also depends on the… so those are the materializations.
439 00:41:21.930 ⇒ 00:41:24.680 Godwin Ekainu: And I think there’s something…
440 00:41:25.100 ⇒ 00:41:30.449 Godwin Ekainu: Those are, like, the four I’m familiar with, basically. I’m not sure if there are others.
441 00:41:31.300 ⇒ 00:41:36.269 Uttam Kumaran: Are you familiar… yeah, I guess, are you familiar with, like, in what situation you would use…
442 00:41:36.400 ⇒ 00:41:38.429 Uttam Kumaran: Like, incremental, for example.
443 00:41:39.890 ⇒ 00:41:49.379 Godwin Ekainu: So for incremental is when you don’t want to, do, like, a full run on your entire data, because when you’re inserting into your… when you’re creating your…
444 00:41:49.500 ⇒ 00:41:51.729 Godwin Ekainu: your models. So, for example.
445 00:41:51.940 ⇒ 00:41:59.920 Godwin Ekainu: When you do, like, an incremental run, it checks your source table, your destination table, right, and compares against your
446 00:42:00.200 ⇒ 00:42:07.810 Godwin Ekainu: source, and checks that, and based on the… based on the field, basically,
447 00:42:08.030 ⇒ 00:42:23.789 Godwin Ekainu: it checks and sees that, if there’s no data in a particular… if data is missing from a particular partition, a particular row, it inserts that data. So instead of doing, like, a full run, where you rerun your whole table, or you create… recreate the whole table, if you… you recreate the whole table, it
448 00:42:23.940 ⇒ 00:42:29.739 Godwin Ekainu: only inserts what’s not… what doesn’t… what’s not existing in that particular table.
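The incremental behaviour described here can be sketched in Python as a high-water-mark load. In dbt this is typically written with the `is_incremental()` macro and a `unique_key` config in the model; the function below only mirrors the idea of processing rows newer than what the destination already holds. The column names `id` and `updated_at`, and the use of integers as timestamps, are assumptions for illustration.

```python
from typing import Any, Dict, List


def incremental_load(
    destination: Dict[Any, Dict[str, Any]],
    source: List[Dict[str, Any]],
    key: str = "id",
    cursor: str = "updated_at",
) -> int:
    """Insert or update only source rows newer than the destination's latest row.

    `destination` maps the unique key to the stored row. Returns the number
    of rows written. On the first run (empty destination) every row is loaded.
    """
    high_water_mark = max(
        (row[cursor] for row in destination.values()), default=None
    )
    written = 0
    for row in source:
        if high_water_mark is not None and row[cursor] <= high_water_mark:
            continue  # already loaded in an earlier run
        destination[row[key]] = dict(row)  # upsert by unique key
        written += 1
    return written


if __name__ == "__main__":
    dest = {1: {"id": 1, "updated_at": 100}}
    src = [
        {"id": 1, "updated_at": 100},  # unchanged, skipped
        {"id": 2, "updated_at": 200},  # new, inserted
    ]
    print(incremental_load(dest, src), sorted(dest))
```

A full refresh would instead rebuild `destination` from every source row, which is the expensive path an incremental model avoids.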
449 00:42:31.400 ⇒ 00:42:31.960 Uttam Kumaran: Okay.
450 00:42:35.130 ⇒ 00:42:39.110 Godwin Ekainu: So everybody’s gonna work.
451 00:42:39.650 ⇒ 00:42:43.400 Awaish Kumar: Yeah, I think that’s it for me. Uttam, Demi, if you have anything else.
452 00:42:44.610 ⇒ 00:42:48.129 Uttam Kumaran: Yeah, I guess, Godwin, any questions for us?
453 00:42:48.670 ⇒ 00:42:54.859 Uttam Kumaran: like, anything as part of this process, or any questions while you have the three of us, about BrainForge, or anything you’d like to ask?
454 00:42:56.890 ⇒ 00:42:59.110 Godwin Ekainu: So…
455 00:42:59.250 ⇒ 00:43:09.949 Godwin Ekainu: I don’t have any questions. So for this interview, I didn’t come with any questions, because I know I… discussing with Awaish and you, I… I already asked a lot of the questions I wasn’t…
456 00:43:10.380 ⇒ 00:43:12.980 Godwin Ekainu: I was particularly interested in Axi.
457 00:43:13.650 ⇒ 00:43:21.200 Godwin Ekainu: My question will just be basically on the project, so, why,
458 00:43:21.830 ⇒ 00:43:27.610 Godwin Ekainu: How did you find the project for the entire, solution,
459 00:43:28.140 ⇒ 00:43:33.139 Godwin Ekainu: How did you see it? What was your feedback based on the entire solution?
460 00:43:35.060 ⇒ 00:43:49.979 Uttam Kumaran: Yeah, I mean, I think I always love to see, like, a broader depth in, like, data engineering, so I think you have, like, a lot of depth there in, like, setting up Airbyte, and sort of how you’re thinking about, like, grants, and I think you have a pretty good understanding of dbt, so that’s probably my feedback.
461 00:43:54.550 ⇒ 00:44:00.099 Godwin Ekainu: Thank you, okay, so, I guess…
462 00:44:00.730 ⇒ 00:44:07.959 Godwin Ekainu: I didn’t show you guys… I don’t know if you guys want to see the run. I left this to be… to run for…
463 00:44:09.010 ⇒ 00:44:10.630 Godwin Ekainu: visits.
464 00:44:10.930 ⇒ 00:44:11.850 Godwin Ekainu: Thank you.
465 00:44:11.960 ⇒ 00:44:14.569 Godwin Ekainu: So it has been running since I deployed it, so…
466 00:44:16.070 ⇒ 00:44:23.710 Godwin Ekainu: So I set up a… what do you call it? A production instance on PlanetSQ to test out the production on…
467 00:44:23.820 ⇒ 00:44:26.630 Godwin Ekainu: Physically. He’s been training touch for a while.
468 00:44:27.310 ⇒ 00:44:31.449 Godwin Ekainu: I didn’t show you guys the data, but I guess, I’m not sure about the…
469 00:44:31.740 ⇒ 00:44:33.919 Godwin Ekainu: Credentials I use for contacts now.
470 00:44:36.070 ⇒ 00:44:38.889 Godwin Ekainu: In case that was all.
471 00:44:43.720 ⇒ 00:44:44.560 Uttam Kumaran: Right.
472 00:44:49.910 ⇒ 00:44:53.330 Godwin Ekainu: So, any questions at handoffs?
473 00:44:54.360 ⇒ 00:44:56.139 Uttam Kumaran: Yeah, I think that’s it from my side.
474 00:45:00.090 ⇒ 00:45:01.659 Demilade Agboola: Yeah, that’s it from my side, too.
475 00:45:02.070 ⇒ 00:45:02.660 Uttam Kumaran: Okay.
476 00:45:03.210 ⇒ 00:45:06.720 Uttam Kumaran: Perfect. Alright, thank you, everyone. Thank you, Godwin. Appreciate it.
477 00:45:07.580 ⇒ 00:45:08.490 Godwin Ekainu: Thank you, everyone.
478 00:45:09.040 ⇒ 00:45:10.109 Awaish Kumar: Okay, nice.
479 00:45:10.110 ⇒ 00:45:11.230 Godwin Ekainu: Chatting with you.
480 00:45:11.710 ⇒ 00:45:13.019 Uttam Kumaran: Yeah, appreciate it.
481 00:45:13.020 ⇒ 00:45:14.170 Godwin Ekainu: Right. Talk to you soon.
482 00:45:14.280 ⇒ 00:45:14.850 Uttam Kumaran: Bye.