Meeting Title: Supabase to MotherDuck ETL Date: 2025-12-02 Meeting participants: Awaish Kumar, Thomas Pilger, Mustafa Raja
WEBVTT
1 00:00:05.980 ⇒ 00:00:06.860 Awaish Kumar: Hello.
2 00:00:08.670 ⇒ 00:00:09.660 Thomas Pilger: Hey, how’s it going?
3 00:00:10.660 ⇒ 00:00:12.040 Awaish Kumar: All good, how about you?
4 00:00:12.880 ⇒ 00:00:21.960 Thomas Pilger: Doing pretty good. So I spoke with Utami, gave… I think he told you, gave you a little bit of a brief.
5 00:00:23.030 ⇒ 00:00:32.680 Thomas Pilger: So… I… I read through the data integration pipeline Notion doc,
6 00:00:33.670 ⇒ 00:00:36.220 Thomas Pilger: I didn’t see anything specific to Omni.
7 00:00:37.010 ⇒ 00:00:51.829 Thomas Pilger: But basically, like, what I’m trying to get out of this is, like, just some confirmation on exactly what I need to do in S3, and what the data that I’m going to be pushing… what I need to do in
8 00:00:51.980 ⇒ 00:00:58.740 Thomas Pilger: GitHub Actions, or the script that I write there, to get it into S3 in, like, the proper way.
9 00:01:00.180 ⇒ 00:01:04.420 Thomas Pilger: That makes sense. I’m very, I do not have, like, any experience with
10 00:01:04.890 ⇒ 00:01:12.169 Thomas Pilger: anything AWS-related. So, like, yeah, I’m just trying to get some… some confirmation on… on that end.
11 00:01:14.140 ⇒ 00:01:21.729 Awaish Kumar: Okay, I, like, are you… are we talking about moving data from S3 to Catalyst?
12 00:01:22.800 ⇒ 00:01:30.430 Thomas Pilger: No, so… that is part of it, that is something that I do want to cover. But mostly, so, the way that we’re…
13 00:01:31.370 ⇒ 00:01:33.759 Thomas Pilger: setting this up in MotherDuck.
14 00:01:34.010 ⇒ 00:01:35.840 Thomas Pilger: is we are…
15 00:01:37.240 ⇒ 00:01:46.359 Thomas Pilger: basically going to be querying production data for those tables every day using GitHub Actions, and then
16 00:01:46.520 ⇒ 00:01:55.890 Thomas Pilger: putting that in an S3 bucket, and then using, like, that bucket to connect to MotherDuck as, like, a read-only database.
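[Editor's note: the daily schedule Thomas describes could be sketched as a GitHub Actions workflow roughly like the following. The workflow name, cron time, script path, and secret names are all placeholders, not values agreed on in this call.]

```yaml
name: daily-supabase-export
on:
  schedule:
    - cron: "0 6 * * *"   # once a day; exact time TBD
  workflow_dispatch: {}    # allow manual runs while testing
jobs:
  export:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-node@v4
        with:
          node-version: 20
      - run: npm ci
      # Hypothetical script name: queries Supabase and writes
      # full-table exports to the agreed S3 bucket.
      - run: node scripts/export-to-s3.js
        env:
          SUPABASE_URL: ${{ secrets.SUPABASE_URL }}
          SUPABASE_SERVICE_ROLE_KEY: ${{ secrets.SUPABASE_SERVICE_ROLE_KEY }}
          AWS_ACCESS_KEY_ID: ${{ secrets.AWS_ACCESS_KEY_ID }}
          AWS_SECRET_ACCESS_KEY: ${{ secrets.AWS_SECRET_ACCESS_KEY }}
```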
17 00:01:56.720 ⇒ 00:02:01.970 Awaish Kumar: Okay, so… why we want to use GitHub Actions, or…
18 00:02:03.570 ⇒ 00:02:09.659 Thomas Pilger: That was just what was suggested to us. Tom suggested that in one of our calls.
19 00:02:10.509 ⇒ 00:02:14.590 Thomas Pilger: If there’s something… I… we can use whatever. Whatever’s easiest.
20 00:02:15.800 ⇒ 00:02:17.529 Awaish Kumar: Yeah, like, I…
21 00:02:17.990 ⇒ 00:02:27.480 Awaish Kumar: I don’t know, like, I haven’t… like, we have… we have multiple ingestion tools available now in the market, so we can use any of those to move data from,
22 00:02:28.650 ⇒ 00:02:32.189 Awaish Kumar: Supabase, for example, to MotherDuck.
23 00:02:32.980 ⇒ 00:02:38.049 Awaish Kumar: Or to S3, if you want to move to S3, or to MotherDuck directly.
24 00:02:38.230 ⇒ 00:02:46.839 Awaish Kumar: But, or, like, we could have written Python scripts. Like, GitHub Actions, I, I’m… I’m just…
25 00:02:46.990 ⇒ 00:02:51.760 Awaish Kumar: like, on the GitHub Actions, we have to run some kind of script, basically.
26 00:02:51.990 ⇒ 00:02:57.409 Awaish Kumar: the GitHub Actions will just give us some… place to run our compute.
27 00:02:57.630 ⇒ 00:03:11.760 Awaish Kumar: But at the end of it, we just have to do… maybe write some Python or something that’s basically going to connect with Supabase, get the data, and put it into S3 or MotherDuck, wherever you want to…
28 00:03:13.570 ⇒ 00:03:15.670 Thomas Pilger: So, I’m just basically…
29 00:03:16.000 ⇒ 00:03:23.930 Thomas Pilger: just querying… I was gonna use, like, the Node SDK for Supabase, and then I’m just, like, shoving that into S3.
30 00:03:24.160 ⇒ 00:03:28.830 Thomas Pilger: in the same format, basically, like, full table exports going straight into S3.
31 00:03:29.760 ⇒ 00:03:32.650 Thomas Pilger: Right? Is that… am I understanding that?
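[Editor's note: a minimal sketch of that export step, written in Python rather than the Node SDK Thomas mentions, purely for illustration. The table names, the `exports/{table}/{date}.csv` key layout, and the injected `upload` callable are assumptions, not agreed conventions.]

```python
import csv
import io


def rows_to_csv(rows, fieldnames):
    """Serialize query results (a list of dicts) to CSV bytes."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=fieldnames)
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue().encode("utf-8")


def export_key(table, run_date):
    """Hypothetical key layout: one full snapshot per table per day."""
    return f"exports/{table}/{run_date.isoformat()}.csv"


def export_table(table, rows, fieldnames, run_date, upload):
    """Write one full-table snapshot.

    `upload(key, body)` wraps the actual S3 put (e.g. a boto3
    put_object call); it is injected so the snapshot logic stays
    testable without AWS credentials.
    """
    key = export_key(table, run_date)
    upload(key, rows_to_csv(rows, fieldnames))
    return key
```

In the real job, `upload` would be something like `lambda k, b: s3.put_object(Bucket=..., Key=k, Body=b)` with a boto3 client, and `rows` would come from the Supabase query.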
32 00:03:32.880 ⇒ 00:03:33.430 Awaish Kumar: Okay.
33 00:03:33.430 ⇒ 00:03:38.790 Mustafa Raja: So, the Notion doc that was linked also included a test CSV, right?
34 00:03:39.090 ⇒ 00:03:46.890 Mustafa Raja: So, what we want to do is, we want to, just push the KPIs to… to S3, and then connect those to…
35 00:03:47.100 ⇒ 00:03:49.410 Mustafa Raja: What’s it called? The gate list.
36 00:03:50.150 ⇒ 00:03:54.299 Thomas Pilger: Yeah, that… I… that part seemed pretty straightforward.
37 00:03:54.480 ⇒ 00:04:00.940 Thomas Pilger: And sorry if a lot of this seems pretty simple, or a lot of my questions are pretty simple.
38 00:04:01.550 ⇒ 00:04:04.470 Thomas Pilger: But I was just thinking, yeah, in terms of…
39 00:04:05.250 ⇒ 00:04:12.420 Thomas Pilger: what you need from me is just, like, the production data going straight into S3, and then… S3-wise.
40 00:04:12.810 ⇒ 00:04:20.330 Thomas Pilger: Or bucket-wise, rather. Like, is there anything else that you need me to… like, how does that need to be structured?
41 00:04:20.839 ⇒ 00:04:24.829 Mustafa Raja: Okay, so what we wanted is… Yeah, go on.
42 00:04:25.330 ⇒ 00:04:30.300 Awaish Kumar: Yeah, I heard the question. So, like, the data which is coming from Supabase.
43 00:04:30.550 ⇒ 00:04:38.950 Awaish Kumar: is also part of KPI? Like, is it going to be landed in MotherDuck, and then it’s going… going to… going to land in, like, Catalyst?
44 00:04:40.610 ⇒ 00:04:43.720 Thomas Pilger: Yes, that was… that’s my understanding, is that…
45 00:04:44.280 ⇒ 00:04:47.250 Thomas Pilger: You would be using this,
46 00:04:47.950 ⇒ 00:04:58.200 Thomas Pilger: like, our production data, and then writing that to another S3 bucket, or storing that in S3, and then using that to go into Catalyst. I’m not sure if the actual data
47 00:04:59.290 ⇒ 00:05:05.249 Thomas Pilger: or the data that I’m putting into S3 from Supabase is being used in Catalyst, I’m sure it probably is.
48 00:05:06.420 ⇒ 00:05:10.500 Thomas Pilger: But… as far as… like…
49 00:05:11.660 ⇒ 00:05:16.910 Thomas Pilger: storing the bit… or getting the… I know that you’re doing things in… in MotherDuck.
50 00:05:17.400 ⇒ 00:05:17.880 Mustafa Raja: Yeah.
51 00:05:17.880 ⇒ 00:05:21.530 Thomas Pilger: you need that data in MotherDuck. So, as…
52 00:05:21.680 ⇒ 00:05:26.070 Thomas Pilger: I spoke with Victor about this, and he was… he… to reduce, like, any kind of…
53 00:05:26.840 ⇒ 00:05:39.530 Thomas Pilger: just for data security purposes, he wants to set up a read-only database in MotherDuck using S3, like, connecting it through the HTTPS URL option, or whatever.
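[Editor's note: the read-only pattern Victor suggested maps to DuckDB/MotherDuck SQL roughly like the following sketch; the bucket URL and file path are placeholders.]

```sql
-- The httpfs extension lets DuckDB/MotherDuck read objects over
-- https:// (or s3://) URLs without any write access to the bucket.
INSTALL httpfs;
LOAD httpfs;

-- Query a daily export in place, read-only.
SELECT *
FROM read_csv_auto('https://example-bucket.s3.amazonaws.com/exports/users/2025-12-02.csv');
```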
54 00:05:40.840 ⇒ 00:05:41.620 Mustafa Raja: Hmm.
55 00:05:42.140 ⇒ 00:05:48.780 Awaish Kumar: Yeah, so what I… yeah, if, like, if Supabase has an SDK, and which is, like.
56 00:05:49.160 ⇒ 00:05:54.500 Awaish Kumar: installable. I think that’s why, maybe, Utami…
57 00:05:54.630 ⇒ 00:06:08.849 Awaish Kumar: like, suggested GitHub Actions, that we can install that CLI for Supabase and run some commands directly to export data, export full tables, and put it into S3. So maybe that was his, basically.
58 00:06:09.080 ⇒ 00:06:10.490 Awaish Kumar: position.
59 00:06:11.970 ⇒ 00:06:12.639 Awaish Kumar: That’s the only.
60 00:06:12.640 ⇒ 00:06:13.300 Thomas Pilger: I bet.
61 00:06:13.300 ⇒ 00:06:25.110 Awaish Kumar: We can… we can do that. So, our ask is two things. Basically, one, bringing that data to S3, and if you need, like.
62 00:06:25.550 ⇒ 00:06:31.529 Awaish Kumar: any help with Utami’s suggestions? Like, yeah, we can be there. Second thing is…
63 00:06:31.840 ⇒ 00:06:43.800 Awaish Kumar: when that data lands in S3, we are going to move it to MotherDuck, maybe. And in MotherDuck, we are going to run some transformations, because the data may not be in the format that
64 00:06:44.060 ⇒ 00:06:46.009 Awaish Kumar: that we are going to send to Catalyst.
65 00:06:46.170 ⇒ 00:07:01.900 Awaish Kumar: So, we will… we are going to run some transformations in MotherDuck, and then we are going to put that data in S3 again, in some bucket. So, we are need… we need access to that bucket, where we can put some of our… some of our transformed data.
66 00:07:02.100 ⇒ 00:07:15.920 Awaish Kumar: And then what we need from you is basically connect that S3 bucket with Catalyst. So there is a direct integration with Catalyst and S3, so you have to set that up so that it reads data from S3 bucket.
67 00:07:16.670 ⇒ 00:07:18.230 Thomas Pilger: Okay.
68 00:07:18.560 ⇒ 00:07:22.690 Thomas Pilger: I think that that… have…
69 00:07:22.880 ⇒ 00:07:27.389 Thomas Pilger: That makes sense. I get what… I get the general flow of what’s needed.
70 00:07:27.810 ⇒ 00:07:31.250 Thomas Pilger: I wasn’t sure… I heard, it was…
71 00:07:31.810 ⇒ 00:07:34.790 Thomas Pilger: maybe dbt or some, like, ETL.
72 00:07:35.780 ⇒ 00:07:36.329 Awaish Kumar: Yeah, that…
73 00:07:36.330 ⇒ 00:07:37.660 Thomas Pilger: Something, yeah.
74 00:07:37.660 ⇒ 00:07:43.480 Awaish Kumar: side, that’s going to happen on our side. When data lands in S3, we are going to move it to MotherDuck.
75 00:07:43.570 ⇒ 00:07:59.250 Awaish Kumar: At that point, we are going to run some dbt. dbt is just running some SQL queries to transform the data. So we are going to do that on our end, and then we are going to put that transformed data back to some S3.
76 00:07:59.540 ⇒ 00:08:00.470 Awaish Kumar: bucket.
77 00:08:00.620 ⇒ 00:08:07.620 Awaish Kumar: And that data is going to be moved to Catalyst. But, like, we wanna… we want, I think.
78 00:08:07.990 ⇒ 00:08:15.200 Awaish Kumar: We want you to own, like, these two parts: Supabase to S3, and then S3 to Catalyst.
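[Editor's note: the dbt step Awaish describes compiles down to SQL that MotherDuck runs; a sketch along these lines, where every bucket, table, and column name is hypothetical.]

```sql
-- Read the raw daily export from the landing bucket.
CREATE OR REPLACE TABLE raw_users AS
SELECT *
FROM read_csv_auto('s3://landing-bucket/exports/users/2025-12-02.csv');

-- Example transformation: shape raw rows into a KPI table in the
-- format Catalyst expects (columns are illustrative only).
CREATE OR REPLACE TABLE kpi_daily_signups AS
SELECT CAST(created_at AS DATE) AS day,
       COUNT(*)                 AS signups
FROM raw_users
GROUP BY 1;

-- Write the transformed data back to the bucket Catalyst reads from.
COPY kpi_daily_signups
TO 's3://transformed-bucket/kpis/daily_signups.csv' (FORMAT CSV, HEADER);
```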
79 00:08:15.990 ⇒ 00:08:19.809 Thomas Pilger: Okay. Yeah, that makes sense. I think…
80 00:08:19.970 ⇒ 00:08:22.520 Thomas Pilger: I may have overthought some of the,
81 00:08:23.560 ⇒ 00:08:28.309 Thomas Pilger: the GitHub Actions-specific things. I saw that dbt…
82 00:08:28.570 ⇒ 00:08:35.779 Thomas Pilger: I remember hearing about it, and I was not sure where that… the responsibility of that lies. So it’s pretty simple. I just need to…
83 00:08:36.539 ⇒ 00:08:45.749 Thomas Pilger: throw the, you know, query the data every day, throw it in S3, you pull it, you use it in MotherDuck, and then you shove it back into S3, and then that’s going into Catalyst.
84 00:08:46.600 ⇒ 00:08:47.530 Awaish Kumar: Yep.
85 00:08:48.080 ⇒ 00:08:48.790 Thomas Pilger: Okay.
86 00:08:49.220 ⇒ 00:08:49.839 Awaish Kumar: And…
87 00:08:49.840 ⇒ 00:08:50.330 Thomas Pilger: Alright.
88 00:08:50.440 ⇒ 00:08:57.630 Awaish Kumar: For a… like, I don’t know if my… like, most of us, we… I think… I don’t know if we have access to S3 buckets.
89 00:08:57.760 ⇒ 00:09:06.649 Awaish Kumar: So, like, to read data from S3, and then move it to MotherDuck, and then, again, sending it back to S3, we need… we will need access to the S3 bucket.
90 00:09:08.150 ⇒ 00:09:11.029 Awaish Kumar: Maybe a service account key, or whatever.
91 00:09:11.920 ⇒ 00:09:14.669 Thomas Pilger: Yeah, I was thinking, I’m sure that…
92 00:09:16.570 ⇒ 00:09:30.709 Thomas Pilger: Well, that shouldn’t be a problem. I kind of thought that that would be the case after reading the doc. I’ll just have to make sure that everything’s good with Victor in terms of if I need to, like, that needs to be separated, or how that’s gonna work, but that shouldn’t be an issue, okay?
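[Editor's note: the bucket access being discussed is typically granted with an IAM policy along these lines; the bucket names, and the split between a read-only landing bucket and a writable transformed bucket, are assumptions to be confirmed with Victor.]

```json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "ReadRawExports",
      "Effect": "Allow",
      "Action": ["s3:GetObject", "s3:ListBucket"],
      "Resource": [
        "arn:aws:s3:::landing-bucket",
        "arn:aws:s3:::landing-bucket/*"
      ]
    },
    {
      "Sid": "WriteTransformedKpis",
      "Effect": "Allow",
      "Action": ["s3:PutObject"],
      "Resource": ["arn:aws:s3:::transformed-bucket/*"]
    }
  ]
}
```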
93 00:09:31.030 ⇒ 00:09:38.520 Thomas Pilger: I think you’ve answered most of my questions. I can figure out, like, the… the permissions and everything that’s required on my end.
94 00:09:38.820 ⇒ 00:09:41.900 Thomas Pilger: but…
95 00:09:43.980 ⇒ 00:09:50.919 Thomas Pilger: Yeah, I think I’ve got it. Sorry that this was… sorry to make you get on a call for this, but this was, like, super helpful.
96 00:09:51.080 ⇒ 00:09:53.910 Awaish Kumar: Yeah, no worries. Like, Mustafa, do we have anything else?
97 00:09:54.720 ⇒ 00:09:56.269 Mustafa Raja: Yeah, this is pretty much it.
98 00:09:57.400 ⇒ 00:09:59.170 Thomas Pilger: Alright, sweet.
99 00:09:59.670 ⇒ 00:10:02.170 Thomas Pilger: Well, I appreciate it.
100 00:10:02.510 ⇒ 00:10:04.629 Thomas Pilger: I should be able to…
101 00:10:04.780 ⇒ 00:10:10.190 Thomas Pilger: get this set up pretty quickly then, seems pretty simple. So I will let you know.
102 00:10:10.190 ⇒ 00:10:12.159 Awaish Kumar: Okay, yeah.
103 00:10:12.400 ⇒ 00:10:15.129 Awaish Kumar: put that data into S3, and also…
104 00:10:15.320 ⇒ 00:10:19.659 Awaish Kumar: Yeah, let us know the service account, or how we can access that bucket.
105 00:10:19.800 ⇒ 00:10:20.919 Awaish Kumar: To read the data.
106 00:10:26.670 ⇒ 00:10:28.949 Thomas Pilger: Sorry, you just cut out there.
107 00:10:28.950 ⇒ 00:10:29.559 Awaish Kumar: I was just saying.
108 00:10:29.560 ⇒ 00:10:31.130 Thomas Pilger: But yeah, I’ll let you know when…
109 00:10:31.460 ⇒ 00:10:32.090 Awaish Kumar: Yeah.
110 00:10:32.090 ⇒ 00:10:32.690 Thomas Pilger: Go ahead.
111 00:10:32.910 ⇒ 00:10:41.279 Awaish Kumar: Yeah, let us know when the data is in S3, and also how we can access it, like the bucket, or service account key, or whatever, yeah.
112 00:10:42.310 ⇒ 00:10:50.789 Thomas Pilger: Okay. I will do that, and I will do the Catalyst integration, the Phase 1 part of the doc as well.
113 00:10:51.330 ⇒ 00:10:51.850 Mustafa Raja: Yep.
114 00:10:52.920 ⇒ 00:10:53.270 Awaish Kumar: Okay.
115 00:10:53.270 ⇒ 00:10:54.900 Thomas Pilger: Alright, well, thank you.
116 00:10:55.270 ⇒ 00:10:55.889 Awaish Kumar: Thank you.
117 00:10:55.890 ⇒ 00:10:56.660 Mustafa Raja: Thank you.
118 00:10:57.050 ⇒ 00:10:57.899 Thomas Pilger: See you guys.
119 00:10:58.150 ⇒ 00:10:58.910 Mustafa Raja: Yep.