Meeting Title: Pandas Data Extraction Discussion
Date: 2026-01-22
Meeting participants: Mustafa Raja, Demilade Agboola
WEBVTT
1 00:01:53.120 ⇒ 00:01:54.220 Demilade Agboola: Hi, Mustafa.
2 00:01:56.540 ⇒ 00:01:57.990 Mustafa Raja: Hey, how are you?
3 00:01:58.470 ⇒ 00:01:59.709 Demilade Agboola: I’m pretty good, how are you?
4 00:02:00.640 ⇒ 00:02:02.009 Mustafa Raja: Yeah, doing good.
5 00:02:04.800 ⇒ 00:02:09.009 Demilade Agboola: So, my question here was…
6 00:02:09.160 ⇒ 00:02:14.120 Mustafa Raja: Would Pandas be able to detect multiple tables within the same CSV?
7 00:02:14.560 ⇒ 00:02:20.530 Mustafa Raja: Not actually a CSV; it would be an Excel tab, right?
8 00:02:21.730 ⇒ 00:02:24.049 Demilade Agboola: So, are you talking about the format of the sheets?
9 00:02:24.630 ⇒ 00:02:25.680 Mustafa Raja: Yeah.
10 00:02:26.860 ⇒ 00:02:30.470 Demilade Agboola: So, for the sheets, you can always specify the names of the sheets.
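A minimal sketch of the two ideas raised so far, under stated assumptions. `pd.read_excel` with `sheet_name=None` returns every sheet as a dict keyed by sheet name, which covers "specify the names of the sheets". Pandas does not detect several tables inside one sheet on its own, so the helper below assumes the tables are separated by fully empty rows; that separator rule and the function names `split_on_blank_rows` and `load_sheet_blocks` are illustrative, not something confirmed in the call.

```python
import pandas as pd


def split_on_blank_rows(raw: pd.DataFrame) -> list[pd.DataFrame]:
    """Split a header-less sheet into blocks separated by fully empty rows."""
    blank = raw.isna().all(axis=1)
    # Every blank row bumps the counter, so rows between blanks share an id.
    block_id = blank.cumsum()
    blocks = []
    for _, block in raw[~blank].groupby(block_id[~blank]):
        # Drop columns that are empty within this block only.
        blocks.append(block.dropna(axis=1, how="all").reset_index(drop=True))
    return blocks


def load_sheet_blocks(path: str) -> dict[str, list[pd.DataFrame]]:
    """Read every sheet of a workbook without assuming a header row."""
    # sheet_name=None returns {sheet name: DataFrame} for all sheets.
    sheets = pd.read_excel(path, sheet_name=None, header=None)
    return {name: split_on_blank_rows(raw) for name, raw in sheets.items()}


# In-memory frame standing in for one sheet that holds two tables.
raw = pd.DataFrame(
    [
        ["date", "revenue", None],
        ["2026-01-01", 100, None],
        [None, None, None],
        ["sku", "qty", "location"],
        ["A1", 5, "north"],
    ]
)
blocks = split_on_blank_rows(raw)
```

Each block still carries its header as the first data row, so the header has to be promoted separately once it is located.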
11 00:02:30.710 ⇒ 00:02:33.569 Demilade Agboola: But… the tricky part would be…
12 00:02:34.300 ⇒ 00:02:40.360 Demilade Agboola: Are the column names the same? Like, even though the formats are different, are the column names similar?
13 00:02:42.560 ⇒ 00:02:45.170 Mustafa Raja: Sorry, could you… Could you say that again?
14 00:02:45.430 ⇒ 00:02:49.550 Demilade Agboola: I said, even if the formats are different, are the column names the same, or are they similar?
15 00:02:50.450 ⇒ 00:02:51.250 Mustafa Raja: No…
16 00:02:51.580 ⇒ 00:03:09.120 Mustafa Raja: it’s very different. Not every file, but, let’s say, after every 3-4 CSVs, or after every 3-4 Excel files, the format changes drastically.
17 00:03:10.720 ⇒ 00:03:15.870 Demilade Agboola: Okay, but what I’m saying is, are the column names the same? So, like, is date still called date, is
18 00:03:16.760 ⇒ 00:03:20.140 Demilade Agboola: You know, revenue still called revenue, or whatever they’re using as…
19 00:03:20.960 ⇒ 00:03:23.289 Demilade Agboola: The… the name of the column.
20 00:03:23.400 ⇒ 00:03:25.169 Demilade Agboola: Even if the format is changing.
21 00:03:25.990 ⇒ 00:03:30.120 Mustafa Raja: Let me see… Hmm…
22 00:03:34.870 ⇒ 00:03:35.730 Mustafa Raja: B.
23 00:03:35.960 ⇒ 00:03:37.520 Mustafa Raja: Printing…
24 00:03:47.980 ⇒ 00:03:48.750 Mustafa Raja: Bro.
25 00:03:51.870 ⇒ 00:03:57.950 Demilade Agboola: Also, are there any particular tables, like, because there are a lot of… there are a lot of
26 00:03:58.500 ⇒ 00:04:01.540 Demilade Agboola: Excel files. Are there any we want in particular?
27 00:04:02.570 ⇒ 00:04:06.530 Mustafa Raja: I think we want all of this in the… What’s it called?
28 00:04:07.470 ⇒ 00:04:08.429 Demilade Agboola: In the warehouse.
29 00:04:08.430 ⇒ 00:04:10.020 Mustafa Raja: Yeah, in the warehouse.
30 00:04:10.930 ⇒ 00:04:23.329 Mustafa Raja: So, one thing is, we would first have to also detect how many rows we need to, you know, skip to actually get to the table, right? Is that a correct assumption?
31 00:04:24.510 ⇒ 00:04:29.199 Demilade Agboola: Mmm… I believe so…
32 00:04:29.370 ⇒ 00:04:36.169 Mustafa Raja: And then you would see that they’re skipping one row, one row, one row, and that’s not the case for every single file.
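Since the number of rows to skip differs from file to file, one option is to read each sheet with `header=None` and locate the header row with a heuristic. The sketch below assumes a header row is the first row with at least `min_filled` non-empty cells that are all strings; that rule, the thresholds, and the names `find_header_row` and `promote_header` are assumptions for illustration and would need tuning against the real files.

```python
import pandas as pd


def find_header_row(raw: pd.DataFrame, min_filled: int = 3, max_scan: int = 30) -> int:
    """Return the position of the first row that looks like a header.

    Heuristic (an assumption): a header row has at least `min_filled`
    non-empty cells, and every non-empty cell is a string.
    """
    for idx in range(min(max_scan, len(raw))):
        cells = raw.iloc[idx].dropna()
        if len(cells) >= min_filled and all(isinstance(c, str) for c in cells):
            return idx
    raise ValueError("no header-like row found in the scanned range")


def promote_header(raw: pd.DataFrame) -> pd.DataFrame:
    """Use the detected header row as column names and drop the rows above it."""
    header_idx = find_header_row(raw)
    body = raw.iloc[header_idx + 1:].reset_index(drop=True)
    body.columns = [str(c).strip() for c in raw.iloc[header_idx]]
    return body


# In-memory frame standing in for a sheet with a title line above the table.
raw = pd.DataFrame(
    [
        ["Monthly report", None, None],
        [None, None, None],
        ["Date", "Revenue", "Region"],
        ["2026-01-01", 100, "north"],
        ["2026-01-02", 120, "south"],
    ]
)
table = promote_header(raw)
```

The detected index could also be passed back to `pd.read_excel` as `skiprows` for a second, typed read of the same sheet.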
33 00:04:36.730 ⇒ 00:04:39.320 Demilade Agboola: I think, actually, no, because I believe…
34 00:04:40.250 ⇒ 00:04:44.240 Demilade Agboola: So let me… because I haven’t used Pandas in a long time, so I…
35 00:04:44.240 ⇒ 00:04:45.089 Mustafa Raja: I was just curious.
36 00:04:45.090 ⇒ 00:04:49.319 Demilade Agboola: I was like… But I do know that, yes, you can specify
37 00:04:49.610 ⇒ 00:04:56.189 Demilade Agboola: But I was also asking, like, ChatGPT, I was like, can you, like, programmatically do it?
38 00:04:56.530 ⇒ 00:05:04.419 Demilade Agboola: It’s just like, yeah, you can… the only thing you’ll need to do is largely specify what the column names will be, so I just shared that with you.
39 00:05:04.920 ⇒ 00:05:05.440 Mustafa Raja: Hmm…
40 00:05:05.440 ⇒ 00:05:12.200 Demilade Agboola: So it will define… so once it knows what the target schema is, it’s going to kind of go through and detect.
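A rough sketch of the "target schema" idea: list the column names the warehouse expects, list the spellings each one may appear under, and rename whatever matches. The target names and aliases in `TARGET_ALIASES` are hypothetical placeholders, not names taken from the actual files, and `to_target_schema` is an illustrative helper.

```python
import pandas as pd

# Hypothetical target schema: target name -> spellings seen in source files.
TARGET_ALIASES = {
    "date": {"date", "transaction date", "day"},
    "revenue": {"revenue", "sales", "total revenue"},
}


def to_target_schema(df: pd.DataFrame, aliases=TARGET_ALIASES):
    """Rename recognised columns to target names and report missing targets."""
    lookup = {alias: target for target, names in aliases.items() for alias in names}
    renames = {}
    for col in df.columns:
        # Compare on a trimmed, lower-cased copy of the column name.
        key = str(col).strip().lower()
        if key in lookup:
            renames[col] = lookup[key]
    out = df.rename(columns=renames)
    missing = sorted(set(aliases) - set(renames.values()))
    return out, missing


df = pd.DataFrame({"Transaction Date": ["2026-01-01"], " Sales ": [100], "Notes": ["x"]})
out, missing = to_target_schema(df)

other = pd.DataFrame({"Day": ["2026-01-01"], "Qty": [1]})
_, other_missing = to_target_schema(other)
```

Files that come back with a non-empty `missing` list are the ones whose names differ too much for the alias table and need a closer look.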
41 00:05:13.080 ⇒ 00:05:19.970 Mustafa Raja: Okay… Yeah, you see how different each of these is, you know? And I…
42 00:05:19.970 ⇒ 00:05:22.650 Demilade Agboola: So if the names are different, yeah, that would be a problem.
43 00:05:23.210 ⇒ 00:05:26.010 Mustafa Raja: Yeah, I think they are, right?
44 00:05:27.040 ⇒ 00:05:30.129 Mustafa Raja: I think these two are somewhat similar.
45 00:05:30.960 ⇒ 00:05:33.900 Mustafa Raja: V3, what this isn’t.
46 00:05:34.360 ⇒ 00:05:44.090 Mustafa Raja: And… And we are only looking into one folder, and we have about… A lot of folders.
47 00:05:45.300 ⇒ 00:05:48.529 Mustafa Raja: Yeah, this is a little different also.
48 00:05:57.600 ⇒ 00:05:59.440 Mustafa Raja: And this is also different.
49 00:06:00.640 ⇒ 00:06:05.089 Mustafa Raja: I guess these might be just similar… is this similar?
50 00:06:06.050 ⇒ 00:06:07.360 Mustafa Raja: Somewhat similar.
51 00:06:10.440 ⇒ 00:06:18.289 Mustafa Raja: Okay, so I will try… I will try doing… I guess I will try doing it with pandas, and maybe get back to you on this?
52 00:06:18.770 ⇒ 00:06:19.770 Demilade Agboola: Okay, sure.
53 00:06:20.560 ⇒ 00:06:32.740 Demilade Agboola: I’m not sure. If it doesn’t fit perfectly, we can then try and think about it again. Maybe for certain names, like, for instance, the inventory tables might have similar… we can create the formats for inventory tables, create the format.
54 00:06:32.740 ⇒ 00:06:33.120 Mustafa Raja: So this…
55 00:06:33.120 ⇒ 00:06:33.490 Demilade Agboola: ceiling.
56 00:06:33.490 ⇒ 00:06:36.720 Mustafa Raja: This would then require some exploring, right?
57 00:06:37.420 ⇒ 00:06:41.690 Mustafa Raja: Because I really haven’t explored it too much, I just looked into…
58 00:06:41.940 ⇒ 00:06:47.869 Mustafa Raja: A lot of these, and they had a lot of different, you know, formats, and I was like.
59 00:06:48.160 ⇒ 00:06:55.300 Mustafa Raja: Because, these two folders, they had the same structure.
60 00:06:55.490 ⇒ 00:07:05.830 Mustafa Raja: They had a similar problem, but all of the files had the same structure, so I was able to do that. But this is, like, every third or fourth CSV has a different format.
61 00:07:06.980 ⇒ 00:07:11.040 Demilade Agboola: Yeah, that makes it tricky. So if we can just… even if it’s just to find a pattern.
62 00:07:11.170 ⇒ 00:07:16.689 Demilade Agboola: That might help us at least reduce the workload of what is left that we might have to now do manually.
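One way to look for a pattern before writing any parser is to fingerprint each file by its header and count how many files share each fingerprint. The sketch below assumes the header of every file has already been extracted into a dict; the file names and column names are made up for illustration, and `column_signature` and `group_by_signature` are hypothetical helpers.

```python
from collections import defaultdict


def column_signature(columns):
    """Order-insensitive, case-insensitive fingerprint of a header."""
    return tuple(sorted(str(c).strip().lower() for c in columns))


def group_by_signature(headers):
    """Map each distinct signature to the files that share it."""
    groups = defaultdict(list)
    for name, columns in headers.items():
        groups[column_signature(columns)].append(name)
    # Largest groups first: they give the most coverage per format handled.
    return sorted(groups.items(), key=lambda item: len(item[1]), reverse=True)


# Hypothetical headers, keyed by file name.
headers = {
    "file_a.xlsx": ["Date", "Revenue"],
    "file_b.xlsx": ["revenue", "date "],
    "file_c.xlsx": ["SKU", "Qty", "Location"],
}
groups = group_by_signature(headers)
```

Handling the largest groups first leaves only the one-off formats for manual work.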
63 00:07:16.940 ⇒ 00:07:19.670 Mustafa Raja: Okay, okay. I’ll get back to you on this then. Thank you.
64 00:07:19.670 ⇒ 00:07:21.669 Demilade Agboola: Alright, sounds good. Yeah, thank you.
65 00:07:21.920 ⇒ 00:07:22.620 Mustafa Raja: Right.
66 00:07:22.870 ⇒ 00:07:23.460 Demilade Agboola: Bye.