Meeting Title: Pandas Data Extraction Discussion
Date: 2026-01-22
Meeting participants: Mustafa Raja, Demilade Agboola


WEBVTT

1 00:01:53.120 00:01:54.220 Demilade Agboola: Hi, Mustafa.

2 00:01:56.540 00:01:57.990 Mustafa Raja: Hey, how are you?

3 00:01:58.470 00:01:59.709 Demilade Agboola: I’m pretty good, how are you?

4 00:02:00.640 00:02:02.009 Mustafa Raja: Yeah, doing good.

5 00:02:04.800 00:02:09.009 Demilade Agboola: So, my question here was…

6 00:02:09.160 00:02:14.120 Mustafa Raja: Would Pandas be able to detect multiple tables within the same CSV?

7 00:02:14.560 00:02:20.530 Mustafa Raja: Not actually a CSV, but it would be an Excel tab, right?

8 00:02:21.730 00:02:24.049 Demilade Agboola: So, are you talking about the format of the sheets?

9 00:02:24.630 00:02:25.680 Mustafa Raja: Yeah.

10 00:02:26.860 00:02:30.470 Demilade Agboola: So, for the sheets, you can always specify the names of the sheets.

11 00:02:30.710 00:02:33.569 Demilade Agboola: But… the tricky part would be…

12 00:02:34.300 00:02:40.360 Demilade Agboola: Are the column names the same? Like, are they… are they… even though the formats are different, are they similar column names?

13 00:02:42.560 00:02:45.170 Mustafa Raja: Sorry, could you… Could you say that again?

14 00:02:45.430 00:02:49.550 Demilade Agboola: I said, even if the formats are different, are the column names the same, or are they similar?

15 00:02:50.450 00:02:51.250 Mustafa Raja: No…

16 00:02:51.580 00:03:09.120 Mustafa Raja: It’s a lot different. Not each one, but, let’s say, after every 3-4 CSVs, or after every 3-4 Excel files, the format just changes drastically.

17 00:03:10.720 00:03:15.870 Demilade Agboola: Okay, but what I’m saying is, are the column names the same? So, like, is "date" still "date", is,

18 00:03:16.760 00:03:20.140 Demilade Agboola: you know, "revenue" still "revenue", or whatever they’re using as…

19 00:03:20.960 00:03:23.289 Demilade Agboola: The… the name of the column.

20 00:03:23.400 00:03:25.169 Demilade Agboola: Even if the format is changing.
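
The check Demilade is asking about (do the sheets share column names even when the layout differs?) can be sketched roughly as below. This is a minimal illustration, not something shown in the meeting: `load_sheets` and `shared_columns` are hypothetical helper names, and the sketch assumes each sheet's header sits in the first row, which Mustafa suggests is not always true for these files.

```python
from __future__ import annotations

import pandas as pd


def load_sheets(path: str, sheet_names: list[str] | None = None) -> dict[str, pd.DataFrame]:
    """Read the named sheets of a workbook, or every sheet when sheet_names is None.

    pandas returns a {sheet name: DataFrame} dict in both cases.
    """
    return pd.read_excel(path, sheet_name=sheet_names)


def shared_columns(frames: dict[str, pd.DataFrame]) -> set[str]:
    """Column names present in every frame, ignoring case and surrounding spaces."""
    normalized = [
        {str(col).strip().lower() for col in frame.columns}
        for frame in frames.values()
    ]
    return set.intersection(*normalized) if normalized else set()
```

An empty result from `shared_columns` would suggest the column names themselves differ, which is the harder case raised later in the call.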

21 00:03:25.990 00:03:30.120 Mustafa Raja: Let me see… Hmm…

22 00:03:34.870 00:03:35.730 Mustafa Raja: B.

23 00:03:35.960 00:03:37.520 Mustafa Raja: Printing…

24 00:03:47.980 00:03:48.750 Mustafa Raja: Bro.

25 00:03:51.870 00:03:57.950 Demilade Agboola: Also, are there any particular tables, like, because there are a lot of… there are a lot of

26 00:03:58.500 00:04:01.540 Demilade Agboola: Excel files. Are there any we want in particular?

27 00:04:02.570 00:04:06.530 Mustafa Raja: I think we want all of this in the… What’s it called?

28 00:04:07.470 00:04:08.429 Demilade Agboola: In the warehouse.

29 00:04:08.430 00:04:10.020 Mustafa Raja: Yeah, in the warehouse.

30 00:04:10.930 00:04:23.329 Mustafa Raja: So, one thing is, we would first have to also detect how many rows we need to, you know, skip to actually get to the table, right? Is that a correct assumption?

31 00:04:24.510 00:04:29.199 Demilade Agboola: Mmm… I believe so…

32 00:04:29.370 00:04:36.169 Mustafa Raja: And then you would see that they’re skipping one row, one row, one row, and that’s not the case for every single file.
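
One way to handle a skip count that varies per file is to read the sheet with no header and look for the first row that is filled in enough to be a header. The sketch below is a heuristic under that assumption only; `find_header_row`, `promote_header`, and the `min_filled` threshold are hypothetical names and values, and title rows with many filled cells would fool it.

```python
import pandas as pd


def find_header_row(raw: pd.DataFrame, min_filled: int = 3) -> int:
    """Return the position of the first row with at least min_filled non-empty cells."""
    filled_per_row = raw.notna().sum(axis=1)
    for position, count in enumerate(filled_per_row):
        if count >= min_filled:
            return position
    raise ValueError("no row looks like a header")


def promote_header(raw: pd.DataFrame, header_row: int) -> pd.DataFrame:
    """Use the row at header_row as column names and keep only the rows below it."""
    table = raw.iloc[header_row + 1:].reset_index(drop=True)
    table.columns = [str(value).strip() for value in raw.iloc[header_row]]
    return table
```

In practice `raw` would come from something like `pd.read_excel(path, sheet_name=name, header=None)`, so that pandas does not guess a header row itself.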

33 00:04:36.730 00:04:39.320 Demilade Agboola: I think, actually, no, because I believe…

34 00:04:40.250 00:04:44.240 Demilade Agboola: So let me… because I haven’t used Pandas in a long time, so I…

35 00:04:44.240 00:04:45.089 Mustafa Raja: I was just curious.

36 00:04:45.090 00:04:49.319 Demilade Agboola: I was like… But I do know that, yes, you can specify

37 00:04:49.610 00:04:56.189 Demilade Agboola: But I was also asking, like, ChatGPT, I was like, can you, like, programmatically do it?

38 00:04:56.530 00:05:04.419 Demilade Agboola: It’s just like, yeah, you can… the only thing you’ll need to do is largely specify what the column names will be, so I just shared that with you.

39 00:05:04.920 00:05:05.440 Mustafa Raja: Hmm…

40 00:05:05.440 00:05:12.200 Demilade Agboola: So it will define… so once it knows what the target schema is, it’s going to kind of go through and detect.
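
The target-schema idea might look something like the following: declare the columns wanted in the warehouse along with the spellings seen in the source files, then scan each raw sheet for the row that contains all of them. This is a sketch under assumptions, not the code that was shared in the meeting. The `ALIASES` mapping and the function names are hypothetical, and the alias lists would have to be built by inspecting the real files.

```python
import pandas as pd

# Hypothetical target schema: canonical name -> spellings seen in source files.
ALIASES = {
    "date": {"date", "day", "transaction date"},
    "revenue": {"revenue", "sales", "total revenue"},
}


def match_header(row_values, aliases=ALIASES) -> dict:
    """Map each cell that matches a known alias to its canonical column name."""
    lookup = {alias: canonical for canonical, names in aliases.items() for alias in names}
    matches = {}
    for value in row_values:
        key = str(value).strip().lower()
        if key in lookup:
            matches[value] = lookup[key]
    return matches


def locate_table(raw: pd.DataFrame, aliases=ALIASES) -> pd.DataFrame:
    """Find the first row containing every target column; return the table below it."""
    for position in range(len(raw)):
        matches = match_header(raw.iloc[position], aliases)
        if set(matches.values()) == set(aliases):
            table = raw.iloc[position + 1:].reset_index(drop=True)
            table.columns = list(raw.iloc[position])
            # Rename to canonical names and drop columns outside the schema.
            return table.rename(columns=matches)[list(aliases)]
    raise ValueError("no row matches the target schema")
```

Sheets that raise `ValueError` here are the ones whose column names do not fit the schema, which is the problem case discussed next.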

41 00:05:13.080 00:05:19.970 Mustafa Raja: Okay… Yeah, you see how different each of these is, you know? And I…

42 00:05:19.970 00:05:22.650 Demilade Agboola: So if the names are different, yeah, that would be a problem.

43 00:05:23.210 00:05:26.010 Mustafa Raja: Yeah, I think they are, right?

44 00:05:27.040 00:05:30.129 Mustafa Raja: I think these two are somewhat similar.

45 00:05:30.960 00:05:33.900 Mustafa Raja: V3, what this isn’t.

46 00:05:34.360 00:05:44.090 Mustafa Raja: And… And we are only looking into one folder, and we have about… A lot of folders.

47 00:05:45.300 00:05:48.529 Mustafa Raja: Yeah, this is a little different also.

48 00:05:57.600 00:05:59.440 Mustafa Raja: And this is also different.

49 00:06:00.640 00:06:05.089 Mustafa Raja: I guess these might be just similar… is this similar?

50 00:06:06.050 00:06:07.360 Mustafa Raja: somewhat similar.

51 00:06:10.440 00:06:18.289 Mustafa Raja: Okay, so I will try… I will try doing… I guess I will try doing it with pandas, and maybe get back to you on this?

52 00:06:18.770 00:06:19.770 Demilade Agboola: Okay, sure.

53 00:06:20.560 00:06:32.740 Demilade Agboola: I’m not sure. If it doesn’t fit perfectly, we can then try and think about it again. Maybe certain names, like, for instance, the inventory tables might have similar… we can create the formats for inventory tables, create the format.

54 00:06:32.740 00:06:33.120 Mustafa Raja: So this…

55 00:06:33.120 00:06:33.490 Demilade Agboola: ceiling.

56 00:06:33.490 00:06:36.720 Mustafa Raja: This would then require some exploring, right?

57 00:06:37.420 00:06:41.690 Mustafa Raja: Because I really haven’t explored it too much, I just looked into…

58 00:06:41.940 00:06:47.869 Mustafa Raja: A lot of these, and they had a lot of different, you know, formats, and I was like.

59 00:06:48.160 00:06:55.300 Mustafa Raja: Because, these two folders, they had the same structure.

60 00:06:55.490 00:07:05.830 Mustafa Raja: They had a similar problem, but all of the files had the same structure, so I was able to do that. But this is, like, every third or fourth CSV has a different format.

61 00:07:06.980 00:07:11.040 Demilade Agboola: Yeah, that makes it tricky. So if we can just… even if it’s just to find a pattern.

62 00:07:11.170 00:07:16.689 Demilade Agboola: That might help us at least reduce the workload of what is left that we might have to now do manually.
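
Finding a pattern could start with grouping the files by their column names, so that each distinct format is handled once and only the leftovers need manual work. The sketch below assumes the header row of each file has already been located. The function names and the file names in the usage note are hypothetical.

```python
from collections import defaultdict

import pandas as pd


def column_signature(frame: pd.DataFrame) -> tuple:
    """Order-insensitive fingerprint of a frame's column names."""
    return tuple(sorted(str(col).strip().lower() for col in frame.columns))


def group_by_format(frames: dict) -> dict:
    """Group file names by column signature, largest group first."""
    groups = defaultdict(list)
    for name, frame in frames.items():
        groups[column_signature(frame)].append(name)
    return dict(sorted(groups.items(), key=lambda item: len(item[1]), reverse=True))
```

Given frames keyed by names such as `"jan.xlsx"` and `"feb.xlsx"`, files with the same columns in a different order or casing land in the same group, and any single-file groups at the end are candidates for manual handling.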

63 00:07:16.940 00:07:19.670 Mustafa Raja: Okay, okay. I’ll get back to you on this then. Thank you.

64 00:07:19.670 00:07:21.669 Demilade Agboola: Alright, sounds good. Yeah, thank you.

65 00:07:21.920 00:07:22.620 Mustafa Raja: Right.

66 00:07:22.870 00:07:23.460 Demilade Agboola: Bye.