Meeting Title: Annie_Uttam Date: 2025-03-18 Meeting participants: Annie Yu, Uttam Kumaran


WEBVTT

1 00:03:42.640 00:03:44.419 Uttam Kumaran: Hey? How’s it going

2 00:03:44.580 00:03:46.446 Annie Yu: Hello, Uttam! How’s it going?

3 00:03:46.820 00:03:48.100 Uttam Kumaran: Good good morning.

4 00:03:50.380 00:03:51.210 Uttam Kumaran: Going on

5 00:03:52.063 00:03:54.080 Annie Yu: Is this still morning? Okay.

6 00:03:58.050 00:03:59.200 Uttam Kumaran: How’s it been

7 00:04:00.460 00:04:06.253 Annie Yu: Not too bad. I am realizing I’m actually taking tasks now. So I’m trying to

8 00:04:06.640 00:04:09.685 Uttam Kumaran: Oh, don’t! Don’t be! Don’t be stressed.

9 00:04:11.276 00:04:17.520 Annie Yu: And then one thing: I was poking around Javi’s dashboard, and

10 00:04:18.410 00:04:23.179 Annie Yu: how is it that they are making, I guess, pretty good margin

11 00:04:23.520 00:04:30.349 Annie Yu: from the coffee products, but not the accessories? I’m looking at, like, all the minuses, and it’s

12 00:04:31.600 00:04:33.670 Uttam Kumaran: Accessories they give away for free

13 00:04:33.940 00:04:35.320 Annie Yu: Oh, okay.

14 00:04:35.460 00:04:40.890 Uttam Kumaran: But that’s one of the problems, though, that the executives flagged, is that

15 00:04:41.100 00:04:46.239 Uttam Kumaran: they always give it out for free. So they want to bundle it, meaning they want to show it all as part of one

16 00:04:46.430 00:04:48.559 Annie Yu: Yeah, okay, that makes sense

17 00:04:48.560 00:04:51.780 Uttam Kumaran: That’s one of the open tickets.

18 00:04:52.280 00:04:54.610 Annie Yu: Yeah, got it. Got it.

19 00:04:55.080 00:04:57.710 Annie Yu: And right now we are.

20 00:04:59.040 00:05:02.300 Annie Yu: What’s the subject for this session?

21 00:05:03.040 00:05:04.249 Uttam Kumaran: For this session

22 00:05:04.250 00:05:05.080 Annie Yu: Hmm.

23 00:05:06.290 00:05:11.140 Uttam Kumaran: Pius has an address matching process

24 00:05:11.890 00:05:19.220 Uttam Kumaran: So basically, we wrote a script that matches customers between Shopify and Amazon,

25 00:05:21.433 00:05:27.280 Uttam Kumaran: which doesn’t come out of the box. And so every week

26 00:05:28.360 00:05:34.259 Uttam Kumaran: or so, we run that for the client on the existing addresses.

27 00:05:34.410 00:05:39.209 Uttam Kumaran: So in this meeting I just wanted to talk to see. I haven’t looked at it yet, but I want to see if

28 00:05:39.360 00:05:42.950 Uttam Kumaran: we can start to own

29 00:05:43.100 00:05:46.110 Uttam Kumaran: this process between me and you, basically.

30 00:05:47.330 00:05:48.410 Uttam Kumaran: So next week,

31 00:05:49.060 00:05:54.250 Uttam Kumaran: when the client asks, we can run this. I’m gonna look at it, too, for the first time. So

32 00:05:55.760 00:06:01.539 Annie Yu: So it’s to match the address from Amazon and Shopify?

33 00:06:01.540 00:06:07.039 Uttam Kumaran: It’s to match customers between Amazon and Shopify, based on the addresses.

34 00:06:07.670 00:06:10.240 Uttam Kumaran: Because Amazon doesn’t give us a lot of

35 00:06:10.660 00:06:21.419 Uttam Kumaran: Amazon doesn’t give us enough data. They only give us, like, the address. But for Shopify, we have some of the address data, so we can match the two to figure out who’s buying on both platforms.

36 00:06:22.348 00:06:27.550 Uttam Kumaran: So let me just share my screen. I’m gonna go through it, too. And I just wanted to include you because

37 00:06:27.690 00:06:29.920 Uttam Kumaran: I’m gonna go through it for the first time as well.

38 00:06:30.150 00:06:33.977 Uttam Kumaran: So it looks like,

39 00:06:39.390 00:06:40.590 Uttam Kumaran: Oh, okay.

40 00:06:55.360 00:06:58.840 Uttam Kumaran: okay. So address Matching is here. So let’s see this.

41 00:07:17.090 00:07:18.920 Uttam Kumaran: I’ll go to the pull request.

42 00:07:24.180 00:07:27.569 Uttam Kumaran: and there’s an address matching function here

43 00:07:30.400 00:07:32.960 Annie Yu: I do have one question.

44 00:07:33.450 00:07:35.498 Annie Yu: I think I can’t get in

45 00:07:37.010 00:07:38.009 Uttam Kumaran: There you go!

46 00:07:40.250 00:07:42.790 Annie Yu: to the GitHub.

47 00:07:46.052 00:07:47.229 Uttam Kumaran: Let me check

48 00:08:02.264 00:08:04.320 Uttam Kumaran: You’re invited here.

49 00:08:04.810 00:08:05.560 Annie Yu: Hmm!

50 00:08:05.970 00:08:07.060 Annie Yu: Where’s

51 00:08:07.950 00:08:11.739 Uttam Kumaran: Go to gear on the top right, go to your notifications.

52 00:08:11.910 00:08:16.549 Uttam Kumaran: You should see an invite, or if you go to your organizations here

53 00:08:19.900 00:08:28.170 Annie Yu: Okay, yeah, I see it now. Organization, yeah, think I’m in.

54 00:08:29.810 00:08:30.520 Annie Yu: Thanks

55 00:08:39.760 00:08:44.900 Uttam Kumaran: So all of the address matching code is here.

56 00:08:47.560 00:08:51.919 Uttam Kumaran: So maybe let’s watch the Loom together, because I haven’t seen anything.

57 00:09:05.480 00:09:25.080 Uttam Kumaran: Hey, guys, how’s it going? I just wanted to provide them works for closing. And we finally, basically, here’s the update is that we now have it set up. So it’s like very large, so you can run it again and again with new Amazon data as well as like, break up the code into like clear, traceable pieces

58 00:09:25.080 00:09:29.330 Annie Yu: Awesome. Is there any way you can hide that sidebar?

59 00:09:30.030 00:09:30.980 Uttam Kumaran: This one.

60 00:09:30.980 00:09:34.939 Annie Yu: Yeah, thanks. That that helps

61 00:09:34.940 00:10:03.630 Uttam Kumaran: Notebooks, and again with new Amazon data as well as like, break up the code into like clear, traceable pieces rather than a bunch of rogue Jupyter notebooks and scripts. So I’m just going to walk through everything that is here, and sort of like how you’re going to run this task repeatedly for shape. And Justin, on getting new Amazon data, new Walmart data from Key Place match. So essentially the file they provide. This is an example of what the file looks like. You sort of have

62 00:10:03.980 00:10:10.959 Uttam Kumaran: these, like, key columns: domain, address, city, state, these types of things.

63 00:10:11.466 00:10:15.739 Uttam Kumaran: Important to note that when the file’s provided,

64 00:10:15.910 00:10:33.059 Uttam Kumaran: there’s, like, all these extra columns for certain types of data. So what you can do is just make sure that this is the header row and that it’s standardized correctly. Once you have that header row, you can actually delete everything outside this, so everything outside columns A to P.

65 00:10:33.330 00:10:47.420 Uttam Kumaran: As long as they don’t change the headers too much, this should work correctly for the most part. This is the file that they’ll provide. What you have to do first is just clean up the file and make sure you delete all of these extra random cells. So you just, like, delete everything

66 00:10:48.000 00:10:51.610 Uttam Kumaran: that isn’t the core, like A to P.

67 00:10:51.620 00:11:04.009 Uttam Kumaran: That’s an important note. So that’s what we have. When it comes to the Shopify data that we’re matching it to, we’re doing that here. The Shopify data is actually all coming from a Snowflake query, which I have linked. You just need to pull that again.
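
The file cleanup described in the Loom (keep the header row and only the core columns A to P, drop the stray cells around them) could be sketched roughly as below. This is a minimal sketch, not the code in the repo: it assumes the client's sheet has first been exported to CSV, and the names `CORE_COLUMN_COUNT`, `keep_core_columns`, and `clean_csv_text` are hypothetical.

```python
import csv
import io

# Assumption: "columns A to P" means the first 16 columns of the sheet.
CORE_COLUMN_COUNT = 16


def keep_core_columns(rows, n_cols=CORE_COLUMN_COUNT):
    """Trim every row to its first n_cols cells and drop rows that end up empty."""
    cleaned = []
    for row in rows:
        core = [cell.strip() for cell in row[:n_cols]]
        if any(core):  # skip rows whose core cells are all blank
            cleaned.append(core)
    return cleaned


def clean_csv_text(text, n_cols=CORE_COLUMN_COUNT):
    """Parse CSV text, keep only the core columns, and return cleaned CSV text."""
    rows = list(csv.reader(io.StringIO(text)))
    out = io.StringIO()
    csv.writer(out, lineterminator="\n").writerows(keep_core_columns(rows, n_cols))
    return out.getvalue()
```

Doing the trim in code rather than by hand in the spreadsheet would make the weekly run repeatable, but it only works as long as the client keeps the header layout stable, as the Loom notes.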

68 00:11:04.010 00:11:33.439 Uttam Kumaran: And I have ways that we don’t have to keep running the cleaning on the Shopify data repeatedly. We can actually institutionalize that as well. So let me just walk through what’s in the code. So we have the data loaders file, which is the first thing you need to know. What that does is it basically loads in the data and cleans everything up. The way it cleans everything up is it tries to standardize and tokenize everything in the address. It also tries to standardize the columns, as well as apply the mappings. So we just have to make sure the country codes align, all the stuff you don’t really need to know. But essentially, what this thing does is you’ll have two functions here, like load Shopify data, load Amazon data,

69 00:11:33.440 00:11:41.080 Uttam Kumaran: and clean it all up. And the way it cleans it up is it uses this, like, set of utils where we clean up the name, we normalize, like, unit and house numbers, we

70 00:11:41.090 00:12:09.479 Uttam Kumaran: apply a bunch of cleaning and tokenizing to the various address components. We also, like, standardize all the names you have, like “street” turns into “St”, whatever. We just make sure we map everything over so there’s, like, a standardized set of names. These are useful because it’s separate, so now you can just edit them. If you want to change different mappings, different types of things, you can continue to edit and iterate on this code in the future. Yeah. So that’s what this does: it basically processes all the data, standardizes the columns, and tokenizes everything. There’s a bunch of progress bars to help you run.
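
The standardize-and-tokenize step described above might look something like the following. This is only a sketch under assumptions: the abbreviation table and the function names `tokenize_address` and `normalize_address` are hypothetical, and the actual mappings in the repo's address utilities may differ.

```python
import re

# Hypothetical mapping table; the real one lives in the repo and may differ.
STREET_ABBREVIATIONS = {
    "street": "st",
    "avenue": "ave",
    "road": "rd",
    "drive": "dr",
    "apartment": "apt",
    "suite": "ste",
}


def tokenize_address(raw):
    """Lowercase, strip punctuation, split on whitespace, and map long forms to abbreviations."""
    lowered = raw.lower()
    # Replace anything that is not a letter, digit, or whitespace with a space.
    stripped = re.sub(r"[^a-z0-9\s]", " ", lowered)
    return [STREET_ABBREVIATIONS.get(token, token) for token in stripped.split()]


def normalize_address(raw):
    """Return a canonical single-string form of the address for comparison."""
    return " ".join(tokenize_address(raw))
```

With this kind of normalization, `"123 Main Street, Apt. 4B"` and `"123 MAIN ST APT 4B"` reduce to the same string, which is what lets addresses from the two platforms be compared at all. Keeping the mapping in its own table matches the point made in the Loom that the mappings can be edited separately.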

71 00:12:09.500 00:12:16.779 Uttam Kumaran: Key callouts in some of these functions: for the Shopify data, you can see here’s the Snowflake query that actually runs it. You should have share permissions.

72 00:12:16.810 00:12:45.729 Uttam Kumaran: Important to know: this test flag is not really something to worry about. It’s just, like, if you want to test a smaller set of data for any changes that you run, because these data sets are huge. Use cache is important: at the end of this, it runs, like, a local dump to CSV. You know, not the best practice, but it basically dumps a CSV locally. And if you want to run it again, if you’re pulling the Shopify data, given there’s, like, over a million customer emails in there and a million addresses, you can basically take the latest file. As long as the Snowflake query isn’t too out of date, you can just reuse it. So I just set this flag to True if I’m, like,

73 00:12:45.840 00:12:47.620 Uttam Kumaran: running something fast.
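
The "dump a CSV locally and reuse it" behavior described above could be sketched as below. The function name `load_customers`, the `use_cache` parameter, and the `fetch_rows` callable (standing in for the Snowflake query) are all hypothetical; this illustrates the caching idea, not the repo's actual loader.

```python
import csv
import os


def load_customers(fetch_rows, cache_path, use_cache=False):
    """Return customer rows, reusing a local CSV dump when use_cache is True.

    fetch_rows is a stand-in for the expensive warehouse query; it is assumed
    to return a list of dicts with string values.
    """
    if use_cache and os.path.exists(cache_path):
        with open(cache_path, newline="", encoding="utf-8") as fh:
            return list(csv.DictReader(fh))

    rows = fetch_rows()
    if rows:
        # Dump the fresh pull locally so the next run can skip the query.
        with open(cache_path, "w", newline="", encoding="utf-8") as fh:
            writer = csv.DictWriter(fh, fieldnames=list(rows[0].keys()))
            writer.writeheader()
            writer.writerows(rows)
    return rows
```

The trade-off is the one mentioned in the Loom: the cached file is only as fresh as the last pull, so it is best suited to quick reruns rather than the final weekly deliverable.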

74 00:12:47.970 00:13:13.219 Uttam Kumaran: Then there’s the address utils. There’s a bunch of tokenizing. There’s some documentation on what goes on there. The next key thing is the matching algorithm itself. Here, we actually have, like, key criteria. So these are the weightings that it uses to match, and criteria like how close the match needs to be. This is a score from 0 to 100. We have the confidence levels for how the scoring is done, and it’s based on these weights. There are functions here that are important. One is identifying matches.
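
A weighted 0-to-100 match score with confidence levels, as described above, could be sketched like this. The weights, thresholds, field names, and the use of `difflib.SequenceMatcher` for per-field similarity are all assumptions made for illustration; the repo's actual criteria and similarity measure are not shown in this session.

```python
from difflib import SequenceMatcher

# Hypothetical weights per address field and hypothetical confidence thresholds.
WEIGHTS = {"street": 50, "city": 20, "state": 10, "zip": 20}
CONFIDENCE_LEVELS = [(90, "high"), (75, "medium"), (60, "low")]


def field_similarity(a, b):
    """Similarity in [0, 1]; a missing value on either side counts as 0."""
    if not a or not b:
        return 0.0
    return SequenceMatcher(None, a, b).ratio()


def match_score(amazon, shopify, weights=WEIGHTS):
    """Weighted average of per-field similarities, scaled to 0-100."""
    total = sum(weights.values())
    weighted = sum(
        weight * field_similarity(amazon.get(field, ""), shopify.get(field, ""))
        for field, weight in weights.items()
    )
    return 100.0 * weighted / total


def confidence_level(score):
    """Map a 0-100 score to a label using the thresholds above."""
    for threshold, label in CONFIDENCE_LEVELS:
        if score >= threshold:
            return label
    return "no match"
```

Keeping weights and thresholds as module-level tables mirrors the idea that the criteria can be tuned without touching the scoring logic.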

75 00:13:13.370 00:13:30.750 Uttam Kumaran: It will help you match and figure out the matches in different Amazon and Shopify data sets. This one actually calculates the match score using the weights that we’ve set up, and the stitch identified data one just puts everything together, including the metrics, so you can get everything out. Finally, this is just, like, a print match report. So if you just want to print and understand, between Shopify, Amazon, and, like, our matched data set, what happened

76 00:13:30.750 00:13:47.040 Uttam Kumaran: and give a good summary. If you want to provide that adjustment, we can. So the final thing I’ll just say here is, identifier runner dot py is just a file that has all the different scripts that just run it all together, so you can make it easy. And I’ve also written, like, a main function where you can theoretically feed in an Amazon path and run it from the command line.
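
A command-line entry point of the kind mentioned above (feed in an Amazon file path and run the whole pipeline) might be shaped roughly like this. The argument names, defaults, and the placeholder body are hypothetical; the real runner's interface should be checked in the repo before relying on any of this.

```python
import argparse


def build_parser():
    """Build the argument parser for a hypothetical matching runner."""
    parser = argparse.ArgumentParser(
        description="Match Amazon customers to Shopify customers by address."
    )
    parser.add_argument("amazon_path", help="Path to the cleaned Amazon customer file")
    parser.add_argument(
        "--output", default="matches.csv", help="Where to write the matched CSV"
    )
    parser.add_argument(
        "--test", action="store_true", help="Run on a small sample of the data"
    )
    return parser


def main(argv=None):
    """Parse arguments and report what would run; the real pipeline call is a placeholder."""
    args = build_parser().parse_args(argv)
    # Placeholder: the actual load, match, and export steps would be called here.
    print(f"Would match {args.amazon_path} -> {args.output} (test={args.test})")
    return args
```

In a script, `main()` would normally be called under the usual `if __name__ == "__main__":` guard, so that a terminal invocation passes the Amazon path as the first positional argument.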

77 00:13:48.890 00:13:54.170 Uttam Kumaran: Okay, so looks like we got

78 00:13:54.870 00:13:58.789 Uttam Kumaran: the 2 files from them today.

79 00:13:59.600 00:14:01.130 Uttam Kumaran: Which is this

80 00:14:04.800 00:14:07.000 Uttam Kumaran: Amazon customer list.

81 00:14:11.330 00:14:12.590 Uttam Kumaran: These 2

82 00:14:13.320 00:14:17.040 Annie Yu: I also can’t access those files

83 00:14:17.040 00:14:21.879 Uttam Kumaran: Yeah, I’ll send it. I’ll start a thread here.

84 00:14:35.900 00:14:38.010 Uttam Kumaran: Okay, so that’s there.

85 00:14:40.180 00:14:41.762 Uttam Kumaran: Let’s talk about

86 00:14:43.760 00:14:46.850 Uttam Kumaran: Let’s talk about the Snowflake piece.

87 00:14:47.350 00:14:53.629 Uttam Kumaran: So he mentioned that, and I’ll just log into their Snowflake so I can run that query that he sends.

88 00:14:54.050 00:14:54.490 Annie Yu: Hmm.

89 00:15:07.160 00:15:12.849 Uttam Kumaran: So he said it’s somewhere in the code, right?

90 00:15:18.040 00:15:25.710 Uttam Kumaran: So it looks like line 55 of data loaders is this query.

91 00:15:26.000 00:15:29.949 Uttam Kumaran: I’m just gonna go to it and see what it says. Okay.

92 00:15:30.230 00:15:33.639 Uttam Kumaran: Oh, it looks like he’s just getting all the customers.

93 00:15:34.230 00:15:38.480 Annie Yu: Yeah, that’s from Shopify, right?

94 00:15:38.480 00:15:42.079 Uttam Kumaran: Yeah, well, this is actually all customers.

95 00:15:43.040 00:15:44.720 Uttam Kumaran: This will actually be

96 00:15:46.350 00:15:48.019 Annie Yu: The dim customers

97 00:15:48.020 00:15:49.270 Uttam Kumaran: This will be everything.

98 00:15:49.930 00:15:52.550 Uttam Kumaran: So I guess one of my questions is,

99 00:15:54.280 00:15:56.659 Uttam Kumaran: can we just pull this? Oh.

100 00:15:57.160 00:16:01.989 Uttam Kumaran: well, the so, looking at the where clauses we’re not gonna have.

101 00:16:05.560 00:16:08.242 Uttam Kumaran: I guess it’s a good question, like, let me

102 00:16:10.740 00:16:13.380 Uttam Kumaran: I’m good. Let me just copy this to a new page.

103 00:16:29.440 00:16:31.660 Uttam Kumaran: Okay, it looks like these filters

104 00:16:33.270 00:16:41.329 Uttam Kumaran: just ensure that it’s only Shopify and TikTok. So probably it’s because we don’t have emails for the people from Amazon. That’s my guess.

105 00:16:41.980 00:16:42.570 Annie Yu: Yep.

106 00:16:43.615 00:16:48.370 Uttam Kumaran: So okay, so he’s basically taking this whole query. And then

107 00:16:53.560 00:16:57.850 Uttam Kumaran: I mean, I guess we can just try to pass it. Do you want to try it on your side?

108 00:17:00.540 00:17:08.780 Annie Yu: Okay, let me. I was still in the process of getting into the GitHub, because it

109 00:17:09.690 00:17:12.640 Annie Yu: asked me to set up the 2FA.

110 00:17:12.839 00:17:14.369 Uttam Kumaran: Okay, yeah. Go check that out

111 00:17:22.640 00:17:23.430 Annie Yu: Hmm!

112 00:18:29.230 00:18:34.020 Annie Yu: Can you remind me again how to get to this page? I’m not in the organization

113 00:18:34.350 00:18:36.850 Uttam Kumaran: Are you in Brainforge AI here?

114 00:18:36.850 00:18:37.480 Annie Yu: Yes.

115 00:18:38.120 00:18:40.930 Uttam Kumaran: So if you just search here for Javi coffee

116 00:18:42.320 00:18:48.119 Annie Yu: Looks like. I only have one that everybody eats

117 00:18:48.670 00:18:51.755 Uttam Kumaran: Oh, okay, hold on one second.

118 00:18:52.270 00:18:53.717 Annie Yu: View, invitation, right.

119 00:18:55.160 00:18:58.170 Uttam Kumaran: What is your username again?

120 00:18:58.170 00:19:01.690 Annie Yu: Wait! Wait! Actually, never mind, I’m in. I’m back.

121 00:19:01.690 00:19:02.850 Uttam Kumaran: Okay, cool.

122 00:19:03.710 00:19:05.840 Uttam Kumaran: It’s okay.

123 00:19:06.170 00:19:08.050 Annie Yu: It’s just Javi coffee, right?

124 00:19:08.050 00:19:11.080 Uttam Kumaran: Yes, yeah, you’re here. Okay, cool.

125 00:19:12.830 00:19:17.020 Uttam Kumaran: Yes. And so if you go to the Javi coffee repo,

126 00:19:17.770 00:19:20.400 Uttam Kumaran: it’s in here, under data science,

127 00:19:20.400 00:19:21.020 Annie Yu: No.

128 00:19:21.230 00:19:22.729 Uttam Kumaran: where you’re gonna see everything.

129 00:19:22.730 00:19:23.250 Annie Yu: Yep.

130 00:19:47.440 00:19:48.180 Annie Yu: Manage.

131 00:20:24.490 00:20:29.130 Annie Yu: Are you in the address match file?

132 00:20:32.720 00:20:34.619 Uttam Kumaran: I’m in the data science folder.

133 00:20:36.970 00:20:37.750 Annie Yu: Yeah.

134 00:20:38.870 00:20:44.420 Uttam Kumaran: And so yeah, Pius went through the

135 00:20:45.500 00:20:51.800 Uttam Kumaran: I mean, he went through data loaders. And then this is his match algorithm for how they do the matching.

136 00:20:51.800 00:20:52.310 Annie Yu: Yeah.

137 00:20:52.680 00:20:54.609 Uttam Kumaran: And then this is the runner file.

138 00:20:54.800 00:20:57.189 Uttam Kumaran: So ideally, I think you could just do.

139 00:20:57.950 00:21:02.420 Uttam Kumaran: I think you could just do generate matches.

140 00:21:07.590 00:21:08.400 Uttam Kumaran: Yeah.

141 00:21:26.890 00:21:27.800 Annie Yu: Maybe.

142 00:23:38.880 00:23:50.859 Annie Yu: Yes, I’m looking at data loaders. Is this complete or not? I think it’s not, or

143 00:23:50.860 00:23:51.780 Uttam Kumaran: What do you mean?

144 00:23:52.330 00:23:56.610 Annie Yu: So this one is complete already? Is that

145 00:23:56.610 00:23:59.330 Uttam Kumaran: What do you mean by complete, though?

146 00:23:59.330 00:24:01.299 Annie Yu: So we don’t have to touch anything here?

147 00:24:01.300 00:24:04.350 Uttam Kumaran: Oh, yeah, no, I don’t actually think we need to touch anything. I

148 00:24:04.350 00:24:05.130 Annie Yu: Oh, okay.

149 00:24:05.130 00:24:06.850 Uttam Kumaran: I just think we need to run.

150 00:24:07.490 00:24:10.229 Uttam Kumaran: I think you need to run generate matches.

151 00:24:12.840 00:24:15.480 Uttam Kumaran: Yeah, I think you just need to run the matches.

152 00:24:16.840 00:24:20.470 Uttam Kumaran: And then, whatever the final data frame is.

153 00:24:20.630 00:24:24.059 Uttam Kumaran: that’s that’s what we need to send back to them.

154 00:24:28.100 00:24:29.240 Annie Yu: So do it.

155 00:24:29.370 00:24:31.170 Annie Yu: Do we do it here.

156 00:24:31.910 00:24:33.809 Uttam Kumaran: You would run this on your laptop.

157 00:24:35.030 00:24:35.800 Annie Yu: Okay.

158 00:24:37.680 00:24:40.650 Uttam Kumaran: Or you can run this in. Yeah, I mean, you can run this in.

159 00:24:42.170 00:24:43.820 Uttam Kumaran: Yeah, locally, basically.

160 00:24:45.370 00:24:49.780 Annie Yu: Okay, so what’s the next step now? Can you

161 00:24:49.780 00:24:57.135 Uttam Kumaran: Yeah. So the task right now is, I sent the two Amazon files. The first thing to do is

162 00:24:57.620 00:24:59.549 Uttam Kumaran: You’ll need to run this

163 00:24:59.550 00:25:00.030 Annie Yu: Hmm.

164 00:25:00.030 00:25:02.380 Uttam Kumaran: Export the results locally.

165 00:25:03.730 00:25:04.360 Annie Yu: Yeah.

166 00:25:04.620 00:25:10.512 Uttam Kumaran: And then once you run it once, and I’ll send this to you in the Slack as well. Maybe I’ll

167 00:25:14.820 00:25:27.290 Uttam Kumaran: So the steps are: one, save this list locally to run the runners.py function.

168 00:25:29.610 00:25:36.790 Uttam Kumaran: So the second thing is actually cleaning the Excel sheets if needed,

169 00:25:38.350 00:25:44.370 Uttam Kumaran: as Pius explained in the Loom, and then run the runners.py function,

170 00:25:44.920 00:25:48.610 Uttam Kumaran: and then return the

171 00:25:50.540 00:25:55.260 Uttam Kumaran: two matched CSV outputs.

172 00:25:59.640 00:26:00.280 Annie Yu: Yeah.

173 00:26:01.470 00:26:02.630 Uttam Kumaran: You want to give it a shot

174 00:26:04.880 00:26:07.799 Uttam Kumaran: I have to run to a meeting.

175 00:26:07.800 00:26:08.550 Annie Yu: Okay.

176 00:26:09.340 00:26:09.959 Uttam Kumaran: Give it a shot

177 00:26:10.990 00:26:20.310 Annie Yu: Okay. And one more question. I think he might have mentioned this. So he already cleaned

178 00:26:20.410 00:26:25.540 Annie Yu: the Amazon data, but the format might still not match?

179 00:26:25.840 00:26:29.079 Uttam Kumaran: So I think the only thing he said is, they may send us

180 00:26:29.380 00:26:36.700 Uttam Kumaran: the CSVs, but just make sure that each CSV has only the columns that the function requires.

181 00:26:36.700 00:26:37.130 Annie Yu: Okay.

182 00:26:37.130 00:26:39.339 Uttam Kumaran: If there’s any extras, he said, just delete them

183 00:26:40.110 00:26:40.820 Annie Yu: Okay.

184 00:26:41.090 00:26:43.090 Annie Yu: Alright, I’m gonna try that

185 00:26:43.090 00:26:45.130 Uttam Kumaran: Try it out, try it out! This is fun!

186 00:26:45.350 00:26:46.080 Annie Yu: Sure.

187 00:26:46.080 00:26:46.530 Uttam Kumaran: Happens

188 00:26:46.530 00:26:46.900 Annie Yu: That

189 00:26:46.900 00:26:51.310 Uttam Kumaran: I think it may take a while. So yeah, if it’s running.

190 00:26:51.750 00:26:59.030 Uttam Kumaran: maybe I can ask Pius, like, what the runtime expectations are.

191 00:26:59.250 00:27:02.320 Uttam Kumaran: But give it a shot, and just Slack me. I’ll be on.

192 00:27:02.760 00:27:04.780 Annie Yu: All right, all right. Let me do that.

193 00:27:05.000 00:27:06.250 Uttam Kumaran: Okay. Thank you.

194 00:27:06.250 00:27:08.640 Annie Yu: Oh! And one more question.

195 00:27:08.640 00:27:09.330 Uttam Kumaran: Yeah.

196 00:27:09.330 00:27:18.320 Annie Yu: I’ve never, like, run my code on Git. I usually, like, use PyCharm and then push it back. So what’s the best way to do that here?

197 00:27:18.626 00:27:21.079 Uttam Kumaran: So you can use GitHub Desktop.

198 00:27:21.610 00:27:22.240 Annie Yu: Okay.

199 00:27:22.240 00:27:25.399 Uttam Kumaran: Right, with GitHub Desktop, you can clone the repo

200 00:27:25.660 00:27:26.360 Annie Yu: Hmm.

201 00:27:26.360 00:27:27.520 Uttam Kumaran: to your machine,

202 00:27:27.660 00:27:32.530 Uttam Kumaran: and then you can run it. You can run this in PyCharm, you can run this in VS Code, you can run this in the terminal.

203 00:27:32.990 00:27:34.460 Annie Yu: Okay. Alright.

204 00:27:34.760 00:27:37.740 Annie Yu: Okay, thanks so much. Well, let me try this.

205 00:27:37.740 00:27:39.810 Annie Yu: Let me know how it goes.

206 00:27:40.290 00:27:40.790 Uttam Kumaran: Thank you.

207 00:27:40.790 00:27:41.360 Annie Yu: Bye.

208 00:27:41.360 00:27:41.850 Uttam Kumaran: Bye.