Meeting Title: Magic Spoon Data Extraction Sync Date: 2026-01-07 Meeting participants: Ashwini Sharma, Michael Thorson


WEBVTT

1 00:11:36.190 00:11:36.710 Ashwini Sharma: Hey!

2 00:11:36.710 00:11:39.829 Michael Thorson: Hey, Ashwini, I’m so sorry about that. Sorry to keep hanging in.

3 00:11:39.970 00:11:41.230 Ashwini Sharma: No problem, no problem.

4 00:11:41.960 00:11:47.210 Ashwini Sharma: Alright, so I had a few quick questions on… on this,

5 00:11:48.300 00:11:51.399 Ashwini Sharma: this report, right? Let me share my screen once.

6 00:11:57.770 00:12:01.370 Ashwini Sharma: Oh, which one is… I think this one.

7 00:12:02.430 00:12:02.960 Ashwini Sharma: Q.

8 00:12:04.970 00:12:06.500 Ashwini Sharma: Can you see my screen?

9 00:12:07.900 00:12:09.589 Michael Thorson: Yep, I’m seeing your calendar.

10 00:12:09.590 00:12:14.010 Ashwini Sharma: Alright, okay, cool. Yeah.

11 00:12:14.320 00:12:21.380 Ashwini Sharma: So, I looked into these APIs, and then I’m kind of hitting these APIs to extract the data.

12 00:12:22.010 00:12:29.150 Ashwini Sharma: So, one of the things that I wanted to understand is, you had shared some of the reports, I think, right?

13 00:12:29.440 00:12:37.010 Ashwini Sharma: And I see here the time period is always 52 weeks, right? There was another report also where it was still 52 weeks. Is this the…

14 00:12:37.720 00:12:43.839 Ashwini Sharma: Is this the timeline where you want to aggregate the data for,

15 00:12:44.480 00:12:51.690 Ashwini Sharma: Various dimensions, or… or should we just keep it at one week, and then everything else can be derived later on, right?

16 00:12:53.540 00:12:57.380 Michael Thorson: Let me think about that.

17 00:13:04.160 00:13:09.270 Michael Thorson: So, a couple, couple things. So, I guess, first, the,

18 00:13:09.670 00:13:20.429 Michael Thorson: I don’t know if you’ve experimented too much with the time period end date, but the time period end date, just for, like, something I found out, is it has to align with the data release schedule. Did you find that at all?

19 00:13:21.020 00:13:24.899 Ashwini Sharma: So, what is that thing? Time, period, end date has to align with? .

20 00:13:26.020 00:13:27.910 Michael Thorson: Yeah, let me…

21 00:13:28.420 00:13:45.039 Michael Thorson: So, the Spins data aggregator, like, that company, like, they release data once a month, so, like, the column C, which is a query parameter, or time period end date, it has to align with one of the, like, data release

22 00:13:45.360 00:13:46.280 Michael Thorson: Dates?

23 00:13:46.600 00:13:50.050 Michael Thorson: So I have the calendar, and I can, like, share that over with you as well.

24 00:13:50.080 00:14:09.950 Michael Thorson: Okay. I’ve been using 8-10 since, like, that’s when I started this project, and I’ve been just kind of using that as my time period end date. But it’s basically, like, every month, we can pull these different time periods backed off of that end date. So it’s like, I see what you’re saying with, like, oh, we could just, like, pull one week at a time, and…

25 00:14:10.030 00:14:11.450 Michael Thorson: Do all history.

26 00:14:12.140 00:14:15.350 Michael Thorson: Aggregate on our end.

27 00:14:16.040 00:14:21.530 Ashwini Sharma: So, this is what I was trying to say, right? If you look at time periods, right,

28 00:14:22.570 00:14:39.859 Ashwini Sharma: And this is what it has. So, basically, the first set of values, if you see, it starts at 20220109, and then it’s a one-week length, right? And the next value is just 7 days after that, right? One week. And then 7 days after that, one week. So this goes on until…

29 00:14:40.250 00:14:46.130 Ashwini Sharma: until, 2025, November 30th. I think that’s, that’s the last,

30 00:14:46.340 00:14:49.739 Ashwini Sharma: Time period for which the system has data.

31 00:14:50.280 00:14:51.970 Ashwini Sharma: Right over here.

32 00:14:52.160 00:15:07.680 Ashwini Sharma: Here, yeah, this is the last, right? And then we start again from this date, which is for 4 weeks, right? So, basically, when you provide this time period, end date as 2022 0-130, and then length as 4 weeks, what the system does internally is.

33 00:15:07.680 00:15:15.370 Ashwini Sharma: It aggregates data 4 weeks prior to this end date, and then it returns you, right, based on the various attributes, right?

34 00:15:15.620 00:15:31.810 Ashwini Sharma: So that looks like a derived data to me. I mean, I would be happy if we had absolute raw data, but that’s not available through spins, and this is the… this is the, you know, smallest aggregation in terms of time period that we can have, one week, right?

35 00:15:31.920 00:15:37.090 Ashwini Sharma: So I was thinking, like, if we have one week of data, let me share something else, right?

36 00:15:37.450 00:15:42.949 Ashwini Sharma: Which is one of the extracts, yeah… this one.

37 00:15:46.870 00:15:49.569 Ashwini Sharma: Are you able to see this, Excel sheet?

38 00:15:54.320 00:15:55.989 Ashwini Sharma: Okay, can you see this Excel sheet?

39 00:15:56.820 00:15:58.199 Michael Thorson: Yeah, I can see it.

40 00:15:58.200 00:16:06.690 Ashwini Sharma: Okay, yeah. So, I did extraction for, one time period, right? Let me show you the parameters that I had sent,

41 00:16:07.450 00:16:10.989 Ashwini Sharma: Right? So this is the extraction that I’ve done.

42 00:16:11.630 00:16:29.539 Ashwini Sharma: And, what it does is, actually, I added almost all the attributes that are there, accessible to us, and… and all the measures as well, right? So, basically, these are the measures, and… and what I’ve not done over here is the Iago thing, right? Against each measure.

43 00:16:29.570 00:16:35.579 Ashwini Sharma: And, you know, that’s missing right now from this one, and

44 00:16:35.980 00:16:44.699 Ashwini Sharma: We need to ensure that the reporting level contains only one value, so there are around 7 different reporting levels, so that goes in a loop, and then

45 00:16:45.050 00:16:58.969 Ashwini Sharma: creates this extract for each reporting level, and then geography, I just looked into one, but what I can do is, like, I can add all the geographies possible, and then generate one extract for one brand at a time.

46 00:17:00.620 00:17:08.279 Ashwini Sharma: And what I was thinking is, like, in these time frames, right, if I provide all the different time frames that exist.

47 00:17:09.130 00:17:10.660 Ashwini Sharma: for every week.

48 00:17:11.839 00:17:14.749 Ashwini Sharma: Then any other data could be derived from that.

49 00:17:19.710 00:17:37.069 Michael Thorson: Yes. I think… we just need to validate that, yeah. I think, like, I’m totally with you, I think that should be fine. I think maybe we should valid… just, like, QA that against, like, what they’re doing… what Spins is aggregating on their end, just to make sure we’re not…

50 00:17:37.290 00:17:44.980 Michael Thorson: Like, you know, they’re modeling on their end, and then we’re modeling on our end, and then our aggregations are, like, slightly different.

51 00:17:45.760 00:17:59.849 Ashwini Sharma: Right, I’ll validate that, and this is why I thought, you know… in fact, like, I’d looked at one set of data, and then it worked, but I’ll validate it against, for example, like, 2020, right? This is one of the end dates.

52 00:17:59.970 00:18:06.369 Ashwini Sharma: And this is a one week, and if I, if I go at four weeks, time frame, right.

53 00:18:06.370 00:18:07.190 Michael Thorson: Yeah, totally.

54 00:18:07.370 00:18:10.589 Ashwini Sharma: It’s the same date from which it starts, right?

55 00:18:10.590 00:18:12.500 Michael Thorson: Yeah, yeah, yeah, yeah, makes sense.

56 00:18:12.720 00:18:19.360 Ashwini Sharma: Right over here, 130, right? It doesn’t have any earlier than that data, right?

57 00:18:19.470 00:18:21.190 Ashwini Sharma: And it looks like.

58 00:18:21.190 00:18:23.249 Michael Thorson: Yeah, that’s… that’s expected.

59 00:18:23.250 00:18:23.860 Ashwini Sharma: Yeah.

60 00:18:24.080 00:18:26.929 Ashwini Sharma: Right. So the thing is,

61 00:18:28.610 00:18:47.940 Ashwini Sharma: right now, in order to test this thing, what I’m doing is, you know, I’m building a Docker container and running it locally in my machine, right? But I need a mechanism to write it into the Redshift for my testing purpose. So, do you have an instance of Redshift that I can use to write this data?

62 00:18:49.520 00:18:55.419 Ashwini Sharma: Or, like, you know… Or what I could do instead is, you have already given.

63 00:18:55.420 00:18:55.920 Michael Thorson: Yeah.

64 00:18:55.920 00:18:56.370 Ashwini Sharma: iterate.

65 00:18:56.370 00:18:56.750 Michael Thorson: Yeah, yeah.

66 00:18:56.750 00:18:57.950 Ashwini Sharma: this, right?

67 00:19:01.610 00:19:02.970 Ashwini Sharma: Yeah, we can do the CSM.

68 00:19:03.090 00:19:04.230 Ashwini Sharma: Yeah.

69 00:19:05.780 00:19:07.340 Michael Thorson: One sec,

70 00:19:08.180 00:19:23.390 Michael Thorson: Yeah, let me… I’m… I’m pulling up my IDE right now, and I wanted to show… have you explored the… before we hop away from this Apollo sandbox, have you explored the… go back to root?

71 00:19:23.640 00:19:25.570 Michael Thorson: On the documentation, on the left?

72 00:19:25.590 00:19:26.340 Ashwini Sharma: Yep.

73 00:19:26.340 00:19:33.830 Michael Thorson: Just… this may help… so in the mutation, like, yeah, click into mutation, this…

74 00:19:34.110 00:19:38.210 Michael Thorson: So, create extract from Market Insights Trend.

75 00:19:38.440 00:19:54.210 Michael Thorson: is probably what you want to use. Instead of, like, looping and doing, like, one end date, one week, you can set end date, length, length… so it’s, like, end date equals 1.30, like, the last release date, the length is one week, and then you can also set the number of periods.

76 00:19:54.210 00:19:59.509 Michael Thorson: So you could say, pull the last 52 weeks in one packet of information.

77 00:19:59.950 00:20:08.580 Michael Thorson: I was experimenting, so that should be, like, just a way easier, like, simpler, API call.

78 00:20:09.140 00:20:10.780 Michael Thorson: I don’t know, were you in this?

79 00:20:10.940 00:20:13.860 Michael Thorson: Yeah, it is going on. I can show you…

80 00:20:14.490 00:20:17.199 Michael Thorson: I can show you the parameter I use, but

81 00:20:17.730 00:20:20.560 Michael Thorson: Yeah, let me just share my screen really quick, or unless you want to…

82 00:20:20.560 00:20:25.579 Ashwini Sharma: Oh, oh, this is good, right? So the number of periods can be anything between

83 00:20:26.380 00:20:33.610 Ashwini Sharma: up to 2 to part 31 minus 1, that covers everything that we need, right? So, maybe, like, I can click the latest…

84 00:20:33.950 00:20:42.950 Ashwini Sharma: I can do this one, and then mark it as one-week data, and then give it to me for, you know, since 1970, or something like that.

85 00:20:42.950 00:20:49.510 Michael Thorson: That might save you a couple loops, for sure. A couple trips to the server.

86 00:20:49.860 00:20:52.380 Ashwini Sharma: Yeah, yeah, I think this is cool, this is cool, yeah.

87 00:20:52.380 00:21:05.119 Michael Thorson: Okay, cool. Yeah, I found this… I explained… I finally explored this endpoint, like, last week, and I was like, wait, this is, like, way easier. We’ll still probably have to loop for a few different parameters, but we…

88 00:21:05.120 00:21:08.969 Ashwini Sharma: Brands, sorry, reporting levels, mainly. That’s where you have to look.

89 00:21:09.200 00:21:14.569 Michael Thorson: Yeah, reporting levels, so it’s like, reporting levels and also product universe, so I think you…

90 00:21:15.770 00:21:17.700 Michael Thorson: I do one at a time for those.

91 00:21:17.700 00:21:24.889 Ashwini Sharma: No, I didn’t have to do anything for the product. Like, for example, over here, in this one, right, I included all the three product universes.

92 00:21:25.120 00:21:26.530 Ashwini Sharma: And it worked.

93 00:21:26.750 00:21:27.739 Michael Thorson: Oh, nice.

94 00:21:27.980 00:21:37.960 Ashwini Sharma: Yeah, it tells in the video. In the video, it does tell them, you know, you can put only one, but yeah, I just tried putting all three of them over there, and then it worked.

95 00:21:39.240 00:21:43.069 Ashwini Sharma: Cisco. Yeah, so, I… And did you.

96 00:21:43.070 00:21:45.260 Michael Thorson: Did you see the… yeah, and then…

97 00:21:46.290 00:21:51.749 Michael Thorson: Cool. And then for the geographies, did you see the packet I had in the Excel? You’re good on that?

98 00:21:53.610 00:21:54.310 Ashwini Sharma: Honey.

99 00:21:54.310 00:21:57.210 Michael Thorson: array pre-built in the Excel for, like.

100 00:21:58.160 00:22:02.100 Ashwini Sharma: In the Excel… was it this one?

101 00:22:03.620 00:22:09.059 Michael Thorson: Oh… Yeah… yeah,

102 00:22:10.590 00:22:18.339 Michael Thorson: Look on the… yeah, here we go. Yeah, these are some examples, so if you do… if you look at the selects, for example.

103 00:22:18.530 00:22:21.039 Michael Thorson: And it’s all, like, horribly furnished.

104 00:22:24.430 00:22:26.160 Ashwini Sharma: No, these are all the measures.

105 00:22:26.160 00:22:27.620 Michael Thorson: Like, I pre-built…

106 00:22:28.280 00:22:33.330 Ashwini Sharma: If we build something, oh, this, I think this one?

107 00:22:35.260 00:22:36.589 Michael Thorson: Yeah, this one. There you go.

108 00:22:36.590 00:22:37.070 Ashwini Sharma: This one.

109 00:22:37.070 00:22:42.270 Michael Thorson: The geography, yeah, you can use that array for filtering geographies eventually.

110 00:22:42.270 00:22:42.799 Ashwini Sharma: Oh, okay.

111 00:22:42.800 00:22:46.340 Michael Thorson: like, business logic for… from our team. It’s, like, what they care about.

112 00:22:47.000 00:22:50.650 Ashwini Sharma: Oh, this is the only thing that’s needed? You don’t need any other geographies?

113 00:22:51.030 00:22:55.520 Ashwini Sharma: Yeah, okay, cool, cool, yeah. I was planning to add all the geographies that are there.

114 00:22:56.360 00:23:08.559 Michael Thorson: Sweet. Yeah, that’ll just, like, save you some time of, like, mapping, and then for now, I think we can just focus on serial, so you see how, like, subcategory is just those two SS Serial and SS Serial cold kids?

115 00:23:08.950 00:23:24.550 Michael Thorson: Okay. So, subcategories, like, that spins, like, basically, like, yeah, like, what grocery aisle it’s in, or whatever. So, we have a couple more product lines that we’re looking at, but for this first, API hit, we can just hit SS…

116 00:23:24.770 00:23:27.130 Michael Thorson: Like, the serial subcategories.

117 00:23:27.720 00:23:28.510 Ashwini Sharma: Okay.

118 00:23:28.900 00:23:29.670 Ashwini Sharma: Awesome, man.

119 00:23:29.670 00:23:35.749 Michael Thorson: And you’re totally… yeah, and you’re a lot… like, sorry, you’re correct with the reporting level, though. That’ll need to switch between

120 00:23:35.940 00:23:38.590 Michael Thorson: Yeah, subcategory, brand, and UPC.

121 00:23:39.400 00:23:48.010 Ashwini Sharma: Right, yeah, yeah, there are 7 of these, right, at reporting levels, so we’ll have to iterate across, each of these, and then get it.

122 00:23:48.010 00:23:59.860 Ashwini Sharma: And in terms of this thing, right, the attributes and measures, are we looking at all the measures available, and all the attributes available, or is it some selected ones that…

123 00:24:00.840 00:24:01.940 Ashwini Sharma: That you want to look at?

124 00:24:01.940 00:24:08.580 Michael Thorson: This list of the attributes is what our sales… what our sales team wants, so we can use that for now.

125 00:24:10.030 00:24:11.330 Ashwini Sharma: Okay.

126 00:24:13.070 00:24:22.340 Ashwini Sharma: Okay, so there’s a difference between what they want now versus what they will ask for later. So…

127 00:24:22.450 00:24:30.999 Ashwini Sharma: I mean, considering… I mean, since the raw data is available, would it make sense to get all of them, or is it going to be a…

128 00:24:31.580 00:24:36.340 Ashwini Sharma: You know, it will be slightly low on the server, definitely.

129 00:24:39.490 00:24:40.579 Ashwini Sharma: Okay, we’ll start.

130 00:24:40.960 00:24:43.179 Ashwini Sharma: With the limited set, yeah.

131 00:24:44.750 00:24:59.110 Michael Thorson: Yeah, let me know what you think. I mean, a lot of the data is pretty irrelevant for us. These are the ones that I know for a fact she’s been utilizing. She’s been working with this dataset for, like, most of her career, so she’s, like… I feel like she’s, like, she kind of knows what she wants, if you will.

132 00:24:59.990 00:25:11.909 Michael Thorson: Okay. But to your point, if we want to use this to drive the modeling in a little bit, I totally understand that. I just wasn’t sure how expin… like, how much larger the dataset would be to include all the fields versus…

133 00:25:12.440 00:25:16.470 Michael Thorson: just this subset. It gets pretty big.

134 00:25:16.470 00:25:26.459 Ashwini Sharma: it returned quite fast, right? For example, like, for one geography that I selected, everything, right? That came up in a few seconds.

135 00:25:26.950 00:25:35.000 Ashwini Sharma: And it… it didn’t have, let me see where it went… 4… yeah.

136 00:25:35.250 00:25:38.460 Ashwini Sharma: So it had almost… how many rows?

137 00:25:42.090 00:25:43.839 Michael Thorson: Is it one geography? Yeah.

138 00:25:43.840 00:25:45.180 Ashwini Sharma: Yeah, this is one geography.

139 00:25:45.790 00:25:47.859 Michael Thorson: One week, one geography.

140 00:25:48.280 00:25:52.810 Ashwini Sharma: One week, one geography, one reporting level.

141 00:25:55.930 00:25:57.520 Ashwini Sharma: 727 rows.

142 00:25:58.770 00:26:07.430 Michael Thorson: So it’ll fan out by product universe times 3, plus the geography times, like, it’s, like, 30 or 40 or something.

143 00:26:07.710 00:26:08.090 Ashwini Sharma: Yeah.

144 00:26:08.090 00:26:12.200 Michael Thorson: fandom again with category by another 3 or 4.

145 00:26:12.600 00:26:14.530 Ashwini Sharma: No, everything else is there.

146 00:26:16.010 00:26:18.400 Michael Thorson: Oh, all, like, all, product levels?

147 00:26:18.600 00:26:22.950 Ashwini Sharma: Yeah, everything is there, except for filters are,

148 00:26:23.560 00:26:28.720 Ashwini Sharma: are the one week, end date with November 30th?

149 00:26:28.850 00:26:37.490 Ashwini Sharma: And then, brand… reporting level is brand, and geography is one single item. I don’t remember which one was it.

150 00:26:40.830 00:26:41.570 Michael Thorson: Yeah.

151 00:26:42.460 00:26:44.110 Ashwini Sharma: Geography is Mississippi.

152 00:26:44.280 00:26:45.660 Ashwini Sharma: Okay, cool.

153 00:26:47.690 00:26:52.820 Michael Thorson: Yeah, I think this will make sense for the, like, reporting levels?

154 00:26:53.350 00:26:57.440 Michael Thorson: I’m curious what you think about this, too, because the reporting levels are, like, it…

155 00:26:57.920 00:27:02.679 Michael Thorson: When you get down to the UPC level, that’s when the data gets really big, as well.

156 00:27:03.290 00:27:07.369 Ashwini Sharma: Oh, okay, I didn’t know that, I haven’t seen it, okay. Did you see…

157 00:27:08.980 00:27:23.139 Michael Thorson: So, the UPC, I think, like, think of it in, like, two different levels, where it’s, like, UPC is, like, we only really care about our own products, so it’s, like, like, filter brand equals Magic Spoon, then show UPCs, and, like, a couple, like, key competitors.

158 00:27:23.440 00:27:23.830 Ashwini Sharma: Okay.

159 00:27:23.830 00:27:32.799 Michael Thorson: If you do, like, every UPC for every brand, you’ll end up with, like, every cereal product on shelves in the country, which I think that’ll be massive.

160 00:27:33.120 00:27:35.090 Michael Thorson: Be careful there.

161 00:27:38.100 00:27:38.530 Michael Thorson: Yeah.

162 00:27:38.530 00:27:39.940 Ashwini Sharma: Alright,

163 00:27:40.090 00:27:50.609 Ashwini Sharma: Yeah, I think one other question that I had was, like, we don’t want to download this data every time the sync runs, right? So, in a way.

164 00:27:51.650 00:27:59.420 Ashwini Sharma: If you have already extracted data for a weekly basis from 2022 till November.

165 00:27:59.690 00:28:04.210 Ashwini Sharma: When the next set of data releases, like, you said it happens every month, right?

166 00:28:04.500 00:28:11.040 Ashwini Sharma: So maybe in February we’ll… maybe in January we’ll get December’s data, right? Sometime in January.

167 00:28:12.350 00:28:16.930 Ashwini Sharma: We want to pull only the December’s data, right? The November’s data won’t change.

168 00:28:17.390 00:28:18.549 Ashwini Sharma: At that point.

169 00:28:19.780 00:28:20.990 Michael Thorson: Correct, yeah.

170 00:28:20.990 00:28:21.570 Ashwini Sharma: Alright.

171 00:28:21.570 00:28:26.810 Michael Thorson: The… I think the… the… Actually… oh, man.

172 00:28:28.450 00:28:33.659 Michael Thorson: Oh… And I hate to tell you, but it actually will change.

173 00:28:34.150 00:28:35.329 Michael Thorson: Oh, it does?

174 00:28:35.330 00:28:38.999 Ashwini Sharma: Till what point in time is it going to change?

175 00:28:39.210 00:28:47.999 Michael Thorson: Yeah, no, it’s… there’s, like, a long tail of updates, it’s probably 2 to 3 months or something, and I totally forgot about that. It’s a really good call-out.

176 00:28:48.340 00:28:52.329 Michael Thorson: So, what happens, and it’s… it’s usually, like, so, like, this…

177 00:28:54.040 00:28:56.659 Michael Thorson: If you poll the data for 810,

178 00:28:57.360 00:29:10.460 Michael Thorson: And you pull, like, from this deadline, like, a month later, for example, what you’ll see is, like, some of the changes, some of the, like, total product sales, within, you know, one year of…

179 00:29:10.610 00:29:11.820 Michael Thorson: A10?

180 00:29:11.940 00:29:15.179 Ashwini Sharma: Like, that’ll actually… it’ll typically go down.

181 00:29:15.760 00:29:21.069 Michael Thorson: It’s like, think about, like, people returning product, or, like, there’s, like, refunds, or etc, etc, so, like…

182 00:29:21.070 00:29:21.460 Ashwini Sharma: Oh!

183 00:29:21.460 00:29:34.980 Michael Thorson: The total size, yeah. So, like, there’ll be, like, a settling period of, like, one to two months, but there is, like, a long tail there, where it’s, like, I don’t have a clear cutoff for, like, when the data is, like, locked, locked. And I think that’s why…

184 00:29:36.570 00:29:52.340 Michael Thorson: our sales team works in trends, and why this company also reports in trends. It basically says, like, oh, like, on A10, 52 weeks, it changed .01%, but if you look at the past 4 weeks, the total dollar value changed by, like, 10%.

185 00:29:54.180 00:29:56.430 Ashwini Sharma: Yeah, that makes sense, right? Okay.

186 00:29:56.630 00:30:04.049 Michael Thorson: Yeah, so actually, come to think of that, too, I’m like, as you’re hitting these endpoints, I think there’s kind of two different use cases,

187 00:30:04.050 00:30:04.660 Ashwini Sharma: Yeah.

188 00:30:04.980 00:30:05.980 Michael Thorson: Which is…

189 00:30:07.260 00:30:24.979 Michael Thorson: Yeah, to your point, it’s like, she only, like, one of our teams only looks at, like, trends, so it’s like, she wants to look in a 52-week period, she wants to look at, like, a previous 3 months. She aligns all her analyses with the spins reported periods.

190 00:30:25.850 00:30:42.019 Michael Thorson: So honestly, I think we should align with the spin’s timeframes, instead of aggregating on our end, just to make sure we have, like, one-to-one historical data, and we, like, don’t over… we don’t miss any of that settling that’s happening.

191 00:30:42.260 00:30:42.800 Michael Thorson: Does that make sense.

192 00:30:42.800 00:30:47.720 Ashwini Sharma: Yeah, yeah, yeah. Okay, we’ll do that only in that case here.

193 00:30:48.460 00:30:58.189 Michael Thorson: Okay, cool. And then we do have another use case, which is part of the… I know we’ve kind of talked about the MMM projects, they… but that team cares about… so…

194 00:30:58.600 00:31:05.250 Michael Thorson: The commercial team, like, that whole data model will be trend-based, 4 weeks and up, so it’s every time frame, 4 weeks to 52 weeks.

195 00:31:06.120 00:31:14.200 Michael Thorson: Does that make sense? So, like, once a month, we would pull all… 4 weeks, I think it’s 8 weeks, 12 weeks, up to 52 weeks.

196 00:31:14.600 00:31:16.170 Ashwini Sharma: Yeah. Okay.

197 00:31:16.170 00:31:18.240 Michael Thorson: The geographies that we’ve talked to.

198 00:31:18.500 00:31:25.990 Ashwini Sharma: Yeah. So, trend for last 52 weeks, based on 4 weeks, time chunks.

199 00:31:29.740 00:31:31.469 Michael Thorson: Is that what,

200 00:31:32.540 00:31:36.819 Michael Thorson: Yeah, I’m sorry, I’m like, I don’t know if I’m doing… it’s probably better if I show you what…

201 00:31:36.820 00:31:39.600 Ashwini Sharma: Yeah, let me stop sharing.

202 00:31:40.130 00:31:47.650 Michael Thorson: This is… Spins… Data… I think it’s one of these two.

203 00:32:00.690 00:32:05.700 Michael Thorson: Let me share kind of what the use case is, I guess.

204 00:32:06.440 00:32:11.679 Michael Thorson: it’s pretty… it’s pretty rudimentary. Like, it’s nothing too fancy on the modeling side.

205 00:32:12.200 00:32:14.840 Michael Thorson: Yeah, this is probably the most useful.

206 00:32:18.550 00:32:25.330 Michael Thorson: Remove this… API join key, and then, dang it.

207 00:32:26.990 00:32:35.940 Michael Thorson: Yeah, so I’m just in all new really quick, so this is, like, I seeded this into the warehouse manually a while ago,

208 00:32:37.360 00:32:42.530 Michael Thorson: Where did I put that? Let me just, like, quickly pull in all the fields for you.

209 00:32:50.840 00:33:08.959 Michael Thorson: But, like, long story short, I think… sorry, I’m trying to do this, long story short, it’s really two different use cases. One is, like, longer-term trends, so it’s four weeks and ups, and then the other use case is going to be a weekly granularity. So we want to kind of split those out into two separate,

210 00:33:09.120 00:33:10.350 Michael Thorson: base models.

211 00:33:11.990 00:33:13.610 Ashwini Sharma: Got it, yeah. Okay.

212 00:33:14.720 00:33:23.269 Michael Thorson: Uploaded CSVs, spins, API… There we go.

213 00:33:24.370 00:33:30.600 Michael Thorson: So it’ll be basically, like, for our sales team, and the first one we’re talking about here is…

214 00:33:31.100 00:33:36.480 Michael Thorson: It’s gonna be… Product universe, reporting level…

215 00:33:36.930 00:33:41.410 Michael Thorson: Just to give you kind of an idea here of what this is looking like, and then time frame.

216 00:33:42.780 00:33:49.080 Michael Thorson: Number of weeks… There you go.

217 00:33:55.750 00:34:03.890 Michael Thorson: So it’s a 52 weeks, year-to-date, which is, like, 32 weeks, 24 weeks, 12 weeks. It’s, it’s pretty much everything, and…

218 00:34:04.050 00:34:08.689 Michael Thorson: We can ignore the number of weeks equals 1.

219 00:34:09.290 00:34:10.439 Michael Thorson: Does that make sense?

220 00:34:12.429 00:34:18.559 Ashwini Sharma: Yeah, okay. So, why did you ignore that? Number of weeks equals to 1?

221 00:34:23.050 00:34:34.310 Michael Thorson: We can… we can pull it in. I think she doesn’t, like, typically look at that, I guess, as, like, our end customer doesn’t really look at that piece very often.

222 00:34:34.310 00:34:35.070 Ashwini Sharma: Okay.

223 00:34:37.679 00:34:38.289 Michael Thorson: Cool.

224 00:34:38.729 00:34:44.969 Michael Thorson: So this is that… this is an example extract, but it’s, like, all the geographies fanning out,

225 00:34:45.939 00:34:48.109 Michael Thorson: From one end date. Does this make sense?

226 00:34:49.179 00:34:50.480 Ashwini Sharma: Yeah, yeah.

227 00:34:51.500 00:34:52.909 Ashwini Sharma: Got it, okay.

228 00:35:01.320 00:35:01.930 Michael Thorson: Cool.

229 00:35:02.990 00:35:09.780 Michael Thorson: Yeah, any… any questions while we’re… we’re chatting here? Before we hop into a Scheming? I’ve been to?

230 00:35:10.210 00:35:12.840 Ashwini Sharma: Yeah, yeah, we could hop into that shift now.

231 00:35:14.080 00:35:14.820 Michael Thorson: Yeah.

232 00:35:17.610 00:35:25.849 Michael Thorson: Yeah, and then… Are you… are you clear on the use cases for the commercial team, which is…

233 00:35:26.110 00:35:32.990 Michael Thorson: Trend-based data like this versus… the… Like, week over week.

234 00:35:33.200 00:35:34.300 Michael Thorson: Use case?

235 00:35:37.910 00:35:44.339 Ashwini Sharma: No. I got what we need to pull out of the system, right?

236 00:35:44.630 00:35:48.589 Ashwini Sharma: I’ll just align the extract with the timeframes that is provided by…

237 00:35:48.840 00:35:54.919 Ashwini Sharma: by spends, and then create multiple tables in Redshift for each,

238 00:35:55.180 00:36:14.890 Ashwini Sharma: time frame, even though, like, I feel that if we have a week-on-week data, we could always, you know, derive the other timeframe data. But yeah, we could verify that later. For now, what I’m going to do is, for every time frame that Spence provides, I’m going to create a different table that represents aggregated data.

239 00:36:15.270 00:36:16.999 Ashwini Sharma: For that time period.

240 00:36:18.710 00:36:22.330 Ashwini Sharma: So maybe one table for a weekly data that contains

241 00:36:22.680 00:36:27.280 Ashwini Sharma: You know, details for every week since 2022?

242 00:36:27.910 00:36:31.069 Ashwini Sharma: Another table for 4 weeks data, and so on, right?

243 00:36:32.360 00:36:37.069 Michael Thorson: Yeah, that’s… that sounds perfect, and then…

244 00:36:37.360 00:36:48.020 Michael Thorson: I think that’s the… a good way to go about it, and then the cadence of those API hits are aligned to the data release schedule. I think that’s the one next layer to this. So we basically have, like.

245 00:36:48.140 00:36:53.059 Michael Thorson: Yeah, like, the raw data dump… the raw table dumps will be…

246 00:36:53.210 00:37:00.760 Michael Thorson: Yeah, the 52-week table, the 24-week, etc, etc, etc, all the way down to 1 week. Every time frame has…

247 00:37:01.040 00:37:07.649 Michael Thorson: a release… Like, a date of release date, the timeframe that was extracted.

248 00:37:08.670 00:37:11.159 Michael Thorson: And then all those measures that we’ve discussed.

249 00:37:17.730 00:37:22.530 Ashwini Sharma: So, this data release, it happens once a month, and whenever it happens, it

250 00:37:24.560 00:37:28.629 Ashwini Sharma: Again, it releases all the data for 4 weeks of the month.

251 00:37:32.060 00:37:38.430 Michael Thorson: Correct. And if any changes happen in 2 months previous.

252 00:37:38.430 00:37:40.410 Ashwini Sharma: In, in that break, yeah.

253 00:37:40.410 00:37:41.080 Michael Thorson: that exact…

254 00:37:41.080 00:37:43.670 Ashwini Sharma: Oh, that, that also comes, yeah, yeah, okay.

255 00:37:43.670 00:37:48.799 Michael Thorson: So it’s like a totally new dataset every release date. Does that make sense?

256 00:37:48.800 00:37:49.950 Ashwini Sharma: Right, yes.

257 00:37:50.070 00:37:52.680 Ashwini Sharma: And we don’t want to save that, right?

258 00:37:53.400 00:37:58.269 Ashwini Sharma: Probably, like, we’re going to override The new dataset every time.

259 00:38:01.890 00:38:07.019 Michael Thorson: No, let’s keep it, as long as you pull in the field that is the end date.

260 00:38:07.340 00:38:14.779 Michael Thorson: we should have an end date, and then a time frame next to each other. So, like, that extract that’s in the warehouse currently.

261 00:38:15.330 00:38:19.860 Michael Thorson: is, like… That’ll be the context that’s useful, it’s…

262 00:38:21.940 00:38:29.290 Michael Thorson: End date is when the data was extracted or released, and then this is the aggregate period number of weeks, like 24, 52, etc.

263 00:38:29.620 00:38:35.569 Ashwini Sharma: Oh, okay, okay. Oh, yeah, yeah, yeah, yeah, yeah, in that case, that serves as a differentiator.

264 00:38:38.660 00:38:41.329 Michael Thorson: Yeah, that would be the differentiator, then.

265 00:38:57.980 00:39:01.720 Michael Thorson: Yeah, and what’s… maybe a question back for you, like, what would help…

266 00:39:02.030 00:39:20.100 Michael Thorson: what would help you understand, kind of, what we need for the two different use cases of Magic Spoon? I can draw a bit of a diagram, maybe? I’m like, I’m nervous I’ve thrown too many details over at you guys, but I don’t know if that’s the clearest way to communicate. Do you have a… do you have something, like, a format that you prefer?

267 00:39:20.700 00:39:24.469 Ashwini Sharma: No, anything is fine, anything is fine. A diagram helps.

268 00:39:27.040 00:39:27.750 Michael Thorson: Okay.

269 00:39:29.370 00:39:30.170 Michael Thorson: Cool.

270 00:39:32.270 00:39:36.339 Ashwini Sharma: We can take a look at the redshift.

271 00:39:37.340 00:39:38.040 Michael Thorson: Yep.

272 00:39:38.630 00:39:42.619 Michael Thorson: I’ll keep sharing then, so you can see the IDE I’m in.

273 00:39:45.720 00:39:53.299 Michael Thorson: I think you can just, like, we’ll have you connect to the dev instance of Redshift, and then we have a couple schemas that are…

274 00:39:54.400 00:40:01.210 Ashwini Sharma: I connected to AWS, but then I don’t think there was Redshift in that. There is no access to Redshift in that.

275 00:40:03.040 00:40:03.990 Michael Thorson: What was that?

276 00:40:04.880 00:40:05.490 Ashwini Sharma: Yep.

277 00:40:07.190 00:40:08.619 Michael Thorson: Do you need, like, a login?

278 00:40:10.530 00:40:17.229 Ashwini Sharma: I think that could help. Normally, I was expecting that I would be able to go to Redshift directly from AWS.

279 00:40:19.030 00:40:31.569 Michael Thorson: Yeah, I think the account permissions are also kind of messed up, so I can share… we have a shared login, and I have it stored in a profile with YML, I can share that with you really quick.

280 00:40:33.420 00:40:41.709 Michael Thorson: This is probably… It’s a little detailed, but that’s… it includes, like… it’s kind of crazy, I know.

281 00:40:41.750 00:40:45.029 Ashwini Sharma: Yeah. Host name, password, port…

282 00:40:45.030 00:40:57.949 Michael Thorson: Schema, this is, like, we’ve… this is our, like, dev schema for each user, so this is just, like, my initials, but model, dev, and then, like, test schema, like, your initials will be fine.

283 00:40:58.160 00:41:00.490 Ashwini Sharma: And then we use, yeah, we use a shared.

284 00:41:00.610 00:41:06.869 Michael Thorson: user account that was built by our current analytics partner, so we actually don’t have, like, user controls, which is kind of crazy, I know.

285 00:41:10.800 00:41:14.539 Ashwini Sharma: So, these are the DBT modeling, right?

286 00:41:15.460 00:41:21.280 Ashwini Sharma: I think, what I’m looking at is, where do I put the raw data? That’s.

287 00:41:21.280 00:41:21.860 Michael Thorson: Yeah.

288 00:41:22.250 00:41:28.320 Ashwini Sharma: This is the same instance, right? It’s… all three are the same instance. Yeah, it’s the same instance.

289 00:41:29.160 00:41:35.289 Ashwini Sharma: Only the schemas are different. Model dev, model, and schema is parabola.

290 00:41:37.360 00:41:38.800 Michael Thorson: Oh, gotcha.

291 00:41:39.890 00:41:48.829 Michael Thorson: Sorry, the organization here is any dbt models are in the model schema, and then we have raw data dumping into folders

292 00:41:48.980 00:41:55.530 Michael Thorson: of business use case. So… or, like, where it’s coming from. So I think we have…

293 00:41:55.830 00:41:58.639 Michael Thorson: Let me just double-check this.

294 00:41:58.950 00:42:04.219 Michael Thorson: So we’ll want to probably make a new schema for spins, is the long story short.

295 00:42:04.220 00:42:04.800 Ashwini Sharma: Okay.

296 00:42:05.190 00:42:08.490 Ashwini Sharma: And this is the dev instance of, database?

297 00:42:09.450 00:42:10.450 Ashwini Sharma: Okay.

298 00:42:10.890 00:42:12.120 Ashwini Sharma: Bernstun’s.

299 00:42:12.120 00:42:12.720 Michael Thorson: Corrects.

300 00:42:12.720 00:42:13.480 Ashwini Sharma: Right.

301 00:42:27.490 00:42:33.290 Ashwini Sharma: Yeah, can you share me that profile? I could use that. This is, what, DBeaver? Okay.

302 00:42:35.350 00:42:36.480 Ashwini Sharma: Alright, I don’t know.

303 00:42:36.480 00:42:37.640 Michael Thorson: since I’ve been using.

304 00:42:46.720 00:42:51.599 Michael Thorson: Yeah, I’ll send you a OnePass after this call with those in it. Does that work?

305 00:42:51.600 00:42:53.130 Ashwini Sharma: Sure, sure, yeah?

306 00:42:53.290 00:42:56.149 Ashwini Sharma: Cool. What else was there?

307 00:43:02.210 00:43:06.150 Ashwini Sharma: So the overall process is, like, you create the extract.

308 00:43:06.490 00:43:10.870 Ashwini Sharma: push the extracted files… files to S3.

309 00:43:11.070 00:43:15.899 Ashwini Sharma: And then you load it from S3 into… into the database tables, right?

310 00:43:16.730 00:43:23.769 Ashwini Sharma: That’s one thing. And we don’t really care about the incremental extract, because we have to extract the prior data

311 00:43:23.920 00:43:27.569 Ashwini Sharma: And we’re not sure of the lock-in period, right? So…

312 00:43:27.790 00:43:31.019 Ashwini Sharma: Basically, every time we do the extract, once a month.

313 00:43:31.180 00:43:38.720 Ashwini Sharma: We’re going to fetch all the data sets, that’s… Wow.

314 00:43:40.430 00:43:47.410 Ashwini Sharma: Should I limit it to one year, or should I just pull it everything, everything since 2022?

315 00:43:48.870 00:43:51.310 Michael Thorson: Good question.

316 00:43:53.400 00:44:03.189 Michael Thorson: Let me… I’ll take a follow-up with that, and take a look at the data. There’s… I’ll just check with our sales team, like, what the settling period is. I’m sure we can go incremental in, like, 6 months or something, you know?

317 00:44:03.820 00:44:04.540 Ashwini Sharma: Yeah.

318 00:44:04.540 00:44:07.830 Michael Thorson: Like, just call that… call that good.

319 00:44:08.720 00:44:10.659 Michael Thorson: Yeah, I see what you’re saying.

320 00:44:11.510 00:44:16.530 Michael Thorson: But for the… for the commercial team, like, what they really look at is they look at

321 00:44:16.990 00:44:22.540 Michael Thorson: Every single month, like, every data release schedule, they want to see what’s the trend over the past year.

322 00:44:22.670 00:44:28.860 Michael Thorson: They don’t look further than that, so it’s like, we don’t necessarily need a historic backfill. Like, you don’t need to go…

323 00:44:28.990 00:44:32.179 Michael Thorson: Like, you don’t need to go to, like,

324 00:44:32.740 00:44:41.290 Michael Thorson: June 2025, and pull all those timeframes. I’m just saying, like, let’s start fresh of, like, let’s start with the latest release.

325 00:44:41.750 00:44:45.919 Michael Thorson: Extracts just every time frame available.

326 00:44:45.920 00:44:46.420 Ashwini Sharma: Yep.

327 00:44:46.420 00:44:52.429 Michael Thorson: for that release. Yeah, and we can start moving forward, and, like, the most recent will be the most useful.

328 00:44:52.830 00:44:54.020 Ashwini Sharma: Right, yes.

329 00:44:54.580 00:44:55.170 Ashwini Sharma: Okay.

330 00:44:55.170 00:44:55.890 Michael Thorson: Yeah.

331 00:44:57.460 00:45:03.630 Michael Thorson: So it’s, yeah, I guess it’s, like, we don’t need a backfill, just, like, we’re… Just doing this go-forward.

332 00:45:03.750 00:45:06.310 Michael Thorson: For the commercial use case?

333 00:45:06.830 00:45:12.289 Michael Thorson: So that’s really, like, for context, too, like, that’s gonna be diagnostic. They look at a snapshot.

334 00:45:12.660 00:45:14.550 Michael Thorson: At the release, every single month.

335 00:45:14.610 00:45:24.090 Michael Thorson: they just want to know, like, what have the past periods looked like? The second use case, which would be, like, it’s down the road, but I want to flag it to you, is we do care about the week over week.

336 00:45:24.090 00:45:35.149 Michael Thorson: And we do want historical backfill, like, all time. We really want to… it’s more of, like, a trend-based analysis to, like, track how sales performed, with relation to, like, marketing spend.

337 00:45:35.150 00:45:45.209 Michael Thorson: So we want to see the week-over-week sales just for Magic Spoon products, for all time. So that one we do want to pull for the last, like, since 2022, or whatever the…

338 00:45:45.990 00:45:55.270 Michael Thorson: Yeah, and that’s… that’s the extract that we’ll need for…

339 00:45:55.410 00:45:58.649 Michael Thorson: the upcoming, like, mark… was it mass…

340 00:45:59.430 00:46:05.469 Michael Thorson: Marketing Mix Media… Marketing Media Mix, the MMM testing that’s coming up?

341 00:46:05.470 00:46:06.550 Ashwini Sharma: Hmm,

342 00:46:07.180 00:46:08.620 Michael Thorson: Have you followed that?

343 00:46:08.620 00:46:15.030 Ashwini Sharma: No, I haven’t. I think it’s a bit, too much information.

344 00:46:15.140 00:46:28.489 Ashwini Sharma: So, let’s start simple. I think I’ll just pull whatever is there, and then we can, you know, separate the data in the final models, dbt models, and present

345 00:46:29.000 00:46:32.160 Ashwini Sharma: Whatever is needed to whichever team needs it.

346 00:46:35.990 00:46:37.590 Michael Thorson: Yeah, that sounds perfect.

347 00:46:38.370 00:46:48.800 Michael Thorson: Yeah, I’ll send you the profiles YML, so you can get access to Redshift, and I’ll just send you, like, a sample schema of where to dump the information into, like, what raw table.

348 00:46:49.230 00:47:00.310 Ashwini Sharma: Sure, and yeah, one more question I remember right now. So the way that I’m testing is by building a Docker container and then running it locally in my laptop. Is there a better way to do it?

349 00:47:04.750 00:47:08.370 Ashwini Sharma: During the pipeline development, how did your team do it?

350 00:47:09.810 00:47:16.600 Michael Thorson: We, like, we’ve actually never set up any, like, prefix workflows. That’s all been managed by our analytics partner.

351 00:47:16.600 00:47:17.810 Ashwini Sharma: Oh, okay.

352 00:47:19.360 00:47:23.970 Michael Thorson: Yeah, so eventually, eventually I think the workflow will be deployed to…

353 00:47:24.150 00:47:31.379 Michael Thorson: the prefixed environment, and then run from there. What would you need to make that work? I think that’s where the gap is, and…

354 00:47:31.380 00:47:49.769 Ashwini Sharma: Oh, no, no, I can do that too, right? I mean, I can make the changes, deploy it into the perfect environment that you have provided, and then test it out there, right? But that’s like testing in production, right? And things will break when you are doing the development multiple times, right? And I don’t want to

355 00:47:49.770 00:48:00.210 Ashwini Sharma: go through the entire workflow of you, push a PR, merge it into the main, and then it gets deployed, and then only, you know… So the way that I’m working is

356 00:48:00.240 00:48:07.359 Ashwini Sharma: entirely locally by building a Docker container. I thought I’ll just check with you in case you are following a different procedure that is

357 00:48:07.440 00:48:12.090 Ashwini Sharma: much more efficient than what I’m doing, right? So, alright.

358 00:48:12.090 00:48:23.059 Michael Thorson: Yeah, we’re mostly working, and the same for, like, I’m mostly working in the model as well, like, I’m running anything, like, locally when I do make changes, and then once I’m, like, I validated nothing broke… breaks, then I just…

359 00:48:23.470 00:48:37.279 Michael Thorson: submit a pull request to GitHub, and I think that’s kind of the same for Prefect, where, like, unless you had, like, open recommendations there as well. Like, I haven’t touched Prefect at all, I’ve only worked locally for any ETL, so…

360 00:48:37.670 00:48:44.320 Ashwini Sharma: Okay, cool. All right, no issues, yeah. I’ll continue the way that I’m doing it, right now.

361 00:48:44.480 00:48:45.780 Ashwini Sharma: That’s fine. Yeah.

362 00:48:45.780 00:48:47.289 Michael Thorson: I wish I could help Maureen.

363 00:48:48.300 00:48:57.509 Ashwini Sharma: What else was there? Oh, yeah, yeah, yeah, the… some of the environment variables has to be set into… into the perfect environment.

364 00:48:57.690 00:49:02.580 Ashwini Sharma: Mainly for client ID, client secret, and audience.

365 00:49:05.920 00:49:12.370 Ashwini Sharma: Yeah, that’s… we’re waiting on owner access for Prefect for our account as well.

366 00:49:12.530 00:49:13.700 Michael Thorson: I know.

367 00:49:14.260 00:49:16.040 Ashwini Sharma: Alright, no problem.

368 00:49:16.700 00:49:22.189 Michael Thorson: So I think, yeah, we’ll have to run locally until we can actually… we actually have owner access. I know… I’m sorry we’re in a…

369 00:49:22.730 00:49:25.219 Michael Thorson: Transition point with our current vendor as well.

370 00:49:25.220 00:49:31.169 Ashwini Sharma: Okay, no problem. Alright, cool, I think that’s… that’s all I have right now.

371 00:49:33.510 00:49:43.690 Michael Thorson: Sweet! Yeah, thanks, thanks for bearing with me. It’s all kind of like… I’m kind of swimming in the spins data, and it’s like, I don’t quite know how to best communicate what I’ve learned, but…

372 00:49:44.970 00:49:52.220 Ashwini Sharma: Alright, I’ll update you tomorrow once I have something more concrete, and yeah, alright, cool.

373 00:49:53.060 00:50:02.919 Michael Thorson: Sweet. Yeah, I’ll follow up. I’m gonna send you the schema, and then the profiles, like, YML file, or just, like, the details in there, so you can get access to Redshift writing.

374 00:50:03.410 00:50:04.809 Ashwini Sharma: Sure. Cool.

375 00:50:04.810 00:50:05.530 Michael Thorson: All good?

376 00:50:05.810 00:50:07.609 Ashwini Sharma: Okay, yeah, thank you.

377 00:50:08.610 00:50:11.960 Michael Thorson: Great, yeah, ping me if you have any more questions. Thanks for the… thanks for waiting today.

378 00:50:12.290 00:50:13.120 Michael Thorson: Sure.

379 00:50:13.270 00:50:13.980 Ashwini Sharma: Bye.

380 00:50:14.340 00:50:14.940 Michael Thorson: Peace.