Meeting Title: Annie <> Brian Coaching Session Date: 2025-08-07 Meeting participants: Awaish Kumar, Brian Pei, Annie Yu


WEBVTT

1 00:01:50.660 00:01:51.620 Awaish Kumar: Hello!

2 00:01:52.650 00:01:54.000 Brian Pei: Hey! Good morning!

3 00:01:54.870 00:01:56.330 Awaish Kumar: Good morning. How are you.

4 00:01:57.390 00:01:59.210 Brian Pei: Good! Good! How are you?

5 00:01:59.870 00:02:01.350 Awaish Kumar: Yeah, I think we did as well.

6 00:02:02.500 00:02:05.080 Brian Pei: Awesome. What time zone are you in.

7 00:02:06.310 00:02:07.900 Awaish Kumar: I’m in Pakistan.

8 00:02:09.694 00:02:10.609 Brian Pei: Say that again.

9 00:02:11.330 00:02:12.689 Awaish Kumar: I’m in Pakistan.

10 00:02:13.700 00:02:16.100 Brian Pei: Oh, no way! Wait! What time is it? There.

11 00:02:17.090 00:02:20.430 Awaish Kumar: It’s like 7 pm. Now.

12 00:02:21.480 00:02:23.619 Brian Pei: Wow, that’s late. Okay.

13 00:02:26.630 00:02:27.710 Brian Pei: cool. That’s awesome.

14 00:02:29.180 00:02:32.989 Awaish Kumar: I think Annie will be joining soon.

15 00:02:34.160 00:02:35.110 Brian Pei: No problem.

16 00:02:36.070 00:02:37.530 Awaish Kumar: They can also.

17 00:03:39.910 00:03:40.910 Annie Yu: Hello!

18 00:03:43.400 00:03:44.360 Brian Pei: Good morning!

19 00:03:45.190 00:03:46.690 Annie Yu: Morning, Brian.

20 00:03:47.170 00:03:48.219 Brian Pei: How’s it going.

21 00:03:51.310 00:03:54.600 Annie Yu: Wait. Let me. Okay, turn on video.

22 00:03:54.850 00:03:57.249 Annie Yu: Yeah. Hi, how are you?

23 00:03:57.250 00:03:59.549 Brian Pei: Hi! Nice to meet you. I’m good. How are you.

24 00:04:00.060 00:04:01.643 Annie Yu: Nice to meet you. It’s

25 00:04:02.100 00:04:10.139 Annie Yu: I’m I’m I’m in Portland. So this is not usually like the best time to meet. But, thanks for making the time.

26 00:04:10.370 00:04:12.329 Brian Pei: Yeah, we can change it next time. It’s all good.

27 00:04:12.330 00:04:13.800 Annie Yu: Yeah, that’ll be awesome.

28 00:04:14.640 00:04:19.733 Brian Pei: Yeah, this will just be a first-time thing. It won’t be recurring at this time.

29 00:04:20.829 00:04:23.540 Brian Pei: but I’m in Florida, and

30 00:04:24.370 00:04:36.079 Brian Pei: I don’t usually work from Florida; my family has a condo here. So I’m here on vacation, going back tomorrow. Yeah. And usually I live in Brooklyn.

31 00:04:36.900 00:04:40.830 Annie Yu: Man nice. Where in Florida are you in right now?

32 00:04:41.430 00:04:42.449 Brian Pei: West Palm Beach.

33 00:04:43.270 00:04:51.280 Annie Yu: Okay, I haven’t been, but I’m committed to going to Florida every year for, like, manatees. I love manatees.

34 00:04:53.100 00:04:55.160 Brian Pei: Really just for manatees. That’s cool.

35 00:04:55.160 00:05:01.479 Annie Yu: Yeah, I usually go to Crystal River. I think that’s the only place you can legally swim with them.

36 00:05:02.380 00:05:04.450 Brian Pei: Manatees, okay. So you’ve done this before.

37 00:05:04.860 00:05:07.979 Annie Yu: Yeah, yeah, yeah, I try to do that every year, like, it’s.

38 00:05:07.980 00:05:08.570 Brian Pei: Yeah.

39 00:05:09.140 00:05:09.710 Annie Yu: That’s.

40 00:05:09.710 00:05:11.056 Brian Pei: So interesting.

41 00:05:11.990 00:05:20.660 Annie Yu: Yeah, and New York. I I’ve also been a few times I I went to Rutgers for my master’s, so I got a chance to visit a lot.

42 00:05:21.250 00:05:22.160 Brian Pei: Okay.

43 00:05:22.310 00:05:24.029 Brian Pei: And what did you get? Your master’s in.

44 00:05:24.600 00:05:28.419 Annie Yu: Supply chain analytics. So it was kind of like a combination of

45 00:05:29.194 00:05:34.270 Annie Yu: part of supply chain management and data analytics.

46 00:05:35.050 00:05:39.819 Brian Pei: Gotcha. When did you receive the masters? When did you finish.

47 00:05:40.630 00:05:45.459 Annie Yu: That was back in 2022 end of 2022.

48 00:05:45.460 00:05:47.930 Brian Pei: End of 2022. Okay, yeah.

49 00:05:47.930 00:05:49.529 Annie Yu: It’s been a while.

50 00:05:49.970 00:06:05.660 Brian Pei: So cool. This is going to be a get-to-know-you and your skills, and how I can help. So can you tell me, in the past 3 years, since you finished the master’s, the kind of work that you’ve been doing?

51 00:06:06.280 00:06:26.399 Annie Yu: Yeah, so I joined Nike. I was a summer intern and eventually converted to full time. I joined Nike on, like, a North America marketplace team, where we helped the merchandising and planning team to see, like, which SKUs are productive and which are not.

52 00:06:26.510 00:06:51.399 Annie Yu: And what I used there were mainly, like, Snowflake, Databricks, and Tableau. But back then I was more so like an end user for Tableau. So everything’s already, like, organized in Snowflake. I just have to grab the tables and the fields that I need, and join them into a new table for my visualization and report, and then.

53 00:06:51.400 00:06:52.050 Brian Pei: Okay.

54 00:06:52.864 00:07:09.190 Annie Yu: And we did, like, store and Nike Direct digital side as well. Then later on I was contracted for Microsoft, doing, like, voice of customer analytics.

55 00:07:09.300 00:07:14.460 Annie Yu: So we used a lot of like scraping AI.

56 00:07:14.460 00:07:15.200 Brian Pei: Cool.

57 00:07:15.200 00:07:25.290 Annie Yu: See like the verbatim, and see what customers are saying about the products and the website. So that was like a brief time. I was

58 00:07:25.750 00:07:29.410 Annie Yu: stepping in for, like, a full-timer, and she eventually came back.

59 00:07:30.041 00:07:37.740 Annie Yu: And then I landed here but before Masters I was doing more so like marketing

60 00:07:38.400 00:07:42.960 Annie Yu: content, like, SEO kind of thing, and also did some

61 00:07:43.170 00:07:57.830 Annie Yu: digital commerce and analytics not so much like event based stuff, but more so like around the whole customer journey, like from awareness, to acquisition, retention.

62 00:07:59.260 00:08:03.179 Brian Pei: Got it. Did you like what you were doing at Microsoft? Was it fun.

63 00:08:04.320 00:08:09.229 Annie Yu: I think it was fun in terms of cause. I think

64 00:08:09.340 00:08:18.990 Annie Yu: analytics is always like part of quantitative versus qualitative, and there was a lot of like qualitative stuff. That’s actually

65 00:08:19.250 00:08:29.480 Annie Yu: something that I never really saw a lot in my past lives. So it was pretty cool. But I remember I also thought it was not technical enough.

66 00:08:29.910 00:08:38.199 Annie Yu: because there was, like, a developed AI tool that we could already leverage, so not so much chance to do, like,

67 00:08:39.730 00:08:41.309 Annie Yu: querying stuff.

68 00:08:41.789 00:08:47.469 Annie Yu: And the visualization part was also like, not super advanced.

69 00:08:47.610 00:08:54.489 Annie Yu: So yeah, it was. It was really a cool experience, but not something that I would want to do like long term.

70 00:08:55.830 00:09:01.039 Brian Pei: Gotcha. Okay, that’s a lot of context. Alright, let’s see.

71 00:09:02.540 00:09:04.600 Brian Pei: So do you.

72 00:09:07.620 00:09:09.020 Brian Pei: How’s your SQL?

73 00:09:10.130 00:09:12.229 Brian Pei: So would you say, it’s really good.

74 00:09:13.675 00:09:17.730 Annie Yu: I would say not.

75 00:09:17.920 00:09:20.740 Annie Yu: Well, I think.

76 00:09:21.130 00:09:27.189 Annie Yu: in terms of querying to answer a question. It’s alright, it’s it’s not. It’s not bad at all. But then.

77 00:09:27.190 00:09:27.540 Brian Pei: Yeah.

78 00:09:27.540 00:09:34.030 Annie Yu: I’m not familiar with all the syntaxes when it comes to like data modeling.

79 00:09:35.130 00:09:45.060 Annie Yu: If you really wanna build a a comprehensive model with lots of different tables. That’s where that’s where I’m not the most skilled at.

80 00:09:45.460 00:09:49.329 Brian Pei: Got it. Okay, that’s really good to know. Cool.

81 00:09:49.330 00:10:00.439 Annie Yu: Yeah, like, I’m not really familiar with that like area, that kind of thing. So like window functions are probably my weakest at this point, like, I usually.

82 00:10:00.440 00:10:01.450 Brian Pei: Oh, yeah.

83 00:10:01.450 00:10:08.790 Annie Yu: Row number but other than that, it’s usually something I have to ask AI or Google about.

84 00:10:08.790 00:10:11.962 Brian Pei: That’s fine. I mean, AI just can answer anything now.

85 00:10:13.210 00:10:18.580 Brian Pei: I sometimes still use ROW_NUMBER too. I use ROW_NUMBER to just remove duplicates.
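
For reference, the deduplication pattern mentioned here can be sketched in Python. This is only a minimal illustration, not Snowflake code: the `rows` data and the `dedupe_latest` helper are hypothetical, and the function emulates keeping the row where `ROW_NUMBER() OVER (PARTITION BY key ORDER BY order_by DESC)` equals 1.

```python
from itertools import groupby
from operator import itemgetter


def dedupe_latest(rows, key, order_by):
    """Keep one row per `key`: the one with the largest `order_by` value.

    Emulates filtering on ROW_NUMBER() OVER (PARTITION BY key
    ORDER BY order_by DESC) = 1.
    """
    # Newest rows first, then a stable sort by the partition key,
    # so each partition still has its newest row at the front.
    ordered = sorted(rows, key=itemgetter(order_by), reverse=True)
    ordered = sorted(ordered, key=itemgetter(key))
    # The first row of each partition is "row number 1".
    return [next(group) for _, group in groupby(ordered, key=itemgetter(key))]


# Hypothetical data: invoice 1 was loaded twice.
rows = [
    {"invoice_id": 1, "amount": 80, "updated_at": "2025-08-01"},
    {"invoice_id": 1, "amount": 85, "updated_at": "2025-08-03"},
    {"invoice_id": 2, "amount": 40, "updated_at": "2025-08-02"},
]
deduped = dedupe_latest(rows, key="invoice_id", order_by="updated_at")
```

Here `deduped` holds one row per invoice, and for invoice 1 it is the most recently updated copy.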

86 00:10:19.070 00:10:19.690 Annie Yu: Yeah.

87 00:10:20.040 00:10:26.569 Brian Pei: And Snowflake has QUALIFY statements now. So you don’t have to do, like, the

88 00:10:26.860 00:10:38.685 Brian Pei: subquery where row number equals one, blah blah. Okay, cool. I will give you a rundown of my experience. And then

89 00:10:39.770 00:10:46.920 Brian Pei: this sort of almost feels like an interview. But it’s not so. There’s no pressure so very casual, and figure out

90 00:10:47.640 00:10:49.820 Brian Pei: how I can help in the future. So

91 00:10:50.510 00:10:58.080 Brian Pei: let’s see. So I graduated in 2013, doing Math and Econ.

92 00:10:58.270 00:11:00.630 Brian Pei: And then, after that.

93 00:11:01.390 00:11:07.639 Brian Pei: 2013. Yeah. So I’ve been doing data for like 12 years in a whole bunch of different

94 00:11:07.830 00:11:10.740 Brian Pei: ways. I started in consulting.

95 00:11:10.880 00:11:17.749 Brian Pei: and I moved over to business intelligence consulting, which is basically just data modeling, consulting. There’s like

96 00:11:18.270 00:11:30.700 Brian Pei: 20 different names for it: database administrator, AI architect, data architect, data modeler,

97 00:11:31.130 00:11:34.969 Brian Pei: analytics engineer. They’re all the same thing to me.

98 00:11:35.070 00:11:40.540 Brian Pei: You’re sourcing data somehow and building a

99 00:11:41.110 00:11:54.235 Brian Pei: optimized model. It’s like the most important thing for analysts and data scientists and people on finance to report on numbers accurately, which also includes orchestration

100 00:11:54.980 00:12:02.019 Brian Pei: orchestration and testing so figuring out how often to update data for a team

101 00:12:02.140 00:12:06.649 Brian Pei: so like, you know, run this every hour or every day, or run this every 3 days. That kind of stuff

102 00:12:07.704 00:12:13.295 Brian Pei: it was harder to do back then. Now it’s simpler, because there’s so many tools that you can use to do it.

103 00:12:13.690 00:12:18.740 Brian Pei: but depending on a client’s orchestration tool like

104 00:12:19.150 00:12:38.940 Brian Pei: Airflow or something. It doesn’t matter. It requires either a little bit of extra Python and understanding, like, cron schedules. That’s all pretty easy to pick up. I would say it’s not that bad. The intuition comes in for deciding what needs to run first, second, third, and fourth,

105 00:12:39.070 00:12:41.589 Brian Pei: because you don’t want to run

106 00:12:42.460 00:12:45.180 Brian Pei: a profit and loss report before

107 00:12:45.420 00:12:53.519 Brian Pei: the invoices table gets built. It’s just stuff like that. So, understanding what order you need to build things in to get the data to actually work.
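
The run-order idea above can be sketched with Python's standard-library `graphlib`. This is a minimal sketch, not how any particular orchestrator is implemented; the table names and the `dependencies` mapping are hypothetical.

```python
from graphlib import TopologicalSorter

# Hypothetical pipeline: each table maps to the tables it depends on.
dependencies = {
    "invoices": {"raw_payments"},
    "profit_and_loss_report": {"invoices"},
    "raw_payments": set(),
}

# static_order() yields every table only after all of its dependencies.
run_order = list(TopologicalSorter(dependencies).static_order())
```

With this ordering, the profit and loss report is never built before the invoices table it reads from.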

108 00:12:54.430 00:12:59.190 Brian Pei: In the past, before there were DAGs,

109 00:12:59.640 00:13:07.310 Brian Pei: directed, like, acyclic something, I don’t remember the word for it, we kind of just ran everything

110 00:13:07.410 00:13:15.060 Brian Pei: multiple times a day, and nothing talked to each other, and it was all crazy. So like invoices runs randomly.

111 00:13:15.280 00:13:20.140 Brian Pei: reporting aggregate tables run randomly, and then at some point in the future they’ll sync up.

112 00:13:20.280 00:13:27.200 Brian Pei: But that was dumb. There weren’t a lot of the tools that we have now, or AI or

113 00:13:27.800 00:13:31.210 Brian Pei: anything when I first started building

114 00:13:31.690 00:13:35.190 Brian Pei: orchestration and doing data models. So it’s nice that we have

115 00:13:35.340 00:13:42.450 Brian Pei: more organization now. And it’s nice that data architects and even business analysts and data analysts

116 00:13:43.090 00:13:51.770 Brian Pei: work more as software engineers now. So, like, I also have seen the evolution of data teams

117 00:13:52.070 00:13:53.820 Brian Pei: moving into

118 00:13:54.180 00:14:14.170 Brian Pei: software engineering protocols where you have sprints and you have Jira tickets and you have to do unit testing in Dev before you promote a data model into production. That’s all stuff that software engineers do. And data used to never have that. And now it does. And it’s really really good that

119 00:14:14.690 00:14:24.750 Brian Pei: we’re treated in the same vein as software engineers. And we use the same tools as software engineers, even though I don’t know how to, you know, be a real software engineer. But you know we do the same things which is really nice.

120 00:14:25.500 00:14:32.569 Brian Pei: So after consulting, I was a data analyst and like a business intelligence analyst.

121 00:14:32.770 00:14:45.560 Brian Pei: but I did a lot of modeling for startups. So I worked in health tech. I was in DC, and then I worked in digital marketing for a year and I hated it. So I left.

122 00:14:45.640 00:14:53.859 Brian Pei: and then, in 2016, I did 3 years at WeWork, which is then where I

123 00:14:53.880 00:15:18.400 Brian Pei: met Utam. I’m a little bit older than Utam, so we both went to Bucknell, but I didn’t know him at Bucknell. I think he was a freshman when I was a senior, I want to say. So I didn’t know him before I graduated. He sent me an email when I was at WeWork, like an alumni email, like, hey, I saw that, you know, you work at WeWork. I’m looking for a job.

124 00:15:19.400 00:15:20.310 Brian Pei: So

125 00:15:20.510 00:15:35.630 Brian Pei: yeah, we met randomly through that email. I got him a job interview. My team really liked him, and so we hired him. I interviewed him, and we hired him on my team at WeWork for his first job out of school, and he did great.

126 00:15:36.213 00:15:38.689 Brian Pei: he was like one of the best

127 00:15:38.930 00:15:43.350 Brian Pei: data people I’ve worked with. And he had just graduated college at the time. So that was awesome.

128 00:15:43.920 00:15:46.330 Brian Pei: So I worked with Utam at WeWork for

129 00:15:46.820 00:15:50.440 Brian Pei: 2, maybe 3 years, and then we just kept in contact.

130 00:15:50.510 00:16:17.060 Brian Pei: So since then he did some contracting and then started Brainforge, obviously. And for me, I’ve been supporting him. Sometimes I do projects. In the beginning, when he didn’t have a team, I did a couple of contracting projects for him while I was also working, and for the past 4 years I’ve been at Spotify. So I am still at Spotify, and I

131 00:16:17.160 00:16:21.100 Brian Pei: help out when I have the time. But now I’m since

132 00:16:21.370 00:16:27.669 Brian Pei: Brainforge has been growing so rapidly and getting so many clients and doing so many cool things, I’ve

133 00:16:28.070 00:16:35.702 Brian Pei: stepped in to advise and potentially be part of Brainforge in the future.

134 00:16:36.570 00:16:41.600 Brian Pei: Between Spotify and WeWork, I did, like, 7 data contracts,

135 00:16:42.020 00:16:44.810 Brian Pei: small medium and large companies, and

136 00:16:45.010 00:16:49.899 Brian Pei: all of them I did one with Utam. I did a company called Athletic Greens with Utam.

137 00:16:50.260 00:16:57.110 Brian Pei: and then I helped out with a company called Spark, who’s the parent company for

138 00:16:57.300 00:17:02.700 Brian Pei: Adidas and Reebok, so close to Nike, Adidas Reebok?

139 00:17:02.820 00:17:09.690 Brian Pei: What else do they have? Random: Forever 21 and Eddie Bauer.

140 00:17:10.069 00:17:11.379 Brian Pei: what was the one that I?

141 00:17:12.480 00:17:17.940 Brian Pei: There’s something else. Nautica, Nautica. So we had to go and

142 00:17:18.069 00:17:26.449 Brian Pei: build a new database where we pulled, I don’t know, inventory data, invoices data. All of these clothing brands

143 00:17:26.650 00:17:27.869 Brian Pei: started

144 00:17:28.089 00:17:42.309 Brian Pei: like, they did their own thing for so long, right? So, like, you have a company that’s maybe on Oracle and a company on Microsoft SQL Server, a company on Redshift, a company that’s just doing Google Sheets. And our job was to, quote unquote, wrangle the data and put it all into Snowflake, but also make it match

145 00:17:43.450 00:17:44.410 Brian Pei: and

146 00:17:44.630 00:18:04.560 Brian Pei: at a high level, that’s data modeling, right? It’s like, this company has invoices, or even easier, like, this company has SKUs, but their SKUs are by color and size. And this other company has SKUs, but the colors

147 00:18:04.710 00:18:05.830 Brian Pei: are

148 00:18:06.190 00:18:35.230 Brian Pei: in one row and delimited by a comma. So it just says red, green, blue, yellow, and so if you union those together, it doesn’t make any sense, because if you group by red, one company will say red, and the other company, it says red, blue, green, yellow, and so you need to parse the data to get it to work. So, whatever, parse red, green, blue, yellow out, cross join, make multiple rows, make it so that when you group by the SKU it makes sense at a very simple level. That

149 00:18:35.330 00:18:43.520 Brian Pei: is like, you know, there’s like a hundred of those use cases for quote, unquote, optimizing a

150 00:18:43.870 00:18:45.120 Brian Pei: what I call like

151 00:18:45.510 00:19:00.160 Brian Pei: a master table. So if you think about what I just mentioned at Spark, if they have Adidas Reebok and Nautica and Brooks Brothers, and they want to see sales. The executive team wants to see sales and revenue across all of them

152 00:19:00.460 00:19:16.079 Brian Pei: split by SKU and split by state. And then, you know, Reebok has New York as NY, and Adidas has New York as New York, all strings. It’s just hundreds of little mini problems of, how do I get all of it to fit

153 00:19:16.610 00:19:30.710 Brian Pei: perfectly into this like big puzzle piece where the cleaning happens, the storage happens, the unions happen and it keeps the integrity of all of their data for an executive team.
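
The state-name mismatch described above can be sketched as a small cleaning step before the union. This is a minimal sketch under stated assumptions: the `STATE_CODES` mapping, the `normalize_state` helper, and the two brands' sample rows are all hypothetical.

```python
from collections import defaultdict

# Hypothetical mapping from each brand's spelling to one canonical state code.
STATE_CODES = {"ny": "NY", "new york": "NY", "fl": "FL", "florida": "FL"}


def normalize_state(raw):
    """Return the canonical two-letter code, or raise on an unknown value."""
    cleaned = raw.strip().lower()
    if cleaned not in STATE_CODES:
        raise ValueError(f"unmapped state value: {raw!r}")
    return STATE_CODES[cleaned]


# Hypothetical sales rows from two brands with different conventions.
brand_a_sales = [{"state": "NY", "revenue": 100}, {"state": "FL", "revenue": 50}]
brand_b_sales = [{"state": "New York", "revenue": 70}]

# "Union" the two sources, then group by the cleaned state.
revenue_by_state = defaultdict(int)
for row in brand_a_sales + brand_b_sales:
    revenue_by_state[normalize_state(row["state"])] += row["revenue"]
```

Raising on an unmapped value is one possible design choice: it surfaces a new spelling immediately instead of letting it silently form its own group in the report.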

154 00:19:31.460 00:19:45.630 Brian Pei: That’s like 90% of data architecture and data modeling. And it’s fun for me, because every company that I go to has different problems like that to solve. And there are also good ways and bad ways to do that.

155 00:19:45.750 00:19:56.400 Brian Pei: I’m just pulling examples out of my brain. So if I go back to the, you know, this shirt SKU is red, comma, blue, comma, yellow, comma, green,

156 00:19:56.700 00:19:59.280 Brian Pei: there are right ways and wrong ways to

157 00:19:59.930 00:20:03.770 Brian Pei: parse that out into 4 rows instead of one row.
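
One way to parse a comma-delimited color list into one row per color can be sketched as follows. This is a minimal sketch, not a claim about which approach performs best in a warehouse; the `explode_colors` helper and the sample SKU rows are hypothetical.

```python
def explode_colors(rows, column="colors", delimiter=","):
    """Turn one row holding 'red, green' into one row per color."""
    exploded = []
    for row in rows:
        for color in row[column].split(delimiter):
            color = color.strip().lower()
            if color:  # skip empty pieces such as a trailing comma
                new_row = {k: v for k, v in row.items() if k != column}
                new_row["color"] = color
                exploded.append(new_row)
    return exploded


# Hypothetical SKU rows: one source stores a single color, the other a list.
rows = [
    {"sku": "SHIRT-1", "colors": "red"},
    {"sku": "SHIRT-2", "colors": "red, green, blue, yellow"},
]
exploded = explode_colors(rows)
```

Because the function splits on the delimiter rather than on a fixed list of known colors, a color value it has never seen would still produce its own row without a code change.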

158 00:20:04.000 00:20:13.400 Brian Pei: They probably all work. But you have to then think about query optimization, or, like, runtime, where, if you,

159 00:20:13.930 00:20:15.840 Brian Pei: if you do like a

160 00:20:18.070 00:20:29.109 Brian Pei: the, uh, unnesting, LISTAGG, and you say the delimiter is a comma, and then it splits it out. It’s like, is that better than doing regex? Is it better than doing

161 00:20:29.390 00:20:43.530 Brian Pei: cross joining it on itself? Is it better to make another lookup table where you manually put in red, green, blue, yellow as 4 different rows, and you save that lookup table, and you left join it into

162 00:20:43.630 00:21:01.800 Brian Pei: the SKU table, and you do a left join or an outer join? It’s like all these different things. And I say this all to say that even for those simple problems there’s a hundred different ways to do it, and it is also at least fun for me to figure out what the best way to do it is, especially considering,

163 00:21:02.390 00:21:11.159 Brian Pei: or at least for me, I think really good data architects consider the future. So, I’ve seen a lot of solutions built for the data that they have now,

164 00:21:11.270 00:21:23.340 Brian Pei: but you have to consider, like, hey, what happens in a year? If they add teal as a color, will my solution hold up, or is it built to handle additions to this SKU in the future?

165 00:21:23.970 00:21:32.750 Brian Pei: all fun things that I’ve done before, and they’re really fun, and I enjoy it. It feels like I was a math Major. So I like these little puzzles.

166 00:21:34.770 00:21:43.049 Brian Pei: At Spotify, I do very similar stuff. I’m an analytics engineer. I do data modeling.

167 00:21:43.660 00:21:45.160 Brian Pei: I do.

168 00:21:45.480 00:21:52.120 Brian Pei: I guess, from all the way from raw data.

169 00:21:52.880 00:21:55.359 Brian Pei: there’s many different definitions for raw data.

170 00:21:55.780 00:22:06.050 Brian Pei: Raw data could be somebody’s Google Sheet where they just have typed in stuff, and it could be an API, and it could be scraping, like you said. It could be all these things, but we put everything together into BigQuery,

171 00:22:06.340 00:22:09.870 Brian Pei: and we do transformation with dbt.

172 00:22:10.150 00:22:14.109 Brian Pei: And we have a homegrown tool for orchestration.

173 00:22:14.790 00:22:22.630 Brian Pei: So with all that being said, I’m curious.

174 00:22:23.551 00:22:30.809 Brian Pei: This might sound like an interview question, but it’s not. Can you describe to me, like,

175 00:22:33.960 00:22:39.330 Brian Pei: let’s say I’m a small company, and I use, do you know Workday?

176 00:22:40.990 00:22:41.710 Annie Yu: Yes.

177 00:22:42.080 00:22:42.470 Brian Pei: Okay.

178 00:22:42.470 00:22:45.869 Annie Yu: The HR management system, kind of.

179 00:22:45.870 00:22:51.479 Brian Pei: Yeah, we actually do invoices. Now, too, let’s take that out. Or do you know

180 00:22:52.740 00:23:03.419 Brian Pei: let’s just do PayPal. Let’s say I’m a small company, and I use PayPal to charge customers to do their laundry, and I want, like, I want a report

181 00:23:03.690 00:23:12.060 Brian Pei: every month that tells me my monthly revenue. But I have like no data team. Can you tell me like how you would

182 00:23:12.450 00:23:28.999 Brian Pei: start, like, what tools you would use, what you would do if I, obviously, you know, gave you my PayPal API key, what you would do. And I have an unlimited budget for data tools. What you would do to get this data

183 00:23:29.190 00:23:34.809 Brian Pei: into a report like end to end? If you could describe that to me. So I kind of can gauge.

184 00:23:35.030 00:23:35.530 Brian Pei: You’re.

185 00:23:35.530 00:23:44.940 Annie Yu: So from the very upstream, like from the API, how it gets moved to, like, the data warehouse, that’s where I, like, don’t have

186 00:23:45.130 00:23:52.430 Annie Yu: much knowledge about. So I don’t know. I know that people do something. And then ingest the data. So.

187 00:23:52.430 00:23:53.689 Brian Pei: You can guess, yeah, yeah.

188 00:23:53.690 00:23:56.339 Annie Yu: But yeah, I never

189 00:23:56.610 00:24:10.989 Annie Yu: like, really, I’m, like, comfortable with, like, APIs or things of that sort. So I don’t know how it gets moved. But I guess that’s something that I would, like, I will probably have to learn.

190 00:24:11.900 00:24:17.680 Annie Yu: Sure, yeah, do the so it’s revenue right?

191 00:24:18.370 00:24:18.970 Brian Pei: Yeah.

192 00:24:19.120 00:24:27.070 Annie Yu: So yeah, I guess after ingestion, then so it’s already in the warehouse. Then we can do the transformation.

193 00:24:28.070 00:24:31.389 Annie Yu: And am I like oversimplifying this.

194 00:24:31.580 00:24:37.959 Annie Yu: So I think that’s the end goal and then just make it clean for the end user to use.

195 00:24:39.505 00:24:43.730 Brian Pei: Not an interview question. So it’s it’s all good you can. You can

196 00:24:43.850 00:24:46.440 Brian Pei: as much depth as as you want.

197 00:24:46.750 00:24:51.539 Annie Yu: Yeah, I would say, I’m I’m not really like, clear on, like

198 00:24:51.860 00:24:58.760 Annie Yu: what you just said, like orchestration and pipeline tools, or.

199 00:24:59.020 00:25:02.210 Annie Yu: okay, CI/CD, whatever that means,

200 00:25:03.250 00:25:06.690 Annie Yu: those are, like, still buzzwords for me.

201 00:25:07.120 00:25:09.360 Brian Pei: Cool. This is perfect. I can

202 00:25:09.510 00:25:15.579 Brian Pei: basically tell you all that stuff. Now, okay, are you taking notes.

203 00:25:16.784 00:25:17.489 Annie Yu: Yes.

204 00:25:17.910 00:25:25.019 Brian Pei: Okay, cool. So let’s take that example. When you were, I’ll start

205 00:25:25.410 00:25:32.879 Brian Pei: small when you were during your master’s program, or or any anything. Did you ever have to write a custom

206 00:25:33.020 00:25:39.409 Brian Pei: Python script or JavaScript to ping an API, and, like, just get data from an API?

207 00:25:39.660 00:25:58.449 Annie Yu: I remember we did one exercise that was doing, like, Twitter. In one course, I think we, yeah, we did. We got the API. I forgot what we did. But, like, there is, like, a token key. And then we ran some script. And eventually we could do some analysis using that data.

208 00:25:58.900 00:26:15.780 Brian Pei: Okay, cool. Let’s talk about that first step. So that first step is called extraction. I think you said the word. But data extraction is just moving data into a data warehouse, right? So that first step is, you have all these data sources,

209 00:26:16.050 00:26:24.209 Brian Pei: Salesforce, PayPal, whatever, and if it’s a good application, they’ll have an API.

210 00:26:24.350 00:26:27.319 Brian Pei: If it’s a bad application, you might have to scrape it yourself.

211 00:26:27.820 00:26:39.040 Brian Pei: If it’s a Google Sheet, there are Google Sheets APIs. Utam and Brainforge use enterprise, cloud-based

212 00:26:39.150 00:26:44.489 Brian Pei: tools to kind of do that, so you don’t have to write a custom Python script,

213 00:26:46.380 00:26:50.840 Brian Pei: but sometimes you still have to write a little bit of custom, Python, but it’s like super easy.

214 00:26:51.030 00:26:55.319 Brian Pei: So one of those tools that Utam likes to use is called Fivetran.

215 00:26:55.530 00:27:01.989 Brian Pei: So Fivetran is an extraction tool, extraction, meaning

216 00:27:02.500 00:27:06.109 Brian Pei: all these sources that don’t talk to each other.

217 00:27:06.650 00:27:09.330 Brian Pei: Get them into a data warehouse like you said.

218 00:27:09.620 00:27:15.700 Brian Pei: and a data warehouse or a database. So that’s kind of step one.

219 00:27:15.950 00:27:20.689 Brian Pei: The data warehouse can be Snowflake. It can be Redshift. It can be whatever the client uses.

220 00:27:20.950 00:27:27.709 Brian Pei: It’s all basically the same, though you know it’s just a place where everything can go into one place.

221 00:27:29.150 00:27:35.709 Brian Pei: So Fivetran and other tools that Utam uses have pretty clean UIs where you click a button

222 00:27:36.050 00:27:40.339 Brian Pei: and you click PayPal, you type in the API key.

223 00:27:41.054 00:27:44.740 Brian Pei: And it will do 2 things

224 00:27:45.070 00:27:47.790 Brian Pei: if I’ve been using Paypal for many, many years.

225 00:27:47.940 00:27:50.020 Brian Pei: There’s going to be a full sync first.

226 00:27:50.200 00:27:53.689 Brian Pei: So if I run that full sync today.

227 00:27:54.300 00:28:04.800 Brian Pei: it’ll sync all of their PayPal information into, let’s say, Snowflake, for this example, one time, right? So now you have all of my invoices.

228 00:28:06.430 00:28:10.019 Brian Pei: But tomorrow new invoices come in.

229 00:28:10.490 00:28:14.769 Brian Pei: And so if you just full sync it once, it’s not going to update every day.

230 00:28:15.040 00:28:18.399 Brian Pei: So that’s orchestration. Orchestration just means.

231 00:28:18.650 00:28:20.869 Brian Pei: Now that I have this API set up,

232 00:28:21.390 00:28:28.159 Brian Pei: I need it to run on a schedule automatically. So I don’t have to log into the computer and click full sync every single day by myself.
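
The "full sync once, then keep it updated on a schedule" idea can be sketched in a few lines. This is a minimal sketch and not any vendor's API. It adds one assumption that is not stated in the conversation: that each scheduled run copies only rows created after the previous run (an incremental sync tracked with a cursor). The `sync` function and the sample rows are hypothetical.

```python
def sync(source_rows, warehouse_rows, cursor=None):
    """Copy rows newer than `cursor` into the warehouse; return the new cursor.

    With cursor=None this behaves like a one-time full sync.
    """
    new_rows = [r for r in source_rows if cursor is None or r["created"] > cursor]
    warehouse_rows.extend(new_rows)
    if new_rows:
        cursor = max(r["created"] for r in new_rows)
    return cursor


# Hypothetical source system and an empty warehouse table.
source = [
    {"invoice_id": 1, "created": "2025-08-01"},
    {"invoice_id": 2, "created": "2025-08-02"},
]
warehouse = []

cursor = sync(source, warehouse)          # day 1: full sync of all history
source.append({"invoice_id": 3, "created": "2025-08-03"})
cursor = sync(source, warehouse, cursor)  # day 2: scheduled run picks up only new rows
```

In a real setup the second call would be triggered by the scheduler rather than by hand, which is the part the word "orchestration" refers to here.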

233 00:28:29.155 00:28:38.090 Brian Pei: There’s multiple types of orchestration. And there’s also orchestration downstream during like

234 00:28:38.210 00:28:49.920 Brian Pei: transformation and reporting stuff. So like orchestration happens in extraction, in transform in reporting, but for the purposes of extraction, orchestration will say.

235 00:28:50.630 00:29:00.269 Brian Pei: I’ll go to the client, and I’ll just be like, Hey, there’s a lot of data here. How often would you like your invoices data to be refreshed

236 00:29:00.480 00:29:08.975 Brian Pei: because the end user might want to see a report that is just like every month he checks the report. Okay? Well, we can. Then we can

237 00:29:10.390 00:29:12.669 Brian Pei: orchestrate it to run once a month.

238 00:29:12.940 00:29:16.280 Brian Pei: It will also be cheaper, because you’re only doing it once a month.

239 00:29:17.000 00:29:19.740 Brian Pei: The higher the cadence of orchestration.

240 00:29:19.970 00:29:24.310 Brian Pei: probably the more expensive it’ll be in the sense that

241 00:29:25.400 00:29:32.050 Brian Pei: now, if he wants to see sales every day, then I have to change my orchestration schedule for Paypal to run every day.

242 00:29:32.210 00:29:39.959 Brian Pei: or if they want real time, which a lot of people say they want. But they don’t really use. That’s like, you know, every minute.

243 00:29:40.210 00:29:47.280 Brian Pei: And the reason that it’ll cost more is because a tool like Fivetran is going to have to ping

244 00:29:47.570 00:30:00.070 Brian Pei: the API every single minute and see if there’s new data, and that just costs more. But it’s up to the client. So you should always ask: at what cadence do you want this raw data?
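
To make the cadence trade-off concrete, here is a small piece of arithmetic on how many scheduled runs each refresh cadence implies. It is a rough sketch only: a 30-day month is a simplifying assumption, and actual pricing depends on the tool's own billing model, which is not described in this conversation.

```python
# Number of scheduled runs in one month for each refresh cadence.
DAYS_PER_MONTH = 30  # simplifying assumption

runs_per_month = {
    "monthly": 1,
    "daily": DAYS_PER_MONTH,
    "hourly": DAYS_PER_MONTH * 24,
    "every_minute": DAYS_PER_MONTH * 24 * 60,
}
```

Going from a daily refresh to an every-minute refresh multiplies the number of API checks by 1,440, which is why it is worth asking whether the end user really needs it.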

245 00:30:00.840 00:30:06.890 Brian Pei: So let’s say we have PayPal, we have Salesforce, we have an Excel sheet or a Google Sheet, or whatever.

246 00:30:07.370 00:30:25.049 Brian Pei: So you set up the extraction part, one time full sync, and then you figure out what schedule they want the data on, and maybe what time, right? Because it might be better to upload the data at 4 a.m. so that when they come into work at 7 a.m. the data is already there. So you also have to pick a time,

247 00:30:25.330 00:30:29.169 Brian Pei: and also time zones are a thing when you do orchestration.

248 00:30:29.710 00:30:37.210 Brian Pei: Some clients don’t care about it, and some clients are like, oh, I’m in PST, and my friend works from Australia. Can we get it in

249 00:30:37.770 00:30:44.830 Brian Pei: UTC or GMT, like, the universal time thing? And then you have to, like, do time zone conversions, and that’s all, whatever. It’s easy.
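
A time zone conversion of the kind mentioned here can be sketched with Python's `datetime` module. This is a minimal sketch: the run date and the fixed UTC-7 offset for Pacific Daylight Time are assumptions made for illustration, and a real pipeline would more likely use an IANA time zone database so daylight saving changes are handled automatically.

```python
from datetime import datetime, timedelta, timezone

# Assumption: Pacific Daylight Time is UTC-7 on this (hypothetical) run date.
PDT = timezone(timedelta(hours=-7), "PDT")

local_run = datetime(2025, 8, 7, 4, 0, tzinfo=PDT)  # 4 a.m. in the client's zone
utc_run = local_run.astimezone(timezone.utc)        # the same instant in UTC
```

Storing and scheduling in UTC, then converting only for display, is one common way to keep teammates in different zones looking at the same instant.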

250 00:30:45.560 00:31:02.180 Brian Pei: So now it runs once a month, or sorry, now it runs every day, and it’s in Snowflake, but it isn’t in a consolidated database, in the sense that you usually then have a custom database in Snowflake that has, like,

251 00:31:02.980 00:31:07.119 Brian Pei: transaction tables and fact tables and dimension tables and reporting tables.

252 00:31:08.600 00:31:11.440 Brian Pei: That doesn’t exist yet. So that’s the second step.

253 00:31:12.010 00:31:27.770 Brian Pei: Because when you full sync PayPal and Salesforce and stuff into Snowflake, you’re gonna have, like, a PayPal schema, and just a bunch of raw data, right, like invoices and customers and addresses and blah blah blah. And then the same with Salesforce and everything, so,

254 00:31:28.210 00:31:36.160 Brian Pei: and the modeling part comes in there. That’s where you write custom SQL for modeling.

255 00:31:37.336 00:31:40.739 Brian Pei: You have to clean this data up.

256 00:31:40.890 00:31:44.800 Brian Pei: especially, like, if I get revenue from Salesforce and PayPal.

257 00:31:45.360 00:31:48.580 Brian Pei: Outside of the database, Salesforce and PayPal don’t talk to each other,

258 00:31:48.960 00:31:59.449 Brian Pei: but I’m going to get Salesforce invoices, and I’m going to get PayPal invoices. So it’s up to me to stack them together. Much like the example I gave earlier with the

259 00:32:00.580 00:32:04.969 Brian Pei: clothing brands, the structure might be different.

260 00:32:05.765 00:32:09.510 Brian Pei: Salesforce might have them by line item,

261 00:32:09.720 00:32:12.750 Brian Pei: where my order was $80,

262 00:32:12.890 00:32:19.780 Brian Pei: and it was $20 sneakers and a $10 service charge. So that’s 3 rows of data.

263 00:32:20.080 00:32:24.370 Brian Pei: and then PayPal might be at the invoice total level,

264 00:32:24.540 00:32:46.770 Brian Pei: where you don’t have the 3 rows. It’s just all aggregated. So you wouldn’t want to union those together, because one is by item, and one is by order number or fulfillment number. So that’s the intuition, then, of like, okay, well, I can’t union those things together. So I have to aggregate the Salesforce line items first

265 00:32:46.940 00:32:47.910 Brian Pei: to make

266 00:32:48.120 00:32:55.189 Brian Pei: a total invoice number before I can union it to the PayPal invoices, if they want to see all their money.

267 00:32:56.402 00:33:02.029 Brian Pei: That usually all happens with dbt, which Tom likes to use a lot.
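
The aggregate-then-union step described above can be sketched with Python's built-in `sqlite3` module standing in for Snowflake. The table names, column names, and rows are all made up for illustration (the Salesforce rows loosely follow the $80 order example), and a real dbt model would hold only the SQL, not the Python around it.

```python
import sqlite3

# In-memory database as a stand-in for the warehouse (assumption for the sketch).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE salesforce_line_items (invoice_id TEXT, item TEXT, amount REAL);
    CREATE TABLE paypal_invoices (invoice_id TEXT, total REAL);

    -- Hypothetical rows: one Salesforce order split into 3 line items.
    INSERT INTO salesforce_line_items VALUES
        ('SF-1', 'item_a', 50),
        ('SF-1', 'sneakers', 20),
        ('SF-1', 'service charge', 10);

    -- Hypothetical PayPal row, already at the invoice-total grain.
    INSERT INTO paypal_invoices VALUES ('PP-1', 35);
""")

# Roll Salesforce up to one row per invoice first, then stack it with PayPal.
query = """
    SELECT 'salesforce' AS source, invoice_id, SUM(amount) AS total
    FROM salesforce_line_items
    GROUP BY invoice_id
    UNION ALL
    SELECT 'paypal' AS source, invoice_id, total
    FROM paypal_invoices
    ORDER BY source, invoice_id
"""
rows = conn.execute(query).fetchall()
for row in rows:
    print(row)
```

Both halves of the `UNION ALL` are at the same grain (one row per invoice), which is the point of aggregating the line items before stacking.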

268 00:33:02.380 00:33:10.499 Brian Pei: dbt is just enhanced SQL. In the past, without dbt,

269 00:33:11.609 00:33:24.819 Brian Pei: you said you use Snowflake. So, like, I’ve used Snowflake tasks before, where you write some SQL that aggregates data and unions data together, right? And

270 00:33:25.310 00:33:28.820 Brian Pei: it’s all just raw SQL stored in Snowflake

271 00:33:29.000 00:33:35.130 Brian Pei: as a task, and you can like click the task to run but

272 00:33:35.490 00:33:39.580 Brian Pei: it’s not in git. There’s no version control.

273 00:33:39.770 00:33:46.989 Brian Pei: If it’s wrong, it’s harder to figure out what’s wrong. There’s no logs. And so dbt is just

274 00:33:49.400 00:33:51.120 Brian Pei: SQL with

275 00:33:52.460 00:34:02.250 Brian Pei: bells and whistles. It’s like, yeah, what did I say? Advanced SQL. It’s not really a new language. It’s just SQL with a little bit of Jinja and a little bit of Python, where

276 00:34:02.500 00:34:06.250 Brian Pei: it’s in a version-controlled repo on GitHub.

277 00:34:06.932 00:34:15.870 Brian Pei: You can create folders, and you can do all this organization, especially when companies have, like, hundreds of SQL models that have to run,

278 00:34:16.760 00:34:22.660 Brian Pei: you know, at different times. And I would say it’s easy to learn. So

279 00:34:23.239 00:34:31.490 Brian Pei: in the future we could probably do like an actual dbt deep dive, but at a broad level, dbt

280 00:34:31.650 00:34:41.409 Brian Pei: just syncs to Snowflake, sets up a schedule, and dbt treats data models as software.

281 00:34:41.620 00:34:55.500 Brian Pei: So you have metadata related to your models that dbt stores: change logs, regular logs, unit testing,

282 00:34:56.340 00:34:57.969 Brian Pei: regular testing,

283 00:35:00.710 00:35:05.132 Brian Pei: auto-generated lineage, auto-generated DAGs.

284 00:35:06.510 00:35:09.330 Brian Pei: But at the end of the day you’re just doing

285 00:35:10.140 00:35:15.560 Brian Pei: select star from this, select star from this, you union it together, make it a reporting table,

286 00:35:16.740 00:35:21.110 Brian Pei: in the simplest terms. But it can get complicated, like anything can get complicated.

287 00:35:21.910 00:35:28.959 Brian Pei: So that transform step happens before reporting, and it happens in Snowflake using dbt.

288 00:35:29.600 00:35:32.630 Brian Pei: dbt also does orchestration.

289 00:35:32.810 00:35:38.979 Brian Pei: So the orchestration and the extraction layer, with like Fivetran and like PayPal and Salesforce stuff,

290 00:35:39.360 00:35:51.130 Brian Pei: that is separate from the orchestration of dbt, because if my PayPal invoices update, if my raw PayPal invoices data updates this morning at 5 AM,

291 00:35:51.330 00:35:56.836 Brian Pei: and I need to, you know, aggregate that data in a separate Snowflake table,

292 00:35:58.290 00:36:07.779 Brian Pei: then my dbt orchestration needs to run at like 9 AM or 10 AM. So just different schedules that you have to make sure that they

293 00:36:08.590 00:36:11.959 Brian Pei: kind of like sync together, and

294 00:36:12.630 00:36:16.230 Brian Pei: I’m going to keep saying Fivetran for orchestration, but the Fivetran

295 00:36:16.520 00:36:30.259 Brian Pei: orchestration schedule lives in Fivetran and the dbt orchestration level lives in dbt. It’s not one platform, so you have to go to, like, multiple places to figure out what’s running at the same time.
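
Because the two schedules live in different tools, one way to keep them in sync is to write both down in one place and check them. The sketch below is only that, a hand-maintained check: the job names, the dictionary layout, and the function `find_schedule_conflicts` are hypothetical, and it assumes all times are on the same day and in the same time zone.

```python
from datetime import time

# Hypothetical schedules, copied by hand from the two separate tools.
extract_jobs = {
    "paypal_invoices_raw": time(5, 0),  # raw sync, e.g. 5 AM
}
transform_jobs = {
    # model name -> (scheduled run time, raw sources it reads from)
    "invoices_aggregated": (time(9, 0), ["paypal_invoices_raw"]),
}


def find_schedule_conflicts(extract_jobs, transform_jobs):
    """Return (model, source) pairs where the model is scheduled
    at or before the start of the sync for a source it reads."""
    conflicts = []
    for model, (run_at, sources) in transform_jobs.items():
        for source in sources:
            if run_at <= extract_jobs[source]:
                conflicts.append((model, source))
    return conflicts


print(find_schedule_conflicts(extract_jobs, transform_jobs))
```

The check only compares start times; it does not know how long a sync takes, so leaving a gap (such as 5 AM to 9 AM in the example above) is still a judgment call.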

296 00:36:32.000 00:36:36.770 Brian Pei: Cool. So that’s dbt, like, in a very

297 00:36:39.210 00:36:49.390 Brian Pei: uh, what is it called, high level? That’s dbt at a very high level. There’s a lot of things dbt can do, but we use it for transformation scripts.

298 00:36:52.160 00:36:57.509 Brian Pei: Kind of like before dbt, and I used Tableau, like,

299 00:36:57.650 00:37:04.859 Brian Pei: a bunch of really complicated custom SQL logic would live in a Tableau data source, right? Or a Tableau workflow.

300 00:37:05.170 00:37:14.339 Brian Pei: But in Tableau there’s no git repo. Again, there’s no version control.

301 00:37:14.560 00:37:19.669 Brian Pei: You could copy paste the same SQL over and over again for different Tableau reports.

302 00:37:19.810 00:37:24.459 Brian Pei: But if a table changes, or if logic changes,

303 00:37:24.660 00:37:29.490 Brian Pei: you have to change it in Tableau, and you have to remember to change it in like 5 different places.

304 00:37:29.950 00:37:38.149 Brian Pei: This is why we use dbt, because with dbt, you have it all in one place, and then Tableau should just select star from a dbt table.

305 00:37:38.310 00:37:47.319 Brian Pei: And then, if you have a change in dbt, you make it in one place, it’s version controlled, and it should cascade into all the reports if dbt is updated.

306 00:37:49.150 00:37:58.240 Brian Pei: Let’s see. So yeah, so then you use dbt to make like a universal invoices table.

307 00:37:58.360 00:38:02.170 Brian Pei: It has Excel invoices and Salesforce invoices and

308 00:38:02.480 00:38:08.679 Brian Pei: PayPal invoices, and it’s all clean now, and it’s unioned together. And

309 00:38:09.840 00:38:18.600 Brian Pei: then we use that table, fact invoices, whatever you want to call it, for reporting,

310 00:38:18.770 00:38:25.660 Brian Pei: or you can write another table on top of invoices that does monthly aggregations. And you put that into reporting.

311 00:38:26.180 00:38:33.069 Brian Pei: but that’s you know the bread and butter, and the

312 00:38:33.280 00:38:40.039 Brian Pei: most important part probably, is that transformation step, because that’s also where you have to

313 00:38:41.700 00:38:43.960 Brian Pei: make sure that the data is right.

314 00:38:44.370 00:38:49.210 Brian Pei: because you can do all this stuff. But if revenue is wrong, then

315 00:38:49.670 00:38:53.959 Brian Pei: the stakeholder will know it’s wrong. So that’s also where validation happens. Right?

316 00:38:54.170 00:38:58.200 Brian Pei: You need to ask the stakeholder, like, hey, I’m doing all this stuff. Can you give me

317 00:38:58.570 00:39:00.330 Brian Pei: like last year’s,

318 00:39:02.700 00:39:04.679 Brian Pei: last year’s revenue,

319 00:39:06.710 00:39:16.239 Brian Pei: run rate, whatever? Because you already reported that, and when I run my tables I’ll just match the numbers to make sure that they’re

320 00:39:16.900 00:39:21.540 Brian Pei: they’re correct. So there’s a validation step in there, and then

321 00:39:21.680 00:39:23.680 Brian Pei: you put it into reporting, and

322 00:39:23.840 00:39:30.020 Brian Pei: you’re working for Brainforge, so reporting is on the client. So you use Tableau, right? But

323 00:39:30.210 00:39:32.760 Brian Pei: clients use Tableau or Looker or

324 00:39:32.920 00:39:37.140 Brian Pei: pie chartered all these different things for reporting netscape.
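
The validation step described a moment ago, matching the modeled numbers against a figure the stakeholder has already reported, could be sketched roughly like this. The function name `revenue_matches`, the row layout, the sample numbers, and the 0.1% tolerance are all assumptions for illustration, not a description of any particular project.

```python
import math


def revenue_matches(modeled_rows, reported_total, rel_tol=0.001):
    """Sum the modeled invoice totals and compare them with the figure
    the stakeholder already reported, within a relative tolerance."""
    modeled_total = sum(row["total"] for row in modeled_rows)
    return math.isclose(modeled_total, reported_total, rel_tol=rel_tol), modeled_total


# Hypothetical output of the modeled invoices table.
modeled_rows = [
    {"source": "salesforce", "invoice_id": "SF-1", "total": 80.0},
    {"source": "paypal", "invoice_id": "PP-1", "total": 35.0},
]

ok, total = revenue_matches(modeled_rows, reported_total=115.0)
print(ok, total)
```

A mismatch does not say which side is wrong; it is a prompt to go back through the transformation logic or to ask the stakeholder how their number was calculated.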

325 00:39:38.750 00:39:43.709 Brian Pei: I would say that the reporting tool, once you get used to it, is the easiest part.

326 00:39:44.140 00:39:46.210 Brian Pei: Every reporting tool is basically the same.

327 00:39:46.340 00:39:52.429 Brian Pei: If the data in Snowflake looks right, then you could put it in any

328 00:39:52.700 00:39:55.940 Brian Pei: reporting tool. You just have to learn the UI of the reporting tool.

329 00:39:57.670 00:40:10.699 Brian Pei: So before, like when I was just starting out and I only knew how to use Looker, I got nervous when a client would be like, well, we use, let’s say, Tableau. And then I started using Tableau, and I realized it’s all the same.

330 00:40:11.200 00:40:16.660 Brian Pei: If the table in Snowflake is correct, then every

331 00:40:16.880 00:40:26.806 Brian Pei: business intelligence tool, every reporting tool that’s ever been made, it all does the same thing. It’s just like select star from this table.

332 00:40:27.400 00:40:39.299 Brian Pei: And then you just have to figure out what buttons to press to do a bar chart, or make it a report, conditional formatting, all that UI stuff. You just have to learn the reporting tool, and it’s, like, pretty easy.

333 00:40:40.872 00:40:43.840 Brian Pei: What am I missing?

334 00:40:45.730 00:40:54.410 Brian Pei: I think that’s... wow, I’ve been talking for like 25 min. Yeah, that’s high level. I think of, like, those buzzwords, and like what I do.

335 00:40:55.360 00:40:57.739 Brian Pei: Do you have any questions about anything that I just said?

336 00:40:59.087 00:41:08.539 Annie Yu: Yeah, I have a few. So I’ve taken some good notes. And one thing, though, I want to bring up is Rill Data. So what’s that? I think

337 00:41:09.000 00:41:38.419 Annie Yu: because I have to do some, like, dashboards with Rill Data. But then I realized, okay, I myself have to do the transformation there, and also, like, YAML code. That was the only time that I had to start writing YAML code. So those are the things that I never really knew I was expected to do. But then I don’t know which part of the things that you just mentioned that tool, or whatever happens in that tool,

338 00:41:38.690 00:41:40.200 Annie Yu: belongs to.

339 00:41:41.854 00:41:45.130 Brian Pei: Can you explain what you mean by YAML coding?

340 00:41:46.145 00:41:52.819 Annie Yu: So in Rill Data, I had to write, like, source files.

341 00:41:52.940 00:42:13.790 Annie Yu: And then there’s like a source section, and there’s a model section, there’s metrics, and then eventually dashboard. So for source I will use, like, maybe a connector, select a table from Snowflake, and then have to do the transformation in models.

342 00:42:14.180 00:42:16.970 Annie Yu: And then I have to set up metrics

343 00:42:17.130 00:42:20.480 Annie Yu: using YAML in the metrics section.

344 00:42:20.980 00:42:21.980 Annie Yu: So that’s cool.

345 00:42:22.540 00:42:27.049 Annie Yu: And then eventually, I can use those metrics to build dashboard.

346 00:42:27.460 00:42:36.940 Annie Yu: And yeah, those are the things I don’t really know. Like how to categorize them. And

347 00:42:37.570 00:42:39.629 Annie Yu: I think that’s kind of

348 00:42:40.250 00:42:49.800 Annie Yu: like dbt, but I’m not sure if they are, and I don’t even know the flow of, like, source, model, metrics, dashboard. And I don’t know if that’s

349 00:42:50.160 00:42:55.799 Annie Yu: that’s, like, normal or... I just don’t know why it is.

350 00:42:56.934 00:43:05.199 Brian Pei: It’s not normal or not normal. It’s just, like, all of the different tools that a company uses. Like, I just rambled about

351 00:43:06.090 00:43:14.979 Brian Pei: specific tools. But a company can totally use Rill Data if they don’t want to use dbt or

352 00:43:15.460 00:43:22.849 Brian Pei: Tableau, right? It’s just, what you’re saying is basically, like, YAML files

353 00:43:24.610 00:43:44.439 Brian Pei: are metadata that you, kind of, like, write: this is what the metric is, this is what the dimension is, this is the column it’s coming from. And then the YAML file, like, aggregates it and, I guess, displays it. It’s basically like Tableau, but without the UI. Instead of you doing this in a UI, you have to write it down in YAML.
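
The idea of describing a metric as metadata instead of clicking through a UI can be sketched as follows. This is not the actual file format of any tool discussed here: the dictionary keys, the table name `fact_invoices`, and the function `spec_to_sql` are hypothetical, and a plain Python dict stands in for the YAML file.

```python
# Hypothetical metric definition; in a YAML-driven tool this would live in a file.
metric_spec = {
    "table": "fact_invoices",   # modeled table the metrics read from
    "dimension": "source",      # column to group by
    "measures": {               # metric name -> aggregate expression
        "total_revenue": "SUM(total)",
        "invoice_count": "COUNT(*)",
    },
}


def spec_to_sql(spec):
    """Turn the declarative spec into the GROUP BY query a dashboard would run."""
    measures = ", ".join(
        f"{expr} AS {name}" for name, expr in spec["measures"].items()
    )
    return (
        f"SELECT {spec['dimension']}, {measures} "
        f"FROM {spec['table']} GROUP BY {spec['dimension']}"
    )


print(spec_to_sql(metric_spec))
```

Whatever the tool, the pattern is the same: the file says what the metric and dimension are, and the tool generates and runs the aggregation.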

354 00:43:44.907 00:44:02.590 Brian Pei: The source data in this case would still be the PayPal data, and then the models would be the transformation layer, and then the reporting would be... it’s just, you’re describing a tool that I haven’t used very much, but it does all of that in a different way, because

355 00:44:02.590 00:44:03.120 Annie Yu: Bye.

356 00:44:03.360 00:44:10.030 Brian Pei: clients use different things, and that’s part of consulting. As long as you know, kind of like, the

357 00:44:11.150 00:44:19.329 Brian Pei: what’s it called, the general high-level knowledge of

358 00:44:20.830 00:44:32.430 Brian Pei: source data, modeling, reporting, and somebody is like, oh, we don’t use any of those tools, but we have this weird YAML thing, then you just use those principles, and you have to kind of, like, learn what the YAMLs are actually doing.

359 00:44:32.760 00:44:42.200 Brian Pei: So I would say that since this is like a meet and greet that when we meet next week you share your screen.

360 00:44:42.640 00:44:48.180 Brian Pei: I’ll take a look at what the requirements are, and how this YAML thing works.

361 00:44:48.280 00:44:51.570 Brian Pei: And I’ll basically just like talk you through how I would do it

362 00:44:52.900 00:44:56.500 Brian Pei: unless this thing is due like tomorrow. I hope it’s not.

363 00:44:56.910 00:45:07.659 Annie Yu: Oh, yeah, no. Yeah. I’ve already gone through some of those, and I think they are usually like lower risk. That’s why we don’t use dbt, so so should be.

364 00:45:08.240 00:45:13.729 Brian Pei: Yeah, this this makes sense. I probably should have asked that in the beginning I kind of just talked about tools that I like using.

365 00:45:13.860 00:45:19.719 Brian Pei: But you’re using. You have to use tools that clients want to use. So.

366 00:45:19.720 00:45:44.369 Annie Yu: A lot of dbt. And I also do have one more question. So I forgot to mention this, but I’m taking, like, a part-time data science degree with UT Austin. I’m just doing one course per semester. So I’m doing part-time because I’m more interested in doing, like, more applied ML and stats, that kind of thing. So I’m not, like, going to be a full stack engineer or things of that sort. But

367 00:45:44.940 00:45:59.009 Annie Yu: I think it’s definitely good for me to to have knowledge of what’s happening upstream. But with like that in mind, does it make sense like I would prefer to learn kind of from still from

368 00:45:59.520 00:46:08.819 Annie Yu: lower, like downstream to upstream, so like Dbt before Api ingestion. Does that make sense.

369 00:46:09.170 00:46:12.560 Brian Pei: 100%. I work with

370 00:46:13.160 00:46:17.169 Brian Pei: data scientists and ML engineers every day at Spotify.

371 00:46:17.340 00:46:22.979 Brian Pei: And they... you said you don’t want to be full stack, but they’re all full stack, like, unfortunately.

372 00:46:23.150 00:46:24.290 Brian Pei: They all know.

373 00:46:24.290 00:46:24.690 Annie Yu: Okay.

374 00:46:24.690 00:46:25.120 Brian Pei: Nice.

375 00:46:25.120 00:46:28.290 Annie Yu: Also do like Api ingestion.

376 00:46:28.830 00:46:29.789 Brian Pei: I help them,

377 00:46:29.790 00:46:39.599 Brian Pei: but they can do it. If I’m, like, on vacation, they’re able to do it. It’s all part of the same scope, I guess. Like,

378 00:46:40.260 00:46:43.889 Brian Pei: if a data set. So here’s an example. So

379 00:46:46.580 00:46:50.499 Brian Pei: so I help data scientists in, let me pick something,

380 00:46:52.010 00:46:54.260 Brian Pei: for music streams. Do you use Spotify?

381 00:46:54.450 00:46:55.040 Annie Yu: Yeah.

382 00:46:55.460 00:47:00.320 Brian Pei: Perfect. So I have all of my

383 00:47:00.640 00:47:10.419 Brian Pei: company’s, Spotify’s, events, right? So like when you click and listen to music, I know what you’re clicking on. I know who you are.

384 00:47:10.820 00:47:13.280 Brian Pei: I know how long you’re listening to

385 00:47:13.400 00:47:18.029 Brian Pei: Sabrina Carpenter, and I know how many tracks you’re listening to.

386 00:47:18.550 00:47:18.900 Annie Yu: Yeah.

387 00:47:18.900 00:47:22.870 Brian Pei: That’s that’s all data from

388 00:47:23.090 00:47:36.770 Brian Pei: Spotify, right? And I am able to give... if a data scientist wants to do an analysis on that, I have cleaned-up streams and users and whatever, and I give it to them.

389 00:47:37.020 00:47:46.889 Brian Pei: And then that data scientist wants to do some sort of correlation analysis on that data, to see, in

390 00:47:47.390 00:47:53.779 Brian Pei: developing cities, for people who are just downloading Spotify for the first time, what artists

391 00:47:53.910 00:47:59.500 Brian Pei: are they listening to? Are they listening to Sabrina Carpenter? Or are they listening to local artists in their city?

392 00:47:59.930 00:48:13.560 Brian Pei: That data scientist then needs to get external public data, right? They need to get data for a list of developing cities that Spotify is in. They need to get a geographic map of, like,

393 00:48:15.670 00:48:24.410 Brian Pei: the places that people are downloading Spotify. Are they urban? Are they suburban? Are they next to restaurants? Are they next to offices?

394 00:48:24.650 00:48:34.460 Brian Pei: That’s not music data, and so I’m not responsible for that. It’s the data scientist’s job. If they want to do a correlation between these things, they have to scrape

395 00:48:34.630 00:48:46.910 Brian Pei: geographical and demographic data from developing cities themselves, and then they have to merge my Spotify data with their city data to do a correlation analysis.

396 00:48:47.000 00:49:06.610 Brian Pei: So that’s like the give and take. Like, I can only help data scientists on what I have available in Spotify’s database. If they want to correlate it with any other demographic information that they may have, or competitors, or anything like that, it’s outside of my purview, so they have to do that themselves.

397 00:49:10.070 00:49:10.990 Brian Pei: Does that make sense.

398 00:49:11.706 00:49:17.069 Annie Yu: Yeah. So then, okay, so then, how would you recommend?

399 00:49:17.600 00:49:20.829 Annie Yu: Like, with everything in mind, how would you recommend like

400 00:49:21.210 00:49:26.280 Annie Yu: cause? We can’t tackle everything at once. Right? So

401 00:49:26.700 00:49:33.480 Annie Yu: what would be the most recommended way like, do we start digging into

402 00:49:35.340 00:49:42.049 Annie Yu: which part, like extraction, orchestration, or just what happens in dbt?

403 00:49:42.750 00:49:46.440 Brian Pei: We should start with your

404 00:49:47.810 00:49:49.830 Brian Pei: the work that you actually have to do.

405 00:49:50.400 00:49:57.880 Brian Pei: Because when I look through it and review it, then I’ll know what concepts we should do deeper dives into.

406 00:49:58.210 00:49:59.509 Annie Yu: Okay, so that makes.

407 00:49:59.860 00:50:10.480 Brian Pei: So next week it’ll be like a pair coding session where you share your screen and you walk me through what needs to happen. And I just

408 00:50:11.880 00:50:20.519 Brian Pei: I, like, help you pair code through anything else, or review what you’re doing, or give suggestions. And then from there I’ll know

409 00:50:21.040 00:50:22.770 Brian Pei: where we have to go next.

410 00:50:23.955 00:50:27.529 Annie Yu: I think it’s all, all the Rill Data.

411 00:50:28.740 00:50:30.050 Brian Pei: Great. Yeah.

412 00:50:31.311 00:50:41.550 Brian Pei: Cool. Yeah, we can do that next time. I feel like I’ve been talking for 50 min. That’s long enough. I can give you 10 min back, and we’ll schedule something early next week

413 00:50:41.670 00:50:44.650 Brian Pei: to do Rill Data pair coding.

414 00:50:45.210 00:50:47.220 Annie Yu: Yeah, that’s awesome.

415 00:50:47.690 00:50:48.380 Brian Pei: Cool.

416 00:50:48.720 00:50:51.269 Annie Yu: Yeah, thank you very much for your time.

417 00:50:51.460 00:50:55.260 Brian Pei: Yeah, of course, it’s nice to meet you. And yeah, we’ll talk next week.

418 00:50:55.560 00:50:57.290 Annie Yu: Yeah. Thanks. Brian.

419 00:50:57.590 00:50:58.999 Brian Pei: Thank you. Bye.