Meeting Title: Annie Coaching Session Date: 2025-08-12 Meeting participants: Awaish Kumar, Annie Yu, Brian Pei
WEBVTT
1 00:00:33.090 ⇒ 00:00:33.950 Awaish Kumar: Hello.
2 00:00:34.370 ⇒ 00:00:35.500 Brian Pei: Hello!
3 00:00:36.580 ⇒ 00:00:37.519 Brian Pei: What’s up?
4 00:00:39.860 ⇒ 00:00:41.559 Awaish Kumar: I’m good. How are you doing?
5 00:00:42.700 ⇒ 00:00:48.020 Brian Pei: Good! Let me see… one thing… I’m too old.
6 00:00:48.020 ⇒ 00:00:49.270 Awaish Kumar: It’s just that cool.
7 00:00:49.600 ⇒ 00:00:53.859 Awaish Kumar: ask any, like, I had requested if you come up with some
8 00:00:54.150 ⇒ 00:00:58.979 Awaish Kumar: Like, come prepared with some agenda, or any template, or something.
9 00:00:59.100 ⇒ 00:01:03.779 Awaish Kumar: To go over those things, like, with Brian, that would be nice.
10 00:01:05.119 ⇒ 00:01:15.079 Brian Pei: Yeah, I think mostly today. Last time was just mostly intro and figuring out, level of comfortability in
11 00:01:15.419 ⇒ 00:01:34.229 Brian Pei: all the things that we do, there’s a lot of things. And so today, the plan was for Annie to share her screen and go through, the YAMLs for metrics and dimensions she was writing, because I don’t have access to them, so I… I can’t really prep for it, but I was gonna take a look at, what she’s trying to do.
12 00:01:34.229 ⇒ 00:01:38.409 Brian Pei: The order of operations, and, just give some…
13 00:01:38.749 ⇒ 00:01:47.109 Brian Pei: insights once I… once I look at it. So I was kind of gonna hand it over to Annie to share her screen, and then I’m gonna ask questions, …
14 00:01:47.369 ⇒ 00:01:50.889 Brian Pei: It’s gonna be paracoding, we’re gonna do pericoding today, so…
15 00:01:51.099 ⇒ 00:01:59.699 Brian Pei: I might have to leave 15 minutes early, so, let’s just get right into it. Annie, if you can share…
16 00:01:59.919 ⇒ 00:02:02.769 Brian Pei: What was the name of the thing again? Data Vault?
17 00:02:02.770 ⇒ 00:02:03.700 Annie Yu: Well.
18 00:02:03.700 ⇒ 00:02:04.130 Brian Pei: Something?
19 00:02:04.130 ⇒ 00:02:19.640 Annie Yu: Yeah, I’m thinking, can we actually, start with maybe DVT, and then we can look at the real? Just because I brought this up to my leader, and… because currently I’m not on any projects that are using REAL.
20 00:02:19.860 ⇒ 00:02:20.560 Annie Yu: So….
21 00:02:20.560 ⇒ 00:02:23.880 Brian Pei: Oh, yeah, okay, yeah, we can go straight into dbt, that’s great.
22 00:02:23.880 ⇒ 00:02:31.770 Annie Yu: Yeah, yeah, and then maybe from there, I can have a better sense, or, like, what’s really different from real versus dbt.
23 00:02:31.930 ⇒ 00:02:41.540 Annie Yu: Cool. And I also do have some questions that I would love to talk about. Really, it’s like, I’m an analyst, and sometimes I have
24 00:02:41.540 ⇒ 00:02:55.380 Annie Yu: engineering requirements, and I would love to hear… I can show you, like, how I usually communicate, what kind of… what kind of fields and the details that I need, but I would love to hear from your perspective, like, what’s the best way? I guess I also differ
25 00:02:56.100 ⇒ 00:02:58.829 Annie Yu: person by person, but that’s something I would love to talk about.
26 00:02:58.830 ⇒ 00:03:09.130 Brian Pei: No, I’m gonna go by, … I’ve done… I think I’ve done 20 of these. So, of the things that I’ve seen work and not work, I’m gonna give you what works….
27 00:03:09.770 ⇒ 00:03:10.130 Annie Yu: Can you please?
28 00:03:10.130 ⇒ 00:03:10.650 Brian Pei: Senator.
29 00:03:11.150 ⇒ 00:03:12.020 Brian Pei: So….
30 00:03:13.750 ⇒ 00:03:14.870 Annie Yu: Yeah. Do you… yeah.
31 00:03:14.870 ⇒ 00:03:18.799 Brian Pei: Yeah, do you… do you have… do you want to start… do you want me to take a look at this? Like, how do you want to start.
32 00:03:18.800 ⇒ 00:03:27.140 Annie Yu: Sure, sure, sure, cause I… I also… have realized not everyone Like…
33 00:03:28.250 ⇒ 00:03:37.430 Annie Yu: is taking… well, I think some engineers are… I… I think sometimes it’s easier for me to list out, like, all the fields
34 00:03:37.680 ⇒ 00:03:44.350 Annie Yu: so I wrote this, like, all the fields that I need, and then what they mean, and then… but I know sometimes…
35 00:03:44.730 ⇒ 00:03:52.549 Annie Yu: Juniors are more like, oh, you tell me what you eventually want to present, and I can decide what fields and, like, what granularity.
36 00:03:52.550 ⇒ 00:03:53.310 Brian Pei: Interesting.
37 00:03:53.310 ⇒ 00:03:58.820 Annie Yu: Should be, so I… yeah, so these… this is what one of the… …
38 00:03:59.000 ⇒ 00:04:09.519 Annie Yu: example, so for this one, I’m saying, we need to ex… just enrich this pre-existing table, and I’m listing out what
39 00:04:09.810 ⇒ 00:04:12.659 Annie Yu: What should be kept as is, and…
40 00:04:12.930 ⇒ 00:04:19.350 Annie Yu: And some for more renaming, clarity, and then what new fields that I want, and what they mean.
41 00:04:19.490 ⇒ 00:04:20.519 Annie Yu: … Cool.
42 00:04:20.649 ⇒ 00:04:27.210 Annie Yu: And then this one is like a… kind of like a brand new one, so I list out all the fuels and what they mean.
43 00:04:27.550 ⇒ 00:04:31.949 Annie Yu: So we are… I’m proposing, like, phasing out a legacy…
44 00:04:32.070 ⇒ 00:04:36.549 Annie Yu: model that’s not maintained, and then… So, yeah, let’s…
45 00:04:37.300 ⇒ 00:04:56.139 Annie Yu: I do it, and then sometimes I even, like, sketch out something in, like, a CSV, like, because I sometimes can’t go directly to this part, like, I have to sketch out something, for me to see, and then I’ll share this, like, sample, like, final table.
46 00:04:56.700 ⇒ 00:04:57.950 Annie Yu: That’s fine.
47 00:04:58.430 ⇒ 00:05:10.719 Annie Yu: Yeah, but I… yeah, I think I do it differently every time, because sometimes I’m very clear on what I need, but sometimes I kind of struggle to go from the end result I want, and then…
48 00:05:10.940 ⇒ 00:05:16.639 Annie Yu: to all the details through how the model should look like, so that’s… Okay.
49 00:05:16.800 ⇒ 00:05:17.720 Annie Yu: Yeah.
50 00:05:18.150 ⇒ 00:05:24.839 Brian Pei: Okay, let’s start from, let’s go to the… Repo in your code editor.
51 00:05:25.510 ⇒ 00:05:29.939 Annie Yu: Did you mean GitHub, or… or the cursor?
52 00:05:29.940 ⇒ 00:05:31.640 Brian Pei: Cursor. Cursor’s fine.
53 00:05:33.290 ⇒ 00:05:38.769 Brian Pei: Okay, and then let me test… Can you see this?
54 00:05:40.000 ⇒ 00:05:40.560 Annie Yu: Mmm?
55 00:05:41.000 ⇒ 00:05:41.610 Annie Yu: Hmm.
56 00:05:41.610 ⇒ 00:05:44.380 Brian Pei: I’m typing on the screen. Can you see that?
57 00:05:44.460 ⇒ 00:05:45.540 Annie Yu: Nice.
58 00:05:46.130 ⇒ 00:05:47.650 Annie Yu: No.
59 00:05:47.650 ⇒ 00:05:49.570 Brian Pei: How does this work?
60 00:05:51.000 ⇒ 00:05:54.940 Brian Pei: … draw? Can you see… can you see… wait….
61 00:05:55.370 ⇒ 00:05:55.900 Annie Yu: Yeah.
62 00:05:55.900 ⇒ 00:05:57.680 Brian Pei: Can you see, like, stuff like this?
63 00:05:58.130 ⇒ 00:06:04.250 Annie Yu: Yes, but I should look through… wait… oh, I don’t have to look at the Zoom meeting, I… oh, yeah, I see that.
64 00:06:04.500 ⇒ 00:06:12.550 Brian Pei: Okay, let me clear… Okay, so can you see… this is Brian. Can you see that?
65 00:06:14.790 ⇒ 00:06:16.899 Annie Yu: Not now, not this time.
66 00:06:17.040 ⇒ 00:06:19.390 Brian Pei: Damn it, okay, text is not working, that’s fine.
67 00:06:19.820 ⇒ 00:06:24.680 Brian Pei: Okay, let me start from… let me get rid of this.
68 00:06:26.560 ⇒ 00:06:30.150 Brian Pei: You have a DBT project, …
69 00:06:30.430 ⇒ 00:06:35.680 Brian Pei: Let’s take a look first at dbt Project YAML, near the bottom.
70 00:06:36.240 ⇒ 00:06:38.949 Annie Yu: … okay, wait.
71 00:06:39.710 ⇒ 00:06:43.370 Brian Pei: Yep, go up a little bit, go up a little bit, keep going up, dbchie….
72 00:06:43.370 ⇒ 00:06:49.319 Annie Yu: And these are not maintained by me, it’s maintained by the actual engineers.
73 00:06:49.940 ⇒ 00:06:57.980 Brian Pei: That’s fine. … Eden, okay, cool, da-da-da-da-da, that’s fine. Go to Profiles, the fourth one.
74 00:06:58.370 ⇒ 00:06:59.520 Annie Yu: profiles.
75 00:07:01.150 ⇒ 00:07:03.190 Brian Pei: Okay, cool, so….
76 00:07:03.190 ⇒ 00:07:13.789 Annie Yu: Great, so you’re using BigQuery, I’m assuming, is what it says. Yeah. You have a service account, okay, so this is what I needed to know, that we’re not in Snowflake or anything. So this is BigQuery.
77 00:07:13.790 ⇒ 00:07:17.830 Brian Pei: They have environment variables for credentials, so that’s all fine.
78 00:07:18.280 ⇒ 00:07:23.580 Brian Pei: … do you… is your terminal set up where you can run dbt commands yet?
79 00:07:24.280 ⇒ 00:07:31.930 Annie Yu: I don’t think so. That’s something I would love to learn, because I know sometimes people can run a query and see, like, a table here, but I….
80 00:07:32.450 ⇒ 00:07:37.680 Brian Pei: 100%. That’s… that’s important. Okay, do you know, is this…
81 00:07:37.980 ⇒ 00:07:40.690 Brian Pei: project on dbt Cloud, do you know that?
82 00:07:42.740 ⇒ 00:07:46.290 Annie Yu: No. Oh, it’s probably not.
83 00:07:46.590 ⇒ 00:07:48.210 Brian Pei: It’s not on dbt Cloud?
84 00:07:48.890 ⇒ 00:07:50.310 Annie Yu: what….
85 00:07:50.310 ⇒ 00:07:54.860 Brian Pei: No… DB Cloud is like a… What’s up?
86 00:07:55.610 ⇒ 00:08:01.349 Awaish Kumar: I was just saying, we work on dbt Core, so none of our projects right now are on dbt Cloud.
87 00:08:01.760 ⇒ 00:08:13.979 Brian Pei: It’s not on cloud. Okay, that’s fine. Let’s… Let’s see… can you do… Mmm… It’s gonna be hard to…
88 00:08:14.170 ⇒ 00:08:21.320 Brian Pei: It’s gonna take too much time to… to set up. Try, just… maybe really quick, do LS in terminal.
89 00:08:22.240 ⇒ 00:08:26.570 Brian Pei: ls, enter, CD, dbt project.
90 00:08:31.630 ⇒ 00:08:32.710 Brian Pei: LS…
91 00:08:34.700 ⇒ 00:08:40.820 Brian Pei: Okay, so we’re in dbt. Just do brew install dbt. Let’s just see if we can one and done.
92 00:08:45.160 ⇒ 00:08:47.649 Brian Pei: Okay, so you have a brew, that’s good.
93 00:08:47.940 ⇒ 00:08:48.470 Annie Yu: Yeah.
94 00:08:49.380 ⇒ 00:08:53.790 Brian Pei: Sweet. Sweet. What’s that say on the bottom there?
95 00:08:55.260 ⇒ 00:09:00.640 Brian Pei: tasks. To install dbt run… Oh, to install a de- yeah, just, just do it.
96 00:09:07.300 ⇒ 00:09:15.949 Brian Pei: … That’s fine. I don’t know if we need whatever the hell that is. Okay. Okay. Now do…
97 00:09:17.000 ⇒ 00:09:20.809 Brian Pei: Just type, type dbt space dash dash version.
98 00:09:21.870 ⇒ 00:09:23.120 Annie Yu: Wait, ….
99 00:09:23.490 ⇒ 00:09:24.520 Brian Pei: with a space.
100 00:09:24.630 ⇒ 00:09:25.510 Brian Pei: Yeah.
101 00:09:28.690 ⇒ 00:09:32.360 Brian Pei: Okay, so it’s not, like, fully installed. …
102 00:09:37.970 ⇒ 00:09:45.739 Brian Pei: I’m opening my terminal to see if I… We’re gonna spend 2 minutes max on this, it’s not worth spending all this time.
103 00:09:46.790 ⇒ 00:09:51.340 Annie Yu: We’re trying to install dbt on my… Laptop, is that it?
104 00:09:52.020 ⇒ 00:09:54.290 Brian Pei: Basically, after this, …
105 00:09:54.550 ⇒ 00:10:09.460 Brian Pei: you should try… Uouton probably has documentation, I have to assume, of a… it’s not just installing dbt, it’s getting your terminal to be able to locally run dbt commands, because…
106 00:10:10.030 ⇒ 00:10:13.359 Brian Pei: 90% of this is going to be…
107 00:10:13.600 ⇒ 00:10:24.419 Brian Pei: like, I assume right now, you’re merging into master and just crossing your fingers that it works. Like, there’s no way for you to test if you don’t… if you can’t run dbt in here right now.
108 00:10:24.970 ⇒ 00:10:26.310 Brian Pei: …
109 00:10:26.870 ⇒ 00:10:37.250 Brian Pei: We can try something. Go to the cursor, prompt page, and just say, hi cursor, what are the steps for me to run dbt locally in terminal?
110 00:10:38.250 ⇒ 00:10:42.630 Annie Yu: Okay, wait, okay. I am still learning to use this, too. Okay.
111 00:10:42.630 ⇒ 00:10:47.390 Brian Pei: I think it’s, top right by the gear.
112 00:10:47.600 ⇒ 00:10:49.919 Brian Pei: Click the… click to the left of the gear.
113 00:10:51.850 ⇒ 00:10:57.080 Brian Pei: Yeah, there we go. And… yeah, just be like… Hi!
114 00:10:57.710 ⇒ 00:11:00.760 Annie Yu: Alright! What’s the….
115 00:11:01.700 ⇒ 00:11:04.030 Brian Pei: Or, or whatever, … Nope.
116 00:11:04.460 ⇒ 00:11:12.380 Brian Pei: Hi, how can I… run… and how can I install and run dbt locally?
117 00:11:17.260 ⇒ 00:11:18.920 Brian Pei: Yeah, I think that’s fine.
118 00:11:23.290 ⇒ 00:11:25.250 Brian Pei: Oh, it’s reading everything!
119 00:11:26.130 ⇒ 00:11:31.409 Brian Pei: Okay… This is awesome.
120 00:11:33.240 ⇒ 00:11:35.369 Brian Pei: Virtual environment, that’s one that I forgot.
121 00:11:36.340 ⇒ 00:11:36.780 Annie Yu: Okay.
122 00:11:36.780 ⇒ 00:11:40.009 Brian Pei: Okay, so for now, we’re not gonna do all this stuff.
123 00:11:40.190 ⇒ 00:11:42.679 Brian Pei: Tried to do all this stuff.
124 00:11:43.060 ⇒ 00:11:47.270 Brian Pei: After this meeting, because…
125 00:11:48.380 ⇒ 00:11:55.040 Brian Pei: for quickness of development, you need to be in your terminal running stuff, you just don’t know if things are gonna work or not.
126 00:11:55.760 ⇒ 00:11:56.640 Brian Pei: Okay.
127 00:11:56.960 ⇒ 00:11:59.160 Brian Pei: Let’s just do… let me do….
128 00:11:59.160 ⇒ 00:12:00.370 Annie Yu: 10-minute bucket.
129 00:12:00.590 ⇒ 00:12:01.600 Brian Pei: What’d you say?
130 00:12:01.700 ⇒ 00:12:04.530 Annie Yu: No, I don’t know what that means.
131 00:12:05.200 ⇒ 00:12:09.889 Brian Pei: Oh, basically, like… Open… open models in dbt Project.
132 00:12:10.190 ⇒ 00:12:10.880 Annie Yu: Yeah.
133 00:12:11.430 ⇒ 00:12:17.930 Brian Pei: An open intermediate, or whatever, and open any SQL file, So, like…
134 00:12:18.430 ⇒ 00:12:28.510 Brian Pei: you now understand that, like, dbt SQL is not raw SQL, right? It’s like, you can’t copy-paste this whole thing and put it into BigQuery and have it run, because.
135 00:12:28.510 ⇒ 00:12:28.880 Annie Yu: there’s.
136 00:12:28.880 ⇒ 00:12:32.749 Brian Pei: Jinja in here. Line 11 is Jinja, …
137 00:12:33.950 ⇒ 00:12:40.919 Brian Pei: like, line 11 is Jinja, it’s dbt-specific language, BigQuery does not know what that means. …
138 00:12:41.530 ⇒ 00:12:55.350 Brian Pei: dbt, compiles Jinja into readable SQL, but you have to tell it to do that. And the only way to do that is to run in terminal dbt compile, and it spits out, like.
139 00:12:55.790 ⇒ 00:13:02.489 Brian Pei: the raw SQL that you can then copy-paste into BigQuery and run it in BigQuery to make sure that the SQL executes.
140 00:13:02.850 ⇒ 00:13:06.200 Brian Pei: And also, you can run…
141 00:13:06.260 ⇒ 00:13:25.689 Brian Pei: if you have dbt installed locally, so you can’t do this right now, but you would write something like, dbt run int form responses, in production, you click enter, and then in your… in cursor right now, it’ll take int form responses, and it’ll run it in BigQuery automatically.
142 00:13:25.980 ⇒ 00:13:33.520 Brian Pei: it’ll create the table, or it’ll be incremental, or whatever, and then you can go into BigQuery, and you can look at the table, every time you make a change.
143 00:13:33.660 ⇒ 00:13:40.800 Brian Pei: So you can iteratively develop if you can run dbt locally in your terminal, which is the thing on the bottom.
144 00:13:40.900 ⇒ 00:13:46.489 Brian Pei: You can’t do that right now, but the instructions on how to do that are on the right. That’s what I’m saying.
145 00:13:47.200 ⇒ 00:13:54.649 Annie Yu: Okay. And does that mean overwriting everyone’s view, or just my… Things on my end.
146 00:13:55.530 ⇒ 00:13:56.520 Brian Pei: …
147 00:13:57.210 ⇒ 00:14:08.079 Brian Pei: then we have… we kind of have to talk about testing, so if, like, if you’re making new stuff, and it doesn’t exist in BigQuery, then you can go crazy, right? Nobody’s… nobody knows about it. If you’re editing…
148 00:14:08.850 ⇒ 00:14:13.100 Brian Pei: SQL that already exists, you would have to…
149 00:14:13.870 ⇒ 00:14:29.909 Brian Pei: run it in development, or you can say, don’t write to the production schema. You can build, like, a sandbox schema, like any underscore sandbox, and you can tell it to run into your sandbox so it doesn’t overwrite what’s in production.
150 00:14:32.020 ⇒ 00:14:38.780 Brian Pei: When we get to that point, I’ll teach you all the ways where you can’t fuck up prod, but for now…
151 00:14:39.150 ⇒ 00:14:44.899 Brian Pei: We can’t do it yet, that’ll be the next session. But it’s fine, because there’s a lot of stuff that we can talk about without having to run
152 00:14:45.190 ⇒ 00:14:47.479 Brian Pei: Oh my god, stop texting me! Okay.
153 00:14:47.690 ⇒ 00:14:51.829 Brian Pei: So, here we go. Under, can you collapse models?
154 00:14:53.280 ⇒ 00:15:01.579 Brian Pei: Okay, so these are… this is all very important. Everything under dbt project is, …
155 00:15:02.200 ⇒ 00:15:17.899 Brian Pei: one of these… 1, 2, 3, 4, 5 things. These five things are all different, for different needs. Everything under that is YAMLs that are instructions of how dbt…
156 00:15:18.090 ⇒ 00:15:23.870 Brian Pei: connects to BigQuery. So, it’s like, business logic SQL is under dbt project.
157 00:15:23.870 ⇒ 00:15:44.459 Brian Pei: all the connection environment variables, blah blah blah blah blah, that’s already been set up, it looks like, is there in these YAMLs, so you probably don’t have to worry about that, but just so you know, dbt project YAML, packages YAML, and packages YAML is just, like, if you need to, I don’t know, install Pandas or whatever, it’s just like a Python… so there’s not really too many, but if you wanted to bring in,
158 00:15:44.600 ⇒ 00:16:00.549 Brian Pei: any of the packages that people use, timestamp shit, or whatever, like, these are… these are Python packages. So… if you don’t need any of that stuff, you don’t have to worry about this, but I’m giving you context because it’s there, and you’re looking at it, and you should know what it does.
159 00:16:00.610 ⇒ 00:16:12.799 Brian Pei: Okay, let’s start with the first folder. Macros are not SQL models. They are SQL files that hold, SQL code
160 00:16:14.100 ⇒ 00:16:17.740 Brian Pei: In a macro. So, click the third one, Marketing Product Name.
161 00:16:21.550 ⇒ 00:16:23.930 Brian Pei: Whatever. Okay, so if you scroll all the way up.
162 00:16:25.590 ⇒ 00:16:27.600 Brian Pei: You see, it says macro at the top?
163 00:16:27.790 ⇒ 00:16:37.829 Brian Pei: This is saying that if I ever call marketing product name in one of my models, inject this SQL into it.
164 00:16:38.010 ⇒ 00:16:40.550 Brian Pei: So if you scroll all the way down.
165 00:16:41.810 ⇒ 00:16:49.610 Brian Pei: It says end macro, so that’s how you… that’s how you have a macro. You have a macro, end macro. Here is…
166 00:16:50.230 ⇒ 00:16:56.309 Brian Pei: SQL, and you’ll notice that it’s not full SQL, right? It’s not select star from table, it’s just a case when statement.
167 00:16:56.530 ⇒ 00:16:59.139 Brian Pei: Now scroll all the way up again.
168 00:17:00.720 ⇒ 00:17:18.960 Brian Pei: there’s a variable in the… in line 1 that says standardized product name, right? Yeah. So when you call this macro, you can put in something in there, which is, like, a column name, let’s say, and those brackets under it that say standardized product name, will…
169 00:17:19.310 ⇒ 00:17:24.409 Brian Pei: change to whatever variable you put in there. So, if the column name is, like.
170 00:17:24.630 ⇒ 00:17:28.359 Brian Pei: P name. Then, when you call this macro.
171 00:17:28.500 ⇒ 00:17:35.539 Brian Pei: in a model, somewhere, you would do marketing, product name, parentheses, P name, and then
172 00:17:35.620 ⇒ 00:17:50.690 Brian Pei: the SQL will compile to case when P name equals blah blah blah, P name equals blah blah blah. This is, … this is the software engineering way of dry principles. Do not repeat yourself, right? So, like.
173 00:17:51.000 ⇒ 00:18:04.600 Brian Pei: This is a giant case win statement that’s very custom, I’m assuming, where the… the client, wants to, you know, rename products so that when you group it, it probably, like.
174 00:18:04.930 ⇒ 00:18:19.180 Brian Pei: makes it a lot less text or whatever. So, like, in a Tableau dashboard, instead of having all those products you see on the left, it’s like gummies and bundles and whatever. And so, it’s the same thing as doing, like, …
175 00:18:19.810 ⇒ 00:18:33.290 Brian Pei: case when, New York, then NY, whatever. It’s cleaning up, right? So, let’s say, I’m making all this up as examples. Let’s say they have, like, 6 reporting models.
176 00:18:33.430 ⇒ 00:18:47.900 Brian Pei: SQL files that all need to, change their product name to this. Instead of having this giant thing in all six SQL models, you keep it in one place and call it.
177 00:18:47.950 ⇒ 00:18:49.940 Brian Pei: Right? Because if…
178 00:18:49.940 ⇒ 00:19:10.059 Brian Pei: if we didn’t have macros, and those 6 file names have this giant chunk of SQL in it, and we ever need to change or update these names, we have to remember to go to every single of the 6 file names, and we have to remember to change it, and make sure they all tie together. That’s stupid. So, basically, repeatable SQL logic.
179 00:19:10.090 ⇒ 00:19:13.699 Brian Pei: We try to put it in a macro, so that
180 00:19:13.840 ⇒ 00:19:23.350 Brian Pei: if 6 models call this macro, and one of these names changes, we just change it here, in the macro, save it, and it cascades down to all those models. Does that make sense?
181 00:19:23.830 ⇒ 00:19:24.700 Annie Yu: Yes.
182 00:19:25.000 ⇒ 00:19:25.650 Brian Pei: Great.
183 00:19:25.760 ⇒ 00:19:27.470 Brian Pei: That’s macros. Any questions?
184 00:19:27.870 ⇒ 00:19:29.260 Annie Yu: How would…
185 00:19:29.680 ⇒ 00:19:37.270 Annie Yu: I think you mentioned this, but how would, if I’m writing a model, how would I utilize this? How do I call this?
186 00:19:37.990 ⇒ 00:19:41.779 Brian Pei: Yeah, select, marketing product name.
187 00:19:42.340 ⇒ 00:19:45.749 Brian Pei: At the… on line 1, sorry, highlight it, is what I mean.
188 00:19:45.990 ⇒ 00:19:49.380 Brian Pei: Not the whole macro, just highlight marketing product name.
189 00:19:49.570 ⇒ 00:19:50.330 Annie Yu: Okay.
190 00:19:50.530 ⇒ 00:19:52.229 Brian Pei: And then, copy.
191 00:19:53.220 ⇒ 00:20:01.130 Brian Pei: And then… go to Find and Files, Somewhere… where’s find?
192 00:20:01.860 ⇒ 00:20:02.690 Annie Yu: Oh….
193 00:20:03.660 ⇒ 00:20:06.760 Brian Pei: Probably go all the way up to File.
194 00:20:07.960 ⇒ 00:20:11.130 Brian Pei: Like, all the way up All the way up.
195 00:20:11.670 ⇒ 00:20:12.400 Annie Yu: Oh, come on.
196 00:20:12.490 ⇒ 00:20:18.420 Brian Pei: Oh, here. Yeah, File, edit, selection, maybe?
197 00:20:19.690 ⇒ 00:20:25.470 Brian Pei: … View, maybe? Keep, like, hovering over the top things.
198 00:20:26.160 ⇒ 00:20:28.550 Brian Pei: … no…
199 00:20:28.820 ⇒ 00:20:38.710 Brian Pei: Go to… go to… no… oh, there it is, go! … no, not go to file, find in file.
200 00:20:38.930 ⇒ 00:20:41.760 Brian Pei: Scroll over to run, let’s see if it’s in there.
201 00:20:42.530 ⇒ 00:20:45.649 Brian Pei: Scroll back to the left to edit, maybe it’s an edit.
202 00:20:45.890 ⇒ 00:20:46.590 Annie Yu: Edit.
203 00:20:47.530 ⇒ 00:20:50.439 Brian Pei: Find in Files. There it is. Find in Files.
204 00:20:51.810 ⇒ 00:20:54.519 Brian Pei: Control, paste, or paste, whatever.
205 00:20:55.880 ⇒ 00:20:56.940 Brian Pei: Enter.
206 00:20:57.850 ⇒ 00:21:04.169 Brian Pei: … There it is. So, Legacy Transaction SQL, like, click it.
207 00:21:05.780 ⇒ 00:21:06.590 Brian Pei: Oh.
208 00:21:07.680 ⇒ 00:21:09.399 Brian Pei: Okay, that’s how you would call it.
209 00:21:11.180 ⇒ 00:21:12.840 Annie Yu: So….
210 00:21:12.840 ⇒ 00:21:18.610 Brian Pei: So, two squiggly brackets, name of the macro, Name of the variable.
211 00:21:19.350 ⇒ 00:21:26.270 Brian Pei: And then, basically, When this runs, in real SQL,
212 00:21:26.420 ⇒ 00:21:32.900 Brian Pei: It’ll look for marketing product name in the macro, and it’ll put that giant case win statement in here.
213 00:21:35.780 ⇒ 00:21:37.839 Brian Pei: So, that’s how you call macro. It’s very easy.
214 00:21:38.410 ⇒ 00:21:42.210 Brian Pei: Something I can remind you about if you ever have to do them. All good.
215 00:21:43.040 ⇒ 00:21:45.609 Annie Yu: Okay. Yeah, this makes sense.
216 00:21:46.190 ⇒ 00:21:46.890 Brian Pei: Cool.
217 00:21:47.010 ⇒ 00:21:52.989 Brian Pei: Okay, let’s go back to… great, so you can collapse macros. So that’s macros. Pretty important.
218 00:21:53.380 ⇒ 00:21:56.620 Brian Pei: … For a dry coating.
219 00:21:56.820 ⇒ 00:22:00.399 Brian Pei: You can collapse this, we don’t have to look at the other macros.
220 00:22:01.050 ⇒ 00:22:07.029 Annie Yu: Okay. And dry coating means just not repeat, not repeating….
221 00:22:08.120 ⇒ 00:22:12.990 Brian Pei: Yeah, dry is more a software engineering thing, D-R-Y, do not repeat yourself.
222 00:22:13.820 ⇒ 00:22:16.829 Annie Yu: Okay, okay, that makes sense. Yeah.
223 00:22:17.410 ⇒ 00:22:18.560 Brian Pei: Okay.
224 00:22:18.830 ⇒ 00:22:22.859 Brian Pei: You can collapse models, I wanna save that for last, that’s the most important thing.
225 00:22:23.170 ⇒ 00:22:24.390 Annie Yu: seeds.
226 00:22:24.390 ⇒ 00:22:26.619 Brian Pei: Let’s go to seeds next. You know what seeds are?
227 00:22:27.260 ⇒ 00:22:36.740 Annie Yu: I think there are… I know these are from, like, CSV files, and, like, for… is it more, like, static lookup table kind of thing?
228 00:22:36.940 ⇒ 00:22:50.180 Brian Pei: That’s exactly what it is, yep. These are just CSVs that they want to make into a table, so this doesn’t exist, I assume, in their BigQuery, and so they just put it in here.
229 00:22:50.390 ⇒ 00:22:51.460 Brian Pei: That’s fine.
230 00:22:51.710 ⇒ 00:23:00.610 Brian Pei: So if you ever have CSV files for lookup tables that aren’t in BigQuery, you can put them in Seeds, as a CSV, and
231 00:23:01.140 ⇒ 00:23:06.860 Brian Pei: the next dbt run, it’ll look through the seeds folder, and it’ll make all these CSVs into tables.
232 00:23:07.460 ⇒ 00:23:13.280 Annie Yu: But it only takes CSV files as… the input….
233 00:23:13.650 ⇒ 00:23:19.060 Brian Pei: Let’s see… can CDBT take text files?
234 00:23:21.620 ⇒ 00:23:24.350 Annie Yu: What about, okay, CSV.
235 00:23:24.350 ⇒ 00:23:32.370 Brian Pei: It looks like it’s only CSV files. It does not take text files or Excel files. You have to export it as a CSV.
236 00:23:33.580 ⇒ 00:23:44.140 Annie Yu: So I know sometimes we put stuff in, like, a Google spreadsheet, so I assume that means the engineer then later downloads that into a CSV and put it here.
237 00:23:44.970 ⇒ 00:23:48.280 Brian Pei: Probably… Probably.
238 00:23:48.390 ⇒ 00:23:53.110 Brian Pei: Some, some Google Sheets… so, you know how, like, so BigQuery is a Google product, right?
239 00:23:53.850 ⇒ 00:23:54.670 Annie Yu: Yep.
240 00:23:54.670 ⇒ 00:24:01.719 Brian Pei: Some Google Sheets, there’s a button in the Google Sheet that says export to BigQuery, so you can bypass this.
241 00:24:04.410 ⇒ 00:24:06.570 Brian Pei: It depends, yeah, it depends on what they need.
242 00:24:06.960 ⇒ 00:24:12.239 Brian Pei: So that’s easy. Seeds are easy, they’re just CSVs. Sometimes you need them, sometimes you don’t.
243 00:24:13.240 ⇒ 00:24:17.299 Brian Pei: Okay, snapshots, let’s see, is there anything in there?
244 00:24:20.110 ⇒ 00:24:33.119 Brian Pei: Snapshots are… the only… very specific… SQL transformation type, That requires its own folder.
245 00:24:34.190 ⇒ 00:24:37.770 Brian Pei: I don’t want to get too much into this, because I don’t think you’re gonna use it.
246 00:24:38.000 ⇒ 00:24:49.510 Brian Pei: But snapshots in data engineering land, are… called, … SCD2 tables.
247 00:24:50.000 ⇒ 00:24:54.120 Brian Pei: which you could Google, which stands for Slowly Changing Dimension.
248 00:24:54.720 ⇒ 00:24:55.680 Annie Yu: which….
249 00:24:55.680 ⇒ 00:25:09.989 Brian Pei: Is a special type of incremental… Function where the table only… Updates… an invoice.
250 00:25:10.250 ⇒ 00:25:12.969 Brian Pei: Or, I’m using invoice as an example.
251 00:25:14.280 ⇒ 00:25:17.530 Brian Pei: Let’s say… okay, let’s say I have an… let’s say I paid an invoice.
252 00:25:18.510 ⇒ 00:25:25.430 Brian Pei: Let’s say I created an order on Shopify. I bought… I bought perfume, so I’m order A in Shopify.
253 00:25:25.540 ⇒ 00:25:27.150 Brian Pei: In a regular table.
254 00:25:27.740 ⇒ 00:25:32.750 Brian Pei: there is going to be order A as a row, and it’s gonna say my name, and it’s gonna say, …
255 00:25:33.310 ⇒ 00:25:39.690 Brian Pei: purchased, or whatever, right? Yeah. In a regular Shopify table, then, if I…
256 00:25:39.900 ⇒ 00:25:42.709 Brian Pei: purchased it, and I get it shipped.
257 00:25:43.030 ⇒ 00:25:46.740 Brian Pei: The column name for a status changes to shipped.
258 00:25:47.800 ⇒ 00:25:50.160 Brian Pei: So, it’s not an incremental row.
259 00:25:50.450 ⇒ 00:26:01.549 Brian Pei: that row just changes. And then if I return it, it’s gonna change… or if I refund it, it’s gonna change to refund, etc. So, in a regular table,
260 00:26:02.180 ⇒ 00:26:04.920 Brian Pei: in that… example.
261 00:26:05.170 ⇒ 00:26:16.289 Brian Pei: it’s hard to do look-back analysis because it’s overwriting my order row. So you can’t really see, like, when did I first place the order? When did I, you know, get a refund?
262 00:26:16.400 ⇒ 00:26:27.039 Brian Pei: And usually, in Shopify and stuff like that, there is a table that has, like, a changelog, where all of my actions are rows. So you’ll have my order, which is one distinct row.
263 00:26:27.150 ⇒ 00:26:34.740 Brian Pei: And you’ll have some sort of changelog where I ordered it, it shipped, I didn’t like it, I returned it, I got a refund. That’s 5 rows.
264 00:26:34.930 ⇒ 00:26:36.500 Annie Yu: In the changelog.
265 00:26:37.050 ⇒ 00:26:43.770 Brian Pei: A slowly changing dimension, takes my order, and it appends a new row.
266 00:26:43.920 ⇒ 00:26:52.909 Brian Pei: to that table when I make a change with a date range. And the date range is usually, like, start date, end date.
267 00:26:53.220 ⇒ 00:26:56.719 Brian Pei: And if the end date is null, that’s the most recent
268 00:26:57.080 ⇒ 00:27:05.769 Brian Pei: event. So, if my last thing was a refund, it’ll say, brine pay order, refund, end date null. The start date would be…
269 00:27:06.040 ⇒ 00:27:08.060 Brian Pei: the day that I made that action.
270 00:27:09.410 ⇒ 00:27:13.510 Brian Pei: I don’t think you’re ever gonna use this, but, you know, I might as well just, like.
271 00:27:13.630 ⇒ 00:27:30.970 Brian Pei: say that to you out loud, because maybe in the future, you hear something, and you’re like, oh, maybe this could be slowly changing dimension. There’s very, very specific use cases for it, and some people like them, and some people don’t even know what they are, so we don’t have to talk about it too much, but that’s what a slowly changing dimension table is. Okay.
272 00:27:31.340 ⇒ 00:27:32.210 Brian Pei: Yeah.
273 00:27:32.680 ⇒ 00:27:35.950 Annie Yu: So each populates one row, right?
274 00:27:37.120 ⇒ 00:27:38.530 Annie Yu: Pretty much. Okay.
275 00:27:38.740 ⇒ 00:27:39.330 Brian Pei: Yep.
276 00:27:39.860 ⇒ 00:27:45.319 Brian Pei: And in dbt, they’re called snapshots. It’s snapshotting the data every time there’s a change.
277 00:27:47.180 ⇒ 00:27:49.370 Brian Pei: Okay, you can collapse snapshots now.
278 00:27:50.750 ⇒ 00:27:52.080 Brian Pei: Let’s look at tests.
279 00:27:56.350 ⇒ 00:28:06.990 Brian Pei: Tests are… Tests are tests. It’s SQL that… does not materialize
280 00:28:07.320 ⇒ 00:28:09.960 Brian Pei: anything. It doesn’t make a table.
281 00:28:10.150 ⇒ 00:28:15.070 Brian Pei: They are business logic tests that can pass or fail.
282 00:28:15.220 ⇒ 00:28:22.329 Brian Pei: And if they fail, it’ll… if you have this linked up to PagerDuty or some kind of alerting thing.
283 00:28:22.330 ⇒ 00:28:26.019 Brian Pei: It’ll notify the user that some business logic thing failed.
284 00:28:26.020 ⇒ 00:28:47.730 Brian Pei: And here’s an example. I don’t know anything about what you’re looking at, but a test would be, like, invoices in this table should never be negative. They should always be positive. So I’d write SQL that would be, like, select, select star from this where invoices are negative. That would be my SQL block, right? So that’ll… that should never return rows.
285 00:28:47.880 ⇒ 00:28:57.460 Brian Pei: If it… if it doesn’t return rows, if there’s null rows, then the SQL test passes. If it does return rows, then the SQL test fails.
286 00:28:57.640 ⇒ 00:29:03.830 Brian Pei: So… If my test is select star from invoices where invoice amount is
287 00:29:04.190 ⇒ 00:29:06.200 Brian Pei: Less than or equal to zero.
288 00:29:06.360 ⇒ 00:29:13.149 Brian Pei: And it runs every day, and it’s null, that means all the invoices are positive, so it’s good. And then one day, if…
289 00:29:13.450 ⇒ 00:29:18.539 Brian Pei: If… if an invoice is negative, my WHERE clause will show…
290 00:29:18.710 ⇒ 00:29:25.360 Brian Pei: the data, right? It’ll… it’ll return results. That would be a failure. Then you could be like, -oh.
291 00:29:25.510 ⇒ 00:29:32.719 Brian Pei: one of the invoices was negative. That should never happen. I got an alert. I go in and I delete the negative invoice.
292 00:29:32.850 ⇒ 00:29:40.129 Brian Pei: That’s what tests are, which are good for basically later on in a project when everything’s done.
293 00:29:40.370 ⇒ 00:29:47.440 Brian Pei: … if the client wants it, or, like, we suggest it, I always suggest tests, because, you know.
294 00:29:47.610 ⇒ 00:30:02.279 Brian Pei: when engineers leave, they don’t… sometimes they don’t know how to maintain their… their shit. So it’s always usually… I always usually do tests at the end of everything, for… for stuff like… for stuff like that. …
295 00:30:02.860 ⇒ 00:30:04.350 Brian Pei: Tests are…
296 00:30:04.450 ⇒ 00:30:15.510 Brian Pei: as creative as the business logic as you could want. I like to do anomaly detection tests, meaning look at the past 12 months of revenue.
297 00:30:15.710 ⇒ 00:30:17.340 Brian Pei: If…
298 00:30:17.670 ⇒ 00:30:35.679 Brian Pei: this month’s revenue is, like, 10x last month’s revenue, then fail, because that’s probably a duplication of rows or something. It’s an anomaly, right? So, like, revenue grows month over month by, like, 5-10%. One month, it’s 1000%.
299 00:30:36.600 ⇒ 00:30:41.800 Brian Pei: Maybe it’s real, but I would like to know that something…
300 00:30:42.090 ⇒ 00:30:56.809 Brian Pei: that it’s not normal. Like, why did revenue grow 1000% in a month? Give me an alert to either verify that they just had a banger month and they were killing it, or probably something… some rows got duplicated and it aggregated up too much data.
301 00:30:56.970 ⇒ 00:30:57.630 Brian Pei: I like that.
302 00:30:58.170 ⇒ 00:30:59.290 Brian Pei: I like knowing that.
303 00:31:00.130 ⇒ 00:31:03.219 Brian Pei: Okay, so any questions about tests?
304 00:31:03.700 ⇒ 00:31:11.039 Annie Yu: Just one small one. So, do, like, business stakeholders usually know about this part, or it’s more, like…
305 00:31:11.360 ⇒ 00:31:12.930 Annie Yu: Engineers.
306 00:31:13.880 ⇒ 00:31:15.310 Brian Pei: Mmm, application.
307 00:31:15.310 ⇒ 00:31:16.110 Annie Yu: Yep.
308 00:31:17.780 ⇒ 00:31:24.229 Brian Pei: It depends on the client. I think most of the time, UTAM just likes to do them, because it keeps us feeling safe.
309 00:31:25.070 ⇒ 00:31:43.570 Brian Pei: But sometimes a client… or if we’re trying to upsell a client on extra work, right? We can be like, you guys don’t have any tests or monitoring, I think you should do it because… and then all the examples that I just said, basically, is how I would sell… is how I would sell, testing and monitoring work.
310 00:31:43.640 ⇒ 00:31:46.120 Brian Pei: I would say for the most part, if you see it in here.
311 00:31:46.310 ⇒ 00:31:52.439 Brian Pei: Probably engineering did it to make themselves Sleep at night, comfortably.
312 00:31:52.440 ⇒ 00:31:53.970 Annie Yu: Yeah.
313 00:31:54.230 ⇒ 00:31:55.709 Annie Yu: Okay, that makes sense.
314 00:31:55.710 ⇒ 00:32:06.449 Brian Pei: So, yeah, so you’ll notice that, like, a lot of the things in dbt… in our first session, I said dbt is just SQL on steroids. It just… dbt…
315 00:32:06.460 ⇒ 00:32:25.829 Brian Pei: is really good, because it treats SQL tables and SQL models as if it was software. Software engineerings do unit tests all the time. Software engineerings build DRY macros all the time. They do all this stuff in software engineering, and I think it’s good to have software engineering principles in data science. So, that’s what DBT is.
316 00:32:26.860 ⇒ 00:32:38.770 Brian Pei: And that’s why there’s, like, so many layers here, where it’s not just a SQL file that you run every day in Snowflake. Like, it’s very unique and organized, and every folder has a purpose, blah blah blah.
317 00:32:39.770 ⇒ 00:32:41.100 Brian Pei: Does that make sense?
318 00:32:41.710 ⇒ 00:32:48.199 Annie Yu: Yes, so without the soft software engineering layer, it’s just models.
319 00:32:48.780 ⇒ 00:32:51.929 Brian Pei: It’s just models, yeah. Yeah, it’s just models.
320 00:32:53.120 ⇒ 00:33:01.830 Brian Pei: Back when I first started, dbt didn’t exist, and so it was just a bunch of SQL files that we had a bash script that, like, ran them every day.
321 00:33:02.400 ⇒ 00:33:04.180 Brian Pei: And it was bad.
322 00:33:04.910 ⇒ 00:33:07.659 Annie Yu: What does that mean? What’s the bash?
323 00:33:08.380 ⇒ 00:33:13.700 Brian Pei: Oh, Bash is like, … have you ever written, like, just in terminal?
324 00:33:14.020 ⇒ 00:33:27.439 Brian Pei: You know how in Terminal you can do, like, open folder, and then it’ll open a folder? Create file, and it creates a file. That… that’s a bash script. You can schedule that to run every day. So, like.
325 00:33:27.800 ⇒ 00:33:33.340 Brian Pei: 10 years ago, there was a computer in an office that was always on.
326 00:33:33.590 ⇒ 00:33:41.730 Brian Pei: And there would be batch scripts saved on it that’s like, open this folder of SQL files, execute the SQL files, exit out of the folder.
327 00:33:42.010 ⇒ 00:33:57.549 Brian Pei: And then, like, every morning at 8 AM, that weird computer sitting in the back of the office just runs that. There is no cloud bullshit, and there was no serverless bullshit. It was like… it was basically somebody’s work computer, but it’s just sitting in a closet, and it’s always on because it has to do shit.
328 00:33:58.790 ⇒ 00:34:01.039 Brian Pei: But we’ve come a long way, which is really good.
329 00:34:02.770 ⇒ 00:34:03.530 Annie Yu: Yeah.
330 00:34:04.830 ⇒ 00:34:09.799 Brian Pei: Okay, last but not least, and give me one second, I’m getting messages.
331 00:34:29.489 ⇒ 00:34:30.349 Brian Pei: Okay.
332 00:34:31.199 ⇒ 00:34:33.839 Brian Pei: Models. Let’s look at models, the most important thing.
333 00:34:35.040 ⇒ 00:34:35.639 Annie Yu: Yep.
334 00:34:36.190 ⇒ 00:34:38.360 Brian Pei: Okay, let’s click Sources first.
335 00:34:40.440 ⇒ 00:34:43.879 Brian Pei: This is how you call tables in dbt.
336 00:34:44.030 ⇒ 00:34:49.800 Brian Pei: It’s a YAML file. This is all in BigQuery. It’s, like, the name of the dataset and the name of the table.
337 00:34:50.620 ⇒ 00:34:54.880 Brian Pei: And that’s it. And if a new table comes into BigQuery.
338 00:34:55.030 ⇒ 00:34:56.870 Brian Pei: You have to add that table in here.
339 00:34:57.110 ⇒ 00:35:03.090 Brian Pei: So, I see Google Ads, I see Facebook ads, there’s a bunch of… Bunch of tables in here.
340 00:35:03.460 ⇒ 00:35:05.179 Brian Pei: They’re all BigQuery tables.
341 00:35:06.960 ⇒ 00:35:12.320 Brian Pei: That’s… I feel like pretty simple. … It’s a…
342 00:35:12.810 ⇒ 00:35:15.199 Brian Pei: Yeah, it’s a… it’s a file that…
343 00:35:15.600 ⇒ 00:35:22.969 Brian Pei: has all of the BigQuery databases and tables that you would need to create your models. And…
344 00:35:23.360 ⇒ 00:35:30.340 Brian Pei: Yeah, and this probably doesn’t have everything, right? I assume whoever built this just put in the tables that they need to use.
345 00:35:31.030 ⇒ 00:35:33.659 Brian Pei: But, again, if…
346 00:35:34.010 ⇒ 00:35:39.189 Brian Pei: new tables come into BigQuery that aren’t in here. All you have to do is add them. That’s it.
347 00:35:39.500 ⇒ 00:35:45.999 Brian Pei: The table name is the table name of the BigQuery table, and the dataset is the name of the database in BigQuery.
348 00:35:48.650 ⇒ 00:35:51.710 Annie Yu: So, I can find every of these
349 00:35:51.970 ⇒ 00:35:56.339 Annie Yu: in BigQuery, if I just type in, like, the name.
350 00:35:56.540 ⇒ 00:35:57.440 Annie Yu: Yep.
351 00:35:57.860 ⇒ 00:35:58.450 Brian Pei: Yep.
352 00:35:59.850 ⇒ 00:36:01.510 Brian Pei: Cool.
353 00:36:02.420 ⇒ 00:36:10.040 Brian Pei: So, now we have 3 subfolders in models. Models are the SQL files that actually run.
354 00:36:10.530 ⇒ 00:36:24.210 Brian Pei: We like to organize them in raw, intermediate, and marts. Raw is raw data, I assume, where some cleaning happens, right? It’s like…
355 00:36:24.400 ⇒ 00:36:33.459 Brian Pei: filter out this state, filter out, … oh, do cast logic, give me the columns that I want. It’s just a way…
356 00:36:33.880 ⇒ 00:36:44.900 Brian Pei: to… Organize data better, … renaming shit that all happens in base. Er, sorry, not base.
357 00:36:45.340 ⇒ 00:36:51.499 Brian Pei: Basque, whatever. … It’s… yeah, it’s just…
358 00:36:52.190 ⇒ 00:36:56.050 Brian Pei: I need this single table with
359 00:36:56.340 ⇒ 00:37:03.329 Brian Pei: different… a little bit of business logic, and then I’m gonna use this as a sub-query later, if I reference it.
360 00:37:03.470 ⇒ 00:37:08.530 Brian Pei: If I reference back order completed, I’m referencing this block of SQL.
361 00:37:08.830 ⇒ 00:37:12.490 Brian Pei: Which has, you know, some… some business logic. So that’s good.
362 00:37:13.060 ⇒ 00:37:18.310 Annie Yu: Okay, wait, … Okay, no, I’m trying to follow here, so…
363 00:37:19.900 ⇒ 00:37:29.560 Annie Yu: Okay, so we do some transformation here, like you said, so it’s usually, like, renaming cast And then… version.
364 00:37:29.560 ⇒ 00:37:39.489 Brian Pei: Or anything you want, like, adding columns together, simple arithmetic, subtraction, multiplication, changing the names of columns, so, like.
365 00:37:40.010 ⇒ 00:37:42.350 Brian Pei: Line 36 is, like.
366 00:37:42.860 ⇒ 00:37:50.550 Brian Pei: it’s a trim of some column name that’s really, really big, so I guess they renamed the column name. Stuff like that.
367 00:37:51.610 ⇒ 00:38:04.900 Brian Pei: And you’ll actually see on line 5, this is how you, select from the sources YAML. So, bask order completed, comma, order completed. If you go to sources.
368 00:38:05.520 ⇒ 00:38:06.060 Annie Yu: No.
369 00:38:06.720 ⇒ 00:38:16.869 Brian Pei: You should be… and you scroll to wherever Bask order… yeah, it’s… there it is, line 61, or line 63. So, that’s how you call a source.
370 00:38:18.300 ⇒ 00:38:24.009 Annie Yu: But there’s… okay, but there’s this name, so we call both this name and this name.
371 00:38:24.500 ⇒ 00:38:25.130 Brian Pei: Yep.
372 00:38:26.040 ⇒ 00:38:28.330 Brian Pei: Data set is… okay.
373 00:38:28.720 ⇒ 00:38:32.310 Brian Pei: Dataset is the actual name of the BigQuery database.
374 00:38:32.620 ⇒ 00:38:36.259 Brian Pei: Name can be anything. It’s your alias.
375 00:38:36.680 ⇒ 00:38:38.080 Brian Pei: You can name it anything.
376 00:38:38.210 ⇒ 00:38:41.159 Brian Pei: You can name it Basque Orders for DBT modeling.
377 00:38:41.260 ⇒ 00:38:44.470 Brian Pei: But data set and tables are the actual names in BigQuery.
378 00:38:44.850 ⇒ 00:38:46.099 Annie Yu: Okay, makes sense.
379 00:38:46.940 ⇒ 00:38:57.809 Brian Pei: Cool. So, this is just, like, a clean, here’s your source, select star from that order completed table, rename, blah blah blah blah blah, and on the bottom, I assume it’s, like, select star from renamed.
380 00:38:58.250 ⇒ 00:39:13.940 Brian Pei: There it is, select star from renamed with a qualify statement for deduping. Window function. … yeah, line 125, that’s a very, common window function to do deduplic- deduplication.
381 00:39:14.090 ⇒ 00:39:15.230 Brian Pei: …
382 00:39:15.350 ⇒ 00:39:25.910 Brian Pei: It’s the row number thing, right? Except you don’t have to do, like, R equals 1, select star from this, where R equals 1, it’s just all in one line. You’ll see that everywhere. I use it all the time.
383 00:39:26.300 ⇒ 00:39:27.430 Brian Pei: Okay.
384 00:39:27.670 ⇒ 00:39:33.510 Brian Pei: So, it’s raw, intermediate, and Martz, so now you can go to intermediate.
385 00:39:33.650 ⇒ 00:39:37.710 Brian Pei: Intermediate is, like, staging tables, where…
386 00:39:38.180 ⇒ 00:39:41.060 Brian Pei: You’re join… now you’re joining tables together.
387 00:39:42.310 ⇒ 00:39:49.100 Brian Pei: Perfect example. So… so here, now, there’s more business logic, but…
388 00:39:49.220 ⇒ 00:39:53.319 Brian Pei: Raw is usually, like, one table, and you’re cleaning it.
389 00:39:53.450 ⇒ 00:39:56.349 Brian Pei: Intermediate is where you take a bunch of tables.
390 00:39:56.750 ⇒ 00:40:03.330 Brian Pei: that I have been cleaned, and you’re joining them together, To join together data.
391 00:40:03.930 ⇒ 00:40:05.510 Brian Pei: And that usually happens here.
392 00:40:08.590 ⇒ 00:40:12.260 Annie Yu: And there’s still more transformation happening here.
393 00:40:12.830 ⇒ 00:40:17.609 Brian Pei: Oh, yeah, we picked a random one, but, like, line 26, so there…
394 00:40:17.780 ⇒ 00:40:25.560 Brian Pei: there’s soft rules, like, line 26, he’s using a source, that’s fine. Line 13, line 9, he’s using a ref.
395 00:40:25.920 ⇒ 00:40:32.100 Brian Pei: So, in the little curly brackets, if you see source, it’s coming from the source’s YAML. If you see ref.
396 00:40:32.220 ⇒ 00:40:35.210 Brian Pei: That means that that is another SQL table.
397 00:40:35.560 ⇒ 00:40:36.909 Brian Pei: somewhere in here.
398 00:40:37.080 ⇒ 00:40:46.800 Brian Pei: So, like, on line 9, if you, if you copy FACT transactions and do CTRL-T or Command-T,
399 00:40:48.190 ⇒ 00:40:53.329 Brian Pei: And then, sorry, delete the number sign, I don’t know why that’s there.
400 00:40:55.580 ⇒ 00:41:01.490 Brian Pei: Yep, so… What we were just looking at is using this.
401 00:41:02.540 ⇒ 00:41:07.260 Brian Pei: And then it just, like, goes all the way back. So you can always trace it all the way back to a source table.
402 00:41:07.410 ⇒ 00:41:08.650 Brian Pei: But yeah.
403 00:41:08.840 ⇒ 00:41:16.280 Brian Pei: SQL dbt models layer on top of each other all the time. It can be very deep.
404 00:41:16.440 ⇒ 00:41:18.590 Brian Pei: And, you know, it is what it is.
405 00:41:19.060 ⇒ 00:41:24.409 Annie Yu: So even an intermediate model can be using a MARS model.
406 00:41:24.880 ⇒ 00:41:33.510 Brian Pei: Yeah, because sometimes… so this naming structure is not, like, a dbt thing. It’s, like, an organization thing for this specific client.
407 00:41:34.160 ⇒ 00:41:52.600 Brian Pei: It’s just for organization. It can be named, like, whatever you want, but yeah, like, maybe you have a fact table, fact transactions, and a dim table, DIM customer, and you need to join them together to aggregate customer… distinct customers by whatever. That could be an intermediate table.
408 00:41:52.990 ⇒ 00:41:55.930 Brian Pei: for, like, reporting, right? You know, it’s…
409 00:41:56.540 ⇒ 00:42:00.809 Brian Pei: As long as it’s not a circular dependency, where this depends on this.
410 00:42:01.260 ⇒ 00:42:08.040 Brian Pei: But this depends on that. But it’ll tell… dbt will tell you, like, hey, this is a circular dependency, we can’t do this.
411 00:42:08.430 ⇒ 00:42:12.340 Brian Pei: Anything else is… is fair game. Anything can happen.
412 00:42:12.640 ⇒ 00:42:26.079 Brian Pei: And again, this is the first time I’m looking at this repo, so, you know, sometimes… some intermediate tables use fact and dim tables to create them, some fact tables use intermediate tables to create them.
413 00:42:26.450 ⇒ 00:42:28.409 Brian Pei: It all comes from somewhere.
414 00:42:28.620 ⇒ 00:42:32.509 Brian Pei: upstream. It’s just… it is what it is. …
415 00:42:33.580 ⇒ 00:42:41.859 Brian Pei: And then, yeah. And then, let’s see, that’s intermediate. Yeah, and then marts is usually, like, final tables.
416 00:42:42.040 ⇒ 00:42:51.910 Brian Pei: reporting tables, dimension tables, and they’re organized by… wow, there’s a lot of stuff in here. They’re organized by, business segment. So you got customers.
417 00:42:52.400 ⇒ 00:42:58.009 Brian Pei: You got sales, you got shipments, you got marketing. This is just, again, for organization.
418 00:42:58.180 ⇒ 00:42:59.010 Brian Pei: you know.
419 00:42:59.130 ⇒ 00:43:07.510 Brian Pei: This client just… they care about customers and inventory and marketing, and they’re all just in different places, because they’re all probably separate mini-projects.
420 00:43:11.280 ⇒ 00:43:16.020 Brian Pei: So… That is…
421 00:43:16.590 ⇒ 00:43:22.659 Brian Pei: everything you need to know about dbt, and I know we didn’t really paracode today, but we can pair code next time.
422 00:43:22.930 ⇒ 00:43:28.150 Brian Pei: I actually, I have to run a little bit early today, but… what I would say is…
423 00:43:28.560 ⇒ 00:43:33.890 Brian Pei: Try to follow the instructions on the right to get dbt to work in your terminal.
424 00:43:34.360 ⇒ 00:43:38.450 Brian Pei: You’ll know it works when you type in dbt
425 00:43:38.750 ⇒ 00:43:43.010 Brian Pei: dash dash version, and it tells you what version of dbt you’re on.
426 00:43:43.340 ⇒ 00:43:47.160 Brian Pei: You’ll probably have to create a virtual environment, that’s fine.
427 00:43:47.360 ⇒ 00:43:53.409 Brian Pei: you know, I think all the steps are there. Try to get dbt to run locally on your machine.
428 00:43:53.870 ⇒ 00:43:57.849 Brian Pei: And then, now that you kind of know the structure of, like, models and stuff.
429 00:43:58.030 ⇒ 00:44:02.439 Brian Pei: You can still start to write models from your checklist.
430 00:44:02.700 ⇒ 00:44:17.690 Brian Pei: Because I think we looked at enough stuff that you kind of understand. It’s not all gonna come together at once, because we went through a lot today, but the more you go through here, and it’s good that there’s already stuff in here, because you can use it as a reference.
431 00:44:18.640 ⇒ 00:44:24.760 Brian Pei: Then the more you’ll get comfortable knowing where to put business logic, and knowing where to…
432 00:44:24.910 ⇒ 00:44:41.299 Brian Pei: use a macro or whatever it is. Knowing how to call a source, which, you know, we already see, like, line 8, that’s how you call a source. It’s all… it’s all there for you, and it’s all basically just SQL, so if you’re comfortable with SQL, this should be… this should be, like, not that big of a problem.
433 00:44:41.690 ⇒ 00:44:50.670 Brian Pei: … Cool. So… yeah, next time. So, basically, try to get as far as you can on your own.
434 00:44:50.910 ⇒ 00:44:54.290 Brian Pei: And next time we can actually start
435 00:44:55.120 ⇒ 00:45:00.010 Brian Pei: testing, running dbt models in your laptop.
436 00:45:01.050 ⇒ 00:45:04.840 Brian Pei: with these commands on the right and stuff like that. So we’ll do that next time.
437 00:45:05.290 ⇒ 00:45:16.030 Annie Yu: Okay, yeah, I’ll try to get this set up. I actually tried, like, asking ChatGPT, but I couldn’t really figure it out, but I’ll try cursor this time.
438 00:45:16.030 ⇒ 00:45:20.609 Brian Pei: Cursor’s good because Cursor looked at your repo and knew that it’s BigQuery.
439 00:45:20.900 ⇒ 00:45:22.040 Annie Yu: yeah.
440 00:45:22.040 ⇒ 00:45:25.679 Brian Pei: ChatGPT wouldn’t have known that you’re using BigQuery here.
441 00:45:25.680 ⇒ 00:45:26.630 Annie Yu: Yeah, yeah.
442 00:45:26.630 ⇒ 00:45:27.180 Brian Pei: Yeah.
443 00:45:27.410 ⇒ 00:45:33.359 Brian Pei: But yeah, okay. I’m gonna run, I will… I’m on Slack, so you can also just ping me whenever you want.
444 00:45:33.750 ⇒ 00:45:35.510 Annie Yu: Yeah, thank you very much, Brian.
445 00:45:35.860 ⇒ 00:45:37.539 Brian Pei: Thank you! Have a good one.
446 00:45:37.540 ⇒ 00:45:38.900 Annie Yu: Bye.