Meeting Title: SQL-Refresher Date: 2024-06-19 Meeting participants: Jared Patterson, Nicolas Sucari, Atharv Gudi, Priyadharshini Aravindan, Uttam Kumaran, Shankar Krishna Varma
WEBVTT
1 00:00:18.920 ⇒ 00:00:27.860 Priyadharshini Aravindan: Now, like Karl and Anawan from pulling, led Drop, Banita, Ningal, and Aura to put her on the ground.
2 00:00:37.585 ⇒ 00:00:37.890 Shankar Krishna Varma: Up!
3 00:00:38.720 ⇒ 00:00:39.750 Shankar Krishna Varma: How do we get.
4 00:05:18.300 ⇒ 00:05:19.230 Uttam Kumaran: Hey? Everyone!
5 00:05:19.390 ⇒ 00:05:21.540 Uttam Kumaran: Good morning, or Good evening.
6 00:05:23.220 ⇒ 00:05:25.120 Shankar Krishna Varma: Good evening, good morning!
7 00:05:25.400 ⇒ 00:05:26.220 Atharv Gudi: Hello!
8 00:05:26.220 ⇒ 00:05:26.880 JARED PATTERSON: Hope.
9 00:05:28.430 ⇒ 00:05:29.495 Uttam Kumaran: How’s everyone doing?
10 00:05:32.440 ⇒ 00:05:34.050 JARED PATTERSON: Doing well. Yourself?
11 00:05:34.750 ⇒ 00:05:35.800 Priyadharshini Aravindan: Yeah. Hi, good.
12 00:05:37.760 ⇒ 00:05:38.580 Uttam Kumaran: Aye
13 00:05:39.100 ⇒ 00:05:48.990 Uttam Kumaran: Cool. Today I just wanted to do a brief call about SQL and do a little SQL refresher. I know some folks
14 00:05:49.386 ⇒ 00:05:51.319 Uttam Kumaran: on the team are doing
15 00:05:52.913 ⇒ 00:05:59.706 Uttam Kumaran: a little bit of an introduction to SQL on SQLZoo, so I just wanted to take some time and answer any questions.
16 00:06:00.080 ⇒ 00:06:09.452 Uttam Kumaran: I’ll also give you an overview of some of the SQL that we’re writing for clients.
17 00:06:09.990 ⇒ 00:06:35.150 Uttam Kumaran: And Jared, I know you’re mainly helping me on the sales side, so we’ll come back to SQLZoo and stuff, but I think this recording will probably be helpful. Atharv, you have familiarity with SQL already, so any questions I can answer there, I’m more than happy to. So I guess Priya, Akshay, and Atharv, how did it go
18 00:06:35.200 ⇒ 00:06:36.559 Uttam Kumaran: with SQLZoo?
19 00:06:40.570 ⇒ 00:06:47.569 Atharv Gudi: I thought it wasn’t bad. I got a pretty good refresher. I think I’m familiar with most of what SQLZoo’s been able to
20 00:06:48.150 ⇒ 00:06:49.090 Atharv Gudi: teach.
21 00:06:49.590 ⇒ 00:06:50.170 Uttam Kumaran: Okay.
22 00:06:52.400 ⇒ 00:07:07.240 Priyadharshini Aravindan: Yeah, same with me. I mean, it’s been very long since I did any SQL, but it has been a good refresher. And thankfully I was able to remember most of the syntax, so it was easy also.
23 00:07:07.240 ⇒ 00:07:07.910 Uttam Kumaran: Okay.
24 00:07:08.350 ⇒ 00:07:09.310 Uttam Kumaran: Okay. Great.
25 00:07:12.140 ⇒ 00:07:12.909 Uttam Kumaran: Akshay?
26 00:07:13.610 ⇒ 00:07:14.719 Akshay kumar.G: Yeah, it was good, like,
27 00:07:14.790 ⇒ 00:07:21.149 Akshay kumar.G: awesome. I also learned SQL back in school, around 2 years ago. So actually, the website is complete.
28 00:07:21.480 ⇒ 00:07:23.273 Akshay kumar.G: Yeah, like, I’m able to grasp things.
29 00:07:24.420 ⇒ 00:07:32.749 Uttam Kumaran: Was there anything in particular that was interesting that you learned, or something you didn’t know before that you kind of learned through SQLZoo?
30 00:07:33.890 ⇒ 00:07:37.400 Akshay kumar.G: Yeah, like in my school days, they didn’t teach to that depth, like
31 00:07:37.630 ⇒ 00:07:44.779 Akshay kumar.G: they didn’t go as far as joins and select within select. So those things I learned.
32 00:07:45.640 ⇒ 00:07:46.490 Uttam Kumaran: Okay.
33 00:07:53.230 ⇒ 00:07:54.819 Nicolas Sucari: Hi guys morning.
34 00:07:55.470 ⇒ 00:07:55.870 Uttam Kumaran: There you go!
35 00:07:55.870 ⇒ 00:07:58.080 Nicolas Sucari: Tuning. I don’t know. Yeah.
36 00:07:58.710 ⇒ 00:07:59.630 Nicolas Sucari: save them.
37 00:08:00.020 ⇒ 00:08:08.176 Uttam Kumaran: Hey. Okay, so let me think about a good example to share and walk through.
38 00:08:08.957 ⇒ 00:08:10.749 Uttam Kumaran: Give me one second.
39 00:08:21.570 ⇒ 00:08:24.200 Uttam Kumaran: So here is.
40 00:08:25.160 ⇒ 00:08:30.153 Uttam Kumaran: here’s an example of some production code that we have for
41 00:08:30.710 ⇒ 00:08:35.159 Uttam Kumaran: this software called Shopify. Shopify is used by a client of ours
42 00:08:35.735 ⇒ 00:08:40.894 Uttam Kumaran: to basically measure their business. And in this
43 00:08:41.559 ⇒ 00:08:51.347 Uttam Kumaran: SQL file, we’re using a software called dbt to basically execute the SQL. So some stuff will look like SQL, some stuff will not.
44 00:08:51.770 ⇒ 00:08:55.599 Uttam Kumaran: The other thing is that
45 00:08:55.850 ⇒ 00:09:25.290 Uttam Kumaran: the tough part about SQL is actually not writing all the functions and things like that. It’s actually knowing what to write. And it takes a really, really strong understanding of the business, because basically you’re modeling business concepts in data. So this is why I really enjoy data: you need to have a really good understanding of how the business works. I used to work with the business leaders to basically model out their business in data.
46 00:09:25.350 ⇒ 00:09:31.583 Uttam Kumaran: Modeling things out means using all those different functions, all the different joins, and selecting the right fields.
47 00:09:32.394 ⇒ 00:09:37.210 Uttam Kumaran: And so, in this file alone, you can see that we’re
48 00:09:37.592 ⇒ 00:09:55.097 Uttam Kumaran: using several different functions. All of our SQL runs on Snowflake, and Snowflake does have some different functions than what you may have found in SQLZoo, but you can see things like converting time zones, doing mins and maxes. We have sums,
49 00:09:57.700 ⇒ 00:10:00.210 Uttam Kumaran: we’re doing different expressions.
50 00:10:01.800 ⇒ 00:10:16.280 Uttam Kumaran: and this file really has everything, right? We also have comments. One of the things that we try to do, although we don’t do a great job at it, is adding comments, so when we come to look at a file, we kind of understand what’s going on.
51 00:10:16.789 ⇒ 00:10:21.220 Uttam Kumaran: And you can see that there are some comments
52 00:10:21.790 ⇒ 00:10:29.159 Uttam Kumaran: that are at the actual column level, explaining a little bit about why a certain thing is the way it is.
53 00:10:29.565 ⇒ 00:10:33.710 Uttam Kumaran: So this is an example of production code, where we are
54 00:10:33.770 ⇒ 00:10:41.910 Uttam Kumaran: bringing in some data from multiple different sources. We’re selecting different things, changing expressions, doing unions,
55 00:10:42.400 ⇒ 00:10:53.169 Uttam Kumaran: and then at the end, consolidating everything into one table, again continuing to use CASE WHENs, use different expressions,
56 00:10:53.792 ⇒ 00:10:57.660 Uttam Kumaran: use a variety of functions, string concatenation,
57 00:10:58.287 ⇒ 00:11:01.450 Uttam Kumaran: a whole bunch of stuff to basically get our final table.
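As a rough illustration of the pattern described above (union several sources, then build the final table with CASE WHEN and string concatenation), here is a minimal sketch. It is not the production model shown on the call: the table names, columns, sample rows, and the 100 threshold are all hypothetical, and it uses Python's built-in SQLite instead of Snowflake and dbt so that it runs anywhere.

```python
import sqlite3

# Hypothetical raw tables standing in for two order sources (not the client's schema).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE web_orders (order_id INTEGER, first_name TEXT, last_name TEXT, amount INTEGER);
    CREATE TABLE store_orders (order_id INTEGER, first_name TEXT, last_name TEXT, amount INTEGER);
    INSERT INTO web_orders VALUES (1, 'Ana', 'Lopez', 250), (2, 'Ben', 'Stone', 40);
    INSERT INTO store_orders VALUES (3, 'Cara', 'Diaz', 120);
""")

# Union the sources first, then derive columns on the consolidated rows.
query = """
    WITH all_orders AS (
        SELECT 'web' AS channel, order_id, first_name, last_name, amount FROM web_orders
        UNION ALL
        SELECT 'store' AS channel, order_id, first_name, last_name, amount FROM store_orders
    )
    SELECT
        order_id,
        channel,
        first_name || ' ' || last_name AS customer_name,  -- string concatenation
        CASE WHEN amount >= 100 THEN 'large' ELSE 'small' END AS order_size  -- assumed threshold
    FROM all_orders
    ORDER BY order_id
"""
rows = conn.execute(query).fetchall()
for row in rows:
    print(row)
```

In a dbt project the SELECT would typically live in its own model file; the Python wrapper here only exists to make the SQL executable on its own.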
58 00:11:01.530 ⇒ 00:11:02.590 Uttam Kumaran: So
59 00:11:02.620 ⇒ 00:11:27.070 Uttam Kumaran: on the left is an example of one of our production repositories. So yesterday I mentioned to everybody in the channel, if you don’t mind, send me your GitHub username so I can begin to add you, and you guys can begin to poke around at some of these repositories. But basically, in here we have all of our different
60 00:11:27.540 ⇒ 00:11:32.470 Uttam Kumaran: SQL files that run the reporting for this company. So
61 00:11:32.840 ⇒ 00:11:49.440 Uttam Kumaran: I don’t know, maybe there’s like a hundred or 200 files. But there’s a structure to the way these folders are organized that everyone will learn as we go along. I wanted to give you a little bit of a sense that we have SQL that’s as simple as this,
62 00:11:49.480 ⇒ 00:11:51.420 Uttam Kumaran: just like a simple select
63 00:11:51.440 ⇒ 00:11:54.089 Uttam Kumaran: from like a simple source
64 00:11:54.140 ⇒ 00:11:55.135 Uttam Kumaran: to
65 00:11:57.640 ⇒ 00:12:01.030 Uttam Kumaran: to something like, let’s see
66 00:12:04.990 ⇒ 00:12:12.641 Uttam Kumaran: to something like this, where we’re bringing a lot of data together from multiple different sources for a reporting table that goes directly to
67 00:12:13.320 ⇒ 00:12:15.920 Uttam Kumaran: like the CEO and the and the CEO.
68 00:12:17.740 ⇒ 00:12:23.560 Uttam Kumaran: So yeah, that kind of shows the breadth. I guess today I really wanted to focus on
69 00:12:23.600 ⇒ 00:12:25.100 Uttam Kumaran: different.
70 00:12:25.150 ⇒ 00:12:27.069 Uttam Kumaran: you know. SQL
71 00:12:27.860 ⇒ 00:12:33.825 Uttam Kumaran: functions that we use, and kind of where to find them. So if I go to Snowflake functions,
72 00:12:35.045 ⇒ 00:12:51.189 Uttam Kumaran: a lot of what we use is coming from the Snowflake documentation. So this is kind of what I learned over time, just doing data work in the real world: leveraging these functions to really speed up
73 00:12:51.632 ⇒ 00:13:06.917 Uttam Kumaran: and help to do different calculations. So under the Snowflake documentation, if you just go to docs.snowflake.com, you’ll see all this. And basically they have a lot of great introductions that I’ll be assigning to each of you guys
74 00:13:07.250 ⇒ 00:13:21.047 Uttam Kumaran: that will teach you how to log in to Snowflake, leverage Snowflake, and write some data models. But in here there are really some great tutorials and some great ability to actually
75 00:13:21.590 ⇒ 00:13:34.469 Uttam Kumaran: create tables and leverage functions. And all the function references are basically here in the docs. So you can see everything from aggregate functions like counts, but also mins, maxes,
76 00:13:34.560 ⇒ 00:13:44.160 Uttam Kumaran: a ton of different logic, some of which you may use, some of which are pretty narrow in their use case, some statistics-related functions,
77 00:13:45.590 ⇒ 00:13:57.973 Uttam Kumaran: and then also expressions about dates, conversions, geospatial-related functions, numeric, vector, string. So there’s a lot of great functions that we use.
78 00:13:58.600 ⇒ 00:14:17.009 Uttam Kumaran: So learning all these functions and learning when to use them will come over time, and will come by working with folks that are more senior, like myself and other folks on the team, because you’ll be trying to do something, and I’ll mention that, hey, there’s a function that really easily does that. But we leverage a ton of different functions.
79 00:14:17.359 ⇒ 00:14:26.870 Uttam Kumaran: And Snowflake is always coming out with new functions. So I basically read every Snowflake release that comes out, and I’m always sharing with the team
80 00:14:27.650 ⇒ 00:14:30.919 Uttam Kumaran: what’s the latest and greatest that we could leverage.
81 00:14:31.433 ⇒ 00:14:36.579 Uttam Kumaran: So the basic thing that I’ll do after today is, one, again,
82 00:14:36.988 ⇒ 00:14:42.009 Uttam Kumaran: just confirm that everybody is able to get into GitHub and 1Password.
83 00:14:43.023 ⇒ 00:14:48.530 Uttam Kumaran: I know, Jared, you sent me your GitHub. Was everyone else able to
84 00:14:48.670 ⇒ 00:14:52.679 Uttam Kumaran: set that up? If not, maybe we could just do that on the call, so I can add everybody
85 00:14:53.310 ⇒ 00:14:54.889 Uttam Kumaran: into that right now.
86 00:15:06.335 ⇒ 00:15:08.890 Uttam Kumaran: Priya, Atharv, Akshay, how about you guys?
87 00:15:10.760 ⇒ 00:15:11.430 Uttam Kumaran: I’m so.
88 00:15:11.430 ⇒ 00:15:14.540 Atharv Gudi: And my Github, I’ll let you know as soon as I can do that.
89 00:15:23.270 ⇒ 00:15:27.000 Uttam Kumaran: Akshay, Priya, were you able to create a GitHub account?
90 00:15:32.207 ⇒ 00:15:35.550 Priyadharshini Aravindan: No, not yet. I’ll do it now. Yeah.
91 00:15:36.130 ⇒ 00:15:40.389 Uttam Kumaran: Yeah, if you could create the GitHub account and also log into 1Password,
92 00:15:41.227 ⇒ 00:15:43.019 Uttam Kumaran: That would be
93 00:15:44.615 ⇒ 00:15:45.660 Uttam Kumaran: amazing.
94 00:15:49.440 ⇒ 00:15:53.432 Uttam Kumaran: The other thing that I’m gonna do today is create,
95 00:15:54.000 ⇒ 00:16:00.099 Uttam Kumaran: for the folks on the analyst side, a little bit of a next-steps list on learning Snowflake,
96 00:16:00.403 ⇒ 00:16:08.599 Uttam Kumaran: so I’ll be sending that out. The other thing is that I’ve spoken to folks on the team, so we’ll begin to pair you up
97 00:16:09.275 ⇒ 00:16:11.030 Uttam Kumaran: with folks internally
98 00:16:11.486 ⇒ 00:16:17.900 Uttam Kumaran: that you can bounce questions off of, and then next week we’ll plan on actually
99 00:16:18.350 ⇒ 00:16:25.499 Uttam Kumaran: taking on some tickets and some real-world problems that will actually go into production.
100 00:16:25.950 ⇒ 00:16:27.770 Uttam Kumaran: So we can begin working on that.
101 00:16:28.099 ⇒ 00:16:35.760 Uttam Kumaran: I think, Atharv, you have a couple more things that you’re working on, so I’ll let you continue to run through dbt and things like that.
102 00:16:36.157 ⇒ 00:16:59.930 Uttam Kumaran: And then for Priya and Akshay, I think it’s great that you guys quickly picked up SQL. That’s helpful for me to know so that I can continue to throw a couple of other things at you. In order to get you guys into the system in terms of tickets, I’d really need GitHub. So as soon as you guys can, get that and 1Password so that I can share some credentials to log into Snowflake.
103 00:17:00.583 ⇒ 00:17:25.199 Uttam Kumaran: Other than that, I sent a note about putting Brain Forge on the resume. If anyone needs any sort of help with doing that, let me know. Just for context, on LinkedIn, I’m gonna have everyone wait for maybe a week or so. We’re gonna do a little bit of a marketing push on LinkedIn. So I just want folks to
104 00:17:25.680 ⇒ 00:17:35.070 Uttam Kumaran: wait a sec before updating their LinkedIn so that we can drive some traffic to the website, and you can share out a little bit about what Brain Forge does.
105 00:17:37.210 ⇒ 00:17:38.770 Uttam Kumaran: Apart from that,
106 00:17:39.956 ⇒ 00:17:42.800 Uttam Kumaran: I didn’t have anything else. Really, I
107 00:17:43.180 ⇒ 00:17:51.269 Uttam Kumaran: was hoping to answer some SQL questions, but you guys basically feel good about it, and I feel good. If there’s anything else I can show in the repository,
108 00:17:51.603 ⇒ 00:17:57.970 Uttam Kumaran: let me know. I’m happy to walk through it. It’s a little bit specific to dbt, but yeah, let me know.
109 00:18:02.170 ⇒ 00:18:04.410 JARED PATTERSON: Yeah, I was curious for, oh.
110 00:18:04.630 ⇒ 00:18:05.330 JARED PATTERSON: sorry. Yeah.
111 00:18:05.330 ⇒ 00:18:07.939 Nicolas Sucari: No, no! Go ahead. Go ahead, Jared. Yeah. I’ll go.
112 00:18:08.350 ⇒ 00:18:12.280 JARED PATTERSON: I was just interested in seeing, kinda,
113 00:18:12.590 ⇒ 00:18:16.570 JARED PATTERSON: and I guess it’s pretty complex, but like a basic version of, like,
114 00:18:16.730 ⇒ 00:18:27.970 JARED PATTERSON: okay, so when we’re first starting out with a client, this is what we’re doing with SQL, this is what we’re doing with Snowflake, and just kind of seeing how that process works out. I don’t necessarily need to see all the code and stuff. But
115 00:18:28.360 ⇒ 00:18:35.139 JARED PATTERSON: for someone like me, I’ve learned SQL, I’ve been teaching myself, but the Snowflake and dbt stuff is all pretty
116 00:18:35.440 ⇒ 00:18:37.590 JARED PATTERSON: foreign to me. So just, yeah.
117 00:18:38.630 ⇒ 00:18:44.296 Uttam Kumaran: Yeah. So let’s walk through that. Great question. I’ll go ahead and just pull up.
118 00:18:47.360 ⇒ 00:18:47.860 Uttam Kumaran: I’ll go.
119 00:18:47.860 ⇒ 00:18:48.540 Nicolas Sucari: So.
120 00:18:49.430 ⇒ 00:18:56.419 Nicolas Sucari: Tell me if you have a diagram of how we are connecting the data from the clients to Snowflake. That should probably help too.
121 00:18:56.940 ⇒ 00:18:57.360 Uttam Kumaran: Yeah.
122 00:18:57.360 ⇒ 00:18:59.403 Nicolas Sucari: I don’t know if we have something like that.
123 00:19:00.130 ⇒ 00:19:04.570 Uttam Kumaran: Yeah, I think. I think I might have something. Hold on.
124 00:19:05.310 ⇒ 00:19:05.900 Nicolas Sucari: Okay.
125 00:19:24.490 ⇒ 00:19:26.049 Uttam Kumaran: Give me one second.
126 00:20:16.380 ⇒ 00:20:23.907 Uttam Kumaran: So I just shared this diagram in Slack. This is a good diagram, although, good point,
127 00:20:26.320 ⇒ 00:20:32.113 Uttam Kumaran: good point, Nico. Maybe we could prepare a little presentation on
128 00:20:32.580 ⇒ 00:20:57.569 Uttam Kumaran: data movement. But here’s one diagram, and here’s one more, that share a little bit at a high level and at the detail level how we actually do data movement and what pieces touch which part. If you look at the second picture that I sent, it’s a little bit more helpful. Basically, what you’re seeing is that we have a lot of different data sources.
129 00:20:57.590 ⇒ 00:21:18.849 Uttam Kumaran: So we have different databases that we’re bringing in. We have different files and applications, applications that may be stuff they use for transaction processing, running marketing, collecting customer information. We bring all that in using this tool called Fivetran. We also bring some data in custom, using our own pipelines, but for the most part we use Fivetran.
130 00:21:19.222 ⇒ 00:21:42.289 Uttam Kumaran: Basically, it’s a managed service that pulls data from each of those APIs, structures it, and drops it in Snowflake really nicely. And I’ll show you where that lands in Snowflake. All that data gets put into Snowflake, and then we use dbt to transform it. Transforming means we may get a table that’s every single transaction,
131 00:21:42.470 ⇒ 00:22:00.550 Uttam Kumaran: and we may get a table with every customer. Let’s say we want to answer a simple question like, which customer spent the most? We first need to join those 2 tables together, and then we probably need to take customer ID and then take the sum of spend. Right? That’s data modeling.
132 00:22:00.640 ⇒ 00:22:08.079 Uttam Kumaran: You’re doing a join, and then you’re running a select, and you’re doing some aggregation. That’s all the stuff that’s going on here.
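The "which customer spent the most" question above can be sketched as a join plus an aggregation. This is only an illustration under assumptions: the `customers` and `transactions` tables, their columns, and the sample rows are made up, and SQLite (via Python's standard library) stands in for Snowflake.

```python
import sqlite3

# Hypothetical tables: one row per customer, one row per transaction.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (customer_id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE transactions (transaction_id INTEGER PRIMARY KEY, customer_id INTEGER, amount INTEGER);
    INSERT INTO customers VALUES (1, 'Ana'), (2, 'Ben');
    INSERT INTO transactions VALUES (10, 1, 50), (11, 2, 30), (12, 1, 25), (13, 2, 20);
""")

# Join the two tables, group by customer, and sum the spend.
top_spenders = conn.execute("""
    SELECT c.customer_id, c.name, SUM(t.amount) AS total_spend
    FROM customers AS c
    JOIN transactions AS t ON t.customer_id = c.customer_id
    GROUP BY c.customer_id, c.name
    ORDER BY total_spend DESC
""").fetchall()

# The first row is the customer who spent the most.
top_customer = top_spenders[0]
print(top_customer)
```

Sorting the full result rather than using `LIMIT 1` keeps the ranking visible, which is usually what a reporting table needs anyway.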
133 00:22:08.160 ⇒ 00:22:35.522 Uttam Kumaran: The orchestration piece that you see at the bottom is actually having that run every day. Right? You’re not only looking at that at a point in time. You wanna look at it every day, for all historical days. So there’s an orchestration piece of, like, this runs every 6 hours, or this runs every day, and we handle that as well. And then, once that data is modeled, we actually get it out of Snowflake. So
134 00:22:35.830 ⇒ 00:23:00.740 Uttam Kumaran: sometimes you can use this for data science, or people use the data within Snowflake. For the most part our clients use it on the business end, using business intelligence tools, which we’ll also be walking through, like Rill, like Looker, like Tableau. So that’s actually what they’re consuming: graphs, charts, and data points in a visualization tool. And that is our actual end deliverable.
135 00:23:00.740 ⇒ 00:23:15.107 Uttam Kumaran: However, most of the work happens before that. So that’s probably like 20% of the work; 80% of the work happens before we even get to that stage, which makes this job good and bad. One, it’s good in that
136 00:23:16.080 ⇒ 00:23:20.840 Uttam Kumaran: we don’t have to explain too much about how some of these
137 00:23:21.000 ⇒ 00:23:24.260 Uttam Kumaran: back-office things work all the time.
138 00:23:24.410 ⇒ 00:23:25.720 Uttam Kumaran: However,
139 00:23:25.770 ⇒ 00:23:30.429 Uttam Kumaran: one small issue in the middle here could affect
140 00:23:30.590 ⇒ 00:23:53.119 Uttam Kumaran: what ends up in the baked cake in the BI tool. And so we have different ways that we run observability, and we do different things around making sure that data isn’t stale, data is full, and calculations are correct. But that’s kind of our challenge: we run things all the way from
141 00:23:53.120 ⇒ 00:24:05.038 Uttam Kumaran: the raw data, to modeled in the warehouse, to then the BI tool, and then are able to have a conversation with the client about what we’re doing. So here in Snowflake,
142 00:24:06.110 ⇒ 00:24:09.179 Uttam Kumaran: what you’ll see is on the left here,
143 00:24:09.620 ⇒ 00:24:13.800 Uttam Kumaran: and again, everyone will walk through the introduction to Snowflake
144 00:24:14.234 ⇒ 00:24:21.730 Uttam Kumaran: kind of course, and you’ll become familiar with all this. So on the left here we have our databases.
145 00:24:22.073 ⇒ 00:24:29.729 Uttam Kumaran: And here’s where we have our Fivetran database. This database has a bunch of schemas, so you can think of these as just subfolders,
146 00:24:30.130 ⇒ 00:24:39.720 Uttam Kumaran: and then within a schema, you have data. So let’s say we’re looking at Shopify. From Shopify, we have all these tables that come in raw,
147 00:24:40.320 ⇒ 00:24:45.550 Uttam Kumaran: raw meaning these are just simply representations of the data that Shopify has.
148 00:24:45.690 ⇒ 00:24:53.759 Uttam Kumaran: In this format, it’s not really helpful for us, because yes, I can see everything from discount codes to customers, but
149 00:24:53.860 ⇒ 00:25:06.999 Uttam Kumaran: I need those joins, and I need that sort of modeling, in order to make it easy for the analysts on the team to represent it in a business intelligence tool. So, for example, if I was to look at,
150 00:25:07.900 ⇒ 00:25:11.179 Uttam Kumaran: let’s look at, for example, the Facebook ads table.
151 00:25:11.390 ⇒ 00:25:13.892 Uttam Kumaran: And in here I can look at
152 00:25:14.780 ⇒ 00:25:16.449 Uttam Kumaran: like, ad history.
153 00:25:16.660 ⇒ 00:25:23.100 Uttam Kumaran: What I can do here is just preview this table. And you can see that, okay, great, there’s a bunch of different information here.
154 00:25:23.420 ⇒ 00:25:24.430 Uttam Kumaran: This is,
155 00:25:24.540 ⇒ 00:25:33.218 Uttam Kumaran: you know, broadly, you could think of this as just like an Excel spreadsheet. There’s a bunch of data here that comes from Facebook that gets updated every single day.
156 00:25:33.630 ⇒ 00:25:59.320 Uttam Kumaran: The challenge is, again, every single one of these tables is in a different domain. We’re looking at everything from marketing data to supply chain data to inventory to customer service, to sales, to revenue, to product costs, like the entire business. Right? So, in part, on the analytics engineering side, you really need to have a good understanding of
157 00:25:59.390 ⇒ 00:26:08.739 Uttam Kumaran: the business, or you need to have a stakeholder in the business that can really help you think through how to model data for their use case.
158 00:26:08.880 ⇒ 00:26:21.130 Uttam Kumaran: I would say the nice thing is, I’ve been doing marketing data and stuff for a long time, so it’s not that hard for me anymore to think through marketing data. And that’s actually part of the onus of starting Brain Forge:
159 00:26:21.170 ⇒ 00:26:38.610 Uttam Kumaran: with marketing data and sales data, a lot of the way you model it is actually the same across businesses. Meaning every business that takes in money models it as transactions, and summing up transactions to a total is the same thing.
160 00:26:38.610 ⇒ 00:26:53.820 Uttam Kumaran: So there are commonalities between every business, right? So everybody has sales. Everybody may have customer service. Everybody uses Facebook or Instagram or something to run marketing ads. So in those ways, a lot of these businesses aren’t unique.
161 00:26:53.900 ⇒ 00:26:59.400 Uttam Kumaran: meaning they’re using the same vendors to run the same types of things. And we get the same data out.
162 00:26:59.510 ⇒ 00:27:27.770 Uttam Kumaran: The things that are unique are, for example, with this client, they have specific warehouses that they use to send their products out. Not every company has those warehouses and that way the data is shaped. So there is some stuff that’s unique. But I would say, on average, it’s usually like 60/40 or 70/30, not unique to unique, meaning I’ve seen it before, or someone on the team has seen it before, versus something that’s unique.
163 00:27:28.055 ⇒ 00:27:32.480 Uttam Kumaran: And I know, Jared, we’ve been talking about this. That’s the challenge on the sales side:
164 00:27:32.640 ⇒ 00:27:47.910 Uttam Kumaran: a lot of people come to us with problems that they think are very unique to them. And so on the sales side, we do think about how we craft a story or show case studies that we’ve tackled things that they need. But frankly, on the data side,
165 00:27:47.930 ⇒ 00:27:49.722 Uttam Kumaran: it’s the same thing.
166 00:27:50.633 ⇒ 00:28:17.320 Uttam Kumaran: So really, a lot of the way I learned is I just got thrown into working on data in real estate, working on data in finance, and you just see these common patterns. Data is really a lot of pattern matching. And so a lot of the tasks that we’ll start with will be just basic changes to small tables and understanding the engineering process. But also, I wanna get you guys into Snowflake as soon as possible this week,
167 00:28:17.639 ⇒ 00:28:22.729 Uttam Kumaran: so that you guys can begin creating tables and running selects and playing with some sample data.
168 00:28:25.460 ⇒ 00:28:47.170 Uttam Kumaran: Yeah, I think it would be helpful, probably, to do a little bit of an intro to Snowflake as soon as you guys are also able to run through their introduction course. And then I also wanna take you guys down this actual path, basically getting it into Snowflake.
169 00:28:47.280 ⇒ 00:29:03.339 Uttam Kumaran: I’ll show you guys Fivetran, and then also begin to introduce you to the BI tools. That way you’ll basically see how the whole cake is baked. Again, the scope of this is very large, so I don’t want to overwhelm you with how many domains we’re working on, how many tables there are,
170 00:29:03.400 ⇒ 00:29:07.789 Uttam Kumaran: but basically give you the sense of: this is where data is coming from, here’s how it lands,
171 00:29:07.820 ⇒ 00:29:18.490 Uttam Kumaran: here’s how we’re modeling it, and then here’s how it leaves and gets into the client’s hands. And to even give you an example of something that we’re doing today with a client,
172 00:29:18.520 ⇒ 00:29:21.151 Uttam Kumaran: so today we are
173 00:29:22.800 ⇒ 00:29:41.989 Uttam Kumaran: today, we’re working with a client on an analysis of their customer segmentation. Basically, one of our clients has 2 customer types. They have one that’s consumers, meaning consumers like me and you, and they have another group that’s more professional, meaning they’re buying on behalf of a business.
174 00:29:41.990 ⇒ 00:29:55.220 Uttam Kumaran: The customer asked us, hey, can you guys find a way to segment these? Meaning, can you guys learn: are there opportunities where we can target these customers differently? Do these customers have different buying patterns?
175 00:29:55.517 ⇒ 00:30:12.202 Uttam Kumaran: Does one sell more? Do they live in different places? And so today we’re actually doing a presentation for them. What we did is we went all the way down to the source data. We found that when customers check out, they ask the customers, hey, are you guys businesses or are you guys consumers?
176 00:30:12.510 ⇒ 00:30:18.159 Uttam Kumaran: We also looked at their email. So we looked at, hey, are these business emails? Are these like Gmails or Yahoos?
177 00:30:18.490 ⇒ 00:30:25.630 Uttam Kumaran: The third thing is we look at whether they’re buying frequently. If they’re buying frequently, for this specific client and what they sell,
178 00:30:25.670 ⇒ 00:30:54.599 Uttam Kumaran: we know that they’re not a usual person, because some of these are big items. They don’t really have a lot of frequent buyers. And so we use that to basically create a CASE WHEN where it’s a flag: you are a business owner, or you’re not. And then we’re able to take that flag and segment all of our reporting by that segment, meaning create a dimension that’s that flag, and basically say, okay, how much of sales is attributed to businesses versus not? What is the average order value
179 00:30:54.600 ⇒ 00:31:07.130 Uttam Kumaran: for folks that come from businesses or not? And we noticed, and again, very nicely, that people that were businesses spend twice as much. They accounted for about 70% of sales
180 00:31:07.140 ⇒ 00:31:26.920 Uttam Kumaran: while being less than 50% of the entire amount of orders, meaning they’re buying more, they account for more sales, they come more often and buy. And really, the conversation we’re gonna have today with the client is, one, establishing a baseline, saying, hey, here’s what we found in the data. The second thing we’re gonna do
181 00:31:27.190 ⇒ 00:31:51.720 Uttam Kumaran: is try to provide them with opportunities to learn more, right? What are other questions, follow-up questions, that we can ask about the data? Are these folks concentrated in a specific area? Are there like 20 or 30 businesses that really buy the bulk of the business orders? The other thing I wanted to do is, we’re gonna have the head of marketing on that call, so I’m gonna talk to her about, hey, how are we targeting these people differently?
182 00:31:51.720 ⇒ 00:32:04.129 Uttam Kumaran: Are we? If we are, how are we targeting them differently? And basically the goal is, how do we grow that segment of customers, right? Because if we notice that one of those people is equivalent to 4
183 00:32:04.260 ⇒ 00:32:21.280 Uttam Kumaran: or 5 consumers, it makes so much sense to go after those folks, right? And so here’s an example where we’re even involved in the business decision of what happens after we get that data. We’re sometimes involved like that, sometimes we’re not. But often
184 00:32:21.360 ⇒ 00:32:28.804 Uttam Kumaran: we have the most context about a problem, because we’ve seen all the data associated with it, and we can have that conversation with a client. So
185 00:32:29.120 ⇒ 00:32:33.010 Uttam Kumaran: that’s an example of actually a conversation that’s happening today.
186 00:32:33.730 ⇒ 00:32:44.079 Uttam Kumaran: But again, I wanted to give you guys the full scope of a problem, where we went from bringing that data in from Shopify, finding and creating that flag, and then actually doing some analysis.
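The segmentation walkthrough above (a CASE WHEN flag built from the checkout answer, the email domain, and purchase frequency, then reporting split by that flag) might look roughly like the sketch below. Everything specific is an assumption for illustration: the table and column names, the "3 or more orders" cutoff, the Gmail/Yahoo check, and the sample rows are invented and are not the client's data or the actual rule used. SQLite via Python's standard library stands in for Snowflake.

```python
import sqlite3

# Hypothetical source tables and sample data (amounts are whole currency units).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (customer_id INTEGER PRIMARY KEY, email TEXT, checkout_type TEXT);
    CREATE TABLE orders (order_id INTEGER PRIMARY KEY, customer_id INTEGER, amount INTEGER);
    INSERT INTO customers VALUES
        (1, 'ops@acmebuild.com', 'business'),
        (2, 'jo@gmail.com', 'consumer'),
        (3, 'sam@yahoo.com', NULL),
        (4, 'kim@gmail.com', NULL);
    INSERT INTO orders VALUES
        (1, 1, 400), (2, 1, 300),
        (3, 2, 100),
        (4, 3, 50),
        (5, 4, 100), (6, 4, 100), (7, 4, 150);
""")

segment_summary = conn.execute("""
    WITH order_counts AS (
        SELECT customer_id, COUNT(*) AS order_count
        FROM orders
        GROUP BY customer_id
    ),
    flagged AS (
        SELECT
            c.customer_id,
            CASE
                WHEN c.checkout_type = 'business' THEN 1        -- self-reported at checkout
                WHEN c.email NOT LIKE '%@gmail.com'
                 AND c.email NOT LIKE '%@yahoo.com' THEN 1      -- not a personal email domain
                WHEN oc.order_count >= 3 THEN 1                 -- assumed "frequent buyer" cutoff
                ELSE 0
            END AS is_business
        FROM customers AS c
        JOIN order_counts AS oc ON oc.customer_id = c.customer_id
    )
    -- Use the flag as a dimension and compare the two segments.
    SELECT
        f.is_business,
        COUNT(*) AS orders,
        SUM(o.amount) AS revenue,
        AVG(o.amount) AS avg_order_value
    FROM orders AS o
    JOIN flagged AS f ON f.customer_id = o.customer_id
    GROUP BY f.is_business
    ORDER BY f.is_business DESC
""").fetchall()

for row in segment_summary:
    print(row)
```

Keeping the flag in its own CTE means every downstream report can join to the same definition, so changing a rule (for example the frequency cutoff) changes all segment metrics consistently.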
187 00:32:51.750 ⇒ 00:32:53.599 Uttam Kumaran: Great. Any questions?
188 00:32:57.290 ⇒ 00:33:00.965 JARED PATTERSON: No, thank you. That kind of cleared up my questions.
189 00:33:01.560 ⇒ 00:33:06.090 JARED PATTERSON: I just really wanted to see the process of it all, how it starts and how it finishes type thing.
190 00:33:06.530 ⇒ 00:33:07.200 Uttam Kumaran: Yeah, and I know.
191 00:33:07.200 ⇒ 00:33:07.920 Nicolas Sucari: It’s not me.
192 00:33:08.340 ⇒ 00:33:10.460 Uttam Kumaran: It’s not easy yet to see
193 00:33:10.480 ⇒ 00:33:12.115 Uttam Kumaran: it in Snowflake.
194 00:33:12.810 ⇒ 00:33:14.481 Uttam Kumaran: but we’ll get there. We’ll get there.
195 00:33:15.760 ⇒ 00:33:21.509 Nicolas Sucari: I’m new to Snowflake, too. But I am finding it really easy
196 00:33:21.570 ⇒ 00:33:30.790 Nicolas Sucari: to understand. I mean, you have the worksheets you can create. You can do your queries, and just by clicking on the databases that you have on the left,
197 00:33:30.790 ⇒ 00:33:54.569 Nicolas Sucari: you can really quickly go into the tables and see the information. Probably the most important part there is to understand where to look, right? Because there are a lot of tables, there is a lot of information, and understanding how the data is organized there, and where to look, is the most important part. But you’ll see that it’s kind of easy to go through Snowflake. Yup.
198 00:33:54.880 ⇒ 00:34:03.130 Nicolas Sucari: And it’s a really interesting tool. Yeah. You will start trying to write some queries, to pull some information there, and it will be easier. Yup.
199 00:34:03.930 ⇒ 00:34:04.660 Uttam Kumaran: Yeah.
200 00:34:09.190 ⇒ 00:34:22.279 Uttam Kumaran: So a lot of this knowledge kind of compounds, right? Like, we went from a simple select to like running a large query with a ton of different functions to then orchestrating that in another software.
201 00:34:22.611 ⇒ 00:34:26.150 Uttam Kumaran: To then bringing that into a piece of analysis. Right? So
202 00:34:26.320 ⇒ 00:34:41.749 Uttam Kumaran: you go from almost understanding one simple thing to, now, you need to understand the entire business problem, and then our job is to solve that problem for them. So when I talk to the client, I don’t talk about SQL, I talk about the business. However,
203 00:34:41.889 ⇒ 00:34:53.660 Uttam Kumaran: in my mind, I only think in SQL now, right? So when I listen to them, I think about, what are the ways that we solve that problem using the technologies that we have? ’Cause those are the tools we have.
204 00:34:53.679 ⇒ 00:35:16.929 Uttam Kumaran: The marketing person may be able to run better ads, may be able to change the content, the site. The pricing people may be able to change pricing. But what we can do is basically surface these insights. And so that’s our job, and those are the shovels that we use. And additionally, to even talk about the tools, we chose Fivetran, we chose Snowflake, because they’re the best shovels in the game.
205 00:35:17.556 ⇒ 00:35:37.520 Uttam Kumaran: And so for me, it’s really easy to bring new people on, to get folks on the team that have used Snowflake, because Snowflake is really good. There are a lot of worse versions of Snowflake. There are a lot of worse versions of Fivetran, and both of these tools are not so cheap.
206 00:35:37.530 ⇒ 00:35:40.834 Uttam Kumaran: All this stuff is kind of cheap for what it is, though.
207 00:35:41.790 ⇒ 00:35:52.199 Uttam Kumaran: I will say I push all the clients to use it, and they really listen to recommendations. I tell them, with the time they’re gonna save, and paying us to get the insights, it’s well worth paying for these tools.
208 00:35:52.240 ⇒ 00:36:03.527 Uttam Kumaran: And frankly, these days I won’t even take a client on if they’re not using one of these tools, because it makes our job way harder.
209 00:36:04.440 ⇒ 00:36:09.840 Uttam Kumaran: The nice thing is, I’ve been using Snowflake for like 6 years. I’ve been using Fivetran for like 5 years.
210 00:36:09.850 ⇒ 00:36:14.549 Uttam Kumaran: Both of these companies are not that old. So that’s like quite a long time, and
211 00:36:14.780 ⇒ 00:36:19.549 Uttam Kumaran: it’s now like second nature for me. So that’s the thing I’m hoping to kind of get you guys
212 00:36:19.710 ⇒ 00:36:41.790 Uttam Kumaran: to see, the value of some of these tools. And some of these learnings and stuff, they’ll translate to other databases. But you’ll see how easy it is to run stuff. And unfortunately you may not appreciate how much it takes care of until you go see one of these really crappy databases that is really hard to use and really clunky.
213 00:36:42.104 ⇒ 00:36:49.710 Uttam Kumaran: But at the same time, this is what is cutting edge right now. Snowflake is like the leading data warehouse product on the market. So.
214 00:36:54.900 ⇒ 00:36:57.609 JARED PATTERSON: Yeah, I think definitely like understanding and learning
215 00:36:57.620 ⇒ 00:37:06.280 JARED PATTERSON: a little bit more about it, like you said, using kind of those intro courses. And even if we did something like this again once we kind of have that base understanding.
216 00:37:06.280 ⇒ 00:37:07.040 Uttam Kumaran: Yeah.
217 00:37:07.040 ⇒ 00:37:10.920 JARED PATTERSON: I remember like looking at job applications. A lot of them
218 00:37:11.250 ⇒ 00:37:12.799 JARED PATTERSON: like they just had
219 00:37:12.880 ⇒ 00:37:14.279 JARED PATTERSON: listed on there like
220 00:37:14.420 ⇒ 00:37:19.190 JARED PATTERSON: knowing Snowflake is a huge plus knowing dbt, stuff like that. So.
221 00:37:19.190 ⇒ 00:37:22.033 Uttam Kumaran: Yeah. And also you’ll realize that
222 00:37:22.610 ⇒ 00:37:33.570 Uttam Kumaran: knowing Snowflake and knowing dbt are like huge things. But at the same time, you could learn the basics pretty quickly. But again, I talked to everybody here a little bit, like
223 00:37:33.710 ⇒ 00:37:37.029 Uttam Kumaran: learning about how everybody learns, and it seems for the most part.
224 00:37:37.570 ⇒ 00:37:45.159 Uttam Kumaran: most people are good at learning by doing. So my job is to try to get us to actually push some code into production,
225 00:37:45.220 ⇒ 00:37:56.539 Uttam Kumaran: have that mess up, kind of understand that whole process, and work with people internally, because that’ll give you the real feel for how this all works. And then also, again, like
226 00:37:56.610 ⇒ 00:37:59.079 Uttam Kumaran: for a resume or for your experience.
227 00:37:59.100 ⇒ 00:38:11.620 Uttam Kumaran: I want you to be able to say, yeah, I identified a business problem. We moved data in using an ETL tool. We then modeled it using dbt, and then I surfaced an insight using a BI tool.
228 00:38:11.710 ⇒ 00:38:13.410 Uttam Kumaran: That’s the entire job.
229 00:38:13.490 ⇒ 00:38:24.640 Uttam Kumaran: You’ve just done the entire job there, right? So I’m trying to get everybody to do that within the next few weeks here. Because that’s it. It’s really that, rinse and repeat.
230 00:38:24.710 ⇒ 00:38:42.820 Uttam Kumaran: What the challenge is, is you have to do that in multiple different domains. You may need to use multiple different tools. And then, of course, dealing with the business is always variable. But that’s what I want, to get everybody to basically do one end-to-end thing where they see that flow.
231 00:38:44.380 ⇒ 00:39:04.440 Uttam Kumaran: And then again, we have everything ticketed out, and everything scoped out in terms of tasks for clients. So you’ll see, basically, how Nico and I plan, how we assign things, and how those things get executed and delivered. You guys are already seeing kind of the stand-up bot that everybody does every day. Basically,
232 00:39:04.480 ⇒ 00:39:32.130 Uttam Kumaran: we’re a remote team. And I’m also not a big fan of having daily meetings. You might have heard of daily stand-ups and things like that. But I’m more of a fan of having meetings about topics like this, where we can have a discussion. A meeting just to get updates is not a great meeting. And so what we’re trying to think about is, how do we leverage Slack and things like that to get updates so that we can plan, and I can read through, and we can understand where everybody’s at.
233 00:39:33.590 ⇒ 00:39:48.270 Uttam Kumaran: So that’s a great suggestion, Jared. I think what I’m gonna do, now that I think everybody basically is okay with SQLZoo, is I want everybody to start the introduction to Snowflake,
234 00:39:48.811 ⇒ 00:40:04.800 Uttam Kumaran: which I will send. And then I’ll also work with Patrick to get everybody in. We have an internal Snowflake for us, where everybody can download sample data and begin to play around, and it won’t affect any customers. So I will get us all in there as well.
235 00:40:07.090 ⇒ 00:40:09.809 Uttam Kumaran: So I want to get intro to Snowflake. And then
236 00:40:09.860 ⇒ 00:40:14.850 Uttam Kumaran: I wanna plan something, maybe for Friday, if not tomorrow. I’m gonna
237 00:40:14.940 ⇒ 00:40:28.200 Uttam Kumaran: see whether I have time tomorrow. Basically just to walk through, you know, what everybody did in Snowflake, and then kind of walk through one example. As we get closer to taking on tasks,
238 00:40:28.230 ⇒ 00:40:29.870 Uttam Kumaran: we’ll do some
239 00:40:30.269 ⇒ 00:40:50.719 Uttam Kumaran: sessions on how to work within a data engineering team, how Brainforge kind of executes on projects, which we’ve changed quite a bit since we started. But I think the main thing is, I wanna make sure you guys have all the tools and references of how to use Snowflake and how to write SQL. The last piece
240 00:40:51.022 ⇒ 00:41:09.380 Uttam Kumaran: I wanna share with you guys is how to use the business intelligence tools. And then that’ll really give you the top part, which is going from modeled data to a piece of analysis. And then I think that’ll be enough to at least give the folks that are on the data analysis side questions to go answer,
241 00:41:09.550 ⇒ 00:41:26.945 Uttam Kumaran: and you’ll be able to write simple queries, go use the business intelligence tools, and answer those questions, and understand how we even prepare analyses to give to clients. You know, kind of like how I described today. So I’ll go ahead and send
242 00:41:27.550 ⇒ 00:41:31.939 Uttam Kumaran: I’ll go ahead and send that intro course in Slack right after this.
243 00:41:32.629 ⇒ 00:41:39.649 Uttam Kumaran: And then I see that, yeah, we have almost everybody’s GitHub. So I’ll go ahead and add you to our GitHub team.
244 00:41:40.393 ⇒ 00:41:44.660 Uttam Kumaran: And then Patrick will add everybody to
245 00:41:44.810 ⇒ 00:41:48.091 Uttam Kumaran: Snowflake. So you should get like an invite from him.
246 00:41:50.550 ⇒ 00:41:51.900 Uttam Kumaran: and
247 00:41:52.000 ⇒ 00:41:55.929 Uttam Kumaran: yeah, any other questions I can answer right now.
248 00:42:03.090 ⇒ 00:42:22.299 Uttam Kumaran: Cool, if nothing else. I know it’s been a little bit of, like, these sorts of courses and kind of waiting on some stuff. So just bear with us as we plan out how best to drop you guys in. I think by next week we should be able to have a planning session to basically think about what, like, a,
249 00:42:22.550 ⇒ 00:42:30.259 Uttam Kumaran: what like a task that you guys could take for the next 2 weeks. And data is and kind of let you guys run in parallel with learning.
250 00:42:30.546 ⇒ 00:42:36.949 Uttam Kumaran: So that’ll be our goal for, you know, next week, and we’re going to be doing some planning on Friday for that. So
251 00:42:39.660 ⇒ 00:42:51.620 Uttam Kumaran: cool, if nothing else, I can probably plan on chatting with you guys on Slack. I know, Jared, we’re talking in like 20 min, so I’ll look forward to that. So.
252 00:42:53.720 ⇒ 00:42:54.590 JARED PATTERSON: Sounds good.
253 00:42:55.140 ⇒ 00:42:55.740 Uttam Kumaran: Cool.
254 00:42:57.370 ⇒ 00:42:58.279 Nicolas Sucari: Thanks, Uttam.
255 00:42:58.540 ⇒ 00:42:59.266 Uttam Kumaran: Thanks. Everyone.
256 00:42:59.630 ⇒ 00:43:00.270 Atharv Gudi: Alright!
257 00:43:00.270 ⇒ 00:43:01.589 Nicolas Sucari: You guys, bye, bye.
258 00:43:02.450 ⇒ 00:43:03.319 Priyadharshini Aravindan: Good. Thank you.