Meeting Title: Brainforge x Urban Stems: Next Phase Discussion
Date: 2025-05-13
Meeting participants: Uttam Kumaran, Amber Lin, Emily Giant, Demilade Agboola, Alex K, Zack Gibbs
WEBVTT
1 00:00:46.786 ⇒ 00:00:47.403 Uttam Kumaran: Hello!
2 00:00:48.250 ⇒ 00:00:49.349 Emily Giant: He lives.
3 00:00:49.350 ⇒ 00:00:50.460 Alex K: Hey! There!
4 00:00:58.540 ⇒ 00:00:59.330 Amber Lin: Hi.
5 00:00:59.640 ⇒ 00:01:00.920 Emily Giant: Hi Amber.
6 00:01:01.260 ⇒ 00:01:04.890 Amber Lin: Nice to meet you all. I’ve heard these names.
7 00:01:05.620 ⇒ 00:01:07.860 Uttam Kumaran: Yeah, yeah, here’s your name.
8 00:01:07.860 ⇒ 00:01:11.679 Emily Giant: And you as well tonight, and I’m super excited that you’re here.
9 00:01:12.050 ⇒ 00:01:27.249 Amber Lin: Me, too. I’ve been working on the roadmap so much, and I’ve been hearing like, okay, this is the person that does this, this is the person that does this. Like, okay, let me actually go talk to them. But I was like, no, they’re so overwhelmed with Mother’s Day work.
10 00:01:27.510 ⇒ 00:01:31.810 Uttam Kumaran: So I was like, let’s let’s just work on the plan and our best
11 00:01:31.980 ⇒ 00:01:33.899 Uttam Kumaran: before we ask the follow up.
12 00:01:34.220 ⇒ 00:01:37.210 Emily Giant: You didn’t want to meet us last week. That was a good call.
13 00:01:41.420 ⇒ 00:01:46.749 Emily Giant: I think Demilade is off on an errand I sent him on, so I’m sure he’ll be here soon.
14 00:01:46.950 ⇒ 00:01:57.836 Uttam Kumaran: That’s perfect. We can get started, I think. We all have context. And you know, Amber, I think, was really the primary in putting a lot of this together, but it was a lot of interviews of me and Demilade.
15 00:01:58.240 ⇒ 00:02:03.789 Uttam Kumaran: I don’t know if everyone had a chance to look. Take a look at the document, but it’s pretty expansive, I think.
16 00:02:03.960 ⇒ 00:02:11.090 Uttam Kumaran: as much as I learned in the first month, we learned a lot more in the second month. And I think we’re
17 00:02:11.280 ⇒ 00:02:32.150 Uttam Kumaran: we sort of broke it up into a couple of phases, I think, especially in the last few weeks. There’s a lot of really like fundamental problems even today that that we’re dealing with, that we wanted to make sure that get that get on the roadmap. So how we like to begin Zack, should we walk through the document? Or should we just like talk about any key areas.
18 00:02:33.095 ⇒ 00:02:43.184 Zack Gibbs: Let’s walk. Let’s walk through the document by section, and then just chat through. Make sure that we are aligned and can chat about if we have questions or other commentary.
19 00:02:43.790 ⇒ 00:02:47.999 Zack Gibbs: and then I think the the overall goal here is to
20 00:02:48.180 ⇒ 00:02:58.520 Zack Gibbs: make sure that we agree on the rough problem areas, the rough priority of those problem areas and then creating like, what is the action plan? And who’s take? Who’s taking? What right.
21 00:03:00.220 ⇒ 00:03:03.720 Uttam Kumaran: Yes, that’s correct. So I again, basically
22 00:03:04.220 ⇒ 00:03:30.982 Uttam Kumaran: Amber, if you wanna go through the document, then I can be here for commentary. Yeah. Again, I just wanna make sure we agree on the scope and the timeline. And then I think now we can work a little bit slower. We have some fundamental things that need to happen. But again, a lot of it is just going a little bit deeper into the discussions we had after the first month. So
23 00:03:32.050 ⇒ 00:03:34.556 Uttam Kumaran: Amber, do you wanna share?
24 00:03:36.970 ⇒ 00:03:37.515 Uttam Kumaran: Yeah.
25 00:03:38.620 ⇒ 00:03:41.680 Amber Lin: Sorry I was muted. I can share screen and we can walk through them.
26 00:03:46.470 ⇒ 00:03:47.690 Amber Lin: So
27 00:03:51.770 ⇒ 00:04:05.889 Amber Lin: I hope you guys had a chance to just skim through. I didn’t expect you to read everything because we’re gonna go through it in a meeting, anyways. So I structured this document based on our data structure, right? So we go from the
28 00:04:06.340 ⇒ 00:04:15.859 Amber Lin: So the ingestion part was Redshift. And then we go to dbt. And then I also talk about Looker. So it’s kind of by
29 00:04:15.970 ⇒ 00:04:45.230 Amber Lin: how the data flows through. So it makes more sense to us. And for each of these major tools we did a diagnostic. So we looked at what are the current problems we have, and then we propose some fixes and also an implementation plan. So this document is kind of detailed, but hopefully I can provide a high-level enough overview, and if we want to dive deeper into any of them, we can do that as well.
30 00:04:46.640 ⇒ 00:04:47.680 Amber Lin: So.
31 00:04:47.860 ⇒ 00:04:54.969 Amber Lin: First of all, this is an executive summary. We had a few very interesting findings, so we found that
32 00:04:55.580 ⇒ 00:05:09.229 Amber Lin: at least when we looked at the Looker dashboard reports, we found that 80% of them were not used. 90% of the Looks don’t get queried. And we had very interesting findings of what’s really
33 00:05:09.470 ⇒ 00:05:23.149 Amber Lin: driving the cost, but is unnecessary. So these are all the things that we want to address, and we want to save the money for you and spend it where it’s most needed. Right? So if we have clarity on
34 00:05:23.520 ⇒ 00:05:32.000 Amber Lin: what’s going on, we can fix it. So this is sort of the first phase of what we’re trying to do. We’re trying to identify
35 00:05:32.310 ⇒ 00:05:37.810 Amber Lin: what is going under the radar, and then the next phase will be fixing them.
36 00:05:39.230 ⇒ 00:05:44.962 Zack Gibbs: One, I’m not super surprised by the non-usage.
37 00:05:45.440 ⇒ 00:05:45.910 Amber Lin: Yeah.
38 00:05:45.910 ⇒ 00:05:53.672 Zack Gibbs: And percentages. But what was the timeframe like? Wasn’t queried within what timeframe? And wasn’t
39 00:05:54.546 ⇒ 00:05:56.287 Amber Lin: Let me, what was it?
40 00:05:56.920 ⇒ 00:05:57.540 Amber Lin: Yeah.
41 00:05:58.016 ⇒ 00:06:09.770 Amber Lin: I can send you guys these reports that we pulled from Looker after this meeting, or Uttam, if you have a chance, you can just drop it in an external channel. Here, how do I.
42 00:06:10.660 ⇒ 00:06:19.279 Uttam Kumaran: Yeah, I believe the dashboard defaults to just the last 90 days, if not even shorter than that.
43 00:06:20.400 ⇒ 00:06:28.360 Uttam Kumaran: but I think you have the excel sheet that I pulled. I I pulled just the content usage directly from from
44 00:06:30.210 ⇒ 00:06:31.500 Uttam Kumaran: okay.
45 00:06:32.110 ⇒ 00:06:36.849 Amber Lin: Yeah, we can definitely do a closer, deep dive on the metrics that we found.
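A minimal sketch of how the "not used" percentages above depend on the lookback window Zack asks about. It assumes a content usage export reduced to a mapping of item name to last-queried date; the item names, dates, and the `unused_share` helper are all hypothetical, and the 90-day default only mirrors the window Uttam recalls.

```python
from datetime import date, timedelta


def unused_share(last_used_by_item, today, window_days=90):
    """Return the fraction of items with no recorded use inside the window.

    last_used_by_item maps an item name to the date it was last queried,
    or None if it has never been queried.
    """
    if not last_used_by_item:
        return 0.0
    cutoff = today - timedelta(days=window_days)
    unused = [
        name
        for name, last_used in last_used_by_item.items()
        if last_used is None or last_used < cutoff
    ]
    return len(unused) / len(last_used_by_item)


# Hypothetical usage export: names and dates are made up for illustration.
usage = {
    "inventory_overview": date(2025, 5, 1),
    "old_mode_import": None,
    "q3_2023_promo": date(2023, 10, 2),
    "personal_scratch": date(2024, 12, 15),
    "revenue_daily": date(2025, 5, 12),
}
share = unused_share(usage, today=date(2025, 5, 13))
print(f"{share:.0%} of items unused in the last 90 days")
```

Changing `window_days` changes the answer, which is why the timeframe matters when quoting a figure like 80%.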
46 00:06:37.737 ⇒ 00:06:43.580 Amber Lin: I would love to go through the other ones. Before we get super detailed. I know the finance.
47 00:06:45.290 ⇒ 00:06:48.869 Amber Lin: and I was also very, very surprised when I saw it as well.
48 00:06:53.000 ⇒ 00:06:57.218 Uttam Kumaran: And that’s not like entirely on, that’s not entirely uncommon. It’s just
49 00:06:58.280 ⇒ 00:07:12.930 Uttam Kumaran: And those can be people’s personal dashboards. It. It just like happens that those exist. But I will say there’s also a lot of those are broken, or a lot of those have stale logic. And so I mean Emily, you probably know that there’s just a ton in there.
50 00:07:14.210 ⇒ 00:07:17.468 Emily Giant: Yeah, they yeah. When we
51 00:07:18.350 ⇒ 00:07:29.705 Emily Giant: When we got rid of Mode they dumped all of our PDTs into Looker, and then nobody used them because they didn’t understand how. Like, yes, I’m not surprised by it at all.
52 00:07:30.550 ⇒ 00:07:31.130 Amber Lin: Hmm
53 00:07:34.580 ⇒ 00:07:35.280 Amber Lin: awesome.
54 00:07:35.280 ⇒ 00:07:37.279 Emily Giant: People don’t realize that you can send
55 00:07:37.940 ⇒ 00:07:42.950 Emily Giant: queries without saving it as a look. So there’s just thousands of looks saved.
56 00:07:42.950 ⇒ 00:07:43.290 Amber Lin: Yeah.
57 00:07:43.690 ⇒ 00:07:49.469 Emily Giant: Because they don’t realize like. Just send the link. It will send exactly what you queried, and you don’t have to save it.
58 00:07:54.456 ⇒ 00:08:17.640 Amber Lin: So now the first part of what we did: we went ahead and did a technical discovery to look at the current gaps. And ultimately, what we want to deliver is a fully prioritized phase two roadmap, a clear backlog, and a scope deck. So right now, what we’re doing together is we’re looking at, okay, is this actually something
59 00:08:17.640 ⇒ 00:08:28.430 Amber Lin: that we want to do? And then, after we finalize this, I can help put together a prioritized roadmap, and I can make all the tickets so we can start planning the next phase.
60 00:08:31.160 ⇒ 00:08:31.850 Amber Lin: Oh.
61 00:08:35.156 ⇒ 00:08:41.030 Amber Lin: just a quick, quick note before we move on. I think the main themes that we’re trying to
62 00:08:41.179 ⇒ 00:08:56.630 Amber Lin: tackle here are: make sure that we tackle reporting and decision making, make sure that the pipeline is reliable, and ultimately make sure that we can be efficient and reduce as much cost as reasonably possible.
63 00:08:56.790 ⇒ 00:09:01.280 Amber Lin: So I kept those things in mind when I was developing this roadmap.
64 00:09:02.270 ⇒ 00:09:06.814 Amber Lin: So first of all, if we look at dbt,
65 00:09:07.950 ⇒ 00:09:26.829 Amber Lin: here are the few issues that we identified. So I bet you guys already know this because we’ve been already working with dbt, and we recreated a new inventory model for you guys. So the first thing that we already know is that right now our data models are structured by source
66 00:09:27.090 ⇒ 00:09:29.180 Amber Lin: rather than business domains.
67 00:09:29.800 ⇒ 00:09:56.970 Amber Lin: and that makes logic and tracking lineage quite hard, and will cause a lot of problems down the line. When we start to try and troubleshoot, or we try and change things, it just causes a lot more unnecessary work than if we structured it in a standardized way. Similar stuff with point 2, the layering and materializations.
68 00:09:57.170 ⇒ 00:10:04.999 Amber Lin: And yeah, so with these things, we try, we’ll try and create a good structure
69 00:10:05.300 ⇒ 00:10:10.580 Amber Lin: to prevent issues happening down the road.
70 00:10:10.920 ⇒ 00:10:23.770 Amber Lin: Other things we found are about the snapshots, and the lack of testing, documentation, and observability. And lastly, I believe we’re already tackling this one, the
71 00:10:24.333 ⇒ 00:10:35.279 Amber Lin: longer run times. I know one of our goals is to try and make it as close to real time as possible. And so this is also.
72 00:10:35.390 ⇒ 00:10:37.430 Amber Lin: These are the 5 things that
73 00:10:37.680 ⇒ 00:10:43.550 Amber Lin: we identified, and we would like to address. I’ll just pause here for any questions.
74 00:10:44.604 ⇒ 00:10:51.200 Zack Gibbs: What are the S3 data sources? Is that just note card stuff? What else do we do in S3?
75 00:10:51.200 ⇒ 00:10:57.310 Alex K: Nothing to do with note cards. It’s all like ad hoc
76 00:10:57.770 ⇒ 00:11:03.709 Alex K: CSVs and stuff like that, and also Onfleet currently is using that.
77 00:11:04.750 ⇒ 00:11:05.840 Zack Gibbs: Oh, okay.
78 00:11:07.830 ⇒ 00:11:08.749 Uttam Kumaran: So the other thing is like
79 00:11:09.220 ⇒ 00:11:14.649 Uttam Kumaran: that problem. There is data that is just sitting. But we’re like, it’s, it’s hard to know what is
80 00:11:14.890 ⇒ 00:11:29.716 Uttam Kumaran: like backfill and static versus what is live refresh. And we are running materializations over this basically stale data that’s sitting there as part of every run.
81 00:11:33.020 ⇒ 00:11:33.849 Uttam Kumaran: Why not?
82 00:11:34.700 ⇒ 00:11:35.400 Zack Gibbs: Gotcha.
83 00:11:35.830 ⇒ 00:11:37.849 Alex K: No, no surprises here for me. This all makes.
84 00:11:38.142 ⇒ 00:11:46.049 Amber Lin: Yeah, you guys probably already work with this a lot, especially before Mother’s Day. I bet this was the main thing that we were trying to fix.
85 00:11:46.250 ⇒ 00:11:55.920 Amber Lin: And so I’ll just run through the required fixes quickly. So we want to refactor all the model folders. We might want to consider deprecating the old models.
86 00:11:56.250 ⇒ 00:12:11.099 Amber Lin: We want to introduce standardized dbt layering. We want to introduce incremental builds, especially for the ones that have static historic data. We want to consolidate redundant snapshots,
87 00:12:11.310 ⇒ 00:12:18.250 Amber Lin: and apply testing, not only the standard dbt testing but also custom tests.
88 00:12:18.906 ⇒ 00:12:46.800 Amber Lin: We want the we want to add exposures and tags, resolve any logic inconsistencies and make sure we document everything. So all of these will make sure that in the back end everything is running with a structure that’s traceable and reliable, and won’t cause too many issues when it’s just running on its own. Our ultimate goal is to make sure that this thing can run on its own and not causing problems.
89 00:12:47.384 ⇒ 00:12:48.920 Amber Lin: Also. Go ahead.
90 00:12:48.920 ⇒ 00:12:49.640 Alex K: Oh, sorry go ahead!
91 00:12:50.044 ⇒ 00:12:51.660 Amber Lin: No, that was it.
92 00:12:52.050 ⇒ 00:13:01.820 Alex K: I just wanna make sure I’m validating my understanding 80% of the work. My my estimate is 80% of the work is this 1st bullet point and 20% is the last 6.
93 00:13:01.820 ⇒ 00:13:03.770 Amber Lin: Yeah, I imagined.
94 00:13:03.770 ⇒ 00:13:06.539 Alex K: Want to make sure, like I’m validating that with you all’s understanding.
95 00:13:07.400 ⇒ 00:13:12.980 Uttam Kumaran: Yes, for for this particular again. And this is really like.
96 00:13:13.270 ⇒ 00:13:34.257 Uttam Kumaran: it’s hard to make the changes we want to make in Looker and in dbt, like the models, without this foundational work. We’ve seen a peek of what this looks like when we refactored inventory just to make it work with the NetSuite data, which I know has been a lot quicker to debug for Emily and for Demilade. So
97 00:13:35.560 ⇒ 00:13:47.310 Uttam Kumaran: yeah, most of this is just naming conventions, moving things to the right folders, rewriting things, you know. And we’ll talk about a couple of other Redshift gaps. But right.
98 00:13:47.310 ⇒ 00:13:54.760 Demilade Agboola: I also want to add another thing that potentially could add quite a bit of work to this. Would be documentation.
99 00:13:55.200 ⇒ 00:14:17.239 Demilade Agboola: So document all the marts and stuff. That also takes a bit of time generally, because after we’re done creating, we’ll first need to sync with your team and communicate what the logic is and what assumptions are being made with the data, so everyone is on the same page and knows what’s going on within the data. But yes, between the refactoring and documentation, most of it.
100 00:14:17.690 ⇒ 00:14:20.929 Demilade Agboola: That’s like most of the heavy lifting in this, in this part of it.
101 00:14:23.030 ⇒ 00:14:23.910 Amber Lin: Awesome.
102 00:14:24.393 ⇒ 00:14:43.400 Amber Lin: I tried to make an implementation plan to the best of my abilities. I am not a data engineer, so this is the best of my abilities. Let me know what you guys think, and we can totally move things around. So I separated it by the different steps that we kind of outlined here.
103 00:14:43.490 ⇒ 00:14:55.340 Amber Lin: So we want to audit and clean it up. We want to restructure the folders, make sure we have layering, materialize those fixes. So all of these, and then I kind of broke them down by each task.
104 00:14:55.510 ⇒ 00:15:06.149 Amber Lin: Do believe they’re probably like closer to tickets if we expand them a little bit more. So if you guys have time, we can look at these, make any comments
105 00:15:06.250 ⇒ 00:15:12.830 Amber Lin: and then make sure that we’re aligned on not only what we want to achieve, but also how we’re going to achieve that.
106 00:15:15.360 ⇒ 00:15:22.540 Uttam Kumaran: I took a look. I took a look through these, and I’m I’m pretty comfortable that this is probably like the closest view to to tickets
107 00:15:22.830 ⇒ 00:15:25.701 Uttam Kumaran: at this stage. So yeah, sorry. Go ahead.
108 00:15:28.330 ⇒ 00:15:37.309 Zack Gibbs: I was just. I was just trying to scroll through and see. Is there? Is there like a like a gantt, or something similar that shows. For, like each of these sections.
109 00:15:37.820 ⇒ 00:15:40.040 Zack Gibbs: what’s the rough start and end?
110 00:15:40.510 ⇒ 00:15:45.349 Zack Gibbs: Do you guys have a view viewpoint there yet or or no?
111 00:15:45.700 ⇒ 00:16:09.970 Amber Lin: Not yet. I didn’t. Wanna put on due dates when I wasn’t sure if this was the exact same thing that we wanted. But I can definitely estimate with how these things would take. And then we can make a gantt chart because we are not only just working on one area. So we have a lot of different tasks. And I want to make the Gantt chart after we prioritize.
112 00:16:10.590 ⇒ 00:16:14.700 Alex K: I I think the call out for me is these don’t look like tickets. These look like interdependent
113 00:16:14.870 ⇒ 00:16:17.700 Alex K: components to the project. So I
114 00:16:18.110 ⇒ 00:16:25.350 Alex K: like you’re not gonna, you know, refactor one model without also fixing the folder restructuring and layer standardization, right?
115 00:16:25.350 ⇒ 00:16:25.760 Amber Lin: It’s like.
116 00:16:25.760 ⇒ 00:16:29.829 Alex K: Think that is perhaps a false equivalency in this plan, but like
117 00:16:30.260 ⇒ 00:16:39.849 Alex K: it’s up to you all to like kinda group that in the way that makes sense so like logically, this makes sense to me. But I don’t. This doesn’t read like tickets at all to me, because they’re all
118 00:16:40.270 ⇒ 00:16:43.549 Alex K: more inter interrelated, I guess is the right word.
119 00:16:43.550 ⇒ 00:16:46.033 Amber Lin: I agree. I agree. Totally.
120 00:16:46.530 ⇒ 00:16:49.760 Uttam Kumaran: Yeah, I think my my own call out, Yeah, sorry. Go ahead, Amber.
121 00:16:49.760 ⇒ 00:16:52.050 Amber Lin: No, I have nothing to say. I repeat what you said.
122 00:16:52.356 ⇒ 00:17:02.149 Uttam Kumaran: I think my only point is that I just want to know out of this list what’s more important than not? And then we can move to tickets. It’s just gonna
123 00:17:02.280 ⇒ 00:17:22.279 Uttam Kumaran: it’ll take another block of time to to do the ticket work. I mean, I wanna make sure we’re prioritizing, creating the tickets for at least the 1st month or 2. So out of these core steps, I’m happy to provide what I feel like is higher priority. We have ranked them in that in that way.
124 00:17:22.770 ⇒ 00:17:50.520 Uttam Kumaran: but for us to do the Gantt chart and look at parallelization and timeline, I just want to get a confirmation that you guys see it the same way. And if we’re like, hey, for each of the core areas. So again, in this list, we have Redshift, we then have dbt modeling, and then we have stuff in Looker. It would be helpful to know what’s more important, and then I can share what is blocking,
125 00:17:50.600 ⇒ 00:18:00.869 Uttam Kumaran: because for some stuff in Looker we could take on a lot of the cleanup work. But there is core refactoring in Looker that can only be done after we do a bunch of dbt model fixes.
126 00:18:00.870 ⇒ 00:18:05.909 Alex K: Recommend a separate way of looking at this, which I don’t know if it’s the final way. But I just want to propose this for, like.
127 00:18:05.910 ⇒ 00:18:06.270 Uttam Kumaran: Sure.
128 00:18:06.270 ⇒ 00:18:11.830 Alex K: Brain states. These are all to me. These look like
129 00:18:12.420 ⇒ 00:18:17.040 Alex K: one through 8, or whatever these steps are is like something you’ll have to do on every data mart.
130 00:18:17.180 ⇒ 00:18:18.070 Alex K: So like.
131 00:18:18.300 ⇒ 00:18:24.220 Alex K: like, you know what I’m saying like this, you know, we need all these pieces in place for every data mart for every like
132 00:18:24.570 ⇒ 00:18:29.449 Alex K: business domain that we’re looking at. So like. That might be the way to
133 00:18:30.020 ⇒ 00:18:48.710 Alex K: write all our acceptance criteria right through all these steps. But like, I think, the work itself possibly could be looked at through the lens of business domains that need to be touched, or like tables that need not models that need to be touched. I don’t know if that’s the best way of looking at it, but I know that that’s at least another way to kind of help
134 00:18:49.010 ⇒ 00:18:51.370 Alex K: quantify the work as you’re going through this.
135 00:18:51.570 ⇒ 00:19:08.150 Amber Lin: Yeah, I agree. At first I was thinking about whether I could create those timelines based on the business functions. But what I’m lacking here is that I don’t know what’s most important to you guys. I know we did inventory first, but I totally agree that we should
136 00:19:08.290 ⇒ 00:19:20.139 Amber Lin: create any ticket based on the business domain. I guess a step that I need from you guys is what is the most important part, what will we do next after inventory?
137 00:19:20.600 ⇒ 00:19:26.059 Alex K: Can I also ask the question, then: Uttam, Demilade, Emily, is inventory done?
138 00:19:26.290 ⇒ 00:19:28.340 Alex K: No, I don’t. I don’t think it is, but I’m like.
139 00:19:28.340 ⇒ 00:19:28.690 Uttam Kumaran: No.
140 00:19:28.690 ⇒ 00:19:30.139 Alex K: Validate that with y’all so.
141 00:19:30.140 ⇒ 00:19:53.762 Uttam Kumaran: Yeah. So inventory is not not. It’s not done yet, I think, what you’re you’re. I think your point is fair, which is, we want to do prioritization on what capabilities you want to unlock and what business domain is it for? Right? We can go the distance for one. And then just like, focus on that. Or we could sort of do this across the board.
142 00:19:54.960 ⇒ 00:19:59.270 Uttam Kumaran: it’s like, there are trade offs about like.
143 00:19:59.650 ⇒ 00:20:03.179 Uttam Kumaran: So yeah, again, I think probably, Zack, you may be best.
144 00:20:03.180 ⇒ 00:20:03.560 Zack Gibbs: Multi.
145 00:20:03.560 ⇒ 00:20:04.300 Uttam Kumaran: Answer. What’s more.
146 00:20:04.300 ⇒ 00:20:07.854 Zack Gibbs: Most. What’s the most efficient way to do the work?
147 00:20:08.210 ⇒ 00:20:10.880 Zack Gibbs: the most efficient way to do the work.
148 00:20:11.120 ⇒ 00:20:18.699 Uttam Kumaran: So at the Redshift level, I would prefer we do everything at once. At the dbt level, it’s domain specific.
149 00:20:18.930 ⇒ 00:20:22.440 Uttam Kumaran: For example, when we go clean up like stale data.
150 00:20:23.050 ⇒ 00:20:28.570 Uttam Kumaran: I’m not. Gonna really. I don’t really need to care whether the stale data is for finance or whatever it’s sort of like.
151 00:20:28.900 ⇒ 00:20:32.620 Uttam Kumaran: I just need to move it. And and redo schema names stuff like that. So I can.
152 00:20:32.730 ⇒ 00:20:42.780 Uttam Kumaran: Everybody will benefit from that. And that’s gonna be very hard to split up, because multiple domains pull from the same sources. At the dbt level, we should go mart by mart. That would be ideal.
153 00:20:43.130 ⇒ 00:20:47.660 Uttam Kumaran: So meaning at the dbt level, I would prefer us
154 00:20:47.820 ⇒ 00:21:03.839 Uttam Kumaran: to make a decision to satisfy one business domain there. That doesn’t mean that only that first group gets all the love. Like, as we work through those, a lot of those things we create will get reused elsewhere. But it allows us to make one stakeholder happy
155 00:21:03.890 ⇒ 00:21:31.829 Uttam Kumaran: and kind of go the distance. The only call out is, as soon as we start working with that stakeholder, and we get stuff done. I wanna make it clear that that person has an analyst, you know whether it’s Perry or Felipe, or or whoever that we’re that’s the person we’re enabling which I think was made clear at the beginning of our engagement, which is, we’re enabling them to answer the question so that we don’t become immediately like, okay. Now, we’re just like kind of data help desk, you know.
156 00:21:34.950 ⇒ 00:21:43.669 Emily Giant: Yeah, the the inventory data. Very close to where we’re working on inventory data right now becomes revenue data. They go like this.
157 00:21:44.250 ⇒ 00:21:45.110 Emily Giant: And
158 00:21:46.330 ⇒ 00:21:49.280 Emily Giant: So one kind of belays the other.
159 00:21:49.640 ⇒ 00:22:01.180 Emily Giant: just as like an aside to what we’re looking at here. I do think that that’s the most important work is where we started, and it’s it will sequentially work its way
160 00:22:01.320 ⇒ 00:22:07.809 Emily Giant: into like completing both of those boxes like revenues messed up because inventory is messed up.
161 00:22:08.180 ⇒ 00:22:11.319 Amber Lin: And I see.
162 00:22:11.590 ⇒ 00:22:11.945 Emily Giant: Yeah.
163 00:22:12.590 ⇒ 00:22:16.860 Alex K: So it sounds like, Uttam, your recommendation is, just to kind of summarize what I’m hearing:
164 00:22:17.870 ⇒ 00:22:22.750 Alex K: We scope some work for Redshift optimization, like organization.
165 00:22:23.740 ⇒ 00:22:25.670 Zack Gibbs: Then we hit Mart by mart.
166 00:22:26.070 ⇒ 00:22:26.670 Alex K: Domains.
167 00:22:26.670 ⇒ 00:22:31.395 Uttam Kumaran: We hit mart by mart, dbt and Looker, yeah.
168 00:22:33.230 ⇒ 00:22:40.530 Alex K: That sequence correct? Like Redshift optimization, or sorry, cleanup. It’s not optimizations, quite. It is, but bigger-picture Redshift.
169 00:22:40.960 ⇒ 00:22:44.817 Alex K: And then marts, that’s correct.
170 00:22:46.400 ⇒ 00:22:52.730 Uttam Kumaran: Yeah, we will find a way to parallelize as much as we can, you know. But yes.
171 00:22:54.570 ⇒ 00:23:08.090 Zack Gibbs: A sense already of what the marts are like. Is there a marketing mart? Is there a, you know, sales, Mart? Is there a inventory mart like, what are the what are the distinct areas.
172 00:23:09.190 ⇒ 00:23:11.440 Uttam Kumaran: Yeah. Do you have them here? Amber.
173 00:23:12.810 ⇒ 00:23:13.900 Amber Lin: I.
174 00:23:14.640 ⇒ 00:23:22.102 Amber Lin: This is my personal thoughts. It’s not based on it’s based on the current.
175 00:23:23.250 ⇒ 00:23:30.679 Amber Lin: what is it, GitHub repo structure. But I would love to hear from you. Right now, I have inventory, revenue,
176 00:23:30.790 ⇒ 00:23:35.210 Amber Lin: subscription revenues, kind of like sales, subscriptions, customer care and marketing.
177 00:23:35.950 ⇒ 00:23:41.049 Uttam Kumaran: The only thing that’s not here is sort of finance. Broadly. Okay.
178 00:23:41.220 ⇒ 00:23:50.160 Demilade Agboola: The idea of building the marts would also coincide. Like, we’re looking at this in terms of overall structure.
179 00:23:50.650 ⇒ 00:24:07.559 Demilade Agboola: But in the real sense of it, these things work hand in hand, and they will be similar steps. And so the idea is, as we’re doing this in dbt, there’s also a Looker cleanup going on as well, and part of that is being able to figure out what dashboards are being used, what are the heavily relied-on dashboards,
180 00:24:07.850 ⇒ 00:24:35.450 Demilade Agboola: and the people who use these dashboards: what do they need to see? What do they need to action on a daily basis? Being able to put all of that together allows us to structure the marts in the most useful way to the business. So if we know, for instance, that the person who is using inventory is also using revenue, we don’t necessarily need to create two separate marts for that. For instance, you might end up having everything that powers that user, or powers the decisions that come out of that, in one mart instead.
181 00:24:35.810 ⇒ 00:24:37.100 Demilade Agboola: And so it’s.
182 00:24:37.300 ⇒ 00:24:51.729 Demilade Agboola: it’s not as easy as just saying, oh, these are the number of marts we’re coming up with. It’s more like, okay, so what is the business use case? What decisions and what actions are we trying to drive from a business user, a business stakeholder perspective?
183 00:24:52.430 ⇒ 00:25:22.099 Uttam Kumaran: So marts are, you could think of a mart as a place to go to get an answer about a concept. They’re less tied to a team. Meaning if someone needs to answer a question about subscriptions, there is an area where there’s a set of tables for subscriptions. That question may be answered by multiple teams, right? So it’s tied to those core business concepts, like inventory. I think the ones we have listed here are distinct enough.
184 00:25:22.170 ⇒ 00:25:45.339 Uttam Kumaran: Finance is the one that isn’t here which I think amber. I have a lot in my in my past notes, so we can add that but this is where it’s like we’re not. We’re trying not to align the data work directly to like you. 5 people get in, and the rest of people don’t get anything. It’s all it’s we want to align it towards solving, answering questions about a particular business area.
185 00:25:47.580 ⇒ 00:25:57.019 Uttam Kumaran: And that way we can say, if you have x 5 to 10 questions about inventory, these are the tables that you should go to whether that person in finance or marketing or sales.
186 00:25:57.270 ⇒ 00:26:00.449 Uttam Kumaran: It’s it’s it’s sort of irrelevant.
187 00:26:04.000 ⇒ 00:26:17.350 Uttam Kumaran: Otherwise we we get left with what we have now, which is team specific tables that all answer kind of the same thing. And then they all have logic. And then they all basically go unmaintained.
188 00:26:19.060 ⇒ 00:26:19.590 Zack Gibbs: Yeah.
189 00:26:24.430 ⇒ 00:26:25.550 Amber Lin: Sounds good.
190 00:26:25.720 ⇒ 00:26:39.340 Amber Lin: I’ll move on to the next section, and we’ll look at Redshift. I think this will also inform our very 1st step, which is helping to organize everything. So we looked at it, and we
191 00:26:39.560 ⇒ 00:26:47.560 Amber Lin: I listed my findings in these main categories, and so
192 00:26:49.990 ⇒ 00:27:03.870 Amber Lin: I don’t know that much about Redshift. So please chime in if you have anything to fill in. So we have missing permissions on new tables, meaning that when we create new tables,
193 00:27:05.020 ⇒ 00:27:20.030 Amber Lin: access is not automatically granted. And that makes our dbt jobs fail. Right now, we’re making a superuser. But it is not the best practice. And we wanna figure that out eventually.
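One common way to avoid the superuser workaround Amber describes is Redshift default privileges, which grant access on tables created in the future. The sketch below only builds the SQL text; it is not confirmed as the approach in the roadmap document, and the schema, user, and group names are hypothetical.

```python
def default_privilege_statements(schema, owner, reader_group):
    """Build Redshift statements so new tables are readable without a superuser.

    Default privileges apply only to tables created by `owner`, so `owner`
    should be the user that the dbt jobs run as.
    """
    return [
        # Let the group see objects inside the schema at all.
        f"GRANT USAGE ON SCHEMA {schema} TO GROUP {reader_group};",
        # Cover tables that already exist.
        f"GRANT SELECT ON ALL TABLES IN SCHEMA {schema} TO GROUP {reader_group};",
        # Cover tables that `owner` creates from now on.
        f"ALTER DEFAULT PRIVILEGES FOR USER {owner} IN SCHEMA {schema} "
        f"GRANT SELECT ON TABLES TO GROUP {reader_group};",
    ]


# Hypothetical names, for illustration only.
statements = default_privilege_statements("analytics", "dbt_runner", "reporting_readers")
for statement in statements:
    print(statement)
```

The statements would still need to be run once per schema by a user with sufficient rights.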
194 00:27:21.479 ⇒ 00:27:26.530 Amber Lin: The second part is on ingestion. Right now, we have Stitch, Hevo,
195 00:27:26.740 ⇒ 00:27:47.320 Amber Lin: and Polytomic all at once. That might cause some inconsistencies in data. Maybe there will be some overlap, and there will be certain confusion around which tool the data comes from. And of course, there’s an increased cost, because we’re using 3 tools instead of one.
196 00:27:48.520 ⇒ 00:27:54.490 Amber Lin: Next part is redshift concurrency issues, which
197 00:27:54.670 ⇒ 00:27:58.650 Amber Lin: basically just means, especially at peak
198 00:27:58.760 ⇒ 00:28:03.020 Amber Lin: sync windows. A lot of things run at the same time.
199 00:28:03.140 ⇒ 00:28:06.060 Amber Lin: and that makes it really really slow.
200 00:28:06.250 ⇒ 00:28:17.919 Amber Lin: So we would love to add some queue prioritization or to manage the workloads so that our models can run as fast as possible.
201 00:28:19.460 ⇒ 00:28:26.489 Amber Lin: The next one is also pretty important. We found that all of the dbt models have a full refresh,
202 00:28:26.770 ⇒ 00:28:38.420 Amber Lin: even for static data, and that really increases the Redshift compute time, especially since for the static data, we don’t really need to refresh them.
203 00:28:38.570 ⇒ 00:28:48.909 Amber Lin: And that’s why we were looking into, I believe, the incremental models, to make sure that we don’t spend extra compute time on things that don’t need to be refreshed.
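The difference between a full refresh and an incremental build can be sketched in a few lines. This is a plain-Python illustration of the high-water-mark idea behind incremental models, not dbt code; the row shape, the `updated_at` key, and the `incremental_load` helper are assumptions made for the example.

```python
def incremental_load(target_rows, source_rows, ts_key="updated_at"):
    """Append only the source rows newer than the newest row already loaded."""
    if not target_rows:
        # First run: nothing loaded yet, so this behaves like a full refresh.
        return list(source_rows)
    high_water_mark = max(row[ts_key] for row in target_rows)
    new_rows = [row for row in source_rows if row[ts_key] > high_water_mark]
    return target_rows + new_rows


# Hypothetical rows; integers stand in for timestamps.
source = [
    {"id": 1, "updated_at": 10},
    {"id": 2, "updated_at": 20},
    {"id": 3, "updated_at": 30},
]
already_loaded = source[:2]
result = incremental_load(already_loaded, source)
print(len(result) - len(already_loaded), "new row(s) processed")
```

On a run where nothing has changed, no rows are reprocessed, which is where the compute savings for static historic data would come from.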
204 00:28:49.920 ⇒ 00:29:04.939 Amber Lin: Lastly, we don’t yet have alerting for failures, and we don’t yet have tracking for data freshness. Right now, based on my interviews, a lot of the time we
205 00:29:05.360 ⇒ 00:29:18.470 Amber Lin: do manual recovery for ETL loads, meaning that analysts or engineers on your side have to go in manually and check for the errors that happen.
206 00:29:19.010 ⇒ 00:29:20.180 Amber Lin: I’ll pause here.
207 00:29:21.060 ⇒ 00:29:22.220 Alex K: My only flag is
208 00:29:22.550 ⇒ 00:29:22.880 Uttam Kumaran: That.
209 00:29:23.050 ⇒ 00:29:23.369 Alex K: Oh, go ahead
210 00:29:27.130 ⇒ 00:29:32.279 Alex K: My only flag is that we should probably report this correctly here:
211 00:29:33.110 ⇒ 00:29:38.849 Alex K: for the ETL tools, cost really isn’t
212 00:29:39.490 ⇒ 00:29:49.449 Alex K: a key thing that one tool would get us. The same data loads we run across all 3 tools now would be exorbitantly more expensive in Polytomic. So
213 00:29:49.740 ⇒ 00:29:54.668 Alex K: for our case, that is not a true statement.
214 00:29:55.530 ⇒ 00:30:07.040 Alex K: We’re paying pennies for Stitch and Hevo, right? So I think these other points are very valid, but I just want to make sure that, from the findings of what we’ve just done, going all in on Polytomic would not be the cheapest option.
215 00:30:07.830 ⇒ 00:30:14.059 Amber Lin: Sounds good, sounds good. So we should only switch if the other benefits justify the cost increase.
216 00:30:14.060 ⇒ 00:30:20.100 Alex K: Yeah, I mean, they’re all kind of pay as you drink. So that would be my ask there: let’s make sure we look at it holistically.
217 00:30:21.840 ⇒ 00:30:23.669 Amber Lin: Uttam, you had something earlier.
218 00:30:23.670 ⇒ 00:30:29.039 Uttam Kumaran: Yeah, if you just go to the required fixes. So again, on each of these sections, this is
219 00:30:29.240 ⇒ 00:30:49.252 Uttam Kumaran: like everything and the kitchen sink. So I really think we should pay attention to the top 3 or 4 items in particular. The top 3 items here are really what we need very, very soon. Tuning performance I’m less concerned about. I’ve monitored CPU and stuff, and we’re pretty fine.
220 00:30:50.390 ⇒ 00:31:01.049 Uttam Kumaran: The only other piece, once we have this working across the board, is that we need better alerting and triage. We did some of that with the dbt Cloud work,
221 00:31:01.230 ⇒ 00:31:08.653 Uttam Kumaran: although it’s still very, very hard for us to debug quickly when we see those. The top 3 items here are the most important.
222 00:31:09.674 ⇒ 00:31:22.009 Uttam Kumaran: And again, for the top item, I agree. I want to make sure that if there are connectors that we can run through Stitch and Hevo, and they’re very, very cheap, and we don’t need to touch them, and they’ve been working, that’s fine.
223 00:31:22.160 ⇒ 00:31:29.649 Uttam Kumaran: I will say that I think moving the NetSuite work to Polytomic was a big win, and from my angle
224 00:31:29.950 ⇒ 00:31:39.669 Uttam Kumaran: seemingly worth the cost. So I just want us to consider whether there are connectors in that high-priority range that we discussed. We can maintain
225 00:31:39.900 ⇒ 00:31:55.631 Uttam Kumaran: a three-tool ETL setup. It’s not ideal, but it’s not the end of the world. So I’m not looking to consolidate just for consolidation’s sake, but I want to make sure that if there are, like, 2 other ones that really need to run, and we need someone on the line, that
226 00:31:56.290 ⇒ 00:31:57.890 Uttam Kumaran: we make that call.
227 00:32:02.230 ⇒ 00:32:06.369 Zack Gibbs: We have talked in the past about how Stitch and Hevo are doing slightly different things, and
228 00:32:06.800 ⇒ 00:32:07.340 Uttam Kumaran: Yeah.
229 00:32:07.340 ⇒ 00:32:17.450 Zack Gibbs: Is that correct? We couldn’t kill off Hevo in favor of moving those use cases over to Stitch, at least in the short term?
230 00:32:17.450 ⇒ 00:32:28.839 Uttam Kumaran: I think even within those, if you want to consolidate, there are opportunities, because they’re both doing REST API calls, batch REST API calls. So
231 00:32:29.090 ⇒ 00:32:33.190 Uttam Kumaran: I would say out of both of those, I mean, I guess I don’t really have a
232 00:32:33.650 ⇒ 00:32:38.240 Uttam Kumaran: favorite out of either of those, but maybe it’s Stitch.
233 00:32:38.240 ⇒ 00:32:40.840 Alex K: My recommendation is that
234 00:32:41.100 ⇒ 00:33:02.729 Alex K: the recommendation for each data source be made on a source-by-source basis, because I really don’t think there is a one-drink-fits-all approach here. Maybe there is, but I think it needs to be considered on a source-by-source basis, because they all do slightly different things. And it’s all pay as you drink, right? It’s not like
235 00:33:03.090 ⇒ 00:33:04.220 Uttam Kumaran: Yeah, one.
236 00:33:04.220 ⇒ 00:33:12.480 Alex K: one has a completely different payment structure. So if we have 20 events in one tool, we’re going to have 20 events in another tool, right? So I just want to make sure that that’s
237 00:33:12.820 ⇒ 00:33:13.340 Alex K: done.
238 00:33:16.050 ⇒ 00:33:25.679 Uttam Kumaran: Yeah, so Amber, on this consolidate-ingestion piece, can we just add that this ends up in the data platform documentation,
239 00:33:25.870 ⇒ 00:33:27.769 Uttam Kumaran: and we go source by source?
240 00:33:28.140 ⇒ 00:33:34.600 Uttam Kumaran: We already have it there. We just need to add a column for, basically, criticality,
241 00:33:35.138 ⇒ 00:33:48.381 Uttam Kumaran: with NetSuite being number one. I just want to do, basically, one to 3 in terms of criticality. I also have the row counts for each, so we can have a conversation about
242 00:33:49.340 ⇒ 00:33:54.762 Uttam Kumaran: which ones we want to consider moving around.
243 00:33:58.394 ⇒ 00:34:13.390 Uttam Kumaran: Like Facebook and stuff, I don’t care. That’s going to work everywhere. There’s no need to consolidate. So the primary need for consolidation at this point is just going to be based on criticality.
244 00:34:13.847 ⇒ 00:34:24.399 Uttam Kumaran: And then potentially, though this doesn’t seem to be the case, if anything beyond NetSuite or a couple of other things needs to run every few minutes or so. That would be the only other reason.
245 00:34:27.659 ⇒ 00:34:33.099 Uttam Kumaran: Yeah, sounds good. I will meet with you guys after this meeting, and we can
246 00:34:33.629 ⇒ 00:34:37.919 Amber Lin: We can change this based on our discussions in this meeting.
247 00:34:40.355 ⇒ 00:34:49.939 Amber Lin: We’ve talked a bit about the required fixes, so I’m going to run through them really quickly: consolidate ingestion, automate the permissions,
248 00:34:50.989 ⇒ 00:34:55.189 Amber Lin: create development versus production schema separation.
249 00:34:55.349 ⇒ 00:35:05.080 Amber Lin: And then I believe these, we said, are less important for now, the monitoring ones.
250 00:35:05.080 ⇒ 00:35:11.994 Uttam Kumaran: We’re getting a good deal on Redshift, so I’m not too concerned with
251 00:35:12.720 ⇒ 00:35:15.350 Uttam Kumaran: poking at this anymore, at least for now.
252 00:35:16.360 ⇒ 00:35:17.470 Amber Lin: Okay.
253 00:35:17.780 ⇒ 00:35:28.719 Amber Lin: Sounds good. And this is a similar layout to the part we have for dbt, but I believe what we discussed just applies across the board.
254 00:35:28.860 ⇒ 00:35:35.109 Amber Lin: So if anything here stands out to you, just let us know.
255 00:35:41.320 ⇒ 00:35:43.516 Uttam Kumaran: I think we can do a lot.
256 00:35:43.830 ⇒ 00:35:47.899 Amber Lin: Okay, sounds good. This part.
257 00:35:47.900 ⇒ 00:35:56.169 Zack Gibbs: One quick question. Why is the grant automation for access
258 00:35:56.650 ⇒ 00:36:02.060 Zack Gibbs: between dbt and Redshift, like, why is that a problem to begin with?
259 00:36:02.885 ⇒ 00:36:03.650 Amber Lin: Don’t want it.
260 00:36:03.650 ⇒ 00:36:06.310 Zack Gibbs: Like, why are permissions a problem here?
261 00:36:06.640 ⇒ 00:36:07.616 Uttam Kumaran: Yes.
262 00:36:10.354 ⇒ 00:36:28.625 Uttam Kumaran: It’s a good question. So with the way you do role-based access control in Redshift, there are default privileges, based on when objects are created, that need to be applied. And we just basically have to drop a lot of roles,
263 00:36:29.654 ⇒ 00:36:47.899 Uttam Kumaran: recreate them with the specific permissions on each schema, and make sure that anything accessing the warehouse, whether it’s dropping data in, pulling data out, which is Looker, or modeling data within, which is dbt, each has its own role. Previously they were all tied to Steven’s user account.
264 00:36:48.225 ⇒ 00:37:03.350 Uttam Kumaran: And our fix in the short term, at least for dbt, was just, like, you get everything. The reason why it’s not great is that accidents happen, and we don’t want certain roles to have access to do certain things. Like, your BI tool shouldn’t have write access.
265 00:37:03.736 ⇒ 00:37:18.860 Uttam Kumaran: Also, we don’t want the BI tool to pull directly from raw. Otherwise, people are going to build dashboards directly on raw data, skipping the dbt layer. So those are the kinds of considerations. We didn’t do this this month just because
266 00:37:19.620 ⇒ 00:37:32.260 Uttam Kumaran: it’s something like a weekend project to get all the roles and all the grants ready, drop and rerun, and then make sure that things work. Previously, I think everything was just set up with one or 2
267 00:37:32.500 ⇒ 00:37:37.156 Uttam Kumaran: superuser roles that had access everywhere.
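[Editor’s illustration, not part of the recording] The role cleanup described above could be scripted roughly as follows. This is a hedged sketch that only builds SQL strings; it is not the statements the team ran. The schema `marts`, the group `bi_readers`, and the user `dbt_user` are hypothetical names, and the statements should be checked against the Redshift documentation before use.

```python
def grant_statements(schema, group, privileges=("SELECT",), owner=None):
    """Build SQL granting a group access to existing and future tables in a schema."""
    privs = ", ".join(privileges)
    # Default privileges only cover objects created by a specific user, so the
    # user that creates the tables (for example the dbt user) is named when known.
    for_user = f"FOR USER {owner} " if owner else ""
    return [
        # Let the group see the schema at all.
        f"GRANT USAGE ON SCHEMA {schema} TO GROUP {group};",
        # Cover tables that already exist.
        f"GRANT {privs} ON ALL TABLES IN SCHEMA {schema} TO GROUP {group};",
        # Cover tables created later, so new models do not break readers.
        f"ALTER DEFAULT PRIVILEGES {for_user}IN SCHEMA {schema} "
        f"GRANT {privs} ON TABLES TO GROUP {group};",
    ]


# Hypothetical setup: the BI tool reads modeled data only, never the raw schema.
statements = grant_statements("marts", "bi_readers", owner="dbt_user")
for statement in statements:
    print(statement)
```

Keeping the BI group off the raw schema is simply a matter of never generating grants for it, which matches the point about not building dashboards on raw data.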
268 00:37:38.030 ⇒ 00:37:57.539 Uttam Kumaran: The last point I’ll make is that we now have several services running queries on objects, and dropping and recreating objects, in staging and production: our users, Looker, and Polytomic. In Redshift, you have this concept of locks where, if a table is getting written to, there’s a lock on the table.
269 00:37:57.590 ⇒ 00:38:08.979 Uttam Kumaran: If multiple processes need a lock on the same table, it will cause issues. So there are a couple of things we can do: we have retries, we have query killers, ways to mitigate this.
270 00:38:09.070 ⇒ 00:38:16.579 Uttam Kumaran: This is just specific to Redshift management, though. In Snowflake, this isn’t necessarily a problem,
271 00:38:16.830 ⇒ 00:38:18.809 Uttam Kumaran: or in BigQuery or something.
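[Editor’s illustration, not part of the recording] The “retries” mitigation mentioned above can be sketched generically. This is a minimal retry-with-backoff sketch under stated assumptions: `LockConflictError` is a hypothetical stand-in for whatever error the warehouse driver actually raises on a lock conflict, and the delays are arbitrary.

```python
import time


class LockConflictError(Exception):
    """Hypothetical stand-in for the driver error raised on a table lock conflict."""


def run_with_retries(operation, max_attempts=3, base_delay=0.5, sleep=time.sleep):
    """Run operation(), retrying with exponential backoff on lock conflicts."""
    for attempt in range(1, max_attempts + 1):
        try:
            return operation()
        except LockConflictError:
            if attempt == max_attempts:
                # Out of attempts: surface the error so alerting can pick it up.
                raise
            # Wait 0.5s, 1s, 2s, ... before trying again.
            sleep(base_delay * 2 ** (attempt - 1))


# Demo with a fake operation that hits a lock twice and then succeeds.
attempts = {"count": 0}
delays = []


def flaky_write():
    attempts["count"] += 1
    if attempts["count"] < 3:
        raise LockConflictError("table is locked")
    return "ok"


outcome = run_with_retries(flaky_write, sleep=delays.append)
print(outcome, delays)
```

Passing `sleep` as a parameter keeps the demo instant; in a real job the default `time.sleep` would apply.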
272 00:38:20.310 ⇒ 00:38:21.190 Uttam Kumaran: Hold on.
273 00:38:23.490 ⇒ 00:38:25.980 Zack Gibbs: Historic cool.
274 00:38:26.530 ⇒ 00:38:32.039 Zack Gibbs: Historically, there were bad decisions made around RBAC, and now we have to
275 00:38:32.150 ⇒ 00:38:38.649 Zack Gibbs: go and actually put those different roles and permissions in place, versus using superusers, is what I’m hearing.
276 00:38:38.650 ⇒ 00:38:40.009 Uttam Kumaran: That’s that. Yeah, that’s correct.
277 00:38:40.010 ⇒ 00:38:40.609 Zack Gibbs: This is more.
278 00:38:40.610 ⇒ 00:38:42.220 Uttam Kumaran: Yeah, yeah, this is.
279 00:38:42.220 ⇒ 00:38:46.579 Zack Gibbs: It’s more of an administrative item that will just help mitigate risk.
280 00:38:46.780 ⇒ 00:39:12.659 Uttam Kumaran: Yeah. And of course this is also for data security, which we haven’t talked through much. Right now it’s sort of easy for anyone to get access to everything, so this just sets the groundwork for role-based access control. Similarly, in Looker, when we talk about Looker users and governance, we will tie them to role-based access control, so it depends on what levels and grants you have.
281 00:39:13.137 ⇒ 00:39:25.719 Uttam Kumaran: But fundamentally, the initial goal is just to prevent these overlaps and make sure that tools have the right permissions and the right access to the data they need to do what they need. But it’s a symptom of, yeah, legacy. We just,
282 00:39:25.860 ⇒ 00:39:27.210 Uttam Kumaran: this was never done.
283 00:39:30.660 ⇒ 00:39:31.280 Zack Gibbs: Gotcha.
284 00:39:31.710 ⇒ 00:39:32.540 Zack Gibbs: Okay.
285 00:39:36.180 ⇒ 00:39:38.510 Amber Lin: Cool. So let’s move on to Looker.
286 00:39:40.230 ⇒ 00:39:55.799 Amber Lin: I think this is quite a big area, especially because I have more data from the few reports we pulled, so it’s a bit more detailed. And I put here why things matter. So first of all, there’s the cost.
287 00:39:55.920 ⇒ 00:40:14.579 Amber Lin: There’s the user experience for our developers and everybody using those dashboards, there’s the trust that people have in the data, and overall how much time and cost we’re spending on maintaining these dashboards versus actually getting value out of them.
288 00:40:15.660 ⇒ 00:40:18.518 Amber Lin: So with those in mind,
289 00:40:19.240 ⇒ 00:40:30.630 Amber Lin: I separated this into a few areas. First of all are the Looker structure and governance problems. We have the business logic and metric confusion.
290 00:40:31.412 ⇒ 00:40:39.290 Amber Lin: We have the explore- and view-level issues. And lastly, the PDT issues.
291 00:40:40.130 ⇒ 00:40:57.630 Amber Lin: So let’s dive into each of them. First of all, based on the reports that we pulled, we found that there were a lot of dashboards: 581 total, and only 18% of them were used in the last month.
292 00:40:58.140 ⇒ 00:41:06.540 Amber Lin: And when it comes to Looks, there were 2,000 total, but only 8% were used, an even more striking number.
293 00:41:06.730 ⇒ 00:41:22.690 Amber Lin: And similarly, there was not really any folder governance. All the development dashboards live alongside the production ones, with no real access control, freshness indicators, or
294 00:41:22.810 ⇒ 00:41:23.910 Amber Lin: ownership.
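[Editor’s illustration, not part of the recording] Usage percentages like the ones quoted above can be computed from a content export with a last-used date per item. This is a minimal sketch under stated assumptions: the input is a plain mapping of dashboard name to last-used date (or `None` if never used), and the sample names and dates are invented, not taken from the actual reports.

```python
from datetime import date, timedelta


def usage_summary(last_used, today, window_days=30):
    """Split content into used and unused based on a recent-usage window."""
    cutoff = today - timedelta(days=window_days)
    used = sorted(
        name for name, day in last_used.items() if day is not None and day >= cutoff
    )
    unused = sorted(name for name in last_used if name not in used)
    pct_used = round(100 * len(used) / len(last_used)) if last_used else 0
    return {"used": used, "unused": unused, "pct_used": pct_used}


# Hypothetical sample: one dashboard used recently, three stale or never used.
sample = {
    "sales_overview": date(2025, 5, 10),
    "inventory_daily": date(2025, 3, 1),
    "old_marketing": None,
    "ops_scratch": date(2024, 11, 20),
}

summary = usage_summary(sample, today=date(2025, 5, 13))
print(summary["pct_used"], summary["unused"])
```

The `unused` list is the natural starting point for a deprecation review, since each entry still needs an owner to confirm it can go.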
295 00:41:25.050 ⇒ 00:41:26.519 Amber Lin: So this one
296 00:41:27.140 ⇒ 00:41:40.440 Amber Lin: sort of relates to the first one, so I’ll go over this before I pause for any questions. First of all, there are a lot of dashboards that have very similar purposes,
297 00:41:40.620 ⇒ 00:41:50.949 Amber Lin: and I believe that was caused by a lot of teams just making their own dashboards because they needed them, and ultimately each one of them requires its own logic and its own maintenance.
298 00:41:51.400 ⇒ 00:41:56.019 Amber Lin: And similarly, because a lot of them were developed by the teams,
299 00:41:56.590 ⇒ 00:42:19.989 Amber Lin: they have their logic directly in Looker. That causes a lot of problems when we’re trying to troubleshoot, or when we change something in the base and it doesn’t take effect consistently across different dashboards, and then we have to individually troubleshoot each dashboard to find out what’s going on.
300 00:42:20.960 ⇒ 00:42:36.470 Amber Lin: Similarly, we have a lot of manual filters replicating the joins, which is a bit similar to having the logic in Looker instead of in dbt. And lastly, we have somewhat of a mixing of
301 00:42:36.720 ⇒ 00:42:39.890 Amber Lin: unrelated logic. And so
302 00:42:40.110 ⇒ 00:42:55.780 Amber Lin: one of our goals here is to make sure everything is clear, and that we know where all of this logic is defined, so that when we have to change something, or when we have to troubleshoot, we know exactly where to go,
303 00:42:55.940 ⇒ 00:42:59.929 Amber Lin: and that will save us a lot of time and a lot of confusion.
304 00:43:01.850 ⇒ 00:43:12.289 Amber Lin: So lastly, on these 2 areas, we have a few dashboards that have a lot of explores.
305 00:43:12.770 ⇒ 00:43:25.499 Amber Lin: For example, the sales data one is the one that has the most explores. It has had 2,000 queries in the past 30 days, but it also includes a lot of unused dimensions.
306 00:43:26.020 ⇒ 00:43:39.409 Amber Lin: For the other ones, however, explores are very underused, so if we take a closer look at the report that we pulled, we’ll find that usage of explores is very skewed.
307 00:43:39.920 ⇒ 00:43:42.539 Amber Lin: that fit. Let’s see.
308 00:43:44.310 ⇒ 00:43:48.139 Amber Lin: Lastly, the persistent derived table issues.
309 00:43:48.350 ⇒ 00:43:55.989 Amber Lin: I’ll just leave it there. I’m not the most experienced with this, so I’ll let someone else take over to talk about it.
310 00:43:56.850 ⇒ 00:44:06.839 Uttam Kumaran: Yeah, maybe I’ll just recap. So PDTs are basically the equivalent of running a dbt model, but in Looker, meaning it materializes a table
311 00:44:07.410 ⇒ 00:44:11.540 Uttam Kumaran: and then queries that. The reason being,
312 00:44:11.995 ⇒ 00:44:24.804 Uttam Kumaran: when Looker was created, dbt was sort of not there yet, and the past strategy for materialization was that you would run jobs in, like, Airflow and materialize tables.
313 00:44:25.457 ⇒ 00:44:37.319 Uttam Kumaran: Instead, they were like, oh, let’s just put it in Looker. So you could do that there. Of course, the problem is that now there are tables on top of the tables that we have, and there’s logic in there. There are WHERE clauses and changes that
314 00:44:37.340 ⇒ 00:44:55.531 Uttam Kumaran: we can’t govern in the dbt layer. So part of this is just stripping those out, which is very, very simple. And these run on frequencies that we can’t control, they’re cached, and of course they have logic. So we just want to move that to the dbt layer, which I think no one will have a problem with.
315 00:44:55.900 ⇒ 00:45:00.009 Uttam Kumaran: The other thing is, yes, there are a lot of refresh schedules and
316 00:45:00.280 ⇒ 00:45:06.130 Uttam Kumaran: triggers in there that we want to clean up, not so much to save costs in Redshift,
317 00:45:06.320 ⇒ 00:45:27.650 Uttam Kumaran: but just because they’re getting sent to people’s emails, and they’re most likely slowing down the Looker service in general. But I think the underlying thing is, yeah, there’s a ton of dashboards and a ton of explores that we just need to find a way to deprecate. I think we will work through a migration plan, whether it’s
318 00:45:27.650 ⇒ 00:45:39.019 Uttam Kumaran: moving existing dashboards to a new one or cutting Looks out of existing dashboards. We had this sort of SELECT * problem in dbt, and it’s very similar in Looker, where we just
319 00:45:39.020 ⇒ 00:46:02.370 Uttam Kumaran: brought everything in every time. If you’re in Looker and you see the left side, that’s the problem we’re dealing with: there are way too many dimensions and metrics. There are several that say the same thing. Total sales is in there, like, 10 times, and if one of us on this team can’t figure out which is the right one, there’s no way any of the users can. So I really am
320 00:46:02.600 ⇒ 00:46:09.529 Uttam Kumaran: excited to do this work because this is the true source of truth for everyone to get access to the data.
321 00:46:10.340 ⇒ 00:46:13.710 Uttam Kumaran: And so I think again, I would say
322 00:46:14.030 ⇒ 00:46:18.190 Uttam Kumaran: on this section, I think the top, like, 5
323 00:46:18.560 ⇒ 00:46:33.780 Uttam Kumaran: items are very important. Probably the top 6 items here are very important. Dashboard freshness and governance a little bit less, but really, moving all the logic out of there and refactoring the high-use explores
324 00:46:34.185 ⇒ 00:46:59.570 Uttam Kumaran: will save a lot of people a lot of time. And the last piece I’ll say is that we’ll be partnered directly with Perry, whoever, on this work. Through these last few weeks, I can tell how interested they are in doing this and how much Looker development they’re doing. So this is partly an enablement exercise. Ideally, we work with them to hand this work off to them, to deprecate things where we’re approving the deprecation PRs and they’re handling the migrations.
325 00:47:00.020 ⇒ 00:47:03.320 Uttam Kumaran: Ideally, we’re sort of guiding them into doing this work,
326 00:47:04.377 ⇒ 00:47:09.082 Uttam Kumaran: and for them to own and understand the benefits.
327 00:47:13.201 ⇒ 00:47:19.110 Uttam Kumaran: So I think something that’s probably not as clear here, Amber, is a process document. I think
328 00:47:19.310 ⇒ 00:47:32.559 Uttam Kumaran: something I missed is just understanding what the Brainforge team can take on or has to take on versus what your team can take on. On the Looker side, I wasn’t fully aware of how
329 00:47:33.465 ⇒ 00:47:54.639 Uttam Kumaran: equipped the existing team is in Looker, and it seems like they’re all pretty good. So I would love to hand a good amount of this work off and instead just guide them into taking these actions, with us being, like, PR reviewers when they do these big deprecations. This will also get them to adopt it faster.
330 00:47:56.070 ⇒ 00:48:05.479 Zack Gibbs: Like, on the short-term steps, with 80 to 90% of the Looker stuff being unused, I would rather the team that we have
331 00:48:05.730 ⇒ 00:48:23.990 Zack Gibbs: make those cuts, because then they’ll buy into it more, and they’ll have to make decisions on what to cut and what to keep, and be involved in that process. I feel like that’s something that we can handle, right? So to your point, I think that type of breakdown of
332 00:48:24.190 ⇒ 00:48:48.600 Zack Gibbs: what you recommend you guys take on versus what we can take on would be good to see. But in my mind, that’s one of the first things that we should do now that we’re post-Mother’s Day: let’s make some deep cuts here and see what comes out of the woodwork. But I would rather that our team take ownership of those cuts in license count and in, you know, Looks and queries.
333 00:48:49.960 ⇒ 00:48:50.425 Uttam Kumaran: Thanks.
334 00:48:53.200 ⇒ 00:48:53.790 Uttam Kumaran: And I think
335 00:48:53.790 ⇒ 00:49:16.229 Uttam Kumaran: I think, yeah, you know, even Emily, I think moving PDTs is totally something that you can own, and we would just be there for reviewing. So I think that’s something across the board here, Amber, on this document: on these implementation plans, I think we should just add whether Urban Stems can take this on, or whether we recommend
336 00:49:16.280 ⇒ 00:49:29.980 Uttam Kumaran: that we sort of lead. The nice thing is we have CSV exports of all the content in here, so it’ll be easy for that team to go through and slice this up.
337 00:49:33.360 ⇒ 00:49:36.590 Zack Gibbs: Yeah, I don’t know. My gut feeling.
338 00:49:37.383 ⇒ 00:49:49.009 Zack Gibbs: You guys can chime in, but some of these items, like Looker: our internal team owns Looker. For anything changing in Looker, we have business teams that are involved.
339 00:49:49.440 ⇒ 00:50:00.329 Zack Gibbs: We may leverage you guys for some of the change management and migration there, but that seems like a clear line of delineation for ownership that we could take on on our end,
340 00:50:00.330 ⇒ 00:50:01.190 Zack Gibbs: whereas
341 00:50:01.210 ⇒ 00:50:10.710 Zack Gibbs: you guys may be better suited to do some of the Redshift work, help coordinate, and do chunks of the data mart work.
342 00:50:11.450 ⇒ 00:50:15.424 Zack Gibbs: So that’s kind of where it would be good to see the breakdown,
343 00:50:16.850 ⇒ 00:50:20.489 Zack Gibbs: to get a feel, or whether, you know, things are done in tandem.
344 00:50:20.490 ⇒ 00:50:20.979 Uttam Kumaran: No, no.
345 00:50:23.780 ⇒ 00:50:42.480 Zack Gibbs: But the way that I’m reading this, and maybe, Alex, this is the point you’re making and I’m just not catching on, is that the dbt modeling changes are going to impact downstream reporting, and then there’s change management and migration that has to be done with the business teams that are consuming those reports. So it’s like,
346 00:50:43.050 ⇒ 00:50:46.568 Zack Gibbs: you have to do this in chunks, and you know
347 00:50:47.650 ⇒ 00:50:48.150 Uttam Kumaran: Yes.
348 00:50:48.150 ⇒ 00:50:52.540 Zack Gibbs: that is now what I’m hearing.
349 00:50:53.460 ⇒ 00:51:14.480 Uttam Kumaran: Yeah, so I would rather we focus on a mart-by-mart basis. We try to go after the high-priority ones first, and then for the Looker work, I would much rather the analysts take that on. That way they’re bought in, because they’re going to be using this the most, and I want them to have an acute understanding of what’s in the marts,
350 00:51:14.610 ⇒ 00:51:19.269 Uttam Kumaran: either building new explores or migrating existing explores.
351 00:51:19.450 ⇒ 00:51:20.816 Uttam Kumaran: That’s fair.
352 00:51:24.350 ⇒ 00:51:29.749 Uttam Kumaran: And then, even in the dbt world, again, I think a lot of this will be between me, Demilade,
353 00:51:30.050 ⇒ 00:51:35.969 Uttam Kumaran: Amber, and Luke, you know, so it’s not completely on us, I think, especially
354 00:51:36.380 ⇒ 00:51:47.930 Uttam Kumaran: as we build the intermediate models. That’s where this crew will be centralizing logic a lot and diving into the inventory items table, et cetera, and breaking those down.
355 00:51:48.720 ⇒ 00:51:54.189 Uttam Kumaran: That’s the meat. That’s the stuff that’s going to take, like, time.
356 00:51:57.290 ⇒ 00:51:58.179 Uttam Kumaran: Oh, yeah.
357 00:51:59.366 ⇒ 00:52:02.299 Uttam Kumaran: But I hope at least this is a
358 00:52:02.300 ⇒ 00:52:02.710 Zack Gibbs: Or the.
359 00:52:02.710 ⇒ 00:52:08.689 Uttam Kumaran: plan you can send to the analyst crew to say, like, this is roughly the stuff. Or you guys can prioritize this.
360 00:52:13.670 ⇒ 00:52:23.134 Zack Gibbs: Yeah, I think a good next step, for me at least, to help visualize it and make sure that we all feel good internally about the next steps, would be
361 00:52:23.640 ⇒ 00:52:35.308 Zack Gibbs: a more timeline-based breakdown of what we discussed today, which is really you guys owning a lot of the data mart pieces with, you know, our help and assistance,
362 00:52:35.720 ⇒ 00:52:47.440 Zack Gibbs: and, like, what’s the sequence? Say we just focus on inventory first and finish that out. What are the steps there to get inventory
363 00:52:48.443 ⇒ 00:52:56.336 Zack Gibbs: finalized, you know, in a trusted state downstream, you know, reports migrated or updated.
364 00:52:57.170 ⇒ 00:53:00.659 Zack Gibbs: And then, you know, once that’s done.
365 00:53:00.920 ⇒ 00:53:07.186 Zack Gibbs: we move to sales, or we move to, you know, whatever whatever the actual other, you know, business areas are.
366 00:53:07.650 ⇒ 00:53:18.479 Zack Gibbs: And what does that chunk of time look like? My feeling is that that’s going to be like an epic in itself, with a bunch of, you know, child stories underneath.
367 00:53:19.580 ⇒ 00:53:20.320 Zack Gibbs: that would be
368 00:53:21.960 ⇒ 00:53:34.236 Zack Gibbs: how this, I guess, translates more into like a concrete action plan. At least we could start with kind of like the high level. What is the sequence? Who owns which pieces of that sequence? Roughly.
369 00:53:35.380 ⇒ 00:53:37.190 Zack Gibbs: that would be the best.
370 00:53:37.550 ⇒ 00:53:40.280 Uttam Kumaran: I think the content inside of this document is really good.
371 00:53:40.740 ⇒ 00:53:46.740 Zack Gibbs: It’s just more of, like, what’s the sequencing? Now, I know that you guys are looking to us a little bit for that prioritization.
372 00:53:47.130 ⇒ 00:53:49.100 Zack Gibbs: But it sounds like we’re all running around like
373 00:53:50.190 ⇒ 00:53:52.290 Zack Gibbs: we have to do this data mart by data mart, and there’s a
374 00:53:53.000 ⇒ 00:54:01.410 Zack Gibbs: bundle of activities across dbt, Looker, and Redshift that will have to occur for each data mart, right?
375 00:54:02.560 ⇒ 00:54:03.335 Uttam Kumaran: Yes.
376 00:54:04.110 ⇒ 00:54:04.415 Zack Gibbs: Okay.
377 00:54:05.680 ⇒ 00:54:11.050 Uttam Kumaran: Oh, sorry. So Amber, I think that’s pretty clear, if you want to take that on,
378 00:54:11.190 ⇒ 00:54:13.510 Uttam Kumaran: and then we can turn something around this week.
379 00:54:13.890 ⇒ 00:54:20.100 Amber Lin: Sure. Yeah, we haven’t yet decided on which specific data mart we’re going to do.
380 00:54:22.480 ⇒ 00:54:29.770 Uttam Kumaran: I think the order in which you have it seems pretty fair, but I don’t know exactly. I feel like marketing
381 00:54:29.940 ⇒ 00:54:32.330 Uttam Kumaran: is probably on the low end.
382 00:54:33.150 ⇒ 00:54:34.240 Zack Gibbs: Yeah, where is the
383 00:54:36.080 ⇒ 00:54:42.293 Zack Gibbs: yeah. I don’t know about. So I don’t know about subs on that one. We should chat further about
384 00:54:42.810 ⇒ 00:54:43.510 Zack Gibbs: But.
385 00:54:44.040 ⇒ 00:54:45.080 Amber Lin: But definitely revenue.
386 00:54:45.080 ⇒ 00:54:50.780 Zack Gibbs: Story, inventory and sales or revenue. However, we wanna.
387 00:54:54.895 ⇒ 00:54:57.599 Zack Gibbs: And that seems they seem like
388 00:54:58.150 ⇒ 00:55:06.269 Zack Gibbs: the top ones, with inventory being first. Like, we already made progress there, so let’s finish that piece of it out, because that’s the
389 00:55:06.270 ⇒ 00:55:06.900 Uttam Kumaran: Bye-bye.
390 00:55:06.900 ⇒ 00:55:08.930 Zack Gibbs: Probably the hairiest one that had that
391 00:55:09.330 ⇒ 00:55:13.900 Zack Gibbs: you know that people are familiar with, and then we can. We have, like a.
392 00:55:14.310 ⇒ 00:55:19.439 Uttam Kumaran: We don’t need to. We don’t need to decide on the next floor then, because that’s like, that’s a huge bundle of work. So.
393 00:55:19.440 ⇒ 00:55:20.170 Amber Lin: Yeah, let’s.
394 00:55:20.170 ⇒ 00:55:25.516 Uttam Kumaran: Does to cement that. Yeah. Awesome.
395 00:55:26.340 ⇒ 00:55:27.400 Amber Lin: So.
396 00:55:28.900 ⇒ 00:55:37.458 Uttam Kumaran: So yeah, Amber, I don’t know if you want to work on that in Linear or in a Google spreadsheet, just a Gantt chart that we can look at. I can provide
397 00:55:37.800 ⇒ 00:55:47.619 Uttam Kumaran: level of effort on each of the tasks. And then I think today we agreed, on each of these sections, which of the 5 or 10
398 00:55:47.760 ⇒ 00:56:01.900 Uttam Kumaran: actions are high priority. And then, yeah, I would leave the Looker stuff as is. But I think for the team, yeah, I feel like that’s a pretty good roadmap. If you want
399 00:56:02.590 ⇒ 00:56:22.869 Uttam Kumaran: to get the analyst team kicked off and taking that stuff on already, we’re happy to work to expand that. But all the data on what’s not being used is in Looker. And the quickest cost adjustment on the Looker side is going to be to cut licenses. There is a similar user report where you can see who’s querying.
400 00:56:23.030 ⇒ 00:56:33.859 Uttam Kumaran: So the first thing is just to see who’s not querying and get them off. The second piece is to find out who should not be in there. And, as you mentioned, some people should just be in NetSuite.
401 00:56:34.600 ⇒ 00:56:39.603 Uttam Kumaran: We can probably answer both of those in an hour or 2.
402 00:56:43.410 ⇒ 00:56:43.885 Zack Gibbs: Yep.
403 00:56:46.270 ⇒ 00:56:58.969 Zack Gibbs: Yeah. And the reality is that our Looker agreement doesn’t renew until the middle of July. But it would be nice just to make those cuts like within the next 2, 3 weeks, and then,
404 00:56:59.140 ⇒ 00:57:02.330 Zack Gibbs: you know, there’ll be things that come out of the,
405 00:57:02.750 ⇒ 00:57:06.562 Zack Gibbs: you know, that we’re not expecting, once we make deeper cuts there.
406 00:57:07.700 ⇒ 00:57:14.980 Zack Gibbs: And then we’ll have actual time to make sure we understand how many licenses we really need, and which roles within those license groups we need.
407 00:57:15.860 ⇒ 00:57:20.459 Uttam Kumaran: Yeah, I think like making the cut now and seeing who yells
408 00:57:20.720 ⇒ 00:57:22.419 Uttam Kumaran: is a better way to go.
409 00:57:25.160 ⇒ 00:57:31.680 Uttam Kumaran: So if we can assist on getting that query together, or whatever. I don’t think that’s gonna take a long time.
410 00:57:36.926 ⇒ 00:57:38.480 Uttam Kumaran: Cool. Okay?
411 00:57:38.730 ⇒ 00:57:48.429 Uttam Kumaran: So yeah, Amber, I guess whenever you’re ready, maybe we can send something over, and then we wanna have another meeting like this week, and maybe if you just want to send a quick little associated when you put together that timeline.
412 00:57:48.950 ⇒ 00:57:52.599 Uttam Kumaran: and we can go from there. Try to close this out this week.
413 00:57:54.310 ⇒ 00:58:01.241 Amber Lin: Sure. Yeah. When I get some time I’ll start on the timeline. Uttam, do internal review, and then we’ll send you guys
414 00:58:02.526 ⇒ 00:58:05.549 Amber Lin: either. English and Helen.
415 00:58:07.480 ⇒ 00:58:08.060 Amber Lin: Okay.
416 00:58:08.060 ⇒ 00:58:08.530 Alex K: Thank you so much.
417 00:58:08.530 ⇒ 00:58:09.449 Zack Gibbs: Yeah, I just want to reiterate the work.
418 00:58:09.450 ⇒ 00:58:10.080 Alex K: Helpful.
419 00:58:10.310 ⇒ 00:58:10.730 Emily Giant: Yeah.
420 00:58:10.730 ⇒ 00:58:30.690 Zack Gibbs: Yeah, I think we’re excited about next steps. Now that we’re through this period, there’s still like cleanup stuff that’s happening that’s taking attention away. But that should taper off here soon, and we can really focus on more sustainable, you know, build-out, which I think we’re all excited to undertake. So.
421 00:58:31.490 ⇒ 00:58:32.270 Uttam Kumaran: Oh!
422 00:58:32.270 ⇒ 00:58:33.190 Amber Lin: Awesome.
423 00:58:33.930 ⇒ 00:58:34.400 Alex K: Have a good one.
424 00:58:34.400 ⇒ 00:58:37.990 Uttam Kumaran: Awesome. Thanks. Everyone. Thanks.