Meeting Title: CDP Data Model Planning Sync
Date: 2025-07-08
Meeting participants: Awaish Kumar, Robert Tseng
WEBVTT
1 00:00:28.365 ⇒ 00:00:29.020 Awaish Kumar: Hello!
2 00:00:31.020 ⇒ 00:00:33.650 Robert Tseng: Hey, Awaish. Okay, let’s do it.
3 00:00:35.250 ⇒ 00:00:38.100 Robert Tseng: Where do you want me to start.
4 00:00:38.850 ⇒ 00:00:40.570 Awaish Kumar: So actually, I haven’t.
5 00:00:41.000 ⇒ 00:00:43.779 Awaish Kumar: I’ve been reading this document and.
6 00:00:43.780 ⇒ 00:00:44.460 Robert Tseng: Okay.
7 00:00:47.080 ⇒ 00:00:52.020 Awaish Kumar: Fair. I’m so this is the kind of table you need at the end.
8 00:00:52.410 ⇒ 00:00:54.120 Awaish Kumar: Is that the goal.
9 00:00:55.180 ⇒ 00:01:05.050 Robert Tseng: Yeah. Yes, that is it. It may not look exactly like that. But yes, that’s the idea.
10 00:01:06.810 ⇒ 00:01:10.439 Awaish Kumar: So, like, there are 2 things. Number one,
11 00:01:10.550 ⇒ 00:01:15.500 Awaish Kumar: the kind of table this is is more, like, based on user IDs,
12 00:01:16.219 ⇒ 00:01:20.529 Awaish Kumar: which are basically, we have some user ID for them.
13 00:01:20.860 ⇒ 00:01:28.050 Awaish Kumar: Either they signed up on Bask or they are customers.
14 00:01:28.180 ⇒ 00:01:35.259 Awaish Kumar: But are we looking to include the ones where we don’t know, like, they just visited the
15 00:01:35.590 ⇒ 00:01:41.590 Awaish Kumar: Bask platform and, like, the Eden website. And then, thank you.
16 00:01:41.590 ⇒ 00:01:42.250 Robert Tseng: Yeah.
17 00:01:45.650 ⇒ 00:01:54.390 Robert Tseng: okay, actually, I don’t know if this would be easier if I kind of I have the doc pulled up, too. Can I? Can I share my screen? And you just tell me where you want to go?
18 00:01:54.850 ⇒ 00:01:55.180 Awaish Kumar: Okay.
19 00:01:56.010 ⇒ 00:01:57.870 Robert Tseng: Okay, I will.
20 00:02:02.830 ⇒ 00:02:03.963 Robert Tseng: Yeah. So
21 00:02:04.950 ⇒ 00:02:11.786 Robert Tseng: if you look here. So this is what Segment currently sends. If you go into BigQuery, you can go and look at these different
22 00:02:12.350 ⇒ 00:02:19.990 Robert Tseng: I might actually try to just pull it up right now, so I can show you query
23 00:02:24.530 ⇒ 00:02:25.820 Robert Tseng: a.
24 00:02:32.330 ⇒ 00:02:34.219 Robert Tseng: I lost all my queries.
25 00:02:35.830 ⇒ 00:02:39.920 Robert Tseng: Okay? Well, I guess my point with this was
26 00:02:40.900 ⇒ 00:02:43.699 Robert Tseng: If we go to
27 00:02:45.500 ⇒ 00:02:50.780 Robert Tseng: believe that these are the raw tables. And so if I just go to id graph updates.
28 00:03:00.510 ⇒ 00:03:01.780 Robert Tseng: So
29 00:03:12.490 ⇒ 00:03:19.989 Robert Tseng: yeah, I believe that these are all, like, the random IDs that we already get from Segment.
30 00:03:20.600 ⇒ 00:03:24.669 Robert Tseng: So, yeah, we can look at you can look at these different tables. But like.
31 00:03:25.200 ⇒ 00:03:26.850 Awaish Kumar: I can’t see the screen.
32 00:03:27.790 ⇒ 00:03:28.380 Robert Tseng: So.
33 00:03:36.750 ⇒ 00:03:41.990 Awaish Kumar: Yeah, okay, so is this prod unify a separate tool or.
34 00:03:42.990 ⇒ 00:03:54.299 Robert Tseng: Yeah. So prod unify is, like, how Segment has already been syncing all of their profiles data into, like, BigQuery. We just haven’t been using it.
35 00:03:56.210 ⇒ 00:04:07.219 Robert Tseng: And so I was, like, looking through these different, these 6 tables. Like, these are all the raw tables of, like, random events that fire, and then Segment creates, like, their own Segment IDs. Most of these are anonymous.
36 00:04:07.897 ⇒ 00:04:10.642 Robert Tseng: Then there’s like external mapping updates.
37 00:04:11.900 ⇒ 00:04:19.360 Awaish Kumar: Yeah, I’ve been reading through this. And, like, the CDP manages this ID management. So
38 00:04:20.439 ⇒ 00:04:26.269 Awaish Kumar: like, Segment gets data from every other platform, like Google Analytics,
39 00:04:26.914 ⇒ 00:04:41.699 Awaish Kumar: different, like, orders data, Customer.io data, like, whatever there is. And then, like, the cookies, emails. So it has, like, these steps of figuring out the identity.
40 00:04:41.819 ⇒ 00:04:44.299 Awaish Kumar: So first of all, like, using email or
41 00:04:44.399 ⇒ 00:04:52.809 Awaish Kumar: phone number, then it goes looking for other things like that. And stepwise it figures out what exactly,
42 00:04:53.879 ⇒ 00:04:57.649 Awaish Kumar: like, what rows are linked to some profile.
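Note: a minimal sketch of the general idea of identity stitching that Awaish describes, where records sharing an identifier such as an email, phone number, or cookie ID are linked into one profile. This is not Segment’s actual algorithm. The function name `resolve_profiles`, the field names, and the sample events are all hypothetical, and the linking is done with a plain union-find.

```python
def resolve_profiles(events, id_keys=("email", "phone", "anonymous_id")):
    """Group identifiers that co-occur on any event into profiles.

    Hypothetical illustration only: id_keys and the event shape are assumptions.
    """
    parent = {}

    def find(node):
        # Register unseen nodes as their own root, then walk up with path halving.
        parent.setdefault(node, node)
        while parent[node] != node:
            parent[node] = parent[parent[node]]
            node = parent[node]
        return node

    def union(a, b):
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a

    for event in events:
        # Identifiers present on this event, in the order given by id_keys.
        ids = [(key, event[key]) for key in id_keys if event.get(key)]
        if not ids:
            continue
        find(ids[0])  # make sure a lone identifier is still registered
        for other in ids[1:]:
            union(ids[0], other)

    groups = {}
    for node in parent:
        groups.setdefault(find(node), set()).add(node)
    return list(groups.values())


# Hypothetical sample events: one visitor who later identifies, one who never does.
events = [
    {"anonymous_id": "anon-1", "email": None},
    {"anonymous_id": "anon-1", "email": "a@example.com"},
    {"email": "a@example.com", "phone": "555-0100"},
    {"anonymous_id": "anon-2"},
]
profiles = resolve_profiles(events)
```

Here `anon-1`, the email, and the phone number end up in one profile, while `anon-2` stays on its own as an anonymous profile.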
43 00:04:58.260 ⇒ 00:05:15.050 Robert Tseng: Yeah, so yeah, we don’t have to do any of that. I think it’s really just collecting. So yeah, I just, I just remember so profile traits updates, this is like the super wide table that is like each row is a single customer. Obviously, most of them will not have values like you can see, this is all null. It’s kind of just
44 00:05:16.000 ⇒ 00:05:22.027 Robert Tseng: junk. So yeah, I think that’s why I went. And I
45 00:05:28.250 ⇒ 00:05:30.549 Awaish Kumar: Yeah, I see your audio.
46 00:05:31.170 ⇒ 00:05:51.859 Robert Tseng: Yeah, like, I wrote a query here that basically just mapped every column into, like, a key-value pairing, just so I could understand what all the different columns are. Then I tried to write this, I mean, it’s really bad. It didn’t even fully work, but I was trying to basically look at, okay, for every column, how many values are there? Like,
47 00:05:53.880 ⇒ 00:06:12.530 Robert Tseng: yeah, that’s basically what I was trying to do. Like, what’s the percentage of completeness for each column, and when was it last updated? Because that’ll give me some sense of, like, freshness, and also just kind of, you know, completeness, I guess.
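Note: a minimal sketch of the per-column profiling Robert describes (percentage of completeness and last-updated time for each column). It is not the query from the call, which is not shown in the transcript. The function name `profile_columns`, the `updated_at` field, and the sample rows are hypothetical, and plain Python stands in for what would likely be a warehouse query.

```python
from datetime import datetime


def profile_columns(rows, updated_at_key="updated_at"):
    """For each column, return the share of non-null values and the most
    recent row timestamp at which the column had a value."""
    total = len(rows)
    stats = {}
    for row in rows:
        row_ts = row.get(updated_at_key)
        for col, value in row.items():
            if col == updated_at_key:
                continue
            entry = stats.setdefault(col, {"non_null": 0, "last_updated": None})
            if value is None:
                continue
            entry["non_null"] += 1
            if row_ts is not None and (
                entry["last_updated"] is None or row_ts > entry["last_updated"]
            ):
                entry["last_updated"] = row_ts
    return {
        col: {
            "completeness": s["non_null"] / total if total else 0.0,
            "last_updated": s["last_updated"],
        }
        for col, s in stats.items()
    }


# Hypothetical rows from a wide traits table; most values are null.
rows = [
    {"email": "a@example.com", "order_count": 2, "plan": None, "updated_at": datetime(2025, 7, 1)},
    {"email": None, "order_count": None, "plan": None, "updated_at": datetime(2025, 7, 5)},
    {"email": "c@example.com", "order_count": None, "plan": None, "updated_at": datetime(2025, 7, 7)},
    {"email": None, "order_count": None, "plan": None, "updated_at": datetime(2025, 6, 1)},
]
profile = profile_columns(rows)
```

In this sample, `email` is half populated, `order_count` a quarter, and `plan` is entirely null, which is the kind of column the profiling is meant to flag for removal.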
48 00:06:13.940 ⇒ 00:06:15.160 Robert Tseng: And
49 00:06:15.470 ⇒ 00:06:26.890 Robert Tseng: you know, I was expecting that to help me trim down this 350-column table into, I would expect, something like less than 30.
50 00:06:26.940 ⇒ 00:06:50.649 Robert Tseng: Not all of it is useful, obviously. And like you said, a lot of that stuff we already get. I know there’s a dim customers model, and that data we get straight from Bask. But I’m just saying, like, well, what if we shifted that? Why do we have to use Bask? You know, Segment is using Bask as well. So why don’t we just take what Segment’s profile already kind of gives us, and we just build upon that and do our own enrichment?
51 00:06:52.860 ⇒ 00:06:59.420 Robert Tseng: Yeah. So within Segment, like, you can look in the UI and see, like, what
52 00:06:59.660 ⇒ 00:07:06.050 Robert Tseng: has been set up. Let me click on that real quick.
53 00:07:08.590 ⇒ 00:07:13.050 Robert Tseng: Then here.
54 00:07:14.890 ⇒ 00:07:15.439 Robert Tseng: So there’s
55 00:07:15.440 ⇒ 00:07:27.129 Robert Tseng: like, about, you know, these are the important traits, and this is what’s getting pushed into Mixpanel already, and I guess obviously order count gets sent into, I don’t even know. HubSpot doesn’t exist, so it’s just Customer.io.
56 00:07:28.850 ⇒ 00:07:40.430 Robert Tseng: Then I know that in Segment you can go, and you can create some additional traits in here, and they can flow through, like, they’ll show up in BigQuery, and we can use those as well. So it’s just great, like, we don’t have to
57 00:07:40.930 ⇒ 00:07:43.219 Robert Tseng: create every field in
58 00:07:43.772 ⇒ 00:07:49.210 Robert Tseng: in BigQuery. But yeah, I think just being able to, like, at least
59 00:07:49.660 ⇒ 00:07:53.999 Robert Tseng: manage this in the warehouse is what I’m
60 00:07:55.230 ⇒ 00:08:02.990 Robert Tseng: trying to get us to do. And I think we just obviously have more flexibility to get whatever we want rather than having to go through,
61 00:08:03.100 ⇒ 00:08:07.909 Robert Tseng: you know, this limited set of things that Segment allows us to do.
62 00:08:10.032 ⇒ 00:08:17.590 Awaish Kumar: So, like, for the number one task, I understand that, like, we want to see if we can utilize
63 00:08:18.358 ⇒ 00:08:21.249 Awaish Kumar: profiles already created by Segment
64 00:08:22.394 ⇒ 00:08:26.069 Awaish Kumar: as our dim customer, or we can maybe merge them together.
65 00:08:27.300 ⇒ 00:08:48.539 Robert Tseng: Yeah, I mean, for the purposes of, like, this test, you don’t have to merge them. You can just create a separate model for now, and you know, I think whether or not we use Segment, I think RudderStack, the other tool we’re evaluating, operates very similarly. So, like, they will have their own kind of identity stitching thing. Like, I don’t know, maybe we have to read the docs to go and figure out what that looks like. But
66 00:08:48.640 ⇒ 00:08:55.360 Robert Tseng: you know, RudderStack doesn’t have a UI. It is, like, warehouse native. So it will, it will.
67 00:08:55.942 ⇒ 00:09:03.080 Robert Tseng: Yeah, we can maybe just even look at it now. So, RudderStack identity resolution.
68 00:09:08.000 ⇒ 00:09:18.750 Awaish Kumar: But, like, the major difference between these 2, Segment profiles and our customer data, is that we only rely on the customers, like, who make some order with us.
69 00:09:18.750 ⇒ 00:09:22.670 Robert Tseng: Correct. Yeah, whereas Segment has everything, or has every anonymous user.
70 00:09:23.042 ⇒ 00:09:24.159 Awaish Kumar: Visit, maybe yeah.
71 00:09:25.180 ⇒ 00:09:25.790 Robert Tseng: Yeah.
72 00:09:30.480 ⇒ 00:09:41.980 Robert Tseng: yeah, which I’m totally fine with keeping. I mean, I do think we need that, because that’s what Mixpanel depends on. Mixpanel, in order for it to function like a Google Analytics, needs to have,
73 00:09:42.190 ⇒ 00:09:55.139 Robert Tseng: you know, the full range of all visitors and what they’re doing on the platform, so that we can, you know, try to figure out how to convert more of those visitors that don’t end up becoming customers.
74 00:09:55.682 ⇒ 00:10:00.260 Robert Tseng: So yeah, our dim customers table is limited. It’s only active customers, right?
75 00:10:01.810 ⇒ 00:10:02.570 Robert Tseng: Yeah.
76 00:10:03.060 ⇒ 00:10:04.150 Robert Tseng: So
77 00:10:04.460 ⇒ 00:10:08.820 Robert Tseng: yeah, so in that case it’s definitely not gonna be the same model, it’s gonna be like a.
78 00:10:09.050 ⇒ 00:10:14.940 Robert Tseng: But yeah, whatever we need to do to, like, get the
79 00:10:15.440 ⇒ 00:10:28.240 Robert Tseng: which, I mean, Segment already gives you some raw tables, and gives you some materialized tables as well, so I think we can build some sort of, like, intermediary, like, enriched.
80 00:10:28.540 ⇒ 00:10:30.840 Robert Tseng: I don’t know if it’s the customers. But
81 00:10:33.790 ⇒ 00:10:50.780 Robert Tseng: okay, so, yeah, there’s, like, the users table that we can just get straight from Segment. We don’t necessarily have to do any more enrichment on it. But then we need, like, the enriched customers table, which does take active customers from, like, the
82 00:10:50.980 ⇒ 00:11:00.350 Robert Tseng: from, like, the Segment users. And then we need to be able to add more to it than we currently have in dim customers, so that we can push that into Customer.io.
83 00:11:01.710 ⇒ 00:11:02.360 Awaish Kumar: Okay.
84 00:11:02.620 ⇒ 00:11:03.849 Robert Tseng: Yeah, with customers.
85 00:11:04.550 ⇒ 00:11:05.470 Robert Tseng: Yeah.
86 00:11:05.470 ⇒ 00:11:11.200 Awaish Kumar: So, like, we have this profile table, right? This, you know, what it was called,
87 00:11:12.086 ⇒ 00:11:16.669 Awaish Kumar: in BigQuery, in prod unify, there’s profile traits updates.
88 00:11:17.210 ⇒ 00:11:17.820 Awaish Kumar: So.
89 00:11:17.820 ⇒ 00:11:18.180 Robert Tseng: Yes.
90 00:11:18.180 ⇒ 00:11:22.410 Awaish Kumar: You mentioned some user table, like, where’s that table?
91 00:11:22.410 ⇒ 00:11:25.920 Robert Tseng: I I believe this. Is it.
92 00:11:25.920 ⇒ 00:11:27.580 Awaish Kumar: Are you talking about this one, or.
93 00:11:27.990 ⇒ 00:11:35.689 Robert Tseng: Yeah, I think this is like their big user table like, this is just this is just synced every hour. And it’s just like.
94 00:11:35.900 ⇒ 00:11:36.410 Awaish Kumar: Okay.
95 00:11:36.410 ⇒ 00:11:39.809 Robert Tseng: Yeah, I mean, 90% of these don’t have. Like, you know.
96 00:11:40.180 ⇒ 00:11:49.180 Awaish Kumar: Yeah. So what I understand is, like, we get this user table, maybe try to cut it down to the few columns which are useful,
97 00:11:49.430 ⇒ 00:11:55.690 Awaish Kumar: and then also create another table on top of it, which basically maybe combines data from our
98 00:11:55.900 ⇒ 00:12:02.340 Awaish Kumar: dim customer and enriches it, basically, maybe with LTV, total orders.
99 00:12:02.340 ⇒ 00:12:03.150 Robert Tseng: Yes.
100 00:12:03.950 ⇒ 00:12:06.200 Awaish Kumar: Yeah, we have, yeah.
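Note: a minimal sketch of the two-layer idea agreed here, a users table that keeps everyone (including anonymous visitors) and an enriched customers table built on top of it with fields such as LTV and total orders. The function name `build_enriched_customers`, the join key `user_id`, and all field names and sample values are hypothetical; the real models would presumably be built in the warehouse.

```python
def build_enriched_customers(users, dim_customers):
    """Keep only users that match a customer record and attach customer metrics.

    Anonymous visitors and signed-up users without a customer record are left
    out here; they remain available in the users table.
    """
    customers_by_id = {c["user_id"]: c for c in dim_customers}
    enriched = []
    for user in users:
        user_id = user.get("user_id")
        customer = customers_by_id.get(user_id) if user_id is not None else None
        if customer is None:
            continue
        enriched.append(
            {
                "user_id": user_id,
                "email": user.get("email"),
                "ltv": customer["ltv"],
                "total_orders": customer["total_orders"],
            }
        )
    return enriched


# Hypothetical sample data: one anonymous visitor, one sign-up, one customer.
users = [
    {"segment_id": "s1", "user_id": None, "email": None},
    {"segment_id": "s2", "user_id": "u-100", "email": "a@example.com"},
    {"segment_id": "s3", "user_id": "u-200", "email": "b@example.com"},
]
dim_customers = [{"user_id": "u-200", "ltv": 480.0, "total_orders": 4}]

enriched = build_enriched_customers(users, dim_customers)
```

Only `u-200` lands in the enriched table; the other two rows stay in the users table alone.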
101 00:12:06.840 ⇒ 00:12:11.099 Robert Tseng: Yeah. And then this model here was just, like,
102 00:12:11.565 ⇒ 00:12:27.000 Robert Tseng: I was thinking of a better way to try to manage, like, well, there’s, like, 350 columns here. How do we know what’s useful? So I was hoping to transform this into, like, an intermediary table that basically is, like, a,
103 00:12:27.130 ⇒ 00:12:40.829 Robert Tseng: you know, it’s like a point-in-time snapshot of, like, where each trait comes from, you know, how complete it is, and just, like, some sort of freshness test, so that,
104 00:12:41.010 ⇒ 00:12:46.029 Robert Tseng: you know, when you’re deciding, like, what to limit
105 00:12:46.220 ⇒ 00:12:54.930 Robert Tseng: from this model, we have, you know, this intermediary model to help you to make that decision.
106 00:12:55.383 ⇒ 00:13:06.730 Robert Tseng: If the freshness score is above blah blah, or, like, completeness scores that are like that. You know, we have, like, a set of criteria that helps you to narrow down, like,
107 00:13:06.850 ⇒ 00:13:12.550 Robert Tseng: this raw model into, like, what you’re describing.
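Note: a minimal sketch of how a set of criteria over completeness and freshness could narrow the raw model down. The thresholds are left unspecified in the call ("blah blah"), so `min_completeness` and `max_age_days` below are placeholder assumptions, as are the function name `select_traits` and the sample profile.

```python
from datetime import datetime, timedelta


def select_traits(profile, as_of, min_completeness=0.05, max_age_days=90):
    """Return the columns that pass both the completeness and freshness checks."""
    keep = []
    for col, stats in profile.items():
        last_updated = stats["last_updated"]
        is_fresh = last_updated is not None and (
            as_of - last_updated
        ) <= timedelta(days=max_age_days)
        if stats["completeness"] >= min_completeness and is_fresh:
            keep.append(col)
    return sorted(keep)


# Hypothetical per-column profile, shaped like an intermediary trait snapshot.
profile = {
    "email": {"completeness": 0.50, "last_updated": datetime(2025, 7, 7)},
    "order_count": {"completeness": 0.25, "last_updated": datetime(2025, 7, 1)},
    "legacy_flag": {"completeness": 0.40, "last_updated": datetime(2024, 1, 1)},
    "plan": {"completeness": 0.0, "last_updated": None},
}
selected = select_traits(profile, as_of=datetime(2025, 7, 8))
```

With these placeholder thresholds, `legacy_flag` is dropped for being stale and `plan` for being empty, leaving `email` and `order_count`.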
108 00:13:13.250 ⇒ 00:13:15.370 Awaish Kumar: Yeah, yeah, I, okay, I understand
109 00:13:16.130 ⇒ 00:13:20.899 Awaish Kumar: the purpose of what you are trying. You are trying to figure out the meaningful columns. Basically.
110 00:13:21.280 ⇒ 00:13:21.680 Robert Tseng: Yep.
111 00:13:21.680 ⇒ 00:13:23.680 Awaish Kumar: Trying to do from this. Oh.
112 00:13:23.680 ⇒ 00:13:29.790 Robert Tseng: And maybe I’m over complicating it. Maybe we could. We could do it much simpler than this, but I just wanted to at least throw something on there.
113 00:13:31.420 ⇒ 00:13:35.070 Awaish Kumar: Okay, understood? I will work on that today and
114 00:13:35.180 ⇒ 00:13:38.920 Awaish Kumar: maybe have something tomorrow to show you, so
115 00:13:39.170 ⇒ 00:13:41.290 Awaish Kumar: when we have something we can,
116 00:13:41.390 ⇒ 00:13:44.739 Awaish Kumar: like, have further discussion on top of that.
117 00:13:45.310 ⇒ 00:14:00.599 Robert Tseng: Yeah. And I know that you’re wanting to know what exactly that model should look like. So that’s why I created a separate page of, like, a trait roadmap or enrichment tracker. You know, the short answer is, like, we don’t know exactly. It’s a bit difficult, because
118 00:14:00.760 ⇒ 00:14:05.859 Robert Tseng: Bobby is the main lifecycle person. Like, I need, like, a marketer
119 00:14:06.340 ⇒ 00:14:10.049 Robert Tseng: to tell me what’s useful. I mean, I can, like, make assumptions, and,
120 00:14:10.120 ⇒ 00:14:20.099 Robert Tseng: you know, like you said, LTV, total orders, whether they’re churned or not, next payment date, stuff like that. Sure, like, we don’t really need someone to tell us that that’s important. We can just bring those in,
121 00:14:20.375 ⇒ 00:14:40.989 Robert Tseng: and I don’t want to, you know, enrich it with too much. So I was, like, kind of thinking through, okay, well, how do we phase out, like, the different things that we know that we can do immediately based on the data that we already have? And we just have, like, you know, a few fields that we add in. And I think that would be enough to make the decision, so, like, over the next week.
122 00:14:41.000 ⇒ 00:15:07.080 Robert Tseng: And then maybe over time, as we’re working with a marketer, they’re letting us know, hey, we want to add more and more traits. And we kind of understand, like, what are the different categories of things that they could go after. And I’ve already kind of mapped that out in my head, and I’ve tried to write it down to some extent, of, like, what are the different, you know, things that we could enrich it with. Not saying that we need to capture all of this. I think really the objective for this,
123 00:15:07.250 ⇒ 00:15:12.440 Robert Tseng: you know, this sprint is really this, you know, this is the main
124 00:15:13.890 ⇒ 00:15:19.115 Robert Tseng: This is all we need to do for this sprint. Yeah.
125 00:15:19.570 ⇒ 00:15:20.050 Awaish Kumar: Okay.
126 00:15:20.050 ⇒ 00:15:21.039 Robert Tseng: Does that make sense.
127 00:15:21.770 ⇒ 00:15:22.480 Awaish Kumar: Yeah.
128 00:15:22.810 ⇒ 00:15:28.323 Robert Tseng: Okay, cool. So yeah, I think you get it. I think I think we’re on the same page.
129 00:15:29.680 ⇒ 00:15:30.670 Awaish Kumar: So that we
130 00:15:31.120 ⇒ 00:15:36.129 Awaish Kumar: I’ll work on it, and maybe, like, go further in detail in the on this document, and.
131 00:15:36.860 ⇒ 00:15:37.580 Robert Tseng: Yeah. Twist.
132 00:15:37.580 ⇒ 00:15:44.200 Awaish Kumar: We’ll have something so like when I’m I work on that like, I’ll have more clarity. And
133 00:15:44.560 ⇒ 00:15:49.400 Awaish Kumar: I’m like, when we see something. Yeah, we will have more
134 00:15:50.320 ⇒ 00:15:53.960 Awaish Kumar: more clarity on, like, how how to move forward.
135 00:15:54.560 ⇒ 00:15:55.240 Robert Tseng: Okay.
136 00:15:55.450 ⇒ 00:16:17.309 Robert Tseng: cool. Yeah, yeah, definitely. Don’t overcomplicate it. I think you understand. Thank you for making it simpler. I think that’s helpful. Yeah. And the rest, I’m spending my time in Customer.io now, trying to, like, understand how this is going to be used. So I have all these notes that I’m gonna keep spending more time on here. But yeah, I think
137 00:16:17.370 ⇒ 00:16:45.400 Robert Tseng: you get it, you understand the point. We’re just trying to get a single source of truth for, like, a users model that includes anonymous users and customers. And from that we’re just gonna push that into all of our tools from now on. And then we also want to have, like, a single enriched customer data model that we can use for Customer.io as well, and Mixpanel. So I think, yeah, I think we’re on the same page there. So it’s really just 2. Well, yeah, anyway. So I think that’s
138 00:16:45.490 ⇒ 00:16:51.332 Robert Tseng: that’s cool. I mean, sounds like, sounds like we’re we get it. So
139 00:16:52.560 ⇒ 00:16:53.780 Awaish Kumar: Okay. Yeah.
140 00:16:53.780 ⇒ 00:16:54.370 Robert Tseng: Yeah.
141 00:16:55.100 ⇒ 00:16:56.440 Robert Tseng: Any other questions.
142 00:16:56.690 ⇒ 00:16:59.459 Awaish Kumar: Is this helpful? Was this document helpful?
143 00:17:00.290 ⇒ 00:17:19.579 Awaish Kumar: Yeah, it was very helpful, but it has a lot of information, so it’s hard to digest everything. So that’s why I tried to, like, scope it down to focus on, like, today or the next 2 days, to
144 00:17:19.589 ⇒ 00:17:20.009 Robert Tseng: Yeah.
145 00:17:20.010 ⇒ 00:17:27.019 Awaish Kumar: actually do something and then, like, iterate over this document, because it has a lot of information.
146 00:17:27.740 ⇒ 00:17:39.810 Robert Tseng: Yeah, you’re right. So I think just this, what we’re calling, quote unquote, objective two is really what you’re gonna work on. And yeah, I think the numbering is kind of confusing. I might take it off because it’s
147 00:17:40.270 ⇒ 00:17:44.989 Robert Tseng: that they don’t. It’s not a progression they don’t lead to one another. I I kind of just
148 00:17:45.570 ⇒ 00:17:47.180 Robert Tseng: yeah. I’m just gonna.
149 00:17:47.830 ⇒ 00:17:49.946 Awaish Kumar: Yeah, that’s okay. I know.
150 00:17:50.370 ⇒ 00:17:53.890 Robert Tseng: Okay, cool.
151 00:17:55.610 ⇒ 00:18:04.150 Robert Tseng: Alright. Yeah. Well, that’s that. This is just me. Like, I, I just document everything as I’m doing it. So sometimes I kind of write too much. But I’d rather have more than nothing. So.
152 00:18:05.400 ⇒ 00:18:07.139 Awaish Kumar: Okay, yeah, thank you.
153 00:18:07.800 ⇒ 00:18:08.500 Robert Tseng: Okay.
154 00:18:10.300 ⇒ 00:18:15.530 Awaish Kumar: Yeah, sorry. One more thing about this payment thing for Polytomic.
155 00:18:16.040 ⇒ 00:18:18.409 Robert Tseng: Oh, yeah, I saw your message. Yeah.
156 00:18:19.150 ⇒ 00:18:29.870 Awaish Kumar: So Utam also said the same thing, that, like, if they want, we can have a contract; otherwise we can ask them to maybe set up their own card in Polytomic. So
157 00:18:29.870 ⇒ 00:18:30.239 Robert Tseng: Got it.
158 00:18:30.240 ⇒ 00:18:33.130 Awaish Kumar: they can just directly pay. Okay.
159 00:18:33.130 ⇒ 00:18:40.325 Robert Tseng: Okay, I will. I’ll let them know today. Yeah. So we don’t have to worry about that. It’s not super urgent. I think what you’re working on is more urgent. So
160 00:18:40.550 ⇒ 00:18:42.600 Robert Tseng: okay, cool. Let’s do that.
161 00:18:43.010 ⇒ 00:18:44.089 Awaish Kumar: Thank you. Bye.
162 00:18:44.360 ⇒ 00:18:45.440 Robert Tseng: All right. This.