Meeting Title: CDP Data Model Planning Sync
Date: 2025-07-08
Meeting participants: Awaish Kumar, Robert Tseng


WEBVTT

1 00:00:28.365 00:00:29.020 Awaish Kumar: Hello!

2 00:00:31.020 00:00:33.650 Robert Tseng: Hey, Awaish. Okay, let’s do it.

3 00:00:35.250 00:00:38.100 Robert Tseng: Where do you want me to start?

4 00:00:38.850 00:00:40.570 Awaish Kumar: So actually, I haven’t.

5 00:00:41.000 00:00:43.779 Awaish Kumar: I’ve been reading this document and.

6 00:00:43.780 00:00:44.460 Robert Tseng: Okay.

7 00:00:47.080 00:00:52.020 Awaish Kumar: Fair. I’m so this is the kind of table you need at the end.

8 00:00:52.410 00:00:54.120 Awaish Kumar: Is that the goal?

9 00:00:55.180 00:01:05.050 Robert Tseng: Yeah. Yes, that is it. It may not look exactly like that. But yes, that’s the idea.

10 00:01:06.810 00:01:10.439 Awaish Kumar: So there are 2 things. Number one,

11 00:01:10.550 00:01:15.500 Awaish Kumar: this kind of table is more like based on user IDs,

12 00:01:16.219 00:01:20.529 Awaish Kumar: which basically means we have some user ID for them.

13 00:01:20.860 00:01:28.050 Awaish Kumar: Either they signed up on Bask or they are customers.

14 00:01:28.180 00:01:35.259 Awaish Kumar: But are we looking to include the ones where we don’t know, like they just visited the

15 00:01:35.590 00:01:41.590 Awaish Kumar: Bask platform and, like, the Eden website. And then, thank you.

16 00:01:41.590 00:01:42.250 Robert Tseng: Yeah.

17 00:01:45.650 00:01:54.390 Robert Tseng: okay, actually, I don’t know if this would be easier if I kind of I have the doc pulled up, too. Can I? Can I share my screen? And you just tell me where you want to go?

18 00:01:54.850 00:01:55.180 Awaish Kumar: Okay.

19 00:01:56.010 00:01:57.870 Robert Tseng: Okay, I will.

20 00:02:02.830 00:02:03.963 Robert Tseng: Yeah. So

21 00:02:04.950 00:02:11.786 Robert Tseng: if you look here. So this is what Segment currently sends. If you go into BigQuery, you can go and look at these different

22 00:02:12.350 00:02:19.990 Robert Tseng: I might actually try to just pull it up right now, so I can show you query

23 00:02:24.530 00:02:25.820 Robert Tseng: a.

24 00:02:32.330 00:02:34.219 Robert Tseng: I lost all my queries.

25 00:02:35.830 00:02:39.920 Robert Tseng: Okay? Well, I guess my point with this was

26 00:02:40.900 00:02:43.699 Robert Tseng: If we go to

27 00:02:45.500 00:02:50.780 Robert Tseng: I believe that these are the raw tables. And so if I just go to id graph updates.

28 00:03:00.510 00:03:01.780 Robert Tseng: So

29 00:03:12.490 00:03:19.989 Robert Tseng: yeah, I believe that these are all, like, the random IDs that we already get from Segment.

30 00:03:20.600 00:03:24.669 Robert Tseng: So, yeah, you can look at these different tables. But like.

31 00:03:25.200 00:03:26.850 Awaish Kumar: I can’t see the screen.

32 00:03:27.790 00:03:28.380 Robert Tseng: So.

33 00:03:36.750 00:03:41.990 Awaish Kumar: Yeah, okay, so is this prod unify a separate tool, or?

34 00:03:42.990 00:03:54.299 Robert Tseng: Yeah. So prod unify is how Segment has already been syncing all of their profiles data into BigQuery. We just haven’t been using it.

35 00:03:56.210 00:04:07.219 Robert Tseng: And so I was looking through these 6 tables. These are all the raw tables of, like, random events that fire, and then Segment creates its own Segment IDs. Most of these are anonymous.

36 00:04:07.897 00:04:10.642 Robert Tseng: Then there’s like external mapping updates.

37 00:04:11.900 00:04:19.360 Awaish Kumar: Yeah, I’ve been reading through this. And, like, the CDP manages this ID management. So

38 00:04:20.439 00:04:26.269 Awaish Kumar: like, Segment gets data from every other platform, like Google Analytics,

39 00:04:26.914 00:04:41.699 Awaish Kumar: orders data, Customer.io data, whatever there is. And then, like, the cookies, emails. So it has these steps of figuring out the identity.

40 00:04:41.819 00:04:44.299 Awaish Kumar: So first of all, like, using email or

41 00:04:44.399 00:04:52.809 Awaish Kumar: phone number, then it goes looking for other things like that. And stepwise it figures out what exactly

42 00:04:53.879 00:04:57.649 Awaish Kumar: like, what rows are linked to some profile.

43 00:04:58.260 00:05:15.050 Robert Tseng: Yeah, so we don’t have to do any of that. I think it’s really just collecting. So profile traits updates, this is the super wide table where each row is a single customer. Obviously, most of them will not have values. Like you can see, this is all null. It’s kind of just

44 00:05:16.000 00:05:22.027 Robert Tseng: junk. So yeah, I think that’s why I went. And I

45 00:05:28.250 00:05:30.549 Awaish Kumar: Yeah, I see your audio.

46 00:05:31.170 00:05:51.859 Robert Tseng: Yeah, I wrote a query here that basically just mapped every column into, like, a key-value pairing, just so I could understand what all the different columns are. Then I tried to write this. I mean, it’s really bad, it didn’t even fully work, but I was trying to basically look at, okay, for every column, how many values are there?

47 00:05:53.880 00:06:12.530 Robert Tseng: yeah, that’s basically what I was trying to do. Like, what’s the percentage of completeness for each column, and when was it last updated? That’ll give me some sense of freshness, and also just, you know, completeness, I guess.
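
[Editor's sketch] The column profiling described above can be illustrated roughly as follows: unpivot each wide row into key-value pairs, then compute a completeness percentage and a last-updated timestamp per column. This is a minimal sketch, not the query from the call. The sample rows, the column names, and the `updated_at` field are all hypothetical, and the real table lives in BigQuery rather than in Python lists.

```python
from collections import defaultdict
from datetime import datetime

# Hypothetical sample of a wide profile table; the real column names
# were not given in the call.
rows = [
    {"profile_id": "p1", "updated_at": "2025-07-01T00:00:00",
     "email": "a@example.com", "order_count": 3, "plan": None},
    {"profile_id": "p2", "updated_at": "2025-07-05T00:00:00",
     "email": None, "order_count": None, "plan": None},
    {"profile_id": "p3", "updated_at": "2025-07-07T00:00:00",
     "email": "c@example.com", "order_count": None, "plan": None},
]

# Assumed bookkeeping columns that are not traits themselves.
META = {"profile_id", "updated_at"}


def unpivot(row):
    """Yield (profile_id, row timestamp, column, value) for each trait column."""
    ts = datetime.fromisoformat(row["updated_at"])
    for col, val in row.items():
        if col not in META:
            yield row["profile_id"], ts, col, val


def profile_columns(rows):
    """Return completeness (0..1) and latest non-null timestamp per column."""
    filled = defaultdict(int)
    last_seen = {}
    columns = set()
    for row in rows:
        for _pid, ts, col, val in unpivot(row):
            columns.add(col)
            if val is not None:
                filled[col] += 1
                if col not in last_seen or ts > last_seen[col]:
                    last_seen[col] = ts
    total = len(rows)
    return {
        col: {
            "completeness": filled[col] / total if total else 0.0,
            "last_updated": last_seen.get(col),
        }
        for col in sorted(columns)
    }


stats = profile_columns(rows)
for col, s in stats.items():
    print(col, round(s["completeness"], 2), s["last_updated"])
```

Columns that are null everywhere (like the hypothetical `plan`) come out with completeness 0 and no last-updated value, which is the "junk" case mentioned earlier.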

48 00:06:13.940 00:06:15.160 Robert Tseng: And

49 00:06:15.470 00:06:26.890 Robert Tseng: you know, I was expecting that to help me trim down this 350-column table into, I would expect, something like less than 30.

50 00:06:26.940 00:06:50.649 Robert Tseng: Not all of it is useful, obviously. And like you said, a lot of that stuff we already get. I know there’s a dim customers model, and that data we get straight from Bask. But I’m just saying, well, what if we shifted that? Why do we have to use Bask? You know, Segment’s using Bask as well. So why don’t we just take what Segment’s profiles already kind of give us, and we just build upon that and do our own enrichment?

51 00:06:52.860 00:06:59.420 Robert Tseng: yeah. So within Segment, like, you can look in the UI and see what

52 00:06:59.660 00:07:06.050 Robert Tseng: has been set up. Let me click on that real quick.

53 00:07:08.590 00:07:13.050 Robert Tseng: Then here.

54 00:07:14.890 00:07:15.439 Robert Tseng: So there’s

55 00:07:15.440 00:07:27.129 Robert Tseng: like, you know, these are the important traits, and this is what’s getting pushed into Mixpanel already. And I guess obviously order count gets sent into, I don’t even know. HubSpot doesn’t exist, so it’s just Customer.io.

56 00:07:28.850 00:07:40.430 Robert Tseng: Then I know that in Segment you can go and create some additional traits in here, and they can flow through. Like, they’ll show up in BigQuery, and we can use those as well. So it’s just great, like, we don’t have to

57 00:07:40.930 00:07:43.219 Robert Tseng: create every field in

58 00:07:43.772 00:07:49.210 Robert Tseng: in BigQuery. But yeah, I think just being able to, like, at least

59 00:07:49.660 00:07:53.999 Robert Tseng: manage this in the warehouse is what I’m

60 00:07:55.230 00:08:02.990 Robert Tseng: trying to get us to do. And I think we just obviously have more flexibility to get whatever we want rather than having to go through.

61 00:08:03.100 00:08:07.909 Robert Tseng: you know, this limited set of things that Segment allows us to do.

62 00:08:10.032 00:08:17.590 Awaish Kumar: So for the number one task, I understand that we want to see if we can utilize

63 00:08:18.358 00:08:21.249 Awaish Kumar: profiles already created by Segment

64 00:08:22.394 00:08:26.069 Awaish Kumar: as our dim customer, or we can maybe merge them together.

65 00:08:27.300 00:08:48.539 Robert Tseng: Yeah, I mean, for the purposes of this test, you don’t have to merge them. You can just create a separate model for now. And, you know, whether or not we use Segment, I think RudderStack, the other tool we’re evaluating, operates very similarly. So they will have their own kind of identity stitching thing. I don’t know, maybe we have to read the docs to go and figure out what that looks like. But

66 00:08:48.640 00:08:55.360 Robert Tseng: you know, RudderStack doesn’t have a UI. It is, like, warehouse native. So it will. It will.

67 00:08:55.942 00:09:03.080 Robert Tseng: Yeah, we can maybe just even look at it now. So, RudderStack identity resolution.

68 00:09:08.000 00:09:18.750 Awaish Kumar: But the major difference between these 2, Segment profiles and our customer data, is that we only rely on the customers who make some order with us.

69 00:09:18.750 00:09:22.670 Robert Tseng: Correct. Yeah, whereas Segment has everything, or has every anonymous user.

70 00:09:23.042 00:09:24.159 Awaish Kumar: Visit, maybe yeah.

71 00:09:25.180 00:09:25.790 Robert Tseng: Yeah.

72 00:09:30.480 00:09:41.980 Robert Tseng: yeah, which I’m totally fine with keeping. I mean, I do think we need that, because that’s what Mixpanel depends on. In order for Mixpanel to function like a Google Analytics, it needs to have,

73 00:09:42.190 00:09:55.139 Robert Tseng: you know, the full range of all visitors and what they’re doing on the platform, so that we can, you know, try to figure out how to convert more of those visitors that don’t end up becoming customers.

74 00:09:55.682 00:10:00.260 Robert Tseng: So yeah, our dim customers table is limited. It’s only active customers, right?

75 00:10:01.810 00:10:02.570 Robert Tseng: Yeah.

76 00:10:03.060 00:10:04.150 Robert Tseng: So

77 00:10:04.460 00:10:08.820 Robert Tseng: yeah, so in that case it’s definitely not gonna be the same model, it’s gonna be like a.

78 00:10:09.050 00:10:14.940 Robert Tseng: But yeah, whatever we whatever we need to do to like, get the

79 00:10:15.440 00:10:28.240 Robert Tseng: which, I mean, Segment already gives you some raw tables, and gives you some materialized tables as well, so I think we can build some sort of, like, intermediary, like, enriched.

80 00:10:28.540 00:10:30.840 Robert Tseng: I don’t know if it’s the customers. But

81 00:10:33.790 00:10:50.780 Robert Tseng: okay, so yeah, there’s the users table that we can just get straight from Segment. We don’t necessarily have to do any more enrichment on it. But then we need the enriched customers table, which does take active customers from, like, the

82 00:10:50.980 00:11:00.350 Robert Tseng: Segment users. And then we need to be able to add more to it than we currently have in dim customers, so that we can push that into Customer.io.

83 00:11:01.710 00:11:02.360 Awaish Kumar: Okay.

84 00:11:02.620 00:11:03.849 Robert Tseng: Yeah, with customers.

85 00:11:04.550 00:11:05.470 Robert Tseng: Yeah.

86 00:11:05.470 00:11:11.200 Awaish Kumar: So in the like, we have this profile table, right? This you know what it was called

87 00:11:12.086 00:11:16.669 Awaish Kumar: in BigQuery, in prod unify, there’s profile traits updates.

88 00:11:17.210 00:11:17.820 Awaish Kumar: So.

89 00:11:17.820 00:11:18.180 Robert Tseng: Yes.

90 00:11:18.180 00:11:22.410 Awaish Kumar: You mentioned some user table. Like, where’s that table?

91 00:11:22.410 00:11:25.920 Robert Tseng: I believe this is it.

92 00:11:25.920 00:11:27.580 Awaish Kumar: Are you talking about this one, or.

93 00:11:27.990 00:11:35.689 Robert Tseng: Yeah, I think this is, like, their big user table. This is just synced every hour. And it’s just like.

94 00:11:35.900 00:11:36.410 Awaish Kumar: Okay.

95 00:11:36.410 00:11:39.809 Robert Tseng: Yeah, I mean, 90% of these don’t have. Like, you know.

96 00:11:40.180 00:11:49.180 Awaish Kumar: Yeah. So what I understand is, we get this user table, maybe try to trim it to the few columns which are useful,

97 00:11:49.430 00:11:55.690 Awaish Kumar: and then also create another table on top of it, which basically maybe combines data from our

98 00:11:55.900 00:12:02.340 Awaish Kumar: dim customer and enriches it, basically, maybe with LTV, total orders.
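
[Editor's sketch] The two-layer idea described above (a trimmed users table, plus an enriched customers table built on top of it) might look roughly like this. It is a minimal illustration only: the field names, the sample data, and the definition of LTV as the sum of order amounts are assumptions, not details from the call.

```python
from collections import defaultdict

# Hypothetical trimmed users table: includes anonymous visitors.
users = [
    {"user_id": "u1", "anonymous_id": "anon-1", "email": "a@example.com"},
    {"user_id": "u2", "anonymous_id": "anon-2", "email": "b@example.com"},
    {"user_id": None, "anonymous_id": "anon-9", "email": None},
]

# Hypothetical orders feeding the enrichment.
orders = [
    {"user_id": "u1", "amount": 40.0},
    {"user_id": "u1", "amount": 60.0},
]


def build_enriched_customers(users, orders):
    """Keep only users with at least one order and attach order aggregates."""
    totals = defaultdict(lambda: {"total_orders": 0, "ltv": 0.0})
    for order in orders:
        agg = totals[order["user_id"]]
        agg["total_orders"] += 1
        # Assumption: LTV is approximated as the sum of order amounts.
        agg["ltv"] += order["amount"]

    enriched = []
    for user in users:
        uid = user["user_id"]
        if uid is not None and uid in totals:
            enriched.append({**user, **totals[uid]})
    return enriched


enriched_customers = build_enriched_customers(users, orders)
print(enriched_customers)
```

The users list stays untouched, so anonymous visitors remain available for tools that need the full visitor range, while the enriched table carries only customers with orders.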

99 00:12:02.340 00:12:03.150 Robert Tseng: Yes.

100 00:12:03.950 00:12:06.200 Awaish Kumar: Yeah, we have, yeah.

101 00:12:06.840 00:12:11.099 Robert Tseng: Yeah. And then this model here was just like.

102 00:12:11.565 00:12:27.000 Robert Tseng: I was thinking of a better way to try to manage, like, well, there’s 350 columns here. How do we know what’s useful? So I was hoping to transform this into an intermediary table that basically is, like, a

103 00:12:27.130 00:12:40.829 Robert Tseng: you know, a point-in-time snapshot of what traits there are, where each trait comes from, how complete it is, and just some sort of freshness test, so that,

104 00:12:41.010 00:12:46.029 Robert Tseng: you know, we know, like when you’re when you’re deciding, like what to limit

105 00:12:46.220 00:12:54.930 Robert Tseng: from this model. We have, you know, this intermediary model to help you to make that decision.

106 00:12:55.383 00:13:06.730 Robert Tseng: If freshness score is above blah blah, or completeness scores are like that, you know, we have, like, a set of criteria that helps you to narrow down, like, the

107 00:13:06.850 00:13:12.550 Robert Tseng: this, this raw model into like what you’re describing.
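
[Editor's sketch] The selection step described above, using freshness and completeness criteria to narrow the raw model, can be sketched as a simple threshold filter. No actual thresholds were stated in the call ("blah blah"), so the 30% completeness floor and 90-day staleness limit below are placeholders, and the column names are hypothetical.

```python
from datetime import datetime, timedelta

# Hypothetical per-column profile, e.g. the output of a
# completeness/freshness pass over the wide traits table.
column_stats = {
    "email": {"completeness": 0.82, "last_updated": datetime(2025, 7, 7)},
    "order_count": {"completeness": 0.35, "last_updated": datetime(2025, 7, 6)},
    "legacy_flag": {"completeness": 0.40, "last_updated": datetime(2024, 1, 15)},
    "unused_trait": {"completeness": 0.00, "last_updated": None},
}

# Placeholder criteria; the real cutoffs were left open in the call.
MIN_COMPLETENESS = 0.30
MAX_STALENESS = timedelta(days=90)


def select_columns(stats, as_of):
    """Return the columns that pass both the completeness and freshness checks."""
    keep = []
    for col, s in stats.items():
        fresh = (
            s["last_updated"] is not None
            and as_of - s["last_updated"] <= MAX_STALENESS
        )
        if s["completeness"] >= MIN_COMPLETENESS and fresh:
            keep.append(col)
    return sorted(keep)


kept = select_columns(column_stats, as_of=datetime(2025, 7, 8))
print(kept)
```

Keeping the criteria as named constants makes it easy to adjust the cutoffs once someone on the marketing side weighs in on which traits matter.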

108 00:13:13.250 00:13:15.370 Awaish Kumar: Yeah, yeah, I, okay, I understand

109 00:13:16.130 00:13:20.899 Awaish Kumar: the purpose of what you are trying to do. You are trying to figure out the meaningful columns, basically.

110 00:13:21.280 00:13:21.680 Robert Tseng: Yep.

111 00:13:21.680 00:13:23.680 Awaish Kumar: Trying to do from this. Oh.

112 00:13:23.680 00:13:29.790 Robert Tseng: And maybe I’m over complicating it. Maybe we could. We could do it much simpler than this, but I just wanted to at least throw something on there.

113 00:13:31.420 00:13:35.070 Awaish Kumar: Okay, understood? I will work on that today and

114 00:13:35.180 00:13:38.920 Awaish Kumar: maybe have something tomorrow to show you, so we can have more

115 00:13:39.170 00:13:41.290 Awaish Kumar: when we have something we can

116 00:13:41.390 00:13:44.739 Awaish Kumar: like, have a further discussion on top of that.

117 00:13:45.310 00:14:00.599 Robert Tseng: Yeah. And I know that you’re wanting to know what exactly that model should look like. So that’s why I created a separate page, like a trait roadmap or enrichment tracker. You know, the short answer is, we don’t know exactly. It’s a bit difficult, because

118 00:14:00.760 00:14:05.859 Robert Tseng: Bobby is the main lifecycle person like I, I need like a marketer

119 00:14:06.340 00:14:10.049 Robert Tseng: to tell me what’s useful. I mean, we can make assumptions, and

120 00:14:10.120 00:14:20.099 Robert Tseng: you know, like you said, LTV, total orders, whether they’re churned or not, next payment date, stuff like that. Sure, we don’t really need someone to tell us that that’s important. We can just bring those in

121 00:14:20.375 00:14:40.989 Robert Tseng: and I don’t want to, you know, enrich it with too much. But so I was like kind of thinking through. Okay, well, what are the different like? How do we phase out like different things that we know that we can do immediately based on the data that we already have. And we just have, like, you know, a few fields that we add in. And I think that would be enough to make the decision so like over the next week.

122 00:14:41.000 00:15:07.080 Robert Tseng: and then maybe over time, as we’re working with a marketer. Then they’re letting us know. Hey, we want to add, add more and more traits. And we and we kind of understand, like, what are the different categories of things that they could go after, and I’ve already kind of mapped that out in my head, and I’ve tried to write. Write it down to some extent of like. What are the different. You know things that we could enrich it with, not saying that we need to capture all of this. I think really the objective for this

123 00:15:07.250 00:15:12.440 Robert Tseng: ex. You know, this sprint is really this, you know, is just this is the main

124 00:15:13.890 00:15:19.115 Robert Tseng: This is all we need to do for this sprint. Yeah.

125 00:15:19.570 00:15:20.050 Awaish Kumar: Okay.

126 00:15:20.050 00:15:21.039 Robert Tseng: Does that make sense?

127 00:15:21.770 00:15:22.480 Awaish Kumar: Yeah.

128 00:15:22.810 00:15:28.323 Robert Tseng: Okay, cool. So yeah, I think you get it. I think I think we’re on the same page.

129 00:15:29.680 00:15:30.670 Awaish Kumar: So that we

130 00:15:31.120 00:15:36.129 Awaish Kumar: I’ll work on it, and maybe, like, go further in detail in the on this document, and.

131 00:15:36.860 00:15:37.580 Robert Tseng: Yeah. Twist.

132 00:15:37.580 00:15:44.200 Awaish Kumar: We’ll have something. So, like, when I work on that, I’ll have more clarity. And

133 00:15:44.560 00:15:49.400 Awaish Kumar: I’m like, when we see something. Yeah, we will have more

134 00:15:50.320 00:15:53.960 Awaish Kumar: more clarity on, like, how how to move forward.

135 00:15:54.560 00:15:55.240 Robert Tseng: Okay.

136 00:15:55.450 00:16:17.309 Robert Tseng: cool. Yeah, definitely. Don’t overcomplicate it. I think you understand. Thank you for making it simpler. I think that’s helpful. Yeah. And the rest, I’m spending my time in Customer.io now, trying to understand how this is going to be used. So I have all these notes, and I’m gonna keep spending more time here. But yeah, I think

137 00:16:17.370 00:16:45.400 Robert Tseng: you understand the point. We’re just trying to get a single source of truth for, like, a users model that includes anonymous users and customers. And from that we’re just gonna push that into all of our tools from now on. And then we also want to have a single enriched customer data model that we can use for Customer.io as well, and Mixpanel. So I think, yeah, we’re on the same page there. So it’s really just 2. Well, yeah, anyway. So I think that’s

138 00:16:45.490 00:16:51.332 Robert Tseng: that’s cool. I mean, sounds like, sounds like we’re we get it. So

139 00:16:52.560 00:16:53.780 Awaish Kumar: Okay. Yeah.

140 00:16:53.780 00:16:54.370 Robert Tseng: Yeah.

141 00:16:55.100 00:16:56.440 Robert Tseng: Any other questions?

142 00:16:56.690 00:16:59.459 Awaish Kumar: Is this helpful? Was this document helpful?

143 00:17:00.290 00:17:19.579 Awaish Kumar: Yeah, it was very helpful. But it has a lot of information, so it is hard to digest everything. That’s why I tried to scope it down, to focus on today or the next 2 days, to

144 00:17:19.589 00:17:20.009 Robert Tseng: Yeah.

145 00:17:20.010 00:17:27.019 Awaish Kumar: Actually, you can do something and then like iterate over this document because it has a lot of information.

146 00:17:27.740 00:17:39.810 Robert Tseng: Yeah, you’re right. So I think just this, what we’re calling quote-unquote objective 2, is really what you’re gonna work on. And yeah, I think the numbering is kind of confusing. I might take it off because

147 00:17:40.270 00:17:44.989 Robert Tseng: it’s not a progression. They don’t lead to one another. I kind of just

148 00:17:45.570 00:17:47.180 Robert Tseng: yeah. I’m just gonna.

149 00:17:47.830 00:17:49.946 Awaish Kumar: Yeah, that’s okay. I know.

150 00:17:50.370 00:17:53.890 Robert Tseng: Okay, cool.

151 00:17:55.610 00:18:04.150 Robert Tseng: Alright. Yeah. Well, that’s that. This is just me. Like, I, I just document everything as I’m doing it. So sometimes I kind of write too much. But I’d rather have more than nothing. So.

152 00:18:05.400 00:18:07.139 Awaish Kumar: Okay, yeah, thank you.

153 00:18:07.800 00:18:08.500 Robert Tseng: Okay.

154 00:18:10.300 00:18:15.530 Awaish Kumar: I know. Yeah. Sorry. One more thing about this payment thing for Polytomic.

155 00:18:16.040 00:18:18.409 Robert Tseng: Oh, yeah, I saw your message. Yeah.

156 00:18:19.150 00:18:29.870 Awaish Kumar: So Utam also said the same thing: if they want, we can have a contract; otherwise we can ask them to maybe set up their own card in Polytomic. So

157 00:18:29.870 00:18:30.239 Robert Tseng: Got it.

158 00:18:30.240 00:18:33.130 Awaish Kumar: It is so they can just directly pay. Okay.

159 00:18:33.130 00:18:40.325 Robert Tseng: Okay, I will. I’ll let them know today. Yeah. So we don’t have to worry about that. It’s not super urgent. I think what you’re working on is more urgent. So

160 00:18:40.550 00:18:42.600 Robert Tseng: okay, cool. Let’s do that.

161 00:18:43.010 00:18:44.089 Awaish Kumar: Thank you. Bye.

162 00:18:44.360 00:18:45.440 Robert Tseng: All right. This.