Meeting Title: Brainforge CDP Implementation Intro
Meeting Date: 2025-07-11
Meeting participants: Awaish Kumar, Henry Zhao
WEBVTT
1 00:00:38.570 ⇒ 00:00:40.120 Henry Zhao: Hello! How are you doing.
2 00:00:48.230 ⇒ 00:00:49.639 Awaish Kumar: We’re studying. Hi!
3 00:00:50.060 ⇒ 00:00:51.120 Awaish Kumar: How are you doing.
4 00:00:51.960 ⇒ 00:00:53.010 Henry Zhao: I’m good. How are you?
5 00:00:54.230 ⇒ 00:00:55.999 Awaish Kumar: I’m good. How about you?
6 00:00:56.600 ⇒ 00:00:58.020 Henry Zhao: Good thanks.
7 00:00:58.590 ⇒ 00:01:04.190 Henry Zhao: So I just started yesterday, so I thought I would like to meet you, and just kind of get a little bit of understanding of what you work with.
8 00:01:05.141 ⇒ 00:01:07.249 Henry Zhao: So I can figure out, you know
9 00:01:07.490 ⇒ 00:01:11.509 Henry Zhao: where, when I might be able to use you and ask you questions about certain things.
10 00:01:13.660 ⇒ 00:01:14.380 Awaish Kumar: Okay.
11 00:01:14.760 ⇒ 00:01:17.740 Henry Zhao: Do you want to give an introduction first? I guess I’ll start with my introduction.
12 00:01:17.940 ⇒ 00:01:24.980 Henry Zhao: Basically, I’ve been brought on to help with the CDP implementation.
13 00:01:25.509 ⇒ 00:01:31.300 Henry Zhao: So right now, right, we use Segment. I don’t know if you’ve worked with Segment, but I guess I’ll wait for your introduction to kind of get into that.
14 00:01:33.160 ⇒ 00:01:40.629 Awaish Kumar: Oh, yeah, my name is Awaish. I have been working
15 00:01:40.800 ⇒ 00:01:49.120 Awaish Kumar: as a lead data engineer in the past few years. And here I’m kind of managing people and also
16 00:01:50.040 ⇒ 00:01:52.400 Awaish Kumar: performing data engineering work.
17 00:01:52.984 ⇒ 00:01:57.979 Awaish Kumar: Yeah, like on the Eden side, mostly I’m managing data engineering
18 00:01:58.300 ⇒ 00:02:02.979 Awaish Kumar: work. And I’m supporting the CDP work that Robert was doing.
19 00:02:03.200 ⇒ 00:02:06.300 Awaish Kumar: So like, support means if there’s any
20 00:02:06.450 ⇒ 00:02:10.780 Awaish Kumar: modeling requirements, like if for your investigation you need any models, any
21 00:02:11.190 ⇒ 00:02:16.990 Awaish Kumar: kind of data ingestions or things like that. So I have been helping with that.
22 00:02:18.410 ⇒ 00:02:24.820 Henry Zhao: Okay, so were you the one that set up the webhooks, or helped set up Segment? Or was that somebody else?
23 00:02:26.570 ⇒ 00:02:34.710 Awaish Kumar: Yeah, setting up Segment was not part of my responsibilities. That was done by someone on the Eden side.
24 00:02:34.890 ⇒ 00:02:41.740 Awaish Kumar: We are just reading data which is coming from Segment into BigQuery
25 00:02:43.640 ⇒ 00:02:46.110 Awaish Kumar: and performing modeling on top of it, like
26 00:02:46.702 ⇒ 00:02:52.450 Awaish Kumar: whatever data comes through Segment connectors and also on
27 00:02:52.970 ⇒ 00:03:00.210 Awaish Kumar: yeah, that’s mostly it, like we are mostly working on whatever is coming out of Segment
28 00:03:00.330 ⇒ 00:03:02.290 Awaish Kumar: and then building the models.
29 00:03:02.460 ⇒ 00:03:18.949 Awaish Kumar: But then some of the people in the Eden team are doing reverse ETL. So they are pulling data from BigQuery, which may be from our models, or something like that, back to Customer.io or different
30 00:03:19.610 ⇒ 00:03:22.090 Awaish Kumar: platforms. But like that’s not
31 00:03:22.766 ⇒ 00:03:32.039 Awaish Kumar: what I do normally. We are not managing that. There are different people in Eden who are doing that.
32 00:03:33.795 ⇒ 00:03:38.669 Henry Zhao: Okay, so what part are you currently managing, or continuing to manage?
33 00:03:39.880 ⇒ 00:03:48.979 Awaish Kumar: No, no, that’s what I’m saying. Segment is being set up by someone in Eden engineering, right?
34 00:03:49.470 ⇒ 00:03:52.590 Awaish Kumar: What we are doing is getting the data from Segment,
35 00:03:53.190 ⇒ 00:04:07.609 Awaish Kumar: right? And then any type of data analytics work on top of that is being done by the Brainforge team, which includes me, Damilade, Robert, and Annie.
36 00:04:08.760 ⇒ 00:04:09.510 Awaish Kumar: So I’m.
37 00:04:09.510 ⇒ 00:04:09.830 Henry Zhao: You know.
38 00:04:09.830 ⇒ 00:04:14.890 Awaish Kumar: Data analytics and data engineering person. So we have some data
39 00:04:15.040 ⇒ 00:04:18.099 Awaish Kumar: where we are not using Segment. So we have some
40 00:04:18.625 ⇒ 00:04:28.179 Awaish Kumar: different kind of Zendesk or different connectors which are not set up in Segment. So we get data using different tools,
41 00:04:28.726 ⇒ 00:04:41.159 Awaish Kumar: and then ingest the data to BigQuery and then model it, and leave the models in a format that the other teams, like engineering, marketing, and the Eden team, can use.
42 00:04:41.730 ⇒ 00:04:55.559 Awaish Kumar: But then, for example, if somebody is using something for reverse ETL, that’s again someone in the Eden team. Brainforge is not managing that part right now.
43 00:04:57.670 ⇒ 00:05:00.890 Henry Zhao: Okay, so what languages do you work with?
44 00:05:02.760 ⇒ 00:05:04.270 Awaish Kumar: Mostly we are doing
45 00:05:04.450 ⇒ 00:05:13.969 Awaish Kumar: so mostly, all the ingestion and the modeling work we are doing is in dbt. And to access some of the data and
46 00:05:14.940 ⇒ 00:05:21.720 Awaish Kumar: reading data from Google Sheets and things like that, I use some Python.
47 00:05:21.960 ⇒ 00:05:26.209 Awaish Kumar: So I’m using Dagster as a tool for our Python pipelines.
48 00:05:26.430 ⇒ 00:05:29.790 Awaish Kumar: And we have been using dbt and SQL
49 00:05:29.920 ⇒ 00:05:32.349 Awaish Kumar: for all the data transformation work.
50 00:05:33.830 ⇒ 00:05:39.549 Henry Zhao: Okay, very interesting. I think those are really the only questions I had. I guess if I have any questions I’ll
51 00:05:39.710 ⇒ 00:05:44.370 Henry Zhao: I’ll reach out to you. But in the meantime, is there anything that you recommend that I should look at on my onboarding.
52 00:05:44.370 ⇒ 00:05:50.569 Awaish Kumar: So yeah, for the CDP work, I think Robert must have shared a document with you.
53 00:05:50.730 ⇒ 00:05:58.780 Awaish Kumar: Like, he has built a very long document of what we
54 00:06:00.000 ⇒ 00:06:08.110 Awaish Kumar: need to investigate on the CDP work. He has shared this Notion doc, and in that there are different objectives he has set.
55 00:06:08.270 ⇒ 00:06:23.395 Awaish Kumar: So one of the objectives is to get the user profiles from Segment and join them with the existing warehouse customer data, which is coming from our
56 00:06:24.380 ⇒ 00:06:30.640 Awaish Kumar: sales platform, like boss. And so we have 2 different types of customer table:
57 00:06:30.940 ⇒ 00:06:46.260 Awaish Kumar: one, the user profiles coming from Segment, which is based on all the different platforms which are connected to Segment. Maybe some anonymous users, some users who never made a purchase, but they still visited boss, or something like that.
58 00:06:46.864 ⇒ 00:06:52.509 Awaish Kumar: Visited like the Eden website. And then on the other side, we have internal
59 00:06:53.778 ⇒ 00:06:58.371 Awaish Kumar: like, we have built the customer table. It’s called dim customer, in the like
60 00:06:59.490 ⇒ 00:07:13.629 Awaish Kumar: for the Brainforge team. So we have built another dim customer table, but it is only dependent on the users who are real customers. Like, we don’t have anyone who just visited the website and never made a purchase,
61 00:07:13.780 ⇒ 00:07:29.580 Awaish Kumar: things like that. So now, like, we have the customer table, and I have been building a user profiles table. Now for the user profile table, we went through an exercise. Like, everything I’m telling you is mentioned in Robert’s
62 00:07:30.005 ⇒ 00:07:46.579 Awaish Kumar: document, I’m just summarizing it. In that document there’s an objective to create an audit table. So the user profile table is called profile traits updates in BigQuery. It is coming from Segment. It has more than like 300 columns,
63 00:07:47.250 ⇒ 00:07:53.709 Awaish Kumar: and each column we call a trait of a user, like an email address or
64 00:07:54.420 ⇒ 00:07:58.300 Awaish Kumar: some mobile number or some other information, like
65 00:07:58.410 ⇒ 00:08:27.299 Awaish Kumar: there’s UTM source or things like that. So there are different traits for a user which basically can help us build a user profile. So there are more than 300 columns. We have created a way to identify some meaningful traits which are useful to have in user profiles. Because if we have all those 300, it’s really very hard to get any real information out of that.
66 00:08:27.980 ⇒ 00:08:36.620 Awaish Kumar: So what we have is a strategy. So first of all, we try to measure those
67 00:08:37.970 ⇒ 00:08:51.839 Awaish Kumar: like, we find a way to calculate some metrics based on those traits. So there is a table in BigQuery. It is called int user traits, int for intermediate
68 00:08:52.060 ⇒ 00:09:04.809 Awaish Kumar: and user traits. So there’s one table which basically has all these traits. And then for each trait, we are checking different metrics like completeness
69 00:09:05.110 ⇒ 00:09:20.480 Awaish Kumar: and the null percentages and the variance, if there is a variance in the data, and things like that. Then if it is being updated regularly or not. So kind of a list of metrics to figure out
70 00:09:20.580 ⇒ 00:09:23.909 Awaish Kumar: some meaningful traits. So
71 00:09:24.418 ⇒ 00:09:36.161 Awaish Kumar: I have calculated all those metrics for those 300 traits. And then I use a filter query, which you can also find in GitHub,
72 00:09:37.050 ⇒ 00:10:00.859 Awaish Kumar: in the code base. It’s a dbt project, so the SQL file we call a model. So that model is again int meaningful traits. In that I have a query, and in that I’m just using 4 filters to identify which traits are meaningful. And I get around 30 to 40
73 00:10:00.970 ⇒ 00:10:05.201 Awaish Kumar: columns only, which really have some non-null,
74 00:10:06.160 ⇒ 00:10:10.954 Awaish Kumar: varied data with larger variance, and
75 00:10:13.047 ⇒ 00:10:18.480 Awaish Kumar: some distinct values for multiple users. So which gives us some
76 00:10:18.690 ⇒ 00:10:44.839 Awaish Kumar: indication that these can be useful. Then I qualify them as a part of the user profile table. So then again, there’s one more table, and that’s called user profiles. And in that user profiles table, if you go and look, it’s one row per user. And then again, we get the maximum,
77 00:10:45.900 ⇒ 00:10:49.700 Awaish Kumar: whatever is the maximum value for that user
78 00:10:50.328 ⇒ 00:10:56.760 Awaish Kumar: for some trait. It’s kind of a sub-table of that bigger Segment table.
79 00:10:57.090 ⇒ 00:11:09.679 Awaish Kumar: So out of 300, now we have a table with only 40 columns, which we identified as meaningful. And also, there’s only one row per user. So we are not having multiple
80 00:11:10.548 ⇒ 00:11:15.139 Awaish Kumar: rows per user, because the original table will have multiple rows per user as well.
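[Illustrative sketch, not spoken in the meeting.] The trait audit Awaish describes (per-trait metrics, a filter for meaningful traits, then one row per user holding the maximum value per trait) might look roughly like this in Python. The sample rows, trait names, thresholds, and function names are all hypothetical. The call does not say what the four filters in the dbt model are, so three stand-in checks are used here, with a count of distinct values as a simple proxy for variance.

```python
from datetime import date

# Hypothetical rows standing in for the wide Segment traits table:
# several rows per user, one key per trait. All names are made up.
ROWS = [
    {"user_id": "u1", "seen": date(2025, 7, 1), "email": "a@x.com", "utm_source": "google", "legacy_flag": None},
    {"user_id": "u1", "seen": date(2025, 7, 8), "email": "a@x.com", "utm_source": "meta", "legacy_flag": None},
    {"user_id": "u2", "seen": date(2025, 7, 9), "email": "b@x.com", "utm_source": None, "legacy_flag": None},
    {"user_id": "u3", "seen": date(2024, 1, 5), "email": None, "utm_source": "google", "legacy_flag": "y"},
]
TRAITS = ["email", "utm_source", "legacy_flag"]


def trait_metrics(rows, traits):
    """Per trait: completeness, distinct non-null values, last date a value was seen."""
    metrics = {}
    for trait in traits:
        filled = [r for r in rows if r[trait] is not None]
        metrics[trait] = {
            "completeness": len(filled) / len(rows),
            "distinct": len({r[trait] for r in filled}),
            "last_seen": max((r["seen"] for r in filled), default=None),
        }
    return metrics


def meaningful_traits(metrics, min_completeness, min_distinct, stale_before):
    """Keep traits that pass every threshold (the thresholds are assumptions)."""
    return [
        trait
        for trait, m in metrics.items()
        if m["completeness"] >= min_completeness
        and m["distinct"] >= min_distinct
        and m["last_seen"] is not None
        and m["last_seen"] >= stale_before
    ]


def one_row_per_user(rows, traits):
    """Collapse to one row per user, keeping the maximum non-null value per trait."""
    profiles = {}
    for r in rows:
        profile = profiles.setdefault(r["user_id"], {t: None for t in traits})
        for t in traits:
            if r[t] is not None and (profile[t] is None or r[t] > profile[t]):
                profile[t] = r[t]
    return profiles


metrics = trait_metrics(ROWS, TRAITS)
kept = meaningful_traits(metrics, min_completeness=0.5, min_distinct=2, stale_before=date(2025, 1, 1))
profiles = one_row_per_user(ROWS, kept)
```

With this sample, `legacy_flag` is dropped because it is mostly null, and `profiles` ends up with one entry per user containing only the kept traits. In the real project this logic lives in dbt SQL models, so the Python here only mirrors the idea.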
81 00:11:16.040 ⇒ 00:11:25.280 Awaish Kumar: And then the next task I’m working on is not finished yet, but I’m targeting it to be done in this week,
82 00:11:25.752 ⇒ 00:11:34.320 Awaish Kumar: like today, at the end of day. So we’ll have one more table. It will be called the customer enriched model.
83 00:11:34.450 ⇒ 00:11:43.520 Awaish Kumar: So, as I mentioned, we have the user profile table. And then we have a dim customer table which is already there in BigQuery.
84 00:11:43.830 ⇒ 00:11:45.820 Henry Zhao: Where is the table?
85 00:11:46.270 ⇒ 00:11:50.560 Awaish Kumar: Dim customer. Do you have access to BigQuery, the Eden warehouse?
86 00:11:51.140 ⇒ 00:11:56.126 Henry Zhao: Yeah. Which folder, like which repository is it in?
87 00:11:57.051 ⇒ 00:12:03.020 Awaish Kumar: So in GitHub it is in the analytics
88 00:12:04.542 ⇒ 00:12:09.369 Awaish Kumar: It’s not in the Brainforge AI organization.
89 00:12:09.950 ⇒ 00:12:15.120 Awaish Kumar: It is in the Eden organization and it’s called the Analytics
90 00:12:15.220 ⇒ 00:12:19.390 Awaish Kumar: repository. If you don’t have access, maybe ask Robert.
91 00:12:19.890 ⇒ 00:12:23.649 Henry Zhao: But I mean in BigQuery. In BigQuery, which repository is it in?
92 00:12:24.990 ⇒ 00:12:30.680 Awaish Kumar: BigQuery, like, there’s no repository in BigQuery. We have projects, schemas, and tables.
93 00:12:30.680 ⇒ 00:12:33.359 Henry Zhao: Yeah, sorry, I meant which schema. Which schema is it in?
94 00:12:34.160 ⇒ 00:12:38.729 Awaish Kumar: It’s in prod dbt marts.
95 00:12:39.790 ⇒ 00:12:43.280 Henry Zhao: Prod dbt? I don’t have that one. Maybe that’s why.
96 00:12:43.560 ⇒ 00:12:45.660 Awaish Kumar: Sorry. Can you share your screen? I can.
97 00:12:46.050 ⇒ 00:12:46.540 Henry Zhao: Yeah.
98 00:12:46.540 ⇒ 00:12:47.390 Awaish Kumar: Sure.
99 00:12:56.335 ⇒ 00:13:00.610 Awaish Kumar: If you search dim customers on top, we can.
100 00:13:04.750 ⇒ 00:13:10.030 Awaish Kumar: Yeah, this one, prod dbt marts and dim customers.
101 00:13:11.030 ⇒ 00:13:13.420 Henry Zhao: Okay. If I start, maybe it’ll show up.
102 00:13:16.620 ⇒ 00:13:18.440 Awaish Kumar: Yeah, but the.
103 00:13:20.620 ⇒ 00:13:24.809 Henry Zhao: Okay. So here we have the dim customers. And then you were also talking about.
104 00:13:24.810 ⇒ 00:13:26.140 Awaish Kumar: And that’s not.
105 00:13:26.647 ⇒ 00:13:30.020 Awaish Kumar: Yeah, that’s that should be here as well. User profiles.
106 00:13:31.960 ⇒ 00:13:34.140 Awaish Kumar: You can search for it in the top.
107 00:13:37.290 ⇒ 00:13:38.440 Henry Zhao: It’s right here. Okay.
108 00:13:40.120 ⇒ 00:13:51.840 Awaish Kumar: So, yeah, now I will be working on creating a third table, which will basically join these 2. But it will not have everything from dim customer,
109 00:13:51.990 ⇒ 00:14:04.829 Awaish Kumar: but it will have everything from user profile plus a few metrics from dim customer. And maybe if there are some metrics which might not even be available in dim customer, like maybe
110 00:14:05.080 ⇒ 00:14:11.719 Awaish Kumar: total order value or total revenue, things like that, lifetime value,
111 00:14:12.400 ⇒ 00:14:18.479 Awaish Kumar: so I will just calculate them on the fly, and we’ll make it a
112 00:14:19.314 ⇒ 00:14:22.229 Awaish Kumar: part of maybe dim customer and then
113 00:14:22.740 ⇒ 00:14:27.470 Awaish Kumar: build a final table called customer enriched model that will
114 00:14:27.620 ⇒ 00:14:33.539 Awaish Kumar: have everything from user profile plus a few more
115 00:14:33.720 ⇒ 00:14:36.700 Awaish Kumar: traits from dim customer. And
116 00:14:37.560 ⇒ 00:14:46.990 Awaish Kumar: that’s what the objective is in this exercise. After that, like we can maybe, like
117 00:14:47.220 ⇒ 00:14:52.879 Awaish Kumar: Robert wanted to use this table to basically
118 00:14:53.070 ⇒ 00:14:56.960 Awaish Kumar: push it to Customer.io and use it for all
119 00:14:57.430 ⇒ 00:15:03.609 Awaish Kumar: the better campaigning, I think, like better segmenting of customers and campaigning.
120 00:15:03.970 ⇒ 00:15:10.850 Awaish Kumar: That was the objective. So I’m helping him to find out those customers, and he’s going to push
121 00:15:11.060 ⇒ 00:15:13.289 Awaish Kumar: that to Customer.io and work on that.
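[Illustrative sketch, not spoken in the meeting.] The planned customer enriched table, as described above, keeps every user profile and adds a few customer metrics, computing totals on the fly where the customer table lacks them. A rough Python rendering under stated assumptions: the join key (email), the orders input, and every field and function name below are hypothetical, since the call does not name them.

```python
# Hypothetical inputs. The join key (email) and all field names are assumptions.
USER_PROFILES = [
    {"user_id": "u1", "email": "a@x.com", "utm_source": "meta"},
    {"user_id": "u2", "email": "b@x.com", "utm_source": None},
]
DIM_CUSTOMER = [
    {"customer_id": 10, "email": "a@x.com", "first_order_date": "2025-03-02"},
]
ORDERS = [
    {"customer_id": 10, "amount": 120.0},
    {"customer_id": 10, "amount": 80.0},
]


def order_totals(orders):
    """Total order count and revenue per customer, calculated on the fly."""
    totals = {}
    for o in orders:
        t = totals.setdefault(o["customer_id"], {"total_orders": 0, "total_revenue": 0.0})
        t["total_orders"] += 1
        t["total_revenue"] += o["amount"]
    return totals


def customer_enriched(profiles, customers, orders):
    """Left join: every profile is kept; customer metrics are added when a match exists."""
    totals = order_totals(orders)
    by_email = {c["email"]: c for c in customers}
    enriched = []
    for p in profiles:
        c = by_email.get(p["email"])
        t = totals.get(c["customer_id"], {}) if c else {}
        enriched.append({
            **p,
            "is_customer": c is not None,
            "first_order_date": c["first_order_date"] if c else None,
            "total_orders": t.get("total_orders", 0),
            "total_revenue": t.get("total_revenue", 0.0),
        })
    return enriched


ENRICHED = customer_enriched(USER_PROFILES, DIM_CUSTOMER, ORDERS)
```

The left join is the point of the design: profiles with no purchase (anonymous or never-converted visitors) stay in the table with zeroed metrics, which is what makes the result usable for segmentation rather than limited to existing customers.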
122 00:15:14.020 ⇒ 00:15:19.420 Awaish Kumar: I’m not sure, like, now what is your role in it? Like, you know,
123 00:15:19.530 ⇒ 00:15:21.670 Awaish Kumar: you must have discussed with Robert like.
124 00:15:24.310 ⇒ 00:15:25.236 Henry Zhao: Yeah, yeah.
125 00:15:27.160 ⇒ 00:15:34.000 Henry Zhao: Yeah. So the first task for me is just to evaluate the CDP, if we want to stay with Segment or maybe move to RudderStack.
126 00:15:35.760 ⇒ 00:15:43.050 Henry Zhao: Okay, well, that’s great. That’s really all I had to talk about today.
127 00:15:43.820 ⇒ 00:15:48.390 Henry Zhao: Thank you for the good intro. And then if I ever need any help, I’ll probably reach out to you or Robert,
128 00:15:48.823 ⇒ 00:15:49.639 Henry Zhao: but thank you so much.
129 00:15:49.640 ⇒ 00:15:50.889 Awaish Kumar: Thank you. Yeah.
130 00:15:51.420 ⇒ 00:15:53.739 Awaish Kumar: Okay, have a good day. You, too. Bye?