Meeting Title: Default Data Pipeline Strategy Discussion Date: 2025-07-18 Meeting participants: Uttam Kumaran, read.ai meeting notes, Justin Wong, Victor Papyshev


WEBVTT

1 00:00:25.060 00:00:27.110 Uttam Kumaran: Hey, guys? Sorry about the delay.

2 00:00:27.240 00:00:29.540 Uttam Kumaran: Just another call kept going.

3 00:00:32.110 00:00:33.429 Justin Wong: Hey? No worries.

4 00:00:33.620 00:00:35.200 Uttam Kumaran: Hey, nice to meet you, Justin.

5 00:00:35.200 00:00:35.730 Victor Papyshev: No problem.

6 00:00:35.730 00:00:37.220 Justin Wong: Yeah. Good to meet you, too.

7 00:00:40.650 00:00:41.010 Uttam Kumaran: Ball.

8 00:00:41.010 00:00:41.780 Victor Papyshev: How’s it going.

9 00:00:42.200 00:00:47.050 Uttam Kumaran: Good. Good! I’m excited for the weekend. I don’t have any plans, so

10 00:00:47.510 00:00:51.170 Uttam Kumaran: I’m gonna try to do some. I wanna

11 00:00:51.630 00:00:54.710 Uttam Kumaran: do some Claude Code stuff tomorrow.

12 00:00:54.980 00:01:02.310 Uttam Kumaran: So that’ll be my Saturday general session, doing some work for myself. So yeah.

13 00:01:03.100 00:01:06.900 Uttam Kumaran: I don’t get to do a lot of AI stuff for fun these days.

14 00:01:07.439 00:01:14.150 Uttam Kumaran: It’s like, kinda so I I this weekend, I feel like I have a little bit of time to try some stuff out. So.

15 00:01:15.590 00:01:16.350 Victor Papyshev: Awesome.

16 00:01:16.510 00:01:16.930 Uttam Kumaran: Yeah.

17 00:01:17.560 00:01:22.575 Victor Papyshev: Makes sense. I’m sure my brain will be buzzing with whatever we chat about today.

18 00:01:23.310 00:01:26.580 Victor Papyshev: I’ll probably be taking that into the weekend myself, but great.

19 00:01:28.680 00:01:46.509 Victor Papyshev: Sweet. Well, yeah, I’m excited to jump back in. I mean, I guess, for context, you just met Justin here. He’s one of our newest engineers to join the team. And so I’ve been working closely with him, sort of just as, you know, a sounding board and a partner on this data.

20 00:01:47.387 00:01:52.430 Victor Papyshev: data pipelining and data modeling exploration stuff. So he’s fully caught up.

21 00:01:53.125 00:02:12.597 Victor Papyshev: Probably as in the loop as me at this point on sort of where we’re at. I gave Justin a little bit of a rundown of what we covered yesterday. I think the way yesterday broke down is probably two-thirds kind of product-facing discussion, and then, like, one-third kind of getting into the weeds a little bit more. I did a walkthrough of

22 00:02:13.248 00:02:35.940 Victor Papyshev: pipeline. Some specific questions around identity resolution, and, like, the deal object, stuff like that. So, some of our takes, there’s questions. Justin did make a list of some questions he could come up with just from our conversations on our end. So we can kind of run through those. But I think maybe we can start by just segueing

23 00:02:35.950 00:02:50.279 Victor Papyshev: kind of off of yesterday into today, if there’s anything specific. I did invite, I did provision bringforge@default.com and invited you to those cloud service providers. So you should be able to just poke around in Confluent, Redpanda, and ClickHouse

24 00:02:50.280 00:02:50.730 Uttam Kumaran: Okay. Great.

25 00:02:50.940 00:02:59.439 Victor Papyshev: As needed. But yeah, that’s kind of the, I guess that’s the TL;DR. We’ve got a bit of a short list of questions, but we can kind of just dive right in, I think.

26 00:02:59.630 00:03:05.370 Uttam Kumaran: Yeah, let’s go for it. Yeah, we spoke with. I spoke with Ryan earlier today, too, and we kicked off some of the product analytics.

27 00:03:05.811 00:03:07.618 Uttam Kumaran: ICP work on his side. So.

28 00:03:08.409 00:03:17.990 Uttam Kumaran: yeah, I was just doing some research after a call on, like, Redpanda versus Confluent. Kind of like thinking a lot about sort of the, what you were mentioning about

29 00:03:18.230 00:03:22.159 Uttam Kumaran: stuff that needs to be stateless and updates and things like that. So

30 00:03:24.150 00:03:26.599 Uttam Kumaran: yeah, we can go from from wherever.

31 00:03:27.500 00:03:47.289 Victor Papyshev: Cool. Well, maybe a natural segue in is the top kind of question on our short list: like, do we need deals/opportunities as a core, as a default core object, maybe, at least in the same sense as, like, person and company? Or, I think, just quick recap: person and company have, like, pretty obvious

32 00:03:47.290 00:04:06.630 Victor Papyshev: call, like, quote-unquote natural identifiers that a lot of other data sources subscribe to, like enrichment providers all subscribe to email and domain, essentially. Maybe they have some alternate ways of sort of querying or hitting their APIs, like LinkedIn URL and stuff at times, but largely, I think, like

33 00:04:06.690 00:04:11.699 Victor Papyshev: email and company, like email and domain are just both widely accepted and like, it’s pretty typical.

34 00:04:12.075 00:04:38.119 Victor Papyshev: So I think that kind of led us yesterday into sort of, how do we even treat deals? Because deals are more of, like, an internal object, almost, that kind of feels like it should be anchored somewhere, and it’s, like, kind of relevant to you, and, like, has, like, a many-to-one relationship with respect to accounts and things of that nature. So we have shot a few things around. Maybe, Justin, you can kind of jump in on, based on what

35 00:04:38.290 00:04:44.059 Victor Papyshev: I shared, and, like, kind of your take on that now. We can kind of jam on that to get started.

36 00:04:44.990 00:04:47.640 Justin Wong: Yeah, I mean, as far as like the

37 00:04:47.740 00:04:54.790 Justin Wong: deal object is concerned, the way that I had started thinking about it was like

38 00:04:55.620 00:05:07.809 Justin Wong: people and companies are definitely objects where they can have information come from multiple sources, but ultimately they’ll all sort of tie to the same

39 00:05:07.940 00:05:08.740 Justin Wong: like

40 00:05:09.180 00:05:15.910 Justin Wong: core concept or core object, and it makes sense for us to have those as default core objects that then pull

41 00:05:16.500 00:05:26.400 Justin Wong: data in from those various sources, and then it sort of lives there, but deals and opportunities kind of strike me as a different thing where

42 00:05:26.500 00:05:37.530 Justin Wong: they don’t really exist too much as like a notion outside of probably the Crm that they

43 00:05:38.710 00:05:41.589 Justin Wong: originate from. And so

44 00:05:41.990 00:05:50.199 Justin Wong: would we need to have that as a separate object that we are effectively enhancing by having it as a default core object, because a lot of the time, like.

45 00:05:51.000 00:06:19.389 Justin Wong: yeah, you might have meeting notes or something that you might want to add to a deal, but I think a lot of customers probably are already doing that in their Crm, so it’s not something that we would have to augment, and usually any other contextual information that they add is going to be stuff that’s like related to either the company overall, or maybe one or 2 people on that deal, and so that we already have the capacity to.

46 00:06:19.890 00:06:33.500 Justin Wong: you know, modify or enhance those objects with that contextual information. But it’s very, to Victor’s point, it’s very hard to sort of do identity resolution for a deal, because, like.

47 00:06:33.620 00:06:42.190 Justin Wong: it’s basically just defined by its relationships and certain attributes. Like, maybe you want to find deals based on products or something, or SKU. But

48 00:06:42.470 00:06:44.160 Justin Wong: you know, even then, like

49 00:06:44.710 00:06:53.850 Justin Wong: the unique identifier is usually just some sort of, like, UUID or something that the CRM comes up with. Right? So, considering all of those things.

50 00:06:54.560 00:07:04.020 Justin Wong: can you see a use case for it being its own core object, the way that we think about, like, a default person or a default company?

51 00:07:04.950 00:07:17.460 Uttam Kumaran: Yeah, we were discussing this yesterday. Like, what I kind of ended up at is: you’re not typically, like, in a sales motion, you’re not typically attributing.

52 00:07:17.710 00:07:19.810 Uttam Kumaran: It’s not really clear how to attribute

53 00:07:20.140 00:07:24.699 Uttam Kumaran: company or person related events to an opportunity.

54 00:07:25.194 00:07:44.649 Uttam Kumaran: Like, an opportunity is a time-boxed deal, but it’s not very obvious, unless you only have one opportunity for one thing, that activities for the people or companies associated with that opportunity also inherit those events. Also, you know, another, like, sort of hands-off way to do this is.

55 00:07:45.080 00:07:52.500 Uttam Kumaran: you have the opportunity, but the opportunity. The source of truth has to be one of those systems

56 00:07:53.020 00:08:09.630 Uttam Kumaran: right. But also, like, those inherently, the opportunities are associated with a person or company, and then events are associated with those, so that in an opportunity view or something, you can see the rep-related events. But they’re not, like, tied one-to-one to an opportunity.
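The association model Uttam is describing — events attach to people and companies, and an opportunity view surfaces related activity through those associations rather than through one-to-one attribution — could be sketched roughly like this (all names and shapes here are illustrative, not the actual schema):

```python
# Illustrative sketch only: events attach to people/companies, and an
# opportunity "inherits" activity through its associations rather than
# having events attributed to it directly. All field names are hypothetical.

events = [
    {"id": "e1", "type": "email_sent", "person": "a@acme.com", "company": "acme.com"},
    {"id": "e2", "type": "form_fill",  "person": "b@acme.com", "company": "acme.com"},
    {"id": "e3", "type": "sales_call", "person": "c@other.io", "company": "other.io"},
]

opportunity = {
    "id": "006xx0000012345",            # the CRM-assigned ID is the only identifier
    "company": "acme.com",
    "people": {"a@acme.com", "b@acme.com"},
}

def related_events(opp, events):
    """Events for the opportunity's associated people or company."""
    return [
        e for e in events
        if e["person"] in opp["people"] or e["company"] == opp["company"]
    ]

print([e["id"] for e in related_events(opportunity, events)])  # ['e1', 'e2']
```

The point of the sketch: nothing on the event ever references the opportunity; the opportunity view is just a join through the shared person/company associations.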

57 00:08:09.750 00:08:13.770 Uttam Kumaran: Second, like I don’t. I think, for the most part, like

58 00:08:14.080 00:08:20.509 Uttam Kumaran: people, will be creating opportunities, but I doubt that there will be any enrichment on a deal like I don’t think

59 00:08:20.670 00:08:34.500 Uttam Kumaran: I would. I couldn’t really think of a use case yesterday where there is enrichment on a deal that wasn’t already something on the existing objects, right? So an opportunity, what do they typically have? It just has, like, a

60 00:08:34.650 00:08:41.503 Uttam Kumaran: open date, maybe who it’s assigned to, the date of expected close, and a value.

61 00:08:42.470 00:08:48.730 Uttam Kumaran: And then it has the associated company, and the owner of the opportunity, or sub-owners.

62 00:08:48.730 00:08:49.160 Justin Wong: Right.

63 00:08:49.160 00:08:53.209 Uttam Kumaran: No like enrichment happening on that, because the enrichment happens on the associated objects.

64 00:08:54.810 00:08:56.829 Justin Wong: Yeah. And I think the

65 00:08:58.360 00:09:06.560 Justin Wong: there’s like 2 kind of patterns that I’ve been thinking about as it relates to opportunities. One is like, you know, for

66 00:09:07.789 00:09:36.000 Justin Wong: for attributing events that might come through the pipeline to opportunities, like, I don’t really see us ingesting them and referring to them as, like, CRM objects to be too much of an issue with that. Like, let’s say you have some type of event that a customer wants to pipe in, whether it’s one that we sort of emit natively, or something that they push in through an API, whatever. But the event is something like.

67 00:09:36.090 00:09:37.759 Justin Wong: Oh, you know.

68 00:09:38.050 00:09:45.759 Justin Wong: maybe sales call happens. This is the duration, etc. But it’s like related to like these people.

69 00:09:46.203 00:09:54.976 Justin Wong: At this company. And maybe it’s related to this opportunity, because that’s something, like, the AE knows at the time, or there’s some other context on the meeting

70 00:09:55.660 00:10:10.279 Justin Wong: like that, us not treating it like a core object, I don’t think that limits us in any way from attributing that event to it, for, you know, analytics or reporting later, or for kicking off workflows. I think

71 00:10:10.850 00:10:16.379 Justin Wong: segueing into workflows or automation in general, that’s kind of where I’m

72 00:10:18.670 00:10:39.040 Justin Wong: feeling like we don’t need it as a core object, or trying to make it a core object then makes our lives a little bit more difficult. Because, like, let’s say you have a form fill come in, and you want some sort of automation based off of that form fill or meeting request. Even if you wanted to try and do the logic of, like, does an opportunity exist

73 00:10:39.350 00:10:46.009 Justin Wong: for this customer and or this person

74 00:10:46.520 00:11:00.401 Justin Wong: like, is it enough to just say, like, well, yeah, there is an opportunity in Salesforce that exists. But, like, how would you differentiate one from the next one that’s potentially open, or try to intelligently decide whether or not you create a new one?

75 00:11:01.650 00:11:04.459 Justin Wong: like I. I’m not sure that having it as a

76 00:11:05.450 00:11:10.430 Justin Wong: core object helps us solve that problem anymore than just already having it as a

77 00:11:10.630 00:11:13.900 Justin Wong: as an ingested CRM object. Does that make sense?

78 00:11:14.070 00:11:18.257 Uttam Kumaran: Yeah, I I don’t think there’s I don’t think there’s any innovation there.

79 00:11:18.820 00:11:31.840 Uttam Kumaran: like, I think the innovation, commonly, when you talk about opportunities, is in understanding velocity, understanding value by stage, understanding value by rep, understanding.

80 00:11:32.030 00:11:38.373 Uttam Kumaran: rep, like, for a given rep, do they typically take longer or shorter on an opportunity. But like

81 00:11:38.940 00:11:54.929 Uttam Kumaran: there’s also things that are like, you know, I think the only thing you may want to consider is opportunity-value-based, like, automations, like, when an opportunity is open for this long, trigger something. Or

82 00:11:56.870 00:12:03.170 Uttam Kumaran: you know there, there may be some of those use cases that, like opportunity, plus some other signal.

83 00:12:03.360 00:12:04.893 Uttam Kumaran: Execute something.

84 00:12:05.660 00:12:06.260 Justin Wong: Yeah.

85 00:12:06.540 00:12:16.119 Uttam Kumaran: But I also, I just, I don’t, if, for a company, if a company object only has one open opportunity, it would seem pretty fair to just say, like, any event happening

86 00:12:16.260 00:12:22.410 Uttam Kumaran: probably could be associated with that. But I don’t, I’m kind of with you in that I don’t see

87 00:12:23.130 00:12:25.639 Uttam Kumaran: I don’t see a clear reason to

88 00:12:27.480 00:12:35.949 Uttam Kumaran: like. The only other thing you could kind of say is, like, look, if a rep, or a rep associated with the opportunity, emails the client, then maybe that activity.

89 00:12:36.170 00:12:39.329 Uttam Kumaran: You could have some confidence that it’s related to that. But

90 00:12:39.510 00:12:47.971 Uttam Kumaran: I don’t know. In in working a lot myself on sales analytics. That’s not something they’re looking at at all. They don’t look at

91 00:12:49.530 00:12:54.470 Uttam Kumaran: They’re not looking at like emails sent and correlating that to

92 00:12:54.580 00:13:02.299 Uttam Kumaran: where things are closing. And it’s a lot simpler, what they’re looking at is: how many things are open? How long have they been open for?

93 00:13:02.630 00:13:04.460 Uttam Kumaran: Did we get a follow up out?

94 00:13:04.996 00:13:07.490 Uttam Kumaran: That’s kind of the extent of like

95 00:13:07.980 00:13:10.950 Uttam Kumaran: where things are. For for these typical sales managers.

96 00:13:11.760 00:13:19.829 Justin Wong: So that kind of segues into another question I had. Sorry, Victor, I don’t know if I actually wrote this one down in the list. But

97 00:13:21.160 00:13:22.420 Justin Wong: the

98 00:13:22.650 00:13:31.150 Justin Wong: the question that leads me to, since you kind of touched on it in a couple of points there, is: are we thinking about how to store these

99 00:13:31.700 00:13:38.540 Justin Wong: in the most efficient manner. Because I think there’s like for most of our records.

100 00:13:38.630 00:14:08.459 Justin Wong: I mean, this probably holds for most organizations, right? But for most records there’s gonna be two ways that we try to access them: either by an actual ID, where we’re only looking for one, or, to your point, like, oh, automation based on if there’s opportunities, for example, that have been open for more than 30 days, or deal stage hasn’t changed, or something, right? So, like, in the first one, we know exactly which one we’re looking for, and we need to use the data from it. The second one, we’re doing it in more of, like, a

101 00:14:08.510 00:14:11.270 Justin Wong: search-based, list-based approach.

102 00:14:11.270 00:14:11.790 Uttam Kumaran: Let’s see.

103 00:14:11.790 00:14:28.589 Justin Wong: Where, you know, we don’t need the IDs necessarily. And the way that I think we’re planning on storing these, at least one of the ways, is treating everything like an event. So, you know, it’s like the Salesforce use case

104 00:14:29.510 00:14:42.490 Justin Wong: opportunity is created, that comes in as an event, we’re sending it through the event pipeline. Ultimately, though, we’re storing it in, or I think we’re planning to store it in, a ClickHouse table for

105 00:14:43.071 00:14:55.650 Justin Wong: I don’t know how granular this is getting. Maybe we have one for Salesforce objects, maybe it’s specific to, like, oh, Salesforce opportunities. But either way, it’s kind of going into ClickHouse. And this is another

106 00:14:55.910 00:14:58.820 Justin Wong: situation where maybe.

107 00:14:59.470 00:15:11.711 Justin Wong: like, opportunities are a little bit different from other objects, in that I think the frequency of updates to that object is probably not quite as frequent as others.

108 00:15:13.550 00:15:29.520 Justin Wong: I don’t know, I could be wrong about that. I’m actually doubting that a little bit right now. But just, you know, for argument’s sake: any type of CRM record, we are planning to ingest change events or created events, and then push that through the stream, and that ultimately ends up in ClickHouse, right? But then.
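A minimal sketch of the kind of generic change-event envelope Justin describes pushing through the stream; every field name here is an assumption, not the real pipeline schema:

```python
import json
import time

def make_change_event(org_id, source, object_type, record_id, changes):
    """Wrap a CRM create/update into a generic change event for the stream.
    Hypothetical envelope; the actual pipeline's schema may differ."""
    return {
        "org_id": org_id,
        "source": source,            # e.g. "salesforce"
        "object_type": object_type,  # e.g. "opportunity"
        "record_id": record_id,      # the CRM-assigned ID
        "changes": changes,          # field -> new value
        "observed_at": time.time(),
    }

event = make_change_event(
    "org_1", "salesforce", "opportunity", "006xx0000012345",
    {"stage": "negotiation", "amount": 50000},
)
print(json.dumps(event, indent=2))  # ready to produce to the stream
```

The same envelope works for creates and updates alike, so everything landing in ClickHouse is just an append of granular change rows.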

109 00:15:30.080 00:15:34.530 Justin Wong: like, with ClickHouse being the type of database it is, it’s not a primary-ID

110 00:15:34.950 00:15:40.099 Justin Wong: type of system that you would use to just pull something out of it, and so like

111 00:15:40.540 00:15:42.000 Justin Wong: going back to those

112 00:15:42.220 00:15:50.890 Justin Wong: query patterns that we would have in like, oh, a workflow execution, potentially, as it relates to individual records.

113 00:15:51.688 00:15:55.790 Justin Wong: Should we be storing these

114 00:15:56.350 00:16:01.950 Justin Wong: in a different data store? Should we be duplicating them somehow for a different use case? Like.

115 00:16:03.920 00:16:08.049 Uttam Kumaran: Yeah, I’m just like kind of struggling on whether or not we should.

116 00:16:08.290 00:16:10.040 Justin Wong: Be thinking about

117 00:16:10.880 00:16:19.709 Justin Wong: more efficient query patterns for the use cases that aren’t quite ClickHouse. And we have another question that kind of touches on this later as well. But I think

118 00:16:20.340 00:16:21.809 Justin Wong: this is a good.

119 00:16:22.030 00:16:24.230 Uttam Kumaran: Entry point into that conversation.

120 00:16:29.100 00:16:36.050 Uttam Kumaran: Yeah, I mean it. Just I think it depends on the functionality that you want to allow. I mean.

121 00:16:36.781 00:16:40.360 Uttam Kumaran: I I think the opportunity based action.

122 00:16:43.800 00:16:51.099 Uttam Kumaran: Yeah, I mean, I can go back and forth and like, on a product sense of like, how important that is, I’ve worked with teams that are

123 00:16:51.220 00:16:53.200 Uttam Kumaran: sort of that advanced

124 00:16:54.940 00:17:01.399 Uttam Kumaran: I’ve worked, there’s a, I would say, most sales teams are not like that. They’re basically, they’re barely getting to, like

125 00:17:01.560 00:17:04.660 Uttam Kumaran: time and stage type analytics.

126 00:17:06.310 00:17:11.669 Uttam Kumaran: so I think it depends. I mean, look, if you’re gonna offer opportunity based

127 00:17:11.780 00:17:19.190 Uttam Kumaran: automations, I don’t think necessarily any of those are like particularly like timely like that. You can’t.

128 00:17:20.420 00:17:31.090 Uttam Kumaran: They have, like, a little bit of latency in where you take the action. Like, they’re not, I don’t categorize them as, like, the moment something hits something, it has to fire.

129 00:17:31.790 00:17:35.259 Uttam Kumaran: You could consider batching things or have some very light

130 00:17:35.380 00:17:38.139 Uttam Kumaran: Cron. But you’re right. If you have to take

131 00:17:38.637 00:17:51.809 Uttam Kumaran: actions on all those, then you’ll have to store all the opportunities somewhere, or at least recreate the state of the opportunities from the events. Right? I do think that saving all of the change events

132 00:17:52.750 00:17:56.880 Uttam Kumaran: is is helpful. But again, if you’re you’re right, if you’re like.

133 00:17:57.030 00:18:01.750 Uttam Kumaran: the moment a dimension on an opportunity turns into something.

134 00:18:02.120 00:18:07.800 Uttam Kumaran: You’ll have to basically run a where clause, find the rows that qualify, and then

135 00:18:08.270 00:18:11.050 Uttam Kumaran: row by row, take some action. Right.
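The batch pattern Uttam outlines — a light cron that runs a where-clause over current opportunity state, then acts row by row — might look like the following sketch (the data shape and the `fire_action` hook are invented for illustration):

```python
from datetime import date, timedelta

# Hypothetical current-state rows; in practice this would be the result of
# a query with the WHERE clause pushed down to the database.
opportunities = [
    {"id": "opp1", "stage": "open",   "opened": date(2025, 6, 1)},
    {"id": "opp2", "stage": "open",   "opened": date(2025, 7, 10)},
    {"id": "opp3", "stage": "closed", "opened": date(2025, 5, 1)},
]

def stale_open_opps(opps, today, max_age_days=30):
    """Equivalent of: WHERE stage = 'open' AND opened < today - max_age_days."""
    cutoff = today - timedelta(days=max_age_days)
    return [o for o in opps if o["stage"] == "open" and o["opened"] < cutoff]

def run_batch(opps, today, fire_action):
    # Row by row, take some action (email the owner, start a workflow, ...).
    for opp in stale_open_opps(opps, today):
        fire_action(opp)

fired = []
run_batch(opportunities, date(2025, 7, 18), fired.append)
print([o["id"] for o in fired])  # only opp1 is open and older than 30 days
```

Because these automations tolerate a bit of latency, running this on a cron interval rather than per-event keeps the read load off the hot path.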

136 00:18:11.050 00:18:11.920 Justin Wong: Yeah.

137 00:18:13.219 00:18:16.990 Justin Wong: Yeah. And I think that kind of segues into

138 00:18:17.360 00:18:27.819 Justin Wong: maybe one of the bigger questions that we have right now, I’m not sure. Victor discussed this with you yesterday, but definitely. One of the questions he and I spent some time puzzling over yesterday is like.

139 00:18:28.710 00:18:35.430 Justin Wong: right now we’re kind of architecting this system to have ClickHouse

140 00:18:35.880 00:18:43.100 Justin Wong: effectively be our source of truth for any point in time like

141 00:18:43.520 00:18:54.229 Justin Wong: these are the most up-to-date values for traits on a record. And then even a step further, like, this is the most point-in-time

142 00:18:54.530 00:19:03.019 Justin Wong: snapshot you might have for a record, not just based on the most up-to-date data, but if you want to have, like, a hierarchy of data for specific fields.

143 00:19:03.020 00:19:03.350 Uttam Kumaran: Yeah.

144 00:19:03.350 00:19:28.899 Justin Wong: So let’s say, you know, like, employee count’s one that we keep going back to. If you want to take that from, you can have a hierarchy: you might have a value for that from Salesforce, from, like, a form fill, from an enrichment provider, and you can establish the hierarchy to be like, you know, I want the enrichment data to come first, because maybe that’s more likely to be correct than something user-provided. So

145 00:19:29.820 00:19:33.960 Justin Wong: in in that sense like, if we

146 00:19:35.120 00:19:39.869 Justin Wong: are treating ClickHouse that way, where we’re going to

147 00:19:40.130 00:19:52.089 Justin Wong: query it like, if we need to execute any type of action, and we need that information for like, oh, this like justin@default.com filled out a form.

148 00:19:52.200 00:20:16.390 Justin Wong: I have a bunch of automation that’s related to this, and it’s usually going to revolve around data that comes from, like, justin@default.com as a default object. Like, we’re using ClickHouse, or at least planning to use ClickHouse, as the data store that returns us the information we need about that person. So, you know, like, the query pattern would be: for this organization ID, for this email address

149 00:20:16.520 00:20:23.429 Justin Wong: give us effectively the aggregated list of traits

150 00:20:23.620 00:20:30.710 Justin Wong: by priority, and, you know, with the MergeTree, by, like, most recent value for a given source, or whatever. But, like.
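The resolution logic Justin is describing — per trait, the highest-priority source wins, and the most recent value wins within a source — can be sketched in plain Python; the priority order and rows below are illustrative stand-ins for what the ClickHouse materialized view would compute:

```python
# Sketch of trait resolution: per trait, highest-priority source wins,
# newest observation wins within a source. The priority list and rows
# are illustrative, not the actual configuration.

SOURCE_PRIORITY = ["enrichment", "salesforce", "form_fill"]  # first wins

rows = [  # (trait, source, observed_at, value), as they'd land in ClickHouse
    ("employee_count", "form_fill",  1, 40),
    ("employee_count", "salesforce", 2, 55),
    ("employee_count", "enrichment", 3, 52),
    ("employee_count", "enrichment", 5, 60),  # newer enrichment value
    ("first_name",     "form_fill",  4, "Justin"),
]

def resolve(rows):
    best = {}  # trait -> (priority_rank, observed_at, value)
    for trait, source, ts, value in rows:
        rank = SOURCE_PRIORITY.index(source)
        cur = best.get(trait)
        # Lower rank = higher priority; within a rank, newer timestamp wins.
        if cur is None or (rank, -ts) < (cur[0], -cur[1]):
            best[trait] = (rank, ts, value)
    return {trait: v for trait, (_, _, v) in best.items()}

print(resolve(rows))  # {'employee_count': 60, 'first_name': 'Justin'}
```

The whole pass is order-independent over the input rows, which is what makes it a good fit for a merge-on-read table: the resolved snapshot is a pure function of the accumulated change rows.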

151 00:20:30.950 00:20:33.889 Justin Wong: you know, we’re kind of using it as our

152 00:20:34.320 00:20:44.190 Justin Wong: sifter to give us the freshest thing, but that might not necessarily be like the most efficient way to

153 00:20:44.780 00:20:48.589 Justin Wong: for us to access that data when we’re

154 00:20:49.520 00:20:57.769 Justin Wong: trying to read from it? I don’t, like, does that make sense? You know, like, it’s great as an analytical tool, when we want to, like, bulk query a bunch of stuff. But

155 00:20:58.110 00:21:00.059 Justin Wong: if we’re in this.

156 00:21:00.320 00:21:00.680 Uttam Kumaran: Yeah.

157 00:21:00.680 00:21:11.900 Justin Wong: If we have an access pattern where, like, or even if you just want to look at justin@default.com in the UI and see all the information about that record, we’re gonna pull that out of ClickHouse versus.

158 00:21:12.250 00:21:14.689 Justin Wong: you know, a postgres database or something

159 00:21:16.650 00:21:25.740 Justin Wong: are there pitfalls with this strategy of us leaning so heavily on ClickHouse for, you know, trait resolution, and ultimately, like

160 00:21:27.270 00:21:28.889 Justin Wong: source of truth.

161 00:21:29.170 00:21:34.449 Justin Wong: type thing. And the reason why we’re leaning that way, towards ClickHouse at least, is because we anticipate, like

162 00:21:35.040 00:21:55.860 Justin Wong: any number of events can come in and precipitate the need for updating traits on that record. And from, at least, my personal experience at Clearbit, when we had things like that happen, a Postgres database just failed us, because the number of writes that we were doing to the database with that frequency was just

163 00:21:56.250 00:21:57.676 Justin Wong: hammering it.

164 00:21:58.540 00:22:09.200 Justin Wong: so, like, for reads it was great, but the number of updates that we had was just, like, our vacuums couldn’t keep up. We just had way too many transactions stemming from.

165 00:22:09.540 00:22:12.550 Justin Wong: you know, like relatively atomic update events.

166 00:22:14.170 00:22:20.880 Uttam Kumaran: Yeah, I mean, I would say, ClickHouse, you’re right, is gonna be great for any of these extremely, like

167 00:22:21.160 00:22:32.250 Uttam Kumaran: heavy reads, like across super large data sets. You’re right on that. Like, I guess

168 00:22:33.230 00:22:40.160 Uttam Kumaran: part of my assumption was, there is, like, a Postgres, like, basically a Postgres backend

169 00:22:40.601 00:22:46.319 Uttam Kumaran: that’s typically used for those like, I mean, I assume that’s probably what’s kind of set up right now.

170 00:22:46.450 00:22:50.740 Justin Wong: We do have Postgres for it currently.

171 00:22:54.910 00:22:59.710 Uttam Kumaran: it’s just, for anything where it’s like storing all these events.

172 00:23:01.090 00:23:03.569 Uttam Kumaran: using Postgres for that is pretty rough

173 00:23:04.420 00:23:07.080 Uttam Kumaran: for streaming all those events in

174 00:23:09.910 00:23:16.999 Uttam Kumaran: But you’re right, like the extremely fast updates

175 00:23:18.300 00:23:24.270 Uttam Kumaran: are gonna be really tough in a database like ClickHouse.

176 00:23:25.350 00:23:27.210 Uttam Kumaran: Yeah. And so.

177 00:23:27.550 00:23:29.470 Justin Wong: I’m kind of wondering

178 00:23:30.730 00:23:41.500 Justin Wong: for the situations where we need quick reads on this stuff, but we’re only ever querying an individual record, and we know that, like

179 00:23:41.710 00:23:46.610 Justin Wong: what options might we have for

180 00:23:48.508 00:23:56.010 Justin Wong: I guess, like, storing the snapshots that the materialized view in ClickHouse produces, but in

181 00:23:56.260 00:23:59.259 Justin Wong: a database where we’re accessing it by.

182 00:23:59.860 00:24:06.960 Justin Wong: you know, a deterministic key, like, yeah, just a row. And the thing that just pops into my mind right now, that we kind of.

183 00:24:07.440 00:24:10.850 Justin Wong: We kind of did this before at Clearbit. I’m

184 00:24:11.100 00:24:17.840 Justin Wong: not sure that it necessarily has a lot of value here, but like maybe we store.

185 00:24:18.650 00:24:26.710 Justin Wong: Maybe we store completed snapshots coming out of a materialized view in, like, a DynamoDB table with

186 00:24:27.373 00:24:34.566 Justin Wong: like, you know, deterministic primary and sort keys, or partition keys and sort keys.

187 00:24:36.210 00:24:41.440 Justin Wong: Because then, like, we’re not really faced with the same issue of

188 00:24:42.419 00:24:49.139 Justin Wong: like, transactions in a Postgres database from this data changing all the time. But then we are still able to, like

189 00:24:49.860 00:24:56.120 Justin Wong: very adequately, or, you know, like efficiently read from it like very fast.
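A toy version of that snapshot-store idea — deterministic partition/sort keys, last write wins, O(1) point reads — using an in-memory dict as a stand-in for DynamoDB (the key scheme is an assumption, not a settled design):

```python
# In-memory stand-in for a DynamoDB-style table: a partition key plus a
# sort key give a deterministic address for each record's latest snapshot,
# so point reads never touch the analytical store. Key scheme is hypothetical.

table = {}

def snapshot_key(org_id, object_type, identifier):
    # e.g. partition key "org_1#person", sort key "justin@default.com"
    return (f"{org_id}#{object_type}", identifier)

def put_snapshot(org_id, object_type, identifier, traits):
    """Overwrite the snapshot: last write wins, no transactions or vacuums."""
    table[snapshot_key(org_id, object_type, identifier)] = traits

def get_snapshot(org_id, object_type, identifier):
    """O(1) read by deterministic key — the fast path for workflows and UI."""
    return table.get(snapshot_key(org_id, object_type, identifier))

# A materialized-view refresh would upsert the resolved traits here:
put_snapshot("org_1", "person", "justin@default.com",
             {"first_name": "Justin", "employee_count": 60})
print(get_snapshot("org_1", "person", "justin@default.com"))
```

The trade-off being weighed in the conversation is exactly this: blind overwrites by key avoid Postgres-style write amplification, at the cost of maintaining a second, duplicated copy of the resolved state.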

190 00:25:00.180 00:25:02.219 Uttam Kumaran: Yeah, I’m trying to think about

191 00:25:02.330 00:25:06.399 Uttam Kumaran: at what point would you actually, like, build that clone DB.

192 00:25:09.360 00:25:10.250 Justin Wong: Yeah.

193 00:25:10.250 00:25:10.940 Victor Papyshev: And

194 00:25:11.470 00:25:40.090 Victor Papyshev: I guess maybe something I’m thinking through is, like, I think we agree that ClickHouse is a great candidate for essentially storing this huge data set of just activity, essentially just, like, updates, granular updates that are being processed upstream via that transform and just being piped into ClickHouse, for the purpose of, you know, generating, like, these snapshots. So you end up with stuff like this, like: first name, first-name source, last name, last-name source. And that all checks out. I think that’s, like.

195 00:25:40.110 00:25:42.659 Victor Papyshev: you know. That’s why ClickHouse is a strong

196 00:25:42.730 00:25:50.530 Victor Papyshev: contender for that. I guess, then, if the reads, if, like, the reads by primary key are, like, our pitfall here, like, where it’s just not.

197 00:25:51.150 00:26:19.269 Victor Papyshev: It’s not designed for that, right? ClickHouse is better for, you know, big data analytics, and for our use cases, I guess where I’m going with this is tying it into use cases, and, like, where each is important. So, like, I think, for our purposes, in terms of, like, building robust, filtered, sorted views for the purpose of, like, our users using our tables in product, and, like, being able to do robust reporting and, like, check in on, you know, reports, ultimately, say.

198 00:26:19.270 00:26:19.610 Uttam Kumaran: Yes.

199 00:26:19.610 00:26:22.243 Victor Papyshev: Views, views, and whatnot. Right?

200 00:26:23.410 00:26:33.900 Uttam Kumaran: The common pattern is moving from Postgres into ClickHouse for the analytics use cases. I wonder if now Postgres just becomes something in parallel that’s maintained.

201 00:26:34.070 00:26:37.559 Uttam Kumaran: And then ClickHouse is used for certain use cases that are, like, the

202 00:26:39.940 00:26:46.410 Uttam Kumaran: like, yeah, you’re you’re basically scanning across a ton of rows for like an analytics number or something for that reason.

203 00:26:47.290 00:26:53.919 Uttam Kumaran: And if you’re doing just a simple object lookup, that also exists in a Postgres environment. I mean, the problem will be

204 00:26:54.570 00:26:57.620 Uttam Kumaran: SLAs for both.

205 00:26:58.370 00:27:01.210 Uttam Kumaran: But the analytics use cases, the SLAs are always

206 00:27:02.040 00:27:05.989 Uttam Kumaran: less, like, you don’t need to have crazy SLAs for

207 00:27:07.020 00:27:10.289 Uttam Kumaran: for that sort of stuff. It’s just if it powers product.

208 00:27:11.060 00:27:14.699 Uttam Kumaran: Then we just have to be clear on what sources, for what logic.

209 00:27:15.550 00:27:40.560 Victor Papyshev: Yeah, well, here’s the thing. I think this is more of it. Maybe, Justin, you and I went back and forth about this. Like, actually, this is from Monday, I think, is where we started really digging into this. It’s like, the real-time workflow use case is probably, like, the singular, like, true, call it real-time, like, we-need-a-fast-read sort of use case that we have, and even those are pretty few and far between. Like, you know, we kind of racked our brains, like, we looked at the current product. I mean

210 00:27:41.060 00:27:57.679 Victor Papyshev: realistically, it’s, like, form submissions, where there’s a prospect sitting in front of their computer and looking at a spinner, right? They need to know what happens, and in order for us to figure out what to do with that submission, either show a schedule, or redirect, or whatever, we have to run a workflow. So that’s a real-time workflow use case.

211 00:27:57.720 00:28:15.410 Victor Papyshev: And then maybe we apply the same sort of concepts to our booking links, right? Like, we don’t do that today, but that’s something we might want to do tomorrow. But overall, there are relatively few true, call it prospect-facing, highly latency-sensitive use cases, and those are probably it. Let’s say we come up with a few more

212 00:28:15.410 00:28:35.060 Victor Papyshev: down the line, but it’s going to be, like, a handful, not, like, 90% of them. So in terms of SLAs, I think we can be more flexible. Like, if someone queues up a batch of a hundred workflow runs, or a hundred enrichments, or something like that, I think it’s going to be acceptable for that stuff to come in at a slower pace. So if we’re thinking through those real,

213 00:28:35.060 00:28:42.950 Victor Papyshev: those real-time use cases, am I understanding it correctly that those are the use cases where fast reads are important? I mean, I know we also don’t want to be, like.

214 00:28:42.950 00:28:52.019 Uttam Kumaran: Like, let me give you an example. So, you know, I worked at this company Flowcode; they’re, like, a QR code company. And one of our common use cases was these dynamic redirects

215 00:28:53.270 00:29:11.423 Uttam Kumaran: where, based on, like, an IP lookup... we also built a rules engine; you could route people based on the time of day, based on, like, the weather, random stuff like that. But you would need to be able to change the redirect in real time, to be able to switch people,

216 00:29:11.760 00:29:17.549 Uttam Kumaran: and that lookup... yes, I think we were using, I could actually call a friend to check, but we

217 00:29:17.961 00:29:31.358 Uttam Kumaran: we were using SingleStore to process those events, and then we had some other... but we were doing that in some type of Postgres-related environment where we could basically look up where this thing needs to redirect, and then dynamically send them there.

218 00:29:32.040 00:29:42.755 Uttam Kumaran: And then what I was processing on the analytics side was the events, because we would need to report out. But the SLAs were a lot different; like, it wasn’t as important,

219 00:29:43.590 00:29:51.019 Uttam Kumaran: but it was important to materialize those changes, like where the redirect should go, or the rules, pretty quickly.

220 00:29:51.382 00:29:55.110 Uttam Kumaran: And for those to be looked up, because those have to be looked up in flight.

221 00:29:55.630 00:29:59.419 Uttam Kumaran: You know, basically when someone scans a QR code.

222 00:30:00.410 00:30:14.580 Victor Papyshev: Yeah, I think one example that’s been brought up internally is by one of our guys who was at Sprig, where they also lean on ClickHouse significantly, and kind of the whole bread and butter there is: based on this user’s activity and their traits,

223 00:30:14.580 00:30:32.140 Victor Papyshev: do we need to show, like, a survey right now, in app? That’s essentially, you know, the entire product in a nutshell; it’s just conditional survey presentation, based on a large swath of data associated with, I guess, that anonymous ID or that user ID or whatever.

224 00:30:32.140 00:30:32.470 Uttam Kumaran: Yeah.

225 00:30:32.470 00:30:38.780 Uttam Kumaran: I guess my question would be: if the action has to be taken, does the enrichment have to be, like, the most up-to-date enrichment?

226 00:30:41.334 00:31:02.789 Victor Papyshev: That’s a good question. So we kind of dug into this yesterday, Justin and I. It’s like, in terms of data sources, if we’re talking about the goal, the high-level goals again: kind of how I contrast today’s workflows with, hopefully, tomorrow’s workflows is, today, when it comes time to run a workflow execution, it’s pretty simple. Like, you know, trigger happens,

227 00:31:02.810 00:31:27.200 Victor Papyshev: hit the execute service. I mean, it could be a batch or whatever, but either way, we kick off execution, we kind of just jump right into it. And then any data that’s required, or a lot of the data that’s required for that workflow execution, is often fetched essentially as nodes in the workflow: like, go enrich with Apollo, it’s gonna go enrich with Apollo, then it’s gonna move on to the next step. Pretty simplistic. I think the contrast, product-wise,

228 00:31:27.200 00:31:37.049 Victor Papyshev: here, at least conceptually, is almost like a data pre-flight check, where it’s like: okay, workflow triggered. Now we need to go essentially

229 00:31:37.060 00:31:53.928 Victor Papyshev: get the most complete data we can, according to, like, the SLAs of this type of workflow execution. So, like, I think 97% of workflow trigger types are gonna be time-insensitive. Because, I guess, the whole point is that, you know, if I pull up this

230 00:31:54.230 00:32:18.439 Victor Papyshev: UI, the user has essentially defined here: this is my understanding of my data, this is how I expect my objects to be populated in Default, and I’m going to be building my workflows off of that assumption, essentially. And my goal is that I can access a person and be confident that its fields are coming from these sources in this way. So if that’s, like, the high-level goal, it’s this data pre-flight check, essentially,

231 00:32:18.440 00:32:27.829 Victor Papyshev: like, the flow that happens is, you know, ideally, we check what we already have. I guess that’s where, like, a primary-key-based lookup is ideally fast. Again, maybe, like,

232 00:32:27.830 00:32:28.220 Uttam Kumaran: Yeah.

233 00:32:28.220 00:32:42.949 Victor Papyshev: sensitive use case, we don’t care. But let’s look at the most demanding use case, like a form submission: somebody’s sitting there in front of a spinner right now, and speed... like, our customers complain about latency even today. So we want to make that as fast as possible. So

234 00:32:42.950 00:33:12.659 Victor Papyshev: you know, they’re sitting there, and we need to go essentially find out what data we need for this. Like, let’s go see where we’re at, and do we need to enrich right now? Now, I think the valid data sources for that use case are going to be a little bit more limited, based on which ones are fast versus slow. Like, we have some enrichment partners that we’ve seen take, like, 11 seconds to go get data; those are not going to be eligible right now, we’re not even going to consider those for enriching right now. But Clearbit and Apollo are pretty fast; we run those in parallel, we should be good to go

235 00:33:12.660 00:33:20.839 Victor Papyshev: in, like, you know, 300 ms, say 300 ms or less. That’s not too problematic. But I guess that initial step of, like,

236 00:33:21.060 00:33:31.069 Victor Papyshev: so you’re, like, justin@default.com, you just filled out the form. I want to know: do I have data on Justin already? That’s where, I guess, right now we’re assuming we’d be hitting ClickHouse with a query, like,

237 00:33:31.070 00:33:31.450 Uttam Kumaran: Yeah.

238 00:33:31.450 00:33:37.770 Victor Papyshev: select where email is justin@default.com. So.

239 00:33:37.770 00:33:38.440 Justin Wong: Yeah.

240 00:33:38.440 00:33:42.829 Victor Papyshev: That’s where... I guess that’s the use case where we want reads to be fast.
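The pre-flight flow Victor is outlining (check the store first, and only if fields are unresolved, fan out in parallel to enrichment providers fast enough for the latency budget) might look roughly like this. The provider names, latencies, in-memory store, and `fake_enrich` stub are illustrative stand-ins, not real integrations.

```python
import concurrent.futures

# Hypothetical observed p50 latencies, in seconds. The 11-second vendor from
# the transcript would live in a "slow" tier excluded from the real-time path.
FAST_PROVIDERS = {"clearbit": 0.15, "apollo": 0.20}

def fake_enrich(provider, email):
    # Stand-in for a real provider API call.
    return {"provider": provider, "company": "Default"}

def eligible_providers(budget_s):
    # Only consider providers whose typical latency fits the budget.
    return [name for name, lat in FAST_PROVIDERS.items() if lat <= budget_s]

def preflight(email, store, budget_s=0.3):
    record = store.get(email)
    if record and record.get("resolved"):
        return record  # already enriched and resolved: fast read path
    results = {}
    with concurrent.futures.ThreadPoolExecutor() as pool:
        futures = {pool.submit(fake_enrich, p, email): p
                   for p in eligible_providers(budget_s)}
        for fut in concurrent.futures.as_completed(futures, timeout=budget_s + 1.0):
            results[futures[fut]] = fut.result()
    return {"email": email, "resolved": True, "sources": results}

fresh = preflight("justin@default.com", {})
print(sorted(fresh["sources"]))  # both fast providers fit the 300 ms budget
```

The design choice mirrored here is that eligibility is decided per provider per budget, so the slow-vendor exclusion is configuration, not code.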

241 00:33:45.390 00:33:50.110 Victor Papyshev: Otherwise we naively enrich. I guess. We could just do that. But

242 00:33:50.350 00:33:52.840 Victor Papyshev: anyway, yeah, I’m curious to hear if that tracks, or what.

243 00:33:54.970 00:33:57.089 Justin Wong: What do you mean by naively enrich?

244 00:33:57.730 00:34:05.609 Victor Papyshev: I mean, I just kind of made that up on the spot. Like, if, hypothetically, ClickHouse is too slow, we could just.

245 00:34:06.030 00:34:06.930 Justin Wong: Interesting, really.

246 00:34:06.930 00:34:17.230 Victor Papyshev: Like, essentially resolve the data priority on the spot, naively, without really looking. I think that’s very inefficient; I probably wouldn’t opt for that. I’m just thinking, like,

247 00:34:17.650 00:34:28.070 Victor Papyshev: yeah, actually, I would probably just straight up not offer that, realistically. But I guess, how do we work around it if we assume that this will be slow? And maybe my other question is how slow, because, I mean,

248 00:34:28.400 00:34:32.637 Victor Papyshev: hey, obviously, it’s gonna be a lot because this data is gonna grow. So

249 00:34:33.460 00:34:40.530 Victor Papyshev: that might be, you know, that might be a problem in the long run. So I’m curious how slow it can be expected to be. Because if it’s, like,

250 00:34:41.040 00:34:44.879 Victor Papyshev: you know, if it’s 300 ms or something that’s not a problem. Now, is that.

251 00:34:44.880 00:34:45.670 Uttam Kumaran: Yeah.

252 00:34:45.679 00:34:48.499 Victor Papyshev: Computationally intensive. And like, it’s gonna like run up cost.

253 00:34:48.500 00:34:49.729 Uttam Kumaran: Yes, yes.

254 00:34:49.739 00:34:50.639 Victor Papyshev: You know, that’s gonna be.

255 00:34:50.639 00:34:51.059 Justin Wong: Yeah.

256 00:34:51.060 00:34:51.780 Victor Papyshev: Ratio.

257 00:34:54.270 00:34:59.719 Justin Wong: So that’s kind of... I think I just thought of another way to kind of frame this,

258 00:35:00.750 00:35:03.656 Justin Wong: this idea like when we’re looking at the

259 00:35:04.730 00:35:09.470 Justin Wong: the field mappings in Default, Victor. The

260 00:35:09.660 00:35:36.360 Justin Wong: like, this is not all that different in concept from having, essentially, formula fields, right? So, like, in Salesforce, you could have an object that looks like this, except for annual revenue when it’s custom properties, or something. Or rather, here it would be more like first name, where you have two mappings; that’s effectively analogous to a Salesforce formula field. The only thing is, in Salesforce, formula fields,

261 00:35:36.820 00:35:42.080 Justin Wong: They exist as a formula, and they’re computed every time. So if you wanted to just like

262 00:35:42.270 00:35:54.769 Justin Wong: query a database, you know, like, if you did a SOQL query, that would get computed at read time. That data is just not there statically. And what we’re trying to do here is potentially.

263 00:35:54.770 00:35:55.820 Uttam Kumaran: Materialize it.

264 00:35:56.030 00:36:07.280 Justin Wong: Yeah, you know, having it materialized. So this is computed... it’s not computed every time, it’s just there, statically available, every time. And, like, this is definitely

265 00:36:07.560 00:36:11.929 Justin Wong: great for if we’re doing larger-scale,

266 00:36:13.690 00:36:19.441 Justin Wong: hey, we need to do something with a batch of objects, or we’re gonna power like the

267 00:36:20.150 00:36:27.070 Justin Wong: the tables view within Default, of objects and stuff. Like, it’s wonderful for that use case. But whenever we’re dealing with, like,

268 00:36:28.460 00:36:37.449 Justin Wong: a singular atomic record, like a unique record based on email, this kind of feels like an overly

269 00:36:38.460 00:36:41.080 Justin Wong: powerful, maybe

270 00:36:41.350 00:36:49.140 Justin Wong: solution for doing identity resolution on a singular record, when you’re only grabbing it by email, or something like that.
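Justin's formula-field analogy can be sketched in Python, assuming a simple first-non-null waterfall over a per-customer priority order (the source names and order below are hypothetical): the same resolution logic either runs on every read, like a Salesforce formula field, or runs once at write time and is stored statically.

```python
PRIORITY = ["salesforce", "clearbit", "apollo"]  # hypothetical per-customer order

def resolve_first_name(raw):
    # Compute-on-read: walk sources in priority order on every access,
    # the way a formula field is evaluated at query time.
    for source in PRIORITY:
        value = raw.get(source)
        if value is not None:
            return value
    return None

def materialize(raw):
    # Compute-on-write: resolve once and persist, so a read is a plain lookup.
    return {"first_name": resolve_first_name(raw)}

raw = {"salesforce": None, "clearbit": "Justin", "apollo": "J."}
print(resolve_first_name(raw))         # computed at read time -> "Justin"
print(materialize(raw)["first_name"])  # same value, persisted statically
```

Both paths give identical answers; the trade-off being debated is only where the cost lands, per read or per write.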

271 00:36:54.400 00:36:55.230 Uttam Kumaran: Yeah.

272 00:36:57.510 00:36:58.210 Victor Papyshev: Hmm.

273 00:36:59.140 00:37:02.159 Justin Wong: And, like Victor, kind of what we were talking about yesterday, is

274 00:37:03.240 00:37:22.869 Justin Wong: almost emulating the current Salesforce approach of: this is effectively a formula field, and we compute it every time we need to, for the real-time use case. But the kicker here is, we want to do both, right? Like, we have real-time use cases where we’re going to be operating on

275 00:37:23.290 00:37:31.429 Justin Wong: X, Y, and Z records, and we know exactly what X, Y, and Z are. Versus, we’ll have plenty of use cases where we’re also operating on

276 00:37:31.850 00:37:35.909 Justin Wong: batches or querying lists and not knowing how much we’re gonna get back.

277 00:37:36.040 00:37:42.039 Justin Wong: But we effectively have these 2 query patterns that we want to

278 00:37:42.300 00:37:50.280 Justin Wong: solve for. And currently, it’s just, we cannot find an

279 00:37:50.650 00:37:56.840 Justin Wong: approach that is equally powerful and cost-efficient for both use cases.

280 00:38:00.510 00:38:04.559 Victor Papyshev: Yeah, I mean, I think that all tracks, I mean, yeah, we have 2.

281 00:38:05.490 00:38:13.279 Victor Papyshev: call them conflicting use cases, where, I mean, the ClickHouse approach is good for the former. Yeah, I mean, no notes, I think I agree with that in terms of.

282 00:38:13.280 00:38:13.920 Uttam Kumaran: Yeah.

283 00:38:13.920 00:38:15.380 Victor Papyshev: The conundrum, or.

284 00:38:16.280 00:38:17.910 Justin Wong: For what it’s worth, though.

285 00:38:17.910 00:38:18.580 Uttam Kumaran: Yeah, go ahead. Go ahead.

286 00:38:18.580 00:38:29.659 Justin Wong: Performance-wise, at Clearbit we didn’t do it at the scale that we’re planning to do it here, but we had certain tables that we would use for trait resolution, and we would look them up by, like,

287 00:38:30.960 00:38:42.209 Justin Wong: you know, you can’t select by primary key, but we would look it up filtering for, you know, email or record ID or something to that effect. And those queries

288 00:38:45.540 00:38:46.920 Justin Wong: were not,

289 00:38:48.730 00:39:06.110 Justin Wong: like, latency-wise, we didn’t really have a problem with it when we were actually querying for just one record. We would actually have more problems with it querying for large amounts of records. But whenever we were looking for just trait resolution for a specific record, I don’t recall us having a time when the latency was

290 00:39:06.880 00:39:11.779 Justin Wong: too high to be acceptable for a real-time use case. So, you know, take that for what it’s worth.

291 00:39:11.960 00:39:16.449 Justin Wong: It might, performance-wise, actually be okay. But cost-wise,

292 00:39:17.788 00:39:21.129 Justin Wong: You know, I don’t know what that will look like.

293 00:39:21.680 00:39:22.470 Victor Papyshev: Sure.

294 00:39:22.660 00:39:31.590 Victor Papyshev: Yeah, when you say identity resolution, you mean this bit where, right now, we’re using, like, the argMax based on the priority scoring from the user’s setup, right? Like the snapshot

295 00:39:31.590 00:39:32.180 Justin Wong: Yeah, I’m.

296 00:39:32.180 00:39:32.750 Victor Papyshev: View.

297 00:39:32.750 00:39:36.959 Justin Wong: Yeah, we never did anything. This complex. We kind of

298 00:39:37.970 00:39:50.819 Justin Wong: stopped at the step of: we’ll have multiple values coming in for a specific trait, and we generally just want the most recent one. So we let, you know, the MergeTree take care of that. But, you know, yeah, still, we’re

299 00:39:51.580 00:39:56.969 Justin Wong: at that point, it’s the same thing as having the cost for the materialized view built into,

300 00:39:57.280 00:40:08.426 Justin Wong: like, the write of it. So when you’re reading from a materialized view, the cost of that is not any different than reading from the table itself, after the merge tree has already done its thing.
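A rough in-Python emulation of the argMax-style resolution being discussed: per trait, keep the value from the highest-priority source, breaking ties by recency. The scores and rows below are made up; in ClickHouse itself, an `argMax(value, score)` aggregate in the materialized view would produce the equivalent result at write/merge time.

```python
# Resolve each trait to the value whose (priority_score, observed_at) tuple is
# highest -- priority wins first, recency breaks ties within a source tier.
def arg_max_resolve(rows):
    resolved, best = {}, {}
    for row in rows:
        key = row["trait"]
        score = (row["priority_score"], row["observed_at"])
        if key not in best or score > best[key]:
            best[key] = score
            resolved[key] = row["value"]
    return resolved

rows = [
    {"trait": "company", "value": "Dflt Inc",  "priority_score": 1, "observed_at": 100},
    {"trait": "company", "value": "Default",   "priority_score": 3, "observed_at": 90},
    {"trait": "title",   "value": "Engineer",  "priority_score": 2, "observed_at": 50},
]
print(arg_max_resolve(rows))  # company comes from the score-3 source
```

Note how this differs from the Clearbit approach Justin describes (most-recent-wins only): there, every `priority_score` would effectively be equal and `observed_at` alone would decide.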

301 00:40:08.720 00:40:09.560 Victor Papyshev: Sure.

302 00:40:09.880 00:40:12.069 Justin Wong: So yeah, it’s like, more the

303 00:40:12.230 00:40:15.320 Justin Wong: cost of actually having the materialized view.

304 00:40:16.200 00:40:16.530 Victor Papyshev: Yeah.

305 00:40:16.530 00:40:22.069 Uttam Kumaran: But the nice thing is, you don’t need all the tables. Like, if you were to add Postgres to this,

306 00:40:22.320 00:40:26.130 Uttam Kumaran: like you just need a couple of objects

307 00:40:26.260 00:40:33.590 Uttam Kumaran: and a couple of dimensions. Like, some dimensions are not going to be used for routing, right?

308 00:40:34.147 00:40:39.510 Uttam Kumaran: And really, what you need back into the analytics is you just need to know.

309 00:40:40.750 00:40:43.120 Justin Wong: Like the analytics on, like what the.

310 00:40:43.480 00:40:47.170 Uttam Kumaran: what was hit. And then that’s what would go back into ClickHouse,

311 00:40:52.020 00:40:55.029 Uttam Kumaran: right? Because the in, yeah, go ahead.

312 00:40:56.250 00:40:58.069 Justin Wong: That is one of our struggles.

313 00:40:58.630 00:40:59.300 Uttam Kumaran: Yeah.

314 00:41:01.020 00:41:03.860 Justin Wong: I think that is one of our struggles. Is that

315 00:41:04.940 00:41:08.289 Justin Wong: the use case for our customers is they

316 00:41:08.720 00:41:17.570 Justin Wong: actually do want to route on fields where they’re utilizing this prioritization and waterfall logic like.

317 00:41:17.570 00:41:18.230 Uttam Kumaran: Yeah.

318 00:41:18.930 00:41:19.630 Justin Wong: Yeah.

319 00:41:20.040 00:41:26.069 Uttam Kumaran: So you just need to clone that; that’s what needs to get copied into Postgres. But, like, the.

320 00:41:26.070 00:41:27.330 Justin Wong: Oh, I see what you’re saying.

321 00:41:27.490 00:41:30.482 Uttam Kumaran: That needs to get copied into Postgres.

322 00:41:32.030 00:41:33.659 Uttam Kumaran: But, for example.

323 00:41:33.870 00:41:39.819 Uttam Kumaran: if you don’t have, like, if you don’t have the information from Clearbit or whatever on time,

324 00:41:40.520 00:41:48.420 Uttam Kumaran: the routing still needs to happen very quick. So the read needs to happen super quick, whether or not that data is going to be

325 00:41:48.970 00:41:53.660 Uttam Kumaran: there. Like, the enrichment can’t happen; there can’t be any sort of real-time enrichment.

326 00:41:54.545 00:42:01.690 Uttam Kumaran: All that data has to be in Postgres, sort of static. What then has to go back to ClickHouse, though, is the event, like,

327 00:42:02.020 00:42:08.500 Uttam Kumaran: a form fill route happened, and that has to be sent back for analytics.

328 00:42:09.060 00:42:09.790 Justin Wong: Yeah.

329 00:42:10.200 00:42:10.760 Victor Papyshev: Hmm.

330 00:42:10.970 00:42:19.589 Victor Papyshev: yeah. I mean, I think that, yeah, part of the real-time use case... our takeaway on Monday, a tentative takeaway, was that,

331 00:42:19.720 00:42:25.510 Victor Papyshev: since they’re pretty edge-case use cases, and we kind of know, we will know,

332 00:42:25.840 00:42:37.159 Victor Papyshev: or the system will know, what data sources are considered valid and whatnot, I feel like we are okay with, like... because, right now, I mean, I guess I’m sharing Chrome anyway. If you recall the diagram, or if you have it

333 00:42:37.160 00:42:37.790 Uttam Kumaran: Yes.

334 00:42:37.790 00:43:06.890 Victor Papyshev: pulled up. It’s like, you know, the data pipeline is essentially handling all of the stateful merging of user-land configuration, like mappings and whatnot, and then we’re doing transformation to turn that essentially into individual update rows in ClickHouse. That’s kind of the TL;DR. We are okay doing some of that stuff essentially duplicatively, in memory, as needed, for those real-time use cases. Like, let’s say we have our pixel script on the website, we capture a form fill, we identify it, whatever, and that’s going to hit some slash,

335 00:43:06.930 00:43:34.040 Victor Papyshev: like, POST to /submission or something, a /form-submission, whatever. That itself... I think the one component that doesn’t go away is that we ideally are checking whatever database for whether we’ve already enriched Justin and already resolved this entire priority list; we don’t need to waste time hitting Clearbit right now, right? But if Clearbit is missing, and it is in the data source priority, and we know that Clearbit is fast enough to

336 00:43:34.610 00:44:00.160 Victor Papyshev: serve this real-time use case, we’re gonna hit the Clearbit API anyway. If we have to do duplicative transformation, yeah, shove that through the pipeline, no problem. We can do the same thing in memory, in real time, applying essentially the same logic as in the pipeline, just right now, in this workflow execution service. That’s fine, we can live with that, in order to essentially produce that object on the spot, instead of relying on.

337 00:44:00.510 00:44:06.369 Uttam Kumaran: So most of your enrichments, are they, are most of them gonna support that?

338 00:44:08.240 00:44:09.180 Uttam Kumaran: Near real time.

339 00:44:09.180 00:44:10.400 Victor Papyshev: Like latency, wise.

340 00:44:10.570 00:44:15.210 Uttam Kumaran: Yeah, how many of them really are? I don’t know, that seems surprising.

341 00:44:15.830 00:44:30.738 Victor Papyshev: I mean, it’s what we’re doing now, pretty much. Like, if we don’t have Clearbit enrichment for somebody and they hit a workflow, we hit that now. I think our latency is usually like 150 ms for Clearbit, for example. So that’s within reasonable

342 00:44:31.130 00:44:41.929 Victor Papyshev: use cases today; it’s just the slower ones we would not tolerate. But pretty much, what are we doing in the pipeline? We’re taking that Clearbit enrichment payload, and we’re essentially flattening it into all the individual.

343 00:44:41.930 00:44:49.900 Uttam Kumaran: But all you really need to measure, then, is how long it takes a change to persist into Postgres. That’s the SLA.

344 00:44:50.060 00:44:53.260 Uttam Kumaran: So that’s what you just kind of measure, right like

345 00:44:54.900 00:44:57.800 Uttam Kumaran: And then that’s what you need to flag in the product which is like

346 00:44:58.430 00:45:06.359 Uttam Kumaran: it will take X amount of time before... do we already have some validation on when this ends up persisting in Postgres? I don’t know what that is.
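The measurement Uttam is proposing can be sketched minimally: stamp each change event when it is emitted and again when it lands in Postgres, then report the lag distribution; that lag becomes the SLA surfaced in product. The timestamps below are synthetic stand-ins for real instrumented events.

```python
import statistics

# Persistence-lag probe: each event carries emit and persist timestamps
# (seconds); the SLA is a percentile over the observed lags.
def persistence_lags(events):
    lags = [e["persisted_at"] - e["emitted_at"] for e in events]
    return {"p50": statistics.median(lags), "max": max(lags)}

events = [
    {"emitted_at": 0.00, "persisted_at": 0.40},
    {"emitted_at": 1.00, "persisted_at": 1.35},
    {"emitted_at": 2.00, "persisted_at": 3.10},
]
print(persistence_lags(events))  # p50 0.4 s, max roughly 1.1 s
```

In practice the emit timestamp would be attached to the change event at the source and the persist timestamp recorded by the Postgres writer, so the probe measures the whole pipeline, not one hop.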

347 00:45:06.930 00:45:11.829 Uttam Kumaran: I think that’s a great optimization challenge for somebody, but that’s the

348 00:45:12.820 00:45:23.496 Uttam Kumaran: that’s the SLA. And I don’t think that’s invalid. Like, in Salesforce, yes, they do formula fields, but it’s damn slow because of that.

349 00:45:24.010 00:45:29.170 Uttam Kumaran: And I do think that you have to persist. Some of the, and that’s the I mean.

350 00:45:31.230 00:45:32.570 Uttam Kumaran: I think that’s fine.

351 00:45:36.170 00:45:36.860 Victor Papyshev: Okay.

352 00:45:37.920 00:45:38.820 Uttam Kumaran: Like.

353 00:45:39.080 00:45:46.460 Uttam Kumaran: I don’t think that’s gonna be super long. Because ultimately, what you’re doing is, when you change something, an event gets sent,

354 00:45:46.790 00:45:51.790 Uttam Kumaran: and a record needs to get updated in ClickHouse, and a record needs to get updated in Postgres.

355 00:45:56.030 00:45:56.800 Victor Papyshev: Right?

356 00:45:58.180 00:46:02.110 Victor Papyshev: So, where does my head go with that?

357 00:46:02.490 00:46:18.459 Victor Papyshev: Yeah, I mean, I think that’s ultimately correct. And when we say a record gets updated in Postgres and a record gets updated in ClickHouse, the thought experiment here is that the row structure is essentially the same, right? Like, the schema is the same; it’s essentially a clone, functionally a clone of what we have in ClickHouse.

358 00:46:18.721 00:46:23.679 Uttam Kumaran: Yeah, but it doesn’t have to be a full clone. It’s just a clone of what you need for

359 00:46:24.080 00:46:26.978 Uttam Kumaran: the real-time use case, the real-time product use cases, right? So,

360 00:46:27.440 00:46:34.930 Uttam Kumaran: so in that sense, you just do a CDC the other way, and you’re applying... you’re doing an upsert

361 00:46:35.350 00:46:37.350 Uttam Kumaran: for whatever the change record is.

362 00:46:37.950 00:46:42.080 Uttam Kumaran: I don’t think that’s gonna be like super slow, by the way, but like that is the

363 00:46:43.230 00:46:44.810 Uttam Kumaran: that’s a dependency.

364 00:46:48.040 00:46:52.220 Justin Wong: Yeah, actually, that might be. That might be

365 00:46:54.500 00:46:57.830 Justin Wong: a way to come at this like, because

366 00:46:58.300 00:47:03.520 Justin Wong: because the materialized view ultimately still produces, like

367 00:47:03.820 00:47:10.280 Justin Wong: for each customer the object with the traits that they care about and their prioritization order. If

368 00:47:10.390 00:47:30.559 Justin Wong: the CDC coming out of the materialized view is ultimately what we sync into a Postgres table, that should cut down significantly on the amount of writes that we do. Because if, like, the actual useful object after the materialized view doesn’t change, then Postgres doesn’t change, and that saves us from,

369 00:47:30.930 00:47:34.469 Justin Wong: like, unnecessary writes, which I would argue is really, like,

370 00:47:34.750 00:47:39.660 Justin Wong: the biggest issue with trying to handle this in Postgres versus,

371 00:47:42.070 00:47:49.350 Justin Wong: versus in ClickHouse, is just, like, yes, it’s a large volume of writes. But it’s also, like, we,

372 00:47:49.810 00:47:57.549 Justin Wong: if we’re writing a bunch of things, or making a bunch of transactions, that don’t need to be made, because ultimately the data is not changing,

373 00:47:57.960 00:48:05.459 Justin Wong: then we’re kind of saving ourselves from overloading a Postgres database.
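Justin's write-deduplication idea can be sketched as fingerprinting the resolved object coming off the CDC stream and skipping the Postgres upsert when nothing actually changed. The dicts below stand in for the Postgres table and a fingerprint index; a real implementation might keep the fingerprint as a column on the row itself.

```python
import hashlib
import json

# Stable content hash of a resolved object (key order normalized).
def object_fingerprint(obj):
    return hashlib.sha256(json.dumps(obj, sort_keys=True).encode()).hexdigest()

# Upsert only when the resolved object differs from what was last written,
# so no-op CDC records never become Postgres transactions.
def upsert_if_changed(pg_table, fingerprints, key, obj):
    fp = object_fingerprint(obj)
    if fingerprints.get(key) == fp:
        return False  # resolved object unchanged: skip the write entirely
    pg_table[key] = obj
    fingerprints[key] = fp
    return True

table, fps = {}, {}
print(upsert_if_changed(table, fps, "justin@default.com", {"first_name": "Justin"}))  # True
print(upsert_if_changed(table, fps, "justin@default.com", {"first_name": "Justin"}))  # False
```

The second call is the interesting one: the materialized view may re-emit the row after a merge, but the Postgres side stays quiet because the resolved content is identical.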

374 00:48:05.460 00:48:08.630 Uttam Kumaran: Yeah, like, I mean, but you could also go first...

375 00:48:08.880 00:48:09.700 Uttam Kumaran: Who?

376 00:48:13.480 00:48:16.330 Uttam Kumaran: Yeah, I guess I don’t know if you could go first to Postgres.

377 00:48:17.870 00:48:28.819 Victor Papyshev: I mean, I have a ClickPipe already set up to take... like, I’m assuming that users are gonna be manipulating their priority orders and this stuff over in user land, which is, like, a Postgres setup. And I have,

378 00:48:28.820 00:48:35.429 Uttam Kumaran: Okay, then, yeah, that’s what gets updated first. That’s gonna be... I feel like, 100%.

379 00:48:35.940 00:48:38.480 Uttam Kumaran: That’s... I think you do that. And then,

380 00:48:39.020 00:48:43.549 Uttam Kumaran: then you’d use ClickPipes to get it back into ClickHouse.

381 00:48:44.370 00:48:50.660 Victor Papyshev: Yeah, I mean, ClickHouse acquired PeerDB, and then they just rolled it out as a ClickPipe, pretty much. So, yeah,

382 00:48:53.060 00:48:54.790 Victor Papyshev: okay. And.

383 00:48:56.220 00:49:00.464 Uttam Kumaran: And then in ClickHouse, there are these buffers, right, for the,

384 00:49:01.970 00:49:04.050 Uttam Kumaran: How often to insert records.

385 00:49:05.050 00:49:14.439 Victor Papyshev: Yeah, yeah. Per the docs, it’s either 100,000 rows, 20 MB, or every 5 seconds; that’s ClickPipes’ batching strategy,

386 00:49:14.840 00:49:16.420 Victor Papyshev: like, whichever comes first.

387 00:49:16.750 00:49:17.839 Uttam Kumaran: I see. Okay.

388 00:49:17.840 00:49:23.059 Victor Papyshev: But that’s off of a Kafka topic, so I don’t know how that would work with PeerDB; I’m not sure about the PeerDB batching.
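The batching strategy Victor quotes (flush at 100,000 rows, 20 MB, or 5 seconds, whichever threshold hits first) can be modeled as a small batcher. This is a toy version for reasoning about the thresholds, not ClickPipes itself, and the PeerDB path may batch differently, as noted above.

```python
import time

# Flush a batch when ANY of row-count, byte-size, or age limits is reached.
class Batcher:
    def __init__(self, max_rows=100_000, max_bytes=20 * 1024 * 1024, max_age_s=5.0):
        self.max_rows, self.max_bytes, self.max_age_s = max_rows, max_bytes, max_age_s
        self.rows, self.size, self.opened_at = [], 0, None

    def add(self, row):
        # Returns the flushed batch when a threshold trips, else None.
        if self.opened_at is None:
            self.opened_at = time.monotonic()
        self.rows.append(row)
        self.size += len(row)
        return self.flush() if self._should_flush() else None

    def _should_flush(self):
        return (len(self.rows) >= self.max_rows
                or self.size >= self.max_bytes
                or time.monotonic() - self.opened_at >= self.max_age_s)

    def flush(self):
        batch, self.rows, self.size, self.opened_at = self.rows, [], 0, None
        return batch

b = Batcher(max_rows=3)
b.add(b"r1")
b.add(b"r2")
print(len(b.add(b"r3")))  # third row trips the row threshold, batch of 3
```

The "whichever comes first" semantics fall out of the `or` in `_should_flush`; a production version would also need a timer to force age-based flushes on an idle stream.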

389 00:49:27.390 00:49:29.420 Victor Papyshev: I mean, that’s yeah.

390 00:49:31.070 00:49:53.960 Victor Papyshev: Yeah, I might need to noodle on it a little bit, because I’m thinking... so if we’re doing this cloning thing, essentially this duplicative data store thing: one is optimized for primary-key lookups, and one is optimized for analytical, you know, intense queries and view generation and stuff. View as, like, a product term, like a saved view in Salesforce; a set of filters, essentially, that you want to persist.

391 00:49:54.820 00:49:56.928 Victor Papyshev: So if those 2 use cases,

392 00:49:57.960 00:50:09.100 Victor Papyshev: we use Postgres for the more real-time-esque use cases. So, what’s the use case? Like, we’re running a real-time workflow execution. Let’s just stick to forms, form submissions, actually the only real one that exists.

393 00:50:09.100 00:50:09.810 Uttam Kumaran: Yeah, yeah.

394 00:50:09.810 00:50:15.639 Victor Papyshev: Stick to that one. That workflow needs to, as quickly as possible, go find out, like,

395 00:50:16.170 00:50:26.979 Victor Papyshev: I mean, essentially, what data do we have from the real-time-compatible data sources, which is, like, a couple of enrichment providers and, I mean, ideally, CRM.

396 00:50:27.410 00:50:28.670 Victor Papyshev: that’s the thing.

397 00:50:28.670 00:50:31.020 Uttam Kumaran: That has to get that. Yeah, exactly.

398 00:50:31.180 00:50:31.560 Victor Papyshev: I wish.

399 00:50:31.560 00:50:37.979 Uttam Kumaran: CRM. Or again, I don’t know if you guys are gonna offer, like, calculated fields and stuff like that.

400 00:50:41.220 00:50:43.940 Uttam Kumaran: yeah, yeah, I would like to. But

401 00:50:43.940 00:50:46.570 Uttam Kumaran: it all has to get back into Postgres.

402 00:50:46.730 00:50:50.429 Victor Papyshev: Right exactly. I mean, the thing is like, I mean, speaking of, like the.

403 00:50:50.430 00:50:59.029 Uttam Kumaran: Well, this is the complication. So, typically, when you have a ClickHouse-type analytical warehouse set up, all it is is a

404 00:50:59.200 00:51:03.060 Uttam Kumaran: read-only clone of Postgres,

405 00:51:03.530 00:51:13.760 Uttam Kumaran: and it’s in, like, an OLAP database that you can run huge reads from. But there’s not this reverse flow; there’s not this logic dependent on it

406 00:51:14.870 00:51:18.009 Uttam Kumaran: that has to go back into Postgres, typically.

407 00:51:18.750 00:51:19.290 Victor Papyshev: Edit it.

408 00:51:19.460 00:51:21.189 Uttam Kumaran: Yeah, you’re not like.

409 00:51:21.190 00:51:26.869 Victor Papyshev: What we want is for the materialized view to be a Postgres table. That’s our dream state. Our ask is, like, can we.

410 00:51:26.870 00:51:27.550 Uttam Kumaran: Yes.

411 00:51:27.550 00:51:30.160 Victor Papyshev: How do we do that? And it doesn’t seem to exist.

412 00:51:31.790 00:51:34.714 Victor Papyshev: But the thing is like, Yeah, well, actually, go go ahead.

413 00:51:35.610 00:51:44.480 Uttam Kumaran: No, I mean, you could, you could, I, yeah, I think you could. You could bring the changes into Postgres. I, I just think, in order to do this real-time routing, you’re gonna

414 00:51:44.960 00:51:45.939 Uttam Kumaran: have to

415 00:51:46.070 00:51:52.039 Uttam Kumaran: have to do it that way. I think I would like to see what the use cases are for bringing

416 00:51:53.700 00:51:59.599 Uttam Kumaran: otherwise in your graph, right? Right? The only way we had postgres was in the user land part

417 00:52:00.010 00:52:02.350 Uttam Kumaran: didn’t have anything coming back from ClickHouse.

418 00:52:04.420 00:52:24.510 Victor Papyshev: That’s right. I mean, yeah, I don’t have anything coming out of ClickHouse, because ClickHouse seems to be, you know, they focus on ingestion via ClickPipes. I mean, I’m sure that we can set up our own pipes. I mean, I’m sure, like, I could probably set up a, for example, I could set up a Redpanda Connect pipeline, probably, to read from ClickHouse, and, like, publish that to a topic and then push it along to some.

419 00:52:24.510 00:52:29.409 Uttam Kumaran: I do think for that one for the one routing use case. You just have to pull from there and then from other stuff. Yeah.

420 00:52:31.250 00:52:34.624 Victor Papyshev: Okay, I might like pen and paper some of the stuff or go whiteboard it.

421 00:52:34.970 00:52:38.079 Victor Papyshev: just to this gives me this gives me some stuff to think about.

422 00:52:40.910 00:52:45.489 Victor Papyshev: Interesting, I mean, how’s this landing with you, Justin? I know we’re like right, just a little over time. So.

423 00:52:45.490 00:52:46.030 Uttam Kumaran: Yeah.

424 00:52:46.030 00:52:46.680 Victor Papyshev: Complain.

425 00:52:46.680 00:52:48.010 Uttam Kumaran: Yeah, I have some time case.

426 00:52:48.010 00:52:49.379 Victor Papyshev: See? You, yeah.

427 00:52:49.380 00:52:51.009 Uttam Kumaran: That’s the next 30 min. Yeah.

428 00:52:51.010 00:52:52.442 Victor Papyshev: Okay, cool. Appreciate it.

429 00:52:53.450 00:52:57.970 Victor Papyshev: Yeah. My calendar is clear. But yeah, Justin, how’s this kinda landing with you so far.

430 00:52:58.430 00:53:01.650 Justin Wong: Yeah, I mean, this is kind of all

431 00:53:04.650 00:53:11.330 Justin Wong: kind of accelerating the flywheel I’ve had in my head about like we have a bit of a chicken and egg problem where

432 00:53:11.470 00:53:19.079 Justin Wong: you know, the data comes in a couple of different ways. We have to use it a couple of different ways, but depending on where you’re jumping in, like.

433 00:53:20.220 00:53:22.339 Justin Wong: yeah, I don’t know. I’m.

434 00:53:22.920 00:53:25.710 Uttam Kumaran: To like. It’s gonna make sense. If you if you guys are, gonna do like

435 00:53:26.245 00:53:34.490 Uttam Kumaran: calculated fields and then use that for routing, there is going to be an SLA, because that calculation you’re not gonna want to do in Postgres

436 00:53:34.950 00:53:38.050 Uttam Kumaran: like, if it is like, I want to run a sum.

437 00:53:38.260 00:53:41.629 Uttam Kumaran: and then I want to then use that for the routing workflow.

438 00:53:42.860 00:53:44.649 Uttam Kumaran: It has to happen somewhere else.

439 00:53:46.750 00:53:49.200 Victor Papyshev: Somewhere else, meaning like.

440 00:53:49.690 00:53:58.899 Uttam Kumaran: It perhaps happens in, like, an analytics environment where, for example, you’re like, I wanna, I wanna route this based on some calculation that is, like, a sum of a value.

441 00:54:00.370 00:54:00.960 Justin Wong: Yeah.

442 00:54:00.960 00:54:02.840 Victor Papyshev: Example, like routing based on that. I mean.

443 00:54:02.840 00:54:03.800 Uttam Kumaran: Yeah, I’m trying to think.

444 00:54:03.800 00:54:05.410 Victor Papyshev: Yeah, I mean, I can give you an example. I mean.

445 00:54:05.410 00:54:07.049 Uttam Kumaran: Oh, yeah, yeah, if you have one. Yeah.

446 00:54:07.050 00:54:22.890 Victor Papyshev: Yeah, we haven’t gotten into the user object so much yet. I mean, I think that you, the user object is like we focus on like person and company pretty much. And then we just now touch on like deal. I think that again, like, user slash person. All right. Yeah, let’s let’s call user, like, as in the default user, like a sales rep, for example.

447 00:54:23.650 00:54:31.219 Victor Papyshev: Again, we’ll see how we end up literally treating that like. But in this context. That’s interesting to us, because you can do things like, you know.

448 00:54:32.933 00:54:33.940 Victor Papyshev: quota attainment

449 00:54:34.320 00:54:52.850 Victor Papyshev: rate this quarter right? Like using that as a variable, and that that is fed by probably what I mean. There’s a chance that we get away with just pulling that straight from salesforce, because it’s probably something you’re calculating in salesforce. Anyway, we can just straight up, pull the whatever like the integer is, or the percentage of the flow, or whatever whatever it might be

450 00:54:53.577 00:54:58.982 Victor Papyshev: but let’s just let’s assume, like the the most complex case where we are essentially calculating.

451 00:54:59.740 00:55:12.179 Victor Papyshev: something like a quota attainment percentage, this, this quarter, like, so far this quarter, something like that, you know. How would we do that? You can imagine, like, the formula for that, like: get all opportunities.

452 00:55:12.360 00:55:19.709 Victor Papyshev: get all opportunities, stage Closed Won, add up their amount, and divide by the, by the rep’s quota, or something.
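
The formula described above, sum the amounts of Closed Won opportunities and divide by the rep's quota, comes out to something like this minimal sketch (field names like `stage` and `amount` and the `"Closed Won"` label are placeholders, not the actual Salesforce schema):

```python
# Minimal sketch of the quota-attainment calculation discussed above.
# Field names and stage labels are illustrative placeholders.

def quota_attainment(opportunities, quota):
    """Sum Closed Won amounts and divide by the rep's quota."""
    if quota <= 0:
        raise ValueError("quota must be positive")
    closed_won = sum(
        opp["amount"] for opp in opportunities if opp["stage"] == "Closed Won"
    )
    return closed_won / quota

opps = [
    {"stage": "Closed Won", "amount": 50_000},
    {"stage": "Closed Won", "amount": 25_000},
    {"stage": "Closed Lost", "amount": 40_000},
    {"stage": "Open", "amount": 10_000},
]
print(quota_attainment(opps, quota=100_000))  # 0.75
```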

453 00:55:19.710 00:55:20.840 Uttam Kumaran: Yeah, yeah.

454 00:55:21.142 00:55:23.860 Victor Papyshev: Something like that, you know. That’s like an example.

455 00:55:23.860 00:55:25.959 Uttam Kumaran: And then that’s and then that’s for lead routing.

456 00:55:26.240 00:55:27.660 Uttam Kumaran: Yeah, exactly.

457 00:55:27.660 00:55:32.660 Uttam Kumaran: I guess my question is like, does that need to happen in real time like. Then I would push back and say no.

458 00:55:32.660 00:55:33.780 Victor Papyshev: No, no, definitely, not.

459 00:55:33.780 00:55:37.360 Uttam Kumaran: Then you can run. Then you can run that in. Then you can run that anywhere.

460 00:55:37.660 00:56:07.619 Victor Papyshev: Yeah, that’s, yeah, that’s fine, I think. Yeah, the SLA. It’d be good to start adding some, pretty much, like, adding some certainty or definition of the SLAs here. Because I think, like, where I stopped thinking about it, I mean, I knew it wasn’t, like, really purpose-built for this anyway. But I guess once I got this far, it’s probably, what, Friday last week or something, our spike that we’ve been building out kind of got to enough certainty where I was looking at it, like, I read the docs on ClickPipes finally, and it’s, like, pretty much I should operate under the assumption that ClickPipes is at least 5 seconds behind reality.

461 00:56:07.620 00:56:08.410 Uttam Kumaran: Yes.

462 00:56:08.410 00:56:16.809 Victor Papyshev: By nature. And then I looked into it, like, I could set up a Redpanda Connect pipeline with a more granular batching strategy, right? Like, if I pull up the docs for that real quick.

463 00:56:16.810 00:56:17.450 Uttam Kumaran: Yes.

464 00:56:17.450 00:56:20.019 Victor Papyshev: Interesting, like, I think it’s called input SQL.

465 00:56:21.200 00:56:26.529 Victor Papyshev: SQL input. I think it’s called a SQL insert.

466 00:56:27.197 00:56:45.050 Victor Papyshev: I’ll share quickly, not to get, not to stay too in the weeds, but it’s like, you know, batching is, like, a much more robust sort of configuration you can set. And they have community-level support for ClickHouse as an engine, or as a driver. You can see, like, the drivers on this Redpanda

467 00:56:45.985 00:56:57.019 Victor Papyshev: output list: MySQL, Postgres, ClickHouse. They support a few of these officially, like MySQL, Postgres. There’s a list somewhere else on, like, which ones are officially supported versus which ones are community-supported.

468 00:56:57.630 00:56:59.870 Victor Papyshev: But anyway, I guess we could

469 00:57:00.370 00:57:09.270 Victor Papyshev: set this up to go from Redpanda to ClickHouse, essentially roll our own ClickPipe, to write to ClickHouse, to do these inserts.
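
A hand-rolled Redpanda-to-ClickHouse pipe with a "more granular batching strategy" would boil down to a consumer that flushes on a row-count or time threshold. A rough sketch of just that batching logic, with an in-memory list standing in for the ClickHouse INSERT and made-up thresholds:

```python
# Sketch of the batching logic for a hand-rolled Redpanda -> ClickHouse pipe.
# The "sink" is an in-memory list standing in for a batched INSERT; the
# flush thresholds are illustrative, not tuned values.

import time

class BatchingSink:
    def __init__(self, flush_rows=100, flush_seconds=1.0, clock=time.monotonic):
        self.flush_rows = flush_rows
        self.flush_seconds = flush_seconds
        self.clock = clock
        self.buffer = []
        self.batches = []  # each entry represents one simulated INSERT
        self._last_flush = clock()

    def write(self, row):
        self.buffer.append(row)
        now = self.clock()
        if len(self.buffer) >= self.flush_rows or now - self._last_flush >= self.flush_seconds:
            self.flush()

    def flush(self):
        if self.buffer:
            self.batches.append(list(self.buffer))  # one INSERT per batch
            self.buffer.clear()
        self._last_flush = self.clock()

sink = BatchingSink(flush_rows=3, flush_seconds=60)
for i in range(7):
    sink.write({"id": i})
sink.flush()  # drain the tail
print([len(b) for b in sink.batches])  # [3, 3, 1]
```

The trade-off the batching knobs express: smaller batches tighten the freshness SLA, larger ones make the inserts cheaper on the ClickHouse side.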

470 00:57:09.530 00:57:11.620 Victor Papyshev: we could. But even then it’s like.

471 00:57:11.940 00:57:26.990 Victor Papyshev: I don’t know that I, if we want to tighten up the SLA, this is an option. Like, we don’t have to rely on ClickPipes. And also, ClickPipes is the cloud SaaS offering only, so that’s, that’s actually, I think, our only outstanding, like, vendor lock-in into the cloud, the managed, the managed cloud

472 00:57:27.550 00:57:37.134 Victor Papyshev: product, like, from, from ClickHouse Cloud. I’m not worried about that, I’m probably gonna do that anyway. But it’s good that we have a way out, so to speak, where it’s like, if we need to roll our own

473 00:57:37.380 00:57:38.170 Uttam Kumaran: Yes, yes.

474 00:57:38.170 00:57:41.089 Victor Papyshev: like, egress from Redpanda to ClickHouse, we can.

475 00:57:41.567 00:58:07.470 Victor Papyshev: but anyway, it’s kind of more of a side, side note. Either way, like, when I got to this point, like, I pretty much stopped counting on ClickHouse being up to date in a real-time sense. I was like, okay, it seems out of pattern to really expect ClickHouse, to count on it being, like, real-time in the, in the context of: I need current data for, like, a workflow execution right this second. So yeah, I think now it’s like the.

476 00:58:07.470 00:58:13.370 Uttam Kumaran: I, I really think you just have to have an SLA by, like, execution type.

477 00:58:14.070 00:58:19.670 Uttam Kumaran: And then you you do have a purpose built database because these trade offs you’re not gonna find

478 00:58:19.850 00:58:28.449 Uttam Kumaran: something that’s gonna do everything. Like, you’re, you’re gonna have round numbers that you need to display on a dashboard that you should pull from ClickHouse.

479 00:58:28.600 00:58:35.739 Uttam Kumaran: You’re also gonna have things that like you can’t go query you. You shouldn’t build the back end on

480 00:58:37.240 00:58:41.160 Uttam Kumaran: on ClickHouse, just because these tables will become very, very big

481 00:58:41.410 00:58:46.339 Uttam Kumaran: to go do those lookups. But, like, what, that subset may be a lot smaller.

482 00:58:48.042 00:58:50.370 Uttam Kumaran: There’s gonna be stuff that you don’t.

483 00:58:52.870 00:58:59.019 Uttam Kumaran: Yeah. But but again, it’s like whether those events come in, and then they persist in both, or they go into

484 00:58:59.580 00:59:06.710 Uttam Kumaran: one first. But, but here’s an example: if you’re like, if you need to use the round number for a workflow,

485 00:59:06.990 00:59:12.160 Uttam Kumaran: the round number calculation is going to be better suited in ClickHouse.

486 00:59:12.490 00:59:21.579 Uttam Kumaran: Like, to run a select sum of blah blah to get a whole number is much better in ClickHouse. But then you have to. That is gonna take.

487 00:59:22.190 00:59:25.470 Uttam Kumaran: then you have to persist that right like.

488 00:59:25.470 00:59:26.110 Victor Papyshev: Yeah.

489 00:59:26.110 00:59:29.399 Uttam Kumaran: Whatever the whatever the calculation is, there has to be like

490 00:59:29.550 00:59:33.559 Uttam Kumaran: a persistence. You can’t calculate it on the fly in Postgres.
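
The pattern being described, run the heavy aggregate where it's cheap, persist the scalar result, and have the routing workflow read only the persisted value, can be sketched like this (both "databases" are dicts for illustration; table and field names are invented):

```python
# Sketch of the compute-then-persist pattern: the big scan runs on the
# OLAP side, the workflow only ever reads a persisted scalar.
# Both stores are plain dicts standing in for ClickHouse and Postgres.

# Stand-in for a large ClickHouse event table
olap_rows = [
    {"rep": "alice", "amount": 100},
    {"rep": "alice", "amount": 250},
    {"rep": "bob", "amount": 400},
]

# Stand-in for a small Postgres rollup table keyed by rep
postgres_rollups = {}

def recompute_and_persist(rep):
    # "SELECT sum(amount) WHERE rep = ?" on the OLAP side
    total = sum(r["amount"] for r in olap_rows if r["rep"] == rep)
    # persist the scalar so the workflow never runs the big scan itself
    postgres_rollups[rep] = total
    return total

def route(rep, threshold):
    # the routing workflow reads only the persisted value
    return "fast-lane" if postgres_rollups.get(rep, 0) >= threshold else "default"

recompute_and_persist("alice")
recompute_and_persist("bob")
print(route("alice", threshold=300))  # fast-lane
print(route("bob", threshold=500))    # default
```

The freshness of `postgres_rollups` is exactly the SLA being discussed: it is only as current as the last recompute.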

491 00:59:34.260 00:59:35.080 Justin Wong: Yeah.

492 00:59:35.080 00:59:37.880 Victor Papyshev: I mean, that’s the chicken and egg thing. Ultimately.

493 00:59:37.880 00:59:41.839 Uttam Kumaran: Yeah. Yeah. But it also depends on like what you guys want to offer product. Wise like.

494 00:59:42.760 00:59:45.750 Uttam Kumaran: maybe it’s maybe it’s not everything under the sun. For now.

495 00:59:46.590 00:59:53.689 Justin Wong: Yeah, I don’t know, Vic, this is kind of making me think

496 00:59:55.390 01:00:00.530 Justin Wong: this is kind of making me think we we might need to explore that identity.

497 01:00:00.690 01:00:08.190 Justin Wong: resolution, or, like, trait resolution, if you want to call it that, service that I was kind of talking about yesterday. Like, we might need

498 01:00:08.470 01:00:22.439 Justin Wong: the source data to be in very quick read databases. Postgres might not be the answer. I’m kind of leaning dynamo right now, just because I’ve seen us use that that way in the past, but, like we might need to do that, and then have

499 01:00:22.630 01:00:25.490 Justin Wong: a trait resolution

500 01:00:25.880 01:00:44.750 Justin Wong: service that can take an ID and a customer’s priority mappings, pull the data sources it needs from Dynamo, and then do the calculation on the spot. And we can persist those snapshots down the event pipeline, and maybe those end up in ClickHouse, because we will want to use it that way later, for sure. But
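
The resolution service sketched above, take an ID plus a customer's per-field source priorities, pull each source's record, and resolve field by field on the spot, could look something like this (source names, fields, and the dict standing in for Dynamo are all invented for illustration):

```python
# Sketch of the trait-resolution idea: per-field source priorities applied
# at read time. The store is a dict standing in for a fast key-value
# database like Dynamo; sources and fields are illustrative.

store = {
    ("salesforce", "person-1"): {"title": "VP Sales", "email": None},
    ("clearbit",   "person-1"): {"title": "Sales Lead", "email": "a@x.com"},
}

def resolve(person_id, priorities):
    """priorities: field -> ordered list of sources, most trusted first."""
    resolved = {}
    for field, sources in priorities.items():
        for source in sources:
            record = store.get((source, person_id), {})
            value = record.get(field)
            if value is not None:
                resolved[field] = value  # first source with a value wins
                break
    return resolved

prio = {"title": ["salesforce", "clearbit"], "email": ["salesforce", "clearbit"]}
print(resolve("person-1", prio))  # {'title': 'VP Sales', 'email': 'a@x.com'}
```

The resolved snapshot is what would then be persisted down the event pipeline.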

501 01:00:45.710 01:00:51.240 Justin Wong: yeah, I I don’t know. I don’t really see a way around this necessarily.

502 01:00:51.970 01:01:07.249 Justin Wong: If we want to have better support for the real-time use case. Because I think the part that I’m getting hung up on, and is honestly giving me a little bit of fear going forward, is: if we rely on everything in ClickHouse, and we have to have some sort of, like, lag

503 01:01:07.530 01:01:13.199 Justin Wong: SLA for, you know, data accuracy coming out of ClickHouse.

504 01:01:13.300 01:01:14.820 Justin Wong: We already get

505 01:01:15.050 01:01:22.140 Justin Wong: just way. Too many questions from customers about like. Why did this compute this way at the time when data should look like this?

506 01:01:22.140 01:01:23.320 Uttam Kumaran: Really, okay.

507 01:01:23.670 01:01:42.629 Justin Wong: Yeah, I mean, it’s, it’s not so much, like, related to this specific use case. Like, where it happens a lot is in, in workflow execution right now, and that’s more to do with how we, like, track workflow state and things like that. But it’s very analogous to the situation we’re talking about introducing here. If there’s an SLA on

508 01:01:43.050 01:01:58.677 Justin Wong: how long it might take for data to update and reflect in ClickHouse, and we use that as our primary source of data, we’re gonna get all sorts of these things where it’s like, well, you know, this record got updated in Salesforce to say that deal stage was

509 01:01:59.610 01:02:14.660 Justin Wong: you know, Closed Lost at 8:53:05, and the workflow ran at 8:53:06, but it calculated it as if it was still open, right? And, like,
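
The auditability concern here reduces to a simple check: if ingestion can lag by up to some SLA, any read that lands within that window of the source-system update may have seen stale data. A minimal sketch (the 5-second SLA mirrors the ClickPipes assumption mentioned earlier; the function name is invented):

```python
# Sketch of a freshness/audit check: was this read inside the window where
# the update may not yet have been ingested? The 5s SLA is illustrative.

from datetime import datetime, timedelta

def read_in_uncertainty_window(updated_at, read_at, lag_sla=timedelta(seconds=5)):
    """True if the read happened before the update was guaranteed ingested."""
    return updated_at <= read_at < updated_at + lag_sla

updated = datetime(2025, 7, 18, 8, 53, 5)  # deal flipped to Closed Lost
ran     = datetime(2025, 7, 18, 8, 53, 6)  # workflow ran one second later
print(read_in_uncertainty_window(updated, ran))  # True: may have read stale data
```

A flag like this on each workflow execution is one way to at least explain, after the fact, why a run computed on an old value.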

510 01:02:15.350 01:02:29.809 Justin Wong: I, I already know that that’s not only going to introduce a lot of confusion, and our customers really don’t like non deterministic behavior. But then our ability to even like audit that and show that that’s the reason why

511 01:02:30.190 01:02:36.899 Justin Wong: is because of the like event process, not just like our event processing lag from multiple.

512 01:02:36.900 01:02:37.970 Uttam Kumaran: Yeah, yeah, yeah.

513 01:02:37.970 01:02:40.610 Justin Wong: But then also ClickPipes on top of it, like.

514 01:02:41.280 01:02:44.430 Justin Wong: it’s gonna be really hard for us to.

515 01:02:46.550 01:02:48.969 Justin Wong: It’s gonna be really hard for us to.

516 01:02:48.970 01:02:50.009 Uttam Kumaran: So in that, in that, in that.

517 01:02:50.010 01:02:50.460 Justin Wong: Example.

518 01:02:50.460 01:02:55.710 Uttam Kumaran: Like what triggers the workflow in that salesforce example like, if you can.

519 01:02:57.960 01:03:03.616 Victor Papyshev: This could be anything like something that happens to need to read the status of

520 01:03:04.260 01:03:17.410 Victor Papyshev: of an opportunity I don’t know just like I mean, it could be that it’s a real time. Use case, and that’s when it hits or or just have. Maybe it’s a batch workflow execution that happens to run that workflow at that time, and it just gets the wrong value because it hasn’t persisted.

521 01:03:17.955 01:03:18.749 Victor Papyshev: I guess, to

522 01:03:19.990 01:03:33.460 Victor Papyshev: either Postgres or ClickHouse, whatever we’re reading. If it hasn’t persisted yet, like, due to SLAs, then it’s gonna be, like, just the, like, the incorrect outcome as far as the customer is concerned. Now, what is the likelihood of that happening? Like, I guess

523 01:03:33.660 01:03:42.999 Victor Papyshev: this is where we talk like is seconds fine, and we tell users it can take up to whatever seconds for changes to, you know.

524 01:03:43.170 01:03:47.730 Uttam Kumaran: Because how are you gonna get it from salesforce right like, how long is it taking to get out of

525 01:03:48.120 01:03:49.560 Uttam Kumaran: from a whatever.

526 01:03:49.960 01:04:01.680 Victor Papyshev: I mean, there are a few variables, for that example, like how long? Because, like right now, we’re triggering workflows off of salesforce outbound messages being emitted from flows that we’re essentially programmatically publishing.

527 01:04:01.810 01:04:04.730 Uttam Kumaran: But are these? Are you hitting the Api to get those or those web hooks.

528 01:04:04.990 01:04:20.019 Victor Papyshev: Those are webhooks. The outbound messages are emitted based on a flow being triggered conditionally in Salesforce. And we just programmatically publish those, and, like, they hit an endpoint. Those outbound messages are delivered to an endpoint on our end, a serverless endpoint.

529 01:04:20.984 01:04:24.517 Victor Papyshev: So because like we can. Trigger flows off of

530 01:04:25.638 01:04:43.519 Victor Papyshev: object was created, or object was updated, or object was created or updated. Like, those are, like, radio button options in the UI, and then you can also write, you write Salesforce formulas for that. So, like, what we do is, like, we essentially white-label a Salesforce formula builder, using our same UI patterns as other.

531 01:04:43.520 01:04:43.950 Uttam Kumaran: Yeah, yeah.

532 01:04:43.950 01:04:52.750 Victor Papyshev: And, or conditions. And that’s what we publish, and that’s how we get messages out of folks’ Salesforce instances. I think there’s also, like, I think there is, like, Salesforce CDC of some kind.

533 01:04:52.750 01:04:53.660 Uttam Kumaran: Yeah, yeah, yeah.

534 01:04:53.660 01:04:55.164 Victor Papyshev: Also subscribe to. But

535 01:04:55.910 01:05:11.020 Victor Papyshev: anyway, I guess whatever we end up using, I think, first of all, we’re subject to whatever Salesforce’s, like, internal behaviors are there. I mean, if we, if the update happened at 8:53:05, we might, I guess we don’t really know. We could try to dig through some docs, some ancient docs, or something.

536 01:05:11.020 01:05:12.040 Uttam Kumaran: Yeah, I.

537 01:05:12.670 01:05:24.610 Uttam Kumaran: Some clients, like, for example, we have clients who are on, like, NetSuite, and the NetSuite API, like, SLAs are too slow. So then they upgrade their NetSuite, and then we end up hooking into their old app. Like, NetSuite will give us

538 01:05:24.780 01:05:31.669 Uttam Kumaran: literally the access to their the customer’s data store in their dB, then I can get the events like

539 01:05:31.930 01:05:35.040 Uttam Kumaran: way quicker. But that’s like a super enterprise

540 01:05:35.740 01:05:39.430 Uttam Kumaran: like cause they’re not gonna do that with every client otherwise, like, go through the Api. But

541 01:05:42.640 01:05:43.770 Uttam Kumaran: yeah, that’s

542 01:05:49.200 01:05:50.109 Uttam Kumaran: yeah. It’s.

543 01:05:52.940 01:06:05.789 Victor Papyshev: Sleep on it. But other things, like, if you expect data freshness, then it comes down to characteristics of the third-party systems, characteristics of our.

544 01:06:05.790 01:06:06.290 Uttam Kumaran: Yes.

545 01:06:06.290 01:06:12.840 Victor Papyshev: Time and ingestion, batching and whatnot, new variables, I guess. Like.

546 01:06:14.050 01:06:21.570 Victor Papyshev: where am I going? This, I mean, I guess, like, let’s maybe I mean, let’s say that there’s some theory we come up with, or find some theoretical solution that allows us

547 01:06:21.890 01:06:34.230 Victor Papyshev: to take materialized views from ClickHouse and essentially clone them, and I guess it’s almost like PeerDB, but in reverse, I guess, like, out of ClickHouse. If, if we, if something like that theoretically exists.

548 01:06:34.711 01:06:45.550 Victor Papyshev: you know, does that solve our problem? I guess again, depends on the sla’s of that same. Because, like, you know, to just example, let’s let’s also theoretically pretend that.

549 01:06:46.129 01:07:01.769 Victor Papyshev: You know, the Salesforce CRM updates are emitted instantly, and, like, and then they’re consumed by ClickHouse instantly, like, ingested. That’s great. That’s, like, I guess, then, the question is: how often are we re-materializing that view, and how long does it take to emit that

550 01:07:01.920 01:07:03.530 Victor Papyshev: either those changes or.

551 01:07:05.060 01:07:12.980 Justin Wong: Well, actually, I. So I was looking into this today with ClickHouse. The materialized views are not necessarily

552 01:07:14.010 01:07:38.379 Justin Wong: calculated table-wide. Like, the view is done at write time. So as we’re writing a row into the main table, for whatever data that affects, then it’s almost like fan-out: it, it then also is writing to a materialized view at the same time. So, theoretically, materialized views are always up to date with whatever the

553 01:07:39.120 01:07:45.390 Justin Wong: most recent data is in the table itself, so we don’t necessarily have a latency

554 01:07:45.950 01:07:58.060 Justin Wong: on that. I think there might be a latency in the MergeTree engine. I can’t remember, I, I know that’s come up before. But as far as the materialized view, like, it should always be reflective of what’s in the, the actual table.
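
The behavior described here, ClickHouse materialized views populated at write time, with each insert into the source table also fanning out into the view's aggregate, can be shown with a pure-Python analogy (this is an analogy for the write-time fan-out, not ClickHouse internals; in real ClickHouse the view's SELECT runs over each inserted block):

```python
# Analogy for insert-time materialized-view fan-out: every write to the
# "source table" also updates the "view" aggregate, so the view is current
# immediately after the insert, with no separate refresh step.

from collections import defaultdict

events = []                      # stand-in for the source table
sums_by_rep = defaultdict(int)   # stand-in for the materialized view

def insert(row):
    events.append(row)                         # write to the main table...
    sums_by_rep[row["rep"]] += row["amount"]   # ...and fan out to the "view"

insert({"rep": "alice", "amount": 100})
insert({"rep": "alice", "amount": 250})
print(sums_by_rep["alice"])  # 350, up to date immediately after the insert
```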

555 01:07:59.990 01:08:05.420 Uttam Kumaran: If your, if your calculation is dependent on just values in the one row, then I don’t think it’s

556 01:08:06.680 01:08:08.700 Uttam Kumaran: I don’t think you’re gonna have a problem.

557 01:08:09.410 01:08:12.419 Uttam Kumaran: It’s if your calculation depends on like.

558 01:08:14.170 01:08:21.590 Justin Wong: Oh, it’s multiple rows the way we’re planning to store right now. It’s it’s many.

559 01:08:21.590 01:08:22.250 Uttam Kumaran: Okay.

560 01:08:22.859 01:08:28.259 Justin Wong: Rows. And I mean, there’s also there’s also a possibility that we

561 01:08:29.119 01:08:36.829 Justin Wong: haven’t spent enough time thinking about this, and maybe we can do these as singular rows, and let a different merge tree

562 01:08:38.019 01:08:42.789 Justin Wong: do more of the work for us in the materialized view. But I don’t know that that

563 01:08:44.449 01:08:47.739 Justin Wong: necessarily solve the problem we’re that we’re

564 01:08:49.139 01:08:52.019 Justin Wong: primarily focused and trying to solve at the moment.

565 01:08:52.830 01:08:59.410 Victor Papyshev: When we say calculating, we mean like this, like taking object, update rows and turning them into snapshots right?

566 01:08:59.560 01:09:00.409 Victor Papyshev: By the way.

567 01:09:04.750 01:09:06.490 Justin Wong: Yes.

568 01:09:06.490 01:09:07.340 Victor Papyshev: But I mean.

569 01:09:08.200 01:09:10.000 Justin Wong: Yeah, yeah, yeah, yeah, yeah, yeah.

570 01:09:10.790 01:09:13.669 Victor Papyshev: So actually a merge tree wouldn’t.

571 01:09:14.680 01:09:21.179 Justin Wong: Yeah, I, okay, I’m sorry. MergeTree wouldn’t help us here, because we would then effectively need a, a person

572 01:09:22.229 01:09:28.630 Justin Wong: data table first, where all of these are actually individual rows for a person, and not just, like.

573 01:09:30.760 01:09:35.029 Justin Wong: yeah, update, yeah, for for specific services or sources.

574 01:09:39.050 01:09:45.859 Victor Papyshev: I mean, I mean, we could make a we could make a table like that

575 01:09:46.270 01:09:53.039 Victor Papyshev: where I mean, because we’re we’re in control of what our standard fields are, and I guess maybe we run. We run into trouble with the custom stuff.

576 01:09:53.200 01:09:57.400 Victor Papyshev: Not sure if like. I guess what I don’t know what merge directions would be. It may be

577 01:09:57.560 01:10:01.519 Victor Papyshev: AggregatingMergeTree, maybe. I’m not familiar yet with, like.

578 01:10:01.927 01:10:03.149 Justin Wong: Well, there’s like

579 01:10:03.410 01:10:09.720 Justin Wong: aggregating and replacing, and I think you can even combine the 2. Replacing is kind of the one

580 01:10:10.040 01:10:12.390 Justin Wong: that I’m

581 01:10:13.330 01:10:25.820 Justin Wong: thinking we would use for something like this. I, we might even want to use it for something here. I, I know that, like, for one of the use cases you talked about yesterday, we probably wouldn’t want to use replacing. Because we.

582 01:10:26.060 01:10:34.420 Justin Wong: I think we want to have the ability to show updates over time, even from specific sources, for.

583 01:10:34.420 01:10:35.130 Victor Papyshev: Yeah.

584 01:10:35.440 01:10:38.189 Justin Wong: You know a record versus.

585 01:10:38.190 01:10:42.230 Victor Papyshev: Computation. Right? If like, if priority changes right, like.

586 01:10:42.230 01:10:42.670 Justin Wong: Yeah.

587 01:10:43.800 01:10:47.700 Justin Wong: But the downside to that, though, is because we’re not using.

588 01:10:50.016 01:10:59.429 Justin Wong: Actually, I need to think that through a little bit more. I believe, because we’re not using a ReplacingMergeTree, we are effectively keeping

589 01:10:59.710 01:11:13.953 Justin Wong: a lot more rows in this table than we need. So, computation-wise, person snapshots actually has to go through way more data than it should have to. Like, if we use a ReplacingMergeTree on object updates,

590 01:11:15.710 01:11:26.079 Justin Wong: then it would only keep the most recent one per unique value of event type, you know, whatever, and that would cut down

591 01:11:28.310 01:11:32.599 Justin Wong: you know, computationally on what the what it takes for the materialized view.

592 01:11:32.750 01:11:33.710 Justin Wong: But

593 01:11:35.328 01:11:44.550 Justin Wong: then, you know, we, then we’d be duplicating data even more, to an extent, because we have one table using a MergeTree, we have one table using a ReplacingMergeTree.
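
The ReplacingMergeTree semantics in question, after a merge, only the latest row per sorting key survives, which shrinks what downstream views have to scan, can be sketched like so (the key and version columns are invented examples):

```python
# Sketch of ReplacingMergeTree semantics: keep only the highest-version row
# per sorting key, as if a background merge had completed. Key and version
# column names here are illustrative.

def replacing_merge(rows, key_fields=("person_id", "event_type"), version="ts"):
    """Return one row per key: the one with the highest version value."""
    latest = {}
    for row in rows:
        k = tuple(row[f] for f in key_fields)
        if k not in latest or row[version] > latest[k][version]:
            latest[k] = row
    return list(latest.values())

rows = [
    {"person_id": 1, "event_type": "title_update", "ts": 1, "value": "AE"},
    {"person_id": 1, "event_type": "title_update", "ts": 3, "value": "VP"},
    {"person_id": 1, "event_type": "email_update", "ts": 2, "value": "a@x.com"},
]
merged = replacing_merge(rows)
print(sorted(r["value"] for r in merged))  # ['VP', 'a@x.com']
```

One caveat worth keeping in mind: ClickHouse only applies this deduplication at background-merge time, which happens at an unspecified moment, so reads can still see duplicates unless the query uses FINAL.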

594 01:11:45.180 01:11:45.730 Victor Papyshev: Yeah.

595 01:11:46.295 01:12:00.860 Victor Papyshev: I was gonna propose, like, we could duplicatively write the same updates, or, like, maybe structured differently, to two different tables. One is, like, we’re using, like, ReplacingMergeTree or something. And then the other one’s effectively something like what we have now, with a snapshots view

596 01:12:01.110 01:12:05.440 Victor Papyshev: hooked into or yeah tapping into it. But yeah, I guess that would.

597 01:12:06.320 01:12:10.050 Victor Papyshev: Well, I guess one would be. It would be duplicative data storage. But one.

598 01:12:10.050 01:12:10.460 Uttam Kumaran: Yes.

599 01:12:11.620 01:12:19.830 Victor Papyshev: You know, ideally more, I guess, less computationally intensive, because all it’s maintaining is essentially like what we, the same thing we’re doing here. But just

600 01:12:20.100 01:12:24.129 Victor Papyshev: via the merge tree engine, not via view materialization.

601 01:12:24.690 01:12:27.019 Victor Papyshev: So I don’t know if I mean, I guess

602 01:12:27.330 01:12:34.220 Victor Papyshev: that’s still gonna be subject to, like, would that be faster for, like, primary-key-esque lookup? Or do we not really.

603 01:12:35.742 01:12:36.740 Uttam Kumaran: No, not really.

604 01:12:36.740 01:12:43.819 Victor Papyshev: I don’t. I don’t think so. Right. We still we the the issue is still that this isn’t really tooled under the hood to be good at that kind of lookup right?

605 01:12:44.650 01:12:50.410 Justin Wong: Yeah, ultimately, whatever is in the materialized, whatever makes it to the materialized view is

606 01:12:51.180 01:12:56.530 Justin Wong: like the lookup. Cost will be the same. It’s just going to be dependent upon the table size and the indices, but

607 01:12:56.900 01:13:02.500 Justin Wong: it’ll be the lookup cost. Once it’s in the view, it, it will be the same.

608 01:13:04.010 01:13:11.259 Victor Papyshev: Yeah, I mean, by the way, with, I think you mentioned that materialized views, they materialize at insert time. I mean, we, you could.

609 01:13:11.260 01:13:12.100 Justin Wong: Reading.

610 01:13:12.100 01:13:14.510 Victor Papyshev: You could do a refreshable one where we’re in control of like

611 01:13:15.150 01:13:23.520 Victor Papyshev: how frequently. So if we do have this, like, dual approach, where one is, say, ReplacingMergeTree-powered, and this, another one is

612 01:13:23.530 01:13:37.029 Victor Papyshev: materialized-view-powered, it’s like, trade-off being: the materialized view maintains, like, it has, like, source-priority awareness, like, historically, and it can, like, rematerialize itself. I guess the other one is more naive, I guess, like, for

613 01:13:37.030 01:13:53.540 Victor Papyshev: I, I guess, again, like, we may or may not actually be getting the perk we want out of that, like, if we still can’t query it efficiently, like, on a key. But I guess, like, if we control how frequently we, we materialize, like, we control the SLA, and it’s not, like, on insert. We could do a refreshable

614 01:13:53.570 01:14:02.020 Victor Papyshev: materialized view, it looks like. Just a heads-up: I don’t know that that unlocks anything for us right now, but it’s at least an option if it comes to that.

615 01:14:03.480 01:14:04.350 Victor Papyshev: Oh.

616 01:14:07.150 01:14:10.699 Uttam Kumaran: Yeah, like, scheduled kind of thing going on. So.

617 01:14:12.170 01:14:17.940 Victor Papyshev: I don’t know. So it’s tool in the tool belt. I don’t know if it actually solves the problems we’re we’re tackling. But.

618 01:14:18.170 01:14:19.190 Justin Wong: Yeah.

619 01:14:23.800 01:14:24.400 Victor Papyshev: Hmm!

620 01:14:24.640 01:14:28.430 Victor Papyshev: I mean, I guess, like the way hold up.

621 01:14:29.430 01:14:34.700 Justin Wong: I was just gonna say, I think we will have

622 01:14:35.570 01:14:38.440 Justin Wong: sort of infinite ability to keep

623 01:14:38.940 01:14:42.609 Justin Wong: spinning our wheels on this and changing like

624 01:14:42.940 01:14:47.020 Justin Wong: our perspective on what the ideal

625 01:14:48.420 01:14:54.910 Justin Wong: solution is gonna look like here. But we might need to box ourselves in in the sense of

626 01:14:55.390 01:14:58.350 Justin Wong: kind of get more

627 01:14:59.470 01:15:07.300 Justin Wong: firm product requirements and user stories from like Sid and Nico and the rest of the team and figure out based on that like.

628 01:15:07.500 01:15:14.390 Justin Wong: what SLAs do we need to meet internally to deliver product functionality? And then that’s gonna force us to, like.

629 01:15:15.920 01:15:24.815 Justin Wong: Look at this from a very particular angle. And if one of those becomes too expensive. We can take that back and be like, well, look, this is just never gonna be cost effective. But

630 01:15:26.100 01:15:29.870 Justin Wong: I think we’re trying to solve for everything right now, and maybe we don’t have to.

631 01:15:31.350 01:15:47.622 Victor Papyshev: Okay. Then, yeah, I mean, I’m doing some writing on paper here. I guess I can, you can take some, I can jot some notes about, like, especially re-centering ourselves around the actual use cases a little bit. See if that adds some healthy constraints to how we think about it?

632 01:15:48.280 01:16:01.429 Victor Papyshev: Yeah, ’cause, yeah, I think you’re right. We are kind of looking for a silver bullet, which, you know, in theory, like, we’ve gotten pretty far looking for silver bullets. I think we’re probably hitting a limit of how silver-bullet we can get. So.

633 01:16:01.430 01:16:01.790 Justin Wong: Yeah.

634 01:16:02.030 01:16:13.700 Victor Papyshev: I’m also curious to learn, again. I think if we boil it down we can at least know what tooling to look for, like what solution to be hunting for in terms of orchestrating our toolkit, like whether it’s some.

635 01:16:13.900 01:16:19.399 Victor Papyshev: you know, like I said, some theoretical ClickHouse-to-Postgres, like, sync.

636 01:16:19.400 01:16:19.730 Uttam Kumaran: Yeah.

637 01:16:19.730 01:16:22.329 Victor Papyshev: Like, if that exists, then, like, we would know we can.

638 01:16:22.330 01:16:23.449 Victor Papyshev: Yeah, if we like.

639 01:16:23.450 01:16:28.709 Victor Papyshev: boil it down to the use cases. But I’m curious to hear your thoughts as we kinda come up on.

640 01:16:28.710 01:16:35.429 Uttam Kumaran: Yeah, I mean, I’m gonna call my friend from Flow code, probably this evening, and

641 01:16:35.700 01:16:45.330 Uttam Kumaran: sort of ask him, like, what they did. I also think, once you have the use cases, we should put this in front of ClickHouse, too. I think

642 01:16:45.690 01:16:50.230 Uttam Kumaran: the solution architects there are pretty good. They’ll give us some sense of what to do.

643 01:16:51.930 01:16:56.283 Uttam Kumaran: But it’ll get the vendors to work for us a little bit, and it’ll be worth it.

644 01:16:59.180 01:17:04.570 Uttam Kumaran: Have them hop on Slack with us, and do that. And then, yeah, I’m gonna think about it a little bit more

645 01:17:05.170 01:17:06.289 Uttam Kumaran: this weekend.

646 01:17:08.710 01:17:20.019 Uttam Kumaran: Yeah, I do. I do think that like I don’t know whether product gets the cost of supporting some of these, and part of this could be solved just through like

647 01:17:21.270 01:17:29.949 Uttam Kumaran: customer education, like what we kind of promise. There’s also things you can add on, like we can eventually come back and add some functionality.

648 01:17:33.440 01:17:37.579 Justin Wong: Yeah, I think the part that’s going to be

649 01:17:38.320 01:17:41.090 Justin Wong: at some point we’re just gonna have to

650 01:17:42.700 01:18:06.490 Justin Wong: swallow some of the additional complexity, is that a lot of this is powering what our differentiator is going to be. Like, Default wants to come at this from a different perspective for customers, where we can deliver value that currently they have to struggle to get, because they’re using 4, 5, 6 disparate platforms that all have data they want to use

651 01:18:06.600 01:18:13.520 Justin Wong: for an automation use case of any kind, whether that’s routing or, you know, anything else.

652 01:18:13.520 01:18:23.479 Uttam Kumaran: Or the opposite: they’re limited, they’re using it just in one tool, like they’re just doing the automation in HubSpot, and they’re having to, like, reverse ETL or do shit to get it in there.

653 01:18:23.820 01:18:30.690 Uttam Kumaran: Right? So part of this is like, I think, just understanding that like, what are we competing with

654 01:18:31.462 01:18:40.160 Uttam Kumaran: and yeah, like, I think the most basic thing is someone’s probably getting a bunch of info, piping it into a Snowflake, then getting that into something.

655 01:18:40.160 01:18:40.540 Justin Wong: Yeah.

656 01:18:40.540 01:18:54.440 Uttam Kumaran: The common reverse ETL use case right now is everything gets sent to a Snowflake, the data team computes something like a churn risk, and that then gets sent back into HubSpot. The SLAs on those are, like... there’s no SLA.
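
The reverse-ETL loop Uttam describes, where the warehouse computes a churn-risk score that a sync job then writes back onto the matching CRM record, can be sketched roughly like this. A minimal sketch only: the scoring rule, the field names, and the in-memory `crm` dict are hypothetical stand-ins for real Snowflake queries and HubSpot API calls.

```python
# Toy reverse-ETL pass: score each warehouse row, then write the score
# back onto the matching CRM record. A real pipeline would query
# Snowflake and call the HubSpot API instead of these stand-ins.

def compute_churn_risk(row: dict) -> float:
    """Illustrative scoring rule: more tickets, fewer logins => higher risk."""
    raw = 0.5 + 0.1 * row["open_tickets"] - 0.02 * row["logins_last_30d"]
    return max(0.0, min(1.0, raw))  # clamp to [0, 1]

def reverse_etl_sync(warehouse_rows: list[dict], crm: dict) -> None:
    """Push each computed score into the CRM, keyed by account id."""
    for row in warehouse_rows:
        crm[row["account_id"]]["churn_risk"] = compute_churn_risk(row)

warehouse_rows = [
    {"account_id": "a1", "open_tickets": 6, "logins_last_30d": 5},
    {"account_id": "a2", "open_tickets": 0, "logins_last_30d": 40},
]
crm = {"a1": {"name": "Acme"}, "a2": {"name": "Globex"}}
reverse_etl_sync(warehouse_rows, crm)
```

As Uttam notes, nothing in this loop carries an SLA: the scores land whenever the batch job next happens to run.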

657 01:18:54.440 01:18:55.060 Justin Wong: Yeah.

658 01:18:55.060 01:18:59.350 Uttam Kumaran: So ultimately, like, you guys have solved that really well, I think.

659 01:19:00.690 01:19:09.000 Uttam Kumaran: I think there are some constraints to put on it. Yeah, what is real time, ultimately? Like, look, you’re not solving, like, a 911 call has to get

660 01:19:09.320 01:19:17.339 Uttam Kumaran: routed to someone. So it’s not like do or die. It is someone in B2B, but like.

661 01:19:17.990 01:19:22.420 Uttam Kumaran: And and I will say, Look, I think you guys have a lot of sophistication you’re already offering.

662 01:19:22.770 01:19:38.079 Uttam Kumaran: So yes, maybe someone is pissed that, like, they updated this thing and it didn’t persist in 3 seconds. But you should be like, dude, if you’re expecting 3 seconds, you probably have other things to optimize. Like, I don’t think that’s a fair

663 01:19:38.450 01:19:47.549 Uttam Kumaran: product requirement. And just given my experience in sort of sales analytics, some people don’t even have this at any SLA, or any reasonable SLA, even. So.

664 01:19:48.220 01:19:53.690 Victor Papyshev: Yeah, I mean, I think, ultimately, like, user facing latency is what people like really split hairs over. I think

665 01:19:54.460 01:20:11.949 Victor Papyshev: freshness, like... And I know we can solve the latency. But like I said, we short-circuit the whole data pipeline thing, we get our rapid-response data to the best of our ability, and we tell users that’s the case, you know. Like, we do make a best effort: this is a time-sensitive workflow trigger, note that only

666 01:20:11.980 01:20:27.860 Victor Papyshev: data sources that are, you know, considered rapid response are going to be the ones guaranteed to be included in the data here, like in whatever you build in this workflow. I think that’s reasonable, as long as we can work around, like, the user-facing latency.
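
The "rapid response" carve-out Victor sketches, where a time-sensitive workflow only guarantees data from sources in a rapid tier, might look something like this. The tier names and example sources are illustrative assumptions, not Default’s actual data model.

```python
# Hypothetical freshness tiers per data source: a time-sensitive
# workflow trigger only guarantees sources in the "rapid" tier,
# while batch-tier data is best-effort.

FRESHNESS_TIERS = {
    "crm_events": "rapid",        # e.g. streamed in near-real time
    "calendar": "rapid",          # e.g. webhook-driven
    "warehouse_scores": "batch",  # e.g. nightly reverse-ETL load
}

def guaranteed_sources(requested: list[str], time_sensitive: bool) -> list[str]:
    """Return the sources a workflow can rely on at trigger time."""
    if not time_sensitive:
        return list(requested)  # batch freshness is acceptable here
    return [s for s in requested if FRESHNESS_TIERS.get(s) == "rapid"]

guaranteed = guaranteed_sources(["crm_events", "warehouse_scores"],
                                time_sensitive=True)
```

Here `guaranteed` comes back as `["crm_events"]`: the nightly warehouse score is simply not promised to a time-sensitive trigger, which is the documented-expectations idea in miniature.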

667 01:20:28.190 01:20:42.800 Victor Papyshev: Yeah. And speaking of, like, documenting, I think Uttam mentioned that: just, like, what are we promising people? As I think about this from a platform perspective, generally the big trend here is we’re taking Default and shifting it from being super

668 01:20:42.800 01:21:05.410 Victor Papyshev: workflow-centric, despite that remaining important, to more data-centric in how we think about building this. So in that sense, if I’m thinking about it as a platform, ideally developers can eventually build on top of this platform and tap into the data model that we’re essentially putting forth. I’m just imagining hypothetical docs for a world like that, and even for us internally, as an engineer.

669 01:21:05.410 01:21:05.750 Uttam Kumaran: Yeah.

670 01:21:05.750 01:21:25.720 Victor Papyshev: It’s like, okay: we carry this, this, and this data within the data layer, and here’s how fresh all of it can be expected to be, based on the upstream. Like, say we’re pulling in Google Calendar data: maybe that’s fresh every minute, or whenever the relevant user, you know, gets a meeting.

671 01:21:25.720 01:21:35.170 Uttam Kumaran: Yeah. But again, there’s also things you can do where it’s like: look, if you make an update, then any workflow that’s triggered before we can guarantee persistence is delayed.
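
The gating Uttam describes, delaying any workflow that fires before a write is known to have persisted, reduces to a wait-for-persistence step ahead of the read. This is a sketch under stated assumptions; the polling loop and every name in it are illustrative, not Default’s implementation.

```python
import threading
import time

def run_workflow(read_value, is_persisted, *, prefer_correct: bool,
                 poll_interval: float = 0.01, timeout: float = 1.0):
    """Fire the workflow now, or first wait until the update has persisted."""
    if prefer_correct:
        deadline = time.monotonic() + timeout
        while not is_persisted() and time.monotonic() < deadline:
            time.sleep(poll_interval)  # trade a little latency for a right answer
    return read_value()

# Simulated update whose persistence lands ~50 ms after the trigger.
store = {"value": "old", "persisted": False}
def _persist():
    store["value"], store["persisted"] = "new", True
threading.Timer(0.05, _persist).start()

fast_but_maybe_stale = run_workflow(lambda: store["value"],
                                    lambda: store["persisted"],
                                    prefer_correct=False)
slow_but_right = run_workflow(lambda: store["value"],
                              lambda: store["persisted"],
                              prefer_correct=True)
```

The product decision is just which branch to default to: `prefer_correct=False` fires immediately and may read the pre-update value, while `prefer_correct=True` is a little slower but reads the persisted one.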

672 01:21:35.510 01:21:46.170 Uttam Kumaran: you know. So you just have to make a choice: would you rather it happen and be wrong, or would you rather it be a little slow and be right? And that’s, like, I think, a product

673 01:21:47.140 01:21:49.670 Uttam Kumaran: decision. But I just don’t think

674 01:21:50.910 01:21:57.909 Uttam Kumaran: you guys are. I think we’re splitting hairs over a couple of seconds, where I think that’s fine. Even the use case, Justin, you brought up,

675 01:21:58.110 01:21:59.849 Uttam Kumaran: I think it’s a good like

676 01:22:00.040 01:22:02.139 Uttam Kumaran: yo. We should be the best. But, like

677 01:22:02.310 01:22:08.780 Uttam Kumaran: I know these sales firms, their SLAs right now on even doing any sort of signal-based routing are so slow.

678 01:22:09.800 01:22:14.619 Uttam Kumaran: And so I feel like you’ve already hit the nail, you know, just even in

679 01:22:14.780 01:22:19.940 Uttam Kumaran: with a couple of seconds SLA, let alone, like... Cool.

680 01:22:20.320 01:22:43.989 Victor Papyshev: Well, let’s take where we’re at in this conversation and translate it into, essentially, just some product spec, as well as take it to the team and see, like, okay: here’s where we’re at, here are kind of the trade-offs we’re looking at, and then here are the use cases we’re looking to service, the way we’ve sort of broken them out into categories in terms of the data processing. Having that just on paper as a document will probably be helpful to, like, validate it or poke holes in it.

681 01:22:44.180 01:22:45.160 Uttam Kumaran: Yeah. Okay.

682 01:22:47.940 01:22:50.397 Uttam Kumaran: Well, great, thanks for giving us another half hour.

683 01:22:50.670 01:22:51.660 Uttam Kumaran: Yeah, of course. Of course.

684 01:22:51.660 01:22:52.379 Justin Wong: Yeah, this is.

685 01:22:52.380 01:22:55.359 Uttam Kumaran: This is great. Oh, this is a dope challenge.

686 01:22:55.490 01:22:57.239 Uttam Kumaran: Yeah, unique challenge.

687 01:22:57.500 01:22:58.369 Victor Papyshev: Sweet, well.

688 01:22:58.370 01:23:06.150 Uttam Kumaran: If you want me to help get in touch with the ClickHouse people or whatever, let me know, or if you want me to try to get them on Slack and get them thinking about this for us.

689 01:23:06.330 01:23:09.099 Uttam Kumaran: I’d love to have the vendors work for us, you know. So.

690 01:23:09.720 01:23:10.540 Justin Wong: Yeah.

691 01:23:10.540 01:23:11.130 Victor Papyshev: Yeah.

692 01:23:11.130 01:23:15.680 Justin Wong: That was something we didn’t take enough advantage of when I was at Clearbit. Like, we had

693 01:23:15.950 01:23:19.489 Justin Wong: BigQuery, like, offering to

694 01:23:19.780 01:23:25.260 Justin Wong: give us a bunch of input on this stuff, and we didn’t really take full advantage of it. I think we did better with ClickHouse. But yeah, like

695 01:23:25.630 01:23:32.890 Justin Wong: one of the things that shot us in the foot was, like, we weren’t using BigQuery in the most efficient way that we could, even though they were offering

696 01:23:33.020 01:23:35.869 Justin Wong: to, like, walk through that with us.

697 01:23:36.030 01:23:39.509 Uttam Kumaran: Yeah, maybe one output of that, Victor, like, the documentation.

698 01:23:39.700 01:23:44.109 Uttam Kumaran: The output is like once because they’re gonna be like, we need to know what you’re dealing with.

699 01:23:44.230 01:23:53.179 Uttam Kumaran: So let’s maybe aim to put something together, or append to your doc, and then I can try to get someone on the phone or get them in Slack next week.

700 01:23:53.440 01:24:02.340 Uttam Kumaran: And we can just be like, here are the open questions that we’re dealing with. But I think we should figure out the product side first before continuing.

701 01:24:03.100 01:24:10.050 Victor Papyshev: Cool sounds good. I’ll probably just jump right into that while it’s fresh, and I’ll try to get something your way over the weekend, or something like that.

702 01:24:10.560 01:24:11.630 Uttam Kumaran: Okay, perfect.

703 01:24:11.780 01:24:12.440 Victor Papyshev: Awesome.

704 01:24:12.560 01:24:14.670 Victor Papyshev: Okay, cool. This is helpful. I appreciate it.

705 01:24:14.670 01:24:15.619 Uttam Kumaran: You guys, yeah, of course.

706 01:24:15.920 01:24:16.609 Victor Papyshev: Thank you.

707 01:24:16.610 01:24:17.120 Justin Wong: Thank you.

708 01:24:17.120 01:24:18.690 Uttam Kumaran: See you soon. Yeah. Bye.

709 01:24:18.690 01:24:19.230 Victor Papyshev: I.