Meeting Title: Brainforge x Breezy: Engineering Discovery
Date: 2025-12-10
Meeting participants: Awaish Kumar, Xiaojie’s AI Notetaker, Greg’s AI Notetaker, Uttam Kumaran, Xiaojie Zhang, Greg
WEBVTT
1 00:04:24.770 ⇒ 00:04:25.810 Uttam Kumaran: Anyways…
2 00:04:26.550 ⇒ 00:04:27.150 Awaish Kumar: Blue.
3 00:04:36.990 ⇒ 00:04:37.850 Xiaojie Zhang: Hey, hey!
4 00:04:38.520 ⇒ 00:04:39.910 Uttam Kumaran: Hey, how are you?
5 00:04:40.170 ⇒ 00:04:41.690 Xiaojie Zhang: I’m doing good, how are you?
6 00:04:42.050 ⇒ 00:04:43.440 Uttam Kumaran: Good, nice to meet you.
7 00:04:43.440 ⇒ 00:04:48.269 Xiaojie Zhang: Oops in here, sorry, I have too many note-takers here. So…
8 00:04:48.900 ⇒ 00:04:59.169 Xiaojie Zhang: Let’s see, it’s, you know, the system we’re trying to build, there’s, like, the function, you know, one of the features is AI Notetaker. I have two.
9 00:04:59.170 ⇒ 00:05:01.710 Uttam Kumaran: I have one for dev environment, another for product.
10 00:05:01.710 ⇒ 00:05:07.440 Xiaojie Zhang: I guess, lucky, I think he only has the product one, or the dev one.
11 00:05:07.440 ⇒ 00:05:10.979 Uttam Kumaran: Are you guys gonna do… are you guys gonna do anything for, like, phone calls?
12 00:05:11.330 ⇒ 00:05:20.009 Xiaojie Zhang: We are trying, we’re trying to integrate that with phone call. Right now, I think the role model in the AI note-taker side is Granola.
13 00:05:20.010 ⇒ 00:05:22.119 Uttam Kumaran: Yeah, yeah, yeah. I was gonna say, like, I…
14 00:05:22.120 ⇒ 00:05:23.959 Xiaojie Zhang: Yeah, there’s a reason to lay support.
15 00:05:23.960 ⇒ 00:05:24.580 Uttam Kumaran: Jeremy?
16 00:05:24.580 ⇒ 00:05:25.070 Xiaojie Zhang: Yeah.
17 00:05:25.070 ⇒ 00:05:28.129 Uttam Kumaran: I haven’t tried the phone call feature, because I always forget to, like.
18 00:05:28.130 ⇒ 00:05:37.810 Xiaojie Zhang: I haven’t, I haven’t. I think they need some certain setup, whatever. Yeah, I have Granola on my laptop as well, but, you know, I kind of prefer that, but we have the…
19 00:05:38.290 ⇒ 00:05:41.090 Xiaojie Zhang: Our own solution, so we’re just dogfooding it.
20 00:05:41.580 ⇒ 00:05:51.159 Uttam Kumaran: Yeah, one thing that we do is we use Zoom, Zoom’s native recording. Oh, okay. How is that? Pretty good. It’s like…
21 00:05:51.260 ⇒ 00:06:01.640 Uttam Kumaran: So, the one thing that’s nice is that, you know, it doesn’t have to use a note taker, because it’s just, like, inbuilt in the platform, but it’s not very popular. I feel like they didn’t do a good job of, like.
22 00:06:02.160 ⇒ 00:06:07.350 Uttam Kumaran: you have to build it, so we built on top of the SDK to, like, sort of grab files and recordings, but…
23 00:06:07.550 ⇒ 00:06:09.829 Uttam Kumaran: It’s a little bit cleaner of a solution.
24 00:06:09.980 ⇒ 00:06:14.350 Uttam Kumaran: But, yeah, Granola is the first one that I saw that actually is, like, trying to do phone calls.
25 00:06:15.770 ⇒ 00:06:23.700 Uttam Kumaran: But still, you know what I started to do is I’ll take the phone call on speaker, and just turn on Granola on my laptop.
26 00:06:25.950 ⇒ 00:06:39.040 Uttam Kumaran: So… yeah. Well, really nice to meet you both. I appreciate the time. I’m not sure how much, you know, context y’all have, but I sent a little bit of a document over, sort of.
27 00:06:39.170 ⇒ 00:06:58.549 Uttam Kumaran: I’ll give you a little bit of background on Brainforge, so we’re a data and AI consultancy. We do a lot of work, sort of setting up data infrastructure, product analytics, basically everything around data and reporting. And so, a lot of the work that we do is, you know, things that I think
28 00:06:58.550 ⇒ 00:07:11.419 Uttam Kumaran: both Jim Z and Seagal mentioned that were really important, which is just setting up, like, the reporting stack, like warehouse, ETL, modeling, as well as considering, you know, options for product analytics.
29 00:07:11.520 ⇒ 00:07:25.069 Uttam Kumaran: And then sort of understanding, like, reporting. I think the other kind of workstream that he mentioned was, sort of this, like, unique MLS dataset, you know, that you guys are working with.
30 00:07:25.070 ⇒ 00:07:34.500 Uttam Kumaran: But yeah, sort of just wanted to say hi today, and just see how many questions I could answer, and sort of get us into how your guys’ vision for
31 00:07:34.500 ⇒ 00:07:36.089 Uttam Kumaran: You know, both of those…
32 00:07:36.200 ⇒ 00:07:47.779 Uttam Kumaran: those work streams. The other thing is I actually was able to go to a webinar with, with a demo with, like, a bunch of, I think, new beta users, so I did get a sense that…
33 00:07:47.780 ⇒ 00:08:01.799 Uttam Kumaran: Yeah, it was just the day after I met them, and they’re like, can you hop on this? I’m like, okay, I’ll film and listen. And it’s a cool product. I think it’s great. I have a lot of friends in residential and commercial real estate. I worked in, I worked at WeWork on the data team for a while.
34 00:08:01.800 ⇒ 00:08:07.890 Xiaojie Zhang: Oh, here we go, here we go. Nice, nice. You kind of, like, know what is going on on the real estate side.
35 00:08:07.890 ⇒ 00:08:19.539 Uttam Kumaran: Yeah, but it’s also, like, what you guys are doing on the pulling building plans and stuff like that is, like, awesome. I don’t think I’ve seen anything like that. But then, of course, I think you guys are so hyper-focused on…
36 00:08:19.570 ⇒ 00:08:29.570 Uttam Kumaran: the real estate, like, the… the real estate agent and, like, everything around that person. I don’t know, it was just cool to watch, you know, get a chance to listen in, so…
37 00:08:29.570 ⇒ 00:08:48.320 Xiaojie Zhang: Cool, cool, cool. Thanks for that. Yeah, we can do a quick intro for us. I’m Xiaojie. We… like, actually, Greg and I are actually on another side. You folks are in the United States, we’re on another side of the planet. So I’m in China at this moment, Greg is in Australia, so,
38 00:08:48.690 ⇒ 00:08:50.240 Greg: Well, not right now, I’m not.
39 00:08:50.240 ⇒ 00:08:51.680 Xiaojie Zhang: Sorry, you’re in Melbourne?
40 00:08:51.680 ⇒ 00:08:59.470 Greg: I’m usually in Melbourne, but right now, I’m in, in Eastern Malaysia.
41 00:09:00.100 ⇒ 00:09:03.760 Xiaojie Zhang: Oh, here we go. Oh, nice. A little bit of fun there. Great.
42 00:09:03.760 ⇒ 00:09:07.540 Greg: Yeah, only landed… only landed, like, a couple of hours ago.
43 00:09:07.540 ⇒ 00:09:17.700 Xiaojie Zhang: Oh, okay, shit, yeah, yeah, so, yeah, we’re on another side of the planet, so that… which is great, and I’m kind of, you know, we have,
44 00:09:17.700 ⇒ 00:09:32.849 Xiaojie Zhang: I would say small dev center in China, so I’m kind of, like, leading and overseeing, you know, folks here. Had some experience on the data side, on the… you know, I… you know, I worked for Airbnb before, so… and, like, as a product engineer, plus,
45 00:09:32.980 ⇒ 00:09:50.509 Xiaojie Zhang: data engineer for a while, I’m like, I pick… I then kind of go back to the product land, because I’m, like, more… like, I’m kind of more passionate about, you know, building software for customers, but I kind of get the idea about, like, you know, how complicated and…
46 00:09:50.510 ⇒ 00:09:53.479 Uttam Kumaran: You guys had a good data stack there. I used to read the Airbnb blogs.
47 00:09:53.480 ⇒ 00:10:05.569 Xiaojie Zhang: Oh, yeah, the Airflow and, like, you know, Superset stuff, yeah, you know, that’s… that’s… that’s… I would say it’s good open source software, if you have no choice. But we are starting things in our…
48 00:10:05.570 ⇒ 00:10:14.229 Uttam Kumaran: This is, like, back then, right? I feel like now, yeah, we have a… there’s actually a lot of better and cheaper options that are more stable than, like, even 5 years ago, so…
49 00:10:14.230 ⇒ 00:10:29.459 Xiaojie Zhang: Yeah, and Greg is an engineering leader who is, I think, the founding engineer who actually built a lot of things from ground up. So, yeah, we’re kind of joined… you know, we’re kind of here, I think, like, Jim C and Sigao want us to have a quick chat, trying to
50 00:10:29.460 ⇒ 00:10:49.320 Xiaojie Zhang: of course, know each other, ask some questions, and see whether they can be a good fit there. Like, still, we have no idea, like, where the relationship is gonna, you know, head. But yeah, like, I’m more than happy to get to know each other, and I think we are kind of, like, discussing the next step, how we’re gonna…
51 00:10:49.320 ⇒ 00:10:51.370 Xiaojie Zhang: You know, integrate with…
52 00:10:51.590 ⇒ 00:11:05.500 Xiaojie Zhang: you know, the team, or, like, start to… I think it’s important to start the data initiative, trying to understand the data from a, you know, user or BI perspective, and at the same time, we have
53 00:11:05.610 ⇒ 00:11:07.979 Xiaojie Zhang: You know, a big, you know,
54 00:11:08.130 ⇒ 00:11:23.820 Xiaojie Zhang: a project that kind of pulling data for a more accuracy purpose, or more real-time purpose. So, yeah, I think that that can be some interesting topic to… to… to kind of dive deep a little bit. Yeah, so… Greg, do I… do you have anything to add on?
55 00:11:24.920 ⇒ 00:11:34.900 Greg: I guess my… my background was I worked at Afterpay, from when it was small,
56 00:11:35.360 ⇒ 00:11:38.870 Greg: meaning we scaled that up. I was on, like, the core team.
57 00:11:39.470 ⇒ 00:11:41.720 Greg: For most of…
58 00:11:41.860 ⇒ 00:11:48.900 Greg: 8 years, I guess, as a… as a back-end engineer and engineering manager, and,
59 00:11:49.870 ⇒ 00:11:56.920 Greg: Yeah, that was, that was pretty interesting. Sort of hockey stick growth, yeah.
60 00:11:57.700 ⇒ 00:11:59.819 Greg: Yeah, and…
61 00:12:01.230 ⇒ 00:12:14.890 Greg: Yeah, I’m looking forward to hearing, sort of, your… your thoughts on, like you said, even in the last 5 years, things have changed a lot in terms of, you know, ingesting huge amounts of data and…
62 00:12:15.290 ⇒ 00:12:21.430 Greg: Transforming it, or whatever, so it’s, so we can use it.
63 00:12:21.850 ⇒ 00:12:26.940 Greg: in, you know, super fast queries and stuff. So, yeah, looking forward to getting into that.
64 00:12:27.610 ⇒ 00:12:29.120 Uttam Kumaran: Yeah, great, and then maybe, like.
65 00:12:29.150 ⇒ 00:12:37.219 Uttam Kumaran: For me, I live here in Austin. My background is in data engineering, so I’ve worked, I worked in New York for a number of years as a data engineer.
66 00:12:37.250 ⇒ 00:12:56.119 Uttam Kumaran: sort of, like, increasingly smaller startups, and then my last role, I led product at a data startup. So, kind of, I took a similar path to you, actually. Like, I actually went into product as well, and I was just… I became very opinionated about data products, because I was buying a lot of data products.
67 00:12:56.120 ⇒ 00:13:13.609 Uttam Kumaran: And then I went and, like, you know, built the first version of one, and then I left to start this business, you know, about 3 years ago. And basically, it’s just scaling a lot of what we’ve learned from doing, sort of, data stack setups, reporting analytics setups, and just now doing that for clients. And so we’ve assembled
68 00:13:13.610 ⇒ 00:13:28.549 Uttam Kumaran: you know, an amazing team, and I’ll let, you know, Awaish intro after this, but all we do every day is come in and implement a lot of data solutions. The other thing is we built a lot of our business using AI, and so, actually, in the last year or so.
69 00:13:28.550 ⇒ 00:13:44.859 Uttam Kumaran: we’ve been trying to find more ways to actually weave in AI to accelerate the existing data work that we do, but additionally layer AI on top of a lot of the data for, you know, of course, doing a lot of the, you know, the stuff that y’all are doing. So, tons of context engineering.
70 00:13:45.130 ⇒ 00:14:00.580 Uttam Kumaran: tons of making, you know, different types of data formats available for agents, and so that’s been, you know, a blast. Yeah, maybe, Awaish, I’ll let you give a little bit of intro, and then I would like to just maybe walk through the document that we put together, and then, yeah, I’m happy to sort of
71 00:14:00.580 ⇒ 00:14:06.500 Uttam Kumaran: talk a little bit about how we think about the ecosystem of vendors, tools, a way to put stuff together. Yeah, more than happy.
72 00:14:07.100 ⇒ 00:14:11.389 Awaish Kumar: Yeah, my name is Awaish Kumar, and I’m in Pakistan right now.
73 00:14:12.900 ⇒ 00:14:25.819 Awaish Kumar: I have also… I’ve been working, like, for more than 8 years now as a data engineer. I have also experienced working at startups and growth stage companies.
74 00:14:25.960 ⇒ 00:14:29.790 Awaish Kumar: I have previously worked in, like, vacation rental business as well.
75 00:14:29.890 ⇒ 00:14:36.029 Awaish Kumar: So, kind of similar to property, but with a few caveats.
76 00:14:36.150 ⇒ 00:14:51.640 Awaish Kumar: Yeah, so that’s me. I have, like, I’ve been building data infrastructure from scratch, and, like, creating the end-to-end data pipelines and helping with the BI tool setup as well.
77 00:14:53.230 ⇒ 00:14:59.610 Uttam Kumaran: So maybe I’ll just go ahead and walk through, this document, and, basically.
78 00:14:59.690 ⇒ 00:15:11.700 Uttam Kumaran: you know, from our short sort of discovery and chats with everybody, we basically learned two things. One is, like, you know, I think we have a good… somewhat a good sense of… of what the product is.
79 00:15:11.700 ⇒ 00:15:35.910 Uttam Kumaran: I still have to install it and sort of give it a shot on my end, but I am familiar with the real estate market, and so hearing Jim Z’s vision, you know, hearing the webinar, I have an understanding of, like, what the business problem is that the team is definitely looking to solve. Additionally, I think what was really important, you know, when working with any company that’s trying to grow fast, like y’all, is just timeline. So I did get an understanding of, like.
80 00:15:35.940 ⇒ 00:15:55.559 Uttam Kumaran: okay, where we are in phase right now, sort of how we’re onboarding beta users, we’re looking for, you know, more, like, structured launches in early next year. And so I, of course, understand that in order for, you know, data folks and product folks to make great decisions, like, you need a ton of data. So kind of the first part
81 00:15:55.560 ⇒ 00:16:02.490 Uttam Kumaran: Of our, like, sort of approach is, like, talking about analytics and just, like, the foundational
82 00:16:02.520 ⇒ 00:16:21.860 Uttam Kumaran: you know, tools you need to just understand product analytics. And so we do quite a bit of work in, you know, B2B SaaS, B2C SaaS, and so very familiar with tools like Amplitude, Mixpanel, PostHog, that do, you know, event tracking, things like that. So part of this first phase is just, like.
83 00:16:21.860 ⇒ 00:16:36.420 Uttam Kumaran: okay, can we basically understand the ex… I think there’s an existing Mixpanel or Statsig implementation, understand, like, what the event taxonomy is, and just drive towards building for the product team those core measurement dashboards.
84 00:16:36.420 ⇒ 00:16:51.270 Uttam Kumaran: The nice thing is, like, you don’t really need to, you know, do this in an external BI tool. If you can just do this all within your Mixpanel or your Amplitude, it’s actually, like, a great way to accelerate, just, like, having a good understanding of things like
85 00:16:51.270 ⇒ 00:16:57.869 Uttam Kumaran: DAU, MAU, segments, cohorts, like, paths into the product, things like that. So that’s just a great
86 00:16:57.880 ⇒ 00:17:02.160 Uttam Kumaran: Baseline to just, like, get views into, like, product usage.
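Uttam names DAU, MAU, segments and cohorts as the baseline product metrics. To show what the first two actually count, here is a minimal Python sketch that computes them from raw event records. The `(user_id, timestamp)` event shape and the trailing 30-day MAU window are assumptions. They are not the schema of the existing Mixpanel or Statsig setup.

```python
from datetime import datetime, timedelta

# Hypothetical event shape: (user_id, event timestamp).
Event = tuple[str, datetime]

def dau(events: list[Event], day: datetime) -> int:
    """Distinct users with at least one event on the given calendar day."""
    return len({user for user, ts in events if ts.date() == day.date()})

def mau(events: list[Event], day: datetime, window_days: int = 30) -> int:
    """Distinct users active in the trailing window ending on `day` (inclusive)."""
    start = day.date() - timedelta(days=window_days - 1)
    return len({user for user, ts in events if start <= ts.date() <= day.date()})

if __name__ == "__main__":
    events = [
        ("u1", datetime(2025, 12, 10, 9)),
        ("u2", datetime(2025, 12, 10, 14)),
        ("u1", datetime(2025, 12, 1, 8)),
        ("u3", datetime(2025, 11, 20, 12)),
        ("u4", datetime(2025, 10, 1, 12)),  # outside the 30-day window
    ]
    today = datetime(2025, 12, 10)
    print(dau(events, today), mau(events, today))  # 2 3
```

Tools like Mixpanel or Amplitude compute these metrics for you. The sketch is only useful for sanity-checking their numbers against warehouse data later.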
87 00:17:02.400 ⇒ 00:17:18.349 Uttam Kumaran: And so it’ll also give you… and then basically from there, you know, we can talk a little bit about the MLS data and underbuilt. To kind of just talk about, Greg, like, your point about, sort of, like, what is the, like, reporting stack right now, and I think…
88 00:17:18.369 ⇒ 00:17:38.009 Uttam Kumaran: you know, both of you will kind of recognize some of these tools. It’s like, this is, like, a common stack that we often implement. So, like, you know, on the back end on your side, you just may have a bunch of different sources. Typically, we’ll implement some type of ingestion tool. So, typically, we recommend, like, Fivetran or Polytomic. There’s a… there’s a couple of ones that are
89 00:17:38.140 ⇒ 00:17:52.589 Uttam Kumaran: you know, best in class, reliable, you know, so it’s really going to be a cost in the edge cases of integrations that you need. And then, really, a lot of our work starts there, which is just setting up, like, a core reporting data warehouse.
90 00:17:52.590 ⇒ 00:17:59.100 Uttam Kumaran: In this view, you know, we’re demonstrating Snowflake, but basically, this is, like, how we would set up a data mart.
91 00:17:59.100 ⇒ 00:18:18.749 Uttam Kumaran: So we land data, we do, like, some intermediate modeling, and then we basically try to make available for reporting use cases several different marts. And so, you may have your back-end user data, you may have finance and payment data, you may have your, you know, product analytics information,
92 00:18:18.750 ⇒ 00:18:31.260 Uttam Kumaran: You may have information from, you know, your agent interactions. Landing all that into a core place, and basically combining it, implementing business logic, making it easy for anyone to come report is, like, this is the typical
93 00:18:31.260 ⇒ 00:18:52.770 Uttam Kumaran: you know, sort of reporting stack, and then on the back end here, it’s sort of, like, getting that out for access. I think this is where there’s been some more innovation recently. One is, like, a lot of folks want to take product usage data and use it in tools like Klaviyo, for marketing activation. For example, a new user logged in, but they haven’t,
94 00:18:52.770 ⇒ 00:19:05.349 Uttam Kumaran: done, like, the first event, okay, we just send them an email, like a transact… like a marketing-based email or something like that. So that’s ways where you’re actually moving product usage data back into, you know, marketing activation flows.
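The activation example is a user who logged in but never completed a first key event, and so gets a nudge email. It reduces to a segment query over product events. Below is a minimal sketch, with these assumptions:

- The event names `logged_in` and `first_comp_generated` are hypothetical.
- The grace period is 24 hours.

Actually sending the email (for example through Klaviyo) is out of scope here.

```python
from datetime import datetime, timedelta

def users_to_nudge(events, now, first_event="first_comp_generated",
                   grace=timedelta(hours=24)):
    """Users who first logged in at least `grace` ago but never fired `first_event`.

    `events` is an iterable of (user_id, event_name, timestamp) tuples.
    The default event name and grace period are hypothetical.
    """
    first_login = {}
    activated = set()
    for user, name, ts in events:
        if name == "logged_in":
            # Track the earliest login per user.
            first_login[user] = min(ts, first_login.get(user, ts))
        elif name == first_event:
            activated.add(user)
    return sorted(
        user for user, login_ts in first_login.items()
        if user not in activated and now - login_ts >= grace
    )

if __name__ == "__main__":
    t0 = datetime(2025, 12, 8, 9)
    now = t0 + timedelta(days=2)
    events = [
        ("a", "logged_in", t0),
        ("a", "first_comp_generated", t0 + timedelta(hours=1)),
        ("b", "logged_in", t0),
        ("c", "logged_in", now - timedelta(hours=2)),  # still inside grace period
    ]
    print(users_to_nudge(events, now))  # ['b']
```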
95 00:19:05.350 ⇒ 00:19:22.999 Uttam Kumaran: Additionally, I’m sure, like, you know, advertising and leveraging ad dollars to drive growth is gonna be a big thing, and for a lot of the advertisers, you know, ad teams that we work with, being able to tag, you know, those people and be able to retarget them is super, super important. So, that’s a lot of, like.
96 00:19:23.000 ⇒ 00:19:32.640 Uttam Kumaran: what gets enabled here, in addition to just BI. And then the last layer that we’ve been doing a lot of exploration on is, like, what are the best, like, AI sort of chat with
97 00:19:32.640 ⇒ 00:19:51.410 Uttam Kumaran: reporting data tools. How do we make it easier for more people in the business to actually get value out of these data marts? And maybe they don’t have the expertise or the time to build dashboards, but what are other ways that they can actually start to, you know, chat with modeled, you know, governed data sources?
98 00:19:51.580 ⇒ 00:20:07.720 Uttam Kumaran: And so this is, like, kind of the rough stack. Of course, like, it’s going to be dependent on a lot of the sources. I know you guys also have sources that aren’t sort of the typical vendors, but any questions here? Anything I can talk about, about, like, the typical reporting stack that, you know, we see?
99 00:20:13.430 ⇒ 00:20:14.850 Xiaojie Zhang: Nope, already.
100 00:20:15.180 ⇒ 00:20:16.000 Greg: Not.
101 00:20:16.150 ⇒ 00:20:17.299 Greg: Bye-bye. Bye.
102 00:20:18.440 ⇒ 00:20:29.039 Uttam Kumaran: So that’s, like, I mean, I think the first phase of this, really, is, like, trying to just drive towards, like, getting a good understanding of product analytics. The second phase here is about, like, the MLS
103 00:20:29.160 ⇒ 00:20:38.329 Uttam Kumaran: data, and I think this is where I would love to sort of hear from you guys. I… I sort of… we sort of wrote this based on chatting with Seagal and Jim Z,
104 00:20:38.770 ⇒ 00:20:48.739 Uttam Kumaran: I’m familiar with the MLS, I’m sort of familiar with, like, what data is in there, but yeah, I’m kind of… I kind of am interested in, like.
105 00:20:48.790 ⇒ 00:21:01.769 Uttam Kumaran: hearing how we… whatever system gets built supports any of the initiatives regarding this data, whether this is getting used for production or for reporting. But, like, yeah, we’d just love to hear a little bit about
106 00:21:02.030 ⇒ 00:21:05.259 Uttam Kumaran: Like, the vision with this, you know, phase.
107 00:21:06.070 ⇒ 00:21:10.559 Greg: Yeah, so that’s… it’s kind of a different,
108 00:21:11.670 ⇒ 00:21:17.349 Greg: I think this kind of sits off a bit to the side. The… the main thing…
109 00:21:18.070 ⇒ 00:21:31.639 Greg: Well, from my perspective, is to be able to generate comps for properties based on, you know, generally a subject property with, you know, a range of bedrooms, bathrooms.
110 00:21:31.810 ⇒ 00:21:47.790 Greg: all your normal stuff, and search through listings to get comparable properties within, you know, a radius of the lat/long of the subject property. So we currently use a data provider
111 00:21:47.930 ⇒ 00:21:54.620 Greg: to get access to that data, but this project would be…
112 00:21:54.840 ⇒ 00:22:07.530 Greg: is getting the bulk data files of MLS listings, so we’re talking, like, 160 million records, kind of daily,
113 00:22:07.810 ⇒ 00:22:18.090 Greg: About 300 columns wide, so a fair bit of data to ingest, and then we wanna, like, a lot of that data, we won’t
114 00:22:18.860 ⇒ 00:22:30.860 Greg: use, we’ll want to, sort of, transform it into the right shape to then be able to query super fast to generate comps.
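Greg's comps requirement can be sketched in two steps: filter listings by bedrooms and bathrooms, then keep those within a radius of the subject property's coordinates. The sketch uses a cheap bounding-box prefilter followed by an exact great-circle (haversine) distance check. It is an in-memory illustration under stated assumptions: the `Listing` fields, the ±1 bedroom/bathroom tolerance and the 1-mile default radius are all hypothetical. In Postgres the prefilter would normally be backed by an index, for example a PostGIS spatial index, to approach the sub-second target.

```python
import math
from dataclasses import dataclass

EARTH_RADIUS_MILES = 3958.8

@dataclass
class Listing:  # hypothetical, heavily reduced listing record
    listing_id: str
    lat: float
    lon: float
    beds: int
    baths: float

def haversine_miles(lat1, lon1, lat2, lon2):
    """Great-circle distance between two lat/long points, in miles."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dp = p2 - p1
    dl = math.radians(lon2 - lon1)
    a = math.sin(dp / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dl / 2) ** 2
    return 2 * EARTH_RADIUS_MILES * math.asin(math.sqrt(a))

def find_comps(subject, listings, radius_miles=1.0, bed_tol=1, bath_tol=1.0):
    """Return (distance, listing) pairs near `subject`, closest first."""
    # Bounding box: ~69 miles per degree of latitude; longitude degrees shrink
    # with cos(latitude). Using 69.0 makes the box slightly larger than needed,
    # so no true match is dropped before the exact check.
    dlat = radius_miles / 69.0
    dlon = radius_miles / (69.0 * max(math.cos(math.radians(subject.lat)), 1e-6))
    comps = []
    for l in listings:
        if l.listing_id == subject.listing_id:
            continue
        if abs(l.lat - subject.lat) > dlat or abs(l.lon - subject.lon) > dlon:
            continue
        if abs(l.beds - subject.beds) > bed_tol or abs(l.baths - subject.baths) > bath_tol:
            continue
        d = haversine_miles(subject.lat, subject.lon, l.lat, l.lon)
        if d <= radius_miles:
            comps.append((d, l))
    comps.sort(key=lambda pair: pair[0])
    return comps
```

The bounding box is there because it maps directly onto range conditions a database index can serve. The haversine check then only runs on the few candidates left.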
115 00:22:31.170 ⇒ 00:22:34.220 Xiaojie Zhang: So, like, from a high level, it’s a fairly simple.
116 00:22:34.910 ⇒ 00:22:40.620 Greg: thing to do, but I guess to do it well is the…
117 00:22:40.730 ⇒ 00:22:47.189 Greg: so that it’s, you know, reliably handling that amount of data, and whether I actually
118 00:22:48.010 ⇒ 00:22:52.240 Greg: I’ve got a few clarifying questions that I need to find out, like, whether we can get
119 00:22:52.480 ⇒ 00:23:01.140 Greg: Deltas, or if it has to be full files every time, lots of just, kind of.
120 00:23:01.980 ⇒ 00:23:04.680 Uttam Kumaran: Ergonomic. Keeping that flow, kind of…
121 00:23:04.680 ⇒ 00:23:08.799 Greg: Happening, because it’s, you know, it’s a fair bit of data, to be processing, and…
122 00:23:09.010 ⇒ 00:23:14.739 Greg: how we handle failures and all that kind of stuff. And then getting it into,
123 00:23:15.870 ⇒ 00:23:23.290 Greg: assuming into, like, our Postgres to then query it super fast to get the… to get the comps.
124 00:23:23.450 ⇒ 00:23:25.330 Greg: Yeah.
125 00:23:25.610 ⇒ 00:23:27.820 Greg: Questions? Thoughts?
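One of Greg's open questions is whether the provider can send deltas or only full files. If only full files are available, a common fallback is to diff consecutive snapshots by primary key and row hash. Only new, changed and removed listings are then applied downstream. Below is a minimal sketch, assuming each row is a dict with a hypothetical `listing_key` column. At around 160 million rows this comparison would run in a warehouse or columnar engine, not in Python memory.

```python
import hashlib
import json

def row_hash(row: dict) -> str:
    """Stable hash of a row's contents (keys sorted so column order doesn't matter)."""
    payload = json.dumps(row, sort_keys=True, default=str).encode()
    return hashlib.sha256(payload).hexdigest()

def diff_snapshots(previous, current, key="listing_key"):
    """Compare two full snapshots and return (inserted, updated, deleted) keys."""
    prev_hashes = {r[key]: row_hash(r) for r in previous}
    curr_hashes = {r[key]: row_hash(r) for r in current}
    inserted = sorted(curr_hashes.keys() - prev_hashes.keys())
    deleted = sorted(prev_hashes.keys() - curr_hashes.keys())
    updated = sorted(
        k for k in curr_hashes.keys() & prev_hashes.keys()
        if curr_hashes[k] != prev_hashes[k]
    )
    return inserted, updated, deleted
```

Storing the hash alongside each loaded row means the next run only needs the new file plus the stored hashes, not the whole previous snapshot.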
126 00:23:29.080 ⇒ 00:23:31.640 Uttam Kumaran: Yeah, I mean, it makes sense. I think,
127 00:23:32.600 ⇒ 00:23:45.439 Uttam Kumaran: Yeah, there’s certainly a couple ways of doing it. I think if… depending on the ergonomics on their side, like, if these are… if they’re just landing flat… if they’re able to land flat files and put that into S3 or a data lake, there’s a lot of…
128 00:23:45.440 ⇒ 00:23:54.770 Greg: They can put it into… they can put it into Snowflake, or Databricks, or… yeah, I think they’re pretty flexible with where they can,
129 00:23:55.010 ⇒ 00:23:56.110 Greg: land it.
130 00:23:56.640 ⇒ 00:23:57.220 Uttam Kumaran: Okay.
131 00:23:57.350 ⇒ 00:24:12.659 Uttam Kumaran: Yeah, at that point, there is a lot of different ways for us to basically stream that in, in a few different ways into wherever it needs to go. So we’ve done work, you know, in Snowflake, there’s tools like Snowpipe and things that actually
132 00:24:12.660 ⇒ 00:24:28.050 Uttam Kumaran: sit on top of S3 and sort of stream data into Snowflake, and, like, basically they’re running, like, copy commands under the hood. So there’s also, again, if the goal is to move into Snowflake, run some large transformations, and then pipe that data back.
133 00:24:28.050 ⇒ 00:24:45.739 Uttam Kumaran: the nice thing about Snowflake, and this is similar on, you know, BigQuery or in Databricks, is we would just basically write that all back to a data lake, and then get that process back into a Postgres instance for querying. You know, that’s, I feel like, kind of common. I don’t know, Awaish, if you have any other thoughts.
134 00:24:46.070 ⇒ 00:24:54.560 Awaish Kumar: Yeah, there are, like, there are multiple options, as you mentioned, like, one way is to, like, put it back to some bucket, and then…
135 00:24:54.670 ⇒ 00:25:02.379 Awaish Kumar: load it into Postgres, but there are, like, tools like Polytomic, as you mentioned, they can directly, like.
136 00:25:02.500 ⇒ 00:25:06.650 Awaish Kumar: load from Snowflake to Postgres without any intermediate storage.
137 00:25:06.760 ⇒ 00:25:07.600 Uttam Kumaran: Yeah.
138 00:25:07.990 ⇒ 00:25:08.770 Awaish Kumar: Yeah.
139 00:25:09.170 ⇒ 00:25:12.750 Awaish Kumar: But it, like, depends on, on the…
140 00:25:13.210 ⇒ 00:25:14.530 Awaish Kumar: Also on the use case, like.
141 00:25:14.530 ⇒ 00:25:15.080 Xiaojie Zhang: like…
142 00:25:15.300 ⇒ 00:25:24.390 Awaish Kumar: we… if we are directly gonna get the data in Snowflake, or if we are going to get it in Snowflake, or maybe in…
143 00:25:24.940 ⇒ 00:25:26.260 Awaish Kumar: in our storage.
144 00:25:26.520 ⇒ 00:25:31.179 Awaish Kumar: And then how we want to buy it? Does it need transformation, so… This kind of thing.
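Whichever tool moves the data, the last hop into Postgres usually needs to be idempotent. That way a retried or re-delivered load does not duplicate listings. Below is a minimal sketch that builds a multi-row Postgres `INSERT ... ON CONFLICT ... DO UPDATE` statement, with these assumptions:

- The `listings` table and its column names are hypothetical.
- The `%s` placeholder style follows psycopg.
- The conflict key needs a unique index on that column.

```python
def build_upsert(table, columns, key, n_rows):
    """Build a multi-row Postgres upsert using %s placeholders.

    `table` and `columns` are interpolated directly, so they must be trusted
    constants, never user input; row values go through the placeholders.
    """
    if key not in columns:
        raise ValueError("conflict key must be one of the inserted columns")
    if n_rows < 1:
        raise ValueError("need at least one row")
    row = "(" + ", ".join(["%s"] * len(columns)) + ")"
    values = ", ".join([row] * n_rows)
    updates = ", ".join(f"{c} = EXCLUDED.{c}" for c in columns if c != key)
    # Re-delivered rows overwrite the existing row instead of duplicating it.
    action = f"DO UPDATE SET {updates}" if updates else "DO NOTHING"
    return (f"INSERT INTO {table} ({', '.join(columns)}) VALUES {values} "
            f"ON CONFLICT ({key}) {action}")
```

With psycopg, the generated statement would be executed with the flattened row values as parameters, one batch per call.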
145 00:25:31.850 ⇒ 00:25:41.339 Uttam Kumaran: Yeah, there’s also ways that, like, look, if this gets landed in a… in a data lake, we may not even need to move it to a warehouse to do transformations.
146 00:25:41.440 ⇒ 00:25:54.749 Uttam Kumaran: There’s a lot more technology that’s come out to enable transformations directly on the data lake, like, in memory, using things like DuckDB and things like that, so it is sort of, like, depending on the product requirements, like, whether it’s
147 00:25:54.900 ⇒ 00:26:01.120 Uttam Kumaran: You know, how fast things need to land, whether it’s, like, sort of, we have to stream that in, or we can batch stuff.
148 00:26:01.220 ⇒ 00:26:08.309 Uttam Kumaran: there’s a couple ways of doing this. I think, similarly, we kind of just would need to be able to talk to the…
149 00:26:08.710 ⇒ 00:26:13.270 Uttam Kumaran: The vendor themselves, and sort of learn a little bit more, so…
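Uttam notes that transformations can run directly against files in the data lake, for example with DuckDB, instead of loading everything into a warehouse first. For this use case, the core of that step is projecting a roughly 300-column listing file down to the few fields comps need. The file is streamed rather than loaded whole. The sketch shows the idea with Python's standard `csv` module as a stand-in for DuckDB. The column names are hypothetical, not CoreLogic's schema.

```python
import csv
import io

# Hypothetical subset of the wide listing file that comp generation needs.
COMP_COLUMNS = ["listing_key", "latitude", "longitude", "beds", "baths", "list_price", "status"]

def project_rows(lines, columns=COMP_COLUMNS):
    """Stream a wide CSV and yield dicts containing only `columns`.

    Rows are processed one at a time, so memory use does not grow with file size.
    """
    reader = csv.DictReader(lines)
    header = reader.fieldnames or []
    missing = [c for c in columns if c not in header]
    if missing:
        raise ValueError(f"input is missing columns: {missing}")
    for row in reader:
        yield {c: row[c] for c in columns}

if __name__ == "__main__":
    sample = io.StringIO(
        "listing_key,latitude,longitude,beds,baths,list_price,status,agent_remarks\n"
        "1,37.77,-122.41,3,2,900000,Active,Sunny corner unit\n"
    )
    print(list(project_rows(sample)))
```

A columnar engine such as DuckDB does the same projection far faster, especially on Parquet, where it can skip unused columns entirely.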
150 00:26:13.270 ⇒ 00:26:18.580 Awaish Kumar: But is that, like, 160 million rows, like, coming every day, or…
151 00:26:18.730 ⇒ 00:26:20.630 Awaish Kumar: Is it more, like, incremental?
152 00:26:25.370 ⇒ 00:26:30.320 Uttam Kumaran: I think Greg may have… dropped? I don’t know, Xiaojie, if you know?
153 00:26:31.660 ⇒ 00:26:32.300 Xiaojie Zhang: Oh.
154 00:26:34.450 ⇒ 00:26:35.529 Xiaojie Zhang: I think so.
155 00:26:36.100 ⇒ 00:26:46.679 Xiaojie Zhang: Yeah, the internet connection might be bad in a hotel, or so, or maybe his laptop battery dead. You never know what happens during travel.
156 00:26:46.910 ⇒ 00:27:02.710 Uttam Kumaran: Yeah, but… but I think that makes sense. Yeah, I think… you guys are gonna use the product to do comps, but how… like, the… how much do you need, like, the data that’s coming in every day? Like…
157 00:27:03.410 ⇒ 00:27:09.099 Xiaojie Zhang: Like, I mean, I’ll turn… like, you also work in the real estate, right?
158 00:27:09.100 ⇒ 00:27:09.630 Uttam Kumaran: Yeah.
159 00:27:09.630 ⇒ 00:27:27.499 Xiaojie Zhang: So, ideally, they’re, like, Redfin, Zillow, everyone, like, this is so simple, right? Just pulling MLS data, in some sort of real time, but it’s all coming from, like, one data source. Like, I think we find the data source, which is CoreLogic. Like, I’m not sure whether you know, like, CoreLogic, but they’re kind of centralized, like, all the…
160 00:27:27.970 ⇒ 00:27:39.820 Xiaojie Zhang: all the real estate data in the United States, but surprisingly, we look around on the market, there’s no good data provider that has that near real-time data, like Zillow.
161 00:27:39.820 ⇒ 00:27:40.410 Uttam Kumaran: Graduate.
162 00:27:40.410 ⇒ 00:27:47.290 Xiaojie Zhang: there’s no. So that kind of… like, we do try out different providers, but… the…
163 00:27:47.740 ⇒ 00:27:50.230 Xiaojie Zhang: Result is not that good.
164 00:27:50.570 ⇒ 00:27:51.230 Uttam Kumaran: Okay.
165 00:27:51.230 ⇒ 00:27:52.720 Xiaojie Zhang: But we do apples-to-apples comparisons.
166 00:27:52.720 ⇒ 00:28:03.739 Greg: Yeah, sorry, I dropped… I lost internet. Yeah, I was gonna provide a bit of extra context, like, the whole reason we’re doing this is because the provider that we are using currently
167 00:28:03.930 ⇒ 00:28:11.420 Greg: just… Hasn’t been… like, we need super accurate data, and, like, there may be…
168 00:28:11.900 ⇒ 00:28:14.970 Greg: I don’t know, 85% there, but…
169 00:28:15.180 ⇒ 00:28:29.010 Greg: we need it to be, we need it to be as high as possible. And so, yeah, CoreLogic, now called Cotality, is kind of the gold standard for this stuff. There’s not really an option that can…
170 00:28:29.010 ⇒ 00:28:38.570 Greg: give us better data, but their bulk data is obviously not real-time, obviously. They have a real-time product called Trestle.
171 00:28:38.580 ⇒ 00:28:45.799 Greg: But that is on an MLS… per MLS basis, so you have to…
172 00:28:46.500 ⇒ 00:29:00.850 Greg: broker agreements with the individual MLSs, so we have, like, I implemented Trestle for, California, and that looks great, but…
173 00:29:01.000 ⇒ 00:29:17.690 Greg: we’re not going to be able to do, like, it’s not a straightforward process to set up these agreements, in all the different states, so that’s not an option, and that’s why we went with the data provider we’re with currently, because they kind of handle those individual MLS
174 00:29:17.840 ⇒ 00:29:22.430 Greg: relationships, and we get the API access.
175 00:29:22.590 ⇒ 00:29:28.620 Greg: Across, all the MLSs, but… like…
176 00:29:30.250 ⇒ 00:29:46.759 Greg: I can see the seams in their data, and where there’s… there’s stuff that’s just… that’s just wrong, or missing, and it’s… we’re, like, cobbling together, like, a couple of other smaller APIs to try and fill the gaps in their stuff, like…
177 00:29:46.760 ⇒ 00:30:01.119 Greg: their days on market is wrong for properties because of, you know, sale dates, listing dates, being off, and so we’re pulling those from somewhere else, and it’s just… it’s a nightmare. So… so it’s kind of like, well.
178 00:30:01.120 ⇒ 00:30:14.159 Greg: we’re gonna get serious, and we’re getting data from the best source we can. Now we need to work out how we then turn that into our own, API that can, get rid of this
179 00:30:14.210 ⇒ 00:30:16.450 Greg: of a… Chocolata.
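Greg's example of bad upstream data is days on market being wrong because listing and sale dates are off. Days on market is typically derived from exactly those dates, so an error in either one shows up directly in the metric. Below is a minimal sketch of that derivation with basic sanity checks. The rule it uses (count to the close date for sold listings, to today for active ones) is an assumption, since MLSs differ in how they define the metric.

```python
from datetime import date
from typing import Optional

def days_on_market(list_date: date, close_date: Optional[date], today: date) -> Optional[int]:
    """Days between listing and close (or `today` if still active).

    Returns None when the dates are inconsistent, so bad records can be
    flagged for review instead of silently producing a wrong number.
    """
    end = close_date or today
    if list_date > today or end < list_date:
        return None
    return (end - list_date).days
```

Returning `None` for inconsistent dates is a design choice. It keeps suspect records out of comps rather than letting a negative or future-dated value through.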
180 00:30:17.960 ⇒ 00:30:31.940 Uttam Kumaran: Yeah, I think for us, the biggest thing is just to understand, like, what the SLAs are, from the product requirements. You know, whether, like, both on how do we land it, how do we transform it, and then how do we get that back into Postgres, and then what is…
181 00:30:32.310 ⇒ 00:30:40.750 Uttam Kumaran: for, like, what are the minimum SLAs we want to hit on fresh data. And then also, yeah, understanding, like, how they’re sending us that and the ergonomics of that is going to be…
182 00:30:41.350 ⇒ 00:30:56.430 Uttam Kumaran: basically the core here. The other thing is, like, again, building some redundancy. The one thing is, if this is touching product, of course, like, there’s a lot higher degree of scrutiny than if this is just for reporting, and so…
183 00:30:57.610 ⇒ 00:31:02.270 Uttam Kumaran: Really important for us to just make sure that there’s Different ways for…
184 00:31:02.420 ⇒ 00:31:15.749 Uttam Kumaran: you know, the product to continue to access this data, and there’s, like, a… basically, we have a clear understanding of, like, what, how fresh this data is, and, like, what part you can query, and how often we’re syncing it, so…
185 00:31:16.160 ⇒ 00:31:17.590 Uttam Kumaran: But okay, makes sense.
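Uttam's point about knowing how fresh the data is, and what the minimum freshness SLA should be, can be made concrete. One way is a check that compares the latest successful sync time against an agreed threshold. Below is a minimal sketch; the 24-hour default is an assumption, because the actual SLAs still have to come from the product requirements.

```python
from datetime import datetime, timedelta, timezone

def freshness_status(last_synced_at: datetime, now: datetime,
                     sla: timedelta = timedelta(hours=24)) -> tuple[bool, timedelta]:
    """Return (within_sla, age) for the most recent successful sync.

    Both datetimes must be timezone-aware so timestamps from different
    regions compare correctly.
    """
    if last_synced_at.tzinfo is None or now.tzinfo is None:
        raise ValueError("use timezone-aware datetimes")
    age = now - last_synced_at
    return age <= sla, age

if __name__ == "__main__":
    now = datetime(2025, 12, 10, 12, tzinfo=timezone.utc)
    ok, age = freshness_status(now - timedelta(hours=30), now)
    if not ok:
        print(f"MLS listings are stale: last sync {age} ago")
```

The API serving comps could expose the same `age` value, so callers know how fresh the data they are querying is.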
186 00:31:22.640 ⇒ 00:31:32.130 Greg: Yeah, like… I guess the landing and transforming is…
187 00:31:34.990 ⇒ 00:31:46.019 Greg: the SLAs there are a bit different compared to our API that's then serving the results from Postgres. There needs to be
188 00:31:46.620 ⇒ 00:31:57.389 Greg: super high availability, and it needs to be super fast. We basically want to replicate a Trestle-like experience, so we want to be generating our comps in…
189 00:31:58.100 ⇒ 00:32:02.049 Greg: I want them to be under a second.
190 00:32:02.050 ⇒ 00:32:08.749 Uttam Kumaran: Really? I think the comp generation piece, yeah, I'm less concerned about, because I think that'll be…
191 00:32:08.750 ⇒ 00:32:09.350 Greg: Yeah, yeah, yeah, yeah.
192 00:32:09.600 ⇒ 00:32:10.999 Uttam Kumaran: Yeah, more on, like.
193 00:32:11.970 ⇒ 00:32:27.440 Uttam Kumaran: how is the 130 million coming in? How much of it can we do incrementally? Can we transform in flight and then land it back? And then understanding, okay, what are the deltas coming in every day, and then
194 00:32:28.790 ⇒ 00:32:35.360 Uttam Kumaran: basically trying to phase it to see, okay, here are some options, you know? So yeah, it's a cool problem, definitely. Makes sense.
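The incremental-load question, applying each day's deltas instead of reprocessing all 130 million records, can be illustrated with a small in-memory merge. This is a sketch under assumptions: it supposes each delta record carries a stable listing key, a modification timestamp, and a deletion flag, none of which was confirmed for CoreLogic's feed in this meeting, and the field names are hypothetical. At production scale the same logic would run in Spark or a similar engine rather than in a Python dict.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Dict, Iterable


@dataclass(frozen=True)
class ListingRecord:
    listing_key: str        # hypothetical stable identifier
    modified_at: datetime   # hypothetical last-modified timestamp from the feed
    list_price: int
    deleted: bool = False   # hypothetical tombstone flag for withdrawn listings


def apply_deltas(snapshot: Dict[str, ListingRecord],
                 deltas: Iterable[ListingRecord]) -> Dict[str, ListingRecord]:
    """Merge a batch of deltas into the snapshot, keeping the newest version per key."""
    merged = dict(snapshot)
    for rec in deltas:
        current = merged.get(rec.listing_key)
        if current is not None and rec.modified_at <= current.modified_at:
            continue  # stale or duplicate update: keep what we already hold
        # Deletions are stored as tombstones so a late, older update cannot resurrect them.
        merged[rec.listing_key] = rec
    return merged


def live_listings(snapshot: Dict[str, ListingRecord]) -> Dict[str, ListingRecord]:
    """Drop tombstoned records when serving."""
    return {k: v for k, v in snapshot.items() if not v.deleted}


t0, t1 = datetime(2025, 12, 9), datetime(2025, 12, 10)
snapshot = {"A": ListingRecord("A", t0, 500_000), "B": ListingRecord("B", t0, 750_000)}
deltas = [
    ListingRecord("A", t1, 480_000),                # price drop
    ListingRecord("B", t1, 750_000, deleted=True),  # listing withdrawn
    ListingRecord("C", t1, 620_000),                # new listing
    ListingRecord("A", t0, 999_999),                # out-of-order stale update, ignored
]
merged = apply_deltas(snapshot, deltas)
print(sorted(live_listings(merged)))  # ['A', 'C']
print(merged["A"].list_price)         # 480000
```

Comparing timestamps per key makes the merge safe to re-run on a re-delivered delta file, which matters if the vendor resends a day's batch.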
195 00:32:37.950 ⇒ 00:32:41.269 Greg: Yeah, yeah, so we kind of need to
196 00:32:42.280 ⇒ 00:32:47.309 Greg: get some answers to a bunch of those questions. I need to
197 00:32:47.550 ⇒ 00:32:51.099 Greg: just set up some time with them.
198 00:32:51.100 ⇒ 00:32:55.790 Uttam Kumaran: You guys already have the contact at CoreLogic, and they've basically given you…
199 00:32:55.790 ⇒ 00:32:56.689 Greg: Oh, yeah, yeah, yeah.
200 00:32:56.830 ⇒ 00:33:01.880 Uttam Kumaran: Okay, they've given you a set of options on how they can load it. So yeah, for this, it's basically,
201 00:33:02.030 ⇒ 00:33:04.999 Uttam Kumaran: we would just do a couple of proofs of concept and see
202 00:33:05.280 ⇒ 00:33:14.199 Uttam Kumaran: what's possible, you know? Whether we go directly to Snowflake, or drop this in a data lake and do something in memory, there are a couple of options.
203 00:33:14.630 ⇒ 00:33:18.010 Awaish Kumar: We could also, if it's coming to S3 directly,
204 00:33:18.240 ⇒ 00:33:22.900 Awaish Kumar: use Spark to transform it and store it directly to the
205 00:33:23.310 ⇒ 00:33:26.900 Awaish Kumar: sources, and we wouldn't need to move it to the warehouse in that case.
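Awaish's suggestion, transforming the files straight from S3 and writing the results into the operational database without a warehouse hop, ends with an idempotent upsert into the serving table. The sketch below uses Python's built-in sqlite3 module purely as a stand-in for Postgres so it runs anywhere; both SQLite (3.24+) and PostgreSQL accept the INSERT ... ON CONFLICT ... DO UPDATE form shown, though Postgres drivers such as psycopg use %s placeholders instead of ?. The table and column names are hypothetical, and in the setup described the transform step would be a Spark job rather than a Python generator.

```python
import sqlite3
from typing import Iterable, Iterator, Tuple

# Hypothetical serving table: one row per listing, keyed by listing_key.
DDL = """
CREATE TABLE IF NOT EXISTS listings (
    listing_key TEXT PRIMARY KEY,
    list_price  INTEGER NOT NULL,
    status      TEXT NOT NULL
)
"""

UPSERT = """
INSERT INTO listings (listing_key, list_price, status)
VALUES (?, ?, ?)
ON CONFLICT (listing_key) DO UPDATE SET
    list_price = excluded.list_price,
    status     = excluded.status
"""


def transform(raw: Iterable[Tuple[str, str, str]]) -> Iterator[Tuple[str, int, str]]:
    """Toy transform standing in for the Spark job: parse prices, normalize status."""
    for key, price, status in raw:
        yield key.strip(), int(price.replace(",", "")), status.strip().lower()


def load(conn: sqlite3.Connection, rows: Iterable[Tuple[str, int, str]]) -> None:
    """Upsert transformed rows in a single transaction; re-running is harmless."""
    with conn:
        conn.executemany(UPSERT, rows)


conn = sqlite3.connect(":memory:")
conn.execute(DDL)
load(conn, transform([("A1", "500,000", "Active "), ("B2", "750,000", "Pending")]))
load(conn, transform([("A1", "480,000", "Active")]))  # a re-delivered record updates in place
print(conn.execute("SELECT * FROM listings ORDER BY listing_key").fetchall())
# [('A1', 480000, 'active'), ('B2', 750000, 'pending')]
```

Keying the upsert on the listing identifier means a repeated or partially failed load can simply be retried, which supports the redundancy Uttam raised for product-facing data.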
206 00:33:26.900 ⇒ 00:33:29.810 Uttam Kumaran: Yeah, that's kind of what I'm hoping happens:
207 00:33:30.140 ⇒ 00:33:36.630 Uttam Kumaran: that we don't actually need to go through Snowflake. It depends on how they can drop it to us, and whether we can do everything
208 00:33:36.970 ⇒ 00:33:38.380 Uttam Kumaran: directly in the lake.
209 00:33:40.110 ⇒ 00:33:41.150 Uttam Kumaran: But we’ll see.
210 00:33:41.850 ⇒ 00:33:42.839 Greg: So, can you…
211 00:33:43.040 ⇒ 00:33:52.290 Greg: tell me a bit more about that? Like, what would be the deciding factor if you had to
212 00:33:54.890 ⇒ 00:33:59.400 Greg: choose whether to use Snowflake or not in that kind of scenario?
213 00:34:00.220 ⇒ 00:34:14.409 Awaish Kumar: Yeah, for Snowflake, it depends on the use case. If the only use case is to transform it and put it back into Postgres, where it will be used by the product,
214 00:34:14.600 ⇒ 00:34:17.720 Awaish Kumar: then we don't need to involve the warehouse.
215 00:34:17.820 ⇒ 00:34:32.119 Awaish Kumar: But if we need the same data for analytical reporting, or we want to do a deeper dive into the data to figure out trends or things like that, then we might need to store it in a data warehouse.
216 00:34:32.770 ⇒ 00:34:43.239 Greg: Yeah, yeah, see, I don't know if you have any different thoughts, but we don't have product
217 00:34:44.050 ⇒ 00:34:53.670 Greg: requirements around that kind of analytics on the listings data at the moment. It's…
218 00:34:54.230 ⇒ 00:34:59.180 Greg: Like, Sean, like, the… the possibilities are endless.
219 00:34:59.450 ⇒ 00:35:00.550 Uttam Kumaran: Yeah, we could do…
220 00:35:00.550 ⇒ 00:35:19.139 Greg: But we don't have those use cases yet. Our use case is very narrow and focused: making amazing comps that are super accurate and super fast, that build trust with agents and give them the best
221 00:35:19.410 ⇒ 00:35:33.550 Greg: comps that money can buy, really. That's the thing we're doing now. Obviously, there's lots of stuff that would come
222 00:35:33.730 ⇒ 00:35:38.010 Greg: later, but that's not defined at this stage.
223 00:35:38.950 ⇒ 00:35:54.699 Xiaojie Zhang: Yeah, I agree with you, Greg. As Greg said, the possibilities are endless. Since we're building a tool for real estate, I don't see why we wouldn't start coming up with analysis for the area,
224 00:35:54.700 ⇒ 00:36:05.669 Xiaojie Zhang: the neighborhood, all that type of stuff, because we have first-hand data flowing into our data warehouse. I think it would be a waste if we just transformed the data and made comps.
225 00:36:05.670 ⇒ 00:36:10.710 Xiaojie Zhang: But right now, for the foreseeable future, I don't think we really have big
226 00:36:10.980 ⇒ 00:36:14.779 Xiaojie Zhang: analysis-based queries to run. We want to transform the data
227 00:36:14.940 ⇒ 00:36:31.749 Xiaojie Zhang: in a certain way and put it back into our production database, very likely Postgres or maybe Elasticsearch, which we'll decide later, and then basically run super-fast queries to pull the most accurate,
228 00:36:31.810 ⇒ 00:36:38.309 Xiaojie Zhang: most up-to-date data possible. That's for the foreseeable future.
229 00:36:39.300 ⇒ 00:37:00.349 Uttam Kumaran: Yeah, the best thing here is that you don't actually lose by going in one direction. Snowflake, of course, is a data warehouse. It's tuned for analytics, like large selects over large amounts of data, not point lookups. And so even if we were to do the transformation in a data lake, you can still put Snowflake on top of that, so
230 00:37:00.370 ⇒ 00:37:02.470 Uttam Kumaran: I kind of want to understand:
231 00:37:02.490 ⇒ 00:37:13.309 Uttam Kumaran: if, for example, they can only share the data via Snowflake, then that changes the architecture a little bit, but if we can decouple that, then both are still achievable. And I totally hear you:
232 00:37:13.310 ⇒ 00:37:25.850 Uttam Kumaran: there are probably tons of analytics use cases and other use cases to build on top of that data, but the primary one to solve is getting those records into Postgres, in a transformed state, as fast as possible.
233 00:37:25.900 ⇒ 00:37:31.529 Uttam Kumaran: There are some options for us to try that don't involve Snowflake,
234 00:37:31.800 ⇒ 00:37:38.059 Uttam Kumaran: so I think it's worth seeing what the trade-offs are between those.
235 00:37:39.340 ⇒ 00:37:40.549 Uttam Kumaran: That makes sense.
236 00:37:43.150 ⇒ 00:37:44.280 Uttam Kumaran: Cool. I mean, and.
237 00:37:44.280 ⇒ 00:37:49.729 Greg: Yeah, no, they said we can get it from
238 00:37:49.940 ⇒ 00:37:58.139 Greg: SFTP, or I think they can chuck it in S3 or whatever. It certainly doesn't have to go via…
239 00:37:58.140 ⇒ 00:38:15.570 Uttam Kumaran: As long as it ends up in S3 or a data lake, we can always move it to a data warehouse. But some of the technology that's come out recently allows you to do transformation in flight, which lets us dramatically reduce the SLA. If we were to move this to Snowflake,
240 00:38:15.570 ⇒ 00:38:22.519 Uttam Kumaran: run transformations, and then push it back to S3 or back to Postgres, that's just another step.
241 00:38:22.690 ⇒ 00:38:25.880 Uttam Kumaran: I think we can achieve much faster SLAs
242 00:38:26.030 ⇒ 00:38:28.360 Uttam Kumaran: for the product use, for sure.
243 00:38:28.520 ⇒ 00:38:32.860 Uttam Kumaran: So basically, we would just do a couple of
244 00:38:33.190 ⇒ 00:38:51.299 Uttam Kumaran: demos and show what's possible. It's also the number of hoops that data has to jump through every time, and understanding: are they landing flat files every day? Are they landing the deltas? When does that come in? How can we trigger…
245 00:38:51.640 ⇒ 00:38:59.289 Uttam Kumaran: I feel like in the last few years, there's been a lot more technology around data lakes and in-memory transformations for us to try here.
246 00:38:59.960 ⇒ 00:39:00.519 Uttam Kumaran: [inaudible]
247 00:39:00.920 ⇒ 00:39:02.810 Uttam Kumaran: A few years before that.
248 00:39:05.380 ⇒ 00:39:14.109 Awaish Kumar: Yeah, the common architecture is that people use PySpark on top of the data lake, or S3 storage,
249 00:39:14.310 ⇒ 00:39:16.619 Awaish Kumar: because it gives us the… what?
250 00:39:17.250 ⇒ 00:39:24.859 Awaish Kumar: the power to process it in parallel, in seconds. It can also
251 00:39:25.180 ⇒ 00:39:31.490 Awaish Kumar: process billions of rows, and then we can also store the results directly to operational databases.
252 00:39:32.050 ⇒ 00:39:32.650 Uttam Kumaran: Yeah.
253 00:39:35.120 ⇒ 00:39:40.620 Uttam Kumaran: Cool. And then the last piece, the last phase, was just talking about
254 00:39:40.800 ⇒ 00:39:48.809 Uttam Kumaran: underbuilt. I think one of the things we talked a little bit about was just understanding
255 00:39:49.000 ⇒ 00:39:58.379 Uttam Kumaran: what folks are looking for in terms of new addresses. Where's the demand for more underbuilt data?
256 00:39:58.380 ⇒ 00:40:16.250 Uttam Kumaran: Where should we prioritize the next cities to build? These were just a couple of things we discussed; we didn't really go super deep. This is the product where I was like, okay, that's super, super awesome. I'm kind of interested, on the product side, to understand the roadmap here and how data can
257 00:40:16.610 ⇒ 00:40:22.800 Uttam Kumaran: help with that. Are there other parts of this process that also need similar
258 00:40:23.470 ⇒ 00:40:28.700 Uttam Kumaran: SOA or data manipulation help? What do you guys think?
259 00:40:31.670 ⇒ 00:40:41.779 Xiaojie Zhang: Yeah, unfortunately, both Gregory and I are not experts, not super detailed on the build side. At least I'm not; I'm not sure about Gregory.
260 00:40:41.780 ⇒ 00:40:43.449 Greg: Definitely not. I’m definitely not.
261 00:40:43.450 ⇒ 00:40:44.490 Xiaojie Zhang: I have some lightness.
262 00:40:44.620 ⇒ 00:41:02.150 Xiaojie Zhang: I've looked into it. I mean, for my first project at Breezy, I've been testing things around and running some stress tests for underbuild. My understanding is that right now, the hard part is data parsing: gathering
263 00:41:03.290 ⇒ 00:41:05.640 Xiaojie Zhang: different PDFs, right?
264 00:41:05.640 ⇒ 00:41:11.689 Uttam Kumaran: Well, that's why I'm interested to know what you guys are using under the hood for OCR or extraction, if you know.
265 00:41:12.130 ⇒ 00:41:15.990 Xiaojie Zhang: Yes, I think that's definitely a good question.
266 00:41:15.990 ⇒ 00:41:17.510 Uttam Kumaran: That's a million-dollar question, too.
267 00:41:17.510 ⇒ 00:41:25.949 Xiaojie Zhang: Yeah, to be very honest and straightforward, at the moment we're doing things the hardcore way, which is very manual.
268 00:41:25.950 ⇒ 00:41:30.400 Xiaojie Zhang: We are thinking about, or moving towards, a more
269 00:41:30.400 ⇒ 00:41:32.100 Xiaojie Zhang: large language model-based
270 00:41:32.830 ⇒ 00:41:33.310 Uttam Kumaran: Great.
271 00:41:33.310 ⇒ 00:41:34.150 Xiaojie Zhang: OCR…
272 00:41:34.150 ⇒ 00:41:38.330 Uttam Kumaran: I have some things you guys should try.
273 00:41:38.500 ⇒ 00:41:56.309 Uttam Kumaran: I don't know if you're doing a spike around this, but we evaluated a bunch of tools for extraction and retrieval from tons of PDFs, diagrams, text,
274 00:41:56.520 ⇒ 00:42:06.929 Uttam Kumaran: all types of stuff, and there are a couple of great platforms that do this extremely well that you should evaluate if you're doing an evaluation at some point.
275 00:42:07.050 ⇒ 00:42:08.400 Greg: United. Yeah.
276 00:42:09.060 ⇒ 00:42:24.870 Xiaojie Zhang: Because that's definitely on our roadmap. We're actually building it right now. I'm more on the pipeline and AI side for now, because that's kind of the China team's domain. I'm just trying to get things up to speed and sustainable.
277 00:42:24.870 ⇒ 00:42:29.419 Xiaojie Zhang: And I will also spare some time on the underbuild side, basically to
278 00:42:29.880 ⇒ 00:42:48.569 Xiaojie Zhang: spin up the workflow, or an automated workflow. We might still need to do some scraping or manually pull this data in, but I'm 100% with you: we need to extract this data, and I think it's very domain-specific. You need to understand the
279 00:42:48.570 ⇒ 00:42:50.509 Xiaojie Zhang: The building code, the char…
280 00:42:50.510 ⇒ 00:42:54.270 Greg: There's such a wide variety, because it's at a county level,
281 00:42:54.270 ⇒ 00:42:54.940 Xiaojie Zhang: Exactly.
282 00:42:54.940 ⇒ 00:43:02.429 Greg: or even lower. There's just a huge variance in the way this…
283 00:43:02.990 ⇒ 00:43:15.180 Xiaojie Zhang: Yeah, but we are thinking about… I mean, it's definitely not only a technology problem. I think it's technology plus operations, some sort of hybrid approach.
284 00:43:15.180 ⇒ 00:43:22.020 Uttam Kumaran: Procurement of these, like, where does it land? But I just think this problem 5 years ago would have been way…
285 00:43:22.020 ⇒ 00:43:23.759 Xiaojie Zhang: It’s unsolvable. It’s unsolvable.
286 00:43:23.760 ⇒ 00:43:24.890 Uttam Kumaran: I don’t know.
287 00:43:25.450 ⇒ 00:43:34.259 Uttam Kumaran: I do think you have at least some path. There's a lot of great OCR and LLM image
288 00:43:34.430 ⇒ 00:43:41.069 Uttam Kumaran: extraction tooling now that you can consider, building a system that layers a lot of that on and compares,
289 00:43:41.280 ⇒ 00:43:42.949 Uttam Kumaran: I don’t know, I just think it was.
290 00:43:42.950 ⇒ 00:43:52.679 Xiaojie Zhang: Or we could even condense a 50-page building code guideline into a 3-page TL;DR and hand that to our…
291 00:43:52.970 ⇒ 00:43:53.520 Uttam Kumaran: Exactly.
292 00:43:53.520 ⇒ 00:43:54.190 Xiaojie Zhang: Correct.
293 00:43:54.220 ⇒ 00:44:06.670 Uttam Kumaran: That's going to save them tons of time. As long as your level of accuracy is there, and you know what's important and what the proof of concept is for
294 00:44:06.980 ⇒ 00:44:18.510 Uttam Kumaran: getting every building, then you can start to layer on more. Also, these are PDFs, they don't go anywhere, so as long as you get them stored, you can layer on more and more
295 00:44:18.740 ⇒ 00:44:23.919 Uttam Kumaran: of the latest that's coming out from Google, from Contextual, on
296 00:44:24.200 ⇒ 00:44:30.550 Uttam Kumaran: extraction. I feel like in the last 2 years especially, there's been a lot in this world.
297 00:44:30.550 ⇒ 00:44:31.240 Greg: Oh, yeah.
298 00:44:31.240 ⇒ 00:44:32.910 Uttam Kumaran: Cool.
299 00:44:34.090 ⇒ 00:44:44.410 Uttam Kumaran: I can tell. This is what was so exciting to hear about, because I was like, I don't think you could have built this before; it would have been very, very hard. But it is a unique data challenge, because of course
300 00:44:45.070 ⇒ 00:44:47.770 Uttam Kumaran: this is really tough data to…
301 00:44:48.060 ⇒ 00:44:50.810 Greg: Yeah, let’s do it, but just for a very small area.
302 00:44:51.390 ⇒ 00:44:53.130 Uttam Kumaran: Yeah, yeah.
303 00:44:54.290 ⇒ 00:44:57.830 Uttam Kumaran: Cool. But yeah, it wasn’t able to be scaled.
304 00:44:57.830 ⇒ 00:45:00.869 Greg: Sort of out beyond that, sort of.
305 00:45:01.300 ⇒ 00:45:05.219 Greg: small LA area, and that's what we're doing.
306 00:45:05.220 ⇒ 00:45:24.709 Uttam Kumaran: Yeah, I'm interested in how you guys are… I mean, there's one part of this we wrote here, which is prioritizing where to go next, but when you go into a new area, how do you leverage AI to actually automate it? Again, I assume these places don't have APIs, so you're either logging into a county database,
307 00:45:24.820 ⇒ 00:45:31.409 Uttam Kumaran: scraping, and moving PDFs to something… I don't know, it's an interesting data acquisition problem, too.
308 00:45:31.410 ⇒ 00:45:38.029 Greg: Yeah, PDFs or even Word docs on county websites and stuff, yeah.
309 00:45:38.340 ⇒ 00:45:40.499 Greg: But it's also designed so there's no API access.
310 00:45:40.500 ⇒ 00:45:47.930 Uttam Kumaran: So yeah, how do you know you got it right, you know? But it's cool.
311 00:45:48.360 ⇒ 00:45:57.399 Uttam Kumaran: I think you guys may be some of the first people I've seen actually doing this. It may just be a completely proprietary data set. I don't know anybody who…
312 00:45:57.840 ⇒ 00:46:02.190 Uttam Kumaran: I don't know, maybe Zillow or Redfin or those guys are doing something like this, but
313 00:46:02.440 ⇒ 00:46:04.149 Uttam Kumaran: I don't know. Yeah, it's pretty cool.
314 00:46:04.980 ⇒ 00:46:07.330 Greg: As far as I know, we're the only ones doing it.
315 00:46:07.330 ⇒ 00:46:12.870 Uttam Kumaran: The webinar was so cool to watch, how that was getting used, and then they produced the
316 00:46:13.010 ⇒ 00:46:17.320 Uttam Kumaran: plans and things like that. I can sort of see the through line, for sure.
317 00:46:22.200 ⇒ 00:46:27.810 Greg: Yeah, it's an interesting but not easy problem to solve, for sure.
318 00:46:29.340 ⇒ 00:46:46.879 Uttam Kumaran: Yeah, and then, since I know we just have a few minutes, maybe to give you a sense of how we work: we're all engineers, so we typically operate on one-week sprints. We run internal stand-ups for basically all of our clients, and then we typically at least…
319 00:46:46.880 ⇒ 00:46:55.999 Uttam Kumaran: Of course, we work in Slack. Our whole team is really remote and async-friendly, so we do a lot of Looms, Slacks,
320 00:46:56.000 ⇒ 00:47:13.760 Uttam Kumaran: and everybody does PR reviews and things like that. We try to meet with our client teams at least once a week, because I know things get busy and we just don't want to go more than a week without saying hi, but we can even do daily stand-ups or whatever is helpful;
321 00:47:13.760 ⇒ 00:47:32.440 Uttam Kumaran: we operate at your speed. A lot of us on the team have worked in product startups before, so no fear there. I think the biggest thing for us to understand, out of these three items, is the phasing. So for the product analytics work, maybe
322 00:47:32.440 ⇒ 00:47:47.790 Uttam Kumaran: Xiaojie, maybe you, Jim Z, and other folks on the product team: how do you understand what's going on? Then there's also: what is the initial milestone on the MLS data side? What are some target dates we need to hit? And what are the criteria for
323 00:47:48.040 ⇒ 00:47:54.569 Uttam Kumaran: a proof of concept, an MVP, and a V1? And then I think the underbuilt work is for us to
324 00:47:54.970 ⇒ 00:47:56.350 Uttam Kumaran: probably poke at
325 00:47:56.540 ⇒ 00:48:04.279 Uttam Kumaran: third, and just see what's going on, and maybe assist with evaluating tools or proofs of concept. Does that sort of
326 00:48:05.090 ⇒ 00:48:06.160 Uttam Kumaran: cover
327 00:48:06.440 ⇒ 00:48:14.570 Uttam Kumaran: the scope? I guess we're only talking about the next couple of months, right? But yeah, that's what's top of mind for us.
328 00:48:16.670 ⇒ 00:48:40.310 Xiaojie Zhang: Yeah, I think we just want to get things rolling. To be very honest, we're a startup, we're scrappy, and there's tons of other random shit going on. For us, by the end of this year, we're trying to launch Android, which is big enough work in front of us. We have iOS running, but I'm always pushing GMC, saying, okay, data solutions, we should
329 00:48:40.310 ⇒ 00:48:55.209 Xiaojie Zhang: take them on as early as possible to gather user signals, all that. That's why I brought in Mixpanel and Statsig, because I have experience with Statsig. We're not doing A/B testing at the moment, but at least it's a very cheap way to get this data,
330 00:48:55.210 ⇒ 00:48:57.810 Xiaojie Zhang: a sort of data warehousing, like…
331 00:48:57.810 ⇒ 00:48:58.470 Uttam Kumaran: Yes.
332 00:48:58.470 ⇒ 00:49:03.160 Xiaojie Zhang: Almost 10 years ago, I was at Airbnb, and it took a 50-person team to build this
333 00:49:03.160 ⇒ 00:49:04.519 Uttam Kumaran: Oh, I know, I know.
334 00:49:04.520 ⇒ 00:49:06.420 Xiaojie Zhang: A/B testing framework. Like, it was…
335 00:49:06.420 ⇒ 00:49:07.229 Uttam Kumaran: Absolutely, right?
336 00:49:07.230 ⇒ 00:49:24.729 Xiaojie Zhang: Right now… yeah, right now I get 1 million events free per month, which is a dream, and we're not a big enough company to get that much traffic, so we're basically getting things for free. Why not? That's kind of my thinking. And I'm pretty sure, of course, Greg is
337 00:49:24.730 ⇒ 00:49:32.509 Xiaojie Zhang: setting the plan, and we're going to handle that intense data problem. So we're forming a data workstream
338 00:49:32.600 ⇒ 00:49:37.060 Xiaojie Zhang: along the way, and I think it's almost the perfect time to discuss how we're going to
339 00:49:37.330 ⇒ 00:49:50.099 Xiaojie Zhang: collaborate and coordinate in the near future. I think Greg and I will bring our thoughts on this. Yeah.
340 00:49:50.100 ⇒ 00:50:13.070 Xiaojie Zhang: From this chat, I think we learned a ton. You guys are super experienced with the tools out there, and you seem to be super experienced on the real estate side as well, which is great. We'll have an internal discussion about how we're going to work together, what the pace is, what the goal is, because right now we're a little bit swamped by
341 00:50:13.070 ⇒ 00:50:16.070 Xiaojie Zhang: some other product areas.
342 00:50:16.070 ⇒ 00:50:29.009 Uttam Kumaran: No surprise at all to us. I've worked in startups my whole career, so I have no fear. I think the best way is to just tell us what the golden goose is here, and then give us a hard deadline.
343 00:50:29.260 ⇒ 00:50:45.360 Uttam Kumaran: And we'll work backwards from there. Of course, we're used to dealing with shifting priorities and all that. What you get with us is that we're very unbiased about tools. We pick the tools that accomplish the job. We don't get paid by any of these vendors; we just like to work with the best
344 00:50:45.360 ⇒ 00:50:59.869 Uttam Kumaran: tools, because it makes life a lot easier. So we're more than happy to be a partner in deciding what the infra is. But also, you guys have a timeline you're trying to hit, so however we can help get you there, for sure.
345 00:51:01.390 ⇒ 00:51:02.160 Xiaojie Zhang: Cool.
346 00:51:02.160 ⇒ 00:51:03.309 Greg: Yeah, sounds great.
347 00:51:04.040 ⇒ 00:51:22.740 Uttam Kumaran: Okay, cool. I'm going to take our notes and actually edit some of the stuff in that document, and then maybe send another copy out. I'll send a short summary of our conversation to the channel, and then, if you guys want to let me know next steps… or I'll send a note to the channel and let everyone know that we chatted, and
348 00:51:22.740 ⇒ 00:51:29.130 Uttam Kumaran: yeah, after this conversation, I have a lot more context, at least on the first two core phases.
349 00:51:29.130 ⇒ 00:51:35.669 Uttam Kumaran: I think getting context on the timeline and the requirements would be even better, but
350 00:51:35.980 ⇒ 00:51:38.890 Uttam Kumaran: we're ready to go. Totally.
351 00:51:39.290 ⇒ 00:51:40.540 Xiaojie Zhang: Awesome, awesome.
352 00:51:41.110 ⇒ 00:51:41.790 Uttam Kumaran: Okay.
353 00:51:42.000 ⇒ 00:51:43.559 Uttam Kumaran: Alright, that sounds great.
354 00:51:43.990 ⇒ 00:51:45.809 Uttam Kumaran: Yeah, safe travels, Greg.
355 00:51:46.280 ⇒ 00:51:50.569 Greg: Yeah, it is. Yeah. Yeah. Yeah. Yeah, for sure.
356 00:51:51.550 ⇒ 00:51:52.900 Uttam Kumaran: Okay. Cool. Thank you.
357 00:51:53.220 ⇒ 00:51:54.469 Xiaojie Zhang: Alright, thanks folks.
358 00:51:54.470 ⇒ 00:51:54.930 Awaish Kumar: Thank you.
359 00:51:54.930 ⇒ 00:51:55.839 Greg: Thanks a lot.
360 00:51:55.840 ⇒ 00:51:56.340 Xiaojie Zhang: Ruin.
361 00:51:56.340 ⇒ 00:51:56.770 Greg: Bye.