Uttam Kumaran: Hey, how are you?
Xiaojie Zhang: I'm doing good, how are you?
Uttam Kumaran: Good, nice to meet you.
Xiaojie Zhang: Oops, sorry, I have too many note-takers in here. One of the features of the system we're trying to build is an AI notetaker. I have two.
Uttam Kumaran: I have one for the dev environment, another for product.
Xiaojie Zhang: Lucky — he only has the product one, or the dev one.
Uttam Kumaran: Are you guys going to do anything for phone calls?
Xiaojie Zhang: We're trying to integrate that with phone calls. Right now, the role model on the AI note-taker side is Granola.
Uttam Kumaran: I was gonna say...
Xiaojie Zhang: There's a reason to delay support.
Uttam Kumaran: I haven't tried the phone call feature, because I always forget to.
Xiaojie Zhang: I haven't either — it needs some certain setup. I have Granola on my laptop as well, and I prefer it, but we have our own solution, so we're just dogfooding it.
Uttam Kumaran: One thing we do is use Zoom's native recording. It's pretty good. The nice thing is it doesn't need a separate note taker, because it's built into the platform, but it's not very popular — I feel they didn't do a good job with it. You have to build on top of it; we built on the SDK to grab files and recordings, but it's a little bit cleaner of a solution. Granola is the first one I've seen that's trying to do phone calls. Still, what I've started to do is take the phone call on speaker and just turn Granola on on my laptop. Anyway — really nice to meet you both, I appreciate the time. I'm not sure how much context y'all have, but I sent a little bit of a document over. I'll give you some background on Brainforge: we're a data and AI consultancy.
Uttam Kumaran: We do a lot of work setting up data infrastructure, product analytics, everything around data and reporting. A lot of the work we do covers things that both Jim Z and Seagal mentioned were really important: setting up the reporting stack — warehouse, ETL, modeling — as well as considering options for product analytics, and then understanding reporting. The other workstream he mentioned was this unique MLS dataset you guys are working with. So I just wanted to say hi today, see how many questions I could answer, and get into your vision for both of those workstreams. The other thing is I was able to go to a webinar with a demo for a bunch of new beta users — it was just the day after I met them, and they were like, can you hop on this? So I tuned in and listened. It's a cool product. I have a lot of friends in residential and commercial real estate; I worked at WeWork on the data team for a while.
Xiaojie Zhang: Oh, here we go. Nice. So you know what's going on on the real estate side.
Uttam Kumaran: And what you guys are doing with pulling building plans is awesome — I don't think I've seen anything like that. And of course, you're hyper-focused on the real estate agent and everything around that person. It was just cool to get a chance to listen in.
Xiaojie Zhang: Cool, thanks for that. We can do a quick intro for our side. I'm Xiaojie. Greg and I are on the other side of the planet from you folks — I'm in China at the moment, and Greg is in Australia...
Greg: Not right now, I'm not.
Xiaojie Zhang: Sorry, you're in Melbourne?
Greg: I'm usually in Melbourne, but right now I'm in Eastern Malaysia.
Xiaojie Zhang: Oh, nice. A little bit of fun there. Great.
Greg: I only landed a couple of hours ago.
Xiaojie Zhang: Oh, wow. So we're on the other side of the planet, which is great. We have a small dev center in China, and I'm leading and overseeing the folks here. I have some experience on the data side — I worked at Airbnb before, as a product engineer plus data engineer for a while, and then went back to product land, because I'm more passionate about building software for customers, but I get the idea of how complicated it is.
Uttam Kumaran: You guys had a good data stack there. I used to read the Airbnb blogs.
Xiaojie Zhang: Oh, the Airflow and Superset stuff. I would say it's good open source software, if you have the choice. But we are starting things in our...
Uttam Kumaran: That was back then, right? I feel like now there are a lot of better and cheaper options that are more stable than even five years ago.
Xiaojie Zhang: And Greg is an engineering leader — the founding engineer who built a lot of things from the ground up. We're here because Jim Z and Seagal wanted us to have a quick chat: get to know each other, ask some questions, and see whether this can be a good fit. We still have no firm idea what the relationship is going to march towards, but I'm more than happy to get acquainted and discuss the next step — how we're going to integrate with the team, or start the data initiative, trying to understand the data from a user or BI perspective. At the same time, we have a big project pulling data for more accuracy, or more real-time purposes. That can be an interesting topic to dive into a little bit. Greg, do you have anything to add?
Greg: My background: I worked at Afterpay from when it was small, and we scaled it up. I was on the core team for most of 8 years as a back-end engineer and engineering manager, and that was pretty interesting — hockey stick growth. I'm looking forward to hearing your thoughts on how, even in the last 5 years, things have changed a lot in terms of ingesting huge amounts of data and transforming it so we can use it in super fast queries. Looking forward to getting into that.
Uttam Kumaran: Great. For me, I live here in Austin. My background is in data engineering — I worked in New York for a number of years as a data engineer, at increasingly smaller startups, and in my last role I led product at a data startup. A similar path to yours: I went into product and became very opinionated about data products, because I was buying a lot of them. Then I built the first version of one, and then I left to start this business about 3 years ago. It's really scaling what we learned from doing data stack setups and reporting analytics setups, and now doing that for clients. We've assembled an amazing team — I'll let Awaish intro after this — and all we do every day is come in and implement data solutions. The other thing is we built a lot of our business using AI, and in the last year or so we've been trying to find more ways to weave in AI to accelerate the existing data work we do, and additionally to layer AI on top of the data — a lot of the stuff y'all are doing. Tons of context engineering, tons of making different data formats available for agents. That's been a blast.
Uttam Kumaran: Maybe, Awaish, I'll let you give a little intro, then I'd like to walk through the document we put together, and then I'm happy to talk about how we think about the ecosystem of vendors and tools and ways to put it all together.
Awaish Kumar: My name is Awaish Kumar, and I'm in Pakistan right now. I've been working for more than 8 years as a data engineer, at startups and growth-stage companies. I previously worked in the vacation rental business — similar to property, with a few caveats. I've been building data infrastructure from scratch, creating end-to-end data pipelines, and helping with BI tool setup as well.
Uttam Kumaran: Maybe I'll just go ahead and walk through this document. From our short discovery chats with everybody, we learned two things. One, we have a somewhat good sense of what the product is. I still have to install it and give it a shot on my end, but I'm familiar with the real estate market, and from hearing Jim Z's vision and the webinar, I have an understanding of the business problem the team is looking to solve. Additionally — and this is really important when working with any company trying to grow fast — there's the timeline. I got an understanding of where we are in the current phase, how we're onboarding data users, and that we're looking at more structured launches early next year. And I understand that for data folks and product folks to make great decisions, you need a ton of data. So the first part of our approach is about analytics and the foundational tools you need just to understand product analytics. We do quite a bit of work in B2B SaaS and B2C SaaS, and we're very familiar with tools like Amplitude, Mixpanel, and PostHog that do event tracking and things like that.
Uttam Kumaran: Part of this first phase is: can we understand the existing Mixpanel or Statsig implementation, understand what the event taxonomy is, and drive towards building the core measurement dashboards for the product team. The nice thing is you don't really need to do this in an external BI tool. If you can do it all within Mixpanel or Amplitude, that's a great way to accelerate a good understanding of things like DAU, MAU, segments, cohorts, and paths into the product. That's a great baseline for views into product usage. From there, we can talk about the MLS data and Underbuilt. To your point, Greg, about what the reporting stack looks like — both of you will recognize some of these tools; this is a common stack we often implement. On the back end, on your side, you may have a bunch of different sources. Typically we'll implement some type of ingestion tool — we usually recommend Fivetran or Polytomic. There are a couple that are best in class and reliable; it really comes down to cost and the edge cases of integrations you need. Then a lot of our work starts there: setting up a core reporting data warehouse. In this view we're demonstrating Snowflake, but this is how we would set up a data mart: we land data, do some intermediate modeling, and then make several different marts available for reporting use cases. You may have your back-end user data, finance and payment data, your product analytics information, and information from your agent interactions.
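The layered warehouse flow Uttam describes — land raw data, model it in an intermediate step, then expose reporting marts — could be sketched roughly like this. This is a minimal illustration using SQLite in memory as a stand-in for Snowflake; all table and column names are hypothetical, not anything from the actual engagement:

```python
import sqlite3

con = sqlite3.connect(":memory:")  # stand-in for the reporting warehouse

# 1) Landing zone: raw data as ingested (e.g. by Fivetran/Polytomic)
con.execute("CREATE TABLE raw_events (user_id TEXT, event TEXT, ts TEXT)")
con.executemany(
    "INSERT INTO raw_events VALUES (?, ?, ?)",
    [("u1", "signup", "2025-01-01"), ("u1", "search", "2025-01-02"),
     ("u2", "signup", "2025-01-02")],
)

# 2) Intermediate model: deduplicate and standardize types
con.execute("""
    CREATE VIEW int_events AS
    SELECT DISTINCT user_id, event, DATE(ts) AS event_date
    FROM raw_events
""")

# 3) Reporting mart: business-level aggregate (here, daily active users)
con.execute("""
    CREATE VIEW mart_dau AS
    SELECT event_date, COUNT(DISTINCT user_id) AS dau
    FROM int_events
    GROUP BY event_date
""")

print(con.execute("SELECT * FROM mart_dau ORDER BY event_date").fetchall())
# → [('2025-01-01', 1), ('2025-01-02', 2)]
```

The point of the layering is that reporting consumers only ever query the mart views, so the raw landing tables can change shape without breaking dashboards.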
Uttam Kumaran: Landing all that in one core place, combining it, implementing business logic, and making it easy for anyone to report against — that's the typical reporting stack. Then on the back end, it's about getting that data out for access, and this is where there's been more innovation recently. One, a lot of folks want to take product usage data and use it in tools like Klaviyo for marketing activation. For example: a new user logged in but hasn't done the first event, so we send them a marketing email or something like that. Those are ways you move product usage data back into marketing activation flows. Additionally, advertising and leveraging ad dollars to drive growth is going to be a big thing, and for a lot of the ad teams we work with, being able to tag those people and retarget them is super important. That's a lot of what gets enabled here, in addition to just BI. The last layer we've been exploring a lot is: what are the best AI "chat with your reporting data" tools? How do we make it easier for more people in the business to get value out of these data marts? Maybe they don't have the expertise or time to build dashboards, but what are other ways they can start to chat with modeled, governed data sources? That's the rough stack. Of course, it's going to depend a lot on the sources — I know you also have sources that aren't from the typical vendors — but any questions here? Anything I can expand on about the typical reporting stack we see?
Xiaojie Zhang: Nope, all clear.
Uttam Kumaran: So that's the first phase: driving towards a good understanding of product analytics. The second phase is about the MLS data, and this is where I would love to hear from you guys.
Uttam Kumaran: We wrote this based on chatting with Seagal and Jim Z. I'm familiar with the MLS and what data is in there, but I'm interested in hearing how whatever system gets built supports the initiatives around this data — whether it's used for production or for reporting. We'd love to hear a little about the vision for this phase.
Greg: That sits off a bit to the side. The main thing, from my perspective, is being able to generate comps for properties: given a subject property with a range of bedrooms, bathrooms, all your normal criteria, search through listings to get comparable properties within a radius of the lat/long of the subject property. We currently use a data provider to get access to that data, but this project is about getting the bulk data files of MLS listings — we're talking 160 million records daily, about 300 columns wide, a fair bit of data to ingest. A lot of that data we won't use; we'll want to transform it into the right shape to then be able to query super fast to generate comps.
Xiaojie Zhang: From a high level, it's a fairly simple...
Greg: ...thing to do, but doing it well means reliably handling that amount of data. I've got a few clarifying questions I need answered — whether we can get deltas, or if it has to be full files every time.
Uttam Kumaran: Ergonomics. Keeping that flow...
Greg: ...happening, because it's a fair bit of data to be processing, and how we handle failures and such. And then getting it into — presumably — our Postgres, to then query super fast to get the comps. Questions? Thoughts?
Uttam Kumaran: It makes sense. There are certainly a couple of ways of doing it.
Uttam Kumaran: Depending on the ergonomics on their side — if they're able to land flat files and put them into S3 or a data lake, there's a lot of...
Greg: They can put it into Snowflake, or Databricks, or... they're pretty flexible with where they can deliver.
Uttam Kumaran: At that point, there are a lot of different ways for us to stream that into wherever it needs to go. We've done work in Snowflake; there are tools like Snowpipe that sit on top of S3 and stream data into Snowflake — they're running COPY commands under the hood. There's also the option, if the goal is to move into Snowflake, to run some large transformations and then pipe the data back. The nice thing about Snowflake — and this is similar in BigQuery or Databricks — is we would write it all back to a data lake, and then get it processed back into a Postgres instance for querying. That's fairly common. I don't know, Awaish, if you have any other thoughts.
Awaish Kumar: There are multiple options, as you mentioned. One way is to put it back into a bucket and then load it into Postgres, but there are also tools like Polytomic, as you mentioned, that can load directly from Snowflake to Postgres without any intermediate storage. It also depends on the use case: whether we're getting the data directly in Snowflake or in our own storage, and then how we want to pipe it — does it need transformation, things like that.
Uttam Kumaran: There are also ways that, if this lands in a data lake, we may not even need to move it to a warehouse to do transformations.
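The land-then-load flow being discussed — flat files arriving in a bucket, then bulk-copied into a query store — might look roughly like this. A minimal stand-in using the csv module and in-memory SQLite in place of S3 and Snowflake/Postgres; the file layout and column names are hypothetical:

```python
import csv
import io
import sqlite3

# Stand-in for a flat file landed in a bucket (columns are made up)
landed_file = io.StringIO(
    "listing_id,price,beds\n"
    "A1,500000,3\n"
    "A2,725000,4\n"
)

con = sqlite3.connect(":memory:")
con.execute(
    "CREATE TABLE listings (listing_id TEXT PRIMARY KEY, price INTEGER, beds INTEGER)"
)

# Bulk copy from the landed file — this is the step that tools like
# Snowpipe automate by running COPY commands against a stage.
reader = csv.DictReader(landed_file)
con.executemany(
    "INSERT INTO listings VALUES (:listing_id, :price, :beds)",
    reader,
)

print(con.execute("SELECT COUNT(*) FROM listings").fetchone()[0])  # → 2
```

In practice the managed tools add the parts this sketch skips: file-arrival triggers, exactly-once loading, and retry on failure.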
Uttam Kumaran: There's a lot more technology now that enables transformations directly on the data lake, in memory, using things like DuckDB. It depends on the product requirements — how fast things need to land, whether we have to stream it in or can batch it. There are a couple of ways of doing this. Either way, we'd need to talk to the vendor and learn a little more.
Awaish Kumar: But are those 160 million rows coming every day, or is it more incremental?
Uttam Kumaran: Greg may have dropped? I don't know, Xiaojie?
Xiaojie Zhang: The internet connection might be bad in the hotel, or maybe his laptop battery died. You never know what happens during travel.
Uttam Kumaran: But that makes sense. You guys are going to use the product to do comps — but how much do you need of the data that's coming in every day?
Xiaojie Zhang: You also worked in real estate, right? Ideally — for Redfin, Zillow, everyone — this looks simple, right? Just pull MLS data in near real time; it's all coming from one data source. We found the data source, which is CoreLogic. I'm not sure whether you know CoreLogic, but they've centralized all the real estate data in the United States. Surprisingly, when we looked around the market, there's no good data provider that just has that near-real-time data, besides Zillow. We did try out different providers — apples-to-apples comparisons — but the results were not that good.
Greg: Sorry, I dropped — I lost internet. I was going to add a bit of extra context: the whole reason we're doing this is that the provider we're using currently just hasn't been good enough. We need super accurate data, and they're maybe 85% there, but we need it to be as high as possible. And CoreLogic — now called Cotality — is the gold standard for this stuff.
Greg: There's not really an option that can give us better data, but their bulk data is obviously not real-time. They have a real-time product called Trestle, but that's on a per-MLS basis: you have to set up broker agreements with the individual MLSs. I implemented Trestle for California, and that looks great, but it's not a straightforward process to set up these agreements in all the different states, so that's not an option — and that's why we went with the data provider we're with currently: they handle those individual MLS relationships, and we get API access across all the MLSs. But I can see the seams in their data, where there's stuff that's just wrong or missing, and we're cobbling together a couple of other smaller APIs to try to fill the gaps. Their days-on-market is wrong for properties because sale dates and listing dates are off, so we're pulling those from somewhere else, and it's just a nightmare. So we're getting serious, and we're getting data from the best source we can. Now we need to work out how we turn that into our own API and get rid of this cobbled-together setup.
Uttam Kumaran: For us, the biggest thing is to understand the SLAs from the product requirements — how we land it, how we transform it, how we get it back into Postgres, and what minimum SLAs we want to hit on fresh data. Then understanding how they're sending it to us, and the ergonomics of that, is going to be the core here. The other thing is building in some redundancy.
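Greg's days-on-market example illustrates why the date fields matter so much: the metric is derived entirely from the listing and sale dates, so a stale date from the feed corrupts it silently. A trivial sketch (the function and dates are hypothetical, just to show the failure mode):

```python
from datetime import date

def days_on_market(list_date, sale_date=None, today=date(2025, 6, 1)):
    """Days a listing has been active; falls back to `today` if unsold."""
    end = sale_date or today
    return (end - list_date).days

# With a correct listing date the metric is sane:
print(days_on_market(date(2025, 5, 1), date(2025, 5, 31)))  # → 30

# A listing date that's off by a year (as in a bad feed) skews it badly:
print(days_on_market(date(2024, 5, 1), date(2025, 5, 31)))  # → 395
```

Nothing in the calculation can detect the bad input, which is why Greg's team ends up cross-checking dates against a second source.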
Uttam Kumaran: If this is touching product, of course, there's a much higher degree of scrutiny than if it's just for reporting. It's really important for us to make sure there are different ways for the product to continue to access this data, and that we have a clear understanding of how fresh the data is, what part you can query, and how often we're syncing it. But it makes sense.
Greg: The landing and transforming SLAs are a bit different compared to our API that's then serving the results from Postgres. That needs super high availability and needs to be super fast. We want to replicate the Trestle-like experience — I want comps generated in under a second.
Uttam Kumaran: Really? I would say I'm less concerned about the comp generation piece. It's more about how the 160 million records come in: how much can we do incrementally, can we transform in flight and land it back, what are the deltas coming in every day — and then phasing it to see what the options are. It's a cool problem, definitely. Makes sense.
Greg: We need to get answers to a bunch of those questions. I need to set some time with them.
Uttam Kumaran: You have the contact at CoreLogic already, and they've given you a set of options on how they can load? For this, we'd just do a couple of proofs of concept and see what's possible — whether we go directly to Snowflake, or drop this in a data lake and do something in memory. There are a couple of options.
Awaish Kumar: We could also, if it's coming to S3 directly, use PySpark to transform it and store it directly to the operational database, so we don't need to move it to a warehouse in that case.
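Greg's sub-second comp generation — comparable listings within a radius of the subject property, matched on bed/bath ranges — boils down to a geo filter plus attribute filters. A minimal sketch in plain Python with hypothetical field names; a production version would push this into Postgres with a spatial index (e.g. PostGIS) rather than a linear scan:

```python
from math import asin, cos, radians, sin, sqrt

def haversine_miles(lat1, lon1, lat2, lon2):
    """Great-circle distance between two lat/long points, in miles."""
    lat1, lon1, lat2, lon2 = map(radians, (lat1, lon1, lat2, lon2))
    a = sin((lat2 - lat1) / 2) ** 2 \
        + cos(lat1) * cos(lat2) * sin((lon2 - lon1) / 2) ** 2
    return 3956 * 2 * asin(sqrt(a))  # Earth radius ≈ 3956 miles

def comps(subject, listings, radius_miles=1.0, bed_delta=1, bath_delta=1):
    """Listings within the radius whose beds/baths are close to the subject's."""
    return [
        l for l in listings
        if abs(l["beds"] - subject["beds"]) <= bed_delta
        and abs(l["baths"] - subject["baths"]) <= bath_delta
        and haversine_miles(subject["lat"], subject["lon"],
                            l["lat"], l["lon"]) <= radius_miles
    ]

subject = {"lat": 34.05, "lon": -118.25, "beds": 3, "baths": 2}
listings = [
    {"id": "A", "lat": 34.051, "lon": -118.251, "beds": 3, "baths": 2},  # close match
    {"id": "B", "lat": 34.20,  "lon": -118.25,  "beds": 3, "baths": 2},  # too far away
    {"id": "C", "lat": 34.05,  "lon": -118.25,  "beds": 6, "baths": 5},  # wrong size
]
print([l["id"] for l in comps(subject, listings)])  # → ['A']
```

The reason the team is less worried about this step than about ingestion: once the data is transformed and indexed, a radius-plus-attributes lookup over a single market is a small query, whereas keeping 160M-row daily files fresh is the hard part.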
Uttam Kumaran: That's what I'm hoping happens — that we don't need to go through Snowflake. It depends on how they can drop it to us, and whether we can do everything directly in the lake. But we'll see.
Greg: Can you tell me a bit more about that? What would be the determining factor for whether you'd use Snowflake in that scenario?
Awaish Kumar: For Snowflake, it depends on the use case. If the only use case is to transform the data and put it back into Postgres, where it will be used by the product, then we don't need to involve the warehouse. But if there are things like needing the same data for analytical reporting, or wanting to dive deeper into the data and figure out trends, then we might need to store it in a data warehouse.
Greg: I see. Xiaojie, I don't know if you have any different thoughts, but we don't have product requirements around analytics on the listings data at the moment. The possibilities are endless...
Uttam Kumaran: We could do...
Greg: ...but we don't have those use cases yet. Our use case is very narrow and focused: making amazing comps that are super accurate and super fast, that build trust with agents and give them the best comps money can buy, really. That's the thing we're doing right now. Obviously there's lots of stuff that could come later, but that's not defined at this stage.
Xiaojie Zhang: I agree with you, Greg. As Greg said, the possibilities are endless. I will say, since we're building a tool for real estate, I don't see why we wouldn't eventually come up with analysis for the area, the neighborhood, all that type of stuff, because we have first-hand data flowing into our data warehouse. It would be a waste if we transform the data and only make comps.
Xiaojie Zhang: But for now, for the foreseeable future, we don't really have big analytical queries to run. We want to transform the data in a certain way and put it back into our production database — very likely Postgres or Elasticsearch, we'll decide later — and run super-fast queries to pull the most accurate, most up-to-date data possible. That's the foreseeable future.
Uttam Kumaran: The best thing here is that you don't lose by going in one direction. Snowflake, of course, is a data warehouse: it's tuned for analytics — large selects over large amounts of data — not point lookups. Even if we do the transformation in a data lake, you can still put Snowflake on top of that later. What I want to understand is: if, for example, they can only share the data via Snowflake, that changes the architecture a little, but if we can decouple that, then both are still achievable. And I totally hear you — there are probably tons of analytics use cases to build on top of that data, but the primary one to solve is getting those records, transformed, into Postgres as fast as possible. There are some options to try that don't involve Snowflake, and it's worth seeing what the trade-offs are between them. That makes sense. Cool.
Greg: They said we can get it from SFTP, or they can chuck it in S3 or whatever — they're pretty flexible; it doesn't have to go via...
Uttam Kumaran: As long as it ends up in S3 or a data lake, we can always move it to a data warehouse later. But some of the technology that's come out recently allows you to do transformation in flight, which lets us dramatically reduce the SLA. If we were to move this to Snowflake, run transformations, and push it back to S3 or to Postgres, that's just another step. This way we can achieve much faster SLAs for the product use case.
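The "transform in flight" idea Uttam mentions — reshaping records as they stream past, instead of landing the full ~300-column file in a warehouse first — can be sketched as a generator pipeline. A toy illustration with hypothetical column names; real tooling (DuckDB, PySpark) does the same thing at scale:

```python
import csv
import io

def transform_in_flight(lines):
    """Yield trimmed, typed rows one at a time; nothing is staged in between."""
    for row in csv.DictReader(lines):
        if row["status"] != "Active":   # drop records the product won't query
            continue
        # Keep only the handful of columns the comps engine needs
        yield {
            "listing_id": row["listing_id"],
            "price": int(row["price"]),
            "beds": int(row["beds"]),
        }

# Stand-in for the wide daily feed (extra_col represents the ~300 unused columns)
feed = io.StringIO(
    "listing_id,price,beds,status,extra_col\n"
    "A1,500000,3,Active,x\n"
    "A2,725000,4,Sold,y\n"
)
print(list(transform_in_flight(feed)))
# → [{'listing_id': 'A1', 'price': 500000, 'beds': 3}]
```

Because the transform is a single pass over the stream, the output can be written straight to the serving database, skipping the warehouse round-trip that Uttam says adds a step to the SLA.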
Uttam Kumaran: We would just do a couple of demos and show what's possible. It's also about the number of hoops the data has to jump through every time, and understanding: are they landing flat files every day? Are they landing deltas? When do they come in? How can we trigger off them? I feel like in the last few years there's a lot more data lake and in-memory transformation technology for us to try here than there was even a few years before.
Awaish Kumar: The common architecture is to use PySpark on top of the data lake, or S3 storage, because it gives us the power to process data in parallel — it can process billions of rows in seconds — and then we can also store directly to operational databases.
Uttam Kumaran: Cool. The last phase was just talking about Underbuilt. One of the things we talked a little about was understanding: what are folks looking for in terms of new addresses? Where's the demand for more Underbuilt data? Where should we prioritize the next cities to build? Those were just a couple of things we discussed; we didn't go super deep. This is the product where I was like, that's super awesome. I'm interested, on the product side, to understand the roadmap here and how data helps with that. Are there other parts of this process that need similar SLA or data manipulation help? What do you guys think?
Xiaojie Zhang: I will say, unfortunately, both Greg and I are not experts — not super detailed on the build side. At least I'm not; I'm not sure how Greg is.
Greg: Definitely not. I'm definitely not.
Xiaojie Zhang: I have some light knowledge of it.
Xiaojie Zhang: I've been testing things out — for my first project in Brazy, I ran some stress tests on Underbuilt. My understanding is that right now the hard part is data parsing. It's gathering all these different PDFs, right?
Uttam Kumaran: That's why I'm interested to know what you guys are using under the hood for OCR or extraction.
Xiaojie Zhang: You're definitely asking the right thing — that's a good question.
Uttam Kumaran: That's the million-dollar question, too.
Xiaojie Zhang: To be very honest — I can be straightforward — at this moment we're doing things the hardcore way, which is very manual.
Uttam Kumaran: And you're thinking about, or moving towards, a more...
Xiaojie Zhang: Large-language-model-based approach.
Uttam Kumaran: I have some things you guys should try. I don't know if you're doing a spike around this, but we've evaluated a bunch of tools for extraction and retrieval from tons of PDFs — diagrams, text, all types of stuff — and there are a couple of great platforms that do this extremely well that you should totally evaluate whenever you run an evaluation.
Xiaojie Zhang: Because that's definitely on our roadmap. We're building it now. I'm more on the pipeline and AI side for now, because that's the China team's domain; I'm just trying to get things up to speed and sustainable. I will also spare some time on the Underbuilt side, to spin up the workflow — an automated workflow. We might still need to do some scraping, or manually pull this data in, but I'm 100% with you: we need to extract this data. It's very domain-specific; you need to understand the building code, the char...
Greg: It's such a wide variety, because it's at a county level...
Greg: ...or even lower. There's just a huge variance in the way this...
Xiaojie Zhang: We're thinking about it. It's definitely not only a technology problem — it's technology plus operations, a hybrid approach.
Uttam Kumaran: Procurement of these documents, where they land... I just think this problem 5 years ago would have been way...
Xiaojie Zhang: Unsolvable. It's unsolvable.
Uttam Kumaran: I do think you have at least some path now. There's a lot of great OCR and LLM image-extraction tooling you can consider for building a system that layers a lot of that on and compares results.
Xiaojie Zhang: Or even — we could condense a 50-page building code guideline into a 3-page TL;DR and hand that to our...
Uttam Kumaran: That's going to save them tons of time. As long as your level of accuracy holds on what's important — what's the proof of concept for getting every building — and since these are PDFs and they're not going anywhere, once you get them stored you can start to layer on more and more of the latest that's coming out from Google, from Contextual, on extraction. In the last 2 years especially, there's been a lot in this space. This is what was super exciting to hear about, because I don't think you could have built this before — it would have been very, very hard. But it is a unique data challenge, because this is really tough data to...
Greg: We did do it, but just for a very small area.
Uttam Kumaran: Cool. But it wasn't able to be scaled...
Greg: ...out beyond that small LA area, and that's what we're working on.
Uttam Kumaran: I'm interested in how you guys are handling this — there's one part of it we wrote down here, which is prioritizing where to go next — but also, when you go into a new area, how do you leverage AI to automate it? I assume these places don't have APIs; you're either logging into a county database or scraping and moving PDFs into something. It's an interesting data-acquisition problem.
Greg: It's PDFs, or even Word docs, on county websites. But it's also designed to be accessible via API.
Uttam Kumaran: It's cool. You guys may be some of the first people I've seen doing this — it may just be a completely proprietary data set. I don't know anybody else who… maybe Zillow or Redfin are doing something like this, but I don't know. It's pretty cool.
Greg: As far as I know, we're the only ones doing it.
Uttam Kumaran: The webinar was cool to watch — how it was getting used, and then they produced the plans. I can see the through line.
Greg: It's an interesting but not easy problem to solve.
Uttam Kumaran: And then, since I know we just have a few minutes, maybe I'll give you a sense of how we work. We're all engineers, and we typically operate on one-week sprints. We run internal stand-ups for all of our clients, and we work in Slack. Our whole team is remote- and async-friendly: we do a lot of Looms and Slack messages, and everybody does PR reviews and things like that. We try to meet with our client teams at least once a week, because things get busy and we just don't want to go more than a week without saying hi — but if you want daily stand-ups, or whatever is helpful, we operate at your speed.
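The acquisition step Greg describes — PDFs and Word docs sitting on county websites — often starts with nothing fancier than harvesting document links off a page. A minimal stdlib-only sketch; the URL and HTML shape are invented, and this makes no claim about the team's real crawler:

```python
# Sketch: collect PDF/Word document links from a county web page.
from html.parser import HTMLParser
from urllib.parse import urljoin

DOC_EXTENSIONS = (".pdf", ".doc", ".docx")

class DocLinkFinder(HTMLParser):
    """Records absolute URLs of <a href> targets that look like documents."""

    def __init__(self, base_url: str):
        super().__init__()
        self.base_url = base_url
        self.links: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        href = dict(attrs).get("href") or ""
        if href.lower().endswith(DOC_EXTENSIONS):
            self.links.append(urljoin(self.base_url, href))

def find_document_links(html: str, base_url: str) -> list[str]:
    finder = DocLinkFinder(base_url)
    finder.feed(html)
    return finder.links
```

The downloading, county-level variance, and login-gated databases mentioned in the conversation are exactly the parts this sketch leaves out — that's where the "hybrid technology plus operations" work lands.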
A lot of us on the team have worked in product startups before, so no fear there. The biggest thing for us to understand is, out of these three items, the phasing, right? For the product analytics work — maybe you, Xiaojie, plus Jim Z and other folks on the product team — how do you see that going? Then: what is the initial milestone on the MLS data side, what are the target dates we need to hit, and what are the criteria for a proof of concept, an MVP, and a V1? And then the Underbuild piece is for us to probably poke at third — just see what's going on, and maybe assist with evaluating tools or proofs of concept. Is that roughly the scope? We're only talking about the next couple of months, right? But that's what's top of mind for us.
Xiaojie Zhang: We just want to get things rolling. To be very honest, we're a startup, we're scrappy, and there's a ton of other random — to be very honest — random shit there. For us, by the end of this year we're trying to launch Android, which is big enough work in front of us. We have iOS running, but I'm always pushing Jim Z, saying: the data solution — we should take it on as early as possible to gather user signals and all that. That's the reason I bring up Mixpanel and Statsig. I have experience with Statsig, and although we're not doing A/B testing at this moment, it's at least a very cheap way to get this data into a data warehouse. Almost 10 years ago I was at Airbnb — it took a 50-person team to build this.
Uttam Kumaran: Oh, I know, I know.
Xiaojie Zhang: An A/B testing framework. But now I get 1 million events free per month, which is a dream — and we're not that big of a company, we don't get that much traffic. We're getting things for free, so why not? That's my thing.
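Xiaojie's point about instrumenting early to gather user signals usually amounts to a thin, vendor-agnostic `track()` wrapper in the app, with the actual Mixpanel or Statsig SDK plugged in behind it. A hedged sketch — the event schema and validation rule are invented examples, and the sender is injected so nothing here depends on a real vendor API:

```python
# Sketch: a minimal, vendor-agnostic event-tracking wrapper.
# `send` would forward to the chosen SDK (Mixpanel, Statsig, ...) in practice.
import re
import time
from typing import Callable

# Invented convention: snake_case event names, enforced at the boundary
# so the warehouse doesn't fill up with near-duplicate event spellings.
EVENT_NAME = re.compile(r"^[a-z][a-z0-9_]*$")

def make_tracker(send: Callable[[dict], None]) -> Callable[[str, str, dict], dict]:
    """Return a track(user_id, event, props) function that stamps and forwards."""
    def track(user_id: str, event: str, props: dict) -> dict:
        if not EVENT_NAME.match(event):
            raise ValueError(f"bad event name: {event!r}")
        payload = {"user_id": user_id, "event": event,
                   "ts": int(time.time()), **props}
        send(payload)
        return payload
    return track
```

Keeping the wrapper this thin is what makes the "cheap now, swap later" argument work: if the free tier stops fitting, only `send` changes.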
And I'm pretty sure — of course, Greg is setting the plan — we're gonna handle that intense data problem, so we're forming the data workstream along the way. It's an almost perfect time to discuss how we're gonna collaborate and coordinate in the near future. Greg and I will bring our thoughts through on this. From this chat, we did learn a ton. You guys are super experienced with the tools out there, and, it seems, super experienced on the real estate side as well, which is totally good. We'll have an internal discussion about how we're gonna work together — what the pace is, what the goal is — because right now we're a little bit swamped by some odd… product areas?
Uttam Kumaran: No surprise at all to us. I've worked in startups my whole career, so no fear there. The best way is to just tell us: what is the golden goose here? Give us a hard deadline, and we'll work backwards from there. Of course, we're used to dealing with shifting priorities. What you get with us is that we're very unbiased on tools — we pick the tools that accomplish the job. We don't get paid by any of these vendors; we just like to work with the best tools, because it makes life a lot easier. So we're more than happy to be a partner in deciding what the infra is. But also, you guys have a timeline you're trying to hit, and however we can help get you there, we will.
Greg: Sounds great.
Uttam Kumaran: Cool. I'm gonna take some of our notes, edit some of the stuff in that document, and maybe send another copy out, plus a short summary of our conversation to the channel. Let me know next steps if you want, or I'll send a note to the channel and let everyone know we chatted. I feel that after this conversation I have a lot more context, at least on the two core first phases.
Just getting context on the timeline and what the requirements are would be even better, but we're ready to go. Totally.
Xiaojie Zhang: Awesome, awesome.
Uttam Kumaran: Alright, that sounds great. Safe travels, Greg.
Greg: Thank you.
Uttam Kumaran: Cool. Thank you.
Xiaojie Zhang: Alright, thanks, folks.
Awaish Kumar: Thank you.
Greg: Thanks a lot.