Meeting Title: Brainforge Interview w- Pranav Date: 2026-03-03 Meeting participants: Ruixi Wen, Pranav Narahari, Kaela Gallagher
WEBVTT
1 00:00:08.070 ⇒ 00:00:09.380 Ruixi Wen: Hi.
2 00:00:11.850 ⇒ 00:00:13.020 Pranav Narahari: Hey, how’s it going?
3 00:00:13.620 ⇒ 00:00:16.200 Ruixi Wen: Hi, Pranav, nice to meet you, yeah.
4 00:00:16.200 ⇒ 00:00:17.390 Pranav Narahari: Nice to meet you as well.
5 00:00:17.660 ⇒ 00:00:19.199 Ruixi Wen: Yeah, where are you calling in from?
6 00:00:19.760 ⇒ 00:00:21.350 Pranav Narahari: I’m in Massachusetts.
7 00:00:21.350 ⇒ 00:00:22.450 Ruixi Wen: Oh, nice, nice.
8 00:00:22.450 ⇒ 00:00:23.419 Pranav Narahari: Yeah, how about you?
9 00:00:23.550 ⇒ 00:00:28.100 Ruixi Wen: Yeah, I’m in San Francisco. How’s the blizzard? Is this, like, getting better right now? Like…
10 00:00:28.590 ⇒ 00:00:35.449 Pranav Narahari: It is, yeah, yeah, I think we… the worst of it was last week. It should be a lot better now. Hopefully. There’s a little snow today.
11 00:00:35.740 ⇒ 00:00:37.880 Ruixi Wen: That’s great to know, like, stay safe.
12 00:00:38.310 ⇒ 00:00:40.120 Pranav Narahari: Thank you, thank you, I appreciate that.
13 00:00:40.360 ⇒ 00:00:44.389 Pranav Narahari: Yeah, if you’re ready, we can just hop into things. We have,
14 00:00:44.540 ⇒ 00:00:49.009 Pranav Narahari: just quite a few things that I just want to, like, talk over. All set?
15 00:00:49.300 ⇒ 00:00:50.420 Ruixi Wen: Yeah. Cool.
16 00:00:51.650 ⇒ 00:00:57.949 Pranav Narahari: Perfect, yeah, so I kind of just want to start off with… in one sec, let me just bring up some notes that I have here.
17 00:01:08.170 ⇒ 00:01:10.290 Pranav Narahari: Sorry about that… one sec.
18 00:01:10.290 ⇒ 00:01:12.329 Ruixi Wen: No, take your time, no worries, yeah.
19 00:01:12.330 ⇒ 00:01:13.820 Pranav Narahari: Cool.
20 00:01:14.050 ⇒ 00:01:29.210 Pranav Narahari: Yeah, so I just wanted to talk about a little bit of, like, how you go about judging AI responses when you feel like it’s best to use AI. And so, how do you decide if, like, AI is the right solution, and when, maybe, is it the wrong solution?
21 00:01:29.840 ⇒ 00:01:32.230 Ruixi Wen: Yes, yes, I think, like,
22 00:01:32.470 ⇒ 00:01:42.520 Ruixi Wen: it’s… really depends on, like, first of all, like, on your use case. So, like, in different use cases, like, you care about, like, different things. So, for example, like…
23 00:01:42.620 ⇒ 00:01:46.990 Ruixi Wen: When… I am, like, really talking to…
24 00:01:47.120 ⇒ 00:02:02.669 Ruixi Wen: AI, like, I use different metrics, like, when I’m just, like, talking to AI, like, on ChatGPT or Claude, it’s really about, like, I’m trying to get the answer I want, but when I was, like, working for my previous company, which is, like, an AI-agent company for data engineering, it’s…
25 00:02:02.670 ⇒ 00:02:16.489 Ruixi Wen: very similar, it’s more product-focused, but it’s very similar to what Brainforge does. So we do have, like, different metrics that really depend on what the data engineers want to achieve from there. And I think, like.
26 00:02:16.610 ⇒ 00:02:35.360 Ruixi Wen: On the higher level, I think probably we have, like, the, like, Layer 1, the model performance metrics. So, for example, we had, like, a model for, like, dbt-to-text, so it’s, like, really about, like, the precision and recall. It’s, like, very much the foundational part.
27 00:02:35.390 ⇒ 00:02:37.509 Ruixi Wen: To say, like, whether or not, like, it’s…
28 00:02:37.960 ⇒ 00:02:52.139 Ruixi Wen: really extracts the context that we needed. And then I would say, like, there are also other ones, for example, like, on the model level, like, the latency is very important, so whether or not, like.
29 00:02:52.870 ⇒ 00:02:58.930 Ruixi Wen: helping us to, like, really,
30 00:02:59.180 ⇒ 00:03:14.949 Ruixi Wen: how do I say, like, really, like, save the time, to achieve, like, the goal of our product, to provide the best time to value. Yeah. And I also think, like,
31 00:03:15.130 ⇒ 00:03:34.959 Ruixi Wen: the model collaboration is very important. But on the Layer 2, I would say, like, it’s really about, like, when I was, like, hopping on meetings with the customer to know, like, their user experience metrics. So, the task success rate, like, how many times they need to, like, accept, edit, or give suggestions for what they have. For the product I was working on before, they have the human in the loop.
32 00:03:34.960 ⇒ 00:03:37.470 Ruixi Wen: So if you’re not satisfied with this, like, you can…
33 00:03:37.470 ⇒ 00:03:48.340 Ruixi Wen: go ahead and, like, check and edit inline yourself for the code. And also, I think, like, the engagement, retention, based on their subscription, as well as, like.
34 00:03:49.010 ⇒ 00:03:53.189 Ruixi Wen: How do they, like, give out, like, qualitative feedback, maybe from…
35 00:03:53.190 ⇒ 00:03:53.700 Pranav Narahari: Gotcha.
36 00:03:53.700 ⇒ 00:03:55.330 Ruixi Wen: our feedback. Yeah.
37 00:03:55.330 ⇒ 00:03:56.760 Pranav Narahari: Awesome. Yeah.
38 00:03:56.930 ⇒ 00:04:01.020 Pranav Narahari: How do you decide between, like.
39 00:04:01.200 ⇒ 00:04:13.440 Pranav Narahari: different models, like, what are the trade-offs that you’ve noticed between the major models that are out there? So, like, OpenAI’s models, Anthropic, Gemini, any other ones that you want to mention as well that are open source?
40 00:04:13.440 ⇒ 00:04:13.800 Ruixi Wen: Yeah.
41 00:04:13.800 ⇒ 00:04:18.680 Pranav Narahari: How can you kind of, differentiate these models? When do you decide on using which one?
42 00:04:19.120 ⇒ 00:04:22.019 Ruixi Wen: Yes, yes. I think it’s really from, like,
43 00:04:22.040 ⇒ 00:04:38.519 Ruixi Wen: the practice… so I use, like, AI models on a day-to-day basis, like, for my personal work use as well. And I think it’s… I really, like, know by testing, like, different use cases. For example, like, I found, like, I like ChatGPT for, like, it’s…
44 00:04:38.530 ⇒ 00:04:53.150 Ruixi Wen: I would say, like, it’s reasoning thinking, like, because it always, like, generates, like, bullet points, and it always can… it’s… the listening time is really short, and it’s, like, also the first AI product I use, so, like, I just have this, like.
45 00:04:53.500 ⇒ 00:05:09.989 Ruixi Wen: a preference for its, like, logical thinking. For example, I use Claude for its, like, writing skill, because I real… I really compared side-by-side, for example, to, try to run this, like, demo script. And I found, like, Claude, the way they say things is a lot more personalized.
46 00:05:09.990 ⇒ 00:05:17.139 Ruixi Wen: And I would say, like, for example, Gemini, I use it for, like, photo generation. And actually, I tried, like, Seedance
47 00:05:17.140 ⇒ 00:05:32.729 Ruixi Wen: 2.0 recently, from ByteDance, which is, like, a video generation tool, and they are, like, really progressive right now, and, I found, like, their video generation compared to Gemini is, like, even… I would personally think, like, even more,
48 00:05:33.040 ⇒ 00:05:46.959 Ruixi Wen: consistent, through, like, different, different angles. Like, the character they generate in the image is, like, more real and consistent. So I think it’s, like, I love to try, like, different tools, and I think I really take the time, like.
49 00:05:47.080 ⇒ 00:05:54.850 Ruixi Wen: to decide what tasks I wanted to designate to them, I use the same prompt and see, like, what different result they get from there.
50 00:05:56.120 ⇒ 00:06:00.960 Pranav Narahari: Next question I have is kind of, like, on evaluation.
51 00:06:01.090 ⇒ 00:06:09.250 Pranav Narahari: How would you best, like… what do you think is the best way to evaluate some of these, like, LLMs, considering that they’re non-deterministic all the time?
52 00:06:09.690 ⇒ 00:06:12.960 Ruixi Wen: I’m sorry, could you repeat your last part? How, what?
53 00:06:12.960 ⇒ 00:06:18.430 Pranav Narahari: LLMs, because they’re non-deterministic, how do you think it’s best to evaluate them?
54 00:06:19.110 ⇒ 00:06:19.890 Ruixi Wen: Mmm…
55 00:06:20.230 ⇒ 00:06:35.000 Ruixi Wen: I think that’s, like, a really, really good question. Yeah, I mean, the goal is, like, we want AI to very much have, like, deterministic output, but it’s just, like, not always the case.
56 00:06:35.100 ⇒ 00:06:40.330 Ruixi Wen: Let me think. I think, like, it’s very important to have, like,
57 00:06:41.730 ⇒ 00:06:54.949 Ruixi Wen: to have… have, like, very much, like, clear… I would say, like, it’s, like, kind of three steps. So very first, like, we need to have, like, a very clear objective on, like, what you want to evaluate. Is this, like.
58 00:06:56.460 ⇒ 00:07:16.280 Ruixi Wen: because at the end of the day, we are putting this for business use, putting this, like, to be user-centric, so it’s really about, like, whether it’s, like, accuracy, efficiency, or how to satisfy our users. So, and we need to make sure our evaluation is, like, very much aligned with the business goal, and, really focus on that, instead of, like.
59 00:07:18.300 ⇒ 00:07:28.049 Ruixi Wen: Instead of, like, just, like, measuring very general things, because I think, like, AI can be… LLMs can be used in different use cases, so it’s very much like we need to have an intentional use case for it.
60 00:07:28.100 ⇒ 00:07:46.200 Ruixi Wen: And secondly, we need to have, like, very much diverse metrics. I know, like, before I mentioned, like, some basic metrics, like precision, recall, user feedback, and I think, like, we need to take, like, a dual approach for the metric that we wanted to evaluate, to providing, like, a very much…
61 00:07:46.530 ⇒ 00:07:52.510 Ruixi Wen: comprehensive overview for its performance, to make sure it’s, like.
62 00:07:52.630 ⇒ 00:07:59.290 Ruixi Wen: driving towards our business goal. And the very lastly, I think it’s, like, really to iterate and refine
63 00:07:59.470 ⇒ 00:08:11.150 Ruixi Wen: because, AI models, like, the core part is, like, it can learn, from things, so we need to, like, always give this, like, continuous, like, feedback loops to know,
64 00:08:11.500 ⇒ 00:08:18.999 Ruixi Wen: To have this, like, iterative improvements so that they can meet our goals better, meet the users’ needs better.
65 00:08:19.200 ⇒ 00:08:20.060 Ruixi Wen: If the answer.
66 00:08:20.060 ⇒ 00:08:23.169 Pranav Narahari: Totally. Yeah. Yeah, that’s great.
67 00:08:23.300 ⇒ 00:08:40.590 Pranav Narahari: Have you ever built an evaluation framework before for any project, app, or, even if you didn’t, how would you maybe design, like, end-to-end an evaluation framework? If we can go a little bit technical there as well, that’d be… that’d be great.
68 00:08:41.280 ⇒ 00:08:47.570 Ruixi Wen: Yes, yes, let me think… I think…
69 00:08:49.510 ⇒ 00:08:57.530 Ruixi Wen: an AI evaluation one. I think, like, one time was, like, when we were…
70 00:08:57.830 ⇒ 00:09:06.990 Ruixi Wen: I was, like, on this deal, project. In my previous company, I was, like, the founding go-to-market, so I was, like, the first person to always, like, talk through, with them.
71 00:09:07.050 ⇒ 00:09:14.949 Ruixi Wen: It was the, data engineering manager we were selling to. I worked, like, very closely with the sales engineer. And it was, like, already, like, the…
72 00:09:14.990 ⇒ 00:09:32.809 Ruixi Wen: before we got into POC stage, they very much explicitly told us about their workflow, and what they wanted, like, they wanted to, like, shorten the time that passed from, like, analytical insights to, like, really production-grade, level
73 00:09:33.110 ⇒ 00:09:44.950 Ruixi Wen: pipeline for the data engineers. So this, like, they want to, like, shorten this, like, time to value part with our, like, AI tool. And for our AI tool, I think, like, in order to
74 00:09:45.180 ⇒ 00:09:54.169 Ruixi Wen: achieve that, like, our tool is, like, first of all, like, we really need to make sure it has, like, the,
75 00:09:54.470 ⇒ 00:10:10.619 Ruixi Wen: like, enough, like, context, so we need to have, like, a deterministic context tracing with, like, for example, our lineage system that can really, like, track every piece of code, like, project-wide or even organization-wide, where the dbt code is coming from.
76 00:10:10.620 ⇒ 00:10:21.479 Ruixi Wen: And secondly, to have, like, the verifying of code before the code is executed. So, like, the human in the loop, or the AI has, like, a self…
77 00:10:21.640 ⇒ 00:10:24.990 Ruixi Wen: Examining process for the execution.
78 00:10:25.330 ⇒ 00:10:32.269 Ruixi Wen: And, lastly, like, to really, talk us through, like, on our capabilities to ensure, like.
79 00:10:32.410 ⇒ 00:10:49.380 Ruixi Wen: the, what do you call it? Like, the logic they’re using, like, the increment… incrementalization logic they’re using really match with, like, the engineering, standards. So, that was, like, the product features we had, and when we are, like.
80 00:10:49.500 ⇒ 00:11:01.010 Ruixi Wen: walking them through, like, the POC stage, we, basically had, like, a… every two weeks, we had a check-in, and they used our product for, like, 4… 4 weeks, around, like, a month.
81 00:11:01.650 ⇒ 00:11:02.790 Ruixi Wen: And,
82 00:11:03.130 ⇒ 00:11:10.380 Ruixi Wen: We, asked them about, like, the… because we set up, like, the demo environment, so we really asked them about, like.
83 00:11:10.450 ⇒ 00:11:16.200 Ruixi Wen: Led them to track, like, how many engine… data engineers for this product specifically, like, do they
84 00:11:16.270 ⇒ 00:11:30.959 Ruixi Wen: need to get on board this product, and how much hours they’re dedicating… how many hours they’re already dedicating to this product, and also, like, through survey to ask their data engineers to see, like, how much time they save from that, and also
85 00:11:30.980 ⇒ 00:11:43.600 Ruixi Wen: from the product features we have, for example, the deterministic context tracing, the code verification before execution, as well as, like, the match logic, we track, like.
86 00:11:43.710 ⇒ 00:11:59.619 Ruixi Wen: data engineers’ evaluation on three parts, to see, really, like, which part contributes the most to their goal to have this, like, time to value from analytical insights to, production-grade level pipeline, yeah.
87 00:11:59.620 ⇒ 00:12:04.570 Pranav Narahari: Gotcha. Yeah, talking a little bit more about, like, production and delivery. Yeah.
88 00:12:05.770 ⇒ 00:12:16.369 Pranav Narahari: A common problem, specifically with chatbots, is that they can hallucinate. What are some… what are some things that you can look into to mitigate hallucinations?
89 00:12:17.030 ⇒ 00:12:27.000 Ruixi Wen: Yes, yes, yeah, I think hallucination is definitely, like, like, a really big risk, maybe, like, the biggest risk for AI, so I think…
90 00:12:27.270 ⇒ 00:12:33.970 Ruixi Wen: Something that we could look into was… I think, in, like, similar… similar ways. I think, like…
91 00:12:34.130 ⇒ 00:12:38.569 Ruixi Wen: To address hallucination, like, it’s also from, like, a model layer, and the product.
92 00:12:38.720 ⇒ 00:12:54.949 Ruixi Wen: product or our service feature, and at the end, like, after we launch it, like, when hallucination happens, how do we, like, really cope with that? So I think for model layer, as I said, like, we need to evaluate the best model that has, like, the
93 00:12:55.170 ⇒ 00:13:00.160 Ruixi Wen: Highest accuracy and the highest confidence level, and can make sure
94 00:13:00.310 ⇒ 00:13:04.770 Ruixi Wen: The model matched with, like, the application the best, compared to all.
95 00:13:04.770 ⇒ 00:13:15.780 Pranav Narahari: I guess, going a little bit deeper on that, too, what are certain metrics or parameters you can look at within the model to… or ways you can,
96 00:13:15.800 ⇒ 00:13:28.709 Pranav Narahari: modulate some of these parameters that these models, have, like using an API, or even some, like, just using their web app, how can you vary some of those parameters to mitigate
97 00:13:28.760 ⇒ 00:13:30.769 Pranav Narahari: The hallucination.
98 00:13:31.400 ⇒ 00:13:32.300 Ruixi Wen: Mmm…
99 00:13:32.850 ⇒ 00:13:45.969 Ruixi Wen: I think that’s, like, a very good question. I don’t think I have, like, direct, experience on that. That’s okay, yeah. But I’m thinking, I’m thinking… I’m just, like, thinking of, like, the things that I…
100 00:13:45.970 ⇒ 00:13:47.850 Pranav Narahari: Yeah, you can think out loud, yeah.
101 00:13:47.850 ⇒ 00:13:50.580 Ruixi Wen: Yeah, I think, like, probably,
102 00:13:50.960 ⇒ 00:13:56.560 Ruixi Wen: It has to do with, like, the overconfidence rate. Like, how many times, like, it’s…
103 00:13:57.530 ⇒ 00:14:16.650 Ruixi Wen: let me think… how many times has it, like, said some… so, hallucination can come in different forms, and I think, like, one signal is, like, the overconfidence rate. They say, like, this is 100% correct, this is for sure correct, but it’s actually not. And also, like, how many times it’s, like, the…
104 00:14:17.020 ⇒ 00:14:20.780 Ruixi Wen: the confidence calibrations, to see, like, how much…
105 00:14:21.210 ⇒ 00:14:30.279 Ruixi Wen: the confidence level it is. And also the times that they say it’s, like, answerable versus, like, unanswerable.
106 00:14:30.810 ⇒ 00:14:37.310 Ruixi Wen: I think, like, that are some, signals that we can tell, like, whether or not, like, it’s hallucinating
107 00:14:37.450 ⇒ 00:14:45.919 Ruixi Wen: a lot or not, but I would say, like, I would say, like, for example, the agent that we built for data engineers, so…
108 00:14:46.210 ⇒ 00:14:47.840 Ruixi Wen: For that, it’s…
109 00:14:48.190 ⇒ 00:14:58.239 Ruixi Wen: Has to do with, like, the success rates of the test, and also, like, verification pass rate, and also, like, how many times the,
110 00:14:58.590 ⇒ 00:15:06.059 Ruixi Wen: constraint violation rate, like, if we ask them to, for example, use this, like, JSON schema, like, how much, like.
111 00:15:06.430 ⇒ 00:15:07.790 Ruixi Wen: Hallucination
112 00:15:07.960 ⇒ 00:15:11.510 Ruixi Wen: it has with that. That’s just, like, something that comes to the top of my mind, but I.
113 00:15:11.510 ⇒ 00:15:13.670 Pranav Narahari: Yeah, no, that’s fine. Yeah, that’s good.
114 00:15:13.670 ⇒ 00:15:17.720 Ruixi Wen: As for, like, more systematic ones, I definitely don’t think I covered everything from there.
115 00:15:18.040 ⇒ 00:15:30.449 Pranav Narahari: No problem. What do you think is the hardest part about bringing a POC or a prototype using an LLM, and then shipping that into a production application?
116 00:15:31.100 ⇒ 00:15:46.039 Ruixi Wen: Yeah, I think I personally think it’s really about, like, especially when it’s, like, enterprise-level clients. They have different workflows, and they have, like, different concerns. I would say the hardest part is probably, like, to find, like, a way that
117 00:15:46.260 ⇒ 00:15:59.270 Ruixi Wen: In terms of deployment, and how much it integrates their workflow, and how much, like, meta… metadata or data set, like, whether we have access to metadata or data set, like, how… how comfortable they are with…
118 00:15:59.490 ⇒ 00:16:16.939 Ruixi Wen: the, the service, the product that we are integrating, because once… from POC to production level, I think the key part is, like, POC can be, like, we’re just, say, setting up a demo environment. We don’t really touch on their data set, the database, or any, data sources they have. But once it’s production level.
119 00:16:16.940 ⇒ 00:16:22.610 Ruixi Wen: Like, the AI is literally, like, an employee that’s working side-by-side with them, so to have this, like.
120 00:16:22.720 ⇒ 00:16:34.869 Ruixi Wen: they feel comfortable, safe, building this trust with us was, like, the hardest part. And I would say, like, we can address, like, in different ways. Really depends on what kind of ways, like, maybe, like, changing deployment.
121 00:16:34.870 ⇒ 00:16:42.739 Ruixi Wen: to make… from… maybe from SaaS to self-hosting, to make sure we have containers, we set up, like, Kubernetes to, isolate it.
122 00:16:42.740 ⇒ 00:16:51.980 Ruixi Wen: From, isolated from the environment, where we don’t have, like, by default, we don’t have access to their metadata, or we have regular check-ins, we have, like.
123 00:16:52.000 ⇒ 00:16:54.920 Ruixi Wen: Different, like, compliance contracts we…
124 00:16:55.160 ⇒ 00:17:05.279 Ruixi Wen: they feel most comfortable with, but I think this trust is where, that’s the hardest, because on the POC stage, I think anything, like, for features, for,
125 00:17:05.390 ⇒ 00:17:11.109 Ruixi Wen: Feedback, it’s all something, like, the team can work on, but the trust is something, like, you really need to, like…
126 00:17:11.640 ⇒ 00:17:14.539 Ruixi Wen: Build, gradually, and have, like, very.
127 00:17:14.540 ⇒ 00:17:14.859 Pranav Narahari: Yeah.
128 00:17:14.869 ⇒ 00:17:16.839 Ruixi Wen: Customized solutions for that, yeah.
129 00:17:16.839 ⇒ 00:17:20.719 Pranav Narahari: So this… yeah, this definitely sounds like it could be, like, an issue in production.
130 00:17:21.549 ⇒ 00:17:31.309 Pranav Narahari: What is something that you can do to change this in an application that’s already pushed to production to make, like you said, like, the customers feel more safe and secure with the product?
131 00:17:31.830 ⇒ 00:17:32.650 Ruixi Wen: Hmm.
132 00:17:32.940 ⇒ 00:17:47.579 Ruixi Wen: I think if it’s, like, for sure, like, as I said, like, a better way is we have, like, the best, whole process from full deployment to integration that we negotiate behind… negotiate ahead, but I think it’s… if it’s already put into, like.
133 00:17:47.900 ⇒ 00:17:49.490 Ruixi Wen: full production.
134 00:17:49.660 ⇒ 00:18:00.670 Ruixi Wen: Let me think… I think… Mmm… I think, like, like…
135 00:18:01.350 ⇒ 00:18:04.550 Ruixi Wen: If it’s, like, already put into full
136 00:18:04.880 ⇒ 00:18:08.219 Ruixi Wen: production, I think what we need is, like, to…
137 00:18:09.070 ⇒ 00:18:11.660 Ruixi Wen: Make sure we have, like, this,
138 00:18:12.960 ⇒ 00:18:27.060 Ruixi Wen: contingency plan, like, if anything happens in term… That we can, like, always have, like, the people, the resources that can back up to help fix the issue, and also have.
139 00:18:27.060 ⇒ 00:18:27.410 Pranav Narahari: Yeah.
140 00:18:27.410 ⇒ 00:18:44.600 Ruixi Wen: very much regular check-ins with the customers, because, like, what matters, like, is to have this, like, to keep the retention rate high from a revenue perspective. And I think, like, these are, like, the two biggest parts that I think are very much important to have this kind of…
141 00:18:44.810 ⇒ 00:18:47.609 Ruixi Wen: Customer support, after.
142 00:18:47.610 ⇒ 00:18:48.150 Pranav Narahari: Yeah.
143 00:18:48.150 ⇒ 00:18:48.710 Ruixi Wen: into.
144 00:18:49.140 ⇒ 00:18:57.749 Pranav Narahari: I think maybe from a more technical perspective, if we can… and I can maybe paint a picture a little bit more, we kind of talked about an issue with, like, hallucination.
145 00:18:57.750 ⇒ 00:18:58.950 Ruixi Wen: Yeah, yeah, yeah.
146 00:18:58.950 ⇒ 00:19:17.680 Pranav Narahari: And so that’s something that maybe the POC, you know, is… there’s a lot of guardrails, you’re not stress testing it as much as you might a production application. And so let’s say we find out hallucinations are, much higher in production, or we’re noticing the issue much more.
147 00:19:17.880 ⇒ 00:19:26.739 Pranav Narahari: How do you go about, like, seeing that production issue, and then starting to create a change, or maybe… yeah, what is your process after…
148 00:19:27.050 ⇒ 00:19:30.250 Pranav Narahari: This is a production issue, to then mitigate it.
149 00:19:31.650 ⇒ 00:19:39.890 Ruixi Wen: Yeah, okay, so, I think, like, like…
150 00:19:41.680 ⇒ 00:19:53.280 Ruixi Wen: I think, being, like, full production is, like, we basically, like, like, this deal, deal for sure closed, like, it’s, like, fully, as I said, like, fully integrated with the system and the workflow. I’m just making sure I have that right.
151 00:19:53.280 ⇒ 00:20:08.760 Pranav Narahari: Yeah, you can think about it like this instead of, like, deal open or close, it’s just like, okay, we’re in their production environment now, there are maybe quite a few users using the application, and then we notice an issue. It’s not that, like…
152 00:20:08.760 ⇒ 00:20:17.330 Pranav Narahari: Yeah, and so there’s people that you’re currently supporting using the application, you know it’s an issue, so now what’s your process for patching that issue?
153 00:20:18.290 ⇒ 00:20:31.140 Ruixi Wen: Mmm, I see, I see, I see, that makes sense. I think, like, what we need to do is, like, we should have, like… we definitely need to change that, and I think, like, one thing was, like, we need to…
154 00:20:31.380 ⇒ 00:20:37.340 Ruixi Wen: To fix, like, those…
155 00:20:37.520 ⇒ 00:20:48.109 Ruixi Wen: To… maybe, like, on system… system… if… if it’s, like, a… like a one-time… one-time issue, then… then it’s, like, we are… maybe we can fix this, like…
156 00:20:48.240 ⇒ 00:20:50.090 Ruixi Wen: Maybe…
157 00:20:50.440 ⇒ 00:21:06.379 Ruixi Wen: add, like, a confidence threshold, so if retrieval similarity is, like, less than some point, then respond with uncertainty. But I think on more system-level changes, like, we can maybe add, like, a verification layer checks for the customer, so make sure, like…
158 00:21:06.470 ⇒ 00:21:13.859 Ruixi Wen: the claims, like, map to the context, and it’s, like, the lineage can track exactly where it’s, like, coming from.
159 00:21:14.050 ⇒ 00:21:19.720 Ruixi Wen: And, if it’s, like, really about, like,
160 00:21:20.310 ⇒ 00:21:24.790 Ruixi Wen: As you said, like, those, like, tool-failure hallucin…
161 00:21:24.860 ⇒ 00:21:43.959 Ruixi Wen: hallucinations, I think, like, we can fix, like, the orchestration. For example, force the model to, say, return code, or, you know, add retry logic, before generation, from that, and I think, like, that would be, like, some,
162 00:21:44.130 ⇒ 00:21:50.950 Ruixi Wen: like, the real things, like, engineering teams can do, but I think it’s definitely better we have, like, a system level and, like, a very much…
163 00:21:51.080 ⇒ 00:22:08.319 Ruixi Wen: closed loop for that, that we can detect where the problem is coming from, categorize it, analyze why this problem is causing that, so that we can have, like, a targeted change to whatever is happening. May it be, like, an unsupported claim, or a tool failure, yeah.
164 00:22:09.750 ⇒ 00:22:14.420 Pranav Narahari: Awesome. Cool. Just have a couple more questions.
165 00:22:15.870 ⇒ 00:22:21.000 Pranav Narahari: How, if you haven’t already, would you design a RAG system?
166 00:22:22.050 ⇒ 00:22:38.219 Pranav Narahari: at scale. So think about not just for your own personal use, think about for if you’re to push to production, there’s going to be quite a few users using it, stress testing it. If you can go, like, step by step, trying to get as technical, and then,
167 00:22:38.790 ⇒ 00:22:41.220 Pranav Narahari: Going end-to-end, that would be… that’d be great.
168 00:22:41.810 ⇒ 00:22:52.619 Ruixi Wen: Mmm, I see, I see. Like, in my previous job, like, we had, like, a, RAG system, like, already laid out, so I didn’t, like, have, like, hands-on experience to.
169 00:22:52.620 ⇒ 00:22:53.370 Pranav Narahari: Yeah, no problem.
170 00:22:53.370 ⇒ 00:23:04.299 Ruixi Wen: But I would just, like, say, like, what I understand, and I definitely, like, like, very much… I think these are some very meaningful questions, so, if you have, like, anything to add on, I definitely hope, like, to get your perspective on this.
171 00:23:04.500 ⇒ 00:23:05.430 Pranav Narahari: Sure, yeah.
172 00:23:05.430 ⇒ 00:23:13.399 Ruixi Wen: Yeah, I think, like, for a RAG system at scale, the very first is, like, we want to know, like, the… the goal and its…
173 00:23:13.880 ⇒ 00:23:19.270 Ruixi Wen: the goal and its boundary, like, kind of like the ground rules,
174 00:23:19.270 ⇒ 00:23:20.040 Pranav Narahari: Yeah.
175 00:23:20.040 ⇒ 00:23:31.310 Ruixi Wen: what is it, like, P50, P95 latency? And also, like, how much risk are we willing to take. If we’re working with, like, some customers in government or healthcare-related, like.
176 00:23:31.310 ⇒ 00:23:45.549 Ruixi Wen: we need to really set it up, like, maybe no response, refuse response at a certain point. And the very first thing is, I think, like, to have, like, the database layer design, and in which, like, we need to make sure to…
177 00:23:45.550 ⇒ 00:24:01.869 Ruixi Wen: make sure, like, we are compatible with, like, the data sources, like, make sure the schemas are very much standardized, be it, like, from Google Docs, or PDFs, or whatever, Snowflake, or whatever they’re using, to make sure they have, like, a very much standardized schema.
178 00:24:01.870 ⇒ 00:24:12.540 Ruixi Wen: And, secondly, I think, like, we need to have, like… because we’re… we’re sitting at scale, so I… I think, like, we need to have, like, a chunking strategy to make sure, like.
179 00:24:12.610 ⇒ 00:24:13.430 Ruixi Wen: the…
180 00:24:13.830 ⇒ 00:24:24.109 Ruixi Wen: the chunk… to have, like, an okay chunk size, I’m thinking maybe, like, around, like, 400, 500 tokens, and to make sure the,
181 00:24:24.480 ⇒ 00:24:41.110 Ruixi Wen: the… the… how do I say, like, to make sure, like, the quality, is the… is the best, that they can have, like, some extra words that help us to categorize, the data, maybe, like, like, I don’t know, a
182 00:24:41.260 ⇒ 00:24:50.159 Ruixi Wen: text chunk ID, or some, like, keywords we can have there. And aside from there, I think we can…
183 00:24:50.330 ⇒ 00:25:04.250 Ruixi Wen: design. Like, this is, like, on top level. And then after that, I think we can go ahead to do, like, the retrieval system, like a hybrid retrieval system. And
184 00:25:04.570 ⇒ 00:25:09.190 Ruixi Wen: In which, like, we can… once, I’m thinking, like…
185 00:25:09.380 ⇒ 00:25:16.650 Ruixi Wen: Okay, at scale-wise, I think this… this is gonna be hard, because, like, it means, like, you have, like, mo… multiple retrieval…
186 00:25:17.000 ⇒ 00:25:23.970 Ruixi Wen: routes, so maybe we need to, like, categorize the retrieval by domain, like, it’s, like, for.
187 00:25:23.970 ⇒ 00:25:26.619 Pranav Narahari: Support, or engineering, or…
188 00:25:26.620 ⇒ 00:25:39.170 Ruixi Wen: for policy, that we can have, like, this kind of domain detection, classification, as well as, like, language detection to make sure it’s, like, going to the right place.
189 00:25:39.410 ⇒ 00:25:42.299 Ruixi Wen: That’s great, yeah. Yeah, and
190 00:25:42.430 ⇒ 00:25:46.420 Ruixi Wen: And we really make sure… and then after that, I think it’s, like, more…
191 00:25:46.640 ⇒ 00:25:50.850 Ruixi Wen: For, like, building the context, to make sure, like, it’s…
192 00:25:51.100 ⇒ 00:26:09.589 Ruixi Wen: having, like, the, right context to do that, because, like, and it’s, like, it has to be very much, like, budget-conscious, because, like, different contexts have, like, different rankings, so I think it’s very important to build the context to know, like, what context to rank first.
193 00:26:09.590 ⇒ 00:26:10.090 Pranav Narahari: Awesome.
194 00:26:10.090 ⇒ 00:26:16.989 Ruixi Wen: And then there is, like, the generation part, what kind of, like, the prompt template we want to use, and what kind of, like.
195 00:26:17.610 ⇒ 00:26:30.249 Ruixi Wen: prompts. For example, like, for the prompts, I think, probably it’s very necessary for them to, like, include context in the prompt, what kind of citation they need, and what kind of, like, maybe we can set up some template for…
196 00:26:30.390 ⇒ 00:26:44.530 Ruixi Wen: on the generation layer. And, lastly, I think it’s, like, what you… we talked about, like, to make sure to be anti-hallucination, because we cannot… the point of AI is, like, we don’t need to, like, manually review everything, so we have this, like…
197 00:26:44.670 ⇒ 00:26:49.870 Ruixi Wen: Maybe, like, claim grounding or tooling to help with this kind of risks.
198 00:26:50.110 ⇒ 00:26:51.679 Ruixi Wen: Yeah, that’s one.
199 00:26:51.680 ⇒ 00:26:57.759 Pranav Narahari: Great. Yeah, that was great. I know we’re running low on time, I have a hard cut at…
200 00:26:57.760 ⇒ 00:26:59.080 Ruixi Wen: In 2 minutes.
201 00:26:59.080 ⇒ 00:27:03.599 Pranav Narahari: If you have a couple of questions right now for the last 2 minutes, like, happy to answer anything.
202 00:27:04.050 ⇒ 00:27:23.420 Ruixi Wen: Yes, yes, yeah, I know, like, you’ve really, much, like, given, like, a very in-depth view of the job for now. I’m, like, curious for, what do you think, like, for this role that’s, like, most important when… if I got this role, like, when I’m working with you, like, what do you think are the most important things that I need to be really good at?
203 00:27:24.290 ⇒ 00:27:28.940 Pranav Narahari: Yeah, I think, what I’ve noticed here at Brainforge is,
204 00:27:29.800 ⇒ 00:27:46.230 Pranav Narahari: people now are, like, they work in a very, like, flat organization, and especially here, like, it’s very flat. People are basically… There are certain people that specialize in things, and you can go to, like, certain of our, like, technical leads for just, like, really in-depth, like.
205 00:27:46.230 ⇒ 00:27:54.559 Pranav Narahari: architecture questions that you might need to make. But for the most part, we’re… we’re all very well versed in a lot of the similar things.
206 00:27:54.560 ⇒ 00:27:56.130 Ruixi Wen: So just feeling like…
207 00:27:56.130 ⇒ 00:28:09.890 Pranav Narahari: you’re good on both the product side, on the delivery side, like, in communicating with clients, but then also just on the technical side, too, and, like, like, getting your hands dirty with, like, the coding, and, like, deploying, and the testing. I think…
208 00:28:10.030 ⇒ 00:28:15.289 Pranav Narahari: This is, like, a unique role in that you can really have your hands in all of those different categories.
209 00:28:15.550 ⇒ 00:28:29.409 Ruixi Wen: Mmm, yeah, yeah, that sounds, like, indeed very exciting, yeah, because I talked to, like, Samuel before, and he told me, like, at a more higher level about, like, the projects thing, and it seems, a very interesting space, yeah. I wouldn’t, like, hold you any longer, because you have, like, a hard
210 00:28:29.410 ⇒ 00:28:38.399 Ruixi Wen: stop, but I really appreciate your time, and I hope, like, I can have the chance to, like, ask more questions about, like, your jobs, and the products you’re working on.
211 00:28:38.400 ⇒ 00:28:50.229 Pranav Narahari: Sure, yeah, you can, I think you have my email, it’s attached to this, calendar invite, so feel free to email me anything. I wish I had a few more minutes, but yeah, I do have a hard stop at 1. But, yeah, thank you so much, this was great.
212 00:28:50.230 ⇒ 00:28:53.329 Ruixi Wen: Okay, thank you so much, Pranav, thank you so much for your questions.
213 00:28:53.610 ⇒ 00:28:54.710 Pranav Narahari: Yeah, have a good one. Bye.
214 00:28:54.780 ⇒ 00:28:56.289 Ruixi Wen: You too, have a great day.