Meeting Title: Brainforge AI Engineer Interview Date: 2026-03-09 Meeting participants: Mouhamad, Pranav Narahari
WEBVTT
1 00:01:16.940 ⇒ 00:01:18.749 Pranav Narahari: Hey, Mohammed, how’s it going?
2 00:01:19.430 ⇒ 00:01:21.080 Mouhamad: Hey, Pranav, how are you?
3 00:01:21.920 ⇒ 00:01:23.910 Pranav Narahari: I’m good, I’m good. Nice to meet you.
4 00:01:24.390 ⇒ 00:01:27.520 Mouhamad: Nice to meet you, man, nice to meet you. How are you? How’s everything?
5 00:01:28.450 ⇒ 00:01:31.240 Pranav Narahari: Things are good, things are good.
6 00:01:31.830 ⇒ 00:01:35.269 Pranav Narahari: Yeah, I see that you, I believe, already talked to Sam, right?
7 00:01:36.050 ⇒ 00:01:36.710 Mouhamad: Yes.
8 00:01:36.940 ⇒ 00:01:37.540 Mouhamad: Good.
9 00:01:37.540 ⇒ 00:01:39.439 Pranav Narahari: Great. How did that go?
10 00:01:40.360 ⇒ 00:01:42.019 Mouhamad: Good, good, it was fun.
11 00:01:42.170 ⇒ 00:01:43.709 Pranav Narahari: It was fun talking to you.
12 00:01:44.260 ⇒ 00:01:46.739 Pranav Narahari: Yeah, Sam’s great. Sam’s awesome.
13 00:01:46.870 ⇒ 00:02:01.459 Pranav Narahari: Yeah, so how I usually like to run these interviews is, first, just kind of getting to know a little bit about you, but since we only have 30 minutes, you know, I just try to move along pretty quick, maybe try and keeping, like, all answers between, like.
14 00:02:01.800 ⇒ 00:02:09.740 Pranav Narahari: 90 seconds to, like, 2 minutes, and then, yeah, there’s, like, a few different topics that I want to cover.
15 00:02:10.229 ⇒ 00:02:11.210 Pranav Narahari: But…
16 00:02:11.320 ⇒ 00:02:19.909 Pranav Narahari: yeah, we can just hop right into it. Yeah, so tell me a little bit about yourself, and what brings you here to Brainforge, and what makes you excited about Brainforge?
17 00:02:20.370 ⇒ 00:02:36.640 Mouhamad: Sure, sure, awesome. So, just… just to talk about, a bit about myself, and just quickly. So, my name is Mohamed, I am a senior slash lead, now I could say I’m a lead, lead AI engineer, data science and AI engineer at a company called Artifact.
18 00:02:36.660 ⇒ 00:02:51.930 Mouhamad: And Artifact is basically a consultancy, AI consultancy company, and we do, like, projects with ministries, with private companies, and different sectors, like aviation, tourism, electricity, whatever, anything.
19 00:02:52.150 ⇒ 00:03:04.159 Mouhamad: And yeah, I’ve been working with them for now, for the past three and a half years-ish, about to be four or something. And yeah, and I’m leading development in a couple of projects, so projects related… everything related to AI,
20 00:03:04.160 ⇒ 00:03:12.169 Mouhamad: Yeah, I have a team with me, sometimes I have, like, a small team, sometimes I have a big team. Depends on the project, on the client, on whatever.
21 00:03:12.460 ⇒ 00:03:22.179 Mouhamad: And yeah, basically a new challenge, new… I do have, like, a similar background, because, like, we’re all consultancy and everything.
22 00:03:22.210 ⇒ 00:03:38.060 Mouhamad: a new challenge, a new market also, like, a new market that I’ve never been into, because I’ve been mainly working here in the Middle East, Saudi, Dubai, Qatar, a bit also in Asia and a bit in Europe as well, with companies like.
23 00:03:38.060 ⇒ 00:03:38.469 Pranav Narahari: Oh, nice.
24 00:03:38.470 ⇒ 00:03:49.409 Mouhamad: And, yeah, so I’ve been on all of these, regions. The US is, like, still a little bit… still new, so it’s a new opportunity, a new adventure for me.
25 00:03:50.270 ⇒ 00:03:54.490 Pranav Narahari: That’s cool. Yeah, so I’ll tell you a little bit about myself, too.
26 00:03:54.820 ⇒ 00:04:14.689 Pranav Narahari: Before we hop into, like, the rest of the interview, but yeah, so I come from a more traditional software engineering background, and then kind of got into the space of, like, AI engineering and, like, working at consulting firms about 2 years ago. And it’s great. You have a lot more, responsibility, you work in, like, much smaller teams.
27 00:04:14.690 ⇒ 00:04:32.159 Pranav Narahari: And you have both the technical type of implementation work, which is, which is great, which you’re trained for for years and years and years. But then, what I really like about it is that you’re client-facing, too, and even if you’re not presenting to clients all the time, is that
28 00:04:32.160 ⇒ 00:04:34.859 Pranav Narahari: You are very close contact with clients, so…
29 00:04:34.860 ⇒ 00:04:39.590 Mouhamad: Yeah, you still give them your technical expertise, I present to them all the things, yeah, I get you.
30 00:04:39.590 ⇒ 00:04:52.279 Pranav Narahari: Exactly. Yeah. So yeah, the first thing I kind of want to ask you about is just, I want you to just tell me about, like, any LLM project, if you’ve worked on a RAG project, that would be great too.
31 00:04:52.490 ⇒ 00:05:01.139 Pranav Narahari: Telling me, like, what you shipped end-to-end, what was, like, the user workflow, and then, yeah, what made it successful.
32 00:05:01.920 ⇒ 00:05:05.049 Mouhamad: Yeah, if you want to talk about the rag…
33 00:05:05.830 ⇒ 00:05:14.740 Mouhamad: One of the projects that I worked on… was, okay, for an aviation company.
34 00:05:14.900 ⇒ 00:05:25.969 Mouhamad: So the BHC company is Riyad Air, so it’s a new airline that opened in Saudi Arabia, and basically what they wanted is they wanted a chatbot on their entire system.
35 00:05:26.200 ⇒ 00:05:38.669 Mouhamad: And they have, like, stuff related to aviation. It’s not just, like, the technical… like, the normal rag where you have this textual data, but no, it also goes into, like.
36 00:05:39.050 ⇒ 00:05:41.619 Mouhamad: You have,
37 00:05:41.940 ⇒ 00:05:51.129 Mouhamad: some documents related to, like, the shape of the plane, like, images about the plane. If the plane has, for example, some,
38 00:05:51.270 ⇒ 00:06:10.549 Mouhamad: some stuff that are malfunctioned in the wing or anything. You also have these kind of things. You have some diagrams. So, you don’t have the normal text data, where you just have, like, a text and you just embed and etc. No, but you also have a lot of other, like, image data, etc.
39 00:06:10.550 ⇒ 00:06:11.110 Pranav Narahari: Okay.
40 00:06:11.360 ⇒ 00:06:29.000 Mouhamad: And yeah, it was one of… an interesting rag, because it’s not the usual rag that we do, and what made it successful is that it helped them a lot in their work, because the old way that they have is that whatever they want, anything, any sort of information or anything, they would just need to go and…
41 00:06:29.000 ⇒ 00:06:34.140 Mouhamad: Okay, we don’t know which one, maybe this and this PDF, for example, and let’s go to it.
42 00:06:34.140 ⇒ 00:06:39.350 Mouhamad: And start reading, and… and yeah, so it saved them a lot of time.
43 00:06:39.900 ⇒ 00:06:46.750 Pranav Narahari: Yeah, can you give me, like, an example of, like, some, like, queries they may ask, this, this LLM?
44 00:06:47.460 ⇒ 00:06:49.799 Pranav Narahari: Or there’s a RAG, like, app that you build.
45 00:06:50.560 ⇒ 00:06:56.960 Mouhamad: Yeah, so one of the… one of the queries, for example, some queries were related to, like.
46 00:06:57.140 ⇒ 00:07:16.510 Mouhamad: So, for example, they had some sort of air, like, airplanes, some, like, one type of airplane, so they wanted to see, like, can this airplane, for example, travel from Saudi Arabia, let’s say, to… from Riyadh, for example, to,
47 00:07:16.650 ⇒ 00:07:27.349 Mouhamad: to, like, England or something like that, in one go, and how much fuel does it cost? So, this is some sort of information that would do. Another thing they would have is the…
48 00:07:27.610 ⇒ 00:07:37.800 Mouhamad: Because they also, like, have, like, some engineering people in the company, so it also was helpful for them, because they would ask, like, for example,
49 00:07:37.970 ⇒ 00:07:50.199 Mouhamad: like, the wing of this… of this plane that they have, how much torque does it have inside of it? Because they… because they also have, can you hear me right?
50 00:07:50.530 ⇒ 00:07:51.859 Pranav Narahari: Yeah, I can hear you. Yeah, yeah.
51 00:07:51.860 ⇒ 00:07:54.050 Mouhamad: It gave me, like, battery life as well.
52 00:07:54.560 ⇒ 00:07:55.030 Pranav Narahari: Okay, no.
53 00:07:55.030 ⇒ 00:07:56.649 Mouhamad: gave me that today, Los Officer.
54 00:07:56.870 ⇒ 00:08:06.939 Mouhamad: Yeah, so I would ask, like, some, technical information on the plane, on the wing, etc, etc. So yes, these are some of the other parts.
55 00:08:07.510 ⇒ 00:08:10.349 Pranav Narahari: Cool, yeah, it seems like it’s pretty broad spectrum.
56 00:08:10.350 ⇒ 00:08:19.460 Mouhamad: Yeah, it was… it was… it was, like, general for them, and we gave them, like, access based on their access, like, where they were at in the company, and yeah.
57 00:08:20.660 ⇒ 00:08:39.169 Pranav Narahari: Cool, yeah. If you were to, in, like, as quick as possible, maybe just, like, in a few sentences, describe, like, the architecture of this, RAG system. So, like, all the way from, like, the user interface to then also, just, like, what the backend was, designed as.
58 00:08:40.559 ⇒ 00:08:44.809 Mouhamad: Okay, let me just spell my thoughts.
59 00:08:46.240 ⇒ 00:08:47.200 Pranav Narahari: Yeah, take your time.
60 00:09:10.750 ⇒ 00:09:11.560 Mouhamad: Okay.
61 00:09:12.950 ⇒ 00:09:17.390 Mouhamad: Sort of plays a common… question.
62 00:09:28.390 ⇒ 00:09:29.510 Mouhamad: Oh, that’s true.
63 00:09:35.410 ⇒ 00:09:42.249 Mouhamad: Okay, so, I’ll try to… I’ll just try and manage my idea, just to make it an infused sentence.
64 00:09:42.510 ⇒ 00:09:51.949 Mouhamad: So it consisted of, let’s say, three main layers. The first layer is the document processing.
65 00:09:52.400 ⇒ 00:09:57.880 Mouhamad: Where we have, basically, just pre-processed data, so we process data from…
66 00:09:58.180 ⇒ 00:10:03.590 Mouhamad: Text, from images, from everything, and we saved them into its… its…
67 00:10:03.590 ⇒ 00:10:19.279 Mouhamad: supposedly the database, and each one is to its database, because, like, you can use, for example, some vector database for the text, etc, but you cannot use some of, like, the images and stuff into other database, like Mongo, etc, or GitFS, or anything.
68 00:10:19.430 ⇒ 00:10:22.880 Mouhamad: So, the first layer, I would say, that was the document processing.
69 00:10:23.250 ⇒ 00:10:34.589 Mouhamad: The second one is the retrieval, and the third layer is the answer generation. So there is, like, navigation manuals, regulations,
70 00:10:34.710 ⇒ 00:10:39.189 Mouhamad: And the operation documents, etc. They were ingested.
71 00:10:39.460 ⇒ 00:10:46.319 Mouhamad: Into, like, its own and pre-processed properly and everything. They were cleaned. They weren’t, like, very, like.
72 00:10:46.790 ⇒ 00:11:04.009 Mouhamad: data that needed, like, a lot of enhancements or anything. Then you have the chunked, chunking, so chunk them, etc, then converted them into embedding, stored into vector databases for the ones that need vector databases, etc. And when a user would ask a question.
73 00:11:04.250 ⇒ 00:11:13.470 Mouhamad: The system would retrieve, like, the information from, like, the chunks, from these documents. I’d use… I used semantic search, for this.
74 00:11:13.750 ⇒ 00:11:19.350 Mouhamad: And yeah, this is in a few synthesis. Do you want me to explain more about it?
75 00:11:19.730 ⇒ 00:11:31.459 Pranav Narahari: No, that’s great. Also, can you kind of just, like, from a product standpoint, describe, like, what it looked like from a user’s perspective? So, was it a chatbot? Was it, was it integrated with some other system?
76 00:11:32.390 ⇒ 00:11:44.480 Mouhamad: No, it was a chatbot. So it was a chatbot interface, just like ChatGPT or anything, and they would just answer a question about it. They had the Google, like, Google GCP, basically.
77 00:11:44.640 ⇒ 00:11:49.189 Mouhamad: So, we would use, like, Google ecosystem and so forth. That was a short point, yeah.
78 00:11:49.800 ⇒ 00:11:58.899 Pranav Narahari: Gotcha. And did you, build any evaluation framework for this app, or if you didn’t, like, is there something that you think would have been useful for this app?
79 00:12:00.990 ⇒ 00:12:10.730 Mouhamad: Yes. So, for this one, it kind of… we wanted to do a lot of evaluation stuff, like, from Rajas to…
80 00:12:10.760 ⇒ 00:12:23.059 Mouhamad: to everything, etc. I did do some evaluation, yes. Like, I created… basically, what I did is I created, like, a label dataset of aviation questions.
81 00:12:23.240 ⇒ 00:12:35.079 Mouhamad: with their correspond to, like, relevant document passages. Then I measured, like, metrics. I measured, like, recall, precision,
82 00:12:36.970 ⇒ 00:12:41.070 Mouhamad: what else I did? I did also mere reciprocal as well.
83 00:12:41.240 ⇒ 00:12:48.359 Mouhamad: What was that? Municipal… Yes, various applicable rank.
84 00:12:48.990 ⇒ 00:12:50.080 Pranav Narahari: Gotcha, yep.
85 00:12:50.270 ⇒ 00:12:52.930 Mouhamad: Yeah, the word is difficult.
86 00:12:52.930 ⇒ 00:12:54.240 Pranav Narahari: No, no, you’re good, yeah.
87 00:12:54.680 ⇒ 00:12:56.690 Mouhamad: I would say, I would say…
88 00:12:56.930 ⇒ 00:13:03.540 Mouhamad: If I had time, I would have done even more, but we were a lot, like, pressed on time.
89 00:13:03.880 ⇒ 00:13:11.589 Pranav Narahari: Totally, yeah. I guess, thinking about it now, like, if you had the time to implement, what other type of, evals would you have added to the framework?
90 00:13:11.590 ⇒ 00:13:23.030 Mouhamad: Yeah, so other things I would add, like, for example, the generation quality of the LLM. So I would add, like, metrics like exact match,
91 00:13:23.200 ⇒ 00:13:33.000 Mouhamad: semantic cerality metrics, like, for example, birth score of F1 score. Also, I would do, like, some hallucination checks, if it’s hallucinating or not.
92 00:13:33.160 ⇒ 00:13:48.570 Mouhamad: Yeah, these are some of the things that I would… I would add if I had more time. Also, I did some stuff, like, related to latency, robustness, etc, but not, like, a thing. But yeah, I would… I would do, like, even these more, I would go into more, more into depth if I had time.
93 00:13:49.040 ⇒ 00:13:52.179 Pranav Narahari: Gotcha. So I guess kind of a segue to that is,
94 00:13:52.220 ⇒ 00:14:09.870 Pranav Narahari: what type of monitoring did you put into place in this application? And, even if the monitoring, because you said you had a… you had a pretty quick timeline, even if you didn’t do much, what do you think would have been useful monitoring, and what are, like, the metrics that would have mattered for an application like this?
95 00:14:10.720 ⇒ 00:14:20.000 Mouhamad: Yeah. So… there was some metal monitoring, monitoring, so, for example.
96 00:14:20.260 ⇒ 00:14:26.259 Mouhamad: So, let’s talk first from a perspective of what a good monitoring is. So, typically.
97 00:14:26.350 ⇒ 00:14:41.670 Mouhamad: I would say, like, it’s on four areas. So, for example, there is the retrieval quality, the generation quality, some sort of, like, performance stuff, like, system performance, and the last one, I would consider it as, like, data drift.
98 00:14:43.520 ⇒ 00:14:53.570 Mouhamad: these are, like, the pillars that I would usually do. So for this one, I did implement some monitoring for the retrieval.
99 00:14:53.770 ⇒ 00:15:12.840 Mouhamad: Basically to ensure, like, like, the system is actually, like, retrieving, etc. Like, for example, hit rate. I did, sort of some retrieval hit rate, but I, like, again, wasn’t a lot in depth, but if I had also the time, if we’re talking, like, if I had also to do a lot of things.
100 00:15:14.640 ⇒ 00:15:34.539 Mouhamad: So I would do, like, a recall. So for the retrieval monitoring, there was, like, as I said, like, recall, embedding severity, hit rate, the score of the embedding quality. For the generation quality, the, the groundedness, hallucination rate, relevancy.
101 00:15:34.740 ⇒ 00:15:35.970 Mouhamad: Yeah.
102 00:15:36.160 ⇒ 00:15:50.059 Mouhamad: there was… there could be also some stuff related to, like, the API… so there was API calls, but they had a big budget, they didn’t care about it, honestly. Error rates, throughput…
103 00:15:50.650 ⇒ 00:15:55.739 Mouhamad: Also for the data drift, like, just the query distribution drift, if I have time as well, yeah.
104 00:15:56.370 ⇒ 00:16:10.669 Pranav Narahari: Yeah, I also want to kind of ask about, if we kind of go into, like, last question about maybe this product that you built, maybe if you want to use an example for another one, whatever comes to mind, what’s something that, after you shipped it,
105 00:16:11.030 ⇒ 00:16:26.530 Pranav Narahari: you found that something was broken in production, either the client came back to you, or you found out on your own. And then what did you do to change that, afterward? So, like, after you’ve already shipped it, maybe there’s already customers using the application,
106 00:16:27.010 ⇒ 00:16:33.690 Pranav Narahari: maybe first just give me what that was that broke, just kind of, like, paint that picture for me, and then what you did to then fix it.
107 00:16:36.220 ⇒ 00:16:44.840 Mouhamad: Sure. So… So there was the generation…
108 00:16:45.000 ⇒ 00:16:46.950 Mouhamad: Is there anything for one tonight?
109 00:16:50.350 ⇒ 00:16:51.150 Mouhamad: Okay.
110 00:16:51.680 ⇒ 00:16:54.270 Mouhamad: So…
111 00:16:55.270 ⇒ 00:17:03.179 Mouhamad: In one case, after deploying the rag, basically what I noticed is that the answer quality is starting to degrading in production.
112 00:17:03.590 ⇒ 00:17:20.370 Mouhamad: It wasn’t, it wasn’t, like, it, it was good at the beginning, but it started degrading, etc. And some users inside, and some, some, like, some departments, not all departments, some departments, they started reporting that the system was giving, like, answers that were, like.
113 00:17:21.010 ⇒ 00:17:23.940 Mouhamad: I’d say either,
114 00:17:24.530 ⇒ 00:17:31.939 Mouhamad: Like, it wasn’t incorrect, but it was, like, partially incorrect, or it was referring to, like, a relevant document section.
115 00:17:32.170 ⇒ 00:17:41.709 Mouhamad: This is one of the problems that happened. What I did for this is basically the first thing I went and I diagnosed the error.
116 00:17:41.860 ⇒ 00:17:46.090 Mouhamad: So, I diagnose where the issue is happening in the pipeline.
117 00:17:47.370 ⇒ 00:17:58.100 Mouhamad: And since, basically, this is a rack system, and I have multiple components, what I did at the beginning is just go back and I check the logs for the retrieval stage.
118 00:17:58.340 ⇒ 00:18:03.179 Mouhamad: And basically what was retrieved during the retrieval and the generation stage.
119 00:18:03.370 ⇒ 00:18:17.339 Mouhamad: And by… by… basically, why I did this is because, like, by inspecting, like, the retrieved document chunks, like, I… I was able to, like, discover, like, the system was receiving less relevant,
120 00:18:17.540 ⇒ 00:18:21.970 Mouhamad: sections, compared to, like, the offline evaluation, etc.
121 00:18:24.510 ⇒ 00:18:32.859 Mouhamad: That error is one prime example of why it was, like, basically, like, a distribution shift in the query.
122 00:18:32.960 ⇒ 00:18:36.390 Mouhamad: Like, the user in production will ask a question in…
123 00:18:36.920 ⇒ 00:18:44.799 Mouhamad: And they’re Arab, and they’re Saudi Arabian, so their question is different than what I would ask the model.
124 00:18:44.940 ⇒ 00:18:50.250 Mouhamad: So it was, like, a different format, which affected, basically, the embedding.
125 00:18:51.300 ⇒ 00:18:51.820 Pranav Narahari: I see.
126 00:18:51.820 ⇒ 00:18:57.990 Mouhamad: Now, now for this, the way I fixed it is… I introduced…
127 00:18:58.330 ⇒ 00:19:07.519 Mouhamad: So it was semantic at the beginning, then I did… because, like, they would ask a question, but they would, like, throw some keywords there in their,
128 00:19:07.810 ⇒ 00:19:23.419 Mouhamad: So, I changed it from semantic to, like, hybrid, where I combined, like, semantic and keyword search, which did improve, like I’ll say, like, the technical terms, technicality. Also added re-ranking, which also helped.
129 00:19:23.490 ⇒ 00:19:31.300 Mouhamad: But yeah, but this is what I did. It did help the problem, but this is one of the issues that occurred on this.
130 00:19:32.140 ⇒ 00:19:45.680 Pranav Narahari: Great, yeah. Yeah, let’s hop into kind of, like… I just want to, like, bring up some scenarios and just, like, understand, like, how you think about them. And some of them, you know, things that I’ve just kind of gone through here at Brainforge. Okay.
131 00:19:45.680 ⇒ 00:19:58.829 Pranav Narahari: And so, for one of our clients, they were in the e-commerce space, and we were building MCP servers for them so that they can chat through via the… via, you know, just like a chatbot, similar to what you made.
132 00:19:58.830 ⇒ 00:19:59.260 Mouhamad: Okay.
133 00:19:59.260 ⇒ 00:20:14.810 Pranav Narahari: and one of the MCP servers that we built was a Shopify MCP server. The issue was that whenever they would… or not whenever, but sometimes when they would ask questions, with a Shopify MCP server enabled, they would get
134 00:20:15.050 ⇒ 00:20:18.489 Pranav Narahari: incorrect data being pulled in.
135 00:20:18.710 ⇒ 00:20:26.510 Pranav Narahari: And what they found, after looking at, like, the… the tool history, was that it looked like it was creating
136 00:20:26.700 ⇒ 00:20:30.569 Pranav Narahari: dummy data, and it wasn’t pulling in the Shopify data.
137 00:20:30.610 ⇒ 00:20:47.140 Pranav Narahari: And so, yeah, we were able to fix that, but I’m wondering, given that information, and you can ask me additional questions if needed, but what are… what would you think is the issue? What would you diagnose that as, and then what would you do to kind of fix that?
138 00:20:52.290 ⇒ 00:20:56.490 Pranav Narahari: Deacon Think Simple, yeah, nothing too complicated.
139 00:20:58.990 ⇒ 00:21:07.920 Mouhamad: Okay, domit data. So, dummy data, so I was generating…
140 00:21:09.430 ⇒ 00:21:14.389 Mouhamad: Was it connected to what exactly? What kind of database you’re connecting to what?
141 00:21:15.150 ⇒ 00:21:21.769 Pranav Narahari: So, it’s, connecting to the Shopify MCP, so it’s pulling the data dynamically from Shopify.
142 00:21:23.590 ⇒ 00:21:24.290 Mouhamad: Okay.
143 00:21:24.940 ⇒ 00:21:35.280 Mouhamad: Just simply, if I would… if I were to… to approach the issue, yeah.
144 00:21:36.880 ⇒ 00:21:49.279 Mouhamad: Because you also mentioned that the tool was doing it, so I would approach it in three ways. Not three ways, but I would look at the three layers. I would look at the tools integration.
145 00:21:49.660 ⇒ 00:21:54.809 Mouhamad: Yeah. That you did. The Asian reasoning, because I was the agent will return.
146 00:21:54.930 ⇒ 00:21:58.019 Mouhamad: And the data validation.
147 00:21:58.380 ⇒ 00:22:08.740 Mouhamad: First thing that I would do is to understand what actually broke, if the tool history… so that’s system-generated dummy data, rather than the actual Shopify data that you spoke about.
148 00:22:08.950 ⇒ 00:22:15.590 Mouhamad: That means… it’s basically that the LLM failed, like, and instead of…
149 00:22:16.140 ⇒ 00:22:21.130 Mouhamad: like, the LLM will always try to help, will always try to generate something.
150 00:22:21.560 ⇒ 00:22:24.160 Pranav Narahari: Yeah, the LLC created that data, yeah.
151 00:22:24.160 ⇒ 00:22:31.710 Mouhamad: Yeah, so this is one example of when the LLM fails, it will try to still generate something to help you.
152 00:22:31.810 ⇒ 00:22:45.160 Mouhamad: So… so this is why, like, this is one thing that I would check, because, like, the dummy data, that means, like, the LLM did something, or if it failed to properly call a tool responsible for fetching something.
153 00:22:45.310 ⇒ 00:22:51.550 Mouhamad: Or it could be a hallucination problem, could be also that I hallucinated or anything.
154 00:22:51.990 ⇒ 00:22:52.420 Pranav Narahari: Cool, so…
155 00:22:52.750 ⇒ 00:23:04.419 Pranav Narahari: That’s a great answer. I kind of want to, go down that route now. Say if it is a hallucination, issue, what are, like, a couple, like, a couple, like, quick fixes that you can make to mitigate hallucination?
156 00:23:05.240 ⇒ 00:23:10.059 Mouhamad: Okay. For the fixes of hallucination,
157 00:23:10.890 ⇒ 00:23:23.269 Mouhamad: The first thing I would think about is basically force grounding, retrieve a document. So it’s basically instructing the model, like, to answer only using the retrieved context. Like, you can just add it, like.
158 00:23:23.380 ⇒ 00:23:29.199 Mouhamad: If you don’t know, don’t, don’t, don’t just come up with answers or anything.
159 00:23:29.310 ⇒ 00:23:30.299 Mouhamad: And so…
160 00:23:30.300 ⇒ 00:23:32.680 Pranav Narahari: When you say add that, where would you add that?
161 00:23:33.490 ⇒ 00:23:34.859 Mouhamad: When you say what, sorry?
162 00:23:35.030 ⇒ 00:23:46.219 Pranav Narahari: So you, you said, like, you know, you said you just mentioned, like, you can add this additional, line of text. Where, where could you add that to, like, the LLM to, or your, your.
163 00:23:46.380 ⇒ 00:23:50.049 Pranav Narahari: your AI system to, like you said, force-ground it.
164 00:23:50.700 ⇒ 00:23:55.930 Mouhamad: Yeah, so one place that you can add it is in the system prompt.
165 00:23:56.250 ⇒ 00:24:02.040 Mouhamad: Yeah, perfect. The first place that we’ve got to actually do it, or the instruction prompt, it depends if you have, like, instruction prompt and score.
166 00:24:02.960 ⇒ 00:24:03.630 Pranav Narahari: Yeah.
167 00:24:03.830 ⇒ 00:24:19.209 Pranav Narahari: What’s another parameter that you could, modulate in terms of if you’re noticing, like, hallucination issues, what is one parameter for these, you know, like, your LLM API call that you can use to, reduce hallucination?
168 00:24:21.420 ⇒ 00:24:23.630 Mouhamad: By parameter, you mean what exactly?
169 00:24:24.120 ⇒ 00:24:28.089 Pranav Narahari: So there’s, like, a bunch of different parameters that you can use,
170 00:24:28.190 ⇒ 00:24:43.679 Pranav Narahari: you know, like, bigger models might have, like, larger… let’s use tokens, for example. You can… you can set, like, a number of tokens. Is there a certain parameter that stands out to you that you can use to, that you can… that you can update to…
171 00:24:44.270 ⇒ 00:24:48.480 Pranav Narahari: Decrease or increase, like, potential for hallucinations?
172 00:24:48.770 ⇒ 00:24:54.889 Mouhamad: Okay, I get you. So, like, parameters, like, you mean, for example, like, temperature, and top speed…
173 00:24:54.890 ⇒ 00:24:56.400 Pranav Narahari: Yeah.
174 00:24:56.400 ⇒ 00:25:10.980 Mouhamad: Yeah, so you can… we can, for example, adjust the temperature. Usually, like, zero to ground more, etc. It depends on the system as well, so it’s not always the case. Top B, you can also use it. Frequency penalty, also.
175 00:25:12.670 ⇒ 00:25:13.980 Pranav Narahari: Yeah, that’s great.
176 00:25:14.050 ⇒ 00:25:15.110 Mouhamad: Cool.
177 00:25:15.840 ⇒ 00:25:19.610 Pranav Narahari: Yeah.
178 00:25:19.850 ⇒ 00:25:27.939 Pranav Narahari: So, I mean, that was a lot of, like, what I wanted to go over in terms of scenarios. We kind of just, like, ran through all of them in, like, that one example.
179 00:25:29.270 ⇒ 00:25:33.769 Pranav Narahari: Yeah, and I think… so we have 5 minutes left. Are there any questions that you have for me?
180 00:25:35.970 ⇒ 00:25:41.560 Mouhamad: No, I think I already asked questions a lot to Sam in the previous round.
181 00:25:41.750 ⇒ 00:25:44.580 Mouhamad: But I think I, yeah, I do have, like, some answers.
182 00:25:45.660 ⇒ 00:25:46.970 Pranav Narahari: Okay, perfect.
183 00:25:47.080 ⇒ 00:26:00.609 Pranav Narahari: That’s great, then. We can, we can end a little bit early. Yeah, if anything comes up, feel free to email me, or email Sam, too, if, there’s a question more pertinent to him. But, yeah, we will, we’ll be in touch shortly.
184 00:26:01.010 ⇒ 00:26:05.520 Mouhamad: Will do. Thank you so much, Prana, for your time. It was really fun to do this interview.
185 00:26:06.070 ⇒ 00:26:07.640 Pranav Narahari: Cool, that’s awesome. Talk soon.
186 00:26:07.640 ⇒ 00:26:09.390 Mouhamad: Thank you so much. Talk soon, bye.