Meeting Title: Brainforge Final Interview Date: 2026-03-18 Meeting participants: Uttam Kumaran, Sowmya, Samuel Roberts


WEBVTT

1 00:00:16.190 00:00:17.290 Uttam Kumaran: Hello?

2 00:00:21.860 00:00:22.450 Sowmya: Hello?

3 00:00:23.920 00:00:24.870 Sowmya: Hi!

4 00:00:25.470 00:00:26.159 Uttam Kumaran: Hi, how are you?

5 00:00:26.920 00:00:28.390 Sowmya: Good. I’m doing great!

6 00:00:28.520 00:00:29.849 Sowmya: Thank you for asking.

7 00:00:29.850 00:00:30.940 Uttam Kumaran: How’s the week going?

8 00:00:31.620 00:00:32.920 Sowmya: It’s going great.

9 00:00:33.890 00:00:35.550 Uttam Kumaran: What’s the… what’s the highlight?

10 00:00:36.270 00:00:38.740 Sowmya: What about you? How is your day then?

11 00:00:39.710 00:00:50.300 Uttam Kumaran: Today, today’s good. Busy. Yeah, I feel like every day is, like, really, really busy, so we’re… we’re bringing on a lot of new teammates, and bringing on new clients, and so it’s just…

12 00:00:50.410 00:00:55.640 Uttam Kumaran: Everything all at once in my world, so…

13 00:00:55.640 00:00:56.390 Sowmya: I think…

14 00:00:56.390 00:01:14.490 Uttam Kumaran: I think Sam is gonna just be, like, one or two minutes late. He just messaged me. But maybe before he starts, yeah, I’m happy to, I’m really pumped, that you’re taking the time to do this. I guess any questions that, you want to ask, maybe as soon as he gets here, we can probably start reviewing the… the exercise itself, but…

15 00:01:14.620 00:01:20.480 Uttam Kumaran: Feel free, any questions for me, or anything that, you know, maybe, you’d love to ask, happy to answer.

16 00:01:22.230 00:01:27.080 Sowmya: Oh, I can ask you some questions, like,

17 00:01:27.240 00:01:31.939 Sowmya: What are the biggest challenges you are currently working on in AI?

18 00:01:32.810 00:01:49.070 Uttam Kumaran: Yeah, I think in AI, one, internally, we’re working on how do we get everybody adopting skills, like using Cursor, or OpenCode, and building out, like, harnesses for internal agents.

19 00:01:49.180 00:01:54.369 Uttam Kumaran: I think our hardest stuff is probably the things that we’re doing internally.

20 00:01:54.500 00:02:01.249 Uttam Kumaran: Versus, like, our clients, I think, usually are asking for just MCPs and building simple agents.

21 00:02:01.380 00:02:18.690 Uttam Kumaran: Internally, we’re building a lot of agent harnessing to actually build background agents and execute, you know, actually some of our client work. So, yeah, that’s… I think just figuring that out and making sure that we can have agents actually execute good work is, like, sort of our biggest challenge right now.

22 00:02:20.610 00:02:21.850 Sowmya: Oh, okay.

23 00:02:22.950 00:02:25.549 Uttam Kumaran: You’re cutting in… your mic is cutting in and out a little bit.

24 00:02:26.360 00:02:28.710 Sowmya: Oh, my Well…

25 00:02:29.250 00:02:31.209 Uttam Kumaran: It’s still, like, cutting a little bit in and out.

26 00:02:33.530 00:02:34.180 Sowmya: Method?

27 00:02:34.960 00:02:38.040 Sowmya: Like, I’m just asking, like, Augment.

28 00:02:38.640 00:02:40.110 Sowmya: Hi, I’m Costa.

29 00:02:40.620 00:02:43.220 Uttam Kumaran: It’s still cutting in and out a little bit, I’m not exactly…

30 00:02:43.590 00:02:44.280 Sowmya: Okay.

31 00:02:46.070 00:02:47.599 Sowmya: Freedom is horrible.

32 00:02:49.300 00:02:51.609 Sowmya: Just a second, sorry for the inconvenience. Okay.

33 00:02:51.610 00:02:52.590 Uttam Kumaran: Yeah, yeah, that’s working now.

34 00:02:53.530 00:02:55.000 Sowmya: It’s working okay.

35 00:02:55.000 00:02:55.620 Uttam Kumaran: Yeah.

36 00:02:56.650 00:02:58.090 Sowmya: Yeah, I’m just asking that.

37 00:02:58.090 00:02:58.729 Samuel Roberts: Oh, I noticed too much.

38 00:02:58.730 00:03:01.510 Sowmya: Have you tried Augment? It’s like Cursor.

39 00:03:02.310 00:03:08.470 Uttam Kumaran: No, I haven’t tried Augment. We… a lot of people here use either Claude Code or Cursor,

40 00:03:08.630 00:03:15.100 Uttam Kumaran: So I think it’s sort of whatever people want to use, sort of their flavor. But no, it’s maybe something I should look into.

41 00:03:16.280 00:03:18.029 Uttam Kumaran: Is that what you prefer?

42 00:03:19.580 00:03:25.459 Sowmya: Yeah, like, sometimes I prefer it, and also, like, it’s good enough.

43 00:03:25.740 00:03:27.310 Sowmya: Like, while working?

44 00:03:27.570 00:03:29.190 Uttam Kumaran: Yeah, Augment Code, right?

45 00:03:29.470 00:03:30.240 Sowmya: Yeah.

46 00:03:31.560 00:03:32.220 Uttam Kumaran: Cool.

47 00:03:34.610 00:03:44.209 Uttam Kumaran: Cool. Well, welcome, Sam. Yeah, I think we could probably get started. Maybe, Sam, I could let you kind of drive, but, like, we could get started sort of reviewing the exercise.

48 00:03:44.540 00:03:46.710 Uttam Kumaran: And yeah, let’s… let’s go forward.

49 00:03:49.450 00:03:54.069 Samuel Roberts: And hold on one sec, my computer’s, freaking out a little bit still. Okay, here we go.

50 00:03:55.600 00:04:03.169 Samuel Roberts: All right, yeah, hi. Alright, sorry, I’m a little turned around for a second, give me one sec to find everything I had open.

51 00:04:04.160 00:04:07.239 Samuel Roberts: cool, there we go. Alright.

52 00:04:07.800 00:04:09.459 Samuel Roberts: So.

53 00:04:09.460 00:04:10.010 Sowmya: Awesome.

54 00:04:10.340 00:04:15.919 Samuel Roberts: Yeah, I watched the video, I pulled down the code, but I didn’t get to go too deep yet, but I saw your… your…

55 00:04:16.120 00:04:19.319 Samuel Roberts: your video, everything looked good there. I was curious…

56 00:04:20.390 00:04:24.200 Samuel Roberts: But a couple things might just start us off, if I can find my notes…

57 00:04:26.500 00:04:27.110 Sowmya: Sure.

58 00:04:29.830 00:04:41.109 Samuel Roberts: Yeah, okay. So, one of the things you talked about was being a little more deterministic, and less, less reliant on, an LLM. Can you talk a little bit about

59 00:04:41.660 00:04:43.850 Samuel Roberts: the trade-offs there…

60 00:04:43.850 00:04:44.310 Sowmya: Bye.

61 00:04:44.310 00:04:57.999 Samuel Roberts: Why… I mean, you kind of explained it in the video, but I also want to just, for the sake of here, a little bit more about, like, what that meant, why that trade-off was chosen that way. I think I’d like to hear your thoughts on that a little bit more.

62 00:04:58.380 00:05:01.479 Sowmya: Sure, like, I will try to explain.

63 00:05:01.830 00:05:18.270 Sowmya: like, this was actually the key decision in the project, so I intentionally made the system more deterministic first, with the LLM as a fallback, to improve reliability, cost, and explainability.

64 00:05:18.270 00:05:35.970 Sowmya: So, why it is not fully LLM-based is we have the consistency issue, right? So, an LLM can give different outputs for the same input, but in compliance, that is very risky, right? So, we have deterministic rules, like,

65 00:05:35.980 00:05:50.439 Sowmya: like, same input, same output. So, the second is, like, we know, you know, precision matters more than creativity. So, this is the kind of safety or compliance system, not a chatbot.

66 00:05:50.450 00:06:05.010 Sowmya: Like, so, a false negative means, like, it’s very dangerous, right? And a false positive means, like, an impact on the business. We have rules plus controlled matching, which gives tighter control.

67 00:06:05.010 00:06:17.180 Sowmya: And on the cost and latency side, LLM calls are, like, very expensive, right, and slower. Like, if I run an LLM for every product, like, it is not scalable.

68 00:06:17.180 00:06:26.119 Sowmya: That is why I used the rules plus fuzzy matching to filter fast, and the LLM only for the edge cases.

69 00:06:26.690 00:06:27.436 Sowmya: And,

70 00:06:28.060 00:06:36.340 Sowmya: And what I did is, I have added the deterministic layer, like fuzzy and synonym matching, so that

71 00:06:36.490 00:06:47.280 Sowmya: That matches 70-80% of the cases, and the next step is the semantic layer, like embedding similarity, to handle the variations.

72 00:06:47.280 00:06:59.539 Sowmya: And one more step is LLM reasoning for ambiguous cases, like, which have ingredients which are unclear, and to generate the explanation. So, basically,

73 00:06:59.770 00:07:15.490 Sowmya: I asked myself, like, where do I really need the intelligence versus where can I hard-code certainties, so that I push the LLM to the edges, not the core. So, this is how, like, I optimized the…

74 00:07:15.540 00:07:24.600 Sowmya: determinism for safety and also auditability, and used the LLM selectively for recall on ambiguous cases.
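The deterministic-first ordering Sowmya describes (rules and synonyms, then fuzzy matching, then an LLM only for the borderline band) can be sketched roughly like this. The lists, thresholds, and the stdlib `difflib` scorer are illustrative stand-ins, not the project’s actual code; the real system reportedly uses RapidFuzz and an embedding layer as well.

```python
from difflib import SequenceMatcher

# Illustrative data only -- not the project's real forbidden list.
FORBIDDEN = {"niacin", "sodium benzoate"}
SYNONYMS = {"vitamin b3": "niacin"}

def normalize(name: str) -> str:
    """Lowercase and map known synonyms to a canonical name."""
    key = name.strip().lower()
    return SYNONYMS.get(key, key)

def classify(ingredient: str, hard: float = 0.85, soft: float = 0.60) -> str:
    """Deterministic layers first; only borderline scores fall through
    to the (hypothetical) LLM review step."""
    name = normalize(ingredient)
    if name in FORBIDDEN:                     # layer 1: rules + synonyms
        return "unsafe"
    best = max(SequenceMatcher(None, name, f).ratio() for f in FORBIDDEN)
    if best >= hard:                          # layer 2: fuzzy match
        return "unsafe"
    if best >= soft:                          # ambiguous: defer to the LLM
        return "review"
    return "safe"
```

The point of the ordering is that most inputs never reach the expensive, non-deterministic step: an exact or fuzzy hit decides immediately, and only the middle band between the two thresholds is escalated.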

75 00:07:28.020 00:07:34.760 Samuel Roberts: Great. Yeah, I would love to talk more about, the edge cases. When you were actually building it, like, what were the…

76 00:07:34.830 00:07:48.330 Samuel Roberts: the specific moments that you, like, hit that you realized, okay, this isn’t working that right way, or, okay, this is where the fallback to the LLM might help, and I’m curious about your process, I guess, for actually putting this together that way.

77 00:07:49.200 00:07:51.230 Sowmya: Yeah, like,

78 00:07:51.250 00:08:00.209 Sowmya: Those cases aside, what I have observed is, like, we have the synonyms or, like, alternate names.

79 00:08:00.210 00:08:14.550 Sowmya: like, niacin versus vitamin B3, or, like, some other things, like, so the issue is that rule-based matching against the CSV does not catch this.

80 00:08:14.550 00:08:17.290 Sowmya: Even fuzzy matching fails sometimes.

81 00:08:17.310 00:08:23.080 Sowmya: Like, we… so we have already added the synonym mapping, so we added the…

82 00:08:23.730 00:08:40.620 Sowmya: But we observed that there are still some edge cases, like, that is why we use the LLM to reason about it. And the second is, like, we have some hidden and mixed ingredients in the text, like, I can say, for example.

83 00:08:40.620 00:09:00.610 Sowmya: The text contains “plant-based extracts and stabilizers.” So, the issue with this kind of text is, like, there is no explicit ingredient name, so the rule will fail, right? So, we used the LLM to infer the possibility of risky components, and mark it as review instead of safe.

84 00:09:00.670 00:09:13.149 Sowmya: And also, there is some OCR noise from the images, you know, like, for some chemical formulas or chemical text, like sodium benzoate, or else it gives the…

85 00:09:13.660 00:09:31.460 Sowmya: like, an @ symbol instead of, like, a character, so, for some OCR noise, we have used preprocessing and normalization, and if it is still unclear, the LLM reasoning step will definitely help in these cases.

86 00:09:31.460 00:09:43.420 Sowmya: And there are also some partial matches and threshold issues that I have known, like the fuzzy match score, and also, like, borderline cases are an issue.

87 00:09:43.420 00:09:56.229 Sowmya: And also, if the threshold is very low, then there are false positives, right? And if it is high, it can miss the cases, so that is why I use the deterministic threshold first.

88 00:09:56.270 00:10:16.370 Sowmya: And also, like, there is the over-flagging problem, which is the bigger one. The early version had, like, too many unsafe flags, and also business impact, right? So, I have introduced three states: safe, unsafe, and review. So, the LLM is used only for the borderline review,

89 00:10:16.780 00:10:23.089 Sowmya: like, cases which are unclear. Like, so, these are the edge cases which I want to highlight.
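The OCR cleanup step mentioned above (an @ symbol appearing instead of a character, stray symbol noise) might look something like this minimal sketch. The confusion table is an assumption for illustration; only the @-for-a example comes from the conversation.

```python
import re

# Assumed OCR confusion table -- illustrative, not from the project.
OCR_FIXES = {"@": "a", "0": "o", "1": "l"}

def clean_ocr(text: str) -> str:
    """Normalize OCR output before matching: fix common character
    confusions, drop symbol noise, and collapse whitespace."""
    text = text.lower()
    for bad, good in OCR_FIXES.items():
        text = text.replace(bad, good)
    # keep only letters, digits, spaces, and hyphens
    text = re.sub(r"[^a-z0-9\s-]", " ", text)
    return re.sub(r"\s+", " ", text).strip()
```

If the cleaned string still does not match the forbidden list deterministically, that is exactly the borderline case she routes to the review state and the LLM fallback.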

90 00:10:24.500 00:10:34.690 Uttam Kumaran: Can I… can I ask a question about just, like, even from a higher level? Like, how did you… what was your process to actually map out the system that you were going to build?

91 00:10:34.690 00:10:49.319 Uttam Kumaran: Like, do you… are you, like, are you typically someone that, like, goes and just is writing notes, or do you quickly try and build certain pieces? Like, when you got this problem, like, what was your first step in, like, sort of trying to think about a larger solution?

92 00:10:50.210 00:10:53.070 Samuel Roberts: You read my mind, that’s exactly where I was going.

93 00:10:53.600 00:11:00.559 Sowmya: Yeah, that was, like, a good question. For this… for this one, I did not jump into the coding first.

94 00:11:00.560 00:11:09.489 Sowmya: how I map the system is that, I understand the problem properly, because I reframed the problem as not, you know,

95 00:11:09.490 00:11:22.069 Sowmya: like, only an AI problem first, because it is a kind of compliance and pipeline problem, so I broke it into multiple layers, like ingestion, extraction, and then matching, and then decision.

96 00:11:22.130 00:11:41.300 Sowmya: And then, I started with the simplest possible thing, so, the baseline. So, I always do a CSV forbidden list, and also a simple string match, to just answer, like, can I solve 50-60% of this without using actual AI or not?

97 00:11:41.300 00:12:06.269 Sowmya: And then I identified the gaps, like, which are really, I can say, important, like, instead of over-designing. I always run a few samples and look at the failures, so if synonyms are missed, or noisy text is missed, or the context is missing, that told me where AI is actually, like, I can say, needed, and what part, like, we need

98 00:12:06.270 00:12:07.120 Sowmya: that.

99 00:12:07.120 00:12:08.130 Sowmya: And also.

100 00:12:08.320 00:12:21.080 Sowmya: AI can… will help us, and also I try to create, I would say, a modular design on paper. And also, so I usually sketch it quickly in my notebook, not a heavy document.

101 00:12:21.080 00:12:34.000 Sowmya: I define what each layer will do, what will go in, and what will come out. So, this helps me, like, keep things in place, and then I will build it incrementally, so I follow the layer-by-

102 00:12:34.000 00:12:50.880 Sowmya: layer approach, like, first ingestion, extraction, and then rule-based matching, and then improving the, like, matching, and also adding the LLM. So I never start with the LLM, always, and then I also, like, add the evaluation early.

103 00:12:50.880 00:13:00.380 Sowmya: So, because to get the model in place, we need to have the evaluation pipeline also, to track the experiments, and also… so I…

104 00:13:00.530 00:13:14.099 Sowmya: create a small dataset, and check false positives and false negatives. Based on that, I optimized the use of the LLMs, so I did not just plug in the LLMs, so I simply do something like…

105 00:13:14.270 00:13:27.130 Sowmya: I ask myself, like, what to call, when to call, and what input to give, and what output format I need. So, this is my approach, actually, for, like, when I’m building this problem.

106 00:13:29.180 00:13:29.740 Uttam Kumaran: Great.

107 00:13:29.940 00:13:31.090 Uttam Kumaran: Yes, Sam, go ahead.

108 00:13:32.330 00:13:41.979 Samuel Roberts: Yeah, so, I mean, building on that, I want to hear more about the process, like, we were talking cursor and augment. I’m curious, using the AI-assisted coding tool, like, how do you…

109 00:13:42.200 00:13:51.419 Samuel Roberts: how do you go back and forth with these tools? You know, we’re all figuring out our ways of doing things, I’m curious to hear yours, I guess.

110 00:13:54.120 00:14:07.109 Sowmya: Yeah, so basically, like, I actually use Cursor quite a bit here, like, not… not, like, to generate everything, so I start with the structure myself, so I don’t ask Cursor, like, to build

111 00:14:07.110 00:14:14.919 Sowmya: Like, the whole system. So, I define the folders and pipeline flow, because Cursor comes in after the structure is very clear.

112 00:14:15.150 00:14:31.259 Sowmya: So, and then I use Cursor for the scoped tasks. So, I use it like a pair programmer, not the generator. Like, whenever I need to do the fuzzy match using RapidFuzz, and, and if I need to add the OCR extraction using

113 00:14:31.410 00:14:39.579 Sowmya: EasyOCR, or if I need to refactor some modules into a class, so it is kind of for some targeted prompts.

114 00:14:40.050 00:14:42.680 Sowmya: Then also, You know?

115 00:14:42.970 00:14:51.719 Sowmya: I can see that there are some back-and-forth loops, like, like, my loop is something, like, I write the basic logic.

116 00:14:52.050 00:15:03.900 Sowmya: So, Cursor improves it and fills the gaps, and I review and modify that, and then I test it, and then I repeat it. So, I never blindly accept the code, whatever has been

117 00:15:04.010 00:15:22.060 Sowmya: given by Cursor, like, so I do iterate the approach, and also optimize the code, and then test. And, you know, Cursor really, really helped me most on the boilerplate code, and also the refactoring side, on those edge cases, and also…

118 00:15:22.260 00:15:34.490 Sowmya: working with the… but I don’t use Cursor for the core decision logic, or also the LLM prompting and matching thresholds. So, yeah, like, that is what, like, I do with Cursor so far.

119 00:15:36.100 00:15:36.670 Samuel Roberts: Okay.

120 00:15:38.200 00:15:40.970 Uttam Kumaran: Do you… are you typically working on, like…

121 00:15:41.070 00:15:53.910 Uttam Kumaran: I guess… I guess my question is, like, how… how did you think about, like, testing throughout the whole process? Like, are you having Kirscher add unit tests? Like, are you doing integration testing? Like, how do you think about

122 00:15:54.560 00:15:55.940 Uttam Kumaran: that process.

123 00:16:00.250 00:16:02.630 Sowmya: Yeah, are you talking about Cursor?

124 00:16:02.900 00:16:08.069 Uttam Kumaran: No, no, no, like, within your system, like, how do you typically think about testing?

125 00:16:08.380 00:16:14.399 Uttam Kumaran: Like, unit tests, or integration tests, or, you know, potential deploy tests.

126 00:16:17.230 00:16:30.809 Sowmya: Yeah, so basically, like, on the testing side, I did not, like, treat it like a pure machine learning project; it is kind of, like, a production system. So, testing was basically layer-by-layered.

127 00:16:30.880 00:16:41.789 Sowmya: So, I use a mix of, like, unit tests for components, and integration tests for the pipeline, and also an evaluation dataset for the LLM quality. And the unit tests,

128 00:16:42.000 00:16:51.860 Sowmya: I use them at the component level, and I wrote the small tests for critical pieces, like, ingredient extraction, and also the

129 00:16:51.920 00:17:04.989 Sowmya: synonym matcher, and also the fuzzy matcher, and also, like, so the goal is to make sure that each module works independently in the unit testing. And on the integration testing side,

130 00:17:05.109 00:17:15.169 Sowmya: This is more important for this project, because I tested the full pipeline, so I checked that the correct ingredients are extracted, and also the correct

131 00:17:15.319 00:17:25.839 Sowmya: matches against the forbidden list, and also the correct final label, because bugs usually happen between the components, right? Not inside one, so…

132 00:17:26.230 00:17:44.350 Sowmya: And also, there is evaluation testing, which is LLM-specific. I always create a small labeled dataset, like, with products and expected outputs, and then I run the pipeline on the dataset and compare the outputs, where I check the precision,

133 00:17:44.550 00:17:47.720 Sowmya: false positives, false negatives, and also…

134 00:17:47.850 00:18:03.869 Sowmya: And also, there is manual testing, which is very important. I initially tested manually with the edge cases, like synonyms, OCR noise, and mixed inputs. This helped me to identify the gaps, like, initially, and also

135 00:18:04.010 00:18:23.809 Sowmya: I did not write a very heavy test suite initially, but I was focused on the tests where the system can break, so my realization was: unit tests can ensure the correctness, but integration and evaluation ensure the, kind of, like, real-world reliability and scalability, too.

136 00:18:25.250 00:18:29.640 Sowmya: Right. So, this is what I usually do, like, whenever in the project.
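The layer-by-layer testing idea described above, unit tests per component plus an integration check across the wired-up pipeline, might look like this small sketch. The function names and the comma-separated extraction are illustrative assumptions, not the project’s real modules.

```python
# Hypothetical pipeline components for the testing sketch.
def extract_ingredients(text: str) -> list[str]:
    """Split a label string into normalized ingredient tokens."""
    return [t.strip().lower() for t in text.split(",") if t.strip()]

def match_forbidden(ingredients: list[str], forbidden: set[str]) -> list[str]:
    """Return the sorted forbidden hits among the ingredients."""
    return sorted(set(ingredients) & forbidden)

def label(text: str, forbidden: set[str]) -> tuple[str, list[str]]:
    """Wire the components together into a final decision."""
    hits = match_forbidden(extract_ingredients(text), forbidden)
    return ("unsafe", hits) if hits else ("safe", [])

# Unit tests: each module in isolation.
assert extract_ingredients("Water, Niacin ") == ["water", "niacin"]
assert match_forbidden(["water", "niacin"], {"niacin"}) == ["niacin"]

# Integration test: the components wired together end to end,
# since bugs usually live between the components, not inside one.
assert label("Water, Niacin", {"niacin"}) == ("unsafe", ["niacin"])
```

The integration assertion is the one that catches the between-component bugs she mentions, e.g. an extractor that stops normalizing case while the matcher still assumes lowercase.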

137 00:18:34.610 00:18:40.920 Samuel Roberts: Great. Talk to me about the, the metrics and the logging. I saw you…

138 00:18:41.040 00:18:48.929 Samuel Roberts: included that, and also, like, the unique identifier for everything for… I just want to hear a little bit of your thought process for that.

139 00:18:50.730 00:18:51.830 Sowmya: Sure!

140 00:18:51.930 00:19:11.240 Sowmya: like, like, so on the logging and metrics side, this is a compliance system, right? Like, like, not a demo, so I have added the checks, so the system is actually reliable over time. So, I track the precision, recall, and false positives, which is very important here.

141 00:19:11.240 00:19:15.559 Sowmya: And the false negatives, which is critical for safety, so I have…

142 00:19:15.730 00:19:31.410 Sowmya: stored these across evaluation runs, and there are pipeline metrics as well, like, I can say, the percentage of cases handled by rules, and also by embeddings, and also by the LLM. This helped me to understand, like, am I overusing the LLM,

143 00:19:31.430 00:19:46.699 Sowmya: or am I not using the LLM enough, like that. And also, there is the latency, which is the P95 latency, and also total response time and the LLM call time, because, you know, in the project, the LLM is the bottleneck, so we need to…

144 00:19:46.970 00:20:04.750 Sowmya: check the latency side as well, right? So, there is the confidence and fallback rate, like, the review and LLM fallback rates. So, these are a kind of signal that, you know, gives a, like, glimpse of the uncertainty in the system.

145 00:20:04.870 00:20:07.780 Sowmya: On the… on the logging side.

146 00:20:08.010 00:20:14.680 Sowmya: I structured the logging to trace the full decision path, actually. For each request.

147 00:20:14.840 00:20:25.250 Sowmya: from input, to extraction, and also, like, reading the ingredients, and also then the matches, and also, till the final decision, I log each request.

148 00:20:25.370 00:20:29.730 Sowmya: Why this is actually important is, let’s say, if the…

149 00:20:30.150 00:20:40.359 Sowmya: if it is a wrong output, then I can trace where it is actually happening. Like, is it an extraction issue, or it’s a matching issue, or it’s a…

150 00:20:40.640 00:20:47.059 Sowmya: an LLM issue, like that. And also, for the auditability, because with compliance systems

151 00:20:47.060 00:21:06.740 Sowmya: we always need the… why was this flagged? And also, like, we need to answer it, and also we need that explainability, like, why it is flagged. So logs give me the full explanation trail, and also, we have the improving of the system, where I can use the…

152 00:21:06.810 00:21:12.550 Sowmya: logs to collect the failure cases and also feedback into the evaluation dataset, so…

153 00:21:12.740 00:21:21.910 Sowmya: That’s why, like, we have designed the, like, centralized logger and also structured logs at different levels. So, yeah, I designed, basically, metrics

154 00:21:21.910 00:21:34.050 Sowmya: to measure the system reliability and also LLM usage. However, like, it is not covering 100%, but whatever is required, I have added into the project.
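The per-request tracing she describes, one unique identifier carried through extraction, matching, and decision, so a wrong output can be traced to the stage that produced it, can be sketched with stdlib structured logging. The stage names and fields are assumptions for illustration.

```python
import json
import logging
import uuid

logger = logging.getLogger("compliance")

def log_stage(request_id: str, stage: str, **details) -> dict:
    """Emit one structured JSON log line per pipeline stage, keyed by
    the request id, and return the record for inspection."""
    record = {"request_id": request_id, "stage": stage, **details}
    logger.info(json.dumps(record))
    return record

# One id per request ties the whole decision path together.
rid = str(uuid.uuid4())
log_stage(rid, "extraction", ingredients=["niacin"])
log_stage(rid, "matching", hit="niacin", score=1.0)
decision = log_stage(rid, "decision", label="unsafe", reason="forbidden match")
```

Filtering the log stream by one `request_id` then yields the full explanation trail for an audit question like "why was this flagged?".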

155 00:21:40.020 00:21:42.079 Samuel Roberts: Tom, do you have any other questions, or…

156 00:21:42.080 00:21:50.109 Uttam Kumaran: Yeah, I was gonna, I guess, gonna ask, like, tell me about developing, sort of, any of these LLM-based systems, like, within a team.

157 00:21:50.140 00:22:02.129 Uttam Kumaran: Like, I’m kind of interested in, like, how you… any of your experiences, like, developing with others? Like, talk to me about how you think about the actual, like, development lifecycle for, like, a feature or a product.

158 00:22:02.170 00:22:06.339 Uttam Kumaran: Like, things like branching, things like…

159 00:22:07.120 00:22:14.710 Uttam Kumaran: You know, reviews, and, like, talk to me about, like, sort of what is the ideal software development lifecycle environment that you’re used to?

160 00:22:16.830 00:22:17.870 Sowmya: Sure!

161 00:22:18.540 00:22:28.669 Sowmya: You see, on the… on this, basically, like, when I have worked, like, in a team setup, like, I always follow a, like, structured approach,

162 00:22:28.880 00:22:39.890 Sowmya: like, an LLM product lifecycle approach, and not only, like, ad hoc experiments, so I break everything into four stages, if I think about the LLM feature lifecycle.

163 00:22:40.260 00:22:49.650 Sowmya: So, one is the experimentation, the other is the validation, and third is the productionization, and then last is the monitoring and iteration.

164 00:22:49.890 00:22:56.770 Sowmya: So, these are the, like, always I break into the four stages, and also the branching strategy side.

165 00:22:57.040 00:23:10.750 Sowmya: I always have main, dev, and also feature branches, so main falls into the stable production, the dev is the integration branch, and a feature branch is used for each feature. So, let’s say…

166 00:23:10.880 00:23:19.429 Sowmya: If we have a RAG improvement, or a prompt change feature, or any other feature related to the LLM, I can add it to the feature branch.

167 00:23:19.710 00:23:22.950 Sowmya: So… So, prompt…

168 00:23:23.210 00:23:35.900 Sowmya: changes will go into the feature branches, and will be treated, like, as code, actually. And also, on the development flow, I always start with the feature branch, and then

169 00:23:35.960 00:23:44.140 Sowmya: I do the local experimentation using test prompts, test retrieval quality, and also small evaluation tests.

170 00:23:44.290 00:23:47.650 Sowmya: And so I don’t push random experiments.

171 00:23:48.170 00:23:48.710 Uttam Kumaran: Okay.

172 00:23:49.860 00:24:06.209 Uttam Kumaran: talk to me about, like, how you think about evaluations. Like, I’m sort of interested in, like, methodology, like, I, you know, I’ve been sort of studying evals maybe for, like, 2 years, but maybe I’m kind of interested in, like, how it’s maybe changed, like, in your view.

173 00:24:06.220 00:24:18.099 Uttam Kumaran: Like, how do you feel like you’re evaluating LLM outputs differently these days? Like, it is definitely very tedious, so yeah, I’m just curious, like, kind of how you think about that.

174 00:24:21.010 00:24:31.499 Sowmya: Yeah, like, on the evaluation side, like, what I do is there are multiple layers that I always approach, like,

175 00:24:31.860 00:24:47.070 Sowmya: Honestly, this has evolved a lot. Like, earlier, I treated evaluation like an accuracy check. Now, I treat it like it’s a system design problem, not only accuracy or some specific metrics.

176 00:24:47.270 00:24:50.059 Sowmya: Earlier, it was, like, a small dataset,

177 00:24:50.240 00:24:54.999 Sowmya: accuracy, F1 score, and also manual checking, which worked

178 00:24:55.000 00:25:10.860 Sowmya: really well for machine learning, but for LLMs, it did not work really well. So, there are deterministic checks I always use, which are very fast, and also, before even LLM quality, there is some format validation, and also whether

179 00:25:10.960 00:25:12.450 Sowmya: the required,

180 00:25:12.480 00:25:20.820 Sowmya: fields are also present or not, and also no empty output, so that, like, these catch 20 to 30% of LLM failures.

181 00:25:20.820 00:25:34.379 Sowmya: Yeah. And, next, there are some task-specific metrics, like, for classification, we have precision, which is very important, and then the recall, and the false positives and false negatives.
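The deterministic format-validation layer she mentions, run before any quality scoring of LLM output, could be as simple as this sketch. The field names and the label set are assumptions carried over from the safe/unsafe/review scheme discussed earlier, not a confirmed schema.

```python
# Assumed output schema for the sketch: {"label": ..., "reason": ...}.
REQUIRED = {"label", "reason"}
VALID_LABELS = {"safe", "unsafe", "review"}

def format_ok(output: dict) -> bool:
    """Deterministic shape check on an LLM response: required fields
    present, label in the allowed set, and no empty explanation."""
    if not REQUIRED <= output.keys():
        return False
    if output["label"] not in VALID_LABELS:
        return False
    return bool(str(output["reason"]).strip())
```

Because this check is cheap and deterministic, it can gate every response and reject malformed output before any expensive task-specific metric is computed.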

182 00:25:34.380 00:25:47.609 Uttam Kumaran: Is there a tool that you’re used to using for setting these up? Like, are you using, like, a software tool? Are you using an open source tool to actually do the framework for executing and holding the evals?

183 00:25:49.910 00:26:04.199 Sowmya: Yes, yeah, I have used, LangSmith and also Prompt… and also, like, DeepEval as well, for OpenAI evaluations, but I don’t rely on, like, only just one, because I combine the tools and also custom evaluations.

184 00:26:04.200 00:26:10.569 Sowmya: So, I always go for LangSmith for debugging and tracing, and also, like,

185 00:26:10.570 00:26:27.250 Sowmya: for prompt testing. And also, for RAG-specific tools, only if needed, I use the RAGAS framework, like, which is good as an evaluation framework, and also to track the retrieval quality, and also the context relevance. In those cases, I use RAGAS.

186 00:26:28.630 00:26:29.380 Uttam Kumaran: Great.

187 00:26:31.620 00:26:34.840 Uttam Kumaran: Yeah, I mean, maybe one other question I have,

188 00:26:34.990 00:26:42.819 Uttam Kumaran: is… I am interested in, like, how AI has changed the way you’re executing work over the last, like, 3-6 months.

189 00:26:42.820 00:26:43.310 Sowmya: Correct.

190 00:26:43.310 00:26:49.830 Uttam Kumaran: I mean, in particular, I’m interested in, like, you must be following things with OpenClaw, Claude Code,

191 00:26:50.270 00:26:53.409 Uttam Kumaran: You know, like, various sort of, like,

192 00:26:53.980 00:27:09.639 Uttam Kumaran: like, automated agentic systems. I’m wondering, like, how you think it’s… it will affect, sort of, the way you… you execute work, and how, like, let’s say you were running an AI team. How would you take advantage of, like, some of those technologies?

193 00:27:10.080 00:27:12.250 Uttam Kumaran: Especially, like, the last, like, 3-4 months, you know?

194 00:27:13.510 00:27:14.660 Sowmya: Yeah.

195 00:27:15.490 00:27:19.340 Sowmya: Like, we can see that, like, basically in the

196 00:27:19.340 00:27:40.220 Sowmya: three to six months, like, we can see things have, like, actually changed a lot. Like, AI is no longer just for features, now it’s a part of the engineering workflow itself. So my… my workflow earlier was, like, writing the code, debugging, then testing. Now it is, like,

197 00:27:40.220 00:27:53.090 Sowmya: think, design, and also use an AI assistant, like, to build, and also validate it, and then iterate it. I have used tools like Cursor and Claude Code for fast prototyping, so…

198 00:27:53.090 00:27:59.790 Sowmya: Like, so, basically, what used to take hours, now I’m able to do in minutes.

199 00:27:59.790 00:28:15.160 Sowmya: So, there is also, like, code understanding for large code bases. So, like, I also ask it to explain the flow and where the decisions are happening. So, this speeds up the onboarding a lot, and so…

200 00:28:15.160 00:28:25.850 Sowmya: And also, like, if I’m running a team, I would structure the usage like this: I will define where AI is allowed and where AI is not allowed, so I can get the…

201 00:28:25.990 00:28:31.679 Sowmya: some lists, like, let’s say, test generation, debugging help, boilerplate things, and also…

202 00:28:31.790 00:28:49.760 Sowmya: And for controlled things, we have the core logic, prompts, and also the decision systems, so we can avoid the, like, blind reliance, like, which we are having right now, so we can standardize the AI usage, and so I can use, like, something like prompt templates, or also code review rules.

203 00:28:49.760 00:28:54.049 Sowmya: And also, AI in the development lifecycle as well, like in dev phase.

204 00:28:54.070 00:29:09.769 Sowmya: We can use Cursor or Claude Code for utilities, or some ideation, or in the PR phase, we can use the AI-assisted, like, code reviews. And for testing, we can use the… we can generate unit tests.

205 00:29:09.770 00:29:15.300 Sowmya: And suggest some edge cases like that. So, yeah, I have seen two to three…

206 00:29:15.390 00:29:19.559 Sowmya: Like, like, it’s faster development now?

207 00:29:19.820 00:29:28.520 Sowmya: Which is, like, faster iteration cycles, and also less time on boilerplate code as well.

208 00:29:28.760 00:29:29.320 Uttam Kumaran: Yeah.

209 00:29:29.590 00:29:43.430 Uttam Kumaran: Yeah, I mean, to give you some interesting pieces, like, you know, I’ve been doing a lot in the past, like, few weeks especially, and we’re using a lot of these, like, what they call, like, background agents, right? So I’m able to assign work to Codex, to Cursor Cloud.

210 00:29:43.680 00:29:54.469 Uttam Kumaran: And so what I’m finding is that most of, like, my work is actually more on project planning, and breaking things down into tickets appropriately, because then I can just…

211 00:29:54.560 00:30:07.319 Uttam Kumaran: I have another interface where I’m assigning work to agents to execute, but the harnessing of the agent needs to be good, like, you need to have all your env keys, you have clear guidelines on how to do certain work, how to push PRs.

212 00:30:07.460 00:30:18.430 Uttam Kumaran: So yeah, I agree, it’s changing really, really fast, but it’s sort of… that’s why I think the biggest focus right now is on planning, right? And a ton of upfront planning before…

213 00:30:18.430 00:30:18.940 Sowmya: Fair enough.

214 00:30:18.940 00:30:20.440 Uttam Kumaran: Execution, you know?

215 00:30:23.450 00:30:23.960 Uttam Kumaran: Cool.

216 00:30:25.550 00:30:34.490 Sowmya: The biggest thing, like, that is going on in the market. Every day, new things are coming, like, as you mentioned, OpenClaw

217 00:30:34.490 00:30:51.569 Sowmya: disrupted a lot of industries, and so there are, again, like, I think now it’s just the start. Like, there is, again, a lot of things which are going to come, and we need to prepare accordingly. So, yeah, with things like this, responsibilities are getting changed now.

218 00:30:51.570 00:30:55.539 Sowmya: Now, from coders to planners, and then later to something, so…

219 00:30:57.420 00:30:58.509 Uttam Kumaran: I agree, agree.

220 00:30:59.310 00:31:01.810 Uttam Kumaran: Cool. I think that’s all the questions I had, Sam.

221 00:31:03.580 00:31:09.580 Samuel Roberts: Yeah, he covered most of what I still had left, so… Yeah, what questions do you have for us? What can we, you know…

222 00:31:09.690 00:31:14.270 Samuel Roberts: answer for you about the role, about Brainforge, about how we work, whatever you have.

223 00:31:16.900 00:31:28.760 Sowmya: First of all, thank you for your time, and it was nice talking to you, both of you, and I just have, like, a few questions

224 00:31:28.780 00:31:37.349 Sowmya: regarding that, like, if I’m successful in this interview, what would be, like, the focus for the first 90 days?

225 00:31:38.920 00:31:47.420 Uttam Kumaran: Yeah, I can give you some insight. So, we have active clients that we are developing agentic solutions for.

226 00:31:47.690 00:31:53.040 Uttam Kumaran: Whether that’s workflow automations, whether that’s internal chat interfaces.

227 00:31:53.420 00:32:06.290 Uttam Kumaran: However, a lot of our work is actually a lot more around building the context layer, and building the context graph for agentic applications to access. And so we have a few clients right now where those are the systems we’re kind of building.

228 00:32:06.290 00:32:13.009 Uttam Kumaran: So the first, like, 30, 60, 90 days is, one, of course, like, meeting everybody else on the AI team, understanding, like, how…

229 00:32:13.010 00:32:34.959 Uttam Kumaran: Brainforge works. Of course, like, I think compared to your… compared to, I think, what you’re used to at Bank of America, like, we are a client service company, right? So, everything is in service of the client and understanding what they want built, the timeline. So in that sense, like, you’ll get to see some of those clients, and then you’ll get assigned, to begin working within one of them.

230 00:32:34.970 00:32:43.520 Uttam Kumaran: But again, I think the work is very, very similar to a lot of the pieces in this exercise. It’s a mix of LLM, it’s also a mix of just normal software development.

231 00:32:43.680 00:32:44.500 Sowmya: Yeah.

232 00:32:44.500 00:33:01.659 Uttam Kumaran: But again, fundamentally, I think our job is changing where we’re all able to speed up and use, you know, agentic systems to do that. And so we are… you’ll see everybody at Brainforge uses AI for, like, every single thing. Like, including all the business people, marketing people,

233 00:33:01.890 00:33:11.790 Uttam Kumaran: So we’re very, very forward, and you’ll also see some of our own systems that we built, you know, internally, and get a sense to… get an access to work on those and things like that, so…

234 00:33:13.730 00:33:17.799 Sowmya: Yeah, that’s great to know. Yeah, and

235 00:33:17.980 00:33:22.480 Sowmya: Honestly, like, that first 90 days is a mix of everything.

236 00:33:22.780 00:33:29.599 Sowmya: And… those are all my questions for right now. Do you have any more questions for me?

237 00:33:29.600 00:33:38.309 Uttam Kumaran: Yeah, I’m kind of interested in, like, what interested you in Brainforge originally? Like, I know it’s a little bit different than your current experience, I’m wondering, like, what drew your attention?

238 00:33:38.610 00:33:46.970 Uttam Kumaran: Or, like, after going through this process, like, what did you find more compelling to, like, you know, continue, forward?

239 00:33:48.970 00:33:52.789 Sowmya: Yeah, that’s a really good one. So, like, what…

240 00:33:52.790 00:34:09.930 Sowmya: Like, at Brainforge, like, what I’ve found is a couple of things. Like, first is the focus on LLM work, like, product-focused LLM work. So, in my current work, I have built LLM systems, like RAG agents and evaluations, but mostly in the enterprise, or you can say internal use cases.

241 00:34:09.929 00:34:19.110 Sowmya: So now it is very clearly, like, product plus user-facing plus fast iteration. That is something I wanted to do more.

242 00:34:19.110 00:34:33.110 Sowmya: Or, like, especially the 0 to 1 building. Like, also, I see that, like, strong emphasis on the evaluation. So, in the JD, like, it’s specifically mentioned, like, evaluation frameworks,

243 00:34:33.110 00:34:46.280 Sowmya: semantic retrieval, and prompt iteration, so that aligned a lot with, like, what I have been doing, and I wanted to understand more about the LLMs and also, like, the whole systems,

244 00:34:46.280 00:34:55.269 Sowmya: like, building the evaluation dataset, and also tuning the retrieval. And I also like that it’s not just, like, using LLMs,

245 00:34:55.270 00:35:06.079 Sowmya: but also getting the impact, like, is the LLM really working in the workflow, and is it worth it or not? So, also…

246 00:35:06.690 00:35:12.380 Sowmya: So, these are the main things, like, I would like to say. So, about Brainforge.

247 00:35:12.810 00:35:13.740 Sowmya: Cool.

248 00:35:13.740 00:35:20.629 Uttam Kumaran: Great, yeah, I think, we are… you’ll see internally and externally, we’re trying to do what’s on the cutting edge.

249 00:35:20.800 00:35:26.769 Uttam Kumaran: We also move very, very quickly, so people are able to own end-to-end

250 00:35:27.050 00:35:30.399 Uttam Kumaran: development of a platform from, like, product ideation.

251 00:35:30.530 00:35:49.020 Uttam Kumaran: All the way through to actual agentic, like, framework development, all the way back up through evals and integrations. So we expect that a lot of the folks on our AI team are, you know, in that way, a little bit more full stack, you know? Doesn’t mean, like, you have to have a full understanding of every single part, but…

252 00:35:49.110 00:36:06.629 Uttam Kumaran: In this sense, we have people who have flavors of interest in backend, or front-end or integrations, but ultimately, on the AI team in particular, I think everybody sort of has, like, a broad mix of skill set, which means you’re sort of like a Swiss Army knife, when working for clients, right? So, that’s great to hear.

253 00:36:06.630 00:36:08.270 Sowmya: Yeah, that’s right.

254 00:36:09.800 00:36:13.209 Sowmya: Cool. Thank you so much, like, for your time.

255 00:36:13.470 00:36:14.839 Uttam Kumaran: Yeah, of course, no, thank you so much.

256 00:36:14.840 00:36:16.009 Sowmya: Like, it was…

257 00:36:16.230 00:36:19.419 Uttam Kumaran: Yeah, I’m sure you’ll definitely, hear back from us very shortly, so…

258 00:36:20.430 00:36:28.559 Sowmya: Oh, thank you, it was really nice discussing everything with you, and thank you so much for your time, and you have a nice rest of the day.

259 00:36:28.560 00:36:30.119 Uttam Kumaran: Thank you. Okay. Bye.

260 00:36:30.120 00:36:30.850 Sowmya: Bye.