Meeting Title: AI Service Office Hours Block
Date: 2026-03-30
Meeting participants: Mustafa Raja, Casie Aviles, Samuel Roberts, Pranav


WEBVTT

1 00:00:23.660 00:00:24.470 Mustafa Raja: Hey…

2 00:00:26.800 00:00:27.860 Casie Aviles: Hey, Mustafa.

3 00:00:28.480 00:00:29.129 Mustafa Raja: How are you?

4 00:00:30.500 00:00:31.709 Casie Aviles: Yeah, doing alright.

5 00:00:33.820 00:00:35.080 Mustafa Raja: Yeah, same.

6 00:00:35.680 00:00:38.540 Casie Aviles: You’ll be, starting later this week, right?

7 00:00:39.720 00:00:41.150 Mustafa Raja: From Wednesday, yeah.

8 00:00:43.750 00:00:45.370 Casie Aviles: Oh, okay, okay, okay.

9 00:00:45.560 00:00:49.550 Samuel Roberts: Oh, cool, yeah, that’s something I wanted to talk about, make sure we were all prepared.

10 00:00:50.240 00:00:50.890 Mustafa Raja: Yep.

11 00:00:54.180 00:00:55.820 Samuel Roberts: Anyway, how are you guys doing?

12 00:00:56.410 00:00:58.250 Mustafa Raja: Yeah, I’m good. I’m doing good.

13 00:00:58.750 00:01:00.019 Samuel Roberts: Okay, cool, cool.

14 00:01:00.950 00:01:02.380 Samuel Roberts: Good weekend, everyone.

15 00:01:02.660 00:01:03.270 Pranav: Hey, guys.

16 00:01:04.250 00:01:05.019 Samuel Roberts: Deep enough.

17 00:01:06.460 00:01:07.449 Samuel Roberts: How’s it going?

18 00:01:08.470 00:01:11.460 Pranav: Pretty good, pretty good. Didn’t do much this weekend. How about you?

19 00:01:11.460 00:01:17.180 Samuel Roberts: Same. Same. I was still kind of dragging, not feeling great. I slept a lot.

20 00:01:17.700 00:01:20.450 Samuel Roberts: Yeah. Which was very nice. I needed, I needed it.

21 00:01:20.880 00:01:24.100 Pranav: Yeah, sometimes all you need is just, like, a weekend to just chill out.

22 00:01:24.100 00:01:28.069 Samuel Roberts: That’s… yeah, yeah, it was… you know, it’s not the,

23 00:01:29.310 00:01:40.980 Samuel Roberts: what am I trying to say here without being mean to my kid? With a baby, it’s different, but it’s still, like, there’s a relaxing and there’s a doing things with a baby, and, like, he’s still effort and work, but it was at home, and…

24 00:01:41.150 00:01:42.860 Samuel Roberts: I could nap when he napped, you know?

25 00:01:43.360 00:01:43.740 Pranav: Yes.

26 00:01:44.660 00:01:46.200 Samuel Roberts: But yeah, cool, cool.

27 00:01:46.530 00:01:50.049 Samuel Roberts: Are you, are you still in, Worcester?

28 00:01:50.570 00:01:52.969 Pranav: Yeah, I am still in Worcester for right now, so…

29 00:01:52.970 00:01:53.340 Samuel Roberts: Okay.

30 00:01:53.340 00:01:54.980 Pranav: Soon, I’m gonna be making the move.

31 00:01:55.360 00:01:58.720 Samuel Roberts: Cool, okay, I knew you were looking at leases and stuff, I didn’t know how that played out.

32 00:01:58.860 00:01:59.610 Samuel Roberts: Sweet.

33 00:01:59.610 00:02:07.069 Pranav: Yeah, so I started looking, but then I… everyone that I was talking to is like, you have to kind of do it just about, like, 60 days or less before.

34 00:02:07.910 00:02:08.940 Samuel Roberts: Oh, interesting.

35 00:02:09.310 00:02:10.090 Pranav: Yeah.

36 00:02:11.039 00:02:14.879 Pranav: Yeah, I think that’s just kind of, like, the cycle, like, for all the apartment buildings, and…

37 00:02:14.880 00:02:15.360 Samuel Roberts: Sure.

38 00:02:15.360 00:02:20.900 Pranav: It’s like, they have to give… the current tenants have to give 60-day notice if they want to re-sign, so if they don’t.

39 00:02:20.900 00:02:21.510 Samuel Roberts: Oh, okay.

40 00:02:21.510 00:02:24.629 Pranav: It opens up to everybody else.

41 00:02:24.630 00:02:25.260 Samuel Roberts: True.

42 00:02:25.540 00:02:29.020 Pranav: But then, yeah, if you’re not going to, like, one of those apartment units, then it could…

43 00:02:29.180 00:02:31.580 Pranav: Be, like, 30 days or less, so…

44 00:02:31.580 00:02:32.500 Samuel Roberts: Right, right.

45 00:02:33.040 00:02:37.759 Pranav: I probably won’t have any, like, place figured out until, like, maybe beginning of May.

46 00:02:38.290 00:02:42.490 Samuel Roberts: Yeah, yeah, that makes sense, though. Okay, cool. Still plenty of time, I feel like, to sort it all out.

47 00:02:43.100 00:02:43.710 Pranav: Yep.

48 00:02:44.360 00:02:44.950 Samuel Roberts: Cool.

49 00:02:45.750 00:02:47.640 Pranav: Give me one sec, guys, I’ll be right back.

50 00:02:48.110 00:02:49.150 Samuel Roberts: Yeah, totally, totally.

51 00:02:49.730 00:02:54.790 Samuel Roberts: Yeah, sorry, I didn’t mean to interrupt what you guys were talking about previously, but I want to make sure you guys finished whatever you were saying.

52 00:02:59.130 00:03:01.219 Mustafa Raja: Yeah, we were just checking in on each other.

53 00:03:01.940 00:03:06.139 Samuel Roberts: Okay, oh, good. Okay, excellent. I love that. Thank you, guys.

54 00:03:06.140 00:03:06.620 Mustafa Raja: happy.

55 00:03:06.770 00:03:08.940 Samuel Roberts: Good.

56 00:03:09.530 00:03:10.420 Samuel Roberts: Good.

57 00:03:13.000 00:03:15.839 Samuel Roberts: Did you guys do anything fun this weekend, or did you guys have a relaxing one?

58 00:03:15.840 00:03:16.720 Mustafa Raja: I do.

59 00:03:17.090 00:03:21.910 Mustafa Raja: I had some, friends come over from different cities to my city.

60 00:03:22.130 00:03:25.389 Mustafa Raja: Oh, nice. I just spent the whole weekend with them.

61 00:03:26.100 00:03:29.040 Samuel Roberts: That’s… that’s… that’s convenient. How about you, Casie?

62 00:03:30.600 00:03:36.230 Casie Aviles: Just… Stuff, just, home stuff, like cleaning,

63 00:03:37.360 00:03:42.350 Casie Aviles: What else did I do? And, yeah, just, you know, cooking, stuff like that.

64 00:03:42.580 00:03:44.320 Casie Aviles: Just chill.

65 00:03:44.320 00:03:45.500 Samuel Roberts: That’s good, yeah.

66 00:03:46.800 00:03:49.099 Samuel Roberts: Yeah, we had a little bit of oxygen.

67 00:03:49.370 00:03:50.210 Samuel Roberts: Yeah.

68 00:03:50.350 00:03:50.980 Samuel Roberts: What do you mean?

69 00:03:50.980 00:03:51.670 Casie Aviles: Sorry.

70 00:03:52.250 00:03:53.500 Mustafa Raja: What did you cook?

71 00:03:53.680 00:04:01.200 Casie Aviles: Oh, it’s… well, we call it pansit. It’s… I’m not sure if it’s… it’s like a noodle with…

72 00:04:01.200 00:04:02.900 Samuel Roberts: Yeah, I know pansit, yeah.

73 00:04:03.420 00:04:06.439 Casie Aviles: Yeah, it’s like… it’s like noodles, but…

74 00:04:07.320 00:04:10.500 Casie Aviles: You, you add, like, different…

75 00:04:11.300 00:04:16.180 Casie Aviles: what do you call these ingredients, like vegetables and some meat. So, it’s something that…

76 00:04:16.730 00:04:20.230 Casie Aviles: You know, we usually eat when there’s…

77 00:04:20.440 00:04:26.380 Casie Aviles: on the weekends, or when there’s, like, a party, but it’s not really a celebration. We just.

78 00:04:26.380 00:04:26.730 Samuel Roberts: Hmm.

79 00:04:26.730 00:04:29.229 Casie Aviles: I just felt like cooking that, so…

80 00:04:29.230 00:04:29.820 Samuel Roberts: Nice.

81 00:04:30.990 00:04:32.389 Samuel Roberts: Sometimes that’s all you need.

82 00:04:33.130 00:04:34.270 Samuel Roberts: You weren’t familiar.

83 00:04:34.810 00:04:35.930 Samuel Roberts: Making it.

84 00:04:36.540 00:04:37.250 Samuel Roberts: Cool.

85 00:04:41.680 00:04:46.540 Samuel Roberts: Yeah, well, I didn’t really have anything specific today, I saw…

86 00:04:46.540 00:04:48.149 Pranav: I think Mustafa.

87 00:04:48.150 00:04:48.559 Samuel Roberts: Oh, go ahead.

88 00:04:48.560 00:04:52.129 Pranav: After the meeting on Friday, right? Or did I? I’m forgetting.

89 00:04:52.940 00:04:54.650 Samuel Roberts: I’m sorry, repeat that, I didn’t catch the beginning.

90 00:04:55.080 00:05:00.310 Pranav: I don’t know if I was able to sync with you guys after my meeting with Janiece and Yvette on Friday.

91 00:05:01.800 00:05:03.569 Samuel Roberts: I know, I don’t know.

92 00:05:04.130 00:05:07.220 Pranav: Okay, yeah, I can kind of give just, like, a high-level…

93 00:05:07.580 00:05:14.349 Pranav: Like, how that went. Went really well. They’re excited about the next two products that we’re gonna start working on this week.

94 00:05:14.350 00:05:15.400 Samuel Roberts: So…

95 00:05:16.120 00:05:18.009 Pranav: And then also, like…

96 00:05:18.730 00:05:28.920 Pranav: In the meantime, like, they are mentioning, like, you know, that they’re just noticing small things here and there, nothing seems big, everything seems to be pretty good.

97 00:05:29.850 00:05:39.659 Pranav: I think what we’re all noticing, though, is, like, there is a lot more of, like, these notifications we’re getting in that ABC Logs channel of slow execution time.

98 00:05:39.910 00:05:45.399 Pranav: At least I feel like I’m noticing it a lot more. Would you guys say the same thing?

99 00:05:46.570 00:05:55.250 Casie Aviles: Yeah, it’s because the threshold is just 5 seconds, but I think for, at least for the start of today, it’s around…

100 00:05:55.910 00:05:59.969 Casie Aviles: I’ve… Yeah, it’s around 5.

101 00:06:00.880 00:06:07.260 Casie Aviles: And then… Oh, is the threat Sorry?

102 00:06:07.800 00:06:10.669 Pranav: Was the threshold different before? I thought it was… it just stayed the same.

103 00:06:11.140 00:06:13.279 Casie Aviles: Yeah, it’s the same, it’s still 5.

104 00:06:14.980 00:06:19.450 Casie Aviles: But it’s just, like, a couple milliseconds above 5.

105 00:06:19.620 00:06:20.840 Casie Aviles: Or…

106 00:06:20.840 00:06:21.780 Samuel Roberts: Oh, yeah.

107 00:06:21.780 00:06:23.450 Casie Aviles: Slower than it will be.

108 00:06:23.650 00:06:28.969 Casie Aviles: 6. I think the longest I’ve seen it today is 8 seconds.

109 00:06:30.220 00:06:35.400 Casie Aviles: But that’s… That’s one record, or, like, one log for now.

110 00:06:36.000 00:06:38.749 Casie Aviles: Oh, wait, there’s 9… That’s true.

111 00:06:39.910 00:06:44.049 Pranav: Yeah, there’s some other… there’s one other one, 9 today.
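
Illustrative sketch (not part of the recording): the slow-execution alerts discussed above fire at a 5-second threshold, and many of them are only milliseconds over it. Assuming run durations are available as plain numbers of seconds, a small helper like the one below could separate the borderline cases from the genuinely slow ones. The function names and the 0.5-second "barely over" margin are hypothetical, not taken from the team's code.

```python
SLOW_THRESHOLD_SECONDS = 5.0  # alert threshold mentioned in the meeting


def slow_runs(durations, threshold=SLOW_THRESHOLD_SECONDS):
    """Return (duration, overage) pairs for runs above the threshold, slowest first."""
    flagged = [(d, round(d - threshold, 3)) for d in durations if d > threshold]
    return sorted(flagged, reverse=True)


def summarize(durations, threshold=SLOW_THRESHOLD_SECONDS, barely_over=0.5):
    """Count slow runs and how many of them are only marginally over the threshold."""
    flagged = slow_runs(durations, threshold)
    return {
        "total": len(durations),
        "slow": len(flagged),
        "worst": flagged[0][0] if flagged else None,
        # hypothetical margin: runs less than `barely_over` seconds above the threshold
        "barely_over": sum(1 for _, over in flagged if over < barely_over),
    }
```

Reporting the "barely over" count next to the worst case would make it easier to tell whether a burst of alerts reflects a real slowdown or just runs hovering around 5 seconds.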

112 00:06:44.370 00:06:45.919 Pranav: But, yeah, I think…

113 00:06:46.140 00:06:51.449 Pranav: I think it could also just be, like, increased usage. So what we saw last… when I was looking at,

114 00:06:51.870 00:06:55.939 Pranav: when I removed the QAers, right, so all the trainers and Janiece.

115 00:06:56.620 00:07:00.250 Pranav: There’s actually increased usage just amongst ANDI itself.

116 00:07:00.690 00:07:05.540 Pranav: Like the production ANDI. So we saw a huge spike in just…

117 00:07:05.940 00:07:09.940 Pranav: The production ANDI usage, as well as, you know, the…

118 00:07:10.250 00:07:14.320 Pranav: from 0 to 100 of the QA usage, so…

119 00:07:14.440 00:07:32.420 Pranav: maybe that’s another thing. Like, the… I think production usage is just gonna continue to increase, if not just at least stay the same to, like, where it was last week. So, I think what we were seeing before was, like, 500-ish, between 500 to 600. Last week, we saw 700.

120 00:07:33.230 00:07:40.459 Pranav: For the… for production ANDI usage, so… I don’t really think that should change in terms of, like, the execution time.

121 00:07:42.020 00:07:42.750 Pranav: But…

122 00:07:42.750 00:07:49.410 Samuel Roberts: No, ideally not, but… I mean, yeah, they’re not the ones triggering the Gemini slowdown anyway, you know what I mean? Like…

123 00:07:50.630 00:07:54.650 Samuel Roberts: it’s not like their volume is gonna affect that overall, I think, anymore.

124 00:07:54.650 00:07:55.260 Pranav: Yeah.

125 00:07:55.690 00:08:01.219 Samuel Roberts: I thought that might have been an issue, but I think it’s a, like, broader Google thing.

126 00:08:03.040 00:08:05.920 Pranav: Yeah, that’s really annoying, like, so… like…

127 00:08:06.130 00:08:09.599 Pranav: We kind of built this in a way where we were hoping to, like.

128 00:08:09.820 00:08:13.209 Pranav: have, like, more levers, right? To, like… Yeah.

129 00:08:13.470 00:08:20.790 Pranav: Decrease execution time, so, like, what is, like, the game plan, I guess? Because, like…

130 00:08:22.580 00:08:28.229 Pranav: it’s not as easy as just swapping in a different key, because we’re on GCP, we need to be using, like, the…

131 00:08:28.230 00:08:31.389 Samuel Roberts: That’s the other, yeah, the other part of it was their stuff, so…

132 00:08:31.390 00:08:32.230 Pranav: Yeah.

133 00:08:32.720 00:08:35.390 Pranav: Or do we ask them to provision us

134 00:08:35.580 00:08:37.800 Pranav: a separate API key, and is that something.

135 00:08:39.510 00:08:46.890 Samuel Roberts: Yeah, I… we… I mean, Tim was asking about Vertex, which is, like, the other part of the Google AI ecosystem.

136 00:08:47.010 00:08:51.989 Samuel Roberts: that I’m not clear if that will fix this problem or not.

137 00:08:52.480 00:08:57.720 Samuel Roberts: It… it doesn’t seem like it’s the same thing as, like, provisioning our own…

138 00:08:58.710 00:09:01.410 Samuel Roberts: like, Azure instances of stuff.

139 00:09:03.020 00:09:18.549 Samuel Roberts: it seems like it’s just, like, a way… another way to, access the APIs. And so, from what I’ve been reading, like, other people complaining online, it seems like it has just as bad rate limits, if not… or not bad rate limits, but bad overall issues, where people are hitting rate limits early.

140 00:09:19.410 00:09:20.880 Samuel Roberts: Earlier than they expect.

141 00:09:21.190 00:09:29.329 Samuel Roberts: So I don’t know if that’s the solution. The other thought was, in case, I don’t know what happened with that Friday, the, like, testing other models for fallbacks, potentially?

142 00:09:30.780 00:09:40.650 Samuel Roberts: Because there were some that were timing out completely, and then there were some that were just slow. And so I think, at least knowing how the other models behave, we can maybe start rerouting things some other ways.
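
Illustrative sketch (not part of the recording): one way the "rerouting" idea could work is to track recent latencies per model and fall back when the preferred model's recent median exceeds the threshold. This is a minimal sketch under assumptions: the class name, the window size, and the model names in the usage note are hypothetical, and the team's actual routing code is not shown in the meeting.

```python
from collections import deque
from statistics import median


class LatencyRouter:
    """Pick the first model (in preference order) whose recent median latency is acceptable."""

    def __init__(self, models, threshold_s=5.0, window=5):
        self.models = list(models)
        self.threshold_s = threshold_s
        # keep only the most recent `window` samples per model
        self.history = {m: deque(maxlen=window) for m in self.models}

    def record(self, model, seconds):
        self.history[model].append(seconds)

    def _is_healthy(self, model):
        samples = self.history[model]
        # a model with no samples yet is treated as healthy
        return not samples or median(samples) <= self.threshold_s

    def pick(self):
        for model in self.models:
            if self._is_healthy(model):
                return model
        # every model looks slow: stay on the preferred one
        return self.models[0]
```

For example, with `LatencyRouter(["primary", "fallback-a"])`, recording 8, 9, and 12 seconds for `"primary"` makes `pick()` return `"fallback-a"` until faster samples push the primary's median back under 5 seconds.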

143 00:09:42.150 00:09:46.829 Casie Aviles: Yeah, I’ve yet to run the script, but… Okay. I think I should be able to do it.

144 00:09:47.470 00:09:48.340 Casie Aviles: Okay.

145 00:09:48.340 00:09:58.650 Samuel Roberts: Yeah, the other thing I was playing around with this morning is the… the observability that Mustafa added to the online Mastra Studio.

146 00:09:59.160 00:10:11.179 Samuel Roberts: Which we, I guess, had previously, but then when we migrated, it kind of got turned off, but now it’s back on. So, if you go to that Mastra Studio link, you can see more of the detail of the runs.

147 00:10:11.560 00:10:12.480 Samuel Roberts: Mmm.

148 00:10:12.680 00:10:14.740 Samuel Roberts: And where things are slowing down.

149 00:10:15.750 00:10:16.440 Mustafa Raja: Yeah, and…

150 00:10:16.440 00:10:17.630 Samuel Roberts: Through the whole waterfall.

151 00:10:18.130 00:10:22.920 Mustafa Raja: Yeah, and now we can actually see the tool calls also. Exactly.

152 00:10:22.920 00:10:23.590 Samuel Roberts: Exactly.

153 00:10:23.590 00:10:26.759 Mustafa Raja: We’re not able to see, so this… these eyes are not.

154 00:10:27.230 00:10:39.249 Samuel Roberts: Yeah, so now we can actually dive in and see, like, okay, this is the part that’s slowing down, can we optimize? Like, I’m seeing here, it took, like, 8 seconds for the first… on just one random one I’m looking at, the very last one, I guess.

155 00:10:39.630 00:10:43.330 Samuel Roberts: you know, it… I’m still clicking through a lot of these just to see how…

156 00:10:43.610 00:10:49.390 Samuel Roberts: consistent these things are, but it looks like it took 8 seconds. I can share my screen real quick, because I want to…

157 00:10:50.090 00:10:52.649 Samuel Roberts: just demo this, I guess, really briefly, but…

158 00:10:54.670 00:10:57.179 Samuel Roberts: Like, here’s the run from… oops.

159 00:10:57.280 00:10:59.200 Samuel Roberts: Sorry, I moved the wrong thing, there we go.

160 00:10:59.970 00:11:05.750 Samuel Roberts: So, this is the CSR running workflow, which is, like, the main one. You can see it took 40 seconds.

161 00:11:05.900 00:11:11.059 Samuel Roberts: Overall, I’m not… Totally clear on…

162 00:11:11.610 00:11:18.180 Samuel Roberts: what times we’re logging to the Slack bot? That, like, is it sums of these, you think?

163 00:11:19.200 00:11:19.770 Mustafa Raja: Yes.

164 00:11:19.770 00:11:22.619 Samuel Roberts: So then… okay, because then there’s, like, this one down here.

165 00:11:22.840 00:11:26.819 Samuel Roberts: I don’t know if this is… like, where is it returning in this strip? Is it…

166 00:11:29.290 00:11:32.970 Samuel Roberts: Like, there’s an output processor, I… Sorry?

167 00:11:33.300 00:11:36.820 Mustafa Raja: I think if you’d scroll up, Under June?

168 00:11:38.160 00:11:43.210 Mustafa Raja: So, you’re trying to see which step returns to which one, right?

169 00:11:43.440 00:11:52.009 Samuel Roberts: Well, so my thought was, when I saw, like, what is this chunk text step zero? I also think maybe we need to go into the code and name some of the steps, because I think these steps…

170 00:11:52.280 00:11:54.180 Samuel Roberts: Can be named, potentially?

171 00:11:54.310 00:11:56.940 Samuel Roberts: And that might help us identify them better.

172 00:11:58.830 00:12:05.049 Samuel Roberts: But, like, this one, “explain the relevancy score where 0 is the lowest, 1 the highest,” LLM. Is this post the…

173 00:12:05.690 00:12:07.340 Samuel Roberts: Return to the user, though?

174 00:12:07.340 00:12:10.229 Mustafa Raja: This, I think, I think this is just a scorer.

175 00:12:11.100 00:12:14.169 Samuel Roberts: Right, but that’s what I’m saying, that’s not holding up the return, is my point.

176 00:12:14.760 00:12:21.460 Samuel Roberts: It’s not. Okay, that’s what I thought, I just… I was like, that seems like it… we would notice if it was taking an extra 10 seconds every time.

177 00:12:22.880 00:12:29.450 Samuel Roberts: Okay, that’s… I… that’s what I thought, but I… it was hard for me to tell, because, like, step 0, step 0, step zero, I wasn’t sure if these were all…

178 00:12:30.070 00:12:34.890 Samuel Roberts: Things we could name. But even up here, like, This took 8…

179 00:12:35.960 00:12:38.909 Samuel Roberts: seconds to do the actual LLM call, it looks like.

180 00:12:39.730 00:12:43.129 Samuel Roberts: That still seems like the biggest chunk of things.

181 00:12:45.570 00:12:46.370 Samuel Roberts: Right.

182 00:12:46.920 00:12:56.790 Samuel Roberts: So, I don’t know what the best solution here… if they’re committed to running on GCP, I feel like we’re in a little bit of a bind trying to figure that out with Google’s infra.

183 00:12:59.180 00:13:03.129 Samuel Roberts: But I also need to do a little more digging into Vertex, because maybe there’s something there we’re missing.

184 00:13:06.030 00:13:15.619 Samuel Roberts: Yeah, besides that, I don’t know where else we can, like, optimize presently. You know, I thought… my thinking was that as we moved over to this, and we got to the…

185 00:13:15.960 00:13:27.600 Samuel Roberts: like, new Mastra flow, and the… running on their servers and stuff, that we’d be able to start shaving, you know, milliseconds in places, rather than, like, 8 seconds, do you know what I mean?

186 00:13:28.010 00:13:28.610 Pranav: Right.

187 00:13:29.090 00:13:32.979 Samuel Roberts: So I… we’re just playing a different ballgame all of a sudden, because Gemini is just…

188 00:13:33.420 00:13:35.290 Samuel Roberts: not keeping up, so I don’t know.

189 00:13:38.170 00:13:43.609 Samuel Roberts: And it did seem like it was a certain time of day, as well, that we saw, so I’m just not sure where to, like…

190 00:13:43.950 00:13:44.420 Pranav: Yeah.

191 00:13:44.420 00:13:45.120 Samuel Roberts: that, either.

192 00:13:45.120 00:13:46.650 Pranav: That, actually, like…

193 00:13:46.780 00:13:54.889 Pranav: Casie, I remember, like, with that report, like, usually we were doing those reports, like, end of day, is what I remember. That probably…

194 00:13:55.240 00:14:01.009 Pranav: like, agrees with, kind of, the downtime variation that you showed.

195 00:14:01.970 00:14:05.500 Pranav: Or I guess, like, the… the execution…

196 00:14:06.080 00:14:13.230 Pranav: like, time increasing and decreasing, like, over the course of the day. So… That makes sense.

197 00:14:13.650 00:14:21.000 Pranav: if we could… I don’t think it makes sense, like, we don’t need to rerun that report, we’re already seeing the execution time varying, like, throughout the day. Yeah, I think…

198 00:14:21.340 00:14:26.740 Pranav: right now, right? Like, we just saw, like, another couple logs come in, one said 10, or…

199 00:14:27.300 00:14:28.570 Pranav: 12 seconds on here, too.

200 00:14:28.840 00:14:29.830 Casie Aviles: Yeah, 13.

201 00:14:30.350 00:14:33.430 Pranav: 13, 12… yeah.

202 00:14:34.560 00:14:40.030 Pranav: So… What I would want to know is, like, before we…

203 00:14:40.380 00:14:50.809 Pranav: Because I could see them, if they… like, if they’re noticing 12-second times, like, it’s only a matter of time that they come back to us, and I kind of want to preemptively go to them and be like, hey, we’re noticing these things.

204 00:14:51.340 00:14:59.420 Pranav: Yeah, definitely. But what I would like to come forward with is, like, okay, if we’re to substitute this Gemini LLM

205 00:14:59.890 00:15:04.980 Pranav: with, you know, Claude, or whichever other model that

206 00:15:05.400 00:15:11.509 Pranav: we decide for them. We should show them, like, okay, this is gonna solve the problem, right? Definitely.

207 00:15:13.500 00:15:20.700 Pranav: what, like… Sam, like, I’m thinking, like, we just run another report, I guess, on, like, Or…

208 00:15:21.610 00:15:26.820 Pranav: I don’t know, like, in parallel, but then we have to, like, use our own API key, I guess, to show that.

209 00:15:26.930 00:15:38.009 Pranav: Or do you just feel very confident that, like, this would be, like, solving the issue? And then what we can do is, like, run a spike, like, over the course of a couple days, and, like, let them… and, like.

210 00:15:38.300 00:15:42.420 Pranav: They can let us know if, like, they’re noticing lower execution time.

211 00:15:44.950 00:15:50.269 Samuel Roberts: I’m sorry, repeat, so you’re saying, come to them with something, or change something initially? I missed that.

212 00:15:51.510 00:15:56.760 Pranav: I just want to come to them with, like, a game plan that we feel Like, so…

213 00:15:56.760 00:15:57.210 Samuel Roberts: Okay, yeah.

214 00:15:57.210 00:16:03.020 Pranav: it sounds like we feel really confident that if we were to swap the, you know, the Gemini…

215 00:16:03.650 00:16:12.540 Pranav: provider, or whichever… what are we using? 3.1 Flash. If we swap 3.1 Flash for something else, right?

216 00:16:12.670 00:16:25.200 Pranav: Do we feel pretty confident? Or, like, yeah, do we feel like we’ve ex… we know for sure that this is going to be the thing that severely brings down the execution time? Like, we won’t have these, like…

217 00:16:25.890 00:16:30.139 Pranav: over 6-second, queries anymore? Or at least as.

218 00:16:30.140 00:16:35.060 Samuel Roberts: Yeah, I think… I think we need to test some other models, basically. That’s what,

219 00:16:35.590 00:16:48.440 Samuel Roberts: Casie and I had talked about just, like, a few of the older models, just making sure, because it… I don’t know, like, I don’t think this Gemini 3.1 Flash preview should be this slow, normally, and we’re seeing it’s not always. So I think…

220 00:16:48.530 00:16:59.520 Samuel Roberts: the thought was, if there’s other models that we can fall back to, maybe in certain times of day, maybe if we see a couple runs do something, it adjusts. The other thing is to potentially figure out,

221 00:17:00.010 00:17:05.339 Samuel Roberts: if Vertex gives us access to, like, a dedicated instance, or not.

222 00:17:05.520 00:17:15.250 Samuel Roberts: And then the other thing is, does it… how badly does it need to be their GCP? Are they able to, say, provision or give us a key that works with…

223 00:17:15.910 00:17:28.289 Samuel Roberts: you know, like you said, Claude, or OpenAI, or whatever, like, OpenRouter even, maybe we could test a few other things. I just don’t know how that’s gonna go for them, so I feel like that’s a… I’d rather keep it on their infra as much as possible, but…

224 00:17:29.990 00:17:33.810 Samuel Roberts: If we need to, maybe we move off that, but that seems like a…

225 00:17:34.020 00:17:38.479 Samuel Roberts: We can talk about that, we can find out from Tim what that might entail, maybe?

226 00:17:42.020 00:17:46.560 Samuel Roberts: But I think the bigger thing is… is potentially figuring out, like, if Vertex can give it to us.

227 00:17:47.110 00:17:50.889 Samuel Roberts: And we just… Need to switch that over at some point.

228 00:17:51.330 00:17:53.000 Samuel Roberts: I’m not convinced of that yet.

229 00:17:54.410 00:17:59.930 Samuel Roberts: But that… those are kind of the options. Like, if Vertex can help us run a dedicated thing, the kind of way Azure did.

230 00:18:00.790 00:18:01.900 Samuel Roberts: if…

231 00:18:02.670 00:18:09.100 Samuel Roberts: Any of the other models that we can fall back to do perform better, maybe, because they’re not being hit as hard.

232 00:18:09.420 00:18:10.240 Samuel Roberts: Or…

233 00:18:10.240 00:18:18.229 Pranav: But I guess one thing is, like, setting up Vertex, right? Like, there’s gonna be… is there gonna be a time associated with that, or is it just as easy as provisioning a key?

234 00:18:19.920 00:18:22.589 Samuel Roberts: I have to do the re… we have to spike on the Vertex stuff, I think.

235 00:18:23.060 00:18:30.460 Pranav: Yeah, it seems like it’s not gonna be as simple as just, like, going to, like, platform.openai and then just create new API key.

236 00:18:31.820 00:18:41.500 Samuel Roberts: Right, that’s what I’m saying. So, like, that’s why we went this way instead of Vertex, was because Vertex is, like, more complicated and not, like… we didn’t think it was necessary for what we were doing. I’m now not sure… because I thought…

237 00:18:41.710 00:18:45.789 Samuel Roberts: I didn’t think we’d run into this problem using just the Gemini API.

238 00:18:46.270 00:18:48.539 Pranav: Yeah, so I guess what I’m, like…

239 00:18:49.150 00:18:55.549 Pranav: I don’t want to come to the table with, like, hey, like, we can implement this other thing that is going.

240 00:18:55.550 00:19:07.889 Samuel Roberts: Yeah, I’m saying, I’m telling you the options that I see for the path forward overall. How we present it might be different. I think we need to spike on Vertex, we need to maybe find out how complex it would be to switch if that’s the path we need to go down.

241 00:19:08.730 00:19:14.790 Pranav: I think… But also, instead of spiking on Vertex first, it’s just like, we…

242 00:19:15.630 00:19:25.919 Pranav: Well, first, Casie’s gonna run this script to see, okay, with these backup models, or these other models, are we seeing the same execution time issues?

243 00:19:25.920 00:19:26.610 Samuel Roberts: Correct.

244 00:19:27.280 00:19:31.059 Pranav: And are these models all gonna be Gemini models?

245 00:19:31.790 00:19:37.509 Samuel Roberts: Well, that’s what we had right now, because that’s what I was thinking, like, if we’re on GCP, that’s what we can access, but…

246 00:19:37.600 00:19:52.699 Samuel Roberts: It’s probably worth, like, testing against our own Azure, or testing, like, just using our own API key for a quick test to see, like, okay, is Gemini really being a terrible bottleneck overall? Or Google in general, you know, not just any specific Gemini. And that would then…

247 00:19:53.010 00:19:59.989 Samuel Roberts: guide us a little bit towards maybe saying, okay, these models just can’t handle what we’re doing, or their infra can’t handle what we’re doing, at certain.

248 00:19:59.990 00:20:00.320 Pranav: Yeah.

249 00:20:00.320 00:20:08.600 Samuel Roberts: So I think, yeah, Casie, we may want to, in addition to testing those ones… test against our,

250 00:20:09.550 00:20:15.290 Samuel Roberts: Azure, fast ones. I forget what we still have standing on there. I think 4o’s going away, right?

251 00:20:15.290 00:20:15.790 Casie Aviles: Oh.

252 00:20:15.790 00:20:19.409 Samuel Roberts: We can look at the list and figure out what would be good to test against for…

253 00:20:20.250 00:20:21.540 Samuel Roberts: For speed there.

254 00:20:22.600 00:20:23.620 Samuel Roberts: Okay.

255 00:20:25.400 00:20:28.399 Samuel Roberts: The other element… oh, sorry, go ahead.

256 00:20:29.300 00:20:34.690 Casie Aviles: Yeah, I was just… I’m just thinking about whether, like, the script itself

257 00:20:35.200 00:20:40.580 Casie Aviles: It’s different from, like, when it’s live, that we’re testing, because

258 00:20:41.070 00:20:43.120 Casie Aviles: When we were running the scripts.

259 00:20:43.480 00:20:49.099 Casie Aviles: It was showing pretty good time, or execution times.

260 00:20:49.860 00:20:50.270 Casie Aviles: But when.

261 00:20:50.270 00:20:51.339 Samuel Roberts: That is a good point.

262 00:20:51.490 00:20:56.780 Casie Aviles: when we started using it live, like, It’s starting to show…

263 00:20:57.490 00:21:01.769 Casie Aviles: Execution times that are not as fast as the ones we got.

264 00:21:02.810 00:21:04.340 Samuel Roberts: Yeah, I’m also…

265 00:21:04.440 00:21:07.899 Samuel Roberts: I mean, we talked a little bit about the time of day we ran some of those tests.

266 00:21:09.260 00:21:13.550 Samuel Roberts: as a factor, so I’m wondering if that’s something, like, we should run this…

267 00:21:13.780 00:21:21.040 Samuel Roberts: you know, two times a day, and see how it compares to, like, an evening, like, US time, and a… in the middle of that

268 00:21:21.500 00:21:25.640 Samuel Roberts: That, like, spike of latency we saw from, like, 10 to 1.

269 00:21:26.030 00:21:26.700 Casie Aviles: Okay.

270 00:21:27.860 00:21:32.040 Samuel Roberts: I think, at least for…

271 00:21:32.330 00:21:34.950 Samuel Roberts: the Gemini models, that might help us understand.

272 00:21:35.540 00:21:46.690 Samuel Roberts: So that even might be a thing to just test on its own, just, like, run a script of a bunch of questions against its… as it currently stands, and see how our latency on that script compares to what we’re seeing live.

273 00:21:47.260 00:21:52.190 Samuel Roberts: coming from the CSRs, because if those don’t match, then there’s something else going on here.

274 00:21:56.090 00:21:57.330 Casie Aviles: Okay. Yeah.

275 00:21:57.650 00:21:59.440 Samuel Roberts: So, yeah, I think…

276 00:22:00.440 00:22:09.529 Samuel Roberts: Order of operations on that then becomes, yeah, probably run a script that tests against the current app as is, with,

277 00:22:11.570 00:22:19.040 Samuel Roberts: With some basic questions at the right time, and then probably at the evening as well, when we ran it last time, and see, are those… do those diverge, kind of, as well?

278 00:22:19.780 00:22:23.839 Samuel Roberts: At the same time run… Some of the other models.

279 00:22:24.460 00:22:29.810 Samuel Roberts: And, see how those compare at the same times.

280 00:22:29.960 00:22:31.000 Samuel Roberts: And then…

281 00:22:31.580 00:22:36.150 Samuel Roberts: I don’t know if the time of day matters as much for testing something on our Azure, but…

282 00:22:36.740 00:22:48.990 Samuel Roberts: just see, like, are we able to reduce that time drastically? Because I… I just feel like Azure wasn’t quite this bad, but maybe there were other things, and we… it’s just… this is surfacing, just, like, the model being the bottleneck now, because we got it out of n8n. I don’t know.

283 00:22:49.360 00:22:55.629 Samuel Roberts: I think… Those tests, so the current one, as is.

284 00:22:56.470 00:23:00.419 Samuel Roberts: Twice a day, or two times today, maybe, via…

285 00:23:00.600 00:23:04.319 Samuel Roberts: Backup models as well, 2 times a day, and then testing against…

286 00:23:04.570 00:23:09.639 Samuel Roberts: like, something fast on our Azure, or something we know is not gonna have… a spike in latency.

287 00:23:10.670 00:23:13.269 Samuel Roberts: would be the… the main things to do.

288 00:23:14.120 00:23:15.580 Samuel Roberts: Does that sound like a plan?

289 00:23:18.410 00:23:20.880 Casie Aviles: Cool. Yeah, that makes sense for me.
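
The test plan above (run a fixed set of questions against the current app, record latency, and compare it with what the CSRs see live) could be sketched roughly as follows. This is only an illustration under stated assumptions: `ask` stands in for whatever function actually calls the app endpoint, and the 25% divergence tolerance, the nearest-rank p95, and all function names are hypothetical choices, not anything decided in the meeting.

```python
import math
import statistics
import time
from typing import Callable, Dict, List


def time_calls(ask: Callable[[str], str], questions: List[str]) -> List[float]:
    """Send each question through `ask` and return wall-clock seconds per call."""
    latencies = []
    for question in questions:
        start = time.perf_counter()
        ask(question)  # hypothetical: would hit the app endpoint or a backup model
        latencies.append(time.perf_counter() - start)
    return latencies


def summarize(latencies: List[float]) -> Dict[str, float]:
    """Summarize one run with count, median, nearest-rank p95, and max."""
    ordered = sorted(latencies)
    p95_index = max(0, math.ceil(0.95 * len(ordered)) - 1)
    return {
        "n": len(ordered),
        "median_s": statistics.median(ordered),
        "p95_s": ordered[p95_index],
        "max_s": ordered[-1],
    }


def diverges(scripted_median_s: float, live_median_s: float,
             tolerance: float = 0.25) -> bool:
    """True if the scripted median differs from the live median by more than
    `tolerance` (an assumed threshold) as a fraction of the live median."""
    return abs(scripted_median_s - live_median_s) > tolerance * live_median_s


if __name__ == "__main__":
    # Stand-in for the real call so the sketch runs without network access.
    def fake_ask(question: str) -> str:
        time.sleep(0.01)
        return "ok"

    run = summarize(time_calls(fake_ask, ["question one", "question two"]))
    print(run)
```

Running the same script at the two times of day, and once per candidate model, would give one summary per run that can be compared side by side.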

290 00:23:21.250 00:23:26.629 Samuel Roberts: Okay, I think at the same time, we can… spike on Vertex.

291 00:23:26.740 00:23:35.259 Samuel Roberts: just to figure out, one, what it might take for us to switch over. We can check with Tim about… once we determine if that’s even worth it.

292 00:23:35.630 00:23:39.550 Samuel Roberts: Check with Tim about what that might entail.

293 00:23:41.520 00:23:44.530 Samuel Roberts: But I’m not really convinced that’s worth it yet.

294 00:23:46.390 00:23:57.950 Samuel Roberts: But the spike will help with that. And then, I mean, the other thing we can start fiddling with is, like, how we’re making these calls, how we’re doing the routing, like, there’s some stuff there, but I feel like if we have to make a model call.

295 00:23:58.430 00:23:59.650 Samuel Roberts: Like, that’s the…

296 00:24:00.390 00:24:06.809 Samuel Roberts: It’s such a big difference compared to, like, the little bits we might gain from, like, prompt tweaks or something, I think.

297 00:24:07.000 00:24:11.170 Samuel Roberts: Unless we can really just, like, simplify.

298 00:24:11.170 00:24:14.769 Pranav: Like… I don’t even want to…

299 00:24:14.980 00:24:27.339 Pranav: suggest, like, I don’t… like, I want them to kind of force us down going Vertex, if that’s what they want, you know? Like, if they’re not going to provision us an API key.

300 00:24:27.340 00:24:28.020 Samuel Roberts: Yeah.

301 00:24:28.190 00:24:31.170 Pranav: Or, like, OpenAI, or Claude, or…

302 00:24:31.290 00:24:46.099 Pranav: if that’s the case, then we can, like, okay, they’re telling us we need to go and use Vertex AI, then okay. Then we’ll do that, but I don’t really want to spend time right now on, like, looking into that.

303 00:24:46.100 00:24:46.570 Samuel Roberts: Okay.

304 00:24:46.590 00:24:50.220 Pranav: Just because I know it’s going to be, like, way more complicated.

305 00:24:50.560 00:24:51.630 Pranav: than…

306 00:24:51.790 00:24:56.749 Pranav: just getting a new API key for a different provider, and just pasting it into the app right now.

307 00:24:56.990 00:24:59.379 Pranav: And so, like, I think with.

308 00:24:59.380 00:25:00.580 Samuel Roberts: Yeah.

309 00:25:00.990 00:25:06.840 Pranav: with Casie’s research right now, like, we’ll be able to… just,

310 00:25:07.240 00:25:15.460 Pranav: see, okay, with the models that are just straight up, just plug into the app, we’ll be able to see, is there any difference?

311 00:25:15.680 00:25:16.820 Pranav: Now…

312 00:25:17.100 00:25:28.440 Pranav: if we’re running into the same issues, then we can… we can let them know, like… and then I think Casie will also use Azure, right, to… to also, against the ones that we have in GCP right now.

313 00:25:28.630 00:25:30.070 Pranav: use other models.

314 00:25:30.740 00:25:42.030 Pranav: That are, I guess, part of our account, so we’ll get billed, but just for today’s script, it should be okay. We’ll have data to be like, hey, all we need is you guys to provision

315 00:25:42.260 00:25:49.399 Pranav: like, or for you guys to give us an API key for this specific model right here, and then all the issues will be solved.

316 00:25:49.730 00:25:57.240 Pranav: Now, if they’re like, okay, well, we can’t do that because we need you guys to use Vertex AI, then we can be like, okay, then let us look into that.

317 00:25:57.740 00:26:02.060 Pranav: But… I want to come to the table with them, because, like, that, I would say, is just…

318 00:26:02.210 00:26:06.379 Pranav: an annoying hoop for us to jump through, right? If we have to set up Vertex.

319 00:26:08.740 00:26:13.800 Samuel Roberts: Yeah, yeah, I think… I’m just not convinced if that will even solve this or not, based on what I’m seeing people talk about the other day.

320 00:26:13.800 00:26:17.849 Pranav: Yeah, that’s another thing. So, we don’t even know if that’s gonna solve it, so…

321 00:26:17.850 00:26:18.709 Samuel Roberts: Yeah, that’s what…

322 00:26:18.710 00:26:23.270 Pranav: Do you feel the most confident that, like, if we just threw an OpenAI key in there, that it’s gonna solve the issue?

323 00:26:23.910 00:26:26.550 Samuel Roberts: That’s what, yeah, that’s, I think, what the test will hopefully show.

324 00:26:26.930 00:26:33.520 Pranav: Yeah, okay, cool. Alright, so, yeah, before the spike, let’s just… Casie, let us know when you get that,

325 00:26:33.640 00:26:41.309 Pranav: that script run, and then you get a report, and then maybe we can hop into, like… or you can just update us on Slack, and we can hop into a call if needed.

326 00:26:41.880 00:26:42.870 Casie Aviles: That works. Huh.

327 00:26:43.280 00:26:44.060 Pranav: Cool.

328 00:26:46.630 00:26:47.569 Pranav: Alright, guys.

329 00:26:47.730 00:26:55.650 Pranav: There’s a few other things flat, but, I mean, it’s, like, in Linear, but…

330 00:26:56.320 00:27:01.480 Pranav: Yeah, I think, Casie, we ended the week saying, like, I think you were gonna…

331 00:27:01.690 00:27:04.120 Pranav: Work on today as well, just like…

332 00:27:04.340 00:27:19.970 Pranav: the real dashboard for the weekly usage report, they really liked those insights that, that you, in that Google Sheet that you made, Casie. So, they’re really excited about that real dashboard. So, whenever we have that up and running, I’ll let them know.

333 00:27:20.480 00:27:27.299 Pranav: If we can get that up today or tomorrow, that’d be great. I see a ticket here for the script you’re gonna make.

334 00:27:27.400 00:27:40.080 Pranav: Yeah, that tester filter as well, so they can differentiate between what the QA test usage was versus just production. That’s gonna be helpful.

335 00:27:41.210 00:27:47.579 Pranav: Let’s see… did we, did we spin down the QA Andy?

336 00:27:48.580 00:27:50.179 Casie Aviles: Yeah, I, I, it’s not…

337 00:27:50.590 00:27:56.580 Casie Aviles: I did… I’ve disabled it. I’m not sure if people are… I don’t think people are using it anymore.

338 00:27:57.170 00:27:59.819 Pranav: Okay, perfect. Yeah, I just don’t want people using that.

339 00:28:00.120 00:28:02.939 Samuel Roberts: Yeah, everything should be production now, basically.

340 00:28:02.940 00:28:05.070 Pranav: Perfect, perfect. Okay.

341 00:28:05.420 00:28:10.110 Pranav: And then, I think the main bug that they were noticing was just, like, with zip codes.

342 00:28:11.260 00:28:26.750 Pranav: For some reason, for, like, some queries, maybe it’s on their phrasing, but, like, something that we probably need to be, we need to be the ones fixing is some of these querying issues that we’re noticing, specifically for zip code. So, any progress on…

343 00:28:28.500 00:28:33.930 Casie Aviles: Yeah, sorry, sorry. Yeah, I was just working on it earlier, before the call, so I have some progress there.

344 00:28:34.730 00:28:35.340 Pranav: Cool.

345 00:28:36.420 00:28:37.230 Pranav: Okay.

346 00:28:37.550 00:28:40.549 Pranav: I think everything’s tracked, though, so that’s good.

347 00:28:41.530 00:28:43.019 Pranav: Yeah, I think that’s it for me.

348 00:28:46.950 00:28:56.500 Samuel Roberts: Okay, I see, I’m looking at the thing you posted on Friday at the end of the day. Yeah, it looks like it’s that ABC thing with rodents is definitely a problem. I’m wondering if there’s other things we can preemptively…

349 00:28:56.760 00:28:58.819 Samuel Roberts: Catch there, too, if there’s other…

350 00:29:00.020 00:29:05.570 Samuel Roberts: services that have a weird, like, way they might ask, that that would be something we can head off, but I’m not sure.

351 00:29:05.570 00:29:08.590 Casie Aviles: Yeah. The issue there was that…

352 00:29:09.270 00:29:12.860 Casie Aviles: we did not, like, standardize

353 00:29:13.240 00:29:16.990 Casie Aviles: a level column in the past.

354 00:29:17.190 00:29:22.779 Casie Aviles: So right now, what I did to fix that was to add, like, a level column.

355 00:29:23.190 00:29:24.730 Casie Aviles: So that would be…

356 00:29:25.120 00:29:31.290 Casie Aviles: what Andy will query, so you… it would look for, like, A, B, or C level, and then…

357 00:29:31.960 00:29:35.510 Samuel Roberts: Okay, and so it won’t be part of the name of the service, then, anymore.

358 00:29:35.510 00:29:39.890 Casie Aviles: Yeah, that was, like, what’s making it confusing, because… Oh, good.

359 00:29:40.440 00:29:44.839 Casie Aviles: There were so many ways that they would name the services, so…

360 00:29:45.020 00:29:45.620 Samuel Roberts: Yeah.

361 00:29:46.330 00:29:47.620 Samuel Roberts: Okay, good.

362 00:29:47.790 00:29:53.249 Samuel Roberts: Alright, so yeah, so by adding that, we’re already heading off the other sort of stuff I’m talking about here, so…

363 00:29:53.490 00:29:54.300 Samuel Roberts: Cool.
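
The fix described above (querying a dedicated level column rather than relying on the level appearing somewhere in the service name) could look something like the sketch below. The table name, column names, and rows are all hypothetical and exist only for illustration; the real schema was not discussed in the meeting.

```python
import sqlite3
from typing import List, Tuple


def build_demo_db() -> sqlite3.Connection:
    """Create an in-memory table with a separate, standardized `level` column."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE services (name TEXT NOT NULL, level TEXT NOT NULL)")
    conn.executemany(
        "INSERT INTO services (name, level) VALUES (?, ?)",
        [
            # Hypothetical rows: the level lives in its own column,
            # so the service name no longer has to encode it.
            ("Rodent Control", "A"),
            ("Rodent Control", "B"),
            ("Rodent Control", "C"),
            ("Termite Inspection", "A"),
        ],
    )
    return conn


def find_services(conn: sqlite3.Connection, name: str, level: str) -> List[Tuple[str, str]]:
    """Look up a service by exact name and by level (A, B, or C), case-insensitively
    on the level, using bound parameters rather than string formatting."""
    cursor = conn.execute(
        "SELECT name, level FROM services WHERE name = ? AND level = ? ORDER BY name",
        (name, level.strip().upper()),
    )
    return cursor.fetchall()


if __name__ == "__main__":
    conn = build_demo_db()
    print(find_services(conn, "Rodent Control", "b"))
```

Because the level is matched on its own column, variations in how a service is named do not affect which level is returned.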

364 00:29:57.270 00:29:58.360 Samuel Roberts: Anything else?

365 00:30:04.080 00:30:05.249 Pranav: I think I’m all set.

366 00:30:06.010 00:30:07.020 Samuel Roberts: Alright, cool.

367 00:30:07.460 00:30:14.280 Samuel Roberts: So yeah, Casie, if you need a hand with those scripts or anything, or just running them as, you know, taking a minute and you want me to run some or something, let me know.

368 00:30:14.720 00:30:18.330 Samuel Roberts: Otherwise, we’ll look out for your… Yeah, I’ll…

369 00:30:18.330 00:30:21.010 Casie Aviles: I’ll start with the test for…

370 00:30:21.590 00:30:27.730 Casie Aviles: the setup as is, right? So, just to see if, like, the execution times match.

371 00:30:28.340 00:30:36.839 Samuel Roberts: Yeah, if that doesn’t match what we’re seeing from the, like, live data, I… then we have to dig in somewhere else, because that doesn’t make sense to me.

372 00:30:36.840 00:30:37.769 Casie Aviles: You know. Okay.

373 00:30:38.140 00:30:45.569 Samuel Roberts: the model calls are such a big chunk of time that I don’t think, like, you know, transit time or any other things really could be causing a problem here.

374 00:30:45.760 00:30:51.410 Samuel Roberts: So I would think… Unless, you know, unless there’s something in the way you do the testing that’s different.

375 00:30:51.590 00:30:54.130 Samuel Roberts: I feel like as long as you’re hitting that endpoint, and…

376 00:30:54.440 00:30:55.750 Casie Aviles: Yeah, that’s essentially…

377 00:30:55.750 00:30:57.250 Samuel Roberts: traces, yeah, I think we should.

378 00:30:57.250 00:30:58.129 Casie Aviles: what I’m doing.

379 00:30:58.480 00:30:58.969 Casie Aviles: I’m just…

380 00:30:58.970 00:31:02.559 Samuel Roberts: Yeah, do that and see what kind of spikes we’re getting, or kind of latency we’re getting, I should say.

381 00:31:02.710 00:31:04.050 Casie Aviles: Okay, okay.

382 00:31:04.560 00:31:07.490 Samuel Roberts: Yeah, so I would say update that, and then we can kind of make the plan from there.

383 00:31:07.680 00:31:09.290 Samuel Roberts: On which ones to run next.

384 00:31:11.580 00:31:12.480 Casie Aviles: Sounds good.

385 00:31:12.810 00:31:14.250 Samuel Roberts: Awesome, thank you so much.

386 00:31:16.370 00:31:18.019 Samuel Roberts: Anything else anyone has?

387 00:31:19.410 00:31:20.080 Mustafa Raja: Nope.

388 00:31:21.010 00:31:21.850 Samuel Roberts: Alrighty.

389 00:31:23.180 00:31:24.110 Samuel Roberts: Thank you, all.

390 00:31:24.410 00:31:25.760 Pranav: Cool. Thank you. Thanks, guys.

391 00:31:26.520 00:31:30.999 Samuel Roberts: Alright, Pranav, did you take a look at those tickets, in Markdown, in the Git repo?

392 00:31:31.000 00:31:36.910 Pranav: Yeah, I saw that you posted those. I need to… I need to look through them. Okay. I’ll get back to you soon on that.

393 00:31:37.470 00:31:43.330 Samuel Roberts: That’s cool, yeah, I just, I mean, I just wanted to, you know, make sure that you were aware that they were there, and then, yeah, we can run those and get those in whenever.

394 00:31:43.790 00:31:45.740 Pranav: Yeah, definitely. I appreciate that, thank you.

395 00:31:45.740 00:31:47.760 Samuel Roberts: Okay, cool. Alright. Thanks, Joe.

396 00:31:48.320 00:31:48.940 Pranav: See you guys.

397 00:31:49.630 00:31:50.480 Samuel Roberts: Bye…