Meeting Title: Hannah <> Casie: ABC Dashboards Date: 2025-09-04 Meeting participants: Hannah Wang, Casie Aviles
WEBVTT
1 00:00:21.690 ⇒ 00:00:24.710 Hannah Wang: Hi. Okay.
2 00:00:25.100 ⇒ 00:00:31.180 Hannah Wang: So, this case study is about measuring AI agents, so…
3 00:00:31.430 ⇒ 00:00:37.589 Hannah Wang: Before we get started, I want to ask what that even means, like measuring an AI agent.
4 00:00:39.830 ⇒ 00:00:48.920 Casie Aviles: Yeah, so, I guess what that means is basically that we want to make sure that…
5 00:00:49.330 ⇒ 00:01:00.859 Casie Aviles: the AI agent is performing, like, the quality of its output is… acceptable, and… production ready, so…
6 00:01:01.250 ⇒ 00:01:08.370 Casie Aviles: That is… yeah, that's basically… so that's where measurement comes in. We want to know if,
7 00:01:09.580 ⇒ 00:01:14.920 Casie Aviles: given a set of metrics, I guess, or criteria…
8 00:01:15.850 ⇒ 00:01:19.179 Casie Aviles: We can start to measure the performance of the agent.
9 00:01:19.940 ⇒ 00:01:24.510 Casie Aviles: And know if it needs some work, like, if it's performing badly, or…
10 00:01:24.840 ⇒ 00:01:25.289 Hannah Wang: I see.
11 00:01:26.110 ⇒ 00:01:30.339 Casie Aviles: Yeah, if it's performing well, yeah, stuff like that.
12 00:01:31.070 ⇒ 00:01:35.280 Hannah Wang: So is it essentially just QA? Like, QAing an agent?
13 00:01:37.280 ⇒ 00:01:40.390 Hannah Wang: Quality assurance, or is it different than QA?
14 00:01:42.460 ⇒ 00:01:48.710 Casie Aviles: Yeah, it's… I guess it's also, you can also look at it as QA,
15 00:01:49.040 ⇒ 00:01:55.849 Casie Aviles: And then… I guess for ABC, for… yeah, for ABC’s case.
16 00:01:56.380 ⇒ 00:02:05.420 Casie Aviles: Other than just QA, we also use it to report on the client, like,
17 00:02:06.050 ⇒ 00:02:16.869 Casie Aviles: We want to also know if the AI is… actually hitting… Or, like, It’s delivering business impact, so…
18 00:02:17.060 ⇒ 00:02:20.969 Casie Aviles: Other than just, like, the technical metrics, like,
19 00:02:21.680 ⇒ 00:02:27.230 Casie Aviles: There’s also, like, the volume of… I guess, like.
20 00:02:27.990 ⇒ 00:02:34.700 Casie Aviles: How… like, usage, that’s one basic example. The usage, ha,
21 00:02:34.960 ⇒ 00:02:37.739 Casie Aviles: I think we also wanted to hit, like.
22 00:02:39.740 ⇒ 00:02:46.130 Casie Aviles: Like, we want to know if they’re actually… getting…
23 00:02:46.550 ⇒ 00:02:53.970 Casie Aviles: Better calls, better quality calls, so that’s one of the things that we want to measure.
24 00:02:54.080 ⇒ 00:02:55.849 Casie Aviles: For the… yeah, for the AI.
25 00:02:55.850 ⇒ 00:02:57.279 Hannah Wang: I see. Okay.
26 00:02:57.820 ⇒ 00:03:04.300 Hannah Wang: Okay, cool. So… My next question… Is this related to…
27 00:03:04.590 ⇒ 00:03:10.490 Hannah Wang: Andy? Like, sorry, I don’t really know the client project, so I’m just trying to get a better understanding.
28 00:03:10.600 ⇒ 00:03:13.110 Hannah Wang: But I guess what…
29 00:03:13.530 ⇒ 00:03:20.260 Hannah Wang: like, the dash… yeah, I guess we can just get into it. So, how long did the dashboard take,
30 00:03:20.380 ⇒ 00:03:21.340 Hannah Wang: I guess,
31 00:03:22.090 ⇒ 00:03:26.430 Hannah Wang: the building and measuring of everything? Like, do you know how long that took?
32 00:03:31.490 ⇒ 00:03:34.050 Casie Aviles: I’m really bad at remembering, I should.
33 00:03:34.050 ⇒ 00:03:40.339 Hannah Wang: That’s okay. Do you think it’s just, like, a couple quarters? Like, Q2 to Q3, Q1 to Q3?
34 00:03:43.930 ⇒ 00:03:55.940 Casie Aviles: Yeah, I think we started around… February, when we started building just… a dataset.
35 00:03:58.130 ⇒ 00:04:01.599 Casie Aviles: Like, the ideal answers, and then…
36 00:04:02.120 ⇒ 00:04:05.840 Casie Aviles: the actual answers that the AI generated, so…
37 00:04:07.280 ⇒ 00:04:09.810 Casie Aviles: As early as that, we were building…
38 00:04:10.560 ⇒ 00:04:15.169 Casie Aviles: Yeah, we were building that, and then I think…
39 00:04:15.630 ⇒ 00:04:20.619 Casie Aviles: I’m not… I don’t recall very well when we launched the dashboard.
40 00:04:21.649 ⇒ 00:04:26.289 Hannah Wang: That’s okay. I think that’s a good general idea.
41 00:04:26.839 ⇒ 00:04:35.629 Hannah Wang: And then was it… I guess, what… who were the team members involved? Not just, like, AI engineer, but, like, I’m assuming Amber was the PM.
42 00:04:36.670 ⇒ 00:04:37.280 Casie Aviles: Yes.
43 00:04:37.770 ⇒ 00:04:39.080 Hannah Wang: And then, yeah.
44 00:04:39.760 ⇒ 00:04:43.060 Casie Aviles: Annie also worked on the dashboard, so…
45 00:04:43.060 ⇒ 00:04:43.510 Hannah Wang: Okay.
46 00:04:43.510 ⇒ 00:04:49.729 Casie Aviles: Initially, it was just… me and Utang who created the dashboard.
47 00:04:50.790 ⇒ 00:04:54.560 Casie Aviles: But, yeah, and then eventually Annie…
48 00:04:54.890 ⇒ 00:04:58.779 Casie Aviles: Came, yeah, took over the dashboard work.
49 00:04:59.820 ⇒ 00:05:00.510 Hannah Wang: Okay.
50 00:05:01.070 ⇒ 00:05:04.319 Hannah Wang: Cool, so…
51 00:05:05.690 ⇒ 00:05:13.860 Hannah Wang: I don’t really… I’m looking at the questions I want to ask you, I don’t really know if they’re gonna make sense in the context of this case study, so…
52 00:05:14.210 ⇒ 00:05:22.740 Hannah Wang: Bear with me, but, I guess, like, before building the dashboard, like, what were…
53 00:05:22.960 ⇒ 00:05:26.900 Hannah Wang: what was ABC doing, or, like, what were you doing?
54 00:05:27.800 ⇒ 00:05:36.230 Hannah Wang: Before building the dashboard, in terms of, like, measuring AI. Like, was it just, like…
55 00:05:36.740 ⇒ 00:05:49.250 Hannah Wang: Because I’m assuming there was no dashboard before it, so was it, like, difficult to measure agents? Maybe you can just, like, show me the dashboard, and I can see the metrics on there.
56 00:05:51.710 ⇒ 00:05:54.470 Casie Aviles: Yeah, and… Just continue.
57 00:05:56.740 ⇒ 00:05:58.339 Casie Aviles: Yeah, okay, so…
58 00:05:58.970 ⇒ 00:05:59.720 Casie Aviles: This is…
59 00:05:59.940 ⇒ 00:06:00.290 Hannah Wang: Cool.
60 00:06:00.290 ⇒ 00:06:03.230 Casie Aviles: dashboard that we have for…
61 00:06:03.680 ⇒ 00:06:07.219 Casie Aviles: For Andy, this is, the usage for Andy.
62 00:06:07.220 ⇒ 00:06:07.630 Hannah Wang: Okay.
63 00:06:07.630 ⇒ 00:06:14.629 Casie Aviles: and… I guess before this, we really didn't have anything like this, so…
64 00:06:14.890 ⇒ 00:06:17.590 Casie Aviles: It was definitely difficult to know, like.
65 00:06:19.580 ⇒ 00:06:23.060 Casie Aviles: Like, at a high-level perspective, like, how many…
66 00:06:23.700 ⇒ 00:06:30.029 Casie Aviles: like, how much is Andy being used? And then how good are the scores? Like.
67 00:06:30.220 ⇒ 00:06:34.400 Casie Aviles: Or, like, how good is the quality for Andy, I guess, like…
68 00:06:34.520 ⇒ 00:06:43.540 Casie Aviles: the answers that, the bot is generating. We have no idea, like… I guess we were just testing it based on…
69 00:06:43.860 ⇒ 00:06:52.429 Casie Aviles: potential questions. So, like I mentioned, we… when we started with ABC, one of the first things we established was
70 00:06:52.550 ⇒ 00:06:55.590 Casie Aviles: the… what we call the golden data set.
71 00:06:55.810 ⇒ 00:06:59.460 Casie Aviles: So… This is, basically a list of…
72 00:06:59.710 ⇒ 00:07:08.649 Casie Aviles: like, expect… like, inputs, like, questions. So, given a question, what is the expected answer?
73 00:07:09.080 ⇒ 00:07:10.079 Casie Aviles: We have that.
74 00:07:10.330 ⇒ 00:07:14.150 Casie Aviles: And we would… compare the…
75 00:07:14.780 ⇒ 00:07:21.210 Casie Aviles: We would compare Andy’s responses based on the expected answers, and then we would give it a score, so that’s…
76 00:07:22.340 ⇒ 00:07:24.840 Casie Aviles: What we were trying to build then, but…
77 00:07:25.070 ⇒ 00:07:29.380 Casie Aviles: Before the actual… even the scoring, before that, we were just testing
78 00:07:29.480 ⇒ 00:07:33.269 Casie Aviles: And checking manually, like, okay, is this good? Yeah.
79 00:07:33.380 ⇒ 00:07:37.989 Casie Aviles: That’s what we did. But we already had, like, evals in mind.
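The golden-dataset eval Casie describes, given a question, compare the agent's answer against an expected answer and produce a score, could be sketched roughly like this. The token-overlap heuristic, the data, and all names here are illustrative stand-ins, not ABC's actual implementation:

```python
# Minimal sketch of a golden-dataset eval: score the agent's actual answers
# against expected ("golden") answers. The token-overlap scorer is a crude
# stand-in for whatever real grading logic (manual or LLM-based) is used.

def score_answer(expected: str, actual: str) -> float:
    """Fraction of expected tokens that appear in the actual answer."""
    expected_tokens = set(expected.lower().split())
    actual_tokens = set(actual.lower().split())
    if not expected_tokens:
        return 0.0
    return len(expected_tokens & actual_tokens) / len(expected_tokens)

def run_eval(golden_set, get_agent_answer):
    """Score each (question, expected) pair; return per-item scores + average."""
    results = []
    for question, expected in golden_set:
        actual = get_agent_answer(question)
        results.append({"question": question, "score": score_answer(expected, actual)})
    average = sum(r["score"] for r in results) / len(results)
    return results, average

# Example with a stubbed agent in place of the real bot:
golden = [("What is the refund window?", "Refunds are accepted within 30 days")]
results, avg = run_eval(golden, lambda q: "refunds are accepted within 30 days of purchase")
```

The same loop works whether `get_agent_answer` calls a live bot or replays logged responses.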
80 00:07:40.060 ⇒ 00:07:44.529 Hannah Wang: Got it. So it was just, like, a lot of manual checking, and…
81 00:07:45.130 ⇒ 00:07:45.850 Casie Aviles: Yes.
82 00:07:45.850 ⇒ 00:07:53.170 Hannah Wang: Even, like, you didn’t even have a dataset before, so it was even harder. Okay. And…
83 00:07:54.300 ⇒ 00:08:03.409 Hannah Wang: Let's see… So I guess, what's the point of measuring Andy's performance?
84 00:08:05.030 ⇒ 00:08:10.239 Hannah Wang: like, measuring the AI agent, like, what's the point of doing so? I know it might be, like, an
85 00:08:10.490 ⇒ 00:08:12.159 Hannah Wang: obvious answer, but I just want
86 00:08:12.520 ⇒ 00:08:15.090 Hannah Wang: Like, the full context, I guess.
87 00:08:15.700 ⇒ 00:08:19.090 Casie Aviles: Yeah, yeah, no worries. Yeah, I mean, I guess…
88 00:08:19.420 ⇒ 00:08:25.569 Casie Aviles: We, like I mentioned earlier, I believe, We want to…
89 00:08:26.550 ⇒ 00:08:31.730 Casie Aviles: We want to know if Andy is performing well, so that's what…
90 00:08:33.960 ⇒ 00:08:43.889 Casie Aviles: It's… it's also not, like, our evals are not in the best place either, like, the automated ones, so that's something we're still actively trying to improve, but…
91 00:08:44.360 ⇒ 00:08:48.110 Casie Aviles: The idea is we have, like, a high-level score here.
92 00:08:48.390 ⇒ 00:08:56.570 Casie Aviles: Which is called an average quality score, so we want to know if Andy is actually
93 00:08:57.680 ⇒ 00:09:06.470 Casie Aviles: Responding correctly, because… I mean… In general, we’d want it to respond with correct answers, but
94 00:09:06.770 ⇒ 00:09:11.560 Casie Aviles: For, yeah, for CSRs, of course, they are talking to customers, so…
95 00:09:12.080 ⇒ 00:09:14.340 Casie Aviles: On the phone, so we want…
96 00:09:14.620 ⇒ 00:09:20.450 Casie Aviles: them to be given, like, the right information. So that’s how we measure it.
97 00:09:20.770 ⇒ 00:09:23.649 Casie Aviles: and then we also have, like.
98 00:09:24.460 ⇒ 00:09:27.760 Casie Aviles: These, thumbs up, thumbs down feedback, so…
99 00:09:29.370 ⇒ 00:09:37.219 Casie Aviles: Yeah, actually, this is… I'm just seeing this now after a while. It's good to see that at least there are more thumbs-up
100 00:09:37.660 ⇒ 00:09:44.960 Casie Aviles: counts now than… yeah, than before, but… Okay.
101 00:09:44.960 ⇒ 00:09:48.099 Hannah Wang: The thumbs up from the CSRs, like, giving feedback.
102 00:09:48.520 ⇒ 00:09:50.350 Casie Aviles: Yes, yes, this is from them.
103 00:09:53.110 ⇒ 00:09:54.060 Casie Aviles: Okay, so…
104 00:09:54.060 ⇒ 00:10:03.960 Hannah Wang: I guess going back even further, like, Andy, so the purpose of Andy is to help CSR agents answer questions, right? Like, if a CSR…
105 00:10:04.570 ⇒ 00:10:05.580 Hannah Wang: agent.
106 00:10:05.690 ⇒ 00:10:14.069 Hannah Wang: Or if a CSR doesn’t know, like, the answer to something, do they ask Andy for that? Is that the purpose of Andy?
107 00:10:15.170 ⇒ 00:10:22.010 Casie Aviles: Yes, yes, and what else? I think… another reason…
108 00:10:22.140 ⇒ 00:10:29.109 Casie Aviles: that Andy could help them is to, yeah, answer calls much faster, to handle more
109 00:10:29.400 ⇒ 00:10:32.670 Casie Aviles: stuff, so basically to augment their work.
110 00:10:34.860 ⇒ 00:10:44.820 Casie Aviles: And also, like, if they have… if ABC has, like, new… CSRs, then… if, with Andy.
111 00:10:45.010 ⇒ 00:10:52.170 Casie Aviles: It would… I guess, like, what the client has mentioned before, like…
112 00:10:52.370 ⇒ 00:10:55.239 Casie Aviles: They would… they would… it would be like,
113 00:10:55.630 ⇒ 00:10:59.160 Casie Aviles: they would be… how do I describe this?
114 00:10:59.280 ⇒ 00:11:01.069 Casie Aviles: I’m losing words.
115 00:11:01.070 ⇒ 00:11:02.090 Hannah Wang: That’s okay.
116 00:11:02.890 ⇒ 00:11:03.480 Casie Aviles: Thanks.
117 00:11:03.820 ⇒ 00:11:10.309 Casie Aviles: they would be, like… is it… it would be as if they were experts already with Andy, right? Like…
118 00:11:10.950 ⇒ 00:11:13.929 Casie Aviles: it wouldn’t show that they’re new, I guess. That’s kind of.
119 00:11:13.930 ⇒ 00:11:14.620 Hannah Wang: Yeah.
120 00:11:15.210 ⇒ 00:11:17.670 Casie Aviles: Kind of what I’m trying to say,
121 00:11:17.940 ⇒ 00:11:25.040 Casie Aviles: And then… yeah, so also the speed… Right? Like, if…
122 00:11:26.300 ⇒ 00:11:31.179 Casie Aviles: Yeah, I guess those are the main… the main things, and also, kind of.
123 00:11:31.530 ⇒ 00:11:37.530 Casie Aviles: Yeah, another thing is actually the centralizing, like, The… the information, and…
124 00:11:38.340 ⇒ 00:11:47.980 Casie Aviles: Based… like, they have their own documentation, their… Like, information about…
125 00:11:49.150 ⇒ 00:11:53.889 Casie Aviles: how to handle things, or… yeah, basically their internal Bible, I guess.
126 00:11:55.530 ⇒ 00:12:00.989 Casie Aviles: And… before Andy, it was really messy, it was really difficult to…
127 00:12:01.230 ⇒ 00:12:05.200 Casie Aviles: Actually get anything out of their documents.
128 00:12:05.690 ⇒ 00:12:09.840 Casie Aviles: So, part of the work that we did was to actually clean that up.
129 00:12:11.130 ⇒ 00:12:15.830 Casie Aviles: And, have Andy be trained with that knowledge.
130 00:12:16.560 ⇒ 00:12:17.520 Casie Aviles: So…
131 00:12:17.520 ⇒ 00:12:18.050 Hannah Wang: Got it.
132 00:12:18.690 ⇒ 00:12:24.710 Casie Aviles: Yeah, it also makes it easier for them to just access the information that they need.
133 00:12:25.690 ⇒ 00:12:26.390 Hannah Wang: Okay.
134 00:12:26.790 ⇒ 00:12:28.900 Hannah Wang: Cool.
135 00:12:29.140 ⇒ 00:12:37.929 Hannah Wang: So going back to… thanks for answering what Andy is. So going back to the dashboard, what did…
136 00:12:38.700 ⇒ 00:12:48.630 Hannah Wang: building this dashboard entail? Like, what did you have to set up? Like, what tools did you use? How did you, like, start measuring
137 00:12:49.050 ⇒ 00:12:51.469 Hannah Wang: the agent, I guess. Like, yeah.
138 00:12:53.020 ⇒ 00:13:00.810 Casie Aviles: Yeah, so… So of course, before we develop this dashboard, we would need the data to show.
139 00:13:01.140 ⇒ 00:13:01.900 Hannah Wang: Yes.
140 00:13:02.110 ⇒ 00:13:03.400 Casie Aviles: To show these.
141 00:13:03.540 ⇒ 00:13:13.270 Casie Aviles: So… We needed to have some sort of logging and observability in place, so…
142 00:13:14.080 ⇒ 00:13:20.650 Casie Aviles: At the time, we were all kind of learning how to do it, so… Wow, okay.
143 00:13:20.800 ⇒ 00:13:21.510 Casie Aviles: Yeah.
144 00:13:21.630 ⇒ 00:13:28.160 Casie Aviles: I can show… I can just briefly go to… And the… But…
145 00:13:28.510 ⇒ 00:13:32.730 Casie Aviles: Yeah, yeah, this is… this looks like spaghetti. Wow.
146 00:13:34.170 ⇒ 00:13:37.640 Casie Aviles: But, yeah, I mean, the idea is we want to log
147 00:13:37.880 ⇒ 00:13:42.530 Casie Aviles: The responses, so we start there, where is that?
148 00:13:43.520 ⇒ 00:13:47.970 Casie Aviles: Yeah, we have this, we have… we log it to Snowflake.
149 00:13:50.090 ⇒ 00:13:54.210 Casie Aviles: That’s what we do, so we… we get the data that we… that…
150 00:13:54.750 ⇒ 00:14:01.040 Casie Aviles: We can get from Google, the Google Chat interface, so… We logged those.
151 00:14:01.270 ⇒ 00:14:03.969 Casie Aviles: We store a bunch of metadata, like.
152 00:14:04.460 ⇒ 00:14:09.179 Casie Aviles: the timestamp, like, IDs and all of that, like, usernames.
153 00:14:10.500 ⇒ 00:14:14.240 Casie Aviles: We stored that in… A table, and that’s where…
154 00:14:15.090 ⇒ 00:14:21.879 Casie Aviles: That’s how we get, like, the usage, or, like, the total exchanges that we have here.
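The logging layer described here, storing each exchange with metadata (timestamp, IDs, usernames) in a table and counting rows to get usage, can be sketched as below. SQLite stands in for the Snowflake table mentioned in the conversation; the schema and column names are illustrative assumptions:

```python
import sqlite3
from datetime import datetime, timezone

# Sketch of the exchange log: each Q&A exchange from the chat interface is
# stored with metadata, and "usage" (total exchanges) is a count over the
# table. SQLite is a stand-in for Snowflake; names are illustrative.

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE exchanges (
        exchange_id TEXT PRIMARY KEY,
        username    TEXT,
        question    TEXT,
        answer      TEXT,
        created_at  TEXT
    )
""")

def log_exchange(exchange_id, username, question, answer):
    conn.execute(
        "INSERT INTO exchanges VALUES (?, ?, ?, ?, ?)",
        (exchange_id, username, question, answer,
         datetime.now(timezone.utc).isoformat()),
    )

def total_exchanges():
    return conn.execute("SELECT COUNT(*) FROM exchanges").fetchone()[0]

log_exchange("ex-1", "csr_jane", "Refund window?", "30 days from purchase.")
log_exchange("ex-2", "csr_omar", "Shipping time?", "3-5 business days.")
```

The dashboard's usage widget would then just aggregate this table by day or by user.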
155 00:14:22.710 ⇒ 00:14:27.009 Casie Aviles: And then… What else? For the…
156 00:14:28.200 ⇒ 00:14:31.330 Casie Aviles: So the second thing is the scoring,
157 00:14:33.170 ⇒ 00:14:36.290 Casie Aviles: Oh, sorry. For that, we were using…
158 00:14:37.920 ⇒ 00:14:43.380 Casie Aviles: Like, a mix of Braintrust, also…
159 00:14:43.500 ⇒ 00:14:50.219 Casie Aviles: like, an LLM score, so it’s like an AI that scores the AI, basically.
160 00:14:50.860 ⇒ 00:14:55.180 Casie Aviles: And then we just give it a set of instructions, and then we also
161 00:14:55.730 ⇒ 00:15:02.059 Casie Aviles: use the, like I said, the data set, the golden dataset that we have to compare and score.
162 00:15:02.510 ⇒ 00:15:04.240 Casie Aviles: So that’s how we generate.
163 00:15:04.580 ⇒ 00:15:06.810 Casie Aviles: These quality scores.
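The "AI that scores the AI" setup, an LLM judge graded against the golden dataset with a set of instructions, might look roughly like this. `call_llm` is a stub for whatever model client sits behind it (the transcript mentions Braintrust in the mix); the prompt wording and scale are illustrative assumptions:

```python
# Sketch of LLM-as-judge scoring: an LLM grades the agent's answer against
# the golden expected answer. `call_llm` is a stub for a real model client;
# the prompt and 1-5 scale are illustrative, not ABC's actual rubric.

JUDGE_PROMPT = """You are grading a support bot's answer.
Question: {question}
Expected answer: {expected}
Actual answer: {actual}
Reply with a single integer score from 1 (wrong) to 5 (fully correct)."""

def judge(question, expected, actual, call_llm):
    prompt = JUDGE_PROMPT.format(question=question, expected=expected, actual=actual)
    reply = call_llm(prompt)
    return int(reply.strip())

# Stubbed model that always replies "4", just to show the plumbing:
score = judge("Refund window?", "30 days", "Refunds within 30 days", lambda p: "4")
```

Averaging these per-exchange scores is what produces the dashboard's "average quality score".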
164 00:15:09.010 ⇒ 00:15:12.620 Casie Aviles: And then for, yeah, for the thumbs up, we implemented, like.
165 00:15:13.730 ⇒ 00:15:18.600 Casie Aviles: a Google Chat thing where they could send… I could show here, like, they.
166 00:15:18.600 ⇒ 00:15:19.709 Hannah Wang: We have these, we have these…
167 00:15:19.710 ⇒ 00:15:20.570 Casie Aviles: buttons.
168 00:15:22.420 ⇒ 00:15:27.120 Casie Aviles: You could send a thumbs… you could click for a thumbs up, and then that would get sent to our…
169 00:15:27.550 ⇒ 00:15:30.610 Casie Aviles: Database, and then we could even do, like.
170 00:15:30.840 ⇒ 00:15:34.129 Casie Aviles: Thumbs down, and then we would ask for, like, feedback.
171 00:15:35.790 ⇒ 00:15:42.059 Casie Aviles: So… Yeah, I could send… Something like this, so…
172 00:15:42.680 ⇒ 00:15:47.460 Casie Aviles: This would also get logged, and it would even send to our Slack for alerts.
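The thumbs-up/thumbs-down flow, feedback gets logged, and a thumbs-down also triggers a Slack alert, can be sketched like this. The alert payload is only built, not sent; in practice it would be POSTed to a Slack incoming webhook. Field names and the message format are illustrative:

```python
import json

# Sketch of the feedback flow: every rating is logged, and a thumbs-down
# additionally produces a Slack alert payload. Names are illustrative; the
# real pipeline writes to the warehouse and posts to a Slack webhook.

feedback_log = []

def record_feedback(exchange_id, username, rating, comment=None):
    entry = {"exchange_id": exchange_id, "username": username,
             "rating": rating, "comment": comment}
    feedback_log.append(entry)
    if rating == "down":
        return build_slack_alert(entry)
    return None

def build_slack_alert(entry):
    text = (f":thumbsdown: {entry['username']} flagged exchange "
            f"{entry['exchange_id']}: {entry['comment'] or 'no comment'}")
    return json.dumps({"text": text})

alert = record_feedback("ex-1", "csr_jane", "down", "Answer cited the wrong policy")
```

Keeping the alert payload as plain JSON makes it easy to swap the destination (Slack, email, a ticket system) later.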
173 00:15:47.820 ⇒ 00:15:48.810 Hannah Wang: Oh, cool.
174 00:15:50.870 ⇒ 00:15:51.640 Casie Aviles: Yeah.
175 00:15:52.290 ⇒ 00:15:55.939 Casie Aviles: Yeah, I think we… I’m not sure if we… yeah.
176 00:15:56.130 ⇒ 00:15:58.529 Casie Aviles: Oh. Yeah, it’s this one. That’s Pete Bell.
177 00:15:58.860 ⇒ 00:15:59.300 Hannah Wang: Mmm.
178 00:15:59.300 ⇒ 00:16:04.620 Casie Aviles: So… Yeah, I think that’s where we… that’s how we get the data.
179 00:16:04.860 ⇒ 00:16:06.290 Casie Aviles: for the dashboard.
180 00:16:06.720 ⇒ 00:16:07.690 Casie Aviles: Pretty much.
181 00:16:07.960 ⇒ 00:16:08.670 Hannah Wang: Okay.
182 00:16:09.000 ⇒ 00:16:13.829 Hannah Wang: And then, I’m assuming real is visualization, right? Like…
183 00:16:14.230 ⇒ 00:16:21.120 Hannah Wang: Yeah. It’s a visualization tool, okay. Okay, and…
184 00:16:21.500 ⇒ 00:16:30.799 Hannah Wang: I guess, like, with the feedback that you get, what do you do? Like, do you just, like, retrain the agents or something with the feedback?
185 00:16:32.740 ⇒ 00:16:34.800 Casie Aviles: What we do with the feedback
186 00:16:34.920 ⇒ 00:16:41.150 Casie Aviles: is… so this one is actually a recent change that we made…
187 00:16:41.720 ⇒ 00:16:46.570 Casie Aviles: For each thumbs-down feedback, that we get.
188 00:16:46.700 ⇒ 00:16:53.759 Casie Aviles: It would be… It would, like, create a triage ticket, here on Linear.
189 00:16:53.760 ⇒ 00:16:55.109 Hannah Wang: Oh, okay.
190 00:16:55.960 ⇒ 00:16:59.689 Casie Aviles: And… That’s to make… to keep…
191 00:16:59.980 ⇒ 00:17:03.340 Casie Aviles: Track of, like, what we need to do, and then…
192 00:17:03.650 ⇒ 00:17:08.740 Casie Aviles: we… Amber has been working with the… with the client to…
193 00:17:09.490 ⇒ 00:17:17.220 Casie Aviles: Handle these triage tickets, so if it’s… so, like, it’s determining whose responsibility would it be, because sometimes…
194 00:17:17.560 ⇒ 00:17:24.229 Casie Aviles: Sometimes it… It’s, like, it’s the data that the client has, so we can’t really do…
195 00:17:24.420 ⇒ 00:17:27.509 Casie Aviles: Anything, so they have to change the data that they have.
196 00:17:28.319 ⇒ 00:17:31.549 Casie Aviles: They have to update their document.
197 00:17:32.040 ⇒ 00:17:35.990 Casie Aviles: for the AI to have accurate information.
198 00:17:36.150 ⇒ 00:17:38.590 Casie Aviles: On the other hand, if it’s, like,
199 00:17:38.690 ⇒ 00:17:43.739 Casie Aviles: our side, then it would be assigned to me, like, for this one, for example.
200 00:17:44.480 ⇒ 00:17:48.680 Casie Aviles: And then I would, like, make fixes. So fixes could be…
201 00:17:49.370 ⇒ 00:17:55.519 Casie Aviles: typically what I do is I investigate first, I try to recreate the
202 00:17:55.720 ⇒ 00:17:59.339 Casie Aviles: Like, the problem, given, like, this input.
203 00:17:59.860 ⇒ 00:18:02.370 Casie Aviles: And then if I can spot, like.
204 00:18:02.890 ⇒ 00:18:10.429 Casie Aviles: What went wrong, so it could be, like, a system instruction problem, where the instruction was missing…
205 00:18:10.590 ⇒ 00:18:16.380 Casie Aviles: Something, or… Yeah, that’s pretty much, like, what we do, so…
206 00:18:16.910 ⇒ 00:18:23.720 Casie Aviles: Like, in terms of training, we’re not really doing any… AI training, like,
207 00:18:23.860 ⇒ 00:18:32.890 Casie Aviles: We don’t have any trained models, these are just the base models, but what we do is we typically, yeah, adjust the system instructions, or…
208 00:18:33.380 ⇒ 00:18:42.090 Casie Aviles: Other than… if there are other kinds of fixes that we do, like, we make fixes here in the workflow, sometimes some of these nodes fail.
209 00:18:42.620 ⇒ 00:18:48.190 Casie Aviles: So we would go here and… remedy that.
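The triage flow described above, each thumbs-down becomes a ticket, routed depending on whether the root cause is the client's source documents or something on the engineering side (system instructions, workflow nodes), could be sketched as below. The cause categories and routing rule are a guess at the shape of that decision; the real tickets live in Linear:

```python
from dataclasses import dataclass

# Sketch of thumbs-down triage: create a ticket per piece of negative
# feedback and route it by root cause. Categories and the routing rule are
# illustrative; in practice this is a Linear ticket assigned by the PM.

@dataclass
class TriageTicket:
    exchange_id: str
    cause: str      # e.g. "client_docs", "system_prompt", "workflow_node"
    assignee: str

def route(exchange_id: str, cause: str) -> TriageTicket:
    if cause == "client_docs":
        assignee = "client"        # client must update their documentation
    else:
        assignee = "engineering"   # prompt or workflow fix on our side
    return TriageTicket(exchange_id, cause, assignee)

ticket = route("ex-1", "system_prompt")
```

Making the cause an explicit field also lets the dashboard later break down feedback by failure type.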
210 00:18:49.320 ⇒ 00:18:50.020 Hannah Wang: Got it.
211 00:18:52.100 ⇒ 00:19:03.009 Hannah Wang: So for the dashboard, is it mostly internal? Like, is it mostly just our team that looks at it, or is it also, like, the client also looks at it?
212 00:19:03.460 ⇒ 00:19:07.530 Casie Aviles: Yeah, yeah, I believe the client also takes a look at this,
213 00:19:07.850 ⇒ 00:19:10.720 Casie Aviles: I haven’t been in one of those meetings, but…
214 00:19:10.810 ⇒ 00:19:13.699 Hannah Wang: Okay. Amber has meetings with…
215 00:19:13.700 ⇒ 00:19:14.550 Casie Aviles: Yvette?
216 00:19:15.400 ⇒ 00:19:21.200 Casie Aviles: ABC, and they have, like, a weekly KPI review of the dashboard.
217 00:19:21.200 ⇒ 00:19:21.910 Hannah Wang: Oh, okay.
218 00:19:21.910 ⇒ 00:19:27.040 Casie Aviles: So, yeah, that’s where we’re able to know if this is actually
219 00:19:27.430 ⇒ 00:19:30.780 Casie Aviles: Hitting goals, business goals as well.
220 00:19:31.010 ⇒ 00:19:41.609 Hannah Wang: I see. Okay, let me ask Amber for the client feedback for this, because I don’t… yeah, you haven’t been in those meetings, so you wouldn’t know.
221 00:19:42.530 ⇒ 00:19:49.870 Hannah Wang: Okay, so those are the solution, the results… okay, I think the results and the impact,
222 00:19:50.050 ⇒ 00:19:54.180 Hannah Wang: for the client, I’ll ask Amber, but I guess… Do you…
223 00:19:54.410 ⇒ 00:20:01.579 Hannah Wang: like, internally within Brain Forge, like, was there any feedback that you got from Amber, or…
224 00:20:01.900 ⇒ 00:20:07.340 Hannah Wang: I guess Utang or whoever works with ABC, the client,
225 00:20:07.520 ⇒ 00:20:12.559 Hannah Wang: Like, how helpful this dashboard is, or any other feedback.
226 00:20:14.510 ⇒ 00:20:18.870 Casie Aviles: Yeah, so… Hmm, I think, yeah, definitely, like, the…
227 00:20:19.970 ⇒ 00:20:27.659 Casie Aviles: Being able to see, like, how many people are using it, and yeah, the dashboard…
228 00:20:27.870 ⇒ 00:20:31.790 Casie Aviles: making it easy for us to see that, it's…
229 00:20:31.930 ⇒ 00:20:37.890 Casie Aviles: It's good, it's good, yeah, and what else…
230 00:20:38.380 ⇒ 00:20:42.569 Casie Aviles: I think, yeah, I guess for the negative feedback, it’s just…
231 00:20:42.990 ⇒ 00:20:46.340 Casie Aviles: That the quality scores definitely need some work.
232 00:20:47.090 ⇒ 00:20:51.240 Casie Aviles: Because, one of the things we noticed is that
233 00:20:52.560 ⇒ 00:21:01.769 Casie Aviles: Sometimes, like, the scorer would give a high score, but then the CSR would give it a thumbs down, so there’s, like, a mismatch with
234 00:21:02.260 ⇒ 00:21:06.890 Casie Aviles: how we score it, and then how the CSRs
235 00:21:07.550 ⇒ 00:21:11.579 Casie Aviles: even think of, like, the response, so…
236 00:21:11.700 ⇒ 00:21:12.980 Hannah Wang: I see. I guess that’s…
237 00:21:12.980 ⇒ 00:21:17.610 Casie Aviles: One of those. Yeah, the… negative feedback for this.
238 00:21:18.360 ⇒ 00:21:19.090 Hannah Wang: Okay.
239 00:21:21.490 ⇒ 00:21:29.759 Hannah Wang: Okay, cool. I think this is good. And then if I have any more questions, I’ll ask you, in Slack, but…
240 00:21:30.100 ⇒ 00:21:33.409 Hannah Wang: Yeah, this is good. I think that’s everything.
241 00:21:33.780 ⇒ 00:21:47.780 Hannah Wang: So… yeah, I’ll probably ask for you to look over the case studies as well, after we design them and get all the copy, but, that’ll be, like, next week or tomorrow or something, so…
242 00:21:48.370 ⇒ 00:21:51.869 Hannah Wang: Yeah, I’ll just ping you in the channel when it’s ready.
243 00:21:52.710 ⇒ 00:21:53.310 Casie Aviles: Sure.
244 00:21:54.150 ⇒ 00:22:05.260 Hannah Wang: All right, thanks again, as always, for going… spending, like, an hour, basically, going through these. The case studies are very helpful, so appreciate it.
245 00:22:06.160 ⇒ 00:22:12.020 Casie Aviles: Yeah, thank you as well. I know I just chat a lot about this, but… Yeah.
246 00:22:12.020 ⇒ 00:22:18.319 Hannah Wang: No, it’s good. It’s good for you to talk. The more you talk, the more context AI has, so it’s… it’s good.
247 00:22:18.700 ⇒ 00:22:22.669 Hannah Wang: Alright, well, I’ll talk to you on Slack.
248 00:22:23.280 ⇒ 00:22:24.470 Casie Aviles: Alright, thank you, Hannah.
249 00:22:24.470 ⇒ 00:22:25.820 Hannah Wang: Alright, thanks, bye.