Meeting Title: Brainforge A-B Testing Discussion Date: 2025-09-12 Meeting participants: Jake Nathan, Shreya Chowdhury


WEBVTT

1 00:01:48.420 00:01:49.350 Jake Nathan: Hey, Shreya.

2 00:01:49.350 00:01:50.799 Shreya Chowdhury: Hey, how’s it going?

3 00:01:51.200 00:01:52.260 Jake Nathan: Good, how are you doing?

4 00:01:52.260 00:01:54.150 Shreya Chowdhury: Good. Nice to meet you.

5 00:01:54.150 00:01:57.630 Jake Nathan: Yeah, nice to meet you too. Thank you for making time, I really appreciate it.

6 00:01:57.630 00:02:02.600 Shreya Chowdhury: Yeah, no worries. Sorry about the rescheduling stuff, it’s just been a super busy week.

7 00:02:03.110 00:02:15.109 Jake Nathan: I bet, yeah, I know, client work. It’s like, you gotta… yeah, there’s just a ton there. So, yeah, I appreciate you, again, making time, and yeah, you said it’s been a busy week, how’s… how’s today going?

8 00:02:15.650 00:02:34.580 Shreya Chowdhury: It’s going, still pretty busy, but it’s Friday, so I’m just kind of, like, in a sprint state, I’m hoping, like, I don’t… it’s probably ambitious to want to get everything I want to get done done, but I feel like I’ll feel so much better if I can wrap up a lot of things and start with, like, a cleaner, or relatively cleaner slate next week.

9 00:02:35.120 00:02:42.009 Jake Nathan: Yeah, I totally got ya, and how… remind me, like, I just joined a few weeks ago, so how long have you been at Brainforge?

10 00:02:42.010 00:02:44.240 Shreya Chowdhury: Same. Also joined a few weeks ago.

11 00:02:44.240 00:02:47.930 Jake Nathan: Oh, really? Wow. Did you know Uouton before? Like, how’d you get…

12 00:02:47.930 00:02:57.609 Shreya Chowdhury: No, so he’s an ex-coworker of an ex-coworker. Okay. So, yeah, that… we’re kind of, like, in the same network, but yeah. What about you?

13 00:02:58.510 00:03:00.390 Jake Nathan: Yeah, funny enough.

14 00:03:00.460 00:03:10.480 Jake Nathan: about a year ago, I happened to play pickleball, like, just at a random place, I play in Open Play, and I met Robert, and then we just kinda…

15 00:03:10.480 00:03:24.079 Jake Nathan: became… like, we just had a good time there, and I remember we had, like, one… we just followed up and called one time after that, and then a year later, I was actually… Robert, he was telling me he had an idea for, like, a…

16 00:03:24.180 00:03:29.499 Jake Nathan: camping food, startup or something, and I… my background’s actually, like,

17 00:03:29.650 00:03:35.270 Jake Nathan: in the consumer world, like, I have a cooking channel on TikTok that has a million followers, so that’s kind of…

18 00:03:35.270 00:03:35.940 Shreya Chowdhury: Oh, I’ve been…

19 00:03:35.940 00:03:36.280 Jake Nathan: whatever.

20 00:03:36.280 00:03:37.080 Shreya Chowdhury: Cool.

21 00:03:37.310 00:03:52.499 Jake Nathan: Yeah, it’s been fun, but… so I was kind of giving him, like, advice on if you were to start that, like, what type of content to do. And so, anyway, I saw, like, a post a few weeks ago, and just sent it to him, because it reminded me of him, and he’s like, hey, actually.

22 00:03:52.500 00:04:06.829 Jake Nathan: very random, but, like, we’re trying to step up our content, and if you wanted to help out. So, that’s… and then he intro’d me to Tom, and so, yeah, it’s been… it’s been really fun over the past few weeks. It seems like, there’s just a lot of cool stuff going on.

23 00:04:07.100 00:04:20.050 Shreya Chowdhury: Yeah, nice. Well, yeah, that’s a classic, like, meeting on the pickleball court. I feel like that’s very, like, it’s very, like, cliche, but, like, effective, like, tech networking, like.

24 00:04:20.450 00:04:33.619 Jake Nathan: That’s true, now that I think about it, that’s, like, the most cliche. I mean, I needed to step up my golf game, I feel like that’s where people usually… er, I don’t know, sometimes they meet, but I’m not good at golf, so pickleball is my… do you play pickleball at all?

25 00:04:33.620 00:04:44.200 Shreya Chowdhury: I have before. I have friends who are really good at it, and friends who are not, so we did skill-based matchmaking. I was in the latter group.

26 00:04:44.200 00:04:45.020 Jake Nathan: go, hey.

27 00:04:45.020 00:04:45.350 Shreya Chowdhury: Yeah.

28 00:04:45.350 00:04:46.349 Jake Nathan: Gotta start somewhere.

29 00:04:46.690 00:04:47.340 Shreya Chowdhury: Yeah.

30 00:04:47.340 00:04:51.419 Jake Nathan: No, that’s awesome. And, remind me, where are you based?

31 00:04:51.420 00:04:52.969 Shreya Chowdhury: I’m based in the Bay Area.

32 00:04:52.970 00:04:54.649 Jake Nathan: Okay, cool, that’s awesome.

33 00:04:55.400 00:04:57.459 Shreya Chowdhury: A lot of picklers here.

34 00:04:57.630 00:04:59.570 Jake Nathan: Yeah, I would imagine that, I would imagine.

35 00:05:01.270 00:05:10.900 Jake Nathan: Well, cool. Well, yeah, once again, thanks for making time. Utam, I mean, like I kinda said in the… in Slack, he was just wanting to…

36 00:05:10.900 00:05:21.280 Jake Nathan: get your thoughts on… on a few questions. So, kind of… I’ll just kind of run through, if you want, like, we can just kind of go through question by question, see how much time we have, and…

37 00:05:21.280 00:05:35.040 Shreya Chowdhury: Yeah, sounds good. I’m also happy to, like, we can split it up into half and half if we want to do a second session on Monday or something, because I know we’re, like, a little bit shorter on time, and I have a sync with Robert, like, right after, but yeah, happy to go through as many as we can.

38 00:05:35.210 00:05:46.650 Jake Nathan: Sounds good, yeah. Well, yeah, yeah, we’ll just go through one by one. So, yeah, what makes a good A-B test versus just something that’s a vanity exercise?

39 00:05:46.650 00:06:04.540 Shreya Chowdhury: Yeah, so I was thinking about this one, and I really like it, because I feel like A-B testing is, like, it’s fun to do, and it gives a lot of good insights, but I feel like a good test is to, like, it’s tied to something that yields, like, a meaningful product opinion, so, like.

40 00:06:04.830 00:06:22.650 Shreya Chowdhury: more than just curiosity. Obviously, it’s good to, like, be robust in your decisions, but I feel like… I think I wrote something down that I was thinking about yesterday. Yeah. Yeah, so, like, a good experiment answer is, like, a meaningful question that the team is already thinking about, where you have

41 00:06:22.650 00:06:36.860 Shreya Chowdhury: good quality data already. And then a vanity test would be just kind of like, oh, we’re collecting, like, numbers that don’t necessarily change decisions, like, there’s a lot of caveats in the data that we don’t understand,

42 00:06:37.510 00:06:50.440 Shreya Chowdhury: And also, like, a good A-B test or a good experiment is, like, it validates something that we are thinking about. It validates or disproves something we’re thinking about already,

43 00:06:50.810 00:06:54.900 Shreya Chowdhury: But it’s something that we don’t… we don’t want to, like.

44 00:06:55.090 00:07:03.229 Shreya Chowdhury: take the plunge without having robust data. But, you know, there are a lot of tests that are, like, we don’t need to, like.

45 00:07:03.390 00:07:21.049 Shreya Chowdhury: have it validated by an experiment to run… or to make that decision. So if it’s something we already know we need to fix, or we’ve seen trends in other ways, or even just based on our gut, we’re like, this is something we feel like we should do, it’s not necessarily worth the man-hours and the money to run that experiment.

46 00:07:21.940 00:07:22.620 Jake Nathan: Yeah.

47 00:07:23.030 00:07:42.899 Jake Nathan: That totally makes sense. So you kind of need the conviction, like, you actually… like, you’re ready to figure out, like, to have an opinion on it, and then you need the data. Like, you can’t… it needs to actually be data that, when you look at it, you’re trusting it, like, okay, this… this data came back, and this is, like, this is actually going to inform our decision.

48 00:07:42.900 00:08:07.169 Shreya Chowdhury: Yeah, and as far as, like, in theory goes, it’s like, usually you can, like, run an A-B test with enough data, like, if you have, like, some amount of data, but it’s like you want to make sure that you have, like, a good sample size, and you have good outcome metrics, and you have, like, a good path forward, because a lot of times it’s like, if you don’t know what the, like, ideal outcome is, or what you’re trying to optimize for.

49 00:08:07.170 00:08:14.719 Shreya Chowdhury: no amount of experimenting or setting up tracking or whatever will make a difference. So, I think a good A-B test is always…

50 00:08:15.080 00:08:20.270 Shreya Chowdhury: Built off of, like, good…

51 00:08:20.810 00:08:24.299 Shreya Chowdhury: Business motivations, or good product, like, product goals.

52 00:08:25.670 00:08:40.909 Jake Nathan: Yeah, and to follow up, you said, like, you need a good sample size and good business metrics. I know it depends so much on… I mean, an A-B test could literally mean anything for any product, but, like, how do I even figure out what’s a good sample size?

53 00:08:41.299 00:08:47.200 Shreya Chowdhury: Yeah, so I think that depends on what you’re testing for.

54 00:08:48.440 00:08:54.380 Shreya Chowdhury: Sometimes, like, obviously, like, if you have, like, a really, really large sample size, like, it’s…

55 00:08:54.540 00:09:10.710 Shreya Chowdhury: a little more, like, strenuous and expensive to run that experiment, and then also, like, to analyze the data, there’s just so much of it, it’s almost overwhelming. But at the same time, if your sample size is too small, it just means, like, even if you get a result that

56 00:09:10.790 00:09:23.540 Shreya Chowdhury: you can technically mark as statistically significant or whatever, it’s not that significant because you just don’t have enough data. So it’s like, yeah, like, if your sample size is just too small,

57 00:09:24.150 00:09:27.920 Shreya Chowdhury: Unfortunately, sometimes you just can’t get, like, a meaningful

58 00:09:28.110 00:09:31.900 Shreya Chowdhury: Like, result from an experiment like that.

59 00:09:34.720 00:09:41.839 Jake Nathan: Yeah, do you, like, how do you gut check if it’s enough, if the sample size is large enough?

60 00:09:42.510 00:09:46.190 Shreya Chowdhury: Yeah, that’s a good question. I think it dep… like.

61 00:09:46.810 00:10:02.269 Shreya Chowdhury: I feel like it really depends on the feature or something, like, in the past, and, like, the ones that I’ve worked on are, like, oh, like, we have, like, this ecosystem, so it’s, like, the app store or the theme store, and we’re thinking of launching, like, a new header or, like, a new, like.

62 00:10:02.270 00:10:09.289 Shreya Chowdhury: I don’t know, group of apps, or, like, a new group of themes, whatever it is, and it’s like, we want to see which one,

63 00:10:10.110 00:10:15.030 Shreya Chowdhury: Makes more sense for merchants, or which ones, like, help with their, like.

64 00:10:15.700 00:10:30.499 Shreya Chowdhury: whatever the goal is, so it could either be, like, helping them find the right apps and themes, or, like, helping them, like, just navigate the store and, like, improve their search. So usually for those ones, it is, like.

65 00:10:30.970 00:10:42.240 Shreya Chowdhury: we’ll do anywhere between a few hundred to a thousand for, like, a good sample size. I could see that changing for, like, different cases, like, if you want to do, like.

66 00:10:43.580 00:10:50.410 Shreya Chowdhury: Actually, yeah, let me stick to it. I feel like, generally, like, that’s what I think would be a good sample size for a lot of the things that we want to test for.

67 00:10:50.590 00:11:00.900 Shreya Chowdhury: If you want to learn, like, or if you want to learn and get insights about, like, larger, like, marketing campaigns or something like that that you want to roll out, you probably want

68 00:11:01.240 00:11:16.990 Shreya Chowdhury: like, I don’t know, if hypothetically you wanted to roll out a campaign, like, in a bunch of different major cities, you probably want even bigger of a sample size, ideally, than a few hundred. Yeah, I guess it just depends on, like, how… what you’re testing for.

69 00:11:17.900 00:11:22.190 Jake Nathan: Yeah, that makes sense. And then… On the metrics side, too.

70 00:11:22.320 00:11:26.030 Jake Nathan: Is it possible, like, could you give me an example of…

71 00:11:26.130 00:11:39.430 Jake Nathan: Like, once again, every business metric, you know, every company, it depends, like, the… it would, depend on what business metrics are important to you, but, like, is there an example that… of, metric

72 00:11:39.630 00:11:43.869 Jake Nathan: That, like, in your past experience, that you tried to measure that wasn’t

73 00:11:44.000 00:11:51.960 Jake Nathan: good, and actually this metric was more… was, like, what you were actually looking for? Like, is there… do you have any examples of metrics that aren’t… aren’t good?

74 00:11:52.120 00:11:53.610 Shreya Chowdhury: Yeah, so I think, like.

75 00:11:55.430 00:12:17.850 Shreya Chowdhury: I mean, the more data you have available to you, it’s always better. You don’t always get the opportunity to, like, dig through all of it, so it’s good to have the most relevant. I think generally there’s… that one’s also a little context-dependent, because there are metrics that might be valuable or important in different cases, but they’re not necessarily relevant to what you’re looking to test for. So, like.

76 00:12:18.100 00:12:28.120 Shreya Chowdhury: For example, like, a lot of times we’ll look at things like, clicks, like, click-through rates, like, installs, things like that, but…

77 00:12:30.490 00:12:46.030 Shreya Chowdhury: I think the one good thing to consider is, like, does that metric tie back to the customer value or business outcomes? So, like, for example, if you have a marketing team, and, you know, we run the A-B test, and we notice that in the,

78 00:12:46.030 00:12:58.450 Shreya Chowdhury: in the test group, like, there’s a huge spike in clicks or something, but nobody actually signs up for the product. That’s sort of a vanity win, because yes, it improved one of the metrics that

79 00:12:58.510 00:13:12.109 Shreya Chowdhury: generally you would care about, like, you do want higher click-through rates and more clicks, but it doesn’t really help convert users. So, in that case, like, if they’re not becoming paying customers, or if they’re not

80 00:13:12.110 00:13:19.619 Shreya Chowdhury: Yeah, like, signing up or purchasing the product or service, it doesn’t really move the business outcomes. So in that case, it’s like.

81 00:13:19.620 00:13:23.449 Shreya Chowdhury: Yeah, like, clicks is a great metric generally, but…

82 00:13:23.750 00:13:26.359 Shreya Chowdhury: It doesn’t really… if it’s not…

83 00:13:26.700 00:13:30.740 Shreya Chowdhury: Looking at the conversion, or if it’s not improving the conversion, then…

84 00:13:31.110 00:13:34.800 Shreya Chowdhury: it doesn’t really help us. It’s not… it’s not a win.

85 00:13:35.320 00:13:40.779 Shreya Chowdhury: But, yeah, so that… but that is context-dependent, because sometimes, like, again, if we’re thinking about just, like.

86 00:13:42.030 00:13:59.699 Shreya Chowdhury: a header or, like, changing something on a website, then, yeah, click would be good. Or click-through rate would be good, but if we’re thinking about, like, oh, we’re launching this specific campaign and hoping to improve, like, sign-ups, then you probably want to look at click-to-sign-up conversion there. Yeah.

87 00:13:59.700 00:14:09.619 Shreya Chowdhury: So I just think it doesn’t… like, a lot of times, I think in experiments, people will over-index on, like, let’s look at every single metric that we would usually care about, and it’s like, you shouldn’t.

88 00:14:10.940 00:14:14.479 Jake Nathan: Yeah, that’s… that’s powerful, and that… that makes sense. It seems like…

89 00:14:14.620 00:14:20.689 Jake Nathan: I mean, it’s so context-dependent, but kind of the closer you can get to like you’re saying, for…

90 00:14:20.840 00:14:26.659 Jake Nathan: instead of clicks, it would probably be closer to something that actually, like you’re saying, ties to the business value, and I’m…

91 00:14:26.800 00:14:36.869 Jake Nathan: In my experience, it seems like business value, you know, basically either as close as it can get to either, like, increasing revenue or decreasing costs, that seems to be most, like.

92 00:14:37.210 00:14:42.150 Jake Nathan: what you’re trying to go towards, generally. So, like, the closer it gets to that versus…

93 00:14:42.310 00:14:48.179 Jake Nathan: like, for me, for example, I feel like, in marketing a lot of times, it’s, like, always a balance of…

94 00:14:48.390 00:14:54.160 Jake Nathan: Like, top of funnel versus conver… Some people can…

95 00:14:54.270 00:15:01.699 Jake Nathan: Like, even if a video gets a million views, but it doesn’t convert anyone, then it’s like, which is the more important metric, the views, or the…

96 00:15:01.990 00:15:04.290 Jake Nathan: Or, the conversion, so…

97 00:15:04.290 00:15:09.660 Shreya Chowdhury: Yeah. Like, a lot of times, I think certain metrics are nice to have, but they’re not…

98 00:15:10.550 00:15:28.839 Shreya Chowdhury: it’s like a bonus. That shouldn’t be the deciding factor on whether or not you roll something out. Like, if we… I don’t know, if a business had been, like, declining in sales and declining in revenue generally, and we were just looking at clicks instead of conversions, like, that’s great, but it wouldn’t… doesn’t really help us.

99 00:15:29.590 00:15:42.440 Jake Nathan: Totally. Okay, that’s… that’s really helpful. I think, I mean, that could almost be an article in itself, but, I’m excited to keep going. So, yeah, I guess on that point, so, you said, okay, you need to actually have

100 00:15:42.520 00:15:52.650 Jake Nathan: conviction that this is something that you want to test, and that you’re… you want to have an opinion on. Like, how… what makes a good hypothesis, or what makes a good

101 00:15:52.890 00:15:56.160 Jake Nathan: Test in the first place, like, having conviction to run that test.

102 00:15:56.790 00:16:00.830 Shreya Chowdhury: Yeah, so I… I think that also goes back with, like.

103 00:16:01.310 00:16:11.290 Shreya Chowdhury: it goes back to, like, what… what do we reasonably have the data to test and run an experiment? Because a lot of times, like.

104 00:16:12.050 00:16:18.469 Shreya Chowdhury: again, like, you don’t need data to make every decision, and you shouldn’t even want it. Like, if you know something is broken, then, like.

105 00:16:18.560 00:16:35.990 Shreya Chowdhury: you don’t have to test for it, like, if you know you should fix it, like, let’s just fix it. So I think that it… part of it is, like, what is something that we don’t know about, and we… we have the ability to, like, get more robust in our decision.

106 00:16:36.370 00:16:45.619 Shreya Chowdhury: So, I would probably start with, like, a lot of big unknowns or assumptions that, like, just block growth or block certain decisions.

107 00:16:45.620 00:16:58.350 Shreya Chowdhury: And then once you have, like, just try and narrow the scope and quantify those hypotheses, like, as much as you can. So whatever you’re thinking about in plain English, like, how can you relate that to the data?

108 00:16:58.890 00:17:16.979 Shreya Chowdhury: And I think it’s also important to ask, like, if this hypothesis is true, like, how does it materially change our roadmap? Like, how… how would we move forward, like, in either decision? Like, or in… I feel like there’s… there’s 3 outcomes, right? Like, either it… your hypothesis, like.

109 00:17:17.190 00:17:28.059 Shreya Chowdhury: strongly suggest, yes, you should do this, strongly suggest, no, you shouldn’t do this, or it makes no impact. So I think it’s also good to have an idea of

110 00:17:28.530 00:17:32.019 Shreya Chowdhury: Depending on each outcome, how would we move forward?

111 00:17:32.460 00:17:35.919 Shreya Chowdhury: Yeah, so I feel like that…

112 00:17:37.280 00:17:43.210 Shreya Chowdhury: That’s, like, a good start to, like, deciding which hypotheses are worth testing.

113 00:17:43.540 00:17:59.859 Shreya Chowdhury: it’s also good to just, yeah, like, again, get very specific and intentional about what metrics you do want to improve. Like, I think, yeah, like we just mentioned, like, a lot of times, like, yeah, we can look at clicks, but we’re not looking at conversion rate, like…

114 00:17:59.930 00:18:05.020 Shreya Chowdhury: We can have a good conversion rate, but we don’t have a good retention rate, so it just is like…

115 00:18:05.270 00:18:06.140 Shreya Chowdhury: what…

116 00:18:06.330 00:18:13.719 Shreya Chowdhury: what is the thing that we’re trying to optimize for? And it’s hard to, like, optimize for all of those things in one experiment.

117 00:18:14.430 00:18:26.879 Shreya Chowdhury: So, if we’re looking at the wrong metric sometimes, like, it doesn’t… even if it looks like the needle’s moving in the right direction, if we’re looking at the wrong thing, it doesn’t necessarily constitute a success, I would say.

118 00:18:28.700 00:18:36.779 Shreya Chowdhury: And then, the other thing I would say is, like, avoiding pet project tests. So, like, we should filter out to, like.

119 00:18:38.450 00:18:57.789 Shreya Chowdhury: I think it’s good to find the right balance of robustness and resourcefulness. Like, you just don’t want to waste time and money over-indexing on experimentation. There is value in being able to say, like, hey, we’re making this decision, and we have a lot of data to back this up, but at the same time, it’s like, if we spent two weeks

120 00:18:58.290 00:19:07.459 Shreya Chowdhury: To get, you know, to run that experiment and get to that decision, whereas we could have made that 2 weeks ago, then, like, that experiment probably wasn’t worth running.

121 00:19:07.690 00:19:10.660 Shreya Chowdhury: Yeah.

122 00:19:12.200 00:19:17.770 Jake Nathan: That… that completely makes sense, and I… I think… One of the…

123 00:19:17.900 00:19:21.259 Jake Nathan: Things that resonated with me the most is, like, almost mapping out

124 00:19:22.440 00:19:31.280 Jake Nathan: How, how, like, depending on the data that we get back, like, how will we move forward if we actually,

125 00:19:31.410 00:19:44.399 Jake Nathan: have the data, and we… we get it back, and we run this test, like, how is our action gonna actually change? Because if you can’t figure that out, then it’s not… probably not worth running in the first place if you… if you can’t…

126 00:19:44.400 00:19:55.869 Jake Nathan: see how you’re gonna actually, like, how it’s gonna materially impact, what you’re gonna do. So… and you mentioned… another thing you mentioned was you don’t want to have too many metrics, like, do you…

127 00:19:55.870 00:20:05.230 Jake Nathan: Is it good to have one metric? Is it good to have… do you generally recommend, you know, one to three? Obviously, everything is so dependent, but, like.

128 00:20:05.330 00:20:07.370 Jake Nathan: What’s your advice there?

129 00:20:07.640 00:20:20.660 Shreya Chowdhury: I think… here’s the thing, in an ideal world, if you have, like, an infinite amount of time, resources, money, like, you test for as many of the things that you want, but it’s, like, one, there’s always gonna be constraints on budget.

130 00:20:20.660 00:20:28.870 Shreya Chowdhury: prioritization, all that kind of stuff, so it’s like, pick the ones that we absolutely care about. I feel like usually what I’ll try to do is, like.

131 00:20:29.220 00:20:40.469 Shreya Chowdhury: Again, like, yeah, the ones that are super important, and then you can throw in another couple of nice-to-haves that are like, okay, yeah, this is, like, what we expected to see that’s good, or, like, we…

132 00:20:40.470 00:20:50.379 Shreya Chowdhury: oh, this is, like, surprising, because sometimes, like, if you look through all of the metrics, like, it’s good to have additional stuff to, like, look under the hood. But yeah, generally, I think…

133 00:20:50.610 00:20:59.039 Shreya Chowdhury: It’s good to have a mix of some, like, overall counts, like, like, large counts, and then,

134 00:20:59.640 00:21:09.000 Shreya Chowdhury: like, some proportional rates. And… Yeah, I feel like…

135 00:21:09.600 00:21:21.029 Shreya Chowdhury: I usually try to look at, like, 5 different ones, but again, it really depends on, like, what we’re measuring for. So I feel like that one is also usually context-dependent,

136 00:21:22.440 00:21:23.350 Shreya Chowdhury: Yeah.

137 00:21:23.940 00:21:32.650 Jake Nathan: That… that makes sense. It seems like what you’re saying is, like, yeah, have the… kind of the main ones that you really care about, but then have other ones that, like you’re saying, are more…

138 00:21:32.980 00:21:36.550 Jake Nathan: Like, they might confirm or just,

139 00:21:36.960 00:21:40.130 Jake Nathan: Yeah, they’re more, like, nice-to-haves, but they’re not gonna…

140 00:21:40.300 00:21:44.370 Jake Nathan: Those aren’t gonna be the ones that you’re making decisions off of,

141 00:21:44.790 00:21:57.809 Jake Nathan: And, yeah, I can think of a lot of examples, just in my own experience of, like… like I said, some… some videos I’ve posted to sell something, I’ve gotten 10 million views on a video, and then gotten 50,000 views, and the one with 50,000 views actually got

142 00:21:57.900 00:22:14.909 Jake Nathan: more, more sales than the one with 10 million views, so it’s like, if you’re… depending on what you’re tracking, you know, I… someone could say, whoa, that was such a win that you did 10… yeah, so, and I guess, this… this isn’t a question on here, but, this is making me think…

143 00:22:15.090 00:22:20.150 Jake Nathan: like… Are there times… When you’d recommend, or…

144 00:22:20.490 00:22:39.370 Jake Nathan: it seems like unless you have all the data, if you don’t have the data, like, what do you do? Do you try to get more data to do the A-B test, or, like, is there a lot of times, like, it’s just… it’s… should teams really… like, do teams over A-B test? Like, should some of them just not A-B test until they know that they have enough data?

145 00:22:41.470 00:22:51.299 Shreya Chowdhury: I think generally, yes. Like, you’re just not gonna get a lot of value out of running a test where the data quality is poor.

146 00:22:51.520 00:22:57.329 Shreya Chowdhury: Again, if you have infinite time, resources, money, you can go ahead just for fun, but that…

147 00:22:57.550 00:22:59.540 Shreya Chowdhury: That really is just kind of, like.

148 00:22:59.820 00:23:09.659 Shreya Chowdhury: playing around, I don’t think it’s usually worth it. But yeah, a lot of times it’s hard, like, if you don’t have, like, good quality data to begin with, like.

149 00:23:09.730 00:23:23.160 Shreya Chowdhury: it’s hard to, like, draw insights from that generally, so you kind of just have to go based on your gut and, like, other assumptions, like, hey, this is, like, the right decision, this is how… the direction we want to move in, let’s just do it. Other times, like.

150 00:23:23.630 00:23:30.629 Shreya Chowdhury: You might have… Good quality data, but not the data that you want to test for, or, like.

151 00:23:31.380 00:23:34.709 Shreya Chowdhury: like, when you’re running an A-B test, you basically have to have

152 00:23:35.190 00:23:47.089 Shreya Chowdhury: like, you already have to have all the options, so you have to have, like, a test group, a control group, so you kind of have to build the product or feature anyways. So that’s the part that it’s, like, a lot, like.

153 00:23:47.120 00:23:56.030 Shreya Chowdhury: I don’t know, I feel like sometimes people think it’s like, should we do this? And it’s like, to run an A-B test, sometimes you have to do it, roll it out, and then decide, do we want to roll it out

154 00:23:56.100 00:24:01.569 Shreya Chowdhury: like, more largely, or pulled back. So that’s where it’s kind of like…

155 00:24:02.140 00:24:09.790 Shreya Chowdhury: it takes a lot more effort to confirm or deny a hypothesis you might have already.

156 00:24:09.950 00:24:16.720 Shreya Chowdhury: And then that’s… I think it’s, like, worth being careful there to, like, avoid any biases, because you don’t want to, like…

157 00:24:16.920 00:24:23.700 Shreya Chowdhury: go into an experiment and, like, really, like, hope for one outcome, or, like, look for one outcome.

158 00:24:25.450 00:24:27.240 Jake Nathan: Yeah, do you… how do you…

159 00:24:28.150 00:24:31.720 Jake Nathan: we’re all pretty biased. How do you avoid bias?

160 00:24:32.460 00:24:36.890 Shreya Chowdhury: I think it’s…

161 00:24:37.220 00:24:54.899 Shreya Chowdhury: A lot of times, like, I’ll be kind of, like, in the past, I’ve definitely worked on tests where it’s like, oh, we know that this is something that we want to roll out or want to do, so we’d really love to see this metric improve. So, like, I’ll kind of just be pigeonhole, like, digging through, like, segmenting all of the data.

162 00:24:54.900 00:25:00.910 Shreya Chowdhury: And seeing, like, oh, did this make a difference, like, anywhere? Like, it would be really good to know. And sometimes, like.

163 00:25:01.210 00:25:04.060 Shreya Chowdhury: that’s just not the case, and if it’s not, then, like.

164 00:25:04.410 00:25:11.309 Shreya Chowdhury: I mean, the data’s not always gonna be nice or, like, agree with me, there’s nothing I can do about that, so I think it’s also important to, like.

165 00:25:11.660 00:25:20.389 Shreya Chowdhury: kind of mentally prepare yourself that, like, hey, this might not go the way we want it to go, and yeah, you kind of just have to be prepared for that sometimes.

166 00:25:21.420 00:25:22.010 Shreya Chowdhury: Yeah.

167 00:25:22.010 00:25:23.270 Jake Nathan: Yeah, that…

168 00:25:23.500 00:25:34.130 Jake Nathan: That makes sense, and we’ve… I feel like we’ve already touched on this, but do you have any other thoughts? Like, are there any metrics you’ve seen teams obsess over that don’t matter that much? We kind of just talked about, like.

169 00:25:34.240 00:25:39.400 Jake Nathan: That, but is there anything else you want to add, or if not, like, we can keep going?

170 00:25:39.950 00:25:41.600 Shreya Chowdhury: Yeah,

171 00:25:42.960 00:25:57.910 Shreya Chowdhury: Maybe not, like, this is a slightly different question, but I think it’s just worth noting, and maybe we’ll get to this, I think one of the questions was, like, what makes a good A-B test, but it’s also worth, like, when you have a lot of the data, and, like, especially if

172 00:25:59.240 00:26:15.189 Shreya Chowdhury: it doesn’t… nothing sticks out to you initially, like, this is why I think it’s, like, worth, like, having a smaller… like, a narrower scope of things to look at, because a lot of times, like, you… it’s like a Russian nesting doll, like, you keep digging deeper and deeper into the same metrics.

173 00:26:15.330 00:26:17.659 Shreya Chowdhury: I think it’s always good to take, like.

174 00:26:18.260 00:26:22.480 Shreya Chowdhury: A look under the hood and go into all of it with a magnifying glass, because…

175 00:26:22.660 00:26:26.269 Shreya Chowdhury: There was this one time where we ran an experiment.

176 00:26:26.620 00:26:38.899 Shreya Chowdhury: and we had the two groups, and at, like, at a glance, at a surface level, there was no difference between the two groups. But then, like, we segmented the data, and we’re like, what if we just look at, like.

177 00:26:39.290 00:26:44.950 Shreya Chowdhury: what if we, like, segment the groups and look at, like.

178 00:26:45.050 00:27:05.760 Shreya Chowdhury: you know, people on a specific device, and people in, like, a specific, like, revenue bar, so we segmented them by, like, if they were mid-market, if they were really large, if they were just starting out. And then we saw that, like, okay, of these, like, specific groups, like, it did make a difference, even if it initially didn’t. And that’s something that, like.

179 00:27:05.770 00:27:13.789 Shreya Chowdhury: you wouldn’t think to do, or wouldn’t necessarily, like, see at a glance. So I think a lot of times, like.

180 00:27:14.970 00:27:22.920 Shreya Chowdhury: if… The teams obsess over, like, we want to improve this metric across, like, the entire population.

181 00:27:23.090 00:27:26.540 Shreya Chowdhury: That…

182 00:27:27.050 00:27:39.059 Shreya Chowdhury: like, you’re… like, yes, I understand wanting that, but it’s like, hey, maybe, like, you can still roll this out, but you just want to cater the solution to, like, this one specific group that, like, it seems to really work for, or something.

183 00:27:40.000 00:27:52.089 Jake Nathan: Mmm. Yeah, so it’s… it’s not just… like, yeah, it’s segmenting your demographic somehow. It might, like, maybe even go deeper on, like, what demographic is…

184 00:27:52.460 00:27:55.790 Jake Nathan: You want to… improve.

185 00:27:56.360 00:27:56.750 Shreya Chowdhury: Yeah.

186 00:27:56.750 00:28:03.259 Jake Nathan: The experience for, versus just, hey, across the board, we want every single person to…

187 00:28:03.450 00:28:09.039 Jake Nathan: We want to see a… yeah, that makes sense to me. Yeah. Okay, that’s good to know, and…

188 00:28:09.190 00:28:18.659 Jake Nathan: So, let’s say you run an A-B test, you have enough data, you have the right metrics, the business metrics, you get the result.

189 00:28:18.980 00:28:29.419 Jake Nathan: now, like you said, like, you want it to actually be something that has the, that can influence the product, like, how… how do you and your teams

190 00:28:29.580 00:28:43.320 Jake Nathan: how do you actually communicate? Like, how do you make it so it actually does influence the product? Like, what… how are you advocating for yourself, advocating for the team of, like, who are you talking to to actually make it happen?

191 00:28:43.770 00:28:47.049 Shreya Chowdhury: Yeah, so I think that comes back to, like, a…

192 00:28:47.710 00:28:55.360 Shreya Chowdhury: a culture of documenting the results, so, like, both the wins and failures, and you want to have a shared knowledge base, because I think…

193 00:28:55.440 00:29:06.279 Shreya Chowdhury: a lot of times, yeah, there is also value in knowing that, like, hey, this did nothing. So, like, we don’t have to go down this road again and, like, try and do things similar to this.

194 00:29:06.300 00:29:22.740 Shreya Chowdhury: And… yeah, like, the other thing is just being, like, really good and thorough about documentation and results. So, like, you have your post-experiment, like, analyses and, like, you know, recommendations based on those. Make sure that insights are visible and…

195 00:29:22.740 00:29:32.470 Shreya Chowdhury: comprehensive to all types of stakeholders, like, you want to make sure even people from a non-tech background understand, like, hey, even if you don’t understand, like, the statistical analysis here, like.

196 00:29:32.810 00:29:34.619 Shreya Chowdhury: Let’s just understand, like.

197 00:29:35.030 00:29:42.230 Shreya Chowdhury: what numbers are good, what numbers are bad, and, like, what does this mean for us? I think…

198 00:29:42.830 00:29:56.609 Shreya Chowdhury: it’s, yeah, like, it’s also good to celebrate when an experiment kills a bad idea early, like, that also, like, it just reinforces the culture of, like, oh, like.

199 00:29:57.380 00:30:06.330 Shreya Chowdhury: this is part of… it’s just one of the pitfalls of, like, running an experiment. Like, it would have been great if it reinforced the thing that we wanted, but at the same time, this is good to know.

200 00:30:06.600 00:30:11.789 Shreya Chowdhury: Yeah, I think as long as people are, like, receptive to…

201 00:30:11.980 00:30:17.830 Shreya Chowdhury: The data and what it’s pointing to, and you have good quality data, like… it…

202 00:30:18.010 00:30:21.060 Shreya Chowdhury: And you document it and present it well, like…

203 00:30:21.260 00:30:24.849 Shreya Chowdhury: That helps a lot in influencing, like, the product roadmaps.

204 00:30:27.100 00:30:29.639 Jake Nathan: Listening to that, it seems like…

205 00:30:29.930 00:30:44.799 Jake Nathan: especially when you’re saying that, like, celebrate when something kills a bad idea, really. I could logically see why that would make sense. Then I think, like, okay, if that idea is tied to a human, then, like, have you… have you had any experience where you’ve had to…

206 00:30:45.060 00:30:56.889 Jake Nathan: deal with, like, the emotional human side of things, of like, hey, even if the data says this is actually not good, you have to, like, somehow communicate that in a way to someone who had the idea, or…

207 00:30:57.010 00:30:58.020 Jake Nathan: Yeah, I’m.

208 00:30:58.020 00:31:14.969 Shreya Chowdhury: I think I’ve always, like, been fortunate enough to work with professionals. A lot of times, like, the big decision was above my pay grade, so it would end up in, like, one or more situations. One of them could be, like, oh, you know what? Even though they, like.

209 00:31:14.970 00:31:21.729 Shreya Chowdhury: the data doesn’t support this, we’re firm in our decision that we want to do this anyway. We’re gonna do it. And it’s like, okay, like…

210 00:31:22.060 00:31:28.590 Shreya Chowdhury: that is… like, if that’s your prerogative, then, like, whatever, I’m just here to present the data.

211 00:31:28.630 00:31:31.450 Shreya Chowdhury: I think sometimes it’s about, like.

212 00:31:31.470 00:31:45.820 Shreya Chowdhury: Yeah, like, it’s unfortunate if it’s a sunk cost thing, because you’ve had a lot of people work on the product feature, looking forward to rolling it out, like, getting emotionally invested, and like, you’ve already put in the man hours, and it’s like, oh, like, when you realize, oh shit, like.

213 00:31:45.820 00:31:52.199 Shreya Chowdhury: This either isn’t worth doing, it’s not gonna change anything, or, like, it’ll actually be bad, we can’t do it anymore, like…

214 00:31:52.940 00:32:04.589 Shreya Chowdhury: Yeah, it’s a bummer, but, I mean… I think at that point, like, when you know it’s, like, when you know it’s not gonna be a bad decision, and we already went through the effort of running the experiment, and…

215 00:32:05.150 00:32:14.490 Shreya Chowdhury: Yeah, if it’s the alternative situation where we’re, like, our decision hinges on this experiment, then, like, yeah, you have to kind of mourn the loss there.

216 00:32:15.740 00:32:16.680 Shreya Chowdhury: Yeah.

217 00:32:17.860 00:32:21.010 Jake Nathan: That makes sense, and back to what you were saying about, like.

218 00:32:21.250 00:32:32.219 Jake Nathan: So, having a shared knowledge base, that’s really important. I’m just trying to, like, ultimately with this piece of content, like, I kind of want to give someone who’s reading this, hey, if I’m just starting out.

219 00:32:32.220 00:32:42.880 Jake Nathan: like, what… what’s kind of just, like, spoon-fed to me? What’s, like, what’s a first step? And so, like, a knowledge base, for sure, seems… like, a shared knowledge base seems to be a first step.

220 00:32:42.930 00:32:53.630 Jake Nathan: do you usually have some sort of meeting cadence where you’re like, okay, before we run the experiment, we’re gonna meet during it, we’re gonna meet after? Like, what’s kind of a meeting cadence that you do?

221 00:32:53.630 00:33:08.200 Shreya Chowdhury: I think it’s really good to me before, just to have, like, a brainstorming session and, like, translate, like, your opinions and your, like, hypotheses and just all your gut feelings from plain English into data.

222 00:33:09.010 00:33:11.399 Shreya Chowdhury: Yeah, so, like, when you know, like, yeah, from…

223 00:33:12.400 00:33:21.689 Shreya Chowdhury: yeah, so, like, what you’re feeling or thinking to English, to data, because a lot of times it’s hard to… like, if there’s something that’s gen… generally, like, immeasurable, it’s like.

224 00:33:22.560 00:33:34.340 Shreya Chowdhury: there’s nothing we can do about it, but for everything that can be measured, let’s try and get as precise as we can. And let’s think about, like, what does success look like to us? Like, what do we want to optimize for?

225 00:33:34.340 00:33:45.819 Shreya Chowdhury: what different segments and groups do we want to look at? Like, how big do we want our sample size to be? How long do we want to run the experiment? Like, and if we get result X, Y, and Z, like.

226 00:33:46.300 00:33:59.759 Shreya Chowdhury: what would we do, as a result of that? And then, a lot of those you can leave, like, somewhat iffy until you get the results of the experiment, and then you, like, get more, like,

227 00:34:00.110 00:34:10.310 Shreya Chowdhury: like, you have more conviction in your opinions. So yeah, I think it’s definitely good to meet before and get really, really aligned on,

228 00:34:11.290 00:34:13.309 Shreya Chowdhury: How you want to run the experiment.

229 00:34:13.489 00:34:24.699 Shreya Chowdhury: I think it’s less important to be meeting throughout. It’s good to just monitor, like, the results, just to make sure that, like, nothing breaks, or it’s, like…

230 00:34:25.639 00:34:39.070 Shreya Chowdhury: make sure that it didn’t largely drop in any North Star metrics, because then you kind of want to rectify that more immediately. But as long as everything is mostly smooth, then I think the pre and the post are what’s most important.

231 00:34:39.070 00:34:52.070 Shreya Chowdhury: Because post-experiment, like, you really want to go through all that data with a fine-tooth comb and be like, okay, yeah, since we… now we have, like, all of these robust results, like, let’s make sure that,

232 00:34:52.900 00:34:56.240 Shreya Chowdhury: I don’t know, you squeeze the most juice out of it that you can.

233 00:34:57.090 00:35:08.509 Jake Nathan: That makes sense, and I know we’re coming up on time. I guess one… one last question, and it actually kind of relates to our first question, is how… how long do you usually run A-B tests?

234 00:35:08.980 00:35:16.750 Shreya Chowdhury: Yeah, so I’ve had ones that I’ve run for 2 weeks, and then I’ve had ones that have run for a month.

235 00:35:16.970 00:35:22.039 Shreya Chowdhury: I think that one also just depends on how much

236 00:35:22.250 00:35:31.830 Shreya Chowdhury: time you have, and how much data you want to collect reasonably. And then also sometimes, like, you have to, like.

237 00:35:32.030 00:35:50.119 Shreya Chowdhury: sometimes we would have to stagger the experiments a little bit, because if we’re experiment… like, if we’re running two experiments simultaneously, like, we want to make sure that there’s no overlap, like, we don’t want certain people in, like, both groups, and then we can’t, like, control for the results,

238 00:35:50.220 00:35:56.230 Shreya Chowdhury: So yeah, it just depends on your schedule, how many other tests you’re running, yeah.

239 00:35:57.420 00:36:00.250 Jake Nathan: That makes sense. Would you say generally, like.

240 00:36:00.600 00:36:04.910 Jake Nathan: Are most of the tests between, kind of, that 2 weeks to 1 month?

241 00:36:05.140 00:36:12.060 Shreya Chowdhury: I’ve never run one that’s less than 2 weeks, that’s, like, the minimum. I…

242 00:36:12.440 00:36:15.530 Shreya Chowdhury: I’m not gonna say without a doubt that

243 00:36:15.830 00:36:24.410 Shreya Chowdhury: you can’t go more or less. I’m sure there’s, like, you can find cases for all of the above,

244 00:36:24.620 00:36:30.860 Shreya Chowdhury: But I think generally a good rule of thumb is, like, two-ish weeks. Gotcha.

245 00:36:31.170 00:36:35.410 Jake Nathan: Okay, cool. Well, I know you got this meeting with Robert right now.

246 00:36:35.410 00:36:58.670 Jake Nathan: Thank you, this was… this was really helpful. I, I’d love to, see what I can do with this, just to start, and, I’ll send you back, like, you and Utam back, like, a draft, sometime next week, and then if we feel like we need to beef it up more, and like, ask a few more questions, then we can definitely do that. I just want to be respectful of your time. But yeah, this was really helpful, and thank you, like.

247 00:36:58.910 00:37:07.170 Jake Nathan: This is one of the… I’ve had, like, a few conversations so far, and this is, like, by far the most, like, tactical, so I appreciate you, like, trying to, like, really dig down and be tactical.

248 00:37:07.170 00:37:23.620 Shreya Chowdhury: Yeah, I’m really happy to hear that. I’m sure there’s certain things that, like, were worth mentioning that I missed. I know we’re both, like, a little bit strapped for time today, but, yeah, send over whatever draft, and then if I have any other thoughts or, like, answers to questions, I can also type it up and send it over to you, too.

249 00:37:23.990 00:37:30.490 Jake Nathan: Yeah, that sounds good. Awesome. Well, yeah, it’s super nice to meet you, and thanks again for making time, and yeah, I’m sure we’ll…

250 00:37:30.750 00:37:37.490 Jake Nathan: see more of each other, since we’re both… Yeah, definitely. Yeah, stay in touch regardless. Okay, that sounds good. Alright, Sherry, talk to you later.

251 00:37:37.490 00:37:38.170 Shreya Chowdhury: Bye.

252 00:37:38.370 00:37:38.970 Jake Nathan: Bye.