Meeting Title: Mathieu <> Robert Interview Session Date: 2025-05-21 Meeting participants: Mathieu Dumoulin, Hannah Wang, Robert Tseng


WEBVTT

1 00:03:31.730 00:03:32.950 Hannah Wang: Hello!

2 00:04:06.150 00:04:08.200 Hannah Wang: Hello! Can you hear me?

3 00:04:12.410 00:04:12.885 Robert Tseng: Hello!

4 00:04:13.890 00:04:14.880 Hannah Wang: Hello!

5 00:04:15.500 00:04:16.510 Robert Tseng: Hello! Hello!

6 00:04:21.279 00:04:22.390 Hannah Wang: Oh, what’s happening?

7 00:04:22.390 00:04:24.960 Robert Tseng: Hold on. I’ve got to change to headphones.

8 00:04:25.320 00:04:25.789 Hannah Wang: Okay.

9 00:04:53.750 00:04:54.700 Robert Tseng: Where are you?

10 00:04:55.570 00:05:02.590 Hannah Wang: I’m at the Lims place ’cause they’re cat sitting Ellison’s cat, but

11 00:05:02.740 00:05:05.350 Hannah Wang: they can’t watch her obviously, for

12 00:05:05.550 00:05:08.490 Hannah Wang: 2 weeks. No, I’m cat sitting.

13 00:05:08.680 00:05:14.190 Hannah Wang: Oh, cats are so scary. Okay? Oh, okay.

14 00:05:14.190 00:05:15.590 Robert Tseng: Ellison’s cat.

15 00:05:16.210 00:05:29.690 Hannah Wang: Yeah. Well, her mom was watching her, but her mom went to China because her grandpa is not doing well, so there’s no one to watch her cat, because Ellison’s in Australia so

16 00:05:30.694 00:05:35.069 Hannah Wang: she asked me if I can cat sit, but then Eric’s allergic.

17 00:05:35.290 00:05:42.439 Hannah Wang: So I told them. Maybe Nathan and cat can do it, but they’re in Asia for 2 weeks, so

18 00:05:43.330 00:05:47.610 Hannah Wang: there’s like a bunch of people checking in on her while they’re gone.

19 00:05:47.990 00:05:50.200 Hannah Wang: Me one of them so.

20 00:05:50.200 00:05:51.110 Robert Tseng: I see.

21 00:05:52.060 00:05:58.629 Hannah Wang: Yeah, she’s cats are like terrifying. I don’t know how to interact with them. Honestly.

22 00:05:58.980 00:06:01.219 Robert Tseng: Yeah, she doesn’t look like a very cuddly cat.

23 00:06:01.450 00:06:03.409 Hannah Wang: No, she is, but she like

24 00:06:05.720 00:06:07.739 Hannah Wang: I just. I’m just scared of cats.

25 00:06:08.680 00:06:09.390 Robert Tseng: Yeah.

26 00:06:09.540 00:06:14.310 Hannah Wang: Because their claws are like they just come out whenever and they scratch you.

27 00:06:14.610 00:06:20.035 Robert Tseng: Yeah, dude, I have no idea how this is gonna go.

28 00:06:20.290 00:06:28.520 Hannah Wang: Oh, it’s okay. We’ll just see. I mean, I don’t think we can clip anything from it, anyway. So

29 00:06:30.100 00:06:32.020 Hannah Wang: that should be fine.

30 00:06:33.160 00:06:34.739 Robert Tseng: Oh, we’re not gonna get anything out of it.

31 00:06:35.220 00:06:39.180 Hannah Wang: Oh, I I don’t know. I his email was like, thanks.

32 00:06:39.430 00:06:40.860 Hannah Wang: I was like, you’re welcome.

33 00:06:42.570 00:06:43.930 Hannah Wang: We’ll see.

34 00:06:43.930 00:06:46.020 Robert Tseng: That’s 500 trial run.

35 00:06:46.330 00:06:47.050 Hannah Wang: Yeah.

36 00:06:47.050 00:06:53.180 Robert Tseng: I have done 0 prep. Other than well, other than the outline thing which should be good, I’ll take it, and I’ll do the next one better.

37 00:06:54.000 00:06:57.320 Hannah Wang: I mean, I feel like you’re pretty good at talking to people so.

38 00:06:58.000 00:06:59.080 Robert Tseng: Yeah,

39 00:07:00.490 00:07:02.380 Hannah Wang: Okay.

40 00:07:05.440 00:07:06.970 Hannah Wang: what else?

41 00:07:09.510 00:07:11.529 Hannah Wang: Yeah, I think I should be fine.

42 00:07:12.210 00:07:12.930 Robert Tseng: Nice.

43 00:07:12.930 00:07:13.660 Hannah Wang: Oh!

44 00:07:15.180 00:07:16.849 Robert Tseng: He’s kind of an eccentric dude.

45 00:07:17.760 00:07:18.360 Hannah Wang: Oh, really.

46 00:07:18.360 00:07:19.650 Robert Tseng: Kind of weird.

47 00:07:19.790 00:07:21.449 Hannah Wang: Where are you gonna be on the call?

48 00:07:22.010 00:07:27.059 Hannah Wang: I’ll be video off so that the gallery view is just you two. I’ll be here.

49 00:07:27.390 00:07:27.970 Robert Tseng: Okay.

50 00:07:28.510 00:07:29.960 Robert Tseng: Oh, that makes me feel better.

51 00:07:30.770 00:07:32.966 Hannah Wang: Does it? Okay, that’s good. I’ll just be lurking.

52 00:07:34.830 00:07:35.640 Robert Tseng: Yeah.

53 00:07:48.680 00:07:51.130 Robert Tseng: I can’t wait for vacation.

54 00:07:51.950 00:07:53.030 Hannah Wang: Yeah.

55 00:07:54.390 00:07:59.950 Robert Tseng: Dave Mulatti is gonna be in New York for like 5 hours. I was trying to see him tomorrow, but I don’t think it’s gonna work out.

56 00:08:00.260 00:08:01.920 Hannah Wang: Always like layovering.

57 00:08:01.920 00:08:08.360 Robert Tseng: Yeah in Jersey. If I had my car and I was still living there, it’d be easier. But now

58 00:08:08.860 00:08:10.719 Robert Tseng: I don’t know if I’m gonna go over there.

59 00:08:11.130 00:08:14.320 Hannah Wang: Yeah, it’s far, transit-wise.

60 00:08:14.500 00:08:18.320 Robert Tseng: Yeah, I am.

61 00:08:21.075 00:08:27.290 Robert Tseng: I hope he. I hope he is like 5 min late. That would give me some more downtime.

62 00:08:27.680 00:08:28.580 Hannah Wang: Oh, yeah.

63 00:08:30.820 00:08:32.610 Robert Tseng: I actually wouldn’t mind if he canceled.

64 00:08:33.870 00:08:35.860 Hannah Wang: We’ll see. We’ll see what happens.

65 00:08:44.810 00:08:46.790 Robert Tseng: It’s like cold and rainy here.

66 00:08:47.240 00:08:49.500 Hannah Wang: Oh, really. Oh, yeah, I see your window.

67 00:08:49.680 00:08:50.330 Robert Tseng: Yeah.

68 00:08:50.540 00:08:54.220 Hannah Wang: It’s like the opposite here. It’s hot,

69 00:08:56.410 00:09:01.550 Robert Tseng: What does the Lims place look like now? The setting anything looks different. I mean, I see the couch is like somewhere else.

70 00:09:02.220 00:09:03.200 Hannah Wang: There’s a shrine of.

71 00:09:03.200 00:09:04.760 Robert Tseng: Oh, my goodness!

72 00:09:04.760 00:09:05.780 Hannah Wang: Plushies.

73 00:09:05.780 00:09:06.510 Robert Tseng: Yeah.

74 00:09:07.470 00:09:11.000 Hannah Wang: There’s the cat just chilling there.

75 00:09:11.230 00:09:13.450 Robert Tseng: Looks scared. But yes.

76 00:09:17.330 00:09:18.410 Hannah Wang: Okay.

77 00:09:19.330 00:09:20.280 Hannah Wang: Oh.

78 00:09:20.670 00:09:22.790 Robert Tseng: Oh, another cat!

79 00:09:25.030 00:09:25.629 Mathieu Dumoulin: Hello! There!

80 00:09:26.230 00:09:27.280 Robert Tseng: Hey! Mateur.

81 00:09:28.350 00:09:29.860 Mathieu Dumoulin: Please call me Matthew.

82 00:09:31.560 00:09:36.670 Robert Tseng: Yeah, you prefer Matthew Matthew Matte. I don’t even know Matthew.

83 00:09:36.670 00:09:40.347 Mathieu Dumoulin: Matthew Matt. And and yeah, that’s

84 00:09:41.280 00:09:43.239 Mathieu Dumoulin: it’s a good. It’s a good choice.

85 00:09:43.700 00:09:47.130 Robert Tseng: Go, may I?

86 00:09:47.130 00:09:49.839 Robert Tseng: Or are you in the I’m assuming you’re at home? Not the office.

87 00:09:49.930 00:09:52.249 Mathieu Dumoulin: That’s right. I’m at home today.

88 00:09:52.250 00:09:52.940 Robert Tseng: Okay.

89 00:09:53.630 00:09:59.670 Robert Tseng: And you, I saw a cat, and you’re we’re we’re just. She was just showing me a cat that she’s cat sitting for. So.

90 00:09:59.670 00:10:00.350 Hannah Wang: Yeah.

91 00:10:00.350 00:10:02.900 Mathieu Dumoulin: Yes, I have a big Maine coon.

92 00:10:03.200 00:10:06.670 Hannah Wang: Oh, I love Maine coons. Yeah.

93 00:10:06.670 00:10:07.810 Mathieu Dumoulin: Very fluffy.

94 00:10:07.810 00:10:14.860 Hannah Wang: Yeah, the fluffy. Yeah, I don’t know what breed she is, but she’s like a white, not so fluffy cat. But

95 00:10:15.080 00:10:24.242 Hannah Wang: it’s my 1st time cat sitting. And I don’t really know how to interact with them. Cause I’m a dog person. So it’s been interesting cat sitting. Yeah.

96 00:10:24.570 00:10:29.329 Robert Tseng: Oh, that’s the thing with cats, you you do not need to interact with them.

97 00:10:29.330 00:10:29.670 Hannah Wang: That is.

98 00:10:29.670 00:10:31.610 Robert Tseng: They interact with you? Yes.

99 00:10:31.610 00:10:31.980 Hannah Wang: Yes.

100 00:10:31.980 00:10:34.060 Mathieu Dumoulin: If they, if they choose to do so.

101 00:10:34.060 00:10:36.280 Robert Tseng: Yeah, nice.

102 00:10:37.000 00:10:37.440 Hannah Wang: Awesome.

103 00:10:37.440 00:10:48.330 Robert Tseng: Cool. Well, thanks for taking time to jump on this call, Matthew. I think, yeah, I was excited to follow up on our conversation. We met at the RudderStack event. And

104 00:10:48.950 00:11:04.320 Robert Tseng: yeah, I mean, I know that you’re you’re kind of like cautious. Don’t want. We’re not gonna like, cut it up and post it a bunch all over social media. But you know, figured since we already outlined out some stuff that I’d be interested to ask you about, anyway, that’d be a good yeah, this opportunity for us to

105 00:11:05.070 00:11:25.360 Robert Tseng: go through an interview, anyway, and then kind of more of like a discussion honestly. And then, yeah, I mean, well, we may. I guess I I don’t know if this is communicated by asking if she could. Ask you, you know, if before we publish or release anything, we would obviously run it by you, maybe it’d be like Doc, like.

106 00:11:25.690 00:11:29.280 Robert Tseng: I don’t know, like an article, or like some anonymized thing, or whatever.

107 00:11:30.700 00:11:32.140 Mathieu Dumoulin: Yeah, yeah, that’s fine with me.

108 00:11:32.140 00:11:33.680 Robert Tseng: Okay, cool.

109 00:11:34.050 00:11:43.239 Robert Tseng: Alright. Well, yeah. I mean, that’s kind of the the intro. I’ll just kind of get get into it then. But yeah, I think, particularly since you’re you know the

110 00:11:44.210 00:11:51.710 Robert Tseng: I, I think you’re the only data person at Coast Fay, right? And you built it from built that data, the data team from scratch. We were.

111 00:11:51.710 00:11:52.720 Mathieu Dumoulin: Yeah, that’s right.

112 00:11:52.720 00:12:11.471 Robert Tseng: Yeah, we would be. It’d be cool to talk about the challenges of building kind of 0 to one data functions. I think just to recap, like, you know, I run Brainforge. Brainforge is, like, a data and AI consultancy. We work with companies and help them stand up data stacks. So at RudderStack, I was representing Eden Health, which was,

113 00:12:11.850 00:12:23.919 Robert Tseng: yeah, I mean, I’m we. We basically are their data team. I have like 3 of my team staff on on that client. So we’re not a 1 man shop, which is why what you’re doing is very impressive, because

114 00:12:24.030 00:12:33.539 Robert Tseng: pretty much take on the scope of 4 people under under one under one brain, and I’d love to just kind of unpack that more through our conversation today.

115 00:12:33.939 00:12:57.959 Robert Tseng: But I was like clicking into your LinkedIn, and like looking in at your background. I know. I mean, I remember you told me you were in consulting. So I didn’t know that you spent so many years in Japan. And kind of like led various like software consulting data roles. So just such an interesting nontraditional data background that I’d love to just hear you talk about your career in your own words, about like kind of how that prepared you for your current role now. And

116 00:12:58.470 00:13:00.159 Robert Tseng: yeah, maybe we’ll just start there.

117 00:13:02.700 00:13:06.680 Mathieu Dumoulin: Yes, nontraditional definitely resonates.

118 00:13:07.770 00:13:15.960 Mathieu Dumoulin: I have spent quite a bit of time over the past 15 years in the data and analytics

119 00:13:15.970 00:13:45.079 Mathieu Dumoulin: space. I came at it from the direction of data science rather than traditional analytics, or more recently, learning Spark and calling oneself a data engineer. I would fit the more traditional description of the role, meaning that I have a traditional software engineering background. And then I added data science

120 00:13:45.160 00:13:52.960 Mathieu Dumoulin: on top of that and eventually settled on the data space as an area where I felt

121 00:13:53.607 00:13:58.260 Mathieu Dumoulin: re, you know, was more consistent with my preference for back end

122 00:13:58.980 00:14:07.410 Mathieu Dumoulin: back end software engineering and felt that there were sufficiently

123 00:14:08.180 00:14:29.486 Mathieu Dumoulin: meaty challenges to solve that I thought were technically interesting. I moved across a lot of different stacks. I got really, really deep into Hadoop back in the day. Then I got really deep into the native cloud

124 00:14:30.300 00:14:41.795 Mathieu Dumoulin: capabilities of AWS and Azure, for example, leveraging things like EMR, Glue, and Azure Data Factory, and

125 00:14:43.080 00:14:46.989 Mathieu Dumoulin: more recently, I’ve I’ve much more. Found a lot of

126 00:14:47.560 00:15:06.000 Mathieu Dumoulin: interest in working on the modern data stack. And to kind of close the loop on what I’m doing at Coast, I think the thinking around the modern data stack really fit well with the

127 00:15:07.050 00:15:21.455 Mathieu Dumoulin: context of that organization, the challenges that it needed to solve. And it’s it’s indeed proven. It’s been proven out by allowing me as a

128 00:15:22.390 00:15:50.032 Mathieu Dumoulin: you know, one-man team to marshal the resources of the organization. There’s a lot of data going on at Coast. It’s not because they didn’t necessarily have a data-focused person that there’s not quite a lot of data work that happens all across both the software engineering product side as well as the analytics side. But everything was happening in

129 00:15:50.960 00:15:54.099 Mathieu Dumoulin: a second class, right? Data as a second class citizen.

130 00:15:54.818 00:16:06.539 Mathieu Dumoulin: I think that my role has been to cut through complexity and bring that data-as-a-first-class-citizen perspective

131 00:16:07.090 00:16:20.100 Mathieu Dumoulin: and an accelerated path to value by leveraging. You know, and applying the the ideas around the modern data stack like out of the the.

132 00:16:20.880 00:16:39.019 Mathieu Dumoulin: you know. Once I got into the company and and understood the problems better. And again based on on you know, a background that has allowed me to touch a lot of different stacks and design them and build them this was the the solution that I thought was was the right one. And

133 00:16:39.280 00:16:45.729 Mathieu Dumoulin: after a year I think that that we have a lot to show for based on those early decisions.

134 00:16:46.450 00:17:15.479 Robert Tseng: Cool. Yeah, I mean, thanks for the overview. I think it totally makes more sense, kind of how you ended up just, like, really getting the hang of the modern data stack, being from a traditional software engineering background, because a lot of the practices in the modern data stack are basically trying to mirror software engineering practices, right? So totally makes sense. I didn’t know you were, like, doing data science like 10, 15 years ago. So I’d like to just kinda go back a bit to that. And you know,

135 00:17:15.609 00:17:38.370 Robert Tseng: back then, 15 years ago, data, science wasn’t even really called data science, you know. I mean, I would. I would go off as far to say that it was mostly, you know, it’s more like a folks that were in insurance or finance were already kind of like doing this type of, you know, predictive modeling. It wasn’t. Didn’t have the sexy name data science attached to it that now all these other

136 00:17:38.725 00:17:57.230 Robert Tseng: Companies have popularized since you know, social media and entertainment companies started to bring kind of poach those folks over to their companies. So I’m curious, like, how you kind of made your foray into specifically data science. And then, yeah, like, what? Where? Where did that kind of start for you?

137 00:17:58.083 00:18:01.646 Mathieu Dumoulin: I ended up doing a

138 00:18:02.440 00:18:12.490 Mathieu Dumoulin: master’s degree in computer science a little bit later than most people at around age 30, and

139 00:18:12.650 00:18:16.209 Mathieu Dumoulin: and pretty much in the 1st

140 00:18:16.460 00:18:24.680 Mathieu Dumoulin: 2 weeks of the 1st semester of that master’s degree, I had picked some classes on machine learning.

141 00:18:25.590 00:18:36.810 Mathieu Dumoulin: And I immediately understood that this was a technology that would change the world like that was

142 00:18:37.872 00:18:44.380 Mathieu Dumoulin: incredibly clear. I would say, just like somebody using ChatGPT

143 00:18:44.861 00:18:51.369 Mathieu Dumoulin: for the 1st time and going holy smokes. This thing is is real and it’s spectacular, you know.

144 00:18:52.190 00:18:52.985 Mathieu Dumoulin: And

145 00:18:53.960 00:19:12.870 Mathieu Dumoulin: you’re quite correct at the time. This was a little bit just a little bit before data science became known as data science, even though it had been practiced for quite a while earlier, and the topic of my research was at the intersection of

146 00:19:12.870 00:19:32.870 Mathieu Dumoulin: machine learning and what at the time was called big data engineering. And I created a text processing pipeline using Hadoop and all kinds of libraries and tools of that ecosystem to

147 00:19:33.500 00:19:56.150 Mathieu Dumoulin: classify government RFPs to create a personalized pipeline of RFPs for consultancy directors of the consultancy that was sponsoring my master’s degree program.

148 00:19:57.030 00:20:07.110 Mathieu Dumoulin: And it worked. And then the machine learning challenge was very interesting the the big data part was was

149 00:20:08.120 00:20:34.081 Mathieu Dumoulin: overwhelming from beginning to end. But we eventually delivered something that actually worked against all odds. We’re talking about Hortonworks 1.0, 1.1, you know. It’s like at a time where all of these things were crappy Chef and Puppet scripts to install things that didn’t work half the time. It was an incredibly cool project,

150 00:20:34.500 00:20:39.960 Mathieu Dumoulin: but but certainly that when you come out of a project like that

151 00:20:40.870 00:20:45.015 Mathieu Dumoulin: you you have battle scars. And you you understand that?

152 00:20:46.197 00:21:04.512 Mathieu Dumoulin: basically, you can build anything if you put your mind to it, you know. And that, that kind of 0 to one designing a a a platform for for data processing that has some kind of real visible business utility

153 00:21:05.090 00:21:08.770 Mathieu Dumoulin: ends up being the driving

154 00:21:09.560 00:21:20.239 Mathieu Dumoulin: force of of you know what’s been driving my choices as I’ve changed from role to role over my career. Because that’s that’s what I’m deeply passionate about. You know.

155 00:21:20.830 00:21:28.419 Robert Tseng: Got it. Yeah, I mean, let me kind of just like respond to some of that. So I mean, you built this 1st kind of big.

156 00:21:28.910 00:21:42.899 Robert Tseng: you know, data pipeline using Hadoop, and I mean, my understanding of it is basically what data lakes do for us automatically now. But like you had to kind of do all the parallelizing kind of, and figuring out that logic beforehand.

157 00:21:42.900 00:21:43.670 Mathieu Dumoulin: Different.

158 00:21:43.670 00:22:13.370 Robert Tseng: And then you stitch together all of these different data primitives from like, you know, the Apache days where, like, all these different projects were kind of like things that you had to go and, piece by piece, kind of put together. You know, I may date myself. I started my data career and, like, Snowflake already existed, you know, and data lakes were already there. So I’ve never had to build that out end to end. And I’m curious. I mean, you did that. That’s truly 0 to one, you know, like back then

159 00:22:13.910 00:22:33.660 Robert Tseng: and as you’re doing it in the modern data stack. Now, like, I like to kind of make make that contrast really, really clear for for folks like, you know what? What’s the difference between doing the 0 to one build the 1st time that you did it, using all of those systems back then versus kind of like what 0 to one looks like with the modern data stack. Today.

160 00:22:34.342 00:22:40.500 Mathieu Dumoulin: I think that the most visible difference is what

161 00:22:41.328 00:22:45.031 Mathieu Dumoulin: is the pace of development and the

162 00:22:46.280 00:23:10.140 Mathieu Dumoulin: quality of the developer experience in order to and and also the the emergence of of a body of well understood, well known best practices to guide us. Yeah. So the tooling is better. Our understanding of what what we should do and what what the end State looks like is better

163 00:23:10.760 00:23:17.769 Mathieu Dumoulin: in the early Hadoop days, when I started that project, I did not know what the end

164 00:23:17.830 00:23:39.918 Mathieu Dumoulin: looked like. I did not understand the tools well. Every time you went into a new, a different tool, the API and the thinking behind it was radically different. I was porting. So I designed a Python data science machine learning algorithm to do the classification. Fine.

165 00:23:40.370 00:23:59.390 Mathieu Dumoulin: Porting that over into the Hadoop world, where I needed to actually get the data to it, required learning how to use the file system, because the file system of Hadoop was special and different and quite, quite complicated in and of itself. Then you need to do some data cleaning. Well, oh.

166 00:24:00.380 00:24:06.744 Mathieu Dumoulin: what? What do I do? So you could do it in Java. And just like you said the algorithms, you kind of have to do everything yourself.

167 00:24:07.090 00:24:24.080 Mathieu Dumoulin: We decided to use Pig instead, which was a tool that was invented at Yahoo, I believe, to create that kind of abstraction over the raw MapReduce code that was very difficult to understand for anything significant, anyways. And so I had to learn that.

168 00:24:24.540 00:24:54.450 Mathieu Dumoulin: Then, okay, I have the data. Now I have to apply machine learning. So there’s another library for that. It’s Mahout. Now I have to jump into Java programming. So now I’ve gone from Python to Bash to Pig Latin, which was the language of Pig, to Java. And all the while you’re kind of just catching up to everything. And, you know, you don’t even have the luxury at the beginning of knowing what the end looks like. Today we start with well established

169 00:24:54.450 00:25:06.579 Mathieu Dumoulin: patterns around best practices of data engineering. We know that it’s going to end up in a data warehouse. We understand that it should be in some kind of

170 00:25:06.580 00:25:31.940 Mathieu Dumoulin: star schema data modeling style. We understand that the best way to do those data transformations is probably dbt, so we stay quite a while in SQL world. Data integration is a click, click, click task. We happen to be using Airbyte, but you could just as well be using Fivetran, Matillion, like, God knows what, right? RudderStack,

171 00:25:31.940 00:25:54.472 Mathieu Dumoulin: something else, right? And so that’s pretty easy as well, you know. We need to do some kind of data quality validation. Well, you can buy a tool like Monte Carlo. You can use open source tools as well to do something similar. Maybe with Soda.io. I don’t know, right?

172 00:25:55.260 00:26:05.080 Mathieu Dumoulin: especially if you’re in the Snowflake ecosystem. We’re particularly in the Redshift ecosystem, which is basically everything in Snowflake, just slightly crappier

173 00:26:06.880 00:26:11.710 Mathieu Dumoulin: depend. The tools are slightly less compatible, and and you know

174 00:26:12.370 00:26:18.990 Robert Tseng: Yeah, there’s a reason why somebody went over at at Snowflake and created Snowflake. I do not believe.

175 00:26:19.170 00:26:24.199 Mathieu Dumoulin: Making a better Redshift was a terrific insight. I think

176 00:26:24.350 00:26:26.950 Mathieu Dumoulin: everybody understands that would be a good thing.

177 00:26:27.140 00:26:27.975 Robert Tseng: Yeah.

178 00:26:28.810 00:26:53.830 Mathieu Dumoulin: Such as it is right that modern data stack understanding from the start. These are going to be my pieces. This is how they’re going to work. I ingest over here. It’s going to end up over there. My transformations are dbt over here. All of this is understood right from the beginning, and I am minimizing code as much as possible.

179 00:26:53.830 00:27:16.626 Mathieu Dumoulin: so that actually, in the 1st 2 or 3 months we were able to deliver something like 150 models to production. Were they at the level of what a bank would create like? No, but we’re a startup and and you know, getting stuff out the door is really really important. And that initial

180 00:27:17.150 00:27:33.288 Mathieu Dumoulin: you know, insight about leveraging these tools that work well together. Allowed us to deliver quite quite a lot of value in the 1st even one or 2 quarters

181 00:27:34.280 00:28:03.210 Mathieu Dumoulin: through adding a bunch of new data sources that were not there before, such as DynamoDB, Salesforce, Stripe data, and a bunch of other things, and the company was able to transition away from a data virtualization based solution, essentially grabbing all of their analytics, which were already quite sophisticated when I showed up.

182 00:28:03.910 00:28:04.480 Robert Tseng: Hmm.

183 00:28:05.350 00:28:08.589 Mathieu Dumoulin: Essentially straight off of the production DB,

184 00:28:08.800 00:28:32.389 Mathieu Dumoulin: well, an analytics copy of it. But the schema was the operational DB schema, you know, and they were kind of complaining everything’s slow. So we kind of delivered quite a lot of value by creating these good star schemas, easy transformations, and all of that was doable by one person, because I did not work alone.

185 00:28:32.690 00:28:36.289 Mathieu Dumoulin: And this is super important. When you’re when you’re talking about

186 00:28:36.620 00:28:47.340 Mathieu Dumoulin: being successful at 0 to one is to not work in a dark office somewhere where nobody knows what the heck you’re doing.

187 00:28:48.110 00:28:48.970 Mathieu Dumoulin: Yeah.

188 00:28:49.110 00:29:12.579 Mathieu Dumoulin: it’s always tempting to do that because you’re so focused on doing certain things. And people don’t tend to question what you’re doing very much, because nobody understands what you’re doing, anyways, but quite the opposite. I reached out. I listened. What do people want? What are the data sets that people are looking for? Where’s the demand in terms of

189 00:29:12.580 00:29:36.649 Mathieu Dumoulin: of data. What are the issues that we need to overcome? And connecting new data sources was probably number one. Performance was number 2. So by bringing in Airbyte and dbt very, very quickly, we were able to deliver on those things, and I think that this speed and productivity, combined with

190 00:29:37.360 00:30:05.649 Mathieu Dumoulin: like I said, reaching out to the rest of the organization, meant that in a couple of teams, again, very early on, I got people to buy into what I was pushing, what I was recommending, and most of the dbt transformation work was not done by me. It was actually done by team members who use the tool to create the models that they need.

191 00:30:06.275 00:30:14.310 Mathieu Dumoulin: So, for example, the risk team, that is our most sophisticated data user, a very critical function in a fintech,

192 00:30:15.294 00:30:31.009 Mathieu Dumoulin: they migrated all of what they were doing off of the data virtualization solution and into dbt with a lot of materialized tables, and they did that, I would say, 80 or 90% on their own.

193 00:30:31.220 00:30:31.980 Robert Tseng: Wow!

194 00:30:31.980 00:31:01.229 Mathieu Dumoulin: Right. So I set up the tools I taught them. I taught them how to fish, and they fished right? So I do think that technology is is a critical you know, success factor for this kind of work. But you know the consulting aspect of you know, working with the people around you and getting people to buy in and join into the work.

195 00:31:01.660 00:31:21.190 Mathieu Dumoulin: is the real lever. I would say, like, the fulcrum is the technology, but the lever is the people, basically. And that’s how we were able to do as much as we’ve done, with me focusing as much as possible on the foundational piece of it.

196 00:31:21.610 00:31:43.549 Mathieu Dumoulin: I would say 30, 40% of my time. And another 30% or so of my time has been, yes, working on direct use cases. I’ve written some software to grab data out of certain APIs that we needed, for example, for credit bureau scoring data and things like that, that were

197 00:31:43.550 00:32:04.190 Mathieu Dumoulin: not doable with just click, click, unfortunately, because we needed to combine data from our databases to make the right API calls. And so you’re not just sucking stuff out. You’re actually sucking it out intelligently, so that intelligence needed to come from some custom code, and

198 00:32:04.750 00:32:15.849 Mathieu Dumoulin: that that delivery was initially quite, quite interesting, since I hadn’t coded in like 3 years. But we got there. And yeah, it’s running like a rock every day.
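
The custom extraction described above, where data already in the company's databases decides which API calls are worth making, could look roughly like the following. This is only an illustrative sketch, not the code that was actually written: the `Customer` record, the 30-day staleness rule, the batch size, and the request payload shape are all assumptions invented for the example, and no real credit bureau API is called.

```python
from dataclasses import dataclass
from datetime import date, timedelta
from typing import List, Optional


@dataclass(frozen=True)
class Customer:
    """Hypothetical row read from an internal database."""
    customer_id: str
    is_active: bool
    last_score_pull: Optional[date]  # None means a score was never pulled


def customers_needing_refresh(
    customers: List[Customer], today: date, max_age_days: int = 30
) -> List[str]:
    """Return IDs of active customers whose score is missing or stale.

    The 30-day default is an assumed business rule, not one from the interview.
    """
    cutoff = today - timedelta(days=max_age_days)
    return [
        c.customer_id
        for c in customers
        if c.is_active
        and (c.last_score_pull is None or c.last_score_pull < cutoff)
    ]


def build_requests(customer_ids: List[str], batch_size: int = 2) -> List[dict]:
    """Group IDs into batched request payloads (hypothetical payload shape)."""
    return [
        {"customer_ids": customer_ids[i : i + batch_size]}
        for i in range(0, len(customer_ids), batch_size)
    ]
```

The point of the design is that the filter runs before any request is built, so the extraction only asks the external API for records the internal data says are needed, rather than pulling everything.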

199 00:32:16.760 00:32:20.331 Robert Tseng: Amazing. I mean, that was such a I mean. Thank you for that. It was.

200 00:32:20.570 00:32:23.570 Mathieu Dumoulin: Sorry I’m rambling. I’m sorry.

201 00:32:23.570 00:32:41.399 Robert Tseng: No, no, all good, you you you read my mind, and it kind of just answered like 3 or 3 questions at a time. But I’m gonna go back and I’m gonna drill into a couple of things, because I think I think some of it deserves more attention. So like, I think one thing I want to park on is like, well, yeah, we’re talking. I think data teams often think about like

202 00:32:41.910 00:32:57.439 Robert Tseng: like velocity, like, how much can we get done over time like it’s not, you know. I think, especially now that there’s more and more parallels with like running sprint cycles with data teams and doing a kind of you know, do it? You know, even though on the project management side, like, you’re, we’re we’re we’re doing like

203 00:32:57.600 00:33:11.350 Robert Tseng: we’re we’re managing data projects very similarly to software engineering products at this point, and you, something that you described that was like incredible was, yeah. The way that you have kind of entered this organ, your your current team and

204 00:33:11.770 00:33:35.410 Robert Tseng: been able to achieve like 150 plus models in 3 months or something. That’s insane. Like, I think that’s a speed that most people cannot fathom. You know, I think, from our own experience, like, sometimes we enter into teams that have already tried to do, you know, some sort of dbt model. Maybe they already have a couple hundred models out there, but it’s like a, you know, 5 plus year old organization.

205 00:33:35.410 00:33:43.449 Robert Tseng: And we’re having to do a lot of consolidation. But yeah, even from when we’re just getting started, you know. I think maybe the initial

206 00:33:43.480 00:34:00.779 Robert Tseng: model, like, I don’t know, like 30 to 50 in like in the 1st couple of months, is already kind of I would say is pretty fast. And and and the the volume that you’ve been able to to put out clearly, you said is because you’ve taught your team how to fish, and they’re the ones that are also coming building the models.

207 00:34:00.780 00:34:01.660 Mathieu Dumoulin: Exactly.

208 00:34:01.956 00:34:04.919 Robert Tseng: So obviously, it’s not just you like being completely superhuman.

209 00:34:04.920 00:34:06.549 Mathieu Dumoulin: How? How would I know? Right.

210 00:34:06.550 00:34:06.910 Robert Tseng: Yeah.

211 00:34:06.910 00:34:07.790 Mathieu Dumoulin: Showed up.

212 00:34:08.230 00:34:08.710 Robert Tseng: Yeah.

213 00:34:08.710 00:34:09.230 Mathieu Dumoulin: So.

214 00:34:09.230 00:34:18.942 Robert Tseng: Yeah. But I think what would be interesting, I think you mentioned part of that. I’m curious, if we broke it down, some of those models are just, you know,

215 00:34:19.440 00:34:36.829 Robert Tseng: app analytics, a replica of the production database. And then you’re basically doing some refactoring, building some schemas to make the querying a bit faster. So maybe a lot of it is kind of just retrofitting what was already there. And then a bunch of it was new integrations as well. Could you kind of speak to like

216 00:34:36.830 00:34:54.269 Robert Tseng: how much of that was you identifying, okay, this is the core data from the application that I need to go and sprint after, make sure that it’s ready for reporting, and then also how you prioritized new data sources? Could you speak to that a bit more, on what trade-offs you made there and how you prioritized?

217 00:34:54.429 00:35:06.149 Mathieu Dumoulin: I do think that I’ve spent quite a few years in consulting prior to this role, and while it’s easy to

218 00:35:06.839 00:35:15.295 Mathieu Dumoulin: heap all kinds of scorn on the PowerPoint jockeys from top-tier consultancies,

219 00:35:16.399 00:35:41.149 Mathieu Dumoulin: there are a lot of incredible capabilities, incredible skills to learn from the consulting toolkit that you develop by working for a while in these kinds of organizations, and one of these that has been instrumental to my everyday

220 00:35:41.149 00:35:47.359 Mathieu Dumoulin: work in a startup is the ability to communicate with

221 00:35:47.719 00:35:58.439 Mathieu Dumoulin: business people in a way that is on their terms, that they can understand, and that can resonate with them,

222 00:36:00.029 00:36:19.549 Mathieu Dumoulin: to enter a dialogue with them, basically, where I’m not there to impose a tech agenda. I’m here to understand what the priorities of the organization are with respect to data and analytics and make them happen in the most efficient possible way.

223 00:36:19.959 00:36:23.449 Mathieu Dumoulin: So that was basically my first month, just

224 00:36:23.609 00:36:43.729 Mathieu Dumoulin: talking to people, listening to what pain points they had. And I kept hearing, like, why can’t I access Salesforce data? Why can’t I access Salesforce data? Why can’t I access DynamoDB data? I’ve been waiting for this for like 3 months. I’ve been waiting for this for like 4 months. So these were things that really came out.

225 00:36:44.099 00:36:46.141 Mathieu Dumoulin: The second thing that

226 00:36:46.739 00:37:04.369 Mathieu Dumoulin: I find to be really like a superpower is the capability to integrate all of these different things and prioritize them effectively, and have the pattern recognition to do that with confidence.

227 00:37:05.733 00:37:23.110 Mathieu Dumoulin: And, you know, to know what the big stuff is, to know what the small stuff is, and to not worry too much about the small stuff. I don’t suffer from decision paralysis very much, and

228 00:37:23.750 00:37:30.389 Mathieu Dumoulin: not only that, but also, again going back to communication, it’s one thing to communicate with

229 00:37:30.960 00:37:52.449 Mathieu Dumoulin: business people, but it’s also to build allies within the engineering organization as well, right? Which means that with every decision that I’ve made, I’ve made an important point of then going back to my CTO, going back to the head of platform, and

230 00:37:52.450 00:38:05.990 Mathieu Dumoulin: sharing my plans with them and saying, you know, does this resonate? Does this make sense? How is this aligned with what we’re doing? How do we integrate our roadmaps so that we help each other, and not just kind of fight over things? And so

231 00:38:05.990 00:38:11.299 Mathieu Dumoulin: that kind of 1, 2, 3, where, one, you’re kind of understanding

232 00:38:11.790 00:38:39.399 Mathieu Dumoulin: the problems that exist, you have a strategy to fix it that converts into a real plan with essentially a roadmap that you could then put into a Jira board and just burn down, basically. But along the way, going back to the technical side leadership and going, this is what I plan to do. This is how I plan to do it.

233 00:38:39.940 00:39:08.199 Mathieu Dumoulin: Do you have any concerns? What is God? Because when you’re when you’re coming in like, I’ve gotten like 6 new tools integrated into the organization in about a year. That’s a lot. That’s not normal. But I’ve done this process like a million times. And so for me, it is absolutely normal that when I’m considering to buy a tool, I have

234 00:39:08.470 00:39:11.780 Mathieu Dumoulin: a process in terms of

235 00:39:12.360 00:39:35.459 Mathieu Dumoulin: what is the tool for? Why do we need it? What pain points is it going to solve? What are the alternatives? How much is it gonna cost? Why is it the right thing? Writing all these things down, aligning with people in the process, so that by the time I’m formally asking for it, it’s kind of a done deal, you know. So

236 00:39:35.840 00:39:47.490 Mathieu Dumoulin: all of these things together kind of work and reinforce each other, so that the time that I actually spend on

237 00:39:47.600 00:39:55.170 Mathieu Dumoulin: delivery is well, much less than it otherwise would be. But I know it’s the right stuff.

238 00:39:56.440 00:40:01.750 Mathieu Dumoulin: and the people around me agree. This is the right stuff.

239 00:40:03.070 00:40:10.930 Mathieu Dumoulin: And so what time I do spend is spent on high-leverage solutions, and

240 00:40:11.280 00:40:21.229 Mathieu Dumoulin: high-leverage solutions in, as much as possible, a high-productivity way, because we’re choosing our tools smartly, you know.

241 00:40:21.230 00:40:21.680 Robert Tseng: Yeah.

242 00:40:22.940 00:40:25.179 Mathieu Dumoulin: At least that’s been the strategy so far.

243 00:40:25.900 00:40:50.480 Robert Tseng: Got it. Yeah, I mean, I wanna kind of click into that. So obviously, tooling has changed a lot over the past even 5 years. You know, I still remember what dbt first was, kind of incubated within an agency, right? It wasn’t even really a product company, and it spun off. And now they’ve kind of set the standard for what that whole data modeling, semantic layer part of the pipeline looks like.

244 00:40:50.871 00:40:58.709 Robert Tseng: Yeah, I guess, you know, obviously you’ve probably gone through many waves of different tooling. So

245 00:40:58.850 00:41:10.279 Robert Tseng: right now you may feel like, you know, the package of the 6 tools is what best serves the modern data stack needs today. And maybe you’ve had a few reps in implementing that,

246 00:41:10.545 00:41:29.158 Robert Tseng: and it helps that you’re a systems thinker. You’re able to not just buy one tool off the shelf at a time and not be able to abstract and talk about its impact. And so I think it’s helpful that you’re able to kind of do some of that consolidation. And you’re a very well-informed shopper compared to most folks, right?

247 00:41:29.690 00:41:30.060 Mathieu Dumoulin: Vendors.

248 00:41:30.060 00:41:30.410 Robert Tseng: So.

249 00:41:30.410 00:41:31.940 Mathieu Dumoulin: Talking with me.

250 00:41:32.450 00:41:45.370 Robert Tseng: Yeah, vendors are afraid to talk to you. But yeah, I think that’s a really important skill for data folks, data leaders that are kind of assumed to be

251 00:41:45.370 00:42:13.830 Robert Tseng: like the trusted expert on a bunch of things that they’ve probably never seen. But it’s kind of like, yeah, I’ve not seen exactly that before, because it came up in the past 2 years, but I’ve probably worked with something similar, and I could figure it out. How do you stay on top of things? What are the core pieces that you think are important to the stack, and when are you willing to entertain the next innovation within each of those components?

252 00:42:14.060 00:42:21.419 Mathieu Dumoulin: That’s a really good question. I think that for me, I learn a lot by doing.

253 00:42:21.600 00:42:28.939 Mathieu Dumoulin: I don’t have the luxury of spending 20 hours a week

254 00:42:29.355 00:42:35.410 Mathieu Dumoulin: reading all kinds of stuff left and right, and watching videos left and right. I wish I did, but I don’t.

255 00:42:36.260 00:42:47.189 Mathieu Dumoulin: I do try to stay on top of 3, 4, 5 of the best data blogs that I can find. For example,

256 00:42:47.430 00:42:54.089 Mathieu Dumoulin: Modern Data 101, data guy, I mean, quote unquote, the big ones, the best ones. So.

257 00:42:54.090 00:42:54.410 Robert Tseng: Yeah.

258 00:42:54.410 00:43:04.150 Mathieu Dumoulin: Try to keep on top of what these guys publish. I do try to buy and read the latest, most important

259 00:43:05.058 00:43:18.949 Mathieu Dumoulin: technical books on topics. So I try to be strategic about it. Again, I don’t intend to spend 20 hours of my week every single week reading up on stuff. So yeah, it has to be a couple of hours, no more.

260 00:43:19.350 00:43:19.860 Robert Tseng: Yeah.

261 00:43:20.735 00:43:31.469 Mathieu Dumoulin: I learn a lot by doing, and I think that I’ve been in the business for long enough at this point, where I feel like I have a pretty good nose for

262 00:43:31.650 00:43:33.960 Mathieu Dumoulin: what matters.

263 00:43:35.701 00:43:42.180 Mathieu Dumoulin: If you take LLMs, for example, I was fashionably late to the party.

264 00:43:43.270 00:43:44.165 Mathieu Dumoulin: Because

265 00:43:45.510 00:43:59.020 Mathieu Dumoulin: for the first several years it was just a bullshit thing that consultants used to sell a lot of consulting services. That’s fine for consultants, and I was in a consultancy, and we sold all kinds of, you know,

266 00:43:59.466 00:44:21.419 Mathieu Dumoulin: advisory for companies to kind of figure these things out while the consultants were figuring it out themselves, and that was fine, you know. I even built some internal projects in a team that leveraged LLMs to try to generate data pipelines automatically.

267 00:44:21.660 00:44:22.020 Robert Tseng: Hmm.

268 00:44:22.020 00:44:23.500 Mathieu Dumoulin: It kind of worked, which was.

269 00:44:23.500 00:44:24.450 Robert Tseng: It’s really interesting.

270 00:44:24.450 00:44:27.890 Mathieu Dumoulin: Yeah, yeah, was quite, quite interesting.

271 00:44:29.041 00:44:34.650 Mathieu Dumoulin: But “kind of works” and “works” are two different things. And,

272 00:44:34.650 00:44:35.010 Robert Tseng: Yeah.

273 00:44:35.010 00:44:51.660 Mathieu Dumoulin: taking that as an example, my nose was saying: this is tantalizing, but not there. Time invested now will be wasted. So I read about it, but I did not go all in at all. And

274 00:44:51.770 00:44:57.660 Mathieu Dumoulin: now it’s evolved to a place where it’s part of my everyday workflow.

275 00:44:58.030 00:45:05.400 Mathieu Dumoulin: Yeah, I code with it, and it’s an extraordinary tool to help me

276 00:45:05.720 00:45:18.790 Mathieu Dumoulin: make the most of my experience and knowledge, and to deliver on things that I otherwise could never hope to do. A really interesting example was helping

277 00:45:19.570 00:45:26.340 Mathieu Dumoulin: a PM. So she wants to explore some data.

278 00:45:27.827 00:45:39.299 Mathieu Dumoulin: Scratch here, scratch there. And it turns out at least part of the data is in the form of a SQL Server 2019 backup.

279 00:45:40.330 00:45:41.190 Robert Tseng: Okay.

280 00:45:41.560 00:45:45.139 Mathieu Dumoulin: I’ve literally never used SQL Server in my life.

281 00:45:45.140 00:45:45.460 Robert Tseng: Thank you.

282 00:45:45.460 00:46:03.590 Mathieu Dumoulin: No T-SQL, but I was able to cobble together, extract the data and create a viewer on it, and then extract only the data of interest that my coworker needed.

283 00:46:04.071 00:46:20.909 Mathieu Dumoulin: And the whole time, I do not know T-SQL. I don’t know stored procedures very well. I just don’t come from that part of the data engineering world. And I was able to deliver it in a few hours, which was just absolutely nuts. I mean, I have no.

284 00:46:20.910 00:46:21.260 Robert Tseng: Yeah.

285 00:46:21.260 00:46:43.039 Mathieu Dumoulin: I just delivered, but I have no idea how exactly I delivered what I delivered. But between what the LLMs were giving me, and, you know, the error messages that I saw, and wiggle a little bit over here, wiggle a little bit over there, use a little bit of intelligence and experience, and

286 00:46:43.660 00:46:48.850 Mathieu Dumoulin: you know, you get the capability of extracting vins out of this,

287 00:46:49.010 00:46:51.743 Mathieu Dumoulin: you know, ridiculously retarded database

288 00:46:53.170 00:46:59.190 Mathieu Dumoulin: and it works. And and I just, you know, wow, wow! I I mean.

289 00:46:59.190 00:46:59.510 Robert Tseng: Yeah.

290 00:46:59.510 00:47:12.269 Mathieu Dumoulin: 5, 10 years ago, this would have been a one-week project, or we would have had to hire a consultant, because I don’t know SQL Server. Nobody else knows SQL Server. We’re not a Microsoft shop at all.

291 00:47:12.650 00:47:13.010 Robert Tseng: Yeah.

292 00:47:13.010 00:47:18.700 Mathieu Dumoulin: You know, I’ve literally never used Microsoft

293 00:47:19.550 00:47:23.859 Mathieu Dumoulin: after my first internship in university, like, just never.

294 00:47:24.603 00:47:37.219 Mathieu Dumoulin: I’ve been in the Linux world the whole time. That’s the tradition that I grew up in, so to speak, you know, as a professional. And so this kind of thing is really transformative, really interesting. And,

295 00:47:37.220 00:47:54.920 Mathieu Dumoulin: you know, as I’m getting to understand the technology better and better, there are all these kinds of interesting things that come out of it that I’m now open to and think about as I consider what the next step is. Another good example is data contracts.

296 00:47:55.400 00:48:03.709 Mathieu Dumoulin: I’ve been interested in them for a while, but now it’s starting to become a little bit more interesting, a little bit more real.

297 00:48:03.900 00:48:04.280 Robert Tseng: Hmm.

298 00:48:04.280 00:48:11.870 Mathieu Dumoulin: And we might start dipping our toes in that direction towards the end of this year or next year.

299 00:48:13.072 00:48:16.239 Mathieu Dumoulin: To integrate that into our data platform.

300 00:48:16.950 00:48:26.769 Mathieu Dumoulin: So to me, learning is really all about a kind of inverted pyramid where I try to be

301 00:48:27.360 00:48:30.210 Mathieu Dumoulin: pretty on top of the biggest trends.

302 00:48:30.440 00:48:40.550 Mathieu Dumoulin: then have a semi-informed perspective where my nose tells me this kind of thing might have potential,

303 00:48:41.953 00:48:48.840 Mathieu Dumoulin: and gradually know more and more about the things that I feel are closer and closer to what I really

304 00:48:49.240 00:48:50.600 Mathieu Dumoulin: care about

305 00:48:50.750 00:49:13.430 Mathieu Dumoulin: until it gets above the bar where I want to really get hands-on with it and use it. But I’m really not interested in playing with things for the sake of playing. I’m just not in my twenties anymore. So that’s just not something that I do. So by the time I’m hands-on with something, it’s because there’s a value.

306 00:49:13.530 00:49:30.589 Mathieu Dumoulin: There’s something valuable to the business that’s at the other end of the rainbow, and it’s worth my effort to actually do. At which point, you know, I’ll bridge the gap, and that’s what I’ve done with dbt. I had no hands-on experience with dbt before actually using it with this company.

307 00:49:31.070 00:49:41.930 Mathieu Dumoulin: I think that in the 5 or 6 months that I’ve been using it, I went from raw beginner to maybe a high intermediate, and that’s been good enough to deliver on what I need to deliver.

308 00:49:43.917 00:50:09.580 Mathieu Dumoulin: But prior to selecting it, I had gone deep enough in terms of experience with teams that I led using it to deliver things to clients, reading about it, reading technical blogs about it, to the point where I felt like the risk was basically 0, right? So it’s really about spending as little time as possible to get just enough of an idea

309 00:50:09.640 00:50:18.104 Mathieu Dumoulin: to know what I really want to get a little bit deeper on. I get a little bit deeper on it, and it’s really

310 00:50:19.097 00:50:34.849 Mathieu Dumoulin: whittling down that, you know, constantly growing pipeline of new tools, of new technologies, of new approaches, of new patterns to a set of things that I feel comfortable using.

311 00:50:36.441 00:50:44.250 Mathieu Dumoulin: Which I will. I mean, obviously the subset that I actually choose for a particular company. I mean the

312 00:50:44.320 00:51:11.629 Mathieu Dumoulin: the amount of tools that I’ve been exposed to is fairly high. I mean, I’ve worked on all of the major cloud platforms. I’ve built things on premise. I’ve built things in the cloud. I’ve built things with open source. I’ve built things with enterprise tools. And so, you know, at the end, you’re really circling a subset of what you know and saying, you know, this is the right subset for this organization. And consulting,

313 00:51:12.090 00:51:12.936 Mathieu Dumoulin: you know.

314 00:51:13.690 00:51:28.680 Mathieu Dumoulin: for me has been a fantastic way to get this accelerated exposure to a lot of different organizations, a lot of different data contexts.

315 00:51:28.750 00:51:41.949 Mathieu Dumoulin: It’s always the same thing in terms of, well, every company sucks at governance, every company sucks with single sources of truth, every organization sucks with their data models. Everybody, like,

316 00:51:41.950 00:51:59.240 Mathieu Dumoulin: the problems are always the same. And what we want out of it is always the same. We want beautiful dashboards. We want data models that are super fast. We want data pipelines that are simple and powerful. We want everything to cost very little performance, to be very high, like, we know what we want.

317 00:51:59.490 00:52:01.500 Robert Tseng: We just don’t know how to get there.

318 00:52:02.160 00:52:07.249 Mathieu Dumoulin: And the other thing is, everybody has the same problems, but

319 00:52:07.950 00:52:24.389 Mathieu Dumoulin: experiences those problems from their own context which is always different every single organization I’ve ever seen. They all have the same problems. They all want to go to the same place, but they’re always starting from a different context.

320 00:52:24.900 00:52:32.050 Mathieu Dumoulin: And therefore, for the data solution architect, if that’s a thing, and maybe it is, maybe it isn’t,

321 00:52:32.890 00:52:42.850 Mathieu Dumoulin: that’s the problem to solve, you know, when you’re a solution architect: given this particular context,

322 00:52:43.410 00:52:47.830 Mathieu Dumoulin: how do we get to a next step that makes sense?

323 00:52:48.250 00:52:48.860 Robert Tseng: Yeah.

324 00:52:48.860 00:53:06.329 Mathieu Dumoulin: Right, that will have some combination of delivery of value, timeline, necessary investments, technical difficulty to solve. And you’re trying to minimize and maximize across all of these different things for that particular organization. And

325 00:53:06.490 00:53:12.569 Mathieu Dumoulin: I think in the context of the average startup at

326 00:53:12.680 00:53:20.579 Mathieu Dumoulin: Series A to Series B, the modern data stack, the so-called modern data stack, is a really, really good answer.

327 00:53:21.840 00:53:22.819 Mathieu Dumoulin: For example.

328 00:53:23.440 00:53:28.450 Robert Tseng: Sure. Yeah, I mean, once again, I think really good insight into

329 00:53:29.060 00:53:31.509 Robert Tseng: yeah, like, how do you stay

330 00:53:31.970 00:53:40.409 Robert Tseng: on top of kind of like the trends in the data space. But, like curate, your own subset of like, what’s helpful for you, you know.

331 00:53:41.270 00:53:57.060 Robert Tseng: you know, in general, the universal problems across the data stack. And obviously, there’s a little bit of context dependency on what tool selection you end up going with. But clearly your consulting experience was able to help you pattern match and develop a feel

332 00:53:57.410 00:54:04.770 Robert Tseng: for what subset of tools is best, or, you know, is better applied to each context.

333 00:54:05.189 00:54:27.080 Robert Tseng: But yeah, I mean, if we could go a little longer, I’d love to just kind of wrap it up and ask a couple of questions, going back towards what we were saying. I think particularly because, you know, you are, yeah, really a solo data team in a post-Series A startup.

334 00:54:27.570 00:54:54.279 Robert Tseng: I think that’s an interesting trend to me, because, you know, I’ve been noticing the past few years we have all these distinct data roles, the data engineer, the analytics engineer, data scientists, and maybe a couple of others. They’ve all been quite distinct. But I think that, as you kind of gave in your example, specialization, being like the T-SQL person for that particular type of SQL within that particular

335 00:54:55.340 00:55:01.130 Robert Tseng: ecosystem, being a specialized consultant or a specialist

336 00:55:01.350 00:55:05.419 Robert Tseng: in a particular sub-niche like that may not actually be that,

337 00:55:05.650 00:55:13.229 Robert Tseng: you know, differentiable anymore. Because now, with the enablement of LLMs, everybody’s able to stretch more

338 00:55:13.517 00:55:29.539 Robert Tseng: and able to go and, you know, get good enough that they can pretty much meet the needs of their organization. So from my point of view, I’m seeing a consolidation of these roles. And I think more and more data teams are getting leaner. You know, my last in-house role,

339 00:55:29.540 00:55:54.000 Robert Tseng: pretty bloated data team. We ended up trimming down. And part of the reason why I started my company is because I wanted to be able to be that small, nimble data team, 3 or fewer people, and be able to go and just be that fractional data team for many different organizations. And so I think the thesis is kind of similar to what you are actually playing out in your workplace.

340 00:55:54.470 00:56:20.629 Robert Tseng: So my question is, you know, what do you think is the threshold where you’re gonna be like, alright, that’s the cap, I need to grow out the team? What does that look like for you to scale? Obviously, you’ve scaled your time. You’re focusing a lot on high-leverage stuff. But where do you hit your ceiling and feel like you need to hire more, and what does fanning out from there look like?

341 00:56:22.102 00:56:26.317 Mathieu Dumoulin: Great question. I think that the

342 00:56:28.367 00:56:32.630 Mathieu Dumoulin: let me refer to my notes, ’cause that was a really interesting

343 00:56:33.263 00:56:47.110 Mathieu Dumoulin: question. So first is that the scaling out from one to more

344 00:56:47.840 00:56:54.029 Mathieu Dumoulin: needs to happen when you have certain indicators

345 00:56:54.820 00:57:00.307 Mathieu Dumoulin: that light up on your dashboard, and one of them is that

346 00:57:01.410 00:57:03.200 Mathieu Dumoulin: You have become popular.

347 00:57:03.850 00:57:21.780 Mathieu Dumoulin: You are having meetings with multiple teams at the end of each quarter who want to understand where your capacity is going in the next quarter, because they wanna make sure that their priorities come through and they don’t have to wait.

348 00:57:23.320 00:57:50.559 Mathieu Dumoulin: That would be an example. There is a growing backlog of use cases in the data and analytics space that the company has identified and which has support at the higher levels of the org, meaning that the head of sales, the head of marketing, the head of risk, the CTO, the COO, these guys are starting to see

349 00:57:50.840 00:57:53.665 Mathieu Dumoulin: and demand and ask for

350 00:57:54.560 00:58:07.420 Mathieu Dumoulin: specific use cases. We need to create a sales funnel and create a bunch of analytics on top, so that our churn rate is better understood. And

351 00:58:08.540 00:58:11.560 Mathieu Dumoulin: you know that we track it. And then we can actually

352 00:58:12.240 00:58:34.480 Mathieu Dumoulin: take measures to reduce it somehow. We want to reduce fraud. We’ve done what we could so far, but now we feel like we need to add advanced analytics, because here are a couple of cases that we’re not catching right now that cost us $9,000 last month and are on track to cost us double that this month. Oh, we need to do something about it. So.

353 00:58:34.840 00:58:44.170 Mathieu Dumoulin: You’re going from vague stuff like “we should use LLMs” to specific use cases that have upper management

354 00:58:44.440 00:58:46.720 Mathieu Dumoulin: eyes and attention. Right?

355 00:58:48.890 00:59:08.340 Mathieu Dumoulin: And now you’re one person going, like, well, I would like to do the thing, but I cannot. So when those kinds of things happen, that’s probably starting to get to, because when you are one person, you have a roadmap, and you’re able to show the organization, well, I can do a lot of things,

356 00:59:08.530 00:59:19.259 Mathieu Dumoulin: but needs are scaling beyond what my ability to deliver is. And so, if you want me to deliver on these things, then it’s going to stretch out over the next 9 months,

357 00:59:19.510 00:59:24.370 Mathieu Dumoulin: and if we’re 2 people, then it might stretch out until 6 months, and if we hire.

358 00:59:24.870 00:59:30.010 Mathieu Dumoulin: If we go to a 3 person team, then we might be able to get this down to 3 or 4 months.

359 00:59:30.660 00:59:42.510 Mathieu Dumoulin: And then it becomes a financial decision, but one that has clear business imperatives to drive the decision. So that’s very, very critical.

360 00:59:43.930 01:00:05.990 Mathieu Dumoulin: The other element is: how much time are you spending on foundation work versus value delivery and use cases? That ratio is very important. How much time are you spending on just keeping the lights on and operating things? When those ratios start to get out of whack,

361 01:00:06.110 01:00:19.192 Mathieu Dumoulin: for example, when you need to spend a lot of time on foundational capabilities, well, if you’re spending a lot of time on foundational capabilities, you can’t deliver anything at all for the business. The business typically doesn’t like that.

362 01:00:19.490 01:00:20.100 Robert Tseng: Yeah.

363 01:00:20.418 01:00:25.189 Mathieu Dumoulin: So these are some of the indicators that I would look for

364 01:00:25.700 01:00:39.280 Mathieu Dumoulin: to help drive the conversation around when it is time to go from one to 2, or 2 to 3, or more. Secondly, there’s also a question of, well, where do these people sit?

365 01:00:41.205 01:01:02.059 Mathieu Dumoulin: It’s easy to just say, oh, data team, data team, data team. But that creates the incentive for the organization to have a centralized team and all the issues that come with it, meaning that business teams want to just swing projects over to the data team and then complain loudly when things take a long time.

366 01:01:02.470 01:01:05.180 Mathieu Dumoulin: Yeah, that is not a good model.

367 01:01:05.350 01:01:12.710 Mathieu Dumoulin: What you want instead is a centralized team that focuses on the data platform and the foundational data.

368 01:01:13.380 01:01:20.680 Mathieu Dumoulin: And you have embedded data engineers in teams working on specific business oriented data products.

369 01:01:21.690 01:01:22.300 Robert Tseng: Yeah.

370 01:01:22.620 01:01:23.160 Mathieu Dumoulin: And so.

371 01:01:23.160 01:01:23.540 Robert Tseng: Make, sense.

372 01:01:23.540 01:01:30.910 Mathieu Dumoulin: If a team wants use cases, you hire a data engineer that’s gonna work on your use cases.

373 01:01:31.350 01:01:50.080 Mathieu Dumoulin: And then, yeah, you tell me, well, my stuff doesn’t necessarily need a full-time person. Well, fine. That person can, you know, quote unquote, sit with the data team, but is gonna work 50% on your use case in your team and 50% on some other team. And if you can’t quite find another team

374 01:01:50.190 01:01:55.079 Mathieu Dumoulin: to have work for the other 50%. Then it’s gonna be a tough conversation. Sorry?

375 01:01:57.200 01:02:24.910 Mathieu Dumoulin: And if you find, then you find. I mean, again, you want to avoid sharing, and you really want to find ways to assign people 100%. It’s just much better to manage. But, you know, again, it’s always easy to find work. And so if a person is in a particular team, and there’s actually only 50% of their time needed on the first set of use cases, well, I can guarantee you that that will change

376 01:02:24.970 01:02:30.679 Mathieu Dumoulin: when teams realize that their data requests actually get done.

377 01:02:31.140 01:02:47.050 Mathieu Dumoulin: It’s incredible, the speed at which they will find 5 more things to do. That’s the fact of working with data. But, you know, fine, work 50%, and the other 50%, send them back into the data team to work on foundational stuff that benefits everyone.

378 01:02:47.600 01:02:48.360 Robert Tseng: Yeah.

379 01:02:48.600 01:02:57.870 Mathieu Dumoulin: Like better data models, working on speed improvements, reliability improvements, data quality improvements. Like, there’s always a million things to do.

380 01:02:58.280 01:02:58.890 Robert Tseng: Yeah.

381 01:02:59.460 01:03:11.540 Robert Tseng: Okay, yeah. I mean, super good insights. I really appreciate your time, Mathieu. I’m gonna just kind of wrap up with a quick lightning round where we just ask a couple of quick-response questions.

382 01:03:11.955 01:03:23.199 Robert Tseng: Yeah, I mean, I gave a few just to get us going, but, you know, maybe I’ll have more as we keep going. So let’s just start with some easy ones. Let’s start with an underrated data tool that you love.

383 01:03:23.200 01:03:24.170 Mathieu Dumoulin: DuckDB.

384 01:03:24.640 01:03:32.429 Robert Tseng: DuckDB, okay. And the most overhyped term, I would say, in data right now?

385 01:03:35.620 01:03:41.800 Mathieu Dumoulin: In data? In AI, I can answer you. That’s super easy. It’s agentic. Agentic whatever. It’s bull...

386 01:03:41.800 01:03:42.429 Robert Tseng: Okay. Sure.

387 01:03:42.430 01:03:52.370 Mathieu Dumoulin: Shit. But data, I think in data. Hmm, I don’t know.

388 01:03:54.630 01:04:23.739 Mathieu Dumoulin: I don’t know. I don’t think there’s a lot of bullshit going on in data right now. I think we’re all hard at work on things that matter, and the tools that are on top of the heap right now are the right ones. And I also think that the emerging things are things like data governance, data quality, and data contracts and things like that. Those are not overhyped. They’re not hyped enough.

389 01:04:24.250 01:04:31.889 Robert Tseng: Yeah, no, we should have another call and talk about data contracts. I’m curious about that. I would say, if I were to answer that question, reverse ETL would probably be the most overhyped.

390 01:04:31.890 01:04:37.919 Mathieu Dumoulin: Oh, my God, that’s a good one! I do agree, but it’s a thing.

391 01:04:37.920 01:04:38.310 Robert Tseng: Yeah.

392 01:04:38.310 01:04:39.690 Mathieu Dumoulin: It is a thing.

393 01:04:39.890 01:04:40.620 Robert Tseng: Yeah.

394 01:04:40.620 01:04:43.990 Mathieu Dumoulin: Like sales and marketing. They just want their toys. And yeah.

395 01:04:43.990 01:04:48.979 Mathieu Dumoulin: they generate money. The business will always accommodate them always forever.

396 01:04:49.660 01:04:52.831 Mathieu Dumoulin: It is what it is, but it’s a good choice.

397 01:04:53.120 01:04:53.540 Robert Tseng: Yeah.

398 01:04:54.859 01:04:56.179 Mathieu Dumoulin: Okay.

399 01:04:56.452 01:04:59.450 Robert Tseng: If you weren’t working in data, what would you be doing.

400 01:05:00.140 01:05:05.580 Mathieu Dumoulin: I think I would probably be some kind of solution or enterprise architect.

401 01:05:05.950 01:05:06.640 Robert Tseng: Okay.

402 01:05:06.770 01:05:08.479 Mathieu Dumoulin: Yeah, I could. I could see that

403 01:05:08.953 01:05:12.509 Mathieu Dumoulin: that would probably be and if not, maybe an economist.

404 01:05:13.180 01:05:14.410 Robert Tseng: Oh, wow!

405 01:05:14.410 01:05:15.330 Mathieu Dumoulin: Why not?

406 01:05:15.790 01:05:16.490 Robert Tseng: Yeah.

407 01:05:17.877 01:05:39.169 Robert Tseng: And then, I know you shared a couple of blogs and, you said, books that you would read. But if you were to just pick one book that you think every data leader should, you know, read now, or maybe I’ll be more specific: any data leader leading, you know, a post-Series A data team. What should they be reading?

408 01:05:39.500 01:05:42.389 Mathieu Dumoulin: They should read the book Rewired, by McKinsey.

409 01:05:43.260 01:05:45.479 Robert Tseng: Oh, okay, yeah, no. It’s good.

410 01:05:45.480 01:05:47.639 Mathieu Dumoulin: Popular choice. Yeah.

411 01:05:47.910 01:05:57.300 Mathieu Dumoulin: And I can already hear the critics, but that book is genuinely good. There’s an incredible amount of expertise that went into it.

412 01:05:58.220 01:06:02.459 Mathieu Dumoulin: I’ll just put it this way: the names of the authors on the book

413 01:06:02.900 01:06:05.810 Mathieu Dumoulin: are not the actual people who wrote it.

414 01:06:05.810 01:06:09.980 Robert Tseng: Yeah, that’s any book that a consulting firm puts out.

415 01:06:10.360 01:06:24.510 Mathieu Dumoulin: But there’s an incredible amount of expertise in that book, and it speaks to the value orientation for how an organization can be successful at

416 01:06:24.870 01:06:27.500 Mathieu Dumoulin: taking advantage of data and AI,

417 01:06:28.750 01:06:42.869 Mathieu Dumoulin: and it’s not just for big organizations. I think there’s an incredible amount of insight in there, in the approach, that I use every day in my work in a startup. And it’s like a superpower.

418 01:06:43.050 01:06:48.379 Mathieu Dumoulin: So the recipe is actually in a book. It’s in that book.

419 01:06:49.060 01:06:55.457 Robert Tseng: Yeah, got it. Okay, I got one more question. So let’s say,

420 01:06:56.000 01:07:01.670 Robert Tseng: yeah, you encounter a problem that you’ve never solved in data. It’s something urgent, like,

421 01:07:02.270 01:07:12.909 Robert Tseng: you know, you get that dreaded Slack message or, like, code-red message. What’s the first thing that you do before you start tackling the problem?

422 01:07:15.150 01:07:20.819 Mathieu Dumoulin: The first thing is to understand: is this even a problem?

423 01:07:22.210 01:07:23.990 Mathieu Dumoulin: Don’t panic.

424 01:07:24.460 01:07:28.320 Mathieu Dumoulin: Don’t panic. Take a breath, and

425 01:07:29.620 01:07:40.770 Mathieu Dumoulin: speak with the people involved first, and everything else after that will come forward, and

426 01:07:41.260 01:07:43.809 Mathieu Dumoulin: my experience says that

427 01:07:45.210 01:07:51.350 Mathieu Dumoulin: more often than not, there’s a simpler solution

428 01:07:53.010 01:07:56.150 Mathieu Dumoulin: that can often include doing nothing.

429 01:07:58.190 01:08:06.280 Robert Tseng: Okay, yeah, no, clearly a seasoned expert. You recognize that it could be a false alarm. Don’t panic, the sky’s...

430 01:08:06.280 01:08:13.370 Mathieu Dumoulin: Don’t assume that the world is on fire. Yeah, don’t assume it’s not on fire. Assume nothing, right?

431 01:08:13.370 01:08:14.000 Robert Tseng: Hmm.

432 01:08:14.390 01:08:27.510 Mathieu Dumoulin: See what’s going on. Speak to the people. The human presence counts for more than anything, because even if the problem is genuine, and even if it’s your fault,

433 01:08:28.260 01:08:38.619 Mathieu Dumoulin: don’t worry. I fucked up already, and it’s only been a year. People are incredibly forgiving when you are transparent and

434 01:08:40.399 01:08:44.970 Mathieu Dumoulin: focused on their needs in the midst of that problem.

435 01:08:45.410 01:08:47.740 Robert Tseng: Yeah, that’s great.

436 01:08:47.740 01:08:54.800 Mathieu Dumoulin: And if you work with people that want to blame you and be not nice, then you just learned something, and maybe

437 01:08:55.350 01:08:57.000 Mathieu Dumoulin: maybe maybe maybe maybe right.

438 01:08:57.479 01:09:02.683 Robert Tseng: Yeah, well, they’re just looking for a scapegoat at that point. And unfortunately, you are the one.

439 01:09:03.279 01:09:04.059 Robert Tseng: Yeah.

440 01:09:04.539 01:09:07.020 Mathieu Dumoulin: Okay, happened to us also.

441 01:09:07.020 01:09:08.840 Robert Tseng: Yeah, happens to the best of us.

442 01:09:08.840 01:09:09.706 Mathieu Dumoulin: That’s correct.

443 01:09:12.859 01:09:40.509 Robert Tseng: Okay, well, I think time’s up for us. Thank you so much for the hour that you gave us, to really just pick your brain. I really enjoyed this discussion. I’ll be going back and thinking it through, and I’ll read the book that you recommended, and all that. So yeah, thank you so much, Mathieu. And yeah, I guess I’d love to stay in touch. And hopefully I’ll see you around. If you’re at another event in New York, just let me know.

444 01:09:40.510 01:09:42.829 Robert Tseng: pleasure. Thank you very much. Cool, all right.

445 01:09:43.960 01:09:44.470 Hannah Wang: Bye, bye.