brainforge_etl_tools_assessment_memo_walkthrough

Meeting Title: ETL Tools Assessment walkthrough
Date: Dec 9
Meeting participants: Shivani, Andy, Awaish Kumar, Steve, Jason

Transcript:

Me: Hey, everyone. Sorry. My Zoom updated right as I joined, and it’s sort of glitching out. Nice to meet everyone. Nice to meet you, Steve, Andy.
Them: Good to meet you too.
Me: Hey. Hey, Shivani.
Them: Hi.
Me: Great. Let’s get started. The goal of this conversation is to walk through the assessment we put together. I almost said Zoom assessment; I mean the ETL assessment. It covers, end to end and from our experience implementing data platforms and evaluating data movement tools, the path we would suggest for moving data for reporting use cases here at Element, which is one of the goals. A good piece of feedback we got from Shivani yesterday was to add more narrative from our experience doing this day in and day out for several other clients. So I just want to confirm that everyone has the doc, and then we can go through it end to end, if that’s fair.
Them: We all have access to the doc.
Me: Okay, cool.
Them: Sorry, my camera is funky right now, which is why you can’t see me.
Me: Oh, no problem. Cool. Did everyone have a chance to give it a read, or would you prefer I walk through it from the top down?
Them: I think we should go top down. There have been some changes since some of us reviewed it, so it’s probably better to go through the whole thing.
Me: Okay, let’s do it. I didn’t get a chance to say hi last time, but I run Brainforge. We do data and data platform work, setting up reporting systems for a lot of folks in an ecosystem very similar to yours. I want to make sure you all feel comfortable with the purpose of an ETL tool and where it fits into the overall strategy of improving the accuracy of insights and increasing how many insights the Element team can get from its data. The point is not to cobble together a bunch of random systems, but to take the time to understand why ETL is important in a broader data platform. At the top of the document, I highlighted some of the struggles today, based on our brief time meeting some of the folks on the team. I’m sure everybody on the team understands what happens on the reporting side of the business, whether the information comes from SourceMedium, from source systems like Shopify and Amazon, or from manual ad reporting out of other ad systems. One of the core parts of a data platform is loading and transforming data, and this is really the start. Element uses several different vendors to run the business, everything from Amazon for selling on Amazon to Shopify. To be very specific, these are the systems that run Element’s business, and the data in those systems is what you are reporting on right now. The team uses a mix of ways to get that data out: literally copying and pasting it from a UI, manually exporting it, and in some cases pulling it programmatically.
A common goal for a data team is to centralize that data so it can be combined and reported on, providing a source of truth. For example, Amazon reports refunds differently than Shopify does. If you own Shopify, you understand Shopify’s nuance; if you own Amazon, you understand Amazon’s. For anybody else in the business, that’s very hard context to keep in your head, and that’s just two systems and just one part of each. Every system has tons of these nuances, and the goal of a data team is typically to standardize them against accepted metrics and business definitions and then make the result available for reporting, so that more and more people in the company can use data day to day. That’s the gist of why this matters in terms of ETL: extract, transform, load. You often see ELT as well; either way, it’s about landing data in a place where we can actually model and transform it for reporting. A bit of history: there have always been a lot of vendors whose main job is to move data from one system to wherever you want it. This industry has existed for a while, and more players have entered that offer cheaper, more reliable integrations. What happens in the ETL world is that the number of tools keeps growing. As you know if you’ve worked in ecommerce or technology, new systems come across the team all the time. So the core criteria behind the tools we’ve recommended are, one, their ability to work with a wide variety of systems; two, reliability; and three, how they approach developing a solution to move the data when they encounter a new system. What is the alternative to using an ETL tool? You could build all of this yourself.
Before ETL tools, data teams built data flows in Python or other languages: you call a REST endpoint, take a piece of data, and move it. But this is not a one-time thing. You have to own that process and monitor it, and soon you’re calling several endpoints for a single vendor and effectively running data flows. A lot of companies do this by hand, but the reason to go with a tool is that owning this process is not a unique advantage for Element. There isn’t much alpha in hiring a bunch of data engineers to rewrite Shopify pipelines that land data in Snowflake. Moving data from vendors like these into a data warehouse is now, I would say, commoditized: many people do it, many offer these pipelines as a service, and it’s much more cost-effective now than it was five or ten years ago. So part of this is deciding to offload ownership and management of these data movement workloads to a vendor and focus on the real meat on the bone for Element, which is managing KPIs, data modeling, and making sure reporting gets adopted. That’s a bit about ETL. I’ll keep going, but any thoughts or questions? Okay, perfect.
Them: No questions.
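To make the “build it yourself” alternative concrete, here is a minimal sketch of the kind of hand-rolled extract-and-load job a data team would own without a managed tool. The page fetcher, the `stg_orders` table, and the response shape (`records`, `has_more`) are all hypothetical, and an in-memory SQLite database stands in for a warehouse such as Snowflake. Even this toy version leaves out auth, retries, rate limits, schema changes, and monitoring, which is the ongoing ownership described above.

```python
import sqlite3
from typing import Callable, Iterator

# Hypothetical page fetcher: a real pipeline would call a vendor REST endpoint
# here (with auth, retries, rate limiting). It is injected so the sketch runs offline.
FetchPage = Callable[[int], dict]


def extract(fetch_page: FetchPage) -> Iterator[dict]:
    """Yield records page by page until the (hypothetical) API reports no next page."""
    page = 1
    while True:
        body = fetch_page(page)
        yield from body["records"]
        if not body.get("has_more"):
            break
        page += 1


def load(conn: sqlite3.Connection, records: Iterator[dict]) -> int:
    """Upsert records into a staging table keyed by id; return the number of rows written."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS stg_orders (id TEXT PRIMARY KEY, total REAL)"
    )
    rows = [(r["id"], r["total"]) for r in records]
    # INSERT OR REPLACE keeps re-runs idempotent: the same id overwrites, not duplicates.
    conn.executemany("INSERT OR REPLACE INTO stg_orders VALUES (?, ?)", rows)
    conn.commit()
    return len(rows)


# Fake two-page response standing in for a vendor endpoint.
PAGES = {
    1: {"records": [{"id": "a", "total": 10.0}, {"id": "b", "total": 5.5}], "has_more": True},
    2: {"records": [{"id": "c", "total": 7.25}], "has_more": False},
}

conn = sqlite3.connect(":memory:")
written = load(conn, extract(lambda p: PAGES[p]))
print(written)  # 3
```

The upsert-by-key design is one common way to make re-runs safe; a managed ETL tool handles this, plus the failure cases omitted here, on your behalf.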
Me: Cool. So, why this matters now. We’re starting to see a few symptoms that suggest a data platform could help. First, you don’t want your team focused on data entry and data movement. That’s not their superpower, and it’s something you want to offload, whether to a team or to a service. You don’t want someone like Carlos, with all his wisdom, spending time copying and pasting from SourceMedium and potentially getting something wrong. I think we can all agree on that. So part of this is making sure teams can focus on their superpower. Second, Element is growing, and so is the number of data sources. As you grow into different retail channels and use more tools to run the business, those all become data sources that operators want to report on. Jason, I’m sure you and the team are aware of that. Third, NetSuite is coming up next week, and I know the team is thinking about it. It’s going to be a huge source of data that the team will want to report on, both on its own and combined with other sources, so we’ll need a solution to move that data into a single place. The last piece is being able to look at combined data in a timely manner. If something happened last month, you don’t want to cobble together CSVs, build a report, and then have only half confidence in it. You want reliable reporting that scales to people who may not have the skills to do all of that manual data pulling. That’s part of why this matters.
I’ll go through the recommendation and investment shortly, but first I’ll talk a bit about what we’ve learned about CPG across a lot of our work. First, sources commonly have different levels of lag. Even in talking to Carlos, we saw that Amazon data takes seven days to close, while other sources close sooner. Every single data source has nuances like that which require attention. Second, promotional periods bring huge volume surges, which we see commonly across our companies. We work with a flower company that does about 80% of its sales on two weekends: Mother’s Day and Valentine’s Day. These data challenges are unique to ecommerce and retail. Third, revenue definitions: for metrics like sales or customer acquisition cost, there are a lot of nuances depending on who is using them and for what. Fourth, settlement. For example, when a refund comes in on Amazon, does it get booked against the month in which the order was placed, or in the month the refund came in? These are nuances about each system that one or two people may know, but that knowledge doesn’t scale, and its impact isn’t widely understood. The last piece is wholesale fragmentation. As you do more in wholesale and retail, industries that often don’t have great digital ecosystems, you’ll have to deal with CSVs, SFTP, and sometimes garbage-in data systems. So when we choose a data ingestion tool, it’s really important to understand that not everything will have a perfect REST API endpoint; we may have to land flat files. In fact, that is one of the requirements even for SPINS data.
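The settlement question is easier to see with numbers. Below is a minimal sketch, with hypothetical refund records, showing how the same refunds produce different monthly totals depending on whether they are attributed to the order month or the refund month. Which convention Amazon or Shopify actually uses is not stated here, so the sketch only illustrates why a data team has to pick one definition and apply it consistently across sources.

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import date


@dataclass
class Refund:
    order_date: date   # when the original order was placed
    refund_date: date  # when the refund was issued
    amount: float


def month(d: date) -> str:
    """Format a date as a YYYY-MM reporting month."""
    return f"{d.year}-{d.month:02d}"


def refunds_by_month(refunds: list[Refund], basis: str) -> dict[str, float]:
    """Total refunds per month, attributed to either the order month or the refund month."""
    if basis not in ("order", "refund"):
        raise ValueError("basis must be 'order' or 'refund'")
    totals: dict[str, float] = defaultdict(float)
    for r in refunds:
        key = month(r.order_date if basis == "order" else r.refund_date)
        totals[key] += r.amount
    return dict(totals)


# Hypothetical refunds: the first one crosses a month boundary.
refunds = [
    Refund(date(2024, 11, 28), date(2024, 12, 3), 40.0),
    Refund(date(2024, 12, 5), date(2024, 12, 9), 15.0),
]
print(refunds_by_month(refunds, "order"))   # {'2024-11': 40.0, '2024-12': 15.0}
print(refunds_by_month(refunds, "refund"))  # {'2024-12': 55.0}
```

The total is the same either way; only the month it lands in changes, which is exactly the kind of discrepancy that shows up when two teams report from two systems.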
For that, I think we’ll leverage flat files, meaning CSVs or Parquet files. There are systems like Shopify with great APIs: we call them and get data out, and a ton of vendors support pulling from Shopify. But we also need to support the nth vendor Element may bring on. That’s why you’ll see us recommend a couple of tools: in all of our experience choosing ETL tools, no single vendor covers everything. For example, Emerson is new to us. We haven’t seen another client with Emerson; we’ve seen similar systems, but it’s new. So we want a vendor that can say, great, if Emerson has an API, I can build that connector, and I can host and maintain it. If they can’t, what does that mean? Element has to own creating the data pipeline and maintaining it, and when it goes down, who owns that? Element doesn’t want to spend time or money running data pipelines. We want to spend time and money understanding the business and how to grow faster. Our philosophy on ETL is written here. We almost always recommend managed services, unless you’re in an extremely regulated environment with a lot of PII; for some of our health clients, for example, there may be restrictions. Otherwise we say: use managed services for as much of ingestion as possible. You really want to think of ETL as the piping in the wall. Ideally, after this decision, we don’t spend time talking about these vendors. They are not going to be the heroes of the system.
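Flat-file sources need a guardrail that API connectors usually provide for free: checking that the file still has the shape you expect. Here is a minimal sketch of parsing a CSV drop with header validation and numeric coercion. The column names are hypothetical and will not match any real SPINS or wholesale layout; the point is that a file whose layout drifts should fail loudly at load time rather than silently corrupt reporting.

```python
import csv
import io

# Hypothetical expected layout for a retailer sales file; real layouts will differ.
EXPECTED_COLUMNS = ["week_ending", "store_id", "upc", "units", "dollars"]


def parse_flat_file(text: str) -> list[dict]:
    """Parse a CSV drop, failing loudly if the header or numeric fields are not as expected."""
    reader = csv.DictReader(io.StringIO(text))
    if reader.fieldnames != EXPECTED_COLUMNS:
        raise ValueError(f"unexpected header: {reader.fieldnames}")
    rows = []
    for line_no, row in enumerate(reader, start=2):  # line 1 is the header
        try:
            rows.append({
                **row,
                "units": int(row["units"]),
                "dollars": float(row["dollars"]),
            })
        except (TypeError, ValueError) as exc:  # TypeError covers missing fields (None)
            raise ValueError(f"bad numeric value on line {line_no}: {exc}") from exc
    return rows


sample = "week_ending,store_id,upc,units,dollars\n2024-12-07,101,012345678905,12,47.88\n"
rows = parse_flat_file(sample)
print(rows[0]["units"], rows[0]["dollars"])  # 12 47.88
```

Identifiers such as `store_id` and `upc` are kept as strings on purpose, so leading zeros in UPCs are not lost.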
They’re just going to move data from one place to another. We’ll talk about a lot of the options in this space. Often, the ones we don’t recommend are systems we’ve been called in to rip out at clients because they were sunset, or bought by another company and left to die, or there’s no support. We work with a lot of these vendors, and these two are the ones that, when we come in, we stamp as great. Of course, we work with plenty of vendors where cost may be a factor, or reliability, or speed; it all depends on the ecosystem. But these are the ones we know will let us focus on the issues that actually matter to the company, instead of sitting and writing Python data pipelines that we have to maintain. Maybe I’ll pause there before talking about assumptions. Any questions on ETL and data movement? I haven’t said anything about storing the data yet; I can cover that toward the end. And a question from my side: are we doing anything like this today?
Them: We are, but we’re not doing a ton of ETL-type work today. We did use Celigo as an iPaaS platform for an ERP integration that we then abandoned, and we’re moving to the NetSuite one now. So we don’t currently have Celigo in the critical path for anything. We’re moving some of our backend to more of an event-driven architecture. We have queuing systems between our Shopify store and our 3PL, things like that, and in general we’re moving toward a queue-based, Kafka, event-driven architecture. Those will be more custom backend systems; at least the current plan is not to use hosted ETL platforms for things like that.
Me: Yeah.
Them: One thing, and I’m not sure if we were going to get to it: on the ETL side, there’s the recommendation of having multiple systems versus simplifying to a single one, and what the trade-offs are. I know you spoke to not everything having a connector for every system.
Me: Sure.
Them: But most platforms do have the ability to write connectors, whether we use their professional services or write them ourselves. I do want to focus a bit on the trade-off of managing two systems and paying for two systems.
Me: Yes.
Them: Paying for two systems, versus trying to consolidate into one.
Me: Definitely. Let me speak to that briefly. I’m in a very similar place: I don’t like at all saying we need two vendors to do this job for us, and I would much rather consolidate. Part of the reason we recommended both, and I’ll let the team decide whether this is the primary factor, is cost: Polyatomic is significantly cheaper than Fivetran. For a very similar service, we see very similar uptime and actually much better support from Polyatomic, which is one of the reasons we commonly recommend them. Fivetran is sort of number one in this world, both in marketing spend and in how much of the market they’ve taken. There are still a lot of other competitors, like Matillion and Informatica, and a lot of legacy providers, but in terms of what’s commonly recommended, Fivetran is there. You just have sources in the Element ecosystem that they don’t directly support, and at that point we have two options as a data team. One, I can go to Fivetran and ask whether it’s on the roadmap. They typically come back with one of a few answers. First: we have a private beta for that connector, but it comes with these caveats. Second: we don’t, but it’s on the roadmap and here’s when it will be built; that’s usually not “it’s coming out next week” but “at some point it’ll be out.” Third: it’s not on the roadmap, but you can submit a ticket. I started using Fivetran almost ten years ago, and they’re a very different company now. They’re not building net-new connectors nearly as often, and their business has shifted primarily toward enterprise.
So for a system like NetSuite, Salesforce, or any SAP, they’re really strong. But for net-new connectors, for example if I asked them to build something for SPINS, I doubt they would make that happen. On the other side, when an incumbent is positioned that way in a market like ETL, new entrants come in. Polyatomic is one we found when I started the business and went into the market knowing we’d be implementing ETL at several companies. I wanted us to have a real opinion on several options that work, including, of course, the option of building things ourselves. I’ve spent a long time working with these folks. They’ve worked really well for five or six of our clients, they’re extremely responsive in Slack on the support side, and they’ve built several net-new connectors with quick turnaround at no additional cost. What really sealed the deal for me in recommending them is that their support is far better than Fivetran’s. If Fivetran goes down, we have to submit a ticket. With these folks, we’re usually in a channel with their CEO and their core engineering team. Second, the pricing is extremely favorable. They’ve kept expenses low compared with what Fivetran spends on marketing and G&A, and they’ve been able to pass that on to their customers. And I’m not seeing any issues with reliability. That’s why we recommend both. Part of this breakdown is that we would like to start with Fivetran; we can do a lot of these on Fivetran and show the cost outlays. But as I mentioned, this is the piping in the wall. You really don’t want to be talking about these vendors.
For example, if something happens to a Fivetran connector, it cuts off everything downstream of it. Part of why we recommend Polyatomic is that their support is really great. Awaish, I don’t know if you have any other thoughts or anything to add.
Them: Yeah, I agree with what you said. Fivetran is really costly, and we can optimize our cost by using Polyatomic. The second thing, as Uttam said, is that if one of the tools has downtime, we can just spin up the connectors in the other tool. We have a backup.
Me: Mm.
Them: Okay, so there’s some semblance of redundancy there as well. Coming back to it, to be clear, there is the ability to write custom connectors with an SDK in Fivetran, right? So if we need to connect to something like our 3PL.
Me: Yes, and similarly in Polyatomic.
Them: Okay. If cost was no object, would you just move everything to Fivetran?
Me: I’d still think about the support. To be honest, their support has gotten worse over the years. If I’m giving my input here, it’s really nice to have the full technology team of any vendor we use on speed dial, and I’ve found that incredibly effective. ETL is something that just needs to work all the time; you can’t have the power go out. Over the last five or six years, I’ve found that Fivetran’s quality of service and support has gone down while their pricing has gone up roughly 3x, and it changed again just last week. But if price isn’t a factor, I would love to consolidate everything on Fivetran. I don’t work for these vendors; I work for you. I don’t care much about either of them. I care about redundancy and about knowing this just works all the time. If Fivetran were more open to building new connectors for us, I’d probably just go with them, no problem.
Them: Does the Fivetran cost include dbt, since they acquired dbt?
Me: No.
Them: Is that why the price went up?
Me: No. They haven’t done any cost integration yet. dbt’s pricing is currently on a per-user basis.
Them: Okay.
Me: So if we have, say, five users on dbt, we pay something like $50 or $100 a month per user. They haven’t done any consolidation yet, though I assume they’ll start. There are other ways for us to offset the dbt cost, and frankly, the bulk of the cost in the entire data platform will come from ETL and BI. Warehousing and dbt are the two lowest cost centers in the entire system.
Them: I understand the redundancy, but given that the focus is on the Shopify component first: does Polyatomic already have an Amazon connector? I guess the question is, why look at Fivetran at all? I’ve never done this, but if we’re starting small, Polyatomic seems like the more nimble one, and if we need to switch, we could. I’m also having trouble understanding the redundancy argument, because the recommendation is to use specific ETL tools for specific connectors, right? So it’s only redundant in the sense that if one goes down, we have to spin up another connector in a different system.
Me: I would say, first, it’s for me to understand what the price sensitivity is. Second, we can start both as trials and go month to month, so they fit our procurement criteria and we don’t need to sign any long-term agreements. I would prefer to go with just one. The reason I’d suggest both is that for sources like where to go, potentially SPINS, and others, we’re not going to have much luck asking Fivetran to build them. Think of them as the NetSuite of ETL: they have a roadmap, and it’s pretty set.
Them: But we can build it ourselves, right?
Me: Yeah, we can totally build it ourselves. And so this is...
Them: I want to keep that on the table. We do have the resources to build these, if that factors into the long-term decision.
Me: Great. No, totally, and that’s another consideration. If supporting a data source means calling one endpoint and moving one data frame over, those aren’t the things I’d kick over to a vendor. It’s more like: say where to go is actually pretty complicated to support and has tons of volume. Then it’s up to us to decide whether we support it ourselves and what the engineering and maintenance costs are, and weigh that against going to one of these vendors. Ultimately, I would like to consolidate on one. If Polyatomic can handle all of these sources, I think consolidating there and keeping Fivetran as a backup we can turn on would be best. But I wanted to show what all the options are. The cost difference with Polyatomic is extremely significant. I also know that Fivetran and Matillion are probably the top two in this world. There are some vendors we ruled out, like Airbyte, Matillion, Stitch, and RudderStack, and there’s another one called Portable. We’re constantly looking at this market for these tools. A lot of them come across as really great but are extremely high maintenance and fragile. Hevo and Stitch have both been bought and basically left to die, and Matillion is just really hard to configure. So we’ve arrived at these two as what we typically recommend. Polyatomic also has some pretty large enterprise customers, like the NFL and Okta. They just aren’t good at marketing at all, basically.
Them: So, a question for you, Uttam, on price sensitivity. I know I was pushing a bit on getting a sense of the total cost, but that was more about landing the plane on the analysis, less about cost being the lever we decide on. So let’s play out the world where we only go with Polyatomic. What are the downsides?
Me: Yeah. I think the only downsides to Polyatomic, after implementing them for three years at the company, are, first, that they’re not as well known in the market, and second, that their larger connectors, like NetSuite-style connectors, may not be the most sophisticated. They probably haven’t been running NetSuite connectors for more than one or two years.
Them: Great. And we know we’re going to want to run a NetSuite connector. If we think about the future-state stack, we know we’ll want data flowing into a warehouse, data flowing from the warehouse into NetSuite, and data flowing from NetSuite into the warehouse. So you think Fivetran and NetSuite go best together?
Me: Correct. Yes.
Them: Then I’m echoing, but if you think Fivetran and NetSuite go best together, that’s a helpful framing for the future-state stack.
Me: I would say that for anything enterprise, like NetSuite or any other large ERP, or SAP or Salesforce, Fivetran has been doing those integrations for years; that’s their bread and butter. Polyatomic, as Jason mentioned, is nimble. For example, we had a client that uses GoHighLevel for something, and we told Polyatomic we needed a GoHighLevel connector. They said, cool, we’ll build it for you in about two weeks. That was huge, because otherwise my team would have had to build it ourselves or I’d have had to go to the next rung of options, since that’s not something we should be maintaining. They were a great partner there and built it for us. Awaish, we’ve probably had them build five, six, or seven other connectors for us, and they’ve done a bang-up job. It’s been awesome.
Them: Yeah.
Me: Yeah.
Them: They’ve had about a one-week turnaround time.
Me: Yeah, and it’s less about the speed. It’s that we don’t often have vendors that actually listen and are supportive. Most of the time you buy software and it’s like, oh, that’s just what it is.
Them: It is what it is. Yeah.
Me: That’s why it’s been nice. For all the vendors you’ll see us recommend, I try to make sure that if something happens, we have a direct line into support, whether that’s something we’ve been able to arrange or something available to everyone. I don’t want to throw bad tools into the stack; it would really affect us. I’ve used Fivetran for most of my career, and Polyatomic has been really great over the last two or three years. So, to round this out: I would suggest we have both for redundancy, and we can get a better sense of the cost. I think you’ll find Polyatomic is dramatically cheaper. But I also want to give this team an understanding of how sophisticated each vendor’s endpoints are for NetSuite, Shopify, and Amazon, where the core volume and complexity are; Awaish, that’s for us to compare. For tools like where to go, and other tools I know are coming down the line, we’re going to need a partner like Polyatomic to support them, or we’ll have to build them ourselves, and that will be decided source by source. For example, if SPINS is giving us retail files, we can just drop them into a data lake, or something on DigitalOcean, and then pipe that into Snowflake. That’s not a problem. It’s not that everything has to go through these vendors and we’re stuck if they don’t support something. We can always do it the usual way, which is writing scripts and loading the data ourselves. So is there any other helpful context here? I know we jumped straight to this. Happy to talk.
We went through each of the sources and got a sense of the volume. In the ETL world, cost scales by rows. It’s a somewhat odd pricing model, because I don’t think the cost to maintain these connectors really goes up by row, but it’s what the industry has arrived at. We typically don’t see many real-time use cases, so the data team will usually try to guarantee at least four hours of freshness, given the time it takes to land data, model it, and get it into BI. Unless there are super real-time use cases, we’re comfortable supporting all the existing commercial data sources. Then it’s just a decision on which ETL tool to go with, and that’s really what it’s going to come down to.
Them: Others can lend their opinion, but to me, a BI tool isn’t supposed to be real time anyway, so I wouldn’t be concerned. Correct me if I’m wrong, anyone on our side, but I don’t know of any critical-path tools where we need real-time data. The real-time data we do have, like Shopify reports, is already more real time than I’d expect our BI tool to be.
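A freshness target like the four hours mentioned above is usually enforced with a simple check on when each source last loaded successfully. The sketch below assumes a hypothetical dictionary of last-load timestamps per source (in practice these might come from the ETL tool’s sync metadata or a loaded-at column in the warehouse); the source names are illustrative only.

```python
from datetime import datetime, timedelta, timezone

# The four-hour target discussed above; adjust per source if needed.
FRESHNESS_TARGET = timedelta(hours=4)


def stale_sources(last_loaded: dict[str, datetime], now: datetime,
                  target: timedelta = FRESHNESS_TARGET) -> list[str]:
    """Return the names of sources whose most recent successful load is older than the target."""
    return sorted(name for name, ts in last_loaded.items() if now - ts > target)


# Hypothetical last-successful-load times, all timezone-aware (UTC).
now = datetime(2024, 12, 9, 12, 0, tzinfo=timezone.utc)
last_loaded = {
    "shopify": now - timedelta(hours=1),
    "amazon": now - timedelta(hours=6),
    "netsuite": now - timedelta(hours=3, minutes=59),
}
print(stale_sources(last_loaded, now))  # ['amazon']
```

A check like this would typically run on a schedule and alert the data team, so a stalled connector is caught before a stakeholder notices stale numbers in BI.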
Me: Yeah.
Them: So I don’t think that’s a huge concern.
Me: Okay. Then the other piece we talked about here is what we call reverse ETL, or basically activating data. This is common in marketing. A really good use case is calculating a lead score: hey, this lead is really ripe for an upsell, let’s send them an email. That piece of knowledge, that calculation, then has to move into another system. This is basically posting data to an endpoint. Both of these vendors now offer this. Again, I don’t think it’s really unique to any vendor, and both are pretty comfortable doing it. But this is something we see across the board with the marketing teams at all the clients we support: once we start calculating things in the warehouse that aren’t possible to calculate within a tool, like propensity to spend, or, if we’re doing identity resolution, hey, we saw this customer start on Amazon and we’re able to identify them on Shopify, and we want to send them an email. Those are things I’m sure the marketing team will want to do, and reverse ETL is the method of actually sending data to another system, typically for marketing activation. Or, for example, if you want to go into your CRM and put in a total spend for a customer, it’s not as easy as moving it from Shopify to there. You may have to combine a bunch of sources and then move it. So that’s this process. We also go through a little bit of coverage and things like that. Again, Fivetran has been in the market for a long time, so they have a lot of coverage. They’re just a lot slower now at building new connectors. That’s the net net, the trade-off. There is quite a long tail of vendors that do this work. However, most of them suck. And, like, just.
You have to trust us a little bit in that all we do is buy and implement these tools. Part of the reason we’re recommending these two is that they make our job really easy. You don’t want our team having to deal with connector downtime, and with some of these other tools, that is often a problem we face. We have a client that’s on Hevo and Stitch, and it’s an absolute nightmare, but they’re locked into a long-term contract, so we couldn’t do anything there. We actually were able to help with their NetSuite data, though. This is the Flower Co. One of their use cases for the data is a real-time use case, because during peak periods like Mother’s Day and Valentine’s Day, they almost need reporting within the hour. So Polytomic actually built a direct ODBC connector to NetSuite that supported basically a real-time sync of that data. That was a huge win with a vendor. I don’t think Fivetran supports that, and if it does, it would have been on their enterprise tier. So it would not have been possible without them. That’s an example of our team getting a challenge like, hey, you have to enable 60-minute-or-less reporting during these peak sales periods. Okay, figure it out, you know? And a vendor was really, really clutch in enabling that for us. So, yeah. Again, I would say the real ROI when we talk about leveraging ETL is just not having one person, or a couple of people, build connectors to Shopify. These vendors are doing it for all these folks. There’s nothing really unique about that necessarily. What’s unique is just the cost and the stability of it. So in terms of.
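Reverse ETL, as described in this turn, comes down to computing a metric in the warehouse from several sources and posting it to another system’s endpoint. The Python sketch below illustrates the CRM total-spend example under stated assumptions: orders already carry a resolved `customer_id` (identity resolution happens upstream), and the endpoint URL, payload fields, and helper names are hypothetical, not any vendor’s actual API.

```python
# Minimal reverse ETL sketch: combine order data from several sources, compute a
# per-customer total spend, and push each record to a CRM endpoint.
# ASSUMPTIONS: orders already carry a resolved customer_id (identity resolution
# done upstream in the warehouse); the CRM URL and payload shape are hypothetical.
import json
import urllib.request
from collections import defaultdict
from typing import Callable, Iterable


def total_spend_by_customer(*order_sources: Iterable[dict]) -> dict[str, float]:
    """Sum order amounts per customer across every source passed in."""
    totals: dict[str, float] = defaultdict(float)
    for source in order_sources:
        for order in source:
            totals[order["customer_id"]] += order["amount"]
    return dict(totals)


def build_crm_payloads(totals: dict[str, float]) -> list[dict]:
    """One JSON-ready record per customer, sorted for deterministic output."""
    return [
        {"customer_id": customer_id, "total_spend": round(amount, 2)}
        for customer_id, amount in sorted(totals.items())
    ]


def post_json(url: str, payload: dict) -> int:
    """POST one record as JSON and return the HTTP status code.

    urlopen raises urllib.error.HTTPError for 4xx/5xx responses.
    """
    request = urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return response.status


def sync_to_crm(payloads: list[dict], send: Callable[[dict], int]) -> int:
    """Send every payload and return how many got a 2xx status back."""
    return sum(1 for payload in payloads if 200 <= send(payload) < 300)


if __name__ == "__main__":
    shopify_orders = [{"customer_id": "c1", "amount": 40.0}, {"customer_id": "c2", "amount": 10.0}]
    amazon_orders = [{"customer_id": "c1", "amount": 25.5}]
    payloads = build_crm_payloads(total_spend_by_customer(shopify_orders, amazon_orders))
    print(payloads)
    # Real usage might pass: lambda p: post_json("https://crm.example.com/customers", p)
    # A stub sender is used here so the sketch runs offline.
    print(sync_to_crm(payloads, send=lambda payload: 200))
```

Injecting the sender keeps the aggregation logic testable offline; with a managed reverse ETL tool, the hand-written posting step is the part the vendor would replace.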
Them: Yeah, I think.
Me: Yeah. Go ahead.
Them: Sorry, I was going to say, no, I mean, I think we’re already sold on the ETL tool. I think the question is more understanding from you, and I appreciate you breaking it down, the rationale behind why we want to do two
Me: Sure.
Them: Tools versus one.
Me: Okay?
Them: I think our original impression was one ETL tool that kind of does it all. So you’re giving us some room for thought in terms of what it means to have a robust, enterprise-ready integration, like Fivetran would offer, to an enterprise tool like NetSuite. And then the other part is all of the long-tail connectors that we’re going to need in the long term. What is that strategy? And what I’m hearing is Polytomic versus having to build it ourselves and use our own time.
Me: Yes.
Them: And my understanding is the cost for it seems rather low.
Me: You’re exactly right.
Them: Yep. I want to be mindful of time, actually, because I know we have maybe 15 minutes left for this call, team. I don’t know if there are any questions yet on that specifically, but I also want to talk a little more about one of the sections of the doc, where it sounds like you’re also recommending that we use Snowflake right now, as kind of an assumption. Was there that exercise in terms of Snowflake versus BigQuery? Because I think that was something we were discussing previously as well.
Me: Yeah. So this is part of it; we’ll have a very similar conversation about the warehouse. Snowflake offers some very unique ETL solutions, one of which we’re actively engaging with for the Emerson data, so I just wanted to have that highlighted here: Snowpipe and Snowflake private share. With Snowflake private share, for example, Snowflake is a data cloud on top of S3 and Azure Blob, so if the vendor you’re taking data from is a Snowflake customer, they can share data with you directly through this product called Snowflake Share. Salesforce does this, Stripe does this, and a company like Emerson does this, where their product data is already there and they can basically click share and it’s shared with you. That’s a unique part of Snowflake because of this layer. What does that mean? There’s no ETL required; it’s like a direct replication. So that’s just something I wanted to call out. The reason I mention it is that for these types of tools, if we decide not to go with Snowflake, then we’ll have to consider whether an ETL vendor we have can build it or we build it ourselves. So for Emerson, for example, I’ll have to understand: hey, what are their API options, and how can we go direct if we can’t go through this method? BigQuery has a product sort of similar to Snowpipe, but nothing similar to private share, to my knowledge. The only reason we highlighted these two is that we do have Emerson in particular, and we do have that Snowflake engagement right now. But to talk about the warehouse, both of these tools support the core set of warehouses that I’m predicting we go with, which are either Snowflake or BigQuery, and there are a couple more that we’ll definitely highlight.
But there’s no risk on either of these for those. Really, the risk for those warehouses is the long tail of connectors, and whether the tools can support them directly or we can build an endpoint for them. Does that make sense?
Them: Got it. Yeah, no, it makes sense. When I saw that assumption, it mentioned Snowflake already, and I didn’t know if you were already making that recommendation now or if that was an exercise still in progress.
Me: It’s just sort of an ETL option, you know, if Snowflake ends up in the picture. So one thing I think, Awaish, we can highlight just here: we can just note that one advantage to Snowflake,
Them: Yeah.
Me: is this zero-ETL data share. But it falls sort of under them trying to enter this ETL world, so I wanted to highlight that here.
Them: Yeah, but just to mention, we are working on the comparative analysis of all the different data warehouses.
Me: Yeah.
Them: Yeah, so I. Oops, my camera turned off. My understanding is that this is just an initial exercise, Jason, our way in. On what we think about BigQuery, some context for you: Phil today was touting the benefits of having Gemini sit over everything. He was like, if we’re using the Google suite of services, then why not do BigQuery? So I think we’re doing one tool now and then hoping to chip away at the other question next.
Me: Yeah. Yes. Yeah. We actually just completed a spike for another client on AI on top of data, and we evaluated maybe six or seven different vendors in addition to Snowflake native AI and BigQuery native AI. That’s something we’ll highlight in the warehouse piece: does this unlock additional AI capabilities? A lot of these vendors are now coming out with similar AI tools. For example, one advantage of BigQuery is that you can land GA data pretty seamlessly, because they’re both Google products. But there are also benefits to Snowflake in many different ways. So, yeah, that’s something we’ll highlight in the data warehouse memo. And then, Shivani, if you want us to go deeper on the AI piece, we’ll cover it when we talk about the BI layer, which we’ll call the data access layer. We’ll talk about both data access through traditional BI tools like Tableau and Looker, as well as chat-with-data interfaces.
Them: Makes sense. Yeah. Yeah, and I think holding that lens when you’re thinking about the Warehouse itself is helpful.
Me: Yeah, because we’ll do a lot of show-and-tell during that piece, since we’ll have the data landed by the time we start going through it.
Them: Perfect. Yeah, but I just mean, if one is more conducive, like if BigQuery is more conducive to that, then holding that in your assessment will be helpful.
Me: Yeah, yeah. Totally.
Them: Shivani, are we expecting to move away from Emerson? I’m not sure where I would be hearing that, and somebody else can correct me here. My understanding is we’re moving away from Emerson doing some of the administrative stuff they’ve been doing for us. But it’s not my understanding, Jason, and you can correct me, that Emerson would stop being the one feeding us data. Jason, do you know more than I do on that piece? I don’t have an update on where we are with that, but what I was going to say was I don’t know how much we should be investing in Emerson, knowing that there might be some transitions later.
Me: Okay?
Them: So, yeah, in an ideal situation, I mean, all Emerson’s doing is grabbing the Walmart data from their APIs and loading it into their Snowflake, right? So depending on how that relationship plays out, it’s unclear to me whether we’d have direct access to the Walmart data or the Target data in the future.
Me: Yeah.
Them: I think that’s something Shivani and I have to feel out a little bit more. So, I guess that’s a long way of saying, let’s not put too much emphasis or weight on the Snowflake private share specifically for Emerson, just because I don’t know if that’s even something that’s super, super long term. But I think, Utam, if you were to tee up the questions you have about Emerson, let me ping them to Phil, because rather than just telling you not to hold that, let’s actually get you an answer. So if you’re like, hey, Phil, I want to understand the future state of Emerson from a data perspective, then I can tee that conversation up with him, or he can comment async. He’s pretty into async comms. So if you want to ping him or just comment, feel free to do that.
Me: Great. We have something on my desk to review, which is our overview of everything in Emerson. So as part of that, I’ll put in: this is what we found in there; does this match what you guys expect? And tell me about the relationship. What do they promise? Are we getting everything they promised? A lot of those questions. Because we can also go out into the market and see what other options there are for this identical data. I’d just love to know what the relationship is. Okay?
Them: Yeah. Cool.
Me: And then maybe just in the last few minutes: both of these fit our criteria for procurement. This is one thing, Jason, that I heard you loud and clear on: we don’t want to sign super long-term stuff; we want to have the flexibility. So in terms of the data stack, like I said, any tool is going to try to offer annual discounts, but you can totally do both of these month to month, and we can evaluate different phases of contracts. Both of these folks are very flexible. Their salespeople just want to get money, and we’ve seen them structure many types of deals to give you that optionality. In BI, by contrast, we’ve very rarely seen tools do month-to-month deals, so when we get to BI, it could be something we fight for, right? But I can tell you who’s worth fighting versus not.
Them: Right.
Me: You know, so when we get there, I can give you a little more context, but at least in the warehouse world and the ETL world, it shouldn’t be a problem to adhere to that.
Them: Right.
Me: Both of these are great on security constraints and things like that, so I wouldn’t worry too much there. So I think a next step here, as part of our top-level summary, Shivani, is that I’m going to highlight a little more clearly why two tools: both for redundancy reasons and to support the long tail of connectors. I’ll agree with Andy that supporting long-tail connectors could mean going with these or building ourselves, but that’s the ultimate net net here. What we can do is start to load data with both of these tools, and neither of them will force us to pay until they get an understanding of what our monthly expectation is. Fivetran has a 14-day trial, and we can get that extended; Polytomic similarly offers folks who work with us a month-long trial. So we won’t have to basically say, cool, here’s 10 grand, let’s start. We can slow roll into these. And then, Shivani, as soon as we get an estimate of cost, we can probably make a better decision. Does that seem like a fair way to go? It still keeps us on the timeline of trying to start landing data before Christmas for the rest of the team. The reason is that for some of these sources, like Amazon and Shopify, we’ve seen one-to-two-week timelines to land data, given the rate limits and volume limits on Shopify’s and Amazon’s APIs. So I just want to make sure we can do that before January, if possible. But we can also kick that off next month if we don’t want to rush.
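The one-to-two-week landing estimate follows from API rate limits. A back-of-envelope Python sketch, assuming a fixed page size and a steady sustained request rate; the record count, page size, and rate are hypothetical, not Shopify’s or Amazon’s published limits:

```python
# Back-of-envelope backfill estimate for a rate-limited API.
# ASSUMPTIONS: a fixed page size and a steady sustained request rate; the numbers
# below are hypothetical, not any vendor's actual limits.
import math


def backfill_days(total_records: int, records_per_request: int, requests_per_second: float) -> float:
    """Days needed to page through every historical record at a sustained request rate."""
    requests_needed = math.ceil(total_records / records_per_request)
    seconds = requests_needed / requests_per_second
    return seconds / 86_400  # seconds per day


if __name__ == "__main__":
    # Hypothetical: 50M historical records, 250 records per page, one request every 2 seconds.
    print(round(backfill_days(50_000_000, 250, 0.5), 1))  # 4.6
```

This is a lower bound; throttling backoff, retries, and multiple endpoints per source all add time in practice.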
Them: When you say land data, are we backfilling historical data from Shopify?
Me: That’s right.
Them: For instance, as an example. Okay. Okay, so you would backfill basically all of our Shopify historicals?
Me: That is correct.
Them: And that’s where, like, we’re going to get the juice, right? Like, we had a conversation with Dan today, and he was like, oh, I feel like there’s so much analysis I would want to do if I could, like, look at Amazon historicals instead of just taking, like, monthly snapshots of data. So, like, it’ll unlock. I’m excited for what it eventually unlocks.
Me: Yeah, and that’s exactly it. We’ve had clients where we’ve been able to get their entire historical Amazon data, and that enables more than seasonality analysis; for example, we have another client where we’re doing a lot of PO-related analyses. It’s really, really rich, especially at your level of order volume. There’s a ton in there. So it’ll go back and grab all of that, and then that data basically becomes yours, right? So even if we go to a different ETL tool, that data doesn’t go anywhere. At least you have a copy of all of it. And then in the data warehouse memo, we’ll talk about the cost of storage and things like that across each of the warehouse tools as well.
Them: Sounds good. Awesome. So decision-wise, I think you said next week is when we want to have that call. Is that when you expect us to have the data warehouse decision as well? Are we going to get another doc for that?
Me: You are going to get another doc for the warehouse, ideally next week as well.
Them: For BigQuery?
Me: Basically, I don’t expect us to go off kilter on the warehouse side. If we choose one of the major players, I’m not worried about either of these two tools supporting it.
Them: Okay. Yeah.
Me: But, yes, I would like to make both of those decisions before Christmas, ideally, so that our team can start to land that data in that time period. That would be amazing.
Them: Yep. We’re on the same page here. I just wanted to confirm that.
Me: Okay. Yeah, that’s exactly right. And I think on the warehouse piece, we’ll talk a lot about some of the stuff we’re seeing on the AI side, some of the interesting things about BigQuery and GA, and some of the other options. It’s a bit more nuanced. There are going to be some decisions for us to make depending on our ergonomics there. The AI stuff, Shivani, is tough because it’s moving so fast; I don’t know who’s going to end up way better than the other. And the warehouse is a decision that’s really hard to go back on.
Them: Yep. Yeah.
Me: But both are really, really great, so we’ll put a bunch of options in front of y’all.
Them: Yeah, perfect. Yeah, most of the warehouses can provide similar features, but for context, it would be nice to know if other parts of Element are considering any of the cloud providers, like GCP or AWS or anything.
Me: Yeah. I think the team has only mentioned DigitalOcean, so I don’t think there’s a push one way or another, Awaish, that would really push us toward one. We have another client who’s like, oh, we just want to procure as much as possible through AWS. And that kind of puts BigQuery off the table, because you can buy Snowflake through AWS Marketplace, and they want to procure everything through AWS Marketplace. So there are things like that, which I don’t think apply here. But we’ll lay out some of the trade-offs and we can come to a decision. Also, Shivani, I’ll put what we learned from SourceMedium about configuration and things like that into the data warehouse memo, because they’re using BigQuery under the hood.
Them: Perfect.
Me: Great. Any feedback on this doc? I think it will be a living doc as we start to make this decision. But, I don’t know, I love talking about this stuff. There’s a lot of history in each of these different parts of the stack, so as you can tell, we think about these a lot, and we’ve tried to partner with the best tools. We really don’t want to choose some of the vendors at the bottom, because they’ve been tough to work with sometimes. So if there’s any feedback on this type of meeting or this doc, let me know.
Them: Sounds great. Thank you, Utam.
Me: Okay, guys. So we’ll plan on having a very similar warehouse conversation next week. Shivani, I’ll organize that with you. And then, if there are any questions on these, on the Polytomic side, I’m happy to have their team come talk to our team and just say hi, so you can put a face to the name. Fivetran is a much more sales-intensive process. But, yeah, that’ll probably be nice, so I’ll coordinate that with you, Shivani.
Them: Okay, perfect. Thank you.
Me: Okay. All right. Thank you, everyone. Appreciate the time.
Them: Thank you. Thank you.
Me: Thank you. Talk to you soon.
Them: Bye.
