Uttam Kumaran: Hey, everyone! Jason Wu: Hey, Uttam. Uttam Kumaran: Hey, how's it going? Jason Wu: Happy New Year! Uttam Kumaran: Happy New Year! How's everything going? Jason Wu: Pretty good. Back to running full speed in our sprints. Uttam Kumaran: Good. At least no surprises, hopefully, over the break. Jason Wu: Right, not during the break. Uttam Kumaran: Alright, good. Jason Wu: Hopefully a bit of downtime as well. Uttam Kumaran: I feel like this was the first time we had a pretty good company-wide break. It's tough, because we support a lot of clients with different schedules; some clients are working all the time, some aren't. This is the first year we had pretty good coverage across everything, and it was really nice. Cool, today we can jump into things. We have a couple of topics, but the real impetus for this is that as we start to make a lot of technical decisions, we want to make sure the tech team is privy to all of them. We also have some architecture decisions to make that we want you looped into, so having this every week, or bi-weekly, will be good for batching a lot of that. I can go ahead and share, and then Awaish and I will be leading. Great. Today, I wanted to walk through a little of our current architecture, based on a lot of the decisions we've made: the source side, the ETL side, and now the data warehouse. I want to talk about the SPINS API. SPINS is another data source the team is using that we just signed an agreement for, to get API access. I'll walk through some questions we may or may not have answers to today. We'll talk about the Emerson data share and what the ask of our team is going to be. And I know Shivani brought up a topic around the Shopify Insider Bundles,
and how that's potentially going to talk from Shopify to NetSuite. I also wanted to have a conversation about app integrations versus reporting: what we should have the reporting infrastructure handle, and what we may need direct integrations for. And then we've already started some modeling work, so I want to give you insight into what we shared before, which is, now that we've landed some data, how we're driving towards a data model. So let me go ahead and walk through the current architecture. Give me a sec. This is pretty similar to what was there before; I'll probably start saving snapshots of this so you can see the progression. But a couple of things are new here. One is we made a decision on a data ingestion tool; before, this was just an icon. The green checkmarks are confirmed and in place, and anything with a cog is in progress; I can explain what "in progress" means depending on the phase. Starting from the left, we have all of our core raw data sources. As we identify future sources, we'll start adding them here. NetSuite, I know, doesn't exist in its final form yet for us to consume, but it's on our radar. As we identify more of these sources, for example some of the information needed for marketing and for wholesale, we'll add those there. Right now, we're working on Shopify, Amazon, Walmart, Recharge, Emerson, and SPINS: everything we have access to from that side of the business. In terms of what's in progress: right now Shopify and Recharge are coming in through Polytomic, and the Amazon and Walmart connectors are being built. Awaish, maybe we can talk a little about what's remaining there. I believe everybody on the team has access to the Polytomic UI. Have you logged in in the last two weeks?
Maybe I can also log in and show you what it looks like in the UI, if that's helpful. Andy Weist: Quick question: where does our 3PL fit in the grand scheme of things? Are we trying to integrate that data source early on, or are we pushing that for later? Uttam Kumaran: It is somewhat looped into that integration question, but we need to bring it in. Where2Go is the other one that's not represented here; I missed putting that on, but we haven't talked yet about... Andy Weist: Stored replaced Where2Go; you can ignore Where2Go. For all intents and purposes, as of next week, we're only on Stored. Uttam Kumaran: So for Where2Go, then, it may just be a historical backfill that we'll need? Andy Weist: Yep, yep. Uttam Kumaran: And then Stored. But does Stored have all the 3PL data, or will that come from the 3PLs themselves? Andy Weist: Going forward, Stored will be our only 3PL for D2C. They'll be the source of truth for all Shopify-fulfilled orders, as well as some others. Like you said, Where2Go will be a historical backfill. Steve, I don't remember where we're at; they were going to give us a database dump of all the Where2Go data, so it may be very easy for us to push that in. We need to circle back on that with them. But as of next week, going forward, Stored will be our only fulfillment partner. They have an API we can integrate with, etc. Uttam Kumaran: If there was an email about this, or you want to toss us in there, feel free; we can push this along. We'll just make sure a copy is sitting in Snowflake, because what we'll end up doing is stitching these together in the data model to show you the historical data set. Cool, then Stored is something we'll add to the list.
Other than that, if you log into the Polytomic UI, you're going to see these connectors. We can go deeper into that if it's of interest, but otherwise, we've added the API key. When Awaish goes through his demo, you'll see the representation of the data in Snowflake, as well as how we did the naming conventions on the schemas; you'll see those pieces. Right now, I'm waiting to hear from Awaish and team on when these are totally done, so we can verify we have all the information. There was one access requirement remaining on Shopify, and there was an API issue that Polytomic is talking to Recharge about, but ultimately I expect these to move to green any time now. Awaish, anything on any of these that we want to follow up on? Awaish Kumar: For Shopify, we got the access, because now I'm seeing some rows coming in for the transactions data. For Recharge, it's still pending from the Recharge side; there are some bugs Polytomic is seeing, but there's nothing we need from tech right now on those two. On Walmart, we still don't have access to the dashboard, and secondly, there's the Walmart e-comm side. Apart from that, we're good. Polytomic is working on the Amazon and Where2Go connectors. And now that you've replied about the SPINS API, we can start to think about implementing the SPINS pipeline to bring in that data. Uttam Kumaran: Do we need anything remaining on Shopify? So Shopify's good, and then we do need access to the Walmart e-commerce UI. Awaish Kumar: For e-commerce, we need UI access, but we're also not clear yet on what Polytomic needs to pull in the data; we're still figuring that out. Jason Wu: Is the Walmart stuff the Snowflake access that we gave you to the Walmart tables, or is that something else? Uttam Kumaran: Two separate things; the Emerson stuff is all retail.
Walmart e-com will typically be through a portal. Who's working with you on that one? Carlos? Awaish Kumar: Carlos is the one. Uttam Kumaran: And then SPINS: we're already building a SPINS pipeline for another client, and Ashwini is working on that. I have a slide on what we'll be looking at there, and we can talk about it when we get to that topic, but maybe I'll just continue through. Shivani Amar: Sorry, I was raising my hand, but I'll just speak. My question on Shopify: I know we chose Polytomic, and maybe this is an annoying question, but on the speed to ingest the data, is it something where Fivetran would have been faster? I don't know how these things work. Uttam Kumaran: It's a Shopify rate limit. Shivani Amar: There's a Shopify rate limit. Thank you. I was just checking that we chose a good ETL, right? Uttam Kumaran: All of these sources have various rate limits, and you sell a lot of stuff, so it will take a while. We have customers where it could take a week; we have other folks on a very similar timeline, a few weeks. The nice thing is the shape of the schema is already there, so we're already continuing modeling; it doesn't restrict us from starting to model. Shivani Amar: That's awesome. Uttam Kumaran: Think of a bucket getting filled, but we already know the shape of the bucket. Shivani Amar: It's just pulling older data; it's not necessarily pulling... Uttam Kumaran: I don't know, Awaish, which direction they started in, but either way, we already have the shape and all of the columns, so we're already starting modeling. But each of these rate limits is just the default; you can't really do much about them.
Polytomic does have specific arrangements, because they move a lot of data from a lot of these vendors; they'll always go and ask, hey, we're moving a huge number of rows, can you give us exclusive API access? But Shopify just doesn't do that for anybody. Shivani Amar: I know you have this on your agenda for later, but while we're on the Shopify topic, can we talk about the Insider Bundle piece? Uttam Kumaran: Yes, please. Shivani Amar: Steve, you might have some context here around what Jeff Warren is looking into. He was pinging me, letting me know he was meeting with you yesterday, Steve. I don't know if you have context or want to discuss that topic here. Steve Sizer: Jeff was looking into trying to find or get some location data and inventory data for the distribution arm of Element. He's currently in discussions with Encompass on trying to find a solution for that. The API access isn't available to us unless we do a certification, so they're looking into hiring a professional services team to take on that work. Shivani Amar: And the question they're trying to answer is around specific SKUs we sell and where. Steve Sizer: They're just trying to get access to better information on the local distribution: Bozeman, and now it's going to be Austin. What inventory are we selling in those smaller shopping or retail settings? Shivani Amar: And Uttam, when you hear that, are you... Uttam Kumaran: This seems like it's in our world. Shivani Amar: That's where I was getting confused, Steve, because he pinged me about it. Uttam Kumaran: I heard two things in what you mentioned, Shivani. There's one thing about moving data, splitting data up in Shopify, and then moving it to NetSuite. And then this is almost still about insights and reporting. Shivani Amar: Steve, do you have more context behind it?
We can also just talk to Jeff directly with Brainforge, if that helps resolve it. Let me see if he's free. I was just wondering why we're adding an app when we're going to get all of this granular data anyway. Uttam Kumaran: You mentioned Trackstar, and we just talked about Stored on the 3PL side; we'll get the 3PL data from that. Steve Sizer: Encompass is a separate system, and the distribution arm is almost a separate entity to Element. That's how they manage all their inventory: through a separate system. Uttam Kumaran: Encompass looks like some type of ERP solution; I know there are a bunch of these. Ultimately, if they're trying to get data out of it, and that data is going to be helpful for the business in the future, we should just try to run it through this system, if that's all I'm hearing. It's not at all dissimilar to what we're doing here. Shivani Amar: I just pinged Jeff saying, if you want to hop into the Zoom, you're welcome to, because this was just an open question for me. And Jason, feel free to weigh in on this. I wasn't sure if people need more education across the business around the future state of the thing. When Jeff was pinging me, he was saying something like, "but when Shopify is linked to NetSuite," and I thought maybe I need to be educating people more on the vision that eventually everything's connected to Snowflake, and the data moves in and out of Snowflake, rather than us connecting everything on the outside. That was the takeaway for me when I was talking to Jeff, because I wasn't quite sure if he's just looking for an app to make some other connection point easy when we could do that through this data stack. Do you have any thoughts, Jason? Jason Wu: We just have to have a quick level-set. I haven't been part of those conversations with Jeff
about what he was looking for, but from what you just described, it sounds like we might be doing a bit of overlap. Shivani Amar: Yeah, it just felt redundant. Great, I've invited him; if he's free, he can join. Uttam Kumaran: Cool, we can get to the bottom of it. Typically there are a couple of use cases. Sometimes people are just trying to get data out of a system to report on it; that's all we're doing here. Sometimes people are saying: when an order comes into Shopify, I need it to talk to NetSuite and do some logic in between, and that needs to happen when the order comes in. That is not something that you should... Shivani Amar: ...handle through your reporting system, because there are delays. If that's what he's looking for, that makes much more sense to me. If he's saying, I need it for handling the order, then it shouldn't depend on a data warehouse that can be six hours delayed. Uttam Kumaran: We will still get all of that information, because we're going to bring NetSuite and everything else in here, but it's not for transactional or operational flows like the order fulfillment process, right? Shivani Amar: That's a very helpful distinction. Andy Weist: I need some context from Jeff as well, but my understanding was that none of our fulfillment process will be reliant on NetSuite anyway, so I don't know that we should assume NetSuite needs to be more real-time than our BI. Jason Wu: I don't think so either. That's where it's maybe a little bit of a game of telephone; I'm hoping Jeff just joins and explains what he's thinking. Andy Weist: This is more context just for you, Uttam, to make sure you have the same understanding that I do. Uttam Kumaran: Understood. Andy Weist: That opinion could change. But historically, we have talked about NetSuite not needing to be real-time.
I come from the opinion that having NetSuite follow our Snowflake instance would be fine, the way we're looking at things today. Let's just keep that door open, because that may be a cleaner implementation of both of these projects, and I just want to make sure we do things as simply as possible. Uttam Kumaran: The reason I ask is that people use NetSuite for a whole host of things; NetSuite's a catch-all with a thousand products. Andy Weist: We're not managing inventory or anything in there. Honestly, the first iteration is mostly going to be for accounting purposes, so latency is really not an issue with our NetSuite implementation, as I currently understand it. Uttam Kumaran: That's something, Awaish, for our data source docs: let's put down the core use cases for each of these. That way, especially for NetSuite, where there's a suite of tools, we indicate the core reason we're pulling it. And then, once things move through Polytomic, they end up in raw. Again, these are all logical abstractions of data; raw is not a separate system, it's just a database. What this mainly indicates is how we logically separate different parts of our modeling process. All of the data gets landed; we then clean it up and combine it, for example combining Shopify and Amazon and producing one orders table, which happens in marts. A couple of things we've done here: GitHub is set up, dbt is set up, and Awaish will be walking you through one of those end-to-end processes today on some Shopify data. And then, as we move to the right, this is where the data, as we usually say, gets activated: it gets sent into a visualization tool, a BI tool.
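As a concrete sketch of the raw-to-intermediate-to-marts layering Uttam describes, a dbt mart combining Shopify and Amazon orders might look roughly like the following. The model names, column names, and file path are illustrative assumptions, not the team's actual schema:

```sql
-- models/marts/fct_orders.sql (hypothetical name)
-- Union cleaned Shopify and Amazon orders into one combined orders table.
-- Each source is standardized to the same column contract upstream,
-- in the intermediate layer, so the mart stays a simple union.
with shopify as (
    select
        order_id,
        'shopify' as channel,
        ordered_at,
        total_amount
    from {{ ref('int_shopify__orders') }}
),
amazon as (
    select
        order_id,
        'amazon' as channel,
        ordered_at,
        total_amount
    from {{ ref('int_amazon__orders') }}
)
select * from shopify
union all
select * from amazon
```

The design choice worth noting is that all source-specific cleanup (renames, currency normalization, deduplication) stays in the intermediate models, so adding a new channel later means adding one more CTE to the union rather than rewriting the mart.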
It maybe gets used downstream by AI agents in some way (this is new as of the last year), and additionally, it may get sent to a CRM or a go-to-market tool. A common use case here: folks want to leverage high-LTV customers to create a high-LTV Klaviyo campaign, and the Klaviyo folks need some of that modeled data from Snowflake, so we move it into Klaviyo for them to build more targeted campaigns with that rich data. That's just one use case for moving data out of Snowflake into a tool where there's an operational use for it. The reason you wouldn't do all of that in Klaviyo is that Klaviyo doesn't have modeling capabilities to this degree: combining a lot of data and doing really sophisticated LTV or other analysis is not possible in Klaviyo, so you send it to Snowflake, process it there, and move it back for targeting. So that's the state of the architecture. What we'll see over the next while is more sources moving to green and getting checked off. Additionally, we'll start building marts here. I would say this document is not going to be the ultimate source of truth for what tables exist; this will stay more visual. We'll end up with quite a lot of tables, and that spreadsheet, plus some documentation that comes out of dbt, will become the source of truth for what's in the warehouse. But we will maintain this diagram to the best of our ability, because it's the snapshot of the process. I saw Jeff is here. Do we want to switch back to that topic really quick? Shivani Amar: We could also do a separate conversation, Jeff.
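The high-LTV Klaviyo use case above could be backed by a model like this minimal sketch. The table name, columns, and revenue threshold are all assumptions for illustration; a reverse-ETL sync (e.g. through Polytomic) would then push the resulting rows into Klaviyo as a segment:

```sql
-- Hypothetical segment model: customers whose lifetime revenue clears a cutoff.
-- The output table is what a reverse-ETL sync would move into Klaviyo.
select
    customer_id,
    email,
    sum(total_amount) as lifetime_revenue
from fct_orders                        -- combined orders mart (illustrative name)
group by customer_id, email
having sum(total_amount) >= 500        -- assumed LTV cutoff, not a real figure
```

The point of the pattern is exactly what Uttam says: the heavy aggregation happens in Snowflake, and Klaviyo only receives the final, already-computed segment.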
The thing that came up, that I brought into the flow, was this: we were talking about how Shopify data syncs into our warehouse, and we were trying to understand whether the app you're exploring is needed. But we're all lacking a little context around the app, and maybe we could do a separate call to understand it better. Jeff Warren: Andy and Steve have all of the context from me, and much better knowledge and thoughts on how to execute this. I feel I can be pulled out of the loop on this, because they've got it from here and know where they're going to put it in the workflow. Andy Weist: Let's just clarify; I heard a couple of different things regarding what Jeff is looking at. One thing, from our conversation the other day, was based on our product bundling, meaning one Insider SKU contains multiple products, and how we're handling that in Shopify and how it translates to actual fulfilled items in an order in our warehouse. That's the app we're talking about, right? Jeff, I know that's the app we talked about. Shivani, does that make sense as far as what you brought up earlier? Shivani Amar: Jeff had just pinged me saying he was looking into an app that could help us disaggregate what an Insider Bundle is, and my confusion was: isn't that what the data warehouse will do? Our data warehouse will have that granular data. My question was whether it's needed immediately, or whether it's something our data team can just report on, here's what the breakdown is, so you don't need another app layered into the system. I just didn't totally understand the use case. Andy Weist: Then Jeff, you're good. I have enough context to make sure I push this forward in the right way. Thanks for the clarification. Shivani Amar: Thank you, Jeff. Jeff Warren: Shivani, maybe just to say it in layperson language: the issue is less about disaggregating it into the warehouse software; that's already happening.
They're using a middleware to do that. It's about making sure it makes it into QuickBooks the same way, so there's syncing across the process and you can see: this Insider Bundle was made up of these three items, here are the COGS for them, and then, because it was sold as an Insider Bundle, instead of it being $27, there's $3 of trade spend going against each one of these as an Insider Bundle trade-spend line item. Cool. Sweet! Shivani Amar: Thank you. Jeff Warren: Catch y'all later. Shivani Amar: Perfect. Uttam Kumaran: Let's go back to our deck, then. Cool. We just got access to SPINS; I signed the agreement today. But Shivani, as I was mentioning, we're already doing SPINS work for another client, which is why we have some of the docs and are familiar with it. I'm interested in these questions: are we already pulling any data? And if not, what data do we want? There are a lot of different endpoints. If we don't know, we can also go explore and come back with what we find. Shivani Amar: From my understanding, people are looking at SPINS reports, and you guys weigh in if you know of any other data being pulled. Our self-distribution team is looking at geographic data, because that's the place where geographic data is clearest in our business right now, to try and figure out where we would want to do self-distribution. People are going into SPINS and looking at the SPINS dashboards. People are looking at the velocity of sparkling relative to other brands: how are we performing relative to them? I'll have someone send you one of the SPINS reports. People are looking at the ready-to-drink section: how are we performing relative to Red Bull and whatever else? So that stuff is these exports or reports, versus people cutting the data or anything like that.
Uttam Kumaran: Ashwini, maybe what we can do is just see what endpoints we end up with access to, and in this situation we'll profile and give an output of what we find. Then we can decide whether to take everything we find or limit it, right? Ashwini Sharma: Maybe, if you could find out: what's the smallest grain of data that you look at in a SPINS report? Is it a week's worth of data at the lowest level, or do you never look at a week's worth of data and just look at a month, or maybe four weeks, or a quarter? Shivani Amar: Jason, do you know anything there about how it's being used today? Jason Wu: I don't; this'll all be Will and Russell. Shivani Amar: Honestly... Jason Wu: Is the question, maybe the question behind the question, that we don't want to pull... Shivani Amar: ...unnecessarily large volumes of data if we don't need to. Uttam Kumaran: There's just a bunch of different APIs. By default, we can take everything, but it may not end up useful. Shivani Amar: Let's think about the business question for a second, right? There's a business question around... Uttam Kumaran: I'll bring up the endpoint so we can talk specifics. These are the docs that we have. We will get access to this, which is Store Insights, right, Ashwini? And we have to decide: of these 460-some attributes, what do we want? Ashwini Sharma: 465 for the other client that I'm working on. Shivani Amar: Let me start with the business question again, because this is a little overwhelming to look at. My question is: if I'm trying to think about my product's velocity, I'm already getting some version of that from Emerson, right? Am I getting point-of-sale data from Emerson?
I'm getting a sense of my velocity from Emerson, but then... Uttam Kumaran: But you're not getting benchmarks. Shivani Amar: Right, I'm not getting benchmarks from Emerson, which is the point of SPINS. But am I getting, from Emerson... I'm getting velocity, and... sorry, you just said something and I got thrown. Uttam Kumaran: You're getting velocity and sales, by market, right? So many stores, how many... Shivani Amar: I'm getting there, I just need one second. Are you getting point of sale from SPINS? That's what I'm trying to understand: are you getting point of sale from both? Uttam Kumaran: That's what we'll have to go in and see: what Element is using SPINS for. I'm not sure. Ashwini, you just looked at it today. Ashwini Sharma: The SPINS data is more of a report. It's not single-grain, where you can track each order; instead, what you see is a week's worth of data. For example: what was the number of units sold for this particular product and brand in the last week? That's the lowest grain of data you can see, and that's where my question came from: do you look at it at that level, or is it more like you never need a daily view of how things are changing and just see weekly data? A week is the finest we can go; otherwise it's a month, a quarter, or 52 weeks. Uttam Kumaran: But Ashwini, are you able to pull... you have to pick the dimensions that you need? Ashwini Sharma: Yes, those attributes are the dimensions. If you look there, there are attributes and measures; the measures are the quantities you see on the report, and those measures are aggregated to 1 week, 2 weeks, or 4 weeks, up to 52 weeks. Shivani Amar: Tomorrow you're doing a discovery call with retail, right?
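Given Ashwini's description, SPINS measures arrive pre-aggregated to weekly or coarser periods and sliced by attribute dimensions, so a raw landing table for the pipeline might be shaped something like the sketch below. Every name and column here is an assumption pending the actual API schema, not the real SPINS layout:

```sql
-- Hypothetical landing table for SPINS data:
-- one row per product x market x reporting period.
create table if not exists raw_spins.weekly_facts (
    brand            varchar,
    product_upc      varchar,
    market           varchar,   -- geography / retailer grouping (a dimension)
    period_end_date  date,      -- week-ending date; one week is the lowest grain
    period_weeks     integer,   -- 1, 2, 4, 12, or 52-week aggregate (a measure window)
    units_sold       number,    -- measure
    dollar_sales     number     -- measure
);
```

Keeping the aggregation window (`period_weeks`) as an explicit column would let the team land all the report variants in one table and filter at modeling time, rather than maintaining a separate table per window.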
And since tomorrow you're doing a discovery call with retail, we can ask some questions around: how often are you looking at SPINS data, and what are you using it for? My sense is that if Emerson is giving me point-of-sale data, telling me how many transactions are happening regularly in a Target and elsewhere, then SPINS is more for benchmarking. And if SPINS is more for benchmarking, I feel we could do that benchmarking on a monthly basis, as opposed to a daily or weekly transaction feed, if that helps limit unnecessary volumes of data. That's my intuition: you can go monthly with SPINS if the point of sale is in Emerson. Let's clarify that, and then we can double-check that hypothesis with Russell tomorrow. Awaish Kumar: The data is available in Emerson for Walmart and Target. Uttam Kumaran: Let me note that down: tomorrow we'll ask about Emerson for retail and SPINS for benchmarks. Ashwini Sharma: And do we have the client ID and client secret to explore the API? Uttam Kumaran: We don't have anything yet; we just signed the agreement today. Awaish Kumar: But we saw in retail there were other stores beyond Walmart and Target, and that data is not coming from Emerson. Uttam Kumaran: Hmm, maybe SPINS has non-Walmart/Target... Shivani Amar: POS. Oh. Jason Wu: This is all the industry data, right? Shivani Amar: Yes, but this must have our point of sale. If we're not getting it there, how are we getting point of sale for Vitamin Shoppe? Probably in SPINS, Jason? Jason Wu: That's a separate report that they just sent us. Steve Sizer: They send a report to Jessica. Shivani Amar: Oh, then Uttam, when you do the discovery call for retail tomorrow, let's find out... Uttam Kumaran: What other... Shivani Amar: There are other retailers that we need to get. Jason Wu: We need to get that stuff into our warehouse.
My understanding of SPINS is that it's entirely benchmark: how's the industry performing, and how are we relative to our competitors? Shivani Amar: One thing on SPINS that I was talking to Will about on Monday was pricing. Especially in this world for a sparkling beverage right now, we're still trying to figure out our pricing. Phil said: can't you just get pricing from SPINS? Why are we having people on our team go around taking pictures when they go to Costco, Walmart, and HT? But Will, the head of revenue, or commercial, says: SPINS is going to give me an average, and I want to understand how Costco prices us, or sorry, how Walmart prices us in South Side Chicago versus Beverly Hills, because they might have a price variance. He says, I don't love SPINS because it's just giving me an average. So anyway, he still wants to deploy people to take these pictures. So pricing data is something we would want to look at, but also take with a grain of salt, not as the source of truth for how we think about pricing our sparkling beverage. Uttam Kumaran: Again, maybe that is a... go ahead, go ahead. Shivani Amar: Eventually, when we settle at a price we're all feeling good about for sparkling, maybe this will be done. I'm sure you benchmark over and over in life, but the exercise right now is very much: are we pricing it right? We're trying to be a premium beverage, but are we pricing too high? How are we feeling about our velocity?
And all of that is people looking at the SPINS data now to say: our velocity is down, but it's also winter; are we overpriced relative to this? We don't have high confidence right now, I would say, on how we're pricing and how our velocity is going with the sparkling beverages. Uttam Kumaran: What I'll do is also validate whether that's just a limitation of what they see through the UI, and whether the API ends up having more. Then we'll put together a one-pager on what Element has access to as part of SPINS via the API. Shivani Amar: Once you put that one-pager together, we can pressure-test it with Will and Phil, send it out as a document, and they can weigh in. You can pick Russell's brain tomorrow, but that's spot-on: have people weigh in on, I really want this data versus not. Uttam Kumaran: Great. Shivani Amar: Thank you. Uttam Kumaran: Let me share this again. Cool, so we have some questions for retail, and I'll add this to our agenda for tomorrow. We'll get an understanding, and, since this is a huge insight, we'll ask them about all the other sources of retail or retail-distribution data they're getting outside of Emerson and SPINS. We'll try to get a list, and we'll see what the damage is. When I hear this, it's pretty classic; it's not a surprise. The problem is that supporting the nth retailer and the nth retailer's format is the difficulty here. Most likely, if we get a list of, say, 30 other people sending CSVs and PDFs, we'll prioritize by revenue or by immediate need for reporting. If Joe's Crab Shack is selling Element on the beach and they're selling 10 units, it may not be the highest priority, but Vitamin Shoppe, I'm guessing, is a big one. And then on the Emerson side, this is something, Jason, probably a question for you.
We have Emerson's Snowflake instance, and we now have our own Snowflake instance. You can't re-share someone else's share, so we just need them to share with our account now. We can provide you with all of the necessary details; they just need our account ID and a couple of things. We can probably draft that email, and if you want to loop us in, we can talk to them. I don't think they'll have any problem based on the last email they sent; they seem pretty sophisticated. Jason Wu: If you can give me the draft and the credentials, I'll get that going. Uttam Kumaran: Great. Awaish, anything else there? Awaish Kumar: No, that's it. Uttam Kumaran: We already talked about this one, and then maybe, Awaish, I can hand it to you for walking through dbt. Awaish Kumar: Sure, I can share my screen. I can just give an overview of Polytomic. This is how it looks. We have connections to Recharge, Shopify, and the Snowflake warehouse. These are the bulk syncs, which are syncing our data from Shopify to Snowflake, and in the history you can see how each one is doing. We can see here that it is taking some time for the initial sync, and that's the reason we don't have order line data yet. Uttam Kumaran: Maybe I can pause here, Awaish. If you scroll, see how it's hitting around 2.26 million? That's a Shopify cap: typically, within a given window, you can only hit the API so many times. Awaish Kumar: Of course, it'll be something on the order of milliseconds; you can only hit the API so many times and get a certain payload. Uttam Kumaran: Since we kicked it off, we've been landing data consistently. The 2 million doesn't mean anything by itself; from the API side, they don't care much about which tables, they just see it as an amount of records. For us, we are pulling specific tables. Awaish, if you want to click into one of those syncs to show the team exactly what's in one of them...
These are all the objects that we're getting. Awaish Kumar: And each run shows which table was synced and what data we pulled in for it; it just shows all of that in here. We can see that the orders table has all the data now, but the order line table, if we just look at this, is still in progress. That's what we are missing. Once it is done, we will have the next incremental runs, and it should be just fine. Now, moving on to the dbt part: in our BrainForge assessment repo, we have added a dbt project folder, which contains the dbt project. It is structured in the standard way dbt structures its projects. In here, we have the models folder, where we normally work, and then there's a folder called macros, where we place some SQL that standardizes our business context — if we want some custom naming, or a piece of code we want to reuse in multiple models, that's what we put in there. Then, if we go back to models, this is how we are structuring it: we have raw, intermediate, and marts. Raw is just the SQL queries that mirror the structure of the tables inside the data warehouse, which is Snowflake. Right now we have two sources, Shopify and Recharge. If we go into Shopify, these are all the tables that are syncing for Shopify, and if I open any one of them, say raw customers, these are all the columns we are getting through Polytomic for all the Shopify customers. Normally, at this point, the raw layer is just a one-to-one mapping with whatever is in the database. Then we go to intermediate, where we start to model things and do some transformations or cleanups. I've started adding some models there:
dim_users and the orders. Here I'm trying to get all the fields we can get from the Shopify customers table, and then on top of it we add some more fields from orders: for each customer, how much total revenue was generated by this customer, how many wholesale orders, the lifetime wholesale revenue for this specific customer, and the lifetime retail revenue, which is everything that is not wholesale. Uttam Kumaran: If we could pause here, even one step out of this: most of our work here is going to be writing a ton of SQL like this. This is where we live as analytics engineers; we're just writing data models. As you see here, there are several types of transformations: we're doing count distincts, we're doing sums, but we're also doing logic. For example, we have sum of total price, which is lifetime total revenue. In addition, the business may request: hey, in addition to total revenue, I want to see a column for wholesale revenue and retail revenue. We forecast that that will be a question, so we go ahead and build those columns. You see here we say: case when is wholesale — is_wholesale is a boolean — sum the total price where wholesale is true, and put that into this bucket. These are all the types of transformations, and I would say these are fairly simple ones. The complex transformations come when we get into subscriptions and churn: we'll be looking at net new subscriptions, expansion, reductions, churn, cancellations, resurrections. There are a lot of complex modeling paradigms we'll be doing. But of course, a lot of the complication is stitching systems together. What you're seeing is just something for Shopify users. The Amazon customers table looks a lot different and has different columns.
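[A minimal sketch of the kind of dim_users model being described here. Table and column names are illustrative assumptions, not the actual raw Shopify schema:]

```sql
-- dim_users.sql — illustrative dbt model; names are hypothetical
select
    c.customer_id,
    c.email,
    count(distinct o.order_id)                                      as lifetime_order_count,
    sum(coalesce(o.total_price, 0))                                 as lifetime_total_revenue,
    -- split total revenue into wholesale vs. retail using a boolean flag
    sum(case when o.is_wholesale then o.total_price else 0 end)     as lifetime_wholesale_revenue,
    sum(case when not o.is_wholesale then o.total_price else 0 end) as lifetime_retail_revenue
from {{ ref('raw_shopify_customers') }} c
left join {{ ref('raw_shopify_orders') }} o
    on o.customer_id = c.customer_id
group by c.customer_id, c.email
```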
The source columns, although you may see the names and assume they're defined the same way, will be defined differently. We have worked with some of this data before, so we skipped past some of the figuring out, but normally you have to go read the docs and understand how Shopify describes an order, how Amazon describes an order, and how we should stitch those together in SQL. Andy Weist: Do these run based on an event, a time, a cron schedule, or at query time? The question is, what's the latency of something like this? Awaish Kumar: I can show you. For this to run, I have set up CI/CD in GitHub Actions. Right now I have added two different workflows. One is PR validation: whenever we push new things, it just tests whether they work, and whenever we merge, it runs that workflow as well. The second one runs in production on a schedule, hourly right now, but it depends on how we want it to be. It can be moved to refreshing all the data once a day, or twice a day; we can just set that up. Andy Weist: So it's scheduled, and the source of the scheduling is GitHub Actions. That makes sense. Cool. Uttam Kumaran: And the ordering happens when you run dbt run: it runs a compile step, figures out that this model references that model, because it's all Jinja templated, and then runs them in order. But running a dbt model doesn't necessarily mean it will get materialized in the warehouse. Materialization means you materialize a view, or you materialize a table. A view is a query that gets run at query time. A materialized table you can think of as a fixed table that queries run against directly.
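[The compile-time ordering and the view-versus-table distinction described above can be illustrated with a hypothetical pair of models. dbt reads the `{{ ref() }}` calls to build the dependency graph, and the `config` block picks the materialization; file and column names here are assumptions:]

```sql
-- int_shopify_orders.sql — kept as a view: computed at query time, nothing stored
{{ config(materialized='view') }}
select order_id, customer_id, total_price
from {{ ref('raw_shopify_orders') }}

-- dim_users.sql (separate file) — written out as a physical table in Snowflake
{{ config(materialized='table') }}
select customer_id, sum(total_price) as lifetime_total_revenue
from {{ ref('int_shopify_orders') }}  -- this ref tells dbt to build the view first
group by customer_id
```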
That is configured at the model level. If you go back to that table, Awaish, you can just show the config at the top. Awaish Kumar: Oh, the macro. Uttam Kumaran: Or just show the top, where you put in the table config for the dim_users table. Yep, the materialization strategy is table, meaning it will get written as a table in Snowflake. There are reasons for having views versus tables. Sometimes, if a table is directly similar to another table, you may not want to materialize the whole thing. Think about one table that's 100 million rows and another table that just changes one column from that: you may not want to materialize that again, because now you have 200 million rows. Another common materialization strategy is ephemeral. This is not a view; it's just creating a dbt model as a way to represent almost a CTE, a common table expression. If a dbt model gets to a thousand lines, it's really hard to read, and we commonly split things up. The last one is incremental. Again, we may end up with tables with hundreds of millions of rows. You don't want to drop and rerun those every time; you instead just want to add the latest orders, the latest transactions. Those are incremental. So there are different configuration types that dbt helps you manage through this templating. Awaish Kumar: If I show this in Snowflake, this is how it is structured now. The data from Polytomic goes into this raw database, and our queries in the raw folder just reflect the structure of what is in there. Apart from that, we have intermediate and then marts. If you see, we have two databases, Prod Intermediate and Prod Marts, where prod is the environment.
The first part of the name is the environment name, and the second part is just the folder name. Anything you find in the intermediate folder goes into the intermediate database, and all the models in marts create tables in the marts database. In marts, we have the customers mart, and in the customers mart we have a dim_users table. Similarly, for intermediate, we have Shopify, and for Shopify we created two different tables. It's the same structure as you see in the GitHub code. The first chunk shows the environment: if it is running in production, you will see prod; if we are running it in our local dev environment for testing, it will end up in the dev databases; and our CI/CD pipeline has a staging environment, for which it will end up in the stg databases. All the other folders become schemas, and inside a folder, if there's a model, it becomes a table in Snowflake. Uttam Kumaran: So there's this matching of the repo structure to schemas and tables. But that is all optional; you don't have to do that to run dbt. After doing dbt for a long time, though, it's just very nice to keep things organized like this. This is more ergonomics: we match the first-layer folder structure to schemas so it's easy to go from Snowflake back to the repo. Awaish Kumar: Just to add: the transformations we did here are just for Shopify, so it looks really straightforward now. It becomes really complex when we have multiple sources. When we have data from Amazon and data from Walmart, all of that needs to be joined, and we won't have just one model, dim_customers.
Then we might generate fact orders, fact order lines, and then some summary models: we need an order summary, or a product summary, things like that, and that will just end up being more and more models in there. Steve Sizer: With that processing of the data, you're saying it runs on a schedule of every hour. How does it know what it's already processed, and where it needs to start again for the next run? How do you map them? Awaish Kumar: That is handled in dbt. There are multiple different configurations for each dbt model. Right now we have a really simple model that I'm just materializing as a table, but there are different materializations, and one is called incremental. If we use that materialization, we use the primary key of the table to figure out what has already been processed and what still needs to be processed. We will have a primary key, and we will have a last-updated timestamp. Using those two columns, dbt figures out: we already processed this data, and I only need to process the last seven days; and within that, if we have hundreds of the same customers, it will use the upsert, the merge strategy: it will update the fields that need to be updated, or insert the new ones. Normally we will be using the incremental strategy, because that's how we can optimize our processing time and cost. Uttam Kumaran: And that's just one of the common ways. Again, you're going to see that we'll end up with hundreds of models, and running and materializing all of those, a lot of which will become big, is a common way that dbt projects get bloated. So from the start, we do incremental, we use ephemeral, and we just try to stay organized, because the sprawl will get really, really big here.
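[The incremental behavior Awaish describes — primary key plus last-updated timestamp, with a merge/upsert on re-processed rows — can be sketched roughly like this; all names are illustrative:]

```sql
-- fct_orders.sql — illustrative incremental dbt model
{{ config(
    materialized='incremental',
    unique_key='order_id',          -- primary key used to merge: update matches, insert new rows
    incremental_strategy='merge'
) }}

select order_id, customer_id, total_price, updated_at
from {{ ref('raw_shopify_orders') }}

{% if is_incremental() %}
  -- on incremental runs, only scan rows newer than what is already in the target table
  where updated_at > (select max(updated_at) from {{ this }})
{% endif %}
```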
I know we're at time. As we start to do more modeling, in more of our weekly tech syncs we'll talk about the logic we're implementing. We'll never stop talking about ingesting new sources, but ideally we'll push as much through Polytomic as we can, so most of our conversation will be about modeling logic, and as we get into the business intelligence layer and AI, that's where these conversations will go as well. Happy to answer any questions about dbt or how to do modeling, and we hope that everybody can get into the repo and start pushing stuff, too. Jason, I'll follow up with the Emerson email. Shivani, I'm going to update the retail agenda with the SPINS items for tomorrow. Perfect. I can send that, and that's it. Let me know if there's anything else. And now that we're starting to land data, we're updating the Gantt chart with all the actual data models that we're building, the core output data models. We'll have some of that ready tomorrow. Shivani Amar: Sounds good, thank you. Jason Wu: Thank you. Uttam Kumaran: Thanks, everyone. Shivani Amar: Thank you. Uttam Kumaran: Perfect. Talk soon.