Meeting Title: Brainforge x CTA: Weekly! Date: Dec 12 Meeting participants: Katherine Bayless, Ashwini Sharma
Transcript:
Me: Hello? Sorry, just, like, distracted, like, right before the meeting.
Them: No worries. I actually was kind of in the same boat. I was just about to, like, send a report over to a colleague, and I was like, oh, I gotta get on this call. And so now I am still sending it.
Me: Cool. Yeah, I’ve been doing a lot in Cursor this week, and we’ve been using it a lot for writing, so I’m, like, exploring a couple different things in there. So, yeah, just got distracted.
Them: I would still love to do a Cursor demo at some point.
Me: Yeah. I think we have a couple of things. The focus of this meeting is really just to get a plan for this month, and then talk a little bit about, you know, the next two. A couple things I wanted to go over today: one, we started to do some discovery internally, like putting together a discovery plan for those two work streams we talked about with you and Jay. We’re also still continuing to work through our initial data work stream, which Ashwini is leading. So maybe we could just talk there, I guess. Katherine, how do you see the priorities? For us, I feel really comfortable with the existing work stream, in terms of what we’re doing so far. I think we can actually talk a little bit about some of the short-term things, like moving some of those scripts to dbt. But I feel like we’re on track. Polytomic should come back to us, you know, early next week with some insight into what they can land, and then we can start moving on that.
Them: Yeah.
Me: And then, probably by mid next week, we should have a plan for discovery around both of those two issues, you know, the Shopify issue and the Okta issue. How do you think about prioritizing across those three? Is it dependent on what we find out, or how do you think about those three in terms of priorities?
Them: So, let’s see. Well, I will say I learned that apparently we do actually use Shopify to sell sponsorships for CES.
Me: Yeah. Okay?
Them: Which is fascinating. Not really sure why or how or what that’s about, but I literally had an email from, like, September that was like, can we get a report on this? And I punted it, I think, over to Kyle, and I just never thought about it again, but now I’m like, oh, weird, our sponsorship data is in there. I don’t know what to do about that. Which did kind of increase the interest in Shopify, I think. Truthfully, what’s hard for me about the prioritization there is that it does kind of depend on Jay wanting to take on the work. So I would say they feel very prioritize-worthy to me, but if he’s not on board, then it’s kind of a non-starter, because at the end of the day I’m not in Okta. But I would hope that he would be motivated to get that work through. Speaking of Okta and authentication, well, maybe we’ll parking-lot that, because it could derail other conversations, but I might have overcommitted us to solving a problem that probably would have been better left as a problem. I can’t resist helping. But yeah.
Me: So maybe a good place to start is just to look again at our Gantt chart for the existing project and share where we’re at now. We’re through kickoff, went through setting up Snowflake. We also, nicely, basically have Ashwini set up on dbt and GitHub too, right?
Them: It’s a different one. Yeah.
Me: So I think right now we’re leveraging the members data as, like, the through line to just get things set up, which is great.
Them: Yeah.
Me: So I feel like we’re actually parallel-pathing a little bit more than I expected initially. I do feel like we’re in the tool-evaluation period for data ingestion, so I’ll probably extend this into January. But I’m hopeful that Polytomic, you know, ends up working out for us, so we can sort of do that.
Them: Yeah. Yeah.
Me: After which, I think soon after, we’ll just try to ingest as much data internally as possible. So I think this is the rough cadence of where we’re at on our side. As we start to get a sense of CTA, the org, and your needs, one of the things we always do is develop our architecture diagrams, like our data platform spreadsheet.
Them: Yeah.
Me: The one thing that’s not clear yet, and maybe it’ll be clearer as we meet more stakeholders, is: what are the metrics, and how do we do some KPI standardization? That’s basically what this phase is here. We have a typical structure where we put in: here are the metrics we found that people are reporting on, here’s the variety of definitions, and can we get alignment on what we want these metrics to be sourced from, how they’re defined, and some type of sign-off. That’s what that metrics glossary dictionary is.
Them: So for that, actually, for this current stage in our evolution: we still do the goals process kind of the old way, where it’s a two-page to-do list that they laminate and hand out. And we have just received our 2026 to-do list. As much as I chafe at the way we currently set goals, those are the things people are going to want to track, because they’re tied to bonuses. And so, eventually, yes, building out additional metrics that are really quality.
Me: Really cool.
Them: I think we can do, but as a place to start, the numbers in those 2026 goals are going to be what everybody’s interested in initially. Actually, let me just send that right now. I’ll put it in Slack.
Me: Yeah. So we’ll just start with that as a scope.
Them: Yeah. See? I actually haven’t even looked at them yet myself, so let’s.
Me: Still time. It’s, you know, it’s only December 12th.
Them: Exactly. Exactly. Okay: Data apps, Brainforge, CTA. Let’s see. Okay. Actually, the one question I had, which does kind of tie into the problem I’ve caused for us, is the entity resolution stuff. It’s obviously dependent on having data to do entity resolution with, but I’m curious where we think we might be able to start working on some of that.
Me: Yeah. So I think that will be as soon as we have things landed. As soon as we have things landed, we can scope what’s there.
Them: Yeah.
Me: Then work towards that. So I’m gonna put. I’ll just put that in as an individual thing.
Them: Yeah, I mean, if we wanted to make some headway on it without waiting for data to be landed via Polytomic: we do have the S3 integration set up in Snowflake, and all of the old data is in S3. I haven’t brought it into Snowflake, but it would be totally possible to start bringing in some of those historical data sets and doing the entity resolution work around them. Admittedly, I have an idea of how this will work in my mind, but it may or may not be the best-practice way to approach it. My thought was, the first step is combing through all of the data and just figuring out: okay, what are all of the different identifiers in the different systems? How many people can we already kind of flatten and figure out?
Them: And then figuring out what that looks like on an ongoing ingestion basis eventually. But, like, tackling the mess that we do have is potentially useful.
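The "flatten and figure out" step Katherine describes, grouping records from different systems that share an identifier, can be sketched with a small union-find pass. The record shape and the field names below are illustrative assumptions, not CTA’s actual schema:

```python
# Minimal sketch of cross-system identity flattening: link each
# (system, system_id) record to its email, then read off the clusters.
# Record shapes here are made up for illustration.

from collections import defaultdict

def resolve_entities(records):
    """records: iterable of (system, system_id, email) tuples.
    Returns a list of sets; each set holds the (system, system_id)
    pairs that appear to be the same person."""
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path compression
            x = parent[x]
        return x

    def union(a, b):
        parent[find(a)] = find(b)

    # Link every record node to its (normalized) email node.
    for system, system_id, email in records:
        union(("rec", system, system_id), ("email", email.lower()))

    clusters = defaultdict(set)
    for system, system_id, email in records:
        clusters[find(("rec", system, system_id))].add((system, system_id))
    return list(clusters.values())

records = [
    ("shopify", "S-1", "ada@example.com"),
    ("okta", "O-9", "ADA@example.com"),   # same person, different casing
    ("swapcard", "SW-4", "grace@example.com"),
]
clusters = resolve_entities(records)
```

Real entity resolution would add more match keys (name, company, the Data Ops ID discussed later) as extra union edges, but the clustering mechanics stay the same.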
Me: So then why don’t we, Ashwini, add that to, like, the sources to bring in. And then let’s work together on what the initial plan would be. We’ll just profile what data is in there.
Them: And I can share a bunch of stuff. So I have, like, a full inventory of all the tables and columns and stuff like that. And then I also have some kind of shadows-on-the-cave-wall documentation from the former data team about what data is in what tables. There are like 300 or 400 of them, but there’s really only a handful we would need for the entity resolution work.
Me: Okay?
Them: And actually, the other thing, too is I could go through that CTA Systems inventory spreadsheet and identify, like, which systems do contain people or companies where they would have an identifier in that system, because there’s some of them obviously not.
Them: But yeah, yeah. Okay, sorry, that was a whole thing. What kind of data is there in S3? Is it a CSV file? A Parquet file? So, actually, we took the old SQL Server and I dumped it into CSV and Parquet files in S3. So you can choose your own adventure, depending on which one you like better. It’s a lot of data.
Me: So let’s land that in Snowflake, then. That could be part of the program.
Them: Yeah, yeah. And is that data dynamic, or is it just a historical snapshot? It’s not dynamic, it’s just a dump out of the old SQL Server. I mean, talk about belt and suspenders: I took a backup, I migrated it, I took a snapshot of it in RDS, then I exported all the data to CSV and Parquet files and did some referential integrity checks afterwards. I’m very sure that I backed it up. And yet I still have not had the confidence to go close the old Azure account. That’ll be, like, December 31st: before the ball drops, I’ll finally close my eyes and click the button. But yeah, so, historical data. And frankly, it is actually a data source we should put a little bit of build around. It won’t change, but there will be questions against it. As we continue to land new data sources, people will still want to go back into the archives, probably.
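The referential integrity checks Katherine mentions after the export boil down to verifying that every foreign key in a child file still resolves in its parent file. A minimal sketch, with made-up file and column names rather than CTA’s real schema:

```python
# Post-export sanity check: find foreign-key values in a child CSV
# that have no matching row in the parent CSV. Column and table names
# are illustrative assumptions.

import csv
import io

def orphaned_keys(parent_csv, parent_key, child_csv, fk_column):
    """Return foreign-key values in child_csv with no match in parent_csv."""
    parents = {row[parent_key] for row in csv.DictReader(parent_csv)}
    return [row[fk_column] for row in csv.DictReader(child_csv)
            if row[fk_column] not in parents]

# io.StringIO stands in for the exported files on S3.
companies = io.StringIO("company_id,name\n1,Acme\n2,Globex\n")
contacts = io.StringIO("contact_id,company_id\n10,1\n11,2\n12,3\n")
orphans = orphaned_keys(companies, "company_id", contacts, "company_id")
```

Running the same check against both the CSV and the Parquet exports is a cheap way to gain the confidence to finally close that old Azure account.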
Them: So yeah. Yeah.
Me: Okay?
Them: Yeah, there’s a lot of it. And this is a one-time ingestion, that’s what I understand? Yes. Yeah, exactly. Cool.
Me: Cool. So then we could drive towards that as well, in addition to the members data model. I guess, Katherine, is getting a BI tool now less relevant? Like, should we push that out, or, you know, just keep evaluating what to do there? I don’t know what the timing is on the Power BI stuff.
Them: You know?
Me: I guess.
Them: Actually, that’s a good point. Yeah, to be totally honest. I feel like. Yeah, I actually.
Me: Meaning, I guess, to give you the trade-offs: we could drive towards just getting marts ready for the members data, and towards entity resolution in Snowflake with whatever we have. If we just talk about the data work stream, we would drive towards accomplishing that; I don’t know yet how much time that’s going to take. We were initially going to, as soon as we landed data and had some stuff modeled, drive towards a BI decision. If people are okay with just getting flat files or going directly into Snowflake, then we can push that off and instead prioritize nailing the modeling.
Them: Yeah. Yeah, I think that was actually really savvy pivot, because for all intents and purposes, we can still push data from Snowflake to power bi. So we do have a BI tool and yeah, I think, actually. Yeah, yeah, yeah, yeah. Putting some more focus on the marts and the entity resolution work rather than BI platform selection makes a lot of sense to me.
Me: Yeah. Because we could just get into, yeah, just spending. It’ll be another couple-week process at least to look through BI tools. So maybe we kick that for a bit. And, I don’t know, do we have access to Power BI?
Them: Yeah.
Me: But that’s something that we can. If we can get access to that, then, if needed, we can just build stuff there. That’s fine.
Them: Yeah. I mean, we can definitely get you access. I could talk to Jay about that. It’s like walking a balance beam that doesn’t exist, but: I acknowledge that we have Power BI, and it does make sense to use it for things that we must create, but.
Me: I mean, we can also just drive people to go direct in Snowflake.
Them: Yeah, well, it’s also just, like, I do want to keep people on the idea of, oh, we are going to get a new tool, versus entrenching further in Power BI. But the reality is Kyle and Kai are putting out Power BI reports because we need to put something out. So I think light Power BI usage is fine. I just don’t want people to get confused and think that that’s where we’re going to stay.
Me: Okay?
Them: Mostly thinking ahead to whenever I tell finance how much it’ll cost to buy Sigma. I don’t want them to be like, but you’re doing it all in Power BI now.
Me: Yeah. Yeah.
Them: Right? Yeah, yeah.
Me: Okay?
Them: But yeah, no, I think working on marts and the entity resolution stuff, that’s a better use of the time, honestly. Because at this point my hope is, like, probably most folks are going to go into either dark mode or panic mode until, you know, mid-January. But when they come back, I want there to be all sorts of beautiful stuff for them to play with in Snowflake. So yeah.
Me: So, yeah, I really think the next month or so is going to be just all modeling work, and for us to do working sessions on marts as we arrive there. So, yeah.
Them: Yeah.
Me: Okay?
Them: I like it. I like it. Yeah.
Me: And then, yeah, probably by mid next week we’ll have the scope of discovery: what basic questions we’d ask, and what we hypothesize could be possible for both of those other work streams, the Okta and the Shopify.
Them: Okay?
Me: And then, you know, we can have a conversation, and you can let me know. Based on what the lift looks like, or how the timing is, we could talk about it at that point.
Them: Okay? Yeah. Yeah, that sounds perfect.
Me: What I told Sam is, like, let’s just start poking around. Let’s just put together what the lift on discovery would be, so we give Katherine some room to see what we would need access to, or what questions we would ask, and then be like, okay, this is not worth it right now, or, okay, let’s go after this, you know?
Them: Yeah.
Me: Yeah.
Them: Yeah, yeah.
Me: Yeah. Yeah.
Them: Yeah.
Me: Yeah.
Them: Yeah.
Me: Yeah.
Them: I mean. Yeah. Like I just want it fixed.
Me: Yeah.
Them: Yeah.
Me: Okay, great. I think the other item, Katherine, is let’s just talk about the contract. We’re coming up on it; our initial contract is up at, like, the end of this month.
Them: Yep.
Me: Do you want to wait till next week to sort of scope out, like, what Jan would look like, or what do you think is. Is best?
Them: I mean, honestly, I don’t want to, like, make more work.
Them: But at the same time, I’m kind of like, realistically, knowing how long it takes to get contracts through and all, you know, blah blah blah: I think if we put together just a Q1 scope, then I can kind of slide that through the process, hopefully faster, now that the MSA is approved. The scopes should go through smoother. But I do want, in the new year, I think, to come up with kind of a full year’s plan. I just don’t know if we can pull that off right now with the poverty-spec brain cells I’ve got at this point. But I think we could confidently plan for Q1, and then when we get back in January, start planning the rest of the year’s scope.
Me: Yeah. Okay.
Them: But I realize that’s asking you guys to go through the exercise kind of twice, so.
Me: That’s fine. So I’ll put together something just for Q1. I’ll have it drafted and ready, and then, based on where we arrive next week on the new things, we can add those scopes and send it for you to review. Yeah.
Them: Yeah.
Me: Again, I feel like we’ve moved as fast as I thought we would, and I think we’ve been doing well. I know that some things are shifting, and we’re able to move stuff around, which is good.
Them: Yeah. Yeah.
Me: So I’m happy to focus more on just modeling. And, kind of the way we work, of course, is in a pod. So as I get to see where the bulk of the work is, if there’s going to be, like, four or six weeks where there’s just a lot of modeling work, then we’ll loop in someone else from our team to come in and help there as well. And the same as we need experts in different areas, like functional folks.
Them: Okay?
Me: Usually it’s just me and someone sort of shining the flashlight in and looking around, and then we understand: okay, how much are we taking on, who do we need? So I feel pretty good.
Them: Oh, yeah. I mean, you guys have been great. Honestly, I’m like, how much more of your time can I have? Seriously, I’ll take everything. But yeah, I think for Q1, to that point, it probably is going to be a lot of modeling: as we start landing the data sources, just building mart after mart after mart. And actually, maybe that kind of becomes the natural way to do it, right? We spend Q1 landing data and modeling data, and then maybe towards the end of it comes the work stream around the BI tool selection, and then that plan. Yeah.
Me: You would hope it’s that clean. Yeah.
Them: Yeah.
Me: You hope. But again, my hope is that we start to empower some of the people that are pulling from this, and that’s where we start to meet with them directly, right? So as we start to land things, we’ll run it by you, and then we’ll start to build relationships with Kyle and different folks to start to serve them.
Them: Yes. Yeah.
Me: You know, directly.
Them: Yeah.
Me: And basically start to build out. That’ll allow us to build little mini-roadmaps of what their needs are, analytics-wise. And then, as a platform team, it’ll make the BI decision clearer, whether we even need it, because I need to know what their expectations are in order to make that call. If everybody’s comfortable in SQL, then there are other tools. So that’ll be helpful for that decision. So yeah.
Them: No, we’re assuming nobody wants to do SQL. Yeah. I mean, we might find the handful of people who are willing, but I think the overwhelming majority of stakeholders across the organization are going to want an Excel spreadsheet. That’s the level they’re at, which is, hey, not nothing. And some of them have some pretty impressive VLOOKUP things rigged up, so, you know, cautiously optimistic about their abilities there.
Me: You know?
Them: But yeah, yeah. Yeah. Conversational interfaces, I think, are going to be the danger for it.
Me: So that’s what I mean. When we go do, like, a proof of concept, the way we’ve done it in the past is we literally take questions that we have been asked, and that’s how we test the tooling, whether it can satisfy them, versus making up questions or thinking up some, like, nice things that would work.
Them: Yeah. Yeah. Oh, actually, to that end: I didn’t get as far as I wanted to yesterday, but I was starting to go through my inbox. Email is my Achilles’ heel. There’s a lot of recent stuff I just need to respond to, but there are a lot of older things that I kind of left there because we didn’t really have a board, and I was like, this is a question that I can’t solve right now, but I’ll leave it. So I’ve been starting to finally dump those into an Asana board, and I should ask Jay about getting you guys Asana access. I’m guessing that makes the most sense.
Them: And that way you could see them. Nothing in there is, like, super exciting per se, but there are definitely a lot of things that we will want to tackle next year.
Me: Okay.
Them: Yeah. Yeah.
Me: Okay. So I think maybe we spend the rest of the time, Ashwini, if you want, talking through the SFTP work stream. And then, I know, Katherine, at the beginning of the call you had another Okta thing.
Them: Yeah, I can talk through that one while Ashwini pulls up stuff. So basically, and some of this might be repetitive, I might have shared some of it already: in that ecosystem of CES vendors, where they’re all kind of daisy-chain integrated, they all use email address as the unique identifier across systems, which is why the entity resolution work will solve for that eventually.
Them: But one of the vendors uses an endpoint for, like, attendee matching and exhibitor recommendations where the query parameter in the call is the email address. And so if you authenticate into the mobile app, you can then intercept and change the email address. So if you want to get Katherine’s recommendations instead of your own, all you need to do is drop the other email address into the query string, right?
Them: Not great. Not great at all. So the question amongst the team was: do we just figure it was this way last year and the year before, we’ll go one more year and then never again, or do we try to do better this year? Then the question was, well, we don’t have any unique identifier that’s in all of the systems. And, in my optimism earlier this year when I was starting down the entity resolution path, I had created the Data Ops ID in our data, with the idea that it would eventually become a canonical ID, even though right now it’s just one-to-one with email. But I’ve been pushing it everywhere I touch, and the registration vendor is storing it for any records that I’ve sent.
Them: Registrations come from a few places, so my stream has the Data Ops ID in their system.
Them: And I could certainly push the Data Ops IDs for all the others, because I have them. I just don’t have a way to push them normally.
Them: Long story short, I think they might actually be willing to make this API change and use my Data Ops IDs instead of the email address, which is awesome. But now I need to figure out, okay, how do I get these to actually be in all of the systems?
Me: Yeah.
Them: Which is, I mean, easy to imagine now. I can do a bulk upload of the ones that we need, but then it’s like, okay, on site, as registrations come in, how do I make sure that I’m keeping up with that? This is as of yesterday afternoon, so I’m still wrapping my head around it, but I’m excited to have caused this problem.
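The "keeping up with on-site registrations" problem comes down to assigning Data Ops IDs idempotently: a record whose email has already been seen reuses its existing ID, and only genuinely new people get one minted. A sketch under that assumption; the ID format and the in-memory registry are illustrative, not the real implementation:

```python
# Sketch of idempotent canonical-ID assignment: the same (normalized)
# email always maps to the same Data Ops ID, whether it arrives in the
# bulk upload or as a new on-site registration. The "dops-" prefix and
# in-memory dict are assumptions for illustration.

import uuid

class DataOpsIdRegistry:
    def __init__(self):
        self._by_email = {}

    def assign(self, email):
        """Return the existing Data Ops ID for this email, or mint one."""
        key = email.strip().lower()
        if key not in self._by_email:
            self._by_email[key] = f"dops-{uuid.uuid4().hex[:12]}"
        return self._by_email[key]

registry = DataOpsIdRegistry()
a = registry.assign("Ada@Example.com")
b = registry.assign("ada@example.com ")  # same person, messy input
c = registry.assign("grace@example.com")
```

In practice the registry would live in a database table rather than memory, so the bulk upload and the on-site feed share one source of truth; the key property is that `assign` can be re-run safely on records it has already seen.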
Me: Cool. Well, this sounds like a perfect role for some type of transform job to execute.
Them: Yeah.
Me: On load, and then we push things out, you know?
Them: Yeah. Yeah. And it’s a nice segue back to what we’ll talk about now, because this job that Ashwini is going to pull up is also the one that is assigning those IDs. So this is kind of step zero in getting that data together.
Me: Okay? Okay? Perfect.
Them: Yeah, it was, like, I don’t know, when I was new and had to figure out: of all the old things the marketing team used to do, which ones do I want to hold on to? And I decided that this invite process for CES was the one super-manual, obnoxious thing that I was going to keep supporting. And it has definitely gotten me into some interesting conversations. So, yeah.
Them: Okay, do you want me to kind of talk through the file here, or what’s most useful? I mean, I understand what’s there in the file. What I’m trying to understand is: where are these tables you’re seeing? It runs on Postgres right now, so I’m assuming they are there in Postgres, and we need to bring them into Snowflake. You also mentioned you have already created an integration between S3 and Snowflake, or something like that.
Them: Yeah. So if you look in Snowflake, actually, I created that webhooks database. Our market research team has been soliciting interest in these show floor tours, and so I built that out. I’d already set up the webhook from Formstack to S3, and so I just took the last mile, S3 to Snowflake, to create a little view that he could go in and download from when we get new submissions to that form.
Them: So, is everything there in this stage, or where exactly? If you click on Views, that’s where I created the thing. Yeah. So this was what I put together for him, and he was overjoyed.
Me: Wow. Great.
Them: But it is proof that the S3 integration works. Every time the Formstack form is submitted, the webhook sends it to the S3 bucket, and then this connects it out to Snowflake. It’s the only data that I’ve actually connected with the stage. In the IAM role scope, I only had it accessing two buckets: one is the webhooks one, and the other was, like, a demo I had used when I first set it up, just to make sure it worked. But I can modify the role to have permissions to access any other bucket, for that matter, including the one that contains all of that old archived data.
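For reference, a Snowflake storage-integration IAM role typically needs `s3:GetObject` on the objects plus `s3:ListBucket` on the bucket itself, so widening the role is one extra statement per bucket. A sketch of the kind of statement that would be added for the archive bucket; the bucket name here is made up, not CTA’s real one:

```json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": ["s3:GetObject", "s3:GetObjectVersion"],
      "Resource": ["arn:aws:s3:::cta-sqlserver-archive/*"]
    },
    {
      "Effect": "Allow",
      "Action": ["s3:ListBucket"],
      "Resource": ["arn:aws:s3:::cta-sqlserver-archive"]
    }
  ]
}
```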
Them: Ultimately, the webhooks database will just get deleted after CES; it was just quick and dirty. But anyway, yeah, all of those tables currently live in Postgres, but they’re essentially one-to-one with the flat files I’m grabbing from the different systems. None of it’s actually integrated: I go and grab eight or ten flat files, import them into Postgres, run the code, and send it on. So instead of importing into Postgres, I could just park those flat files in an S3 bucket, and then they’d be available to Snowflake.
Them: Sorry, I didn’t follow something here. This is coming directly from this table, right? How did you load data into this table, from the stage? Yeah, so, I mean, I don’t have the commands in front of me anymore, and since I’m new to Snowflake, maybe I didn’t do it right, but: I did the copy-data-from-stage command to create the table, and the table just had one column with the nested JSON. Then I created the view to unnest the JSON. Okay, okay.
Them: Got it. Yeah, I didn’t set up, like, Snowpipe or anything, so in order to get the latest data in, I’ll have to run the COPY from the stage again. And this table is one of these somewhere? No, no, it’s irrelevant to this process; it’s just, I would say, proof that it does work. So all of the tables in this process, I could put in an S3 bucket and add to that integration. Awesome. Yeah. Okay. And then we can take it forward from there. But what I wanted to mention was: once we have transformations up to this point, right, we can put it back into S3 in a different stage. And then from S3 to SFTP will need a different approach, right?
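The pattern described here, one column of raw JSON per row with a view that unnests it, does in SQL roughly what this Python sketch does: flatten each nested payload into tabular columns. The payload shape is a guess at a Formstack-style submission, not the real schema:

```python
# Equivalent of the "view that unnests the JSON": flatten each raw
# webhook payload (one nested JSON value per row) into flat records
# with dotted column names. The payload structure is an assumption.

import json

def flatten(obj, prefix=""):
    """Flatten nested dicts into a single dict with dotted keys."""
    row = {}
    for key, value in obj.items():
        name = f"{prefix}{key}"
        if isinstance(value, dict):
            row.update(flatten(value, name + "."))
        else:
            row[name] = value
    return row

# Each string stands in for one row of the raw landing table.
raw_rows = [
    '{"submission": {"id": "123", "form": "floor-tours",'
    ' "fields": {"email": "ada@example.com", "company": "Acme"}}}',
]
table = [flatten(json.loads(r)) for r in raw_rows]
```

The advantage of keeping the raw payload intact and flattening in a view (or a dbt model) is that the landing table never has to change when the form gains a field.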
Them: I’m not sure if we can use the Polytomic solution from S3 to SFTP; we’ll have to explore that. But from Snowflake to S3, that is something that is doable.
Them: Okay. Okay. Yeah. I mean, it’s funny, I have written Glue jobs to put data up on FTP servers before, so I could do it in Glue. I just haven’t, because, well, Glue is kind of a pain. And then I saw that Polytomic had the SFTP connector, and I was like, oh, well, maybe we could just use that to take it from either S3 or Snowflake onto the Marketing Cloud FTP.
Me: Yeah.
Them: Because basically what happens at the end of this process, and I should probably give you guys the DDL for the views, is there are views that I export; those are the files that I take to Marketing Cloud. There are five views for Marketing Cloud that I put on their FTP, and once they’re on the FTP, everything else is automated: ingest them into Marketing Cloud, process them, populate all the data extensions, et cetera. So once they hit that FTP, my hands are done.
Them: And then there are a few other views that I send over to the registration vendor’s FTP, which I think I will continue doing manually, because one of them doesn’t necessarily update that often, and the other one should, but people are constantly doing the, like, before-you-send-it, before-you-send-it thing. And I’m like, okay, next year we won’t have that. But at this point it’s too late for me to change it. Although, I don’t know, we’ll see. Maybe I’ll give in, which is kind of annoying.
Me: Any other questions, Ashwini?
Them: I think I’m good right now.
Them: So the current scope is: do the transformations that are there in this file, and export the data to S3 out of the tables that we have updated.
Them: Updated or inserted, yeah. And I’ll give you, because that is kind of the missing piece here, the views that wind up being what I export. I can give you the DDL for those views. Sure, sure. That’s kind of the final shape of the data.
Them: And then I guess what I can do, too, is I’ll create an S3 bucket and drop files in. I have to do this today at 5:00 anyway, so I can drop today’s files into the S3 bucket, and then you would have exactly the same sort of data that I would have starting this.
Them: Sure. Cool.
Me: Do you want to, are you going to end up moving this? That’s really a dbt job.
Them: Yes.
Me: Okay? Okay?
Them: Yeah.
Me: Yeah.
Them: I mean, yeah. The invite process, generally, there’s a lot of appetite to overhaul it, but for the moment, this is what we get.
Me: So maybe, Ashwini, once we have that there, we can also discuss how we want to run the dbt jobs on a schedule. That’s the next decision for us to make: whether we want to use dbt Cloud, or execute, like, within Snowflake with its dbt functionality. I think this is a good use case, because our other modeling work hasn’t had an orchestration requirement yet. So this is a decision we can make next week.
Them: Yeah.
Me: Like, dbt is open source. We can run this in many ways. Like, we don't have to go through dbt Cloud, but it'd be good to talk through the options.
Them: Yeah. Yeah. I mean, yeah, I would say, yeah, like. This thing runs every day, so not too difficult probably to set up a pipeline for it. Like there’s not a ton of like, nuance. Like it doesn’t need to be event triggered or anything like that. Just a cron job essentially. Right. I’m just a human cron job right now, but. But. Yeah, I think in terms of orchestrating, like, other data refreshes and that’ll be. That’ll be an interesting kind of thing to figure out because we have so many systems that are important, but also only really in use for, like, part of the year versus then there’ll be some things that might be like, yeah, it makes sense to ingest every hour or every day.
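The "human cron job" above could become a real scheduled dbt run. A sketch, assuming the dbt CLI is installed; the project path, target name, and selector are hypothetical:

```python
# Sketch: the daily "human cron job" as a scheduled dbt invocation.
# Assumptions (hypothetical, not from the call): dbt CLI is installed,
# the project lives at /opt/analytics, profiles.yml has a "prod" target.
import subprocess
from typing import List, Optional


def dbt_command(selector: Optional[str] = None, target: str = "prod") -> List[str]:
    """Build the dbt run command list; kept pure so it is easy to test."""
    cmd = ["dbt", "run", "--target", target]
    if selector:
        cmd += ["--select", selector]
    return cmd


def run_daily(project_dir: str = "/opt/analytics") -> int:
    """Run the whole project once; returns dbt's exit code."""
    return subprocess.run(dbt_command(), cwd=project_dir).returncode
```

A crontab entry like `0 5 * * *`, or a scheduled GitHub Actions workflow, could then call `run_daily()` once a day, which matches the "not event-triggered, just a cron job" shape described above.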
Me: Great. It's actually great to ingest every hour or every day. Yeah. So if there's stuff that, if...
Them: Yeah.
Me: There's, you know, jobs that are, like, outside of just dbt. Like, if we do trigger Python workloads or webhooks, then we should consider, Ashwini, doing this in Glue or something where we can orchestrate multiple things. You know?
Them: Yeah, right now it's all very cron-y, but I think as we get our feet under us, we'll be able to find better patterns.
Me: Okay. Okay, cool.
Them: And if this is, like, too overwhelming to, like, pull together, like, I mean, at the end of the day, I just have to run it.
Me: You know, I actually think it's helpful, because it's just going to front-load us making infra decisions to accomplish it. It just depends, you know. For us, like, we go to some clients where it's really just, okay, plan out the whole thing, everything gets a requirement, everything is planned out. We're also in situations where...
Them: Yeah. True.
Me: Okay, let’s. We have to do both. Like, we’re both planning for the future, but there are immediate wins that we can get, and so that’s. That’s just the balance for us, you know?
Them: Yeah. Yeah. Yeah, I think we’re kind of in that spot. Yeah.
Me: Which is good. I mean, I prefer that. Some people really press us on tons and tons of planning, and we end up, like, building a lot of documentation. But I think some people, especially the folks that just haven't worked with a data team or built one out, they really need a ton of hand-holding. And they're like, don't even touch anything until we, like, approve everything. Like, okay, we can do that.
Them: I mean, it’s funny. It is probably, like, I am probably the bridge between. Yeah. Because there’s a lot of that. There’s a lot of that internally.
Me: Which is great, because, like, I don't know, it takes trust to, like, do both, you know? So, yeah.
Them: Yeah. Yeah. I mean, people get so excited about, like, the handful of things that I have made better. And so it's like, yeah, the quick wins definitely build political capital.
Me: Okay. Great.
Them: Yeah.
Me: Okay. Perfect. All right, so then I'll follow up, probably Tuesday, with some notes on the next discovery phase. I'm going to update the Gantt chart as well today. I think, Ashwini, maybe if you want to follow up on this work stream and just think about dbt orchestration. And then, Katherine, I'll send this also in Slack, but if we can get access to Power BI and Asana, that would be helpful.
Them: Yep. Yep, yep. How about we start with normal GitHub Actions?
Me: Yeah.
Them: And later. Yeah.
Me: I would say check out the Snowflake DBT stuff because it came out this year.
Them: Yeah, yeah.
Me: I want to see, because GitHub Actions is just, like, a little finicky.
Them: It is. Yeah. Yeah.
Me: So if we can run the dbt jobs in Snowflake, it could be, like, a really big win.
Them: Yeah. Yeah, I have great experience with GitHub Actions. I mean, like, so, for example, on commit, this repo pushes everything up to an S3 bucket, and, like, that kind of stuff. Actually, I think maybe I've got a few other magic tricks in there. I think it zips my Python notebook files or something like that, but, yeah. So I don't mind doing things in GitHub Actions, but fiddly is a good word.
Me: Yeah.
Them: Yeah, the more we can lean on a very, like, frugal tech stack initially, the better. Because, I mean, it’s just. Yeah, the sticker shock is definitely real, and I know eventually we will continue to change that, but, yeah, the more we can, like, squeeze value out of the tools we have, the better.
Me: That is perfect.
Them: Yeah.
Me: Yeah. Also, that's why I wanted to see how much we could stay within the ecosystem.
Them: Yeah. Yeah. Actually, also, not for nothing, but this particular job will give you a decent sense of the volumes that Polytomics is looking for. Because these are going to be some of our bigger data sources that we're processing on a, like, regular basis.
Me: Okay. Perfect.
Them: Okay. So I'll get Asana and Power BI, I'll get the DDL documentation, and I will make a note of which systems have entities in them. And then we should be good. I have one more question. This data that the script is transforming, this is dynamic data, right? You are running the script every day to generate the output. And what do you have in place that moves the data from S3 to Snowflake on a daily basis? Oh, so right now it's just me, the human. Okay. Yeah. That's why I was saying, like, oh, I think we can actually automate this. Save me an hour.
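For the S3-to-Snowflake step that is currently manual, one option is a scheduled COPY INTO run through the Snowflake Python connector. A rough sketch, assuming a named external stage already points at the bucket; the stage and table names are hypothetical:

```python
# Sketch: automate the daily S3 -> Snowflake load with COPY INTO.
# Assumptions (hypothetical, not from the call): a named external stage
# "raw_drop" points at the S3 bucket, files are CSV with a header row,
# and `conn` is an open snowflake-connector-python connection.
from typing import Iterable


def copy_into_sql(table: str, stage: str, day: str) -> str:
    """Build a COPY INTO statement for one day's dated file prefix."""
    return (
        f"COPY INTO {table} "
        f"FROM @{stage}/{day}/ "
        "FILE_FORMAT = (TYPE = CSV SKIP_HEADER = 1)"
    )


def load_daily(conn, tables: Iterable[str], stage: str = "raw_drop",
               day: str = "2024-12-12") -> None:
    """Run one COPY INTO per table for the given day's prefix."""
    with conn.cursor() as cur:
        for t in tables:
            cur.execute(copy_into_sql(t, stage, day))
```

The dated prefix assumes the drop-off side writes each day's files under one folder, so a single daily job loads exactly that day's data.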
Me: Yeah. That's why we're going to put it in here. It's not been executed.
Them: Yeah. Yeah. Yeah. Yeah. So right now I. I go to a bunch of different places and I download flat files and then I import them to Postgres, run this script, export the views, and put that data up on the FTP servers.
Me: On the FTP servers.
Them: So yeah. All right.
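The manual daily run just described (download flat files, import them to Postgres, run the transform script, export the views, put them up on the FTP servers) could eventually be scripted. A sketch of the export-and-push half, assuming psycopg2 for the Postgres connection and the stdlib ftplib; the view names, filename scheme, and credentials are all hypothetical:

```python
# Sketch: automate the "export the views, push to FTP" tail of the
# daily pipeline. Assumptions (hypothetical, not from the call): the
# export views below exist, psycopg2 provides `conn`, and the FTP
# host/credentials are placeholders.
from ftplib import FTP
from typing import List

EXPORT_VIEWS = ["registrations_export", "members_export"]  # hypothetical


def export_name(view: str, day: str) -> str:
    """Filename for one view's daily export (naming scheme assumed)."""
    return f"{view}_{day}.csv"


def export_view(conn, view: str, day: str) -> str:
    """Dump one view to a local CSV using psycopg2's COPY support."""
    path = export_name(view, day)
    with conn.cursor() as cur, open(path, "w", newline="") as f:
        cur.copy_expert(
            f"COPY (SELECT * FROM {view}) TO STDOUT WITH CSV HEADER", f
        )
    return path


def push_to_ftp(paths: List[str], host: str = "ftp.example.com",
                user: str = "user", password: str = "secret") -> None:
    """Upload each exported CSV to the vendor FTP server."""
    with FTP(host) as ftp:
        ftp.login(user, password)
        for p in paths:
            with open(p, "rb") as f:
                ftp.storbinary(f"STOR {p}", f)
```

The pure `export_name` helper keeps the naming convention in one place, so the FTP push and any later S3 drop can agree on filenames.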
Me: Cool. Okay, so let me. I’ll go ahead, and then I’ll probably find time, Katherine, for maybe Tuesday afternoon.
Them: Okay. Okay.
Me: So I’ll just put up. I’ll just put a meeting there.
Them: Actually, Tuesday might be dicey. I think I was looking at my calendar, and somebody else wanted the Tuesday spot. Actually, after 3 PM I'm fine. But yeah, Tuesday before 3 PM, I'm, like, double-booked all over the place.
Me: Okay, I will aim for that. And then, yeah, me and Sam will be there, and we can talk through.
Them: Okay?
Me: This. Okay, awesome. Well, have a great weekend. Enjoy.
Them: All right. Yeah, you guys, too. Yeah, I know. Right? Right. Yeah.
Me: Yeah.
Them: And if you have any questions at all, you know where to find me. Don’t be shy.
Me: Okay. Thank you both.
Them: All right.
Me: Bye.