Brainforge Final Interview

Date: March 13, 2026
Source: Granola
Meeting ID: 1f05b063-b8d6-444a-88d6-f04bd2354aae
URL: https://notes.granola.ai/t/1f05b063-b8d6-444a-88d6-f04bd2354aae

Participants:


No summary

Transcript

Me: Hello. Hey, how are you? Good to see you. How's the week going?

Them: It's been fine so far. The week's been great. How was yours?

Me: Busy. Good, but busy. The team is growing, and it's been good to start working with some new clients and some new capabilities.

Them: That's all recent?

Me: Yeah.

Them: Obviously, lots is going on.

Me: What's your weekend plan, Tommy?

Them: My plan is to keep working on my home lab. Not rebuilding it, but trying to automate it with something called PXE boot, where you can add a new machine without doing everything by hand.

Me: PXE boot?

Them: PXE boot, yes, it's a network boot.

Me: Okay.

Them: I'm trying to set up my home lab so that when I add a new machine, it automatically boots it, assigns it an IP address, installs Ubuntu, configures it on the network, and then joins it to my cluster.

Me: Interesting.

Them: So I'm trying to do that this weekend.

Me: Nice. I didn't know you were doing a lot of networking stuff.

Them: It's something I play around with. I have a home lab where I've set up Kubernetes on a few hosts, and I use it to play around with infrastructure and so on.

Me: Interesting. Great.

Them: Hi Awaish, how are you doing? I'm good, how about you? I'm doing great, thank you.

Me: Is it just me and you? Is Tommy coming?

Them: Might be coming.

Me: Okay, let's go ahead and get started.

Them: Okay.

Me: Godwin, I don't know if you and Awaish have already met once before.

Them: Yes, pretty much.

Me: Cool. Then I think we can get into the exercise. Feel free.

Them: Okay. Where should I start from? I don't know if you've gone through the exercise submission.

Me: Yeah, I think one thing that would be... Awaish, go ahead.

Them: Yeah, we have reviewed the submitted challenge, but we want you to give a demo of what you worked on, how it looks, and how you made your choices, and talk us through it. So let me share my screen. Can you see my screen?

Me: Yes.

Them: Okay. So for the challenge, it was fairly straightforward for me, though there was a lot of information in it. Basically, I was asked to use Airbyte to set up ingestion, ingesting from...

Me: Godwin, we're still just seeing ourselves on Zoom. I don't know if you're sharing something else.

Them: I'm sharing my screen. Okay?

Me: I was seeing a screen, but it was just us on the Zoom.

Them: Sorry, I think I made a mistake and shared the wrong screen.

Me: Yes, that's it now.

Them: I'm sharing my VS Code. So, for the Airbyte installation, that didn't take much time. One thing I noticed is that Airbyte no longer has the Docker Compose setup it had the last time I used it. When I checked, they now provide a command-line installer for setting it up locally, so I used that. It installs Kubernetes and deploys Airbyte on top of a kind cluster, which is what I did. You can see that if I run it again, it checks and sees that Airbyte is already installed. To get my credentials I just run abctl local credentials and they are printed in my terminal. I'm only showing this because it's localhost, so I don't think anyone else has access to it. So that's how Airbyte was installed. The next part was to try out Airbyte and see how it works.
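[Editor's note: a minimal sketch of the local install flow described above, assuming the abctl-based installer that Airbyte currently documents; the install URL, flags, and localhost port are from memory and may differ by version.]

```bash
# Install abctl (Airbyte's CLI). This is the installer script Airbyte documents;
# in real use, inspect the script before piping it into a shell.
curl -LsfS https://get.airbyte.com | bash -

# Stand up (or verify) a local Airbyte deployment. abctl provisions a kind
# (Kubernetes-in-Docker) cluster and deploys Airbyte onto it; re-running it
# detects an existing installation, as mentioned above.
abctl local install

# Print the generated UI credentials (avoid pasting these into shared docs).
abctl local credentials

# The UI is then served on localhost (commonly http://localhost:8000).
```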
Them: Then I thought about ingestion. The setup was like this: Postgres is the destination, and Airbyte ingests the data from GCS. The reason I used GCS is that I needed somewhere to host the data. I tried doing it locally, but I could not find a way to get mounted files into Airbyte's data directory on my local machine. Airbyte has an option where, when you run the installation, you can pass a volume path; I tried that and it didn't work, so I just uploaded the files to GCS instead, each into its own directory, which you can see here. We have customers, orders, and products, and each file went into a separate directory. In a normal production setting, each file would be partitioned, so for each day the data goes into a separate partition where you have the file name, the partition date, and some random suffix. So I uploaded the files to GCS, and then in Airbyte I set up the source connection. This is it: the source is GCS. I created a service account, and the service account only has the GCS data viewer and bucket object viewer roles, if I'm naming those correctly. Then I set up my streams. For each file I have a separate stream, so for customers I'm matching on the file name. If I were arranging the bucket for a production use case with multiple files, you would probably just use an asterisk glob to pick up all the files in that directory; that was optional here, in case there were multiple days of backfill. The rest of the streams follow the same pattern. The format is set to JSONL here, to match the format of the files themselves. I tested the connection and made sure it was working. After setting up my source, I set up my destinations. I have two destinations at the moment: a local PG one that connects to the local Postgres for local runs, and a Postgres instance running on PlanetScale that I use for the GitHub Actions tests and as the production instance. For the local PG destination, the host is the Docker bridge network address. I'm using that because Airbyte is set up inside Docker containers and Postgres is set up with Docker Compose, and for them to communicate you have to go through the bridge; Airbyte can't reach Postgres directly because they are not on the same network, so you have to bridge them. When you run things in Docker you can see this in the network config, so I use that IP address, then the port, the database name shopify, the raw schema for ingestion, nothing extra configured beyond that, then my database password; the rest were just the default settings. I tested the connection and made sure it connected when I triggered the job. After that, I created a connection to sync data from the storage location into the Postgres instance. [Interviewer] Okay, I have a question. Instead of reading from Google Cloud Storage, if you just had to read from the local machine, could you reproduce the flow? [Candidate] Yes, if I'm able to get the local... sorry. [Interviewer] The files are already downloaded, you were able to download them, right? [Candidate] Yes, I downloaded them. I had issues with mounting the files on my local machine: to let Airbyte read local files, you have to mount the files into the Airbyte data directory.
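[Editor's note: a rough sketch of the two pieces described above, pushing the JSONL exports into per-entity GCS prefixes and finding the Docker bridge gateway address used as the destination host. The bucket name and local file paths are made up for illustration.]

```bash
# Hypothetical bucket; each entity gets its own prefix/directory, as described above.
BUCKET=gs://my-shopify-raw-data

gsutil cp data/customers.jsonl "$BUCKET/customers/"
gsutil cp data/orders.jsonl    "$BUCKET/orders/"
gsutil cp data/products.jsonl  "$BUCKET/products/"

# In the Airbyte GCS source, each stream can then match its prefix with a glob,
# e.g. customers/*.jsonl, so multiple backfilled or partitioned files are picked
# up by a single stream.

# Because Airbyte (in its kind cluster) and Postgres (in Docker Compose) sit on
# different Docker networks, the destination host is the bridge gateway IP
# (often 172.17.0.1) rather than "localhost":
docker network inspect bridge --format '{{ (index .IPAM.Config 0).Gateway }}'
```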
Them: When I did that, Airbyte wasn't reading the files; I suspect it has to do with some Docker mounting issue, because I tried other approaches and those didn't work either. Airbyte has a... let me see if I can find that documentation. [Interviewer] Okay, that's fine, we can move on. [Candidate] Basically, you can reproduce it using a local directory. You just need to point your source at it: if you go to Sources and create a new source, you can pick the File source, give it a stream name and schema, set the format, JSONL in this case, and if you're using the local filesystem, give it the path to your directory. In my case I could not get it to work; I was mounting the files, but Airbyte wasn't really seeing them, so I decided to go with the cloud storage location instead. Back to the connection: I created the connection, and you just click on it and sync. You can see the last sync was five days ago, when I was working on this. [Interviewer] On these connections, do you know how to set up monitoring, so that if any sync fails it sends an alert to Slack? [Candidate] I didn't do that. [Interviewer] Do you have an idea whether it can be done or not? [Candidate] I'm not sure, but I think it can be done; it should be possible, but I'm not certain because I haven't tried it. Okay, so that's it for Airbyte. Then for the Postgres setup, I have the Docker Compose here. It's just a basic Docker Compose file that sets up a Postgres 18 instance, sets the restart policy and the environment; I'm using an env file containing my database name, username, and password. Then, to be able to access my Postgres instance externally, which is how Airbyte reaches the instance, I expose it on the host IP address with a port mapping, and I have a volume where the data is stored. I also have an initialization script for the RBAC setup: once you start Docker Compose, it runs this script, creates the schemas, and creates the roles for each of the schemas. And this part is a health check to confirm the instance is up. [Interviewer] We have a volume here called postgres-data; it is part of the Docker Compose, and it will get deleted if you just run docker compose down or something. How can we make it something that is not deleted as part of that command? [Candidate] If you run docker compose down -v, it will delete the storage and reset everything. To keep it, you have to set a policy on it, or, I don't know, you could set up backups for your instance so that even if the volume is deleted you can easily restore from the backup, or you can set a non-delete policy, so to speak, on the volume in your Docker Compose so the storage is not deleted. And you use the volume mount to ensure the volume is actually mounted. So the RBAC script basically creates the schemas first: I have my raw schema, where the ingested data lands; I have dev staging, dev intermediate, and dev marts schemas for my dbt dev environment; and I have staging, intermediate, and marts layers for my dbt production environment. Then I'm revoking access, so that users will not have access to this database unless they are explicitly granted that permission. And I'm doing a for loop to create the roles: check whether the role exists, and otherwise create it with a login password. For production this would be changed.
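[Editor's note: a hedged sketch of what an init script like the one described above might look like. The schema and role names (raw, dev_staging, airbyte_ingestion, and so on) are illustrative guesses, not the exact identifiers in the submission, and the password is a placeholder that would come from an environment variable or secret in a real setup.]

```sql
-- Schemas: raw landing zone, dbt dev layers, dbt prod layers (names illustrative).
CREATE SCHEMA IF NOT EXISTS raw;
CREATE SCHEMA IF NOT EXISTS dev_staging;
CREATE SCHEMA IF NOT EXISTS dev_intermediate;
CREATE SCHEMA IF NOT EXISTS dev_marts;
CREATE SCHEMA IF NOT EXISTS staging;
CREATE SCHEMA IF NOT EXISTS intermediate;
CREATE SCHEMA IF NOT EXISTS marts;

-- Lock the database down so nothing is readable without an explicit grant.
REVOKE ALL ON DATABASE shopify FROM PUBLIC;
REVOKE ALL ON SCHEMA public FROM PUBLIC;

-- Create roles only if they do not already exist (the "for loop" described above).
DO $$
DECLARE
    r text;
BEGIN
    FOREACH r IN ARRAY ARRAY['airbyte_ingestion', 'dbt_transformation', 'developer', 'bi'] LOOP
        IF NOT EXISTS (SELECT 1 FROM pg_roles WHERE rolname = r) THEN
            -- Placeholder password; inject from an env var / secret in production.
            EXECUTE format('CREATE ROLE %I LOGIN PASSWORD %L', r, 'change-me');
        END IF;
    END LOOP;
END
$$;
```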
Them: [Interviewer] Is the password hardcoded in the SQL, or is it a variable? [Candidate] Yeah, in the script it is hardcoded; for a production setting it would be supplied through an environment variable. [Interviewer] Fair, but it's already on GitHub now. [Candidate] In the documentation, you can see that I said you should change these; they are just default passwords. So for the ingestion role, I'm granting access on the shopify database to it; this is the role that Airbyte uses to ingest. According to the Airbyte documentation, you have to grant this to whatever role or user you are using. Then I'm granting access to the raw schema so that it can create tables there, and also the ability to create, insert, and update tables in the raw schema, because when Airbyte is ingesting data into that schema it needs that access. We have a similar setup for the transformation role, which is the user dbt uses; I'm granting it access to read from the raw schema so that it can pull the data from there. Then also giving... [Interviewer] In the challenge, we also asked for two different workflows running on GitHub Actions, one on PR validation, which basically points to staging, right? I don't see any database for staging; there are no schemas for a staging environment. [Candidate] For staging, basically, what I did was use the same schemas for dev and staging, just switching the target schema, to keep things simple. For the transformation role, which is what dbt uses, I'm granting access to write to both the dev and the production schemas. Then for the developer role, I'm just granting read access across the schemas so developers can view the data and see what's in there. For the BI role, I'm only granting access to view the marts layer, basically just the marts and no other schemas. Then for dbt, I'm using, what do you call it, uv to install dbt, and you can see my dbt project. That's all for the Postgres setup; to run it, you just run docker compose up -d. I have it running already, so I won't repeat that step. For the dbt setup, I have my dbt project, and it's mostly default dbt settings. For my models, I'm using this Shopify staging model for the staging layer.
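[Editor's note: to recap the access model walked through above, a sketch of the grants, again using the illustrative role and schema names from the earlier sketch; the actual identifiers in the repository may differ.]

```sql
-- Ingestion role (used by Airbyte): connect, plus write access to the raw schema.
GRANT CONNECT ON DATABASE shopify TO airbyte_ingestion;
GRANT USAGE, CREATE ON SCHEMA raw TO airbyte_ingestion;
GRANT SELECT, INSERT, UPDATE, DELETE ON ALL TABLES IN SCHEMA raw TO airbyte_ingestion;
ALTER DEFAULT PRIVILEGES IN SCHEMA raw
    GRANT SELECT, INSERT, UPDATE, DELETE ON TABLES TO airbyte_ingestion;

-- Transformation role (used by dbt): read raw, write the dev and prod model schemas.
GRANT CONNECT ON DATABASE shopify TO dbt_transformation;
GRANT USAGE ON SCHEMA raw TO dbt_transformation;
GRANT SELECT ON ALL TABLES IN SCHEMA raw TO dbt_transformation;
GRANT USAGE, CREATE ON SCHEMA dev_staging, dev_intermediate, dev_marts,
                            staging, intermediate, marts TO dbt_transformation;

-- Developer role: read-only across the schemas.
GRANT CONNECT ON DATABASE shopify TO developer;
GRANT USAGE ON SCHEMA raw, dev_staging, dev_intermediate, dev_marts,
                      staging, intermediate, marts TO developer;
GRANT SELECT ON ALL TABLES IN SCHEMA raw, staging, intermediate, marts TO developer;

-- BI role: read-only, marts layer only.
GRANT CONNECT ON DATABASE shopify TO bi;
GRANT USAGE ON SCHEMA marts TO bi;
GRANT SELECT ON ALL TABLES IN SCHEMA marts TO bi;
```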