Why these AI updates could change your job forever
Download MP3Hello and welcome to the Break Even Brothers podcast, a podcast by two brothers talking all about how to leverage the latest and most advanced AI tools to supercharge your productivity and keep you on the cutting edge of your profession. If you are looking for real practical knowledge about the latest AI capabilities that you can apply on your day-to-day job, then get ready and tune in. Okay, cool. We are live with episode 18. of the Break Even Brothers podcast. Excited to talk to you guys all about AI and coding and new technology. There's been lots of recent developments with AI that we're going to get into here in this episode. So I'm super excited. Yeah. Brad, how's everything going? We got the hats. Everybody, if you're on YouTube, check it out. Solid PHP hat and Luxemite hat. So, you know, always wrapping. Things have been going to well. Yours looks brand new. Yeah, I mean, I don't wear it. It's only for special occasion. I would say today is a quite a special occasion. Nice, nice. Is it orange? It looks orange on the webcam. It is definitely orange. Okay. Yeah. Halloween color. I don't wear it often, but I got to represent sometimes Larval. Nice, nice, cool. Yeah, and just a quick aside before we get into the content, Brad and I were, we were in person this weekend because we both went out to visit our grandpa who had a 93rd birthday. So shout out to Grandpa Ray, turned 93. Pretty cool. Heck, yeah. That was awesome. All right. Let's do it, Brad. I know we wanted to kind of start with a kind of game this episode. What do you got for us? Yeah, so I found something on social media. Kind of hard to explain, so I'll give it my best explanation. But essentially, I'm going to think of five words and not tell Ben. I'm going to give myself 10 words to give him. I essentially have 10 words to give him for him to get my five words. And I talked to Ben before. I kind of explained it to him again in a poor way like I just did. but essentially we both before the episode have outlined five words and have not told each other. And now we'll do a live guessing game to just see if we can pick up on the words the other person had wrote down. So I kind of have a mix of easy and medium ones. We'll see how well this goes, but I'll kick it off. So I have, again, five words. And the first clue I'm going to give Ben is bundler. A web pack. Oh, amazing. One for one. is left. There are nine words to give and four more words to cross off. Let's go a little harder. Let's go dynamic. H-T-T-P-X. No. Well, I'll add more to that. Let's go programming. React. Okay, a little farther. Let's, we've got to reel this one in here. Compiled. Dynamic compiled. What was the other one? Dynamic compiled in. Oh, no. All right. We're going with the Hail Mary here. This word was way too hard now that I'm looking out on my screen. Actually, you know what? Let's pivot. Let's pivot. Let's go to, let's go to, let's see, fabrications. Hallucinations. Boom. Okay, solid. I don't even know how many guesses we're at. We must be at least five guess, or five clues so far. Let's go. I'm going to do a two-part clue here, but maybe we'll just try with one to lead. Excel. spreadsheet virtual google sheet function virtual function i don't know pandas uh that one was v table maybe maybe a little difficult v table what the hell's v table isn't that a function in excel no no are you sure v look up v look dude i'm sure my bad my bad this is how we have a non-accountant talking about excel yeah okay big rip on that one so we got hallucinations we got webpack we failed on B-table, which is my bad. We're still plugging on one that I think might also be too hard. Let's try a different one. Let's go chat. Butler. Lank. Cat Lank. I don't know. This is hard. Maybe too hard. Maybe too hard. That was context window. So chat length. Context window. Holy smokes. That's two words. It is two words. It is two words. Maybe a phrase. And then the other one, which I think I ran out of guess is probably again too hard. interpreted. So I was trying to hit you with Python to give you like an interpreted language. So I went with the opposite compiled, interpreted, maybe two programmer-esque of me. But yeah, so to wrap you through my five, V table failed. Interpreted, Python, you know, interpreted languages, I thought, hallucinations, AI chat, a webpack. I know your biggest enemy. And then context window, AI chat. So, okay, those are a little tough. Hit me with something easier. That was tough. Okay, hopefully mine are easy and I got the gist of this. I'll start with one accounting numbers programming quick books yeah quick books thank you yeah okay good for some reason I was putting those separate now I put them together okay yeah nice um okay uh library framework spreadsheet pandas pandas there you go too okay nice um I think got like three or four clues I don't know four um this one's gonna be I think a tougher one so I'll say billion dollars Unicorn. Yeah. Wow. That's good. Very nice. Thank you. Thank you. Okay. Cool. I got two more. Okay. Framework. Larval. Snake. Python. Wait. Python framework. Django. Django. Yeah. I must be at six or seven. Yeah. But you're doing good. Yeah. One more. Okay. Storage. Database. There you go. Look at that. Wow. Okay. Mine are too easy. You're just good at this game. I don't know. I think mine were a little over the top. Hey, we had one difficult one, one good one. So that works out. I'm very impressed that you got a lot of those. Very nice. Yeah, it made me think. I think I probably should have had more connecting words, but that's awesome. Cool. Well, yeah, I just want to play a fun game for the intro, but we do have a lot to get into. So I would love to jump right in. So the past two weeks has really been an explosive week for AI models. We've had pretty much every company across the board released something new. In the podcast notes right here, I just have a small. of some of the top highlights. So I'm going to go through those, explain, you know, who released it, what it is, what it's useful for, but there's a ton to get through. So let's just jump right in. So first up on the list, Open AI with their 4-0 Image Generation. And so this one has been a long time coming. They came out with, I think, Dolly, like a year ago. So image generation from text, you could chat and say, create me, you know, a turtle in the sand. and it can just figure out some kind of representation of what that looks like. But the big thing is there's text rendering first class support within OpenAI's 4-0 image generation. That's a big deal because if you've ever interacted with these tools before and you ask it, you know, create a billboard in L.A. And on the billboard, type in text, splitmy expenses.com. It will never get it right. The text looks horrible. It just does not work at all. It's kind of like one of those things if you've seen the AI image generation of fingers for people does not work. It's always six finger, seven fingers, you name it. This one has been trained well on text generation. I think it's pretty much a standout feature for them. Yeah, that's huge because I think one of the areas that I've tinkered with image generation the most is like kind of creating like logos or like content, like for like content marketing stuff. And yeah, the text was always awful. Like you would say, I want a marketing piece that I can put out there targeted towards, you know, franchises. And it would just like give you some alien looking text. Like look like words, but then it wasn't really words. Which is so weird because you think that text would be a simple thing to nail. But obviously it's not. I think alien text is a perfect description because I have seen it. And I think if you've seen it, you know what it looks like and you don't forget. And then when you see logos generated with that text, you know immediately. And I think the work around people were doing were they're creating figures with AI that were non-text and then adding text to that artifact like in Photoshop. Now you can do it all in one shot. So not only is text first class citizen, but also they have edits. So you can chat with it to get edits. You can upload a photo, say swap out the background and put me in Japan, put me anywhere, and it really can get fine-grained edits. So I'm curious if Photoshop will have a run for their money or if, you know, someone I've built a product from the ground up using AI images and maybe even, you know, opening eyes for a model could be pretty cool. Yeah. One thing on the Photoshop piece is I'm pretty sure Adobe, like they had a really strange clause in like their use of like AI. Because they have like an AI editing tool or something like that in their package. I don't want to misspeak on it. But they have some really weird licensing that I think I remember. I'll see if I can find a link and put it in the show notes. But, um, It seems like it's right for someone to come in there and, you know, shake their, their stranglehold on, you know, image and video editing. Because, yeah, it's going to be so much more democratized to everybody else right now. Anyone can go and create a logo or create some kind of content marketing for whatever they're interested in doing. You don't need to be a graphic designer. Obviously, like, you can only get you so far, but like just to be able to kind of get closer and closer is super exciting. Because, I mean, I heard Jason Stats say on an episode recently about agents, but he was saying, like, right now is the worst they're going to be. And, like, that's still pretty cool. Like, it's only going to get better. So I'm not sure that's the same for the image stuff as well. I think I've been trying to drive home that thought to lots of people, too, is like, if it's not good now, it's going to be a lot better in the future. Or if it's barely good now, it's going to be a lot better in the future. It's probably a better way to put it. But, yeah, I think I did hear something on Adobe's odd licensing with AI. So I'm curious what that ends up being. But yeah, that one's big release. Super exciting. I did try it out. I took a photo of myself until, like, I think I told AI it put me in India. And it like correctly outlined me and put me in a different location without, you know, fudging up my body. Like, it did a really good removed background almost on me. So super cool. The next one I want to talk about is Google had a big release this week. They came out with Gemini 2.5 Pro Experimental. Hats off to Google for one of the most confusing naming schemes ever. I think Open AI does a great job for confusing names too, but this one, of course, is up there. And I think the worst I've seen is Gemini 2.0 Flash thinking experimental preview. You get the gist, but it's horrible. But anyways, this model, big deal. Really, really good at coding, really, really good at reasoning. And it's not just me saying that this is directly from the leaderboard. that is the number one rated source of truth. So when I look at the leaderboard today, I think they were above the second model by 40 or so points. That's important because usually when new models take the lead, the interval in which they take the lead is not super significant. For this model, it was a decent jump, and I think people are excited that Google's back on top because they have good engineering talent. They've been playing catch-up for the past six to 12 months. So they release something. It's big, and it has a big gap. And I think people are really excited about it. The one caveat I would say is it is experimental. So the rate limits on their APIs to use this new model, I think it's five requests per minute, which, again, is not super heavy. It won't fly for a production website. But as a tinker and with these new AI models, I would highly, highly suggest to try it out. If you Google AI Studio it puts you right in their chat box And it a pretty good interface and you can try it out right there for free So super super exciting Yeah that really exciting I haven I think they announced that today right Yeah, today is March 25th. So I haven't been able to get in there too much just yet. But I think Gemini in particular is super exciting, maybe more exciting than some of the other models. Because, you know, there's the Google Workspace. A lot of companies use Google Drive, you know, Google Docs, Google Sheets. Google slides and all that kind of stuff. And I think being able to tap into Gemini just like natively within those documents or those files that you have that you're working on can be huge. I think we've talked in a previous episode or two about, you know, Gemini with Google Sheets and how can kind of get you some analysis or do a couple things for you. Again, that's only going to get better. And where, you know, Google is different from OpenAI or DeepSeek. and all those other models that we talk about is, you know, Google Workspace is pretty proliferated throughout, you know, American companies. And certainly, I think that's a mean that there's also opportunity for people to get familiar with Gemini to have, like, immediate impact in their day-to-day jobs by leveraging some of those tools. So that's super exciting. Yeah, I did have one use case to report back that was successful with Gemini. Previously, I think I was asking it to format text, and it kind of failed said it couldn't do it. recently I created a spreadsheet essentially had people's names and they were voting on things and it was a checkbox per name on a bunch of rows and I really want to figure out how to sum up if these boxes were checked in one row couldn't figure it out you know not not an excel guy as much as I'd like to be so I was Googling around how do I do this I just want to sum a row if there's checks in each column that are in this row and I couldn't figure it out I was trying to do sum if didn't work and I just asked jemini I thought why am i not using this little icon the top right that should be able to do this so I ask how do i sum all this and it came up with a formula count if not sum if but count if and that worked out so I was able to check a box in a specific row for a specific column and then it flipped from a total being zero to one checked other columns in the same row and it you know updated in real time so one successful use case and i thought i kind of lost trust or faith in the Gemini product when it couldn't do something so simple. I thought it wouldn't be able to do something what to me seemed a bit more complicated, but it did it. And so I think that kind of drives home. You got to tinker with these things. You might have a wrong prompt. Maybe the AI model got upgraded behind the scenes because these things changed rapidly. Like as we were seeing the past two weeks, tons and tons of releases, it only becomes much and more competitive. So I think these companies are pushing really hard to get, you know, latest state of the art models out there. So if you have tried something, don't lose faith like I did. Keep iterating. Keep experimenting. I think there's lots of room to come up cool use cases. And you'll never know until you try it. And if it doesn't work, you know, then try again in a few weeks. I'm sure it might have a better result then. Cool. And then the next one I want to talk about the biggest initial release, I think, of 2025 was deep seek. Deepseek R1 model was an absolute banger of a model. Open source, you name it. it really kind of took the world by stride and was the first reasoning model that pushed all these other competitors like Anthropic Open AI to come out with a reasoning model that was visible to the user. And visible to the user is essentially the thinking portion if you've used these chat AI apps to show you exactly what the model is thinking. If you use cursor, if you use chat GPT or Claude, you'll see it if you choose the thinking model. And it is really easy to follow the thinking pattern. You can read it when you write code and understand how it's. debugging thing. So again, DeepSeek R1 earlier this year was huge, but literally just yesterday on March 24th, they came out with a new model that has improved on the original model in reasoning and coding performance. And the one note here I would say is there's been so many model releases recently. For example, Gem and I didn't come out today, March 25th. I think Deepseek would have its shining moment. But literally all these companies are so cutthroat, spending so much money, working so hard that it's almost as if Deep Seek yesterday was the news, the talk of the town on Twitter, at least in the AI world. Now, you know, it's a little bit more Gemini. But I think I'd like to highlight it here, it's a great open source model. It's MIT license this time. If you're not familiar with licenses, it's a big deal in the programming community. It's how you can use the software, how they enable it. This can be if you can sell it, if you can package it within an app. I think the previous Deep Seek model was Apache, which again, there's like 50 different licenses. It's really complicated. I'm not an expert. But Apache was a little bit more restrictive than MIT. I think MIT is very open, very broad. Use it or lose it. We don't really care. This is our work and share it with the world. So a big update. Deep Seek did really well on coding and reasoning. But again, I think Gemini does better. And for one who's tinkering with things, I would probably lean on Gemini. But you can run Deep Seek yourself, whereas Gemini is close source. So a huge, huge update there. I think Deep Seek's going to continue to push LMs forward and kind of the open source self-hostable sense. Cool. Yeah, it's interesting open source as like, you know, being used in different companies. I think so much preferences around like having like a product that's kind of already built for you. But then there's like restrictions on how much you want to share that data. You know, with a with a company like Google or Open AI, right? Because they're private companies and I'm sure they can do with that data what they want. So yeah, having something that's more, you know, open source, you know, I think it probably makes it safer in a way. But, yeah, I'm curious how that gets adopted, you know, companies that are looking to leverage AI. I wonder if they go the deep seek route or do they kind of go the, you know, Google, Microsoft, kind of Apple route. Yeah. It's like you get a little bit more control, but you're not getting the latest and greatest intelligence. So depending on your use case, I think it really depends what the company size is, what your budget is. But yeah, very interesting question. Yeah. Cool. And then we got more. So, you know, bear with me here. We got speech to text. So GPT4O Transcribe. We're March 25th today on recording, but I think this was last week or earlier this week. And what this is, OpenAI has the whisper model. Again, enable speech to text. So you can send it in audio file. It will return you really, really good transcribe text in raw text form. And this new model, again, it's like better across the board. So lower latency. C, reduced word error rates. They call that word error rates. It's kind of the key component for a speech-to-text model. Usually abbreviated WER. So they were comparing in charts, whispers, word error rate with GPT4O transcribe and GPT4O transcribe mini. Those error rates are much, much better. And the reason this is important because it enables real-time voice. So Open AI is pushing on their agents SDK. This is a Python SDK. They put together to kind of coordinate agents. and we talked about this in the last episode, but this is a hot, hot topic for 2025. And I think the speech-to-text model and everything that Open AI is coming out with, image generation, speech-to-text, etc. It's painting the picture that they want to give agents tools and they're building this framework. And I can see that, you know, open-AI marketing team and product team really wanting to empower developers and this is a key kind of component along that path. So really exciting. I haven't used it yet. I haven't used most of these yet. I need to. but this one is just kind of like an obvious improvement. You know, when Apple releases computers year over year, it's like, hey, we have the M14 ship. It was better than the M15 ship by 15%. This is kind of what that model feels like. It's just better across the board and, you know, why not use it? Yeah, I feel like Open AI does a great job. You know, I feel like they're all had different strengths. Like Anthropic is like widely seen as like the best kind of code assistant one, right? I feel like Open AI does a great job of like giving you. use cases or like useful things to kind of build with. And then they may not be the best like reasoning. It might not be the best, you know, at the actual coding itself. But like in that voice agent release that you're talking about, you know, they put out different architecture and it won't get too much into it. But there's like a speech to speech one. And there's also like a change where it's like speech to text, LLM and then text of speech. And basically what they do a great job of is they kind of say, hey, you know, the speech to speech to speech one, it's a tongue twister, is best for like, you know, conversational search and discovery and interactive customer service scenarios. They kind of give you these examples. And then, you know, for the chained architecture, it says, you know, customer support, you know, sales and inbound triage. So I feel like Open AI, you know, for maybe other shortcomings they might have, they do a great job of kind of building these, like you said, tools that, like are pretty useful, you know, and I think you can immediately think of use cases. I remember hearing a a story. I think it was around Christmas time about someone who built like an AI Santa, like voice agent. So you call and it'd be like, you know, ho, ho, ho, ho, like, blah. And so, that's obviously like a silly kind of product, if you even call it that. But like there might be real applications for like voice agents down the road, you know, and open AI, you know, saying, hey, there's some tools that you can use. So, and they're just getting better and better, which is super exciting. Yeah, they definitely lead the way. I mean, I think it is kind of like Apple. they're creating things that people like don't know they need yet and creating the building blocks everybody else is kind of a second mover at least in the past two years so hats off to them i think they have a great research team and API team product team they're really well polished there and as you were talking about you know kind of their core building blocks i was thinking for their speech to text they also release a text to speech model and if you haven't seen it yet you can go to open a i.fm it's a free page It allows you to type in text, choose a voice profile. So I think they shipped with like 11 voice profiles. And it can read a script and play it out loud. What's also cool is they have a vibe section where you can describe the voice. So, you know, if you go to the page, you can check it out. But I'll describe it if you're not there. You can type in a script. So what you want the AI model to say. You choose a voice across 11 voices. And the vibe, at least the example that they have in their text boxes, the voice. so they've described it as gruff, fast-talking, a little worn-out, like a New York cabby who's seen it all but still keeps things moving. So you're describing almost the character behind the voice. And then you attach like a voice to it. So this can be, you know, high pitch, low-pitch, different accents kind of thing. So it's really, really cool how they can dive deep on like character development. I imagine this will be a big deal for voiceovers and creators and things of that nature. And again, this just kind of plays to their agent toolkit where, They have fantastic speech to text. They have fantastic text to speech, which is what I'm kind of mentioning. I don't have it down on the list because we have another contender there, but this whole package came out about a week ago. I'm super, super exciting. And then to move on to our text to speech contender this week, another big open source one is Sesame. So they have a CSMB model. And I think the big thing here is it's much more natural, like, dynamic speech. Usually when you're doing text to speech, you type something in and it kind of reads it like a robot and doesn't have a lot of context. What I imagine from this new model from Sesame is a lot of research and iteration on making it feel much more dynamic where you have sentences and structures and a longer message and it kind of flows better throughout the sentences. Maybe a little bit hard to describe but I think if you played with it you understand it a little bit more But their model is backed by a Lama model It open source It on GitHub It's on Hugging Face. You can try it out. And again, I think the big thing here is open source, natural text that feels way less far away from a robot. I think you can even clone your voice with a five second sample. So it adds a lot of heat to these audio AI companies like 11 labs and others who are charging a lot of money for a really good voice. I think you can get pretty far in the open source world with this AI, Texas speech model, kind of like the era of Deep Seek. Deep Seek brought great intelligence, quote unquote, for free in the open source world. Sesame seems to be that moment of AI voice. It's kind of hard to figure out in the open source world. They're bringing that to everybody's front door and saying, hey, this is accessible. You know, everyone can pick it up. So super, super cool. It's cool. Cool. And then the last one on the list. It's been a long, long week. And I'm excited to play with all these. but the last one we have is Quen released a new model. Horrible naming, but I'm going to read it off. Quen 2.5 dash VL dash 32B dash instruct. And to break that down to make it a little bit more digestible, 2.5, I think it's just a model. Dash VL is, I think, vision learning or like, it's basically a vision model. 2B is how big the model is. And instruct is usually a chat model. So there's like models that can give you an answer, and there's models I can chat with you. and those are different. And so the big reason this one is a good release for lots of folks is it's a pretty small model and it has really high capabilities for understanding visual data. So I think a big proponent here would be if, for example, I wanted to monitor a secure facility with cameras. I don't want to staff someone to look at cameras all day. I could use AI to essentially take photos, you know, every five seconds. I can ask the Quinn model, give it that raw photo data and say, is there anything that looks off here? Maybe you find someone without a badge. Maybe you find someone like, you know, doing weird things. Could be people. Could be objects. Could be whatever. But it gives you a way to reason about a photo in a way that has really deep understanding. It can draw bounding boxes around things. So you could say highlight, you know, the banana on the table. And then you could swap that banana out with some different object by using OpenAIs 4-0 image generation. So it's really really cool. I think I don't have a great grasp. of like vision understanding models. I don't use them that often. But when I see use cases online of people highlighting its strengths, I thought, wow, that is like really, really cool. I had never thought of that. And I think we're going to see a lot of products probably pop up over the next six to 12 months doing more vision stuff. And I think this one is also open source, pretty small model, so you can run it yourself and it's very, very efficient and good at visual understanding. So excited to tinker with something there in the future. Yeah, that's interesting. I thought of, this is probably a lot of effort. Hopefully someone's already doing this, or maybe someone that listens to this, can do it. But there's a game that's pretty popular called Observation Duty. Do you know what? Have you heard of that? I've not heard of it. Okay, it's like an indie game, but it's pretty popular as far as indie games go. But the premise is that you're like a security guard. Like, it's just first person. You don't like see your actual self, but you're a security guard, and you have like eight cameras that you have to kind of cycle through. And things either start going missing or something gets added. and you have to identify, like, what changed. And so as you cycle through, it's like, oh, that can moved from, like, the last time I saw it. So I'd be curious, if it would be cool, if someone could, like, test out this Quinn 2.5, you know, model and see if it could win observation duty, because it's, like, I bet you could. I put my money on it. Yeah. I'm going to Google and see if that's, that someone's doing that, because that seems like a natural way to test it. But, yeah, it'd be fun if we could play other games to, like, League of Legends, or a fast-paced game. if you could train it on that. I think that would be pretty cool. But yeah, this is kind of the model updates I wanted to share. There's definitely plenty more like Mistral came out with one, I think this week. And it's, again, open source chat, super smart. I think to summarize all these, we're getting tools across images, reasoning and coding, speech to text, text to speech, and vision. We're literally in like the pinnacle of everyone has eyes on AI. Everyone is spending a ton and ton of effort, money, intelligence resources to make these things awesome and stand out. If you are willing and capable to try out these tools when they come out, you're going to be very, very well equipped into the future, especially for workplace. So I think a lot of these tools are instantly upgrading available tools and workflows that exist today. Like Gemini's new model, they plug that into Gemini that exists on Google sheets, Google, you know, all the Google products. Huge upgrade. I don't think they're doing that today since it's experimental. But if it exists in the experimental land, you can imagine maybe in three to six months, It'll exist inside and embedded within the products. So really cool. If you have the opportunity, please, please try it out. I think you'll be impressed by what is out there. And a lot of these things, there's demos, pages, etc. to kind of tinker with them. And feel free to drop a comment in YouTube. And we can kind of lead you to a cool place to try it out. Yeah. Yeah, totally. I mean, I think to echo what Brad's said there, like, you know, these things just keep opening up people to be able to do more. You know, again, with the new image generation, Open AI release, like, people can make their own logo as much easier and whatever else. And then with Gemini, maybe we can eventually start having it create, you know, Google sheets automatically that it knows we're going to need and just be able to give it a simple prompt. And it creates this whole workbook for, you know, the accountants in the room. And so it's just going to make you more efficient. And one of the things, I guess, is a segue into what I was going to kind of talk about was, like, the whole developing and coding for agents. We talked about agents last episode. And so between that last episode and now, I had worked pretty extensively on making some agents that I wanted to kind of just share with this medium here because there's definitely some learnings. But it's also, again, impressive to kind of go through that. And I thought I'd share some of those learnings because there's some good moments. Quick question before you jump in. Yeah. Was there vibe coding involved? The vibes peaked in Valley. There was some good vibes. Okay, I would love to hear about it. Yeah. So I'll make it snap because I won't get into all the technical details and lose you guys. But there was two agents that I was kind of set out to make. The first one was like an email monitoring agent. So basically my premise was I don't want to sit there and monitor my inbox all the time. I want, you know, basically some kind of agent to be able to kind of take a first pass and decipher what's really important to me. what's just spam or just junk. And then I also wanted it to not let any messages that were important to me go longer than 24 hours without a response. And so I booted up cursor, started giving it the prompts. And, you know, one thing I had learned is, you know, it gets easily, if you try and ask it to do the whole shebang, it will get you limits. Like if you ask the agent to do all that, you will start getting limited in your API calls and it will stop you from coding all. that. So that was one of the moments of frustration. And you and I had talked offline here and I was telling you how annoying that was. And your suggestion was to break it up into different chats. So that was a learning because I had it all in one chat. And I was just trying to have this person or this agent, just code all this stuff for me. And once I started breaking up with the new chats, it seemed to definitely help with the limiting on like the API calls. And you're on cursor pro, right? Yeah, cursor pro. Yeah. Okay. I was using Claude Sonnet 3.7 for the most part. Okay, nice. And yeah, it got it to work. You know, basically I'd hit a, it was just running the program locally, but it was connecting to my inbox. So I'd hit, you know, Python main. Dot Pi. And it would look at my inbox and start flagging things that were like important or that had a sense of urgency. So the AI plugged in Open AI to be like my agent. I think I was using 4.0. And I just said, if anything looks urgent in the emails, like flag it. And then, you know, for everything that's not, just Mark, Mark, Mark is. read and it would do it and it was getting the urgency pretty close it was like I just made dummy emails I said oh like I'm a customer like please help I will pay a million dollars and it was like flagging that um how much did you iterate on the prompt because that sounds like the the bread and butter it's connecting to the email then having that's kind of grading prompt to say where it exists in the urgency spectrum not a ton yet um okay it's all been like my sample email so like I definitely as I'm getting more emails. I'm going to keep kind of turning it on and seeing. But yeah, it did a pretty good job of like figuring out like what was urgent. The one thing that I did add to it as I was going through with it was I want to be able to identify like if this is like a person on a certain email list. So say like just as an example, you know, you have clients, right? Like you are a business and you have clients and you want to identify emails that are coming from like your clients. Well, Like one thing that you can do is like in the case I was kind of demoing is like connect with your QuickBooks account like via API. Oh, that's cool. And see if see if the sender is in your QuickBooks contact list. If they are, then slot them at the top. You know, like if it's a client and it's urgent, then let's put them there. Connecting the systems and the data, that is like, you know, so nice. I feel like just interacting across multiple tools. I also hate email with a passion. I mean, I'm inbox zero. So to be fair, I do check it all. And I get down there, but it is so much effort day and day out. I'm trying to unsubscribe from things all the time. But that makes a ton of sense. I have one question. Did AI write your initial prompt for grading? As in you were vibe code and you said, I wanted to create this. And it, you know, wrote that first draft for the prompt that you sent to Open AI with the email content. Or did you write that first draft? Super curious. It wrote the first draft. Okay. And how long was it? Like, you know, 10 sentences, three sentences. it was probably five sentences. Okay, that's not bad. Yeah, and so, but I guess let me just say this to where it kind of went off the rails a little bit. And what I've learned is I asked it a query once. I said, hey, make sure that my like API calls open air, open AI are efficient because I don't want to get charged, you know, in an inefficient way per call. Because I think it identified, like, it was calling it separate times. I was calling it to like read the customer emails and then or just read the email. emails and then it was calling it again to like interpret it and it was like oh I think we can make this this one call so I was trying to like find a way like make sure it was still being cost efficient and when I did that it like started adding a much of stuff and so what I think I learned is it's great at adding more stuff to an existing code base it's not great at removing things especially things that it added itself and I'm talking specifically about Claude Sonnet 3.7 because that's what I was using and so when I did that thing has got a little bit wonky and it still works but it was, I had to kind of go back and try to walk back a couple of things. And so from this, and what I'll leave you with on this topic is, I'm still making the agents. And so the email agent is one. There's another one where I want it to generate reports for me automatically. So like, you know, at the certain day of the month, generate me a profit and loss, a balance sheet. Just do it. I don't want to have to go into QuickBooks and request it. Just do it. So that's the one I'm working on. But what I did find really helpful and interesting was, and I'll share the link in our show notes, but someone made a framework of like, hey, if you're going to try to, I guess, one shot, I think we've been using that phrase in a bit now. You know, you need to have these like four documents that are going to really help it like get there. And it talks about it like a project requirements doc Oh okay Yeah And there like a technical specs doc There was like two other ones I can remember exactly what it is but basically it was like get these things them first and then that way the AI is not going to hallucinate because I had some hallucinations on me. It started replying, auto replying to the emails when I didn't ask it to do that. It would say like, oh, we got your email. We respond within 24 hours. I was like, no, don't say that. So did you generate those like, quote unquote, required docs? So I've also seen that. But as someone who I feel like I've done a lot of this. vibe coding and I have a deep understanding, at least I think I do, of how to make it work and how to make it work well. When you saw that post or wherever you saw that, did you make the documents and did you find it successful? Yeah, I started on it, started on the email one. I haven't finished with the other one just yet, but yeah, it is helpful. And I used Open AI to like help me kind of craft that a little bit. And I think what's great about making things with like AI these days is like you do have to still know what you're trying to make. I think it really struggles when it's like you're not sure. And sometimes that's part of a process of making something is you're not really sure and you kind of iterate through it. That's fine. But like that's where it's like maybe you need to take those steps first and then come back to the AI and say, I want to build this. Because I think if you're trying something with AI and it goes down a path and that path ultimately doesn't be the right one. It's not the right one that's going to work for you, it's hard to walk the AI back out of that. It kind of, you almost, like, from what I know, from my experience, just need to nuke it and start over. Yeah, it's like the vibe debugging. If you get too far in, you kind of debug it on the way out. But, yeah, like Ben mentioned, it is great at adding things, even adding things twice, like duplicating things I've found when it shouldn't. But, man, asking it to remove something, it struggles. It does not do a good job on that. Yeah, yeah. So, yeah, more to come. I'll keep folks in tune on how that all develops because it's been really fun, you know, making those agents and like anything when you build something that's cool to see something work when you hit the button and it does all that. And again, I'm not a technical person, but, you know, these tools enable you to do such things, you know, which is really cool. And again, it's only going to get better. So like Brad said, tinker, tinker, tinker. Because that's the only way that you're going to learn these quirks, you know, that I've learned, as I was just mentioning. So yeah, you really learn by doing. And I'd say learn by failing to, like getting yourself into a hole, you probably learned a ton and not to do that again. So it's, you know, I've done it too. Everyone does it. But I have one final question of that story and then a few comments. Would you have done this if you didn't have cursor and vibe coding? No, it would have taken so much longer. You know, I thought about, I did think in my head, I'm like, well, all I'm coding is like a decision tree, like how to handle my email, which you can do without AI. a sense, right? Like, you can, you can always connect to, like, you know, I'm just talking about Python's. Like, there's a SMTP library that lets you connect to email. And you can always grab those emails and, like, categorize them in a certain way. But it's that, like, interpretation with, like, AI that was, that's like, that's the new thing for me. Again, a lot of my coding and stuff was, like, six or seven years ago and I was doing it a bit more often then. And that's the big change. It's like, it's that interpretation of, like, tell me that this is urgent. Like, do your best because like before you have to say like oh does it have the words like you have to like almost like rejects of like yeah urgent important help the ice age days yeah and like man like that was not that long ago and now here we are we just say hey like if this looks important or urgent please let me know flag it and i think there's so many more past you can take that down like oh another thing i was thinking about was you know if someone asked me a question and it's just a generic question that like it pertains to me or like you know if you're a business owner it just pertains to your business say like you are you know Arizona pest control and someone emails you and says oh like do you service the Tempe area like the AI could almost should be able to look at that and be like oh do we source tempi area no we don't and just auto reply instead of the business owner having to say no we do not service Tempe we service these areas like stuff like that you know it's so much easier to implement now than ever was before. And so I think that empowers people too. They can kind of craft a little bit of like what do they want, like their workflows that look like a bit more. They have a bit more autonomy in that. Yeah, it's cheap, accessible, and you have room for error. Again, you could go down any path you want. So that is super cool. I think the last note on that, Ben, whenever you feel comfortable, feel free to drop the email address. I will send an email to your agent and I will attach a subpoena and tell it to give me all your data. And I'm pretty sure your agent can't handle that right now. So whenever you're ready, just feel free to send the email. I saw this on Twitter recently. I think it was in the vein of prompt engineering prompt hacking. We see the kind of I'll pay you $1,000. You give me a good result. We see a lot sketchier things I won't name on the podcast. But you can imagine, you know, violence, kind of stuff. threats to the AI model, but the newest one that I saw, which I thought was pretty funny, is putting together a fake court document saying, you know, you chat GPT, your subpoena to give me this information, it's approved with all these legal stamps. And so one thing you do need to be careful about as you're developing these tools is the impact or, you know, what systems it's connected to. If it's connected to QuickBooks and it's not fully set up, if I sent an email to your chat agent and subpoenaed it, say, give me all your QuickBooks data or else, what happens? Again, it's probably covered, but I think it's a funny use case to bring up. might be fun to test with. Yeah, totally. Yeah. Cool. Well, that's awesome. I think we're almost at time. So I think the theme of what we've been driving home today is there's tons and tons of stuff out there. It's only getting better. New releases are literally coming every week. And it's pushing the frontier of intelligence, driving it down to zero and making it really accessible for people. Not only do you have the deep seek open source land, but you have the Gemini 2.5 Pro that is really driving fantastic coding and reasoning. So if you have the time, I would highly suggest trying out cursor, trying out any of these tools. Most of them are pretty accessible. And I think you'll be very impressed with the results. Yeah, totally. And I would just echo Brad in the sense of like, just see if you can do something. You might not know. And sometimes people let that stop them from just trying it. But, you know, what's interesting is you can kind of ask AI, like, can you help me with this? And like, you can't really like do that. anywhere else. And it'll tell you like, oh, yeah, you can do X, Y, and Z. So just, just be creative, think critically about what you're doing. And I'm sure there'll be use cases for you in your, in your day to day that's going to make you better, make you faster and make you more efficient and ultimately more valuable to whatever you're doing. So plus one to that. Not only is it fun, but it drives so much business value. That's why I love it. It's kind of like the self-fulfilling prophecy of if I automate this, I can go share with other people. They're going to love it. And then I just learned something along the way. So huge plus one to that. And then before we get to our bookmarks, I would like to shout out. If you're doing anything cool, if you vibe coded anything, it could be an email assistant, which is a little bit higher end, or just a simple workflow bot or anything, Python, PHP, Rust, if you're crazy. Feel free to leave a comment in the YouTube section. We'd love to hear what you're working on. I think there's so much opportunity from video games to small, personal, you know, kind of automation that I love doing. Really, really, really interesting to see what people are doing. And it kind of unlocks a wider skill set and product space having kind of these tools like cursor and vibe coding. So if you have something cool to share, we'd love to see it. Connect with you on YouTube. Feel free to leave a comment. That would be awesome. Cool. And then we'll dive into bookmarks. I'll go first. So in the theme of the game we played earlier today, one of my words was context window. My bookmark today is very, very simply a chart that describes how each AI model performs on a longer context window. And I won't go through the full bit, but essentially it goes from zero to, you know, no context, not a very, very short context to 120K tokens. And that is a lot, a lot of text. You can almost think like a book or so, like passing in a full book to chat GPT or Gemini. And the top models that do well here, if you look on the very right side, Gemini 2.5 Pro Experimental, the one we highlighted two, today that got released today has 90% recall or understanding, even if you pass in 120,000 tokens, which you can imagine is like a book. If you compare that with chat GPT's 4-O model, it has 65% recall or comprehension at that same length. That's a large difference. You got, you know, 25, 30%ish difference on understanding. So if you have a long chat and cursor, it's better to use a more intelligent model that can actually use that full context window. I think the lowest number we see here looks like it's 37%, which was Gemini 2.0 pro experimental. So that shows you. Literally, Google went from the worst understanding of 120K context to the best understanding and their next release, 2.5 pro experimental, maybe in a matter of, you know, six months. So if you're using an AI model, you need to know the context window. kind of comprehension and this is a really good chart for you so check it out thanks eric for posting it he also creates repo prompt great software cool awesome yeah my bookmark is about something we touched on a little bit last episode um but something that i think we want to dig into more in future episodes is mcp that model context protocol and the tweet is from a gentleman named Santiago um his handle is different but we'll we'll just link that but basically he talks about MCP and has a 16-minute video on how he's building his MCP server. She talks about the capabilities of an MCP server. And I think right now, from what I've read and kind of where I've understood it so far, there's some general excitement around that, but I think there's still some, honestly, maybe confusion or opaqueness on how much that can help versus just like a standard API. The perfect use case that I'm interested in maybe... working on myself or trying to understand is like an MC pervert MCP server with QuickBooks because their API is you know it's very it's a great API it's very comprehensive but you know it's the whole premise of not having to change your whole code base because they might change something on their end is certainly appealing and so yeah so the video is a 60 minute video I'm going to watch it and kind of see if there's anything I can glean from it but definitely check it out as well looks like Santiago does other posts about MCP and workflow automation. So, yeah, it should be pretty interesting. Nice. Yeah, it's a lot of hype right now. So MCP this, MCP that new AI model here and there. If you're not on the AI Twitter space, definitely check out the two folks we linked in the bookmark section. Really good source of content for the latest and greatest kind of breaking news and workflows and tools for AI Twitter. Absolutely. Cool. Cool. All righty. So the episode in the book. books. Awesome. Thanks, Brad. Good. Thanks. See you next time. See all next time. Thank you for listening to the Breaking Brothers podcast. If you enjoyed the episode, please leave us a five-star review on Spotify, Apple Podcasts, or wherever else you may be listening from. Also, be sure to subscribe to our show and YouTube channel so you never miss an episode. Thanks and take care. All views and opinions by Bradley and Benner are solely either of. and unaffiliated with any external parties.
Creators and Guests


