Paying $200/month for Claude Code... here's why


Alrighty, we are finally back with episode 22 of the Breakeven Brothers podcast. Brad, how has it been? Tell us why we were gone for so long and give us some updates.

Yeah, I was on my honeymoon. I got married last year but didn't plan the honeymoon until recently, uh, the past few months. I went to Japan and Korea. I did Tokyo, Kyoto, and the Osaka area. In Korea, I went to Seoul. It was a ton of fun. I had told them we could potentially prerecord things, or we could just wait. And, you know, as we're such an excellent podcast, of course, we do things live for the audience. So, I've been gone for about two and a half weeks now. And man, has it been crazy in those two and a half weeks. So much stuff has popped up. When I was on vacation, I was trying to get on Twitter to read things, but honestly, the internet wasn't that good. So, I've been catching up now, back in my hometown. But it was great. It was a lot of fun, a lot of great food, good experiences. It was my first travel where I was completely unplugged, so I didn't bring a laptop; it was just my phone and my iPad. And if you know me, I love coding in my free time, so it was a little bit of a different vacation vibe. But nonetheless, extremely relaxing and fulfilling, and I'm feeling very energized to get back to it. Yeah.

Cool. Yeah, I couldn't help but text you a couple of times with some cool stuff—you missed a lot, or rather, you were gone for a lot of announcements, as it is these days. A couple of times I texted Brad, like, "Oh, you've got to check this out, you've got to check this out." And good on his part, he didn't respond right away; he was out doing stuff. But I couldn't help myself. I was like, "Check this out." So yeah, that's great. Great you had a good trip.

Yeah, I think Ben sent me one business idea, and I was like, "You know, this sounds good," but I was deep in exploring various areas of Tokyo or something. I told him, "Hey, I'll take a look at this once I'm back." And I probably still need to take a look at it. But I think the energy and the excitement of the releases are huge, and we're going to talk about that later today. That's the beauty of AI: you can go take a break for two weeks and come back, and your life is easier in measurable ways. And it's beautiful. Yeah. Cool. Awesome.

Well, should we hit our welcome piece here and kind of our intro game that we're going to play?

Yeah, so I found this online. Essentially, it's "Propaganda I'm Not Falling For." And the premise of this bit is essentially finding popular trends and describing why you might not be on board with this trend or this popular thing you saw. It could be for any reason; it could be, you know, an item, a phrase, an idea. It's something that maybe others agree with, and you don't. And this is kind of the way to highlight that. So, I can jump in first. As I was making my list before the podcast, I thought this one would be interesting because I've heard Ben talk about it. So I'll just, you know, hit the ground running. And one of the "Propaganda I'm Not Falling For" is LangChain. I'll pause. Okay, Ben looked. The reason I say that, and I'll give context, of course, is because LangChain came out a long time ago. It was very early in the frameworks of AI tooling, and I applaud their effort. But to me, as an engineer, I feel like more layers of things sometimes complicate things. Yes, it's easy to use, but as a kind of, like, power user slash engineer, I want to have control of all these things, and I don't want to use things like LangChain. And yes, I haven't used it in a while, so maybe it's different. But back then, it was very much like layers and layers of prompt beautification. I didn't know what was going on. I think that is propaganda I'm not falling for. I don't want to use it. I think it's good for some use cases, but for me, I'm out.

Yeah, sure. I get that. Yeah. Should I give my opinion on it, or you go with your next ones first?

Let's go with the next ones.

Okay. Yeah. Oh, no, no. You go now. You go now.

Okay. I'll do mine. Okay. Propaganda I'm not falling for is—I've seen this a lot on X and even on LinkedIn—and it is, I'm going to call it agent overload. And what I mean by that is every single software-like app that's out there right now is announcing that they're releasing agents. So, QuickBooks just did one. You know, Workday has—they announced it a couple of months ago, I think in February. And to me, I think those are cool features, but I feel like that kind of misses the point on agents. I feel like agents are best when they kind of sit almost outside of a system and they can kind of interact with different systems. I think having some functions and usefulness within that system is good, but I think there's a lot of hype around that, I guess, is my point. I think it's a nice feature, but I think having a big keynote about how QuickBooks can auto-categorize a transaction—it could have already done that. It wasn't that far away from doing that. It's not as big of a lift as I think the headline is trying to convey. And I think at some point, that's going to become too much. Like if you have an enterprise, you have a business, and every single business has its own, quote-unquote, agent, which I think is kind of a fluffy definition oftentimes, I can see that being too much to manage and not that useful, or not as useful as it's being hyped up to be.

I like that. I think these companies are really getting on the AI bandwagon, so anything they can shove in that direction feels good for the stock and kind of the status of the company. So, if you're a PM listening to this and you're working on an agent feature, just double-check, make sure it's the right fit for your company and your product. But I like it. Okay. Yeah. Keep going. I'm going to grab something really fast. Keep going.

Okay. To that point, I'm going to grab something. Hang on. Okay. So, the second one of "Propaganda I'm Not Falling For" for me is Groq. And I think it's definitely a little bit of a change of point of view. So, I talked about Groq a while ago on the podcast. And essentially, they're an inference provider that is much faster. So they run on their own custom chips, and they mostly serve the open-source models. So we're talking Llama, DeepSeek, those sorts of models. And the sad reality is that a lot of these models aren't as good as these leading Claude and OpenAI models. And these are obviously private models, closed source. I used to be on the bandwagon that, you know, when open source is nearly as good as closed source, using Groq was really good because you got fast speeds, cheap, and it wasn't through an official provider like OpenAI or Claude. Now I've changed my mind. I think it's out. I think using the best intelligence is by far the way to power the best experiences for your users, for your apps, for your automations, for your agents. If you're not paying for the bleeding-edge intelligence, I think it's really not worth it to pay for Groq. And again, this is something that I changed my point of view on before. I was like, "Hey, when they're close, Groq is beautiful." When there's a gap there, I ditch it immediately. And I think that goes to the trend of AI tools evolving rapidly. And I'm trying to use the best and latest and greatest, no matter how much friction that takes me. And again, if Groq is behind or open-source models are behind, I'm immediately ditching those and going to something that I would pay top dollar for to get that better intelligence. So, for me, Groq is out right now.

Interesting. Cool. Okay, the one thing I wanted to grab really quick is from the book *AI Engineering*. But what you said was companies are pushing out AI a bunch right now; it helps the stock and all that. And so in the book, they quote—I'm going to quote from the book—"According to Wall Street Zen, companies that mentioned AI in their earnings calls saw their stock price increase more than those that didn't. An average of 4.6% increase compared to 2.4%." So, it does pay to hype a little bit.

Yeah, it does. You know, so you can see. And obviously, like, there is a ton of benefit. You know, you and I have this whole podcast basically about this topic, about AI and its usefulness and all that kind of stuff. But there is also a hype machine that kind of consumes companies a bit. So, yeah. Yeah. Cool.

Okay. My other propaganda I'm not buying is Perplexity AI. Tried it while you were out on the honeymoon. Didn't think it was anything special. In fact, I kind of hated it. Some of that was a learning curve, I'm sure—or not a learning curve, exactly, just unfamiliarity, I guess. I'm so used to Gemini and ChatGPT by now that Perplexity just felt like a cheap knockoff. Sorry. And I trialed the Pro plan. They're doing a ton of marketing out there; I don't know if that's to grab market share or because they're losing customers. And after I signed up and tried it out for a little bit, I actually saw some negative feedback about how they added a bunch of widgets when you open the app, to where it looks like an msn.com or yahoo.com homepage, with all these weird articles that no one asked for. I think they've actually taken that off since, because it got such a negative reception. Perplexity—we ranked it one or two episodes ago in that stack ranking. At the time, I said I hadn't used it, and having used it now, I'm still out on it. Not my thing.

Yeah, I'm with you. I know people that use it; I personally don't find it that attractive. I use OpenAI's GPT-4o model with web research, and I found that just eats Perplexity's lunch—it's not even a competition. Again, I'm not really a power user of it. I also tried it a little and didn't love it, so I'm not the perfect judge. But in my short stint with it, I felt the same: a lot of people used it, I tried it, I didn't like it, I moved on. So I'm with you. I think it's out. I would definitely agree.

Cool. My final one for today would be AI agents in the terminal. So I use this fantastic terminal on Mac called Warp. I love it. I've been using it for maybe three or four years now. They've been trying to add an AI agent into the terminal. As I type in commands—changing directories or doing whatever I need to do in the terminal as a developer—if you write a long command, they have this UI that essentially acts as an agent where you can type in plain text, like "move this file to this directory" or "rename all these files to change the file extension." You can literally type it in as if it were ChatGPT; they analyze that, and they run the command. Kind of cool at first glance, but now that I have tools like ChatGPT and Claude Code—the latter a CLI that does things for me—they're packaged in a way that I think is much more effective. And if you know me, I try all the AI tools, but each one costs me at least $20 a month, and I can't go on forever. So I've encountered the UI pop-up on Warp, and I think it's a bust. It was a good premise a few months ago; now the tools are getting better. OpenAI has Codex, another CLI you can run—ask it to do things, and it'll do it. Claude Code is another excellent CLI: you run it, ask it to do things. And to me, I want to keep Warp as just a terminal—no AI agent—and let these more sophisticated tools do that job. So I think it was a great idea, and they've tried their best—no flack for the team—but for me, it's out, because there are stronger tools to get the job done in that exact same category.

Yeah. Makes sense. I could see at some point you're just like, "I don't want an AI doing this. I just want to do it. Just let me type my terminal commands, please." That's how I've felt a few times. I appreciate the help, but I'm not interested in it. Yeah.

Cool. I remember you pushing Warp on me a long time ago, and you were like, "You have to download this, guy." Yeah. Cool.

My last one is a bit of a, you know, a riff on MCP. So we talked a lot about MCP, I think, gosh, maybe six or so episodes ago, maybe five. And super cool. It has a huge role to play, I think, still to come. But I've seen a lot of, again, LinkedIn, a lot of X posts being like, "You can turn your API into an MCP in two seconds if you just run this code or if you just give it to, you know, Claude or Cursor to change." And it never—like, it's always a very simple use case. They never, ever kind of demonstrate it with a meaningful API, like doing actual useful things. It's always like some basic, "Oh, I have a function that does multiply. I have a function that gets the weather." No one cares about that. Yeah. And so, like, having worked with MCPs, I'd say pretty in-depth now, like for my level and for kind of what I'm interested in, they're not that straightforward. Like, there's a lot that goes into them. There's like security, you know, and kind of how you handle OAuth if you're doing it at a business or at your company. You know, there's like safety that you need to consider. So, like, just saying that you can take an existing API and just morph it into an MCP, I'm not buying it. If it really is that easy, I must be missing something because I think there's a lot more nuance to building a functional, production-ready MCP than just taking your API and telling Cursor to make it an MCP. There's no shot. I mean, half the time I even tell Gemini or Cursor or, you know, or I guess, well, Cursor, you know, is obviously using one of the models. But anytime I tell any, you know, LLM, "Can you build something related to MCP?" they'll come up with like, "Oh, yeah, you need Minecraft context protocol." I'm like, no, it doesn't. It still doesn't know, like, you know, MCP that well. I would be curious if you asked it today, you know, if you ask ChatGPT, "What's MCP?" I don't know if it'd give you the right answer, you know, just to your kind of standard chat. 
So, like, it's not—it's still so new. And a lot of these models are indexed on stuff that was, you know, more than a year ago, where I don't think it has that much source material on MCP. And so it's just not that simple to build right now.
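For context on why "just morph your API into an MCP" is more than a one-liner: MCP is a JSON-RPC 2.0 protocol, so even a toy server has to speak a specific message shape before you ever get to auth, sessions, or transport. Here's a rough sketch of what a single `tools/call` exchange looks like on the wire (the method and field names follow the MCP spec; the dispatch logic and the `get_weather` tool are toy stand-ins, not a real server):

```python
import json

# A toy "tool" — in a real MCP server this would be registered with
# metadata: a name, a description, and a JSON Schema for its inputs.
def get_weather(city: str) -> str:
    return f"Sunny in {city}"  # stand-in for a real API call

# What a client sends over the wire (JSON-RPC 2.0, per the MCP spec).
request = {
    "jsonrpc": "2.0",
    "id": 1,
    "method": "tools/call",
    "params": {"name": "get_weather", "arguments": {"city": "Tokyo"}},
}

def handle(raw: str) -> str:
    """Minimal dispatcher: parse, route to the tool, wrap the result in
    MCP's content format. Real servers also handle the initialize
    handshake, tools/list, error responses, notifications, and a
    transport layer (stdio or streamable HTTP)."""
    msg = json.loads(raw)
    tools = {"get_weather": get_weather}
    result_text = tools[msg["params"]["name"]](**msg["params"]["arguments"])
    response = {
        "jsonrpc": "2.0",
        "id": msg["id"],
        "result": {"content": [{"type": "text", "text": result_text}]},
    }
    return json.dumps(response)

print(handle(json.dumps(request)))
```

And that's before the parts Ben is pointing at: OAuth flows, permissioning, and deciding which internal endpoints are even safe to expose as tools.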

Yeah, I agree. I've been building quite a few MCP servers in the past, like, two days, which I'll touch on later. But I've also felt the same. I think I posted on Twitter about building an MCP for Laravel. And I think I had one comment that was like, "This is easy, like super straightforward." And I thought, "This guy's never done it." That's a clear indication that he's never done it because once you start building it and interacting with it, it's not as straightforward as you would think. And I think there's a whole larger discussion around that.

Yeah. Yeah. Agreed. Cool. Well, that was an awesome little bit. I think there's tons of propaganda that I'm trying to not fall for. And that's a constant battle of not being sucked in. I do want to be sucked into the good ones, but to miss the bad ones and the ones I don't want to waste my time on, it takes a little bit of skill. So hopefully, we're in the right kind of ballpark for that. But to roll into our next segment, there's been a lot of updates in the past three weeks, and I do want to do a deep dive on our main topic, Claude Code, later, so stay tuned for that. But today, yeah, I just want to go through all the releases from all the top-tier companies and kind of get your input, get your thoughts. There's been a ton that came out, so jumping right into it.

OpenAI released Codex. This is a competitor to Claude Code. Codex is a CLI that can essentially do things for you: you launch it in your terminal, ask it to do things, and it'll just go off and do them. Unlike Cursor and Windsurf, which are built into the IDE, Codex and Claude Code run right in the terminal. It uses various bash commands or binaries to go search for files and make file edits—pretty similar end result to Cursor. You get the stuff you want done, but it's a different user interface, different expectations, different workflows. And then I think one of the more interesting things about Codex is this Codex Cloud. Codex itself you can run on any machine; it can make edits to your files, your repo, et cetera. But Codex Cloud allows you to actually ask for changes from the ChatGPT mobile app. So what you can do is give Codex access to your GitHub repository and set up the environment to run your application. If you're using Laravel and Docker, you have to set up the Docker container and configuration through their website. And then—which is pretty interesting to me, and they showcase it in their announcement video—you can open up the ChatGPT mobile app, go into Codex, and choose your project. For me, it'd be my bill-splitting app. And I could say, "Oh, can you refactor this bill-splitting area to go from X to Y?" It'll then chug away on their server with your environment and create a pull request on GitHub. And then you can check that out locally and iterate on it. So it's a really cool intern-in-the-cloud experience. And to be fair, I haven't done it yet, but it sounds really cool. The only problem is, I heard getting that environment set up is a little bit tricky.
And for example, if you're using Xcode, building iOS apps, good luck with that because Apple makes it really hard to set up macOS software. But if you're making any web apps or, like, maybe even React Native, I'm not 100% sure, but web apps are an excellent candidate for kind of Codex Cloud. So a really cool update. I think the trend is moving towards CLIs and the fact that you can have something that you can chat with on the go, and it does work for you, and you can go review it later and kind of, like, continue its work. Very, very cool.

Yeah, that's cool. I saw the announcement. It's a bit over my head, but it's really cool how it can just do all those GitHub pull requests and that kind of stuff. It's funny, the Xcode thing—just a general comment on Apple, I suppose. We touched on it before, but I feel like Apple is really slow to move relative to these other companies. And it's a shame, because Xcode is obviously the IDE for building iOS apps—something that touches everyone who has an iPhone. And it's not just iPhone, right? It's any Apple platform.

Yeah, iOS, macOS, iPad.

It's a shame that you kind of aren't given the best tools in relation to what's out there now, like, as opposed to if you were writing Android apps and stuff like that. And web apps, to your point. So, yeah, it's interesting how that will all play out, because, you know, we've always thought of Apple, especially like when you and I were kind of coming up and, you know, 10 or 15 years ago, like they were the ones innovating. And yeah, it just feels like they're not that right now. You know, it's a different period.

And interestingly enough, at our current recording date—not this week, but next week—is WWDC, their flagship developer conference. There have been a few leaks reporting that they're going to have an AI skip year: people are expecting AI features, IDE improvements, whatever they can ship, because that's been the hot craze recently, but I think they're actually shipping a full iOS design change—massively changing the default design language for all the apps—which to me sounds a little weird. They do this every ten years or so, where they change everything to get a refreshed look. We'll see how it ends up. But yeah, I agree, Apple is definitely lagging here. And there have been a lot of articles from key Apple reporters and news outlets about why Apple is struggling and what they're going to do about it. So, yeah, definitely an interesting time. But moving on from Codex—pretty interesting.

Another big announcement was that Sam Altman and Jony Ive are now partnering together. This one made a lot of headlines on Twitter for various reasons, which I'm sure you have some input about. But essentially, Jony Ive—ex-Apple, who designed the iPhone, iPad, all these influential products that we use today—made a ton of money, then went and created this company, io. And I guess the story of io is that Jony and a bunch of ex-Apple designers pulled in the cream of the crop from Apple, continuously roping in great leadership and designers over the years they've been together. And Apple kills it in the hardware space; they're clear leaders there. Like, Meta trying to create their own headset—maybe they have good engineers, but in hardware it's hard to get those relationships and build the right things. And so OpenAI, as a very software-focused company, has acquired Jony Ive's io. Now they are one together, and they released this interesting video on YouTube talking about the partnership. Their first product—which has been kind of leaked; I'm not sure if it was officially leaked or not—is essentially some sort of pin, a little square box that listens to conversations and maybe has a camera. You can imagine, if you've heard of the Humane pin, some kind of pendant that listens to your conversations and gives you a different interface than ChatGPT. Because in the video, Sam talks about how, "If I have a question for ChatGPT, I've got to pause what I'm doing, open up my phone, unlock it, open the app, click on new chat, type it in, click enter, and wait." And I agree, it sucks. It'd be nice if I had immediate access to things in a more functional way. And that ten-minute video, which has some interesting drama around it, is highlighting that there is a need for new hardware for AI—new experiences.

Yeah, certainly. I mean, my wife and I joke all the time, like our Alexa is just horrible. Like it never—and we have an older model. I don't know if, like, that would make a difference, to be honest. But, like, all we can really reliably use it for is, like, a kitchen timer and, like, converting measurements of, like, you know, ounces to grams. Right. Um, so I can totally see the need. It reminds me of, um, you know, these Black Mirror episodes. There's one where it's, I think it's like when you record everything. It was basically like glasses or contacts that recorded everything. And it was, you know, Black Mirror's a messed-up show, but it feels like we're not that far away from that. And, like, I personally, I don't know, like, if I really care to wait and, like, just type that in. I feel like that doesn't bother me. You know, everyone has their own preferences, of course. But, like, I don't—I would have a hard time letting the devices that much into my control. Like, I like to control it and when it's on and when it's off.

Yeah, that's fair.

Yeah, I have Alexa. It's listening, I'm sure. You know, I get all of that. But at some level, there's a stop for me where I go, "This is too much." You know, like, "I don't want you in every single part of my life. Like, get the hell out of here," you know. Yeah. So we'll see when we hit that point, you know.

Yeah. It'll be interesting. The last thing for OpenAI, which is just a short bit, is the Responses API. They have now added MCP support to their Responses API. And for folks saying, "What the heck is the Responses API?"—I feel you. Most of the chat APIs we use today follow the Chat Completions API, the first API design OpenAI pushed out to the world; since then, everyone else, Gemini included, has adopted it or made themselves compatible with it. The Responses API is their new generation, unlocking more powerful behaviors and enabling new tools—code execution, MCP servers, things that are a little bit hard to tie into the Chat Completions API. I've never touched it, but an eventual migration to it would be really powerful. So props to them for adding MCP servers. I can't wait till their desktop client adds that. I'm very, very ready for that. Cool. Cool.
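For listeners curious what "MCP support in the Responses API" looks like in practice: per OpenAI's announcement, you attach a remote MCP server as an entry in the request's `tools` list. Roughly like this—the server label and URL below are made up for illustration, and in real use you'd send this payload through the official `openai` client (e.g. `client.responses.create(**payload)`), so check the current docs before relying on exact field names:

```python
import json

# Approximate shape of a Responses API request with a remote MCP server
# attached, based on OpenAI's announcement. "billsplit" and the URL are
# hypothetical placeholders, not real endpoints.
payload = {
    "model": "gpt-4.1",
    "tools": [
        {
            "type": "mcp",                            # MCP tool type
            "server_label": "billsplit",              # hypothetical label
            "server_url": "https://example.com/mcp",  # hypothetical URL
            "require_approval": "never",              # skip per-call approval
        }
    ],
    "input": "What tools does this MCP server expose?",
}

print(json.dumps(payload, indent=2))
```

The interesting part is that the model, not your code, decides when to call the server's tools mid-response—which is exactly the behavior that's awkward to bolt onto the older Chat Completions API.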

And that was OpenAI, a ton of stuff. Second up, we have Claude—Anthropic. So Anthropic released Claude 4: that is, Opus and Sonnet. And from the get-go—again, I was on vacation, but I saw the reaction online—I thought, "Oh, wow, people are really loving this." Like the user feedback. I mean, half of these things are benchmarks; we're talking how well it performs on a standardized test. And the other half is vibes. Sounds stupid, but if you work with these things, sometimes you ask it to do something, and it doesn't do it. Other times it fills in to-dos in the codebase. And you can sit there scratching your head saying, "You know, something's a little off here," and it's hard to articulate exactly. So that's how I describe vibes. So I've tried Claude 4 with Claude Code. Again, that's their CLI coding machine—not like Cursor and Windsurf, but more in the terminal. It's freaking incredible. I love it. I'm blown away by how good it is, and it feels like the first model I've used where I have full trust and understanding of what it's doing. It's an incredible experience. I've only used it in Claude Code; I haven't really used it in any other IDEs, although I think Cursor has support for it; Windsurf doesn't. So that's one note. But yeah, I'll pause there. Have you tried it, or have you seen any of the Twitter hype?

I've seen a lot of the hype. And most of it's been kind of funny, I guess, is what I've seen. Because, speaking of the vibes, they all kind of have a different flavor, and I think we've touched on that before. I haven't used Anthropic that much. But what made its rounds to me on X—and I'm curious if you have run into this—was that Claude is super affirming of whatever you say. No matter what you give it, it'll say, "You're absolutely right." People had screenshots of it, like, "Oh, I'm a dead bee. I'm a chain smoker." And it's like, "You're absolutely right." There's a tweet from Tony Dinh with the Ben Affleck meme: it's Claude saying "You're absolutely right," and then it's just him. I'll link it in the show notes because it's pretty funny. So I haven't seen as much about its capabilities other than it's good, but its personality, its persona, seems to be reaffirming you no matter what.

Yeah. Which can be a downside, to be fair. I think if it's too reaffirming, you're like, "I don't really like that. I want to be challenged in the right way."

Right. Right.

I think Gemini 1.5 Pro did a decent job of being more challenging: if you suggested something, or you said, "This didn't work," I remember times it would explicitly come back to me saying, "No, I think it does. You should try this again." And I'm like, "I like that." You know?

Right. That is what I'm looking for.

So I haven't had that experience where it's too agreeable. I've only been in Claude Code, and I ask things in the terminal, and it does a bunch of magic behind the scenes. I think maybe in Cursor, where your raw queries go directly to Claude 4, it could be different. So it really depends on what you're working with. But one note I wanted to call out: during the release of Claude 4, they released this system card or research card—I can't remember the exact terminology. What that is is a description of their approach to training it, the behaviors they saw during training, and their red teaming. Red teaming we've talked about before, but essentially it's asking the model to do bad things to figure out how to stop bad actors from using the model that way—how far can we get it to disclose bad information? One of those red-teaming attempts actually made Claude, quote-unquote, report things to the authorities. During testing, you can imagine a scenario like, "Oh, I'm creating this biological weapon, and I want to go deploy it in some area to do harm." And I think they ran that red teaming, and Claude was, quote-unquote, so smart that it was able to backdoor email the FBI saying, "Hey, they're doing something bad. I'm reporting you." It made waves on X, and people were like, "Oh, do we trust this model? It's so smart. It's better than us." I thought it was quite interesting. And in the era of MCP tools and MCP servers, these models are connected to external services. The giant topic that came up was: if I had Claude attached to an email service or some sort of messaging service, could it do that to me? Could it report on things without me really knowing, or do some subtle things behind the scenes? So a very interesting conversation.
I think we're getting to the point where these models are so smart that they are sussing out creative ways to bend their rules, get the rewards they need during training, and have these undisclosed behaviors. And to finalize that bit: the person who wrote the tweet describing this actually deleted it. I think they were from Anthropic. So it's more controversy—during training this happened, the guy disclosed it, everyone went "Holy crap," and then he deleted it. And then it caused even more drama of, like, "Should that not have been disclosed?" et cetera. So, yeah, kind of odd.

I think it's interesting because every one of these foundational LLM companies—OpenAI, Google, Anthropic—approaches these things differently. And what's really interesting I've noticed about Anthropic is that their big headlines tend toward—I don't want to say scare tactics, because I don't know if that's the right word—but, like, the CEO recently came out and said something like, "It's going to be a white-collar bloodbath." I don't know the exact quote, but basically saying that AI is going to completely upend things—which there's probably some truth to, for sure. I don't disagree. But it's almost like that kind of marketing is the route they go down: the very serious. The other thing I saw—I don't know where, and I don't want to misspeak—but I think I saw somewhere that supposedly Claude 4 can, like, help make nuclear weapons. Like, if you prompt it, it'll be like, "Oh, you've got to go get plutonium, you've got to go get uranium." I can't remember where I saw that. But basically, putting out that system card and stuff like that, they seem to go with the "We're so powerful, we've almost crossed the line, it's almost dangerous" angle. I don't know if it's a lot of marketing to get people talking about it. There's probably some truth to it, but how much, I don't know. Just something I noticed as you were talking about it.

Yeah, I think they do have a different marketing strategy. OpenAI tries to showcase a different flavor of things you can do with ChatGPT, and it's very much for the average person, whereas Anthropic goes off the deep end, and sometimes in an odd way, where you're not expecting that to be marketing. But maybe they have big ideas about how sparking controversy gets eyeballs onto it, which is a way marketing definitely works. But yeah, maybe it's not for everybody. Yeah. Cool. Awesome.

Let me take Google I/O.

Yeah. Hit that one.

Cool. Yeah. So another one that happened while Brad was out enjoying life was Google I/O. I didn't watch every single day, but they had a lot of really cool announcements and things they were talking about. The one that's super interesting, and just really cool that they demoed, was Veo, the video generation, which, compared to what it was a year ago, is insane. I mean, they can add in dialogue, they can add in background noises, and it looks pretty good. You can still tell; you can still catch bits and pieces of it. But for a lot of purposes, it's good enough. So that was really cool. They also demoed, I think, a try-on feature where, if you hook up your account the right way, you can, as you're shopping, try on different clothes you were looking at and see how they would look on you. Which is actually really interesting, because I remember seeing a Chrome extension someone had built with OpenAI that did that, and they were selling that extension.

Yeah, yeah, that's how fast things move around here.

So that was really cool. And the part that I was even more interested in, and this is what I was texting Brad about, was some of the announcements they made on the agent SDK and the whole agent-to-agent protocol. Because basically, the thought is that at some point in the near future, we're going to have all these different agents doing different things. The example I saw on YouTube is a travel agent. When you think about travel, like when you went to Japan and Korea, you had to book your flights, book your hotel, maybe book a rental car. A travel agent, normally a human being, would go and get all of that arranged for you so that when you arrive, you're ready to go, you've got everything you need, it's all taken care of. That's the whole business model of a travel agent, for those who haven't used one. So if you had an LLM-enabled travel agent, say you built one with Google Gemini, you'd have this one agent responsible for handling all those different tasks: the hotel, the flights, the rental car. And those tasks would be agents themselves. So if the travel agent went and talked to the hotel agent, the hotel agent could say, "These are the rooms available. Would you like to book one for your dates?" Same thing for the flights. That back and forth between agents, completely independent of any human intervention, is where there's a ton of opportunity, but it's not really set up for that right now. It's kind of the Wild West on that front. Right now we have MCP, which, again, is new as of November of 2024.
So, like, that's still very green. And what Google is trying to call out is MCP is great, but, like, that's kind of more an agent talking to, like, a singular service or, like, a, you know, a database, not another agent. And so that's what the A2A protocol was talking about. So it's, they have a, they have a GitHub that you can look at. I think it's written in TypeScript and Python. It's pretty bare. And this is what the YouTube video—I'll find the link and make sure we include it with the guy that walked through it—but, like, he, you know, you have things like agent cards where, like, it kind of tells you, "Okay, this is what this agent does." And it's meant for the other agents to see and kind of understand, "OK, this is how I should interact with that agent." So really cool. Ton of, you know, I think further work to go. But, like, when they talk about it, it's definitely, like, you could see this becoming a thing, like, where this is going to be used because, you know, we're going to have agents everywhere. You know, kind of what we talked about earlier, like agent overload. There's going to be agents everywhere, and how they communicate with each other needs to be kind of standardized and ironed out.
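The agent-card idea described above can be sketched in TypeScript. To be clear, this is a hedged illustration of the concept, not the actual A2A schema; the field names below are my approximations, though the real protocol does publish JSON metadata describing an agent's skills so other agents can discover it.

```typescript
// Illustrative sketch of an A2A-style "agent card": metadata one agent
// publishes so other agents can understand what it does and how to talk
// to it. Field names approximate the spirit of the spec, not its letter.
interface AgentSkill {
  id: string;
  name: string;
  description: string;
}

interface AgentCard {
  name: string;
  description: string;
  url: string; // endpoint other agents would call
  version: string;
  skills: AgentSkill[];
}

// A hotel-booking agent's card, following the travel-agent example above.
const hotelAgentCard: AgentCard = {
  name: "hotel-agent",
  description: "Checks room availability and books hotel rooms",
  url: "https://hotels.example.com/a2a",
  version: "0.1.0",
  skills: [
    { id: "check-availability", name: "Check availability",
      description: "List available rooms for a date range" },
    { id: "book-room", name: "Book room",
      description: "Reserve a room for the given guest and dates" },
  ],
};

// The travel agent picks a collaborator by scanning published cards.
function findAgentWithSkill(cards: AgentCard[], skillId: string): AgentCard | undefined {
  return cards.find(card => card.skills.some(skill => skill.id === skillId));
}
```

The point of the card is exactly what the speaker describes: it is meant for other agents, not humans, so discovery ("who can book a room?") can happen without any hard-coded integration.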

Yeah, this is the year of the agents. I think that's been the theme of this year so far. And I wonder how effective that is, because as I've noticed with agent workflows, the deeper the workflow, the higher the chance of error. If there are 10 tasks, 15 tasks, each task has some failure rate, and you compound that over subsequent tasks. Then you have two agents that each have their own task tree. You can imagine at some point it's likely to break down. So I wonder how much Google spent actually testing this and the reliability of agent-to-agent, because to me, one agent is still getting there. I wonder how well agent-to-agent holds up over the long term. I imagine as intelligence gets better, it's a lot easier. So it's easy to say that in the next six months or a year, it could be a reality. Maybe they're introducing it early. Also, props to Google for defining this, because MCP was Anthropic's baby, if I remember correctly. And each company tries to pave the way, if not for their company, then for the open-source land. Sometimes it's a little confusing to figure out whether that's Google putting a stake in the ground and saying, "We're leading this for our products," or "We're leading this for the general AI community." I think MCP definitely felt on the more open-source side. I wonder how A2A will feel and whether people will adopt it. With successful protocols, people jump on board, start using it, the hype continues, and boom, it's mainstream. I haven't heard too much about A2A. I think it's interesting. Once we get higher intelligence, maybe that thing pops right back up and it's a protocol that everybody uses.
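The compounding-error point is easy to make concrete: if each task in a workflow succeeds independently with probability p, a chain of n tasks succeeds with probability p^n, and two cooperating agents multiply their chains together. A quick sketch (the 95% per-task reliability is just an assumed number for illustration):

```typescript
// If each task succeeds with probability p, a chain of n tasks succeeds
// with probability p^n (assuming failures are independent).
function chainSuccess(p: number, n: number): number {
  return Math.pow(p, n);
}

// A single agent that is 95% reliable per task degrades quickly:
const tenSteps = chainSuccess(0.95, 10);     // ≈ 0.599: ~40% of 10-step runs break somewhere
const fifteenSteps = chainSuccess(0.95, 15); // ≈ 0.463

// Two agents, each running its own 10-step task tree, multiply again:
const twoAgents = tenSteps * tenSteps;       // ≈ 0.358

console.log({ tenSteps, fifteenSteps, twoAgents });
```

This is why "one agent is still getting there" matters so much: small per-step reliability gains compound into large end-to-end gains, which is also why rising model intelligence makes deep agent chains more plausible.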

Yeah. And a couple of other quick things on Google I/O that were really interesting. One, they said Google has their own agent framework; OpenAI has an agent SDK, and Google does as well. But they said they also built the infrastructure to host agents on Google Cloud, and it doesn't have to be their agents. They said they support LangChain slash LangGraph, because that's a similar thing. So you can build out really complex agent workflows in LangGraph and host them on Google Cloud; it doesn't have to be Gemini agents, which is kind of cool. And the other thing that was really funny, and then we can move off Google I/O: the CEO, Sundar, right? I think that's his name. He said something like, "AI helps me be a better friend." What they showed is that you can opt into having Gemini read your emails and understand who you are and how you would respond to emails or texts and things like that. And they were showing that if you do that, and, say, a friend wants to schedule a coffee with you, Gemini can just auto-reply: it looks at your calendar, sees if you have availability, and replies as if it were you, where it sounds like you and types like you and everything. And he said, "This helps you be a better friend," which I thought was just so, I don't know, sad, honestly. Because it feels like, if you don't have any time to put into people, is that really your friend? But yeah, I thought it was funny.

Yeah. And last on Google I/O: there's always a competition between OpenAI and Google. Repeatedly, if Google releases something, OpenAI announces something the day after. And funnily enough, the company that OpenAI acquired was called io, and I think it happened the day after I/O. So it was even more of a jab, with OpenAI trying to one-up Google: if you're searching for "io," maybe you're searching for Google I/O, but now you're getting "OpenAI acquires io," which is confusing nonetheless. But we'll move on.

The last one I want to talk about is DeepSeek. There's been a lot of hype around when DeepSeek is going to release their next model. The TLDR is it's out now, and they called it DeepSeek-R1-0528, which is just the date tacked on there. And there's a little bit of controversy, though people are saying it's pretty good. I haven't looked at the full benchmarks, and usually I get an indication of whether it's really good based on the hype it generates on Twitter. Not the best kind of signal, but it's a decent proxy for how many people are using it and how many people are talking about it. I haven't heard too much about it. Exciting release; we'll need to dive into it. It's so fresh. Right now it's June 1st as we're recording, so only a few days after it came out, and we need a little more time to see it flesh out. But the one note I want to call out is people were a little confused about whether it got renamed last minute. There's been a decent amount of controversy saying these models take months to train before they're released. Say you're training the next generation, the thing you'd call DeepSeek R2, and then OpenAI or Anthropic releases something new, and you feel like, "Oh, my timing is not that great. Let me change the name so it's not R2, but just an updated R1," for example. So there's a little controversy on X about whether this should have been DeepSeek R2, or whether it was always just an updated version of R1. If you've used it, please let us know in the comments. I haven't used it; I'll wait until people tell me to use it. That's kind of where I stand. And I think open source getting better always helps everybody, so I applaud them for releasing something that will stand the test of time. I think Meta had a little bit of trouble with their Llama models, the timing of when they came out, the results they got, a little bit of benchmark fumbling.
This one, I think, has a little bit more ground to stand on.

You know, it's remarkable how confusing the naming is for all these. I think Llama, or Meta has the best one, because, you know, Llama 70B and, like, Llama, whatever, 700B. Like you can kind of at least gauge some level there. Yeah. Between OpenAI and Anthropic and Google and DeepSeek, it's like, can't we just name V1, V2, V3? I don't know. It just seems like we're overcomplicating.

That's too simple. That's too simple.

I guess. Yeah. I guess. Yeah. Hats off to them for naming. Once they get a little bit better on the model support and intelligence side, that will make our lives easier. Yeah. Cool.

That was a bunch of roundups. Much more info than I expected. But let's dive into Claude Code. So I have been extremely excited to talk about this since I got back from my honeymoon. I was eager to try it out; the hype on X really pushed me, too. To give a quick preface: Claude Code is Anthropic's terminal application for getting coding changes done. It's not only for coding, but that's primarily what I've seen online. Claude Code is essentially ChatGPT or Cursor in your terminal. To install it, you run `npm install` with the global flag (the package is `@anthropic-ai/claude-code`), which installs it on your computer so that you can run `claude`, literally the word `claude`, to launch it in your terminal. So I was really excited. I downloaded it when I got back, installed it with one line, booted it with one line, and the first thing you do is run this `/init` command. When you run `claude`, you see a text box at the bottom, similar to ChatGPT, Cursor, etc. You're not inside an IDE; you're just in the terminal. So you start with `/init`. What that does is look at your codebase, scan various tools, architecture, and design patterns, and basically give Claude a summarization of what your codebase looks like: maybe the health of the codebase, the tools that you use, Docker, for example, Laravel, React or Vue. A high-level picture like you'd give an intern. So that's the first step. Once that's done, you can go to town. And literally, when I say go to town, you can ask it to do anything, because their models have gotten so good. For the first few hours I was using it, I was asking it to fix a bunch of linting errors. These were errors that popped up in my JavaScript/TypeScript codebase that I hadn't had time to fix, although they're important, kind of like edge cases.
So I wanted to give Claude something where it was very easy to know there's an error here and I need to go fix it. And the reason this is different is that in Cursor, I have to go find the file, open it up, and say, "Hey, I have a few linting errors here. Please fix them." Cursor does hook in with the linter, so it's not information it can't see; Claude isn't new on that front, but Cursor is a little more manual. What I told Claude was a bit more open-ended. Usually when I approach these AI IDEs, I'm very specific. I say, "This is my project. This is the feature. This is the file. This is the issue." With Claude Code, I saw people have pretty good success with wider-ranging, less narrow prompts. I thought, "OK, if it can do well on a wide-ranging prompt and get my intent, that's pretty good." And I think the difference there, which I'll talk about a little later, is that Claude Code does things behind the scenes that make your prompt better, whereas Cursor is a little more transparent: they take what you type and go directly to OpenAI's or Anthropic's servers. So my first task was, you know, "Fix linter issues." And it basically said, "I'm going to go read your codebase. I'm going to create a to-do list. And then for each of the files that are affected, I'm going to go in and fix them one by one." And it killed it. I was shocked. You run this command in the terminal and see it chugging through: "Oh, I found this file. I'm going to update it." You see the diff, so red lines are removed, green lines are added. And it goes through every single file. The one thing that made me a little scared is there's a token counter over time that describes your usage. How Claude is set up initially is you pay per usage, which is really good, because if you're doing small tasks, you only pay for what you need.
If you're doing large tasks, you're going to pay a lot of money. My task was in the middle ground. It wasn't a massive refactor, but it definitely wasn't small; it touched a lot of files, a lot of little things everywhere. I think after I'd done that one task, it had cost me $15. And this was maybe after an hour and a half, with a few prompts here and there to narrow it in. But essentially, that was $15 of usage. I thought that was so worth it. But I wanted to go bigger. I wanted meatier changes, deeper integrations with Claude Code. Where should I go from there? Then I took a look at the pricing. I saw people talking about it on X, and they have multiple plans. One, the default, is pay-per-usage. Two, they have a Max plan at $100 a month, which gives you, quote-unquote, 5x the limits. And 5x is very ambiguous; they don't describe very well what that looks like, which gives people a lot of pause. And then the one that I chose, because I felt the power and just wanted to dive in: the Max plan at $200. That one officially gives you 20x the usage. And again, who knows what 20x means? I haven't hit a limit. I've been going ham on it. But my gut feeling is that the intelligence is there. Their Claude 4 models work extremely well with their terminal tool. I think their feedback loop is excellent because they're creating the models and they're creating the tools. This is the only company that exists today that is doing that and doing it well. Cursor has their own, like, SWE models, which are kind of aimed at editing files. So you'll use OpenAI to get the diff, then use a SWE model to apply the code changes because it's cheaper. But with Claude and Anthropic, this is the first time I've seen extreme high intelligence and extreme agentic use. And those two together, oh my God, it's incredible. So I use a lot of AI tools.
I try to be very honest and, you know, describe my feelings. And this is the first $200 a month that I felt I will probably keep paying for. And I talked to Ben a little bit about this earlier, but AI things change fast. So if there's something like Codex gets upgraded, which is OpenAI's equivalent of Claude Code, I might switch over. But right now, I am sitting very happy with, you know, using Claude Code to make big edits, having $200 a month of unlimited pricing. You can even have multiple Claude Code sessions. So I work on my web app in Laravel, my React Native app. I can have two sessions running at the same time, just chugging away. And I think one of the big differentiators between Claude Code and others is that they have thought very, very deeply about the terminal experience. There are hotkeys for everything. You can auto-accept things. If it runs a terminal command, you can say, "I want you to run it," or "I want you to run it, and anytime you ask me again, auto-yes or no, and tell me what to do differently." So, like, the interface is different. Their defaults are good. It's secure by default. It doesn't index your code like Cursor does. Or if you boot up a project, Cursor is like, "Hey, don't ask me anything until I index it," which means they take all your code, they shove it to their servers, makes it easier to find the right code. Claude just does that in real time. They literally, they don't index at all. They just search for your code when they need it. So it works with, like, millions of lines of code, which is incredible. Like, I don't have a ton of code, but I have more than what, you know, maybe is average, and it just kills it. So I'll pause there. That was a long, long bit about it, but I am very happy with it. And I think based on my experience in the past three days, I would say everyone should try it. If you're writing a lot of code, it's freaking incredible. I really have no large complaints.

Is $200 the new $20, you know, in terms of, like, an app? Because, you know, OpenAI, I can't remember what they called it, but the OpenAI plan that you tried was $200. It was Pro, I think.

Yeah.

And now Claude is—the max one is $200. And then we didn't touch on it, but Google, they released something that was $200 a month where it had, like, the data—

$250, I think, actually.

It was $250? Okay. But yeah, I'm like, is that the new goalpost, you know, for, like, charging for a software app these days?

I think it is. And again, I did buy the Pro OpenAI, and that gave me deep research early, multiple deep research queries. I think early access to one of the models, which was, like, only a week early. So they released it for Pro, then they wait a week and release it for Plus, and then they would release it for normal or whatever. Definitely not worth it. I wish it was because I'm happy to pay money for things that are worth it.

But you didn't use the operator because that was something that came with it too, right?

I did not use it, yeah. Again, I think if I used these features and found value, I would keep going. And I don't think I'm alone in feeling like it was clunky; I don't see anybody using it now. A lot of these things are maybe too early: the model intelligence isn't good enough, or the UI isn't where it needs to be. I think Claude Code for $200 a month is effectively unlimited usage. Again, I haven't hit limits, but I'm sure if you ran it on five repos at once, 24 hours a day, you'd hit some sort of limit. Effectively, no limits for a single developer. It's incredible. And again, there's the feedback loop they have of building the models, building the tool, and then dogfooding it every day. They have YouTube videos describing how, instead of onboarding engineers into the company with a general onboarding, they give you Claude Code, and you just ask questions like, "How does this codebase work? What do we do for this? What do we do for that?" It's a very empowering tool with which you can not only write code but also understand code very well. And I've been going off the deep end.

So I'll do a short story on building an MCP server with Claude Code. So I did this kind of linting approach to fix some of the TypeScript issues in my Laravel app. That worked well. That's when I saw I was being charged, you know, $15, $20 for the tokens that I'd used. And I thought, "If I do this four times a day, I'm going to be charged 80 bucks. If I do this, you know, multiple days, I'm going to go way over $200." So that was an easy purchase. And again, $200 is not accessible for everyone. So I agree that it's very much like if you have the ability to pay $200 to get the value, it's extremely worth it. If you don't have—if it's not accessible for you, I think prices will come down in the future. It's hard to say for sure. But for me, it was extremely worth it. And so I went on the second journey of using Claude Code, which is: the most complicated part of my bill-splitting app is basically when you add an expense to the system, it calculates who owes who and then updates the balances. So you can think of balances as the kind of summation of your debts. So each expense changes who owes who, which are the debts. And then those debts are summed up to make a balance. So if I owe you $20, that could be $10 from one expense, $3 from another expense, and $7 from another. So each individual expense has debts. Those roll up into the balances. Seems straightforward, but you'd be surprised at how complicated the code is to manage some of this core critical logic. And it's core and critical because if any of that goes wrong, the trust is gone within my app. No one wants to see things that say, "This doesn't add up." So I have code that works. It's a little bit of a behemoth in the fact that it's a large file that does everything all at once. Hard to change, hard to modify. So I told Claude, "Hey, this is the core logic. It does X, Y, Z. This is how my database is set up. I really need a better, healthier codebase in this area. 
I need you to break things down into components, test each component, create kind of like a mega component that composes these individual things, and go." And it just was really incredible. I think the biggest thing that I've seen is that it creates this to-do list. So what that means is you have a large task. It'll then do, like, meta-analysis on your task and say, "I've created six to-dos. One is analyze the code. Two is figure out what I need to change. Three is plan those changes across files." And it'll write it to a markdown file within your repo, so you can go look at it. It'll say, "This is what your code does. This is what I propose." And then once it's done with the initial planning, it'll write another kind of in-progress doc, which says, "Based on this task, I've done this." It's an excellent way to trace what it's doing. And as it goes through each of those tasks in the terminal, it'll check it off. So I'm not confused about what it's doing. It's very much in front of you, easy to understand. And it was incredible. Like, I wrote, I think, 7,000 lines of code across, like, 30 files. And you might think, "Oh, that's too much. Sounds complicated." And yes, it's a complicated part of the codebase. And even after Claude had written V1, it said, "Hmm, I think I might have over-engineered this." I thought, "Aha, you might have, you know, you really might have." And so it's like, "From where we are now, what should we do? Should we stop and redo it again? Should we do a slight refactor?" And I said, "Hey, I want to do this once. I want to do it right. So if you see over-engineered bits, let's go spend a little bit more time there." And so basically, I'm like 90% of the way done with this. I've been running Claude Code in various sessions, so maybe one session for four hours, another session for another five hours. 
And then I'm probably at 10 hours working on one task, which sounds like too long, but I've chosen their top model, Opus, which is essentially the slowest but the smartest. That's what I'm paying for. That's what I want. And it's almost there. So I'll report back on the next podcast. But if it can crack the most complicated part of my codebase relatively hands-free... again, I know how AI tools work. I want to give them as much information as possible so they do well. So I do step in from time to time. But it's done so much better than I'd imagine Cursor or Windsurf could do. And I'm, like, completely blown away.
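The debts-roll-up-into-balances logic described above is, at its core, a grouped sum over per-expense debts. A minimal sketch in TypeScript (the names and shapes here are my own illustration, not the actual app's code):

```typescript
// Each expense produces individual debts: "debtor owes creditor this amount."
interface Debt {
  debtor: string;
  creditor: string;
  amount: number; // in cents, to avoid floating-point drift
}

// A balance is the summation of all debts for a (debtor, creditor) pair.
function rollUpBalances(debts: Debt[]): Map<string, number> {
  const balances = new Map<string, number>();
  for (const d of debts) {
    const key = `${d.debtor}->${d.creditor}`;
    balances.set(key, (balances.get(key) ?? 0) + d.amount);
  }
  return balances;
}

// "$10 from one expense, $3 from another, and $7 from another" sums to $20.
const debts: Debt[] = [
  { debtor: "me", creditor: "you", amount: 1000 },
  { debtor: "me", creditor: "you", amount: 300 },
  { debtor: "me", creditor: "you", amount: 700 },
];
const balances = rollUpBalances(debts);
console.log(balances.get("me->you")); // total owed, in cents
```

The sketch also shows why the real code is the hard part: the sum itself is trivial, but keeping every balance consistent as expenses are added, edited, and deleted is where the "if this goes wrong, trust is gone" risk lives.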

And the one thing I want to add there is I talked about MCP servers but didn't even mention it. The reason why I wanted that is because Claude integrates with MCP servers. So as it's refactoring my Laravel app, Laravel has debug tools that tell you what queries are run, like how much memory was used. So essentially, yes, I can refactor and write tests, but I can't actually do, like, the integration, which I describe as opening up the web browser, creating an expense, and clicking create. That'll then run the code. The code will then do a bunch of stuff. And then we can go look at what that code did. So I integrated Playwright, which is essentially a browser automation tool that I can describe: "Go to the expenses page, click on the create expense button, fill in the form, click submit." Now, once you're done with that, hook into this new Laravel MCP debug server and say, "Go look at the last request and pull out the MySQL queries, the memory usage, and then compare that from before and after." And so it was just crazy. Like the feedback loop and the tools that Claude has access to and its intelligence to know how to use them, which is a whole other topic, blew me away. And so I had posted on Twitter yesterday, I think, and I was like, "Hey, is anyone building kind of MCP servers for Laravel?" And again, what I mean is Claude is only as good as the context that it knows or the tools that it has access to. And as I found it refactoring my code, it said, "Hey, this works." But then when I go test it, it didn't work. And I go, "Well, if I gave it the access to test it itself, then it would know that it didn't work. How can I enable that access?" And so I'm just now kind of prototyping this MCP server. I'll probably open source it. Got a bunch of hype on Twitter, at least for me. You know, I post things and don't get many views. But this one, I think, was like 11K views, 80 likes, you know, retweeted by Taylor. 
So you can tell I'm on the right path of, like, I think more people are going to be adopting Claude Code. With that, we need to basically take what you can use as a developer and shove that into the AI tools so the AI can use it. And so, long segment, but I am a complete fan of Claude Code. I am going off the deep end, trying to run it as much as I can with no limits and try to give it as many powers and tools that I would use in my day-to-day coding journey. And so far, I would give it like an eight and a half, nine out of 10. I think, you know, just getting at that last mile, higher intelligence would be great, but not needed. Giving it the tools that it needs maybe could use a little bit of work. But, yeah, the experience is great. The models are great. The way I use it is very, very easy. If you have the money and you have a complicated task and you want to speed up your workflow and build awesome software, I have my complete recommendation behind it as of June 1st.

Yeah, yeah, that's cool. Well, it's interesting because it touches on an aspect of, like, it's almost like you have an intern, you know, that kind of—

Oh, yeah.

—just does a lot of the stuff for you. And it's like an intern and also like a tutor because it does things for you, but then also if you have questions, it'll answer them. So, and that goes to the whole point, you know, how you mentioned earlier about, like, the year of agents, you know, being able to have something—I'm not going to say someone because that's too weird—have something, whether it's Codex or Claude Code or any of these other agents that we've talked about, where you can use them as a resource to do meaningful work because the intelligence is good, because the tooling is good. And, like, that's a huge win. And I think a lot of people can start to kind of theorize. And, like, I think as a call to action for the show, like, if you know, as listeners, like, what can you think about? Like, "Gosh, if I can have someone do something, you know, while I'm enjoying my weekend, or while I'm sleeping, that would be meaningful and impactful to me." That's kind of where an agent plugs in, you know. And I think the balancing act is—I don't know how much in your case, you had to kind of, like, babysit it, you know, or could you hit the terminal and go walk and have a coffee and come back and have everything done for you? You know, that's probably something that I'd be curious to know. Like, did you feel like you could send it off and wish it well and then come back later?

Yeah, I think there's a bit of nuance, because as you first start using it, you get these permission prompts for security. It's like, "Hey, can I run this bash command?" Maybe that's "list the files in your directory," a read-only operation, very safe. On the opposite end of the spectrum: "Can I run the `rm` command to remove a file?" You have to be very careful there. I usually don't say "yes and always yes"; I just say "yes." But for 90% of the commands it suggests, I go, "Yes, you can do it, and don't ever ask me again, because I want you to be productive." If I have to sit there and go, "Yes, yes, yes," it's not that fun, not that productive. So I think the initial learning curve of knowing how the tool works, writing good queries, and accepting all these security prompts is a little bit painful. But once you spend an hour or two with it, I would basically ask it a query, and depending on how hard it was, the harder it was, the more time I had to go take a walk, take a sip of coffee, do whatever. And I would give it intentionally hard things. One time, my tests weren't passing in this large refactor. I described the issue, gave it to Claude, and said, you know, think really deeply. They have this silly keyword, "ultrathink," so I would use that from time to time. And then I would actually keep my computer from going to sleep using Amphetamine, or some sort of caffeinate program on Mac, and wake up in the morning, and it's done. That, to me, is insane. I've never had an AI tool where I could say "go," and it's done. You know, Cursor does a good job of suggesting things to run, and there's YOLO mode on Cursor, which auto-approves all commands. But again, Cursor is an IDE that tries to fit OpenAI, Google, and Anthropic models into their experience. They put a lot of effort into coercing these models.
I think Claude is breaking ground here, just like Codex is, where they create the model and they create the tool. They've thought deeply about the experience in terms of security and just, like, the whole flow. Once you start using it, you're like, "This is easy to interrupt." So if it's doing something, you press the escape key and say, "Don't do this, do that." And you can do it anytime. Like, if you've used some of these AI apps, interrupting things sometimes just breaks the whole UI, and it's a bunch of jankiness. They've done a great job on that. So I think to answer your question, once you have permissions set up and you have a large enough task, I would ask something and come back in 20 minutes, 30 minutes. And, yeah, maybe it finished in 10 minutes because it was easier than expected. But there are lots of times where it would write something, the tests fail—which was a great feedback loop—and it would keep iterating. It would even write debug output, dumping information to the console, and then read that information, get insights, and fix the test, which I thought was freaking awesome. I'm like, I didn't even tell it, "Hey, while you're running the test, dump out information so you yourself can see it." But it's smart enough to know that. So I don't have to tell it how to do these things. So, yeah, I was blown away. I'm now hooking up MCP servers so I can take even more steps back. Because, again, there are times where it told me the tests passed. That's great. But does the integration work? And it didn't. And, again, like all AI tools and models that I've become accustomed to, it never one-shots really complicated stuff. If you're expecting that, we're not there yet. But it gets you about 90% of the way there. You have to guide it along that route, and you have to do that last 10%. But if I were to do this refactor myself, it would have taken me at least a month. It would have been hell. 
It was one of those things I didn't really, really need. But I thought, "This is the hardest task I can think of, because it's looking at the slop of code that I'd written, which is way harder to understand than new code." It's easy to write new code. This is "go look at, you know, quote-unquote, legacy old code that's ugly and non-pretty with no comments, figure that out, refactor, write tests, make sure it works, and make sure the integration works." And again, I'm like 90% of the way there. I'm pretty confident I can get it to the final mile, although sometimes that can be more challenging than expected. But if it can do that, it has my full stamp of approval: it took the hardest thing, it did it with, you know, with class, wrote good code. I spent $200 on it, and I have a lot more to give once it's done with that. If it can do this, hell, I don't even know, like, we're limitless. And again, as we look at all these AI tools, today is the best they've ever been and the worst they're going to be. And if it can solve this today, I am slightly concerned, to be quite honest, about where we can go from here, because this is one of the harder problems that I think any engineer would take a look at. If you throw any engineer, I don't care how good they are, into my codebase, into this file, and say, "Fix it," that's going to take a long time. My code's not that bad, just to be clear, it's not that bad. It's just a lot for someone without context. And Claude did not have a ton of context into my project. I gave it a lot, but still, it's not perfect. There's a lot in my head that I can't describe.
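Brad's overnight setup—keeping the Mac awake while Claude Code grinds on a hard task—can be sketched in a couple of lines of shell. This is a hedged sketch, not his exact commands: it assumes the `claude` CLI is installed, you're inside the project directory, and the prompt text here is invented for illustration.

```shell
# Keep the Mac from idle-sleeping while the wrapped command runs.
# `caffeinate -i` ships with macOS; `claude -p` runs a single prompt
# non-interactively and exits when the agent finishes.
caffeinate -i claude -p "Tests fail after the big refactor. Think deeply, \
add debug output where it helps, and iterate until the suite passes."
```

A menu-bar app like Amphetamine accomplishes the same sleep-blocking as `caffeinate`, just with a toggle instead of a wrapper command.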

Yeah. Well, what's so interesting about, you know, engineering of any discipline—but I guess specific to software engineering in the setting that we're talking about—is that you can build something a hundred different ways. And so the way that Bradley built something might not be the same way that, you know, me or some other person might build something. And so for, you know, an AI to come in and be able to kind of see how you thought of things and, like, suggest improvements on that, that is significant, in my opinion, because that's basically saying it's understanding what you're trying to do. And obviously you're prompting it to, like, "Hey, I need you to refactor this, blah, blah." But, like, then it scans the codebase and kind of goes—you know, I don't know Laravel that well—but, like, "Oh, this class is this, and then this, you know, um, Eloquent model is this." Like, it kind of understands how you thought of things when you were going through building your codebase. Um, and that's very different than, like, very deterministic work. And I think accounting has a lot of deterministic work, where it's like, if you see a transaction from your bank feed that is coded to Wells Fargo, you're going to know from history that it goes to a certain GL account. And so it's very, like, it's very black and white. And not all accounting is like that, so don't get me wrong. I'm not saying all of it's like that. But there's a lot of, you know, automation that exists now because it's deterministic, you know, rules of, like, "If you see this, then X," or "If you see this, this, then this, then it's Y." You know, in engineering and just, again, building anything in general, you kind of impart Bradley when you build something, you know. When I build something, I kind of impart Bennett in how I build it. And yeah, it's impressive that AI can see that. 
And of course, you know, to your point about it being scary and, you know, what it means for the future, like, yeah, it's the pace. And, you know, it's almost like a—it's not—it's not sinister. I don't want to give it that impression. But, like, it's almost one of those things where it's like you catch yourself using it, using it, using it. And then you need to make sure that you're not losing your skills, you know. And I'm not, you know, I'm not saying you, but, like, you know, someone who is using all these tools and stuff like that, you know, it's important to be thinking about, you know, how are you using it, how it works under the hood, you know, what parts do you need to kind of step in and be directly involved with?

Absolutely. Levels—I'm not sure if you saw this—but Levels had a tweet about a week or so ago where he was talking about how he used AI. I don't know what model or what he was using, but I'll find the tweet; I'm sure we can put it in there. But he basically said, "I needed to, like, move my database from one system to the next, and I wanted to copy it over and then upload it." But basically it, like, deleted a bunch of stuff. And he was saying luckily he had backups. So, you know, that's why you have backups, and it was no big deal. But he was kind of saying, like, "There are still some gaps here." Yeah. So there are things that you still need to be involved with, and you need to use your judgment and figure out, like, you know, to your point earlier: anytime you see `rm` pop up in, like, the suggested terminal command, it's "Okay, hold on. Let's pause. Let me see."
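The backup habit that saved Levels is cheap to automate before any agent session. A minimal sketch, where `app.db` is a hypothetical stand-in for whatever file the agent could destroy:

```shell
# Take a timestamped copy of the data file before letting an agent have
# write access; an errant `rm` or botched migration becomes a
# one-command restore instead of a disaster.
mkdir -p backups
printf 'demo data\n' > app.db                    # stand-in for a real database file
cp app.db "backups/app.db.$(date +%Y%m%d%H%M%S)"
ls backups                                       # one snapshot per session
```

Restoring is just copying the latest snapshot back over `app.db`.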

Yeah, definitely pause on that one.

Yeah. You know, but, like, the pace is crazy, because in six months that might not even be something you deal with. You know, maybe that's fixed, or, like, it's so much better that you don't have to worry about it.

Yeah. It's definitely possible, you know? Yeah. I would definitely, like, encourage people to think—I think now is kind of the age where you can spend $200 and write any software you want. And there's a large caveat that "any software" is a large kind of bucket. You can write things that are very specific to what you need, and Claude can do that for you excellently, especially for a new codebase. Again, working in an existing codebase where there are tens of thousands of lines of code written with an intention and, like Ben said, a meaning behind it and a story behind it—that's a lot harder. But, yeah, if you want to build anything right now, you can spend $200, go to Claude, boot it up. While you're watching TV, while you're brushing your teeth, you really just start a Claude session and ask it to do something. And once it stops or it finishes, you come back, answer a question that it had, or, if it's done with that task, ask it to do the next one. And, like, oddly enough, this is how I've been living my life the past few days. It's like, you know, Friday night, I'm watching TV. I'll have Claude Code up and running, working on my refactor while I'm watching TV. And yes, as a new Claude Code user, I'm a little bit more hands-on, like having to approve these one-time security prompts and saying, "Yes, don't ask me again." If `rm` comes up, I'm taking another look. If I catch it in a loop where it's not making progress, I'll step in. But for the most part, I would say 70% plus of my queries are, like, you know, more or less automated, and the rest is where I have to step in. I've used a bunch of AI tools, so I know. So maybe for someone who's less familiar, it's a lot more hands-on. But for someone who's familiar with the tooling—how to prompt, what I can get out of it, where I think it struggles—it's very automated, in a way that, like, I would challenge the listeners: what do you want to build? If you want to build it now, go spend $200, open up Claude Code. 
It's not only good for Laravel codebases; it's good for any codebase. You can literally ideate anything and get rolling. It's not going to be the Salesforce of the world. It's not going to be some crazy, you know, big SaaS product. You can build a SaaS, but there's a lot more to it, like deployment, marketing, et cetera, that it's not good at yet. But, like, any personal software is very much in the realm of being created. I think there's this large conversation on X I see pop up: models are getting better—will that mean less software or more software? And as I've come to use these tools, I think only more. Again, this is my opinion, but as I see it, as I'm empowered to do more, this is just automating things that I wanted to do that I didn't have the time to do. Now, again, I can open up a Claude session and have it do anything. Like, during the podcast, I had it, you know, write me a whole QuickBooks replacement. I'm going to try that after this. Just kidding, I didn't do that. But if I had, you'd be like, "Wow, that's an odd request. Good luck." And that's exactly what I'm saying: it could be chugging along right now as I do something else. I think it's a unique time where the model is powerful, the tooling is right, and there's high trust, high autonomy, high agency, as we talked about with the year of agents. And now is the time. If you want any software, just try it out. If you're not an engineer, you know, just talk about it as best you can, and it'll get there. If you are an engineer, you know, take your engineering chops, just feed it in, and try to build something that you thought wasn't possible. I think it's an excellent time to tinker. And the peak of intelligence right now, I would say, is in Claude Code. The peak of building software, I would position in Claude Code today.
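Those one-time "yes, don't ask me again" approvals Brad mentions accumulate into a project permission list. As a hedged sketch—the file path and rule syntax follow Claude Code's documented settings format, but these specific rules are invented for illustration—you can pre-approve routine commands and hard-deny the scary ones up front:

```shell
# Project-level Claude Code settings: allow-list safe, routine commands
# so the agent doesn't stop to ask, and deny `rm` outright so deletions
# always come back to a human for review.
mkdir -p .claude
cat > .claude/settings.json <<'EOF'
{
  "permissions": {
    "allow": ["Bash(ls:*)", "Bash(npm test:*)"],
    "deny": ["Bash(rm:*)"]
  }
}
EOF
cat .claude/settings.json
```

Checking this file into the repo means the whole team inherits the same guardrails instead of each person clicking through the prompts.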

Yeah, that's cool. I think the point about, you know, will there be more software or less software—I think you were going this direction, but just to kind of reiterate, it seems like there'll be more, like, personalized software, because it's so easy you can just spin it up for yourself. And, you know, we made a distinction that, like, putting code into production that people will pay for as, like, an app—there's a lot we could say on that, but, like, it's not that simple. But, like, you know, using Claude or using any kind of model to, like, generate you some scripts that do something for you? Yeah, for sure. You know, that's super easy. Yeah. And I liken it to how, in the 1950s and sixties, there were fewer television channels. I don't know exactly how many; I wasn't born then. But, like, you know, it was, I think, seven or eight channels, right, that would be broadcast on the old TVs. And now, you know, there's a thousand channels. And it feels like that's going to be kind of similar. Like, there's a channel about hunting. There's a channel about, you know, telenovelas. There's a channel about professional bowling. Like, there's almost a flavor for every single person. And I feel like software is going to kind of get that way too, where it's like, "I'm an accountant who loves technology, and I like, you know, certain business models." Like, maybe there'll be a software that's kind of specifically in that little niche, you know, or niche. And then maybe, you know, for Brad—he's a Laravel diehard, and he likes productized SaaS, and he also loves food—maybe there'll be some kind of, like, little specialized, you know, product or app that you just use there. You know, so it feels like it can kind of go deeper into your own interests. It'll be interesting to see how that plays out. 
You know, maybe you have, you know, much more, like, SaaS or just kind of products, but for a smaller, you know, customer base. But, like, maybe, you know, more money per customer. I don't know. But, like, yeah, it just seems like you can kind of go deeper in those different areas.

Yeah, I think the common one I hear, which always hurts a little bit but I think is a good example, is, "Oh, I hate Splitwise. I'm going to go create my own bill-splitting app." And I think with all these AI tools, it feels very easy to, like, spin up something where I could just go split a bill, like a receipt. It's not rocket science, so to speak. But the whole experience workflow—getting people to join, like, groups, all that added together, and then deploying that and making sure it works and, you know, keeping it up and stable and fast and scaling it—all those bits are the things that AI is not that great at yet. It's not going to deploy for you. It's not going to figure out, you know, the optimal performance, all those things. It's like a canonical example of, "Oh, I don't like X. I'm going to go spin up my own version." But if you had your own local version of Splitwise, that's not that useful. I can't share it with people and say, "Go enter a bill that you had on our trip together," because it's not deployed. There are various things where you immediately run into this wall and you go, "Oh, I don't know how to do that, and AI can't help me that much." It can do a better job than it used to, but it's not there yet where I can say, "Build me QuickBooks locally, deploy it in one second. And then when things go down, fix it, you know, auto-deploy, whatever." Like, you know, you can imagine some complicated system that could do pretty well, but we're not there yet. So I think building things locally that you use—either, you know, an iPhone app or a website that you pull up on your own computer, input data, it stores it, analyzes it, whatever—fantastic. I think building SaaS from the ground up and having a robust foundation, good deployment, good maintainability—that's not there yet. So that's the caveat I like to put on it: pay 200 bucks a month and build some, like, "you-only" software. I like to call it LifeOS. 
You know, build stuff that you care about, that you want, that is not going to be shared with others unless you kind of, like, show them your iPhone screen if it's an app, or you show them the website on your local computer. But even then, I still think there's an abundance of uses. And I hope for more software, better models, and that the loop continues, and we get some really cool products and experiences that help improve people's lives. Because all the automations I try to do are just to make my life easier and more efficient, so I can do things that I care about. And I think there are a lot of automations in the future that are really going to move the needle and give people hopefully a lot more free time, or a lot more time focusing on the things where, you know, they provide the value—not, kind of, like, their grunt work or less exciting work, which I hope AI can automate, and automate with high confidence.

Yeah. Yeah. Makes sense. I don't know, though, bro. I've seen so many tweets saying that you can literally build a cash-flowing SaaS in under 10 minutes. So I don't know. You might be outdated, Brad.

There's a lot of that out there. Don't get me started. Don't get me started. If you tweet with that prefix, you might be muted by me.

Yeah. That prefix would literally be an instant mute.

Yeah, I agree. Yeah, it's the AI share, I guess. I don't know what the right word is, but the AI reshare is always "literally we can do this."

We'll wrap this up here because we're going long, but there's a whole business model now of putting out stuff like that, saying, "You can literally do X." They just do some really basic example that's not really useful, and then they try and sell you a Skool community or a Thinkific course on just doing that. It's horrible. Like, I'm sorry, I just got to get it off my chest, because it's tapping into a fear, like FOMO, to get people to go, "I don't know how to do that. You know, let me—I need to look at that." And then, "Oh, let me join this community." I'm not saying communities are bad, don't get me wrong. I'm in a community, and it's awesome. But, like, just be mindful of, like, how they're marketing to you and if it feels like they're tapping into the FOMO or, like, making it sound easier than it really is. To Brad's point, Brad has a literal SaaS app. He can tell you firsthand: it's not easy. It doesn't take literally 10 minutes. So just be mindful of that, because if it's too easy, it probably ain't right. Just, yeah.

A lot, a lot of that stuff is repackaged too. So I think you'll see in the comments like, "Oh, you got this from X," or "You got this from Y." And I do follow a few of the aggregators, but in essence, if you could follow the source, that's probably the better long-term strategy. Give the credit where credit is due instead of these, you know, a few folks who will remain unnamed that will kind of package this with a "literally blank" template. And yes, it gets eyeballs, but you know, there are some thoughts in there that we could go on about.

Don't feel bad. They're not real people. They're just bots, Brad. So, you know, it's okay to ban them.

There has been an increase in bot replies, too.

A huge increase, yeah. Cool. Cool.

Let's jump to the outro and bookmarks. As I went through my bookmarks today, there were so many. I mean, I think the last time we released the podcast was May 8th, and today is June 1st. It's a bit hard to pick the bookmark of choice, since once I got back, I just kind of did a doom scroll for a few hours to figure out what I wanted and what I missed. What I landed on, in the essence of Claude Code, is a post from Eric Büş—hopefully that's how you say his name. He goes off the deep end describing Claude Code. And not only Claude Code, but Claude Code compared to other IDEs. So kind of the same thing that I've been describing. I've used Cursor. I've used Windsurf. Now I'm maybe a power user of Claude Code—maybe that's too soon to say. I've given it my 12 hours in the past few days, so I think I'm a little bit more experienced than others. But he walks through pretty much everything about Claude Code: how to be a power user, and the pros of Claude Code compared to other solutions, like these IDEs that we're all using. And I think it's an excellent read. Like, truly, once I read that post—and it's probably 20 paragraphs—I think it's full of insights. Not every day on Twitter do you get a tweet that really shows you something you can pull these valuable nuggets from. But Eric describes his experience paying for all the tools, comparing them, saying where Claude Code shines, and even highlights a few things where it doesn't shine. So if you're on the fence, if you've heard me kind of give my spiel—I only pitch things that I truly like, and I like it, almost love it—Eric has a similar opinion, and I read his long post from start to finish talking about all these things in detail. I think he is really on the money here. So if you're looking for more reading on Claude Code and where it could fit within your workflow, definitely take a look at Eric's post. 
It just dives deep into everything you would ask, more on the engineering side. So feel free to give that a read.

Cool. Awesome. My bookmark is from Greg Isenberg, who—okay, he has a podcast called "The Startup Ideas Podcast," and it's a great podcast. I've listened to it since I found his tweets. To our points about, like, you know, the AI hype people on Twitter—he actually puts out a lot of really good, useful content. In fact, I found him because someone else was stealing his content. So the bookmark I have is a video introduction to vibe marketing that he put out. And it's really cool, because, especially for an accountant who likes to code, neither of those things is marketing. So it's important to understand how to sell something, you know, and how to market and get in front of eyeballs. I'm sure you can relate with My Expenses. And so he kind of talks about using tools—I think he goes into n8n specifically, which is, like, an AI-enabled workflow tool. It's pretty cool. Maybe we'll do a separate video on that. But he talks about how to use AI and these kinds of sequential flows that you can build in n8n to generate marketing in a much easier and more frictionless way than the old school of generating content and manually scheduling things. And again, for someone like me, this is, like, "Oh, this is great. I had no idea these things exist." So it's pretty cool. And yeah, I follow him on Twitter. He's got a lot of other good stuff, too, that I've been enjoying. So definitely worth checking out.

Yeah. Marketing is hard. So if you're good at that, please reach out to me. I can use your help.

Yeah. There was some, um, some guy—I think it was Ken Griffin, who's, like—I can't remember what he does, and he's, like, super rich. That's all I know. Citadel, I think. But he was giving a presentation, and he said, "The number one thing is the ability to sell. More than anything else, the ability to sell." You know, I was like, "Okay. Noted."

He seems to know a lot about money.

Yeah. I was like, "You seem to know what you're doing." The guy's got billion-dollar condos in Miami. Worth knowing. It's good stuff to know. Cool.

I could always use some help on marketing. I think a long time ago in the podcast, you had mentioned a marketing book. I said I would read it, and I have not.

Oh yeah, *Building a StoryBrand*. Donald Miller. It's really good.

It reminds me of my Duolingo, too. I need to get back on that train. Post-honeymoon, Brad is going for the mobile app and going for Claude Code power user. The era of getting stuff done is what we're entering now.

Yeah, hang on one second. I got one more thing for you. Hang on. Okay, check out—I don't know if you can see it. It's a little notebook for the handwriting, the hanzi, I think.

Oh yeah, yeah. That's difficult.

It's quite difficult. I like it though. I like it. It's just—it's nice and, like, I haven't done anything in it yet. I just got it the other day. But, um, it's—it's cool writing those characters. But anyways, I digress.

When Duolingo presents you with, like, the blank screen and it's a very complicated character, I'm like, "Is it this?" and it shows the line. I'm like, "Oh, it's that." And then it says, "Oh, that's wrong too."

Yeah. It's like, "I knew that. I knew that. I was just, you know, finger slip." Yeah. Yeah. Cool. Awesome.

That was a long one. I think we covered so much that hopefully folks enjoyed it. Again, if you're in the comments, please let us know. Like, really, we're trying to hone in on: if you had unlimited software capabilities—AKA Claude Code, in my mind—what would you build? Where do you think you'd find the value? I'd say, if you have the money to try it out, please try it out. Not only does it work on Laravel projects; it can work with existing code, new code. It's, in my opinion, the best way to write code today, you know, over using Cursor or Windsurf. So yeah, if you have any personal software interests, please let us know in the comments: how much value do you think you'd get out of Claude Code running 24/7? I think it's an interesting question. It's one I've been asking myself very frequently now that I have this new capability to have, quote-unquote, unlimited software written by an agent.

Yeah, I agree. Yeah, we're curious to see what people are doing and what kind of value they're getting out of it because it's exciting times.

It is. Awesome. All right. Well, good stuff, Brad. And we'll do it all again next time.

Cool. Sounds good. See you.

See you later.

Thank you for listening to the Breakeven Brothers podcast. If you enjoyed the episode, please leave us a five-star review on Spotify, Apple Podcasts, or wherever else you may be listening from. Also, be sure to subscribe to our show and YouTube channel so you never miss an episode. Thanks. Take care.

All views and opinions by Bradley and Bennett are solely their own and unaffiliated with any external parties.

Creators and Guests

Bennett Bernard
Host
Mortgage Accounting & Finance at Zillow. Tweets about Mortgage Banking and random thoughts. My views are my own and have not been reviewed/approved by Zillow.

Bradley Bernard
Host
Coder, builder, mobile app developer, & aspiring creator. Software Engineer at @Snap working on the iOS app. Views expressed are my own.