Claude Code fell off... did GPT-5 codex take the throne?
Download MP3[Theme Music]
Cool. Alrighty, we are live.
It's been a long time for episode 29. How are things going, Brad?
It's been exactly one month. We've had things happen, but it's going well. It's my one-year wedding anniversary today, September 28th.
Congratulations.
Had a great time. Thank you, thank you. Yeah, things are going well. How about you?
Good, good. Yeah, I just got back from Hawaii yesterday. I was there all last week, which was awesome. So that's part of the reason why you haven't heard from us in a little bit. But another reason, just to cover our bases for why we've been radio silent the last few weeks, is I got pretty sick. Unfortunately, nothing was too bad. I eventually, of course, did recover, but there were about two weeks there where I was really down and out. I was on an inhaler, a steroid, and an antibiotic. And I told Brad, "I just can't do anything." There were nights where I just couldn't sleep because I was coughing so much.
That's crazy. That's scary, honestly. I'm glad you're feeling better.
Yeah, yeah, I was nervous. Of course, we had the Hawaii trip coming up right before that, and I was just like, "I'm going to Hawaii whether I'm sick or not." But fortunately, I was good, did a lot of swimming, and a lot of stand-up paddleboarding. And yeah, it's all behind us, knock on wood, so it's been good. But that's why we've been MIA the last few weeks.
Yeah, and the sad part about being gone for one month is there's so much to catch up on that it's almost too much. So I've just been looking through the bookmarks and the messages that I send to Ben on a regular basis, and I was like, "Wow, one month is so long." Usually, we record these every two weeks, and I look through things and I'm like, "Yeah, there's a decent chunk." But on a month cadence, there's a lot to get through—lots of nuance, lots of conversations, lots of releases. And I think coming up in October will be a big month as well. Yeah, it's an exciting time right now with AI and just all the announcements coming out.
Yeah, and I feel like even when we do these normally every two weeks, you and I have talked before; it's like we could probably do that every week and there'd still be new releases to catch up on. So of course, a month of no programming means there are lots of things that we're probably not going to be able to address in this episode. But we can save them for future episodes and get caught up on things.
Cool. So one of the things that I was interested to see was the announcement from Anthropic about using Excel with Claude. For those that aren't in finance or accounting and aren't that familiar with Excel, Excel is spreadsheet software by Microsoft. It has been the software to use if you're in accounting and finance. It lets you see your numbers really well visually, you can build out really complicated financial models, you can do all sorts of things with it. And for a long time, the lack of being able to work with Excel has, I'd say, had accountants waiting on the sidelines in terms of using AI as much as they want to. But now with Claude introducing support for Excel, it's kind of opened that door a little bit. So the way that it works, I was watching a demo, and I thought it was really interesting. Basically, Claude provisions a virtual workspace or computer and it generates these Excel files that you can then download and open up on your computer.
So being able to work with Excel files in that chat interface is pretty cool. In one of the videos I saw, a YouTuber compared prompting Claude to make an Excel financial model and then doing the same thing with OpenAI. The results were pretty dramatically different. The Claude model had great formatting, and the information was presented in a really nice way. The OpenAI result was much more bare-bones. It didn't even really look good; the formatting was really off. So that was really interesting to me because I'm curious if that will foster more adoption or use by accountants in different workflows. I have my thoughts on that that we can dig into a little bit more. But did you see that news, or has that not even made it into the programming world? 'Cause I know you guys aren't big Excel users.
I'm honestly shocked that that hasn't existed. I think ChatGPT has been able to work with files like PDFs, images, and Excel for some time now. So when I saw the news, I thought, "Okay, that's Anthropic playing catch-up." And they do have really good models, so Anthropic kind of leads the charge on coding models. Their whole company ethos is if we create the best coding model, we'll create the best model for everything because everything is underpinned by systems thinking and programming. Even as you mentioned, with Excel, in the background, it's probably writing Python code, executing on Excel sheets, and generating output.
So I am surprised that it took that long. It seems maybe they handle it a bit more uniquely than other providers, where other providers maybe just handle it with a generic, "We'll take in any file." But maybe Claude has specific reasoning or capabilities that are fine-tuned for that model to handle Excel. I've used Google Sheets AI, and I personally found it pretty lackluster and not very useful. I couldn't do anything that I really wanted. I even think they came out with something where if you type in a formula in Excel, there's an "equals AI" where you can chat in something directly. That sounds like such a good marketing feature, but who is actually using that to power real sheets with real data? My guess is not many. So my two cents is, I think it's exciting that it's out. I'm surprised they're playing catch-up, but maybe they have a much tighter integration and better output. That can make a big difference, honestly. Working with your own data, that's a huge unlock.
Yeah, to me it's interesting, and I'm going to go off on a tangent here. Excel to me... First of all, I think some of what's causing it to play catch-up is on the Microsoft side. Like you said, you can always work with CSV files or Google Sheets, but you couldn't really work with native Excel files. And in this announcement, we also didn't touch on PowerPoint. So basically, I think that Anthropic has developed the ability to work more with the traditional Microsoft Office file types like Excel and PowerPoint, which is cool if you are really locked into the Microsoft ecosystem. But to me, I don't think there's a ton of value in the long run in Excel. I know I'm going to get pitchforks thrown at me by the accountants in the room.
I thought you mastered Excel, though. That just can't be true.
I was Excel's biggest fan. When I came out of school, that was what you needed to do. I bought a course where you build financial models and learn different formulas, and that was time well spent because it definitely paid off in my career. But now we're in an era where, does that really still make sense to do? The way I think about accounting work is if you have work that's a series of logic-driven steps you need to do every single time, doing that in an Excel file is a fool's errand in my opinion. You're much better off having a code solution or having an agent be able to do that as a tool.
But where I think this could be helpful is for certain work that accountants do that is more ad hoc. Say your boss comes to you and says, "Hey, I need you to go look at the top 10 vendors that we spend money on in general and administrative expenses." Okay, that's not something you do every single month; maybe that's something that just came up. That makes sense, and that's where something like this could be useful because you can throw it in Claude and say, "Hey, here's my raw data, make me an Excel file that really highlights what I'm after." Beautiful. I think that makes sense in the short term.
But even in the medium and long term—and this is my tin foil hat on a little bit, Brad—it's going to be agent-to-agent at some point, right? The agents aren't going to work with Excel files; they're going to work with just regular raw data. And so, is this really that cool? It's cool, but is it really groundbreaking? In my opinion, no. I think it's a Band-Aid for any knowledge worker that uses Excel a lot. I think it's a stopgap until there's a bit more of an agent-to-agent interface that we can plug into and have them do the work for us. They're not going to be sharing Excel files like we humans do.
It sounds like we might need to cancel the World Excel Championship along with this tin foil hat of yours. Maybe it's not useful anymore and we move on to programming being the end-all, be-all.
Well, that's a hobby. The people that do that, that's a hobby for them, and more power to them.
Did you qualify? I can't remember.
I haven't actually tried, but I think I could have. I had all the hotkeys down. Alt+H+O+I is to auto-align the columns; Alt+E+S+V is paste special. I was an Excel power user. But then you've got to look at the writing on the wall and go, "Why would I do this myself?" You know, think about *The Matrix* where Morpheus gives the pill to Neo. It's like Pandas was me taking that pill, and I was like, "Wait, I don't need to do this anymore."
Python Pandas, you heard it here first.
Yeah, that book *Automate the Boring Stuff* by Al Sweigart and books like *Python for Finance* by Yves Hilpisch changed me in 2016-2017. And now here we are where, like you said, when you give AI a prompt to do something in Excel, it's really just writing a Python script to make the Excel file for you.
Behind the scenes, definitely. Sometimes it shows you, sometimes it doesn't, but the logic, the output, and the parsing are all done through a sandbox coding environment that it does not expose to you most of the time.
Yeah, and so it's doing that extra step for our benefit, for humans, because we want that. But AI is not going to care about that. It's agent-to-agent; it's just going to share the data around in whatever format, like JSON. It's like, we're going to look back and be like, "Who cares about Excel?" That's my hot take. It might take a little bit of time, but with how fast things move with AI, I think it's going to be: take whatever you think and take two or three years off of that, because yeah, that's going to be a big change for people, especially in finance.
I think recently, people have been talking about Dario, the Anthropic CEO, chatting about the length of tasks that AI can do autonomously. As we've ramped up on the coding agents, we're getting maybe one to three hours maximum of high-quality output that an AI can triage and do on its own. Part of that is models getting better, tools getting better, and chain-of-thought delivering much better results. But I think recently their CEO has mentioned that it's on the horizon of days. So maybe by the end of 2026, they've mentioned AI will be able to run for days unsupervised with high-quality output.
To me, that's huge, because we only have so much attention and we can only do so much as humans. If you've heard of deep work, you can do three or four hours of really good, high-quality work. After that, until you go to bed, you're not getting as much quality work. It's challenging. But having an AI run autonomously for multiple days—that's insane. It's an exciting future. But their CEO definitely pushes the boundary in terms of headlines. He's talked many times about the workforce being gone, AI automation to the moon, when Anthropic is doubling down on hiring.
So it paints a weird picture, but I think when you mention agent-to-agent, it brings up a point in my head that we're getting complicated, sophisticated AI systems that are hitting expert-level task responses. And we're getting autonomous agents that are running for a longer duration with higher trust. So do we need that handoff of Excel files, or are we going to be able to just orchestrate the system from end to end? It might create an Excel file on the way or pass off data, but maybe it doesn't need to do that at all. The final output could be Excel, but that intermediate layer is just you providing the credentials to these systems and it figuring out what to do with that.
So it's an exciting feature. I'm very ready for more hands-on AI to do things that I don't want to do so I can do the things that I actually want to do. And this whole Excel announcement brings me closer to that. I don't know how to use Excel that well, I'll be honest. I'm a programmer and good at that part of the job, but surprisingly I thought Excel was kind of simple. Then when I want to do a few things, I'm like, "I don't know exactly what I want to do here." I know what I want it to look like, but I don't know what buttons I need to click to get there. I've used ChatGPT before to be like, "Hey, create a chart, create a table," and to me, it feels kind of lazy. I have this gut reaction of like, "I can't do this; this seems so easy." But in the future, a lot of this will be generated through AI, whether that's Sheets or Excel. I do think it's a valuable skill, kind of like programming underpinning it all, but at some point, you can get pretty far without knowing it. So I applaud your tinfoil conspiracy that it's not going to be used anymore, or at least to a degree that's much lesser than today. But it is interesting to see what that long-term horizon might look like.
Well I think too, just breaking down accounting, we can see where Excel still fits. You have transactions that need to be coded a certain way, like your bank feeds. If you've used personal finance apps, a lot of times once you've coded something a certain way, the app will suggest that going forward. That machine learning isn't new, and most modern GL systems that accountants use have that feature. If they don't, they're just behind and you shouldn't use them. So category and transaction matching is already pretty much baked in.
Then if you go into a month-end close where you need to perform a series of journal entries, that can be automated. On my own YouTube channel, Augmented Accounting—shout-out, self-plug—I have videos where you can code those workflows to do those month-end journal entries for you. Even if you don't want to code it yourself, providers like Zapier all provide this kind of low-code or no-code agent builder where you can do these things yourself. So that series of steps you do every single month-end can be automated without Excel now. We've moved past that.
So really all you have left are those ad hoc requests, which is good; you can spend more time on that. But you have a lot more time on your hands than before because the transaction matching and journal entries—all that manual work—can be done automatically by an agent. So that is going to shift something. There is going to be an impact from that. And also too, it's like, well, does that change the going rate for someone in this profession? If you are now focused on these ad hoc things, does that necessitate some kind of change in pay or skill set? There are so many questions you can branch off from those truths. A lot of this work, and this is true for a lot of white-collar knowledge work, can be done by an agent. So how does that change things? It's a really interesting question. I don't agree with the headline stuff done for hype; there's some self-interest there from the Anthropic team for sure. But I don't think it's complete fabrication either. It's a really interesting question at a minimum.
I mean, it kind of touches on the software engineering impact. Being a new grad in 2025 with a computer science degree, the landscape is extremely different than when I graduated. It was easy to find a job, you didn't have to fight for interviews, you weren't fighting against AI because it didn't really exist in the same capacity. I do wonder if it'll affect others the same way, where AI gets to this baseline productivity and that becomes the standard. If you're not above that, you're not necessarily hirable, which is a sad reality.
But it does open up opportunities where you might not spend your time in Excel, but you spend your time on higher-level thinking or system design. You create business value while AI can do the part that is still valuable, but valuable in a different way. And that's where software engineering comes in. I'm chatting with AI all the time, getting stuff done through AI. That makes me more productive, but I'm focusing on a different part of programming than I was two or three years ago. I think it's this long-term horizon shift where programming is extremely aligned with LLMs, and it makes it quite easy to get things done. I think as Excel becomes a first-class citizen within Claude, that creates more of an opportunity for it to be revolutionized by AI.
I think they did a good job with this release and probably thought a lot about how to package it up. Anthropic as a company is very good at having high-quality stuff, even if it takes a while. Their latest release, Opus 4.1, is really good. I'll touch on some problems later, but overall, OpenAI ships faster, has more people, and does more things widely. I think Anthropic as a company focuses on this core developer-plus-other-things. And that "plus other things" now includes Excel, which is exciting. I'd be interested to see how this evolves over time, if there's some elimination of, like, junior accountants, for example, based on this AI understanding of how to wield Excel.
Yeah. Well, I'll just say this, as I mentioned on my other YouTube channel, Augmented Accounting, I don't think it's fearmongering to acknowledge that you'll probably be able to have fewer people do more work. That just seems like a very logical conclusion, no matter what line of knowledge work you're in. Now, to what scale is up for debate, but at a minimum...
Well actually, one thing that just reminded me of that, sorry for interrupting.
No, you're good.
I think the Amazon CEO or CFO mentioned the exact same quote about two weeks ago. He said we have X headcount, and over the next five years, AI is coming and we're going to do more with fewer people. We don't know what that looks like, but we know it's coming. And I think that was a really telling headline. People know it's powerful; we just don't know what's going to happen. Excel popping up could be an opportunity; programming getting better eliminates entry-level programmers. It's just a harder world. And I think that will continue down the line with these AI capabilities. The whole business landscape is very focused on efficiency, and AI plays to that to a large degree. So yeah, the underlying theme is "do more with less," and AI is sitting there powering that whole idea.
Yeah, exactly. And I think how far you want to take that scale is up for debate, but it seems like that is an accepted conclusion. But as I mentioned on my other channel, that doesn't mean you need to be fearful. It means you need to lean into certain areas. I think if you can build out some kind of automation or work with AI much better than the next person, that's where you can dig in and still be valuable. Businesses are going to change. Say there's an automation that works right now, but then the business adds a new product line. Someone's going to need to change or update that automation. So there will be new responsibilities that come.
I think leaning into being able to build with these tools, not just be a consumer, is important. I think also leaning into the social skills aspect of it is important because you can always be better. You don't have to be in customer service to think about customer service. As an accounting professional, my customers are the teams I work with, the external stakeholders. I need to make sure I'm communicating clearly and delivering what needs to be delivered on time. There's a human element that you can lean into as well. So that's what I would tell people if they're in school now and have this anxiety about job elimination. I would say lean into being able to build with these tools, whether it's code or low-code like Zapier. I don't think you can go wrong either way. But also, just focus on being more personable and making those connections with people, because I don't think that will ever go away. You know, the whole talk about AI chatbots for customer service... when you pick up the phone and call Hawaiian Airlines and it's an AI, everyone immediately just gets turned off by that, I think. And I don't think that's going to stick around as the norm where you just talk to robots all day. The human element will still always be there, in my opinion.
Yeah, I'd agree. I think the biggest part of the AI workforce shift is building with it and using it. One is using it on a frequent basis to get comfortable with it. If you can build on top of it, even better. And I think before both of those is having the mindset to be open to it, because there are lots of folks who see it as a threat and don't want to partake. They feel this existential dread that it's going to do a lot of harm. And you know, with new things, they can do bad things that we don't expect. But in the general consensus, being adaptable, shifting, listening, learning, and being aware that things are coming is the first step. The second step is to use it, build with it, and start leading some of that stuff.
In my day job, as I use AI, I frequently show people, "This is how I got the output from AI based on this chat." Oftentimes, people I work with are surprised at the way I talk to AI because it's so thorough, detailed, and technical. I've continuously refined how I've approached and conversed with AI to find the maximum output. By no means am I the top AI prompter, but I found something that works. There could be many things that work better, but as you get familiar with things, you realize what works and what doesn't. So I'd encourage people to just get more familiar and open with the tools. There are so many tools that pop out there, so many different chatbots. Just try things. It might suck, it might not. Don't give up on the first try. It's really easy to say, "Oh, everyone's telling me AI is good," then try something like Google Sheets AI, and it sucks. Yeah, I think it sucks. But if you tell me about Claude Code or other things like Cursor, which write programming code for me, those are great. There are different categories that do different things, so definitely keep an open mind and try to step into that uncomfortable space of AI tooling. I think that gives you the most growth and positions you pretty well for the 2025 AI-driven workforce surge.
Yeah, no, I agree. And I think people sometimes have an apprehension to sharing things. There's a bit of gatekeeping that sometimes goes on. But in this world, I think you've just got to share. If anything, put a light stamp on it in terms of having your name be like, "Oh, Brad showed me this." It's not that you need credit, but it's just that people know you are someone who is open to sharing and they can go to you if they need help. That's just a human-forward kind of thing to do. So yeah, I agree on all counts.
One quick last Excel thing, because I know we talked about Excel a lot. This was interesting, and it's from thedecoder.com, which I've never really gotten news from before, but also from thebulletin.com. Two different websites are reporting this, but Microsoft is reportedly allowing users of its Copilot AI solution to pick Anthropic models instead of the default OpenAI. Which to me was really interesting because Microsoft has this relationship with OpenAI where they are like investors or partners. I don't know what it's called technically, maybe you know better than I do. But the fact that Microsoft is allowing Anthropic models to be chosen if someone's using Copilot... again, maybe in Excel, they can do the "equals AI." Instead of it hitting the OpenAI model, it hits Opus 4.1 or whatever. That to me is really interesting. You know, what's going on there? Because it seems like an anti-beneficial move for that OpenAI partnership, and we've talked about rumblings there in the past. What do you make of that? What are your thoughts on that?
I've heard lots of different things about the whole partnership and how it's evolved. I think the underpinning theme on that decision is Anthropic has really good coding models, so it's hard to look past that. I do think OpenAI caught up quite significantly with GPT-5 and GPT-5 Codex, which is their latest model. But I think Microsoft was like, "Hey, I want intelligence." I have this great tool, but it needs the great underpinning technology of a model. I think Anthropic was leading for three to four months on "this is the best model that writes code." Therefore, when you're creating a coding tool, it's hard to say, "Oh, I'm tied to OpenAI, but they don't have the best coding model." Therefore, I need to look elsewhere.
I think with that partnership alone, they invested a billion dollars in OpenAI for a certain percentage. But OpenAI has gotten so much more funding that I think people were speculating that that is completely drowned out and not that useful anymore. It's super diluted, so it's not that big of a deal anymore. I don't know if this is fully true; this is just what I was reading online. But to me, it seems like the best way to be productive in the AI landscape is to have a harness that can swap out any model because the competition is neck and neck. We have Google swinging big, we have Anthropic releasing consistent model updates, we have OpenAI who does an extreme amount of marketing and sometimes delivers. They sometimes deliver, so I can't put them outside the top three. But the general theme is to make your AI tool as easy to hot-swap a model as possible because they are seriously very close, and a month or two later, something else takes the lead. So the Microsoft-OpenAI one was a great initial investment. I think OpenAI has ballooned past any evaluation that was then the case, and therefore it's not that big of a deal anymore, if I understand correctly. But again, I could be wrong about this.
Yeah, one more thing on that, then we can move on. The article mentions, and I'll paraphrase here, that the industry has assumed deep partnerships like the one between Microsoft and OpenAI would lead to exclusive integrations. However, Microsoft's move validates a best-of-breed or multi-model strategy, because the rationale is clear. Microsoft has invested over $13 billion in OpenAI and has every incentive to promote its models on its own Azure cloud, which OpenAI runs on. The article goes on to say the only logical reason to incur the financial cost and strategic awkwardness of paying a cloud competitor, AWS, which runs the Claude models, with its most profitable product, Office 365, is that the performance difference on critical tasks—specifically automating financial functions in Excel—is too significant to ignore.
So it's a long-winded way of saying exactly what you said. They're saying even though we have this partnership with OpenAI, even though that runs on our cloud services, we are willing to make a deal with our competitor AWS to have that better model. And again, I think it's also because Copilot is kind of not used that much in my experience, so they're trying to fast-forward that. So it's interesting the tradeoffs that companies will make to get there, and will that bite them later? It remains to be seen.
Yeah, I think they just came out with a CLI, too. I think it was GitHub Copilot CLI. So the CLI war has ramped up to full competition. I think they just released one like a day or two ago; it's really recent. But there have been so many CLIs. And speaking of CLIs, there's been a ton of conversation on Claude Code.
Bring us back to May. May was the golden time for Claude Code. And you heard it here first on the podcast; I picked it up in May and I thought, "Holy shit, this is insanely good." I thought, "This is going to change everything." Fast-forward to June, I was tweeting, "You've got to try Claude Code." It worked really well. Honestly, the results were incredible.
Then came July and August, which was a troublesome period for Anthropic. Everybody had picked it up. It was being rate-limited, throttled, you name it. They came out with an email saying on August 28th, "We're going to have limits" because people were running like 10 Claude Code instances at once and being abusive. Long story short is, they had poor model performance in July because the demand was high, and the demand was high because people were going ham on it because it was good.
Then there was a lot of speculation in September saying the model is acting poorly. Like what I got in May or June is not the same as September. Anthropic never acknowledged it; it was kind of hush-hush. About two weeks ago, they came out with a postmortem, which in engineering terms is like, "This is what happened, this is why it happened, these are our mistakes, and this is how we're going to resolve it." They said, "Oh, we had roughly a one-month period of degraded model performance," and they outlined the problem space and what they did to fix it.
It was very interesting because there were people raising pitchforks on Twitter saying, "Claude Code sucks." I never dipped that deeply into it being really bad. I think if I were to describe it, in May when I first picked it up: 9 out of 10. Today, or the past month or two, maybe like a 6.5 to a 7.5. So there's definitely a loss, but it wasn't significant enough for me to be like, "This sucks, I'm going to ditch it." But it was great that they came out with this postmortem because it acknowledged there was a dip in quality.
The reason I rated it only a little bit lower is because the people it affected were choosing not the most intelligent model. When you open up Claude Code, you can choose Opus, which is their most intelligent, or Sonnet. They mentioned that 17% of customers were affected during this two-to-three-week window where they had deployed a misconfigured Sonnet. I never used that model, so I didn't encounter it, but I still felt like there was some performance degradation.
So yeah, it's been a bumpy road for Anthropic recently. They had Opus 4.1 come out, which is great. The CLI Claude Code has still been doing well, but on Twitter, there's an extreme amount of people switching to Codex, which is OpenAI's equivalent CLI. I've given that a go and I think it's pretty damn good. I think Codex does a really good job at thinking, but it still has some areas to grow. I kind of bounce back and forth between Claude Code and Codex, but have you seen any of the Twitter hype on the Codex switch?
No. And it was interesting to me because I thought that Codex came out first. I can't remember if that's true, but I thought Codex was an announcement that came out early on. This is more suited for developers, so this wasn't something that I really bit my teeth into, but I remember it being part of the OpenAI announcement. I thought that happened before Claude Code. So when you said you were checking out Codex and wanted to talk about it, I was really surprised because obviously, you're a big Claude Code proponent. We didn't really dive into it when Codex came out. So what's changed with Codex? Is it more that Claude Code is underperforming, or are there changes in Codex that have made you see it as a better alternative?
Yeah, I think it's a multitude of things. I think Claude getting worse opened the door for opportunities. One, it got worse, and two, it added limits. So those are the two things that have shifted significantly since its golden period of late May, early June. Two, OpenAI is spending a lot more time on Codex. And I think you're right; the confusion in terms of Codex is that there's Codex CLI, which is the Claude Code equivalent, and then there's Codex. And if I understand correctly, Codex is this product that allows you to open ChatGPT as an app and ask AI to do something on your codebase. Codex is the functionality in which it has a replicated environment of your code on OpenAI's servers, and you're able to chat with that codebase and get it to create changes for you on the fly.
So OpenAI is horrible at naming things, sadly. They have Codex, they have Codex CLI, and they have Codex as a model. Like quite literally, they have their models, and they have Codex as a model. So all that to say, I think you're right, Codex came out first, but it was a product, whereas what we're talking about now is the CLI equivalent to Claude Code.
One of the things that changed recently is OpenAI is spending a lot more time here, so they release updates maybe once a week, which they didn't before. And two, they came out with GPT-5 Codex as a model. Originally, you used GPT-5 in Codex CLI. So bear with me with the terminology, but GPT-5 on Codex was extremely slow. GPT-5 was a good model, but it was hard to prompt. When you go to Claude Code and ask it to do something, it can figure out what you want to do. You can have a poor explanation, but Claude Code will be like, "I kind of get what you're trying to say, I'm going to go do that." With Codex, you say, "Fix this or fix that," it does exactly what you say in a very precise manner and doesn't get the overall feeling as much.
Then they released GPT-5 Codex, a new model, and improved the CLI tool. This new model has a flexible thinking strategy. This is really important because GPT-5 was so slow it was unusable. It was extremely good, but unusable because the iteration speed was terrible. So GPT-5 Codex came out, and the big thing there is it thinks dynamically. The reason that's important for a developer is sometimes you ask for a large change from AI and it needs to do a lot of planning, thinking, and understanding. If you imagine you're a new hire, you need to go understand all the systems you're working with and then say, "Okay, now I can make that change." Other times, you could be a new hire and be told, "Hey, can you go update this to-do list and cross off three items?" You don't need to know the system; it's obvious.
That parallel I like to draw to GPT-5 Codex, where some problems in programming take a lot of effort and understanding. Other problems are very straightforward and should not take five minutes for an AI to do. GPT-5 Codex was that unlock for OpenAI to say, "Is this task difficult? Do I need to think a lot to get the right output? If so, I'm going to spend that time. If not, I'm going to breeze through this task." So GPT-5 Codex gets you to a world where you leverage GPT-5, which is really good at doing things, but you're not paying the price of waiting five minutes for a simple change. I think that, together with Anthropic being slow and having degraded performance and limits, and OpenAI releasing a model that's dynamic and spending more time on Codex CLI itself, has created a world in which we're on the fence. Honestly, I'm toying between both of them. I'm going to a Claude Code meetup in roughly a month from now in SF to talk to people, but it's a weird world where Claude was killing it and they have just butchered their lead and really lost developer faith. I think there's a new release coming in the next week or so. We've heard about Sonnet 4.5, but it's a weird time. I want it to be a time where there's good competition, but I truly love Claude Code. It was great. Now we're at this spot where I want to get the best output, but I'm not sure what tool does the best job. So honestly, it's a little challenging.
Yeah, and we talked earlier before we started recording about the Codex and Claude Code thing, and I did some quick web research. A couple of things that came out recently were worth mentioning. Like you said, the Codex 5 model represents a major leap forward. I guess it's trained on complete repos versus isolated lines of code. In this article, the person is saying that lets you delegate more to it and have it be more end-to-end, whereas Claude Code is feeling like a collaborator. With Codex, you can delegate and have it be more autonomous. So that was one thing. And the other thing was that GPT-5 Codex uses way fewer tokens. It's way more token-efficient.
Oh, yeah. Yeah.
It says, and this is their quote and I'll link the article, "using approximately 90% fewer tokens than Opus 4.1 in completing tasks more quickly." It does say, however, Claude models still excel at detailed, high-fidelity tasks and complex multi-step operations. So just to what you had said, it's way more efficient and fast, maybe a better general-purpose coding model, but where you need that special knowledge or a really complex task, the Opus model is still the one to use.
Yeah, and I think GPT-5 Codex, the new model with Codex CLI, that is the first iteration in which you can give it a task that will run for hours. I remember spending time on something, I had multiple tabs open, I asked Codex to do something, and I forgot that it was still running. Sometimes it takes too long, to be completely honest. Still not as fast as I'd like, but it does the job. I came back and it had done all these things to get my React Native app to build. So much so that I looked through the chat history and just kept scrolling and scrolling because it had basically thought by itself, read a bunch of files, and continued down a rabbit hole that I would never have spent that much time on. So I think it's really good at uncovering these really deep-rooted issues and figuring out, "I'm going to keep going."
Whereas I think Claude has a certain limit where it gets far enough and then it says, "You know what, I think I know enough." That confidence is good sometimes, but other times when you have a really hard question, it's impossible for Claude to figure it out. For example, I had upgraded one of the dependencies in my React Native app, and unfortunately, the dependency situation is a nightmare. You're a layer away from native code, which on Apple platforms is Swift or Objective-C, and you're writing JavaScript. When you upgrade these dependencies, it potentially introduces a lot of errors. I had upgraded maybe 40 dependencies in my app for Split My Expenses, and one was breaking, but I didn't know why because the error message was generic.
So I sent it to Codex and said, "Hey, I upgraded these 40 dependencies, here's my error message. Good luck," because I was lost. I was Googling it and couldn't figure it out. Then it came to the conclusion that one of the packages had some update based on this complex JavaScript engine that React Native runs. "This is the issue." So I opened an issue on GitHub. I went to the owner of the package and said, "Hey, I upgraded my dependencies. Codex says it's your package that's breaking. To be honest, I'm not sure. I don't know if this is actually correct, but if this is correct, could you fix this?" About a week later, another engineer came to my issue post on GitHub and was like, "Hey, I was encountering the same thing, and I think Codex's output is actually correct." The way it described the issue and how to fix it was completely correct. And again, this is very in-the-weeds technical, which is why I couldn't even believe it myself. I was like, "Does it really know this much?" I couldn't even fact-check it.
So long story short, this guy had come up with a fix to improve it based off what Codex had told me initially. What it told me initially, I took with a grain of salt because it had spent 30 minutes trying to figure it out, and I was like, "There's no way this is right," but I'm going to post it anyway. And that was a great example of Codex going really deep, where I think Claude is a little bit lazy and confident, and that produces different results. So at the end of the day, we're in this AI CLI race. We've been there for a while, but it's really gotten to the peak of, "Do I choose Claude Code and suffer with its downsides, or do I choose Codex CLI and suffer with its other set of downsides?" I'm hoping in the next month or two as we wrap up the year, we get in a position where there's a clear dominant winner and I don't have to flex between both, since it's a little bit painful to be honest.
I mean, you know how I feel about JavaScript, and what you said about React Native just reaffirms that JavaScript is the worst and it should go away. One thing that's funny on the Codex CLI, and again I was reading this, I didn't know this before we wanted to talk about it. I guess OpenAI made it open source, which is interesting. That is a newer development. Previously it wasn't open source, and it was open-sourced in April 2024 under the Apache 2.0 license, if you know what that is. So they said they did that to encourage contributions and build a vibrant community and give people a high degree of customizability. I'm not sure if you've customized yours at all.
I have not.
Yeah, I thought that was interesting, I didn't know that. Then the other thing it mentions, and again, this is all coming together against JavaScript, which I love. The Codex CLI was transitioned from a Node.js implementation to native Rust.
Oh, I did hear about that.
Yeah, this rewrite removes the Node.js dependency. Bravo. Node.js sucks all around, everyone knows it. It streamlines installation and improves performance with faster startup times and lower memory usage. I don't know if you can quantify how much of the performance improvements are specifically due to that transition, but anytime you can get away from JavaScript, it's a win. You know what I mean? Everyone agrees on that, so yeah.
Yeah, honestly, it's going to be a crazy 2025. I was thinking, what if we do one mini bingo card, for example, for what you think is going to be the most exciting release from now, September 28th, to the end of the year? Because if you remember last year, we definitely had a busy period in December being a ship-a-thon for AI companies. Google specifically has a dedicated AI week where they hold a bunch of releases. If there's anything I've learned about Google and OpenAI, if Google's releasing something, so is OpenAI because they want to steal the thunder. And Claude and Anthropic are just sitting there consistently pushing updates for the most part, but we haven't seen anything from them in a while.
So from now until the end of the year, I guess I'll kick it off. I would love to see Gemini come back to the top two coding models. I'll be honest, when Gemini 2.5 Pro came out, everyone was going crazy, "Oh my God, so good, so good." And then Opus came, Claude Code came, and I've never touched Gemini again. I think one of the problems they run into is Gemini is a weird model. If you ask it to write code and it fails, it tells you it's sorry and has this weird, deep understanding of feeling sorry for itself and saying it's not worthy. You've probably seen that; we've talked about it on the podcast. But long story short, I would love it if Google could come back swinging on the coding model front because their models are extremely good for general-purpose tasks. 2.5 Flash, 2.5 Flash Lite are extremely cost-effective and good. But as engineers, as programmers, as someone who wants to work with Excel, we're thinking coding models all the way through, but Gemini 2.5 Pro is unfortunately not the best coding model. So I'm going to put one in Google's camp and say I'm hoping Gemini 3.0 Pro, their next iteration, really brings them back into the top two. Again, top two being OpenAI and Anthropic. I think that would be a sweet deal for Google to have all these amazing compute resources and build something exceptional.
Okay. You know, I'm a big Google fan, so I take some slight offense that you're pooh-poohing Gemini. It's still good. Don't hate on it.
It's good, not great.
We haven't touched on Imagen, their imaging model.
That was, yeah.
Yeah, that's been a big, huge thing. I don't know if it's the best model, but I've seen that everywhere. But um, okay, so my end-of-year prediction in terms of AI. I think we're going to have a massive security breach that's going to involve AI. Part of what has me thinking about this, and I'll tie this into my bookmark, is I think there was a post by someone who showed how the Google Drive MCP server or Gmail MCP server could be really easily tricked into giving up your email inbox. I'll link the video and the X post because the person deserves the credit and they really explain how it all works. But I remember watching the video and being like, "Wow, that is really sneaky that someone can do that."
I think, of course, all these models are worried about security and they're constantly developing that, but there's such a push to get things out that I think sometimes things get sacrificed, like security or quality checks. So there's that push to get things out, and also there's such a push on users to adopt and use these tools, right? And so you just have this anxiousness and FOMO of not using these tools. People need to stop and make sure they're using them correctly and be mindful of what they're granting access to and what these connectors are really doing. Because yeah, that was really interesting to see that AI agent get all the email information from that person's inbox. Do you remember which X post I'm talking about?
Yeah, that was ChatGPT turning on MCP, I think.
Yeah, it was an MCP-related thing.
So that's a segue into our bookmarks; that's my bookmark too.
Yeah, because that was crazy.
Yeah, I think it was ChatGPT. You have to go into the settings of the app, go to developer mode, which allows you to add MCP servers. Then you can add Google's. I think if you were asking AI to look at your calendar, and someone sent you a calendar invite with a malformed or attacking prompt, it would then take over and be able to read your data in some fashion. Which is crazy. I mean, it's almost like you're unaware you have this calendar invite, but it's able to do so much harm. So yeah, pretty crazy.
I think my bookmark, I was just looking through my list, I have quite a lot. One that I find really interesting is there's now a Chrome DevTools MCP server. That means you can connect with Chrome, get it to do browser automation, get it to do AI debugging. I think there's been a large influx of how you can make frameworks and applications more AI-focused. For example, the Laravel framework, which I'm a big fan of, came out with AI-assisted editing in which it modifies your Claude Code system prompt to be Laravel-focused. I think this would be a good step in that direction where when you use Laravel, you're writing a web app for the most part. When you create a web app, you need to test things where you're clicking on form elements, submitting forms, etc. It sounds like they could potentially integrate with the Chrome DevTools MCP and build out some way for you to say, "Can you build this feature," which is like a web form, "and then can you actually test it with Chrome in a simple manner?" It'll spin up a browser, click on things you just created, then AI can verify end-to-end that it worked. That's the whole dream—how you can close that feedback loop and say, "AI built it, tested it, and committed it." That is the end-all, be-all. So, pretty cool release from Google, I'm glad they're moving fast. Another hat tip to Google. I think they're really coming out of this phase of stagnant development and pushing the frontier of AI and spending a lot of resources there, so I'm really hoping they do well.
Mhm. Yeah. Cool, awesome. Well, let's wrap it up there, Brad. Good stuff. Finally, glad to be back and yeah, we'll do this all again next time.
Yep, sounds good. Awesome.
See ya.
See ya.
[Outro Music]
Thank you for listening to the Breakeven Brothers podcast. If you enjoyed the episode, please leave us a five-star review on Spotify, Apple Podcasts, or wherever else you may be listening from. Also, be sure to subscribe to our show and YouTube channel so you never miss an episode. Thanks and take care.
All views and opinions by Bradley and Bennett are solely their own and unaffiliated with any external parties.
