Codex maxxing and the AI-native workplace
[00:00] Speaker: All right, we are live and back. We had some technical difficulties, but we are recording.
[00:13] Ben: We are good. It’s a Monday. How’s it going, Brad?
[00:17] Brad: It’s going freaking good. I just got back from North Carolina. Had some Cook Out out there. They have this incredible mint Oreo shake—literally to die for. It might sound weird, but kind of their fast food out there in North Carolina. So yeah.
[00:37] Ben: Hey, the mint Oreo shake is the fast food there? Expand on that.
[00:42] Brad: Cook Out is the restaurant, and they’re known for their shakes. They have a mint Oreo shake that is chef’s kiss. If you’re ever in the realm of Cook Out, you don’t need to get their food, but the mint Oreo shake—
[00:56] Ben: See, okay, I thought you went to a cookout and got some good homemade food.
[00:59] Brad: No, no, no, no.
[01:01] Ben: It’s a restaurant.
[01:02] Brad: This is fast food. Yeah. Yeah.
[01:04] Ben: Okay. That’s a little letdown, but yeah. Not much. I just got back from a work trip up in the Pacific Northwest. That was great. Got out of the heat a little bit.
[01:17] Ben: And then, just kind of AI-related—and we’ll talk about it a little bit more—but I’ve been dabbling in the VPS world of setting up your Codex on your VPS and not on your local machine, and doing the whole SSH business. So I have some thoughts as a non-technical person trying to navigate that a little bit. There were some struggles. I was on the struggle bus a little bit, but it was also really cool. So I get the appeal for it. And it’s funny because right when I got it working, something got announced, which we’ll talk about. But I was like, “Oh.” And I kind of knew that was going to happen, but we’ll get into it. We’ll tease it like that.
[01:55] Ben: Yeah. So Brad, you were mentioning that you saw some article that you thought would be great to kick us off with. What was that?
[02:06] Brad: Yep. So I saw an article from Jason, who actually works at OpenAI. He was describing kind of “AI native” from what he called “Codex maxxing,” but I kind of think of it as using AI tools evolving from the chat interface to the agentic, long-term interface.
[02:21] Brad: Pretty interesting article. I can link it in the comments, but I think to explain it without going too deep in the weeds—because it’s a pretty long article talking about really cool stuff—I think AI has evolved from a ChatGPT single conversation to now Jason outlining Codex’s capabilities, whether that’s long threads, voice input, remote control that recently got released, heartbeats, automations. All these bits and pieces feel like they’re adding up to be a different workflow. And that’s kind of what I would coin as “AI native.”
[02:59] Brad: So how do we have this shift from the original: you talk to an AI chatbot, you get your answer, and you move on, versus these long-running threads that kind of encapsulate a bit more? You can almost think of it as like an always-on teammate, something that is there and has memory. One of the key points that Jason talks about in the article is: how do you store memory across chats? How do you link that up? Very, very interesting topic.
[03:21] Brad: We even have computer use now. So like we talked about last episode, very, very powerful stuff. Even when I joined OpenAI, one of the first mantras was, “Use Codex for everything,” and “Start with Codex.”
[03:31] Brad: And I think that part I would love to pick your brain on here, your thoughts on that, because to me, it feels like we have the bits and pieces there. Now it kind of takes our manual effort to coordinate and orchestrate this system to get those pieces to work together. But it changes the game because now you have a long-term spot to do, you know, memories, to have long-running tasks. There’s this kind of mental shift where Codex is cool, but now Codex has this different life.
[04:05] Brad: And honestly, Claude Code probably quite similar in that regard, but there’s a different shift. It’s not like I ask a question, I get an answer. It’s now like I have a goal. The AI app has memories. It has tools. It’s just a whole lot more than that. And once you get all those unlocks, we become AI native, where the first thing I get, I throw it out of chat. I organize all my stuff. I spend all this time to get my AI agent in a good spot so that I can work well over time and feed into other systems.
[04:38] Brad: And I don’t have the answer for what that looks like, but I see the world trending in that direction where I kick off things, I do things in parallel, and I give it the systems to be successful, almost like another employee, so to speak.
[04:50] Ben: Yeah, that makes a lot of sense. Because I think we’ve talked before on the podcast about IndieDevDan as a YouTube channel, and he’s explained it better than anyone that I’ve seen—or at least it’s resonated with me the most—is the “core four.” It’s talking about model, like picking the brain that you use. In your case, that’s 5.5 extra-high reasoning probably. And then the prompt being: what’s the action that you want this agent to take? Context is all the information that you need to know to do your job, or whatever the task is. It’s that background, that domain knowledge. And of course, tools is: how can the agent go about and do what you want it to on your behalf?
[05:33] Ben: And I think when I think about that, especially from my background in accounting, everyone’s kind of likening this to an intern. Like, okay, you’re going to train an intern. How do you go about doing that? Well, first you have a job description up, and you need to solicit the right kind of skills to have someone that can do your job. An engineering intern needs to have an engineering background. They need to have the right education and the right set of skills. Same as accounting.
[05:56] Ben: And so in my mind, that’s context. That’s where you kind of say, “This is everything you need to know about accounting and what I need you to do,” and same thing for engineering. So to me, it kind of goes to show that one of the first places people are going to go when they need help or need something done is AI. If you’re a team and you’re at capacity and you need to get some extra help, you’re going to first look to AI before you go to hire somebody now.
[06:30] Ben: And so I think it’s that kind of thinking of, two years ago—I mean, at least two years ago—that wasn’t even a thing because ChatGPT was here, Claude was here, but it was still in the browser. Aside from libraries like LangChain or building out with the agents SDK, it wasn’t as commonplace as Codex and Claude Code are.
[06:56] Ben: And I’ve seen a huge shift in my circles when people discover these apps that they can work with files on their computer versus having to be stuck in the browser. That’s been such a huge unlock for so many people, and I think their awareness of how powerful these tools can be. And so, yeah, now people’s first thought, I would say, whenever they need to get something done is, “Hey, how can I have AI do this?” I don’t know if that’s what Jason was referring to with AI native, but that’s kind of where I interpret that post.
[07:26] Brad: Yeah, I think part of that is we originally tried to write the best prompt and fire that off and get a single response back. So it was very much like prompt engineering back in the day. To me, I feel like Jason is painting a clear picture of AI native as building a really good loop.
[07:46] Brad: So we’re not as focused on a single really good prompt, but it’s harnessing skills, tools, ways for the agent to check its work, using features like goal, which tackle really big tasks over time. It’s how do we design this agentic loop in a better way to support AI-native work, where originally we kind of just fired these tasks and reviewed them manually. Now we need to change our systems to make AI productive.
[08:10] Brad: And like you said, oftentimes that means describing documentation that someone new to the company would need, building out tools that make them effective. It’s like building a mini version of yourself—what stuff you do day to day—and give that to the agent. Let it verify that work, check in on its progress, and steer it from making mistakes here and there.
[08:33] Brad: But it feels like we have elevated the capacity of the agent’s processing power. We need to spend time to make that loop work very well, but once you get there, it feels like there’s this massive unlock that you can just fire off large tasks and get things done.
[08:53] Brad: And to me, the angle shifts. Instead of having a one-off chat, you have a really long chat thread, and this thing accumulates memories over time. It auto-compacts when you hit token limits for chat, but it does it really well now. And so it’s this thing where originally I would have to figure out when to end the chat. I don’t have to do that anymore. I would have to figure out what size of task to chuck at AI. I don’t have to do that anymore. It feels like I can send these huge tasks and just go for it.
[09:22] Brad: So to me, it’s like, how do we design a really good loop for this so that my codebase on the engineering side has tests that the agent can feel good about, and various other codebase health factors?
[09:32] Brad: And then when we take a step back from that, there’s also memory. So one thing that Jason brings up is it’s really cool to have almost like a knowledge graph of how things work at your company. I haven’t done this, but he kind of describes a world in which you could know who’s working on what projects. You could reach out to those people.
[09:51] Brad: I think one part of being AI native is using Slack a lot. There’s a Slack MCP. You can ask it to read messages, to draft replies, and even send messages. So I think Jason kind of talks about: what is a world like in which I can ask Codex to do things, it will then deduce relationships based on Slack conversations, it could probably even read an org chart in Slack, and from that you have to be very hands off the wheel and say, “Make a code change in this product and tag the relevant people.” It would know what the project is, who the owners are, who the engineers are, who the PMs are.
[10:24] Brad: That to me sounds really good because there’s oftentimes information that I don’t give my agents that would be useful, but I can never tell when to give it, when not to give it, and not to spend too much time there. And to me, I think memory is that big bucket. If you’re able to identify what’s important and how to work well, you can imagine an AI agent running this long chat and then kind of extracting facts over time.
[10:47] Brad: And I think that part I have not dove into, but from Jason’s article, it makes me very curious because, I don’t know about you, but there are definitely parts of my work that I feel like, “Yeah, I should be better at this. I’m just not sure how to package it. I’m also not sure how to keep it up to date.” I think that part is a big, vast area that, if we could crack that, memory is going to be a huge, huge deal there.
[11:07] Ben: Yeah, it’s funny on the Slack thing because I’ve used that before. And I don’t know if it’s the same as in your case because we might be using different models, but if you just have Slack draft replies to things, one, it’ll sound like AI, and then two, it’ll sign you up for things that you wouldn’t agree to.
[11:26] Ben: Like if a PM, for example, reaches out to me and is like, “Oh hey, when are you going to get that testing done?” and I let whatever model—Claude, whatever—respond to that Slack, it’s going to say, “Oh hey, Jason. Yeah, I was busy this week, but I’m going to get to that tomorrow.” It’s like, well, I didn’t—don’t do that for me.
[11:44] Ben: But it is really cool. I just noticed that it was always really eager to people-please the other person talking to me when it wasn’t like—no, I have reasons I haven’t gotten to that yet, you know.
[11:56] Ben: But yeah, the memory thing is really interesting because I was having a conversation with people recently, and I was like, this is one of those tools where the more you use it, the better it’ll get. And I think people who maybe aren’t engineers or feel like it’s too technical to get started feel like it’s too intimidating.
[12:14] Ben: But I’m like, you just have to get in. You can ask it how it wants to be used. At no other point in time have we had a technology where you can literally ask it how to be used. Back in the day, you used to have to read documentation, read the training manuals, get hands-on training on certain software, right? So now you can just ask those tools how to be used.
[12:31] Ben: But then too, as you use it, it’ll kind of in the background—you wouldn’t even really notice it—write these things out to memory. It’ll kind of know, okay, Brad, when he refers to PHP, he’s referring to Laravel. Just something like that. It picks up on your nuances, picks up on your habits. And so the more and more you use it, the more it’s going to compound and know you.
[12:51] Ben: And I think to me that’s really interesting because what if you want to start to change? What if you have a habit and you call something a certain way, but then eventually you want to shift that? I’m just curious, purely as a human being, when does that memory—if it’s basically infinite memory—when does that compact?
[13:18] Ben: And that was something that, a year or two ago, it was like managing tokens and compaction. It was something that you had to really think about if you were building an agent with LangChain or something, but now it just all happens in the background. And so I think it’s good for a regular user to not have to worry about all that. But it’s just interesting. Where does that go? Do people have any say in that? Do they have any preference? Or is it just kind of like there’s a magic memory window that’s applicable, and then the rest of it just gets thrown out to save space? I don’t know.
[13:51] Brad: Yeah, I think memories, as I’ve seen it so far, are pretty much just Markdown files that are organized in the way that the agent decides. Again, the memory stuff today is very manual. You have to choose how to instruct the agent to write into said files. There isn’t really first-class memory support. But if there was, that’s what I would imagine it to be.
[14:11] Brad: So then you can look at the changes it makes. It’s committed as a file. It’s also probably breaking things down like engineering docs, people, projects, et cetera. So kind of like a skill, I can pull in the right context.
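A minimal sketch of the file-based memory Brad describes here, assuming one Markdown file per category. The category names come from his examples; the `remember` and `recall` helpers and the directory layout are hypothetical, not how Codex actually stores memories.

```python
import tempfile
from pathlib import Path

# Hypothetical categories, taken from the examples mentioned in the conversation.
CATEGORIES = {"engineering", "people", "projects"}


def remember(root: Path, category: str, fact: str) -> Path:
    """Append one fact as a bullet to <root>/<category>.md and return the path."""
    if category not in CATEGORIES:
        raise ValueError(f"unknown category: {category}")
    root.mkdir(parents=True, exist_ok=True)
    path = root / f"{category}.md"
    if not path.exists():
        # Start each file with a heading so it stays human-readable.
        path.write_text(f"# {category.title()}\n\n", encoding="utf-8")
    with path.open("a", encoding="utf-8") as f:
        f.write(f"- {fact}\n")
    return path


def recall(root: Path, category: str) -> list[str]:
    """Return the facts for one category only, so just the relevant context is loaded."""
    path = root / f"{category}.md"
    if not path.exists():
        return []
    lines = path.read_text(encoding="utf-8").splitlines()
    return [line[2:] for line in lines if line.startswith("- ")]


if __name__ == "__main__":
    root = Path(tempfile.mkdtemp()) / "memory"
    remember(root, "people", "When Brad says PHP, he means Laravel")
    print(recall(root, "people"))
```

Because each memory is a plain file, every change the agent makes can be committed and reviewed like any other diff, which is the property Brad points to.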
[14:23] Brad: Another thing that’s mentioned in the article is voice input. We’ve talked about this a bit before, but Wispr Flow was just an amazing speech-to-text. I think voice being in the AI-native space is a really big deal because you can talk a lot more than you can type.
[14:39] Brad: For example, if you’re trying to draft a Slack message, like you mentioned, there’s typing it out, which is like, “Go reply to Ben on Slack that I’ll do this.” Or there’s, “Ben asked me about this accounting question. I’m not exactly sure. I think I can do it, but I need to go check this sheet and go check this email.” And that is very unpolished, but the AI agent can figure it out.
[15:02] Brad: I think both throughput goes up and the clarity of what you’re trying to do. Yes, your thoughts are more messy than typing something out, but I think that is kind of the way forward of using voice.
[15:15] Brad: And I’ve even picked up a microphone at work, but I’m still a little intimidated to use it. I sit in an open office, and it’s not the most quiet, but we also don’t have 40 people talking to their microphone saying, “Hey, fix this, fix that.” But I can see a world in which we do that because you really can get more done with that.
[15:34] Brad: And when I’m at home working from home, I do that a lot. Oftentimes I’ll juggle three or four things at once. I click into a Codex thread, I talk to it, I go to a different thread, I move on. And so I think that’s a big deal.
[15:44] Brad: And the second part that kind of goes with that is steering. I’m not sure how much you’ve done this before, but steering is very, very first-class in probably all AI chat apps now. What that means is when you ask an AI agent to do something, it’s going to go off and do that task. However, if you’re expecting future instructions, you can send a message and steer the agent either to interrupt it or to queue it. So that’s kind of the two differentiations.
[16:13] Brad: Steering is whatever the agent is doing, you can send a message and press Escape on the CLI. Or I think in the Codex app, there’s a specific button to steer. What that does is, once the agent finishes its latest tool call or turn, it’ll interject your message.
[16:29] Brad: However, if you want to queue things up, you can send a message and just press Enter in the Codex app. Once that agent is done with your entire request, then that message will auto-send. So I think usually, in the old days, you send a prompt, you wait, you look at the response. Now it feels like you can kick off four things at once. You can go to any of those four things, steer it to fix it, or queue up a future change.
[16:57] Brad: Oftentimes for engineering, I’ll ask it to do something, and then I think of the test cases later. It’s not done with that task, but I’ll come back to that chat and say, “Test this, test that, test this,” and I click Enter. Once that implementation is done—I don’t know when it’ll be done—but it’ll auto-send my next message.
[17:10] Brad: So you get this higher throughput if you’re able to do things. And steering is really critical to make sure it stays on path. Oftentimes, you can get a better prompt if you spend more time upfront, but sometimes things go wrong. So being able to wield the steering and the queuing features makes me feel like that is very, very AI native: using voice and managing your chat window, and being on top of things with having a lot more throughput.
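The difference between steering and queuing can be sketched with a toy model of one chat thread. This is an illustration of the behavior as Brad describes it, not Codex's real internals: the `Thread` class, its method names, and the `on_step` callback are all hypothetical.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class Thread:
    """Toy model of one agent chat thread (hypothetical, for illustration only)."""
    steer_inbox: deque = field(default_factory=deque)  # delivered after the current step
    queue_inbox: deque = field(default_factory=deque)  # delivered after the whole request
    log: list = field(default_factory=list)

    def steer(self, message: str) -> None:
        self.steer_inbox.append(message)

    def queue(self, message: str) -> None:
        self.queue_inbox.append(message)

    def run(self, request: str, steps: list[str], on_step=None) -> None:
        self.log.append(f"request: {request}")
        for step in steps:
            self.log.append(f"step: {step}")
            if on_step:
                on_step(self, step)  # the user may send messages while work is in flight
            # Steering messages are injected as soon as the current step finishes.
            while self.steer_inbox:
                self.log.append(f"steer: {self.steer_inbox.popleft()}")
        self.log.append(f"done: {request}")
        # Queued messages are sent only once the entire request has finished.
        while self.queue_inbox:
            self.log.append(f"queued: {self.queue_inbox.popleft()}")


if __name__ == "__main__":
    def user(thread: Thread, step: str) -> None:
        if step == "edit code":
            thread.queue("Test this, test that")
            thread.steer("Use the existing helper instead")

    t = Thread()
    t.run("implement feature", ["read files", "edit code", "run build"], on_step=user)
    print("\n".join(t.log))
```

In the demo, both messages are sent during the same step, but the steering message lands before the next step while the queued test request waits until the implementation is done.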
[17:37] Ben: Yeah, I was just looking at the memories. So Codex saves it in a `memory.md` file. I was looking at it, and it’s actually kind of funny because it picked up that I’m cheap. It was like, “User preferences,” and it says, “When the user asks what’s a solid, cost-effective way to manage email lists,” they said—meaning me—“ConvertKit feels expensive.” And I put in parentheses, “$39 per month.” So I was like, “Am I really going to pay $40 a month in 2026 for email management?”
[18:06] Ben: But it was biased toward lower-priced—it’s cut off here, I can’t see what it says. And the other one that’s kind of funny is, “Be very specific.” It says, “When the assistant gave generic headlines, the user said, ‘WTF are those headlines? Make it specific to me.’” So yeah, it’s building a profile of you. It’s funny to actually go back and look at the Markdown file. Some of it’s useful and human-readable. Some of it’s not really. It’s kind of just AI gibberish, I guess. But yeah, it’s interesting.
[18:41] Ben: And on the steering thing too, I noticed that one time I asked it to do something, and I think I had wanted to put something in the prompt, but I forgot to. I hit send, and then usually I would try to escape or interrupt it—completely interrupt it, like Escape, Escape—and then redo the prompt. But yeah, there’s that way that you can, I think in Claude, it’s like `/btw`, like “by the way,” and you can ask a side question.
[19:09] Brad: Oh yeah, yeah, yeah.
[19:09] Ben: And I don’t think I’ve actually done that in Codex, or I’ve had to steer it yet, but yeah, it seems like one of those things where it’s almost like a recall. Like, “Wait, never mind. Actually, don’t do that. Go do this instead.”
[19:24] Brad: Yeah, yeah. Another one from the article that I want to call out is talking about heartbeats. This is not super new, but kind of new. So in Codex, there are automations. Heartbeat is basically a way for, within that chat thread, to do something on an interval.
[19:41] Brad: A key engineering task that I do with heartbeats is to basically put up code, ask people for review, nudge those reviewers, and also make sure that the tests pass. So sometimes you write a feature, you write tests for it, you push up your code, tests break, but they might not be your fault. There could be a GitHub outage, which has been really bad recently, and your pull request will be failing.
[20:05] Brad: So I say, “Hey, check in every 10 minutes. Go check in on my GitHub PR. If anything breaks that doesn’t look like it’s my fault, do a retry. If something breaks that is my fault, let’s try to fix it unless it’s really big, and then notify me.” So you kind of get from “I have to go manually manage all these bits and pieces” to “I get to a point where I can then hand off, at a recurring schedule, an AI agent to do something for me.”
[20:27] Brad: And again, this goes back to equipping it with skills, tools, MCP servers to get things done, and then having this kind of cron job schedule to say every five minutes, every hour, et cetera. You can say, “Read my email every hour and draft me replies. Read my Slack every 15 minutes and draft replies.” All those things are pretty valuable because you can even go look at your Slack draft. You can open up the Slack app, and if you ask it to save a draft, it’ll be right there in the input box.
[20:58] Brad: And it’s awesome because it’s usually 50% right. It’s not perfect. Like you mentioned, it’ll sometimes oversubscribe and say, “Hey, I’ll do this,” even though you clearly don’t want to do it. But in my experience, at least half of the message is directionally correct, and I think that part helps speed you up.
[21:14] Brad: So a big thing, I think, for being AI native is knowing when to be in control, how to manage the conversation, and then how to turn recurring work into a heartbeat. Because this feels like it comes from a single AI chat thread to more like a process. You don’t need to remember what to do. The task remembers itself. And it’s basically now, how can you make this recurring work be delegated to that chat thread? Honestly, very fascinating. I need to do more there.
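Brad's pull request heartbeat boils down to a small decision rule run on an interval. The sketch below is one reading of that rule: retry failures that are not your fault, fix small ones that are, and notify on big ones. The `Check` fields, `decide`, and `heartbeat` are hypothetical names, and a real setup would fetch check results from GitHub rather than from a stub.

```python
import time
from dataclasses import dataclass
from typing import Callable, Iterable


@dataclass
class Check:
    name: str
    passed: bool
    my_fault: bool = False  # hypothetical flag: did my change cause the failure?
    big: bool = False       # hypothetical flag: too large to fix automatically


def decide(check: Check) -> str:
    """Map one CI check to an action, following the rule described above."""
    if check.passed:
        return "ok"
    if not check.my_fault:
        return "retry"   # e.g. an outage or a flaky runner
    if check.big:
        return "notify"  # hand back to the human
    return "fix"


def heartbeat(
    fetch_checks: Callable[[], Iterable[Check]],
    interval_s: float = 600,  # "check in every 10 minutes"
    beats: int = 1,
    sleep: Callable[[float], None] = time.sleep,
) -> list[tuple[str, str]]:
    """Poll the checks `beats` times, waiting `interval_s` seconds between polls."""
    actions = []
    for i in range(beats):
        for check in fetch_checks():
            actions.append((check.name, decide(check)))
        if i < beats - 1:
            sleep(interval_s)
    return actions


if __name__ == "__main__":
    stub = [Check("lint", True), Check("ci-runner", False),
            Check("unit", False, my_fault=True)]
    print(heartbeat(lambda: stub, beats=1))
```

Passing `sleep` as a parameter keeps the loop testable without waiting ten real minutes between beats.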
[21:43] Ben: Yeah, I like the promise of that. I was having trouble—not to throw work on your plate at all—but I was having trouble connecting to some of the specific apps. I was trying to do some of the automations. I wanted to check a CRM every six hours or so and tell me, “Hey, you have a new deal,” or whatever it was. I wasn’t able to get that connected yet, but that’s definitely something where I think people, you know, you almost kind of fall into a routine of, “I do this thing every single day or every couple of days.”
[22:18] Ben: And it’s one of the things where you need to pause and, especially again from the accountant lens, look at that and go, “Can I get Codex to do this?” And again, most people who aren’t programmers aren’t familiar with the concept of cron jobs. And I’m not saying that the heartbeat is a cron job—that’s it by itself—but just the idea that you can have something that you do every so often, that’s pretty routine and doesn’t really change each time you do it, and just set that up in a schedule.
[22:50] Ben: And I did that one. Like I said, the Slack one I’ve done. I think I actually kind of stole that idea from what you had told me once when you were in town with us. And yeah, it was pretty cool. It definitely oversubscribes too much for me, so I’ve got to tweak that.
[23:04] Ben: But yeah, that idea of having things—it’s almost like having a team. The old adage that I hear all the time is, “It’s almost like you have your own EA.” It does work and looks at your calendar and does all that stuff. The contrarian point of view, I think, on that is a lot of people probably don’t need an EA. They should just look at their calendar and spend less time tinkering with the prompt, and just go look at the calendar and figure that out.
[23:34] Ben: But if you can find really good value out of something that you’re doing all the time, and it’s easy to put that heartbeat in place or the automation in place, then why wouldn’t you? And if it’s as easy as a couple clicks and getting it set up and prompting it, then that opens up way more people to do it.
[23:52] Brad: Yeah, I think it feels like we’re transitioning from prompt experts to more like workflow experts, loop creators, delegators. That feels like the trajectory in which we’re headed with being AI native.
[24:08] Brad: Codex also recently introduced kind of like artifact support. So you can open up PDFs, you can open up websites directly in that right pane of Codex. Previously, it was mostly code-related files. Now you can ask Codex to say, “Go dive into some problem space and generate an HTML file, generate a PDF,” et cetera. It’ll just get things done, which is amazing to begin with. And then you can actually view those files directly in your chat thread using the file viewer within the Codex app.
[24:41] Brad: And this is awesome because it feels like instead of going to internal tools, you can give the model access to almost like raw data. It can reason about that raw data through MCP servers—for example, making SQL queries. I could go fetch product analytics, usage, et cetera, figure out why that’s the case, tie it back to code, create an interactive dashboard, whether that’s an HTML file, a PDF to share with your team, a PowerPoint presentation. The bounds of this are endless.
[25:09] Brad: And it feels like maybe—I don’t know if this is 100% correct—but maybe there’ll be a shift of less internal tools and more raw data with good skills, good connections, such that each person can have their own kind of personalized internal tool. Oftentimes, these internal tools at companies will be generic, support lots and lots of people. But now I get a personalized one.
[25:37] Brad: Imagine it’s creating an interactive diagram to map out product usage for me, and it knows who I am with memories. That is a huge deal. I care about this section of the code or this part of the product. I don’t need to tell it because I have a long-running thread. That long-running thread has memories. That thread has very useful MCP servers that I’ve hooked up a long time ago. It can validate its work, has all this context. You get to be put in a position where you are more hands-off once you have the system set up.
[26:02] Brad: That’s not to say, again, like you mentioned, maybe the memory grows out of control. It’s not great. Maybe it has bad bits and pieces in there. You have to do some pruning. But there is a question on how far this can go. Is this the future? Like 10 long-running chat threads, each having its own employee, so to speak? One for your EA, one for your code maintainer to make sure all your PRs are green.
[26:29] Brad: And there’s a world in which nobody really knows, but you can kind of see the writing on the wall that this feels right. The orchestration layer is now coming into control, where it feels like one chat thread is very capable. Now, how do we manage five at a time? How do we give them different rules? How do we tie them back so that they kind of know what’s going on as a whole? That part feels very unexplored. And I think AI native is pushing in that direction, but there are a lot of unknowns, I would say.
[26:56] Ben: Yeah. What’s also interesting too is that, obviously you and your situation, tokens are not an issue. I have the Pro plan, so I don’t really run into an issue. I think for a lot of people that maybe are not on that same frontier, there’s you, and I’m behind you, and then there’s still a gap to, I’d say, most people that don’t have a podcast about AI, right?
[27:22] Ben: So I do wonder: in those scenarios, does that shift how you would use it? If you’re more limited on tokens, you probably would pick and choose what things you want AI to help with. But I think in a scenario where you have unlimited or near-unlimited tokens, then it opens up all these different doors that you can do.
[27:43] Ben: So I’m curious. You and I can answer that because of, again, the plans and where you work in terms of just the resources available to you. But if someone’s on a $20-a-month plan and they get maxed out at tokens at a much faster rate, does that—I’m just curious. That’s the other thing I’m curious about too. It’s like the other end of it.
[28:03] Ben: And if prices of tokens were to change, for example, some of that stuff. It’s really interesting because I’m always used to just running 5.5 extra-high or 4.7 extra-high and letting it rip. And I talk about it with some of my accountant friends, and they’re like, “Well, don’t you need to do Sonnet or 5.3 or 5.2 or whatever to save on tokens?” I’m just like, “No.” I wouldn’t. But yeah, I think your plans—
[28:36] Brad: Yeah, I think it’s a great question because there have been multiple companies that have spoken out about the engineering budget allocated to AI tooling. And in the first few months of the year, they’ve already blown through that budget. I think Uber was one of them specifically. Their CEO came out and said, “Hey, we are way past the Claude Code budget. What do we do here?”
[28:57] Brad: I do think it’s different having the Pro plan, a huge token budget, being in limited tokens—obviously, it’s a different world. I think it paints the picture of what the future could be like. For example, heartbeats: amazing. You could have a heartbeat run every minute if you have a huge token budget. Also depends what you do in that one minute. Being efficient with tokens isn’t my core focus given the bandwidth that I have, but I could see someone getting really far with a $20-a-month plan on Codex.
[29:24] Brad: I think Codex is one of the more efficient pricing models for tokens out there, but it’s not unlimited. Clearly not. I mean, there are different plans to get higher and higher. But if you were to draw the line of what’s most important to you and how to build out efficient workflows, that is a whole other bucket.
[29:45] Brad: I think when I think of AI native, it would be hard to put a number on it, but it feels like these companies are wanting to spend money to get productivity. How much they’re willing to spend really depends on what they’re getting out of it. And measuring that productivity—what they’re getting out of it—is extremely hard. So I don’t know where it ends up.
[30:04] Brad: Clearly, there’s a differentiating factor for people who are using AI to speed up the workflows, and different workflows and different job functions benefit differently on that scale. But companies are excited to spend on AI. Many companies have promised billions and billions to go spend on AI. It’s here to stay.
[30:22] Brad: Sometimes it gets confusing on where to focus and where to optimize, and my head is at workflows that are leaning more toward a healthy token budget. Because I think for folks who use it in enterprise use cases—I don’t have a dollar number, but I imagine it’s a hundred bucks per employee per month or something along those lines. I don’t really have any accurate numbers, but I imagine there’ll be a lot more tokens to spend on the enterprise because you’re spending tokens to make more money.
[30:47] Brad: On the personal side, yeah, probably a bit harder to run some of these frequent tasks or recurring workflows. But I think if you have a token budget at work, spending a lot more time to learn and soak up what it would be like if you had that, and then distill some of that learning into your own personal experience, that would be awesome.
[31:08] Brad: But yeah, I think it’s a great question. I don’t know what the world looks like when you’re a bit more limited and have to pick and choose. Maybe just less heartbeats, for example.
[31:18] Ben: Well, and it’s interesting too because I tweeted about this the other day. I can’t remember exactly what it was—I won’t pull it up just for the sake of time—but it was like, it feels morally wrong to have Codex 5.5 extra-high fix a div or something like that, or just proofread something. But I just do it. I don’t think twice about it.
[31:38] Ben: And from an enterprise, business perspective, I think a lot of people will do that too. Maybe they don’t use Opus 4.7 extra-high or 5.5 extra-high, but they will just be like, “Hey, just look at this paragraph and make sure it looks nice.” And that costs more than proofreading it yourself, I think.
[32:06] Ben: Or maybe a better example is to just go find an email or go find a Slack. It’s like, “Hey, so-and-so Slacked me the other day. Can you go find that Slack and tell me what they said?” That costs more money than you just going to the Slack search bar and typing “Brad,” whatever the topic was, and then finding the Slack.
[32:22] Ben: So really, you just basically ended up with, one, a more expensive tool, and then two, I guess my point with that is that people are outsourcing some of the basic stuff to it because it’s so convenient. It’s like a Google search. Instead of having to look something up the hard way, you used to just Google everything. Now you have almost a better Google that is more interactive and, a lot of times, just gives you more of what you’re looking for.
[32:49] Ben: And so people are treating it like it’s a search engine, but it doesn’t cost like a search engine. So I guess that’s my point. I think a lot of companies are feeling the pressure from external forces to spend lots of money on AI. At the end of the day, there needs to be an ROI for that. But I don’t think we’ve seen—we’ve seen the spending, we just haven’t seen the ROI.
[33:11] Ben: And I do think the Codex tools and the Claude Code tools are going to help get there because they work with files on the machine rather than in the browser. I think it’s a matter of time. I do. But yeah, it’s just gangbusters, and people are just using it for every little thing. So you just end up with the same headcount, but more expensive tools. It’s the inverse of the whole job apocalypse, and we’re not seeing the ultimate ROI from these tools just yet, you know?
[33:52] Brad: Yeah. I mean, I think a year and a half ago, maybe just a year ago, when these tools weren’t as mainstream as they kind of picked up at the end of last year, you could get a lot of work done with AI and no one would really know. But now it’s just front and center. Folks know it speeds you up, or at least speeds up some people, so the expectation is there.
[34:09] Brad: But it’s not always easy to dive in headfirst. So I think there’ll definitely be high ROIs, but yeah, we need to figure this stuff out. Ideally, people don’t need to listen to the podcast and figure out all these crazy tips and tricks. Hopefully Codex and Claude Code are able to build out an awesome memory feature that automatically prunes itself, you can review it, et cetera. But we’re not there yet. Who knows what the next frontier looks like? But yeah.
[34:34] Ben: Yeah. It’ll be interesting to see. Just to leave you with this one, the perfect example of my laziness and default to Codex is: I can just go check the performance of a Google Analytics page. I can just go to my Google Chrome, click three things, and then see it. But instead, I just ask Codex, and it does some API request that it just built because I asked it to, and it gets me that data. And it’s just like, that is such a waste if you were limited on tokens. That’s such a waste when I could just click three places and have it.
[35:07] Brad: We’ve got to coin a phrase for that because I have also done some extremely minor things that I thought, “I could open up a text editor and fix this in three seconds,” or I just go to my voice chat and say, “Fix that.” There are different scales. I could voice it, I could type it—these are all to Codex—or I could do it myself. All of which are extremely short.
[35:29] Brad: But yeah, maybe there should be a word for one, a task is so trivial but you do it through AI anyways, and two, if there’s any remorse with that. Like, “Why did I do that?” There’s a thought that goes through your head like, “Should I do this manually?” Maybe not.
[35:50] Ben: Yeah. It always reminds me of those memes where it’s like—I can’t remember exactly how it’s phrased—but basically it’s like, all this energy is being consumed to make silly meme AI videos. It was like when Sora was out and people were doing “Granny goes off the big ramp at the X Games” kind of thing. It’s just like, this is what we are burning energy for? These silly videos. But yeah, it’s all good fun.
[36:15] Brad: Yeah, yeah. We’ve all done it. If you’re listening, please comment down below the most minor tasks you’ve had Codex, Claude, or any other AI agent do. I’d love to hear some funny stories about what people talked about.
[36:30] Ben: Yeah, I agree. Cool. Awesome. What else we got, Brad?
[36:34] Brad: I think that’s it. We’re roughly at time, so we could probably wrap here. I have a really good bookmark. So for those listening and who made it this far, I saw this incredible YouTube video. The CEO talks about their journey of building a mobile app, and it was really enticing. I was kind of on the edge of my seat for about 20 minutes.
[36:57] Brad: And I won’t spoil it, but I’ll link it in the show notes. It’s an incredible story: solo founder, just doing their best and honestly doing really good work. I applaud them. So I’ll link it in the show notes. It’s a must-watch. If you’re a listener of the pod, it’s a must-watch. So I won’t go too deep into it. But again, if you made it this far, please, please watch the video. You’ll thank me later.
[37:18] Ben: That’s cool. Yeah, I saw a related video. It’s not my bookmark, but I saw it today. It was something about the Singapore prime minister. He was—no, not prime minister, but it was some government official in Singapore. He was doing a presentation, I think with AI Engineer, that YouTube channel.
[37:35] Brad: Oh, that one’s good.
[37:36] Ben: Yeah, it’s great. And so he was in a talk, and long story short, he had talked about how he was experimenting with OpenClaw. Basically, I think the reason that he was talking about it was: you can’t govern, you can’t legislate something you don’t understand. And so that was him trying to get his feet wet and experience it firsthand. Because yeah, if you don’t understand the tools, you shouldn’t be making the rules on them.
[37:59] Ben: So my bookmark is not really a bookmark. Well, it was a bookmark a couple months ago, but it was from Levels talking about his setup at the time: Termius, a VPS, and Claude Code.
[38:09] Brad: Is it not Terminus?
[38:11] Ben: I think it’s Termius. I could be wrong.
[38:14] Brad: I’ve installed it once, but—
[38:15] Ben: Yeah, Termius. That’s what it shows on the app here. So I did that.
[38:18] Ben: Basically, for those that didn’t see that bookmark or didn’t hear when we talked about it at the time, it’s a way that you can basically talk to Codex, talk to your AI, via your phone. So you have a virtual server that you host with DigitalOcean or Hetzner or whatever. Then on that server, you install Codex or Claude Code. Then you can use SSH via Termius on the phone to SSH into that server and then do your coding that way, and do your vibe coding that way.
[38:53] Ben: So not a technical person—oh, I stopped saying that. I mean, that’s slightly technical. But SSH stuff is always a bit of a doozy for me. But I got it set up last week. A lot of fun. I haven’t done anything with it yet, so I don’t have any “this unlocked so much for me” kind of thing. Because there are still some limitations for me that I need to better understand how to get comfortable with.
[39:20] Ben: A big example is if I asked it to make a web app, I don’t have any way to test that locally that I’m aware of in Termius. I think there’s a way that you can set up some kind of port forwarding that you can access from a browser on your phone. But on my computer, I would just go into terminal and run the Uvicorn script to set up the local server and go look at it. So I don’t have that.
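The port forwarding Ben mentions can be sketched with OpenSSH's local forwarding flag, `-L`. This is a minimal Python sketch that only builds the command. It assumes the web app is served by Uvicorn on the VPS at its default address, 127.0.0.1:8000; the user name `ben` and host `my-vps.example` are hypothetical, and how Termius exposes the same feature on a phone is not covered here.

```python
import shlex
from typing import List


def ssh_forward_command(user: str, host: str,
                        remote_port: int = 8000,
                        local_port: int = 8000) -> List[str]:
    """Build an `ssh` command that forwards a local port to the server.

    -N tells ssh not to run a remote command (forwarding only).
    -L local:127.0.0.1:remote sends traffic from the local port to the
    port the app is listening on inside the server.
    """
    forward = f"{local_port}:127.0.0.1:{remote_port}"
    return ["ssh", "-N", "-L", forward, f"{user}@{host}"]


def preview_url(local_port: int = 8000) -> str:
    """URL to open in a browser on the device running the tunnel."""
    return f"http://127.0.0.1:{local_port}"


# Hypothetical user and host, for illustration only.
cmd = ssh_forward_command("ben", "my-vps.example")
print(shlex.join(cmd))
print(preview_url())
```

While the tunnel is open, the app running on the server should be reachable at the printed URL on the device that opened it, without exposing the port to the public internet.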
[39:47] Ben: And then there are a couple other things that are kind of clunky. Just scrolling up and down on such a small screen is a little bit rough, or using Control-X if you’re in nano to write things. So there are a couple things that I’m not thrilled about. And then just all the SSH key crap is kind of annoying.
[40:01] Ben: But I made it really locked down. I did the whole Tailscale thing and refused incoming hosts, so hackers don’t try me. But I did all that, so that was fun.
[40:12] Ben: What I teased at the beginning of this episode, when I was doing that, was I was like, “I don’t know if this will be necessary because I think they’re all going to build their own mobile apps that can run without you being present,” which is kind of the whole sales pitch of the VPS in the cloud that you can interact with on your phone.
[40:32] Ben: So right when I did that, I think you guys—Codex—released some kind of mobile announcement where Codex is now in the mobile app or something like that.
[40:41] Brad: Yeah.
[40:41] Ben: I don’t know the exact quote, but literally the day or two after I did that, that announcement came out. I was like, “Uh.” But I was still glad I did it because I knew that was going to come no matter what. I just figured that was naturally where things were going to go. And two, it was fun. Part builder, it was fun tinkering with it and learning about Tailscale and how that works and stuff like that. So bit of an ordeal, got it to work. It was an experience.
[41:08] Brad: Yeah, nice. We do have Codex Remote out. So you can open up the ChatGPT app. There’s a Codex tab at the top. If you select that, you’ll get the option to connect to your computer, as long as you’re running the Codex app on a computer. I think it’s supported on Windows at the moment, but if not, Mac at least for now.
[41:29] Brad: So what that means is you can have your Mac awake at home. You can keep it awake with a script or whatnot. And you can have your ChatGPT iOS app connect to your computer securely, kind of like you mentioned, and you can ask your computer to do things for you on the go.
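One way the keep-awake script could look on a Mac is a thin wrapper around the built-in `caffeinate` command. This is a sketch under assumptions: it only blocks idle sleep while the lid is open, and nothing in the episode says this is how Codex itself keeps the machine awake. The function name `keep_awake_command` is made up for illustration.

```python
from typing import List, Optional


def keep_awake_command(seconds: Optional[int] = None) -> List[str]:
    """Build a macOS `caffeinate` command that prevents idle sleep.

    -i prevents the system from idle sleeping.
    -t N releases the assertion after N seconds; without it,
    caffeinate keeps running until it is stopped.
    """
    cmd = ["caffeinate", "-i"]
    if seconds is not None:
        if seconds <= 0:
            raise ValueError("seconds must be a positive integer")
        cmd += ["-t", str(seconds)]
    return cmd


# Build (but do not run) a command that keeps the Mac awake for 8 hours.
print(keep_awake_command(8 * 60 * 60))
```

On the Mac itself, the resulting list can be passed to `subprocess.run` to hold the machine awake for the chosen duration.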
[41:45] Brad: So I’ve done this a little bit already, where I was fixing bugs with my expenses while doing errands, where I would voice prompt and be like, “Hey, go fix this in the mobile app.” And then I would close my phone, lock it. My computer is still running it because my phone is talking directly to my computer. My computer is running Codex. It’s beautiful. And you can see the full chat thread. Awesome project.
[42:08] Brad: I think that’s going to get a lot more love in the coming weeks and months, and it just makes it feel more productive. So it’s in the same vein as you described, but it’s a little bit more first-class, secure, and supported. And it runs Codex, no VPS required. It’s just straight from your phone to your computer. Keep your computer awake. And yeah, this is the future.
[42:28] Ben: Yeah, it was great marketing too. Because on the video I saw, it was all these MacBooks and laptops half-closed, which is how you kept your Codex agent running before this. So I guess, does it still need to be on if you do it this way, or no?
[42:43] Brad: I think there is a feature that allows you to have the MacBook lid be closed.
[42:49] Ben: I don’t think that’s a MacBook setting though, right?
[42:52] Brad: Yeah, yeah, yeah.
[42:53] Ben: Yeah, okay.
[42:54] Brad: I don’t love that feature because it just feels weird to me that my computer could be “awake” while it’s closed. Just in the history of using a MacBook, I’ve always seen that as kind of the end of the line. So I’m happy to crack my MacBook and explicitly opt in. But yes, you can officially close the lid and have it still do things.
[43:12] Ben: Yeah, when I was trying to do the OpenClaw setup back, I think right around the new year when we first talked about it, I had the old Linux computer and I had that setting too. It was like, don’t sleep on display. But it didn’t work for a couple other reasons. But yeah, no, it’s good stuff.
[43:28] Ben: So there’s no escaping it. You can be heads down Codexing or Clauding anywhere you go now. So yeah, there’s no excuse.
[43:36] Brad: We talked about this a long time ago when everyone was obsessed with token maxing. Now it’s a little bit easier to do that. I kind of had a thought cross my mind of, “Should I be doing this? Is this too much?” It didn’t get too bad yet, but I could see people getting quite addicted to it.
[43:52] Ben: Yeah. Cool. Awesome. Well, good stuff, Brad. That flew by. That was a great conversation. So kudos to, again, whoever wrote that article because that got a good convo going.
[44:04] Ben: Good stuff. So we’re at episode 42. We’ll call this season one of the Breakeven Brothers podcast. We’re going to take a hiatus of about a month due to travel and other plans. But when we’re back, expect a refreshed look with the same awesome content and the same presenters. But yeah, stay tuned for that. It’ll be a short break, but we’ll be back hot off the press before you know it.
[44:29] Brad: Cool. Yep. Good stuff.
[44:31] Ben: Cool. All right, Brad. See you next time.
[44:33] Brad: All right. See you next time.
[44:35] Speaker: Thank you for listening to the Breakeven Brothers podcast. If you enjoyed the episode, please leave us a five-star review on Spotify, Apple Podcasts, or wherever else you may be listening from. Also, be sure to subscribe to our show and YouTube channel so you never miss an episode. Thanks, and take care.
[44:52] Speaker: All views and opinions by Bradley and Bennett are solely their own and unaffiliated with any external parties.