AI for taxes & the end of App Stores?

Download MP3

[Theme Music]

Awesome. We are recording Episode 38. We're almost up to 40, which is pretty crazy. What's going on, Brad?

Not much. I tried doing my taxes with AI recently. That was fun.

As a CPA, I'm offended by that. I have to take up my pitchfork with that statement.

That's why I brought it up. I think I tried it last year too, but that did not work out. I thought, you know, the model's gotten a little bit better, let's try it. I put everything in TurboTax, which was a bit of a sad day given that I owe a bit of money. I thought, why is this the case? Let me see what AI has to offer. It came up with a few details and talked about various tax laws that might affect me. But honestly, I didn't think it would be that bad. It seemed like it could be a bit better. Long story short, I think there's room to grow there. It's gotten really good at coding, which we've all seen on Twitter. But yeah, I think for taxes, there's still a lot of domain knowledge there.

Yeah, I think it's more a reflection of the U.S. tax code than it is of programming. When you're writing code, everything gets distilled into ones and zeros; it's very deterministic. With the tax code, it's deterministic because you are going to owe a certain amount, but it's written very ambiguously. If you have a complex tax situation, you might need to reference prior case law to see what the precedent is. It's just much more unstructured, like a blob, versus programming where everything is very ordered.

I'm surprised that TurboTax asks me so many questions. I can't imagine the branches that exist in that software. That part always gets me, where I wish I had an AI agent that was able to answer these questions immediately.

Yeah, one thing on taxes, and then we can move on. I remember in college, I had a great tax professor, and there was some weird tax rule. It was something like, you could depreciate things all the way up until 1987, and then if you put anything into service after 1988, you could keep depreciating. So there was a one-year gap that didn't make any sense. He asked if anyone knew why there was a one-year gap. Basically, everyone was trying to figure out what was happening in 1988. And he was like, "It's just because some politician lobbied and they probably had some vested interest, so that's just how it got put into law." There's not always a rhyme or reason, and it's not always logical.

Good old taxes. Death and taxes, always guaranteed.

There has been a crazy past few weeks in all things AI, and we're going to do a little bit of a release recap. When these releases come out, it always feels like everyone's firing at the same time. One of the releases is OpenAI Codex 5.3 Spark. This is the first model that's powered by Cerebras for fast inference. The Spark model is kind of the faster experience using the Codex tool. It's not the smartest model, but it's really fast and a really different experience. You know that sometimes speed can be a bottleneck, especially if you want to do a large refactor. So if you feel like you're doing something that doesn't need full intelligence and you just want to get stuff done, try out 5.3 Spark. From Anthropic, Claude Sonnet 4.6. I believe that one came out with a 1 million context window in the API. This was a big release from Anthropic that came out this week. They even came out with a fast mode. So if you toggle /fast, I believe in Claude, that gets you 2.5x faster inference. But I think it's six times the price. So it depends on your use case. It's cool that they're offering it. It feels like there's a differentiation in the market between high intelligence and slow, and good enough but fast. And then the last one that we'll talk about is Google. They came out with both Gemini 3.1 Pro in preview and Lyria 3. Lyria is basically their audio model to chat with to get an audio file created.

We were doing this right before recording the podcast. We wanted to replicate or maybe create a new Breakeven Brothers intro. A lot of listeners have reached out and applauded that intro, so thank you for all that feedback. Ben created it with other AI tools, and we love it. So we thought with Lyria 3, let's give it a try and see what Google has to offer. It fell a little bit short, I'll be honest. What do you think?

Yeah, no, it's... we basically gave it the same prompt that we came up with for the other service, which you hear in the intro, of course. And it just was not it. It's about musical taste, right? So someone might love that song. But to me, what it was producing... I said something like, "Hey, '90s R&B, West Coast, melodic radio jingle," or something like that. And that's what you hear in the intro to our podcast. And Lyria 3 took that prompt and made something that in my opinion doesn't sound anything like it. I don't know, maybe what we'll do, Brad, is we should download the MP3 and put it up on our YouTube channel. We can be like, "Hey, here's that song if you want to hear it." They sound completely unrelated. That's my overall point. You'd expect them to kind of sound similar based on the description, but they sound completely unrelated. The Suno one, which is what we use for the OG, is much more in sync with what the prompt was, whereas the Lyria one sounds like a Disney song in my opinion.

Yeah, and I was trying to ask it to do the "Breakeven Brothers podcast," and it came out multiple times saying the "Break Evan Brothers" podcast. The one thing I would say is that what I did notice, though, is that it's better with the lyrics, I think. In Suno, sometimes you can't understand the lyric as well. It kind of mumbles through it. I feel like Lyria is very enunciated. And in songs, people don't always enunciate, right? So I feel like that's what makes it not have as much flavor. I don't know. It feels like that for code, too. I feel like there's code that works, and then there's code that works and feels and looks good. And maybe that's the part that they might be missing, where the music is technically music at a high level, but it doesn't have the pizzazz, the taste. Our current intro, yeah, we could play it, we could listen to it. The fans love it, so we're keeping that one. Lyria 3, I think is an excellent addition, probably useful for a lot of things, but for radio jingles, maybe not yet.

One of the other things, Brad, that I wanted to chat about in this episode was actually related to Open Code. Are you familiar with Open Code? I'm sure you are.

It's funny you say that because I've installed it and I ran it one time, I think. It's the terminal UI. It's a bit of an experience, but I honestly haven't touched it ever since then.

Okay, cool. So I'm on a push to kind of diversify away from Claude Code. I like Codex, I got Claude Code, but Open Code to me feels like it can be a bit more agnostic. I can pick the model and still have the agentic harness that Claude Code is so great with. One of the things that I found was really cool was that you can download a local LLM onto your physical hardware. And through the config file of Open Code, you can hook up to that model as long as Ollama's running. So I was also tinkering with that, and I downloaded Llama 2, which was, like, really old, but it was really quick to download and quick to try. And then finally I downloaded the GPT-OSS 20B, which is the biggest model I've downloaded, and was playing with it. And I would say it's actually really impressive. One of the things that Open Code does that I didn't know is it picked up all my Claude skills. It just sat there and it just read all the Claude skills. So whenever I wanted to use my Nano Banana Image Pro skill that I had made in Claude Code, I could ask for it in Open Code and it would just go read those Claude files.

I'm sure it would be the same. I think there's been a big push to actually normalize those to be just "skills." There were specific named folders for like `.claude` or `.codex`, and I think there's been a unification. Specifically from these big AI labs saying, "Hey, we have these various toolings, we need to make it unified and get everybody on board." Open Code, Codex, Claude Code, it's a small world out there. So I think Open Code reading Claude Code is great. I think what would be even better, which I think we're getting to, is having them all standardized on one specific location and ideally one specific spec.

Yeah, and you know, that was really easy to get set up. I still moved my Claude skills into an Open Code directory. I don't know if that's just the OCD in me, but I was just like, "I don't want it to have to go through Claude." And then what I was going to do was basically just commit my skills into my GitHub account. And then if I need to work from my laptop, I can just pull those skills and keep them fresh that way. Because right now, I have my home machine, and I don't really use my laptop much for the AI stuff. But I want to start using it more, and my excuse has always been, "Well, I don't have it downloaded and I don't want to have to set it all up here." But I'm like, "Well, now I can just commit those."

I have to ask the question about the laptop. Are you getting the token frenzy, sir?

Token frenzy? I don't know that term.

Yeah, it's the proliferation of using AI tools because they're so good now that people just want to use them all the time. So, having another computer to run AI tools, is that the mantra?

No, I think for me, it's efficiency. So I wrote a blog on my website about on-premise AI. And I was out doing stuff with the kids. My daughter was in dance. So I was at the dance studio, and I was just working. And I was like, "Well, you know, I'd like to be able to do what I want to do and have all the tools I need on this laptop," but I otherwise wouldn't if I didn't set this up that way. So it's more about efficiency. I don't use it just to use it, you know? I like AI for the tool aspect. I'm not into AI just for the pure passion of, like, geeking out on a million-context window versus a four-million-context window. It's just more about using it as a tool. What do you think about on-premise AI models? And the reason I'm going to preface that is because in accounting—and accounting isn't the only industry that has this—but finance data is typically very restricted and sensitive and confidential. A lot of accountants will tell me, "Oh, I can't use AI because even if I have the plans that don't train on your data, you never know." One of the things I've kind of said is, "Well, you can just download these models and run them locally." Now, you probably don't have the hardware to really run the max models, but if you invested enough, you could have the hardware. That just runs completely air-gapped. Like, you can set it up so it doesn't even have internet access. It just connects to your machine. I operate in the mortgage industry, and that industry has a lot of sensitive data, has borrower information. If there was ever a use case for needing on-prem AI, a mortgage company wanting to use AI to read customer documents would be a solution. What do you think about that take?

I like it, but I think an open-source model with the class of intelligence we're seeing from leading AI labs feels far off. I think for me as a developer personally, I get very accustomed to using the latest model. If you take a look back at a year of model advancements, they're significant. But as you use whatever the new model is, 5.3, you get used to it. And once the next model comes out, it just feels like that's the new baseline. I think with the open-source models, we get good releases, they just aren't prioritized as much. Because I'm just trying to use the latest and best, I feel like I'm so spoiled by having all this good intelligence. I personally really like the private model, own your own stack, own your own inference. But I do know that comes with a cost of compute, and running a good open-source model takes compute. I think that is really exciting, where if you have this air-gapped workflow and you want to run a good model, it requires decent compute. Now, Nvidia can package this up for you and ship it to consumers. It's something that's in between, where it's much better than a small model that could run on your local hardware. Long story short, I feel like the workflows that I think people are working on today are very much powered by the latest and greatest. So to me, it might be kind of hard to have something feel reliable with local compute, given the open source we see today.

And I think for sure, like if you were an enterprise and you felt like you needed to go down that path, you would need to spend some serious coin. You wouldn't just go pick up an Alienware desktop; you would need a server rack and all that stuff. I think one of the things though that's interesting is in your line of work, having that next wave of intelligence is really important and that can be another unlock. That enables people to be more efficient and it pushes all the other models forward too. For me, since I'm in accounting and kind of finance, it's very operationally focused. I actually don't want things to change that much. I kind of want things to be, "Let's have this work and let's not fuss with it until it breaks," basically. One of the benefits of being on-prem and running local models on your own hardware is you can kind of control when you want to upgrade the model. Whereas if you're using a different provider, they're constantly tweaking things. My overall focus was like, on-prem isn't really talked about very much, especially in the non-technical world. People tend to think of just ChatGPT as AI, and it's like, "No, there's way more out there." You may have a client that says, "I don't want AI having any of my information." You may still be able to explain, "Hey, this is just running locally. Even if a nuclear bomb goes off and destroys the grid, this thing can still run." I think there's a lot of use cases for that in accounting.

Yeah, it's cool. I think when I played with Ollama, it was very interesting to see that the longer the conversation goes, I believe the more RAM it took. When you're using these APIs that provide you inference, you don't feel any of these tuning parameters. So I think you have to choose the exact model that you want, like Llama 7B or Llama 30B, and different sizes impact your machine. I think the local LLM gives you a lot more control, but with that comes the expertise needed. I feel like people don't get that or don't want to spend time on it. It's kind of, you know, "I just want AI, and I don't want to think about it." I think they forego that option and will go the easier route. I think if that was a little bit easier or maybe there was better education, that could be a big win. There are a lot of use cases. This might sound crazy, but if you imagine 10 years out, the open-source models that we look at today will be really capable. At that point, these models are getting so good that it feels like an open-source model could handle some pretty complex tasks. If you give Llama 2 a very complex task today, I guarantee you it will not have the same performance as what we see from the leading models that are private.

Yeah, and one thing I noticed is I would tell Open Code to do something, and you'd see it thinking, and then it would just stop. It wouldn't respond to my original request. I'm like, "What the hell, what gives?" And I would go to the Ollama CLI and I would ask the same thing, and it would do it. Long story short, what I realized was that when you have Open Code, it sends way more context than just the CLI. And that was what was causing the difference. It's not just sending your message to the API; it's sending a whole bunch of other information that makes those systems so amazing. But there's just so much behind the scenes that I didn't appreciate until I was tinkering with the local model.

There is a lot about the harness, about the prompt. There's a lot of details and juice in there. One thing that I've really liked recently is in the Codex app, there's actually an automation section. This is something that you can tell Codex in the UI to do something at some interval. One thing that I found is it's really nice to have a weekly dependency check. For projects that I work on, oftentimes I want to upgrade my dependencies, but I don't want to do it manually. It'd be nice if every week or even every month, I could schedule an automation to say, "Look at the dependencies in my app, see what the changes are, and create a plan to upgrade." One other one that I've really liked is if you use Codex a lot, you can ask Codex to do an analysis on itself. You can say, "Look at my past seven days, look at all the chat sessions I've had with you, and please suggest various rules to add to my agents.md file based on the chat history." So it can parse all this JSON, see what you're explaining multiple times, and suggest real-life edits. To me, it feels like these automations can unlock a lot of power. With all things AI, it feels like it takes some time for folks to figure out the power. It takes a lot of effort and thought to think, what is actually useful and what's not? If you are using Codex or even Claude Code, a really cool trick is to just ask it to do a kind of meta-analysis on itself. You'll be like, "Hey, here are my chats in the past seven days," and it can figure out where they're stored, parse all that info, and give you actually valuable insights. It's, you know, no human is going to go take a look at their past 40 chats and analyze line by line. It feels like AI is just built for that, and that takes five seconds to kick off a query. I don't know if you've tried automations, but they're really cool. And if anyone listening is using the Codex app, please leave a comment below. I'd love to hear about some cool automations. I think it's a severely under-invested-in part of the app.

Yeah, I'm guilty of not using... I haven't tried Co-work or the Codex app yet. I feel like I'm just kind of enjoying the CLIs for now. And like what you were saying of having a daily process where it comes in and does something in the Codex app, right now I have a daily process that I just go to Open Code and run manually. But I like running it, Brad. I like running it. I don't want it to be automated. But no, it is a cool feature. AI is so interesting because there are certain things where I just could not imagine putting AI in the mix. Like, I don't like AI for cold sales outreach. No one wants to talk to a bot and get sold by a bot, in my opinion. But the dependency managing thing, that is one of those things where, probably looking back, you'll be like, "How did we ever manage dependencies? How did we do it manually? What are you talking about?" It's like, "We had to drive stick shift? What do you mean, it wasn't automatic?" It's going to be the same kind of thing. And yeah, it's such a, you know, I'm sure for the engineers out there, it's such a relief to not have to worry about little stuff like that that can nuke your program if a dependency isn't updated correctly. It just gets handled in the background, and you can focus on the big things.

Yeah, it feels like we're in the early days of all the automations. You can even have automations that reference each other. I think there's even a history too, so each run it can add to a shared notes file about that run and what it did, and it can learn over time. So it's all basic things. Like when we talked about skills, a skill is a Markdown file that's just packaged. It's just a plain Markdown file. It feels like automations are very simple too. It's just like a scheduled cron job that runs a Markdown file. To me, AI is really good at the fundamentals and the logic and writing code, and then we build these systems around it to make that life easier and to unlock a lot more. They might not feel very meaningful today, but I think with the right insights and tinkering, automations can be a huge, huge win. So yeah, shout out to the Codex team. I think they're doing a fantastic job.

I think those desktop apps like Codex and Co-work, I don't think they've really been absorbed into the mainstream knowledge work yet. I think once they are, that'll be the de facto choice because with Vim coding, you kind of need to know what you're doing if you're going to make anything really meaningful and solid. And you know, the CLI tools are amazing, but I still think most people don't work with a command line. That's just completely foreign to them. Co-work and Codex, the desktop apps, I think are that nice middle ground of being on your computer, which means they can access the files and do all the stuff that makes those CLI tools so great. But you don't need to know Linux commands and stuff like that to get around, right?

Yeah, I think that is the whole selling point, but I agree. Things definitely move fast. So what we use today, or what, you know, the bleeding edge is, definitely changes, and what the mainstream uses lags pretty significantly. It's cool using the latest and greatest, but yeah, you got to be mindful that this stuff will really probably pick up in six months once everyone else gets around to it.

There is one last tweet I want to talk about on the pod. Karpathy was kind of mentioning the end of the App Stores. And for someone who works on Apple platforms on the iOS side, I have definitely seen an increase in review times. The whole hypothesis to this debate is that more apps are being created. How is that happening? Well, we have these awesome AI coding agents. AI coding agents are creating new apps, and new apps are being submitted to the app store. At some point, though, it feels like if you are an engineer, you can make your own tool, which all engineers love. You know, "Do I spend five minutes doing something or do I spend 20 hours automating it?" So his tweet kind of highlighted, is the App Store the right way to do things? Is software going to change in a way because it's easier to write software, where it might not make sense to have an App Store? Maybe the software is built, used, and torn down, which with the power of Codex and Claude Code, doesn't feel that crazy to me. I think the pricing structure and the intelligence feel like it's there, but not everyone wants to build an app. To me, I see both sides. I think it's an inflection point, but also not really because this is bleeding edge and it's not mainstream. But some of these software companies are seeing a bit of a hit in their stock price. Some could say it could be because of this hype. So, all in all, I think there are a lot more apps coming out in the App Store, and there's this weird balance of, is this going to make more software or is this going to make less software? Are App Stores the right avenue given that five years ago it was hard to make an app? It took a lot of freaking effort. Now you can code an app in two days and get it up on the App Store. Apple has never experienced such an influx of apps. I'm very curious how it will all pan out. What are your thoughts?

Yeah, well, you led this segment with "bleeding edge," and I think that's a good way to describe the stock charts these days. Lots of red. It's interesting because one tangible example is, I think as of today, February 23rd, Anthropic released that Opus can write COBOL. And on that news, the stock of IBM just shrank. And I'm not an IBM analyst. This is not financial advice. But I think the understanding from that was that COBOL, which is used in all the financial transactions, underpins our banking systems supposedly. It's a really legacy language that people don't really learn these days, but it runs everything. And so there's a select few that are responsible for managing it. But with AI, with Opus knowing how to code in COBOL, the moat to competing or making a new software company has gotten a lot lower. And that got a lot lower really fast, like very fast. And so that's what I think is really interesting. I think what we're seeing with a lot of SaaS is, to your point earlier, not a lot of people want to clone Salesforce. So there's going to be certain apps and certain software that have staying power. But if it's just like a calorie counter app, I'd rather just take a picture of what I made and ask AI. That's an app that I can't see really sticking around these days. So stuff like that, it just... how technical is the problem that you're solving? If it's not that technical and AI can do it with relative ease, then yeah, why have an app for that anymore?

Yeah, and I did look it up, so it's C-O-B-O-L, so Ben is right. But I love the name: Common Business-Oriented Language. What a name. But I agree. I think when you look at a software business, there's a lot more to it than software. There's uptime, there's upkeep. We've talked about it plenty of times on the podcast. All the business headaches. For example, if you look at a business like Uber, all the legality of doing ride-sharing. If you look at a business like Airbnb, breaking down barriers and figuring out the laws in all these regions. Those aren't clonable businesses. So I think, as we look at these takes, yes, but no. If you're on the bleeding edge, you think, "Oh, I can clone everything." But again, no one wants to spend all that time. I think it becomes increasingly easy for a large company to do that, to say, "Oh, I don't want to pay my $10 million Salesforce bill," as an example. But if that's the case, you now have these tools that empower you to write code, but then you have to maintain it. So that's a cost for your own employees and software. So I think one note is build versus buy, and I think "build" will become easier, but there are hidden costs. And two, on the App Store specifically, there has never been a time where there have been more apps. So now when you go search for "calorie counter," you end up in a sea of AI-coded apps plus the apps that have been there for 10 to 15 years. Ideally, Apple maybe ranks the old ones higher, but then you come to the conclusion that maybe there's a really good app that was written with Claude or Codex that should be the top contender. So it's a weird divide where the quite literal review times for Apple apps have tripled in the past few months. It has clearly taken a toll on the Apple App Store reviewers. So I bet Apple will do something, if not hire more people, then figure out maybe a better way to do things because they've historically been rather strict. Given the amount of code and apps that are in the ecosystem, maybe there's time for a change. I think overall, apps are here to stay. I think the discoverability of the App Store is here to stay. In a few years, five years, who knows? I don't know. But I think a lot of these pieces that we see on Twitter are more thought-provoking than they are the actual truth. It's what is the direction that this is talking about and how would it affect people if this was the reality today? I think that's the interesting thought of, you know, in five years when it's really easy to write any of these apps, do people publish on the App Store anymore? Because they just write it and tear it down. So pretty interesting.

Yeah, for sure. Strange world out there. It keeps getting weirder.

Cool. Awesome. All right. Well, should we wrap it up with our bookmarks, Brad?

Yeah, so my bookmark is Replit's new animations. So I'd say in the past two to three months, there's been a lot more effort to create animations with AI. There was a Remotion skill to use React code to create animations. And now this latest one is called Replit Animation. You basically prompt an animation into existence, and it creates up to 30 seconds using Gemini 3.1 Pro, which is that new Google model. It takes three minutes to generate, and maybe there's some free tier. I haven't done it myself, but looking at the videos that it has generated on X has been pretty impressive. For someone who creates products and wants to have flashy marketing videos, I have not stepped in the ring to try any of these tools, but they all are popping up. It's on my list to get started, but really exciting to see a good animation tool because it feels nice when you look at these videos and you think, "Oh, how much effort would that have taken? Like, how do you even animate that?" I don't know. And so all these tools are popping up with AI and skills that make this a lot easier. So shout out to Replit. I think they'll probably do pretty well here given that animations are a huge part of marketing and people do not want to spend time on that. If I've learned anything about building your own business, it's that it's really fun to code as an engineer, but marketing stuff sucks and you want to spend as little effort as possible. If I can spend 10 minutes to make a video, sign me up. I'll share the link, but yeah, it's called Replit Animation.

That's cool. Yeah, I think, I don't know if you can see it in the background, but there's a book called *The Accountant Marketer* up on my bookshelf back there because I'm with you, marketing is not my strong suit. It's hard. So I respect people that can do it really well because it's hard. Okay, cool, awesome. So my bookmark is from CNET.com, and it's "Hackers are trying to copy Gemini via thousands of AI prompts, Google reports." So the long and short of it is that Google issued some report, I guess, a threat tracker report. And basically they're stating that hundreds of thousands of AI prompts were used on Gemini in what's called a distillation attack. Essentially they're saying that foreign adversaries—countries like North Korea, Russia, China—are attempting to steal the intellectual property of the model and use it for their own models that they're building. So basically, copying the notes of the Gemini model for when they build their own models. To me, it was really interesting. I think cybersecurity probably needs to catch up to how fast AI is developing. The pace of AI, we talk about it a lot on the show, the intelligence is getting better and better. The models are getting into more and more people's hands, which is a good thing. AI is becoming more and more ubiquitous, but there are still a lot of security concerns people need to be aware of. You know, you wouldn't give your password out to some random website, but sometimes we are maybe flippant with a model that we shouldn't be, right? And so it's just, I think for again, as a non-technical person, it's always important to keep the cybersecurity element in mind. And seeing companies like Google and others getting pelted with these kinds of attacks, it's sobering that, look, this is a really competitive space. There's a lot of stuff going on beyond just vibe coding, right? There's a lot of competition. So it's pretty crazy.

Yeah, wow, jam-packed. Shout out to Peter of Open-Claw joining OpenAI. I think security has been talked about a lot, and I think he's done a lot of security improvements recently that he's talked about on X. But I agree 100%. Sometimes it feels like we're quick to go to AI and sometimes we forget that security is important. Give it the right data, manage your data well, and don't install tools if you don't know what's going on with them. So I think that's a perfect example and a good reminder, honestly. There needs to be some sort of monthly reminder that all this stuff is really cool, but just be a little careful.

Yep, agreed. One thing, Brad, one thing that I will throw in at the end here, and we'll tease it for the next episode. I think one of your bingo cards is going to come true, one of your bingo spots.

I don't have the card up, but I do make good guesses, so I'm not surprised. Makes sense to me.

Yeah, yeah, we'll leave it there. We'll leave it there. People can tune into the next episode. I think we should revisit it at the top of the list. Because yeah, I saw something and I'm like, "I think that's it, I think that's golden. I think that's going to be on there."

Okay, Episode 39 then.

Episode 39. We'll give Brad credit for a bingo card.

We will see. Tune in to find out.

Awesome. Cool. All right, good stuff. We'll leave it there.

[Theme Music]

Creators and Guests

Bennett Bernard
Host
Bennett Bernard
Mortgage Accounting & Finance at Zillow. Tweets about Mortgage Banking and random thoughts. My views are my own and have not been reviewed/approved by Zillow
Bradley Bernard
Host
Bradley Bernard
Coder, builder, mobile app developer, & aspiring creator. Software Engineer at @Snap working on the iOS app. Views expressed are my own.
AI for taxes & the end of App Stores?
Broadcast by