Agent to Agent: Google’s ADK and the new frontier of AI


Alrighty, welcome, everybody, to episode 23—the great Michael Jordan episode 23—of the Breakeven Brothers podcast. I am, of course, Bennett Bernard, and joining me as always is Bradley Bernard. How you doing, Brad?

Pretty good. I feel like I'm getting close to being impressed by how many episodes we've done. Like, we're about to be at 25, which is awesome.

Yeah, I think I've had co-workers ask me how long we've been doing it, and I'm like, "Oh, honestly, it's been a while." And speaking of a while, I got an email the other day saying our domain was up for renewal, and I was like, "Oh, nice." Like, immediately, yes, of course we're renewing it because we love it, we enjoy it. So yeah, I've been doing pretty well. I did replace the battery on my car this weekend. So I went to Costco, bought my own battery, and tried to do a five-minute install job. So I texted one of my friends, Travis, and I was like, "Hey, I'm picking up a new battery." He had taught me everything I know about cars, working on my old 4Runner that Ben had actually handed down to me. And I was like, "Oh, I'm gonna do it in five minutes." So I pick up the battery from Costco, go to the parking lot, and uninstall the old one. And as I'm putting in the new one, there's a little clip that holds the battery down, kind of top-down. And this clip kind of hangs there. And unfortunately, as I was putting the new battery in, it fell and it kind of went all the way to the bottom of the car. And then it was an extra 30 minutes trying to fish that thing out from the bottom of the car. Once I got that out, it was very quick. But I guess I was a little ambitious to fix a battery within five minutes. For a pro, very easy. For someone like me, maybe next time. So that was kind of my weekend story.

So would you describe yourself as someone who's handy? Because I know the 4Runner gave you lots of projects of upkeep and all that. So would you say that you're someone who's become kind of handy, or would you still not consider yourself handy?

I think I've gotten a lot more handy. I'm self-proclaimed handy, in that sense. I know a lot of people that are a lot more hands-on and know a lot more about things, especially cars. But I'd say from where I came from, doing all those mini-projects, especially failing and Googling, I think I'm a lot more well-equipped with cars. I think the funny thing about doing car things is you always watch these YouTube videos where it's, "Hey, this is how you replace the battery in a Honda Civic," and they walk you through things. But the problem is, for all these videos across the board, they show you exactly what to do, then they skip these critical parts of, "Oh, like, unscrew this," and then boom, the video jumps and you miss it. And you're like, "How, exactly? Mine's not exactly like that." And so that's the biggest frustrating part about car stuff, is you need a really good video. Once you find a really good video, everyone in the comments section is just praising it. Yeah, long story short, I think I'm a little bit more handy now, and I encourage people. I think our quote was like hundreds of dollars for a battery replacement, and buying and installing it yourself, it should be 10 minutes and it's not that much effort. So that's the nice part about being handy now.

Yeah, yeah. It's funny on the car thing because I've become more car-handy just because I have a crappy car. It should be nice, but it's a Volkswagen, so if you know, you know. So I had to get more handy with it. But it's funny, as you're describing the thing falling into the engine bay and never seeing it again, that happened to me once. I had a magnetic bit and I was trying to undo something—I can't remember what I was doing—then I don't think I had it really in the wrench, like, seated in the wrench that well, and I just heard it go down and *ding, ding, ding*, and it got to the bottom. I never found it. So it's somewhere, I guess.

That's rough. Yeah, mine was a really large piece, so I was like, "I can't... I need this for the battery." And I was like, "How do I get this out?" I was like, they could, you know, jack up my car and tilt it, and it would slide out. But unfortunately, I went down on the ground and looked up, and from the kind of driver's side tire wheel area, there's a large hole, and I just saw it dangling there and got it out that way. But I considered just moving on.

Yeah, the funny thing on the YouTube videos, too, really fast, is I saw videos on TikTok, and it was like, "When you're watching YouTube to repair your car," and it's like the guy's like, "Okay, go ahead and unscrew this," and then you hit pause and then you unscrew it, and then you hit resume and then it's like, "But don't unscrew that," and it's like you already unscrewed that. And so, it was a parody, but it's exactly true, because it was like, "Do these seven steps, but then don't do this," and it was like, if you're trying to follow along on YouTube, it's pretty hard.

Yeah, depending on it. You gotta watch the entire video first and then watch it a second time.

Yeah, I got to prep myself. Yeah, yeah, that's cool. Awesome. Well, it sounds like you had a good time over the weekend with the car. My week was, I flew back from—I was on a work trip up to Seattle. So I flew back on, I think it was, gosh, Thursday. And I realized—well, not just realized in that moment, I've kind of progressively come to the acceptance—that I just hate flying, especially when it's just me. When it's with my family, okay, that's different, but I just... it's freaking me out. And so, to kind of give you context for Thursday, you know, I already don't like flying. I feel like I'm just like a passenger... well, I *am* a passenger, but you know what I mean, you're kind of just like, "Well, this is out of my hands." And so, I'm going to the airport, and I'm in the Uber, then I see on Twitter that Air India is trending, and I'm already like, "I don't want to see this. I already know what's going on here." And of course, I look at it, it's a horrible crash. And I think all but one person died, which is kind of a crazy story in and of itself. I was even looking into it, like, "How did you survive that?" If you know, you know, what was your strategy? But yeah, so that was horrible. And then something that kind of ties into the podcast, too: while that's all going down, everything on the internet is down. Like ChatGPT, you know, Google, AWS, gosh, what else? YouTube, Twitch—like every single website felt like it was down for over an hour. And it turns out—and I didn't know this at the time, so in my head, I'm like, "Oh my gosh, I'm about to get on this plane in an hour. Do Delta's servers and their coordination systems use these tools that are down? Like, am I going to be flying the old-fashioned way?" I was just hating every second of that. Fortunately, you know, made it home, which is great. But yeah, I just realized I hate flying. But it turns out that outage was actually a Google Cloud outage, which is pretty crazy because I think it revealed how much is underpinned by Google Cloud, because it was like everything was down. And I'm sure people that work in depth in, I guess, what would that be, like SRE, Site Reliability Engineering, probably knew that. But as a layman, it was pretty stunning to see the whole internet down. I thought we had gotten cyberattacked or something for a hot minute because it was crazy.

I think that's what some people thought at the time, was like, "Oh, is this a coordinated cyberattack?" And I think Google put out a postmortem saying that they tried rolling out a new feature and it hit a null pointer exception, so they rolled out something too fast. But yeah, everything was down. Even I couldn't work for a few hours. A lot of our stuff was down. And then I think it was supposed to come back up really quickly, but it took a little bit longer. And a bunch of companies put out postmortems too. I think Cloudflare came out and said, "Oh, we have XYZ products, and we don't want to rely on our own infrastructure. So we rely on Google for that to have some redundancy in case things go down, so we're not totally screwed." But when your main provider, like Google, goes down, then you get kind of housed. So I think that one wasn't too great. But yeah, I'm curious to see if anything will change from that. I think all these companies have these large SLA buffers where they can be down a few hours a year, maybe like 99.99% uptime, and depending on how many nines there are, that's how much downtime they're allowed. So, huge outage. Everything is back up now. But yeah, it is pretty revealing seeing how much relies on Google across the whole internet.
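As a rough aside on those nines, the downtime budget shrinks by a factor of ten with each one, so 99.99% is actually under an hour per year. A quick sketch of the arithmetic:

```python
# Yearly downtime allowed by an uptime SLA: each extra nine cuts it tenfold.
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600

for label, uptime in [
    ("two nines", 0.99),      # ~3.7 days/year
    ("three nines", 0.999),   # ~8.8 hours/year
    ("four nines", 0.9999),   # ~53 minutes/year
    ("five nines", 0.99999),  # ~5 minutes/year
]:
    allowed_minutes = MINUTES_PER_YEAR * (1 - uptime)
    print(f"{label}: ~{allowed_minutes:,.1f} minutes of downtime per year")
```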

Yeah, yeah. It was, and again, the timing of it was horrible because in my head I'm going, you know, again, Delta is not going to know which way to fly. You know, they don't have access to whatever they need. So, but yeah, that was my week. And then, of course, yesterday was Father's Day (we're recording on a Monday night). I had a good time. I hung out with the family. I ate some good food. It was good.

Did you make anything on the Blackstone?

I didn't. It's so hot in Arizona. It's so hot. You don't even want to be outside.

You don't even need a flame. You probably could.

You probably could. It was 111 yesterday. It's 92 right now. We're late in the evening. Yeah. So, yeah, I was inside. I wasn't messing with that.

Yeah. Ours is a low of 51, high of 72. So I cannot say the same.

Cannot beat that. Yeah, I definitely miss that.

But yeah, speaking of Google, they came out with a big release last week: Gemini 2.5 Pro, I think the checkpoint with the 06-05 date. We've kind of harped on AI model naming for our entire podcast, but now we're coming up on the trend of just adding dates to models to give them a kind of checkpoint of when they're released. And Gemini 2.5 Pro stands out because it's number one on the Arena. So people love it. The capacity is pretty large, too; sometimes these new models come out but they're down a lot, and I think Google has done a fantastic job putting out something new, getting people to use it, and integrating with all the popular AI IDEs: Cursor, Windsurf, you name it. Because I think last time, Anthropic came out with Claude 4, but they didn't give access to Windsurf, which is a whole different conversation. And so Windsurf was kind of behind, I guess, a little bit on the model intelligence. They've caught up since, but there was a little bit of drama that they didn't have access to it. So if you're using Gemini 2.5 Pro, make sure you update your model in your code. I think for the most part, they're trying to redirect Gemini 2.5 Pro to this latest version. So, super excited from Google. Good price, good speed, kind of an all-around solid model, really long context too.

And then the second one is o3 Pro. This one is a huge release. I think people characterize o3 Pro as kind of a specialist that takes a long time to come up with an answer but is really thoughtful in that process. So people are trying to push the model's thinking time as high as it can go, which means you ask it a question, you tell it to think really, really deeply, you tell it to self-analyze and have this whole chain of thought. I've seen posts on Twitter talking about 10- or 15-minute response times, which is insane. You'd imagine there's lots of intelligence behind that. And I think right now it's number two on the Arena. But again, if you compare o3 Pro, a slow, intelligent model, to Gemini 2.5 Pro, they're completely different use cases. Like, you would never, ever use o3 Pro for a real-time chat or any other real-time experience. It's very much for something like analysis or large-scale code refactor planning tasks. So it really depends on your use case. But Gemini 2.5 Pro is all-around pretty good, great price. o3 Pro, only if you need it. And I would say it's very, very specific use cases where you need deep thought and analysis. So if you haven't used them, check them out. o3 Pro, I think, is $20 per million input tokens, something like $80 per million output. I can't remember exactly. But relatively expensive. And Gemini 2.5 Pro is very, very cheap. So, yeah, definitely try those out.

Okay, nice. Yeah, cool. And I think I saw the Gemini one. I don't think I saw the o3 Pro one, because I think that must have been when I was on my work trip, so I was kind of heads-down doing other stuff. So that's cool. Yeah. 15 minutes is a long time. I mean, that's kind of like a deep research, you know?

I'm not sure if you've used the deep research. Yeah, it is very similar to that.

Yeah. Interesting. I'll check that out. Cool. Awesome. Well, one of the things that I wanted to talk about in this episode was kind of a follow-up from, I think, either our last episode or the episode before. I can't remember exactly; we're so many episodes deep. But: Google ADK. Just to give a quick backstory for those that didn't listen to that episode: Google announced or released an Agent Development Kit, which is basically a framework for building agentic AI. It's very similar to LangGraph or LangChain in the sense of pairing an LLM with tools or with kind of structured workflows and stuff like that. I'd talked about it being a release, I think, as part of the Google I/O presentation that happened back in May. But I was able to actually get my hands on it and build some prototypes and demos with Google ADK. And I had some interesting takeaways and thoughts from that experience that I wanted to share on this podcast. So first off: I'm coming from LangGraph and LangChain, so that's going to be my benchmark to compare against. And overall, Google ADK is a much simpler framework to work with. Sticking with LangGraph—I'll stop switching between LangChain and LangGraph, because they are two different things, and all the things I've been doing have been pretty much solely LangGraph and don't have much to do with LangChain—but compare Google ADK to LangGraph, and it's so much simpler. A lot of the overhead that you have to deal with in LangGraph is either just not there or abstracted away from you as you're developing. It's kind of under the hood, I guess. And it makes it so much more enjoyable to build out agents compared to LangGraph, which is very, very detailed. But we'll talk about pros and cons, because I think there's a reason for that.

Is this also in Python, or is this a different language?

Yeah, this is all in Python. I don't know if Google ADK supports a second language or if it's just Python; they might have a TypeScript one, I can't remember. But yeah, what I was building was all in Python.

Okay. So, yeah.

But yeah, so I think one of the first things that got me really excited about Google ADK was MCP. And specifically because I think there is a philosophy or a framework of how AI agents should work, you know, and I think this is kind of a frontier that we're still exploring. It's probably not one-size-fits-all, either. But when we think about agents, a lot of times they're doing certain tasks for us, and a lot of times they need to get data from a system. You know, we're in the year 2025. All of our data lives in different systems. And, you know, if you are running a business, you might have Stripe, you might have QuickBooks, you might have your bank reports. Right? So you have all these different systems where data is stored. And MCP is a phenomenal way to have agents get what they need. It's much better and much more suited for AI than, like, you know, raw API calls. And we've talked a lot about MCP in previous episodes, so I won't hammer on how that works in this one. But I will say that building with MCPs has been kind of tricky. And I think, Brad, you're going to talk about that in a little bit and go into more detail on building MCP servers. But I would say overall, the developer experience of MCP is pretty tricky. It's much trickier than APIs. And especially for someone coming from a non-technical background, the first shot of MCP that you get, you just go, "This is bizarre, and I don't understand." It takes two or three times, I think, to really understand what's going on. So Google ADK does a great job of building MCP in as just part of its agent tools. When you go to define an agent with the Google ADK, you'll just have a simple call that's like `create_agent`. You define the model, you give it a description, and then you can give it tools. And the ADK gives you a built-in MCP toolset where you just define the server parameters. If you've ever set up an MCP server, you'll know it's like an `npx -y` and then the path, right? So you just put those exact MCP server parameters into that MCP toolset in Python, and then it just works. And so all of a sudden, as long as it doesn't have authentication and all that kind of stuff, if you just have a basic MCP server, you define it, you put it in there, and then the agent has access to those MCP tools and resources and all the other things that come with MCP.
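To make that concrete, here's a minimal sketch of the pattern Bennett is describing, modeled on the MCP example in the ADK docs. The model ID, agent name, filesystem MCP server, and folder path are placeholder assumptions, and the exact class names can shift between ADK versions:

```python
import os

from google.adk.agents import LlmAgent
from google.adk.tools.mcp_tool.mcp_toolset import MCPToolset, StdioServerParameters

# One agent, wired to a stdio MCP server with the same command/args you
# would put in a Claude-style MCP config.
root_agent = LlmAgent(
    model="gemini-2.0-flash",  # placeholder; any ADK-supported model ID
    name="filesystem_assistant",
    instruction="Help the user inspect and summarize their files.",
    tools=[
        MCPToolset(
            connection_params=StdioServerParameters(
                command="npx",
                args=[
                    "-y",
                    "@modelcontextprotocol/server-filesystem",
                    os.path.expanduser("~/reports"),  # placeholder path
                ],
            )
        )
    ],
)
```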

LangGraph didn't have that? I'm surprised.

LangGraph, I think, is starting to have it. It was definitely not as easy as that. When I was looking at it, that was something that was a challenge for sure. And I'm better at it—I'm better just in terms of MCP and building agentic AI now. So I want to kind of go back and revisit it, because I think when I was doing it with LangGraph, I was just kind of getting more familiar with MCP. So I want to give it a fair comparison, but the Google ADK is just so much cleaner, you know? And so, yeah, it was just, you know, seven lines of code, basically, defining the server parameters and then adding that into the agent when you instantiate the agent. So easy. So that was a huge, huge benefit. And I think once you kind of connect an agent to an MCP server and you start having access to those resources and those tools, that's where the juices really get flowing in terms of like, "Oh, I can build something that, you know, takes my—" I think the demo or the example that you had used in prior episodes was the kind of personal finance software that you have. You know, I can go get that data with this MCP tool and then I can have the AI analyze it and not have to specify API routes and all that kind of stuff. Super, super fun. Kind of like an aha moment.

Yeah. The thing I need to understand more is kind of the multi-agent workflow. Like, I get one agent with an MCP toolkit, which kind of wraps the MCP server. To me, it almost sounds like, instead of having one super-agent or having 50 MCP servers, each with a specific toolkit and knowing too many things to do, it almost sounds like you're supposed to break these agents down into maybe like the Stripe agent or the financial agent, which has the Stripe MCP, your bank's MCP. Then maybe you have your accounting agent that has access to QuickBooks, Xero. Like, is that the right way to look at it? Or has anyone gone down the path of understanding how to break up one agent and coordinate them with this kind of multi-agent framework?

Yeah. Yeah. So that's a great question and something that I've been looking into. I'll tell you from my experience, when I've tried to build out agents, I kind of think about what I do. And so I just kind of go, as an example, it's like, "Okay, if I need to pull a transaction out of QuickBooks and then create a customized profit and loss statement," like, I think about myself. I go, "Okay, I need to pull this from QuickBooks. I need to put it into Excel or Google Sheets. Then I need to feed it back into this reporting software that I use." And so I think about it from that approach. Now, is that the best approach? I don't know. I think it's one of those things where there's lots of different ways to build something. I've seen, and kind of what you said, where you just have specialized agents that just do that one thing, right? So I think one of the things that is a challenge with multi-agent systems is the handoff between different agents, and I'll kind of talk about that a little bit, too, because that is actually a specific point in building with Google ADK that I ran into. But yeah, trying to define the scope of these agents, I think, is something that's kind of an art, not a science right now. Because in my mind, I just go, "Okay, I want my agent. I'll just have one agent that has an MCP server with QuickBooks and whatever other systems I need access to. And I write custom tool calls to do the unique things that I need it to do that are specific to me. And then that's my agent." But if you start to give the agent too many tools, or too many MCP tools even, I think right now it's very easy for that agent to not know exactly what to do. I think it thrives in having a limited scope, but how limited is the question right now.

Yeah, for sure. It almost sounds like you have to give them multiple MCP servers which have their own tools, but not give too much. Because even as I've found tinkering with Claude, adding MCP servers, the more tools, the better, up until you hit a certain point where maybe some of these tools have similar definitions. So the whole protocol of getting the AI agent to go invoke a tool is based on the developer's description of what this tool does. And if you have a tool that, you know, Stripe can maybe fetch your bank account balances because you linked your banks to Stripe, but then you have a specific bank account integration MCP server, you have kind of this overlap of, you know, how do I differentiate between these tools that might on paper sound similar? So it's like maybe finding the intersection of tools that are similar and then making sure that their MCP tools don't overlap too much. If they do, maybe separate them and describe them. It almost sounds like the agent is an organization of MCP servers and tools, where, again, you had a financial agent, you could say, "If you have any finance-related questions, route to this person." That person has all the tools for the financial area of your business. So then accounting could be separate, or coding could be separate. So maybe that's the way to look at it. But again, I think these are still, like you mentioned, very early on. We're trying to figure this out, and these models get better and better, which hopefully lowers the barrier to having this kind of mystical art of figuring out the right query, the right prompt, the right tool. Hopefully, we get better at building the tools, and the models get better at choosing what tools to run. And therefore, maybe we don't need to spend as much time trying to overthink how to slice and dice these agents.

Right. Yeah. I think LangGraph actually has a great write-up on this. I think it's like a Notion page, I don't know if I'd call it an article, but they talk about different agent structures. And the one that Google ADK kind of leads with, though you can structure them however you want, is a hierarchical structure where you have an agent and then you can give that agent sub-agents. And those sub-agents have their own model, they have their own tools, they can have access to their own MCP servers. And that parent agent is where you give an open-ended task, and you basically are saying, "Hey, this agent has all the tools it needs. It has the team," which is just sub-agents, "that it needs to get this task done. So go forth and conquer." That's the hierarchical agent. Then I think there's the agent swarm; you may have heard of that. I think Lindy AI is the name. They do like a swarm, and I'm not familiar with it, but basically you have five or more agents and you just say, "Go do this thing," and they all go do it simultaneously and in parallel. I'm not familiar with the use cases for that. I don't know if that's for like a deep research thing where you just want to go out and get all kinds of different sources. But I know that's a structure you can use. And I think there's a couple of other ones, too, like multi-hierarchical agents and all that, which you can get pretty far down the rabbit hole on. But what I would say, and this is a great segue into one of the cons of working with Google ADK that I noticed, was the agent handoff. When you have a hierarchical structure, I noticed it struggles, at least the way I wrote mine. So, you know, of course, there's probably some user error and user learning that needs to take place. But I had a pretty simple setup: a parent agent with two sub-agents, one that could access system A and one that could access system B. And I was basically telling the parent agent, "Using the tools that you and your sub-agents have, reconcile these two reports. These two things should agree." And what I noticed is the parent agent would transfer the request to one of the system agents. That system agent would go do its task, and then it almost would not go back to the top agent. It would kind of go, "Oh, I don't have the tools necessary to reconcile with that other system." And I would have to tell it, "Go back to the parent agent." And then it would transfer back, and then it would go to the right one. So that was something that was really interesting and that I did not expect. But if I prompted it differently, I could sometimes get it to do it the right way, where I say, "Hey, once you're done, go back to the parent agent and then go." So it was very fickle, which was really interesting, because the way I prompted it made a difference. Sometimes it would work, sometimes it didn't. And I was demoing this to some folks at work, so it was really interesting. And I think that's why Google is spending a lot of time and intention on agent handoffs. LangGraph is, too; we'll talk about that in a second. But specifically, Google has released the A2A protocol. So we talked about Model Context Protocol; A2A is the Agent-to-Agent protocol.
And it's basically how you have a bunch of different agents and how they talk to each other and share information and, again, manage that handoff. So very early stages on that. But I think it's a problem that, you know, I think if anyone who's built agents can attest, it's a problem that exists right now of like, how do you get these things to share information in the way that you need them to, responsibly, too? You don't want them sharing the wrong thing. And so that's something that I think is still to be worked out. And with Google ADK, it's really easy to build agents and multi-agent workflows, but I think the control is less than in LangGraph. So that was something I noticed as well.
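For a sense of what that hierarchy looks like in code, here's a sketch under the same placeholder assumptions as the earlier ADK snippet. Note the "transfer back to the parent agent" line baked into each sub-agent's instruction, which is exactly the kind of prompting Bennett found he needed:

```python
from google.adk.agents import LlmAgent

# Hypothetical specialist sub-agents, one per system.
system_a_agent = LlmAgent(
    name="system_a_agent",
    model="gemini-2.0-flash",  # placeholder model ID
    description="Pulls reports and transactions from System A.",
    instruction=(
        "Fetch whatever System A data the parent asks for. "
        "When your part is done, transfer back to the parent agent."
    ),
    # tools=[...]  # e.g. an MCPToolset pointed at System A
)

system_b_agent = LlmAgent(
    name="system_b_agent",
    model="gemini-2.0-flash",
    description="Pulls reports and transactions from System B.",
    instruction=(
        "Fetch whatever System B data the parent asks for. "
        "When your part is done, transfer back to the parent agent."
    ),
)

# The parent delegates based on each sub-agent's description, then does
# the reconciliation itself once both data pulls come back.
reconciler = LlmAgent(
    name="reconciler",
    model="gemini-2.0-flash",
    instruction=(
        "Reconcile the reports from System A and System B. Delegate the "
        "data pulls to your sub-agents, then compare the results."
    ),
    sub_agents=[system_a_agent, system_b_agent],
)
```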

It kind of reminds me, sometimes I add MCP servers to Claude and I ask it to do something, and sometimes I'd hope it would kind of infer to go use this tool. Other times it doesn't. So oftentimes I'll go back and say, like, "Use the MySQL tool, use Context7," and then it'll pick it up and say, "Oh, I'm going to go read the docs," or "I'm going to go invoke a SQL query to get the data that I need." And so ideally it's smart enough to know that. If not, yeah, you can nudge it in the right direction. So it sounds very similar. We're at this stage where they're like 90% of the way there, but that last 10% is, am I doing something wrong? Is the model not smart enough? Where does that end up in terms of logic to get what you want to do done?

Yeah, so it's super cool. So that's definitely a pain point, I think: managing the agent handoff and having the parent agent delegate correctly, especially, again, in this hierarchical framework. My demo was pretty bare-bones, it was not super crazy, so I think if you give it more instructions and more prompting, it probably does a better job. But again, just one pain point I noticed. The other one, real quick, is OAuth. I think this is a pain point that's on Bennett specifically and not necessarily everybody else: OAuth and the overall flow where you have your redirect URL, and then you have to get the access token. QuickBooks does what it does well, but their API and their OAuth are kind of a pain in the rear end. And Google ADK has built-ins to manage this. I wasn't able to get it up and working. I do think that's partly my own issue. I was even using the Context7 MCP server, which we'll talk about in a bit, but basically it reads a whole library's docs for you and can summarize what you need to know from the whole repo. So I was asking it, "Explain OAuth to me. Give me an example, make it make sense to me." I even did the "explain like I'm five." And it did its best, and I think it was just a me issue, but I wasn't able to get it working. So that was a pain point. And I think authentication in the AI age is something I hear a lot about. Like, MCP servers, are they secure? How do we make sure agents are secure and don't run amok and share information they're not supposed to? So that's something that I noticed as well.

OAuth is historically complicated, and I don't think anyone's a complete expert at it. I pull in libraries that handle it all for me, but back in the vanilla PHP days of 2011, I remember doing an OAuth integration with Twitter, and that was a huge pain in the butt. And I think it's still hard, maybe even harder, but our open-source community has put in so much effort to abstract that away and do it for you. And now I install something, and if it doesn't work, I blame the library and don't have to think too much about it. But yeah, I pulled up the OAuth 2.0 spec while you were talking about it, and authorization codes, client credentials, device codes, refresh tokens... I mean, very secure. I'm glad we have it. It's something I never want to spend a lot of time in. I use it because I need to, and I move on. So I would say it's kind of the tried-and-true wringer of software engineering: can you get through an OAuth flow and can you make it work? And it sounds like you're going through that at the moment.
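For anyone who hasn't been through it, here's a minimal sketch of the authorization-code flow the two of them are describing, in plain Python with the `requests` library. Every endpoint, credential, and scope below is a placeholder rather than any particular provider's real values:

```python
import secrets
import urllib.parse

import requests

# Placeholders: swap in your provider's real endpoints and credentials.
AUTH_URL = "https://auth.example.com/oauth2/authorize"
TOKEN_URL = "https://auth.example.com/oauth2/token"
CLIENT_ID = "your-client-id"
CLIENT_SECRET = "your-client-secret"
REDIRECT_URI = "http://localhost:8000/callback"

# Step 1: send the user to the provider's consent page.
state = secrets.token_urlsafe(16)  # anti-CSRF value; verify it on callback
params = {
    "response_type": "code",
    "client_id": CLIENT_ID,
    "redirect_uri": REDIRECT_URI,
    "scope": "accounting.read",  # placeholder scope
    "state": state,
}
print("Open this URL and approve access:")
print(AUTH_URL + "?" + urllib.parse.urlencode(params))


# Step 2: the provider redirects to REDIRECT_URI with ?code=...&state=...;
# exchange that short-lived code for access and refresh tokens.
def exchange_code(code: str) -> dict:
    resp = requests.post(
        TOKEN_URL,
        data={
            "grant_type": "authorization_code",
            "code": code,
            "redirect_uri": REDIRECT_URI,
        },
        auth=(CLIENT_ID, CLIENT_SECRET),  # HTTP Basic client authentication
        timeout=30,
    )
    resp.raise_for_status()
    return resp.json()  # access_token, refresh_token, expires_in, ...
```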

Yeah, yeah, it's definitely... I'm sure I'll get there. But in my time playing with it, I wasn't able to really make it work, and then I moved on. So yeah, more to come, I guess. We'll come back to that one another time if I'm ever able to get it up and running. So that was a lot about Google ADK. What I want to finish this section off with is a quick comparison to LangGraph in a bit more detail. What I really like about LangGraph over the Google ADK is that LangGraph, for all of its manual pain, is much more explicit. And I feel like you therefore have much more control over the workflow that the agent kicks off. It's much more controlled, and you have to be more intentional about how you set up the nodes. LangGraph thinks about things in terms of nodes. So you have nodes A, B, C, D, and you can put controls around those nodes, like: if node B's output is yes, then go to node C; if it's no, then go to node D. You can conceptualize that a bit better, and it makes more sense. I didn't see anything in Google ADK that really had that same kind of structured process flow. And the other part of LangGraph that is actually much better, in my opinion, is that the state is much more explicitly defined and managed. Where state (just for folks that maybe aren't familiar) is kind of the data your workflow carries along and updates as it runs, the domain that your program operates in. I don't know if I'm explaining that right, Brad, but it's much more specific. So it's typically kind of painful to define the state in LangGraph, but once you define it, you're a bit more hands-on, whereas in Google ADK, I don't think you can really define it explicitly. Or maybe you can, but it's much more under the hood, and therefore I feel like you have less control. That made it easier to develop, but I don't think you can be as specific, or it'd be harder to get the specific result that you want compared to LangGraph. So those are some pros in favor of LangGraph. And the last point I'd make is that they more recently released LangGraph Studio, which is like their own IDE, a web-based IDE, which is super helpful. Prior to that, it was really hard to visualize your workflow. You kind of just had to run the program, and it wasn't very visual. A lot of times you need these things to be visual so you can see the workflows, see what the agent's doing, see the API calls it's making, and all that. So they just recently released that, whereas Google ADK has a built-in web command in the CLI that spins up a FastAPI-backed localhost page where you can see what's going on with your agent. So they both do a good job of that. LangSmith, their tracing platform, is a bit more detailed; you can see a lot more. All in all, LangGraph and its tooling are just so much more hands-on. So if you're into that, it's a great fit. Google ADK is a bit more abstracted, kind of hidden under the hood. And so it's easier at first, but if you need to get specific, I haven't been able to get there yet. I haven't really been able to be so structured with it.
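To illustrate the node-and-state style Bennett is describing, here's a minimal LangGraph sketch. The reconciliation scenario and node logic are invented for illustration; the `StateGraph` wiring follows LangGraph's documented API:

```python
from typing import TypedDict

from langgraph.graph import StateGraph, START, END


# Explicitly defined state: every node reads from and writes to this.
class ReconcileState(TypedDict):
    report_a: str
    report_b: str
    matched: bool
    summary: str


def pull_reports(state: ReconcileState) -> dict:
    # Stub: fetch the two reports from your real systems here.
    return {"report_a": "total=100", "report_b": "total=100"}


def compare(state: ReconcileState) -> dict:
    return {"matched": state["report_a"] == state["report_b"]}


def summarize(state: ReconcileState) -> dict:
    return {"summary": "Reports agree."}


def investigate(state: ReconcileState) -> dict:
    return {"summary": "Reports differ; flag for review."}


builder = StateGraph(ReconcileState)
builder.add_node("pull_reports", pull_reports)
builder.add_node("compare", compare)
builder.add_node("summarize", summarize)
builder.add_node("investigate", investigate)

builder.add_edge(START, "pull_reports")
builder.add_edge("pull_reports", "compare")
# The "if yes go to C, if no go to D" routing described above:
builder.add_conditional_edges(
    "compare",
    lambda state: "summarize" if state["matched"] else "investigate",
)
builder.add_edge("summarize", END)
builder.add_edge("investigate", END)

graph = builder.compile()
print(graph.invoke(
    {"report_a": "", "report_b": "", "matched": False, "summary": ""}
))
```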

Yeah, definitely. They're both great to build with, though. And yeah, it's been good. It's been fun tinkering with them. And, you know, there's a lot more pros and cons I could say, but, you know, I won't keep it too long in this one. But, yeah, they're both great experiences and they're fun to toy with.

I'm glad they're different, because that's the whole point of having these different frameworks and offerings. Because again, I used LangChain what feels like years ago, when it first became mainstream. I kind of thought it was abstracting a lot away, and it's interesting hearing that it's a little bit the opposite when it comes to the agent side: Google abstracting more away from you, so you feel like you don't have as much control, whereas LangChain gives you a bit more. Secondly, I watched a few videos from LangChain's Interrupt conference. That was last month, almost exactly a month ago from today. They held an event in San Francisco with a bunch of pretty cool presentations, inviting industry leaders in the AI space who are creating agents. I think they launched the LangGraph Studio web application there; before, it was a macOS app only, so Windows people couldn't use it, so they made it a web app. And they're very focused on deployments. Really, really cool stuff. If you have free time, I would highly recommend watching the Interrupt videos from the LangChain conference. I think they're all free, all on YouTube. Pretty cool interviews, cool releases. I even pulled up their site while you were talking, and it looks really good. It's very interesting seeing the rise of open-source software that becomes VC-backed. I assume they're VC-backed just by the way it looks and such. But, like, VC-backed open-source companies who now charge for a hosting platform. To me, it's a new business model where they create and give so much away for free, but then say, "Hey, we offer this paid product that is very fine-tuned for what we offer." So really cool. Check out the videos. It's exciting seeing these frameworks. You kind of gave me a little bit of energy to build something. And honestly, hearing you talk about it, I'm like, "Oh, you know, it's kind of fun to try new things." I've been on a spree of trying new AI tools, and this sounds like something I should pick up again and try out with this new generation of LangChain.

Yeah. Let me just do one quick plug on LangGraph, then we can move on. The team—I mean, there's a bunch of people that work there, obviously—but the two main people I always see, I think their names are Harrison and Lance. I think they're both founders or co-founders. They do such a great job in all their videos when they explain things, especially Lance. And I say that because LangChain has an academy kind of thing on their page where you can work through video tutorials and stuff, and he explains things super well. So they did a great job with that. And then, on that point of building something: you can even plug in open-source models. So you can plug DeepSeek or Qwen into these workflows. It doesn't have to be OpenAI or Google. So if you have any really sensitive workflows, or if you're dealing with HIPAA data or PII that you want 100% controlled, you can pretty easily plug in these open-source local models and run them in place of the foundation models from, again, OpenAI or Google. And so, yeah, it's pretty cool. I mean, I was telling people that I run into and talk about this with, I'm just like, "If you have something that you're doing for your job, an agent can do that, most likely."

Very likely. Very likely.

And Harrison, if you're listening to the podcast, we would appreciate a sponsorship. I know Ben talks highly of LangChain. I'm coming around to it. Also, Anthropic, if you're listening, I am pushing Claude. You're welcome. I think they're two kind of best-in-class solutions for building agents and writing code. So, yeah, if you guys are listening, feel free to reach out. I think our contact info is on the website.

Give us some swag.

Give us some swag. We'll wear it on the pod.

Yeah, we'll wear it. Yeah, yeah. Cool.

Yeah, I want to talk about a fun project. I know I had started it last time on the podcast, and I had gone through the journey of building my MCP server for Laravel Telescope. I won't go super deep into Laravel Telescope, but the TL;DR is it's a developer tool for Laravel applications that exposes performance data. So, for example, you build your web app, and maybe there are slow MySQL queries or various other Laravel things that make your app slow. And the reason I wanted to build this MCP server was because I was trying to use Claude to attack a really hard part of my code base. The approach I had taken was asking Claude to look at the code, understand what it did, write a new version, make it a better version, and have test coverage so that seemingly things work. It got me pretty far. Again, I think last time I talked on the pod, it was like 80% done. The last 20% actually turned out to be quite hard. I had to scrap it and redo it. I kind of felt, again and again, that Claude was really smart but didn't have enough information. And so that's why I'd gone on the journey of building this MCP server to expose all this critical developer debug data to Claude. The first iteration I had of this was very, very simple. I essentially took Laravel Telescope's data, which is stored in a MySQL table, and dumped it into a FastAPI MCP server in Python. So basically you give it your database credentials—this is all localhost, a very non-production use case—it logs into the MySQL server, makes a query, and dumps out all the data. And I realized this kind of sucked. Claude initially said there were way too many characters in the response. So when you invoked that MCP tool, it would say it couldn't even look at the response because it had so much freaking data. So then I went down the path of, how do I build a good MCP server? Because the ones I'd interacted with, I install them, they work, I don't look back. I have no freaking clue what's happening under the hood, because I install it and I know generally what direction it's going, but I'm not looking at the code. Maybe I should, maybe I shouldn't, but either way, I install it, it works, I don't think twice. Building my own MCP server was pretty fun and challenging, because you end up in this exercise of trying to fit your tool to the state-of-the-art MCP clients. So Claude is the client, because you can add MCP servers to it. And I ended up in a design where everything was paginated. This was a pretty critical insight: Laravel gives you a ton of freaking debug data, great for developers and very easily parsable, but with LLMs you have a limit on how much text you can return and how much text the model can parse. So my guiding principles as I refactored my MCP server were, one, make it paginated, and two, expose other tools as well. For example, the debug data shows up on a per-request basis. So if I added a new friend on my bill-splitting app, it would show up saying "HTTP POST to /add-friend route." That hit this controller, and inside this controller action, a bunch of stuff happened. For example, maybe I made 50 MySQL queries to get that controller action to work. And as I designed the MCP server, I couldn't return all 50 queries. That's a ton of information. So I had to chop it up into pages, maybe 10 queries per page. And then in the response, which was JSON, I would actually indicate to the AI, saying, "To get more queries, hit my API with page two."
And I had a request overview tool; the request overview essentially outlined queries, cache data, basically all the resources associated with the request. I would outline to the AI, saying, "If you want to get queries, invoke this tool. If you want to get views, invoke this tool." And I actually didn't come up with this myself. When I was trying to generate the MCP server using Claude, which I love, by the way, it was suggesting to do this. So I think the top takeaways were: one, please build your own software, because we need more of it; the era of AI is here. Two, paginated APIs are an absolute freaking must for MCP servers; you will not be successful without them. Three, you're kind of talking to a human. When you usually think about these interfaces, it's data in, data out, no fluff. And I definitely agree, try to reduce as many extraneous outputs as you can. But at the same time, you can basically talk to the LLM in the response, which is a little weird, but it worked really well. It was like, "Hey, if you want this extra data, invoke this function or this MCP tool," which, if you're looking at the raw output, seems a little odd, but the LLM takes that whole output and then analyzes it. So I open-sourced it, and on Twitter I posted a short video, maybe two minutes, talking about the tool, the background, how to install it. Essentially it's one Python script, maybe 400-500 lines, open source on my GitHub. So check it out: github.com/bradleybernard. I think `telescope-mcp` is the repo name. And as of this recording... I try to post on X and don't always get a ton of traction, so if you're not following me, go follow me there. But I got 120 likes, 70 bookmarks, and like 10k views, which to me is a great success. I find I try to post good things; sometimes the algorithm doesn't like me. That one turned out to be not too bad. And then I checked on GitHub, and it has 11 stars, which might be my highest open-source star count to date. I don't do that much open source. I would like to do more, but this one was a super fun learning experience for me. Got retweeted by Taylor Otwell, so that was great. I imagine Laravel will do a lot more first-party MCP servers to make their developer experience a lot better. This is an area where I can see them, as a company, investing a lot more resources, so I'm excited to kind of kickstart that effort. It's unofficial, but I think it works pretty well. And yeah, thrilled to have that out.
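Here's a toy sketch of that pagination idea using the FastMCP class from the official MCP Python SDK. To be clear, this is not Brad's actual telescope-mcp code; the server name, tool shape, and stubbed query loader are invented for illustration:

```python
import math

from mcp.server.fastmcp import FastMCP

mcp = FastMCP("telescope-lite")  # hypothetical server name

PAGE_SIZE = 10


def load_queries(request_id: str) -> list[dict]:
    """Stub: the real server would read Telescope's MySQL tables here."""
    return [{"sql": f"select * from users limit {i}"} for i in range(23)]


@mcp.tool()
def get_queries(request_id: str, page: int = 1) -> dict:
    """Return one page of the MySQL queries recorded for a request."""
    queries = load_queries(request_id)
    total_pages = max(1, math.ceil(len(queries) / PAGE_SIZE))
    start = (page - 1) * PAGE_SIZE
    return {
        "queries": queries[start : start + PAGE_SIZE],
        "page": page,
        "total_pages": total_pages,
        # The "talk to the LLM in the response" trick: tell the model how
        # to ask for more instead of dumping everything at once.
        "note": (
            f"Showing page {page} of {total_pages}. Call get_queries with "
            "the next page number for more."
            if page < total_pages
            else "No more pages."
        ),
    }


if __name__ == "__main__":
    mcp.run()  # defaults to the stdio transport
```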

That's cool. So one quick thing, or one quick distinction, just again, for folks that don't know, Taylor Otwell is the founder of Laravel, right?

Exactly.

I know you guys are on a first-name basis, but just wanted to make sure folks, everyone else knew that. That's cool. That's super cool. So let me ask you, did you do the HTTP transport with the MCP server? Did you do the standard I/O or whatever?

I did standard I/O. So everything is evolving. I think there's standard I/O, SSE (server-sent events), whatever. I did the grand old legacy standard I/O MCP server.

Yeah. And so, because I know that they're trying to—I think "they" as in Anthropic, the one that founded, or I guess invented, the MCP protocol and pushed it out—they're pushing people towards HTTP as the preferred method in production, I think. I think they're trying to phase out SSE. The security reasons behind that are above my head. So I was just curious if that was something you did. But it sounds like you did the standard I/O, which is what I did on my ADK one, and it works great locally, obviously. It's very easy.
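For reference, in the official MCP Python SDK's FastMCP, the transport is just an option you pick when the server starts. The option names below match recent SDK versions as far as I know and may shift as the spec evolves:

```python
from mcp.server.fastmcp import FastMCP

mcp = FastMCP("example-server")

# Pick exactly one transport at startup:
mcp.run(transport="stdio")              # simple local default, what we both used
# mcp.run(transport="streamable-http")  # newer HTTP transport for remote use
# mcp.run(transport="sse")              # older SSE transport, being phased out
```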

Yeah, honestly, it works great. I think the security thing is a big deal. One other thing that was really cool is that I was using Claude to build it. So I had one terminal open in my Telescope MCP repository. Then I had another repository open, my Laravel app, where I had added that MCP server. And I was asking it, "Hey, can you debug this data using my MCP server?" And it was crawling through all the tools, invoking them; everything was working. Then I asked Claude in that same session, "Hey, I actually control this MCP server. I wrote the code for it. Can you please do a deep dive and see if there's anything I can improve?" And it gave me a bunch of feedback. I took the feedback and deleted a few things, because Claude can give you an extremely long list of suggestions where only a few of them end up being super useful. So I trimmed the list and sent it back over to the other Claude session. I was like, "Hey, I reviewed my code. I think we should improve this." You know, I took credit for it, of course. Asked it to do that, and it improved it. Then I restarted Claude and said, "Hey, I updated it. Take another look." And again, it gave me even more feedback. And I was like, "Okay, after round two, I'm not really that interested." So I ditched that and said, "This is good enough." I then went back to the original repository I was building it in and said, "I'm going to open source this. Can you please polish it up, you know, remove all your test scripts and junk?" And then I published it, made the video, and it was great. So another little hack: if you're building an MCP server, integrate it with Claude and ask it to use it, because you control it, and do a deep analysis session. That worked out really well for me. I think it helped give me that natural-language usage I talked about earlier, of not only including your data but including a little JSON blurb of, "If you want more data, hit this tool," which sounds kind of weird, but it worked out really well.

Yeah, I think natural language interaction with agents is definitely going to be—I don't want to say the future, because that sounds very definitive—but a very big part of our lives going forward. And I think that's something I saw with the Google ADK and with the LangGraph agents I was building: basically building natural language ways to kick off accounting workflows, you know? It's like, "Hey, if I have this report I need to generate every single month for clients, I can just tell my agent, 'Do this for me. Do this for last month.'" Whereas before, you'd have to be specific. If it was a regular Python program, it's like, "Okay, this is the month I need, so I need to say `python main.py`, then a space, then the month." You have to be very specific with the parameters you pass in. And now you can just say, "Hey, run this process for last month." And it understands: "Okay, today is June, last month means May, I'll run this for May 2025." It's just so much more frictionless, you know? It's way quicker too, and it just feels nicer. And the Google ADK even supports live voice streaming. There's a bug in the ADK code—at least there was when I was building; for those interested, comment on the video and I can point it out or share the link—but you can make a minor tweak to the code, and then it'll support live voice streaming. So you can talk to your agent and say, "Hey, run that process for me," and it goes and does that. And it'll talk back to you and say, "Okay, job's done." So natural language is super exciting, I think, because it takes the programs people build—customized SaaS, customized software—and lets you run them in a much more flexible way. When it works well, it's just perfect.

Yeah. I even think voice is kind of the next frontier for making that fast, but then voice falls apart if you're at the office and you're like, "Hey, create an expense report," and you've got 40 people next to you chatting with their AI agents. Yeah, that kind of sucks. So the next leap, which we'll eventually get to, and maybe OpenAI is working on this, is brain-to-computer communication. That would be fantastic: not having to vocally describe something but having the same bandwidth of communication, being able to say things really fast, not having to type it, and kicking off multiple things. But yeah, I agree, really, really cool to have this kind of natural language input. And then on my end, kind of continuing the Claude journey: I talked about it last podcast, and my short update is I still think it's the best experience so far for writing code. You can have multiple Claude sessions running across multiple projects. What I've been doing is having a web one, my `splitmyexpenses.com`, you can imagine, and then the React Native app, and I run those at the same time. I've even run two Claude sessions within the same repository and folder, working on different things, of course. One I would ask to create some UI; the other I would ask to fix the logout, whatever. Really, really great experience. And they keep updating it. They have a new plan mode, which lets the AI come up with a detailed task plan that you sign off on, yes or no. Once you approve it, it goes and executes it. So pretty impressive releases from Anthropic, and they're putting a lot of effort behind it. The one thing I did want to talk about is the MCP servers that I use, so I just pulled up my MCP server list. Today I have Context7, which is kind of the leading tool for getting up-to-date documentation for the libraries you use in your project. Fantastic resource. I would 100% recommend it. I think Ben had also asked me if I used it, and I just used it a few days ago to pull up the latest and greatest React Native docs. Excellent resource. MySQL: I installed an open-source package that lets you connect to a database, and in this case I use it for my Laravel Sail app, which is Docker. So I connect to my local database and it's able to query things for me. Again, super fantastic. Playwright, so browser automation: you can tell it to boot up your app and click on various things, and using natural language it can read the DOM, read the HTML, and click on things correctly. Very, very good. The one downside to Playwright is that the output is outrageous. It takes up so many damn tokens, and Claude has a context window; the more you chat with it, the more it fills up. And if I use Playwright, man, my context is shot. So maybe we'll create a wrapper around Playwright, or they've got to do something better with the output, because that part sucks. Stripe: I use Stripe to analyze churn for my web app, my business. Pretty nice. I don't have too many comments there. I think they're leading on the MCP server front, so if you have a business, check it out.

They were super early.

Yeah, they were very early.

Yeah, they got that out soon after, maybe in December. I don't remember exactly when. MCP came out in November 2024, I want to say, around that time frame, and I think Stripe was the first enterprise one I saw. They demoed it and everything, so that was exciting.

And Stripe's history, I feel like that totally lines up with what they do. And then the last one I have installed is my own, the Telescope MCP server, only for Laravel devs. But if you haven't tried it, you should definitely try it out. I think the biggest bang for my buck, the ones I use the most, are probably Telescope and Context7. Context7 is a must just for keeping good docs, because all these LLMs are trained up to a certain point in time. Maybe the knowledge cutoff is October 2023, and React Native and Expo have had a bunch of updates since, and the model doesn't know the latest and greatest. So what you do is say, "I want to accomplish this in React Native using this UI component," finish off your query, then add, "use Context7." You have to tell the AI to go use it, but it takes about 10 or 12 characters to instruct it to use the tool. It goes and searches the docs, pulls them up, then uses that as a reference. So freaking fantastic. And then the last thing I want to talk about for Claude is how many tokens I use. So we had chatted about the Claude Max plan, and that plan is $200 a month, and it is so worth it. Some guy came out with an open-source package that essentially looks at the input and output tokens, which is how AI usage is billed, and the model you're using in your session. I guess Claude logs this somewhere so that you can pull these stats. I just ran this little CLI tool that tells me how much my usage would have cost at API prices. I started on May 28th; that was my first day. On May 29th, I spent $280. May 30th, $220. May 31st, $100. So if you add up four days of Claude usage, I'm already at like $600, and I pay $200 a month. I won't bore you with the rest of the days, but I can tell you I've used it for about 15 days. My total cost in US dollars is $1,403. Total input is 44 million tokens. Total output is 1.5 million tokens. So it's incredible. I am very, very happy with the experience. Extremely satisfied with the price. Almost so satisfied that when people post their own usage screenshots on Twitter, I get a little concerned that it's going to change. I sent it to Ben earlier this week: this guy posting like $6,000 in two weeks. I thought, damn, if people keep posting this, something's going to change, because I imagine their GPUs are on fire because of the great tool they put out.

Yeah, yeah, that is, you know... it's like, "Shh, don't tell anybody that you're getting all this arbitrage." You know, it goes to show, I think there are a lot of people who pay for the plans and don't use that much, so they're probably trying to break even or balance that out. But, I mean, yeah, I've seen the excitement on X. I don't have a Mac, and, you know, Claude Code is... I think macOS-only as of right now. Is it? I don't know.

I think so. Because when I looked at downloading it, it was pushing me towards the Windows Subsystem for Linux thing, WSL. So I was just like, yeah.

I probably wouldn't rack up $1,400 worth of tokens in my own developer experience anyway. So Cursor has been great for me. But yeah, I've seen Claude Code and all the excitement.

I'll get you on it eventually.

I will say it seems pretty unanimous. I don't hear people complain about it. I hear so many positive things, and I don't see a lot of negative things. Whereas with things like Windsurf, I'll hear negative things every now and then. And, you know, even Cursor has its fair share of, "Oh, you've got to have Cursor rules" and all that kind of stuff. But, yeah, Claude Code has been, from what I've seen, pretty much overwhelmingly positive.

Yeah. For the record, I would like to say I was maybe slightly early to this conclusion. So, again, last podcast episode, end of May, I had sipped the Kool-Aid a little bit and thought, based on my few days of work with it, that it was something different. And, again, these things could change. It could get better, it could get worse, a new tool could pop up. But as of the middle of June, if you're not using it, I would highly, highly recommend it. Buy the Max plan. Get it to work for you. It produces pretty great output depending on the task. But again, it's very developer-focused. If you're a developer, Cursor is a little bit more friendly; Claude is a little bit less friendly, but it gets stuff done, and that's what I care about. So it still gets a heavy recommendation from me, and I would not have as much output if I did not use it. It's pretty great.

For sure. Cool. Well, I guess that's a great segue into one of the last things we want to talk about in this episode. With how easy and, let's just say, ubiquitous—I don't get to use that word enough, so I'm gonna use it there—AI coding is, we've talked about Cursor and Claude, but we haven't really talked about things like Vercel's v0 and Replit, which are basically prompt-to-website and prompt-to-app builders. All in all, it's just so much easier to build customized software. And I think this has generated some discussion online about, "Is the death of SaaS here?" SaaS as in software as a service: things you pay monthly to use—X Premium, Split My Expenses—things that people have built that users want to pay money for, typically on a monthly basis. And people are saying, "I can build a clone of, I don't know, Twitter"—I can't think of a better example—"on my own. So why would I pay $18 a month for that?" So let's hear it: is that a hot take? Is that hyperbole? What are your thoughts? As someone who has been in SaaS, gosh, I don't know, since 2006, Brad, with IonFollow. I mean, you've been a SaaS boy.

Yeah, I would say the energy and excitement is there for building software. And of course, people want to make money, and SaaS is kind of the largest, highest-margin business you can think of: software as a service runs like 80-90% margins, probably less so with AI. But right now we're seeing this narrative that anyone could clone Slack, anyone could clone Jira. Yes and no. Yes, you could probably get something working, but you're never going to get all the bells and whistles and integrations and documentation, uptime, operations, you name it. You would never get there with the tools we have today. And again, I'm a huge fan of Claude. It can do a lot. It cannot write a replacement for Slack with all the bells and whistles. So I think it's this AI short-sightedness born of the excitement that it can do things, and it can do things fast. When I say fast, I mean a lot faster than I could write code myself. But it's not even close to replacing entire SaaS apps. The problem I see today is that you can write a lot of code and write it fast, but getting it to production, deploying it, iterating on it... once you have a decently large code base, it gets extremely hard to make changes because you have a large system. Writing a new app, it feels like you have these AI superpowers now. You can go from a concept in natural language, "I want to build an app that tracks macros," and it can probably scaffold an app really quick. But once you have a large code base, getting it to add small features, making sure things work across the entire app, keeping a well-thought-out feature set, all of that gets incrementally harder and harder. And I think we have this naivety of, "Oh, it's easy to build a new thing, therefore I can get to a Slack or a Jira." But as you get a larger code base, with a lot more features than most people probably realize are there, no, no chance. So I would call it a complete overstatement of AI tooling and systems. And again, I'm excited about where we're at today, but by no means are we there yet. I think all these companies would take a decent revenue dip. And you can't even factor in the network effect. A lot of these companies have tons of companies on contract already. So good luck competing with them. I would say don't fall for the trap of cloning XYZ B2B SaaS and trying to eat their lunch. Good luck.

Yeah, I totally agree. I mean, I think that ignores that time is money. And you're going to spend a bunch of time trying to make a clone of something that's not even going to probably work in production, at least to the level of sophistication that you need. But one thing I did want to get your take on that's kind of related to this, and I see it going one way, but I want to get your kind of unbiased thoughts, is, okay, we think software built by others for others is here to stay for sure. Everyone's not going to have their own custom little assistant or custom app that they're going to use just for them. And so that's always going to be in place. But what do you think about the kind of pay-once model versus the monthly recurring revenue? Because MRR, as it's called, has been such a staple in SaaS. And I think initially because people just, it's a smaller bill up front, obviously, they just kind of sign up, "Oh, $10 a month, no biggie." It's a lot different than $120. And so there's that aspect to it. But there's also the kind of aspect of not having to worry about the infrastructure and just having all the hosting and managed infrastructure done for you. So what do you think about the kind of push or the trend of kind of self-hosting software and paying for it once? I think DHH's company, they did that recently. I think they call it ONCE, right? Yeah, 37signals. They have something called ONCE.

Yeah. I think it's ONCE. I honestly forgot what the product is.

Yeah. So I guess what's your thought on that? Because I think about that in a certain way with AI, but what are your thoughts on, I guess this is more of like B2B, like of companies, not consumers, but companies kind of self-hosting or self-managing software?

I think the one-time payment can work for self-hosting, but that business model kind of sucks. The amount of money you can make from a self-hosted one-time payment is extremely low. Yes, you get a large influx of cash immediately, but the amount you can make with recurring revenue over time is way, way higher. There's so much math and statistics and general business guidance saying, "Hey, you want recurring revenue": it's extremely stable, a healthy business model where you can predict things. One-time, you can't predict it. Self-hosted, they get the value and they're gone. I think there's one slice of self-hosted one-time payments, and another slice of general, mostly B2C, one-time payments, because people say, "Oh, I'm done with subscriptions." On that category, although you didn't ask, it's a little bit harder with AI because it costs more than the traditional, usual expenses of a SaaS. So I usually lean away from it, because I think very critically about how much each user is going to cost. If they're using AI, it's a larger cost than if they're not. And in the age of AI, I want my features to be AI-capable, AI-empowered, and re-centered around the intelligence we have today that's pretty cheap. So I try to stay away from one-time pricing, because it would eventually price me out. It wouldn't make sense. It would make me cut corners or make decisions I didn't want to make, because I had a user pay $100 but they're bogging my server down with crazy load, and I'm like, "How do I backtrack from this?" So depending on your application and whether you use AI, you can do one-time. If you're going the self-hosting route, my clear answer is that it's not the best business model, and I think you should have a healthy business model, so I would steer away from it. I think DHH is unique in that he has a platform; people will buy whatever he puts out. I'm not saying the software is bad, I'm sure it's great, but I'm sure if they had a recurring revenue model they'd be making a lot more, although it might be from a different group of people.
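To put toy numbers on that squeeze, here's a quick breakeven sketch; every figure in it is invented for illustration:

```python
# Toy breakeven math: one-time payment vs. an AI-heavy user's ongoing
# inference cost. All numbers are invented for illustration.
ONE_TIME_PRICE = 100.00     # what the user paid, once
MONTHLY_AI_COST = 6.50      # assumed inference spend for an active user
OTHER_MONTHLY_COST = 1.00   # hosting, storage, support, amortized

monthly_burn = MONTHLY_AI_COST + OTHER_MONTHLY_COST
months_covered = ONE_TIME_PRICE / monthly_burn
print(f"one-time revenue covers ~{months_covered:.0f} months of usage")
# ~13 months here; after that the user costs more than they ever paid,
# which is the pressure pushing AI-heavy products toward recurring pricing.
```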

Yeah. The one thing where I feel like there's some nuance to that is the specific AI agent software, because in my own experience, the workflows you need to build in order to augment your workforce—let's just say, since I'm in accounting, you have an accounting firm with workflows that someone's doing right now that you might want to make more efficient by having AI do a lot of the dirty work, a lot of the prep work, with the human just doing the review, the human-in-the-loop stuff, right? A lot of times there's never a one-size-fits-all for that kind of work. What the accountant at Nike is doing is going to be different from what the accountant at Adidas is doing. They're working in the same industry, they're competitors with each other, but the systems they use and the way each person does their work are going to be different. So I feel like there is going to be a need for a customized build and implementation, and it doesn't have to be a one-time fee, and then maybe ongoing maintenance for those agent workflows. That is a business model, I think, that's worth considering, or at least you can see it on the horizon as a possibility. Because, especially with automations in general, and this was true pre-AI anyway, someone has to maintain and manage them. Things change, the business changes, and all of a sudden the pipeline or the SLA you had set up to get the data and push it somewhere else breaks: "Oh, we changed vendors. We don't have that anymore." Or, "Oh, we have a new business line that we need to incorporate." So there's always going to be a need for someone to maintain that and adjust accordingly. And I think those kinds of workflows and automations, and again, agentic AI, are going to be so custom that they're going to need a self-hosted solution, or maybe a company can host it on your company's behalf. All the workflows are custom, and then you pay an ongoing fee for the maintenance, kind of like a retainer almost: "Hey, if this thing breaks down, I can call and say I need this fixed," and they're getting paid to be there for you in terms of support if it goes down. So I think that's a really interesting kind of business that's evolving, and that's how I see it playing out. But time will tell. Maybe I have it all wrong, but we'll see.

There's a new "AI scouting" role that I've heard pop up on X a few times. I think, you know, for us specifically, we try to use the latest and greatest tools, things that are maybe early, even things just making the rounds on X. So I follow a bunch of people on X that talk about AI, and I go on maybe daily, just looking at the feed, trying to pull out as many insights as I can and then use them myself. And I think this AI scouting role is what you're describing, because a lot of people want to adopt AI and don't know how to. They want to build automations, but they don't know how. So it's kind of our job to use the tools, know where they're great, know where they suck, understand their trajectory, and bring that experience to companies, to our companies, et cetera, to distill it and educate people: "Hey, this is what's possible today. This is probably what's going to be possible in six months. How can we leverage this?" Because there are so many things you run into. Like you mentioned, authentication is hard. Plenty of internal tools and systems might not have great APIs. You've got to work through access control. On top of that, maybe an API returns something like Protobuf, which is a very different response format and harder to work with. So sometimes you even need to change internal systems to make them AI-friendly. And when you go down this path of "I want to automate X, Y, Z," like you mentioned earlier in the podcast, I think of putting yourself in the zone of, "How could I create an agent that can replace me?" The agent needs tools to go access system A and system B. To access system A and system B, we need to understand how they work, create integrations, do all that. So I think workflows are freaking exciting. They always have been. I loved automating things even before the AI era. Now, with the AI era, it's, "How can we do these things a lot faster and with a lot more flexibility?" Like you mentioned, if I want to run a monthly report, I don't need to run a Python script and type in the dates manually. I can now drive it all in natural language, assuming I have the right infrastructure and reliability in place. So I think we're approaching this era of, how do people get onboarded onto these systems, and who can get them up to speed quickly? Because every company really wants to say they're using AI, and "agents" in particular is an extremely popular word. "Agentic" is kind of the scale of how autonomous it is. People want full agents that do everything for them. We're not there yet. I think what we have is a flavor of agentic, so maybe you can call it an agentic tool: it does a lot for you, but it's not replacing a full employee. So, yeah, I would say if you're using AI tools, position yourself so you're using the latest and greatest so you can see what's possible. Spend a few dollars to try out Opus 3. It's a great, important model. Maybe your use case is enabled by Opus 3, and it costs a ton of money today, but I can guarantee you that a year from now, Opus 3 is going to be as cheap as Gemini 1.5 Pro. All these tools and the bugs Ben ran into in the Google ADK are definitely going to be fixed, and it will be a better framework by then. So if you have any workflows in place, try to use the latest AI tools, try to plug them in, see where they fail, and get the expertise and experience you'll need.
And as AI gets better, you'll know when it's time, and when it's time, you'll be kind of the first people that make their lives easier, both at home and at work. And I don't know about you, but the more friction I can reduce from my work life, my personal life, and do things I actually care about, that's what I want. And I think AI is an extreme driving force to make that a reality.
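To make the "agent with tools for system A and system B" idea concrete, here's a minimal Python sketch. Every system, function, and name in it is hypothetical, and a real build would register these through an agent framework's tool-calling API rather than a hand-rolled dictionary:

```python
# Two pretend internal systems exposed as plain functions, plus the
# registry/dispatch step an agent framework would normally handle.
# All names here are hypothetical, for illustration only.
from typing import Any, Callable

def fetch_expense_reports(month: str) -> list[dict]:
    """Stand-in for system A, e.g. an internal expense service."""
    return [{"id": 1, "month": month, "amount": 42.00}]

def fetch_bank_transactions(account: str, month: str) -> list[dict]:
    """Stand-in for system B, e.g. a banking API."""
    return [{"account": account, "month": month, "amount": 42.00}]

TOOLS: dict[str, Callable[..., Any]] = {
    "fetch_expense_reports": fetch_expense_reports,
    "fetch_bank_transactions": fetch_bank_transactions,
}

def run_tool(name: str, **kwargs: Any) -> Any:
    """Dispatch: the model picks a tool name and arguments; we execute it."""
    return TOOLS[name](**kwargs)

# "Run last month's report" in natural language would resolve to calls like:
print(run_tool("fetch_expense_reports", month="2025-05"))
```

The hard part the episode keeps returning to is everything around this sketch: authentication, access control, and internal systems that only speak formats like Protobuf.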

Yeah, absolutely. I mean, it's kind of always been this way, and I'm not saying anything that people haven't said before me, but there's a big difference between someone who creates and someone who just consumes. Everyone has a little bit of both; no one solely creates and never consumes, and surely no one just consumes, consumes, consumes and never creates anything. But if you think about your role, being a bit more on the creator side than the consumer side is going to benefit you more. And I think about it again in my industry, in accounting: if the work of how something gets done is going to be automated, the actual doing of the work, then you can be so valuable if you tap into, "How do you make that? How do you build that?" Because that's not an easy problem to solve. To your point earlier, yeah, anyone with a new app idea can get 90% of the way there, but it's that last 10% of finishing, adding features, and maintaining it that takes a lot of skill, and no one can go into that green and be successful right away. So you need to try things. One of the demos I saw that made a ton of sense to me, thinking back to the multi-agent structure, was this: you have an agent that's just looking at tools, right? All it has access to is these different tools. And then you have the parent agent, the one that hands off to the different agents. And you can customize. Maybe you want that parent agent on Opus 3 because it can reason, I don't know, just say you want that. But then in the specific workflows, where the work is actually getting done and the data is being used, maybe that data is really private, and you can use a local, open-source model like DeepSeek or Llama, where the data is very protected and you're using the AI in a much more controlled environment. And then you select what you want to send back up to the parent agent. So I say that because there are so many different ways you can implement it. It's completely up to you, all the tools are there, and you just need to imagine and know how to put these things together. And, like Brad and I have said a thousand times on this podcast, you've got to just start tinkering and playing with it, because building is where it's going to be. That's where the value-add is, to me. That's where you're going to be rewarded. If you're in your career or in a job, building these things is going to be important, not just being a user of them, if that makes sense.
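Here's a hedged, framework-free sketch of that parent/child split. The model names and the call_model stub are invented for illustration, and in a real setup the parent's own model, not a keyword check, would decide the handoff:

```python
# Parent agent routes tasks; one child runs on a local model so private
# data never leaves the building, the other on a hosted frontier model.
# Everything here is illustrative: model names, routing, and call_model.
from dataclasses import dataclass

def call_model(model: str, prompt: str) -> str:
    """Stand-in for a real LLM call, hosted or local."""
    return f"[{model}] response to: {prompt!r}"

@dataclass
class Agent:
    name: str
    model: str  # e.g. a local open-source model, or a hosted one

    def run(self, task: str) -> str:
        return call_model(self.model, task)

payroll_agent = Agent("payroll", model="local-llama")        # private data stays local
research_agent = Agent("research", model="hosted-frontier")  # general reasoning

def parent_agent(task: str) -> str:
    """Toy handoff rule; a real parent would reason about the routing."""
    child = payroll_agent if "payroll" in task else research_agent
    return child.run(task)  # only the child's summary flows back up

print(parent_agent("summarize payroll exceptions for May"))
```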

Yeah, I kind of like to frame it as, you know, like you're mentioning, so many systems inside a company access various systems that have different data. It could be related data, but the sad truth is there's no system that does it all. And I think, how could we imagine a future in which you're not interacting with that system, but you're interacting in a natural language format? So if you're pulling expense reports and analyzing them, you could have an interface to kind of say, "What would that look like? How would I interact with that?" Because I think we're going to get there eventually, and building it is a ton of fun. So if you're on that train of engineering and tinkering, I think it's even more fun. But if you're not, I think we're going to get to a point where you can imagine a ChatGPT-like client that has access to all your tools. And how would you query it such that it knew enough to pull off your tasks? Because we end up in the age of writing very detailed prompts that get our work done. And then, yes, it's a lot of effort to write the prompt because that takes inherent skill. But even if that takes time and skill, it's going to be a lot faster than doing it manually yourself. So if you're not building the tools, imagine a world in which you had agents that encompassed Stripe, that encompassed your bank accounts. And now you had to go talk to it. How could you assemble your workflow with bits and pieces if you had to manually talk it out and write it out? You know, "Pull the expense reports from last month, reconcile with bank accounts A, B, and C." That's even almost like documentation at a certain point, and that's kind of how we're going to be interacting with AIs in the future. And if you can get closer to that now, you're going to be in a lot better spot when these models get better and engineers that build these solutions at your company or in your personal life help enable you to have that experience.

Yeah. Yeah, well said. I agree. Cool. Awesome. All right, well, should we jump into bookmarks and wrap this one up?

Yeah, I can go first. Mine's a little bit of a joke post, but related to the Google outage: Cursor actually stopped working because they relied on Google—I assume, haven't checked, but I assume. Pieter Levels, @levelsio on X, tweeted, quote, "I had stopped coding and just sat on the couch because Cursor stopped working." And I really enjoyed this because Levels has tons of successful businesses and posts about them frequently as an indie hacker on X. And he has picked up the AI train, built various things purely with AI; I think his main product is Photo AI. So he's very much in the AI space. And the fact that he highlighted that every engineer is using Cursor these days, and when it's down, it almost feels like we're in the Stone Age... I had felt that too: if I don't have access to Claude, if I don't have access to Cursor, I'm writing code by hand. Not that I can't do it, I've done it for many, many years, but with these tools, I almost forget we're in this new era. And stopping to take a break when Claude or Cursor is down is almost the better move, because I think the AI does a great job, and getting more in-depth with AI tools is kind of the best way to set yourself up for success. So I thought it was funny: "Hey, Cursor's down. I can't work. I'm an engineer. How do I even write code anymore?" Of course, that's not how things are for a lot of people, but the new reality is that these tools are here to stay, we're investing more and more in them, and getting good at them is becoming a core competency as a software engineer. His tweet reminded me of that, and the outage brought up the thought of, "Oh, wow, a few years ago this didn't even exist, and people all wrote code manually. Now everyone's using it; it's basically a faster autocomplete, and it's the future." So a very funny and interesting tweet from Levels: we are in a new generation, and it's clear that when things go down, you really see the gap.

Yeah. Yeah, there were some great memes coming out during that Google Cloud outage, and I did see that one too. Cool. So my bookmark is actually a paper. The paper is titled "The Illusion of Thinking," and then it has a much longer subtitle which I'm not going to read, but basically it's Apple researchers, and they did some testing and had some results to share about the strengths and limitations of reasoning models. So reasoning models: Opus 3... What are the Gemini reasoning models? Is Gemini 1.5 Pro a reasoning model?

I think it's just "thinking" or something. Yeah, it's got so many models.

What's Claude's reasoning model? Is that the...

Opus?

Yeah, I think so. So anyways, there are certain models that can do reasoning: given logic, they can execute and solve whatever riddles or tasks you give them, right? Things that require a bit more thinking than just, "What's the weather today?" So anyways, Apple came out with this paper, and I'll spare you the whole scientific thing because I'm not a scientist, but the findings were very interesting. The short of it, the abstract, is that they gave the LLMs increasingly complex tasks. There's the famous Tower of Hanoi puzzle, I don't know if you're familiar with that, where you have to move disks in a certain way to get them all stacked on one side. You know what I'm talking about?

I've heard the name, but I'm not sure.

So it's like a puzzle. There's an algorithm you have to follow to move the whole stack across the pegs. Anyways, the number of moves blows up fast as you add disks. And so they gave the LLM that task. And they gave it a river-crossing puzzle too, where it's like, "Hey, you need to get everyone across without breaking the rules." So anyways, they gave it all these different puzzles, and they found that the more complex the puzzle, the more the LLMs start breaking down and not getting the right answer. They would start hallucinating or just completely failing the test. And so the argument was that in these high-complexity tasks, the reasoning just collapses and they completely guess or don't even execute the logic. Which was funny because, on one hand, that's kind of how we are. If you gave me a highly complex task, I would break down and not be able to do it either. So it's kind of like, "Yeah, OK, in a weird way, you're proving nothing. You're proving that complex problems are more complicated than simple problems." Like, well done. And I think there are some thoughts that Apple put this out because Apple, as we've discussed, is really far behind in the AI arms race, if we'll call it that. So I don't know if this is their way of putting some water on the fire they're under. But the other part, which I think is a valid point in the actual paper, is that due to the nature of LLMs and how they work, this problem is going to exist until there's some breakthrough in how they work. And they actually go on to mention quantum computing, which I don't understand and am not going to pretend to understand. But basically they're saying that because of the way these generative AI models work, predicting the next set of tokens—really, that's kind of all they're doing these days, and they're very impressive at it—it's more pattern matching, or pattern generating, than actually solving logic, and that can only take these reasoning models so far. So that was the 30,000-foot overview, if you will. But it was really interesting, and yeah, there's a lot of science in there that I didn't understand, but from what I was able to glean, it was interesting to hear both sides of the coin on whether the paper was good, whether it was useful, and what it was saying.
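For reference, the Tower of Hanoi solver itself is a famously short recursion; what makes it a good complexity dial is that the optimal solution takes 2^n - 1 moves, so each extra disk roughly doubles the work. A minimal sketch:

```python
def hanoi(n: int, source: str, target: str, spare: str) -> None:
    """Print the optimal move sequence for n disks (2**n - 1 moves)."""
    if n == 0:
        return
    hanoi(n - 1, source, spare, target)   # clear the n-1 smaller disks
    print(f"move disk {n}: {source} -> {target}")
    hanoi(n - 1, spare, target, source)   # restack them on top

hanoi(3, "A", "C", "B")  # 7 moves; 10 disks would already take 1,023
```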

Nice. That's cool. I hope Apple really comes out with a good Siri integration, and I'm hoping this research helps influence that. Because there was a recent interview during WWDC week, I think with their head of software, Craig Federighi, where the Wall Street Journal or some outlet was grilling Apple on their Siri delays and AI fumble. I think they have a renewed focus and energy, and I'm hoping they publish more and open-source more. It's kind of the Apple you don't really see. I think they're shifting from a closed-source, secretive company to a more open company, very, very slowly. And again, this is my personal opinion as an outsider. I don't know if it feels that way internally, but it seems like they're on the right track, and it's exciting to see them publish research. I think they had a model that was able to understand interfaces a while back. So they don't publish too much, at least not publications that hit Twitter/X where I'd see them. Maybe they publish a lot in other venues I don't look at. But this was awesome to see. And yeah, bravo to them for publishing and being more open, because we need more of that in the AI space.

Yeah. Yeah, it was good. It's a good article—or a good paper, rather. We'll link it in there. All right. Cool, Brad. Well, I think we'll wrap it up there. Good stuff as always. And again, for listeners, one thing we'd love to hear from you on is what Brad said earlier: imagine an AI agent helping you in your work. How would you speak to it? What would you want it to do? If you have any good thoughts, definitely leave a comment on YouTube or on Spotify and let us know what you think.

Yeah, yeah. I mean, well, this was a jam-packed episode. So yeah, if you guys have any feedback, any comments, I think we talked about a lot of interesting stuff. Please, please respond or chat in the comments. We read everything. So feel free to hit us up on YouTube.

Yeah, absolutely. All right. Good stuff, Brad. Until next time.

Awesome, see ya.

See ya. Thank you for listening to the Breakeven Brothers podcast. If you enjoyed the episode, please leave us a five-star review on Spotify, Apple Podcasts, or wherever else you may be listening from. Also, be sure to subscribe to our show and YouTube channel so you never miss an episode. Thanks, take care.

All views and opinions by Bradley and Bennett are solely their own and unaffiliated with any external parties.
