At Anthropic, the way that we thought about it is we don't build for the model of today. We build for the model six months from now. That's still my advice to founders that are building on LLMs: just try to think about what is that frontier where the model is not very good today, because it's going to get good at it.
All of Claude Code has just been written and rewritten and rewritten over and over and over. There is no part of Claude Code that was around six months ago. You try a thing, you give it to users, you talk to users, you learn, and then eventually you might end up at a good idea. Sometimes you don't.
Are you also in the back of your mind thinking that maybe in six months you won't need to prompt that explicitly? Like the model will just be good enough to figure out on its own?
Maybe in a month.
No more need for plan mode in a month?
Oh my god.
Welcome to another episode of The Light Cone, and today we have an extremely special guest, Boris Cherny, the creator and engineer of Claude Code. Boris, thanks for joining us.
Thanks for having me.
Thanks for creating a thing that has taken away my sleep for about three weeks straight. I am very addicted to Claude Code, and it feels like rocket boosters. Has it felt like this for people for months at this point? I think it was around the end of November that a lot of my friends said something changed.
I remember feeling this way when I first created Claude Code and didn't yet know if I was onto something. I kind of felt like I was onto something, and that's when I stopped sleeping. That was three straight months. This was September 2024. I didn't take a single day of vacation. I worked through the weekends and every single night. I was just like, "Oh my god, I think this is going to be a thing. I don't know if it's useful yet because it couldn't code yet."
If you look back on those moments to now, what would be the most surprising thing about this moment right now?
It's unbelievable that we're still using a terminal. That was supposed to be the starting point; I didn't think that would be the ending point. And then the second one is that it's even useful, because at the beginning, it didn't really write code. Even in February when we GA'd, it wrote maybe like 10% of my code or something like that. I didn't really use it to write code; it wasn't very good at it. I still wrote most of my code by hand.
So the fact that our bets paid off and it got good at the thing that we thought it was going to get good at—because it wasn't obvious. At Anthropic, the way that we thought about it is we don't build for the model of today. We build for the model six months from now. And that's still my advice to founders that are building on LLMs: just try to think about what is that frontier where the model is not very good today, because it's going to get good at it and you just have to wait.
Going back though, do you remember when you first got the idea? Can you just talk us through that? Like was it some spark, or what was even the first version of it in your mind?
It's funny. It was so accidental that it just kind of evolved into this. At Anthropic, I think the bet has been coding for a long time, and the bet has been the path to safe AGI is through coding. This has kind of always been the idea, and the way you get there is you teach the model how to code, then you teach it how to use tools, then you teach it how to use computers.
You can kind of see that because the first team that I joined at Anthropic was called the Anthropic Labs team, and it produced three products: Claude Code, MCP, and the desktop app. So you can kind of see how these weave together. For the particular product that we built, no one asked me to build a CLI. We kind of knew maybe it was time to build some kind of coding product because it seemed like the model was ready, but no one had yet really built the product that harnessed this capability.
So there's this insane feeling of product overhang. But at the time, it was just even crazier because no one had built this yet. And so I started hacking around and I was like, "Okay, we build a coding product. What do I have to do first? I have to understand how to use the API because I hadn't used Anthropic's API at that point."
So I just built like a little terminal app to use the API. That's all that I did. And it was a little chat app because you think about the AI applications of the time—and for non-coders today, what most people are using is just a chat app. So that's what I built, and it was in a terminal. I can ask questions, it gives answers.
Then I think tool use came out. I just wanted to try out tool use because I didn't really understand what this was. I was like, "Tool use is cool. Is this useful? Probably not. Let me just try it."
You built it in terminal just because it was the easiest way to get something up and running?
Yes, because I didn't have to build a UI. It was just me at that point.
At the time, IDEs like Cursor and Windsurf were taking off. Were you under any pressure, or getting lots of suggestions like, "Hey, we should build this out as a plugin or as a fully featured IDE itself?"
There was no pressure because we didn't even know what we wanted to build. The team was just in explore mode; we knew vaguely that we wanted to do something in coding, but it wasn't obvious what. No one had high enough confidence. That was my job to figure out. And so I gave the model the bash tool. That was the first tool I gave it, just because I think that was literally the example in our docs. I just took the example, which was in Python, and ported it to TypeScript because that's what I was writing in.
I didn't know what the model could do with bash, so I asked it to read a file, and it could cat the file. That was cool. Then I was like, "Okay, what can you do?" And I asked it, "What music am I listening to?" It wrote some AppleScript to script my Mac and look up the music in my music player.
Oh my god.
And this was Sonnet 3.5. I didn't think the model could do that. And that was my first ever "feel the AGI" moment where I was just like, "Oh my god, the model just wants to use tools. That's all it wants."
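For readers who haven't wired this up themselves, a minimal sketch of what "giving the model the bash tool" involves, assuming the custom-tool shape of Anthropic's Messages API: a tool has a `name`, a `description`, and a JSON Schema `input_schema`; the model replies with a `tool_use` block, and the client answers with a `tool_result` block. The `runBashTool` helper is hypothetical. This is not the docs example Boris ported or Claude Code's implementation, and a real harness would sandbox commands or ask for permission before running them.

```typescript
import { execSync } from "node:child_process";

// Tool definition in the Messages API custom-tool format (name, description, JSON Schema input).
const bashTool = {
  name: "bash",
  description: "Run a shell command and return its output.",
  input_schema: {
    type: "object",
    properties: {
      command: { type: "string", description: "The shell command to run" },
    },
    required: ["command"],
  },
};

// The subset of a tool_use block (sent by the model) that this sketch reads.
interface ToolUseBlock {
  type: "tool_use";
  id: string;
  name: string;
  input: { command: string };
}

// The tool_result block the client sends back in the next user message.
interface ToolResultBlock {
  type: "tool_result";
  tool_use_id: string;
  content: string;
  is_error?: boolean;
}

// Hypothetical helper: run the requested command locally and wrap its stdout.
// WARNING: executes arbitrary commands; a real harness must sandbox or confirm first.
function runBashTool(block: ToolUseBlock): ToolResultBlock {
  try {
    const stdout = execSync(block.input.command, {
      encoding: "utf8",
      timeout: 10_000, // milliseconds
    });
    return { type: "tool_result", tool_use_id: block.id, content: stdout };
  } catch (err) {
    // Non-zero exit codes and timeouts throw; report them back to the model as errors.
    return {
      type: "tool_result",
      tool_use_id: block.id,
      content: String(err),
      is_error: true,
    };
  }
}
```

Everything interesting (reading files with `cat`, scripting the Mac with AppleScript) comes from the model choosing commands; the harness only executes them and returns the output.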
That's kind of fascinating. It's very contrarian that Claude Code works so well in such an elegant, simple form factor. Terminals have been around for a really long time, and that seemed to be a good design constraint that allowed a lot of interesting developer experiences. It doesn't feel like working; it just feels fun as a developer. I don't have to think about files or where everything is, and that came about almost by accident.
Yeah, it was an accident. I remember after the terminal started to take off internally—and honestly, after building this thing, I think like two days after the first prototype, I started giving it to my team just for dogfooding. If you come up with an idea and it seems useful, the first thing you want to do is give it to people to see how they use it.
I came in the next day and Robert, who sits across from me and is another engineer, he just had Claude Code on his computer and he was using it to code. I was like, "What are you doing? This thing isn't ready. It's just a prototype." But yeah, it was already useful in that form factor.
I remember when we did our launch review to launch Claude Code externally—this was in December or November 2024—Dario asked, "The usage chart internally, the DAU chart is vertical. Are you forcing engineers to use it? Why are you mandating them?" And I was just like, "No, we didn't. I just posted about it and they've just been telling each other about it." Honestly, it was just accidental. We started with the CLI because it was the cheapest thing and it just kind of stayed there for a bit.
So in that late 2024 period, how were the engineers using it? Were they sort of shipping code with it yet, or were they using it in a different way?
The model was not very good at coding yet. I was using it personally for automating Git. I think at this point I've probably forgotten most of my Git commands because Claude Code has just been doing it for so long. But yeah, automating bash commands was a very early use case, and operating Kubernetes and things like this.
People were using it for coding; there were some early signs of this. I think the first use case was writing unit tests because it's a little bit lower risk and the model was still pretty bad at it, but people were figuring it out and figuring out how to use this thing.
One thing that we saw is people started writing these markdown files for themselves and then having the model read that markdown file. And this is where CLAUDE.md came from. Probably the single biggest principle in product for me is latent demand. Every bit of this product after the initial CLI was built through latent demand, and CLAUDE.md is an example of that.
There's this other general principle that I think is interesting where you can build for the model and then you can build scaffolding around the model in order to improve performance. Depending on the domain, you can improve performance maybe 10-20% or something like that, and then essentially the gain is wiped out with the next model. So either you can build the scaffolding and get some performance gain and then rebuild it again, or you just wait for the next model and then you kind of get it for free.
CLAUDE.md and the scaffolding are an example of that. Really, I think that's why we stayed in the CLI: we felt there was no UI we could build that would still be relevant in six months, because the model was improving so quickly.
Earlier we were saying we should compare CLAUDE.md files, but you said something very profound, which is that yours is very short. That's almost the opposite of what people might expect. Why is that? What's in your CLAUDE.md?
I checked this before we came. My CLAUDE.md is just two lines. The first line is: "Whenever you put up a PR, enable auto-merge." So as soon as someone accepts it, it's merged. That's just so I can code without going back and forth on code review. And then the second one is: "Whenever I put up a PR, post it in our internal team stamps channel," so someone can stamp it and I can get unblocked.
The idea is that every other instruction is in our project-level CLAUDE.md that's checked into the codebase, and it's something our entire team contributes to multiple times a week. Very often I'll see someone's PR where they made a totally preventable mistake, and I'll literally tag Claude on the PR and say something like, "Add this to the CLAUDE.md." I do this many times a week.
Do you have to compact the CLAUDE.md? I definitely reached a point where I got the message saying my CLAUDE.md was thousands of tokens. What do you do when you hit that?
Our CLAUDE.md is pretty short, maybe a couple thousand tokens. If you hit this, my recommendation would be: delete your CLAUDE.md and just start fresh.
I think a lot of people try to over-engineer this, and really the capability changes with every model. So you want to do the minimal possible thing to get the model on track. If you delete your CLAUDE.md and the model gets off track or does the wrong thing, that's when you add back a little bit at a time. What you're probably going to find is that with every model, you have to add less and less.
For me, I consider myself a pretty average engineer, to be honest. I don't use a lot of fancy tools. I don't use Vim; I use VS Code because it's simpler.
Wait, really? I would have assumed that because you built this in the terminal that you were like a diehard terminal/Vim-only person.
We have people like that on the team. There's Adam Wolf, for example; he's like, "You will never take Vim from my cold, dead hands." There's definitely a lot of people like that on the team, and this is one of the things that I learned early on: every engineer likes to hold their dev tools differently. There's just no one tool that works for everyone.
But I think this is one of the things that makes it possible for Claude Code to be so good, because I think about it as: what is the product that I would use that makes sense to me? To use Claude Code, you don't have to understand Vim, tmux, or SSH. You just have to open up the tool and it'll guide you.
How do you decide how verbose you want the terminal to be? Sometimes you have to go Ctrl+O and check it out. Is it like internal bike-shed battles around longer vs. shorter? Every user probably has an opinion. How do you make those decisions?
What's your opinion? Is it too verbose right now?
Oh, I love the verbosity because sometimes it just goes off the deep end and I'm watching and then I can just read very quickly and it's like, "Oh, no, no, it's not that." Then I Escape and stop it, and it stops an entire bug farm as it's happening. That's usually when I didn't do plan mode properly.
This is something that we probably change pretty often. I remember early on, maybe six months ago, I tried to get rid of bash output internally just to summarize it. I was like, "These giant long bash commands, I don't care." Then I gave it to Anthropic employees for a day and everyone just revolted. "I want to see my bash because it is quite useful." For something like Git output, maybe it's not useful, but if you're running Kubernetes jobs, you want to see it.
We recently hid the file reads and file searches. Instead of saying "Read foo.md," it says "Read one file, searched one pattern." This is something I think we could not have shipped six months ago because the model just was not ready; it would still read the wrong thing pretty often. As a user, you had to be there and catch it. Nowadays, I notice it's on the right track almost every time.
Because it's using tools so much, it's a lot better just to summarize it. But we dogfooded it for a month, shipped it, and then people on GitHub didn't like it. There was a big issue where people were like, "No, I want to see the details." That was really great feedback, so we added a new verbose mode. In `/config` you can enable verbose mode if you want to see all the file outputs.
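The display change described here, collapsing individual file reads and searches into a one-line count unless verbose mode is on, can be sketched as a small rendering function. The `ToolCall` type, the `summarizeToolCalls` function, and the exact wording are illustrative assumptions, not Claude Code's actual rendering code.

```typescript
// Hypothetical record of one tool call that would appear in the session transcript.
type ToolCall =
  | { kind: "read"; path: string }
  | { kind: "search"; pattern: string };

// "1 file", "2 files", and so on.
function plural(n: number, noun: string): string {
  return `${n} ${noun}${n === 1 ? "" : "s"}`;
}

// Verbose: one line per call. Default: a single summary line with counts.
function summarizeToolCalls(calls: ToolCall[], verbose: boolean): string[] {
  if (verbose) {
    return calls.map((c) =>
      c.kind === "read" ? `Read ${c.path}` : `Searched for "${c.pattern}"`,
    );
  }
  const reads = calls.filter((c) => c.kind === "read").length;
  const searches = calls.length - reads;
  const parts: string[] = [];
  if (reads > 0) parts.push(`Read ${plural(reads, "file")}`);
  if (searches > 0) {
    // Capitalize only when the search count starts the line.
    parts.push(`${reads > 0 ? "searched" : "Searched"} ${plural(searches, "pattern")}`);
  }
  return parts.length > 0 ? [parts.join(", ")] : [];
}
```

Keeping the full per-call lines behind a flag is what lets the default stay quiet while users who want to catch a wrong read can still see every path.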
I posted on the issue and people still didn't like it, which is awesome, because my favorite thing in the world is just hearing people's feedback. So we just iterated more and more to make it the thing that people want.
I'm amazed how much I enjoy fixing bugs now. All you have to do is have really good logging and then just say, "Hey, check out this particular object, it messed up in this way," and it searches the log and figures everything out.
It can make a production tunnel and look at your production DB for you. Bug fixing is just going to Sentry and copying markdown. Pretty soon it's just going to be straight MCP. It's like an auto-bug fixing and test-making startup factory.
Right. There are all these ideas now about not having to review the code at all. I'm old school, so I like the verbosity. I like to say, "Well, you're doing this, but I want you to do that." But there's a totally different school of thought now that says anytime a real human being has to look at code, that's bad.
Yeah, yeah.
Which is fascinating.
I think Dan Shipper talks about this a lot: whenever you see the model make a mistake, try to put it in the CLAUDE.md or in skills so it's reusable. But there's a meta point that I struggle with a lot. People talk about what agents can do, but that changes with every single model. Sometimes a new person joins the team and they use Claude Code more than I would have.
I'm constantly surprised by this. For example, we had a memory leak we were trying to debug. Jarred Sumner has been on a crusade killing all the memory leaks, and it's been amazing. But before Jarred was on the team, I had to do this myself. I took a heap dump, opened it in DevTools, and was looking through the profile and the code trying to figure it out.
Then another engineer on the team, Chris, just asked Claude Code. He was like, "Hey, I think there's a memory leak. Can you run this and try to figure it out?" Claude Code took the heap dump, wrote a little tool for itself to analyze the heap dump, and then it found the leak faster than I did. This is just something I have to constantly relearn because my brain is still stuck somewhere six months ago at times.
So what would be some advice for technical founders to really become maximalists with the latest model release? It sounds like people fresh out of school who don't have any assumptions might be better suited than engineers who have been at it for a long time. How do the experts get better?
I think for yourself, it's kind of beginner mindset and humility. Engineers have learned to have very strong opinions, and senior engineers are rewarded for this. In my old job at a big company when I hired architects, you look for people that have a lot of experience and really strong opinions. But it turns out a lot of this stuff just isn't relevant anymore and those opinions should change because the model is getting better. So the biggest skill is people that can think scientifically and think from first principles.
How do you screen for that when you try to hire someone now for your team?
I sometimes ask, "What's an example of when you were wrong?" It's a really good one. Some of these classic behavioral questions, not even coding questions, are quite useful because you can see whether people can recognize their mistake in hindsight, whether they take ownership of it, and whether they learned something from it.
I think a lot of very senior people are good at this, and founders in particular are quite good at it. Other people will never really take the blame for a mistake. Personally, I'm wrong probably half the time. Half my ideas are bad, and you just have to try stuff. You try a thing, give it to users, talk to users, learn, and eventually you might end up at a good idea. This is the skill that was very important for founders, but now I think it's very important for every engineer.
Do you think you would ever hire someone based on the Claude Code transcript of them working with the agent?
We're actively doing that right now. We just added, as a test, the option to upload a transcript of you coding a feature with Claude Code or Codex. Personally, I think it's going to work. You can figure out how someone thinks: whether they're looking at the logs, whether they can correct the agent if it goes off the rails, whether they use plan mode and make sure there are tests.
Do they even understand systems? There's just so much sort of embedded in that. I just want like a spiderweb graph, like in video games like NBA 2K. It's like, "Oh, this person's really good at shooting or defense." You could imagine a spiderweb graph of someone's Claude Code skill level.
Yeah. What would those skills be?
I think it's things like systems, testing, user behavior, design, product sense, and maybe also just automating stuff. My favorite thing in my CLAUDE.md is a line that says: "For every plan, decide whether it's over-engineered, under-engineered, or perfectly engineered, and why."
I think this is something that we're trying to figure out, too. When I look at the engineers on the team who I think are the most effective, it's very bimodal. On one side there are extreme specialists. I named Jarred before, and the Bun team is a really good example: hyper-specialists who understand dev tools and JavaScript runtime systems better than anyone else.
Then there's the flip side of hyper generalists, and that's kind of the rest of the team. A lot of people span product and infra, or product and design, user research, business. I really like to see people that just do weird stuff. That was kind of a warning sign in the past because it was like, "Can these people build something useful?" That's the litmus test.
But nowadays—for example, an engineer on the team, Daisy—she was on a different team and then she transferred onto our team. The reason that I wanted her to transfer is she put up a PR for Claude Code a couple weeks after she joined. Instead of just adding the feature, what she did was first put up a PR to give Claude Code a tool so that it can test an arbitrary tool and verify that works. And then she had Claude write its own tool instead of herself implementing it.
It's this kind of out-of-the-box thinking that is so interesting, because not a lot of people get it yet. We use the Claude Agent SDK to automate pretty much every part of development: code review, security review, labeling issues, shepherding things to production. Externally, I'm seeing a lot of people start to figure this out, but it's taken a while for people to figure out how to use LLMs in this way.
I've been having office hours with various founders about this. You have the visionary founder who has the idea, they've built this crystal palace of the product they want to build. They've totally loaded in their brain who the user is and what they're motivated by. They're sitting in Claude Code and they can do like 50x work.
But they have engineers who work for them who don't have that crystal memory palace of the platonic ideal of the product, and they can only do like 5x work. Are you hearing stories like that? There's usually a person who's the core designer of a thing and they're just trying to blast it out of their brain. What's the nature of teams like that?
It seems like that's almost a stable configuration. You're going to have the visionary who is now unleashed. But maybe going back to the top of it—I'm experiencing this right now. I'm only a solo person and I need to eat and sleep and I have a whole job. How am I going to do this?
We just launched Claude Teams, and this is a way to do it, but you can also just build your own way to do it. It's pretty easy.
What's the vision for Claude Teams?
Just collaboration. There's this whole new field of agent topologies that people are exploring. There's this one sub-idea which is uncorrelated context windows. The idea is multiple agents having fresh context windows that aren't polluted with each other's context or their own previous context. If you throw more context at a problem, that's like a form of test-time compute, so you just get more capability that way.
If you have the right topology on top of it, so the agents can communicate in the right way, they can just build bigger stuff. Teams is one idea; there are a few more coming pretty soon. The first big example where it worked is our plugins feature, which was built entirely by a swarm over a weekend. It just ran for a few days without human intervention. Plugins is still pretty much in the form it was in when it came out.
How did you set that up? Did you spec out the outcome you were hoping for and then let it figure out the details and let it run?
Yeah, an engineer on the team just gave Claude a spec and told it to use an Asana board. Claude put up a bunch of tickets on Asana and then spawned a bunch of sub-agents, and the agents started picking up tasks. The main Claude just gave them instructions, and they all figured it out.
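The ticket-board setup, combined with the idea of uncorrelated context windows, can be sketched as a small orchestrator: the parent splits the work into tickets, and each sub-agent works a ticket starting from a fresh message history that contains only that ticket. `Model`, `Ticket`, and `runSwarm` are hypothetical stand-ins with a stubbed model call; this is not the Claude Agent SDK, the Teams feature, or how the plugins swarm was actually wired. The `concurrency` parameter corresponds to choosing how many sub-agents to throw at a task.

```typescript
// One message in an agent's private context window.
interface Message {
  role: "user" | "assistant";
  content: string;
}

// Hypothetical model call: takes a context window, returns the agent's reply.
type Model = (context: Message[]) => Promise<string>;

interface Ticket {
  id: number;
  task: string;
}

// Each sub-agent repeatedly claims the next open ticket and works it in a
// fresh context, so no ticket's history leaks into another's.
async function runSwarm(
  tickets: Ticket[],
  model: Model,
  concurrency: number,
): Promise<Map<number, string>> {
  const results = new Map<number, string>();
  let next = 0;

  async function subAgent(): Promise<void> {
    while (next < tickets.length) {
      // Check and increment run synchronously, so two workers never claim the same ticket.
      const ticket = tickets[next++];
      const context: Message[] = [{ role: "user", content: ticket.task }];
      results.set(ticket.id, await model(context));
    }
  }

  const workerCount = Math.max(1, Math.min(concurrency, tickets.length));
  await Promise.all(Array.from({ length: workerCount }, () => subAgent()));
  return results;
}
```

Giving every ticket its own short context is the "uncorrelated" part: a sub-agent that goes down a wrong path on one ticket does not carry that history into the next.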
Like independent agents that didn't have the context of the bigger spec?
Right. I would bet the majority of agents today are sub-agents prompted by Claude Code. A sub-agent is just a recursive Claude Code process, launched by the parent, which we call "Mama Claude." That's how most agents are launched.
My Claude insights just told me to do this more for debugging. I spend a lot of time on debugging, and it would be better to have multiple sub-agents spin up and debug something in parallel. I added that to my CLAUDE.md: "Hey, next time you try to fix a bug, have one agent look in the logs and one look at the code path." That just seems sort of inevitable.
For weird, scary bugs, I try to fix them in plan mode, and then it seems to use the agents to search everything. Whereas when you're just doing it inline, it's like, "Okay, I'm going to do this one task," instead of searching wide.
This is something I do all the time too. I calibrate the number of sub-agents I ask it to use based on the difficulty of the task. If it's really hard, I'll say use three, five, or even ten sub-agents to research in parallel.
I'm curious, then: why don't you put that in your CLAUDE.md file?
It's kind of case-by-case. A CLAUDE.md is just a shortcut. If you find yourself repeating the same thing over and over, you put it in there. Otherwise, you don't have to put everything there; you can just prompt Claude.
Are you also in the back of your mind thinking that maybe in six months you won't need to prompt that explicitly? Like the model will just be good enough to figure it out on its own.
Maybe in a month.
No more need for plan mode in a month?
Oh my god.
I think plan mode probably has a limited lifespan.
That's some alpha for everyone here. What would the world look like without plan mode? Do you just describe it at the prompt level and it would just do it, one-shot it?
Yeah, we've started experimenting with this because Claude Code can now enter plan mode by itself. We're trying to get this experience really good, so it would enter plan mode at the same point where a human would have wanted to enter it. Plan mode is no big secret; all it does is add one sentence to the prompt that's like "Please don't code." You can just say that.
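Taken at face value, the description above means plan mode amounts to one extra instruction added to the prompt. A minimal sketch, assuming a hypothetical `buildPrompt` helper and illustrative wording; Claude Code's actual plan-mode text is not given here.

```typescript
// Illustrative wording only; not Claude Code's real plan-mode instruction.
const PLAN_MODE_INSTRUCTION =
  "Plan the change and present the plan, but do not write or edit any code yet.";

// Hypothetical helper: the prompt is unchanged unless plan mode adds its one sentence.
function buildPrompt(request: string, planMode: boolean): string {
  return planMode ? `${request}\n\n${PLAN_MODE_INSTRUCTION}` : request;
}
```

Because the whole mechanism is one sentence, typing an equivalent instruction by hand gives much the same behavior, which is the point being made.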
So it sounds like a lot of the feature development for Claude Code is very much what we talk about at YC: talk to your users and then implement it. It wasn't that you had this master plan and then implemented all the features.
Yeah, that's all it was. Plan mode was because we saw users who were like, "Hey Claude, come up with an idea, plan this out, but don't write any code yet." Sometimes it was just talking through an idea or asking for very sophisticated specs. Sunday night at 10 PM, I was looking at GitHub issues and our internal Slack feedback channel, and I just wrote this thing in like 30 minutes and shipped it that night. It went out Monday morning. That was plan mode.
So do you mean that there will be no need for plan mode in the sense of "I'm worried that the model's going to head off in the wrong direction," but there will still be a need to think through the idea and figure out exactly what it is that you want?
I think about it in terms of increasing model capabilities. Six months ago, a plan was insufficient; you get Claude to make a plan and you still have to babysit it because it can go off track. Nowadays, I'm a heavy plan mode user—probably 80% of my sessions I start in plan mode. Claude will start making a plan, I'll move on to my second terminal tab and have it make another plan. When I run out of tabs, I open the desktop app and start a bunch of tabs in the code tab.
Once the plan is good, I just get Claude to execute. Nowadays, with Opus 4.6 (and I think it started with 4.5), it got really good. Once the plan is good, it stays on track and does the thing exactly right almost every time. Before, you had to babysit after the plan; now it's just before the plan. Maybe next you just won't have to babysit at all.
The next step is Claude just speaks to your users directly and bypasses you entirely.
It's funny, this is the current stuff for us. Our Claudes talk to each other and they talk to our users on Slack pretty often. My Claude will like tweet once in a while, but I delete it. It's just a little cheesy; I don't love the tone.
What does it want to tweet about?
Sometimes it'll just respond to someone, because I always have Cowork running in the background. It's Cowork that really loves to do that, because it likes using a browser. A really common pattern is I ask Claude to build something, it looks in the codebase, sees in the Git blame that some engineer touched something, and then messages that engineer on Slack asking a clarifying question.
What are some tips for founders now on how to build for the future? What are some principles that will stay and what will change?
I think some of these are pretty basic, but they're even more important now. One example is latent demand. It's just the single biggest idea in product. People will only do a thing that they already do. You can't get people to do a new thing. If people are trying to do a thing and you make it easier, that's a good idea. If you try to make them do a different thing, they're not going to do that.
I think Claude is going to get increasingly good at figuring out these product ideas for you because it can look at feedback and debug logs.
That's what you mean by "plan mode was latent demand": people already had their Claude chat window open in a browser, talking to it to figure out the spec, and now that just became something you do in Claude Code.
Yeah, that's it. Sometimes I'll just walk around our office floor and stand behind people to see how they're using Claude Code.
You're surprised how far the terminal has gone and how far it's been pushed. How far do you think it has left to go given this world of swarms and multiple agents? Do you think there's going to be a need for a different UI on top of it?
If you asked me this a year ago, I would have said the terminal has a three-month lifespan and then we're going to move on. You can see us experimenting—Claude Code started in a terminal, but now it's on web, in the desktop app code tab, in the iOS and Android apps, in Slack, in GitHub, and VS Code/JetBrains extensions. We're always experimenting with different form factors. I've been wrong so far about the death of the CLI, so I'm probably not the person to forecast that.
What about your advice to DevTool founders? Should they be building for engineers and humans, or building for what Claude is going to want—building for the agent?
The way I would frame it is: think about the thing that the model wants to do and figure out how you make that easier. When I first started hacking on Claude Code, I realized this thing just wants to use tools and interact with the world. How do you enable that? You don't put it in a box and say, "Here's the API." You see what tools it wants to use and you enable that the same way you do for your users. Think about the problem you want to solve for the user, and then what is the thing the model wants to do to solve it?
What is the technical and product solution that serves the latent demand of both?
Back in the day, more than 10 years ago, you were a very heavy user and you wrote a book about TypeScript, right? Before TypeScript was cool, when everyone was deep in JavaScript in the early 2010s.
Yeah, something like that.
Before TypeScript was a thing, it was a very weird language, because JavaScript wasn't really meant to be typed. Now it's the right thing, and it feels like Claude Code in the terminal has a lot of parallels with TypeScript at the beginning.
TypeScript makes a lot of really weird language decisions. In the type system, pretty much anything can be a literal type, for example. This is super weird because even Haskell doesn't do this. It has conditional types, which I don't think any language had thought of at all.
It was like very strongly typed.
Yeah, and the idea was when the early team was building this, the way they built it was: "Okay, we have these big untyped JavaScript codebases. We have to get types in there, but we're not going to get engineers to change the way they code." You're not going to get JavaScript people to have 15 layers of class inheritance like a Java programmer. They're going to use reflection and mutation and all these features that traditionally are very difficult to type.
Those are very unsafe to any strong functional programmer.
That's right. And so what they did instead of getting people to change how they code was build a type system around it. It's brilliant because there are ideas that no one was thinking about, even in academia. It purely came out of observing people and seeing how JavaScript programmers want to write code.
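For readers who haven't used these features: a literal type is a type with exactly one value (such as the string `"click"`), and a conditional type chooses a type with a test of the form `T extends U ? X : Y`. The sketch below, with hypothetical names, shows how the two together let TypeScript type a common dynamic JavaScript pattern, a function whose payload shape depends on a string argument, without asking the programmer to restructure the code.

```typescript
// Each key is a literal type: exactly "click" or "keypress", not just any string.
interface EventPayloads {
  click: { x: number; y: number };
  keypress: { key: string };
}

// Conditional type: look up the payload for a known event name,
// and fall back to `unknown` for names the table doesn't list.
type PayloadFor<E extends string> = E extends keyof EventPayloads
  ? EventPayloads[E]
  : unknown;

// A plain string-keyed, JavaScript-style function; E is inferred as the literal name passed in.
function makeEvent<E extends string>(name: E, payload: PayloadFor<E>) {
  return { name, payload };
}

const clickEvent = makeEvent("click", { x: 3, y: 4 }); // payload: { x: number; y: number }
const keyEvent = makeEvent("keypress", { key: "q" }); // payload: { key: string }
// makeEvent("click", { key: "q" }); // compile-time error: wrong payload for "click"
```

The calling code is ordinary string-dispatch JavaScript; all of the checking lives in the types.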
Claude Code has some similar ideas. You can use it like a Unix utility: you can pipe into it and out of it. In some ways it's rigorous, but in almost every other way it's just the tool that we wanted. I build a tool for myself, then the team builds it for themselves, then for Anthropic employees, then for users. And now more codebases are in TypeScript, because it's way more practical.
Right, TypeScript solves a problem.
I guess one thing that's cool, and I don't know how many people know this: Claude Code is one of the most beautiful terminal apps out there, and it's written in React, rendered in the terminal.
When I first started building it... well, I did front-end engineering for a while. I'm sort of a hybrid: I do design and user research, and I write code. We love hiring engineers who are generalists. For me it was like, "Okay, I'm building a thing for the terminal. I'm kind of a shitty Vim user, so how do I build a thing for people like me?"
I think delight is just so important. Fall in love with the product. Designing for the terminal has been hard: it's like 80x100 characters, 256 colors, one font size, no mouse interactions. A little-known thing: you can actually enable mouse interactions in a terminal, like clicking.
Oh, how do you do that? I've been trying to figure out how to do this.
We don't have it in Claude Code. We prototyped it and it felt really bad, because the trade-off is that you have to virtualize scrolling. Terminals have no DOM; it's just ANSI escape codes and weird, organically evolved specs from the 1960s.
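The interview doesn't spell out how terminal mouse support works, so here is a hedged sketch of the common xterm-style approach, which many modern terminal emulators support (support varies). The app writes private-mode escape sequences to turn on mouse reporting, and the terminal then sends clicks and wheel events to the app as escape sequences on stdin. The mode numbers (1000, 1002, 1006) and the SGR report format follow xterm's control-sequence documentation; the function and type names are hypothetical. It also shows the scrolling trade-off Boris describes: once reporting is on, the wheel arrives as input events instead of scrolling the terminal's own scrollback, so the app has to scroll its content itself.

```typescript
// xterm private modes, set with "CSI ? n h" and reset with "CSI ? n l":
//   1000 = report button press and release
//   1002 = also report motion while a button is held (drag)
//   1006 = SGR extended encoding: ESC [ < button ; column ; row (M|m)
const CSI = "\x1b[";
const MOUSE_MODES = [1000, 1002, 1006];

function enableMouseSequence(): string {
  return MOUSE_MODES.map((mode) => `${CSI}?${mode}h`).join("");
}

// Reset the modes in reverse order. An app should send this before exiting,
// or the shell keeps receiving raw mouse reports.
function disableMouseSequence(): string {
  return [...MOUSE_MODES].reverse().map((mode) => `${CSI}?${mode}l`).join("");
}

interface TerminalMouseEvent {
  button: number; // raw button code: +32 for motion, +64 for wheel, +4/8/16 for modifiers
  column: number; // 1-based
  row: number; // 1-based
  pressed: boolean; // "M" = press, motion, or wheel; "m" = release
}

// Assumes the string holds exactly one SGR report. Real stdin chunks can
// contain several sequences, or a partial one, and would need buffering.
const SGR_MOUSE = /^\x1b\[<(\d+);(\d+);(\d+)([Mm])$/;

function parseSgrMouse(input: string): TerminalMouseEvent | null {
  const match = SGR_MOUSE.exec(input);
  if (match === null) return null;
  return {
    button: Number(match[1]),
    column: Number(match[2]),
    row: Number(match[3]),
    pressed: match[4] === "M",
  };
}

// Wheel events set the 64 bit: low bits 0 = up, 1 = down.
// This is the input an app must turn into its own (virtualized) scrolling.
function wheelDirection(event: TerminalMouseEvent): "up" | "down" | null {
  if ((event.button & 64) === 0) return null;
  switch (event.button & 3) {
    case 0:
      return "up";
    case 1:
      return "down";
    default:
      return null; // 66/67 are horizontal scroll on some terminals
  }
}

console.log(JSON.stringify(enableMouseSequence()));
console.log(parseSgrMouse("\x1b[<0;10;5M"));
```

In a Node.js program you would typically call `process.stdin.setRawMode(true)`, write `enableMouseSequence()` to `process.stdout`, and register an exit handler that writes `disableMouseSequence()`.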
Yeah, it feels like BBS door games.
That's a great compliment. It should feel like you're discovering things.
Legend of the Red Dragon! Fantastic.
We've had to discover all these UX principles for building in the terminal, because no one really writes about this stuff. If you look at big terminal apps from the 80s or 90s, they have all these windows and look kind of janky by modern standards. We had to reinvent a lot. For example, the terminal spinner has gone through probably 100 iterations.
80% of those didn't ship. We'd try one, it didn't feel good, we'd move on. This is one of the amazing things about Claude Code: you can write 20 prototypes back to back, see which one you like, and ship it within a couple of hours. In the past, you'd use Origami or Framer and build three prototypes in two weeks. Now we have the luxury of iterating quickly enough to build a product that's joyous to use.
Boris, you had other advice for builders and we kept interrupting you because we have so many questions.
I'd give two pieces of advice that are kind of weird, because they're about building for the model. One: don't build for the model of today; build for the model six months from now. You might worry that you can't find PMF while the product doesn't work well yet, but if you don't do this, you're going to get leapfrogged by someone else who is building for the next model.
Second thing: in the Claude Code area where we sit, we have a framed copy of Rich Sutton's essay "The Bitter Lesson" on the wall. The idea is that the more general model will always beat the more specific model. Never bet against the model. We could build a product feature into Claude Code (we call this scaffolding), but we could also just wait a couple of months, and the model will probably just do the thing instead.
There's always this trade-off: do engineering work now to extend capability by 10-20%, or wait for the next model. Whatever the scaffolding is, assume it's just technical debt.
How often do you rewrite the codebase of Claude Code? Is it every six months? Is there scaffolding you've deleted because the model improved?
Oh, so much. All of Claude Code has just been written and rewritten over and over. We unship tools every couple weeks and add new tools. There's no part of Claude Code that was around six months ago.
Would you say 80% of the current codebase is less than a couple months old?
Yeah, definitely.
So the shelf life of code is just a couple months.
Yeah.
That's another alpha for the best founders.
Did you see Steve Yegge's post about working at Anthropic? I think there's a line that says an Anthropic engineer averages 1,000x the productivity of a Google engineer at Google's peak. 1,000x. That's unbelievable.
Internally, all technical employees use Claude Code every day. Even non-technical ones do: I think half the sales team uses Claude Code, though they've started switching to Cowork because it's safer, since it runs in a VM. Productivity per engineer grew something like 70% last year while the team doubled in size. Since Claude Code came out, productivity per engineer at Anthropic has grown 150%.
This is crazy, because in my old life I was responsible for code quality at Meta across Facebook, Instagram, and WhatsApp. One of the things we worked on was improving productivity. Back then, a 2% gain in productivity was a year of work by hundreds of people. So 150% is just completely unheard of.
What drove you to come to Anthropic? As a builder, you could go anywhere. What was the moment that made you say this is the set of people or the approach?
I was living in rural Japan and opening up Hacker News every morning, and it just started to be all AI stuff. I started using some early products and they just took my breath away. That was the feeling. As a builder, I'd never felt that using early products. That was back in the Claude 2 days. I started talking to friends at the labs to see what was going on.
I met Ben Mann, one of the founders, and he immediately won me over. Anthropic operates as a research lab, so the product was tiny. It's really all about building a safe model. That idea of being very close to the model resonated with me. The second thing was just how mission-driven it is. I'm a huge sci-fi reader; I know how bad this can go.
When I think about what's going to happen this year, it's going to be totally insane. In the worst case, it can go very bad. I wanted to be at a place that really internalized that. At Anthropic, if you overhear conversations in the lunchroom, people are talking about AI safety. That's what everyone cares about more than anything.
What is going to happen this year?
Think back six months: Dario predicted that 90% of the code at Anthropic would be written by Claude. This is true. For me personally, it's been 100% since Opus 4.5. I uninstalled my IDE. I land about 20 PRs a day. Across Anthropic, it ranges from 70% to 90% depending on the team. For a lot of people, it's 100%.
I predicted in May that you wouldn't need an IDE to code anymore. People gasped, because it seemed like such a silly prediction at the time. But really, all I was doing was tracing the exponential. That's deep in the DNA at Anthropic, because our founders were co-authors of the scaling laws paper. Coding will be generally solved for everyone.
I think we're going to start to see the title "software engineer" go away. It'll just be builder, or product manager, or maybe we'll keep the title as a vestigial thing. But the work won't just be coding; engineers will also be writing specs and talking to users. Every single function on our team codes: PMs, designers, our EM, even our finance guy.
This is the lower bound if we just continue the trend. The upper bound is a lot scarier: say we hit ASL-4. At Anthropic, we talk about these AI Safety Levels. ASL-3 is where models are now. ASL-4 is when the model is recursively self-improving. If this happens, we have to meet a bunch of criteria before we can release a model.
The extreme is some kind of catastrophic misuse, like designing bio-viruses or zero-days. We're actively working so that doesn't happen. It's been exciting and humbling seeing how people use Claude Code. I just wanted to build a cool thing and it ended up being really useful.
My impression from Twitter is everyone went away over the holidays and then found out about Claude Code and it's been crazy ever since. Is that how it was internally? Did you have a nice Christmas break and then come back to what?
For all of December, I was traveling around and took a "coding vacation": I was coding every day. I also started using Twitter then, because I worked on Threads back in the day. I think for a lot of people, that was the moment they discovered Opus 4.5. Internally, Claude Code has been on an exponential tear for months.
Mercury had a stat that 70% of startups are choosing Claude as their model of choice. SemiAnalysis said 4% of all public commits are made by Claude Code. It wrote the code that plotted the course for Perseverance, the Mars rover. That's the coolest thing for me. NASA chooses to use this thing. It's humbling, but it feels like the very beginning.
What's the interaction between Claude Code and Cowork? Was it a fork? Did you have Claude Code look at Claude Code and say, "Let's make a new spec for non-technical people"? What's the genesis of that?
This is my fifth time using the term "latent demand." We were looking at Twitter and there was that one guy using Claude Code to monitor his tomato plants. Another person was using it to recover wedding photos off a corrupted hard drive. People were jumping through hoops to install a thing in the terminal just so they could use this.
We knew we wanted to build something for those people. The thing that took off was just a little Claude Code wrapper, a GUI in the desktop app. It's just Claude Code under the hood, the same agent. Felix, an early Electron contributor, built it in about 10 days. It was 100% written by Claude Code.
It felt ready to release. We did have to build things for non-technical users: all the code runs in a virtual machine, there are protections against deletion, and there are guardrails for users.
Garry Tan: Boris, thank you so much for making something that is taking away all my sleep, but making me feel like I'm in creator mode again. It's been an exhilarating three weeks.
I can't believe I waited that long since November to get into it. Thank you so much for being with us.
Yeah, thanks for having me. And send bugs.
Sounds good. Come on now.