The questions that you ask the agent is going to affect the quality of the thing that you get out.
This is Michael Bolin, tech lead for the open-source Codex repository and former Distinguished Engineer at Meta. And we talked about how his career grew.
So I was like, "All right, hackathon. Totally going to like make a new build system." It relatively quickly had something that was dramatically better, like at least twice as fast.
He shared about how OpenAI engineers use Codex.
I almost feel a little bad writing code by hand.
Well, what percent of your code would you say is human written?
Because every time I sit down, I'm like, "Should I write this?" The answer is almost always no.
And we discussed AI research versus big tech engineering cultures. What are your thoughts on that? Research-led culture versus engineering culture.
It was certainly an adjustment, right? I think if anyone who comes here and says otherwise is a liar, but here's the full episode.
I was looking deep into your website and there was something—most everything I could find more information about—but there's this one thing that you seem to be pretty excited about at the time, but I couldn't find any information about it because all the links were dead. What is Chickenfoot?
Oh man, that's strangely relevant because it was my Master's thesis project. It was a Firefox extension, probably one of the very few that was written in JavaScript for Firefox as a thesis project. And it was a little coding tool in a sidebar of Firefox and it was like a little REPL. It was called "end-user programming for the web."
The idea was end-user programming for the web. And so there were functions like `enter` and `click` and things like that, and you'd say `enter` and pass a string argument and it would like find the box and you'd say `click "search"` or whatever. A lot of the work was all these heuristics that we built under the hood that if you said `enter "first name"`, finding the words "first" and "name" and finding the text box that was closest and then using that as the input.
I just think about that now because it's really funny. We did a lot of work, and that's a lot of what these agents are doing right now, right? It's similar, but now truly in natural language, not in this little JavaScript REPL.
Oh interesting. So it parsed the frontend and had this REPL where you could say, I don't know, "find the first name field" and—
Yeah, we'd use like the accessibility tags and alt text for images, and it worked well. It was really good at Craigslist because that's one of the simplest websites you could use. But I had friends who made automated tasks and made real money off of things they automated with this tool, which was really fun.
When you entered the industry you were really excited about going to Google and specifically working on Google Calendar. So what drew you to Google and what was that like?
Yeah, so I got online in the '90s, right? I remember browsing the web and it was at a time you'd try five different search engines to hopefully find the thing that you wanted. And it's funny because I distinctly remember my roommate in March of 2000 saying like, "Hey, there's this site, this search engine that looks a bit better." I think it was still `stanford.google.edu`. And I was like, "Oh, this is better."
And then you started reading and they were just so different, right? Yahoo was all very cluttered and they tried to be very sparse and were more principled about it at that time. And then like a lot of things, you start seeing people who get a job there and you're like, "Oh man, they're taking really good people, I want to work with those people."
I felt like they just really got the web. Especially at that time, Microsoft killed the Internet Explorer project altogether and I was like, "This is the portal into the web and you guys are dismantling it." Whereas Google was way more web-forward. And so, between that and the quality of engineering and the impact they were having, certainly a place that I was really excited to go to coming out of school.
And what was the culture like there at the time? Like I think in some of your writing, you talked about product versus infrastructure.
A lot of companies, especially ones that get big like that, it's kind of like whatever they were good at first is what the founders always have a soft spot for, right? So certainly in information retrieval and infrastructure, right, those were key to growing that company.
Whereas I was drawn there at the time because they put out things like Gmail, right, which were big, right? But it still didn't have the same cachet inside the company as Search and that sort of thing. And so when I was working on Calendar it was still mostly consumer. It was also sold to enterprises, right, but we were not the ones making the money. We were probably a cost center at that point in time.
I saw eventually you ended up leaving Google and from your post it looks like it was pretty bittersweet. So what led you to leaving Google?
I had been there for four years. And you know, realistically, vesting was a thing. Finances definitely changed after that four-year period ended. But also it was just tough. I had a bad habit at that point of working really hard on things that were important to me but maybe weren't important to Google, right?
So I worked on Calendar, that was pretty important. And then I worked on Google Tasks which was a very small feature within Calendar, right? Like probably at least two to three orders of magnitude fewer users. But I was really passionate about it. And then I was really passionate about the JavaScript infrastructure and the Closure suite of JavaScript tools.
I enjoyed and I'm proud of all the effort that I put into those things. I went and wrote the book on Closure because I was motivated. But career-wise, maybe not the best move, right? You're like, "But I'm doing all this high-quality stuff," and you see other people getting recognized. You're like, "But I'm working so hard." You know, that's kind of like the "harder not smarter" type of mistake. And so it's kind of like I should go somewhere or try something else and see if the things that I'm excited about are the things that the company I'm working at is most excited about.
Later you came back into big tech and you started working at Facebook. I understand that you were kind of a JavaScript expert at the time and then one of your first big projects at Meta was kind of build tooling in the Android codebase. So I want to know the story behind how you got involved in that.
So at the time it was like, "Facebook's going to make a phone," right? There were some failed projects but it was like, "This time it's really going to happen. We're going to partner with HTC. We're going to fork Android and do some stuff." And so that seemed super exciting as a person just coming into the company.
I had done quite a bit of Java. I was more JavaScript, but at this point, this is also where they called it "Faceweb." The version of like they kind of put HTML5 Facebook on the phone and that was clearly not working. And it was clear that mobile was going to be the future. That was kind of make-or-break for the company.
And suddenly a friend was like, "Hey, I know you really like JavaScript, but you should really pick Java or Objective-C and get good at this if you want to be a product person." And that was really great advice. And so I was like, "Well, I like Java. I don't like Objective-C. So let's go." And so that's how I found myself on that project.
We had a very short timeline because unlike almost every other project, there was a hard deadline. Usually you ship when it's ready. This was like, "Nope, we got to send a build to HTC because they're going to burn it into phones on March 1st" or whatever it was. So it was really a scramble.
The original Android codebase was—or a bunch of it was also inherited from a contractor that Google paid. Facebook didn't want to make a native app and they paid some guy and then it was like the app in the store and they're like, "Here, the code's yours now," and we should have thrown it away but we didn't. And there were a lot of things that were frustrating but also iteration time.
I think for everybody, if you had been a web developer for such a long time you're used to this "edit-refresh" and then the Android build system was rough. It was some build tools and Ant. There was no way to modularize it out of the box. We had to hack it up just to even have four modules or five modules or something like that.
And it was so painful to get anything done that I was like, "I need to fix the build system." I know I've done a lot with Java. I know it's not fundamentally this slow to iteratively build this sort of thing. And so Facebook has this hackathon culture. So I was like, "All right, hackathon. Totally going to make a new build system," unceremoniously in the style of Google's build system.
And there's another build system called FB Build that was already a different mirror of the Google build system as well. That one's written in Python. I only did C++. And I was like, "Either I'm going to make this work or I don't even know if I'm going to make it here because it's just going to be tearing my hair out working on this thing."
So if you didn't fix this, you would have quit the company?
I was like, "Or I'll at least switch projects or I need to find a way to find a happy place and come into work every day." I want to write code. I want to do work. I just want to try to do my best work. And so it was funny, I give people credit. A lot of people told me what I was doing was a very bad idea. Almost everybody except one person, a senior Android engineer. But nobody said "no."
I felt like at Google people said "no" more. And so I just rolled with it and I quickly—relatively quickly—had something that was dramatically better, like at least twice as fast or something like that. And so that turned a lot of heads and people were like, "Okay, I guess we'll go with that one."
The interesting thing to me is it seemed like all the odds were stacked against you. A lot of people who are an engineer noticed the same problem would not have chosen to start the project because there's an existing one and also Google's got some competing one, maybe you're not going to beat that one. What gave you the conviction that your project could beat the other ones and become the default?
Yeah, I guess a couple things. I mean, one, like I said, I'd done other Java stuff. I was like, "This shouldn't be that slow." Or just as a software engineer, this fundamentally should not be as slow as it is. And most of the pushback was of the nature like, "Well, if we deviate from the standard thing, we're not going to be on the standard thing. And what if it gets 100x better next week and now we're stuck on your thing?"
Which is really funny when you think about it because they were making their own PHP virtual machine and language. They've certainly embraced doing their own thing many times. Why this one was different, I don't know. Everything about mobile was like, "How are we going to make this work?" I guess there was just a lot of fear.
And again, like timelines, it's like, "We have the senior person, why? It sounds like they're going to go off and do a science project but we have a hard deadline. Is that the best use of our time?" But it worked out. I will say another thing too was I also tried to couch it a little bit. I was like, "Hey, I'm just trying to make an Android build system. I'm not trying to take over the company. I'm not trying to change anyone else's project."
Because I certainly felt that that was going to invite more friction. And so I knew that—I tried to make it in a way that it could support more of the company, but I never pushed it. And so I was really heartened when a year or so later the iOS people were like, "Hey, our build system sucks. Can we use Buck?" I was like, "Yeah, sure. Come on board."
Yeah. That's also another interesting thing because you came in and you didn't have a whole lot of credibility. You were just hired and then you're trying to build something and everyone's saying "don't do this," and then you have to convince them, "Hey, this is the right direction." How do you influence that change without any credibility?
I borrowed some. There's one senior Android engineer who was also ex-Google, John Perllo. He was like, "You're going to do it, just do it fast before anyone cancels you." He's like, "But I'll support you if you get this done." So he was one of the first people, and he was a very prolific coder. So he was genuinely very happy to have a faster dev cycle. He was one of the first people to give it kudos and I think that helped a lot.
But I will say I made some big mistakes on my way in there. You mentioned about having no credibility—I came from Google and I was like, "Oh, this is where the Bell Labs people are, all these great people." And then you go to Facebook and you're like, "Oh, it's a bunch of college kids. How could they possibly know what they're doing?" And I made the mistake a few times of saying, "Well, at Google we did it this way." And people were like, "We really don't care." And they were right in most cases. Just because it worked there didn't mean it was going to work here.
One last thing on this topic is, I mean, what you built was way more performant than anything else by many multiples. What's the technical intuition behind what you did that made it so much more efficient?
I think the big thing was that I just sat down and looked at the tool from Google and I was like, "What is it doing right?" And I think the big thing is that the Google one would kind of just if you changed anything it would just start over from scratch. And so that's why it was so slow, especially for incremental builds.
And so then I started really understanding, "Okay, but what depends on what?" Because there are still these Android resources which had a very bespoke thing and it was complicated. And because it was somewhat complicated, I think that's probably why by default it blew everything away and started over.
But once I got in there and I was like, "Oh, okay, we can take advantage of it. If these things don't change, then we can cache this result from this step. We don't have to redo it." And like suddenly that made things quite a bit faster. And then also just even supporting the idea that you could have more than the four modules that we had.
I think every time someone added a module they had to add 200 lines of XML of Ant build script that nobody really understood and so no one wanted to modularize anything. You didn't want to be responsible for that. So a big thing with Buck was making it that it was a lot simpler to add a new module. And so then that also meant we ended up with more modules and then builds are more incremental as a result. So it's really a change in the mindset.
So less redundant work.
Yeah.
After you solved these build problems in Android, you went further into other parts of the company. I saw that you started to work more on the IDE. What was the problem that you saw in IDEs that made you want to get into that?
So I did a brief stint after Buck on Messenger iOS. So I was like, "Okay, I've done Android, maybe I should branch out even though I don't like it." And I still didn't like it. People don't even realize in Objective-C, there's a thing that happens for a long time now called ARC, Automatic Reference Counting. Nowadays the compiler injects it, but you used to have to add code into Objective-C to do every reference count and memory allocation.
Most people have never seen this type of code, but the iOS Messenger code that came in through acquisition was so old that it was still written in this other way and it was incredibly painful. I guess now you'd have Codex clean it up or something like that, but at the time we just sucked it up. And Xcode just didn't feel right to me. I didn't like header files and implementation files. I still don't.
And also for both Android and iOS, Facebook always had the biggest app. We had one app with every feature in it and we would ship it, as opposed to Google where they'd have a Drive app and a Sheets app or whatever. But they also owned one of the platforms so they could put 20 apps by default on there. And so what that meant was Facebook was always hitting the scaling limits of every mobile developer tool before everybody else.
Which was painful, but as a devtools person was kind of interesting because we got to solve problems that nobody had solved before, and not just as a science project but because it was real business value. And so with Xcode similarly we talked to Apple like, "Hey, Xcode is not really scaling to our project." They're like, "Your project's too big. You should make it smaller." That was kind of the feedback that we got from them.
And so it seemed justifiable to go and build an editor. It was a similar thing. I was like, "What is an IDE doing right?" It's talking to Clang, the language server and that sort of stuff. I was like, "We could build a nicer shell on top of that." Then we had started by that point deviating from Git and switching to Mercurial as a company. I was like, "No one's going to support Mercurial out of the box, right?" And if Buck is going to be our build system, Xcode is never going to do all of these bespoke Facebook things. So it seemed justifiable to invest the time to try to improve that experience. I didn't feel that way about Android because IntelliJ was quite good and we had figured out how to make that work at scale, but Xcode was a little more difficult at that point.
So there was Xcode which was bad and didn't fit the existing needs. And then there was another IDE that another team was building. I think it was web-based?
It was web-based. I'm not laughing at that part, that part's fine. But it was built off of an abandoned Google open-source project written in GWT, which is Google Web Toolkit, where you would write Java and it would codegen to JavaScript for everything. And I tried to build on what they had. I tried to build some credibility, even sped up their build a bit.
But again, kind of looking at the iteration times—and also, I was like, "This is an abandoned open-source project, it's written in Google Web Toolkit, and we are the React company at that point!" Why would we not empower people who want to build devtools to build on the technologies that we are the leader in and really because we think we're really good?
So I was like, "You guys are crazy." And so I started—and it's similar to the thing where I mentioned with Buck—I started it as the "Java build tool," not the "everything build tool." It was a similar thing. I was like, "Hey, I'm going to go over here and start this other editor, but we're just going to focus on iOS. I'm not trying to take things over."
Yeah, you didn't want all the friction. And that team, they had all the existing users though, right? They had thousands of engineers.
Maybe a thousand. I think eventually leadership sided with what eventually became Nuclide, which is what you built.
But why did they side with you when you didn't have any users on your product?
Yeah, I think it was a combination of two things. One was the arguments that I made in favor of the technology stack to build on. Another was that Nuclide was a desktop application. Part of it was, if this is going to be an iOS Xcode replacement, people are going to want to be able to talk to the simulator or plug in the phone or whatever. In theory, we could go through the web and do all those things, but it just seems like a lot of extra work. And I think the other part is that I had built up some credibility with the Buck thing that they were willing to take a gamble. Like, "Well, the last thing worked out, we'll try this one."
I saw that this, in combination with some of your other work, led to your promotion to E8, which is also known as Principal in the industry. What was your reaction to that at the time?
Yeah, certainly I was very excited. I felt like in the ways that I had been out of step at Google—this was also validation that now I'm growing not just technically but understanding what it means to do the things that are in line with your employer. So that also just was equally valuable and satisfying.
I know that Nuclide was open source and I think Buck was as well. What's the rationale for open sourcing? What are the pros and cons of open sourcing technology that you build?
Yeah, let's see. Buck is the more interesting one. Nuclide kind of didn't really get adopted externally. I think in both cases, all these companies have benefited so much from open source that there's just a feeling of, "If this is not really the 'secret sauce,' let's share this with other people." We've done Codex as well and a number of other things in my career, so I do feel like that sharing of information is just good. Even if no one uses your tool, just seeing it as a reference of how a thing could be done is valuable.
In the better scenarios, you get meaningful contributions back. I remember Uber adopted Buck, Airbnb did. Facebook was the biggest app so we would hit all these problems first and then this next wave of companies would start these things and be like, "Let's check out what these guys did."
Also at Google, internally it's Blaze, externally it's Bazel. So we were "open source first" and we always did kind of wonder if we put some pressure on. Ultimately I think we've gotten a little bit of credit from that unofficially from some of the folks who worked on that. But I think also it's a recruiting thing—showing that if you want to be at the leading edge in whatever area this technology is and you want to do this all the time, this is who we are.
And this decision to open source—was this a bottoms-up decision where engineers said, "We're just going to do this," or is this also leadership buy-in?
Yeah, I think in both cases it was certainly bottoms-up. Things like React and PyTorch—those are the big success stories where the value back to the company is unquestionable. And then there's this longer tail of things where depending on the economy and other things, managers get grumbly if they feel like their engineers are doing too much open source. So it's almost always bottoms-up, I think. But it wasn't met with a lot of resistance and you usually get a good conference talk or two and a nice blog post, and those blog posts do pay dividends over time for recruiting.
Okay, so you got your E8 promo at this point. I imagine the expectations in your mind are going up and now you need to find an E8 problem. So what did you do after you got promoted?
I think that's where I got a little over my skis and I tried to help with Web speed. Again, it was a thing that was a really big problem—just the load time of `facebook.com` was not in great shape. The architecture was a bit stale and the problem was so big. I didn't have the background—a lot of people who had worked on Web at Facebook had worked on it for a long time, and I had put myself in Mobile and DevTools.
I remember sat down with another person and we started compiling V8 from source and trying to see if we could change the way that we generated JavaScript so it would be friendlier to V8. We were just trying wacky things, none of which panned out, by the way. I'm better in a project that involves writing a lot of code from the beginning, and I think a project like that was more about looking at data and talking to a lot of people, and that's just not my strength.
I think you mentioned at this point in your career the idea of a "hero quest." What's that mean?
The idea that there's something about ego—this idea that there's this Gordian knot and all these engineers are like, "If only someone would come in and solve this engineering problem." I was like, "I know JavaScript, I'll just come in that way." But I definitely did not. And I think that's been an important learning. I've had to relearn it at least one other time: that I can do a lot of things, but there is a smaller subset of things that I genuinely enjoy doing and that I'm going to be a lot more successful at. I try to expand that over time, but we don't all have to be the best at everything. We should accept it and embrace it.
So then how did you find the E8 problem after that?
That was a bit of luck. We'd have these summits with smaller groups of engineers about brainstorming what's going to bite us in the future. The repo keeps growing and they're going to hit some scaling problem at some point. A person who became my manager, Brian O'Sullivan, put some people together to work on making a virtual file system to get ahead of that problem. So we got myself, Adam Simpkins, and Wes Furlong—both tremendous engineers. I was the worst engineer on the project for quite a bit.
You mentioned a virtual file system. On a high level, what's the benefit to a company like Meta for something like that?
If you have bought into the monorepo philosophy—put all the code in one repo—most things that people do only need a subset of that repo to look at the files at any point in time. And so the idea is that you design all your tooling around this virtual file system so that when you clone the repo or update to a different commit, you don't have to write out every single file in the repo on disk. That is going to grow proportionally with the size of the repo, so at some point you're going to be very sad.
There's two parts to it. One is building this virtual file system that's like, "Hey, I know the user's at this commit, if the operating system asks me for the contents of a file, I can go get it and it will appear like they had laid out all the files." The other part, which is the part that I was better at, is anticipating: "Well, we have all these tools in our toolchain that are used to just reading all the files—you Ripgrep and it just reads everything." How do we start changing our development flow and our tools such that they are designed with the virtual file system in mind? Because if you just materialize all the files with your tool, now you've lost all the benefits that you've made.
So on a high level, it's just lazy loading a huge file system. It's more efficient because you don't need to do everything at the beginning.
Yeah.
And so that part of this project that you said you're better at is integrating everything on top of that primitive.
Yeah. One thing that I did with Hansen Wong, who's here on the Codex team with me now, was: the traditional way you want really fast file search in your IDE, most of these things walk the entire file system to find out what the files are. I was like, "Well, that's going to be a serious problem, it's going to undo all the benefits." So first it was, "How can we implement file search that's not going to undo all the benefits?" And then, "How can we do it even better than how it's done today?"
And so we ended up building this file system called "Miles" for "My Files." On a cron job it would ingest all the new commits that had come in on trunk and keep track of which files had been added or removed—just the names of the files, not the contents. And then Hansen had some clever ideas about how we maintained that index such that we could support fuzzy file matching. Instead of just substring matching, you want to be able to type just the uppercase letters or if your spelling's bad.
It came up with this really interesting way to represent kind of all the files that had been seen at some point, and then some bits to represent if I'm at this commit was this file present or not, and then when you sent a query you'd send what commit you're at and if you've added or removed any files locally. I think we got this over a million files in like 10 to 20 milliseconds. So you'd be in your editor typing and it was way faster than what Xcode or VS Code would give you out of the box.
That was solving a problem for Eden, which was the name of the virtual file system. It was so fast and it was available as a Thrift service internally, people started using it for all sorts of other things. When I left there were 30 servers running Miles spread around the globe, so clearly that was more than just people personally searching for files.
Interesting. Most people don't use LeetCode on the job, but that sounds like—how do you put that? Is it like a tree or a trie?
No, that's what's funny. This was pretty cool in that we had kind of two parallel arrays. One was the file contents and one was an index into that. And then we had a 64-bit mask—so you had 26 lowercase, 26 capital, 10 digits, and maybe a dash. And every bit was set if that character was used at all in the file that you were searching.
The first thing was we could blow through that list and exclude a bunch of things right off the top. But it was also very designed so all these arrays were in parallel to each other so that for cache-wise we knew it'd be very efficient for the CPU to read memory linearly and then it lent itself to parallelism. I should really probably write it up at some point because it's really cool. It wasn't just "out of the textbook," that's for sure.
You worked on Eden and Miles, and then these eventually led to another promotion. Prior to that promotion, there were some learnings you might have had about influence and conflict in the org.
Yeah, that's one of the things about being an E8 who primarily writes code. The majority of people at that level or higher are not writing code; they're exclusively spent doing more influence or working across teams, writing the big Google Doc and getting everyone on board. As an E8 trying to have that level of impact, it's hard to do that just writing code. So I was like, "I need to spend at least some time influencing other people."
Sometimes you're just so confident in some insights you have that—certainly I was—that I would just come down way too hard. That did not go well for me and promo got delayed. I was very anxious about when Microsoft acquired GitHub because Nuclide was built on this primarily GitHub technology and I was like, "That's going to go away because VS Code's going to make them not be a project anymore," which did happen.
I was just so anxious that this was a risk, and I was pushing people. But I didn't really account for the fact that people were happy with what they were doing and didn't want their cheese moved right out from under them. And so I got a bit of a talking to, I had to wait a little bit for promo. I got some coaching after that point to work on that specifically.
What's the number one thing you learned about coaching?
I think I'm more aware of things that trigger me, whether they be technical decisions or what have you, and recognizing when that's happening and being like, "Okay, let's not act in that moment." Or if I don't think I can have the conversation or I'm not in the best place to have it, maybe I go talk to the person's manager instead rather than being a "bull in a china shop" to the engineer. Like, "I have this thing, I'm thinking about it this way, I'm kind of riled up about it, help me how I can work with your team."
It's interesting because the promo got postponed, you saw that VS Code was going to come up and the thing Nuclide was built on was going to go down, and you were right in hindsight. So what are your thoughts when you saw things play out and you go, "I was right the whole time"?
I did have some conversations about a year later and I was like, "Hey, can we balance the ledger a little bit? I bit pretty hard for that thing and it was kind of good." We worked it out.
So the learning is just how you went about it, not what you were saying.
Yeah.
You seem to have a great time at Meta and eventually you left. So what was the thing that drew you to OpenAI?
Yeah, a number of things. I interviewed at the end of 2023 with OpenAI. I had spent 2023 at Meta trying to do LLM-based developer tools. We had our own little version of GitHub Copilot, "CodeCompose." We did a paper and a talk on that. There's a lot of enthusiasm around delivering quickly and pushing the boundaries, and truthfully, you get feedback on some of these things and it's like, "Why is this not GPT-4?" and I was like, "Well, we're on Llama 2, it's not the same thing."
I was not a researcher, I just wanted to build the experiences. And so I wanted to go to the place where I could build with the best model. Secondly, it was seeing the people who were coming here to OpenAI—people I really respect. I felt like I could work with more senior people at OpenAI than where I was. And third, this has been just a very special place at this point in time. I felt like this would be the most similar to starting at Google in 2000—they got some footing and some product-market fit.
And the last one was that I really enjoyed Calendar because it was consumer and I shipped to a lot of people. I went to Facebook because I thought it was a huge consumer place, but ended up doing developer tools where my users were my friends at work—20,000 people, not a billion people. Thinking about OpenAI and the chance to come back to consumer or at least have a large user base—working on Codex, I wonder if it's over a million weekly actives, and it just keeps growing. Vertical line rather than a hockey stick.
Dev tooling for the industry, almost.
Yeah.
Meta is very engineering-driven, very bottoms-up. I feel like a lot of the AI companies are also like that but on the research side. Rather than "engineer is the first-class citizen," it's "let's make sure the research goes well," and for good reason—that's why the models are good. As an engineer, you mentioned you weren't doing research. What are your thoughts on that research-led culture versus engineering-led culture?
It was certainly an adjustment, right? I think if anyone who comes here and says otherwise is a liar. But when you talk about impact, which I think is important—I love the work that I do on Codex, on the harness, but if the model weren't very good it wouldn't really matter what we did on the harness. So that's how it is.
But I feel really great. We sit right next to the research team and work really closely with them. That relationship, getting to co-develop the thing, was another reason for leaving Meta in the LLM space—I want to build the product with the people who are building the model so we can do this thing together.
When you got here it sounds like you were working on Codex and starting that project up. I understand with the initial launch of Codex CLI it was not exactly what you hoped for in terms of how it was received, but later it all came together. Can you tell that story?
Yeah, sure. It's been a wild ride. So Codex CLI, we launched it in April 2025. It was kind of this "one more thing" moment at the end of the GPT-4o mini live stream and we demoed it live. We open-sourced it. A lot of people tried it out, everyone was excited to try a new coding agent and it was pretty good, but it was pretty rushed to get it out the door.
Which in some ways was good for engagement because now we're open source and we were getting pull requests coming in all over the place—I feel like we were at 10 to 20,000 stars in a week or two. That part was fun, but I think we weren't quite staffed to really drive that the way we needed to. Just a month after that, a team of seven engineers and researchers launched Codex Cloud Web, where you use Codex in a container or just kick off a new thing from your phone, which is super cool.
That was a more well-staffed effort and I think the long-term vision is correct, but with a lot of this stuff, you have to bring the users with you. I think that one's just a little bit ahead of its time and people were still more into local coding agents. We saw the Web product have big adoption initially, but it wasn't as sticky as we had hoped. Then through the summer, local agents were still the stronger product-market fit.
In the limit, you're going to need more machines than just your laptop as a place for agents to run. So we shifted quite a bit over the summer. We brought more people onto the Codex CLI. GPT-5 was going to come out and that was looking really good. And a thing I was personally excited about because I had prototyped it a couple times before was that in addition to the CLI, we also started working on the VS Code extension.
I felt very strongly that the terminal is good for a lot of things but it has limitations; you make a lot of compromises to make a nice UI in the terminal, whereas in VS Code we didn't have to make as many compromises. August was a crazy month. GPT-5 came out, we released our new refresh terminal UI, also the GPT-4o open-source model came out and we supported that in the TUI as well. Later that month the VS Code extension came out. We were just shipping like crazy. That confluence of things is where we started to see the inflection point that's brought us to the vertical growth that we're at today.
You mentioned the local versus the remote version of these coding agents, and sounds like you have a lot of conviction that the right long-term direction is remote and in the cloud. Why is that?
Well, I think about people for whom it is sticky now: imagine you want to automatically set the agent on every GitHub issue or Linear tile that comes in. Obviously there are costs, but for an internal private repo you want it to be a piece in any sort of automation pipeline. You can't have all that happening on your laptop. As an individual I see maybe we'll still personally spend more time with the local agent, but in terms of compute time of agents doing work, getting that set up in the cloud is quite nice.
I see. So you're not saying the local product will change, you're saying that across the industry, the compute that goes towards agents, the majority will be in the cloud.
Yeah. Even when the VS Code extension first came out, one of the things was the ability to take the conversation you were having and hit a button to have it transfer to the cloud if you were set up to do it. I think we'll continue to see that where maybe you're working on something and you want to throw it over there and have it bring it back when it's done.
Your Codex usage is 5xed since the beginning of the year and over a million people are using it now. I'm curious, has your AI workflow changed a lot since you started to use this newer version of Codex?
Yeah, it has. I'm a much bigger user of the app now than I thought I would be. For a while I was very strongly in our VS Code extension—I want that sidebar, I want all the code there next to me. I'm not a person who doesn't look at the code. For projects that are true prototypes that are throwaway, I will not look at the code. It's very freeing and I understand why people are so excited about it. But for the code that goes into Codex itself, I'm like, "No, I still need to look at this, this is pretty important."
But you start to get a sense of, "Okay, I'm confident the model's going to be able to do this change." I don't need to babysit it. I'm going to just write a lot upfront. I have like four or five clones of the Codex repo on my machine. I have enjoyed in the Codex app the multitasking—it's just a lot easier because now you're just kind of hanging out in one window. It's like, "How much throughput can I get in terms of how many balls can I juggle in the air?" Sometimes it can feel a little hectic.
Context switching.
But at the same time you're like, "I'm getting a lot more done." I almost feel a little bad writing code by hand sometimes because you're like, "I could have asked this in the right way." When you started it was like, "Oh, I'm just going to change these three lines," and then 30 minutes later you're like—we all like to type still, I think.
What percent of your code would you say is human written versus model generated these days?
Oh man, it's probably 80 to 90%. I mean, yeah. Especially debugging a test or if a CI thing is bad, I'm like, "Hey, how to write print debug whatever," and that's great. That's really freeing.
Digging into the problems that are suitable for LLMs and the ones that are not—what do you need to see where you think, "Okay, I need to go in and write it myself"? What's that 10% and what's in that 80-90%?
Yeah, that's a good one. I think about that because every time I sit down I'm like, "Should I write this?" The answer is almost always "no." There's things that are lower level—the Codex harness part that I spend my time on is in Rust and that means we can do operating system specific things in that codebase.
I spend a lot of time on sandboxing—the thing that really upholds the security integrity of what we're doing so the model can't go outside the bounds that you set. I do more of that by hand because I need to be really sure that's all correct, that our test coverage is good. Sometimes I'll seed it and then once I've got the groundwork and the pieces that I had a lot of feelings about, I let it fill in the rest. But a lot of like refactors and building up big PRs—I'll be like, "Okay, please split this up into reviewable-sized commits." I think about how much time I spent on that sort of stuff before, and now it frees you up.
What about code review? What percent of lines of code are people reviewing manually versus agents?
I like the approach where the agent should do multiple rounds of review until it's confident that it's worth a human's time looking at it. But we still do look at it before it goes in. Generally speaking, we have our `agents.md` file like everybody else does. Sometimes you find a gap in knowledge or some context that needs to get added back in, things that we haven't memorialized that as a human I still happen to know.
But I'll say now that people are using AI to write their pull request summaries, our summaries across the team are getting way better. When I'm going into review, it's been reviewed by Codex, there's a summary that has the "why" and the "what." That is helping get through these PRs faster, which is good because there was a lot more review to do.
That's amazing. 50% of diffs used to have almost nothing in the summary. I want to talk about the Codex CLI being open source. Why is it open source for something that's that critical?
You're going to put this thing on my machine, I care about what it's doing. In this domain in particular, it's really important that people can look at it. People have a lot of questions about AI agents, and I think in this area it's really important. Also, we have gotten a lot of great contributions and bug reports that we would have missed out on. And sharing with the world how this is done—we do it through code. I've put out one blog post about how the agent loop works, and there's a plan to do more of that. It's just that time is the limiting factor.
It was funny because I had two candidates who came through and one's like, "Hey, I wrote it, right?" I was like, "No, no, I wrote it." And another person came in and said, "Oh, you can tell that you did not outsource the coding." I was like, "Oh, thank you."
You mentioned the blog post. How does Codex find what is available to it in its environment? It's kind of amazing—it's thinking to itself and discovering all these things in my terminal. How does that typically work?
There's a few ways. Obviously, what Codex is based on—it loves to use Ripgrep very well to find all sorts of things. And then if you have your `agents.md` file and you say, "In this repo, these tools are really important, you should use these," or the READMEs. Or obviously if you use MCP and associate those MCP servers with where you're working, that injects the set of tool definitions at the start of the conversation. That's not even discovery on Codex's part, it's kind of just put front and center there.
I see. Some of it's the harness explicitly throwing that into the context, and then there is a big chunk though where the model is just doing all the heavy lifting of finding things.
Yeah.
Reflecting over your career, the breadth and depth of your work is insane. You were JavaScript frontend, then you have all these devtool build, fuzzy file search, virtual file systems, now you're working on Codex. What are the top technical books that have helped you educate yourself?
One was this book on operating systems, it's like a thousand pages. It's the Addison-Wesley book. I'm trying to remember the author, but it was funny because I was working on the virtual file system project and I had gotten to that point in my career without ever writing C—undergrad was more theoretical. And yet I'm working on a virtual file system project. That's why I joke that I was the worst engineer on the project.
Someone said something and I realized I didn't know what they were talking about. This is kind of embarrassing. I was like, "What book do I have to read?" and my manager, Aaron Kushner, was like, "Well, there's this thousand-page book." I was like, "Done." I bought it, read the thing cover to cover. I took it to Hawaii with me, I took it everywhere until I finished it. It's amazing and sad how far many of us can get in software engineering without really having a clue how computers work.
There's just so many levels of abstraction. On one hand it's very freeing, and on the other hand it's a little bananas. What I would say to people right now is: actively try to go deeper through the layers and understand these things. Many times I saw other people do it and now I can do it—there are problems some other people could solve that I couldn't solve because I didn't know there was this croft between these two layers. If you got rid of it, you get a 10x improvement. If you're operating so high up, you don't really know what you can break down.
In terms of books, I've enjoyed the O'Reilly Rust books—big fan of Rust and they're well-written and thorough. And honestly another thing that's not in the book category but more fun: CTFs or "Capture the Flag" security type competitions. It helps with an adversarial mindset. It's like a computer decathlon—there'll be multiple challenges and maybe this one you need to understand assembly and this one you need to understand what someone's janky PHP admin pages do. It forces this breadth on you in a way that's kind of hard to generate otherwise.
Can you give some context on what a CTF is?
It's usually a competition in the infosec/security domain. There's the "Jeopardy style" one where there's a bunch of challenges designed ahead of time with point values associated with them. You're trying to solve these things and discover a "flag"—a secret piece of text. If you can discover it, that means you figured out or reverse engineered what you were supposed to do. It's kind of like an escape room but in your terminal.
And your recommendation is that people who want to become better engineers should invest in doing some of these CTFs.
Yeah. You develop skills that you just wouldn't have if you're writing React every day. You're probably not going to open up GDB and reverse a tic-tac-toe game, but I did that because of a CTF challenge. And then I learned how to use GDB, and when you're faced with other problems, your toolkit of ways to solve things is just much broader.
A lot of people see the power of Codex doing everything for them and I imagine them saying, "Well, I don't need to learn GDB because Codex knows it." What would your advice be for people thinking about their engineering education?
I think everyone's struggling to answer that question right now. I still think trying to forcefully go through the levels of abstraction and just understand at a deeper level how things work is going to be important. This will change over time, but right now the questions that you ask the agent are going to affect the quality of the thing that you get out.
If you're not asking the right questions, you're not going to get the best engineering solution. As things go on, perhaps that will also be another layer that's removed. But learning how to ask what the right question is—I haven't totally pinned down what that means for someone who's starting out new. Fortunately, we have experience to fall back on where that taste or intuition of what to ask has developed.
Reflecting over your career, the expectations at these really high levels are kind of crazy. Senior Staff level is an unattainable level of impact for most people. Someone who gets promoted to that level thinks, "I've got to work super hard now because the bar is up here." And then you went two levels past that. Is it stressful for you?
It was never not stressful. I sat in calibrations and for other people you talk about their level and their impact and you want to be really fair. Then you're playing it out in your mind: "Someone's sitting in my calibration talking about this and they want to be fair." You get to E8 and it's like a D1, a Director. You're like, "How do I have as much impact as a person who has over a hundred people in their org?"
A lot of people who do it do it by a different form of people management, where they're trying to write the right doc and get people aligned. The reason they do it and are not a Director is they have this technical credibility because they built this thing; when they go to the team it lands differently than if an engineering director does. If you're influencing 50 to 100 people as a senior IC then that's D1 impact.
As a coder, you have to be really thoughtful about the projects that you pick. It can't just be a "this is fun for me" type of project if you care about your performance rating. Even when I would start a project, I would think: "Okay, maybe there's this feature I want to write because it's fun." And I'd say "No, I should let somebody else do that and think about what code I should personally write to maximize my impact versus if someone else could do it 80% as well as me."
But even then, leading a five-person project, is that still getting to E8/E9 impact? It's still hard. So really finding that project that's a force multiplier. The virtual file system was a great project because we knew that down the road this was really going to unlock so many things and prevent us from being completely blocked.
Another big part of it is recognizing senior managers who pair the senior IC with the right project. Some senior engineers are amazing fixers and coders but they're not the "idea-comer-up-withers." A lot of times it's the manager who realizes, "Oh, this project needs this person," and that person would have never realized it themselves.
One of your old colleagues, Adam Ernst, mentioned your ability to start projects was really good. A lot of them you created out of nowhere—you had an idea, went off and built a prototype, and came back very convincing that it was a better solution. Do you have any advice for engineers who have a problem and a solution and want to build a project from scratch?
A lot of good projects come from being a little bit dissatisfied about something. Sometimes I was just charging ahead and building the thing without really thinking about what the best way to do it was. A funny example is Google Calendar. I was like, "I want weather icons in Google Calendar," and I just charged through and cobbled things together. And then my tech lead was like, "Wait, how are you storing that? We should have talked about protocol buffers and binary formats." I was so set on "weather" that I never bothered to ask anybody if there was a better way to do it.
So the skill is digging into the dissatisfaction and solving your own problems. It seems almost every single one of these projects is "I want something to happen, this shouldn't be this way," and then you went and solved it. What gave you the confidence to know that you could make it so much better?
Having been at Google, I knew Blaze existed. I never worked on it, but I knew there was a thing out there that had this shape and was a lot better. So that's an existence proof. I identify myself as a "coding machine." I always had confidence I could build a prototype correctly and answer the basic hypothesis: "Spiritually, should there be a way forward to this thing that I think should exist?" Generally, if you are determined, you find a way.
I've noticed that pattern: people go to big companies, see world-class infrastructure, then go to another company and build new versions of it. Your writing is so clear; what advice would you have for engineers who want to write better?
Reading other good writing is a good start; you start to pick up patterns. Think about: what is it that I'm trying to convey? What would someone really want to know? Outline a lot upfront. Just being like, "Does this set of things linearly follow?" And asking yourself if you made too big of a jump. If you can anticipate that and put in the example that someone needed to make that jump, that's a big deal for technical writing.
You have that career note that I love where you lay out a three-step plan for impact. Can you explain that?
Step one is figure out what you really like to do and be honest with yourself. Step two is figuring out what's really valuable to your employer. At Google I didn't do a good job of that—I did stuff I was excited about but it wasn't "AdWords for Google." And step three is find that intersection and just really lean into that.
Last question: if you go back to yourself at the beginning of your career knowing everything now, what advice would you give yourself?
I should have been open to learning more things sooner. There's so much to learn when you're starting and whatever your first programming language is, you have a soft spot for it because it's the first thing that enabled you to do anything. But it's also a hazard because you want to hold onto it because you're finally productive. In my case, I probably went too deep with JavaScript and it took a long time before I wrote any C. If I had been a little more curious and flexible in terms of what types of projects I was willing to take on, it might have made a shift for me earlier.
I gather from your story there was a point with Xcode where you hated Objective-C and were coming up with ways to compile Java into Objective-C, and then talking about C++ for Miles—it seemed like a very concerted effort to learn it. Well, maybe with Codex in the future it'll be less of a hurdle for people.
No, it's true. Opens a lot of doors.
Awesome. Well, thank you so much for your time.
All right. Thank you, Ryan.
Thank you for listening to the podcast. It's a passion project of mine. Another passion project I've been working on in secret is building an ergonomic keyboard—it's ultra low-profile and ergonomic. I'll put a link to the keyboard in the description. Also, if you have any feedback for me about the show, I'd love to hear it. Comments on YouTube have led to guests coming on like Ilya Grigorik and David Fowler. Please keep letting me know what you'd like to see more of in the show, and I'll see you in the next episode.