OpenAI's Codex Lead: Why Coding as We Know It is Over (Transcript)

[00:00] Alexander Embiricos

You still need software engineers today. You still need designers. I'm a PM. Do you need PMs? You can have some fun jokes about that. I don't think you need them.

[00:07] Harry Stebbings

Today, joining us in the hot seat, we have Alexander Embiricos, product lead for Codex at OpenAI. This is an incredible discussion. Time to get the notebook out. For me, the most exciting future with AI is one where everyone just feels like a superhuman, empowered by AI. And for that, we need tools that everyone feels fluent with.

Your job is the success of Codex. Our job is the distribution of intelligence, and this is really unintuitive, but like we put all this effort into training these models and then we serve these models to our competitors.

This is so difficult for me as a venture capitalist to understand. Elon said that coding is one of the first professions to be largely automated. Do you agree?

For sure I would agree that coding is one of the first domains where LLMs are really good. But what does it mean for coding to be automated? It's like kind of a heavy statement, right? For example... Ready to go.

Alex, I'm so excited for this, dude. I told you I've been at a PE conference and all I could think was, "Thank God I've got Alex next because this is going to be a great one." So, thank you so much for joining me, man.

So excited to be here. Thank you.

Now, this is a weird first start, but roll with it. You'll understand my British intricacies. I'm fascinated by people's motivations. Are you motivated more by the fear of losing or like the thrill and excitement of winning?

I'm a maximalist. I'm definitely much more motivated by the idea of winning than the fear of losing. But I'll admit to you something: when I was running a startup before joining OpenAI, one of my darkest moments—and there were many dark moments while I was running the startup—was recognizing that I had spent the past few months trying to avoid losing, and all of a sudden I was like, "Oh my god, that is why I'm so unhappy and that's probably why the startup isn't going well."

And so when we flipped, you know, every now and then I have to catch myself and flip back into this idea of winning. But really what motivates me even more than that is I think I just love building things and building things for people. And man, I am so excited for this year because many amazing things that don't exist yet are going to be built and given to a lot of people.

I'm diving right in. Elon said that coding is one of the first professions to be largely automated. Do you agree given your position and what you see day-to-day?

I think for sure I would agree that coding is one of the first domains where LLMs are really good. What does it mean for coding to be automated? It's like kind of a heavy statement, right? Like for example, now that we no longer write assembly—when that change happened and we moved to higher level languages—did we say coding is automated? Not really, right? We were just able to write much more code and then as a result there was much more demand for code and there were many more software engineers required.

But yeah, part of what they used to do is automated in the same way that... do you know the origin of the word computer?

No.

I might pronounce the location wrong, but I think it was at Bletchley Park. There were all these machines for like decoding German Enigma. And there were humans who would like punch out punch cards and like put them into the machine and do a bunch of like tabulated math. I'm probably butchering this, but there was an intensely manual part of work. And even like the first spreadsheet software was kind of loosely based off this idea that you would have an office full of desks arranged in a grid and people doing tabulations and then passing their sheets to the next person. And so all these things, those specific tasks have become automated, but every time that's happened there's been an explosion in demand for the output and so you need many more people to do that kind of work even if the specific task has changed.

So you think we will have more engineers in 5 years, not less?

Yeah, and sometimes we change what terms mean, right? Like the term computer now refers to something else, but now we have the term software engineer. And so I definitely think we'll have many more builders. Something interesting that I'm observing now is like there's this compression of the talent stack. Like you still need software engineers today. You still need designers. I'm a PM. Do you need PMs? You could have some fun jokes about that. I don't think you need them.

But maybe, you know, maybe when you say engineer, you might be thinking of someone who's like much more full stack, right, than has been true before. Like even if you go back a few years, it was much... you had many more places where there was the backend engineer and the front-end engineer, right? Whereas like now, at least if I think about the Codex team, that's much less the case and things are much more full stack. And so I think this talent stack will compress, but we'll still have people building.

Why do you think we don't need PMs in this world? You dangled the carrot.

Yeah. It's my fun joke. I think, first of all, it's incredibly hard to define what a PM is, what a product manager is. I kind of think of the role as like explicitly undefined, and your goal is just to adapt to whatever the team or business needs. And often if you have a bunch of people like here trying to build as quickly as possible, then what a product manager can do is spend time taking a few steps back and trying to look around corners and figure out what to do, collaborate with the folks and go-to-market, and maybe be the team's greatest cheerleader and quality raiser. But all of those things I just described, which are maybe my current role, could be done by a really strong eng lead or a designer who thinks a lot about product. And so I think it's often useful to have product managers, but you probably don't want many of them until the team is really large.

I was stalking you for the last few days, which was a very fun expedition into your writing, into your tweets, into your prior interviews, and you said that human typing speed and validation work is the key bottleneck to AGI, not model compute or architecture. It was kind of left there, and I was like, help me understand why human typing speed and validation work is the key bottleneck and what you really meant by that.

For sure. Okay, that's a fun one. I think there are multiple bottlenecks, but that's maybe the most sort of click-baity one. So if you don't mind, we'll do this slightly socratically. Like, how many times would you say you use AI today?

30 plus times a day.

Okay, cool. How many times do you think, assuming it was like zero energy expenditure from you, AI could help you per day?

In everything. I think we'll have inference running 24 hours a day across every single thing.

Exactly. And like I hear things now from engineers like at OpenAI and also outside who are telling me like, "I constantly have Codex running. I never close my laptop, and if it's not running while I'm in a meeting, I'm like wasting my time. I need to make sure Codex always has work for me that it's doing." And that's super cool and super exciting, but that's a lot of work to manage these agents and make sure they're always working.

Going back to the 30 times per day thing. When we look at how often Codex users are using Codex, it's like kind of this tens of times range. I think AI should be helping us tens of thousands of times per day, compute budget permitting, and we'll get there over time. But the problem is, at least if I think of myself—I work on this stuff, I know I should be using AI for everything, but I'm too lazy to type out that many prompts and I am too uncreative to figure out all the ways that AI can help me. And so I end up kind of at a similar number as you.

And I'm still at the point where when I use AI to do something cool like prep for this conversation with you, I'm like kind of proud of myself. I'm like, "Oh, cool. I managed to use AI in this new way." But that's fine for people like you and me who are really interested in this topic, right? I don't think most people, in order to benefit from AGI, should need to put so much effort into figuring out how to use this tool. It should just be effortless for them. And so I think the world we want to get to is one where to use AI you don't really need to figure out the right way to prompt. It's just super easy for you and you don't even need to recognize that AI could help you. It just knows you, is connected to your context, and chimes in helpfully.

That's where I think like Claude has done well in terms of the packaging they've done, like Claude for legal, Claude for Excel, where you can implement it and have a DCF model—I'm not into models, but like better than one could do before. Do you think it is your job then to productize the prompts and the human actions to remove that bottleneck?

Yeah, totally. So I think that it is our job to make sure that we have the models with amazing capabilities, and then eventually to get to a world where this is like highly productized and so you just have this magic text box or audio input or whatever, or you can just add AI to your group chat and it just starts to help. But I think there's quite an interesting in-between stage, and I think that is where the most value lies right now.

Here's what I mean. You could try to productize like a specific feature of AI for a specific market, and many companies are doing this, but I think it's a little bit hard to know what exactly will work, what is the right form factor. Someone was on your podcast earlier and they said something that I thought was quite interesting about how you cannot adopt AI at enterprise without FDEs.

Yeah, it was Matt Fitzpatrick from Invisible AI.

Yeah. So even though I am literally hiring FDEs—and if you're an FDE, please apply for a job with me—I disagree with that entirely. So, what I think we need to do is build tools for people. Like you can use FDEs, you know, as Fitzpatrick said on the podcast, to automate workflows, right? But then you're limited by what you from your top-down perspective can do and what you from your FDE staffing can staff to be built, right?

But for me, the most exciting future with AI is one where everyone just feels like a superhuman, just empowered by AI. And for that, we need tools that are for people, for individual users, and that everyone feels fluent with. I think the phase that's most interesting that we're at now is building for the kind of people who are interested in figuring out how to use AI. So what we need to ship—and I think this was the genius of when Claude Code first shipped—what they really got right was they had this tool that was super easy to use in whatever context you want, just in your terminal, and people started experimenting with where to use it.

And so I think as we think about AI being used outside of coding work, one of the most important things we can do is not overly build it like, "Okay, this is AI capabilities but only specifically for finance, only specifically for this workflow," but build a much more open-ended tool that someone can use for any given task creatively.

But does that not put the onus or the effort back on the user, back to the point of your bottleneck of human action and lack of activity on them? If you don't define the task, you put the responsibility on them for defining the task, which humans lack the ability or inclination to do.

Yeah, so that's why I think it's the bottleneck. So here are the three phases in my mind. First, let's have agents work really well for software engineering and coding because LLMs happen to be good at that. Next, let's realize that for an agent to be useful more generally, using a computer is super valuable. And also, we'll realize that all agents are coding agents because coding is just the best way for an agent to use a computer.

So, let's take that same super flexible idea, but make it available to anyone who's excited to explore and tinker. And we're already seeing people start to do this with like the Codex app. Codex app is built for builders, but we're seeing builders use it for all sorts of non-coding tasks. Then finally, once we see what's working, let's build that like productization that you were talking about where you have highly specific features that just work immediately out of the box for people. And I think we're going to speedrun this entire 1-2-3 journey in the next months.

My challenge with what you said about FDEs and implementation within enterprise is that data security, sensitivity, permissioning, and access provisions are really freaking hard, and people are much less intelligent and confident than we give them credit for. I think especially in large enterprises, sorry. And I think you need an FDE to go in and custom fit a lot of the different horizontal solutions to make it work. Am I wrong?

I think you're right. If you're trying to go like all the way from zero to one and you have this like—and I don't mean grand negatively here, but if you have like a grand vision for some like ultimate workflow automation system—then yeah, you're going to have to clear through all of these security hurdles, all these like compliance hurdles that are really real, right? Build connections to all these data systems and systems of record and action. So you're going to need an FDE to do that.

What I've seen is that when we do these things top-down, we end up like massively underleveraging the potential of AI in like helping that company. Whereas if you can maybe do that in parallel, right? But if you can just give AI to the people like doing the work, they can start to like get a mental model for how AI can help and then they can start pulling AI into their workflows at the same time.

Here's just like an analogy or something: imagine if you work in a customer support role and AI is being brought into your role and starting to automate like meaningful chunks of your work, but you've never heard of ChatGPT nor are you allowed to use it. In that world, in that scenario, you have like no intuition for what this thing is. Whereas in a world where you've been using ChatGPT for work at the same time as like parts of your work are getting automated by an LLM, you have much more intuition for how this works. And I would argue you feel much more empowered about this idea that it's being accelerated. You have some degree of control to steer like where these automations are built as opposed to like it's like this complete like "deus ex machina" kind of thing that is quite disempowering.

So bringing this back, I think there is a way to do this because the data control issues you mentioned are real, right? But at the end of the day, every tool, every feature, every workflow is for a human who is somewhere, right? An employee somewhere. And that employee is accessing that tooling via their browser or via their file system like at the end of the day, right? And so at the end of the day, everything comes to an interface that an agent running locally on your computer can work with.

And I think it's quite unusual: at OpenAI, we're building a browser, Atlas, right? And you might wonder why, and there are many reasons why, but I think one of the key reasons is that by building a browser and by controlling it tightly end-to-end, we can build safe agentic browsing for enterprise, which is a way to agentically access things that have not yet been built out by FDEs.

There are so many questions that I have to ask you. I want to go back before I lose the thread. You mentioned about engineers like not closing their laptops because they don't want to lose productivity and time with building with Codex. You partnered with Cerebras, and Cerebras is the fastest provider obviously of inference out there. Amazing win, I think, for both, bluntly. How important is speed for developers when using Codex and in the future of AI code?

The simple answer is it's super important.

And so is it like an inference monopoly like you have it now and competitors don't?

This is just my opinion, but I don't think we're going to end up in like this kind of monopolistic world. I think there's so much competitive pressure that there'll be like multiple answers to this. But I will say that we have like news coming out about that partnership soon and I'm very excited for these kinds of things to ship. It's going to be awesome.

But even so, like with GPT-5.3 Codex, that model is significantly more efficient than prior models. And the feedback we've heard is that people feel this is now a very competitively fast model compared to before. So there's a lot of things you can do just in terms of the model. There are also things you can do like improving how you do inference. So we recently rolled out a change where in the API those models are served like 40% faster and in Codex they're served like a quarter (25%) faster. So I think speed matters a lot and we're kind of approaching it from all angles: the hardware, how you do inference, and the model level.

You mentioned earlier about kind of putting it in the hands of users and we talked about inference there. One of my dear friends is Jason Lemkin from SaaStr and he says that inference is the new sales and marketing. Instead of sales and marketing teams, you're paying for inference so users can onboard quickly, easily, see value, and you will see the removal of sales and marketing teams. It's kind of like next-gen of PLG.

I don't know. I think I struggle with that. I think fundamentally in this new world where anyone can build and it is increasingly easy to build things... but what is hard, right? I think having a good relationship with the customer, knowing what they need, is as hard as ever, maybe even harder, as there's just more stuff in the market to choose from. The other things that are hard are like building the right thing, having a really high quality thing. But going back to the sales and marketing thing, I don't think that goes away, because I think that's just gotten harder as any given market gets more competitive with more software out there.

Can I ask how much of internal code for you today is produced by Codex? I remember like Claude for work, Boris said, was like 100% or nearly 100%. How much is Codex used internally?

So like, I'll speak for myself and then for the team. I would say like most people that I know are not opening editors anymore.

And this was a step-function change. It's been happening gradually, but I'd say that the external market touch point for this was like GPT-5.2 Codex, where all of a sudden the model was way better at running for longer, handling tasks end-to-end, managing its context, and following instructions. And so we kind of saw this inflection point, and that's part of why we built the app.

So broadly, I think before GPT-5.2 Codex, the kinds of AI features we were using to write code were like tab completion or maybe you were pair programming with the model. In my mind, you still needed to be at your laptop with your hands on the keyboard-ish and like it might go off and do a little bit of work, but you kind of still need to be there and like drive—it's just like handling these small things for you. And then at the time of GPT-5.2 Codex in December, we kind of switched to like, "I'm just going to fully delegate this task." It's like I'm going to have a... do a plan with it, make sure we like the spec that it's going to do, and then I'm just going to go let it cook. And this is quite a different way of working.

So, it's changing like literally as we speak. And so, part of why we built this Codex app that we released last week is because we wanted to build like a form factor or user experience where it felt like very ergonomic to be delegating instead of pairing with an agent—delegating to multiple agents at once. And so even at OpenAI this is changing massively. I don't have a percentage stat for you but I would say like the vast majority of code is written by AI and I would say that now probably like most people are not even like opening an IDE. Maybe if they are opening IDEs, it's to like help flesh out like the interface between two modules and then like AI fills it out, or maybe they want to like collaborate on a plan but then have AI fill it out. The code itself is not being written by humans anymore.

Will we have IDEs as a part of the stack in 24 months' time?

Okay, so the formal definition, right? Integrated Development Environment? That phrase is so squishy that like literally anything could be an IDE, right? So I don't think that's very useful. If that's the answer, then yes, you could even argue the Codex app is an IDE. I don't think it is. Like for me, I think of an IDE as like a really powerful editor. We explicitly didn't build editing into the Codex app because we wanted it to be really clear how you're meant to use it. So it has a lot of affordances for managing multiple agents, for delegating, for reviewing changes. It has really prominent "skills," which are an open standard that are really useful for doing non-coding work—stuff like triaging tasks or monitoring deploys or something—but it doesn't have text editing.

If we assume a large percentage is done by Codex in terms of the code produced, how do you do coding reviews and is AI responsible for internal coding reviews?

So there are a few things here. First off, the spec for what you want to do or the plan becomes more important than ever, right? So like think architecturally—how should this code work? So we recently shipped like a very prominent plan mode that works a little differently than others where you have the agent go off and like propose how it's going to do something. It's like quite a long plan and then it asks you questions about if you agree on how it wants to do it or if you want to have input. And this is very similar to like if you had a new hire who was new to your codebase and they had to present a sort of a Request for Comments to the rest of the team before they started doing the work. So even though that's not formally code review, I would say review of the plan is something that's becoming more important because we're entering more of this like delegation phase of working with agents. So that's an underrated thing.

Then, okay, there's actual code review. I think a problem that I hear a lot of people talking about, especially in the open-source world, is like a lot of AI slop. Like people will just be submitting PRs to these open-source repos and they're trash and like maybe the user hasn't—the person submitting the PR hasn't—even tested them or definitely hasn't reviewed the code. I think this is a problem. And so a common practice with Codex is to have Codex like review its own PR or its own change. And Codex is incredibly good at this. We've explicitly trained the model to be good at code review. And that included things like making sure it's like really good at creating like high signal feedback so it'll like have few false positives of criticism which means you can really trust when it has feedback.

And so not only do we encourage people on the team and elsewhere to just ask Codex to review, you can also set it up to review automatically. So nearly all code at OpenAI is reviewed by Codex automatically whenever you push it to a repo. One fun thing for people who haven't tried Codex yet or didn't try it recently: sometimes the way that people see how good our models are is by asking Codex to review a different model's code, and they're like, "Oh, shoot. I should probably just be using Codex to write my code in general."

You said something really interesting there. You said for those that maybe haven't tried it yet or you are coming back to it, how do you think about retention with this category? I remember Tom Blomfield, who's a YC partner, tweeted months and months ago—but it stuck with me, a weird brain—about the ease of transition between different providers whether it was Cursor or Claude Code or Codex. I can't remember which one it was to be honest, but how sticky are users and how do you think about retention?

We've taken this kind of counterintuitive approach with Codex to just build it super openly. So the Codex core harness is open source and we're always trying to make it easier for people to switch. So for instance, when we first launched Codex last year, we created (and created is even a heavy word; we just established) a convention called `AGENTS.md`. This is a file that you can put instructions for the agent in. We didn't call it `codex.md`; we just wanted it to be something that all agents can use. And pretty much every agent except Claude uses `AGENTS.md`, which is awesome.

And then just last week we helped push for putting "skills," which are a standard for like giving the agent instructions and scripts—we pushed for those to be stored in sort of a neutral named folder called `agents` instead of in like `codex` or something. And again, everyone has jumped on it except the usual suspect. So I think it's really great for the developers to have a lot of choice and we're trying to make it even easier for people to try different things.

Now that said, I think these coding tasks where you're asking an agent to write some code, they're quite hermetic. And what I mean by this is because you can... it's like or maybe an analogy in TV would be like episodic, right? Like you can come in and you've got this like open-ended like agents file that like any agent can read from. You've got these skills that any agent can use. And you can ask the agent to write some code and it produces a patch and that patch goes into Git. So kind of like both ends of this are pretty neutral, vendor neutral. So very easy to move between for now.

As agents start to do work that is not writing code but more general work—again for software engineers or beyond for any builder—they're going to need to start interfacing with other systems. Right, so as they start... maybe your agent is talking to Sentry, or it's talking to your Google Docs or something, then I think these agents become much stickier because deciding to connect an agent to that system is a sticky decision. And if you're an enterprise, really trusting that the agent is going to have access to these tools, but there are really good secure guardrails and sandboxes and like controls over how the agent works with these systems, I think is critically important. And that's not something that you're going to want to do multiple times. And so we've been kind of building Codex knowing that this is coming. And so we have like the most conservative sandboxing approach. Sandboxing is kind of like a set of controls, OS-level controls over what the agent can do.

But I'm a fan of *Seven Powers*, this brilliant book which talks about kind of seven ways that businesses accrue value and sustainability, and like your stickiness or your retention is one. If we're on the same team with Codex, how do we create retentive patterns, behaviors, programs to ensure that people stay with Codex and they don't flip to Cursor when there's a better model or Claude Code when there's a better model?

Yeah. It's interesting because I think on the one hand, like, we think about this—obviously we're running a business—but our mission here is to like ensure that like we safely deliver the benefits of AGI to all humanity. And so something that's like unintuitive to people about like the Codex team...

Alex, I know, but your job is the success of Codex, I guess.

Our job is the distribution of intelligence, right? And so we're obviously building out Codex, and this is really unintuitive to a lot of listeners, but like, we put all this effort into training these models and then we serve these models to our competitors, right? And from our perspective...

This is so difficult for me as a venture capitalist to understand. You are aware of this.

Yeah, I'm totally aware of this. Like, OpenAI is a really interesting and unusual place to work. But because we're playing such a long game, for us, like, if the competition gets better, we learn. It's helpful for us. And so we're pushing really hard at growing Codex and...

If they're closed... and they improve, you don't learn.

I don't think so. Like for example, there are a bunch of recent launches. Like even today, I literally just like quote-tweeted a thing this morning about a launch from Warp. No particular affiliation, right? And there are a bunch of cool ideas in there about how they like framed up the way that their agent can work in the cloud at the same time as working locally. For me that's like inspiring and I think I see all these things from various companies and like one of the coolest things about the space is it's like we're all kind of inevitably reaching the same conclusions together and then building things out.

And so on the Codex team I think we have some massive advantages, right? We have the massive distribution advantage with ChatGPT, and we have the massive capability advantage of training our own models to be good in our harness and building our harness to be good at the new models, and like no one else has early access to those. And so I think we're playing to win and we have a number of really big advantages, but we're also playing this long game where again we serve our models to everyone and we push for open standards so that everyone can use all the things that we're pushing for as well.

Can I ask you what will be the defining factor of winning? And I know I'm using venture language and you're brilliant in kind of being much more free and open. But what will be the defining factor of winning? Again, if I push you, is it GTM, in that the biggest enterprises in the world do want to work with OpenAI? I have many friends in your sales team. The inbound that you get from the largest brands is incredible. So is it GTM because of the incredible brand? Product execution and just Codex being a freaking awesome product? Or compute, inference speed, actual compute advantage? Which one is the defining winner?

Okay, so I think if we're going to talk about it more from an OpenAI perspective, obviously this is way above my pay grade, but I would say it's compute advantage and having the best models, right? And in order to achieve that we then need to build businesses to generate revenue. And also, something really interesting we've noticed with having the Codex team, which is a sort of combined team of research and product, is that by building these successful products we create a lot of pressure to improve the model faster.

So that's maybe the company perspective, right? If we come to the product perspective I think the single most important thing we can do is build a really good product that people want to use. And like I was saying earlier, I think we really want to build products for individuals and then allow the like people to become fluent in those products and then like pull in automation. And I think that may be counterintuitive but will result in way more impact than anyone purely approaching it from like the enterprise workflow perspective.

So I think that's mostly a question of product execution and then that works for say like prosumer. When it comes to enterprise, the go-to-market side is really important. Like something that I've learned the hard way is if we go to an enterprise and we're just like, "Hey, we're here like feel free to use the stuff." That doesn't work. There's quite a lot of education that needs to be done and there's a lot of like configuration that we need to support and sort of like education of the broader team. So like that motion looks much more like coming in, pitching, meeting the head of developer experience or whatever, understanding how they want their team to operate and then giving them tools to like propagate that mechanism of operating to the rest of the team.

You said the word "revenue" there, which is one metric to measure a business against. When you think about like your metric of success which you sit down with Sam or Brad or whoever it is and say, "Hey, this is what we're optimizing for," what is the metric that you use as the defining northstar for your progression?

It's not revenue as the primary. The primary is active users.

How do you measure active users, like daily active users?

Yeah, so we measure weekly active users and it's just like, you know, did this person do a turn in our product? Did they send a prompt?
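As a rough illustration of the metric described here, the sketch below counts a user as active if they sent at least one prompt inside a trailing window. The event log, the user IDs, and the `active_users` helper are all hypothetical; this is not OpenAI's actual definition or pipeline, only one plausible reading of "did they send a prompt this week?"

```python
from datetime import date, timedelta


def active_users(events, end_day, window_days=7):
    """Count distinct users with at least one prompt in the window.

    The window covers `window_days` calendar days ending on `end_day`
    (inclusive). `events` is an iterable of (user_id, day) pairs.
    """
    start_day = end_day - timedelta(days=window_days - 1)
    return len({user for user, day in events if start_day <= day <= end_day})


# Hypothetical event log: (user_id, day on which a prompt was sent).
events = [
    ("ana", date(2025, 3, 3)),
    ("ana", date(2025, 3, 7)),
    ("ben", date(2025, 3, 4)),
    ("cy", date(2025, 2, 20)),
]

weekly = active_users(events, date(2025, 3, 7))                 # 7-day window
daily = active_users(events, date(2025, 3, 7), window_days=1)   # 1-day window
print(weekly, daily)  # prints "2 1"
```

Shrinking the window from seven days to one turns the same count into a daily-active figure, which is the stricter bar for a tool people are meant to reach for on every task.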

Is weekly active a frequent enough metric, do you think? Sounds nice. But if this is replacing the IDE, is daily active not better?

I think daily active will be better soon. Yeah, we just happen to use weekly active. It's like a standard here. And I think as we were getting started, it made sense. But I... I agree with the criticism there. It's like, we should probably just be at daily. Like, I think we need to be getting to a world where for any given task that you have, your first instinct is to ask an agent to help, right?

It's kind of like how with Google search it's just like, "Okay, anything I need to do I just like go into this text box and I can get navigated to the right location." Then you have ChatGPT, it's like for any information I need I can go into this text box, type it out, and get information that helps me. And I think the next phase that we'll see this year is like for any task I need to do—as opposed to just get information—I go to this text box or this input and something happens that helps me, even if it's not the full task, even if it's only a small part of it.

You said about chat and the interface there. I'm... I'm really fascinated by this because it is a seemingly incredibly efficient input function for busy humans. But I spoke to Anish Acharya, who's a GP at Andreessen, and it came out the other day and he's like, "No, no, this was created by Sam and Elon and it works for very efficient people, but most of the planet want browser-based discovery interactions, UIs." Do you think that chat will be the enduring UI in the next wave of AI interaction with humanity?

The simple answer is yes, but I think there's two components here. Like, if we just imagine the future, like let's think of some sci-fi movie, right? Like what does AI look like? I... I believe that sci-fi is a really good predictor of what the future should look like. And usually it's pretty simple because it's a story. And I think simple is usually right. It's going to be some just like entity that I can talk to however I want about whatever I want, right? I thought like, I shouldn't have to navigate to a place where I work with like my coding AI and then I have this like different place for my like sales AI and I have to be like, "Hey, I'm now talking to sales thing and like do that." It's just like, I'm just going to talk to a thing and it's just going to help. So I think what we're going to have is that we'll have chat or voice. Conversational interface will be sort of the pillar of everything that you can talk to about anything and that you can add into any group chat or whatever so it can like discover how to help you.

But then if you're like a power user and you're very good at a specific thing, you probably don't want to be disintermediated by having to talk to another person. It'd be like if you had an executive assistant, but you can only work by talking to them. That's like super annoying, right? So, at some point, you want to... you want to get to the show notes and like look at them yourself and like edit them yourself, right? You want to edit the thing yourself. So, I think we'll pair chat with like functional like graphical interfaces that are bespoke to like what someone needs. So, like in my case, I will probably chat to like do my, you know, podcast prep. But when it comes to like looking at product and code, I probably want like the Codex app that I can go into and get deep in. Whereas maybe if we're talking to a marketer, maybe that marketer will like chat to ask questions about the product. They're not going to download the Codex app just to ask questions about the product, but maybe they'll have a super custom GUI for like ad analytics or something that they go into.

Totally get that. And it kind of wrongly assumes on my behalf a consumer interaction at some point in that journey. And I want to ask you, how do you think about like agent-to-agent experiences and designing experiences for agents? We spoke about, for example, going to large enterprises and how you can be helpful. I'm just using the most boring thing ever: expense approval. You could have agent submission of expenses on my behalf for my trip to San Francisco and then the agent on the flip side doing approvals for that from OpenAI's compliance department. Agent-to-agent. How do you think about that and that paradigm shift?

My quickest answer to this is that we've noticed as we build Codex that the best interfaces for Codex to do work also tend to be the best interfaces for humans. So like when people ask like, "Oh, like, how can I make my codebase like more efficient for the agent to work with?" the answer is often like, "Well, have you looked at it yourself and is it easy for a human to work with?"

So like a very specific example would be like running tests in a codebase. Naively if you just like set up most test runners, they just like emit all the outputs of all the tests. And so like as a human, it's really annoying because you have to go in and like find the one that failed and it's like, you've got to read hundreds of thousands of lines. Turns out that's terrible for AI as well. But if you filter it down to just only emit the failed test—better for humans, also better for agents. So probably the agent-to-agent interaction points will be very similar to like if there was a human in the loop and that's nice because it means you can kind of atomically replace individual systems.
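A minimal sketch of that filtering idea, assuming test results are already available as simple records. The `CaseResult` type and `failures_only` function are hypothetical names for illustration, not part of any real test runner or of Codex.

```python
from dataclasses import dataclass


@dataclass
class CaseResult:
    """One test outcome (hypothetical record shape)."""
    name: str
    passed: bool
    message: str = ""


def failures_only(results):
    """Build a compact report: one line per failed test plus a summary."""
    failed = [r for r in results if not r.passed]
    lines = [f"FAIL {r.name}: {r.message}" for r in failed]
    lines.append(f"{len(failed)} failed, {len(results) - len(failed)} passed")
    return "\n".join(lines)


results = [
    CaseResult("test_login", True),
    CaseResult("test_checkout", False, "expected 200, got 500"),
    CaseResult("test_search", True),
]

print(failures_only(results))
# FAIL test_checkout: expected 200, got 500
# 1 failed, 2 passed
```

The same short report serves a human skimming a terminal and an agent with a limited context window, which is the point being made: one interface, two kinds of reader.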

I mentioned our show on LinkedIn and a wonderful investor from a different company... it's like Harry Potter/Voldemort. It's like he-who-shall-not-be-named. I don't want Sam to kill me, but from another company was like, "You gotta... you gotta ask him... ask him how do you think about a coding data moat and does Anthropic have all the data now?"

I think that from what we've seen—and I would defer to my research team on this—we feel like we have plenty enough data to build really good coding models. I think the place that's more interesting for getting data now is like as we get into like knowledge work tasks. That's kind of data that's like not really like available most places on the internet. And so you start to have like really interesting brainstorms for like how to help a model be good at it. Like maybe you have to like pay people to like simulate doing tasks so that you can like learn these trajectories for the model. Maybe you should acquire startups that are no longer in business but have a lot of like data—like say their Slack or something. Yeah, I think that kind of knowledge work task distribution is like much harder than coding.

That's so interesting you said there about kind of the data that doesn't exist so to speak. How do you think about your interactions with the data providers—your Appen, your Turings, your Invisibles, your Datas of the world? Like, will you spend 10x there or will you go, "We are spending too much on data, we should do it ourselves and do data acquisition"?

Yeah, I think the way that we think about these things is just like, how do we move as quickly as possible? And so becoming able to set these things up in-house is like very expensive in time and we're a small team. So what I have observed so far is that if we need to run a data campaign at scale, we're usually going to enlist help from one of these companies.

On the consumer side for Codex, we've spoken about like enterprises and going into them, how to engage in terms of developer experience, developer relations. Do you compete with a Lovable and a Replit on a like low-end consumer basis in a year or two's time? Or is that a business where you're like, "Codex is not for every person to create an 'About Me' or a small business to create their own site"? How do you think about consumer in that way?

Yeah, I would say that right now it doesn't feel like we're competing super directly. But I don't know if you saw our Super Bowl ad, the tagline of which is just "You can just build things." With the app we noticed that like many people who are less technical are starting to build things and so the kinds of things they're building are much more "Hello World-y." And so I think that we will see some overlap in use cases where you have people just pulling up Codex because they have it as part of their ChatGPT.

Like a big announcement last week was that we're now offering some Codex to people even on free ChatGPT plans or on the ChatGPT Go plans. So this is massive just in terms of like bringing availability to everyone. And so I think we're definitely going to see people with like a free ChatGPT plan coming in and just like building simple things where they otherwise might have gone to a specialized tool.

What would you most like to do differently, but for whatever reason you can't?

This is an interesting one. I feel like it's been a very good few weeks for us. So we're very... I'm pretty jazzed by everything that's happening. But maybe the feeling that I had the most...

Yeah, that's really interesting. You said it's been a very good few weeks for us and I feel that. Does the team feel changing winds of momentum both in positive and negative cycles?

Absolutely. We are very attuned to it, right? Like if you look at the history of Codex, the first thing we launched last year was like this amazing idea that people were super excited about. It's like, "Hey, we're going to give the agent its own computer in the cloud. You can have as many of them as you want work for you in parallel on tasks." Super great idea. To be honest, it didn't work as well as what we shipped later. It was not the best.

And then since August with GPT-5, we started pushing really hard on interactive coding, which is where most of the competition in the market is. And we went on an absolute tear. I feel like the public metric we had was like since August, we grew by like 20x. And then like even like late in the year, we like doubled from December to now. I forget the exact number there, but like that was competing neck-and-neck.

But the shift that we feel last week is we felt like we had the most intelligent model that was cemented with 5.3 Codex. We had feedback around our model being slower and like maybe less fun to work with and like being less good at communicating with you while it was working. We addressed that feedback. And that's true even compared to like the other competitor model that launched like 20 minutes before us and was like... maybe this is spicy... it was like SOTA for 20 minutes. SOTA means State Of The Art.

And then we'd always been getting a lot of feedback on like the quality of the user experience in Codex. Our most popular surface was the IDE extension and our CLI (which is a command line interface) was less polished. But with the app, the feedback has been like resounding from the market that this is like a really high quality experience. It's like simple, like unintuitively simple, and people are just loving using it—even our biggest critics are converted. So yeah, and then we had the Super Bowl ad and then we went to free.

So going back to your question of like, what do I most want to do differently? The first is I want to get back to cloud. When we pivoted our strategy from like building the cloud... focusing on the cloud agent last year to working interactively, the thinking was very simple. It was just—and it's kind of like what I was telling you about FDEs—if you go too far ahead to workflow automation before your end user is fluent with the tooling and can get it to work simply, then there's like this disconnect and you just have this pipe dream idea that's not like effective except for the most power users. But once you have this base where people are using your tool every day like you said and they're configuring it and every time they use it gets better, then like the step up to like letting it run independently in the cloud is a much smaller step up, right? So I think it's time for us to like get back to like building out the cloud product and making it super tightly integrated with the local product. It already is somewhat integrated.

And the other thing I want to do differently is start thinking more about the bottlenecks. Like codegen—writing code—has become like, you know, trivial now, but the hard part is like what you were talking about with like code review, right? Like how do we know the code quality is good, how do we know we're doing the right things? And those bottlenecks I think are underappreciated still and under-invested in. So like, I think we want to get to a world where you can have an agent that is unbottlenecked, right? That you trust to like own an entire microservice or internal tool or whatever and can do the full iterative loop including feedback from users without having to go through human review. And that is a really hard problem to solve both from an intelligence perspective but also from like a safety perspective and a controls perspective.

How much weight should we place on benchmarks and evals?

I think probably—this is an annoying answer for you—it's like "some," right? Like, they do tell you... they kind of in my mind they give you a good measure of intelligence, right? And so you can put weight on those for intelligence. And especially before evals are saturated, I think you... when you see meaningful progress in those benchmarks, it's like very helpful. And then I think you have to pair that though with like what it feels like to use the model. And that's... that's a vibes thing. Whenever I talk to any... like even internally or even talking to like customers of our models, I'm always surprised by how vibes-based the evaluation of how it feels to work with a model is.

How vibes-based life is. People want to work with people they like is the lesson that I give to kids. In terms of like market composition as an investor, I have to think through how do I think about the eventual state of this given market—kind of a terminal state. How do you think about that? Is it like Uber and Lyft and like the majority of the market will be on Codex or Claude Code, or is it like an AWS, Azure, Google Cloud and a 33-33-33?

I think this might end up with fewer providers that are capturing a lot of value in the long run. And here's why. Like—and maybe this is a bit spicy—but I think that we are kind of in this temporary phase where we have agents that are really good at coding, right? And if you look back last year, like maybe more people thought we would have agents that are good at other domains too, but that didn't happen last year. So we only have PMF for coding agents like in the industry overall, I would say, right? And then there's some like very narrow other use cases like customer support etc. But I think that's probably temporary and then over time I think we're going to end up with agents that kind of can do anything for you.

This is kind of what I was saying earlier: like there's just like a super assistant, you talk to it about anything, and then there is like specific like UI that you can go look at if you happen to be deep in a specific function. So in that world, I don't think you want like 12 agents at the company where your employees have to go figure out the right one to talk to, because then they won't achieve fluency and if they don't achieve fluency then they also won't like pull automation into their roles. But if you have this one thing that you can talk to about anything, right? So your onboarding is just like, "Go talk to this thing about anything you need," then people will develop muscle memory to go to it. It'll become the center of gravity of work and people will pull in automation. So I think that future makes much more sense and I think like as the people building ChatGPT, we're like really well set up to deliver that.

This is kind of a stretch but an analogy here is I used to work at Dropbox and for a while—this is before Slack was big—we wondered if people should like go comment on like documents in Dropbox or if they should like go talk about the documents in Slack. And it was like obvious that it was like more optimal for people to like put comments on the right time stamp in the video in Dropbox or like comment on the document in Dropbox, right? So it was more optimal. However, what we saw is that Slack is just such a center of gravity of people just like talking to each other. Like nobody wants to comment on the document. I just want to Slack you, right? And so we saw that like there was this really big pull towards things happening in Slack even if it was less efficient.

And I think we're going to see something similar at work where if there is a single agent you can use for nearly anything, there will just be this giant pull and everyone will talk about how they use that one agent for things. Teams will share best practices with each other. There'll be hackathons around how to use that thing. Yeah, and you'll end up with just a handful of these.

You said about kind of agents not really proliferating in terms of usage other than coding and maybe this being the time, and you mentioned customer support is one of the examples. My question to you is: I'm an investor today. I'm looking for companies which will accrue value over time and provide incredible products to customers. There is a belief that the durability of revenue of large SaaS companies today is zero and that SaaS is dead because the model providers—you, Anthropic, others—are going to come for our lunch so to speak. What would you advise me?

Like, things are built for humans. Like, otherwise, what's the point, right? Even SaaS tools are built for humans. And so for me, I think my question is like, does this SaaS company own a relationship with a human on the other end of things? And if it does, then I suspect it's not going away. Or does the SaaS company own some like really important system of record? It's probably not going away. Maybe both of those two things—the interaction with the human and the system of record—are like more important than ever. On the other hand, is the SaaS company like a kind of a glue layer but it doesn't own either of those two things? I'm not the expert here, but I'm more nervous about that kind of company.

So then if we take that stance, Salesforce and ServiceNow... they're down 20, 30, 40%. I think it's massively exaggerated. I think there are some companies that legitimately should be—respectfully, I think Dropbox is in a very difficult position. And I think your Monday.coms of the world though, for the majority of SMBs and consumers who use it—which is the large majority of their market—could they vibe-code a to-do list? Yes. Would it be cost-efficient to do so? Not really by the time you customize it and perfect it. And to be honest, a to-do list is generally pretty bland in terms of what you need to do: add task, complete task, show historical tasks, assign to new members. It's not very difficult and so I think you just keep it.

And so I think it's massively overblown. And I think that's the classic knee-jerk reaction from markets. But I do think—sorry, I do think like—I think you're going to come for customer support and I wouldn't want to be in that category.

I think this maybe changes what kind of founder you invest in, right? Like, I think there was this maybe temporary phase, which I liked personally as a product builder, where you would invest in like the person who can just like build good product and you could kind of ignore if they had a good thesis around a customer or go-to-market or distribution or anything like that because it was so hard to build good product, right? And I think that was an anomaly. If we look at where we are now, like maybe that kind of founder is not the founder you should invest in because it's like relatively easier to build good product and you need to go back to like investing in the founder who's like thought through distribution, who has good domain expertise in what to build for a specific customer.

So again, if you were on my team as an investor, how would you think about interesting areas for us to invest in companies that will accrue value and not be threatened by model providers? Because again, like, you're going into health, you go into code. Obviously Codex is very clear, you go into customer support. Where are you not going and where's Claude Code not going?

I'm tempted to just say like, "I don't know." I couldn't... I think it's a hard time to be an investor. The market is so dynamic. It's hard to say.

It's a really tough time to be investing today. My answer is kind of twofold which is like: number one, I look for things with physical infrastructure—I don't think you're going into energy supply. And then two is like the fintech and banking integrations, gnarly financial products. I don't think OpenAI is going to go into building 500 relationships with banks in Southeast Asia.

Yeah, I tend to agree. It again comes back to: are you going into like a gnarly complicated market where customer relationships and like knowledge of the market are everything? That still seems great.

How bad is the war for talent from the UK? We look at SF and I say to companies, it's better to build in Europe because it's impossible to acquire talent and it's impossible to retain it. Am I wrong?

I think that the war for talent is incredibly fierce right now. Obviously at OpenAI, we have an incredibly strong brand and so we're able to attract a lot of talent. But even so we put a ton of effort into like closing candidates that we're really excited about. Even we feel it. It's not like you just get whoever you want for free.

Can I ask at the entry price that you get stock at, is it still attractive for the best talent?

I haven't had anyone tell me anything to the contrary.

To what extent do you think about like finding the perfect fit versus finding someone who's good enough?

So earlier I made my joke about like PMs kind of being optional.

Yeah.

I think that's not true. You still need product people, but I do think that they have to be the perfect fit. And if you have someone who's like not the perfect fit, they might just do more harm than good. So, it kind of means that like we're way more selective than I might have been in other roles.

I'm a CS student, okay? I'm at Stanford. I'm at Imperial. I'm at Cambridge. I'm wherever—ETH. Great institution. What would you advise me knowing all that you know now that would help me navigate the next 5 years of my career? I want to be valuable to the AI ecosystem environment as an engineer entering the workforce in the next year.

There's never been a better time to be an engineer because you have incredible tooling available to you to get an incredible amount done and your ability to like ramp into like a complex codebase that you might be hired into has never been faster because you can go ask AI like a ton of questions about the codebase and you can ask it to plan out changes that would otherwise take you like days to research maybe. So I think first off I would say like you should be like very optimistic.

But then of course, that's about your abilities once you're at the job. Now the question is how do you get the job? I think that because it's never been like easier to build things, the thing that becomes scarcer is like agency, taste, and like quality. And so I would urge you to like just build things and demonstrate your agency and your taste around what you build and like build things that are of high quality and then share those things. Like we get a lot of inbound from folks both applying for jobs through the careers page and also on social. This is just me but when someone writes to me with like some interesting thoughts and like a link to an interesting project that gets my attention much more than like a normal resume does.

Final questions before we do a quick fire. You mentioned Dropbox earlier. The alumni from Dropbox are incredible. Really like amazing to see the talent that's come out of Dropbox. What's your single biggest lesson from Dropbox that has shaped some of your thinking now with OpenAI?

Oh, I don't need to think about that one. That's kind of the thing I was telling you about earlier, right? Like I think when you're building tooling for people—like for end users—you have to think about like that tooling as a system of engagement, right? If people don't want to use your tool, if it doesn't like naturally feel like the easiest way to get something done, then people just won't use it, right? And so, like, again, I learned that from watching how Slack just absolutely took off.

And so I think about that a lot now when we're building these agents. I'm like, if we build our agent purely as like workflow automation, then it's always going to be like pulling teeth to get that thing started, right? You're going to need to hire Accenture or someone to come in. They're going to need to deploy FDEs. It's going to be tough. But if you can build a system that like people just love using, even if they only use it for partial tasks, over time they'll get better and better at using it. And then you'll get connected to the tools you want over time and then you can start layering in automation. Obviously, these aren't mutually exclusive.

How on earth do you reinvigorate growth at Dropbox today?

At least from when I was at Dropbox, the thing we were uniquely good at was desktop software. And desktop software is—it's funny—it was never not "back" but anyways it's so back because if you're solving for productivity and knowledge work, yes there are systems of record everywhere that you need to connect with, but everything at the end of the day happens on the user's computer either in their browser or just like locally in apps on their computer.

And so I do think that the fastest way we're going to see productivity gains from agents at work is going to be at first meeting users on their computer, working with the stuff that they have available to them without having deployed FDEs to set anything up. And then over time, you'll connect in these various systems. And so, if I was Dropbox, I'd be thinking about how do we leverage our unique domain expertise in like building really good like desktop software and this sort of collaborative layer on top of your computer. How do we leverage that to enable productivity agents? It's a bit broad, but I think that's the angle you go for.

No, I love it and I really appreciate the response. Final one before we do a quick fire, I promise. I've been brought up in a world where margin matters. Software margins are wonderful and it's what makes software a brilliant category to invest in. We're seeing margin profiles that are very different in inference-heavy players in particular. To what extent should I put that out of mind and appreciate that cost will come down, cost of tokens will come down and it's about usage and customer love? Margins will come... or no, margins are freaking important—keep that focus.

I think both. Costs are going to come down significantly. And I also think that if this is the year of agents being deployed like broadly at work, then this is also the year where they're going to have to be connected to all these various systems and I think that's going to be very sticky. And so I view this year as a race. And so I think you want to win that race and you should be okay taking some hit to margin in the meantime.

Dude, quick fire round. So, I say a short statement, you give me your immediate thoughts. Does that sound okay?

Yeah.

What have you changed your mind on most in the last 12 months?

When I joined OpenAI, I thought that—this was a little longer than 12 months ago, but—when I joined OpenAI, I thought that we would all just be hanging out with our computers screen sharing, but within a year from there we'd have this agent that we're just talking to. That was completely wrong. I think the rate of like progress in like multimodal models was like slower than I expected. Multimodal means like models that work with like video and audio. So instead, what happened was that we saw that like agents that work with your computer through code are the way. And so for me, that's been a complete rethink in terms of like how we bring the benefits of AI to like just people generally. It's not through video and audio primarily.

Which lesser known competitor do you respect most and why?

First one that came to mind was Amp.

I think they're building—Yeah, Amp.

Okay. It's out of... out of the folks at Sourcegraph. Their product has a great reputation of just being like, you know, punching way above its weight. But I think the other thing that I really respect is that they helped initiate this whole like standardization around like `agents.md` and like `agents/skills`, which are what I was saying earlier about like making it so it's easier for users to manage all these different agents that they're trying. You know, we obviously put out `agents.md`, but they put out `agent.md` and Quinn started this all by putting out a tweet that said, "Hey, if you guys buy the domain `agents.md`, we'll standardize to your spelling." And as small as that was, that initiated this whole standardization that I think has been awesome in the community.

Do you think the response to Anthropic's ads was the right response?

There were so many different responses. The one that I heard, obviously, I think was right. The one that I heard was, "Well, one company's being pretty negative about the future and the other company—us, OpenAI—is being really positive and just telling people they can build things and to dream." I thought that response was brilliant.

What's the hardest product decision you've had to make since being at Codex?

Well, I can tell you the most painful product decision we had to make.

Great.

For a while Codex Cloud was like effectively unlimited—not free, like you needed to pay for ChatGPT, but then you had unlimited usage. And you know, every day that we left it that way, we knew it would be harder to wind back—it being like unlimited—but we were just so focused on competing on our other things that had more PMF that we kind of punted that decision out. And when we wound back that unlimited use to some like more reasonable limit, there was a lot of blowback from users. And it was a very small minority of users who like thought everything should be kind of like pseudo-free forever. But that blowback affected us everywhere because like the social chatter doesn't really distinguish between these things. So I think the lesson I learned the hard way there is like you can't make things unlimited for too long.

Dude, it's like pricing grandfathering. Pricing is just such a hard thing. What do we do today in engineering or product that in 5 years' time you'll look back on and go, "Oh my god, can you believe that we did that?"

Well, one is just editing code by hand. I think probably another one—this is maybe spicier, but—another one might even be like managing the deployment and monitoring of systems by hand. Like I think that probably big companies will take a long time to like deploy this, but many startups might kind of start building on a completely new stack that's like fully AI managed. To be clear, the stack doesn't exist yet, but a fully managed AI stack that's been built to give you really strong deterministic guardrails over what the agent can do and like control to roll back deploys and everything like that.

And so we'll get to a world where the way you start a company is you start by getting an agent and just asking it to build things, and then you get more agents in that, and then maybe eventually you add your co-founders to this service that you use to work with agents. So maybe your main communication tool ends up being your agent communication tool, and you're not hand-holding this very painful CI and deploy process; you're just having agents do things.

Weird question, but I'm intrigued. Are you the one providing agent guardrails? What I mean by that is your agents can go anywhere within an enterprise. Are you responsible for providing those guardrails, or is there a third-party provider who is saying, "Hey, Alex, you can't go into that, that's Human Resources," or "Oh, you can't go into that, that's Marketing"? How do you think about guardrail provisioning, and is that the role of the agent provider or a third-party provider?

I think we'll probably see both. We are putting a lot of effort into agent guardrails. Like I said, I think we're the only company that cares about OS-level sandboxing for coding agents, for instance. There's none that exists on Windows. We're the ones building that. And we're doing it in open source, so hopefully other people can use it. We think about that a lot. Our chat supports connectors, so you can talk to your Google Docs or something, and we put a lot of effort into guardrails around what the agent can do with your Google Docs. Those are just two examples, but we think a lot about this. I think the way that we'll do it probably won't be sufficient, though. There'll be third parties who provide very bespoke things for very bespoke company needs, and there'll probably be a mix of both.

Final one for you, my friend. What are you most excited about when you look forward 10 years?

This is probably going to happen in much less than 10 years. But my mission, sort of personally, when I joined the company was that I just felt like even with the models we had a year and a half ago, there was so much capability overhang, so much ability for these things to be useful, but we hadn't built the right products around that. And so people like me were getting more benefit than people like my grandma.

And so what I'm most excited for is to get to a form factor for AI that means it's just helping everyone, regardless of whether they're in tech, and especially if they're not in tech or if they're older. The concrete vision I have is that at some point we'll add an agent to our family WhatsApp or something, and it'll just start being useful to the family without anyone having to think harder about it than that. There are many other ways that could happen, but I think concretely that's the most obvious thing we could do with my grandma.

Dude, I so appreciate you. I so appreciate you putting up with my wandering questions and my very episodic mind. You've been fantastic, man.

Thanks so much. I appreciate you putting up with my wandering answers. So, all good here.