AI Agents Are Replacing Chatbots: What OpenAI’s Codex Reveals About the Future of Work

30 Jun

A new OpenAI-linked study suggests that the next phase of artificial intelligence may not be about asking chatbots better questions. It may be about delegating work to agents that can act, code, analyse, run tools and complete tasks with less direct human control.

What this article covers

This article explains why OpenAI’s Codex has become one of the clearest signs that AI is moving beyond chatbots, how agentic AI is changing work inside OpenAI and other organisations, why non-developers are adopting coding agents, what this could mean for jobs, productivity and software, and what remains uncertain as companies begin handing more work to AI systems.

In simple terms

Chatbots answer. Agents act.

That is the simplest way to understand the shift now taking place in artificial intelligence. A chatbot waits for a question and gives a response. An AI agent can be given a task, use tools, inspect files, write code, run tests, produce outputs and sometimes work across multiple steps with far less minute-by-minute instruction from a human.

OpenAI’s Codex has become one of the most important examples of this shift. Launched in 2025 as a cloud-based software engineering agent, Codex can work on multiple tasks in parallel, write features, answer questions about a codebase, fix bugs and propose pull requests for review. Each task runs in its own cloud sandbox environment connected to a user’s repository.

A new OpenAI report, also published as a research paper with academics from Columbia, Duke and the University of Pennsylvania, says Codex usage grew more than fivefold in the first half of 2026. It also says the fastest growth is happening outside the original audience of software developers. Inside OpenAI, according to the report, Codex has become nearly universal and has largely replaced business use of ChatGPT.

That does not mean everyone is suddenly managing armies of autonomous AI workers. The shift is still uneven. Axios reports that fewer than 1 percent of active consumer users across Go, Free, Pro and Plus plans used Codex during a recent 28-day period. But for people and organisations already using it, Codex is beginning to change the shape of work itself.

The chatbot was only the first interface

For the past few years, the public story of AI has been dominated by the chatbot. The interface was simple, almost disarmingly so: type a question, get an answer. That design helped millions of people understand large language models for the first time. It made AI feel accessible. It also created a misleading impression that the main future of AI would be conversational.

The new evidence from Codex points somewhere else. The most important AI interface may not be a chat window. It may be a task window.

Instead of asking “What should I do?”, users increasingly ask “Can you do this?” The difference is subtle in wording but large in effect. It changes the human role from questioner to delegator. It changes the AI role from respondent to operator. It changes the economic unit of AI from an answer to a completed piece of work.

OpenAI’s own internal data makes this shift unusually visible. The company says that through August 2025, the average OpenAI worker spent less than 10 percent of their output tokens on Codex. By June 2026, Codex had become the primary AI tool for every department at OpenAI, including non-technical teams such as legal, finance and recruiting. OpenAI says Codex now accounts for more than 85 percent of output tokens for the average OpenAI worker and 99.8 percent of weekly output tokens generated inside the company.

Those numbers need careful interpretation. OpenAI is not a normal workplace. Its staff have unusual access to frontier tools, deep AI fluency, strong internal incentives and a culture built around experimentation. The company itself says its internal adoption should be understood as a signal of what may happen when the barriers of cost, access, training and buy-in are mostly removed. But that is precisely why the data matters. OpenAI may be an outlier, but it is a revealing one.

What makes Codex different from a normal chatbot?

Codex is not simply ChatGPT with a developer label. It belongs to a class of systems often described as agentic AI: software that can take actions on a user’s behalf, use tools, operate across multiple steps and work towards an outcome rather than merely return text.

OpenAI describes Codex as a suite of software agent offerings, including Codex CLI, Codex Cloud and the Codex VS Code extension. In its technical explanation of the Codex agent loop, OpenAI says the system orchestrates interaction between the user, the model and the tools the model invokes to perform software work.

That orchestration is the key. A chatbot may explain how to fix a bug. An agent can inspect the code, attempt the fix, run tests and offer a proposed change. A chatbot may suggest a spreadsheet formula. An agent can potentially transform the file, write a script, check the result and deliver the output. In software, this shift is especially visible because code provides relatively clear feedback. It can compile or fail. Tests can pass or fail. Pull requests can be accepted or rejected.
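The attempt, test and retry cycle described above can be sketched in a few lines of Python. This is a toy illustration only, not OpenAI's implementation: the names run_agent_loop, toy_propose_fix and toy_run_tests are hypothetical, and the "model" here is a hard-coded stand-in that picks between two candidate fixes based on test feedback.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple


@dataclass
class AgentRun:
    succeeded: bool
    attempts: int
    log: List[str] = field(default_factory=list)


def run_agent_loop(
    propose_fix: Callable[[str], str],
    run_tests: Callable[[str], Tuple[bool, str]],
    max_attempts: int = 3,
) -> AgentRun:
    """Propose a change, test it, and feed failures back until tests pass."""
    log: List[str] = []
    feedback = ""
    for attempt in range(1, max_attempts + 1):
        patch = propose_fix(feedback)
        passed, feedback = run_tests(patch)
        outcome = "passed" if passed else "failed"
        log.append(f"attempt {attempt}: tried {patch!r}, tests {outcome}")
        if passed:
            return AgentRun(True, attempt, log)
    return AgentRun(False, max_attempts, log)


# Hypothetical candidate fixes for a buggy add(a, b) function.
CANDIDATES: Dict[str, Callable[[int, int], int]] = {
    "subtract": lambda a, b: a - b,
    "add": lambda a, b: a + b,
}


def toy_propose_fix(feedback: str) -> str:
    # Stand-in for the model: switch candidate once the test reports a failure.
    return "add" if "expected 5" in feedback else "subtract"


def toy_run_tests(patch: str) -> Tuple[bool, str]:
    # Stand-in for a test suite with a single check.
    result = CANDIDATES[patch](2, 3)
    if result == 5:
        return True, ""
    return False, f"add(2, 3) returned {result}, expected 5"


result = run_agent_loop(toy_propose_fix, toy_run_tests)
for line in result.log:
    print(line)
```

The point of the sketch is the feedback path: the failing test output becomes the input to the next attempt, which is what separates this pattern from a single question and answer.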

This is why coding has become the proving ground for agentic AI. It is complex enough to matter, but structured enough to evaluate. It involves real economic value, real professional workflows and real mistakes. It is also an area where users are already willing to pay for tools that save time.

The new data: people are giving agents longer and harder tasks

The most striking finding in OpenAI’s report is not simply that more people are using Codex. It is that users are giving it longer-horizon work.

OpenAI says that by May 2026, 80.6 percent of sampled individual Codex users had made at least one request estimated to represent more than 30 minutes of work by an experienced human. It says 70.2 percent had made at least one request estimated to exceed one hour, and 25.6 percent had made at least one request estimated to exceed eight hours.

The research paper reports that more than 10 percent of users manage three or more concurrent Codex agents at some point each week, while 26.6 percent use “skills”, which allow users to share instructions for complex workflows. It also says the share of individual Codex users submitting at least one request estimated to require more than eight hours of experienced human work has increased nearly tenfold since the start of 2026.

This is where the story becomes more than a productivity story. A person asking one chatbot a question is still working in the old rhythm. A person running multiple agents in parallel is organising work differently. They are no longer simply receiving assistance. They are allocating tasks, monitoring outputs and deciding what to accept, reject or revise.

That is why the language around agents matters. Calling them “assistants” may understate what is happening. In some workflows, they are becoming junior operators, production units or parallel work streams. They are not fully autonomous colleagues. But they are also no longer just autocomplete.

Non-developers may be the more important story

Codex began as a coding tool, but OpenAI’s data suggests the fastest growth is now coming from non-developers. Since August 2025, OpenAI says, the number of non-developer users has grown 137-fold among individual users, 189-fold among organisational users and 12-fold within OpenAI. The company says non-technical workers use Codex for automation, data transformation, tooling, debugging and structured analysis, often taking on technical execution outside their normal job description.

That may be the bigger long-term story. If agentic AI remains only a developer tool, its impact is large but relatively contained. If it becomes a way for lawyers, analysts, recruiters, marketers, operators and finance teams to generate technical work without being engineers, the organisational effects become much wider.

This does not mean everyone becomes a software developer. It means more people may begin to perform software-shaped work. A legal worker might automate document checks. A recruiter might build a small candidate-tracking workflow. A finance analyst might create a data-cleaning script. A support team might use an agent to interrogate logs or draft internal tools. In many cases, the human still needs judgement, domain knowledge and review. But the boundary between “technical” and “non-technical” work starts to blur.

The danger is that organisations may confuse ability to produce technical output with ability to govern technical systems. An agent can help a non-developer create a script, but that does not automatically mean the script is secure, maintainable, compliant or suitable for long-term use. The rise of non-developer agent use could therefore expand productivity and risk at the same time.

From code generation to delegated execution

The academic literature around coding agents is beginning to frame this as a structural shift in software work. A 2026 survey paper on agentic AI in the software development lifecycle argues that the central object of inquiry has moved from code generation to “delegated execution under human supervision.” It contrasts earlier tools such as code completion with modern agentic systems that operate at the level of a repository, feature or algorithm.

That distinction is crucial. Code completion helps a developer type faster. Agentic coding changes who or what carries out a task. The developer, or increasingly the non-developer, becomes a supervisor of attempted work. The skill shifts from writing every line to defining the task clearly, judging the output, checking the risks and integrating the result.

The same survey paper consolidates evidence suggesting substantial advances in benchmark performance and productivity, while also identifying unresolved problems around evaluation, governance, technical debt, skill redistribution and what it calls the economics of attention.

That final phrase is worth pausing on. If agents make it possible to launch more work, they also create more work to review. The bottleneck may move from production to judgement. Organisations may not run out of ideas or generated outputs. They may run out of human attention to check what the agents have done.

The productivity promise is real, but uneven

The optimistic case for agents is straightforward. They can reduce the friction between intention and execution. A worker has an idea, delegates the first version to an agent, reviews the result, asks for changes and moves faster than they could alone. In the best cases, agents may help people attempt work they previously avoided because it required unfamiliar technical skills.

Axios quoted workplace culture expert Jessica Kriegel saying agents reduce the “psychological cost of action”, making unfamiliar work feel more approachable. That is a useful way to understand why adoption can accelerate once people trust the tool enough to begin.

But productivity is not automatic. The usefulness of an agent depends heavily on the task, the user, the workflow, the review process and the cost of errors. A coding agent that drafts documentation may be useful even if the output needs editing. A coding agent that modifies authentication logic, payment code or compliance workflows requires much tighter control. The easier agents make it to produce work, the more important it becomes to know which work should not be delegated casually.

Research comparing AI coding agents across real pull requests also suggests performance varies by task type. One 2026 paper analysing 7,156 pull requests from the AIDev dataset found that documentation tasks had an 82.1 percent acceptance rate compared with 66.1 percent for new features. The same paper reported that OpenAI Codex performed consistently across nine task categories, but that no single agent performed best across all task types.

That is a useful corrective to hype. Agents are not uniformly good at “software”. They are better at some tasks than others. The practical question for companies is not whether agents work. It is where they work, under what supervision, with what controls and at what cost.

The hidden cost of agentic work

One of the most revealing developments around Codex happened not in a research paper, but in a usage-limit incident. Business Insider reported on 30 June 2026 that OpenAI had fixed an issue that caused some Codex users to hit usage limits faster than expected. According to the report, Codex engineering lead Thibault Sottiaux said internal features such as auto-review and helper subagents sometimes ran more frequently than intended, consuming more compute.

This may sound like a small product problem. It is actually a window into the economics of agents.

Chatbot use is comparatively legible: a user sends a prompt, receives an answer, and the exchange consumes tokens. Agentic work is messier. An agent may plan, inspect, retry, run tools, call subagents, review its own work and generate intermediate outputs the user never directly sees. Some of the most valuable work may happen in the background. Some of the most expensive work may also happen there.
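A rough way to picture this is to tally tokens by step and by whether the user ever sees the result. The sketch below is purely illustrative: the step names and token counts are invented for the example, not taken from Codex or any real billing data, and Step and summarise_usage are hypothetical names.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Step:
    name: str
    tokens: int
    visible_to_user: bool


def summarise_usage(steps: List[Step]) -> Dict[str, float]:
    """Total tokens, tokens spent on steps the user never sees, and their share."""
    total = sum(s.tokens for s in steps)
    hidden = sum(s.tokens for s in steps if not s.visible_to_user)
    share = hidden / total if total else 0.0
    return {"total": total, "hidden": hidden, "hidden_share": share}


# Invented numbers for one hypothetical agent task.
steps = [
    Step("plan", 400, False),
    Step("inspect files", 1200, False),
    Step("first attempt", 900, False),
    Step("retry after failed test", 900, False),
    Step("self-review", 600, False),
    Step("final answer", 1000, True),
]

usage = summarise_usage(steps)
print(usage)
```

With these made-up figures, four fifths of the tokens are spent on work the user never directly reads, which is the kind of gap that makes quota consumption hard to explain.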

That has consequences for pricing, transparency and trust. If users cannot understand why an agent consumed a large amount of their quota, they may become reluctant to delegate. If companies cannot predict how much agentic workflows will cost, they may restrict access. If providers cannot manage background compute reliably, agents become harder to scale profitably.

The Codex incident does not prove that agents are unsustainable. OpenAI said it deployed fixes, reset usage limits and improved monitoring. But it does show that agentic AI is not just a new interface. It is a new cost structure.

The labour question: replacement, reorganisation or both?

Any serious article about AI agents has to address the labour question without falling into either panic or denial. Codex does not prove that agents will replace whole professions. It does show that pieces of knowledge work can increasingly be delegated to software. That is enough to matter.

The OpenAI-linked research paper discusses implications for productivity, job reorganisation and workforce restructuring. That framing is careful, and it should be. The likely impact of agents will not be identical across industries, roles or companies. Some workers may become dramatically more productive. Some teams may shrink. Some roles may be redesigned around review and orchestration. Some junior pathways may be weakened if entry-level tasks are handed to agents. New roles may emerge around evaluation, governance, prompt design, workflow architecture and AI operations.

The most immediate change may be less visible than mass replacement. Work may be reorganised around people who can delegate well. Employees who know how to break problems into agent-ready tasks, evaluate outputs and integrate results may gain leverage. Employees whose value lies mainly in routine execution may face more pressure.

There is also a management question. If one worker can supervise multiple agents, how should performance be measured? Is output generated by an agent the worker’s output, the company’s output or the tool’s output? Who is responsible when it fails? How should review be documented? How should sensitive data be protected? How should companies prevent an explosion of poorly maintained internal tools and scripts?

These are not distant philosophical questions. They are the operational questions that arrive once agents leave demos and enter workflows.

Why this matters beyond software

Software is only the first obvious domain because it is digital, structured and testable. But the pattern behind Codex is broader. Many forms of office work involve files, rules, systems, repetitive decisions and digital tools. Once agents can safely interact with those systems, the logic of delegation spreads.

A marketing team may delegate research summaries, content audits and campaign reports. A finance team may delegate reconciliation checks or spreadsheet transformation. A legal team may delegate contract comparison or document organisation, while still requiring human review. A customer support team may delegate ticket triage, knowledge-base updates or internal tooling. A founder may delegate prototype creation. A journalist may delegate transcript cleaning, source organisation and data extraction, while retaining editorial judgement.

The dividing line will not be whether a task is “creative” or “technical”. It will be whether the task can be specified, executed, checked and corrected within an acceptable risk threshold.

That risk threshold matters. An agent producing a first draft of an internal tool is one thing. An agent acting on live customer data, legal obligations, financial systems or medical information is another. The future of agents will depend not only on model capability, but on permissions, audit trails, sandboxing, data access, organisational rules and user trust.

What we know and what remains unclear

What we know is that Codex usage is rising quickly among active users, that OpenAI’s internal workplace has moved heavily towards agentic tooling, and that non-developers are adopting Codex faster than the original developer audience. We also know that users are increasingly delegating longer tasks and running multiple agents in parallel.

What remains unclear is how representative this is of the wider economy. OpenAI is an unusually AI-native organisation. External consumer adoption remains much lower, and Axios reports that fewer than 1 percent of active consumer users in relevant ChatGPT plan groups used Codex during the measured 28-day period.

It is also unclear how much productivity gain survives after review, correction, governance and integration costs are counted. A task that an agent completes in ten minutes may still require an expert to inspect it. If the expert spends twenty minutes checking and fixing the result, the productivity gain may be smaller than it first appears. In high-risk domains, review may be non-negotiable.
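The arithmetic can be made explicit. The ten-minute agent run and twenty-minute review come from the example above; the sixty-minute unaided baseline is an assumption added here for illustration, as are the function names.

```python
def net_minutes_saved(baseline: float, agent: float, review: float) -> float:
    """Minutes saved once agent run time and human review are both counted."""
    return baseline - (agent + review)


def saving_fraction(baseline: float, agent: float, review: float) -> float:
    """Net saving as a fraction of the unaided baseline."""
    return net_minutes_saved(baseline, agent, review) / baseline


# Assumed baseline: the task takes an expert 60 minutes unaided.
BASELINE = 60

apparent = BASELINE - 10                         # counting only the agent's run time
actual = net_minutes_saved(BASELINE, 10, 20)     # adding the expert's review time
heavy_review = net_minutes_saved(BASELINE, 10, 55)

print(apparent, actual, heavy_review)
```

Under that assumed baseline, the apparent saving of 50 minutes shrinks to 30 once review is counted, and a 55-minute review makes the delegation a net loss. The sketch treats the agent's run time as time the worker spends waiting; someone who does other work in the meantime would count only the review.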

The final uncertainty is whether agents will remain specialised or become general work interfaces. Codex is still strongly associated with coding, even as non-developers use it for broader tasks. The next stage may be agents designed explicitly around business workflows, not software repositories. If that happens, the chatbot may start to feel like an early transitional interface rather than the destination.

What happens next

The next phase of agentic AI will be shaped by four questions.

The first is capability. Agents need to become more reliable across longer tasks, more transparent in their reasoning and better at recovering from mistakes. It is not enough for them to succeed impressively in demos. They must be dependable in ordinary workflows.

The second is cost. Agentic systems can consume far more compute than simple chatbot exchanges because they act, retry, review and coordinate background work. The Codex usage-limit incident shows how sensitive this will become for both providers and users.

The third is governance. Companies will need rules for what agents can access, what they can change, when humans must approve actions, how outputs are logged and who is accountable when something goes wrong.
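Rules of this kind can be written down as a small policy check that sits between an agent and the systems it touches. The following is a minimal sketch under stated assumptions: AgentPolicy, its fields and the path names are hypothetical, and a real deployment would depend on the permission and logging features of whatever agent platform is in use.

```python
from dataclasses import dataclass, field
from typing import List, Set


@dataclass
class AgentPolicy:
    readable_paths: Set[str]
    writable_paths: Set[str]
    approval_required: Set[str]
    audit_log: List[str] = field(default_factory=list)

    def decide(self, action: str, path: str) -> str:
        """Return 'allow', 'needs_human_approval' or 'deny', and log the decision."""
        if action == "read":
            permitted = path in self.readable_paths
        elif action == "write":
            permitted = path in self.writable_paths
        else:
            permitted = False  # unknown actions are denied by default

        if not permitted:
            decision = "deny"
        elif path in self.approval_required:
            decision = "needs_human_approval"
        else:
            decision = "allow"

        self.audit_log.append(f"{action} {path}: {decision}")
        return decision


# Hypothetical policy: billing changes always need a human sign-off.
policy = AgentPolicy(
    readable_paths={"docs/", "billing/"},
    writable_paths={"docs/", "billing/"},
    approval_required={"billing/"},
)

decisions = [
    policy.decide("write", "docs/"),
    policy.decide("write", "billing/"),
    policy.decide("write", "secrets/"),
    policy.decide("delete", "docs/"),
]
print(decisions)
```

The design choice worth noting is that every decision is logged, including denials, so the audit trail answers both "what did the agent change?" and "what did it try to change?"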

The fourth is culture. Workers must learn not only how to prompt agents, but how to manage them. That may become a core workplace skill: knowing what to delegate, how to specify the task, how to check the work and when to stop the agent from doing more.

The chatbot age is not ending overnight. Most people still use AI by typing into a box and reading a reply. But the direction of travel is becoming clearer. The next interface for AI may be less like conversation and more like command, supervision and review. The question is no longer only what AI can say. It is what AI can do.

Key takeaways

OpenAI’s Codex has become one of the clearest examples of AI moving from chatbot interaction to delegated work.
OpenAI says Codex now accounts for more than 85 percent of output tokens for the average OpenAI worker and 99.8 percent of weekly output tokens inside the company.
Codex usage has grown more than fivefold in the first half of 2026, according to an OpenAI-linked research paper.
Non-developers are among the fastest-growing Codex user groups, suggesting agentic AI may spread beyond software engineering into broader office work.
Users are increasingly delegating longer tasks, with OpenAI reporting that 25.6 percent of sampled individual Codex users made at least one request estimated to exceed eight hours of experienced human work.
The productivity promise is real, but uneven. Agents perform better on some task types than others and still require human supervision.
The future of work may depend less on asking AI questions and more on managing AI agents.

FAQs

What is an AI agent?

An AI agent is a system that can take actions on behalf of a user, often using tools, files, software, APIs or browsers to complete a task. Unlike a chatbot, which mainly responds with text, an agent can work through multiple steps towards an outcome.

What is OpenAI Codex?

OpenAI Codex is an AI software engineering agent that can write features, fix bugs, answer questions about codebases and propose pull requests. It can run tasks in cloud sandbox environments connected to a user’s repository.

How are AI agents different from chatbots?

Chatbots primarily answer questions or generate text. AI agents can take actions, use tools, inspect files, run commands, produce outputs and work across longer tasks. The main shift is from conversation to delegation.

Is Codex only for software developers?

Codex began as a coding tool, but OpenAI says non-developer use has grown rapidly. Non-technical users are using it for automation, data transformation, tooling, debugging and structured analysis.

Are AI agents replacing workers?

The evidence does not show simple one-for-one replacement. It suggests work is being reorganised. Some tasks can be delegated to agents, while humans increasingly define, supervise, review and integrate the work.

Why are AI agents expensive to run?

Agents can consume more compute than ordinary chatbot interactions because they may plan, call tools, retry tasks, use subagents, review outputs and generate intermediate work in the background. This makes cost control and usage transparency important.

What are the risks of using AI agents at work?

The risks include incorrect outputs, security issues, data exposure, poor-quality code, hidden technical debt, unclear accountability, overreliance and high compute costs. Human review and governance are essential.

Will chatbots disappear?

No. Chatbots will remain useful for explanation, drafting and quick answers. But for many work tasks, the more important interface may become agentic: assigning work, monitoring progress and reviewing completed outputs.

Katie Wilde