A first look at the new ChatGPT agent (and how it can change the internet)
OpenAI's powerful new ChatGPT agent redefines AI capabilities while introducing new risks and attack vectors that threaten the security and data integrity of AI systems.
OpenAI has released ChatGPT agent, a new AI tool that can actively work on a user's behalf. It moves beyond conversation to execution, using its own virtual computer to handle complex, multi-step tasks. You can ask it to analyze competitors and build a slide deck, or to plan a meal and purchase the ingredients. The agent autonomously navigates websites, runs code, analyzes data, and delivers editable documents to complete the task.
ChatGPT agent integrates the strengths of earlier, separate tools to create a unified system. The result is an agent that can reason through a problem and then take the necessary steps to solve it, changing how users can approach both professional and personal digital tasks. This development has broad implications for the future of AI assistants and introduces new challenges in safety and reliability.
A look under the hood
The agent's core capability comes from merging two previously distinct functions. It combines the web-interaction skills of a tool like Operator, which can click and type on websites, with the analytical power of a tool like deep research, which synthesizes information from various sources across the web. This enables ChatGPT agent to actively engage with websites to gather precise information and then perform deep analysis on that data within a single, seamless process.
To execute tasks, the agent has a suite of tools. These include a visual browser for graphical interfaces, a text-based browser for simpler queries, a terminal for running code, and API access through connectors for apps like Gmail and GitHub. This multi-tool approach allows the agent to select the most efficient method for any given step, such as using an API to check a calendar, a text browser to quickly parse large documents, or a visual browser to interact with a complex web application. The agent performs all this work in its own virtual computer, preserving context across different tools. This is similar to the approach used by Manus, a Chinese AI startup that provides agentic tools.
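To make per-step tool selection concrete, here is a minimal Python sketch of a routing policy. OpenAI has not published how ChatGPT agent chooses its tools, so the `Tool`, `Step`, and `AgentSession` names and the preference order below are hypothetical assumptions for illustration only.

```python
from dataclasses import dataclass, field
from enum import Enum


class Tool(Enum):
    # The four tool types described in the article; the names are hypothetical.
    CONNECTOR_API = "connector_api"
    TEXT_BROWSER = "text_browser"
    VISUAL_BROWSER = "visual_browser"
    TERMINAL = "terminal"


@dataclass
class Step:
    description: str
    has_connector: bool = False      # a structured API (e.g. a calendar connector) covers this step
    needs_rendering: bool = False    # the site only works through its graphical interface
    needs_computation: bool = False  # the step requires running code


@dataclass
class AgentSession:
    # Shared state that persists across tools, standing in for the agent's virtual computer.
    context: dict = field(default_factory=dict)

    def choose_tool(self, step: Step) -> Tool:
        # Assumed preference order: structured API first, then code, then GUI, then plain text.
        if step.has_connector:
            return Tool.CONNECTOR_API
        if step.needs_computation:
            return Tool.TERMINAL
        if step.needs_rendering:
            return Tool.VISUAL_BROWSER
        return Tool.TEXT_BROWSER

    def run(self, step: Step) -> Tool:
        tool = self.choose_tool(step)
        # Log which tool handled each step so later steps can build on earlier results.
        self.context.setdefault("history", []).append((step.description, tool.value))
        return tool
```

In this sketch, a step such as checking a calendar through a connector routes to the API, while skimming a long report falls through to the text browser. The shared `context` dictionary plays the role of the persistent virtual computer that keeps results available across tools.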
OpenAI emphasizes that the user remains in control throughout the process. An on-screen narration shows what the agent is doing in real-time. The user can interrupt the task, take over the browser to log in or make corrections, or stop the process entirely. For actions with real-world consequences, such as making a purchase, the agent is designed to explicitly ask for user permission before proceeding.
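The permission step can be pictured as a gate in front of every action. The sketch below assumes a hypothetical set of consequential action types and a callback that asks the user for approval. It illustrates the general pattern and is not OpenAI's implementation.

```python
from typing import Callable

# Hypothetical set of action types treated as having real-world consequences.
CONSEQUENTIAL_ACTIONS = {"purchase", "send_email", "delete_file"}


def execute_action(action: str, payload: dict,
                   ask_user: Callable[[str, dict], bool]) -> str:
    """Run an action, pausing for explicit user approval when it is consequential."""
    if action in CONSEQUENTIAL_ACTIONS:
        # The agent stops and hands the decision to the human supervisor.
        if not ask_user(action, payload):
            return f"skipped {action}: user declined"
    # Placeholder for the real side effect (clicking "Buy", sending the email, and so on).
    return f"executed {action}"
```

Routine actions such as a search run without interruption, while a purchase only goes through if the `ask_user` callback returns `True`.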
How ChatGPT agent changes the game
Perhaps the most significant design shift is ChatGPT agent’s interactive workflow. Unlike earlier models that required users to wait for a task to finish before making corrections, ChatGPT agent can be interrupted and redirected at any point. This creates a more organic, human-like collaboration. A user can start with a general idea for a task and refine the instructions as the agent works, lifting the burden of having to define every detail perfectly from the outset. The agent reinforces this collaborative feel by proactively asking for clarification when required.
The agent's capabilities are supported by strong performance on several industry benchmarks. It set a new state-of-the-art score of 68.9% on BrowseComp, a benchmark that measures how well an agent can locate hard-to-find information online. On SpreadsheetBench, it outperformed existing models with a score of 45.5%, compared to 20.0% for Copilot in Excel. It also achieved 27.4% accuracy on FrontierMath, which is considered the hardest known math benchmark.
Promise, peril, and the path forward
While the ability to supervise the agent and jump into its environment is a crucial feature for today's models, the ultimate goal is greater autonomy. The current "look over the shoulder" approach is a necessary stopgap. An ideal agent would not require constant monitoring; instead, it would operate independently and know when to proactively seek human input if it gets stuck or has low confidence. The current design is a practical step, but the technology's full potential lies in freeing up the user's time, which requires less direct oversight.
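One simple way to express "knowing when to ask for help" is a policy that escalates to the user when confidence is low or when the agent keeps failing at a step. The `EscalationPolicy` name and both thresholds are illustrative assumptions. A real system would also need a well-calibrated confidence signal, which is itself hard to obtain.

```python
from dataclasses import dataclass


@dataclass
class EscalationPolicy:
    # Both thresholds are arbitrary values chosen for illustration.
    min_confidence: float = 0.7
    max_retries: int = 3

    def should_ask_human(self, confidence: float, failed_attempts: int) -> bool:
        """Return True when the agent should pause and request human input."""
        low_confidence = confidence < self.min_confidence
        stuck = failed_attempts >= self.max_retries
        return low_confidence or stuck
```

With this policy, the agent works on its own while it is confident and making progress, and interrupts the user only when one of the two conditions trips. This matches the "ask when stuck" behavior described above rather than continuous supervision.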
This increased autonomy also magnifies existing challenges, particularly with hallucinations. As agents produce highly polished final outputs like presentations or financial models, it becomes easier to trust the results without question. Verifying the accuracy of a report compiled from dozens of web sources is difficult. In one of the examples shown in the OpenAI live demo, the researchers used ChatGPT agent to create a slide show for the benchmark results of their newly released tool. The result was impressive but only useful because they already knew the benchmark numbers and didn’t need to check the validity of the information.
Security presents another significant hurdle. OpenAI has implemented safeguards, including defenses against prompt injection and requiring user confirmation for sensitive actions such as payments. However, new tools create new threat vectors. Malicious actors will likely develop sophisticated, chained attacks. For instance, a website could present different information to an AI agent than it does to a human user, or hide malicious instructions in invisible elements. Such an attack could first fool the agent into taking an unwanted action and then deceive the human supervisor who approves it, creating complex security challenges that go beyond current defenses.
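To illustrate the "invisible elements" attack, the following heuristic sketch uses Python's standard `html.parser` to extract text that a browser would hide from a person but that an agent reading raw HTML might still ingest. This is a hypothetical example, not how OpenAI defends against prompt injection. It assumes well-formed markup and checks only inline styles and the `hidden`/`aria-hidden` attributes. It also cannot detect cloaking, where a server sends different pages to agents and humans.

```python
import re
from html.parser import HTMLParser

# Elements that never have a closing tag, so they must not be pushed on the stack.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img", "input",
             "link", "meta", "source", "track", "wbr"}

# Heuristic patterns for inline CSS that hides content from human readers.
HIDDEN_STYLE = re.compile(
    r"display\s*:\s*none|visibility\s*:\s*hidden"
    r"|font-size\s*:\s*0(?![.\d])|opacity\s*:\s*0(?![.\d])")


class HiddenTextFinder(HTMLParser):
    """Collect text inside elements that a browser would not show to a person."""

    def __init__(self):
        super().__init__()
        self.hidden_stack = [False]  # top of stack: is the current element hidden?
        self.hidden_text = []

    def handle_starttag(self, tag, attrs):
        if tag in VOID_TAGS:
            return
        attr = dict(attrs)
        style = (attr.get("style") or "").lower()
        hidden = ("hidden" in attr
                  or attr.get("aria-hidden") == "true"
                  or bool(HIDDEN_STYLE.search(style)))
        # A child of a hidden element is hidden too.
        self.hidden_stack.append(self.hidden_stack[-1] or hidden)

    def handle_endtag(self, tag):
        if tag not in VOID_TAGS and len(self.hidden_stack) > 1:
            self.hidden_stack.pop()

    def handle_data(self, data):
        if self.hidden_stack[-1] and data.strip():
            self.hidden_text.append(data.strip())


def find_hidden_text(html: str) -> list:
    finder = HiddenTextFinder()
    finder.feed(html)
    finder.close()
    return finder.hidden_text
```

Flagged text is a signal to treat the page with suspicion. It is not proof of an attack, because legitimate sites also hide elements for menus, tooltips, and accessibility.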
Ultimately, tools like ChatGPT agent are a stepping stone to a deeper transformation of the web. Currently, AI agents are learning to navigate a digital world built for humans. The next evolution will likely see the internet itself adapt to accommodate both humans and AI agents. This could take the form of richer, more standardized APIs and MCP (Model Context Protocol) endpoints for websites and applications, allowing agents to interact with services directly and efficiently without having to parse human-centric user interfaces. Such a shift would unlock a vast new range of applications and use cases for agentic AI.
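To show what agent-facing endpoints can look like, note that MCP messages are JSON-RPC 2.0 requests. A client first discovers a server's tools with `tools/list` and then invokes one with `tools/call`. The sketch below only builds the request payloads. The `search_flights` tool and its arguments are hypothetical, and the transport layer and initialization handshake are omitted.

```python
import json
from typing import Optional


def jsonrpc_request(request_id: int, method: str, params: Optional[dict] = None) -> str:
    """Serialize a JSON-RPC 2.0 request of the kind MCP clients send to servers."""
    message = {"jsonrpc": "2.0", "id": request_id, "method": method}
    if params is not None:
        message["params"] = params
    return json.dumps(message)


# Step 1: ask the server which tools it exposes.
list_tools = jsonrpc_request(1, "tools/list")

# Step 2: call a (hypothetical) tool directly instead of filling in a search form built for humans.
call_tool = jsonrpc_request(2, "tools/call", {
    "name": "search_flights",
    "arguments": {"origin": "SFO", "destination": "JFK", "date": "2025-08-01"},
})
```

The agent never touches a rendered page here. It exchanges structured data with the service, which is the kind of direct interaction the paragraph above anticipates.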
Once agents stop imitating human clicks and start expecting MCP endpoints, the web will fragment into two internets: one for people, one for machines.
I agree that ChatGPT's agent is a game-changer.
Do you think this might become available on a larger scale for non-paying users (e.g., might Google's AI Studio or Gemini replicate something like GPT's agent)? As you mentioned in your last paragraph, how do you think this will alter the internet, both negatively and positively?
I think the agent will get stuck on payment issues, CAPTCHAs, and other checkpoints designed for humans...