Artificial intelligence is no longer a tool that just spits out paragraphs and poems. After years of ChatGPT answering questions, correcting grammar and helping us brainstorm names for our cats, OpenAI has released something radically different: ChatGPT’s Agent mode. This new feature promises to take the dull parts of your day off your plate by literally hopping on the web, filling out forms, running code and producing polished deliverables. Sounds like science fiction? It isn’t.
In this post I’ll unpack what Agent mode actually is, why OpenAI built it, what real‑world tasks it can (and cannot) handle, and what all this means for your business. I’ve tested the agent on my own errands and combined those observations with security research and reviews from across the web. Grab a cup of coffee and let’s demystify this shiny new AI assistant.
The easiest way to describe Agent mode is to imagine giving ChatGPT a virtual laptop. Instead of only generating text, the agent spins up its own secure browser, terminal and workspace. It can click buttons, type into forms, run Python scripts and even open connectors to services like Google Drive or GitHub. The agent “automates tasks for you” by using a virtual computer to browse websites, fill forms, run code, analyse data and create deliverables like slides or spreadsheets. You watch every step in real time: the agent shows you which tool it’s using and asks for your approval before critical actions.
The move is strategic. AI research has been marching toward autonomy: models that reason, plan and act rather than simply respond. OpenAI’s official announcement frames the agent as the bridge from research to action: a system that can “handle requests like ‘look at my calendar and brief me on upcoming client meetings based on recent news,’ ‘plan and buy ingredients to make Japanese breakfast for four,’ and ‘analyse three competitors and create a slide deck’”.
In other words, OpenAI wants to transform ChatGPT from an advisory tool into a productivity engine. The agent also lets OpenAI showcase its model’s reasoning capabilities: the model breaks tasks into steps and decides when to browse, when to run code and when to ask questions, all while staying within a single conversational context. It’s a taste of where AI assistants are headed.
Agent mode shines when tasks involve multiple steps across different tools. It combines web navigation and deep research in one place. So, the agent can:
You’ve probably seen some of the demos: hunting down a sold‑out toy, planning multi‑day itineraries, and compiling grocery lists. With a bit of patience it can even snag a DMV appointment in a matter of seconds.
I also ran a few tests of my own. Below are the tasks I set for the agent and what happened:
I asked the agent to find and book a luxury hotel in Paris for August 1–3, 2025. It compiled a table of five-star options complete with pricing, amenities, and star ratings — a helpful and well-structured starting point. After I chose the Norman Paris Hôtel & Spa, the agent opened Booking.com, entered the travel details, filtered the results, located the exact listing, and selected the correct room (~€892 total).
The agent reached the final checkout page but stalled when the site asked for guest details and payment, information it couldn’t fill in because I hadn’t provided it. While the booking wasn’t completed, the agent did everything else correctly, automating about 90% of the process. I stopped just short of confirming to protect the last €5 on my card from any surprise transactions.
I asked the agent to write a detailed comparison of OpenAI vs Anthropic, with the expectation that it would handle the entire process — from gathering information to structuring the article. While the final blog post wasn’t exceptionally well written, it was far more accurate than what you'd usually get from ChatGPT in regular mode. Since the agent searched the web in real time, you could trace every claim back to its source, which made the process much more trustworthy. I then asked it to turn the article into a presentation.
The agent produced a complete slide deck with all the right sections, plus charts, icons, and some imagery. The design wasn’t visually impressive: the graphics were basic and not always placed correctly, and the layout options were limited. It also hit a wall when trying to upload to Google Docs (due to login limitations), but it returned Word and PPT files.
One of my regular blog tasks is creating interactive elements, so I asked the agent to build a toggle switch for the 923 website — one that flips between a plugged-in and unplugged laptop, complete with matching text and animation. Instead of providing style guidelines, I asked it to extract them directly from our site. On the first attempt, ChatGPT ignored our branding entirely. After I asked it to try again using our actual color palette and visual tone, it identified key styles from our website (like the purple-pink gradient, teal accent, and dark background) and incorporated them correctly.
It created a smooth React component via v0.dev, then converted it into a standalone HTML/CSS/JS version when requested. A hiccup occurred when it couldn’t preview the result on CodePen because of a CAPTCHA, but it quickly pivoted to JSFiddle. After one small manual fix (moving a script from the wrong pane), the final component worked as intended and largely matched our visual identity.
The agent isn’t always the superhero you hope for. Some of its shortcomings are simply growing pains; others are inherent limitations of giving an AI the keys to the internet.
Giving a model permission to act as you online introduces a new class of risks. Security researchers warn that Agent mode amplifies existing AI vulnerabilities and creates fresh ones.
If you’re considering using Agent mode for sensitive tasks, take these precautions:
Beyond security, there are practical limitations that might make Agent mode unsuitable for mission‑critical workflows:
Not directly. The consumer agent is tied to ChatGPT’s interface and cannot be embedded via API. If you want similar functionality inside your own software, you need to build a custom agent using OpenAI’s developer tools. The Agents SDK and Responses API allow you to orchestrate multi‑tool workflows, search files and execute code under your own governance. These tools are not subject to the same usage caps and can be integrated into secure environments.
But with that control comes responsibility. You’ll need to think about least‑privilege access, authentication, audit logs, and prompt‑injection safeguards. These agents must be designed carefully. At this stage, they’re best suited for assisting with research, drafting reports, or automating routine decisions – always with a human in the loop.
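To make those safeguards concrete, here is a minimal sketch of an approval gate with a tool allowlist and an audit log. It is an illustration only and does not call any OpenAI API: the tool names, `ALLOWED_TOOLS`, `AuditLog`, `run_tool` and the `approve` callback are all hypothetical, and prompt-injection filtering is out of scope for this sketch.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Any, Callable, Optional

# Hypothetical allowlist: tool names and approval flags are illustrative only.
ALLOWED_TOOLS = {
    "search_files": {"needs_approval": False},  # low risk, runs freely
    "send_email": {"needs_approval": True},     # critical, needs a human
}


@dataclass
class AuditLog:
    """Append-only record of every attempted tool call."""
    entries: list = field(default_factory=list)

    def record(self, tool: str, outcome: str) -> None:
        self.entries.append({
            "time": datetime.now(timezone.utc).isoformat(),
            "tool": tool,
            "outcome": outcome,
        })


def run_tool(
    tool: str,
    action: Callable[[], Any],
    approve: Callable[[str], bool],
    log: AuditLog,
) -> Optional[Any]:
    """Run `action` only if the tool is allowlisted and, where required, approved."""
    if tool not in ALLOWED_TOOLS:
        # Least privilege: anything not explicitly allowed is blocked.
        log.record(tool, "blocked")
        return None
    if ALLOWED_TOOLS[tool]["needs_approval"] and not approve(tool):
        # Human in the loop: the reviewer declined the critical action.
        log.record(tool, "denied")
        return None
    result = action()
    log.record(tool, "executed")
    return result
```

In a real deployment the `approve` callback would surface the pending action to a reviewer, much like the consumer agent pauses before critical steps, and the log would go to durable storage rather than an in-memory list.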
This is where most forward-thinking companies are heading: not just using Agent mode inside ChatGPT, but building controlled simulations where agents and employees operate together. These custom agents can operate within specific boundaries, align with your data governance, and reflect your brand and decision-making logic.
As our CEO puts it:
Yes. Free users cannot access Agent mode. Pro subscribers get roughly 400 agent messages per month, while Plus and Team subscribers receive about 40 messages.
Not without your approval. The agent can fill out forms, add items to a cart and pre‑fill payment details, but it stops and asks for confirmation before submitting. It also refuses to conduct high‑stakes financial actions like transferring money.
No. The agent cannot solve CAPTCHAs or bypass bot detectors. You must complete these steps manually, and even then it might still not work.
Yes. The agent stores cookies so it can continue tasks across websites. This means it might remain logged into your accounts. Always sign out inside connected services to ensure the agent no longer has access.
Not in consumer Agent mode. The agent uses the same permissions as your login. You cannot grant it read‑only access or limit its scope. To implement granular permissions, use the developer APIs and design your own connector layer.
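As a rough idea of what such a connector layer could look like, the sketch below wraps a storage backend and enforces scopes before every call. Everything here is an assumption made for illustration: `InMemoryDrive` stands in for a real service, and `ScopedConnector`, `ScopeError` and the `"read"`/`"write"` scope names are hypothetical, not part of any OpenAI API.

```python
class ScopeError(PermissionError):
    """Raised when the agent attempts an operation outside its granted scopes."""


class InMemoryDrive:
    """Stand-in for a real storage backend (hypothetical, for illustration)."""

    def __init__(self):
        self.files = {}

    def read(self, name):
        return self.files[name]

    def write(self, name, content):
        self.files[name] = content


class ScopedConnector:
    """Wraps a backend and checks the granted scopes before each operation."""

    def __init__(self, backend, scopes):
        self._backend = backend
        self._scopes = frozenset(scopes)

    def _require(self, scope):
        if scope not in self._scopes:
            raise ScopeError(f"scope '{scope}' not granted")

    def read(self, name):
        self._require("read")
        return self._backend.read(name)

    def write(self, name, content):
        self._require("write")
        self._backend.write(name, content)
```

Handing the agent `ScopedConnector(drive, {"read"})` instead of the raw backend gives it read-only access: reads go through, while any write raises `ScopeError` before it reaches the underlying service.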
Agent mode shows where AI is going. It's no longer just about answering questions but about taking action. When it works, it feels like a capable assistant handling research, drafting, or navigating the web for you. But it’s still early. The agent can be slow, make mistakes, and inherit your account access, which raises real security concerns.
For individuals, it offers a new level of convenience. For businesses, it’s a reason to think beyond small automations. The real opportunity lies in building structured systems where agents and people work together, with clear boundaries and shared objectives.
That’s what we help companies build at NineTwoThree. Whether you’re testing Agent mode or planning custom tools with OpenAI’s APIs, we design agentic systems that actually work. If you're exploring how agents could support your business, we’d be happy to talk.