ChatGPT Agent Mode: Use Cases, Limits, and Business Value

Published on

August 6, 2025

Updated on

September 9, 2025

ChatGPT Agent Mode: Use Cases, Limits, and Business Value

Tried and tested: here’s what Agent Mode gets right, where it struggles, and how it changes the way we interact with AI.

Artificial intelligence is no longer a tool that just spits out paragraphs and poems. After years of ChatGPT answering questions, correcting grammar and helping us brainstorm names for our cats, OpenAI has released something radically different: ChatGPT’s Agent mode. This new feature promises to take the dull parts of your day off your plate by literally hopping on the web, filling out forms, running code and producing polished deliverables. Sounds like science fiction? It isn’t.

In this post I’ll unpack what Agent mode actually is, why OpenAI built it, what real‑world tasks it can (and cannot) handle, and what all this means for your business. I’ve tested the agent on my own errands and combined those observations with security research and reviews from across the web. Grab a cup of coffee and let’s demystify this shiny new AI assistant.

What is ChatGPT Agent mode?

The easiest way to describe Agent mode is to imagine giving ChatGPT a virtual laptop. Instead of only generating text, the agent spins up its own secure browser, terminal and workspace. It can click buttons, type into forms, run Python scripts and even open connectors to services like Google Drive or GitHub. The agent “automates tasks for you” by using a virtual computer to browse websites, fill forms, run code, analyse data and create deliverables like slides or spreadsheets. You watch every step in real time: the agent shows you which tool it’s using and asks for your approval before critical actions.

Why did OpenAI introduce it?

The move is strategic. AI research has been marching toward autonomy: models that reason, plan and act rather than simply respond. OpenAI’s official announcement frames the agent as the bridge from research to action: a system that can “handle requests like ‘look at my calendar and brief me on upcoming client meetings based on recent news,’ ‘plan and buy ingredients to make Japanese breakfast for four,’ and ‘analyse three competitors and create a slide deck’”.

In other words, they want to transform ChatGPT from an advisory tool into a productivity engine. The agent also allows OpenAI to showcase GPT‑4o’s new reasoning capabilities — the model breaks tasks into steps, decides when to browse, when to run code and when to ask questions — all while staying within a single conversational context. It’s a taste of where AI assistants are headed.

What can it do? (Spoiler: it’s not a miracle worker)

Agent mode shines when tasks involve multiple steps across different tools. It combines web navigation and deep research in one place. So, the agent can:

Browse the web visually or via text, clicking links, scrolling and filling out forms. You literally watch the cursor move.
Run code and analyze data. Built‑in Python means it can process spreadsheets, generate charts and clean data.
Create documents and slides. It can compile your findings into docs or PowerPoint decks without you opening another tab.
Use connectors. When you allow it, the agent can read your Gmail, fetch files from Google Drive or interact with GitHub. It always requests confirmation before sending emails or making purchases.
‍

Need help understanding AI?

Talk to Andrew Amann

🚀 Created and Launched 14 Startups (4 Acquisitions)

⚡ Built 150+ mobile & web apps for 70+ Clients

📍 2 U.S. Patents in bluetooth tracking manufacturing parts

🏆 3 years in a row "Fastest Growing Company" on Inc 5000

💬 Speaks and writes about advising companies on AI strategies,
process improvement, and technology innovation.

Book a consultation

‍
Real-life examples

You’ve probably seen some of the demos: hunting down a sold‑out toy, planning multi‑day itineraries, and compiling grocery lists. With a bit of patience it can even snag a DMV appointment in a matter of seconds.

So, I’ve also tried to test it a little bit. Below are the tasks I set for the agent and what happened:

Booking a hotel room

I asked the agent to find and book a luxury hotel in Paris for August 1–3, 2025. It compiled a table of five-star options complete with pricing, amenities, and star ratings — a helpful and well-structured starting point. After I chose the Norman Paris Hôtel & Spa, the agent opened Booking.com, entered the travel details, filtered the results, located the exact listing, and selected the correct room (~€892 total).

It reached the final checkout page but stalled when it asked for guest details and payment — information it couldn’t fill in because I hadn’t provided it. While the booking wasn’t completed, the agent did everything else correctly, automating about 90% of the process. I stopped just short of confirming to protect the last €5 on my card from any surprise transactions.

Researching and creating an article and presentation

I asked the agent to write a detailed comparison of OpenAI vs Anthropic, with the expectation that it would handle the entire process — from gathering information to structuring the article. While the final blog post wasn’t exceptionally well written, it was far more accurate than what you'd usually get from ChatGPT in regular mode. Since the agent searched the web in real time, you could trace every claim back to its source, which made the process much more trustworthy. I then asked it to turn the article into a presentation.

The agent produced a complete slide deck with all the right sections, plus charts, icons, and some imagery. The design wasn’t visually impressive: the graphics were basic and not really correctly placed, and layout options limited. It also hit a wall when trying to upload to Google Docs (due to login limitations), but it returned Word and PPT files.

Building a 3D interactive component with v0

One of my regular blog tasks is creating interactive elements, so I asked the agent to build a toggle switch for the 923 website — one that flips between a plugged-in and unplugged laptop, complete with matching text and animation. Instead of providing style guidelines, I asked it to extract them directly from our site. On the first attempt, ChatGPT ignored our branding entirely. After I asked it to try again using our actual color palette and visual tone, it identified key styles from our website (like the purple-pink gradient, teal accent, and dark background) and incorporated them correctly.

It created a smooth React component via v0.dev, then converted it into a standalone HTML/CSS/JS version when requested. A hiccup occurred when it couldn’t preview the result on CodePen due to CAPTCHA, but it quickly pivoted to JSFiddle. After one small manual fix (moving a script from the wrong pane), the final component worked exactly as intended and kinda matched our visual identity.

Hiccups and frustrations

The agent isn’t always the superhero you hope for. Some of its shortcomings are simply growing pains; others are inherent limitations of giving an AI the keys to the internet.

Glacial pace. The Verge’s hands‑on test compares the agent to “a day‑one intern who’s incredibly slow at every task”. Searching for a lamp on Etsy took fifty minutes while the agent slowly filtered results and even misinterpreted instructions. For many chores, a human still works faster.
Misreported actions. In the same test, the agent claimed it added five lamps to the user’s Etsy cart, yet nothing appeared in the real cart because it acted in its own virtual browser. Don’t trust everything it says – double‑check.
Crashes and loops. During trip planning, Tom’s Guide had to restart the app after the agent crashed. Others have reported loops when dealing with dynamic dropdowns or CAPTCHAs.
Stuck logins. When I used the agent to log into my v0 account and Google Drive, I noticed that simply “ending” the agent session didn’t revoke access. The agent stayed logged in on its virtual computer, and if I relaunched Agent mode it could immediately access those sessions. To truly sign out I had to log out inside Google and v0 manually.

The security risks you can’t ignore

Giving a model permission to act as you online introduces a new class of risks. Security researchers warn that Agent mode amplifies existing AI vulnerabilities and creates fresh ones.

Full user privileges. Because the agent uses your own login credentials, it inherits your permissions. Noma Security notes that there is no differential access control: there is no easy way to grant the AI fewer rights than you have. It can therefore perform irreversible actions – like deleting files or sending messages – if prompted incorrectly.
Limited auditability. Actions performed by the agent are nearly indistinguishable from those of the user in system logs. This makes it hard to determine who clicked what during an incident.
Prompt injection and content-based attacks. OpenAI acknowledges that agents are vulnerable to prompt injection attacks, where malicious instructions are hidden in user-facing content, such as invisible text, metadata, or even embedded in webpages the agent reads. This can lead the agent to act in unintended ways or disclose sensitive information from authorized tools. As OpenAI notes, mitigating these types of attacks remains an active area of research and one of the hardest open challenges in agent security.
Excessive autonomy and hallucinations. Because the agent operates with your full privileges, any misunderstanding or hallucination can lead to unintended actions.
Session handling. The agent stores session cookies so it can continue tasks across websites. If you forget to log out, the agent may access your accounts later – a risk I observed personally. Always end sessions inside connected services when you’re done.

If you’re considering using Agent mode for sensitive tasks, take these precautions:

Only enable connectors you absolutely need and prefer read‑only access when possible.
Monitor the agent’s actions in real time. Use the “watch mode” for high‑risk tasks so you can intervene.
Always log out of your accounts and clear cookies when you finish a session.
Add explicit instructions in ChatGPT’s custom instructions to forbid deleting or editing files and to avoid open‑ended web browsing.
If you’re an enterprise, implement governance and audit frameworks around AI usage.

‍

Here is more on the danger ChatGPT Agent Mode can pose to your business:

‍

Limitations that businesses should know

Beyond security, there are practical limitations that might make Agent mode unsuitable for mission‑critical workflows:

Speed. Even simple tasks can take several minutes; complex ones can take half an hour.
Mis‑clicks and UI issues. Small buttons or dynamic interfaces confuse the agent; it sometimes clicks the wrong element or repeats steps.
Site blocking and CAPTCHAs. Many retailers and travel sites block automated browsing. The agent struggles with heavy JavaScript and cannot solve CAPTCHAs without human help.

‍

Usage caps and cost. Plus and Team users get only 40 messages per month, and each pause or confirmation counts as a message. Long tasks can quickly burn through your allowance.
No API access. The consumer agent mode lives inside ChatGPT’s interface. There is no way to embed this exact agent into your product. Developers must instead use OpenAI’s Agents API and Agents SDK, which let you build custom agents with function‑calling, web‑browsing and code‑execution capabilities.

Can businesses use Agent mode in their products?

Not directly. The consumer agent is tied to ChatGPT’s interface and cannot be embedded via API. If you want similar functionality inside your own software, you need to build a custom agent using OpenAI’s developer tools. The Agents API and Responses API allow you to orchestrate multi‑tool workflows, search files and execute code under your own governance. These APIs are not subject to the same usage caps and can be integrated into secure environments.

But with that control comes responsibility. You’ll need to think about least‑privilege access, authentication, audit logs, and prompt‑injection safeguards. These agents must be designed carefully. At this stage, they’re best suited for assisting with research, drafting reports, or automating routine decisions – always with a human in the loop.

This is where most forward-thinking companies are heading: not just using Agent mode inside ChatGPT, but building controlled simulations where agents and employees operate together. These custom agents can operate within specific boundaries, align with your data governance, and reflect your brand and decision-making logic.

As our CEO puts it:

Companies don't want uncontrollable agents built by the employees running around doing things on the employees' behalf. Companies want systems that are digital twins of their current work environment — where work is completed by both agents AND humans. This simulation helps discover where agents can best serve the system, and enables faster execution, better decisions, and higher ROI.

Andrew Amann

CEO and Co-Founder at NineTwoThree

FAQ

Do I have to pay for Agent mode?

Yes. Free users cannot access Agent mode. Pro subscribers get roughly 400 agent messages per month, while Plus and Team subscribers receive about 40 messages.

Will the agent automatically buy things for me?

Not without your approval. The agent can fill out forms, add items to a cart and pre‑fill payment details, but it stops and asks for confirmation before submitting. It also refuses to conduct high‑stakes financial actions like transferring money.

Can it handle CAPTCHAs or login verification?

No. The agent cannot solve CAPTCHAs or bypass bot detectors. You must complete these steps manually, and even then it might still not work.

Does the agent remember my sessions?

Yes. The agent stores cookies so it can continue tasks across websites. This means it might remain logged into your accounts. Always sign out inside connected services to ensure the agent no longer has access.

Can I restrict its access to certain files or folders?

Not in consumer Agent mode. The agent uses the same permissions as your login. You cannot grant it read‑only access or limit its scope. To implement granular permissions, use the developer APIs and design your own connector layer.

Conclusion

Agent mode shows where AI is going. It's no longer just about answering questions but about taking action. When it works, it feels like a capable assistant handling research, drafting, or navigating the web for you. But it’s still early. The agent can be slow, make mistakes, and inherit your account access, which raises real security concerns.

For individuals, it offers a new level of convenience. For businesses, it’s a reason to think beyond small automations. The real opportunity lies in building structured systems where agents and people work together, with clear boundaries and shared objectives.

That’s what we help companies build at NineTwoThree. Whether you’re testing Agent mode or planning custom tools with OpenAI’s APIs, we design agentic systems that actually work. If you're exploring how agents could support your business, we’d be happy to talk.