
How to Hire an AI Agency Without Joining the 95% Failure Rate

Published on March 4, 2026
Updated on March 5, 2026
Don't waste $80,000 on the wrong AI agency. This guide reveals what to look for in discovery, pricing, and planning to find partners that actually deliver.

An enterprise hired an AI agency and lost $80,000. When the project ended, they had nothing to show for it.

Stories like this are becoming common. AI agencies and implementation partners are appearing everywhere, but the market is flooded with more garbage than legitimate expertise. A recent MIT study found that 95% of AI agencies fail their enterprise clients. That means if you hire randomly, you have a 1 in 20 chance of success.

The problem isn't just incompetence. Many of these agencies are using AI to generate proposals without listening to actual client needs. They're applying "spray and pray" tactics, sending out templated pitches and hoping something sticks. When one company recently gathered proposals for the same AI project, the range was staggering: $20,000 to $280,000 for identical requirements.

That price gap tells you everything. Either someone doesn't understand the work, someone is trying to undercut the market with an unrealistic bid, or someone is dramatically overcharging. For companies trying to implement AI seriously, this creates a minefield. How do you separate legitimate partners from expensive disasters?

This guide walks through the specific red flags to watch for and the questions that reveal whether an agency knows what they're doing.

If you're planning an AI implementation, download our AI Strategy Toolkit to evaluate proposals, calculate ROI, and build a roadmap before you commit to any agency.

AI Strategy Toolkit

You can also watch the full conversation with Andrew Amann, who walks through how to vet AI agencies.

The Discovery Phase Separates Real Agencies from Posers

The single most important indicator of a legitimate AI agency is whether they conduct a proper discovery phase before proposing a solution.

Real agencies don't start with technology. They start by listening. They investigate your business, understand your specific challenges, and then propose a solution based on what you actually need. This process has different names at different firms (rapid validation sprint, discovery workshop, business analysis), but the principle is the same: understand the problem before proposing the solution.

We just had a client tell me, 'After five other agencies, you were the only person that listened to what I said, wrote that down in the proposal, and used my words in the proposal itself,' which also means people are just using AI to write all these proposals.
Andrew Amann
CEO and Co-Founder at NineTwoThree

What proper discovery looks like

Before a legitimate agency sends you a proposal, you should speak with multiple people on their team.

At a minimum, you should talk to:

  • An engineer (to understand technical feasibility)
  • A product manager (to scope the actual work)
  • Leadership (CEO or CTO, to ensure strategic alignment)
  • Sometimes marketing or sales (to understand your business model)

If an agency sends you a proposal after one 30-minute call, they don't understand your business. They're guessing.

During these conversations, the agency should be asking questions like:

  • What specific process is causing the bottleneck?
  • Who currently handles this task, and how long does it take?
  • What have you already tried to solve this?
  • What does success look like in measurable terms?
  • What data do you have available, and where does it live?

Agencies that skip discovery end up building solutions to problems you don't have. They create flashy demos that don't integrate with your actual workflows. When you try to deploy them, the edge cases they never considered destroy the timeline and budget.

It is impossible to hire anyone from the outside world, drop them into your business, and have them work and produce money for you. It will never work.
Andrew Amann
CEO and Co-Founder at NineTwoThree

The templated proposal red flag

I was sitting next to an agency owner at a workshop, and he's like, 'Oh, we just won a project.' I'm like, 'What's it for?' He's like, 'I got no idea. We sent about 10 of these proposals out last week. It looks like somebody said yes.'
Andrew Amann
CEO and Co-Founder at NineTwoThree

That's not an agency. That's a numbers game.

If your proposal reads like it could have been sent to any company in your industry, walk away. A real proposal should reference specific challenges you mentioned, use your terminology, and demonstrate understanding of your business model.

Single Player vs. Multiplayer: Why AI Project Costs Vary Wildly

When you receive proposals ranging from $20,000 to $280,000 for the same project, the gap usually comes down to one distinction: single-player versus multiplayer complexity.

Single-player solutions

A single-player AI solution is built for one person (typically an executive) to use in isolation. Examples include:

  • A CEO's personal invoice reviewer
  • An individual time tracker
  • A private research assistant

These can often be "vibe coded" quickly because the edge cases are limited. When the single user encounters a problem, they adapt. There's no need to handle multiple users with different permissions, varying levels of technical skill, or competing use patterns.

The $20,000 proposals you see are typically single-player MVPs. And there's nothing inherently wrong with that, as long as you understand what you're buying.

The single-player trap

The problem happens when companies treat single-player pricing as if it applies to multiplayer deployment.

When you build a single-player solution without proper planning, every new user type becomes weeks of additional development. The sales team needs different data from the engineering team. Customer-facing staff need different permissions than internal analysts. What seemed like a straightforward tool becomes a sprawling mess of special cases.

Multiplayer complexity

When you move from a single-user tool to a system that serves your entire organization, three critical challenges emerge. Each one adds layers of technical complexity that cheap proposals simply don't account for.

User permissions and access control

Who has access to what data? How do those permissions change based on role, department, or seniority? When Person A asks a question, they should get different results than Person B asking the same question if their access levels differ.

Two years ago, this had to be custom-coded. Metadata descriptions tracked user permissions manually. Now, Microsoft and Google ecosystems automatically store user permissions with vectorized documents. This helps, but you still need an agency to map these permissions and figure out what happens when something changes.
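To make the permission-mapping idea concrete, here is a minimal sketch in plain Python (hypothetical names, no real vector database) of how permission-aware retrieval works: each document chunk carries the groups allowed to read it, and retrieved results are filtered against the requesting user's groups before the AI ever sees them.

```python
from dataclasses import dataclass, field


@dataclass
class Chunk:
    """A vectorized document chunk tagged with the groups allowed to read it."""
    text: str
    allowed_groups: set = field(default_factory=set)


def retrieve(chunks, user_groups):
    """Return only the chunks the requesting user is permitted to see.

    In a real system the candidate list would come from a vector
    similarity search; the permission filter is applied the same way.
    """
    return [c for c in chunks if c.allowed_groups & user_groups]


docs = [
    Chunk("Q3 revenue forecast", {"finance", "leadership"}),
    Chunk("Employee handbook", {"everyone"}),
]

# Person A (finance) and Person B (sales) ask the same question,
# but they retrieve different context, so they get different answers.
finance_view = retrieve(docs, {"finance", "everyone"})
sales_view = retrieve(docs, {"sales", "everyone"})
```

The hard part an agency must plan for is everything around this filter: keeping the access tags in sync when permissions change, and deciding what the system says when a user's question can only be answered from documents they cannot see.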

Consistency across skill levels

Jill has been with your company for 20 years. She knows exactly how to phrase questions to get useful answers. Susie started six months ago and doesn't know the right terminology yet.

This requires building systems that understand user intent, not just literal queries. It means handling recency (newer information should surface first), avoiding bias (everyone should get accurate answers regardless of how they ask), and eliminating redundancy (similar questions shouldn't produce wildly different responses).

Tone and voice consistency

When your AI represents your company to customers or helps employees make decisions, it needs to maintain a consistent tone and voice. A support bot that sounds friendly to one customer and robotic to another creates a broken experience.

All three of these challenges require sophisticated prompt engineering, extensive testing across user types, and quality assurance processes that single-player tools never need.

How to know which you're buying

If you ask a $20,000 agency, 'Can I hold you to that number?' I guarantee you they will say no. They will say no because their business model does not succumb to that type of structure. They cannot do a fixed price because they know every project they do ends up being $50,000, $80,000, $100,000.
Andrew Amann
CEO and Co-Founder at NineTwoThree

If the agency says yes, they've built this type of system before. They're confident in their estimate because they've planned for the edge cases. They know what multiplayer complexity costs.

If they say no, or if they hedge, their business model doesn't support fixed pricing. They know the project will balloon once reality hits. The low initial quote is bait.

Look for agencies that provide a top limit and a structured breakdown of what you need to go live. You should be able to see exactly what you're paying for and know that the price won't creep upward as "unforeseen complications" emerge.

Want to see if an agency's proposal makes financial sense? Use our AI ROI framework to evaluate whether their timeline and budget will actually deliver positive returns. 

You can download the whole AI Strategy Toolkit right here.

AI Strategy Toolkit

The Architecture Question: Planning vs. Trial and Error

Building AI solutions follows the same principle as building houses: if you plan the architecture properly, you never have to tear anything down and rebuild.

When you build a house, you plan where the bathrooms go, where the plumbing runs, where the electrical needs to be installed. You get approvals before breaking ground. This prevents you from building a room and then realizing the plumbing has to run through the middle of the kitchen.

AI projects work the same way.

The vibe coding approach (trial and error)

Some agencies approach AI projects through rapid iteration: build something in one prompt, see what breaks, spend hours fixing edge cases, repeat. This feels fast initially because you get something working quickly.

But every fix creates new problems. The architecture wasn't planned, so each addition requires refactoring. The $20,000 MVP becomes a $100,000 mess of patches and workarounds.

The architectural approach (planning first)

You get a prompt, you get an idea, you build that, you plan it, you get the structure, you spend like 80% of your time typing out how the architecture is going to work and how the plan is going to look and how the different edge cases are going to be resolved. And then the last 20% is coding.
Andrew Amann
CEO and Co-Founder at NineTwoThree

This means:

  • Mapping out all the user flows before writing a line of code
  • Identifying edge cases in advance
  • Planning data structures that can handle future complexity
  • Getting feedback on the plan before implementation

Only after the architecture is solid do you hit submit and generate code. The result? Projects that come in on time and on budget because you anticipated the complexity instead of discovering it mid-build.

When evaluating agencies, ask them to walk through their planning process. Do they dive straight into coding, or do they start with architecture diagrams and user flow mapping? The answer reveals whether they've done this before.

What Success Looks Like

The agencies that consistently deliver successful AI projects share common characteristics.

24 out of the 27 projects we've delivered have been on time and on budget, and they've returned more ROI than what the client has paid us to build that project. And that's like a slot machine. You put $2 into a slot machine, and you get $4 out. You would do that over and over and over again.
Andrew Amann
CEO and Co-Founder at NineTwoThree

Experience matters. Agencies that have delivered hundreds of projects know what works and what doesn't. They've seen the edge cases. They've learned where complexity hides.

By the time a legitimate agency says yes to your project, they should already know:

  • Your business model and how the AI fits into it
  • How much ROI you can expect, and when
  • The technical approach they'll use to build it
  • Which specific engineer on their team is best suited for your project
  • How they'll staff the work if that engineer is currently on another project
  • The estimated release date

All of that planning happens before you sign a contract. When you get confirmation that the agency wants to move forward, they should already have answers to these questions. If they're still figuring out the basics after you've committed, they didn't do the work upfront.

The Model Choice Question: OpenAI, Claude, or Gemini?

When evaluating AI agencies, you'll often hear them discuss which models they use. This matters more than it might seem.

OpenAI's enterprise vs. consumer split

OpenAI's reasoning models are excellent. The APIs are robust and predictable. For enterprise applications requiring complex logic and decision-making, OpenAI's models perform well.

But OpenAI has increasingly focused on consumer-grade products rather than enterprise infrastructure.

Those consumer features are designed to capture user data and enable advertising revenue, not to serve enterprise needs. ChatGPT Atlas, for example, gives OpenAI access to every site users visit while browsing. That's valuable for ad targeting, but it's a privacy nightmare for enterprises.

Claude and Gemini's enterprise focus

Claude (from Anthropic) and Gemini (from Google) have both invested heavily in enterprise infrastructure:

  • Claude Code integrates with internal file systems and development workflows
  • Gemini deeply integrates with Google Workspace, Drive, and admin tools
  • Both focus on user permissions, data security, and multiplayer complexity

For companies on Microsoft ecosystems, Microsoft Copilot (which uses OpenAI models under the hood) provides enterprise-grade tooling. For Google Workspace users, Gemini offers native integration.

What this means when vetting agencies

Ask the agency which models they typically use and why. If they default to consumer ChatGPT for enterprise projects, they may not understand the security and permission requirements your business actually needs.

Legitimate agencies should be able to explain:

  • Why they chose a particular model for your use case
  • How they handle user permissions and data access
  • What happens to your data (is it used for model training?)
  • How they ensure consistency across different user types

If they can't answer these questions clearly, they're not thinking about multiplayer complexity.

Don't Lower Your Standards for AI Hiring

Companies often lower their hiring standards when it comes to AI implementation. They wouldn't hire a salesperson without checking references, or bring on a product manager without understanding their process. But they'll hand $50,000 to an AI agency and just hope it works out.

Apply the same rigor to vetting AI agencies that you would to hiring any critical team member:

  • Check their track record
  • Talk to past clients
  • Understand their process
  • Ensure they've solved similar problems before
  • Get fixed-price quotes with detailed breakdowns
  • Verify they conduct proper discovery before proposing solutions

The 95% failure rate isn't inevitable. It exists because companies don't know what questions to ask. Now you do.

If you're looking for an AI implementation partner that actually delivers results, talk to NineTwoThree. We've successfully launched over 160 projects by following exactly the process outlined in this guide. We start with discovery, we listen to your actual needs, and we build solutions that work in production, not just in demos.

Because the best AI strategy is the one that actually ships.

Alina Dolbenska
Content Marketing Manager

Andrew Amann
CEO of NineTwoThree AI studio