AI in Practice

Why It Gets Hard to Justify AI Spending


Key takeaway

Uber's AI budget problem is not an Uber problem. It is what happens when tools get bought before measurement infrastructure exists. The cost lands on the balance sheet immediately. The value accumulates in places no one thought to track. The fix is not cutting spend or abandoning AI. It is assigning one named output metric per tool before the purchase order is signed, then reviewing by category every quarter. Do that and the line between spend and output becomes drawable.

Most founders cannot justify AI spending because they never built the measurement infrastructure before buying the tools. That is not a skepticism problem. It is a sequencing problem. The cost hits the balance sheet on day one. The value accumulates in places no one thought to track. By the time the budget review arrives, the spending looks real and the return looks like a guess.

Uber reportedly exhausted its full-year AI budget within the first four months of 2026, according to The Verge. Its president, Andrew Macdonald, publicly stated it is "hard to draw a line" between that spend and any shipped feature. That admission is worth reading carefully. Not because Uber is failing, but because that exact sentence is what every team ends up saying when it buys tools faster than it builds tracking.

This piece is not about whether AI spending is worth it. It is about what to do differently before the budget review forces the question.

What did Uber's president actually say about AI spending?

Andrew Macdonald's comment landed in May 2026 and it was blunt. He said the company cannot clearly connect its AI investment to the features it has shipped. Uber, by that point, had already burned through what it had allocated for the full fiscal year, four months in.

That matters because Uber is not a startup operating on instinct. It has dedicated engineering teams, finance infrastructure, and product leadership that reviews spend. If a company at that scale cannot draw the line between AI investment and output, the problem is not organizational incompetence. The problem is that no one built a measurement layer before the tools went live.

The story got framed in most coverage as a skepticism signal. Some read it as evidence that AI is not delivering enterprise value. That framing misses the actual lesson. Uber's problem is a template error. The tools may be working. The tracking was never built to prove it.

Why do most operators fail to connect AI spend to output?

The common mistake is buying AI tools at the team level before any tracking infrastructure exists. When that happens, the cost is visible from the first invoice. The value stays invisible because no one defined what "value" meant before purchase.

I have seen this pattern across AI tool stack audits in operator and consulting contexts. The most common finding is six to twelve active AI subscriptions at renewal time with zero named output metrics attached to any of them. Not a single tool tracked against a number someone had agreed to hit.

Most budgets record license cost and seat count. They do not record displaced hours, error rate reduction, or cycle time change. That means the person defending the spend at review time is defending a number with no counterweight. They are explaining what the tool does rather than what it moved.

The second problem is treating AI subscriptions the same way teams treat SaaS renewals. SaaS tools replace something specific. A CRM replaces a spreadsheet. A support platform replaces email threads. The replacement is visible. Most AI tools do not replace a discrete system. They compound an existing process. That means the value is real but diffuse, and diffuse value loses every budget argument.

Teams that buy AI tools in burst cycles during hype windows consistently struggle to justify renewals six months later because the value happened but no one measured it. The tool worked. The evidence was never collected.

What does the Uber case reveal about enterprise AI ROI in 2026?

Scale does not fix the attribution problem. It amplifies the visibility of it.

Large operators face the same core issue as small teams: AI spend front-loads in experimentation before value compounds. A team runs tools in parallel, tests workflows, ships some of it, abandons some of it, and gradually learns what actually accelerates output. That process takes months. The budget review often lands before the return does.

At Uber's scale, the budget exhaustion by April 2026 reflects that same dynamic, just with more zeros attached. The experimentation happened fast. The attribution infrastructure was not ready. So when the question came, the honest answer was that the line was not drawable.

As of early 2026, several major operators have made similar acknowledgments publicly or in internal reviews. The attribution gap is systemic, not specific to one company's governance failure. That means the problem is not unique to large teams with slow processes. It is what happens by default when no one makes measurement a precondition for purchase.

The McKinsey State of AI survey has consistently found that organizations struggle more with measuring AI ROI than with deploying AI tools. The tools are easier to buy than the measurement is to build. That imbalance is where most problems with justifying AI spending start. See also the related fix in Strategic AI for Founders: Fix Revenue Leaks First and The AI Implementation Paradox for Teams for how this pattern shows up at the team level.

How should founders measure AI spending against real output?

My rule is this: if you cannot state the metric before buying, do not buy. The ten minutes that exercise takes will save months of justification problems later.

Assign one named output metric to each AI tool before the purchase order is signed. Not a category of benefit. One specific metric. Time saved per task. Error rate in a given review step. Lead response cycle time. Revenue enabled per rep per month. The metric does not need to be perfect. It needs to be named before the cost hits the books.

Name an owner per investment who tracks that metric monthly. No owner means no accountability. No accountability means no defensible renewal. When the review arrives, the owner either has a number or they do not. That forces honesty early rather than at the worst possible moment.

The next move is to define what good looks like at 90 days. Not "we expect improvement." A number. If an AI writing tool is supposed to cut content cycle time, what does success look like by week twelve? If a meeting summarizer is supposed to reduce async catch-up time, what is the target? Without a 90-day output target, every AI tool becomes perpetually "still proving value" until someone cancels it out of frustration rather than on evidence.

I would not start with a complex ROI scoring matrix. I have seen teams build elaborate measurement systems for AI tools they have not tested yet. The scoring matrix becomes the project. Build the simplest version: tool name, metric owner, one output metric, 90-day target. That table will tell you more at review time than any dashboard built before the data exists.
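The four-column table can live in a spreadsheet, but a minimal Python sketch makes the 90-day check concrete. Treat everything here as hypothetical: the `ToolRecord` name, the extra fields beyond the four columns (baseline, latest reading, direction of improvement), the example tool, and the numbers are assumptions for illustration, not data from any real audit.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ToolRecord:
    """One row of the tracking table: tool, owner, metric, 90-day target."""
    tool: str
    owner: str
    metric: str
    target_90d: float
    baseline: float                  # assumed extra column: reading before deployment
    latest: Optional[float] = None   # assumed extra column: most recent monthly reading
    lower_is_better: bool = True     # true for cycle time or error rate


def review(record: ToolRecord) -> str:
    """Return a one-word status for the 90-day review."""
    if record.latest is None:
        return "untracked"  # the owner has no number to show
    if record.lower_is_better:
        hit = record.latest <= record.target_90d
    else:
        hit = record.latest >= record.target_90d
    return "hit" if hit else "missed"


# Hypothetical example row; the numbers are made up for illustration.
writing_tool = ToolRecord(
    tool="AI writing tool",
    owner="content lead",
    metric="content cycle time (days)",
    target_90d=3.0,
    baseline=5.0,
    latest=2.5,
)
print(review(writing_tool))
```

The point of the `untracked` status is that a missing reading is reported as its own outcome rather than quietly counted as a miss or a hit.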

What is a defensible AI budget framework for operators?

Split spend into three buckets and treat each one differently.

Core workflow tools are AI investments that touch a daily production process. These get tracked against real ROI from day one. An owner, a metric, a baseline reading before deployment, and a 90-day target. These are the tools where accountability is non-negotiable because the cost is recurring and the stakes are operational.

Experimental tools get a kill date and no auto-renewal. These are tools a team is testing against a specific hypothesis. The hypothesis should be written down before the tool is activated. At 90 days, the team reviews the output target. If it is not hit, the tool is paused. Not cancelled permanently, but paused with conditions for return. That distinction matters because some tools take two cycles to find their fit.

Infrastructure costs are AI spend that lives at the platform or API layer. Compute, model access, embedding costs. These get treated as overhead and tracked separately from tool spend. Blending them into a single AI line item is one of the most common ways budget reviews become impossible to interpret.

Review by category every quarter, not as a blended total. A single AI spend number at review time makes it impossible to know what is compounding, what has become habit, and what is quietly draining budget with nothing to show. Category-level review forces the question for each bucket independently.
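As a rough sketch of what category-level review means in practice, the snippet below totals quarterly spend per bucket instead of producing one blended number. The bucket labels mirror the three categories above; the line items and dollar amounts are invented for illustration.

```python
from collections import defaultdict

# Hypothetical quarterly line items: (category, name, quarterly cost in USD).
line_items = [
    ("core", "support triage assistant", 9000),
    ("core", "code review assistant", 6000),
    ("experimental", "meeting summarizer", 1500),
    ("infrastructure", "model API usage", 12000),
]


def spend_by_category(items):
    """Sum quarterly cost per bucket instead of reporting one blended total."""
    totals = defaultdict(int)
    for category, _name, cost in items:
        totals[category] += cost
    return dict(totals)


totals = spend_by_category(line_items)
for category, cost in totals.items():
    print(f"{category}: ${cost:,}")
```

Each bucket's total can then be set against its own evidence: tracked metrics for core tools, the written hypothesis for experiments, and usage for infrastructure.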

What should you do right now if AI spending feels unjustifiable?

Start with an audit. Pull every active AI subscription and ask one question per tool: what specific metric is this supposed to move?

If there is no metric, do not guess retroactively. Mark it as untracked and put it in a hold category. That is not the same as cancelling. It is a reset. The tool can come back when someone names the metric and owns the tracking.

Pause or cancel anything that has been running for more than six months with no metric assigned. The tool has had enough time to demonstrate value. If no one tracked it, the value either did not happen or did happen and is now invisible forever. Either outcome is a problem worth fixing before the next renewal.
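The audit rules above reduce to a small decision function. This is a sketch under stated assumptions: the status labels, the `triage` and `over_six_months` names, and the sample subscriptions are all hypothetical, and the review date is passed in explicitly so the result is reproducible.

```python
from datetime import date
from typing import Optional


def over_six_months(start: date, today: date) -> bool:
    """True if the tool has been running for more than six calendar months."""
    month_gap = (today.year - start.year) * 12 + (today.month - start.month)
    return month_gap > 6 or (month_gap == 6 and today.day > start.day)


def triage(metric: Optional[str], start: date, today: date) -> str:
    """Classify one subscription during the audit."""
    if metric:
        return "tracked"            # someone named what the tool should move
    if over_six_months(start, today):
        return "pause or cancel"    # six months with no metric assigned
    return "hold"                   # untracked; can return once a metric is named


# Hypothetical subscriptions: (tool, named metric or None, start date).
review_date = date(2025, 9, 1)
subscriptions = [
    ("sales email drafter", "lead response cycle time", date(2024, 11, 1)),
    ("meeting summarizer", None, date(2025, 1, 10)),
    ("research assistant", None, date(2025, 6, 1)),
]
for tool, metric, start in subscriptions:
    print(f"{tool}: {triage(metric, start, review_date)}")
```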

Rebuild the budget category by category. Start with the core workflow tools. For each one, assign an owner this week and get a baseline reading on the metric before the next billing cycle. Then work through the experimental stack. Assign 90-day targets or initiate pauses. Treat infrastructure as a separate line and track it against usage, not features.

The Uber story will keep happening at every scale as long as teams buy tools faster than they build accountability. The answer is not slower buying. It is a simple precondition: metric first, purchase second.

Founders running lean AI implementations get this right more often than large teams do, not because they are more disciplined but because every dollar is more visible. If you are building that kind of accountability layer into your AI budget now, structure it before the next review cycle forces the question.

FAQ

Why is AI spending hard to justify to leadership?

AI spending is hard to justify because cost is immediate and sits on a visible budget line, but value accumulates across time saved, quality improvements, and faster cycles that nobody assigned a metric to track. When leadership asks for a return on AI investment, the value may genuinely exist in the business but there is no clean reporting line connecting it back to the spend. That is the attribution gap Uber's president described in May 2026: not that AI produced nothing, but that the measurement infrastructure to demonstrate what it produced was never built alongside the spending.

What exactly did Uber's president say about AI spending in 2026?

In May 2026, Uber president Andrew Macdonald stated publicly that it is "hard to draw a line" between the company's AI investment and specific deliverable product features. This followed reporting that Uber exhausted its full annual AI budget within the first four months of 2026. The admission is significant because Uber operates at scale with dedicated engineering, product, and finance teams. If the attribution problem exists there, it is a structural issue, not a resourcing one, and it applies to operators at every size.

How should a founder measure ROI on AI tools?

Assign one named output metric to each AI tool before purchasing it. The metric should sit in one of three categories: time saved per task or workflow, quality improved as measured by error rate or revision cycles, or revenue enabled through faster deal cycles or higher conversion. Without a pre-assigned metric, renewals become unjustifiable regardless of how useful the tool feels in practice. A quarterly review of each AI subscription against its metric is enough to identify tools that have drifted from impact into habit, and to make a defensible case for the ones that are actually compounding value.

What is a good framework for structuring an AI budget?

A three-bucket framework works well for most operators. Core workflow tools get a named ROI metric and a designated owner who tracks it monthly. Experimental tools get a time-boxed budget with a kill date rather than an auto-renewal. Infrastructure costs like API usage or foundational model fees get treated as overhead and tracked separately from value-generating tools. The mistake most teams make is running all AI spend through a single budget line and reviewing it as a blended total. That makes it impossible to separate what is compounding from what is simply habit.

Is cutting AI spending the right move when ROI is unclear?

Not necessarily. Blanket cuts eliminate tools that are genuinely working alongside ones that are not, which leaves teams unable to learn from the difference. The more useful response is to pause new purchases until measurement infrastructure exists, then audit existing tools against one named metric each. Kill or pause anything with no metric assigned and rebuild the remaining budget with tracked output targets. The goal is not less AI spending. It is traceable AI spending. An organization that can name what each tool moves is in a position to defend, grow, or cut with confidence instead of reacting to a budget ceiling.

Why do large enterprises struggle more with AI ROI than small teams?

Large enterprises often struggle more because AI tool adoption spreads across dozens of teams simultaneously, each with its own budget line and reporting structure. Without centralized output tracking, spend aggregates rapidly and accountability diffuses. Small teams have the identical attribution problem but a smaller surface area, which makes the gap easier to spot and easier to fix. The Uber case illustrates that scale amplifies the visibility of the problem without solving its cause: tools purchased before output metrics were assigned. The underlying error is the same regardless of company size.

What three questions should I ask before approving an AI tool purchase?

Before approving any AI tool budget, ask these three questions. First, what specific output metric will change if this tool works as expected? Second, who owns tracking that metric and reporting on it monthly? Third, what is the cut date if the metric does not move within 90 days? If the team requesting the budget cannot answer all three, the purchase should wait. These questions force the measurement setup before the spend begins, which is the only reliable way to justify AI spending at review time. The questions take under ten minutes to ask and can prevent months of budget accountability problems later.

Sources

  1. Uber president says AI spending is getting 'harder to justify'
  2. The State of AI: McKinsey Global Survey on AI adoption and ROI measurement
