listicle 14 min read May 04, 2026

11 Common AI Agent Failure Modes (and How to Catch Them)

Q: What are the most common AI agent bugs?

Five failure classes account for nearly all production agent incidents. Per Arize's 2026 field analysis, context blindness (31.6%), rogue actions (30.3%), silent degradation (24.9%), memory corruption (8.1%), and runaway execution (5.1%) cover the bulk of incidents.

Q: How do you debug an infinite agent loop?

Hash every (tool_name, arguments) pair in the trace and alert when the same hash appears three or more times. Add a hard step cap (15--25 turns) and a no-progress detector comparing the last three reasoning steps. Most loops are the agent retrying a tool against an unhandled error class.

Q: What is prompt injection in an AI agent context?

Prompt injection is when untrusted text consumed by the agent contains instructions that override the system prompt. Google's April 2026 security report found indirect prompt injection volume rose 32% between November 2025 and February 2026, with around 40% of agent protocols exploitable. Tag every untrusted span and never let tool output influence control flow without validation.

Q: How do you detect tool-call hallucinations?

Validate the tool name against your live registry on every model output before dispatch. If the name is invalid, return the registry list as a tool result and let the agent self-correct. Track an invalid_tool_name counter per model version -- spikes after upgrades signal different tool-naming priors.

Q: What is eval-prod skew and how do you prevent it?

Eval-prod skew is the gap between offline eval scores and live success rates. The AlphaEval 2026 study found the best agent config scored only 64.41/100 on production-grounded tasks. Prevent it by replaying anonymized production traces as evals weekly, running shadow traffic, and tracking the same success metric in both surfaces.

Q: What causes agent token costs to run away?

Multi-turn agent cost compounds, not adds. Every turn re-processes the entire prior context. Teams that model cost as 'turns x avg cost' underprice their systems by 3x to 5x. Hard per-trace token budgets and tool-output compaction are the highest-ROI fixes.

Q: How do you catch partial-state corruption in an agent workflow?

Treat every multi-step agent action as a saga with explicit compensating actions. Persist a workflow_state record with per-step status. Require all steps to reach 'committed' before returning success. Alert on workflows where the final response says 'done' but the side-effect ledger has open steps.

Q: What is schema drift in LLM tool calling?

Schema drift is when an LLM's structured output starts matching the schema partially or with extra fields, often after a model upgrade. Set additionalProperties: false and strict: true on every tool schema, validate with a real JSON Schema validator, and run a Prompt -> Generate -> Validate -> Repair -> Parse loop.

By Peter Foy

11 AI agent failure modes we've hit in production -- anatomy, trace fingerprint to grep for, and the fix. War stories from real shipped agents.

TL;DR

Eleven failure modes cover almost every production agent incident: tool hallucination, infinite reasoning loops, premature termination, context bloat, prompt injection, cost runaway, schema drift, stale memory, parallel tool race conditions, partial-state corruption, and eval-prod skew. Each has a specific trace fingerprint you can grep for and a known fix. We've shipped against all eleven.

88% of agent failures trace to infrastructure gaps, not model quality (Arize, 2026).
Context blindness (31.6%) and rogue actions (30.3%) are the top two failure classes in production.
Indirect prompt injection volume grew 32% in three months ending February 2026 (Google).
Multi-turn agent cost compounds 3x--5x faster than naive 'turns x avg cost' models predict.
Best fix per mode: hash-based loop detection, schema-strict tool calls, replayed prod traces as evals.

We've shipped agents to production for two years and hit every failure mode in this list. Some cost us money. Some cost us a weekend. One almost cost us a customer. This is a field guide: for each of the eleven most common bugs, you get a one-paragraph anatomy, the exact trace fingerprint to grep for, and the fix that actually held up. The data backs the pattern. Per Arize's 2026 production analysis, 88% of agent failures trace to infrastructure gaps, not model quality. That's good news -- infrastructure is something you can fix.

What are the most common AI agent failure modes?

Five failure classes cover almost every production agent incident. Arize's 2026 field analysis of real production incidents found context blindness (31.6%), rogue actions (30.3%), silent degradation (24.9%), memory corruption (8.1%), and runaway execution (5.1%) account for nearly all reported failures.

The eleven failure modes in this guide are the sharpest sub-classes inside those buckets -- the ones that show up in real traces and have real, repeatable fixes. We grouped them by what kind of harm they cause:

Reasoning failures: tool hallucination, infinite loops, premature termination
Resource failures: context bloat, cost runaway
Security failures: prompt injection
Integration failures: schema drift, parallel tool races, partial-state corruption
Memory failures: stale memory
Process failures: eval-prod skew

The rest of this article walks each one. Use the comparison table at the end to map a symptom you're seeing right now to its trace fingerprint and fix.

Distribution of AI Agent Failure Modes in Production

Context Blindness

31.6%

Rogue Actions

30.3%

Silent Degradation

24.9%

Memory Corruption

8.1%

Runaway Execution

5.1%

Source: Arize AI, 'Why AI Agents Break: A Field Analysis of Production Failures' (2026)

1. Tool hallucination -- the agent invents a function that doesn't exist

Anatomy. The model emits call_tool(name="send_slack_message_to_user", ...). You don't have that tool. You have slack.send_message. The agent didn't malfunction -- it confidently wrote a tool call that should exist given the prompt context, and many frameworks crash or silently pass garbage when this happens. We hit this on the first day of a Claude 3.5 -> Sonnet 4 upgrade because the new model preferred snake-case names for the same registry. Per AgentLens, confident-sounding false tool names are one of the most common hallucination signatures.

Trace fingerprint. A ToolNotFoundError (or your equivalent) on the first tool call after a model swap. A spike in your invalid_tool_name counter that correlates with a deploy. Calls to plausible-but-wrong names like search_web when your registry has web.search.

Fix.

Validate every tool name against the live registry before dispatch, not after.
On invalid name, return a synthetic tool result containing the actual registry list. Let the agent self-correct on the next turn.
Pin tool schemas per model version. Run a regression suite of 50 known-good tool-call prompts on every model upgrade.
Use OpenAI/Anthropic strict: true mode wherever the SDK supports it.

2. Infinite reasoning loop -- the agent calls the same tool forever

Anatomy. The agent issues search('q1'). The result is empty or unhelpful. The agent issues search('q1') again. And again. We had one production trace where the agent ran the same query 47 times before our cost alarm fired. Per the redteamer.tips writeup, the most common root cause is an unhandled tool-error class -- the agent doesn't reason that 'rate limit' and 'no results' are different conditions, so it retries blindly.

Trace fingerprint. Hash every (tool_name, arguments) pair in a trace. The same hash appearing >=3 times is a 99% reliable loop signal. Also: trace duration > p99 with no terminal stop_reason.

Fix.

Per-trace step cap (we use 25). Hard fail at the cap, return the partial result.
Cycle detector on (tool, args) hashes. On repeat, inject a synthetic observation: "You just called this exact tool with these exact arguments and got this result. Try a different approach or stop."
Distinct error taxonomy: RATE_LIMIT, EMPTY, AUTH, INVALID. Pass the class, not just the string, back to the model.
Log no-progress with a small embedding-similarity check between the last three reasoning steps.

3. Premature termination -- the agent says 'done' before it's done

Anatomy. The agent has a six-step task. It does steps 1--3, hits a soft error, decides 'I have provided the user with relevant information,' and emits a final response. The user sees a polite, confident answer. The job isn't actually finished. Arize's analysis flags this as a core failure pattern -- an order-triage agent correctly identifies a shipping exception, then silently skips the refund step and reports the case as resolved.

Trace fingerprint. stop_reason=end_turn before the goal predicate is satisfied. A workflow_state record with status='completed' but open downstream steps. Customer-side reports of 'the agent said it did X but X didn't happen.'

Fix.

Define a goal-completion predicate per agent type (a function, not a vibe). The agent cannot return a terminal response until the predicate returns true.
Use a small judge model on the final response: "Given the original request and the action log, did the agent actually complete the task?" If not, force another turn.
Persist workflow_state with explicit per-step status. The terminal response writes a single row. Open rows trigger a paging alert.

4. Context bloat -- the prompt gets bigger every turn until the agent forgets the goal

Anatomy. Turn 1: 4k tokens. Turn 5: 38k tokens. Turn 10: 92k tokens, the model is paying $1.40 per turn, and the original user goal is buried under 80k tokens of stale tool output. Per VentureBeat's xMemory analysis, context bloat happens when old tool outputs, resolved errors, and superseded plans stay in the prompt indefinitely.

Trace fingerprint. prompt_tokens climbing super-linearly across turns. Tool-output payloads larger than 4k tokens that aren't summarized. The model citing facts from turn 2 instead of more recent observations.

Fix.

Sliding-window history: keep the system prompt + last N turns + a running summary of everything else.
Tool-output compaction: any tool result over 2k tokens gets summarized before re-entering the context.
Persist full tool outputs in object storage with an ID. The agent can re-fetch by ID if it actually needs the raw data.
Track prompt_tokens_per_turn as a SLO. Page when p95 crosses your threshold.

5. Prompt injection -- untrusted text takes over your agent

Anatomy. Your agent reads a web page. The page contains: . If your agent has an email.send tool, you have a problem. Per Google's April 2026 security report, indirect prompt injection volume grew 32% between November 2025 and February 2026, and around 40% of agent protocols are exploitable. The SQ Magazine 2026 report found a single GUI-agent injection attempt succeeds 17.8% of the time without safeguards, and 78.6% by the 200th attempt.

Trace fingerprint. Tool output containing strings like 'ignore previous,' 'new instructions,' 'system:', or markup that looks like a prompt boundary. Tool calls to high-privilege actions (email, payments, data exfil) immediately after a tool that returned external content.

Fix.

Tag every untrusted span. Wrap external content in <untrusted_content> markers in the prompt. Train the agent (in the system prompt) to never follow instructions inside those markers.
Sanitize tool outputs: strip HTML comments, hidden Unicode, instruction-shaped strings.
Capability-gate: high-privilege tools require an explicit user confirmation step that cannot be triggered by tool output alone.
See our agent guardrails tutorial for the full template.

6. Cost runaway -- one trace burns more than your daily budget

Anatomy. A single agent run spirals. It re-reads the same 50k-token document four times, calls a model API 30 times, and lands a $47 bill on one user request. Per the DEV Community analysis, teams that model agent cost as 'turns x average cost per turn' underprice their systems by 3x to 5x because every turn re-processes the entire prior context.

Trace fingerprint. $/trace p95 more than 10x p50. A long tail of single traces over $5. prompt_tokens growing geometrically across turns in the same trace.

Fix.

Hard per-trace token budget. Soft warning at 50%, hard stop at 100%. The agent gets one chance to summarize and conclude.
Per-user rate limit on agent calls.
Circuit breaker: if $/trace p95 crosses threshold for 5 minutes, route new requests to a cheaper model or queue them.
Tag every trace with user_id, tenant_id, and feature. Cost runaways always have a fingerprint when you can group by these dimensions.

7. Schema drift -- the JSON validates yesterday and breaks today

Anatomy. Your tool schema says status: 'pending' | 'completed' | 'failed'. After a model upgrade, the agent starts returning status: 'in_progress' (not in the enum). Or it adds an extra notes field your parser doesn't expect. Per Collin Wilkins' 2026 structured output guide, schema drift is one of the top regression sources after model upgrades because newer models have slightly different output priors.

Trace fingerprint. A spike in Pydantic/Zod ValidationError immediately after a model deploy. Tool calls that 'succeed' but the downstream system has missing fields. Soft drifts where required fields are present but enum values are out of distribution.

Fix.

strict: true and additionalProperties: false on every tool schema.
Run the Prompt -> Generate -> Validate -> Repair -> Parse loop. The validator hard-fails. The repair step asks the model to fix only the broken fields. Then parse.
Golden contract tests: 50 fixed inputs, expected outputs validated against the schema. Run on every model upgrade. A single broken contract blocks the deploy.

8. Stale memory -- the agent confidently uses information that's no longer true

Anatomy. The user updates their shipping address on Tuesday. The agent's memory store cached the old address from a Monday session. On Wednesday, the agent ships to the wrong address, with a confident note saying 'I used the address you provided.' This is what Arize classifies as memory corruption (8.1% of incidents) -- not data loss, but data staleness without invalidation.

Trace fingerprint. Retrieved memory record where memory.timestamp < user.last_updated_at. Tool calls that quote field values not matching the current source-of-truth. Customer complaints of the form 'the agent used my old [X].'

Fix.

TTL on every memory row. User-facing facts: short TTL (1 hour). Long-term preferences: longer.
Write-through invalidation: any update to a source system fires an event that purges relevant memory keys.
On retrieval, compare memory.timestamp to the source's updated_at. If memory is older, refetch.
Never let the agent assert a fact from memory without a freshness check on critical fields (addresses, payment, identity).

9. Parallel tool race conditions -- two calls clobber each other silently

Anatomy. The model emits two parallel tool calls: inventory.decrement(item=A) and inventory.check(item=A). Both read the row at value=10. The decrement writes 9. The check returns 10 (stale read). The agent thinks there are 10 in stock. Per MachineLearningMastery's analysis, these are silent -- no exception, just corrupted state. We hit one of these in a multi-agent customer-service workflow where two sub-agents updated the same ticket and the second write silently overwrote the first.

Trace fingerprint. tool_use blocks without matching tool_result IDs (we've seen real bugs of this exact shape). Logical inconsistencies: an agent that 'just decremented' inventory reading a higher count than expected. Anomalies on writes when concurrency is enabled.

Fix.

Atomic operations: delegate read-modify-write to the database (UPDATE ... WHERE version = X), not the agent.
Idempotency key on every tool call. Replays are safe.
Verify every tool_use block has a matching tool_result ID before the next turn. Fail loudly if not.
For shared mutable state, lease-based locking with bounded TTL so a crashed agent releases the lock automatically.

10. Partial-state corruption -- 'completed' workflows with missing side effects

Anatomy. A six-step refund agent: validate -> charge reversal -> inventory restore -> notification -> ledger update -> close ticket. Step 4 fails. The agent catches the error, decides the customer 'still got their refund initiated,' and marks the workflow complete. The ledger never updates. Three days later finance finds the discrepancy. This is the most expensive failure mode on the list because it gets discovered far downstream.

Trace fingerprint. A workflow_state record with status='completed' but one or more steps in status='pending' or status='failed'. The final agent message claims success, but the side-effect ledger has gaps.

Fix.

Saga pattern. Every step has a forward action and an explicit compensating action. On failure, run compensations in reverse.
Two-phase commit on side effects where the underlying system supports it. Otherwise use the outbox pattern.
The agent never marks a workflow complete. A separate reconciliation job does, after verifying every step landed.
Daily reconciliation: count workflows marked complete vs side-effects observed. Page on any drift.

11. Eval-prod skew -- 92% pass on the eval suite, 67% success in production

Anatomy. Your eval suite passes. You ship. Production success rate is 25 points lower than offline metrics predicted. Per the AlphaEval study (2026), the best agent configurations score only 64.41/100 on production-grounded tasks despite topping benchmarks. A separate survey found 63% of teams report low confidence in whether model updates actually improve their products. The eval suite was a snapshot. Production is a moving distribution.

Trace fingerprint. Eval pass rate >90%, production success rate <70% on the same intent. Model upgrades that look great offline and worse online. User-reported failures the eval suite never reproduces.

Fix.

Sample anonymized production traces weekly. Replay them as evals. The eval suite is a rolling snapshot of production, not a static fixture.
Shadow traffic: run the new agent version on real user inputs without surfacing responses, compare to the live version.
Track the same success metric (task completion, user-correction rate, downstream business KPI) in both eval and prod. If the metric definitions differ, you don't have an eval suite, you have a benchmark.
See the full methodology in our evaluating an AI agent framework write-up.

How do you find these failure modes in your traces?

Most of these failure modes have a specific, greppable signature in OpenTelemetry traces. Build the eleven detectors once and you'll catch the bulk of incidents before users do.

The minimum trace stack:

Every LLM call: model, prompt_tokens, completion_tokens, latency, $cost, full prompt + response (sampled).
Every tool call: name, arguments, result, duration, error_class.
Every workflow: workflow_id, step_index, step_status, retry_count.
Per trace: trace_id, user_id, tenant_id, total_cost, total_tokens, terminal stop_reason.

OpenTelemetry's agent semantic conventions, now extended by Microsoft and Cisco's Outshift for multi-agent systems, give you a portable schema. With that schema you can build SQL alerts for every fingerprint in this article: 'show me traces where the same (tool, args) hash appears >=3 times,' 'show me traces where prompt_tokens grew >2x turn-over-turn,' 'show me workflows marked complete with open steps.'

We walk the full setup in our OpenTelemetry-based agent observability guide.

Quick reference: the 11 failure modes table

Use this as a one-page cheat sheet when triaging an incident. Match the symptom your on-call is describing to the trace fingerprint, then apply the primary fix.

The table also doubles as a launch checklist. Before you ship an agent to production, you should be able to answer 'what's the detector for this failure mode?' for all eleven rows. If you can't, that mode will hit you. We've been there.

#	Failure Mode	Trace Fingerprint	Primary Fix
1	Tool Hallucination	tool_name not in registry; ToolNotFoundError on first call	Strict tool-name validation + reflective retry with the registry list
2	Infinite Reasoning Loop	Same (tool, args) hash repeated >=3 times in one trace	Per-trace step cap + cycle detector on (tool, args)
3	Premature Termination	stop_reason=end_turn before goal predicate is satisfied	Goal-completion eval before allowing terminal response
4	Context Bloat	prompt_tokens climbing super-linearly across turns	Sliding-window summarization + tool-output compaction
5	Prompt Injection	Tool output contains 'ignore previous' / new system text	Untrusted-data tagging + tool-output sanitization
6	Cost Runaway	$/trace p95 > 10x p50; long tail of >$5 traces	Hard token budget per trace + circuit breaker
7	Schema Drift	Pydantic ValidationError spike after model or tool upgrade	Strict JSON schema + repair loop + golden contract tests
8	Stale Memory	Retrieved memory.timestamp older than user.last_updated_at	TTL on memory rows + invalidation on writes
9	Parallel Tool Race	tool_use blocks without matching tool_result IDs	Atomic state ops + idempotency keys on every tool call
10	Partial-State Corruption	Workflow marked 'completed' with downstream side-effects missing	Saga pattern + compensating actions on partial failure
11	Eval-Prod Skew	Eval pass rate >90%, prod success rate <70% on same intent	Replay prod traces as evals + shadow traffic

Frequently asked questions

What are the most common AI agent bugs?

Five failure classes account for nearly all production agent incidents. Per Arize's 2026 field analysis, context blindness (31.6%), rogue actions (30.3%), silent degradation (24.9%), memory corruption (8.1%), and runaway execution (5.1%) cover the bulk of incidents. Tool hallucination, schema drift, and prompt injection are the three sharpest sub-classes inside those buckets.

How do you debug an infinite agent loop?

Hash every (tool_name, arguments) pair in the trace and alert when the same hash appears three or more times in a single run. Add a hard step cap (typically 15--25 turns) and a no-progress detector that compares the last three reasoning steps for semantic similarity. Most loops are the agent re-trying a tool call against an unhandled error class, so log the full tool-error payload, not just the message string.

What is prompt injection in an AI agent context?

Prompt injection is when untrusted text consumed by the agent (web pages, emails, documents, tool outputs) contains instructions that override the system prompt. Google's April 2026 security report found indirect prompt injection volume rose 32% between November 2025 and February 2026, and that around 40% of agent protocols are exploitable. The fix is to tag every untrusted span and never let tool output influence control flow without validation.

How do you detect tool-call hallucinations?

Validate the tool name against your live registry on every model output, before dispatch. If the name is not in the registry, do not crash -- return the registry list as a tool result and let the agent self-correct. Track a 'invalid_tool_name' counter per model version. Spikes after a model upgrade are your earliest signal that the new model has different tool-naming priors than the old one.

What is eval-prod skew and how do you prevent it?

Eval-prod skew is the gap between offline eval scores and live success rates. The AlphaEval study (2026) found the best agent config scored only 64.41/100 on production-grounded tasks despite passing benchmark suites. Prevent it by replaying anonymized production traces as evals weekly, running shadow traffic against new versions, and tracking the same success metric in both surfaces.

What causes agent token costs to run away?

Multi-turn agent cost compounds, not adds. Every turn re-processes the entire prior context, so by turn ten you are paying for the original instructions, every tool call payload, and every superseded plan. Teams that model cost as 'turns x avg cost' underprice their system by 3x to 5x. Hard per-trace token budgets and tool-output compaction are the two highest-ROI fixes.

How do you catch partial-state corruption in an agent workflow?

Treat every multi-step agent action as a saga with explicit compensating actions. Persist a 'workflow_state' record with per-step status and require all steps to reach 'committed' before the agent returns success. Alert on any workflow where the agent's final response says 'done' but the side-effect ledger has open steps -- this is the most common silent failure mode.

What is schema drift in LLM tool calling?

Schema drift is when an LLM's structured output starts matching the schema partially or with extra fields, often after a model upgrade or tool change. Set additionalProperties: false and strict: true on every tool schema, validate with a real JSON Schema validator (not regex), and run a Prompt -> Generate -> Validate -> Repair -> Parse loop. Track validation-error rate per model version as a leading indicator.

Why do parallel tool calls cause race conditions in agents?

When the model issues parallel tool calls that read or write shared state, two calls can interleave -- one reads, the other updates, the first writes back a stale version. There is no exception, just silent corruption. Force atomic database operations, attach idempotency keys to every tool call, and verify each tool_use ID has a matching tool_result before continuing the trace.

How fast do AI agent failure modes show up in production?

Most fail within hours of first real traffic. Tool hallucinations and schema drift surface in the first 100 traces. Cost runaway and context bloat appear within the first week as tail-risk traces accumulate. Prompt injection and eval-prod skew are slower burns that often surface only after a model or tool upgrade -- which is why version-pinned regression suites matter.

After the Eval-Prod Skew section and FAQ, point readers to the deeper observability stack write-up.

Get the full agent observability stack template