Two tools the model can't tell apart

When two tools have similar names or descriptions, the model has no reliable basis to choose between them, so it effectively picks one at random. That shows up as flaky, hard-to-reproduce agent behavior.

see_it · fix_it

The failure, then the fix

Each verdict below is the actual output of the MCP & Agent Tool Auditor when run on the snippet, not a description of it.

before

[
  {
    "name": "search_docs",
    "description": "Search the product documentation and return the matching pages. Use this when the user asks a how-to question.",
    "input_schema": { "type": "object", "properties": { "query": { "type": "string", "description": "The search query" } }, "required": ["query"] }
  },
  {
    "name": "find_documents",
    "description": "Search the product documentation and return matching pages. Use this when the user asks a how-to question.",
    "input_schema": { "type": "object", "properties": { "query": { "type": "string", "description": "The search query" } }, "required": ["query"] }
  }
]

Warns · auditor verdict

These pairs look interchangeable by name or description: search_docs ↔ find_documents. When two tools read the same, the model picks between them at random, which surfaces as flaky, hard-to-reproduce behavior.

after

[
  {
    "name": "search_help_center",
    "description": "Search published, customer-facing help-center articles. Use this when the user asks a how-to or product question. Prefer this over search_runbooks for end-user questions.",
    "input_schema": { "type": "object", "properties": { "query": { "type": "string", "description": "The search query" } }, "required": ["query"] }
  },
  {
    "name": "search_runbooks",
    "description": "Search internal engineering runbooks and incident docs. Use this only for on-call, debugging, or infrastructure questions, never for customer-facing answers.",
    "input_schema": { "type": "object", "properties": { "query": { "type": "string", "description": "The search query" } }, "required": ["query"] }
  }
]

Passes · auditor verdict

No two tools read as interchangeable, so the model isn't forced to guess between near-duplicates.

fix · Merge the duplicates into one tool, or sharpen each description to name the case it owns and say which to prefer when both seem to fit.
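
If the two tools turn out to do the same job against different corpora, the merge option can be sketched as one definition with a parameter that names the corpus. This is an illustration only, not auditor output: the tool name `search_knowledge_base`, the `source` parameter, and its two values are assumptions made up for the example.

```python
import json

# Hypothetical merged tool: one search entry point, with the corpus chosen by
# an explicit enum parameter instead of by picking between two lookalike tools.
merged_tool = {
    "name": "search_knowledge_base",
    "description": (
        "Search documentation and return the matching pages. "
        "Set source to 'help_center' for end-user how-to questions and to "
        "'runbooks' for on-call, debugging, or infrastructure questions."
    ),
    "input_schema": {
        "type": "object",
        "properties": {
            "query": {"type": "string", "description": "The search query"},
            "source": {
                "type": "string",
                "enum": ["help_center", "runbooks"],
                "description": "Which corpus to search",
            },
        },
        "required": ["query", "source"],
    },
}

# Print the definition as JSON, in the same shape as the snippets above.
print(json.dumps(merged_tool, indent=2))
```

Merging moves the choice from tool selection into a required argument, so the description still has to say which value fits which kind of question.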

why_it_matters

An LLM selects a tool from its name and description alone. When two tools read as near-duplicates (search_docs and find_documents, both 'search the docs'), the model is being asked to flip a coin, and it does. Sometimes it calls the right one, sometimes the wrong one, and because the inputs look identical you can't reproduce the failure on demand. This is one of the most common reasons a tool-calling agent is 'mostly fine' but occasionally does the wrong thing.

The fix is to make each tool own a distinct case in its description, or to merge the duplicates into one tool. If two tools genuinely differ, say how: 'Search published help-center articles' versus 'Search internal engineering runbooks', and add a line on which to prefer when both seem to fit. The auditor flags a pair when their names normalize to the same verb-and-noun or their descriptions share most of their content words.
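
The flagging rule described above can be approximated in a few lines. This is a rough sketch under stated assumptions, not the auditor's actual implementation: the synonym tables, the stopword list, the use of Jaccard overlap for "share most of their content words", and the 0.6 threshold are all hypothetical choices made for illustration.

```python
import re

# Hypothetical normalization tables; the real auditor's rules may differ.
VERB_SYNONYMS = {"find": "search", "lookup": "search", "query": "search"}
NOUN_SYNONYMS = {"doc": "document", "docs": "document", "documents": "document"}
STOPWORDS = {"the", "a", "an", "and", "or", "this", "when", "use", "to",
             "of", "for", "is", "asks", "return"}


def normalize_name(name):
    """Reduce a snake_case tool name to a canonical (verb, noun) pair."""
    parts = name.lower().split("_")
    verb = VERB_SYNONYMS.get(parts[0], parts[0])
    noun = "_".join(NOUN_SYNONYMS.get(p, p) for p in parts[1:])
    return (verb, noun)


def content_words(description):
    """Lowercase the description and drop stopwords."""
    words = re.findall(r"[a-z]+", description.lower())
    return {w for w in words if w not in STOPWORDS}


def jaccard(a, b):
    """Share of words the two sets have in common (0.0 to 1.0)."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def find_lookalike_pairs(tools, threshold=0.6):
    """Return name pairs whose names normalize alike or whose descriptions overlap heavily."""
    pairs = []
    for i, first in enumerate(tools):
        for second in tools[i + 1:]:
            same_name = normalize_name(first["name"]) == normalize_name(second["name"])
            overlap = jaccard(content_words(first["description"]),
                              content_words(second["description"]))
            if same_name or overlap >= threshold:
                pairs.append((first["name"], second["name"]))
    return pairs


before = [
    {"name": "search_docs",
     "description": "Search the product documentation and return the matching pages."},
    {"name": "find_documents",
     "description": "Search the product documentation and return matching pages."},
]
print(find_lookalike_pairs(before))  # [('search_docs', 'find_documents')]
```

Run against the renamed pair (search_help_center and search_runbooks), the same check returns an empty list, because the names no longer normalize to the same verb-and-noun and the descriptions share only a handful of words.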

search_docs ↔ find_documents · model picks at random · name each tool's distinct case

more_failure_modes

Related ways tools break

Tool selection
Descriptions that don't say when to use the tool

Parameter schema
Parameters with no description

See all 6 failure modes, or read what tool calling is.

faq

Questions & answers

Why does my agent call the wrong tool intermittently?

Usually because two of its tools look interchangeable. The model chooses from the name and description, so if two tools read the same it has no signal to separate them and selects unpredictably. Sharpen each description to name the case it owns, or merge them, and the intermittent wrong calls go away.

How many tools is too many for one agent?

There's no hard limit, but selection accuracy falls off past roughly 20-25 tools because the model has more near-neighbors to confuse and more text to weigh each turn. Beyond that, expose only the tools relevant to the current step or route them behind a sub-agent.
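
Exposing only the tools relevant to the current step can be as simple as tagging each tool and filtering before every model call. The sketch below assumes a hypothetical `steps` tag on each registry entry and made-up step and tool names; it is not tied to any particular agent framework.

```python
# Hypothetical registry: each entry carries a "steps" tag that is used only
# for filtering and is stripped before the definition is sent to the model.
REGISTRY = [
    {"name": "search_help_center", "description": "Search published help-center articles.",
     "steps": {"answer_customer"}},
    {"name": "search_runbooks", "description": "Search internal engineering runbooks.",
     "steps": {"debug_incident"}},
    {"name": "create_ticket", "description": "Open a support ticket for the current user.",
     "steps": {"answer_customer", "debug_incident"}},
]


def tools_for_step(registry, step):
    """Return the tool definitions tagged for this step, without the internal tag."""
    exposed = []
    for tool in registry:
        if step in tool["steps"]:
            exposed.append({k: v for k, v in tool.items() if k != "steps"})
    return exposed


print([t["name"] for t in tools_for_step(REGISTRY, "answer_customer")])
# ['search_help_center', 'create_ticket']
```

With a filter like this, the model weighs only the subset passed for the current step, so lookalike tools that belong to different steps never compete in the same turn.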

Spotting one failure is easy. Hardening the whole agent is the work.

I review which tools the loop can reach autonomously, how you fence destructive calls behind confirmation, whether the side effects are idempotent, and the evals that catch a wrong tool call before users do. Book a call, or leave your email.
