How long should an LLM tool description be?

One or two sentences is the sweet spot: enough to say what the tool does, what it returns, and when to call it, without padding the tool block (which is re-sent every turn). Skip implementation detail the model doesn't need; spend the words on the usage trigger instead.
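The cost of padding can be roughed out with a few lines of arithmetic. This is a sketch under stated assumptions, not a measurement: `approx_tokens` and `tool_block_overhead` are hypothetical helpers, and the four-characters-per-token ratio is a crude stand-in for a real tokenizer, which varies by provider.

```python
import json

def approx_tokens(text: str) -> int:
    """Very rough estimate: about 4 characters per token (an assumption, not a tokenizer)."""
    return len(text) // 4

def tool_block_overhead(tools: list[dict], turns: int) -> int:
    """Approximate tokens spent re-sending the serialized tool block over a conversation."""
    block = json.dumps(tools)
    return approx_tokens(block) * turns

concise = [{
    "name": "get_weather",
    "description": "Get the current conditions and a short forecast for a city. "
                   "Use this when the user asks about weather for a specific place.",
}]

# Same tool, padded with implementation detail the model does not need.
padded = [{
    "name": "get_weather",
    "description": concise[0]["description"]
                   + " Internally this calls the provider API, caches results, and retries on failure." * 5,
}]

print("concise:", tool_block_overhead(concise, turns=20))
print("padded: ", tool_block_overhead(padded, turns=20))
```

The absolute numbers are only indicative; the point is that every extra sentence is multiplied by the number of turns.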

What makes a good tool description for function calling?

It names what the tool does and returns, and it states an explicit trigger for when to use it. The trigger is the highest-leverage part, because the model selects tools by matching the request against it. If the tool is easy to confuse with another, lead with the distinguishing case.

Descriptions that don't say when to use the tool

The tool description is the model's main basis for choosing a tool. A missing description, or a one-liner that says what the tool is but not when to use it, leaves the model guessing, so the tool gets called at the wrong time or never.

see_it · fix_it

The failure, then the fix

Each verdict below is the output of an actual MCP & Agent Tool Auditor run on the snippet, not a description of one.

before

[
  {
    "name": "get_weather",
    "description": "Weather.",
    "input_schema": { "type": "object", "properties": { "city": { "type": "string", "description": "City name, e.g. Paris" } }, "required": ["city"] }
  }
]

Warns · auditor verdict: Very short description on: get_weather. A few words rarely say enough about what the tool does and when to use it for the model to pick it reliably.

after

[
  {
    "name": "get_weather",
    "description": "Get the current conditions and a short forecast for a city. Use this when the user asks about weather, temperature, or whether to expect rain for a specific place.",
    "input_schema": { "type": "object", "properties": { "city": { "type": "string", "description": "City name, e.g. Paris" } }, "required": ["city"] }
  }
]

Passes · auditor verdict: Every tool carries a substantive description, which is what the model reads to decide whether a tool fits the request.

fix · Write one or two sentences: what the tool does, what it returns, and when to call it. Treat it as the prompt that sells the tool to the model.
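One way to keep all three parts present is to assemble the description from named pieces. The sketch below assumes a hypothetical helper, `build_description`, which is not part of any library or of the auditor; the sentence template is just one reasonable phrasing.

```python
def build_description(what: str, returns: str, when: str) -> str:
    """Compose a two-sentence description: what the tool does and returns, then the trigger."""
    parts = [p.strip().rstrip(".") for p in (what, returns, when)]
    if not all(parts):
        # Refuse to build a description that is missing one of the three parts.
        raise ValueError("what, returns, and when are all required")
    what_, returns_, when_ = parts
    return f"{what_}; returns {returns_}. Use this when {when_}."

tool = {
    "name": "get_weather",
    "description": build_description(
        what="Get the weather for a city",
        returns="current conditions and a short forecast",
        when="the user asks about weather, temperature, or whether to expect rain for a specific place",
    ),
    "input_schema": {
        "type": "object",
        "properties": {"city": {"type": "string", "description": "City name, e.g. Paris"}},
        "required": ["city"],
    },
}

print(tool["description"])
```

Making the trigger a required argument means a description without a "when" cannot be written by accident.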

why_it_matters

A tool description is the prompt that sells the tool to the model. 'Get weather' tells the model almost nothing: not what it returns, not what arguments it expects, and crucially not when to reach for it instead of answering directly. Give the description two jobs and fit both into one or two sentences: say what the tool does and returns, and state an explicit trigger for when to call it ('Use this when the user asks about current conditions or a forecast for a place').

The trigger clause is what most descriptions miss and what most improves selection. Lead with it when a tool is easy to confuse with another. The auditor fails a tool with no description, warns on a description too short to carry that information, and separately checks whether the descriptions across your set actually state a usage trigger rather than only naming the tool.
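The three outcomes described above can be approximated with a small lint pass. This is an illustrative sketch, not the auditor's implementation: the word-count threshold, the trigger phrases, and the names `MIN_WORDS`, `TRIGGER`, and `audit_tool` are all assumptions, and it checks each tool on its own rather than across the whole set.

```python
import re

# Assumed threshold and trigger phrases; the real auditor's rules are not shown in this article.
MIN_WORDS = 8
TRIGGER = re.compile(
    r"\b(use (this|it) (when|to|for|if)|call (this|it) (when|if)|when the user)\b",
    re.IGNORECASE,
)

def audit_tool(tool: dict) -> tuple[str, str]:
    """Return (verdict, message) for one tool definition."""
    name = tool.get("name", "<unnamed>")
    desc = (tool.get("description") or "").strip()
    if not desc:
        return ("fail", f"No description on: {name}")
    if len(desc.split()) < MIN_WORDS:
        return ("warn", f"Very short description on: {name}")
    if not TRIGGER.search(desc):
        return ("warn", f"No usage trigger on: {name}")
    return ("pass", f"Substantive description on: {name}")

tools = [
    {"name": "no_desc"},
    {"name": "get_weather", "description": "Weather."},
    {"name": "get_forecast",
     "description": "Returns the current weather conditions and forecast for a given city."},
    {"name": "get_weather_v2",
     "description": "Get the current conditions and a short forecast for a city. "
                    "Use this when the user asks about weather, temperature, or "
                    "whether to expect rain for a specific place."},
]

for tool in tools:
    print(audit_tool(tool))
```

A phrase match is a blunt check for a trigger; it catches the common "Use this when..." pattern but would miss triggers worded differently.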

Figure: "Get weather" → ? A complete description needs what the tool does plus when to call it; the trigger is the part that's missing.

more_failure_modes

Related ways tools break

Tool selection: Two tools the model can't tell apart

Parameter schema: Parameters with no description

See all 6 failure modes, or read what tool calling is.

Spotting one failure is easy. Hardening the whole agent is the work.

I review which tools the loop can reach autonomously, how you fence destructive calls behind confirmation, idempotency on the side effects, and the evals that catch a wrong tool call before users do. Book a call, or leave your email.
