What does a customer support agent cost to run?
A support assistant that answers from your help-center docs and can look up an order or account before replying. Moderate volume, retrieval on every turn, and a tool call or two per conversation. On Claude Sonnet this works out to about $5,105 a month; here is the figure across every model and what drives it.
assumptions
A planning estimate for this shape of workload. Tune any of it in the calculator.
- 3,000 conversations a day
- ~1,200 prompt tokens (system + question) plus ~2,500 retrieved tokens of help-center context
- ~350 output tokens per reply
- 2 tool calls per request (order/account lookups)
monthly_cost · Claude Sonnet
$5,105/ month
- Input tokens13.7k/req · agentic context
- $3,686
- Output tokens1.1k/req
- $1,418
- Embeddings (RAG)query embedding per request
- $2.16
13.7k input · 1.1k output · 3 LLM turns / request
cost_by_model
A customer support agent across every model
| model | cost / month |
|---|---|
| Gemini 1.5 FlashGoogle (Vertex) | $123cheapest |
| GPT-4o miniOpenAI | $243 |
| Claude HaikuAnthropic | $1,363 |
| Gemini 1.5 ProGoogle (Vertex) | $2,010 |
| GPT-4oOpenAI | $4,018 |
| Claude SonnetAnthropic · shown above | $5,105 |
| Claude OpusAnthropic | $25,517 |
cheapest · public list prices as of 2026-06 · planning estimate, not a quote
what_drives_it
Where the money goes
Retrieval adds a few thousand tokens to every turn, and each tool call re-sends that context, so input tokens dominate the bill.
The cheapest option here, Gemini 1.5 Flash, comes to about $123 a month against $5,105 on Claude Sonnet. Whether the cheaper model fits is a question for your evaluation set, not the price sheet. The bigger lever is usually the workload itself: caching re-sent context, trimming what each turn carries, and capping the tool loop move the bill more than swapping models does.
faq
Questions & answers
- How much does a customer support agent cost per month?
- On Claude Sonnet, about $5,105 a month at 3,000 requests a day with the assumptions below. The cheapest model compared here, Gemini 1.5 Flash, runs about $123 for the same workload. Your real figure moves with volume and tokens, so tune it in the calculator.
- What makes a customer support agent expensive?
- Retrieval adds a few thousand tokens to every turn, and each tool call re-sends that context, so input tokens dominate the bill.
- Which model is cheapest for a customer support agent?
- Gemini 1.5 Flash, at about $123 a month for this workload. Cheaper is not automatically better: a model that needs retries or longer prompts can cost more in practice, so test the candidates on your own evaluation set before committing.
A cost estimate is a start. Making an agent cheap in production is the work.
Prompt caching, context trimming, and the right model per step usually cut an agent's bill by more than half. Book a call, or leave your email and I'll reach out.
Prefer proof first? See how this plays out in real case studies →