What does a voice assistant backend cost to run?
The language model behind a voice assistant handles short prompts, short replies, very high request volume, and one tool call per turn to carry out an action. The workload is latency-sensitive, so a fast, cheap model is the usual choice. On GPT-4o mini this works out to about $549 a month; here is the figure for every model compared and what drives it.
assumptions
A planning estimate for this shape of workload. Tune any of it in the calculator.
- 40,000 turns a day
- ~600 input tokens per turn
- ~150 output tokens per reply
- 1 tool call per turn (an action or lookup)
monthly_cost · GPT-4o mini
$549 / month
- Input tokens (1.9k/req, agentic context): $333
- Output tokens (300/req): $216
1.9k input · 300 output · 2 LLM turns / request
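The breakdown above can be reproduced with a few lines of arithmetic. This is a minimal sketch, not the calculator's actual code. It assumes a 30-day month, GPT-4o mini rates of $0.15 per million input tokens and $0.60 per million output tokens, and 1,850 input tokens per request (the page shows this rounded to 1.9k). None of those three values is stated on the page; they are chosen because they reproduce the $333 and $216 figures.

```python
# Workload taken from the assumptions list.
REQUESTS_PER_DAY = 40_000
OUTPUT_TOKENS_PER_REQ = 300      # 2 LLM turns x ~150 output tokens

# Assumptions not stated on the page (see lead-in).
DAYS_PER_MONTH = 30
INPUT_TOKENS_PER_REQ = 1_850     # displayed as "1.9k" after rounding
INPUT_RATE_PER_M = 0.15          # USD per million input tokens (assumed)
OUTPUT_RATE_PER_M = 0.60         # USD per million output tokens (assumed)

requests_per_month = REQUESTS_PER_DAY * DAYS_PER_MONTH

# Cost = tokens per month, in millions, times the per-million rate.
input_cost = requests_per_month * INPUT_TOKENS_PER_REQ / 1_000_000 * INPUT_RATE_PER_M
output_cost = requests_per_month * OUTPUT_TOKENS_PER_REQ / 1_000_000 * OUTPUT_RATE_PER_M
total_cost = input_cost + output_cost

print(f"input ${input_cost:,.0f} + output ${output_cost:,.0f} = ${total_cost:,.0f} / month")
```

Under these assumptions input accounts for roughly 60% of the bill, which is why the size of the per-request context matters so much at this volume.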
cost_by_model
A voice assistant backend across every model
| model | provider | cost / month |
|---|---|---|
| Gemini 1.5 Flash | Google (Vertex) | $275 (cheapest) |
| GPT-4o mini | OpenAI | $549 (shown above) |
| Claude Haiku | Anthropic | $3,216 |
| Gemini 1.5 Pro | Google (Vertex) | $4,575 |
| GPT-4o | OpenAI | $9,150 |
| Claude Sonnet | Anthropic | $12,060 |
| Claude Opus | Anthropic | $60,300 |
cheapest · public list prices as of 2026-06 · planning estimate, not a quote
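The comparison can be sketched as one formula applied to a table of per-token rates. The page does not print the rates it uses, so the `RATES` values below are assumptions: per-million-token prices that happen to reproduce the monthly figures shown here. Check them against each provider's current price sheet before relying on them. The 30-day month and the 1,850 input tokens per request are likewise assumed.

```python
from decimal import Decimal, ROUND_HALF_UP

MONTHLY_REQUESTS = 40_000 * 30   # 30-day month is an assumption
INPUT_TOKENS_PER_REQ = 1_850     # assumed; the page shows a rounded "1.9k"
OUTPUT_TOKENS_PER_REQ = 300

# Assumed (input, output) rates in USD per million tokens.
RATES = {
    "Gemini 1.5 Flash": ("0.075", "0.30"),
    "GPT-4o mini": ("0.15", "0.60"),
    "Claude Haiku": ("0.80", "4.00"),
    "Gemini 1.5 Pro": ("1.25", "5.00"),
    "GPT-4o": ("2.50", "10.00"),
    "Claude Sonnet": ("3.00", "15.00"),
    "Claude Opus": ("15.00", "75.00"),
}


def monthly_cost(input_rate: str, output_rate: str) -> int:
    """Return the monthly cost in whole dollars, rounding halves up."""
    millions_in = Decimal(MONTHLY_REQUESTS * INPUT_TOKENS_PER_REQ) / 1_000_000
    millions_out = Decimal(MONTHLY_REQUESTS * OUTPUT_TOKENS_PER_REQ) / 1_000_000
    total = millions_in * Decimal(input_rate) + millions_out * Decimal(output_rate)
    return int(total.quantize(Decimal("1"), rounding=ROUND_HALF_UP))


costs = {model: monthly_cost(*rates) for model, rates in RATES.items()}

# Print cheapest first.
for model, cost in sorted(costs.items(), key=lambda item: item[1]):
    print(f"{model:<18} ${cost:,}")
```

`Decimal` is used instead of floats so that a value such as 274.5 rounds up to 275 rather than falling victim to binary rounding.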
what_drives_it
Where the money goes
Throughput dominates: at 40,000 turns a day even small per-turn token counts add up, so the per-token rate is what matters.
The cheapest option here, Gemini 1.5 Flash, comes to about $275 a month against $549 on GPT-4o mini. Whether the cheaper model fits is a question for your evaluation set, not the price sheet. The bigger lever is usually the workload itself: caching re-sent context, trimming what each turn carries, and capping the tool loop move the bill more than swapping models does.
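To see how the workload levers move the bill, a what-if function helps. The sketch below is illustrative only: the GPT-4o mini rates, the 30-day month, and the 1,850-token baseline are assumptions, and the `cached_fraction`, `cache_discount`, and trimmed token count are hypothetical inputs, not measurements or any provider's actual caching terms.

```python
def monthly_cost(
    requests_per_day: int,
    input_tokens: int,
    output_tokens: int,
    input_rate: float,            # USD per million input tokens
    output_rate: float,           # USD per million output tokens
    cached_fraction: float = 0.0,  # share of input tokens served from cache
    cache_discount: float = 0.0,   # price reduction on cached tokens (0 to 1)
    days: int = 30,
) -> float:
    """Estimate the monthly bill in USD for one workload shape."""
    requests = requests_per_day * days
    # Cached tokens are billed at a reduced rate, the rest at the full rate.
    effective_input_rate = input_rate * (1 - cached_fraction * cache_discount)
    input_cost = requests * input_tokens / 1_000_000 * effective_input_rate
    output_cost = requests * output_tokens / 1_000_000 * output_rate
    return input_cost + output_cost


# Baseline: assumed GPT-4o mini rates and an assumed 1,850 input tokens/request.
baseline = monthly_cost(40_000, 1_850, 300, 0.15, 0.60)

# Hypothetical: trimming the context each request carries to 1,400 tokens.
trimmed = monthly_cost(40_000, 1_400, 300, 0.15, 0.60)

# Hypothetical: half the input tokens hit a cache billed at a 50% discount.
cached = monthly_cost(40_000, 1_850, 300, 0.15, 0.60,
                      cached_fraction=0.5, cache_discount=0.5)

for label, cost in [("baseline", baseline), ("trimmed", trimmed), ("cached", cached)]:
    print(f"{label:<9} ${cost:,.2f}")
```

Plugging in your own measured cache hit rate and token counts shows which lever is worth the engineering effort for your traffic.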
faq
Questions & answers
- How much does a voice assistant backend cost per month?
- On GPT-4o mini, about $549 a month at 40,000 requests a day with the assumptions above. The cheapest model compared here, Gemini 1.5 Flash, runs about $275 for the same workload. Your real figure moves with volume and tokens, so tune it in the calculator.
- What makes a voice assistant backend expensive?
- Throughput dominates: at 40,000 turns a day even small per-turn token counts add up, so the per-token rate is what matters.
- Which model is cheapest for a voice assistant backend?
- Gemini 1.5 Flash, at about $275 a month for this workload. Cheaper is not automatically better: a model that needs retries or longer prompts can cost more in practice, so test the candidates on your own evaluation set before committing.
A cost estimate is a start. Making an agent cheap in production is the work.
Prompt caching, context trimming, and the right model per step usually cut an agent's bill by more than half.