
What does a voice assistant backend cost to run?

The language model behind a voice assistant sees short prompts, short replies, very high request volume, and one tool call per turn to take an action. The workload is latency-sensitive, so a fast, cheap model is the usual choice. On GPT-4o mini it works out to about $549 a month; here is the figure across every model and what drives it.

assumptions

A planning estimate for this shape of workload. Tune any of it in the calculator.

  • 40,000 turns a day
  • ~600 input tokens per turn
  • ~150 output tokens per reply
  • 1 tool call per turn (an action or lookup)

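monthly_cost · GPT-4o mini

$549 / month

  • Input tokens, 1.9k/req (agentic context): $333
  • Output tokens, 300/req: $216

1.9k input · 300 output · 2 LLM turns / request. Each request runs two LLM turns, one to issue the tool call and one to write the reply, so output is 2 × 150 = 300 tokens, and billed input comes to about 1.9k tokens once the context re-sent on the second turn and the tool traffic are counted.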
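The breakdown above is monthly requests × tokens per request × a per-million-token price, summed over input and output. A minimal Python sketch that reproduces it, assuming a 30-day month, GPT-4o mini list prices of $0.15 per million input tokens and $0.60 per million output tokens (neither is stated on this page), and 1,850 input tokens per request, an assumed unrounded value behind the displayed "1.9k" that matches the $333 line:

```python
from decimal import Decimal

# Workload from the assumptions above.
TURNS_PER_DAY = 40_000
DAYS_PER_MONTH = 30                  # assumption: 30-day month
INPUT_TOKENS_PER_REQUEST = 1_850     # assumption: unrounded value behind "1.9k"
OUTPUT_TOKENS_PER_REQUEST = 2 * 150  # 150-token reply x 2 LLM turns

# Assumed GPT-4o mini list prices, USD per 1M tokens; check the current price sheet.
INPUT_PRICE_PER_M = Decimal("0.15")
OUTPUT_PRICE_PER_M = Decimal("0.60")


def monthly_cost() -> tuple[Decimal, Decimal, Decimal]:
    """Return (input, output, total) monthly cost in USD."""
    requests = TURNS_PER_DAY * DAYS_PER_MONTH
    input_cost = Decimal(requests * INPUT_TOKENS_PER_REQUEST) / 1_000_000 * INPUT_PRICE_PER_M
    output_cost = Decimal(requests * OUTPUT_TOKENS_PER_REQUEST) / 1_000_000 * OUTPUT_PRICE_PER_M
    return input_cost, output_cost, input_cost + output_cost


inp, out, total = monthly_cost()
print(f"input ${inp:,.0f}, output ${out:,.0f}, total ${total:,.0f}")
```

Decimal keeps the cents exact, so the totals match the card without floating-point rounding noise.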

cost_by_model

A voice assistant backend across every model

Monthly cost of a voice assistant backend across models

Model              Provider           Cost / month
Gemini 1.5 Flash   Google (Vertex)    $275 (cheapest)
GPT-4o mini        OpenAI             $549 (shown above)
Claude Haiku       Anthropic          $3,216
Gemini 1.5 Pro     Google (Vertex)    $4,575
GPT-4o             OpenAI             $9,150
Claude Sonnet      Anthropic          $12,060
Claude Opus        Anthropic          $60,300

Prices are public list prices as of 2026-06. This is a planning estimate, not a quote.
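The table follows from the same arithmetic with a different rate pair per model. The per-million-token rates below are assumptions back-solved to reproduce the table, not prices quoted by this page: the Claude Haiku pair matches Claude 3.5 Haiku, and the Gemini pairs are the short-prompt (up to 128k tokens) tier. Treat this as a sketch and check current price sheets before relying on it.

```python
from decimal import Decimal, ROUND_HALF_UP

# Monthly token volume under the page's assumptions (30-day month assumed,
# 1,850 input tokens/request assumed as the unrounded "1.9k").
REQUESTS_PER_MONTH = 40_000 * 30
INPUT_M_TOKENS = Decimal(REQUESTS_PER_MONTH * 1_850) / 1_000_000   # 2,220M tokens
OUTPUT_M_TOKENS = Decimal(REQUESTS_PER_MONTH * 300) / 1_000_000    # 360M tokens

# Assumed (input, output) list prices in USD per 1M tokens; these reproduce
# the table but are not stated on the page.
PRICES = {
    "Gemini 1.5 Flash": (Decimal("0.075"), Decimal("0.30")),
    "GPT-4o mini": (Decimal("0.15"), Decimal("0.60")),
    "Claude Haiku": (Decimal("0.80"), Decimal("4.00")),
    "Gemini 1.5 Pro": (Decimal("1.25"), Decimal("5.00")),
    "GPT-4o": (Decimal("2.50"), Decimal("10.00")),
    "Claude Sonnet": (Decimal("3.00"), Decimal("15.00")),
    "Claude Opus": (Decimal("15.00"), Decimal("75.00")),
}


def cost_table() -> list[tuple[str, int]]:
    """Monthly cost per model, rounded half-up to whole dollars, cheapest first."""
    rows = []
    for model, (in_price, out_price) in PRICES.items():
        cost = INPUT_M_TOKENS * in_price + OUTPUT_M_TOKENS * out_price
        rows.append((model, int(cost.quantize(Decimal("1"), rounding=ROUND_HALF_UP))))
    return sorted(rows, key=lambda row: row[1])


for model, dollars in cost_table():
    print(f"{model:<18} ${dollars:,}")
```

Rounding half-up matters for Gemini 1.5 Flash: its unrounded total is $274.50, which the table shows as $275.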

free_tool · Tune this scenario to your numbers
The AI Agent Cost Calculator opens prefilled with this workload. Change the volume, tokens, tool calls, and RAG settings to match your own and watch the cost move.

what_drives_it

Where the money goes

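Throughput dominates. At 40,000 turns a day, or about 1.2 million requests in a 30-day month, even small per-turn token counts add up to more than 2 billion input tokens and about 360 million output tokens a month, so the per-token rate is what sets the bill.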

The cheapest option here, Gemini 1.5 Flash, comes to about $275 a month against $549 on GPT-4o mini. Whether the cheaper model fits is a question for your evaluation set, not the price sheet. The bigger lever is usually the workload itself: caching re-sent context, trimming what each turn carries, and capping the tool loop move the bill more than swapping models does.

faq

Questions & answers

How much does a voice assistant backend cost per month?
On GPT-4o mini, about $549 a month at 40,000 turns a day with the assumptions listed above. The cheapest model compared here, Gemini 1.5 Flash, runs about $275 for the same workload. Your real figure moves with volume and tokens, so tune it in the calculator.
What makes a voice assistant backend expensive?
Throughput dominates: at 40,000 turns a day even small per-turn token counts add up, so the per-token rate is what matters.
Which model is cheapest for a voice assistant backend?
Gemini 1.5 Flash, at about $275 a month for this workload. Cheaper is not automatically better: a model that needs retries or longer prompts can cost more in practice, so test the candidates on your own evaluation set before committing.

A cost estimate is a start. Making an agent cheap in production is the work.

Prompt caching, context trimming, and the right model per step usually cut an agent's bill by more than half. Book a call, or leave your email and I'll reach out.
