Skip to content

What does a rag chatbot cost to run?

A retrieval-augmented chatbot answering questions over a knowledge base. High volume, retrieval on every request, no agentic tool calls, so cost scales cleanly with traffic and retrieved context. On GPT-4o mini this works out to about $133 a month; here is the figure across every model and what drives it.

assumptions

A planning estimate for this shape of workload. Tune any of it in the calculator.

  • 5,000 questions a day
  • ~800 prompt tokens plus ~3,000 retrieved tokens per question
  • ~500 output tokens per answer
  • No tool calls (single retrieval-then-answer turn)

monthly_cost · GPT-4o mini

$133/ month

Input tokens3.8k/req · agentic context
$85.50
Output tokens500/req
$45.00
Embeddings (RAG)query embedding per request
$2.40

3.8k input · 500 output · 1 LLM turn / request

cost_by_model

A rag chatbot across every model

Monthly cost of a RAG chatbot across models
modelcost / month
Gemini 1.5 FlashGoogle (Vertex)$67.65cheapest
GPT-4o miniOpenAI · shown above$133
Claude HaikuAnthropic$758
Gemini 1.5 ProGoogle (Vertex)$1,090
GPT-4oOpenAI$2,177
Claude SonnetAnthropic$2,837
Claude OpusAnthropic$14,177

cheapest · public list prices as of 2026-06 · planning estimate, not a quote

free_toolTune this scenario to your numbersOpens the AI Agent Cost Calculator prefilled with this workload. Change the volume, tokens, tool calls, and RAG to match your own and watch the cost move.

what_drives_it

Where the money goes

The retrieved context injected into every request is the biggest line; trimming chunks or raising relevance cuts the bill directly.

The cheapest option here, Gemini 1.5 Flash, comes to about $67.65 a month against $133 on GPT-4o mini. Whether the cheaper model fits is a question for your evaluation set, not the price sheet. The bigger lever is usually the workload itself: caching re-sent context, trimming what each turn carries, and capping the tool loop move the bill more than swapping models does.

faq

Questions & answers

How much does a rag chatbot cost per month?
On GPT-4o mini, about $133 a month at 5,000 requests a day with the assumptions below. The cheapest model compared here, Gemini 1.5 Flash, runs about $67.65 for the same workload. Your real figure moves with volume and tokens, so tune it in the calculator.
What makes a rag chatbot expensive?
The retrieved context injected into every request is the biggest line; trimming chunks or raising relevance cuts the bill directly.
Which model is cheapest for a rag chatbot?
Gemini 1.5 Flash, at about $67.65 a month for this workload. Cheaper is not automatically better: a model that needs retries or longer prompts can cost more in practice, so test the candidates on your own evaluation set before committing.

A cost estimate is a start. Making an agent cheap in production is the work.

Prompt caching, context trimming, and the right model per step usually cut an agent's bill by more than half. Book a call, or leave your email and I'll reach out.

Book a call

No spam. You'll get a reply from me.

Prefer proof first? See how this plays out in real case studies →