What does a research agent cost to run?
A research agent searches the web, reads the results, and synthesizes an answer over many tool calls. Volume is low, but each run is long and tool-heavy, so the agentic context tax (the cost of re-sending a growing context on every turn) is the headline cost. On Claude Sonnet this works out to about $3,598 a month; here is the figure for every model compared and what drives it.
assumptions
A planning estimate for this shape of workload. Tune any of it in the calculator.
- 600 research runs a day
- ~1,500 prompt tokens per request
- ~800 output tokens per turn
- 6 tool calls per run (search, fetch, read, repeat)
- A flat serving line for a warm endpoint
monthly_cost · Claude Sonnet
$3,598 / month
- Input tokens (37.8k/req, agentic context): $2,041
- Output tokens (5.6k/req): $1,512
- GCP serving (warm endpoint, flat): $45.00
37.8k input · 5.6k output · 7 LLM turns / request
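The line items above can be reproduced with a few lines of arithmetic. This is a minimal sketch, not the calculator's actual code: the per-million-token prices ($3 input, $15 output) and the 30-day month are assumptions that are not stated on the page, chosen because they reproduce the published figures. Treating 7 LLM turns as 6 tool calls plus a final answer is also an inference.

```python
# Workload figures taken from the page.
RUNS_PER_DAY = 600
INPUT_TOKENS_PER_RUN = 37_800   # summed over all turns of one run
OUTPUT_TOKENS_PER_TURN = 800
LLM_TURNS_PER_RUN = 7           # presumably 6 tool calls + 1 final answer
SERVING_FLAT_USD = 45.00

# ASSUMPTIONS, not stated on the page: a 30-day month and list prices
# in USD per million tokens that happen to reproduce the figures above.
DAYS_PER_MONTH = 30
INPUT_USD_PER_M = 3.00
OUTPUT_USD_PER_M = 15.00


def monthly_breakdown() -> dict[str, float]:
    """Return the monthly cost split into input, output, serving, and total."""
    runs = RUNS_PER_DAY * DAYS_PER_MONTH
    output_tokens_per_run = OUTPUT_TOKENS_PER_TURN * LLM_TURNS_PER_RUN
    input_usd = runs * INPUT_TOKENS_PER_RUN * INPUT_USD_PER_M / 1_000_000
    output_usd = runs * output_tokens_per_run * OUTPUT_USD_PER_M / 1_000_000
    return {
        "input": input_usd,
        "output": output_usd,
        "serving": SERVING_FLAT_USD,
        "total": input_usd + output_usd + SERVING_FLAT_USD,
    }


breakdown = monthly_breakdown()
for item, usd in breakdown.items():
    print(f"{item:>8}: ${usd:,.2f}")
```

Under these assumptions the total lands at roughly $3,598, with input tokens as the largest line item even though the input price per token is a fifth of the output price.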
cost_by_model
A research agent across every model
| model | provider | cost / month |
|---|---|---|
| Gemini 1.5 Flash | Google (Vertex) | $126 (cheapest) |
| GPT-4o mini | OpenAI | $208 |
| Claude Haiku | Anthropic | $993 |
| Gemini 1.5 Pro | Google (Vertex) | $1,400 |
| GPT-4o | OpenAI | $2,754 |
| Claude Sonnet | Anthropic | $3,598 (shown above) |
| Claude Opus | Anthropic | $17,811 |
Public list prices as of 2026-06 · planning estimate, not a quote
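Because the token volume is the same for every model, the table differs only by price. The sketch below shows that relationship. The price pairs (USD per million input and output tokens) are assumptions back-solved to match the table, not values quoted on this page, and the 30-day month is likewise assumed; check them against each provider's current price sheet before relying on them.

```python
# Monthly token volume from the page's workload (30-day month is an assumption).
MONTHLY_INPUT_TOKENS = 600 * 30 * 37_800
MONTHLY_OUTPUT_TOKENS = 600 * 30 * 5_600
SERVING_FLAT_USD = 45.00

# ASSUMED (input, output) prices in USD per million tokens.
# They reproduce the table above but are not stated on the page.
ASSUMED_PRICES = {
    "Gemini 1.5 Flash": (0.075, 0.30),
    "GPT-4o mini": (0.15, 0.60),
    "Claude Haiku": (0.80, 4.00),
    "Gemini 1.5 Pro": (1.25, 5.00),
    "GPT-4o": (2.50, 10.00),
    "Claude Sonnet": (3.00, 15.00),
    "Claude Opus": (15.00, 75.00),
}


def monthly_cost(input_usd_per_m: float, output_usd_per_m: float) -> float:
    """Token cost at the given prices plus the flat serving line."""
    input_usd = MONTHLY_INPUT_TOKENS * input_usd_per_m / 1_000_000
    output_usd = MONTHLY_OUTPUT_TOKENS * output_usd_per_m / 1_000_000
    return input_usd + output_usd + SERVING_FLAT_USD


costs = {model: monthly_cost(*prices) for model, prices in ASSUMED_PRICES.items()}

# Print cheapest first, matching the table's ordering.
for model, usd in sorted(costs.items(), key=lambda kv: kv[1]):
    print(f"{model:<18} ${usd:>9,.0f}")
```

The flat $45 serving line is a large share of the cheapest model's bill and a rounding error on the most expensive one, which is worth keeping in mind when comparing ratios between rows.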
what_drives_it
Where the money goes
Six tool calls per run re-send a growing context each turn, so input tokens climb steeply: a ~1,500-token prompt becomes 37.8k input tokens per run across 7 LLM turns. The loop length, not the volume, sets the bill.
The cheapest option here, Gemini 1.5 Flash, comes to about $126 a month against $3,598 on Claude Sonnet. Whether the cheaper model fits is a question for your evaluation set, not the price sheet. The bigger lever is usually the workload itself: caching re-sent context, trimming what each turn carries, and capping the tool loop move the bill more than swapping models does.
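The three workload levers named above (caching, trimming, capping the loop) can be sketched with a simple model of context growth. Everything here is illustrative: the page does not say how much context each turn adds, so `GROWTH_PER_TURN = 1_300` is an assumption back-solved so that six tool calls yield the published 37.8k input tokens. The `cached_multiplier` parameter is hypothetical too, a stand-in for whatever discount a provider applies to cached input, and it ignores any surcharge for writing to the cache.

```python
def input_tokens_per_run(base_prompt: int, growth_per_turn: int, tool_calls: int) -> int:
    """Total input tokens when every turn re-sends the whole context so far."""
    turns = tool_calls + 1  # assumption: one LLM turn per tool call plus a final answer
    return sum(base_prompt + k * growth_per_turn for k in range(turns))


def billed_input_tokens(base_prompt: int, growth_per_turn: int, tool_calls: int,
                        cached_multiplier: float = 1.0) -> float:
    """Price-weighted input tokens when re-sent context is billed at a discount.

    Tokens seen for the first time count in full; tokens already sent on an
    earlier turn count at `cached_multiplier` (1.0 means no caching).
    """
    total = input_tokens_per_run(base_prompt, growth_per_turn, tool_calls)
    fresh = base_prompt + tool_calls * growth_per_turn  # each token's first appearance
    resent = total - fresh
    return fresh + cached_multiplier * resent


BASE_PROMPT = 1_500      # from the page
GROWTH_PER_TURN = 1_300  # ASSUMPTION: back-solved to match 37.8k input tokens per run

baseline = input_tokens_per_run(BASE_PROMPT, GROWTH_PER_TURN, tool_calls=6)
capped = input_tokens_per_run(BASE_PROMPT, GROWTH_PER_TURN, tool_calls=3)
trimmed = input_tokens_per_run(BASE_PROMPT, GROWTH_PER_TURN // 2, tool_calls=6)
cached = billed_input_tokens(BASE_PROMPT, GROWTH_PER_TURN, tool_calls=6,
                             cached_multiplier=0.1)  # hypothetical 90% discount

print(f"baseline: {baseline:,} input tokens per run")
print(f"loop capped at 3 tool calls: {capped:,}")
print(f"per-turn growth halved: {trimmed:,}")
print(f"with cached re-sends: {cached:,.0f} full-price-equivalent tokens")
```

In this model the input total grows roughly with the square of the loop length, which is why capping the loop at three tool calls drops input from 37,800 to 13,800 tokens per run, a cut of more than half.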
faq
Questions & answers
- How much does a research agent cost per month?
- On Claude Sonnet, about $3,598 a month at 600 requests a day with the assumptions above. The cheapest model compared here, Gemini 1.5 Flash, runs about $126 for the same workload. Your real figure moves with volume and tokens, so tune it in the calculator.
- What makes a research agent expensive?
- Six tool calls per run re-send a growing context each turn, so input tokens climb steeply. The loop length, not the volume, sets the bill.
- Which model is cheapest for a research agent?
- Gemini 1.5 Flash, at about $126 a month for this workload. Cheaper is not automatically better: a model that needs retries or longer prompts can cost more in practice, so test the candidates on your own evaluation set before committing.
A cost estimate is a start. Making an agent cheap in production is the work.
Prompt caching, context trimming, and the right model per step usually cut an agent's bill by more than half.