What does a document extraction pipeline cost to run?
A batch pipeline that turns documents (invoices, forms, contracts) into structured JSON. Volume is very high, each document gets a single model call, and there are no tools or retrieval, so the cost is almost entirely tokens times throughput. On GPT-4o mini this works out to about $369 a month; below are the figures for each model compared here and what drives them.
assumptions
A planning estimate for this shape of workload. Tune any of it in the calculator.
- 20,000 documents a day
- ~2,500 input tokens per document
- ~400 output tokens of JSON per document
- No tool calls, no retrieval
monthly_cost · GPT-4o mini
$369 / month
- Input tokens (2.5k per request): $225
- Output tokens (400 per request): $144
2.5k input · 400 output · 1 LLM turn per request
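The breakdown above is plain arithmetic: documents per month times tokens per document times the per-token price. Here is a minimal Python sketch. It assumes a 30-day month and GPT-4o mini list prices of $0.15 per million input tokens and $0.60 per million output tokens. Those are the rates that reproduce the $225 and $144 lines, but check them against OpenAI's current pricing before reusing them.

```python
from dataclasses import dataclass

DAYS_PER_MONTH = 30  # assumption: the page's figures reproduce exactly with a 30-day month


@dataclass(frozen=True)
class Workload:
    docs_per_day: int
    input_tokens: int   # per document
    output_tokens: int  # per document


def monthly_breakdown(w: Workload, in_price_per_m: float, out_price_per_m: float) -> dict[str, float]:
    """Return input, output, and total monthly cost in USD for one model call per document."""
    docs = w.docs_per_day * DAYS_PER_MONTH
    input_cost = docs * w.input_tokens / 1_000_000 * in_price_per_m
    output_cost = docs * w.output_tokens / 1_000_000 * out_price_per_m
    return {"input": input_cost, "output": output_cost, "total": input_cost + output_cost}


extraction = Workload(docs_per_day=20_000, input_tokens=2_500, output_tokens=400)
# Assumed GPT-4o mini list prices, USD per million tokens.
cost = monthly_breakdown(extraction, in_price_per_m=0.15, out_price_per_m=0.60)
print({k: round(v, 2) for k, v in cost.items()})
```

Changing `docs_per_day` or the token counts is the same tuning the calculator does.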
cost_by_model
A document extraction pipeline across every model
| model | provider | cost / month |
|---|---|---|
| Gemini 1.5 Flash | Google (Vertex) | $185 (cheapest) |
| GPT-4o mini | OpenAI | $369 (shown above) |
| Claude Haiku | Anthropic | $2,160 |
| Gemini 1.5 Pro | Google (Vertex) | $3,075 |
| GPT-4o | OpenAI | $6,150 |
| Claude Sonnet | Anthropic | $8,100 |
| Claude Opus | Anthropic | $40,500 |
Public list prices as of 2026-06 · planning estimate, not a quote
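Every row applies the same formula with a different pair of rates. The sketch below recomputes and ranks the table. The per-million-token prices are assumptions: they are the list rates that reproduce each row exactly under a 30-day month, and they should be checked against each provider's pricing page before reuse.

```python
# Assumed list prices in USD per million tokens (input, output), back-solved from the table.
PRICES = {
    "Gemini 1.5 Flash": (0.075, 0.30),
    "GPT-4o mini": (0.15, 0.60),
    "Claude Haiku": (0.80, 4.00),
    "Gemini 1.5 Pro": (1.25, 5.00),
    "GPT-4o": (2.50, 10.00),
    "Claude Sonnet": (3.00, 15.00),
    "Claude Opus": (15.00, 75.00),
}

DOCS_PER_MONTH = 20_000 * 30  # assumption: 30-day month
INPUT_TOKENS, OUTPUT_TOKENS = 2_500, 400


def monthly_cost(in_price: float, out_price: float) -> float:
    """Monthly USD cost for one call per document at the given per-million-token prices."""
    millions_in = DOCS_PER_MONTH * INPUT_TOKENS / 1_000_000
    millions_out = DOCS_PER_MONTH * OUTPUT_TOKENS / 1_000_000
    return millions_in * in_price + millions_out * out_price


costs = {name: monthly_cost(*p) for name, p in PRICES.items()}
ranked = sorted((c, name) for name, c in costs.items())  # cheapest first
cheapest = ranked[0][0]
for c, name in ranked:
    print(f"{name:<18} ${c:>10,.2f}  {c / cheapest:6.1f}x the cheapest")
```

The unrounded Flash figure is $184.50, which the table shows as $185.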
what_drives_it
Where the money goes
Pure volume: at 20,000 documents a day, input tokens are the larger share of the bill ($225 of $369 on GPT-4o mini), so a cheaper model or prompt caching moves it the most.
The cheapest option here, Gemini 1.5 Flash, comes to about $185 a month against $369 on GPT-4o mini. Whether the cheaper model is accurate enough is a question for your evaluation set, not the price sheet. The other lever is the workload itself: caching the instructions and schema that every request repeats, and trimming what each document sends. With a single call per document, there is no tool loop or growing conversation to cap, so these input-side changes are the workload levers that apply.
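Caching only discounts the prefix that is identical across requests, such as instructions, the JSON schema, and any fixed examples. The document text changes on every call, and output is never cached. The rough sketch below shows that effect on GPT-4o mini. It assumes, hypothetically, that 1,500 of the 2,500 input tokens are a shared prefix, that every request hits the cache, and that cached input is billed at half the normal rate. The split, hit rate, and discount are illustrative parameters, not figures from this page.

```python
DOCS_PER_MONTH = 20_000 * 30       # assumption: 30-day month
OUTPUT_TOKENS = 400
IN_PRICE, OUT_PRICE = 0.15, 0.60   # assumed GPT-4o mini list prices, USD per million tokens


def monthly_cost(input_tokens: float, cached_prefix_tokens: float = 0,
                 cache_discount: float = 0.5, hit_rate: float = 1.0) -> float:
    """Monthly USD cost when part of every prompt is a cacheable shared prefix.

    cached_prefix_tokens: tokens identical at the start of every request (instructions, schema).
    cache_discount: fraction taken off the input price on a cache hit (hypothetical).
    hit_rate: share of requests whose prefix is served from cache (hypothetical).
    """
    millions_of_docs = DOCS_PER_MONTH / 1_000_000
    cached_tokens = cached_prefix_tokens * hit_rate
    full_price_tokens = input_tokens - cached_tokens
    input_cost = millions_of_docs * IN_PRICE * (full_price_tokens + cached_tokens * (1 - cache_discount))
    output_cost = millions_of_docs * OUTPUT_TOKENS * OUT_PRICE  # caching never touches output
    return input_cost + output_cost


baseline = monthly_cost(2_500)
cached_cost = monthly_cost(2_500, cached_prefix_tokens=1_500)
trimmed_cost = monthly_cost(2_200, cached_prefix_tokens=1_500)  # hypothetical: 300 per-document tokens trimmed
print(f"baseline ${baseline:.2f} · cached ${cached_cost:.2f} · cached + trimmed ${trimmed_cost:.2f}")
```

Under these assumptions, caching takes the input line from $225 to $157.50, while output stays at $144. The share of the prompt that is truly shared, not the raw input size, sets the ceiling on what caching can save.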
faq
Questions & answers
- How much does a document extraction pipeline cost per month?
- On GPT-4o mini, about $369 a month at 20,000 documents a day with the assumptions above. The cheapest model compared here, Gemini 1.5 Flash, runs about $185 for the same workload. Your real figure moves with volume and tokens, so tune it in the calculator.
- What makes a document extraction pipeline expensive?
- Pure volume: at 20,000 documents a day the input token count dominates, so a cheaper model or prompt caching moves the bill the most.
- Which model is cheapest for a document extraction pipeline?
- Gemini 1.5 Flash, at about $185 a month for this workload. Cheaper is not automatically better: a model that needs retries or longer prompts can cost more in practice, so test the candidates on your own evaluation set before committing.
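The retry caveat can be put in numbers. Assume every failed extraction is retried until it succeeds, each attempt fails independently with probability p, and each attempt re-bills full input and output. Then the expected number of calls per document is 1 / (1 - p). The sketch below compares the two cheapest rows under those assumptions. The failure rates are hypothetical inputs you would measure on your own evaluation set.

```python
def effective_monthly_cost(base_cost: float, failure_rate: float) -> float:
    """Expected monthly cost if failed calls are retried until success.

    Assumes each attempt fails independently with probability failure_rate and
    costs the same as the first attempt, so expected attempts = 1 / (1 - failure_rate).
    """
    if not 0 <= failure_rate < 1:
        raise ValueError("failure_rate must be in [0, 1)")
    return base_cost / (1 - failure_rate)


def break_even_failure_rate(cheap: float, pricier: float, pricier_failure_rate: float = 0.0) -> float:
    """Failure rate at which the cheaper model's expected cost equals the pricier one's."""
    target = effective_monthly_cost(pricier, pricier_failure_rate)
    return 1 - cheap / target


FLASH, MINI = 184.50, 369.00  # unrounded monthly costs behind the $185 and $369 rows

# Hypothetical failure rates: Gemini 1.5 Flash 10%, GPT-4o mini 2%.
print(f"Flash ${effective_monthly_cost(FLASH, 0.10):.2f} vs mini ${effective_monthly_cost(MINI, 0.02):.2f}")
print(f"Flash breaks even with a never-failing mini at {break_even_failure_rate(FLASH, MINI):.0%} failures")
```

Under these assumptions, Flash stays cheaper until half of its calls fail. If one model also needs a longer prompt, the same arithmetic applies after raising its input token count.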