Retrieval-Augmented Generation (RAG)
RAG is the pattern of fetching relevant documents at query time and putting them in the prompt, so the model answers from your data rather than from its training data alone, with no retraining.
also: RAG · retrieval augmented generation · vector search
A RAG pipeline splits your documents into chunks, embeds each chunk as a vector, and stores the vectors in an index. At query time it embeds the question, retrieves the closest chunks, and injects them into the prompt as context. The model then answers grounded in those chunks. This is how you get a model to answer from private, current, or domain-specific knowledge it never saw in training, and it makes answers checkable because you can cite the retrieved sources.
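The flow above can be sketched in a few lines. This is a minimal illustration, not a production design: the `embed` function here is a toy word-count stand-in for a real embedding model, the in-memory list stands in for a vector store, and every function name and sample chunk is made up for the example.

```python
import math
import re
from collections import Counter

def embed(text: str) -> Counter:
    """Toy embedding (assumption): lowercase word counts.
    A real pipeline would call an embedding model here."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))

def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def build_index(chunks):
    """Index step: embed every chunk once and keep (chunk, vector) pairs."""
    return [(chunk, embed(chunk)) for chunk in chunks]

def retrieve(index, question, k=2):
    """Query step: embed the question and return the k closest chunks."""
    q = embed(question)
    ranked = sorted(index, key=lambda item: cosine(q, item[1]), reverse=True)
    return [chunk for chunk, _ in ranked[:k]]

def build_prompt(question, chunks):
    """Inject the retrieved chunks as numbered context so answers can cite them."""
    context = "\n".join(f"[{i + 1}] {c}" for i, c in enumerate(chunks))
    return (
        "Answer using only the context below and cite sources by number.\n\n"
        f"Context:\n{context}\n\nQuestion: {question}"
    )

# Hypothetical sample data.
chunks = [
    "Refunds are issued within 14 days of a return request.",
    "Support is available on weekdays from 9:00 to 17:00.",
    "Enterprise plans include a dedicated account manager.",
]
index = build_index(chunks)
question = "How many days do refunds take?"
top = retrieve(index, question, k=1)
prompt = build_prompt(question, top)
```

The resulting `prompt` is what you would send to the model. Numbering the chunks is one simple way to make the answer citable.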
RAG is cheaper and faster to change than fine-tuning (update the documents, not the weights), but its quality is bounded by retrieval: if the right chunk is not fetched, the model cannot use it, and irrelevant chunks waste context and can mislead the model. The work is in chunking, embedding choice, and ranking. The cost shows up as embedding calls plus the retrieved tokens you add to every request, which is why retrieval size is a budget line, not a free win.
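To see why retrieval size behaves like a budget line, here is a rough back-of-the-envelope sketch. All figures (request volume, chunk size, per-token prices) are hypothetical placeholders, not real provider pricing. It also ignores the one-off cost of embedding the documents themselves.

```python
def retrieved_token_cost(requests, top_k, tokens_per_chunk, usd_per_million_input):
    """Cost of the extra input tokens that retrieved chunks add to every request."""
    added_tokens = requests * top_k * tokens_per_chunk
    return added_tokens / 1_000_000 * usd_per_million_input

def query_embedding_cost(requests, tokens_per_query, usd_per_million_embedding):
    """Cost of embedding each incoming question at query time."""
    return requests * tokens_per_query / 1_000_000 * usd_per_million_embedding

# Hypothetical workload and prices (assumptions for illustration only).
REQUESTS_PER_MONTH = 100_000
TOKENS_PER_CHUNK = 400
TOKENS_PER_QUERY = 20
USD_PER_M_INPUT = 1.0
USD_PER_M_EMBEDDING = 0.1

context_cost_k5 = retrieved_token_cost(REQUESTS_PER_MONTH, 5, TOKENS_PER_CHUNK, USD_PER_M_INPUT)
context_cost_k20 = retrieved_token_cost(REQUESTS_PER_MONTH, 20, TOKENS_PER_CHUNK, USD_PER_M_INPUT)
embedding_cost = query_embedding_cost(REQUESTS_PER_MONTH, TOKENS_PER_QUERY, USD_PER_M_EMBEDDING)
```

Under these assumed numbers, retrieving 20 chunks instead of 5 quadruples the context cost, while the query embedding cost stays small by comparison. The real ratio depends on your own volumes and prices.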
Questions & answers
- When should I use RAG instead of fine-tuning?
- Use RAG when the knowledge changes often, must be cited, or is too large to bake into weights, which covers most 'answer over our docs' use cases. Fine-tuning suits teaching a style, format, or skill rather than facts. Many systems use RAG for knowledge and light fine-tuning for behaviour.
- Why are my RAG answers wrong even with the data indexed?
- Usually the problem is retrieval, not the model. If the relevant chunk is not in the top results, the model never sees it, so the answer falls back to a guess. The fixes live in chunking, the embedding model, and ranking or reranking. Before blaming the generation step, check that the retrieved context actually contains the answer.
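One way to run that check is to measure how often the retrieved context contains a known answer, separately from generation. The sketch below assumes a small hand-written evaluation set and uses a toy keyword-overlap retriever as a stand-in for your real one. The names `toy_retrieve` and `hit_rate` and the sample data are hypothetical.

```python
import re

def tokens(text):
    """Lowercase word set used by the toy retriever."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))

def toy_retrieve(corpus, question, k):
    """Stand-in retriever (assumption): rank chunks by shared words with the question."""
    q = tokens(question)
    ranked = sorted(corpus, key=lambda chunk: len(q & tokens(chunk)), reverse=True)
    return ranked[:k]

def context_contains_answer(chunks, expected):
    """True if any retrieved chunk contains the expected answer string."""
    return any(expected.lower() in chunk.lower() for chunk in chunks)

def hit_rate(eval_set, corpus, k):
    """Fraction of questions whose top-k retrieved chunks contain the answer."""
    hits = sum(
        context_contains_answer(toy_retrieve(corpus, question, k), expected)
        for question, expected in eval_set
    )
    return hits / len(eval_set)

# Hypothetical corpus and evaluation set.
corpus = [
    "Refunds are issued within 14 days of a return request.",
    "Support is available on weekdays from 9:00 to 17:00.",
    "Enterprise plans include a dedicated account manager.",
]
eval_set = [
    ("How many days do refunds take?", "14 days"),
    ("When is support available?", "9:00 to 17:00"),
    ("Who looks after big customers?", "account manager"),
]

rate_at_1 = hit_rate(eval_set, corpus, k=1)
rate_at_3 = hit_rate(eval_set, corpus, k=3)
```

In this toy setup the third question shares no words with the chunk that answers it, so the retriever misses it at k=1 even though the data is indexed. A low hit rate on a set like this points at chunking, embeddings, or ranking rather than at the model.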