RAG Chatbots Explained: When They Work and When They Don't

An honest primer on Retrieval-Augmented Generation: the four patterns where RAG genuinely earns its keep, the four where it fails, and a realistic look at engineering effort and cost.

1 Apr 2026·9 min read·Productized Team

RAG — Retrieval-Augmented Generation — is the default answer every vendor gives when you say the words "AI on our own data". The pitch sounds magical: take your documents, plug in a chatbot, and watch employees ask questions in natural language. The reality is more nuanced. RAG is a useful pattern for a specific class of problems, and a wasteful detour for everything else. This article is for technical decision-makers who want to know which is which before signing off on a budget.

We write this as a software vendor that builds RAG systems in production for mid-market clients. We've shipped RAG chatbots that genuinely changed how teams work. We've also walked away from RAG projects where a 200-line script or a better search box would have done the job at a tenth of the cost.

What RAG actually is (and isn't)

RAG is a two-step pattern: retrieve relevant content from a corpus, then ask an LLM to generate an answer grounded in that content. The retrieval part is usually a vector search — your documents are split into chunks, each chunk gets an embedding, and at query time the system pulls the closest matches. The generation part is a Claude or GPT API call with the chunks injected as context.
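The two steps can be sketched in a few lines. This is a minimal illustration, not a production design: the embed function below is a toy bag-of-words stand-in for a real embedding model, the blank-line chunker is deliberately naive, and all function names and the sample corpus are invented for the example. The final LLM call is left out; the sketch stops at the prompt that would be sent.

```python
import math
import re
from collections import Counter


def chunk(text: str) -> list[str]:
    """Naive chunker: split on blank lines. Real systems respect document structure."""
    return [p.strip() for p in re.split(r"\n\s*\n", text) if p.strip()]


def embed(text: str) -> Counter:
    """Toy stand-in for an embedding model: a bag-of-words count vector."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(query: str, chunks: list[str], k: int = 2) -> list[str]:
    """Step 1: return the k chunks closest to the query."""
    q = embed(query)
    return sorted(chunks, key=lambda c: cosine(q, embed(c)), reverse=True)[:k]


def build_prompt(query: str, context: list[str]) -> str:
    """Step 2 (input side): inject the retrieved chunks as numbered sources."""
    sources = "\n".join(f"[{i}] {c}" for i, c in enumerate(context, start=1))
    return (
        "Answer using only the sources below and cite them by number.\n\n"
        f"Sources:\n{sources}\n\nQuestion: {query}"
    )


corpus = """Employees accrue 25 days of annual leave per year.

Expense claims must be submitted within 30 days of purchase.

The office is closed on public holidays."""

question = "How many days of annual leave do employees get?"
chunks = chunk(corpus)
top = retrieve(question, chunks, k=1)
prompt = build_prompt(question, top)
print(prompt)
```

Swapping embed for a real embedding model and sending prompt to an LLM API gives the demo version of RAG. Everything that makes it production-grade sits around these two functions.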

What RAG is not: a new kind of AI, a substitute for a search engine, or a magic trick that lets an LLM "learn" your data. The model doesn't memorise your documents. Every question is answered fresh, with whatever the retriever happens to surface in that moment. If retrieval is bad, the answer is bad: no amount of clever prompting rescues poor retrieval or a corpus that wasn't ready.

The four patterns where RAG works

In our work we see RAG genuinely earn its keep in four scenarios. Most of them share three traits: the corpus is too large for humans to scan, the questions are phrased in natural language rather than as keyword lookups, and answers must cite sources.

1. Internal knowledge base

Confluence, SharePoint, internal wikis, policy PDFs — places where the answer exists but nobody can find it. A RAG chatbot turns hours of searching into a single question. The win is rarely "new knowledge"; it's faster access to knowledge that's already there.

2. Customer support over product docs

Tier-1 customer questions where the answer lives in your help centre. RAG handles the long tail of "how do I do X" questions and escalates the rest to humans. It works particularly well when docs are kept fresh, and fails when they aren't.

3. Sales enablement

Reps need answers about pricing, competitors, integrations and edge cases, fast, during a call. A RAG bot over case studies, battle cards and product specs is genuinely useful here. The corpus is bounded, the questions are predictable, and the cost of a wrong answer is limited.

4. Compliance & policy Q&A

Regulated industries with thick policy documents. Employees need to know the rule that applies to a specific situation. RAG with strict citation requirements works here: the retriever surfaces the relevant clause and the model quotes it. Critical: the system must say "I don't know" when retrieval misses, and never invent an answer.

The four patterns where RAG fails

Equally important: when RAG is the wrong answer. Four cases we see repeatedly.

1. The corpus is small and clean enough that you don't need RAG

If your knowledge fits in 50–100 pages of well-structured text, you don't need a vector store. Just put the whole thing in the LLM context window. Many current models accept 200K tokens or more, which comfortably covers a corpus of that size. Below a certain corpus size, RAG adds infrastructure complexity without adding value.
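A back-of-envelope check makes the threshold concrete. The words-per-page and tokens-per-word figures below are rough assumptions for English prose, and the reserved headroom is an arbitrary illustrative number. Measure your own corpus with the tokenizer of the model you plan to use before deciding.

```python
WORDS_PER_PAGE = 500      # assumption: a dense page of prose
TOKENS_PER_WORD = 1.3     # assumption: rough average for English text
CONTEXT_WINDOW = 200_000  # tokens, the 200K figure mentioned above
RESERVED = 20_000         # assumption: headroom for instructions, question and answer


def estimated_tokens(pages: int) -> int:
    """Rough token count for a corpus of the given page count."""
    return round(pages * WORDS_PER_PAGE * TOKENS_PER_WORD)


def fits_in_context(pages: int) -> bool:
    """True if the whole corpus plausibly fits in a single prompt."""
    return estimated_tokens(pages) <= CONTEXT_WINDOW - RESERVED


for pages in (50, 100, 500):
    print(f"{pages} pages ~ {estimated_tokens(pages):,} tokens, fits: {fits_in_context(pages)}")
```

Under these assumptions 100 pages comes to roughly 65,000 tokens, well inside the window, while 500 pages does not fit. Fitting is only part of the decision: sending the full corpus on every query also raises cost and latency per call.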

2. The question needs an action, not an answer

"Cancel my subscription", "raise a ticket with priority high", "update the contract end date in the CRM" — these aren't retrieval problems. They're agent problems. A RAG bot will helpfully explain how to cancel a subscription instead of cancelling it. If the desired output is an action in a system, you need an agent with tools, not a chatbot.

3. Latency matters more than depth

RAG adds at least one network round-trip and one LLM call to every query — typically 1.5–4 seconds end to end. For autocomplete, real-time UI hints, or anything inside a typing flow, that's too slow. Use a smaller search index or classical retrieval and skip the generation step.

4. The sources can't be trusted

If your knowledge base is a graveyard of stale, contradictory, or poorly written documents, RAG will faithfully turn that garbage into authoritative-sounding answers. Garbage in, confident garbage out. Fix the corpus first; only then add a chatbot on top.

The "I don't know" guardrail is the single most important feature of a production RAG system. A chatbot that confidently invents an answer when retrieval misses is worse than no chatbot at all. Build it in from day one — and test for it.
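One way to build and test this guardrail is sketched below, under stated assumptions. The Hit type, the MIN_SCORE threshold and the fake_llm stand-in are all hypothetical, and the threshold value needs tuning against your own evaluation set. The sketch has two layers: refuse outright when retrieval returns nothing relevant, and instruct the model to refuse when the sources don't contain the answer.

```python
from dataclasses import dataclass
from typing import Callable

REFUSAL = "I don't know. I could not find this in the available documents."
MIN_SCORE = 0.35  # assumption: illustrative threshold, tune on real data


@dataclass
class Hit:
    text: str
    score: float  # retriever similarity, higher is better


def answer(question: str, hits: list[Hit], generate: Callable[[str], str]) -> str:
    """Refuse before calling the model when retrieval found nothing relevant."""
    relevant = [h for h in hits if h.score >= MIN_SCORE]
    if not relevant:
        return REFUSAL
    sources = "\n".join(f"[{i}] {h.text}" for i, h in enumerate(relevant, start=1))
    prompt = (
        "Answer only from the sources. If they do not contain the answer, "
        f'reply exactly: "{REFUSAL}"\n\nSources:\n{sources}\n\nQuestion: {question}'
    )
    return generate(prompt)


def fake_llm(prompt: str) -> str:
    """Hypothetical stand-in for a real LLM call, used only to exercise the guardrail."""
    return "Annual leave is 25 days [1]."


# Retrieval miss: only an irrelevant chunk came back, so the bot must refuse.
miss = answer(
    "What is our policy on sabbaticals?",
    [Hit("Office hours are 9 to 5.", 0.12)],
    fake_llm,
)

# Retrieval hit: a relevant chunk is passed through to the model.
hit = answer(
    "How much annual leave do I get?",
    [Hit("Employees accrue 25 days of annual leave.", 0.81)],
    fake_llm,
)
print(miss)
print(hit)
```

The score threshold catches only outright retrieval misses. Cases where retrieval returns plausible but wrong chunks still depend on the prompt instruction, so the test set should include questions whose answer is genuinely absent from the corpus.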

The engineering reality

The demo of RAG takes an afternoon. The production version takes weeks. The non-obvious work:

  • Chunking: how you split documents matters more than which embedding model you pick. Bad chunks (mid-sentence cuts, lost headings, no overlap) destroy retrieval quality. Good chunking respects document structure.
  • Embedding choice: the default OpenAI or Cohere embedding is fine for English. For Dutch, multilingual models (e.g. multilingual-e5) typically perform meaningfully better — test before committing.
  • Retrieval evaluation: build a set of 50–200 real questions with expected source documents. Measure recall@k. Without this, you're tuning blind.
  • Reranking: a cross-encoder reranker on top of vector search consistently improves answer quality. It's an extra 200–500ms per query, usually worth it.
  • The "I don't know" guardrail: instruct the model to refuse when retrieved chunks don't contain the answer. Test this aggressively — it's the difference between trustworthy and dangerous.
  • Source citation: surface which chunks were used, ideally with page numbers and a link back to the original document. Builds trust and lets users verify.
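As an illustration of the retrieval evaluation step, recall@k can be computed with a few lines once the question set exists. The evaluation questions, document names and fake_retriever below are hypothetical placeholders; in practice the retriever would be your real pipeline returning ranked source document identifiers.

```python
from typing import Callable


def recall_at_k(retrieved: list[str], expected: set[str], k: int) -> float:
    """Fraction of the expected source documents that appear in the top-k results."""
    if not expected:
        raise ValueError("each evaluation question needs at least one expected source")
    return len(set(retrieved[:k]) & expected) / len(expected)


def mean_recall_at_k(
    eval_set: list[tuple[str, set[str]]],
    retriever: Callable[[str], list[str]],
    k: int = 5,
) -> float:
    """Average recall@k over all (question, expected sources) pairs."""
    scores = [recall_at_k(retriever(q), expected, k) for q, expected in eval_set]
    return sum(scores) / len(scores)


# Hypothetical evaluation set: (question, expected source document ids).
eval_set = [
    ("How many leave days do I get?", {"hr-policy.pdf"}),
    ("When are expenses due?", {"finance-handbook.pdf"}),
]


def fake_retriever(question: str) -> list[str]:
    """Stand-in for the real pipeline: returns document ids ranked best first."""
    ranked = {
        "How many leave days do I get?": ["hr-policy.pdf", "it-faq.md"],
        "When are expenses due?": ["it-faq.md", "hr-policy.pdf", "finance-handbook.pdf"],
    }
    return ranked[question]


print(mean_recall_at_k(eval_set, fake_retriever, k=2))  # 0.5
print(mean_recall_at_k(eval_set, fake_retriever, k=3))  # 1.0
```

Tracking this number while changing the chunking, the embedding model or the reranker shows whether a change actually helped, rather than only feeling better on a handful of hand-picked questions.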

RAG vs fine-tuning vs general LLM vs agent

These four options get conflated in vendor pitches. They solve different problems.

| | General LLM | RAG | Fine-tuning | Agent |
|---|---|---|---|---|
| Use case | Generic Q&A, writing, code | Q&A grounded in your documents | Style/format adaptation | Multi-step tasks with actions |
| Updates with new info? | No (until next model release) | Yes, just add documents | No, requires retraining | Yes, uses RAG + tools |
| Source citations? | No | Yes | No | Yes (when it uses RAG) |
| Takes actions? | No | No | No | Yes |
| Build cost | €0 (just API calls) | €20K–€80K | €50K–€300K+ | €40K–€250K |
| When to choose | Default. Try this first. | You have a corpus and need grounded answers | Rare: voice/format is the actual problem | You need actions, not answers |

Our default advice: start with a general LLM call. If that's not enough, add RAG. Only add fine-tuning if a specific style or format requirement makes prompting impractical — which is rare. If the user wants something done rather than answered, you're building an agent, not a chatbot.

Cost and timeline reality check

Realistic ranges for a production RAG chatbot at a mid-market company:

  • Light internal RAG bot, single source, basic UI: €20K–€35K, 4–5 weeks.
  • Multi-source RAG with reranking, evaluation set, monitoring: €40K–€60K, 6–8 weeks.
  • Customer-facing RAG with strict guardrails, SSO, audit logging: €60K–€80K, 8–10 weeks.
  • Plus 15–25% per year for maintenance — corpus changes, models change, prompts drift.
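To see what the maintenance line does to a budget, here is the arithmetic for the multi-source tier over three years. It assumes the 15–25% applies to the initial build cost and stays flat each year, which is a simplification for illustration. The function name is ours.

```python
def total_cost(build_eur: int, maintenance_rate: float, years: int) -> int:
    """Build cost plus flat yearly maintenance, as a share of the initial build cost."""
    return round(build_eur * (1 + maintenance_rate * years))


# Multi-source tier: low end of both ranges vs high end of both ranges.
low = total_cost(40_000, 0.15, years=3)
high = total_cost(60_000, 0.25, years=3)
print(f"3-year cost: €{low:,} to €{high:,}")  # 3-year cost: €58,000 to €105,000
```

Under these assumptions, maintenance adds between 45% and 75% to the build cost over three years, so it belongs in the budget from the start rather than as an afterthought.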

Anything quoted dramatically below that range is either skipping evaluation work, reusing a closed-source platform you'll be locked into, or under-scoping. Anything dramatically above is usually scope creep — you're paying for a data platform you didn't ask for.

How we build RAG systems

We build RAG chatbots and AI agents for mid-market companies. We start with a 1–2 week discovery: assess the corpus, build an evaluation set, prototype the retrieval pipeline, and show real numbers before quoting a build. More on our approach is on our service page for RAG chatbots.

Have a corpus and a question in mind? Describe what you'd want users to ask it via our contact form — we'll respond within one working day with an honest read on whether RAG is the right shape, and roughly what it would cost.
