Fine-tuning vs RAG vs prompting: choosing in 2025
There are three ways to make an LLM do what your application needs — prompting, retrieval, and fine-tuning — and teams routinely reach for the wrong one. They solve different problems, and the costs differ enormously. A trend post on what each is actually for, and a decision order that saves you from over-engineering.
When you need an LLM to do something specific for your application, there are three broad approaches: prompting (carefully instructing the model), RAG (retrieving relevant information and giving it to the model at query time), and fine-tuning (further training the model on your own data). By 2025 these are well-established, but teams still routinely reach for the wrong one — most often fine-tuning, the heaviest tool, for problems the lightest tool would solve. The confusion is understandable, because the three sound interchangeable but actually solve different problems and cost wildly different amounts. This is a trend post on what each technique is genuinely for, how they compare, and — most usefully — the order you should consider them in, which is itself the antidote to the over-engineering that plagues this decision.
What each one actually does
The key to choosing well is understanding that these aren’t three options for the same job — they target different things:
- Prompting shapes the model’s behaviour through the input: clear instructions, few-shot examples, a specified output format. It changes how the model responds to a given request without changing the model or adding external data. It’s the lightest touch, and it’s remarkably powerful — modern models follow good prompts well.
- RAG gives the model knowledge it doesn’t have by retrieving relevant documents at query time and including them in the prompt. It’s the answer to “the model needs to know things from my data” — your docs, your records, current information past the training cutoff. It changes what the model knows for this request, not how it behaves.
- Fine-tuning further trains the model on examples to change its behaviour, style, or format at a deep level — baking a consistent way of responding into the weights themselves. It changes the model itself, and it’s by far the heaviest of the three: you need training data, a training process, and the ongoing reality of maintaining a custom model.
State them side by side and the distinction is sharp: prompting changes how the model responds, RAG changes what it knows, fine-tuning changes what it fundamentally is. Most “how do I customize the model” questions are really one of these three specific questions in disguise, and naming which one you actually have is most of the battle.
The most common mistake: fine-tuning for knowledge
The single most frequent error is reaching for fine-tuning to give the model knowledge. “We want the model to know our product documentation, so let’s fine-tune it on our docs.” This is usually wrong, and expensively so. Fine-tuning is poorly suited to injecting knowledge for several reasons:
- It bakes information into weights, where it’s hard to update. Your docs change; a fine-tuned model is frozen at training time, so keeping it current means retraining — slow and costly — versus RAG, where you just update the indexed documents.
- It’s lossy and unreliable for facts. Fine-tuning shifts the model’s tendencies; it doesn’t reliably store specific facts you can later retrieve accurately. The model may “sort of” know your data rather than being able to cite it.
- RAG does this job better and cheaper. Retrieval gives the model the exact relevant information at query time, with sources, updatable instantly by changing the documents. For the “model needs to know my data” problem, RAG is almost always the right answer and fine-tuning the wrong one.
If your problem is knowledge — the model needs to know things it wasn’t trained on — reach for RAG, not fine-tuning. This one correction saves more wasted effort than any other in this space.
What fine-tuning is actually for
Fine-tuning isn’t useless — it’s just for a narrower problem than people reach for it. It earns its keep when you need to change the model’s behaviour or form consistently, in ways prompting can’t reliably achieve:
- A very specific, consistent output style or format that you need on every single call and that’s hard to get reliably from prompting alone.
- A specialized task the base model does poorly, where you have many examples of the desired input→output mapping and need the model to internalize that pattern.
- Reducing prompt size at scale. If you’d otherwise need a huge prompt with many examples on every call, fine-tuning that behaviour in can cut per-call token cost — a real but specific economic case.
The common thread is behaviour and form, not knowledge — and even then, only when prompting has genuinely fallen short, because fine-tuning’s cost (data, training, maintaining a custom model) is high enough that it should be a considered choice, not a first reach.
The decision order
The practical guidance, which is also the over-engineering antidote, is to consider these techniques in increasing order of cost and complexity, stopping as soon as one solves your problem:
- Start with prompting. It’s the cheapest, fastest, and most flexible, and modern models are good enough that careful prompting solves a surprising amount. Always try this first, and invest in doing it well before reaching further.
- Add RAG when the problem is knowledge. If the model needs information it doesn’t have — your data, current facts — add retrieval. This handles the large category of “make the model know my stuff” and is far lighter than fine-tuning.
- Reach for fine-tuning last, and only for behaviour/form. If, after prompting and RAG, you still have a genuine need to change how the model fundamentally behaves or formats — and you have the data and a real justification — then fine-tune. It’s the heaviest tool and should be the last considered.
Crucially, these combine: a real system often uses a fine-tuned (or just well-chosen) model, with RAG for knowledge, and careful prompting on top. They’re layers, not alternatives. But the order of reach matters enormously, because reaching for the heavy tool first is the classic over-engineering trap dressed in AI clothes — taking on a custom-model maintenance burden for a problem a better prompt or a retrieval step would have solved.
Verdict
There are three ways to make an LLM do what your app needs, and they solve different problems: prompting changes how the model responds, RAG changes what it knows, and fine-tuning changes what it fundamentally is. The most common and expensive mistake is reaching for fine-tuning to give the model knowledge — which it does poorly, lossily, and un-updatably, where RAG does the same job better, cheaper, and with live updates and sources. Fine-tuning is genuinely for changing behaviour or form — a consistent style or format, a specialized task with many examples, cutting prompt size at scale — and only when prompting has truly fallen short, because its cost in data, training, and custom-model maintenance is high. The decision framework is the antidote to over-engineering: try prompting first (cheapest, surprisingly capable), add RAG when the problem is knowledge, and reach for fine-tuning last and only for behaviour, remembering that the three are layers that combine, not competing alternatives. Match the technique to the actual problem — and recognize which of the three problems you really have — and you’ll avoid the dominant failure in this space: hauling out the heaviest tool for a job the lightest one was built to do.