Should I Use RAG or Fine-Tuning for My Use Case?


TL;DR

Use RAG when your data changes frequently, you need source attribution, or you want to add knowledge without modifying the model. Use fine-tuning when you need to change the model's behavior or writing style, or to teach it a specialized skill. Many production systems use both: fine-tuning for style and behavior, RAG for up-to-date factual knowledge.


Quick Decision Guide

Your situation                                           | Recommendation
---------------------------------------------------------|----------------------------------
Your knowledge base changes weekly or more often         | RAG
You need the model to cite sources                       | RAG
You want to add domain knowledge without retraining      | RAG
You need the model to write in a specific tone or format | Fine-tuning
You want the model to follow a specialized workflow      | Fine-tuning
You have fewer than 100 training examples                | RAG (fine-tuning needs more data)
You need both current facts and specialized behavior     | Both

How RAG Works in LM-Kit.NET

RAG keeps the model unchanged. Instead, it retrieves relevant passages from your documents at query time and injects them into the prompt context:

using LMKit.Model;
using LMKit.Retrieval;

// 1. Index your documents (one-time)
using LM embeddingModel = LM.LoadFromModelID("embeddinggemma-300m");
var ragEngine = new RagEngine(embeddingModel);
ragEngine.ImportDocument("product-catalog.pdf");
ragEngine.ImportDocument("support-articles.md");

// 2. Query with automatic retrieval
using LM chatModel = LM.LoadFromModelID("qwen3.5:9b");
var chat = new RagChat(chatModel, ragEngine);
var answer = await chat.SubmitAsync("What is the return policy for electronics?");

Strengths:

  • Knowledge base can be updated instantly (re-index changed documents)
  • Answers are grounded in specific passages (traceable, auditable)
  • No GPU-intensive training step
  • Works with any model out of the box
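
To make the first point concrete: updating the knowledge base is just a re-import of the changed file, reusing the ragEngine from the example above. Whether ImportDocument replaces or duplicates an existing entry is an assumption worth verifying against the API documentation:

```csharp
// After product-catalog.pdf changes on disk, refresh the index by
// importing it again -- no training step, so the update is immediate.
// (Assumption: re-importing replaces the document's existing chunks;
// verify this against the LM-Kit.NET RagEngine documentation.)
ragEngine.ImportDocument("product-catalog.pdf");
```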

Limitations:

  • Does not change the model's behavior or writing style
  • Retrieval quality depends on embedding model and chunking strategy
  • Context window limits how much retrieved text can be injected

How Fine-Tuning Works in LM-Kit.NET

Fine-tuning modifies the model's weights using your training data, changing how it generates output. LM-Kit.NET supports LoRA (Low-Rank Adaptation), which trains a small adapter on top of the base model:

using LMKit.Model;
using LMKit.Finetuning;

using LM model = LM.LoadFromModelID("qwen3.5:4b");

var finetuning = new LoraFinetuning(model);

// Configure training
finetuning.Intent = LoraFinetuning.FinetuningIntent.StylisticGuidance; // rank 4
finetuning.Iterations = 100;
finetuning.BatchSize = 4;

// Train on your examples
finetuning.AddTrainingExample(prompt: "Summarize this ticket", completion: "...");
// ... add more examples

// Export as LoRA adapter (small file, ~10-50 MB)
finetuning.Finetune2Lora("my-adapter.lora");

Strengths:

  • Changes the model's behavior, tone, and output style
  • Knowledge is embedded in the model weights (no retrieval step at inference time)
  • Smaller LoRA adapters can be swapped at runtime for different tasks
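
The last point can be sketched concretely. Because a LoRA adapter is a separate small file, you can keep one base model in memory and attach a different adapter per task. The adapter load/unload calls and the adapter file names below are assumptions for illustration, not the documented LM-Kit.NET API:

```csharp
using LMKit.Model;

// One base model, task-specific LoRA adapters swapped at runtime.
// LoadLoraAdapter/UnloadLoraAdapter are hypothetical method names.
using LM model = LM.LoadFromModelID("qwen3.5:4b");

// Attach the summarization adapter trained earlier (~10-50 MB file).
model.LoadLoraAdapter("summarize-tickets.lora");   // hypothetical

// Later, switch to a different task without reloading the
// multi-gigabyte base model from disk.
model.UnloadLoraAdapter();                          // hypothetical
model.LoadLoraAdapter("classify-tickets.lora");     // hypothetical
```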

Limitations:

  • Requires training data (ideally 50+ high-quality examples)
  • Training takes time and GPU resources
  • Knowledge embedded by fine-tuning can become stale
  • Risk of degrading the model's general capabilities if over-tuned

Combining Both Approaches

The most powerful setup uses fine-tuning for behavior and RAG for knowledge:

  1. Fine-tune the model to follow your output format, use your terminology, and match your brand voice.
  2. Use RAG to ground every response in current, verified data from your knowledge base.

This gives you a model that behaves the way you want while always answering from up-to-date facts.
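
Putting the two steps together might look like the sketch below. The RagEngine and RagChat usage follows the earlier examples; the call that attaches the LoRA adapter to the chat model is an assumption about the API and is marked as such:

```csharp
using LMKit.Model;
using LMKit.Retrieval;

// 1. Behavior: load the base model and attach the style/format adapter
//    produced by LoraFinetuning (adapter-loading call is hypothetical).
using LM chatModel = LM.LoadFromModelID("qwen3.5:9b");
chatModel.LoadLoraAdapter("my-adapter.lora");       // hypothetical

// 2. Knowledge: index the current documents with the embedding model.
using LM embeddingModel = LM.LoadFromModelID("embeddinggemma-300m");
var ragEngine = new RagEngine(embeddingModel);
ragEngine.ImportDocument("product-catalog.pdf");

// 3. Answer: the fine-tuned model generates in your voice, grounded
//    in passages retrieved from the index at query time.
var chat = new RagChat(chatModel, ragEngine);
var answer = await chat.SubmitAsync("What is the return policy for electronics?");
```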


Cost and Complexity Comparison

Factor                    | RAG                                      | Fine-Tuning
--------------------------|------------------------------------------|--------------------------------------
Setup time                | Minutes (index documents)                | Hours (prepare data, train)
GPU requirement for setup | Minimal (embedding model only)           | Significant (full training loop)
Data maintenance          | Re-index when documents change           | Retrain when behavior needs to change
Inference cost            | Slightly higher (retrieval + generation) | Same as base model
Reversibility             | Instant (remove documents)               | Requires discarding adapter
