Improve RAG Results with Reranking

A basic RAG pipeline retrieves documents by embedding similarity, but embedding similarity is not always the best measure of relevance. A document might be semantically close to the query yet not actually answer the question. Reranking applies a cross-encoder model that scores each retrieved passage against the query with higher precision, then blends that score with the original similarity score. LM-Kit.NET provides both a standalone Reranker class and an integrated RagReranker that plugs directly into the RAG pipeline.


Why This Matters

Two enterprise problems that reranking solves:

  1. Reducing hallucinations from irrelevant passages. Without reranking, the top-retrieved chunk might be thematically related but factually irrelevant, causing the LLM to hallucinate an answer based on the wrong context. A reranker catches these false positives by evaluating each passage against the query with a cross-encoder that considers both inputs simultaneously.
  2. Improving precision in domain-specific knowledge bases. Medical, legal, and financial documents contain many passages with overlapping terminology. Reranking distinguishes the passage that actually answers the question from ones that merely contain similar vocabulary.

Prerequisites

Requirement   Minimum
.NET SDK      8.0+
RAM           16 GB recommended
VRAM          6 GB (for embedding + chat models)
Disk          ~4 GB free for model downloads

Step 1: Create the Project

dotnet new console -n RerankingQuickstart
cd RerankingQuickstart
dotnet add package LM-Kit.NET

Step 2: Standalone Reranking

The Reranker class scores documents against a query independently of the RAG pipeline. This is useful for evaluating passage relevance, sorting search results, or building custom retrieval logic.

using System.Text;
using LMKit.Model;
using LMKit.Embeddings;

LMKit.Licensing.LicenseManager.SetLicenseKey("");

Console.InputEncoding = Encoding.UTF8;
Console.OutputEncoding = Encoding.UTF8;

// ──────────────────────────────────────
// 1. Load embedding model (same model for embeddings + reranking)
// ──────────────────────────────────────
Console.WriteLine("Loading embedding model...");
using LM embeddingModel = LM.LoadFromModelID("embeddinggemma-300m",
    downloadingProgress: (_, len, read) =>
    {
        if (len.HasValue) Console.Write($"\r  Downloading: {(double)read / len.Value * 100:F1}%   ");
        return true;
    },
    loadingProgress: p => { Console.Write($"\r  Loading: {p * 100:F0}%   "); return true; });
Console.WriteLine(" Done.\n");

// ──────────────────────────────────────
// 2. Create reranker
// ──────────────────────────────────────
var reranker = new Reranker(embeddingModel);

// ──────────────────────────────────────
// 3. Score individual documents against a query
// ──────────────────────────────────────
string query = "What are the side effects of ibuprofen?";

string[] documents =
{
    "Ibuprofen may cause stomach pain, nausea, and dizziness. In rare cases, it can lead to gastrointestinal bleeding.",
    "Ibuprofen is a nonsteroidal anti-inflammatory drug (NSAID) commonly used to reduce fever and treat pain.",
    "Acetaminophen is an alternative to ibuprofen for pain relief, with fewer gastrointestinal side effects.",
    "The recommended dosage of ibuprofen for adults is 200 to 400 mg every 4 to 6 hours."
};

Console.WriteLine($"Query: \"{query}\"\n");

float[] scores = reranker.GetScore(query, documents);

// Sort by relevance score
var ranked = documents
    .Zip(scores, (doc, score) => new { Document = doc, Score = score })
    .OrderByDescending(x => x.Score)
    .ToList();

Console.WriteLine("Reranked results:");
for (int i = 0; i < ranked.Count; i++)
{
    Console.WriteLine($"  #{i + 1} (score: {ranked[i].Score:F4})");
    Console.WriteLine($"     {ranked[i].Document}\n");
}

Run it:

dotnet run

Example output:

Loading embedding model...
  Loading: 100%    Done.

Query: "What are the side effects of ibuprofen?"

Reranked results:
  #1 (score: 0.9234)
     Ibuprofen may cause stomach pain, nausea, and dizziness. In rare cases, it can lead to gastrointestinal bleeding.

  #2 (score: 0.6812)
     Acetaminophen is an alternative to ibuprofen for pain relief, with fewer gastrointestinal side effects.

  #3 (score: 0.5147)
     The recommended dosage of ibuprofen for adults is 200 to 400 mg every 4 to 6 hours.

  #4 (score: 0.4203)
     Ibuprofen is a nonsteroidal anti-inflammatory drug (NSAID) commonly used to reduce fever and treat pain.

Notice that the passage about side effects ranks first, while the general definition of ibuprofen ranks last despite containing the keyword "ibuprofen" prominently.
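
The scores returned by GetScore can also drive simple custom retrieval logic, such as keeping only the strongest passages for the prompt. The sketch below is illustrative only: it hard-codes the scores from the example output above instead of calling the reranker, and the SelectContext helper, the 0.5 threshold, and the limit of two passages are assumptions made for this example, not LM-Kit.NET APIs or recommended values.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper (not part of LM-Kit.NET): keep the best passages whose
// score clears a threshold, highest score first.
static List<(string Document, float Score)> SelectContext(
    IReadOnlyList<string> documents, IReadOnlyList<float> scores,
    float minScore, int maxPassages)
{
    if (documents.Count != scores.Count)
        throw new ArgumentException("documents and scores must have the same length.");

    return documents
        .Select((doc, i) => (Document: doc, Score: scores[i]))
        .Where(x => x.Score >= minScore)
        .OrderByDescending(x => x.Score)
        .Take(maxPassages)
        .ToList();
}

// Scores copied from the example output above, listed in the original document order.
string[] sampleDocs =
{
    "Side effects passage",
    "General definition passage",
    "Acetaminophen passage",
    "Dosage passage"
};
float[] sampleScores = { 0.9234f, 0.4203f, 0.6812f, 0.5147f };

var context = SelectContext(sampleDocs, sampleScores, minScore: 0.5f, maxPassages: 2);

foreach (var (document, score) in context)
    Console.WriteLine($"{score:F4}  {document}");
```

With these numbers the helper keeps the side-effects and acetaminophen passages and drops the rest. Both the threshold and the passage limit would need tuning against real queries.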


Step 3: Integrating Reranking into a RAG Pipeline

The RagEngine.RagReranker class plugs reranking directly into the RAG pipeline. When an instance is assigned to the engine's Reranker property, the engine automatically reranks retrieved partitions before they are passed to the LLM for answer generation.

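// Continues Program.cs from Step 2 (reuses embeddingModel and query).
// Place these using directives at the top of the file, next to the existing ones.
using LMKit.Data;
using LMKit.Retrieval;
using LMKit.TextGeneration;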

// ──────────────────────────────────────
// 1. Load chat model
// ──────────────────────────────────────
Console.WriteLine("Loading chat model...");
using LM chatModel = LM.LoadFromModelID("gemma3:4b",
    loadingProgress: p => { Console.Write($"\rLoading: {p * 100:F0}%   "); return true; });
Console.WriteLine(" Done.\n");

// ──────────────────────────────────────
// 2. Set up RAG with reranking
// ──────────────────────────────────────
var dataSource = DataSource.CreateInMemoryDataSource("KnowledgeBase", embeddingModel);
var rag = new RagEngine(embeddingModel);
rag.AddDataSource(dataSource);

// Enable reranking with alpha=0.7 (favor reranker score)
rag.Reranker = new RagEngine.RagReranker(embeddingModel, rerankedAlpha: 0.7f);

// ──────────────────────────────────────
// 3. Index documents
// ──────────────────────────────────────
string[] docs =
{
    "Ibuprofen side effects include stomach pain, nausea, dizziness, and in rare cases gastrointestinal bleeding.",
    "Ibuprofen is classified as a nonsteroidal anti-inflammatory drug used for pain and fever reduction.",
    "The chemical formula of ibuprofen is C13H18O2 with a molecular weight of 206.28 g/mol.",
    "Patients with kidney disease should consult their doctor before taking ibuprofen."
};

foreach (string doc in docs)
    rag.ImportText(doc, "KnowledgeBase", "medical-docs");

// ──────────────────────────────────────
// 4. Query with reranking active
// ──────────────────────────────────────
var matches = rag.FindMatchingPartitions(query, topK: 3, minScore: 0.2f);

Console.WriteLine("Retrieved passages (with reranking):");
foreach (var match in matches)
    Console.WriteLine($"  score={match.Similarity:F3}  \"{match.Payload.Content.Substring(0, Math.Min(80, match.Payload.Content.Length))}...\"");

// ──────────────────────────────────────
// 5. Generate answer from reranked context
// ──────────────────────────────────────
var chat = new SingleTurnConversation(chatModel)
{
    SystemPrompt = "Answer the question using only the provided context.",
    MaximumCompletionTokens = 256
};

var result = rag.QueryPartitions(query, matches, chat);
Console.WriteLine($"\nAnswer: {result.Completion}");

Step 4: Tuning the Reranked Alpha

The rerankedAlpha parameter controls how much influence the reranker has on the final score. The formula is:

final_score = ((1 - alpha) * original_similarity) + (alpha * reranker_score)

// Conservative: trust embeddings more
rag.Reranker = new RagEngine.RagReranker(embeddingModel, rerankedAlpha: 0.3f);

// Aggressive: trust reranker more (better for domain-specific queries)
rag.Reranker = new RagEngine.RagReranker(embeddingModel, rerankedAlpha: 0.8f);

Alpha  Behavior                                      Best For
0.0    Original similarity only (reranker disabled)  Baseline comparison
0.3    Slight reranking boost                        General-purpose, high-quality embeddings
0.5    Equal blend (default)                         Balanced starting point
0.7    Reranker-weighted                             Domain-specific knowledge bases
1.0    Reranker score only                           Maximum precision, ignore embedding similarity
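
As a worked illustration of the blend, the sketch below computes final scores for two hypothetical passages at several alpha values. It assumes alpha weights the reranker score, consistent with the table above (0.0 keeps the original similarity, 1.0 keeps the reranker score). The Blend helper and all numbers are invented for the example and are not LM-Kit.NET internals.

```csharp
using System;

// Hypothetical helper mirroring the blend described above; not an LM-Kit.NET API.
static float Blend(float originalSimilarity, float rerankerScore, float alpha)
{
    if (alpha < 0f || alpha > 1f)
        throw new ArgumentOutOfRangeException(nameof(alpha), "alpha must be in [0, 1].");
    return (1f - alpha) * originalSimilarity + alpha * rerankerScore;
}

// Made-up scores: passage A looks closer by embedding similarity,
// passage B is preferred by the reranker.
(float Similarity, float Reranked) passageA = (0.80f, 0.40f);
(float Similarity, float Reranked) passageB = (0.70f, 0.90f);

foreach (float alpha in new[] { 0.0f, 0.3f, 0.5f, 0.7f, 1.0f })
{
    float a = Blend(passageA.Similarity, passageA.Reranked, alpha);
    float b = Blend(passageB.Similarity, passageB.Reranked, alpha);
    string top = a >= b ? "A" : "B";
    Console.WriteLine($"alpha={alpha:F1}  A={a:F3}  B={b:F3}  top={top}");
}
```

In this toy case passage A ranks first only at alpha 0.0. From 0.3 upward the reranker's preference for passage B outweighs A's higher embedding similarity.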

Start with the default (0.5) and adjust based on your evaluation results. For domain-specific corpora with overlapping terminology, values between 0.6 and 0.8 are a reasonable range to try first.
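
One way to adjust alpha from evaluation results is to sweep it over a small labeled set and measure how often the known-relevant passage lands at rank 1. The sketch below is a minimal, self-contained illustration with invented scores. In a real evaluation the similarity and reranker scores would come from your own retrieval runs. The blend again assumes alpha weights the reranker score, as in the table above.

```csharp
using System;
using System.Linq;

// Each labeled query holds candidate passages as (similarity, rerankerScore, isRelevant).
// All numbers are invented for illustration.
var labeledQueries = new[]
{
    new[] { (Sim: 0.82f, Rr: 0.35f, Relevant: false), (Sim: 0.74f, Rr: 0.91f, Relevant: true) },
    new[] { (Sim: 0.88f, Rr: 0.80f, Relevant: true),  (Sim: 0.79f, Rr: 0.60f, Relevant: false) },
    new[] { (Sim: 0.90f, Rr: 0.55f, Relevant: false), (Sim: 0.60f, Rr: 0.75f, Relevant: true) },
};

// Fraction of queries whose top-ranked passage (after blending) is the relevant one.
double HitRateAt1(float alpha) =>
    labeledQueries.Count(candidates =>
        candidates
            .OrderByDescending(c => (1f - alpha) * c.Sim + alpha * c.Rr)
            .First()
            .Relevant)
    / (double)labeledQueries.Length;

float[] alphas = { 0.0f, 0.3f, 0.5f, 0.7f, 1.0f };
foreach (float a in alphas)
    Console.WriteLine($"alpha={a:F1}  hit@1={HitRateAt1(a):P0}");

// Pick the smallest alpha that reaches the best hit rate.
float bestAlpha = alphas
    .OrderByDescending(a => HitRateAt1(a))
    .ThenBy(a => a)
    .First();
Console.WriteLine($"Best alpha on this toy set: {bestAlpha:F1}");
```

On this toy data the hit rate rises from one in three queries at alpha 0.0 to all three at 0.7, so the sweep selects 0.7. A real evaluation set should be much larger, and the chosen value should be checked against queries that were not used for tuning.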


Step 5: Reranking with PartitionSimilarity Objects

For advanced scenarios, you can rerank PartitionSimilarity objects directly. This is useful when you want to rerank results from FindMatchingPartitions before passing them to a custom generation step.

// Continues Program.cs from Steps 2 and 3: reuses embeddingModel, query, and rag.
// Variables are renamed so they do not clash with 'matches' and 'reranker' declared earlier.

// Retrieve results without reranking
rag.Reranker = null;
var candidates = rag.FindMatchingPartitions(query, topK: 10, minScore: 0.1f);

// Rerank manually with a specific alpha
var manualReranker = new Reranker(embeddingModel);
manualReranker.Rerank(query, candidates, rerankedAlpha: 0.7f);

// Now 'candidates' are re-sorted by blended score
foreach (var candidate in candidates)
    Console.WriteLine($"  score={candidate.Similarity:F3}  {candidate.Payload.Content.Substring(0, Math.Min(60, candidate.Payload.Content.Length))}...");

Common Issues

Problem                        Cause                                        Fix
Reranking makes results worse  Alpha too high for your data                 Lower alpha to 0.3 and increase gradually
No improvement over baseline   Passages already well-ranked by embeddings   Reranking helps most when top results are ambiguous
Slow retrieval                 Reranker runs cross-encoder on every result  Reduce topK to limit the number of passages reranked
Scores all near zero           NormalizeScore disabled                      Set reranker.NormalizeScore = true (default)

Next Steps