Improve RAG Results with Reranking

A basic RAG pipeline retrieves documents by embedding similarity, but embedding similarity is not always the best measure of relevance. A document might be semantically close to the query yet not actually answer the question. Reranking applies a cross-encoder model that scores each retrieved passage against the query with higher precision, then blends that score with the original similarity score. LM-Kit.NET provides both a standalone Reranker class and an integrated RagReranker that plugs directly into the RAG pipeline.

Why This Matters

Two enterprise problems that reranking solves:

Reducing hallucinations from irrelevant passages. Without reranking, the top-retrieved chunk might be thematically related but factually irrelevant, causing the LLM to hallucinate an answer based on the wrong context. A reranker catches these false positives by evaluating each passage against the query with a cross-encoder that considers both inputs simultaneously.
Improving precision in domain-specific knowledge bases. Medical, legal, and financial documents contain many passages with overlapping terminology. Reranking distinguishes the passage that actually answers the question from ones that merely contain similar vocabulary.

Prerequisites

Requirement	Minimum
.NET SDK	8.0+
RAM	16 GB recommended
VRAM	6 GB (for embedding + chat models)
Disk	~4 GB free for model downloads

Step 1: Create the Project

dotnet new console -n RerankingQuickstart
cd RerankingQuickstart
dotnet add package LM-Kit.NET

Step 2: Standalone Reranking

The Reranker class scores documents against a query independently of the RAG pipeline. This is useful for evaluating passage relevance, sorting search results, or building custom retrieval logic.

using System.Text;
using LMKit.Model;
using LMKit.Embeddings;

LMKit.Licensing.LicenseManager.SetLicenseKey("");

Console.InputEncoding = Encoding.UTF8;
Console.OutputEncoding = Encoding.UTF8;

// ──────────────────────────────────────
// 1. Load embedding model (same model for embeddings + reranking)
// ──────────────────────────────────────
Console.WriteLine("Loading embedding model...");
using LM embeddingModel = LM.LoadFromModelID("embeddinggemma-300m",
    downloadingProgress: (_, len, read) =>
    {
        if (len.HasValue) Console.Write($"\r  Downloading: {(double)read / len.Value * 100:F1}%   ");
        return true;
    },
    loadingProgress: p => { Console.Write($"\r  Loading: {p * 100:F0}%   "); return true; });
Console.WriteLine(" Done.\n");

// ──────────────────────────────────────
// 2. Create reranker
// ──────────────────────────────────────
var reranker = new Reranker(embeddingModel);

// ──────────────────────────────────────
// 3. Score individual documents against a query
// ──────────────────────────────────────
string query = "What are the side effects of ibuprofen?";

string[] documents =
{
    "Ibuprofen may cause stomach pain, nausea, and dizziness. In rare cases, it can lead to gastrointestinal bleeding.",
    "Ibuprofen is a nonsteroidal anti-inflammatory drug (NSAID) commonly used to reduce fever and treat pain.",
    "Acetaminophen is an alternative to ibuprofen for pain relief, with fewer gastrointestinal side effects.",
    "The recommended dosage of ibuprofen for adults is 200 to 400 mg every 4 to 6 hours."
};

Console.WriteLine($"Query: \"{query}\"\n");

float[] scores = reranker.GetScore(query, documents);

// Sort by relevance score
var ranked = documents
    .Zip(scores, (doc, score) => new { Document = doc, Score = score })
    .OrderByDescending(x => x.Score)
    .ToList();

Console.WriteLine("Reranked results:");
for (int i = 0; i < ranked.Count; i++)
{
    Console.WriteLine($"  #{i + 1} (score: {ranked[i].Score:F4})");
    Console.WriteLine($"     {ranked[i].Document}\n");
}

Run it:

dotnet run

Example output:

Loading embedding model...
  Loading: 100%    Done.

Query: "What are the side effects of ibuprofen?"

Reranked results:
  #1 (score: 0.9234)
     Ibuprofen may cause stomach pain, nausea, and dizziness. In rare cases, it can lead to gastrointestinal bleeding.

  #2 (score: 0.6812)
     Acetaminophen is an alternative to ibuprofen for pain relief, with fewer gastrointestinal side effects.

  #3 (score: 0.5147)
     The recommended dosage of ibuprofen for adults is 200 to 400 mg every 4 to 6 hours.

  #4 (score: 0.4203)
     Ibuprofen is a nonsteroidal anti-inflammatory drug (NSAID) commonly used to reduce fever and treat pain.

Notice that the passage about side effects ranks first, while the general definition of ibuprofen ranks last despite containing the keyword "ibuprofen" prominently.
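
Once you have scores, a common next step is to keep only the passages that clear a relevance cutoff. The standalone sketch below shows one way to do that in plain C#. It makes no LM-Kit calls: the scores are copied from the example output above, the passage labels are shorthand, and the `minScore` and `maxPassages` values are illustrative assumptions rather than recommended settings.

```csharp
using System;
using System.Linq;

// Shorthand labels for the four passages, in their original (unsorted) order.
string[] passages =
{
    "side effects passage",
    "definition passage",
    "acetaminophen passage",
    "dosage passage"
};

// Scores copied from the example output above, aligned with 'passages'.
float[] scores = { 0.9234f, 0.4203f, 0.6812f, 0.5147f };

const float minScore = 0.5f; // assumed cutoff; tune it on your own data
const int maxPassages = 2;   // assumed context budget

// Pair each passage with its score, drop weak matches, keep the best few.
var selected = passages
    .Zip(scores, (text, score) => (Text: text, Score: score))
    .Where(p => p.Score >= minScore)
    .OrderByDescending(p => p.Score)
    .Take(maxPassages)
    .ToList();

foreach (var p in selected)
    Console.WriteLine($"{p.Score:F4}  {p.Text}");
```

With these numbers the definition passage falls below the cutoff, and the two passages kept are the side effects passage followed by the acetaminophen passage.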

Step 3: Integrating Reranking into a RAG Pipeline

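The RagEngine.RagReranker class plugs reranking directly into the RAG pipeline. When set, the engine automatically reranks retrieved partitions before passing them to the LLM for answer generation. The code below continues the Step 2 program and reuses its embeddingModel and query variables.

// Place these using directives at the top of Program.cs, alongside the ones from Step 2
using LMKit.Data;
using LMKit.Retrieval;
using LMKit.TextGeneration;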

// ──────────────────────────────────────
// 1. Load chat model
// ──────────────────────────────────────
Console.WriteLine("Loading chat model...");
using LM chatModel = LM.LoadFromModelID("gemma3:4b",
    loadingProgress: p => { Console.Write($"\rLoading: {p * 100:F0}%   "); return true; });
Console.WriteLine(" Done.\n");

// ──────────────────────────────────────
// 2. Set up RAG with reranking
// ──────────────────────────────────────
var dataSource = DataSource.CreateInMemoryDataSource("KnowledgeBase", embeddingModel);
var rag = new RagEngine(embeddingModel);
rag.AddDataSource(dataSource);

// Enable reranking with alpha=0.7 (favor reranker score)
rag.Reranker = new RagEngine.RagReranker(embeddingModel, rerankedAlpha: 0.7f);

// ──────────────────────────────────────
// 3. Index documents
// ──────────────────────────────────────
string[] docs =
{
    "Ibuprofen side effects include stomach pain, nausea, dizziness, and in rare cases gastrointestinal bleeding.",
    "Ibuprofen is classified as a nonsteroidal anti-inflammatory drug used for pain and fever reduction.",
    "The chemical formula of ibuprofen is C13H18O2 with a molecular weight of 206.28 g/mol.",
    "Patients with kidney disease should consult their doctor before taking ibuprofen."
};

foreach (string doc in docs)
    rag.ImportText(doc, "KnowledgeBase", "medical-docs");

// ──────────────────────────────────────
// 4. Query with reranking active
// ──────────────────────────────────────
var matches = rag.FindMatchingPartitions(query, topK: 3, minScore: 0.2f);

Console.WriteLine("Retrieved passages (with reranking):");
foreach (var match in matches)
    Console.WriteLine($"  score={match.Similarity:F3}  \"{match.Payload.Content.Substring(0, Math.Min(80, match.Payload.Content.Length))}...\"");

// ──────────────────────────────────────
// 5. Generate answer from reranked context
// ──────────────────────────────────────
var chat = new SingleTurnConversation(chatModel)
{
    SystemPrompt = "Answer the question using only the provided context.",
    MaximumCompletionTokens = 256
};

var result = rag.QueryPartitions(query, matches, chat);
Console.WriteLine($"\nAnswer: {result.Completion}");

Step 4: Tuning the Reranked Alpha

The rerankedAlpha parameter controls how much influence the reranker has on the final score. The formula is:

final_score = ((1 - alpha) x original_similarity) + (alpha x reranker_score)

// Conservative: trust embeddings more
rag.Reranker = new RagEngine.RagReranker(embeddingModel, rerankedAlpha: 0.3f);

// Aggressive: trust reranker more (better for domain-specific queries)
rag.Reranker = new RagEngine.RagReranker(embeddingModel, rerankedAlpha: 0.8f);

Alpha	Behavior	Best For
0.0	Original similarity only (reranker disabled)	Baseline comparison
0.3	Slight reranking boost	General-purpose, high-quality embeddings
0.5	Equal blend (default)	Balanced starting point
0.7	Reranker-weighted	Domain-specific knowledge bases
1.0	Reranker score only	Maximum precision, ignore embedding similarity
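
To make the table concrete, here is a minimal standalone sketch of the blend in plain C#. It assumes the convention the table describes (alpha = 0 keeps the original similarity, alpha = 1 keeps only the reranker score). The `Blend` helper and the two input scores are hypothetical and exist only for illustration. The sketch does not show LM-Kit's internal implementation.

```csharp
using System;

// Hypothetical helper: linear blend where alpha is the weight of the reranker score.
static float Blend(float originalSimilarity, float rerankerScore, float alpha)
{
    if (alpha < 0f || alpha > 1f)
        throw new ArgumentOutOfRangeException(nameof(alpha), "alpha must be in [0, 1]");
    return (1f - alpha) * originalSimilarity + alpha * rerankerScore;
}

float original = 0.80f; // hypothetical embedding similarity
float reranked = 0.40f; // hypothetical reranker score

// Walk through the alpha values listed in the table.
foreach (float alpha in new[] { 0.0f, 0.3f, 0.5f, 0.7f, 1.0f })
    Console.WriteLine($"alpha={alpha:F1}  final={Blend(original, reranked, alpha):F2}");
```

For this pair of scores the final value moves from 0.80 at alpha = 0.0 through 0.68, 0.60, and 0.52, down to 0.40 at alpha = 1.0. A passage that the embedding liked but the reranker did not loses ground steadily as alpha rises.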

Start with the default (0.5) and adjust based on your evaluation results. For domain-specific corpora with overlapping terminology, values in the 0.6 to 0.8 range typically produce the best results.
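
One way to run that evaluation is to sweep alpha over a small labeled set and compare a ranking metric. The sketch below is an assumption-laden illustration in plain C#, not an LM-Kit feature. The two labeled queries, their scores, and the `ReciprocalRank` helper are all hypothetical, and the blend follows the table's convention (higher alpha gives the reranker more weight). The metric is mean reciprocal rank (MRR) of the first relevant passage.

```csharp
using System;
using System.Linq;

// Hypothetical labeled set: for each query, candidate passages with both scores
// and a flag marking whether the passage actually answers the query.
var queries = new[]
{
    new[]
    {
        (Original: 0.82f, Reranker: 0.35f, Relevant: false),
        (Original: 0.78f, Reranker: 0.91f, Relevant: true),
        (Original: 0.60f, Reranker: 0.40f, Relevant: false),
    },
    new[]
    {
        (Original: 0.90f, Reranker: 0.55f, Relevant: true),
        (Original: 0.70f, Reranker: 0.62f, Relevant: false),
    },
};

// Reciprocal rank of the first relevant passage after blending with the given alpha.
static double ReciprocalRank((float Original, float Reranker, bool Relevant)[] candidates, float alpha)
{
    var ordered = candidates
        .OrderByDescending(c => (1f - alpha) * c.Original + alpha * c.Reranker)
        .ToList();
    int index = ordered.FindIndex(c => c.Relevant);
    return index < 0 ? 0.0 : 1.0 / (index + 1);
}

// Sweep alpha and report the mean reciprocal rank across all queries.
foreach (float alpha in new[] { 0.0f, 0.3f, 0.5f, 0.7f, 1.0f })
{
    double mrr = queries.Average(q => ReciprocalRank(q, alpha));
    Console.WriteLine($"alpha={alpha:F1}  MRR={mrr:F3}");
}
```

On this toy data both extremes score an MRR of 0.75, while alpha values of 0.3, 0.5, and 0.7 reach 1.0. Real corpora will behave differently, so treat the sweep as a template and supply your own labeled queries.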

Step 5: Reranking with PartitionSimilarity Objects

For advanced scenarios, you can rerank PartitionSimilarity objects directly. This is useful when you want to rerank results from FindMatchingPartitions before passing them to a custom generation step.

// Continues the Step 3 program: reuses rag, query, and embeddingModel.

// Retrieve results without reranking
rag.Reranker = null;
var candidates = rag.FindMatchingPartitions(query, topK: 10, minScore: 0.1f);

// Rerank manually with a specific alpha
var manualReranker = new Reranker(embeddingModel);
manualReranker.Rerank(query, candidates, rerankedAlpha: 0.7f);

// 'candidates' is now re-sorted by blended score
foreach (var match in candidates)
    Console.WriteLine($"  score={match.Similarity:F3}  {match.Payload.Content.Substring(0, Math.Min(60, match.Payload.Content.Length))}...");
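
Retrieving a wide candidate set and then reranking it pairs naturally with trimming the list before a custom generation step. The standalone sketch below illustrates that "retrieve wide, keep narrow" pattern in plain C#. The tuples stand in for retrieved partitions, all scores are hypothetical, the `keep` budget is an assumption, and the blend assumes alpha is the weight of the reranker score. It makes no LM-Kit calls.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-ins for retrieved partitions:
// (text label, embedding similarity, reranker score).
var retrieved = new List<(string Text, float Similarity, float RerankerScore)>
{
    ("chemical formula", 0.74f, 0.10f),
    ("side effects",     0.71f, 0.95f),
    ("drug class",       0.69f, 0.30f),
    ("kidney warning",   0.55f, 0.60f),
};

const float alpha = 0.7f; // weight of the reranker score (assumed convention)
const int keep = 2;       // assumed context budget for the generation step

// Blend both scores, sort by the blended value, and keep only the best few.
var finalContext = retrieved
    .Select(c => (c.Text, Blended: (1f - alpha) * c.Similarity + alpha * c.RerankerScore))
    .OrderByDescending(c => c.Blended)
    .Take(keep)
    .ToList();

foreach (var c in finalContext)
    Console.WriteLine($"{c.Blended:F3}  {c.Text}");
```

Here the chemical formula passage has the highest embedding similarity but drops out after blending. The side effects passage and the kidney warning are the two that remain.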

Common Issues

Problem	Cause	Fix
Reranking makes results worse	Alpha too high for your data	Lower alpha to 0.3 and increase gradually
No improvement over baseline	Passages already well-ranked by embeddings	Reranking helps most when top results are ambiguous
Slow retrieval	Reranker runs cross-encoder on every result	Reduce `topK` to limit the number of passages reranked
Scores all near zero	`NormalizeScore` disabled	Set `reranker.NormalizeScore = true` (default)

Next Steps

Build a RAG Pipeline Over Your Own Documents: full RAG pipeline with indexing, search, and answer generation.
Boost Retrieval with Hybrid Search: combine reranking with hybrid (vector + BM25) retrieval for maximum quality.
Build Conversational RAG with RagChat: multi-turn conversational interface with built-in reranking support.
Improve Recall with Multi-Query and HyDE Retrieval: expand queries before reranking for broader recall.
Diversify and Filter RAG Results: pair reranking with MMR diversity to eliminate redundant top results.
Optimize RAG with Custom Chunking Strategies: tailor chunking to your content type for better retrieval quality.
Build a Unified Multimodal RAG System: index audio, images, and text in one knowledge base.
Build Semantic Search with Embeddings: embedding fundamentals and similarity computation.
Chat with PDF Documents: high-level PDF chat API with built-in reranking support.
Samples: Conversational RAG: multi-turn RAG with query contextualization and hybrid retrieval.
