Improve RAG Results with Reranking
A basic RAG pipeline retrieves documents by embedding similarity, but embedding similarity is not always the best measure of relevance. A document might be semantically close to the query yet not actually answer the question. Reranking applies a cross-encoder model that scores each retrieved passage against the query with higher precision, then blends that score with the original similarity score. LM-Kit.NET provides both a standalone Reranker class and an integrated RagReranker that plugs directly into the RAG pipeline.
Why This Matters
Two enterprise problems that reranking solves:
- Reducing hallucinations from irrelevant passages. Without reranking, the top-retrieved chunk might be thematically related but factually irrelevant, causing the LLM to hallucinate an answer based on the wrong context. A reranker catches these false positives by evaluating each passage against the query with a cross-encoder that considers both inputs simultaneously.
- Improving precision in domain-specific knowledge bases. Medical, legal, and financial documents contain many passages with overlapping terminology. Reranking distinguishes the passage that actually answers the question from ones that merely contain similar vocabulary.
Prerequisites
| Requirement | Minimum |
|---|---|
| .NET SDK | 8.0+ |
| RAM | 16 GB recommended |
| VRAM | 6 GB (for embedding + chat models) |
| Disk | ~4 GB free for model downloads |
Step 1: Create the Project
dotnet new console -n RerankingQuickstart
cd RerankingQuickstart
dotnet add package LM-Kit.NET
Step 2: Standalone Reranking
The Reranker class scores documents against a query independently of the RAG pipeline. This is useful for evaluating passage relevance, sorting search results, or building custom retrieval logic.
using System.Text;
using LMKit.Model;
using LMKit.Embeddings;
LMKit.Licensing.LicenseManager.SetLicenseKey("");
Console.InputEncoding = Encoding.UTF8;
Console.OutputEncoding = Encoding.UTF8;
// ──────────────────────────────────────
// 1. Load embedding model (same model for embeddings + reranking)
// ──────────────────────────────────────
Console.WriteLine("Loading embedding model...");
using LM embeddingModel = LM.LoadFromModelID("embeddinggemma-300m",
downloadingProgress: (_, len, read) =>
{
if (len.HasValue) Console.Write($"\r Downloading: {(double)read / len.Value * 100:F1}% ");
return true;
},
loadingProgress: p => { Console.Write($"\r Loading: {p * 100:F0}% "); return true; });
Console.WriteLine(" Done.\n");
// ──────────────────────────────────────
// 2. Create reranker
// ──────────────────────────────────────
var reranker = new Reranker(embeddingModel);
// ──────────────────────────────────────
// 3. Score individual documents against a query
// ──────────────────────────────────────
string query = "What are the side effects of ibuprofen?";
string[] documents =
{
"Ibuprofen may cause stomach pain, nausea, and dizziness. In rare cases, it can lead to gastrointestinal bleeding.",
"Ibuprofen is a nonsteroidal anti-inflammatory drug (NSAID) commonly used to reduce fever and treat pain.",
"Acetaminophen is an alternative to ibuprofen for pain relief, with fewer gastrointestinal side effects.",
"The recommended dosage of ibuprofen for adults is 200 to 400 mg every 4 to 6 hours."
};
Console.WriteLine($"Query: \"{query}\"\n");
float[] scores = reranker.GetScore(query, documents);
// Sort by relevance score
var ranked = documents
.Zip(scores, (doc, score) => new { Document = doc, Score = score })
.OrderByDescending(x => x.Score)
.ToList();
Console.WriteLine("Reranked results:");
for (int i = 0; i < ranked.Count; i++)
{
Console.WriteLine($" #{i + 1} (score: {ranked[i].Score:F4})");
Console.WriteLine($" {ranked[i].Document}\n");
}
Run it:
dotnet run
Example output:
Loading embedding model...
Loading: 100% Done.
Query: "What are the side effects of ibuprofen?"
Reranked results:
#1 (score: 0.9234)
Ibuprofen may cause stomach pain, nausea, and dizziness. In rare cases, it can lead to gastrointestinal bleeding.
#2 (score: 0.6812)
Acetaminophen is an alternative to ibuprofen for pain relief, with fewer gastrointestinal side effects.
#3 (score: 0.5147)
The recommended dosage of ibuprofen for adults is 200 to 400 mg every 4 to 6 hours.
#4 (score: 0.4203)
Ibuprofen is a nonsteroidal anti-inflammatory drug (NSAID) commonly used to reduce fever and treat pain.
Notice that the passage about side effects ranks first, while the general definition of ibuprofen ranks last despite containing the keyword "ibuprofen" prominently.
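In practice you usually keep only the strongest passages before building LLM context. The sketch below applies a score cutoff and a top-N limit to plain (document, score) pairs like the ones produced above; the 0.6 cutoff and the limit of two are arbitrary example values, not library defaults:

```csharp
using System;
using System.Linq;

// Sample (document, score) pairs as produced by a reranking pass.
var ranked = new[]
{
    (Document: "Side-effect passage", Score: 0.92f),
    (Document: "Alternative drug passage", Score: 0.68f),
    (Document: "Dosage passage", Score: 0.51f),
    (Document: "Definition passage", Score: 0.42f)
};

// Keep at most two passages whose score clears the cutoff.
const float cutoff = 0.6f;
var context = ranked
    .Where(x => x.Score >= cutoff)
    .OrderByDescending(x => x.Score)
    .Take(2)
    .Select(x => x.Document)
    .ToList();

Console.WriteLine(string.Join(" | ", context));
```

Trimming the context this way both reduces prompt size and keeps weakly relevant passages away from the LLM.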
Step 3: Integrating Reranking into a RAG Pipeline
The RagEngine.RagReranker class plugs reranking directly into the RAG pipeline. When set, the engine automatically reranks retrieved partitions before passing them to the LLM for answer generation.
using LMKit.Data;
using LMKit.Retrieval;
using LMKit.TextGeneration;
// Continues from Step 2: reuses 'embeddingModel' and 'query' defined there.
// ──────────────────────────────────────
// 1. Load chat model
// ──────────────────────────────────────
Console.WriteLine("Loading chat model...");
using LM chatModel = LM.LoadFromModelID("gemma3:4b",
loadingProgress: p => { Console.Write($"\rLoading: {p * 100:F0}% "); return true; });
Console.WriteLine(" Done.\n");
// ──────────────────────────────────────
// 2. Set up RAG with reranking
// ──────────────────────────────────────
var dataSource = DataSource.CreateInMemoryDataSource("KnowledgeBase", embeddingModel);
var rag = new RagEngine(embeddingModel);
rag.AddDataSource(dataSource);
// Enable reranking with alpha=0.7 (favor reranker score)
rag.Reranker = new RagEngine.RagReranker(embeddingModel, rerankedAlpha: 0.7f);
// ──────────────────────────────────────
// 3. Index documents
// ──────────────────────────────────────
string[] docs =
{
"Ibuprofen side effects include stomach pain, nausea, dizziness, and in rare cases gastrointestinal bleeding.",
"Ibuprofen is classified as a nonsteroidal anti-inflammatory drug used for pain and fever reduction.",
"The chemical formula of ibuprofen is C13H18O2 with a molecular weight of 206.28 g/mol.",
"Patients with kidney disease should consult their doctor before taking ibuprofen."
};
foreach (string doc in docs)
rag.ImportText(doc, "KnowledgeBase", "medical-docs");
// ──────────────────────────────────────
// 4. Query with reranking active
// ──────────────────────────────────────
var matches = rag.FindMatchingPartitions(query, topK: 3, minScore: 0.2f);
Console.WriteLine("Retrieved passages (with reranking):");
foreach (var match in matches)
Console.WriteLine($" score={match.Similarity:F3} \"{match.Payload.Content.Substring(0, Math.Min(80, match.Payload.Content.Length))}...\"");
// ──────────────────────────────────────
// 5. Generate answer from reranked context
// ──────────────────────────────────────
var chat = new SingleTurnConversation(chatModel)
{
SystemPrompt = "Answer the question using only the provided context.",
MaximumCompletionTokens = 256
};
var result = rag.QueryPartitions(query, matches, chat);
Console.WriteLine($"\nAnswer: {result.Completion}");
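To see what the reranker actually changed, you can run the same retrieval with the reranker disabled and compare the orderings. This sketch reuses only calls already shown above and assumes the same `rag`, `query`, and `embeddingModel` are in scope; the exact ordering you observe depends on your corpus and models:

```csharp
// Baseline: embedding similarity only.
rag.Reranker = null;
Console.WriteLine("Without reranking:");
foreach (var match in rag.FindMatchingPartitions(query, topK: 3, minScore: 0.2f))
    Console.WriteLine($"  score={match.Similarity:F3}");

// Restore blended scoring for subsequent queries.
rag.Reranker = new RagEngine.RagReranker(embeddingModel, rerankedAlpha: 0.7f);
```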
Step 4: Tuning the Reranked Alpha
The rerankedAlpha parameter controls how much influence the reranker has on the final score. The formula is:
final_score = (alpha x reranker_score) + ((1 - alpha) x original_similarity)
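With alpha weighting the reranker score (so alpha = 1.0 uses the reranker alone, matching the table below), a worked example: alpha = 0.7, reranker score 0.90, original similarity 0.60 gives 0.7 × 0.90 + 0.3 × 0.60 = 0.81. A minimal sketch of that arithmetic; the BlendScore helper is illustrative, not part of the LM-Kit.NET API:

```csharp
using System;

// Blend the reranker score with the original embedding similarity.
// alpha weights the reranker; (1 - alpha) keeps part of the embedding score.
static float BlendScore(float alpha, float rerankerScore, float similarity)
    => alpha * rerankerScore + (1 - alpha) * similarity;

// alpha = 0.7: 0.7 * 0.90 + 0.3 * 0.60 = 0.81
Console.WriteLine(BlendScore(0.7f, 0.90f, 0.60f).ToString("F2"));
```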
// Conservative: trust embeddings more
rag.Reranker = new RagEngine.RagReranker(embeddingModel, rerankedAlpha: 0.3f);
// Aggressive: trust reranker more (better for domain-specific queries)
rag.Reranker = new RagEngine.RagReranker(embeddingModel, rerankedAlpha: 0.8f);
| Alpha | Behavior | Best For |
|---|---|---|
| 0.0 | Original similarity only (reranker disabled) | Baseline comparison |
| 0.3 | Slight reranking boost | General-purpose, high-quality embeddings |
| 0.5 | Equal blend (default) | Balanced starting point |
| 0.7 | Reranker-weighted | Domain-specific knowledge bases |
| 1.0 | Reranker score only | Maximum precision, ignore embedding similarity |
Start with the default (0.5) and adjust based on your evaluation results. For domain-specific corpora with overlapping terminology, values in the 0.6 to 0.8 range typically produce the best results.
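One simple way to run that evaluation is to sweep alpha over a few values and check, for each test query, whether the expected passage comes back first. A sketch reusing the RagEngine calls from Step 3 (assumes `rag` and `embeddingModel` are in scope; the query/snippet pairs are placeholders you would replace with cases from your own corpus):

```csharp
// Hypothetical evaluation pairs: query -> substring the top passage should contain.
var evalSet = new (string Query, string ExpectedSnippet)[]
{
    ("What are the side effects of ibuprofen?", "stomach pain"),
    ("Who should avoid ibuprofen?", "kidney disease")
};

foreach (float alpha in new[] { 0.3f, 0.5f, 0.7f })
{
    rag.Reranker = new RagEngine.RagReranker(embeddingModel, rerankedAlpha: alpha);
    int hits = 0;

    foreach (var (q, expected) in evalSet)
    {
        // Only check the top-ranked partition for each query.
        foreach (var match in rag.FindMatchingPartitions(q, topK: 3, minScore: 0.1f))
        {
            if (match.Payload.Content.Contains(expected))
                hits++;
            break;
        }
    }

    Console.WriteLine($"alpha={alpha:F1}  top-1 accuracy: {hits}/{evalSet.Length}");
}
```

Pick the smallest alpha that reaches your target accuracy; larger values make results more sensitive to reranker quirks.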
Step 5: Reranking with PartitionSimilarity Objects
For advanced scenarios, you can rerank PartitionSimilarity objects directly. This is useful when you want to rerank results from FindMatchingPartitions before passing them to a custom generation step.
// Continues from Steps 2 and 3: reuses 'embeddingModel', 'rag', and 'query' defined there.
// Retrieve results without reranking
rag.Reranker = null;
var matches = rag.FindMatchingPartitions(query, topK: 10, minScore: 0.1f);
// Rerank manually with a specific alpha
var reranker = new Reranker(embeddingModel);
reranker.Rerank(query, matches, rerankedAlpha: 0.7f);
// Now 'matches' are re-sorted by blended score
foreach (var match in matches)
Console.WriteLine($" score={match.Similarity:F3} {match.Payload.Content.Substring(0, Math.Min(60, match.Payload.Content.Length))}...");
Common Issues
| Problem | Cause | Fix |
|---|---|---|
| Reranking makes results worse | Alpha too high for your data | Lower alpha to 0.3 and increase gradually |
| No improvement over baseline | Passages already well-ranked by embeddings | Reranking helps most when top results are ambiguous |
| Slow retrieval | Reranker runs cross-encoder on every result | Reduce topK to limit the number of passages reranked |
| Scores all near zero | NormalizeScore disabled | Set reranker.NormalizeScore = true (default) |
Next Steps
- Build a RAG Pipeline Over Your Own Documents: full RAG pipeline with indexing, search, and answer generation.
- Build Semantic Search with Embeddings: embedding fundamentals and similarity computation.
- Build a Private Document Q&A System: PDF Q&A with source references.
- Chat with PDF Documents: high-level PDF chat API with built-in reranking support.