Diversify and Filter RAG Results

When your knowledge base contains many similar passages, the top-K retrieved results can be dominated by near-duplicates from the same section of a document. Maximal Marginal Relevance (MMR) addresses this by balancing relevance against diversity, so that each retrieved passage is more likely to add new information. Combined with metadata filtering via DataFilter, which scopes retrieval to specific data sources or sections, it lets you build multi-tenant or context-aware RAG systems.

This tutorial shows how to tune MMR, configure metadata filtering, and control the context window for production-quality retrieval.


Why This Matters

Two enterprise problems that diversity and filtering solve:

  1. Redundant passages wasting context budget. A technical manual describes the same procedure in an overview section, a step-by-step guide, and a troubleshooting FAQ. Without diversity filtering, all three near-identical passages are retrieved, consuming the LLM's context window with redundant information while crowding out other relevant content.
  2. Multi-tenant knowledge bases. A SaaS platform serves multiple customers, each with their own documents in the same RagEngine. Without metadata filtering, a query from Customer A might retrieve passages from Customer B's documents. DataFilter scopes retrieval to the correct tenant.

Prerequisites

Requirement   Minimum
.NET SDK      8.0+
RAM           16 GB recommended
VRAM          6 GB (for both models simultaneously)
Disk          ~4 GB free for model downloads

You should be familiar with the foundational RAG pipeline before starting this tutorial.


Step 1: Create the Project

dotnet new console -n DiversityFilterQuickstart
cd DiversityFilterQuickstart
dotnet add package LM-Kit.NET

Step 2: Understand MMR

Standard top-K:                    MMR top-K (lambda=0.7):

  1. "Reset via Settings > Reset"    1. "Reset via Settings > Reset"
  2. "Go to Settings, tap Reset"     2. "Reset by holding power 10s"     ← new info
  3. "Open Settings and hit Reset"   3. "Factory reset erases all data"  ← new info
     (all say the same thing)           (diverse, complementary passages)

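MMR re-ranks candidates greedily. At each step it selects the candidate d with the highest score under this formula, where the max is taken over the passages already selected: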

MMR(d) = lambda × Similarity(query, d) − (1 − lambda) × max(Similarity(d, selected))
  • lambda = 1.0: pure relevance, no diversity (default, MMR disabled)
  • lambda = 0.0: pure diversity, ignores relevance
  • lambda = 0.7: recommended balance for most use cases
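
To make the formula concrete, here is a minimal, library-independent sketch of greedy MMR selection over cosine similarity. It does not show how LM-Kit implements MMR internally; the helper names (SelectMmr, Cosine) and the toy 3-D vectors are assumptions made for illustration only.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Toy 3-D vectors standing in for real embeddings (hypothetical values).
float[] query = { 1.0f, 0.0f, 0.0f };
string[] texts =
{
    "Reset via Settings > Reset",
    "Go to Settings, tap Reset",      // near-duplicate of the first passage
    "Reset by holding power 10s",
    "Factory reset erases all data",
};
float[][] vectors =
{
    new[] { 0.9f, 0.40f, 0.0f },
    new[] { 0.9f, 0.45f, 0.0f },
    new[] { 0.8f, 0.00f, 0.6f },
    new[] { 0.7f, -0.70f, 0.1f },
};

foreach (float weight in new[] { 1.0f, 0.7f })
{
    List<int> picked = SelectMmr(query, vectors, topK: 3, lambda: weight);
    Console.WriteLine($"lambda={weight}: " + string.Join(" | ", picked.Select(i => texts[i])));
}

// Cosine similarity between two vectors of equal length.
static float Cosine(float[] a, float[] b)
{
    float dot = 0f, normA = 0f, normB = 0f;
    for (int i = 0; i < a.Length; i++)
    {
        dot += a[i] * b[i];
        normA += a[i] * a[i];
        normB += b[i] * b[i];
    }
    return dot / (MathF.Sqrt(normA) * MathF.Sqrt(normB));
}

// Greedy MMR: repeatedly pick the candidate that maximizes
// lambda * sim(query, d) - (1 - lambda) * max sim(d, already selected).
static List<int> SelectMmr(float[] q, float[][] docs, int topK, float lambda)
{
    var selected = new List<int>();
    var remaining = Enumerable.Range(0, docs.Length).ToList();
    while (selected.Count < topK && remaining.Count > 0)
    {
        int best = -1;
        float bestScore = float.NegativeInfinity;
        foreach (int i in remaining)
        {
            float relevance = Cosine(q, docs[i]);
            // The redundancy penalty is zero until something has been selected.
            float redundancy = selected.Count == 0
                ? 0f
                : selected.Max(j => Cosine(docs[i], docs[j]));
            float score = lambda * relevance - (1 - lambda) * redundancy;
            if (score > bestScore) { bestScore = score; best = i; }
        }
        selected.Add(best);
        remaining.Remove(best); // removes the value, not the element at that index
    }
    return selected;
}
```

With these toy vectors, lambda = 1.0 keeps both Settings passages, while lambda = 0.7 drops the near-duplicate in favor of the data-loss warning and the hardware-button method.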

Step 3: Enable MMR Diversity

using System.Text;
using LMKit.Data;
using LMKit.Model;
using LMKit.Retrieval;
using LMKit.TextGeneration;
using LMKit.TextGeneration.Chat;

LMKit.Licensing.LicenseManager.SetLicenseKey("");

Console.InputEncoding = Encoding.UTF8;
Console.OutputEncoding = Encoding.UTF8;

// ──────────────────────────────────────
// 1. Load models
// ──────────────────────────────────────
Console.WriteLine("Loading embedding model...");
using LM embeddingModel = LM.LoadFromModelID("embeddinggemma-300m",
    downloadingProgress: (_, len, read) =>
    {
        if (len.HasValue) Console.Write($"\r  Downloading: {(double)read / len.Value * 100:F1}%   ");
        return true;
    },
    loadingProgress: p => { Console.Write($"\r  Loading: {p * 100:F0}%   "); return true; });
Console.WriteLine(" Done.\n");

Console.WriteLine("Loading chat model...");
using LM chatModel = LM.LoadFromModelID("gemma3:4b",
    downloadingProgress: (_, len, read) =>
    {
        if (len.HasValue) Console.Write($"\r  Downloading: {(double)read / len.Value * 100:F1}%   ");
        return true;
    },
    loadingProgress: p => { Console.Write($"\r  Loading: {p * 100:F0}%   "); return true; });
Console.WriteLine(" Done.\n");

// ──────────────────────────────────────
// 2. Build RAG engine with intentionally redundant content
// ──────────────────────────────────────
var dataSource = DataSource.CreateInMemoryDataSource("KnowledgeBase", embeddingModel);
var rag = new RagEngine(embeddingModel);
rag.AddDataSource(dataSource);

// These passages describe the same procedure in slightly different ways
string[] docs =
{
    "To factory reset the device, navigate to Settings > System > Reset and tap 'Erase All Data'.",
    "Open the Settings app, go to System, select Reset, and choose 'Erase All Data' to restore factory defaults.",
    "A factory reset can also be performed by holding the power button and volume-down for 10 seconds until the LED flashes.",
    "Warning: a factory reset permanently erases all user data, accounts, and installed applications.",
    "After a factory reset, the device boots into the initial setup wizard. Have your account credentials ready."
};

foreach (string doc in docs)
    rag.ImportText(doc, "KnowledgeBase", "device-manual");

// ──────────────────────────────────────
// 3. Enable MMR with lambda = 0.7
// ──────────────────────────────────────
rag.MmrLambda = 0.7f;  // Balance relevance (0.7) and diversity (0.3)

string query = "How do I factory reset?";
var matches = rag.FindMatchingPartitions(query, topK: 3, minScore: 0.2f);

Console.WriteLine($"Query: \"{query}\" (MMR lambda=0.7)\n");
Console.WriteLine("Diverse results:");
foreach (var m in matches)
{
    string preview = m.Payload.Content.Substring(0, Math.Min(90, m.Payload.Content.Length));
    Console.WriteLine($"  score={m.Similarity:F3}  {preview}...");
}

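Instead of returning both near-identical Settings-based passages, MMR favors complementary ones. For this corpus you would expect one Settings-based method, the hardware-button method, and a follow-up passage such as the warning about data loss, although the exact ranking depends on the embedding model. Each passage should add unique information.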


Step 4: Tune the Lambda Parameter

// Pure relevance, no diversity (default behavior)
rag.MmrLambda = 1.0f;

// Slight diversity boost: relevance-weighted (recommended starting point)
rag.MmrLambda = 0.7f;

// Equal balance: useful for exploratory queries or covering multiple perspectives
rag.MmrLambda = 0.5f;

// Strong diversity: risks pushing less relevant passages into results
rag.MmrLambda = 0.3f;

Lambda  Behavior                   Best For
1.0     Disabled (pure relevance)  When passages are already diverse, or when precision is critical
0.7     Slight diversity boost     General-purpose, manuals with overlapping content
0.5     Equal balance              Exploratory queries, covering multiple perspectives
0.3     Strong diversity           Research, brainstorming, broad topic surveys

Start with 0.7 and adjust based on how much redundancy you observe in your results.
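
One way to put a number on "how much redundancy you observe" is to measure word overlap between the returned passages. The sketch below is a crude lexical proxy, not something the library provides: MeanPairwiseOverlap is a hypothetical helper that averages the Jaccard overlap of word sets across all passage pairs. Embedding similarity would track what MMR optimizes more closely.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper: mean pairwise Jaccard overlap of word sets, in [0, 1].
// Higher values suggest the result set repeats itself.
static double MeanPairwiseOverlap(IReadOnlyList<string> passages)
{
    if (passages.Count < 2) return 0.0;

    char[] separators = { ' ', ',', '.', '>', ':', ';', '?', '!' };
    var tokenSets = passages
        .Select(p => new HashSet<string>(
            p.ToLowerInvariant().Split(separators, StringSplitOptions.RemoveEmptyEntries)))
        .ToList();

    double total = 0.0;
    int pairs = 0;
    for (int i = 0; i < tokenSets.Count; i++)
    {
        for (int j = i + 1; j < tokenSets.Count; j++)
        {
            int shared = tokenSets[i].Intersect(tokenSets[j]).Count();
            int union = tokenSets[i].Union(tokenSets[j]).Count();
            total += union == 0 ? 0.0 : (double)shared / union;
            pairs++;
        }
    }
    return total / pairs;
}

var redundant = new[] { "Go to Settings and tap Reset", "Go to Settings then tap Reset" };
var diverse = new[] { "Go to Settings and tap Reset", "Hold the power button for 10 seconds" };

Console.WriteLine($"redundant set overlap: {MeanPairwiseOverlap(redundant):F2}");
Console.WriteLine($"diverse set overlap:   {MeanPairwiseOverlap(diverse):F2}");
```

In the tutorial's program you could feed it the Payload.Content strings of the matches returned at each lambda value and lower lambda only while the overlap stays noticeably high.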


Step 5: Filter by Data Source and Section

DataFilter restricts which data sources and sections are searched. This is essential for multi-tenant systems, role-based access, or context-scoped queries.

// ──────────────────────────────────────
// Multi-tenant setup: each customer has a separate DataSource
// ──────────────────────────────────────
var customerA = DataSource.CreateInMemoryDataSource("CustomerA", embeddingModel);
var customerB = DataSource.CreateInMemoryDataSource("CustomerB", embeddingModel);

rag.AddDataSource(customerA);
rag.AddDataSource(customerB);

rag.ImportText("Customer A's product ships with a 2-year warranty.", "CustomerA", "policies");
rag.ImportText("Customer B's product includes free lifetime support.", "CustomerB", "policies");

// Scope retrieval to Customer A only
// DataFilter delegates return true to EXCLUDE a data source or section
rag.Filter = new DataFilter(
    dataSourceFilter: ds => ds.Identifier != "CustomerA"  // exclude everything except CustomerA
);

// Named tenantMatches so it does not clash with the matches variable from Step 3
var tenantMatches = rag.FindMatchingPartitions("What is the warranty policy?", topK: 3);
// Only returns results from CustomerA

Filter by Section

Restrict retrieval to specific sections within a data source:

// Only search the "troubleshooting" and "faq" sections
rag.Filter = new DataFilter(
    sectionFilter: section => section.Identifier != "troubleshooting" &&
                              section.Identifier != "faq"  // exclude everything else
);

Combined Filters

Apply both data source and section filters simultaneously:

string currentTenant = "CustomerA";
string[] allowedSections = { "policies", "faq" };

rag.Filter = new DataFilter(
    dataSourceFilter: ds => ds.Identifier != currentTenant,
    sectionFilter: section => !allowedSections.Contains(section.Identifier)
);

Remove Filters

Clear all filters to search the entire knowledge base:

rag.Filter = null;
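
Because the delegates return true to exclude, it is easy to write a filter backwards. A small helper that converts an allow-list into an exclusion predicate can make the intent explicit. This is a sketch over plain identifier strings; ExcludeAllExcept is a hypothetical name, not part of the library.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper: builds a "true = exclude" predicate from an allow-list.
static Func<string, bool> ExcludeAllExcept(params string[] allowedIdentifiers)
{
    var allowed = new HashSet<string>(allowedIdentifiers, StringComparer.Ordinal);
    return identifier => !allowed.Contains(identifier);
}

var excludeOtherTenants = ExcludeAllExcept("CustomerA");
Console.WriteLine(excludeOtherTenants("CustomerA")); // False: CustomerA is kept
Console.WriteLine(excludeOtherTenants("CustomerB")); // True: CustomerB is excluded
```

In the tutorial's code this would be wired in as dataSourceFilter: ds => excludeOtherTenants(ds.Identifier), which reads as "exclude everything except CustomerA" without a hand-written negation.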

Step 6: Control the Context Window

The context window determines how many tokens of retrieved content are passed to the LLM. Because MMR and filtering decide which passages compete for that budget, tune the three together.

// Limit retrieved context to 2048 tokens
rag.ContextWindow = 2048;

// Or limit by character count
rag.MaxContextWindowCharacters = 8000;

Setting                     Effect
ContextWindow               Maximum tokens of retrieved context injected into the prompt.
MaxContextWindowCharacters  Maximum characters (useful when token counting is not needed).

When combined with MMR, a well-tuned context window ensures that diverse, non-redundant passages fill the available budget. Without MMR, redundant passages waste context space.
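
The interaction between ranking order and budget can be illustrated with a simple character-budget packer. This is only a sketch of the idea, assuming passages are added in ranked order until the next one would overflow; FillBudget is a hypothetical helper, and the library's actual truncation behavior may differ.

```csharp
using System;
using System.Collections.Generic;

// Illustrative only: keep ranked passages, in order, until the character budget is spent.
static List<string> FillBudget(IEnumerable<string> rankedPassages, int maxCharacters)
{
    var kept = new List<string>();
    int used = 0;
    foreach (string passage in rankedPassages)
    {
        if (used + passage.Length > maxCharacters) break; // stop at the first overflow
        kept.Add(passage);
        used += passage.Length;
    }
    return kept;
}

string settingsA = "Reset via Settings > Reset";
string settingsB = "Go to Settings, tap Reset";
string hardware = "Reset by holding power 10s";

// Same three passages, ranked by pure relevance versus an MMR-style order.
var relevanceOrder = new[] { settingsA, settingsB, hardware };
var mmrOrder = new[] { settingsA, hardware, settingsB };

Console.WriteLine(string.Join(" | ", FillBudget(relevanceOrder, 60)));
Console.WriteLine(string.Join(" | ", FillBudget(mmrOrder, 60)));
```

With a 60-character budget only two passages fit. The relevance order spends the budget on two versions of the same instruction, while the MMR-style order fits both the Settings method and the hardware-button method.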


Step 7: Use Unique Section Constraints

For a quick way to ensure diversity across document sections, use the forceUniqueSection parameter:

// Named sectionMatches so it does not clash with matches variables from earlier steps
var sectionMatches = rag.FindMatchingPartitions(
    query,
    topK: 5,
    minScore: 0.3f,
    forceUniqueSection: true  // at most one result per section
);

This is a coarser mechanism than MMR. It guarantees structural diversity (one passage per section) rather than semantic diversity. Use it when your sections represent distinct documents or topics.
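
The difference between structural and semantic diversity can be sketched without the library. The example below assumes a unique-section constraint amounts to keeping the best-scoring match per section. OnePerSection and the sample tuples are hypothetical and do not represent the library's implementation.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper: keep only the highest-scoring match from each section.
static List<(string Section, string Text, float Score)> OnePerSection(
    IEnumerable<(string Section, string Text, float Score)> matches, int topK)
{
    return matches
        .OrderByDescending(m => m.Score)
        .GroupBy(m => m.Section)      // groups keep the order in which keys first appear
        .Select(g => g.First())       // first element of each group is its best match
        .Take(topK)
        .ToList();
}

var candidates = new (string Section, string Text, float Score)[]
{
    ("overview", "Reset via Settings > System > Reset.", 0.91f),
    ("overview", "Open Settings, go to System, select Reset.", 0.90f),
    ("faq", "Go to Settings > System > Reset to start over.", 0.75f),
    ("troubleshooting", "Hold power and volume-down for 10 seconds.", 0.60f),
};

var unique = OnePerSection(candidates, topK: 3);
foreach (var r in unique)
    Console.WriteLine($"{r.Section}: {r.Text}");
```

The second overview passage is dropped, but the faq passage survives even though it repeats the same Settings instruction. That is the structural-versus-semantic gap: sections differ, content does not.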


Step 8: Combine Everything

A production-quality retrieval pipeline combining all techniques:

// Hybrid search for broad recall
rag.RetrievalStrategy = new HybridRetrievalStrategy();

// MMR for diversity
rag.MmrLambda = 0.7f;

// Reranking for precision
rag.Reranker = new RagEngine.RagReranker(embeddingModel, rerankedAlpha: 0.7f);

// Scope to current tenant
rag.Filter = new DataFilter(
    dataSourceFilter: ds => ds.Identifier != currentTenant
);

// Context budget
rag.ContextWindow = 4096;

// Wrap in conversational interface with query expansion
using var ragChat = new RagChat(rag, chatModel)
{
    QueryGenerationMode = QueryGenerationMode.MultiQuery,
    MaxRetrievedPartitions = 5,
    MinRelevanceScore = 0.3f
};

The full pipeline runs in this order: multi-query expansion (captures varied phrasings), hybrid search (semantic plus keyword), MMR diversity (removes redundancy), and reranking (precision ordering), all scoped by DataFilter (tenant isolation).


Common Issues

Problem                             Cause                                                        Fix
MMR makes results less relevant     Lambda too low                                               Increase MmrLambda toward 0.8 or 0.9
Filter returns no results           Filter delegates inverted (returning true for wanted items)  Remember: true means exclude. Return true for items you want to skip.
Context window too small            Important passages truncated                                 Increase ContextWindow or reduce MaxChunkSize in chunking
forceUniqueSection too restrictive  Only one result per section, missing good matches            Use MMR instead for semantic diversity without structural constraints
