Class Bm25RetrievalStrategy

Namespace
LMKit.Retrieval.Bm25
Assembly
LM-Kit.NET.dll

A retrieval strategy that scores partitions using the BM25+ ranking function with proximity-aware boosting, measuring lexical relevance based on term frequency, document length, and query term proximity.

public sealed class Bm25RetrievalStrategy : IRetrievalStrategy
Inheritance
object
Bm25RetrievalStrategy
Implements
IRetrievalStrategy

Examples

using LMKit.Model;
using LMKit.Retrieval;
using LMKit.Retrieval.Bm25;

LM embeddingModel = LM.LoadFromModelID("embeddinggemma-300m");
RagEngine ragEngine = new RagEngine(embeddingModel);

// Use pure BM25 keyword retrieval.
ragEngine.RetrievalStrategy = new Bm25RetrievalStrategy
{
    K1 = 1.5f,
    B = 0.6f,
    Language = Language.English
};

ragEngine.ImportText("The quick brown fox jumps over the lazy dog.", "docs", "animals");
var results = await ragEngine.QueryAsync("brown fox");

Remarks

BM25+ excels at exact keyword matching and complements vector search, which captures semantic similarity. An inverted index is built lazily on the first query and rebuilt automatically when the underlying data changes.
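
The lazy build-and-rebuild behavior described above can be pictured with a small standalone sketch. This is not LM-Kit's internal implementation: the LazyInvertedIndex class, its members, and the whitespace tokenizer are hypothetical names introduced only to illustrate the pattern of deferring index construction to the first lookup and discarding the cache when data changes.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical illustration of a lazily built inverted index.
// Not part of LM-Kit; names and tokenization are assumptions.
public sealed class LazyInvertedIndex
{
    private readonly List<string> _documents = new List<string>();
    private Dictionary<string, List<int>> _postings; // null means "needs (re)build"

    // Number of times the index has been built, exposed for demonstration.
    public int BuildCount { get; private set; }

    public void Add(string text)
    {
        _documents.Add(text);
        _postings = null; // data changed: drop the cached index
    }

    public IReadOnlyList<int> Lookup(string term)
    {
        if (_postings == null)
        {
            Build(); // built on the first query, or after a change
        }

        return _postings.TryGetValue(term.ToLowerInvariant(), out var ids)
            ? ids
            : (IReadOnlyList<int>)Array.Empty<int>();
    }

    private void Build()
    {
        var postings = new Dictionary<string, List<int>>();
        for (int id = 0; id < _documents.Count; id++)
        {
            string[] tokens = _documents[id].ToLowerInvariant()
                .Split(new[] { ' ', '.', ',' }, StringSplitOptions.RemoveEmptyEntries);

            // Record each distinct term once per document.
            foreach (string token in new HashSet<string>(tokens))
            {
                if (!postings.TryGetValue(token, out var list))
                {
                    list = new List<int>();
                    postings[token] = list;
                }
                list.Add(id);
            }
        }

        _postings = postings;
        BuildCount++;
    }
}
```

In this sketch, adding documents costs nothing until a lookup happens, and several additions in a row trigger only one rebuild.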

Only TextPartition instances are indexed; image partitions are skipped because they contain no textual content for lexical matching.

The BM25+ variant adds a configurable Delta floor to the term frequency component, preventing excessive penalization of long documents. When multiple query terms appear close together in a document, the ProximityWeight parameter controls how much this phrase-like co-occurrence boosts the score.
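
To make the role of the delta floor concrete, here is a minimal sketch of the textbook BM25+ per-term score. It follows the published BM25+ formula rather than LM-Kit's source, so the library's exact IDF form and proximity boost may differ. The Bm25PlusSketch class and its method names are hypothetical, and the delta value of 1.0 in the usage note is an illustrative assumption, not the documented DefaultDelta.

```csharp
using System;

// Textbook BM25+ term scoring; a sketch, not LM-Kit's implementation.
public static class Bm25PlusSketch
{
    // IDF with +1 inside the logarithm so it never goes negative
    // (the form used by several search engines; assumed here).
    public static double Idf(int docCount, int docsWithTerm) =>
        Math.Log(1.0 + (docCount - docsWithTerm + 0.5) / (docsWithTerm + 0.5));

    // Score contribution of one query term for one document.
    public static double TermScore(
        double tf, double docLength, double avgDocLength,
        double idf, double k1, double b, double delta)
    {
        if (tf <= 0)
        {
            return 0.0; // delta applies only to terms that occur in the document
        }

        // b controls how strongly long documents are penalized.
        double lengthNorm = 1.0 - b + b * (docLength / avgDocLength);

        // k1 controls how quickly repeated occurrences saturate.
        double saturated = (tf * (k1 + 1.0)) / (tf + k1 * lengthNorm);

        // delta guarantees a minimum contribution of idf * delta.
        return idf * (saturated + delta);
    }
}
```

For a document ten times longer than average with a single occurrence of the term, `Bm25PlusSketch.TermScore(1, 1000, 100, idf, 1.5, 0.6, 1.0)` stays at or above `idf * 1.0`, whereas with `delta = 0` the length penalty alone would push the contribution toward zero.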

Set the Language property to match the language of your corpus. This selects the appropriate stopword list for filtering high-frequency function words and controls whether suffix stemming is applied during tokenization. The default is English.

Scores are normalized to [0, 1] via sigmoid scaling so that the standard minScore threshold works consistently.
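
The page does not state which sigmoid is used, so the following is only a sketch of the general idea: a logistic function maps an unbounded raw score into the unit interval while preserving ranking order. The ScoreNormalizationSketch class and its midpoint and steepness parameters are hypothetical and are not LM-Kit settings.

```csharp
using System;

// Generic logistic scaling; the parameters are illustrative assumptions.
public static class ScoreNormalizationSketch
{
    // Maps rawScore to a value between 0 and 1.
    // midpoint: raw score that maps to 0.5; steepness: slope around the midpoint.
    public static double Sigmoid(double rawScore, double midpoint, double steepness) =>
        1.0 / (1.0 + Math.Exp(-steepness * (rawScore - midpoint)));
}
```

Because the mapping is strictly increasing, it changes the scale of the scores but not the order of the results, which is what lets a single minScore threshold be compared against scores from different strategies.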

Fields

DefaultB

The default value for the B parameter.

DefaultDelta

The default value for the Delta parameter.

DefaultK1

The default value for the K1 parameter.

DefaultProximityWeight

The default value for the ProximityWeight parameter.

Properties

B

Gets or sets the BM25 length normalization parameter.

CustomStopWords

Gets or sets custom stopwords to filter during tokenization in addition to the language-specific stopwords selected by Language.

Delta

Gets or sets the BM25+ lower-bound delta applied to the term frequency component.

K1

Gets or sets the BM25 term saturation parameter.

Language

Gets or sets the language used for stopword filtering and stemming during BM25 tokenization.

ProximityWeight

Gets or sets the weight applied to the proximity boosting factor.

RequiresQueryVector

Gets a value indicating whether the strategy requires a query embedding vector.

Methods

InvalidateIndex()

Explicitly invalidates the cached BM25 index, forcing a rebuild on the next query.
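
A short usage sketch, built only from the members shown on this page. The index is normally rebuilt automatically when data changes, so the explicit call below is an assumption about when you might want it, for example when you suspect the cached index is stale. It requires the LM-Kit.NET package and a loadable model to run.

```csharp
using LMKit.Model;
using LMKit.Retrieval;
using LMKit.Retrieval.Bm25;

LM embeddingModel = LM.LoadFromModelID("embeddinggemma-300m");
RagEngine ragEngine = new RagEngine(embeddingModel);

// Keep a reference to the strategy so the index can be invalidated later.
var bm25 = new Bm25RetrievalStrategy();
ragEngine.RetrievalStrategy = bm25;

ragEngine.ImportText("The quick brown fox jumps over the lazy dog.", "docs", "animals");

// Illustrative only: force the cached index to be discarded.
bm25.InvalidateIndex();

// The next query rebuilds the index before scoring.
var results = await ragEngine.QueryAsync("brown fox");
```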

RetrieveAsync(IReadOnlyList<DataSource>, string, float[], int, float, bool, bool, DataFilter, CancellationToken)

Retrieves matching partitions from the given data sources.
