Property Language

Namespace
LMKit.Retrieval.Bm25
Assembly
LM-Kit.NET.dll

Language

Gets or sets the language used for stopword filtering and stemming during BM25 tokenization.

public Language Language { get; set; }

Property Value

Language

A value from the Language enumeration. The default is Language.English.

Examples

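using LMKit.Retrieval.Bm25;
using LMKit.TextGeneration;

// Configure BM25 for a French corpus: French stopwords are filtered during tokenization.
var bm25 = new Bm25RetrievalStrategy
{
    Language = Language.French
};

// ragEngine is assumed to be an existing RAG engine instance created elsewhere.
ragEngine.RetrievalStrategy = bm25;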

Remarks

Each language has a curated stopword list that filters out high-frequency function words (e.g., "the", "and" for English; "le", "la", "les" for French). Setting this property to Language.Undefined disables language-specific stopword filtering entirely.

Suffix stemming is currently implemented for English only. For all other languages, tokens are returned in lowercased form without morphological normalization. Additional stemmers may be added in the future without changing the API.
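
The behavior described above can be pictured with a small, self-contained sketch. This is not LM-Kit's tokenizer: the TokenizerSketch class, its Tokenize method, the string language keys, the two-entry stopword lists, and the naive "drop a trailing s" rule are all hypothetical stand-ins chosen for illustration. It only shows the shape of the pipeline: lowercase, filter stopwords when a list exists for the language, and stem for English only.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical illustration only; not LM-Kit's internal tokenizer.
public static class TokenizerSketch
{
    // Tiny sample lists; real curated stopword lists are much larger.
    private static readonly Dictionary<string, HashSet<string>> Stopwords = new()
    {
        ["English"] = new HashSet<string> { "the", "and" },
        ["French"] = new HashSet<string> { "le", "la", "les" },
    };

    public static List<string> Tokenize(string text, string language)
    {
        // Lowercase, then split on spaces and basic punctuation.
        var tokens = text.ToLowerInvariant()
            .Split(new[] { ' ', ',', '.', ';', '!', '?' },
                   StringSplitOptions.RemoveEmptyEntries);

        // No list is registered for "Undefined", so nothing is filtered.
        Stopwords.TryGetValue(language, out var stop);

        var result = new List<string>();
        foreach (var token in tokens)
        {
            if (stop != null && stop.Contains(token))
            {
                continue; // drop high-frequency function words
            }

            // Stemming applies to English only; other languages keep the
            // lowercased token unchanged.
            result.Add(language == "English" ? StripPluralS(token) : token);
        }

        return result;
    }

    // Deliberately naive stand-in for a suffix stemmer: drops one trailing "s".
    private static string StripPluralS(string token) =>
        token.Length > 3 && token.EndsWith("s", StringComparison.Ordinal)
            ? token[..^1]
            : token;
}
```

In this sketch, "The cats and the dogs" tokenized as English yields "cat" and "dog", "Les chats" tokenized as French yields only "chats" (stopword removed, no stemming), and the same English sentence tokenized as "Undefined" keeps every lowercased token.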

Changing this property invalidates the cached index, forcing a full rebuild on the next query.