Property Language
Language
Gets or sets the language used for stopword filtering and stemming during BM25 tokenization.
public Language Language { get; set; }
Property Value
Language
Examples
using LMKit.TextGeneration;

// Configure BM25 for a French corpus.
var bm25 = new Bm25RetrievalStrategy
{
    Language = Language.French
};

// ragEngine is assumed to be an already-configured RAG engine instance.
ragEngine.RetrievalStrategy = bm25;
Remarks
Each language has a curated stopword list that filters out high-frequency function words (e.g., "the", "and" for English; "le", "la", "les" for French). Setting this property to Undefined disables language-specific stopword filtering entirely.
Suffix stemming is currently implemented for English only. For all other languages, tokens are returned in lowercase without morphological normalization. Additional stemmers can be added in the future without changing the API.
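To make the effect of language-specific stopword filtering concrete, here is a minimal, self-contained sketch of the idea. It is not LM-Kit's tokenizer: the StopwordSketch class, its Tokenize method, the string language keys, and the tiny sample word lists are all hypothetical, and stemming is left out.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical illustration only; this is not LM-Kit's implementation.
public static class StopwordSketch
{
    // Tiny sample lists taken from the remarks above; real curated lists are larger.
    private static readonly Dictionary<string, HashSet<string>> Stopwords =
        new Dictionary<string, HashSet<string>>
        {
            ["English"] = new HashSet<string> { "the", "and" },
            ["French"] = new HashSet<string> { "le", "la", "les" },
        };

    public static List<string> Tokenize(string text, string language)
    {
        // Lowercase, then split on spaces and a few common punctuation marks.
        var separators = new[] { ' ', ',', '.', ';', ':', '!', '?' };
        var tokens = text.ToLowerInvariant()
            .Split(separators, StringSplitOptions.RemoveEmptyEntries);

        // A language with no list (standing in for Undefined) skips filtering.
        if (!Stopwords.TryGetValue(language, out var stops))
        {
            return tokens.ToList();
        }

        // Drop every token that appears in the language's stopword list.
        return tokens.Where(t => !stops.Contains(t)).ToList();
    }
}
```

Under these assumptions, StopwordSketch.Tokenize("Le chat et les chiens", "French") yields "chat", "et", "chiens", while passing a language with no list returns every lowercased token unfiltered.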
Changing this property invalidates the cached index, forcing a full rebuild on the next query.