Class TextChunking

Namespace
LMKit.Retrieval
Assembly
LM-Kit.NET.dll

Implements a recursive chunking strategy that partitions text into smaller segments ("chunks") for retrieval-augmented generation (RAG).
The strategy is intended for long texts that must be broken into pieces small enough to process individually.
Unlike linear chunking, which divides text sequentially, the recursive strategy adapts its segmentation to the complexity and structure of the text.
This makes it better suited to nested or hierarchical content.
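
The general idea of recursive chunking can be illustrated with a small, language-neutral sketch (written in Python here). It is not LM-Kit.NET code and makes no claim about how TextChunking is implemented. The function name `recursive_chunks`, the separator list, and the rule "one whitespace-separated word = one token" are all assumptions made for illustration.

```python
def recursive_chunks(text, max_tokens, separators=("\n\n", "\n", ". ", " ")):
    """Split text on the coarsest separator first; pieces that are still
    too large are split again with the next, finer separator."""

    def count(s):
        # Assumption: a "token" is a whitespace-separated word.
        return len(s.split())

    if count(text) <= max_tokens:
        return [text] if text.strip() else []

    if not separators:
        # Nothing left to split on: fall back to fixed-size word groups.
        words = text.split()
        return [" ".join(words[i:i + max_tokens])
                for i in range(0, len(words), max_tokens)]

    sep, finer = separators[0], separators[1:]
    chunks, current = [], ""
    for piece in text.split(sep):
        candidate = piece if not current else current + sep + piece
        if count(candidate) <= max_tokens:
            current = candidate  # piece still fits in the chunk being built
            continue
        if current.strip():
            chunks.append(current)
        current = ""
        if count(piece) <= max_tokens:
            current = piece  # piece starts a new chunk
        else:
            # Piece alone is too large: recurse with finer separators.
            chunks.extend(recursive_chunks(piece, max_tokens, finer))
    if current.strip():
        chunks.append(current)
    return chunks


sample = ("alpha beta gamma delta\n\n"
          "epsilon zeta\n\n"
          "eta theta iota kappa lambda mu")
result = recursive_chunks(sample, max_tokens=4)
```

In this sketch the first two paragraphs fit within the limit and are kept whole, while the third is split at word level only because no coarser boundary is available inside it.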

public class TextChunking
Inheritance
TextChunking
Inherited Members

Fields

KeepSpacings

Determines whether the system preserves multiple consecutive spaces and maintains the original text layout.
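
As a rough illustration of what such a flag might control, the Python sketch below either returns the text untouched or collapses runs of spaces and tabs into a single space. The behavior when the flag is off is an assumption; the function name `normalize_spacing` is hypothetical, and none of this is LM-Kit.NET code.

```python
import re


def normalize_spacing(text, keep_spacings):
    """Return text unchanged when keep_spacings is True; otherwise
    collapse each run of spaces/tabs into one space (assumed behavior)."""
    if keep_spacings:
        return text
    return re.sub(r"[ \t]+", " ", text)


table_row = "Name    Qty   Price"
kept = normalize_spacing(table_row, keep_spacings=True)
collapsed = normalize_spacing(table_row, keep_spacings=False)
```

Preserving spacing matters for layout-sensitive input such as column-aligned tables, while collapsing it can reduce the number of tokens spent on whitespace.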

Properties

MaxChunkSize

Gets or sets the maximum number of tokens that each text chunk can contain.
This value sets the upper bound on the size of the chunks into which the text is divided.

MaxOverlapSize

Gets or sets the maximum number of tokens duplicated (overlapped) between consecutive text chunks. The overlap keeps context from being lost at chunk boundaries and helps maintain continuity across chunks, which is especially important for cohesive text analysis and generation.
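
The interaction between a chunk-size limit and an overlap can be shown with a minimal sliding-window sketch in Python. It is a simplification: the overlap here is always exactly `max_overlap_size`, whereas the property above only states a maximum. The function `chunk_with_overlap` and its parameters are hypothetical and are not part of LM-Kit.NET.

```python
def chunk_with_overlap(tokens, max_chunk_size, max_overlap_size):
    """Slice a token list into windows of at most max_chunk_size tokens,
    repeating the last max_overlap_size tokens at the start of the next."""
    if max_chunk_size <= 0:
        raise ValueError("max_chunk_size must be positive")
    if not 0 <= max_overlap_size < max_chunk_size:
        raise ValueError("overlap must be >= 0 and smaller than the chunk size")

    step = max_chunk_size - max_overlap_size  # how far each window advances
    chunks = []
    for start in range(0, len(tokens), step):
        chunks.append(tokens[start:start + max_chunk_size])
        if start + max_chunk_size >= len(tokens):
            break  # this window already reached the end of the input
    return chunks


windows = chunk_with_overlap(list(range(10)), max_chunk_size=4, max_overlap_size=1)
# windows == [[0, 1, 2, 3], [3, 4, 5, 6], [6, 7, 8, 9]]
```

Each window begins with the final token of the previous one, so text that straddles a boundary appears in both chunks. A larger overlap preserves more context at the cost of more duplicated tokens.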