Class TextChunking

Namespace: LMKit.Retrieval

Assembly: LM-Kit.NET.dll

Implements a recursive chunking strategy for partitioning text into manageable segments, known as "chunks," to support retrieval-augmented generation tasks.
This approach is particularly effective for processing extensive texts, systematically breaking them down into smaller segments that are easier to handle.
Unlike linear chunking methods that sequentially divide text, this recursive strategy dynamically adjusts the segmentation process based on the complexity and structure of the text.
This allows for more nuanced and efficient handling of text data, especially when dealing with nested or hierarchical information.

public class TextChunking

Inheritance: object

TextChunking

Inherited Members: object.Equals(object)

object.Equals(object, object)

object.GetHashCode()

object.GetType()

object.MemberwiseClone()

object.ReferenceEquals(object, object)

object.ToString()

Fields

KeepSpacings: Determines whether the system preserves multiple consecutive spaces and maintains the original text layout.

Properties

MaxChunkSize: Gets or sets the maximum number of tokens that each text chunk can contain.
This property determines the size of the chunks into which the text is divided.

MaxOverlapSize: Gets or sets the maximum number of tokens to be duplicated (overlapped) between consecutive text chunks. This overlap ensures that context is not lost at the boundaries between chunks. It aids in maintaining the continuity of the text across chunks, especially important for cohesive text analysis and generation.

Table of Contents

Class TextChunking

Fields

Properties