Class TextChunking
Implements a recursive chunking strategy that partitions text into manageable segments, called chunks, for retrieval-augmented generation (RAG) tasks.
This approach is particularly suited to long texts, which it systematically breaks down into smaller segments that are easier to process.
Unlike linear chunking methods, which divide text sequentially, this recursive strategy adapts the segmentation to the structure and complexity of the text.
This makes it better suited to nested or hierarchical content.
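The recursive idea can be illustrated with a minimal, standalone sketch. It is not the TextChunking implementation. The helper names (`CountTokens`, `RecursiveSplit`, `RecursiveChunk`), the separator hierarchy (paragraphs, lines, sentences, words), and the approximation of one token per whitespace-separated word are all assumptions made for illustration. A segment that exceeds the limit is re-split with the next, finer separator, and the resulting pieces are then packed back together up to the limit.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

// ASSUMPTION: a "token" is approximated by a whitespace-separated word;
// TextChunking itself counts model tokens.
int CountTokens(string s) =>
    s.Split(new[] { ' ', '\t', '\r', '\n' }, StringSplitOptions.RemoveEmptyEntries).Length;

// Hypothetical separator hierarchy, coarsest first:
// blank line (paragraph), line break, sentence end, any whitespace (word).
string[] separators = { @"\n\s*\n", @"\n", @"(?<=[.!?])\s+", @"\s+" };

// Recursively split: a segment over the limit is re-split with the next,
// finer separator until every piece fits (or no finer separator remains).
List<string> RecursiveSplit(string text, int maxTokens, int level)
{
    var pieces = new List<string>();
    string trimmed = text.Trim();
    if (trimmed.Length == 0) return pieces;
    if (CountTokens(trimmed) <= maxTokens || level >= separators.Length)
    {
        pieces.Add(trimmed);
        return pieces;
    }
    foreach (string part in Regex.Split(trimmed, separators[level]))
        pieces.AddRange(RecursiveSplit(part, maxTokens, level + 1));
    return pieces;
}

// Greedily merge adjacent pieces back together while they fit the limit.
List<string> RecursiveChunk(string text, int maxTokens)
{
    var chunks = new List<string>();
    string current = "";
    foreach (string piece in RecursiveSplit(text, maxTokens, 0))
    {
        string candidate = current.Length == 0 ? piece : current + " " + piece;
        if (CountTokens(candidate) <= maxTokens)
        {
            current = candidate;
        }
        else
        {
            if (current.Length > 0) chunks.Add(current);
            current = piece;
        }
    }
    if (current.Length > 0) chunks.Add(current);
    return chunks;
}

string sample = "Intro paragraph here.\n\nFirst sentence of body. " +
                "Second sentence is a bit longer than the first one.";
foreach (string chunk in RecursiveChunk(sample, maxTokens: 10))
    Console.WriteLine($"[{CountTokens(chunk)} tokens] {chunk}");
```

With a 10-token limit, the sample is split at the paragraph and sentence boundaries and then repacked into two chunks, so no sentence is cut in half. With a smaller limit, the oversized sentence falls through to the word level.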
public class TextChunking
- Inheritance: object → TextChunking
Fields
- KeepSpacings
Determines whether the chunker preserves runs of consecutive spaces, keeping the original layout of the text intact.
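As an illustration only, the effect of the KeepSpacings setting can be pictured with a hypothetical `NormalizeSpacing` helper. The rule it applies when spacing is not kept (collapsing runs of two or more spaces or tabs into a single space, while leaving line breaks alone) is an assumption; the description above does not specify how TextChunking normalizes whitespace.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper: when keepSpacings is false, every run of two or more
// spaces/tabs collapses to a single space; line breaks are left untouched.
// This normalization rule is an assumption, not TextChunking's documented logic.
string NormalizeSpacing(string text, bool keepSpacings) =>
    keepSpacings ? text : Regex.Replace(text, @"[ \t]{2,}", " ");

string table = "Name    Qty\nApple   3";
Console.WriteLine(NormalizeSpacing(table, keepSpacings: true));  // column alignment kept
Console.WriteLine(NormalizeSpacing(table, keepSpacings: false)); // "Name Qty" / "Apple 3"
```

Preserving spacing matters mostly for layout-sensitive input such as plain-text tables or aligned columns, where collapsing spaces would destroy the alignment.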
Properties
- MaxChunkSize
Gets or sets the maximum number of tokens that each chunk can contain.
This value caps the size of the chunks into which the text is divided.
- MaxOverlapSize
Gets or sets the maximum number of tokens duplicated (overlapped) between consecutive chunks. The overlap reduces the loss of context at chunk boundaries and helps preserve the continuity of the text across chunks, which matters for coherent text analysis and generation.
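A simplified sketch of overlap follows, using a fixed-size sliding window in which each window repeats the last tokens of the previous one. The function `ChunkWithOverlap` and its parameters are hypothetical stand-ins for MaxChunkSize and MaxOverlapSize. Tokens are again approximated by words, the sketch always uses the full overlap rather than treating it as a maximum, and a sliding window is a simpler technique than the recursive strategy TextChunking implements.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical sketch: windows of maxChunkSize tokens, each starting
// (maxChunkSize - maxOverlapSize) tokens after the previous one, so
// consecutive windows share exactly maxOverlapSize tokens.
List<string[]> ChunkWithOverlap(string[] tokens, int maxChunkSize, int maxOverlapSize)
{
    if (maxChunkSize <= 0 || maxOverlapSize < 0 || maxOverlapSize >= maxChunkSize)
        throw new ArgumentException("Require 0 <= overlap < chunk size.");

    var chunks = new List<string[]>();
    int step = maxChunkSize - maxOverlapSize;
    for (int start = 0; start < tokens.Length; start += step)
    {
        chunks.Add(tokens.Skip(start).Take(maxChunkSize).ToArray());
        // Stop once a window reaches the end of the input.
        if (start + maxChunkSize >= tokens.Length) break;
    }
    return chunks;
}

// ASSUMPTION: one word = one token.
string[] words = "one two three four five six seven eight nine ten".Split(' ');
foreach (string[] chunk in ChunkWithOverlap(words, maxChunkSize: 4, maxOverlapSize: 1))
    Console.WriteLine(string.Join(" ", chunk));
// one two three four
// four five six seven
// seven eight nine ten
```

The overlap must stay below the chunk size; otherwise the window would never advance, which is why the sketch rejects such values up front.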