Method ImportText

Namespace
LMKit.Retrieval
Assembly
LM-Kit.NET.dll

ImportText(string, TextChunking, string, string, CancellationToken)

Imports text data into the specified DataSource object, creating a new Section entry or appending to an existing one.

public DataSource ImportText(string data, TextChunking textChunking, string dataSourceIdentifier, string sectionIdentifier = "default", CancellationToken cancellationToken = default)

Parameters

data string

The text to import. The text is split into text partitions according to textChunking and added to the target Section of the specified DataSource object.

textChunking TextChunking

A TextChunking object specifying the text chunking strategy to be used by the engine.

dataSourceIdentifier string

The unique identifier of the DataSource. If this identifier matches an existing DataSource, the data is added to the section named by sectionIdentifier within it. If no matching DataSource is found, a new one is created.

sectionIdentifier string

Optional. The identifier of the Section within the DataSource into which the data is imported. The default value is "default".

cancellationToken CancellationToken

Optional. A CancellationToken that can be used to signal cancellation of the import operation.

Returns

DataSource

The DataSource object into which the data has been imported.

Remarks

The hierarchy of data structures is organized as follows: a DataSource contains a collection of Section objects, each of which can hold a collection of TextPartition instances.

Each import operation can target a new or existing data source, depending on whether the specified dataSourceIdentifier matches an existing DataSource element within the DataSources property.

If the import operation targets an existing data source that contains a section with the same identifier as sectionIdentifier, the imported text data is appended to that Section. Otherwise, a new Section is created with the provided identifier.
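The create-or-append behaviour described in these remarks can be pictured with a small in-memory model. This is only a sketch: SketchStore, SketchDataSource and SketchSection are hypothetical stand-ins invented for illustration, not LM-Kit types, and chunking is reduced to one partition per call instead of applying a TextChunking strategy.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical stand-ins for illustration only; these are NOT the LM-Kit types.
public class SketchSection
{
    public string Identifier { get; }
    public List<string> Partitions { get; } = new List<string>();
    public SketchSection(string identifier) => Identifier = identifier;
}

public class SketchDataSource
{
    public string Identifier { get; }
    public Dictionary<string, SketchSection> Sections { get; } =
        new Dictionary<string, SketchSection>();
    public SketchDataSource(string identifier) => Identifier = identifier;
}

public class SketchStore
{
    public Dictionary<string, SketchDataSource> DataSources { get; } =
        new Dictionary<string, SketchDataSource>();

    public SketchDataSource ImportText(
        string data, string dataSourceIdentifier, string sectionIdentifier = "default")
    {
        if (string.IsNullOrEmpty(data))
            throw new ArgumentNullException(nameof(data));
        if (string.IsNullOrEmpty(dataSourceIdentifier))
            throw new ArgumentNullException(nameof(dataSourceIdentifier));

        // Reuse the data source when the identifier matches, otherwise create it.
        if (!DataSources.TryGetValue(dataSourceIdentifier, out var source))
        {
            source = new SketchDataSource(dataSourceIdentifier);
            DataSources.Add(dataSourceIdentifier, source);
        }

        // Reuse the section when the identifier matches, otherwise create it.
        if (!source.Sections.TryGetValue(sectionIdentifier, out var section))
        {
            section = new SketchSection(sectionIdentifier);
            source.Sections.Add(sectionIdentifier, section);
        }

        // Simplification: one partition per call, no real chunking.
        section.Partitions.Add(data);
        return source;
    }
}

public static class Program
{
    public static void Main()
    {
        var store = new SketchStore();
        store.ImportText("first text", "docs");              // creates "docs" and "default"
        store.ImportText("second text", "docs");             // appends to "default"
        var source = store.ImportText("third text", "docs", "appendix"); // new section

        Console.WriteLine(store.DataSources.Count);                    // 1
        Console.WriteLine(source.Sections.Count);                      // 2
        Console.WriteLine(source.Sections["default"].Partitions.Count); // 2
    }
}
```

Three imports into the same data source identifier leave a single data source holding two sections, with the second import appended to the existing "default" section.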

Exceptions

ArgumentNullException

Thrown when either the 'data' or 'dataSourceIdentifier' parameter is null or empty.

OperationCanceledException

Thrown when the operation is cancelled through the CancellationToken.
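How a CancellationToken typically surfaces as an OperationCanceledException can be sketched with standard .NET types alone. ProcessChunks below is a hypothetical stand-in for a long-running import loop, not an LM-Kit method; how often the real engine checks the token is not documented here.

```csharp
using System;
using System.Threading;

public static class CancellationSketch
{
    // Hypothetical stand-in for a long-running import; not an LM-Kit method.
    public static int ProcessChunks(int chunkCount, CancellationToken cancellationToken = default)
    {
        int processed = 0;
        for (int i = 0; i < chunkCount; i++)
        {
            // Throws OperationCanceledException once cancellation has been requested.
            cancellationToken.ThrowIfCancellationRequested();
            processed++;
        }
        return processed;
    }

    public static void Main()
    {
        // Default token: cancellation is never requested, so all chunks are processed.
        Console.WriteLine(ProcessChunks(3)); // 3

        using (var cts = new CancellationTokenSource())
        {
            cts.Cancel(); // request cancellation before the work starts
            try
            {
                ProcessChunks(3, cts.Token);
            }
            catch (OperationCanceledException)
            {
                Console.WriteLine("import cancelled");
            }
        }
    }
}
```

A caller that wants a time limit can construct the source with new CancellationTokenSource(TimeSpan.FromSeconds(30)) and pass its Token in the same way.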

ImportText(IList<string>, TextChunking, string, IList<string>, CancellationToken)

Imports a list of text strings into the specified DataSource object, mapping each item in the list to its own Section entry.
This method is well suited to importing multipage documents where each page is treated as a separate section.

public DataSource ImportText(IList<string> data, TextChunking textChunking, string dataSourceIdentifier, IList<string> sectionIdentifiers, CancellationToken cancellationToken = default)

Parameters

data IList<string>

The list of text strings to be imported. Each string can represent a page of a document.

textChunking TextChunking

A TextChunking object specifying the text chunking strategy to be used by the engine.

dataSourceIdentifier string

A unique identifier for the DataSource. If an existing DataSource matches this identifier, the new data is added within it; otherwise, a new DataSource is created.

sectionIdentifiers IList<string>

A list of identifiers for the Sections within the DataSource. Each identifier can correspond to a page of text from the 'data' list.

cancellationToken CancellationToken

Optional. A CancellationToken that can be used to cancel the import operation. Default is None, indicating no cancellation is requested.

Returns

DataSource

The updated or newly created DataSource object into which the data has been imported.

Remarks

This method maps each text entry in 'data' to a separate Section within the DataSource. This is especially useful in scenarios such as document processing, where each document page is a distinct section.

If the specified 'dataSourceIdentifier' matches an existing DataSource, the method adds the imported sections to it. If no match is found, a new DataSource is created. For each entry in 'sectionIdentifiers' that matches a section already present in the DataSource, the corresponding text data is appended to that section; for every other entry, a new section is created.
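The page-to-section mapping for this overload can be sketched as an index-wise pairing of 'data' and 'sectionIdentifiers'. The model below is hypothetical and uses only standard collections: a dictionary from section identifier to imported texts stands in for a DataSource. It rejects duplicate identifiers and a surplus of identifiers as documented; rejecting fewer identifiers than pages and a null identifier list are assumptions of the sketch, since the documentation does not specify those cases.

```csharp
using System;
using System.Collections.Generic;

public static class PagedImportSketch
{
    // 'sections' maps a section identifier to the texts imported into it
    // (a hypothetical stand-in for a DataSource, not an LM-Kit type).
    public static Dictionary<string, List<string>> ImportPages(
        Dictionary<string, List<string>> sections,
        IList<string> data,
        IList<string> sectionIdentifiers)
    {
        if (data == null || data.Count == 0)
            throw new ArgumentNullException(nameof(data));
        if (sectionIdentifiers == null) // assumption: not documented
            throw new ArgumentNullException(nameof(sectionIdentifiers));

        // Documented: more identifiers than pages is rejected.
        // Assumption of this sketch: fewer identifiers than pages is rejected too.
        if (sectionIdentifiers.Count != data.Count)
            throw new ArgumentOutOfRangeException(
                nameof(sectionIdentifiers), "Identifier count must match page count.");

        // Documented: duplicate identifiers are rejected.
        if (new HashSet<string>(sectionIdentifiers).Count != sectionIdentifiers.Count)
            throw new ArgumentOutOfRangeException(
                nameof(sectionIdentifiers), "Identifiers must be unique.");

        for (int i = 0; i < data.Count; i++)
        {
            // Append to a matching section, otherwise create it.
            if (!sections.TryGetValue(sectionIdentifiers[i], out var texts))
            {
                texts = new List<string>();
                sections.Add(sectionIdentifiers[i], texts);
            }
            texts.Add(data[i]);
        }
        return sections;
    }

    public static void Main()
    {
        var sections = new Dictionary<string, List<string>>();
        ImportPages(sections, new[] { "page one", "page two" }, new[] { "page-1", "page-2" });
        ImportPages(sections, new[] { "more for page one" }, new[] { "page-1" });

        Console.WriteLine(sections.Count);           // 2
        Console.WriteLine(sections["page-1"].Count); // 2
    }
}
```

The second call reuses "page-1", so its text is appended to the existing section instead of creating a third one.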

Exceptions

ArgumentNullException

Thrown if 'data' or 'dataSourceIdentifier' is null or empty.

ArgumentOutOfRangeException

Thrown if 'sectionIdentifiers' contains more identifiers than there are pages of data, or if it contains duplicate identifiers.

OperationCanceledException

Thrown if the operation is cancelled through the CancellationToken, for example when a timeout configured on the token elapses.