Method ImportText
ImportText(string, TextChunking, string, string, CancellationToken)
Imports text data into a specified DataSource object, creating a new or updating an existing Section entry.
public DataSource ImportText(string data, TextChunking textChunking, string dataSourceIdentifier, string sectionIdentifier = "default", CancellationToken cancellationToken = default)
Parameters
data
string: The text to import. It is segmented into text partitions and added to the target Section within the specified DataSource object.
textChunking
TextChunking: A TextChunking object specifying the text chunking strategy to be used by the engine.
dataSourceIdentifier
string: The unique identifier for the DataSource. If this identifier matches an existing DataSource, the data is added to a section within it. If no matching identifier is found, a new DataSource is created.
sectionIdentifier
string: Optional. The identifier of the Section within the DataSource that receives the imported data. The default value is 'default'.
cancellationToken
CancellationToken: Optional. A CancellationToken that can be used to signal cancellation of the import operation.
Returns
- DataSource
The DataSource object into which the data has been imported.
Remarks
The hierarchy of data structures is organized as follows: a DataSource contains a collection of Section objects, each of which can hold a collection of TextPartition instances.
Each import operation can target a new or existing data source, depending on whether the specified dataSourceIdentifier matches an existing DataSource element within the DataSources property.
If the import operation targets an existing data source that contains a section with the same identifier as sectionIdentifier, the imported text data is appended to that Section. Otherwise, a new Section is created with the provided identifier.
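The resolution order described above (data source first, then section, then append) can be illustrated with a minimal in-memory sketch. The classes below are hypothetical stand-ins written for this example, not the library's own DataSource, Section, or TextPartition types, and the blank-line split is only an assumed placeholder for whatever strategy TextChunking applies.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-ins for the documented hierarchy; NOT the library's classes.
public sealed class TextPartition
{
    public string Text { get; }
    public TextPartition(string text) => Text = text;
}

public sealed class Section
{
    public string Identifier { get; }
    public List<TextPartition> TextPartitions { get; } = new List<TextPartition>();
    public Section(string identifier) => Identifier = identifier;
}

public sealed class DataSource
{
    public string Identifier { get; }
    public List<Section> Sections { get; } = new List<Section>();
    public DataSource(string identifier) => Identifier = identifier;
}

public sealed class ImportModel
{
    public List<DataSource> DataSources { get; } = new List<DataSource>();

    // Mirrors the documented behaviour: find or create the data source,
    // then find or create the section, then append the new partitions.
    public DataSource ImportText(string data, string dataSourceIdentifier, string sectionIdentifier = "default")
    {
        if (string.IsNullOrEmpty(data)) throw new ArgumentNullException(nameof(data));
        if (string.IsNullOrEmpty(dataSourceIdentifier)) throw new ArgumentNullException(nameof(dataSourceIdentifier));

        var source = DataSources.FirstOrDefault(d => d.Identifier == dataSourceIdentifier);
        if (source == null)
        {
            source = new DataSource(dataSourceIdentifier);
            DataSources.Add(source);
        }

        var section = source.Sections.FirstOrDefault(s => s.Identifier == sectionIdentifier);
        if (section == null)
        {
            section = new Section(sectionIdentifier);
            source.Sections.Add(section);
        }

        // Assumption: one partition per blank-line-separated paragraph,
        // standing in for the real chunking strategy.
        foreach (string chunk in data.Split(new[] { "\n\n" }, StringSplitOptions.RemoveEmptyEntries))
            section.TextPartitions.Add(new TextPartition(chunk.Trim()));

        return source;
    }
}

public static class Program
{
    public static void Main()
    {
        var model = new ImportModel();
        model.ImportText("First paragraph.\n\nSecond paragraph.", "manual");
        DataSource source = model.ImportText("Third paragraph.", "manual");

        Console.WriteLine(model.DataSources.Count);                 // 1
        Console.WriteLine(source.Sections.Count);                   // 1
        Console.WriteLine(source.Sections[0].TextPartitions.Count); // 3
    }
}
```

Both calls use the same data source identifier and the default section identifier, so the second import appends to the existing "default" section instead of creating a new one.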
Exceptions
- ArgumentNullException
Thrown when either the 'data' or 'dataSourceIdentifier' parameter is null or empty.
- OperationCanceledException
Thrown when the operation is cancelled based on the CancellationToken.
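A caller would typically guard the import against both documented exceptions. The sketch below uses only standard .NET cancellation types; ImportStandIn is a hypothetical placeholder for the real ImportText call, and the order in which it validates arguments and checks the token is an assumption made for the example.

```csharp
using System;
using System.Threading;

public static class Program
{
    // Hypothetical stand-in for ImportText: validates its arguments, then
    // observes the token the way a cooperative operation would.
    private static string ImportStandIn(string data, string dataSourceIdentifier, CancellationToken cancellationToken = default)
    {
        if (string.IsNullOrEmpty(data)) throw new ArgumentNullException(nameof(data));
        if (string.IsNullOrEmpty(dataSourceIdentifier)) throw new ArgumentNullException(nameof(dataSourceIdentifier));

        cancellationToken.ThrowIfCancellationRequested();
        return dataSourceIdentifier;
    }

    public static void Main()
    {
        using var cts = new CancellationTokenSource();
        cts.Cancel(); // cancel up front so the outcome is deterministic

        try
        {
            ImportStandIn("some text", "manual", cts.Token);
        }
        catch (OperationCanceledException)
        {
            Console.WriteLine("Import cancelled.");
        }

        try
        {
            ImportStandIn("", "manual");
        }
        catch (ArgumentNullException ex)
        {
            Console.WriteLine($"Invalid argument: {ex.ParamName}");
        }
    }
}
```

In real use the token would more often come from a CancellationTokenSource created with a timeout or cancelled from a UI action, rather than being cancelled before the call.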
ImportText(IList<string>, TextChunking, string, IList<string>, CancellationToken)
Imports a list of text strings into a specified DataSource object, creating or updating a Section entry for each item in the list.
This method is ideal for importing multipage documents where each page is treated as a separate section.
public DataSource ImportText(IList<string> data, TextChunking textChunking, string dataSourceIdentifier, IList<string> sectionIdentifiers, CancellationToken cancellationToken = default)
Parameters
data
IList<string>: The list of text strings to be imported. Each string can represent a page of a document.
textChunking
TextChunking: A TextChunking object specifying the text chunking strategy to be used by the engine.
dataSourceIdentifier
string: A unique identifier for the DataSource. If an existing DataSource matches this identifier, the new data is added within it; otherwise, a new DataSource is created.
sectionIdentifiers
IList<string>: A list of identifiers for the Sections that receive the data within the DataSource. Each identifier can correspond to a page of text from the 'data' list.
cancellationToken
CancellationToken: Optional. A CancellationToken that can be used to cancel the import operation. The default is CancellationToken.None, meaning no cancellation is requested.
Returns
- DataSource
The updated or newly created DataSource object into which the data has been imported.
Remarks
This method maps each text entry in 'data' to a separate Section within the DataSource. This is useful in document processing scenarios where each page of a document is treated as a distinct section.
If the specified 'dataSourceIdentifier' matches an existing DataSource, the method adds the sections to it. If no match is found, a new DataSource is created. If an entry in 'sectionIdentifiers' matches a section that already exists in the DataSource, the corresponding text data is appended to that section; otherwise, a new section is created for that identifier.
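The page-to-section mapping described above can be sketched with nested dictionaries. This is an illustrative model only: ImportPages and the dictionary layout are hypothetical, chunking is left out, and the sketch assumes exactly one identifier per page, which is stricter than what this page documents.

```csharp
using System;
using System.Collections.Generic;

public static class Program
{
    // Hypothetical in-memory model, not the library's types:
    // data source identifier -> section identifier -> texts imported into that section.
    private static readonly Dictionary<string, Dictionary<string, List<string>>> DataSources =
        new Dictionary<string, Dictionary<string, List<string>>>();

    private static Dictionary<string, List<string>> ImportPages(
        IList<string> data, string dataSourceIdentifier, IList<string> sectionIdentifiers)
    {
        // Assumption: the sketch expects exactly one identifier per page.
        if (data.Count != sectionIdentifiers.Count)
            throw new ArgumentOutOfRangeException(nameof(sectionIdentifiers));

        // Reuse the data source when the identifier matches, otherwise create it.
        if (!DataSources.TryGetValue(dataSourceIdentifier, out var sections))
        {
            sections = new Dictionary<string, List<string>>();
            DataSources[dataSourceIdentifier] = sections;
        }

        for (int i = 0; i < data.Count; i++)
        {
            // Append to a matching section, otherwise create a new one.
            if (!sections.TryGetValue(sectionIdentifiers[i], out var texts))
            {
                texts = new List<string>();
                sections[sectionIdentifiers[i]] = texts;
            }
            texts.Add(data[i]);
        }

        return sections;
    }

    public static void Main()
    {
        ImportPages(new[] { "Page 1 text", "Page 2 text" }, "report", new[] { "page-1", "page-2" });
        var sections = ImportPages(new[] { "Page 2 addendum", "Page 3 text" }, "report", new[] { "page-2", "page-3" });

        Console.WriteLine(sections.Count);           // 3
        Console.WriteLine(sections["page-2"].Count); // 2
    }
}
```

The second call reuses the "report" data source, appends to the existing "page-2" section, and creates "page-3".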
Exceptions
- ArgumentNullException
Thrown if 'data' or 'dataSourceIdentifier' is null or empty.
- ArgumentOutOfRangeException
Thrown if 'sectionIdentifiers' contains more identifiers than there are pages of data, or if it contains duplicate identifiers.
- OperationCanceledException
Thrown if the operation is cancelled through the CancellationToken.
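The two ArgumentOutOfRangeException conditions can be checked by the caller before the import is attempted. The helper below is a hypothetical pre-check written for this example, not part of the documented API; it assumes identifiers are compared with ordinal, case-sensitive equality, which this page does not specify.

```csharp
using System;
using System.Collections.Generic;

public static class Program
{
    // Hypothetical pre-check mirroring the documented ArgumentOutOfRangeException conditions.
    public static void ValidateSectionIdentifiers(IList<string> data, IList<string> sectionIdentifiers)
    {
        if (sectionIdentifiers.Count > data.Count)
            throw new ArgumentOutOfRangeException(nameof(sectionIdentifiers), "More identifiers than pages.");

        // Assumption: ordinal, case-sensitive comparison (the HashSet<string> default).
        var seen = new HashSet<string>();
        foreach (string id in sectionIdentifiers)
        {
            if (!seen.Add(id))
                throw new ArgumentOutOfRangeException(nameof(sectionIdentifiers), $"Duplicate identifier '{id}'.");
        }
    }

    public static void Main()
    {
        var pages = new List<string> { "Page one text.", "Page two text." };

        // One unique identifier per page: no exception.
        ValidateSectionIdentifiers(pages, new List<string> { "page-1", "page-2" });

        try
        {
            ValidateSectionIdentifiers(pages, new List<string> { "page-1", "page-1" });
        }
        catch (ArgumentOutOfRangeException)
        {
            Console.WriteLine("Rejected: duplicate identifier.");
        }

        try
        {
            ValidateSectionIdentifiers(pages, new List<string> { "a", "b", "c" });
        }
        catch (ArgumentOutOfRangeException)
        {
            Console.WriteLine("Rejected: too many identifiers.");
        }
    }
}
```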