Class DataSource
Represents a repository of content drawn from diverse sources, such as documents, web pages, and other data-rich environments. This class provides a unified interface for working with these different types of data in a consistent manner.
public sealed class DataSource : ISerializableData, IDisposable
- Inheritance
object → DataSource
- Implements
ISerializableData, IDisposable
Remarks
In LMKit, the hierarchy of data structures is organized as follows: a DataSource contains a collection of Section objects, each of which can hold a collection of TextPartition instances.
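The containment relationship described above can be pictured with a minimal sketch. The types below (DataSourceSketch, SectionSketch, TextPartitionSketch) and their members are hypothetical stand-ins written for illustration only. They are not the LMKit classes, and they leave out embeddings, metadata, and persistence.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical stand-in for a TextPartition: the leaf of the hierarchy.
sealed class TextPartitionSketch
{
    public string Text { get; }
    public TextPartitionSketch(string text) => Text = text;
}

// Hypothetical stand-in for a Section: owns a collection of partitions.
sealed class SectionSketch
{
    private readonly List<TextPartitionSketch> _partitions = new List<TextPartitionSketch>();

    public string Identifier { get; }
    public IReadOnlyList<TextPartitionSketch> Partitions => _partitions;

    public SectionSketch(string identifier) => Identifier = identifier;

    public void AddPartition(string text) => _partitions.Add(new TextPartitionSketch(text));
}

// Hypothetical stand-in for a DataSource: owns a collection of sections.
sealed class DataSourceSketch
{
    private readonly List<SectionSketch> _sections = new List<SectionSketch>();

    public string Identifier { get; }
    public IReadOnlyList<SectionSketch> Sections => _sections;

    public DataSourceSketch(string identifier) => Identifier = identifier;

    public SectionSketch AddSection(string identifier)
    {
        var section = new SectionSketch(identifier);
        _sections.Add(section);
        return section;
    }

    public bool HasSection(string identifier) =>
        _sections.Exists(s => s.Identifier == identifier);
}

static class Program
{
    static void Main()
    {
        var source = new DataSourceSketch("manual");
        var intro = source.AddSection("intro");
        intro.AddPartition("First chunk of text.");
        intro.AddPartition("Second chunk of text.");

        // Walk the three levels: data source, then sections, then partitions.
        foreach (var section in source.Sections)
        {
            Console.WriteLine($"{source.Identifier}/{section.Identifier}");
            foreach (var partition in section.Partitions)
            {
                Console.WriteLine($"  {partition.Text}");
            }
        }
    }
}
```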
Properties
- EmbeddingSize
Gets the dimension of embedding vectors.
- Identifier
Gets the unique identifier for this instance.
- Metadata
Gets the metadata associated with this instance.
- Sections
Gets a read-only collection of sections that make up this instance.
- StorageMode
Gets the storage mode that indicates how the data source is persisted.
Methods
- Clone()
Creates a deep clone of this DataSource instance.
- CreateFileDataSource(string, string, LM, MetadataCollection)
Creates a new file-backed DataSource instance with the specified identifier and language model. This method creates a new file at the given path and opens it with read-write access. The DataSource is first created in memory and then associated with the backing file stream, so that later updates are synchronized with the file.
- CreateInMemoryDataSource(string, LM, MetadataCollection)
Creates a new in-memory DataSource instance with the specified identifier and language model. This instance can subsequently be saved to a file using one of the Serialize methods.
- CreateVectorStoreDataSource(IVectorStore, string, LM, MetadataCollection, CancellationToken)
Creates a new DataSource instance that is backed by the specified vector store. This is a synchronous wrapper around the asynchronous CreateVectorStoreDataSourceAsync(IVectorStore, string, LM, MetadataCollection, CancellationToken) method.
- CreateVectorStoreDataSourceAsync(IVectorStore, string, LM, MetadataCollection, CancellationToken)
Asynchronously creates a new DataSource instance that is backed by the specified vector store. This method leverages the vector store to store and retrieve data associated with the data source.
- Deserialize(byte[], LM, CancellationToken)
Deserializes the given binary data into a DataSource instance using the specified model for context.
- Deserialize(Stream, LM, CancellationToken)
Deserializes a DataSource from a stream using the provided model for context.
- Deserialize(string, LM, CancellationToken)
Deserializes the binary data from the specified file path into a DataSource instance using the provided model for context.
- GetSectionByIdentifier(string)
Retrieves a Section object from this instance based on the specified identifier.
- GetSectionByIdentifierAsync(string, CancellationToken)
Asynchronously retrieves a Section object from this instance based on the specified identifier.
- HasSection(string)
Determines whether the data source contains a section with the specified identifier.
- LoadFromFile(string, LM, bool, CancellationToken)
Loads a DataSource from the specified file path, lazily loading its content. The file is opened in either read-only or read-write mode. In read-write mode, the backing file remains open, so the DataSource is only partially loaded into memory and additional content is decoded on demand during the instance's lifetime. This lazy-loading approach is recommended for large data sources.
- LoadFromFileAsync(string, LM, bool, CancellationToken)
Asynchronously loads a DataSource from the specified file path by lazily loading its content. The file is opened in either read-only or read-write mode, and when read-write mode is used, the file stream remains open so that modifications to the DataSource are automatically synchronized back to the file. This lazy-loading design is particularly useful for large data sources where it is inefficient or unnecessary to load the entire content into memory at once.
- LoadFromStore(IVectorStore, string, LM, CancellationToken)
Loads a DataSource from the specified vector store using the provided data source identifier and language model. This is a synchronous wrapper around the asynchronous LoadFromStoreAsync(IVectorStore, string, LM, CancellationToken) method.
- LoadFromStoreAsync(IVectorStore, string, LM, CancellationToken)
Asynchronously loads a DataSource from the specified vector store using the provided data source identifier and language model.
- OptimizeDataSource(string, CancellationToken)
Synchronously optimizes a file-based DataSource by compacting its data and updating its internal format to the latest version.
- OptimizeDataSourceAsync(string, CancellationToken)
Asynchronously optimizes a file-based DataSource by compacting its data and migrating its internal format to the latest supported version.
- RemoveSection(Section, CancellationToken)
Removes the specified Section from this DataSource.
- RemoveSection(string, CancellationToken)
Removes the section identified by the specified identifier from this DataSource.
- RemoveSectionAsync(Section, CancellationToken)
Asynchronously removes the specified Section from this DataSource.
- RemoveSectionAsync(string, CancellationToken)
Asynchronously removes the section identified by the specified identifier from this DataSource.
- Serialize()
Serializes this instance into a binary format and returns the resulting data as a byte array.
- Serialize(Stream)
Serializes this instance into a binary format and writes it to the specified stream.
- Serialize(string)
Serializes this instance into a binary format and writes it to the specified file path.
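
As a rough illustration of how the factory and serialization members listed above might fit together, here is a minimal sketch of an in-memory round trip. It rests on several assumptions that this page does not confirm: that DataSource and LM live in the LMKit.Data and LMKit.Model namespaces, that CreateInMemoryDataSource and Deserialize are static, that the arguments follow the order of the listed parameter types (identifier, model, metadata), that a null MetadataCollection is accepted, and that a default CancellationToken is acceptable. The identifier "notes" and the helper name RoundTrip are hypothetical. The sketch needs the LM-Kit package and an already loaded model to compile and run.

```csharp
using System;
using LMKit.Data;   // assumed namespace for DataSource
using LMKit.Model;  // assumed namespace for LM

static class DataSourceRoundTrip
{
    // Hypothetical helper: the caller supplies an already loaded model.
    public static DataSource RoundTrip(LM model)
    {
        // Assumption: static factory; a null MetadataCollection is accepted.
        using DataSource source =
            DataSource.CreateInMemoryDataSource("notes", model, null);

        // Serialize() is documented above as returning a byte array.
        byte[] payload = source.Serialize();

        // Assumption: Deserialize is static and accepts a default token.
        DataSource restored = DataSource.Deserialize(payload, model, default);

        Console.WriteLine($"Restored data source: {restored.Identifier}");
        Console.WriteLine($"Has section 'intro': {restored.HasSection("intro")}");

        // The caller owns the returned instance and should dispose it.
        return restored;
    }
}
```

Because DataSource implements IDisposable, the sketch disposes the original instance with a using declaration and leaves disposal of the restored instance to the caller.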