Class DataSource

Namespace
LMKit.Data
Assembly
LM-Kit.NET.dll

Represents a repository that encapsulates content from diverse data sources, including text, images, documents, web pages, and other data-rich environments. It provides a unified interface for consistent interaction with various types of data.

public sealed class DataSource : ISerializableData, IDisposable
Inheritance
object → DataSource
Implements
ISerializableData, IDisposable

Remarks

In LMKit, the hierarchy of data structures is organized as follows: a DataSource contains a collection of Section objects, each of which can hold a collection of TextPartition instances.
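
The containment hierarchy described above can be pictured with a minimal sketch. The types below (DataSourceSketch, SectionSketch, PartitionSketch) and the helper GetOrAddSection are hypothetical stand-ins written for illustration. They are not the LM-Kit classes and only mirror the shape of the hierarchy: a data source holds sections, and each section holds text partitions.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Stand-in types for illustration only; these are NOT the LM-Kit classes.
public sealed record PartitionSketch(string Text, float[] Embedding);

public sealed class SectionSketch
{
    public string Identifier { get; }
    public List<PartitionSketch> Partitions { get; } = new();

    public SectionSketch(string identifier) => Identifier = identifier;
}

public sealed class DataSourceSketch
{
    private readonly List<SectionSketch> _sections = new();

    public string Identifier { get; }

    // Exposed as read-only, mirroring the read-only Sections collection.
    public IReadOnlyList<SectionSketch> Sections => _sections;

    public DataSourceSketch(string identifier) => Identifier = identifier;

    // Returns the section with the given identifier, creating it if missing.
    public SectionSketch GetOrAddSection(string identifier)
    {
        var section = _sections.FirstOrDefault(s => s.Identifier == identifier);
        if (section is null)
        {
            section = new SectionSketch(identifier);
            _sections.Add(section);
        }
        return section;
    }

    // Total number of partitions across all sections.
    public int CountPartitions() => _sections.Sum(s => s.Partitions.Count);
}
```

Adding two partitions under the same section identifier leaves one section holding two partitions, which is the one-to-many relationship at each level of the hierarchy.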

Properties

EmbeddingSize

Gets the dimension of embedding vectors.

Identifier

Gets the unique identifier for this instance.

Metadata

Gets the metadata associated with this instance.

Sections

Gets a read-only collection of sections that make up this instance.

StorageMode

Gets the storage mode that indicates how the data source is persisted.

Methods

Clone()

Creates a deep clone of this DataSource instance.

CreateFileDataSource(string, string, LM, MetadataCollection, bool)

Creates a new file-backed DataSource instance with the specified identifier and language model. This method creates a new file at the given path and opens it with read-write access. The resulting DataSource is initially created in memory and then associated with the backing file stream, so that future updates are synchronized with the file.

CreateInMemoryDataSource(string, LM, MetadataCollection)

Creates a new in-memory DataSource instance with the specified identifier and language model. This instance can subsequently be saved to a file using one of the Serialize methods.

CreateVectorStoreDataSource(IVectorStore, string, LM, MetadataCollection, CancellationToken)

Creates a new DataSource instance that is backed by the specified vector store. This is a synchronous wrapper around the asynchronous CreateVectorStoreDataSourceAsync(IVectorStore, string, LM, MetadataCollection, CancellationToken) method.

CreateVectorStoreDataSourceAsync(IVectorStore, string, LM, MetadataCollection, CancellationToken)

Asynchronously creates a new DataSource instance that is backed by the specified vector store. This method leverages the vector store to store and retrieve data associated with the data source.
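
Several members on this page come in pairs, where the synchronous method is described as a wrapper around its asynchronous counterpart. The sketch below shows one common way to write such a wrapper in C#. StoreSketch, CreateAsync, and Create are hypothetical names, and the blocking approach shown is an assumption about the general pattern, not a statement of how LM-Kit implements it.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical stand-in; not the LM-Kit implementation.
public static class StoreSketch
{
    // Asynchronous core: simulates a call to a store and returns a handle name.
    public static async Task<string> CreateAsync(
        string identifier, CancellationToken cancellationToken = default)
    {
        cancellationToken.ThrowIfCancellationRequested();

        // ConfigureAwait(false) avoids resuming on a captured context,
        // which reduces the risk of deadlock when a caller blocks on the task.
        await Task.Delay(10, cancellationToken).ConfigureAwait(false);
        return "store:" + identifier;
    }

    // Synchronous wrapper: blocks the calling thread until the async core completes.
    // GetAwaiter().GetResult() rethrows the original exception
    // instead of wrapping it in an AggregateException.
    public static string Create(
        string identifier, CancellationToken cancellationToken = default)
        => CreateAsync(identifier, cancellationToken).GetAwaiter().GetResult();
}
```

Because the wrapper blocks a thread for the duration of the call, the asynchronous variant is usually the better choice in UI or server code that already runs asynchronously.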

Deserialize(byte[], LM, CancellationToken)

Deserializes the given binary data into a DataSource instance using the specified model for context.

Deserialize(Stream, LM, CancellationToken)

Deserializes a DataSource from a stream using the provided model for context.

Deserialize(string, LM, CancellationToken)

Deserializes the binary data from the specified file path into a DataSource instance using the provided model for context.

GetSectionByIdentifier(string)

Retrieves a Section object from this instance based on the specified identifier.

GetSectionByIdentifierAsync(string, CancellationToken)

Asynchronously retrieves a Section object from this instance based on the specified identifier.

HasSection(string)

Determines whether the data source contains a section with the specified identifier.

LoadFromFile(string, LM, bool, CancellationToken)

Loads a DataSource from the specified file path, lazily loading its content. The file is opened in either read-only or read-write mode. In read-write mode, the backing file remains open, so the DataSource is only partially loaded into memory and additional content is decoded on demand during the instance's lifetime. This lazy-loading approach is recommended for large data sources.

LoadFromFileAsync(string, LM, bool, CancellationToken)

Asynchronously loads a DataSource from the specified file path, lazily loading its content. The file is opened in either read-only or read-write mode. In read-write mode, the file stream remains open so that modifications to the DataSource are automatically synchronized back to the file. This lazy-loading design is particularly useful for large data sources that would be inefficient or unnecessary to load into memory all at once.
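
The idea behind lazy loading can be sketched independently of the library. In the example below, LazyRecordFile is a hypothetical class and the length-prefixed record layout is invented for illustration, since the actual DataSource file format is not described on this page. Opening the file only indexes record offsets, and each record is decoded the first time it is requested.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Text;

// Hypothetical stand-in showing on-demand decoding; the layout is invented.
public sealed class LazyRecordFile : IDisposable
{
    private readonly Stream _stream;
    private readonly List<long> _offsets = new();             // where each record starts
    private readonly Dictionary<int, string> _cache = new();  // records decoded so far

    public int Count => _offsets.Count;
    public int DecodedCount => _cache.Count;

    public LazyRecordFile(Stream stream)
    {
        _stream = stream;
        using var reader = new BinaryReader(_stream, Encoding.UTF8, leaveOpen: true);

        // Index pass: note each record's offset and skip its payload without decoding it.
        while (_stream.Position < _stream.Length)
        {
            _offsets.Add(_stream.Position);
            int length = reader.ReadInt32();
            _stream.Seek(length, SeekOrigin.Current);
        }
    }

    // Decodes a record the first time it is requested, then serves it from the cache.
    public string Get(int index)
    {
        if (_cache.TryGetValue(index, out var cached)) return cached;

        _stream.Seek(_offsets[index], SeekOrigin.Begin);
        using var reader = new BinaryReader(_stream, Encoding.UTF8, leaveOpen: true);
        int length = reader.ReadInt32();
        string text = Encoding.UTF8.GetString(reader.ReadBytes(length));
        _cache[index] = text;
        return text;
    }

    // The stream stays open for the lifetime of the instance and is closed here.
    public void Dispose() => _stream.Dispose();

    // Helper that writes length-prefixed UTF-8 records in the layout read above.
    public static void Write(Stream stream, IEnumerable<string> records)
    {
        using var writer = new BinaryWriter(stream, Encoding.UTF8, leaveOpen: true);
        foreach (var record in records)
        {
            byte[] bytes = Encoding.UTF8.GetBytes(record);
            writer.Write(bytes.Length);
            writer.Write(bytes);
        }
    }
}
```

The trade-off is that the stream must remain open while the instance is in use, which is why a type built this way implements IDisposable.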

LoadFromStore(IVectorStore, string, LM, CancellationToken)

Loads a DataSource from the specified vector store using the provided data source identifier and language model. This is a synchronous wrapper around the asynchronous LoadFromStoreAsync(IVectorStore, string, LM, CancellationToken) method.

LoadFromStoreAsync(IVectorStore, string, LM, CancellationToken)

Asynchronously loads a DataSource from the specified vector store using the provided data source identifier and language model.

OptimizeDataSource(string, CancellationToken)

Synchronously optimizes a file-based DataSource by compacting its data and migrating its internal format to the latest supported version.

OptimizeDataSourceAsync(string, CancellationToken)

Asynchronously optimizes a file-based DataSource by compacting its data and migrating its internal format to the latest supported version.

RemoveSection(Section, CancellationToken)

Removes the specified Section from this DataSource.

RemoveSection(string, CancellationToken)

Removes the section identified by the specified identifier from this DataSource.

RemoveSectionAsync(Section, CancellationToken)

Asynchronously removes the specified Section from this DataSource.

RemoveSectionAsync(string, CancellationToken)

Asynchronously removes the section identified by the specified identifier from this DataSource.

Serialize()

Serializes this instance into a binary format and returns the resulting data as a byte array.

Serialize(Stream)

Serializes this instance into a binary format and writes it to the specified stream.

Serialize(string)

Serializes this instance into a binary format and writes it to the specified file path.

Upsert(string, IEnumerable<VectorEntry>, MetadataCollection, CancellationToken)

Inserts a sequence of vector entries into the specified section, creating it if necessary. Wraps each DataSource.VectorEntry’s payload in a TextPartition and stores its embedding.

Upsert(string, IEnumerable<float[]>, MetadataCollection, CancellationToken)

Inserts multiple embedding vectors into the specified section in a single operation, creating the section if necessary. Delegates to the core Upsert implementation after wrapping each float[] in a DataSource.VectorEntry.

Upsert(string, float[], MetadataCollection, CancellationToken)

Inserts a single embedding vector into the specified section, creating the section if it does not already exist. Wraps the provided float[] payload in a DataSource.VectorEntry and delegates to the core Upsert implementation.

UpsertAsync(string, VectorEntry, MetadataCollection, CancellationToken)

Asynchronously inserts a single vector entry into the specified section, creating the section if it does not already exist. Wraps the DataSource.VectorEntry’s payload in a TextPartition and stores its embedding.

UpsertAsync(string, IEnumerable<VectorEntry>, MetadataCollection, CancellationToken)

Asynchronously inserts a sequence of vector entries into the specified section, creating the section if it does not already exist. Wraps each DataSource.VectorEntry’s payload in a TextPartition and stores its embedding.

UpsertAsync(string, IEnumerable<float[]>, MetadataCollection, CancellationToken)

Asynchronously inserts multiple embedding vectors into the specified section in a single operation, creating the section if necessary. Delegates to the core UpsertAsync implementation after wrapping each float[] in a DataSource.VectorEntry.

UpsertAsync(string, float[], MetadataCollection, CancellationToken)

Asynchronously inserts a single embedding vector into the specified section, creating the section if it does not already exist. Wraps the provided float[] payload in a DataSource.VectorEntry and delegates to the core UpsertAsync implementation.
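
The Upsert descriptions above follow a common overload pattern: convenience overloads wrap their input and delegate to one core implementation. The sketch below illustrates that pattern with hypothetical stand-ins (SectionStoreSketch, EntrySketch). It omits the MetadataCollection and CancellationToken parameters, stores an empty payload for raw vectors as an arbitrary choice, and only appends entries, so it should not be read as the library's actual behavior.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-ins; names and shapes are illustrative, not the LM-Kit types.
public sealed record EntrySketch(float[] Embedding, string Payload);

public sealed class SectionStoreSketch
{
    private readonly Dictionary<string, List<EntrySketch>> _sections = new();

    // Core overload: every other overload funnels into this one.
    public void Upsert(string sectionId, IEnumerable<EntrySketch> entries)
    {
        if (!_sections.TryGetValue(sectionId, out var list))
        {
            list = new List<EntrySketch>();   // create the section on first use
            _sections[sectionId] = list;
        }
        list.AddRange(entries);
    }

    // Batch of raw vectors: wrap each one in an entry, then delegate to the core.
    public void Upsert(string sectionId, IEnumerable<float[]> vectors)
        => Upsert(sectionId, vectors.Select(v => new EntrySketch(v, string.Empty)));

    // Single raw vector: wrap it in a one-element batch, then delegate.
    public void Upsert(string sectionId, float[] vector)
        => Upsert(sectionId, new[] { vector });

    // Number of entries stored in a section, or 0 if the section does not exist.
    public int Count(string sectionId)
        => _sections.TryGetValue(sectionId, out var list) ? list.Count : 0;
}
```

Keeping the section-creation logic in a single overload means the "create if it does not already exist" rule is enforced in one place, whichever entry point the caller uses.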