Class LM
Provides a unified manager for AI model instances, including small and large language models, embedding models (image and text), vision-language models, and speech-to-text models, across GGUF, GGML, ONNX, and LMK container formats. This class handles downloading, loading, configuring, and interacting with these models, ensuring a consistent API for text generation, embeddings, multimodal inference, and speech recognition.
public sealed class LM : IDisposable
- Inheritance
- object → LM
- Implements
- IDisposable
Examples
Example: Load a model by ID (recommended)
using LMKit.Model;
using System;
// Load a predefined model by its ID
LM model = LM.LoadFromModelID("llama-3.2-1b");
Console.WriteLine($"Model: {model.Name}");
Console.WriteLine($"Context length: {model.ContextLength}");
Console.WriteLine($"Has text generation: {model.HasTextGeneration}");
Console.WriteLine($"Has vision: {model.HasVision}");
// Use with various LM-Kit components
// var chat = new MultiTurnConversation(model);
// var summarizer = new Summarizer(model);
Example: Load from local file with GPU configuration
using LMKit.Model;
using System;
// Configure GPU usage
var deviceConfig = new LM.DeviceConfiguration
{
    GpuLayerCount = 35 // Offload 35 layers to GPU
};
// Load from local GGUF file
LM model = new LM(
    "models/llama-3.2-1b.Q4_K_M.gguf",
    deviceConfiguration: deviceConfig,
    loadingProgress: progress =>
    {
        Console.WriteLine($"Loading: {progress:P0}");
        return true; // Continue loading
    });
Console.WriteLine($"Loaded: {model.Name}");
Console.WriteLine($"GPU layers: {model.GpuLayerCount}");
Example: Load embedding model
using LMKit.Model;
using LMKit.Embeddings;
using System;
// Load an embedding model
LM embeddingModel = LM.LoadFromModelID("embeddinggemma-300m");
Console.WriteLine($"Is embedding model: {embeddingModel.IsEmbeddingModel}");
Console.WriteLine($"Embedding size: {embeddingModel.EmbeddingSize}");
// Use with Embedder
var embedder = new Embedder(embeddingModel);
float[] vector = embedder.GetEmbeddings("Hello world");
Example: Download model from URI with progress
using LMKit.Model;
using System;
var modelUri = new Uri("https://huggingface.co/lmkit/llama-3.2-1b-gguf/resolve/main/Llama-3.2-1B-Instruct-Q4_K_M.gguf");
LM model = new LM(
    modelUri,
    storagePath: "models/",
    downloadingProgress: (path, contentLength, bytesRead) =>
    {
        if (contentLength.HasValue)
            Console.WriteLine($"Downloading: {bytesRead}/{contentLength} bytes");
        return true; // Continue downloading
    });
Console.WriteLine($"Downloaded and loaded: {model.Name}");
Remarks
The LM class is the core entry point for working with language models in LM-Kit.NET. It supports multiple model types and provides unified access to model capabilities.
Supported Model Types
- Text generation models (chat, completion)
- Embedding models (text and image)
- Vision-language models (multimodal)
- Speech-to-text models (Whisper)
Loading Methods
- LoadFromModelID(string, string, DeviceConfiguration, LoadingOptions, ModelDownloadingProgressCallback, ModelLoadingProgressCallback) - Load predefined models by ID (recommended)
- LM(string, DeviceConfiguration, LoadingOptions, ModelLoadingProgressCallback) - Load from local file path
- LM(Uri, string, DeviceConfiguration, LoadingOptions, ModelDownloadingProgressCallback, ModelLoadingProgressCallback) - Load from URI (auto-download)
- LM(ModelCard, string, DeviceConfiguration, LoadingOptions, ModelDownloadingProgressCallback, ModelLoadingProgressCallback) - Load from ModelCard
Key Properties
- HasTextGeneration - Whether the model supports text generation
- IsEmbeddingModel - Whether the model is an embedding model
- HasVision - Whether the model supports vision input
- HasSpeechToText - Whether the model supports speech recognition
- HasToolCalls - Whether the model supports function/tool calling
- ContextLength - Maximum context window size in tokens
- EmbeddingSize - Dimension of embedding vectors
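The capability flags listed above can be checked before a model is handed to a component. The following is a minimal sketch that uses only members documented on this page and assumes the Has*/Is* flags are plain bool properties; the Describe helper is hypothetical and not part of LM-Kit.NET.

```csharp
using LMKit.Model;
using System;

// Load a predefined model; the using declaration disposes it at the end of the program.
using LM model = LM.LoadFromModelID("llama-3.2-1b");
Describe(model);

// Hypothetical helper (not part of LM-Kit.NET): prints what a loaded model can do.
static void Describe(LM model)
{
    Console.WriteLine($"Model: {model.Name}");
    Console.WriteLine($"Context length (tokens): {model.ContextLength}");

    if (model.IsEmbeddingModel)
    {
        // Embedding models expose the dimension of the vectors they produce.
        Console.WriteLine($"Embedding size: {model.EmbeddingSize}");
    }

    if (model.HasTextGeneration)
    {
        Console.WriteLine("Supports text generation (chat/completion).");
        Console.WriteLine($"Supports tool calls: {model.HasToolCalls}");
    }

    Console.WriteLine($"Supports vision input: {model.HasVision}");
    Console.WriteLine($"Supports speech-to-text: {model.HasSpeechToText}");
}
```

Checking the flags up front avoids passing, for example, an embedding-only model to a chat component.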
Constructors
- LM(ModelCard, string, DeviceConfiguration, LoadingOptions, ModelDownloadingProgressCallback, ModelLoadingProgressCallback)
Creates an instance of the LM class from a ModelCard object.
- LM(Stream, DeviceConfiguration, LoadingOptions, ModelLoadingProgressCallback, bool)
Creates an instance of the LM class from a caller-supplied modelStream carrying a plaintext GGUF model. The stream must be readable and seekable; its current position is reset to the start of the stream when reading. By default the constructor takes ownership of the stream and disposes it once loading completes (or fails); pass leaveOpen = true to retain ownership in the caller.
- LM(string, DeviceConfiguration, LoadingOptions, ModelLoadingProgressCallback)
Creates an instance of the LM class from a file.
- LM(Uri, string, DeviceConfiguration, LoadingOptions, ModelDownloadingProgressCallback, ModelLoadingProgressCallback)
Creates an instance of the LM class from a System.Uri object.
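The stream-based constructor and its ownership rule can be illustrated as follows. This is a sketch under stated assumptions: the file path is hypothetical, and the remaining constructor parameters are assumed to be optional (as they are in the file-path example earlier on this page), so that leaveOpen can be passed as a named argument.

```csharp
using LMKit.Model;
using System;
using System.IO;

// Hypothetical local path. A FileStream opened with File.OpenRead is readable
// and seekable, which is what the constructor requires.
using FileStream stream = File.OpenRead("models/llama-3.2-1b.Q4_K_M.gguf");

// Assumption: deviceConfiguration, loadingOptions and loadingProgress are optional.
// leaveOpen: true keeps ownership of the stream with the caller.
using LM model = new LM(stream, leaveOpen: true);

Console.WriteLine($"Loaded from stream: {model.Name}");

// using declarations dispose in reverse order: the model first, then the stream.
```

Without leaveOpen: true, the constructor disposes the stream itself once loading completes or fails, so the caller should not use the stream afterwards.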
Properties
- Adapters
Gets the collection of LoRA adapters currently applied to the model.
- Architecture
Gets the architecture type of the model, e.g. 'llama', 'bert', 'phi'.
- ChatTemplateFormat
Gets or sets the format of the model chat template as detected by LM-Kit.
- ChatTemplateFormatFlags
Gets or sets the flags that specify how the chat template should be formatted.
- ContextLength
Gets the context size the model was trained on.
- Description
Specifies the model description.
- DraftModel
Gets or sets an optional smaller "draft" model that accelerates this model's text generation through draft-model speculative decoding.
- DraftModelSizeBytes
Gets the weight size, in bytes, of the attached speculative-decoding draft model, or 0 when no separate draft model is attached. The draft model's weights are held apart from this model's Size, so this is reported separately to make the draft's contribution to the total memory footprint visible. In-model Multi-Token Prediction (self-speculation) carries no separate weights and returns 0 here while still enabling speculative decoding (see HasSpeculativeDecodingDrafts).
- EmbeddingSize
Gets the dimension of embedding vectors produced by the model.
- GpuLayerCount
Gets the number of model layers that have been offloaded to GPU memory (VRAM).
- HasImageEmbeddings
Indicates whether the model supports image embeddings.
- HasImageSegmentation
Indicates whether the model supports image segmentation, enabling partitioning of an image into regions or object masks for downstream analysis and understanding.
- HasReasoning
Gets a value indicating whether the model supports intermediate reasoning steps, such as structured thinking, multi-step problem solving, or chain-of-thought prompting.
- HasSpeculativeDecodingDrafts
Indicates whether this model carries speculative-decoding draft assets that were loaded and can drive draft-and-verify decoding. This is true when either the checkpoint declares Multi-Token Prediction (MTP) heads (GGUF key <arch>.nextn_predict_layers, present on architectures such as Qwen 3.5 / 3.6, GLM-4.x, DeepSeek V3 / R1, BailingMoE2, and ExaOne-MoE) and those heads were loaded, or a DraftModel is attached (including one shipped inside the model envelope). Returns false when the packaged draft assets were skipped at load time via EnableSpeculativeDecodingDrafts = false.
- HasSpeechToText
Gets a value indicating whether speech-to-text support is available in this model.
- HasTextGeneration
Indicates whether this model can perform text generation (chat/completions).
- HasToolCalls
Gets a value indicating whether the loaded model can emit tool calls during chat/text generation.
- HasVision
Indicates whether the model includes vision capabilities, enabling it to process visual content in addition to text.
- ID
Gets a unique identifier for the loaded model instance.
- IsEmbeddingModel
Indicates whether the model primarily functions as a text embedding model.
- IsEncrypted
True if this model instance was loaded from an LM-Kit encrypted GGUF container via LoadEncrypted(string, GgufEncryptionScheme, string, DeviceConfiguration, LoadingOptions, ModelLoadingProgressCallback) or LoadEncryptedFromStream(Stream, GgufEncryptionScheme, string, DeviceConfiguration, LoadingOptions, ModelLoadingProgressCallback, bool).
- LayerCount
Specifies the number of input layers in the model.
- MainGpu
Gets the GPU used for scratch and small tensors.
- ModelMetadata
Specifies metadata keys in the model.
- ModelPath
Gets the full local file system path where the model is stored. For a downloaded model, this is the destination location of the download.
- ModelType
Gets the precision of the model input tensors.
- ModelUri
Gets the original URI from which the model is downloaded or can be accessed remotely. This property represents the source location of the model.
- Name
Specifies the model name.
- ParameterCount
Gets the number of parameters of the model.
- RopeAlgorithm
Gets the type of RoPE algorithm used for positional encoding in the model.
- RopeFreqScaleTrain
Gets the RoPE frequency scaling factor that was used when the model was trained.
- Size
Gets the model size, in bytes.
- Vocabulary
Gets the model's vocabulary handler, offering tokenization features.
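Because DraftModelSizeBytes is reported separately from Size, the combined weight footprint has to be summed by the caller. The sketch below shows one way to do that; it assumes both properties are numeric, and uses Convert.ToDouble so that it does not depend on their exact integer types.

```csharp
using LMKit.Model;
using System;

using LM model = LM.LoadFromModelID("llama-3.2-1b");

// Assumption: Size and DraftModelSizeBytes are numeric properties.
// Convert.ToDouble accepts any of the built-in numeric types.
double mainBytes = Convert.ToDouble(model.Size);
double draftBytes = Convert.ToDouble(model.DraftModelSizeBytes);

const double GiB = 1024.0 * 1024.0 * 1024.0;

Console.WriteLine($"Main model weights:  {mainBytes / GiB:F2} GiB");
Console.WriteLine($"Draft model weights: {draftBytes / GiB:F2} GiB");
Console.WriteLine($"Total weights:       {(mainBytes + draftBytes) / GiB:F2} GiB");

// A draft size of 0 does not rule out speculative decoding: in-model
// Multi-Token Prediction carries no separate weights.
Console.WriteLine($"Speculative decoding drafts loaded: {model.HasSpeculativeDecodingDrafts}");
Console.WriteLine($"Layers on GPU: {model.GpuLayerCount} of {model.LayerCount}");
```

This covers weights only; memory used by inference contexts (KV-caches) is reported separately by GetLoadedContexts().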
Methods
- ApplyLoraAdapter(LoraAdapterSource)
Preloads a Low-Rank Adaptation (LoRA) adapter into the model using parameters from a LoraAdapterSource instance. Upon preload, the adapter is registered in the model’s Adapters collection and will only be applied at inference time when its Scale is greater than zero.
- ApplyLoraAdapter(string, float)
Preloads a Low-Rank Adaptation (LoRA) adapter from the specified file into the model, preparing it for dynamic application at inference time. Upon preload, a new entry appears in the model’s Adapters collection. The adapter will only take effect when its scale is greater than zero.
- ClearCache()
Removes all cached resources linked to this model instance from memory.
- ClearLookupCaches()
Clears this model's rebuildable lookup caches (tokenization and embedding caches) without disposing loaded weights or inference contexts. Unlike ClearCache(), this is safe to call while the model is actively in use: cleared entries are simply recomputed on demand.
- Dispose()
Ensures the release of this instance and the complete removal of all associated unmanaged resources.
- GetLoadedContexts()
Returns a snapshot of the inference contexts (KV-caches) currently held in memory for this model: those actively in use and those idle in the recycle pool. Each entry reports the context's token capacity, memory footprint, residency state, and device, so callers can see exactly what is keeping the model resident and where its memory is going. The list is empty when no context is held.
- HibernateAllContexts()
Schedules background hibernation of every in-memory inference context held for this model, both those actively in use and those idle in the recycle pool, serializing each context's state (including its speculative-decoding draft sibling) to disk and releasing its device and host memory. Returns the number of contexts scheduled.
- LoadEncrypted(string, GgufEncryptionScheme, string, DeviceConfiguration, LoadingOptions, ModelLoadingProgressCallback)
Loads a GGUF model from an LM-Kit encrypted container, decrypting tensor bytes on the fly from disk. The plaintext GGUF is never materialized in memory or written back to disk: only the metadata block (a few MB) and one tensor's worth of bytes at a time are ever decrypted.
- LoadEncryptedFromStream(Stream, GgufEncryptionScheme, string, DeviceConfiguration, LoadingOptions, ModelLoadingProgressCallback, bool)
Loads an LM-Kit encrypted GGUF (.lmke) container directly from a caller-supplied encryptedStream. Tensor bytes are decrypted on the fly into the native destination buffer; no plaintext copy is held in memory or written to disk. The stream must be readable and seekable.
- LoadFromModelID(string, string, DeviceConfiguration, LoadingOptions, ModelDownloadingProgressCallback, ModelLoadingProgressCallback)
Loads an LM instance based on a given model identifier.
Where to find modelID:
1. Use GetPredefinedModelCards(bool) to see a list of available predefined models. Each predefined model exposes its unique identifier in the ModelID property.
2. For custom models, define your own ModelCard instance and specify the ModelID.
- ReleaseIdlePooledContexts(out long)
Disposes every inference context sitting idle in the process-wide recycle pool and returns the number released. Contexts currently in use by a live session are never touched, so this is safe to call on a busy process: the only cost is that the next request for a recycled context allocates a fresh one. Use it to reclaim KV-cache memory on demand.
- RemoveLoraAdapter(LoraAdapter)
Removes a previously applied LoRA adapter from this model instance.
- ValidateFormat(string, bool)
Validates the specified model file to ensure it adheres to a supported format, including verifying its header and overall structure.
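Several of the methods above are typically used together over a model's lifetime. The following is a minimal sketch of that lifecycle, assuming only the signatures listed on this page; the adapter path is hypothetical, and the return values of the calls are deliberately ignored because they are not described here.

```csharp
using LMKit.Model;
using System;

// LM implements IDisposable, so a using declaration guarantees that
// Dispose() releases the unmanaged resources when the scope ends.
using LM model = LM.LoadFromModelID("llama-3.2-1b");

// Hypothetical adapter file. The second argument is the scale: the adapter
// only takes effect at inference time when the scale is greater than zero.
model.ApplyLoraAdapter("adapters/my-adapter.gguf", 1.0f);

// ... run inference with the model here ...

// Drops the rebuildable tokenization and embedding caches. Per the
// description above, this is safe while the model is in use, unlike ClearCache().
model.ClearLookupCaches();

Console.WriteLine($"Done with {model.Name}");
// The model is disposed automatically here, at the end of the enclosing scope.
```

Preferring a using declaration over a manual Dispose() call keeps the release path correct even when inference throws.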