Class Configuration
A static class providing global configuration settings for the LM-Kit runtime.
public static class Configuration
Inheritance: object → Configuration
Examples
Example: Configure model storage directory
using LMKit.Global;
using LMKit.Model;
using System;
// Set custom model storage directory before loading any models
Configuration.ModelStorageDirectory = @"D:\MyModels";
// Models will now be downloaded to and loaded from this directory
LM model = LM.LoadFromModelID("llama-3.2-1b");
Console.WriteLine($"Model loaded from: {Configuration.ModelStorageDirectory}");
Example: Configure for multi-GPU inference
using LMKit.Global;
using LMKit.Model;
using System;
// Enable distributed inference across multiple GPUs
Configuration.FavorDistributedInference = true;
// Load a large model that benefits from multi-GPU
LM model = LM.LoadFromModelID("llama-3.1-70b");
Console.WriteLine("Model distributed across available GPUs");
Example: Performance tuning settings
using LMKit.Global;
using System;
// Disable checksum validation for faster loading (trusted sources only)
Configuration.EnableModelChecksumValidation = false;
// Enable model caching for faster subsequent loads
Configuration.EnableModelCache = true;
// Enable KV cache recycling to reduce memory usage
Configuration.EnableKVCacheRecycling = true;
// Enable token healing to correct tokenization artifacts at completion boundaries
Configuration.EnableTokenHealing = true;
Console.WriteLine("LM-Kit configured for optimal performance");
Remarks
The Configuration class allows you to customize how LM-Kit operates at the global level. Settings include model storage locations, GPU utilization, caching behavior, and performance tuning.
Key Configuration Areas
- ModelStorageDirectory - Where models are downloaded and cached
- EnableModelCache - Control model caching for faster reloading
- FavorDistributedInference - Multi-GPU distribution settings
- EnableTokenHealing - Improve tokenization quality
Fields
- DownloadChunkSize
Gets or sets the size of each chunk used when downloading data.
- EnableCompletionHealing
Gets or sets a value indicating whether completion healing is enabled. Completion healing attempts to correct errors during the completion process.
- EnableContextRecycling
Gets or sets a value indicating whether context recycling is enabled. Context recycling reuses previously computed context data to improve performance.
- EnableContextTokenHealing
Gets or sets a value indicating whether context token healing is enabled. Context token healing corrects errors introduced when tokenizing the context.
- EnableDynamicSampling
Gets or sets a value indicating whether the dynamic sampling strategy is enabled.
When enabled, the dynamic sampling strategy is applied during inference time to optimize model performance and the quality of generated outputs by utilizing multiple token selection methods.
Key features of dynamic sampling include:
- Dynamic Constrained Generation: Restricts the token space at each decoding step based on real-time conditions, ensuring relevance and adherence to specific constraints.
- Perplexity-Based Token Selection: Selects tokens that minimize perplexity, enhancing the coherence and contextual consistency of the generated output.
- Context-Aware Sampling: Leverages predefined contextual data to guide token choices, resulting in more fluent and contextually appropriate completions.
- Speculative Sampling: Incorporates speculative sampling techniques based on real-time natural language processing (NLP) analysis during the decoding process.
- Adaptive Model Compatibility: Eliminates the need for model fine-tuning to achieve high accuracy. The strategy adapts to the model's stylistic preferences during inference while maintaining low perplexity for future token selections.
Dynamic sampling acts as a real-time "voting" mechanism, blending constrained sampling with speculative sampling based on the current decoding state during inference.
This strategy is particularly effective at reducing inference times while improving accuracy and quality. It excels in tasks such as function calling, classification, and information extraction. A usage sketch appears after this field list.
- EnableKVCacheRecycling
Gets or sets a value indicating whether KV (key-value) cache recycling is enabled. KV cache recycling can help reduce memory footprint and improve performance.
- EnableModelCache
Gets or sets a value indicating whether model caching is enabled. Model caching improves performance by reusing previously loaded models.
- EnableModelChecksumValidation
Gets or sets a value indicating whether model file integrity must be validated during loading.
- EnableTokenHealing
Gets or sets a value indicating whether token healing is enabled. Token healing attempts to correct errors in tokenization.
- EnableTokenizationCache
Gets or sets a value indicating whether tokenization caching is enabled. Tokenization caching improves performance by storing the results of tokenization operations.
- FavorDistributedInference
Determines whether the runtime should prefer splitting computations across multiple GPUs.
- MaxCachedContextLength
Gets or sets the maximum length of context data that can be cached.
- ModelStorageDirectoryEnvVar
The name of the environment variable used to specify a custom model storage directory.
- TokenizerCacheLength
Gets or sets the length of the tokenizer cache. This value determines how many tokens are stored in the cache.
- UseAsyncModelAttributesLoading
Gets or sets a value indicating whether model attributes are loaded asynchronously. Asynchronous loading can improve model load-time performance.
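Example: Tuning download and cache fields
A minimal sketch combining several of the fields above. The numeric values and the assumed units (bytes for DownloadChunkSize, tokens for the cache lengths) are illustrative assumptions, not recommended defaults.
using LMKit.Global;
using System;
// Use larger download chunks (illustrative value; unit assumed to be bytes)
Configuration.DownloadChunkSize = 8 * 1024 * 1024;
// Turn on the dynamic sampling strategy described above
Configuration.EnableDynamicSampling = true;
// Cache tokenization results and bound the caches (illustrative sizes)
Configuration.EnableTokenizationCache = true;
Configuration.TokenizerCacheLength = 1024;
Configuration.MaxCachedContextLength = 4096;
// Load model attributes asynchronously to shorten model load time
Configuration.UseAsyncModelAttributesLoading = true;
// ModelStorageDirectoryEnvVar holds the name of the environment variable, not the path itself
string envDir = Environment.GetEnvironmentVariable(Configuration.ModelStorageDirectoryEnvVar) ?? "(not set)";
Console.WriteLine($"Storage directory from environment: {envDir}");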
Properties
- MinContextSize
Gets or sets the minimum context size allowed by the runtime.
- ModelStorageDirectory
Gets or sets the default storage path for model files. If the path does not exist during a set operation, it will be created automatically.
- ThreadCount
Gets or sets the maximum number of threads used for processing. Defaults to the number of physical cores available on the machine.
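Example: Constraining threading and context size
A minimal sketch for the two properties above; both values are illustrative assumptions (MinContextSize is assumed to be expressed in tokens).
using LMKit.Global;
using System;
// Cap processing at four threads, e.g., to leave cores free for other workloads
Configuration.ThreadCount = 4;
// Raise the minimum context size (illustrative value)
Configuration.MinContextSize = 512;
Console.WriteLine($"Threads: {Configuration.ThreadCount}, minimum context size: {Configuration.MinContextSize}");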
Methods
- EnableExperimentalFeature(string)
Enables an experimental feature by specifying its feature identifier.
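Example: Enabling an experimental feature
A minimal sketch of calling the method above; the feature identifier shown is hypothetical, so substitute an identifier documented for your LM-Kit version.
using LMKit.Global;
// "my-experimental-feature" is a hypothetical identifier, not a documented LM-Kit feature ID
Configuration.EnableExperimentalFeature("my-experimental-feature");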