Load and Merge LoRA Adapters at Inference Time
LoRA (Low-Rank Adaptation) adapters let you customize a base model's behavior without retraining from scratch. LM-Kit.NET supports two strategies for using LoRA adapters at inference time: hot-swapping (applying and removing adapters on a running model) and permanent merging (baking one or more adapters into a new model file). This tutorial covers both approaches, from applying a single adapter to blending multiple domain-specific adapters into one model.
Why LoRA Adapter Composition Matters
Two real-world problems that LoRA adapter composition solves:
- Multi-tenant customization without model duplication. A SaaS platform serves multiple customers, each with fine-tuned behavior. Instead of loading separate model files per tenant, you load one base model and hot-swap LoRA adapters per request. Because adapters are small relative to the base weights, memory usage stays close to that of a single loaded model.
- Combining domain expertise from multiple fine-tunes. A legal assistant fine-tuned on contracts and another fine-tuned on regulatory filings can be merged into a single adapter. Merging avoids the latency of switching adapters mid-conversation and produces a model that handles both domains simultaneously.
Prerequisites
| Requirement | Minimum |
|---|---|
| .NET SDK | 8.0+ |
| Base model | Any GGUF model file |
| LoRA adapter(s) | GGUF-format LoRA files compatible with the base model |
Step 1: Create the Project
dotnet new console -n LoraAdapters
cd LoraAdapters
dotnet add package LM-Kit.NET
Step 2: Understand the Two Strategies
        ┌───────────────────────────────────────────────┐
        │               Base Model (GGUF)               │
        └───────────────────────┬───────────────────────┘
                                │
                    ┌───────────┴───────────┐
                    ▼                       ▼
            ┌───────────────┐       ┌─────────────────┐
            │   Hot-Swap    │       │    Permanent    │
            │   (runtime)   │       │      Merge      │
            │               │       │                 │
            │  ApplyLora()  │       │   LoraMerger    │
            │ RemoveLora()  │       │    .Merge()     │
            │               │       │                 │
            │ Instant swap  │       │  New GGUF file  │
            │  No disk I/O  │       │ No runtime cost │
            └───────────────┘       └─────────────────┘
| Approach | Best for | Trade-off |
|---|---|---|
| Hot-swap | Per-request customization, A/B testing, multi-tenant | Small runtime overhead per token |
| Permanent merge | Production deployment, single-purpose models | Disk space for merged file |
Step 3: Validate LoRA Compatibility
Before using a LoRA adapter, verify it matches the base model architecture:
using LMKit.Finetuning;
string baseModelPath = "models/base-model.gguf";
string loraPath = "adapters/customer-support-lora.gguf";
// Validate that the adapter is compatible with the base model
bool isValid = LoraAdapterSource.ValidateFormat(loraPath);
if (isValid)
{
Console.WriteLine("LoRA adapter format is valid.");
}
else
{
Console.WriteLine("LoRA adapter format is invalid or corrupted.");
}
Step 4: Hot-Swap LoRA Adapters at Runtime
using System.Text;
using LMKit.Finetuning;
using LMKit.Model;
using LMKit.TextGeneration;
LMKit.Licensing.LicenseManager.SetLicenseKey("");
Console.InputEncoding = Encoding.UTF8;
Console.OutputEncoding = Encoding.UTF8;
// ──────────────────────────────────────
// 1. Load the base model
// ──────────────────────────────────────
string baseModelPath = "models/base-model.gguf";
using LM model = new LM(baseModelPath,
loadingProgress: p =>
{
Console.Write($"\r Loading base model: {p * 100:F0}% ");
return true;
});
Console.WriteLine($"\n Loaded: {model.Name}\n");
// ──────────────────────────────────────
// 2. Apply a LoRA adapter
// ──────────────────────────────────────
string loraPathSupport = "adapters/customer-support-lora.gguf";
model.ApplyLoraAdapter(loraPathSupport, scale: 1.0f);
Console.WriteLine($"Applied adapter. Active adapters: {model.Adapters.Count}");
foreach (LoraAdapter adapter in model.Adapters)
{
Console.WriteLine($" - {adapter.Identifier} (scale: {adapter.Scale:F2}, path: {adapter.Path})");
}
// ──────────────────────────────────────
// 3. Run inference with the adapter active
// ──────────────────────────────────────
var chat = new SingleTurnConversation(model)
{
SystemPrompt = "You are a helpful customer support agent.",
MaximumCompletionTokens = 256
};
TextGenerationResult result = chat.Submit("How do I reset my password?");
Console.WriteLine($"\n[With support adapter]\n{result.Completion}");
// ──────────────────────────────────────
// 4. Swap to a different adapter
// ──────────────────────────────────────
// Remove the current adapter
LoraAdapter currentAdapter = model.Adapters[0];
bool removed = model.RemoveLoraAdapter(currentAdapter);
Console.WriteLine($"\nRemoved adapter: {removed}");
// Apply a different adapter
string loraPathLegal = "adapters/legal-review-lora.gguf";
model.ApplyLoraAdapter(loraPathLegal, scale: 0.8f);
var legalChat = new SingleTurnConversation(model)
{
SystemPrompt = "You are a legal document reviewer.",
MaximumCompletionTokens = 256
};
TextGenerationResult legalResult = legalChat.Submit("Summarize the key obligations in this contract clause.");
Console.WriteLine($"\n[With legal adapter]\n{legalResult.Completion}");
// ──────────────────────────────────────
// 5. Use LoraAdapterSource for more control
// ──────────────────────────────────────
var adapterSource = new LoraAdapterSource(
path: "adapters/medical-lora.gguf",
scale: 0.6f);
// Remove previous adapter first
model.RemoveLoraAdapter(model.Adapters[0]);
// Apply using the source object
model.ApplyLoraAdapter(adapterSource);
Console.WriteLine($"\nApplied medical adapter at scale {adapterSource.Scale:F1}");
Console.WriteLine($"Active adapters: {model.Adapters.Count}");
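The swap pattern above extends naturally to the multi-tenant scenario from the introduction. A sketch, assuming each tenant maps to exactly one adapter file; the dictionary, tenant names, and paths are illustrative:

```csharp
// Hypothetical tenant-to-adapter mapping; paths are illustrative.
var tenantAdapters = new Dictionary<string, string>
{
    ["acme"] = "adapters/acme-lora.gguf",
    ["globex"] = "adapters/globex-lora.gguf",
};

string Respond(LM model, string tenant, string prompt)
{
    // Drop whatever adapter the previous request left active.
    while (model.Adapters.Count > 0)
    {
        model.RemoveLoraAdapter(model.Adapters[0]);
    }

    model.ApplyLoraAdapter(tenantAdapters[tenant], scale: 1.0f);

    var chat = new SingleTurnConversation(model)
    {
        MaximumCompletionTokens = 256
    };
    return chat.Submit(prompt).Completion;
}
```

One base model stays resident; only the lightweight adapter changes between requests.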
Step 5: Adjust Adapter Scale Dynamically
You can change an adapter's influence without removing and re-applying it:
using System.Text;
using LMKit.Finetuning;
using LMKit.Model;
using LMKit.TextGeneration;
LMKit.Licensing.LicenseManager.SetLicenseKey("");
Console.InputEncoding = Encoding.UTF8;
Console.OutputEncoding = Encoding.UTF8;
// ──────────────────────────────────────
// 1. Load the base model
// ──────────────────────────────────────
string baseModelPath = "models/base-model.gguf";
using LM model = new LM(baseModelPath,
loadingProgress: p =>
{
Console.Write($"\r Loading base model: {p * 100:F0}% ");
return true;
});
// Apply an adapter first so there is something to adjust
model.ApplyLoraAdapter("adapters/customer-support-lora.gguf", scale: 1.0f);
LoraAdapter activeAdapter = model.Adapters[0];
// Start with full influence
activeAdapter.Scale = 1.0f;
Console.WriteLine($"Scale: {activeAdapter.Scale:F2}");
// Reduce influence to 50%
activeAdapter.Scale = 0.5f;
Console.WriteLine($"Scale: {activeAdapter.Scale:F2}");
// Effectively disable without removing
activeAdapter.Scale = 0.0f;
Console.WriteLine($"Scale: {activeAdapter.Scale:F2} (disabled)");
This is useful for gradually blending adapter behavior during A/B testing or for implementing a "confidence dial" that controls how much the fine-tuned behavior influences responses.
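The "confidence dial" can be sketched as a simple scale schedule; the step values and the evaluation loop are illustrative, and activeAdapter is the adapter obtained in the snippet above:

```csharp
// Ramp the adapter's influence from 0 to full over several batches,
// e.g. to A/B-test a new fine-tune against the base behavior.
float[] schedule = { 0.0f, 0.25f, 0.5f, 0.75f, 1.0f };

foreach (float scale in schedule)
{
    activeAdapter.Scale = scale;
    // ... run a batch of evaluation prompts at this scale and log results ...
    Console.WriteLine($"Evaluated at scale {scale:F2}");
}
```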
Step 6: Permanently Merge LoRA Adapters
using System.Text;
using LMKit.Finetuning;
using LMKit.Model;
LMKit.Licensing.LicenseManager.SetLicenseKey("");
Console.InputEncoding = Encoding.UTF8;
Console.OutputEncoding = Encoding.UTF8;
string baseModelPath = "models/base-model.gguf";
string mergedOutputPath = "models/merged-model.gguf";
// ──────────────────────────────────────
// 1. Create a merger from the base model path
// ──────────────────────────────────────
var merger = new LoraMerger(baseModelPath);
// ──────────────────────────────────────
// 2. Add multiple adapters with different scales
// ──────────────────────────────────────
merger.AddLoraAdapter("adapters/customer-support-lora.gguf", scale: 1.0f);
merger.AddLoraAdapter("adapters/legal-review-lora.gguf", scale: 0.7f);
// Or use LoraAdapterSource objects
var medicalAdapter = new LoraAdapterSource("adapters/medical-lora.gguf", scale: 0.5f);
merger.AddLoraAdapter(medicalAdapter);
Console.WriteLine("Adapters queued for merge:");
Console.WriteLine(" - customer-support (scale: 1.0)");
Console.WriteLine(" - legal-review (scale: 0.7)");
Console.WriteLine(" - medical (scale: 0.5)");
// ──────────────────────────────────────
// 3. Configure merge options
// ──────────────────────────────────────
merger.EnableQuantization = true; // quantize during merge
merger.ThreadCount = Environment.ProcessorCount;
// ──────────────────────────────────────
// 4. Execute the merge
// ──────────────────────────────────────
Console.WriteLine($"\nMerging into: {mergedOutputPath}");
merger.Merge(mergedOutputPath);
Console.WriteLine("Merge complete.");
// ──────────────────────────────────────
// 5. Load and test the merged model
// ──────────────────────────────────────
using LM mergedModel = new LM(mergedOutputPath,
loadingProgress: p =>
{
Console.Write($"\r Loading merged model: {p * 100:F0}% ");
return true;
});
Console.WriteLine($"\n Loaded merged model: {mergedModel.Name}");
Console.WriteLine($" No adapters needed: {mergedModel.Adapters.Count} active adapters");
Step 7: Merge Using a Loaded Model
If you already have a model loaded in memory, you can merge directly without specifying the file path again:
using LM model = new LM("models/base-model.gguf");
// Create merger from the loaded model
var merger = new LoraMerger(model);
merger.AddLoraAdapter("adapters/support-lora.gguf", scale: 1.0f);
merger.AddLoraAdapter("adapters/tone-lora.gguf", scale: 0.4f);
// Merge to a new file
merger.Merge("models/production-model.gguf");
Console.WriteLine("Merged successfully from loaded model.");
// Clear adapters if you want to reuse the merger
merger.ClearAdapters();
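ClearAdapters also lets one merger produce several model variants in a row. A sketch; the adapter files and output names are illustrative:

```csharp
using LMKit.Finetuning;

var merger = new LoraMerger("models/base-model.gguf");

// Variant A: support-focused blend
merger.AddLoraAdapter("adapters/support-lora.gguf", scale: 1.0f);
merger.Merge("models/variant-support.gguf");

// Reset the queue and build a second variant from the same base
merger.ClearAdapters();
merger.AddLoraAdapter("adapters/tone-lora.gguf", scale: 0.8f);
merger.Merge("models/variant-tone.gguf");
```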
API Reference
Hot-Swap Methods (on LM)
| Method | Description |
|---|---|
| ApplyLoraAdapter(string path, float scale = 1) | Apply a LoRA adapter from a file path |
| ApplyLoraAdapter(LoraAdapterSource source) | Apply using a pre-configured source |
| RemoveLoraAdapter(LoraAdapter adapter) | Remove a specific adapter; returns true if found |
| Adapters | Read-only list of currently active adapters |
LoraAdapter Properties
| Property | Type | Description |
|---|---|---|
| Identifier | string | Unique identifier for this adapter |
| Path | string | File path of the adapter |
| Scale | float | Influence weight (0.0 = disabled, 1.0 = full); settable |
LoraMerger Members
| Member | Description |
|---|---|
| LoraMerger(string modelPath) | Create a merger from a model file path |
| LoraMerger(LM model) | Create a merger from a loaded model |
| AddLoraAdapter(string path, float scale) | Queue an adapter for merging |
| AddLoraAdapter(LoraAdapterSource source) | Queue an adapter using a pre-configured source |
| ClearAdapters() | Remove all queued adapters |
| Merge(string outputPath, MetadataCollection overrides) | Execute the merge |
| EnableQuantization | Quantize the merged model during the merge |
| ThreadCount | Number of threads used by the merge operation |
Common Issues
| Problem | Cause | Fix |
|---|---|---|
| ValidateFormat returns false | Adapter was trained for a different architecture | Verify the adapter was fine-tuned from the same base model |
| Degraded output quality after merge | Adapter scales too high when combining multiple adapters | Reduce individual scales (e.g., 0.5 each for two adapters) |
| Out of memory during merge | Base model + adapters exceed available RAM | Use ThreadCount = 1 to reduce peak memory, or merge on a machine with more RAM |
| No effect after ApplyLoraAdapter | Adapter scale set to 0 | Check that adapter.Scale is greater than 0 |
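For the last issue, a quick way to confirm an adapter actually changes behavior is to compare completions with it disabled and enabled. A sketch; the probe prompt is illustrative, and identical outputs on one prompt do not prove the adapter is broken:

```csharp
LoraAdapter adapter = model.Adapters[0];
var probe = new SingleTurnConversation(model) { MaximumCompletionTokens = 64 };

// Baseline: adapter influence off
adapter.Scale = 0.0f;
string baseline = probe.Submit("How do I reset my password?").Completion;

// Adapter influence on
adapter.Scale = 1.0f;
string adapted = probe.Submit("How do I reset my password?").Completion;

Console.WriteLine(baseline == adapted
    ? "Adapter had no visible effect on this prompt."
    : "Adapter changed the completion.");
```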
Next Steps
- Prepare Training Datasets for LoRA Fine-Tuning: create the datasets that produce LoRA adapters.
- Browse and Select Models Programmatically: pick the right base model.
- Distribute Large Models Across Multiple GPUs: run large merged models across GPUs.