Load and Merge LoRA Adapters at Inference Time
LoRA (Low-Rank Adaptation) adapters let you customize a base model's behavior without retraining from scratch. LM-Kit.NET supports two strategies for using LoRA adapters at inference time: hot-swapping (applying and removing adapters on a running model) and permanent merging (baking one or more adapters into a new model file). This tutorial covers both approaches, from applying a single adapter to blending multiple domain-specific adapters into one model.
Why LoRA Adapter Composition Matters
Two real-world problems that LoRA adapter composition solves:
- Multi-tenant customization without model duplication. A SaaS platform serves multiple customers, each with fine-tuned behavior. Instead of loading separate model files per tenant, you load one base model and hot-swap LoRA adapters per request. Because adapters are small relative to the base weights, memory usage stays close to that of a single loaded model.
- Combining domain expertise from multiple fine-tunes. A legal assistant fine-tuned on contracts and another fine-tuned on regulatory filings can be merged into a single adapter. Merging avoids the latency of switching adapters mid-conversation and produces a model that handles both domains simultaneously.
Prerequisites
| Requirement | Minimum |
|---|---|
| .NET SDK | 8.0+ |
| Base model | Any GGUF model file |
| LoRA adapter(s) | GGUF-format LoRA files compatible with the base model |
Step 1: Create the Project
dotnet new console -n LoraAdapters
cd LoraAdapters
dotnet add package LM-Kit.NET
Step 2: Understand the Two Strategies
        ┌───────────────────────────────────────────────┐
        │               Base Model (GGUF)               │
        └───────────────────────┬───────────────────────┘
                                │
                    ┌───────────┴───────────┐
                    ▼                       ▼
            ┌───────────────┐       ┌─────────────────┐
            │   Hot-Swap    │       │    Permanent    │
            │   (runtime)   │       │      Merge      │
            │               │       │                 │
            │  ApplyLora()  │       │   LoraMerger    │
            │ RemoveLora()  │       │    .Merge()     │
            │               │       │                 │
            │ Instant swap  │       │  New GGUF file  │
            │  No disk I/O  │       │ No runtime cost │
            └───────────────┘       └─────────────────┘
| Approach | Best for | Trade-off |
|---|---|---|
| Hot-swap | Per-request customization, A/B testing, multi-tenant | Small runtime overhead per token |
| Permanent merge | Production deployment, single-purpose models | Disk space for merged file |
Step 3: Validate LoRA Compatibility
Before using a LoRA adapter, verify it matches the base model architecture:
using LMKit.Finetuning;
string baseModelPath = "models/base-model.gguf";
string loraPath = "adapters/customer-support-lora.gguf";
// Validate that the adapter is compatible with the base model
bool isValid = LoraAdapterSource.ValidateFormat(loraPath);
if (isValid)
{
Console.WriteLine("LoRA adapter format is valid.");
}
else
{
Console.WriteLine("LoRA adapter format is invalid or corrupted.");
}
Step 4: Hot-Swap LoRA Adapters at Runtime
using System.Text;
using LMKit.Finetuning;
using LMKit.Model;
using LMKit.TextGeneration;
LMKit.Licensing.LicenseManager.SetLicenseKey("");
Console.InputEncoding = Encoding.UTF8;
Console.OutputEncoding = Encoding.UTF8;
// ──────────────────────────────────────
// 1. Load the base model
// ──────────────────────────────────────
string baseModelPath = "models/base-model.gguf";
using LM model = new LM(baseModelPath,
loadingProgress: p =>
{
Console.Write($"\r Loading base model: {p * 100:F0}% ");
return true;
});
Console.WriteLine($"\n Loaded: {model.Name}\n");
// ──────────────────────────────────────
// 2. Apply a LoRA adapter
// ──────────────────────────────────────
string loraPathSupport = "adapters/customer-support-lora.gguf";
model.ApplyLoraAdapter(loraPathSupport, scale: 1.0f);
Console.WriteLine($"Applied adapter. Active adapters: {model.Adapters.Count}");
foreach (LoraAdapter adapter in model.Adapters)
{
Console.WriteLine($" - {adapter.Identifier} (scale: {adapter.Scale:F2}, path: {adapter.Path})");
}
// ──────────────────────────────────────
// 3. Run inference with the adapter active
// ──────────────────────────────────────
var chat = new SingleTurnConversation(model)
{
SystemPrompt = "You are a helpful customer support agent.",
MaximumCompletionTokens = 256
};
TextGenerationResult result = chat.Submit("How do I reset my password?");
Console.WriteLine($"\n[With support adapter]\n{result.Completion}");
// ──────────────────────────────────────
// 4. Swap to a different adapter
// ──────────────────────────────────────
// Remove the current adapter
LoraAdapter currentAdapter = model.Adapters[0];
bool removed = model.RemoveLoraAdapter(currentAdapter);
Console.WriteLine($"\nRemoved adapter: {removed}");
// Apply a different adapter
string loraPathLegal = "adapters/legal-review-lora.gguf";
model.ApplyLoraAdapter(loraPathLegal, scale: 0.8f);
var legalChat = new SingleTurnConversation(model)
{
SystemPrompt = "You are a legal document reviewer.",
MaximumCompletionTokens = 256
};
TextGenerationResult legalResult = legalChat.Submit("Summarize the key obligations in this contract clause.");
Console.WriteLine($"\n[With legal adapter]\n{legalResult.Completion}");
// ──────────────────────────────────────
// 5. Use LoraAdapterSource for more control
// ──────────────────────────────────────
var adapterSource = new LoraAdapterSource(
path: "adapters/medical-lora.gguf",
scale: 0.6f);
// Remove previous adapter first
model.RemoveLoraAdapter(model.Adapters[0]);
// Apply using the source object
model.ApplyLoraAdapter(adapterSource);
Console.WriteLine($"\nApplied medical adapter at scale {adapterSource.Scale:F1}");
Console.WriteLine($"Active adapters: {model.Adapters.Count}");
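The swap pattern above extends naturally to the multi-tenant scenario from the introduction. A sketch, assuming each tenant maps to exactly one adapter file; the dictionary, tenant names, and paths are illustrative:

```csharp
// Hypothetical tenant-to-adapter mapping; paths are illustrative.
var tenantAdapters = new Dictionary<string, string>
{
    ["acme"] = "adapters/acme-lora.gguf",
    ["globex"] = "adapters/globex-lora.gguf",
};

string Respond(LM model, string tenant, string prompt)
{
    // Drop whatever adapter the previous request left active.
    while (model.Adapters.Count > 0)
    {
        model.RemoveLoraAdapter(model.Adapters[0]);
    }

    model.ApplyLoraAdapter(tenantAdapters[tenant], scale: 1.0f);

    var chat = new SingleTurnConversation(model)
    {
        MaximumCompletionTokens = 256
    };
    return chat.Submit(prompt).Completion;
}
```

One base model stays resident; only the lightweight adapter changes between requests.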
Step 5: Adjust Adapter Scale Dynamically
You can change an adapter's influence without removing and re-applying it:
using System.Text;
using LMKit.Finetuning;
using LMKit.Model;
using LMKit.TextGeneration;
LMKit.Licensing.LicenseManager.SetLicenseKey("");
Console.InputEncoding = Encoding.UTF8;
Console.OutputEncoding = Encoding.UTF8;
// ──────────────────────────────────────
// 1. Load the base model
// ──────────────────────────────────────
string baseModelPath = "models/base-model.gguf";
using LM model = new LM(baseModelPath,
loadingProgress: p =>
{
Console.Write($"\r Loading base model: {p * 100:F0}% ");
return true;
});
// Apply an adapter first so there is something to adjust
model.ApplyLoraAdapter("adapters/customer-support-lora.gguf", scale: 1.0f);
LoraAdapter activeAdapter = model.Adapters[0];
// Start with full influence
activeAdapter.Scale = 1.0f;
Console.WriteLine($"Scale: {activeAdapter.Scale:F2}");
// Reduce influence to 50%
activeAdapter.Scale = 0.5f;
Console.WriteLine($"Scale: {activeAdapter.Scale:F2}");
// Effectively disable without removing
activeAdapter.Scale = 0.0f;
Console.WriteLine($"Scale: {activeAdapter.Scale:F2} (disabled)");
This is useful for gradually blending adapter behavior during A/B testing or for implementing a "confidence dial" that controls how much the fine-tuned behavior influences responses.
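The "confidence dial" can be sketched as a simple scale schedule; the step values and the evaluation loop are illustrative, and activeAdapter is the adapter obtained in the snippet above:

```csharp
// Ramp the adapter's influence from 0 to full over several batches,
// e.g. to A/B-test a new fine-tune against the base behavior.
float[] schedule = { 0.0f, 0.25f, 0.5f, 0.75f, 1.0f };

foreach (float scale in schedule)
{
    activeAdapter.Scale = scale;
    // ... run a batch of evaluation prompts at this scale and log results ...
    Console.WriteLine($"Evaluated at scale {scale:F2}");
}
```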
Step 6: Permanently Merge LoRA Adapters
using System.Text;
using LMKit.Finetuning;
using LMKit.Model;
LMKit.Licensing.LicenseManager.SetLicenseKey("");
Console.InputEncoding = Encoding.UTF8;
Console.OutputEncoding = Encoding.UTF8;
string baseModelPath = "models/base-model.gguf";
string mergedOutputPath = "models/merged-model.gguf";
// ──────────────────────────────────────
// 1. Create a merger from the base model path
// ──────────────────────────────────────
var merger = new LoraMerger(baseModelPath);
// ──────────────────────────────────────
// 2. Add multiple adapters with different scales
// ──────────────────────────────────────
merger.AddLoraAdapter("adapters/customer-support-lora.gguf", scale: 1.0f);
merger.AddLoraAdapter("adapters/legal-review-lora.gguf", scale: 0.7f);
// Or use LoraAdapterSource objects
var medicalAdapter = new LoraAdapterSource("adapters/medical-lora.gguf", scale: 0.5f);
merger.AddLoraAdapter(medicalAdapter);
Console.WriteLine("Adapters queued for merge:");
Console.WriteLine(" - customer-support (scale: 1.0)");
Console.WriteLine(" - legal-review (scale: 0.7)");
Console.WriteLine(" - medical (scale: 0.5)");
// ──────────────────────────────────────
// 3. Configure merge options
// ──────────────────────────────────────
merger.EnableQuantization = true; // quantize during merge
merger.ThreadCount = Environment.ProcessorCount;
// ──────────────────────────────────────
// 4. Execute the merge
// ──────────────────────────────────────
Console.WriteLine($"\nMerging into: {mergedOutputPath}");
merger.Merge(mergedOutputPath);
Console.WriteLine("Merge complete.");
// ──────────────────────────────────────
// 5. Load and test the merged model
// ──────────────────────────────────────
using LM mergedModel = new LM(mergedOutputPath,
loadingProgress: p =>
{
Console.Write($"\r Loading merged model: {p * 100:F0}% ");
return true;
});
Console.WriteLine($"\n Loaded merged model: {mergedModel.Name}");
Console.WriteLine($" No adapters needed: {mergedModel.Adapters.Count} active adapters");
Step 7: Merge Using a Loaded Model
If you already have a model loaded in memory, you can merge directly without specifying the file path again:
using LM model = new LM("models/base-model.gguf");
// Create merger from the loaded model
var merger = new LoraMerger(model);
merger.AddLoraAdapter("adapters/support-lora.gguf", scale: 1.0f);
merger.AddLoraAdapter("adapters/tone-lora.gguf", scale: 0.4f);
// Merge to a new file
merger.Merge("models/production-model.gguf");
Console.WriteLine("Merged successfully from loaded model.");
// Clear adapters if you want to reuse the merger
merger.ClearAdapters();
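ClearAdapters also lets one merger produce several model variants in a row. A sketch; the adapter files and output names are illustrative:

```csharp
using LMKit.Finetuning;

var merger = new LoraMerger("models/base-model.gguf");

// Variant A: support-focused blend
merger.AddLoraAdapter("adapters/support-lora.gguf", scale: 1.0f);
merger.Merge("models/variant-support.gguf");

// Reset the queue and build a second variant from the same base
merger.ClearAdapters();
merger.AddLoraAdapter("adapters/tone-lora.gguf", scale: 0.8f);
merger.Merge("models/variant-tone.gguf");
```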
API Reference
Hot-Swap Methods (on LM)
| Method | Description |
|---|---|
| ApplyLoraAdapter(string path, float scale = 1) | Apply a LoRA adapter from a file path |
| ApplyLoraAdapter(LoraAdapterSource source) | Apply using a pre-configured source |
| RemoveLoraAdapter(LoraAdapter adapter) | Remove a specific adapter; returns true if found |
| Adapters | Read-only list of currently active adapters |
LoraAdapter Properties
| Property | Type | Description |
|---|---|---|
| Identifier | string | Unique identifier for this adapter |
| Path | string | File path of the adapter |
| Scale | float | Influence weight (0.0 = disabled, 1.0 = full); settable |
LoraMerger Members
| Member | Description |
|---|---|
| LoraMerger(string modelPath) | Create a merger from a model file path |
| LoraMerger(LM model) | Create a merger from a loaded model |
| AddLoraAdapter(string path, float scale) | Queue an adapter for merging |
| AddLoraAdapter(LoraAdapterSource source) | Queue an adapter using a pre-configured source |
| ClearAdapters() | Remove all queued adapters |
| Merge(string outputPath, MetadataCollection overrides) | Execute the merge |
| EnableQuantization | Quantize the merged model during the merge |
| ThreadCount | Number of threads used by the merge operation |
Common Issues
| Problem | Cause | Fix |
|---|---|---|
| ValidateFormat returns false | Adapter was trained for a different architecture | Verify the adapter was fine-tuned from the same base model |
| Degraded output quality after merge | Adapter scales too high when combining multiple adapters | Reduce individual scales (e.g., 0.5 each for two adapters) |
| Out of memory during merge | Base model + adapters exceed available RAM | Use ThreadCount = 1 to reduce peak memory, or merge on a machine with more RAM |
| No effect after ApplyLoraAdapter | Adapter scale set to 0 | Check that adapter.Scale is greater than 0 |
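For the last issue, a quick way to confirm an adapter actually changes behavior is to compare completions with it disabled and enabled. A sketch; the probe prompt is illustrative, and identical outputs on one prompt do not prove the adapter is broken:

```csharp
LoraAdapter adapter = model.Adapters[0];
var probe = new SingleTurnConversation(model) { MaximumCompletionTokens = 64 };

// Baseline: adapter influence off
adapter.Scale = 0.0f;
string baseline = probe.Submit("How do I reset my password?").Completion;

// Adapter influence on
adapter.Scale = 1.0f;
string adapted = probe.Submit("How do I reset my password?").Completion;

Console.WriteLine(baseline == adapted
    ? "Adapter had no visible effect on this prompt."
    : "Adapter changed the completion.");
```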
Next Steps
- Prepare Training Datasets for LoRA Fine-Tuning: create the datasets that produce LoRA adapters.
- Browse and Select Models Programmatically: pick the right base model.
- Distribute Large Models Across Multiple GPUs: run large merged models across GPUs.