Load and Merge LoRA Adapters at Inference Time

LoRA (Low-Rank Adaptation) adapters let you customize a base model's behavior without retraining from scratch. LM-Kit.NET supports two strategies for using LoRA adapters at inference time: hot-swapping (applying and removing adapters on a running model) and permanent merging (baking one or more adapters into a new model file). This tutorial covers both approaches, from applying a single adapter to blending multiple domain-specific adapters into one model.


Why LoRA Adapter Composition Matters

Two real-world problems that LoRA adapter composition solves:

  1. Multi-tenant customization without model duplication. A SaaS platform serves multiple customers, each with fine-tuned behavior. Instead of loading separate model files per tenant, you load one base model and hot-swap LoRA adapters per request. This cuts VRAM usage dramatically.
  2. Combining domain expertise from multiple fine-tunes. A legal assistant fine-tuned on contracts and another fine-tuned on regulatory filings can be merged into a single adapter. Merging avoids the latency of switching adapters mid-conversation and produces a model that handles both domains simultaneously.
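
The hot-swap pattern behind the multi-tenant scenario boils down to resolving each tenant to an adapter path before the request runs. A minimal sketch of that routing step, using hypothetical tenant IDs and adapter paths (the ResolveAdapterPath helper is illustrative, not an LM-Kit.NET API):

```csharp
using System;
using System.Collections.Generic;

// Hypothetical per-tenant adapter registry; tenant IDs and paths are
// illustrative, not part of LM-Kit.NET.
string ResolveAdapterPath(string tenantId, IReadOnlyDictionary<string, string> registry, string fallbackPath)
{
    // Unknown tenants fall back to a default adapter (or none at all).
    return registry.TryGetValue(tenantId, out var path) ? path : fallbackPath;
}

var registry = new Dictionary<string, string>
{
    ["acme"]   = "adapters/acme-support-lora.gguf",
    ["globex"] = "adapters/globex-legal-lora.gguf",
};

string adapterPath = ResolveAdapterPath("acme", registry, "adapters/default-lora.gguf");
Console.WriteLine(adapterPath);
// The resolved path would then be passed to model.ApplyLoraAdapter(adapterPath).
```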

Prerequisites

Requirement        Minimum
.NET SDK           8.0+
Base model         Any GGUF model file
LoRA adapter(s)    GGUF-format LoRA files compatible with the base model

Step 1: Create the Project

dotnet new console -n LoraAdapters
cd LoraAdapters
dotnet add package LM-Kit.NET

Step 2: Understand the Two Strategies

┌───────────────────────────────────────────────┐
│             Base Model (GGUF)                 │
└───────────────────┬───────────────────────────┘
                    │
        ┌───────────┴───────────┐
        ▼                       ▼
┌───────────────┐     ┌─────────────────┐
│  Hot-Swap     │     │  Permanent      │
│  (runtime)    │     │  Merge          │
│               │     │                 │
│ ApplyLora()   │     │  LoraMerger     │
│ RemoveLora()  │     │  .Merge()       │
│               │     │                 │
│ Instant swap  │     │ New GGUF file   │
│ No disk I/O   │     │ No runtime cost │
└───────────────┘     └─────────────────┘
Approach          Best for                                              Trade-off
Hot-swap          Per-request customization, A/B testing, multi-tenant  Small runtime overhead per token
Permanent merge   Production deployment, single-purpose models          Disk space for the merged file

Step 3: Validate LoRA Compatibility

Before applying a LoRA adapter, check that the file is a well-formed LoRA adapter:

using LMKit.Finetuning;

string loraPath = "adapters/customer-support-lora.gguf";

// Check that the adapter file is a well-formed LoRA adapter
bool isValid = LoraAdapterSource.ValidateFormat(loraPath);

if (isValid)
{
    Console.WriteLine("LoRA adapter format is valid.");
}
else
{
    Console.WriteLine("LoRA adapter format is invalid or corrupted.");
}
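
When several candidate adapters sit in a directory, the same check can gate which ones are safe to apply. A small sketch of that pattern, with a stand-in predicate in place of LoraAdapterSource.ValidateFormat so the filtering logic runs without any adapter files on disk:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Keep only the adapters that pass a validation check. The predicate stands in
// for LoraAdapterSource.ValidateFormat here so the logic is self-contained.
List<string> FilterValidAdapters(IEnumerable<string> paths, Func<string, bool> isValid)
    => paths.Where(isValid).ToList();

string[] candidates =
{
    "adapters/customer-support-lora.gguf",
    "adapters/corrupted-lora.gguf",
};

// In real code: FilterValidAdapters(candidates, LoraAdapterSource.ValidateFormat)
List<string> valid = FilterValidAdapters(candidates, p => !p.Contains("corrupted"));
Console.WriteLine($"{valid.Count} of {candidates.Length} adapters passed validation.");
```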

Step 4: Hot-Swap LoRA Adapters at Runtime

using System.Text;
using LMKit.Finetuning;
using LMKit.Model;
using LMKit.TextGeneration;

LMKit.Licensing.LicenseManager.SetLicenseKey("");

Console.InputEncoding = Encoding.UTF8;
Console.OutputEncoding = Encoding.UTF8;

// ──────────────────────────────────────
// 1. Load the base model
// ──────────────────────────────────────
string baseModelPath = "models/base-model.gguf";

using LM model = new LM(baseModelPath,
    loadingProgress: p =>
    {
        Console.Write($"\r  Loading base model: {p * 100:F0}%   ");
        return true;
    });

Console.WriteLine($"\n  Loaded: {model.Name}\n");

// ──────────────────────────────────────
// 2. Apply a LoRA adapter
// ──────────────────────────────────────
string loraPathSupport = "adapters/customer-support-lora.gguf";

model.ApplyLoraAdapter(loraPathSupport, scale: 1.0f);
Console.WriteLine($"Applied adapter. Active adapters: {model.Adapters.Count}");

foreach (LoraAdapter adapter in model.Adapters)
{
    Console.WriteLine($"  - {adapter.Identifier} (scale: {adapter.Scale:F2}, path: {adapter.Path})");
}

// ──────────────────────────────────────
// 3. Run inference with the adapter active
// ──────────────────────────────────────
var chat = new SingleTurnConversation(model)
{
    SystemPrompt = "You are a helpful customer support agent.",
    MaximumCompletionTokens = 256
};

TextGenerationResult result = chat.Submit("How do I reset my password?");
Console.WriteLine($"\n[With support adapter]\n{result.Completion}");

// ──────────────────────────────────────
// 4. Swap to a different adapter
// ──────────────────────────────────────
// Remove the current adapter
LoraAdapter currentAdapter = model.Adapters[0];
bool removed = model.RemoveLoraAdapter(currentAdapter);
Console.WriteLine($"\nRemoved adapter: {removed}");

// Apply a different adapter
string loraPathLegal = "adapters/legal-review-lora.gguf";
model.ApplyLoraAdapter(loraPathLegal, scale: 0.8f);

var legalChat = new SingleTurnConversation(model)
{
    SystemPrompt = "You are a legal document reviewer.",
    MaximumCompletionTokens = 256
};

TextGenerationResult legalResult = legalChat.Submit("Summarize the key obligations in this contract clause.");
Console.WriteLine($"\n[With legal adapter]\n{legalResult.Completion}");

// ──────────────────────────────────────
// 5. Use LoraAdapterSource for more control
// ──────────────────────────────────────
var adapterSource = new LoraAdapterSource(
    path: "adapters/medical-lora.gguf",
    scale: 0.6f);

// Remove previous adapter first
model.RemoveLoraAdapter(model.Adapters[0]);

// Apply using the source object
model.ApplyLoraAdapter(adapterSource);
Console.WriteLine($"\nApplied medical adapter at scale {adapterSource.Scale:F1}");
Console.WriteLine($"Active adapters: {model.Adapters.Count}");

Step 5: Adjust Adapter Scale Dynamically

You can change an adapter's influence without removing and re-applying it:

using System.Text;
using LMKit.Finetuning;
using LMKit.Model;
using LMKit.TextGeneration;

LMKit.Licensing.LicenseManager.SetLicenseKey("");

Console.InputEncoding = Encoding.UTF8;
Console.OutputEncoding = Encoding.UTF8;

// ──────────────────────────────────────
// 1. Load the base model
// ──────────────────────────────────────
string baseModelPath = "models/base-model.gguf";

using LM model = new LM(baseModelPath,
    loadingProgress: p =>
    {
        Console.Write($"\r  Loading base model: {p * 100:F0}%   ");
        return true;
    });

// Apply an adapter so there is one to adjust
model.ApplyLoraAdapter("adapters/customer-support-lora.gguf");

LoraAdapter activeAdapter = model.Adapters[0];

// Start with full influence
activeAdapter.Scale = 1.0f;
Console.WriteLine($"Scale: {activeAdapter.Scale:F2}");

// Reduce influence to 50%
activeAdapter.Scale = 0.5f;
Console.WriteLine($"Scale: {activeAdapter.Scale:F2}");

// Effectively disable without removing
activeAdapter.Scale = 0.0f;
Console.WriteLine($"Scale: {activeAdapter.Scale:F2} (disabled)");

This is useful for gradually blending adapter behavior during A/B testing or for implementing a "confidence dial" that controls how much the fine-tuned behavior influences responses.
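
One way to drive such a confidence dial is to ramp the scale linearly over a series of steps. A sketch under that assumption (RampScale is an illustrative helper, not an LM-Kit.NET API):

```csharp
using System;

// Linear ramp between two scales over a fixed number of steps, e.g. to roll a
// fine-tune in gradually (0.0 → 1.0) during an A/B test.
float RampScale(float from, float to, int step, int totalSteps)
{
    if (totalSteps <= 0) throw new ArgumentOutOfRangeException(nameof(totalSteps));
    float t = Math.Clamp(step / (float)totalSteps, 0f, 1f);
    return from + (to - from) * t;
}

for (int step = 0; step <= 4; step++)
{
    float scale = RampScale(0.0f, 1.0f, step, totalSteps: 4);
    Console.WriteLine($"step {step}: scale = {scale:F2}");
    // In real code: activeAdapter.Scale = scale; then serve the next batch of requests.
}
```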


Step 6: Permanently Merge LoRA Adapters

using System.Text;
using LMKit.Finetuning;
using LMKit.Model;

LMKit.Licensing.LicenseManager.SetLicenseKey("");

Console.InputEncoding = Encoding.UTF8;
Console.OutputEncoding = Encoding.UTF8;

string baseModelPath = "models/base-model.gguf";
string mergedOutputPath = "models/merged-model.gguf";

// ──────────────────────────────────────
// 1. Create a merger from the base model path
// ──────────────────────────────────────
var merger = new LoraMerger(baseModelPath);

// ──────────────────────────────────────
// 2. Add multiple adapters with different scales
// ──────────────────────────────────────
merger.AddLoraAdapter("adapters/customer-support-lora.gguf", scale: 1.0f);
merger.AddLoraAdapter("adapters/legal-review-lora.gguf", scale: 0.7f);

// Or use LoraAdapterSource objects
var medicalAdapter = new LoraAdapterSource("adapters/medical-lora.gguf", scale: 0.5f);
merger.AddLoraAdapter(medicalAdapter);

Console.WriteLine("Adapters queued for merge:");
Console.WriteLine("  - customer-support (scale: 1.0)");
Console.WriteLine("  - legal-review (scale: 0.7)");
Console.WriteLine("  - medical (scale: 0.5)");

// ──────────────────────────────────────
// 3. Configure merge options
// ──────────────────────────────────────
merger.EnableQuantization = true;  // quantize during merge
merger.ThreadCount = Environment.ProcessorCount;

// ──────────────────────────────────────
// 4. Execute the merge
// ──────────────────────────────────────
Console.WriteLine($"\nMerging into: {mergedOutputPath}");
merger.Merge(mergedOutputPath);
Console.WriteLine("Merge complete.");

// ──────────────────────────────────────
// 5. Load and test the merged model
// ──────────────────────────────────────
using LM mergedModel = new LM(mergedOutputPath,
    loadingProgress: p =>
    {
        Console.Write($"\r  Loading merged model: {p * 100:F0}%   ");
        return true;
    });

Console.WriteLine($"\n  Loaded merged model: {mergedModel.Name}");
Console.WriteLine($"  No adapters needed: {mergedModel.Adapters.Count} active adapters");

Step 7: Merge Using a Loaded Model

If you already have a model loaded in memory, you can merge directly without specifying the file path again:

using LM model = new LM("models/base-model.gguf");

// Create merger from the loaded model
var merger = new LoraMerger(model);

merger.AddLoraAdapter("adapters/support-lora.gguf", scale: 1.0f);
merger.AddLoraAdapter("adapters/tone-lora.gguf", scale: 0.4f);

// Merge to a new file
merger.Merge("models/production-model.gguf");
Console.WriteLine("Merged successfully from loaded model.");

// Clear adapters if you want to reuse the merger
merger.ClearAdapters();
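
Reusing one merger across several output files amounts to iterating over a plan of variants. This sketch models that plan as plain data; the variant names and adapter paths are made up for illustration, and the real AddLoraAdapter/Merge/ClearAdapters calls are indicated in comments:

```csharp
using System;
using System.Collections.Generic;

// Each variant: an output path plus the (adapter path, scale) pairs to bake in.
// Variant names and paths here are made up for illustration.
var variants = new Dictionary<string, (string Path, float Scale)[]>
{
    ["models/support-only.gguf"] = new[] { ("adapters/support-lora.gguf", 1.0f) },
    ["models/support-plus-tone.gguf"] = new[]
    {
        ("adapters/support-lora.gguf", 1.0f),
        ("adapters/tone-lora.gguf", 0.4f),
    },
};

foreach (var (outputPath, adapters) in variants)
{
    // In real code: merger.AddLoraAdapter(path, scale) for each pair,
    // then merger.Merge(outputPath), then merger.ClearAdapters().
    Console.WriteLine($"{outputPath}: {adapters.Length} adapter(s)");
}
```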

API Reference

Hot-Swap Methods (on LM)

Method                                           Description
ApplyLoraAdapter(string path, float scale = 1)   Apply a LoRA adapter from a file path
ApplyLoraAdapter(LoraAdapterSource source)       Apply using a pre-configured source
RemoveLoraAdapter(LoraAdapter adapter)           Remove a specific adapter; returns true if found
Adapters                                         Read-only list of currently active adapters

LoraAdapter Properties

Property     Type    Description
Identifier   string  Unique identifier for this adapter
Path         string  File path of the adapter
Scale        float   Influence weight (0.0 = disabled, 1.0 = full); settable

LoraMerger Members

Member                                                  Description
LoraMerger(string modelPath)                            Create a merger from a model file path
LoraMerger(LM model)                                    Create a merger from a loaded model
AddLoraAdapter(string path, float scale)                Queue an adapter for merging
AddLoraAdapter(LoraAdapterSource source)                Queue an adapter using a pre-configured source
ClearAdapters()                                         Remove all queued adapters
Merge(string outputPath, MetadataCollection overrides)  Execute the merge and write the output file
EnableQuantization                                      Whether to quantize the merged model
ThreadCount                                             Number of threads used for the merge

Common Issues

Problem                              Cause                                             Fix
ValidateFormat returns false         Adapter was trained for a different architecture  Verify the adapter was fine-tuned from the same base model
Degraded output quality after merge  Combined adapter scales too high                  Reduce individual scales (e.g., 0.5 each for two adapters)
Out of memory during merge           Base model plus adapters exceed available RAM     Set ThreadCount = 1 to reduce peak memory, or merge on a machine with more RAM
No effect after ApplyLoraAdapter     Adapter scale set to 0                            Check that adapter.Scale is greater than 0
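
For the degraded-quality case, one way to apply the "reduce individual scales" fix systematically is to cap the combined scales against a budget. A sketch of that heuristic (the 1.0 budget is an assumption on our part, not an LM-Kit.NET rule):

```csharp
using System;
using System.Linq;

// If the requested scales sum past the budget, shrink them proportionally;
// otherwise leave them unchanged. The 1.0 default budget is a heuristic.
float[] CapCombinedScales(float[] scales, float budget = 1.0f)
{
    float sum = scales.Sum();
    if (sum <= budget) return (float[])scales.Clone();
    float factor = budget / sum;
    return scales.Select(s => s * factor).ToArray();
}

float[] requested = { 1.0f, 0.7f, 0.5f };   // sums to 2.2
float[] adjusted  = CapCombinedScales(requested);
Console.WriteLine(string.Join(", ", adjusted.Select(s => s.ToString("F2"))));
```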

Next Steps
