Migrate from Cloud AI APIs to Local Inference with Microsoft.Extensions.AI

If your .NET application already uses the IChatClient or IEmbeddingGenerator interfaces from Microsoft.Extensions.AI, you can swap cloud-based providers (OpenAI, Azure OpenAI) for local LLM inference powered by LM-Kit.NET. No rewrites. Same interfaces, same dependency injection, same calling code. The only change is the service registration. This guide shows how to make that switch.


Why Migrate to Local Inference

Switching from cloud APIs to local models solves two enterprise problems:

  1. API costs compound at scale. A customer support application handling 10,000 conversations per day at $0.01 per call adds up to $36,500 per year in API fees alone. A one-time hardware investment and a local model eliminate recurring per-token costs entirely, with predictable infrastructure budgets.
  2. Data never leaves your network. When your application processes medical records, legal contracts, or financial statements, every API call sends sensitive data to a third-party server. Local inference keeps all data on your machines, eliminating data processing agreements, third-party audit requirements, and the risk of data exposure.

Prerequisites

Requirement      Minimum
.NET SDK         8.0+
VRAM             4+ GB (for a 4B chat model)
NuGet packages   LM-Kit.NET, LM-Kit.NET.ExtensionsAI

Step 1: Install Packages

dotnet new console -n LocalInferenceApp
cd LocalInferenceApp
dotnet add package LM-Kit.NET
dotnet add package LM-Kit.NET.ExtensionsAI
dotnet add package Microsoft.Extensions.AI
dotnet add package Microsoft.Extensions.DependencyInjection

Step 2: Understand the Bridge Architecture

The LM-Kit.NET Extensions.AI bridge implements Microsoft's standard AI abstractions, so your application code works identically whether the backend is a cloud API or a local model.

        ┌─────────────────────────────────────────────┐
        │           Your Application Code             │
        │                                             │
        │   IChatClient          IEmbeddingGenerator  │
        │       │                        │            │
        └───────┼────────────────────────┼────────────┘
                │                        │
        ┌───────▼────────┐     ┌─────────▼──────────┐
        │ Cloud Provider │     │   Cloud Provider   │
        │  (OpenAI SDK)  │     │   (OpenAI SDK)     │
        └────────────────┘     └────────────────────┘
                │                        │
                ▼        REPLACE         ▼
        ┌───────────────────┐  ┌────────────────────────┐
        │  LMKitChatClient  │  │ LMKitEmbeddingGenerator│
        │  (local model)    │  │ (local model)          │
        └───────────────────┘  └────────────────────────┘
Interface                                       Cloud Implementation       LM-Kit Implementation
IChatClient                                     OpenAIChatClient           LMKitChatClient
IEmbeddingGenerator<string, Embedding<float>>   OpenAIEmbeddingGenerator   LMKitEmbeddingGenerator

Step 3: Replace the Cloud Chat Client

Before (cloud-based)

// Typical OpenAI setup
using Microsoft.Extensions.AI;
using OpenAI;

IChatClient chatClient = new OpenAIClient("sk-your-api-key")
    .GetChatClient("gpt-4o-mini")
    .AsIChatClient();

After (local inference with LM-Kit)

using Microsoft.Extensions.AI;
using LMKit.Model;
using LMKit.Integrations.ExtensionsAI.ChatClient;

LMKit.Licensing.LicenseManager.SetLicenseKey("");

// Load a local model (one-time download, then runs offline)
using LM model = LM.LoadFromModelID("gemma3:4b",
    downloadingProgress: (path, contentLength, bytesRead) =>
    {
        if (contentLength.HasValue && contentLength.Value > 0)
        {
            double pct = (double)bytesRead / contentLength.Value * 100;
            Console.Write($"\rDownloading model... {pct:F1}%");
        }
        return true;
    });
Console.WriteLine();

// Create the bridge (same IChatClient interface)
IChatClient chatClient = new LMKitChatClient(model);

Your existing calling code stays exactly the same:

var messages = new List<ChatMessage>
{
    new(ChatRole.System, "You are a helpful assistant."),
    new(ChatRole.User, "What are the benefits of local AI inference?")
};

// This call works identically with both cloud and local backends
ChatResponse response = await chatClient.GetResponseAsync(messages);
Console.WriteLine(response.Text);

Step 4: Replace Cloud Embeddings

Before (cloud-based)

using Microsoft.Extensions.AI;
using OpenAI;

IEmbeddingGenerator<string, Embedding<float>> embedder =
    new OpenAIClient("sk-your-api-key")
        .GetEmbeddingClient("text-embedding-3-small")
        .AsIEmbeddingGenerator();

After (local inference with LM-Kit)

using Microsoft.Extensions.AI;
using LMKit.Model;
using LMKit.Integrations.ExtensionsAI.Embeddings;

// Load a local embedding model
using LM embeddingModel = LM.LoadFromModelID("qwen3-embedding:0.6b");

IEmbeddingGenerator<string, Embedding<float>> embedder =
    new LMKitEmbeddingGenerator(embeddingModel);

Generate embeddings with the same interface:

var result = await embedder.GenerateAsync(new[] { "How to deploy AI offline" });
Embedding<float> embedding = result[0];
Console.WriteLine($"Embedding dimensions: {embedding.Vector.Length}");
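Because embeddings from either backend arrive as plain float vectors, any downstream similarity code is backend-agnostic. As an illustration, here is a small cosine-similarity helper (this function is not part of either SDK; it is a hand-rolled sketch):

// Cosine similarity between two embedding vectors (helper, not an SDK API).
static double CosineSimilarity(ReadOnlySpan<float> a, ReadOnlySpan<float> b)
{
    double dot = 0, na = 0, nb = 0;
    for (int i = 0; i < a.Length; i++)
    {
        dot += a[i] * b[i];
        na += a[i] * a[i];
        nb += b[i] * b[i];
    }
    return dot / (Math.Sqrt(na) * Math.Sqrt(nb));
}

var docs = await embedder.GenerateAsync(new[] { "deploy AI offline", "local model hosting" });
double score = CosineSimilarity(docs[0].Vector.Span, docs[1].Vector.Span);
Console.WriteLine($"Similarity: {score:F3}");

This code works unchanged whether the vectors came from a cloud generator or from LMKitEmbeddingGenerator, as long as both inputs were produced by the same model.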

Step 5: Wire Up Dependency Injection

For production applications, register LM-Kit services through DI so they can be injected anywhere in your application.

using System.Text;
using Microsoft.Extensions.AI;
using Microsoft.Extensions.DependencyInjection;
using LMKit.Model;
using LMKit.Integrations.ExtensionsAI;

LMKit.Licensing.LicenseManager.SetLicenseKey("");
Console.OutputEncoding = Encoding.UTF8;

// ──────────────────────────────────────
// 1. Load models
// ──────────────────────────────────────
using LM chatModel = LM.LoadFromModelID("gemma3:4b",
    downloadingProgress: (_, len, read) =>
    {
        if (len.HasValue) Console.Write($"\rChat model: {(double)read / len.Value * 100:F1}%");
        return true;
    });
Console.WriteLine();

using LM embeddingModel = LM.LoadFromModelID("qwen3-embedding:0.6b",
    downloadingProgress: (_, len, read) =>
    {
        if (len.HasValue) Console.Write($"\rEmbedding model: {(double)read / len.Value * 100:F1}%");
        return true;
    });
Console.WriteLine();

// ──────────────────────────────────────
// 2. Register services
// ──────────────────────────────────────
var services = new ServiceCollection();

services.AddLMKitChatClient(chatModel, new ChatOptions
{
    Temperature = 0.7f,
    MaxOutputTokens = 512
});
services.AddLMKitEmbeddingGenerator(embeddingModel);

var provider = services.BuildServiceProvider();

// ──────────────────────────────────────
// 3. Resolve and use (same as cloud code)
// ──────────────────────────────────────
var client = provider.GetRequiredService<IChatClient>();
var embedGen = provider.GetRequiredService<IEmbeddingGenerator<string, Embedding<float>>>();

var response = await client.GetResponseAsync(
    new List<ChatMessage>
    {
        new(ChatRole.User, "Explain embeddings in one sentence.")
    });
Console.WriteLine(response.Text);

var embeddings = await embedGen.GenerateAsync(new[] { "test sentence" });
Console.WriteLine($"Embedding vector length: {embeddings[0].Vector.Length}");

Key point: If you later need to switch back to cloud (for burst capacity, for example), you only change the DI registration. Application code is untouched.
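One way to keep that switch cheap is to branch the registration on configuration at startup. The sketch below assumes an environment variable named USE_CLOUD and an OpenAI-based cloud registration; both are illustrative, not prescribed by either SDK:

// Sketch: choose the backend at startup from configuration.
// Only this registration block changes; consumers of IChatClient do not.
bool useCloud = Environment.GetEnvironmentVariable("USE_CLOUD") == "1";

var services = new ServiceCollection();
if (useCloud)
{
    // Hypothetical cloud registration (OpenAI SDK + Microsoft.Extensions.AI.OpenAI)
    services.AddSingleton<IChatClient>(_ =>
        new OpenAIClient(Environment.GetEnvironmentVariable("OPENAI_API_KEY"))
            .GetChatClient("gpt-4o-mini")
            .AsIChatClient());
}
else
{
    services.AddLMKitChatClient(chatModel);  // local model loaded earlier
}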


Step 6: Use Streaming and Tool Calling

Both streaming and function calling work through the standard IChatClient interface.

Streaming Responses

var messages = new List<ChatMessage>
{
    new(ChatRole.User, "Write a short poem about local AI.")
};

await foreach (ChatResponseUpdate update in chatClient.GetStreamingResponseAsync(messages))
{
    Console.Write(update.Text);
}
Console.WriteLine();

Tool Calling

The bridge automatically adapts Microsoft.Extensions.AI AIFunction definitions to LM-Kit's tool system:

using Microsoft.Extensions.AI;

// Define a tool using AIFunctionFactory
var weatherTool = AIFunctionFactory.Create(
    (string city) => $"The weather in {city} is 22°C and sunny.",
    "get_weather",
    "Get the current weather for a city");

var options = new ChatOptions
{
    Tools = new List<AITool> { weatherTool },
    ToolMode = ChatToolMode.Auto
};

var messages = new List<ChatMessage>
{
    new(ChatRole.User, "What's the weather in Paris?")
};

var response = await chatClient.GetResponseAsync(messages, options);
Console.WriteLine(response.Text);

Migration Checklist

Step              Cloud Code                          Local Equivalent
Package           Azure.AI.OpenAI or OpenAI           LM-Kit.NET + LM-Kit.NET.ExtensionsAI
Chat client       new OpenAIChatClient(...)           new LMKitChatClient(model)
Embeddings        new OpenAIEmbeddingGenerator(...)   new LMKitEmbeddingGenerator(model)
DI registration   AddOpenAIChatClient(...)            AddLMKitChatClient(model)
API key           Required                            Not needed
Internet          Required for every call             Not needed after model download

Common Issues

Problem                        Cause                             Fix
InvalidModelException on chat  Model lacks chat capability       Use a chat model (gemma3:4b, qwen3:8b), not an embedding model
Empty embedding vectors        Model is not an embedding model   Use qwen3-embedding:0.6b or embeddinggemma-300m for embeddings
Slow first response            Model loading is deferred         Pre-load models at application startup, not on first request
NullReferenceException in DI   Model disposed before client      Keep the LM instance alive for the application's lifetime
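The last two rows both come down to model lifetime: load once at startup and keep the LM instance alive as long as the container. A minimal sketch of that pattern (the ordering of the two Dispose calls is the point):

// Sketch: eager-load at startup, register as a singleton,
// dispose only at application shutdown.
var services = new ServiceCollection();

// Pay the load cost before the first request arrives.
LM chatModel = LM.LoadFromModelID("gemma3:4b");

services.AddSingleton(chatModel);            // lives for the app's lifetime
services.AddLMKitChatClient(chatModel);

var provider = services.BuildServiceProvider();
// ... serve requests ...
provider.Dispose();     // dispose the container first
chatModel.Dispose();    // then the model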
