Migrate from Cloud AI APIs to Local Inference with Microsoft.Extensions.AI

If your .NET application already uses the IChatClient or IEmbeddingGenerator interfaces from Microsoft.Extensions.AI, you can swap cloud-based providers (OpenAI, Azure OpenAI) for local LLM inference powered by LM-Kit.NET. No rewrites. Same interfaces, same dependency injection, same calling code. The only change is the service registration. This guide shows how to make that switch.


Why Migrate to Local Inference

Switching from cloud APIs to local models solves two enterprise problems:

  1. API costs compound at scale. A customer support application handling 10,000 conversations per day at $0.01 per call adds up to $36,500 per year in API fees alone. A one-time hardware investment and a local model eliminate recurring per-token costs entirely, with predictable infrastructure budgets.
  2. Data never leaves your network. When your application processes medical records, legal contracts, or financial statements, every API call sends sensitive data to a third-party server. Local inference keeps all data on your machines, eliminating data processing agreements, third-party audit requirements, and the risk of data exposure.

Prerequisites

Requirement      Minimum
.NET SDK         8.0+
VRAM             4+ GB (for a 4B chat model)
NuGet packages   LM-Kit.NET, LM-Kit.NET.ExtensionsAI

Step 1: Install Packages

dotnet new console -n LocalInferenceApp
cd LocalInferenceApp
dotnet add package LM-Kit.NET
dotnet add package LM-Kit.NET.ExtensionsAI
dotnet add package Microsoft.Extensions.AI
dotnet add package Microsoft.Extensions.DependencyInjection

Step 2: Understand the Bridge Architecture

The LM-Kit.NET Extensions.AI bridge implements Microsoft's standard AI abstractions, so your application code works identically whether the backend is a cloud API or a local model.

        ┌─────────────────────────────────────────────┐
        │           Your Application Code             │
        │                                             │
        │   IChatClient          IEmbeddingGenerator  │
        │       │                        │            │
        └───────┼────────────────────────┼────────────┘
                │                        │
        ┌───────▼────────┐     ┌─────────▼──────────┐
        │ Cloud Provider │     │   Cloud Provider   │
        │  (OpenAI SDK)  │     │   (OpenAI SDK)     │
        └────────────────┘     └────────────────────┘
                │                        │
                ▼        REPLACE         ▼
        ┌───────────────────┐  ┌────────────────────────┐
        │  LMKitChatClient  │  │ LMKitEmbeddingGenerator│
        │  (local model)    │  │ (local model)          │
        └───────────────────┘  └────────────────────────┘
Interface                                       Cloud Implementation       LM-Kit Implementation
IChatClient                                     OpenAIChatClient           LMKitChatClient
IEmbeddingGenerator<string, Embedding<float>>   OpenAIEmbeddingGenerator   LMKitEmbeddingGenerator

Step 3: Replace the Cloud Chat Client

Before (cloud-based)

// Typical OpenAI setup
using Microsoft.Extensions.AI;
using OpenAI;

IChatClient chatClient = new OpenAIClient("sk-your-api-key")
    .GetChatClient("gpt-4o-mini")
    .AsIChatClient();

After (local inference with LM-Kit)

using Microsoft.Extensions.AI;
using LMKit.Model;
using LMKit.Integrations.ExtensionsAI.ChatClient;

LMKit.Licensing.LicenseManager.SetLicenseKey("");

// Load a local model (one-time download, then runs offline)
using LM model = LM.LoadFromModelID("gemma3:4b",
    downloadingProgress: (path, contentLength, bytesRead) =>
    {
        if (contentLength.HasValue && contentLength.Value > 0)
        {
            double pct = (double)bytesRead / contentLength.Value * 100;
            Console.Write($"\rDownloading model... {pct:F1}%");
        }
        return true;
    });
Console.WriteLine();

// Create the bridge (same IChatClient interface)
IChatClient chatClient = new LMKitChatClient(model);

Your existing calling code stays exactly the same:

var messages = new List<ChatMessage>
{
    new(ChatRole.System, "You are a helpful assistant."),
    new(ChatRole.User, "What are the benefits of local AI inference?")
};

// This call works identically with both cloud and local backends
ChatResponse response = await chatClient.GetResponseAsync(messages);
Console.WriteLine(response.Text);

Step 4: Replace Cloud Embeddings

Before (cloud-based)

using Microsoft.Extensions.AI;
using OpenAI;

IEmbeddingGenerator<string, Embedding<float>> embedder =
    new OpenAIClient("sk-your-api-key")
        .GetEmbeddingClient("text-embedding-3-small")
        .AsIEmbeddingGenerator();

After (local inference with LM-Kit)

using Microsoft.Extensions.AI;
using LMKit.Model;
using LMKit.Integrations.ExtensionsAI.Embeddings;

// Load a local embedding model
using LM embeddingModel = LM.LoadFromModelID("qwen3-embedding:0.6b");

IEmbeddingGenerator<string, Embedding<float>> embedder =
    new LMKitEmbeddingGenerator(embeddingModel);

Generate embeddings with the same interface:

var result = await embedder.GenerateAsync(new[] { "How to deploy AI offline" });
Embedding<float> embedding = result[0];
Console.WriteLine($"Embedding dimensions: {embedding.Vector.Length}");
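Because embeddings from either backend arrive as plain float vectors, any downstream similarity code is backend-agnostic. As an illustration, here is a small cosine-similarity helper (this function is not part of either SDK; it is a hand-rolled sketch):

// Cosine similarity between two embedding vectors (helper, not an SDK API).
static double CosineSimilarity(ReadOnlySpan<float> a, ReadOnlySpan<float> b)
{
    double dot = 0, na = 0, nb = 0;
    for (int i = 0; i < a.Length; i++)
    {
        dot += a[i] * b[i];
        na += a[i] * a[i];
        nb += b[i] * b[i];
    }
    return dot / (Math.Sqrt(na) * Math.Sqrt(nb));
}

var docs = await embedder.GenerateAsync(new[] { "deploy AI offline", "local model hosting" });
double score = CosineSimilarity(docs[0].Vector.Span, docs[1].Vector.Span);
Console.WriteLine($"Similarity: {score:F3}");

This code works unchanged whether the vectors came from a cloud generator or from LMKitEmbeddingGenerator, as long as both inputs were produced by the same model.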

Step 5: Wire Up Dependency Injection

For production applications, register LM-Kit services through DI so they can be injected anywhere in your application.

using System.Text;
using Microsoft.Extensions.AI;
using Microsoft.Extensions.DependencyInjection;
using LMKit.Model;
using LMKit.Integrations.ExtensionsAI;

LMKit.Licensing.LicenseManager.SetLicenseKey("");
Console.OutputEncoding = Encoding.UTF8;

// ──────────────────────────────────────
// 1. Load models
// ──────────────────────────────────────
using LM chatModel = LM.LoadFromModelID("gemma3:4b",
    downloadingProgress: (_, len, read) =>
    {
        if (len.HasValue) Console.Write($"\rChat model: {(double)read / len.Value * 100:F1}%");
        return true;
    });
Console.WriteLine();

using LM embeddingModel = LM.LoadFromModelID("qwen3-embedding:0.6b",
    downloadingProgress: (_, len, read) =>
    {
        if (len.HasValue) Console.Write($"\rEmbedding model: {(double)read / len.Value * 100:F1}%");
        return true;
    });
Console.WriteLine();

// ──────────────────────────────────────
// 2. Register services
// ──────────────────────────────────────
var services = new ServiceCollection();

services.AddLMKitChatClient(chatModel, new ChatOptions
{
    Temperature = 0.7f,
    MaxOutputTokens = 512
});
services.AddLMKitEmbeddingGenerator(embeddingModel);

var provider = services.BuildServiceProvider();

// ──────────────────────────────────────
// 3. Resolve and use (same as cloud code)
// ──────────────────────────────────────
var client = provider.GetRequiredService<IChatClient>();
var embedGen = provider.GetRequiredService<IEmbeddingGenerator<string, Embedding<float>>>();

var response = await client.GetResponseAsync(
    new List<ChatMessage>
    {
        new(ChatRole.User, "Explain embeddings in one sentence.")
    });
Console.WriteLine(response.Text);

var embeddings = await embedGen.GenerateAsync(new[] { "test sentence" });
Console.WriteLine($"Embedding vector length: {embeddings[0].Vector.Length}");

Key point: If you later need to switch back to cloud (for burst capacity, for example), you only change the DI registration. Application code is untouched.
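One way to keep that switch cheap is to branch the registration on configuration at startup. The sketch below assumes an environment variable named USE_CLOUD and an OpenAI-based cloud registration; both are illustrative, not prescribed by either SDK:

// Sketch: choose the backend at startup from configuration.
// Only this registration block changes; consumers of IChatClient do not.
bool useCloud = Environment.GetEnvironmentVariable("USE_CLOUD") == "1";

var services = new ServiceCollection();
if (useCloud)
{
    // Hypothetical cloud registration (OpenAI SDK + Microsoft.Extensions.AI.OpenAI)
    services.AddSingleton<IChatClient>(_ =>
        new OpenAIClient(Environment.GetEnvironmentVariable("OPENAI_API_KEY"))
            .GetChatClient("gpt-4o-mini")
            .AsIChatClient());
}
else
{
    services.AddLMKitChatClient(chatModel);  // local model loaded earlier
}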


Step 6: Use Streaming and Tool Calling

Both streaming and function calling work through the standard IChatClient interface.

Streaming Responses

var messages = new List<ChatMessage>
{
    new(ChatRole.User, "Write a short poem about local AI.")
};

await foreach (ChatResponseUpdate update in chatClient.GetStreamingResponseAsync(messages))
{
    Console.Write(update.Text);
}
Console.WriteLine();

Tool Calling

The bridge automatically adapts Microsoft.Extensions.AI AIFunction definitions to LM-Kit's tool system:

using Microsoft.Extensions.AI;

// Define a tool using AIFunctionFactory
var weatherTool = AIFunctionFactory.Create(
    (string city) => $"The weather in {city} is 22°C and sunny.",
    "get_weather",
    "Get the current weather for a city");

var options = new ChatOptions
{
    Tools = new List<AITool> { weatherTool },
    ToolMode = ChatToolMode.Auto
};

var messages = new List<ChatMessage>
{
    new(ChatRole.User, "What's the weather in Paris?")
};

var response = await chatClient.GetResponseAsync(messages, options);
Console.WriteLine(response.Text);

Migration Checklist

Step              Cloud Code                          Local Equivalent
Package           Azure.AI.OpenAI or OpenAI           LM-Kit.NET + LM-Kit.NET.ExtensionsAI
Chat client       new OpenAIChatClient(...)           new LMKitChatClient(model)
Embeddings        new OpenAIEmbeddingGenerator(...)   new LMKitEmbeddingGenerator(model)
DI registration   AddOpenAIChatClient(...)            AddLMKitChatClient(model)
API key           Required                            Not needed
Internet          Required for every call             Not needed after model download

Common Issues

Problem                        Cause                             Fix
InvalidModelException on chat  Model lacks chat capability       Use a chat model (gemma3:4b, qwen3:8b), not an embedding model
Empty embedding vectors        Model is not an embedding model   Use qwen3-embedding:0.6b or embeddinggemma-300m for embeddings
Slow first response            Model loading is deferred         Pre-load models at application startup, not on first request
NullReferenceException in DI   Model disposed before client      Keep the LM instance alive for the application's lifetime
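The last two rows both come down to model lifetime: load once at startup and keep the LM instance alive as long as the container. A minimal sketch of that pattern (the ordering of the two Dispose calls is the point):

// Sketch: eager-load at startup, register as a singleton,
// dispose only at application shutdown.
var services = new ServiceCollection();

// Pay the load cost before the first request arrives.
LM chatModel = LM.LoadFromModelID("gemma3:4b");

services.AddSingleton(chatModel);            // lives for the app's lifetime
services.AddLMKitChatClient(chatModel);

var provider = services.BuildServiceProvider();
// ... serve requests ...
provider.Dispose();     // dispose the container first
chatModel.Dispose();    // then the model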
