# Migrate from Cloud AI APIs to Local Inference with Microsoft.Extensions.AI
If your .NET application already uses the IChatClient or IEmbeddingGenerator interfaces from Microsoft.Extensions.AI, you can swap cloud-based providers (OpenAI, Azure OpenAI) for local LLM inference powered by LM-Kit.NET. No rewrites. Same interfaces, same dependency injection, same calling code. The only change is the service registration. This guide shows how to make that switch.
## Why Migrate to Local Inference

Switching from cloud APIs to local models solves two enterprise problems:
- API costs compound at scale. A customer support application handling 10,000 conversations per day at $0.01 per call adds up to $36,500 per year in API fees alone. A one-time hardware investment and a local model eliminate recurring per-token costs entirely, with predictable infrastructure budgets.
- Data never leaves your network. When your application processes medical records, legal contracts, or financial statements, every API call sends sensitive data to a third-party server. Local inference keeps all data on your machines, eliminating data processing agreements, third-party audit requirements, and the risk of data exposure.
## Prerequisites
| Requirement | Minimum |
|---|---|
| .NET SDK | 8.0+ |
| VRAM | 4+ GB (for a 4B chat model) |
| NuGet packages | LM-Kit.NET, LM-Kit.NET.ExtensionsAI |
## Step 1: Install Packages

```bash
dotnet new console -n LocalInferenceApp
cd LocalInferenceApp

dotnet add package LM-Kit.NET
dotnet add package LM-Kit.NET.ExtensionsAI
dotnet add package Microsoft.Extensions.AI
dotnet add package Microsoft.Extensions.DependencyInjection
```
## Step 2: Understand the Bridge Architecture
The LM-Kit.NET Extensions.AI bridge implements Microsoft's standard AI abstractions, so your application code works identically whether the backend is a cloud API or a local model.
```
┌─────────────────────────────────────────────┐
│           Your Application Code             │
│                                             │
│   IChatClient        IEmbeddingGenerator    │
│        │                      │             │
└────────┼──────────────────────┼─────────────┘
         │                      │
┌────────▼───────┐    ┌─────────▼──────────┐
│ Cloud Provider │    │   Cloud Provider   │
│  (OpenAI SDK)  │    │    (OpenAI SDK)    │
└────────────────┘    └────────────────────┘
         │                      │
         ▼      REPLACE         ▼
┌───────────────────┐ ┌────────────────────────┐
│  LMKitChatClient  │ │ LMKitEmbeddingGenerator│
│   (local model)   │ │     (local model)      │
└───────────────────┘ └────────────────────────┘
```
| Interface | Cloud Implementation | LM-Kit Implementation |
|---|---|---|
| `IChatClient` | `OpenAIChatClient` | `LMKitChatClient` |
| `IEmbeddingGenerator<string, Embedding<float>>` | `OpenAIEmbeddingGenerator` | `LMKitEmbeddingGenerator` |
## Step 3: Replace the Cloud Chat Client

### Before (cloud-based)
```csharp
// Typical OpenAI setup
using Microsoft.Extensions.AI;
using OpenAI;

IChatClient chatClient = new OpenAIClient("sk-your-api-key")
    .GetChatClient("gpt-4o-mini")
    .AsIChatClient();
```
### After (local inference with LM-Kit)
```csharp
using Microsoft.Extensions.AI;
using LMKit.Model;
using LMKit.Integrations.ExtensionsAI.ChatClient;

LMKit.Licensing.LicenseManager.SetLicenseKey("");

// Load a local model (one-time download, then runs offline)
using LM model = LM.LoadFromModelID("gemma3:4b",
    downloadingProgress: (path, contentLength, bytesRead) =>
    {
        if (contentLength.HasValue && contentLength.Value > 0)
        {
            double pct = (double)bytesRead / contentLength.Value * 100;
            Console.Write($"\rDownloading model... {pct:F1}%");
        }
        return true;
    });
Console.WriteLine();

// Create the bridge (same IChatClient interface)
IChatClient chatClient = new LMKitChatClient(model);
```
Your existing calling code stays exactly the same:
```csharp
var messages = new List<ChatMessage>
{
    new(ChatRole.System, "You are a helpful assistant."),
    new(ChatRole.User, "What are the benefits of local AI inference?")
};

// This call works identically with both cloud and local backends
ChatResponse response = await chatClient.GetResponseAsync(messages);
Console.WriteLine(response.Text);
```
## Step 4: Replace Cloud Embeddings

### Before (cloud-based)
```csharp
using Microsoft.Extensions.AI;
using OpenAI;

IEmbeddingGenerator<string, Embedding<float>> embedder =
    new OpenAIClient("sk-your-api-key")
        .GetEmbeddingClient("text-embedding-3-small")
        .AsIEmbeddingGenerator();
```
### After (local inference with LM-Kit)
```csharp
using Microsoft.Extensions.AI;
using LMKit.Model;
using LMKit.Integrations.ExtensionsAI.Embeddings;

// Load a local embedding model
using LM embeddingModel = LM.LoadFromModelID("qwen3-embedding:0.6b");

IEmbeddingGenerator<string, Embedding<float>> embedder =
    new LMKitEmbeddingGenerator(embeddingModel);
```
Generate embeddings with the same interface:
```csharp
var result = await embedder.GenerateAsync(new[] { "How to deploy AI offline" });
Embedding<float> embedding = result[0];
Console.WriteLine($"Embedding dimensions: {embedding.Vector.Length}");
```
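Once the vectors are generated locally, downstream similarity scoring is plain vector math. A minimal sketch that reuses the `embedder` from above; `CosineSimilarity` is an illustrative helper written here, not an LM-Kit or Microsoft.Extensions.AI API:

```csharp
// Compare two locally generated embeddings by cosine similarity.
var vectors = await embedder.GenerateAsync(
    new[] { "deploy AI offline", "run models locally" });

float[] a = vectors[0].Vector.ToArray();
float[] b = vectors[1].Vector.ToArray();
Console.WriteLine($"Similarity: {CosineSimilarity(a, b):F3}");

// Illustrative helper: dot product over the product of vector norms.
static float CosineSimilarity(float[] x, float[] y)
{
    float dot = 0, normX = 0, normY = 0;
    for (int i = 0; i < x.Length; i++)
    {
        dot += x[i] * y[i];
        normX += x[i] * x[i];
        normY += y[i] * y[i];
    }
    return dot / (MathF.Sqrt(normX) * MathF.Sqrt(normY));
}
```

Scores near 1.0 indicate semantically similar texts, which is the building block for local semantic search.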
## Step 5: Wire Up Dependency Injection
For production applications, register LM-Kit services through DI so they can be injected anywhere in your application.
```csharp
using System.Text;
using Microsoft.Extensions.AI;
using Microsoft.Extensions.DependencyInjection;
using LMKit.Model;
using LMKit.Integrations.ExtensionsAI;

LMKit.Licensing.LicenseManager.SetLicenseKey("");
Console.OutputEncoding = Encoding.UTF8;

// ──────────────────────────────────────
// 1. Load models
// ──────────────────────────────────────
using LM chatModel = LM.LoadFromModelID("gemma3:4b",
    downloadingProgress: (_, len, read) =>
    {
        if (len.HasValue) Console.Write($"\rChat model: {(double)read / len.Value * 100:F1}%");
        return true;
    });
Console.WriteLine();

using LM embeddingModel = LM.LoadFromModelID("qwen3-embedding:0.6b",
    downloadingProgress: (_, len, read) =>
    {
        if (len.HasValue) Console.Write($"\rEmbedding model: {(double)read / len.Value * 100:F1}%");
        return true;
    });
Console.WriteLine();

// ──────────────────────────────────────
// 2. Register services
// ──────────────────────────────────────
var services = new ServiceCollection();

services.AddLMKitChatClient(chatModel, new ChatOptions
{
    Temperature = 0.7f,
    MaxOutputTokens = 512
});

services.AddLMKitEmbeddingGenerator(embeddingModel);

var provider = services.BuildServiceProvider();

// ──────────────────────────────────────
// 3. Resolve and use (same as cloud code)
// ──────────────────────────────────────
var client = provider.GetRequiredService<IChatClient>();
var embedGen = provider.GetRequiredService<IEmbeddingGenerator<string, Embedding<float>>>();

var response = await client.GetResponseAsync(
    new List<ChatMessage>
    {
        new(ChatRole.User, "Explain embeddings in one sentence.")
    });
Console.WriteLine(response.Text);

var embeddings = await embedGen.GenerateAsync(new[] { "test sentence" });
Console.WriteLine($"Embedding vector length: {embeddings[0].Vector.Length}");
```
Key point: If you later need to switch back to cloud (for burst capacity, for example), you only change the DI registration. Application code is untouched.
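As a sketch of that flexibility, the backend can be chosen by a single flag at composition time. The `USE_CLOUD` environment variable is illustrative, and the cloud branch assumes the OpenAI packages from the "Before" snippets are referenced:

```csharp
// Pick the backend at startup; everything downstream resolves IChatClient
// and never knows which implementation it got.
bool useCloud = Environment.GetEnvironmentVariable("USE_CLOUD") == "1";

var services = new ServiceCollection();
if (useCloud)
{
    services.AddSingleton<IChatClient>(_ =>
        new OpenAIClient(Environment.GetEnvironmentVariable("OPENAI_API_KEY"))
            .GetChatClient("gpt-4o-mini")
            .AsIChatClient());
}
else
{
    // chatModel is the local LM instance loaded earlier in Step 5.
    services.AddLMKitChatClient(chatModel, new ChatOptions { Temperature = 0.7f });
}
```

Controllers, handlers, and background services that depend on `IChatClient` compile and run unchanged under either branch.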
## Step 6: Use Streaming and Tool Calling
Both streaming and function calling work through the standard IChatClient interface.
### Streaming Responses
```csharp
var messages = new List<ChatMessage>
{
    new(ChatRole.User, "Write a short poem about local AI.")
};

await foreach (ChatResponseUpdate update in chatClient.GetStreamingResponseAsync(messages))
{
    Console.Write(update.Text);
}
Console.WriteLine();
```
### Tool Calling
The bridge automatically adapts Microsoft.Extensions.AI AIFunction definitions to LM-Kit's tool system:
```csharp
using Microsoft.Extensions.AI;

// Define a tool using AIFunctionFactory
var weatherTool = AIFunctionFactory.Create(
    (string city) => $"The weather in {city} is 22°C and sunny.",
    "get_weather",
    "Get the current weather for a city");

var options = new ChatOptions
{
    Tools = new List<AITool> { weatherTool },
    ToolMode = ChatToolMode.Auto
};

var messages = new List<ChatMessage>
{
    new(ChatRole.User, "What's the weather in Paris?")
};

var response = await chatClient.GetResponseAsync(messages, options);
Console.WriteLine(response.Text);
```
## Migration Checklist
| Step | Cloud Code | Local Equivalent |
|---|---|---|
| Package | `Azure.AI.OpenAI` or `OpenAI` | `LM-Kit.NET` + `LM-Kit.NET.ExtensionsAI` |
| Chat client | `new OpenAIChatClient(...)` | `new LMKitChatClient(model)` |
| Embeddings | `new OpenAIEmbeddingGenerator(...)` | `new LMKitEmbeddingGenerator(model)` |
| DI registration | `AddOpenAIChatClient(...)` | `AddLMKitChatClient(model)` |
| API key | Required | Not needed |
| Internet | Required for every call | Not needed after model download |
## Common Issues
| Problem | Cause | Fix |
|---|---|---|
| `InvalidModelException` on chat | Model lacks chat capability | Use a chat model (`gemma3:4b`, `qwen3:8b`), not an embedding model |
| Empty embedding vectors | Model is not an embedding model | Use `qwen3-embedding:0.6b` or `embeddinggemma-300m` for embeddings |
| Slow first response | Model loading is deferred | Pre-load models at application startup, not on first request |
| `NullReferenceException` in DI | Model disposed before client | Keep the `LM` instance alive for the application's lifetime |
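The last two rows share one pattern: load models eagerly at startup and keep the `LM` instances alive for the whole process. A minimal sketch, assuming a console-style composition root like the one in Step 5:

```csharp
using Microsoft.Extensions.AI;
using Microsoft.Extensions.DependencyInjection;
using LMKit.Model;
using LMKit.Integrations.ExtensionsAI;

// Eager load: the download/initialization cost is paid here, at startup,
// instead of on the first user request.
LM chatModel = LM.LoadFromModelID("gemma3:4b");

// Register the instance without a short-lived using scope. Disposing the
// model while clients still reference it is what produces the
// NullReferenceException row above.
var services = new ServiceCollection();
services.AddLMKitChatClient(chatModel, new ChatOptions { Temperature = 0.7f });

var provider = services.BuildServiceProvider();
// ... run the application; dispose chatModel only at shutdown.
```

In a hosted (ASP.NET Core) application, the same idea applies: perform the load before the host starts accepting requests.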
## Next Steps
- Build and Deploy an Offline AI Application for Edge Environments to package your local application for air-gapped deployment
- Build a RAG Pipeline Over Your Own Documents to add retrieval-augmented generation
- Create an AI Agent with Tools for advanced agent capabilities beyond the `IChatClient` interface
- Microsoft.Extensions.AI Integration sample for a runnable demo