👉 Try the demo: https://github.com/LM-Kit/lm-kit-net-samples/tree/main/console_net/extensions_ai_integration

Microsoft.Extensions.AI Integration for C# .NET Applications


🎯 Purpose of the Sample

This Microsoft.Extensions.AI Integration Demo shows how to use the LM-Kit.NET.ExtensionsAI NuGet package to run LM-Kit.NET behind the standard IChatClient and IEmbeddingGenerator abstractions from Microsoft.Extensions.AI (M.E.AI). Because the package implements these interfaces, LM-Kit.NET becomes a drop-in provider for any .NET application that targets the M.E.AI abstraction layer, enabling middleware composition, dependency injection, and provider-agnostic AI code.
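
As a quick illustration of the dependency-injection angle, the sketch below registers the adapters with a service collection. It is a minimal sketch rather than code from the demo, and it assumes chatModel and embeddingModel are already-loaded LM-Kit.NET models (loading is shown later in this README).

using Microsoft.Extensions.AI;
using Microsoft.Extensions.DependencyInjection;

// Minimal DI sketch (not from the demo): register the LM-Kit adapters as the
// application-wide IChatClient and IEmbeddingGenerator implementations.
// Assumes chatModel and embeddingModel are already-loaded LM-Kit.NET models.
var services = new ServiceCollection();
services.AddSingleton<IChatClient>(_ => new LMKitChatClient(chatModel));
services.AddSingleton<IEmbeddingGenerator<string, Embedding<float>>>(
    _ => new LMKitEmbeddingGenerator(embeddingModel));

using var provider = services.BuildServiceProvider();
IChatClient chatClient = provider.GetRequiredService<IChatClient>();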


👥 Industry Target Audience

This demo is designed for developers and teams adopting the standard .NET AI abstraction layer:

  • 🤖 AI & Chatbot Development: Use LM-Kit.NET as a local provider behind the IChatClient interface, enabling seamless switching between cloud and on-device inference.
  • 📚 Knowledge Management: Combine IEmbeddingGenerator with in-memory or external vector stores to build RAG pipelines using the standard .NET abstractions.
  • 🏢 Enterprise & Compliance: Run fully local AI inference while keeping your codebase portable across providers through the M.E.AI contract.
  • 🔧 Platform & Library Authors: Build middleware, caching layers, or telemetry decorators that work with any IChatClient provider, including LM-Kit.NET (a composition sketch follows this list).
  • 💼 Business & Enterprise: Integrate local LLM capabilities into existing .NET applications that already use dependency injection and the M.E.AI ecosystem.
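
On the middleware point above: Microsoft.Extensions.AI ships a ChatClientBuilder for layering cross-cutting concerns over any provider. The sketch below is illustrative rather than part of the demo; it assumes an already-loaded chatModel, plus the Microsoft.Extensions.Logging.Console package for AddConsole, and decorates an LMKitChatClient with the stock logging middleware.

using Microsoft.Extensions.AI;
using Microsoft.Extensions.Logging;

// Illustrative composition (not from the demo): wrap an LMKitChatClient
// in standard M.E.AI middleware. Any IChatClient decorator composes the same way.
using ILoggerFactory loggerFactory = LoggerFactory.Create(b => b.AddConsole());

IChatClient client = new LMKitChatClient(chatModel) // chatModel: an already-loaded LM-Kit.NET model
    .AsBuilder()
    .UseLogging(loggerFactory) // logs requests and responses around the local provider
    .Build();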

🚀 Problem Solved

Many .NET applications target the IChatClient and IEmbeddingGenerator abstractions so they can swap AI providers without code changes. This demo shows how to:

  • Use LM-Kit.NET through standard interfaces: Create an IChatClient and IEmbeddingGenerator backed by local LLM inference with zero cloud dependencies.
  • Stream responses token by token: Leverage GetStreamingResponseAsync for real-time output using the standard streaming API.
  • Build a RAG pipeline with standard abstractions: Generate embeddings, search by cosine similarity, and augment prompts with retrieved context, all through M.E.AI interfaces.

This integration enables developers to write provider-agnostic .NET AI code while running entirely on-device through LM-Kit.NET.


💻 Sample Application Description

The Microsoft.Extensions.AI Integration Demo is a console application that demonstrates three approaches for working with the standard .NET AI abstractions:

  1. Direct Chat Completion: Invoke IChatClient.GetResponseAsync with usage reporting (token counts, finish reason).
  2. Streaming Chat Completion: Stream the same question token by token using IChatClient.GetStreamingResponseAsync.
  3. RAG with Embeddings: Embed facts using IEmbeddingGenerator, search by cosine similarity, and stream an augmented answer grounded in retrieved context.

The demo uses gemma3:4b for chat and embeddinggemma-300m for embeddings, loaded through LM.LoadFromModelID.
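
For reference, the loading call looks roughly like the sketch below; the LMKit.Model namespace and exact overload are assumptions on our part, so check the demo source for the authoritative version.

using LMKit.Model;

// Sketch of model loading via LM-Kit model IDs (namespace and overload assumed;
// LM.LoadFromModelID resolves the ID and fetches the model if not already cached).
LM chatModel = LM.LoadFromModelID("gemma3:4b");
LM embeddingModel = LM.LoadFromModelID("embeddinggemma-300m");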

✨ Key Features

  • 🔗 Standard .NET AI Abstractions: Uses IChatClient and IEmbeddingGenerator<string, Embedding<float>> from Microsoft.Extensions.AI.
  • 📊 Usage Reporting: Displays input/output token counts and finish reason from ChatResponse.Usage.
  • ⚡ Streaming Support: Real-time token-by-token output through the standard GetStreamingResponseAsync API.
  • 🧠 Simple RAG Pipeline: Embeds facts, retrieves by cosine similarity, and augments prompts with context.
  • ⚙️ ChatOptions Configuration: Demonstrates Temperature and MaxOutputTokens through the standard ChatOptions class.
  • 🖥️ Fully Local: All inference runs on-device with no cloud dependencies.

🧠 Supported Models & Technologies

The demo uses LM-Kit.NET to run both the chat model and the embedding model:

  • Chat Model: gemma3:4b (~3 GB VRAM), a lightweight instruction-following model for generating responses.
  • Embedding Model: embeddinggemma-300m (~300 MB VRAM), used for creating text embeddings that enable semantic similarity search.

The LM-Kit.NET.ExtensionsAI package provides the bridge, implementing IChatClient via LMKitChatClient and IEmbeddingGenerator via LMKitEmbeddingGenerator.


🛠️ Getting Started

📋 Prerequisites

  • .NET 8.0 SDK
  • Sufficient VRAM for gemma3:4b (~3 GB) and embeddinggemma-300m (~300 MB)

▶️ Running the Application

  1. 📂 Clone the repository:

    git clone https://github.com/LM-Kit/lm-kit-net-samples
    
  2. 📁 Navigate to the project directory:

    cd lm-kit-net-samples/console_net/extensions_ai_integration
    
  3. 🔨 Build and run the application:

    dotnet build
    dotnet run
    

💡 Example Usage

Upon running the application, you will see three distinct parts:

  1. Direct Chat (Part 1):

    • The application asks "Who is Elodie's favourite detective?" without any context.
    • The model responds with "I don't know" since it has no relevant information.
    • Token usage and finish reason are displayed.
  2. Streaming Chat (Part 2):

    • The same question is streamed token by token using GetStreamingResponseAsync.
  3. Memory-Enhanced Answer via RAG (Part 3):

    • Five detective-related facts are embedded using IEmbeddingGenerator.
    • The most relevant facts are retrieved by cosine similarity.
    • An augmented prompt is built and the answer is streamed, now grounded in the retrieved context.

The code snippet below illustrates the key components:

using Microsoft.Extensions.AI;

// Create Microsoft.Extensions.AI services from LM-Kit models
IChatClient chatClient = new LMKitChatClient(chatModel);
IEmbeddingGenerator<string, Embedding<float>> embeddingGenerator =
    new LMKitEmbeddingGenerator(embeddingModel);

// Build the conversation as standard M.E.AI chat messages
var messages = new List<ChatMessage>
{
    new(ChatRole.User, "Who is Elodie's favourite detective?")
};

// Direct chat completion with usage reporting
var response = await chatClient.GetResponseAsync(messages);
Console.WriteLine($"Answer: {response.Text}");
Console.WriteLine($"Tokens: {response.Usage?.InputTokenCount} in / {response.Usage?.OutputTokenCount} out");

// Streaming chat completion
await foreach (var update in chatClient.GetStreamingResponseAsync(messages))
{
    Console.Write(update.Text);
}

// Generate embeddings for RAG
var result = await embeddingGenerator.GenerateAsync(["Some fact text"]);
float[] embedding = result[0].Vector.ToArray();

// Stream the augmented prompt with ChatOptions (augmentedMessages is the
// context-enriched message list built in the RAG step)
var options = new ChatOptions { Temperature = 0.3f, MaxOutputTokens = 512 };
await foreach (var update in chatClient.GetStreamingResponseAsync(augmentedMessages, options))
{
    Console.Write(update.Text);
}
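
The retrieval step in Part 3 ranks the stored facts by cosine similarity. Below is a minimal sketch of that ranking, under the assumption that facts holds each fact's text alongside its precomputed embedding; the names and the top-3 cutoff are illustrative, not the demo's exact code.

// Illustrative cosine-similarity ranking over precomputed fact embeddings.
static float CosineSimilarity(ReadOnlySpan<float> a, ReadOnlySpan<float> b)
{
    float dot = 0f, normA = 0f, normB = 0f;
    for (int i = 0; i < a.Length; i++)
    {
        dot += a[i] * b[i];
        normA += a[i] * a[i];
        normB += b[i] * b[i];
    }
    return dot / (MathF.Sqrt(normA) * MathF.Sqrt(normB) + 1e-8f);
}

// Embed the question, then keep the facts closest to it.
ReadOnlyMemory<float> questionVector =
    (await embeddingGenerator.GenerateAsync([question]))[0].Vector;

var topFacts = facts // assumed: a collection of (string Text, ReadOnlyMemory<float> Vector)
    .Select(f => (f.Text, Score: CosineSimilarity(questionVector.Span, f.Vector.Span)))
    .OrderByDescending(x => x.Score)
    .Take(3)
    .ToList();

In the demo, the top-ranked facts are folded into the augmented prompt (the augmentedMessages above) before the final answer is streamed.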

Developers can modify the stored facts, chat options, or model selections to tailor the demo to different scenarios.


🏗️ Architecture

┌─────────────────────────────────────────────────┐
│              Your .NET Application              │
│                                                 │
│  IChatClient          IEmbeddingGenerator       │
│      │                        │                 │
├──────┼────────────────────────┼─────────────────┤
│      ▼                        ▼                 │
│  LMKitChatClient     LMKitEmbeddingGenerator    │
│  (LM-Kit.NET.ExtensionsAI)                      │
│      │                        │                 │
├──────┼────────────────────────┼─────────────────┤
│      ▼                        ▼                 │
│  MultiTurnConversation    Embedder              │
│  (LM-Kit.NET Core SDK)                          │
│      │                        │                 │
├──────┼────────────────────────┼─────────────────┤
│      ▼                        ▼                 │
│          Local LLM Inference Engine             │
│          (CPU / CUDA / Vulkan / Metal)          │
└─────────────────────────────────────────────────┘

📚 Summary

The Microsoft.Extensions.AI Integration Demo provides a straightforward foundation for using LM-Kit.NET through the standard .NET AI abstraction layer. By leveraging the LM-Kit.NET.ExtensionsAI package, developers can write provider-agnostic code using IChatClient and IEmbeddingGenerator while running fully local inference. This enables seamless integration with the broader M.E.AI ecosystem, including middleware, caching, telemetry, and dependency injection.

For more details, check out the demo repository.