👉 Try the demo: https://github.com/LM-Kit/lm-kit-net-samples/tree/main/console_net/telemetry_observability
Telemetry & Observability for C# .NET Applications
🎯 Purpose of the Demo
This demo showcases LM-Kit.NET's OpenTelemetry integration for monitoring and observing LLM inference operations. It demonstrates how to capture traces and metrics following the OpenTelemetry GenAI semantic conventions, enabling integration with industry-standard observability platforms.
👥 Who Should Use This Demo
- DevOps Engineers implementing monitoring for AI applications
- Platform Engineers building observability pipelines for LLM workloads
- Developers who need to track token usage, latency, and throughput
- Teams requiring distributed tracing across AI-powered microservices
🚀 What Problem It Solves
AI applications require visibility into:
- Token consumption for cost tracking and optimization
- Latency metrics (time-to-first-token, generation speed)
- Request correlation across distributed systems
- Error tracking and debugging of inference failures
This demo shows how LM-Kit.NET automatically emits telemetry data that can be collected, analyzed, and exported to any OpenTelemetry-compatible backend.
💻 Demo Application Overview
The demo provides an interactive chat interface that silently collects telemetry in memory. Use the /traces and /metrics commands to display the collected traces and metrics on demand.
✨ Key Features
| Feature | Description |
|---|---|
| In-Memory Collection | Captures telemetry silently using .NET's ActivityListener and MeterListener |
| On-Demand Display | View traces with /traces and metrics with /metrics |
| Conversation Correlation | Each session has a unique ConversationId for span correlation (see the sketch below) |
| GenAI Semantic Conventions | Follows OpenTelemetry GenAI standards for interoperability |
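The ConversationId used for correlation is exposed on the conversation's chat history (see the architecture diagram below). A minimal sketch, assuming model is a model you have already loaded with LM-Kit.NET:

```csharp
using LMKit.TextGeneration;

// Every span emitted during this session carries this ID
// as the gen_ai.conversation.id attribute.
var chat = new MultiTurnConversation(model);
Console.WriteLine($"Conversation ID: {chat.ChatHistory.ConversationId}");
```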
Example Output
```
==============================================
  LM-Kit.NET Telemetry & Observability Demo
==============================================

Conversation ID: 7a3b2c1d4e5f6a7b8c9d0e1f2a3b4c5d
(This ID correlates all telemetry spans for this session)

User: /traces

--- Collected Trace Spans ---

[1] text_completion ministral-3-3b-instruct
    Duration: 1523.45ms
    Status: Ok
    gen_ai.operation.name: text_completion
    gen_ai.conversation.id: 7a3b2c1d4e5f6a7b8c9d0e1f2a3b4c5d
    gen_ai.response.finish_reasons: stop
    gen_ai.request.temperature: 0.7
    gen_ai.usage.input_tokens: 45
    gen_ai.usage.output_tokens: 128

User: /metrics

--- Collected Metrics ---

gen_ai.client.token.usage (input):
    Count: 3, Sum: 135, Avg: 45.00
gen_ai.client.token.usage (output):
    Count: 3, Sum: 384, Avg: 128.00
gen_ai.server.request.duration:
    Count: 3, Sum: 4.52s, Avg: 1.51s
```
🏗️ Architecture
```
┌─────────────────────────────────────────────────────────┐
│ Your Application                                        │
├─────────────────────────────────────────────────────────┤
│ MultiTurnConversation                                   │
│   ├── ChatHistory.ConversationId (session correlation)  │
│   └── Submit() → generates telemetry                    │
├─────────────────────────────────────────────────────────┤
│ LM-Kit.NET Telemetry Layer                              │
│   ├── ActivitySource: "LM-Kit" (traces)                 │
│   └── Meter: "LM-Kit" (metrics)                         │
├─────────────────────────────────────────────────────────┤
│ .NET Diagnostics / OpenTelemetry                        │
│   ├── ActivityListener (in-memory collection)           │
│   ├── MeterListener (in-memory collection)              │
│   └── Or: OpenTelemetry SDK exporters                   │
│         ├── OTLP → Jaeger, Tempo, etc.                  │
│         ├── Console (debugging)                         │
│         └── Application Insights, Datadog, etc.         │
└─────────────────────────────────────────────────────────┘
```
⚙️ Getting Started
Prerequisites
- .NET 8.0 or later
- 3-6 GB VRAM depending on model selection
Download the Demo
```bash
git clone https://github.com/LM-Kit/lm-kit-net-samples.git
cd lm-kit-net-samples/console_net/telemetry_observability
```
Run the Demo
```bash
dotnet run
```
🔧 Telemetry Configuration
Collecting with ActivityListener and MeterListener
```csharp
using LMKit.Telemetry;
using System.Diagnostics;
using System.Diagnostics.Metrics;

// Listen to LM-Kit activities (traces)
var activityListener = new ActivityListener
{
    ShouldListenTo = source => source.Name == LMKitTelemetry.ActivitySourceName,
    Sample = (ref ActivityCreationOptions<ActivityContext> options) =>
        ActivitySamplingResult.AllDataAndRecorded,
    ActivityStopped = activity =>
    {
        // Process completed spans
        Console.WriteLine($"Span: {activity.DisplayName}");
        Console.WriteLine($"  Duration: {activity.Duration.TotalMilliseconds}ms");

        foreach (var tag in activity.Tags)
        {
            Console.WriteLine($"  {tag.Key}: {tag.Value}");
        }
    }
};

ActivitySource.AddActivityListener(activityListener);

// Listen to LM-Kit metrics
var meterListener = new MeterListener();

meterListener.InstrumentPublished = (instrument, listener) =>
{
    if (instrument.Meter.Name == LMKitTelemetry.MeterName)
    {
        listener.EnableMeasurementEvents(instrument);
    }
};

meterListener.SetMeasurementEventCallback<double>((instrument, value, tags, state) =>
{
    Console.WriteLine($"Metric: {instrument.Name} = {value}");
});

meterListener.Start();
```
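With both listeners registered, no additional instrumentation is needed in application code: any LM-Kit.NET inference call produces the spans and measurements shown above. A minimal sketch, assuming model is a model you have already loaded with LM-Kit.NET:

```csharp
using LMKit.TextGeneration;

// Each Submit() call emits a text_completion span plus token-usage and
// latency measurements, all carrying this conversation's correlation ID.
var chat = new MultiTurnConversation(model);
chat.Submit("Summarize the benefits of distributed tracing.");

// The ActivityStopped callback above now prints the completed span and its tags.
```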
Exporting to OpenTelemetry Backends
```csharp
using OpenTelemetry;
using OpenTelemetry.Trace;
using OpenTelemetry.Metrics;

// Configure with OTLP exporter (Jaeger, Tempo, etc.)
var tracerProvider = Sdk.CreateTracerProviderBuilder()
    .AddSource(LMKitTelemetry.ActivitySourceName)
    .AddOtlpExporter()
    .Build();

var meterProvider = Sdk.CreateMeterProviderBuilder()
    .AddMeter(LMKitTelemetry.MeterName)
    .AddOtlpExporter()
    .Build();
```
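For local debugging you can add or swap in a console exporter (from the OpenTelemetry.Exporter.Console package); disposing the providers on shutdown flushes any buffered telemetry. A sketch:

```csharp
using OpenTelemetry;
using OpenTelemetry.Metrics;
using OpenTelemetry.Trace;

// Print spans and metrics to stdout while developing.
using var tracerProvider = Sdk.CreateTracerProviderBuilder()
    .AddSource(LMKitTelemetry.ActivitySourceName)
    .AddConsoleExporter()
    .Build();

using var meterProvider = Sdk.CreateMeterProviderBuilder()
    .AddMeter(LMKitTelemetry.MeterName)
    .AddConsoleExporter()
    .Build();
```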
📊 Available Telemetry
Metrics
| Metric | Unit | Description |
|---|---|---|
| gen_ai.server.time_to_first_token | seconds | Time until first token generated |
| gen_ai.server.time_per_output_token | seconds | Average latency per output token |
| gen_ai.server.request.duration | seconds | Total request duration |
| gen_ai.client.token.usage | tokens | Token counts (tagged by input/output) |
| gen_ai.client.operation.duration | seconds | Client-side operation duration |
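Because gen_ai.client.token.usage is tagged by direction, the MeterListener callback from the configuration section can split it into input and output totals. A sketch; the tag key below (gen_ai.token.type) follows the OpenTelemetry GenAI conventions but is an assumption and should be verified against the actual measurements:

```csharp
using System.Collections.Generic;
using System.Diagnostics.Metrics;

var tokenTotals = new Dictionary<string, double>();

meterListener.SetMeasurementEventCallback<double>((instrument, value, tags, state) =>
{
    if (instrument.Name != "gen_ai.client.token.usage")
    {
        return;
    }

    // Find the tag that distinguishes input from output tokens.
    var direction = "unknown";
    foreach (var tag in tags)
    {
        if (tag.Key == "gen_ai.token.type") // assumed tag key, per GenAI semantic conventions
        {
            direction = tag.Value?.ToString() ?? "unknown";
        }
    }

    tokenTotals[direction] = tokenTotals.GetValueOrDefault(direction) + value;
});
```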
Span Attributes
| Attribute | Description |
|---|---|
| gen_ai.conversation.id | Session correlation ID |
| gen_ai.response.id | Unique response identifier |
| gen_ai.response.finish_reasons | Why generation stopped (stop, length, tool_calls) |
| gen_ai.request.temperature | Sampling temperature |
| gen_ai.request.top_p | Top-p sampling parameter |
| gen_ai.request.top_k | Top-k sampling parameter |
| gen_ai.request.max_tokens | Maximum completion tokens |
| gen_ai.usage.input_tokens | Input token count |
| gen_ai.usage.output_tokens | Output token count |
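A completed span exposes these attributes through Activity.GetTagItem, which makes it easy to feed token counts or finish reasons into custom reporting. A sketch of an ActivityStopped handler you could register on the listener shown earlier:

```csharp
using System.Diagnostics;

// Pull individual GenAI attributes off a completed span.
void OnSpanCompleted(Activity activity)
{
    // GetTagItem returns null when the attribute is not present on this span.
    var inputTokens = activity.GetTagItem("gen_ai.usage.input_tokens");
    var outputTokens = activity.GetTagItem("gen_ai.usage.output_tokens");
    var finishReason = activity.GetTagItem("gen_ai.response.finish_reasons");

    Console.WriteLine(
        $"{activity.DisplayName}: in={inputTokens}, out={outputTokens}, finish={finishReason}");
}
```

Assign it to the listener's ActivityStopped property in place of the inline lambda shown in the configuration section.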
🚀 Extend the Demo
- Add cost tracking: Calculate costs based on token usage and model pricing (see the sketch after this list)
- Export to Grafana: Use OTLP exporter with Tempo for distributed tracing
- Build dashboards: Create Prometheus/Grafana dashboards for LLM metrics
- Add alerting: Set up alerts for high latency or exceeded token budgets
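As a starting point for cost tracking, the token counts already captured in the span attributes (or the token.usage metric) can be multiplied by your model's pricing. The per-million-token prices below are placeholders, not real figures:

```csharp
// Hypothetical prices per million tokens; substitute your own model's pricing.
const decimal InputPricePerMillionUsd = 0.15m;
const decimal OutputPricePerMillionUsd = 0.60m;

decimal EstimateCostUsd(long inputTokens, long outputTokens) =>
    inputTokens * InputPricePerMillionUsd / 1_000_000m +
    outputTokens * OutputPricePerMillionUsd / 1_000_000m;

// Example: the span from the demo output above (45 input / 128 output tokens).
Console.WriteLine($"Estimated cost: ${EstimateCostUsd(45, 128):F6}");
```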