👉 Try the demo: https://github.com/LM-Kit/lm-kit-net-samples/tree/main/console_net/telemetry_observability

Telemetry & Observability for C# .NET Applications


🎯 Purpose of the Demo

This demo showcases LM-Kit.NET's OpenTelemetry integration for monitoring and observing LLM inference operations. It demonstrates how to capture traces and metrics following the OpenTelemetry GenAI semantic conventions, enabling integration with industry-standard observability platforms.


👥 Who Should Use This Demo

  • DevOps Engineers implementing monitoring for AI applications
  • Platform Engineers building observability pipelines for LLM workloads
  • Developers who need to track token usage, latency, and throughput
  • Teams requiring distributed tracing across AI-powered microservices

🚀 What Problem It Solves

AI applications require visibility into:

  • Token consumption for cost tracking and optimization
  • Latency metrics (time-to-first-token, generation speed)
  • Request correlation across distributed systems
  • Error tracking and debugging of inference failures

This demo shows how LM-Kit.NET automatically emits telemetry data that can be collected, analyzed, and exported to any OpenTelemetry-compatible backend.


💻 Demo Application Overview

The demo provides an interactive chat interface that silently collects telemetry in memory. Type /traces or /metrics at any time to view the collected data on demand.
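
The sketch below illustrates how such a loop can be wired up. The variable names, the pre-loaded model, and the lists filled by the listeners (configured in the Telemetry Configuration section later in this document) are assumptions for illustration, not the sample's exact code.

using LMKit.TextGeneration;
using System.Diagnostics;

// Illustrative command loop: chatting emits telemetry silently; /traces and
// /metrics print what the listeners collected. 'model' is assumed to be loaded elsewhere.
var collectedSpans = new List<Activity>();      // filled by an ActivityListener
var collectedMetrics = new List<string>();      // filled by a MeterListener
var chat = new MultiTurnConversation(model);

while (true)
{
    Console.Write("User: ");
    string? input = Console.ReadLine();
    if (string.IsNullOrWhiteSpace(input)) continue;
    if (input == "/quit") break;

    if (input == "/traces")
    {
        foreach (var span in collectedSpans)
            Console.WriteLine($"{span.DisplayName} ({span.Duration.TotalMilliseconds:F2}ms)");
    }
    else if (input == "/metrics")
    {
        collectedMetrics.ForEach(Console.WriteLine);
    }
    else
    {
        // Regular prompts go to the model; LM-Kit emits spans and metrics automatically.
        Console.WriteLine(chat.Submit(input).Completion);
    }
}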


✨ Key Features

Feature                       Description
In-Memory Collection          Captures telemetry silently using .NET's ActivityListener and MeterListener
On-Demand Display             View traces with /traces and metrics with /metrics
Conversation Correlation      Each session has a unique ConversationId for span correlation
GenAI Semantic Conventions    Follows OpenTelemetry GenAI standards for interoperability

Example Output

==============================================
   LM-Kit.NET Telemetry & Observability Demo
==============================================

Conversation ID: 7a3b2c1d4e5f6a7b8c9d0e1f2a3b4c5d
(This ID correlates all telemetry spans for this session)

User: /traces

--- Collected Trace Spans ---

[1] text_completion ministral-3-3b-instruct
    Duration: 1523.45ms
    Status: Ok
    gen_ai.operation.name: text_completion
    gen_ai.conversation.id: 7a3b2c1d4e5f6a7b8c9d0e1f2a3b4c5d
    gen_ai.response.finish_reasons: stop
    gen_ai.request.temperature: 0.7
    gen_ai.usage.input_tokens: 45
    gen_ai.usage.output_tokens: 128

User: /metrics

--- Collected Metrics ---

gen_ai.client.token.usage (input):
    Count: 3, Sum: 135, Avg: 45.00

gen_ai.client.token.usage (output):
    Count: 3, Sum: 384, Avg: 128.00

gen_ai.server.request.duration:
    Count: 3, Sum: 4.52s, Avg: 1.51s

🏗️ Architecture

┌─────────────────────────────────────────────────────────┐
│                    Your Application                     │
├─────────────────────────────────────────────────────────┤
│  MultiTurnConversation                                  │
│  ├── ChatHistory.ConversationId (session correlation)   │
│  └── Submit() → generates telemetry                     │
├─────────────────────────────────────────────────────────┤
│  LM-Kit.NET Telemetry Layer                             │
│  ├── ActivitySource: "LM-Kit" (traces)                  │
│  └── Meter: "LM-Kit" (metrics)                          │
├─────────────────────────────────────────────────────────┤
│  .NET Diagnostics / OpenTelemetry                       │
│  ├── ActivityListener (in-memory collection)            │
│  ├── MeterListener (in-memory collection)               │
│  └── Or: OpenTelemetry SDK exporters                    │
│      ├── OTLP → Jaeger, Tempo, etc.                     │
│      ├── Console (debugging)                            │
│      └── Application Insights, Datadog, etc.            │
└─────────────────────────────────────────────────────────┘
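
A minimal end-to-end sketch of this flow. The model path, constructor overloads, and result property shown here are assumptions for illustration; only the ChatHistory.ConversationId correlation and the "LM-Kit" source/meter names come from the diagram above.

using LMKit.Model;
using LMKit.TextGeneration;

// Model path is a placeholder; load whichever model the demo suggests.
var model = new LM(@"path/to/model.gguf");
var chat = new MultiTurnConversation(model);

// The ChatHistory.ConversationId is attached to every span emitted for this
// session, so downstream backends can correlate all of its telemetry.
Console.WriteLine($"Conversation ID: {chat.ChatHistory.ConversationId}");

// Submit() traverses the layers above: LM-Kit starts an Activity on the
// "LM-Kit" ActivitySource and records measurements on the "LM-Kit" Meter.
var result = chat.Submit("Summarize what OpenTelemetry is in one sentence.");
Console.WriteLine(result.Completion);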

⚙️ Getting Started

Prerequisites

  • .NET 8.0 or later
  • 3-6 GB VRAM depending on model selection

Download the Demo

git clone https://github.com/LM-Kit/lm-kit-net-samples.git
cd lm-kit-net-samples/console_net/telemetry_observability

Run the Demo

dotnet run

🔧 Telemetry Configuration

Collecting with ActivityListener and MeterListener

using LMKit.Telemetry;
using System.Diagnostics;
using System.Diagnostics.Metrics;

// Listen to LM-Kit activities (traces)
var activityListener = new ActivityListener
{
    ShouldListenTo = source => source.Name == LMKitTelemetry.ActivitySourceName,
    Sample = (ref ActivityCreationOptions<ActivityContext> options) =>
        ActivitySamplingResult.AllDataAndRecorded,
    ActivityStopped = activity =>
    {
        // Process completed spans
        Console.WriteLine($"Span: {activity.DisplayName}");
        Console.WriteLine($"  Duration: {activity.Duration.TotalMilliseconds}ms");
        foreach (var tag in activity.Tags)
        {
            Console.WriteLine($"  {tag.Key}: {tag.Value}");
        }
    }
};
ActivitySource.AddActivityListener(activityListener);

// Listen to LM-Kit metrics
var meterListener = new MeterListener();
meterListener.InstrumentPublished = (instrument, listener) =>
{
    if (instrument.Meter.Name == LMKitTelemetry.MeterName)
    {
        listener.EnableMeasurementEvents(instrument);
    }
};
// Measurements may be reported as double or as integer types depending on how
// each instrument was created; registering both callbacks avoids missing data.
meterListener.SetMeasurementEventCallback<double>((instrument, value, tags, state) =>
{
    Console.WriteLine($"Metric: {instrument.Name} = {value}");
});
meterListener.SetMeasurementEventCallback<long>((instrument, value, tags, state) =>
{
    Console.WriteLine($"Metric: {instrument.Name} = {value}");
});
meterListener.Start();
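
The listeners above print measurements as they arrive; the demo instead keeps everything in memory and prints it only when asked. A minimal sketch of that pattern (the queue name and output format are illustrative):

using System.Collections.Concurrent;

// Keep completed spans in memory so they can be printed later on /traces.
var collectedSpans = new ConcurrentQueue<Activity>();

var collectingListener = new ActivityListener
{
    ShouldListenTo = source => source.Name == LMKitTelemetry.ActivitySourceName,
    Sample = (ref ActivityCreationOptions<ActivityContext> options) =>
        ActivitySamplingResult.AllDataAndRecorded,
    ActivityStopped = activity => collectedSpans.Enqueue(activity)
};
ActivitySource.AddActivityListener(collectingListener);

// Later, when the user types /traces:
foreach (var span in collectedSpans)
{
    Console.WriteLine($"{span.DisplayName} ({span.Duration.TotalMilliseconds:F2}ms)");
}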

Exporting to OpenTelemetry Backends

using OpenTelemetry;
using OpenTelemetry.Trace;
using OpenTelemetry.Metrics;

// Configure with OTLP exporter (Jaeger, Tempo, etc.)
// Dispose the providers at shutdown so buffered telemetry is flushed.
using var tracerProvider = Sdk.CreateTracerProviderBuilder()
    .AddSource(LMKitTelemetry.ActivitySourceName)
    .AddOtlpExporter()
    .Build();

using var meterProvider = Sdk.CreateMeterProviderBuilder()
    .AddMeter(LMKitTelemetry.MeterName)
    .AddOtlpExporter()
    .Build();
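
By default the .NET OTLP exporter targets a local collector and honors the standard OTEL_EXPORTER_OTLP_* environment variables. The endpoint and protocol can also be set in code; the endpoint below is illustrative:

using OpenTelemetry.Exporter;

using var tracerProvider = Sdk.CreateTracerProviderBuilder()
    .AddSource(LMKitTelemetry.ActivitySourceName)
    .AddOtlpExporter(options =>
    {
        // Point this at your collector, Jaeger, or Tempo instance.
        options.Endpoint = new Uri("http://localhost:4317");
        options.Protocol = OtlpExportProtocol.Grpc;
    })
    .Build();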

📊 Available Telemetry

Metrics

Metric                                 Unit      Description
gen_ai.server.time_to_first_token      seconds   Time until first token generated
gen_ai.server.time_per_output_token    seconds   Average latency per output token
gen_ai.server.request.duration         seconds   Total request duration
gen_ai.client.token.usage              tokens    Token counts (tagged by input/output)
gen_ai.client.operation.duration       seconds   Client-side operation duration
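
Because gen_ai.client.token.usage is a single instrument, input and output counts are distinguished by a tag on each measurement. A sketch that splits them in the MeterListener callback shown earlier; the tag key follows the OpenTelemetry GenAI semantic conventions and, like the measurement type, is an assumption here:

meterListener.SetMeasurementEventCallback<long>((instrument, value, tags, state) =>
{
    if (instrument.Name != "gen_ai.client.token.usage") return;

    foreach (var tag in tags)
    {
        // Expected per the GenAI semantic conventions: "gen_ai.token.type"
        // with values such as "input" and "output".
        if (tag.Key == "gen_ai.token.type")
        {
            Console.WriteLine($"{tag.Value} tokens: {value}");
        }
    }
});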

Span Attributes

Attribute                         Description
gen_ai.conversation.id            Session correlation ID
gen_ai.response.id                Unique response identifier
gen_ai.response.finish_reasons    Why generation stopped (stop, length, tool_calls)
gen_ai.request.temperature        Sampling temperature
gen_ai.request.top_p              Top-p sampling parameter
gen_ai.request.top_k              Top-k sampling parameter
gen_ai.request.max_tokens         Maximum completion tokens
gen_ai.usage.input_tokens         Input token count
gen_ai.usage.output_tokens        Output token count

🚀 Extend the Demo

  • Add cost tracking: Calculate costs based on token usage and model pricing (see the sketch after this list)
  • Export to Grafana: Use the OTLP exporter with Tempo for distributed tracing
  • Build dashboards: Create Prometheus/Grafana dashboards for LLM metrics
  • Add alerting: Set up alerts for high latency or exceeded token budgets
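
For the cost-tracking idea above, a minimal sketch: accumulate token counts from the collected telemetry and multiply by per-token prices. The rates below are placeholders, not real pricing.

// Placeholder prices in USD per 1,000 tokens - substitute your model's pricing.
const decimal InputPricePer1K = 0.0001m;
const decimal OutputPricePer1K = 0.0004m;

long totalInputTokens = 0, totalOutputTokens = 0;

// Accumulate counts from the gen_ai.usage.input_tokens / gen_ai.usage.output_tokens
// span attributes or from the gen_ai.client.token.usage metric collected earlier.
// totalInputTokens += ...; totalOutputTokens += ...;

decimal estimatedCost = totalInputTokens / 1000m * InputPricePer1K
                      + totalOutputTokens / 1000m * OutputPricePer1K;

Console.WriteLine($"Estimated cost: ${estimatedCost:F6}");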

📚 Additional Resources