👉 Try the demo: https://github.com/LM-Kit/lm-kit-net-samples/tree/main/console_net/telemetry_observability
Telemetry & Observability for C# .NET Applications
🎯 Purpose of the Demo
This demo showcases LM-Kit.NET's OpenTelemetry integration for monitoring and observing LLM inference operations. It demonstrates how to capture traces and metrics following the OpenTelemetry GenAI semantic conventions, enabling integration with industry-standard observability platforms.
👥 Who Should Use This Demo
- DevOps Engineers implementing monitoring for AI applications
- Platform Engineers building observability pipelines for LLM workloads
- Developers who need to track token usage, latency, and throughput
- Teams requiring distributed tracing across AI-powered microservices
🚀 What Problem It Solves
AI applications require visibility into:
- Token consumption for cost tracking and optimization
- Latency metrics (time-to-first-token, generation speed)
- Request correlation across distributed systems
- Error tracking and debugging of inference failures
This demo shows how LM-Kit.NET automatically emits telemetry data that can be collected, analyzed, and exported to any OpenTelemetry-compatible backend.
💻 Demo Application Overview
The demo provides an interactive chat interface that silently collects telemetry in memory. Use the /traces and /metrics commands to display the collected traces and metrics on demand.
✨ Key Features
| Feature | Description |
|---|---|
| In-Memory Collection | Captures telemetry silently using .NET's ActivityListener and MeterListener |
| On-Demand Display | View traces with /traces and metrics with /metrics |
| Conversation Correlation | Each session has a unique ConversationId for span correlation (see the sketch below) |
| GenAI Semantic Conventions | Follows OpenTelemetry GenAI standards for interoperability |
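The ConversationId used for correlation is exposed on the conversation's chat history (see the architecture diagram below). A minimal sketch, assuming model is a model you have already loaded with LM-Kit.NET:

```csharp
using LMKit.TextGeneration;

// Every span emitted during this session carries this ID
// as the gen_ai.conversation.id attribute.
var chat = new MultiTurnConversation(model);
Console.WriteLine($"Conversation ID: {chat.ChatHistory.ConversationId}");
```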
Example Output
```
==============================================
  LM-Kit.NET Telemetry & Observability Demo
==============================================

Conversation ID: 7a3b2c1d4e5f6a7b8c9d0e1f2a3b4c5d
(This ID correlates all telemetry spans for this session)

User: /traces

--- Collected Trace Spans ---

[1] text_completion ministral-3-3b-instruct
    Duration: 1523.45ms
    Status: Ok
    gen_ai.operation.name: text_completion
    gen_ai.conversation.id: 7a3b2c1d4e5f6a7b8c9d0e1f2a3b4c5d
    gen_ai.response.finish_reasons: stop
    gen_ai.request.temperature: 0.7
    gen_ai.usage.input_tokens: 45
    gen_ai.usage.output_tokens: 128

User: /metrics

--- Collected Metrics ---

gen_ai.client.token.usage (input):
    Count: 3, Sum: 135, Avg: 45.00
gen_ai.client.token.usage (output):
    Count: 3, Sum: 384, Avg: 128.00
gen_ai.server.request.duration:
    Count: 3, Sum: 4.52s, Avg: 1.51s
```
🏗️ Architecture
```
┌─────────────────────────────────────────────────────────┐
│ Your Application                                        │
├─────────────────────────────────────────────────────────┤
│ MultiTurnConversation                                   │
│   ├── ChatHistory.ConversationId (session correlation)  │
│   └── Submit() → generates telemetry                    │
├─────────────────────────────────────────────────────────┤
│ LM-Kit.NET Telemetry Layer                              │
│   ├── ActivitySource: "LM-Kit" (traces)                 │
│   └── Meter: "LM-Kit" (metrics)                         │
├─────────────────────────────────────────────────────────┤
│ .NET Diagnostics / OpenTelemetry                        │
│   ├── ActivityListener (in-memory collection)           │
│   ├── MeterListener (in-memory collection)              │
│   └── Or: OpenTelemetry SDK exporters                   │
│         ├── OTLP → Jaeger, Tempo, etc.                  │
│         ├── Console (debugging)                         │
│         └── Application Insights, Datadog, etc.         │
└─────────────────────────────────────────────────────────┘
```
⚙️ Getting Started
Prerequisites
- .NET 8.0 or later
- 3-6 GB VRAM depending on model selection
Download the Demo
```bash
git clone https://github.com/LM-Kit/lm-kit-net-samples.git
cd lm-kit-net-samples/console_net/telemetry_observability
```
Run the Demo
```bash
dotnet run
```
🔧 Telemetry Configuration
Collecting with ActivityListener and MeterListener
```csharp
using LMKit.Telemetry;
using System.Diagnostics;
using System.Diagnostics.Metrics;

// Listen to LM-Kit activities (traces)
var activityListener = new ActivityListener
{
    ShouldListenTo = source => source.Name == LMKitTelemetry.ActivitySourceName,
    Sample = (ref ActivityCreationOptions<ActivityContext> options) =>
        ActivitySamplingResult.AllDataAndRecorded,
    ActivityStopped = activity =>
    {
        // Process completed spans
        Console.WriteLine($"Span: {activity.DisplayName}");
        Console.WriteLine($"  Duration: {activity.Duration.TotalMilliseconds}ms");

        foreach (var tag in activity.Tags)
        {
            Console.WriteLine($"  {tag.Key}: {tag.Value}");
        }
    }
};

ActivitySource.AddActivityListener(activityListener);

// Listen to LM-Kit metrics
var meterListener = new MeterListener();

meterListener.InstrumentPublished = (instrument, listener) =>
{
    if (instrument.Meter.Name == LMKitTelemetry.MeterName)
    {
        listener.EnableMeasurementEvents(instrument);
    }
};

meterListener.SetMeasurementEventCallback<double>((instrument, value, tags, state) =>
{
    Console.WriteLine($"Metric: {instrument.Name} = {value}");
});

meterListener.Start();
```
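With both listeners registered, no additional instrumentation is needed in application code: any LM-Kit.NET inference call produces the spans and measurements shown above. A minimal sketch, assuming model is a model you have already loaded with LM-Kit.NET:

```csharp
using LMKit.TextGeneration;

// Each Submit() call emits a text_completion span plus token-usage and
// latency measurements, all carrying this conversation's correlation ID.
var chat = new MultiTurnConversation(model);
chat.Submit("Summarize the benefits of distributed tracing.");

// The ActivityStopped callback above now prints the completed span and its tags.
```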
Exporting to OpenTelemetry Backends
```csharp
using OpenTelemetry;
using OpenTelemetry.Trace;
using OpenTelemetry.Metrics;

// Configure with OTLP exporter (Jaeger, Tempo, etc.)
var tracerProvider = Sdk.CreateTracerProviderBuilder()
    .AddSource(LMKitTelemetry.ActivitySourceName)
    .AddOtlpExporter()
    .Build();

var meterProvider = Sdk.CreateMeterProviderBuilder()
    .AddMeter(LMKitTelemetry.MeterName)
    .AddOtlpExporter()
    .Build();
```
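For local debugging you can add or swap in a console exporter (from the OpenTelemetry.Exporter.Console package); disposing the providers on shutdown flushes any buffered telemetry. A sketch:

```csharp
using OpenTelemetry;
using OpenTelemetry.Metrics;
using OpenTelemetry.Trace;

// Print spans and metrics to stdout while developing.
using var tracerProvider = Sdk.CreateTracerProviderBuilder()
    .AddSource(LMKitTelemetry.ActivitySourceName)
    .AddConsoleExporter()
    .Build();

using var meterProvider = Sdk.CreateMeterProviderBuilder()
    .AddMeter(LMKitTelemetry.MeterName)
    .AddConsoleExporter()
    .Build();
```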
📊 Available Telemetry
Metrics
| Metric | Unit | Description |
|---|---|---|
| gen_ai.server.time_to_first_token | seconds | Time until first token generated |
| gen_ai.server.time_per_output_token | seconds | Average latency per output token |
| gen_ai.server.request.duration | seconds | Total request duration |
| gen_ai.client.token.usage | tokens | Token counts (tagged by input/output) |
| gen_ai.client.operation.duration | seconds | Client-side operation duration |
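Because gen_ai.client.token.usage is tagged by direction, the MeterListener callback from the configuration section can split it into input and output totals. A sketch; the tag key below (gen_ai.token.type) follows the OpenTelemetry GenAI conventions but is an assumption and should be verified against the actual measurements:

```csharp
using System.Collections.Generic;
using System.Diagnostics.Metrics;

var tokenTotals = new Dictionary<string, double>();

meterListener.SetMeasurementEventCallback<double>((instrument, value, tags, state) =>
{
    if (instrument.Name != "gen_ai.client.token.usage")
    {
        return;
    }

    // Find the tag that distinguishes input from output tokens.
    var direction = "unknown";
    foreach (var tag in tags)
    {
        if (tag.Key == "gen_ai.token.type") // assumed tag key, per GenAI semantic conventions
        {
            direction = tag.Value?.ToString() ?? "unknown";
        }
    }

    tokenTotals[direction] = tokenTotals.GetValueOrDefault(direction) + value;
});
```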
Span Attributes
| Attribute | Description |
|---|---|
| gen_ai.conversation.id | Session correlation ID |
| gen_ai.response.id | Unique response identifier |
| gen_ai.response.finish_reasons | Why generation stopped (stop, length, tool_calls) |
| gen_ai.request.temperature | Sampling temperature |
| gen_ai.request.top_p | Top-p sampling parameter |
| gen_ai.request.top_k | Top-k sampling parameter |
| gen_ai.request.max_tokens | Maximum completion tokens |
| gen_ai.usage.input_tokens | Input token count |
| gen_ai.usage.output_tokens | Output token count |
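A completed span exposes these attributes through Activity.GetTagItem, which makes it easy to feed token counts or finish reasons into custom reporting. A sketch of an ActivityStopped handler you could register on the listener shown earlier:

```csharp
using System.Diagnostics;

// Pull individual GenAI attributes off a completed span.
void OnSpanCompleted(Activity activity)
{
    // GetTagItem returns null when the attribute is not present on this span.
    var inputTokens = activity.GetTagItem("gen_ai.usage.input_tokens");
    var outputTokens = activity.GetTagItem("gen_ai.usage.output_tokens");
    var finishReason = activity.GetTagItem("gen_ai.response.finish_reasons");

    Console.WriteLine(
        $"{activity.DisplayName}: in={inputTokens}, out={outputTokens}, finish={finishReason}");
}
```

Assign it to the listener's ActivityStopped property in place of the inline lambda shown in the configuration section.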
🚀 Extend the Demo
- Add cost tracking: Calculate costs based on token usage and model pricing (see the sketch after this list)
- Export to Grafana: Use OTLP exporter with Tempo for distributed tracing
- Build dashboards: Create Prometheus/Grafana dashboards for LLM metrics
- Add alerting: Set up alerts for high latency or exceeded token budgets
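As a starting point for cost tracking, the token counts already captured in the span attributes (or the token.usage metric) can be multiplied by your model's pricing. The per-million-token prices below are placeholders, not real figures:

```csharp
// Hypothetical prices per million tokens; substitute your own model's pricing.
const decimal InputPricePerMillionUsd = 0.15m;
const decimal OutputPricePerMillionUsd = 0.60m;

decimal EstimateCostUsd(long inputTokens, long outputTokens) =>
    inputTokens * InputPricePerMillionUsd / 1_000_000m +
    outputTokens * OutputPricePerMillionUsd / 1_000_000m;

// Example: the span from the demo output above (45 input / 128 output tokens).
Console.WriteLine($"Estimated cost: ${EstimateCostUsd(45, 128):F6}");
```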