👉 Try the demo: https://github.com/LM-Kit/lm-kit-net-samples/tree/main/console_net/local-inference/backends/runtime_diagnostic_report

Hardware Backends Inspection for C# .NET Applications


🎯 Purpose of the Demo

This demo reports which native backend LM-Kit selected (CPU, CUDA 12/13, Vulkan, Metal, AVX2, AVX, SYCL), enumerates every visible GPU with its total and free VRAM, and shows how a model's layers are actually distributed across devices after LM.LoadFromModelID().

👥 Who Should Use This Demo

  • Developers shipping LM-Kit.NET in a desktop app who want to log the user's hardware on first run.
  • Support engineers diagnosing "the model is slow" tickets.
  • DevOps engineers building startup diagnostics for a GPU server fleet.

🚀 What Problem It Solves

When a customer says "performance is bad", you cannot fix it without first knowing whether the model's layers are on the GPU at all. This demo is a diagnostic harness you can paste into your own crash handler or first-run setup wizard.

💻 Demo Application Overview

A console app that:

  1. Calls LMKit.Global.Runtime.Initialize().
  2. Prints Runtime.Backend, Runtime.Version, Runtime.HasGpuSupport, Runtime.EnableCuda, Runtime.EnableVulkan, Runtime.BackendDirectory.
  3. Lists every entry in GpuDeviceInfo.Devices with name, total VRAM, free VRAM.
  4. Loads a small chat model (gemma3:270m) with auto configuration and prints LM.LayerCount and LM.GpuLayerCount.
  5. Optionally reloads the same model pinned to CPU (GpuLayerCount = 0) so the placement difference is visible.
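
The placement comparison in steps 4 and 5 comes down to two numbers. The following is a minimal sketch, not the demo's actual code: DescribePlacement is a hypothetical helper, and it takes plain integers so it runs without LM-Kit installed. In a real application the two arguments would presumably come from LM.LayerCount and LM.GpuLayerCount.

```csharp
using System;

// Hypothetical helper (not part of LM-Kit): summarize where a model's layers landed.
// layerCount    -> would come from LM.LayerCount
// gpuLayerCount -> would come from LM.GpuLayerCount
static string DescribePlacement(int layerCount, int gpuLayerCount)
{
    if (layerCount <= 0)
        throw new ArgumentOutOfRangeException(nameof(layerCount));

    // Clamp defensively in case the reported GPU count falls outside 0..layerCount.
    int onGpu = Math.Clamp(gpuLayerCount, 0, layerCount);
    int onCpu = layerCount - onGpu;

    string verdict = onGpu == 0 ? "CPU only"
                   : onCpu == 0 ? "fully offloaded"
                   : "partially offloaded";

    return $"{onGpu}/{layerCount} layers on GPU, {onCpu} on CPU ({verdict})";
}

Console.WriteLine(DescribePlacement(26, 26)); // e.g. auto configuration on a large GPU
Console.WriteLine(DescribePlacement(26, 0));  // e.g. the same model pinned to CPU
```

Logging this one-line summary next to the backend name is usually enough to tell a "slow because CPU-only" ticket apart from other causes.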

✨ Key Features

  • Runtime.Backend enum: CPU, Cuda12, Cuda13, Metal, Vulkan, Sycl, Avx, Avx2.
  • GpuDeviceInfo.Devices is an IReadOnlyList<GpuDeviceInfo> with DeviceName, TotalMemorySize, FreeMemorySize, DeviceNumber.
  • LM.DeviceConfiguration.GpuLayerCount = 0 forces CPU-only loading.
  • LM.DeviceConfiguration.AutoFitToVram = true lets the loader retry with fewer GPU layers on OOM.

Example Output

LM-Kit runtime version : 2026.5.x
Selected backend       : Cuda13
Has GPU support        : True
CUDA enabled           : True
Vulkan enabled         : False

Detected 1 GPU device(s):

  # | Device                           |   Total VRAM |    Free VRAM
--------------------------------------------------------------------------------
  0 | NVIDIA GeForce RTX 4090          |     24.0 GB |     22.4 GB

  Layers in model       : 26
  Layers offloaded GPU  : 26
  Layers on CPU         : 0
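
A device table similar to the one above can be produced with standard .NET composite formatting. This is a sketch under assumptions: FormatGb and FormatRow are hypothetical helpers, the column widths are chosen for illustration rather than copied from the demo, and memory sizes are assumed to be byte counts that fit in a long (convert as needed if the library exposes a different numeric type).

```csharp
using System;
using System.Globalization;

// Hypothetical helper: bytes -> "24.0 GB", using 1 GB = 1024^3 bytes.
// Invariant culture keeps the decimal separator stable across user locales.
static string FormatGb(long bytes) =>
    (bytes / (1024.0 * 1024.0 * 1024.0)).ToString("F1", CultureInfo.InvariantCulture) + " GB";

// Hypothetical helper: one fixed-width row. A negative alignment left-aligns the name.
static string FormatRow(string number, string name, string total, string free) =>
    string.Format(CultureInfo.InvariantCulture,
        "{0,3} | {1,-32} | {2,12} | {3,12}", number, name, total, free);

// In a real app these values would come from each GpuDeviceInfo entry
// (DeviceNumber, DeviceName, TotalMemorySize, FreeMemorySize).
long totalBytes = 24L * 1024 * 1024 * 1024;
long freeBytes = 24_051_816_858L; // roughly 22.4 GB

string header = FormatRow("#", "Device", "Total VRAM", "Free VRAM");
Console.WriteLine(header);
Console.WriteLine(new string('-', header.Length));
Console.WriteLine(FormatRow("0", "NVIDIA GeForce RTX 4090", FormatGb(totalBytes), FormatGb(freeBytes)));
```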

⚙️ Getting Started

From the directory that contains your clone of the lm-kit-net-samples repository, run:

cd lm-kit-net-samples/console_net/local-inference/backends/runtime_diagnostic_report
dotnet run

🔧 Troubleshooting

  • "No GPU device visible" on a machine with a GPU -> verify that the LM-Kit CUDA / Vulkan / Metal backend NuGet package is installed and that the appropriate driver is present.
  • Runtime.Backend reports CPU on a machine with CUDA -> the runtime did not find the CUDA shared libraries. Check that the directory reported by Runtime.BackendDirectory exists and contains them.
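
One way to check a backend directory is to list the native library files it contains. The sketch below is an illustration only: ListNativeLibraries is a hypothetical helper, the extension list is an assumption, and the path is taken from the command line (falling back to the application directory) so the code runs without LM-Kit. In practice you would pass the value of Runtime.BackendDirectory.

```csharp
using System;
using System.Collections.Generic;
using System.IO;

// Hypothetical helper: names of files in `path` that look like native libraries.
// Returns an empty list when the directory is missing, which is itself a useful signal.
// Versioned names such as "libfoo.so.1" are not matched by this simple extension check.
static List<string> ListNativeLibraries(string path)
{
    var found = new List<string>();
    if (string.IsNullOrEmpty(path) || !Directory.Exists(path))
        return found;

    var extensions = new HashSet<string>(StringComparer.OrdinalIgnoreCase)
        { ".dll", ".so", ".dylib" };

    foreach (string file in Directory.EnumerateFiles(path))
    {
        if (extensions.Contains(Path.GetExtension(file)))
            found.Add(Path.GetFileName(file));
    }

    found.Sort(StringComparer.Ordinal);
    return found;
}

// Pass the directory to inspect as the first argument.
string backendDir = args.Length > 0 ? args[0] : AppContext.BaseDirectory;
List<string> libraries = ListNativeLibraries(backendDir);

Console.WriteLine($"Directory : {backendDir}");
Console.WriteLine($"Exists    : {Directory.Exists(backendDir)}");
Console.WriteLine($"Libraries : {libraries.Count}");
foreach (string name in libraries)
    Console.WriteLine($"  {name}");
```

An empty list for a directory that exists suggests the backend package was not restored or deployed alongside the application.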

🚀 Extend the Demo

  • Plug the report into your application's About dialog.
  • On startup, refuse to load any model larger than GpuDeviceInfo.Devices[0].FreeMemorySize and surface the message in your UI.
  • Add a self-test that runs a 50-token completion and reports tokens/sec by backend.
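
The VRAM guard and the throughput self-test suggested above can be sketched as follows. All three helpers are hypothetical, and the completion is simulated with a sleep so the code runs without a model. Using the model's file size as its memory footprint is an assumption and only a rough proxy, since context and cache memory are not counted.

```csharp
using System;
using System.Diagnostics;
using System.Threading;

// Hypothetical guard: true when the model is no larger than the free VRAM.
// modelBytes    -> e.g. the model file size on disk (rough proxy, an assumption)
// freeVramBytes -> would come from GpuDeviceInfo.Devices[0].FreeMemorySize
static bool FitsInFreeVram(long modelBytes, long freeVramBytes) =>
    modelBytes >= 0 && modelBytes <= freeVramBytes;

// Throughput of a finished run.
static double TokensPerSecond(int tokenCount, TimeSpan elapsed)
{
    if (tokenCount < 0)
        throw new ArgumentOutOfRangeException(nameof(tokenCount));
    if (elapsed <= TimeSpan.Zero)
        throw new ArgumentOutOfRangeException(nameof(elapsed));
    return tokenCount / elapsed.TotalSeconds;
}

// Times a delegate that runs one completion and returns how many tokens it produced.
static double MeasureTokensPerSecond(Func<int> runCompletion)
{
    Stopwatch timer = Stopwatch.StartNew();
    int tokens = runCompletion();
    timer.Stop();
    return TokensPerSecond(tokens, timer.Elapsed);
}

// Stand-in for a 50-token completion; replace the lambda with a real call into your model.
double rate = MeasureTokensPerSecond(() => { Thread.Sleep(100); return 50; });

Console.WriteLine($"Fits in VRAM : {FitsInFreeVram(2_000_000_000L, 24_000_000_000L)}");
Console.WriteLine($"Throughput   : {rate:F1} tokens/sec (simulated)");
```

Recording the throughput together with the value of Runtime.Backend gives the per-backend comparison the last bullet describes.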
