Configure GPU Backends

GPU acceleration is the single most impactful setting for inference performance. A model that takes several seconds per response on CPU can generate tokens in milliseconds when offloaded to a supported GPU. This guide explains how to install, enable, and verify each backend so you can get the best throughput from your hardware.


TL;DR

# Install the base SDK (includes CPU, AVX, and Vulkan backends)
dotnet add package LM-Kit.NET

# For NVIDIA GPUs, add a CUDA package
dotnet add package LM-Kit.NET.Backend.Cuda12.Windows   # or .Linux / .linux-arm64

using LMKit.Global;
using LMKit.Model;

// The SDK auto-selects the best backend: CUDA 13 → CUDA 12 → Vulkan → CPU
Runtime.Initialize();

Console.WriteLine($"Backend: {Runtime.Backend}");  // e.g. "Cuda12", "Vulkan", "Avx2"

using LM model = LM.LoadFromModelID("gemma3:4b");

  • Vulkan is included in the base LM-Kit.NET package. No extra install needed.
  • CUDA requires a separate backend package (see table below).
  • Metal (macOS) is included automatically.
  • If CUDA is installed but no NVIDIA GPU is found, the SDK falls back to Vulkan automatically.

Prerequisites

| Requirement | Details |
| --- | --- |
| LM-Kit.NET | Installed via NuGet (LM-Kit.NET package) |
| .NET | .NET 8.0 or later (or .NET Standard 2.0 compatible) |
| GPU (optional) | NVIDIA GPU with CUDA 12+ drivers, or any GPU with Vulkan 1.2+ support |
| macOS (optional) | Apple Silicon or AMD GPU for Metal acceleration |

Platform Support

| Platform | Architectures | Status |
| --- | --- | --- |
| Windows | x64 | Fully supported |
| Windows | ARM64 | Coming soon |
| Linux | x64, ARM64 | Fully supported |
| macOS | Universal (Apple Silicon + Intel) | Fully supported |

Available Backends

LM-Kit.NET selects the optimal backend automatically at startup and keeps it for the lifetime of the process. The table below lists every supported backend.

| Backend | Description | Platform | NuGet Package |
| --- | --- | --- | --- |
| CPU (SSE4) | Default fallback. Works on any modern x64 processor. | Windows, Linux | Included in LM-Kit.NET |
| AVX / AVX2 | Leverages wider SIMD registers for faster CPU inference. | Windows, Linux | Included in LM-Kit.NET |
| CUDA 12 | NVIDIA GPU acceleration using CUDA 12.x drivers. | Windows, Linux (x64 and ARM64) | LM-Kit.NET.Backend.Cuda12.Windows / LM-Kit.NET.Backend.Cuda12.Linux / LM-Kit.NET.Backend.Cuda12.linux-arm64 |
| CUDA 13 | NVIDIA GPU acceleration using CUDA 13.x drivers. | Windows only | LM-Kit.NET.Backend.Cuda13.Windows |
| Vulkan | Cross-platform GPU acceleration (NVIDIA, AMD, Intel). | Windows, Linux | Included in LM-Kit.NET |
| Metal | Apple GPU acceleration via the Metal API. | macOS | Included in LM-Kit.NET |

Note: On macOS, Metal is enabled automatically when a compatible GPU is present. The EnableCuda and EnableVulkan properties have no effect on macOS.

Note: CUDA 13 support for Linux is coming soon.


Automatic Backend Selection

LM-Kit.NET automatically selects the best available backend at startup. You do not need to pick one manually. The SDK evaluates backends using three conditions:

  1. Runtime files are available. The matching NuGet backend package is installed.
  2. Backend is not disabled in code. EnableCuda and EnableVulkan have not been set to false.
  3. A compatible device exists on the system. At least one GPU can be loaded by the backend.

When all three conditions are met, the backend is selected. When multiple GPU backends qualify, the SDK follows this priority order:

CUDA 13  →  CUDA 12  →  Vulkan  →  CPU (AVX2 / AVX / SSE4)

Key behavior: If a CUDA backend package is installed but no NVIDIA GPU is found on the system, LM-Kit.NET automatically falls back to Vulkan (provided Vulkan is not explicitly disabled in code). This means you can safely install a CUDA backend package even on machines that may not have an NVIDIA GPU. The SDK will select the next best option.
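If a deployment must never fall back silently (for example, a server fleet provisioned specifically for CUDA), you can check the selection result right after initialization. A minimal sketch using only the Runtime members shown in this guide; the prefix comparison against "Cuda" is an assumption about the backend naming and should be adjusted to the exact Runtime.Backend values you observe:

```csharp
using LMKit.Global;

Runtime.Initialize();

// Fail fast if this process did not end up on a CUDA backend.
// The "Cuda" prefix check is illustrative, not a documented contract.
if (!Runtime.Backend.ToString().StartsWith("Cuda"))
{
    throw new InvalidOperationException(
        $"Expected a CUDA backend but got '{Runtime.Backend}'. " +
        "Check the NuGet backend package and NVIDIA driver installation.");
}
```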


Step 1: Install the Backend NuGet Package

The base LM-Kit.NET package includes CPU, AVX, and Vulkan support. For CUDA GPU acceleration you must add the matching CUDA backend package.

CUDA 12 (NVIDIA)

# Windows
Install-Package LM-Kit.NET.Backend.Cuda12.Windows

# Linux x64
Install-Package LM-Kit.NET.Backend.Cuda12.Linux

# Linux ARM64
Install-Package LM-Kit.NET.Backend.Cuda12.linux-arm64

CUDA 13 (NVIDIA, Windows only)

# Windows
Install-Package LM-Kit.NET.Backend.Cuda13.Windows

Tip: You can install multiple CUDA backend packages in the same project. LM-Kit.NET will automatically select the highest priority backend that matches your hardware. If no NVIDIA GPU is detected, the SDK falls back to Vulkan, then CPU.


Step 2: Enable the Backend in Code

Backend flags must be set before the runtime is initialized. Once Initialize() has been called (or any operation triggers automatic initialization), these properties become immutable.

using LMKit.Global;

// Enable CUDA (default is true, shown here for clarity)
Runtime.EnableCuda = true;

// Enable Vulkan (default is true)
Runtime.EnableVulkan = true;

// Initialize the runtime explicitly
Runtime.Initialize();

// Check which backend was selected
Console.WriteLine($"Active backend: {Runtime.Backend}");

To force CPU-only inference, disable both GPU backends before initialization:

Runtime.EnableCuda = false;
Runtime.EnableVulkan = false;
Runtime.Initialize();

// Runtime.Backend will be CPU, Avx, or Avx2
Console.WriteLine($"Backend: {Runtime.Backend}");
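The immutability described above can be observed directly: once the runtime is initialized, assigning a backend flag throws. A small sketch of the failure mode (the exception type matches the Troubleshooting table at the end of this guide):

```csharp
using LMKit.Global;

Runtime.Initialize();

try
{
    // Too late: the backend was fixed when Initialize() ran.
    Runtime.EnableCuda = false;
}
catch (InvalidOperationException ex)
{
    Console.WriteLine($"Cannot change backend flags after init: {ex.Message}");
}
```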

Step 3: Verify GPU Availability

After initialization you can query whether GPU offload is available and enumerate the detected devices.

using LMKit.Global;
using LMKit.Hardware.Gpu;

Runtime.Initialize();

Console.WriteLine($"Backend:     {Runtime.Backend}");
Console.WriteLine($"GPU support: {Runtime.HasGpuSupport}");
Console.WriteLine($"GPU devices: {GpuDeviceInfo.Devices.Count}");

foreach (var device in GpuDeviceInfo.Devices)
{
    double totalGB = device.TotalMemorySize / (1024.0 * 1024.0 * 1024.0);
    double freeGB  = device.FreeMemorySize  / (1024.0 * 1024.0 * 1024.0);

    Console.WriteLine($"  [{device.DeviceNumber}] {device.DeviceName}");
    Console.WriteLine($"      Type:        {device.DeviceType}");
    Console.WriteLine($"      Total VRAM:  {totalGB:F1} GB");
    Console.WriteLine($"      Free VRAM:   {freeGB:F1} GB");
}

Step 4: Control GPU Layer Offloading

By default, LM-Kit.NET offloads all model layers to the GPU when GPU support is available. You can limit the number of offloaded layers through LM.DeviceConfiguration. This is useful when a model is too large to fit entirely in VRAM.

using LMKit.Model;

// Offload all layers (default behavior)
using LM model = LM.LoadFromModelID("gemma3:4b");
Console.WriteLine($"GPU layers: {model.GpuLayerCount}");

// Offload only 20 layers to save VRAM
using LM partialGpu = LM.LoadFromModelID("gemma3:12b",
    deviceConfiguration: new LM.DeviceConfiguration
    {
        GpuLayerCount = 20
    });

Console.WriteLine($"GPU layers (partial): {partialGpu.GpuLayerCount}");

Choosing the Right Layer Count

  • int.MaxValue (default): Offload everything. LM-Kit.NET will automatically reduce the count if VRAM is insufficient.
  • A specific number (e.g., 20): Useful when you want to share VRAM with other processes or run multiple models concurrently.
  • 0: Force all computation to the CPU, even when a GPU backend is active.
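As a rough starting point, a layer budget can be derived from the free VRAM reported in Step 3. The bytes-per-layer figure below is a placeholder you would measure for your specific model and quantization, not a value provided by LM-Kit.NET:

```csharp
using System.Linq;
using LMKit.Hardware.Gpu;
using LMKit.Model;

// Placeholder: measure the real per-layer VRAM cost of your model/quantization.
const long EstimatedBytesPerLayer = 350L * 1024 * 1024;

long freeVram = GpuDeviceInfo.Devices.Max(d => d.FreeMemorySize);

// Keep ~1 GB of headroom for the KV cache and other processes.
// If the budget comes out negative, fall back to CPU (0 layers).
int layerBudget = (int)((freeVram - 1L * 1024 * 1024 * 1024) / EstimatedBytesPerLayer);

using LM model = LM.LoadFromModelID("gemma3:12b",
    deviceConfiguration: new LM.DeviceConfiguration
    {
        GpuLayerCount = Math.Max(0, layerBudget)
    });
```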

Step 5: Multi-GPU Configuration

When multiple GPUs are detected, LM-Kit.NET selects the device with the most available memory as the main GPU. You can override this by specifying a GpuDeviceInfo instance.

using LMKit.Hardware.Gpu;
using LMKit.Model;

// Pick a specific GPU by device number
var device = GpuDeviceInfo.GetDeviceFromNumber(1);

using LM model = LM.LoadFromModelID("gemma3:12b",
    deviceConfiguration: new LM.DeviceConfiguration(device));

Console.WriteLine($"Running on: {device.DeviceName}");
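The default policy (most available memory) can also be made explicit, which is useful when you want to log why a device was picked. A sketch using only the GpuDeviceInfo members shown in Step 3:

```csharp
using System.Linq;
using LMKit.Hardware.Gpu;
using LMKit.Model;

// Mirror the SDK's default selection explicitly: pick the device
// with the most free VRAM, and log the decision.
var best = GpuDeviceInfo.Devices.OrderByDescending(d => d.FreeMemorySize).First();

Console.WriteLine($"Selected [{best.DeviceNumber}] {best.DeviceName} " +
                  $"({best.FreeMemorySize / (1024.0 * 1024 * 1024):F1} GB free)");

using LM model = LM.LoadFromModelID("gemma3:12b",
    deviceConfiguration: new LM.DeviceConfiguration(best));
```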

When to Use CPU vs. GPU

| Scenario | Recommendation |
| --- | --- |
| Small models (under 3B parameters) | CPU or AVX2 is often fast enough. |
| Medium models (4B to 8B parameters) | GPU recommended. A card with 6+ GB VRAM provides smooth inference. |
| Large models (12B+ parameters) | GPU strongly recommended. 10+ GB VRAM for full offload. |
| No dedicated GPU available | CPU with AVX2 gives the best software-only performance. |
| Shared GPU environment | Use partial layer offloading to limit VRAM consumption. |
| Batch processing or server workloads | CUDA typically provides the highest throughput for NVIDIA hardware. |
| Cross-vendor GPU support | Vulkan works across NVIDIA, AMD, and Intel GPUs. No extra package needed. |
| Mixed deployment (some machines with NVIDIA GPUs, some without) | Install a CUDA package. The SDK falls back to Vulkan automatically on machines without NVIDIA GPUs. |

Troubleshooting

| Problem | Solution |
| --- | --- |
| Runtime.Backend shows CPU despite installing a GPU package | Verify that the correct NuGet backend package is installed and that your GPU drivers meet the minimum version. Check that EnableCuda and EnableVulkan are not set to false. |
| HasGpuSupport is false | The GPU driver may be outdated, or EnableCuda / EnableVulkan was set to false before initialization. |
| Out-of-memory when loading a large model | Lower GpuLayerCount to offload fewer layers, or switch to a smaller quantization variant. |
| InvalidOperationException when setting EnableCuda | The runtime has already been initialized. Backend flags must be set before any LM-Kit.NET operation. |
| Slow inference despite GPU backend | Ensure the model actually has GPU layers. Check model.GpuLayerCount after loading. |
| CUDA installed but Vulkan is selected | No NVIDIA GPU was detected. The SDK automatically fell back to Vulkan. Install NVIDIA drivers or verify that the GPU is recognized by the system. |
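When filing a support request or diagnosing one of the rows above, a quick way to gather the relevant facts is a small console snippet that prints the runtime state. It uses only members introduced earlier in this guide; reading the Enable* flags after initialization is assumed to be permitted (only assignment is locked):

```csharp
using LMKit.Global;
using LMKit.Hardware.Gpu;

Runtime.Initialize();

Console.WriteLine($"Backend:      {Runtime.Backend}");
Console.WriteLine($"EnableCuda:   {Runtime.EnableCuda}");
Console.WriteLine($"EnableVulkan: {Runtime.EnableVulkan}");
Console.WriteLine($"GPU support:  {Runtime.HasGpuSupport}");
Console.WriteLine($"GPU devices:  {GpuDeviceInfo.Devices.Count}");
```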

Quick Reference

using LMKit.Global;
using LMKit.Hardware.Gpu;
using LMKit.Model;

// 1. Configure backend (before initialization)
Runtime.EnableCuda = true;
Runtime.Initialize();

// 2. Verify
Console.WriteLine($"Backend: {Runtime.Backend}");
Console.WriteLine($"GPU support: {Runtime.HasGpuSupport}");

// 3. Load model with GPU acceleration
using LM model = LM.LoadFromModelID("gemma3:4b",
    loadingProgress: p => { Console.Write($"\rLoading: {p * 100:F0}%"); return true; });

Console.WriteLine($"\nGPU layers: {model.GpuLayerCount}");
