# Configure GPU Backends
GPU acceleration is the single most impactful setting for inference performance. A model that takes several seconds per response on CPU can generate tokens in milliseconds when offloaded to a supported GPU. This guide explains how to install, enable, and verify each backend so you can get the best throughput from your hardware.
## TL;DR

```bash
# Install the base SDK (includes CPU, AVX, and Vulkan backends)
dotnet add package LM-Kit.NET

# For NVIDIA GPUs, add a CUDA package
dotnet add package LM-Kit.NET.Backend.Cuda12.Windows   # or .Linux / .linux-arm64
```

```csharp
using LMKit.Global;
using LMKit.Model;

// The SDK auto-selects the best backend: CUDA 13 → CUDA 12 → Vulkan → CPU
Runtime.Initialize();
Console.WriteLine($"Backend: {Runtime.Backend}"); // e.g. "Cuda12", "Vulkan", "Avx2"

using LM model = LM.LoadFromModelID("gemma3:4b");
```

- Vulkan is included in the base `LM-Kit.NET` package. No extra install needed.
- CUDA requires a separate backend package (see table below).
- Metal (macOS) is included automatically.
- If CUDA is installed but no NVIDIA GPU is found, the SDK falls back to Vulkan automatically.
## Prerequisites
| Requirement | Details |
|---|---|
| LM-Kit.NET | Installed via NuGet (LM-Kit.NET package) |
| .NET | .NET 8.0 or later (or .NET Standard 2.0 compatible) |
| GPU (optional) | NVIDIA GPU with CUDA 12+ drivers, or any GPU with Vulkan 1.2+ support |
| macOS (optional) | Apple Silicon or AMD GPU for Metal acceleration |
## Platform Support
| Platform | Architectures | Status |
|---|---|---|
| Windows | x64 | Fully supported |
| Windows | ARM64 | Coming soon |
| Linux | x64, ARM64 | Fully supported |
| macOS | Universal (Apple Silicon + Intel) | Fully supported |
## Available Backends
LM-Kit.NET selects the optimal backend automatically at startup and keeps it for the lifetime of the process. The table below lists every supported backend.
| Backend | Description | Platform | NuGet Package |
|---|---|---|---|
| CPU (SSE4) | Default fallback. Works on any modern x64 processor. | Windows, Linux | Included in LM-Kit.NET |
| AVX / AVX2 | Leverages wider SIMD registers for faster CPU inference. | Windows, Linux | Included in LM-Kit.NET |
| CUDA 12 | NVIDIA GPU acceleration using CUDA 12.x drivers. | Windows, Linux (x64 and ARM64) | LM-Kit.NET.Backend.Cuda12.Windows / LM-Kit.NET.Backend.Cuda12.Linux / LM-Kit.NET.Backend.Cuda12.linux-arm64 |
| CUDA 13 | NVIDIA GPU acceleration using CUDA 13.x drivers. | Windows only | LM-Kit.NET.Backend.Cuda13.Windows |
| Vulkan | Cross-platform GPU acceleration (NVIDIA, AMD, Intel). | Windows, Linux | Included in LM-Kit.NET |
| Metal | Apple GPU acceleration via the Metal API. | macOS | Included in LM-Kit.NET |
> Note: On macOS, Metal is enabled automatically when a compatible GPU is present. The `EnableCuda` and `EnableVulkan` properties have no effect on macOS.

> Note: CUDA 13 support for Linux is coming soon.
## Automatic Backend Selection
LM-Kit.NET automatically selects the best available backend at startup. You do not need to pick one manually. The SDK evaluates backends using three conditions:

- **Runtime files are available.** The matching NuGet backend package is installed.
- **The backend is not disabled in code.** `EnableCuda` and `EnableVulkan` have not been set to `false`.
- **A compatible device exists on the system.** At least one GPU can be loaded by the backend.
When all three conditions are met, the backend is selected. When multiple GPU backends qualify, the SDK follows this priority order:
CUDA 13 → CUDA 12 → Vulkan → CPU (AVX2 / AVX / SSE4)
Key behavior: If a CUDA backend package is installed but no NVIDIA GPU is found on the system, LM-Kit.NET automatically falls back to Vulkan (provided Vulkan is not explicitly disabled in code). This means you can safely install a CUDA backend package even on machines that may not have an NVIDIA GPU. The SDK will select the next best option.
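The selection outcome can be inspected at runtime. The following sketch uses only the `Runtime` members shown in this guide to log whether a GPU backend won the priority race or the SDK fell through to the CPU:

```csharp
using System;
using LMKit.Global;

// Inspect the result of automatic backend selection. If a CUDA package is
// installed but no NVIDIA GPU was found, Runtime.Backend will report the
// fallback (e.g. "Vulkan") rather than "Cuda12".
Runtime.Initialize();

if (Runtime.HasGpuSupport)
    Console.WriteLine($"GPU backend selected: {Runtime.Backend}");
else
    Console.WriteLine($"No GPU available, CPU backend selected: {Runtime.Backend}");
```

Because the backend is fixed for the lifetime of the process, logging this once at startup is usually enough for diagnostics.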
## Step 1: Install the Backend NuGet Package
The base LM-Kit.NET package includes CPU, AVX, and Vulkan support. For CUDA GPU acceleration you must add the matching CUDA backend package.
### CUDA 12 (NVIDIA)

```powershell
# Windows
Install-Package LM-Kit.NET.Backend.Cuda12.Windows

# Linux x64
Install-Package LM-Kit.NET.Backend.Cuda12.Linux

# Linux ARM64
Install-Package LM-Kit.NET.Backend.Cuda12.linux-arm64
```

### CUDA 13 (NVIDIA, Windows only)

```powershell
# Windows
Install-Package LM-Kit.NET.Backend.Cuda13.Windows
```
> Tip: You can install multiple CUDA backend packages in the same project. LM-Kit.NET will automatically select the highest-priority backend that matches your hardware. If no NVIDIA GPU is detected, the SDK falls back to Vulkan, then CPU.
## Step 2: Enable the Backend in Code
Backend flags must be set before the runtime is initialized. Once `Initialize()` has been called (or any operation triggers automatic initialization), these properties become immutable.
```csharp
using LMKit.Global;

// Enable CUDA (default is true, shown here for clarity)
Runtime.EnableCuda = true;

// Enable Vulkan (default is true)
Runtime.EnableVulkan = true;

// Initialize the runtime explicitly
Runtime.Initialize();

// Check which backend was selected
Console.WriteLine($"Active backend: {Runtime.Backend}");
```
To force CPU-only inference, disable both GPU backends before initialization:
```csharp
Runtime.EnableCuda = false;
Runtime.EnableVulkan = false;
Runtime.Initialize();

// Runtime.Backend will be CPU, Avx, or Avx2
Console.WriteLine($"Backend: {Runtime.Backend}");
```
## Step 3: Verify GPU Availability
After initialization you can query whether GPU offload is available and enumerate the detected devices.
```csharp
using LMKit.Global;
using LMKit.Hardware.Gpu;

Runtime.Initialize();

Console.WriteLine($"Backend: {Runtime.Backend}");
Console.WriteLine($"GPU support: {Runtime.HasGpuSupport}");
Console.WriteLine($"GPU devices: {GpuDeviceInfo.Devices.Count}");

foreach (var device in GpuDeviceInfo.Devices)
{
    double totalGB = device.TotalMemorySize / (1024.0 * 1024.0 * 1024.0);
    double freeGB = device.FreeMemorySize / (1024.0 * 1024.0 * 1024.0);

    Console.WriteLine($"  [{device.DeviceNumber}] {device.DeviceName}");
    Console.WriteLine($"      Type: {device.DeviceType}");
    Console.WriteLine($"      Total VRAM: {totalGB:F1} GB");
    Console.WriteLine($"      Free VRAM: {freeGB:F1} GB");
}
```
## Step 4: Control GPU Layer Offloading

By default, LM-Kit.NET offloads all model layers to the GPU when GPU support is available. You can limit the number of offloaded layers through `LM.DeviceConfiguration`. This is useful when a model is too large to fit entirely in VRAM.
```csharp
using LMKit.Model;

// Offload all layers (default behavior)
using LM model = LM.LoadFromModelID("gemma3:4b");
Console.WriteLine($"GPU layers: {model.GpuLayerCount}");

// Offload only 20 layers to save VRAM
using LM partialGpu = LM.LoadFromModelID("gemma3:12b",
    deviceConfiguration: new LM.DeviceConfiguration
    {
        GpuLayerCount = 20
    });
Console.WriteLine($"GPU layers (partial): {partialGpu.GpuLayerCount}");
```
### Choosing the Right Layer Count

- `int.MaxValue` (default): Offload everything. LM-Kit.NET will automatically reduce the count if VRAM is insufficient.
- A specific number (e.g., 20): Useful when you want to share VRAM with other processes or run multiple models concurrently.
- `0`: Force all computation to the CPU, even when a GPU backend is active.
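The guidance above can be sketched as a small heuristic. Note that `EstimateGpuLayerCount` and its proportional-split assumption are hypothetical illustrations, not LM-Kit.NET APIs; the SDK already trims the count automatically when VRAM runs short, so this is only useful when you want to reserve headroom yourself:

```csharp
// Hypothetical helper (not an LM-Kit.NET API): pick a GpuLayerCount from the
// free VRAM reported by GpuDeviceInfo and a rough model size in bytes.
static int EstimateGpuLayerCount(long freeVramBytes, int totalLayers, long modelBytes)
{
    if (freeVramBytes <= 0)
        return 0;                 // no usable VRAM: force CPU
    if (modelBytes <= freeVramBytes)
        return int.MaxValue;      // whole model fits: request full offload
    // Assume layers are roughly equal in size and offload the fraction that fits.
    return (int)(totalLayers * ((double)freeVramBytes / modelBytes));
}
```

The result can then be passed as `GpuLayerCount` in an `LM.DeviceConfiguration`, as shown in the Step 4 code above.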
## Step 5: Multi-GPU Configuration

When multiple GPUs are detected, LM-Kit.NET selects the device with the most available memory as the main GPU. You can override this by specifying a `GpuDeviceInfo` instance.
```csharp
using LMKit.Hardware.Gpu;
using LMKit.Model;

// Pick a specific GPU by device number
var device = GpuDeviceInfo.GetDeviceFromNumber(1);

using LM model = LM.LoadFromModelID("gemma3:12b",
    deviceConfiguration: new LM.DeviceConfiguration(device));
Console.WriteLine($"Running on: {device.DeviceName}");
```
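If you prefer to apply the default "most free VRAM wins" policy explicitly, for example to log which device was chosen, the device list from Step 3 can be sorted by hand. A sketch, assuming at least one GPU was detected:

```csharp
using System;
using System.Linq;
using LMKit.Hardware.Gpu;
using LMKit.Model;

// Reproduce the default selection policy explicitly: pick the detected
// device with the most free memory, using the GpuDeviceInfo members
// shown in Step 3. Throws if no GPU device is present.
var best = GpuDeviceInfo.Devices
    .OrderByDescending(d => d.FreeMemorySize)
    .First();
Console.WriteLine($"Choosing [{best.DeviceNumber}] {best.DeviceName}");

using LM model = LM.LoadFromModelID("gemma3:12b",
    deviceConfiguration: new LM.DeviceConfiguration(best));
```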
## When to Use CPU vs. GPU
| Scenario | Recommendation |
|---|---|
| Small models (under 3B parameters) | CPU or AVX2 is often fast enough. |
| Medium models (4B to 8B parameters) | GPU recommended. A card with 6+ GB VRAM provides smooth inference. |
| Large models (12B+ parameters) | GPU strongly recommended. 10+ GB VRAM for full offload. |
| No dedicated GPU available | CPU with AVX2 gives the best software-only performance. |
| Shared GPU environment | Use partial layer offloading to limit VRAM consumption. |
| Batch processing or server workloads | CUDA typically provides the highest throughput for NVIDIA hardware. |
| Cross-vendor GPU support | Vulkan works across NVIDIA, AMD, and Intel GPUs. No extra package needed. |
| Mixed deployment (some machines with NVIDIA GPUs, some without) | Install a CUDA package. The SDK falls back to Vulkan automatically on machines without NVIDIA GPUs. |
## Troubleshooting

| Problem | Solution |
|---|---|
| `Runtime.Backend` shows CPU despite installing a GPU package | Verify that the correct NuGet backend package is installed and that your GPU drivers meet the minimum version. Check that `EnableCuda` and `EnableVulkan` are not set to `false`. |
| `HasGpuSupport` is `false` | The GPU driver may be outdated, or `EnableCuda` / `EnableVulkan` was set to `false` before initialization. |
| Out-of-memory when loading a large model | Lower `GpuLayerCount` to offload fewer layers, or switch to a smaller quantization variant. |
| `InvalidOperationException` when setting `EnableCuda` | The runtime has already been initialized. Backend flags must be set before any LM-Kit.NET operation. |
| Slow inference despite GPU backend | Ensure the model actually has GPU layers. Check `model.GpuLayerCount` after loading. |
| CUDA installed but Vulkan is selected | No NVIDIA GPU was detected. The SDK automatically fell back to Vulkan. Install NVIDIA drivers or verify that the GPU is recognized by the system. |
## Quick Reference
```csharp
using LMKit.Global;
using LMKit.Hardware.Gpu;
using LMKit.Model;

// 1. Configure backend (before initialization)
Runtime.EnableCuda = true;
Runtime.Initialize();

// 2. Verify
Console.WriteLine($"Backend: {Runtime.Backend}");
Console.WriteLine($"GPU support: {Runtime.HasGpuSupport}");

// 3. Load model with GPU acceleration
using LM model = LM.LoadFromModelID("gemma3:4b",
    loadingProgress: p => { Console.Write($"\rLoading: {p * 100:F0}%"); return true; });
Console.WriteLine($"\nGPU layers: {model.GpuLayerCount}");
```
## Next Steps
- Distributed Inference Across Multiple GPUs: split large models across multiple GPUs for better throughput.
- Understanding Model Loading and Caching: learn about download behavior, caching, and model properties.
- Your First AI Agent: build a working agent with tools.
- Choosing the Right Model: select the best model for your hardware and use case.
- Configure GPU Backends and Optimize Performance (How-To): advanced tuning and performance optimization.