Can I Run LM-Kit.NET on Embedded Devices and Edge Hardware?


TL;DR

Yes. LM-Kit.NET supports Linux ARM64 natively, which covers NVIDIA Jetson (Orin, Xavier), AWS Graviton, and other ARM64 hardware. For resource-constrained devices, use small models (0.6B to 1B parameters) with CPU-only inference. The CPU backend on ARM64 is approximately 39 MB, making it practical for embedded deployments.


Supported Edge Platforms

| Device / Platform | Architecture | GPU Backend | Recommended Models |
|---|---|---|---|
| NVIDIA Jetson Orin / Xavier | Linux ARM64 | CUDA 12 | Up to 8B depending on GPU memory |
| AWS Graviton instances | Linux ARM64 | CPU | Any model that fits in RAM |
| Raspberry Pi 5 (8 GB) | Linux ARM64 | CPU | 0.6B to 1B models |
| Industrial Linux ARM64 boards | Linux ARM64 | CPU or Vulkan | 0.6B to 4B depending on RAM |
| Intel NUC / mini PCs | Windows or Linux x64 | Vulkan or CPU | Up to 8B depending on RAM/VRAM |

Choosing Models for Constrained Hardware

On edge devices with limited memory, pick the smallest model that meets your quality requirements:

| Model | File Size | RAM Needed | Good For |
|---|---|---|---|
| gemma3:270m | 253 MB | ~512 MB | Lightweight classification, simple extraction |
| qwen3.5:0.8b | 484 MB | ~1 GB | Basic chat, keyword extraction, short summaries |
| gemma3:1b | 806 MB | ~1.5 GB | General chat, sentiment analysis, simple agents |
| qwen3.5:2b | 1.3 GB | ~2 GB | Better quality chat, structured extraction |
| qwen3.5:4b | 2.5 GB | ~3.5 GB | Tool calling, agents, multi-turn conversation |
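The RAM guidance in the table can also be applied programmatically at startup, so one application image can serve devices with different memory budgets. Below is an illustrative sketch only: the file paths and the RAM figures mirror the table above, and the helper itself is not part of LM-Kit.NET.

```csharp
using System;
using System.Linq;

static class ModelPicker
{
    // RAM estimates taken from the table above (in bytes); paths are hypothetical.
    static readonly (string Path, long RamNeeded)[] Candidates =
    {
        ("/opt/models/qwen3.5-4b-Q4_K_M.lmk",  3_500_000_000),
        ("/opt/models/gemma3-1b-Q4_K_M.lmk",   1_500_000_000),
        ("/opt/models/gemma3-270m-Q4_K_M.lmk",   512_000_000),
    };

    public static string PickModel(long availableBytes)
    {
        // Keep ~25% headroom for the KV cache and the rest of the application,
        // then take the largest model that still fits.
        long budget = availableBytes * 3 / 4;
        return Candidates.FirstOrDefault(c => c.RamNeeded <= budget).Path;
    }
}
```

At runtime, `GC.GetGCMemoryInfo().TotalAvailableMemoryBytes` is one standard way to obtain the `availableBytes` argument on .NET.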

For embedding-only tasks (semantic search, RAG indexing), dedicated embedding models are even more compact:

| Model | File Size | RAM Needed |
|---|---|---|
| bge-small | 68 MB | ~128 MB |
| nomic-embed-text | 90 MB | ~256 MB |
| embeddinggemma-300m | 303 MB | ~512 MB |
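A minimal embedding sketch follows, assuming an `Embedder`-style API in LM-Kit.NET's embeddings namespace; the class and method names here are an assumption and should be checked against the current API reference, and the model path is illustrative.

```csharp
using System;
using LMKit.Model;
using LMKit.Embeddings;

// Load a compact embedding model from local storage (path is illustrative).
using LM model = new LM(new Uri("file:///opt/models/bge-small.lmk"));

// Compute a vector for on-device semantic search or RAG indexing.
Embedder embedder = new Embedder(model);
float[] vector = embedder.GetEmbeddings("industrial sensor fault report");
Console.WriteLine($"Embedding dimension: {vector.Length}");
```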

Deployment Strategy

1. Pre-bundle the model file

On edge devices, you typically cannot rely on an internet connection to download models at runtime. Bundle the model file with your application package or flash it onto the device image.

// Load from a known local path on the device
using LM model = new LM(new Uri("file:///opt/models/qwen3.5-0.8b-Q4_K_M.lmk"));

2. Use CPU-only inference

The CPU backend on ARM64 is approximately 39 MB total and requires no GPU drivers or special configuration:

# Install only the base package (no CUDA needed)
dotnet add package LM-Kit.NET

3. Reduce context size to save memory

On memory-constrained devices, use a smaller context window to reduce the KV cache footprint:

using LMKit.Model;

var fit = MemoryEstimation.FitParameters(
    modelPath: "/opt/models/qwen3.5-0.8b-Q4_K_M.lmk",
    contextSize: 2048  // Small context to conserve memory
);

if (fit.Success)
{
    Console.WriteLine($"Fits with {fit.ContextSize} token context");
}
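When the check fails, a deployment can degrade gracefully by retrying with progressively smaller contexts. The sketch below reuses the same `MemoryEstimation.FitParameters` call shown above; the candidate sizes are arbitrary choices, not library defaults.

```csharp
using System;
using LMKit.Model;

// Try progressively smaller context windows until one fits in memory.
int[] candidates = { 4096, 2048, 1024, 512 };
int chosen = -1;

foreach (int size in candidates)
{
    var fit = MemoryEstimation.FitParameters(
        modelPath: "/opt/models/qwen3.5-0.8b-Q4_K_M.lmk",
        contextSize: size);

    if (fit.Success)
    {
        chosen = size;
        break;
    }
}

Console.WriteLine(chosen > 0
    ? $"Using a {chosen} token context"
    : "Model does not fit even at the smallest context; choose a smaller model");
```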

AOT Compilation for Edge

LM-Kit.NET supports Ahead-of-Time (AOT) compilation on .NET 10.0, which produces a self-contained native binary with faster startup and no JIT overhead. This is particularly useful for embedded devices where startup time matters:

dotnet publish -r linux-arm64 -c Release_AOT

AOT builds use PublishAot=true and InvariantGlobalization=true, producing a single self-contained executable.
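The two properties above map to standard MSBuild settings. A minimal project-file sketch, assuming the `Release_AOT` configuration name used in the publish command (everything else is stock .NET tooling):

```xml
<!-- Applied when publishing with: dotnet publish -r linux-arm64 -c Release_AOT -->
<PropertyGroup Condition="'$(Configuration)' == 'Release_AOT'">
  <PublishAot>true</PublishAot>
  <InvariantGlobalization>true</InvariantGlobalization>
</PropertyGroup>
```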


Jetson-Specific Setup

NVIDIA Jetson devices (Orin, Xavier) have integrated NVIDIA GPUs and support CUDA 12. This gives you GPU-accelerated inference on an edge device:

# Add CUDA 12 for ARM64 Linux
dotnet add package LM-Kit.NET.Backend.Cuda12.linux-arm64

With a Jetson Orin (8 to 64 GB shared memory), you can run models up to 8B parameters with GPU acceleration, which is enough for production-grade agents, RAG, and document processing.
