Can I Run LM-Kit.NET on Embedded Devices and Edge Hardware?


TL;DR

Yes. LM-Kit.NET supports Linux ARM64 natively, which covers NVIDIA Jetson (Orin, Xavier), AWS Graviton, and other ARM64 hardware. For resource-constrained devices, use small models (0.6B to 1B parameters) with CPU-only inference. The CPU backend on ARM64 is approximately 39 MB, making it practical for embedded deployments.


Supported Edge Platforms

| Device / Platform | Architecture | GPU Backend | Recommended Models |
|---|---|---|---|
| NVIDIA Jetson Orin / Xavier | Linux ARM64 | CUDA 12 | Up to 8B depending on GPU memory |
| AWS Graviton instances | Linux ARM64 | CPU | Any model that fits in RAM |
| Raspberry Pi 5 (8 GB) | Linux ARM64 | CPU | 0.6B to 1B models |
| Industrial Linux ARM64 boards | Linux ARM64 | CPU or Vulkan | 0.6B to 4B depending on RAM |
| Intel NUC / mini PCs | Windows or Linux x64 | Vulkan or CPU | Up to 8B depending on RAM/VRAM |

Choosing Models for Constrained Hardware

On edge devices with limited memory, pick the smallest model that meets your quality requirements:

| Model | File Size | RAM Needed | Good For |
|---|---|---|---|
| gemma3:270m | 253 MB | ~512 MB | Lightweight classification, simple extraction |
| qwen3.5:0.8b | 484 MB | ~1 GB | Basic chat, keyword extraction, short summaries |
| gemma3:1b | 806 MB | ~1.5 GB | General chat, sentiment analysis, simple agents |
| qwen3.5:2b | 1.3 GB | ~2 GB | Better quality chat, structured extraction |
| qwen3.5:4b | 2.5 GB | ~3.5 GB | Tool calling, agents, multi-turn conversation |
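The RAM guidance in the table can also be applied programmatically at startup, so one application image can serve devices with different memory budgets. Below is an illustrative sketch only: the file paths and the RAM figures mirror the table above, and the helper itself is not part of LM-Kit.NET.

```csharp
using System;
using System.Linq;

static class ModelPicker
{
    // RAM estimates taken from the table above (in bytes); paths are hypothetical.
    static readonly (string Path, long RamNeeded)[] Candidates =
    {
        ("/opt/models/qwen3.5-4b-Q4_K_M.lmk",  3_500_000_000),
        ("/opt/models/gemma3-1b-Q4_K_M.lmk",   1_500_000_000),
        ("/opt/models/gemma3-270m-Q4_K_M.lmk",   512_000_000),
    };

    public static string PickModel(long availableBytes)
    {
        // Keep ~25% headroom for the KV cache and the rest of the application,
        // then take the largest model that still fits.
        long budget = availableBytes * 3 / 4;
        return Candidates.FirstOrDefault(c => c.RamNeeded <= budget).Path;
    }
}
```

At runtime, `GC.GetGCMemoryInfo().TotalAvailableMemoryBytes` is one standard way to obtain the `availableBytes` argument on .NET.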

For embedding-only tasks (semantic search, RAG indexing), dedicated embedding models are even more compact:

| Model | File Size | RAM Needed |
|---|---|---|
| bge-small | 68 MB | ~128 MB |
| nomic-embed-text | 90 MB | ~256 MB |
| embeddinggemma-300m | 303 MB | ~512 MB |
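A minimal embedding sketch follows, assuming an `Embedder`-style API in LM-Kit.NET's embeddings namespace; the class and method names here are an assumption and should be checked against the current API reference, and the model path is illustrative.

```csharp
using System;
using LMKit.Model;
using LMKit.Embeddings;

// Load a compact embedding model from local storage (path is illustrative).
using LM model = new LM(new Uri("file:///opt/models/bge-small.lmk"));

// Compute a vector for on-device semantic search or RAG indexing.
Embedder embedder = new Embedder(model);
float[] vector = embedder.GetEmbeddings("industrial sensor fault report");
Console.WriteLine($"Embedding dimension: {vector.Length}");
```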

Deployment Strategy

1. Pre-bundle the model file

On edge devices, you typically cannot rely on an internet connection to download models at runtime. Bundle the model file with your application package or flash it onto the device image.

// Load from a known local path on the device
using LM model = new LM(new Uri("file:///opt/models/qwen3.5-0.8b-Q4_K_M.lmk"));

2. Use CPU-only inference

The CPU backend on ARM64 is approximately 39 MB total and requires no GPU drivers or special configuration:

# Install only the base package (no CUDA needed)
dotnet add package LM-Kit.NET

3. Reduce context size to save memory

On memory-constrained devices, use a smaller context window to reduce the KV cache footprint:

using LMKit.Model;

var fit = MemoryEstimation.FitParameters(
    modelPath: "/opt/models/qwen3.5-0.8b-Q4_K_M.lmk",
    contextSize: 2048  // Small context to conserve memory
);

if (fit.Success)
{
    Console.WriteLine($"Fits with {fit.ContextSize} token context");
}
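When the check fails, a deployment can degrade gracefully by retrying with progressively smaller contexts. The sketch below reuses the same `MemoryEstimation.FitParameters` call shown above; the candidate sizes are arbitrary choices, not library defaults.

```csharp
using System;
using LMKit.Model;

// Try progressively smaller context windows until one fits in memory.
int[] candidates = { 4096, 2048, 1024, 512 };
int chosen = -1;

foreach (int size in candidates)
{
    var fit = MemoryEstimation.FitParameters(
        modelPath: "/opt/models/qwen3.5-0.8b-Q4_K_M.lmk",
        contextSize: size);

    if (fit.Success)
    {
        chosen = size;
        break;
    }
}

Console.WriteLine(chosen > 0
    ? $"Using a {chosen} token context"
    : "Model does not fit even at the smallest context; choose a smaller model");
```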

AOT Compilation for Edge

LM-Kit.NET supports Ahead-of-Time (AOT) compilation on .NET 10.0, which produces a self-contained native binary with faster startup and no JIT overhead. This is particularly useful for embedded devices where startup time matters:

dotnet publish -r linux-arm64 -c Release_AOT

AOT builds use PublishAot=true and InvariantGlobalization=true, producing a single self-contained executable.
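The two properties above map to standard MSBuild settings. A minimal project-file sketch, assuming the `Release_AOT` configuration name used in the publish command (everything else is stock .NET tooling):

```xml
<!-- Applied when publishing with: dotnet publish -r linux-arm64 -c Release_AOT -->
<PropertyGroup Condition="'$(Configuration)' == 'Release_AOT'">
  <PublishAot>true</PublishAot>
  <InvariantGlobalization>true</InvariantGlobalization>
</PropertyGroup>
```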


Jetson-Specific Setup

NVIDIA Jetson devices (Orin, Xavier) have integrated NVIDIA GPUs and support CUDA 12. This gives you GPU-accelerated inference on an edge device:

# Add CUDA 12 for ARM64 Linux
dotnet add package LM-Kit.NET.Backend.Cuda12.linux-arm64

With a Jetson Orin (8 to 64 GB shared memory), you can run models up to 8B parameters with GPU acceleration, which is enough for production-grade agents, RAG, and document processing.
