# Can I Run LM-Kit.NET on Embedded Devices and Edge Hardware?

## TL;DR
Yes. LM-Kit.NET supports Linux ARM64 natively, which covers NVIDIA Jetson (Orin, Xavier), AWS Graviton, and other ARM64 hardware. For resource-constrained devices, use small models (0.6B to 1B parameters) with CPU-only inference. The CPU backend on ARM64 is approximately 39 MB, making it practical for embedded deployments.
## Supported Edge Platforms
| Device / Platform | Architecture | GPU Backend | Recommended Models |
|---|---|---|---|
| NVIDIA Jetson Orin / Xavier | Linux ARM64 | CUDA 12 | Up to 8B depending on GPU memory |
| AWS Graviton instances | Linux ARM64 | CPU | Any model that fits in RAM |
| Raspberry Pi 5 (8 GB) | Linux ARM64 | CPU | 0.6B to 1B models |
| Industrial Linux ARM64 boards | Linux ARM64 | CPU or Vulkan | 0.6B to 4B depending on RAM |
| Intel NUC / mini PCs | Windows or Linux x64 | Vulkan or CPU | Up to 8B depending on RAM/VRAM |
## Choosing Models for Constrained Hardware
On edge devices with limited memory, pick the smallest model that meets your quality requirements:
| Model | File Size | RAM Needed | Good For |
|---|---|---|---|
| gemma3:270m | 253 MB | ~512 MB | Lightweight classification, simple extraction |
| qwen3.5:0.8b | 484 MB | ~1 GB | Basic chat, keyword extraction, short summaries |
| gemma3:1b | 806 MB | ~1.5 GB | General chat, sentiment analysis, simple agents |
| qwen3.5:2b | 1.3 GB | ~2 GB | Better quality chat, structured extraction |
| qwen3.5:4b | 2.5 GB | ~3.5 GB | Tool calling, agents, multi-turn conversation |
For embedding-only tasks (semantic search, RAG indexing), dedicated embedding models are even more compact:
| Model | File Size | RAM Needed |
|---|---|---|
| bge-small | 68 MB | ~128 MB |
| nomic-embed-text | 90 MB | ~256 MB |
| embeddinggemma-300m | 303 MB | ~512 MB |
## Deployment Strategy

### 1. Pre-bundle the model file

On edge devices, you typically cannot rely on an internet connection to download models at runtime. Bundle the model file with your application package or flash it onto the device image.

```csharp
// Load from a known local path on the device
using LM model = new LM(new Uri("file:///opt/models/qwen3.5-0.8b-Q4_K_M.lmk"));
```
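For containerized edge deployments, one way to bundle the model is to copy it into the image at build time so the device never needs network access. A minimal sketch, assuming a Docker-based workflow; the base image tag, directory layout, and application name (`MyEdgeApp`) are illustrative, not part of LM-Kit.NET:

```dockerfile
# Illustrative Dockerfile for a Linux ARM64 edge device (paths and image tag are assumptions)
FROM mcr.microsoft.com/dotnet/runtime:8.0

# Bake the model into the image so no download is needed at runtime
COPY models/qwen3.5-0.8b-Q4_K_M.lmk /opt/models/qwen3.5-0.8b-Q4_K_M.lmk

# Copy the published application output
COPY publish/ /app/
WORKDIR /app
ENTRYPOINT ["dotnet", "MyEdgeApp.dll"]
```

The same idea applies to non-container deployments: place the model at a fixed path (e.g. `/opt/models/`) in the device image, and load it with the `file://` URI shown above.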
### 2. Use CPU-only inference

The CPU backend on ARM64 is approximately 39 MB total and requires no GPU drivers or special configuration:

```bash
# Install only the base package (no CUDA needed)
dotnet add package LM-Kit.NET
```
### 3. Reduce context size to save memory

On memory-constrained devices, use a smaller context window to reduce the KV cache footprint:

```csharp
using LMKit.Model;

var fit = MemoryEstimation.FitParameters(
    modelPath: "/opt/models/qwen3.5-0.8b-Q4_K_M.lmk",
    contextSize: 2048 // Small context to conserve memory
);

if (fit.Success)
{
    Console.WriteLine($"Fits with {fit.ContextSize} token context");
}
```
## AOT Compilation for Edge

LM-Kit.NET supports Ahead-of-Time (AOT) compilation on .NET 10.0, which produces a self-contained native binary with faster startup and no JIT overhead. This is particularly useful for embedded devices where startup time matters:

```bash
dotnet publish -r linux-arm64 -c Release_AOT
```

AOT builds use `PublishAot=true` and `InvariantGlobalization=true`, producing a single self-contained executable.
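In the project file, the two AOT properties mentioned above look roughly like this. This is a sketch: tying them to a `Release_AOT` configuration via a condition is an assumption about how the build is wired up, not something the LM-Kit.NET docs prescribe:

```xml
<PropertyGroup Condition="'$(Configuration)' == 'Release_AOT'">
  <!-- Compile to a self-contained native executable at publish time -->
  <PublishAot>true</PublishAot>
  <!-- Skip ICU globalization data to shrink the binary;
       culture-specific formatting/collation is disabled -->
  <InvariantGlobalization>true</InvariantGlobalization>
</PropertyGroup>
```

With this in place, the `dotnet publish -r linux-arm64 -c Release_AOT` command above produces the single native binary for the target device.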
## Jetson-Specific Setup

NVIDIA Jetson devices (Orin, Xavier) have integrated NVIDIA GPUs and support CUDA 12. This gives you GPU-accelerated inference on an edge device:

```bash
# Add CUDA 12 for ARM64 Linux
dotnet add package LM-Kit.NET.Backend.Cuda12.linux-arm64
```
With a Jetson Orin (8 to 64 GB shared memory), you can run models up to 8B parameters with GPU acceleration, which is enough for production-grade agents, RAG, and document processing.
## 📚 Related Content
- Which operating systems and CPU architectures does LM-Kit.NET support?: Full platform compatibility matrix and NuGet package mapping.
- How much disk space do LM-Kit.NET binaries add to my application?: ARM64 binary sizes for deployment planning.
- Can LM-Kit.NET run completely offline?: Air-gapped deployment instructions for devices with no internet.
- How do I choose the right model size for my hardware?: Model selection guidance based on available memory.
- Glossary: Edge AI: Concepts and patterns for running AI on edge devices.