# How Much Disk Space Do LM-Kit.NET Binaries Add to My Application?

## TL;DR

The LM-Kit.NET native binaries add approximately 30 MB on Windows and 44 MB on Linux for a CPU-only deployment. Enabling GPU acceleration increases this to ~85 MB (Vulkan) or ~755 MB (CUDA 12 on Windows, including the bundled NVIDIA SDK). Model files are separate and range from under 100 MB to over 17 GB depending on the model you choose.
## What Ships with Your Application
When you add the LM-Kit.NET NuGet package to your project, the deployment includes three components:
- The managed .NET assembly (`LM-Kit.NET.dll`): the C# SDK itself. This is small (a few MB) and runs on .NET Standard 2.0, .NET 8.0, 9.0, and 10.0.
- Native runtime binaries: platform-specific libraries for LLM inference (llama.cpp), PDF processing (PDFium), and ONNX inference. These make up the bulk of the deployment size.
- Model files: downloaded separately on first use (or pre-bundled). These are not included in the NuGet package.
The tables below cover the native runtime binaries, which represent the main disk space cost.
## Deployment Size by Platform and Backend

### CPU Backends (No GPU Required)
| Platform | Backend | Native Binaries |
|---|---|---|
| Windows x64 | CPU (SSE4) | ~30 MB |
| Windows x64 | AVX | ~30 MB |
| Windows x64 | AVX2 | ~30 MB |
| Linux x64 | CPU (SSE4) | ~44 MB |
| Linux x64 | AVX / AVX2 | ~44 MB |
| Linux ARM64 | CPU | ~39 MB |
The base LM-Kit.NET package includes CPU, AVX, AVX2, and Vulkan backends. The SDK auto-selects the fastest instruction set your processor supports.
### GPU Backends
| Platform | Backend | Native Binaries | Notes |
|---|---|---|---|
| Windows x64 | Vulkan | ~85 MB | Included in the base package. Supports NVIDIA, AMD, and Intel GPUs. |
| Windows x64 | CUDA 12 | ~755 MB | Requires a separate LM-Kit.NET.Backend.Cuda12.Windows package. Includes NVIDIA cuBLAS SDK (~553 MB). |
| Windows x64 | CUDA 13 | ~679 MB | Requires a separate LM-Kit.NET.Backend.Cuda13.Windows package. Includes NVIDIA cuBLAS SDK (~508 MB). |
| Linux x64 | Vulkan | ~98 MB | Included in the base package. |
| Linux x64 | CUDA 12 | ~222 MB | CUDA SDK libraries are installed system-wide on Linux, not bundled. |
| Linux ARM64 | CUDA 12 | ~240 MB | For Jetson and NVIDIA ARM platforms. |
| macOS | Metal | ~141 MB | Universal binary (Apple Silicon + Intel). Metal GPU acceleration is included automatically. |
### What Makes CUDA Packages Larger?
The CUDA backend includes NVIDIA's cuBLAS libraries for GPU-accelerated matrix math. The largest single file is the cublasLt64 library, at approximately 458 MB. This is why CUDA deployments are significantly larger than Vulkan.
If deployment size is a concern and you still want GPU acceleration, Vulkan is the best option: it adds only ~55 MB over the CPU baseline and supports GPUs from NVIDIA, AMD, and Intel without vendor-specific SDK dependencies.
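To put that trade-off in plain numbers, here is a quick back-of-the-envelope comparison using the approximate Windows x64 figures from the tables above (the sizes are the ones quoted in this article, nothing more):

```python
# Approximate Windows x64 native-binary sizes in MB, taken from the tables above.
backend_sizes_mb = {
    "CPU (AVX2)": 30,
    "Vulkan": 85,
    "CUDA 12": 755,
}

cpu_baseline = backend_sizes_mb["CPU (AVX2)"]

# Extra disk cost of each backend over the CPU-only baseline.
for backend, size in backend_sizes_mb.items():
    overhead = size - cpu_baseline
    print(f"{backend}: {size} MB total, +{overhead} MB over CPU baseline")
```

Vulkan's ~55 MB overhead versus CUDA's ~725 MB is the core of the size argument above.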
## Model Files Are Separate
Model files are not included in the NuGet package. They are downloaded on first use (or you can pre-bundle them for offline deployment). Here are representative sizes for popular models in Q4_K_M quantization:
| Model | File Size | Parameters | Use Case |
|---|---|---|---|
| gemma3:1b | 806 MB | 1.0B | Lightweight chat, edge devices |
| qwen3.5:4b | 2.5 GB | 4.0B | General chat, tool calling |
| qwen3.5:9b | 5.0 GB | 9.0B | Strong chat, reasoning, agents |
| gemma3:12b | 7.9 GB | 11.8B | High-quality generation |
| gptoss:20b | 12.1 GB | 20.9B | Advanced reasoning, long context |
| gemma3:27b | 17.1 GB | 27.0B | Maximum quality |
For the complete list with download sizes and capabilities, see the Model Catalog.
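Since binaries and models are downloaded separately, the total first-run footprint is the sum of the two. A rough estimator, using only the approximate sizes quoted in this article (the backend and model names here are labels for this sketch, not SDK identifiers):

```python
# Approximate sizes in MB, from the tables in this article.
NATIVE_BINARIES_MB = {"windows-cpu": 30, "windows-vulkan": 85, "linux-cpu": 44}
MODEL_FILES_MB = {"gemma3:1b": 806, "qwen3.5:4b": 2500, "gemma3:27b": 17100}

def total_footprint_mb(backend: str, model: str) -> int:
    """Native runtime binaries plus the downloaded model file, in MB."""
    return NATIVE_BINARIES_MB[backend] + MODEL_FILES_MB[model]

# A Vulkan-enabled Windows app with the smallest chat model:
print(total_footprint_mb("windows-vulkan", "gemma3:1b"), "MB")  # 891 MB
```

As the example shows, the model file, not the SDK, usually dominates total disk usage.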
## Reducing Deployment Size
A few strategies to minimize your application's footprint:
- Ship only the backends you need. If your target hardware does not have an NVIDIA GPU, skip the CUDA backend package entirely. The base package with CPU and Vulkan covers most scenarios.
- Target a single platform. Use .NET runtime identifiers (`win-x64`, `linux-x64`, `linux-arm64`, `osx`) to publish only the binaries for your target OS.
- Pre-download models. Bundle the model file with your installer instead of downloading at runtime. This lets you control the total download size and avoids surprises for end users.
- Choose smaller models. A 1B parameter model at 806 MB delivers surprisingly good results for many tasks. See Choosing the Right Model for guidance.
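The single-platform strategy above can be pinned in the project file rather than passed on the command line each time. A minimal sketch for a standard SDK-style project (the RID value is an example; pick the identifier that matches your target OS):

```xml
<!-- Publish only the linux-x64 native binaries (example RID; adjust for your target). -->
<PropertyGroup>
  <RuntimeIdentifier>linux-x64</RuntimeIdentifier>
  <SelfContained>true</SelfContained>
</PropertyGroup>
```

With this in place, a plain `dotnet publish -c Release` produces a single-platform output folder instead of carrying native binaries for every OS.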
## 📚 Related Content
- Do I need a GPU to run AI models with LM-Kit.NET?: Understand when CPU is enough and when GPU acceleration is worth the extra deployment size.
- How do I choose the right model size for my hardware?: Match model file sizes to your available memory.
- Configure GPU Backends: Step-by-step install and verification for every backend.
- Model Catalog: Browse all models with exact file sizes and capabilities.
- Can LM-Kit.NET run completely offline?: Learn how to pre-bundle models for air-gapped deployments.