# How Much Disk Space Do LM-Kit.NET Binaries Add to My Application?

## TL;DR

The LM-Kit.NET native binaries add approximately 30 MB on Windows and 44 MB on Linux for a CPU-only deployment. Enabling GPU acceleration increases this to ~85 MB (Vulkan) or ~755 MB (CUDA 12 on Windows, including the bundled NVIDIA SDK). Model files are separate and range from under 100 MB to over 17 GB depending on the model you choose.
## What Ships with Your Application
When you add the LM-Kit.NET NuGet package to your project, the deployment includes three components:
- The managed .NET assembly (`LM-Kit.NET.dll`): the C# SDK itself. This is small (a few MB) and runs on .NET Standard 2.0, .NET 8.0, 9.0, and 10.0.
- Native runtime binaries: platform-specific libraries for LLM inference (llama.cpp), PDF processing (PDFium), and ONNX inference. These make up the bulk of the deployment size.
- Model files: downloaded separately on first use (or pre-bundled). These are not included in the NuGet package.
The tables below cover the native runtime binaries, which represent the main disk space cost.
## Deployment Size by Platform and Backend

### CPU Backends (No GPU Required)
| Platform | Backend | Native Binaries |
|---|---|---|
| Windows x64 | CPU (SSE4) | ~30 MB |
| Windows x64 | AVX | ~30 MB |
| Windows x64 | AVX2 | ~30 MB |
| Linux x64 | CPU (SSE4) | ~44 MB |
| Linux x64 | AVX / AVX2 | ~44 MB |
| Linux ARM64 | CPU | ~39 MB |
The base LM-Kit.NET package includes CPU, AVX, AVX2, and Vulkan backends. The SDK auto-selects the fastest instruction set your processor supports.
### GPU Backends
| Platform | Backend | Native Binaries | Notes |
|---|---|---|---|
| Windows x64 | Vulkan | ~85 MB | Included in the base package. Supports NVIDIA, AMD, and Intel GPUs. |
| Windows x64 | CUDA 12 | ~755 MB | Requires a separate LM-Kit.NET.Backend.Cuda12.Windows package. Includes NVIDIA cuBLAS SDK (~553 MB). |
| Windows x64 | CUDA 13 | ~679 MB | Requires a separate LM-Kit.NET.Backend.Cuda13.Windows package. Includes NVIDIA cuBLAS SDK (~508 MB). |
| Linux x64 | Vulkan | ~98 MB | Included in the base package. |
| Linux x64 | CUDA 12 | ~222 MB | CUDA SDK libraries are installed system-wide on Linux, not bundled. |
| Linux ARM64 | CUDA 12 | ~240 MB | For Jetson and NVIDIA ARM platforms. |
| macOS | Metal | ~141 MB | Universal binary (Apple Silicon + Intel). Metal GPU acceleration is included automatically. |
### What Makes CUDA Packages Larger?
The CUDA backend includes NVIDIA's cuBLAS libraries for GPU-accelerated matrix math. The largest single file is the cublasLt64 library, at approximately 458 MB. This is why CUDA deployments are significantly larger than Vulkan.
If deployment size is a concern and you still want GPU acceleration, Vulkan is the best option: it adds only ~55 MB over the CPU baseline and supports GPUs from NVIDIA, AMD, and Intel without vendor-specific SDK dependencies.
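To put that trade-off in plain numbers, here is a quick back-of-the-envelope comparison using the approximate Windows x64 figures from the tables above (the sizes are the ones quoted in this article, nothing more):

```python
# Approximate Windows x64 native-binary sizes in MB, taken from the tables above.
backend_sizes_mb = {
    "CPU (AVX2)": 30,
    "Vulkan": 85,
    "CUDA 12": 755,
}

cpu_baseline = backend_sizes_mb["CPU (AVX2)"]

# Extra disk cost of each backend over the CPU-only baseline.
for backend, size in backend_sizes_mb.items():
    overhead = size - cpu_baseline
    print(f"{backend}: {size} MB total, +{overhead} MB over CPU baseline")
```

Vulkan's ~55 MB overhead versus CUDA's ~725 MB is the core of the size argument above.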
## Model Files Are Separate
Model files are not included in the NuGet package. They are downloaded on first use (or you can pre-bundle them for offline deployment). Here are representative sizes for popular models in Q4_K_M quantization:
| Model | File Size | Parameters | Use Case |
|---|---|---|---|
| gemma3:1b | 806 MB | 1.0B | Lightweight chat, edge devices |
| qwen3.5:4b | 2.5 GB | 4.0B | General chat, tool calling |
| qwen3.5:9b | 5.0 GB | 9.0B | Strong chat, reasoning, agents |
| gemma3:12b | 7.9 GB | 11.8B | High-quality generation |
| gptoss:20b | 12.1 GB | 20.9B | Advanced reasoning, long context |
| gemma3:27b | 17.1 GB | 27.0B | Maximum quality |
For the complete list with download sizes and capabilities, see the Model Catalog.
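Since binaries and models are downloaded separately, the total first-run footprint is the sum of the two. A rough estimator, using only the approximate sizes quoted in this article (the backend and model names here are labels for this sketch, not SDK identifiers):

```python
# Approximate sizes in MB, from the tables in this article.
NATIVE_BINARIES_MB = {"windows-cpu": 30, "windows-vulkan": 85, "linux-cpu": 44}
MODEL_FILES_MB = {"gemma3:1b": 806, "qwen3.5:4b": 2500, "gemma3:27b": 17100}

def total_footprint_mb(backend: str, model: str) -> int:
    """Native runtime binaries plus the downloaded model file, in MB."""
    return NATIVE_BINARIES_MB[backend] + MODEL_FILES_MB[model]

# A Vulkan-enabled Windows app with the smallest chat model:
print(total_footprint_mb("windows-vulkan", "gemma3:1b"), "MB")  # 891 MB
```

As the example shows, the model file, not the SDK, usually dominates total disk usage.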
## Reducing Deployment Size
A few strategies to minimize your application's footprint:
- Ship only the backends you need. If your target hardware does not have an NVIDIA GPU, skip the CUDA backend package entirely. The base package with CPU and Vulkan covers most scenarios.
- Target a single platform. Use .NET runtime identifiers (`win-x64`, `linux-x64`, `linux-arm64`, `osx`) to publish only the binaries for your target OS.
- Pre-download models. Bundle the model file with your installer instead of downloading at runtime. This lets you control the total download size and avoids surprises for end users.
- Choose smaller models. A 1B parameter model at 806 MB delivers surprisingly good results for many tasks. See Choosing the Right Model for guidance.
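The single-platform strategy above can be pinned in the project file rather than passed on the command line each time. A minimal sketch for a standard SDK-style project (the RID value is an example; pick the identifier that matches your target OS):

```xml
<!-- Publish only the linux-x64 native binaries (example RID; adjust for your target). -->
<PropertyGroup>
  <RuntimeIdentifier>linux-x64</RuntimeIdentifier>
  <SelfContained>true</SelfContained>
</PropertyGroup>
```

With this in place, a plain `dotnet publish -c Release` produces a single-platform output folder instead of carrying native binaries for every OS.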
## 📚 Related Content
- Do I need a GPU to run AI models with LM-Kit.NET?: Understand when CPU is enough and when GPU acceleration is worth the extra deployment size.
- How do I choose the right model size for my hardware?: Match model file sizes to your available memory.
- Configure GPU Backends: Step-by-step install and verification for every backend.
- Model Catalog: Browse all models with exact file sizes and capabilities.
- Can LM-Kit.NET run completely offline?: Learn how to pre-bundle models for air-gapped deployments.