Class MemoryEstimation
Provides methods for estimating memory requirements and fitting model and context parameters to available device memory using llama.cpp's native memory estimation.
public static class MemoryEstimation
- Inheritance: object → MemoryEstimation
Remarks
Unlike the heuristic-based approach in DeviceConfiguration, this class uses
llama.cpp's built-in llama_params_fit function to accurately probe available memory
across all devices (CPU + GPUs), accounting for KV cache, compute buffers, and tensor placement.
The estimation runs without loading the full model weights, making it suitable for pre-flight checks before allocating resources.
Methods
- FitParameters(LM, uint, uint)
Fits model and context parameters to available device memory using an already-loaded model instance.
- FitParameters(string, uint, uint, DeviceConfiguration)
Fits model and context parameters to available device memory from a model file path, determining the largest context size and GPU layer count that can be allocated without running out of memory.
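A minimal usage sketch of the path-based overload. The return type and its member names (`FittedContextSize`, `FittedGpuLayers`), the meaning of the two `uint` arguments, and the `DeviceConfiguration` construction are assumptions for illustration only; they are not documented on this page:

```csharp
// Hypothetical sketch: argument meanings and result shape are assumed,
// not confirmed by the API summary above.
var config = new DeviceConfiguration();        // assumed default-constructible

var fit = MemoryEstimation.FitParameters(
    "models/model.gguf",                       // path to the GGUF model file
    32768,                                     // requested context size, tokens (assumed)
    999,                                       // requested GPU layer count (assumed)
    config);

// Probing runs without loading full model weights, so this is cheap
// enough to use as a pre-flight check before allocating resources.
Console.WriteLine($"context: {fit.FittedContextSize}, GPU layers: {fit.FittedGpuLayers}");
```

Because the estimation accounts for KV cache, compute buffers, and tensor placement across CPU and GPUs, the fitted values can be passed directly to the subsequent model/context creation rather than guessed heuristically.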