Class LM.TensorOverride
Specifies a tensor buffer type override that controls which device (CPU or GPU) holds the tensors whose names match a regex pattern. This gives fine-grained control over tensor placement and is particularly useful for offloading MoE (Mixture of Experts) expert weights to CPU while keeping attention layers on GPU.
public sealed class LM.TensorOverride
- Inheritance
object
LM.TensorOverride
Examples
Example 1: Offload all MoE expert weights to CPU
var config = new LM.DeviceConfiguration
{
    GpuLayerCount = int.MaxValue,
    TensorOverrides = new List<LM.TensorOverride>
    {
        // Keep the routed expert FFN weights on CPU.
        LM.TensorOverride.Cpu(@"\.ffn_.*_exps\.weight")
    }
};
LM model = new LM(modelUri, deviceConfiguration: config);
Example 2: Multi-GPU with expert offloading
var config = new LM.DeviceConfiguration
{
    GpuLayerCount = int.MaxValue,
    TensorOverrides = new List<LM.TensorOverride>
    {
        // Attention tensors of layers 0, 1, and 2 go to GPU 0.
        LM.TensorOverride.Gpu(@"blk\.(0|1|2)\.attn", gpuIndex: 0),
        // Routed expert FFN weights go to CPU.
        LM.TensorOverride.Cpu(@"\.ffn_.*_exps\.weight"),
    }
};
LM model = new LM(modelUri, deviceConfiguration: config);
Remarks
The Pattern property accepts a C++ std::regex pattern that is matched against tensor names using substring search. The first matching override wins when multiple overrides could match the same tensor.
Common patterns for MoE expert offloading:
- \.ffn_.*_exps\.weight matches all routed expert FFN weights
- blk\.(0|1|2)\.ffn_.*_exps matches experts in specific layers
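Because patterns follow C++ std::regex rules rather than .NET Regex rules, it can help to try a pattern in plain C++ before using it. The sketch below is not the library's implementation: the TensorOverride struct and the resolve_device and example_overrides helpers are hypothetical stand-ins, and the tensor names in the usage note are assumed GGUF-style names. It only illustrates the two rules described above: unanchored substring matching (std::regex_search) and first-match-wins resolution.

```cpp
#include <optional>
#include <regex>
#include <string>
#include <vector>

// Hypothetical stand-in for LM.TensorOverride: a pattern plus a device index,
// where -1 means CPU and 0, 1, ... are GPU indices.
struct TensorOverride {
    std::string pattern;
    int device_index;
};

// Returns the device index of the first override whose pattern is found
// anywhere in tensor_name, or std::nullopt when no override matches.
std::optional<int> resolve_device(const std::vector<TensorOverride>& overrides,
                                  const std::string& tensor_name) {
    for (const auto& o : overrides) {
        // std::regex defaults to the ECMAScript grammar. regex_search does an
        // unanchored (substring) match, unlike regex_match.
        const std::regex re(o.pattern);
        if (std::regex_search(tensor_name, re)) {
            return o.device_index;
        }
    }
    return std::nullopt;
}

// Overrides mirroring Example 2: attention tensors of layers 0-2 on GPU 0,
// routed expert FFN weights on CPU.
std::vector<TensorOverride> example_overrides() {
    return {
        {R"(blk\.(0|1|2)\.attn)", 0},
        {R"(\.ffn_.*_exps\.weight)", -1},
    };
}
```

With these assumed names, resolve_device(example_overrides(), "blk.0.attn_q.weight") yields 0 and "blk.5.ffn_gate_exps.weight" yields -1, while "blk.10.attn_q.weight" matches neither pattern. Since an earlier override shadows later ones, broader patterns generally belong after narrower ones.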
Constructors
- TensorOverride(string, int)
Creates a tensor override that places matching tensors on a specific device.
Properties
- DeviceIndex
The GPU device index to place matching tensors on, or -1 for CPU.
- Pattern
The regex pattern to match against tensor names. Uses C++ std::regex syntax with substring matching (not anchored).
Methods
- Cpu(string)
Creates a tensor override that places matching tensors on CPU.
- Gpu(string, int)
Creates a tensor override that places matching tensors on a specific GPU.