Class LM.TensorOverride

Namespace: LMKit.Model

Assembly: LM-Kit.NET.dll

Specifies a tensor buffer type override that controls which device (CPU or GPU) holds the tensors whose names match a regex pattern. This gives fine-grained control over tensor placement and is particularly useful for offloading MoE (Mixture of Experts) expert weights to CPU while keeping attention layers on GPU.

public sealed class LM.TensorOverride

Inheritance: object

LM.TensorOverride

Inherited Members: object.Equals(object)

object.Equals(object, object)

object.GetHashCode()

object.GetType()

object.ReferenceEquals(object, object)

object.ToString()

Examples

Example 1: Offload all MoE expert weights to CPU

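var config = new LM.DeviceConfiguration
{
    // Request as many layers on the GPU as possible.
    GpuLayerCount = int.MaxValue,
    TensorOverrides = new List<LM.TensorOverride>
    {
        // Keep the routed expert FFN weights on the CPU.
        LM.TensorOverride.Cpu(@"\.ffn_.*_exps\.weight")
    }
};
// modelUri is assumed to be defined elsewhere.
LM model = new LM(modelUri, deviceConfiguration: config);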

Example 2: Multi-GPU with expert offloading

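var config = new LM.DeviceConfiguration
{
    GpuLayerCount = int.MaxValue,
    TensorOverrides = new List<LM.TensorOverride>
    {
        // Pin the attention tensors of layers 0, 1, and 2 to GPU 0.
        LM.TensorOverride.Gpu(@"blk\.(0|1|2)\.attn", gpuIndex: 0),
        // Keep the routed expert FFN weights on the CPU.
        LM.TensorOverride.Cpu(@"\.ffn_.*_exps\.weight"),
    }
};
// modelUri is assumed to be defined elsewhere.
LM model = new LM(modelUri, deviceConfiguration: config);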

Remarks

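The Pattern property accepts a C++ std::regex pattern that is matched against tensor names using substring search, so a pattern matches if it occurs anywhere in the name and does not need to cover the whole name. When several overrides could match the same tensor, the first matching override wins, so more specific patterns should come before broader ones.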

Common patterns for MoE expert offloading:

\.ffn_.*_exps\.weight matches all routed expert FFN weights
blk\.(0|1|2)\.ffn_.*_exps matches routed expert tensors in layers 0, 1, and 2 only
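The matching rules described above (unanchored substring search, first match wins) can be sketched outside the library. The snippet below is an illustration only, not LM-Kit's implementation: it uses .NET's Regex with RegexOptions.ECMAScript as an approximation of C++ std::regex, the Resolve helper is hypothetical, and the tensor names are illustrative examples in the blk.N.* style that the patterns above target.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class TensorOverrideSketch
{
    // Hypothetical helper: returns the device index of the first rule whose
    // pattern is found anywhere in the tensor name, or null if no rule matches.
    public static int? Resolve(
        string tensorName,
        IReadOnlyList<(string Pattern, int DeviceIndex)> rules)
    {
        foreach (var rule in rules)
        {
            // Regex.IsMatch is unanchored: it succeeds if the pattern occurs
            // anywhere in the input, which mirrors substring matching.
            if (Regex.IsMatch(tensorName, rule.Pattern, RegexOptions.ECMAScript))
            {
                return rule.DeviceIndex;
            }
        }
        return null;
    }

    public static void Main()
    {
        // Order matters: the layer-specific rule comes before the broad one.
        var rules = new List<(string Pattern, int DeviceIndex)>
        {
            (@"blk\.(0|1|2)\.ffn_.*_exps", 0),   // experts of layers 0-2 on GPU 0
            (@"\.ffn_.*_exps\.weight", -1),      // all other experts on CPU (-1)
        };

        // Illustrative tensor names (assumed, not read from a real model).
        string[] names =
        {
            "blk.0.ffn_gate_exps.weight",
            "blk.7.ffn_up_exps.weight",
            "blk.10.ffn_down_exps.weight",
            "blk.0.attn_q.weight",
        };

        foreach (string name in names)
        {
            int? device = Resolve(name, rules);
            string target = device.HasValue ? device.Value.ToString() : "no override";
            Console.WriteLine($"{name} -> {target}");
        }
    }
}
```

With the rules in this order, the layer 0 expert resolves to device 0, the layer 7 and layer 10 experts fall through to the CPU rule, and the attention tensor matches nothing. Swapping the two rules would send the layer 0 expert to the CPU as well, because the broad pattern would match first.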

Constructors

TensorOverride(string, int): Creates a tensor override that places matching tensors on a specific device.

Properties

DeviceIndex: The GPU device index to place matching tensors on, or -1 for CPU.

Pattern: The regex pattern to match against tensor names. Uses C++ std::regex syntax with substring matching (not anchored).
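As a brief sketch of how the constructor and the two properties fit together, the snippet below builds overrides directly instead of going through the factory methods. It rests on assumptions not confirmed by this page: that the (string, int) constructor takes the pattern first and the device index second, that passing -1 selects the CPU as the DeviceIndex description suggests, and that both properties are readable.

```csharp
using System;
using LMKit.Model;

// Assumption: argument order is (pattern, deviceIndex).
// Assumption: -1 selects the CPU, as described for DeviceIndex.
var expertsOnCpu = new LM.TensorOverride(@"\.ffn_.*_exps\.weight", -1);

// Place the attention tensors of layers 0, 1, and 2 on the GPU with index 1.
var attentionOnGpu1 = new LM.TensorOverride(@"blk\.(0|1|2)\.attn", 1);

foreach (var tensorOverride in new[] { expertsOnCpu, attentionOnGpu1 })
{
    // DeviceIndex is documented as -1 for CPU, otherwise a GPU device index.
    string target = tensorOverride.DeviceIndex == -1
        ? "CPU"
        : $"GPU {tensorOverride.DeviceIndex}";
    Console.WriteLine($"{tensorOverride.Pattern} -> {target}");
}
```

The Cpu(string) and Gpu(string, int) factory methods state the intent more directly, so the constructor is probably most useful when the device index is computed at run time.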

Methods

Cpu(string): Creates a tensor override that places matching tensors on CPU.

Gpu(string, int): Creates a tensor override that places matching tensors on a specific GPU.