Property EnableMultiTokenPrediction
EnableMultiTokenPrediction
Gets or sets a value indicating whether Multi-Token Prediction (MTP) is enabled for this LM instance.
public bool EnableMultiTokenPrediction { get; set; }
Property Value
- bool
The default value is
true.
Remarks
When true (the default), MTP head tensors are
loaded into VRAM if the GGUF declares them, and MTP
self-speculative decoding is used at inference time on
supported architectures.
When false, the head tensors are skipped at load
time (saving a few hundred MiB to ~1 GiB of VRAM
depending on the model) and MTP is unavailable for this
instance. HasMultiTokenPrediction
returns false even if the file itself declares
the heads. The decision is permanent for the lifetime
of the LM object; toggle MTP by loading
the model with a different value.