Property EnableSpeculativeDecodingDrafts
EnableSpeculativeDecodingDrafts
Gets or sets a value indicating whether the speculative decoding draft assets packaged with the model are loaded for this LM instance.
public bool EnableSpeculativeDecodingDrafts { get; set; }
Property Value
- bool
Defaults to the process-wide EnableSpeculativeDecodingDrafts (itself true unless changed). Set this property explicitly to override the global default for a single model load.
Remarks
This is the per-model override of the speculative-decoding default. A new LM.LoadingOptions initializes it from the process-wide default EnableSpeculativeDecodingDrafts; assign this property before loading a model to deviate from that global default for that single model.
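The default-then-override behavior described above can be modeled in a few lines. This is a minimal sketch, not the library's implementation: GlobalDefaults and LoadingOptionsSketch are hypothetical stand-ins for the process-wide setting and for LM.LoadingOptions, and the assumption that the value is copied once at construction is inferred from the wording "initializes it from the process-wide default".

```csharp
using System;

// Hypothetical stand-in for the process-wide setting; not the library's actual type.
static class GlobalDefaults
{
    // Mirrors the documented global default of true.
    public static bool EnableSpeculativeDecodingDrafts { get; set; } = true;
}

// Hypothetical stand-in for LM.LoadingOptions.
sealed class LoadingOptionsSketch
{
    // Assumption: the value is read from the process-wide default once, when the options object is created.
    public bool EnableSpeculativeDecodingDrafts { get; set; } =
        GlobalDefaults.EnableSpeculativeDecodingDrafts;
}

static class Program
{
    static void Main()
    {
        // Inherits the process-wide default.
        var inherited = new LoadingOptionsSketch();

        // Overrides the default for a single model load; the global value is left untouched.
        var overridden = new LoadingOptionsSketch { EnableSpeculativeDecodingDrafts = false };

        Console.WriteLine(inherited.EnableSpeculativeDecodingDrafts);
        Console.WriteLine(overridden.EnableSpeculativeDecodingDrafts);
        Console.WriteLine(GlobalDefaults.EnableSpeculativeDecodingDrafts);
    }
}
```

In real code the override would be set on the LM.LoadingOptions instance passed when loading the model, since the property has no effect on an LM that is already loaded.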
"Draft assets" covers every in-envelope source of candidate
tokens for draft-and-verify decoding: Multi-Token Prediction
(MTP) head tensors declared by the GGUF, and a draft model
shipped inside the LMK archive. When true, these assets
are loaded into VRAM when present, and the runtime uses them for
speculative decoding on supported architectures.
When false, the draft assets are skipped at load time
(saving a few hundred MiB to ~1 GiB of VRAM depending on the
model) and speculative decoding from packaged drafts is
unavailable for this instance.
HasSpeculativeDecodingDrafts then returns
false even if the file declares the assets. The decision
is permanent for the lifetime of the LM object;
load the model again with a different value to change it.
This does not affect a draft model assigned explicitly through DraftModel, which is wired independently of the model envelope.