Property EnableSpeculativeDecodingDrafts
EnableSpeculativeDecodingDrafts
Gets or sets a value indicating whether the speculative decoding draft assets packaged with the model are loaded for this LM instance.
public bool EnableSpeculativeDecodingDrafts { get; set; }
Property Value
- bool
The default value is true.
Remarks
"Draft assets" covers every in-envelope source of candidate
tokens for draft-and-verify decoding: Multi-Token Prediction
(MTP) head tensors declared by the GGUF, and a draft model
shipped inside the LMK archive. When true (the default),
these assets are loaded into VRAM when present, and the runtime
uses them for speculative decoding on supported architectures.
When false, the draft assets are skipped at load time
(saving from a few hundred MiB to about 1 GiB of VRAM, depending
on the model), and speculative decoding from packaged drafts is
unavailable for this instance. In that case,
HasSpeculativeDecodingDrafts returns false even if the file
declares the assets. The choice is fixed for the lifetime of the
LM object; to change it, load the model again with a different value.
This does not affect a draft model assigned explicitly through DraftModel, which is wired independently of the model envelope.
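Example
The following is a minimal, self-contained sketch of the load-time behavior described in the remarks. It does not call the library: ModelFile, LoadOptions, and LoadedModel are hypothetical stand-ins introduced only for illustration, and only the property names EnableSpeculativeDecodingDrafts and HasSpeculativeDecodingDrafts come from this page. It also assumes that HasSpeculativeDecodingDrafts reports true when drafts are enabled and the file declares them.

```csharp
using System;

// Hypothetical stand-in for a packaged model file: records which
// draft assets the file declares (MTP heads and/or a bundled draft model).
public sealed record ModelFile(bool DeclaresMtpHeads, bool ShipsDraftModel);

// Hypothetical stand-in for load-time settings. Only the property name
// is taken from the documentation; the containing type is an assumption.
public sealed class LoadOptions
{
    // Defaults to true, matching the documented default.
    public bool EnableSpeculativeDecodingDrafts { get; set; } = true;
}

// Hypothetical stand-in for a loaded model instance.
public sealed class LoadedModel
{
    // Fixed at construction: the decision cannot change after loading.
    public bool HasSpeculativeDecodingDrafts { get; }

    public LoadedModel(ModelFile file, LoadOptions options)
    {
        bool declared = file.DeclaresMtpHeads || file.ShipsDraftModel;

        // Drafts are available only if they were both declared by the
        // file and not skipped at load time.
        HasSpeculativeDecodingDrafts =
            options.EnableSpeculativeDecodingDrafts && declared;
    }
}

public static class DraftLoadingDemo
{
    public static void Main()
    {
        var file = new ModelFile(DeclaresMtpHeads: true, ShipsDraftModel: false);

        // Default: packaged draft assets are loaded when present.
        var withDrafts = new LoadedModel(file, new LoadOptions());

        // Opt out: assets are skipped even though the file declares them.
        var withoutDrafts = new LoadedModel(
            file, new LoadOptions { EnableSpeculativeDecodingDrafts = false });

        Console.WriteLine(withDrafts.HasSpeculativeDecodingDrafts);    // True
        Console.WriteLine(withoutDrafts.HasSpeculativeDecodingDrafts); // False
    }
}
```

Because the flag is read-only after construction in this sketch, switching between the two configurations means creating a second instance, which mirrors the requirement to load the model again with a different value.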