Field EnableSpeculativeDecodingDrafts
Gets or sets the process-wide default that decides whether the speculative-decoding draft assets packaged with a model are loaded.
public static bool EnableSpeculativeDecodingDrafts
Returns
- bool
true to load packaged speculative-decoding drafts by default; otherwise, false. Default is true.
Remarks
This value seeds LM.LoadingOptions.EnableSpeculativeDecodingDrafts: every LM.LoadingOptions instance adopts whatever this field holds at the moment the instance is constructed. It is therefore a global default, not a hard switch. To change the decision for an individual model, set EnableSpeculativeDecodingDrafts explicitly on the LM.LoadingOptions used to load that model; the per-model value always wins over this default.
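The seeding behavior described above can be illustrated with a minimal, self-contained sketch. It does not call the real library: GlobalDefaults and LoadingOptions below are hypothetical stand-ins that model only the "copy the default at construction time" rule, on the assumption that an options object keeps the value it was seeded with.

```csharp
using System;

// Hypothetical stand-in for the type that owns the process-wide default.
public static class GlobalDefaults
{
    public static bool EnableSpeculativeDecodingDrafts = true;
}

// Hypothetical stand-in for LM.LoadingOptions; only the seeding rule is modeled.
public sealed class LoadingOptions
{
    // The initializer runs once per instance, so each instance copies the
    // global default as it stands when that instance is constructed.
    public bool EnableSpeculativeDecodingDrafts { get; set; } =
        GlobalDefaults.EnableSpeculativeDecodingDrafts;
}

public static class Demo
{
    public static void Main()
    {
        // Constructed while the global default is still true.
        var first = new LoadingOptions();

        // Change the process-wide default; in this model, 'first' keeps its copy.
        GlobalDefaults.EnableSpeculativeDecodingDrafts = false;
        var second = new LoadingOptions();

        // Per-model override: an explicit value wins over the global default.
        var third = new LoadingOptions { EnableSpeculativeDecodingDrafts = true };

        Console.WriteLine(first.EnableSpeculativeDecodingDrafts);  // True
        Console.WriteLine(second.EnableSpeculativeDecodingDrafts); // False
        Console.WriteLine(third.EnableSpeculativeDecodingDrafts);  // True
    }
}
```

In practice this suggests setting the global default once, early in application startup, and reserving the per-model setting for the exceptions.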
"Draft assets" covers every in-envelope source of candidate tokens for
draft-and-verify decoding: Multi-Token Prediction (MTP) head tensors
declared by the GGUF, and a draft model shipped inside the LMK archive.
When true (the default), these assets are loaded into VRAM when
present and the runtime uses them for speculative decoding on supported
architectures. When false, they are skipped at load time, which
saves VRAM but disables speculative decoding from packaged drafts. This
does not affect a draft model assigned explicitly through
DraftModel, which is wired independently of the model
envelope.