Property EnableSpeculativeDecodingDrafts

Namespace: LMKit.Model

Assembly: LM-Kit.NET.dll

EnableSpeculativeDecodingDrafts

Gets or sets a value indicating whether the speculative decoding draft assets packaged with the model are loaded for this LM instance.

public bool EnableSpeculativeDecodingDrafts { get; set; }

Property Value

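bool: true to load the draft assets packaged with the model; false to skip them. The default is copied from the process-wide EnableSpeculativeDecodingDrafts setting, which is itself true unless changed. Set this property explicitly to override that global default for a single model load.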

Remarks

This property is the per-model override of the process-wide speculative-decoding default. A new LM.LoadingOptions instance initializes it from the process-wide EnableSpeculativeDecodingDrafts setting; assign this property before loading a model to deviate from that default for that model only.

"Draft assets" covers every in-envelope source of candidate tokens for draft-and-verify decoding: Multi-Token Prediction (MTP) head tensors declared by the GGUF, and a draft model shipped inside the LMK archive. When this property is true, any such assets present in the model are loaded into VRAM, and the runtime uses them for speculative decoding on supported architectures.

When false, the draft assets are skipped at load time (saving a few hundred MiB to ~1 GiB of VRAM depending on the model) and speculative decoding from packaged drafts is unavailable for this instance. HasSpeculativeDecodingDrafts then returns false even if the file declares the assets. The decision is permanent for the lifetime of the LM object; load the model again with a different value to change it.

This property does not affect a draft model assigned explicitly through DraftModel, which is wired independently of the model envelope.
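
Examples

The following is a minimal sketch of the per-model override described in the remarks, not a definitive usage pattern. It assumes that the LM constructor accepts a loadingOptions argument and that LM.LoadingOptions has a public parameterless constructor; the model path is a placeholder. Check both assumptions against the LM constructor reference before relying on this.

```csharp
using System;
using LMKit.Model;

internal static class Program
{
    private static void Main()
    {
        // A new LoadingOptions starts from the process-wide default.
        var options = new LM.LoadingOptions
        {
            // Override for this load only: skip the packaged draft assets
            // to save VRAM.
            EnableSpeculativeDecodingDrafts = false
        };

        // Assumption: the constructor takes a loadingOptions argument.
        // "path/to/model.lmk" is a placeholder, not a real file.
        var model = new LM("path/to/model.lmk", loadingOptions: options);

        // Per the remarks, this is expected to report false even when
        // the file declares draft assets.
        Console.WriteLine(model.HasSpeculativeDecodingDrafts);
    }
}
```

Because the choice is fixed for the lifetime of the LM object, switching back to packaged drafts means constructing a new LM with EnableSpeculativeDecodingDrafts set to true.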
