
Property DraftModel

Namespace: LMKit.Model
Assembly: LM-Kit.NET.dll

DraftModel

Gets or sets an optional, smaller "draft" model that accelerates this model's text generation through speculative decoding.

public LM DraftModel { get; set; }

Property Value

LM

Remarks

When set, every text-completion context created from this model runs a draft-and-verify loop: the draft model proposes several continuation tokens, and this (target) model verifies them in a single batched decode, accepting the longest prefix it agrees with. The output is identical to that of non-speculative greedy decoding; only throughput changes. The technique pays off when the draft model is much smaller than the target yet agrees with it often (for example, a few-hundred-million-parameter assistant model drafting for a multi-billion-parameter target from the same family).
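The draft-and-verify loop described above can be illustrated with a toy sketch. This is not LM-Kit's implementation: `SpeculativeSketch`, `Greedy`, `Speculative`, and the two arithmetic "models" are hypothetical names invented for illustration, the verification step is written as a plain loop where a real engine would use one batched decode, and the extra token a real engine can gain from a fully accepted draft is left out for brevity.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Toy illustration only; every name here is hypothetical, not part of LM-Kit.
public static class SpeculativeSketch
{
    // Baseline: plain greedy decoding, one target call per generated token.
    public static List<int> Greedy(Func<IReadOnlyList<int>, int> target,
                                   IEnumerable<int> prompt, int maxNew)
    {
        var tokens = new List<int>(prompt);
        for (int i = 0; i < maxNew; i++) tokens.Add(target(tokens));
        return tokens;
    }

    // Draft-and-verify: each round drafts up to draftLen tokens, then keeps the
    // longest prefix the target agrees with plus the target's own correction.
    public static List<int> Speculative(Func<IReadOnlyList<int>, int> target,
                                        Func<IReadOnlyList<int>, int> draft,
                                        IEnumerable<int> prompt, int maxNew,
                                        int draftLen, out int rounds)
    {
        if (draftLen < 1) throw new ArgumentOutOfRangeException(nameof(draftLen));
        var tokens = new List<int>(prompt);
        int limit = tokens.Count + maxNew;
        rounds = 0;
        while (tokens.Count < limit)
        {
            rounds++;
            // 1. The draft model proposes k continuation tokens.
            int k = Math.Min(draftLen, limit - tokens.Count);
            var proposal = new List<int>(tokens);
            for (int i = 0; i < k; i++) proposal.Add(draft(proposal));

            // 2. The target checks each drafted position in order. A real engine
            //    obtains all k target predictions from a single batched decode.
            int start = tokens.Count;
            for (int pos = start; pos < start + k; pos++)
            {
                int expected = target(tokens);
                tokens.Add(expected);                  // always the target's token
                if (expected != proposal[pos]) break;  // first disagreement ends the round
            }
        }
        return tokens;
    }

    public static void Main()
    {
        // Toy "models": the target counts modulo 5; the draft is wrong after a 3.
        Func<IReadOnlyList<int>, int> target = c => (c[c.Count - 1] + 1) % 5;
        Func<IReadOnlyList<int>, int> draft =
            c => c[c.Count - 1] == 3 ? 0 : (c[c.Count - 1] + 1) % 5;

        var plain = Greedy(target, new[] { 0 }, 10);
        var fast = Speculative(target, draft, new[] { 0 }, 10, 4, out int rounds);
        Console.WriteLine($"identical: {plain.SequenceEqual(fast)}, rounds: {rounds}");
    }
}
```

Because every token appended to the result is the target's own greedy choice, the sequence matches plain greedy decoding regardless of how good the draft is. The draft's quality only changes how many verification rounds are needed.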

The draft model must share the target's tokenizer: same vocabulary type, matching BOS/EOS handling, and near-identical token tables. An incompatible draft model is rejected at context-creation time, and the target model then silently falls back to ordinary single-token decoding.

A draft model is one source of speculative-decoding drafts; the other is in-model Multi-Token Prediction (MTP) self-speculation, which uses a single model's own nextn heads and requires no second model. A draft model takes precedence when both are available. HasSpeculativeDecodingDrafts reports whether either source is present.

Set this property before creating conversations or other inference sessions; changing it afterwards does not affect contexts that have already been created.

The draft model is owned by the caller: it is not disposed when this model is disposed, and it must outlive any inference session created from this model.
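A minimal usage sketch of the ordering and ownership rules above. Several things here are assumptions rather than facts from this page: both file paths are placeholders, the two models are assumed to share a tokenizer, `HasSpeculativeDecodingDrafts` is assumed to be a Boolean property on `LM`, and the `LM(string)` constructor, `MultiTurnConversation`, `Submit`, and `Completion` are taken from the wider LM-Kit.NET API and should be checked against the version in use.

```csharp
using System;
using LMKit.Model;
using LMKit.TextGeneration;

// Placeholder paths (hypothetical files). The draft is declared first so that
// the using declarations dispose it last, after the target model.
using var draft = new LM("models/draft-small.gguf");
using var target = new LM("models/target-large.gguf");

try
{
    target.DraftModel = target;   // a model cannot be its own draft
}
catch (ArgumentException)
{
    Console.WriteLine("Self-assignment was rejected.");
}

// Assign the draft before any conversation or other session is created.
target.DraftModel = draft;

// Assumed to be a Boolean property on LM (see the lead-in).
Console.WriteLine(target.HasSpeculativeDecodingDrafts);

// Sessions created from here on pick up the draft model.
var chat = new MultiTurnConversation(target);
var result = chat.Submit("Summarize speculative decoding in one sentence.");
Console.WriteLine(result.Completion);
```

Declaring the draft before the target is a deliberate choice: C# disposes using declarations in reverse order, so the draft stays alive until the target has been disposed and the session is no longer in use.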

Exceptions

ArgumentException

Thrown when the assigned value is the same instance as this model.
