Property DraftModel
Gets or sets an optional smaller "draft" model that accelerates this model's text generation through draft-model speculative decoding.
public LM DraftModel { get; set; }
Property Value
LM
The draft model, or null if none is set.
Remarks
When set, every text-completion context created from this model runs a draft-and-verify loop. The draft model proposes several continuation tokens, and this (target) model verifies them in a single batched decode. It accepts the longest prefix it agrees with and uses its own prediction at the first token where the two disagree. As a result, the output is identical to what this model produces with ordinary, non-speculative greedy decoding; only throughput changes. The technique pays off when the draft model is much smaller than the target yet agrees with it often (for example, a few-hundred-million-parameter assistant model drafting for a multi-billion-parameter target from the same family).
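The following minimal C# sketch makes the acceptance rule concrete by reducing each model to a hypothetical greedy next-token function over integer token IDs. `GreedyDecode`, `SpeculativeDecode`, and the toy models are illustrative names. They are not part of the LM API, and this is not the library's implementation. A real engine obtains every verification prediction from one batched decode of the target. This sketch calls the target once per position instead, so it shows why the output matches greedy decoding rather than where the speedup comes from.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical illustration, not the LM API: a "model" is reduced to a greedy
// next-token function that maps the current token sequence to the next token ID.

// Baseline: plain greedy decoding, one target step per generated token.
List<int> GreedyDecode(Func<IReadOnlyList<int>, int> target, IReadOnlyList<int> prompt, int maxNewTokens)
{
    var tokens = new List<int>(prompt);
    for (int i = 0; i < maxNewTokens; i++)
        tokens.Add(target(tokens));
    return tokens;
}

// Draft-and-verify loop. `rounds` counts verification rounds; in a real engine
// each round would be a single batched decode of the target model.
List<int> SpeculativeDecode(
    Func<IReadOnlyList<int>, int> target,
    Func<IReadOnlyList<int>, int> draft,
    IReadOnlyList<int> prompt, int maxNewTokens, int draftLength, out int rounds)
{
    var tokens = new List<int>(prompt);
    rounds = 0;
    while (tokens.Count - prompt.Count < maxNewTokens)
    {
        int remaining = maxNewTokens - (tokens.Count - prompt.Count);
        int n = Math.Min(draftLength, remaining);

        // 1. The draft model proposes n tokens autoregressively.
        var proposal = new List<int>(tokens);
        for (int i = 0; i < n; i++)
            proposal.Add(draft(proposal));

        // 2. The target checks each proposed position in order. Its own
        //    prediction is always what gets appended, so the result matches
        //    greedy decoding; the first disagreement ends the round.
        rounds++;
        int accepted = 0;
        bool mismatch = false;
        while (accepted < n)
        {
            int predicted = target(tokens);
            int proposed = proposal[tokens.Count];
            tokens.Add(predicted);
            if (predicted != proposed) { mismatch = true; break; }
            accepted++;
        }

        // 3. If every drafted token was accepted, the same verification pass
        //    also yields one more target token, budget permitting.
        if (!mismatch && accepted < remaining)
            tokens.Add(target(tokens));
    }
    return tokens;
}

// Toy models: the draft agrees with the target except right after a token divisible by 5.
Func<IReadOnlyList<int>, int> toyTarget = s => (s[s.Count - 1] * 3 + 1) % 17;
Func<IReadOnlyList<int>, int> toyDraft = s => s[s.Count - 1] % 5 == 0 ? 0 : toyTarget(s);

var demoPrompt = new List<int> { 2 };
var greedy = GreedyDecode(toyTarget, demoPrompt, 12);
var spec = SpeculativeDecode(toyTarget, toyDraft, demoPrompt, 12, 4, out int demoRounds);
Console.WriteLine($"identical: {greedy.SequenceEqual(spec)}, rounds: {demoRounds}"); // prints "identical: True, rounds: 4"
```

Every token appended to the output is the target's own greedy prediction for its prefix. In this sketch, therefore, the draft model changes only how many verification rounds are needed, never which tokens are produced. With the toy models, twelve tokens take four rounds instead of twelve single-token steps. A draft that never agrees degrades to one token per round.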
The draft model must share the target's tokenizer: the same vocabulary type, matching BOS/EOS handling, and near-identical token tables. Compatibility is checked when a context is created. If the pair is incompatible, the draft model is ignored without raising an error, and the context falls back to ordinary single-token decoding.
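The compatibility rules can be sketched as a predicate. Everything here is an assumption made for illustration. Tokenizers are modeled as hypothetical tuples of a vocabulary-type label, BOS and EOS token IDs, and a token-text table. "Near-identical" is interpreted as a small allowed difference in vocabulary size (the hypothetical `maxSizeDifference`, 128 by default here) combined with identical text for every shared token ID. The real check performed by LM may differ. The shared-ID rule reflects why compatibility matters at all: the target verifies the draft's token IDs directly, so a given ID must denote the same token in both vocabularies.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical compatibility check; the tuple layout and the size-difference
// allowance are illustrative assumptions, not LM-Kit types or thresholds.
bool AreTokenizersCompatible(
    (string VocabType, int Bos, int Eos, IReadOnlyList<string> Tokens) target,
    (string VocabType, int Bos, int Eos, IReadOnlyList<string> Tokens) draft,
    int maxSizeDifference = 128)
{
    // Same vocabulary type (for example BPE vs. SentencePiece).
    if (target.VocabType != draft.VocabType) return false;

    // Matching BOS/EOS tokens.
    if (target.Bos != draft.Bos || target.Eos != draft.Eos) return false;

    // Vocabulary sizes may differ slightly (for example, extra padding tokens)...
    if (Math.Abs(target.Tokens.Count - draft.Tokens.Count) > maxSizeDifference) return false;

    // ...but every token ID present in both tables must map to the same text.
    int shared = Math.Min(target.Tokens.Count, draft.Tokens.Count);
    for (int id = 0; id < shared; id++)
        if (target.Tokens[id] != draft.Tokens[id]) return false;

    return true;
}

string[] vocab = { "<s>", "</s>", "hello", " world" };
string[] paddedVocab = { "<s>", "</s>", "hello", " world", "<pad>" };
string[] otherVocab = { "<s>", "</s>", "hola", " mundo" };

Console.WriteLine(AreTokenizersCompatible(("bpe", 0, 1, vocab), ("bpe", 0, 1, paddedVocab))); // prints "True"
Console.WriteLine(AreTokenizersCompatible(("bpe", 0, 1, vocab), ("bpe", 0, 1, otherVocab)));  // prints "False"
```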
A draft model is one source of speculative-decoding drafts. The other is in-model Multi-Token Prediction (MTP) self-speculation, which uses the model's own nextn heads and requires no second model. When both sources are available, the draft model takes precedence. HasSpeculativeDecodingDrafts reports whether either source is present.
Set this property before creating conversations or other inference sessions; changing it afterwards does not affect contexts that have already been created.
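The precedence and timing rules can be modeled in a few lines. This sketch is hypothetical. `SelectDraftSource` and `HasDrafts` are illustrative names; the latter only mirrors what HasSpeculativeDecodingDrafts is documented to report. The sketch also ignores tokenizer compatibility checks. It shows that the source is chosen once, when a context is created, so setting a draft model later affects only contexts created afterwards.

```csharp
using System;

// Hypothetical model of the documented rules; none of these names are LM-Kit APIs.
// Draft source picked for a new context: a draft model wins over MTP heads.
string SelectDraftSource(bool hasDraftModel, bool hasMtpHeads) =>
    hasDraftModel ? "draft-model" : hasMtpHeads ? "mtp" : "none";

// Mirrors HasSpeculativeDecodingDrafts: true when either source is present.
bool HasDrafts(bool hasDraftModel, bool hasMtpHeads) => hasDraftModel || hasMtpHeads;

// Each context records its draft source once, at creation time.
bool draftModelSet = false;
const bool modelHasMtpHeads = true;

string firstContext = SelectDraftSource(draftModelSet, modelHasMtpHeads);  // created before DraftModel is set
draftModelSet = true;                                                       // DraftModel assigned afterwards
string secondContext = SelectDraftSource(draftModelSet, modelHasMtpHeads); // created after

Console.WriteLine($"{firstContext}, {secondContext}"); // prints "mtp, draft-model"
```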
The draft model is owned by the caller: it is not disposed when this model is disposed, and it must outlive any inference session created from this model.
Exceptions
- ArgumentException
Thrown when the value is this same model instance.