Method HibernateAllContexts
HibernateAllContexts()
Schedules background hibernation of every in-memory inference context held for this model, including contexts in active use and idle contexts in the recycle pool. Hibernating a context serializes its state (including its speculative-decoding draft sibling) to disk and releases its device and host memory. Returns the number of contexts scheduled.
public int HibernateAllContexts()
Returns
int
The number of contexts scheduled for hibernation.
Remarks
Each request is coalesced and runs behind the context's write lock, so a context that is mid-decode hibernates as soon as it frees up rather than being interrupted. A hibernated context rehydrates transparently on its next use, so a busy session may re-materialize on its following request. Contexts that are already hibernated, not yet created, or pinned against hibernation are skipped.
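Examples
The following is a minimal sketch of how the return value relates to the skip rules described above. It does not use the library's real types: ToyModel, ContextSlot, SlotState, and the Pinned and HibernatePending fields are hypothetical stand-ins invented for illustration, and the toy method only marks contexts as scheduled rather than serializing anything.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical stand-ins for illustration only; none of these types
// come from the documented library.
enum SlotState { NotCreated, Active, Idle, Hibernated }

sealed class ContextSlot
{
    public SlotState State;
    public bool Pinned;            // assumed flag: pinned against hibernation
    public bool HibernatePending;  // toy stand-in for a scheduled request
}

sealed class ToyModel
{
    private readonly List<ContextSlot> _slots = new List<ContextSlot>();

    public void Add(SlotState state, bool pinned = false) =>
        _slots.Add(new ContextSlot { State = state, Pinned = pinned });

    // Mirrors the documented contract in miniature: schedule without
    // waiting, and return how many contexts were scheduled.
    public int HibernateAllContexts()
    {
        int scheduled = 0;
        foreach (ContextSlot slot in _slots)
        {
            bool inMemory = slot.State == SlotState.Active
                         || slot.State == SlotState.Idle;

            // Already hibernated, not yet created, or pinned: skipped.
            if (!inMemory || slot.Pinned)
            {
                continue;
            }

            // A real implementation would hibernate later, behind the
            // context's write lock; the toy only records the request.
            slot.HibernatePending = true;
            scheduled++;
        }
        return scheduled;
    }
}

static class Demo
{
    public static int Run()
    {
        var model = new ToyModel();
        model.Add(SlotState.Active);              // scheduled
        model.Add(SlotState.Idle);                // scheduled
        model.Add(SlotState.Hibernated);          // skipped
        model.Add(SlotState.NotCreated);          // skipped
        model.Add(SlotState.Idle, pinned: true);  // skipped
        return model.HibernateAllContexts();
    }

    static void Main() => Console.WriteLine(Run()); // prints 2
}
```

In this sketch only the active and the unpinned idle context are counted. Because the real method schedules work in the background, callers should treat the returned count as the number of hibernations requested, not the number already completed.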