Handle Long Inputs with Overflow Policies
Real-world inputs are unpredictable. A customer support ticket might be 50 tokens or 50,000. A RAG system might inject more context than the model's window allows. LM-Kit.NET provides two policy mechanisms to handle these situations gracefully: InputLengthOverflowPolicy (what happens when the input exceeds the context window before generation starts) and ContextOverflowPolicy (what happens when the context fills up during generation). This tutorial shows how to configure both policies for different scenarios.
Why Overflow Policies Matter
Two enterprise problems that overflow policies solve:
- Preventing crashes in production pipelines. Without overflow policies, oversized inputs throw exceptions that crash batch processing jobs. A single malformed document or unexpectedly long ticket can halt an entire queue. Overflow policies let you handle these cases gracefully without losing the entire batch.
- Preserving the most relevant context. Trimming from the start keeps recent conversation context, while trimming from the end preserves the original question. The right policy depends on the use case, and choosing the wrong one means the model generates answers from the wrong part of the input.
Prerequisites
| Requirement | Minimum |
|---|---|
| .NET SDK | 8.0+ |
| VRAM | 4+ GB |
| Disk | ~3 GB free for model download |
Step 1: Create the Project
dotnet new console -n OverflowPolicyQuickstart
cd OverflowPolicyQuickstart
dotnet add package LM-Kit.NET
Step 2: Inspect the Default Behavior
using System.Text;
using LMKit.Model;
using LMKit.Inference;
using LMKit.TextGeneration;
LMKit.Licensing.LicenseManager.SetLicenseKey("");
Console.InputEncoding = Encoding.UTF8;
Console.OutputEncoding = Encoding.UTF8;
// ──────────────────────────────────────
// 1. Load model
// ──────────────────────────────────────
Console.WriteLine("Loading model...");
using LM model = LM.LoadFromModelID("gemma3:4b",
loadingProgress: p => { Console.Write($"\rLoading: {p * 100:F0}% "); return true; });
Console.WriteLine("\n");
// ──────────────────────────────────────
// 2. Inspect default policies
// ──────────────────────────────────────
var chat = new MultiTurnConversation(model)
{
SystemPrompt = "You are a helpful assistant.",
MaximumCompletionTokens = 256
};
Console.WriteLine("Default policies:");
Console.WriteLine($" Input overflow: {chat.InferencePolicies.InputLengthOverflowPolicy}");
Console.WriteLine($" Context overflow: {chat.InferencePolicies.ContextOverflowPolicy}");
Expected output:
Default policies:
Input overflow: TrimAuto
Context overflow: KVCacheShifting
Step 3: Configure Policies for Different Scenarios
Each scenario calls for a different combination of policies. Here are three common patterns.
Scenario A: Long Document Summarization (Keep the Start)
When summarizing documents, the most important content is typically at the beginning. Trim from the end to preserve the introduction, abstract, or opening sections:
var summarizer = new SingleTurnConversation(model)
{
SystemPrompt = "Summarize the following document concisely.",
MaximumCompletionTokens = 512
};
// If the document is too long, trim from the end (keep the beginning)
summarizer.InferencePolicies.InputLengthOverflowPolicy = InputLengthOverflowPolicy.TrimEnd;
summarizer.InferencePolicies.ContextOverflowPolicy = ContextOverflowPolicy.StopGeneration;
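A minimal usage sketch, assuming the document lives in a local text file (the file name is a placeholder, not part of LM-Kit.NET):
// Hypothetical usage: load a long document from disk and summarize it.
// "annual-report.txt" is a placeholder; substitute your own source.
string documentText = File.ReadAllText("annual-report.txt");
var summary = summarizer.Submit(documentText);
Console.WriteLine(summary.Completion);
Because TrimEnd keeps the beginning of the input, the summary is driven by the introduction and abstract rather than by whatever appears at the end of the document.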
Scenario B: Conversational Assistant (Keep Recent Context)
In multi-turn conversations, recent messages are more relevant than earlier ones. Trim from the start to keep the most recent exchanges:
var assistant = new MultiTurnConversation(model)
{
SystemPrompt = "You are a helpful assistant.",
MaximumCompletionTokens = 1024
};
// If conversation history exceeds context, trim from the start (keep recent messages)
assistant.InferencePolicies.InputLengthOverflowPolicy = InputLengthOverflowPolicy.TrimStart;
assistant.InferencePolicies.ContextOverflowPolicy = ContextOverflowPolicy.KVCacheShifting;
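A short sketch of how this behaves over many turns (the questions are placeholders): because TrimStart discards the oldest tokens first, the assistant keeps responding even after the accumulated history no longer fits, at the cost of forgetting the earliest exchanges.
// Hypothetical usage: the loop keeps working even once the history
// outgrows the context window; the oldest turns are trimmed first.
string[] questions = { "What is a context window?", "Why does it fill up?", "Recap what we discussed." };
foreach (string question in questions)
{
    var answer = assistant.Submit(question);
    Console.WriteLine($"Q: {question}");
    Console.WriteLine($"A: {answer.Completion}\n");
}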
Scenario C: Strict Validation (Throw on Overflow)
For classification or validation tasks where partial input would produce wrong results, throw an exception and let the caller decide how to handle it:
var validator = new SingleTurnConversation(model)
{
SystemPrompt = "Classify the following text.",
MaximumCompletionTokens = 64
};
// Throw an exception if input is too long (let the caller handle it)
validator.InferencePolicies.InputLengthOverflowPolicy = InputLengthOverflowPolicy.Throw;
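// veryLongText holds the text to classify; it is assumed to be defined elsewhere.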
try
{
var result = validator.Submit(veryLongText);
Console.WriteLine($"Classification: {result.Completion}");
}
catch (LMKit.Exceptions.NotEnoughContextSizeException ex)
{
Console.WriteLine($"Input too long: {ex.Message}");
Console.WriteLine("Consider splitting the input or using a model with a larger context window.");
}
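One possible caller-side fallback, sketched below, splits the oversized text into fixed-size pieces and classifies each one separately. The chunk size and the character-based splitting are illustrative assumptions, not part of LM-Kit.NET, and the approach only makes sense when classifying each piece independently is acceptable for the task:
// Hypothetical fallback: classify the text in fixed-size chunks.
// ChunkSize is an arbitrary character count; tune it to the model's context window.
const int ChunkSize = 8000;
for (int offset = 0; offset < veryLongText.Length; offset += ChunkSize)
{
    string chunk = veryLongText.Substring(offset, Math.Min(ChunkSize, veryLongText.Length - offset));
    var chunkResult = validator.Submit(chunk);
    Console.WriteLine($"Chunk {offset / ChunkSize + 1}: {chunkResult.Completion}");
}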
Step 4: Monitor Context Usage
Track how much of the context window is consumed during a conversation so you can take action before overflow occurs:
var chat = new MultiTurnConversation(model)
{
SystemPrompt = "You are a helpful assistant.",
MaximumCompletionTokens = 512
};
chat.InferencePolicies.InputLengthOverflowPolicy = InputLengthOverflowPolicy.TrimStart;
chat.AfterTextCompletion += (_, e) =>
{
if (e.SegmentType == TextSegmentType.UserVisible)
Console.Write(e.Text);
};
while (true)
{
Console.ForegroundColor = ConsoleColor.Green;
Console.Write("You: ");
Console.ResetColor();
string? input = Console.ReadLine();
if (string.IsNullOrWhiteSpace(input) || input.Equals("quit", StringComparison.OrdinalIgnoreCase))
break;
Console.ForegroundColor = ConsoleColor.Cyan;
Console.Write("Assistant: ");
Console.ResetColor();
var result = chat.Submit(input);
int remaining = chat.ContextRemainingSpace;
int total = model.ContextLength;
double usage = (double)(total - remaining) / total * 100;
Console.WriteLine($"\n [context: {usage:F0}% used, {remaining} tokens remaining]\n");
}
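If you want to act on that measurement rather than only display it, one option is to reset the conversation once usage crosses a threshold. This is a sketch: the 80% cutoff is arbitrary, and clearing history discards the earlier turns, so only do this when the conversation can tolerate losing them. Place the check inside the loop, right after the usage calculation:
// Hypothetical guard inside the loop above: reset when the context is nearly full.
if (usage > 80)
{
    Console.WriteLine(" [context nearly full - clearing conversation history]");
    chat.ClearHistory(); // discards earlier turns; the model loses that context
}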
Input Length Overflow Policies
| Policy | When Input Is Too Long | Best For |
|---|---|---|
| `TrimAuto` | System chooses the best trim strategy | General purpose (default) |
| `TrimStart` | Removes oldest tokens | Conversations (keep recent context) |
| `TrimEnd` | Removes newest tokens | Document processing (keep the beginning) |
| `KVCacheShifting` | Shifts the KV cache window | Long-running generation tasks |
| `Throw` | Raises `NotEnoughContextSizeException` | Strict validation, custom handling |
Context Overflow Policies
| Policy | When Context Fills During Generation | Best For |
|---|---|---|
| `KVCacheShifting` | Dynamically shifts the cache | Long responses (default) |
| `StopGeneration` | Stops and returns what was generated | Bounded output, predictable behavior |
Common Issues
| Problem | Cause | Fix |
|---|---|---|
| `NotEnoughContextSizeException` | Input exceeds context with `Throw` policy | Switch to `TrimAuto` or split input into smaller chunks |
| Response cuts off mid-sentence | `StopGeneration` policy triggered | Use `KVCacheShifting` or increase context size |
| Old messages forgotten in chat | `TrimStart` removed early conversation | Use `AgentMemory` for long-term recall across sessions |
| Garbled output after long conversation | Context shifting artifacts | Clear history periodically with `chat.ClearHistory()` |
Next Steps
- Build a Conversational Assistant with Memory: persistent memory across sessions with `AgentMemory`.
- Build a RAG Pipeline: inject external knowledge into the context window.
- Summarize Documents and Text: document summarization with overflow handling.