Summarize Documents and Text
Long documents, articles, and reports need to be condensed before humans can act on them. LM-Kit.NET's Summarizer class generates titles and summaries from raw text, PDFs, images, and Office documents. It handles content that exceeds the model's context window by recursively splitting and summarizing. This tutorial builds a summarization tool that processes both text and file attachments with configurable output length and overflow strategies.
Why Local Summarization Matters
Two enterprise problems that on-device summarization solves:
- Confidential documents stay private. Legal briefs, medical records, and internal strategy docs are among your most sensitive content, and sending them to a cloud API means a third party processes them. Local summarization keeps every word on your infrastructure.
- Unlimited throughput at fixed cost. Summarizing thousands of customer support tickets, research papers, or news articles every day gets expensive quickly under per-token pricing. A local model handles the volume at hardware cost only.
Prerequisites
| Requirement | Minimum |
|---|---|
| .NET SDK | 8.0+ |
| VRAM | 4+ GB |
| Disk | ~3 GB free for model download |
Step 1: Create the Project
dotnet new console -n SummarizeQuickstart
cd SummarizeQuickstart
dotnet add package LM-Kit.NET
Step 2: Understand the Summarizer
The Summarizer class generates two outputs from any content:
- Title: a short heading (controlled by MaxTitleWords)
- Summary: a condensed version (controlled by MaxContentWords)
It supports two intents:
| Intent | What it does |
|---|---|
| Classification | Identifies the content type and topic (default) |
| Abstraction | Generates a semantic summary in new words |
When content exceeds the model's context window, the OverflowStrategy controls behavior:
| Strategy | Behavior |
|---|---|
| RecursiveSummarize | Splits into chunks, summarizes each, then summarizes the summaries (default) |
| Truncate | Cuts content from the end to fit |
| Reject | Throws an exception |
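The split, summarize, and re-summarize loop behind RecursiveSummarize can be pictured with a small standalone sketch. This is a conceptual illustration only, not LM-Kit.NET's internal implementation: the class name RecursiveSummarySketch is hypothetical, the summarizeChunk delegate stands in for a model call, and the sketch measures size in characters where a real system would count tokens.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Conceptual sketch only: NOT how LM-Kit.NET implements RecursiveSummarize.
public static class RecursiveSummarySketch
{
    // Splits text into consecutive pieces of at most maxChars characters.
    public static List<string> SplitIntoChunks(string text, int maxChars)
    {
        var chunks = new List<string>();
        for (int i = 0; i < text.Length; i += maxChars)
            chunks.Add(text.Substring(i, Math.Min(maxChars, text.Length - i)));
        return chunks;
    }

    // Summarizes every chunk, joins the partial summaries, and repeats
    // until the combined text fits within maxChars; then summarizes once more.
    public static string Summarize(string text, int maxChars, Func<string, string> summarizeChunk)
    {
        if (maxChars <= 0) throw new ArgumentOutOfRangeException(nameof(maxChars));

        while (text.Length > maxChars)
        {
            var partials = SplitIntoChunks(text, maxChars).Select(summarizeChunk);
            string combined = string.Join(" ", partials);

            // Guard against a summarizer that does not shrink its input,
            // which would otherwise loop forever.
            if (combined.Length >= text.Length)
                throw new InvalidOperationException("Summaries are not shrinking.");

            text = combined;
        }
        return summarizeChunk(text);
    }
}
```

Each pass costs one model call per chunk, which is why very long inputs take noticeably longer under this strategy than under Truncate.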
Step 3: Summarize Text
using System.Text;
using LMKit.Model;
using LMKit.TextGeneration;
LMKit.Licensing.LicenseManager.SetLicenseKey("");
Console.InputEncoding = Encoding.UTF8;
Console.OutputEncoding = Encoding.UTF8;
// ──────────────────────────────────────
// 1. Load model
// ──────────────────────────────────────
Console.WriteLine("Loading model...");
using LM model = LM.LoadFromModelID("gemma3:4b",
    downloadingProgress: (_, len, read) =>
    {
        if (len.HasValue) Console.Write($"\r Downloading: {(double)read / len.Value * 100:F1}% ");
        return true;
    },
    loadingProgress: p => { Console.Write($"\r Loading: {p * 100:F0}% "); return true; });
Console.WriteLine("\n");
// ──────────────────────────────────────
// 2. Configure summarizer
// ──────────────────────────────────────
var summarizer = new Summarizer(model)
{
    Intent = Summarizer.SummarizationIntent.Abstraction,
    MaxTitleWords = 10,
    MaxContentWords = 100,
    GenerateTitle = true,
    GenerateContent = true,
    OverflowStrategy = Summarizer.OverflowResolutionStrategy.RecursiveSummarize
};
// ──────────────────────────────────────
// 3. Summarize sample text
// ──────────────────────────────────────
string article = """
Retrieval-Augmented Generation (RAG) combines information retrieval with text
generation to produce grounded responses. The approach first retrieves relevant
documents from a knowledge base using vector similarity search, then passes those
documents as context to a language model. This reduces hallucination because the
model generates answers based on actual source material rather than relying solely
on its training data. RAG systems typically use embedding models to convert both
queries and documents into dense vectors, stored in a vector database for efficient
similarity search. At inference time, the user's query is embedded, the top-k most
similar document chunks are retrieved, and these chunks are prepended to the prompt
sent to the generation model. Production RAG systems often add reranking, hybrid
search combining keyword and semantic matching, and chunk overlap strategies to
improve retrieval quality.
""";
Summarizer.SummarizerResult result = summarizer.Summarize(article);
Console.WriteLine($"Title: {result.Title}");
Console.WriteLine($"Summary: {result.Summary}");
Console.WriteLine($"Confidence: {result.Confidence:P0}");
Step 4: Summarize Documents and Images
The Summarizer accepts file attachments (PDFs, Word documents, images) through the Attachment class:
using LMKit.Data;
// Summarize a PDF
string pdfPath = "quarterly_report.pdf";
var attachment = new Attachment(pdfPath);
Summarizer.SummarizerResult pdfResult = summarizer.Summarize(attachment);
Console.WriteLine($"Title: {pdfResult.Title}");
Console.WriteLine($"Summary: {pdfResult.Summary}");
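Because the summarizer accepts both raw strings and attachments, a tool that handles mixed input needs a small routing step. The sketch below is one way to do it, using only the two Summarize overloads shown in this tutorial. The FileSummarizer class, its method names, and the extension list are hypothetical choices for illustration, not part of LM-Kit.NET.

```csharp
using System;
using System.IO;
using LMKit.Data;
using LMKit.TextGeneration;

public static class FileSummarizer
{
    // Hypothetical list; adjust it to the plain-text formats you actually handle.
    private static readonly string[] PlainTextExtensions = { ".txt", ".md", ".log" };

    // Returns true when the file extension is in the plain-text list (case-insensitive).
    public static bool IsPlainText(string path)
    {
        string ext = Path.GetExtension(path);
        return Array.Exists(PlainTextExtensions,
            e => e.Equals(ext, StringComparison.OrdinalIgnoreCase));
    }

    // Reads plain-text files directly and wraps everything else in an Attachment.
    public static Summarizer.SummarizerResult SummarizeFile(Summarizer summarizer, string path)
    {
        if (!File.Exists(path))
            throw new FileNotFoundException("Input file not found.", path);

        return IsPlainText(path)
            ? summarizer.Summarize(File.ReadAllText(path))
            : summarizer.Summarize(new Attachment(path));
    }
}
```

With this in place, a call such as FileSummarizer.SummarizeFile(summarizer, "quarterly_report.pdf") takes the attachment path, while a .txt file is read as a string first.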
For vision-capable models, you can also summarize images directly:
using LMKit.Graphics;
// Load model that supports vision
using LM visionModel = LM.LoadFromModelID("gemma3:4b",
    downloadingProgress: (_, len, read) =>
    {
        if (len.HasValue) Console.Write($"\r Downloading: {(double)read / len.Value * 100:F1}% ");
        return true;
    },
    loadingProgress: p => { Console.Write($"\r Loading: {p * 100:F0}% "); return true; });
var visionSummarizer = new Summarizer(visionModel)
{
    Intent = Summarizer.SummarizationIntent.Abstraction,
    MaxContentWords = 50
};
var image = new ImageBuffer("whiteboard_photo.png");
Summarizer.SummarizerResult imageResult = visionSummarizer.Summarize(image);
Console.WriteLine($"Image summary: {imageResult.Summary}");
Step 5: Batch Summarization
Process multiple documents and export results:
string[] files = Directory.GetFiles("documents", "*.txt");
var output = new List<string>();
output.Add("file,title,summary,confidence");
Console.WriteLine($"Summarizing {files.Length} files...\n");
foreach (string file in files)
{
    string content = File.ReadAllText(file);
    Summarizer.SummarizerResult r = summarizer.Summarize(content);
    string fileName = Path.GetFileName(file);
    Console.WriteLine($" {fileName}: {r.Title}");
    // Double embedded quotes in both text fields so the CSV stays well-formed.
    output.Add($"\"{fileName}\",\"{r.Title.Replace("\"", "\"\"")}\",\"{r.Summary.Replace("\"", "\"\"")}\",{r.Confidence:F2}");
}
File.WriteAllLines("summaries.csv", output);
Console.WriteLine($"\nExported {files.Length} summaries to summaries.csv");
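The export loop builds each CSV line inline. If you export from more than one place, a small helper can centralize the quoting and also format the confidence with the invariant culture, so the decimal separator stays a period on machines with a comma-decimal locale. The CsvRow class below is a hypothetical helper written for this tutorial, and it assumes the confidence value converts to double.

```csharp
using System;
using System.Globalization;

public static class CsvRow
{
    // Wraps a field in quotes and doubles any embedded quotes (RFC 4180 style).
    public static string Escape(string? value) =>
        "\"" + (value ?? string.Empty).Replace("\"", "\"\"") + "\"";

    // Builds one CSV line from the four columns used in the export loop.
    public static string Format(string fileName, string? title, string? summary, double confidence) =>
        string.Join(",",
            Escape(fileName),
            Escape(title),
            Escape(summary),
            confidence.ToString("F2", CultureInfo.InvariantCulture));
}
```

Inside the loop, the output.Add call would then become output.Add(CsvRow.Format(fileName, r.Title, r.Summary, r.Confidence)).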
Step 6: Classification vs. Abstraction
Choose the right intent for your use case:
string email = """
Hi team, just a reminder that the quarterly planning meeting is scheduled for
Friday at 2pm. Please review the attached budget proposal before the meeting
and come prepared with your department's priorities for Q3. Let me know if you
have any scheduling conflicts.
""";
// Classification: identifies what the content IS
summarizer.Intent = Summarizer.SummarizationIntent.Classification;
var classified = summarizer.Summarize(email);
Console.WriteLine($"Classification: {classified.Title}");
// Output example: "Meeting Reminder Email"
// Abstraction: condenses what the content SAYS
summarizer.Intent = Summarizer.SummarizationIntent.Abstraction;
var abstracted = summarizer.Summarize(email);
Console.WriteLine($"Abstraction: {abstracted.Title}");
// Output example: "Q3 Planning Meeting Friday at 2pm"
Console.WriteLine($"Summary: {abstracted.Summary}");
Use Classification when you need to tag or sort content. Use Abstraction when you need a human-readable condensation.
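To make the tagging use case concrete, here is a minimal sketch that buckets documents by the title the Classification intent returns. The ContentTagger class and its method names are hypothetical. Setting GenerateContent to false is an assumption that the summary body can be skipped when only a label is needed. Labels are free-form model output, so near-duplicate labels may land in separate buckets unless you normalize them.

```csharp
using System;
using System.Collections.Generic;
using LMKit.TextGeneration;

public static class ContentTagger
{
    // Pure grouping step: buckets documents by whatever label labelOf returns.
    public static Dictionary<string, List<string>> GroupBy(
        IEnumerable<string> documents, Func<string, string> labelOf)
    {
        var groups = new Dictionary<string, List<string>>(StringComparer.OrdinalIgnoreCase);
        foreach (string doc in documents)
        {
            string label = labelOf(doc);
            if (!groups.TryGetValue(label, out var bucket))
                groups[label] = bucket = new List<string>();
            bucket.Add(doc);
        }
        return groups;
    }

    // Uses the Classification title as the label. Note: this changes the
    // summarizer's settings, so restore them if you reuse the instance.
    public static Dictionary<string, List<string>> GroupByLabel(
        Summarizer summarizer, IEnumerable<string> documents)
    {
        summarizer.Intent = Summarizer.SummarizationIntent.Classification;
        summarizer.GenerateTitle = true;
        summarizer.GenerateContent = false; // assumption: only the title is needed for tagging

        return GroupBy(documents, doc => summarizer.Summarize(doc).Title ?? "(untitled)");
    }
}
```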
Step 7: Controlling Output with Guidance
The Guidance property steers the summarizer toward specific angles:
string technicalDoc = File.ReadAllText("architecture_doc.txt");
// Default summary
summarizer.Guidance = "";
var general = summarizer.Summarize(technicalDoc);
Console.WriteLine($"General: {general.Summary}\n");
// Focus on security aspects
summarizer.Guidance = "Focus on security implications and vulnerabilities.";
var security = summarizer.Summarize(technicalDoc);
Console.WriteLine($"Security: {security.Summary}\n");
// Focus on cost
summarizer.Guidance = "Focus on cost, budget, and resource requirements.";
var cost = summarizer.Summarize(technicalDoc);
Console.WriteLine($"Cost: {cost.Summary}\n");
Common Issues
| Problem | Cause | Fix |
|---|---|---|
| Summary is too generic | Using Classification instead of Abstraction | Set Intent = SummarizationIntent.Abstraction |
| Content too long error | OverflowStrategy set to Reject | Switch to RecursiveSummarize or Truncate |
| Summary exceeds desired length | MaxContentWords too high | Lower MaxContentWords (default is 200) |
| No title generated | GenerateTitle is false | Set GenerateTitle = true |
| Low confidence on image input | Model lacks vision capability | Use a VLM like gemma3:4b which supports both text and images |
| Batch processing slow | Large documents with recursive overflow | Reduce MaximumContextLength or use Truncate strategy for speed |
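If you want to know which inputs overflow the context window while still getting a fast result, one option is to try Reject first and fall back to Truncate. The sketch below assumes Reject and Truncate are members of Summarizer.OverflowResolutionStrategy, matching the strategy names listed earlier, and it catches the general Exception type because this tutorial does not name the exception Reject throws. The OverflowFallback class is hypothetical.

```csharp
using System;
using LMKit.TextGeneration;

public static class OverflowFallback
{
    // Tries the fail-fast strategy first, logs the overflow, then retries with truncation.
    public static Summarizer.SummarizerResult SummarizeWithFallback(Summarizer summarizer, string text)
    {
        var original = summarizer.OverflowStrategy;
        try
        {
            summarizer.OverflowStrategy = Summarizer.OverflowResolutionStrategy.Reject;
            return summarizer.Summarize(text);
        }
        catch (Exception ex) // assumption: the exact exception type is not documented here
        {
            Console.WriteLine($"Input rejected ({ex.GetType().Name}); retrying with Truncate.");
            summarizer.OverflowStrategy = Summarizer.OverflowResolutionStrategy.Truncate;
            return summarizer.Summarize(text);
        }
        finally
        {
            // Restore the caller's configured strategy.
            summarizer.OverflowStrategy = original;
        }
    }
}
```

Catching every exception is deliberately broad for a sketch. Once you know the concrete exception type, narrow the catch so unrelated failures are not silently retried.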
Next Steps
- Build a RAG Pipeline Over Your Own Documents: combine summarization with retrieval for Q&A.
- Build a Private Document Q&A System: interactive PDF Q&A.
- Samples: Text Summarizer: text summarization demo.
- Samples: Document Summarizer: document summarization demo.