Summarize Documents and Text

Long documents, articles, and reports need to be condensed before humans can act on them. LM-Kit.NET's Summarizer class generates titles and summaries from raw text, PDFs, images, and Office documents. It handles content that exceeds the model's context window by recursively splitting and summarizing. This tutorial builds a summarization tool that processes both text and file attachments with configurable output length and overflow strategies.


Why Local Summarization Matters

On-device summarization solves two enterprise problems:

  1. Confidential documents stay private. Legal briefs, medical records, and internal strategy docs are among your most sensitive content, and sending them to a cloud API means a third party processes them. Local summarization keeps every word on your infrastructure.
  2. Unlimited throughput at fixed cost. Summarizing thousands of customer support tickets, research papers, or news articles every day gets expensive quickly under per-token pricing. A local model handles the volume at hardware cost only.

Prerequisites

| Requirement | Minimum |
|-------------|---------|
| .NET SDK    | 8.0+ |
| VRAM        | 4+ GB |
| Disk        | ~3 GB free for model download |

Step 1: Create the Project

dotnet new console -n SummarizeQuickstart
cd SummarizeQuickstart
dotnet add package LM-Kit.NET

Step 2: Understand the Summarizer

The Summarizer class generates two outputs from any content:

  • Title: a short heading (controlled by MaxTitleWords)
  • Summary: a condensed version (controlled by MaxContentWords)

It supports two intents:

| Intent         | What it does |
|----------------|--------------|
| Classification | Identifies the content type and topic (default) |
| Abstraction    | Generates a semantic summary in new words |

When content exceeds the model's context window, the OverflowStrategy controls behavior:

| Strategy           | Behavior |
|--------------------|----------|
| RecursiveSummarize | Splits into chunks, summarizes each, then summarizes the summaries (default) |
| Truncate           | Cuts content from the end to fit |
| Reject             | Throws an exception |
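To make the RecursiveSummarize idea concrete, here is a minimal sketch of the split, summarize, and re-summarize loop. It is not LM-Kit.NET's internal implementation: the context budget is counted in words rather than tokens, and `FakeSummarize`, `Words`, `SplitIntoChunks`, and `RecursiveSummarize` are hypothetical helpers written for this illustration. The stand-in summarizer just keeps the first few words of its input.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Demo: 50 numbered "words", a 20-word context budget, 5-word summaries.
string longText = string.Join(" ", Enumerable.Range(1, 50).Select(i => $"w{i}"));
string condensed = RecursiveSummarize(longText, 20, 5, FakeSummarize);
Console.WriteLine(condensed); // prints "w1 w2 w3 w4 w5"

// Stand-in for a model call (hypothetical): keeps the first maxWords words.
static string FakeSummarize(string text, int maxWords) =>
    string.Join(" ", Words(text).Take(maxWords));

static string[] Words(string text) =>
    text.Split(' ', StringSplitOptions.RemoveEmptyEntries);

// Split text into consecutive chunks of at most chunkWords words.
static List<string> SplitIntoChunks(string text, int chunkWords)
{
    string[] words = Words(text);
    var chunks = new List<string>();
    for (int i = 0; i < words.Length; i += chunkWords)
        chunks.Add(string.Join(" ", words.Skip(i).Take(chunkWords)));
    return chunks;
}

// Summarize each chunk, join the partial summaries, and repeat until the
// joined text fits the budget. Assumes the summarizer honors maxWords;
// otherwise the recursion may not shrink the input.
static string RecursiveSummarize(
    string text, int contextWords, int maxWords, Func<string, int, string> summarize)
{
    if (maxWords >= contextWords)
        throw new ArgumentException("maxWords must be smaller than contextWords.");

    if (Words(text).Length <= contextWords)
        return summarize(text, maxWords);

    IEnumerable<string> partials =
        SplitIntoChunks(text, contextWords).Select(chunk => summarize(chunk, maxWords));
    return RecursiveSummarize(string.Join(" ", partials), contextWords, maxWords, summarize);
}
```

In the demo, the 50-word input is split into three chunks, each chunk is reduced to five words, and the 15 joined words then fit the budget and are summarized once more. Each extra pass costs additional model calls, which is why very long inputs take longer under this strategy than under Truncate.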

Step 3: Summarize Text

using System.Text;
using LMKit.Model;
using LMKit.TextGeneration;

LMKit.Licensing.LicenseManager.SetLicenseKey("");

Console.InputEncoding = Encoding.UTF8;
Console.OutputEncoding = Encoding.UTF8;

// ──────────────────────────────────────
// 1. Load model
// ──────────────────────────────────────
Console.WriteLine("Loading model...");
using LM model = LM.LoadFromModelID("gemma3:4b",
    downloadingProgress: (_, len, read) =>
    {
        if (len.HasValue) Console.Write($"\r  Downloading: {(double)read / len.Value * 100:F1}%   ");
        return true;
    },
    loadingProgress: p => { Console.Write($"\r  Loading: {p * 100:F0}%   "); return true; });
Console.WriteLine("\n");

// ──────────────────────────────────────
// 2. Configure summarizer
// ──────────────────────────────────────
var summarizer = new Summarizer(model)
{
    Intent = Summarizer.SummarizationIntent.Abstraction,
    MaxTitleWords = 10,
    MaxContentWords = 100,
    GenerateTitle = true,
    GenerateContent = true,
    OverflowStrategy = Summarizer.OverflowResolutionStrategy.RecursiveSummarize
};

// ──────────────────────────────────────
// 3. Summarize sample text
// ──────────────────────────────────────
string article = """
    Retrieval-Augmented Generation (RAG) combines information retrieval with text
    generation to produce grounded responses. The approach first retrieves relevant
    documents from a knowledge base using vector similarity search, then passes those
    documents as context to a language model. This reduces hallucination because the
    model generates answers based on actual source material rather than relying solely
    on its training data. RAG systems typically use embedding models to convert both
    queries and documents into dense vectors, stored in a vector database for efficient
    similarity search. At inference time, the user's query is embedded, the top-k most
    similar document chunks are retrieved, and these chunks are prepended to the prompt
    sent to the generation model. Production RAG systems often add reranking, hybrid
    search combining keyword and semantic matching, and chunk overlap strategies to
    improve retrieval quality.
    """;

Summarizer.SummarizerResult result = summarizer.Summarize(article);

Console.WriteLine($"Title:      {result.Title}");
Console.WriteLine($"Summary:    {result.Summary}");
Console.WriteLine($"Confidence: {result.Confidence:P0}");

Step 4: Summarize Documents and Images

The Summarizer accepts file attachments (PDFs, Word documents, images) through the Attachment class:

using LMKit.Data;

// Summarize a PDF
string pdfPath = "quarterly_report.pdf";
var attachment = new Attachment(pdfPath);

Summarizer.SummarizerResult pdfResult = summarizer.Summarize(attachment);

Console.WriteLine($"Title:   {pdfResult.Title}");
Console.WriteLine($"Summary: {pdfResult.Summary}");
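When a folder mixes plain text with PDFs and images, it can help to decide per file whether to read it as a string or wrap it in an Attachment. The sketch below shows one way to make that decision by file extension. The extension list and the `IsAttachment` helper are assumptions for illustration only; check the LM-Kit.NET documentation for the formats your version actually supports.

```csharp
using System;
using System.Collections.Generic;
using System.IO;

// Extensions routed through new Attachment(path). This list is an
// illustrative assumption, not an authoritative list of supported formats.
var attachmentExtensions = new HashSet<string>(StringComparer.OrdinalIgnoreCase)
{
    ".pdf", ".docx", ".png", ".jpg", ".jpeg"
};

// True: pass the file as an Attachment. False: read it with File.ReadAllText.
bool IsAttachment(string path) =>
    attachmentExtensions.Contains(Path.GetExtension(path));

foreach (string path in new[] { "quarterly_report.pdf", "notes.txt", "Scan.PNG" })
{
    string route = IsAttachment(path) ? "Attachment" : "plain text";
    Console.WriteLine($"{path} -> {route}");
}
```

The comparison is case-insensitive, so `Scan.PNG` is routed the same way as `scan.png`.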

For vision-capable models, you can also summarize images directly:

using LMKit.Graphics;

// Load model that supports vision
using LM visionModel = LM.LoadFromModelID("gemma3:4b",
    downloadingProgress: (_, len, read) =>
    {
        if (len.HasValue) Console.Write($"\r  Downloading: {(double)read / len.Value * 100:F1}%   ");
        return true;
    },
    loadingProgress: p => { Console.Write($"\r  Loading: {p * 100:F0}%   "); return true; });

var visionSummarizer = new Summarizer(visionModel)
{
    Intent = Summarizer.SummarizationIntent.Abstraction,
    MaxContentWords = 50
};

var image = new ImageBuffer("whiteboard_photo.png");
Summarizer.SummarizerResult imageResult = visionSummarizer.Summarize(image);

Console.WriteLine($"Image summary: {imageResult.Summary}");

Step 5: Batch Summarization

Process multiple documents and export results:

string[] files = Directory.GetFiles("documents", "*.txt");
var output = new List<string>();
output.Add("file,title,summary,confidence");

Console.WriteLine($"Summarizing {files.Length} files...\n");

foreach (string file in files)
{
    string content = File.ReadAllText(file);
    Summarizer.SummarizerResult r = summarizer.Summarize(content);
    string fileName = Path.GetFileName(file);

    Console.WriteLine($"  {fileName}: {r.Title}");

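    // Double any embedded quotes in both text fields so the CSV stays well-formed.
    output.Add($"\"{fileName}\",\"{r.Title.Replace("\"", "\"\"")}\",\"{r.Summary.Replace("\"", "\"\"")}\",{r.Confidence:F2}");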
}

File.WriteAllLines("summaries.csv", output);
Console.WriteLine($"\nExported {files.Length} summaries to summaries.csv");
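Model-generated titles and summaries can contain quotes, commas, and line breaks, all of which need care in CSV output. The following sketch factors the quoting into two small helpers. `CsvField` and `CsvRow` are hypothetical names introduced here, and the `double` type for the confidence value is an assumption. Quoting follows the usual RFC 4180 convention, and the confidence is formatted with the invariant culture so the decimal separator does not depend on the machine's locale.

```csharp
using System;
using System.Globalization;

// Demo row with embedded quotes and a line break in the text fields.
Console.WriteLine(CsvRow("a.txt", "The \"Q3\" Plan", "Line one\nline two", 0.87));

// Wrap a field in double quotes and double any quotes inside it.
static string CsvField(string value) =>
    "\"" + (value ?? string.Empty).Replace("\"", "\"\"") + "\"";

// Build one CSV row: three quoted text fields plus a numeric confidence.
static string CsvRow(string fileName, string title, string summary, double confidence) =>
    string.Join(",",
        CsvField(fileName),
        CsvField(title),
        CsvField(summary),
        confidence.ToString("F2", CultureInfo.InvariantCulture));
```

Inside the batch loop, the row could then be built with `output.Add(CsvRow(fileName, r.Title, r.Summary, r.Confidence));`, assuming `Confidence` converts implicitly to `double`.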

Step 6: Classification vs. Abstraction

Choose the right intent for your use case:

string email = """
    Hi team, just a reminder that the quarterly planning meeting is scheduled for
    Friday at 2pm. Please review the attached budget proposal before the meeting
    and come prepared with your department's priorities for Q3. Let me know if you
    have any scheduling conflicts.
    """;

// Classification: identifies what the content IS
summarizer.Intent = Summarizer.SummarizationIntent.Classification;
var classified = summarizer.Summarize(email);
Console.WriteLine($"Classification: {classified.Title}");
// Output example: "Meeting Reminder Email"

// Abstraction: condenses what the content SAYS
summarizer.Intent = Summarizer.SummarizationIntent.Abstraction;
var abstracted = summarizer.Summarize(email);
Console.WriteLine($"Abstraction:    {abstracted.Title}");
// Output example: "Q3 Planning Meeting Friday at 2pm"
Console.WriteLine($"Summary:        {abstracted.Summary}");

Use Classification when you need to tag or sort content. Use Abstraction when you need a human-readable condensation.


Step 7: Controlling Output with Guidance

The Guidance property steers the summarizer toward specific angles:

string technicalDoc = File.ReadAllText("architecture_doc.txt");

// Default summary
summarizer.Guidance = "";
var general = summarizer.Summarize(technicalDoc);
Console.WriteLine($"General:  {general.Summary}\n");

// Focus on security aspects
summarizer.Guidance = "Focus on security implications and vulnerabilities.";
var security = summarizer.Summarize(technicalDoc);
Console.WriteLine($"Security: {security.Summary}\n");

// Focus on cost
summarizer.Guidance = "Focus on cost, budget, and resource requirements.";
var cost = summarizer.Summarize(technicalDoc);
Console.WriteLine($"Cost:     {cost.Summary}\n");
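If you summarize the same document from several angles regularly, the repeated assign-and-call pattern can be folded into a loop over named guidance prompts. This is a sketch under assumptions: `SummarizeFromAngles` is a hypothetical helper, and it takes a delegate in place of the real Summarizer so that it runs without a model.

```csharp
using System;
using System.Collections.Generic;

var angles = new Dictionary<string, string>
{
    ["General"] = "",
    ["Security"] = "Focus on security implications and vulnerabilities.",
    ["Cost"] = "Focus on cost, budget, and resource requirements."
};

// Stand-in summarizer so the sketch runs without a model. With the real class,
// the delegate could be something like:
//   (guidance, text) => { summarizer.Guidance = guidance; return summarizer.Summarize(text).Summary; }
Func<string, string, string> fakeSummarize = (guidance, text) =>
    guidance.Length == 0 ? "general summary" : $"summary guided by: {guidance}";

Dictionary<string, string> summaries =
    SummarizeFromAngles("document text", angles, fakeSummarize);

foreach (var (label, summaryText) in summaries)
    Console.WriteLine($"{label}: {summaryText}");

// Run one document through each guidance prompt and collect results by label.
static Dictionary<string, string> SummarizeFromAngles(
    string text,
    IReadOnlyDictionary<string, string> angles,
    Func<string, string, string> summarize)
{
    var results = new Dictionary<string, string>();
    foreach (var (label, guidance) in angles)
        results[label] = summarize(guidance, text);
    return results;
}
```

Keeping the prompts in a dictionary makes it easy to add or remove an angle without touching the loop.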

Common Issues

| Problem | Cause | Fix |
|---------|-------|-----|
| Summary is too generic | Using Classification instead of Abstraction | Set Intent = Summarizer.SummarizationIntent.Abstraction |
| "Content too long" error | OverflowStrategy set to Reject | Switch to RecursiveSummarize or Truncate |
| Summary exceeds desired length | MaxContentWords too high | Lower MaxContentWords (default is 200) |
| No title generated | GenerateTitle is false | Set GenerateTitle = true |
| Low confidence on image input | Model lacks vision capability | Use a VLM such as gemma3:4b, which supports both text and images |
| Batch processing is slow | Large documents with recursive overflow | Reduce MaximumContextLength or use the Truncate strategy for speed |
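To check whether a summary actually stays within the length you asked for, a simple word count is usually enough. The sketch below counts whitespace-separated words. `CountWords` and `WithinLimit` are hypothetical helpers, and this way of counting is an assumption that may differ from how the library measures MaxContentWords.

```csharp
using System;

string sample = "Retrieval grounds answers in source documents.";
Console.WriteLine($"{CountWords(sample)} words, within limit: {WithinLimit(sample, 100)}");

// Count words separated by spaces, tabs, or line breaks.
static int CountWords(string text) =>
    text.Split(new[] { ' ', '\t', '\r', '\n' }, StringSplitOptions.RemoveEmptyEntries).Length;

// True when the summary has at most maxWords words.
static bool WithinLimit(string summary, int maxWords) =>
    CountWords(summary) <= maxWords;
```

In a batch run, logging `CountWords(r.Summary)` next to each file name is a quick way to spot summaries that run longer than expected.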

Next Steps