👉 Try the demo:
https://github.com/LM-Kit/lm-kit-net-samples/tree/main/console_net/document_summarizer
Document Summarizer in .NET Applications
🎯 Purpose of the Sample
Document Summarizer demonstrates how to use LM-Kit.NET to load a local model and generate a short summary, optionally with an auto-generated title, from a document on disk.
This demo is especially useful for PDFs and images, which are the most common real-world formats in document workflows:
- PDF files: text-based PDFs (selectable text) and scanned PDFs (image-based pages).
- Images: screenshots, photos of documents, receipts, forms, etc.
The sample shows how to:
- Download and load a model with progress callbacks.
- Select a predefined model from a simple menu (or paste a custom model URI).
- Wrap a file as an Attachment.
- Summarize the document using LM-Kit’s Summarizer.
- Configure summarization output (title, summary text, max length, optional guidance).
- Run in a loop to summarize one document after another.
Why summarize documents with LM-Kit.NET?
- Local-first: summarize sensitive files on your own hardware.
- Unified input: PDFs and images go through the same Attachment entry point.
- Configurable: tune output length and add guidance (example: always summarize in French).
- Simple developer experience: minimal C# console app with a readable flow.
👥 Target Audience
- Product and Platform - add summarization to existing .NET services.
- Data and Document Processing - quickly digest large sets of PDFs, scans, and screenshots.
- RPA and Back-office - summarize reports, tickets, receipts, and back-office documents.
- Demo and Education - minimal example of model loading plus a practical document task in C#.
🚀 Problem Solved
- Turn long documents into short summaries: get a quick overview without reading everything.
- Summarize PDFs and images: handle common formats like PDF reports and screenshot captures.
- Model flexibility: pick a model based on your VRAM and latency requirements.
- Repeatable loop: summarize multiple files in one console session.
- Optional control: generate a title, control summary length, and add guidance.
💻 Sample Application Description
Console app that:
- Lets you choose a model (or paste a custom model URI).
- Downloads the model if needed, with live progress updates.
- Loads the model with progress reporting.
- Creates a Summarizer configured to:
  - Generate a title
  - Generate summary content
  - Limit summary size to 100 words
  - Apply optional guidance
- Repeatedly prompts for a document path:
  - Loads it as an Attachment (PDF, image, and other supported formats).
  - Calls summarizer.Summarize(attachment).
  - Prints Title and Summary.
- Stops when you submit an empty path.
✨ Key Features
🧠 Document summarization: generates a short summary from a document input.
🏷️ Auto-title: generates a title from the document content.
📄 PDF-focused: great for reports and multi-page PDFs (including scanned PDFs).
🖼️ Image-friendly: summarize screenshots and photos of documents.
📥 Interactive loop: enter a path, get a summary, repeat.
📏 Output control:
- MaxContentWords to cap length
- Guidance to steer style or language
📦 Model lifecycle:
- Automatic download on first use
- Loading progress shown in the console
❌ Nice errors: friendly message when a file path is invalid or inaccessible.
🧰 Built-In Models (menu)
On startup, the demo shows a model selection menu:
| Option | Model | Approx. VRAM Needed |
|---|---|---|
| 0 | MiniCPM 2.6 o 8.1B | ~5.9 GB VRAM |
| 1 | Alibaba Qwen 3 2B | ~2.5 GB VRAM |
| 2 | Alibaba Qwen 3 4B | ~4.5 GB VRAM |
| 3 | Alibaba Qwen 3 8B | ~6.5 GB VRAM |
| 4 | Google Gemma 3 4B | ~5.7 GB VRAM |
| 5 | Google Gemma 3 12B | ~11 GB VRAM |
| 6 | Mistral Ministral 3 3B | ~3.5 GB VRAM |
| 7 | Mistral Ministral 3 8B | ~6.5 GB VRAM |
| 8 | Mistral Ministral 3 14B | ~12 GB VRAM |
| other | Custom model URI (GGUF / LMK...) | depends on model |
Choosing a model for PDFs and images
- For text-based PDFs, most models work well since the input already contains clean text.
- For images and scanned PDFs, prefer a vision-capable model if your pipeline needs to read pixels (screenshots, photos, scanned pages). If the document is already selectable text, vision capability is usually less important.
🧠 Supported Models
The demo is pre-wired to LM-Kit’s predefined model cards:
- minicpm-o
- qwen3-vl:2b
- qwen3-vl:4b
- qwen3-vl:8b
- gemma3:4b
- gemma3:12b
- ministral3:3b
- ministral3:8b
- ministral3:14b
Internally:
modelLink = ModelCard
    .GetPredefinedModelCardByModelID("gemma3:4b")
    .ModelUri
    .ToString();
You can also provide any valid model URI manually (including local paths or custom model servers) by typing or pasting it when prompted.
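The menu logic described above (a numbered option or a pasted URI) could be sketched as follows. This is a minimal illustration, not the sample's actual code: the ModelMenu class and its TryResolve method are hypothetical names, and the pairing of option numbers with model IDs is an assumption based on the order in which the table and the ID list appear in this README.

```csharp
using System;
using System.Collections.Generic;

public static class ModelMenu
{
    // Menu option -> predefined model ID.
    // Assumption: options 0-8 follow the same order as the model ID list above.
    private static readonly IReadOnlyDictionary<string, string> ModelIds =
        new Dictionary<string, string>
        {
            ["0"] = "minicpm-o",
            ["1"] = "qwen3-vl:2b",
            ["2"] = "qwen3-vl:4b",
            ["3"] = "qwen3-vl:8b",
            ["4"] = "gemma3:4b",
            ["5"] = "gemma3:12b",
            ["6"] = "ministral3:3b",
            ["7"] = "ministral3:8b",
            ["8"] = "ministral3:14b",
        };

    // Returns true and a model ID when the input is a known menu option.
    // Otherwise returns false and the cleaned-up input, which the caller
    // can treat as a custom model URI.
    public static bool TryResolve(string input, out string value)
    {
        // Trim whitespace and surrounding quotes (common when pasting paths).
        string trimmed = (input ?? string.Empty).Trim().Trim('"');

        if (ModelIds.TryGetValue(trimmed, out string modelId))
        {
            value = modelId;
            return true;
        }

        value = trimmed;
        return false;
    }
}
```

When TryResolve returns true, the ID can be passed to ModelCard.GetPredefinedModelCardByModelID as shown above; when it returns false, the value can be wrapped in a Uri and handed to the model loader directly.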
🛠️ Commands and Flow
Inside the console loop:
On startup
- Select a model (0-8) or paste a custom model URI.
- The model is downloaded (if needed) and loaded with progress reporting.
Per document
- The app prompts: Enter the path to a document:
- Type a file path and press Enter.
- The app loads it into an Attachment (PDF, image, and other supported formats).
- The app runs summarization:
  var result = summarizer.Summarize(attachment);
- The app prints:
  Title: ...
  Summary: ...
Quit
- Submitting an empty path exits the loop.
- The app then waits for a key press to close.
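The prompt, summarize, print, repeat flow above can be sketched without any LM-Kit dependency by passing the summarization step in as a delegate. This is an illustrative sketch only: SummaryLoop, Run, and DocSummary are hypothetical names that do not appear in the sample, and the error message wording is an assumption.

```csharp
using System;
using System.IO;

// Hypothetical result type standing in for the summarizer's output.
public sealed class DocSummary
{
    public DocSummary(string title, string summary)
    {
        Title = title;
        Summary = summary;
    }

    public string Title { get; }
    public string Summary { get; }
}

public static class SummaryLoop
{
    // Reads document paths from 'input' until an empty line (or end of input),
    // calls 'summarize' for each path, and writes the result to 'output'.
    // Returns the number of documents summarized successfully.
    public static int Run(TextReader input, TextWriter output,
                          Func<string, DocSummary> summarize)
    {
        int count = 0;
        while (true)
        {
            output.Write("Enter the path to a document: ");
            string path = input.ReadLine()?.Trim();

            // An empty path (or end of input) ends the loop.
            if (string.IsNullOrEmpty(path))
                break;

            try
            {
                DocSummary result = summarize(path);
                output.WriteLine($"Title: {result.Title}");
                output.WriteLine($"Summary: {result.Summary}");
                count++;
            }
            catch (Exception ex)
            {
                // Print a friendly message and keep the session alive.
                output.WriteLine($"Error: could not summarize '{path}': {ex.Message}");
            }
        }
        return count;
    }
}
```

In the console app, the delegate would wrap the two calls shown above: create an Attachment from the path, pass it to summarizer.Summarize, and copy Title and Summary into the result. Console.In and Console.Out can be supplied as the reader and writer.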
🗣️ Example Use Cases
Try the demo with:
- A PDF report -> produce a short recap for quick review.
- A multi-page scanned PDF -> get a summary without reading the whole scan.
- A screenshot of a web page -> capture the key idea and main sections.
- A photo of a document (phone capture) -> sanity-check what the model understood.
- A receipt or invoice image -> generate a short description of the purchase and totals.
- Multi-language content -> test guidance like “always summarize in French”.
For PDFs and images, results often depend on the source quality:
- Higher resolution and clean contrast usually improve output.
- Cropped images with only the relevant content usually summarize better than full-screen clutter.
⚙️ Behavior and Policies (quick reference)
- Model selection: exactly one model per process. To change models, restart the app.
- Primary formats: the demo is most commonly used with PDFs and images, but any format supported by Attachment can work.
- Download and load:
  - ModelDownloadingProgress prints Downloading model XX.XX% (or bytes).
  - ModelLoadingProgress prints Loading model XX% and clears the console after download.
- Summarization settings (as configured in the demo):
  - GenerateTitle = true
  - GenerateContent = true
  - MaxContentWords = 100
  - Guidance = "" (optional)
- Exit condition: submitting an empty document path ends the loop.
- Licensing:
  - You can set an optional license key via LicenseManager.SetLicenseKey("").
  - A free community license is available from the LM-Kit website.
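The download and load messages described in the list above could be produced by two small formatting helpers like the ones below. This is a sketch under stated assumptions: ProgressText is a hypothetical helper class, the parameter types (a nullable content length, a byte count, and a load progress between 0 and 1) are inferred from the callback shapes used in this README rather than confirmed against the library, and the console-clearing step is left out.

```csharp
using System;
using System.Globalization;

public static class ProgressText
{
    // Builds the download message. When the total size is known, show a
    // percentage with two decimals; otherwise fall back to the byte count.
    public static string Download(long? contentLength, long bytesRead)
    {
        if (contentLength.HasValue && contentLength.Value > 0)
        {
            double percent = (double)bytesRead / contentLength.Value * 100.0;
            return string.Format(CultureInfo.InvariantCulture,
                "Downloading model {0:0.00}%", percent);
        }

        return string.Format(CultureInfo.InvariantCulture,
            "Downloading model {0} bytes", bytesRead);
    }

    // Builds the load message.
    // Assumption: 'progress' is a fraction between 0 and 1.
    public static string Load(float progress)
    {
        return string.Format(CultureInfo.InvariantCulture,
            "Loading model {0:0}%", Math.Round(progress * 100));
    }
}
```

A download callback could then write "\r" followed by ProgressText.Download(contentLength, bytesRead) to the console and return true, mirroring the lambdas in the integration snippet.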
💻 Minimal Integration Snippet
using System;
using System.Text;
using LMKit.Data;
using LMKit.Model;
using LMKit.TextGeneration;

public class DocumentSummarizerSample
{
    public void SummarizeFile(string modelUri, string filePath)
    {
        Console.InputEncoding = Encoding.UTF8;
        Console.OutputEncoding = Encoding.UTF8;

        // Load the model
        var model = new LM(
            new Uri(modelUri),
            downloadingProgress: (path, contentLength, bytesRead) => true,
            loadingProgress: progress => true);

        // Create summarizer
        var summarizer = new Summarizer(model)
        {
            GenerateTitle = true,
            GenerateContent = true,
            MaxContentWords = 100,
            Guidance = "" // Example: "Always summarize in French"
        };

        // Wrap the file as an Attachment (PDF, image, etc.)
        var attachment = new Attachment(filePath);

        // Run summarization
        var result = summarizer.Summarize(attachment);

        Console.WriteLine($"Title: {result.Title}");
        Console.WriteLine($"Summary: {result.Summary}");
    }
}
Use this pattern to integrate summarization into web APIs, background workers, or desktop apps.
🛠️ Getting Started
📋 Prerequisites
- .NET Framework 4.6.2 or .NET 8.0+
📥 Download
git clone https://github.com/LM-Kit/lm-kit-net-samples
cd lm-kit-net-samples/console_net/document_summarizer
Project Link: document_summarizer (same path as above)
▶️ Run
dotnet build
dotnet run
Then:
- Select a model by typing 0-8, or paste a custom model URI.
- Wait for the model to download (first run) and load.
- When prompted, type the path to a PDF or image (or any other supported document file).
- Inspect the generated Title and Summary.
- Enter another path to summarize another file, or submit an empty path to exit.
🔍 Notes on Key Types
- LM (LMKit.Model) - model wrapper used by LM-Kit.NET:
  - Accepts a Uri pointing to the model.
  - Uses callbacks for download and load progress.
- Summarizer (LMKit.TextGeneration) - document summarization engine:
  - Summarize(Attachment) returns a result with fields like Title and Summary.
  - Controlled by properties such as GenerateTitle, GenerateContent, MaxContentWords, and Guidance.
- Attachment (LMKit.Data) - wraps external data:
  - new Attachment(string path) loads a file from disk.
  - Common real-world usage includes PDFs and images (screenshots, scanned pages, photos).
  - Exceptions are raised when the path is invalid or inaccessible.
🔧 Extend the Demo
- Display elapsed time (the demo already measures it with Stopwatch; it just does not print it yet).
- Add PDF-focused options:
  - summarize only the first N pages
  - summarize selected pages (example: --pages 1,3-5)
  - page-by-page summaries then a final combined summary
- Add CLI flags:
  - --max-words 200
  - --no-title
  - --guidance "Always summarize in French"
- Add batch mode: summarize every PDF or image in a directory.
- Write output to disk (example: output.md or output.json) instead of only printing to console.
- Add formatting modes:
  - bullet summary
  - executive summary
  - “key takeaways” list
- Chain with LM-Kit’s Structured Extraction to go from: document -> summary -> structured data
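As a starting point for the selected-pages idea above, the page specification from the --pages 1,3-5 example could be parsed as sketched below. The flag does not exist in the demo today, and the PageSpec class and its Parse method are hypothetical names. The sketch only turns the text into page numbers; restricting summarization to those pages is a separate step that is not shown.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;

public static class PageSpec
{
    // Parses a specification such as "1,3-5" into sorted, distinct,
    // 1-based page numbers (1, 3, 4, 5). Throws FormatException on bad input.
    public static List<int> Parse(string spec)
    {
        var pages = new SortedSet<int>();

        foreach (string part in spec.Split(new[] { ',' }, StringSplitOptions.RemoveEmptyEntries))
        {
            string[] bounds = part.Split('-');
            if (bounds.Length > 2)
                throw new FormatException($"Invalid page range: '{part}'");

            int first = ParsePage(bounds[0]);
            int last = bounds.Length == 2 ? ParsePage(bounds[1]) : first;

            if (last < first)
                throw new FormatException($"Range end is before range start: '{part}'");

            for (int page = first; page <= last; page++)
                pages.Add(page);
        }

        return new List<int>(pages);
    }

    // Accepts only plain digits (after trimming) and rejects page 0.
    private static int ParsePage(string text)
    {
        if (!int.TryParse(text.Trim(), NumberStyles.None, CultureInfo.InvariantCulture, out int page)
            || page < 1)
        {
            throw new FormatException($"Invalid page number: '{text}'");
        }

        return page;
    }
}
```

Using a sorted set keeps the output stable when ranges overlap or arrive out of order, so "5, 2-3, 3" yields pages 2, 3, and 5.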