👉 Try the demo:
https://github.com/LM-Kit/lm-kit-net-samples/tree/main/console_net/document_summarizer

Document Summarizer in .NET Applications


🎯 Purpose of the Sample

Document Summarizer demonstrates how to use LM-Kit.NET to load a local model and generate a short summary, optionally with an auto-generated title, from a document on disk.

This demo is especially useful for PDFs and images, two of the most common formats in real-world document workflows:

  • PDF files: text-based PDFs (selectable text) and scanned PDFs (image-based pages).
  • Images: screenshots, photos of documents, receipts, forms, etc.

The sample shows how to:

  • Download and load a model with progress callbacks.
  • Select a predefined model from a simple menu (or paste a custom model URI).
  • Wrap a file as an Attachment.
  • Summarize the document using LM-Kit’s Summarizer.
  • Configure summarization output (title, summary text, max length, optional guidance).
  • Run in a loop to summarize one document after another.

Why summarize documents with LM-Kit.NET?

  • Local-first: summarize sensitive files on your own hardware.
  • Unified input: PDFs and images go through the same Attachment entry point.
  • Configurable: tune output length and add guidance (example: always summarize in French).
  • Simple developer experience: minimal C# console app with a readable flow.

👥 Target Audience

  • Product and Platform - add summarization to existing .NET services.
  • Data and Document Processing - quickly digest large sets of PDFs, scans, and screenshots.
  • RPA and Back-office - summarize reports, tickets, receipts, and back-office documents.
  • Demo and Education - minimal example of model loading plus a practical document task in C#.

🚀 Problem Solved

  • Turn long documents into short summaries: get a quick overview without reading everything.
  • Summarize PDFs and images: handle common formats like PDF reports and screenshot captures.
  • Model flexibility: pick a model based on your VRAM and latency requirements.
  • Repeatable loop: summarize multiple files in one console session.
  • Optional control: generate a title, control summary length, and add guidance.

💻 Sample Application Description

Console app that:

  • Lets you choose a model (or paste a custom model URI).

  • Downloads the model if needed, with live progress updates.

  • Loads the model with progress reporting.

  • Creates a Summarizer configured to:

    • Generate a title
    • Generate summary content
    • Limit summary size to 100 words
    • Apply optional guidance
  • Repeatedly prompts for a document path:

    • Loads it as an Attachment (PDF, image, and other supported formats).
    • Calls summarizer.Summarize(attachment).
    • Prints Title and Summary.
  • Stops when you submit an empty path.

✨ Key Features

  • 🧠 Document summarization: generates a short summary from a document input.

  • 🏷️ Auto-title: generates a title from the document content.

  • 📄 PDF-focused: great for reports and multi-page PDFs (including scanned PDFs).

  • 🖼️ Image-friendly: summarize screenshots and photos of documents.

  • 📥 Interactive loop: enter a path, get a summary, repeat.

  • 📏 Output control:

    • MaxContentWords to cap length
    • Guidance to steer style or language
  • 📦 Model lifecycle:

    • Automatic download on first use
    • Loading progress shown in the console
  • ❌ Nice errors: friendly message when a file path is invalid or inaccessible.


🧰 Built-In Models (menu)

On startup, the demo shows a model selection menu:

| Option | Model                          | Approx. VRAM Needed |
|--------|--------------------------------|---------------------|
| 0      | MiniCPM 2.6 o 8.1B             | ~5.9 GB             |
| 1      | Alibaba Qwen 3 2B              | ~2.5 GB             |
| 2      | Alibaba Qwen 3 4B              | ~4.5 GB             |
| 3      | Alibaba Qwen 3 8B              | ~6.5 GB             |
| 4      | Google Gemma 3 4B              | ~5.7 GB             |
| 5      | Google Gemma 3 12B             | ~11 GB              |
| 6      | Mistral Ministral 3 3B         | ~3.5 GB             |
| 7      | Mistral Ministral 3 8B         | ~6.5 GB             |
| 8      | Mistral Ministral 3 14B        | ~12 GB              |
| other  | Custom model URI (GGUF / LMK...) | depends on model  |

Choosing a model for PDFs and images

  • For text-based PDFs, most models work well since the input already contains clean text.
  • For images and scanned PDFs, prefer a vision-capable model if your pipeline needs to read pixels (screenshots, photos, scanned pages). If the document is already selectable text, vision capability is usually less important.
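
To make the VRAM trade-off concrete, here is a minimal sketch that picks a menu option for a given VRAM budget. ModelPicker and PickOption are hypothetical names, not part of LM-Kit.NET or the demo, and the figures are the approximate values from the menu table. The sketch uses the VRAM requirement as a rough proxy for model size, which will not always match summary quality on a given document.

```csharp
using System;

// Hypothetical helper: not part of LM-Kit.NET or the demo.
public static class ModelPicker
{
    // Approximate VRAM needs in GB for menu options 0-8, copied from the menu table.
    private static readonly double[] VramGbByOption =
        { 5.9, 2.5, 4.5, 6.5, 5.7, 11.0, 3.5, 6.5, 12.0 };

    // Returns the menu option with the highest VRAM need that still fits the budget,
    // or -1 when no option fits. On a tie, the lower option number wins.
    public static int PickOption(double availableVramGb)
    {
        int best = -1;
        for (int option = 0; option < VramGbByOption.Length; option++)
        {
            double need = VramGbByOption[option];
            if (need <= availableVramGb && (best < 0 || need > VramGbByOption[best]))
            {
                best = option;
            }
        }
        return best;
    }
}
```

With an 8 GB budget this returns option 3; with less than 2.5 GB it returns -1, which a caller could treat as "ask for a custom model URI".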

🧠 Supported Models

The demo is pre-wired to LM-Kit’s predefined model cards:

  • minicpm-o
  • qwen3-vl:2b
  • qwen3-vl:4b
  • qwen3-vl:8b
  • gemma3:4b
  • gemma3:12b
  • ministral3:3b
  • ministral3:8b
  • ministral3:14b

Internally:

modelLink = ModelCard
    .GetPredefinedModelCardByModelID("gemma3:4b")
    .ModelUri
    .ToString();

You can also provide any valid model URI manually (including local paths or custom model servers) by typing or pasting it when prompted.
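
The menu-or-URI choice can be sketched as follows. ModelMenu and TryGetPredefinedModelId are hypothetical names, and the demo's own parsing may differ; the model IDs are the ones listed above, assumed to be in menu order (options 0-8).

```csharp
using System;
using System.Globalization;

// Hypothetical helper mirroring the menu described above; the demo's own code may differ.
public static class ModelMenu
{
    // Predefined model card IDs, assumed to be in menu order (options 0-8).
    private static readonly string[] ModelIds =
    {
        "minicpm-o",
        "qwen3-vl:2b",
        "qwen3-vl:4b",
        "qwen3-vl:8b",
        "gemma3:4b",
        "gemma3:12b",
        "ministral3:3b",
        "ministral3:8b",
        "ministral3:14b"
    };

    // Returns true and sets modelId when the input is a menu option (0-8).
    // Returns false otherwise, so the caller can treat the input as a custom model URI.
    public static bool TryGetPredefinedModelId(string input, out string modelId)
    {
        modelId = null;
        string trimmed = (input ?? string.Empty).Trim();

        int option;
        // NumberStyles.None accepts plain digits only (no sign, no separators).
        if (int.TryParse(trimmed, NumberStyles.None, CultureInfo.InvariantCulture, out option)
            && option >= 0 && option < ModelIds.Length)
        {
            modelId = ModelIds[option];
            return true;
        }

        return false;
    }
}
```

When the method returns true, the ID can be passed to ModelCard.GetPredefinedModelCardByModelID as in the snippet above; when it returns false, the raw input is used as the model URI.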


🛠️ Commands and Flow

Inside the console loop:

  • On startup

    • Select a model (0-8) or paste a custom model URI.
    • The model is downloaded (if needed) and loaded with progress reporting.
  • Per document

    • The app prompts: Enter the path to a document:

    • Type a file path and press Enter.

    • The app loads it into an Attachment (PDF, image, and other supported formats).

    • The app runs summarization:

      • var result = summarizer.Summarize(attachment);
    • The app prints:

      • Title: ...
      • Summary: ...
  • Quit

    • Submitting an empty path exits the loop.
    • The app then waits for a key press to close.
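
The per-document flow above can be sketched as a small loop. This is an illustration rather than the demo's actual code: SummaryLoop and Run are hypothetical names, and the summarize delegate stands in for the LM-Kit calls (wrapping the path in an Attachment and calling Summarize) so the control flow can be read and exercised without a model.

```csharp
using System;
using System.IO;

// Hypothetical sketch of the per-document loop; not the demo's actual code.
public static class SummaryLoop
{
    // 'summarize' stands in for the LM-Kit calls and returns (title, summary).
    // Returns the number of documents summarized successfully.
    public static int Run(TextReader input, TextWriter output,
                          Func<string, Tuple<string, string>> summarize)
    {
        int processed = 0;

        while (true)
        {
            output.Write("Enter the path to a document: ");
            string path = input.ReadLine();

            // An empty path (or end of input) ends the loop.
            if (string.IsNullOrWhiteSpace(path))
            {
                break;
            }

            try
            {
                Tuple<string, string> result = summarize(path.Trim());
                output.WriteLine("Title: " + result.Item1);
                output.WriteLine("Summary: " + result.Item2);
                processed++;
            }
            catch (Exception ex)
            {
                // An invalid or inaccessible path surfaces here; report it and keep looping.
                output.WriteLine("Error: " + ex.Message);
            }
        }

        return processed;
    }
}
```

In a console app, Console.In and Console.Out can be passed as the reader and writer, with a lambda that creates the Attachment and returns the result's Title and Summary.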

🗣️ Example Use Cases

Try the demo with:

  • A PDF report -> produce a short recap for quick review.
  • A multi-page scanned PDF -> get a summary without reading the whole scan.
  • A screenshot of a web page -> capture the key idea and main sections.
  • A photo of a document (phone capture) -> sanity-check what the model understood.
  • A receipt or invoice image -> generate a short description of the purchase and totals.
  • Multi-language content -> test guidance like “always summarize in French”.

For PDFs and images, results often depend on the source quality:

  • Higher resolution and clean contrast usually improve output.
  • Cropped images with only the relevant content usually summarize better than full-screen clutter.

⚙️ Behavior and Policies (quick reference)

  • Model selection: exactly one model per process. To change models, restart the app.

  • Primary formats: the demo is most commonly used with PDFs and images, but any format supported by Attachment can work.

  • Download and load:

    • ModelDownloadingProgress prints Downloading model XX.XX% (or bytes).
    • ModelLoadingProgress prints Loading model XX% and clears the console after download.
  • Summarization settings (as configured in the demo):

    • GenerateTitle = true
    • GenerateContent = true
    • MaxContentWords = 100
    • Guidance = "" (optional)
  • Exit condition: submitting an empty document path ends the loop.

  • Licensing:

    • You can set an optional license key via LicenseManager.SetLicenseKey("").
    • A free community license is available from the LM-Kit website.
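
The download message format described above (a percentage, or a byte count) can be sketched as a pure formatting function. ProgressText and FormatDownload are hypothetical names. The sketch assumes the callback's contentLength can be missing when the total size is unknown, which is why it is modeled as a nullable long; the parameter types are inferred from the lambda in this document, not taken from the LM-Kit.NET API reference.

```csharp
using System;
using System.Globalization;

// Hypothetical formatter; the demo's own callback may format progress differently.
public static class ProgressText
{
    // Assumption: contentLength is null (or not positive) when the total size is unknown.
    public static string FormatDownload(long? contentLength, long bytesRead)
    {
        if (contentLength.HasValue && contentLength.Value > 0)
        {
            double percent = 100.0 * bytesRead / contentLength.Value;
            // Two decimals, invariant culture, to match "XX.XX%".
            return "Downloading model "
                + percent.ToString("F2", CultureInfo.InvariantCulture) + "%";
        }

        // Fall back to a raw byte count when no percentage can be computed.
        return "Downloading model "
            + bytesRead.ToString(CultureInfo.InvariantCulture) + " bytes";
    }
}
```

A download callback could write this string to the console prefixed with a carriage return so the line updates in place, then return true as the lambdas in this document do.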

💻 Minimal Integration Snippet

using System;
using System.Text;
using LMKit.Data;
using LMKit.Model;
using LMKit.TextGeneration;

public class DocumentSummarizerSample
{
    public void SummarizeFile(string modelUri, string filePath)
    {
        Console.InputEncoding = Encoding.UTF8;
        Console.OutputEncoding = Encoding.UTF8;

        // Load the model
        var model = new LM(
            new Uri(modelUri),
            downloadingProgress: (path, contentLength, bytesRead) => true,
            loadingProgress: progress => true);

        // Create summarizer
        var summarizer = new Summarizer(model)
        {
            GenerateTitle = true,
            GenerateContent = true,
            MaxContentWords = 100,
            Guidance = "" // Example: "Always summarize in French"
        };

        // Wrap the file as an Attachment (PDF, image, etc.)
        var attachment = new Attachment(filePath);

        // Run summarization
        var result = summarizer.Summarize(attachment);

        Console.WriteLine($"Title: {result.Title}");
        Console.WriteLine($"Summary: {result.Summary}");
    }
}

Use this pattern to integrate summarization into web APIs, background workers, or desktop apps.


🛠️ Getting Started

📋 Prerequisites

  • .NET Framework 4.6.2 or .NET 8.0+

📥 Download

git clone https://github.com/LM-Kit/lm-kit-net-samples
cd lm-kit-net-samples/console_net/document_summarizer

Project link: https://github.com/LM-Kit/lm-kit-net-samples/tree/main/console_net/document_summarizer

▶️ Run

dotnet build
dotnet run

Then:

  1. Select a model by typing 0-8, or paste a custom model URI.
  2. Wait for the model to download (first run) and load.
  3. When prompted, type the path to a PDF or image (or any other supported document file).
  4. Inspect the generated Title and Summary.
  5. Enter another path to summarize the next file, or submit an empty path to exit.

🔍 Notes on Key Types

  • LM (LMKit.Model) - model wrapper used by LM-Kit.NET:

    • Accepts a Uri pointing to the model.
    • Uses callbacks for download and load progress.
  • Summarizer (LMKit.TextGeneration) - document summarization engine:

    • Summarize(Attachment) returns a result with fields like Title and Summary.
    • Controlled by properties such as GenerateTitle, GenerateContent, MaxContentWords, and Guidance.
  • Attachment (LMKit.Data) - wraps external data:

    • new Attachment(string path) loads a file from disk.
    • Common real-world usage includes PDFs and images (screenshots, scanned pages, photos).
    • Exceptions are raised when the path is invalid or inaccessible.

🔧 Extend the Demo

  • Display elapsed time (the demo already measures it with Stopwatch but does not print it yet).

  • Add PDF-focused options:

    • summarize only the first N pages
    • summarize selected pages (example: --pages 1,3-5)
    • page-by-page summaries then a final combined summary
  • Add CLI flags:

    • --max-words 200
    • --no-title
    • --guidance "Always summarize in French"
  • Add batch mode: summarize every PDF or image in a directory.

  • Write output to disk (example: output.md or output.json) instead of only printing to console.

  • Add formatting modes:

    • bullet summary
    • executive summary
    • “key takeaways” list
  • Chain with LM-Kit’s Structured Extraction to go from: document -> summary -> structured data
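
As a starting point for the CLI flags and page selection ideas above, here is a minimal parsing sketch. SummarizerOptions and CliParser are hypothetical names, and none of these flags exist in the demo today. The defaults mirror the demo's current settings (title on, 100 words, no guidance).

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;

// Hypothetical options holder; defaults mirror the demo's current settings.
public sealed class SummarizerOptions
{
    public int MaxWords { get; set; } = 100;
    public bool GenerateTitle { get; set; } = true;
    public string Guidance { get; set; } = "";
    public List<int> Pages { get; set; } = new List<int>(); // empty means "all pages"
}

public static class CliParser
{
    public static SummarizerOptions Parse(string[] args)
    {
        var options = new SummarizerOptions();
        for (int i = 0; i < args.Length; i++)
        {
            switch (args[i])
            {
                case "--no-title":
                    options.GenerateTitle = false;
                    break;
                case "--max-words":
                    options.MaxWords = int.Parse(NextValue(args, ref i),
                        NumberStyles.None, CultureInfo.InvariantCulture);
                    break;
                case "--guidance":
                    options.Guidance = NextValue(args, ref i);
                    break;
                case "--pages":
                    options.Pages = ParsePages(NextValue(args, ref i));
                    break;
                default:
                    throw new ArgumentException("Unknown option: " + args[i]);
            }
        }
        return options;
    }

    // Expands a spec such as "1,3-5" into 1, 3, 4, 5 (1-based, duplicates dropped).
    public static List<int> ParsePages(string spec)
    {
        var pages = new List<int>();
        foreach (string part in spec.Split(','))
        {
            string[] bounds = part.Split('-');
            if (bounds.Length > 2)
            {
                throw new FormatException("Invalid page range: " + part);
            }

            int first = ParsePage(bounds[0]);
            int last = bounds.Length == 2 ? ParsePage(bounds[1]) : first;
            if (last < first)
            {
                throw new FormatException("Invalid page range: " + part);
            }

            for (int page = first; page <= last; page++)
            {
                if (!pages.Contains(page))
                {
                    pages.Add(page);
                }
            }
        }
        return pages;
    }

    private static int ParsePage(string text)
    {
        int value = int.Parse(text.Trim(), NumberStyles.None, CultureInfo.InvariantCulture);
        if (value < 1)
        {
            throw new FormatException("Page numbers start at 1.");
        }
        return value;
    }

    // Advances to the value that follows a flag, or fails if it is missing.
    private static string NextValue(string[] args, ref int i)
    {
        if (i + 1 >= args.Length)
        {
            throw new ArgumentException("Missing value for " + args[i]);
        }
        i++;
        return args[i];
    }
}
```

MaxWords, GenerateTitle, and Guidance map directly onto the Summarizer properties MaxContentWords, GenerateTitle, and Guidance. How the parsed page list would be applied to an Attachment is not covered in this document and is left open here.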