👉 Try the demo: https://github.com/LM-Kit/lm-kit-net-samples/tree/main/console_net/agents/document_processing_agent

Document Processing Agent for C# .NET Applications


🎯 Purpose of the Demo

The Document Processing Agent demo shows how to build an AI agent that processes PDFs and images using the 9 built-in Document tools from the LM-Kit.NET SDK. The agent can inspect, split, merge, and render PDFs; deskew, crop, and resize scanned images; extract text from multi-format documents; and run OCR on images in 34 languages. All of this is driven by natural language instructions.


👥 Industry Target Audience

This demo is useful for developers and businesses involved in:

  • Mailroom and scanning automation: an agent that inspects bulk-scanned PDFs, splits them by page ranges, and extracts text from each segment.
  • Document preparation pipelines: deskew, crop, and resize scanned images before feeding them to vision models or OCR engines.
  • Legal and compliance: split combined filings into individual documents, merge related documents, and extract text for review.
  • Healthcare and insurance: process mixed-format claims with PDF splitting, text extraction, and OCR in a single conversational workflow.

🚀 Problem Solved

Building document processing pipelines traditionally requires writing custom code for each step: PDF parsing, image manipulation, OCR integration, and file assembly. This demo shows how an AI agent with built-in tools can orchestrate all these operations from natural language instructions, reducing a multi-script workflow to a single prompt.


💻 Sample Application Description

The Document Processing Agent demo is a console app that:

  • Lets you select a tool-calling model (or enter a custom model URI)
  • Downloads and loads the model with progress feedback
  • Creates an agent equipped with all 9 Document tools
  • Accepts natural language tasks in an interactive loop
  • Displays tool invocations in real time as the agent works
  • Shows results with execution statistics (tool calls, duration, inferences)

✨ Key Features

  • 9 built-in Document tools: PdfInfo, PdfSplit, PdfMerge, PdfRender, ImageDeskew, ImageCrop, ImageResize, DocumentText, and Ocr.
  • Natural language control: describe what you need and the agent selects the right tools.
  • Tool call monitoring: see which tools the agent invokes and what arguments it passes.
  • Multi-step workflows: chain operations like "deskew, crop, then OCR" in a single prompt.
  • Interactive console loop: process multiple tasks in one session.

💻 Minimal Integration Snippet

using LMKit.Model;
using LMKit.Agents;
using LMKit.Agents.Tools.BuiltIn;

// Load a tool-calling model
using LM model = LM.LoadFromModelID("qwen3:8b");

// Create agent with document tools
var agent = Agent.CreateBuilder(model)
    .WithPersona("Document Processing Assistant")
    .WithTools(tools =>
    {
        tools.Register(BuiltInTools.PdfSplit);
        tools.Register(BuiltInTools.PdfMerge);
        tools.Register(BuiltInTools.PdfInfo);
        tools.Register(BuiltInTools.DocumentText);
        tools.Register(BuiltInTools.Ocr);
        tools.Register(BuiltInTools.ImageDeskew);
        tools.Register(BuiltInTools.ImageCrop);
        tools.Register(BuiltInTools.ImageResize);
        tools.Register(BuiltInTools.PdfRender);
    })
    .Build();

// Run a document task
var result = await agent.RunAsync(
    "Extract pages 1-3 from 'report.pdf' into 'summary.pdf', " +
    "then get the text from page 1.");

Console.WriteLine(result.Content);

🛠️ Getting Started

📋 Prerequisites

  • .NET 8.0 or later
  • A model that supports tool calling (Qwen 3 8B needs about 6.5 GB of VRAM)

📥 Download the Project

▶️ Running the Application

  1. Clone the repository:
git clone https://github.com/LM-Kit/lm-kit-net-samples
  2. Navigate to the project directory:
cd lm-kit-net-samples/console_net/agents/document_processing_agent
  3. Build and run the application:
dotnet build
dotnet run
  4. Follow the on-screen prompts to select a model and enter document processing tasks.

💡 Example Usage

  1. Select a tool-calling model: choose Qwen 3 8B (recommended) or another tool-calling model.
  2. Enter a document task: type a natural language instruction like "Split pages 1-5 from report.pdf into intro.pdf".
  3. Watch tool invocations: the agent shows which tools it calls and their arguments.
  4. Review results: the agent reports what it did, with execution statistics.
  5. Continue processing: enter more tasks or type 'q' to quit.
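
The session flow above can be sketched as a small read-process-repeat loop. This is a minimal illustration, not the demo's actual source: RunLoopAsync and the stub delegate are hypothetical names, and the agent is replaced by a delegate so the sketch runs without a model. In a real session the delegate would presumably wrap agent.RunAsync from the integration snippet.

```csharp
using System;
using System.IO;
using System.Threading.Tasks;

// Hypothetical helper (not part of the demo source): reads tasks line by line,
// stops on 'q' or end of input, and forwards each non-empty task to runTask.
// With a real agent, runTask could be something like:
//   async task => (await agent.RunAsync(task)).Content
static async Task<int> RunLoopAsync(
    TextReader input, TextWriter output, Func<string, Task<string>> runTask)
{
    int processed = 0;
    while (true)
    {
        output.Write("Task> ");
        var line = input.ReadLine();
        if (line is null) break;                  // end of input
        line = line.Trim();
        if (line.Length == 0) continue;           // ignore blank lines
        if (line.Equals("q", StringComparison.OrdinalIgnoreCase)) break;

        string answer = await runTask(line);      // one agent run per task
        output.WriteLine(answer);
        processed++;
    }
    return processed;
}

// Stand-in for the agent so the sketch runs without loading a model.
Func<string, Task<string>> stubAgent =
    task => Task.FromResult($"[stub] would process: {task}");

// Scripted input instead of the keyboard: one task, then 'q' to quit.
var scripted = new StringReader("How many pages does 'contract.pdf' have?\nq\n");
int count = await RunLoopAsync(scripted, Console.Out, stubAgent);
Console.WriteLine($"Tasks processed: {count}");
```

Passing Console.In instead of the scripted reader turns this into an interactive loop. Keeping the agent behind a delegate also makes the loop easy to exercise in tests.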

Example Prompts

  • How many pages does 'contract.pdf' have?
  • Extract pages 1-3 from 'report.pdf' into 'summary.pdf'
  • Merge 'part1.pdf' and 'part2.pdf' into 'combined.pdf'
  • Deskew 'scan.png', crop its borders, then resize to 1200x1600
  • Run OCR on 'receipt.jpg' in French
  • Render page 5 of 'manual.pdf' as a PNG at 2x zoom
  • Extract text from 'quarterly_report.docx'

🔍 Notes on Key Types

  • Agent (LMKit.Agents): the AI agent that reasons about tasks and calls tools autonomously. Created via Agent.CreateBuilder(model).

  • BuiltInTools (LMKit.Agents.Tools.BuiltIn): static factory class providing access to all 65 built-in tools. The Document category includes 9 tools for PDF and image processing.

  • AgentExecutionResult (LMKit.Agents): result of agent execution with Content (final answer), IsSuccess, ToolCalls (list of tool invocations), and Duration.
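
As an illustration of how the AgentExecutionResult members listed above might feed the execution statistics line, here is a small formatting sketch. FormatStats is a hypothetical helper, and the member types are assumptions: the ToolCalls list is reduced to a count and Duration is treated as a TimeSpan. Check the SDK reference for the actual types before relying on this.

```csharp
using System;
using System.Globalization;

// Hypothetical formatter (not from the demo source). The parameters mirror
// IsSuccess, ToolCalls (as a count), and Duration; their types are assumed.
static string FormatStats(bool isSuccess, int toolCallCount, TimeSpan duration)
{
    string status = isSuccess ? "succeeded" : "failed";
    // Invariant culture keeps the decimal separator stable across machines.
    string seconds = duration.TotalSeconds.ToString("0.0", CultureInfo.InvariantCulture);
    return $"Task {status}: {toolCallCount} tool call(s) in {seconds} s";
}

// Made-up values for illustration; in the demo they would come from the
// result returned by agent.RunAsync(...).
Console.WriteLine(FormatStats(true, 3, TimeSpan.FromMilliseconds(4200)));
```

Keeping the formatter independent of the SDK types means only the call site needs adjusting if the real members differ from these assumptions.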


🔧 Extend the Demo

  • Add BuiltInTools.Json and BuiltInTools.Csv to parse extracted text into structured data.
  • Combine with DocumentSplitting to detect document boundaries before agent-driven splitting.
  • Use BuiltInTools.FileSystem to let the agent browse folders and process all PDFs in a directory.
  • Add BuiltInTools.WebSearch for an agent that can look up document metadata online.
  • Integrate with a multi-agent workflow where one agent preprocesses images and another extracts data.