👉 Try the demo: https://github.com/LM-Kit/lm-kit-net-samples/tree/main/console_net/agents/document_processing_agent

Document Processing Agent for C# .NET Applications

Purpose of the Demo

The Document Processing Agent demo shows how to build an AI agent that processes PDFs and images using the built-in Document tools from the LM-Kit.NET SDK. The agent can inspect, split, merge, and render PDFs; deskew, crop, and resize scanned images; extract text from multi-format documents; and run OCR on images in 34 languages, all driven by natural language instructions.

👥 Who Should Use This Demo

This demo is useful for developers and businesses involved in:

Mailroom and scanning automation: an agent that inspects bulk-scanned PDFs, splits them by page ranges, and extracts text from each segment.
Document preparation pipelines: deskew, crop, and resize scanned images before feeding them to vision models or OCR engines.
Legal and compliance: split combined filings into individual documents, merge related documents, and extract text for review.
Healthcare and insurance: process mixed-format claims with PDF splitting, text extraction, and OCR in a single conversational workflow.

🚀 What Problem It Solves

Building document processing pipelines traditionally requires writing custom code for each step: PDF parsing, image manipulation, OCR integration, and file assembly. This demo shows how an AI agent with built-in tools can orchestrate all these operations from natural language instructions, reducing a multi-script workflow to a single prompt.

💻 Demo Application Overview

The Document Processing Agent demo is a console app that:

Lets you select a tool-calling model (or enter a custom model URI)
Downloads and loads the model with progress feedback
Creates an agent equipped with the built-in Document tools
Accepts natural language tasks in an interactive loop
Displays tool invocations in real time as the agent works
Shows results with execution statistics (tool calls, duration, inferences)

Key Features

Built-in Document tools for PDF operations (split, merge, info, render, unlock), image preprocessing (deskew, crop, resize), image-to-PDF conversion, text extraction, and OCR.
Natural language control: describe what you need and the agent selects the right tools.
Tool call monitoring: see which tools the agent invokes and what arguments it passes.
Multi-step workflows: chain operations like "deskew, crop, then OCR" in a single prompt.
Interactive console loop: process multiple tasks in one session.

Minimal Integration Snippet

using LMKit.Model;
using LMKit.Agents;
using LMKit.Agents.Tools.BuiltIn;

// Load a tool-calling model
using LM model = LM.LoadFromModelID("qwen3:8b");

// Create an agent equipped with the built-in Document tools
var agent = Agent.CreateBuilder(model)
    .WithPersona("Document Processing Assistant")
    .WithTools(tools =>
    {
        // PDF operations
        tools.Register(BuiltInTools.PdfSplit);
        tools.Register(BuiltInTools.PdfMerge);
        tools.Register(BuiltInTools.PdfMetadata);
        tools.Register(BuiltInTools.PdfToImage);
        tools.Register(BuiltInTools.PdfUnlock);

        // Image preprocessing and conversion
        tools.Register(BuiltInTools.ImageDeskew);
        tools.Register(BuiltInTools.ImageCrop);
        tools.Register(BuiltInTools.ImageResize);
        tools.Register(BuiltInTools.ImageToPdf);

        // Text extraction and OCR
        tools.Register(BuiltInTools.DocumentTextExtract);
        tools.Register(BuiltInTools.OcrRecognize);
    })
    .Build();

// Run a document task
var result = await agent.RunAsync(
    "Extract pages 1-3 from 'report.pdf' into 'summary.pdf', " +
    "then get the text from page 1.");

Console.WriteLine(result.Content);

⚙️ Getting Started

Prerequisites

.NET 8.0 or later
A model that supports tool calling (about 6.5 GB of VRAM for Qwen 3 8B)

Download the Project

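.NET Console Demo: https://github.com/LM-Kit/lm-kit-net-samples/tree/main/console_net/agents/document_processing_agent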

Running the Application

Clone the repository:

git clone https://github.com/LM-Kit/lm-kit-net-samples

Navigate to the project directory:

cd lm-kit-net-samples/console_net/agents/document_processing_agent

Build and run the application:

dotnet build
dotnet run

Follow the on-screen prompts to select a model and enter document processing tasks.

Example Usage

Select a tool-calling model: choose Qwen 3 8B (recommended) or another tool-calling model.
Enter a document task: type a natural language instruction like "Split pages 1-5 from report.pdf into intro.pdf".
Watch tool invocations: the agent shows which tools it calls and their arguments.
Review results: the agent reports what it did, with execution statistics.
Continue processing: enter more tasks or type 'q' to quit.
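The interactive loop described above can be approximated in a few lines. This is a minimal sketch, not the demo's actual source: it reuses only the calls shown in the integration snippet and the AgentExecutionResult members listed under Notes on Key Types, registers a reduced tool set for brevity, and assumes that ToolCalls exposes a Count property (it is documented as a list of tool invocations).

```csharp
using System;
using LMKit.Model;
using LMKit.Agents;
using LMKit.Agents.Tools.BuiltIn;

// Load a tool-calling model (same model ID as the integration snippet)
using LM model = LM.LoadFromModelID("qwen3:8b");

// Reduced tool set for the sketch; register the other Document tools the same way
var agent = Agent.CreateBuilder(model)
    .WithPersona("Document Processing Assistant")
    .WithTools(tools =>
    {
        tools.Register(BuiltInTools.PdfSplit);
        tools.Register(BuiltInTools.PdfMerge);
        tools.Register(BuiltInTools.DocumentTextExtract);
        tools.Register(BuiltInTools.OcrRecognize);
    })
    .Build();

while (true)
{
    Console.Write("Task ('q' to quit): ");
    string? task = Console.ReadLine();

    // End the session on 'q' or when standard input is closed
    if (task is null || task.Trim().Equals("q", StringComparison.OrdinalIgnoreCase))
        break;

    // Skip empty lines instead of sending them to the agent
    if (string.IsNullOrWhiteSpace(task))
        continue;

    var result = await agent.RunAsync(task);

    Console.WriteLine(result.Content);

    // Assumption: ToolCalls has a Count property (documented as a list of invocations)
    Console.WriteLine(
        $"Success: {result.IsSuccess} | Tool calls: {result.ToolCalls.Count} | Duration: {result.Duration}");
}
```

Each task runs through the same agent instance, so the model is loaded only once per session.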

Example Prompts

How many pages does 'contract.pdf' have?
Extract pages 1-3 from 'report.pdf' into 'summary.pdf'
Merge 'part1.pdf' and 'part2.pdf' into 'combined.pdf'
Deskew 'scan.png', crop its borders, then resize to 1200x1600
Run OCR on 'receipt.jpg' in French
Render page 5 of 'manual.pdf' as a PNG at 2x zoom
Extract text from 'quarterly_report.docx'

Notes on Key Types

Agent (LMKit.Agents): the AI agent that reasons about tasks and calls tools autonomously. Created via Agent.CreateBuilder(model).
BuiltInTools (LMKit.Agents.Tools.BuiltIn): static factory class providing access to all built-in tools. The Document category includes tools for PDF and image processing.
AgentExecutionResult (LMKit.Agents): result of agent execution with Content (final answer), IsSuccess, ToolCalls (list of tool invocations), and Duration.

🚀 Extend the Demo

Add BuiltInTools.JsonParse and BuiltInTools.CsvParse to parse extracted text into structured data.
Combine with DocumentSplitting to detect document boundaries before agent-driven splitting.
Use BuiltInTools.FileSystemRead and BuiltInTools.FileSystemList to let the agent browse folders and process all PDFs in a directory.
Add BuiltInTools.WebSearch for an agent that can look up document metadata online.
Integrate with a multi-agent workflow where one agent preprocesses images and another extracts data.
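For the folder-browsing extension, one option is to register the file-system tools next to the document tools and describe the batch job in a single prompt. This is a minimal sketch, assuming BuiltInTools.FileSystemList and BuiltInTools.FileSystemRead (named above) are registered the same way as the Document tools; the folder name 'incoming' and the prompt wording are hypothetical.

```csharp
using System;
using LMKit.Model;
using LMKit.Agents;
using LMKit.Agents.Tools.BuiltIn;

using LM model = LM.LoadFromModelID("qwen3:8b");

var agent = Agent.CreateBuilder(model)
    .WithPersona("Document Processing Assistant")
    .WithTools(tools =>
    {
        // File-system tools let the agent discover input files on its own
        tools.Register(BuiltInTools.FileSystemList);
        tools.Register(BuiltInTools.FileSystemRead);

        // Document tools perform the actual PDF work
        tools.Register(BuiltInTools.PdfMetadata);
        tools.Register(BuiltInTools.DocumentTextExtract);
    })
    .Build();

// 'incoming' is a hypothetical folder name used for illustration
var result = await agent.RunAsync(
    "List the PDF files in the 'incoming' folder, report the page count of each, " +
    "and extract the text from the first page of every file.");

Console.WriteLine(result.Content);
Console.WriteLine($"Succeeded: {result.IsSuccess}");
```

Letting the agent list the folder keeps the host code small; for very large folders, enumerating files in C# and running one task per file gives more predictable behavior.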

Additional Resources

How-To: Process Documents with Built-In Tools: Guide to using the Document category tools for PDF and image processing.
How-To: Equip Agent with Built-In Tools: Learn how to register and configure built-in tools on agents.
Glossary: AI Agent Tools: Core concepts behind tool registration, invocation, and result handling.
Data Analyst Agent Demo: A companion demo that uses built-in Data and IO tools for CSV, JSON, and XML analysis.
