👉 Try the demo: https://github.com/LM-Kit/lm-kit-net-samples/tree/main/console_net/agents/document_processing_agent
Document Processing Agent for C# .NET Applications
🎯 Purpose of the Demo
The Document Processing Agent demo shows how to build an AI agent that processes PDFs and images using 9 built-in Document tools from the LM-Kit.NET SDK. The agent can inspect, split, merge, and render PDFs; deskew, crop, and resize scanned images; extract text from multi-format documents; and run OCR on images in 34 languages, all driven by natural language instructions.
👥 Industry Target Audience
This demo is useful for developers and businesses involved in:
- Mailroom and scanning automation: an agent that inspects bulk-scanned PDFs, splits them by page ranges, and extracts text from each segment.
- Document preparation pipelines: deskew, crop, and resize scanned images before feeding them to vision models or OCR engines.
- Legal and compliance: split combined filings into individual documents, merge related documents, and extract text for review.
- Healthcare and insurance: process mixed-format claims with PDF splitting, text extraction, and OCR in a single conversational workflow.
🚀 Problem Solved
Building document processing pipelines traditionally requires writing custom code for each step: PDF parsing, image manipulation, OCR integration, and file assembly. This demo shows how an AI agent with built-in tools can orchestrate all these operations from natural language instructions, reducing a multi-script workflow to a single prompt.
💻 Sample Application Description
The Document Processing Agent demo is a console app that:
- Lets you select a tool-calling model (or enter a custom model URI)
- Downloads and loads the model with progress feedback
- Creates an agent equipped with all 9 Document tools
- Accepts natural language tasks in an interactive loop
- Displays tool invocations in real time as the agent works
- Shows results with execution statistics (tool calls, duration, inferences)
✨ Key Features
- 9 built-in Document tools: PdfInfo, PdfSplit, PdfMerge, PdfRender, ImageDeskew, ImageCrop, ImageResize, DocumentText, and Ocr.
- Natural language control: describe what you need and the agent selects the right tools.
- Tool call monitoring: see which tools the agent invokes and what arguments it passes.
- Multi-step workflows: chain operations like "deskew, crop, then OCR" in a single prompt.
- Interactive console loop: process multiple tasks in one session.
💻 Minimal Integration Snippet
using LMKit.Model;
using LMKit.Agents;
using LMKit.Agents.Tools.BuiltIn;

// Load a tool-calling model
using LM model = LM.LoadFromModelID("qwen3:8b");

// Create agent with document tools
var agent = Agent.CreateBuilder(model)
    .WithPersona("Document Processing Assistant")
    .WithTools(tools =>
    {
        tools.Register(BuiltInTools.PdfSplit);
        tools.Register(BuiltInTools.PdfMerge);
        tools.Register(BuiltInTools.PdfInfo);
        tools.Register(BuiltInTools.DocumentText);
        tools.Register(BuiltInTools.Ocr);
        tools.Register(BuiltInTools.ImageDeskew);
        tools.Register(BuiltInTools.ImageCrop);
        tools.Register(BuiltInTools.ImageResize);
        tools.Register(BuiltInTools.PdfRender);
    })
    .Build();

// Run a document task
var result = await agent.RunAsync(
    "Extract pages 1-3 from 'report.pdf' into 'summary.pdf', " +
    "then get the text from page 1.");

Console.WriteLine(result.Content);
🛠️ Getting Started
📋 Prerequisites
- .NET 8.0 or later
- A tool-calling capable model (~6.5 GB VRAM for Qwen 3 8B)
📥 Download the Project
▶️ Running the Application
- Clone the repository:
git clone https://github.com/LM-Kit/lm-kit-net-samples
- Navigate to the project directory:
cd lm-kit-net-samples/console_net/agents/document_processing_agent
- Build and run the application:
dotnet build
dotnet run
- Follow the on-screen prompts to select a model and enter document processing tasks.
💡 Example Usage
- Select a tool-calling model: choose Qwen 3 8B (recommended) or another tool-calling model.
- Enter a document task: type a natural language instruction like "Split pages 1-5 from report.pdf into intro.pdf".
- Watch tool invocations: the agent shows which tools it calls and their arguments.
- Review results: the agent reports what it did, with execution statistics.
- Continue processing: enter more tasks or type 'q' to quit.
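The session described above can be approximated with a small read-run-print loop. The following is a minimal sketch rather than the sample's actual source: it uses only the members named on this page (Agent.CreateBuilder, RunAsync, Content, IsSuccess, ToolCalls, Duration), assumes that ToolCalls can be enumerated with foreach, and registers just three of the nine tools for brevity. Real-time display of tool invocations is not covered here, since the mechanism the sample uses for that is not described on this page.

```csharp
using System;
using LMKit.Agents;
using LMKit.Agents.Tools.BuiltIn;
using LMKit.Model;

// Load a tool-calling model and build the agent, as in the integration snippet.
using LM model = LM.LoadFromModelID("qwen3:8b");

var agent = Agent.CreateBuilder(model)
    .WithPersona("Document Processing Assistant")
    .WithTools(tools =>
    {
        tools.Register(BuiltInTools.PdfInfo);
        tools.Register(BuiltInTools.PdfSplit);
        tools.Register(BuiltInTools.DocumentText);
    })
    .Build();

while (true)
{
    Console.Write("Task (or 'q' to quit): ");
    string? input = Console.ReadLine();

    // Stop on end of input or on an explicit quit command.
    if (input is null || input.Trim().Equals("q", StringComparison.OrdinalIgnoreCase))
        break;

    // Ignore empty lines and prompt again.
    if (string.IsNullOrWhiteSpace(input))
        continue;

    var result = await agent.RunAsync(input);

    if (!result.IsSuccess)
    {
        Console.WriteLine("The agent could not complete the task.");
        continue;
    }

    Console.WriteLine(result.Content);

    // Count invocations by enumerating (assumption: ToolCalls supports foreach),
    // which avoids relying on a specific collection type.
    int toolCallCount = 0;
    foreach (var call in result.ToolCalls)
        toolCallCount++;

    Console.WriteLine($"Tool calls: {toolCallCount}, duration: {result.Duration}");
}
```

Each iteration sends one instruction to the same agent, so several tasks can be processed in a single session without reloading the model.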
Example Prompts
- How many pages does 'contract.pdf' have?
- Extract pages 1-3 from 'report.pdf' into 'summary.pdf'
- Merge 'part1.pdf' and 'part2.pdf' into 'combined.pdf'
- Deskew 'scan.png', crop its borders, then resize to 1200x1600
- Run OCR on 'receipt.jpg' in French
- Render page 5 of 'manual.pdf' as a PNG at 2x zoom
- Extract text from 'quarterly_report.docx'
🔍 Notes on Key Types
- Agent (LMKit.Agents): the AI agent that reasons about tasks and calls tools autonomously. Created via Agent.CreateBuilder(model).
- BuiltInTools (LMKit.Agents.Tools.BuiltIn): static factory class providing access to all 65 built-in tools. The Document category includes 9 tools for PDF and image processing.
- AgentExecutionResult (LMKit.Agents): result of agent execution with Content (final answer), IsSuccess, ToolCalls (list of tool invocations), and Duration.
🔧 Extend the Demo
- Add BuiltInTools.Json and BuiltInTools.Csv to parse extracted text into structured data.
- Combine with DocumentSplitting to detect document boundaries before agent-driven splitting.
- Use BuiltInTools.FileSystem to let the agent browse folders and process all PDFs in a directory.
- Add BuiltInTools.WebSearch for an agent that can look up document metadata online.
- Integrate with a multi-agent workflow where one agent preprocesses images and another extracts data.
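As an illustration of the first suggestion, the sketch below adds the JSON and CSV tools next to two of the Document tools. It assumes that BuiltInTools.Json and BuiltInTools.Csv are registered with the same tools.Register(...) call used for the Document tools; that pattern is inferred from the integration snippet and should be checked against the SDK reference. The file name 'invoice.pdf' and the prompt are hypothetical.

```csharp
using System;
using LMKit.Agents;
using LMKit.Agents.Tools.BuiltIn;
using LMKit.Model;

using LM model = LM.LoadFromModelID("qwen3:8b");

var agent = Agent.CreateBuilder(model)
    .WithPersona("Document Processing Assistant")
    .WithTools(tools =>
    {
        // Document tools already used by the demo.
        tools.Register(BuiltInTools.DocumentText);
        tools.Register(BuiltInTools.Ocr);

        // Assumption: these tools are registered the same way as the Document tools.
        tools.Register(BuiltInTools.Json);
        tools.Register(BuiltInTools.Csv);
    })
    .Build();

// Hypothetical task that combines text extraction with structured output.
var result = await agent.RunAsync(
    "Extract the text from 'invoice.pdf' and return the invoice number " +
    "and total amount as JSON.");

Console.WriteLine(result.Content);
```

Keeping the tool list small gives the model fewer options to choose from, so it may be worth registering only the tools a given workflow needs.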