Process PDFs and Images with Built-In Document Tools
AI agents can do more than chat. With the right tools, they can split PDFs, merge files, extract text, run OCR, and preprocess scanned images, all from natural language instructions. LM-Kit.NET ships 9 built-in Document tools that turn an agent into a document processing assistant. This tutorial builds agents that handle real-world document workflows: splitting bulk scans, preprocessing images for vision pipelines, extracting text, and assembling output files.
Why Agent-Based Document Processing Matters
Agent-driven document processing addresses two common enterprise problems:
- Mailroom automation without custom code. Bulk-scanned mail arrives as multi-page PDFs mixing invoices, contracts, and ID cards. An agent equipped with PdfInfo, PdfSplit, and DocumentText can inspect the file, split it by page ranges, and extract text from each segment. The entire workflow is described in a single prompt instead of a multi-step script.
- Image preprocessing before vision analysis. Scanned documents need deskewing, border cropping, and resizing before a Vision Language Model can process them accurately. An agent with ImageDeskew, ImageCrop, and ImageResize handles this pipeline autonomously, adapting to each image's characteristics.
Prerequisites
| Requirement | Minimum |
|---|---|
| .NET SDK | 8.0+ |
| VRAM | 6+ GB (for a tool-calling model like Qwen 3 8B) |
| Model | Must support tool calling (model.HasToolCalls == true) |
Step 1: Create the Project
dotnet new console -n DocumentToolsAgent
cd DocumentToolsAgent
dotnet add package LM-Kit.NET
Step 2: Understand the Document Tools
All 9 Document tools are accessible through the BuiltInTools static class. They operate on local files and cover the full document lifecycle:
┌────────────────────────────────────────────────────────┐
│                Document Tools Ecosystem                │
├────────────────────────────────────────────────────────┤
│                                                        │
│  PDF Operations   Image Processing   Content           │
│  ┌────────────┐   ┌──────────────┐   ┌──────────────┐  │
│  │ PdfInfo    │   │ ImageDeskew  │   │ DocumentText │  │
│  │ PdfSplit   │   │ ImageCrop    │   │ Ocr          │  │
│  │ PdfMerge   │   │ ImageResize  │   └──────────────┘  │
│  │ PdfRender  │   └──────────────┘                     │
│  └────────────┘                                        │
│                                                        │
└────────────────────────────────────────────────────────┘
| Tool | Name | Description |
|---|---|---|
| BuiltInTools.PdfInfo | pdf_info | Page count, dimensions, metadata, and text content from PDFs |
| BuiltInTools.PdfSplit | pdf_split | Extract pages by range, split PDFs into separate files |
| BuiltInTools.PdfMerge | pdf_merge | Merge multiple PDF files into a single output |
| BuiltInTools.PdfRender | pdf_render | Render PDF pages as PNG or BMP images with configurable zoom |
| BuiltInTools.ImageDeskew | image_deskew | Detect and correct skew in scanned documents |
| BuiltInTools.ImageCrop | image_crop | Auto-crop uniform borders from scanned images |
| BuiltInTools.ImageResize | image_resize | Resize images, fit to bounding box, convert pixel formats |
| BuiltInTools.DocumentText | document_text | Extract text from PDF, DOCX, XLSX, PPTX, and HTML files |
| BuiltInTools.Ocr | ocr | Extract text from images using Tesseract OCR (34 languages) |
Step 3: Build a PDF Processing Agent
Start with an agent that can inspect, split, and merge PDFs:
using System.Text;
using LMKit.Model;
using LMKit.Agents;
using LMKit.Agents.Tools.BuiltIn;

LMKit.Licensing.LicenseManager.SetLicenseKey("");
Console.InputEncoding = Encoding.UTF8;
Console.OutputEncoding = Encoding.UTF8;

// ──────────────────────────────────────
// 1. Load a tool-calling model
// ──────────────────────────────────────
Console.WriteLine("Loading model...");
using LM model = LM.LoadFromModelID("qwen3:8b",
    downloadingProgress: (_, len, read) =>
    {
        if (len.HasValue) Console.Write($"\r Downloading: {(double)read / len.Value * 100:F1}% ");
        return true;
    },
    loadingProgress: p => { Console.Write($"\r Loading: {p * 100:F0}% "); return true; });
Console.WriteLine("\n");

// ──────────────────────────────────────
// 2. Create a PDF processing agent
// ──────────────────────────────────────
var pdfAgent = Agent.CreateBuilder(model)
    .WithPersona("PDF Processing Agent")
    .WithInstruction(
        "You process PDF documents. You can inspect PDFs for page count and metadata, " +
        "split them by page ranges into separate files, merge multiple PDFs into one, " +
        "render pages as images, and extract text content.")
    .WithTools(tools =>
    {
        tools.Register(BuiltInTools.PdfInfo);
        tools.Register(BuiltInTools.PdfSplit);
        tools.Register(BuiltInTools.PdfMerge);
        tools.Register(BuiltInTools.PdfRender);
        tools.Register(BuiltInTools.DocumentText);
    })
    .WithMaxIterations(10)
    .Build();

// ──────────────────────────────────────
// 3. Run a document workflow
// ──────────────────────────────────────
var result = await pdfAgent.RunAsync(
    "Check how many pages 'contract.pdf' has, then extract pages 1-3 " +
    "into 'summary.pdf' and pages 4-10 into 'appendix.pdf'.");
Console.WriteLine(result.Content);
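The prompt above hard-codes its page ranges. When the ranges come from data instead, it may help to validate them and assemble the prompt in code before handing it to the agent. The following is a minimal sketch in plain C#; `BuildSplitPrompt` and its parameters are hypothetical names chosen for illustration and are not part of LM-Kit.NET.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper (not an LM-Kit.NET API): turns a list of page segments
// into one natural-language instruction for the agent.
static string BuildSplitPrompt(
    string sourcePdf,
    int pageCount,
    IReadOnlyList<(int First, int Last, string Output)> segments)
{
    foreach (var (first, last, output) in segments)
    {
        // Reject ranges that fall outside the document before the agent sees them.
        if (first < 1 || last < first || last > pageCount)
            throw new ArgumentOutOfRangeException(
                nameof(segments), $"Invalid range {first}-{last} for '{output}'.");
    }

    var parts = segments.Select(s => $"pages {s.First}-{s.Last} into '{s.Output}'");
    return $"The file '{sourcePdf}' has {pageCount} pages. Extract " +
           string.Join(", ", parts) + ".";
}

string prompt = BuildSplitPrompt("contract.pdf", 10, new[]
{
    (1, 3, "summary.pdf"),
    (4, 10, "appendix.pdf"),
});
Console.WriteLine(prompt);
```

The resulting string can be passed to RunAsync in place of the literal prompt. Validating ranges up front keeps an impossible request (for example, page 12 of a 10-page file) from consuming agent iterations.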
Step 4: Build an Image Preprocessing Agent
For scanned documents, combine image tools to clean up before OCR or vision analysis:
var imageAgent = Agent.CreateBuilder(model)
    .WithPersona("Image Preprocessing Agent")
    .WithInstruction(
        "You preprocess scanned document images. First deskew to correct rotation, " +
        "then crop borders, then resize if needed. Report what corrections were applied.")
    .WithTools(tools =>
    {
        tools.Register(BuiltInTools.ImageDeskew);
        tools.Register(BuiltInTools.ImageCrop);
        tools.Register(BuiltInTools.ImageResize);
    })
    .WithMaxIterations(10)
    .Build();

// Named imageResult so this snippet can share a Program.cs with the Step 3 code,
// which already declares a variable called result.
var imageResult = await imageAgent.RunAsync(
    "Clean up the scanned page at 'scan_001.png': " +
    "correct any rotation, remove white borders, " +
    "then resize to fit within 1200x1600 pixels preserving aspect ratio.");
Console.WriteLine(imageResult.Content);
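To check the agent's report, it helps to know what "fit within 1200x1600 pixels preserving aspect ratio" should produce. The sketch below shows the usual arithmetic in plain C#. It is illustrative only and makes no claim about how ImageResize is implemented; the `FitWithin` name and the choice never to upscale are assumptions of this example.

```csharp
using System;

// Illustrative arithmetic only: this is not the ImageResize implementation.
static (int Width, int Height) FitWithin(int width, int height, int maxWidth, int maxHeight)
{
    // Use the smaller of the two scale factors so both dimensions fit the box.
    double scale = Math.Min((double)maxWidth / width, (double)maxHeight / height);

    // Assumption of this sketch: never enlarge an image that already fits.
    scale = Math.Min(scale, 1.0);

    return ((int)Math.Round(width * scale), (int)Math.Round(height * scale));
}

// An A4 page scanned at 300 DPI is 2480x3508 pixels.
var (w, h) = FitWithin(2480, 3508, 1200, 1600);
Console.WriteLine($"{w}x{h}"); // prints "1131x1600"
```

Here the height is the limiting dimension, so the page scales to the full 1600 pixels of height while the width lands below the 1200-pixel limit.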
Step 5: Build a Full Document Processing Agent
Combine all 9 tools for end-to-end workflows: inspect, preprocess, extract, split, and merge.
var docAgent = Agent.CreateBuilder(model)
    .WithPersona("Document Processing Assistant")
    .WithInstruction(
        "You are a document processing assistant. You can:\n" +
        "- Inspect PDFs for page count, metadata, and text content\n" +
        "- Split PDFs by page ranges into separate files\n" +
        "- Merge multiple PDFs into one\n" +
        "- Render PDF pages as images\n" +
        "- Deskew, crop, and resize scanned images\n" +
        "- Extract text from documents (PDF, DOCX, XLSX, PPTX, HTML)\n" +
        "- Run OCR on images to extract text (supports 34 languages)\n\n" +
        "Always confirm what you did and report any issues.")
    .WithTools(tools =>
    {
        tools.Register(BuiltInTools.PdfInfo);
        tools.Register(BuiltInTools.PdfSplit);
        tools.Register(BuiltInTools.PdfMerge);
        tools.Register(BuiltInTools.PdfRender);
        tools.Register(BuiltInTools.ImageDeskew);
        tools.Register(BuiltInTools.ImageCrop);
        tools.Register(BuiltInTools.ImageResize);
        tools.Register(BuiltInTools.DocumentText);
        tools.Register(BuiltInTools.Ocr);
    })
    .WithMaxIterations(15)
    .Build();
Example Prompts
// Split a bulk scan into individual documents
var r1 = await docAgent.RunAsync(
    "The file 'batch_scan.pdf' has 10 pages. Extract pages 1-3 into 'invoice.pdf', " +
    "pages 4-7 into 'contract.pdf', and pages 8-10 into 'receipt.pdf'.");

// Preprocess and OCR a scanned document
var r2 = await docAgent.RunAsync(
    "Deskew 'scan.png', crop its borders, then run OCR on it in French.");

// Merge and extract text
var r3 = await docAgent.RunAsync(
    "Merge 'part1.pdf' and 'part2.pdf' into 'combined.pdf', " +
    "then extract the text from all pages.");

// Render a PDF page for visual inspection
var r4 = await docAgent.RunAsync(
    "Render page 1 of 'report.pdf' as a PNG at 2x zoom.");
Step 6: Combine with Data Tools for Richer Workflows
Document tools work well alongside other built-in tool categories. For example, combine PDF extraction with JSON parsing for structured output:
var dataDocAgent = Agent.CreateBuilder(model)
    .WithPersona("Data Extraction Agent")
    .WithInstruction(
        "You extract and process data from documents. " +
        "Extract text from files, parse structured data, and validate results.")
    .WithTools(tools =>
    {
        // Document tools
        tools.Register(BuiltInTools.DocumentText);
        tools.Register(BuiltInTools.PdfInfo);
        tools.Register(BuiltInTools.Ocr);

        // Data tools for post-processing
        tools.Register(BuiltInTools.Json);
        tools.Register(BuiltInTools.Csv);
        tools.Register(BuiltInTools.Regex);
        tools.Register(BuiltInTools.Validator);
    })
    .WithMaxIterations(10)
    .Build();

// Named dataResult so this snippet can share a Program.cs with the Step 3 code,
// which already declares a variable called result.
var dataResult = await dataDocAgent.RunAsync(
    "Extract text from 'invoices.pdf', find all dollar amounts using regex, " +
    "and output them as a JSON array.");
Console.WriteLine(dataResult.Content);
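Agent output can vary between runs, so a deterministic baseline is useful for spot-checking the answer. The sketch below performs the same "dollar amounts to JSON array" step with the standard .NET Regex and JsonSerializer classes. The regular expression and the `ExtractDollarAmountsAsJson` helper are assumptions made for illustration; they do not show what the built-in Regex or Json tools do internally.

```csharp
using System;
using System.Linq;
using System.Text.Json;
using System.Text.RegularExpressions;

// Assumed pattern: a dollar sign, digits with optional thousands separators,
// and an optional two-digit cents part. Real invoices may need a broader pattern.
static string ExtractDollarAmountsAsJson(string text)
{
    var pattern = new Regex(@"\$\d+(?:,\d{3})*(?:\.\d{2})?");
    string[] amounts = pattern.Matches(text).Select(m => m.Value).ToArray();
    return JsonSerializer.Serialize(amounts);
}

// Sample text standing in for the output of a text extraction step.
string sample = "Invoice 1001: $1,250.00 due. Invoice 1002: $89.99 paid. Late fee: $15";
Console.WriteLine(ExtractDollarAmountsAsJson(sample));
```

Comparing this baseline with the agent's JSON array on a few known documents is a quick way to judge whether the chosen model handles the workflow reliably.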
Step 7: Monitor Tool Invocations
Track which document tools the agent calls during execution:
docAgent.ToolInvoked += (sender, e) =>
{
    Console.ForegroundColor = ConsoleColor.DarkGray;
    Console.WriteLine($" [TOOL] {e.ToolName}");

    if (e.Arguments != null)
    {
        // Locals are named argsPreview/resultPreview to avoid clashing with
        // names already in scope in a top-level program (args, result).
        string argsPreview = e.Arguments.Length > 100
            ? e.Arguments.Substring(0, 100) + "..."
            : e.Arguments;
        Console.WriteLine($" args: {argsPreview}");
    }

    if (e.Result != null)
    {
        string resultPreview = e.Result.Length > 120
            ? e.Result.Substring(0, 120) + "..."
            : e.Result;
        Console.WriteLine($" result: {resultPreview}");
    }

    Console.ResetColor();
};
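The handler above repeats the same truncation logic for arguments and results. One way to tidy it up is a small helper, sketched here in plain C#. `Truncate` is a hypothetical name introduced for this example and is not part of LM-Kit.NET.

```csharp
using System;

// Hypothetical helper (not part of LM-Kit.NET): shortens long strings for log output.
static string Truncate(string? value, int maxLength)
{
    // Treat null the same as empty so callers do not need their own null checks.
    if (string.IsNullOrEmpty(value)) return string.Empty;

    return value.Length > maxLength
        ? value.Substring(0, maxLength) + "..."
        : value;
}

// Short values pass through unchanged.
Console.WriteLine(Truncate("{\"filePath\":\"contract.pdf\"}", 100));

// Long values are cut to maxLength characters plus a three-character ellipsis.
Console.WriteLine(Truncate(new string('x', 150), 120).Length); // prints 123
```

Inside the event handler, the two conditional expressions could then become Truncate(e.Arguments, 100) and Truncate(e.Result, 120).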
Tool Reference
PDF Tools
| Tool | Operations | Key Parameters |
|---|---|---|
| PdfInfo | info, metadata, pages, text | filePath, pageRange |
| PdfSplit | split, info | inputPath, outputPath, pageRange |
| PdfMerge | merge | inputPaths[], outputPath |
| PdfRender | render | filePath, page, zoom, outputPath, format |
Image Tools
| Tool | Operations | Key Parameters |
|---|---|---|
| ImageDeskew | deskew, info | inputPath, outputPath, maxAngle, minAngle |
| ImageCrop | crop | inputPath, outputPath, margin, tolerance |
| ImageResize | resize, resize_box, info | inputPath, outputPath, width, height, pixelFormat |
Content Extraction Tools
| Tool | Operations | Key Parameters |
|---|---|---|
| DocumentText | extract | filePath, pageRange |
| Ocr | recognize, info | imagePath, language, enableDeskew, enableOrientationDetection |
Model Selection
| Model ID | VRAM | Best For |
|---|---|---|
| qwen3:4b | ~4 GB | Simple single-tool tasks (split, info, text extract) |
| qwen3:8b | ~6.5 GB | Multi-step workflows combining 3-5 tools |
| qwen3:14b | ~11 GB | Complex reasoning across many tools with conditional logic |
| gptoss:20b | ~15 GB | Advanced multi-document pipelines |
| glm4.7-flash | ~18 GB | Strongest 30B-class MoE for long multi-step workflows |
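If the target machine's VRAM is known at startup, the model ID can be chosen in code rather than hard-coded. The sketch below uses the approximate VRAM figures from the Model Selection table. The `PickModelId` helper and its "largest model that fits" rule are assumptions of this example, and the table values are rough, so leave some headroom in practice.

```csharp
using System;

// VRAM figures copied from the Model Selection table (approximate values).
// The selection rule itself is an assumption of this sketch.
static string PickModelId(double availableVramGb)
{
    (string Id, double VramGb)[] candidates =
    {
        ("glm4.7-flash", 18.0),
        ("gptoss:20b", 15.0),
        ("qwen3:14b", 11.0),
        ("qwen3:8b", 6.5),
        ("qwen3:4b", 4.0),
    };

    // Candidates are ordered largest first, so the first fit is the largest fit.
    foreach (var (id, vramGb) in candidates)
    {
        if (availableVramGb >= vramGb) return id;
    }

    throw new InvalidOperationException(
        "Not enough VRAM for any model listed in the Model Selection table.");
}

Console.WriteLine(PickModelId(8)); // prints "qwen3:8b"
```

The returned ID can be passed to LM.LoadFromModelID in place of the literal "qwen3:8b" used in Step 3.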
Common Issues
| Problem | Cause | Fix |
|---|---|---|
| Agent does not call document tools | Model does not support tool calling | Use a model with HasToolCalls == true (Qwen 3, Gemma 3, GPT-OSS) |
| PDF tool returns "file not found" | Path is relative and working directory differs | Use absolute file paths in prompts |
| OCR returns empty text | Image is too dark or low contrast | Preprocess with ImageDeskew and ImageCrop first |
| Agent loops without finishing | Too many tools registered | Limit to 5-7 tools per agent for best performance |
| Merge output is empty | Input file paths do not exist | Use PdfInfo first to verify files exist |
| Deskew produces artifacts | Skew angle exceeds maxAngle | Increase the maxAngle parameter or rotate the image 90 degrees first |
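Two of the issues above ("file not found" and empty merge output) come down to paths that do not resolve. A minimal sketch of resolving and checking a path in plain C# before it goes into a prompt follows. `ResolveExisting` is a hypothetical helper, and the temporary placeholder file exists only to make the example runnable.

```csharp
using System;
using System.IO;

// Hypothetical helper: resolves a file name against a known base directory and
// fails fast if the file is missing, so the agent never receives a bad path.
static string ResolveExisting(string baseDirectory, string fileName)
{
    string fullPath = Path.GetFullPath(Path.Combine(baseDirectory, fileName));
    if (!File.Exists(fullPath))
        throw new FileNotFoundException("Input file not found.", fullPath);
    return fullPath;
}

// Create a throwaway placeholder file so the example runs anywhere.
string dir = Path.GetTempPath();
string name = $"sample_{Guid.NewGuid():N}.pdf";
File.WriteAllText(Path.Combine(dir, name), "placeholder");

// Embed the absolute path in the prompt instead of a bare file name.
string path = ResolveExisting(dir, name);
string prompt = $"Check how many pages '{path}' has.";
Console.WriteLine(prompt);

// Clean up the placeholder file.
File.Delete(path);
```

Checking paths in application code costs nothing in agent iterations and turns a vague tool error into an immediate exception with the offending path attached.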
Next Steps
- Equip an Agent with Built-In Tools: full reference for all 65 built-in tools across 8 categories.
- Create an AI Agent with Tools: implement custom document tools with the ITool interface.
- Automatically Split Multi-Document PDFs with AI Vision: use DocumentSplitting with vision models for intelligent boundary detection.
- Preprocess Images for Vision Pipelines: direct ImageBuffer API for image preprocessing.
- Process Scanned Documents with OCR and Vision Models: VLM OCR and Tesseract for scanned documents.