Process PDFs and Images with Built-In Document Tools

AI agents can do more than chat. With the right tools, they can split PDFs, merge files, extract text, run OCR, and preprocess scanned images, all from natural language instructions. LM-Kit.NET ships 9 built-in Document tools that turn an agent into a document processing assistant. This tutorial builds agents that handle real-world document workflows: splitting bulk scans, preprocessing images for vision pipelines, extracting text, and assembling output files.


Why Agent-Based Document Processing Matters

Agent-driven document processing addresses two common enterprise problems:

  1. Mailroom automation without custom code. Bulk-scanned mail arrives as multi-page PDFs mixing invoices, contracts, and ID cards. An agent equipped with PdfInfo, PdfSplit, and DocumentText can inspect the file, split it by page ranges, and extract text from each segment. The entire workflow is described in a single prompt instead of a multi-step script.
  2. Image preprocessing before vision analysis. Scanned documents need deskewing, border cropping, and resizing before a Vision Language Model can process them accurately. An agent with ImageDeskew, ImageCrop, and ImageResize handles this pipeline autonomously, adapting to each image's characteristics.

Prerequisites

| Requirement | Minimum |
|---|---|
| .NET SDK | 8.0+ |
| VRAM | 6+ GB (for a tool-calling model like Qwen 3 8B) |
| Model | Must support tool calling (model.HasToolCalls == true) |

Step 1: Create the Project

dotnet new console -n DocumentToolsAgent
cd DocumentToolsAgent
dotnet add package LM-Kit.NET

Step 2: Understand the Document Tools

All 9 Document tools are accessible through the BuiltInTools static class. They operate on local files and cover the full document lifecycle:

┌──────────────────────────────────────────────────────┐
│              Document Tools Ecosystem                │
├──────────────────────────────────────────────────────┤
│                                                      │
│   PDF Operations        Image Processing    Content  │
│   ┌───────────┐        ┌──────────────┐   ┌──────┐   │
│   │ PdfInfo   │        │ ImageDeskew  │   │ Doc  │   │
│   │ PdfSplit  │        │ ImageCrop    │   │ Text │   │
│   │ PdfMerge  │        │ ImageResize  │   │      │   │
│   │ PdfRender │        └──────────────┘   │ Ocr  │   │
│   └───────────┘                           └──────┘   │
│                                                      │
└──────────────────────────────────────────────────────┘

| Tool | Name | Description |
|---|---|---|
| BuiltInTools.PdfInfo | pdf_info | Page count, dimensions, metadata, and text content from PDFs |
| BuiltInTools.PdfSplit | pdf_split | Extract pages by range, split PDFs into separate files |
| BuiltInTools.PdfMerge | pdf_merge | Merge multiple PDF files into a single output |
| BuiltInTools.PdfRender | pdf_render | Render PDF pages as PNG or BMP images with configurable zoom |
| BuiltInTools.ImageDeskew | image_deskew | Detect and correct skew in scanned documents |
| BuiltInTools.ImageCrop | image_crop | Auto-crop uniform borders from scanned images |
| BuiltInTools.ImageResize | image_resize | Resize images, fit to bounding box, convert pixel formats |
| BuiltInTools.DocumentText | document_text | Extract text from PDF, DOCX, XLSX, PPTX, and HTML files |
| BuiltInTools.Ocr | ocr | Extract text from images using Tesseract OCR (34 languages) |

Step 3: Build a PDF Processing Agent

Start with an agent that can inspect, split, and merge PDFs:

using System.Text;
using LMKit.Model;
using LMKit.Agents;
using LMKit.Agents.Tools.BuiltIn;

LMKit.Licensing.LicenseManager.SetLicenseKey("");

Console.InputEncoding = Encoding.UTF8;
Console.OutputEncoding = Encoding.UTF8;

// ──────────────────────────────────────
// 1. Load a tool-calling model
// ──────────────────────────────────────
Console.WriteLine("Loading model...");
using LM model = LM.LoadFromModelID("qwen3:8b",
    downloadingProgress: (_, len, read) =>
    {
        if (len.HasValue) Console.Write($"\r  Downloading: {(double)read / len.Value * 100:F1}%   ");
        return true;
    },
    loadingProgress: p => { Console.Write($"\r  Loading: {p * 100:F0}%   "); return true; });
Console.WriteLine("\n");

// ──────────────────────────────────────
// 2. Create a PDF processing agent
// ──────────────────────────────────────
var pdfAgent = Agent.CreateBuilder(model)
    .WithPersona("PDF Processing Agent")
    .WithInstruction(
        "You process PDF documents. You can inspect PDFs for page count and metadata, " +
        "split them by page ranges into separate files, merge multiple PDFs into one, " +
        "render pages as images, and extract text content.")
    .WithTools(tools =>
    {
        tools.Register(BuiltInTools.PdfInfo);
        tools.Register(BuiltInTools.PdfSplit);
        tools.Register(BuiltInTools.PdfMerge);
        tools.Register(BuiltInTools.PdfRender);
        tools.Register(BuiltInTools.DocumentText);
    })
    .WithMaxIterations(10)
    .Build();

// ──────────────────────────────────────
// 3. Run a document workflow
// ──────────────────────────────────────
var result = await pdfAgent.RunAsync(
    "Check how many pages 'contract.pdf' has, then extract pages 1-3 " +
    "into 'summary.pdf' and pages 4-10 into 'appendix.pdf'.");

Console.WriteLine(result.Content);

Step 4: Build an Image Preprocessing Agent

For scanned documents, combine image tools to clean up before OCR or vision analysis:

var imageAgent = Agent.CreateBuilder(model)
    .WithPersona("Image Preprocessing Agent")
    .WithInstruction(
        "You preprocess scanned document images. First deskew to correct rotation, " +
        "then crop borders, then resize if needed. Report what corrections were applied.")
    .WithTools(tools =>
    {
        tools.Register(BuiltInTools.ImageDeskew);
        tools.Register(BuiltInTools.ImageCrop);
        tools.Register(BuiltInTools.ImageResize);
    })
    .WithMaxIterations(10)
    .Build();

// Named imageResult so this snippet can share a file with the Step 3 code,
// which already declares 'result'.
var imageResult = await imageAgent.RunAsync(
    "Clean up the scanned page at 'scan_001.png': " +
    "correct any rotation, remove white borders, " +
    "then resize to fit within 1200x1600 pixels preserving aspect ratio.");

Console.WriteLine(imageResult.Content);
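
The prompt above asks for a resize that fits within 1200x1600 pixels while preserving aspect ratio. The arithmetic behind such a fit can be sketched in plain C#. This is only an illustration of the expected result, not a description of how ImageResize computes it internally; the FitWithin helper is hypothetical, and the choice not to upscale smaller images is an assumption.

```csharp
using System;

// Hypothetical helper: scale (width, height) to fit inside (maxWidth, maxHeight)
// without changing the aspect ratio. Images that already fit are returned
// unchanged (assumption: no upscaling).
static (int Width, int Height) FitWithin(int width, int height, int maxWidth, int maxHeight)
{
    if (width <= 0 || height <= 0 || maxWidth <= 0 || maxHeight <= 0)
        throw new ArgumentOutOfRangeException(nameof(width), "All dimensions must be positive.");

    // The smaller of the two ratios keeps both sides inside the box.
    double scale = Math.Min((double)maxWidth / width, (double)maxHeight / height);
    if (scale >= 1.0) return (width, height);

    return (Math.Max(1, (int)Math.Round(width * scale)),
            Math.Max(1, (int)Math.Round(height * scale)));
}

// A 300 dpi A4 scan is roughly 2480x3508 pixels.
var (w, h) = FitWithin(2480, 3508, 1200, 1600);
Console.WriteLine($"{w}x{h}");
```

Comparing the dimensions the agent reports against a calculation like this is one way to check that the resize step did what the prompt asked.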

Step 5: Build a Full Document Processing Agent

Combine all 9 tools for end-to-end workflows: inspect, preprocess, extract, split, and merge.

var docAgent = Agent.CreateBuilder(model)
    .WithPersona("Document Processing Assistant")
    .WithInstruction(
        "You are a document processing assistant. You can:\n" +
        "- Inspect PDFs for page count, metadata, and text content\n" +
        "- Split PDFs by page ranges into separate files\n" +
        "- Merge multiple PDFs into one\n" +
        "- Render PDF pages as images\n" +
        "- Deskew, crop, and resize scanned images\n" +
        "- Extract text from documents (PDF, DOCX, XLSX, PPTX, HTML)\n" +
        "- Run OCR on images to extract text (supports 34 languages)\n\n" +
        "Always confirm what you did and report any issues.")
    .WithTools(tools =>
    {
        tools.Register(BuiltInTools.PdfInfo);
        tools.Register(BuiltInTools.PdfSplit);
        tools.Register(BuiltInTools.PdfMerge);
        tools.Register(BuiltInTools.PdfRender);
        tools.Register(BuiltInTools.ImageDeskew);
        tools.Register(BuiltInTools.ImageCrop);
        tools.Register(BuiltInTools.ImageResize);
        tools.Register(BuiltInTools.DocumentText);
        tools.Register(BuiltInTools.Ocr);
    })
    .WithMaxIterations(15)
    .Build();

Example Prompts

// Split a bulk scan into individual documents
var r1 = await docAgent.RunAsync(
    "The file 'batch_scan.pdf' has 10 pages. Extract pages 1-3 into 'invoice.pdf', " +
    "pages 4-7 into 'contract.pdf', and pages 8-10 into 'receipt.pdf'.");

// Preprocess and OCR a scanned document
var r2 = await docAgent.RunAsync(
    "Deskew 'scan.png', crop its borders, then run OCR on it in French.");

// Merge and extract text
var r3 = await docAgent.RunAsync(
    "Merge 'part1.pdf' and 'part2.pdf' into 'combined.pdf', " +
    "then extract the text from all pages.");

// Render a PDF page for visual inspection
var r4 = await docAgent.RunAsync(
    "Render page 1 of 'report.pdf' as a PNG at 2x zoom.");

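When the page ranges come from another system rather than being typed by hand, a split prompt like the first one above can be assembled in code. The sketch below is plain C# with no LM-Kit calls; BuildSplitPrompt and its validation rules (segments must cover every page exactly once, in order) are hypothetical choices for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper: build a split instruction from (first page, last page,
// output file) segments. Throws if the segments do not cover pages
// 1..pageCount exactly once, in order.
static string BuildSplitPrompt(string inputFile, int pageCount,
    IReadOnlyList<(int First, int Last, string Output)> segments)
{
    int expectedFirst = 1;
    foreach (var s in segments)
    {
        if (s.First != expectedFirst || s.Last < s.First)
            throw new ArgumentException($"Segment '{s.Output}' leaves a gap, overlaps, or is reversed.");
        expectedFirst = s.Last + 1;
    }
    if (expectedFirst != pageCount + 1)
        throw new ArgumentException("Segments do not end on the last page.");

    var parts = segments.Select(s => $"pages {s.First}-{s.Last} into '{s.Output}'");
    return $"The file '{inputFile}' has {pageCount} pages. Extract {string.Join(", ", parts)}.";
}

var segments = new List<(int First, int Last, string Output)>
{
    (1, 3, "invoice.pdf"),
    (4, 7, "contract.pdf"),
    (8, 10, "receipt.pdf"),
};

string prompt = BuildSplitPrompt("batch_scan.pdf", 10, segments);
Console.WriteLine(prompt);
```

The resulting string can be passed to docAgent.RunAsync in place of a hand-written prompt. Validating the ranges up front catches gaps and overlaps before the agent spends iterations on them.
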
Step 6: Combine with Data Tools for Richer Workflows

Document tools work well alongside other built-in tool categories. For example, combine PDF extraction with JSON parsing for structured output:

var dataDocAgent = Agent.CreateBuilder(model)
    .WithPersona("Data Extraction Agent")
    .WithInstruction(
        "You extract and process data from documents. " +
        "Extract text from files, parse structured data, and validate results.")
    .WithTools(tools =>
    {
        // Document tools
        tools.Register(BuiltInTools.DocumentText);
        tools.Register(BuiltInTools.PdfInfo);
        tools.Register(BuiltInTools.Ocr);

        // Data tools for post-processing
        tools.Register(BuiltInTools.Json);
        tools.Register(BuiltInTools.Csv);
        tools.Register(BuiltInTools.Regex);
        tools.Register(BuiltInTools.Validator);
    })
    .WithMaxIterations(10)
    .Build();

// Named dataResult so this snippet can share a file with the earlier steps.
var dataResult = await dataDocAgent.RunAsync(
    "Extract text from 'invoices.pdf', find all dollar amounts using regex, " +
    "and output them as a JSON array.");

Console.WriteLine(dataResult.Content);

Step 7: Monitor Tool Invocations

Track which document tools the agent calls during execution. Subscribe before calling RunAsync so the handler sees every invocation:

docAgent.ToolInvoked += (sender, e) =>
{
    Console.ForegroundColor = ConsoleColor.DarkGray;
    Console.WriteLine($"  [TOOL] {e.ToolName}");

    // Truncate long payloads so the log stays readable.
    // Distinct local names avoid clashing with top-level variables such as 'result'.
    if (e.Arguments != null)
    {
        string argsPreview = e.Arguments.Length > 100
            ? e.Arguments.Substring(0, 100) + "..."
            : e.Arguments;
        Console.WriteLine($"         args: {argsPreview}");
    }

    if (e.Result != null)
    {
        string resultPreview = e.Result.Length > 120
            ? e.Result.Substring(0, 120) + "..."
            : e.Result;
        Console.WriteLine($"         result: {resultPreview}");
    }

    Console.ResetColor();
};
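
Beyond logging, the same handler can feed a simple tally that flags a tool being called over and over, which is one symptom of an agent that loops without finishing. The sketch below is plain C# and simulates the tool names instead of subscribing to a real agent; RecordCall, the threshold of 3, and the snake_case names used as stand-ins for e.ToolName values are all assumptions.

```csharp
using System;
using System.Collections.Generic;

// Counts calls per tool name and flags a tool once it passes a threshold.
var callCounts = new Dictionary<string, int>(StringComparer.Ordinal);
const int repeatThreshold = 3; // hypothetical limit; tune per workflow

// Hypothetical helper: returns true when the tool has been called more
// often than the threshold allows.
bool RecordCall(string toolName)
{
    callCounts.TryGetValue(toolName, out int count);
    callCounts[toolName] = ++count;
    return count > repeatThreshold;
}

// Simulated sequence of tool names, standing in for e.ToolName values.
foreach (string name in new[] { "pdf_info", "pdf_split", "pdf_split", "pdf_split", "pdf_split" })
{
    if (RecordCall(name))
        Console.WriteLine($"Warning: {name} called {callCounts[name]} times");
}
```

In a real run, RecordCall(e.ToolName) would be called from inside the ToolInvoked handler.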

Tool Reference

PDF Tools

| Tool | Operations | Key Parameters |
|---|---|---|
| PdfInfo | info, metadata, pages, text | filePath, pageRange |
| PdfSplit | split, info | inputPath, outputPath, pageRange |
| PdfMerge | merge | inputPaths[], outputPath |
| PdfRender | render | filePath, page, zoom, outputPath, format |

Image Tools

| Tool | Operations | Key Parameters |
|---|---|---|
| ImageDeskew | deskew, info | inputPath, outputPath, maxAngle, minAngle |
| ImageCrop | crop | inputPath, outputPath, margin, tolerance |
| ImageResize | resize, resize_box, info | inputPath, outputPath, width, height, pixelFormat |

Content Extraction Tools

| Tool | Operations | Key Parameters |
|---|---|---|
| DocumentText | extract | filePath, pageRange |
| Ocr | recognize, info | imagePath, language, enableDeskew, enableOrientationDetection |

Model Selection

| Model ID | VRAM | Best For |
|---|---|---|
| qwen3:4b | ~4 GB | Simple single-tool tasks (split, info, text extract) |
| qwen3:8b | ~6.5 GB | Multi-step workflows combining 3-5 tools |
| qwen3:14b | ~11 GB | Complex reasoning across many tools with conditional logic |
| gptoss:20b | ~15 GB | Advanced multi-document pipelines |
| glm4.7-flash | ~18 GB | Strongest 30B-class MoE for long multi-step workflows |
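
The table can be turned into a small lookup that picks a model ID for a given VRAM budget. This is a minimal sketch in plain C#: the VRAM figures are the approximate values from the table, PickModel is a hypothetical helper, and "the largest listed model that fits" is an assumed selection rule that may not suit every workload.

```csharp
using System;
using System.Linq;

// Model IDs and approximate VRAM needs in GB, copied from the table above.
var models = new (string Id, double VramGb)[]
{
    ("qwen3:4b", 4.0),
    ("qwen3:8b", 6.5),
    ("qwen3:14b", 11.0),
    ("gptoss:20b", 15.0),
    ("glm4.7-flash", 18.0),
};

// Hypothetical helper: return the largest listed model that fits the budget,
// or null if none fits.
string? PickModel(double availableVramGb) =>
    models.Where(m => m.VramGb <= availableVramGb)
          .OrderByDescending(m => m.VramGb)
          .Select(m => m.Id)
          .FirstOrDefault();

Console.WriteLine(PickModel(8.0) ?? "no listed model fits");
Console.WriteLine(PickModel(2.0) ?? "no listed model fits");
```

The returned ID can be passed to LM.LoadFromModelID as in Step 3. Leaving some headroom below the physical VRAM is sensible, since the figures are approximate.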

Common Issues

| Problem | Cause | Fix |
|---|---|---|
| Agent does not call document tools | Model does not support tool calling | Use a model with HasToolCalls == true (Qwen 3, Gemma 3, GPT-OSS) |
| PDF tool returns "file not found" | Path is relative and working directory differs | Use absolute file paths in prompts |
| OCR returns empty text | Image is too dark or low contrast | Preprocess with ImageDeskew and ImageCrop first |
| Agent loops without finishing | Too many tools registered | Limit to 5-7 tools per agent for best performance |
| Merge output is empty | Input file paths do not exist | Use PdfInfo first to verify files exist |
| Deskew produces artifacts | Skew angle exceeds maxAngle | Increase the maxAngle parameter or rotate 90 degrees first |
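
For the "file not found" and empty-merge rows, one option is to resolve and verify paths in code before they reach the prompt. The sketch below uses only System.IO; ResolveInput is a hypothetical helper, and the temporary placeholder file exists only so the snippet runs anywhere (it is not a real PDF).

```csharp
using System;
using System.IO;

// Hypothetical helper: resolve a file name against a base directory and
// confirm the file exists before it is mentioned in a prompt.
static string ResolveInput(string baseDirectory, string fileName)
{
    string fullPath = Path.GetFullPath(Path.Combine(baseDirectory, fileName));
    if (!File.Exists(fullPath))
        throw new FileNotFoundException("Input file not found.", fullPath);
    return fullPath;
}

// Demo with a temporary stand-in file so the snippet runs anywhere.
string workDir = Path.Combine(Path.GetTempPath(), "doc-tools-demo");
Directory.CreateDirectory(workDir);
File.WriteAllText(Path.Combine(workDir, "part1.pdf"), "placeholder");

string input = ResolveInput(workDir, "part1.pdf");
string prompt = $"Report the page count of '{input}'.";
Console.WriteLine(prompt);
```

Failing fast in code gives a clearer error than letting the agent discover a missing file partway through a multi-step workflow.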

Next Steps