Process PDFs and Images with Built-In Document Tools

AI agents can do more than chat. With the right tools, they can split PDFs, merge files, extract text, run OCR, and preprocess scanned images, all from natural language instructions. LM-Kit.NET ships a growing set of built-in Document tools that turn an agent into a document processing assistant. This tutorial builds agents that handle real-world document workflows: splitting bulk scans, preprocessing images for vision pipelines, extracting text, and assembling output files.

Why Agent-Based Document Processing Matters

Agent-driven document processing solves two common enterprise problems:

Mailroom automation without custom code. Bulk-scanned mail arrives as multi-page PDFs mixing invoices, contracts, and ID cards. An agent equipped with PdfInfo, PdfSplit, and DocumentText can inspect the file, split it by page ranges, and extract text from each segment. The entire workflow is described in a single prompt instead of a multi-step script.
Image preprocessing before vision analysis. Scanned documents need deskewing, border cropping, and resizing before a Vision Language Model can process them accurately. An agent with ImageDeskew, ImageCrop, and ImageResize handles this pipeline autonomously, adapting to each image's characteristics.

Prerequisites

Requirement	Minimum
.NET SDK	8.0+
VRAM	~6.5 GB (for a tool-calling model like Qwen 3 8B)
Model	Must support tool calling (`model.HasToolCalls == true`)

Step 1: Create the Project

dotnet new console -n DocumentToolsAgent
cd DocumentToolsAgent
dotnet add package LM-Kit.NET

Step 2: Understand the Document Tools

All Document tools are accessible through the BuiltInTools static class. They operate on local files and cover the full document lifecycle:

┌──────────────────────────────────────────────────────────────────┐
│                   Document Tools Ecosystem                       │
├──────────────────────────────────────────────────────────────────┤
│                                                                  │
│  PDF Operations    Image Processing    Content    Conversion     │
│  ┌───────────┐    ┌──────────────┐   ┌──────┐   ┌────────────┐  │
│  │ PdfInfo   │    │ ImageDeskew  │   │ Doc  │   │ MdToPdf    │  │
│  │ PdfSplit  │    │ ImageCrop    │   │ Text │   │ EmlToPdf   │  │
│  │ PdfMerge  │    │ ImageResize  │   │      │   │ MdToDocx   │  │
│  │ PdfToImage│    │ ImageToPdf   │   │ Ocr  │   │ MdToHtml   │  │
│  │ PdfUnlock │    └──────────────┘   └──────┘   │ EmlToMd    │  │
│  └───────────┘                                   └────────────┘  │
│                                                                  │
└──────────────────────────────────────────────────────────────────┘

Tool	Name	Description
`BuiltInTools.PdfMetadata`	`pdf_metadata`	Page count, dimensions, metadata, and text content from PDFs
`BuiltInTools.PdfSplit`	`pdf_split`	Extract pages by range, split PDFs into separate files
`BuiltInTools.PdfMerge`	`pdf_merge`	Merge multiple PDF files into a single output
`BuiltInTools.PdfToImage`	`pdf_to_image`	Render PDF pages as JPEG, PNG, or BMP images with configurable zoom
`BuiltInTools.ImageToPdf`	`image_to_pdf`	Convert JPEG, PNG, or BMP images into a single PDF
`BuiltInTools.PdfUnlock`	`pdf_unlock`	Remove password protection using the known password
`BuiltInTools.ImageDeskew`	`image_deskew`	Detect and correct skew in scanned documents
`BuiltInTools.ImageCrop`	`image_crop`	Auto-crop uniform borders from scanned images
`BuiltInTools.ImageResize`	`image_resize`	Resize images, fit to bounding box, convert pixel formats
`BuiltInTools.DocumentTextExtract`	`document_text_extract`	Extract text from PDF, DOCX, XLSX, PPTX, EML, MBOX, and HTML files
`BuiltInTools.OcrRecognize`	`ocr`	Extract text from images using Tesseract OCR (34 languages)
`BuiltInTools.ConvertMarkdownToPdf`	`markdown_to_pdf`	Convert Markdown content to a formatted PDF with headings, lists, code blocks, tables
`BuiltInTools.ConvertEmlToPdf`	`eml_to_pdf`	Convert an EML email to PDF with embedded file attachments
`BuiltInTools.ConvertMarkdownToDocx`	`markdown_to_docx`	Convert Markdown content to a DOCX file
`BuiltInTools.ConvertMarkdownToHtml`	`markdown_to_html`	Convert Markdown content to HTML
`BuiltInTools.ConvertHtmlToMarkdown`	`html_to_markdown`	Convert HTML content to Markdown
`BuiltInTools.ConvertDocxToMarkdown`	`docx_to_markdown`	Convert a DOCX file to Markdown
`BuiltInTools.ConvertEmlToMarkdown`	`eml_to_markdown`	Convert an EML email file to Markdown

Step 3: Build a PDF Processing Agent

Start with an agent that can inspect, split, and merge PDFs:

using System.Text;
using LMKit.Model;
using LMKit.Agents;
using LMKit.Agents.Tools.BuiltIn;

LMKit.Licensing.LicenseManager.SetLicenseKey("");

Console.InputEncoding = Encoding.UTF8;
Console.OutputEncoding = Encoding.UTF8;

// ──────────────────────────────────────
// 1. Load a tool-calling model
// ──────────────────────────────────────
Console.WriteLine("Loading model...");
using LM model = LM.LoadFromModelID("qwen3:8b",
    downloadingProgress: (_, len, read) =>
    {
        if (len.HasValue) Console.Write($"\r  Downloading: {(double)read / len.Value * 100:F1}%   ");
        return true;
    },
    loadingProgress: p => { Console.Write($"\r  Loading: {p * 100:F0}%   "); return true; });
Console.WriteLine("\n");

// ──────────────────────────────────────
// 2. Create a PDF processing agent
// ──────────────────────────────────────
var pdfAgent = Agent.CreateBuilder(model)
    .WithPersona("PDF Processing Agent")
    .WithInstruction(
        "You process PDF documents. You can inspect PDFs for page count and metadata, " +
        "split them by page ranges into separate files, merge multiple PDFs into one, " +
        "render pages as images, and extract text content.")
    .WithTools(tools =>
    {
        tools.Register(BuiltInTools.PdfMetadata);
        tools.Register(BuiltInTools.PdfSplit);
        tools.Register(BuiltInTools.PdfMerge);
        tools.Register(BuiltInTools.PdfToImage);
        tools.Register(BuiltInTools.DocumentTextExtract);
    })
    .WithMaxIterations(10)
    .Build();

// ──────────────────────────────────────
// 3. Run a document workflow
// ──────────────────────────────────────
var result = await pdfAgent.RunAsync(
    "Check how many pages 'contract.pdf' has, then extract pages 1-3 " +
    "into 'summary.pdf' and pages 4-10 into 'appendix.pdf'.");

Console.WriteLine(result.Content);

Step 4: Build an Image Preprocessing Agent

For scanned documents, combine image tools to clean up before OCR or vision analysis:

using System.Text;
using LMKit.Model;
using LMKit.Agents;
using LMKit.Agents.Tools.BuiltIn;

LMKit.Licensing.LicenseManager.SetLicenseKey("");

Console.InputEncoding = Encoding.UTF8;
Console.OutputEncoding = Encoding.UTF8;

// ──────────────────────────────────────
// 1. Load a tool-calling model
// ──────────────────────────────────────
Console.WriteLine("Loading model...");
using LM model = LM.LoadFromModelID("qwen3:8b",
    downloadingProgress: (_, len, read) =>
    {
        if (len.HasValue) Console.Write($"\r  Downloading: {(double)read / len.Value * 100:F1}%   ");
        return true;
    },
    loadingProgress: p => { Console.Write($"\r  Loading: {p * 100:F0}%   "); return true; });
Console.WriteLine("\n");

var imageAgent = Agent.CreateBuilder(model)
    .WithPersona("Image Preprocessing Agent")
    .WithInstruction(
        "You preprocess scanned document images. First deskew to correct rotation, " +
        "then crop borders, then resize if needed. Report what corrections were applied.")
    .WithTools(tools =>
    {
        tools.Register(BuiltInTools.ImageDeskew);
        tools.Register(BuiltInTools.ImageCrop);
        tools.Register(BuiltInTools.ImageResize);
    })
    .WithMaxIterations(10)
    .Build();

var result = await imageAgent.RunAsync(
    "Clean up the scanned page at 'scan_001.png': " +
    "correct any rotation, remove white borders, " +
    "then resize to fit within 1200x1600 pixels preserving aspect ratio.");

Console.WriteLine(result.Content);
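
The prompt above asks the agent to fit the image within 1200x1600 pixels while preserving the aspect ratio. The arithmetic behind that request can be sketched in plain C#, which is handy for checking the dimensions the agent reports. `FitWithin` is a hypothetical helper, not part of LM-Kit.NET, and the no-upscaling rule is an assumption rather than documented behavior of `image_resize`.

```csharp
using System;

// Hypothetical helper (not part of LM-Kit.NET): computes the largest size that
// fits inside maxWidth x maxHeight while preserving the aspect ratio.
static (int Width, int Height) FitWithin(int width, int height, int maxWidth, int maxHeight)
{
    if (width <= 0 || height <= 0 || maxWidth <= 0 || maxHeight <= 0)
        throw new ArgumentOutOfRangeException(nameof(width), "All dimensions must be positive.");

    // Take the smaller of the two scale factors so both sides fit.
    // Capping at 1.0 means small images are never upscaled (an assumption).
    double scale = Math.Min(1.0, Math.Min((double)maxWidth / width, (double)maxHeight / height));

    return (Math.Max(1, (int)Math.Round(width * scale)),
            Math.Max(1, (int)Math.Round(height * scale)));
}

// An A4 page scanned at 300 DPI is roughly 2480 x 3508 pixels.
var fitted = FitWithin(2480, 3508, 1200, 1600);
Console.WriteLine($"{fitted.Width} x {fitted.Height}");   // prints "1131 x 1600"
```

Here the height is the limiting side (1600 / 3508 is smaller than 1200 / 2480), so the width shrinks by the same factor and ends up below the 1200 pixel limit.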

Step 5: Build a Full Document Processing Agent

Combine the PDF, image, and content-extraction tools for end-to-end workflows: inspect, preprocess, extract, split, and merge.

using System.Text;
using LMKit.Model;
using LMKit.Agents;
using LMKit.Agents.Tools.BuiltIn;

LMKit.Licensing.LicenseManager.SetLicenseKey("");

Console.InputEncoding = Encoding.UTF8;
Console.OutputEncoding = Encoding.UTF8;

// ──────────────────────────────────────
// 1. Load a tool-calling model
// ──────────────────────────────────────
Console.WriteLine("Loading model...");
using LM model = LM.LoadFromModelID("qwen3:8b",
    downloadingProgress: (_, len, read) =>
    {
        if (len.HasValue) Console.Write($"\r  Downloading: {(double)read / len.Value * 100:F1}%   ");
        return true;
    },
    loadingProgress: p => { Console.Write($"\r  Loading: {p * 100:F0}%   "); return true; });
Console.WriteLine("\n");

var docAgent = Agent.CreateBuilder(model)
    .WithPersona("Document Processing Assistant")
    .WithInstruction(
        "You are a document processing assistant. You can:\n" +
        "- Inspect PDFs for page count, metadata, and text content\n" +
        "- Split PDFs by page ranges into separate files\n" +
        "- Merge multiple PDFs into one\n" +
        "- Render PDF pages as images\n" +
        "- Deskew, crop, and resize scanned images\n" +
        "- Extract text from documents (PDF, DOCX, XLSX, PPTX, EML, MBOX, HTML)\n" +
        "- Run OCR on images to extract text (supports 34 languages)\n\n" +
        "Always confirm what you did and report any issues.")
    .WithTools(tools =>
    {
        tools.Register(BuiltInTools.PdfMetadata);
        tools.Register(BuiltInTools.PdfSplit);
        tools.Register(BuiltInTools.PdfMerge);
        tools.Register(BuiltInTools.PdfToImage);
        tools.Register(BuiltInTools.ImageDeskew);
        tools.Register(BuiltInTools.ImageCrop);
        tools.Register(BuiltInTools.ImageResize);
        tools.Register(BuiltInTools.DocumentTextExtract);
        tools.Register(BuiltInTools.OcrRecognize);
    })
    .WithMaxIterations(15)
    .Build();

Example Prompts

// Continues the Step 5 program: these calls reuse the docAgent built above,
// so the model is not loaded a second time.

// Split a bulk scan into individual documents
var r1 = await docAgent.RunAsync(
    "The file 'batch_scan.pdf' has 10 pages. Extract pages 1-3 into 'invoice.pdf', " +
    "pages 4-7 into 'contract.pdf', and pages 8-10 into 'receipt.pdf'.");

// Preprocess and OCR a scanned document
var r2 = await docAgent.RunAsync(
    "Deskew 'scan.png', crop its borders, then run OCR on it in French.");

// Merge and extract text
var r3 = await docAgent.RunAsync(
    "Merge 'part1.pdf' and 'part2.pdf' into 'combined.pdf', " +
    "then extract the text from all pages.");

// Render a PDF page for visual inspection
var r4 = await docAgent.RunAsync(
    "Render page 1 of 'report.pdf' as a PNG at 2x zoom.");

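The bulk-scan prompt above hard-codes its page ranges. When the ranges come from another system, it may help to validate them before they reach the agent. The sketch below is one way to do that under stated assumptions: `BuildSplitPrompt` is a hypothetical helper, not part of LM-Kit.NET, and it assumes the segments must cover every page exactly once.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper (not part of LM-Kit.NET): checks that the segments cover
// pages 1..pageCount exactly once, then builds a splitting prompt.
static string BuildSplitPrompt(string inputFile, int pageCount,
    IReadOnlyList<(int First, int Last, string Output)> segments)
{
    int expectedFirst = 1;
    foreach (var segment in segments.OrderBy(x => x.First))
    {
        if (segment.Last < segment.First || segment.First != expectedFirst)
            throw new ArgumentException(
                $"Gap, overlap, or inverted range at pages {segment.First}-{segment.Last}.");
        expectedFirst = segment.Last + 1;
    }

    if (expectedFirst != pageCount + 1)
        throw new ArgumentException(
            $"Segments end at page {expectedFirst - 1}, but the file has {pageCount} pages.");

    // Keep the caller's segment order in the generated prompt.
    var parts = segments.Select(x => $"pages {x.First}-{x.Last} into '{x.Output}'");
    string joined = string.Join(", ", parts);
    return $"The file '{inputFile}' has {pageCount} pages. Extract {joined}.";
}

var prompt = BuildSplitPrompt("batch_scan.pdf", 10, new (int First, int Last, string Output)[]
{
    (1, 3, "invoice.pdf"),
    (4, 7, "contract.pdf"),
    (8, 10, "receipt.pdf"),
});
Console.WriteLine(prompt);
```

The resulting string can be passed to `docAgent.RunAsync` in place of the hand-written prompt. Rejecting gaps and overlaps up front avoids spending agent iterations on a request that cannot succeed.
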
Step 6: Combine with Data Tools for Richer Workflows

Document tools work well alongside other built-in tool categories. For example, combine PDF extraction with JSON parsing for structured output:

using System.Text;
using LMKit.Model;
using LMKit.Agents;
using LMKit.Agents.Tools.BuiltIn;

LMKit.Licensing.LicenseManager.SetLicenseKey("");

Console.InputEncoding = Encoding.UTF8;
Console.OutputEncoding = Encoding.UTF8;

// ──────────────────────────────────────
// 1. Load a tool-calling model
// ──────────────────────────────────────
Console.WriteLine("Loading model...");
using LM model = LM.LoadFromModelID("qwen3:8b",
    downloadingProgress: (_, len, read) =>
    {
        if (len.HasValue) Console.Write($"\r  Downloading: {(double)read / len.Value * 100:F1}%   ");
        return true;
    },
    loadingProgress: p => { Console.Write($"\r  Loading: {p * 100:F0}%   "); return true; });
Console.WriteLine("\n");

var dataDocAgent = Agent.CreateBuilder(model)
    .WithPersona("Data Extraction Agent")
    .WithInstruction(
        "You extract and process data from documents. " +
        "Extract text from files, parse structured data, and validate results.")
    .WithTools(tools =>
    {
        // Document tools
        tools.Register(BuiltInTools.DocumentTextExtract);
        tools.Register(BuiltInTools.PdfMetadata);
        tools.Register(BuiltInTools.OcrRecognize);

        // Data tools for post-processing
        tools.Register(BuiltInTools.JsonParse);
        tools.Register(BuiltInTools.CsvParse);
        tools.Register(BuiltInTools.RegexMatch);
        tools.Register(BuiltInTools.ValidatorData);
    })
    .WithMaxIterations(10)
    .Build();

var result = await dataDocAgent.RunAsync(
    "Extract text from 'invoices.pdf', find all dollar amounts using regex, " +
    "and output them as a JSON array.");

Console.WriteLine(result.Content);
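
Because the agent's answer depends on the model, a deterministic cross-check can be useful while testing. The sketch below applies the same idea (regex match, then JSON array) directly with `System.Text.RegularExpressions` and `System.Text.Json`, without any LM-Kit.NET tools. `ExtractDollarAmountsAsJson` is a hypothetical helper, and the pattern is an assumption about how amounts are written in the source text.

```csharp
using System;
using System.Linq;
using System.Text.Json;
using System.Text.RegularExpressions;

// Hypothetical helper: the pattern assumes a dollar sign, digits with optional
// thousands separators, and optional two-digit cents.
static string ExtractDollarAmountsAsJson(string text)
{
    var pattern = new Regex(@"\$\d+(?:,\d{3})*(?:\.\d{2})?");
    string[] amounts = pattern.Matches(text).Select(m => m.Value).ToArray();
    return JsonSerializer.Serialize(amounts);
}

// Sample text standing in for the output of document_text_extract.
string sample = "Invoice 1001: $1,234.50 due. Invoice 1002: $89.99 due. Shipping: $15.";
Console.WriteLine(ExtractDollarAmountsAsJson(sample));
```

Comparing this output with the agent's JSON array on a few known documents gives a quick signal on whether the model is using the regex tool correctly.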

Step 7: Monitor Tool Invocations

Track which document tools the agent calls during execution:

// Continues the Step 5 program: attach the handler to docAgent
// before calling RunAsync so that every tool call is reported.

docAgent.ToolInvoked += (sender, e) =>
{
    Console.ForegroundColor = ConsoleColor.DarkGray;
    Console.WriteLine($"  [TOOL] {e.ToolName}");

    if (e.Arguments != null)
    {
        string args = e.Arguments.Length > 100
            ? e.Arguments.Substring(0, 100) + "..."
            : e.Arguments;
        Console.WriteLine($"         args: {args}");
    }

    if (e.Result != null)
    {
        string result = e.Result.Length > 120
            ? e.Result.Substring(0, 120) + "..."
            : e.Result;
        Console.WriteLine($"         result: {result}");
    }

    Console.ResetColor();
};
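
The handler above repeats the same truncation logic twice and only prints each call. A small extension is to factor the truncation into a helper and keep a per-tool tally, which makes it easier to spot an agent that keeps calling the same tool. This is a sketch in plain C#: `Truncate`, `RecordCall`, and `callCounts` are hypothetical names, and the tool names in the simulated sequence are taken from the table in Step 2.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper: shortens long argument or result strings for logging.
static string Truncate(string value, int maxLength) =>
    value.Length > maxLength ? value.Substring(0, maxLength) + "..." : value;

// Per-tool call counter.
var callCounts = new Dictionary<string, int>(StringComparer.Ordinal);

void RecordCall(string toolName) =>
    callCounts[toolName] = callCounts.TryGetValue(toolName, out int n) ? n + 1 : 1;

// Simulated sequence; inside the ToolInvoked handler this would be
// RecordCall(e.ToolName) instead.
foreach (var name in new[] { "pdf_metadata", "pdf_split", "pdf_split" })
    RecordCall(name);

foreach (var (tool, count) in callCounts)
    Console.WriteLine($"{tool}: {count}");

// A 150-character string is cut to 100 characters plus the "..." suffix.
Console.WriteLine(Truncate(new string('x', 150), 100).Length);   // prints 103
```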

Tool Reference

PDF Tools

Tool	Operations	Key Parameters
PdfInfo	`info`, `metadata`, `pages`, `text`	`filePath`, `pageRange`
PdfSplit	`split`, `info`	`inputPath`, `outputPath`, `pageRange`
PdfMerge	`merge`	`inputPaths[]`, `outputPath`
PdfToImage	`render`	`inputPath`, `outputDirectory`, `pageRange`, `zoom`, `format`, `jpegQuality`
ImageToPdf	`convert`	`imagePaths[]`, `outputPath`
PdfUnlock	`unlock`	`inputPath`, `password`, `outputPath`

Image Tools

Tool	Operations	Key Parameters
ImageDeskew	`deskew`, `info`	`inputPath`, `outputPath`, `maxAngle`, `minAngle`
ImageCrop	`crop`	`inputPath`, `outputPath`, `margin`, `tolerance`
ImageResize	`resize`, `resize_box`, `info`	`inputPath`, `outputPath`, `width`, `height`, `pixelFormat`

Conversion Tools

Tool	Key Parameters
MarkdownToPdf	`markdown`, `outputPath`, `pageWidth`, `pageHeight`, `fontSize`
EmlToPdf	`inputPath`, `outputPath`, `stripQuotes`
MarkdownToDocx	`markdown`, `outputPath`
MarkdownToHtml	`markdown`
HtmlToMarkdown	`html`
DocxToMarkdown	`inputPath`, `includeTables`, `includeImages`, `includeHyperlinks`
EmlToMarkdown	`inputPath`, `stripQuotes`

Content Extraction Tools

Tool	Key Parameters
DocumentText	`filePath`, `pageRange`
Ocr	`imagePath`, `language`, `enableDeskew`, `enableOrientationDetection`

Model Selection

Model ID	VRAM	Best For
`qwen3:4b`	~4 GB	Simple single-tool tasks (split, info, text extract)
`qwen3:8b`	~6.5 GB	Multi-step workflows that combine several tools
`qwen3:14b`	~11 GB	Complex reasoning across many tools with conditional logic
`gptoss:20b`	~15 GB	Advanced multi-document pipelines
`glm4.7-flash`	~18 GB	Strongest 30B-class MoE for long multi-step workflows
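
The table can also be applied programmatically, for example to choose a default model ID from the VRAM available on the target machine. The sketch below copies the approximate figures from the table and assumes that the largest model that fits is the preferred one. `PickModelId` is a hypothetical helper, and detecting the available VRAM is left out.

```csharp
using System;
using System.Linq;

// Approximate VRAM figures copied from the table above.
var models = new (string Id, double VramGb)[]
{
    ("qwen3:4b", 4.0),
    ("qwen3:8b", 6.5),
    ("qwen3:14b", 11.0),
    ("gptoss:20b", 15.0),
    ("glm4.7-flash", 18.0),
};

// Hypothetical helper: returns the largest listed model that fits,
// or an empty string when none does.
string PickModelId(double availableVramGb) =>
    models.Where(m => m.VramGb <= availableVramGb)
          .OrderByDescending(m => m.VramGb)
          .Select(m => m.Id)
          .FirstOrDefault() ?? string.Empty;

Console.WriteLine(PickModelId(8.0));   // prints "qwen3:8b"
```

The returned ID can be passed to `LM.LoadFromModelID` as in the earlier steps. For simple single-tool tasks a smaller model may still be the better choice, so treat the result as a starting point.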

Common Issues

Problem	Cause	Fix
Agent does not call document tools	Model does not support tool calling	Use a model with `HasToolCalls == true` (Qwen 3, Gemma 3, GPT-OSS)
PDF tool returns "file not found"	Path is relative and working directory differs	Use absolute file paths in prompts
OCR returns empty text	Image is too dark or low contrast	Preprocess with ImageDeskew and ImageCrop first
Agent loops without finishing	Too many tools registered	Start with a focused tool set and add tools incrementally for best performance
Merge output is empty	Input file paths do not exist	Use PdfInfo first to verify files exist
Deskew produces artifacts	Skew angle exceeds `maxAngle`	Increase the `maxAngle` parameter or rotate the image 90 degrees first
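
For the "file not found" case, one option is to resolve and verify paths in code before they are embedded in a prompt. The sketch below uses only `System.IO`. `ResolveExisting` is a hypothetical helper, and the temporary file stands in for a real PDF.

```csharp
using System;
using System.IO;

// Hypothetical helper: resolves a path against a base directory and fails early
// if the file is missing, so the agent never receives a path it cannot open.
static string ResolveExisting(string path, string baseDirectory)
{
    string fullPath = Path.GetFullPath(path, baseDirectory);
    if (!File.Exists(fullPath))
        throw new FileNotFoundException($"Input file not found: {fullPath}", fullPath);
    return fullPath;
}

// Demo with a temporary placeholder file.
string baseDir = Path.GetTempPath();
string fileName = $"demo_{Guid.NewGuid():N}.pdf";
File.WriteAllText(Path.Combine(baseDir, fileName), "placeholder");

string absolute = ResolveExisting(fileName, baseDir);
string prompt = $"Check how many pages '{absolute}' has.";
Console.WriteLine(prompt);

File.Delete(absolute);
```

Failing in application code with a clear exception is usually easier to diagnose than a tool error surfaced through the agent's final answer.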

Next Steps

Equip an Agent with Built-In Tools: full reference for all built-in tool categories.
Create an AI Agent with Tools: implement custom document tools with the ITool interface.
Automatically Split Multi-Document PDFs with AI Vision: use DocumentSplitting with vision models for intelligent boundary detection.
Preprocess Images for Vision Pipelines: direct ImageBuffer API for image preprocessing.
Process Scanned Documents with OCR and Vision Models: VLM OCR and Tesseract for scanned documents.
