Method Split

Namespace
LMKit.Extraction
Assembly
LM-Kit.NET.dll

Split(Attachment, CancellationToken)

Synchronously detects logical document boundaries within the specified attachment.

public DocumentSplittingResult Split(Attachment attachment, CancellationToken cancellationToken = default)

Parameters

attachment Attachment

The multi-page PDF Attachment to analyze. Cannot be null.

cancellationToken CancellationToken

A token to monitor for cancellation requests. The default value is CancellationToken.None.

Returns

DocumentSplittingResult

A DocumentSplittingResult containing the detected document segments.

Examples

using LMKit.Model;
using LMKit.Extraction;
using LMKit.Document.Pdf;
using LMKit.Data;
using System;
using System.Collections.Generic;

// Load a vision-capable model (8B or larger recommended)
LM model = LM.LoadFromModelID("qwen3-vl:8b");

// Create the splitter
var splitter = new DocumentSplitting(model);

// Detect logical document boundaries
var source = new Attachment("multi_doc.pdf");
DocumentSplittingResult result = splitter.Split(source);

// Display results
Console.WriteLine($"Found {result.DocumentCount} document(s)");
foreach (DocumentSegment segment in result.Segments)
{
    Console.WriteLine($"  Pages {segment.StartPage}-{segment.EndPage}: {segment.Label}");
}

// Physically split the PDF using PdfSplitter
if (result.ContainsMultipleDocuments)
{
    List<Attachment> documents = PdfSplitter.Split(source, result);
    Console.WriteLine($"Split into {documents.Count} separate PDFs");
}

Remarks

This synchronous method blocks the calling thread. In asynchronous or UI contexts, use SplitAsync(Attachment, CancellationToken) instead.
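Because the call blocks until analysis finishes, the cancellationToken parameter is the only way to stop it early. The following is a minimal sketch of bounding the call with a timeout. It assumes that a cancelled call surfaces as OperationCanceledException, which this page does not state; the file name and the two-minute limit are illustrative.

```csharp
using LMKit.Model;
using LMKit.Extraction;
using LMKit.Data;
using System;
using System.Threading;

LM model = LM.LoadFromModelID("qwen3-vl:8b");
var splitter = new DocumentSplitting(model);
var source = new Attachment("multi_doc.pdf");

// Request cancellation automatically after two minutes (illustrative value)
using var cts = new CancellationTokenSource(TimeSpan.FromMinutes(2));

try
{
    DocumentSplittingResult result = splitter.Split(source, cts.Token);
    Console.WriteLine($"Found {result.DocumentCount} document(s)");
}
catch (OperationCanceledException)
{
    // Assumption: the library reports cancellation with this exception type
    Console.WriteLine("Splitting was cancelled before completion.");
}
```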

Exceptions

ArgumentNullException

Thrown if attachment is null.

Split(Attachment, bool, string, CancellationToken)

Synchronously detects logical document boundaries within the specified attachment and optionally splits the source PDF into a separate file for each detected segment.

public DocumentSplittingResult Split(Attachment attachment, bool splitDocument, string outputDirectory = null, CancellationToken cancellationToken = default)

Parameters

attachment Attachment

The multi-page PDF Attachment to analyze. Cannot be null.

splitDocument bool

When true, the source PDF is physically split into a separate PDF file for each detected segment, and the file paths are available via the Documents property of the returned result. When false, the behavior is identical to Split(Attachment, CancellationToken).

outputDirectory string

The directory where the split PDF files are written. The directory is created if it does not exist. Defaults to null, but must be non-null when splitDocument is true.

cancellationToken CancellationToken

A token to monitor for cancellation requests. The default value is CancellationToken.None.

Returns

DocumentSplittingResult

A DocumentSplittingResult containing the detected document segments and, when splitDocument is true, the paths to the split PDF files via Documents.

Examples

using LMKit.Model;
using LMKit.Extraction;
using LMKit.Data;
using System;

LM model = LM.LoadFromModelID("qwen3-vl:8b");
var splitter = new DocumentSplitting(model);

// Detect boundaries AND split the PDF into separate files in one call
DocumentSplittingResult result = splitter.Split(
    new Attachment("multi_doc_scan.pdf"),
    splitDocument: true,
    outputDirectory: "output/split_docs");

Console.WriteLine($"Found {result.DocumentCount} document(s)");

for (int i = 0; i < result.Segments.Count; i++)
{
    Console.WriteLine($"  {result.Segments[i].Label}: {result.Documents[i]}");
}

Remarks

This overload combines boundary detection and physical PDF splitting into a single call. Internally, it performs the same vision language model (VLM) analysis as Split(Attachment, CancellationToken), then uses PdfSplitter to extract each detected segment into a separate PDF file.

This synchronous method blocks the calling thread. In asynchronous or UI contexts, use SplitAsync(Attachment, bool, string, CancellationToken) instead.
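For asynchronous or UI code, a call to the async counterpart might look like the sketch below. The SplitAsync overload is named on this page, but its return type is not shown here; the sketch assumes it returns Task<DocumentSplittingResult> and takes its arguments in the same order as the synchronous overload.

```csharp
using LMKit.Model;
using LMKit.Extraction;
using LMKit.Data;
using System;
using System.Threading;

LM model = LM.LoadFromModelID("qwen3-vl:8b");
var splitter = new DocumentSplitting(model);

// Token source that a caller (for example, a Cancel button) could trigger
using var cts = new CancellationTokenSource();

// Assumption: SplitAsync returns Task<DocumentSplittingResult>
DocumentSplittingResult result = await splitter.SplitAsync(
    new Attachment("multi_doc_scan.pdf"),
    true,                 // splitDocument
    "output/split_docs",  // outputDirectory
    cts.Token);

Console.WriteLine($"Found {result.DocumentCount} document(s)");
```

Awaiting the call keeps the calling thread free while the analysis runs, which is the reason the remark above steers UI code away from the synchronous overload.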

Exceptions

ArgumentNullException

Thrown if attachment is null, or if splitDocument is true and outputDirectory is null.
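The second condition is easy to trigger by accident, because outputDirectory defaults to null. The sketch below illustrates the documented behavior by omitting the directory while requesting a split; the file name is illustrative, and the exact exception message is not specified on this page.

```csharp
using LMKit.Model;
using LMKit.Extraction;
using LMKit.Data;
using System;

LM model = LM.LoadFromModelID("qwen3-vl:8b");
var splitter = new DocumentSplitting(model);
var source = new Attachment("multi_doc_scan.pdf");

try
{
    // outputDirectory is omitted, so it is null while splitDocument is true
    splitter.Split(source, splitDocument: true);
}
catch (ArgumentNullException ex)
{
    // Documented outcome: a null outputDirectory is rejected when splitting is requested
    Console.WriteLine($"Invalid arguments: {ex.Message}");
}
```

Passing an explicit outputDirectory, as in the example earlier in this section, avoids the exception.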