Method SplitAsync

Namespace
LMKit.Extraction
Assembly
LM-Kit.NET.dll

SplitAsync(Attachment, CancellationToken)

Asynchronously detects logical document boundaries within the specified attachment.

public Task<DocumentSplittingResult> SplitAsync(Attachment attachment, CancellationToken cancellationToken = default)

Parameters

attachment Attachment

The multi-page PDF Attachment to analyze. Cannot be null.

cancellationToken CancellationToken

A token to monitor for cancellation requests while splitting is running.

Returns

Task<DocumentSplittingResult>

A task that represents the asynchronous operation. The task result contains a DocumentSplittingResult with the detected document segments.

Examples

using LMKit.Model;
using LMKit.Extraction;
using LMKit.Document.Pdf;
using LMKit.Data;
using System;
using System.Collections.Generic;
using System.Threading.Tasks;

// Load a vision-capable model (8B or larger recommended)
LM model = LM.LoadFromModelID("qwen3-vl:8b");

// Create the splitter
var splitter = new DocumentSplitting(model);

// Detect logical document boundaries
var source = new Attachment("multi_doc.pdf");
DocumentSplittingResult result = await splitter.SplitAsync(source);

// Display results
Console.WriteLine($"Found {result.DocumentCount} document(s)");
Console.WriteLine($"Confidence: {result.Confidence:P0}");
foreach (DocumentSegment segment in result.Segments)
{
    Console.WriteLine($"  Pages {segment.StartPage}-{segment.EndPage}: {segment.Label}");
}

// Physically split the PDF using PdfSplitter
if (result.ContainsMultipleDocuments)
{
    List<Attachment> documents = PdfSplitter.Split(source, result);
    Console.WriteLine($"Split into {documents.Count} separate PDFs");
}
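
The cancellationToken parameter can be tied to a timeout so that a long-running split does not block indefinitely. The following is a minimal sketch, not a documented pattern: it assumes cancellation surfaces as OperationCanceledException, which is the usual .NET convention but is not stated on this page, and the two-minute timeout is an arbitrary illustrative value.

```csharp
using LMKit.Model;
using LMKit.Extraction;
using LMKit.Data;
using System;
using System.Threading;
using System.Threading.Tasks;

LM model = LM.LoadFromModelID("qwen3-vl:8b");
var splitter = new DocumentSplitting(model);

// Request cancellation automatically after two minutes (arbitrary value)
using var cts = new CancellationTokenSource(TimeSpan.FromMinutes(2));

try
{
    // Passing a CancellationToken as the second argument selects
    // the SplitAsync(Attachment, CancellationToken) overload
    DocumentSplittingResult result = await splitter.SplitAsync(
        new Attachment("multi_doc.pdf"),
        cts.Token);

    Console.WriteLine($"Found {result.DocumentCount} document(s)");
}
catch (OperationCanceledException)
{
    // Assumption: cancellation is reported the standard .NET way
    Console.WriteLine("Splitting was cancelled before it completed.");
}
```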

Exceptions

ArgumentNullException

Thrown if attachment is null.

SplitAsync(Attachment, bool, string, CancellationToken)

Asynchronously detects logical document boundaries within the specified attachment and, optionally, splits the source PDF into a separate file for each detected segment.

public Task<DocumentSplittingResult> SplitAsync(Attachment attachment, bool splitDocument, string outputDirectory = null, CancellationToken cancellationToken = default)

Parameters

attachment Attachment

The multi-page PDF Attachment to analyze. Cannot be null.

splitDocument bool

When true, the source PDF is physically split into a separate PDF file for each detected segment, and the resulting file paths are available via the Documents property of the returned DocumentSplittingResult. When false, the behavior is identical to SplitAsync(Attachment, CancellationToken).

outputDirectory string

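The directory where the split PDF files are written. It is created if it does not exist. Although this parameter defaults to null, a non-null value is required when splitDocument is true; otherwise an ArgumentNullException is thrown.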

cancellationToken CancellationToken

A token to monitor for cancellation requests while splitting is running.

Returns

Task<DocumentSplittingResult>

A task that represents the asynchronous operation. The task result contains a DocumentSplittingResult with the detected document segments and, when splitDocument is true, the paths to the split PDF files in its Documents property.

Examples

using LMKit.Model;
using LMKit.Extraction;
using LMKit.Data;
using System;
using System.Threading.Tasks;

LM model = LM.LoadFromModelID("qwen3-vl:8b");
var splitter = new DocumentSplitting(model);

// Detect boundaries AND split the PDF into separate files in one call
DocumentSplittingResult result = await splitter.SplitAsync(
    new Attachment("multi_doc_scan.pdf"),
    splitDocument: true,
    outputDirectory: "output/split_docs");

Console.WriteLine($"Found {result.DocumentCount} document(s)");
Console.WriteLine($"Confidence: {result.Confidence:P0}");

// Print each segment's label next to the path of its split PDF
for (int i = 0; i < result.Segments.Count; i++)
{
    Console.WriteLine($"  {result.Segments[i].Label}: {result.Documents[i]}");
}

Exceptions

ArgumentNullException

Thrown if attachment is null, or if splitDocument is true and outputDirectory is null.
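
Because outputDirectory defaults to null but is required when splitDocument is true, callers may prefer to validate it before starting the operation. The sketch below wraps the call in a hypothetical helper, SplitToDirectoryAsync, which is not part of LM-Kit; it also rejects empty or whitespace-only paths, a stricter check than the one documented above.

```csharp
using LMKit.Model;
using LMKit.Extraction;
using LMKit.Data;
using System;
using System.Threading;
using System.Threading.Tasks;

LM model = LM.LoadFromModelID("qwen3-vl:8b");
var splitter = new DocumentSplitting(model);

DocumentSplittingResult result = await SplitToDirectoryAsync(
    splitter, "multi_doc_scan.pdf", "output/split_docs");

Console.WriteLine($"Found {result.DocumentCount} document(s)");

// Hypothetical helper (not an LM-Kit API): fails fast on a missing
// output directory instead of relying on SplitAsync to throw.
static Task<DocumentSplittingResult> SplitToDirectoryAsync(
    DocumentSplitting splitter,
    string pdfPath,
    string outputDirectory,
    CancellationToken cancellationToken = default)
{
    if (string.IsNullOrWhiteSpace(outputDirectory))
    {
        throw new ArgumentException(
            "An output directory is required when splitting.",
            nameof(outputDirectory));
    }

    return splitter.SplitAsync(
        new Attachment(pdfPath),
        splitDocument: true,
        outputDirectory: outputDirectory,
        cancellationToken: cancellationToken);
}
```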
