Method Split
- Namespace: LMKit.Extraction
- Assembly: LM-Kit.NET.dll
Split(Attachment, CancellationToken)
Detects logical document boundaries synchronously within the specified attachment.
public DocumentSplittingResult Split(Attachment attachment, CancellationToken cancellationToken = default)
Parameters
- attachment (Attachment): The multi-page PDF Attachment to analyze. Cannot be null.
- cancellationToken (CancellationToken): A token to monitor for cancellation requests. The default value is None.
Returns
- DocumentSplittingResult
A DocumentSplittingResult containing the detected document segments.
Examples
using LMKit.Model;
using LMKit.Extraction;
using LMKit.Document.Pdf;
using LMKit.Data;
using System;
using System.Collections.Generic;
// Load a vision-capable model (8B or larger recommended)
LM model = LM.LoadFromModelID("qwen3-vl:8b");
// Create the splitter
var splitter = new DocumentSplitting(model);
// Detect logical document boundaries
var source = new Attachment("multi_doc.pdf");
DocumentSplittingResult result = splitter.Split(source);
// Display results
Console.WriteLine($"Found {result.DocumentCount} document(s)");
foreach (DocumentSegment segment in result.Segments)
{
    Console.WriteLine($"  Pages {segment.StartPage}-{segment.EndPage}: {segment.Label}");
}
// Physically split the PDF using PdfSplitter
if (result.ContainsMultipleDocuments)
{
    List<Attachment> documents = PdfSplitter.Split(source, result);
    Console.WriteLine($"Split into {documents.Count} separate PDFs");
}
Remarks
This synchronous method blocks the calling thread. In asynchronous or UI contexts, use SplitAsync(Attachment, CancellationToken) instead.
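Because the call blocks, the cancellationToken parameter is the only way to bound how long it runs. The following is a minimal sketch, reusing the model ID and file name from the example above. It assumes that cancellation surfaces as an OperationCanceledException, which is the usual .NET convention but is not confirmed on this page. The two-minute timeout is an arbitrary illustrative value.

```csharp
using LMKit.Model;
using LMKit.Extraction;
using LMKit.Data;
using System;
using System.Threading;

LM model = LM.LoadFromModelID("qwen3-vl:8b");
var splitter = new DocumentSplitting(model);
var source = new Attachment("multi_doc.pdf");

// Request cancellation automatically after two minutes (arbitrary illustrative timeout)
using var cts = new CancellationTokenSource(TimeSpan.FromMinutes(2));

try
{
    // Pass the token so the blocking call can be interrupted
    DocumentSplittingResult result = splitter.Split(source, cts.Token);
    Console.WriteLine($"Found {result.DocumentCount} document(s)");
}
catch (OperationCanceledException)
{
    // Assumption: a cancelled Split call throws OperationCanceledException
    Console.WriteLine("Splitting was cancelled before it completed.");
}
```

A token from a CancellationTokenSource owned by the caller, for example one wired to a Cancel button, can be passed in the same way.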
Exceptions
- ArgumentNullException
Thrown if attachment is null.
Split(Attachment, bool, string, CancellationToken)
Detects logical document boundaries synchronously within the specified attachment, and optionally splits the source PDF into separate files for each detected segment.
public DocumentSplittingResult Split(Attachment attachment, bool splitDocument, string outputDirectory = null, CancellationToken cancellationToken = default)
Parameters
- attachment (Attachment): The multi-page PDF Attachment to analyze. Cannot be null.
- splitDocument (bool): When true, the source PDF is physically split into a separate PDF file for each detected segment, and the file paths are available via Documents. When false, behavior is identical to Split(Attachment, CancellationToken).
- outputDirectory (string): The directory where the split PDF files are written. Created if it does not exist. Required when splitDocument is true.
- cancellationToken (CancellationToken): A token to monitor for cancellation requests. The default value is None.
Returns
- DocumentSplittingResult
A DocumentSplittingResult containing the detected document segments and, when splitDocument is true, the paths to the split PDF files via Documents.
Examples
using LMKit.Model;
using LMKit.Extraction;
using LMKit.Data;
using System;
LM model = LM.LoadFromModelID("qwen3-vl:8b");
var splitter = new DocumentSplitting(model);
// Detect boundaries AND split the PDF into separate files in one call
DocumentSplittingResult result = splitter.Split(
    new Attachment("multi_doc_scan.pdf"),
    splitDocument: true,
    outputDirectory: "output/split_docs");
Console.WriteLine($"Found {result.DocumentCount} document(s)");
for (int i = 0; i < result.Segments.Count; i++)
{
    Console.WriteLine($"  {result.Segments[i].Label}: {result.Documents[i]}");
}
Remarks
This overload combines boundary detection and physical PDF splitting into a single call. Internally it performs the same VLM-based analysis as Split(Attachment, CancellationToken), then uses PdfSplitter to extract each detected segment into a separate PDF file.
This synchronous method blocks the calling thread. In asynchronous or UI contexts, use SplitAsync(Attachment, bool, string, CancellationToken) instead.
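For asynchronous callers, the same work can be sketched with the SplitAsync overload named above. This page documents neither its return type nor its parameter names, so the sketch passes arguments positionally and assumes the call is awaitable and yields a DocumentSplittingResult. The file name and output directory are the illustrative values from the example above.

```csharp
using LMKit.Model;
using LMKit.Extraction;
using LMKit.Data;
using System;
using System.Threading;

LM model = LM.LoadFromModelID("qwen3-vl:8b");
var splitter = new DocumentSplitting(model);

using var cts = new CancellationTokenSource();

// Assumption: SplitAsync is awaitable and produces a DocumentSplittingResult
DocumentSplittingResult result = await splitter.SplitAsync(
    new Attachment("multi_doc_scan.pdf"),
    true,                   // splitDocument
    "output/split_docs",    // outputDirectory
    cts.Token);

Console.WriteLine($"Found {result.DocumentCount} document(s)");
```

Awaiting the call frees the calling thread while the analysis runs, which is the reason the remarks recommend it for UI contexts.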
Exceptions
- ArgumentNullException
Thrown if attachment is null, or if splitDocument is true and outputDirectory is null.
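The two null conditions above can be checked before any model work begins, for example when the arguments come from user input. The helper below is a hypothetical illustration that mirrors the documented rules. SplitArguments and EnsureValid are not part of LM-Kit, and the attachment is typed as object so that the sketch compiles without the library.

```csharp
using System;

// Hypothetical helper (not part of LM-Kit): mirrors the documented
// ArgumentNullException conditions of Split(Attachment, bool, string, CancellationToken).
static class SplitArguments
{
    public static void EnsureValid(object attachment, bool splitDocument, string outputDirectory)
    {
        // The attachment is always required
        if (attachment is null)
        {
            throw new ArgumentNullException(nameof(attachment));
        }

        // The output directory is required only when physical splitting is requested
        if (splitDocument && outputDirectory is null)
        {
            throw new ArgumentNullException(nameof(outputDirectory));
        }
    }
}
```

Calling SplitArguments.EnsureValid(source, true, null) throws an ArgumentNullException whose ParamName is "outputDirectory", while the same call with splitDocument set to false returns normally.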