Method ImportDocument
ImportDocument(Attachment, DocumentMetadata, string, string, CancellationToken)
Imports a document into a DataSource, extracting text from each page and generating embeddings for retrieval.
public DataSource ImportDocument(Attachment attachment, DocumentRag.DocumentMetadata documentMetadata, string dataSourceIdentifier, string pageRange = null, CancellationToken cancellationToken = default)
Parameters
- attachment (Attachment)
The document attachment to import. Must not be null.
- documentMetadata (DocumentRag.DocumentMetadata)
Metadata to associate with the document. Use this to specify a custom document name, reference URL, or additional metadata fields for source attribution.
- dataSourceIdentifier (string)
The unique identifier for the target DataSource. If a matching data source exists, pages are added as new sections; otherwise, a new data source is created.
- pageRange (string)
An optional page range specification (e.g., "1-5", "1,3,5-10") to import only specific pages. If null or empty, all pages are imported.
- cancellationToken (CancellationToken)
A token to cancel the operation.
Returns
- DataSource
The DataSource containing the imported document, or null if no text could be extracted from any page.
Examples
LM embeddingModel = LM.LoadFromModelID("embeddinggemma-300m");
DocumentRag docRag = new DocumentRag(embeddingModel);
// Import with explicit ID for document lifecycle management
var attachment = Attachment.FromFile("document.pdf");
var metadata = new DocumentMetadata(attachment, id: "tech-doc-001");
var dataSource = docRag.ImportDocument(attachment, metadata, "documents");
// Import with custom metadata including reference URL
var customMetadata = new DocumentMetadata(
    attachment,
    id: "tech-doc-002",
    sourceUri: "https://example.com/doc.pdf",
    customMetadata: new MetadataCollection { { "category", "technical" } });
var dataSourceWithMeta = docRag.ImportDocument(attachment, customMetadata, "documents");
// Import only pages 1-10
var partialMetadata = new DocumentMetadata(attachment, id: "tech-doc-003-partial");
var partialSource = docRag.ImportDocument(attachment, partialMetadata, "documents", pageRange: "1-10");
// Later, delete a document using its ID
docRag.DeleteDocument("tech-doc-001", "documents");
Remarks
This method processes the document page by page according to the configured ProcessingMode. Each page becomes a separate section in the data source, with metadata recording the page number and document name for source attribution.
The Progress event is raised throughout the import process to report status updates.
Pages that produce no extractable text (e.g., blank pages or images without OCR) are skipped.
Exceptions
- ArgumentNullException
Thrown if attachment or documentMetadata is null.
- ArgumentException
Thrown if dataSourceIdentifier is null, empty, or whitespace.
- OperationCanceledException
Thrown if the operation is canceled.
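The null return value and the cancellation exception described above can be handled together. The following is a minimal sketch, not a definitive pattern: the `using` namespaces are assumptions about where these types live, the two-minute timeout is an arbitrary illustrative value, and the file name and identifiers are the same placeholders used in the Examples section.

```csharp
using System;
using System.Threading;
using LMKit.Data;       // assumed namespace for Attachment and DataSource
using LMKit.Model;      // assumed namespace for LM
using LMKit.Retrieval;  // assumed namespace for DocumentRag

LM embeddingModel = LM.LoadFromModelID("embeddinggemma-300m");
DocumentRag docRag = new DocumentRag(embeddingModel);

var attachment = Attachment.FromFile("document.pdf");
var metadata = new DocumentRag.DocumentMetadata(attachment, id: "tech-doc-001");

// Request cancellation automatically if the import runs longer than two minutes.
// The timeout value is illustrative only.
using var cts = new CancellationTokenSource(TimeSpan.FromMinutes(2));

try
{
    // pageRange is left at its default (null), so all pages are imported.
    DataSource dataSource = docRag.ImportDocument(
        attachment, metadata, "documents", cancellationToken: cts.Token);

    if (dataSource == null)
    {
        // Documented behavior: null means no page yielded extractable text.
        Console.WriteLine("No text could be extracted; nothing was imported.");
    }
    else
    {
        Console.WriteLine("Import completed.");
    }
}
catch (OperationCanceledException)
{
    // Raised when the token is canceled while the import is in progress.
    Console.WriteLine("Import was canceled before it finished.");
}
```

Passing `cancellationToken` as a named argument skips the optional `pageRange` parameter without having to supply an explicit null.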