👉 Try the demo: https://github.com/LM-Kit/lm-kit-net-samples/tree/main/console_net/document-intelligence/email-processing/email_archive_to_markdown

Email Archive to Markdown for C# .NET Applications


🎯 Purpose of the Demo

An interactive console app that converts .eml and .mbox files into searchable Markdown. Headers, body, embedded attachments, and scanned images are all handled by the same DocumentToMarkdown engine used for PDFs and Office files.

All conversion runs on-device.


👥 Industry Target Audience

  • Compliance / legal: eDiscovery, FOIA, GDPR access requests.
  • Helpdesk / support analytics: turn ticket exports into queryable text.
  • Customer service intelligence: classify and summarize email threads.
  • Investigations: forensic review of historical archives.
  • RAG: feed support mailboxes into a knowledge base.

🚀 Problem Solved

Email archives are a nightmare to mine: nested replies, mixed encodings, scanned attachments, MIME multipart bodies, MBOX concatenations. Most pipelines stop at the body and lose the attachments. DocumentToMarkdown handles all of it, including OCR for image attachments. The demo wraps it behind a menu that scales from one file to a thousand.


💻 Application Overview

The app is driven by an interactive menu (no command-line arguments) offering two conversion modes and a Quit option:

| Mode | What it does |
|------|--------------|
| File | Convert a single .eml or .mbox. Optional preview of the produced Markdown. |
| Archive | Convert every email in a folder (recursive). Optionally produce a combined archive.md index with each message as a section. |
| Quit | Exit. |

The OCR model loads once at startup. Per-file output reports strategy, certainty, character count, and elapsed time.

✨ Key Features

  • DocumentToMarkdown.Convert(path, options): handles .eml and .mbox natively.
  • VlmOcr as OcrEngine: handles scanned attachments inside emails.
  • Combined index mode: produces one navigable archive.md for a whole folder.
  • Per-file Markdown: each email also exported individually for downstream chunking.
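
The Archive-mode flow described above (recursive folder scan, one Markdown file per email, plus a combined index) can be sketched roughly as follows. This is a minimal illustration, not the demo's actual source: FindEmailFiles and BuildArchive are hypothetical helpers, the per-file naming scheme is an assumption, and the convert delegate is a stand-in for the real DocumentToMarkdown call so that the sketch runs without LM-Kit installed.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;
using System.Text;

// Hypothetical helper: recursively lists .eml and .mbox files under a folder, in a stable order.
List<string> FindEmailFiles(string root)
{
    var extensions = new HashSet<string>(StringComparer.OrdinalIgnoreCase) { ".eml", ".mbox" };
    return Directory.EnumerateFiles(root, "*", SearchOption.AllDirectories)
        .Where(p => extensions.Contains(Path.GetExtension(p)))
        .OrderBy(p => p, StringComparer.Ordinal)
        .ToList();
}

// Hypothetical helper: writes one .md file per email and returns the combined index text.
// `convert` is a placeholder for the real conversion step (DocumentToMarkdown in the demo).
string BuildArchive(string root, string outputDir, Func<string, string> convert)
{
    Directory.CreateDirectory(outputDir);
    var index = new StringBuilder("# Email archive\n");
    foreach (var path in FindEmailFiles(root))
    {
        string markdown = convert(path);
        string relative = Path.GetRelativePath(root, path);

        // Assumed naming scheme: flatten the relative path so that files with the
        // same name in different subfolders do not overwrite each other.
        string name = relative.Replace('\\', '_').Replace('/', '_') + ".md";
        File.WriteAllText(Path.Combine(outputDir, name), markdown);

        // Each message becomes its own section in the combined index.
        index.Append("\n## ").Append(relative.Replace('\\', '/')).Append("\n\n");
        index.Append(markdown.TrimEnd()).Append('\n');
    }
    return index.ToString();
}

// Usage with a fake converter and a throwaway folder.
string demoRoot = Path.Combine(Path.GetTempPath(), "email-demo-" + Guid.NewGuid().ToString("N"));
string demoOut = Path.Combine(demoRoot, "out");
Directory.CreateDirectory(Path.Combine(demoRoot, "2024"));
File.WriteAllText(Path.Combine(demoRoot, "a.eml"), "Subject: hello");
File.WriteAllText(Path.Combine(demoRoot, "2024", "b.mbox"), "From alice");
File.WriteAllText(Path.Combine(demoRoot, "notes.txt"), "not an email, skipped");

string combined = BuildArchive(demoRoot, demoOut, file => "Converted " + Path.GetFileName(file));
File.WriteAllText(Path.Combine(demoOut, "archive.md"), combined);
Console.WriteLine(combined);
```

Passing the converter in as a delegate keeps the folder walk and index assembly independent of the conversion engine; in a real application the lambda would call into DocumentToMarkdown with whatever options the project uses.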

🧠 Model

  • paddleocr-vl-1.6:0.9b (used only when an email contains scanned image attachments).

🛠️ Getting Started

📋 Prerequisites

  • .NET 8.0 or later

▶️ Running the Application

git clone https://github.com/LM-Kit/lm-kit-net-samples
cd lm-kit-net-samples/console_net/document-intelligence/email-processing/email_archive_to_markdown
dotnet run

Pick a mode from the menu and follow the prompts.
