👉 Try the demo: https://github.com/LM-Kit/lm-kit-net-samples/tree/main/console_net/document-intelligence/email-processing/email_archive_to_markdown
Email Archive to Markdown for C# .NET Applications
🎯 Purpose of the Demo
An interactive console app that converts `.eml` and `.mbox` files into searchable Markdown. Headers, message bodies, embedded attachments, and scanned images are all handled by the same `DocumentToMarkdown` engine used for PDFs and Office files.
All conversion runs on-device.
👥 Industry Target Audience
- Compliance / legal: eDiscovery, FOIA, GDPR access requests.
- Helpdesk / support analytics: turn ticket exports into queryable text.
- Customer service intelligence: classify and summarize email threads.
- Investigations: forensic review of historical archives.
- RAG: feed support mailboxes into a knowledge base.
🚀 Problem Solved
Email archives are a nightmare to mine: nested replies, mixed encodings, scanned attachments, MIME multipart bodies, MBOX concatenations. Most pipelines stop at the body and lose the attachments. DocumentToMarkdown handles all of it, including OCR for image attachments. The demo wraps it behind a menu that scales from one file to a thousand.
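MBOX concatenation is simpler than it sounds. An `.mbox` file is a sequence of messages, each introduced by a line beginning with `From ` (RFC 4155). `DocumentToMarkdown` parses this itself, so the following is only an illustration of the format, not the demo's code. It assumes the mboxrd variant, in which body lines that start with `From ` are escaped as `>From `. `MboxSplitter` and `Split` are hypothetical names.

```csharp
using System;
using System.Collections.Generic;
using System.Text;
using System.Text.RegularExpressions;

// Illustration only: DocumentToMarkdown reads .mbox natively.
// Assumes the mboxrd variant of the format (RFC 4155).
public static class MboxSplitter
{
    // Body lines escaped by prefixing '>' to "From " (or to an already-escaped ">From ").
    private static readonly Regex EscapedFrom = new Regex("^>(>*From )", RegexOptions.Compiled);

    public static List<string> Split(string mboxText)
    {
        var messages = new List<string>();
        StringBuilder? current = null;

        // Normalize CRLF so files from any platform split the same way.
        foreach (string line in mboxText.Replace("\r\n", "\n").Split('\n'))
        {
            if (line.StartsWith("From ", StringComparison.Ordinal))
            {
                // A "From_" line starts a new message; it is an envelope marker, not a header.
                if (current != null) messages.Add(current.ToString().TrimEnd('\n'));
                current = new StringBuilder();
                continue;
            }

            if (current == null) continue; // ignore any text before the first From_ line

            // mboxrd unescaping removes one '>': ">From " -> "From ", ">>From " -> ">From ".
            current.Append(EscapedFrom.Replace(line, "$1")).Append('\n');
        }

        if (current != null) messages.Add(current.ToString().TrimEnd('\n'));
        return messages;
    }
}
```

Because escaped `>From ` body lines never match the separator test, a message that quotes "From the start" stays in one piece.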
💻 Application Overview
Interactive menu (no command-line arguments) with two conversion modes plus a Quit option:
| Mode | What it does |
|---|---|
| File | Convert a single .eml or .mbox. Optional preview of the produced Markdown. |
| Archive | Convert every email in a folder (recursive). Optionally produce a combined archive.md index with each message as a section. |
| Quit | Exit. |
The OCR model loads once at startup. Per-file output reports strategy, certainty, character count, and elapsed time.
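The Archive flow can be sketched roughly as follows. This sketch assumes a caller-supplied `convertToMarkdown` delegate in place of the real `DocumentToMarkdown.Convert(path, options)` call, whose result type is not assumed here. It walks the folder recursively, writes one `.md` per source file, appends each result as a section of `archive.md`, and times every conversion. `ArchiveIndexer`, `BuildIndex`, and the `sub_b.mbox.md` naming scheme are hypothetical. Strategy and certainty are omitted because they come from the library's result object. For simplicity it emits one section per file, whereas the demo's index has one section per message.

```csharp
using System;
using System.Diagnostics;
using System.IO;
using System.Linq;
using System.Text;

// Hypothetical sketch of Archive mode; convertToMarkdown stands in for DocumentToMarkdown.
public static class ArchiveIndexer
{
    public static string BuildIndex(string inputFolder, string outputFolder, Func<string, string> convertToMarkdown)
    {
        Directory.CreateDirectory(outputFolder);

        // Recursive search for both email formats, sorted for a stable index order.
        var files = Directory.EnumerateFiles(inputFolder, "*", SearchOption.AllDirectories)
            .Where(f => f.EndsWith(".eml", StringComparison.OrdinalIgnoreCase)
                     || f.EndsWith(".mbox", StringComparison.OrdinalIgnoreCase))
            .OrderBy(f => f, StringComparer.Ordinal)
            .ToList();

        var index = new StringBuilder("# Email archive\n\n");
        foreach (string file in files)
        {
            var timer = Stopwatch.StartNew();
            string markdown = convertToMarkdown(file);
            timer.Stop();

            // Per-file export, named after the path relative to the input folder.
            string relative = Path.GetRelativePath(inputFolder, file);
            string perFile = Path.Combine(outputFolder, relative.Replace(Path.DirectorySeparatorChar, '_') + ".md");
            File.WriteAllText(perFile, markdown);

            // One section per source file in the combined index.
            index.Append("## ").Append(relative).Append("\n\n").Append(markdown.Trim()).Append("\n\n");

            Console.WriteLine($"{relative}: {markdown.Length} chars in {timer.ElapsedMilliseconds} ms");
        }

        string indexPath = Path.Combine(outputFolder, "archive.md");
        File.WriteAllText(indexPath, index.ToString());
        return indexPath;
    }
}
```

Passing the converter as a delegate keeps the folder walk and index layout testable without loading the OCR model.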
✨ Key Features
- `DocumentToMarkdown.Convert(path, options)`: handles `.eml` and `.mbox` natively.
- `VlmOcr` as `OcrEngine`: handles scanned attachments inside emails.
- Combined index mode: produces one navigable `archive.md` for a whole folder.
- Per-file Markdown: each email is also exported individually for downstream chunking.
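The per-file exports are intended for downstream chunking. One common approach, shown here as a hypothetical step that is not part of the demo, splits each email's Markdown at ATX headings (`#` to `######`) while ignoring `#` lines inside fenced code blocks. `MarkdownChunker` and `ChunkByHeading` are illustrative names. The sketch assumes the exported Markdown marks its sections with headings.

```csharp
using System;
using System.Collections.Generic;
using System.Text;
using System.Text.RegularExpressions;

// Hypothetical downstream step: heading-delimited chunks for a RAG index.
public static class MarkdownChunker
{
    // ATX heading: 1 to 6 '#' characters followed by a space.
    private static readonly Regex Heading = new Regex(@"^#{1,6} ", RegexOptions.Compiled);
    private static readonly string Fence = new string('`', 3);

    public static List<string> ChunkByHeading(string markdown)
    {
        var chunks = new List<string>();
        var current = new StringBuilder();
        bool inFence = false;

        foreach (string line in markdown.Replace("\r\n", "\n").Split('\n'))
        {
            // Track fenced code blocks so '#' comments inside them never start a chunk.
            if (line.TrimStart().StartsWith(Fence, StringComparison.Ordinal)) inFence = !inFence;

            // A heading outside a fence closes the previous chunk and opens a new one.
            if (!inFence && Heading.IsMatch(line) && current.Length > 0) Flush(chunks, current);

            current.Append(line).Append('\n');
        }

        Flush(chunks, current);
        return chunks;
    }

    private static void Flush(List<string> chunks, StringBuilder current)
    {
        string text = current.ToString().Trim();
        if (text.Length > 0) chunks.Add(text); // skip empty chunks
        current.Clear();
    }
}
```

Each chunk keeps its own heading, so a retrieved passage still carries the context it came from (for example, which attachment it belongs to).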
🧠 Model
`paddleocr-vl-1.6:0.9b` (used only when an email contains scanned image attachments).
🛠️ Getting Started
📋 Prerequisites
- .NET 8.0 or later
▶️ Running the Application
```bash
git clone https://github.com/LM-Kit/lm-kit-net-samples
cd lm-kit-net-samples/console_net/document-intelligence/email-processing/email_archive_to_markdown
dotnet run
```
Pick a mode from the menu and follow the prompts.