👉 Try the demo:
https://github.com/LM-Kit/lm-kit-net-samples/tree/main/console_net/language_detection_from_document

AI Language Detection from Documents for C# .NET Applications (Images and PDFs)


🎯 Purpose of the Sample

The Language Detection from Document demo shows how to use LM-Kit.NET to automatically detect the language of text contained in documents, including images and PDF files.

It leverages vision-capable models and OCR internally, so you can detect the language of visual content with minimal code and no manual preprocessing.
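
Conceptually, the flow reduces to loading a vision-capable model and handing a detector the document. The sketch below shows that shape; the type and member names (LM, TextTranslation, DetectLanguage, Attachment) and the namespaces are assumptions based on typical LM-Kit.NET usage rather than the demo itself, so check the sample source for the exact API.

    using System;
    using LMKit.Data;        // Attachment: document/image input (namespace assumed)
    using LMKit.Model;       // LM: model loading (namespace assumed)
    using LMKit.Translation; // TextTranslation: assumed to expose language detection

    class LanguageDetectionSketch
    {
        static void Main()
        {
            // Load a vision-capable model from a URI or local path;
            // LM-Kit downloads and caches it on first use.
            var model = new LM("<model URI or local path>");

            // Hypothetical call shape: hand the detector a document (image or PDF)
            // and let LM-Kit run OCR and language detection internally.
            var detector = new TextTranslation(model);
            var language = detector.DetectLanguage(new Attachment("contract.pdf"));

            Console.WriteLine($"Detected language: {language}");
        }
    }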


👥 Industry Target Audience

This demo is particularly useful for developers and organizations working on:

  • Global enterprises: automatically route or localize content based on detected language.
  • Mobile and web app development: detect the language of uploaded files (images or PDFs) to provide localized experiences.
  • Media and publishing: tag and index documents by language to improve search and content organization.
  • Intelligent agent solutions: allow autonomous systems to interpret multilingual document inputs and react without human intervention.

🚀 Problem Solved

Manually determining the language inside documents is inefficient and error-prone, especially when the input arrives as images or PDF files. This sample automates language detection from document content, which is a key capability for agentic workflows (triage, routing, translation, extraction).

The demo also offers a selection of smaller vision-enabled models, so you can balance hardware constraints, latency, and accuracy.


💻 Sample Application Description

The Language Detection from Document demo is a console application that:

  • Lets you select a vision-capable model (or paste a custom model URI)
  • Downloads and loads the model with progress feedback
  • Prompts for a document path (image or PDF)
  • Detects the language and prints the result with processing time (a sketch of this loop follows the list)
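
A minimal sketch of that interactive loop, assuming a hypothetical DetectLanguageFromDocument helper in place of the actual LM-Kit call; the Stopwatch timing mirrors the processing-time output:

    using System;
    using System.Diagnostics;

    static class DetectionLoop
    {
        // Stand-in for the LM-Kit call the demo makes (name and signature are hypothetical);
        // the real sample wires this to its vision model and OCR pipeline.
        static string DetectLanguageFromDocument(object model, string path) => "English"; // stub

        public static void Run(object model)
        {
            while (true)
            {
                Console.Write("Enter a document path (empty to exit): ");
                string path = Console.ReadLine();

                if (string.IsNullOrWhiteSpace(path))
                    break; // empty input ends the session, as in the demo

                var watch = Stopwatch.StartNew();
                string language = DetectLanguageFromDocument(model, path);
                watch.Stop();

                Console.WriteLine($"Detected language: {language} ({watch.Elapsed.TotalSeconds:F1} s)");
            }
        }
    }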

✨ Key Features

  • Model selection: choose from predefined vision-capable models, or provide a custom model URI
  • Progress feedback: shows model download and loading progress (a console progress sketch follows this list)
  • Document-based language detection: detect language from images and PDFs using OCR and a vision model
  • Performance metric: prints processing time for each run
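
The progress feedback can be rendered with simple in-place console redraws. The helper below is a generic sketch; wiring it to LM-Kit's actual download and loading callbacks is left out because that API is not shown here.

    using System;

    static class ProgressConsole
    {
        // Redraws a single console line such as: "Downloading model...  42.0%".
        // Attach this to whichever download/loading progress callback LM-Kit exposes.
        public static void Report(string stage, float percent)
        {
            Console.Write($"\r{stage}... {percent,5:F1}%");

            if (percent >= 100f)
                Console.WriteLine(); // move to the next line once the stage completes
        }
    }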

🤖 Benefits for Agentic Solutions

Adding document-based language detection to autonomous agents provides:

  • Real-time routing: decide what pipeline to run (translation, extraction, summarization) based on language, as sketched after this list
  • Better user experience: respond in the right language without explicit user configuration
  • Scalable deployment: run locally when privacy matters or when cloud access is not desired
  • Improved context awareness: agents can interpret multilingual document inputs more reliably
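
As an illustration of the routing idea, an agent can branch on the detected language before choosing its next pipeline step. The pipeline methods below are placeholders, not part of LM-Kit:

    using System;

    static class AgentRouter
    {
        // Picks the next processing step based on the detected language.
        // ExtractFields, TranslateToEnglish, and ReviewQueue are placeholder stages.
        public static void Route(string documentPath, string detectedLanguage)
        {
            switch (detectedLanguage)
            {
                case "English":
                    ExtractFields(documentPath);      // already in the target language
                    break;
                case "French":
                case "German":
                case "Spanish":
                    TranslateToEnglish(documentPath); // translate before extraction
                    break;
                default:
                    ReviewQueue(documentPath);        // unexpected language: hand off for review
                    break;
            }
        }

        static void ExtractFields(string path)      => Console.WriteLine($"Extracting fields from {path}");
        static void TranslateToEnglish(string path) => Console.WriteLine($"Translating {path} to English");
        static void ReviewQueue(string path)        => Console.WriteLine($"Queuing {path} for review");
    }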

🛠️ Getting Started

📋 Prerequisites

  • .NET Framework 4.6.2 or .NET 8.0+

📥 Download the Project

  1. Clone the repository:

    git clone https://github.com/LM-Kit/lm-kit-net-samples

  2. Navigate to the project directory:

    cd lm-kit-net-samples/console_net/language_detection_from_document

▶️ Running the Application

  1. Build and run the application:

    dotnet build
    dotnet run

  2. Follow the on-screen prompts to select a model and provide the path to an image or PDF for language detection.


💡 Example Usage

  1. Select a model: choose a vision-capable model from the menu, or enter a custom model URI.
  2. Provide a document path: input the file path to an image or PDF containing text.
  3. Language detection: the app processes the document, performs OCR internally, and detects the language.
  4. Review results: the detected language and processing time are displayed.
  5. Repeat or exit: continue with additional documents, or exit the application by submitting an empty input.

By incorporating this demo into your projects, you can enable agents and workflows to automatically detect the language of document inputs (images and PDFs) and route them to the right next step, such as translation, extraction, or indexing.