Speech-to-Text with LM-Kit.NET


🎯 Purpose of the Sample

This Speech-to-Text demo illustrates how developers can leverage the LM-Kit.NET SDK to efficiently convert audio files into accurate text transcriptions. Using powerful speech recognition models such as OpenAI Whisper, this sample demonstrates easy integration for applications involving audio content indexing, accessibility enhancement, transcription services, and voice-based interaction systems.

The demo utilizes the SpeechToText, WaveFile, and LM classes, providing a straightforward, developer-friendly API for audio transcription.
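A minimal sketch of that flow is shown below. The class names (`SpeechToText`, `WaveFile`, `LM`) come from this sample, but the namespaces, constructors, and method names are assumptions for illustration, not confirmed SDK signatures:

```csharp
using System;
using LMKit.Model;   // assumed namespace for the LM class
using LMKit.Media;   // assumed namespace for WaveFile
using LMKit.Speech;  // assumed namespace for SpeechToText

class Program
{
    static void Main()
    {
        // Load a speech recognition model (constructor shape is an assumption;
        // the URI would point at a Whisper model compatible with LM-Kit).
        var model = new LM("model-uri-or-local-path");

        // Wrap the input audio and run transcription.
        var audio = new WaveFile("sample.wav");
        var speechToText = new SpeechToText(model);
        var result = speechToText.Transcribe(audio);

        Console.WriteLine(result.Text);
    }
}
```

Consult the LM-Kit.NET API reference for the actual class signatures; the point here is the three-step shape: load a model, wrap the audio, transcribe.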


🚀 Problem Solved

Manually transcribing audio content is slow, labor-intensive, and error-prone. LM-Kit's Speech-to-Text feature addresses these challenges by:

  • Automatically converting spoken audio into text with high accuracy.
  • Quickly processing audio files, significantly reducing manual transcription efforts.
  • Offering on-device processing to maintain privacy and enhance performance.

This enables developers to create intelligent, accessible applications effortlessly.


💻 Sample Application Description

This console-based application demonstrates audio transcription with streaming text output:

  1. Model Selection: Choose a pre-trained speech recognition model or specify a custom model.
  2. Audio Loading: Load audio files in WAV format.
  3. Transcription: Convert audio into text segments using selected models.

Transcriptions include segment timing, confidence scores, and language detection.
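Consuming that per-segment metadata might look roughly like this sketch (the event and property names below are illustrative assumptions, not the SDK's confirmed API):

```csharp
// Illustrative only: event and property names are assumptions.
// Each segment carries its timing, detected language, and a confidence score.
speechToText.SegmentTranscribed += (sender, e) =>
{
    Console.WriteLine($"[{e.Start} -> {e.End}] ({e.Language}, confidence {e.Confidence:P0})");
    Console.WriteLine(e.Text);
};
```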


✨ Key Features

  • 🎙️ High-Accuracy Transcription: Converts audio to text accurately and rapidly.
  • 📈 Confidence Scoring: Provides scores indicating transcription accuracy.
  • 🌐 Language Detection: Automatically detects spoken language.
  • 🔒 On-Device Processing: Ensures privacy and rapid processing without cloud dependencies.

🧠 Supported Models

The sample supports multiple speech-to-text models including:

  • OpenAI Whisper Tiny, Base, Small, Medium, Large V3, Large Turbo V3
  • Custom speech-to-text models compatible with LM-Kit
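Mapping a user's choice to one of these models could be sketched as follows. The helper name and model URIs are placeholders, not verified download links; only the `LM` class name comes from this sample:

```csharp
// Sketch: resolve a menu choice to a model. URIs are placeholders.
static LM PickModel(string choice)
{
    return choice switch
    {
        "tiny" => new LM("placeholder-uri/whisper-tiny"),  // smallest, fastest
        "base" => new LM("placeholder-uri/whisper-base"),  // balanced default
        _      => new LM(choice)  // treat any other input as a custom model URI
    };
}
```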

🛠️ Getting Started


📋 Prerequisites

  • .NET Framework 4.6.2 or .NET 6.0

📥 Download the Project


▶️ Running the Application

  1. 📂 Clone the repository:
     git clone https://github.com/LM-Kit/lm-kit-net-samples.git
  2. 📁 Navigate to the project directory:
     cd lm-kit-net-samples/console_net/audio_transcription
  3. 🔨 Build and run the application:
     dotnet build
     dotnet run

💡 Example Usage

  • Select a Model: Upon starting, choose a predefined Whisper model or enter a custom model URI.
  • Provide Audio File: Enter the path of the WAV audio file when prompted.
  • View Live Transcription: Transcriptions are displayed segment-by-segment as the audio is processed.

📌 Viewing Results

Output:

===== Transcription =====
This is an example transcription.
Another transcribed sentence here.
...

┌────────────────── Transcription Complete ──────────────────┐
│   ✅🔊 Done in 00:02.534                                  │
└────────────────────────────────────────────────────────────┘

Transcriptions are displayed clearly, along with total processing time, facilitating easy integration into various applications.