👉 Try the demo:
https://github.com/LM-Kit/LynxTranscribe

Offline Audio Transcription with LynxTranscribe


🎯 Purpose of the Sample

LynxTranscribe demonstrates how LM-Kit.NET powers real-time speech-to-text in a full-featured desktop application. It showcases on-device audio transcription with voice activity detection, dictation formatting, and multi-language support, all running entirely offline.

The app uses LM-Kit's SpeechToText API with Whisper-based models for accurate transcription, and the Formatter for intelligent punctuation and capitalization.

Why LynxTranscribe?

  • Complete privacy: audio never leaves the device.
  • Production-ready UI: segment navigation, waveform visualization, export options.
  • Real-world workflow: file transcription and live microphone recording.
  • Cross-platform: runs on Windows and macOS via .NET MAUI.
  • Extensible: clean architecture with separated services and helpers.

👥 Target Audience

  • Desktop App Developers: reference implementation for speech-to-text integration
  • Healthcare & Legal: transcription with strict data privacy requirements
  • Content Creators: transcribe podcasts, interviews, and recordings locally
  • Enterprise: offline transcription without cloud dependencies
  • Education: learn LM-Kit speech APIs in a real application context

🚀 Problem Solved

  • Data sovereignty: sensitive audio stays on-device, no cloud upload.
  • Offline operation: works without internet connectivity.
  • Accurate transcription: Whisper-based models with turbo and accurate modes.
  • Usable output: automatic punctuation, capitalization, and formatting.
  • Flexible export: output to TXT, SRT, VTT, DOCX, or RTF.

💻 Sample Application Description

LynxTranscribe is a .NET MAUI desktop app that:

  • Transcribes audio files (WAV, MP3, FLAC, OGG, M4A, WMA) via drag & drop or file picker.
  • Records from microphone with real-time audio level visualization.
  • Displays results in segment view (time-stamped) or document view (continuous text).
  • Provides click-to-seek playback with waveform display.
  • Maintains transcription history for easy access to past work.
  • Exports to multiple formats with a single click.
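To illustrate how time-stamped segments can map to a subtitle export, here is a minimal sketch of SRT generation. It is not the app's TranscriptExporter code: `TimedSegment`, `SrtSketch`, `FormatTimestamp`, and `ToSrt` are hypothetical names introduced for this example, and segments are assumed to carry non-negative start and end times.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;
using System.Text;

// Hypothetical stand-in for a transcribed segment; not an LM-Kit type.
public sealed record TimedSegment(TimeSpan Start, TimeSpan End, string Text);

public static class SrtSketch
{
    // SRT timestamps use the form HH:MM:SS,mmm (comma before the milliseconds).
    public static string FormatTimestamp(TimeSpan t) =>
        string.Format(CultureInfo.InvariantCulture, "{0:00}:{1:00}:{2:00},{3:000}",
            (int)t.TotalHours, t.Minutes, t.Seconds, t.Milliseconds);

    // Each cue is: 1-based index, time range, text, then a blank line.
    public static string ToSrt(IEnumerable<TimedSegment> segments)
    {
        var sb = new StringBuilder();
        int index = 1;
        foreach (var s in segments)
        {
            sb.Append(index++).Append('\n');
            sb.Append(FormatTimestamp(s.Start)).Append(" --> ")
              .Append(FormatTimestamp(s.End)).Append('\n');
            sb.Append(s.Text.Trim()).Append("\n\n");
        }
        return sb.ToString();
    }
}
```

WebVTT output would follow the same shape, with a `WEBVTT` header line and a period instead of a comma before the milliseconds.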

✨ Key Features

  • 🎤 Live Recording: capture audio directly from microphone with countdown and level meter.
  • 📂 File Import: drag & drop or browse for audio files in 6 popular formats.
  • 🔇 Voice Activity Detection: automatically segments speech from silence.
  • ✍️ Dictation Formatting: intelligent punctuation and capitalization via LM-Kit.
  • 🌍 99+ Languages: transcribe speech in any of the 99+ supported languages.
  • ⚡ Dual Modes: choose Turbo (faster) or Accurate (higher quality).
  • 🎵 Audio Playback: built-in player with waveform, seeking, and speed control.
  • 📤 Multi-Format Export: TXT, SRT, VTT, DOCX, RTF output.
  • 🌙 Dark/Light Theme: comfortable viewing in any environment.
  • 📜 History: browse and reload past transcriptions.
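For readers unfamiliar with voice activity detection, the idea of separating speech from silence can be sketched with a simple energy threshold. This is only an illustration of the general technique and makes no claim about how LM-Kit implements VAD; `EnergyVadSketch`, `DetectSpeech`, the frame size, and the threshold value are all assumptions chosen for the example.

```csharp
using System;
using System.Collections.Generic;

public static class EnergyVadSketch
{
    // Returns (start, end) times of runs of frames whose RMS level reaches the threshold.
    // samples: mono PCM normalized to [-1, 1]; frameSize: samples per analysis frame.
    public static List<(TimeSpan Start, TimeSpan End)> DetectSpeech(
        float[] samples, int sampleRate, int frameSize = 480, double rmsThreshold = 0.02)
    {
        var regions = new List<(TimeSpan Start, TimeSpan End)>();
        int? runStart = null; // sample index where the current speech run began

        for (int offset = 0; offset + frameSize <= samples.Length; offset += frameSize)
        {
            double sumSquares = 0;
            for (int i = offset; i < offset + frameSize; i++)
                sumSquares += (double)samples[i] * samples[i];
            double rms = Math.Sqrt(sumSquares / frameSize);

            if (rms >= rmsThreshold)
            {
                runStart ??= offset; // open a run on the first loud frame
            }
            else if (runStart is int start)
            {
                // A quiet frame closes the current run.
                regions.Add((ToTime(start, sampleRate), ToTime(offset, sampleRate)));
                runStart = null;
            }
        }

        // Close a run that lasts through the final full frame.
        if (runStart is int open)
        {
            int end = samples.Length - samples.Length % frameSize;
            regions.Add((ToTime(open, sampleRate), ToTime(end, sampleRate)));
        }
        return regions;
    }

    // Integer tick arithmetic avoids floating-point rounding in the timestamps.
    private static TimeSpan ToTime(int sampleIndex, int sampleRate) =>
        TimeSpan.FromTicks((long)sampleIndex * TimeSpan.TicksPerSecond / sampleRate);
}
```

Production detectors are typically model-based and add smoothing so that short pauses do not split a sentence; the sketch omits both.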

🎵 Supported Audio Formats

Format  Extension
WAV     .wav
MP3     .mp3
FLAC    .flac
OGG     .ogg
M4A     .m4a
WMA     .wma
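A file picker or drag & drop handler usually needs to reject unsupported files before transcription starts. The following sketch checks a path against the extensions listed above; `AudioFormatSketch` and `IsSupported` are hypothetical helpers for illustration, not part of the sample's code or the LM-Kit API.

```csharp
using System;
using System.Collections.Generic;
using System.IO;

public static class AudioFormatSketch
{
    // Extensions from the table above; the comparer ignores case so ".MP3" also matches.
    private static readonly HashSet<string> Supported = new(StringComparer.OrdinalIgnoreCase)
    {
        ".wav", ".mp3", ".flac", ".ogg", ".m4a", ".wma"
    };

    // Checks only the file extension, not the actual file contents.
    public static bool IsSupported(string path) =>
        Supported.Contains(Path.GetExtension(path));
}
```

An extension check is a quick first filter only; a renamed or corrupt file can still fail when decoded.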

🧠 Transcription Modes

Mode      Description                                  Use Case
Turbo     Faster processing, slightly lower accuracy   Quick drafts, real-time needs
Accurate  Higher quality output, more processing time  Final transcripts, professional use

🛠️ Keyboard Shortcuts

Shortcut            Action
Ctrl + O            Open audio file
Ctrl + S            Export transcription
Ctrl + Mouse Wheel  Adjust font size
Space               Play/Pause audio
← / →               Seek backward/forward 5s

🗣️ Example Use Cases

  • Meeting notes: record a meeting, transcribe offline, export to DOCX for sharing.
  • Podcast editing: transcribe episodes to create show notes or subtitles (SRT/VTT).
  • Interview processing: transcribe interviews with click-to-seek for quote verification.
  • Lecture capture: record lectures and get searchable text transcripts.
  • Accessibility: generate captions for video content.
  • Legal/Medical: transcribe sensitive recordings without cloud exposure.

⚙️ Configuration Options

Setting                   Description
Model Mode                Turbo (faster) or Accurate (better quality)
Voice Activity Detection  Enable/disable automatic speech segmentation
Dictation Formatting      Auto-punctuation and capitalization
Storage Paths             Custom locations for models, history, recordings
Theme                     Dark or light mode
UI Language               English or French

🏗️ Architecture Overview

LynxTranscribe/
├── Helpers/            # Utility classes
│   ├── TranscriptExporter.cs
│   ├── WaveformDrawable.cs
│   └── WhisperLanguages.cs
├── Localization/       # Multi-language UI support
├── Models/             # Data models
│   └── TranscriptionRecord.cs
├── Services/           # Business logic
│   ├── LMKitService.cs           # LM-Kit integration
│   ├── AudioPlayerService.cs     # NAudio playback
│   ├── AudioRecorderService.cs   # NAudio recording
│   └── TranscriptionHistoryService.cs
├── MainPage.xaml       # Main UI
└── MainPage.*.cs       # Partial classes (Settings, Export, etc.)

💻 Minimal Integration Snippet

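// Assumes `model` is an already-loaded LM-Kit speech model and
// `audioFilePath` is the path to a supported audio file.

// Initialize LM-Kit speech-to-text
var speechToText = new SpeechToText(model);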
speechToText.EnableVoiceActivityDetection = true;

// Transcribe an audio file
var result = await speechToText.TranscribeAsync(audioFilePath);

// Access segments with timestamps
foreach (var segment in result.Segments)
{
    Console.WriteLine($"[{segment.Start:mm\\:ss}] {segment.Text}");
}

// Apply dictation formatting
var formatted = Formatter.Format(result.Text);

🛠️ Getting Started

📋 Prerequisites

Windows: Visual Studio 2022 (17.8+) with .NET MAUI workload

macOS: Visual Studio for Mac or JetBrains Rider

SDK: .NET 8.0

📥 Download

git clone https://github.com/LM-Kit/LynxTranscribe
cd LynxTranscribe

▶️ Run

dotnet restore
dotnet build
dotnet run --project LynxTranscribe.csproj

On first launch, the app will download the speech recognition model (~1.5 GB). This is a one-time setup.


🔍 Key LM-Kit Types Used

  • SpeechToText: core transcription engine; supports streaming and batch modes.
  • AudioSegment: represents a transcribed segment with start/end timestamps and text.
  • Formatter: applies dictation formatting rules (punctuation, capitalization).
  • WhisperModelCard: model metadata and download management.

⚠️ Troubleshooting

  • Model download fails: check internet connectivity; the model is downloaded once on first run.
  • No audio input: verify microphone permissions and selected input device.
  • Slow transcription: switch to Turbo mode or use a machine with GPU acceleration.
  • Garbled output: ensure audio quality is reasonable; very noisy recordings may need preprocessing.

🔧 Extend the Demo

  • Add real-time streaming transcription during recording using TranscribeStreamAsync.
  • Integrate speaker diarization to identify different speakers.
  • Add translation by chaining transcription with LM-Kit's text generation.
  • Build a batch processor to transcribe multiple files automatically.
  • Create a web API wrapper to expose transcription as a service (still on-device).
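As a starting point for the batch processor idea, here is a minimal sketch that runs a list of files through a caller-supplied transcription delegate, one at a time, and records failures without stopping the batch. `BatchSketch`, `BatchItemResult`, and `TranscribeAllAsync` are hypothetical names; the delegate stands in for whatever engine call the app uses, so nothing here should be read as the LM-Kit API.

```csharp
#nullable enable
using System;
using System.Collections.Generic;
using System.Threading;
using System.Threading.Tasks;

// Outcome for one file: either Text or Error is set.
public sealed record BatchItemResult(string Path, string? Text, string? Error);

public static class BatchSketch
{
    // transcribe: caller-supplied delegate that turns a file path into transcript text.
    public static async Task<List<BatchItemResult>> TranscribeAllAsync(
        IEnumerable<string> paths,
        Func<string, CancellationToken, Task<string>> transcribe,
        CancellationToken cancellationToken = default)
    {
        var results = new List<BatchItemResult>();
        foreach (var path in paths)
        {
            cancellationToken.ThrowIfCancellationRequested();
            try
            {
                var text = await transcribe(path, cancellationToken);
                results.Add(new BatchItemResult(path, text, null));
            }
            catch (Exception ex) when (ex is not OperationCanceledException)
            {
                // Record the failure and keep going with the remaining files.
                results.Add(new BatchItemResult(path, null, ex.Message));
            }
        }
        return results;
    }
}
```

Files are processed sequentially on purpose: on-device transcription tends to saturate the CPU or GPU, so running several files in parallel may not finish sooner.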