👉 Try the demo:
https://github.com/LM-Kit/LynxTranscribe

Offline Audio Transcription with LynxTranscribe

🎯 Purpose of the Demo

LynxTranscribe demonstrates how LM-Kit.NET powers real-time speech-to-text in a full-featured desktop application. It showcases on-device audio transcription with voice activity detection, dictation formatting, and multi-language support, all running entirely offline.

The app uses LM-Kit's SpeechToText API with Whisper-based models for accurate transcription, and the Formatter for intelligent punctuation and capitalization.

Why LynxTranscribe?

Complete privacy: audio never leaves the device.
Production-ready UI: segment navigation, waveform visualization, export options.
Real-world workflow: file transcription and live microphone recording.
Cross-platform: runs on Windows and macOS via .NET MAUI.
Extensible: clean architecture with separated services and helpers.

👥 Target Audience

Desktop App Developers: reference implementation for speech-to-text integration
Healthcare & Legal: transcription with strict data privacy requirements
Content Creators: transcribe podcasts, interviews, and recordings locally
Enterprise: offline transcription without cloud dependencies
Education: learn LM-Kit speech APIs in a real application context

🚀 Problem Solved

Data sovereignty: sensitive audio stays on-device, no cloud upload.
Offline operation: works without internet connectivity.
Accurate transcription: Whisper-based models with Turbo and Accurate modes.
Usable output: automatic punctuation, capitalization, and formatting.
Flexible export: output to TXT, SRT, VTT, DOCX, or RTF.

💻 Sample Application Description

.NET MAUI desktop app that:

Transcribes audio files (WAV, MP3, FLAC, OGG, M4A, WMA) via drag & drop or file picker.
Records from microphone with real-time audio level visualization.
Displays results in segment view (time-stamped) or document view (continuous text).
Provides click-to-seek playback with waveform display.
Maintains transcription history for easy access to past work.
Exports to multiple formats with a single click.
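
As an illustration of how a recording level meter can be driven, the sketch below computes the RMS level of one captured buffer in dBFS. It assumes the recorder delivers 16-bit little-endian PCM; the helper name and parameters are hypothetical and not taken from the LynxTranscribe source.

```csharp
using System;

// Hypothetical helper: RMS level of a 16-bit little-endian PCM buffer, in dBFS
// (0 dBFS = full scale; more negative values are quieter).
static double RmsLevelDb(byte[] buffer, int bytesRecorded)
{
    int sampleCount = bytesRecorded / 2;
    if (sampleCount == 0) return double.NegativeInfinity;

    double sumSquares = 0;
    for (int i = 0; i < sampleCount; i++)
    {
        // Reassemble each little-endian sample and normalize it to [-1, 1).
        short sample = (short)(buffer[2 * i] | (buffer[2 * i + 1] << 8));
        double normalized = sample / 32768.0;
        sumSquares += normalized * normalized;
    }

    double rms = Math.Sqrt(sumSquares / sampleCount);
    return rms > 0 ? 20 * Math.Log10(rms) : double.NegativeInfinity;
}
```

A UI would typically clamp the result to a display range (for example -60 dB to 0 dB) before mapping it to the meter width.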

✨ Key Features

🎤 Live Recording: capture audio directly from microphone with countdown and level meter.
📂 File Import: drag & drop or browse for audio files in six supported formats (WAV, MP3, FLAC, OGG, M4A, WMA).
🔇 Voice Activity Detection: automatically segments speech from silence.
✍️ Dictation Formatting: intelligent punctuation and capitalization via LM-Kit.
🌍 99+ Languages: transcribe speech in any of the 99+ languages supported by the Whisper-based models.
⚡ Dual Modes: choose Turbo (faster) or Accurate (higher quality).
🎵 Audio Playback: built-in player with waveform, seeking, and speed control.
📤 Multi-Format Export: TXT, SRT, VTT, DOCX, RTF output.
🌙 Dark/Light Theme: comfortable viewing in any environment.
📜 History: browse and reload past transcriptions.
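
To make the idea behind voice activity detection concrete, here is a toy energy-threshold detector. It is not LM-Kit's detector, which this page does not describe; it is a deliberately simple stand-in, and the frame size and threshold are illustrative assumptions.

```csharp
using System;
using System.Collections.Generic;

// Toy energy-based VAD: returns (Start, End) sample ranges whose frames reach an RMS threshold.
// NOT LM-Kit's algorithm; frameSize and threshold are illustrative values only.
static List<(int Start, int End)> DetectSpeech(float[] samples, int frameSize = 480, double threshold = 0.02)
{
    var regions = new List<(int Start, int End)>();
    int regionStart = -1;

    for (int offset = 0; offset < samples.Length; offset += frameSize)
    {
        int length = Math.Min(frameSize, samples.Length - offset);
        double sumSquares = 0;
        for (int i = 0; i < length; i++)
            sumSquares += samples[offset + i] * samples[offset + i];
        bool isSpeech = Math.Sqrt(sumSquares / length) >= threshold;

        if (isSpeech && regionStart < 0)
        {
            regionStart = offset;                    // a speech region begins
        }
        else if (!isSpeech && regionStart >= 0)
        {
            regions.Add((regionStart, offset));      // the region ended where this frame starts
            regionStart = -1;
        }
    }

    if (regionStart >= 0)
        regions.Add((regionStart, samples.Length));  // speech ran to the end of the buffer
    return regions;
}
```

Production detectors usually add smoothing (minimum speech and silence durations) so that short pauses do not split a sentence into many segments.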

🎵 Supported Audio Formats

| Format | Extension |
|--------|-----------|
| WAV | .wav |
| MP3 | .mp3 |
| FLAC | .flac |
| OGG | .ogg |
| M4A | .m4a |
| WMA | .wma |
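
When wiring up a file picker or drag & drop handler, it can help to validate the extension before starting a transcription. The following is a minimal sketch using the extensions from the table above; `IsSupportedAudioFile` is a hypothetical helper, not part of LynxTranscribe or LM-Kit.

```csharp
using System;
using System.Collections.Generic;
using System.IO;

// Extensions from the table above; the comparer ignores case so ".MP3" also matches.
var supportedExtensions = new HashSet<string>(StringComparer.OrdinalIgnoreCase)
{
    ".wav", ".mp3", ".flac", ".ogg", ".m4a", ".wma"
};

// Hypothetical helper: true when the path ends in one of the supported extensions.
bool IsSupportedAudioFile(string path) =>
    supportedExtensions.Contains(Path.GetExtension(path));

Console.WriteLine(IsSupportedAudioFile("interview.MP3"));  // True
Console.WriteLine(IsSupportedAudioFile("notes.txt"));      // False
```

The check only looks at the file name; a file with a supported extension can still contain data the decoder rejects, so the transcription call should handle that case as well.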

🧠 Transcription Modes

| Mode | Description | Use Case |
|------|-------------|----------|
| Turbo | Faster processing, slightly lower accuracy | Quick drafts, real-time needs |
| Accurate | Higher quality output, more processing time | Final transcripts, professional use |

🛠️ Keyboard Shortcuts

| Shortcut | Action |
|----------|--------|
| `Ctrl + O` | Open audio file |
| `Ctrl + S` | Export transcription |
| `Ctrl + Mouse Wheel` | Adjust font size |
| `Space` | Play/Pause audio |
| `←` / `→` | Seek backward/forward 5 seconds |

🗣️ Example Use Cases

Meeting notes: record a meeting, transcribe offline, export to DOCX for sharing.
Podcast editing: transcribe episodes to create show notes or subtitles (SRT/VTT).
Interview processing: transcribe interviews with click-to-seek for quote verification.
Lecture capture: record lectures and get searchable text transcripts.
Accessibility: generate captions for video content.
Legal/Medical: transcribe sensitive recordings without cloud exposure.

⚙️ Configuration Options

| Setting | Description |
|---------|-------------|
| Model Mode | Turbo (faster) or Accurate (better quality) |
| Voice Activity Detection | Enable/disable automatic speech segmentation |
| Dictation Formatting | Auto-punctuation and capitalization |
| Storage Paths | Custom locations for models, history, recordings |
| Theme | Dark or light mode |
| UI Language | English or French |

🏗️ Architecture Overview

LynxTranscribe/
├── Helpers/            # Utility classes
│   ├── TranscriptExporter.cs
│   ├── WaveformDrawable.cs
│   └── WhisperLanguages.cs
├── Localization/       # Multi-language UI support
├── Models/             # Data models
│   └── TranscriptionRecord.cs
├── Services/           # Business logic
│   ├── LMKitService.cs           # LM-Kit integration
│   ├── AudioPlayerService.cs     # NAudio playback
│   ├── AudioRecorderService.cs   # NAudio recording
│   └── TranscriptionHistoryService.cs
├── MainPage.xaml       # Main UI
└── MainPage.*.cs       # Partial classes (Settings, Export, etc.)

💻 Minimal Integration Snippet

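// Initialize LM-Kit speech-to-text.
// `model` is assumed to be an already-loaded Whisper-based LM-Kit model.
var speechToText = new SpeechToText(model);
speechToText.EnableVoiceActivityDetection = true;

// Transcribe an audio file.
// `audioFilePath` is assumed to hold the path of the file to transcribe.
var result = await speechToText.TranscribeAsync(audioFilePath);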

// Access segments with timestamps
foreach (var segment in result.Segments)
{
    Console.WriteLine($"[{segment.Start:mm\\:ss}] {segment.Text}");
}

// Apply dictation formatting
var formatted = Formatter.Format(result.Text);
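
Timestamped segments like those in the snippet above map naturally onto subtitle formats. The sketch below turns a sequence of segments into SRT text (cue number, time range, text, blank line). The tuple is a stand-in for whatever segment type the application uses, and `ToSrt` and `SrtTime` are hypothetical helpers rather than LynxTranscribe's actual exporter.

```csharp
using System;
using System.Collections.Generic;
using System.Text;

// Formats a TimeSpan as an SRT timestamp: HH:MM:SS,mmm
static string SrtTime(TimeSpan t) =>
    $"{(int)t.TotalHours:00}:{t.Minutes:00}:{t.Seconds:00},{t.Milliseconds:000}";

// Hypothetical helper: builds SRT text from (Start, End, Text) segments.
static string ToSrt(IEnumerable<(TimeSpan Start, TimeSpan End, string Text)> segments)
{
    var sb = new StringBuilder();
    int index = 1;
    foreach (var (start, end, text) in segments)
    {
        sb.Append(index++).Append('\n');                                              // cue number
        sb.Append(SrtTime(start)).Append(" --> ").Append(SrtTime(end)).Append('\n');  // time range
        sb.Append(text.Trim()).Append("\n\n");                                        // text, then a blank line
    }
    return sb.ToString();
}

// Usage with two made-up segments.
var srt = ToSrt(new[]
{
    (TimeSpan.Zero, TimeSpan.FromSeconds(2.5), "Hello and welcome."),
    (TimeSpan.FromSeconds(2.5), TimeSpan.FromSeconds(5), "Let's get started."),
});
Console.Write(srt);
```

WebVTT output follows the same shape, with a `WEBVTT` header line and a period instead of a comma before the milliseconds.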

🛠️ Getting Started

📋 Prerequisites

Windows: Visual Studio 2022 (17.8+) with .NET MAUI workload

macOS: JetBrains Rider or Visual Studio for Mac (note that Microsoft retired Visual Studio for Mac in August 2024)

SDK: .NET 8.0

📥 Download

git clone https://github.com/LM-Kit/LynxTranscribe
cd LynxTranscribe

▶️ Run

dotnet restore
dotnet build
dotnet run --project LynxTranscribe.csproj

On first launch, the app will download the speech recognition model (~1.5 GB). This is a one-time setup.

🔍 Key LM-Kit Types Used

SpeechToText: core transcription engine; supports streaming and batch modes.
AudioSegment: represents a transcribed segment with start/end timestamps and text.
Formatter: applies dictation formatting rules (punctuation, capitalization).
WhisperModelCard: model metadata and download management.

⚠️ Troubleshooting

Model download fails: check internet connectivity; the model is downloaded once on first run.
No audio input: verify microphone permissions and selected input device.
Slow transcription: switch to Turbo mode or use a machine with GPU acceleration.
Garbled output: ensure audio quality is reasonable; very noisy recordings may need preprocessing.

🔧 Extend the Demo

Add real-time streaming transcription during recording using TranscribeStreamAsync.
Integrate speaker diarization to identify different speakers.
Add translation by chaining transcription with LM-Kit's text generation.
Build a batch processor to transcribe multiple files automatically.
Create a web API wrapper to expose transcription as a service (still on-device).
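
As a starting point for the batch processor idea, the sketch below walks a folder, transcribes each supported file in turn, and writes one `.txt` file per input. The `transcribe` delegate is a placeholder for whatever call produces the transcript text for a single file (for example, a wrapper around the SpeechToText usage shown earlier); the method name and parameters are hypothetical.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;
using System.Threading.Tasks;

// Hypothetical batch helper: returns the number of files transcribed.
static async Task<int> TranscribeFolderAsync(
    string inputDir, string outputDir, Func<string, Task<string>> transcribe)
{
    var extensions = new HashSet<string>(StringComparer.OrdinalIgnoreCase)
        { ".wav", ".mp3", ".flac", ".ogg", ".m4a", ".wma" };

    Directory.CreateDirectory(outputDir);
    int processed = 0;

    var files = Directory.EnumerateFiles(inputDir)
        .Where(f => extensions.Contains(Path.GetExtension(f)))
        .OrderBy(f => f, StringComparer.OrdinalIgnoreCase);

    foreach (var file in files)
    {
        // Files are processed sequentially to keep the sketch simple.
        string text = await transcribe(file);
        string target = Path.Combine(outputDir, Path.GetFileNameWithoutExtension(file) + ".txt");
        await File.WriteAllTextAsync(target, text);
        processed++;
    }
    return processed;
}
```

Passing the transcription step as a delegate keeps the file handling testable without loading a model. A real implementation would likely also add per-file error handling and cancellation support.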

📚 Related Content

Transcribe Audio with Speech-to-Text: Step-by-step guide for integrating the SpeechToText API into your applications.
Tune Whisper Transcription with VAD and Segments: Configure voice activity detection and segment timing for higher-quality transcriptions.
Glossary: Speech-to-Text: Core concepts behind automatic speech recognition and Whisper model architecture.
Glossary: Voice Activity Detection: Understanding how VAD improves transcription accuracy by isolating speech segments.
Speech-to-Text Demo: Simpler console-based transcription demo for quick integration testing.
