👉 Try the demo:
https://github.com/LM-Kit/LynxTranscribe
Offline Audio Transcription with LynxTranscribe
🎯 Purpose of the Sample
LynxTranscribe demonstrates how LM-Kit.NET powers real-time speech-to-text in a full-featured desktop application. It showcases on-device audio transcription with voice activity detection, dictation formatting, and multi-language support, all running entirely offline.
The app uses LM-Kit's SpeechToText API with Whisper-based models for accurate transcription, and the Formatter for intelligent punctuation and capitalization.
Why LynxTranscribe?
- Complete privacy: audio never leaves the device.
- Production-ready UI: segment navigation, waveform visualization, export options.
- Real-world workflow: file transcription and live microphone recording.
- Cross-platform: runs on Windows and macOS via .NET MAUI.
- Extensible: clean architecture with separated services and helpers.
👥 Target Audience
- Desktop App Developers: reference implementation for speech-to-text integration
- Healthcare & Legal: transcription with strict data privacy requirements
- Content Creators: transcribe podcasts, interviews, and recordings locally
- Enterprise: offline transcription without cloud dependencies
- Education: learn LM-Kit speech APIs in a real application context
🚀 Problem Solved
- Data sovereignty: sensitive audio stays on-device, no cloud upload.
- Offline operation: works without internet connectivity.
- Accurate transcription: Whisper-based models with turbo and accurate modes.
- Usable output: automatic punctuation, capitalization, and formatting.
- Flexible export: output to TXT, SRT, VTT, DOCX, or RTF.
💻 Sample Application Description
LynxTranscribe is a .NET MAUI desktop app that:
- Transcribes audio files (WAV, MP3, FLAC, OGG, M4A, WMA) via drag & drop or file picker.
- Records from microphone with real-time audio level visualization.
- Displays results in segment view (time-stamped) or document view (continuous text).
- Provides click-to-seek playback with waveform display.
- Maintains transcription history for easy access to past work.
- Exports to multiple formats with a single click.
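As a rough illustration of the real-time audio level visualization mentioned above, a level meter is commonly driven by a root-mean-square (RMS) measurement over each captured buffer. The sketch below assumes 16-bit little-endian mono PCM input; the `AudioLevel` class and its method names are hypothetical and are not taken from the LynxTranscribe source.

```csharp
using System;
using System.Buffers.Binary;

// Hypothetical helper for illustration; not part of LynxTranscribe or LM-Kit.
public static class AudioLevel
{
    // Returns the RMS level of a 16-bit little-endian mono PCM buffer, normalized to [0, 1].
    public static double Rms(ReadOnlySpan<byte> pcm16)
    {
        int sampleCount = pcm16.Length / 2;
        if (sampleCount == 0) return 0.0;

        double sumSquares = 0.0;
        for (int i = 0; i < sampleCount; i++)
        {
            short sample = BinaryPrimitives.ReadInt16LittleEndian(pcm16.Slice(2 * i, 2));
            double normalized = sample / 32768.0;
            sumSquares += normalized * normalized;
        }
        return Math.Sqrt(sumSquares / sampleCount);
    }

    // Converts a normalized level to dBFS, clamped to a floor so silence stays finite.
    public static double ToDecibels(double level, double floorDb = -60.0)
    {
        if (level <= 0.0) return floorDb;
        return Math.Max(floorDb, 20.0 * Math.Log10(level));
    }
}
```

A UI would typically call `Rms` on each recording callback and map the decibel value onto the meter's width.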
✨ Key Features
- 🎤 Live Recording: capture audio directly from microphone with countdown and level meter.
- 📂 File Import: drag & drop or browse for audio files in 6 popular formats.
- 🔇 Voice Activity Detection: automatically segments speech from silence.
- ✍️ Dictation Formatting: intelligent punctuation and capitalization via LM-Kit.
- 🌍 99+ Languages: transcribe in virtually any spoken language.
- ⚡ Dual Modes: choose Turbo (faster) or Accurate (higher quality).
- 🎵 Audio Playback: built-in player with waveform, seeking, and speed control.
- 📤 Multi-Format Export: TXT, SRT, VTT, DOCX, RTF output.
- 🌙 Dark/Light Theme: comfortable viewing in any environment.
- 📜 History: browse and reload past transcriptions.
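To give a feel for what voice activity detection does, the sketch below segments speech from silence with a plain energy threshold. This is only an illustration of the concept: it is not the algorithm LM-Kit uses, and `SimpleVad`, its parameters, and the threshold value are assumptions made for the example.

```csharp
using System;
using System.Collections.Generic;

// Illustrative energy-threshold VAD; NOT the algorithm used by LM-Kit.
public static class SimpleVad
{
    // Splits samples into fixed-size frames and returns (Start, End) frame ranges,
    // End exclusive, where the mean absolute amplitude exceeds the threshold.
    public static List<(int Start, int End)> DetectSpeech(float[] samples, int frameSize, float threshold)
    {
        if (frameSize <= 0) throw new ArgumentOutOfRangeException(nameof(frameSize));

        var regions = new List<(int Start, int End)>();
        int frameCount = samples.Length / frameSize;
        int regionStart = -1; // -1 means "currently in silence"

        for (int f = 0; f < frameCount; f++)
        {
            float sum = 0f;
            for (int i = 0; i < frameSize; i++)
                sum += Math.Abs(samples[f * frameSize + i]);
            bool isSpeech = (sum / frameSize) > threshold;

            if (isSpeech && regionStart < 0)
            {
                regionStart = f; // speech begins
            }
            else if (!isSpeech && regionStart >= 0)
            {
                regions.Add((regionStart, f)); // speech ends
                regionStart = -1;
            }
        }

        // Close a region that runs to the end of the buffer.
        if (regionStart >= 0) regions.Add((regionStart, frameCount));
        return regions;
    }
}
```

Production detectors usually add smoothing and a minimum region length so that short pauses do not split a sentence.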
🎵 Supported Audio Formats
| Format | Extension |
|---|---|
| WAV | .wav |
| MP3 | .mp3 |
| FLAC | .flac |
| OGG | .ogg |
| M4A | .m4a |
| WMA | .wma |
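When wiring up drag & drop or a file picker, it can help to validate extensions against the table above before handing a file to the transcription engine. A minimal sketch follows; the `AudioFormats` class is a hypothetical name, not something defined in the sample.

```csharp
using System;
using System.Collections.Generic;
using System.IO;

// Hypothetical helper for illustration; not part of LynxTranscribe.
public static class AudioFormats
{
    // Extensions from the supported-formats table, compared case-insensitively.
    private static readonly HashSet<string> Supported = new(StringComparer.OrdinalIgnoreCase)
    {
        ".wav", ".mp3", ".flac", ".ogg", ".m4a", ".wma"
    };

    // Returns true when the path ends with one of the supported extensions.
    public static bool IsSupported(string path) =>
        Supported.Contains(Path.GetExtension(path));
}
```

The check only looks at the file name, so a mislabeled file can still fail later during decoding.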
🧠 Transcription Modes
| Mode | Description | Use Case |
|---|---|---|
| Turbo | Faster processing, slightly lower accuracy | Quick drafts, real-time needs |
| Accurate | Higher quality output, more processing time | Final transcripts, professional use |
🛠️ Keyboard Shortcuts
| Shortcut | Action |
|---|---|
| Ctrl + O | Open audio file |
| Ctrl + S | Export transcription |
| Ctrl + Mouse Wheel | Adjust font size |
| Space | Play/Pause audio |
| ← / → | Seek backward/forward 5s |
🗣️ Example Use Cases
- Meeting notes: record a meeting, transcribe offline, export to DOCX for sharing.
- Podcast editing: transcribe episodes to create show notes or subtitles (SRT/VTT).
- Interview processing: transcribe interviews with click-to-seek for quote verification.
- Lecture capture: record lectures and get searchable text transcripts.
- Accessibility: generate captions for video content.
- Legal/Medical: transcribe sensitive recordings without cloud exposure.
⚙️ Configuration Options
| Setting | Description |
|---|---|
| Model Mode | Turbo (faster) or Accurate (better quality) |
| Voice Activity Detection | Enable/disable automatic speech segmentation |
| Dictation Formatting | Auto-punctuation and capitalization |
| Storage Paths | Custom locations for models, history, recordings |
| Theme | Dark or light mode |
| UI Language | English or French |
🏗️ Architecture Overview
LynxTranscribe/
├── Helpers/ # Utility classes
│ ├── TranscriptExporter.cs
│ ├── WaveformDrawable.cs
│ └── WhisperLanguages.cs
├── Localization/ # Multi-language UI support
├── Models/ # Data models
│ └── TranscriptionRecord.cs
├── Services/ # Business logic
│ ├── LMKitService.cs # LM-Kit integration
│ ├── AudioPlayerService.cs # NAudio playback
│ ├── AudioRecorderService.cs # NAudio recording
│ └── TranscriptionHistoryService.cs
├── MainPage.xaml # Main UI
└── MainPage.*.cs # Partial classes (Settings, Export, etc.)
💻 Minimal Integration Snippet
// Initialize LM-Kit speech-to-text
var speechToText = new SpeechToText(model);
speechToText.EnableVoiceActivityDetection = true;
// Transcribe an audio file
var result = await speechToText.TranscribeAsync(audioFilePath);
// Access segments with timestamps
foreach (var segment in result.Segments)
{
    Console.WriteLine($"[{segment.Start:mm\\:ss}] {segment.Text}");
}
// Apply dictation formatting
var formatted = Formatter.Format(result.Text);
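Time-stamped segments like those in the snippet above map naturally onto subtitle formats. The sketch below shows one way to turn segments into SRT text. The `Segment` record is a hypothetical stand-in for whatever segment type the SDK returns, and `SrtWriter` is not the sample's `TranscriptExporter`; treat both as assumptions for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Text;

// Hypothetical stand-in for a transcribed segment (not an LM-Kit type).
public sealed record Segment(TimeSpan Start, TimeSpan End, string Text);

public static class SrtWriter
{
    // SRT timestamps use the form HH:MM:SS,mmm with a comma before milliseconds.
    private static string Stamp(TimeSpan t) =>
        $"{(int)t.TotalHours:00}:{t.Minutes:00}:{t.Seconds:00},{t.Milliseconds:000}";

    // Emits one numbered cue per segment, each followed by a blank line.
    public static string ToSrt(IEnumerable<Segment> segments)
    {
        var sb = new StringBuilder();
        int index = 1;
        foreach (var s in segments)
        {
            sb.Append(index++).Append('\n');
            sb.Append(Stamp(s.Start)).Append(" --> ").Append(Stamp(s.End)).Append('\n');
            sb.Append(s.Text.Trim()).Append("\n\n");
        }
        return sb.ToString();
    }
}
```

WebVTT output is similar: it begins with a `WEBVTT` header line and uses a period instead of a comma before the milliseconds.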
🛠️ Getting Started
📋 Prerequisites
- Windows: Visual Studio 2022 (17.8+) with .NET MAUI workload
- macOS: Visual Studio for Mac or JetBrains Rider
- SDK: .NET 8.0
📥 Download
git clone https://github.com/LM-Kit/LynxTranscribe
cd LynxTranscribe
▶️ Run
dotnet restore
dotnet build
dotnet run --project LynxTranscribe.csproj
On first launch, the app will download the speech recognition model (~1.5 GB). This is a one-time setup.
🔍 Key LM-Kit Types Used
- SpeechToText: core transcription engine; supports streaming and batch modes.
- AudioSegment: represents a transcribed segment with start/end timestamps and text.
- Formatter: applies dictation formatting rules (punctuation, capitalization).
- WhisperModelCard: model metadata and download management.
⚠️ Troubleshooting
- Model download fails: check internet connectivity; the model is downloaded once on first run.
- No audio input: verify microphone permissions and selected input device.
- Slow transcription: switch to Turbo mode or use a machine with GPU acceleration.
- Garbled output: ensure audio quality is reasonable; very noisy recordings may need preprocessing.
🔧 Extend the Demo
- Add real-time streaming transcription during recording using TranscribeStreamAsync.
- Integrate speaker diarization to identify different speakers.
- Add translation by chaining transcription with LM-Kit's text generation.
- Build a batch processor to transcribe multiple files automatically.
- Create a web API wrapper to expose transcription as a service (still on-device).
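As a starting point for the batch processor idea, the sketch below walks a folder, transcribes each matching file, and writes a text file per input. The actual speech-to-text call is injected as a delegate so that nothing here depends on a specific LM-Kit signature; `BatchTranscriber`, its parameters, and the output naming scheme are all assumptions for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Threading.Tasks;

// Hypothetical batch helper; not part of LynxTranscribe.
public static class BatchTranscriber
{
    // Transcribes every file in inputDir whose extension is in `extensions`
    // and writes <name>.txt to outputDir. Returns the number of files processed.
    // `transcribe` is caller-supplied (assumption): it would wrap the real speech-to-text call.
    public static async Task<int> RunAsync(
        string inputDir,
        string outputDir,
        IReadOnlyCollection<string> extensions,
        Func<string, Task<string>> transcribe)
    {
        Directory.CreateDirectory(outputDir);
        int processed = 0;

        foreach (var file in Directory.EnumerateFiles(inputDir))
        {
            string ext = Path.GetExtension(file);
            bool match = false;
            foreach (var e in extensions)
            {
                if (string.Equals(e, ext, StringComparison.OrdinalIgnoreCase)) { match = true; break; }
            }
            if (!match) continue; // skip non-audio files

            string text = await transcribe(file);
            string target = Path.Combine(outputDir, Path.GetFileNameWithoutExtension(file) + ".txt");
            await File.WriteAllTextAsync(target, text);
            processed++;
        }
        return processed;
    }
}
```

Files are processed one at a time here, which keeps memory use predictable; whether parallel transcription helps depends on the model and hardware.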