👉 Try the demo: https://github.com/LM-Kit/lm-kit-net-samples/tree/main/console_net/speech/voice-activity-detection/whisper_with_silero_vad
Voice Activity Detection for C# .NET Applications
🎯 Purpose of the Demo
The demo transcribes the same audio file twice on a single loaded SpeechToText engine: once with EnableVoiceActivityDetection = false and once with it set to true. For each pass it prints the wall time, real-time factor, segment count, and full transcript, so both the speedup and the disappearance of silence hallucinations are visible side by side.
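The per-pass metrics can be sketched roughly as follows. This is a minimal illustration, not the demo's actual code: the PassMetrics class and its method names are hypothetical, and it assumes the common definition of real-time factor (processing time divided by audio duration, so values below 1.0 mean faster than real time). The demo may compute or report it differently.

```csharp
using System;
using System.Diagnostics;

// Hypothetical helper, not part of LM-Kit: times one transcription pass
// and derives the real-time factor from the audio duration.
public static class PassMetrics
{
    // Real-time factor, assumed here to be wall time / audio duration.
    public static double RealTimeFactor(TimeSpan wallTime, TimeSpan audioDuration)
    {
        if (audioDuration <= TimeSpan.Zero)
            throw new ArgumentOutOfRangeException(nameof(audioDuration), "Audio duration must be positive.");
        return wallTime.TotalSeconds / audioDuration.TotalSeconds;
    }

    // Runs one pass (any delegate) and returns its result with the elapsed wall time.
    public static (T Result, TimeSpan WallTime) Time<T>(Func<T> pass)
    {
        var stopwatch = Stopwatch.StartNew();
        T result = pass();
        stopwatch.Stop();
        return (result, stopwatch.Elapsed);
    }
}
```

In a comparison like the one the demo describes, the same delegate would be timed twice, once per toggle value, and the two real-time factors printed next to each other.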
👥 Who Should Use This Demo
- Anyone transcribing recorded calls, podcasts, lectures, or meetings.
- Engineers chasing Whisper "hallucinated dialogue during silence" bugs.
🚀 What Problem It Solves
Real-world audio is mostly silence. Naive Whisper transcription spends compute on that silence and frequently invents text to fill it. A Silero VAD frontend gates the audio so the model only sees speech segments. The result is faster, cleaner transcription, with no caller code change other than the toggle.
✨ Key Features
- SpeechToText.EnableVoiceActivityDetection toggle.
- SpeechToText.VadSettings.{EnergyThreshold, MinSpeechDuration, MinSilenceDuration, SpeechPadding, MaxSpeechDuration, SampleOverlapSeconds} knobs.
- Works on top of any Whisper model (whisper-tiny through whisper-large-turbo3).
- Reads WAV directly; uses NAudio to transcode MP3 / M4A / OGG / FLAC on the fly.
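To give a feel for what duration-style knobs typically control, here is a small sketch of generic VAD post-processing. It is not Silero and not LM-Kit's implementation: the VadSketch class is hypothetical, the per-frame speech probabilities are assumed to come from some upstream detector, and the parameters only mirror the conventional meaning of a threshold, a minimum speech duration, a minimum silence duration, and speech padding. The actual semantics and units of VadSettings may differ, and MaxSpeechDuration and SampleOverlapSeconds are not modeled here.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical illustration of conventional VAD post-processing.
public static class VadSketch
{
    // probs: one speech probability per frame; all durations are in seconds.
    public static List<(double Start, double End)> FindSpeech(
        IReadOnlyList<double> probs, double frameSeconds,
        double threshold, double minSpeech, double minSilence, double padding)
    {
        // 1. Collect runs of consecutive frames at or above the threshold.
        var raw = new List<(double Start, double End)>();
        int runStart = -1;
        for (int i = 0; i <= probs.Count; i++)
        {
            bool speech = i < probs.Count && probs[i] >= threshold;
            if (speech && runStart < 0)
            {
                runStart = i;
            }
            else if (!speech && runStart >= 0)
            {
                raw.Add((runStart * frameSeconds, i * frameSeconds));
                runStart = -1;
            }
        }

        // 2. Merge runs separated by a silence shorter than minSilence.
        var merged = new List<(double Start, double End)>();
        foreach (var seg in raw)
        {
            if (merged.Count > 0 && seg.Start - merged[merged.Count - 1].End < minSilence)
                merged[merged.Count - 1] = (merged[merged.Count - 1].Start, seg.End);
            else
                merged.Add(seg);
        }

        // 3. Drop segments shorter than minSpeech, then pad and clamp the rest.
        double total = probs.Count * frameSeconds;
        var result = new List<(double Start, double End)>();
        foreach (var seg in merged)
        {
            if (seg.End - seg.Start < minSpeech) continue;
            result.Add((Math.Max(0.0, seg.Start - padding), Math.Min(total, seg.End + padding)));
        }
        return result;
    }
}
```

Only the returned segments would be handed to the transcription model, which is how gating avoids spending compute on silence. In this simplified version, padded segments may overlap their neighbors; a fuller implementation would likely merge them again.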
⚙️ Getting Started
cd lm-kit-net-samples/console_net/speech/voice-activity-detection/whisper_with_silero_vad
dotnet run -- "C:\path\to\audio.mp3"