
👉 Try the demo: https://github.com/LM-Kit/lm-kit-net-samples/tree/main/console_net/speech/voice-activity-detection/whisper_with_silero_vad

Voice Activity Detection for C# .NET Applications


🎯 Purpose of the Demo

The demo transcribes the same audio file twice on a single loaded SpeechToText engine: once with EnableVoiceActivityDetection = false and once with it set to true. For each pass it prints wall time, real-time factor, segment count, and the full transcript, so both the speedup and the disappearance of silence hallucinations are visible.
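The timing side of that comparison can be sketched without touching the speech engine at all. The helper below is a minimal illustration, not the demo's actual code: PassTimer, Time, and RealTimeFactor are hypothetical names, and the real-time factor is assumed to be processing time divided by audio duration (lower is faster). The demo may report the inverse ratio instead.

```csharp
using System;
using System.Diagnostics;

// Hypothetical helper, not part of LM-Kit or the demo.
public static class PassTimer
{
    // Runs one transcription pass (passed in as a delegate) and returns
    // its result together with the elapsed wall-clock time.
    public static (T Result, TimeSpan WallTime) Time<T>(Func<T> pass)
    {
        var stopwatch = Stopwatch.StartNew();
        T result = pass();
        stopwatch.Stop();
        return (result, stopwatch.Elapsed);
    }

    // Assumed definition: processing time / audio duration.
    // A value of 0.25 means 60 s of audio took 15 s to process.
    public static double RealTimeFactor(TimeSpan wallTime, TimeSpan audioDuration)
    {
        if (audioDuration <= TimeSpan.Zero)
            throw new ArgumentOutOfRangeException(nameof(audioDuration));
        return wallTime.TotalSeconds / audioDuration.TotalSeconds;
    }
}
```

In a comparison like the one described, the same delegate would be timed twice, once per toggle state, and the two real-time factors printed side by side.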

👥 Who Should Use This Demo

  • Anyone transcribing recorded calls, podcasts, lectures, or meetings.
  • Engineers chasing Whisper "hallucinated dialogue during silence" bugs.

🚀 What Problem It Solves

Real-world audio often contains long stretches of silence. Naive Whisper transcription spends compute on that silence and frequently invents text to fill it. A Silero VAD front end gates the audio so the model only sees speech segments. The result is faster, cleaner transcription, with no caller code change other than the toggle.
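To make the gating idea concrete, here is a deliberately simplified sketch. Silero VAD is a neural model; this example swaps in a plain RMS energy threshold as a stand-in, so it shows only the shape of the technique (classify frames, keep the speech ranges), not how LM-Kit or Silero work internally. EnergyGate, FindSpeech, and all parameters are hypothetical.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical energy-based stand-in for a VAD front end.
public static class EnergyGate
{
    // Returns (start, end-exclusive) sample ranges whose frame RMS
    // is at or above the threshold. Everything else is treated as silence.
    public static List<(int Start, int End)> FindSpeech(
        float[] samples, int frameSize, float rmsThreshold)
    {
        if (frameSize <= 0)
            throw new ArgumentOutOfRangeException(nameof(frameSize));

        var segments = new List<(int Start, int End)>();
        int segmentStart = -1; // -1 means "currently in silence"

        for (int offset = 0; offset < samples.Length; offset += frameSize)
        {
            int end = Math.Min(offset + frameSize, samples.Length);

            // Root-mean-square energy of this frame.
            double sumSquares = 0;
            for (int i = offset; i < end; i++)
                sumSquares += samples[i] * samples[i];
            double rms = Math.Sqrt(sumSquares / (end - offset));

            bool isSpeech = rms >= rmsThreshold;
            if (isSpeech && segmentStart < 0)
            {
                segmentStart = offset;            // speech begins
            }
            else if (!isSpeech && segmentStart >= 0)
            {
                segments.Add((segmentStart, offset)); // speech ends
                segmentStart = -1;
            }
        }

        // Close a segment that runs to the end of the audio.
        if (segmentStart >= 0)
            segments.Add((segmentStart, samples.Length));
        return segments;
    }
}
```

Only the returned ranges would be handed to the transcription model, which is why silent stretches cost no compute and leave no room for invented text.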

✨ Key Features

  • SpeechToText.EnableVoiceActivityDetection toggle.
  • SpeechToText.VadSettings.{EnergyThreshold, MinSpeechDuration, MinSilenceDuration, SpeechPadding, MaxSpeechDuration, SampleOverlapSeconds} knobs.
  • Works on top of any Whisper model (whisper-tiny through whisper-large-turbo3).
  • Reads WAV directly; uses NAudio to transcode MP3 / M4A / OGG / FLAC on the fly.
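The names of the duration and padding knobs listed above suggest the usual VAD post-processing steps: bridge short pauses, discard very short bursts, and pad what remains. The sketch below illustrates those three steps on raw speech ranges given in seconds. It is an assumption about what such settings typically mean, not a description of LM-Kit's implementation. SegmentSmoother and its parameters are hypothetical, and the exact semantics of VadSettings should be checked against the LM-Kit documentation.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical post-processing of raw VAD output; times are in seconds.
public static class SegmentSmoother
{
    // Assumes 'raw' is sorted by start time.
    public static List<(double Start, double End)> Smooth(
        IReadOnlyList<(double Start, double End)> raw,
        double minSpeech, double minSilence, double padding, double audioLength)
    {
        // Step 1: merge neighbours separated by a pause shorter than minSilence.
        var merged = new List<(double Start, double End)>();
        foreach (var seg in raw)
        {
            int last = merged.Count - 1;
            if (last >= 0 && seg.Start - merged[last].End < minSilence)
                merged[last] = (merged[last].Start, Math.Max(merged[last].End, seg.End));
            else
                merged.Add(seg);
        }

        var result = new List<(double Start, double End)>();
        foreach (var seg in merged)
        {
            // Step 2: drop bursts too short to be real speech.
            if (seg.End - seg.Start < minSpeech)
                continue;

            // Step 3: pad both sides, clamped to the audio bounds,
            // so word onsets and tails are not clipped.
            double start = Math.Max(0, seg.Start - padding);
            double end = Math.Min(audioLength, seg.End + padding);
            result.Add((start, end));
        }
        return result;
    }
}
```

With a 0.3 s minimum silence, a 0.25 s minimum speech duration, and 0.1 s of padding, two ranges 0.1 s apart collapse into one, while a 0.05 s click elsewhere in the file is dropped. Padded ranges can overlap when the padding is large relative to the minimum silence; a fuller version would merge them again.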

⚙️ Getting Started

cd lm-kit-net-samples/console_net/speech/voice-activity-detection/whisper_with_silero_vad
dotnet run -- "C:\path\to\audio.mp3"

📚 Additional Resources
