Property SuppressHallucinations
SuppressHallucinations
Gets or sets whether to apply adaptive filtering to suppress hallucinated segments during transcription.
public bool SuppressHallucinations { get; set; }
Property Value
bool
Examples
// `model` is assumed to be an already loaded speech-to-text model.
var engine = new SpeechToText(model);
engine.SuppressHallucinations = false; // Disable hallucination filtering (enabled by default)
var result = engine.Transcribe(new WaveFile("audio.wav"));
Remarks
Speech-to-text models can occasionally produce hallucinated outputs, especially during silent or low-energy audio segments. Common hallucinations include phrases like "Thank you for watching", "Subscribe", "Hello", or other phantom text that does not correspond to actual speech in the audio.
When enabled, the engine applies adaptive filtering that combines multiple validation strategies:
- Audio energy analysis: Computes RMS (Root Mean Square) energy for each segment and compares it against adaptive thresholds derived from previously transcribed segments.
- Statistical adaptation: The filtering threshold adjusts dynamically based on the median RMS, variance, and stability of prior segments. As more segments are processed and confidence in these statistics increases, the filter becomes more precise at distinguishing real speech from low-energy hallucinations.
- No-speech probability: Segments with high no-speech probability scores from the model are filtered out.
- Token confidence: Segments with a very high average token probability bypass additional filtering, as this indicates strong model confidence.
- Speaking rate validation: Validates that the word count relative to segment duration falls within realistic human speaking rates.
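The engine's internal implementation is not documented here. As a rough illustration only, the strategies listed above could be combined along the following lines. The Segment record, the AdaptiveHallucinationFilter class, every constant, and the order of the checks are assumptions made for this sketch and are not part of the library. The sketch adapts to the median RMS only and leaves out variance and stability for brevity.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical per-segment data; these names are illustrative, not library types.
public sealed record Segment(
    float[] Samples,                 // PCM samples normalized to [-1, 1]
    string Text,                     // transcribed text for the segment
    double DurationSeconds,
    double NoSpeechProbability,      // model's no-speech score in [0, 1]
    double AverageTokenProbability); // mean token probability in [0, 1]

public sealed class AdaptiveHallucinationFilter
{
    // Every constant here is an assumed, illustrative value.
    private const double NoSpeechCutoff = 0.8;
    private const double ConfidenceBypass = 0.95;
    private const double MaxWordsPerSecond = 6.0;
    private const double EnergyRatio = 0.25; // required fraction of the median RMS
    private const int MinHistory = 3;        // segments needed before energy filtering

    private readonly List<double> _acceptedRms = new();

    // Root mean square energy of a block of samples.
    public static double Rms(IReadOnlyList<float> samples)
    {
        if (samples.Count == 0) return 0.0;
        double sumSquares = 0.0;
        foreach (float s in samples) sumSquares += (double)s * s;
        return Math.Sqrt(sumSquares / samples.Count);
    }

    private static double Median(IReadOnlyCollection<double> values)
    {
        double[] sorted = values.OrderBy(v => v).ToArray();
        int mid = sorted.Length / 2;
        return sorted.Length % 2 == 1
            ? sorted[mid]
            : (sorted[mid - 1] + sorted[mid]) / 2.0;
    }

    // Returns true when the segment should be kept in the transcript.
    public bool ShouldKeep(Segment segment)
    {
        // No-speech probability: drop segments the model itself flags as silence.
        if (segment.NoSpeechProbability >= NoSpeechCutoff) return false;

        double rms = Rms(segment.Samples);

        // Token confidence: very confident segments skip the remaining checks.
        if (segment.AverageTokenProbability >= ConfidenceBypass)
        {
            _acceptedRms.Add(rms);
            return true;
        }

        // Speaking rate: reject text that is too dense for its duration.
        if (segment.DurationSeconds <= 0.0) return false;
        int words = segment.Text.Split(' ', StringSplitOptions.RemoveEmptyEntries).Length;
        if (words / segment.DurationSeconds > MaxWordsPerSecond) return false;

        // Energy analysis with statistical adaptation: once enough history exists,
        // compare against a threshold relative to the median RMS of kept segments.
        if (_acceptedRms.Count >= MinHistory && rms < EnergyRatio * Median(_acceptedRms))
            return false;

        _acceptedRms.Add(rms);
        return true;
    }
}
```

In this sketch, segments are passed to ShouldKeep in transcription order so that the RMS history builds up as the audio is processed. Because the energy threshold is a fraction of the median RMS of previously kept segments, it scales with the recording's overall level instead of relying on a fixed absolute value.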
This adaptive approach is designed to keep the filter effective across different audio characteristics (quiet recordings, loud environments, varying speaker volumes) without requiring manual threshold tuning.
Defaults to true.