Class SpeechToText
Provides transcription and language-detection capabilities using an LM model with speech-to-text support.
public sealed class SpeechToText
- Inheritance: object → SpeechToText
- Inherited Members: object members
Examples
// Transcribe audio file
var model = LM.LoadFromModelID("whisper-large-turbo3");
var engine = new SpeechToText(model);
var result = engine.Transcribe(new WaveFile("audio.wav"));
foreach (var segment in result.Segments)
Console.WriteLine(segment.Text);
// Detect language synchronously
var language = engine.DetectLanguage(new WaveFile("audio.wav"));
Console.WriteLine($"Detected language: {language}");
// Detect language asynchronously
var asyncLanguage = await engine.DetectLanguageAsync(new WaveFile("audio.wav"));
Console.WriteLine($"Detected language: {asyncLanguage}");
// Transcribe asynchronously
var asyncResult = await engine.TranscribeAsync(new WaveFile("audio.wav"));
foreach (var segment in asyncResult.Segments)
Console.WriteLine(segment.Text);
Remarks
Language handling:
- For transcription, pass "auto" to enable language auto-detection, or pass an ISO language code supported by the underlying model (see GetSupportedLanguages()).
- Supported values are model-dependent. Use GetSupportedLanguages() at runtime to retrieve the exact list for the currently loaded model.
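The language handling described above can be sketched as follows. This is a minimal illustration reusing the engine and WaveFile types from the Examples section, and it assumes GetSupportedLanguages() returns an enumerable of ISO language codes:

```csharp
// Auto-detect the spoken language during transcription.
var autoResult = engine.Transcribe(new WaveFile("audio.wav"), "auto");

// Or pin the language explicitly, after confirming the model supports it.
var supported = engine.GetSupportedLanguages();
if (supported.Contains("fr"))
{
    var frenchResult = engine.Transcribe(new WaveFile("audio.wav"), "fr");
}
```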
Constructors
- SpeechToText(LM)
Initializes a new instance of the SpeechToText class.
Properties
- Duration
Gets or sets the maximum duration of audio to transcribe.
- EnableVoiceActivityDetection
Gets or sets whether voice activity detection (VAD) is enabled. When true, only the detected speech portions of the audio are processed, which can reduce processing time. All VAD-specific behavior is governed by VadSettings. Defaults to true.
- Mode
Gets or sets the operating mode: whether to transcribe in the source language or translate into English.
- Prompt
Gets or sets an optional initial text prompt that guides the transcription process. This text is prepended to the decoder's context, which can help bias recognition toward specific vocabulary, phrases, or styles (e.g., domain-specific terminology).
- Start
Gets or sets the time offset at which transcription should begin.
- SuppressHallucinations
Gets or sets whether to apply adaptive filtering to suppress hallucinated segments during transcription.
- SuppressNonSpeechTokens
Gets or sets whether to suppress non-speech tokens during transcription.
- VadSettings
Gets or sets the configuration for voice activity detection. Used only when EnableVoiceActivityDetection is true. Setting this to null reapplies the defaults.
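Putting the properties above together, a configured engine might look like the following sketch. The TimeSpan types for Start and Duration are an assumption made for illustration; check the actual property types in the property pages:

```csharp
var engine = new SpeechToText(model)
{
    EnableVoiceActivityDetection = true,
    SuppressHallucinations = true,
    SuppressNonSpeechTokens = true,
    // Assumed TimeSpan-typed: transcribe one minute starting at 10 seconds.
    Start = TimeSpan.FromSeconds(10),
    Duration = TimeSpan.FromMinutes(1),
    // Bias recognition toward domain-specific vocabulary.
    Prompt = "Medical consultation: dosage, hypertension, tachycardia."
};
```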
Methods
- DetectLanguage(WaveFile, CancellationToken)
Synchronously detects the spoken language of the provided audio content.
- DetectLanguageAsync(WaveFile, CancellationToken)
Asynchronously detects the spoken language of the provided audio content.
- GetSupportedLanguages()
Returns the list of languages supported by the underlying language model.
- Transcribe(WaveFile, string, CancellationToken)
Synchronously transcribes the provided audio content into text segments.
- TranscribeAsync(WaveFile, string, CancellationToken)
Asynchronously transcribes the provided audio content into text segments.
Events
- OnNewSegment
Raised each time a new AudioSegment is recognized during streaming transcription.
- OnProgress
Occurs periodically during speech-to-text processing to report overall progress.
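The events above can be consumed with ordinary subscriptions. The delegate signatures are not documented on this page, so the handler shapes below are assumptions following the conventional .NET event pattern; consult the event pages for the exact argument types:

```csharp
engine.OnNewSegment += (sender, segment) =>
{
    // Hypothetical handler shape: receives each AudioSegment as it is recognized.
    Console.WriteLine(segment.Text);
};

engine.OnProgress += (sender, progress) =>
{
    // Hypothetical handler shape: periodic overall-progress reports.
    Console.WriteLine($"Progress: {progress}");
};
```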