Class SpeechToText
Provides transcription and language-detection capabilities using an LM model with speech-to-text support.
public sealed class SpeechToText
- Inheritance
- SpeechToText
Examples
// Transcribe audio file
var model = LM.LoadFromModelID("whisper-large-turbo3");
var engine = new SpeechToText(model);
var result = engine.Transcribe(new WaveFile("audio.wav"));
foreach (var segment in result.Segments)
    Console.WriteLine(segment.Text);
// Detect language synchronously
var language = engine.DetectLanguage(new WaveFile("audio.wav"));
Console.WriteLine($"Detected language: {language}");
// Detect language asynchronously
var asyncLanguage = await engine.DetectLanguageAsync(new WaveFile("audio.wav"));
Console.WriteLine($"Detected language: {asyncLanguage}");
// Transcribe asynchronously
var asyncResult = await engine.TranscribeAsync(new WaveFile("audio.wav"));
foreach (var segment in asyncResult.Segments)
    Console.WriteLine(segment.Text);
Constructors
- SpeechToText(LM)
- Initializes a new instance of the SpeechToText class. 
Properties
- Duration
- Gets or sets the maximum duration of audio to transcribe. 
- EnableVoiceActivityDetection
- Gets or sets whether voice activity detection is enabled. When true, only the detected speech portions of the audio are processed, which can reduce processing time. All VAD-specific behavior is governed by VadSettings. Defaults to true.
- Mode
- Gets or sets the operating mode: whether to transcribe in the source language or translate into English. 
- Prompt
- [EXPERIMENTAL] Provides an optional initial text prompt to guide the transcription process. This text is prepended to the decoder’s context, which can help bias recognition toward specific vocabulary, phrases, or styles (e.g., domain-specific terminology). 
- Start
- Gets or sets the time offset at which transcription should begin. 
- VadSettings
- Configuration for voice-activity detection. Used only when EnableVoiceActivityDetection is true. If you set this to null, the default settings are reapplied.
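The properties above are set on the engine before calling a transcription method. The following is a minimal sketch, not a definitive usage pattern: it assumes Prompt is a string property and that EnableVoiceActivityDetection is a bool, neither of which this page states explicitly. The prompt text and file name are placeholders, and the namespace imports for LM, SpeechToText, and WaveFile are omitted because this page does not list them.

```csharp
using System;

// Assumption: the LM-Kit namespaces that define LM, SpeechToText and WaveFile
// are imported; they are not named on this page.
var model = LM.LoadFromModelID("whisper-large-turbo3");
var engine = new SpeechToText(model);

// Documented default is true; set explicitly here only for illustration.
engine.EnableVoiceActivityDetection = true;

// Assumption: Prompt is a string. The text below is a placeholder that biases
// recognition toward domain vocabulary, as described for this property.
engine.Prompt = "Cardiology consultation: stent, angioplasty, echocardiogram.";

// Documented behavior: assigning null reapplies the default VAD settings.
engine.VadSettings = null;

var result = engine.Transcribe(new WaveFile("audio.wav"));
foreach (var segment in result.Segments)
    Console.WriteLine(segment.Text);
```

Start, Duration, and Mode are left at their defaults in this sketch because their types are not given here.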
Methods
- DetectLanguage(WaveFile, CancellationToken)
- Synchronously detects the spoken language of the provided audio content. 
- DetectLanguageAsync(WaveFile, CancellationToken)
- Asynchronously detects the spoken language of the provided audio content. 
- GetSupportedLanguages()
- Returns the list of languages supported by the underlying language model. 
- Transcribe(WaveFile, string, CancellationToken)
- Synchronously transcribes the provided audio content into text segments. 
- TranscribeAsync(WaveFile, string, CancellationToken)
- Asynchronously transcribes the provided audio content into text segments. 
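Each method above accepts a CancellationToken. The sketch below shows one way to bound a call with a timeout using the standard .NET CancellationTokenSource. It assumes that a cancelled call surfaces as an OperationCanceledException, which is the usual .NET convention but is not confirmed on this page. The two-minute timeout and the file name are arbitrary placeholders.

```csharp
using System;
using System.Threading;

// Assumption: the LM-Kit namespaces that define LM, SpeechToText and WaveFile
// are imported; they are not named on this page.
var model = LM.LoadFromModelID("whisper-large-turbo3");
var engine = new SpeechToText(model);

// Cancel automatically if detection takes longer than two minutes.
using var cts = new CancellationTokenSource(TimeSpan.FromMinutes(2));

try
{
    var language = await engine.DetectLanguageAsync(new WaveFile("audio.wav"), cts.Token);
    Console.WriteLine($"Detected language: {language}");
}
catch (OperationCanceledException)
{
    // Assumption: cancellation is reported through this exception type.
    Console.WriteLine("Language detection was cancelled.");
}
```

The same token can be passed to Transcribe and TranscribeAsync. Their string parameter is not described on this page, so it is left out of this sketch.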
Events
- OnNewSegment
- Raised each time a new AudioSegment is recognized during streaming transcription.
- OnProgress
- Occurs periodically during speech-to-text processing to report overall progress.
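Handlers for these events are typically attached before transcription starts. The sketch below counts segments as they arrive. It assumes both events follow the conventional .NET (sender, args) handler shape. The event-argument types and their members are not listed on this page, so the handlers deliberately do not read anything from the arguments.

```csharp
using System;

// Assumption: the LM-Kit namespaces that define LM, SpeechToText and WaveFile
// are imported; they are not named on this page.
var model = LM.LoadFromModelID("whisper-large-turbo3");
var engine = new SpeechToText(model);

int segmentCount = 0;

// Assumption: the event delegate takes (sender, args). Adjust the lambda if
// the actual delegate signature differs.
engine.OnNewSegment += (sender, e) =>
{
    segmentCount++;
    Console.WriteLine($"Segment {segmentCount} recognized.");
};

engine.OnProgress += (sender, e) => Console.WriteLine("Progress update received.");

var result = engine.Transcribe(new WaveFile("audio.wav"));
Console.WriteLine($"Transcription finished after {segmentCount} segment event(s).");
```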