Class SpeechToText
Provides transcription and language-detection capabilities using an LM model with speech-to-text support.
public sealed class SpeechToText
- Inheritance
object
SpeechToText
Examples
// Transcribe an audio file
var model = LM.LoadFromModelID("whisper-large-turbo3");
var engine = new SpeechToText(model);
var result = engine.Transcribe(new WaveFile("audio.wav"));
foreach (var segment in result.Segments)
    Console.WriteLine(segment.Text);

// Detect language synchronously
var language = engine.DetectLanguage(new WaveFile("audio.wav"));
Console.WriteLine($"Detected language: {language}");

// Detect language asynchronously
var asyncLanguage = await engine.DetectLanguageAsync(new WaveFile("audio.wav"));
Console.WriteLine($"Detected language: {asyncLanguage}");

// Transcribe asynchronously
var asyncResult = await engine.TranscribeAsync(new WaveFile("audio.wav"));
foreach (var segment in asyncResult.Segments)
    Console.WriteLine(segment.Text);
Constructors
- SpeechToText(LM)
Initializes a new instance of the SpeechToText class.
Properties
- Duration
Gets or sets the maximum duration of audio to transcribe.
- EnableVoiceActivityDetection
Gets or sets whether voice activity detection (VAD) is enabled. When true, only the detected speech portions of the audio are processed, which can reduce processing time. All VAD-specific behavior is governed by VadSettings. Defaults to true.
- Mode
Gets or sets the operating mode: whether to transcribe in the source language or translate into English.
- Start
Gets or sets the time offset at which transcription should begin.
- VadSettings
Configuration for voice activity detection. Used only when EnableVoiceActivityDetection is true. If you set this to null, defaults will be reapplied.
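The two VAD-related properties described above could be used as in the fragment below. It is a sketch based only on those descriptions: it assumes `engine` is the SpeechToText instance built in the Examples section and leaves out Start, Duration, and Mode, whose types are not shown on this page.

```csharp
// Assumes `engine` is the SpeechToText instance created in the Examples section.

// Turn VAD off so the whole recording is processed, silence included.
engine.EnableVoiceActivityDetection = false;

// Turn VAD back on (the documented default).
engine.EnableVoiceActivityDetection = true;

// Per the VadSettings description, assigning null reapplies the default settings.
engine.VadSettings = null;
```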
Methods
- DetectLanguage(WaveFile, CancellationToken)
Synchronously detects the spoken language of the provided audio content.
- DetectLanguageAsync(WaveFile, CancellationToken)
Asynchronously detects the spoken language of the provided audio content.
- GetSupportedLanguages()
Returns the list of languages supported by the underlying language model.
- Transcribe(WaveFile, string, CancellationToken)
Synchronously transcribes the provided audio content into text segments.
- TranscribeAsync(WaveFile, string, CancellationToken)
Asynchronously transcribes the provided audio content into text segments.
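Each method above accepts a CancellationToken, so a long transcription can be given a time limit. The sketch below shows one way to do this using only standard .NET types. `RunWithTimeoutAsync` is a hypothetical helper, not part of the SpeechToText API, and the example uses a stand-in delay instead of a real transcription so that it runs on its own. How the library reports cancellation is not stated on this page; the sketch assumes the usual .NET convention of throwing OperationCanceledException.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

public static class CancellationSketch
{
    // Hypothetical helper: runs any token-aware operation and cancels
    // its token once the timeout elapses.
    public static async Task<T> RunWithTimeoutAsync<T>(
        Func<CancellationToken, Task<T>> operation,
        TimeSpan timeout)
    {
        // This constructor schedules cancellation after `timeout`.
        using var cts = new CancellationTokenSource(timeout);
        return await operation(cts.Token);
    }

    public static async Task Main()
    {
        // Stand-in for a long-running, token-aware call such as TranscribeAsync.
        Func<CancellationToken, Task<string>> slowWork = async token =>
        {
            await Task.Delay(TimeSpan.FromSeconds(30), token);
            return "done";
        };

        try
        {
            await RunWithTimeoutAsync(slowWork, TimeSpan.FromMilliseconds(100));
        }
        catch (OperationCanceledException)
        {
            // Task.Delay throws TaskCanceledException, which derives from this type.
            Console.WriteLine("Operation cancelled after timeout.");
        }
    }
}
```

With a real engine, the lambda passed to the helper would call TranscribeAsync or DetectLanguageAsync and forward the token as the last argument.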
Events
- OnNewSegment
Raised each time a new AudioSegment is recognized during streaming transcription.
- OnProgress
Occurs periodically during speech-to-text processing to report overall progress.