
Class SpeechToText

Namespace: LMKit.Speech
Assembly: LM-Kit.NET.dll

Provides transcription and language-detection capabilities using an LM model with speech-to-text support.

public sealed class SpeechToText
Inheritance
object ← SpeechToText

Examples

// Transcribe audio file
var model = LM.LoadFromModelID("whisper-large-turbo3");
var engine = new SpeechToText(model);
var result = engine.Transcribe(new WaveFile("audio.wav"));
foreach (var segment in result.Segments)
    Console.WriteLine(segment.Text);

// Detect language synchronously
var language = engine.DetectLanguage(new WaveFile("audio.wav"));
Console.WriteLine($"Detected language: {language}");

// Detect language asynchronously
var asyncLanguage = await engine.DetectLanguageAsync(new WaveFile("audio.wav"));
Console.WriteLine($"Detected language: {asyncLanguage}");

// Transcribe asynchronously
var asyncResult = await engine.TranscribeAsync(new WaveFile("audio.wav"));
foreach (var segment in asyncResult.Segments)
    Console.WriteLine(segment.Text);

Constructors

SpeechToText(LM)

Initializes a new instance of the SpeechToText class.

Properties

Duration

Gets or sets the maximum duration of audio to transcribe.

EnableVoiceActivityDetection

Gets or sets whether voice activity detection (VAD) is enabled. When true, only the detected speech portions of the audio are processed, which can reduce processing time. All VAD-specific behavior is governed by VadSettings. Defaults to true.

Mode

Gets or sets the operating mode: whether to transcribe in the source language or translate into English.

Start

Gets or sets the time offset at which transcription should begin.

VadSettings

Gets or sets the configuration for voice activity detection. Used only when EnableVoiceActivityDetection is true. Setting this property to null restores the default settings.
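
The following is a minimal sketch of how these properties might be combined to transcribe only a window of a longer recording with voice activity detection enabled. It assumes Start and Duration are TimeSpan values; the exact property types, and the member names of the Mode enumeration, are not shown on this page.

// Hypothetical sketch: transcribe a two-minute window starting 30 seconds in.
// Assumes Start and Duration are TimeSpan values (not confirmed on this page).
var model = LM.LoadFromModelID("whisper-large-turbo3");
var engine = new SpeechToText(model)
{
    Start = TimeSpan.FromSeconds(30),    // begin 30 seconds into the audio
    Duration = TimeSpan.FromMinutes(2),  // transcribe at most 2 minutes
    EnableVoiceActivityDetection = true  // process only detected speech (default)
};

var result = engine.Transcribe(new WaveFile("audio.wav"));
foreach (var segment in result.Segments)
    Console.WriteLine(segment.Text);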

Methods

DetectLanguage(WaveFile, CancellationToken)

Synchronously detects the spoken language of the provided audio content.

DetectLanguageAsync(WaveFile, CancellationToken)

Asynchronously detects the spoken language of the provided audio content.

GetSupportedLanguages()

Returns the list of languages supported by the underlying language model.
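
A short sketch of enumerating the supported languages is shown below; the exact return type of GetSupportedLanguages() is not documented on this page, so it is assumed to be an enumerable whose items produce a readable name via ToString().

// Hypothetical sketch: list the languages supported by the loaded model.
foreach (var language in engine.GetSupportedLanguages())
    Console.WriteLine(language);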

Transcribe(WaveFile, string, CancellationToken)

Synchronously transcribes the provided audio content into text segments.

TranscribeAsync(WaveFile, string, CancellationToken)

Asynchronously transcribes the provided audio content into text segments.

Events

OnNewSegment

Raised each time a new AudioSegment is recognized during streaming transcription.

OnProgress

Occurs periodically during speech-to-text processing to report overall progress.
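
The sketch below shows how both events might be observed during an asynchronous transcription. The delegate signatures of OnNewSegment and OnProgress are not shown on this page, so the handlers assume conventional (sender, args) event shapes with hypothetical member names (e.Text, e.Progress).

// Hypothetical sketch: observe segments and progress while transcribing.
// Handler signatures and the e.Text / e.Progress members are assumptions.
engine.OnNewSegment += (sender, e) => Console.WriteLine($"Segment: {e.Text}");
engine.OnProgress += (sender, e) => Console.WriteLine($"Progress: {e.Progress}%");

var asyncResult = await engine.TranscribeAsync(new WaveFile("audio.wav"));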