Class SpeechToText
Provides transcription and language-detection capabilities using an LM model with speech-to-text support.
public sealed class SpeechToText
- Inheritance: object → SpeechToText
- Inherited Members: object members
Examples
// Transcribe audio file
var model = LM.LoadFromModelID("whisper-large-turbo3");
var engine = new SpeechToText(model);
var result = engine.Transcribe(new WaveFile("audio.wav"));
foreach (var segment in result.Segments)
Console.WriteLine(segment.Text);
// Detect language synchronously
var language = engine.DetectLanguage(new WaveFile("audio.wav"));
Console.WriteLine($"Detected language: {language}");
// Detect language asynchronously
var asyncLanguage = await engine.DetectLanguageAsync(new WaveFile("audio.wav"));
Console.WriteLine($"Detected language: {asyncLanguage}");
// Transcribe asynchronously
var asyncResult = await engine.TranscribeAsync(new WaveFile("audio.wav"));
foreach (var segment in asyncResult.Segments)
Console.WriteLine(segment.Text);
Remarks
Language handling:
- For transcription, pass "auto" to enable language auto-detection, or pass an ISO language code supported by the underlying model (see GetSupportedLanguages()).
- Supported values are model-dependent. Use GetSupportedLanguages() at runtime to retrieve the exact list for the currently loaded model.
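The language handling described above can be sketched as follows. This is a minimal illustration reusing the engine and WaveFile types from the Examples section, and it assumes GetSupportedLanguages() returns an enumerable of ISO language codes:

```csharp
// Auto-detect the spoken language during transcription.
var autoResult = engine.Transcribe(new WaveFile("audio.wav"), "auto");

// Or pin the language explicitly, after confirming the model supports it.
var supported = engine.GetSupportedLanguages();
if (supported.Contains("fr"))
{
    var frenchResult = engine.Transcribe(new WaveFile("audio.wav"), "fr");
}
```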
Constructors
- SpeechToText(LM)
Initializes a new instance of the SpeechToText class.
Properties
- Duration
Gets or sets the maximum duration of audio to transcribe.
- EnableVoiceActivityDetection
Gets or sets whether voice activity detection (VAD) is enabled. When true, only the detected speech portions of the audio are processed, which can reduce processing time. All VAD-specific behavior is governed by VadSettings. Defaults to true.
- Mode
Gets or sets the operating mode: whether to transcribe in the source language or translate into English.
- Prompt
Gets or sets an optional initial text prompt that guides the transcription process. This text is prepended to the decoder's context, which can help bias recognition toward specific vocabulary, phrases, or styles (e.g., domain-specific terminology).
- Start
Gets or sets the time offset at which transcription should begin.
- SuppressHallucinations
Gets or sets whether to apply adaptive filtering to suppress hallucinated segments during transcription.
- SuppressNonSpeechTokens
Gets or sets whether to suppress non-speech tokens during transcription.
- VadSettings
Gets or sets the configuration for voice activity detection. Used only when EnableVoiceActivityDetection is true. Setting this to null reapplies the defaults.
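Putting the properties above together, a configured engine might look like the following sketch. The TimeSpan types for Start and Duration are an assumption made for illustration; check the actual property types in the property pages:

```csharp
var engine = new SpeechToText(model)
{
    EnableVoiceActivityDetection = true,
    SuppressHallucinations = true,
    SuppressNonSpeechTokens = true,
    // Assumed TimeSpan-typed: transcribe one minute starting at 10 seconds.
    Start = TimeSpan.FromSeconds(10),
    Duration = TimeSpan.FromMinutes(1),
    // Bias recognition toward domain-specific vocabulary.
    Prompt = "Medical consultation: dosage, hypertension, tachycardia."
};
```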
Methods
- DetectLanguage(WaveFile, CancellationToken)
Synchronously detects the spoken language of the provided audio content.
- DetectLanguageAsync(WaveFile, CancellationToken)
Asynchronously detects the spoken language of the provided audio content.
- GetSupportedLanguages()
Returns the list of languages supported by the underlying language model.
- Transcribe(WaveFile, string, CancellationToken)
Synchronously transcribes the provided audio content into text segments.
- TranscribeAsync(WaveFile, string, CancellationToken)
Asynchronously transcribes the provided audio content into text segments.
Events
- OnNewSegment
Raised each time a new AudioSegment is recognized during streaming transcription.
- OnProgress
Occurs periodically during speech-to-text processing to report overall progress.
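The events above can be consumed with ordinary subscriptions. The delegate signatures are not documented on this page, so the handler shapes below are assumptions following the conventional .NET event pattern; consult the event pages for the exact argument types:

```csharp
engine.OnNewSegment += (sender, segment) =>
{
    // Hypothetical handler shape: receives each AudioSegment as it is recognized.
    Console.WriteLine(segment.Text);
};

engine.OnProgress += (sender, progress) =>
{
    // Hypothetical handler shape: periodic overall-progress reports.
    Console.WriteLine($"Progress: {progress}");
};
```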