Class SpeechToText

Namespace
LMKit.Speech
Assembly
LM-Kit.NET.dll

Provides transcription and language-detection capabilities using an LM model with speech-to-text support.

public sealed class SpeechToText
Inheritance
SpeechToText

Examples

// Transcribe audio file
var model = LM.LoadFromModelID("whisper-large-turbo3");
var engine = new SpeechToText(model);
var result = engine.Transcribe(new WaveFile("audio.wav"));
foreach (var segment in result.Segments)
    Console.WriteLine(segment.Text);

// Detect language synchronously
var language = engine.DetectLanguage(new WaveFile("audio.wav"));
Console.WriteLine($"Detected language: {language}");

// Detect language asynchronously
var asyncLanguage = await engine.DetectLanguageAsync(new WaveFile("audio.wav"));
Console.WriteLine($"Detected language: {asyncLanguage}");

// Transcribe asynchronously
var asyncResult = await engine.TranscribeAsync(new WaveFile("audio.wav"));
foreach (var segment in asyncResult.Segments)
    Console.WriteLine(segment.Text);
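
The method signatures listed under Methods accept a CancellationToken, which the examples above leave at its default. The following is a minimal sketch of passing one to DetectLanguageAsync. It assumes `engine` is the SpeechToText instance created above, that cancellation surfaces as an OperationCanceledException (the usual .NET convention, not confirmed on this page), and that a 30-second timeout is acceptable; the timeout value is purely illustrative.

```csharp
using System;
using System.Threading;

// Assumption: `engine` is the SpeechToText instance from the example above.
// The 30-second timeout is an arbitrary illustrative value.
using var cts = new CancellationTokenSource(TimeSpan.FromSeconds(30));

try
{
    // The token is passed as the second argument, matching
    // DetectLanguageAsync(WaveFile, CancellationToken).
    var detected = await engine.DetectLanguageAsync(new WaveFile("audio.wav"), cts.Token);
    Console.WriteLine($"Detected language: {detected}");
}
catch (OperationCanceledException)
{
    // Assumption: a cancelled call throws OperationCanceledException.
    Console.WriteLine("Language detection was cancelled.");
}
```

The same token could also be cancelled manually with cts.Cancel(), for example from a UI "Stop" button, instead of relying on the timeout.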

Constructors

SpeechToText(LM)

Initializes a new instance of the SpeechToText class.

Properties

Duration

Gets or sets the maximum duration of audio to transcribe.

Start

Gets or sets the time offset at which transcription should begin.

Methods

DetectLanguage(WaveFile, CancellationToken)

Synchronously detects the spoken language of the provided audio content.

DetectLanguageAsync(WaveFile, CancellationToken)

Asynchronously detects the spoken language of the provided audio content.

GetSupportedLanguages()

Returns the list of languages supported by the underlying language model.

Transcribe(WaveFile, string, CancellationToken)

Synchronously transcribes the provided audio content into text segments.

TranscribeAsync(WaveFile, string, CancellationToken)

Asynchronously transcribes the provided audio content into text segments.

Events

OnNewSegment

Raised each time a new AudioSegment is recognized during streaming transcription.

OnProgress

Occurs periodically during speech-to-text processing to report overall progress.