
Build a Multilingual Audio Translation Pipeline

Global organizations operate across languages: customer calls in Spanish, partner meetings in Japanese, training videos in German. LM-Kit.NET lets you chain local speech-to-text with text translation to convert audio recordings from any language into any other language, entirely on-device. No cloud APIs, no per-minute billing, and no audio data leaving your infrastructure. This tutorial builds a multilingual audio pipeline that detects the spoken language, transcribes the audio, and translates the transcript into one or more target languages.


Why Local Audio Translation Matters

Two enterprise problems that on-device audio translation solves:

  1. Multinational compliance recordings. Financial institutions operating across Europe and Asia record client advisory calls in local languages. Compliance teams in the home office need English translations for regulatory review. Cloud translation services create data residency issues under GDPR and local banking regulations. A local pipeline keeps recordings on-premises while producing translations for cross-border oversight.
  2. Multilingual customer support analysis. A global support center handles calls in 15+ languages. Quality assurance teams need to review call transcripts in a common language to identify trends, training gaps, and compliance issues. An automated translate-and-archive pipeline lets QA analysts review all calls in English regardless of the original language.

Prerequisites

Requirement Minimum
.NET SDK 8.0+
VRAM ~7 GB (Whisper model ~870 MB + qwen3:8b translation model ~6 GB)
Disk ~7 GB free for model downloads
Audio file A .wav file (16-bit PCM, any sample rate)
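The 16-bit PCM requirement is worth checking up front. The helper below is a self-contained sketch (it is not part of LM-Kit.NET, and it assumes the `fmt ` chunk immediately follows the RIFF header, which holds for typical recordings but not for every WAV variant); it lets you reject unsupported files before paying the model-loading cost:

```csharp
using System;
using System.IO;

// Hypothetical helper: returns true when the stream starts with a RIFF/WAVE
// header whose fmt chunk declares 16-bit PCM audio (any sample rate).
static bool IsPcm16Wav(Stream stream)
{
    using var reader = new BinaryReader(stream);
    if (new string(reader.ReadChars(4)) != "RIFF") return false;
    reader.ReadInt32();                                // overall chunk size
    if (new string(reader.ReadChars(4)) != "WAVE") return false;
    if (new string(reader.ReadChars(4)) != "fmt ") return false;
    reader.ReadInt32();                                // fmt chunk size
    short audioFormat = reader.ReadInt16();            // 1 = uncompressed PCM
    reader.ReadInt16();                                // channel count
    reader.ReadInt32();                                // sample rate (any is fine)
    reader.ReadInt32();                                // byte rate
    reader.ReadInt16();                                // block align
    short bitsPerSample = reader.ReadInt16();
    return audioFormat == 1 && bitsPerSample == 16;
}

// Usage: check before loading any models.
// if (!IsPcm16Wav(File.OpenRead("call_recording.wav"))) { /* report and exit */ }
```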

Step 1: Create the Project

dotnet new console -n AudioTranslationPipeline
cd AudioTranslationPipeline
dotnet add package LM-Kit.NET

Step 2: Understand the Pipeline

  Audio file (.wav)
        │
        ▼
  ┌──────────────────────┐
  │  SpeechToText        │    Whisper model
  │  DetectLanguage()    │    Identify spoken language
  │  Transcribe()        │    Original-language text
  └────────┬─────────────┘
           │
           ▼
  ┌──────────────────────┐
  │  TextTranslation     │    LLM translation
  │  Translate()         │    Target language text
  └────────┬─────────────┘
           │
           ▼
  Translated transcript
  (one or more languages)

Stage Component Purpose
Detect SpeechToText.DetectLanguage Identify the spoken language automatically
Transcribe SpeechToText.Transcribe Convert audio to text in the original language
Translate TextTranslation.Translate Translate transcript to target language(s)

Step 3: Detect, Transcribe, and Translate

using System.Text;
using LMKit.Media.Audio;
using LMKit.Model;
using LMKit.Speech;
using LMKit.TextGeneration;
using LMKit.Translation;

LMKit.Licensing.LicenseManager.SetLicenseKey("");

Console.InputEncoding = Encoding.UTF8;
Console.OutputEncoding = Encoding.UTF8;

// ──────────────────────────────────────
// 1. Load Whisper model
// ──────────────────────────────────────
Console.WriteLine("Loading Whisper model...");
using LM whisperModel = LM.LoadFromModelID("whisper-large-turbo3",
    downloadingProgress: (_, len, read) =>
    {
        if (len.HasValue) Console.Write($"\r  Downloading: {(double)read / len.Value * 100:F1}%   ");
        return true;
    },
    loadingProgress: p => { Console.Write($"\r  Loading: {p * 100:F0}%   "); return true; });
Console.WriteLine("\n");

// ──────────────────────────────────────
// 2. Load translation model
// ──────────────────────────────────────
Console.WriteLine("Loading translation model...");
using LM translationModel = LM.LoadFromModelID("qwen3:8b",
    downloadingProgress: (_, len, read) =>
    {
        if (len.HasValue) Console.Write($"\r  Downloading: {(double)read / len.Value * 100:F1}%   ");
        return true;
    },
    loadingProgress: p => { Console.Write($"\r  Loading: {p * 100:F0}%   "); return true; });
Console.WriteLine("\n");

// ──────────────────────────────────────
// 3. Transcribe and detect language
// ──────────────────────────────────────
string audioPath = "call_recording.wav";
if (!File.Exists(audioPath))
{
    Console.WriteLine($"Place a WAV file at '{audioPath}' and run again.");
    return;
}

var stt = new SpeechToText(whisperModel)
{
    EnableVoiceActivityDetection = true,
    SuppressNonSpeechTokens = true,
    SuppressHallucinations = true
};

using var audio = new WaveFile(audioPath);
Console.WriteLine($"Audio: {audioPath} ({audio.Duration:mm\\:ss\\.ff})\n");

// Detect language first
Console.Write("Detecting language... ");
var langDetection = stt.DetectLanguage(audio);
Console.WriteLine($"{langDetection.Language} (confidence: {langDetection.Probability:P0})\n");

// Transcribe in the detected language
Console.WriteLine("Transcribing...");
var transcription = stt.Transcribe(audio);
string originalText = transcription.Text;

Console.ForegroundColor = ConsoleColor.Cyan;
Console.WriteLine($"\n=== Original Transcript ({langDetection.Language}) ===");
Console.ResetColor();
Console.WriteLine(originalText);

// ──────────────────────────────────────
// 4. Translate to target language
// ──────────────────────────────────────
Console.WriteLine("\n=== Translation ===\n");

var translator = new TextTranslation(translationModel);

Language targetLanguage = Language.English;
Console.Write($"Translating to {targetLanguage}... ");

var translation = translator.Translate(originalText, targetLanguage);
string translatedText = translation.Translation;

Console.WriteLine("done.\n");

Console.ForegroundColor = ConsoleColor.Green;
Console.WriteLine($"=== Translated Transcript ({targetLanguage}) ===");
Console.ResetColor();
Console.WriteLine(translatedText);

// Save both versions
File.WriteAllText("transcript_original.txt", originalText);
File.WriteAllText("transcript_translated.txt", translatedText);
Console.WriteLine("\nSaved: transcript_original.txt, transcript_translated.txt");

Step 4: Translate to Multiple Languages

Generate translations in several target languages at once:

// Model loading, language detection, transcription (into `originalText`),
// and TextTranslation setup (`translator`) are identical to Step 3.

Console.WriteLine("\n=== Multi-Language Translation ===\n");

Language[] targetLanguages =
{
    Language.English,
    Language.French,
    Language.German,
    Language.Spanish,
    Language.Japanese
};

foreach (Language lang in targetLanguages)
{
    Console.Write($"  Translating to {lang}... ");

    try
    {
        var result = translator.Translate(originalText, lang);

        string fileName = $"transcript_{lang.ToString().ToLowerInvariant()}.txt";
        File.WriteAllText(fileName, result.Translation);

        Console.ForegroundColor = ConsoleColor.Green;
        Console.WriteLine($"done → {fileName}");
        Console.ResetColor();
    }
    catch (Exception ex)
    {
        Console.ForegroundColor = ConsoleColor.Red;
        Console.WriteLine($"failed: {ex.Message}");
        Console.ResetColor();
    }
}

Step 5: Use Whisper's Built-In Translation Mode

Whisper can translate speech directly to English during transcription (single step, no separate translation model needed):

// Whisper model loading, SpeechToText setup (`stt`), language detection, and
// WaveFile loading (`audio`) are identical to Step 3.
// No translation model is loaded: Whisper handles the translation itself.

Console.WriteLine("\n=== Whisper Direct Translation (to English) ===\n");

// Switch to translation mode
stt.Mode = SpeechToTextMode.Translation;

Console.Write("Translating audio directly to English... ");
var directTranslation = stt.Transcribe(audio);
Console.WriteLine("done.\n");

Console.ForegroundColor = ConsoleColor.Green;
Console.WriteLine("=== Direct English Translation ===");
Console.ResetColor();
Console.WriteLine(directTranslation.Text);

// Switch back to transcription mode
stt.Mode = SpeechToTextMode.Transcription;

Approach Languages Quality Speed
Whisper Translation mode Any → English only Good Faster (single step)
Whisper + TextTranslation Any → Any language Better Slower (two steps)

Use Whisper's built-in translation when you only need English output. Use the two-step pipeline when you need translations to non-English languages or higher quality.
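That choice can be captured in one routing helper. This is a hedged sketch against the LM-Kit.NET types used in the listings above (`SpeechToText`, `TextTranslation`, `WaveFile`, `Language`), not a complete program:

```csharp
// Sketch: single-step Whisper translation for English targets, two-step
// transcribe-then-translate for everything else.
static string TranslateRecording(SpeechToText stt, TextTranslation translator,
                                 WaveFile audio, Language target)
{
    if (target == Language.English)
    {
        stt.Mode = SpeechToTextMode.Translation;       // Whisper outputs English directly
        try { return stt.Transcribe(audio).Text; }
        finally { stt.Mode = SpeechToTextMode.Transcription; }  // restore default
    }

    string transcript = stt.Transcribe(audio).Text;    // original-language text
    return translator.Translate(transcript, target).Translation;
}
```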


Step 6: Batch Process Multilingual Recordings

Process a folder of audio files in mixed languages:

// Model loading and the SpeechToText (`stt`) / TextTranslation (`translator`)
// setup are identical to Step 3; all per-file work happens inside the loop below.

Console.WriteLine("\n=== Batch Multilingual Processing ===\n");

string inputDir = "recordings";
string outputDir = "translations";

if (!Directory.Exists(inputDir))
{
    Console.WriteLine($"Create a '{inputDir}' folder with WAV files, then run again.");
    return;
}

Directory.CreateDirectory(outputDir);
Language batchTargetLanguage = Language.English;

string[] wavFiles = Directory.GetFiles(inputDir, "*.wav");
Console.WriteLine($"Processing {wavFiles.Length} file(s), translating to {batchTargetLanguage}\n");

var reportLines = new List<string>();
reportLines.Add("| File | Detected Language | Confidence | Duration |");
reportLines.Add("|---|---|---|---|");

foreach (string wavPath in wavFiles)
{
    string fileName = Path.GetFileNameWithoutExtension(wavPath);
    Console.Write($"  {Path.GetFileName(wavPath)}: ");

    try
    {
        using var wav = new WaveFile(wavPath);

        // Detect language
        var detected = stt.DetectLanguage(wav);

        // Transcribe
        var result = stt.Transcribe(wav);

        // Translate if not already in target language
        string finalText;
        if (detected.Language.Equals(batchTargetLanguage.ToString(),
            StringComparison.OrdinalIgnoreCase))
        {
            finalText = result.Text;
            Console.ForegroundColor = ConsoleColor.DarkGray;
            Console.Write($"[{detected.Language}] already {batchTargetLanguage} → ");
        }
        else
        {
            var translated = translator.Translate(result.Text, batchTargetLanguage);
            finalText = translated.Translation;
            Console.ForegroundColor = ConsoleColor.Yellow;
            Console.Write($"[{detected.Language}→{batchTargetLanguage}] ");
        }

        // Save original + translation
        string origPath = Path.Combine(outputDir, $"{fileName}_original.txt");
        string transPath = Path.Combine(outputDir, $"{fileName}_{batchTargetLanguage.ToString().ToLowerInvariant()}.txt");
        File.WriteAllText(origPath, result.Text);
        File.WriteAllText(transPath, finalText);

        reportLines.Add(
            $"| {Path.GetFileName(wavPath)} | {detected.Language} " +
            $"| {detected.Probability:P0} | {wav.Duration:mm\\:ss} |");

        Console.ForegroundColor = ConsoleColor.Green;
        Console.WriteLine("done");
        Console.ResetColor();
    }
    catch (Exception ex)
    {
        Console.ForegroundColor = ConsoleColor.Red;
        Console.WriteLine($"failed: {ex.Message}");
        Console.ResetColor();
    }
}

// Save processing report
string reportPath = Path.Combine(outputDir, "processing_report.md");
File.WriteAllLines(reportPath, reportLines);

Console.WriteLine($"\nReport: {reportPath}");
Console.WriteLine($"Translations: {Path.GetFullPath(outputDir)}");

Step 7: Summarize in Target Language

Combine transcription, translation, and summarization for cross-language document intelligence:

// Model loading (whisperModel, translationModel), SpeechToText setup, and
// transcription into `originalText` are identical to Step 3.

Console.WriteLine("\n=== Transcribe, Translate, and Summarize ===\n");

var summarizer = new Summarizer(translationModel)
{
    MaxContentWords = 150,
    MaxTitleWords = 10,
    GenerateTitle = true,
    GenerateContent = true,
    Intent = Summarizer.SummarizationIntent.Abstraction,
    TargetLanguage = Language.English,
    OverflowStrategy = Summarizer.OverflowResolutionStrategy.RecursiveSummarize
};

// Transcribe in original language, summarize directly in English
Summarizer.SummarizerResult summaryResult = summarizer.Summarize(originalText);

Console.ForegroundColor = ConsoleColor.Yellow;
Console.WriteLine($"Title: {summaryResult.Title}");
Console.ResetColor();
Console.WriteLine($"Summary: {summaryResult.Summary}");

The TargetLanguage property on Summarizer produces the summary directly in the target language, regardless of the input language. This avoids a separate translation step when you only need a summary.


Model Selection

Whisper Models (Transcription)

Model ID VRAM Languages Best For
whisper-large-turbo3 ~870 MB 99 languages Best multilingual accuracy (recommended)
whisper-medium ~820 MB 99 languages Good alternative with similar VRAM
whisper-small ~260 MB 99 languages Faster, lower accuracy on non-English

For multilingual audio, always prefer whisper-large-turbo3 or whisper-medium. Smaller models have significantly lower accuracy on non-English languages.

Translation Models

Model ID VRAM Quality Best For
qwen3:4b ~3.5 GB Good Common language pairs (EN/FR/DE/ES/ZH/JA)
qwen3:8b ~6 GB Very good All language pairs (recommended)
qwen3:14b ~10 GB Excellent Low-resource languages, nuanced translation

The Qwen3 family provides the strongest multilingual translation quality. For European and East Asian language pairs, qwen3:8b is the recommended balance of quality and speed.


Common Issues

Problem Cause Fix
Wrong language detected Short audio clip or ambiguous speech Force language: stt.Transcribe(audio, language: "fr")
Translation quality poor Model too small for the language pair Use qwen3:8b or larger; less common languages need bigger models
Whisper Translation mode outputs garbled text Audio quality too poor Use the two-step pipeline (transcribe + translate) instead
Slow batch processing Two-step pipeline for every file Use Whisper Translation mode for English-only output; reserve two-step for non-English targets
Characters missing in output Console encoding not set Ensure Console.OutputEncoding = Encoding.UTF8 is set
