Build a Multilingual Audio Translation Pipeline
Global organizations operate across languages: customer calls in Spanish, partner meetings in Japanese, training videos in German. LM-Kit.NET lets you chain local speech-to-text with text translation to convert audio recordings from any language into any other language, entirely on-device. No cloud APIs, no per-minute billing, and no audio data leaving your infrastructure. This tutorial builds a multilingual audio pipeline that detects the spoken language, transcribes the audio, and translates the transcript into one or more target languages.
Why Local Audio Translation Matters
Two enterprise problems that on-device audio translation solves:
- Multinational compliance recordings. Financial institutions operating across Europe and Asia record client advisory calls in local languages. Compliance teams in the home office need English translations for regulatory review. Cloud translation services create data residency issues under GDPR and local banking regulations. A local pipeline keeps recordings on-premises while producing translations for cross-border oversight.
- Multilingual customer support analysis. A global support center handles calls in 15+ languages. Quality assurance teams need to review call transcripts in a common language to identify trends, training gaps, and compliance issues. An automated translate-and-archive pipeline lets QA analysts review all calls in English regardless of the original language.
Prerequisites
| Requirement | Minimum |
|---|---|
| .NET SDK | 8.0+ |
| VRAM | ~7 GB (Whisper large-turbo3 ~870 MB + qwen3:8b ~6 GB) |
| Disk | ~4 GB free for model downloads |
| Audio file | A .wav file (16-bit PCM, any sample rate) |
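If you are unsure whether a recording meets the 16-bit PCM requirement, you can inspect the WAV header before handing the file to the pipeline. This is a plain .NET sketch with no LM-Kit dependency; the `WavCheck.IsPcm16` helper is illustrative, not part of the SDK.

```csharp
using System;
using System.IO;

static class WavCheck
{
    // Returns true when the file has a RIFF/WAVE header with PCM (format tag 1)
    // and 16 bits per sample -- the format this tutorial assumes.
    public static bool IsPcm16(string path)
    {
        using var reader = new BinaryReader(File.OpenRead(path));
        if (new string(reader.ReadChars(4)) != "RIFF") return false;
        reader.ReadInt32();                            // RIFF chunk size
        if (new string(reader.ReadChars(4)) != "WAVE") return false;

        // Scan chunks until we find "fmt "
        while (reader.BaseStream.Position < reader.BaseStream.Length - 8)
        {
            string id = new string(reader.ReadChars(4));
            int size = reader.ReadInt32();
            if (id == "fmt ")
            {
                short formatTag = reader.ReadInt16();  // 1 = uncompressed PCM
                reader.ReadInt16();                    // channels
                reader.ReadInt32();                    // sample rate
                reader.ReadInt32();                    // byte rate
                reader.ReadInt16();                    // block align
                short bitsPerSample = reader.ReadInt16();
                return formatTag == 1 && bitsPerSample == 16;
            }
            reader.BaseStream.Seek(size, SeekOrigin.Current); // skip other chunks
        }
        return false;
    }
}
```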
Step 1: Create the Project
dotnet new console -n AudioTranslationPipeline
cd AudioTranslationPipeline
dotnet add package LM-Kit.NET
Step 2: Understand the Pipeline
Audio file (.wav)
│
▼
┌──────────────────────┐
│ SpeechToText │ Whisper model
│ DetectLanguage() │ Identify spoken language
│ Transcribe() │ Original-language text
└────────┬─────────────┘
│
▼
┌──────────────────────┐
│ TextTranslation │ LLM translation
│ Translate() │ Target language text
└────────┬─────────────┘
│
▼
Translated transcript
(one or more languages)
| Stage | Component | Purpose |
|---|---|---|
| Detect | SpeechToText.DetectLanguage | Identify the spoken language automatically |
| Transcribe | SpeechToText.Transcribe | Convert audio to text in the original language |
| Translate | TextTranslation.Translate | Translate the transcript into the target language(s) |
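Conceptually, the pipeline is a composition of three stages. The sketch below models it with plain delegates so the control flow is visible without the SDK; the stage signatures are simplifications (the real SpeechToText and TextTranslation classes operate on WaveFile and LM instances), and the early-exit rule when the audio is already in the target language is an illustrative optimization.

```csharp
using System;

static class Pipeline
{
    // Runs detect -> transcribe -> translate, skipping translation when the
    // audio is already in the target language. The delegates stand in for
    // SpeechToText.DetectLanguage, SpeechToText.Transcribe, and
    // TextTranslation.Translate.
    public static string Run(
        byte[] audio,
        string targetLanguage,
        Func<byte[], string> detect,
        Func<byte[], string> transcribe,
        Func<string, string, string> translate)
    {
        string detected = detect(audio);
        string transcript = transcribe(audio);
        return string.Equals(detected, targetLanguage, StringComparison.OrdinalIgnoreCase)
            ? transcript                              // already in target language
            : translate(transcript, targetLanguage);  // otherwise run stage 3
    }
}
```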
Step 3: Detect, Transcribe, and Translate
using System.Text;
using LMKit.Media.Audio;
using LMKit.Model;
using LMKit.Speech;
using LMKit.TextGeneration;
using LMKit.Translation;
LMKit.Licensing.LicenseManager.SetLicenseKey("");
Console.InputEncoding = Encoding.UTF8;
Console.OutputEncoding = Encoding.UTF8;
// ──────────────────────────────────────
// 1. Load Whisper model
// ──────────────────────────────────────
Console.WriteLine("Loading Whisper model...");
using LM whisperModel = LM.LoadFromModelID("whisper-large-turbo3",
downloadingProgress: (_, len, read) =>
{
if (len.HasValue) Console.Write($"\r Downloading: {(double)read / len.Value * 100:F1}% ");
return true;
},
loadingProgress: p => { Console.Write($"\r Loading: {p * 100:F0}% "); return true; });
Console.WriteLine("\n");
// ──────────────────────────────────────
// 2. Load translation model
// ──────────────────────────────────────
Console.WriteLine("Loading translation model...");
using LM translationModel = LM.LoadFromModelID("qwen3:8b",
downloadingProgress: (_, len, read) =>
{
if (len.HasValue) Console.Write($"\r Downloading: {(double)read / len.Value * 100:F1}% ");
return true;
},
loadingProgress: p => { Console.Write($"\r Loading: {p * 100:F0}% "); return true; });
Console.WriteLine("\n");
// ──────────────────────────────────────
// 3. Transcribe and detect language
// ──────────────────────────────────────
string audioPath = "call_recording.wav";
if (!File.Exists(audioPath))
{
Console.WriteLine($"Place a WAV file at '{audioPath}' and run again.");
return;
}
var stt = new SpeechToText(whisperModel)
{
EnableVoiceActivityDetection = true,
SuppressNonSpeechTokens = true,
SuppressHallucinations = true
};
using var audio = new WaveFile(audioPath);
Console.WriteLine($"Audio: {audioPath} ({audio.Duration:mm\\:ss\\.ff})\n");
// Detect language first
Console.Write("Detecting language... ");
var langDetection = stt.DetectLanguage(audio);
Console.WriteLine($"{langDetection.Language} (confidence: {langDetection.Probability:P0})\n");
// Transcribe in the detected language
Console.WriteLine("Transcribing...");
var transcription = stt.Transcribe(audio);
string originalText = transcription.Text;
Console.ForegroundColor = ConsoleColor.Cyan;
Console.WriteLine($"\n=== Original Transcript ({langDetection.Language}) ===");
Console.ResetColor();
Console.WriteLine(originalText);
// ──────────────────────────────────────
// 4. Translate to target language
// ──────────────────────────────────────
Console.WriteLine("\n=== Translation ===\n");
var translator = new TextTranslation(translationModel);
Language targetLanguage = Language.English;
Console.Write($"Translating to {targetLanguage}... ");
var translation = translator.Translate(originalText, targetLanguage);
string translatedText = translation.Translation;
Console.WriteLine("done.\n");
Console.ForegroundColor = ConsoleColor.Green;
Console.WriteLine($"=== Translated Transcript ({targetLanguage}) ===");
Console.ResetColor();
Console.WriteLine(translatedText);
// Save both versions
File.WriteAllText("transcript_original.txt", originalText);
File.WriteAllText("transcript_translated.txt", translatedText);
Console.WriteLine("\nSaved: transcript_original.txt, transcript_translated.txt");
Step 4: Translate to Multiple Languages
Generate translations into several target languages in a single run:
// Model loading, transcription, and language detection are identical to
// Step 3: reuse the whisperModel, translationModel, stt, audio, and
// originalText setup from that listing, then create the translator:
var translator = new TextTranslation(translationModel);
Console.WriteLine("\n=== Multi-Language Translation ===\n");
Language[] targetLanguages =
{
Language.English,
Language.French,
Language.German,
Language.Spanish,
Language.Japanese
};
foreach (Language lang in targetLanguages)
{
Console.Write($" Translating to {lang}... ");
try
{
var result = translator.Translate(originalText, lang);
string fileName = $"transcript_{lang.ToString().ToLowerInvariant()}.txt";
File.WriteAllText(fileName, result.Translation);
Console.ForegroundColor = ConsoleColor.Green;
Console.WriteLine($"done → {fileName}");
Console.ResetColor();
}
catch (Exception ex)
{
Console.ForegroundColor = ConsoleColor.Red;
Console.WriteLine($"failed: {ex.Message}");
Console.ResetColor();
}
}
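File names like transcript_japanese.txt can get long; if you prefer ISO 639-1 suffixes (transcript_ja.txt), a small lookup table is enough. The mapping below covers only the languages used in this step and is an illustrative helper, not an SDK feature.

```csharp
using System;
using System.Collections.Generic;

static class LangCodes
{
    // ISO 639-1 codes for the target languages used in this tutorial.
    static readonly Dictionary<string, string> Iso639 = new(StringComparer.OrdinalIgnoreCase)
    {
        ["English"] = "en", ["French"] = "fr", ["German"] = "de",
        ["Spanish"] = "es", ["Japanese"] = "ja"
    };

    // Falls back to the lower-cased full name when no code is known.
    public static string FileSuffix(string language) =>
        Iso639.TryGetValue(language, out var code) ? code : language.ToLowerInvariant();
}
```

With this in place, the file name line becomes `$"transcript_{LangCodes.FileSuffix(lang.ToString())}.txt"`.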
Step 5: Use Whisper's Built-In Translation Mode
Whisper can translate speech directly to English during transcription (single step, no separate translation model needed):
// Whisper model loading and the SpeechToText/WaveFile setup are identical to
// Step 3. The separate translation model is not needed for this step, so a
// minimal version can skip loading it entirely.
Console.WriteLine("\n=== Whisper Direct Translation (to English) ===\n");
// Switch to translation mode
stt.Mode = SpeechToTextMode.Translation;
Console.Write("Translating audio directly to English... ");
var directTranslation = stt.Transcribe(audio);
Console.WriteLine("done.\n");
Console.ForegroundColor = ConsoleColor.Green;
Console.WriteLine("=== Direct English Translation ===");
Console.ResetColor();
Console.WriteLine(directTranslation.Text);
// Switch back to transcription mode
stt.Mode = SpeechToTextMode.Transcription;
| Approach | Languages | Quality | Speed |
|---|---|---|---|
| Whisper Translation mode | Any → English only | Good | Faster (single step) |
| Whisper + TextTranslation | Any → Any language | Better | Slower (two steps) |
Use Whisper's built-in translation when you only need English output. Use the two-step pipeline when you need translations to non-English languages or higher quality.
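That decision can be encoded as a small routing rule: direct Whisper translation only when the target is English and speed matters more than maximum quality. The helper below is a sketch of that policy, not an SDK API.

```csharp
static class Routing
{
    // Returns true when Whisper's built-in translation mode is sufficient:
    // English-only output where speed matters more than maximum quality.
    // Any other target language requires the two-step pipeline.
    public static bool UseDirectWhisper(string targetLanguage, bool qualityCritical) =>
        targetLanguage == "English" && !qualityCritical;
}
```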
Step 6: Batch Process Multilingual Recordings
Process a folder of audio files in mixed languages:
// Model loading and the SpeechToText setup are identical to Step 3. The
// single-file transcription is not needed here; only the translator is reused:
var translator = new TextTranslation(translationModel);
Console.WriteLine("\n=== Batch Multilingual Processing ===\n");
string inputDir = "recordings";
string outputDir = "translations";
if (!Directory.Exists(inputDir))
{
Console.WriteLine($"Create a '{inputDir}' folder with WAV files, then run again.");
return;
}
Directory.CreateDirectory(outputDir);
Language batchTargetLanguage = Language.English;
string[] wavFiles = Directory.GetFiles(inputDir, "*.wav");
Console.WriteLine($"Processing {wavFiles.Length} file(s), translating to {batchTargetLanguage}\n");
var reportLines = new List<string>();
reportLines.Add("| File | Detected Language | Confidence | Duration |");
reportLines.Add("|---|---|---|---|");
foreach (string wavPath in wavFiles)
{
string fileName = Path.GetFileNameWithoutExtension(wavPath);
Console.Write($" {Path.GetFileName(wavPath)}: ");
try
{
using var wav = new WaveFile(wavPath);
// Detect language
var detected = stt.DetectLanguage(wav);
// Transcribe
var result = stt.Transcribe(wav);
// Translate if not already in target language
string finalText;
if (detected.Language.Equals(batchTargetLanguage.ToString(),
StringComparison.OrdinalIgnoreCase))
{
finalText = result.Text;
Console.ForegroundColor = ConsoleColor.DarkGray;
Console.Write($"[{detected.Language}] already {batchTargetLanguage} → ");
}
else
{
var translated = translator.Translate(result.Text, batchTargetLanguage);
finalText = translated.Translation;
Console.ForegroundColor = ConsoleColor.Yellow;
Console.Write($"[{detected.Language}→{batchTargetLanguage}] ");
}
// Save original + translation
string origPath = Path.Combine(outputDir, $"{fileName}_original.txt");
string transPath = Path.Combine(outputDir, $"{fileName}_{batchTargetLanguage.ToString().ToLowerInvariant()}.txt");
File.WriteAllText(origPath, result.Text);
File.WriteAllText(transPath, finalText);
reportLines.Add(
$"| {Path.GetFileName(wavPath)} | {detected.Language} " +
$"| {detected.Probability:P0} | {wav.Duration:mm\\:ss} |");
Console.ForegroundColor = ConsoleColor.Green;
Console.WriteLine("done");
Console.ResetColor();
}
catch (Exception ex)
{
Console.ForegroundColor = ConsoleColor.Red;
Console.WriteLine($"failed: {ex.Message}");
Console.ResetColor();
}
}
// Save processing report
string reportPath = Path.Combine(outputDir, "processing_report.md");
File.WriteAllLines(reportPath, reportLines);
Console.WriteLine($"\nReport: {reportPath}");
Console.WriteLine($"Translations: {Path.GetFullPath(outputDir)}");
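A useful addition to the report is the total audio time processed. TimeSpan sums cleanly, so tracking it costs one line per file; the helper below is an illustrative aggregate, assuming you collect each file's WaveFile.Duration into a list during the loop.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

static class Report
{
    // Sums per-file durations for a "Total audio processed" footer line.
    public static TimeSpan TotalDuration(IEnumerable<TimeSpan> durations) =>
        durations.Aggregate(TimeSpan.Zero, (acc, d) => acc + d);
}
```

Append it to the report with something like `reportLines.Add($"Total audio: {Report.TotalDuration(durations):hh\\:mm\\:ss}");`.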
Step 7: Summarize in Target Language
Combine transcription, translation, and summarization for cross-language audio intelligence:
// Model loading, transcription, and language detection are identical to
// Step 3, producing originalText from the detected-language transcript.
// The translationModel loaded there also powers the Summarizer below.
Console.WriteLine("\n=== Transcribe, Translate, and Summarize ===\n");
var summarizer = new Summarizer(translationModel)
{
MaxContentWords = 150,
MaxTitleWords = 10,
GenerateTitle = true,
GenerateContent = true,
Intent = Summarizer.SummarizationIntent.Abstraction,
TargetLanguage = Language.English,
OverflowStrategy = Summarizer.OverflowResolutionStrategy.RecursiveSummarize
};
// Transcribe in original language, summarize directly in English
Summarizer.SummarizerResult summaryResult = summarizer.Summarize(originalText);
Console.ForegroundColor = ConsoleColor.Yellow;
Console.WriteLine($"Title: {summaryResult.Title}");
Console.ResetColor();
Console.WriteLine($"Summary: {summaryResult.Summary}");
The TargetLanguage property on Summarizer produces the summary directly in the target language, regardless of the input language. This avoids a separate translation step when you only need a summary.
Model Selection
Whisper Models (Transcription)
| Model ID | VRAM | Languages | Best For |
|---|---|---|---|
| whisper-large-turbo3 | ~870 MB | 99 languages | Best multilingual accuracy (recommended) |
| whisper-medium | ~820 MB | 99 languages | Good alternative with similar VRAM |
| whisper-small | ~260 MB | 99 languages | Faster, lower accuracy on non-English |
For multilingual audio, always prefer whisper-large-turbo3 or whisper-medium. Smaller models have significantly lower accuracy on non-English languages.
Translation Models
| Model ID | VRAM | Quality | Best For |
|---|---|---|---|
| qwen3:4b | ~3.5 GB | Good | Common language pairs (EN/FR/DE/ES/ZH/JA) |
| qwen3:8b | ~6 GB | Very good | All language pairs (recommended) |
| qwen3:14b | ~10 GB | Excellent | Low-resource languages, nuanced translation |
The Qwen3 family provides the strongest multilingual translation quality. For European and East Asian language pairs, qwen3:8b is the recommended balance of quality and speed.
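The table above collapses into a simple capacity check. The picker below encodes the VRAM figures from this page; treat the thresholds as rough guidance, since actual usage varies with quantization and context size, and the helper itself is illustrative rather than part of the SDK.

```csharp
static class ModelPicker
{
    // Picks a translation model ID from the table above, given free VRAM in GB.
    // Thresholds mirror the documented VRAM footprints (~3.5 / ~6 / ~10 GB).
    public static string TranslationModel(double freeVramGb) =>
        freeVramGb >= 10 ? "qwen3:14b"
        : freeVramGb >= 6 ? "qwen3:8b"
        : "qwen3:4b";
}
```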
Common Issues
| Problem | Cause | Fix |
|---|---|---|
| Wrong language detected | Short audio clip or ambiguous speech | Force language: stt.Transcribe(audio, language: "fr") |
| Translation quality poor | Model too small for the language pair | Use qwen3:8b or larger; less common languages need bigger models |
| Whisper Translation mode outputs garbled text | Audio quality too poor | Use the two-step pipeline (transcribe + translate) instead |
| Slow batch processing | Two-step pipeline for every file | Use Whisper Translation mode for English-only output; reserve two-step for non-English targets |
| Characters missing in output | Console encoding not set | Ensure Console.OutputEncoding = Encoding.UTF8 is set |
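The first row of the table (wrong language detected) is worth automating: when language detection returns low confidence, fall back to a configured default instead of trusting the guess. The threshold value and the helper itself are illustrative choices, not SDK defaults.

```csharp
static class LangFallback
{
    // Trusts the detected language only above a confidence threshold;
    // otherwise returns the caller's configured default (e.g. "fr").
    public static string Resolve(string detected, double probability,
                                 string fallback, double threshold = 0.5) =>
        probability >= threshold ? detected : fallback;
}
```

The resolved value can then be passed as the forced language when transcribing short or ambiguous clips.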
Next Steps
- Transcribe Audio with Local Speech-to-Text: foundational transcription guide.
- Translate and Localize Content: text translation without audio.
- Transcribe and Reformat Audio with LLM Post-Processing: clean up translated transcripts.
- Generate Structured Meeting Notes from Audio Recordings: structure transcripts as meeting notes.