Build a Multilingual Audio Translation Pipeline
Global organizations operate across languages: customer calls in Spanish, partner meetings in Japanese, training videos in German. LM-Kit.NET lets you chain local speech-to-text with text translation to convert audio recordings from any language into any other language, entirely on-device. No cloud APIs, no per-minute billing, and no audio data leaving your infrastructure. This tutorial builds a multilingual audio pipeline that detects the spoken language, transcribes the audio, and translates the transcript into one or more target languages.
Why Local Audio Translation Matters
Two enterprise problems that on-device audio translation solves:
- Multinational compliance recordings. Financial institutions operating across Europe and Asia record client advisory calls in local languages. Compliance teams in the home office need English translations for regulatory review. Cloud translation services create data residency issues under GDPR and local banking regulations. A local pipeline keeps recordings on-premises while producing translations for cross-border oversight.
- Multilingual customer support analysis. A global support center handles calls in 15+ languages. Quality assurance teams need to review call transcripts in a common language to identify trends, training gaps, and compliance issues. An automated translate-and-archive pipeline lets QA analysts review all calls in English regardless of the original language.
Prerequisites
| Requirement | Minimum |
|---|---|
| .NET SDK | 8.0+ |
| VRAM | ~7 GB (Whisper large-turbo3 ~870 MB + qwen3:8b ~6 GB) |
| Disk | ~4 GB free for model downloads |
| Audio file | A .wav file (16-bit PCM, any sample rate) |
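If you are unsure whether a recording meets the 16-bit PCM requirement, you can inspect the WAV header before handing the file to the pipeline. This is a plain .NET sketch with no LM-Kit dependency; the `WavCheck.IsPcm16` helper is illustrative, not part of the SDK.

```csharp
using System;
using System.IO;

static class WavCheck
{
    // Returns true when the file has a RIFF/WAVE header with PCM (format tag 1)
    // and 16 bits per sample -- the format this tutorial assumes.
    public static bool IsPcm16(string path)
    {
        using var reader = new BinaryReader(File.OpenRead(path));
        if (new string(reader.ReadChars(4)) != "RIFF") return false;
        reader.ReadInt32();                            // RIFF chunk size
        if (new string(reader.ReadChars(4)) != "WAVE") return false;

        // Scan chunks until we find "fmt "
        while (reader.BaseStream.Position < reader.BaseStream.Length - 8)
        {
            string id = new string(reader.ReadChars(4));
            int size = reader.ReadInt32();
            if (id == "fmt ")
            {
                short formatTag = reader.ReadInt16();  // 1 = uncompressed PCM
                reader.ReadInt16();                    // channels
                reader.ReadInt32();                    // sample rate
                reader.ReadInt32();                    // byte rate
                reader.ReadInt16();                    // block align
                short bitsPerSample = reader.ReadInt16();
                return formatTag == 1 && bitsPerSample == 16;
            }
            reader.BaseStream.Seek(size, SeekOrigin.Current); // skip other chunks
        }
        return false;
    }
}
```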
Step 1: Create the Project
dotnet new console -n AudioTranslationPipeline
cd AudioTranslationPipeline
dotnet add package LM-Kit.NET
Step 2: Understand the Pipeline
Audio file (.wav)
│
▼
┌──────────────────────┐
│ SpeechToText │ Whisper model
│ DetectLanguage() │ Identify spoken language
│ Transcribe() │ Original-language text
└────────┬─────────────┘
│
▼
┌──────────────────────┐
│ TextTranslation │ LLM translation
│ Translate() │ Target language text
└────────┬─────────────┘
│
▼
Translated transcript
(one or more languages)
| Stage | Component | Purpose |
|---|---|---|
| Detect | SpeechToText.DetectLanguage | Identify the spoken language automatically |
| Transcribe | SpeechToText.Transcribe | Convert audio to text in the original language |
| Translate | TextTranslation.Translate | Translate the transcript into the target language(s) |
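Conceptually, the pipeline is a composition of three stages. The sketch below models it with plain delegates so the control flow is visible without the SDK; the stage signatures are simplifications (the real SpeechToText and TextTranslation classes operate on WaveFile and LM instances), and the early-exit rule when the audio is already in the target language is an illustrative optimization.

```csharp
using System;

static class Pipeline
{
    // Runs detect -> transcribe -> translate, skipping translation when the
    // audio is already in the target language. The delegates stand in for
    // SpeechToText.DetectLanguage, SpeechToText.Transcribe, and
    // TextTranslation.Translate.
    public static string Run(
        byte[] audio,
        string targetLanguage,
        Func<byte[], string> detect,
        Func<byte[], string> transcribe,
        Func<string, string, string> translate)
    {
        string detected = detect(audio);
        string transcript = transcribe(audio);
        return string.Equals(detected, targetLanguage, StringComparison.OrdinalIgnoreCase)
            ? transcript                              // already in target language
            : translate(transcript, targetLanguage);  // otherwise run stage 3
    }
}
```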
Step 3: Detect, Transcribe, and Translate
using System.Text;
using LMKit.Media.Audio;
using LMKit.Model;
using LMKit.Speech;
using LMKit.TextGeneration;
using LMKit.Translation;
LMKit.Licensing.LicenseManager.SetLicenseKey("");
Console.InputEncoding = Encoding.UTF8;
Console.OutputEncoding = Encoding.UTF8;
// ──────────────────────────────────────
// 1. Load Whisper model
// ──────────────────────────────────────
Console.WriteLine("Loading Whisper model...");
using LM whisperModel = LM.LoadFromModelID("whisper-large-turbo3",
downloadingProgress: (_, len, read) =>
{
if (len.HasValue) Console.Write($"\r Downloading: {(double)read / len.Value * 100:F1}% ");
return true;
},
loadingProgress: p => { Console.Write($"\r Loading: {p * 100:F0}% "); return true; });
Console.WriteLine("\n");
// ──────────────────────────────────────
// 2. Load translation model
// ──────────────────────────────────────
Console.WriteLine("Loading translation model...");
using LM translationModel = LM.LoadFromModelID("qwen3:8b",
downloadingProgress: (_, len, read) =>
{
if (len.HasValue) Console.Write($"\r Downloading: {(double)read / len.Value * 100:F1}% ");
return true;
},
loadingProgress: p => { Console.Write($"\r Loading: {p * 100:F0}% "); return true; });
Console.WriteLine("\n");
// ──────────────────────────────────────
// 3. Transcribe and detect language
// ──────────────────────────────────────
string audioPath = "call_recording.wav";
if (!File.Exists(audioPath))
{
Console.WriteLine($"Place a WAV file at '{audioPath}' and run again.");
return;
}
var stt = new SpeechToText(whisperModel)
{
EnableVoiceActivityDetection = true,
SuppressNonSpeechTokens = true,
SuppressHallucinations = true
};
using var audio = new WaveFile(audioPath);
Console.WriteLine($"Audio: {audioPath} ({audio.Duration:mm\\:ss\\.ff})\n");
// Detect language first
Console.Write("Detecting language... ");
var langDetection = stt.DetectLanguage(audio);
Console.WriteLine($"{langDetection.Language} (confidence: {langDetection.Probability:P0})\n");
// Transcribe in the detected language
Console.WriteLine("Transcribing...");
var transcription = stt.Transcribe(audio);
string originalText = transcription.Text;
Console.ForegroundColor = ConsoleColor.Cyan;
Console.WriteLine($"\n=== Original Transcript ({langDetection.Language}) ===");
Console.ResetColor();
Console.WriteLine(originalText);
// ──────────────────────────────────────
// 4. Translate to target language
// ──────────────────────────────────────
Console.WriteLine("\n=== Translation ===\n");
var translator = new TextTranslation(translationModel);
Language targetLanguage = Language.English;
Console.Write($"Translating to {targetLanguage}... ");
var translation = translator.Translate(originalText, targetLanguage);
string translatedText = translation.Translation;
Console.WriteLine("done.\n");
Console.ForegroundColor = ConsoleColor.Green;
Console.WriteLine($"=== Translated Transcript ({targetLanguage}) ===");
Console.ResetColor();
Console.WriteLine(translatedText);
// Save both versions
File.WriteAllText("transcript_original.txt", originalText);
File.WriteAllText("transcript_translated.txt", translatedText);
Console.WriteLine("\nSaved: transcript_original.txt, transcript_translated.txt");
Step 4: Translate to Multiple Languages
Generate translations into several target languages in a single run:
// Model loading, transcription, and language detection are identical to
// Step 3: reuse the whisperModel, translationModel, stt, audio, and
// originalText setup from that listing, then create the translator:
var translator = new TextTranslation(translationModel);
Console.WriteLine("\n=== Multi-Language Translation ===\n");
Language[] targetLanguages =
{
Language.English,
Language.French,
Language.German,
Language.Spanish,
Language.Japanese
};
foreach (Language lang in targetLanguages)
{
Console.Write($" Translating to {lang}... ");
try
{
var result = translator.Translate(originalText, lang);
string fileName = $"transcript_{lang.ToString().ToLowerInvariant()}.txt";
File.WriteAllText(fileName, result.Translation);
Console.ForegroundColor = ConsoleColor.Green;
Console.WriteLine($"done → {fileName}");
Console.ResetColor();
}
catch (Exception ex)
{
Console.ForegroundColor = ConsoleColor.Red;
Console.WriteLine($"failed: {ex.Message}");
Console.ResetColor();
}
}
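File names like transcript_japanese.txt can get long; if you prefer ISO 639-1 suffixes (transcript_ja.txt), a small lookup table is enough. The mapping below covers only the languages used in this step and is an illustrative helper, not an SDK feature.

```csharp
using System;
using System.Collections.Generic;

static class LangCodes
{
    // ISO 639-1 codes for the target languages used in this tutorial.
    static readonly Dictionary<string, string> Iso639 = new(StringComparer.OrdinalIgnoreCase)
    {
        ["English"] = "en", ["French"] = "fr", ["German"] = "de",
        ["Spanish"] = "es", ["Japanese"] = "ja"
    };

    // Falls back to the lower-cased full name when no code is known.
    public static string FileSuffix(string language) =>
        Iso639.TryGetValue(language, out var code) ? code : language.ToLowerInvariant();
}
```

With this in place, the file name line becomes `$"transcript_{LangCodes.FileSuffix(lang.ToString())}.txt"`.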
Step 5: Use Whisper's Built-In Translation Mode
Whisper can translate speech directly to English during transcription (single step, no separate translation model needed):
// Whisper model loading and the SpeechToText/WaveFile setup are identical to
// Step 3. The separate translation model is not needed for this step, so a
// minimal version can skip loading it entirely.
Console.WriteLine("\n=== Whisper Direct Translation (to English) ===\n");
// Switch to translation mode
stt.Mode = SpeechToTextMode.Translation;
Console.Write("Translating audio directly to English... ");
var directTranslation = stt.Transcribe(audio);
Console.WriteLine("done.\n");
Console.ForegroundColor = ConsoleColor.Green;
Console.WriteLine("=== Direct English Translation ===");
Console.ResetColor();
Console.WriteLine(directTranslation.Text);
// Switch back to transcription mode
stt.Mode = SpeechToTextMode.Transcription;
| Approach | Languages | Quality | Speed |
|---|---|---|---|
| Whisper Translation mode | Any → English only | Good | Faster (single step) |
| Whisper + TextTranslation | Any → Any language | Better | Slower (two steps) |
Use Whisper's built-in translation when you only need English output. Use the two-step pipeline when you need translations to non-English languages or higher quality.
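That decision can be encoded as a small routing rule: direct Whisper translation only when the target is English and speed matters more than maximum quality. The helper below is a sketch of that policy, not an SDK API.

```csharp
static class Routing
{
    // Returns true when Whisper's built-in translation mode is sufficient:
    // English-only output where speed matters more than maximum quality.
    // Any other target language requires the two-step pipeline.
    public static bool UseDirectWhisper(string targetLanguage, bool qualityCritical) =>
        targetLanguage == "English" && !qualityCritical;
}
```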
Step 6: Batch Process Multilingual Recordings
Process a folder of audio files in mixed languages:
// Model loading and the SpeechToText setup are identical to Step 3. The
// single-file transcription is not needed here; only the translator is reused:
var translator = new TextTranslation(translationModel);
Console.WriteLine("\n=== Batch Multilingual Processing ===\n");
string inputDir = "recordings";
string outputDir = "translations";
if (!Directory.Exists(inputDir))
{
Console.WriteLine($"Create a '{inputDir}' folder with WAV files, then run again.");
return;
}
Directory.CreateDirectory(outputDir);
Language batchTargetLanguage = Language.English;
string[] wavFiles = Directory.GetFiles(inputDir, "*.wav");
Console.WriteLine($"Processing {wavFiles.Length} file(s), translating to {batchTargetLanguage}\n");
var reportLines = new List<string>();
reportLines.Add("| File | Detected Language | Confidence | Duration |");
reportLines.Add("|---|---|---|---|");
foreach (string wavPath in wavFiles)
{
string fileName = Path.GetFileNameWithoutExtension(wavPath);
Console.Write($" {Path.GetFileName(wavPath)}: ");
try
{
using var wav = new WaveFile(wavPath);
// Detect language
var detected = stt.DetectLanguage(wav);
// Transcribe
var result = stt.Transcribe(wav);
// Translate if not already in target language
string finalText;
if (detected.Language.Equals(batchTargetLanguage.ToString(),
StringComparison.OrdinalIgnoreCase))
{
finalText = result.Text;
Console.ForegroundColor = ConsoleColor.DarkGray;
Console.Write($"[{detected.Language}] already {batchTargetLanguage} → ");
}
else
{
var translated = translator.Translate(result.Text, batchTargetLanguage);
finalText = translated.Translation;
Console.ForegroundColor = ConsoleColor.Yellow;
Console.Write($"[{detected.Language}→{batchTargetLanguage}] ");
}
// Save original + translation
string origPath = Path.Combine(outputDir, $"{fileName}_original.txt");
string transPath = Path.Combine(outputDir, $"{fileName}_{batchTargetLanguage.ToString().ToLowerInvariant()}.txt");
File.WriteAllText(origPath, result.Text);
File.WriteAllText(transPath, finalText);
reportLines.Add(
$"| {Path.GetFileName(wavPath)} | {detected.Language} " +
$"| {detected.Probability:P0} | {wav.Duration:mm\\:ss} |");
Console.ForegroundColor = ConsoleColor.Green;
Console.WriteLine("done");
Console.ResetColor();
}
catch (Exception ex)
{
Console.ForegroundColor = ConsoleColor.Red;
Console.WriteLine($"failed: {ex.Message}");
Console.ResetColor();
}
}
// Save processing report
string reportPath = Path.Combine(outputDir, "processing_report.md");
File.WriteAllLines(reportPath, reportLines);
Console.WriteLine($"\nReport: {reportPath}");
Console.WriteLine($"Translations: {Path.GetFullPath(outputDir)}");
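A useful addition to the report is the total audio time processed. TimeSpan sums cleanly, so tracking it costs one line per file; the helper below is an illustrative aggregate, assuming you collect each file's WaveFile.Duration into a list during the loop.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

static class Report
{
    // Sums per-file durations for a "Total audio processed" footer line.
    public static TimeSpan TotalDuration(IEnumerable<TimeSpan> durations) =>
        durations.Aggregate(TimeSpan.Zero, (acc, d) => acc + d);
}
```

Append it to the report with something like `reportLines.Add($"Total audio: {Report.TotalDuration(durations):hh\\:mm\\:ss}");`.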
Step 7: Summarize in Target Language
Combine transcription, translation, and summarization for cross-language audio intelligence:
// Model loading, transcription, and language detection are identical to
// Step 3, producing originalText from the detected-language transcript.
// The translationModel loaded there also powers the Summarizer below.
Console.WriteLine("\n=== Transcribe, Translate, and Summarize ===\n");
var summarizer = new Summarizer(translationModel)
{
MaxContentWords = 150,
MaxTitleWords = 10,
GenerateTitle = true,
GenerateContent = true,
Intent = Summarizer.SummarizationIntent.Abstraction,
TargetLanguage = Language.English,
OverflowStrategy = Summarizer.OverflowResolutionStrategy.RecursiveSummarize
};
// Transcribe in original language, summarize directly in English
Summarizer.SummarizerResult summaryResult = summarizer.Summarize(originalText);
Console.ForegroundColor = ConsoleColor.Yellow;
Console.WriteLine($"Title: {summaryResult.Title}");
Console.ResetColor();
Console.WriteLine($"Summary: {summaryResult.Summary}");
The TargetLanguage property on Summarizer produces the summary directly in the target language, regardless of the input language. This avoids a separate translation step when you only need a summary.
Model Selection
Whisper Models (Transcription)
| Model ID | VRAM | Languages | Best For |
|---|---|---|---|
| whisper-large-turbo3 | ~870 MB | 99 languages | Best multilingual accuracy (recommended) |
| whisper-medium | ~820 MB | 99 languages | Good alternative with similar VRAM |
| whisper-small | ~260 MB | 99 languages | Faster, lower accuracy on non-English |
For multilingual audio, always prefer whisper-large-turbo3 or whisper-medium. Smaller models have significantly lower accuracy on non-English languages.
Translation Models
| Model ID | VRAM | Quality | Best For |
|---|---|---|---|
| qwen3:4b | ~3.5 GB | Good | Common language pairs (EN/FR/DE/ES/ZH/JA) |
| qwen3:8b | ~6 GB | Very good | All language pairs (recommended) |
| qwen3:14b | ~10 GB | Excellent | Low-resource languages, nuanced translation |
The Qwen3 family provides the strongest multilingual translation quality. For European and East Asian language pairs, qwen3:8b is the recommended balance of quality and speed.
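The table above collapses into a simple capacity check. The picker below encodes the VRAM figures from this page; treat the thresholds as rough guidance, since actual usage varies with quantization and context size, and the helper itself is illustrative rather than part of the SDK.

```csharp
static class ModelPicker
{
    // Picks a translation model ID from the table above, given free VRAM in GB.
    // Thresholds mirror the documented VRAM footprints (~3.5 / ~6 / ~10 GB).
    public static string TranslationModel(double freeVramGb) =>
        freeVramGb >= 10 ? "qwen3:14b"
        : freeVramGb >= 6 ? "qwen3:8b"
        : "qwen3:4b";
}
```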
Common Issues
| Problem | Cause | Fix |
|---|---|---|
| Wrong language detected | Short audio clip or ambiguous speech | Force language: stt.Transcribe(audio, language: "fr") |
| Translation quality poor | Model too small for the language pair | Use qwen3:8b or larger; less common languages need bigger models |
| Whisper Translation mode outputs garbled text | Audio quality too poor | Use the two-step pipeline (transcribe + translate) instead |
| Slow batch processing | Two-step pipeline for every file | Use Whisper Translation mode for English-only output; reserve two-step for non-English targets |
| Characters missing in output | Console encoding not set | Ensure Console.OutputEncoding = Encoding.UTF8 is set |
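The first row of the table (wrong language detected) is worth automating: when language detection returns low confidence, fall back to a configured default instead of trusting the guess. The threshold value and the helper itself are illustrative choices, not SDK defaults.

```csharp
static class LangFallback
{
    // Trusts the detected language only above a confidence threshold;
    // otherwise returns the caller's configured default (e.g. "fr").
    public static string Resolve(string detected, double probability,
                                 string fallback, double threshold = 0.5) =>
        probability >= threshold ? detected : fallback;
}
```

The resolved value can then be passed as the forced language when transcribing short or ambiguous clips.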
Next Steps
- Transcribe Audio with Local Speech-to-Text: foundational transcription guide.
- Translate and Localize Content: text translation without audio.
- Transcribe and Reformat Audio with LLM Post-Processing: clean up translated transcripts.
- Generate Structured Meeting Notes from Audio Recordings: structure transcripts as meeting notes.