Transcribe and Reformat Audio with LLM Post-Processing

Raw speech-to-text output is functional but rarely publication-ready. Whisper transcriptions contain filler words, repetitions, missing punctuation, and run-on sentences that make transcripts hard to read. LM-Kit.NET lets you chain a Whisper transcription directly into an LLM for post-processing: correcting grammar, removing filler words, restructuring sentences, and producing clean, professional text. The entire pipeline runs locally with no cloud dependency. This tutorial builds a transcription pipeline that automatically reformats raw audio into polished, readable documents.


Why LLM Post-Processing Matters for Transcription

LLM-based transcript reformatting solves two common enterprise problems:

  1. Medical and legal dictation. Physicians dictating patient notes and attorneys recording case summaries speak naturally, with pauses, corrections, and filler words ("um", "uh", "you know"). Submitting raw transcriptions into electronic health records or case management systems creates unprofessional, hard-to-search records. An LLM post-processing step transforms dictation into clean, structured clinical notes or legal summaries without manual editing.
  2. Interview and podcast production. Media teams recording interviews need clean transcripts for show notes, articles, and subtitles. Raw Whisper output requires hours of manual cleanup. An automated pipeline that corrects errors, removes verbal tics, and formats the output as proper paragraphs cuts post-production time from hours to seconds.

Prerequisites

| Requirement | Minimum |
|---|---|
| .NET SDK | 8.0+ |
| VRAM | ~4.5 GB (Whisper model + chat model) |
| Disk | ~4 GB free for model downloads |
| Audio file | A .wav file (16-bit PCM, any sample rate) |
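
Because the pipeline expects 16-bit PCM WAV input, it can help to check a file's header before loading any model. The sketch below reads the RIFF "fmt " chunk with plain .NET APIs. DescribeWav is a hypothetical helper, not part of LM-Kit.NET, and it assumes the canonical RIFF/WAVE layout; files that use the WAVE_FORMAT_EXTENSIBLE tag are reported as non-PCM by this simple check.

```csharp
using System;
using System.IO;
using System.Text;

// Hypothetical helper (not an LM-Kit.NET API): reads the RIFF "fmt " chunk
// and reports whether the stream holds 16-bit PCM audio.
static (bool IsPcm16, int SampleRate, int Channels) DescribeWav(Stream stream)
{
    using var reader = new BinaryReader(stream, Encoding.ASCII, leaveOpen: true);

    string riff = Encoding.ASCII.GetString(reader.ReadBytes(4));
    reader.ReadUInt32();                                   // overall RIFF size (unused)
    string wave = Encoding.ASCII.GetString(reader.ReadBytes(4));
    if (riff != "RIFF" || wave != "WAVE")
        throw new InvalidDataException("Not a RIFF/WAVE file.");

    // Walk the chunk list until the "fmt " chunk appears.
    while (stream.Position + 8 <= stream.Length)
    {
        string chunkId = Encoding.ASCII.GetString(reader.ReadBytes(4));
        uint chunkSize = reader.ReadUInt32();

        if (chunkId == "fmt ")
        {
            ushort formatTag = reader.ReadUInt16();        // 1 = uncompressed PCM
            ushort channels = reader.ReadUInt16();
            int sampleRate = reader.ReadInt32();
            reader.ReadInt32();                            // byte rate (unused)
            reader.ReadUInt16();                           // block align (unused)
            ushort bitsPerSample = reader.ReadUInt16();
            return (formatTag == 1 && bitsPerSample == 16, sampleRate, channels);
        }

        // Chunks are word-aligned: an odd size is followed by one pad byte.
        stream.Seek(chunkSize + (chunkSize & 1), SeekOrigin.Current);
    }

    throw new InvalidDataException("No 'fmt ' chunk found.");
}

string audioPath = "recording.wav";   // same path the tutorial uses
if (File.Exists(audioPath))
{
    using var file = File.OpenRead(audioPath);
    var info = DescribeWav(file);
    Console.WriteLine($"16-bit PCM: {info.IsPcm16}, {info.SampleRate} Hz, {info.Channels} channel(s)");
}
```

If the check reports a different encoding, convert the file to 16-bit PCM with an audio tool of your choice before running the pipeline.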

Step 1: Create the Project

dotnet new console -n TranscribeAndReformat
cd TranscribeAndReformat
dotnet add package LM-Kit.NET

Step 2: Understand the Pipeline

  Audio file (.wav)
        │
        ▼
  ┌──────────────────┐
  │  SpeechToText    │    Whisper model
  │  (transcribe)    │    Raw text with timestamps
  └────────┬─────────┘
           │
           ▼
  ┌──────────────────┐
  │  TextCorrection  │    Fix grammar, spelling, punctuation
  │  (clean up)      │
  └────────┬─────────┘
           │
           ▼
  ┌──────────────────┐
  │  LLM Reformat    │    SingleTurnConversation
  │  (restructure)   │    Remove filler, add paragraphs
  └────────┬─────────┘
           │
           ▼
  Clean, formatted document

| Stage | Component | Purpose |
|---|---|---|
| Transcribe | SpeechToText | Convert audio to raw text |
| Correct | TextCorrection | Fix grammar, spelling, punctuation errors |
| Reformat | SingleTurnConversation | Restructure into clean paragraphs, remove filler |
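
The three stages compose as plain functions: a path goes in, and the text passes through two text-to-text steps. The sketch below illustrates only that shape. The three lambdas are toy stand-ins so the code runs without any model, and RunPipeline is a hypothetical helper; in the tutorial these roles are played by SpeechToText, TextCorrection, and SingleTurnConversation.

```csharp
using System;

// Toy stand-ins for the three stages so the sketch runs without any model.
Func<string, string> transcribe = path => "um so the the meeting start at nine";
Func<string, string> correct = text => text.Replace("the the", "the").Replace("start", "starts");
Func<string, string> reformat = text => text.Replace("um so ", "");

// Hypothetical helper: runs the stages in order and keeps every intermediate
// result, mirroring the raw / corrected / final outputs of the pipeline.
(string Raw, string Corrected, string Final) RunPipeline(string audioPath)
{
    string raw = transcribe(audioPath);
    string corrected = correct(raw);
    string final = reformat(corrected);
    return (raw, corrected, final);
}

var result = RunPipeline("recording.wav");
Console.WriteLine(result.Raw);        // um so the the meeting start at nine
Console.WriteLine(result.Corrected);  // um so the meeting starts at nine
Console.WriteLine(result.Final);      // the meeting starts at nine
```

Keeping the intermediate results makes it easy to compare stages when a transcript comes out wrong.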

Step 3: Transcribe and Correct Grammar

using System.Text;
using LMKit.Media.Audio;
using LMKit.Model;
using LMKit.Speech;
using LMKit.TextEnhancement;
using LMKit.TextGeneration;

LMKit.Licensing.LicenseManager.SetLicenseKey("");

Console.InputEncoding = Encoding.UTF8;
Console.OutputEncoding = Encoding.UTF8;

// ──────────────────────────────────────
// 1. Load Whisper model for transcription
// ──────────────────────────────────────
Console.WriteLine("Loading Whisper model...");
using LM whisperModel = LM.LoadFromModelID("whisper-large-turbo3",
    downloadingProgress: (_, len, read) =>
    {
        if (len.HasValue) Console.Write($"\r  Downloading: {(double)read / len.Value * 100:F1}%   ");
        return true;
    },
    loadingProgress: p => { Console.Write($"\r  Loading: {p * 100:F0}%   "); return true; });
Console.WriteLine("\n");

// ──────────────────────────────────────
// 2. Load a chat model for post-processing
// ──────────────────────────────────────
Console.WriteLine("Loading chat model...");
using LM chatModel = LM.LoadFromModelID("gemma3:4b",
    downloadingProgress: (_, len, read) =>
    {
        if (len.HasValue) Console.Write($"\r  Downloading: {(double)read / len.Value * 100:F1}%   ");
        return true;
    },
    loadingProgress: p => { Console.Write($"\r  Loading: {p * 100:F0}%   "); return true; });
Console.WriteLine("\n");

// ──────────────────────────────────────
// 3. Transcribe the audio
// ──────────────────────────────────────
string audioPath = "recording.wav";
if (!File.Exists(audioPath))
{
    Console.WriteLine($"Place a WAV file at '{audioPath}' and run again.");
    return;
}

var stt = new SpeechToText(whisperModel)
{
    EnableVoiceActivityDetection = true,
    SuppressNonSpeechTokens = true,
    SuppressHallucinations = true
};

Console.WriteLine($"Transcribing {audioPath}...");
using var audio = new WaveFile(audioPath);
Console.WriteLine($"  Duration: {audio.Duration:mm\\:ss\\.ff}");

var transcription = stt.Transcribe(audio);
string rawText = transcription.Text;

Console.ForegroundColor = ConsoleColor.DarkGray;
Console.WriteLine($"\n=== Raw Transcription ({transcription.Segments.Count} segments) ===");
Console.WriteLine(rawText);
Console.ResetColor();

// ──────────────────────────────────────
// 4. Correct grammar and spelling
// ──────────────────────────────────────
Console.WriteLine("\nCorrecting grammar...");

var corrector = new TextCorrection(chatModel);
string correctedText = corrector.Correct(rawText);

Console.ForegroundColor = ConsoleColor.Yellow;
Console.WriteLine("\n=== Grammar-Corrected Text ===");
Console.WriteLine(correctedText);
Console.ResetColor();

Step 4: Reformat into Clean Paragraphs

Use an LLM to remove filler words, restructure sentences, and produce a polished document:

using System.Text;
using LMKit.Media.Audio;
using LMKit.Model;
using LMKit.Speech;
using LMKit.TextEnhancement;
using LMKit.TextGeneration;
using LMKit.TextGeneration.Chat;

LMKit.Licensing.LicenseManager.SetLicenseKey("");

Console.InputEncoding = Encoding.UTF8;
Console.OutputEncoding = Encoding.UTF8;

// ──────────────────────────────────────
// 1. Load Whisper model for transcription
// ──────────────────────────────────────
Console.WriteLine("Loading Whisper model...");
using LM whisperModel = LM.LoadFromModelID("whisper-large-turbo3",
    downloadingProgress: (_, len, read) =>
    {
        if (len.HasValue) Console.Write($"\r  Downloading: {(double)read / len.Value * 100:F1}%   ");
        return true;
    },
    loadingProgress: p => { Console.Write($"\r  Loading: {p * 100:F0}%   "); return true; });
Console.WriteLine("\n");

// ──────────────────────────────────────
// 2. Load a chat model for post-processing
// ──────────────────────────────────────
Console.WriteLine("Loading chat model...");
using LM chatModel = LM.LoadFromModelID("gemma3:4b",
    downloadingProgress: (_, len, read) =>
    {
        if (len.HasValue) Console.Write($"\r  Downloading: {(double)read / len.Value * 100:F1}%   ");
        return true;
    },
    loadingProgress: p => { Console.Write($"\r  Loading: {p * 100:F0}%   "); return true; });
Console.WriteLine("\n");

// ──────────────────────────────────────
// 3. Transcribe the audio
// ──────────────────────────────────────
string audioPath = "recording.wav";
if (!File.Exists(audioPath))
{
    Console.WriteLine($"Place a WAV file at '{audioPath}' and run again.");
    return;
}

var stt = new SpeechToText(whisperModel)
{
    EnableVoiceActivityDetection = true,
    SuppressNonSpeechTokens = true,
    SuppressHallucinations = true
};

Console.WriteLine($"Transcribing {audioPath}...");
using var audio = new WaveFile(audioPath);
Console.WriteLine($"  Duration: {audio.Duration:mm\\:ss\\.ff}");

var transcription = stt.Transcribe(audio);
string rawText = transcription.Text;

Console.ForegroundColor = ConsoleColor.DarkGray;
Console.WriteLine($"\n=== Raw Transcription ({transcription.Segments.Count} segments) ===");
Console.WriteLine(rawText);
Console.ResetColor();

// ──────────────────────────────────────
// 4. Correct grammar and spelling
// ──────────────────────────────────────
Console.WriteLine("\nCorrecting grammar...");

var corrector = new TextCorrection(chatModel);
string correctedText = corrector.Correct(rawText);

// ──────────────────────────────────────
// 5. Reformat with LLM post-processing
// ──────────────────────────────────────
Console.WriteLine("\nReformatting transcript...\n");

var formatter = new SingleTurnConversation(chatModel)
{
    SystemPrompt = "You are a transcript editor. Your task is to reformat a speech transcription " +
                   "into a clean, professional document. Follow these rules:\n" +
                   "1. Remove all filler words (um, uh, like, you know, so, basically, I mean).\n" +
                   "2. Remove false starts and self-corrections (keep only the corrected version).\n" +
                   "3. Split the text into logical paragraphs.\n" +
                   "4. Fix any remaining grammar or punctuation issues.\n" +
                   "5. Preserve the original meaning and speaker's intent exactly.\n" +
                   "6. Do not add information that was not in the original.\n" +
                   "7. Output only the reformatted text with no commentary.",
    MaximumCompletionTokens = 4096
};

var reformatted = new StringBuilder();

formatter.AfterTextCompletion += (_, e) =>
{
    if (e.SegmentType == TextSegmentType.UserVisible)
        reformatted.Append(e.Text);
};

formatter.Submit($"Reformat this transcript:\n\n{correctedText}");
string cleanText = reformatted.ToString();

Console.ForegroundColor = ConsoleColor.Cyan;
Console.WriteLine("=== Reformatted Document ===");
Console.ResetColor();
Console.WriteLine(cleanText);

// Save results
File.WriteAllText("transcript_raw.txt", rawText);
File.WriteAllText("transcript_corrected.txt", correctedText);
File.WriteAllText("transcript_final.txt", cleanText);
Console.WriteLine("\nSaved: transcript_raw.txt, transcript_corrected.txt, transcript_final.txt");
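
Rules 5 and 6 of the system prompt ask the model not to change meaning or add information, but a prompt is not a guarantee. One cheap spot check is to list the words in the final text that never occur in the raw transcript. The sketch below is a heuristic under that assumption, using only standard .NET APIs; FindNovelWords is a hypothetical helper, and a flagged word is a candidate for human review, not proof of an error.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

// Hypothetical helper: returns words that appear in the edited text but
// nowhere in the source text (case-insensitive comparison).
static List<string> FindNovelWords(string source, string edited)
{
    static IEnumerable<string> Words(string text) =>
        Regex.Matches(text.ToLowerInvariant(), @"[\p{L}\p{N}']+").Select(m => m.Value);

    var known = new HashSet<string>(Words(source));
    return Words(edited).Where(word => !known.Contains(word)).Distinct().ToList();
}

string rawSample = "um so we uh we shipped version 2 on monday you know";
string cleanSample = "We shipped version 3 on Monday.";

foreach (string word in FindNovelWords(rawSample, cleanSample))
    Console.WriteLine($"Not in the raw transcript: {word}");
// prints: Not in the raw transcript: 3
```

Grammar correction legitimately changes word forms (for example "start" to "starts"), so expect some harmless hits. Numbers, names, and dosages that appear only in the output deserve a closer look.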

Step 5: Domain-Specific Reformatting

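Customize the system prompt for a specific industry. The listing below defines three formatters; submit the transcript to whichever one matches the recording, the same way Step 4 calls formatter.Submit: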

using System.Text;
using LMKit.Media.Audio;
using LMKit.Model;
using LMKit.Speech;
using LMKit.TextGeneration;

LMKit.Licensing.LicenseManager.SetLicenseKey("");

Console.InputEncoding = Encoding.UTF8;
Console.OutputEncoding = Encoding.UTF8;

// ──────────────────────────────────────
// 1. Load Whisper model for transcription
// ──────────────────────────────────────
Console.WriteLine("Loading Whisper model...");
using LM whisperModel = LM.LoadFromModelID("whisper-large-turbo3",
    downloadingProgress: (_, len, read) =>
    {
        if (len.HasValue) Console.Write($"\r  Downloading: {(double)read / len.Value * 100:F1}%   ");
        return true;
    },
    loadingProgress: p => { Console.Write($"\r  Loading: {p * 100:F0}%   "); return true; });
Console.WriteLine("\n");

// ──────────────────────────────────────
// 2. Load a chat model for post-processing
// ──────────────────────────────────────
Console.WriteLine("Loading chat model...");
using LM chatModel = LM.LoadFromModelID("gemma3:4b",
    downloadingProgress: (_, len, read) =>
    {
        if (len.HasValue) Console.Write($"\r  Downloading: {(double)read / len.Value * 100:F1}%   ");
        return true;
    },
    loadingProgress: p => { Console.Write($"\r  Loading: {p * 100:F0}%   "); return true; });
Console.WriteLine("\n");

// ──────────────────────────────────────
// 3. Transcribe the audio
// ──────────────────────────────────────
string audioPath = "recording.wav";
if (!File.Exists(audioPath))
{
    Console.WriteLine($"Place a WAV file at '{audioPath}' and run again.");
    return;
}

var stt = new SpeechToText(whisperModel)
{
    EnableVoiceActivityDetection = true,
    SuppressNonSpeechTokens = true,
    SuppressHallucinations = true
};

Console.WriteLine($"Transcribing {audioPath}...");
using var audio = new WaveFile(audioPath);
Console.WriteLine($"  Duration: {audio.Duration:mm\\:ss\\.ff}");

var transcription = stt.Transcribe(audio);

// Medical dictation: structure as clinical note
var medicalFormatter = new SingleTurnConversation(chatModel)
{
    SystemPrompt = "You are a medical transcription editor. Reformat this dictation into a " +
                   "structured clinical note with sections: Chief Complaint, History of Present " +
                   "Illness, Assessment, and Plan. Use standard medical abbreviations. " +
                   "Remove filler words. Preserve all clinical details exactly as stated.",
    MaximumCompletionTokens = 4096
};

// Legal dictation: structure as case summary
var legalFormatter = new SingleTurnConversation(chatModel)
{
    SystemPrompt = "You are a legal transcription editor. Reformat this dictation into a " +
                   "structured case memo. Use formal legal language. Organize by: Facts, " +
                   "Legal Issues, Analysis, and Recommended Action. Remove filler words " +
                   "and verbal hesitations. Preserve all factual details exactly as stated.",
    MaximumCompletionTokens = 4096
};

// Technical meeting: structure as engineering notes
var techFormatter = new SingleTurnConversation(chatModel)
{
    SystemPrompt = "You are a technical documentation editor. Reformat this recording into " +
                   "clean engineering notes. Use bullet points for decisions and action items. " +
                   "Preserve technical terms, version numbers, and specifications exactly. " +
                   "Remove filler words and off-topic tangents.",
    MaximumCompletionTokens = 4096
};
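
When the domain is only known at run time (for example from a folder name or a command-line argument), the prompts can live in a lookup table. The sketch below assumes a hypothetical registry keyed by domain name, with shortened prompt texts and a general-purpose fallback; promptsByDomain and SelectPrompt are illustrative names, not LM-Kit.NET APIs.

```csharp
using System;
using System.Collections.Generic;

// Fallback prompt for recordings that match no known domain.
const string generalPrompt =
    "You are a transcript editor. Remove filler words, fix grammar, " +
    "and restructure into clean paragraphs. Output only the reformatted text.";

// Hypothetical registry: domain key -> system prompt (texts shortened here;
// use the full prompts from the listing above in practice).
var promptsByDomain = new Dictionary<string, string>(StringComparer.OrdinalIgnoreCase)
{
    ["medical"] = "You are a medical transcription editor. Reformat this dictation into a structured clinical note.",
    ["legal"] = "You are a legal transcription editor. Reformat this dictation into a structured case memo.",
    ["technical"] = "You are a technical documentation editor. Reformat this recording into clean engineering notes."
};

// Returns the domain prompt when one is registered, otherwise the fallback.
string SelectPrompt(string domain) =>
    promptsByDomain.TryGetValue(domain ?? "", out var prompt) ? prompt : generalPrompt;

Console.WriteLine(SelectPrompt("Legal"));    // the legal prompt (keys are case-insensitive)
Console.WriteLine(SelectPrompt("podcast"));  // the general fallback prompt
```

The selected string is then assigned to SystemPrompt when the SingleTurnConversation is created, so one code path serves every domain.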

Step 6: Batch Processing Multiple Audio Files

Process an entire folder of recordings:

using System.Text;
using LMKit.Media.Audio;
using LMKit.Model;
using LMKit.Speech;
using LMKit.TextEnhancement;
using LMKit.TextGeneration;
using LMKit.TextGeneration.Chat;

LMKit.Licensing.LicenseManager.SetLicenseKey("");

Console.InputEncoding = Encoding.UTF8;
Console.OutputEncoding = Encoding.UTF8;

// ──────────────────────────────────────
// 1. Load Whisper model for transcription
// ──────────────────────────────────────
Console.WriteLine("Loading Whisper model...");
using LM whisperModel = LM.LoadFromModelID("whisper-large-turbo3",
    downloadingProgress: (_, len, read) =>
    {
        if (len.HasValue) Console.Write($"\r  Downloading: {(double)read / len.Value * 100:F1}%   ");
        return true;
    },
    loadingProgress: p => { Console.Write($"\r  Loading: {p * 100:F0}%   "); return true; });
Console.WriteLine("\n");

// ──────────────────────────────────────
// 2. Load a chat model for post-processing
// ──────────────────────────────────────
Console.WriteLine("Loading chat model...");
using LM chatModel = LM.LoadFromModelID("gemma3:4b",
    downloadingProgress: (_, len, read) =>
    {
        if (len.HasValue) Console.Write($"\r  Downloading: {(double)read / len.Value * 100:F1}%   ");
        return true;
    },
    loadingProgress: p => { Console.Write($"\r  Loading: {p * 100:F0}%   "); return true; });
Console.WriteLine("\n");

// ──────────────────────────────────────
// 3. Configure the transcriber (reused for every file)
// ──────────────────────────────────────
var stt = new SpeechToText(whisperModel)
{
    EnableVoiceActivityDetection = true,
    SuppressNonSpeechTokens = true,
    SuppressHallucinations = true
};

// ──────────────────────────────────────
// 4. Create the grammar corrector (reused for every file)
// ──────────────────────────────────────
var corrector = new TextCorrection(chatModel);

Console.WriteLine("\n=== Batch Transcription and Reformatting ===\n");

string inputDir = "recordings";
string outputDir = "transcripts";

if (!Directory.Exists(inputDir))
{
    Console.WriteLine($"Create a '{inputDir}' folder with WAV files, then run again.");
    return;
}

Directory.CreateDirectory(outputDir);

string[] wavFiles = Directory.GetFiles(inputDir, "*.wav");
Console.WriteLine($"Found {wavFiles.Length} audio file(s)\n");

foreach (string wavPath in wavFiles)
{
    string fileName = Path.GetFileNameWithoutExtension(wavPath);
    Console.Write($"  {Path.GetFileName(wavPath)}: ");

    try
    {
        // Transcribe
        using var wav = new WaveFile(wavPath);
        var result = stt.Transcribe(wav);

        // Correct grammar
        string correctedText = corrector.Correct(result.Text);

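        // Reformat. A fresh formatter per file keeps event handlers from
        // accumulating across loop iterations.
        var formatter = new SingleTurnConversation(chatModel)
        {
            SystemPrompt = "You are a transcript editor. Remove filler words, fix grammar, " +
                           "and restructure into clean paragraphs. Preserve the original " +
                           "meaning exactly. Output only the reformatted text.",
            MaximumCompletionTokens = 4096
        };

        var output = new StringBuilder();
        formatter.AfterTextCompletion += (_, e) =>
        {
            if (e.SegmentType == TextSegmentType.UserVisible)
                output.Append(e.Text);
        };

        formatter.Submit($"Reformat this transcript:\n\n{correctedText}");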

        // Save
        string outPath = Path.Combine(outputDir, $"{fileName}.txt");
        File.WriteAllText(outPath, output.ToString());

        Console.ForegroundColor = ConsoleColor.Green;
        Console.WriteLine($"done → {outPath}");
        Console.ResetColor();
    }
    catch (Exception ex)
    {
        Console.ForegroundColor = ConsoleColor.Red;
        Console.WriteLine($"failed: {ex.Message}");
        Console.ResetColor();
    }
}

Console.WriteLine($"\nAll transcripts saved to {Path.GetFullPath(outputDir)}");
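
Batch runs over large folders are often interrupted and restarted. A minimal way to make the loop resumable, assuming the one-transcript-per-recording naming used above, is to skip any recording whose transcript already exists and is newer than the audio. NeedsProcessing is a hypothetical helper built only on System.IO.

```csharp
using System;
using System.IO;

// Hypothetical helper: true when the transcript is missing or older than
// the recording it was produced from.
static bool NeedsProcessing(string wavPath, string outputDir)
{
    string outPath = Path.Combine(outputDir, Path.GetFileNameWithoutExtension(wavPath) + ".txt");
    return !File.Exists(outPath)
        || File.GetLastWriteTimeUtc(outPath) < File.GetLastWriteTimeUtc(wavPath);
}

string inputDir = "recordings";
string outputDir = "transcripts";

if (Directory.Exists(inputDir))
{
    foreach (string wavPath in Directory.GetFiles(inputDir, "*.wav"))
    {
        if (!NeedsProcessing(wavPath, outputDir))
        {
            Console.WriteLine($"  {Path.GetFileName(wavPath)}: up to date, skipped");
            continue;
        }

        // Transcribe, correct, and reformat here, as in the batch loop.
        Console.WriteLine($"  {Path.GetFileName(wavPath)}: needs processing");
    }
}
```

Comparing timestamps rather than only checking for existence means a re-recorded file is picked up again on the next run.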

Step 7: Streaming the Reformatted Output

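To display the reformatted transcript in real time as the LLM generates it, write each text segment to the console from the AfterTextCompletion handler. This example skips TextCorrection and sends the raw transcript straight to the formatter: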

using System.Text;
using LMKit.Media.Audio;
using LMKit.Model;
using LMKit.Speech;
using LMKit.TextGeneration;
using LMKit.TextGeneration.Chat;

LMKit.Licensing.LicenseManager.SetLicenseKey("");

Console.InputEncoding = Encoding.UTF8;
Console.OutputEncoding = Encoding.UTF8;

// ──────────────────────────────────────
// 1. Load Whisper model for transcription
// ──────────────────────────────────────
Console.WriteLine("Loading Whisper model...");
using LM whisperModel = LM.LoadFromModelID("whisper-large-turbo3",
    downloadingProgress: (_, len, read) =>
    {
        if (len.HasValue) Console.Write($"\r  Downloading: {(double)read / len.Value * 100:F1}%   ");
        return true;
    },
    loadingProgress: p => { Console.Write($"\r  Loading: {p * 100:F0}%   "); return true; });
Console.WriteLine("\n");

// ──────────────────────────────────────
// 2. Load a chat model for post-processing
// ──────────────────────────────────────
Console.WriteLine("Loading chat model...");
using LM chatModel = LM.LoadFromModelID("gemma3:4b",
    downloadingProgress: (_, len, read) =>
    {
        if (len.HasValue) Console.Write($"\r  Downloading: {(double)read / len.Value * 100:F1}%   ");
        return true;
    },
    loadingProgress: p => { Console.Write($"\r  Loading: {p * 100:F0}%   "); return true; });
Console.WriteLine("\n");

// ──────────────────────────────────────
// 3. Transcribe the audio
// ──────────────────────────────────────
string audioPath = "recording.wav";
if (!File.Exists(audioPath))
{
    Console.WriteLine($"Place a WAV file at '{audioPath}' and run again.");
    return;
}

var stt = new SpeechToText(whisperModel)
{
    EnableVoiceActivityDetection = true,
    SuppressNonSpeechTokens = true,
    SuppressHallucinations = true
};

Console.WriteLine($"Transcribing {audioPath}...");
using var audio = new WaveFile(audioPath);
Console.WriteLine($"  Duration: {audio.Duration:mm\\:ss\\.ff}");

var transcription = stt.Transcribe(audio);
string rawText = transcription.Text;

Console.WriteLine("\n=== Streaming Reformatted Output ===\n");

var streamFormatter = new SingleTurnConversation(chatModel)
{
    SystemPrompt = "You are a transcript editor. Remove filler words, fix grammar, " +
                   "and restructure into clean paragraphs. Output only the reformatted text.",
    MaximumCompletionTokens = 4096
};

streamFormatter.AfterTextCompletion += (_, e) =>
{
    if (e.SegmentType == TextSegmentType.UserVisible)
        Console.Write(e.Text);
};

Console.ForegroundColor = ConsoleColor.Cyan;
Console.Write("Reformatted: ");
Console.ResetColor();

streamFormatter.Submit($"Reformat this transcript:\n\n{rawText}");
Console.WriteLine("\n");

Model Selection

Whisper Models (Transcription)

| Model ID | VRAM | Speed | Best For |
|---|---|---|---|
| whisper-large-turbo3 | ~870 MB | Moderate | Best accuracy (recommended) |
| whisper-small | ~260 MB | Fast | Quick processing, good enough quality |
| whisper-base | ~80 MB | Very fast | Real-time previews |

Chat Models (Post-Processing)

| Model ID | VRAM | Quality | Best For |
|---|---|---|---|
| gemma3:4b | ~3.5 GB | Good | Fast reformatting, batch processing |
| qwen3:8b | ~6 GB | Very good | Technical or domain-specific content |
| gemma3:12b | ~8 GB | Excellent | Complex restructuring, formal documents |
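
Both models stay loaded at the same time, so the chat model has to fit in whatever VRAM is left after Whisper. The sketch below turns the approximate figures from the two tables into a simple selection rule. PickChatModel is a hypothetical helper, the thresholds are the table estimates rather than measured values, and it ranks by size only, ignoring the "Best For" column.

```csharp
using System;

// Hypothetical helper: picks the largest chat model from the table that fits
// next to whisper-large-turbo3 (~0.87 GB). All figures are table estimates.
static string PickChatModel(double availableVramGb)
{
    const double whisperGb = 0.87;
    double budget = availableVramGb - whisperGb;

    if (budget >= 8.0) return "gemma3:12b";
    if (budget >= 6.0) return "qwen3:8b";
    if (budget >= 3.5) return "gemma3:4b";
    throw new InvalidOperationException("Not enough VRAM for any chat model in the table.");
}

Console.WriteLine(PickChatModel(4.5));   // gemma3:4b
Console.WriteLine(PickChatModel(7.0));   // qwen3:8b
Console.WriteLine(PickChatModel(12.0));  // gemma3:12b
```

The 4.5 GB case matches the minimum listed in the prerequisites: roughly 0.87 GB for Whisper plus 3.5 GB for gemma3:4b.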

Common Issues

| Problem | Cause | Fix |
|---|---|---|
| Reformatted text changes meaning | System prompt too aggressive | Add "Preserve the original meaning exactly" to the prompt |
| Filler words still present | Grammar correction alone does not remove fillers | Use the LLM reformatting step (Step 4) in addition to TextCorrection |
| Output truncated | MaximumCompletionTokens too low | Increase to 4096 or higher; split long transcripts into chunks |
| Slow on long recordings | Entire transcript sent to LLM at once | Process in segments; send 5-minute chunks to the formatter |
| Domain terms misspelled after correction | Corrector does not know domain vocabulary | Use stt.Prompt with domain terms before transcription |
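
Two of the fixes above involve splitting a long transcript into chunks. The table suggests splitting by time (5-minute chunks); the sketch below swaps in a simpler text-based approach that groups whole sentences up to a character budget. SplitIntoChunks is a hypothetical helper using only standard .NET APIs, and the character budget is only a rough proxy for the model's token limit.

```csharp
using System;
using System.Collections.Generic;
using System.Text;
using System.Text.RegularExpressions;

// Hypothetical helper: groups whole sentences into chunks of at most maxChars
// characters. A single sentence longer than maxChars becomes its own chunk.
static List<string> SplitIntoChunks(string text, int maxChars)
{
    var chunks = new List<string>();
    var current = new StringBuilder();

    // Split after sentence-ending punctuation that is followed by whitespace.
    foreach (string sentence in Regex.Split(text.Trim(), @"(?<=[.!?])\s+"))
    {
        if (sentence.Length == 0) continue;

        // Start a new chunk when adding this sentence would exceed the budget.
        if (current.Length > 0 && current.Length + 1 + sentence.Length > maxChars)
        {
            chunks.Add(current.ToString());
            current.Clear();
        }

        if (current.Length > 0) current.Append(' ');
        current.Append(sentence);
    }

    if (current.Length > 0) chunks.Add(current.ToString());
    return chunks;
}

string sample = "First point. Second point! Is there a third? Yes, a third one.";
foreach (string chunk in SplitIntoChunks(sample, 30))
    Console.WriteLine($"[{chunk.Length}] {chunk}");
// [26] First point. Second point!
// [17] Is there a third?
// [17] Yes, a third one.
```

Each chunk can then be submitted to the formatter in turn and the outputs joined with blank lines. Run the split on the corrected text rather than the raw transcript, since sentence boundaries depend on punctuation being present.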
