Transcribe and Reformat Audio with LLM Post-Processing

Raw speech-to-text output is functional but rarely publication-ready. Whisper transcriptions often contain filler words, repetitions, missing punctuation, and run-on sentences that make them hard to read. LM-Kit.NET lets you chain a Whisper transcription directly into an LLM for post-processing: correcting grammar, removing filler words, restructuring sentences, and producing clean, professional text. The entire pipeline runs locally with no cloud dependency. This tutorial builds a transcription pipeline that reformats raw audio transcripts into polished, readable documents.

Why LLM Post-Processing Matters for Transcription

LLM-based transcript reformatting solves two common enterprise problems:

Medical and legal dictation. Physicians dictating patient notes and attorneys recording case summaries speak naturally, with pauses, corrections, and filler words ("um", "uh", "you know"). Submitting raw transcriptions into electronic health records or case management systems creates unprofessional, hard-to-search records. An LLM post-processing step transforms dictation into clean, structured clinical notes or legal summaries without manual editing.
Interview and podcast production. Media teams recording interviews need clean transcripts for show notes, articles, and subtitles. Raw Whisper output can require hours of manual cleanup. An automated pipeline that corrects errors, removes verbal tics, and formats the output as proper paragraphs can cut that post-production time from hours to minutes.

Prerequisites

Requirement	Minimum
.NET SDK	8.0+
VRAM	~4.5 GB (Whisper model + chat model)
Disk	~4 GB free for model downloads
Audio file	A `.wav` file (16-bit PCM, any sample rate)
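
The audio requirement above can be checked before any model is loaded. The following is a minimal sketch, not part of LM-Kit.NET: `IsPcm16` and `BuildHeader` are hypothetical helper names, and the check only reads the standard RIFF/WAVE header to confirm format tag 1 (PCM) at 16 bits per sample. The source does not say how the library reacts to other encodings, so treat this as an optional early warning.

```csharp
using System;
using System.IO;
using System.Text;

// Build a header-only WAV in memory to exercise the check (no audio samples).
byte[] demo = BuildHeader(audioFormat: 1, bitsPerSample: 16, sampleRate: 16000);
Console.WriteLine(IsPcm16(new MemoryStream(demo)));

// Returns true when the stream is a RIFF/WAVE file whose "fmt " chunk
// declares uncompressed PCM (format tag 1) at 16 bits per sample.
static bool IsPcm16(Stream stream)
{
    using var reader = new BinaryReader(stream, Encoding.ASCII, leaveOpen: true);
    string Tag() => Encoding.ASCII.GetString(reader.ReadBytes(4));

    if (stream.Length < 12) return false;
    if (Tag() != "RIFF") return false;
    reader.ReadUInt32();                          // overall RIFF size, unused here
    if (Tag() != "WAVE") return false;

    // Walk the chunks until "fmt " is found; other chunks are skipped.
    while (stream.Position + 8 <= stream.Length)
    {
        string id = Tag();
        uint size = reader.ReadUInt32();
        if (id == "fmt ")
        {
            if (size < 16 || stream.Position + 16 > stream.Length) return false;
            ushort format = reader.ReadUInt16();  // 1 = PCM
            reader.ReadUInt16();                  // channel count
            reader.ReadUInt32();                  // sample rate (any value accepted)
            reader.ReadUInt32();                  // byte rate
            reader.ReadUInt16();                  // block align
            ushort bits = reader.ReadUInt16();
            return format == 1 && bits == 16;
        }
        stream.Position += size + (size & 1);     // chunks are padded to even length
    }
    return false;
}

// Writes a 44-byte mono WAV header with an empty "data" chunk.
static byte[] BuildHeader(ushort audioFormat, ushort bitsPerSample, uint sampleRate)
{
    using var ms = new MemoryStream();
    using var w = new BinaryWriter(ms, Encoding.ASCII);
    ushort channels = 1;
    ushort blockAlign = (ushort)(channels * bitsPerSample / 8);
    w.Write(Encoding.ASCII.GetBytes("RIFF"));
    w.Write((uint)36);                            // 36 + data size (0)
    w.Write(Encoding.ASCII.GetBytes("WAVE"));
    w.Write(Encoding.ASCII.GetBytes("fmt "));
    w.Write((uint)16);
    w.Write(audioFormat);
    w.Write(channels);
    w.Write(sampleRate);
    w.Write(sampleRate * blockAlign);             // byte rate
    w.Write(blockAlign);
    w.Write(bitsPerSample);
    w.Write(Encoding.ASCII.GetBytes("data"));
    w.Write((uint)0);
    w.Flush();
    return ms.ToArray();
}
```

For a real file, pass `File.OpenRead("recording.wav")` instead of the in-memory stream. Files that use the extensible format tag (0xFFFE) are reported as non-PCM by this sketch even when their payload is PCM.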

Step 1: Create the Project

dotnet new console -n TranscribeAndReformat
cd TranscribeAndReformat
dotnet add package LM-Kit.NET

Step 2: Understand the Pipeline

  Audio file (.wav)
        │
        ▼
  ┌──────────────────┐
  │  SpeechToText    │    Whisper model
  │  (transcribe)    │    Raw text with timestamps
  └────────┬─────────┘
           │
           ▼
  ┌──────────────────┐
  │  TextCorrection  │    Fix grammar, spelling, punctuation
  │  (clean up)      │
  └────────┬─────────┘
           │
           ▼
  ┌──────────────────┐
  │  LLM Reformat    │    SingleTurnConversation
  │  (restructure)   │    Remove filler, add paragraphs
  └────────┬─────────┘
           │
           ▼
  Clean, formatted document

Stage	Component	Purpose
Transcribe	`SpeechToText`	Convert audio to raw text
Correct	`TextCorrection`	Fix grammar, spelling, punctuation errors
Reformat	`SingleTurnConversation`	Restructure into clean paragraphs, remove filler

Step 3: Transcribe and Correct Grammar

using System.Text;
using LMKit.Media.Audio;
using LMKit.Model;
using LMKit.Speech;
using LMKit.TextEnhancement;
using LMKit.TextGeneration;

LMKit.Licensing.LicenseManager.SetLicenseKey("");

Console.InputEncoding = Encoding.UTF8;
Console.OutputEncoding = Encoding.UTF8;

// ──────────────────────────────────────
// 1. Load Whisper model for transcription
// ──────────────────────────────────────
Console.WriteLine("Loading Whisper model...");
using LM whisperModel = LM.LoadFromModelID("whisper-large-turbo3",
    downloadingProgress: (_, len, read) =>
    {
        if (len.HasValue) Console.Write($"\r  Downloading: {(double)read / len.Value * 100:F1}%   ");
        return true;
    },
    loadingProgress: p => { Console.Write($"\r  Loading: {p * 100:F0}%   "); return true; });
Console.WriteLine("\n");

// ──────────────────────────────────────
// 2. Load a chat model for post-processing
// ──────────────────────────────────────
Console.WriteLine("Loading chat model...");
using LM chatModel = LM.LoadFromModelID("gemma4:e4b",
    downloadingProgress: (_, len, read) =>
    {
        if (len.HasValue) Console.Write($"\r  Downloading: {(double)read / len.Value * 100:F1}%   ");
        return true;
    },
    loadingProgress: p => { Console.Write($"\r  Loading: {p * 100:F0}%   "); return true; });
Console.WriteLine("\n");

// ──────────────────────────────────────
// 3. Transcribe the audio
// ──────────────────────────────────────
string audioPath = "recording.wav";
if (!File.Exists(audioPath))
{
    Console.WriteLine($"Place a WAV file at '{audioPath}' and run again.");
    return;
}

var stt = new SpeechToText(whisperModel)
{
    EnableVoiceActivityDetection = true,
    SuppressNonSpeechTokens = true,
    SuppressHallucinations = true
};

Console.WriteLine($"Transcribing {audioPath}...");
using var audio = new WaveFile(audioPath);
Console.WriteLine($"  Duration: {audio.Duration:mm\\:ss\\.ff}");

var transcription = stt.Transcribe(audio);
string rawText = transcription.Text;

Console.ForegroundColor = ConsoleColor.DarkGray;
Console.WriteLine($"\n=== Raw Transcription ({transcription.Segments.Count} segments) ===");
Console.WriteLine(rawText);
Console.ResetColor();

// ──────────────────────────────────────
// 4. Correct grammar and spelling
// ──────────────────────────────────────
Console.WriteLine("\nCorrecting grammar...");

var corrector = new TextCorrection(chatModel);
string correctedText = corrector.Correct(rawText);

Console.ForegroundColor = ConsoleColor.Yellow;
Console.WriteLine("\n=== Grammar-Corrected Text ===");
Console.WriteLine(correctedText);
Console.ResetColor();

Step 4: Reformat into Clean Paragraphs

Use an LLM to remove filler words, restructure sentences, and produce a polished document:

using System.Text;
using LMKit.Media.Audio;
using LMKit.Model;
using LMKit.Speech;
using LMKit.TextEnhancement;
using LMKit.TextGeneration;
using LMKit.TextGeneration.Chat;

LMKit.Licensing.LicenseManager.SetLicenseKey("");

Console.InputEncoding = Encoding.UTF8;
Console.OutputEncoding = Encoding.UTF8;

// ──────────────────────────────────────
// 1. Load Whisper model for transcription
// ──────────────────────────────────────
Console.WriteLine("Loading Whisper model...");
using LM whisperModel = LM.LoadFromModelID("whisper-large-turbo3",
    downloadingProgress: (_, len, read) =>
    {
        if (len.HasValue) Console.Write($"\r  Downloading: {(double)read / len.Value * 100:F1}%   ");
        return true;
    },
    loadingProgress: p => { Console.Write($"\r  Loading: {p * 100:F0}%   "); return true; });
Console.WriteLine("\n");

// ──────────────────────────────────────
// 2. Load a chat model for post-processing
// ──────────────────────────────────────
Console.WriteLine("Loading chat model...");
using LM chatModel = LM.LoadFromModelID("gemma4:e4b",
    downloadingProgress: (_, len, read) =>
    {
        if (len.HasValue) Console.Write($"\r  Downloading: {(double)read / len.Value * 100:F1}%   ");
        return true;
    },
    loadingProgress: p => { Console.Write($"\r  Loading: {p * 100:F0}%   "); return true; });
Console.WriteLine("\n");

// ──────────────────────────────────────
// 3. Transcribe the audio
// ──────────────────────────────────────
string audioPath = "recording.wav";
if (!File.Exists(audioPath))
{
    Console.WriteLine($"Place a WAV file at '{audioPath}' and run again.");
    return;
}

var stt = new SpeechToText(whisperModel)
{
    EnableVoiceActivityDetection = true,
    SuppressNonSpeechTokens = true,
    SuppressHallucinations = true
};

Console.WriteLine($"Transcribing {audioPath}...");
using var audio = new WaveFile(audioPath);
Console.WriteLine($"  Duration: {audio.Duration:mm\\:ss\\.ff}");

var transcription = stt.Transcribe(audio);
string rawText = transcription.Text;

Console.ForegroundColor = ConsoleColor.DarkGray;
Console.WriteLine($"\n=== Raw Transcription ({transcription.Segments.Count} segments) ===");
Console.WriteLine(rawText);
Console.ResetColor();

// ──────────────────────────────────────
// 4. Correct grammar and spelling
// ──────────────────────────────────────
Console.WriteLine("\nCorrecting grammar...");

var corrector = new TextCorrection(chatModel);
string correctedText = corrector.Correct(rawText);

// ──────────────────────────────────────
// 5. Reformat with LLM post-processing
// ──────────────────────────────────────
Console.WriteLine("\nReformatting transcript...\n");

var formatter = new SingleTurnConversation(chatModel)
{
    SystemPrompt = "You are a transcript editor. Your task is to reformat a speech transcription " +
                   "into a clean, professional document. Follow these rules:\n" +
                   "1. Remove all filler words (um, uh, like, you know, so, basically, I mean).\n" +
                   "2. Remove false starts and self-corrections (keep only the corrected version).\n" +
                   "3. Split the text into logical paragraphs.\n" +
                   "4. Fix any remaining grammar or punctuation issues.\n" +
                   "5. Preserve the original meaning and speaker's intent exactly.\n" +
                   "6. Do not add information that was not in the original.\n" +
                   "7. Output only the reformatted text with no commentary.",
    MaximumCompletionTokens = 4096
};

var reformatted = new StringBuilder();

formatter.AfterTextCompletion += (_, e) =>
{
    if (e.SegmentType == TextSegmentType.UserVisible)
        reformatted.Append(e.Text);
};

formatter.Submit($"Reformat this transcript:\n\n{correctedText}");
string cleanText = reformatted.ToString();

Console.ForegroundColor = ConsoleColor.Cyan;
Console.WriteLine("=== Reformatted Document ===");
Console.ResetColor();
Console.WriteLine(cleanText);

// Save results
File.WriteAllText("transcript_raw.txt", rawText);
File.WriteAllText("transcript_corrected.txt", correctedText);
File.WriteAllText("transcript_final.txt", cleanText);
Console.WriteLine("\nSaved: transcript_raw.txt, transcript_corrected.txt, transcript_final.txt");
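
Rules 5 and 6 of the system prompt ask the model to preserve meaning and add nothing, but a prompt is an instruction, not a guarantee. One inexpensive safeguard is to list words that appear in the reformatted text but not in the input. The sketch below uses plain .NET only; `AddedWords` and `Tokenize` are hypothetical helper names, and a word-level comparison is a rough heuristic (legitimate rewording will also show up), so it is meant to flag transcripts for human review rather than to reject them.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

string before = "So, um, the release is planned for March and, you know, QA starts Monday.";
string after  = "The release is planned for March. QA starts Monday. Budget was approved.";

foreach (string word in AddedWords(before, after))
    Console.WriteLine($"not in source: {word}");

// Returns words that occur in the edited text but nowhere in the source text.
// Comparison is case-insensitive and ignores punctuation.
static List<string> AddedWords(string source, string edited)
{
    var known = new HashSet<string>(Tokenize(source), StringComparer.OrdinalIgnoreCase);
    return Tokenize(edited)
        .Where(w => !known.Contains(w))
        .Distinct(StringComparer.OrdinalIgnoreCase)
        .ToList();
}

// Extracts runs of letters, digits, and apostrophes as words.
static IEnumerable<string> Tokenize(string text) =>
    Regex.Matches(text, @"[\p{L}\p{N}']+").Select(m => m.Value);
```

In the pipeline above, the call would be `AddedWords(correctedText, cleanText)`. With the sample strings, the words flagged are "Budget", "was", and "approved", which is the sentence the edited text introduced.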

Step 5: Domain-Specific Reformatting

Customize the reformatting prompt for specific industries:

using System.Text;
using LMKit.Media.Audio;
using LMKit.Model;
using LMKit.Speech;
using LMKit.TextGeneration;

LMKit.Licensing.LicenseManager.SetLicenseKey("");

Console.InputEncoding = Encoding.UTF8;
Console.OutputEncoding = Encoding.UTF8;

// ──────────────────────────────────────
// 1. Load Whisper model for transcription
// ──────────────────────────────────────
Console.WriteLine("Loading Whisper model...");
using LM whisperModel = LM.LoadFromModelID("whisper-large-turbo3",
    downloadingProgress: (_, len, read) =>
    {
        if (len.HasValue) Console.Write($"\r  Downloading: {(double)read / len.Value * 100:F1}%   ");
        return true;
    },
    loadingProgress: p => { Console.Write($"\r  Loading: {p * 100:F0}%   "); return true; });
Console.WriteLine("\n");

// ──────────────────────────────────────
// 2. Load a chat model for post-processing
// ──────────────────────────────────────
Console.WriteLine("Loading chat model...");
using LM chatModel = LM.LoadFromModelID("gemma4:e4b",
    downloadingProgress: (_, len, read) =>
    {
        if (len.HasValue) Console.Write($"\r  Downloading: {(double)read / len.Value * 100:F1}%   ");
        return true;
    },
    loadingProgress: p => { Console.Write($"\r  Loading: {p * 100:F0}%   "); return true; });
Console.WriteLine("\n");

// ──────────────────────────────────────
// 3. Transcribe the audio
// ──────────────────────────────────────
string audioPath = "recording.wav";
if (!File.Exists(audioPath))
{
    Console.WriteLine($"Place a WAV file at '{audioPath}' and run again.");
    return;
}

var stt = new SpeechToText(whisperModel)
{
    EnableVoiceActivityDetection = true,
    SuppressNonSpeechTokens = true,
    SuppressHallucinations = true
};

Console.WriteLine($"Transcribing {audioPath}...");
using var audio = new WaveFile(audioPath);
Console.WriteLine($"  Duration: {audio.Duration:mm\\:ss\\.ff}");

var transcription = stt.Transcribe(audio);

// Medical dictation: structure as clinical note
var medicalFormatter = new SingleTurnConversation(chatModel)
{
    SystemPrompt = "You are a medical transcription editor. Reformat this dictation into a " +
                   "structured clinical note with sections: Chief Complaint, History of Present " +
                   "Illness, Assessment, and Plan. Use standard medical abbreviations. " +
                   "Remove filler words. Preserve all clinical details exactly as stated.",
    MaximumCompletionTokens = 4096
};

// Legal dictation: structure as case summary
var legalFormatter = new SingleTurnConversation(chatModel)
{
    SystemPrompt = "You are a legal transcription editor. Reformat this dictation into a " +
                   "structured case memo. Use formal legal language. Organize by: Facts, " +
                   "Legal Issues, Analysis, and Recommended Action. Remove filler words " +
                   "and verbal hesitations. Preserve all factual details exactly as stated.",
    MaximumCompletionTokens = 4096
};

// Technical meeting: structure as engineering notes
var techFormatter = new SingleTurnConversation(chatModel)
{
    SystemPrompt = "You are a technical documentation editor. Reformat this recording into " +
                   "clean engineering notes. Use bullet points for decisions and action items. " +
                   "Preserve technical terms, version numbers, and specifications exactly. " +
                   "Remove filler words and off-topic tangents.",
    MaximumCompletionTokens = 4096
};
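
The listing above creates three formatters but leaves open how one is chosen for a given recording. A simple option is a lookup keyed by domain with a general-purpose fallback. The sketch below is plain C#; the domain keys, the shortened prompt strings, and the `PromptFor` helper are illustrative assumptions rather than anything LM-Kit.NET defines.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical domain keys mapped to shortened versions of the prompts above.
var prompts = new Dictionary<string, string>(StringComparer.OrdinalIgnoreCase)
{
    ["medical"] = "You are a medical transcription editor. Reformat this dictation into a structured clinical note.",
    ["legal"]   = "You are a legal transcription editor. Reformat this dictation into a structured case memo.",
    ["tech"]    = "You are a technical documentation editor. Reformat this recording into clean engineering notes."
};

const string fallback = "You are a transcript editor. Remove filler words and restructure into clean paragraphs.";

Console.WriteLine(PromptFor("Legal"));    // key lookup is case-insensitive
Console.WriteLine(PromptFor("podcast"));  // unknown domain, general prompt is used

// Looks up the prompt for a domain, falling back to the general editor prompt.
string PromptFor(string domain) =>
    prompts.TryGetValue(domain, out string? prompt) ? prompt : fallback;
```

The returned string would be assigned to `SystemPrompt` when constructing the `SingleTurnConversation`, and the transcript submitted in the same way as in Step 4.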

Step 6: Batch Processing Multiple Audio Files

Process an entire folder of recordings:

using System.Text;
using LMKit.Media.Audio;
using LMKit.Model;
using LMKit.Speech;
using LMKit.TextEnhancement;
using LMKit.TextGeneration;
using LMKit.TextGeneration.Chat;

LMKit.Licensing.LicenseManager.SetLicenseKey("");

Console.InputEncoding = Encoding.UTF8;
Console.OutputEncoding = Encoding.UTF8;

// ──────────────────────────────────────
// 1. Load Whisper model for transcription
// ──────────────────────────────────────
Console.WriteLine("Loading Whisper model...");
using LM whisperModel = LM.LoadFromModelID("whisper-large-turbo3",
    downloadingProgress: (_, len, read) =>
    {
        if (len.HasValue) Console.Write($"\r  Downloading: {(double)read / len.Value * 100:F1}%   ");
        return true;
    },
    loadingProgress: p => { Console.Write($"\r  Loading: {p * 100:F0}%   "); return true; });
Console.WriteLine("\n");

// ──────────────────────────────────────
// 2. Load a chat model for post-processing
// ──────────────────────────────────────
Console.WriteLine("Loading chat model...");
using LM chatModel = LM.LoadFromModelID("gemma4:e4b",
    downloadingProgress: (_, len, read) =>
    {
        if (len.HasValue) Console.Write($"\r  Downloading: {(double)read / len.Value * 100:F1}%   ");
        return true;
    },
    loadingProgress: p => { Console.Write($"\r  Loading: {p * 100:F0}%   "); return true; });
Console.WriteLine("\n");

// ──────────────────────────────────────
// 3. Configure transcription, correction, and reformatting
// ──────────────────────────────────────
var stt = new SpeechToText(whisperModel)
{
    EnableVoiceActivityDetection = true,
    SuppressNonSpeechTokens = true,
    SuppressHallucinations = true
};

var corrector = new TextCorrection(chatModel);

var formatter = new SingleTurnConversation(chatModel)
{
    SystemPrompt = "You are a transcript editor. Remove filler words, fix grammar, " +
                   "and restructure into clean paragraphs. Output only the reformatted text.",
    MaximumCompletionTokens = 4096
};

// Subscribe once, outside the loop, so handlers do not accumulate per file.
// The buffer is cleared before each Submit call.
var output = new StringBuilder();
formatter.AfterTextCompletion += (_, e) =>
{
    if (e.SegmentType == TextSegmentType.UserVisible)
        output.Append(e.Text);
};

// ──────────────────────────────────────
// 4. Process every WAV file in the input folder
// ──────────────────────────────────────
Console.WriteLine("\n=== Batch Transcription and Reformatting ===\n");

string inputDir = "recordings";
string outputDir = "transcripts";

if (!Directory.Exists(inputDir))
{
    Console.WriteLine($"Create a '{inputDir}' folder with WAV files, then run again.");
    return;
}

Directory.CreateDirectory(outputDir);

string[] wavFiles = Directory.GetFiles(inputDir, "*.wav");
Console.WriteLine($"Found {wavFiles.Length} audio file(s)\n");

foreach (string wavPath in wavFiles)
{
    string fileName = Path.GetFileNameWithoutExtension(wavPath);
    Console.Write($"  {Path.GetFileName(wavPath)}: ");

    try
    {
        // Transcribe
        using var wav = new WaveFile(wavPath);
        var result = stt.Transcribe(wav);

        // Correct grammar
        string correctedText = corrector.Correct(result.Text);

        // Reformat (reset the shared buffer so each file starts empty)
        output.Clear();
        formatter.Submit($"Reformat this transcript:\n\n{correctedText}");

        // Save
        string outPath = Path.Combine(outputDir, $"{fileName}.txt");
        File.WriteAllText(outPath, output.ToString());

        Console.ForegroundColor = ConsoleColor.Green;
        Console.WriteLine($"done → {outPath}");
        Console.ResetColor();
    }
    catch (Exception ex)
    {
        Console.ForegroundColor = ConsoleColor.Red;
        Console.WriteLine($"failed: {ex.Message}");
        Console.ResetColor();
    }
}

Console.WriteLine($"\nAll transcripts saved to {Path.GetFullPath(outputDir)}");

Step 7: Streaming the Reformatted Output

For real-time display of the reformatted transcript as the LLM generates it:

using System.Text;
using LMKit.Media.Audio;
using LMKit.Model;
using LMKit.Speech;
using LMKit.TextGeneration;
using LMKit.TextGeneration.Chat;

LMKit.Licensing.LicenseManager.SetLicenseKey("");

Console.InputEncoding = Encoding.UTF8;
Console.OutputEncoding = Encoding.UTF8;

// ──────────────────────────────────────
// 1. Load Whisper model for transcription
// ──────────────────────────────────────
Console.WriteLine("Loading Whisper model...");
using LM whisperModel = LM.LoadFromModelID("whisper-large-turbo3",
    downloadingProgress: (_, len, read) =>
    {
        if (len.HasValue) Console.Write($"\r  Downloading: {(double)read / len.Value * 100:F1}%   ");
        return true;
    },
    loadingProgress: p => { Console.Write($"\r  Loading: {p * 100:F0}%   "); return true; });
Console.WriteLine("\n");

// ──────────────────────────────────────
// 2. Load a chat model for post-processing
// ──────────────────────────────────────
Console.WriteLine("Loading chat model...");
using LM chatModel = LM.LoadFromModelID("gemma4:e4b",
    downloadingProgress: (_, len, read) =>
    {
        if (len.HasValue) Console.Write($"\r  Downloading: {(double)read / len.Value * 100:F1}%   ");
        return true;
    },
    loadingProgress: p => { Console.Write($"\r  Loading: {p * 100:F0}%   "); return true; });
Console.WriteLine("\n");

// ──────────────────────────────────────
// 3. Transcribe the audio
// ──────────────────────────────────────
string audioPath = "recording.wav";
if (!File.Exists(audioPath))
{
    Console.WriteLine($"Place a WAV file at '{audioPath}' and run again.");
    return;
}

var stt = new SpeechToText(whisperModel)
{
    EnableVoiceActivityDetection = true,
    SuppressNonSpeechTokens = true,
    SuppressHallucinations = true
};

Console.WriteLine($"Transcribing {audioPath}...");
using var audio = new WaveFile(audioPath);
Console.WriteLine($"  Duration: {audio.Duration:mm\\:ss\\.ff}");

var transcription = stt.Transcribe(audio);
string rawText = transcription.Text;

Console.WriteLine("\n=== Streaming Reformatted Output ===\n");

var streamFormatter = new SingleTurnConversation(chatModel)
{
    SystemPrompt = "You are a transcript editor. Remove filler words, fix grammar, " +
                   "and restructure into clean paragraphs. Output only the reformatted text.",
    MaximumCompletionTokens = 4096
};

streamFormatter.AfterTextCompletion += (_, e) =>
{
    if (e.SegmentType == TextSegmentType.UserVisible)
        Console.Write(e.Text);
};

Console.ForegroundColor = ConsoleColor.Cyan;
Console.Write("Reformatted: ");
Console.ResetColor();

streamFormatter.Submit($"Reformat this transcript:\n\n{rawText}");
Console.WriteLine("\n");

Model Selection

Whisper Models (Transcription)

Model ID	VRAM	Speed	Best For
`whisper-large-turbo3`	~870 MB	Moderate	Best accuracy (recommended)
`whisper-small`	~260 MB	Fast	Quick processing, good enough quality
`whisper-base`	~80 MB	Very fast	Real-time previews

Chat Models (Post-Processing)

Model ID	VRAM	Quality	Best For
`gemma4:e4b`	~3.5 GB	Good	Fast reformatting, batch processing
`qwen3.5:9b`	~7 GB	Very good	Technical or domain-specific content
`gemma4:e4b`	~8 GB	Excellent	Complex restructuring, formal documents

Common Issues

Problem	Cause	Fix
Reformatted text changes meaning	System prompt too aggressive	Add "Preserve the original meaning exactly" to the prompt
Filler words still present	Grammar correction alone does not remove fillers	Use the LLM reformatting step (Step 4) in addition to TextCorrection
Output truncated	`MaximumCompletionTokens` too low	Raise it above the 4096 used in this tutorial, or split long transcripts into chunks
Slow on long recordings	Entire transcript sent to LLM at once	Process in segments; send 5-minute chunks to the formatter
Domain terms misspelled after correction	Corrector does not know domain vocabulary	Use `stt.Prompt` with domain terms before transcription
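
Two of the fixes above rely on splitting a long transcript before it reaches the formatter. The sketch below shows one way to do that in plain C#: it breaks the text at sentence boundaries and packs sentences into chunks under a character budget. `SplitIntoChunks` and `maxChars` are hypothetical names, and the character budget is only a stand-in for a token or duration limit, which would need to be tuned for the chosen chat model.

```csharp
using System;
using System.Collections.Generic;
using System.Text;
using System.Text.RegularExpressions;

string transcript = "First point. Second point is longer! Third? Fourth sentence ends here.";
foreach (string chunk in SplitIntoChunks(transcript, maxChars: 40))
    Console.WriteLine($"[{chunk.Length}] {chunk}");

// Splits text at sentence boundaries so that each chunk stays within maxChars.
// A single sentence longer than maxChars is emitted as its own oversized chunk.
static List<string> SplitIntoChunks(string text, int maxChars)
{
    // Split after '.', '!' or '?' followed by whitespace; punctuation stays attached.
    string[] sentences = Regex.Split(text.Trim(), @"(?<=[.!?])\s+");
    var chunks = new List<string>();
    var current = new StringBuilder();

    foreach (string sentence in sentences)
    {
        if (sentence.Length == 0) continue;

        // Flush the current chunk if adding this sentence would exceed the budget.
        if (current.Length > 0 && current.Length + 1 + sentence.Length > maxChars)
        {
            chunks.Add(current.ToString());
            current.Clear();
        }

        if (current.Length > 0) current.Append(' ');
        current.Append(sentence);
    }

    if (current.Length > 0) chunks.Add(current.ToString());
    return chunks;
}
```

Each chunk would then be submitted to the formatter in turn and the results concatenated. Because the model sees each chunk in isolation, paragraph breaks near chunk edges may need a light manual check.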

Next Steps

Transcribe Audio with Local Speech-to-Text: foundational transcription without post-processing.
Generate Structured Meeting Notes from Audio Recordings: format transcripts as structured meeting notes.
Extract Action Items and Tasks from Meeting Recordings: pull tasks and deadlines from recordings.
Correct Grammar and Spelling: standalone grammar correction guide.
