Build a Document Compliance Validation System with Checklists

Regulatory submissions, quality certifications, and internal audits require verifying that documents contain all mandatory sections, clauses, and data fields. Missing a required disclosure in an FDA filing or omitting a liability clause from an insurance policy can result in rejection, fines, or legal exposure. LM-Kit.NET's TextExtraction with boolean and string fields enables checklist-style validation: define what must be present, extract whether each item exists, and generate a compliance scorecard. This tutorial builds a document compliance validator for regulatory and business documents.


Why Local Compliance Validation Matters

Two enterprise problems that on-device compliance validation solves:

  1. Regulatory submission pre-screening. Pharmaceutical companies submitting documents to the FDA, EMA, or other agencies must verify completeness before submission. A missing section can mean rejection and months of delay. Automated pre-screening catches gaps before human reviewers spend time on incomplete documents.
  2. Insurance policy audit. Insurers must verify that policies contain all state-mandated disclosures, exclusion clauses, and coverage definitions. Manual review of thousands of policies is slow and error-prone. A checklist validator flags policies that lack required items before they are issued.

Prerequisites

Requirement   Minimum
.NET SDK      8.0+
VRAM          4+ GB (~6 GB for qwen3:8b, the model used below)
Disk          ~3 GB free for model download (more for larger models)

Step 1: Create the Project

dotnet new console -n ComplianceValidator
cd ComplianceValidator
dotnet add package LM-Kit.NET

Step 2: Understand the Validation Approach

  Document          ┌────────────────────┐     ┌──────────────────────┐
  to validate ───►  │ TextExtraction     │ ──► │ Compliance           │
                    │ with checklist     │     │ Scorecard            │
                    │ schema             │     │                      │
                    │                    │     │ ✓ Section A: found  │
                    │ Bool fields:       │     │ ✗ Section B: MISSING│
                    │ "Does it contain?" │     │ ✓ Section C: found  │
                    │                    │     │ Score: 67%           │
                    └────────────────────┘     └──────────────────────┘
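The score in the scorecard is the share of checklist items answered true. A minimal plain C# sketch of that arithmetic, independent of LM-Kit, assuming each Bool field comes back as true, false, or null and that only an explicit true counts as found (the section_* keys are hypothetical stand-ins for the diagram's sections):

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical answers for the three sections in the diagram.
// null stands for "the model was unsure" (for example with NullOnDoubt = true).
var answers = new Dictionary<string, bool?>
{
    ["section_a"] = true,
    ["section_b"] = false,
    ["section_c"] = true,
};

// Only an explicit true counts as found; false and null both count as missing.
int found = answers.Values.Count(v => v == true);
double scorePct = (double)found / answers.Count * 100;

Console.WriteLine($"Score: {scorePct:F0}%"); // Score: 67%
```

With two of three sections found, the result matches the 67% shown in the diagram.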

The key insight: use TextExtractionElement.ElementType.Bool fields to check for presence/absence of required items, combined with String fields to extract the actual content when present.
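Because each Bool flag has a companion String field, the two answers can be cross-checked. The sketch below is plain C#, not an LM-Kit feature: the flag-to-field mapping mirrors the employment checklist in Step 3, and the flags and values dictionaries hold hypothetical answers rather than real model output.

```csharp
using System;
using System.Collections.Generic;

// Presence flag -> content field, taken from the employment checklist in Step 3.
var pairs = new[]
{
    ("has_job_title", "job_title"),
    ("has_compensation", "compensation_amount"),
    ("has_start_date", "start_date"),
    ("has_governing_law", "governing_law"),
};

// Hypothetical extraction output: Bool answers and extracted strings.
var flags = new Dictionary<string, bool?>
{
    ["has_job_title"] = true,
    ["has_compensation"] = true,
    ["has_start_date"] = false,
    ["has_governing_law"] = true,
};
var values = new Dictionary<string, string?>
{
    ["job_title"] = "Senior Software Engineer",
    ["compensation_amount"] = null,     // flagged present, but nothing extracted
    ["start_date"] = "April 1, 2025",   // flagged absent, but a value came back
    ["governing_law"] = "State of California",
};

// Collect pairs where the Bool answer and the String extraction disagree.
var conflicts = new List<string>();
foreach (var (flag, field) in pairs)
{
    bool present = flags.TryGetValue(flag, out bool? f) && f == true;
    bool hasValue = values.TryGetValue(field, out string? v) && !string.IsNullOrWhiteSpace(v);
    if (present != hasValue)
        conflicts.Add($"{flag}={present} but {field} is {(hasValue ? "filled" : "empty")}");
}

conflicts.ForEach(Console.WriteLine);
```

A disagreement does not reveal which answer is wrong, so routing such items to human review, rather than scoring them automatically, is the conservative choice.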


Step 3: The Complete Compliance Validator

using System.Text;
using System.Text.Json;
using LMKit.Data;
using LMKit.Extraction;
using LMKit.Model;

LMKit.Licensing.LicenseManager.SetLicenseKey("");

Console.InputEncoding = Encoding.UTF8;
Console.OutputEncoding = Encoding.UTF8;

// ──────────────────────────────────────
// 1. Load model
// ──────────────────────────────────────
Console.WriteLine("Loading model...");
using LM model = LM.LoadFromModelID("qwen3:8b",
    downloadingProgress: (_, len, read) =>
    {
        if (len.HasValue) Console.Write($"\r  Downloading: {(double)read / len.Value * 100:F1}%   ");
        return true;
    },
    loadingProgress: p => { Console.Write($"\r  Loading: {p * 100:F0}%   "); return true; });
Console.WriteLine("\n");

// ──────────────────────────────────────
// 2. Define compliance checklists
// ──────────────────────────────────────

// Checklist for an employment contract
var employmentChecklist = new List<TextExtractionElement>
{
    // Presence checks (boolean)
    new("has_job_title", TextExtractionElement.ElementType.Bool,
        "Whether the document specifies a job title or position", isRequired: true),
    new("has_compensation", TextExtractionElement.ElementType.Bool,
        "Whether the document specifies salary, wages, or compensation amount", isRequired: true),
    new("has_start_date", TextExtractionElement.ElementType.Bool,
        "Whether the document specifies an employment start date", isRequired: true),
    new("has_termination_clause", TextExtractionElement.ElementType.Bool,
        "Whether the document contains termination conditions or notice period", isRequired: true),
    new("has_confidentiality_clause", TextExtractionElement.ElementType.Bool,
        "Whether the document contains a confidentiality or non-disclosure clause"),
    new("has_non_compete_clause", TextExtractionElement.ElementType.Bool,
        "Whether the document contains a non-compete or non-solicitation clause"),
    new("has_benefits_section", TextExtractionElement.ElementType.Bool,
        "Whether the document describes employee benefits (health, PTO, retirement)"),
    new("has_governing_law", TextExtractionElement.ElementType.Bool,
        "Whether the document specifies governing law or jurisdiction"),
    new("has_dispute_resolution", TextExtractionElement.ElementType.Bool,
        "Whether the document specifies dispute resolution (arbitration, mediation, litigation)"),
    new("has_signature_blocks", TextExtractionElement.ElementType.Bool,
        "Whether the document has signature lines for both employer and employee"),

    // Content extraction for found items
    new("job_title", TextExtractionElement.ElementType.String, "The job title or position"),
    new("compensation_amount", TextExtractionElement.ElementType.String, "Salary or compensation details"),
    new("start_date", TextExtractionElement.ElementType.String, "Employment start date"),
    new("notice_period", TextExtractionElement.ElementType.String, "Termination notice period"),
    new("governing_law", TextExtractionElement.ElementType.String, "Governing law jurisdiction")
};

// ──────────────────────────────────────
// 3. Validate a sample document
// ──────────────────────────────────────
string sampleContract =
    "EMPLOYMENT AGREEMENT\n\n" +
    "This Employment Agreement (\"Agreement\") is entered into effective April 1, 2025, " +
    "between TechVentures Inc. (\"Employer\") and Alex Morgan (\"Employee\").\n\n" +
    "POSITION: Employee shall serve as Senior Software Engineer, reporting to the VP of Engineering.\n\n" +
    "COMPENSATION: Employee shall receive an annual base salary of $145,000, payable bi-weekly. " +
    "In addition, Employee shall be eligible for an annual performance bonus of up to 15% of base salary.\n\n" +
    "BENEFITS: Employee shall be entitled to participate in the Employer's health insurance plan, " +
    "401(k) retirement plan with 4% employer match, and 20 days of paid time off per year.\n\n" +
    "CONFIDENTIALITY: Employee agrees to hold in confidence all proprietary information, trade secrets, " +
    "and confidential business information disclosed during employment. This obligation survives " +
    "termination for a period of two (2) years.\n\n" +
    "TERMINATION: Either party may terminate this Agreement with thirty (30) days written notice. " +
    "Employer may terminate immediately for cause, including material breach or misconduct.\n\n" +
    "This Agreement shall be governed by the laws of the State of California.";

var extractor = new TextExtraction(model)
{
    NullOnDoubt = true,   // return null instead of guessing when the model is unsure
    Elements = employmentChecklist,
    Guidance = "Evaluate whether each required section or clause is present in this employment contract. " +
               "Answer true only if the clause is explicitly stated, not merely implied."
};

extractor.SetContent(sampleContract);

Console.WriteLine("=== Employment Contract Compliance Check ===\n");

TextExtractionResult result = extractor.Parse();

// ──────────────────────────────────────
// 4. Generate compliance scorecard
// ──────────────────────────────────────
// Scorecard items: keep the required flags in sync with isRequired in employmentChecklist
var checkItems = new (string field, string label, bool required)[]
{
    ("has_job_title", "Job Title / Position", true),
    ("has_compensation", "Compensation Details", true),
    ("has_start_date", "Start Date", true),
    ("has_termination_clause", "Termination Clause", true),
    ("has_confidentiality_clause", "Confidentiality Clause", false),
    ("has_non_compete_clause", "Non-Compete Clause", false),
    ("has_benefits_section", "Benefits Section", false),
    ("has_governing_law", "Governing Law", false),
    ("has_dispute_resolution", "Dispute Resolution", false),
    ("has_signature_blocks", "Signature Blocks", false),
};

int totalChecks = checkItems.Length;
int passedChecks = 0;
int requiredPassed = 0;
int requiredTotal = 0;
var failures = new List<string>();

Console.WriteLine("  Checklist Results:\n");

foreach (var (field, label, required) in checkItems)
{
    bool? found = result.GetValue<bool?>(field);
    bool passed = found == true;   // null (uncertain) counts as not found

    if (required) requiredTotal++;
    if (passed)
    {
        passedChecks++;
        if (required) requiredPassed++;
    }

    string status = passed ? "PASS" : (required ? "FAIL" : "MISSING");
    ConsoleColor color = passed ? ConsoleColor.Green : (required ? ConsoleColor.Red : ConsoleColor.Yellow);

    Console.ForegroundColor = color;
    string marker = passed ? "✓" : (required ? "✗" : "○");
    Console.Write($"    {marker} ");
    Console.ResetColor();
    Console.Write($"{label,-30}");
    Console.ForegroundColor = color;
    Console.Write($"[{status}]");
    Console.ResetColor();

    if (required && !passed)
        Console.Write("  ← REQUIRED");

    Console.WriteLine();

    if (!passed)
        failures.Add($"{label} ({(required ? "required" : "optional")})");
}

// Score
double score = (double)passedChecks / totalChecks * 100;
bool allRequiredPassed = requiredPassed == requiredTotal;

Console.WriteLine();
Console.ForegroundColor = allRequiredPassed ? ConsoleColor.Green : ConsoleColor.Red;
Console.WriteLine($"  Score: {passedChecks}/{totalChecks} ({score:F0}%)");
Console.WriteLine($"  Required: {requiredPassed}/{requiredTotal}");
Console.WriteLine($"  Status: {(allRequiredPassed ? "COMPLIANT" : "NON-COMPLIANT")}");
Console.ResetColor();

// Extracted details
Console.WriteLine("\n  Extracted Details:");
Console.ForegroundColor = ConsoleColor.DarkGray;
Console.WriteLine($"    Title:        {result.GetValue<string>("job_title") ?? "N/A"}");
Console.WriteLine($"    Compensation: {result.GetValue<string>("compensation_amount") ?? "N/A"}");
Console.WriteLine($"    Start Date:   {result.GetValue<string>("start_date") ?? "N/A"}");
Console.WriteLine($"    Notice:       {result.GetValue<string>("notice_period") ?? "N/A"}");
Console.WriteLine($"    Jurisdiction: {result.GetValue<string>("governing_law") ?? "N/A"}");
Console.ResetColor();

if (failures.Count > 0)
{
    Console.ForegroundColor = ConsoleColor.Red;
    Console.WriteLine("\n  Missing Items:");
    foreach (string f in failures)
        Console.WriteLine($"    - {f}");
    Console.ResetColor();
}
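The scorecard above folds a null answer into FAIL or MISSING. With NullOnDoubt = true, null means the model was unsure rather than that the item is absent, so you may want to report it separately. A minimal sketch of that optional refinement in plain C#; the Classify helper and the UNCERTAIN status are hypothetical additions, not part of the program above:

```csharp
using System;

// Classify one checklist answer, keeping "unsure" (null) separate from
// "absent" (false) so uncertain items can be sent to a human reviewer.
static string Classify(bool? found, bool required) => found switch
{
    true => "PASS",
    false => required ? "FAIL" : "MISSING",
    null => "UNCERTAIN",
};

Console.WriteLine(Classify(null, required: true));   // UNCERTAIN
Console.WriteLine(Classify(false, required: true));  // FAIL
Console.WriteLine(Classify(false, required: false)); // MISSING
Console.WriteLine(Classify(true, required: true));   // PASS
```

Whether an UNCERTAIN required item should block a COMPLIANT status until someone reviews it is a policy decision; treating it as non-compliant is the safer default.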

Step 4: Batch Compliance Validation

Append the following to Program.cs to process a folder of documents with the same extractor and checkItems from Step 3, and write a CSV compliance report:

Console.WriteLine("\n=== Batch Compliance Validation ===\n");

string docsFolder = "contracts";

if (!Directory.Exists(docsFolder))
{
    Console.WriteLine($"Create a '{docsFolder}' folder with documents, then run again.");
    return;
}

string[] files = Directory.GetFiles(docsFolder)
    .Where(f => new[] { ".pdf", ".docx", ".txt" }
        .Contains(Path.GetExtension(f).ToLowerInvariant()))
    .ToArray();

var report = new List<string>();
report.Add("file,score_pct,required_passed,required_total,status,missing_items");

foreach (string filePath in files)
{
    string fileName = Path.GetFileName(filePath);
    Console.Write($"  {fileName}... ");

    extractor.SetContent(new Attachment(filePath));
    TextExtractionResult r = extractor.Parse();

    int passed = 0;
    int reqPassed = 0;
    int reqTotal = 0;
    var missing = new List<string>();

    foreach (var (field, label, required) in checkItems)
    {
        bool found = r.GetValue<bool?>(field) == true;
        if (found) passed++;
        if (required) { reqTotal++; if (found) reqPassed++; }
        if (!found) missing.Add(label);
    }

    double pct = (double)passed / checkItems.Length * 100;
    string status = reqPassed == reqTotal ? "COMPLIANT" : "NON-COMPLIANT";

    Console.ForegroundColor = status == "COMPLIANT" ? ConsoleColor.Green : ConsoleColor.Red;
    Console.WriteLine($"[{status}] {pct:F0}%");
    Console.ResetColor();

    string csvName = fileName.Replace("\"", "\"\""); // double embedded quotes (RFC 4180)
    report.Add($"\"{csvName}\",{pct:F0},{reqPassed},{reqTotal},\"{status}\",\"{string.Join("; ", missing)}\"");
}

File.WriteAllLines("compliance_report.csv", report);
Console.WriteLine($"\nReport saved to compliance_report.csv");
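Under RFC 4180, fields containing commas, double quotes, or line breaks must be wrapped in double quotes, and any embedded quote must be doubled. If you extend the report with free-text columns such as the extracted notice period, a small helper keeps the escaping in one place. A sketch in plain C#; CsvField and CsvRow are hypothetical helper names, and unlike the report line above they quote a field only when it needs it:

```csharp
using System;
using System.Linq;

// Quote a CSV field per RFC 4180: wrap it in double quotes when it contains
// a comma, a double quote, or a line break, and double any embedded quotes.
static string CsvField(string value)
{
    bool needsQuotes = value.IndexOfAny(new[] { ',', '"', '\r', '\n' }) >= 0;
    return needsQuotes ? "\"" + value.Replace("\"", "\"\"") + "\"" : value;
}

// Join raw values into one CSV row, escaping each field.
static string CsvRow(params string[] fields) => string.Join(",", fields.Select(CsvField));

// Hypothetical file name containing both a comma and a quote.
string row = CsvRow("Smith, \"final\".pdf", "67", "3", "4", "NON-COMPLIANT", "Start Date; Non-Compete Clause");
Console.WriteLine(row);
// → "Smith, ""final"".pdf",67,3,4,NON-COMPLIANT,Start Date; Non-Compete Clause
```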

Step 5: Custom Checklists for Different Document Types

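Define checklists for different regulatory contexts. To use one, pass it as Elements when creating a TextExtraction, as in Step 3, and build a matching checkItems array for the scorecard: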

// Safety Data Sheet (SDS) checklist - GHS compliant
var sdsChecklist = new List<TextExtractionElement>
{
    new("has_product_identifier", TextExtractionElement.ElementType.Bool,
        "Section 1: Product identifier and company information", isRequired: true),
    new("has_hazard_identification", TextExtractionElement.ElementType.Bool,
        "Section 2: Hazard identification and GHS classification", isRequired: true),
    new("has_composition", TextExtractionElement.ElementType.Bool,
        "Section 3: Composition and ingredient information", isRequired: true),
    new("has_first_aid", TextExtractionElement.ElementType.Bool,
        "Section 4: First-aid measures", isRequired: true),
    new("has_fire_fighting", TextExtractionElement.ElementType.Bool,
        "Section 5: Fire-fighting measures", isRequired: true),
    new("has_accidental_release", TextExtractionElement.ElementType.Bool,
        "Section 6: Accidental release measures", isRequired: true),
    new("has_handling_storage", TextExtractionElement.ElementType.Bool,
        "Section 7: Handling and storage", isRequired: true),
    new("has_exposure_controls", TextExtractionElement.ElementType.Bool,
        "Section 8: Exposure controls and personal protection", isRequired: true),
    new("has_physical_chemical", TextExtractionElement.ElementType.Bool,
        "Section 9: Physical and chemical properties", isRequired: true),
    new("has_stability_reactivity", TextExtractionElement.ElementType.Bool,
        "Section 10: Stability and reactivity", isRequired: true),
    new("has_toxicological", TextExtractionElement.ElementType.Bool,
        "Section 11: Toxicological information", isRequired: true),
    new("has_ecological", TextExtractionElement.ElementType.Bool,
        "Section 12: Ecological information", isRequired: true),
    new("has_disposal", TextExtractionElement.ElementType.Bool,
        "Section 13: Disposal considerations", isRequired: true),
    new("has_transport", TextExtractionElement.ElementType.Bool,
        "Section 14: Transport information", isRequired: true),
    new("has_regulatory", TextExtractionElement.ElementType.Bool,
        "Section 15: Regulatory information", isRequired: true),
    new("has_other_information", TextExtractionElement.ElementType.Bool,
        "Section 16: Other information including revision date", isRequired: true),
};

// Privacy Policy checklist - GDPR compliant
var privacyChecklist = new List<TextExtractionElement>
{
    new("has_data_controller", TextExtractionElement.ElementType.Bool,
        "Identity and contact details of the data controller", isRequired: true),
    new("has_purposes", TextExtractionElement.ElementType.Bool,
        "Purposes of data processing and legal basis", isRequired: true),
    new("has_data_categories", TextExtractionElement.ElementType.Bool,
        "Categories of personal data collected", isRequired: true),
    new("has_retention_period", TextExtractionElement.ElementType.Bool,
        "Data retention period or criteria for determining it", isRequired: true),
    new("has_data_rights", TextExtractionElement.ElementType.Bool,
        "Data subject rights (access, rectification, erasure, restriction, objection, portability)", isRequired: true),
    new("has_consent_withdrawal", TextExtractionElement.ElementType.Bool,
        "Right to withdraw consent at any time"),
    new("has_third_party_sharing", TextExtractionElement.ElementType.Bool,
        "Information about third-party data sharing or transfers"),
    new("has_dpo_contact", TextExtractionElement.ElementType.Bool,
        "Contact details for the Data Protection Officer"),
    new("has_complaint_right", TextExtractionElement.ElementType.Bool,
        "Right to lodge a complaint with a supervisory authority", isRequired: true),
};
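Each checklist above also needs a scorecard array like checkItems in Step 3, with labels and required flags. One way to keep them organized is a registry keyed by document type, sketched here in plain C# with tuples. The checklists registry, the "privacy_policy" key, the labels, and the answers are all hypothetical, and only a subset of the privacy items is listed:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical registry: document type -> scorecard items (field, label, required).
// Field names match Bool elements in privacyChecklist; only a subset is listed.
var checklists = new Dictionary<string, (string Field, string Label, bool Required)[]>
{
    ["privacy_policy"] = new[]
    {
        ("has_data_controller", "Data Controller", true),
        ("has_purposes", "Purposes and Legal Basis", true),
        ("has_data_categories", "Data Categories", true),
        ("has_retention_period", "Retention Period", true),
        ("has_data_rights", "Data Subject Rights", true),
        ("has_dpo_contact", "DPO Contact", false),
    },
};

// A document is compliant when every required item was answered true.
static bool IsCompliant((string Field, string Label, bool Required)[] items,
                        IReadOnlyDictionary<string, bool?> results) =>
    items.Where(i => i.Required)
         .All(i => results.TryGetValue(i.Field, out bool? v) && v == true);

// Hypothetical answers: retention period uncertain, DPO contact (optional) absent.
var answers = new Dictionary<string, bool?>
{
    ["has_data_controller"] = true,
    ["has_purposes"] = true,
    ["has_data_categories"] = true,
    ["has_retention_period"] = null,
    ["has_data_rights"] = true,
    ["has_dpo_contact"] = false,
};

Console.WriteLine(IsCompliant(checklists["privacy_policy"], answers)); // False
```

Adding a new document type then means adding one registry entry next to its TextExtractionElement list.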

Model Selection

Model ID     VRAM      Accuracy    Best For
gemma3:4b    ~3.5 GB   Good        Simple checklists, high throughput
qwen3:8b     ~6 GB     Very good   Complex regulatory documents (recommended)
gemma3:12b   ~8 GB     Excellent   Dense legal and compliance text
qwen3:14b    ~10 GB    Excellent   Critical regulatory submissions

For compliance validation, accuracy matters more than speed. Use qwen3:8b or larger to minimize false positives and false negatives on mandatory checks.


Common Issues

False positives (item marked present but absent)
  Cause: Description too vague.
  Fix: Make descriptions specific, e.g. "explicit termination notice period in days".

False negatives (item marked absent but present)
  Cause: Content uses different terminology.
  Fix: Add synonyms to the description; add Guidance with domain vocabulary.

Low confidence on boolean fields
  Cause: Document is ambiguous.
  Fix: Keep NullOnDoubt = true and send null answers to a reviewer as "uncertain", or set NullOnDoubt = false to force a true/false decision.

Slow on large PDF compliance docs
  Cause: The entire document is processed.
  Fix: Use SetContent(attachment, pageRange: "1-5") for targeted validation, but only when the required sections are known to fall within that range; items outside it cannot be found.

Next Steps