Build a Document Compliance Validation System with Checklists
Regulatory submissions, quality certifications, and internal audits require verifying that documents contain all mandatory sections, clauses, and data fields. Missing a required disclosure in an FDA filing or omitting a liability clause from an insurance policy can result in rejection, fines, or legal exposure. LM-Kit.NET's TextExtraction with boolean and string fields enables checklist-style validation: define what must be present, extract whether each item exists, and generate a compliance scorecard. This tutorial builds a document compliance validator for regulatory and business documents.
Why Local Compliance Validation Matters
Two enterprise problems that on-device compliance validation solves:
- Regulatory submission pre-screening. Pharmaceutical companies submitting documents to the FDA, EMA, or other agencies must verify completeness before submission. A missed section means rejection and months of delay. Automated pre-screening catches gaps before human reviewers spend time on incomplete documents.
- Insurance policy audit. Insurers must verify that policies contain all state-mandated disclosures, exclusion clauses, and coverage definitions. Manual review of thousands of policies is slow and error-prone. A checklist validator ensures every policy meets requirements before issuance.
Prerequisites
| Requirement | Minimum |
|---|---|
| .NET SDK | 8.0+ |
| VRAM | 4+ GB (about 6 GB for the qwen3:8b model loaded in Step 3) |
| Disk | ~3 GB free for model download |
Step 1: Create the Project
```bash
dotnet new console -n ComplianceValidator
cd ComplianceValidator
dotnet add package LM-Kit.NET
```
Step 2: Understand the Validation Approach
```text
Document           ┌────────────────────┐     ┌───────────────────────┐
to validate  ───►  │ TextExtraction     │ ──► │ Compliance            │
                   │ with checklist     │     │ Scorecard             │
                   │ schema             │     │                       │
                   │                    │     │ ✓ Section A: found    │
                   │ Bool fields:       │     │ ✗ Section B: MISSING  │
                   │ "Does it contain?" │     │ ✓ Section C: found    │
                   │                    │     │ Score: 67%            │
                   └────────────────────┘     └───────────────────────┘
```
The key insight: use TextExtractionElement.ElementType.Bool fields to check for presence/absence of required items, combined with String fields to extract the actual content when present.
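One practical consequence of pairing each Bool field with a String field is that the two answers can be checked against each other. The sketch below is an illustration only: `CrossCheck` is a hypothetical helper, not an LM-Kit.NET API, and its `bool?` and `string?` inputs stand in for values read from an extraction result.

```csharp
using System;

// Hypothetical helper (not part of LM-Kit.NET): compares a presence flag with
// the content extracted for the same item and reports whether they agree.
static string CrossCheck(bool? present, string? content)
{
    bool hasContent = !string.IsNullOrWhiteSpace(content);

    if (present == true && hasContent) return "consistent: present";
    if (present == true) return "review: flagged present but no content extracted";
    if (hasContent) return "review: content extracted but not flagged present";
    return present == false ? "consistent: absent" : "uncertain";
}

// Illustrative values only; in the tutorial they would come from the extraction result.
Console.WriteLine(CrossCheck(true, "Senior Software Engineer")); // consistent: present
Console.WriteLine(CrossCheck(true, null));                       // review: flagged present but no content extracted
Console.WriteLine(CrossCheck(null, "30 days"));                  // review: content extracted but not flagged present
Console.WriteLine(CrossCheck(false, ""));                        // consistent: absent
Console.WriteLine(CrossCheck(null, null));                       // uncertain
```

Disagreement between the two fields does not prove an error, but it is a cheap signal for routing a document to a human reviewer.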
Step 3: The Complete Compliance Validator
```csharp
using System.Text;
using System.Text.Json;
using LMKit.Data;
using LMKit.Extraction;
using LMKit.Model;

LMKit.Licensing.LicenseManager.SetLicenseKey("");

Console.InputEncoding = Encoding.UTF8;
Console.OutputEncoding = Encoding.UTF8;

// ──────────────────────────────────────
// 1. Load model
// ──────────────────────────────────────
Console.WriteLine("Loading model...");
using LM model = LM.LoadFromModelID("qwen3:8b",
    downloadingProgress: (_, len, read) =>
    {
        if (len.HasValue) Console.Write($"\r Downloading: {(double)read / len.Value * 100:F1}% ");
        return true;
    },
    loadingProgress: p => { Console.Write($"\r Loading: {p * 100:F0}% "); return true; });
Console.WriteLine("\n");

// ──────────────────────────────────────
// 2. Define compliance checklists
// ──────────────────────────────────────

// Checklist for an employment contract
var employmentChecklist = new List<TextExtractionElement>
{
    // Presence checks (boolean)
    new("has_job_title", TextExtractionElement.ElementType.Bool,
        "Whether the document specifies a job title or position", isRequired: true),
    new("has_compensation", TextExtractionElement.ElementType.Bool,
        "Whether the document specifies salary, wages, or compensation amount", isRequired: true),
    new("has_start_date", TextExtractionElement.ElementType.Bool,
        "Whether the document specifies an employment start date", isRequired: true),
    new("has_termination_clause", TextExtractionElement.ElementType.Bool,
        "Whether the document contains termination conditions or notice period", isRequired: true),
    new("has_confidentiality_clause", TextExtractionElement.ElementType.Bool,
        "Whether the document contains a confidentiality or non-disclosure clause"),
    new("has_non_compete_clause", TextExtractionElement.ElementType.Bool,
        "Whether the document contains a non-compete or non-solicitation clause"),
    new("has_benefits_section", TextExtractionElement.ElementType.Bool,
        "Whether the document describes employee benefits (health, PTO, retirement)"),
    new("has_governing_law", TextExtractionElement.ElementType.Bool,
        "Whether the document specifies governing law or jurisdiction"),
    new("has_dispute_resolution", TextExtractionElement.ElementType.Bool,
        "Whether the document specifies dispute resolution (arbitration, mediation, litigation)"),
    new("has_signature_blocks", TextExtractionElement.ElementType.Bool,
        "Whether the document has signature lines for both employer and employee"),

    // Content extraction for found items
    new("job_title", TextExtractionElement.ElementType.String, "The job title or position"),
    new("compensation_amount", TextExtractionElement.ElementType.String, "Salary or compensation details"),
    new("start_date", TextExtractionElement.ElementType.String, "Employment start date"),
    new("notice_period", TextExtractionElement.ElementType.String, "Termination notice period"),
    new("governing_law", TextExtractionElement.ElementType.String, "Governing law jurisdiction")
};

// ──────────────────────────────────────
// 3. Validate a sample document
// ──────────────────────────────────────
string sampleContract =
    "EMPLOYMENT AGREEMENT\n\n" +
    "This Employment Agreement (\"Agreement\") is entered into effective April 1, 2025, " +
    "between TechVentures Inc. (\"Employer\") and Alex Morgan (\"Employee\").\n\n" +
    "POSITION: Employee shall serve as Senior Software Engineer, reporting to the VP of Engineering.\n\n" +
    "COMPENSATION: Employee shall receive an annual base salary of $145,000, payable bi-weekly. " +
    "In addition, Employee shall be eligible for an annual performance bonus of up to 15% of base salary.\n\n" +
    "BENEFITS: Employee shall be entitled to participate in the Employer's health insurance plan, " +
    "401(k) retirement plan with 4% employer match, and 20 days of paid time off per year.\n\n" +
    "CONFIDENTIALITY: Employee agrees to hold in confidence all proprietary information, trade secrets, " +
    "and confidential business information disclosed during employment. This obligation survives " +
    "termination for a period of two (2) years.\n\n" +
    "TERMINATION: Either party may terminate this Agreement with thirty (30) days written notice. " +
    "Employer may terminate immediately for cause, including material breach or misconduct.\n\n" +
    "This Agreement shall be governed by the laws of the State of California.";

var extractor = new TextExtraction(model)
{
    NullOnDoubt = true,
    Elements = employmentChecklist,
    Guidance = "Evaluate whether each required section or clause is present in this employment contract. " +
               "Answer true only if the clause is explicitly stated, not merely implied."
};

extractor.SetContent(sampleContract);

Console.WriteLine("=== Employment Contract Compliance Check ===\n");
TextExtractionResult result = extractor.Parse();

// ──────────────────────────────────────
// 4. Generate compliance scorecard
// ──────────────────────────────────────
var checkItems = new (string field, string label, bool required)[]
{
    ("has_job_title", "Job Title / Position", true),
    ("has_compensation", "Compensation Details", true),
    ("has_start_date", "Start Date", true),
    ("has_termination_clause", "Termination Clause", true),
    ("has_confidentiality_clause", "Confidentiality Clause", false),
    ("has_non_compete_clause", "Non-Compete Clause", false),
    ("has_benefits_section", "Benefits Section", false),
    ("has_governing_law", "Governing Law", false),
    ("has_dispute_resolution", "Dispute Resolution", false),
    ("has_signature_blocks", "Signature Blocks", false),
};

int totalChecks = checkItems.Length;
int passedChecks = 0;
int requiredPassed = 0;
int requiredTotal = 0;
var failures = new List<string>();

Console.WriteLine(" Checklist Results:\n");

foreach (var (field, label, required) in checkItems)
{
    bool? found = result.GetValue<bool?>(field);
    bool passed = found == true;

    if (required) requiredTotal++;
    if (passed)
    {
        passedChecks++;
        if (required) requiredPassed++;
    }

    string status = passed ? "PASS" : (required ? "FAIL" : "MISSING");
    ConsoleColor color = passed ? ConsoleColor.Green : (required ? ConsoleColor.Red : ConsoleColor.Yellow);

    Console.ForegroundColor = color;
    string marker = passed ? "✓" : (required ? "✗" : "○");
    Console.Write($" {marker} ");
    Console.ResetColor();
    Console.Write($"{label,-30}");
    Console.ForegroundColor = color;
    Console.Write($"[{status}]");
    Console.ResetColor();

    if (required && !passed)
        Console.Write(" ← REQUIRED");
    Console.WriteLine();

    if (!passed)
        failures.Add($"{label} ({(required ? "required" : "optional")})");
}

// Score
double score = (double)passedChecks / totalChecks * 100;
bool allRequiredPassed = requiredPassed == requiredTotal;

Console.WriteLine();
Console.ForegroundColor = allRequiredPassed ? ConsoleColor.Green : ConsoleColor.Red;
Console.WriteLine($" Score: {passedChecks}/{totalChecks} ({score:F0}%)");
Console.WriteLine($" Required: {requiredPassed}/{requiredTotal}");
Console.WriteLine($" Status: {(allRequiredPassed ? "COMPLIANT" : "NON-COMPLIANT")}");
Console.ResetColor();

// Extracted details
Console.WriteLine("\n Extracted Details:");
Console.ForegroundColor = ConsoleColor.DarkGray;
Console.WriteLine($" Title: {result.GetValue<string>("job_title") ?? "N/A"}");
Console.WriteLine($" Compensation: {result.GetValue<string>("compensation_amount") ?? "N/A"}");
Console.WriteLine($" Start Date: {result.GetValue<string>("start_date") ?? "N/A"}");
Console.WriteLine($" Notice: {result.GetValue<string>("notice_period") ?? "N/A"}");
Console.WriteLine($" Jurisdiction: {result.GetValue<string>("governing_law") ?? "N/A"}");
Console.ResetColor();

if (failures.Count > 0)
{
    Console.ForegroundColor = ConsoleColor.Red;
    Console.WriteLine("\n Missing Items:");
    foreach (string f in failures)
        Console.WriteLine($" - {f}");
    Console.ResetColor();
}
```
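The scorecard loop above mixes scoring with console output. For unit testing, the arithmetic can be pulled into a pure function. The following is a minimal sketch under stated assumptions: `Score` is a hypothetical helper, the dictionary of `bool?` flags stands in for values read from the extraction result, and the sample flags are made up for illustration rather than real model output.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper: computes the scorecard numbers from presence flags,
// independent of console output. A null or missing flag counts as "not found",
// matching the `found == true` test in the tutorial loop.
static (int Passed, int Total, int RequiredPassed, int RequiredTotal, bool Compliant) Score(
    IEnumerable<(string Field, bool Required)> items,
    IReadOnlyDictionary<string, bool?> values)
{
    var list = items.ToList();
    bool Found(string field) => values.TryGetValue(field, out bool? v) && v == true;

    int passedCount = list.Count(i => Found(i.Field));
    int reqTotal = list.Count(i => i.Required);
    int reqPassed = list.Count(i => i.Required && Found(i.Field));

    return (passedCount, list.Count, reqPassed, reqTotal, reqPassed == reqTotal);
}

// Made-up flags for illustration, not real model output.
var sampleChecklist = new (string Field, bool Required)[]
{
    ("has_job_title", true),
    ("has_compensation", true),
    ("has_non_compete_clause", false),
};
var sampleFlags = new Dictionary<string, bool?>
{
    ["has_job_title"] = true,
    ["has_compensation"] = null,        // the model declined to answer
    ["has_non_compete_clause"] = false,
};

var s = Score(sampleChecklist, sampleFlags);
Console.WriteLine($"{s.Passed}/{s.Total}, required {s.RequiredPassed}/{s.RequiredTotal}, compliant: {s.Compliant}");
// prints: 1/3, required 1/2, compliant: False
```

Because a document is compliant only when every required item passes, a single unanswered required check is enough to mark it non-compliant, regardless of the percentage score.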
Step 4: Batch Compliance Validation
Process a folder of documents and generate a compliance report:
```csharp
Console.WriteLine("\n=== Batch Compliance Validation ===\n");

string docsFolder = "contracts";
if (!Directory.Exists(docsFolder))
{
    Console.WriteLine($"Create a '{docsFolder}' folder with documents, then run again.");
    return;
}

string[] files = Directory.GetFiles(docsFolder)
    .Where(f => new[] { ".pdf", ".docx", ".txt" }
        .Contains(Path.GetExtension(f).ToLowerInvariant()))
    .ToArray();

var report = new List<string>();
report.Add("file,score_pct,required_passed,required_total,status,missing_items");

foreach (string filePath in files)
{
    string fileName = Path.GetFileName(filePath);
    Console.Write($" {fileName}... ");

    extractor.SetContent(new Attachment(filePath));
    TextExtractionResult r = extractor.Parse();

    int passed = 0;
    int reqPassed = 0;
    int reqTotal = 0;
    var missing = new List<string>();

    foreach (var (field, label, required) in checkItems)
    {
        bool found = r.GetValue<bool?>(field) == true;
        if (found) passed++;
        if (required) { reqTotal++; if (found) reqPassed++; }
        if (!found) missing.Add(label);
    }

    double pct = (double)passed / checkItems.Length * 100;
    string status = reqPassed == reqTotal ? "COMPLIANT" : "NON-COMPLIANT";

    Console.ForegroundColor = status == "COMPLIANT" ? ConsoleColor.Green : ConsoleColor.Red;
    Console.WriteLine($"[{status}] {pct:F0}%");
    Console.ResetColor();

    report.Add($"\"{fileName}\",{pct:F0},{reqPassed},{reqTotal},\"{status}\",\"{string.Join("; ", missing)}\"");
}

File.WriteAllLines("compliance_report.csv", report);
Console.WriteLine($"\nReport saved to compliance_report.csv");
```
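The report rows above wrap text fields in quotes but do not escape a quote character that appears inside a file name or label, which would produce a malformed row. One way to guard against that, sketched here with two hypothetical helpers (`CsvField` and `CsvRow`, neither from LM-Kit.NET), is to double embedded quotes in the RFC 4180 style. Unlike the tutorial code, this sketch quotes every field, including numeric ones.

```csharp
using System;
using System.Linq;

// Hypothetical helper: wraps a field in quotes and doubles any embedded quotes.
static string CsvField(string value) => "\"" + value.Replace("\"", "\"\"") + "\"";

// Hypothetical helper: joins already-stringified fields into one CSV row.
static string CsvRow(params string[] fields) => string.Join(",", fields.Select(CsvField));

// Illustrative row with a quote inside the file name.
Console.WriteLine(CsvRow("offer \"final\".pdf", "70", "NON-COMPLIANT", "Non-Compete Clause; Signature Blocks"));
// prints: "offer ""final"".pdf","70","NON-COMPLIANT","Non-Compete Clause; Signature Blocks"
```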
Step 5: Custom Checklists for Different Document Types
Define checklists for different regulatory contexts:
```csharp
// Safety Data Sheet (SDS) checklist - GHS compliant
var sdsChecklist = new List<TextExtractionElement>
{
    new("has_product_identifier", TextExtractionElement.ElementType.Bool,
        "Section 1: Product identifier and company information", isRequired: true),
    new("has_hazard_identification", TextExtractionElement.ElementType.Bool,
        "Section 2: Hazard identification and GHS classification", isRequired: true),
    new("has_composition", TextExtractionElement.ElementType.Bool,
        "Section 3: Composition and ingredient information", isRequired: true),
    new("has_first_aid", TextExtractionElement.ElementType.Bool,
        "Section 4: First-aid measures", isRequired: true),
    new("has_fire_fighting", TextExtractionElement.ElementType.Bool,
        "Section 5: Fire-fighting measures", isRequired: true),
    new("has_accidental_release", TextExtractionElement.ElementType.Bool,
        "Section 6: Accidental release measures", isRequired: true),
    new("has_handling_storage", TextExtractionElement.ElementType.Bool,
        "Section 7: Handling and storage", isRequired: true),
    new("has_exposure_controls", TextExtractionElement.ElementType.Bool,
        "Section 8: Exposure controls and personal protection", isRequired: true),
    new("has_physical_chemical", TextExtractionElement.ElementType.Bool,
        "Section 9: Physical and chemical properties", isRequired: true),
    new("has_stability_reactivity", TextExtractionElement.ElementType.Bool,
        "Section 10: Stability and reactivity", isRequired: true),
    new("has_toxicological", TextExtractionElement.ElementType.Bool,
        "Section 11: Toxicological information", isRequired: true),
    new("has_ecological", TextExtractionElement.ElementType.Bool,
        "Section 12: Ecological information", isRequired: true),
    new("has_disposal", TextExtractionElement.ElementType.Bool,
        "Section 13: Disposal considerations", isRequired: true),
    new("has_transport", TextExtractionElement.ElementType.Bool,
        "Section 14: Transport information", isRequired: true),
    new("has_regulatory", TextExtractionElement.ElementType.Bool,
        "Section 15: Regulatory information", isRequired: true),
    new("has_other_information", TextExtractionElement.ElementType.Bool,
        "Section 16: Other information including revision date", isRequired: true),
};

// Privacy Policy checklist - GDPR compliant
var privacyChecklist = new List<TextExtractionElement>
{
    new("has_data_controller", TextExtractionElement.ElementType.Bool,
        "Identity and contact details of the data controller", isRequired: true),
    new("has_purposes", TextExtractionElement.ElementType.Bool,
        "Purposes of data processing and legal basis", isRequired: true),
    new("has_data_categories", TextExtractionElement.ElementType.Bool,
        "Categories of personal data collected", isRequired: true),
    new("has_retention_period", TextExtractionElement.ElementType.Bool,
        "Data retention period or criteria for determining it", isRequired: true),
    new("has_data_rights", TextExtractionElement.ElementType.Bool,
        "Data subject rights (access, rectification, erasure, portability)", isRequired: true),
    new("has_consent_withdrawal", TextExtractionElement.ElementType.Bool,
        "Right to withdraw consent at any time"),
    new("has_third_party_sharing", TextExtractionElement.ElementType.Bool,
        "Information about third-party data sharing or transfers"),
    new("has_dpo_contact", TextExtractionElement.ElementType.Bool,
        "Contact details for the Data Protection Officer"),
    new("has_complaint_right", TextExtractionElement.ElementType.Bool,
        "Right to lodge a complaint with a supervisory authority"),
};
```
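Step 3 records the required flag twice, once in the element list and once in the scorecard array, so the two can drift apart as checklists multiply. The sketch below shows one possible way to keep a single definition per check and derive the scorecard rows from it. It is an assumption-laden illustration: the `checklists` dictionary, its document-type keys, the display labels, and `ScorecardRows` are all hypothetical, and only a few entries from the checklists above are reproduced.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical single-source definitions, keyed by document type. Each tuple carries
// what is needed for both the extraction element (field, description, required)
// and the scorecard row (field, label, required).
var checklists = new Dictionary<string, (string Field, string Label, string Description, bool Required)[]>
{
    ["privacy_policy"] = new[]
    {
        ("has_data_controller", "Data Controller", "Identity and contact details of the data controller", true),
        ("has_dpo_contact", "DPO Contact", "Contact details for the Data Protection Officer", false),
    },
    ["sds"] = new[]
    {
        ("has_first_aid", "First-Aid Measures", "Section 4: First-aid measures", true),
    },
};

// Hypothetical helper: projects the definitions onto the (field, label, required)
// shape that the reporting loop iterates over.
static (string Field, string Label, bool Required)[] ScorecardRows(
    (string Field, string Label, string Description, bool Required)[] items) =>
    items.Select(i => (i.Field, i.Label, i.Required)).ToArray();

foreach (var (field, label, required) in ScorecardRows(checklists["privacy_policy"]))
    Console.WriteLine($"{field} | {label} | {(required ? "required" : "optional")}");
// prints:
// has_data_controller | Data Controller | required
// has_dpo_contact | DPO Contact | optional
```

The same tuples could be mapped to extraction elements with the constructor form used throughout this tutorial, passing the field name, `TextExtractionElement.ElementType.Bool`, the description, and `isRequired`.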
Model Selection
| Model ID | VRAM | Accuracy | Best For |
|---|---|---|---|
| gemma3:4b | ~3.5 GB | Good | Simple checklists, high throughput |
| qwen3:8b | ~6 GB | Very good | Complex regulatory documents (recommended) |
| gemma3:12b | ~8 GB | Excellent | Dense legal and compliance text |
| qwen3:14b | ~10 GB | Excellent | Critical regulatory submissions |
For compliance validation, accuracy matters more than speed. Use qwen3:8b or larger to minimize false positives and false negatives on mandatory checks.
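As a rough illustration of applying the table, the sketch below picks the largest listed model that fits a given VRAM budget. `PickModel` is a hypothetical helper, the VRAM figures are the approximate values from the table, and the available-VRAM number is assumed to be supplied by the caller; nothing here queries the GPU or calls LM-Kit.NET.

```csharp
using System;

// Approximate VRAM needs copied from the model table, ordered largest first.
var models = new (string Id, double VramGb)[]
{
    ("qwen3:14b", 10.0),
    ("gemma3:12b", 8.0),
    ("qwen3:8b", 6.0),
    ("gemma3:4b", 3.5),
};

// Hypothetical helper: returns the largest listed model that fits, or null if none does.
string? PickModel(double availableVramGb)
{
    foreach (var (id, vramGb) in models)
        if (vramGb <= availableVramGb) return id;
    return null;
}

Console.WriteLine(PickModel(8.0));           // gemma3:12b
Console.WriteLine(PickModel(4.0));           // gemma3:4b
Console.WriteLine(PickModel(2.0) ?? "none"); // none
```

The returned ID could then be passed to `LM.LoadFromModelID` as in Step 3. Largest-that-fits is only a heuristic; leaving headroom for other GPU workloads may justify a smaller choice.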
Common Issues
| Problem | Cause | Fix |
|---|---|---|
| False positives (item marked present but absent) | Description too vague | Make descriptions specific: "explicit termination notice period in days" |
| False negatives (item marked absent but present) | Content uses different terminology | Add synonyms in description; add Guidance with domain vocabulary |
| Low confidence on boolean fields | Document is ambiguous | Set NullOnDoubt = false to force a true/false decision, or keep NullOnDoubt = true and route null results to manual review as "uncertain" |
| Slow on large PDF compliance docs | Processing entire document | Use SetContent(attachment, pageRange: "1-5") for targeted validation |
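When NullOnDoubt stays enabled, a null flag carries different information from an explicit false, yet the scorecard in Step 3 reports both as a failure. The sketch below separates the two cases so uncertain items can go to a reviewer. `Triage` and the `REVIEW` label are hypothetical additions, and the sample flags are made up for illustration.

```csharp
using System;

// Hypothetical triage: true passes, an explicit false fails (or is merely missing
// when optional), and anything else (null) is sent to manual review.
static string Triage(bool? flag, bool required) => flag switch
{
    true => "PASS",
    false => required ? "FAIL" : "MISSING",
    _ => "REVIEW",
};

// Made-up flags for illustration, not real model output.
var triageSamples = new (string Field, bool? Flag, bool Required)[]
{
    ("has_job_title", true, true),
    ("has_start_date", null, true),
    ("has_termination_clause", false, true),
    ("has_non_compete_clause", false, false),
};

foreach (var (field, flag, required) in triageSamples)
    Console.WriteLine($"{field}: {Triage(flag, required)}");
// prints:
// has_job_title: PASS
// has_start_date: REVIEW
// has_termination_clause: FAIL
// has_non_compete_clause: MISSING
```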
Next Steps
- Automate Contract and Compliance Document Review: classify-then-extract pipeline for contracts.
- Extract Structured Data from Unstructured Text: deep dive into schema-driven extraction.
- Build a Classification and Extraction Pipeline: route documents to type-specific validators.
- Build a Self-Healing Extraction Pipeline with Fallbacks: retry and fallback strategies for production.