Extract PII and Redact Sensitive Data
Applications that process user content (support tickets, uploaded documents, form submissions) need to detect and handle personally identifiable information (PII) before storing, sharing, or analyzing that data. LM-Kit.NET's PiiExtraction class identifies 11 built-in PII types (names, emails, phone numbers, SSNs, credit cards, and more) with confidence scores and occurrence positions. This tutorial builds a PII detection and redaction system that processes text and documents locally.
Why Local PII Extraction Matters
Two enterprise problems that on-device PII detection solves:
- Comply with privacy regulations without sending data to third parties. GDPR, CCPA, and HIPAA all require knowing what PII you hold, but sending documents to a cloud PII detection API means a third party processes the very data you are trying to protect. Local extraction keeps sensitive data within your infrastructure during the detection phase.
- Redact before sharing. Customer support logs shared with analytics teams, medical records shared with researchers, and legal documents shared with external counsel all need PII stripped first. A local system redacts at the source, before the data moves.
Prerequisites
| Requirement | Minimum |
|---|---|
| .NET SDK | 8.0+ |
| VRAM | 4+ GB |
| Disk | ~3 GB free for model download |
Step 1: Create the Project
dotnet new console -n PiiQuickstart
cd PiiQuickstart
dotnet add package LM-Kit.NET
Step 2: Understand PII Entity Types
LM-Kit.NET detects these PII types out of the box:
| Type | Examples |
|---|---|
| Person | "John Smith", "Dr. Sarah Chen" |
| EmailAddress | "user@example.com" |
| PhoneNumber | "+1-650-555-1234" |
| PostalAddress | "1600 Amphitheatre Parkway, Mountain View, CA" |
| Url | "https://example.com/profile/12345" |
| IpAddress | "192.168.0.1", "2001:db8::1" |
| DateOfBirth | "01/15/1980", "March 3rd, 1992" |
| SocialSecurityNumber | "123-45-6789" |
| CreditCardNumber | "4111 1111 1111 1111" |
| BankAccountNumber | "000123456789" |
| Other | Catch-all for unclassified PII (opt-in) |
You can also define custom PII types for domain-specific identifiers (patient IDs, employee numbers, account codes).
Step 3: Basic PII Extraction
using System.Text;
using LMKit.Model;
using LMKit.TextAnalysis;
LMKit.Licensing.LicenseManager.SetLicenseKey("");
Console.InputEncoding = Encoding.UTF8;
Console.OutputEncoding = Encoding.UTF8;
// ──────────────────────────────────────
// 1. Load model
// ──────────────────────────────────────
Console.WriteLine("Loading model...");
using LM model = LM.LoadFromModelID("gemma3:4b",
downloadingProgress: (_, len, read) =>
{
if (len.HasValue) Console.Write($"\r Downloading: {(double)read / len.Value * 100:F1}% ");
return true;
},
loadingProgress: p => { Console.Write($"\r Loading: {p * 100:F0}% "); return true; });
Console.WriteLine("\n");
// ──────────────────────────────────────
// 2. Extract PII
// ──────────────────────────────────────
var pii = new PiiExtraction(model);
string text = """
Dear Support Team,
My name is James Wilson and I'm writing about order #45832. I purchased a laptop
on my Visa card ending in 4242 (full number: 4532-1234-5678-4242). The delivery
address is 742 Evergreen Terrace, Springfield, IL 62704.
You can reach me at james.wilson@email.com or call (555) 867-5309. My date of
birth for verification is 03/15/1985, and my SSN on file is 234-56-7890.
Thanks,
James Wilson
""";
List<PiiExtraction.PiiExtractedEntity> entities = pii.Extract(text);
Console.WriteLine($"Found {entities.Count} PII entities (confidence: {pii.Confidence:P0}):\n");
foreach (var entity in entities)
{
Console.ForegroundColor = ConsoleColor.Red;
Console.Write($" {entity.EntityDefinition.Label,-25}");
Console.ResetColor();
Console.Write($" {entity.Value,-40}");
Console.ForegroundColor = ConsoleColor.DarkGray;
Console.WriteLine($" ({entity.Confidence:P0})");
Console.ResetColor();
}
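The loop above prints every detected value in full, which is convenient during development but puts raw PII into console logs. A minimal sketch of a masking helper follows, assuming you only need enough of each value to recognize it. `MaskValue` is a hypothetical name introduced here for illustration, not part of LM-Kit.NET.

```csharp
using System;

// Hypothetical helper (not part of LM-Kit.NET): keeps the first and last
// `visible` characters and replaces everything in between with '*'.
static string MaskValue(string value, int visible = 2)
{
    if (string.IsNullOrEmpty(value)) return string.Empty;

    // Values too short to mask partially are masked entirely.
    if (value.Length <= visible * 2) return new string('*', value.Length);

    return value[..visible]
        + new string('*', value.Length - visible * 2)
        + value[^visible..];
}

Console.WriteLine(MaskValue("james.wilson@email.com")); // first and last two characters stay visible
Console.WriteLine(MaskValue("4242"));                   // fully masked
```

In the loop, this would be called as `MaskValue(entity.Value)` wherever `entity.Value` is printed.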
Step 4: Redact PII from Text
Build a redaction function that replaces PII with type-labeled placeholders:
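string RedactText(string originalText, List<PiiExtraction.PiiExtractedEntity> entities)
{
    string result = originalText;
    // Replace longer values first so that a short value contained in a longer one
    // (e.g. "James" inside "James Wilson") cannot leave a partial match behind.
    // Empty values are skipped because string.Replace rejects an empty search string.
    var sorted = entities
        .Where(e => !string.IsNullOrEmpty(e.Value))
        .OrderByDescending(e => e.Value.Length);
    foreach (var entity in sorted)
    {
        // Invariant casing keeps the placeholders identical on every system culture.
        string placeholder = $"[{entity.EntityDefinition.Label.ToUpperInvariant()}]";
        result = result.Replace(entity.Value, placeholder);
    }
    return result;
}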
// Extract and redact
var piiEntities = pii.Extract(text);
string redacted = RedactText(text, piiEntities);
Console.WriteLine("Redacted output:\n");
Console.WriteLine(redacted);
Expected output:
Dear Support Team,
My name is [PERSON] and I'm writing about order #45832. I purchased a laptop
on my Visa card ending in 4242 (full number: [CREDITCARDNUMBER]). The delivery
address is [POSTALADDRESS].
You can reach me at [EMAILADDRESS] or call [PHONENUMBER]. My date of
birth for verification is [DATEOFBIRTH], and my SSN on file is [SOCIALSECURITYNUMBER].
Thanks,
[PERSON]
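The length-descending sort in `RedactText` matters whenever one detected value contains another. The following self-contained sketch illustrates this with plain (label, value) tuples standing in for the extracted entities, so it runs without a model. The tuples and the `Redact` function are illustrative stand-ins, not LM-Kit.NET types.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Stand-in for RedactText that works on plain tuples instead of extracted entities.
static string Redact(string text, IEnumerable<(string Label, string Value)> entities, bool longestFirst)
{
    var ordered = longestFirst
        ? entities.OrderByDescending(e => e.Value.Length)
        : entities.OrderBy(e => e.Value.Length);

    foreach (var (label, value) in ordered)
        text = text.Replace(value, $"[{label.ToUpperInvariant()}]");

    return text;
}

var sample = "Contact James Wilson at james.wilson@email.com.";
var found = new List<(string Label, string Value)>
{
    ("Person", "James"),
    ("Person", "James Wilson"),
};

// Longest first: the full name is replaced as one unit.
Console.WriteLine(Redact(sample, found, longestFirst: true));
// prints: Contact [PERSON] at james.wilson@email.com.

// Shortest first: "James" is replaced early, so "James Wilson" no longer matches
// and the surname is left in the text.
Console.WriteLine(Redact(sample, found, longestFirst: false));
// prints: Contact [PERSON] Wilson at james.wilson@email.com.
```

The email address stays in both outputs only because this sample list contains no email entity. It also shows that replacement is case-sensitive: the lowercase "james" inside the address is not touched by the Person entries.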
Step 5: Extract PII from Documents
Process PDFs, images, and scanned documents:
// Requires "using LMKit.Data;" at the top of the file, next to the other using directives.
// Reuses the pii instance created in Step 3.
string filePath = "customer_application.pdf";
var attachment = new Attachment(filePath);
List<PiiExtraction.PiiExtractedEntity> docEntities = pii.Extract(attachment);
Console.WriteLine($"PII found in {Path.GetFileName(filePath)}:\n");
// Group by type for a compliance report
var grouped = docEntities
.GroupBy(e => e.EntityDefinition.Label)
.OrderBy(g => g.Key);
foreach (var group in grouped)
{
Console.ForegroundColor = ConsoleColor.Yellow;
Console.WriteLine($" {group.Key} ({group.Count()}):");
Console.ResetColor();
foreach (var entity in group)
{
Console.WriteLine($" {entity.Value} ({entity.Confidence:P0})");
}
}
Step 6: Custom PII Definitions
Add domain-specific PII types that the default set does not cover:
var customPii = new PiiExtraction(model, new List<PiiExtraction.PiiEntityDefinition>
{
// Keep the standard types you need
new(PiiExtraction.PiiEntityType.Person),
new(PiiExtraction.PiiEntityType.EmailAddress),
new(PiiExtraction.PiiEntityType.PhoneNumber),
new(PiiExtraction.PiiEntityType.SocialSecurityNumber),
// Add custom types
new("PatientID"),
new("MedicalRecordNumber"),
new("InsurancePolicyNumber"),
new("EmployeeID")
});
customPii.Guidance = "This is a healthcare document. " +
"Patient IDs follow the format P-XXXXX. " +
"Medical record numbers follow the format MRN-XXXXXXXX.";
string medicalText = """
Patient: Maria Garcia (P-48231)
MRN: MRN-20241589
Insurance: BlueCross Policy #BC-9912-4456-01
Emergency Contact: Carlos Garcia, (555) 234-8901
SSN: 567-89-0123
Employee ID: EMP-3391 (referring physician)
""";
var medicalEntities = customPii.Extract(medicalText);
foreach (var entity in medicalEntities)
{
Console.WriteLine($" [{entity.EntityDefinition.Label}] {entity.Value}");
}
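Because the identifier formats are known in advance, extracted custom values can be cross-checked after extraction. The sketch below assumes each X in the Guidance formats stands for exactly one digit, as in the sample text. The `expectedFormats` table and the `MatchesExpectedFormat` helper are hypothetical names, not part of LM-Kit.NET.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Assumed formats: P-XXXXX and MRN-XXXXXXXX, where X is a single digit.
var expectedFormats = new Dictionary<string, Regex>
{
    ["PatientID"] = new Regex("^P-[0-9]{5}$"),
    ["MedicalRecordNumber"] = new Regex("^MRN-[0-9]{8}$"),
};

// Returns true when the label has no registered format, or when the value matches it.
bool MatchesExpectedFormat(string label, string value) =>
    !expectedFormats.TryGetValue(label, out var pattern) || pattern.IsMatch(value);

Console.WriteLine(MatchesExpectedFormat("PatientID", "P-48231"));                // True
Console.WriteLine(MatchesExpectedFormat("PatientID", "P-4823"));                 // False (only four digits)
Console.WriteLine(MatchesExpectedFormat("MedicalRecordNumber", "MRN-20241589")); // True
Console.WriteLine(MatchesExpectedFormat("EmployeeID", "EMP-3391"));              // True (no format registered)
```

Inside the loop above, a call such as `MatchesExpectedFormat(entity.EntityDefinition.Label, entity.Value)` could flag values that look malformed for manual review rather than discarding them.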
Step 7: Batch PII Audit
Scan a directory of files and generate a compliance report:
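// Match extensions case-insensitively so files such as "SCAN.PDF" are not skipped.
string[] files = Directory.GetFiles("customer_data", "*.*")
    .Where(f => f.EndsWith(".txt", StringComparison.OrdinalIgnoreCase)
             || f.EndsWith(".pdf", StringComparison.OrdinalIgnoreCase))
    .ToArray();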
var report = new List<string>();
report.Add("file,pii_type,value,confidence");
int totalPii = 0;
Console.WriteLine($"Scanning {files.Length} files for PII...\n");
foreach (string file in files)
{
    // PDFs are binary, so they go through Attachment as in Step 5 (requires LMKit.Data);
    // only plain-text files are read as strings.
    var fileEntities = file.EndsWith(".pdf", StringComparison.OrdinalIgnoreCase)
        ? pii.Extract(new Attachment(file))
        : pii.Extract(File.ReadAllText(file));
    string fileName = Path.GetFileName(file);
    totalPii += fileEntities.Count;
    foreach (var entity in fileEntities)
    {
        report.Add($"\"{fileName}\",\"{entity.EntityDefinition.Label}\"," +
            $"\"{entity.Value.Replace("\"", "\"\"")}\",{entity.Confidence:F2}");
    }
    ConsoleColor color = fileEntities.Count > 0 ? ConsoleColor.Yellow : ConsoleColor.Green;
    Console.ForegroundColor = color;
    Console.WriteLine($" {fileName}: {fileEntities.Count} PII entities");
    Console.ResetColor();
}
File.WriteAllLines("pii_audit_report.csv", report);
Console.WriteLine($"\nTotal PII found: {totalPii} across {files.Length} files");
Console.WriteLine("Report saved to pii_audit_report.csv");
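The report rows are assembled with string interpolation, which formats the confidence using the machine's current culture and escapes quotes only in the value column. A minimal sketch of a more defensive row builder follows, assuming the confidence is a `float` (as the `0.85f` comparison in Step 8 suggests). `CsvField` and `CsvRow` are hypothetical helpers, not part of LM-Kit.NET.

```csharp
using System;
using System.Globalization;

// Quote a field and double any embedded quotes (RFC 4180 style).
static string CsvField(string value) =>
    "\"" + (value ?? string.Empty).Replace("\"", "\"\"") + "\"";

// Build one report row. The confidence uses the invariant culture so the
// decimal separator is always '.', even on systems that default to ','.
static string CsvRow(string file, string label, string value, float confidence) =>
    string.Join(",",
        CsvField(file),
        CsvField(label),
        CsvField(value),
        confidence.ToString("F2", CultureInfo.InvariantCulture));

Console.WriteLine(CsvRow("notes.txt", "Person", "Dwayne \"The Rock\" Johnson", 0.9f));
// prints: "notes.txt","Person","Dwayne ""The Rock"" Johnson",0.90
```

Keep in mind that the report stores raw PII values, so the CSV needs the same access controls as the source files.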
Step 8: Confidence-Based Filtering
Not all detections are equally certain. Use confidence scores to separate confirmed PII from uncertain matches:
// A distinct variable name avoids clashing with the entities list from Step 3.
var detected = pii.Extract(text);
var confirmed = detected.Where(e => e.Confidence >= 0.85f).ToList();
var uncertain = detected.Where(e => e.Confidence < 0.85f).ToList();
Console.WriteLine($"Confirmed PII ({confirmed.Count}):");
foreach (var e in confirmed)
Console.WriteLine($" {e.EntityDefinition.Label}: {e.Value} ({e.Confidence:P0})");
Console.WriteLine($"\nUncertain (needs review) ({uncertain.Count}):");
foreach (var e in uncertain)
Console.WriteLine($" {e.EntityDefinition.Label}: {e.Value} ({e.Confidence:P0})");
// Auto-redact confirmed, flag uncertain for human review
string autoRedacted = RedactText(text, confirmed);
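This two-tier approach keeps low-confidence matches out of automatic redaction, which limits false positives, while high-confidence PII is still redacted without manual work. Anything in the uncertain tier stays in the text until a reviewer decides, so treat the auto-redacted output as provisional until that review is done.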
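A single cutoff treats every PII type alike. One possible refinement, sketched below, is a per-type threshold, so that types where a missed redaction is costly are auto-redacted at lower confidence. The threshold values and the `IsConfirmed` helper are illustrative assumptions, not recommendations from LM-Kit.NET.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical per-type thresholds; any label not listed uses the default cutoff.
var thresholds = new Dictionary<string, float>
{
    ["SocialSecurityNumber"] = 0.60f,
    ["CreditCardNumber"] = 0.60f,
};
const float defaultThreshold = 0.85f;

// True when the detection is confident enough to redact without review.
bool IsConfirmed(string label, float confidence) =>
    confidence >= (thresholds.TryGetValue(label, out var t) ? t : defaultThreshold);

Console.WriteLine(IsConfirmed("SocialSecurityNumber", 0.70f)); // True  (0.70 >= 0.60)
Console.WriteLine(IsConfirmed("Person", 0.70f));               // False (0.70 <  0.85)
```

With the extracted entities, the filter would become something like `Where(e => IsConfirmed(e.EntityDefinition.Label, e.Confidence))`.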
Common Issues
| Problem | Cause | Fix |
|---|---|---|
| Missing PII types | Default set does not include Other | Pass includeOtherType: true to the constructor, or add custom definitions |
| False positives on product codes | Model confuses identifiers | Add Guidance to describe your document context |
| Poor detection on scanned PDFs | Image quality too low | Set pii.OcrEngine to a configured OcrEngine instance |
| Slow on large documents | Full document processed at once | Set MaxContextLength to limit processing window |
| Custom types not detected | Label too vague | Use descriptive labels ("PatientID" not "ID") and add Guidance with format examples |
Next Steps
- Extract Named Entities from Text: general-purpose entity extraction beyond PII.
- Extract Structured Data from Unstructured Text: schema-driven extraction for typed fields.
- Build a Content Moderation Filter: combine PII detection with content moderation.
- Samples: PII Extraction: PII extraction demo.
- Samples: Batch PII Extraction: batch PII processing demo.