AI-Powered PII Extraction for .NET Applications
Purpose of the Sample
This PII Extraction Demo demonstrates how to use the LM-Kit.NET SDK to identify and extract personally identifiable information (PII) such as names, emails, addresses, IPs, phone numbers, and more from both text and image files. This capability supports privacy compliance, security, redaction, and content classification pipelines.
The demo uses the PiiExtraction class, a high-level API designed for extracting standard and custom PII types accurately. It leverages LM-Kit's multimodal inference and Dynamic Sampling to operate quickly and reliably, even on large, unstructured documents or visual inputs.
Industry Target Audience
- Compliance & Legal: Automate redaction of personal data in documents for GDPR, HIPAA, or CCPA compliance.
- Healthcare: Identify and redact sensitive patient identifiers in clinical notes.
- HR & Recruiting: Extract names, emails, and phone numbers from CVs and applications.
- Finance: Detect and structure account numbers, credit cards, and transaction metadata.
- Intelligence & Security: Detect IPs, URLs, and user metadata in logs or communications.
- Government: Scrub or tag identifiers in applications, forms, and citizen communications.
Problem Solved
Manual detection and handling of PII across unstructured documents is error-prone and labor-intensive. Whether for compliance, analytics, or user data protection, automatic extraction streamlines workflows and reduces risks. This demo enables:
- Fast detection of standard and custom PII.
- Verbatim return of text with positional data.
- Local, offline inference preserving user privacy.
Sample Application Description
The demo is a console application that loads a specified model, then takes a file path to either a text file or an image. It uses the PiiExtraction engine to extract PII from the document and returns a structured list with confidence scores and offsets.
Key Features
- Multimodal Support: Handles images, plain text, or combined image+text input.
- Entity Variety: Built-in support for names, emails, phone numbers, credit cards, SSNs, IPs, URLs, etc.
- Positional Accuracy: Each extracted value includes start and end indices.
- Local Processing: Ensures privacy without external APIs.
- Customization: Easily define and extract custom entity types.
Supported Models
- MiniCPM 2.6 o Vision 8.1B
- Alibaba Qwen 2.5 Vision (3B, 7B)
- Google Gemma 3 Vision (4B, 12B)
- Any user-supplied model with vision/text support
Getting Started
Prerequisites
- .NET 6.0 or higher
- ImageMagick (optional, for uncommon image formats)
Download the Project
Running the Application
git clone https://github.com/LM-Kit/lm-kit-net-samples.git
cd lm-kit-net-samples/console_net/pii_extraction
dotnet build
dotnet run
Example Usage
Set the License Key:
LMKit.Licensing.LicenseManager.SetLicenseKey("");
Choose Your Model: The app prompts you to select a model or accepts a custom model URI.
Provide a File Path:
> C:\Users\user\Documents\sample-id.png
View Output:
5 detected entities | processing time: 00:00:02.350
Person: "Loïc Carrère" (confidence=0.97)
EmailAddress: "user@example.com" (confidence=0.95)
PhoneNumber: "+33 6 12 34 56 78" (confidence=0.92)
PostalAddress: "10 Rue de Paris, 75000 Paris" (confidence=0.91)
CreditCardNumber: "4111 1111 1111 1111" (confidence=0.88)
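Because each entity in the output above carries its own confidence score, a caller may want to keep only the detections above a threshold before acting on them. The following is a minimal sketch of that idea, not the SDK's API: the `DetectedEntity` record and the `EntityReport` helper are hypothetical names introduced for illustration, and the sample values simply mirror the output shown above.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;
using System.Linq;

// Hypothetical sample data mirroring the console output above.
var entities = new List<DetectedEntity>
{
    new("Person", "Loïc Carrère", 0.97),
    new("EmailAddress", "user@example.com", 0.95),
    new("PhoneNumber", "+33 6 12 34 56 78", 0.92),
    new("PostalAddress", "10 Rue de Paris, 75000 Paris", 0.91),
    new("CreditCardNumber", "4111 1111 1111 1111", 0.88),
};

// Keep entities with confidence >= 0.90 and print them in the demo's format.
var kept = EntityReport.FilterByConfidence(entities, 0.90);
foreach (var line in EntityReport.Format(kept))
{
    Console.WriteLine(line);
}

// Hypothetical result shape (an assumption, not an LM-Kit type).
public record DetectedEntity(string Label, string Value, double Confidence);

public static class EntityReport
{
    // Returns entities at or above the threshold, highest confidence first.
    public static IReadOnlyList<DetectedEntity> FilterByConfidence(
        IEnumerable<DetectedEntity> entities, double minConfidence) =>
        entities.Where(e => e.Confidence >= minConfidence)
                .OrderByDescending(e => e.Confidence)
                .ToList();

    // Formats each entity as: Label: "Value" (confidence=0.00)
    public static IEnumerable<string> Format(IEnumerable<DetectedEntity> entities) =>
        entities.Select(e => string.Format(
            CultureInfo.InvariantCulture,
            "{0}: \"{1}\" (confidence={2:0.00})",
            e.Label, e.Value, e.Confidence));
}
```

With the threshold set to 0.90, the credit card entry (0.88) is dropped and the other four are kept. The right threshold depends on the use case: a redaction pipeline would probably prefer a lower one, since a missed identifier usually costs more than an extra redaction.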
Optional Customization
// Assumes `engine` is the demo's PiiExtraction instance.
engine.PiiEntityDefinitions = new List<PiiEntityDefinition>
{
    new PiiEntityDefinition("SocialInsuranceNumber"),
    new PiiEntityDefinition("BankRoutingCode")
};
Additional Notes
- Confidence is reported per entity, not per document.
- Positional metadata helps highlight/redact in original content.
- Results are returned verbatim (case and punctuation preserved).
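To illustrate how positional metadata can drive redaction, here is a minimal sketch that replaces each detected span with a label placeholder. It does not use the SDK: `PiiSpan` and `Redactor` are hypothetical names. It assumes offsets are character indices into the original string, that `Start` is inclusive and `End` is exclusive, and that spans do not overlap. Check how the SDK actually reports its indices before reusing this.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;

// Hypothetical input: a sentence and two spans located by hand.
const string input = "Contact Jane Doe at jane@example.com today.";
var spans = new List<PiiSpan>
{
    new("Person", 8, 16),        // "Jane Doe"
    new("EmailAddress", 20, 36), // "jane@example.com"
};

Console.WriteLine(Redactor.Redact(input, spans));

// Hypothetical span shape (an assumption): Start inclusive, End exclusive.
public record PiiSpan(string Label, int Start, int End);

public static class Redactor
{
    // Replaces every span with "[Label]". Spans are processed from the end
    // of the text backwards so earlier offsets remain valid after each edit.
    public static string Redact(string text, IEnumerable<PiiSpan> spans)
    {
        var builder = new StringBuilder(text);
        int nextStart = text.Length; // start of the previously processed span

        foreach (var span in spans.OrderByDescending(x => x.Start))
        {
            // Reject empty, out-of-range, or overlapping spans.
            if (span.Start < 0 || span.Start >= span.End || span.End > nextStart)
            {
                throw new ArgumentException(
                    $"Invalid or overlapping span for {span.Label}.", nameof(spans));
            }

            builder.Remove(span.Start, span.End - span.Start);
            builder.Insert(span.Start, $"[{span.Label}]");
            nextStart = span.Start;
        }

        return builder.ToString();
    }
}
```

For the sample input this prints `Contact [Person] at [EmailAddress] today.`. Working backwards through the spans avoids having to recompute offsets after each replacement changes the string length.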