Extract Named Entities from Text
Named Entity Recognition (NER) identifies and classifies real-world objects in text: people, organizations, locations, dates, monetary amounts, and more. LM-Kit.NET's NamedEntityRecognition class extracts these entities with confidence scores and occurrence positions. It supports 20+ built-in entity types and custom definitions for domain-specific extraction. This tutorial builds a working NER system that processes text and documents.
Why Local NER Matters
Two enterprise problems that on-device NER solves:
- Extract structured data from unstructured text at scale. Every organization sits on contracts, news feeds, research papers, and support tickets whose valuable structured information (names, dates, amounts) is locked in prose. NER turns that text into queryable data without manual tagging.
- Process sensitive documents without data exposure. Legal documents contain client names, financial terms, and case references. Medical records contain patient identifiers and treatment details. Local NER extracts entities from these documents without sending them to external APIs.
Prerequisites
| Requirement | Minimum |
|---|---|
| .NET SDK | 8.0+ |
| VRAM | 4+ GB |
| Disk | ~3 GB free for model download |
Step 1: Create the Project
dotnet new console -n NerQuickstart
cd NerQuickstart
dotnet add package LM-Kit.NET
Step 2: Understand Entity Types
LM-Kit.NET provides 20+ built-in entity types through the NamedEntityType enum:
| Category | Types |
|---|---|
| Core | Person, Location, Organization |
| Open-domain | Event, Product, WorkOfArt, Language |
| Contact/PII | PhoneNumber, EmailAddress, PostalAddress, Url, IpAddress |
| Temporal | Date, Time, DateTime |
| Numeric | Number, Percent, Ordinal, MonetaryAmount |
| Extensibility | Other, Custom |
By default, the constructor includes the most common built-in types. You can also provide your own list of EntityDefinition objects to focus on specific types or add custom ones.
Step 3: Basic Named Entity Extraction
using System.Text;
using LMKit.Model;
using LMKit.TextAnalysis;
LMKit.Licensing.LicenseManager.SetLicenseKey("");
Console.InputEncoding = Encoding.UTF8;
Console.OutputEncoding = Encoding.UTF8;
// ──────────────────────────────────────
// 1. Load model
// ──────────────────────────────────────
Console.WriteLine("Loading model...");
using LM model = LM.LoadFromModelID("gemma3:4b",
    downloadingProgress: (_, len, read) =>
    {
        if (len.HasValue) Console.Write($"\r Downloading: {(double)read / len.Value * 100:F1}% ");
        return true;
    },
    loadingProgress: p => { Console.Write($"\r Loading: {p * 100:F0}% "); return true; });
Console.WriteLine("\n");
// ──────────────────────────────────────
// 2. Extract entities
// ──────────────────────────────────────
var ner = new NamedEntityRecognition(model);
string text = """
Apple Inc. announced today that CEO Tim Cook will present the company's Q3 earnings
on August 1st, 2025, at their headquarters in Cupertino, California. The event starts
at 2:00 PM Pacific Time. Analysts expect revenue of approximately $85.5 billion,
representing a 12% increase year-over-year. For press inquiries, contact
media@apple.com or call +1-408-996-1010.
""";
List<NamedEntityRecognition.ExtractedEntity> entities = ner.Recognize(text);
Console.WriteLine($"Found {entities.Count} entities (confidence: {ner.Confidence:P0}):\n");
foreach (var entity in entities)
{
    Console.ForegroundColor = ConsoleColor.Cyan;
    Console.Write($" {entity.EntityDefinition.Label,-20}");
    Console.ResetColor();
    Console.Write($" {entity.Value,-30}");
    Console.ForegroundColor = ConsoleColor.DarkGray;
    Console.WriteLine($" ({entity.Confidence:P0}, {entity.Occurrences.Count} occurrence(s))");
    Console.ResetColor();
}
Step 4: Custom Entity Definitions
Focus extraction on specific types, or add domain-specific custom entities:
// Only extract people, organizations, and monetary amounts
var focused = new NamedEntityRecognition(model, new List<NamedEntityRecognition.EntityDefinition>
{
    new(NamedEntityRecognition.NamedEntityType.Person),
    new(NamedEntityRecognition.NamedEntityType.Organization),
    new(NamedEntityRecognition.NamedEntityType.MonetaryAmount)
});
string contract = """
This agreement between Acme Corporation and Jane Smith, effective January 15, 2025,
establishes a consulting fee of $150 per hour with a monthly cap of $12,000.
Payments will be processed by GlobalPay Services.
""";
var contractEntities = focused.Recognize(contract);
Console.WriteLine("Contract entities:\n");
foreach (var entity in contractEntities)
{
    Console.WriteLine($" [{entity.EntityDefinition.Label}] {entity.Value}");
}
Add custom entity types for your domain:
// Medical domain: add custom entity types alongside built-in ones
var medicalNer = new NamedEntityRecognition(model, new List<NamedEntityRecognition.EntityDefinition>
{
    new(NamedEntityRecognition.NamedEntityType.Person),
    new(NamedEntityRecognition.NamedEntityType.Date),
    new(NamedEntityRecognition.NamedEntityType.Organization),
    new("Medication"),
    new("Dosage"),
    new("Condition"),
    new("Procedure")
});
string medicalNote = """
Patient John Martinez, DOB 03/15/1978, was seen at St. Mary's Hospital on
December 10, 2024. Diagnosed with Type 2 Diabetes. Prescribed Metformin 500mg
twice daily. Scheduled for HbA1c test in 3 months.
""";
var medicalEntities = medicalNer.Recognize(medicalNote);
foreach (var entity in medicalEntities)
{
    Console.WriteLine($" [{entity.EntityDefinition.Label}] {entity.Value} ({entity.Confidence:P0})");
}
Step 5: Extract from Documents
Process PDFs, images, and Office files using attachments:
// Add this directive alongside the other using directives at the top of the file.
using LMKit.Data;
var docNer = new NamedEntityRecognition(model);
string filePath = "signed_contract.pdf";
var attachment = new Attachment(filePath);
List<NamedEntityRecognition.ExtractedEntity> docEntities = docNer.Recognize(attachment);
Console.WriteLine($"Entities from {Path.GetFileName(filePath)}:\n");
// Group by entity type for cleaner output
var grouped = docEntities
    .GroupBy(e => e.EntityDefinition.Label)
    .OrderBy(g => g.Key);
foreach (var group in grouped)
{
    Console.ForegroundColor = ConsoleColor.Yellow;
    Console.WriteLine($" {group.Key}:");
    Console.ResetColor();
    foreach (var entity in group)
    {
        Console.WriteLine($" {entity.Value} ({entity.Confidence:P0})");
    }
}
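Long documents often mention the same entity several times, sometimes with different casing or stray whitespace. The sketch below shows one way to collapse such repeats before display or export. It deliberately works on plain tuples instead of LM-Kit types so it runs on its own. The tuple shape and the Deduplicate helper are illustrative assumptions, and in the tutorial the three fields would come from entity.EntityDefinition.Label, entity.Value, and entity.Confidence.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-in for extracted entities: (label, value, confidence).
var raw = new List<(string Label, string Value, double Confidence)>
{
    ("Organization", "Acme Corporation", 0.95),
    ("Organization", "acme corporation", 0.80),
    ("Person", "Jane Smith", 0.90),
    ("Person", "Jane Smith ", 0.70),
};

// Group mentions that share a label and a normalized value (trimmed,
// case-insensitive), then keep the highest-confidence mention of each group.
List<(string Label, string Value, double Confidence)> Deduplicate(
    IEnumerable<(string Label, string Value, double Confidence)> items) =>
    items
        .GroupBy(e => (e.Label, Key: e.Value.Trim().ToUpperInvariant()))
        .Select(g => g.OrderByDescending(e => e.Confidence).First())
        .ToList();

var unique = Deduplicate(raw);
foreach (var item in unique)
{
    Console.WriteLine($"[{item.Label}] {item.Value.Trim()} ({item.Confidence:P0})");
}
```

Keeping the highest-confidence mention is a design choice for this sketch. Averaging the scores or keeping the first mention would be equally reasonable, depending on how the results are used downstream.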
Step 6: Batch NER with Structured Output
Process multiple texts and export entities as structured CSV:
string[] documents =
{
    "Microsoft CEO Satya Nadella announced a $10 billion investment in AI research in Seattle.",
    "Dr. Sarah Chen published her findings on CRISPR at Stanford University on March 15, 2025.",
    "Tesla delivered 500,000 vehicles in Q4, generating $25.2 billion in revenue for Elon Musk's company."
};
var output = new List<string>();
output.Add("document_index,entity_type,value,confidence");
for (int i = 0; i < documents.Length; i++)
{
    // Named batchEntities because a nested "entities" local would clash with the
    // top-level "entities" variable from Step 3 when both live in the same file.
    var batchEntities = ner.Recognize(documents[i]);
    foreach (var entity in batchEntities)
    {
        output.Add($"{i},\"{entity.EntityDefinition.Label}\",\"{entity.Value.Replace("\"", "\"\"")}\",{entity.Confidence:F2}");
    }
    Console.WriteLine($" Document {i}: {batchEntities.Count} entities");
}
File.WriteAllLines("entities.csv", output);
Console.WriteLine("\nExported to entities.csv");
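The export above formats the confidence with the current culture, so on machines where the decimal separator is a comma the score would split into two CSV columns. A minimal sketch of a more defensive row builder follows. CsvField and CsvRow are hypothetical helper names, not part of LM-Kit.NET, and the quoting rule follows the convention described in RFC 4180.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;

// Quote a field only when it contains a comma, a quote, or a line break,
// and double any embedded quotes.
string CsvField(string value)
{
    bool needsQuotes = value.IndexOfAny(new[] { ',', '"', '\r', '\n' }) >= 0;
    return needsQuotes ? "\"" + value.Replace("\"", "\"\"") + "\"" : value;
}

// Hypothetical helper: builds one row, formatting numbers with the invariant
// culture so the decimal separator is always a period.
string CsvRow(int documentIndex, string label, string value, double confidence) =>
    string.Join(",",
        documentIndex.ToString(CultureInfo.InvariantCulture),
        CsvField(label),
        CsvField(value),
        confidence.ToString("F2", CultureInfo.InvariantCulture));

var rows = new List<string> { "document_index,entity_type,value,confidence" };
rows.Add(CsvRow(0, "MonetaryAmount", "$12,000", 0.934));
rows.Add(CsvRow(1, "WorkOfArt", "the \"Blue\" report", 0.5));

foreach (var row in rows) Console.WriteLine(row);
```

In the batch loop, a call such as CsvRow(i, entity.EntityDefinition.Label, entity.Value, entity.Confidence) could replace the interpolated string, assuming Confidence converts implicitly to double.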
Step 7: Using Guidance for Better Extraction
The Guidance property helps the model understand your context, improving accuracy for ambiguous text:
ner.Guidance = "This is a financial earnings report. " +
    "Treat ticker symbols as Organization entities. " +
    "Treat fiscal quarters (Q1, Q2, etc.) as Date entities.";
string financial = """
AAPL reported Q2 FY2025 revenue of $94.8 billion, up 5% YoY. MSFT posted
$61.9 billion in the same period. Both companies exceeded Wall Street estimates
by approximately 3%.
""";
var financialEntities = ner.Recognize(financial);
foreach (var entity in financialEntities)
{
    Console.WriteLine($" [{entity.EntityDefinition.Label}] {entity.Value}");
}
Common Issues
| Problem | Cause | Fix |
|---|---|---|
| Missing entities | Default entity types do not cover your domain | Add custom EntityDefinition objects for domain-specific types |
| Wrong entity type | Ambiguous text (e.g., "Apple" as company vs. fruit) | Add Guidance with domain context |
| Duplicate entities | Same entity appears multiple times | Check entity.Occurrences.Count; deduplicate by Value |
| Low confidence | Short or ambiguous input | Use a larger model (gemma3:12b) for nuanced text |
| Slow on large documents | Document exceeds context window | Set MaxContextLength or split into paragraphs |
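For the "split into paragraphs" fix in the last row, the sketch below shows one possible way to pack whole paragraphs into chunks under a size budget. SplitIntoChunks is a hypothetical helper, and the budget is counted in characters as a rough proxy for tokens, which is an assumption of this example and not how the model measures context.

```csharp
using System;
using System.Collections.Generic;
using System.Text;

// Split on blank lines, then pack whole paragraphs into chunks of at most
// maxChars characters. A paragraph longer than maxChars becomes its own chunk.
List<string> SplitIntoChunks(string text, int maxChars)
{
    var chunks = new List<string>();
    var current = new StringBuilder();
    string[] paragraphs = text.Replace("\r\n", "\n")
        .Split("\n\n", StringSplitOptions.RemoveEmptyEntries | StringSplitOptions.TrimEntries);

    foreach (string paragraph in paragraphs)
    {
        // Flush the current chunk if adding this paragraph would exceed the budget.
        if (current.Length > 0 && current.Length + 2 + paragraph.Length > maxChars)
        {
            chunks.Add(current.ToString());
            current.Clear();
        }
        if (current.Length > 0) current.Append("\n\n");
        current.Append(paragraph);
    }

    if (current.Length > 0) chunks.Add(current.ToString());
    return chunks;
}

string longText = "First paragraph about Acme.\n\nSecond paragraph about Jane Smith.\n\nThird paragraph.";
List<string> parts = SplitIntoChunks(longText, maxChars: 60);

for (int i = 0; i < parts.Count; i++)
{
    Console.WriteLine($"Chunk {i}: {parts[i].Length} characters");
}
```

Each chunk could then be passed to ner.Recognize in turn and the results merged. Entities that straddle a paragraph boundary may be missed with this approach, so a larger budget is preferable when the model's context allows it.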
Next Steps
- Extract Structured Data from Unstructured Text: schema-driven extraction for typed fields.
- Build a Classification and Extraction Pipeline: classify then extract.
- Extract PII and Redact Sensitive Data: PII-specific extraction with redaction.
- Samples: Named Entity Recognition: NER demo.