Extract Named Entities from Text

Named Entity Recognition (NER) identifies and classifies real-world objects in text: people, organizations, locations, dates, monetary amounts, and more. LM-Kit.NET's NamedEntityRecognition class extracts these entities with confidence scores and occurrence positions. It supports 20+ built-in entity types and custom definitions for domain-specific extraction. This tutorial builds a working NER system that processes text and documents.


Why Local NER Matters

Two enterprise problems that on-device NER solves:

  1. Extract structured data from unstructured text at scale. Contracts, news feeds, research papers, and support tickets all hold valuable structured information (names, dates, amounts) locked in prose. NER turns that text into queryable data without manual tagging.
  2. Process sensitive documents without data exposure. Legal documents contain client names, financial terms, and case references. Medical records contain patient identifiers and treatment details. Local NER extracts entities from these documents without sending them to external APIs.

Prerequisites

Requirement | Minimum
.NET SDK | 8.0+
VRAM | 4+ GB
Disk | ~3 GB free for model download

Step 1: Create the Project

dotnet new console -n NerQuickstart
cd NerQuickstart
dotnet add package LM-Kit.NET

Step 2: Understand Entity Types

LM-Kit.NET provides 20+ built-in entity types through the NamedEntityType enum:

Category | Types
Core | Person, Location, Organization
Open-domain | Event, Product, WorkOfArt, Language
Contact/PII | PhoneNumber, EmailAddress, PostalAddress, Url, IpAddress
Temporal | Date, Time, DateTime
Numeric | Number, Percent, Ordinal, MonetaryAmount
Extensibility | Other, Custom

By default, the constructor includes the most common built-in types. You can also provide your own list of EntityDefinition objects to focus on specific types or add custom ones.


Step 3: Basic Named Entity Extraction

using System.Text;
using LMKit.Model;
using LMKit.TextAnalysis;

LMKit.Licensing.LicenseManager.SetLicenseKey("");

Console.InputEncoding = Encoding.UTF8;
Console.OutputEncoding = Encoding.UTF8;

// ──────────────────────────────────────
// 1. Load model
// ──────────────────────────────────────
Console.WriteLine("Loading model...");
using LM model = LM.LoadFromModelID("gemma3:4b",
    downloadingProgress: (_, len, read) =>
    {
        if (len.HasValue) Console.Write($"\r  Downloading: {(double)read / len.Value * 100:F1}%   ");
        return true;
    },
    loadingProgress: p => { Console.Write($"\r  Loading: {p * 100:F0}%   "); return true; });
Console.WriteLine("\n");

// ──────────────────────────────────────
// 2. Extract entities
// ──────────────────────────────────────
var ner = new NamedEntityRecognition(model);

string text = """
    Apple Inc. announced today that CEO Tim Cook will present the company's Q3 earnings
    on August 1st, 2025, at their headquarters in Cupertino, California. The event starts
    at 2:00 PM Pacific Time. Analysts expect revenue of approximately $85.5 billion,
    representing a 12% increase year-over-year. For press inquiries, contact
    media@apple.com or call +1-408-996-1010.
    """;

List<NamedEntityRecognition.ExtractedEntity> entities = ner.Recognize(text);

Console.WriteLine($"Found {entities.Count} entities (confidence: {ner.Confidence:P0}):\n");

foreach (var entity in entities)
{
    Console.ForegroundColor = ConsoleColor.Cyan;
    Console.Write($"  {entity.EntityDefinition.Label,-20}");
    Console.ResetColor();
    Console.Write($"  {entity.Value,-30}");
    Console.ForegroundColor = ConsoleColor.DarkGray;
    Console.WriteLine($"  ({entity.Confidence:P0}, {entity.Occurrences.Count} occurrence(s))");
    Console.ResetColor();
}
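
Each entity printed above carries a confidence score, so a common follow-up is to drop low-confidence hits before storing results. The sketch below shows the idea on plain tuples rather than LM-Kit types, so it runs without a model. The `KeepConfident` helper, the sample values, and the 0.6 threshold are illustrative assumptions, not part of the library.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-in for extraction results: (label, value, confidence in 0..1).
// With LM-Kit you would project entity.EntityDefinition.Label, entity.Value,
// and entity.Confidence into this shape.
var hits = new List<(string Label, string Value, float Confidence)>
{
    ("Person", "Tim Cook", 0.97f),
    ("Organization", "Apple Inc.", 0.95f),
    ("Location", "Pacific", 0.41f),
};

// Keep only hits at or above the threshold, highest confidence first.
List<(string Label, string Value, float Confidence)> KeepConfident(
    IEnumerable<(string Label, string Value, float Confidence)> source, float threshold) =>
    source.Where(h => h.Confidence >= threshold)
          .OrderByDescending(h => h.Confidence)
          .ToList();

var confident = KeepConfident(hits, 0.6f);

foreach (var hit in confident)
{
    Console.WriteLine($"{hit.Label}: {hit.Value} ({hit.Confidence:P0})");
}
```

A sensible threshold depends on the model and the domain, so treat 0.6 as a starting point to tune against a sample of your own documents.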

Step 4: Custom Entity Definitions

Focus extraction on specific types, or add domain-specific custom entities:

// Only extract people, organizations, and monetary amounts
var focused = new NamedEntityRecognition(model, new List<NamedEntityRecognition.EntityDefinition>
{
    new(NamedEntityRecognition.NamedEntityType.Person),
    new(NamedEntityRecognition.NamedEntityType.Organization),
    new(NamedEntityRecognition.NamedEntityType.MonetaryAmount)
});

string contract = """
    This agreement between Acme Corporation and Jane Smith, effective January 15, 2025,
    establishes a consulting fee of $150 per hour with a monthly cap of $12,000.
    Payments will be processed by GlobalPay Services.
    """;

var contractEntities = focused.Recognize(contract);

Console.WriteLine("Contract entities:\n");
foreach (var entity in contractEntities)
{
    Console.WriteLine($"  [{entity.EntityDefinition.Label}] {entity.Value}");
}

Add custom entity types for your domain:

// Medical domain: add custom entity types alongside built-in ones
var medicalNer = new NamedEntityRecognition(model, new List<NamedEntityRecognition.EntityDefinition>
{
    new(NamedEntityRecognition.NamedEntityType.Person),
    new(NamedEntityRecognition.NamedEntityType.Date),
    new(NamedEntityRecognition.NamedEntityType.Organization),
    new("Medication"),
    new("Dosage"),
    new("Condition"),
    new("Procedure")
});

string medicalNote = """
    Patient John Martinez, DOB 03/15/1978, was seen at St. Mary's Hospital on
    December 10, 2024. Diagnosed with Type 2 Diabetes. Prescribed Metformin 500mg
    twice daily. Scheduled for HbA1c test in 3 months.
    """;

var medicalEntities = medicalNer.Recognize(medicalNote);

foreach (var entity in medicalEntities)
{
    Console.WriteLine($"  [{entity.EntityDefinition.Label}] {entity.Value} ({entity.Confidence:P0})");
}

Step 5: Extract from Documents

Process PDFs, images, and Office files using attachments:

using LMKit.Data;

var docNer = new NamedEntityRecognition(model);

string filePath = "signed_contract.pdf";
var attachment = new Attachment(filePath);

List<NamedEntityRecognition.ExtractedEntity> docEntities = docNer.Recognize(attachment);

Console.WriteLine($"Entities from {Path.GetFileName(filePath)}:\n");

// Group by entity type for cleaner output
var grouped = docEntities
    .GroupBy(e => e.EntityDefinition.Label)
    .OrderBy(g => g.Key);

foreach (var group in grouped)
{
    Console.ForegroundColor = ConsoleColor.Yellow;
    Console.WriteLine($"  {group.Key}:");
    Console.ResetColor();

    foreach (var entity in group)
    {
        Console.WriteLine($"    {entity.Value} ({entity.Confidence:P0})");
    }
}

Step 6: Batch NER with Structured Output

Process multiple texts and export entities as structured CSV:

string[] documents =
{
    "Microsoft CEO Satya Nadella announced a $10 billion investment in AI research in Seattle.",
    "Dr. Sarah Chen published her findings on CRISPR at Stanford University on March 15, 2025.",
    "Tesla delivered 500,000 vehicles in Q4, generating $25.2 billion in revenue for Elon Musk's company."
};

var output = new List<string>();
output.Add("document_index,entity_type,value,confidence");

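for (int i = 0; i < documents.Length; i++)
{
    // Named batchEntities so it does not clash with `entities` from Step 3
    // when the snippets share one Program.cs.
    var batchEntities = ner.Recognize(documents[i]);

    foreach (var entity in batchEntities)
    {
        // Double embedded quotes in both text fields, and format the score with the
        // invariant culture so the decimal separator is always '.' in the CSV.
        string label = entity.EntityDefinition.Label.Replace("\"", "\"\"");
        string value = entity.Value.Replace("\"", "\"\"");
        string score = FormattableString.Invariant($"{entity.Confidence:F2}");
        output.Add($"{i},\"{label}\",\"{value}\",{score}");
    }

    Console.WriteLine($"  Document {i}: {batchEntities.Count} entities");
}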

File.WriteAllLines("entities.csv", output);
Console.WriteLine($"\nExported to entities.csv");
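
The export loop above quotes values by hand. A small helper can make the quoting rule explicit and apply it to every field. The sketch below follows the usual RFC 4180 convention: wrap a field in double quotes when it contains a comma, quote, or line break, and double any embedded quotes. `CsvField` and `CsvRow` are hypothetical helper names, and the sample row is made up for illustration.

```csharp
using System;
using System.Globalization;
using System.Linq;

// Quote a field only when needed; double embedded quotes (RFC 4180 convention).
string CsvField(string value)
{
    bool needsQuotes = value.IndexOfAny(new[] { ',', '"', '\r', '\n' }) >= 0;
    return needsQuotes ? "\"" + value.Replace("\"", "\"\"") + "\"" : value;
}

// Join already-stringified fields into one CSV line.
string CsvRow(params string[] fields) => string.Join(",", fields.Select(CsvField));

// Format numbers with the invariant culture so the decimal separator is always '.'.
float score = 0.93f;
string confidence = score.ToString("F2", CultureInfo.InvariantCulture);

string row = CsvRow("0", "Organization", "Acme \"Global\" Corp, Inc.", confidence);
Console.WriteLine(row);
// prints: 0,Organization,"Acme ""Global"" Corp, Inc.",0.93
```

Routing every field through one function means a label or value that happens to contain a comma or line break cannot silently shift columns in the output file.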

Step 7: Using Guidance for Better Extraction

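The Guidance property gives the model context about your domain, which improves accuracy on ambiguous text. Because it is set on the instance, it applies to every later Recognize call until you change it: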

ner.Guidance = "This is a financial earnings report. " +
    "Treat ticker symbols as Organization entities. " +
    "Treat fiscal quarters (Q1, Q2, etc.) as Date entities.";

string financial = """
    AAPL reported Q2 FY2025 revenue of $94.8 billion, up 5% YoY. MSFT posted
    $61.9 billion in the same period. Both companies exceeded Wall Street estimates
    by approximately 3%.
    """;

var financialEntities = ner.Recognize(financial);

foreach (var entity in financialEntities)
{
    Console.WriteLine($"  [{entity.EntityDefinition.Label}] {entity.Value}");
}

Common Issues

Problem | Cause | Fix
Missing entities | Default entity types do not cover your domain | Add custom EntityDefinition objects for domain-specific types
Wrong entity type | Ambiguous text (e.g., "Apple" as company vs. fruit) | Add Guidance with domain context
Duplicate entities | Same entity appears multiple times | Check entity.Occurrences.Count; deduplicate by Value
Low confidence | Short or ambiguous input | Use a larger model (gemma3:12b) for nuanced text
Slow on large documents | Document exceeds context window | Set MaxContextLength or split into paragraphs
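
Two of the fixes in the table, splitting into paragraphs and deduplicating by value, are plain pre- and post-processing steps that need no library support. The sketch below shows one way to split a long text on blank lines before recognition and to merge repeated values afterwards. `SplitIntoParagraphs`, `MergeDuplicates`, and the tuple shape are hypothetical stand-ins. In a real pipeline each paragraph would be passed to `Recognize` between the two steps.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Split on blank lines so each paragraph can be processed on its own.
List<string> SplitIntoParagraphs(string text) =>
    text.Replace("\r\n", "\n")
        .Split("\n\n", StringSplitOptions.RemoveEmptyEntries)
        .Select(p => p.Trim())
        .Where(p => p.Length > 0)
        .ToList();

// Merge hits that share a label and a value (case-insensitive),
// keeping the highest-confidence hit for each pair.
List<(string Label, string Value, float Confidence)> MergeDuplicates(
    IEnumerable<(string Label, string Value, float Confidence)> hits) =>
    hits.GroupBy(h => (h.Label, Value: h.Value.ToUpperInvariant()))
        .Select(g => g.OrderByDescending(h => h.Confidence).First())
        .ToList();

var paragraphs = SplitIntoParagraphs("Acme hired Jane Smith.\n\nJane Smith starts Monday.\n");
Console.WriteLine($"{paragraphs.Count} paragraphs");

// Hypothetical per-paragraph results, as if collected from two Recognize calls.
var merged = MergeDuplicates(new[]
{
    ("Person", "Jane Smith", 0.91f),
    ("Organization", "Acme", 0.88f),
    ("Person", "jane smith", 0.95f),
});

foreach (var hit in merged)
{
    Console.WriteLine($"[{hit.Label}] {hit.Value} ({hit.Confidence:F2})");
}
```

Grouping on label plus value, rather than value alone, keeps a string such as "Apple" separate when it was tagged once as an Organization and once as a Product. Drop the label from the key if you want the stricter value-only merge the table describes.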

Next Steps