Understanding Named Entity Recognition (NER) in LM-Kit.NET
TL;DR
Named Entity Recognition (NER) identifies and extracts specific entities, such as people, organizations, locations, and dates, from text or images. In LM-Kit.NET, the NamedEntityRecognition class uses multimodal language models to detect and label these entities. It works with the built-in entity types or with types you define yourself, and its output can feed data analysis and processing workflows.
What is Named Entity Recognition?
Definition: Named Entity Recognition is an NLP task that identifies and categorizes key pieces of information (entities) within unstructured text or images, tagging them with predefined labels such as person, location, organization, date, and more.
- Extraction: Detects exact entity occurrences.
- Classification: Assigns each detected entity a specific category or label.
- Customization: Supports built-in entity types or fully custom user-defined labels.
Why Use NER?
- Structured Insights: Convert unstructured data into structured, actionable insights.
- Data Enrichment: Enhance datasets for improved analytics and decision-making.
- Automation Efficiency: Reduce manual data tagging, saving time and resources.
Key Class: NamedEntityRecognition
Located in LMKit.TextAnalysis, NamedEntityRecognition encapsulates the NER logic:
public class NamedEntityRecognition
{
    // Constructor using default built-in entity definitions
    public NamedEntityRecognition(LM model);

    // Constructor using custom entity definitions
    public NamedEntityRecognition(LM model, List<EntityDefinition> definitions);

    // Confidence score of last recognition
    public float Confidence { get; }

    // Define entities to extract
    public List<EntityDefinition> EntityDefinitions { get; set; }

    // Preferred modality (text, image, multimodal)
    public InferenceModality PreferredInferenceModality { get; set; }

    // Synchronous recognition
    public IEnumerable<NamedEntity> Recognize(string content, CancellationToken token);

    // Asynchronous recognition
    public Task<IEnumerable<NamedEntity>> RecognizeAsync(string content, CancellationToken token);
}
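The asynchronous overload and the Confidence property listed above can be combined with a timeout. The following is a minimal sketch, not a definitive implementation. It uses only the members shown in the listing; the LMKit.Model namespace for LM, the 30-second limit, the sample sentence, and the assumption that cancellation surfaces as OperationCanceledException are illustrative and should be checked against the API reference.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;
using LMKit.Model;        // assumption: namespace of the LM class
using LMKit.TextAnalysis; // namespace named in the text above

// Cancel the request if it runs longer than 30 seconds (arbitrary value).
using var cts = new CancellationTokenSource(TimeSpan.FromSeconds(30));

var model = LM.LoadFromModelID("gemma3:12b");
var ner = new NamedEntityRecognition(model);

try
{
    // Illustrative input sentence, not taken from the LM-Kit documentation.
    var entities = await ner.RecognizeAsync(
        "Marie Curie was born in Warsaw in 1867.", cts.Token);

    foreach (var entity in entities)
    {
        Console.WriteLine($"{entity.EntityType}: {entity.Value}");
    }

    // Confidence refers to the most recent recognition call.
    Console.WriteLine($"Confidence: {ner.Confidence}");
}
catch (OperationCanceledException)
{
    // Assumption: a cancelled token is reported through this exception type.
    Console.WriteLine("Recognition was cancelled before it finished.");
}
```

Passing a token tied to a timeout keeps a slow model call from blocking a request pipeline indefinitely; CancellationToken.None remains fine for scripts and quick experiments.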
Supporting Types
EntityDefinition
Defines entity types to recognize:
- Built-in Types: Person, Organization, Location, Date, etc.
- Custom Types: User-defined labels for specialized entities.
NamedEntity
Represents identified entities:
- Value: Exact matched text or image data.
- EntityType: Predefined or custom category.
- Position: StartIndex and EndIndex for positional context in text.
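To illustrate what StartIndex and EndIndex make possible, here is a small sketch that annotates the source text in place. It does not call LM-Kit: EntitySpan and Annotator are hypothetical stand-ins defined only for this example, the sample data is hand-written, and the sketch assumes StartIndex is inclusive and EndIndex is exclusive, which the text above does not specify.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hand-written sample data standing in for recognition results.
string text = "Ada Lovelace met Charles Babbage in London.";
var spans = new List<EntitySpan>
{
    new("Ada Lovelace", "Person", 0, 12),
    new("Charles Babbage", "Person", 17, 32),
    new("London", "Location", 36, 42),
};

Console.WriteLine(Annotator.Annotate(text, spans));
// prints "[Ada Lovelace|Person] met [Charles Babbage|Person] in [London|Location]."

// Hypothetical stand-in for a recognized entity (not the LM-Kit type).
public record EntitySpan(string Value, string EntityType, int StartIndex, int EndIndex);

public static class Annotator
{
    // Wraps each span as "[text|type]". Assumes non-overlapping spans with
    // an inclusive StartIndex and an exclusive EndIndex.
    public static string Annotate(string text, IEnumerable<EntitySpan> spans)
    {
        var result = text;

        // Process right to left so earlier offsets stay valid after each edit.
        foreach (var s in spans.OrderByDescending(s => s.StartIndex))
        {
            int length = s.EndIndex - s.StartIndex;
            string label = $"[{result.Substring(s.StartIndex, length)}|{s.EntityType}]";
            result = result.Remove(s.StartIndex, length).Insert(s.StartIndex, label);
        }

        return result;
    }
}
```

If EndIndex turns out to be inclusive in the actual API, the length computation needs a + 1.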
Quickstart Example
Here's a quick example demonstrating basic NER usage:
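using System;
using System.Threading;
using LMKit.Model;        // assumed namespace of the LM class
using LMKit.TextAnalysis; // namespace of NamedEntityRecognition

// Load a model and create a recognizer with the built-in entity types
var model = LM.LoadFromModelID("gemma3:12b");
var ner = new NamedEntityRecognition(model);
string text = "OpenAI was founded by Sam Altman in San Francisco on December 11, 2015.";
// Run recognition synchronously and print each entity with its type
var entities = ner.Recognize(text, CancellationToken.None);
foreach (var entity in entities)
{
    Console.WriteLine($"Entity: {entity.Value}, Type: {entity.EntityType}");
}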
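Once entities are recognized, a common next step is to group them by type and drop repeated mentions. The sketch below shows one way to do that with LINQ. It does not call LM-Kit: Found and EntityGrouping are hypothetical stand-ins, and the sample list is hand-written rather than real model output.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hand-written sample data; in practice this would come from Recognize(...).
var found = new List<Found>
{
    new("OpenAI", "Organization"),
    new("Sam Altman", "Person"),
    new("San Francisco", "Location"),
    new("OpenAI", "Organization"), // repeated mention
};

var byType = EntityGrouping.GroupByType(found);
foreach (var (type, values) in byType)
{
    Console.WriteLine($"{type}: {string.Join(", ", values)}");
}

// Hypothetical stand-in for a recognized entity (not the LM-Kit type).
public record Found(string Value, string EntityType);

public static class EntityGrouping
{
    // Maps each entity type to the distinct values found for it.
    public static Dictionary<string, List<string>> GroupByType(IEnumerable<Found> entities) =>
        entities
            .GroupBy(e => e.EntityType)
            .ToDictionary(g => g.Key, g => g.Select(e => e.Value).Distinct().ToList());
}
```

The resulting dictionary is a convenient shape for populating database columns or JSON fields, one key per entity type.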
Customizing Entity Recognition
Customize your NER setup to detect domain-specific entities:
var customDefinitions = new List<EntityDefinition>
{
    // Custom, user-defined label
    new EntityDefinition("PatentNumber"),
    // Built-in entity type
    new EntityDefinition(NamedEntityRecognition.NamedEntityType.Date)
};
var nerCustom = new NamedEntityRecognition(model, customDefinitions);
Key Terms
- Entity: Key information such as names, locations, or dates.
- Entity Type: Category labels like Person, Organization, etc.
- Multimodal: Ability to process both text and image data.
- Confidence Score: Indicates reliability of extracted entities.
Summary
In LM-Kit.NET, the NamedEntityRecognition class identifies and categorizes key entities in unstructured text or images. It supports built-in and custom entity types as well as multimodal input, and it returns each entity with its value, type, and position, so the results can drive automation, analytics, and other data-processing workflows.
Related API Documentation
- NamedEntityRecognition: Core NER class
- EntityDefinition: Define entity types
- NamedEntity: Recognized entity result
- TextExtraction: Structured data extraction
Related Glossary Topics
- Extraction
- Structured Data Extraction
- Classification
- Semantic Similarity
- RAG (Retrieval-Augmented Generation)
- LLM
- Inference
- Prompt Engineering
- Dynamic Sampling
- Function Calling
External Resources
- LM-Kit NER Demo: Step-by-step tutorial
- spaCy NER: NER concepts reference