Understanding Named Entity Recognition (NER) in LM-Kit.NET
TL;DR:
Named Entity Recognition (NER) identifies and extracts specific entities, such as people, organizations, locations, and dates, from text or images. In LM-Kit.NET, the NamedEntityRecognition class uses multimodal language models to detect and label entities. It can be configured with built-in or user-defined entity types and supports data analysis and processing workflows.
What is Named Entity Recognition?
Definition: Named Entity Recognition is an NLP task that identifies and categorizes key pieces of information (entities) within unstructured text or images, tagging them with predefined labels such as person, location, organization, date, and more.
- Extraction: Detects exact entity occurrences.
- Classification: Assigns each detected entity a specific category or label.
- Customization: Supports built-in entity types or fully custom user-defined labels.
Why Use NER?
- Structured Insights: Convert unstructured data into structured, actionable insights.
- Data Enrichment: Enhance datasets for improved analytics and decision-making.
- Automation Efficiency: Reduce manual data tagging, saving time and resources.
Key Class: NamedEntityRecognition
Located in the LMKit.TextAnalysis namespace, NamedEntityRecognition encapsulates the NER logic:
public class NamedEntityRecognition
{
    // Constructor using default built-in entity definitions
    public NamedEntityRecognition(LM model);

    // Constructor using custom entity definitions
    public NamedEntityRecognition(LM model, List<EntityDefinition> definitions);

    // Confidence score of last recognition
    public float Confidence { get; }

    // Define entities to extract
    public List<EntityDefinition> EntityDefinitions { get; set; }

    // Preferred modality (text, image, multimodal)
    public InferenceModality PreferredInferenceModality { get; set; }

    // Synchronous recognition
    public IEnumerable<NamedEntity> Recognize(string content, CancellationToken token);

    // Asynchronous recognition
    public Task<IEnumerable<NamedEntity>> RecognizeAsync(string content, CancellationToken token);
}
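Because both recognition methods accept a CancellationToken, a call can be bounded by a timeout. The following is a minimal sketch of that idea, not part of LM-Kit.NET: RecognitionGuard and WithTimeoutAsync are hypothetical helper names, the recognizer is passed in as a delegate so the example runs without a model, and it assumes that cancellation surfaces as an OperationCanceledException, which is the usual .NET convention but is not confirmed here for this library.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Threading;
using System.Threading.Tasks;

// Demo with a fake recognizer that never completes, so the timeout path runs.
var result = await RecognitionGuard.WithTimeoutAsync<string>(
    async (content, token) =>
    {
        await Task.Delay(Timeout.Infinite, token); // ends only when cancelled
        return Enumerable.Empty<string>();
    },
    "some text",
    TimeSpan.FromMilliseconds(100));

Console.WriteLine(result is null ? "timed out" : $"{result.Count} entities");
// prints: timed out

// Hypothetical helper (not an LM-Kit.NET type).
public static class RecognitionGuard
{
    // Runs an async recognizer and returns null if it does not finish in time.
    // ASSUMPTION: the recognizer throws OperationCanceledException on cancellation.
    public static async Task<IReadOnlyList<T>?> WithTimeoutAsync<T>(
        Func<string, CancellationToken, Task<IEnumerable<T>>> recognizeAsync,
        string content,
        TimeSpan timeout)
    {
        // The token is cancelled automatically once the timeout elapses.
        using var cts = new CancellationTokenSource(timeout);
        try
        {
            var entities = await recognizeAsync(content, cts.Token);
            return entities.ToList();
        }
        catch (OperationCanceledException)
        {
            return null;
        }
    }
}
```

With a real recognizer, the delegate would simply forward to the method listed above, for example (content, token) => ner.RecognizeAsync(content, token).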
Supporting Types
EntityDefinition
Defines entity types to recognize:
- Built-in Types: Person, Organization, Location, Date, etc.
- Custom Types: User-defined labels for specialized entities.
NamedEntity
Represents identified entities:
- Value: Exact matched text or image data.
- EntityType: Predefined or custom category.
- Position: StartIndex and EndIndex for positional context in text.
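The position fields make it possible to mark entities directly in the source text. The sketch below illustrates this with a hypothetical EntitySpan record standing in for NamedEntity, so it runs without a model. It assumes that EndIndex is exclusive (one past the last character) and that the entity type can be shown as a string; neither is confirmed above, so check the API reference before relying on it. SpanFormatter and Annotate are illustrative names.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;

// Demo: spans are written by hand here; in practice they would come from the recognizer.
string input = "Sam lives in Paris";
var detected = new List<EntitySpan>
{
    new EntitySpan("Paris", "Location", 13, 18),
    new EntitySpan("Sam", "Person", 0, 3),
};
Console.WriteLine(SpanFormatter.Annotate(input, detected));
// prints: [Sam|Person] lives in [Paris|Location]

// Hypothetical stand-in for NamedEntity. ASSUMPTION: EndIndex is exclusive.
public record EntitySpan(string Value, string EntityType, int StartIndex, int EndIndex);

public static class SpanFormatter
{
    // Wraps each entity span in [text|Type] markers, leaving other text unchanged.
    public static string Annotate(string text, IEnumerable<EntitySpan> spans)
    {
        var sb = new StringBuilder();
        int cursor = 0;
        foreach (var span in spans.OrderBy(e => e.StartIndex))
        {
            // Skip spans that overlap an earlier one, are empty, or fall outside the text.
            if (span.StartIndex < cursor || span.EndIndex > text.Length || span.EndIndex <= span.StartIndex)
                continue;

            sb.Append(text, cursor, span.StartIndex - cursor);
            sb.Append('[')
              .Append(text, span.StartIndex, span.EndIndex - span.StartIndex)
              .Append('|')
              .Append(span.EntityType)
              .Append(']');
            cursor = span.EndIndex;
        }
        sb.Append(text, cursor, text.Length - cursor);
        return sb.ToString();
    }
}
```

Sorting by StartIndex and tracking a cursor keeps the output stable even when the recognizer returns entities out of order.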
Quickstart Example
Here's a quick example demonstrating basic NER usage:
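using System;
using System.Threading;
using LMKit.Model;        // LM
using LMKit.TextAnalysis; // NamedEntityRecognition

// Load a model, then create a recognizer with the built-in entity definitions
var lmModel = new LM("model-path");
var ner = new NamedEntityRecognition(lmModel);

string text = "OpenAI was founded by Sam Altman in San Francisco on December 11, 2015.";
var entities = ner.Recognize(text, CancellationToken.None);

// Print each detected entity with its type
foreach (var entity in entities)
{
    Console.WriteLine($"Entity: {entity.Value}, Type: {entity.EntityType}");
}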
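A common next step after recognition is to group the results by entity type. The sketch below shows one way to do that with a hypothetical Entity record standing in for NamedEntity, so it runs without a model; the string-typed EntityType and the EntityIndex helper are assumptions made for illustration, and the sample values are written by hand rather than produced by a model.

```csharp
using System;
using System.Collections.Generic;

// Hand-written sample data standing in for recognizer output.
var found = new List<Entity>
{
    new Entity("OpenAI", "Organization"),
    new Entity("Sam Altman", "Person"),
    new Entity("San Francisco", "Location"),
    new Entity("December 11, 2015", "Date"),
    new Entity("OpenAI", "Organization"), // repeated mention
};

foreach (var (type, values) in EntityIndex.GroupByType(found))
    Console.WriteLine($"{type}: {string.Join("; ", values)}");
// prints:
// Date: December 11, 2015
// Location: San Francisco
// Organization: OpenAI
// Person: Sam Altman

// Hypothetical stand-in for NamedEntity (only the two fields used here).
public record Entity(string Value, string EntityType);

public static class EntityIndex
{
    // Groups distinct entity values under their type, with types in ordinal order.
    public static SortedDictionary<string, List<string>> GroupByType(IEnumerable<Entity> entities)
    {
        var index = new SortedDictionary<string, List<string>>(StringComparer.Ordinal);
        foreach (var entity in entities)
        {
            if (!index.TryGetValue(entity.EntityType, out var bucket))
            {
                bucket = new List<string>();
                index[entity.EntityType] = bucket;
            }
            if (!bucket.Contains(entity.Value))
                bucket.Add(entity.Value); // keep first-seen order, drop repeats
        }
        return index;
    }
}
```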
Customizing Entity Recognition
Customize your NER setup to detect domain-specific entities:
// Mix a custom label with a built-in entity type
var customDefinitions = new List<EntityDefinition>
{
    new EntityDefinition("PatentNumber"),
    new EntityDefinition(NamedEntityRecognition.NamedEntityType.Date)
};

// lmModel is an already loaded LM instance
var nerCustom = new NamedEntityRecognition(lmModel, customDefinitions);
Common Terms
- Entity: Key information such as names, locations, or dates.
- Entity Type: Category labels like Person, Organization, etc.
- Multimodal: Ability to process both text and image data.
- Confidence Score: Indicates reliability of extracted entities.
Related Concepts
- Text Classification: Categorizing entire texts or documents.
- Semantic Search: Finding information based on contextual meaning.
- Retrieval Augmented Generation (RAG): Enhancing generation with relevant external context.
Summary
In LM-Kit.NET, the NamedEntityRecognition class identifies and categorizes key entities in unstructured text or images. Built-in and custom entity types, together with multimodal processing, let it feed structured results into automation, analytics, and data-driven decision-making workflows.