
๐Ÿ” Understanding Named Entity Recognition (NER) in LM-Kit.NET


📄 TL;DR:

Named Entity Recognition (NER) identifies and extracts specific entities, such as people, organizations, locations, and dates, from text or images. In LM-Kit.NET, the NamedEntityRecognition class uses multimodal language models to detect and label these entities. It works with built-in entity types or user-defined ones, so its output can feed directly into data analysis and processing workflows.


๐Ÿ“ What is Named Entity Recognition?

Definition: Named Entity Recognition is an NLP task that identifies and categorizes key pieces of information (entities) within unstructured text or images, tagging them with predefined labels such as person, location, organization, date, and more.

  • Extraction: Detects exact entity occurrences.
  • Classification: Assigns each detected entity a specific category or label.
  • Customization: Supports built-in entity types or fully custom user-defined labels.

🎯 Why Use NER?

  1. Structured Insights: Convert unstructured data into structured, actionable insights.
  2. Data Enrichment: Enhance datasets for improved analytics and decision-making.
  3. Automation Efficiency: Reduce manual data tagging, saving time and resources.

โš™๏ธ Key Class: NamedEntityRecognition

Located in LMKit.TextAnalysis, NamedEntityRecognition encapsulates the NER logic:

public class NamedEntityRecognition
{
    // Constructor using default built-in entity definitions
    public NamedEntityRecognition(LM model);

    // Constructor using custom entity definitions
    public NamedEntityRecognition(LM model, List<EntityDefinition> definitions);

    // Confidence score of last recognition
    public float Confidence { get; }

    // Define entities to extract
    public List<EntityDefinition> EntityDefinitions { get; set; }

    // Preferred modality (text, image, multimodal)
    public InferenceModality PreferredInferenceModality { get; set; }

    // Synchronous recognition
    public IEnumerable<NamedEntity> Recognize(string content, CancellationToken token);

    // Asynchronous recognition
    public Task<IEnumerable<NamedEntity>> RecognizeAsync(string content, CancellationToken token);
}
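
Both recognition methods accept a CancellationToken, which makes it possible to bound how long a call may run. The following is a minimal sketch of that idea, not LM-Kit code: `WithTimeoutAsync` is a hypothetical helper, and `FakeRecognizer` is a stand-in that produces a delegate with the same shape as `RecognizeAsync`, so the snippet runs without a model.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical helper (not part of LM-Kit): runs a recognizer-shaped delegate
// and cancels it through its CancellationToken once the timeout elapses.
static async Task<(bool Completed, IReadOnlyList<T> Entities)> WithTimeoutAsync<T>(
    Func<string, CancellationToken, Task<IEnumerable<T>>> recognizeAsync,
    string input,
    TimeSpan timeout)
{
    using var cts = new CancellationTokenSource(timeout);
    try
    {
        IEnumerable<T> result = await recognizeAsync(input, cts.Token);
        return (true, result.ToList());
    }
    catch (OperationCanceledException)
    {
        // The token fired before the recognizer finished.
        return (false, Array.Empty<T>());
    }
}

// Stand-in for RecognizeAsync: waits for delayMs, then returns the input's words.
static Func<string, CancellationToken, Task<IEnumerable<string>>> FakeRecognizer(int delayMs) =>
    async (input, token) =>
    {
        await Task.Delay(delayMs, token);
        return input.Split(' ');
    };

var quick = await WithTimeoutAsync(FakeRecognizer(10), "Sam Altman", TimeSpan.FromSeconds(5));
var slow = await WithTimeoutAsync(FakeRecognizer(5000), "Sam Altman", TimeSpan.FromMilliseconds(50));

Console.WriteLine($"quick: completed={quick.Completed}, count={quick.Entities.Count}");
Console.WriteLine($"slow:  completed={slow.Completed}, count={slow.Entities.Count}");
```

The helper returns a Completed flag rather than an empty list alone, so a timeout cannot be mistaken for a text that simply contains no entities. With a real recognizer, the `RecognizeAsync` method listed above would take the place of the stand-in delegate.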

📌 Supporting Types

EntityDefinition

Defines entity types to recognize:

  • Built-in Types: Person, Organization, Location, Date, etc.
  • Custom Types: User-defined labels for specialized entities.

NamedEntity

Represents identified entities:

  • Value: Exact matched text or image data.
  • EntityType: Predefined or custom category.
  • Position: StartIndex and EndIndex for positional context in text.
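
Positional data is what lets an application highlight or redact entities in the original text. The sketch below illustrates this with stand-in tuples instead of real NamedEntity objects. `Annotate` and `Span` are hypothetical helpers, the offsets are computed with `IndexOf` rather than by a model, and EndIndex is assumed to be exclusive, which this page does not specify.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;

// Hypothetical helper: wraps each span in "[Type: text]".
// Assumes End is exclusive and that spans do not overlap.
static string Annotate(string source, IEnumerable<(string Type, int Start, int End)> spans)
{
    var sb = new StringBuilder(source);
    // Work right to left so earlier offsets stay valid after each insertion.
    foreach (var (type, start, end) in spans.OrderByDescending(s => s.Start))
    {
        sb.Insert(end, "]");
        sb.Insert(start, $"[{type}: ");
    }
    return sb.ToString();
}

// Stand-in for a recognized entity: locates the value with IndexOf.
static (string Type, int Start, int End) Span(string source, string type, string value)
{
    int start = source.IndexOf(value, StringComparison.Ordinal);
    return (type, start, start + value.Length);
}

string sentence = "OpenAI was founded by Sam Altman in San Francisco on December 11, 2015.";
var spans = new[]
{
    Span(sentence, "Organization", "OpenAI"),
    Span(sentence, "Person", "Sam Altman"),
    Span(sentence, "Location", "San Francisco"),
};

string annotated = Annotate(sentence, spans);
Console.WriteLine(annotated);
// [Organization: OpenAI] was founded by [Person: Sam Altman] in [Location: San Francisco] on December 11, 2015.
```

If EndIndex turns out to be inclusive in the actual API, the closing bracket would need to be inserted at `end + 1` instead.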

🚀 Quickstart Example

Here's a quick example demonstrating basic NER usage:

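using System;
using System.Threading;
using LMKit.Model;        // assumed namespace of the LM class
using LMKit.TextAnalysis; // namespace of NamedEntityRecognition

// Replace "model-path" with the location of a model file
var lmModel = new LM("model-path");
var ner = new NamedEntityRecognition(lmModel);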

string text = "OpenAI was founded by Sam Altman in San Francisco on December 11, 2015.";
var entities = ner.Recognize(text, CancellationToken.None);

foreach (var entity in entities)
{
    Console.WriteLine($"Entity: {entity.Value}, Type: {entity.EntityType}");
}
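
Once entities are extracted, a common next step is to group them by type so they can be stored or queried as structured data. The following sketch shows one way to do that with LINQ. It uses stand-in tuples shaped like the Value and EntityType pairs printed above instead of real recognition results, and `GroupByType` is a hypothetical helper, not an LM-Kit method.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper: groups entity values by type and drops repeated values.
static Dictionary<string, List<string>> GroupByType(
    IEnumerable<(string Value, string EntityType)> entities)
{
    return entities
        .GroupBy(e => e.EntityType)
        .ToDictionary(g => g.Key, g => g.Select(e => e.Value).Distinct().ToList());
}

// Stand-in results; a real run would project each recognized entity to such a pair.
var found = new (string Value, string EntityType)[]
{
    ("OpenAI", "Organization"),
    ("Sam Altman", "Person"),
    ("San Francisco", "Location"),
    ("December 11, 2015", "Date"),
    ("OpenAI", "Organization"), // repeated mention, kept only once
};

var byType = GroupByType(found);
foreach (var (type, values) in byType)
{
    Console.WriteLine($"{type}: {string.Join(", ", values)}");
}
```

The resulting dictionary can be serialized or written to a database row per type, which is the kind of structured output described under "Why Use NER?".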

🔧 Customizing Entity Recognition

Customize your NER setup to detect domain-specific entities:

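// Mix a user-defined label with a built-in entity type
// (List<T> requires: using System.Collections.Generic;)
var customDefinitions = new List<EntityDefinition>
{
    new EntityDefinition("PatentNumber"),                             // custom label
    new EntityDefinition(NamedEntityRecognition.NamedEntityType.Date) // built-in type
};

// lmModel is the model instance created in the quickstart
var nerCustom = new NamedEntityRecognition(lmModel, customDefinitions);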

📖 Common Terms

  • Entity: Key information such as names, locations, or dates.
  • Entity Type: Category labels like Person, Organization, etc.
  • Multimodal: Ability to process both text and image data.
  • Confidence Score: Indicates reliability of extracted entities.

🔗 Related Concepts

  • Text Classification: Categorizing entire texts or documents.
  • Semantic Search: Finding information based on contextual meaning.
  • Retrieval Augmented Generation (RAG): Enhancing generation with relevant external context.

๐Ÿ“ Summary

In LM-Kit.NET, the NamedEntityRecognition class identifies and categorizes key entities in unstructured text or images. Because it supports both custom entity definitions and multimodal input, it can turn raw content into structured data for automation, analytics, and decision-making workflows.