
Class KeywordExtraction

Namespace
LMKit.TextAnalysis
Assembly
LM-Kit.NET.dll

Extracts the most important keywords or phrases from a piece of content, up to a requested count and subject to constraints such as a maximum n-gram size.

public sealed class KeywordExtraction
Inheritance
KeywordExtraction

Remarks

This class supports extraction from both text and images.

Constructors

KeywordExtraction(LM)

Initializes a new instance of the KeywordExtraction class with the specified language model.

Properties

Confidence

Gets the confidence score of the last keyword extraction operation. This score is influenced by how the model terminated and how well the generation followed the requested schema.

Guidance

Gets or sets optional guidance text that can influence the extraction process. This can be used to steer the model towards certain themes or constraints.

KeywordCount

Gets or sets the target number of keywords to extract. This value is an upper bound, not a guarantee: the model may return fewer keywords than requested, depending on its capabilities and the input data, but it never returns more than the specified number.

Default: 5

MaxNgramSize

Gets or sets the maximum allowed n-gram size for extracted keywords. This value serves as guidance for the extraction process; it does not guarantee that every extracted keyword or phrase stays within the specified size. The output depends on the model's capabilities and the characteristics of the input data.

Default: 3
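
Because MaxNgramSize is described as guidance rather than a hard limit, callers that need a strict bound may want to filter the results themselves. The following is a minimal sketch of such a post-filter. It does not use LM-Kit at all: NgramFilter, WordCount, and WithinMaxNgramSize are hypothetical helper names, and it assumes that n-gram size can be approximated by counting whitespace-separated words, which may differ from how the library measures it.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper, not part of LM-Kit.NET.
public static class NgramFilter
{
    private static readonly char[] Whitespace = { ' ', '\t', '\n', '\r' };

    // Counts whitespace-separated words in a phrase.
    // Assumption: one word corresponds to one unit of n-gram size.
    public static int WordCount(string phrase) =>
        phrase.Split(Whitespace, StringSplitOptions.RemoveEmptyEntries).Length;

    // Keeps only the phrases whose word count does not exceed maxNgramSize.
    public static List<string> WithinMaxNgramSize(IEnumerable<string> phrases, int maxNgramSize) =>
        phrases.Where(p => WordCount(p) <= maxNgramSize).ToList();
}
```

For example, filtering the phrases "neural network", "large language model inference", and "tokenizer" with a maximum size of 3 keeps the first and third and drops the four-word phrase.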

MaximumContextLength

Gets or sets the maximum context length (in tokens) that can be used for the model input. Reducing this value can dramatically increase inference speed on CPUs, as the computation scales with context length. However, this comes at the cost of higher perplexity, which may reduce the quality of model outputs. The value is clamped to the model's inherent maximum context length.

Default: Automatically determined based on hardware capabilities and model constraints (commonly ranging from 2048 to 8192 tokens).

Model

Gets the language model used for keyword extraction.

TextShrinkingStrategy

Gets or sets the strategy used to shrink content when the input exceeds the defined MaximumContextLength. Different strategies trade off between preserving semantic integrity and aggressively reducing length.

Default: Auto

Methods

ExtractKeywords(Attachment, CancellationToken)

Extracts a set of keywords synchronously from an image provided as an Attachment.

ExtractKeywords(string, CancellationToken)

Extracts a set of keywords synchronously from the given text content.
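
A synchronous call might look like the sketch below. It uses only members listed on this page, with several assumptions: the LM class is taken to live in the LMKit.Model namespace, the caller supplies an already-loaded LM instance, and the return value is assumed to be enumerable (its element type is not shown here, so the loop simply prints each item). SyncExtractionSketch and PrintKeywords are hypothetical names, and the property values are arbitrary.

```csharp
using System;
using System.Threading;
using LMKit.Model;          // assumption: namespace that contains the LM class
using LMKit.TextAnalysis;

// Hypothetical wrapper used only for illustration.
public static class SyncExtractionSketch
{
    public static void PrintKeywords(LM model, string text)
    {
        var extractor = new KeywordExtraction(model)
        {
            KeywordCount = 8,   // upper bound on the number of keywords
            MaxNgramSize = 2,   // guidance only, not a hard limit
            Guidance = "Prefer technical terms over general vocabulary."
        };

        // Assumption: the result can be enumerated with foreach.
        var keywords = extractor.ExtractKeywords(text, CancellationToken.None);

        foreach (var keyword in keywords)
        {
            // Prints each item's default string form; the item type may expose richer members.
            Console.WriteLine(keyword);
        }

        // Confidence reflects the most recent extraction on this instance.
        Console.WriteLine($"Confidence: {extractor.Confidence}");
    }
}
```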

ExtractKeywordsAsync(Attachment, CancellationToken)

Asynchronously extracts a set of keywords from an image provided as an Attachment.

ExtractKeywordsAsync(string, CancellationToken)

Asynchronously extracts a set of keywords from the provided text content.
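
The asynchronous overload can be combined with a timeout through a CancellationTokenSource, as in the sketch below. The assumptions are the same as for any code written against this page alone: the LM class is taken to live in LMKit.Model, the method is assumed to return an awaitable whose result is enumerable, and cancellation is assumed to surface as an OperationCanceledException. AsyncExtractionSketch and PrintKeywordsAsync are hypothetical names, and the 30-second timeout is arbitrary.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;
using LMKit.Model;          // assumption: namespace that contains the LM class
using LMKit.TextAnalysis;

// Hypothetical wrapper used only for illustration.
public static class AsyncExtractionSketch
{
    public static async Task PrintKeywordsAsync(LM model, string text)
    {
        var extractor = new KeywordExtraction(model);

        // Cancel the request automatically if it runs longer than 30 seconds.
        using var cts = new CancellationTokenSource(TimeSpan.FromSeconds(30));

        try
        {
            // Assumption: the awaited result can be enumerated with foreach.
            var keywords = await extractor.ExtractKeywordsAsync(text, cts.Token);

            foreach (var keyword in keywords)
            {
                Console.WriteLine(keyword);
            }
        }
        catch (OperationCanceledException)
        {
            // Assumption: the library reports cancellation with this exception type.
            Console.WriteLine("Keyword extraction was cancelled.");
        }
    }
}
```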