Class KeywordExtraction

Namespace
LMKit.TextAnalysis
Assembly
LM-Kit.NET.dll

Extracts the most important keywords or key phrases from a piece of content. The target number of keywords and the maximum n-gram size of each keyword are configurable.

public sealed class KeywordExtraction
Inheritance
KeywordExtraction
Examples

using System;
using LMKit.Model; // assumed namespace of the LLM class
using LMKit.TextAnalysis;

// Initialize the KeywordExtraction engine with a given model:
LLM model = new(new Uri("https://path-to-your-model"));
KeywordExtraction extractor = new(model)
{
    KeywordCount = 5,
    MaxNgramSize = 3
};

// Extract keywords from some text:
var keywords = extractor.ExtractKeywords("This is some sample text about artificial intelligence and machine learning.");

// Print the extracted keywords:
foreach (var keyword in keywords)
{
    Console.WriteLine(keyword.Value);
}

Constructors

KeywordExtraction(LLM)

Initializes a new instance of the KeywordExtraction class that uses the specified language model.

Properties

Confidence

Gets the confidence score of the last keyword extraction operation. This score is influenced by how the model terminated and how well the generation followed the requested schema.
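
As a rough sketch, the score can be read right after an extraction call. The snippet assumes that LLM is declared in the LMKit.Model namespace, reuses the placeholder model URI, and prints the score as-is without assuming its numeric type or range.

```csharp
using System;
using LMKit.Model;        // assumption: namespace that declares LLM
using LMKit.TextAnalysis;

LLM model = new(new Uri("https://path-to-your-model")); // placeholder URI
KeywordExtraction extractor = new(model);

// Run an extraction first; the result itself is not needed for this sketch.
_ = extractor.ExtractKeywords("Transformers rely on attention to model long-range dependencies.");

// Confidence describes the most recent extraction, so read it after the call.
Console.WriteLine($"Confidence: {extractor.Confidence}");
```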

Guidance

Gets or sets optional guidance text that can influence the extraction process. This can be used to steer the model towards certain themes or constraints.
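
A minimal sketch of steering the extraction with guidance text follows. It assumes Guidance is a string property and that LLM is declared in LMKit.Model; the guidance wording and the sample input are purely illustrative.

```csharp
using System;
using LMKit.Model;        // assumption: namespace that declares LLM
using LMKit.TextAnalysis;

LLM model = new(new Uri("https://path-to-your-model")); // placeholder URI
KeywordExtraction extractor = new(model)
{
    KeywordCount = 5,
    // Illustrative guidance text; phrase it to match your own domain.
    Guidance = "Focus on medical terminology and ignore company names."
};

var keywords = extractor.ExtractKeywords(
    "The trial compared two anticoagulants in patients with atrial fibrillation.");

foreach (var keyword in keywords)
{
    Console.WriteLine(keyword.Value);
}
```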

KeywordCount

Gets or sets the target number of keywords to extract. The model may return fewer keywords than requested, depending on its capabilities and the input data, but never more than this value. The value is clamped between 1 and 50.

MaxNgramSize

Gets or sets the maximum n-gram size (number of words per keyword) allowed for extracted keywords. This value guides the extraction process but does not guarantee that keywords of exactly this size, or any particular number of n-grams, will be produced. The output depends on the model's capabilities and the characteristics of the input data. The value is clamped between 1 and 20.
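
To make the clamping concrete, the following self-contained sketch mimics the documented ranges (1 to 50 for KeywordCount, 1 to 20 for MaxNgramSize) with a hypothetical helper named ClampSetting. It does not call LM-Kit.NET and only illustrates the assumed behavior that out-of-range assignments are pulled back to the nearest bound.

```csharp
using System;

// Hypothetical helper, not part of LM-Kit.NET: pulls a requested value
// back into the inclusive range [min, max].
static int ClampSetting(int requested, int min, int max) => Math.Clamp(requested, min, max);

Console.WriteLine(ClampSetting(100, 1, 50)); // KeywordCount range: prints 50
Console.WriteLine(ClampSetting(0, 1, 50));   // KeywordCount range: prints 1
Console.WriteLine(ClampSetting(25, 1, 20));  // MaxNgramSize range: prints 20
Console.WriteLine(ClampSetting(3, 1, 20));   // already in range: prints 3
```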

MaximumContextLength

Gets or sets the maximum context length (in tokens) that can be used for the model input. Reducing this value can dramatically increase inference speed on CPUs, as the computation scales with context length. However, this comes at the cost of higher perplexity, which may reduce the quality of model outputs. The value is clamped to the model's inherent maximum context length.
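
A short sketch of trading output quality for speed on CPU-bound machines follows. It assumes MaximumContextLength is an integer token count and that LLM is declared in LMKit.Model; the value 2048 is an arbitrary illustration, not a recommendation.

```csharp
using System;
using LMKit.Model;        // assumption: namespace that declares LLM
using LMKit.TextAnalysis;

LLM model = new(new Uri("https://path-to-your-model")); // placeholder URI
KeywordExtraction extractor = new(model)
{
    // Arbitrary example value; a request above the model's own limit
    // would be clamped to that limit, as documented.
    MaximumContextLength = 2048
};

// Read the property back to see the value that is actually in effect.
Console.WriteLine($"Effective context length: {extractor.MaximumContextLength}");
```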

Model

Gets the language model used for keyword extraction.

TextShrinkingStrategy

Gets or sets the strategy used to shrink content when the input exceeds the defined MaximumContextLength.

Different strategies trade off between preserving semantic integrity and aggressively reducing length.

By choosing the appropriate shrinking strategy, you can ensure that the input remains within the allowable context window while retaining the key information necessary for accurate keyword extraction.

Methods

ExtractKeywords(string, CancellationToken)

Extracts a set of keywords synchronously from the given content. If unsuccessful, throws an exception indicating the cause of the failure.

ExtractKeywordsAsync(string, CancellationToken)

Extracts a set of keywords asynchronously from the given content. If unsuccessful, throws an exception indicating the cause of the failure.
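
The asynchronous overload can be combined with a timeout, roughly as sketched below. The snippet assumes that ExtractKeywordsAsync returns an awaitable whose result can be enumerated like the synchronous result, that LLM is declared in LMKit.Model, and that cancellation surfaces as OperationCanceledException; the 30-second timeout is arbitrary.

```csharp
using System;
using System.Threading;
using LMKit.Model;        // assumption: namespace that declares LLM
using LMKit.TextAnalysis;

LLM model = new(new Uri("https://path-to-your-model")); // placeholder URI
KeywordExtraction extractor = new(model) { KeywordCount = 5 };

// Cancel automatically if extraction takes longer than 30 seconds.
using var cts = new CancellationTokenSource(TimeSpan.FromSeconds(30));

try
{
    var keywords = await extractor.ExtractKeywordsAsync(
        "Long article text goes here.", cts.Token);

    foreach (var keyword in keywords)
    {
        Console.WriteLine(keyword.Value);
    }
}
catch (OperationCanceledException)
{
    // Assumption: a cancelled token is reported with this exception type.
    Console.WriteLine("Extraction was cancelled.");
}
catch (Exception ex)
{
    // The documentation states that failures are reported by exception.
    Console.WriteLine($"Extraction failed: {ex.Message}");
}
```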