Class KeywordExtraction
- Namespace
- LMKit.TextAnalysis
- Assembly
- LM-Kit.NET.dll
Performs keyword extraction: given a piece of content, it extracts up to a specified number of the most important keywords or key phrases, subject to constraints such as a maximum n-gram size.
public sealed class KeywordExtraction
- Inheritance
KeywordExtraction
Examples
```csharp
using System;
using LMKit.Model;          // assumed namespace of the LLM class
using LMKit.TextAnalysis;

// Initialize the KeywordExtraction engine with a given model:
LLM model = new(new Uri("https://path-to-your-model"));
KeywordExtraction extractor = new(model)
{
    KeywordCount = 5,
    MaxNgramSize = 3
};

// Extract keywords from some text:
var keywords = extractor.ExtractKeywords("This is some sample text about artificial intelligence and machine learning.");

// Print the extracted keywords:
foreach (var keyword in keywords)
{
    Console.WriteLine(keyword.Value);
}
```
Constructors
- KeywordExtraction(LLM)
Initializes a new instance of the KeywordExtraction class that uses the specified LLM.
Properties
- Confidence
Gets the confidence score of the most recent keyword extraction operation. The score reflects how generation terminated and how closely the output followed the requested schema.
- Guidance
Gets or sets optional guidance text that can influence the extraction process. This can be used to steer the model towards certain themes or constraints.
- KeywordCount
Gets or sets the target number of keywords to extract. The model is not guaranteed to return exactly this many: the actual count depends on the model's capabilities and on the input data, but it never exceeds this value. The value is clamped between 1 and 50.
- MaxNgramSize
Gets or sets the maximum n-gram size allowed for extracted keywords. This value guides the extraction process but does not guarantee the size or number of n-grams produced; the output depends on the model's capabilities and the characteristics of the input data. The value is clamped between 1 and 20.
- MaximumContextLength
Gets or sets the maximum context length, in tokens, available for the model input. Lowering this value can greatly speed up inference on CPUs, because computation grows with context length. The trade-off is higher perplexity, which can degrade output quality. The value is clamped to the model's own maximum context length.
- Model
Gets the language model used for keyword extraction.
- TextShrinkingStrategy
Gets or sets the strategy used to shrink content when the input exceeds the defined MaximumContextLength.
Different strategies trade off between preserving semantic integrity and aggressively reducing length. For example:
- Auto: Automatically selects the best approach.
- RemoveWords: Removes less important words without losing the overall structure.
- RemoveLines: Removes entire lines at random.
- SummarizeText: Summarizes and condenses the text.
- TrimTop or TrimBottom: Trims content from the top or bottom.
Choosing an appropriate shrinking strategy keeps the input within the context window while retaining as much of the information needed for accurate keyword extraction as possible.
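The sketch below combines several of these properties: it sets Guidance and MaximumContextLength, extracts keywords, and then inspects Confidence. It rests on assumptions: the LLM class is assumed to live in the LMKit.Model namespace, Confidence is assumed to be a numeric score where higher means more reliable, and the sample text and the 0.5 threshold are arbitrary illustrative values. TextShrinkingStrategy is left at whatever default the library uses.

```csharp
using System;
using System.Linq;
using LMKit.Model;          // assumption: namespace that contains the LLM class
using LMKit.TextAnalysis;

// Placeholder URI: point it at a real model.
LLM model = new(new Uri("https://path-to-your-model"));

KeywordExtraction extractor = new(model)
{
    KeywordCount = 8,
    MaxNgramSize = 2,
    // Free-text hint that steers the model toward a theme.
    Guidance = "Prefer technical terms related to machine learning.",
    // Token budget for the input; clamped to the model's own maximum.
    // Longer input is shrunk according to TextShrinkingStrategy.
    MaximumContextLength = 2048
};

string content = "Transformer models rely on self-attention to capture long-range dependencies in text.";
var keywords = extractor.ExtractKeywords(content);

// Confidence describes the last extraction; 0.5 is an arbitrary example threshold.
if (extractor.Confidence < 0.5)
{
    Console.WriteLine($"Low confidence ({extractor.Confidence:F2}); consider reviewing the results.");
}

// KeywordCount is an upper bound, so fewer keywords may come back.
Console.WriteLine($"Extracted {keywords.Count()} keyword(s):");
foreach (var keyword in keywords)
{
    Console.WriteLine(keyword.Value);
}
```

Because KeywordCount and MaxNgramSize are targets rather than guarantees, code that consumes the result should not assume a fixed number of keywords.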
Methods
- ExtractKeywords(string, CancellationToken)
Extracts a set of keywords synchronously from the given content. If unsuccessful, throws an exception indicating the cause of the failure.
- ExtractKeywordsAsync(string, CancellationToken)
Extracts a set of keywords asynchronously from the given content. If unsuccessful, throws an exception indicating the cause of the failure.
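A minimal sketch of calling ExtractKeywordsAsync with a timeout-based CancellationToken. It assumes the LLM class lives in the LMKit.Model namespace and that cancellation surfaces as an OperationCanceledException, the usual .NET convention; any other failure is caught as a general exception, in line with the method description. The 30-second timeout is an arbitrary example value.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;
using LMKit.Model;          // assumption: namespace that contains the LLM class
using LMKit.TextAnalysis;

// Placeholder URI: point it at a real model.
LLM model = new(new Uri("https://path-to-your-model"));
KeywordExtraction extractor = new(model) { KeywordCount = 5 };

// Request cancellation automatically after 30 seconds (arbitrary timeout).
using var cts = new CancellationTokenSource(TimeSpan.FromSeconds(30));

try
{
    var keywords = await extractor.ExtractKeywordsAsync(
        "This is some sample text about artificial intelligence and machine learning.",
        cts.Token);

    foreach (var keyword in keywords)
    {
        Console.WriteLine(keyword.Value);
    }
}
catch (OperationCanceledException)
{
    // Assumption: cancellation is reported with the standard .NET exception type.
    Console.WriteLine("Keyword extraction was cancelled.");
}
catch (Exception ex)
{
    // Other failures are reported as exceptions describing the cause.
    Console.WriteLine($"Keyword extraction failed: {ex.Message}");
}
```

The synchronous ExtractKeywords overload accepts the same CancellationToken argument, so the same timeout pattern applies to it.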