Class KeywordExtraction
- Namespace
- LMKit.TextAnalysis
- Assembly
- LM-Kit.NET.dll
Extracts a specified number of the most important keywords or phrases from a piece of content, subject to constraints such as a maximum n-gram size.
public sealed class KeywordExtraction
- Inheritance
- object
KeywordExtraction
Remarks
This class supports extraction from both text and images.
- For text extraction, use the ExtractKeywords(string, CancellationToken) and ExtractKeywordsAsync(string, CancellationToken) methods.
- For image extraction, use the ExtractKeywords(Attachment, CancellationToken) and ExtractKeywordsAsync(Attachment, CancellationToken) methods.
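As a rough sketch of the two entry points above, the helper below runs a synchronous extraction on a string and on an image attachment. It assumes that `LM` lives in `LMKit.Model`, that `Attachment` lives in `LMKit.Data`, and that the returned collection can be enumerated; none of this is stated in this section, so check it against the corresponding reference pages. `KeywordSketch`, `PrintKeywords`, and the parameter names are hypothetical.

```csharp
using System;
using System.Threading;
using LMKit.Data;          // assumption: namespace of Attachment
using LMKit.Model;         // assumption: namespace of LM
using LMKit.TextAnalysis;

public static class KeywordSketch   // hypothetical helper, not part of LM-Kit.NET
{
    public static void PrintKeywords(LM model, string text, Attachment image)
    {
        var extractor = new KeywordExtraction(model);

        // Text input: ExtractKeywords(string, CancellationToken)
        var fromText = extractor.ExtractKeywords(text, CancellationToken.None);
        foreach (var keyword in fromText)   // assumption: the result is enumerable
        {
            // Prints each item's ToString(); the item type is not documented here.
            Console.WriteLine(keyword);
        }

        // Image input: ExtractKeywords(Attachment, CancellationToken)
        var fromImage = extractor.ExtractKeywords(image, CancellationToken.None);
        foreach (var keyword in fromImage)
        {
            Console.WriteLine(keyword);
        }
    }
}
```

The same `KeywordExtraction` instance is reused for both calls, so any property set on it applies to both the text and the image extraction.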
Constructors
- KeywordExtraction(LM)
Initializes a new instance of the KeywordExtraction class with the specified language model.
Properties
- Confidence
Gets the confidence score of the last keyword extraction operation. This score is influenced by how the model terminated and how well the generation followed the requested schema.
- Guidance
Gets or sets optional guidance text that can influence the extraction process. This can be used to steer the model towards certain themes or constraints.
- KeywordCount
Gets or sets the target number of keywords to extract. This value acts as an upper bound: the model may return fewer keywords, depending on its capabilities and the input data, but never more than the specified number.
Default: 5
- MaxNgramSize
Gets or sets the maximum allowed n-gram size for extracted keywords. This value guides the extraction process but does not guarantee that every extracted keyword or phrase conforms to the specified size. The output depends on the model's capabilities and the characteristics of the input data.
Default: 3
- MaximumContextLength
Gets or sets the maximum context length (in tokens) that can be used for the model input. Reducing this value can dramatically increase inference speed on CPUs, as the computation scales with context length. However, this comes at the cost of higher perplexity, which may reduce the quality of model outputs. The value is clamped to the model's inherent maximum context length.
Default: Automatically determined based on hardware capabilities and model constraints (commonly ranging from 2048 to 8192 tokens).
- Model
Gets the language model used for keyword extraction.
- TextShrinkingStrategy
Gets or sets the strategy used to shrink content when the input exceeds the defined MaximumContextLength. Different strategies trade off between preserving semantic integrity and aggressively reducing length.
Default: Auto
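The properties above might be combined as in the following sketch. It assumes that `KeywordCount`, `MaxNgramSize`, and `MaximumContextLength` are integers, that `Guidance` is a string, and that `LM` lives in `LMKit.Model`; the property types are not given in this section. `KeywordConfigSketch` and the chosen values are purely illustrative.

```csharp
using System;
using System.Threading;
using LMKit.Model;         // assumption: namespace of LM
using LMKit.TextAnalysis;

public static class KeywordConfigSketch   // hypothetical helper
{
    public static void Run(LM model, string text)
    {
        var extractor = new KeywordExtraction(model)
        {
            KeywordCount = 10,             // upper bound; fewer keywords may be returned
            MaxNgramSize = 2,              // guidance only, not a hard limit
            MaximumContextLength = 2048,   // clamped to the model's own maximum
            Guidance = "Focus on technical terms."   // assumption: plain string
        };

        // Run one extraction so that Confidence refers to this operation.
        _ = extractor.ExtractKeywords(text, CancellationToken.None);

        // Confidence describes the last extraction performed by this instance.
        Console.WriteLine($"Confidence: {extractor.Confidence}");
    }
}
```

`TextShrinkingStrategy` is left at its default (`Auto`) here because the enum type and its other members are not listed in this section.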
Methods
- ExtractKeywords(Attachment, CancellationToken)
Extracts a set of keywords synchronously from an image provided as an Attachment.
- ExtractKeywords(string, CancellationToken)
Extracts a set of keywords synchronously from the given text content.
- ExtractKeywordsAsync(Attachment, CancellationToken)
Asynchronously extracts a set of keywords from an image provided as an Attachment.
- ExtractKeywordsAsync(string, CancellationToken)
Asynchronously extracts a set of keywords from the provided text content.
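A possible way to call the asynchronous text overload with a timeout is sketched below. It assumes that `ExtractKeywordsAsync` returns an awaitable task whose result can be enumerated, and that cancellation surfaces as an `OperationCanceledException`, which is the usual .NET convention but is not confirmed by this section. `KeywordAsyncSketch` and the 30-second timeout are hypothetical choices.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;
using LMKit.Model;         // assumption: namespace of LM
using LMKit.TextAnalysis;

public static class KeywordAsyncSketch   // hypothetical helper
{
    public static async Task RunAsync(LM model, string text)
    {
        var extractor = new KeywordExtraction(model);

        // Request cancellation automatically if extraction takes longer than 30 seconds.
        using var cts = new CancellationTokenSource(TimeSpan.FromSeconds(30));

        try
        {
            var keywords = await extractor.ExtractKeywordsAsync(text, cts.Token);
            foreach (var keyword in keywords)   // assumption: the result is enumerable
            {
                Console.WriteLine(keyword);
            }
        }
        catch (OperationCanceledException)
        {
            // Assumed cancellation behavior; verify against the method's own page.
            Console.WriteLine("Keyword extraction was cancelled.");
        }
    }
}
```

The `Attachment` overload can be awaited in the same way by passing an `Attachment` instead of a string.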