Class Embedder
- Namespace
- LMKit.Embeddings
- Assembly
- LM-Kit.NET.dll
A class for generating embeddings from text and images. It supports natural-language and code-related tasks such as semantic search, clustering, topic modeling, and classification. Embeddings can be generated from:
- Plain text strings
- Tokenized text
- File attachments: images (e.g. PNG, JPEG, TIFF) when the underlying model provides image-embedding capabilities, and text-based documents (e.g. TXT, HTML)
public sealed class Embedder
- Inheritance
- Embedder
- Inherited Members
Examples
Example: Generate text embeddings
using LMKit.Model;
using LMKit.Embeddings;
using System;
using System.Linq; // required for Take and Select below
// Load an embedding model
LM model = LM.LoadFromModelID("embeddinggemma-300m");
// Create the embedder
Embedder embedder = new Embedder(model);
// Generate embedding for a single text
float[] embedding = embedder.GetEmbeddings("Machine learning is fascinating.");
Console.WriteLine($"Embedding dimension: {embedding.Length}");
Console.WriteLine($"First 5 values: [{string.Join(", ", embedding.Take(5).Select(v => v.ToString("F4")))}...]");
Example: Batch embeddings for similarity comparison
using LMKit.Model;
using LMKit.Embeddings;
using System;
using System.Collections.Generic;
LM model = LM.LoadFromModelID("embeddinggemma-300m");
Embedder embedder = new Embedder(model);
// Generate embeddings for multiple texts
var texts = new List<string>
{
"The cat sat on the mat.",
"A feline rested on the rug.",
"The stock market closed higher today."
};
float[][] embeddings = embedder.GetEmbeddings(texts);
// Calculate cosine similarity between first two texts
float similarity = VectorOperations.CosineSimilarity(embeddings[0], embeddings[1]);
Console.WriteLine($"Similarity between text 1 and 2: {similarity:F4}");
// Compare with unrelated text
float dissimilarity = VectorOperations.CosineSimilarity(embeddings[0], embeddings[2]);
Console.WriteLine($"Similarity between text 1 and 3: {dissimilarity:F4}");
Example: Image embeddings (multimodal model)
using LMKit.Model;
using LMKit.Embeddings;
using LMKit.Media.Image;
using System;
// Load a vision-enabled embedding model
LM model = LM.LoadFromModelID("nomic-embed-vision");
Embedder embedder = new Embedder(model);
// Generate embedding from an image
ImageBuffer image = ImageBuffer.LoadAsRGB("photo.jpg");
float[] imageEmbedding = embedder.GetEmbeddings(image);
Console.WriteLine($"Image embedding dimension: {imageEmbedding.Length}");
Remarks
Key Features
- Generate embeddings from text strings via GetEmbeddings(string, CancellationToken)
- Batch processing for multiple texts via GetEmbeddings(IEnumerable<string>, CancellationToken)
- Image embeddings (when the model supports them) via GetEmbeddings(ImageBuffer, CancellationToken)
- File attachment embeddings via GetEmbeddings(Attachment, CancellationToken)
- Token-based embeddings for pre-tokenized input via GetEmbeddings(IList<int>, CancellationToken)
- Async variants of every GetEmbeddings overload via GetEmbeddingsAsync
Common Use Cases
- Semantic search: Find similar documents by comparing embedding vectors
- Clustering: Group similar texts or images together
- Classification: Use embeddings as features for ML classifiers
- RAG systems: Generate embeddings for retrieval-augmented generation
The embedding dimension is determined by the model and can be queried via EmbeddingSize.
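The semantic search use case above can be sketched without loading a model. The example below is a minimal illustration, not the library's implementation: the three-component vectors are hypothetical stand-ins for vectors an Embedder would return, and SemanticSearchSketch, Cosine, and Rank are names invented for this example. The hand-written Cosine helper is assumed to play the role of GetCosineSimilarity.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-ins for embeddings; real vectors have EmbeddingSize
// components and come from Embedder.GetEmbeddings.
var corpus = new Dictionary<string, float[]>
{
    ["The cat sat on the mat."] = new[] { 0.9f, 0.1f, 0.0f },
    ["A feline rested on the rug."] = new[] { 0.8f, 0.2f, 0.1f },
    ["The stock market closed higher today."] = new[] { 0.0f, 0.1f, 0.9f },
};
float[] query = { 1.0f, 0.0f, 0.0f };

// Print documents from most to least similar to the query.
foreach (var (text, score) in SemanticSearchSketch.Rank(query, corpus))
{
    Console.WriteLine($"{score:F4}  {text}");
}

// Illustrative helper type; not part of LM-Kit.NET.
static class SemanticSearchSketch
{
    // Cosine of the angle between two vectors of equal length.
    public static float Cosine(IReadOnlyList<float> a, IReadOnlyList<float> b)
    {
        if (a.Count != b.Count)
        {
            throw new ArgumentException("Vectors must have the same length.");
        }

        double dot = 0, normA = 0, normB = 0;
        for (int i = 0; i < a.Count; i++)
        {
            dot += a[i] * b[i];
            normA += a[i] * a[i];
            normB += b[i] * b[i];
        }

        // A zero vector has no direction; treat it as unrelated to everything.
        if (normA == 0 || normB == 0)
        {
            return 0f;
        }

        return (float)(dot / (Math.Sqrt(normA) * Math.Sqrt(normB)));
    }

    // Scores every corpus entry against the query and sorts by descending score.
    public static List<(string Text, float Score)> Rank(
        IReadOnlyList<float> query,
        IReadOnlyDictionary<string, float[]> corpus)
    {
        return corpus
            .Select(kv => (Text: kv.Key, Score: Cosine(query, kv.Value)))
            .OrderByDescending(r => r.Score)
            .ToList();
    }
}
```

In a real application the corpus vectors would typically be computed once, stored, and compared against a fresh query embedding at search time.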
Constructors
- Embedder(LM)
Initializes a new instance of the Embedder class.
Properties
Methods
- GetCosineSimilarity(IList<float>, IList<float>)
Calculates the cosine similarity between two embedding vectors, representing the cosine of the angle between them in a multidimensional space.
- GetEmbeddings(Attachment, CancellationToken)
Generates the embedding vector for a given file attachment. If the associated LM supports image embeddings, image attachments (for example, PNG, JPEG, TIFF) are embedded; otherwise, text-based attachments (for example, TXT, HTML) are used.
- GetEmbeddings(ImageBuffer, CancellationToken)
Generates the embedding vector for a given image.
- GetEmbeddings(IEnumerable<ImageBuffer>, CancellationToken)
Generates embedding vectors for a collection of images.
- GetEmbeddings(IEnumerable<IList<int>>, CancellationToken)
Generates embedding vectors for a collection of tokenized texts. Each vector represents a text in a high-dimensional space, enabling various natural language processing tasks by capturing semantic meaning.
- GetEmbeddings(IEnumerable<string>, CancellationToken)
Generates embedding vectors for a collection of text strings. Each vector represents a text in a high-dimensional space, enabling various natural language processing tasks by capturing semantic meaning.
- GetEmbeddings(IList<int>, CancellationToken)
Generates the embedding vector for a given tokenized text. This vector represents the text in a high-dimensional space, enabling various natural language processing tasks by capturing semantic meaning.
- GetEmbeddings(string, CancellationToken)
Generates the embedding vector for a given text string. This vector represents the text in a high-dimensional space, enabling various natural language processing tasks by capturing semantic meaning.
- GetEmbeddingsAsync(Attachment, CancellationToken)
Asynchronously generates the embedding vector for a given file attachment. If the associated LM supports image embeddings, image attachments (for example, PNG, JPEG, TIFF) are embedded; otherwise, text-based attachments (for example, TXT, HTML) are used.
- GetEmbeddingsAsync(ImageBuffer, CancellationToken)
Asynchronously generates the embedding vector for a given image.
- GetEmbeddingsAsync(IEnumerable<ImageBuffer>, CancellationToken)
Asynchronously generates embedding vectors for a collection of images.
- GetEmbeddingsAsync(IEnumerable<IList<int>>, CancellationToken)
Asynchronously generates embedding vectors for a collection of tokenized texts. Each vector represents a text in a high-dimensional space, enabling various natural language processing tasks by capturing semantic meaning.
- GetEmbeddingsAsync(IEnumerable<string>, CancellationToken)
Asynchronously generates embedding vectors for a collection of text strings. Each vector represents a text in a high-dimensional space, enabling various natural language processing tasks by capturing semantic meaning.
- GetEmbeddingsAsync(IList<int>, CancellationToken)
Asynchronously generates the embedding vector for a given tokenized text. This vector represents the text in a high-dimensional space, enabling various natural language processing tasks by capturing semantic meaning.
- GetEmbeddingsAsync(string, CancellationToken)
Asynchronously generates the embedding vector for a given text string. This vector represents the text in a high-dimensional space, enabling various natural language processing tasks by capturing semantic meaning.
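For reference, the cosine similarity that GetCosineSimilarity is described as returning corresponds to the standard definition below, written for two vectors a and b of equal dimension n. This is the textbook formula, offered as an aid to reading the scores, not a statement about how the library computes it internally.

```latex
\cos\theta
  = \frac{\mathbf{a}\cdot\mathbf{b}}{\lVert\mathbf{a}\rVert\,\lVert\mathbf{b}\rVert}
  = \frac{\sum_{i=1}^{n} a_i b_i}
         {\sqrt{\sum_{i=1}^{n} a_i^{2}}\;\sqrt{\sum_{i=1}^{n} b_i^{2}}}
```

The value lies between -1 and 1: 1 means the vectors point in the same direction, 0 means they are orthogonal, and -1 means they point in opposite directions. The formula is undefined when either vector has zero length.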