Search Images by Visual Similarity
Image similarity search converts images into numerical vectors (embeddings) and finds the closest matches in vector space. Instead of relying on filenames, tags, or manual labels, it compares the visual content of images directly. Two photos of a sunset will match each other even if they have completely different filenames and no metadata.
This tutorial builds a working image similarity search system: loading a vision embedding model, indexing a folder of images, and finding visually similar matches. It also covers text-to-image search, where a natural language description finds matching images.
Why Visual Similarity Search Matters
On-device image similarity search addresses two common enterprise problems:
- E-commerce visual search. Customers upload a photo of a product they want to find. A visual search system matches it against the product catalog instantly, without requiring text descriptions. Running this locally keeps proprietary catalog data and customer images on-premises.
- Duplicate and near-duplicate photo detection. Media libraries, digital asset management systems, and forensic workflows need to find duplicate or near-duplicate images across large collections. Embedding-based comparison catches rotated, cropped, and color-adjusted variants that hash-based methods miss.
Prerequisites
| Requirement | Minimum |
|---|---|
| .NET SDK | 8.0+ |
| RAM | 8+ GB |
| VRAM | 2+ GB (vision embedding models are small) |
| Disk | ~500 MB free for model download |
| Image files | A folder of .jpg, .png, or .jpeg images to index |
Step 1: Create the Project
dotnet new console -n ImageSimilaritySearch
cd ImageSimilaritySearch
dotnet add package LM-Kit.NET
Step 2: Understand Image Embeddings
An image embedding model converts an image into a fixed-size vector of floating-point numbers. Images with similar visual content produce vectors that are close together in this high-dimensional space. Cosine similarity measures how close two vectors are: scores near 1.0 indicate nearly identical visuals, while scores near 0.0 indicate unrelated content.
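Conceptually, cosine similarity is just the dot product of the two vectors divided by the product of their magnitudes. The sketch below is for illustration only; in the programs that follow, LM-Kit provides this computation via `Embedder.GetCosineSimilarity`:

```csharp
// Minimal cosine similarity sketch: dot product over the product of magnitudes.
// For illustration only; use Embedder.GetCosineSimilarity in real code.
static float CosineSimilarity(float[] a, float[] b)
{
    float dot = 0f, normA = 0f, normB = 0f;
    for (int i = 0; i < a.Length; i++)
    {
        dot += a[i] * b[i];     // accumulate the dot product
        normA += a[i] * a[i];   // accumulate squared magnitude of a
        normB += b[i] * b[i];   // accumulate squared magnitude of b
    }
    return dot / (MathF.Sqrt(normA) * MathF.Sqrt(normB));
}
```

Identical vectors score 1.0, orthogonal vectors score 0.0, regardless of vector length.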
A multimodal embedding model like nomic-embed-vision shares the same vector space for both images and text. This means you can search images by text description as well as by visual similarity to another image.
┌────────────────────┐
beach.jpg ───► │ │──► [0.42, 0.18, -0.61, ...] ─┐
│ Vision Embedding │ ├─ similarity: 0.93
coast.jpg ───► │ Model │──► [0.40, 0.20, -0.58, ...] ─┘
│ │
circuit.jpg ───► │ │──► [-0.33, 0.71, 0.12, ...] ── similarity: 0.07
└────────────────────┘
┌────────────────────┐
"a sandy ───► │ Text Embedding │──► [0.39, 0.21, -0.55, ...] ── cross-modal match
beach" │ (shared space) │ with beach.jpg
└────────────────────┘
Step 3: Write the Program
This program loads a vision embedding model, indexes all images in a folder, and supports both image-to-image and text-to-image search.
using System.Text;
using LMKit.Model;
using LMKit.Embeddings;
using LMKit.Media.Image;
LMKit.Licensing.LicenseManager.SetLicenseKey("");
Console.InputEncoding = Encoding.UTF8;
Console.OutputEncoding = Encoding.UTF8;
// ──────────────────────────────────────
// 1. Load the vision embedding model
// ──────────────────────────────────────
Console.WriteLine("Loading vision embedding model...");
using LM embeddingModel = LM.LoadFromModelID("nomic-embed-vision",
downloadingProgress: (_, len, read) =>
{
if (len.HasValue) Console.Write($"\r Downloading: {(double)read / len.Value * 100:F1}% ");
return true;
},
loadingProgress: p => { Console.Write($"\r Loading: {p * 100:F0}% "); return true; });
Console.WriteLine(" Done.\n");
// Also load the text embedding model for cross-modal search
Console.WriteLine("Loading text embedding model...");
using LM textModel = LM.LoadFromModelID("nomic-embed-text",
downloadingProgress: (_, len, read) =>
{
if (len.HasValue) Console.Write($"\r Downloading: {(double)read / len.Value * 100:F1}% ");
return true;
},
loadingProgress: p => { Console.Write($"\r Loading: {p * 100:F0}% "); return true; });
Console.WriteLine(" Done.\n");
// ──────────────────────────────────────
// 2. Index images as embedding vectors
// ──────────────────────────────────────
var imageEmbedder = new Embedder(embeddingModel);
var textEmbedder = new Embedder(textModel);
string imageFolder = args.Length > 0 ? args[0] : "images";
if (!Directory.Exists(imageFolder))
{
Console.WriteLine($"Folder not found: {imageFolder}");
Console.WriteLine("Usage: dotnet run -- <path-to-image-folder>");
return;
}
string[] imageFiles = Directory.GetFiles(imageFolder, "*.*")
.Where(f => f.EndsWith(".jpg", StringComparison.OrdinalIgnoreCase) ||
f.EndsWith(".png", StringComparison.OrdinalIgnoreCase) ||
f.EndsWith(".jpeg", StringComparison.OrdinalIgnoreCase))
.ToArray();
Console.WriteLine($"Indexing {imageFiles.Length} images...\n");
var imageIndex = new List<(string Path, float[] Embedding)>();
foreach (string imagePath in imageFiles)
{
ImageBuffer image = ImageBuffer.LoadAsRGB(imagePath);
float[] embedding = imageEmbedder.GetEmbeddings(image);
imageIndex.Add((imagePath, embedding));
Console.WriteLine($" Indexed: {Path.GetFileName(imagePath)}");
}
if (imageIndex.Count == 0)
{
Console.WriteLine("No supported images found in the folder.");
return;
}
Console.WriteLine($"\nIndex complete: {imageIndex.Count} images ({imageIndex[0].Embedding.Length} dimensions)\n");
// ──────────────────────────────────────
// 3. Search by similarity to a query image
// ──────────────────────────────────────
Console.WriteLine("Enter an image filename to find similar images, or a text description for cross-modal search.");
Console.WriteLine("Type 'quit' to exit.\n");
while (true)
{
Console.ForegroundColor = ConsoleColor.Green;
Console.Write("Search: ");
Console.ResetColor();
string? input = Console.ReadLine();
if (string.IsNullOrWhiteSpace(input) || input.Equals("quit", StringComparison.OrdinalIgnoreCase))
break;
float[] queryEmbedding;
// Determine if input is an image path or a text query
if (File.Exists(input))
{
Console.WriteLine($" Searching by image: {Path.GetFileName(input)}\n");
ImageBuffer queryImage = ImageBuffer.LoadAsRGB(input);
queryEmbedding = imageEmbedder.GetEmbeddings(queryImage);
}
else
{
Console.WriteLine($" Searching by text: \"{input}\"\n");
queryEmbedding = textEmbedder.GetEmbeddings(input);
}
var results = imageIndex
.Select(item => new
{
item.Path,
Similarity = Embedder.GetCosineSimilarity(queryEmbedding, item.Embedding)
})
.OrderByDescending(x => x.Similarity)
.Take(5)
.ToList();
Console.ForegroundColor = ConsoleColor.Cyan;
Console.WriteLine(" Top matches:");
Console.ResetColor();
foreach (var match in results)
{
Console.ForegroundColor = ConsoleColor.DarkGray;
Console.Write($" {match.Similarity:F4} ");
Console.ResetColor();
Console.WriteLine(Path.GetFileName(match.Path));
}
Console.WriteLine();
}
Run it:
dotnet run -- "path/to/image/folder"
Step 4: Example Output
Loading vision embedding model...
Loading: 100% Done.
Loading text embedding model...
Loading: 100% Done.
Indexing 12 images...
Indexed: beach-sunset.jpg
Indexed: mountain-lake.jpg
Indexed: city-skyline.jpg
Indexed: golden-retriever.jpg
Indexed: tabby-cat.jpg
Indexed: red-sports-car.jpg
Indexed: ocean-waves.jpg
Indexed: forest-trail.jpg
Indexed: office-desk.jpg
Indexed: pizza-closeup.jpg
Indexed: laptop-keyboard.jpg
Indexed: flower-garden.jpg
Index complete: 12 images (768 dimensions)
Enter an image filename to find similar images, or a text description for cross-modal search.
Type 'quit' to exit.
Search: images/beach-sunset.jpg
Searching by image: beach-sunset.jpg
Top matches:
1.0000 beach-sunset.jpg
0.8734 ocean-waves.jpg
0.6219 mountain-lake.jpg
0.4102 forest-trail.jpg
0.3011 flower-garden.jpg
Search: a cute pet animal
Searching by text: "a cute pet animal"
Top matches:
0.7821 golden-retriever.jpg
0.7544 tabby-cat.jpg
0.3102 flower-garden.jpg
0.2541 forest-trail.jpg
0.1893 beach-sunset.jpg
Tuning Similarity Thresholds
The right threshold depends on your use case:
| Use Case | Recommended Threshold | Rationale |
|---|---|---|
| Duplicate detection | 0.95+ | Only near-identical images |
| Visual search (similar items) | 0.70+ | Visually related but not identical |
| Broad category matching | 0.40+ | Same general category (e.g., "outdoor scenes") |
| Text-to-image search | 0.30+ | Cross-modal scores are typically lower than image-to-image |
Apply a threshold to filter results:
// Model loading, indexing, and the search loop are identical to Step 3.
// Inside the search loop, after computing `results`:
float threshold = 0.70f;
var filtered = results.Where(r => r.Similarity >= threshold).ToList();
if (filtered.Count == 0)
{
Console.WriteLine(" No matches above the similarity threshold.");
}
else
{
Console.ForegroundColor = ConsoleColor.Cyan;
Console.WriteLine(" Matches above threshold:");
Console.ResetColor();
foreach (var match in filtered)
{
Console.ForegroundColor = ConsoleColor.DarkGray;
Console.Write($" {match.Similarity:F4} ");
Console.ResetColor();
Console.WriteLine(Path.GetFileName(match.Path));
}
}
Console.WriteLine();
}
Batch Indexing for Large Collections
When indexing hundreds or thousands of images, use the batch API for better throughput:
// Model loading and image-file discovery are identical to Step 3.
// Load all images into ImageBuffer objects
var images = imageFiles
.Select(path => ImageBuffer.LoadAsRGB(path))
.ToList();
// Batch embed all images at once
float[][] embeddings = imageEmbedder.GetEmbeddings(images);
// Build the index
var imageIndex = new List<(string Path, float[] Embedding)>();
for (int i = 0; i < imageFiles.Length; i++)
{
imageIndex.Add((imageFiles[i], embeddings[i]));
}
For very large collections, consider persisting embeddings to disk so you only compute them once:
// Model loading is identical to Step 3.
const string IndexFile = "image_index.bin";
if (File.Exists(IndexFile))
{
// Load pre-computed embeddings
Console.WriteLine("Loading cached index...");
// Deserialize your index from disk
}
else
{
// Compute and save embeddings
Console.WriteLine("Building index (first run)...");
// Serialize the index after computing
}
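One simple approach is a flat binary file: an entry count, then each path followed by its vector. The layout below is this tutorial's own sketch built on standard .NET `BinaryWriter`/`BinaryReader`, not an LM-Kit API; adapt it to your storage needs:

```csharp
// Sketch of a simple binary index format: [count] then, per entry,
// [path][dimension][floats...]. The layout is this tutorial's own
// convention, not part of LM-Kit.
static void SaveIndex(string file, List<(string Path, float[] Embedding)> index)
{
    using var writer = new BinaryWriter(File.Create(file));
    writer.Write(index.Count);
    foreach (var (path, embedding) in index)
    {
        writer.Write(path);
        writer.Write(embedding.Length);
        foreach (float v in embedding) writer.Write(v);
    }
}

static List<(string Path, float[] Embedding)> LoadIndex(string file)
{
    using var reader = new BinaryReader(File.OpenRead(file));
    int count = reader.ReadInt32();
    var index = new List<(string Path, float[] Embedding)>(count);
    for (int i = 0; i < count; i++)
    {
        string path = reader.ReadString();
        int dim = reader.ReadInt32();
        float[] embedding = new float[dim];
        for (int j = 0; j < dim; j++) embedding[j] = reader.ReadSingle();
        index.Add((path, embedding));
    }
    return index;
}
```

Remember to rebuild the cache when images are added, removed, or modified; comparing file timestamps against the index file's write time is one lightweight way to detect staleness.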
Choosing an Embedding Model
| Model ID | Type | Dimensions | Best For |
|---|---|---|---|
| `nomic-embed-vision` | Image only | 768 | Image-to-image similarity (recommended) |
| `nomic-embed-text` | Text only | 768 | Text queries in cross-modal search (paired with `nomic-embed-vision`) |
| `embeddinggemma-300m` | Text only | 256 | Text-only semantic search (not for images) |
The nomic-embed-vision and nomic-embed-text models share an aligned vector space. This means image embeddings from one and text embeddings from the other are directly comparable, enabling cross-modal search.
Common Issues
| Problem | Cause | Fix |
|---|---|---|
| `InvalidModelException: does not support image embeddings` | Using a text-only model for image embedding | Use `nomic-embed-vision` for image embeddings |
| Low similarity between obviously similar images | Images have very different resolutions or aspect ratios | Normalize images to similar sizes before embedding |
| Text-to-image scores are low | Cross-modal matching is inherently less precise | Lower the threshold to 0.30 for text queries; use descriptive text |
| Slow indexing on large folders | Processing hundreds of high-resolution images | Use batch embedding; resize images before loading |
| All scores cluster near 0.5 | Collection contains very similar images | Normal for homogeneous collections; use a higher threshold |
Next Steps
- Build Semantic Search with Embeddings: text-based semantic search with the same embedding concepts.
- Analyze Images with Vision Language Models: go beyond embeddings to generate descriptions and answer questions about images.
- Build a RAG Pipeline Over Your Own Documents: combine embeddings with retrieval-augmented generation for document Q&A.