Search Images by Visual Similarity
Image similarity search converts images into numerical vectors (embeddings) and finds the closest matches in vector space. Instead of relying on filenames, tags, or manual labels, it compares the visual content of images directly. Two photos of a sunset will match each other even if they have completely different filenames and no metadata.
This tutorial builds a working image similarity search system: loading a vision embedding model, indexing a folder of images, and finding visually similar matches. It also covers text-to-image search, where a natural language description finds matching images.
Why Visual Similarity Search Matters
On-device image similarity search addresses two common enterprise problems:
- E-commerce visual search. Customers upload a photo of a product they want to find. A visual search system matches it against the product catalog instantly, without requiring text descriptions. Running this locally keeps proprietary catalog data and customer images on-premises.
- Duplicate and near-duplicate photo detection. Media libraries, digital asset management systems, and forensic workflows need to find duplicate or near-duplicate images across large collections. Embedding-based comparison catches rotated, cropped, and color-adjusted variants that hash-based methods miss.
Prerequisites
| Requirement | Minimum |
|---|---|
| .NET SDK | 8.0+ |
| RAM | 8+ GB |
| VRAM | 2+ GB (vision embedding models are small) |
| Disk | ~500 MB free for model download |
| Image files | A folder of .jpg, .png, or .jpeg images to index |
Step 1: Create the Project
dotnet new console -n ImageSimilaritySearch
cd ImageSimilaritySearch
dotnet add package LM-Kit.NET
Step 2: Understand Image Embeddings
An image embedding model converts an image into a fixed-size vector of floating-point numbers. Images with similar visual content produce vectors that are close together in this high-dimensional space. Cosine similarity measures how close two vectors are: scores near 1.0 indicate nearly identical visuals, while scores near 0.0 indicate unrelated content.
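Conceptually, cosine similarity is just the dot product of the two vectors divided by the product of their magnitudes. The sketch below is for illustration only; in the programs that follow, LM-Kit provides this computation via `Embedder.GetCosineSimilarity`:

```csharp
// Minimal cosine similarity sketch: dot product over the product of magnitudes.
// For illustration only; use Embedder.GetCosineSimilarity in real code.
static float CosineSimilarity(float[] a, float[] b)
{
    float dot = 0f, normA = 0f, normB = 0f;
    for (int i = 0; i < a.Length; i++)
    {
        dot += a[i] * b[i];     // accumulate the dot product
        normA += a[i] * a[i];   // accumulate squared magnitude of a
        normB += b[i] * b[i];   // accumulate squared magnitude of b
    }
    return dot / (MathF.Sqrt(normA) * MathF.Sqrt(normB));
}
```

Identical vectors score 1.0, orthogonal vectors score 0.0, regardless of vector length.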
A multimodal embedding model like nomic-embed-vision shares the same vector space for both images and text. This means you can search images by text description as well as by visual similarity to another image.
┌────────────────────┐
beach.jpg ───► │ │──► [0.42, 0.18, -0.61, ...] ─┐
│ Vision Embedding │ ├─ similarity: 0.93
coast.jpg ───► │ Model │──► [0.40, 0.20, -0.58, ...] ─┘
│ │
circuit.jpg ───► │ │──► [-0.33, 0.71, 0.12, ...] ── similarity: 0.07
└────────────────────┘
┌────────────────────┐
"a sandy ───► │ Text Embedding │──► [0.39, 0.21, -0.55, ...] ── cross-modal match
beach" │ (shared space) │ with beach.jpg
└────────────────────┘
Step 3: Write the Program
This program loads a vision embedding model, indexes all images in a folder, and supports both image-to-image and text-to-image search.
using System.Text;
using LMKit.Model;
using LMKit.Embeddings;
using LMKit.Media.Image;
LMKit.Licensing.LicenseManager.SetLicenseKey("");
Console.InputEncoding = Encoding.UTF8;
Console.OutputEncoding = Encoding.UTF8;
// ──────────────────────────────────────
// 1. Load the vision embedding model
// ──────────────────────────────────────
Console.WriteLine("Loading vision embedding model...");
using LM embeddingModel = LM.LoadFromModelID("nomic-embed-vision",
downloadingProgress: (_, len, read) =>
{
if (len.HasValue) Console.Write($"\r Downloading: {(double)read / len.Value * 100:F1}% ");
return true;
},
loadingProgress: p => { Console.Write($"\r Loading: {p * 100:F0}% "); return true; });
Console.WriteLine(" Done.\n");
// Also load the text embedding model for cross-modal search
Console.WriteLine("Loading text embedding model...");
using LM textModel = LM.LoadFromModelID("nomic-embed-text",
downloadingProgress: (_, len, read) =>
{
if (len.HasValue) Console.Write($"\r Downloading: {(double)read / len.Value * 100:F1}% ");
return true;
},
loadingProgress: p => { Console.Write($"\r Loading: {p * 100:F0}% "); return true; });
Console.WriteLine(" Done.\n");
// ──────────────────────────────────────
// 2. Index images as embedding vectors
// ──────────────────────────────────────
var imageEmbedder = new Embedder(embeddingModel);
var textEmbedder = new Embedder(textModel);
string imageFolder = args.Length > 0 ? args[0] : "images";
if (!Directory.Exists(imageFolder))
{
Console.WriteLine($"Folder not found: {imageFolder}");
Console.WriteLine("Usage: dotnet run -- <path-to-image-folder>");
return;
}
string[] imageFiles = Directory.GetFiles(imageFolder, "*.*")
.Where(f => f.EndsWith(".jpg", StringComparison.OrdinalIgnoreCase) ||
f.EndsWith(".png", StringComparison.OrdinalIgnoreCase) ||
f.EndsWith(".jpeg", StringComparison.OrdinalIgnoreCase))
.ToArray();
Console.WriteLine($"Indexing {imageFiles.Length} images...\n");
var imageIndex = new List<(string Path, float[] Embedding)>();
foreach (string imagePath in imageFiles)
{
ImageBuffer image = ImageBuffer.LoadAsRGB(imagePath);
float[] embedding = imageEmbedder.GetEmbeddings(image);
imageIndex.Add((imagePath, embedding));
Console.WriteLine($" Indexed: {Path.GetFileName(imagePath)}");
}
if (imageIndex.Count == 0)
{
Console.WriteLine("No supported images found in the folder.");
return;
}
Console.WriteLine($"\nIndex complete: {imageIndex.Count} images ({imageIndex[0].Embedding.Length} dimensions)\n");
// ──────────────────────────────────────
// 3. Search by similarity to a query image
// ──────────────────────────────────────
Console.WriteLine("Enter an image filename to find similar images, or a text description for cross-modal search.");
Console.WriteLine("Type 'quit' to exit.\n");
while (true)
{
Console.ForegroundColor = ConsoleColor.Green;
Console.Write("Search: ");
Console.ResetColor();
string? input = Console.ReadLine();
if (string.IsNullOrWhiteSpace(input) || input.Equals("quit", StringComparison.OrdinalIgnoreCase))
break;
float[] queryEmbedding;
// Determine if input is an image path or a text query
if (File.Exists(input))
{
Console.WriteLine($" Searching by image: {Path.GetFileName(input)}\n");
ImageBuffer queryImage = ImageBuffer.LoadAsRGB(input);
queryEmbedding = imageEmbedder.GetEmbeddings(queryImage);
}
else
{
Console.WriteLine($" Searching by text: \"{input}\"\n");
queryEmbedding = textEmbedder.GetEmbeddings(input);
}
var results = imageIndex
.Select(item => new
{
item.Path,
Similarity = Embedder.GetCosineSimilarity(queryEmbedding, item.Embedding)
})
.OrderByDescending(x => x.Similarity)
.Take(5)
.ToList();
Console.ForegroundColor = ConsoleColor.Cyan;
Console.WriteLine(" Top matches:");
Console.ResetColor();
foreach (var match in results)
{
Console.ForegroundColor = ConsoleColor.DarkGray;
Console.Write($" {match.Similarity:F4} ");
Console.ResetColor();
Console.WriteLine(Path.GetFileName(match.Path));
}
Console.WriteLine();
}
Run it:
dotnet run -- "path/to/image/folder"
Step 4: Example Output
Loading vision embedding model...
Loading: 100% Done.
Loading text embedding model...
Loading: 100% Done.
Indexing 12 images...
Indexed: beach-sunset.jpg
Indexed: mountain-lake.jpg
Indexed: city-skyline.jpg
Indexed: golden-retriever.jpg
Indexed: tabby-cat.jpg
Indexed: red-sports-car.jpg
Indexed: ocean-waves.jpg
Indexed: forest-trail.jpg
Indexed: office-desk.jpg
Indexed: pizza-closeup.jpg
Indexed: laptop-keyboard.jpg
Indexed: flower-garden.jpg
Index complete: 12 images (768 dimensions)
Enter an image filename to find similar images, or a text description for cross-modal search.
Type 'quit' to exit.
Search: images/beach-sunset.jpg
Searching by image: beach-sunset.jpg
Top matches:
1.0000 beach-sunset.jpg
0.8734 ocean-waves.jpg
0.6219 mountain-lake.jpg
0.4102 forest-trail.jpg
0.3011 flower-garden.jpg
Search: a cute pet animal
Searching by text: "a cute pet animal"
Top matches:
0.7821 golden-retriever.jpg
0.7544 tabby-cat.jpg
0.3102 flower-garden.jpg
0.2541 forest-trail.jpg
0.1893 beach-sunset.jpg
Tuning Similarity Thresholds
The right threshold depends on your use case:
| Use Case | Recommended Threshold | Rationale |
|---|---|---|
| Duplicate detection | 0.95+ | Only near-identical images |
| Visual search (similar items) | 0.70+ | Visually related but not identical |
| Broad category matching | 0.40+ | Same general category (e.g., "outdoor scenes") |
| Text-to-image search | 0.30+ | Cross-modal scores are typically lower than image-to-image |
Apply a threshold to filter results:
// Model loading, indexing, and the search loop are identical to Step 3.
// Inside the search loop, after computing `results`:
float threshold = 0.70f;
var filtered = results.Where(r => r.Similarity >= threshold).ToList();
if (filtered.Count == 0)
{
Console.WriteLine(" No matches above the similarity threshold.");
}
else
{
Console.ForegroundColor = ConsoleColor.Cyan;
Console.WriteLine(" Matches above threshold:");
Console.ResetColor();
foreach (var match in filtered)
{
Console.ForegroundColor = ConsoleColor.DarkGray;
Console.Write($" {match.Similarity:F4} ");
Console.ResetColor();
Console.WriteLine(Path.GetFileName(match.Path));
}
}
Console.WriteLine();
}
Batch Indexing for Large Collections
When indexing hundreds or thousands of images, use the batch API for better throughput:
// Model loading and image-file discovery are identical to Step 3.
// Load all images into ImageBuffer objects
var images = imageFiles
.Select(path => ImageBuffer.LoadAsRGB(path))
.ToList();
// Batch embed all images at once
float[][] embeddings = imageEmbedder.GetEmbeddings(images);
// Build the index
var imageIndex = new List<(string Path, float[] Embedding)>();
for (int i = 0; i < imageFiles.Length; i++)
{
imageIndex.Add((imageFiles[i], embeddings[i]));
}
For very large collections, consider persisting embeddings to disk so you only compute them once:
// Model loading is identical to Step 3.
const string IndexFile = "image_index.bin";
if (File.Exists(IndexFile))
{
// Load pre-computed embeddings
Console.WriteLine("Loading cached index...");
// Deserialize your index from disk
}
else
{
// Compute and save embeddings
Console.WriteLine("Building index (first run)...");
// Serialize the index after computing
}
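One simple approach is a flat binary file: an entry count, then each path followed by its vector. The layout below is this tutorial's own sketch built on standard .NET `BinaryWriter`/`BinaryReader`, not an LM-Kit API; adapt it to your storage needs:

```csharp
// Sketch of a simple binary index format: [count] then, per entry,
// [path][dimension][floats...]. The layout is this tutorial's own
// convention, not part of LM-Kit.
static void SaveIndex(string file, List<(string Path, float[] Embedding)> index)
{
    using var writer = new BinaryWriter(File.Create(file));
    writer.Write(index.Count);
    foreach (var (path, embedding) in index)
    {
        writer.Write(path);
        writer.Write(embedding.Length);
        foreach (float v in embedding) writer.Write(v);
    }
}

static List<(string Path, float[] Embedding)> LoadIndex(string file)
{
    using var reader = new BinaryReader(File.OpenRead(file));
    int count = reader.ReadInt32();
    var index = new List<(string Path, float[] Embedding)>(count);
    for (int i = 0; i < count; i++)
    {
        string path = reader.ReadString();
        int dim = reader.ReadInt32();
        float[] embedding = new float[dim];
        for (int j = 0; j < dim; j++) embedding[j] = reader.ReadSingle();
        index.Add((path, embedding));
    }
    return index;
}
```

Remember to rebuild the cache when images are added, removed, or modified; comparing file timestamps against the index file's write time is one lightweight way to detect staleness.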
Choosing an Embedding Model
| Model ID | Type | Dimensions | Best For |
|---|---|---|---|
| `nomic-embed-vision` | Image only | 768 | Image-to-image similarity (recommended) |
| `nomic-embed-text` | Text only | 768 | Text queries in cross-modal search (paired with `nomic-embed-vision`) |
| `embeddinggemma-300m` | Text only | 256 | Text-only semantic search (not for images) |
The nomic-embed-vision and nomic-embed-text models share an aligned vector space. This means image embeddings from one and text embeddings from the other are directly comparable, enabling cross-modal search.
Common Issues
| Problem | Cause | Fix |
|---|---|---|
| `InvalidModelException: does not support image embeddings` | Using a text-only model for image embedding | Use `nomic-embed-vision` for image embeddings |
| Low similarity between obviously similar images | Images have very different resolutions or aspect ratios | Normalize images to similar sizes before embedding |
| Text-to-image scores are low | Cross-modal matching is inherently less precise | Lower the threshold to 0.30 for text queries; use descriptive text |
| Slow indexing on large folders | Processing hundreds of high-resolution images | Use batch embedding; resize images before loading |
| All scores cluster near 0.5 | Collection contains very similar images | Normal for homogeneous collections; use a higher threshold |
Next Steps
- Build Semantic Search with Embeddings: text-based semantic search with the same embedding concepts.
- Analyze Images with Vision Language Models: go beyond embeddings to generate descriptions and answer questions about images.
- Build a RAG Pipeline Over Your Own Documents: combine embeddings with retrieval-augmented generation for document Q&A.