
👉 Try the demo: https://github.com/LM-Kit/lm-kit-net-samples/tree/main/console_net/text-analysis/embeddings/multimodal_embeddings

Multimodal Embeddings for C# .NET Applications


🎯 Purpose of the Demo

An interactive console app that embeds text and images into a shared vector space (nomic-embed-text + nomic-embed-vision) and exposes three cross-modal workflows: a cosine similarity matrix, folder search by text query, and image-to-tag ranking.

All processing runs on-device.


👥 Industry Target Audience

  • E-commerce / DAM: text-to-image search over product photos.
  • Photo apps: "find images matching this phrase" without per-query VLM calls.
  • Auto-tagging: rank candidate tags against an image by cosine similarity.
  • Multimodal RAG: align text and image chunks in one index.
  • Moderation: text-prompt screening against image content.

🚀 Problem Solved

Text-to-image search, image-to-text classification, and reverse image search all collapse to one operation: cosine similarity in a shared embedding space. Aligned models make that possible without any per-query VLM call. The demo wraps that single operation in three menu modes that cover the most commonly requested workflows.
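
To make the "one operation" concrete, here is a minimal standalone sketch of cosine similarity over two embedding vectors. It is illustrative only: the VectorMath class and its method are hypothetical names, not part of LM-Kit, and it assumes embeddings are available as plain float arrays of equal length.

```csharp
using System;

// Hypothetical helper for illustration; not part of the LM-Kit API.
public static class VectorMath
{
    // Cosine similarity: dot(a, b) / (|a| * |b|), in the range [-1, 1].
    public static double CosineSimilarity(float[] a, float[] b)
    {
        if (a.Length != b.Length)
            throw new ArgumentException("Vectors must have the same dimension.");

        double dot = 0, normA = 0, normB = 0;
        for (int i = 0; i < a.Length; i++)
        {
            dot += (double)a[i] * b[i];
            normA += (double)a[i] * a[i];
            normB += (double)b[i] * b[i];
        }

        // Cosine is undefined for a zero vector; this sketch returns 0 by convention.
        if (normA == 0 || normB == 0)
            return 0;

        return dot / (Math.Sqrt(normA) * Math.Sqrt(normB));
    }
}
```

Because a text vector and an image vector from an aligned model pair live in the same space, the same function scores text against image, image against image, or text against text.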


💻 Application Overview

Interactive menu (no command-line arguments) with three modes:

Mode    What it does
Matrix  Type captions and image paths, print the full cosine similarity matrix.
Search  Embed every image in a folder once; type repeated text queries; see top-K matches per query.
Tag     Embed one image, type candidate tags, see top-K tags ranked by similarity.
Quit    Exit.
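
As a rough sketch of what the Matrix mode computes, the code below builds a text-by-image similarity matrix from precomputed vectors. The SimilarityMatrix class is a hypothetical name used for illustration, the row/column orientation is an arbitrary choice, and the vectors are assumed to be plain float arrays already produced by the embedders; the demo's actual implementation may differ.

```csharp
using System;

// Hypothetical helper for illustration; not taken from the demo source.
public static class SimilarityMatrix
{
    // Rows correspond to text vectors, columns to image vectors.
    public static double[,] Build(float[][] textVectors, float[][] imageVectors)
    {
        var matrix = new double[textVectors.Length, imageVectors.Length];
        for (int t = 0; t < textVectors.Length; t++)
            for (int i = 0; i < imageVectors.Length; i++)
                matrix[t, i] = Cosine(textVectors[t], imageVectors[i]);
        return matrix;
    }

    private static double Cosine(float[] a, float[] b)
    {
        if (a.Length != b.Length)
            throw new ArgumentException("Vectors must have the same dimension.");

        double dot = 0, normA = 0, normB = 0;
        for (int i = 0; i < a.Length; i++)
        {
            dot += (double)a[i] * b[i];
            normA += (double)a[i] * a[i];
            normB += (double)b[i] * b[i];
        }

        // Zero vectors have no direction; score them as 0 in this sketch.
        return (normA == 0 || normB == 0) ? 0 : dot / Math.Sqrt(normA * normB);
    }
}
```

Each cell [t, i] is the similarity of caption t to image i, so the best caption for an image is the largest value in its column.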

✨ Key Features

  • Aligned model pair: nomic-embed-text + nomic-embed-vision share one embedding space.
  • Batch text embedding: Embedder.GetEmbeddings(IEnumerable<string>).
  • Image embedding: Embedder.GetEmbeddings(ImageBuffer).
  • Cross-modal score: Embedder.GetCosineSimilarity(textVec, imageVec).
  • Reusable image index: search mode embeds the folder once and reuses the stored vectors across queries.
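
The reusable-index idea can be sketched as a small in-memory store that keeps one vector per image and ranks entries against each query vector. Everything here is an assumption for illustration: ImageIndex, Add, and Search are hypothetical names, the vectors are plain float arrays, and the embedding calls themselves are left out rather than guessed at.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical in-memory index for illustration; not the demo's actual type.
public sealed class ImageIndex
{
    private readonly List<(string Path, float[] Vector)> _entries = new();

    // Called once per image, after that image has been embedded.
    public void Add(string path, float[] vector) => _entries.Add((path, vector));

    // Ranks stored images against a query vector and returns the top-K by score.
    public IReadOnlyList<(string Path, double Score)> Search(float[] query, int topK) =>
        _entries
            .Select(e => (e.Path, Score: Cosine(query, e.Vector)))
            .OrderByDescending(r => r.Score)
            .Take(topK)
            .ToList();

    private static double Cosine(float[] a, float[] b)
    {
        if (a.Length != b.Length)
            throw new ArgumentException("Vectors must have the same dimension.");

        double dot = 0, normA = 0, normB = 0;
        for (int i = 0; i < a.Length; i++)
        {
            dot += (double)a[i] * b[i];
            normA += (double)a[i] * a[i];
            normB += (double)b[i] * b[i];
        }

        // Zero vectors have no direction; score them as 0 in this sketch.
        return (normA == 0 || normB == 0) ? 0 : dot / Math.Sqrt(normA * normB);
    }
}
```

Only the query text needs embedding on each search; the image vectors are computed once and reused. Tag ranking is the same pattern with the roles swapped: store one vector per candidate tag and query with the image vector.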

🧠 Models

  • nomic-embed-text (text side, ~270 MB).
  • nomic-embed-vision (vision side, aligned with the text model).

🛠️ Getting Started

📋 Prerequisites

  • .NET 8.0 or later

▶️ Running the Application

git clone https://github.com/LM-Kit/lm-kit-net-samples
cd lm-kit-net-samples/console_net/text-analysis/embeddings/multimodal_embeddings
dotnet run

Both models load once at startup. Pick a mode from the menu.
