Build Semantic Search with Embeddings

Semantic search finds results by meaning, not keywords. Instead of matching exact words, it converts text into numerical vectors (embeddings) and finds the closest matches in vector space. This means a search for "how to fix a broken pipe" will match a document about "plumbing repair techniques" even though they share no common words.

This tutorial builds a working semantic search system from scratch: generating embeddings, computing similarity, and searching a document collection.


Why Local Semantic Search Matters

Two enterprise problems that on-device semantic search solves:

  1. Sensitive knowledge base search. Internal wikis, HR policies, legal contracts, and proprietary research contain information that cannot be sent to cloud embedding APIs. Local embedding generation keeps semantic fingerprints of your data entirely on-premises.
  2. Offline search for field workers. Technicians querying equipment manuals, inspectors searching compliance documents, and researchers browsing paper archives need semantic search without internet connectivity.

Prerequisites

Requirement  Minimum
.NET SDK     8.0+
VRAM         1 GB (embedding models are small)
Disk         ~500 MB free for model download

Step 1: Create the Project

dotnet new console -n SemanticSearchQuickstart
cd SemanticSearchQuickstart
dotnet add package LM-Kit.NET

Step 2: Understand Embeddings

An embedding model converts text into a fixed-size vector of floating-point numbers. Texts with similar meanings produce vectors that are close together in this high-dimensional space. Cosine similarity measures how closely two vectors point in the same direction: scores near 1.0 indicate near-identical meaning, while scores near 0.0 indicate unrelated content.

  "plumbing repair"  ──►  [0.12, 0.85, -0.33, ...]  ─┐
                                                     ├─ similarity: 0.91
  "fix broken pipe"  ──►  [0.14, 0.82, -0.30, ...]  ─┘

  "quantum physics"  ──►  [-0.71, 0.02, 0.55, ...]  ── similarity: 0.08
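The scores above are cosine similarities: the dot product of two vectors divided by the product of their lengths. A minimal sketch of that computation in plain C# (the tutorial code below relies on LM-Kit's built-in Embedder.GetCosineSimilarity rather than this hand-rolled version):

```csharp
// Cosine similarity: dot(a, b) / (|a| * |b|)
// Returns 1.0 for vectors pointing the same way, ~0.0 for unrelated ones.
static float CosineSimilarity(float[] a, float[] b)
{
    float dot = 0f, normA = 0f, normB = 0f;
    for (int i = 0; i < a.Length; i++)
    {
        dot   += a[i] * b[i];
        normA += a[i] * a[i];
        normB += b[i] * b[i];
    }
    return dot / (MathF.Sqrt(normA) * MathF.Sqrt(normB));
}
```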

Step 3: Basic Embedding and Similarity

This program generates embeddings for a set of documents and finds the most relevant ones for a user query.

using System.Text;
using LMKit.Embeddings;
using LMKit.Model;

LMKit.Licensing.LicenseManager.SetLicenseKey("");

Console.InputEncoding = Encoding.UTF8;
Console.OutputEncoding = Encoding.UTF8;

// ──────────────────────────────────────
// 1. Load embedding model
// ──────────────────────────────────────
Console.WriteLine("Loading embedding model...");
using LM model = LM.LoadFromModelID("embeddinggemma-300m",
    downloadingProgress: (_, len, read) =>
    {
        if (len.HasValue) Console.Write($"\r  Downloading: {(double)read / len.Value * 100:F1}%   ");
        return true;
    },
    loadingProgress: p => { Console.Write($"\r  Loading: {p * 100:F0}%   "); return true; });
Console.WriteLine("\n");

// ──────────────────────────────────────
// 2. Define a document collection
// ──────────────────────────────────────
string[] documents =
{
    "LM-Kit.NET provides local LLM inference for .NET applications.",
    "Retrieval-augmented generation combines search with text generation.",
    "GPU acceleration significantly speeds up model inference.",
    "Sentiment analysis classifies text as positive, negative, or neutral.",
    "Embeddings convert text into numerical vectors for similarity search.",
    "Fine-tuning adapts a pre-trained model to a specific domain.",
    "Voice activity detection identifies speech segments in audio.",
    "Vision language models can describe and analyze images.",
    "Agent orchestration coordinates multiple AI agents on a shared task.",
    "Structured extraction pulls typed fields from unstructured text."
};

// ──────────────────────────────────────
// 3. Generate embeddings for all documents
// ──────────────────────────────────────
var embedder = new Embedder(model);

Console.WriteLine("Generating embeddings for document collection...");
float[][] documentEmbeddings = embedder.GetEmbeddings(documents);
Console.WriteLine($"  Indexed {documents.Length} documents ({documentEmbeddings[0].Length} dimensions)\n");

// ──────────────────────────────────────
// 4. Search loop
// ──────────────────────────────────────
Console.WriteLine("Enter a search query (or 'quit' to exit):\n");

while (true)
{
    Console.ForegroundColor = ConsoleColor.Green;
    Console.Write("Search: ");
    Console.ResetColor();

    string? query = Console.ReadLine();
    if (string.IsNullOrWhiteSpace(query) || query.Equals("quit", StringComparison.OrdinalIgnoreCase))
        break;

    // Embed the query
    float[] queryEmbedding = embedder.GetEmbeddings(query);

    // Compute similarity against all documents
    var results = new List<(int Index, float Score)>();
    for (int i = 0; i < documents.Length; i++)
    {
        float score = Embedder.GetCosineSimilarity(queryEmbedding, documentEmbeddings[i]);
        results.Add((i, score));
    }

    // Sort by similarity (highest first)
    results.Sort((a, b) => b.Score.CompareTo(a.Score));

    // Show top 3 results
    Console.ForegroundColor = ConsoleColor.Cyan;
    Console.WriteLine("\n  Top matches:");
    Console.ResetColor();

    for (int i = 0; i < Math.Min(3, results.Count); i++)
    {
        var (index, score) = results[i];
        Console.ForegroundColor = ConsoleColor.DarkGray;
        Console.Write($"  {score:F3}  ");
        Console.ResetColor();
        Console.WriteLine(documents[index]);
    }

    Console.WriteLine();
}

Run it:

dotnet run

Example session:

Search: how do I make the model faster?
  Top matches:
  0.847  GPU acceleration significantly speeds up model inference.
  0.712  LM-Kit.NET provides local LLM inference for .NET applications.
  0.534  Fine-tuning adapts a pre-trained model to a specific domain.

Search: understanding images
  Top matches:
  0.821  Vision language models can describe and analyze images.
  0.503  Embeddings convert text into numerical vectors for similarity search.
  0.412  Structured extraction pulls typed fields from unstructured text.
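The loop above compares the query against every document on each search. One common optimization for larger collections: cosine similarity depends only on vector direction, so if you normalize each vector once at index time, every comparison reduces to a plain dot product. A minimal sketch, independent of the LM-Kit API:

```csharp
// Scale v in place to unit length; afterwards, cosine similarity between
// two normalized vectors is simply their dot product.
static void Normalize(float[] v)
{
    float norm = 0f;
    foreach (float x in v) norm += x * x;
    norm = MathF.Sqrt(norm);
    if (norm == 0f) return; // leave all-zero vectors untouched
    for (int i = 0; i < v.Length; i++) v[i] /= norm;
}

static float Dot(float[] a, float[] b)
{
    float sum = 0f;
    for (int i = 0; i < a.Length; i++) sum += a[i] * b[i];
    return sum;
}
```

Normalize every entry in documentEmbeddings (and each query embedding) once, then replace the GetCosineSimilarity call with Dot: the scores are identical, with roughly a third of the arithmetic per comparison.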

Step 4: Scaling Up with DataSource

For larger collections, use DataSource to persist embeddings to disk so you don't re-embed on every run:

using LMKit.Data;
using LMKit.Retrieval;

const string IndexPath = "search_index.dat";

DataSource dataSource;
if (File.Exists(IndexPath))
{
    Console.WriteLine("Loading existing index...");
    dataSource = DataSource.LoadFromFile(IndexPath, readOnly: false);
}
else
{
    dataSource = DataSource.CreateFileDataSource(IndexPath, "Documents", model);
}

var rag = new RagEngine(model);
rag.AddDataSource(dataSource);
rag.DefaultIChunking = new TextChunking
{
    MaxChunkSize = 300,
    MaxOverlapSize = 50
};

// Index new documents
if (!dataSource.HasSection("manual"))
{
    string content = File.ReadAllText("docs/user-manual.txt");
    rag.ImportText(content, "Documents", "manual");
}

// Search
var matches = rag.FindMatchingPartitions("how to reset settings", topK: 5, minScore: 0.3f);

foreach (var match in matches)
{
    Console.WriteLine($"  [{match.SectionIdentifier}] score={match.Similarity:F3}");
    Console.WriteLine($"  {match.Content}\n");
}

This approach indexes once and loads instantly on subsequent runs.


Step 5: Batch Embedding for Performance

When indexing large collections, use the batch API to embed multiple texts in a single call:

// Batch embedding (more efficient than one-by-one)
string[] texts = File.ReadAllLines("products.csv");
float[][] embeddings = embedder.GetEmbeddings(texts);

// Async version for non-blocking operation
float[][] asyncEmbeddings = await embedder.GetEmbeddingsAsync(texts);

Choosing an Embedding Model

Model ID             Dimensions  Size     Best For
embeddinggemma-300m  256         ~300 MB  General-purpose, fast (recommended start)
nomic-embed-text     768         ~260 MB  Higher-quality text embeddings

Both models are downloaded automatically with LoadFromModelID. Use embeddinggemma-300m as a default.
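One caveat when switching models: vectors produced by different embedding models (or at different dimensions) live in unrelated vector spaces, so an index built with one model cannot be queried with another. The switch itself is a one-line change, but re-embed the whole collection afterwards:

```csharp
// Swap the model ID, then regenerate all document embeddings;
// vectors from the previous model are not comparable to the new ones.
using LM model = LM.LoadFromModelID("nomic-embed-text");
```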


Common Issues

Problem                                  Cause                                 Fix
Low similarity scores for related texts  Embedding model not suited to domain  Try nomic-embed-text, or fine-tune on domain data
All scores cluster near 0.5              Texts too short (few tokens)          Provide more context per document; chunk at 200-500 tokens
Slow batch embedding                     Large corpus on CPU                   Use GPU backend; embed offline and persist with DataSource
Index file grows too large               Many large documents                  Reduce chunk size; use TextChunking with smaller MaxChunkSize

Next Steps