Table of Contents

👉 Try the demo: https://github.com/LM-Kit/lm-kit-net-samples/tree/main/console_net/vision/image-labeling/image_tag_generator

Image Tag Index & Lookup for C# .NET Applications


🎯 Purpose of the Demo

An interactive console app that tags every image in a folder (5–10 short, lower-case tags each), builds an inverted index, and provides REPL-style tag lookup over the index. Built on LM-Kit.NET's MultiTurnConversation combined with Grammar.PredefinedGrammar.JsonStringArray — the model is constrained to emit a valid JSON string array, so the output deserialises cleanly with System.Text.Json.

All inference runs on-device.


👥 Industry Target Audience

  • E-commerce / catalogues: auto-tag product imagery at upload time.
  • DAM (digital asset management): enrich the asset library with searchable tags.
  • Social platforms: generate discovery metadata for feed images.
  • Content moderation: surface candidate tags ("weapon", "nudity", "screenshot") as pre-filters.

🚀 Problem Solved

Free-form captioning is too verbose for indexing; single-label classification is too narrow. Tagging produces a small vector of descriptors that maps naturally onto search faceting, filtering, and analytics. Doing it correctly means the model output must always be parseable — this demo gets there with Grammar.PredefinedGrammar.JsonStringArray, which constrains decoding to a valid JSON string array. No regex fallback, no recovery code.


💻 Application Overview

Interactive menu — no command-line arguments — with four modes. Vision model is loaded once at startup.

Mode What it does
Live Type one image path; get a JSON-clean tag list back.
Index Walk a folder, tag every image, build both tags_index.json (per-image tags) and tags_inverted.csv (tag → count + sample images).
Lookup After indexing: REPL of tag queries; each one prints matching image paths.
Stats Distinct image / tag counts, total tag uses, top-15 tag frequencies.
Quit Exit.

Output artefacts (after Index):

  • tags_index.json{ "img.jpg": ["red", "shirt", "product photo"], ... }.
  • tags_inverted.csvtag, count, sample_images.

✨ Key Features

  • Grammar.PredefinedGrammar.JsonStringArray — guarantees the model output is a valid JSON string array. No tolerant parsing.
  • System.Text.Json.JsonSerializer.Deserialize<string[]> reads the result directly.
  • Inverted index turns per-image tags into a real search artefact.
  • In-memory reuse of the index across the lookup REPL.
  • RandomSampling with low temperature for stable but varied tags.

🧠 Supported Models

  • Google Gemma 3 VL 4B (~3 GB VRAM) — fast default.
  • Google Gemma 3 VL 12B (~8 GB VRAM).
  • Alibaba Qwen 2 VL 2B / 7B (~2 / ~6 GB VRAM).
  • Any custom vision-language model URI.

🛠️ Getting Started

📋 Prerequisites

  • .NET 8.0 or later
  • VRAM appropriate to the selected vision model

▶️ Running the Application

git clone https://github.com/LM-Kit/lm-kit-net-samples
cd lm-kit-net-samples/console_net/vision/image-labeling/image_tag_generator
dotnet run

Pick a model, pick a mode, follow the prompts.

Share