👉 Try the demo: https://github.com/LM-Kit/lm-kit-net-samples/tree/main/console_net/vision/image-labeling/image_tag_generator
Image Tag Index & Lookup for C# .NET Applications
🎯 Purpose of the Demo
An interactive console app that tags every image in a folder (5–10 short, lower-case tags each), builds an inverted index, and provides REPL-style tag lookup over the index. Built on LM-Kit.NET's MultiTurnConversation combined with Grammar.PredefinedGrammar.JsonStringArray — the model is constrained to emit a valid JSON string array, so the output deserialises cleanly with System.Text.Json.
All inference runs on-device.
👥 Industry Target Audience
- E-commerce / catalogues: auto-tag product imagery at upload time.
- DAM (digital asset management): enrich the asset library with searchable tags.
- Social platforms: generate discovery metadata for feed images.
- Content moderation: surface candidate tags ("weapon", "nudity", "screenshot") as pre-filters.
🚀 Problem Solved
Free-form captioning is too verbose for indexing; single-label classification is too narrow. Tagging produces a small vector of descriptors that maps naturally onto search faceting, filtering, and analytics. Doing it correctly means the model output must always be parseable — this demo gets there with Grammar.PredefinedGrammar.JsonStringArray, which constrains decoding to a valid JSON string array. No regex fallback, no recovery code.
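As a rough illustration of why a grammar-constrained output needs no recovery code, the sketch below reads a JSON string array with System.Text.Json. It does not call LM-Kit.NET at all; `modelOutput` stands in for whatever text the model returned. The `TagParsing` class and its `ParseTags` helper are hypothetical names, and the trim / lower-case / de-duplicate step is an assumption, not something the demo is confirmed to do.

```csharp
using System;
using System.Linq;
using System.Text.Json;

// Hypothetical helper, not part of LM-Kit.NET or the demo source.
public static class TagParsing
{
    // Parses a JSON string array such as ["red", "shirt"] into tags.
    public static string[] ParseTags(string modelOutput)
    {
        // The grammar guarantees a valid JSON string array, so a direct
        // Deserialize call is enough; no regex or retry logic is needed.
        string[] raw = JsonSerializer.Deserialize<string[]>(modelOutput)
                       ?? Array.Empty<string>();

        // Assumed normalisation: trim, lower-case, drop empties and duplicates.
        return raw
            .Select(t => t.Trim().ToLowerInvariant())
            .Where(t => t.Length > 0)
            .Distinct()
            .ToArray();
    }
}
```

Calling `TagParsing.ParseTags("[\"Red\", \" shirt \", \"red\"]")` would yield the two tags `red` and `shirt`.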
💻 Application Overview
The app is driven by an interactive menu (no command-line arguments) with four modes plus Quit. The vision model is loaded once at startup.
| Mode | What it does |
|---|---|
| Live | Type one image path; get a JSON-clean tag list back. |
| Index | Walk a folder, tag every image, build both tags_index.json (per-image tags) and tags_inverted.csv (tag → count + sample images). |
| Lookup | After indexing: REPL of tag queries; each one prints matching image paths. |
| Stats | Distinct image / tag counts, total tag uses, top-15 tag frequencies. |
| Quit | Exit. |
Output artefacts (after Index):
- tags_index.json: { "img.jpg": ["red", "shirt", "product photo"], ... }
- tags_inverted.csv: columns tag, count, sample_images.
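The step from per-image tags to an inverted index can be sketched as follows. This is a minimal illustration under stated assumptions, not the demo's actual code. The `TagIndex` class, the `maxSamples` parameter, and the `;` separator inside the sample_images column are all hypothetical; only the column names tag, count, sample_images come from the description above.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;

// Hypothetical helper, not taken from the demo source.
public static class TagIndex
{
    // Inverts image -> tags into tag -> list of image paths.
    public static SortedDictionary<string, List<string>> Invert(
        IReadOnlyDictionary<string, string[]> perImage)
    {
        var inverted = new SortedDictionary<string, List<string>>(StringComparer.Ordinal);
        foreach (var (image, tags) in perImage)
        {
            // Distinct() keeps an image from being listed twice under one tag.
            foreach (string tag in tags.Distinct())
            {
                if (!inverted.TryGetValue(tag, out var images))
                    inverted[tag] = images = new List<string>();
                images.Add(image);
            }
        }
        return inverted;
    }

    // Renders the inverted index as CSV text with a header row.
    public static string ToCsv(
        SortedDictionary<string, List<string>> inverted, int maxSamples = 3)
    {
        var sb = new StringBuilder("tag,count,sample_images\n");
        foreach (var (tag, images) in inverted)
        {
            // Assumed layout: sample paths joined with ';' inside one quoted field.
            string samples = string.Join(";", images.Take(maxSamples));
            sb.Append(Quote(tag)).Append(',')
              .Append(images.Count).Append(',')
              .Append(Quote(samples)).Append('\n');
        }
        return sb.ToString();
    }

    // Quotes a CSV field and doubles any embedded quotes.
    private static string Quote(string field) =>
        "\"" + field.Replace("\"", "\"\"") + "\"";
}
```

Quoting every text field keeps multi-word tags such as "product photo" and paths containing commas from breaking the column layout.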
✨ Key Features
- Grammar.PredefinedGrammar.JsonStringArray: guarantees the model output is a valid JSON string array. No tolerant parsing.
- System.Text.Json.JsonSerializer.Deserialize<string[]> reads the result directly.
- Inverted index turns per-image tags into a real search artefact.
- In-memory reuse of the index across the lookup REPL.
- RandomSampling with low temperature for stable but varied tags.
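To make the in-memory lookup and the top-15 frequency listing concrete, here is a small sketch that works on an inverted index held in a dictionary. The `TagQueries` class and both method names are hypothetical, and normalising the query to lower case is an assumption based on the tags being lower-case. The demo's real REPL may be organised differently.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper, not taken from the demo source.
public static class TagQueries
{
    // Returns the image paths recorded for a tag, or an empty list.
    public static IReadOnlyList<string> Lookup(
        IReadOnlyDictionary<string, List<string>> inverted, string query)
    {
        // Assumed normalisation so "Red " matches the stored tag "red".
        string tag = query.Trim().ToLowerInvariant();
        if (inverted.TryGetValue(tag, out var images))
            return images;
        return Array.Empty<string>();
    }

    // Returns the most frequent tags, highest count first, ties by tag name.
    public static List<(string Tag, int Count)> TopTags(
        IReadOnlyDictionary<string, List<string>> inverted, int top = 15)
    {
        return inverted
            .Select(kv => (Tag: kv.Key, Count: kv.Value.Count))
            .OrderByDescending(x => x.Count)
            .ThenBy(x => x.Tag, StringComparer.Ordinal)
            .Take(top)
            .ToList();
    }
}
```

Because the dictionary stays in memory, each lookup is a single hash or tree probe rather than a rescan of every image's tag list.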
🧠 Supported Models
- Google Gemma 3 VL 4B (~3 GB VRAM) — fast default.
- Google Gemma 3 VL 12B (~8 GB VRAM).
- Alibaba Qwen 2 VL 2B / 7B (~2 / ~6 GB VRAM).
- Any custom vision-language model URI.
🛠️ Getting Started
📋 Prerequisites
- .NET 8.0 or later
- VRAM appropriate to the selected vision model
▶️ Running the Application
git clone https://github.com/LM-Kit/lm-kit-net-samples
cd lm-kit-net-samples/console_net/vision/image-labeling/image_tag_generator
dotnet run
Pick a model, pick a mode, follow the prompts.