👉 Try the demo: https://github.com/LM-Kit/lm-kit-net-samples/tree/main/console_net/vision/image-labeling/image_tag_generator

Image Tag Index & Lookup for C# .NET Applications

🎯 Purpose of the Demo

An interactive console app that tags every image in a folder (5–10 short, lower-case tags each), builds an inverted index, and provides REPL-style tag lookup over the index. Built on LM-Kit.NET's MultiTurnConversation combined with Grammar.PredefinedGrammar.JsonStringArray — the model is constrained to emit a valid JSON string array, so the output deserialises cleanly with System.Text.Json.

All inference runs on-device.

👥 Industry Target Audience

E-commerce / catalogues: auto-tag product imagery at upload time.
DAM (digital asset management): enrich the asset library with searchable tags.
Social platforms: generate discovery metadata for feed images.
Content moderation: surface candidate tags ("weapon", "nudity", "screenshot") as pre-filters.

🚀 Problem Solved

Free-form captioning is too verbose for indexing; single-label classification is too narrow. Tagging produces a small vector of descriptors that maps naturally onto search faceting, filtering, and analytics. Doing it correctly means the model output must always be parseable — this demo gets there with Grammar.PredefinedGrammar.JsonStringArray, which constrains decoding to a valid JSON string array. No regex fallback, no recovery code.
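
As a rough illustration of why grammar-constrained output simplifies parsing, the sketch below deserialises a JSON string array in a single call. The `modelOutput` literal is a hypothetical stand-in for a model reply, and the normalisation step (trim, lower-case, de-duplicate, cap at 10) is an assumption for illustration rather than the demo's confirmed behaviour. Only `System.Text.Json` and LINQ are used; no LM-Kit.NET calls are shown.

```csharp
using System;
using System.Linq;
using System.Text.Json;

// Hypothetical raw reply; in the demo this string would come from the model.
string modelOutput = "[\"Red\", \"shirt\", \" product photo \", \"red\"]";

// The reply is already a JSON string array, so one Deserialize call is enough.
string[] rawTags = JsonSerializer.Deserialize<string[]>(modelOutput) ?? Array.Empty<string>();

// Assumed normalisation: trim, lower-case, drop empties and duplicates, keep at most 10.
string[] tags = rawTags
    .Select(t => t.Trim().ToLowerInvariant())
    .Where(t => t.Length > 0)
    .Distinct()
    .Take(10)
    .ToArray();

Console.WriteLine(string.Join(", ", tags)); // prints: red, shirt, product photo
```

Because the input shape is fixed by the grammar, the only defensive code left is the null check on the deserialised array.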

💻 Application Overview

The app is driven by an interactive menu (no command-line arguments) offering four modes plus Quit. The vision model is loaded once at startup.

Mode	What it does
Live	Type one image path; get a JSON-clean tag list back.
Index	Walk a folder, tag every image, build both `tags_index.json` (per-image tags) and `tags_inverted.csv` (tag → count + sample images).
Lookup	After indexing: REPL of tag queries; each one prints matching image paths.
Stats	Distinct image / tag counts, total tag uses, top-15 tag frequencies.
Quit	Exit.
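
The Lookup and Stats modes can be pictured as simple operations over an in-memory inverted index. The sketch below is one possible shape, not the demo's actual code: the sample data, the `Lookup` helper, and the case-insensitive matching are all assumptions made for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical per-image tags, standing in for what Index mode would produce.
var tagsByImage = new Dictionary<string, string[]>
{
    ["a.jpg"] = new[] { "red", "shirt", "product photo" },
    ["b.jpg"] = new[] { "red", "car" },
    ["c.jpg"] = new[] { "car", "street" },
};

// Inverted index: tag -> sorted set of image paths carrying that tag.
var inverted = new Dictionary<string, SortedSet<string>>(StringComparer.OrdinalIgnoreCase);
foreach (var (image, tags) in tagsByImage)
{
    foreach (var tag in tags)
    {
        if (!inverted.TryGetValue(tag, out var images))
            inverted[tag] = images = new SortedSet<string>(StringComparer.Ordinal);
        images.Add(image);
    }
}

// Lookup: a single dictionary probe per query (hypothetical helper).
IReadOnlyCollection<string> Lookup(string query)
{
    if (inverted.TryGetValue(query.Trim(), out var hits)) return hits;
    return Array.Empty<string>();
}

// Stats: distinct counts, total tag uses, and the 15 most frequent tags.
int distinctImages = tagsByImage.Count;
int distinctTags = inverted.Count;
int totalUses = inverted.Values.Sum(set => set.Count);
var topTags = inverted
    .OrderByDescending(kv => kv.Value.Count)
    .ThenBy(kv => kv.Key, StringComparer.Ordinal)
    .Take(15)
    .Select(kv => (Tag: kv.Key, Count: kv.Value.Count))
    .ToList();

Console.WriteLine(string.Join(", ", Lookup(" RED "))); // prints: a.jpg, b.jpg
Console.WriteLine($"{distinctImages} images, {distinctTags} tags, {totalUses} uses");
foreach (var (tag, count) in topTags) Console.WriteLine($"{tag}: {count}");
```

Keeping the index in a dictionary means each lookup costs one hash probe regardless of how many images were tagged.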

Output artefacts (after Index):

tags_index.json — { "img.jpg": ["red", "shirt", "product photo"], ... }.
tags_inverted.csv — tag, count, sample_images.
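
To make the two file layouts concrete, here is a minimal sketch that writes both artefacts from a small in-memory map. The sample data, the `Csv` quoting helper, the `;` separator between sample images, and the limit of three samples per tag are assumptions; the demo's exact formatting may differ.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;
using System.Text;
using System.Text.Json;

// Hypothetical per-image tags, standing in for the result of Index mode.
var tagsByImage = new SortedDictionary<string, string[]>(StringComparer.Ordinal)
{
    ["img1.jpg"] = new[] { "red", "shirt", "product photo" },
    ["img2.jpg"] = new[] { "red", "car" },
};

// tags_index.json: one JSON object mapping image path -> tag array.
string json = JsonSerializer.Serialize(tagsByImage, new JsonSerializerOptions { WriteIndented = true });
File.WriteAllText("tags_index.json", json);

// Quote a CSV field only when it contains a comma, a quote, or a newline.
string Csv(string field) =>
    field.Contains(',') || field.Contains('"') || field.Contains('\n')
        ? "\"" + field.Replace("\"", "\"\"") + "\""
        : field;

// Group images by tag, most frequent tag first, ties broken alphabetically.
var rows = tagsByImage
    .SelectMany(kv => kv.Value.Select(tag => (Tag: tag, Image: kv.Key)))
    .GroupBy(x => x.Tag, x => x.Image)
    .Select(g => (Tag: g.Key, Images: g.Distinct().ToList()))
    .OrderByDescending(r => r.Images.Count)
    .ThenBy(r => r.Tag, StringComparer.Ordinal)
    .ToList();

// tags_inverted.csv: tag, count, sample_images (up to three samples, ';'-separated).
var csv = new StringBuilder("tag,count,sample_images\n");
foreach (var row in rows)
{
    string samples = string.Join(";", row.Images.Take(3));
    csv.Append(Csv(row.Tag)).Append(',')
       .Append(row.Images.Count).Append(',')
       .Append(Csv(samples)).Append('\n');
}
File.WriteAllText("tags_inverted.csv", csv.ToString());
```

Both files are written to the current working directory; the first data row of the CSV for this sample is `red,2,img1.jpg;img2.jpg`.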

✨ Key Features

Grammar.PredefinedGrammar.JsonStringArray — guarantees the model output is a valid JSON string array. No tolerant parsing.
System.Text.Json.JsonSerializer.Deserialize<string[]> reads the result directly.
Inverted index turns per-image tags into a real search artefact.
In-memory reuse of the index across the lookup REPL.
RandomSampling with low temperature for stable but varied tags.

🧠 Supported Models

Google Gemma 3 VL 4B (~3 GB VRAM) — fast default.
Google Gemma 3 VL 12B (~8 GB VRAM).
Alibaba Qwen 2 VL 2B / 7B (~2 / ~6 GB VRAM).
Any custom vision-language model URI.

🛠️ Getting Started

📋 Prerequisites

.NET 8.0 or later
VRAM appropriate to the selected vision model

▶️ Running the Application

git clone https://github.com/LM-Kit/lm-kit-net-samples
cd lm-kit-net-samples/console_net/vision/image-labeling/image_tag_generator
dotnet run

Pick a model, pick a mode, follow the prompts.
