What Languages Can LM-Kit.NET Models Understand and Generate?


TL;DR

Most models in the LM-Kit.NET catalog are multilingual by training. The Qwen 3.5 and Gemma 4 families support dozens of languages out of the box, including English, Chinese, Japanese, Korean, French, German, Spanish, Arabic, Russian, and many more. LM-Kit.NET also provides dedicated translation and language detection APIs for structured multilingual workflows.


Multilingual Models in the Catalog

Language capabilities depend on the model you choose. The main multilingual families in the catalog are:

| Model Family | Languages | Notes |
|---|---|---|
| Qwen 3.5 (qwen3.5:0.8b to qwen3.5:27b) | 30+ languages | Strong multilingual instruction following. Excellent for Chinese, Japanese, Korean, and European languages. |
| Qwen 3 Embedding (qwen3-embedding:*) | 30+ languages | Multilingual semantic search and RAG. |
| Gemma 4 (gemma4:e4b to gemma4:26b-a4b) | 30+ languages | Broad multilingual support. Good for European and Asian languages. |
| Mistral / Magistral (mistral-small, magistral-small) | 20+ languages | Strong for European languages, especially French. |
| Llama 3.1 (llama3.1:8b) | 8 languages | English, German, French, Italian, Portuguese, Hindi, Spanish, Thai. |
| BGE-M3 (bge-m3) | 100+ languages | Multilingual embedding model. Excellent for cross-language search. |
| Whisper (whisper-*) | 99 languages | Speech-to-text transcription across nearly all major languages. |

Translation API

LM-Kit.NET includes a dedicated TextTranslation class that handles translation between any supported language pair:

using LMKit.Model;
using LMKit.Translation;

using LM model = LM.LoadFromModelID("qwen3.5:9b");

var translator = new TextTranslation(model);
var result = translator.Translate(
    text: "How do I configure GPU backends?",
    sourceLanguage: Language.English,
    targetLanguage: Language.French
);

Console.WriteLine(result);
// "Comment configurer les backends GPU ?"

The translation API automatically handles text chunking for long documents and preserves formatting.
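
To make the idea of chunking concrete, here is a minimal sketch of one common strategy: grouping whole paragraphs into chunks that stay under a size limit. It is not LM-Kit.NET's internal algorithm, which the library applies for you. The `ChunkByParagraph` helper and the `maxChars` parameter are hypothetical names introduced only for this illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Text;

// Hypothetical helper (NOT part of LM-Kit.NET): groups paragraphs into chunks
// of at most maxChars characters. A single paragraph longer than maxChars is
// kept whole and becomes its own oversized chunk.
static List<string> ChunkByParagraph(string text, int maxChars)
{
    var result = new List<string>();
    var current = new StringBuilder();

    // Split on blank lines so each paragraph stays intact.
    string[] paragraphs = text.Split(new[] { "\n\n" }, StringSplitOptions.RemoveEmptyEntries);

    foreach (string paragraph in paragraphs)
    {
        // Flush the current chunk if adding this paragraph would exceed the limit.
        if (current.Length > 0 && current.Length + 2 + paragraph.Length > maxChars)
        {
            result.Add(current.ToString());
            current.Clear();
        }

        // Re-insert the paragraph separator between paragraphs of the same chunk.
        if (current.Length > 0)
        {
            current.Append("\n\n");
        }
        current.Append(paragraph);
    }

    // Emit whatever is left after the last paragraph.
    if (current.Length > 0)
    {
        result.Add(current.ToString());
    }
    return result;
}

string document = "First paragraph.\n\nSecond paragraph.\n\nThird paragraph.";
List<string> chunks = ChunkByParagraph(document, 40);

foreach (string chunk in chunks)
{
    Console.WriteLine($"[{chunk.Length} chars] {chunk.Replace("\n\n", " / ")}");
}
```

Splitting on paragraph boundaries keeps sentences intact, which matters for translation quality because the model sees complete units of meaning rather than text cut mid-sentence.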


Language Detection

LM-Kit.NET can automatically detect the language of input text, including specialized refiners for challenging language families:

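using LMKit.Model;
using LMKit.Translation;

// Load a multilingual model to drive detection
using LM model = LM.LoadFromModelID("qwen3.5:9b");

var detector = new LanguageDetection(model);
var detection = detector.DetectLanguage("Bonjour, comment allez-vous ?");

Console.WriteLine($"Language: {detection.Language}");     // French
Console.WriteLine($"Confidence: {detection.Confidence}"); // e.g. 0.98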

The detection engine includes specialized components for distinguishing between:

  • CJK languages (Chinese, Japanese, Korean)
  • Cyrillic-script languages (Russian, Ukrainian, Bulgarian, Serbian)
  • Latin-script Slavic languages (Polish, Czech, Slovak, Croatian)
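
The groups above are hard because their members share a writing system. The following sketch, which is independent of LM-Kit.NET and does not reflect how its detection engine works, classifies text by Unicode block alone. The `GuessScript` helper is a hypothetical name for this illustration. It shows how far script inspection gets you: Kana and Hangul point to a single language, while Han, Cyrillic, and Latin only narrow the answer to a family.

```csharp
using System;

// Hypothetical helper (NOT part of LM-Kit.NET): guesses the writing system of
// a string by counting characters in a few Unicode blocks.
static string GuessScript(string text)
{
    int kana = 0, hangul = 0, han = 0, cyrillic = 0, latin = 0;

    foreach (char c in text)
    {
        if (c >= '\u3040' && c <= '\u30FF') kana++;          // Hiragana and Katakana
        else if (c >= '\uAC00' && c <= '\uD7AF') hangul++;   // Hangul Syllables
        else if (c >= '\u4E00' && c <= '\u9FFF') han++;      // CJK Unified Ideographs
        else if (c >= '\u0400' && c <= '\u04FF') cyrillic++; // Cyrillic
        else if ((c >= 'A' && c <= 'Z') || (c >= 'a' && c <= 'z')) latin++;
    }

    // Japanese mixes Han characters with kana, so kana is checked before Han.
    if (kana > 0) return "Kana";
    if (hangul > 0) return "Hangul";
    if (han > 0) return "Han";
    if (cyrillic > 0) return "Cyrillic";
    if (latin > 0) return "Latin";
    return "Unknown";
}

Console.WriteLine(GuessScript("こんにちは世界")); // Kana
Console.WriteLine(GuessScript("안녕하세요"));     // Hangul
Console.WriteLine(GuessScript("你好世界"));       // Han
Console.WriteLine(GuessScript("Привет"));         // Cyrillic
Console.WriteLine(GuessScript("Dzień dobry"));    // Latin
```

A result of "Cyrillic" cannot separate Russian from Ukrainian, and "Latin" cannot separate Polish from Czech. Telling those apart requires looking at vocabulary and language-specific letters, which is the kind of work a dedicated detection component has to do.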

Multilingual RAG

For multilingual knowledge bases, use a multilingual embedding model so queries in one language can retrieve documents written in another:

using LMKit.Model;
using LMKit.Retrieval;

// BGE-M3 supports 100+ languages for cross-language retrieval
using LM embeddingModel = LM.LoadFromModelID("bge-m3");
var ragEngine = new RagEngine(embeddingModel);

// Index documents in multiple languages
ragEngine.ImportDocument("manual-en.pdf");
ragEngine.ImportDocument("manual-fr.pdf");
ragEngine.ImportDocument("manual-de.pdf");

// Query in any language retrieves relevant passages regardless of document language
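
Cross-language retrieval works because a multilingual embedding model places texts with similar meaning close together in vector space, whatever language they are written in. The sketch below illustrates that ranking step with cosine similarity, a standard measure for comparing embeddings. It is an assumption-laden illustration: the three-dimensional vectors are made-up stand-ins for real embeddings, `CosineSimilarity` is a hypothetical helper, and nothing here claims to mirror what RagEngine does internally.

```csharp
using System;
using System.Linq;

// Hypothetical helper (NOT part of LM-Kit.NET): cosine similarity of two vectors.
// Returns NaN if either vector is all zeros.
static double CosineSimilarity(float[] a, float[] b)
{
    if (a.Length != b.Length)
    {
        throw new ArgumentException("Vectors must have the same length.");
    }

    double dot = 0, normA = 0, normB = 0;
    for (int i = 0; i < a.Length; i++)
    {
        dot += a[i] * b[i];
        normA += a[i] * a[i];
        normB += b[i] * b[i];
    }
    return dot / (Math.Sqrt(normA) * Math.Sqrt(normB));
}

// Toy vectors standing in for real embeddings (values are invented for the example).
var passages = new (string Text, float[] Vector)[]
{
    ("EN: Configure the GPU backend", new[] { 0.9f, 0.1f, 0.0f }),
    ("FR: Configurer le backend GPU", new[] { 0.8f, 0.2f, 0.1f }),
    ("DE: Lizenzbedingungen",         new[] { 0.0f, 0.1f, 0.9f }),
};

// Pretend this is the embedding of a question about GPU configuration.
float[] queryVector = { 0.85f, 0.15f, 0.05f };

// Rank passages from most to least similar to the query.
var ranked = passages
    .OrderByDescending(p => CosineSimilarity(queryVector, p.Vector))
    .ToArray();

foreach (var passage in ranked)
{
    Console.WriteLine($"{CosineSimilarity(queryVector, passage.Vector):F3}  {passage.Text}");
}
```

With these toy values, the English and French passages about GPU configuration both rank above the German licensing passage, because the ranking depends on the vectors and not on the language of the text.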

Tips for Non-English Use Cases

  • Choose Qwen 3.5 or Gemma 4 for the broadest language coverage in chat and generation tasks.
  • Use BGE-M3 for multilingual embeddings and cross-language RAG pipelines.
  • Use Whisper for multilingual speech-to-text. The whisper-large-turbo3 model offers the best accuracy across all 99 supported languages.
  • Larger models perform better on non-English languages. If quality in your target language is insufficient with a 4B model, try 8B or larger.
  • System prompts in the target language often improve output quality for non-English tasks.
