Route Prompts Across Models with RouterOrchestrator

Not every prompt needs the same model. A simple greeting wastes resources on a 12B parameter model, while a complex reasoning task fails on a 1B model. LM-Kit.NET's RouterOrchestrator lets you define named routes, each backed by a different agent (and potentially a different model), then direct each incoming prompt to the right agent automatically. Combined with FallbackAgentExecutor for resilience, you can build heterogeneous architectures that cut inference costs by 80%+ while maintaining quality where it matters. This guide builds a production-ready prompt routing system from scratch.


Why Prompt Routing Matters

Two production problems that prompt routing solves:

  1. Inference cost optimization at scale. Running every request through your largest model is the default approach, but 70% of production prompts are simple lookups, greetings, or short answers. Routing these to a small, fast model and reserving the large model for complex reasoning tasks reduces GPU utilization and latency dramatically. Organizations running thousands of requests per minute see immediate impact.
  2. Specialized quality across domains. A model trained for code generation might underperform on creative writing, and vice versa. Routing code questions to a code-specialized agent and general questions to a chat-optimized agent ensures each domain gets the best available model, without requiring a single model that excels at everything.

Prerequisites

Requirement | Minimum
.NET SDK    | 8.0+
VRAM        | 8+ GB (two models loaded simultaneously)
Disk        | ~6 GB free for model downloads

Note: This guide loads two models simultaneously. If VRAM is limited, you can use the same model for all agents and focus on the routing logic. The architecture remains identical.
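If you go the single-model route, only the model loading changes; the agent builder calls from Step 2 stay the same. A minimal sketch, assuming the loadingProgress callback shown later is optional and reusing the model ID and instructions from this guide:

using LM sharedModel = LM.LoadFromModelID("gemma3:1b");

// Back every agent with the same model; the routing logic is unchanged.
Agent quickAgent = Agent.CreateBuilder(sharedModel)
    .WithPersona("quick-responder")
    .WithInstruction("You are a fast, concise assistant for simple questions.")
    .Build();

Agent reasoningAgent = Agent.CreateBuilder(sharedModel)
    .WithPersona("reasoning-expert")
    .WithInstruction("You are a thorough reasoning assistant for complex questions.")
    .Build();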


Step 1: Create the Project

dotnet new console -n PromptRouter
cd PromptRouter
dotnet add package LM-Kit.NET

Step 2: Load Multiple Models and Create Specialized Agents

using System.Text;
using LMKit.Model;
using LMKit.Agents;
using LMKit.Agents.Orchestration;
using LMKit.Agents.Resilience;
using LMKit.Agents.Tools.BuiltIn;

LMKit.Licensing.LicenseManager.SetLicenseKey("");

Console.InputEncoding = Encoding.UTF8;
Console.OutputEncoding = Encoding.UTF8;

// ──────────────────────────────────────
// 1. Load a fast model for simple tasks
// ──────────────────────────────────────
Console.WriteLine("Loading fast model (1B)...");
using LM fastModel = LM.LoadFromModelID("gemma3:1b",
    loadingProgress: p => { Console.Write($"\rFast model: {p * 100:F0}%   "); return true; });
Console.WriteLine();

// ──────────────────────────────────────
// 2. Load a capable model for complex tasks
// ──────────────────────────────────────
Console.WriteLine("Loading capable model (4B)...");
using LM capableModel = LM.LoadFromModelID("qwen3:4b",
    loadingProgress: p => { Console.Write($"\rCapable model: {p * 100:F0}%   "); return true; });
Console.WriteLine("\n");

// ──────────────────────────────────────
// 3. Create specialized agents
// ──────────────────────────────────────
Agent quickAgent = Agent.CreateBuilder(fastModel)
    .WithPersona("quick-responder")
    .WithInstruction(
        "You are a fast, concise assistant for simple questions. " +
        "Answer in 1 to 2 sentences maximum. Be direct.")
    .WithMaxIterations(1)
    .Build();

Agent reasoningAgent = Agent.CreateBuilder(capableModel)
    .WithPersona("reasoning-expert")
    .WithInstruction(
        "You are a thorough reasoning assistant for complex questions. " +
        "Think step by step, provide detailed explanations.")
    .WithTools(tools =>
    {
        tools.Register(BuiltInTools.Calculator);
        tools.Register(BuiltInTools.DateTime);
    })
    .WithMaxIterations(5)
    .Build();

Agent codeAgent = Agent.CreateBuilder(capableModel)
    .WithPersona("code-specialist")
    .WithInstruction(
        "You are a coding assistant specialized in C# and .NET. " +
        "Provide clean, well-documented code with explanations.")
    .WithMaxIterations(3)
    .Build();

Console.WriteLine("All agents ready.\n");

Step 3: Configure the RouterOrchestrator

The RouterOrchestrator maps named routes to agents and uses a routing function to direct each prompt:

// ──────────────────────────────────────
// 4. Build the router with named routes
// ──────────────────────────────────────
var router = new RouterOrchestrator()
    .AddRoute("quick", quickAgent)
    .AddRoute("reasoning", reasoningAgent)
    .AddRoute("code", codeAgent)
    .WithDefaultRoute("quick");

Console.WriteLine("Router configured with 3 routes: quick, reasoning, code\n");

Step 4: Implement Keyword-Based Routing

The simplest routing strategy uses keyword matching to classify prompts. Matching whole words rather than raw substrings avoids false positives (for example, "API" firing on "capital"), and works well for clear-cut categories:

// ──────────────────────────────────────
// 5. Keyword-based routing function
// ──────────────────────────────────────
string[] codeKeywords = { "code", "function", "class", "implement", "debug", "compile",
                          "syntax", "C#", "csharp", "method", "algorithm", "API" };

string[] reasoningKeywords = { "explain", "why", "analyze", "compare", "calculate",
                               "evaluate", "pros and cons", "step by step", "reasoning",
                               "complex", "strategy", "tradeoff" };

router.WithRoutingFunction((input, agents) =>
{
    string lower = input.ToLowerInvariant();

    // Tokenize once so single-word keywords match whole words only:
    // "API" should not fire on "capital", nor "code" on "encode".
    var words = new HashSet<string>(lower.Split(
        new[] { ' ', ',', '.', '?', '!', ';', ':', '(', ')', '"', '\'' },
        StringSplitOptions.RemoveEmptyEntries));

    bool Matches(string keyword)
    {
        string k = keyword.ToLowerInvariant();
        return k.Contains(' ') ? lower.Contains(k) : words.Contains(k);
    }

    // Check code keywords first (most specific)
    if (codeKeywords.Any(Matches))
        return "code";

    // Check reasoning keywords
    if (reasoningKeywords.Any(Matches))
        return "reasoning";

    // Short prompts go to the fast model
    if (words.Count < 15)
        return "quick";

    // Default to reasoning for longer prompts
    return "reasoning";
});

Step 5: Test the Router

// ──────────────────────────────────────
// 6. Route various prompts
// ──────────────────────────────────────
string[] testPrompts = {
    "What is the capital of France?",
    "Explain why transformer architectures use multi-head attention instead of single attention.",
    "Implement a binary search algorithm in C# with generic type support.",
    "Hello!",
    "Compare the pros and cons of microservices versus monolithic architectures for a team of 5.",
    "Write a C# method that validates email addresses using regex."
};

foreach (string prompt in testPrompts)
{
    Console.ForegroundColor = ConsoleColor.Cyan;
    Console.WriteLine($"User: {prompt}");
    Console.ResetColor();

    var result = await router.ExecuteAsync(prompt);

    Console.ForegroundColor = ConsoleColor.Green;
    Console.WriteLine($"  [Routed to: {result.AgentName}]");
    Console.ResetColor();
    Console.WriteLine($"  {result.Content}\n");
}
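With the whole-word keyword matching from Step 4, the six test prompts should route to quick, reasoning, code, quick, reasoning, and code respectively; the response text itself will vary with the models you loaded.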

Step 6: Use an LLM as the Router

For more nuanced routing, replace the keyword function with a lightweight LLM that classifies prompts. The fast model itself can serve as the router:

// ──────────────────────────────────────
// 7. LLM-powered routing
// ──────────────────────────────────────
Agent routerAgent = Agent.CreateBuilder(fastModel)
    .WithInstruction(
        "You are a prompt classifier. Given a user message, respond with exactly " +
        "one word: 'quick', 'reasoning', or 'code'. " +
        "Use 'quick' for greetings, simple facts, and short answers. " +
        "Use 'reasoning' for analysis, comparisons, and explanations. " +
        "Use 'code' for programming, implementation, and debugging tasks. " +
        "Respond with the single word only, nothing else.")
    .WithMaxIterations(1)
    .Build();

var smartRouter = new RouterOrchestrator()
    .AddRoute("quick", quickAgent)
    .AddRoute("reasoning", reasoningAgent)
    .AddRoute("code", codeAgent)
    .WithDefaultRoute("quick")
    .WithRouterAgent(routerAgent);

Console.WriteLine("── LLM-Powered Router ──\n");

var smartResult = await smartRouter.ExecuteAsync(
    "Can you help me optimize a LINQ query that's causing N+1 database calls?");

Console.WriteLine($"Routed to: {smartResult.AgentName}");
Console.WriteLine($"Response: {smartResult.Content}\n");

The WithRouterAgent method lets the router LLM decide the route based on semantic understanding rather than simple keywords.
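To sanity-check the classifier against the keyword router from Step 4, you can send the same test prompts through both orchestrators and compare the chosen routes. This uses only the ExecuteAsync and AgentName members already shown above; note that each call runs the routed agent end to end, so it generates full responses just to observe the routing decision:

// Compare keyword routing and LLM routing on the same prompts.
// Each ExecuteAsync call runs the selected agent in full.
foreach (string prompt in testPrompts)
{
    var keywordPick = await router.ExecuteAsync(prompt);
    var llmPick = await smartRouter.ExecuteAsync(prompt);

    Console.WriteLine(prompt);
    Console.WriteLine($"  keyword router -> {keywordPick.AgentName}, LLM router -> {llmPick.AgentName}\n");
}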


Step 7: Add Fallback Resilience

Use FallbackAgentExecutor so that if the primary agent fails (timeout, out-of-memory, model error), execution automatically falls back to the next agent in the chain:

// ──────────────────────────────────────
// 8. Fallback chain: capable → fast → error message
// ──────────────────────────────────────
var fallbackExecutor = new FallbackAgentExecutor()
    .AddAgent(reasoningAgent)
    .AddAgent(quickAgent)
    .OnFallback((agent, ex, attempt) =>
    {
        Console.ForegroundColor = ConsoleColor.Yellow;
        Console.WriteLine($"  [FALLBACK] Agent '{agent.Name}' failed on attempt {attempt}: {ex.Message}");
        Console.WriteLine($"  [FALLBACK] Trying next agent...");
        Console.ResetColor();
    });

Console.WriteLine("── Fallback Execution ──");
var fallbackResult = await fallbackExecutor.ExecuteAsync(
    "Explain the difference between async and parallel programming in .NET.");

Console.WriteLine($"Result from: {fallbackResult.AgentName}");
Console.WriteLine($"Response: {fallbackResult.Content}\n");

Step 8: Interactive Router with Metrics

Build a complete interactive loop that tracks routing statistics:

// ──────────────────────────────────────
// 9. Interactive loop with route tracking
// ──────────────────────────────────────
var routeStats = new Dictionary<string, int>
{
    ["quick"] = 0,
    ["reasoning"] = 0,
    ["code"] = 0
};

Console.WriteLine("── Interactive Router ──");
Console.WriteLine("Type a message (or 'quit' to exit, 'stats' for metrics):\n");

while (true)
{
    Console.ForegroundColor = ConsoleColor.White;
    Console.Write("You: ");
    string? input = Console.ReadLine();
    Console.ResetColor();

    if (string.IsNullOrWhiteSpace(input)) continue;
    if (input.Equals("quit", StringComparison.OrdinalIgnoreCase)) break;

    if (input.Equals("stats", StringComparison.OrdinalIgnoreCase))
    {
        int total = routeStats.Values.Sum();
        Console.WriteLine("\n── Routing Statistics ──");
        foreach (var stat in routeStats)
        {
            double pct = total > 0 ? (double)stat.Value / total * 100 : 0;
            Console.WriteLine($"  {stat.Key}: {stat.Value} requests ({pct:F1}%)");
        }
        Console.WriteLine($"  Total: {total} requests\n");
        continue;
    }

    var response = await smartRouter.ExecuteAsync(input);

    if (routeStats.ContainsKey(response.AgentName))
        routeStats[response.AgentName]++;

    Console.ForegroundColor = ConsoleColor.DarkGray;
    Console.WriteLine($"  [Route: {response.AgentName}]");
    Console.ResetColor();
    Console.WriteLine($"Assistant: {response.Content}\n");
}

// Final stats
Console.WriteLine("\n── Final Session Statistics ──");
foreach (var stat in routeStats)
    Console.WriteLine($"  {stat.Key}: {stat.Value} requests");

Common Issues

Problem | Cause | Fix
Out-of-memory with two models | Insufficient VRAM for simultaneous loading | Use a single model for all agents, or use smaller quantizations
Router always picks the same route | Routing function logic too broad | Add more specific keywords or use WithRouterAgent for semantic routing
LLM router returns unexpected text | Router agent instruction not strict enough | Use grammar constraints with CreateGrammarFromStringList to force valid route names
Fallback not triggered | Exception type not caught by default handler | Use HandleException on FallbackAgentExecutor to specify which exceptions trigger fallback (see the sketch below)
Slow routing with LLM router | Router model is too large | Use the smallest available model (1B) for the router agent
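For the fallback row above, a minimal sketch of exception filtering might look like the following. The predicate form of HandleException shown here is an assumption, not a confirmed signature; check the FallbackAgentExecutor API reference for the exact shape:

// Hypothetical sketch: assumes HandleException accepts a predicate over the
// thrown exception. Verify the actual signature before relying on it.
var resilientExecutor = new FallbackAgentExecutor()
    .AddAgent(reasoningAgent)
    .AddAgent(quickAgent)
    .HandleException(ex => ex is TimeoutException || ex is OutOfMemoryException);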

Next Steps