Route Prompts Across Models with RouterOrchestrator
Not every prompt needs the same model. A simple greeting wastes resources on a 12B parameter model, while a complex reasoning task fails on a 1B model. LM-Kit.NET's RouterOrchestrator lets you define named routes, each backed by a different agent (and potentially a different model), then direct each incoming prompt to the right agent automatically. Combined with FallbackAgentExecutor for resilience, you can build heterogeneous architectures that cut inference costs by 80%+ while maintaining quality where it matters. This guide builds a production-ready prompt routing system from scratch.
Why Prompt Routing Matters
Two production problems that prompt routing solves:
- Inference cost optimization at scale. Running every request through your largest model is the default approach, but 70% of production prompts are simple lookups, greetings, or short answers. Routing these to a small, fast model and reserving the large model for complex reasoning tasks reduces GPU utilization and latency dramatically. Organizations running thousands of requests per minute see immediate impact.
- Specialized quality across domains. A model trained for code generation might underperform on creative writing, and vice versa. Routing code questions to a code-specialized agent and general questions to a chat-optimized agent ensures each domain gets the best available model, without requiring a single model that excels at everything.
Prerequisites
| Requirement | Minimum |
|---|---|
| .NET SDK | 8.0+ |
| VRAM | 8+ GB (two models loaded simultaneously) |
| Disk | ~6 GB free for model downloads |
Note: This guide loads two models simultaneously. If VRAM is limited, you can use the same model for all agents and focus on the routing logic. The architecture remains identical.
Step 1: Create the Project
dotnet new console -n PromptRouter
cd PromptRouter
dotnet add package LM-Kit.NET
Step 2: Load Multiple Models and Create Specialized Agents
using System.Text;
using LMKit.Model;
using LMKit.Agents;
using LMKit.Agents.Orchestration;
using LMKit.Agents.Resilience;
using LMKit.Agents.Tools.BuiltIn;
LMKit.Licensing.LicenseManager.SetLicenseKey("");
Console.InputEncoding = Encoding.UTF8;
Console.OutputEncoding = Encoding.UTF8;
// ──────────────────────────────────────
// 1. Load a fast model for simple tasks
// ──────────────────────────────────────
Console.WriteLine("Loading fast model (1B)...");
using LM fastModel = LM.LoadFromModelID("gemma3:1b",
loadingProgress: p => { Console.Write($"\rFast model: {p * 100:F0}% "); return true; });
Console.WriteLine();
// ──────────────────────────────────────
// 2. Load a capable model for complex tasks
// ──────────────────────────────────────
Console.WriteLine("Loading capable model (4B)...");
using LM capableModel = LM.LoadFromModelID("qwen3:4b",
loadingProgress: p => { Console.Write($"\rCapable model: {p * 100:F0}% "); return true; });
Console.WriteLine("\n");
// ──────────────────────────────────────
// 3. Create specialized agents
// ──────────────────────────────────────
Agent quickAgent = Agent.CreateBuilder(fastModel)
.WithPersona("quick-responder")
.WithInstruction(
"You are a fast, concise assistant for simple questions. " +
"Answer in 1 to 2 sentences maximum. Be direct.")
.WithMaxIterations(1)
.Build();
Agent reasoningAgent = Agent.CreateBuilder(capableModel)
.WithPersona("reasoning-expert")
.WithInstruction(
"You are a thorough reasoning assistant for complex questions. " +
"Think step by step, provide detailed explanations.")
.WithTools(tools =>
{
tools.Register(BuiltInTools.Calculator);
tools.Register(BuiltInTools.DateTime);
})
.WithMaxIterations(5)
.Build();
Agent codeAgent = Agent.CreateBuilder(capableModel)
.WithPersona("code-specialist")
.WithInstruction(
"You are a coding assistant specialized in C# and .NET. " +
"Provide clean, well-documented code with explanations.")
.WithMaxIterations(3)
.Build();
Console.WriteLine("All agents ready.\n");
Step 3: Configure the RouterOrchestrator
The RouterOrchestrator maps named routes to agents and uses a routing function to direct each prompt:
// ──────────────────────────────────────
// 4. Build the router with named routes
// ──────────────────────────────────────
var router = new RouterOrchestrator()
.AddRoute("quick", quickAgent)
.AddRoute("reasoning", reasoningAgent)
.AddRoute("code", codeAgent)
.WithDefaultRoute("quick");
Console.WriteLine("Router configured with 3 routes: quick, reasoning, code\n");
Step 4: Implement Keyword-Based Routing
The simplest routing strategy uses keyword matching to classify prompts. This works well for clear-cut categories:
// ──────────────────────────────────────
// 5. Keyword-based routing function
// ──────────────────────────────────────
string[] codeKeywords = { "code", "function", "class", "implement", "debug", "compile",
"syntax", "C#", "csharp", "method", "algorithm", "API" };
string[] reasoningKeywords = { "explain", "why", "analyze", "compare", "calculate",
"evaluate", "pros and cons", "step by step", "reasoning",
"complex", "strategy", "tradeoff" };
router.WithRoutingFunction((input, agents) =>
{
string lower = input.ToLowerInvariant();
// Check code keywords first (most specific)
foreach (string keyword in codeKeywords)
{
if (lower.Contains(keyword.ToLowerInvariant()))
return "code";
}
// Check reasoning keywords
foreach (string keyword in reasoningKeywords)
{
if (lower.Contains(keyword.ToLowerInvariant()))
return "reasoning";
}
// Short prompts go to the fast model
if (input.Split(' ').Length < 15)
return "quick";
// Default to reasoning for longer prompts
return "reasoning";
});
Step 5: Test the Router
// ──────────────────────────────────────
// 6. Route various prompts
// ──────────────────────────────────────
string[] testPrompts = {
"What is the capital of France?",
"Explain why transformer architectures use multi-head attention instead of single attention.",
"Implement a binary search algorithm in C# with generic type support.",
"Hello!",
"Compare the pros and cons of microservices versus monolithic architectures for a team of 5.",
"Write a C# method that validates email addresses using regex."
};
foreach (string prompt in testPrompts)
{
Console.ForegroundColor = ConsoleColor.Cyan;
Console.WriteLine($"User: {prompt}");
Console.ResetColor();
var result = await router.ExecuteAsync(prompt);
Console.ForegroundColor = ConsoleColor.Green;
Console.WriteLine($" [Routed to: {result.AgentName}]");
Console.ResetColor();
Console.WriteLine($" {result.Content}\n");
}
Step 6: Use an LLM as the Router
For more nuanced routing, replace the keyword function with a lightweight LLM that classifies prompts. The fast model itself can serve as the router:
// ──────────────────────────────────────
// 7. LLM-powered routing
// ──────────────────────────────────────
Agent routerAgent = Agent.CreateBuilder(fastModel)
.WithInstruction(
"You are a prompt classifier. Given a user message, respond with exactly " +
"one word: 'quick', 'reasoning', or 'code'. " +
"Use 'quick' for greetings, simple facts, and short answers. " +
"Use 'reasoning' for analysis, comparisons, and explanations. " +
"Use 'code' for programming, implementation, and debugging tasks. " +
"Respond with the single word only, nothing else.")
.WithMaxIterations(1)
.Build();
var smartRouter = new RouterOrchestrator()
.AddRoute("quick", quickAgent)
.AddRoute("reasoning", reasoningAgent)
.AddRoute("code", codeAgent)
.WithDefaultRoute("quick")
.WithRouterAgent(routerAgent);
Console.WriteLine("── LLM-Powered Router ──\n");
var smartResult = await smartRouter.ExecuteAsync(
"Can you help me optimize a LINQ query that's causing N+1 database calls?");
Console.WriteLine($"Routed to: {smartResult.AgentName}");
Console.WriteLine($"Response: {smartResult.Content}\n");
The WithRouterAgent method lets the router LLM decide the route based on semantic understanding rather than simple keywords.
Step 7: Add Fallback Resilience
Wrap the router with FallbackAgentExecutor so that if the primary agent fails (timeout, out-of-memory, model error), the system automatically falls back to an alternative:
// ──────────────────────────────────────
// 8. Fallback chain: capable → fast → error message
// ──────────────────────────────────────
var fallbackExecutor = new FallbackAgentExecutor()
.AddAgent(reasoningAgent)
.AddAgent(quickAgent)
.OnFallback((agent, ex, attempt) =>
{
Console.ForegroundColor = ConsoleColor.Yellow;
Console.WriteLine($" [FALLBACK] Agent '{agent.Name}' failed on attempt {attempt}: {ex.Message}");
Console.WriteLine($" [FALLBACK] Trying next agent...");
Console.ResetColor();
});
Console.WriteLine("── Fallback Execution ──");
var fallbackResult = await fallbackExecutor.ExecuteAsync(
"Explain the difference between async and parallel programming in .NET.");
Console.WriteLine($"Result from: {fallbackResult.AgentName}");
Console.WriteLine($"Response: {fallbackResult.Content}\n");
Step 8: Interactive Router with Metrics
Build a complete interactive loop that tracks routing statistics:
// ──────────────────────────────────────
// 9. Interactive loop with route tracking
// ──────────────────────────────────────
var routeStats = new Dictionary<string, int>
{
["quick"] = 0,
["reasoning"] = 0,
["code"] = 0
};
Console.WriteLine("── Interactive Router ──");
Console.WriteLine("Type a message (or 'quit' to exit, 'stats' for metrics):\n");
while (true)
{
Console.ForegroundColor = ConsoleColor.White;
Console.Write("You: ");
string? input = Console.ReadLine();
Console.ResetColor();
if (string.IsNullOrWhiteSpace(input)) continue;
if (input.Equals("quit", StringComparison.OrdinalIgnoreCase)) break;
if (input.Equals("stats", StringComparison.OrdinalIgnoreCase))
{
int total = routeStats.Values.Sum();
Console.WriteLine("\n── Routing Statistics ──");
foreach (var stat in routeStats)
{
double pct = total > 0 ? (double)stat.Value / total * 100 : 0;
Console.WriteLine($" {stat.Key}: {stat.Value} requests ({pct:F1}%)");
}
Console.WriteLine($" Total: {total} requests\n");
continue;
}
var response = await smartRouter.ExecuteAsync(input);
if (routeStats.ContainsKey(response.AgentName))
routeStats[response.AgentName]++;
Console.ForegroundColor = ConsoleColor.DarkGray;
Console.WriteLine($" [Route: {response.AgentName}]");
Console.ResetColor();
Console.WriteLine($"Assistant: {response.Content}\n");
}
// Final stats
Console.WriteLine("\n── Final Session Statistics ──");
foreach (var stat in routeStats)
Console.WriteLine($" {stat.Key}: {stat.Value} requests");
Common Issues
| Problem | Cause | Fix |
|---|---|---|
| Out-of-memory with two models | Insufficient VRAM for simultaneous loading | Use a single model for all agents, or use smaller quantizations |
| Router always picks the same route | Routing function logic too broad | Add more specific keywords or use WithRouterAgent for semantic routing |
| LLM router returns unexpected text | Router agent instruction not strict enough | Use grammar constraints with CreateGrammarFromStringList to force valid route names |
| Fallback not triggered | Exception type not caught by default handler | Use HandleException on FallbackAgentExecutor to specify which exceptions trigger fallback |
| Slow routing with LLM router | Router model is too large | Use the smallest available model (1B) for the router agent |
Next Steps
- Build a Multi-Agent Workflow: use Pipeline, Parallel, and Supervisor orchestration alongside routing.
- Build a Resilient Production Agent: add retry policies and circuit breakers to your routed agents.
- Enforce Structured Output with Grammar-Constrained Decoding: use grammar constraints on the router agent for deterministic route selection.
- Stream Agent Responses in Real Time: add token streaming to your routed agents for responsive UIs.