Build a Resilient Production Agent

A demo agent that works on a developer's laptop is not the same as a production agent that handles real traffic. Production agents need retry logic for transient failures, timeouts to prevent hung requests, fallback behavior when the primary path fails, and health checks to verify the system is operational. This tutorial builds all of these resilience patterns on top of LM-Kit.NET's Agent API.


Why This Matters

Two production problems that resilience patterns solve:

  1. Transient failures in long-running agent loops. Agents that use tools (web search, APIs, file systems) encounter intermittent errors: network timeouts, rate limits, or temporary resource contention. Without retry logic, a single transient failure terminates the entire workflow, forcing users to restart.
  2. Cascading failures under load. When an agent starts failing repeatedly (due to VRAM pressure, model corruption, or external dependency outages), continuing to send requests makes the problem worse. A circuit breaker stops sending requests to a failing component and gives it time to recover, protecting the rest of the system.

Prerequisites

Requirement | Minimum
.NET SDK | 8.0+
VRAM | 4+ GB (for a 4B model with tool-calling support)
Disk | ~3 GB free for model download

You need a model that supports tool calling. Recommended: qwen3:4b or gemma3:4b.


Step 1: Create the Project

dotnet new console -n ResilientAgentQuickstart
cd ResilientAgentQuickstart
dotnet add package LM-Kit.NET

Step 2: Build a Basic Agent

Start with a standard agent using built-in tools:

using System.Text;
using LMKit.Model;
using LMKit.Agents;
using LMKit.Agents.Tools.BuiltIn;

LMKit.Licensing.LicenseManager.SetLicenseKey("");

Console.InputEncoding = Encoding.UTF8;
Console.OutputEncoding = Encoding.UTF8;

// ──────────────────────────────────────
// 1. Load model
// ──────────────────────────────────────
Console.WriteLine("Loading model...");
using LM model = LM.LoadFromModelID("qwen3:4b",
    loadingProgress: p => { Console.Write($"\rLoading: {p * 100:F0}%   "); return true; });
Console.WriteLine("\n");

// ──────────────────────────────────────
// 2. Build the agent
// ──────────────────────────────────────
var agent = Agent.CreateBuilder(model)
    .WithPersona("production-assistant")
    .WithInstruction("You are a reliable assistant. Answer questions accurately and concisely.")
    .WithTools(tools =>
    {
        tools.Register(BuiltInTools.Calculator);
        tools.Register(BuiltInTools.DateTime);
    })
    .WithMaxIterations(5)
    .Build();

Console.WriteLine("Agent ready.\n");

This agent works for demos, but it has no protection against failures. The following steps add production resilience.


Step 3: Add Retry Logic with Exponential Backoff

Wrap agent execution in a retry loop that backs off exponentially between attempts. This handles transient errors like temporary resource contention or intermittent tool failures:

async Task<AgentExecutionResult> ExecuteWithRetryAsync(
    Agent agent,
    string prompt,
    int maxRetries = 3,
    int timeoutSeconds = 60)
{
    for (int attempt = 1; attempt <= maxRetries; attempt++)
    {
        try
        {
            using var cts = new CancellationTokenSource(TimeSpan.FromSeconds(timeoutSeconds));

            var result = await agent.ExecuteAsync(prompt, cts.Token);

            if (result.IsSuccess)
                return result;

            Console.WriteLine($"  Attempt {attempt}/{maxRetries} failed: {result.Error?.Message}");
        }
        catch (OperationCanceledException)
        {
            Console.WriteLine($"  Attempt {attempt}/{maxRetries} timed out after {timeoutSeconds}s");
        }
        catch (Exception ex)
        {
            Console.WriteLine($"  Attempt {attempt}/{maxRetries} error: {ex.Message}");
        }

        if (attempt < maxRetries)
        {
            int delayMs = (int)(Math.Pow(2, attempt) * 1000);
            Console.WriteLine($"  Retrying in {delayMs / 1000}s...");
            await Task.Delay(delayMs);
        }
    }

    return AgentExecutionResult.Failure(new Exception("All retry attempts exhausted."));
}

Usage:

var result = await ExecuteWithRetryAsync(agent, "What is 42 * 17?", maxRetries: 3, timeoutSeconds: 30);

if (result.IsSuccess)
    Console.WriteLine($"Answer: {result.Content}");
else
    Console.WriteLine($"Failed after all retries: {result.Error?.Message}");

The exponential backoff (2s, 4s, 8s) prevents hammering a struggling system while still recovering quickly from brief hiccups.
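The fixed schedule works well for a single client, but several agent instances retrying in lockstep can still hit a struggling dependency at the same moments. A common refinement is to add random jitter to each delay; a minimal sketch (the helper name and ranges are illustrative, not part of LM-Kit.NET):

// Hypothetical helper: exponential backoff with jitter.
// Uses the same 1-based attempt numbering as ExecuteWithRetryAsync above.
static TimeSpan BackoffWithJitter(int attempt, int baseDelayMs = 1000)
{
    int exponentialMs = (int)(Math.Pow(2, attempt) * baseDelayMs);          // 2s, 4s, 8s...
    int jitteredMs = Random.Shared.Next(exponentialMs / 2, exponentialMs);  // spread within [50%, 100%]
    return TimeSpan.FromMilliseconds(jitteredMs);
}

Swap this in for the fixed delayMs calculation in the retry loop if multiple agent instances share the same downstream tools or services.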


Step 4: Add Timeout Handling

Timeouts prevent individual requests from blocking indefinitely. Use CancellationTokenSource with a deadline:

async Task<AgentExecutionResult> ExecuteWithTimeoutAsync(
    Agent agent,
    string prompt,
    int timeoutSeconds = 60)
{
    try
    {
        using var cts = new CancellationTokenSource(TimeSpan.FromSeconds(timeoutSeconds));
        return await agent.ExecuteAsync(prompt, cts.Token);
    }
    catch (OperationCanceledException)
    {
        Console.ForegroundColor = ConsoleColor.Yellow;
        Console.WriteLine($"  Request timed out after {timeoutSeconds}s");
        Console.ResetColor();
        return AgentExecutionResult.Failure(new TimeoutException($"Agent execution exceeded {timeoutSeconds}s limit."));
    }
}

Choosing timeout values: simple Q&A with no tools typically completes in 5-15 seconds. Agents with web search or multi-step reasoning may need 30-90 seconds. Start with 60 seconds and adjust based on your observed P95 latency.
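To ground those numbers in real data, time each call and compute percentiles from the samples; a minimal sketch using Stopwatch (the percentile helper is illustrative and assumes using System.Diagnostics and System.Linq at the top of the file):

var latencies = new List<double>();

async Task<AgentExecutionResult> ExecuteAndMeasureAsync(Agent agent, string prompt)
{
    // Record wall-clock latency for each request so timeouts can be tuned later.
    var sw = Stopwatch.StartNew();
    var result = await ExecuteWithTimeoutAsync(agent, prompt, timeoutSeconds: 120);
    sw.Stop();
    latencies.Add(sw.Elapsed.TotalSeconds);
    return result;
}

// Rough P95: sort the samples and take the value 95% of the way through.
double P95() => latencies.Count == 0
    ? 0
    : latencies.OrderBy(x => x).ElementAt((int)((latencies.Count - 1) * 0.95));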


Step 5: Add Fallback Behavior

When the primary agent fails (after retries and timeouts), provide a graceful fallback instead of returning an error to the user:

async Task<string> ExecuteWithFallbackAsync(
    Agent primaryAgent,
    string prompt,
    int maxRetries = 3,
    int timeoutSeconds = 60)
{
    // Try the primary agent with retries
    var result = await ExecuteWithRetryAsync(primaryAgent, prompt, maxRetries, timeoutSeconds);

    if (result.IsSuccess)
        return result.Content;

    // Fallback: return a helpful message instead of an error
    Console.ForegroundColor = ConsoleColor.Yellow;
    Console.WriteLine("  Primary agent failed. Using fallback response.");
    Console.ResetColor();

    return "I'm currently unable to process this request. " +
           "Please try again in a few moments, or rephrase your question.";
}
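Usage mirrors the retry example (the prompt is illustrative):

string answer = await ExecuteWithFallbackAsync(agent, "What is 42 * 17?", maxRetries: 2, timeoutSeconds: 30);
Console.WriteLine($"Answer: {answer}");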

For a more sophisticated fallback, you can retry with a simpler model that is less likely to fail:

async Task<string> ExecuteWithModelFallbackAsync(
    Agent primaryAgent,
    LM fallbackModel,
    string prompt)
{
    var result = await ExecuteWithRetryAsync(primaryAgent, prompt, maxRetries: 2, timeoutSeconds: 30);

    if (result.IsSuccess)
        return result.Content;

    // Fallback to a simpler, faster model
    Console.ForegroundColor = ConsoleColor.Yellow;
    Console.WriteLine("  Falling back to lightweight model...");
    Console.ResetColor();

    var fallbackAgent = Agent.CreateBuilder(fallbackModel)
        .WithPersona("fallback-assistant")
        .WithInstruction("Answer the question concisely. Keep your response brief.")
        .Build();

    var fallbackResult = await fallbackAgent.ExecuteAsync(prompt);
    return fallbackResult.IsSuccess ? fallbackResult.Content : "Service temporarily unavailable.";
}
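Wiring this up requires a second, lighter model loaded alongside the primary one; a sketch (the model ID is an example, pick any smaller model your hardware supports):

// Load a lightweight fallback model in addition to the primary model.
using LM fallbackModel = LM.LoadFromModelID("gemma3:4b");

string answer = await ExecuteWithModelFallbackAsync(agent, fallbackModel, "Explain retry logic in two sentences.");
Console.WriteLine(answer);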

Step 6: Build a Circuit Breaker

A circuit breaker tracks failure rates and stops sending requests when failures exceed a threshold. After a reset period, it allows a test request through. If that succeeds, the circuit closes and normal traffic resumes:

public class CircuitBreaker
{
    private readonly int _failureThreshold;
    private readonly TimeSpan _resetTimeout;
    private int _failureCount;
    private DateTime _lastFailure = DateTime.MinValue;
    private bool _isOpen;

    public CircuitBreaker(int failureThreshold = 5, int resetTimeoutSeconds = 30)
    {
        _failureThreshold = failureThreshold;
        _resetTimeout = TimeSpan.FromSeconds(resetTimeoutSeconds);
    }

    public bool AllowRequest()
    {
        if (!_isOpen) return true;

        // Half-open: once the reset timeout has elapsed, let a test request
        // through without closing the circuit yet. RecordSuccess closes it;
        // RecordFailure keeps it open and restarts the reset window.
        return DateTime.UtcNow - _lastFailure > _resetTimeout;
    }

    public void RecordSuccess()
    {
        Interlocked.Exchange(ref _failureCount, 0);
        _isOpen = false;
    }

    public void RecordFailure()
    {
        _lastFailure = DateTime.UtcNow;
        if (Interlocked.Increment(ref _failureCount) >= _failureThreshold)
            _isOpen = true;
    }

    public bool IsOpen => _isOpen;
}

The circuit breaker has three states:

  • Closed (normal): all requests pass through.
  • Open (tripped): all requests are rejected immediately without calling the agent.
  • Half-open (testing): after the reset timeout, requests are allowed through again to test recovery; the first recorded success closes the circuit, and a recorded failure re-opens it.
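Used on its own, the breaker can wrap any call you want to protect; a small sketch against the class above:

var breaker = new CircuitBreaker(failureThreshold: 3, resetTimeoutSeconds: 15);

if (breaker.AllowRequest())
{
    var result = await agent.ExecuteAsync("What day of the week is it today?");
    if (result.IsSuccess) breaker.RecordSuccess();
    else breaker.RecordFailure();
}
else
{
    Console.WriteLine("Circuit open. Skipping the call.");
}

Step 7 folds this bookkeeping into a reusable wrapper so callers never have to manage the breaker directly.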

Step 7: Combine into a Production-Ready Pattern

The ResilientAgent class combines retries, timeouts, circuit breaking, and health statistics into a single wrapper:

public class ResilientAgent
{
    private readonly Agent _agent;
    private readonly int _maxRetries;
    private readonly int _timeoutSeconds;
    private readonly CircuitBreaker _circuitBreaker;
    private int _successCount;
    private int _failureCount;

    public ResilientAgent(Agent agent, int maxRetries = 3, int timeoutSeconds = 60)
    {
        _agent = agent;
        _maxRetries = maxRetries;
        _timeoutSeconds = timeoutSeconds;
        _circuitBreaker = new CircuitBreaker(failureThreshold: 5, resetTimeoutSeconds: 30);
    }

    public async Task<AgentExecutionResult> ExecuteAsync(string prompt, CancellationToken cancellationToken = default)
    {
        // Check circuit breaker
        if (!_circuitBreaker.AllowRequest())
        {
            Console.ForegroundColor = ConsoleColor.Red;
            Console.WriteLine("  Circuit breaker is OPEN. Request rejected.");
            Console.ResetColor();
            return AgentExecutionResult.Failure(new Exception("Circuit breaker open. Service is temporarily unavailable."));
        }

        for (int attempt = 1; attempt <= _maxRetries; attempt++)
        {
            try
            {
                using var cts = CancellationTokenSource.CreateLinkedTokenSource(cancellationToken);
                cts.CancelAfter(TimeSpan.FromSeconds(_timeoutSeconds));

                var result = await _agent.ExecuteAsync(prompt, cts.Token);

                if (result.IsSuccess)
                {
                    _circuitBreaker.RecordSuccess();
                    Interlocked.Increment(ref _successCount);
                    return result;
                }

                Console.WriteLine($"  Attempt {attempt}: agent returned error.");
            }
            catch (OperationCanceledException) when (!cancellationToken.IsCancellationRequested)
            {
                Console.WriteLine($"  Attempt {attempt}: timed out.");
            }

            if (attempt < _maxRetries)
            {
                int delayMs = (int)(Math.Pow(2, attempt) * 500);
                await Task.Delay(delayMs, cancellationToken);
            }
        }

        _circuitBreaker.RecordFailure();
        Interlocked.Increment(ref _failureCount);
        return AgentExecutionResult.Failure(new Exception("All retry attempts exhausted."));
    }

    public (int Success, int Failure) GetStats() => (_successCount, _failureCount);

    public bool IsHealthy => !_circuitBreaker.IsOpen;
}

Usage with the full resilience stack:

// Build the underlying agent
var agent = Agent.CreateBuilder(model)
    .WithPersona("production-assistant")
    .WithInstruction("You are a reliable assistant. Answer questions accurately and concisely.")
    .WithTools(tools =>
    {
        tools.Register(BuiltInTools.Calculator);
        tools.Register(BuiltInTools.DateTime);
    })
    .WithMaxIterations(5)
    .Build();

// Wrap it with resilience
var resilientAgent = new ResilientAgent(agent, maxRetries: 3, timeoutSeconds: 60);

// Interactive loop
Console.WriteLine("Resilient agent ready. Type a question (or 'quit' to exit):\n");

while (true)
{
    Console.ForegroundColor = ConsoleColor.Green;
    Console.Write("You: ");
    Console.ResetColor();

    string? input = Console.ReadLine();
    if (string.IsNullOrWhiteSpace(input) || input.Equals("quit", StringComparison.OrdinalIgnoreCase))
        break;

    var result = await resilientAgent.ExecuteAsync(input);

    if (result.IsSuccess)
    {
        Console.ForegroundColor = ConsoleColor.Cyan;
        Console.WriteLine($"\nAssistant: {result.Content}");
        Console.ResetColor();
    }
    else
    {
        Console.ForegroundColor = ConsoleColor.Red;
        Console.WriteLine($"\nError: {result.Error?.Message}");
        Console.ResetColor();
    }

    // Show health status
    var stats = resilientAgent.GetStats();
    Console.ForegroundColor = ConsoleColor.DarkGray;
    Console.WriteLine($"  [Health: {(resilientAgent.IsHealthy ? "OK" : "DEGRADED")} | Success: {stats.Success}, Failures: {stats.Failure}]");
    Console.ResetColor();
    Console.WriteLine();
}
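In a hosted service you would typically log the same health signal on a schedule rather than after every turn; a minimal sketch using PeriodicTimer (the interval is an example):

// Background health logging; run alongside the request loop in a hosted service.
async Task LogHealthAsync(ResilientAgent agent, CancellationToken token)
{
    using var timer = new PeriodicTimer(TimeSpan.FromSeconds(30));
    try
    {
        while (await timer.WaitForNextTickAsync(token))
        {
            var (success, failure) = agent.GetStats();
            Console.WriteLine($"[health] {(agent.IsHealthy ? "OK" : "DEGRADED")} success={success} failure={failure}");
        }
    }
    catch (OperationCanceledException)
    {
        // Shutdown requested; stop logging.
    }
}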

Choosing Resilience Parameters

Parameter | Default | Guidance
maxRetries | 3 | Higher for unreliable networks; lower for latency-sensitive workloads
timeoutSeconds | 60 | Match your observed P99 latency plus a margin
failureThreshold | 5 | Number of consecutive failures before the circuit opens
resetTimeoutSeconds | 30 | How long to wait before testing whether the service has recovered
Backoff multiplier | 2x | Exponential backoff base; 2x is standard, use 1.5x for faster retries
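As an illustration, two profiles built from the wrapper above (the values are starting points, not LM-Kit recommendations):

// Latency-sensitive chat: fail fast with fewer retries and a short timeout.
var chatAgent = new ResilientAgent(agent, maxRetries: 2, timeoutSeconds: 30);

// Background batch work: tolerate slower responses and retry more.
var batchAgent = new ResilientAgent(agent, maxRetries: 5, timeoutSeconds: 120);

The failureThreshold and resetTimeoutSeconds values are currently fixed inside the ResilientAgent constructor; expose them as constructor parameters if you need to tune them per workload.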

Common Issues

Problem | Cause | Fix
Agent always times out | Timeout too short for complex tool-using agents | Increase timeoutSeconds to 90-120 for agents with web search
Circuit breaker never closes | Reset timeout too short; failures keep coming | Increase resetTimeoutSeconds and investigate the root cause
Retries make the problem worse | Retrying a permanently failing request wastes resources | Add a circuit breaker to stop retries when the failure rate is high
Fallback model produces low-quality output | Fallback model too small for the task | Use a mid-size fallback (e.g., gemma3:4b as fallback for qwen3:8b)
High latency under concurrent load | Single model instance serializes requests | Use model caching (EnableModelCache) and consider load balancing across instances

Next Steps