Resilience
The Dory SDK ships three resilience primitives that let a processor degrade gracefully instead of failing hard: circuit breakers that stop hammering a broken dependency, retry with backoff that absorbs transient faults, and an error classifier that decides what to do with an exception. BaseProcessor wires usable defaults for all three, so a processor can lean on them without any setup (see Core Concepts).
Circuit breakers
A circuit breaker wraps a call to an external dependency. While the dependency is healthy the breaker is closed and calls pass through. Once failures pile up it opens and fails fast — protecting both your processor and the struggling dependency — then probes for recovery before closing again.
States
| State | Meaning | Transition |
|---|---|---|
| CLOSED ("closed") | Calls pass through normally. | Opens after failure_threshold consecutive failures. |
| OPEN ("open") | Calls fail fast with CircuitOpenError. | Moves to half-open after timeout_seconds elapses. |
| HALF_OPEN ("half_open") | A limited number of trial calls are allowed. | Closes after success_threshold successes; any failure re-opens it. |
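The transitions in the table can be sketched as a small state machine. This is an illustration only, not the SDK's CircuitBreaker: the class name SketchBreaker, its method names, and the injectable clock are assumptions made for the example, and the half_open_max_calls limit is left out for brevity.

```python
import time
from enum import Enum


class State(Enum):
    CLOSED = "closed"
    OPEN = "open"
    HALF_OPEN = "half_open"


class SketchBreaker:
    """Illustrative state machine only; not the SDK's CircuitBreaker."""

    def __init__(self, failure_threshold=5, success_threshold=2,
                 timeout_seconds=60.0, clock=time.monotonic):
        self.failure_threshold = failure_threshold
        self.success_threshold = success_threshold
        self.timeout_seconds = timeout_seconds
        self.clock = clock  # injectable so the timeout can be simulated
        self.state = State.CLOSED
        self.failures = 0
        self.successes = 0
        self.opened_at = None

    def _open(self):
        self.state = State.OPEN
        self.opened_at = self.clock()
        self.failures = 0
        self.successes = 0

    def allow_call(self):
        """Return True if a call may proceed; OPEN becomes HALF_OPEN after the timeout."""
        if self.state is State.OPEN:
            if self.clock() - self.opened_at < self.timeout_seconds:
                return False
            self.state = State.HALF_OPEN
            self.successes = 0
        return True

    def record_success(self):
        if self.state is State.HALF_OPEN:
            self.successes += 1
            if self.successes >= self.success_threshold:
                self.state = State.CLOSED
                self.failures = 0
        else:
            self.failures = 0  # a success breaks the run of consecutive failures

    def record_failure(self):
        if self.state is State.HALF_OPEN:
            self._open()  # any failure while probing re-opens the breaker
            return
        self.failures += 1
        if self.failures >= self.failure_threshold:
            self._open()
```

Passing a fake clock (for example `clock=lambda: now[0]`) lets you step through CLOSED, OPEN, HALF_OPEN, and back to CLOSED without waiting for a real timeout.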
Configuration
CircuitBreakerConfig (and the matching CircuitBreaker constructor arguments):
| Field | Default | Purpose |
|---|---|---|
| name | (no default) | Identifier used in stats and logs. |
| failure_threshold | 5 | Consecutive failures before opening. |
| success_threshold | 2 | Successes in half-open before closing. |
| timeout_seconds | 60.0 | How long to stay open before probing. |
| half_open_max_calls | 1 | Trial calls allowed while half-open. |
| on_state_change | None | Callback invoked on every state transition. |
Wrapping a call
call() is the wrapping primitive. It auto-detects whether the wrapped function is a coroutine or a sync function, runs it, and records the outcome. When the breaker is open it raises CircuitOpenError(circuit_name, next_attempt_time) immediately without invoking the function.
from dory.resilience import CircuitBreaker, CircuitOpenError

breaker = CircuitBreaker("payments", failure_threshold=3, timeout_seconds=30.0)

async def charge(amount):
    # charge_card stands in for your own function that calls the payment provider
    try:
        return await breaker.call(charge_card, amount)
    except CircuitOpenError as exc:
        # fail fast; exc.next_attempt_time tells you when to retry
        return None
Useful members: properties state, is_closed, is_open, is_half_open; get_stats() -> dict; and the async controls reset() (back to closed) and open() (force open).
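This page does not show how call() tells coroutine functions from plain ones. One common way to do it in Python is inspect.iscoroutinefunction, sketched below with a hypothetical helper named call_either; treat it as an illustration of the idea rather than the SDK's code.

```python
import asyncio
import inspect


async def call_either(func, *args, **kwargs):
    """Run func whether it is a coroutine function or a plain function."""
    if inspect.iscoroutinefunction(func):
        return await func(*args, **kwargs)
    return func(*args, **kwargs)


def add_sync(a, b):
    return a + b


async def add_async(a, b):
    await asyncio.sleep(0)  # yield to the event loop, as real I/O would
    return a + b


async def main():
    first = await call_either(add_sync, 1, 2)
    second = await call_either(add_async, 3, 4)
    return first, second


results = asyncio.run(main())
```

Either way the caller awaits a single entry point, which is why the earlier example awaits breaker.call() regardless of what it wraps.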
The three breakers on BaseProcessor
Every processor gets self.circuit_breakers, a dict pre-populated with three breakers:
| Key | Intended dependency |
|---|---|
| "database" | Database / persistence calls |
| "external_api" | Outbound HTTP / RPC calls |
| "cache" | Cache backends |
class MyProcessor(BaseProcessor):
    async def process(self):
        # db.fetch and query stand in for your own client call and its argument
        rows = await self.circuit_breakers["database"].call(db.fetch, query)
Warning
These three breakers are constructed with failure_threshold=5, success_threshold=2, timeout_seconds=30.0, half_open_max_calls=3 — the timeout_seconds and half_open_max_calls values differ from the CircuitBreakerConfig dataclass defaults (60.0 and 1). If you depend on exact timing, read the breaker's get_stats() rather than assuming the dataclass defaults.
A CircuitBreakerRegistry (with a process-wide get_global_registry()) is available for sharing named breakers across components.
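To make the difference called out in the warning concrete, the sketch below puts the documented dataclass defaults next to the values documented for the BaseProcessor breakers and lists the fields that differ. SketchBreakerConfig is a local stand-in written for this example (it omits on_state_change); it is not imported from the SDK.

```python
from dataclasses import dataclass, fields


@dataclass(frozen=True)
class SketchBreakerConfig:
    """Stand-in holding the documented CircuitBreakerConfig defaults."""
    name: str
    failure_threshold: int = 5
    success_threshold: int = 2
    timeout_seconds: float = 60.0
    half_open_max_calls: int = 1


# Values this page documents for the three BaseProcessor breakers.
processor_default = SketchBreakerConfig(
    name="database",
    failure_threshold=5,
    success_threshold=2,
    timeout_seconds=30.0,
    half_open_max_calls=3,
)
dataclass_default = SketchBreakerConfig(name="database")

# Collect every field whose value differs: {field: (dataclass, processor)}.
differences = {
    f.name: (getattr(dataclass_default, f.name), getattr(processor_default, f.name))
    for f in fields(SketchBreakerConfig)
    if getattr(dataclass_default, f.name) != getattr(processor_default, f.name)
}
print(differences)
```

Only timeout_seconds and half_open_max_calls show up in the result, which matches the warning: a processor's breakers probe sooner and allow more trial calls than a breaker built from the bare defaults.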
Retry with backoff
Transient faults — a dropped connection, a momentary 429 — are best absorbed by retrying with exponential backoff rather than tripping a breaker. The retry_with_backoff decorator works on both async and sync functions.
from dory.resilience import retry_with_backoff, RetryBudget

budget = RetryBudget(budget_percent=20.0)

@retry_with_backoff(max_attempts=4, initial_delay=0.5, budget=budget,
                    retryable_exceptions=(ConnectionError,))
async def call_api():
    ...
On exhaustion the decorator raises RetryExhaustedError(attempts, last_error).
RetryPolicy
The decorator's parameters mirror RetryPolicy:
| Field | Default | Purpose |
|---|---|---|
| max_attempts | 3 | Total attempts (including the first). |
| initial_delay | 1.0 | Base delay in seconds. |
| max_delay | 30.0 | Upper bound on any single delay. |
| multiplier | 2.0 | Exponential growth factor. |
| jitter | True | Add randomized jitter to each delay. |
| retryable_exceptions | (Exception,) | Exceptions that trigger a retry. |
| non_retryable_exceptions | () | Exceptions that never retry (checked first). |
| on_retry | None | Callback invoked before each retry. |
The delay for an attempt is min(initial_delay * multiplier ** attempt, max_delay). Whether an exception is retryable is decided by is_retryable: non_retryable_exceptions always wins, then the retryable_exceptions check applies.
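The delay formula and the precedence rule can be written out directly. The sketch below is a standalone illustration using the documented defaults; it assumes that attempt starts at 0 and that matching is done with isinstance, neither of which this page states.

```python
def backoff_delay(attempt, initial_delay=1.0, multiplier=2.0, max_delay=30.0):
    """Un-jittered delay for one attempt; attempt is assumed to start at 0."""
    return min(initial_delay * multiplier ** attempt, max_delay)


def is_retryable(error, retryable_exceptions=(Exception,),
                 non_retryable_exceptions=()):
    """The deny list is checked first, then the allow list."""
    if isinstance(error, non_retryable_exceptions):
        return False
    return isinstance(error, retryable_exceptions)


# With the defaults the schedule doubles until it reaches the 30-second cap.
schedule = [backoff_delay(n) for n in range(6)]
print(schedule)  # [1.0, 2.0, 4.0, 8.0, 16.0, 30.0]
```

The cap matters from the sixth attempt onward: 1.0 * 2.0 ** 5 would be 32 seconds, so max_delay trims it to 30.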
Jitter
Jitter is one-sided: when enabled, the computed delay is increased by a random amount between 0 and 25% of itself (delay + uniform(0, delay * 0.25)). This spreads retries from many processors so they do not all wake up at the same instant (the "thundering herd").
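A minimal sketch of the one-sided jitter formula follows. The helper name with_jitter and the seeded generator are choices made for the example, not SDK names.

```python
import random


def with_jitter(delay, rng=random):
    """One-sided jitter: add between 0% and 25% of the delay, never subtract."""
    return delay + rng.uniform(0, delay * 0.25)


rng = random.Random(42)  # seeded only so the sketch is repeatable
samples = [with_jitter(4.0, rng) for _ in range(1000)]

# Every sample lies between the base delay and 125% of it.
print(min(samples) >= 4.0, max(samples) <= 5.0)
```

Because the jitter only ever adds time, the effective delay could exceed max_delay if jitter is applied after the cap; this page does not say which order the SDK uses.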
RetryBudget
A retry budget caps how much of your traffic is allowed to be retries, preventing a retry storm from amplifying load on an already-degraded dependency.
| Field | Default | Purpose |
|---|---|---|
| budget_percent | 20.0 | Max retries as a percentage of requests. |
| window_seconds | 60.0 | Sliding window; counters auto-reset. |
can_retry() returns True while (retries / requests) * 100 <= budget_percent. Pass the budget to retry_with_backoff(budget=...).
Warning
When the budget is exhausted, the decorator does not raise RetryExhaustedError — it re-raises the original exception immediately instead of retrying. Handle the underlying exception type at the call site, not just RetryExhaustedError.
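The budget check and the two failure paths described above can be sketched without the SDK. Everything here is a stand-in: SketchRetryBudget, SketchRetryExhausted, record_request, record_retry, and call_with_budget are hypothetical names, the window reset is simplified to a fixed window, and the treatment of zero requests is an assumption. Delays between attempts are omitted to keep the example short.

```python
import time


class SketchRetryExhausted(Exception):
    """Stand-in for an 'attempts exhausted' error carrying the last failure."""

    def __init__(self, attempts, last_error):
        super().__init__(f"gave up after {attempts} attempts: {last_error!r}")
        self.attempts = attempts
        self.last_error = last_error


class SketchRetryBudget:
    """Illustrative budget; not the SDK's RetryBudget."""

    def __init__(self, budget_percent=20.0, window_seconds=60.0,
                 clock=time.monotonic):
        self.budget_percent = budget_percent
        self.window_seconds = window_seconds
        self.clock = clock
        self.window_start = clock()
        self.requests = 0
        self.retries = 0

    def _maybe_reset(self):
        # Assumption: counters reset once the window has elapsed.
        if self.clock() - self.window_start >= self.window_seconds:
            self.window_start = self.clock()
            self.requests = 0
            self.retries = 0

    def record_request(self):
        self._maybe_reset()
        self.requests += 1

    def record_retry(self):
        self._maybe_reset()
        self.retries += 1

    def can_retry(self):
        self._maybe_reset()
        if self.requests == 0:
            return True  # assumption: with nothing recorded, no limit applies
        return (self.retries / self.requests) * 100 <= self.budget_percent


def call_with_budget(func, budget, max_attempts=3):
    """Retry func on ConnectionError, honoring both attempt and budget limits."""
    budget.record_request()
    for attempt in range(1, max_attempts + 1):
        try:
            return func()
        except ConnectionError as exc:
            if attempt == max_attempts:
                raise SketchRetryExhausted(attempt, exc) from exc
            if not budget.can_retry():
                raise  # budget spent: the original exception propagates
            budget.record_retry()
```

A caller therefore needs two except arms: one for the exhaustion error and one for the underlying exception type, since a spent budget surfaces the latter.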
Error classification
Rather than hard-coding except arms, let ErrorClassifier map an exception to a structured decision. BaseProcessor exposes self.error_classifier.
classify(error) returns a ClassificationResult:
| Field | Type | Description |
|---|---|---|
| error_type | ErrorType | Category of the error. |
| recommended_action | RecoveryAction | What to do about it. |
| retryable | bool | Whether a retry is sensible. |
| severity | str | "low", "medium", "high", or "critical". |
| details | dict | Additional context. |
Use classify_and_handle(error) to classify and log in one call.
ErrorType and RecoveryAction
ErrorType: TRANSIENT, PERMANENT, RESOURCE, EXTERNAL, LOGIC, UNKNOWN.
RecoveryAction: RETRY, CIRCUIT_BREAKER, BACKOFF, SCALE, GOLDEN_RESET, DEGRADE, ALERT, FAIL, LOG.
The classifier maps each type to a default action, and marks {TRANSIENT, EXTERNAL, RESOURCE} as retryable:
| ErrorType | Default RecoveryAction | Retryable |
|---|---|---|
| TRANSIENT | RETRY | Yes |
| EXTERNAL | CIRCUIT_BREAKER | Yes |
| RESOURCE | BACKOFF | Yes |
| LOGIC | ALERT | No |
| PERMANENT | GOLDEN_RESET | No |
| UNKNOWN | LOG | No |
HTTP status codes are recognized heuristically: 500/502/503/504/429 classify as EXTERNAL; 400/401/403/404 classify as PERMANENT.
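The default mapping and the HTTP heuristic can be expressed as plain lookup tables. The enums below are local stand-ins that reuse the documented names (RecoveryAction lists only the actions that appear as defaults), and classify_status and decide are hypothetical helpers. Falling back to UNKNOWN for unlisted status codes is an assumption.

```python
from enum import Enum, auto


class ErrorType(Enum):  # local stand-in mirroring the documented names
    TRANSIENT = auto()
    PERMANENT = auto()
    RESOURCE = auto()
    EXTERNAL = auto()
    LOGIC = auto()
    UNKNOWN = auto()


class RecoveryAction(Enum):  # only the actions used as defaults
    RETRY = auto()
    CIRCUIT_BREAKER = auto()
    BACKOFF = auto()
    GOLDEN_RESET = auto()
    ALERT = auto()
    LOG = auto()


DEFAULT_ACTION = {
    ErrorType.TRANSIENT: RecoveryAction.RETRY,
    ErrorType.EXTERNAL: RecoveryAction.CIRCUIT_BREAKER,
    ErrorType.RESOURCE: RecoveryAction.BACKOFF,
    ErrorType.LOGIC: RecoveryAction.ALERT,
    ErrorType.PERMANENT: RecoveryAction.GOLDEN_RESET,
    ErrorType.UNKNOWN: RecoveryAction.LOG,
}
RETRYABLE = {ErrorType.TRANSIENT, ErrorType.EXTERNAL, ErrorType.RESOURCE}


def classify_status(status):
    """Map an HTTP status code to an ErrorType using the documented lists."""
    if status in (500, 502, 503, 504, 429):
        return ErrorType.EXTERNAL
    if status in (400, 401, 403, 404):
        return ErrorType.PERMANENT
    return ErrorType.UNKNOWN  # assumption: unlisted codes fall through


def decide(status):
    """Return (error type, default action, retryable) for a status code."""
    error_type = classify_status(status)
    return error_type, DEFAULT_ACTION[error_type], error_type in RETRYABLE
```

For example, `decide(503)` yields EXTERNAL with CIRCUIT_BREAKER and retryable set, while `decide(404)` yields PERMANENT with GOLDEN_RESET and retryable unset.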
Custom registration
Map your own exception classes to a type so the classifier handles them correctly:
from dory.errors import ErrorClassifier, ErrorType

class MyTimeoutError(Exception):
    """Application-specific timeout used as the example exception."""

classifier = ErrorClassifier()
classifier.register_error_type(MyTimeoutError, ErrorType.TRANSIENT)
Use clear_error_type_registry() to reset. Module-level helpers classify_error(error) and is_retryable(error) are available for one-off use.
Error codes
The SDK defines stable error codes in the format E-<DOMAIN>-<NUMBER>, where domains include COR, STA, MIG, RET, CBR, ECL, GLD, REC, VAL, and PRC. Roughly 28 codes are predefined, for example:
| Code | Meaning |
|---|---|
| E-RET-001 | Retry budget exhausted |
| E-CBR-001 | Circuit breaker is OPEN |
| E-STA-003 | State corruption detected |
ErrorCodeRegistry supports register, get, search, list_by_domain, and all for looking codes up programmatically.
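If you need to work with codes as strings, for example when grouping log lines, the E-<DOMAIN>-<NUMBER> format is easy to parse. The helpers below are hypothetical and independent of ErrorCodeRegistry; the pattern assumes uppercase letters for the domain and decimal digits for the number, based on the examples in the table.

```python
import re

# E-<DOMAIN>-<NUMBER>, e.g. E-RET-001. Uppercase letters and digits are assumed.
ERROR_CODE = re.compile(r"E-(?P<domain>[A-Z]+)-(?P<number>\d+)")


def parse_error_code(code):
    """Return (domain, number) for a well-formed code, or None otherwise."""
    match = ERROR_CODE.fullmatch(code)
    if match is None:
        return None
    return match["domain"], int(match["number"])


def group_by_domain(codes):
    """Group codes by domain, skipping malformed entries."""
    grouped = {}
    for code in codes:
        parsed = parse_error_code(code)
        if parsed is not None:
            grouped.setdefault(parsed[0], []).append(code)
    return grouped
```

For lookups against the predefined codes themselves, prefer the registry's own get and list_by_domain methods.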
Putting it together
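A typical defensive call combines all three: classify the error, retry transient failures within budget, and let the breaker shed load when a dependency is genuinely down. The example below shows the retry and breaker layers on a processor method; db and sql stand in for your own client and statement.
from dory.resilience import retry_with_backoff

class MyProcessor(BaseProcessor):
    @retry_with_backoff(max_attempts=3, retryable_exceptions=(ConnectionError,))
    async def query(self):
        return await self.circuit_breakers["database"].call(db.fetch, sql)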
See Configuration for the environment variables that tune these defaults and API Reference for full signatures.