Resilience

Error Handling

Robust error handling with graceful degradation, intelligent retries, and circuit breakers for resilient systems.

Error Rate: 0.12% (last 24h)

Retry Success: 94% (after retry)

Circuit Opens (this week)

MTTR: 4.2m (avg recovery)

Error Handling Patterns

Structured approach to handling errors at every layer.

# Error Handling Architecture

## Exception Hierarchy
├── JustKalmError (base)
│   ├── ValidationError
│   │   ├── InvalidInputError
│   │   └── SchemaValidationError
│   ├── AuthenticationError
│   │   ├── InvalidTokenError
│   │   └── ExpiredTokenError
│   ├── AuthorizationError
│   │   ├── InsufficientScopeError
│   │   └── QuotaExceededError
│   ├── ResourceError
│   │   ├── NotFoundError
│   │   └── ConflictError
│   └── ExternalServiceError
│       ├── UpstreamTimeoutError
│       └── ServiceUnavailableError

## Error Context
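from typing import Optional

class JustKalmError(Exception):
    """Base class for all JustKalm errors; carries structured context."""

    def __init__(
        self,
        message: str,
        code: str,
        details: Optional[dict] = None,
        retry_after: Optional[int] = None,
        doc_url: Optional[str] = None
    ):
        # Pass the message to Exception so str(exc) and tracebacks show it.
        super().__init__(message)
        self.message = message
        self.code = code                # machine-readable error code
        self.details = details or {}    # extra structured context
        self.retry_after = retry_after  # retry hint (presumably in seconds)
        self.doc_url = doc_url          # link to documentation for this error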
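
To show how the hierarchy and the error context might work together, here is a minimal sketch. The class names come from the hierarchy above; the `to_dict` helper, the `describe` function, the `"quota_exceeded"` code string, and the example values are assumptions made for illustration, not part of the documented API.

```python
from typing import Any, Dict, Optional


class JustKalmError(Exception):
    """Base error carrying the context fields shown above."""

    def __init__(
        self,
        message: str,
        code: str,
        details: Optional[dict] = None,
        retry_after: Optional[int] = None,
        doc_url: Optional[str] = None,
    ):
        super().__init__(message)
        self.message = message
        self.code = code
        self.details = details or {}
        self.retry_after = retry_after
        self.doc_url = doc_url

    def to_dict(self) -> Dict[str, Any]:
        # Hypothetical helper: build a JSON-serializable error body,
        # leaving out optional fields that were not set.
        body: Dict[str, Any] = {"code": self.code, "message": self.message}
        if self.details:
            body["details"] = self.details
        if self.retry_after is not None:
            body["retry_after"] = self.retry_after
        if self.doc_url is not None:
            body["doc_url"] = self.doc_url
        return {"error": body}


# One branch of the hierarchy, expressed as plain subclasses.
class AuthorizationError(JustKalmError):
    pass


class QuotaExceededError(AuthorizationError):
    pass


def describe(exc: JustKalmError) -> Dict[str, Any]:
    # Accepting the base class handles every subclass uniformly.
    return exc.to_dict()


err = QuotaExceededError(
    message="Monthly quota exhausted",
    code="quota_exceeded",  # assumed code string
    details={"limit": 1000},
    retry_after=3600,
)
payload = describe(err)
```

Because every error derives from `JustKalmError`, a single top-level handler can catch the base class and still return a consistent body, while callers that care can catch a narrower class such as `AuthorizationError`.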

Graceful Degradation

# Fallback Patterns

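import json
import logging

logger = logging.getLogger(__name__)

# ml_service, redis, db, ValuationResult, MLServiceError, and
# ServiceDegradedError are assumed to be defined elsewhere in the application.

async def get_valuation(product_id: str):
    try:
        # Primary: ML model
        return await ml_service.predict(product_id)

    except MLServiceError as exc:
        logger.warning("ML service unavailable for %s, trying fallbacks", product_id)

        # Fallback 1: Cached prediction (assumed here to be stored as JSON)
        raw = await redis.get(f"val:{product_id}")
        if raw is not None:
            cached = json.loads(raw)
            return ValuationResult(
                value=cached["value"],
                confidence=cached["confidence"] * 0.9,  # discount stale data
                source="cache",
                stale=True
            )

        # Fallback 2: Historical average (a value of 0 is still a valid average)
        avg = await db.get_category_average(product_id)
        if avg is not None:
            return ValuationResult(
                value=avg,
                confidence=0.5,
                source="historical",
                degraded=True
            )

        # Final fallback: Graceful error.
        # Assumes ServiceDegradedError uses the JustKalmError constructor,
        # so suggestions travel in `details`; the code string is illustrative.
        raise ServiceDegradedError(
            message="Unable to value product",
            code="service_degraded",
            details={"suggestions": ["Try again later"]},
            retry_after=60
        ) from exc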
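
On the calling side, the `retry_after` hint raised by the final fallback can drive a retry loop. The sketch below is one possible approach, not the service's actual retry logic: `RetryableError`, `call_with_retries`, and all parameter names and defaults are hypothetical. It waits for the hinted delay when one is present and otherwise falls back to exponential backoff with full jitter.

```python
import asyncio
import random
from typing import Awaitable, Callable, Optional, TypeVar

T = TypeVar("T")


class RetryableError(Exception):
    """Stand-in for an error that carries a retry_after hint in seconds."""

    def __init__(self, message: str, retry_after: Optional[float] = None):
        super().__init__(message)
        self.retry_after = retry_after


async def call_with_retries(
    operation: Callable[[], Awaitable[T]],
    max_attempts: int = 4,
    base_delay: float = 0.5,
    max_delay: float = 30.0,
    sleep: Callable[[float], Awaitable[None]] = asyncio.sleep,
) -> T:
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")

    for attempt in range(1, max_attempts + 1):
        try:
            return await operation()
        except RetryableError as exc:
            if attempt == max_attempts:
                raise  # out of attempts: surface the last error
            if exc.retry_after is not None:
                # The error told us how long to wait; honor it.
                delay = exc.retry_after
            else:
                # Exponential backoff with full jitter, capped at max_delay.
                ceiling = min(max_delay, base_delay * 2 ** (attempt - 1))
                delay = random.uniform(0, ceiling)
            await sleep(delay)

    raise AssertionError("unreachable")  # the loop always returns or raises
```

The `sleep` parameter exists only so the delay can be swapped out in tests; in normal use `await call_with_retries(lambda: get_valuation(product_id))` would rely on `asyncio.sleep`, provided the raised error type is adapted to the one the service really uses.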

Resilient by Design

Graceful degradation and self-healing for maximum uptime.
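
One common way to get this kind of self-healing is a circuit breaker: after repeated failures it stops calling the failing dependency, then lets a trial call through once a cool-down has passed. The sketch below is a generic, single-threaded illustration under assumed names and defaults (`CircuitBreaker`, `CircuitOpenError`, `failure_threshold`, `reset_timeout`); it is not taken from the system described here.

```python
import time
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


class CircuitOpenError(Exception):
    """Raised when a call is rejected because the circuit is open."""


class CircuitBreaker:
    def __init__(
        self,
        failure_threshold: int = 5,
        reset_timeout: float = 30.0,
        clock: Callable[[], float] = time.monotonic,
    ):
        self.failure_threshold = failure_threshold
        self.reset_timeout = reset_timeout
        self._clock = clock
        self._failures = 0
        self._opened_at: Optional[float] = None

    @property
    def state(self) -> str:
        if self._opened_at is None:
            return "closed"
        if self._clock() - self._opened_at >= self.reset_timeout:
            return "half_open"  # cool-down elapsed: allow a trial call
        return "open"

    def call(self, func: Callable[[], T]) -> T:
        state = self.state
        if state == "open":
            raise CircuitOpenError("circuit open; failing fast")
        try:
            result = func()
        except Exception:
            self._failures += 1
            # A failed trial call, or too many consecutive failures,
            # opens (or re-opens) the circuit.
            if state == "half_open" or self._failures >= self.failure_threshold:
                self._opened_at = self._clock()
            raise
        # Any success closes the circuit and clears the failure count.
        self._failures = 0
        self._opened_at = None
        return result
```

A caller would wrap each dependency call, for example `breaker.call(lambda: client.fetch(item_id))` with a hypothetical `client`, and treat `CircuitOpenError` as a signal to go straight to a fallback. A production version would also need locking and a limit on concurrent trial calls, which this sketch leaves out.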

0.12% Error Rate · 94% Retry Success · 4.2m MTTR