JustKalm
Observability

Health Monitoring

Health checks for Kubernetes-native deployments: liveness and readiness probes, plus per-dependency health tracking.

Uptime: 99.97% (last 90 days)
Health checks: 2.4M per day
Avg response: 12 ms (health endpoint)
Dependencies: 8/8 (all healthy)
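
To put the headline figures in more concrete terms, the short sketch below converts an uptime percentage into the downtime it implies and a daily check count into a per-second rate. The function names are illustrative and not part of JustKalm; the arithmetic assumes the 90-day window and the per-day count are exactly as stated above.

```python
def implied_downtime_minutes(uptime_percent: float, window_days: int) -> float:
    """Minutes of downtime implied by an uptime percentage over a window."""
    window_minutes = window_days * 24 * 60
    return window_minutes * (1 - uptime_percent / 100)


def checks_per_second(checks_per_day: float) -> float:
    """Average check rate, assuming checks are spread evenly over the day."""
    return checks_per_day / 86_400


print(round(implied_downtime_minutes(99.97, 90), 1))  # 38.9
print(round(checks_per_second(2_400_000), 1))         # 27.8
```

Under these assumptions, 99.97% over 90 days corresponds to roughly 39 minutes of total downtime, and 2.4M checks per day averages to about 28 checks per second.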

Health Check Endpoints

Three standardized endpoints: /health for liveness, /health/ready for readiness, and /health/detailed for authenticated diagnostics.

# /health - Basic health check (liveness)
GET /health
{
  "status": "healthy",
  "timestamp": "2024-12-14T10:30:00Z",
  "version": "1.2.3",
  "uptime_seconds": 86400
}

# /health/ready - Readiness check
GET /health/ready
{
  "status": "ready",
  "checks": {
    "database": { "status": "up", "latency_ms": 2 },
    "redis": { "status": "up", "latency_ms": 1 },
    "ml_service": { "status": "up", "latency_ms": 15 }
  }
}

# /health/detailed - Full diagnostics
GET /health/detailed
Authorization: Bearer <internal-token>
{
  "status": "healthy",
  "checks": { ... },
  "metrics": {
    "memory_mb": 512,
    "cpu_percent": 25,
    "open_connections": 45,
    "queue_depth": 12
  }
}
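
A monitoring client might interpret the readiness payload along these lines. This is a minimal sketch that assumes the response shape shown for /health/ready; the helper names failing_dependencies and slowest_dependency are hypothetical and not part of the JustKalm API.

```python
import json
from typing import Optional


def failing_dependencies(ready_payload: dict) -> list:
    """Names of dependencies whose status is anything other than "up"."""
    checks = ready_payload.get("checks", {})
    return sorted(name for name, c in checks.items() if c.get("status") != "up")


def slowest_dependency(ready_payload: dict) -> Optional[str]:
    """Name of the dependency with the highest reported latency, if any."""
    checks = ready_payload.get("checks", {})
    timed = {name: c["latency_ms"] for name, c in checks.items() if "latency_ms" in c}
    return max(timed, key=timed.get) if timed else None


# Sample body copied from the /health/ready example above.
sample = json.loads("""
{
  "status": "ready",
  "checks": {
    "database": { "status": "up", "latency_ms": 2 },
    "redis": { "status": "up", "latency_ms": 1 },
    "ml_service": { "status": "up", "latency_ms": 15 }
  }
}
""")

print(failing_dependencies(sample))  # []
print(slowest_dependency(sample))    # ml_service
```

Treating any status other than "up" as failing keeps the client conservative if new status values are introduced later.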

Implementation

# FastAPI Health Check Implementation

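from datetime import datetime, timezone

from fastapi import APIRouter, Depends
from fastapi.responses import JSONResponse
from redis.asyncio import Redis
from sqlalchemy import text
from sqlalchemy.ext.asyncio import AsyncSession

# Project-specific imports. These module paths are assumptions; adjust to your layout.
from app.config import settings
from app.dependencies import get_db, get_redis

router = APIRouter(prefix="/health", tags=["Health"])
start_time = datetime.now(timezone.utc)  # process start, timezone-aware

@router.get("")
async def liveness():
    """Kubernetes liveness probe"""
    now = datetime.now(timezone.utc)
    return {
        "status": "healthy",
        # Matches the documented format, e.g. "2024-12-14T10:30:00Z"
        "timestamp": now.strftime("%Y-%m-%dT%H:%M:%SZ"),
        "version": settings.VERSION,
        "uptime_seconds": int((now - start_time).total_seconds()),
    }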

@router.get("/ready")
async def readiness(
    db: AsyncSession = Depends(get_db),
    redis: Redis = Depends(get_redis),
):
    """Kubernetes readiness probe"""
    checks = {}
    
    # Check database
    try:
        await db.execute(text("SELECT 1"))
        checks["database"] = {"status": "up"}
    except Exception as e:
        checks["database"] = {"status": "down", "error": str(e)}
    
    # Check Redis
    try:
        await redis.ping()
        checks["redis"] = {"status": "up"}
    except Exception:
        checks["redis"] = {"status": "down"}
    
    all_up = all(c["status"] == "up" for c in checks.values())
    
    return JSONResponse(
        status_code=200 if all_up else 503,
        content={"status": "ready" if all_up else "not_ready", "checks": checks}
    )
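
The documented /health/ready response includes a latency_ms value per dependency, which the handler above does not record. One way to add it, sketched here with the standard library only, is to wrap each probe in a timed helper with a timeout and run the probes concurrently. The names timed_check and run_checks, the 1-second default timeout, and the demo probes are assumptions for illustration, not the JustKalm implementation.

```python
import asyncio
import time
from typing import Awaitable, Callable, Dict

Probe = Callable[[], Awaitable[object]]


async def timed_check(probe: Probe, timeout_s: float = 1.0) -> dict:
    """Run one probe, returning its status and latency in milliseconds."""
    started = time.perf_counter()
    try:
        await asyncio.wait_for(probe(), timeout=timeout_s)
    except asyncio.TimeoutError:
        return {"status": "down", "error": "timeout"}
    except Exception:
        # Deliberately omit the exception text so internals are not exposed.
        return {"status": "down"}
    latency_ms = round((time.perf_counter() - started) * 1000, 1)
    return {"status": "up", "latency_ms": latency_ms}


async def run_checks(probes: Dict[str, Probe], timeout_s: float = 1.0) -> dict:
    """Run all probes concurrently and build a readiness-style payload."""
    names = list(probes)
    results = await asyncio.gather(*(timed_check(probes[n], timeout_s) for n in names))
    checks = dict(zip(names, results))
    all_up = all(c["status"] == "up" for c in checks.values())
    return {"status": "ready" if all_up else "not_ready", "checks": checks}


# Hypothetical stand-ins for real dependency probes.
async def ok_probe() -> None:
    await asyncio.sleep(0.001)


async def failing_probe() -> None:
    raise ConnectionError("connection refused")


async def slow_probe() -> None:
    await asyncio.sleep(5)
```

In the readiness handler, the probes could be small wrappers such as `lambda: db.execute(text("SELECT 1"))` and `redis.ping`, with the result of `run_checks` passed to `JSONResponse` using status 200 or 503 as before. Running probes concurrently bounds the total response time by the slowest probe or the timeout, rather than their sum.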

Always-On Reliability

Comprehensive health monitoring ensures maximum uptime.

99.97% uptime · 8/8 dependencies healthy · 12 ms health check