17: Scaling & Resilience (HPA, Probes, PDB, Retries, Graceful Shutdown)¶
This section teaches you how to make your microservices resilient, scalable, and stable under load, traffic spikes, network failures, and rolling updates.
We cover:
• Horizontal Pod Autoscaling (HPA)
• Resource Requests & Limits
• Liveness & Readiness Probes
• PodDisruptionBudget (PDB)
• Retry & backoff strategies
• Graceful shutdown & connection draining
• How Envoy balances traffic under scaling events
• Anti-patterns to avoid
This is essential for taking LocalCloudLab from “works” to “production-grade.”
17.1 Why Scaling & Resilience Matter¶
A microservice is resilient when it can:
✓ survive high traffic
✓ recover from failures automatically
✓ scale up when needed
✓ scale down to save costs
✓ restart safely without losing work
✓ avoid cascading failures
✓ stay responsive under partial outages
A microservice is scalable when:
✓ horizontal scaling is predictable
✓ CPU/memory patterns are understood
✓ workloads are measurable and observable
Kubernetes provides these guarantees only if services are configured correctly.
17.2 Resource Requests & Limits¶
Every Deployment must define:
• requests (minimum resources guaranteed)
• limits (maximum allowed resources)
Example for Search API:
```yaml
resources:
  requests:
    cpu: "200m"
    memory: "256Mi"
  limits:
    cpu: "600m"
    memory: "512Mi"
```
Why do we need requests?¶
Kubernetes uses requests to schedule pods onto nodes. If you don’t define them, pods may be:
✗ scheduled poorly
✗ evicted due to low memory
✗ competing for CPU
Why do we need limits?¶
Limits prevent:
✗ runaway CPU spikes
✗ memory leaks killing the node
✗ noisy neighbors
For LocalCloudLab, recommended defaults:
• Search API: CPU 200m–600m, RAM 256–512Mi
• Checkin API: CPU 150m–500m, RAM 256–512Mi
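The Checkin API defaults above, expressed as a manifest fragment (a starting point; tune the numbers against your own load tests):

```yaml
resources:
  requests:
    cpu: "150m"
    memory: "256Mi"
  limits:
    cpu: "500m"
    memory: "512Mi"
```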
17.3 Horizontal Pod Autoscaling (HPA)¶
HPA adjusts the number of pods based on metrics.
Enable metrics server (k3s includes it by default):
kubectl get deployment metrics-server -n kube-system
Example: HPA for Search API¶
Create:
k8s/search-api/hpa.yaml
Contents:
```yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: search-api-hpa
  namespace: search
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: search-api-deployment
  minReplicas: 2
  maxReplicas: 6
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 70
```
Meaning:
• Start with 2 pods
• Scale up to 6
• Trigger scaling when average CPU utilization exceeds 70% of the pods' CPU request
Search traffic is often spiky → HPA helps smooth load.
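Spiky traffic can also make the HPA flap (rapid scale up/down). The `autoscaling/v2` API exposes a `behavior` block to damp this; a sketch, with the window and policy values as illustrative assumptions:

```yaml
spec:
  behavior:
    scaleUp:
      stabilizationWindowSeconds: 0    # react to spikes immediately
      policies:
        - type: Pods
          value: 2                     # add at most 2 pods per minute
          periodSeconds: 60
    scaleDown:
      stabilizationWindowSeconds: 300  # wait 5 minutes before scaling down
```

The asymmetry is deliberate: scale up fast to protect latency, scale down slowly to avoid thrashing when traffic oscillates.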
17.4 Readiness & Liveness Probes¶
Probes tell Kubernetes when:
• the service is ready to accept traffic (readiness)
• the service is healthy (liveness)
Readiness Probe¶
Search API should NOT accept traffic until:
• Database connection is initialized
• Redis is reachable
• Startup completed
Example:
```yaml
readinessProbe:
  httpGet:
    path: /health/ready
    port: 80
  initialDelaySeconds: 5
  periodSeconds: 10
  failureThreshold: 3
```
Liveness Probe¶
Detects crashes or deadlocks:
```yaml
livenessProbe:
  httpGet:
    path: /health/live
    port: 80
  periodSeconds: 10
  failureThreshold: 5
```
Why do both matter?¶
• If readiness fails → the pod is removed from load balancing
• If liveness fails → the pod is restarted
17.5 Startup Probe (Recommended)¶
Protects against slow startup causing false failures.
Example:
```yaml
startupProbe:
  httpGet:
    path: /health/startup
    port: 80
  failureThreshold: 30
  periodSeconds: 5
```
This gives up to 30 × 5 = 150 seconds before Kubernetes kills the pod during slow boot.
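Putting the three probes together, the container spec ends up looking like this (same endpoints as above; note that Kubernetes disables the readiness and liveness probes until the startup probe succeeds):

```yaml
containers:
  - name: search-api
    ports:
      - containerPort: 80
    startupProbe:
      httpGet: { path: /health/startup, port: 80 }
      failureThreshold: 30
      periodSeconds: 5
    readinessProbe:
      httpGet: { path: /health/ready, port: 80 }
      initialDelaySeconds: 5
      periodSeconds: 10
      failureThreshold: 3
    livenessProbe:
      httpGet: { path: /health/live, port: 80 }
      periodSeconds: 10
      failureThreshold: 5
```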
17.6 PodDisruptionBudget (PDB)¶
Prevents Kubernetes or cluster events from killing too many pods at once.
Example:
```yaml
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: search-api-pdb
  namespace: search
spec:
  minAvailable: 1
  selector:
    matchLabels:
      app: search-api
```
Meaning:
• At least 1 pod must always be running
• Protects against node drains, updates, auto-repairs
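With only 2 replicas, minAvailable: 1 still lets a drain take half your capacity. An alternative is to cap disruptions instead, which scales naturally as replica counts grow:

```yaml
spec:
  maxUnavailable: 1   # never disrupt more than one pod at a time
  selector:
    matchLabels:
      app: search-api
```

A PDB may specify either minAvailable or maxUnavailable, not both.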
17.7 Retry & Backoff Strategy (Application & Envoy)¶
Application-Level Retry (C#)¶
Use Polly via the Microsoft.Extensions.Http.Polly package (or a custom HttpClient retry handler):
Example:
```csharp
services.AddHttpClient("RemoteSearch")
    .AddTransientHttpErrorPolicy(p => p.WaitAndRetryAsync(
        retryCount: 3,
        sleepDurationProvider: attempt => TimeSpan.FromMilliseconds(200 * attempt)));
```
Envoy BackendPolicy Retry¶
```yaml
retry:
  attempts: 3
  retryOn:
    - gateway-errors
    - connection-failure
  perTryTimeout: 2s
```
Use retries carefully:
✔ transient failures
✗ persistent failures (causing overload)
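For reference, the equivalent policy in raw Envoy route configuration looks like this (field names per Envoy's route `RetryPolicy`; the backoff intervals are illustrative assumptions):

```yaml
route:
  cluster: search-api
  retry_policy:
    retry_on: "5xx,connect-failure,reset"
    num_retries: 3
    per_try_timeout: 2s
    retry_back_off:
      base_interval: 0.1s
      max_interval: 1s
```

The `retry_back_off` block spaces retries out exponentially, which limits the extra load retries add during a partial outage.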
17.8 Graceful Shutdown in .NET¶
A service must:
• finish in-flight requests
• reject new requests
• close RabbitMQ connections
• flush logs
Program.cs:
```csharp
builder.WebHost.ConfigureKestrel(opt =>
{
    opt.AddServerHeader = false;
    opt.Limits.KeepAliveTimeout = TimeSpan.FromSeconds(30);
});

app.Lifetime.ApplicationStopping.Register(() =>
{
    // cleanup logic
    rabbitMqChannel?.Close();
    rabbitMqConnection?.Close();
});
```
By default the .NET generic host waits up to 5 seconds for graceful shutdown (configurable via HostOptions.ShutdownTimeout). Kubernetes separately grants the pod terminationGracePeriodSeconds (30 seconds by default) before sending SIGKILL — make sure the host's shutdown timeout fits inside it.
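On the Kubernetes side, a preStop hook plus an adequate grace period gives Envoy time to stop routing to the pod before Kestrel shuts down. A sketch (the sleep length is an assumption; size it to your endpoint-propagation delay):

```yaml
spec:
  terminationGracePeriodSeconds: 30
  containers:
    - name: search-api
      lifecycle:
        preStop:
          exec:
            command: ["sleep", "5"]  # keep serving while endpoint lists update
```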
17.9 Envoy + HPA Interaction¶
When traffic spikes:
1. Envoy receives more incoming requests
2. CPU rises on pods
3. HPA detects > 70% CPU
4. HPA creates new pods
5. Readiness probe ensures new pods are warmed up
6. Envoy distributes traffic to new pods
This delivers smooth, automated scaling.
17.10 Anti-Patterns to Avoid¶
❌ No resource limits → Risk: Node OOM kills entire cluster
❌ No readiness probes → Risk: Unhealthy pods receive traffic → timeouts
❌ Retry everything indefinitely → Risk: Cascading failures & overload
❌ Log every request → Risk: Loki storage explosion
❌ HPA with maxReplicas=1 → Zero scalability
17.11 Summary of Section 17¶
You now have:
✔ Resource requests & limits for stability
✔ Horizontal autoscaling with metrics
✔ Full probe strategy (startup, readiness, liveness)
✔ Pod disruption protection (PDB)
✔ Retry & backoff strategies
✔ Graceful shutdown and connection draining
✔ A resilient, production-ready microservice cluster
Up next:
Section 18 — Storage, Backups & Disaster Recovery (PostgreSQL, Redis, RabbitMQ)