20: Deployment Lifecycle & Operations

This section describes how LocalCloudLab is operated day to day, covering:

• Deployment lifecycle
• Release management
• Versioning strategy
• Promotion through environments (dev → staging → production)
• Rollback procedures
• Operational tasks
• On-call troubleshooting
• Maintenance routines
• Incident response workflow

This is your operations manual, enabling consistent, safe, repeatable deployments.

20.1 Deployment Lifecycle Overview

LocalCloudLab uses a structured lifecycle:

1. Developer commits code
2. GitHub Actions builds, tests, and creates container images
3. Images are pushed to GHCR
4. Kubernetes manifests are applied automatically (based on paths)
5. Readiness probes confirm that the new pods are ready
6. Envoy routes traffic to the ready pods
7. Autoscaling adds or removes capacity as load changes
8. Observability tracks performance

The goal: Continuous Delivery with safety.
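Step 4 applies manifests "based on paths". The sketch below illustrates that idea only: it maps a list of changed file paths to the services that need a redeploy. The `services/<name>/` layout and the function name are assumptions made for this example. The actual pipeline may rely on GitHub Actions path filters instead.

```python
from pathlib import PurePosixPath

# Hypothetical repository layout: services/<service-name>/...
SERVICES_ROOT = "services"


def services_to_deploy(changed_paths):
    """Return the sorted service names whose files appear in changed_paths."""
    touched = set()
    for raw in changed_paths:
        parts = PurePosixPath(raw).parts
        # Only paths shaped like services/<name>/<something> select a service.
        if len(parts) >= 3 and parts[0] == SERVICES_ROOT:
            touched.add(parts[1])
    return sorted(touched)


if __name__ == "__main__":
    print(services_to_deploy([
        "services/search-api/k8s/deployment.yaml",
        "docs/index.md",
    ]))
```

Changes outside a service folder, such as documentation, select nothing, so they trigger no deployment.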

20.2 Versioning Strategy (Images & Deployments)

Images are tagged with:

<service>:<git-sha>

Example:

search-api:7c3f2b1
checkin-api:98ab4e2

This provides:

✓ Traceability: every image maps to exactly one commit
✓ No version conflicts: two different builds never share a tag
✓ Easy rollback (switch image tag)
✓ Clean deployment history

For large releases, also tag the image with a semantic version:

<service>:v1.0.0

The SHA tag remains the deployment source of truth.
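As a minimal sketch of the tagging rule, the helper below builds an image reference and rejects anything that is not a Git SHA, so a semantic version tag cannot be deployed by accident. The function name is hypothetical, and the `ghcr.io/user` prefix is borrowed from the rollback example in 20.6.

```python
import re

# Short or full hexadecimal Git commit hash (7 to 40 characters).
_SHA_PATTERN = re.compile(r"[0-9a-f]{7,40}")


def image_ref(registry, service, git_sha):
    """Build '<registry>/<service>:<git-sha>' after validating the SHA."""
    if not _SHA_PATTERN.fullmatch(git_sha):
        raise ValueError(f"not a Git SHA: {git_sha!r}")
    return f"{registry}/{service}:{git_sha}"


if __name__ == "__main__":
    print(image_ref("ghcr.io/user", "search-api", "7c3f2b1"))
```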

20.3 Release Workflow (No GitOps Required)

LocalCloudLab deploys straight from CI, without a GitOps controller:

Step 1 — Commit changes to main
Step 2 — GitHub Actions: Build + Push + Deploy
Step 3 — Deployment updated in Kubernetes
Step 4 — Envoy routes traffic
Step 5 — Observability monitors success/failure

With branch protection enabled on main, every change requires:

• PR review
• Static analysis
• CI pass

These checks keep unreviewed or failing code from reaching main, and therefore from reaching the cluster.

20.4 Promotion Across Environments (Future-Ready)

Even though LocalCloudLab currently uses one cluster, the architecture supports:

• dev cluster
• staging cluster
• production cluster

Promotion strategies:

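Option A: Manual Tag Promotion

A developer creates and pushes a release tag:

git tag prod-v1.0.3
git push origin prod-v1.0.3

Pushing the single tag by name avoids publishing unrelated local tags. GitHub Actions detects the tag and deploys to production.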
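A deploy job has to decide which environment a tag targets. The sketch below assumes the tag shape seen in the example, `<environment>-v<major>.<minor>.<patch>`, and assumes `dev`, `staging`, and `prod` as the environment prefixes. Only `prod-` appears in this section; the other two are illustrative.

```python
import re

# Tag shape taken from the example above: <environment>-v<major>.<minor>.<patch>
_TAG_PATTERN = re.compile(r"(dev|staging|prod)-v(\d+)\.(\d+)\.(\d+)")


def parse_promotion_tag(tag):
    """Return (environment, (major, minor, patch)), or None if the tag does not match."""
    match = _TAG_PATTERN.fullmatch(tag)
    if match is None:
        return None
    env = match.group(1)
    version = tuple(int(part) for part in match.group(2, 3, 4))
    return env, version


if __name__ == "__main__":
    print(parse_promotion_tag("prod-v1.0.3"))
```

Returning `None` for anything else lets the job skip tags that were never meant to trigger a deployment.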

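Option B: Environment-Specific Branches

dev → staging → prod

Each branch deploys to its own environment, and merging one branch into the next promotes the change.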

Option C: GitOps (future)

ArgoCD pulls manifests from specific folders:

/environments/dev
/environments/staging
/environments/prod

20.5 Deployment Strategies

Kubernetes Deployments natively support strategies 1 and 2. Strategies 3 and 4 rely on traffic routing at the gateway:

1. Rolling Updates (default)

• No downtime, provided readiness probes are configured and more than one replica is running
• Gradual replacement
• Can be paused/resumed

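2. Recreate

• All old pods terminate before new ones start, which causes a short outage
• Use only when old and new versions cannot run side by side (for example, incompatible schema changes)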

3. Blue/Green (with Envoy)

• Deploy the new version alongside the old one, then switch all traffic to it once it is verified

4. Canary Release

• Send 1–10% of traffic to new version
• Increase gradually

Envoy Gateway, as configured in Section 15, already supports strategies 3 and 4.
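To make the canary progression concrete, the sketch below generates stable/canary weight pairs that always sum to 100. The default step values are an assumption chosen to start inside the 1–10% range mentioned above. How the weights are applied depends on the routing set up in Section 15, for example as weighted backends on a route.

```python
def canary_schedule(steps=(1, 5, 10, 25, 50, 100)):
    """Return (stable_weight, canary_weight) pairs that always sum to 100."""
    schedule = []
    previous = 0
    for canary in steps:
        # Each step must send strictly more traffic to the canary than the last.
        if not previous < canary <= 100:
            raise ValueError("steps must increase and stay within 1..100")
        schedule.append((100 - canary, canary))
        previous = canary
    return schedule


if __name__ == "__main__":
    for stable, canary in canary_schedule():
        print(f"stable={stable}% canary={canary}%")
```

Between steps, watch the dashboards from Section 20.8 before moving on. A blue/green switch is the degenerate schedule `(100,)`.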

20.6 Rollback Procedures (Fast & Safe)

Rollback can happen at:

Level 1: Kubernetes deployment

Switch image tag:

kubectl set image deployment/search-api-deployment \
  search-api=ghcr.io/user/search-api:<previous-sha> \
  -n search

Validate rollout:

kubectl rollout status deployment/search-api-deployment -n search
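The two commands above can be wrapped so that a rollback is always followed by a rollout check. This is a sketch, not part of the existing tooling: the function names are hypothetical, and `kubectl` is assumed to be on the PATH with access to the cluster.

```python
import subprocess


def rollback_commands(deployment, container, image, namespace):
    """Return the two kubectl invocations from this section as argument lists."""
    target = f"deployment/{deployment}"
    return [
        ["kubectl", "set", "image", target, f"{container}={image}", "-n", namespace],
        ["kubectl", "rollout", "status", target, "-n", namespace],
    ]


def run_rollback(deployment, container, image, namespace):
    """Run each command in order and stop on the first failure."""
    for argv in rollback_commands(deployment, container, image, namespace):
        # check=True raises CalledProcessError on a non-zero exit code.
        subprocess.run(argv, check=True)


if __name__ == "__main__":
    # Prints the commands without touching a cluster.
    for argv in rollback_commands(
        "search-api-deployment", "search-api",
        "ghcr.io/user/search-api:7c3f2b1", "search",
    ):
        print(" ".join(argv))
```

Passing arguments as a list avoids shell quoting problems when an image reference contains unusual characters.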

Level 2: Revert code + redeploy

git revert <commit>
git push

Level 3: Full cluster restore (disaster)

See Section 18 for the disaster recovery (DR) procedure.

20.7 Operational Tasks for the Administrator

Daily:

• Check pod health
• Check failing deployments
• Check logs for errors
• Check Redis/DB/Rabbit metrics
• Review Grafana dashboards
• Review open alerts

Weekly:

• Apply OS updates
• Review cluster events
• Clean unused images (docker image prune on Docker hosts, or crictl rmi --prune on containerd-based k3s nodes)
• Verify backups

Monthly:

• Test DR restore procedure
• Security audit
• Certificate expiration check
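The certificate expiration check reduces to date arithmetic once the certificate's `notAfter` value is known. The sketch below assumes the timestamp is in the format returned by Python's `ssl.SSLSocket.getpeercert()`. The function name and any warning threshold you apply to the result are illustrative.

```python
import ssl
from datetime import datetime, timezone


def days_until_expiry(not_after, now=None):
    """Whole days left before a certificate's notAfter time (negative if expired).

    not_after uses the getpeercert() format, e.g. "Jun  1 12:00:00 2030 GMT".
    """
    expires = datetime.fromtimestamp(
        ssl.cert_time_to_seconds(not_after), tz=timezone.utc
    )
    now = now or datetime.now(timezone.utc)
    return (expires - now).days


if __name__ == "__main__":
    print(days_until_expiry("Jun  1 12:00:00 2030 GMT"))
```

A monthly check could flag any certificate with fewer days remaining than the interval until the next check plus a safety margin.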

20.8 Troubleshooting Guide (On-Call Manual)

Symptom: API returns 500

Check:

kubectl logs deployment/search-api-deployment -n search

Look for:

✗ Database unreachable
✗ Redis timeouts
✗ RabbitMQ disconnects

Symptom: Deployment stuck in rollout

kubectl describe deployment search-api-deployment -n search

Check:

✗ readiness probe failing
✗ image pull failure
✗ crash loop
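The three causes above can be told apart from a pod's status. The sketch below classifies one pod object as produced by `kubectl get pod <name> -o json`. The function name and the returned labels are hypothetical; the waiting reasons (`ErrImagePull`, `ImagePullBackOff`, `CrashLoopBackOff`) are the ones Kubernetes reports.

```python
# Waiting reasons reported in status.containerStatuses[].state.waiting.reason.
IMAGE_PULL_REASONS = {"ErrImagePull", "ImagePullBackOff"}
CRASH_LOOP_REASONS = {"CrashLoopBackOff"}


def diagnose_pod(pod):
    """Classify one pod object (a dict parsed from `kubectl get pod -o json`)."""
    statuses = pod.get("status", {}).get("containerStatuses", [])
    if not statuses:
        return "no container status yet"
    for status in statuses:
        reason = status.get("state", {}).get("waiting", {}).get("reason")
        if reason in IMAGE_PULL_REASONS:
            return "image pull failure"
        if reason in CRASH_LOOP_REASONS:
            return "crash loop"
    # A container that runs but is not ready usually has a failing readiness probe.
    if any(not status.get("ready", False) for status in statuses):
        return "not ready (check readiness probe)"
    return "healthy"


if __name__ == "__main__":
    example = {"status": {"containerStatuses": [
        {"ready": False, "state": {"waiting": {"reason": "CrashLoopBackOff"}}},
    ]}}
    print(diagnose_pod(example))
```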

Symptom: High latency

Check:

Grafana → Search API latency dashboard

Look at:

• p95 and p99 latency
• Redis get time
• DB queries
• RabbitMQ publish time
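For readers unfamiliar with p95 and p99, the sketch below computes a percentile from raw latency samples using the nearest-rank method. It is illustrative only: dashboards built on histogram metrics typically estimate percentiles by interpolating between buckets, so their values can differ slightly.

```python
import math


def percentile(samples, pct):
    """Nearest-rank percentile: the smallest sample with at least pct% of samples at or below it."""
    if not samples:
        raise ValueError("samples must not be empty")
    if not 0 < pct <= 100:
        raise ValueError("pct must be in (0, 100]")
    ordered = sorted(samples)
    rank = math.ceil(pct * len(ordered) / 100)  # 1-based rank
    return ordered[rank - 1]


if __name__ == "__main__":
    latencies_ms = [12, 15, 11, 240, 14, 13, 16, 18, 12, 950]
    print(percentile(latencies_ms, 50), percentile(latencies_ms, 95))
```

The gap between the median and the high percentiles is the useful signal: a few slow requests barely move the median but dominate p95 and p99.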

Symptom: Node pressure (OOM/CPU)

kubectl describe node

Look for:

MemoryPressure
DiskPressure
PIDPressure
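The same conditions can be read programmatically from `kubectl get node <name> -o json`. The sketch below, with a hypothetical function name, returns the pressure conditions that are currently active on one node object.

```python
PRESSURE_TYPES = ("MemoryPressure", "DiskPressure", "PIDPressure")


def active_pressure(node):
    """List the pressure conditions whose status is "True" on one node object."""
    conditions = node.get("status", {}).get("conditions", [])
    return [
        cond["type"]
        for cond in conditions
        if cond.get("type") in PRESSURE_TYPES and cond.get("status") == "True"
    ]


if __name__ == "__main__":
    node = {"status": {"conditions": [
        {"type": "MemoryPressure", "status": "True"},
        {"type": "DiskPressure", "status": "False"},
        {"type": "Ready", "status": "True"},
    ]}}
    print(active_pressure(node))
```

An empty list means none of the three pressure conditions is active.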

Symptom: 429 Too Many Requests

Envoy is rate limiting the client. Raise the limit if the traffic is legitimate, or fix the client if it is sending more requests than it should.
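On the client side, "fixing misuse" often means backing off instead of retrying immediately. The sketch below computes a wait time after a 429, preferring a numeric `Retry-After` header when one is present and falling back to capped exponential backoff. The function name and the default `base` and `cap` values are assumptions for illustration. Whether the gateway sends `Retry-After` depends on its configuration.

```python
def retry_delay(attempt, retry_after=None, base=0.5, cap=30.0):
    """Seconds to wait before retry number `attempt` (0-based) after a 429.

    A numeric Retry-After value takes priority when present; otherwise the
    delay doubles with each attempt, up to `cap`.
    """
    if retry_after is not None:
        try:
            return min(float(retry_after), cap)
        except ValueError:
            pass  # Retry-After may also be an HTTP date; fall back to backoff.
    return min(base * (2 ** attempt), cap)


if __name__ == "__main__":
    print([retry_delay(n) for n in range(8)])
```

Production clients usually add random jitter to the computed delay so that many clients do not retry at the same instant.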

20.9 Long-Term Maintenance

Every 6 months:

• Upgrade k3s
• Upgrade .NET runtime
• Rotate secrets
• Update Helm charts
• Refresh TLS keys
• Rebuild base Docker images

Annually:

• Security audit
• Cost optimization review
• Architectural review
• Observability tuning

20.10 Summary of Section 20

You now have:

✔ Complete deployment lifecycle
✔ Rolling update & rollback strategy
✔ Release management workflow
✔ Troubleshooting playbook
✔ Maintenance routine
✔ Disaster recovery integration
✔ Future-ready multi-environment promotion strategy

Your cluster is now fully operational, with predictable and safe deployments.

Next: Section 21, LocalCloudLab Infrastructure Reference & Glossary