20: Deployment Lifecycle & Operations
In this section, we describe how LocalCloudLab is operated day-to-day, including:
• Deployment lifecycle
• Release management
• Versioning strategy
• Promotion through environments (dev → staging → production)
• Rollback procedures
• Operational tasks
• On-call troubleshooting
• Maintenance routines
• Incident response workflow
This is your operations manual, enabling consistent, safe, repeatable deployments.
20.1 Deployment Lifecycle Overview
LocalCloudLab uses a structured lifecycle:
1. A developer commits code
2. GitHub Actions builds, tests, and creates container images
3. Images are pushed to GHCR (GitHub Container Registry)
4. Kubernetes manifests are applied automatically, based on which paths the commit changed
5. Readiness probes confirm that the new pods are ready
6. Envoy routes traffic to the ready pods
7. Autoscaling adjusts capacity to match load
8. Observability tracks performance
The goal: Continuous Delivery with safety.
20.2 Versioning Strategy (Images & Deployments)
Images are tagged with:
<service>:<git-sha>
Example:
search-api:7c3f2b1
checkin-api:98ab4e2
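This provides:
✓ Traceability: every running image maps to exactly one commit
✓ No tag collisions: each commit produces a unique tag
✓ Easy rollback (switch the image tag back)
✓ Clean deployment history
For major releases, also apply a semantic version tag:
<service>:v1.0.0
The SHA tag, however, remains the deployment source of truth.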
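The tagging scheme can be sketched as a small helper a CI step might call. This is a minimal sketch, not the actual pipeline code: it assumes the registry path ghcr.io/<owner>/<service> used in the Level 1 rollback command in 20.6, a 7-character short SHA, and a hypothetical function name image_refs.

```python
import re
from typing import List, Optional

SHA_RE = re.compile(r"^[0-9a-f]{7,40}$")     # short or full git SHA
SEMVER_RE = re.compile(r"^v\d+\.\d+\.\d+$")  # e.g. v1.0.0


def image_refs(owner: str, service: str, git_sha: str,
               release: Optional[str] = None) -> List[str]:
    """Return the image references a build should push (hypothetical helper).

    The short-SHA tag is always produced because deployments use it;
    a semantic version tag is added only when a release is being cut.
    """
    sha = git_sha.lower()
    if not SHA_RE.match(sha):
        raise ValueError(f"not a git SHA: {git_sha!r}")
    repo = f"ghcr.io/{owner}/{service}"  # assumed registry layout
    refs = [f"{repo}:{sha[:7]}"]         # e.g. ghcr.io/user/search-api:7c3f2b1
    if release is not None:
        if not SEMVER_RE.match(release):
            raise ValueError(f"release tag must look like v1.0.0: {release!r}")
        refs.append(f"{repo}:{release}")
    return refs


if __name__ == "__main__":
    for ref in image_refs("user", "search-api", "7c3f2b1e9d0a", release="v1.0.0"):
        print(ref)
```

Each returned reference would be pushed in turn; because the SHA tag is always first, a deployment step can take refs[0] without caring whether a release tag was added.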
20.3 Release Workflow (No GitOps Required)
LocalCloudLab uses a simple push-based workflow in which CI deploys directly to the cluster:
Step 1 — Commit changes to main
Step 2 — GitHub Actions: Build + Push + Deploy
Step 3 — Deployment updated in Kubernetes
Step 4 — Envoy routes traffic
Step 5 — Observability monitors success/failure
With branch protection rules enabled on main, every change requires:
• PR review
• Passing static analysis
• A passing CI run
This keeps unreviewed or failing code out of main, and therefore out of the cluster.
20.4 Promotion Across Environments (Future-Ready)
Even though LocalCloudLab currently uses one cluster, the architecture supports:
• dev cluster
• staging cluster
• production cluster
Promotion strategies:
Option A — Manual Tag Promotion
A developer creates and pushes a release tag:
git tag prod-v1.0.3
git push origin prod-v1.0.3
A GitHub Actions workflow triggered by the tag push deploys that version to production.
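The deploy job needs to turn the pushed tag into a target environment and version. A minimal sketch, assuming the <env>-v<major>.<minor>.<patch> convention implied by prod-v1.0.3; the regex, the accepted environment names and parse_release_tag are hypothetical. GitHub Actions exposes the pushed ref to the job as GITHUB_REF (for example refs/tags/prod-v1.0.3).

```python
import os
import re
from typing import NamedTuple

# Hypothetical convention: <environment>-v<major>.<minor>.<patch>, e.g. prod-v1.0.3
TAG_RE = re.compile(r"^(?P<env>dev|staging|prod)-v(?P<version>\d+\.\d+\.\d+)$")


class Release(NamedTuple):
    environment: str
    version: str


def parse_release_tag(ref: str) -> Release:
    """Map a pushed ref such as 'refs/tags/prod-v1.0.3' to its target environment."""
    tag = ref[len("refs/tags/"):] if ref.startswith("refs/tags/") else ref
    match = TAG_RE.match(tag)
    if match is None:
        raise ValueError(f"tag does not follow <env>-vX.Y.Z: {tag!r}")
    return Release(match["env"], match["version"])


if __name__ == "__main__":
    # GITHUB_REF is set inside GitHub Actions; the fallback is for local testing.
    print(parse_release_tag(os.environ.get("GITHUB_REF", "refs/tags/prod-v1.0.3")))
```

Rejecting tags that do not match keeps an accidental tag (for example a bare v1.0.3) from deploying anywhere.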
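Option B — Environment-Specific Branches
Changes are merged from branch to branch in order, and each branch deploys to its matching environment:
dev → staging → prod
Option C — GitOps (future)
Argo CD pulls manifests from per-environment folders: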
/environments/dev
/environments/staging
/environments/prod
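Options B and C share one rule: a change moves forward one environment at a time, in order. A minimal sketch of that rule, assuming the three environment names above and the /environments/<env> folder layout from Option C; next_environment, can_promote and manifest_dir are hypothetical names.

```python
from typing import Optional

ENVIRONMENTS = ("dev", "staging", "prod")  # promotion order


def next_environment(current: str) -> Optional[str]:
    """Return the environment a change promotes to next, or None once it is in prod."""
    index = ENVIRONMENTS.index(current)  # raises ValueError for unknown names
    return ENVIRONMENTS[index + 1] if index + 1 < len(ENVIRONMENTS) else None


def can_promote(source: str, target: str) -> bool:
    """Allow only single-step promotions: dev -> staging -> prod."""
    return next_environment(source) == target


def manifest_dir(env: str) -> str:
    """Folder a GitOps tool would watch for this environment (Option C layout)."""
    if env not in ENVIRONMENTS:
        raise ValueError(f"unknown environment: {env!r}")
    return f"/environments/{env}"
```

Skipping staging (dev straight to prod) is refused, which is the main safety property of a staged promotion.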
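20.5 Deployment Strategies
Kubernetes Deployments natively support the first two strategies below; the last two are built on top of traffic routing: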
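1. Rolling Updates (default)
• Zero downtime, provided readiness probes are configured
• Gradual replacement of old pods by new ones
• Can be paused/resumed (kubectl rollout pause / kubectl rollout resume)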
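How gradual a rolling update is depends on the Deployment's maxSurge and maxUnavailable settings, which both default to 25%. Kubernetes rounds a percentage maxSurge up and a percentage maxUnavailable down, and if both end up at zero the controller treats maxUnavailable as 1. The sketch below reproduces that arithmetic to show how many pods a rollout may add or take away at once; it is an illustration, not controller code.

```python
from typing import Tuple, Union

Setting = Union[int, str]  # an absolute pod count or a percentage such as "25%"


def resolve(value: Setting, replicas: int, round_up: bool) -> int:
    """Convert a maxSurge/maxUnavailable setting into a pod count."""
    if isinstance(value, int):
        return value
    scaled = int(value.rstrip("%")) * replicas  # percent * replicas (still x100)
    return -(-scaled // 100) if round_up else scaled // 100  # integer ceil / floor


def rollout_bounds(replicas: int, max_surge: Setting = "25%",
                   max_unavailable: Setting = "25%") -> Tuple[int, int]:
    """Return (most pods that may exist, fewest pods that must stay available)."""
    surge = resolve(max_surge, replicas, round_up=True)               # rounds up
    unavailable = resolve(max_unavailable, replicas, round_up=False)  # rounds down
    if surge == 0 and unavailable == 0:
        unavailable = 1  # same fallback as the controller, so the rollout can progress
    return replicas + surge, replicas - unavailable


if __name__ == "__main__":
    for n in (1, 3, 4, 10):
        print(n, rollout_bounds(n))
```

With 3 replicas and the defaults, maxUnavailable rounds down to 0, so the rollout never drops below 3 ready pods and proceeds one surge pod at a time.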
2. Recreate
• All old pods are terminated before new ones start, so there is a brief outage
• Use only when two versions cannot run side by side, e.g. incompatible schema changes
3. Blue/Green (with Envoy)
• Deploy the new version (green) alongside the current one (blue), then switch all traffic to green at once; switching back is just as fast
4. Canary Release
• Send 1–10% of traffic to the new version
• Increase gradually while error rates and latency stay healthy
Envoy Gateway, set up in Section 15, already supports strategies 3 and 4 through weighted traffic splitting.
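A canary rollout is a sequence of traffic weights. In Gateway API terms, which Envoy Gateway implements, each step would set the weight field on two backendRefs of an HTTPRoute, one for a stable Service and one for a canary Service (a hypothetical setup). The sketch below only computes the schedule; the step percentages are an assumption, and applying each step and checking metrics in between is left to the pipeline.

```python
from typing import List, Sequence, Tuple


def canary_schedule(steps: Sequence[int] = (1, 5, 10, 25, 50, 100)) -> List[Tuple[int, int]]:
    """Return (stable_weight, canary_weight) per step; each pair sums to 100."""
    schedule = []
    previous = 0
    for canary in steps:
        if not 0 < canary <= 100 or canary <= previous:
            raise ValueError("steps must strictly increase within 1..100")
        schedule.append((100 - canary, canary))
        previous = canary
    return schedule


if __name__ == "__main__":
    for stable, canary in canary_schedule():
        print(f"stable={stable} canary={canary}")
```

A blue/green switch is the degenerate case: canary_schedule((100,)) moves all traffic in a single step.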
20.6 Rollback Procedures (Fast & Safe)
Rollback can happen at three levels:
Level 1 — Kubernetes deployment
Point the deployment back at the previous image tag:
kubectl set image deployment/search-api-deployment search-api=ghcr.io/user/search-api:<previous-sha> -n search
Then watch the rollout until it completes:
kubectl rollout status deployment/search-api-deployment -n search
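When scripting a Level 1 rollback, it is safer to derive the previous SHA from the deployment history than to type it by hand. A minimal sketch, assuming a list of deployed SHAs (oldest first) is available, for example from CI run history; rollback_command and that history source are hypothetical. It builds the same kubectl set image invocation shown above as an argument list, assuming the <service>-deployment and container naming used there.

```python
from typing import List, Sequence


def rollback_command(history: Sequence[str], namespace: str = "search",
                     service: str = "search-api", owner: str = "user") -> List[str]:
    """Build kubectl args that point the deployment back at the previous image.

    history: deployed git SHAs, oldest first; the last entry is what runs now.
    """
    if len(history) < 2:
        raise ValueError("no earlier deployment to roll back to")
    previous_sha = history[-2]
    image = f"ghcr.io/{owner}/{service}:{previous_sha}"
    return [
        "kubectl", "set", "image",
        f"deployment/{service}-deployment",  # assumed naming, as in the command above
        f"{service}={image}",                # container name assumed equal to service
        "-n", namespace,
    ]
```

The list can be passed to subprocess.run(cmd, check=True); using an argument list rather than a shell string avoids quoting problems.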
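Level 2 — Revert code + redeploy
git revert <commit>
git push
git revert creates a new commit that undoes <commit>; pushing it triggers the normal build-and-deploy pipeline.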
Level 3 — Full cluster restore (disaster)
See Section 18 for the disaster recovery (DR) procedure.
20.7 Operational Tasks for the Administrator
Daily:
• Check pod health
• Check failing deployments
• Check logs for errors
• Check Redis/DB/RabbitMQ metrics
• Review Grafana dashboards
• Review open alerts
Weekly:
• Apply OS updates
• Review cluster events
• Clean unused images (docker image prune on build hosts; crictl rmi --prune on k3s nodes, which use containerd)
• Verify backups
Monthly:
• Test DR restore procedure
• Security audit
• Certificate expiration check
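The monthly certificate check can be partly automated. The sketch below uses the standard library's ssl.cert_time_to_seconds to parse a certificate's notAfter timestamp, in the format printed by openssl x509 -enddate -noout -in <cert>, and reports the days remaining. The 30-day renewal threshold and the function names are assumptions.

```python
import ssl
import time
from typing import Optional

SECONDS_PER_DAY = 86400


def days_until_expiry(not_after: str, now: Optional[float] = None) -> int:
    """Whole days left before a certificate's notAfter time (negative once expired)."""
    # Accept the raw 'notAfter=...' line printed by openssl as well as the bare timestamp.
    timestamp = not_after.strip()
    if timestamp.startswith("notAfter="):
        timestamp = timestamp[len("notAfter="):]
    expires = ssl.cert_time_to_seconds(timestamp)  # e.g. 'Jan  5 09:34:43 2018 GMT'
    current = time.time() if now is None else now
    return int((expires - current) // SECONDS_PER_DAY)


def needs_renewal(not_after: str, threshold_days: int = 30,
                  now: Optional[float] = None) -> bool:
    """True when fewer than threshold_days remain (the threshold is an assumed policy)."""
    return days_until_expiry(not_after, now) < threshold_days
```

Passing now explicitly makes the check deterministic in tests; in a cron job it can be left out.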
20.8 Troubleshooting Guide (On-Call Manual)
Symptom: API returns 500
Check the logs (kubectl logs on a deployment reads from one of its pods):
kubectl logs deployment/search-api-deployment -n search
Look for:
✗ Database unreachable
✗ Redis timeouts
✗ RabbitMQ disconnects
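When the logs are long, a quick triage pass can show which dependency is failing most. The regex patterns below are hypothetical examples; adjust them to the messages your services actually log. The script reads log lines from standard input, so it can sit at the end of the kubectl logs command above.

```python
import re
import sys
from collections import Counter
from typing import Iterable

# Hypothetical signatures; match them to the exception messages your services emit.
SIGNATURES = {
    "database": re.compile(r"database.*(unreachable|timeout|refused)", re.IGNORECASE),
    "redis": re.compile(r"redis.*(timeout|timed out)", re.IGNORECASE),
    "rabbitmq": re.compile(r"rabbitmq.*(disconnect|connection .*closed)", re.IGNORECASE),
}


def triage(log_lines: Iterable[str]) -> Counter:
    """Count log lines matching each known failure signature."""
    counts = Counter()
    for line in log_lines:
        for name, pattern in SIGNATURES.items():
            if pattern.search(line):
                counts[name] += 1
    return counts


if __name__ == "__main__":
    for name, count in triage(sys.stdin).most_common():
        print(f"{name}: {count}")
```

A single dominant count usually points straight at the dependency to investigate next.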
Symptom: Deployment stuck in rollout
kubectl describe deployment search-api-deployment -n search
Check:
✗ readiness probe failing
✗ image pull failure
✗ crash loop
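The three causes show up in different fields of the pod status. The sketch below classifies pods from the output of kubectl get pods -n search -o json. The field names (status.containerStatuses, state.waiting.reason, ready, restartCount) and the reasons ErrImagePull, ImagePullBackOff, InvalidImageName and CrashLoopBackOff are standard Kubernetes values; the function and its wording are illustrative, and treating "running but not ready" as a readiness probe failure is a heuristic.

```python
import json
import sys
from typing import List

PULL_REASONS = {"ErrImagePull", "ImagePullBackOff", "InvalidImageName"}


def diagnose(pod: dict) -> List[str]:
    """Return findings for one pod object from 'kubectl get pod(s) -o json'."""
    findings = []
    for status in pod.get("status", {}).get("containerStatuses", []):
        name = status["name"]
        state = status.get("state", {})
        reason = (state.get("waiting") or {}).get("reason", "")
        if reason in PULL_REASONS:
            findings.append(f"{name}: image pull failure ({reason})")
        elif reason == "CrashLoopBackOff":
            findings.append(f"{name}: crash loop (restarts={status.get('restartCount', 0)})")
        elif "running" in state and not status.get("ready", False):
            # Running but not Ready usually means the readiness probe is failing.
            findings.append(f"{name}: running but not ready (check readiness probe)")
    return findings


if __name__ == "__main__":
    data = json.load(sys.stdin)
    for pod in data.get("items", [data]):  # accepts a pod list or a single pod
        for finding in diagnose(pod):
            print(pod["metadata"]["name"], finding)
```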
Symptom: High latency
Check:
Grafana → Search API latency dashboard
Look at:
• p95 and p99 latency
• Redis GET latency
• DB query duration
• RabbitMQ publish latency
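p95 and p99 are the latencies that 95% and 99% of requests complete within. To sanity-check a dashboard against raw samples (for example from a load test), the nearest-rank method below is enough. If the dashboard is backed by Prometheus histograms, it likely uses histogram_quantile, which interpolates within buckets, so expect small differences.

```python
import math
from typing import Sequence


def percentile(samples: Sequence[float], p: float) -> float:
    """Nearest-rank percentile: smallest sample with at least p% of values at or below it."""
    if not samples:
        raise ValueError("no samples")
    if not 0 < p <= 100:
        raise ValueError("p must be in (0, 100]")
    ordered = sorted(samples)
    rank = math.ceil(p * len(ordered) / 100)  # 1-based rank
    return ordered[rank - 1]
```

For latencies of 1 to 100 ms, percentile(data, 95) is 95 and percentile(data, 99) is 99; with very few samples the high percentiles collapse onto the slowest request, which is why p99 needs a reasonable sample size.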
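Symptom: Node pressure (memory, disk, PIDs)
kubectl describe node
Look for these conditions reporting True (CPU saturation is not a node condition; check it with kubectl top node):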
MemoryPressure
DiskPressure
PIDPressure
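To scan every node at once, the same conditions can be read from kubectl get nodes -o json. The condition types are the standard ones listed above; the script is a sketch that reads the JSON from standard input.

```python
import json
import sys
from typing import Dict, List

PRESSURE_TYPES = ("MemoryPressure", "DiskPressure", "PIDPressure")


def pressured_nodes(node_list: dict) -> Dict[str, List[str]]:
    """Map node name -> pressure conditions currently 'True'."""
    result = {}
    for node in node_list.get("items", []):
        active = [
            c["type"]
            for c in node.get("status", {}).get("conditions", [])
            if c.get("type") in PRESSURE_TYPES and c.get("status") == "True"
        ]
        if active:
            result[node["metadata"]["name"]] = active
    return result


if __name__ == "__main__":
    for name, conditions in pressured_nodes(json.load(sys.stdin)).items():
        print(f"{name}: {', '.join(conditions)}")
```

Usage: kubectl get nodes -o json | python node_pressure.py (the file name is arbitrary). No output means no node is reporting pressure.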
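Symptom: 429 Too Many Requests
Envoy is rate limiting the caller. If the traffic is legitimate, raise the limit; otherwise fix the client, for example by adding backoff or reducing request frequency.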
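Envoy's local rate limit filter is a token bucket: each request consumes a token, and tokens_per_fill tokens are added every fill_interval, up to max_tokens. The class below models that behaviour to show why a burst larger than the bucket gets 429s until the next refill. It is an illustration, not Envoy code; the parameter names echo Envoy's settings, and global rate limiting through a separate rate limit service is not modelled.

```python
class TokenBucket:
    """Discrete-refill token bucket, similar in spirit to Envoy's local rate limiter."""

    def __init__(self, max_tokens: int, tokens_per_fill: int,
                 fill_interval: float, now: float = 0.0):
        self.max_tokens = max_tokens
        self.tokens_per_fill = tokens_per_fill
        self.fill_interval = fill_interval
        self.tokens = max_tokens  # the bucket starts full
        self.last_fill = now

    def allow(self, now: float) -> bool:
        """True if a request at time 'now' is admitted, False if it would get a 429."""
        fills = int((now - self.last_fill) // self.fill_interval)
        if fills > 0:
            self.tokens = min(self.max_tokens, self.tokens + fills * self.tokens_per_fill)
            self.last_fill += fills * self.fill_interval
        if self.tokens > 0:
            self.tokens -= 1
            return True
        return False
```

With max_tokens=3, tokens_per_fill=1 and fill_interval=1.0, four simultaneous requests yield three successes and one rejection; a client that waits for the next fill interval before retrying gets through.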
20.9 Long-Term Maintenance
Every 6 months:
• Upgrade k3s
• Upgrade .NET runtime
• Rotate secrets
• Update Helm charts
• Refresh TLS keys
• Rebuild base Docker images
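Before upgrading k3s, plan the hop sequence: the Kubernetes version skew policy means minor versions should not be skipped, so a cluster several releases behind needs intermediate upgrades. A small sketch that lists the minor versions to step through, assuming k3s-style version strings such as v1.27.4+k3s1; upgrade_path is a hypothetical helper.

```python
import re
from typing import List, Tuple

VERSION_RE = re.compile(r"^v?(\d+)\.(\d+)")


def minor_of(version: str) -> Tuple[int, int]:
    """Extract (major, minor) from strings like 'v1.27.4+k3s1'."""
    match = VERSION_RE.match(version)
    if match is None:
        raise ValueError(f"unrecognised version: {version!r}")
    return int(match[1]), int(match[2])


def upgrade_path(current: str, target: str) -> List[str]:
    """List the minor releases to install in order, e.g. v1.28 -> v1.29 -> v1.30."""
    major, current_minor = minor_of(current)
    target_major, target_minor = minor_of(target)
    if target_major != major:
        raise ValueError("major version changes are not handled")
    return [f"v{major}.{m}" for m in range(current_minor + 1, target_minor + 1)]
```

For each hop, a patch release of that minor can be pinned with the install script's INSTALL_K3S_VERSION variable; confirm the cluster is healthy before the next hop. An empty list means only a patch upgrade is needed.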
Annually:
• Security audit
• Cost optimization review
• Architectural review
• Observability tuning
20.10 Summary of Section 20
You now have:
✔ Complete deployment lifecycle
✔ Rolling update & rollback strategy
✔ Release management workflow
✔ Troubleshooting playbook
✔ Maintenance routine
✔ Disaster recovery integration
✔ Future-ready multi-environment promotion strategy
Your cluster is now fully operational, with predictable and safe deployments.
Next section:
Section 21 — LocalCloudLab Infrastructure Reference & Glossary