Runbook: Pod Crash Looping
Alert: OpenClawPodCrashLooping
Meaning
An OpenClaw pod has restarted more than 2 times in the last 10 minutes, indicating a crash loop.
Impact
The instance is intermittently unavailable. Users experience connection drops and may lose in-flight work.
Diagnosis
# Check pod status and restart count
kubectl get pods -n <namespace> -l app.kubernetes.io/instance=<name>
# Check last termination reason
kubectl get pod <name>-0 -n <namespace> -o jsonpath='{.status.containerStatuses[0].lastState.terminated}'
# Check container logs (including previous incarnation)
kubectl logs <name>-0 -n <namespace> -c openclaw --previous --tail=100
kubectl logs <name>-0 -n <namespace> -c openclaw --tail=100
# Check events
kubectl describe pod <name>-0 -n <namespace>
# Check resource limits
kubectl get pod <name>-0 -n <namespace> -o jsonpath='{.spec.containers[0].resources}'
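The termination check above can be narrowed to the two fields that usually decide which mitigation applies. The following is a minimal sketch, not part of the OpenClaw tooling: the function names last_termination and classify_termination and the category labels are hypothetical, and like the commands above it assumes the container of interest is at index 0. Exit codes 137 and 143 are the conventional 128 + signal number values for SIGKILL and SIGTERM.

```shell
# Hypothetical helper: print "<reason> <exitCode>" for the pod's last termination.
# Assumes the container of interest is at index 0, as in the commands above.
last_termination() {
  pod="$1"; ns="$2"
  kubectl get pod "$pod" -n "$ns" \
    -o jsonpath='{.status.containerStatuses[0].lastState.terminated.reason}{" "}{.status.containerStatuses[0].lastState.terminated.exitCode}'
}

# Hypothetical helper: map a termination reason and exit code to a rough category.
# The category names are illustrative only.
classify_termination() {
  reason="$1"; code="$2"
  if [ "$reason" = "OOMKilled" ]; then
    echo "oom"                 # container exceeded its memory limit
  elif [ "$code" = "137" ]; then
    echo "sigkill"             # 128 + 9, killed for a reason other than OOM
  elif [ "$code" = "143" ]; then
    echo "sigterm"             # 128 + 15, asked to stop (e.g. after a failed probe)
  elif [ "$code" = "0" ]; then
    echo "clean-exit"          # process exited on its own without an error
  else
    echo "application-error"   # non-zero exit, check the --previous logs
  fi
}

# Example use, with the same placeholders as above:
#   set -- $(last_termination <name>-0 <namespace>)
#   classify_termination "$1" "$2"
```

The categories are only a starting point. A "sigkill" or "sigterm" result does not by itself prove a liveness probe failure, so confirm against the events shown by kubectl describe.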
Mitigation
- OOMKilled - Increase memory limits (see Pod OOM Killed runbook)
- Application error - Check container logs for stack traces or startup errors
- Configuration error - Verify the OpenClaw config is valid
- Missing dependencies - Ensure required skills and MCP servers are available
- Liveness probe too aggressive - Increase failureThreshold or timeoutSeconds
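Before relaxing the liveness probe, it can help to see what the pod is currently running with. The sketch below is illustrative only: show_liveness_probe and probe_window_seconds are hypothetical helper names, the container index 0 is an assumption, and the window calculation is the usual approximation that the kubelet restarts a container after failureThreshold consecutive failures spaced periodSeconds apart. The fallback values 10 and 3 are the Kubernetes defaults for periodSeconds and failureThreshold. How the new values are applied (for example through the OpenClaw resource or the workload spec) depends on the deployment and is not covered here.

```shell
# Hypothetical helper: print the liveness probe currently set on the pod.
# Assumes the container of interest is at index 0.
show_liveness_probe() {
  pod="$1"; ns="$2"
  kubectl get pod "$pod" -n "$ns" -o jsonpath='{.spec.containers[0].livenessProbe}'
}

# Hypothetical helper: approximate number of seconds of continuous probe
# failure tolerated before a restart (periodSeconds * failureThreshold).
# Falls back to the Kubernetes defaults (10 and 3) when arguments are omitted.
probe_window_seconds() {
  period="${1:-10}"; threshold="${2:-3}"
  echo $(( period * threshold ))
}

# Example use, with the same placeholders as the diagnosis commands:
#   show_liveness_probe <name>-0 <namespace>
#   probe_window_seconds 10 6
```

If the window is shorter than the time the application needs to recover from a slow request or a garbage collection pause, raising failureThreshold widens it without changing how often the probe runs.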