Troubleshooting

This guide covers common issues and how to diagnose them.

Checking Operator Logs

The operator logs are the first place to look for any issue.

kubectl get pods -n openclaw-system -l app.kubernetes.io/name=openclaw-operator
kubectl logs -n openclaw-system -l app.kubernetes.io/name=openclaw-operator -f
kubectl logs -n openclaw-system -l app.kubernetes.io/name=openclaw-operator --all-containers
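
When the log is noisy, it can help to narrow it to recent error lines. The sketch below is one way to do that; the function name `operator_errors` and the grep pattern are assumptions, not part of the operator. Note that `kubectl logs` with a label selector returns only the last 10 lines per pod unless `--tail` is set.

```shell
# Hypothetical helper: print recent operator log lines that mention errors.
# Usage: operator_errors [since]   (since defaults to 1h)
operator_errors() {
  kubectl logs -n openclaw-system \
    -l app.kubernetes.io/name=openclaw-operator \
    --all-containers --prefix --since="${1:-1h}" --tail=500 |
    grep -iE 'error|panic|failed' || echo "no matching log lines"
}
```

For example, `operator_errors 15m` looks only at the last 15 minutes.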

Checking Events

Kubernetes events provide a timeline of what happened to your resources.

kubectl describe openclawinstance my-assistant -n openclaw
kubectl get events -n openclaw --sort-by='.lastTimestamp'
kubectl get events -n openclaw --watch
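
In a busy namespace the event list can be long. A minimal sketch for showing only Warning events on one object follows; `warning_events` is a hypothetical name. Events for pods are recorded under the pod's own name, so pass the pod name rather than the instance name when investigating pod-level problems.

```shell
# Hypothetical helper: show only Warning events for one named object.
# Usage: warning_events <object-name> [namespace]
warning_events() {
  kubectl get events -n "${2:-openclaw}" \
    --field-selector "type=Warning,involvedObject.name=$1" \
    --sort-by='.lastTimestamp'
}
```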

Checking Instance Status

kubectl get openclawinstance -n openclaw
kubectl get openclawinstance my-assistant -n openclaw -o yaml | grep -A 50 'status:'
kubectl get openclawinstance my-assistant -n openclaw \
  -o jsonpath='{.status.conditions[?(@.type=="Ready")]}'
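
Instead of polling the Ready condition by hand, `kubectl wait` can block until it becomes true. The sketch below wraps that and prints the condition message on timeout; the function name `wait_ready` and the 120s default are assumptions.

```shell
# Hypothetical helper: wait for the Ready condition, explain a timeout.
# Usage: wait_ready <instance> [namespace] [timeout]
wait_ready() {
  if kubectl wait "openclawinstance/$1" -n "${2:-openclaw}" \
      --for=condition=Ready --timeout="${3:-120s}"; then
    echo "ready"
  else
    # Print the message of the Ready condition to show why it is not true.
    kubectl get openclawinstance "$1" -n "${2:-openclaw}" \
      -o jsonpath='{.status.conditions[?(@.type=="Ready")].message}'
    echo
    return 1
  fi
}
```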

Common Issues

Instance Stuck in Pending

Symptoms: The instance stays in the Pending phase and never transitions to Provisioning.

Possible causes and solutions:

Operator is not running:
```
kubectl get pods -n openclaw-system
```
Verify the operator pod is Running and ready. If it is in CrashLoopBackOff, check its logs.

CRD not installed or outdated:
```
kubectl get crd openclawinstances.openclaw.rocks
```
If the CRD is missing, install it. If you upgraded the operator but new fields are rejected as “field not declared in schema”, the CRD is outdated:
```
kubectl apply --server-side -f config/crd/bases/
```

RBAC issues with the operator:

kubectl auth can-i get openclawinstances --as=system:serviceaccount:openclaw-system:openclaw-operator -n openclaw

Webhook blocking the request: Check for admission webhook errors in the API server logs or the operator logs.
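
The RBAC check above tests a single verb. A sketch that loops over several verbs is shown below; the function name `check_operator_rbac` and the list of verbs are assumptions, since the exact permissions the operator requires are defined by its ClusterRole.

```shell
# Hypothetical helper: report which verbs the operator ServiceAccount may
# use on OpenClawInstance resources in a namespace.
# Usage: check_operator_rbac [namespace]
check_operator_rbac() {
  for verb in get list watch update patch; do
    printf '%s: ' "$verb"
    # can-i prints yes/no and exits non-zero on "no"; keep looping either way.
    kubectl auth can-i "$verb" openclawinstances -n "${1:-openclaw}" \
      --as=system:serviceaccount:openclaw-system:openclaw-operator || true
  done
}
```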

Instance Stuck in Provisioning

Symptoms: The instance transitions to Provisioning but never reaches Running.

Resource creation failing silently: Check operator logs for errors:

kubectl logs -n openclaw-system deploy/openclaw-operator | grep -i error

Resource quota exceeded:

kubectl describe resourcequota -n openclaw

Deployment not becoming ready: Check the pod:

kubectl get pods -n openclaw -l app.kubernetes.io/instance=my-assistant
kubectl describe pod -n openclaw -l app.kubernetes.io/instance=my-assistant
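
The `describe` output is verbose. For a quick view of why containers are waiting, a sketch like the following prints one line per pod; `pod_waiting_reasons` is a hypothetical name. A container that is not waiting shows an empty value after the `=`.

```shell
# Hypothetical helper: list each container's waiting reason per pod
# (for example ImagePullBackOff or CrashLoopBackOff).
# Usage: pod_waiting_reasons <instance> [namespace]
pod_waiting_reasons() {
  kubectl get pods -n "${2:-openclaw}" -l "app.kubernetes.io/instance=$1" \
    -o jsonpath='{range .items[*]}{.metadata.name}{"\t"}{range .status.containerStatuses[*]}{.name}={.state.waiting.reason}{" "}{end}{"\n"}{end}'
}
```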

Instance in Failed State

Symptoms: The instance phase is Failed. The Ready condition shows status: "False" with a reason.

kubectl get openclawinstance my-assistant -n openclaw \
  -o jsonpath='{.status.conditions[?(@.type=="Ready")].message}'
kubectl describe openclawinstance my-assistant -n openclaw

Common failure reasons:

Image pull errors: Look for ImagePullBackOff or ErrImagePull. Verify the image repository/tag and pull secrets.
Insufficient resources: Look for FailedScheduling events. Reduce resource requests or add capacity.
ConfigMap or Secret not found: Verify referenced ConfigMaps and Secrets exist.
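
To confirm that referenced ConfigMaps and Secrets exist, a sketch such as the one below checks each in turn. The function name `check_refs`, the `NAMESPACE` variable, and the resource names in the usage line are all hypothetical; substitute the names your instance spec references.

```shell
# Hypothetical helper: report whether each kind/name reference exists.
# Usage: check_refs secret/my-api-keys configmap/my-config
check_refs() {
  for ref in "$@"; do
    if kubectl get "$ref" -n "${NAMESPACE:-openclaw}" >/dev/null 2>&1; then
      echo "$ref: found"
    else
      echo "$ref: MISSING"
    fi
  done
}
```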

NetworkPolicy Blocking Traffic

Symptoms: The instance is Running but cannot reach external APIs, or other pods cannot reach the instance.

Verify the NetworkPolicy exists:

kubectl get networkpolicy -n openclaw
kubectl describe networkpolicy my-assistant -n openclaw

Instance cannot reach AI APIs: The default NetworkPolicy allows egress on ports 443 (HTTPS) and 53 (DNS). If the AI provider uses a non-standard port, add it to allowedEgressCIDRs.
DNS resolution failing: Ensure allowDNS is true.
Other pods cannot reach the instance: Add namespaces to allowedIngressNamespaces.

Verify with a test pod:

kubectl run -n openclaw test-curl --rm -it --image=curlimages/curl -- \
  curl -v http://my-assistant:18789
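
For the non-standard-port case, it can help to see which egress ports the generated policy lists. The sketch below assumes the policy is named after the instance, as in the `describe` command above; `egress_allows_port` is a hypothetical name. An egress rule without a `ports` field allows every port, which this simple check does not detect.

```shell
# Hypothetical helper: check whether a port appears in the policy's
# egress rules. Rules with no ports field (all ports) are not detected.
# Usage: egress_allows_port <policy> <port> [namespace]
egress_allows_port() {
  ports=$(kubectl get networkpolicy "$1" -n "${3:-openclaw}" \
    -o jsonpath='{.spec.egress[*].ports[*].port}')
  case " $ports " in
    *" $2 "*) echo "port $2 allowed" ;;
    *) echo "port $2 not listed (allowed ports: $ports)"; return 1 ;;
  esac
}
```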

PVC Not Binding

Symptoms: The pod is stuck in Pending, or the PVC shows Pending.

kubectl get pvc -n openclaw
kubectl describe pvc my-assistant-data -n openclaw

StorageClass does not exist: Verify the storageClass exists.
No capacity available: Check provisioner logs.
Access mode incompatibility: Use ReadWriteOnce (the default).
Zone mismatch: Ensure nodes and storage are in the same zone.
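
The StorageClass check can be scripted. The sketch below reads the class name from the PVC and looks it up; `check_pvc_storageclass` is a hypothetical name, and the PVC name follows the `my-assistant-data` example above.

```shell
# Hypothetical helper: verify the StorageClass a PVC requests exists.
# Usage: check_pvc_storageclass <pvc> [namespace]
check_pvc_storageclass() {
  sc=$(kubectl get pvc "$1" -n "${2:-openclaw}" \
    -o jsonpath='{.spec.storageClassName}')
  if [ -z "$sc" ]; then
    echo "PVC $1 sets no storageClassName; a cluster default is required"
  elif kubectl get storageclass "$sc" >/dev/null 2>&1; then
    echo "StorageClass $sc exists"
  else
    echo "StorageClass $sc not found"
    return 1
  fi
}
```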

Webhook Errors

Symptoms: Creating or updating an OpenClawInstance fails with a webhook error.

Webhook not enabled: Check webhook configurations exist.
cert-manager not installed: Check certificate status.
Webhook Service not reachable: Check service and endpoints.
Bypass temporarily (last resort): Delete the webhook configuration.
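
To check whether webhook configurations exist, a sketch like the following lists both kinds and filters by name. It assumes the operator's webhook configurations contain "openclaw" in their names, which this guide does not confirm; `find_openclaw_webhooks` is a hypothetical name.

```shell
# Hypothetical helper: list admission webhook configurations whose names
# contain "openclaw" (the naming is an assumption).
find_openclaw_webhooks() {
  kubectl get validatingwebhookconfigurations,mutatingwebhookconfigurations \
    -o name |
    grep -i openclaw || echo "no webhook configuration matching 'openclaw'"
}
```

If nothing matches, run the same `kubectl get` without the filter and look for the operator's entries by hand.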

Ingress Not Working

Symptoms: Ingress is created but traffic does not reach the instance.

IngressClass not found: Verify the className matches an installed IngressClass.
Ingress controller not installed: Verify the controller is running.
TLS Secret missing: Check the referenced Secret exists.
DNS not pointing to Ingress: Verify DNS resolution.
NetworkPolicy blocking the Ingress controller: Add the controller’s namespace to allowedIngressNamespaces.
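
The IngressClass check can be scripted along these lines. The sketch assumes you know the name of the Ingress created for the instance (shown by `kubectl get ingress -n openclaw`); `check_ingress_class` is a hypothetical name.

```shell
# Hypothetical helper: verify the IngressClass an Ingress references exists.
# Usage: check_ingress_class <ingress> [namespace]
check_ingress_class() {
  class=$(kubectl get ingress "$1" -n "${2:-openclaw}" \
    -o jsonpath='{.spec.ingressClassName}')
  if [ -z "$class" ]; then
    echo "Ingress $1 sets no ingressClassName"
  elif kubectl get ingressclass "$class" >/dev/null 2>&1; then
    echo "IngressClass $class is installed"
  else
    echo "IngressClass $class not found"
    return 1
  fi
}
```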

Chromium Sidecar Issues

Symptoms: The Chromium sidecar does not start or keeps crashing, or browser automation fails.

Check sidecar status and logs
Insufficient shared memory: Increase memory limits
Insufficient resources: Increase CPU/memory limits
Security context restrictions: Check for SCC (OpenShift SecurityContextConstraints) violations
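
A sketch for pulling the sidecar's logs follows. The container name `chromium` is an assumption; confirm the actual name with `kubectl describe pod`. The function name `sidecar_logs` is hypothetical.

```shell
# Hypothetical helper: show recent logs from the sidecar container of the
# first pod belonging to an instance.
# Usage: sidecar_logs <instance> [container]
sidecar_logs() {
  pod=$(kubectl get pods -n openclaw \
    -l "app.kubernetes.io/instance=$1" -o name | head -n 1)
  if [ -z "$pod" ]; then
    echo "no pod found for instance $1"
    return 1
  fi
  kubectl logs -n openclaw "$pod" -c "${2:-chromium}" --tail=100
}
```

Add `--previous` to the final command to read the logs of a container that has already crashed and restarted.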

Operator CrashLoopBackOff

Symptoms: The operator pod itself is restarting.

kubectl logs -n openclaw-system deploy/openclaw-operator --previous

Leader election failure: Check for stale leases.
Missing CRD: Verify CRD is installed.
Insufficient RBAC: Verify ClusterRole and ClusterRoleBinding.
Webhook certificate issues: Check certificate provisioning.
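
To look for stale leader-election leases, a sketch like the following lists the leases in the operator namespace with their holder and last renewal time. The function name `operator_leases` is hypothetical, and the lease name the operator uses is not stated in this guide. A lease whose holder is a pod that no longer exists, with an old renew time, is a likely candidate.

```shell
# Hypothetical helper: list leases with holder and last renewal time.
operator_leases() {
  kubectl get lease -n openclaw-system \
    -o custom-columns='NAME:.metadata.name,HOLDER:.spec.holderIdentity,RENEWED:.spec.renewTime'
}
```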

Useful Commands Reference

kubectl get openclawinstance -A
kubectl get openclawinstance my-assistant -n openclaw \
  -o jsonpath='{.status.managedResources}' | jq .
kubectl annotate openclawinstance my-assistant -n openclaw \
  force-reconcile=$(date +%s) --overwrite
kubectl get openclawinstance my-assistant -n openclaw -o yaml