Troubleshooting

This guide covers common issues and how to diagnose them.

Checking Operator Logs

The operator logs are the first place to look for any issue.

kubectl get pods -n openclaw-system -l app.kubernetes.io/name=openclaw-operator
kubectl logs -n openclaw-system -l app.kubernetes.io/name=openclaw-operator -f
kubectl logs -n openclaw-system -l app.kubernetes.io/name=openclaw-operator --all-containers

Checking Events

Kubernetes events provide a timeline of what happened to your resources.

kubectl describe openclawinstance my-assistant -n openclaw
kubectl get events -n openclaw --sort-by='.lastTimestamp'
kubectl get events -n openclaw --watch

Checking Instance Status

kubectl get openclawinstance -n openclaw
kubectl get openclawinstance my-assistant -n openclaw -o yaml | grep -A 50 'status:'
kubectl get openclawinstance my-assistant -n openclaw \
  -o jsonpath='{.status.conditions[?(@.type=="Ready")]}'
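
When more than the Ready condition matters, it can help to print every condition as one row. The following is a minimal sketch, not part of the operator: the function name `show_conditions` is hypothetical, and it assumes each condition carries the usual `type`, `status`, `reason`, and `message` fields.

```shell
# Hypothetical helper: prints one tab-separated row per status condition.
# Usage: show_conditions [namespace] [instance]
show_conditions() {
  local ns="${1:-openclaw}" name="${2:-my-assistant}"

  # Columns: type, status, reason, message
  kubectl get openclawinstance "$name" -n "$ns" \
    -o jsonpath='{range .status.conditions[*]}{.type}{"\t"}{.status}{"\t"}{.reason}{"\t"}{.message}{"\n"}{end}'
}
```

Calling `show_conditions` with no arguments queries the `my-assistant` instance in the `openclaw` namespace used throughout this guide.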

Common Issues

Instance Stuck in Pending

Symptoms: The instance stays in Pending phase and never transitions to Provisioning.

Possible causes and solutions:

  1. Operator is not running:

    kubectl get pods -n openclaw-system
    

    Verify the operator pod is Running and ready. If it is in CrashLoopBackOff, check its logs.

  2. CRD not installed or outdated:

    kubectl get crd openclawinstances.openclaw.rocks
    

    If the CRD is missing, install it. If you upgraded the operator but new fields are rejected as “field not declared in schema”, the CRD is outdated:

    kubectl apply --server-side -f config/crd/bases/
    
  3. RBAC issues with the operator:

    kubectl auth can-i get openclawinstances --as=system:serviceaccount:openclaw-system:openclaw-operator -n openclaw
    
  4. Webhook blocking the request: Check for admission webhook errors in the API server logs or the operator logs.

Instance Stuck in Provisioning

Symptoms: The instance transitions to Provisioning but never reaches Running.

  1. Resource creation failing silently: Check operator logs for errors:

    kubectl logs -n openclaw-system deploy/openclaw-operator | grep -i error
    
  2. Resource quota exceeded:

    kubectl describe resourcequota -n openclaw
    
  3. Deployment not becoming ready: Check the pod:

    kubectl get pods -n openclaw -l app.kubernetes.io/instance=my-assistant
    kubectl describe pod -n openclaw -l app.kubernetes.io/instance=my-assistant
    

Instance in Failed State

Symptoms: The instance phase is Failed. The Ready condition shows status: "False" with a reason.

kubectl get openclawinstance my-assistant -n openclaw \
  -o jsonpath='{.status.conditions[?(@.type=="Ready")].message}'
kubectl describe openclawinstance my-assistant -n openclaw

Common failure reasons:

  1. Image pull errors: Look for ImagePullBackOff or ErrImagePull. Verify the image repository/tag and pull secrets.

  2. Insufficient resources: Look for FailedScheduling events. Reduce resource requests or add capacity.

  3. ConfigMap or Secret not found: Verify referenced ConfigMaps and Secrets exist.
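
The three failure reasons above can be narrowed down with a few read-only queries. This is a sketch under stated assumptions rather than an official tool: the function name `diagnose_failed` is hypothetical, and the defaults are the `openclaw` namespace and `my-assistant` instance used in this guide.

```shell
# Hypothetical helper: gathers evidence for an instance in the Failed phase.
# Usage: diagnose_failed [namespace] [instance]
diagnose_failed() {
  local ns="${1:-openclaw}" name="${2:-my-assistant}"

  # Waiting reason per container, e.g. ImagePullBackOff or ErrImagePull
  kubectl get pods -n "$ns" -l "app.kubernetes.io/instance=$name" \
    -o jsonpath='{range .items[*].status.containerStatuses[*]}{.name}{"\t"}{.state.waiting.reason}{"\n"}{end}'

  # Scheduling failures, typically caused by insufficient CPU or memory
  kubectl get events -n "$ns" --field-selector reason=FailedScheduling

  # ConfigMaps and Secrets that exist, to compare with those the spec references
  kubectl get configmap,secret -n "$ns"
}
```

The last command only lists what exists in the namespace; matching those names against the instance spec is still a manual step.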

NetworkPolicy Blocking Traffic

Symptoms: The instance is Running but cannot reach external APIs, or other pods cannot reach the instance.

  1. Verify the NetworkPolicy exists:

    kubectl get networkpolicy -n openclaw
    kubectl describe networkpolicy my-assistant -n openclaw
    
  2. Instance cannot reach AI APIs: The default NetworkPolicy allows egress on ports 443 (HTTPS) and 53 (DNS). If the AI provider uses a non-standard port, add the provider's address range to allowedEgressCIDRs.

  3. DNS resolution failing: Ensure allowDNS is true.

  4. Other pods cannot reach the instance: Add namespaces to allowedIngressNamespaces.

  5. Verify with a test pod:

    kubectl run -n openclaw test-curl --rm -it --image=curlimages/curl -- \
      curl -v http://my-assistant:18789
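
To see which rules the generated policy actually contains, the ingress and egress sections can be printed directly. This is a minimal sketch: the function name `show_netpol_rules` is hypothetical, and it assumes the NetworkPolicy shares the instance name, as in the describe command above.

```shell
# Hypothetical helper: prints the raw egress and ingress rules of a NetworkPolicy.
# Usage: show_netpol_rules [namespace] [policy-name]
show_netpol_rules() {
  local ns="${1:-openclaw}" name="${2:-my-assistant}"

  # Egress rules: look for ports 443 and 53 and any additional CIDRs
  kubectl get networkpolicy "$name" -n "$ns" -o jsonpath='{.spec.egress}{"\n"}'

  # Ingress rules: look for the namespaces allowed to reach the instance
  kubectl get networkpolicy "$name" -n "$ns" -o jsonpath='{.spec.ingress}{"\n"}'
}
```

Each command prints one line of JSON, so the output of either one can be piped through `jq .` for readability.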
    

PVC Not Binding

Symptoms: The pod is stuck in Pending, or the PVC shows Pending.

kubectl get pvc -n openclaw
kubectl describe pvc my-assistant-data -n openclaw

  1. StorageClass does not exist: Verify that the storageClass requested by the PVC exists in the cluster.
  2. No capacity available: Check provisioner logs.
  3. Access mode incompatibility: Use ReadWriteOnce (the default).
  4. Zone mismatch: Ensure nodes and storage are in the same zone.
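
The causes above map to a handful of read-only queries. The following is a sketch, not an official tool: the function name `diagnose_pvc` is hypothetical, and the zone check assumes nodes carry the standard `topology.kubernetes.io/zone` label.

```shell
# Hypothetical helper: collects the facts needed to explain a Pending PVC.
# Usage: diagnose_pvc [namespace] [pvc-name]
diagnose_pvc() {
  local ns="${1:-openclaw}" pvc="${2:-my-assistant-data}"

  # StorageClasses available in the cluster
  kubectl get storageclass

  # StorageClass and access modes the PVC actually requests
  kubectl get pvc "$pvc" -n "$ns" \
    -o jsonpath='{.spec.storageClassName}{"\t"}{.spec.accessModes}{"\n"}'

  # Provisioning events recorded against PVCs in the namespace
  kubectl get events -n "$ns" --field-selector involvedObject.kind=PersistentVolumeClaim

  # Zone of each node, to compare with the zone of the volume
  kubectl get nodes -L topology.kubernetes.io/zone
}
```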

Webhook Errors

Symptoms: Creating or updating an OpenClawInstance fails with a webhook error.

  1. Webhook not enabled: Check webhook configurations exist.
  2. cert-manager not installed: Check certificate status.
  3. Webhook Service not reachable: Check service and endpoints.
  4. Bypass temporarily (last resort): Delete the webhook configuration.
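
The first three checks can be run as plain queries. This is a minimal sketch: the function name `diagnose_webhook` is hypothetical, and it assumes the webhook Service and its Certificate live in the operator namespace, `openclaw-system`.

```shell
# Hypothetical helper: inspects the pieces an admission webhook depends on.
# Usage: diagnose_webhook [operator-namespace]
diagnose_webhook() {
  local ns="${1:-openclaw-system}"

  # Webhook registrations (cluster-scoped)
  kubectl get validatingwebhookconfigurations,mutatingwebhookconfigurations

  # cert-manager Certificates; this errors if the cert-manager CRDs are absent
  kubectl get certificates.cert-manager.io -n "$ns"

  # Service and endpoints that the API server calls
  kubectl get service,endpoints -n "$ns"
}
```

If the bypass in step 4 is unavoidable, the first command shows the exact configuration names to pass to `kubectl delete`.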

Ingress Not Working

Symptoms: Ingress is created but traffic does not reach the instance.

  1. IngressClass not found: Verify the className matches an installed IngressClass.
  2. Ingress controller not installed: Verify the controller is running.
  3. TLS Secret missing: Check the referenced Secret exists.
  4. DNS not pointing to Ingress: Verify DNS resolution.
  5. NetworkPolicy blocking the Ingress controller: Add the controller’s namespace to allowedIngressNamespaces.
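
Most of the checks above can be scripted as read-only queries. The following is a sketch under assumptions: the function name `diagnose_ingress` is hypothetical, and the default controller selector `app.kubernetes.io/name=ingress-nginx` only fits clusters running ingress-nginx, so pass a different selector for other controllers.

```shell
# Hypothetical helper: checks the objects an Ingress depends on.
# Usage: diagnose_ingress [namespace] [controller-label-selector]
diagnose_ingress() {
  local ns="${1:-openclaw}"
  # Assumption: ingress-nginx labels; override for other controllers
  local selector="${2:-app.kubernetes.io/name=ingress-nginx}"

  # Installed IngressClasses, to compare with the className in the spec
  kubectl get ingressclass

  # Ingress objects with their class, hosts, and assigned address
  kubectl get ingress -n "$ns"

  # TLS Secrets present in the namespace
  kubectl get secret -n "$ns" --field-selector type=kubernetes.io/tls

  # Ingress controller pods in any namespace
  kubectl get pods -A -l "$selector"
}
```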

Chromium Sidecar Issues

Symptoms: Chromium sidecar not starting, crashing, or browser automation fails.

  1. Check sidecar status and logs.
  2. Insufficient shared memory: Increase memory limits.
  3. Insufficient resources: Increase CPU/memory limits.
  4. Security context restrictions: Check for SecurityContextConstraints (SCC) violations on OpenShift.
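
The status and log checks can be sketched as follows. The function name `diagnose_sidecar` is hypothetical, and the container name `chromium` is an assumption; confirm the real name with `kubectl describe pod` before relying on it. The sketch also assumes the sidecar is listed among the pod's regular containers.

```shell
# Hypothetical helper: shows container health and recent sidecar logs.
# Usage: diagnose_sidecar [namespace] [instance] [container]
diagnose_sidecar() {
  # Assumption: the sidecar container is named "chromium"
  local ns="${1:-openclaw}" name="${2:-my-assistant}" container="${3:-chromium}"

  # Name, readiness, and restart count per container
  kubectl get pods -n "$ns" -l "app.kubernetes.io/instance=$name" \
    -o jsonpath='{range .items[*].status.containerStatuses[*]}{.name}{"\t"}{.ready}{"\t"}{.restartCount}{"\n"}{end}'

  # Last 50 log lines from the sidecar container
  kubectl logs -n "$ns" -l "app.kubernetes.io/instance=$name" -c "$container" --tail=50

  # Warning events, where security context rejections usually surface
  kubectl get events -n "$ns" --field-selector type=Warning
}
```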

Operator CrashLoopBackOff

Symptoms: The operator pod itself is restarting.

kubectl logs -n openclaw-system deploy/openclaw-operator --previous

  1. Leader election failure: Check for stale leases.
  2. Missing CRD: Verify CRD is installed.
  3. Insufficient RBAC: Verify ClusterRole and ClusterRoleBinding.
  4. Webhook certificate issues: Check certificate provisioning.
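
Each cause above has a matching read-only query. This is a minimal sketch: the function name `diagnose_operator` is hypothetical, the ServiceAccount name `openclaw-operator` is taken from the RBAC check earlier in this guide, and the certificate query assumes cert-manager is in use.

```shell
# Hypothetical helper: checks the operator's startup dependencies.
# Usage: diagnose_operator [operator-namespace]
diagnose_operator() {
  local ns="${1:-openclaw-system}"

  # Leader-election leases and their current holders
  kubectl get lease -n "$ns"

  # CRD the operator needs at startup
  kubectl get crd openclawinstances.openclaw.rocks

  # Whether the operator's ServiceAccount may list instances cluster-wide
  kubectl auth can-i list openclawinstances --all-namespaces \
    --as="system:serviceaccount:$ns:openclaw-operator"

  # Webhook certificates (assumes cert-manager)
  kubectl get certificates.cert-manager.io -n "$ns"
}
```

A lease whose holder names a pod that no longer exists is a likely sign of the stale-lease case.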

Useful Commands Reference

kubectl get openclawinstance -A
kubectl get openclawinstance my-assistant -n openclaw \
  -o jsonpath='{.status.managedResources}' | jq .
kubectl annotate openclawinstance my-assistant -n openclaw \
  force-reconcile=$(date +%s) --overwrite
kubectl get openclawinstance my-assistant -n openclaw -o yaml