Security researchers have found over 135,000 OpenClaw instances sitting wide open on the internet. Many of them were vulnerable to remote code execution. Running OpenClaw on a VPS with docker run is easy. Running it securely is a different problem.

Kubernetes solves that problem. You get network isolation, resource limits, automated restarts, and security defaults that would take hours to configure by hand. And with the OpenClaw Kubernetes Operator, you get all of it from a single YAML file.

This guide takes you from zero to a production-ready OpenClaw agent on Kubernetes. Every YAML block is copy-paste ready.

Why an operator

Running OpenClaw on Kubernetes is more than a Deployment and a Service. You need network isolation, secret management, persistent storage, health monitoring, config rollouts, and optionally browser automation. Wiring all of that correctly by hand is tedious and error-prone.

A Kubernetes operator encodes these concerns into a single custom resource. You declare what you want, and the operator continuously reconciles it into the right set of Kubernetes objects. That gives you:

  • Security by default. Every agent runs as UID 1000, all Linux capabilities dropped, seccomp enabled, read-only root filesystem, and a default-deny NetworkPolicy that only allows DNS and HTTPS egress. No manual hardening needed.
  • Auto-updates with rollback. The operator polls the OCI registry for new versions, backs up the workspace, rolls out the update, and automatically rolls back if the new pod fails health checks.
  • Config rollouts. Change spec.config.raw and the operator detects the new content hash and triggers a rolling update. The same applies to secret rotation.
  • Backup and restore. Automatic workspace backup to S3-compatible storage on instance deletion. Restore into a new instance from any snapshot.
  • Gateway auth. Auto-generates a gateway token per instance. No manual pairing, no mDNS (which does not work in Kubernetes anyway).
  • Drift detection. Every 5 minutes, the operator checks that every managed resource matches the desired state. If someone manually edits a NetworkPolicy or deletes a PDB, it gets reconciled back.

Prerequisites

You need:

  • A Kubernetes cluster (1.28+). Any conformant distribution works: EKS, GKE, AKS, k3s, or a local Kind cluster for testing.
  • kubectl configured to talk to your cluster.
  • helm v3 installed.
  • An API key for your AI provider (Anthropic, OpenAI, or any OpenAI-compatible endpoint).

Step 1: Install the operator

The operator ships as an OCI Helm chart. One command installs it:

helm install openclaw-operator \
  oci://ghcr.io/openclaw-rocks/charts/openclaw-operator \
  --namespace openclaw-operator-system \
  --create-namespace

Verify it is running:

kubectl get pods -n openclaw-operator-system

You should see the operator pod in the Running state. The operator also installs a validating webhook that prevents insecure configurations (like running as root).
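
To confirm the webhook is registered (its exact name may vary with the chart version):

kubectl get validatingwebhookconfigurations | grep -i openclaw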

Step 2: Create your API key secret

Store your AI provider API key in a Kubernetes Secret. The operator will inject it into the agent container:

kubectl create namespace openclaw

kubectl create secret generic openclaw-api-keys \
  --namespace openclaw \
  --from-literal=ANTHROPIC_API_KEY=sk-ant-your-key-here

For OpenAI or other providers, use the appropriate environment variable name (OPENAI_API_KEY, OPENROUTER_API_KEY, etc.). You can include multiple providers in the same Secret.
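
For example, to carry both Anthropic and OpenAI keys in the same Secret, extend the create command:

kubectl create secret generic openclaw-api-keys \
  --namespace openclaw \
  --from-literal=ANTHROPIC_API_KEY=sk-ant-your-key-here \
  --from-literal=OPENAI_API_KEY=sk-your-key-here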

Tip: For production, consider using External Secrets Operator to sync keys from AWS Secrets Manager, HashiCorp Vault, GCP Secret Manager, or Azure Key Vault. The operator’s docs have detailed examples.
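
As a sketch, assuming External Secrets Operator is installed and a SecretStore named aws-secrets points at AWS Secrets Manager (both names are placeholders), an ExternalSecret that produces the same openclaw-api-keys Secret looks like this:

apiVersion: external-secrets.io/v1beta1
kind: ExternalSecret
metadata:
  name: openclaw-api-keys
  namespace: openclaw
spec:
  refreshInterval: 1h
  secretStoreRef:
    name: aws-secrets
    kind: SecretStore
  target:
    name: openclaw-api-keys
  data:
    - secretKey: ANTHROPIC_API_KEY
      remoteRef:
        key: openclaw/anthropic-api-key  # path in Secrets Manager; adjust to yours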

Step 3: Deploy your first agent

Create a file called my-agent.yaml:

apiVersion: openclaw.rocks/v1alpha1
kind: OpenClawInstance
metadata:
  name: my-agent
  namespace: openclaw
spec:
  envFrom:
    - secretRef:
        name: openclaw-api-keys
  config:
    raw:
      agents:
        defaults:
          model:
            primary: "anthropic/claude-sonnet-4-20250514"
  storage:
    persistence:
      enabled: true
      size: 10Gi

Apply it:

kubectl apply -f my-agent.yaml

That single resource creates a StatefulSet, Service, ServiceAccount, Role, RoleBinding, ConfigMap, PVC, PDB, NetworkPolicy, and a gateway token Secret. The operator reconciles all of it.
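
You can list the managed objects in one query:

kubectl get statefulset,service,configmap,pvc,pdb,networkpolicy -n openclaw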

Step 4: Verify it is running

Watch the instance come up:

kubectl get openclawinstances -n openclaw -w
NAME       PHASE        READY   AGE
my-agent   Provisioning False   10s
my-agent   Running      True    45s

Once the phase shows Running and Ready is True, your agent is live. Check the logs:

kubectl logs -n openclaw statefulset/my-agent -f

To interact with your agent, port-forward the gateway:

kubectl port-forward -n openclaw svc/my-agent 18789:18789

Then open http://localhost:18789 in your browser.

Step 5: Connect a channel

OpenClaw supports Telegram, Discord, WhatsApp, Signal, and other messaging channels. Each channel is configured through environment variables. Add the relevant token to your Secret:

kubectl create secret generic openclaw-channel-keys \
  --namespace openclaw \
  --from-literal=TELEGRAM_BOT_TOKEN=your-bot-token-here

Then reference it in your instance:

spec:
  envFrom:
    - secretRef:
        name: openclaw-api-keys
    - secretRef:
        name: openclaw-channel-keys

OpenClaw auto-detects the token and enables the channel. No additional config needed.
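
After adding the second secretRef to my-agent.yaml, reapply it and the operator rolls the pod with the new environment:

kubectl apply -f my-agent.yaml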


That covers the basics. Your agent is running, secured, and reachable. The rest of this guide covers optional features you can enable when you are ready.

Browser automation

OpenClaw can browse the web, take screenshots, and interact with pages. The operator makes this a one-line addition. It runs a hardened Chromium sidecar in the same pod, connected over localhost:

spec:
  chromium:
    enabled: true
    resources:
      requests:
        cpu: 500m
        memory: 1Gi
      limits:
        cpu: 1000m
        memory: 2Gi

The operator automatically injects a CHROMIUM_URL environment variable into the main container. The sidecar runs as UID 1001 with a read-only root filesystem and its own security context.
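
To confirm the injection, exec into the running pod (the main container name is assumed to be openclaw here; adjust if yours differs):

kubectl exec -n openclaw my-agent-0 -c openclaw -- env | grep CHROMIUM_URL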

Skills and runtime dependencies

OpenClaw skills from ClawHub can be installed declaratively. The operator runs an init container that fetches each skill before the agent starts:

spec:
  skills:
    - "@anthropic/mcp-server-fetch"
    - "@anthropic/mcp-server-filesystem"

If your skills or MCP servers need pnpm or Python, enable the built-in runtime dependency init containers:

spec:
  runtimeDeps:
    pnpm: true    # Installs pnpm via corepack
    python: true  # Installs Python 3.12 + uv

The init containers install these tools to the data PVC, so they persist across restarts without bloating the container image.

Auto-updates

OpenClaw releases new versions frequently. The operator can track these automatically, back up before updating, and roll back if something goes wrong:

spec:
  autoUpdate:
    enabled: true
    checkInterval: "12h"
    backupBeforeUpdate: true
    rollbackOnFailure: true
    healthCheckTimeout: "10m"

When a new version appears in the registry, the operator:

  1. Creates a backup of the workspace PVC to S3-compatible storage
  2. Updates the image tag on the StatefulSet
  3. Waits up to healthCheckTimeout for the pod to pass readiness checks
  4. If the pod fails to become ready, restores the previous image tag and the backup

After 3 consecutive failed rollbacks, the operator pauses auto-update and sets a condition so you can investigate.
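
You can watch update progress and any rollback conditions on the instance itself:

kubectl describe openclawinstance my-agent -n openclaw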

Note: Auto-update is a no-op for digest-pinned images (spec.image.digest). If you pin by digest, you control updates manually.

Production hardening

The operator ships secure by default. Here are the additional knobs for production deployments.

Monitor with Prometheus

Enable the ServiceMonitor to scrape operator and instance metrics:

spec:
  observability:
    metrics:
      enabled: true
      serviceMonitor:
        enabled: true
        interval: "30s"

The operator exposes openclaw_reconcile_total, openclaw_reconcile_duration_seconds, openclaw_instance_phase, and auto-update counters.
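
A minimal PrometheusRule sketch, assuming openclaw_instance_phase is a gauge that reads 1 for the instance's current phase and carries a phase label (the label name is an assumption; check the actual metric labels):

apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: openclaw-alerts
  namespace: openclaw
spec:
  groups:
    - name: openclaw
      rules:
        - alert: OpenClawInstanceNotRunning
          # Assumed metric shape: gauge is 1 for the current phase
          expr: openclaw_instance_phase{phase="Running"} == 0
          for: 10m
          labels:
            severity: warning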

Schedule on dedicated nodes

If you run a mixed cluster, use nodeSelector and tolerations to pin agents to dedicated nodes:

spec:
  availability:
    nodeSelector:
      openclaw.rocks/nodepool: openclaw
    tolerations:
      - key: openclaw.rocks/dedicated
        value: openclaw
        effect: NoSchedule

Add custom egress rules

The default NetworkPolicy only allows DNS (port 53) and HTTPS (port 443). If your agent needs to reach other services (a database, a message queue, an internal API), add egress rules:

spec:
  security:
    networkPolicy:
      additionalEgress:
        - to:
            - ipBlock:
                cidr: 10.0.0.0/8
          ports:
            - port: 5432
              protocol: TCP

Cloud provider identity

For AWS IRSA or GCP Workload Identity, annotate the managed ServiceAccount:

spec:
  security:
    rbac:
      serviceAccountAnnotations:
        eks.amazonaws.com/role-arn: "arn:aws:iam::123456789:role/openclaw"

Corporate proxies and private CAs

If your cluster uses a TLS-intercepting proxy, inject a CA bundle:

spec:
  security:
    caBundle:
      configMapName: corporate-ca-bundle
      key: ca-bundle.crt

The operator mounts it into all containers and sets SSL_CERT_FILE and NODE_EXTRA_CA_CERTS automatically.
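
The referenced ConfigMap is an ordinary Kubernetes object you create from your CA file:

kubectl create configmap corporate-ca-bundle \
  --namespace openclaw \
  --from-file=ca-bundle.crt=./ca-bundle.crt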

GitOps

The OpenClawInstance CRD is a plain YAML file. That means it fits directly into a GitOps workflow. Store your agent manifests in a git repo, and let ArgoCD or Flux sync them to your cluster.

A typical repo structure:

gitops/
└── agents/
    ├── kustomization.yaml
    ├── namespace.yaml
    ├── agent-a.yaml
    └── agent-b.yaml
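
The kustomization.yaml simply lists the manifests:

apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization
resources:
  - namespace.yaml
  - agent-a.yaml
  - agent-b.yaml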

Every change goes through a pull request. Your team reviews the diff. Merge to main, and ArgoCD applies it. No kubectl apply from laptops, no configuration drift, full audit trail.

The operator’s config hashing makes this especially smooth. When ArgoCD syncs a changed spec.config.raw, the operator detects the content-hash change and triggers a rolling update automatically. The same goes for secret rotation: the operator watches referenced Secrets and rolls pods when they change.

Backup and restore

The operator supports S3-compatible backups. When you delete an instance, the operator automatically creates a backup of the workspace PVC before teardown.

To restore an agent from a backup into a new instance:

apiVersion: openclaw.rocks/v1alpha1
kind: OpenClawInstance
metadata:
  name: my-agent-restored
  namespace: openclaw
spec:
  restoreFrom: "s3://bucket/path/to/backup.tar.gz"
  envFrom:
    - secretRef:
        name: openclaw-api-keys
  storage:
    persistence:
      enabled: true
      size: 10Gi

The operator downloads the snapshot, unpacks it into the PVC, and starts the agent with all previous workspace data, skills, and conversation history intact.

Custom sidecars and init containers

The operator supports arbitrary sidecars and init containers for advanced use cases. Run Ollama as a sidecar for local inference, a Cloud SQL Proxy for database access, or a log forwarder alongside your agent:

spec:
  sidecars:
    - name: ollama
      image: ollama/ollama:latest
      ports:
        - containerPort: 11434
      resources:
        requests:
          cpu: "2"
          memory: 4Gi
  sidecarVolumes:
    - name: ollama-models
      persistentVolumeClaim:
        claimName: ollama-models-pvc

Custom init containers run after the operator’s own init pipeline (config seeding, pnpm, Python, skills):

spec:
  initContainers:
    - name: fetch-data
      image: curlimages/curl:8.5.0
      command: ["sh", "-c", "curl -o /data/dataset.json https://..."]
      volumeMounts:
        - name: data
          mountPath: /data

Config merge mode

By default, the operator overwrites the config file on every pod restart. If your agent modifies its own config at runtime (through skills or self-modification), set mergeMode: merge to deep-merge operator config with the existing PVC config:

spec:
  config:
    mergeMode: merge
    raw:
      agents:
        defaults:
          model:
            primary: "anthropic/claude-sonnet-4-20250514"

In merge mode, operator-specified keys win, but keys the agent added on its own survive restarts.
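
For illustration, suppose the agent enabled a channel in its own config on the PVC (the channels block below is hypothetical). After a restart in merge mode, the effective config keeps both sides, with the operator's model value winning:

agents:
  defaults:
    model:
      primary: "anthropic/claude-sonnet-4-20250514"  # operator-specified, always wins
channels:
  telegram:
    enabled: true  # added by the agent at runtime, preserved by the merge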

The complete example

Here is a production-ready manifest that combines everything from this guide:

apiVersion: openclaw.rocks/v1alpha1
kind: OpenClawInstance
metadata:
  name: production-agent
  namespace: openclaw
spec:
  envFrom:
    - secretRef:
        name: openclaw-api-keys

  config:
    mergeMode: merge
    raw:
      agents:
        defaults:
          model:
            primary: "anthropic/claude-sonnet-4-20250514"

  skills:
    - "@anthropic/mcp-server-fetch"

  runtimeDeps:
    pnpm: true

  chromium:
    enabled: true
    resources:
      requests:
        cpu: 500m
        memory: 1Gi
      limits:
        cpu: 1000m
        memory: 2Gi

  resources:
    requests:
      cpu: 500m
      memory: 1Gi
    limits:
      cpu: 2000m
      memory: 4Gi

  storage:
    persistence:
      enabled: true
      size: 10Gi

  autoUpdate:
    enabled: true
    checkInterval: "24h"
    backupBeforeUpdate: true
    rollbackOnFailure: true

  observability:
    metrics:
      enabled: true
      serviceMonitor:
        enabled: true

  # Remove or adjust these if you don't use dedicated nodes
  availability:
    nodeSelector:
      openclaw.rocks/nodepool: openclaw
    tolerations:
      - key: openclaw.rocks/dedicated
        value: openclaw
        effect: NoSchedule

Apply it, and you have a hardened, auto-updating, browser-capable AI agent with monitoring, backup, and network isolation. One kubectl apply.

What you get out of the box

Without touching a single security setting, every agent deployed by the operator ships with:

  • Non-root execution (UID 1000)
  • Read-only root filesystem
  • All Linux capabilities dropped
  • Seccomp RuntimeDefault profile
  • Default-deny NetworkPolicy (DNS + HTTPS egress only)
  • Per-instance ServiceAccount with no token auto-mounting
  • PodDisruptionBudget
  • Liveness, readiness, and startup probes
  • Auto-generated gateway authentication token
  • 5-minute drift reconciliation

A validating webhook blocks attempts to run as root and warns about disabled NetworkPolicies, missing TLS on Ingress, and missing AI provider keys.

Next steps

If you run into issues or have feedback, open an issue on GitHub. PRs are welcome too.

If you do not want to operate Kubernetes yourself, OpenClaw.rocks handles all of this for you. Pick a plan, connect a channel, and your agent is live in seconds.