How to Deploy OpenClaw on Kubernetes
Security researchers have found over 135,000 OpenClaw instances sitting wide open on the internet. Many of them were vulnerable to remote code execution. Running OpenClaw on a VPS with docker run is easy. Running it securely is a different problem.
Kubernetes solves that problem. You get network isolation, resource limits, automated restarts, and security defaults that would take hours to configure by hand. And with the OpenClaw Kubernetes Operator, you get all of it from a single YAML file.
This guide takes you from zero to a production-ready OpenClaw agent on Kubernetes. Every YAML block is copy-paste ready.
Why an operator
Running OpenClaw on Kubernetes is more than a Deployment and a Service. You need network isolation, secret management, persistent storage, health monitoring, config rollouts, and optionally browser automation. Wiring all of that correctly by hand is tedious and error-prone.
A Kubernetes operator encodes these concerns into a single custom resource. You declare what you want, and the operator continuously reconciles it into the right set of Kubernetes objects. That gives you:
- Security by default. Every agent runs as UID 1000, all Linux capabilities dropped, seccomp enabled, read-only root filesystem, and a default-deny NetworkPolicy that only allows DNS and HTTPS egress. No manual hardening needed.
- Auto-updates with rollback. The operator polls the OCI registry for new versions, backs up the workspace, rolls out the update, and automatically rolls back if the new pod fails health checks.
- Config rollouts. Change your spec.config.raw and the operator detects the content-hash change and triggers a rolling update. Same for secret rotation.
- Backup and restore. Automatic workspace backup to S3-compatible storage on instance deletion. Restore into a new instance from any snapshot.
- Gateway auth. Auto-generates a gateway token per instance. No manual pairing, no mDNS (which does not work in Kubernetes anyway).
- Drift detection. Every 5 minutes, the operator checks that every managed resource matches the desired state. If someone manually edits a NetworkPolicy or deletes a PDB, it gets reconciled back.
Prerequisites
You need:
- A Kubernetes cluster (1.28+). Any conformant distribution works: EKS, GKE, AKS, k3s, or a local Kind cluster for testing.
- kubectl configured to talk to your cluster.
- helm v3 installed.
- An API key for your AI provider (Anthropic, OpenAI, or any OpenAI-compatible endpoint).
Step 1: Install the operator
The operator ships as an OCI Helm chart. One command installs it:
helm install openclaw-operator \
oci://ghcr.io/openclaw-rocks/charts/openclaw-operator \
--namespace openclaw-operator-system \
--create-namespace
Verify it is running:
kubectl get pods -n openclaw-operator-system
You should see the operator pod in Running state. The operator also installs a validating webhook that prevents insecure configurations (like running as root).
Step 2: Create your API key secret
Store your AI provider API key in a Kubernetes Secret. The operator will inject it into the agent container:
kubectl create namespace openclaw
kubectl create secret generic openclaw-api-keys \
--namespace openclaw \
--from-literal=ANTHROPIC_API_KEY=sk-ant-your-key-here
For OpenAI or other providers, use the appropriate environment variable name (OPENAI_API_KEY, OPENROUTER_API_KEY, etc.). You can include multiple providers in the same Secret.
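For example, instead of the single-key command above, you can create the same Secret with two providers in one go (the variable names shown are the standard ones for these providers; adjust to whatever your agent actually uses):
kubectl create secret generic openclaw-api-keys \
  --namespace openclaw \
  --from-literal=ANTHROPIC_API_KEY=sk-ant-your-key-here \
  --from-literal=OPENAI_API_KEY=sk-your-openai-key-here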
Tip: For production, consider using External Secrets Operator to sync keys from AWS Secrets Manager, HashiCorp Vault, GCP Secret Manager, or Azure Key Vault. The operator’s docs have detailed examples.
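If you go that route, the general shape is an ExternalSecret that materializes the same openclaw-api-keys Secret. A rough sketch for AWS Secrets Manager follows; the store name and remote key path are placeholders, so check the operator's External Secrets examples for the exact setup:
apiVersion: external-secrets.io/v1beta1
kind: ExternalSecret
metadata:
  name: openclaw-api-keys
  namespace: openclaw
spec:
  refreshInterval: 1h
  secretStoreRef:
    name: aws-secrets-manager   # your SecretStore or ClusterSecretStore
    kind: ClusterSecretStore
  target:
    name: openclaw-api-keys
  data:
    - secretKey: ANTHROPIC_API_KEY
      remoteRef:
        key: openclaw/anthropic-api-key   # placeholder path in Secrets Manager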
Step 3: Deploy your first agent
Create a file called my-agent.yaml:
apiVersion: openclaw.rocks/v1alpha1
kind: OpenClawInstance
metadata:
  name: my-agent
  namespace: openclaw
spec:
  envFrom:
    - secretRef:
        name: openclaw-api-keys
  config:
    raw:
      agents:
        defaults:
          model:
            primary: "anthropic/claude-sonnet-4-20250514"
  storage:
    persistence:
      enabled: true
      size: 10Gi
Apply it:
kubectl apply -f my-agent.yaml
That single resource creates a StatefulSet, Service, ServiceAccount, Role, RoleBinding, ConfigMap, PVC, PDB, NetworkPolicy, and a gateway token Secret. The operator reconciles all of it.
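To see what the operator created, list the managed object kinds in the namespace (this assumes everything lives in the openclaw namespace as above):
kubectl get statefulset,service,serviceaccount,role,rolebinding,configmap,pvc,pdb,networkpolicy,secret -n openclaw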
Step 4: Verify it is running
Watch the instance come up:
kubectl get openclawinstances -n openclaw -w
NAME       PHASE          READY   AGE
my-agent   Provisioning   False   10s
my-agent   Running        True    45s
Once the phase shows Running and Ready is True, your agent is live. Check the logs:
kubectl logs -n openclaw statefulset/my-agent -f
To interact with your agent, port-forward the gateway:
kubectl port-forward -n openclaw svc/my-agent 18789:18789
Then open http://localhost:18789 in your browser.
Step 5: Connect a channel
OpenClaw supports Telegram, Discord, WhatsApp, Signal, and other messaging channels. Each channel is configured through environment variables. Add the relevant token to your Secret:
kubectl create secret generic openclaw-channel-keys \
--namespace openclaw \
--from-literal=TELEGRAM_BOT_TOKEN=your-bot-token-here
Then reference it in your instance:
spec:
  envFrom:
    - secretRef:
        name: openclaw-api-keys
    - secretRef:
        name: openclaw-channel-keys
OpenClaw auto-detects the token and enables the channel. No additional config needed.
That covers the basics. Your agent is running, secured, and reachable. The rest of this guide covers optional features you can enable when you are ready.
Browser automation
OpenClaw can browse the web, take screenshots, and interact with pages. The operator makes this a one-line addition. It runs a hardened Chromium sidecar in the same pod, connected over localhost:
spec:
  chromium:
    enabled: true
    resources:
      requests:
        cpu: 500m
        memory: 1Gi
      limits:
        cpu: 1000m
        memory: 2Gi
The operator automatically injects a CHROMIUM_URL environment variable into the main container. The sidecar runs as UID 1001 with a read-only root filesystem and its own security context.
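If you want to confirm the wiring, one way is to read the environment of the running pod (this assumes the default StatefulSet pod name my-agent-0; add -c with the main container's name if kubectl warns about multiple containers):
kubectl exec -n openclaw my-agent-0 -- env | grep CHROMIUM_URL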
Skills and runtime dependencies
OpenClaw skills from ClawHub can be installed declaratively. The operator runs an init container that fetches each skill before the agent starts:
spec:
  skills:
    - "@anthropic/mcp-server-fetch"
    - "@anthropic/mcp-server-filesystem"
If your skills or MCP servers need pnpm or Python, enable the built-in runtime dependency init containers:
spec:
  runtimeDeps:
    pnpm: true    # Installs pnpm via corepack
    python: true  # Installs Python 3.12 + uv
The init containers install these tools to the data PVC, so they persist across restarts without bloating the container image.
Auto-updates
OpenClaw releases new versions frequently. The operator can track these automatically, back up before updating, and roll back if something goes wrong:
spec:
  autoUpdate:
    enabled: true
    checkInterval: "12h"
    backupBeforeUpdate: true
    rollbackOnFailure: true
    healthCheckTimeout: "10m"
When a new version appears in the registry, the operator:
- Creates a backup of the workspace PVC to S3-compatible storage
- Updates the image tag on the StatefulSet
- Waits up to healthCheckTimeout for the pod to pass readiness checks
- If the pod fails to become ready, restores the previous image tag and the backup
After 3 consecutive failed rollbacks, the operator pauses auto-update and sets a condition so you can investigate.
Note: Auto-update is a no-op for digest-pinned images (spec.image.digest). If you pin by digest, you control updates manually.
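A digest pin, for reference, is just the digest field on the image spec (shown with a placeholder digest; see the API reference for the other image fields):
spec:
  image:
    digest: "sha256:<pinned-digest>"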
Production hardening
The operator ships secure by default. Here are the additional knobs for production deployments.
Monitor with Prometheus
Enable the ServiceMonitor to scrape operator and instance metrics:
spec:
  observability:
    metrics:
      enabled: true
      serviceMonitor:
        enabled: true
        interval: "30s"
The operator exposes openclaw_reconcile_total, openclaw_reconcile_duration_seconds, openclaw_instance_phase, and auto-update counters.
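As a starting point, you could alert when an instance leaves the Running phase. The PrometheusRule below is only a sketch: it assumes openclaw_instance_phase is a per-phase gauge with name and phase labels, so check the label set your scrape actually returns before relying on it:
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: openclaw-alerts
  namespace: openclaw
spec:
  groups:
    - name: openclaw
      rules:
        - alert: OpenClawInstanceNotRunning
          # Assumed labels; adjust to the metric's real label set
          expr: openclaw_instance_phase{phase="Running"} == 0
          for: 10m
          labels:
            severity: warning
          annotations:
            summary: "OpenClaw instance {{ $labels.name }} is not Running"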
Schedule on dedicated nodes
If you run a mixed cluster, use nodeSelector and tolerations to pin agents to dedicated nodes:
spec:
  availability:
    nodeSelector:
      openclaw.rocks/nodepool: openclaw
    tolerations:
      - key: openclaw.rocks/dedicated
        value: openclaw
        effect: NoSchedule
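The matching node setup is the usual label-plus-taint pair, using the same keys as the manifest above (fill in your own node name):
kubectl label node <node-name> openclaw.rocks/nodepool=openclaw
kubectl taint node <node-name> openclaw.rocks/dedicated=openclaw:NoSchedule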
Add custom egress rules
The default NetworkPolicy only allows DNS (port 53) and HTTPS (port 443). If your agent needs to reach other services (a database, a message queue, an internal API), add egress rules:
spec:
  security:
    networkPolicy:
      additionalEgress:
        - to:
            - ipBlock:
                cidr: 10.0.0.0/8
          ports:
            - port: 5432
              protocol: TCP
Cloud provider identity
For AWS IRSA or GCP Workload Identity, annotate the managed ServiceAccount:
spec:
  security:
    rbac:
      serviceAccountAnnotations:
        eks.amazonaws.com/role-arn: "arn:aws:iam::123456789:role/openclaw"
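The GKE Workload Identity equivalent uses the standard GKE annotation on the same field (the Google service account shown is a placeholder):
spec:
  security:
    rbac:
      serviceAccountAnnotations:
        iam.gke.io/gcp-service-account: "openclaw@your-project.iam.gserviceaccount.com"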
Corporate proxies and private CAs
If your cluster uses a TLS-intercepting proxy, inject a CA bundle:
spec:
  security:
    caBundle:
      configMapName: corporate-ca-bundle
      key: ca-bundle.crt
The operator mounts it into all containers and sets SSL_CERT_FILE and NODE_EXTRA_CA_CERTS automatically.
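The referenced ConfigMap is an ordinary one you create yourself, for example from a PEM file:
kubectl create configmap corporate-ca-bundle \
  --namespace openclaw \
  --from-file=ca-bundle.crt=/path/to/your/ca-bundle.pem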
GitOps
The OpenClawInstance CRD is a plain YAML file. That means it fits directly into a GitOps workflow. Store your agent manifests in a git repo, and let ArgoCD or Flux sync them to your cluster.
A typical repo structure:
gitops/
└── agents/
    ├── kustomization.yaml
    ├── namespace.yaml
    ├── agent-a.yaml
    └── agent-b.yaml
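The kustomization.yaml is nothing special; a minimal sketch just lists the manifests:
apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization
resources:
  - namespace.yaml
  - agent-a.yaml
  - agent-b.yaml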
Every change goes through a pull request. Your team reviews the diff. Merge to main, and ArgoCD applies it. No kubectl apply from laptops, no configuration drift, full audit trail.
The operator’s config hashing makes this especially smooth. When ArgoCD syncs a changed spec.config.raw, the operator detects the content hash changed and triggers a rolling update automatically. Same for secret rotation: the operator watches referenced Secrets and rolls pods when they change.
Backup and restore
The operator supports S3-compatible backups. When you delete an instance, the operator automatically creates a backup of the workspace PVC before teardown.
To restore an agent from a backup into a new instance:
apiVersion: openclaw.rocks/v1alpha1
kind: OpenClawInstance
metadata:
  name: my-agent-restored
  namespace: openclaw
spec:
  restoreFrom: "s3://bucket/path/to/backup.tar.gz"
  envFrom:
    - secretRef:
        name: openclaw-api-keys
  storage:
    persistence:
      enabled: true
      size: 10Gi
The operator downloads the snapshot, unpacks it into the PVC, and starts the agent with all previous workspace data, skills, and conversation history intact.
Custom sidecars and init containers
The operator supports arbitrary sidecars and init containers for advanced use cases. Run Ollama as a sidecar for local inference, a Cloud SQL Proxy for database access, or a log forwarder alongside your agent:
spec:
  sidecars:
    - name: ollama
      image: ollama/ollama:latest
      ports:
        - containerPort: 11434
      resources:
        requests:
          cpu: "2"
          memory: 4Gi
  sidecarVolumes:
    - name: ollama-models
      persistentVolumeClaim:
        claimName: ollama-models-pvc
Custom init containers run after the operator’s own init pipeline (config seeding, pnpm, Python, skills):
spec:
  initContainers:
    - name: fetch-data
      image: curlimages/curl:8.5.0
      command: ["sh", "-c", "curl -o /data/dataset.json https://..."]
      volumeMounts:
        - name: data
          mountPath: /data
Config merge mode
By default, the operator overwrites the config file on every pod restart. If your agent modifies its own config at runtime (through skills or self-modification), set mergeMode: merge to deep-merge operator config with the existing PVC config:
spec:
  config:
    mergeMode: merge
    raw:
      agents:
        defaults:
          model:
            primary: "anthropic/claude-sonnet-4-20250514"
In merge mode, operator-specified keys win, but keys the agent added on its own survive restarts.
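As an illustration, suppose the agent writes its own memory section at runtime (a hypothetical key, not part of the operator's schema). After a restart in merge mode, the on-disk config would keep both:
agents:
  defaults:
    model:
      primary: "anthropic/claude-sonnet-4-20250514"   # from the operator; operator keys win
    memory:                                           # hypothetical agent-added key, preserved
      enabled: true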
The complete example
Here is a production-ready manifest that combines everything from this guide:
apiVersion: openclaw.rocks/v1alpha1
kind: OpenClawInstance
metadata:
  name: production-agent
  namespace: openclaw
spec:
  envFrom:
    - secretRef:
        name: openclaw-api-keys
  config:
    mergeMode: merge
    raw:
      agents:
        defaults:
          model:
            primary: "anthropic/claude-sonnet-4-20250514"
  skills:
    - "@anthropic/mcp-server-fetch"
  runtimeDeps:
    pnpm: true
  chromium:
    enabled: true
    resources:
      requests:
        cpu: 500m
        memory: 1Gi
      limits:
        cpu: 1000m
        memory: 2Gi
  resources:
    requests:
      cpu: 500m
      memory: 1Gi
    limits:
      cpu: 2000m
      memory: 4Gi
  storage:
    persistence:
      enabled: true
      size: 10Gi
  autoUpdate:
    enabled: true
    checkInterval: "24h"
    backupBeforeUpdate: true
    rollbackOnFailure: true
  observability:
    metrics:
      enabled: true
      serviceMonitor:
        enabled: true
  # Remove or adjust these if you don't use dedicated nodes
  availability:
    nodeSelector:
      openclaw.rocks/nodepool: openclaw
    tolerations:
      - key: openclaw.rocks/dedicated
        value: openclaw
        effect: NoSchedule
Apply it, and you have a hardened, auto-updating, browser-capable AI agent with monitoring, backup, and network isolation. One kubectl apply.
What you get out of the box
Without touching a single security setting, every agent deployed by the operator ships with:
- Non-root execution (UID 1000)
- Read-only root filesystem
- All Linux capabilities dropped
- Seccomp RuntimeDefault profile
- Default-deny NetworkPolicy (DNS + HTTPS egress only)
- Per-instance ServiceAccount with no token auto-mounting
- PodDisruptionBudget
- Liveness, readiness, and startup probes
- Auto-generated gateway authentication token
- 5-minute drift reconciliation
A validating webhook blocks attempts to run as root and warns when the NetworkPolicy is disabled, an Ingress lacks TLS, or no AI provider key can be detected.
Next steps
- Browse the full API reference for every CRD field
- Read the deployment guides for EKS, GKE, AKS, and Kind
- Set up model fallback chains across multiple AI providers
- Configure External Secrets for Vault, AWS, GCP, or Azure
If you run into issues or have feedback, open an issue on GitHub. PRs are welcome too.
If you do not want to operate Kubernetes yourself, OpenClaw.rocks handles all of this for you. Pick a plan, connect a channel, and your agent is live in seconds.