OpenClaw Kubernetes Operator
Self-host OpenClaw AI agents on Kubernetes with production-grade security, observability, and lifecycle management.
OpenClaw is an AI agent platform that acts on your behalf across Telegram, Discord, WhatsApp, and Signal. It manages your inbox, calendar, smart home, and more through 50+ integrations. While OpenClaw.rocks offers fully managed hosting, this operator lets you run OpenClaw on your own infrastructure with the same operational rigor.
Why an Operator?
Deploying AI agents to Kubernetes involves more than a Deployment and a Service. You need network isolation, secret management, persistent storage, health monitoring, optional browser automation, and config rollouts, all wired correctly. This operator encodes those concerns into a single OpenClawInstance custom resource so you can go from zero to production in minutes:
apiVersion: openclaw.rocks/v1alpha1
kind: OpenClawInstance
metadata:
  name: my-agent
spec:
  envFrom:
    - secretRef:
        name: openclaw-api-keys
  storage:
    persistence:
      enabled: true
      size: 10Gi
The operator reconciles this into a fully managed stack of 9+ Kubernetes resources: secured, monitored, and self-healing.
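The example instance reads its credentials from a Secret named openclaw-api-keys via envFrom. A minimal sketch of such a Secret follows, using the standard Kubernetes Secret API. The key name ANTHROPIC_API_KEY and its value are assumptions for illustration; use whichever environment variables your AI provider expects, and create the Secret in the same namespace as the instance.

```yaml
# Hypothetical Secret backing the envFrom reference in the example instance.
apiVersion: v1
kind: Secret
metadata:
  name: openclaw-api-keys      # must match spec.envFrom[].secretRef.name
type: Opaque
stringData:
  # Assumed key name; every key here becomes an environment variable
  # in containers that load the Secret through envFrom.
  ANTHROPIC_API_KEY: "replace-me"
```

Because envFrom exposes every key in the Secret as an environment variable, further provider keys can be added to the same Secret without changing the instance spec.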
Agents That Adapt Themselves
Agents can autonomously install skills, patch their config, add environment variables, and seed workspace files by creating resources through the Kubernetes API. The operator validates every such request.
# 1. Enable self-configure on the instance
spec:
  selfConfigure:
    enabled: true
    allowedActions: [skills, config, envVars, workspaceFiles]

# 2. The agent creates this to install a skill at runtime
apiVersion: openclaw.rocks/v1alpha1
kind: OpenClawSelfConfig
metadata:
  name: add-fetch-skill
spec:
  instanceRef: my-agent
  addSkills:
    - "@anthropic/mcp-server-fetch"
Every request is validated against the instance’s allowlist policy. Protected config keys cannot be overwritten, and denied requests are logged with a reason.
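For an agent to create an OpenClawSelfConfig through the Kubernetes API, its ServiceAccount needs RBAC permission on that resource. The operator manages RBAC per instance, so you should not need to write this yourself. The sketch below only illustrates the kind of rule involved. The Role name, the plural resource name openclawselfconfigs, and the verb list are assumptions, not the operator's actual generated output.

```yaml
# Illustrative only: the sort of namespaced Role that would let an agent
# submit self-configuration requests. All names here are hypothetical.
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: my-agent-selfconfig
rules:
  - apiGroups: ["openclaw.rocks"]
    resources: ["openclawselfconfigs"]   # assumed plural of OpenClawSelfConfig
    verbs: ["create", "get", "list"]
```

Scoping the permission to a single resource type in a single namespace keeps the agent's reach narrow: it can ask for changes, but the allowlist decides whether they are applied.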
Features
| | Feature | Details |
|---|---|---|
| Declarative | Single CRD | One resource defines the entire stack: StatefulSet, Service, RBAC, NetworkPolicy, PVC, PDB, Ingress, and more |
| Adaptive | Agent self-configure | Agents autonomously install skills, patch config, and adapt their environment via the K8s API |
| Secure | Hardened by default | Non-root (UID 1000), read-only root filesystem, all capabilities dropped, seccomp RuntimeDefault, default-deny NetworkPolicy, validating webhook |
| Observable | Built-in metrics | Prometheus metrics, ServiceMonitor integration, structured JSON logging, Kubernetes events |
| Flexible | Provider-agnostic config | Use any AI provider (Anthropic, OpenAI, or others) via environment variables and inline or external config |
| Config Modes | Merge or overwrite | overwrite replaces config on restart; merge deep-merges with PVC config, preserving runtime changes |
| Skills | Declarative install | Install ClawHub skills, npm packages, or GitHub-hosted skill packs via spec.skills |
| Runtime Deps | pnpm and Python/uv | Built-in init containers install pnpm (via corepack) or Python 3.12 + uv for MCP servers and skills |
| Auto-Update | OCI registry polling | Opt-in version tracking: checks the registry for new semver releases, backs up first, rolls out, and auto-rolls back on failure |
| Scalable | Auto-scaling | HPA integration with CPU and memory metrics, min/max replica bounds, automatic StatefulSet replica management |
| Resilient | Self-healing lifecycle | PodDisruptionBudgets, health probes, automatic config rollouts via content hashing, 5-minute drift detection |
| Backup/Restore | S3-backed snapshots | Automatic backup on deletion, pre-update, and on a cron schedule; restore into a new instance from any snapshot |
| Gateway Auth | Auto-generated tokens | Automatic gateway token Secret per instance, bypassing mDNS pairing |
| Tailscale | Tailnet access | Expose via Tailscale Serve or Funnel with SSO auth |
| Extensible | Sidecars and init containers | Chromium for browser automation, Ollama for local LLMs, plus custom init containers and sidecars |
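
The "Hardened by default" row maps onto standard Kubernetes fields. The sketch below shows what those settings look like on a plain Pod plus a default-deny NetworkPolicy. It is not the operator's generated output: the resource names, the placeholder image, and the app.kubernetes.io/instance label are assumptions for illustration.

```yaml
# Pod fragment showing the standard fields behind the hardening defaults.
apiVersion: v1
kind: Pod
metadata:
  name: hardened-example                     # hypothetical name
  labels:
    app.kubernetes.io/instance: my-agent     # assumed label
spec:
  securityContext:
    runAsNonRoot: true
    runAsUser: 1000                          # non-root UID 1000
    seccompProfile:
      type: RuntimeDefault
  containers:
    - name: openclaw
      image: example.invalid/openclaw:latest # placeholder image
      securityContext:
        readOnlyRootFilesystem: true
        capabilities:
          drop: ["ALL"]
---
# Default-deny: selecting pods and listing both policy types without any
# ingress or egress rules blocks all traffic to and from those pods.
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: default-deny-example                 # hypothetical name
spec:
  podSelector:
    matchLabels:
      app.kubernetes.io/instance: my-agent   # assumed label
  policyTypes:
    - Ingress
    - Egress
```

With a read-only root filesystem, anything the agent writes at runtime has to land on a mounted volume, which is why persistent storage sits alongside these defaults.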
Architecture Overview
+-----------------------------------------------------------------+
|  OpenClawInstance CR         OpenClawSelfConfig CR              |
|  (your declarative config)   (agent self-modification requests) |
+---------------+-------------------------------------------------+
                | watch
                v
+-----------------------------------------------------------------+
|                        OpenClaw Operator                        |
|  +-----------+  +-------------+  +----------------------------+ |
|  | Reconciler|  |  Webhooks   |  |  Prometheus Metrics        | |
|  |           |  |  (validate  |  |  (reconcile count,         | |
|  | creates ->|  |  & default) |  |  duration, phases)         | |
|  +-----------+  +-------------+  +----------------------------+ |
+---------------+-------------------------------------------------+
                | manages
                v
+-----------------------------------------------------------------+
|                Managed Resources (per instance)                 |
|                                                                 |
|  ServiceAccount -> Role -> RoleBinding    NetworkPolicy         |
|  ConfigMap    PVC    PDB    ServiceMonitor                      |
|  GatewayToken Secret                                            |
|                                                                 |
|  StatefulSet                                                    |
|  +-----------------------------------------------------------+  |
|  | Init: config -> pnpm* -> python* -> skills* -> custom     |  |
|  |       (* = opt-in)                                        |  |
|  +-----------------------------------------------------------+  |
|  | OpenClaw Container    Gateway Proxy (nginx)               |  |
|  | Chromium (opt) / Ollama (opt)                             |  |
|  | Tailscale (opt) + custom sidecars                         |  |
|  +-----------------------------------------------------------+  |
|                                                                 |
|  Service (default: 18789, 18793 or custom) -> Ingress (opt)     |
+-----------------------------------------------------------------+
Next Steps
- Quick Start to get your first instance running
- Deployment Guide for platform-specific instructions
- Architecture for a deep dive into the operator internals
- API Reference for the complete CRD specification