openclaw/AGENTS.md
2026-02-23 11:06:40 -05:00

# OpenClaw Cluster Deployment Guide
This document describes the deployment setup for running OpenClaw Gateway on a Docker Swarm cluster using Dokploy.
## Overview
- **Gateway URL**: `wss://klaatu.bendtstudio.com`
- **Registry**: `registry.lan:5000` (internal)
- **Image**: `registry.lan/openclaw:latest`
- **Node**: `tpi-n1` (192.168.2.130)
## Architecture
```
┌──────────────────────────────────────────────────────────────────────┐
│                             Docker Swarm                             │
│  ┌────────────────────────────────────────────────────────────────┐  │
│  │  Node: tpi-n1 (Manager + Worker)                               │  │
│  │  ┌──────────────────────────────────────────────────────────┐  │  │
│  │  │  OpenClaw Gateway Service                                │  │  │
│  │  │  - Ports: 18789 (WebSocket), 18790 (Bridge)              │  │  │
│  │  │  - Image: registry.lan/openclaw:latest                   │  │  │
│  │  │  - Volumes:                                              │  │  │
│  │  │    - openclaw-config → /home/node/.openclaw              │  │  │
│  │  │    - openclaw-workspace → /home/node/.openclaw/workspace │  │  │
│  │  └──────────────────────────────────────────────────────────┘  │  │
│  └────────────────────────────────────────────────────────────────┘  │
│                                                                      │
│  ┌────────────────────────────────────────────────────────────────┐  │
│  │  Registry Service (on tpi-n1)                                  │  │
│  │  - URL: registry.lan:5000                                      │  │
│  │  - Accessible via Traefik at registry.lan:443                  │  │
│  └────────────────────────────────────────────────────────────────┘  │
└──────────────────────────────────────────────────────────────────────┘
```
## Pre-requisites
### 1. Registry Setup
Ensure the Docker registry is running and accessible:
```bash
# Check registry is running
docker ps | grep registry
# Check registry is accessible
curl -sk https://registry.lan/v2/_catalog
```
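The catalog endpoint returns a JSON object with a `repositories` array, so the check can be scripted. The following is a minimal sketch: the helper name `catalog_has_repo` is hypothetical, and it assumes a plain `grep` is good enough for this small response (a JSON parser such as `jq` would be more robust).

```bash
# Hypothetical helper: succeed if the catalog JSON lists the given repository.
# Usage: catalog_has_repo JSON REPO
catalog_has_repo() {
  local json="$1" repo="$2"
  # -F treats the pattern as a fixed string; the quotes force a whole-name match
  printf '%s' "$json" | grep -qF "\"${repo}\""
}

# Example (requires the registry to be reachable):
#   catalog_has_repo "$(curl -sk https://registry.lan/v2/_catalog)" openclaw \
#     && echo "openclaw image is in the registry"
```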
### 2. DNS Configuration
All nodes must resolve `registry.lan` to the registry IP:
```bash
# On each node
echo "192.168.2.130 registry.lan" | sudo tee -a /etc/hosts
# Verify
ping -c 3 registry.lan
```
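Appending with `tee -a` adds a duplicate line each time the command is rerun. A possible guard is sketched below; `ensure_hosts_entry` is a hypothetical helper name, and it takes the hosts file as a parameter so it can be tried on a scratch file first.

```bash
# Hypothetical helper: append "IP HOST" to a hosts file only if HOST has no entry yet.
# Usage: ensure_hosts_entry FILE IP HOST
ensure_hosts_entry() {
  local file="$1" ip="$2" host="$3"
  # Look for HOST as a whole field (after the IP column) on any non-comment line
  if awk -v h="$host" '
       !/^[[:space:]]*#/ { for (i = 2; i <= NF; i++) if ($i == h) found = 1 }
       END { exit !found }' "$file" 2>/dev/null; then
    return 0  # already present, nothing to do
  fi
  printf '%s %s\n' "$ip" "$host" >> "$file"
}

# Example (writing to /etc/hosts needs root):
#   ensure_hosts_entry /etc/hosts 192.168.2.130 registry.lan
```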
### 3. Registry Authentication
All nodes must be logged into the registry:
```bash
docker login registry.lan
# Username: docker
# Password: password
```
## Building and Pushing the Image
### Dockerfile Requirements
The Dockerfile must include:
```dockerfile
FROM node:22-bookworm
# Install Bun
RUN curl -fsSL https://bun.sh/install | bash
ENV PATH="/root/.bun/bin:${PATH}"
RUN corepack enable
# Install gogcli (optional, for Google Workspace integration)
RUN curl -L -o /tmp/gogcli.tar.gz \
"https://github.com/steipete/gogcli/releases/download/v0.11.0/gogcli_0.11.0_linux_arm64.tar.gz" && \
tar -xzf /tmp/gogcli.tar.gz -C /tmp && \
mv /tmp/gog /usr/local/bin/gog && \
chmod +x /usr/local/bin/gog && \
rm -f /tmp/gogcli.tar.gz
# Install gcloud SDK (optional, for GCP integration)
RUN curl -L -o /tmp/google-cloud-sdk.tar.gz \
"https://dl.google.com/dl/cloudsdk/channels/rapid/downloads/google-cloud-cli-linux-arm.tar.gz" && \
tar -xzf /tmp/google-cloud-sdk.tar.gz -C /opt/ && \
/opt/google-cloud-sdk/install.sh --quiet --path-update=true --command-completion=false --usage-reporting=false && \
rm -f /tmp/google-cloud-sdk.tar.gz
ENV PATH="/opt/google-cloud-sdk/bin:${PATH}"
# ... rest of build steps
```
### Build and Push
On the registry node (tpi-n1):
```bash
cd ~/openclaw
# Build image
docker build -t registry.lan/openclaw:latest .
# Push to registry
docker push registry.lan/openclaw:latest
```
**Note**: The push may fail on large layers. If it does, see "Image Push Fails on Large Layers" under Troubleshooting.
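If the push failure is intermittent, retrying may be enough, since layers that were already uploaded do not need to be sent again. The sketch below is an assumption about what might help, not a confirmed fix for this registry; `retry` is a hypothetical helper.

```bash
# Hypothetical helper: run a command up to ATTEMPTS times, pausing between tries.
# Usage: retry ATTEMPTS DELAY_SECONDS COMMAND [ARGS...]
retry() {
  local attempts="$1" delay="$2" n=1
  shift 2
  until "$@"; do
    if [ "$n" -ge "$attempts" ]; then
      echo "giving up after $n attempts: $*" >&2
      return 1
    fi
    n=$((n + 1))
    sleep "$delay"
  done
}

# Example: try the push three times, ten seconds apart
#   retry 3 10 docker push registry.lan/openclaw:latest
```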
## Stack Configuration
### stack.yml
```yaml
version: "3.8"
services:
  openclaw-gateway:
    image: ${OPENCLAW_IMAGE:-registry.lan/openclaw:latest}
    environment:
      HOME: /home/node
      TERM: xterm-256color
      MOONSHOT_API_KEY: ${MOONSHOT_API_KEY}
      OPENCLAW_GATEWAY_TOKEN: ${OPENCLAW_GATEWAY_TOKEN}
    volumes:
      - openclaw-config:/home/node/.openclaw
      - openclaw-workspace:/home/node/.openclaw/workspace
    ports:
      - target: 18789
        published: ${OPENCLAW_GATEWAY_PORT:-18789}
        protocol: tcp
        mode: host
      - target: 18790
        published: ${OPENCLAW_BRIDGE_PORT:-18790}
        protocol: tcp
        mode: host
    init: true
    deploy:
      replicas: 1
      placement:
        constraints:
          - node.hostname == tpi-n1
    networks:
      - dokploy-network
    command:
      [
        "node",
        "dist/index.js",
        "gateway",
        "--bind",
        "lan",
        "--port",
        "18789",
      ]
volumes:
  openclaw-config:
  openclaw-workspace:
networks:
  dokploy-network:
    external: true
```
### Environment Variables
Create a `.env` file:
```bash
OPENCLAW_IMAGE=registry.lan/openclaw:latest
OPENCLAW_GATEWAY_TOKEN=your-generated-token
MOONSHOT_API_KEY=your-moonshot-api-key
OPENCLAW_GATEWAY_PORT=18789
OPENCLAW_BRIDGE_PORT=18790
OPENCLAW_GATEWAY_BIND=lan
```
Generate gateway token:
```bash
openssl rand -hex 32
```
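Before deploying, it can help to confirm that the `.env` file actually defines the secrets the stack interpolates. This is a minimal sketch; `missing_env_keys` is a hypothetical helper and assumes the simple `KEY=value` format shown above (no quoting, no `export` prefix).

```bash
# Hypothetical helper: print each required key that is missing or empty in an env file.
# Usage: missing_env_keys FILE KEY...
missing_env_keys() {
  local file="$1" key
  shift
  for key in "$@"; do
    # A key counts as set only if a line "KEY=<non-empty value>" exists
    grep -Eq "^${key}=.+" "$file" || echo "$key"
  done
}

# Example:
#   missing_env_keys .env OPENCLAW_GATEWAY_TOKEN MOONSHOT_API_KEY
```

An empty result means every listed key has a value.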
## Configuration Setup
### 1. Seeding the Config Volume
Before first deployment, seed the config volume:
```bash
# On manager node (tpi-n1)
# Create config file
cat > /tmp/openclaw-config.json << 'EOF'
{
  "env": {
    "MOONSHOT_API_KEY": "${MOONSHOT_API_KEY}",
    "OPENCLAW_GATEWAY_TOKEN": "${OPENCLAW_GATEWAY_TOKEN}"
  },
  "gateway": {
    "port": 18789,
    "bind": "lan",
    "mode": "local"
  },
  "agents": {
    "defaults": {
      "model": {
        "primary": "moonshot/kimi-k2.5"
      },
      "models": {
        "moonshot/kimi-k2.5": {
          "alias": "Kimi K2.5"
        }
      }
    }
  },
  "models": {
    "providers": {
      "moonshot": {
        "baseUrl": "https://api.moonshot.ai/v1",
        "apiKey": "${MOONSHOT_API_KEY}",
        "api": "openai-completions",
        "models": [
          {
            "id": "kimi-k2.5",
            "name": "Kimi K2.5",
            "reasoning": false,
            "input": ["text"],
            "cost": {"input": 0, "output": 0, "cacheRead": 0, "cacheWrite": 0},
            "contextWindow": 256000,
            "maxTokens": 8192
          }
        ]
      }
    }
  }
}
EOF
# Copy to volume
docker run --rm \
-v openclaw-app-ceptgi_openclaw-config:/data \
-v /tmp/openclaw-config.json:/config.json \
alpine:latest \
sh -c "cp /config.json /data/openclaw.json && chown -R 1000:1000 /data && chmod 755 /data"
```
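The volume is called `openclaw-app-ceptgi_openclaw-config` here, although `stack.yml` declares it as `openclaw-config`, because `docker stack deploy` prefixes each named volume with the stack name. That `openclaw-app-ceptgi` is the stack name assigned by Dokploy is an inference from the commands in this guide. The sketch below uses a hypothetical helper, `stack_volume`, to build the prefixed name.

```bash
# Hypothetical helper: build the name Swarm gives a volume declared in a stack file.
# Usage: stack_volume STACK_NAME VOLUME_KEY
stack_volume() {
  printf '%s_%s\n' "$1" "$2"
}

# Example: inspect what is currently in the config volume (requires Docker)
#   docker run --rm \
#     -v "$(stack_volume openclaw-app-ceptgi openclaw-config)":/data \
#     alpine:latest ls -la /data
```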
### 2. Fixing Workspace Permissions
After deployment, fix workspace permissions:
```bash
# Fix permissions on both volumes
docker run --rm \
-v openclaw-app-ceptgi_openclaw-config:/data \
alpine:latest \
sh -c "chown -R 1000:1000 /data && chmod 755 /data"
docker run --rm \
-v openclaw-app-ceptgi_openclaw-workspace:/data \
alpine:latest \
sh -c "chown -R 1000:1000 /data && chmod 755 /data"
```
## Pairing and Device Approval
### Manual Device Approval
When the web UI shows "pairing required", approve the device by hand:
```bash
# Open a shell in the container
docker exec -it CONTAINER_NAME /bin/sh
# List pending devices
cat /home/node/.openclaw/devices/pending.json
# Manually approve by writing paired.json (this replaces any existing entries)
cat > /home/node/.openclaw/devices/paired.json << 'EOF'
{
"DEVICE_ID": {
"deviceId": "DEVICE_ID",
"publicKey": "PUBLIC_KEY",
"platform": "MacIntel",
"clientId": "openclaw-control-ui",
"clientMode": "webchat",
"role": "operator",
"roles": ["operator"],
"scopes": ["operator.admin", "operator.approvals", "operator.pairing"],
"token": "approved-token",
"ts": TIMESTAMP,
"pairedAt": TIMESTAMP
}
}
EOF
# Clear pending
echo '{}' > /home/node/.openclaw/devices/pending.json
```
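Because `cat >` rewrites `paired.json` from scratch, keeping a copy of the previous file makes the manual edit easier to undo. A minimal sketch, with `backup_if_exists` as a hypothetical helper name:

```bash
# Hypothetical helper: copy FILE to FILE.bak before editing it, if FILE exists.
# Usage: backup_if_exists FILE
backup_if_exists() {
  local file="$1"
  [ -f "$file" ] || return 0   # nothing to back up yet
  cp -p "$file" "${file}.bak"  # -p keeps mode and timestamps
}

# Example (inside the container, before overwriting the file):
#   backup_if_exists /home/node/.openclaw/devices/paired.json
```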
### Using CLI to Approve
```bash
# From within container
export JAVA_HOME=/home/node/.local/jre
export PATH=$JAVA_HOME/bin:/home/node/.local/bin:$PATH
node /app/openclaw.mjs devices approve DEVICE_ID --token "$OPENCLAW_GATEWAY_TOKEN"
```
## Deploying Updates
### Method 1: Manual Rebuild and Deploy
```bash
# On registry node (tpi-n1)
cd ~/openclaw
# Build new image
docker build -t registry.lan/openclaw:latest .
# Push to registry (may fail on large layers)
docker push registry.lan/openclaw:latest
# Force service update to pull new image
docker service update --force openclaw-app-ceptgi_openclaw-gateway
```
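The three steps can be wrapped in one function that stops at the first failure. This is only a sketch: `run`, `deploy_openclaw`, and the `DRY_RUN` switch are hypothetical names, added so the sequence can be previewed without touching Docker.

```bash
# Image and service names as used elsewhere in this guide
IMAGE="${OPENCLAW_IMAGE:-registry.lan/openclaw:latest}"
SERVICE="openclaw-app-ceptgi_openclaw-gateway"

# Print the command when DRY_RUN=1, otherwise execute it
run() {
  if [ "${DRY_RUN:-0}" = "1" ]; then
    echo "+ $*"
  else
    "$@"
  fi
}

# Build, push, then force the service to pull the new image
deploy_openclaw() {
  run docker build -t "$IMAGE" . &&
    run docker push "$IMAGE" &&
    run docker service update --force "$SERVICE"
}

# Example: preview the commands without running them
#   DRY_RUN=1 deploy_openclaw
```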
### Method 2: Via Dokploy Webhook
```bash
# Trigger dokploy deployment
curl -X POST http://dokploy.lan:3000/api/deploy/compose/JrrUnF7-O3tJDK02wUhUG
```
**Note**: Dokploy pulls the stack.yml from git. Make sure the image reference is updated in stack.yml before triggering.
## Troubleshooting
### Image Push Fails on Large Layers
**Problem**: Docker push fails on large layers.
**Solution**: Build directly on the registry node:
```bash
ssh ubuntu@192.168.2.130
cd ~/openclaw
docker build -t registry.lan/openclaw:latest .
# No push needed since registry is local
```
### Container Can't Access Registry
**Problem**: "No such image: registry.lan/openclaw:latest"
**Solution**:
1. Ensure `registry.lan` resolves on all nodes (see "DNS Configuration").
2. Ensure all nodes are logged into the registry (see "Registry Authentication").
3. Check the `insecure-registries` setting in `/etc/docker/daemon.json`.
### Pairing Required Error
**Problem**: Web UI shows "disconnected (1008): pairing required"
**Solution**:
1. Use the built-in CLI approver (recommended):
```bash
# approve newest pending request
docker exec CONTAINER_NAME node dist/index.js devices approve --latest --token "$OPENCLAW_GATEWAY_TOKEN"
# inspect current state
docker exec CONTAINER_NAME node dist/index.js devices list --token "$OPENCLAW_GATEWAY_TOKEN"
```
2. If using Tailscale Serve, ensure Gateway auth/proxy settings are correct:
```json
{
"gateway": {
"bind": "loopback",
"tailscale": { "mode": "serve" },
"auth": { "allowTailscale": true },
"trustedProxies": ["127.0.0.1", "::1"]
}
}
```
3. If the error changes to `device token mismatch`, the browser usually has stale local state.
- Open the Control UI in an Incognito/Private window.
- Re-paste the gateway token in settings, or open a tokenized URL from:
```bash
docker exec CONTAINER_NAME node dist/index.js dashboard --no-open
```
4. If needed, clear only the pending requests (not the full config):
```bash
docker exec CONTAINER_NAME sh -c 'echo {} > /home/node/.openclaw/devices/pending.json'
```
**Notes**:
- Remote browsers (LAN/Tailscale) still require one-time device pairing.
- Localhost (`127.0.0.1`) auto-approves.
- Config edits trigger Gateway reload/restart automatically; container restart is usually unnecessary.
### Keep Web UI Working Across Rebuilds
To avoid repeated reconnect/pairing friction after redeploys:
1. Keep a stable Tailscale hostname so the browser origin does not change:
```bash
TAILSCALE_HOSTNAME=openclaw-gateway
```
2. Keep the same `OPENCLAW_GATEWAY_TOKEN` between deployments.
3. Persist and reuse the same `openclaw-config` volume (contains `devices/paired.json`).
4. If the UI shows `token_missing`, open a tokenized URL and re-save the settings:
```bash
docker exec CONTAINER_NAME node dist/index.js dashboard --no-open
```
### Permission Denied Errors
**Problem**: `EACCES: permission denied, open '/home/node/.openclaw/workspace/...'`
**Solution**: Fix volume permissions (see "Fixing Workspace Permissions" above)
### Service Won't Start
**Problem**: Service stuck in "no suitable node" state
**Solution**:
```bash
# Check placement constraints (the JSON key is "Placement", so match case-insensitively)
docker service inspect openclaw-app-ceptgi_openclaw-gateway | grep -i -A10 placement
# Verify the node hostname and labels
docker node ls
docker node inspect tpi-n1 | grep -A10 Labels
# Remove constraint temporarily for debugging
docker service update --constraint-rm 'node.hostname == tpi-n1' openclaw-app-ceptgi_openclaw-gateway
```
## Monitoring
### Check Service Status
```bash
docker service ps openclaw-app-ceptgi_openclaw-gateway
docker service logs openclaw-app-ceptgi_openclaw-gateway --tail 50
```
### Check Container Health
```bash
CONTAINER=$(docker ps --filter "name=openclaw-app-ceptgi" --format "{{.Names}}" | head -1)
docker logs "$CONTAINER" --tail 50
docker exec "$CONTAINER" ps aux
```
### Test Gateway
```bash
# From any node
curl -s http://tpi-n1:18789/ | grep "<title>"
```
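Right after a deploy the gateway may need a few seconds before it answers, so a single `curl` can fail spuriously. One option is to poll until it responds. The helper below, `wait_until`, is a hypothetical name and a sketch rather than part of OpenClaw.

```bash
# Hypothetical helper: retry a command once per second until it succeeds
# or TIMEOUT_SECONDS have elapsed.
# Usage: wait_until TIMEOUT_SECONDS COMMAND [ARGS...]
wait_until() {
  local timeout="$1" elapsed=0
  shift
  until "$@" >/dev/null 2>&1; do
    if [ "$elapsed" -ge "$timeout" ]; then
      return 1  # timed out
    fi
    sleep 1
    elapsed=$((elapsed + 1))
  done
}

# Example: wait up to 60 seconds for the gateway to serve HTTP
#   wait_until 60 curl -sf http://tpi-n1:18789/ && echo "gateway is up"
```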
## Additional Tools
### Installing signal-cli (Temporary)
**Note**: Manual installations live in the container filesystem and are lost when the container is recreated (for example, on a service update). Add them to the Dockerfile for persistence.
```bash
# Download JRE
mkdir -p /home/node/.local/jre
cd /tmp
curl -L -o jre.tar.gz \
'https://github.com/adoptium/temurin21-binaries/releases/download/jdk-21.0.5%2B11/OpenJDK21U-jre_aarch64_linux_hotspot_21.0.5_11.tar.gz'
tar -xzf jre.tar.gz -C /home/node/.local/jre/ --strip-components=1
# Download signal-cli
mkdir -p /home/node/.local/bin /home/node/.local/share
curl -L -o signal-cli.tar.gz \
'https://github.com/AsamK/signal-cli/releases/download/v0.13.24/signal-cli-0.13.24.tar.gz'
tar -xzf signal-cli.tar.gz -C /home/node/.local/share/
ln -sf /home/node/.local/share/signal-cli-0.13.24/bin/signal-cli /home/node/.local/bin/signal-cli
# Set environment
export JAVA_HOME=/home/node/.local/jre
export PATH=$JAVA_HOME/bin:/home/node/.local/bin:$PATH
```
### Using gogcli
```bash
# Configure credentials
gog auth credentials /home/node/.openclaw/client_secret.json
# Add account (requires browser for OAuth)
gog auth add user@gmail.com --services gmail,calendar,drive,contacts,docs,sheets
# List accounts
gog auth list
```
## Security Notes
1. **Token Storage**: Gateway token is stored in environment variables and config file
2. **Pairing**: Devices must be explicitly approved before connecting
3. **Network**: Gateway binds to `0.0.0.0` (lan mode) - ensure firewall rules are in place
4. **Registry**: Use HTTPS for registry access in production
## Maintenance
### Updating Moonshot API Key
```bash
# Update config in volume
docker run --rm \
-v openclaw-app-ceptgi_openclaw-config:/data \
alpine:latest \
sh -c "sed -i 's/old-api-key/new-api-key/g' /data/openclaw.json"
# Restart service
docker service update --force openclaw-app-ceptgi_openclaw-gateway
```
### Backup Volumes
```bash
# Backup config
docker run --rm \
-v openclaw-app-ceptgi_openclaw-config:/data \
-v /backup:/backup \
alpine:latest \
tar -czf /backup/openclaw-config-$(date +%Y%m%d).tar.gz -C /data .
# Backup workspace
docker run --rm \
-v openclaw-app-ceptgi_openclaw-workspace:/data \
-v /backup:/backup \
alpine:latest \
tar -czf /backup/openclaw-workspace-$(date +%Y%m%d).tar.gz -C /data .
```
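A backup is only useful if it can be read back, so it may be worth listing each archive after it is written. A minimal sketch, with `verify_backup` as a hypothetical helper name:

```bash
# Hypothetical helper: succeed only if the gzip-compressed tar archive can be
# listed and contains at least one entry.
# Usage: verify_backup ARCHIVE
verify_backup() {
  local archive="$1" entries
  # tar exits non-zero if the file is missing, truncated, or not a tar.gz
  entries=$(tar -tzf "$archive" 2>/dev/null) || return 1
  [ -n "$entries" ]
}

# Example:
#   verify_backup "/backup/openclaw-config-$(date +%Y%m%d).tar.gz" \
#     || echo "config backup is unreadable" >&2
```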
## Support
- OpenClaw Docs: https://docs.openclaw.ai
- Gateway Docs: https://docs.openclaw.ai/gateway
- Moonshot API: https://platform.moonshot.ai
- Signal-cli: https://github.com/AsamK/signal-cli
- gogcli: https://github.com/steipete/gogcli