Add comprehensive deployment guide for Docker Swarm cluster
515
AGENTS.md
Normal file
@@ -0,0 +1,515 @@
# OpenClaw Cluster Deployment Guide

This document describes the deployment setup for running OpenClaw Gateway on a Docker Swarm cluster using Dokploy.

## Overview

- **Gateway URL**: `wss://klaatu.bendtstudio.com`
- **Registry**: `registry.lan:5000` (internal)
- **Image**: `registry.lan/openclaw:latest`
- **Node**: `tpi-n1` (192.168.2.130)

## Architecture

```
┌─────────────────────────────────────────────────────────────────────┐
│                            Docker Swarm                             │
│  ┌───────────────────────────────────────────────────────────────┐  │
│  │ Node: tpi-n1 (Manager + Worker)                               │  │
│  │  ┌─────────────────────────────────────────────────────────┐  │  │
│  │  │ OpenClaw Gateway Service                                │  │  │
│  │  │ - Ports: 18789 (WebSocket), 18790 (Bridge)              │  │  │
│  │  │ - Image: registry.lan/openclaw:latest                   │  │  │
│  │  │ - Volumes:                                              │  │  │
│  │  │   - openclaw-config → /home/node/.openclaw              │  │  │
│  │  │   - openclaw-workspace → /home/node/.openclaw/workspace │  │  │
│  │  └─────────────────────────────────────────────────────────┘  │  │
│  └───────────────────────────────────────────────────────────────┘  │
│                                                                     │
│  ┌───────────────────────────────────────────────────────────────┐  │
│  │ Registry Service (on tpi-n1)                                  │  │
│  │ - URL: registry.lan:5000                                      │  │
│  │ - Accessible via Traefik at registry.lan:443                  │  │
│  └───────────────────────────────────────────────────────────────┘  │
└─────────────────────────────────────────────────────────────────────┘
```

## Prerequisites

### 1. Registry Setup

Ensure the Docker registry is running and accessible:

```bash
# Check that the registry container is running
docker ps | grep registry

# Check that the registry API is reachable
curl -sk https://registry.lan/v2/_catalog
```

### 2. DNS Configuration

All nodes must resolve `registry.lan` to the registry node's IP address (192.168.2.130):

```bash
# On each node
echo "192.168.2.130 registry.lan" | sudo tee -a /etc/hosts

# Verify (three probes, then exit)
ping -c 3 registry.lan
```
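
Appending with `tee -a` adds a duplicate line each time the command is re-run on a node. A minimal sketch of an idempotent variant follows. The function name `ensure_hosts_entry` is hypothetical, and the optional third argument exists only so the function can be tried against a scratch file instead of `/etc/hosts`.

```shell
# Append "IP NAME" to a hosts file only if NAME is not already mapped.
# ensure_hosts_entry is a hypothetical helper, not part of Docker or Dokploy.
ensure_hosts_entry() {
    ip="$1"
    name="$2"
    file="${3:-/etc/hosts}"

    # Look for NAME as a whole field after the address, skipping comment lines.
    if awk -v n="$name" '
        /^[[:space:]]*#/ { next }
        { for (i = 2; i <= NF; i++) if ($i == n) found = 1 }
        END { exit !found }' "$file"; then
        return 0
    fi
    printf '%s %s\n' "$ip" "$name" >> "$file"
}
```

Writing to `/etc/hosts` requires root, so on a real node the function would have to run in a root shell.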

### 3. Registry Authentication

All nodes must be logged into the registry:

```bash
docker login registry.lan
# Username: docker
# Password: password
```

## Building and Pushing the Image

### Dockerfile Requirements

The Dockerfile must include:

```dockerfile
FROM node:22-bookworm

# Install Bun
RUN curl -fsSL https://bun.sh/install | bash
ENV PATH="/root/.bun/bin:${PATH}"

RUN corepack enable

# Install gogcli (optional, for Google Workspace integration)
RUN curl -L -o /tmp/gogcli.tar.gz \
    "https://github.com/steipete/gogcli/releases/download/v0.11.0/gogcli_0.11.0_linux_arm64.tar.gz" && \
    tar -xzf /tmp/gogcli.tar.gz -C /tmp && \
    mv /tmp/gog /usr/local/bin/gog && \
    chmod +x /usr/local/bin/gog && \
    rm -f /tmp/gogcli.tar.gz

# Install gcloud SDK (optional, for GCP integration)
RUN curl -L -o /tmp/google-cloud-sdk.tar.gz \
    "https://dl.google.com/dl/cloudsdk/channels/rapid/downloads/google-cloud-cli-linux-arm.tar.gz" && \
    tar -xzf /tmp/google-cloud-sdk.tar.gz -C /opt/ && \
    /opt/google-cloud-sdk/install.sh --quiet --path-update=true --command-completion=false --usage-reporting=false && \
    rm -f /tmp/google-cloud-sdk.tar.gz

ENV PATH="/opt/google-cloud-sdk/bin:${PATH}"

# ... rest of build steps
```

### Build and Push

On the registry node (tpi-n1):

```bash
cd ~/openclaw

# Build image
docker build -t registry.lan/openclaw:latest .

# Push to registry
docker push registry.lan/openclaw:latest
```

**Note**: The push may fail on large layers. If it does, see "Image Push Fails on Large Layers" under Troubleshooting.

## Stack Configuration

### stack.yml

```yaml
version: "3.8"

services:
  openclaw-gateway:
    image: ${OPENCLAW_IMAGE:-registry.lan/openclaw:latest}
    environment:
      HOME: /home/node
      TERM: xterm-256color
      MOONSHOT_API_KEY: ${MOONSHOT_API_KEY}
      OPENCLAW_GATEWAY_TOKEN: ${OPENCLAW_GATEWAY_TOKEN}
    volumes:
      - openclaw-config:/home/node/.openclaw
      - openclaw-workspace:/home/node/.openclaw/workspace
    ports:
      - target: 18789
        published: ${OPENCLAW_GATEWAY_PORT:-18789}
        protocol: tcp
        mode: host
      - target: 18790
        published: ${OPENCLAW_BRIDGE_PORT:-18790}
        protocol: tcp
        mode: host
    init: true
    deploy:
      replicas: 1
      placement:
        constraints:
          - node.hostname == tpi-n1
    networks:
      - dokploy-network
    command:
      [
        "node",
        "dist/index.js",
        "gateway",
        "--bind",
        "lan",
        "--port",
        "18789",
      ]

volumes:
  openclaw-config:
  openclaw-workspace:

networks:
  dokploy-network:
    external: true
```

### Environment Variables

Create a `.env` file:

```bash
OPENCLAW_IMAGE=registry.lan/openclaw:latest
OPENCLAW_GATEWAY_TOKEN=your-generated-token
MOONSHOT_API_KEY=your-moonshot-api-key
OPENCLAW_GATEWAY_PORT=18789
OPENCLAW_BRIDGE_PORT=18790
OPENCLAW_GATEWAY_BIND=lan
```

Generate gateway token:

```bash
openssl rand -hex 32
```
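
It is easy to deploy with the `your-generated-token` placeholder still in `.env`. A small sanity check, sketched here under the assumption that the token is always produced by `openssl rand -hex 32` (and is therefore 64 hexadecimal characters), can catch that before a deploy. The function name `is_hex_token` is hypothetical.

```shell
# Return 0 if the argument looks like the output of `openssl rand -hex 32`,
# that is, exactly 64 hexadecimal characters. Hypothetical helper.
is_hex_token() {
    case "$1" in
        *[!0-9a-f]*) return 1 ;;   # contains a character outside 0-9, a-f
    esac
    [ "${#1}" -eq 64 ]
}
```

For example, `is_hex_token "$OPENCLAW_GATEWAY_TOKEN" || echo "token looks wrong"` could run before triggering a deployment.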

## Configuration Setup

### 1. Seeding the Config Volume

Before first deployment, seed the config volume:

```bash
# On manager node (tpi-n1)

# Create config file
# (the quoted 'EOF' keeps the ${...} placeholders literal in the file)
cat > /tmp/openclaw-config.json << 'EOF'
{
  "env": {
    "MOONSHOT_API_KEY": "${MOONSHOT_API_KEY}",
    "OPENCLAW_GATEWAY_TOKEN": "${OPENCLAW_GATEWAY_TOKEN}"
  },
  "gateway": {
    "port": 18789,
    "bind": "lan",
    "mode": "local"
  },
  "agents": {
    "defaults": {
      "model": {
        "primary": "moonshot/kimi-k2.5"
      },
      "models": {
        "moonshot/kimi-k2.5": {
          "alias": "Kimi K2.5"
        }
      }
    }
  },
  "models": {
    "providers": {
      "moonshot": {
        "baseUrl": "https://api.moonshot.ai/v1",
        "apiKey": "${MOONSHOT_API_KEY}",
        "api": "openai-completions",
        "models": [
          {
            "id": "kimi-k2.5",
            "name": "Kimi K2.5",
            "reasoning": false,
            "input": ["text"],
            "cost": {"input": 0, "output": 0, "cacheRead": 0, "cacheWrite": 0},
            "contextWindow": 256000,
            "maxTokens": 8192
          }
        ]
      }
    }
  }
}
EOF

# Copy to volume
docker run --rm \
  -v openclaw-app-ceptgi_openclaw-config:/data \
  -v /tmp/openclaw-config.json:/config.json \
  alpine:latest \
  sh -c "cp /config.json /data/openclaw.json && chown -R 1000:1000 /data && chmod 755 /data"
```

### 2. Fixing Workspace Permissions

After deployment, fix workspace permissions:

```bash
# Fix permissions on both volumes
docker run --rm \
  -v openclaw-app-ceptgi_openclaw-config:/data \
  alpine:latest \
  sh -c "chown -R 1000:1000 /data && chmod 755 /data"

docker run --rm \
  -v openclaw-app-ceptgi_openclaw-workspace:/data \
  alpine:latest \
  sh -c "chown -R 1000:1000 /data && chmod 755 /data"
```

## Pairing and Device Approval

### Manual Device Approval

When the web UI shows "pairing required", approve the device by hand. `DEVICE_ID`, `PUBLIC_KEY`, and `TIMESTAMP` below are placeholders and must be replaced with the values for the pending device before the file is valid JSON:

```bash
# Open a shell in the container
docker exec -it CONTAINER_NAME /bin/sh

# List pending devices
cat /home/node/.openclaw/devices/pending.json

# Manually approve by creating paired.json
cat > /home/node/.openclaw/devices/paired.json << 'EOF'
{
  "DEVICE_ID": {
    "deviceId": "DEVICE_ID",
    "publicKey": "PUBLIC_KEY",
    "platform": "MacIntel",
    "clientId": "openclaw-control-ui",
    "clientMode": "webchat",
    "role": "operator",
    "roles": ["operator"],
    "scopes": ["operator.admin", "operator.approvals", "operator.pairing"],
    "token": "approved-token",
    "ts": TIMESTAMP,
    "pairedAt": TIMESTAMP
  }
}
EOF

# Clear pending
echo '{}' > /home/node/.openclaw/devices/pending.json
```

### Using CLI to Approve

```bash
# From within container
export JAVA_HOME=/home/node/.local/jre
export PATH=$JAVA_HOME/bin:/home/node/.local/bin:$PATH

node /app/openclaw.mjs devices approve DEVICE_ID --token $OPENCLAW_GATEWAY_TOKEN
```

## Deploying Updates

### Method 1: Manual Rebuild and Deploy

```bash
# On registry node (tpi-n1)
cd ~/openclaw

# Build new image
docker build -t registry.lan/openclaw:latest .

# Push to registry (may fail on large layers)
docker push registry.lan/openclaw:latest

# Re-resolve the tag and redeploy so the service picks up the new image
# (--force alone restarts tasks on the image the service already has pinned)
docker service update --force --with-registry-auth \
  --image registry.lan/openclaw:latest \
  openclaw-app-ceptgi_openclaw-gateway
```
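
Because `latest` is a mutable tag, it can be hard to tell which build a node is actually running. One common alternative, sketched here as an assumption rather than something this deployment currently does, is to also push an immutable tag and point the service at it. The helper `image_ref` and the date-plus-revision tag scheme are hypothetical.

```shell
# Build an image reference such as registry.lan/openclaw:20250101-abc1234.
# Both the helper and the tag scheme are illustrative assumptions.
image_ref() {
    repo="$1"    # e.g. registry.lan/openclaw
    stamp="$2"   # e.g. output of: date +%Y%m%d
    rev="$3"     # e.g. output of: git rev-parse --short HEAD
    printf '%s:%s-%s\n' "$repo" "$stamp" "$rev"
}

# Example (commented out because it needs Docker and a git checkout):
# ref=$(image_ref registry.lan/openclaw "$(date +%Y%m%d)" "$(git rev-parse --short HEAD)")
# docker build -t "$ref" . && docker push "$ref"
# docker service update --image "$ref" openclaw-app-ceptgi_openclaw-gateway
```

With a scheme like this, rolling back amounts to pointing the service at an earlier tag.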

### Method 2: Via Dokploy Webhook

```bash
# Trigger dokploy deployment
curl -X POST http://dokploy.lan:3000/api/deploy/compose/JrrUnF7-O3tJDK02wUhUG
```

**Note**: Dokploy pulls the `stack.yml` from git. Make sure the image reference is updated in `stack.yml` before triggering.

## Troubleshooting

### Image Push Fails on Large Layers

**Problem**: Docker push fails on large layers.

**Solution**: Build directly on the registry node:

```bash
ssh ubuntu@192.168.2.130
cd ~/openclaw
docker build -t registry.lan/openclaw:latest .
# No push needed since registry is local
```

### Container Can't Access Registry

**Problem**: "No such image: registry.lan/openclaw:latest"

**Solution**:

1. Ensure registry.lan resolves on all nodes
2. Ensure all nodes are logged into registry
3. Check insecure registry settings in daemon.json

### Pairing Required Error

**Problem**: Web UI shows "disconnected (1008): pairing required"

**Solution**:

1. Check logs for device IDs: `docker logs CONTAINER | grep pairing`
2. Manually approve devices in paired.json
3. Restart container

### Permission Denied Errors

**Problem**: `EACCES: permission denied, open '/home/node/.openclaw/workspace/...'`

**Solution**: Fix volume permissions (see "Fixing Workspace Permissions" above)

### Service Won't Start

**Problem**: Service stuck in "no suitable node" state

**Solution**:

```bash
# Check placement constraints
docker service inspect openclaw-app-ceptgi_openclaw-gateway | grep -A10 placement

# Verify the node's hostname, availability, and labels
docker node ls
docker node inspect tpi-n1 | grep -A10 Labels

# Remove constraint temporarily for debugging
docker service update --constraint-rm 'node.hostname == tpi-n1' openclaw-app-ceptgi_openclaw-gateway
```

## Monitoring

### Check Service Status

```bash
docker service ps openclaw-app-ceptgi_openclaw-gateway
docker service logs openclaw-app-ceptgi_openclaw-gateway --tail 50
```

### Check Container Health

```bash
CONTAINER=$(docker ps --filter "name=openclaw-app-ceptgi" --format "{{.Names}}" | head -1)
docker logs "$CONTAINER" --tail 50
docker exec "$CONTAINER" ps aux
```

### Test Gateway

```bash
# From any node
curl -s http://tpi-n1:18789/ | grep "<title>"
```
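
Right after a deploy the gateway may need a few seconds before it answers, so a single `curl` can fail spuriously. A minimal retry wrapper is sketched below. The name `retry` and its argument order are hypothetical, and the attempt count and delay in the example are arbitrary.

```shell
# Run a command until it succeeds or the attempt budget is used up.
# Hypothetical helper: retry ATTEMPTS DELAY_SECONDS COMMAND [ARGS...]
retry() {
    attempts="$1"
    delay="$2"
    shift 2
    n=1
    while ! "$@"; do
        [ "$n" -ge "$attempts" ] && return 1
        n=$((n + 1))
        sleep "$delay"
    done
    return 0
}

# Example (commented out because it needs network access to the node):
# retry 10 3 curl -fs -o /dev/null http://tpi-n1:18789/
```

The function returns non-zero once every attempt has failed, so it can gate later steps in a deploy script.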

## Additional Tools

### Installing signal-cli (Temporary)

Note: Manual installations live in the container's filesystem and are lost whenever the container is recreated (for example on a service update). Add them to the Dockerfile for persistence.

```bash
# Download JRE
mkdir -p /home/node/.local/jre
cd /tmp
curl -L -o jre.tar.gz \
  'https://github.com/adoptium/temurin21-binaries/releases/download/jdk-21.0.5%2B11/OpenJDK21U-jre_aarch64_linux_hotspot_21.0.5_11.tar.gz'
tar -xzf jre.tar.gz -C /home/node/.local/jre/ --strip-components=1

# Download signal-cli
mkdir -p /home/node/.local/bin /home/node/.local/share
curl -L -o signal-cli.tar.gz \
  'https://github.com/AsamK/signal-cli/releases/download/v0.13.24/signal-cli-0.13.24.tar.gz'
tar -xzf signal-cli.tar.gz -C /home/node/.local/share/
ln -sf /home/node/.local/share/signal-cli-0.13.24/bin/signal-cli /home/node/.local/bin/signal-cli

# Set environment
export JAVA_HOME=/home/node/.local/jre
export PATH=$JAVA_HOME/bin:/home/node/.local/bin:$PATH
```

### Using gogcli

```bash
# Configure credentials
gog auth credentials /home/node/.openclaw/client_secret.json

# Add account (requires browser for OAuth)
gog auth add user@gmail.com --services gmail,calendar,drive,contacts,docs,sheets

# List accounts
gog auth list
```

## Security Notes

1. **Token Storage**: The gateway token is stored in plain text in environment variables and in the config file
2. **Pairing**: Devices must be explicitly approved before connecting
3. **Network**: Gateway binds to `0.0.0.0` (lan mode) - ensure firewall rules are in place
4. **Registry**: Use HTTPS for registry access in production
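
Since the gateway token and the API key sit in plain text in the `.env` file, one precaution worth considering is to make that file readable by its owner only. A minimal sketch, assuming a POSIX shell; `lock_down` is a hypothetical helper name.

```shell
# Restrict a secrets file to its owner (mode 600) and confirm the result.
# lock_down is a hypothetical helper name.
lock_down() {
    chmod 600 "$1" || return 1
    # The first ten characters of `ls -ld` are the type and permission bits.
    [ "$(ls -ld "$1" | cut -c1-10)" = "-rw-------" ]
}
```

For example, `lock_down .env` could be run once after the file is created.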

## Maintenance

### Updating Moonshot API Key

```bash
# Update config in volume
docker run --rm \
  -v openclaw-app-ceptgi_openclaw-config:/data \
  alpine:latest \
  sh -c "sed -i 's/old-api-key/new-api-key/g' /data/openclaw.json"

# Restart service
docker service update --force openclaw-app-ceptgi_openclaw-gateway
```

### Backup Volumes

```bash
# Backup config
docker run --rm \
  -v openclaw-app-ceptgi_openclaw-config:/data \
  -v /backup:/backup \
  alpine:latest \
  tar -czf /backup/openclaw-config-$(date +%Y%m%d).tar.gz -C /data .

# Backup workspace
docker run --rm \
  -v openclaw-app-ceptgi_openclaw-workspace:/data \
  -v /backup:/backup \
  alpine:latest \
  tar -czf /backup/openclaw-workspace-$(date +%Y%m%d).tar.gz -C /data .
```
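
Dated archives accumulate in `/backup` over time. A minimal pruning sketch follows, assuming the `NAME-YYYYMMDD.tar.gz` naming used above, so that sorting by file name orders archives from oldest to newest. The helper `prune_backups` and the retention count are hypothetical.

```shell
# Delete all but the newest KEEP archives named PREFIX-YYYYMMDD.tar.gz in DIR.
# Hypothetical helper: prune_backups DIR PREFIX KEEP
prune_backups() {
    dir="$1"
    prefix="$2"
    keep="$3"
    pattern="^$prefix-[0-9]\{8\}\.tar\.gz$"

    total=$(ls "$dir" | grep -c "$pattern" || true)
    excess=$((total - keep))
    [ "$excess" -gt 0 ] || return 0

    # The date suffix sorts lexicographically, so the first lines are the oldest.
    ls "$dir" | grep "$pattern" | sort | head -n "$excess" |
        while IFS= read -r name; do
            rm -f "$dir/$name"
        done
}
```

For example, `prune_backups /backup openclaw-config 7` would keep the seven most recent config archives and leave the workspace archives untouched.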

## Support

- OpenClaw Docs: https://docs.openclaw.ai
- Gateway Docs: https://docs.openclaw.ai/gateway
- Moonshot API: https://platform.moonshot.ai
- Signal-cli: https://github.com/AsamK/signal-cli
- gogcli: https://github.com/steipete/gogcli