new plan and docs

**ACTION_PLAN.md** (new file, 403 lines)

# Home Lab Action Plan

## Phase 1: Critical Fixes (Do This Week)

### 1.1 Fix Failing Services

**bewcloud-memos (Restarting Loop)**
```bash
# SSH to controller
ssh -i ~/.ssh/id_ed25519 ubuntu@192.168.2.130

# Check what's wrong
docker service logs bewcloud-memos-ssogxn-memos --tail 100

# Common fixes:
# If it is a database connection issue:
docker service update --env-add "MEMOS_DB_HOST=correct-hostname" bewcloud-memos-ssogxn-memos

# If it keeps failing, try recreating:
docker service rm bewcloud-memos-ssogxn-memos
# Then redeploy via the Dokploy UI
```

**bendtstudio-webstatic (Rollback Paused)**
```bash
# Check the error
docker service ps bendtstudio-webstatic-iq9evl --no-trunc

# Force an update to retry
docker service update --force bendtstudio-webstatic-iq9evl

# If that fails, inspect the image
docker service inspect bendtstudio-webstatic-iq9evl --format '{{.Spec.TaskTemplate.ContainerSpec.Image}}'
```

**syncthing (Stopped)**
```bash
# Option A: Start it if you need it
docker service scale syncthing=1

# Option B: Remove it if not needed
docker service rm syncthing
# Also remove the volume if it is no longer needed
docker volume rm cloud-syncthing-i2rpwr_syncthing_config
```
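
The three problem services above can be spotted in one pass. A minimal sketch that flags services with zero running replicas; it reads `docker service ls` output on stdin so the parsing works without a live swarm:

```shell
# Flag services whose replica count is 0/N, i.e. stopped or crash-looping.
# Expects lines in `docker service ls --format '{{.Name}} {{.Replicas}}'` form.
flag_stopped() {
  awk '$2 ~ /^0\// { print "needs attention: " $1 }'
}
# Usage: docker service ls --format '{{.Name}} {{.Replicas}}' | flag_stopped
```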

### 1.2 Clean Up Unused Resources

```bash
# Remove unused volumes (reclaim ~595MB)
docker volume prune

# Remove unused images
docker image prune -a

# System-wide cleanup (removes ALL unused images and volumes; review first)
docker system prune -a --volumes
```

### 1.3 Document Current State

Take screenshots of:
- Dokploy UI (all projects)
- Swarmpit dashboard
- Traefik dashboard (http://192.168.2.130:8080)
- MinIO console (http://192.168.2.18:9001)
- Gitea repositories

---

## Phase 2: Configuration Backup (Do This Week)

### 2.1 Create Git Repository for Infrastructure

```bash
# On the controller node:
ssh -i ~/.ssh/id_ed25519 ubuntu@192.168.2.130

# Create a backup directory
mkdir -p ~/infrastructure-backup/$(date +%Y-%m-%d)
cd ~/infrastructure-backup/$(date +%Y-%m-%d)

# Copy all compose files
cp -r /etc/dokploy/compose ./dokploy-compose
cp -r /etc/dokploy/traefik ./traefik-config
cp ~/minio-stack.yml ./

# Export service configs
mkdir -p ./service-configs
docker service ls -q | while read service; do
  docker service inspect "$service" > "./service-configs/${service}.json"
done

# Export stack configs (docker stack ls has no -q flag; use --format)
docker stack ls --format '{{.Name}}' | while read stack; do
  docker stack ps "$stack" > "./service-configs/${stack}-tasks.txt"
done

# Create a summary
cat > README.txt << EOF
Infrastructure Backup - $(date)
Cluster: Docker Swarm with Dokploy
Nodes: 3 (tpi-n1, tpi-n2, node-nas)
Services: $(docker service ls -q | wc -l) services
Stacks: $(docker stack ls --format '{{.Name}}' | wc -l) stacks

See HOMELAB_AUDIT.md for full documentation.
EOF

# Create a tar archive
cd ..
tar -czf infrastructure-$(date +%Y-%m-%d).tar.gz $(date +%Y-%m-%d)
```
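
Before trusting that archive, it is worth confirming it actually contains what you expect. A minimal sketch of the check (the archive path is whatever the step above produced):

```shell
# Count the entries in a tar.gz archive; a zero or tiny count means the
# backup step above silently produced an empty or truncated archive.
count_archive_entries() {
  tar -tzf "$1" | wc -l
}
# Example: count_archive_entries infrastructure-$(date +%Y-%m-%d).tar.gz
```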

### 2.2 Commit to Gitea

```bash
# Clone your infrastructure repo (create it first if needed)
# Replace with your actual Gitea URL
git clone http://gitea.bendtstudio.com:3000/sirtimbly/homelab-configs.git
cd homelab-configs

# Copy the backed-up configs
cp -r ~/infrastructure-backup/$(date +%Y-%m-%d)/* .

# Organize by service
mkdir -p {stacks,compose,dokploy,traefik,docs}
mv dokploy-compose/* compose/ 2>/dev/null || true
mv traefik-config/* traefik/ 2>/dev/null || true
mv minio-stack.yml stacks/
mv service-configs/* docs/ 2>/dev/null || true

# Commit
git add .
git commit -m "Initial infrastructure backup - $(date +%Y-%m-%d)

- All Dokploy compose files
- Traefik configuration
- MinIO stack definition
- Service inspection exports
- Task history exports

Services backed up:
$(docker service ls --format '- {{.Name}}' | sort)"

git push origin main
```

---

## Phase 3: Security Hardening (Do Next Week)

### 3.1 Remove Exposed Credentials

**Problem:** Services have passwords in environment variables that are visible in the Docker service configs.

**Solution:** Use Docker secrets or Dokploy environment variables.

```bash
# Example: securing MinIO
# Instead of putting the password in the compose file, create a Docker secret:

echo "your-minio-password" | docker secret create minio_root_password -

# Then in the compose file:
# environment:
#   MINIO_ROOT_PASSWORD_FILE: /run/secrets/minio_root_password
# secrets:
#   - minio_root_password
```

**Action items:**
1. List all services with exposed passwords:
```bash
docker service ls -q | xargs -I {} docker service inspect {} --format '{{.Spec.Name}}: {{range .Spec.TaskTemplate.ContainerSpec.Env}}{{.}} {{end}}' | grep -i password
```

2. For each service, create a plan to move credentials to one of:
   - Docker secrets (best for Swarm)
   - Environment files (easier to manage)
   - Dokploy UI environment variables

3. Update the compose files and redeploy
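
For the Docker-secrets route, the migration is mechanical enough to script. A sketch that turns KEY=VALUE pairs from an env file into `docker secret create` commands; it prints the commands rather than running them, so you can review before piping to `sh` (the env-file name is illustrative):

```shell
# Emit one `docker secret create` command per KEY=VALUE line in an env file.
# Comments and non-assignment lines are skipped; the key is lowercased to
# form the secret name. Nothing is executed -- review the output first.
gen_secret_commands() {
  grep -v '^#' "$1" | grep '=' | while IFS='=' read -r key value; do
    name=$(printf '%s' "$key" | tr 'A-Z' 'a-z')
    printf "printf '%%s' '%s' | docker secret create %s -\n" "$value" "$name"
  done
}
# Review, then execute:
#   gen_secret_commands minio.env        # inspect
#   gen_secret_commands minio.env | sh   # actually create the secrets
```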

### 3.2 Update Default Passwords

Check for default/weak passwords on:
- Dokploy (if still default)
- MinIO
- Gitea admin
- Technitium DNS
- Any databases

### 3.3 Review Exposed Ports

```bash
# Check all published ports
docker service ls --format '{{.Name}}: {{.Ports}}'

# Check whether any services are exposed without going through Traefik
# (The only published ports should be: 53, 2222, 3000, 8384, 9000-9001)
```
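
That comparison against the expected port list can be automated. A sketch that reads the `docker service ls` output above and flags any host port outside the allowlist:

```shell
# Compare published host ports against an allowlist (the ports named above).
# Reads `docker service ls --format '{{.Name}}: {{.Ports}}'` output on stdin.
check_ports() {
  allow='^(53|2222|3000|8384|9000|9001)$'
  # pull every host port out of patterns like *:8080->80/tcp
  grep -oE '\*:[0-9]+' | cut -d: -f2 | sort -u | while read -r port; do
    echo "$port" | grep -qE "$allow" || echo "UNEXPECTED port $port"
  done
}
# Usage: docker service ls --format '{{.Name}}: {{.Ports}}' | check_ports
```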

---

## Phase 4: Monitoring Setup (Do Next Week)

### 4.1 Set Up Prometheus + Grafana

You mentioned these in PLAN.md but they're not running. Let's add them.

Create `monitoring-stack.yml`:
```yaml
version: '3.8'

services:
  prometheus:
    image: prom/prometheus:latest
    volumes:
      - prometheus-data:/prometheus
      - ./prometheus.yml:/etc/prometheus/prometheus.yml:ro
    command:
      - '--config.file=/etc/prometheus/prometheus.yml'
      - '--storage.tsdb.path=/prometheus'
    networks:
      - dokploy-network
    deploy:
      placement:
        constraints:
          - node.role == manager

  grafana:
    image: grafana/grafana:latest
    volumes:
      - grafana-data:/var/lib/grafana
    environment:
      - GF_SECURITY_ADMIN_PASSWORD__FILE=/run/secrets/grafana_admin_password
    secrets:
      - grafana_admin_password
    networks:
      - dokploy-network
    deploy:
      labels:
        - traefik.enable=true
        - traefik.http.routers.grafana.rule=Host(`grafana.bendtstudio.com`)
        - traefik.http.routers.grafana.entrypoints=websecure
        - traefik.http.routers.grafana.tls.certresolver=letsencrypt
        # Required with Swarm: Traefik cannot infer the container port
        - traefik.http.services.grafana.loadbalancer.server.port=3000

volumes:
  prometheus-data:
  grafana-data:

networks:
  dokploy-network:
    external: true

secrets:
  grafana_admin_password:
    external: true
```

### 4.2 Add Node Exporter

Deploy node-exporter on all nodes to collect system metrics.
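
A global-mode service fits this well: Swarm schedules exactly one task per node, including nodes that join later. A sketch of the create command (the service name, network attachment, and mount layout are assumptions, not audited config); it echoes the command so you can review it before running:

```shell
# Assemble the service-create invocation for a cluster-wide node-exporter.
# Echoed for review; run it with `eval "$cmd"` once it looks right.
# --path.rootfs lets node-exporter read host metrics via the bind mount.
cmd="docker service create --name node-exporter --mode global \
  --network dokploy-network \
  --mount type=bind,source=/,target=/host,ro \
  prom/node-exporter:latest --path.rootfs=/host"
echo "$cmd"
```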

### 4.3 Configure Alerts

Set up alerts for:
- Service down
- High CPU/memory usage
- Low disk space
- Certificate expiration

---

## Phase 5: Backup Strategy (Do Within 2 Weeks)

### 5.1 Define What to Back Up

**Critical Data:**
1. Gitea repositories (/data/git)
2. Dokploy database
3. MinIO buckets
4. Immich photos (/mnt/synology-data/immich)
5. PostgreSQL databases
6. Configuration files

### 5.2 Create Backup Scripts

Example backup script for Gitea:
```bash
#!/bin/bash
# /opt/backup/backup-gitea.sh
set -euo pipefail

BACKUP_DIR="/backup/gitea/$(date +%Y%m%d)"
mkdir -p "$BACKUP_DIR"

# Back up the Gitea data directory
docker exec gitea-giteasqlite-bhymqw-gitea-1 tar czf /tmp/gitea-backup.tar.gz /data
docker cp gitea-giteasqlite-bhymqw-gitea-1:/tmp/gitea-backup.tar.gz "$BACKUP_DIR/"

# Copy to MinIO (offsite)
mc cp "$BACKUP_DIR/gitea-backup.tar.gz" minio/backups/gitea/

# Clean up old backups (keep 30 days; -mindepth 1 protects the parent dir)
find /backup/gitea -mindepth 1 -type d -mtime +30 -exec rm -rf {} +
```
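
A backup is only as good as its restore path. A minimal restore sketch, assuming the archive layout produced by the script above (a tar of the container's `/data`); run it against a stopped or scaled-down Gitea so you never unpack into a live data directory:

```shell
# Restore a gitea-backup.tar.gz produced by backup-gitea.sh into a target
# directory. --strip-components=1 drops the leading "data/" path prefix
# that tar recorded when archiving /data.
restore_gitea() {
  archive="$1"   # e.g. /backup/gitea/20260209/gitea-backup.tar.gz
  target="$2"    # directory to unpack into
  mkdir -p "$target"
  tar -xzf "$archive" -C "$target" --strip-components=1
}
```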

### 5.3 Automate Backups

Add to crontab:
```bash
# Daily backups, staggered from 2 AM
0 2 * * * /opt/backup/backup-gitea.sh
0 3 * * * /opt/backup/backup-dokploy.sh
0 4 * * * /opt/backup/backup-databases.sh
```

---

## Phase 6: Documentation (Ongoing)

### 6.1 Create Service Catalog

For each service, document:
- **Purpose:** What does it do?
- **Access URL:** How do I reach it?
- **Dependencies:** What does it need?
- **Data location:** Where is the data stored?
- **Backup procedure:** How do I back it up?
- **Restore procedure:** How do I restore it?
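
Stamping out a skeleton page per service makes that catalog much easier to start. A sketch that generates one Markdown file per service from the fields above (the output directory is an assumption):

```shell
# Create one catalog page per service name read from stdin, pre-filled
# with the fields listed above. Existing files are overwritten.
make_catalog() {
  outdir="$1"; mkdir -p "$outdir"
  while read -r svc; do
    cat > "$outdir/$svc.md" << EOF
# $svc

- **Purpose:** TODO
- **Access URL:** TODO
- **Dependencies:** TODO
- **Data location:** TODO
- **Backup procedure:** TODO
- **Restore procedure:** TODO
EOF
  done
}
# Usage: docker service ls --format '{{.Name}}' | make_catalog docs/catalog
```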

### 6.2 Create Runbooks

Common operations:
- Adding a new service
- Scaling a service
- Updating a service
- Removing a service
- Recovering from node failure
- Restoring from backup

### 6.3 Network Diagram

Create a visual diagram showing:
- Nodes and their roles
- Services and their locations
- Network connections
- Data flows

---

## Quick Reference Commands

```bash
# Cluster status
docker node ls
docker service ls
docker stack ls

# Service management
docker service logs <service> --tail 100 -f
docker service ps <service>
docker service scale <service>=<count>
docker service update --force <service>

# Resource usage
docker system df
docker stats

# SSH access
ssh -i ~/.ssh/id_ed25519 ubuntu@192.168.2.130  # Manager
ssh -i ~/.ssh/id_ed25519 ubuntu@192.168.2.19   # Worker

# Web UIs
curl http://192.168.2.130:3000  # Dokploy
curl http://192.168.2.130:888   # Swarmpit
curl http://192.168.2.130:8080  # Traefik
curl http://192.168.2.18:5380   # Technitium DNS
curl http://192.168.2.18:9001   # MinIO Console
```

---

## Questions for You

Before we proceed, I need to clarify a few things:

1. **NAS Node Access:** What are the SSH credentials for node-nas (192.168.2.18)?

2. **bendtstudio-app:** Is this service needed? It has 0 replicas.

3. **syncthing:** Do you want to keep this? It's currently stopped.

4. **Monitoring:** Do you want me to set up Prometheus/Grafana now, or later?

5. **Gitea:** Can you provide access credentials so I can check what's already version controlled?

6. **Priority:** Which phase should we tackle first? I recommend Phase 1 (critical fixes).

---

*Action Plan Version 1.0 - February 9, 2026*

---

**HOMELAB_AUDIT.md** (new file, 425 lines)

# Home Lab Cluster Audit Report

**Date:** February 9, 2026
**Auditor:** opencode
**Cluster:** Docker Swarm with Dokploy

---

## 1. Cluster Overview

- **Cluster Type:** Docker Swarm (3 nodes)
- **Orchestration:** Dokploy v3.x
- **Reverse Proxy:** Traefik v3.6.1
- **DNS:** Technitium DNS Server
- **Monitoring:** Swarmpit
- **Git Server:** Gitea v1.24.4
- **Object Storage:** MinIO

---

## 2. Node Inventory

### Node 1: tpi-n1 (Controller/Manager)
- **IP:** 192.168.2.130
- **Role:** Manager (Leader)
- **Architecture:** aarch64 (ARM64)
- **OS:** Linux
- **CPU:** 8 cores
- **RAM:** ~8 GB
- **Docker:** v27.5.1
- **Labels:**
  - `infra=true`
  - `role=storage`
  - `storage=high`
- **Status:** Ready, Active

### Node 2: tpi-n2 (Worker)
- **IP:** 192.168.2.19
- **Role:** Worker
- **Architecture:** aarch64 (ARM64)
- **OS:** Linux
- **CPU:** 8 cores
- **RAM:** ~8 GB
- **Docker:** v27.5.1
- **Labels:**
  - `role=compute`
- **Status:** Ready, Active

### Node 3: node-nas (Storage Worker)
- **IP:** 192.168.2.18
- **Role:** Worker (NAS/Storage)
- **Architecture:** x86_64
- **OS:** Linux
- **CPU:** 2 cores
- **RAM:** ~8 GB
- **Docker:** v29.1.2
- **Labels:**
  - `type=nas`
- **Status:** Ready, Active

---

## 3. Docker Stacks (Swarm Mode)

### Active Stacks:

#### 1. minio
- **Services:** 1 (minio_minio)
- **Status:** Running
- **Node:** node-nas (constrained to NAS)
- **Ports:** 9000 (API), 9001 (Console)
- **Storage:** /mnt/synology-data/minio (bind mount)
- **Credentials:** [REDACTED - see service config]

#### 2. swarmpit
- **Services:** 4
  - swarmpit_app (UI) - running on tpi-n1, port 888
  - swarmpit_agent (global) - running on all 3 nodes
  - swarmpit_db (CouchDB) - running on tpi-n2
  - swarmpit_influxdb - running on node-nas
- **Status:** Active, with historical failures
- **Issues:** Multiple container failures in history (mostly resolved)

---

## 4. Dokploy-Managed Services

### Running Services (via Dokploy Compose):

1. **ai-lobechat-yqvecg** - AI chat interface
2. **bewcloud-memos-ssogxn** - Note-taking app (⚠️ restart loop)
3. **bewcloud-silverbullet-42sjev** - SilverBullet markdown editor + Watchtower
4. **cloud-bewcloud-u2pls5** - BewCloud instance with Radicale (CalDAV/CardDAV)
5. **cloud-fizzy-ezuhfq** - Fizzy web app
6. **cloud-ironcalc-0id5k8** - IronCalc spreadsheet
7. **cloud-radicale-wqldcv** - Standalone Radicale server
8. **cloud-uptimekuma-jdeivt** - Uptime monitoring
9. **dns-technitum-6ojgo2** - Technitium DNS server
10. **gitea-giteasqlite-bhymqw** - Git server (port 3000, SSH on 2222)
11. **gitea-registry-vdftrt** - Docker registry (port 5000)

### Dokploy Infrastructure Services:
- **dokploy** - Main Dokploy UI (port 3000, host mode)
- **dokploy-postgres** - Dokploy database
- **dokploy-redis** - Dokploy cache
- **dokploy-traefik** - Reverse proxy (ports 80, 443, 8080)

---

## 5. Standalone Services (docker-compose)

### Running:
- **technitium-dns** - DNS server (ports 53, 5380)
- **immich3-compose** - Photo management (Immich v2.3.0)
  - immich-server
  - immich-machine-learning
  - immich-database (pgvecto-rs)
  - immich-redis

### Stack Services:
- **bendtstudio-pancake-bzgfpc** - MariaDB database (port 3306)
- **bendtstudio-webstatic-iq9evl** - Static web files (⚠️ rollback-paused state)

---

## 6. Issues Identified

### 🔴 Critical Issues:

1. **bewcloud-memos in Restart Loop**
   - Container keeps restarting (last seen 24 seconds ago)
   - Status: `Restarting (0) 24 seconds ago`
   - **Action Required:** Check logs and fix configuration

2. **bendtstudio-webstatic in Rollback-Paused State**
   - Service is not updating properly
   - State: `rollback_paused`
   - **Action Required:** Investigate the update failure

3. **bendtstudio-app Not Running**
   - Service has 0/0 replicas
   - **Action Required:** Determine whether it is needed, or remove it

4. **syncthing Stopped**
   - Service has 0 replicas
   - Should be on node-nas
   - **Action Required:** Restart it, or remove it if not needed

### 🟡 Warning Issues:

5. **Swarmpit Agent Failures (Historical)**
   - Multiple past failures on all nodes
   - Currently running, but the history is concerning
   - **Action Required:** Monitor for stability

6. **No Monitoring of MinIO**
   - MinIO is running but no backup/monitoring strategy is documented
   - **Action Required:** Set up monitoring and backup

7. **Credential Management**
   - Passwords are visible in service configs (bendtstudio-webstatic, MinIO, DNS)
   - **Action Required:** Migrate to Docker secrets or env files

### 🟢 Informational:

8. **13 Unused/Orphaned Volumes**
   - 33 total volumes, only 20 active
   - **Action Required:** Clean up unused volumes to reclaim ~595MB

9. **Gitea Repository Status Unknown**
   - Cannot verify whether all compose files are version controlled
   - **Action Required:** Audit the Gitea repositories
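
For issue 8, `docker volume ls -f dangling=true` does the orphan detection; a small wrapper that adds a count lets you sanity-check the result against the "13 orphaned" figure before pruning (it reads names on stdin so the logic is testable without a daemon):

```shell
# Report volumes that no container references, plus a total count.
# Reads volume names, one per line, from stdin.
report_dangling() {
  count=0
  while read -r vol; do
    [ -n "$vol" ] || continue
    echo "dangling: $vol"
    count=$((count + 1))
  done
  echo "total: $count"
}
# Usage: docker volume ls -f dangling=true --format '{{.Name}}' | report_dangling
```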

---

## 7. Storage Configuration

### Local Volumes (33 total):
Key volumes include:
- `dokploy-postgres-database`
- `bewcloud-postgres-in40hh-data`
- `gitea-data`, `gitea-registry-data`
- `immich-postgres`, `immich-redis-data`, `immich-model-cache`
- `bendtstudio-pancake-data`
- `shared-data` (NFS/shared)
- Various app-specific volumes

### Bind Mounts:
- **MinIO:** `/mnt/synology-data/minio` → `/data`
- **Syncthing:** `/mnt/synology-data` → `/var/syncthing` (currently stopped)
- **Dokploy:** `/etc/dokploy` → `/etc/dokploy`

### NFS Mounts:
- Synology NAS mounted at `/mnt/synology-data/`
- Contains: immich/, minio/

---

## 8. Networking

### Overlay Networks:
- `dokploy-network` - main Dokploy network
- `minio_default` - MinIO stack network
- `swarmpit_net` - Swarmpit monitoring network
- `ingress` - Docker Swarm ingress

### Bridge Networks:
Multiple app-specific networks created by compose:
- `ai-lobechat-yqvecg`
- `bewcloud-memos-ssogxn`
- `bewcloud-silverbullet-42sjev`
- `cloud-fizzy-ezuhfq_default`
- `cloud-uptimekuma-jdeivt`
- `gitea-giteasqlite-bhymqw`
- `gitea-registry-vdftrt`
- `immich3-compose-ubyhe9_default`

---

## 9. SSL/TLS Configuration

- **Certificate Resolver:** Let's Encrypt (ACME)
- **Email:** sirtimbly@gmail.com
- **Challenge Type:** HTTP-01
- **Storage:** `/etc/dokploy/traefik/dynamic/acme.json`
- **Entry Points:** web (80) → websecure (443) with auto-redirect
- **HTTP/3:** Enabled on websecure

---

## 10. Traefik Routing

### Configured Routes (via labels):
- gitea.bendtstudio.com → Gitea
- Multiple apps via traefik.me subdomains
- HTTP → HTTPS redirect enabled
- Middlewares configured in `/etc/dokploy/traefik/dynamic/`

---

## 11. DNS Configuration

### Technitium DNS:
- **Ports:** 53 (TCP/UDP), 5380 (Web UI)
- **Domain:** dns.bendtstudio.com
- **Admin Password:** [REDACTED]
- **Placement:** Locked to tpi-n1
- **TZ:** America/New_York

### Services using DNS:
- All services accessible via bendtstudio.com subdomains
- Internal DNS resolution for Docker services

---

## 12. Configuration Files Location

### In `/etc/dokploy/`:
- `traefik/traefik.yml` - main Traefik config
- `traefik/dynamic/*.yml` - dynamic routes and middlewares
- `compose/*/code/docker-compose.yml` - Dokploy-managed compose files

### In `/home/ubuntu/`:
- `minio-stack.yml` - MinIO stack definition

### In the local workspace:
- Various compose files (not all deployed via Dokploy)
- May be out of sync with the running services

---

## 13. Missing Configuration in Version Control

Based on the analysis, the following may NOT be properly tracked in Gitea:

1. ✅ **Gitea** itself - compose file present
2. ✅ **MinIO** - stack file in ~/minio-stack.yml
3. ⚠️ **Dokploy dynamic configs** - Traefik routes
4. ⚠️ **All Dokploy-managed compose files** - 11 services
5. ❌ **Technitium DNS** - compose file only in /etc/dokploy/
6. ❌ **Immich** - compose configuration
7. ❌ **Swarmpit** - stack configuration
8. ❌ **Dokploy infrastructure** - internal services

---

## 14. Resource Usage

### Docker System:
- **Images:** 23 (10.91 GB)
- **Containers:** 26 (135 MB)
- **Volumes:** 33 (2.02 GB, 595 MB reclaimable)
- **Build Cache:** 0

### Node Resources:
- **tpi-n1 & tpi-n2:** 8 cores ARM64, 8 GB RAM each
- **node-nas:** 2 cores x86_64, 8 GB RAM

---

## 15. Recommendations

### Immediate Actions (High Priority):

1. **Fix bewcloud-memos**
```bash
docker service logs bewcloud-memos-ssogxn-memos --tail 50
```

2. **Fix bendtstudio-webstatic**
```bash
docker service ps bendtstudio-webstatic-iq9evl --no-trunc
docker service update --force bendtstudio-webstatic-iq9evl
```

3. **Restart or Remove syncthing**
```bash
# Option 1: Scale up
docker service scale syncthing=1

# Option 2: Remove
docker service rm syncthing
```

4. **Clean up unused volumes**
```bash
docker volume prune
```

### Short-term Actions (Medium Priority):

5. **Audit Gitea repositories**
   - Access Gitea at http://gitea.bendtstudio.com
   - Verify which compose files are tracked
   - Commit missing configurations

6. **Secure credentials**
   - Use Docker secrets for passwords
   - Move credentials to environment files
   - Never commit .env files with real passwords

7. **Set up automated backups**
   - Back up the Dokploy database
   - Back up Gitea repositories
   - Back up MinIO data

8. **Document all services**
   - Create a README for each service
   - Document dependencies and data locations
   - Create runbooks for common operations

### Long-term Actions (Low Priority):

9. **Implement proper monitoring**
   - Prometheus/Grafana for metrics (mentioned in PLAN.md but not found)
   - Alerting for service failures
   - Disk usage monitoring

10. **Implement a GitOps workflow**
    - All changes go through Git
    - Automated deployments via Dokploy webhooks
    - Configuration drift detection

11. **Consolidate the storage strategy**
    - Define a clear policy for volumes vs. bind mounts
    - Document backup procedures for each storage type

12. **Security audit**
    - Review all exposed ports
    - Check for default/weak passwords
    - Implement network segmentation if needed

---

## 16. Next Steps Checklist

- [ ] Fix critical service issues (memos, webstatic)
- [ ] Document all running services with their purpose
- [ ] Commit all compose files to Gitea
- [ ] Create a backup strategy
- [ ] Set up monitoring and alerting
- [ ] Clean up unused resources
- [ ] Create a disaster recovery plan
- [ ] Document SSH access for all nodes

---

## Appendix A: Quick Commands Reference

```bash
# View cluster status
docker node ls
docker service ls
docker stack ls

# View service logs
docker service logs <service-name> --tail 100 -f

# View container logs
docker logs <container-name> --tail 100 -f

# Scale a service
docker service scale <service-name>=<replicas>

# Update a service
docker service update --force <service-name>

# SSH to nodes
ssh -i ~/.ssh/id_ed25519 ubuntu@192.168.2.130  # tpi-n1 (manager)
ssh -i ~/.ssh/id_ed25519 ubuntu@192.168.2.19   # tpi-n2 (worker)
# The NAS node requires different credentials

# Web UIs
# Dokploy:           http://192.168.2.130:3000
# Swarmpit:          http://192.168.2.130:888
# Traefik dashboard: http://192.168.2.130:8080
```

---

*End of Audit Report*

---

**QUICK_REFERENCE.md** (new file, 34 lines)

## Cluster Access

```bash
# SSH to controller
ssh -i ~/.ssh/id_ed25519 ubuntu@192.168.2.130

# SSH to worker
ssh -i ~/.ssh/id_ed25519 ubuntu@192.168.2.19

# SSH to NAS node
ssh tim@192.168.2.18
```

## Dokploy

Configured to deploy across the nodes in the cluster. The NAS node has access to an NFS share at <????>.

Dokploy's S3-compatible storage target points at MinIO on the NAS node; this is used for backups.

## Minio

Minio is installed on the NAS node. It holds backups of the Dokploy database and serves as Dokploy's S3-compatible storage backend.

## Traefik

Traefik is installed on the controller node. It routes traffic to the various services in the cluster.

## Gitea

Used for storing all the compose and stack YAML files for each service.

## Technitium DNS

Using internal DNS requires pointing Docker's DNS at the Technitium DNS server.
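
One way to do that is a `dns` entry in `/etc/docker/daemon.json` on each node. A minimal sketch; the Technitium IP depends on which node actually hosts it, and the public fallback resolver is an assumption:

```shell
# Write a daemon.json that points Docker's embedded resolver at Technitium.
# $1 = output path (a temp file for testing, /etc/docker/daemon.json on a
# real node), $2 = IP of the node running Technitium. 1.1.1.1 is an
# assumed public fallback so containers still resolve if Technitium is down.
write_docker_dns() {
  cat > "$1" << EOF
{
  "dns": ["$2", "1.1.1.1"]
}
EOF
}
# Usage (on each node, then restart the daemon):
#   write_docker_dns /etc/docker/daemon.json 192.168.2.18
#   sudo systemctl restart docker
```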

---

**SETUP.md** (new file, 103 lines)

Dokploy + Docker Swarm Homelab Setup Instructions

This guide walks through setting up a fresh, multi-node Docker Swarm cluster using Dokploy for quick web app deployment and easy hosting of infrastructure services (like Pi-hole and Minio), including shared storage via NFS from your NAS node.

1. Prepare Environment

• Choose a primary node (can be any capable Linux server).
• Identify your NAS node (high-capacity storage).
• Gather all SSH credentials.
• Ensure all nodes have Docker installed (curl -fsSL https://get.docker.com | sh).

2. Initialize Docker Swarm Cluster

On your primary node:

docker swarm init --advertise-addr <PRIMARY_NODE_IP>

On each additional node:

• Run the join command printed by the previous step, e.g.:

docker swarm join --token <TOKEN> <PRIMARY_NODE_IP>:2377

(If you lost the token, run docker swarm join-token worker on the primary node to print it again.)

3. Label Nodes for Placement Constraints

On your primary node, label the nodes:

docker node update --label-add role=storage nas-node-01
docker node update --label-add storage=high nas-node-01
docker node update --label-add role=compute node-light-01
docker node update --label-add infra=true nas-node-01

(Replace node names as appropriate.)

4. Set Up Dokploy

On the primary node:

curl -sSL https://dokploy.com/install.sh | sh

• The Dokploy UI will be available on port 3000.
• Create the admin account on first visit and use a strong password.

5. Set Up Shared NFS Storage from Your NAS

On your NAS node:

• Install the NFS server (Debian/Ubuntu):

sudo apt install nfs-kernel-server

• Export a directory:

  o Edit /etc/exports and add:

/mnt/storage/docker-data *(rw,sync,no_subtree_check)

  o Re-export and restart NFS:

sudo exportfs -ra
sudo systemctl restart nfs-kernel-server

6. Create Shared NFS Volume in Docker

On the manager node:

docker volume create \
  --driver local \
  --opt type=nfs \
  --opt o=addr=<NAS_IP>,rw,nolock,nfsvers=4 \
  --opt device=:/mnt/storage/docker-data \
  shared-data

(Replace <NAS_IP> with your NAS's address.)

7. Deploy Apps with Dokploy + Placement Constraints

• Use the Dokploy UI to:

  o Deploy your web apps (Node.js, PHP, static sites)
  o Set replica counts (scaling)
  o Pin infrastructure apps (like Pi-hole or Minio) to the NAS node via placement constraints
  o Use the shared NFS volume for persistent data

Example Docker Compose snippet for pinning:

services:
  pihole:
    image: pihole/pihole
    deploy:
      placement:
        constraints:
          - node.labels.role==storage
    volumes:
      - shared-data:/etc/pihole

8. (Optional) Set Up Minio (S3-Compatible Storage)

• Deploy Minio with Dokploy, pin it to your NAS, and use the shared volume for data:

services:
  minio:
    image: minio/minio
    command: server /data --console-address ":9001"
    environment:
      MINIO_ROOT_USER: admin
      MINIO_ROOT_PASSWORD: changeme123   # change this before deploying
    volumes:
      - shared-data:/data
    deploy:
      placement:
        constraints:
          - node.labels.role==storage
    ports:
      - "9000:9000"
      - "9001:9001"

9. Add Web Apps and Experiment!

• Use Dokploy's UI to connect to your Gitea instance, auto-deploy repos, and experiment rapidly.
• Traefik integration and SSL setup are handled automatically by Dokploy.

10. Restore K3s (Optional, Later)

• Your original K3s manifests are saved in git -- just reapply them if you wish to revert:

k3s server
kubectl apply -f <your-manifests>

References

• Docker Swarm Docs: https://docs.docker.com/engine/swarm/
• Dokploy Docs: https://dokploy.com/docs/
• Docker Volumes: https://docs.docker.com/engine/storage/volumes/
• NFS on Linux: https://help.ubuntu.com/community/NFS

This guide gives you a fast start for a declarative, multi-node homelab with web app simplicity and infrastructure reliability using Dokploy and Docker Swarm!