Home Lab Cluster Audit Report
Date: February 9, 2026
Auditor: opencode
Cluster: Docker Swarm with Dokploy
1. Cluster Overview
- Cluster Type: Docker Swarm (3 nodes)
- Orchestration: Dokploy v3.x
- Reverse Proxy: Traefik v3.6.1
- DNS: Technitium DNS Server
- Monitoring: Swarmpit
- Git Server: Gitea v1.24.4
- Object Storage: MinIO
2. Node Inventory
Node 1: tpi-n1 (Controller/Manager)
- IP: 192.168.2.130
- Role: Manager (Leader)
- Architecture: aarch64 (ARM64)
- OS: Linux
- CPU: 8 cores
- RAM: ~8 GB
- Docker: v27.5.1
- Labels:
infra=true, role=storage, storage=high
- Status: Ready, Active
Node 2: tpi-n2 (Worker)
- IP: 192.168.2.19
- Role: Worker
- Architecture: aarch64 (ARM64)
- OS: Linux
- CPU: 8 cores
- RAM: ~8 GB
- Docker: v27.5.1
- Labels:
role=compute
- Status: Ready, Active
Node 3: node-nas (Storage Worker)
- IP: 192.168.2.18
- Role: Worker (NAS/Storage)
- Architecture: x86_64
- OS: Linux
- CPU: 2 cores
- RAM: ~8 GB
- Docker: v29.1.2
- Labels:
type=nas
- Status: Ready, Active
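The labels in the inventory above map directly onto `docker node update` calls, which must run on a manager node. The sketch below only prints the implied commands (dry-run), so it can be reviewed off-cluster; node names and labels are taken from this report.

```shell
# Dry-run: print the `docker node update` calls implied by the inventory.
node_label_cmd() {
  # $1 = node name, $2 = key=value label
  printf 'docker node update --label-add %s %s\n' "$2" "$1"
}
node_label_cmd tpi-n1   infra=true
node_label_cmd tpi-n1   role=storage
node_label_cmd tpi-n1   storage=high
node_label_cmd tpi-n2   role=compute
node_label_cmd node-nas type=nas
```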
3. Docker Stacks (Swarm Mode)
Active Stacks:
1. minio
- Services: 1 (minio_minio)
- Status: Running
- Node: node-nas (constrained to NAS)
- Ports: 9000 (API), 9001 (Console)
- Storage: /mnt/synology-data/minio (bind mount)
- Credentials: [REDACTED - see service config]
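The NAS pinning described above is normally expressed as a placement constraint in the stack file. A minimal sketch of that shape, assuming the `type=nas` node label from the inventory and the bind mount listed above (image tag and the rest of the file are assumptions, not the actual `minio-stack.yml`):

```yaml
# Sketch only: minimal stack-file shape pinning MinIO to the NAS node.
services:
  minio:
    image: minio/minio
    command: server /data --console-address ":9001"
    ports:
      - "9000:9000"   # API
      - "9001:9001"   # Console
    volumes:
      - /mnt/synology-data/minio:/data
    deploy:
      placement:
        constraints:
          - node.labels.type == nas
```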
2. swarmpit
- Services: 4
- swarmpit_app (UI) - Running on tpi-n1, Port 888
- swarmpit_agent (global) - Running on all 3 nodes
- swarmpit_db (CouchDB) - Running on tpi-n2
- swarmpit_influxdb - Running on node-nas
- Status: Active with historical failures
- Issues: Multiple container failures in history (mostly resolved)
4. Dokploy-Managed Services
Running Services (via Dokploy Compose):
- ai-lobechat-yqvecg - AI chat interface
- bewcloud-memos-ssogxn - Note-taking app (⚠️ Restarting loop)
- bewcloud-silverbullet-42sjev - SilverBullet markdown editor + Watchtower
- cloud-bewcloud-u2pls5 - BewCloud instance with Radicale (CalDAV/CardDAV)
- cloud-fizzy-ezuhfq - Fizzy web app
- cloud-ironcalc-0id5k8 - IronCalc spreadsheet
- cloud-radicale-wqldcv - Standalone Radicale server
- cloud-uptimekuma-jdeivt - Uptime monitoring
- dns-technitum-6ojgo2 - Technitium DNS server
- gitea-giteasqlite-bhymqw - Git server (Port 3000, SSH on 2222)
- gitea-registry-vdftrt - Docker registry (Port 5000)
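The private registry on port 5000 is used with the usual tag-and-push sequence. The sketch below only prints the commands; the registry host and image name are assumptions, since the report does not state which node hosts the registry.

```shell
# Dry-run: print the commands for pushing an image to the private registry.
# REGISTRY host and IMAGE name are assumptions, not taken from the audit.
REGISTRY="192.168.2.130:5000"
IMAGE="myapp:latest"
echo "docker tag  $IMAGE $REGISTRY/$IMAGE"
echo "docker push $REGISTRY/$IMAGE"
```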
Dokploy Infrastructure Services:
- dokploy - Main Dokploy UI (Port 3000, host mode)
- dokploy-postgres - Dokploy database
- dokploy-redis - Dokploy cache
- dokploy-traefik - Reverse proxy (Ports 80, 443, 8080)
5. Standalone Services (docker-compose)
Running:
- technitium-dns - DNS server (Port 53, 5380)
- immich3-compose - Photo management (Immich v2.3.0)
- immich-server
- immich-machine-learning
- immich-database (pgvecto-rs)
- immich-redis
Stack Services:
- bendtstudio-pancake-bzgfpc - MariaDB database (Port 3306)
- bendtstudio-webstatic-iq9evl - Static web files (⚠️ Rollback paused state)
6. Issues Identified
🔴 Critical Issues:
- bewcloud-memos in Restart Loop
  - Container keeps restarting (last restart observed 24 seconds before the audit)
  - Status: Restarting (0) 24 seconds ago
  - Action Required: Check the logs and fix the configuration
- bendtstudio-webstatic in Rollback Paused State
  - Service is not updating properly
  - State: rollback_paused
  - Action Required: Investigate the update failure
- bendtstudio-app Not Running
  - Service has 0/0 replicas
  - Action Required: Determine whether the service is still needed, or remove it
- syncthing Stopped
  - Service has 0 replicas; it should be running on node-nas
  - Action Required: Restart it, or remove it if no longer needed
🟡 Warning Issues:
- Swarmpit Agent Failures (Historical)
  - Multiple past failures on all nodes
  - Currently running, but the failure history is concerning
  - Action Required: Monitor for stability
- No Monitoring of MinIO
  - MinIO is running, but no backup or monitoring strategy is documented
  - Action Required: Set up monitoring and backups
- Credential Management
  - Passwords are visible in service configs (bendtstudio-webstatic, MinIO, DNS)
  - Action Required: Migrate to Docker secrets or environment files
🟢 Informational:
- 13 Unused/Orphaned Volumes
  - 33 volumes total, only 20 active
  - Action Required: Prune unused volumes to reclaim ~595 MB
- Gitea Repository Status Unknown
  - Cannot verify that all compose files are version controlled
  - Action Required: Audit the Gitea repositories
7. Storage Configuration
Local Volumes (33 total):
Key volumes include:
- dokploy-postgres-database
- bewcloud-postgres-in40hh-data
- gitea-data, gitea-registry-data
- immich-postgres, immich-redis-data, immich-model-cache
- bendtstudio-pancake-data
- shared-data (NFS/shared)
- Various app-specific volumes
Bind Mounts:
- MinIO: /mnt/synology-data/minio → /data
- Syncthing: /mnt/synology-data → /var/syncthing (currently stopped)
- Dokploy: /etc/dokploy → /etc/dokploy
NFS Mounts:
- Synology NAS mounted at /mnt/synology-data/
- Contains: immich/, minio/
8. Networking
Overlay Networks:
- dokploy-network - Main Dokploy network
- minio_default - MinIO stack network
- swarmpit_net - Swarmpit monitoring network
- ingress - Docker Swarm ingress
Bridge Networks:
- Multiple app-specific networks created by compose:
  - ai-lobechat-yqvecg
  - bewcloud-memos-ssogxn
  - bewcloud-silverbullet-42sjev
  - cloud-fizzy-ezuhfq_default
  - cloud-uptimekuma-jdeivt
  - gitea-giteasqlite-bhymqw
  - gitea-registry-vdftrt
  - immich3-compose-ubyhe9_default
9. SSL/TLS Configuration
- Certificate Resolver: Let's Encrypt (ACME)
- Email: sirtimbly@gmail.com
- Challenge Type: HTTP-01
- Storage: /etc/dokploy/traefik/dynamic/acme.json
- Entry Points: web (80) → websecure (443) with auto-redirect
- HTTP/3: Enabled on websecure
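The TLS settings above correspond roughly to the following shape in traefik/traefik.yml. This is a sketch of Traefik v3 static configuration consistent with this section, not the actual file (the resolver name is an assumption):

```yaml
# Sketch: Traefik v3 static config matching the settings described above.
entryPoints:
  web:
    address: ":80"
    http:
      redirections:            # auto-redirect 80 -> 443
        entryPoint:
          to: websecure
          scheme: https
  websecure:
    address: ":443"
    http3: {}                  # HTTP/3 enabled on websecure
certificatesResolvers:
  letsencrypt:                 # resolver name assumed
    acme:
      email: sirtimbly@gmail.com
      storage: /etc/dokploy/traefik/dynamic/acme.json
      httpChallenge:           # HTTP-01 challenge
        entryPoint: web
```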
10. Traefik Routing
Configured Routes (via labels):
- gitea.bendtstudio.com → Gitea
- Multiple apps via traefik.me subdomains
- HTTP → HTTPS redirect enabled
- Middlewares configured in /etc/dokploy/traefik/dynamic/
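Routes such as gitea.bendtstudio.com are attached through service labels in the compose files. A hedged sketch of what such labels typically look like (router name, resolver name, and internal port mapping are assumptions based on the Gitea entry above):

```yaml
# Sketch: typical Traefik labels for the Gitea route (names assumed).
labels:
  - traefik.enable=true
  - traefik.http.routers.gitea.rule=Host(`gitea.bendtstudio.com`)
  - traefik.http.routers.gitea.entrypoints=websecure
  - traefik.http.routers.gitea.tls.certresolver=letsencrypt
  - traefik.http.services.gitea.loadbalancer.server.port=3000
```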
11. DNS Configuration
Technitium DNS:
- Port: 53 (TCP/UDP), 5380 (Web UI)
- Domain: dns.bendtstudio.com
- Admin Password: [REDACTED]
- Placement: Locked to tpi-n1
- TZ: America/New_York
Services using DNS:
- All services accessible via bendtstudio.com subdomains
- Internal DNS resolution for Docker services
12. Configuration Files Location
In /etc/dokploy/:
- traefik/traefik.yml - Main Traefik config
- traefik/dynamic/*.yml - Dynamic routes and middlewares
- compose/*/code/docker-compose.yml - Dokploy-managed compose files
In /home/ubuntu/:
- minio-stack.yml - MinIO stack definition
In local workspace:
- Various compose files (not all deployed via Dokploy)
- May be out of sync with running services
13. Missing Configuration in Version Control
Based on the analysis, the version-control status of key configurations (✅ tracked, ⚠️ uncertain, ❌ likely untracked):
- ✅ Gitea itself - compose file present
- ✅ MinIO - stack file in ~/minio-stack.yml
- ⚠️ Dokploy dynamic configs - traefik routes
- ⚠️ All Dokploy-managed compose files - 11 services
- ❌ Technitium DNS - compose file in /etc/dokploy/
- ❌ Immich - compose configuration
- ❌ Swarmpit - stack configuration
- ❌ Dokploy infrastructure - internal services
14. Resource Usage
Docker System:
- Images: 23 (10.91 GB)
- Containers: 26 (135 MB)
- Volumes: 33 (2.02 GB, ~595 MB reclaimable)
- Build Cache: 0
Node Resources:
- tpi-n1 & tpi-n2: 8 cores ARM64, 8GB RAM each
- node-nas: 2 cores x86_64, 8GB RAM
15. Recommendations
Immediate Actions (High Priority):
- Fix bewcloud-memos:
  docker service logs bewcloud-memos-ssogxn-memos --tail 50
- Fix bendtstudio-webstatic:
  docker service ps bendtstudio-webstatic-iq9evl --no-trunc
  docker service update --force bendtstudio-webstatic-iq9evl
- Restart or remove syncthing:
  # Option 1: scale it back up
  docker service scale syncthing=1
  # Option 2: remove it
  docker service rm syncthing
- Clean up unused volumes:
  docker volume prune
Short-term Actions (Medium Priority):
- Audit Gitea repositories
  - Access Gitea at http://gitea.bendtstudio.com
  - Verify which compose files are tracked
  - Commit missing configurations
- Secure credentials
  - Use Docker secrets for passwords
  - Move credentials to environment files
  - Never commit .env files with real passwords
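For the credential migration above, Docker secrets in a Swarm stack file look roughly like this. Everything here is a sketch: the secret and service names are hypothetical, and `_FILE`-suffixed environment variables are a convention recent MinIO releases support for reading values from files.

```yaml
# Sketch: moving the MinIO root password into a Docker secret.
services:
  minio:
    image: minio/minio
    environment:
      MINIO_ROOT_USER: admin                                     # hypothetical
      MINIO_ROOT_PASSWORD_FILE: /run/secrets/minio_root_password # read from secret
    secrets:
      - minio_root_password
secrets:
  minio_root_password:
    external: true  # created beforehand: docker secret create minio_root_password -
```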
- Set up automated backups
  - Back up the Dokploy database
  - Back up Gitea repositories
  - Back up MinIO data
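A minimal sketch of what such a backup job could look like. Container names, the internal Gitea user and paths, the backup directory, and the `mc` alias are all assumptions based on this report; the sketch writes the script out for review rather than running it.

```shell
# Write a hypothetical backup script to a file (review before using).
cat > backup-sketch.sh <<'EOF'
#!/bin/sh
set -e
BACKUP_DIR=/mnt/synology-data/backups   # assumed location on the NAS mount
mkdir -p "$BACKUP_DIR"
# Dump the Dokploy Postgres database (container name from this report).
docker exec dokploy-postgres pg_dumpall -U postgres > "$BACKUP_DIR/dokploy-$(date +%F).sql"
# Gitea's built-in dump (container name, user, and config path assumed).
docker exec -u git -w /data gitea gitea dump -c /data/gitea/conf/app.ini
# Mirror MinIO data with the mc client (alias "local" assumed configured).
mc mirror local/ "$BACKUP_DIR/minio/"
EOF
chmod +x backup-sketch.sh
```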
- Document all services
  - Create a README for each service
  - Document dependencies and data locations
  - Create a runbook for common operations
Long-term Actions (Low Priority):
- Implement proper monitoring
  - Prometheus/Grafana for metrics (mentioned in PLAN.md but not found)
  - Alerting for service failures
  - Disk usage monitoring
- Implement a GitOps workflow
  - All changes go through Git
  - Automated deployments via Dokploy webhooks
  - Configuration drift detection
- Consolidate the storage strategy
  - Define a clear policy for named volumes vs. bind mounts
  - Document backup procedures for each storage type
- Security audit
  - Review all exposed ports
  - Check for default or weak passwords
  - Implement network segmentation if needed
16. Next Steps Checklist
- Fix critical service issues (memos, webstatic)
- Document all running services with purpose
- Commit all compose files to Gitea
- Create backup strategy
- Set up monitoring and alerting
- Clean up unused resources
- Create disaster recovery plan
- Document SSH access for all nodes
Appendix A: Quick Commands Reference
# View cluster status
docker node ls
docker service ls
docker stack ls
# View service logs
docker service logs <service-name> --tail 100 -f
# View container logs
docker logs <container-name> --tail 100 -f
# Scale a service
docker service scale <service-name>=<replicas>
# Update a service
docker service update --force <service-name>
# SSH to nodes
ssh -i ~/.ssh/id_ed25519 ubuntu@192.168.2.130 # tpi-n1 (manager)
ssh -i ~/.ssh/id_ed25519 ubuntu@192.168.2.19 # tpi-n2 (worker)
# NAS node requires different credentials
# Access Dokploy UI
http://192.168.2.130:3000
# Access Swarmpit UI
http://192.168.2.130:888
# Access Traefik Dashboard
http://192.168.2.130:8080
End of Audit Report