# Home Lab Cluster Audit Report

**Date:** February 9, 2026
**Auditor:** opencode
**Cluster:** Docker Swarm with Dokploy

---

## 1. Cluster Overview

- **Cluster Type:** Docker Swarm (3 nodes)
- **Orchestration:** Dokploy v3.x
- **Reverse Proxy:** Traefik v3.6.1
- **DNS:** Technitium DNS Server
- **Monitoring:** Swarmpit
- **Git Server:** Gitea v1.24.4
- **Object Storage:** MinIO

---

## 2. Node Inventory

### Node 1: tpi-n1 (Controller/Manager)
- **IP:** 192.168.2.130
- **Role:** Manager (Leader)
- **Architecture:** aarch64 (ARM64)
- **OS:** Linux
- **CPU:** 8 cores
- **RAM:** ~8 GB
- **Docker:** v27.5.1
- **Labels:**
  - `infra=true`
  - `role=storage`
  - `storage=high`
- **Status:** Ready, Active

### Node 2: tpi-n2 (Worker)
- **IP:** 192.168.2.19
- **Role:** Worker
- **Architecture:** aarch64 (ARM64)
- **OS:** Linux
- **CPU:** 8 cores
- **RAM:** ~8 GB
- **Docker:** v27.5.1
- **Labels:**
  - `role=compute`
- **Status:** Ready, Active

### Node 3: node-nas (Storage Worker)
- **IP:** 192.168.2.18
- **Role:** Worker (NAS/Storage)
- **Architecture:** x86_64
- **OS:** Linux
- **CPU:** 2 cores
- **RAM:** ~8 GB
- **Docker:** v29.1.2
- **Labels:**
  - `type=nas`
- **Status:** Ready, Active
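
The labels listed above drive placement constraints, so it can help to confirm them against the live cluster. The following is a minimal sketch, assuming it runs on the manager node (tpi-n1); the function name `list_node_labels` is illustrative and not part of the audited setup.

```bash
# Print each Swarm node followed by its labels as JSON.
# Assumption: run on a manager node, where `docker node` commands are allowed.
list_node_labels() {
  docker node ls --format '{{.Hostname}}' | while read -r node; do
    printf '%s: %s\n' "$node" \
      "$(docker node inspect "$node" --format '{{json .Spec.Labels}}')"
  done
}

# Usage: list_node_labels
```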

---

## 3. Docker Stacks (Swarm Mode)

### Active Stacks:

#### 1. minio
- **Services:** 1 (minio_minio)
- **Status:** Running
- **Node:** node-nas (constrained to NAS)
- **Ports:** 9000 (API), 9001 (Console)
- **Storage:** /mnt/synology-data/minio (bind mount)
- **Credentials:** [REDACTED - see service config]

#### 2. swarmpit
- **Services:** 4
  - swarmpit_app (UI) - Running on tpi-n1, Port 888
  - swarmpit_agent (global) - Running on all 3 nodes
  - swarmpit_db (CouchDB) - Running on tpi-n2
  - swarmpit_influxdb - Running on node-nas
- **Status:** Active with historical failures
- **Issues:** Multiple container failures in history (mostly resolved)
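
To look at the historical failures mentioned above, the task history of a service can be filtered to tasks that Swarm no longer intends to run. This is a sketch rather than the procedure used during the audit; `failed_tasks` is a hypothetical helper name, and the service name in the usage line is taken from the stack listing above.

```bash
# Show tasks of a service whose desired state is "shutdown" (failed, stopped,
# or replaced), together with the error Swarm recorded for each.
failed_tasks() {
  docker service ps "$1" --no-trunc \
    --filter 'desired-state=shutdown' \
    --format '{{.Name}} | {{.Node}} | {{.CurrentState}} | {{.Error}}'
}

# Usage: failed_tasks swarmpit_agent
```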

---

## 4. Dokploy-Managed Services

### Running Services (via Dokploy Compose):

1. **ai-lobechat-yqvecg** - AI chat interface
2. **bewcloud-memos-ssogxn** - Note-taking app (⚠️ restart loop)
3. **bewcloud-silverbullet-42sjev** - SilverBullet markdown editor + Watchtower
4. **cloud-bewcloud-u2pls5** - BewCloud instance with Radicale (CalDAV/CardDAV)
5. **cloud-fizzy-ezuhfq** - Fizzy web app
6. **cloud-ironcalc-0id5k8** - IronCalc spreadsheet
7. **cloud-radicale-wqldcv** - Standalone Radicale server
8. **cloud-uptimekuma-jdeivt** - Uptime monitoring
9. **dns-technitum-6ojgo2** - Technitium DNS server
10. **gitea-giteasqlite-bhymqw** - Git server (Port 3000, SSH on 2222)
11. **gitea-registry-vdftrt** - Docker registry (Port 5000)

### Dokploy Infrastructure Services:
- **dokploy** - Main Dokploy UI (Port 3000, host mode)
- **dokploy-postgres** - Dokploy database
- **dokploy-redis** - Dokploy cache
- **dokploy-traefik** - Reverse proxy (Ports 80, 443, 8080)

---

## 5. Standalone Services (docker-compose)

### Running:
- **technitium-dns** - DNS server (Ports 53, 5380)
- **immich3-compose** - Photo management (Immich v2.3.0)
  - immich-server
  - immich-machine-learning
  - immich-database (pgvecto-rs)
  - immich-redis

### Stack Services:
- **bendtstudio-pancake-bzgfpc** - MariaDB database (Port 3306)
- **bendtstudio-webstatic-iq9evl** - Static web files (⚠️ `rollback_paused` state)

---

## 6. Issues Identified

### 🔴 Critical Issues:

1. **bewcloud-memos in Restart Loop**
   - Container keeps restarting (last restart seen 24 seconds before the check)
   - Status: `Restarting (0) 24 seconds ago` (the `0` is the exit code of the last run)
   - **Action Required:** Check logs and fix configuration

2. **bendtstudio-webstatic in Rollback Paused State**
   - Service is not updating properly
   - State: `rollback_paused`
   - **Action Required:** Investigate update failure

3. **bendtstudio-app Not Running**
   - Service has 0/0 replicas
   - **Action Required:** Determine if needed or remove

4. **syncthing Stopped**
   - Service has 0 replicas
   - Should be on node-nas
   - **Action Required:** Restart or remove if not needed
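
Services that are scaled to zero, such as the two above, can be found in one pass by filtering the replica column of `docker service ls`. A minimal sketch, assuming it runs on a manager node; `zero_replica_services` is an illustrative name.

```bash
# List services whose running task count is zero, e.g. "0/0" or "0/1".
zero_replica_services() {
  docker service ls --format '{{.Name}} {{.Replicas}}' |
    awk '$2 ~ /^0\// { print $1 }'
}

# Usage: zero_replica_services
```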

### 🟡 Warning Issues:

5. **Swarmpit Agent Failures (Historical)**
   - Multiple past failures on all nodes
   - Currently running, but the failure history is a concern
   - **Action Required:** Monitor for stability

6. **No Monitoring of MinIO**
   - MinIO is running, but no backup or monitoring strategy is documented
   - **Action Required:** Set up monitoring and backup

7. **Credential Management**
   - Passwords visible in service configs (bendtstudio-webstatic, MinIO, DNS)
   - **Action Required:** Migrate to Docker secrets or env files
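
For the migration to Docker secrets, the general shape is to create the secret once and then attach it to the service, which makes it readable inside the container at `/run/secrets/<name>`. The sketch below uses hypothetical helper names and a hypothetical secret name; whether each image can read its password from a file is an assumption to verify per service.

```bash
# Create a Swarm secret from standard input, so the value is not passed as a
# command-line argument.
create_secret() {
  docker secret create "$1" -
}

# Attach an existing secret to a service; it is mounted at /run/secrets/<name>.
attach_secret() {
  docker service update --secret-add "source=$1,target=$1" "$2"
}

# Usage (the secret name is a placeholder):
#   printf '%s' "$NEW_PASSWORD" | create_secret minio_root_password
#   attach_secret minio_root_password minio_minio
```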

### 🟢 Informational:

8. **13 Unused/Orphaned Volumes**
   - 33 total volumes, only 20 active
   - **Action Required:** Clean up unused volumes to reclaim ~595 MB

9. **Gitea Repository Status Unknown**
   - Cannot verify if all compose files are version controlled
   - **Action Required:** Audit Gitea repositories

---

## 7. Storage Configuration

### Local Volumes (33 total):
Key volumes include:
- `dokploy-postgres-database`
- `bewcloud-postgres-in40hh-data`
- `gitea-data`, `gitea-registry-data`
- `immich-postgres`, `immich-redis-data`, `immich-model-cache`
- `bendtstudio-pancake-data`
- `shared-data` (NFS/shared)
- Various app-specific volumes

### Bind Mounts:
- **MinIO:** `/mnt/synology-data/minio` → `/data`
- **Syncthing:** `/mnt/synology-data` → `/var/syncthing` (currently stopped)
- **Dokploy:** `/etc/dokploy` → `/etc/dokploy`

### NFS Mounts:
- Synology NAS mounted at `/mnt/synology-data/`
  - Contains: immich/, minio/
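
Because the MinIO and Syncthing bind mounts sit on top of the NFS mount, a service started while the NAS is not mounted could end up writing to the node's local disk instead. A small guard like the one below can be run before starting such services. It is a sketch that assumes the `mountpoint` utility is installed on the node; `require_mount` is an illustrative name.

```bash
# Succeed only if the given path is an active mount point.
require_mount() {
  if mountpoint -q "$1"; then
    echo "mounted: $1"
  else
    echo "NOT mounted: $1" >&2
    return 1
  fi
}

# Usage: require_mount /mnt/synology-data && docker service scale syncthing=1
```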

---

## 8. Networking

### Overlay Networks:
- `dokploy-network` - Main Dokploy network
- `minio_default` - MinIO stack network
- `swarmpit_net` - Swarmpit monitoring network
- `ingress` - Docker Swarm ingress

### Bridge Networks:
- Multiple app-specific networks created by compose
  - `ai-lobechat-yqvecg`
  - `bewcloud-memos-ssogxn`
  - `bewcloud-silverbullet-42sjev`
  - `cloud-fizzy-ezuhfq_default`
  - `cloud-uptimekuma-jdeivt`
  - `gitea-giteasqlite-bhymqw`
  - `gitea-registry-vdftrt`
  - `immich3-compose-ubyhe9_default`

---

## 9. SSL/TLS Configuration

- **Certificate Resolver:** Let's Encrypt (ACME)
- **Email:** sirtimbly@gmail.com
- **Challenge Type:** HTTP-01
- **Storage:** `/etc/dokploy/traefik/dynamic/acme.json`
- **Entry Points:** web (80) → websecure (443) with auto-redirect
- **HTTP/3:** Enabled on websecure
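
Traefik rejects an ACME storage file whose permissions are more open than `600`, so the mode of `acme.json` is worth checking whenever the file is restored or copied. The check below is a sketch that assumes GNU `stat` (as found on the Linux nodes); `check_acme_perms` is an illustrative name, and the audit itself did not record the file's current mode.

```bash
# Report whether a file has exactly mode 600 (owner read/write only).
check_acme_perms() {
  acme_mode="$(stat -c '%a' "$1")" || return 2
  if [ "$acme_mode" = "600" ]; then
    echo "ok: $1 has mode 600"
  else
    echo "warning: $1 has mode $acme_mode, expected 600" >&2
    return 1
  fi
}

# Usage: check_acme_perms /etc/dokploy/traefik/dynamic/acme.json
```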

---

## 10. Traefik Routing

### Configured Routes (via labels):
- gitea.bendtstudio.com → Gitea
- Multiple apps via traefik.me subdomains
- HTTP → HTTPS redirect enabled
- Middlewares configured in `/etc/dokploy/traefik/dynamic/`

---

## 11. DNS Configuration

### Technitium DNS:
- **Port:** 53 (TCP/UDP), 5380 (Web UI)
- **Domain:** dns.bendtstudio.com
- **Admin Password:** [REDACTED]
- **Placement:** Locked to tpi-n1
- **TZ:** America/New_York

### Services using DNS:
- All services accessible via bendtstudio.com subdomains
- Internal DNS resolution for Docker services

---

## 12. Configuration Files Location

### In `/etc/dokploy/`:
- `traefik/traefik.yml` - Main Traefik config
- `traefik/dynamic/*.yml` - Dynamic routes and middlewares
- `compose/*/code/docker-compose.yml` - Dokploy-managed compose files

### In `/home/ubuntu/`:
- `minio-stack.yml` - MinIO stack definition

### In local workspace:
- Various compose files (not all deployed via Dokploy)
- May be out of sync with running services

---

## 13. Missing Configuration in Version Control

Based on the analysis, the version-control status of each configuration in Gitea appears to be as follows (✅ file present, ⚠️ could not be verified, ❌ likely not tracked):

1. ✅ **Gitea** itself - compose file present
2. ✅ **MinIO** - stack file in ~/minio-stack.yml
3. ⚠️ **Dokploy dynamic configs** - traefik routes
4. ⚠️ **All Dokploy-managed compose files** - 11 services
5. ❌ **Technitium DNS** - compose file in /etc/dokploy/
6. ❌ **Immich** - compose configuration
7. ❌ **Swarmpit** - stack configuration
8. ❌ **Dokploy infrastructure** - internal services
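
One way to check the Dokploy-managed compose files against Git is to build a checksum manifest on the node and compare it with one built from a repository checkout. The sketch below relies on the `compose/*/code/docker-compose.yml` layout listed in Section 12; `compose_manifest` is an illustrative name, and the comparison step is left to the reader.

```bash
# Print "<sha256>  <path>" for every Dokploy-managed compose file under a
# root directory (defaults to /etc/dokploy/compose), sorted by path.
compose_manifest() {
  find "${1:-/etc/dokploy/compose}" -type f -path '*/code/docker-compose.yml' \
    -exec sha256sum {} + | sort -k 2
}

# Usage: compose_manifest > node-manifest.txt
```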

---

## 14. Resource Usage

### Docker System:
- **Images:** 23 (10.91 GB)
- **Containers:** 26 (135 MB)
- **Volumes:** 33 (2.02 GB, 595 MB reclaimable)
- **Build Cache:** 0

### Node Resources:
- **tpi-n1 & tpi-n2:** 8 cores ARM64, 8 GB RAM each
- **node-nas:** 2 cores x86_64, 8 GB RAM

---

## 15. Recommendations

### Immediate Actions (High Priority):

1. **Fix bewcloud-memos**
   ```bash
   docker service logs bewcloud-memos-ssogxn-memos --tail 50
   ```

2. **Fix bendtstudio-webstatic**
   ```bash
   docker service ps bendtstudio-webstatic-iq9evl --no-trunc
   docker service update --force bendtstudio-webstatic-iq9evl
   ```

3. **Restart or Remove syncthing**
   ```bash
   # Option 1: Scale up
   docker service scale syncthing=1

   # Option 2: Remove
   docker service rm syncthing
   ```

4. **Clean up unused volumes**
   ```bash
   # Docker 23.0+ prunes only anonymous volumes by default; --all includes unused named volumes
   docker volume prune --all
   ```
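
Before pruning, it is safer to review which volumes would be affected, since a prune cannot be undone. The following sketch lists volumes that no container on the current node references; `unused_volumes` is an illustrative name. Volumes are local to each node, so the check would need to be repeated on tpi-n1, tpi-n2, and node-nas.

```bash
# List volumes on the current node that are not referenced by any container.
unused_volumes() {
  docker volume ls --filter 'dangling=true' --format '{{.Name}}'
}

# Usage: unused_volumes | sort
```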

### Short-term Actions (Medium Priority):

5. **Audit Gitea repositories**
   - Access Gitea at http://gitea.bendtstudio.com
   - Verify which compose files are tracked
   - Commit missing configurations

6. **Secure credentials**
   - Use Docker secrets for passwords
   - Move credentials to environment files
   - Never commit .env files with real passwords

7. **Set up automated backups**
   - Back up Dokploy database
   - Back up Gitea repositories
   - Back up MinIO data

8. **Document all services**
   - Create README for each service
   - Document dependencies and data locations
   - Create runbook for common operations
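
As a starting point for the Dokploy database backup, a dump can be taken from the running PostgreSQL container with `pg_dump`. This is a sketch under stated assumptions: it must run on the node that hosts the container, `dump_postgres` is an illustrative name, and the database user and database name are placeholders that need to be looked up in the service configuration.

```bash
# Dump a PostgreSQL database from the first running container whose name
# matches a filter. Arguments: <name-filter> <db-user> <db-name>.
dump_postgres() {
  cid="$(docker ps -q --filter "name=$1" | head -n 1)"
  if [ -z "$cid" ]; then
    echo "no running container matches $1" >&2
    return 1
  fi
  docker exec "$cid" pg_dump -U "$2" "$3"
}

# Usage (user and database names are placeholders):
#   dump_postgres dokploy-postgres <db-user> <db-name> > dokploy-backup.sql
```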

### Long-term Actions (Low Priority):

9. **Implement proper monitoring**
   - Prometheus/Grafana for metrics (mentioned in PLAN.md but not found)
   - Alerting for service failures
   - Disk usage monitoring

10. **Implement GitOps workflow**
    - All changes through Git
    - Automated deployments via Dokploy webhooks
    - Configuration drift detection

11. **Consolidate storage strategy**
    - Define clear policy for volumes vs bind mounts
    - Document backup procedures for each storage type

12. **Security audit**
    - Review all exposed ports
    - Check for default/weak passwords
    - Implement network segmentation if needed

---

## 16. Next Steps Checklist

- [ ] Fix critical service issues (memos, webstatic)
- [ ] Document all running services with purpose
- [ ] Commit all compose files to Gitea
- [ ] Create backup strategy
- [ ] Set up monitoring and alerting
- [ ] Clean up unused resources
- [ ] Create disaster recovery plan
- [ ] Document SSH access for all nodes

---

## Appendix A: Quick Commands Reference

```bash
# View cluster status
docker node ls
docker service ls
docker stack ls

# View service logs
docker service logs <service-name> --tail 100 -f

# View container logs
docker logs <container-name> --tail 100 -f

# Scale a service
docker service scale <service-name>=<replicas>

# Update a service
docker service update --force <service-name>

# SSH to nodes
ssh -i ~/.ssh/id_ed25519 ubuntu@192.168.2.130  # tpi-n1 (manager)
ssh -i ~/.ssh/id_ed25519 ubuntu@192.168.2.19   # tpi-n2 (worker)
# NAS node requires different credentials

# Web UIs (open in a browser; these are URLs, not shell commands)
# Dokploy UI:        http://192.168.2.130:3000
# Swarmpit UI:       http://192.168.2.130:888
# Traefik Dashboard: http://192.168.2.130:8080
```

---

*End of Audit Report*