Home Lab Cluster Audit Report

Date: February 9, 2026
Auditor: opencode
Cluster: Docker Swarm with Dokploy


1. Cluster Overview

  • Cluster Type: Docker Swarm (3 nodes)
  • Orchestration: Dokploy v3.x
  • Reverse Proxy: Traefik v3.6.1
  • DNS: Technitium DNS Server
  • Monitoring: Swarmpit
  • Git Server: Gitea v1.24.4
  • Object Storage: MinIO

2. Node Inventory

Node 1: tpi-n1 (Controller/Manager)

  • IP: 192.168.2.130
  • Role: Manager (Leader)
  • Architecture: aarch64 (ARM64)
  • OS: Linux
  • CPU: 8 cores
  • RAM: ~8 GB
  • Docker: v27.5.1
  • Labels:
    • infra=true
    • role=storage
    • storage=high
  • Status: Ready, Active

Node 2: tpi-n2 (Worker)

  • IP: 192.168.2.19
  • Role: Worker
  • Architecture: aarch64 (ARM64)
  • OS: Linux
  • CPU: 8 cores
  • RAM: ~8 GB
  • Docker: v27.5.1
  • Labels:
    • role=compute
  • Status: Ready, Active

Node 3: node-nas (Storage Worker)

  • IP: 192.168.2.18
  • Role: Worker (NAS/Storage)
  • Architecture: x86_64
  • OS: Linux
  • CPU: 2 cores
  • RAM: ~8 GB
  • Docker: v29.1.2
  • Labels:
    • type=nas
  • Status: Ready, Active
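
The inventory above mixes two aarch64 nodes with one x86_64 node, so any image scheduled without a placement constraint needs to be multi-arch. The following is a minimal sketch for confirming the architecture split; `arch_summary` is a hypothetical helper name, and the `docker node inspect` pipeline in the comment is assumed to be run on the manager.

```shell
#!/bin/sh
# Summarise CPU architectures from lines of "<hostname> <architecture>" on stdin.
# Prints one "<arch> <count>" line per architecture, sorted by name.
arch_summary() {
    awk 'NF >= 2 { count[$2]++ } END { for (a in count) print a, count[a] }' | sort
}

# Example against the live cluster (run on the manager, tpi-n1):
#   docker node ls -q \
#     | xargs docker node inspect \
#         --format '{{ .Description.Hostname }} {{ .Description.Platform.Architecture }}' \
#     | arch_summary
```

More than one line of output means single-architecture images should be pinned to matching nodes with a placement constraint.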

3. Docker Stacks (Swarm Mode)

Active Stacks:

1. minio

  • Services: 1 (minio_minio)
  • Status: Running
  • Node: node-nas (constrained to NAS)
  • Ports: 9000 (API), 9001 (Console)
  • Storage: /mnt/synology-data/minio (bind mount)
  • Credentials: [REDACTED - see service config]

2. swarmpit

  • Services: 4
    • swarmpit_app (UI) - Running on tpi-n1, Port 888
    • swarmpit_agent (global) - Running on all 3 nodes
    • swarmpit_db (CouchDB) - Running on tpi-n2
    • swarmpit_influxdb - Running on node-nas
  • Status: Active with historical failures
  • Issues: Multiple container failures in history (mostly resolved)

4. Dokploy-Managed Services

Running Services (via Dokploy Compose):

  1. ai-lobechat-yqvecg - AI chat interface
  2. bewcloud-memos-ssogxn - Note-taking app (⚠️ Restart loop)
  3. bewcloud-silverbullet-42sjev - SilverBullet markdown editor + Watchtower
  4. cloud-bewcloud-u2pls5 - BewCloud instance with Radicale (CalDAV/CardDAV)
  5. cloud-fizzy-ezuhfq - Fizzy web app
  6. cloud-ironcalc-0id5k8 - IronCalc spreadsheet
  7. cloud-radicale-wqldcv - Standalone Radicale server
  8. cloud-uptimekuma-jdeivt - Uptime monitoring
  9. dns-technitum-6ojgo2 - Technitium DNS server
  10. gitea-giteasqlite-bhymqw - Git server (Port 3000, SSH on 2222)
  11. gitea-registry-vdftrt - Docker registry (Port 5000)

Dokploy Infrastructure Services:

  • dokploy - Main Dokploy UI (Port 3000, host mode)
  • dokploy-postgres - Dokploy database
  • dokploy-redis - Dokploy cache
  • dokploy-traefik - Reverse proxy (Ports 80, 443, 8080)

5. Standalone Services (docker-compose)

Running:

  • technitium-dns - DNS server (Port 53, 5380)
  • immich3-compose - Photo management (Immich v2.3.0)
    • immich-server
    • immich-machine-learning
    • immich-database (pgvecto-rs)
    • immich-redis

Stack Services:

  • bendtstudio-pancake-bzgfpc - MariaDB database (Port 3306)
  • bendtstudio-webstatic-iq9evl - Static web files (⚠️ Rollback paused state)

6. Issues Identified

🔴 Critical Issues:

  1. bewcloud-memos in Restart Loop

    • Container is stuck in a restart loop
    • Last observed status: Restarting (0) 24 seconds ago (the 0 is the exit code, so the process appears to exit cleanly rather than crash)
    • Action Required: Check logs and fix configuration
  2. bendtstudio-webstatic in Rollback Paused State

    • Service is not updating properly
    • State: rollback_paused
    • Action Required: Investigate update failure
  3. bendtstudio-app Not Running

    • Service has 0/0 replicas
    • Action Required: Determine if needed or remove
  4. syncthing Stopped

    • Service has 0 replicas
    • Should be on node-nas
    • Action Required: Restart or remove if not needed
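
The zero-replica findings above (bendtstudio-app, syncthing) can be checked again with a small filter over `docker service ls`. This is a sketch only; `flag_replicas` is a hypothetical helper, and it covers Swarm services, not plain compose containers.

```shell
#!/bin/sh
# Reads lines of "<service> <running>/<desired>" on stdin and reports services
# that are scaled to zero or running fewer replicas than desired.
flag_replicas() {
    awk '{
        split($2, r, "/")
        if (r[2] + 0 == 0)            print $1, "scaled-to-zero"
        else if (r[1] + 0 < r[2] + 0) print $1, "degraded", $2
    }'
}

# Usage (on the manager):
#   docker service ls --format '{{.Name}} {{.Replicas}}' | flag_replicas
```

Containers started by compose outside Swarm do not appear in `docker service ls`; for those, `docker ps --filter status=restarting` on the hosting node lists restart loops.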

🟡 Warning Issues:

  1. Swarmpit Agent Failures (Historical)

    • Multiple past failures on all nodes
    • Currently running but concerning history
    • Action Required: Monitor for stability
  2. No Monitoring of MinIO

    • MinIO is running, but no backup or monitoring strategy is documented
    • Action Required: Set up monitoring and backup
  3. Credential Management

    • Passwords visible in service configs (bendtstudio-webstatic, MinIO, DNS)
    • Action Required: Migrate to Docker secrets or env files
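
To find the plaintext credentials noted above before migrating them, a rough scan of the compose files may help. This is a heuristic sketch: `find_plaintext_secrets` is a hypothetical helper, the key names it matches are assumptions, and it will also flag values that are only `${VAR}` references.

```shell
#!/bin/sh
# Print "file:line:content" for lines that assign a value to a password-like
# key in the given files. Keys ending in _FILE are skipped, since they
# conventionally point at a secret file rather than hold the secret itself.
find_plaintext_secrets() {
    grep -HnEi '(password|passwd|secret|token)[a-z0-9_]*[[:space:]]*[:=][[:space:]]*[^[:space:]]' "$@" \
        | grep -vi '_file[[:space:]]*[:=]'
}

# Usage (path pattern from section 12):
#   find_plaintext_secrets /etc/dokploy/compose/*/code/docker-compose.yml
```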

🟢 Informational:

  1. 13 Unused/Orphaned Volumes

    • 33 total volumes, only 20 active
    • Action Required: Clean up unused volumes to reclaim ~595MB
  2. Gitea Repository Status Unknown

    • Cannot verify if all compose files are version controlled
    • Action Required: Audit Gitea repositories

7. Storage Configuration

Local Volumes (33 total):

Key volumes include:

  • dokploy-postgres-database
  • bewcloud-postgres-in40hh-data
  • gitea-data, gitea-registry-data
  • immich-postgres, immich-redis-data, immich-model-cache
  • bendtstudio-pancake-data
  • shared-data (NFS/shared)
  • Various app-specific volumes

Bind Mounts:

  • MinIO: /mnt/synology-data/minio/data
  • Syncthing: /mnt/synology-data/var/syncthing (currently stopped)
  • Dokploy: /etc/dokploy:/etc/dokploy

NFS Mounts:

  • Synology NAS mounted at /mnt/synology-data/
  • Contains: immich/, minio/
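
Because MinIO and Syncthing bind-mount paths under the NFS share, a service that starts while the share is not mounted would write to the local disk instead. The sketch below checks for the mount first; `is_nfs_mounted` is a hypothetical helper that reads `/proc/mounts`-style input.

```shell
#!/bin/sh
# Succeeds if the given path appears as an NFS mount in /proc/mounts-style
# input on stdin (fields: device, mount point, filesystem type, options, ...).
is_nfs_mounted() {
    awk -v target="$1" '
        $2 == target && $3 ~ /^nfs/ { found = 1 }
        END { exit (found ? 0 : 1) }
    '
}

# Usage on the node that hosts the bind mounts:
#   is_nfs_mounted /mnt/synology-data < /proc/mounts \
#     || echo "NFS share is not mounted" >&2
```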

8. Networking

Overlay Networks:

  • dokploy-network - Main Dokploy network
  • minio_default - MinIO stack network
  • swarmpit_net - Swarmpit monitoring network
  • ingress - Docker Swarm ingress

Bridge Networks:

  • Multiple app-specific networks created by compose
  • ai-lobechat-yqvecg
  • bewcloud-memos-ssogxn
  • bewcloud-silverbullet-42sjev
  • cloud-fizzy-ezuhfq_default
  • cloud-uptimekuma-jdeivt
  • gitea-giteasqlite-bhymqw
  • gitea-registry-vdftrt
  • immich3-compose-ubyhe9_default

9. SSL/TLS Configuration

  • Certificate Resolver: Let's Encrypt (ACME)
  • Email: sirtimbly@gmail.com
  • Challenge Type: HTTP-01
  • Storage: /etc/dokploy/traefik/dynamic/acme.json
  • Entry Points: web (80) → websecure (443) with auto-redirect
  • HTTP/3: Enabled on websecure
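
Traefik refuses to use an ACME storage file whose permissions are more open than 600, so the mode of acme.json is worth checking whenever the file is restored or copied. A minimal sketch, assuming GNU `stat` (as on the Linux nodes here); `check_acme_perms` is a hypothetical helper.

```shell
#!/bin/sh
# Succeeds if the given file has mode 600; otherwise prints a warning to
# stderr and returns 1. Returns 2 if the file cannot be read by stat.
check_acme_perms() {
    mode=$(stat -c '%a' "$1") || return 2
    if [ "$mode" = "600" ]; then
        echo "ok: $1 is mode 600"
    else
        echo "warn: $1 is mode $mode, expected 600" >&2
        return 1
    fi
}

# Usage:
#   check_acme_perms /etc/dokploy/traefik/dynamic/acme.json
```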

10. Traefik Routing

Configured Routes (via labels):

  • gitea.bendtstudio.com → Gitea
  • Multiple apps via traefik.me subdomains
  • HTTP → HTTPS redirect enabled
  • Middlewares configured in /etc/dokploy/traefik/dynamic/

11. DNS Configuration

Technitium DNS:

  • Port: 53 (TCP/UDP), 5380 (Web UI)
  • Domain: dns.bendtstudio.com
  • Admin Password: [REDACTED]
  • Placement: Locked to tpi-n1
  • TZ: America/New_York

Services using DNS:

  • All services accessible via bendtstudio.com subdomains
  • Internal DNS resolution for Docker services

12. Configuration Files Location

In /etc/dokploy/:

  • traefik/traefik.yml - Main Traefik config
  • traefik/dynamic/*.yml - Dynamic routes and middlewares
  • compose/*/code/docker-compose.yml - Dokploy-managed compose files

In /home/ubuntu/:

  • minio-stack.yml - MinIO stack definition

In local workspace:

  • Various compose files (not all deployed via Dokploy)
  • May be out of sync with running services

13. Missing Configuration in Version Control

Based on the analysis, the following may NOT be properly tracked in Gitea:

  1. Gitea itself - compose file present
  2. MinIO - stack file in ~/minio-stack.yml
  3. ⚠️ Dokploy dynamic configs - traefik routes
  4. ⚠️ All Dokploy-managed compose files - 11 services
  5. Technitium DNS - compose file in /etc/dokploy/
  6. Immich - compose configuration
  7. Swarmpit - stack configuration
  8. Dokploy infrastructure - internal services
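
One way to close this gap is to copy the live configuration into a Git working tree while keeping the directory layout. The sketch below assumes the paths from section 12; `collect_configs` and the `~/homelab-config` checkout are hypothetical.

```shell
#!/bin/sh
# Copy every file under SRC whose name matches PATTERN into DEST, preserving
# the directory layout relative to SRC.
collect_configs() {
    src=$1 dest=$2 pattern=$3
    ( cd "$src" && find . -type f -name "$pattern" ) | while IFS= read -r f; do
        mkdir -p "$dest/$(dirname "$f")"
        cp "$src/$f" "$dest/$f"
    done
}

# Usage:
#   collect_configs /etc/dokploy/compose ~/homelab-config/compose 'docker-compose.yml'
#   collect_configs /etc/dokploy/traefik ~/homelab-config/traefik '*.yml'
```

The `*.yml` pattern leaves out acme.json, which holds private keys. The compose files should be scrubbed of the plaintext passwords noted in section 6 before they are committed.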

14. Resource Usage

Docker System:

  • Images: 23 (10.91 GB)
  • Containers: 26 (135 MB)
  • Volumes: 33 (2.02 GB, 595MB reclaimable)
  • Build Cache: 0

Node Resources:

  • tpi-n1 & tpi-n2: 8 cores ARM64, 8GB RAM each
  • node-nas: 2 cores x86_64, 8GB RAM

15. Recommendations

Immediate Actions (High Priority):

  1. Fix bewcloud-memos

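    # memos is deployed via Dokploy Compose; if no Swarm service by this name exists, use `docker logs <container-name> --tail 50`
    docker service logs bewcloud-memos-ssogxn-memos --tail 50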
    
  2. Fix bendtstudio-webstatic

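    # Inspect the task history and update status before forcing a redeploy
    docker service ps bendtstudio-webstatic-iq9evl --no-trunc
    docker service inspect --format '{{.UpdateStatus.State}}: {{.UpdateStatus.Message}}' bendtstudio-webstatic-iq9evl
    docker service update --force bendtstudio-webstatic-iq9evl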
    
  3. Restart or Remove syncthing

    # Option 1: Scale up
    docker service scale syncthing=1
    
    # Option 2: Remove
    docker service rm syncthing
    
  4. Clean up unused volumes

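    # On Docker 23+ this removes only anonymous volumes; add -a to include unused named volumes (review `docker volume ls -f dangling=true` first)
    docker volume prune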
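
Volumes that belong to stopped services such as syncthing or bendtstudio-app count as unused and would be removed along with the real orphans. The sketch below filters a candidate list before anything is deleted; `prune_candidates` is a hypothetical helper and the substrings passed to it are examples.

```shell
#!/bin/sh
# Reads volume names on stdin and drops any name containing one of the given
# substrings. What remains is the list that looks safe to remove.
prune_candidates() {
    awk -v keep="$*" '
        BEGIN { n = split(keep, k, " ") }
        { for (i = 1; i <= n; i++) if (index($0, k[i])) next; print }
    '
}

# Usage (volumes are local to each node, so repeat on every node):
#   docker volume ls -qf dangling=true | prune_candidates syncthing bendtstudio
# After reviewing the list, pipe it to: xargs -r docker volume rm
```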
    

Short-term Actions (Medium Priority):

  1. Audit Gitea repositories

  2. Secure credentials

    • Use Docker secrets for passwords
    • Move credentials to environment files
    • Never commit .env files with real passwords
  3. Set up automated backups

    • Back up Dokploy database
    • Back up Gitea repositories
    • Back up MinIO data
  4. Document all services

    • Create README for each service
    • Document dependencies and data locations
    • Create runbook for common operations
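
For the credential item above, moving one password into a Docker secret could look like the following. It is a sketch under assumptions: `run` and `attach_secret` are hypothetical helpers, commands are only printed unless `DRY_RUN=0` is set, and Docker secrets apply to Swarm services only, not to plain compose containers.

```shell
#!/bin/sh
# Echo the command when DRY_RUN is 1 (the default); execute it otherwise.
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# Create a Swarm secret from stdin and attach it to a service.
# Arguments: <secret-name> <service-name>. The value is read from stdin so it
# does not appear in the process list or shell history.
attach_secret() {
    secret=$1 service=$2
    run docker secret create "$secret" - &&
    run docker service update --secret-add "$secret" "$service"
}

# Usage (minio_root_password is a hypothetical secret name):
#   DRY_RUN=0 attach_secret minio_root_password minio_minio < /path/to/password-file
```

The secret is mounted inside the container at `/run/secrets/<secret-name>`. The service then has to read it from that file, which many images support through a `*_FILE` environment variable; check each image's documentation.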

Long-term Actions (Low Priority):

  1. Implement proper monitoring

    • Prometheus/Grafana for metrics (mentioned in PLAN.md but not found running on the cluster)
    • Alerting for service failures
    • Disk usage monitoring
  2. Implement GitOps workflow

    • All changes through Git
    • Automated deployments via Dokploy webhooks
    • Configuration drift detection
  3. Consolidate storage strategy

    • Define clear policy for volumes vs bind mounts
    • Document backup procedures for each storage type
  4. Security audit

    • Review all exposed ports
    • Check for default/weak passwords
    • Implement network segmentation if needed
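
Until a full monitoring stack exists, the disk usage item above can be approximated with a cron-friendly filter over `df -P`. This is a minimal sketch; `disk_alerts` is a hypothetical helper and the 80% default threshold is an assumption.

```shell
#!/bin/sh
# Reads `df -P` output on stdin and prints "<mount point> <capacity>" for
# every filesystem at or above the given usage percentage (default 80).
disk_alerts() {
    awk -v limit="${1:-80}" 'NR > 1 {
        use = $5
        sub(/%/, "", use)
        if (use + 0 >= limit + 0) print $6, $5
    }'
}

# Usage:
#   df -P | disk_alerts 85
```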

16. Next Steps Checklist

  • Fix critical service issues (memos, webstatic)
  • Document all running services with purpose
  • Commit all compose files to Gitea
  • Create backup strategy
  • Set up monitoring and alerting
  • Clean up unused resources
  • Create disaster recovery plan
  • Document SSH access for all nodes

Appendix A: Quick Commands Reference

# View cluster status
docker node ls
docker service ls
docker stack ls

# View service logs
docker service logs <service-name> --tail 100 -f

# View container logs
docker logs <container-name> --tail 100 -f

# Scale a service
docker service scale <service-name>=<replicas>

# Update a service
docker service update --force <service-name>

# SSH to nodes
ssh -i ~/.ssh/id_ed25519 ubuntu@192.168.2.130  # tpi-n1 (manager)
ssh -i ~/.ssh/id_ed25519 ubuntu@192.168.2.19   # tpi-n2 (worker)
# NAS node requires different credentials

# Access Dokploy UI
http://192.168.2.130:3000

# Access Swarmpit UI
http://192.168.2.130:888

# Access Traefik Dashboard
http://192.168.2.130:8080

End of Audit Report