# Home Lab Action Plan

## Phase 1: Critical Fixes (Do This Week)

### 1.1 Fix Failing Services

**bewcloud-memos (Restarting Loop)**

```bash
# SSH to controller
ssh -i ~/.ssh/id_ed25519 ubuntu@192.168.2.130

# Check what's wrong
docker service logs bewcloud-memos-ssogxn-memos --tail 100

# Common fixes:
# If database connection issue:
docker service update --env-add "MEMOS_DB_HOST=correct-hostname" bewcloud-memos-ssogxn-memos

# If it keeps failing, try recreating:
docker service rm bewcloud-memos-ssogxn-memos
# Then redeploy via Dokploy UI
```

**bendtstudio-webstatic (Rollback Paused)**

```bash
# Check the error
docker service ps bendtstudio-webstatic-iq9evl --no-trunc

# Force update to retry
docker service update --force bendtstudio-webstatic-iq9evl

# If that fails, inspect the image
docker service inspect bendtstudio-webstatic-iq9evl --format '{{.Spec.TaskTemplate.ContainerSpec.Image}}'
```

**syncthing (Stopped)**

```bash
# Option A: Start it if you need it
docker service scale syncthing=1

# Option B: Remove it if not needed
docker service rm syncthing
# Also remove the volume if no longer needed
docker volume rm cloud-syncthing-i2rpwr_syncthing_config
```

### 1.2 Clean Up Unused Resources

```bash
# Remove unused volumes (reclaim ~595MB)
docker volume prune

# Remove unused images
docker image prune -a

# System-wide cleanup
# Caution: volumes of stopped services (e.g. syncthing at 0 replicas) count as
# unused and may be removed. Decide on Option A/B above before running this.
docker system prune -a --volumes
```

### 1.3 Document Current State

Take screenshots of:

- Dokploy UI (all projects)
- Swarmpit dashboard
- Traefik dashboard (http://192.168.2.130:8080)
- MinIO console (http://192.168.2.18:9001)
- Gitea repositories

---

## Phase 2: Configuration Backup (Do This Week)

### 2.1 Create Git Repository for Infrastructure

```bash
# On the controller node:
ssh -i ~/.ssh/id_ed25519 ubuntu@192.168.2.130

# Create a backup directory
mkdir -p ~/infrastructure-backup/$(date +%Y-%m-%d)
cd ~/infrastructure-backup/$(date +%Y-%m-%d)

# Copy all compose files
cp -r /etc/dokploy/compose ./dokploy-compose
cp -r /etc/dokploy/traefik ./traefik-config
cp ~/minio-stack.yml ./

# Export service configs
mkdir -p ./service-configs
docker service ls -q | while read -r service; do
  docker service inspect "$service" > "./service-configs/${service}.json"
done

# Export stack configs
# (docker stack ls has no -q flag, so print names via --format)
docker stack ls --format '{{.Name}}' | while read -r stack; do
  docker stack ps "$stack" > "./service-configs/${stack}-tasks.txt"
done

# Create a summary
cat > README.txt << EOF
Infrastructure Backup - $(date)

Cluster: Docker Swarm with Dokploy
Nodes: 3 (tpi-n1, tpi-n2, node-nas)
Services: $(docker service ls -q | wc -l) services
Stacks: $(docker stack ls --format '{{.Name}}' | wc -l) stacks

See HOMELAB_AUDIT.md for full documentation.
EOF

# Create tar archive
cd ..
tar -czf "infrastructure-$(date +%Y-%m-%d).tar.gz" "$(date +%Y-%m-%d)"
```

### 2.2 Commit to Gitea

```bash
# Clone your infrastructure repo (create if needed)
# Replace with your actual Gitea URL
git clone http://gitea.bendtstudio.com:3000/sirtimbly/homelab-configs.git
cd homelab-configs

# Copy backed up configs
cp -r ~/infrastructure-backup/$(date +%Y-%m-%d)/* .

# Organize by service
mkdir -p {stacks,compose,dokploy,traefik,docs}
mv dokploy-compose/* compose/ 2>/dev/null || true
mv traefik-config/* traefik/ 2>/dev/null || true
mv minio-stack.yml stacks/
mv service-configs/* docs/ 2>/dev/null || true

# Note: the service inspection exports include each service's environment
# variables, which may contain plaintext passwords (see Phase 3.1).
# Review docs/ before pushing.

# Commit
git add .
git commit -m "Initial infrastructure backup - $(date +%Y-%m-%d)

- All Dokploy compose files
- Traefik configuration
- MinIO stack definition
- Service inspection exports
- Task history exports

Services backed up:
$(docker service ls --format '- {{.Name}}' | sort)"

git push origin main
```

---

## Phase 3: Security Hardening (Do Next Week)

### 3.1 Remove Exposed Credentials

**Problem:** Services have passwords in environment variables, which are visible in Docker configs.

**Solution:** Use Docker secrets or Dokploy environment variables.

```bash
# Example: Securing MinIO
# Instead of having the password in the compose file, use a Docker secret.
# printf avoids the trailing newline that echo would store in the secret.
printf '%s' "your-minio-password" | docker secret create minio_root_password -

# Then in compose:
# environment:
#   MINIO_ROOT_PASSWORD_FILE: /run/secrets/minio_root_password
# secrets:
#   - minio_root_password
```

**Action items:**

1. List all services with exposed passwords:

   ```bash
   docker service ls -q | xargs -I {} docker service inspect {} --format '{{.Spec.Name}}: {{range .Spec.TaskTemplate.ContainerSpec.Env}}{{.}} {{end}}' | grep -i password
   ```

2. For each service, create a plan to move credentials to:
   - Docker secrets (best for swarm)
   - Environment files (easier to manage)
   - Dokploy UI environment variables
3. Update compose files and redeploy

### 3.2 Update Default Passwords

Check for default/weak passwords:

- Dokploy (if still default)
- MinIO
- Gitea admin
- Technitium DNS
- Any databases

### 3.3 Review Exposed Ports

```bash
# Check all published ports
docker service ls --format '{{.Name}}: {{.Ports}}'

# Check if any services are exposed without Traefik
# (Should only be: 53, 2222, 3000, 8384, 9000-9001)
```

---

## Phase 4: Monitoring Setup (Do Next Week)

### 4.1 Set Up Prometheus + Grafana

You mentioned these in PLAN.md but they're not running. Let's add them:

Create `monitoring-stack.yml`:

```yaml
version: '3.8'

services:
  prometheus:
    image: prom/prometheus:latest
    volumes:
      - prometheus-data:/prometheus
      - ./prometheus.yml:/etc/prometheus/prometheus.yml:ro
    command:
      - '--config.file=/etc/prometheus/prometheus.yml'
      - '--storage.tsdb.path=/prometheus'
    networks:
      - dokploy-network
    deploy:
      placement:
        constraints:
          - node.role == manager

  grafana:
    image: grafana/grafana:latest
    volumes:
      - grafana-data:/var/lib/grafana
    environment:
      - GF_SECURITY_ADMIN_PASSWORD__FILE=/run/secrets/grafana_admin_password
    secrets:
      - grafana_admin_password
    networks:
      - dokploy-network
    deploy:
      labels:
        - traefik.enable=true
        - traefik.http.routers.grafana.rule=Host(`grafana.bendtstudio.com`)
        - traefik.http.routers.grafana.entrypoints=websecure
        - traefik.http.routers.grafana.tls.certresolver=letsencrypt
        # Traefik needs an explicit backend port in Swarm mode; 3000 is Grafana's default
        - traefik.http.services.grafana.loadbalancer.server.port=3000

volumes:
  prometheus-data:
  grafana-data:

networks:
  dokploy-network:
    external: true

secrets:
  grafana_admin_password:
    external: true
```

### 4.2 Add Node Exporter

Deploy node-exporter on all nodes to collect system metrics.

### 4.3 Configure Alerts

Set up alerts for:

- Service down
- High CPU/memory usage
- Disk space low
- Certificate expiration

---

## Phase 5: Backup Strategy (Do Within 2 Weeks)

### 5.1 Define What to Back Up

**Critical Data:**

1. Gitea repositories (/data/git)
2. Dokploy database
3. MinIO buckets
4. Immich photos (/mnt/synology-data/immich)
5. PostgreSQL databases
6. Configuration files

### 5.2 Create Backup Scripts

Example backup script for Gitea:

```bash
#!/bin/bash
# /opt/backup/backup-gitea.sh
set -euo pipefail  # stop on the first failed step

BACKUP_DIR="/backup/gitea/$(date +%Y%m%d)"
mkdir -p "$BACKUP_DIR"

# Backup Gitea data
docker exec gitea-giteasqlite-bhymqw-gitea-1 tar czf /tmp/gitea-backup.tar.gz /data
docker cp gitea-giteasqlite-bhymqw-gitea-1:/tmp/gitea-backup.tar.gz "$BACKUP_DIR/"

# Backup to MinIO (offsite)
mc cp "$BACKUP_DIR/gitea-backup.tar.gz" minio/backups/gitea/

# Clean up old backups (keep 30 days)
# -mindepth/-maxdepth limit deletion to the dated subdirectories, never /backup/gitea itself
find /backup/gitea -mindepth 1 -maxdepth 1 -type d -mtime +30 -exec rm -rf {} +
```

### 5.3 Automate Backups

Add to crontab:

```bash
# Daily backups at 2 AM
0 2 * * * /opt/backup/backup-gitea.sh
0 3 * * * /opt/backup/backup-dokploy.sh
0 4 * * * /opt/backup/backup-databases.sh
```

---

## Phase 6: Documentation (Ongoing)

### 6.1 Create Service Catalog

For each service, document:

- **Purpose:** What does it do?
- **Access URL:** How do I reach it?
- **Dependencies:** What does it need?
- **Data location:** Where is data stored?
- **Backup procedure:** How to back it up?
- **Restore procedure:** How to restore it?
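
One way to get the catalog started is to generate an empty entry per service and fill it in by hand. The sketch below is only a starting point: the `catalog_entry` function name, the `TODO` placeholders, and the `SERVICES.md` file name are assumptions made for illustration, not something that already exists in this setup.

```bash
#!/usr/bin/env bash
# Hypothetical helper: prints an empty catalog entry for one service.
# The field names mirror the checklist above; the function name and
# output layout are assumptions, not an existing convention.
catalog_entry() {
  local name="$1"
  cat <<EOF
### ${name}

- **Purpose:** TODO
- **Access URL:** TODO
- **Dependencies:** TODO
- **Data location:** TODO
- **Backup procedure:** TODO
- **Restore procedure:** TODO

EOF
}

# Usage on the manager node (writes one stub per service):
#   docker service ls --format '{{.Name}}' | sort |
#     while read -r svc; do catalog_entry "$svc"; done > SERVICES.md
```

Keeping the generated file in the homelab-configs repo from Phase 2.2 would put the catalog under version control alongside the configs it describes.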

### 6.2 Create Runbooks

Common operations:

- Adding a new service
- Scaling a service
- Updating a service
- Removing a service
- Recovering from node failure
- Restoring from backup

### 6.3 Network Diagram

Create a visual diagram showing:

- Nodes and their roles
- Services and their locations
- Network connections
- Data flows

---

## Quick Reference Commands

```bash
# Cluster status
docker node ls
docker service ls
docker stack ls

# Service management (replace the <...> placeholders)
docker service logs <service-name> --tail 100 -f
docker service ps <service-name>
docker service scale <service-name>=<replicas>
docker service update --force <service-name>

# Resource usage
docker system df
docker stats

# SSH access
ssh -i ~/.ssh/id_ed25519 ubuntu@192.168.2.130  # Manager
ssh -i ~/.ssh/id_ed25519 ubuntu@192.168.2.19   # Worker

# Web UIs
curl http://192.168.2.130:3000  # Dokploy
curl http://192.168.2.130:888   # Swarmpit
curl http://192.168.2.130:8080  # Traefik
curl http://192.168.2.18:5380   # Technitium DNS
curl http://192.168.2.18:9001   # MinIO Console
```

---

## Questions for You

Before we proceed, I need to clarify a few things:

1. **NAS Node Access:** What are the SSH credentials for node-nas (192.168.2.18)?
2. **bendtstudio-app:** Is this service needed? It has 0 replicas.
3. **syncthing:** Do you want to keep this? It's currently stopped.
4. **Monitoring:** Do you want me to set up Prometheus/Grafana now, or later?
5. **Gitea:** Can you provide access credentials so I can check what's already version controlled?
6. **Priority:** Which phase should we tackle first? I recommend Phase 1 (critical fixes).

---

*Action Plan Version 1.0 - February 9, 2026*