Home Lab Action Plan
Phase 1: Critical Fixes (Do This Week)
1.1 Fix Failing Services
bewcloud-memos (Restarting Loop)
# SSH to controller
ssh -i ~/.ssh/id_ed25519 ubuntu@192.168.2.130
# Check what's wrong
docker service logs bewcloud-memos-ssogxn-memos --tail 100
# Common fixes:
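# If database connection issue (the variable name here is only an example;
# Memos normally takes its database settings from MEMOS_DRIVER and MEMOS_DSN):
docker service update --env-add "MEMOS_DB_HOST=correct-hostname" bewcloud-memos-ssogxn-memos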
# If it keeps failing, try recreating:
docker service rm bewcloud-memos-ssogxn-memos
# Then redeploy via Dokploy UI
bendtstudio-webstatic (Rollback Paused)
# Check the error
docker service ps bendtstudio-webstatic-iq9evl --no-trunc
# Force update to retry
docker service update --force bendtstudio-webstatic-iq9evl
# If that fails, inspect the image
docker service inspect bendtstudio-webstatic-iq9evl --format '{{.Spec.TaskTemplate.ContainerSpec.Image}}'
syncthing (Stopped)
# Option A: Start it if you need it
docker service scale syncthing=1
# Option B: Remove it if not needed
docker service rm syncthing
# Also remove the volume if no longer needed
docker volume rm cloud-syncthing-i2rpwr_syncthing_config
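After working through the three services above, one way to confirm nothing is still down is to compare running and desired replica counts. The following is a minimal sketch, not part of the original plan: the function name `under_replicated` is hypothetical, and it assumes the `name running/desired` layout produced by the `--format` string shown in the comment.

```shell
#!/usr/bin/env bash
# Prints the name of every service whose running task count is below
# its desired count. Expects lines such as "syncthing 0/1" on stdin.
under_replicated() {
  awk '{
    split($2, r, "/")            # r[1] = running, r[2] = desired
    if (r[1] + 0 < r[2] + 0) print $1
  }'
}

# Typical use on the manager node:
#   docker service ls --format '{{.Name}} {{.Replicas}}' | under_replicated
```

A service that was deliberately scaled to zero reports `0/0` and is not listed, so an empty result means every service has the replicas it asked for.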
1.2 Clean Up Unused Resources
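# Review what is in use first; the prune commands below do not ask per item
docker system df -v
# Remove unused volumes (reclaim ~595MB)
# Note: on Docker 23.0 and later this only removes anonymous volumes;
# add -a to include unused named volumes
docker volume prune
# Remove unused images (every image not used by a container, not just dangling ones)
docker image prune -a
# System-wide cleanup (destructive: also removes stopped containers, unused
# networks, and unused volumes). Decide on syncthing above before running this.
docker system prune -a --volumes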
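Because pruning cannot be undone, it can help to record which volumes are unreferenced before deleting anything. This is a small sketch under assumptions: the function name `snapshot_dangling_volumes` and the output file path are hypothetical, and the command only sees volumes on the node where it runs.

```shell
#!/usr/bin/env bash
# Writes the names of volumes that no container references to a file
# and prints how many were found, so the list can be reviewed before pruning.
snapshot_dangling_volumes() {
  local outfile="$1"
  docker volume ls --filter dangling=true --format '{{.Name}}' > "$outfile"
  local count
  count=$(grep -c . "$outfile" || true)   # grep exits 1 when the count is 0
  echo "$count dangling volume(s) listed in $outfile"
}

# Typical use, repeated on each node:
#   snapshot_dangling_volumes ~/dangling-volumes.txt
```

A volume that belongs to a stopped service, such as syncthing, shows up in this list, which is a reminder to settle Option A or B before pruning.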
1.3 Document Current State
Take screenshots of:
- Dokploy UI (all projects)
- Swarmpit dashboard
- Traefik dashboard (http://192.168.2.130:8080)
- MinIO console (http://192.168.2.18:9001)
- Gitea repositories
Phase 2: Configuration Backup (Do This Week)
2.1 Create Git Repository for Infrastructure
# On the controller node:
ssh -i ~/.ssh/id_ed25519 ubuntu@192.168.2.130
# Create a backup directory
mkdir -p ~/infrastructure-backup/$(date +%Y-%m-%d)
cd ~/infrastructure-backup/$(date +%Y-%m-%d)
# Copy all compose files
cp -r /etc/dokploy/compose ./dokploy-compose
cp -r /etc/dokploy/traefik ./traefik-config
cp ~/minio-stack.yml ./
# Export service configs (files are named by service name rather than ID)
mkdir -p ./service-configs
docker service ls --format '{{.Name}}' | while read -r service; do
  docker service inspect "$service" > "./service-configs/${service}.json"
done
# Export stack configs (docker stack ls has no -q flag, so use --format)
docker stack ls --format '{{.Name}}' | while read -r stack; do
  docker stack ps "$stack" > "./service-configs/${stack}-tasks.txt"
done
# Create a summary
cat > README.txt << EOF
Infrastructure Backup - $(date)
Cluster: Docker Swarm with Dokploy
Nodes: 3 (tpi-n1, tpi-n2, node-nas)
Services: $(docker service ls -q | wc -l) services
Stacks: $(docker stack ls --format '{{.Name}}' | wc -l) stacks
See HOMELAB_AUDIT.md for full documentation.
EOF
# Create tar archive
cd ..
tar -czf infrastructure-$(date +%Y-%m-%d).tar.gz $(date +%Y-%m-%d)
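A backup archive is only useful if it can be read back. The sketch below checks the archive and records a checksum beside it; the function name `verify_archive` is hypothetical, and it assumes `sha256sum` (or `shasum` as a fallback) is installed on the controller.

```shell
#!/usr/bin/env bash
# Checks that a .tar.gz archive can be listed end to end, then writes a
# SHA-256 checksum file next to it. Returns non-zero if the archive is unreadable.
verify_archive() {
  local archive="$1"
  tar -tzf "$archive" > /dev/null || return 1
  if command -v sha256sum > /dev/null 2>&1; then
    sha256sum "$archive" > "$archive.sha256"
  else
    shasum -a 256 "$archive" > "$archive.sha256"
  fi
}

# Typical use, from ~/infrastructure-backup:
#   verify_archive "infrastructure-$(date +%Y-%m-%d).tar.gz" && echo "archive OK"
```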
2.2 Commit to Gitea
# Clone your infrastructure repo (create if needed)
# Replace with your actual Gitea URL
git clone http://gitea.bendtstudio.com:3000/sirtimbly/homelab-configs.git
cd homelab-configs
# Copy backed up configs
cp -r ~/infrastructure-backup/$(date +%Y-%m-%d)/* .
# Organize by service
mkdir -p {stacks,compose,dokploy,traefik,docs}
mv dokploy-compose/* compose/ 2>/dev/null || true
mv traefik-config/* traefik/ 2>/dev/null || true
mv minio-stack.yml stacks/
mv service-configs/* docs/ 2>/dev/null || true
# Commit
git add .
git commit -m "Initial infrastructure backup - $(date +%Y-%m-%d)

- All Dokploy compose files
- Traefik configuration
- MinIO stack definition
- Service inspection exports
- Task history exports

Services backed up:
$(docker service ls --format '- {{.Name}}' | sort)"
git push origin main
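The `docker service inspect` exports include each service's environment variables, which is where section 3.1 notes that passwords are visible. A possible safeguard is to redact those values before `git add`. This is a rough sketch only: `redact_env` is a hypothetical helper, it matches upper-case variable names containing PASSWORD, SECRET, TOKEN or KEY, and it does not handle values that contain escaped quotes.

```shell
#!/usr/bin/env bash
# Replaces the value of any "NAME=value" entry whose NAME contains
# PASSWORD, SECRET, TOKEN or KEY with the literal string REDACTED.
# Reads JSON text on stdin and writes the redacted text to stdout.
redact_env() {
  sed -E 's/("[A-Z0-9_]*(PASSWORD|SECRET|TOKEN|KEY)[A-Z0-9_]*=)[^"]*"/\1REDACTED"/g'
}

# Typical use inside the repository, before committing:
#   for f in docs/*.json; do
#     redact_env < "$f" > "$f.tmp" && mv "$f.tmp" "$f"
#   done
```

Compose files copied from /etc/dokploy may carry credentials too, so it is worth reading the diff before pushing even with this filter in place.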
Phase 3: Security Hardening (Do Next Week)
3.1 Remove Exposed Credentials
Problem: Services have passwords in environment variables visible in Docker configs
Solution: Use Docker secrets or Dokploy environment variables
# Example: Securing MinIO
# Instead of having the password in the compose file, use a Docker secret
# (printf avoids the trailing newline that echo would store in the secret):
printf '%s' "your-minio-password" | docker secret create minio_root_password -
# Then in compose:
# environment:
# MINIO_ROOT_PASSWORD_FILE: /run/secrets/minio_root_password
# secrets:
# - minio_root_password
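Typing a password directly on the command line leaves it in shell history. One alternative, sketched here under assumptions, is to prompt for the value and pipe it to `docker secret create`. The function name `create_secret` is hypothetical.

```shell
#!/usr/bin/env bash
# Reads a secret value from stdin (a hidden prompt when run in a terminal)
# and stores it as a Docker secret without a trailing newline.
create_secret() {
  local name="$1" value
  IFS= read -rs -p "Value for $name: " value
  printf '%s' "$value" | docker secret create "$name" -
}

# Typical use on the manager node:
#   create_secret minio_root_password
```

Docker secrets cannot be edited in place, so changing a password later means creating a new secret under a new name and updating the service to use it.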
Action items:
- List all services with exposed passwords:
  docker service ls -q | xargs -I {} docker service inspect {} --format '{{.Spec.Name}}: {{range .Spec.TaskTemplate.ContainerSpec.Env}}{{.}} {{end}}' | grep -i password
- For each service, create a plan to move credentials to:
  - Docker secrets (best for swarm)
  - Environment files (easier to manage)
  - Dokploy UI environment variables
- Update compose files and redeploy
3.2 Update Default Passwords
Check for default/weak passwords:
- Dokploy (if still default)
- MinIO
- Gitea admin
- Technitium DNS
- Any databases
3.3 Review Exposed Ports
# Check all published ports
docker service ls --format '{{.Name}}: {{.Ports}}'
# Check if any services are exposed without Traefik
# (Should only be: 53, 2222, 3000, 8384, 9000-9001)
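The comparison against the expected port list can be automated. The sketch below is one possible approach: `unexpected_ports` is a hypothetical helper, the allowlist is copied from the note above, and the parsing assumes the `*:published->target/proto` layout that `docker service ls` prints in its Ports column.

```shell
#!/usr/bin/env bash
# Allowlist copied from the note above; ranges are written as start-end.
ALLOWED_PORTS="53 2222 3000 8384 9000-9001"

# Reads "name: ports" lines on stdin and prints "name port" for every
# published port that the allowlist does not cover.
unexpected_ports() {
  awk -v allowed="$ALLOWED_PORTS" '
    BEGIN {
      n = split(allowed, a, " ")
      for (i = 1; i <= n; i++) {
        m = split(a[i], r, "-")
        lo[i] = r[1] + 0
        hi[i] = (m > 1 ? r[2] : r[1]) + 0
      }
    }
    {
      name = $1; sub(/:$/, "", name)
      line = $0
      # Published ports look like "*:3000->" or "*:9000-9001->".
      while (match(line, /\*:[0-9]+(-[0-9]+)?->/)) {
        spec = substr(line, RSTART + 2, RLENGTH - 4)
        line = substr(line, RSTART + RLENGTH)
        m = split(spec, p, "-")
        first = p[1] + 0
        last = (m > 1 ? p[2] : p[1]) + 0
        for (port = first; port <= last; port++) {
          ok = 0
          for (i = 1; i <= n; i++)
            if (port >= lo[i] && port <= hi[i]) ok = 1
          if (!ok) print name, port
        }
      }
    }'
}

# Typical use on the manager node:
#   docker service ls --format '{{.Name}}: {{.Ports}}' | unexpected_ports
```

Any line this prints is a port published directly on the swarm that is not on the list, which is a prompt either to route it through Traefik or to extend the allowlist.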
Phase 4: Monitoring Setup (Do Next Week)
4.1 Set Up Prometheus + Grafana
You mentioned these in PLAN.md but they're not running. Let's add them:
Create monitoring-stack.yml:
version: '3.8'
services:
  prometheus:
    image: prom/prometheus:latest
    volumes:
      - prometheus-data:/prometheus
      - ./prometheus.yml:/etc/prometheus/prometheus.yml:ro
    command:
      - '--config.file=/etc/prometheus/prometheus.yml'
      - '--storage.tsdb.path=/prometheus'
    networks:
      - dokploy-network
    deploy:
      placement:
        constraints:
          - node.role == manager
  grafana:
    image: grafana/grafana:latest
    volumes:
      - grafana-data:/var/lib/grafana
    environment:
      - GF_SECURITY_ADMIN_PASSWORD__FILE=/run/secrets/grafana_admin_password
    secrets:
      - grafana_admin_password
    networks:
      - dokploy-network
    deploy:
      labels:
        - traefik.http.routers.grafana.rule=Host(`grafana.bendtstudio.com`)
        - traefik.http.routers.grafana.entrypoints=websecure
        - traefik.http.routers.grafana.tls.certresolver=letsencrypt
        - traefik.enable=true
        # In swarm mode Traefik needs the container port stated explicitly (Grafana listens on 3000)
        - traefik.http.services.grafana.loadbalancer.server.port=3000
volumes:
  prometheus-data:
  grafana-data:
networks:
  dokploy-network:
    external: true
secrets:
  grafana_admin_password:
    external: true
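The stack file mounts `./prometheus.yml` and expects an external secret, neither of which is created elsewhere in this plan. The sketch below fills in a starting point under assumptions: the function names and the stack name `monitoring` are hypothetical, and the config only scrapes Prometheus itself until more targets are added.

```shell
#!/usr/bin/env bash
# Writes a minimal Prometheus config that scrapes only Prometheus itself.
write_prometheus_config() {
  local path="${1:-prometheus.yml}"
  cat > "$path" << 'EOF'
global:
  scrape_interval: 30s
scrape_configs:
  - job_name: prometheus
    static_configs:
      - targets: ['localhost:9090']
EOF
}

# Creates the config in the current directory and deploys the stack.
# The stack name "monitoring" is an assumption, not taken from the plan.
deploy_monitoring() {
  write_prometheus_config prometheus.yml
  docker stack deploy -c monitoring-stack.yml monitoring
}

# Typical use on the manager node, from the directory holding monitoring-stack.yml:
#   deploy_monitoring
```

The `grafana_admin_password` secret has to exist before the deploy, for example via `docker secret create grafana_admin_password -` with the password supplied on stdin, or the stack will be rejected.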
4.2 Add Node Exporter
Deploy node-exporter on all nodes to collect system metrics.
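One way to place node-exporter on every node is a global swarm service. This is a sketch under assumptions rather than a tested deployment: the service name `node-exporter`, the function name, and the choice of `dokploy-network` are illustrative, and the host mounts follow the usual node-exporter container setup.

```shell
#!/usr/bin/env bash
# Creates one node-exporter task per swarm node (global mode) and mounts
# the host's /proc, /sys and / read-only so host metrics are visible.
deploy_node_exporter() {
  docker service create \
    --name node-exporter \
    --mode global \
    --network dokploy-network \
    --mount type=bind,source=/proc,target=/host/proc,readonly \
    --mount type=bind,source=/sys,target=/host/sys,readonly \
    --mount type=bind,source=/,target=/rootfs,readonly \
    prom/node-exporter:latest \
    --path.procfs=/host/proc \
    --path.sysfs=/host/sys \
    --path.rootfs=/rootfs
}

# Typical use on the manager node:
#   deploy_node_exporter
```

Prometheus still needs a scrape job pointing at port 9100 on these tasks before any node metrics appear in Grafana.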
4.3 Configure Alerts
Set up alerts for:
- Service down
- High CPU/memory usage
- Disk space low
- Certificate expiration
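Several of these alerts map onto standard Prometheus rule expressions. The sketch below writes a starter rules file; the function name, file name, group name, thresholds and durations are all assumptions to adjust, and the node metrics only exist once node-exporter is being scraped.

```shell
#!/usr/bin/env bash
# Writes a starter Prometheus rules file covering scrape targets that are
# down, low disk space and low available memory.
write_alert_rules() {
  local path="${1:-alerts.yml}"
  cat > "$path" << 'EOF'
groups:
  - name: homelab
    rules:
      - alert: TargetDown
        expr: up == 0
        for: 5m
      - alert: DiskSpaceLow
        expr: node_filesystem_avail_bytes / node_filesystem_size_bytes < 0.10
        for: 15m
      - alert: MemoryLow
        expr: node_memory_MemAvailable_bytes / node_memory_MemTotal_bytes < 0.10
        for: 10m
EOF
}

# Typical use, next to prometheus.yml:
#   write_alert_rules alerts.yml
```

The file takes effect only after it is listed under `rule_files:` in prometheus.yml, and sending notifications requires an Alertmanager, which this plan does not yet include. Certificate expiry is not covered here because it needs a probe-based exporter.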
Phase 5: Backup Strategy (Do Within 2 Weeks)
5.1 Define What to Back Up
Critical Data:
- Gitea repositories (/data/git)
- Dokploy database
- MinIO buckets
- Immich photos (/mnt/synology-data/immich)
- PostgreSQL databases
- Configuration files
5.2 Create Backup Scripts
Example backup script for Gitea:
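#!/bin/bash
# /opt/backup/backup-gitea.sh
set -euo pipefail  # stop at the first failed step instead of uploading a bad archive
BACKUP_DIR="/backup/gitea/$(date +%Y%m%d)"
mkdir -p "$BACKUP_DIR"
# Backup Gitea data
docker exec gitea-giteasqlite-bhymqw-gitea-1 tar czf /tmp/gitea-backup.tar.gz /data
docker cp gitea-giteasqlite-bhymqw-gitea-1:/tmp/gitea-backup.tar.gz "$BACKUP_DIR/"
# Copy to MinIO (same LAN, so treat it as a second copy rather than offsite);
# the dated prefix keeps each day's archive instead of overwriting the last one
mc cp "$BACKUP_DIR/gitea-backup.tar.gz" "minio/backups/gitea/$(date +%Y%m%d)/"
# Clean up old backups (keep 30 days); match only the dated subdirectories
find /backup/gitea -mindepth 1 -maxdepth 1 -type d -mtime +30 -exec rm -rf {} +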
5.3 Automate Backups
Add to crontab:
# Daily backups at 2 AM
0 2 * * * /opt/backup/backup-gitea.sh
0 3 * * * /opt/backup/backup-dokploy.sh
0 4 * * * /opt/backup/backup-databases.sh
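Cron runs these scripts silently, so a failed backup can go unnoticed for weeks. A small wrapper that records each result is one way to make failures visible. This is a sketch: the function name `run_backup` and the log path in the usage comment are hypothetical.

```shell
#!/usr/bin/env bash
# Runs a backup command, appends a timestamped result line to a log file,
# and returns the command's exit status.
run_backup() {
  local log="$1"; shift
  local status=0
  "$@" || status=$?
  if [ "$status" -eq 0 ]; then
    echo "$(date '+%Y-%m-%d %H:%M:%S') OK $*" >> "$log"
  else
    echo "$(date '+%Y-%m-%d %H:%M:%S') FAILED($status) $*" >> "$log"
  fi
  return "$status"
}

# Typical use, from a wrapper script that cron calls instead of the backup script:
#   run_backup /var/log/homelab-backup.log /opt/backup/backup-gitea.sh
```

Checking that log for FAILED lines, or for a missing entry on a given day, gives a quick answer to whether last night's backups actually ran.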
Phase 6: Documentation (Ongoing)
6.1 Create Service Catalog
For each service, document:
- Purpose: What does it do?
- Access URL: How do I reach it?
- Dependencies: What does it need?
- Data location: Where is data stored?
- Backup procedure: How to back it up?
- Restore procedure: How to restore it?
6.2 Create Runbooks
Common operations:
- Adding a new service
- Scaling a service
- Updating a service
- Removing a service
- Recovering from node failure
- Restoring from backup
6.3 Network Diagram
Create a visual diagram showing:
- Nodes and their roles
- Services and their locations
- Network connections
- Data flows
Quick Reference Commands
# Cluster status
docker node ls
docker service ls
docker stack ls
# Service management
docker service logs <service> --tail 100 -f
docker service ps <service>
docker service scale <service>=<count>
docker service update --force <service>
# Resource usage
docker system df
docker stats
# SSH access
ssh -i ~/.ssh/id_ed25519 ubuntu@192.168.2.130 # Manager
ssh -i ~/.ssh/id_ed25519 ubuntu@192.168.2.19 # Worker
# Web UIs
curl http://192.168.2.130:3000 # Dokploy
curl http://192.168.2.130:888 # Swarmpit
curl http://192.168.2.130:8080 # Traefik
curl http://192.168.2.18:5380 # Technitium DNS
curl http://192.168.2.18:9001 # MinIO Console
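The web UI checks can be run in one pass. The sketch below prints the HTTP status for each URL; the function name `check_urls` and the five-second timeout are assumptions.

```shell
#!/usr/bin/env bash
# Prints "<status> <url>" for each URL. curl reports 000 when the
# connection fails or times out.
check_urls() {
  local url code
  for url in "$@"; do
    code=$(curl -s -o /dev/null -m 5 -w '%{http_code}' "$url" || true)
    echo "$code $url"
  done
}

# Typical use with the addresses listed above:
#   check_urls http://192.168.2.130:3000 http://192.168.2.130:888 \
#              http://192.168.2.130:8080 http://192.168.2.18:5380 \
#              http://192.168.2.18:9001
```

A redirect status such as 302 or 307 still means the UI is answering, since several of these dashboards redirect to a login page.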
Questions for You
Before we proceed, I need to clarify a few things:
- NAS Node Access: What are the SSH credentials for node-nas (192.168.2.18)?
- bendtstudio-app: Is this service needed? It has 0 replicas.
- syncthing: Do you want to keep this? It's currently stopped.
- Monitoring: Do you want me to set up Prometheus/Grafana now, or later?
- Gitea: Can you provide access credentials so I can check what's already version controlled?
- Priority: Which phase should we tackle first? I recommend Phase 1 (critical fixes).
Action Plan Version 1.0 - February 9, 2026