diff --git a/ACTION_PLAN.md b/ACTION_PLAN.md
new file mode 100644
index 0000000..72ab8c9
--- /dev/null
+++ b/ACTION_PLAN.md
@@ -0,0 +1,403 @@
# Home Lab Action Plan

## Phase 1: Critical Fixes (Do This Week)

### 1.1 Fix Failing Services

**bewcloud-memos (Restarting Loop)**
```bash
# SSH to controller
ssh -i ~/.ssh/id_ed25519 ubuntu@192.168.2.130

# Check what's wrong
docker service logs bewcloud-memos-ssogxn-memos --tail 100

# Common fixes:
# If it's a database connection issue:
docker service update --env-add "MEMOS_DB_HOST=correct-hostname" bewcloud-memos-ssogxn-memos

# If it keeps failing, try recreating:
docker service rm bewcloud-memos-ssogxn-memos
# Then redeploy via the Dokploy UI
```

**bendtstudio-webstatic (Rollback Paused)**
```bash
# Check the error
docker service ps bendtstudio-webstatic-iq9evl --no-trunc

# Force an update to retry
docker service update --force bendtstudio-webstatic-iq9evl

# If that fails, inspect the image
docker service inspect bendtstudio-webstatic-iq9evl --format '{{.Spec.TaskTemplate.ContainerSpec.Image}}'
```

**syncthing (Stopped)**
```bash
# Option A: Start it if you need it
docker service scale syncthing=1

# Option B: Remove it if not needed
docker service rm syncthing
# Also remove the volume if no longer needed
docker volume rm cloud-syncthing-i2rpwr_syncthing_config
```

### 1.2 Clean Up Unused Resources

```bash
# Remove unused volumes (reclaim ~595MB)
docker volume prune

# Remove unused images
docker image prune -a

# System-wide cleanup (aggressive: removes ALL unused images and volumes)
docker system prune -a --volumes
```

### 1.3 Document Current State

Take screenshots of:
- Dokploy UI (all projects)
- Swarmpit dashboard
- Traefik dashboard (http://192.168.2.130:8080)
- MinIO console (http://192.168.2.18:9001)
- Gitea repositories

---

## Phase 2: Configuration Backup (Do This Week)

### 2.1 Create Git Repository for Infrastructure

```bash
# On the controller node:
ssh -i ~/.ssh/id_ed25519 ubuntu@192.168.2.130

# Create a backup directory
mkdir -p ~/infrastructure-backup/$(date +%Y-%m-%d)
cd ~/infrastructure-backup/$(date +%Y-%m-%d)

# Copy all compose files
cp -r /etc/dokploy/compose ./dokploy-compose
cp -r /etc/dokploy/traefik ./traefik-config
cp ~/minio-stack.yml ./

# Export service configs (by name, so the files are readable)
mkdir -p ./service-configs
docker service ls --format '{{.Name}}' | while read -r service; do
  docker service inspect "$service" > "./service-configs/${service}.json"
done

# Export stack configs (`docker stack ls` has no -q flag, so use --format)
docker stack ls --format '{{.Name}}' | while read -r stack; do
  docker stack ps "$stack" > "./service-configs/${stack}-tasks.txt"
done

# Create a summary
cat > README.txt << EOF
Infrastructure Backup - $(date)
Cluster: Docker Swarm with Dokploy
Nodes: 3 (tpi-n1, tpi-n2, node-nas)
Services: $(docker service ls -q | wc -l) services
Stacks: $(docker stack ls --format '{{.Name}}' | wc -l) stacks

See HOMELAB_AUDIT.md for full documentation.
EOF

# Create a tar archive
cd ..
tar -czf infrastructure-$(date +%Y-%m-%d).tar.gz $(date +%Y-%m-%d)
```

### 2.2 Commit to Gitea

```bash
# Clone your infrastructure repo (create it first if needed)
# Replace with your actual Gitea URL
git clone http://gitea.bendtstudio.com:3000/sirtimbly/homelab-configs.git
cd homelab-configs

# Copy the backed-up configs
cp -r ~/infrastructure-backup/$(date +%Y-%m-%d)/* .

# Organize by service
mkdir -p {stacks,compose,dokploy,traefik,docs}
mv dokploy-compose/* compose/ 2>/dev/null || true
mv traefik-config/* traefik/ 2>/dev/null || true
mv minio-stack.yml stacks/
mv service-configs/* docs/ 2>/dev/null || true

# Commit
git add .
git commit -m "Initial infrastructure backup - $(date +%Y-%m-%d)

- All Dokploy compose files
- Traefik configuration
- MinIO stack definition
- Service inspection exports
- Task history exports

Services backed up:
$(docker service ls --format '- {{.Name}}' | sort)"

git push origin main
```

---

## Phase 3: Security Hardening (Do Next Week)

### 3.1 Remove Exposed Credentials

**Problem:** Services have passwords in environment variables that are visible in the Docker configs.

**Solution:** Use Docker secrets or Dokploy environment variables.

```bash
# Example: securing MinIO
# Instead of putting the password in the compose file, use a Docker secret:

echo "your-minio-password" | docker secret create minio_root_password -

# Then in the compose file:
# environment:
#   MINIO_ROOT_PASSWORD_FILE: /run/secrets/minio_root_password
# secrets:
#   - minio_root_password
```

**Action items:**
1. List all services with exposed passwords:
   ```bash
   docker service ls -q | xargs -I {} docker service inspect {} --format '{{.Spec.Name}}: {{range .Spec.TaskTemplate.ContainerSpec.Env}}{{.}} {{end}}' | grep -i password
   ```

2. For each service, create a plan to move credentials to:
   - Docker secrets (best for Swarm)
   - Environment files (easier to manage)
   - Dokploy UI environment variables

3. Update compose files and redeploy

### 3.2 Update Default Passwords

Check for default/weak passwords on:
- Dokploy (if still default)
- MinIO
- Gitea admin
- Technitium DNS
- Any databases

### 3.3 Review Exposed Ports

```bash
# Check all published ports
docker service ls --format '{{.Name}}: {{.Ports}}'

# Check whether any services are exposed without Traefik
# (Should only be: 53, 2222, 3000, 8384, 9000-9001)
```

---

## Phase 4: Monitoring Setup (Do Next Week)

### 4.1 Set Up Prometheus + Grafana

You mentioned these in PLAN.md but they're not running. Let's add them:

Create `monitoring-stack.yml`:
```yaml
version: '3.8'

services:
  prometheus:
    image: prom/prometheus:latest
    volumes:
      - prometheus-data:/prometheus
      - ./prometheus.yml:/etc/prometheus/prometheus.yml:ro
    command:
      - '--config.file=/etc/prometheus/prometheus.yml'
      - '--storage.tsdb.path=/prometheus'
    networks:
      - dokploy-network
    deploy:
      placement:
        constraints:
          - node.role == manager

  grafana:
    image: grafana/grafana:latest
    volumes:
      - grafana-data:/var/lib/grafana
    environment:
      - GF_SECURITY_ADMIN_PASSWORD__FILE=/run/secrets/grafana_admin_password
    secrets:
      - grafana_admin_password
    networks:
      - dokploy-network
    deploy:
      labels:
        - traefik.http.routers.grafana.rule=Host(`grafana.bendtstudio.com`)
        - traefik.http.routers.grafana.entrypoints=websecure
        - traefik.http.routers.grafana.tls.certresolver=letsencrypt
        # Tell Traefik which container port to route to (Grafana listens on 3000)
        - traefik.http.services.grafana.loadbalancer.server.port=3000
        - traefik.enable=true

volumes:
  prometheus-data:
  grafana-data:

networks:
  dokploy-network:
    external: true

secrets:
  grafana_admin_password:
    external: true
```

### 4.2 Add Node Exporter

Deploy node-exporter on all nodes to collect system metrics.

### 4.3 Configure Alerts

Set up alerts for:
- Service down
- High CPU/memory usage
- Low disk space
- Certificate expiration

---

## Phase 5: Backup Strategy (Do Within 2 Weeks)

### 5.1 Define What to Back Up

**Critical Data:**
1. Gitea repositories (/data/git)
2. Dokploy database
3. MinIO buckets
4. Immich photos (/mnt/synology-data/immich)
5. PostgreSQL databases
6. Configuration files

### 5.2 Create Backup Scripts

Example backup script for Gitea:
```bash
#!/bin/bash
# /opt/backup/backup-gitea.sh

BACKUP_DIR="/backup/gitea/$(date +%Y%m%d)"
mkdir -p "$BACKUP_DIR"

# Back up Gitea data
docker exec gitea-giteasqlite-bhymqw-gitea-1 tar czf /tmp/gitea-backup.tar.gz /data
docker cp gitea-giteasqlite-bhymqw-gitea-1:/tmp/gitea-backup.tar.gz "$BACKUP_DIR/"
docker exec gitea-giteasqlite-bhymqw-gitea-1 rm /tmp/gitea-backup.tar.gz

# Copy to MinIO (offsite)
mc cp "$BACKUP_DIR/gitea-backup.tar.gz" minio/backups/gitea/

# Clean up old backups (keep 30 days; -mindepth 1 protects the parent directory)
find /backup/gitea -mindepth 1 -type d -mtime +30 -exec rm -rf {} +
```

### 5.3 Automate Backups

Add to crontab:
```bash
# Daily backups starting at 2 AM
0 2 * * * /opt/backup/backup-gitea.sh
0 3 * * * /opt/backup/backup-dokploy.sh
0 4 * * * /opt/backup/backup-databases.sh
```

---

## Phase 6: Documentation (Ongoing)

### 6.1 Create Service Catalog

For each service, document:
- **Purpose:** What does it do?
- **Access URL:** How do I reach it?
- **Dependencies:** What does it need?
- **Data location:** Where is the data stored?
- **Backup procedure:** How to back it up?
- **Restore procedure:** How to restore it?
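One way to keep these answers consistent is a small YAML fragment per service, checked into the configs repo. A sketch for a single entry follows; the service name is real, but the URL, volume name, and procedures are illustrative assumptions, not audited values:

```yaml
# docs/catalog/cloud-uptimekuma.yml -- hypothetical catalog entry
service: cloud-uptimekuma-jdeivt
purpose: Uptime monitoring with status pages
access_url: https://uptime.bendtstudio.com   # assumed; confirm against the Traefik labels
dependencies:
  - dokploy-network (overlay)
  - dokploy-traefik for TLS termination
data_location: uptime-kuma data volume       # assumed; confirm with `docker volume ls`
backup: nightly volume tarball copied to the MinIO backups bucket
restore: redeploy the compose file, then restore the volume before first start
```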
### 6.2 Create Runbooks

Common operations:
- Adding a new service
- Scaling a service
- Updating a service
- Removing a service
- Recovering from node failure
- Restoring from backup

### 6.3 Network Diagram

Create a visual diagram showing:
- Nodes and their roles
- Services and their locations
- Network connections
- Data flows

---

## Quick Reference Commands

```bash
# Cluster status
docker node ls
docker service ls
docker stack ls

# Service management (replace <service> with the service name)
docker service logs <service> --tail 100 -f
docker service ps <service>
docker service scale <service>=<replicas>
docker service update --force <service>

# Resource usage
docker system df
docker stats

# SSH access
ssh -i ~/.ssh/id_ed25519 ubuntu@192.168.2.130  # Manager
ssh -i ~/.ssh/id_ed25519 ubuntu@192.168.2.19   # Worker

# Web UIs
curl http://192.168.2.130:3000  # Dokploy
curl http://192.168.2.130:888   # Swarmpit
curl http://192.168.2.130:8080  # Traefik
curl http://192.168.2.18:5380   # Technitium DNS
curl http://192.168.2.18:9001   # MinIO Console
```

---

## Questions for You

Before we proceed, I need to clarify a few things:

1. **NAS Node Access:** What are the SSH credentials for node-nas (192.168.2.18)?

2. **bendtstudio-app:** Is this service needed? It has 0 replicas.

3. **syncthing:** Do you want to keep this? It's currently stopped.

4. **Monitoring:** Do you want me to set up Prometheus/Grafana now, or later?

5. **Gitea:** Can you provide access credentials so I can check what's already version controlled?

6. **Priority:** Which phase should we tackle first? I recommend Phase 1 (critical fixes).

---

*Action Plan Version 1.0 - February 9, 2026*
diff --git a/HOMELAB_AUDIT.md b/HOMELAB_AUDIT.md
new file mode 100644
index 0000000..c200d77
--- /dev/null
+++ b/HOMELAB_AUDIT.md
@@ -0,0 +1,425 @@
# Home Lab Cluster Audit Report

**Date:** February 9, 2026
**Auditor:** opencode
**Cluster:** Docker Swarm with Dokploy

---

## 1. Cluster Overview

- **Cluster Type:** Docker Swarm (3 nodes)
- **Orchestration:** Dokploy v3.x
- **Reverse Proxy:** Traefik v3.6.1
- **DNS:** Technitium DNS Server
- **Monitoring:** Swarmpit
- **Git Server:** Gitea v1.24.4
- **Object Storage:** MinIO

---

## 2. Node Inventory

### Node 1: tpi-n1 (Controller/Manager)
- **IP:** 192.168.2.130
- **Role:** Manager (Leader)
- **Architecture:** aarch64 (ARM64)
- **OS:** Linux
- **CPU:** 8 cores
- **RAM:** ~8 GB
- **Docker:** v27.5.1
- **Labels:**
  - `infra=true`
  - `role=storage`
  - `storage=high`
- **Status:** Ready, Active

### Node 2: tpi-n2 (Worker)
- **IP:** 192.168.2.19
- **Role:** Worker
- **Architecture:** aarch64 (ARM64)
- **OS:** Linux
- **CPU:** 8 cores
- **RAM:** ~8 GB
- **Docker:** v27.5.1
- **Labels:**
  - `role=compute`
- **Status:** Ready, Active

### Node 3: node-nas (Storage Worker)
- **IP:** 192.168.2.18
- **Role:** Worker (NAS/Storage)
- **Architecture:** x86_64
- **OS:** Linux
- **CPU:** 2 cores
- **RAM:** ~8 GB
- **Docker:** v29.1.2
- **Labels:**
  - `type=nas`
- **Status:** Ready, Active

---

## 3. Docker Stacks (Swarm Mode)

### Active Stacks:

#### 1. minio
- **Services:** 1 (minio_minio)
- **Status:** Running
- **Node:** node-nas (constrained to NAS)
- **Ports:** 9000 (API), 9001 (Console)
- **Storage:** /mnt/synology-data/minio (bind mount)
- **Credentials:** [REDACTED - see service config]

#### 2. swarmpit
- **Services:** 4
  - swarmpit_app (UI) - Running on tpi-n1, Port 888
  - swarmpit_agent (global) - Running on all 3 nodes
  - swarmpit_db (CouchDB) - Running on tpi-n2
  - swarmpit_influxdb - Running on node-nas
- **Status:** Active with historical failures
- **Issues:** Multiple container failures in history (mostly resolved)

---

## 4. Dokploy-Managed Services

### Running Services (via Dokploy Compose):

1. **ai-lobechat-yqvecg** - AI chat interface
2. **bewcloud-memos-ssogxn** - Note-taking app (⚠️ Restarting loop)
3. **bewcloud-silverbullet-42sjev** - SilverBullet markdown editor + Watchtower
4. **cloud-bewcloud-u2pls5** - BewCloud instance with Radicale (CalDAV/CardDAV)
5. **cloud-fizzy-ezuhfq** - Fizzy web app
6. **cloud-ironcalc-0id5k8** - IronCalc spreadsheet
7. **cloud-radicale-wqldcv** - Standalone Radicale server
8. **cloud-uptimekuma-jdeivt** - Uptime monitoring
9. **dns-technitum-6ojgo2** - Technitium DNS server
10. **gitea-giteasqlite-bhymqw** - Git server (Port 3000, SSH on 2222)
11. **gitea-registry-vdftrt** - Docker registry (Port 5000)

### Dokploy Infrastructure Services:
- **dokploy** - Main Dokploy UI (Port 3000, host mode)
- **dokploy-postgres** - Dokploy database
- **dokploy-redis** - Dokploy cache
- **dokploy-traefik** - Reverse proxy (Ports 80, 443, 8080)

---

## 5. Standalone Services (docker-compose)

### Running:
- **technitium-dns** - DNS server (Ports 53, 5380)
- **immich3-compose** - Photo management (Immich v2.3.0)
  - immich-server
  - immich-machine-learning
  - immich-database (pgvecto-rs)
  - immich-redis

### Stack Services:
- **bendtstudio-pancake-bzgfpc** - MariaDB database (Port 3306)
- **bendtstudio-webstatic-iq9evl** - Static web files (⚠️ Rollback paused state)

---

## 6. Issues Identified

### 🔴 Critical Issues:

1. **bewcloud-memos in Restart Loop**
   - Container keeps restarting (last seen 24 seconds ago)
   - Status: `Restarting (0) 24 seconds ago`
   - **Action Required:** Check logs and fix configuration

2. **bendtstudio-webstatic in Rollback Paused State**
   - Service is not updating properly
   - State: `rollback_paused`
   - **Action Required:** Investigate the update failure

3. **bendtstudio-app Not Running**
   - Service has 0/0 replicas
   - **Action Required:** Determine whether it is needed, or remove it

4. **syncthing Stopped**
   - Service has 0 replicas
   - Should be on node-nas
   - **Action Required:** Restart, or remove if not needed

### 🟡 Warning Issues:

5. **Swarmpit Agent Failures (Historical)**
   - Multiple past failures on all nodes
   - Currently running, but the history is concerning
   - **Action Required:** Monitor for stability

6. **No Monitoring of MinIO**
   - MinIO is running but no backup/monitoring strategy is documented
   - **Action Required:** Set up monitoring and backups

7. **Credential Management**
   - Passwords visible in service configs (bendtstudio-webstatic, MinIO, DNS)
   - **Action Required:** Migrate to Docker secrets or env files

### 🟢 Informational:

8. **13 Unused/Orphaned Volumes**
   - 33 total volumes, only 20 active
   - **Action Required:** Clean up unused volumes to reclaim ~595MB

9. **Gitea Repository Status Unknown**
   - Cannot verify whether all compose files are version controlled
   - **Action Required:** Audit the Gitea repositories

---

## 7. Storage Configuration

### Local Volumes (33 total):
Key volumes include:
- `dokploy-postgres-database`
- `bewcloud-postgres-in40hh-data`
- `gitea-data`, `gitea-registry-data`
- `immich-postgres`, `immich-redis-data`, `immich-model-cache`
- `bendtstudio-pancake-data`
- `shared-data` (NFS/shared)
- Various app-specific volumes

### Bind Mounts:
- **MinIO:** `/mnt/synology-data/minio` → `/data`
- **Syncthing:** `/mnt/synology-data` → `/var/syncthing` (currently stopped)
- **Dokploy:** `/etc/dokploy` → `/etc/dokploy`

### NFS Mounts:
- Synology NAS mounted at `/mnt/synology-data/`
- Contains: immich/, minio/

---

## 8. Networking

### Overlay Networks:
- `dokploy-network` - Main Dokploy network
- `minio_default` - MinIO stack network
- `swarmpit_net` - Swarmpit monitoring network
- `ingress` - Docker Swarm ingress

### Bridge Networks:
Multiple app-specific networks created by compose:
- `ai-lobechat-yqvecg`
- `bewcloud-memos-ssogxn`
- `bewcloud-silverbullet-42sjev`
- `cloud-fizzy-ezuhfq_default`
- `cloud-uptimekuma-jdeivt`
- `gitea-giteasqlite-bhymqw`
- `gitea-registry-vdftrt`
- `immich3-compose-ubyhe9_default`

---

## 9. SSL/TLS Configuration

- **Certificate Resolver:** Let's Encrypt (ACME)
- **Email:** sirtimbly@gmail.com
- **Challenge Type:** HTTP-01
- **Storage:** `/etc/dokploy/traefik/dynamic/acme.json`
- **Entry Points:** web (80) → websecure (443) with auto-redirect
- **HTTP/3:** Enabled on websecure

---

## 10. Traefik Routing

### Configured Routes (via labels):
- gitea.bendtstudio.com → Gitea
- Multiple apps via traefik.me subdomains
- HTTP → HTTPS redirect enabled
- Middlewares configured in `/etc/dokploy/traefik/dynamic/`

---

## 11. DNS Configuration

### Technitium DNS:
- **Ports:** 53 (TCP/UDP), 5380 (Web UI)
- **Domain:** dns.bendtstudio.com
- **Admin Password:** [REDACTED]
- **Placement:** Locked to tpi-n1
- **TZ:** America/New_York

### Services using DNS:
- All services accessible via bendtstudio.com subdomains
- Internal DNS resolution for Docker services

---

## 12. Configuration Files Location

### In `/etc/dokploy/`:
- `traefik/traefik.yml` - Main Traefik config
- `traefik/dynamic/*.yml` - Dynamic routes and middlewares
- `compose/*/code/docker-compose.yml` - Dokploy-managed compose files

### In `/home/ubuntu/`:
- `minio-stack.yml` - MinIO stack definition

### In the local workspace:
- Various compose files (not all deployed via Dokploy)
- May be out of sync with running services

---

## 13. Missing Configuration in Version Control

Based on the analysis, the following may NOT be properly tracked in Gitea:

1. ✅ **Gitea** itself - compose file present
2. ✅ **MinIO** - stack file in ~/minio-stack.yml
3. ⚠️ **Dokploy dynamic configs** - Traefik routes
4. ⚠️ **All Dokploy-managed compose files** - 11 services
5. ❌ **Technitium DNS** - compose file in /etc/dokploy/
6. ❌ **Immich** - compose configuration
7. ❌ **Swarmpit** - stack configuration
8. ❌ **Dokploy infrastructure** - internal services

---

## 14. Resource Usage

### Docker System:
- **Images:** 23 (10.91 GB)
- **Containers:** 26 (135 MB)
- **Volumes:** 33 (2.02 GB, 595MB reclaimable)
- **Build Cache:** 0

### Node Resources:
- **tpi-n1 & tpi-n2:** 8 cores ARM64, 8GB RAM each
- **node-nas:** 2 cores x86_64, 8GB RAM

---

## 15. Recommendations

### Immediate Actions (High Priority):

1. **Fix bewcloud-memos**
   ```bash
   docker service logs bewcloud-memos-ssogxn-memos --tail 50
   ```

2. **Fix bendtstudio-webstatic**
   ```bash
   docker service ps bendtstudio-webstatic-iq9evl --no-trunc
   docker service update --force bendtstudio-webstatic-iq9evl
   ```

3. **Restart or Remove syncthing**
   ```bash
   # Option 1: Scale up
   docker service scale syncthing=1

   # Option 2: Remove
   docker service rm syncthing
   ```

4. **Clean up unused volumes**
   ```bash
   docker volume prune
   ```

### Short-term Actions (Medium Priority):

5. **Audit Gitea repositories**
   - Access Gitea at http://gitea.bendtstudio.com
   - Verify which compose files are tracked
   - Commit missing configurations

6. **Secure credentials**
   - Use Docker secrets for passwords
   - Move credentials to environment files
   - Never commit .env files with real passwords

7. **Set up automated backups**
   - Back up the Dokploy database
   - Back up the Gitea repositories
   - Back up MinIO data

8. **Document all services**
   - Create a README for each service
   - Document dependencies and data locations
   - Create runbooks for common operations

### Long-term Actions (Low Priority):

9. **Implement proper monitoring**
   - Prometheus/Grafana for metrics (mentioned in PLAN.md but not found)
   - Alerting for service failures
   - Disk usage monitoring

10. **Implement a GitOps workflow**
    - All changes through Git
    - Automated deployments via Dokploy webhooks
    - Configuration drift detection

11. **Consolidate the storage strategy**
    - Define a clear policy for volumes vs bind mounts
    - Document backup procedures for each storage type

12. **Security audit**
    - Review all exposed ports
    - Check for default/weak passwords
    - Implement network segmentation if needed

---

## 16. Next Steps Checklist

- [ ] Fix critical service issues (memos, webstatic)
- [ ] Document all running services with their purpose
- [ ] Commit all compose files to Gitea
- [ ] Create a backup strategy
- [ ] Set up monitoring and alerting
- [ ] Clean up unused resources
- [ ] Create a disaster recovery plan
- [ ] Document SSH access for all nodes

---

## Appendix A: Quick Commands Reference

```bash
# View cluster status
docker node ls
docker service ls
docker stack ls

# View service logs (replace <service>/<container> with real names)
docker service logs <service> --tail 100 -f

# View container logs
docker logs <container> --tail 100 -f

# Scale a service
docker service scale <service>=<replicas>

# Update a service
docker service update --force <service>

# SSH to nodes
ssh -i ~/.ssh/id_ed25519 ubuntu@192.168.2.130  # tpi-n1 (manager)
ssh -i ~/.ssh/id_ed25519 ubuntu@192.168.2.19   # tpi-n2 (worker)
# The NAS node requires different credentials

# Access Dokploy UI
http://192.168.2.130:3000

# Access Swarmpit UI
http://192.168.2.130:888

# Access Traefik Dashboard
http://192.168.2.130:8080
```

---

*End of Audit Report*
diff --git a/QUICK_REFERENCE.md b/QUICK_REFERENCE.md
new file mode 100644
index 0000000..180e4d7
--- /dev/null
+++ b/QUICK_REFERENCE.md
@@ -0,0 +1,34 @@
## Cluster Access

```bash
# SSH to controller
ssh -i ~/.ssh/id_ed25519 ubuntu@192.168.2.130

# SSH to worker
ssh -i ~/.ssh/id_ed25519 ubuntu@192.168.2.19

# SSH to NAS node
ssh tim@192.168.2.18
```

## Dokploy

Configured to deploy across the nodes in the cluster. The NAS node provides an NFS share, mounted on the nodes at `/mnt/synology-data/`.

Dokploy has an S3-compatible storage destination configured, pointing at MinIO on the NAS node. This is used for backups.

## MinIO

MinIO is installed on the NAS node. It stores backups of the Dokploy database and backs Dokploy's S3-compatible storage.

## Traefik

Traefik is installed on the controller node. It routes traffic to the various services on the cluster.

## Gitea

Used for storing all the compose and stack YAML files for each service.

## Technitium DNS

To use internal DNS, configure Docker's DNS to point at the Technitium DNS server.
diff --git a/SETUP.md b/SETUP.md
new file mode 100644
index 0000000..0ea7bc2
--- /dev/null
+++ b/SETUP.md
@@ -0,0 +1,103 @@
# Dokploy + Docker Swarm Homelab Setup Instructions

This guide walks through setting up a fresh, multi-node Docker Swarm cluster using Dokploy for quick web app deployment and easy hosting of infrastructure services (like Pi-hole and MinIO), including shared storage via NFS from your NAS node.

## 1. Prepare Environment

- Choose a primary node (any capable Linux server).
- Identify your NAS node (high-capacity storage).
- Gather all SSH credentials.
- Ensure all nodes have Docker installed (`curl -fsSL https://get.docker.com | sh`).

## 2. Initialize Docker Swarm Cluster

On your primary node:

```bash
docker swarm init --advertise-addr <MANAGER_IP>
```

On each additional node, run the join command given by the previous step, e.g.:

```bash
docker swarm join --token <TOKEN> <MANAGER_IP>:2377
```

## 3. Label Nodes for Placement Constraints

On your primary node, label the nodes (replace node names as appropriate):

```bash
docker node update --label-add role=storage nas-node-01
docker node update --label-add storage=high nas-node-01
docker node update --label-add role=compute node-light-01
docker node update --label-add infra=true nas-node-01
```

## 4. Set Up Dokploy

On the primary node:

```bash
curl -sSL https://dokploy.com/install.sh | sh
```

- The Dokploy UI will be available on port 3000.
- Set up the admin account on first login and change any default credentials ASAP.

## 5. Set Up Shared NFS Storage from Your NAS

On your NAS node:

- Install the NFS server (Debian/Ubuntu):

  ```bash
  sudo apt install nfs-kernel-server
  ```

- Export a directory: edit `/etc/exports` and add:

  ```
  /mnt/storage/docker-data *(rw,sync,no_subtree_check)
  ```

- Reload the exports and restart NFS:

  ```bash
  sudo exportfs -ra
  sudo systemctl restart nfs-kernel-server
  ```

## 6. Create Shared NFS Volume in Docker

On the manager node (replace <NAS_IP> with your NAS's address):

```bash
docker volume create \
  --driver local \
  --opt type=nfs \
  --opt o=addr=<NAS_IP>,rw,nolock,nfsvers=4 \
  --opt device=:/mnt/storage/docker-data \
  shared-data
```

## 7. Deploy Apps with Dokploy + Placement Constraints

Use the Dokploy UI to:

- Deploy your web apps (Node.js, PHP, static sites)
- Set replica counts (scaling)
- Pin infrastructure apps (like Pi-hole or MinIO) to the NAS node via placement constraints
- Use the shared NFS volume for persistent data

Example Docker Compose snippet for pinning:

```yaml
services:
  pihole:
    image: pihole/pihole
    deploy:
      placement:
        constraints:
          - node.labels.role==storage
    volumes:
      - shared-data:/etc/pihole
```

## 8. (Optional) Set Up MinIO (S3-Compatible Storage)

Deploy MinIO with Dokploy, pin it to your NAS, and use the shared volume for data:

```yaml
services:
  minio:
    image: minio/minio
    command: server /data --console-address ":9001"
    environment:
      MINIO_ROOT_USER: admin
      MINIO_ROOT_PASSWORD: changeme123  # change this before deploying
    volumes:
      - shared-data:/data
    deploy:
      placement:
        constraints:
          - node.labels.role==storage
    ports:
      - "9000:9000"
      - "9001:9001"
```

## 9. Add Web Apps and Experiment!

- Use Dokploy's UI to connect to your Gitea instance, auto-deploy repos, and experiment rapidly.
- Traefik integration and SSL setup are handled automatically by Dokploy.

## 10. Restore K3s (Optional, Later)

Your original K3s manifests are saved in git; just reapply them if you wish to revert:

```bash
k3s server
kubectl apply -f <manifest-dir>
```

## References

- Docker Swarm Docs: https://docs.docker.com/engine/swarm/
- Dokploy Docs: https://dokploy.com/docs/
- Docker Volumes: https://docs.docker.com/engine/storage/volumes/
- NFS on Linux: https://help.ubuntu.com/community/NFS

This guide gives you a fast start for a declarative, multi-node homelab with web app simplicity and infrastructure reliability using Dokploy and Docker Swarm!
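As an appendix to step 3: the per-node labelling can also be generated from a single declarative list, which keeps the labels reviewable in git before they are applied. A minimal POSIX-shell sketch; the node names and labels are just this guide's examples, so adjust them to your cluster:

```shell
#!/bin/sh
# Hypothetical helper: build the `docker node update` commands from one list.
# Each line is "<node-name> <key>=<value>"; review the output, then run it
# on the manager node.
LABELS="nas-node-01 role=storage
nas-node-01 storage=high
nas-node-01 infra=true
node-light-01 role=compute"

# Turn each "node label" pair into a docker command line
CMDS=$(printf '%s\n' "$LABELS" | while read -r node label; do
  printf 'docker node update --label-add %s %s\n' "$label" "$node"
done)

# Print the generated commands for review (pipe to `sh` to apply)
printf '%s\n' "$CMDS"
```

Running this prints one `docker node update --label-add …` line per entry; applying is then a deliberate second step, e.g. piping the reviewed output to `sh` on the manager.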