new plan and docs

ACTION_PLAN.md

# Home Lab Action Plan

## Phase 1: Critical Fixes (Do This Week)

### 1.1 Fix Failing Services

**bewcloud-memos (Restarting Loop)**
```bash
# SSH to controller
ssh -i ~/.ssh/id_ed25519 ubuntu@192.168.2.130

# Check what's wrong
docker service logs bewcloud-memos-ssogxn-memos --tail 100
# The audit shows a container status ("Restarting (0)"). If memos runs as a
# plain compose container rather than a Swarm service, use instead:
#   docker ps -a --filter name=bewcloud-memos
#   docker logs <container-name> --tail 100

# Common fixes:
# If it is a database connection issue, correct the database settings for this
# app in the Dokploy UI. Assuming this is the usememos image, it reads
# MEMOS_DRIVER and MEMOS_DSN. For a Swarm service the CLI equivalent is:
docker service update --env-add "MEMOS_DSN=<correct-dsn>" bewcloud-memos-ssogxn-memos

# If it keeps failing, try recreating it:
docker service rm bewcloud-memos-ssogxn-memos
# Then redeploy via the Dokploy UI
```

**bendtstudio-webstatic (Rollback Paused)**
```bash
# Check the error
docker service ps bendtstudio-webstatic-iq9evl --no-trunc

# Force update to retry
docker service update --force bendtstudio-webstatic-iq9evl

# If that fails, inspect the image
docker service inspect bendtstudio-webstatic-iq9evl --format '{{.Spec.TaskTemplate.ContainerSpec.Image}}'
```

**syncthing (Stopped)**
```bash
# Option A: Start it if you need it
docker service scale syncthing=1

# Option B: Remove it if not needed
docker service rm syncthing
# Also remove the volume if no longer needed. Volumes are node-local, so run
# this on the node that holds it (syncthing is expected on node-nas).
docker volume rm cloud-syncthing-i2rpwr_syncthing_config
```

### 1.2 Clean Up Unused Resources

```bash
# Volumes are node-local: run these on each node (tpi-n1, tpi-n2, node-nas).
# List candidates first. Volumes of services scaled to 0 (e.g. syncthing)
# count as unused and will be deleted.
docker volume ls --filter dangling=true

# Remove unused volumes (reclaim ~595MB). Since Docker 23, a plain
# `docker volume prune` only removes anonymous volumes; --all includes named ones.
docker volume prune --all

# Remove unused images
docker image prune -a

# System-wide cleanup (stopped containers, unused networks, images, build cache;
# --volumes covers anonymous volumes only)
docker system prune -a --volumes
```
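Volumes are node-local, so pruning on the controller alone leaves orphaned volumes on the workers. The following is a minimal sketch for previewing prune candidates on every node first. It assumes the SSH logins listed in QUICK_REFERENCE.md, that the Pis accept `~/.ssh/id_ed25519` (one of ssh's default keys), and that each login can run `docker` without sudo. The helper name is hypothetical.

```bash
#!/usr/bin/env bash
# Preview, node by node, the volumes that no container references (the
# candidates for `docker volume prune --all`).
# Assumption: these SSH targets match QUICK_REFERENCE.md and each login
# can run docker without sudo.
NODES=(ubuntu@192.168.2.130 ubuntu@192.168.2.19 tim@192.168.2.18)

preview_unused_volumes() {
  local node
  for node in "$@"; do
    echo "== $node"
    # dangling=true: volumes not referenced by any container on that node
    ssh "$node" docker volume ls --quiet --filter dangling=true
  done
}

# Usage: preview_unused_volumes "${NODES[@]}"
```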

### 1.3 Document Current State

Take screenshots of:
- Dokploy UI (all projects)
- Swarmpit dashboard
- Traefik dashboard (http://192.168.2.130:8080)
- MinIO console (http://192.168.2.18:9001)
- Gitea repositories

---

## Phase 2: Configuration Backup (Do This Week)

### 2.1 Create Git Repository for Infrastructure

```bash
# On the controller node:
ssh -i ~/.ssh/id_ed25519 ubuntu@192.168.2.130

# Create a backup directory (compute the date once so every step agrees)
BACKUP_DATE=$(date +%Y-%m-%d)
mkdir -p ~/infrastructure-backup/"$BACKUP_DATE"
cd ~/infrastructure-backup/"$BACKUP_DATE"

# Copy all compose files. /etc/dokploy is usually root-owned, hence sudo.
# Note: traefik-config includes acme.json (certificate private keys), and the
# compose directories may include .env files with passwords.
sudo cp -r /etc/dokploy/compose ./dokploy-compose
sudo cp -r /etc/dokploy/traefik ./traefik-config
sudo chown -R "$USER": ./dokploy-compose ./traefik-config
cp ~/minio-stack.yml ./

# Export service configs, one JSON file per service name.
# These include every environment variable, plaintext passwords included.
mkdir -p ./service-configs
docker service ls --format '{{.Name}}' | while read -r service; do
  docker service inspect "$service" > "./service-configs/${service}.json"
done

# Export stack task lists (`docker stack ls` has no -q flag, so use --format)
docker stack ls --format '{{.Name}}' | while read -r stack; do
  docker stack ps "$stack" > "./service-configs/${stack}-tasks.txt"
done

# Create a summary
cat > README.txt << EOF
Infrastructure Backup - $(date)
Cluster: Docker Swarm with Dokploy
Nodes: 3 (tpi-n1, tpi-n2, node-nas)
Services: $(docker service ls -q | wc -l) services
Stacks: $(docker stack ls --format '{{.Name}}' | wc -l) stacks

See HOMELAB_AUDIT.md for full documentation.
EOF

# Create tar archive
cd ..
tar -czf "infrastructure-$BACKUP_DATE.tar.gz" "$BACKUP_DATE"
```
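The service exports above contain every environment variable verbatim, including the passwords that Phase 3 is meant to remove. This minimal sketch masks values whose key looks sensitive before the files leave the node. The key pattern (`PASSWORD`, `SECRET`, `TOKEN`, `KEY`, upper-case only) is an assumption to adjust. Values containing escaped quotes are not handled.

```bash
#!/usr/bin/env bash
# Mask "KEY=value" strings in JSON (as printed by `docker service inspect`)
# when KEY contains PASSWORD, SECRET, TOKEN or KEY. Upper-case keys only.
redact_env_values() {
  sed -E 's/"([A-Z0-9_]*(PASSWORD|SECRET|TOKEN|KEY)[A-Z0-9_]*)=[^"]*"/"\1=REDACTED"/g'
}

# Usage, e.g. inside the export loop:
#   docker service inspect "$service" | redact_env_values > "./service-configs/${service}.json"
```

Masking happens per string, so the rest of the JSON, including non-secret variables such as `TZ`, stays readable in Git.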

### 2.2 Commit to Gitea

```bash
# Clone your infrastructure repo (create it in Gitea first if needed).
# Replace with your actual Gitea URL. Traefik serves Gitea at
# gitea.bendtstudio.com; Dokploy also listens on port 3000 on the
# controller, so a :3000 URL may not reach Gitea.
git clone https://gitea.bendtstudio.com/sirtimbly/homelab-configs.git
cd homelab-configs

# Copy backed-up configs
cp -r ~/infrastructure-backup/$(date +%Y-%m-%d)/* .

# Organize by service
mkdir -p {stacks,compose,dokploy,traefik,docs}
mv dokploy-compose/* compose/ 2>/dev/null || true
mv traefik-config/* traefik/ 2>/dev/null || true
mv minio-stack.yml stacks/
mv service-configs/* docs/ 2>/dev/null || true
rmdir dokploy-compose traefik-config service-configs 2>/dev/null || true

# Review before committing: compose .env files, the service JSON exports and
# traefik/dynamic/acme.json can hold passwords and private keys.
git status

# Commit
git add .
git commit -m "Initial infrastructure backup - $(date +%Y-%m-%d)

- All Dokploy compose files
- Traefik configuration
- MinIO stack definition
- Service inspection exports
- Task history exports

Services backed up:
$(docker service ls --format '- {{.Name}}' | sort)"

git push origin main
```
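Before pushing, it is worth checking that no plaintext credentials were copied in with the compose files. This is a rough sketch assuming GNU grep. The pattern is a heuristic that will also flag `*_FILE` secret paths, so treat hits as files to review rather than confirmed leaks. The helper name is hypothetical.

```bash
#!/usr/bin/env bash
# List files that appear to contain plaintext credentials, e.g.
# "DB_PASSWORD=hunter2" or "MINIO_ROOT_PASSWORD: changeme123".
# Values starting with "$" (variable references) are ignored.
find_plaintext_secrets() {
  grep -rIlE --exclude-dir=.git \
    '[A-Z0-9_]*(PASSWORD|SECRET|TOKEN)[A-Z0-9_]*[=:][[:space:]]*[^[:space:]$]' \
    "${1:-.}"
}

# Usage, from inside the repo before committing:
#   if find_plaintext_secrets .; then echo "Review the files above before committing"; fi
```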

---

## Phase 3: Security Hardening (Do Next Week)

### 3.1 Remove Exposed Credentials

**Problem:** Services keep passwords in environment variables, which anyone able to run `docker service inspect` can read.

**Solution:** Use Docker secrets or Dokploy environment variables

```bash
# Example: Securing MinIO
# Instead of having the password in the compose file, use a Docker secret.
# printf '%s' avoids the trailing newline that echo would add to the value;
# typed inline like this, the password still lands in your shell history.
printf '%s' "your-minio-password" | docker secret create minio_root_password -

# Then in the compose/stack file:
# services:
#   minio:
#     environment:
#       MINIO_ROOT_PASSWORD_FILE: /run/secrets/minio_root_password
#     secrets:
#       - minio_root_password
# secrets:
#   minio_root_password:
#     external: true
```
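To keep the password out of shell history, a helper can prompt for the value instead. This is a sketch of such a helper (hypothetical name). It only wraps bash's `read` and `docker secret create`, and must run on a manager node.

```bash
#!/usr/bin/env bash
# Prompt for a secret value without echoing it, then create a Swarm secret.
# printf '%s' ensures no trailing newline becomes part of the secret.
create_secret_interactively() {
  local name="$1" value
  read -r -s -p "Value for secret '$name': " value
  echo >&2
  if [ -z "$value" ]; then
    echo "Refusing to create an empty secret" >&2
    return 1
  fi
  printf '%s' "$value" | docker secret create "$name" -
}

# Usage (on a manager node): create_secret_interactively minio_root_password
```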

**Action items:**

1. List all services with exposed passwords:
   ```bash
   docker service ls -q | xargs -I {} docker service inspect {} --format '{{.Spec.Name}}: {{range .Spec.TaskTemplate.ContainerSpec.Env}}{{.}} {{end}}' | grep -iE 'pass|secret|token'
   # Containers started with plain docker compose are not Swarm services; check them with docker inspect:
   docker ps -q | xargs -I {} docker inspect {} --format '{{.Name}}: {{range .Config.Env}}{{.}} {{end}}' | grep -iE 'pass|secret|token'
   ```

2. For each service, create a plan to move credentials to:
   - Docker secrets (best for Swarm)
   - Environment files (easier to manage, but the values still appear in `docker inspect`)
   - Dokploy UI environment variables

3. Update compose files and redeploy

### 3.2 Update Default Passwords

Check for default/weak passwords:
- Dokploy (if still default)
- MinIO
- Gitea admin
- Technitium DNS
- Any databases

### 3.3 Review Exposed Ports

```bash
# Check all published ports of Swarm services
docker service ls --format '{{.Name}}: {{.Ports}}'
# Plain compose containers (e.g. technitium-dns, immich) publish ports outside
# Swarm; run this on each node
docker ps --format '{{.Names}}: {{.Ports}}'

# Check whether any services are exposed without Traefik.
# Expected: 80/443 (Traefik itself) plus 53, 2222, 3000, 8384, 9000-9001.
# Anything else, such as 3306 (bendtstudio-pancake MariaDB) or 8080 (Traefik
# dashboard), deserves a closer look.
```
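Comparing that output against the expected list by eye is error-prone. This minimal sketch (hypothetical helper) prints only the published ports that are not on an allowlist. The allowlist mirrors the expected ports named above and is an assumption to keep in sync.

```bash
#!/usr/bin/env bash
# Print published host ports that are not on the expected list.
# Assumption: ALLOWED_PORTS mirrors the expected ports listed above.
ALLOWED_PORTS="53 80 443 2222 3000 8384 9000 9001"

unexpected_ports() {
  # stdin: lines like "minio_minio: *:9000->9000/tcp, *:9001->9001/tcp".
  # Compacted ranges such as "9000-9001->9000-9001/tcp" only report their last port.
  grep -oE '[0-9]+->' | tr -d '>-' | sort -un | while read -r port; do
    case " $ALLOWED_PORTS " in
      *" $port "*) ;;            # expected, stay quiet
      *) echo "$port" ;;
    esac
  done
}

# Usage:
#   { docker service ls --format '{{.Name}}: {{.Ports}}'
#     docker ps --format '{{.Names}}: {{.Ports}}'; } | unexpected_ports
```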

---

## Phase 4: Monitoring Setup (Do Next Week)

### 4.1 Set Up Prometheus + Grafana

You mentioned these in PLAN.md but they're not running. Let's add them.

Create `monitoring-stack.yml`. It expects a `prometheus.yml` in the same directory and an existing `grafana_admin_password` Docker secret:
```yaml
version: '3.8'

services:
  prometheus:
    image: prom/prometheus:latest
    volumes:
      - prometheus-data:/prometheus
      # Resolved relative to this file at deploy time; deploy from the manager
      - ./prometheus.yml:/etc/prometheus/prometheus.yml:ro
    command:
      - '--config.file=/etc/prometheus/prometheus.yml'
      - '--storage.tsdb.path=/prometheus'
    networks:
      - dokploy-network
    deploy:
      placement:
        constraints:
          - node.role == manager

  grafana:
    image: grafana/grafana:latest
    volumes:
      - grafana-data:/var/lib/grafana
    environment:
      - GF_SECURITY_ADMIN_PASSWORD__FILE=/run/secrets/grafana_admin_password
    secrets:
      - grafana_admin_password
    networks:
      - dokploy-network
    deploy:
      placement:
        constraints:
          # Keep Grafana on one node so its node-local volume stays with it
          - node.role == manager
      labels:
        - traefik.enable=true
        - traefik.http.routers.grafana.rule=Host(`grafana.bendtstudio.com`)
        - traefik.http.routers.grafana.entrypoints=websecure
        - traefik.http.routers.grafana.tls.certresolver=letsencrypt
        # Swarm mode gives Traefik no port information, so declare Grafana's port
        - traefik.http.services.grafana.loadbalancer.server.port=3000

volumes:
  prometheus-data:
  grafana-data:

networks:
  dokploy-network:
    external: true

secrets:
  grafana_admin_password:
    external: true
```
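The stack file mounts `./prometheus.yml` and expects an external `grafana_admin_password` secret, but neither exists yet. This minimal sketch writes a starting config and shows the deploy commands. Assumptions: the stack name `monitoring`, and a `node-exporter` service deployed in the same stack on `dokploy-network`. The `node-exporter` job discovers one target per task through Swarm's `tasks.<service>` DNS name. The helper name is hypothetical.

```bash
#!/usr/bin/env bash
# Write a minimal prometheus.yml next to monitoring-stack.yml.
write_prometheus_config() {
  local dir="${1:-.}"
  cat > "$dir/prometheus.yml" <<'EOF'
global:
  scrape_interval: 30s

scrape_configs:
  - job_name: prometheus
    static_configs:
      - targets: ['localhost:9090']

  # One target per node-exporter task, discovered through Swarm's DNS
  - job_name: node-exporter
    dns_sd_configs:
      - names: ['tasks.node-exporter']
        type: A
        port: 9100
EOF
}

# Usage on the manager (tpi-n1), from the directory holding monitoring-stack.yml:
#   write_prometheus_config .
#   read -r -s -p 'Grafana admin password: ' pw; printf '%s' "$pw" | docker secret create grafana_admin_password -
#   docker stack deploy -c monitoring-stack.yml monitoring
```

Until node-exporter is running, the DNS lookup simply returns no targets, so Prometheus starts cleanly either way.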

### 4.2 Add Node Exporter

Deploy node-exporter on all nodes to collect system metrics.
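One way to do this is a global-mode service, which Swarm schedules exactly once per node. The following is a sketch under several assumptions:

- it is deployed into the same stack as `monitoring-stack.yml` under a stack name of `monitoring` (`docker stack deploy` accepts several `-c` files);
- `prom/node-exporter` images exist for both arm64 (the Pis) and x86_64 (node-nas).

The helper name is hypothetical.

```bash
#!/usr/bin/env bash
# Write node-exporter.yml: one node-exporter task per Swarm node.
write_node_exporter_stack() {
  local dir="${1:-.}"
  cat > "$dir/node-exporter.yml" <<'EOF'
version: '3.8'

services:
  node-exporter:
    image: prom/node-exporter:latest
    command:
      - '--path.procfs=/host/proc'
      - '--path.sysfs=/host/sys'
      - '--path.rootfs=/rootfs'
    volumes:
      - /proc:/host/proc:ro
      - /sys:/host/sys:ro
      - /:/rootfs:ro
    networks:
      - dokploy-network
    deploy:
      mode: global

networks:
  dokploy-network:
    external: true
EOF
}

# Usage on the manager:
#   write_node_exporter_stack .
#   docker stack deploy -c monitoring-stack.yml -c node-exporter.yml monitoring
```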

### 4.3 Configure Alerts

Set up alerts for:
- Service down
- High CPU/memory usage
- Disk space low
- Certificate expiration
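The following is a starting set of Prometheus alerting rules for the first three items, assuming node-exporter metrics are being scraped. The thresholds are assumptions. Some gaps remain:

- `InstanceDown` covers scrape targets, not individual Swarm services; per-service checks are closer to what the already-deployed Uptime Kuma does.
- The rules only mark alerts as firing in the Prometheus UI. Notifications need Alertmanager.
- Certificate expiry needs a probe such as the blackbox exporter.

None of these pieces is in this plan yet. The helper name is hypothetical.

```bash
#!/usr/bin/env bash
# Write alerts.yml. Prometheus loads it only if prometheus.yml lists it, e.g.
#   rule_files: ['/etc/prometheus/alerts.yml']
# and the file is mounted into the container at that path.
write_alert_rules() {
  local dir="${1:-.}"
  cat > "$dir/alerts.yml" <<'EOF'
groups:
  - name: homelab
    rules:
      - alert: InstanceDown
        expr: up == 0
        for: 5m
        labels:
          severity: critical
      - alert: HighMemoryUsage
        expr: node_memory_MemAvailable_bytes / node_memory_MemTotal_bytes < 0.10
        for: 10m
        labels:
          severity: warning
      - alert: HighCpuUsage
        expr: 1 - avg by (instance) (rate(node_cpu_seconds_total{mode="idle"}[5m])) > 0.90
        for: 15m
        labels:
          severity: warning
      - alert: DiskSpaceLow
        expr: node_filesystem_avail_bytes{fstype!~"tmpfs|overlay"} / node_filesystem_size_bytes{fstype!~"tmpfs|overlay"} < 0.10
        for: 10m
        labels:
          severity: warning
EOF
}

# Usage: write_alert_rules .   (then mount it and add rule_files as noted above)
```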

---

## Phase 5: Backup Strategy (Do Within 2 Weeks)

### 5.1 Define What to Back Up

**Critical Data:**
1. Gitea repositories (/data/git)
2. Dokploy database
3. MinIO buckets
4. Immich photos (/mnt/synology-data/immich)
5. PostgreSQL databases
6. Configuration files

### 5.2 Create Backup Scripts

Example backup script for Gitea:
```bash
#!/bin/bash
# /opt/backup/backup-gitea.sh

BACKUP_DIR="/backup/gitea/$(date +%Y%m%d)"
CONTAINER="gitea-giteasqlite-bhymqw-gitea-1"
mkdir -p "$BACKUP_DIR"

# Back up Gitea data. Archiving a live SQLite database can catch it mid-write;
# stopping Gitea briefly, or using Gitea's own `gitea dump`, is safer.
docker exec "$CONTAINER" tar czf /tmp/gitea-backup.tar.gz /data
docker cp "$CONTAINER":/tmp/gitea-backup.tar.gz "$BACKUP_DIR/"
docker exec "$CONTAINER" rm -f /tmp/gitea-backup.tar.gz

# Copy to MinIO on node-nas. This is off-node, not offsite: both are in the
# same location. Assumes an mc alias named "minio" (created with `mc alias set`).
mc cp "$BACKUP_DIR/gitea-backup.tar.gz" minio/backups/gitea/

# Clean up old backups (keep 30 days); -mindepth 1 spares /backup/gitea itself
find /backup/gitea -mindepth 1 -maxdepth 1 -type d -mtime +30 -exec rm -rf {} +
```
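The crontab in 5.3 also calls `backup-dokploy.sh`, which this plan does not define. Below is a hedged sketch of its core. It assumes:

- the Dokploy Postgres container's name contains `dokploy-postgres`;
- the database user and the database are both named `dokploy` (verify with `docker service inspect dokploy-postgres`).

QUICK_REFERENCE.md notes that Dokploy already backs up its database to MinIO, so this may duplicate that. The helper name is hypothetical.

```bash
#!/usr/bin/env bash
# Core of a hypothetical /opt/backup/backup-dokploy.sh.
# Must run on the node hosting dokploy-postgres (the manager, tpi-n1).
set -o pipefail

backup_dokploy_db() {
  local dest_dir="$1" cid out
  cid=$(docker ps -q --filter name=dokploy-postgres | head -n 1)
  if [ -z "$cid" ]; then
    echo "dokploy-postgres container not found on this node" >&2
    return 1
  fi
  mkdir -p "$dest_dir"
  out="$dest_dir/dokploy-$(date +%Y%m%d).sql.gz"
  # Assumption: user and database are both named "dokploy"
  if ! docker exec "$cid" pg_dump -U dokploy dokploy | gzip > "$out"; then
    rm -f "$out"
    echo "pg_dump failed" >&2
    return 1
  fi
  echo "$out"
}

# Usage: backup_dokploy_db /backup/dokploy
```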
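
### 5.3 Automate Backups

Add to crontab:
```bash
# Daily backups from 2 AM, staggered by an hour. Put them in root's crontab
# (sudo crontab -e) so the scripts can write to /backup and /var/log.
# backup-dokploy.sh and backup-databases.sh are not defined in this plan yet.
0 2 * * * /opt/backup/backup-gitea.sh >> /var/log/backup-gitea.log 2>&1
0 3 * * * /opt/backup/backup-dokploy.sh >> /var/log/backup-dokploy.log 2>&1
0 4 * * * /opt/backup/backup-databases.sh >> /var/log/backup-databases.log 2>&1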
```
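Editing the crontab by hand makes it easy to add a line twice. This is a sketch of an idempotent installer (hypothetical helper) that appends only the entries not already present. The entries below repeat the schedule above with output logged.

```bash
#!/usr/bin/env bash
# Entries to install (same schedule as above, output logged).
BACKUP_CRON_ENTRIES='0 2 * * * /opt/backup/backup-gitea.sh >> /var/log/backup-gitea.log 2>&1
0 3 * * * /opt/backup/backup-dokploy.sh >> /var/log/backup-dokploy.log 2>&1
0 4 * * * /opt/backup/backup-databases.sh >> /var/log/backup-databases.log 2>&1'

# stdin: the current crontab; stdout: it plus any entry it lacks.
merge_cron_entries() {
  local current line
  current=$(cat)
  if [ -n "$current" ]; then
    printf '%s\n' "$current"
  fi
  while IFS= read -r line; do
    [ -n "$line" ] || continue
    grep -qxF -- "$line" <<< "$current" || printf '%s\n' "$line"
  done <<< "$BACKUP_CRON_ENTRIES"
}

# Usage (root's crontab):
#   sudo crontab -l 2>/dev/null | merge_cron_entries | sudo crontab -
```

Running it a second time leaves the crontab unchanged, so it is safe to include in a provisioning script.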

---

## Phase 6: Documentation (Ongoing)

### 6.1 Create Service Catalog

For each service, document:
- **Purpose:** What does it do?
- **Access URL:** How do I reach it?
- **Dependencies:** What does it need?
- **Data location:** Where is data stored?
- **Backup procedure:** How to back it up?
- **Restore procedure:** How to restore it?

### 6.2 Create Runbooks

Common operations:
- Adding a new service
- Scaling a service
- Updating a service
- Removing a service
- Recovering from node failure
- Restoring from backup

### 6.3 Network Diagram

Create a visual diagram showing:
- Nodes and their roles
- Services and their locations
- Network connections
- Data flows

---

## Quick Reference Commands

```bash
# Cluster status
docker node ls
docker service ls
docker stack ls

# Service management
docker service logs <service> --tail 100 -f
docker service ps <service>
docker service scale <service>=<count>
docker service update --force <service>

# Resource usage
docker system df
docker stats

# SSH access
ssh -i ~/.ssh/id_ed25519 ubuntu@192.168.2.130 # Manager
ssh -i ~/.ssh/id_ed25519 ubuntu@192.168.2.19 # Worker

# Web UIs
curl http://192.168.2.130:3000 # Dokploy
curl http://192.168.2.130:888 # Swarmpit
curl http://192.168.2.130:8080 # Traefik
curl http://192.168.2.18:5380 # Technitium DNS
curl http://192.168.2.18:9001 # MinIO Console
```
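Plain `curl` prints the whole page. This small sketch (hypothetical helpers) prints just the HTTP status for each UI listed above, with a 5-second timeout. A status of `000` means no response.

```bash
#!/usr/bin/env bash
# Print "<name>\t<HTTP status>\t<url>" for a web UI.
check_ui() {
  local name="$1" url="$2" code
  code=$(curl -s -o /dev/null -w '%{http_code}' --max-time 5 "$url") || true
  printf '%s\t%s\t%s\n' "$name" "${code:-000}" "$url"
}

check_all_uis() {
  check_ui Dokploy    http://192.168.2.130:3000
  check_ui Swarmpit   http://192.168.2.130:888
  check_ui Traefik    http://192.168.2.130:8080
  check_ui Technitium http://192.168.2.18:5380
  check_ui MinIO      http://192.168.2.18:9001
}

# Usage: check_all_uis
```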

---

## Questions for You

Before we proceed, I need to clarify a few things:

1. **NAS Node Access:** What are the SSH credentials for node-nas (192.168.2.18)?

2. **bendtstudio-app:** Is this service needed? It has 0 replicas.

3. **syncthing:** Do you want to keep this? It's currently stopped.

4. **Monitoring:** Do you want me to set up Prometheus/Grafana now, or later?

5. **Gitea:** Can you provide access credentials so I can check what's already version controlled?

6. **Priority:** Which phase should we tackle first? I recommend Phase 1 (critical fixes).

---

*Action Plan Version 1.0 - February 9, 2026*

HOMELAB_AUDIT.md

# Home Lab Cluster Audit Report

**Date:** February 9, 2026
**Auditor:** opencode
**Cluster:** Docker Swarm with Dokploy

---

## 1. Cluster Overview

- **Cluster Type:** Docker Swarm (3 nodes)
- **Orchestration:** Dokploy v3.x
- **Reverse Proxy:** Traefik v3.6.1
- **DNS:** Technitium DNS Server
- **Monitoring:** Swarmpit
- **Git Server:** Gitea v1.24.4
- **Object Storage:** MinIO

---

## 2. Node Inventory

### Node 1: tpi-n1 (Controller/Manager)
- **IP:** 192.168.2.130
- **Role:** Manager (Leader)
- **Architecture:** aarch64 (ARM64)
- **OS:** Linux
- **CPU:** 8 cores
- **RAM:** ~8 GB
- **Docker:** v27.5.1
- **Labels:**
  - `infra=true`
  - `role=storage`
  - `storage=high`
- **Status:** Ready, Active

### Node 2: tpi-n2 (Worker)
- **IP:** 192.168.2.19
- **Role:** Worker
- **Architecture:** aarch64 (ARM64)
- **OS:** Linux
- **CPU:** 8 cores
- **RAM:** ~8 GB
- **Docker:** v27.5.1
- **Labels:**
  - `role=compute`
- **Status:** Ready, Active

### Node 3: node-nas (Storage Worker)
- **IP:** 192.168.2.18
- **Role:** Worker (NAS/Storage)
- **Architecture:** x86_64
- **OS:** Linux
- **CPU:** 2 cores
- **RAM:** ~8 GB
- **Docker:** v29.1.2
- **Labels:**
  - `type=nas`
- **Status:** Ready, Active

---

## 3. Docker Stacks (Swarm Mode)

### Active Stacks:

#### 1. minio
- **Services:** 1 (minio_minio)
- **Status:** Running
- **Node:** node-nas (constrained to NAS)
- **Ports:** 9000 (API), 9001 (Console)
- **Storage:** /mnt/synology-data/minio (bind mount)
- **Credentials:** [REDACTED - see service config]

#### 2. swarmpit
- **Services:** 4
  - swarmpit_app (UI) - Running on tpi-n1, Port 888
  - swarmpit_agent (global) - Running on all 3 nodes
  - swarmpit_db (CouchDB) - Running on tpi-n2
  - swarmpit_influxdb - Running on node-nas
- **Status:** Active with historical failures
- **Issues:** Multiple container failures in history (mostly resolved)

---

## 4. Dokploy-Managed Services

### Running Services (via Dokploy Compose):

1. **ai-lobechat-yqvecg** - AI chat interface
2. **bewcloud-memos-ssogxn** - Note-taking app (⚠️ Restarting loop)
3. **bewcloud-silverbullet-42sjev** - SilverBullet markdown editor + Watchtower
4. **cloud-bewcloud-u2pls5** - BewCloud instance with Radicale (CalDAV/CardDAV)
5. **cloud-fizzy-ezuhfq** - Fizzy web app
6. **cloud-ironcalc-0id5k8** - IronCalc spreadsheet
7. **cloud-radicale-wqldcv** - Standalone Radicale server
8. **cloud-uptimekuma-jdeivt** - Uptime monitoring
9. **dns-technitum-6ojgo2** - Technitium DNS server
10. **gitea-giteasqlite-bhymqw** - Git server (Port 3000, SSH on 2222)
11. **gitea-registry-vdftrt** - Docker registry (Port 5000)

### Dokploy Infrastructure Services:
- **dokploy** - Main Dokploy UI (Port 3000, host mode)
- **dokploy-postgres** - Dokploy database
- **dokploy-redis** - Dokploy cache
- **dokploy-traefik** - Reverse proxy (Ports 80, 443, 8080)

---

## 5. Standalone Services (docker-compose)

### Running:
- **technitium-dns** - DNS server (Port 53, 5380)
- **immich3-compose** - Photo management (Immich v2.3.0)
  - immich-server
  - immich-machine-learning
  - immich-database (pgvecto-rs)
  - immich-redis

### Stack Services:
- **bendtstudio-pancake-bzgfpc** - MariaDB database (Port 3306)
- **bendtstudio-webstatic-iq9evl** - Static web files (⚠️ Rollback paused state)

---

## 6. Issues Identified

### 🔴 Critical Issues:

1. **bewcloud-memos in Restart Loop**
   - The container keeps restarting; the last restart was 24 seconds before the check
   - Status: `Restarting (0) 24 seconds ago` (last exit code 0: the process exits without an error code and the restart policy starts it again)
   - **Action Required:** Check logs and fix configuration

2. **bendtstudio-webstatic in Rollback Paused State**
   - A service update failed and the subsequent rollback is paused
   - State: `rollback_paused`
   - **Action Required:** Investigate update failure

3. **bendtstudio-app Not Running**
   - Service has 0/0 replicas
   - **Action Required:** Determine if needed or remove

4. **syncthing Stopped**
   - Service has 0 replicas
   - Expected placement: node-nas
   - **Action Required:** Restart or remove if not needed

### 🟡 Warning Issues:

5. **Swarmpit Agent Failures (Historical)**
   - Multiple past failures on all nodes
   - Currently running, but the failure history is concerning
   - **Action Required:** Monitor for stability

6. **No Monitoring of MinIO**
   - MinIO is running, but no backup or monitoring strategy is documented
   - **Action Required:** Set up monitoring and backup

7. **Credential Management**
   - Passwords visible in service configs (bendtstudio-webstatic, MinIO, DNS)
   - **Action Required:** Migrate to Docker secrets or env files

### 🟢 Informational:

8. **13 Unused/Orphaned Volumes**
   - 33 total volumes, only 20 active
   - **Action Required:** Clean up unused volumes to reclaim ~595MB

9. **Gitea Repository Status Unknown**
   - Cannot verify if all compose files are version controlled
   - **Action Required:** Audit Gitea repositories

---

## 7. Storage Configuration

### Local Volumes (33 total):
Key volumes include:
- `dokploy-postgres-database`
- `bewcloud-postgres-in40hh-data`
- `gitea-data`, `gitea-registry-data`
- `immich-postgres`, `immich-redis-data`, `immich-model-cache`
- `bendtstudio-pancake-data`
- `shared-data` (NFS/shared)
- Various app-specific volumes

### Bind Mounts:
- **MinIO:** `/mnt/synology-data/minio` → `/data`
- **Syncthing:** `/mnt/synology-data` → `/var/syncthing` (currently stopped)
- **Dokploy:** `/etc/dokploy` → `/etc/dokploy`

### NFS Mounts:
- Synology NAS mounted at `/mnt/synology-data/`
- Contains: immich/, minio/

---

## 8. Networking

### Overlay Networks:
- `dokploy-network` - Main Dokploy network
- `minio_default` - MinIO stack network
- `swarmpit_net` - Swarmpit monitoring network
- `ingress` - Docker Swarm ingress

### Bridge Networks:
- Multiple app-specific networks created by compose:
  - `ai-lobechat-yqvecg`
  - `bewcloud-memos-ssogxn`
  - `bewcloud-silverbullet-42sjev`
  - `cloud-fizzy-ezuhfq_default`
  - `cloud-uptimekuma-jdeivt`
  - `gitea-giteasqlite-bhymqw`
  - `gitea-registry-vdftrt`
  - `immich3-compose-ubyhe9_default`

---

## 9. SSL/TLS Configuration

- **Certificate Resolver:** Let's Encrypt (ACME)
- **Email:** sirtimbly@gmail.com
- **Challenge Type:** HTTP-01
- **Storage:** `/etc/dokploy/traefik/dynamic/acme.json`
- **Entry Points:** web (80) → websecure (443) with auto-redirect
- **HTTP/3:** Enabled on websecure
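The audit does not record when the current certificates expire. This sketch checks how many days remain on the certificate a host serves on port 443. It assumes `openssl` and GNU `date` on the machine running it. The helper name is hypothetical.

```bash
#!/usr/bin/env bash
# Print the number of whole days until the certificate served on port 443 expires.
cert_days_left() {
  local host="$1" end_date end_epoch
  end_date=$(echo | openssl s_client -connect "$host:443" -servername "$host" 2>/dev/null \
    | openssl x509 -noout -enddate | cut -d= -f2)
  if [ -z "$end_date" ]; then
    echo "could not read a certificate from $host" >&2
    return 1
  fi
  # GNU date parses the openssl format, e.g. "Jan  1 00:00:00 2099 GMT"
  end_epoch=$(date -d "$end_date" +%s)
  echo $(( (end_epoch - $(date +%s)) / 86400 ))
}

# Usage: cert_days_left gitea.bendtstudio.com
```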

---

## 10. Traefik Routing

### Configured Routes (via labels):
- gitea.bendtstudio.com → Gitea
- Multiple apps via traefik.me subdomains
- HTTP → HTTPS redirect enabled
- Middlewares configured in `/etc/dokploy/traefik/dynamic/`

---

## 11. DNS Configuration

### Technitium DNS:
- **Port:** 53 (TCP/UDP), 5380 (Web UI)
- **Domain:** dns.bendtstudio.com
- **Admin Password:** [REDACTED]
- **Placement:** Locked to tpi-n1
- **TZ:** America/New_York

### Services using DNS:
- All services accessible via bendtstudio.com subdomains
- Internal DNS resolution for Docker services

---

## 12. Configuration Files Location

### In `/etc/dokploy/`:
- `traefik/traefik.yml` - Main Traefik config
- `traefik/dynamic/*.yml` - Dynamic routes and middlewares
- `compose/*/code/docker-compose.yml` - Dokploy-managed compose files

### In `/home/ubuntu/`:
- `minio-stack.yml` - MinIO stack definition

### In local workspace:
- Various compose files (not all deployed via Dokploy)
- May be out of sync with running services

---

## 13. Missing Configuration in Version Control

Based on the analysis, version-control coverage is uneven. Items marked ⚠️ or ❌ may NOT be properly tracked in Gitea:

1. ✅ **Gitea** itself - compose file present
2. ✅ **MinIO** - stack file in ~/minio-stack.yml
3. ⚠️ **Dokploy dynamic configs** - Traefik routes
4. ⚠️ **All Dokploy-managed compose files** - 11 services
5. ❌ **Technitium DNS** - compose file in /etc/dokploy/
6. ❌ **Immich** - compose configuration
7. ❌ **Swarmpit** - stack configuration
8. ❌ **Dokploy infrastructure** - internal services

---

## 14. Resource Usage

### Docker System:
- **Images:** 23 (10.91 GB)
- **Containers:** 26 (135 MB)
- **Volumes:** 33 (2.02 GB, 595MB reclaimable)
- **Build Cache:** 0

### Node Resources:
- **tpi-n1 & tpi-n2:** 8 cores ARM64, 8GB RAM each
- **node-nas:** 2 cores x86_64, 8GB RAM

---

## 15. Recommendations

### Immediate Actions (High Priority):

1. **Fix bewcloud-memos**
   ```bash
   docker service logs bewcloud-memos-ssogxn-memos --tail 50
   ```

2. **Fix bendtstudio-webstatic**
   ```bash
   docker service ps bendtstudio-webstatic-iq9evl --no-trunc
   docker service update --force bendtstudio-webstatic-iq9evl
   ```

3. **Restart or Remove syncthing**
   ```bash
   # Option 1: Scale up
   docker service scale syncthing=1

   # Option 2: Remove
   docker service rm syncthing
   ```

4. **Clean up unused volumes**
   ```bash
   # Run on each node. Since Docker 23 a plain prune removes only anonymous
   # volumes; --all also removes unused named volumes, so review first with
   # `docker volume ls --filter dangling=true`.
   docker volume prune --all
   ```

### Short-term Actions (Medium Priority):

5. **Audit Gitea repositories**
   - Access Gitea at http://gitea.bendtstudio.com
   - Verify which compose files are tracked
   - Commit missing configurations

6. **Secure credentials**
   - Use Docker secrets for passwords
   - Move credentials to environment files
   - Never commit .env files with real passwords

7. **Set up automated backups**
   - Back up Dokploy database
   - Back up Gitea repositories
   - Back up MinIO data

8. **Document all services**
   - Create README for each service
   - Document dependencies and data locations
   - Create runbook for common operations

### Long-term Actions (Low Priority):

9. **Implement proper monitoring**
   - Prometheus/Grafana for metrics (mentioned in PLAN.md but not found)
   - Alerting for service failures
   - Disk usage monitoring

10. **Implement GitOps workflow**
    - All changes through Git
    - Automated deployments via Dokploy webhooks
    - Configuration drift detection

11. **Consolidate storage strategy**
    - Define clear policy for volumes vs bind mounts
    - Document backup procedures for each storage type

12. **Security audit**
    - Review all exposed ports
    - Check for default/weak passwords
    - Implement network segmentation if needed

---

## 16. Next Steps Checklist

- [ ] Fix critical service issues (memos, webstatic)
- [ ] Document all running services with purpose
- [ ] Commit all compose files to Gitea
- [ ] Create backup strategy
- [ ] Set up monitoring and alerting
- [ ] Clean up unused resources
- [ ] Create disaster recovery plan
- [ ] Document SSH access for all nodes

---

## Appendix A: Quick Commands Reference

```bash
# View cluster status
docker node ls
docker service ls
docker stack ls

# View service logs
docker service logs <service-name> --tail 100 -f

# View container logs
docker logs <container-name> --tail 100 -f

# Scale a service
docker service scale <service-name>=<replicas>

# Update a service
docker service update --force <service-name>

# SSH to nodes
ssh -i ~/.ssh/id_ed25519 ubuntu@192.168.2.130  # tpi-n1 (manager)
ssh -i ~/.ssh/id_ed25519 ubuntu@192.168.2.19   # tpi-n2 (worker)
# The NAS node requires different credentials (login per QUICK_REFERENCE.md)
ssh tim@192.168.2.18                            # node-nas

# Access Dokploy UI (open in a browser)
#   http://192.168.2.130:3000

# Access Swarmpit UI
#   http://192.168.2.130:888

# Access Traefik Dashboard
#   http://192.168.2.130:8080
```

---

*End of Audit Report*

QUICK_REFERENCE.md

## Cluster Access

```bash
# SSH to controller
ssh -i ~/.ssh/id_ed25519 ubuntu@192.168.2.130

# SSH to worker
ssh -i ~/.ssh/id_ed25519 ubuntu@192.168.2.19

# SSH to NAS node
ssh tim@192.168.2.18
```

## Dokploy

Configured to deploy across the nodes in the cluster. The NAS node has access to an NFS share at <????>

Dokploy has an S3-compatible storage destination that points to MinIO on the NAS node. This is used for backups.

## MinIO

MinIO is installed on the NAS node. It serves as Dokploy's S3-compatible storage destination and holds the backups of the Dokploy database.

## Traefik

Traefik is installed on the controller node. It routes traffic to the various services in the cluster.

## Gitea

Used to store the compose and stack YAML files for each service.

## Technitium DNS

Using internal DNS requires configuring the docker DNS to point to the Technitium DNS server.
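One way to do this is the `dns` key in `/etc/docker/daemon.json` on each node. Containers then send names outside Docker's own service discovery to those servers. This sketch (hypothetical helper) only prints the JSON. It assumes the file does not exist yet; otherwise, merge the key in by hand so other daemon settings survive.

The Technitium IP is left as a placeholder because the docs disagree. The audit pins Technitium to tpi-n1, while the action plan reaches its UI at 192.168.2.18. Restarting Docker also restarts that node's containers.

```bash
#!/usr/bin/env bash
# Print a daemon.json body that sets the default DNS servers for containers.
docker_dns_json() {
  if [ "$#" -eq 0 ]; then
    echo "usage: docker_dns_json <dns-ip>..." >&2
    return 1
  fi
  local list="" ip
  for ip in "$@"; do
    list="${list:+$list, }\"$ip\""
  done
  printf '{\n  "dns": [%s]\n}\n' "$list"
}

# Usage on each node, only if /etc/docker/daemon.json does not exist yet:
#   docker_dns_json <TECHNITIUM_IP> | sudo tee /etc/docker/daemon.json
#   sudo systemctl restart docker
```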

SETUP.md

Dokploy + Docker Swarm Homelab Setup Instructions

This guide walks through setting up a fresh, multi-node Docker Swarm cluster using Dokploy for quick web app deployment and easy hosting of infrastructure services (like Pi-hole and MinIO), including shared storage via NFS from your NAS node.

1. Prepare Environment
• Choose a primary node (can be any capable Linux server).
• Identify your NAS node (high-capacity storage).
• Gather all SSH credentials.
• Ensure all nodes have Docker installed (curl -fsSL https://get.docker.com | sh).

2. Initialize Docker Swarm Cluster
On your primary node:
```bash
docker swarm init --advertise-addr <PRIMARY_NODE_IP>
```
On each additional node:
• Run the join command printed by `docker swarm init`, e.g.:
```bash
docker swarm join --token <TOKEN> <PRIMARY_NODE_IP>:2377
```

3. Label Nodes for Placement Constraints
On your primary node, label nodes:
```bash
docker node update --label-add role=storage nas-node-01
docker node update --label-add storage=high nas-node-01
docker node update --label-add role=compute node-light-01
docker node update --label-add infra=true nas-node-01
```
(Replace node names as appropriate. In the current cluster, HOMELAB_AUDIT.md shows `role=storage`, `storage=high` and `infra=true` on tpi-n1 and only `type=nas` on node-nas, so a `node.labels.role==storage` constraint targets tpi-n1 rather than the NAS.)

4. Set Up Dokploy
On primary node:
```bash
curl -sSL https://dokploy.com/install.sh | sh
```
• The Dokploy UI will be available on port 3000 (Traefik's dashboard uses 8080).
• There are no default credentials: the first visit to the UI creates the admin account, so do that right away and use a strong password.

5. Set Up Shared NFS Storage from Your NAS
On your NAS node:
• Install NFS server (Debian/Ubuntu):
```bash
sudo apt install nfs-kernel-server
```
• Export a directory:
  ◦ Edit /etc/exports and add a line like this (limit clients to your LAN subnet rather than `*`):
```
/mnt/storage/docker-data 192.168.2.0/24(rw,sync,no_subtree_check)
```
  ◦ Reload the exports and restart NFS:
```bash
sudo exportfs -ra
sudo systemctl restart nfs-kernel-server
```

6. Create Shared NFS Volume in Docker
On every node that will run tasks using this volume (a `local`-driver volume exists only on the node where it was created):
```bash
docker volume create \
  --driver local \
  --opt type=nfs \
  --opt o=addr=<NAS_IP>,rw,nolock,nfsvers=4 \
  --opt device=:/mnt/storage/docker-data \
  shared-data
```
(Replace <NAS_IP> with your NAS's address.)
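Because the volume is node-local, a small loop over SSH avoids repeating the command by hand on each node. This sketch (hypothetical helper) uses the same options as above. The node list is yours to fill in, and each SSH login is assumed to be able to run `docker`.

```bash
#!/usr/bin/env bash
# Create the NFS-backed "shared-data" volume on every listed node.
create_shared_volume_on_nodes() {
  local nas_ip="$1" node
  shift
  for node in "$@"; do
    echo "== $node"
    ssh "$node" docker volume create \
      --driver local \
      --opt type=nfs \
      --opt "o=addr=$nas_ip,rw,nolock,nfsvers=4" \
      --opt device=:/mnt/storage/docker-data \
      shared-data
  done
}

# Usage: create_shared_volume_on_nodes <NAS_IP> user@node1 user@node2 ...
```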

7. Deploy Apps with Dokploy + Placement Constraints
• Use the Dokploy UI to:
  ◦ Deploy your web apps (Node.js, PHP, static sites)
  ◦ Set replica counts (scaling)
  ◦ Pin infrastructure apps (like Pi-hole or MinIO) to the NAS node via placement constraints.
  ◦ Use the shared NFS volume for persistent data.
Example Docker Compose snippet for pinning:
```yaml
services:
  pihole:
    image: pihole/pihole
    deploy:
      placement:
        constraints:
          - node.labels.role==storage
    volumes:
      - shared-data:/etc/pihole

volumes:
  shared-data:
    external: true  # created beforehand with `docker volume create` (step 6)
```

8. (Optional) Set Up MinIO (S3-Compatible Storage)
• Deploy MinIO with Dokploy, pin it to your NAS, and use the shared volume for data:
```yaml
services:
  minio:
    image: minio/minio
    command: server /data --console-address ":9001"
    environment:
      MINIO_ROOT_USER: admin
      MINIO_ROOT_PASSWORD: changeme123  # change this, or move it to a Docker secret
    volumes:
      - shared-data:/data
    deploy:
      placement:
        constraints:
          - node.labels.role==storage
    ports:
      - "9000:9000"
      - "9001:9001"

volumes:
  shared-data:
    external: true
```

9. Add Web Apps and Experiment!
• Use Dokploy's UI to connect to your Gitea instance, auto-deploy repos, and experiment rapidly.
• Traefik integration and SSL setup are handled automatically by Dokploy.

10. Restore K3s (Optional, Later)
• Your original K3s manifests are saved in Git; reapply them if you wish to revert:
```bash
k3s server
kubectl apply -f <your-manifests>
```

References
• Docker Swarm Docs: https://docs.docker.com/engine/swarm/
• Dokploy Docs: https://dokploy.com/docs/
• Docker Volumes: https://docs.docker.com/engine/storage/volumes/
• NFS on Linux: https://help.ubuntu.com/community/NFS

This guide gives you a fast start for a declarative, multi-node homelab with web app simplicity and infrastructure reliability using Dokploy and Docker Swarm!