Homelab Infrastructure & Deployment Notes
Overview
These notes cover the homelab Docker Swarm cluster: its setup, troubleshooting procedures, and deployment configuration.
Infrastructure Architecture
Host Configuration
- Main Node: tpi-n1 (192.168.2.130) - Docker Swarm Manager
- Worker Node: tpi-n2 - Docker Swarm Worker
- Storage:
  - Main drive: /dev/mmcblk0p2 (29GB) - System and applications
  - NVMe drive: /mnt/nvme (916GB) - Docker data storage
Docker Configuration
- Data Directory: /mnt/nvme/docker (moved from /var/lib/docker)
- DNS Configuration: Local dnsmasq forwarder at 127.0.0.1
- External DNS: 8.8.8.8, 8.8.4.4, 1.1.1.1
- Docker Daemon Config: /etc/docker/daemon.json
Services Running
- Traefik: Load balancer and SSL termination
- Dokploy: Deployment management (port 3000)
- Gitea: Git server (port 2222 for SSH)
- Swarmpit: Docker Swarm management UI (port 888)
- Bendtstudio: Main web application (5 replicas)
- MariaDB: Database for Pancake application
Major Maintenance Tasks
1. Docker Data Migration (Completed ✅)
Problem: Main drive 100% full (28G/29G used)
Solution: Moved 19GB Docker data to NVMe drive
Commands Used:
# Stop Docker services
sudo systemctl stop docker docker.socket
# Move data to NVMe
sudo cp -a /var/lib/docker /mnt/nvme/
# Update Docker config
echo '{"data-root": "/mnt/nvme/docker"}' | sudo tee /etc/docker/daemon.json
# Restart Docker
sudo systemctl start docker
Result:
- Freed 15GB on main drive (100% → 46% usage)
- Docker data on fast NVMe storage
- All services came back up after the Docker restart (containers are unavailable while the daemon is stopped)
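The recorded commands copy the data but show neither a verification step nor the removal of the old directory, and space on the main drive is only reclaimed once /var/lib/docker is deleted. A minimal sketch of one possible check before deleting anything follows. The helper names `list_tree` and `same_listing` are hypothetical, the check compares path names only (not file contents), and it holds both listings in memory, which may be slow for a very large data directory.

```shell
# Hypothetical helpers (not part of the original notes).

# list_tree DIR: print every path under DIR, relative to DIR, in a stable order.
list_tree() {
  (cd "$1" && find . | LC_ALL=C sort)
}

# same_listing SRC DST: succeed only if both directories exist and contain
# the same set of relative paths. File contents are NOT compared.
same_listing() {
  [ -d "$1" ] && [ -d "$2" ] || return 1
  [ "$(list_tree "$1")" = "$(list_tree "$2")" ]
}

# Example, using the paths from the migration above (run as root, with Docker stopped):
#   same_listing /var/lib/docker /mnt/nvme/docker && echo "listings match"
```

Only after a check like this passes, and Docker has been confirmed to start from the new data root, would it be reasonable to remove the old copy.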
2. Nginx Configuration Fix (Completed ✅)
Problem: Pancake static assets returning 404, pretty URLs not working
Root Cause: Apache .htaccess rules not translated to nginx properly
Files Modified:
- nginx.template.conf - Main configuration template
- All running containers - Updated nginx configuration
Key Changes:
# Fixed static asset paths
location /pancake/third_party {
    alias ${NIXPACKS_PHP_ROOT_DIR}/pancake/third_party;
    # ... caching headers
}
# Added pretty URL support
location /pancake {
    try_files $uri $uri/ @pancake_fallback;
}
location @pancake_fallback {
    rewrite ^.*$ /pancake/index.php last;
}
# Fixed PHP handling
location ~ \.php$ {
    fastcgi_param SCRIPT_FILENAME $document_root$fastcgi_script_name;
    # ... other fastcgi params
}
Result:
- ✅ Static assets serving correctly
- ✅ Pretty URLs working (bendtstudio.com/pancake/admin)
- ✅ Apache .htaccess functionality replicated in nginx
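A fix like this is easy to regress when the template changes, so a small smoke test can help. The sketch below reads URL and expected-status pairs and reports mismatches. The helper names `status_of` and `check_urls` are hypothetical, and the expected status in the example is an assumption: an admin page that redirects to a login form would answer 302 rather than 200.

```shell
# Hypothetical helpers (not part of the original notes).

# status_of URL: print the HTTP status code returned for URL.
status_of() {
  curl -s -o /dev/null -w '%{http_code}' "$1"
}

# check_urls: read "URL EXPECTED_STATUS" pairs from stdin, report mismatches,
# and return non-zero if any URL did not answer with the expected status.
check_urls() {
  failed=0
  while read -r url expected; do
    [ -n "$url" ] || continue
    actual=$(status_of "$url")
    if [ "$actual" != "$expected" ]; then
      echo "FAIL $url: expected $expected, got $actual"
      failed=1
    fi
  done
  return "$failed"
}

# Example (the expected code is an assumption, see the note above):
#   echo 'https://bendtstudio.com/pancake/admin 200' | check_urls
```

Keeping the fetch in its own function means it can be swapped for a stub when trying the loop out offline.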
3. DNS Resolution Fix (Completed ✅)
Problem: Docker containers could not resolve ghcr.io, which caused build failures
Root Cause: No proper DNS forwarding for containers
Solution Implemented:
# Install dnsmasq
sudo apt install -y dnsmasq
# Configure DNS forwarding
# (the redirect must run as root, so write the file through sudo tee)
sudo tee /etc/dnsmasq.conf > /dev/null << EOF
server=8.8.8.8
server=8.8.4.4
server=1.1.1.1
listen-address=127.0.0.1
bind-interfaces
EOF
# Start dnsmasq
sudo systemctl enable dnsmasq && sudo systemctl start dnsmasq
# Update system DNS
echo 'nameserver 127.0.0.1' | sudo tee /etc/resolv.conf
# Update Docker to use local DNS
echo '{"data-root": "/mnt/nvme/docker", "dns": ["127.0.0.1"]}' | sudo tee /etc/docker/daemon.json
sudo systemctl restart docker
Result:
- ✅ ghcr.io resolution working
- ✅ Docker builds successful
- ✅ Deployments working via Dokploy UI
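If the upstream resolvers ever need to change, generating the file from a list is less error-prone than editing it by hand. The following is a sketch only: `render_dnsmasq_conf` is a hypothetical helper that prints the same five-line forwarder configuration recorded above for whatever upstream addresses it is given.

```shell
# Hypothetical helper (not part of the original notes).

# render_dnsmasq_conf UPSTREAM...: print a forwarder config with one
# "server=" line per upstream resolver, listening only on loopback.
render_dnsmasq_conf() {
  for upstream in "$@"; do
    printf 'server=%s\n' "$upstream"
  done
  printf 'listen-address=127.0.0.1\n'
  printf 'bind-interfaces\n'
}

# Example: write the config with root privileges, then ask dnsmasq to
# syntax-check it before restarting the service.
#   render_dnsmasq_conf 8.8.8.8 8.8.4.4 1.1.1.1 | sudo tee /etc/dnsmasq.conf > /dev/null
#   dnsmasq --test
```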
Service-Specific Notes
Gitea (Git Server)
- SSH Access: git@gitea.bendtstudio.com:2222 or git@gitea.bendtstudio.com:username/repo.git
- Web UI: https://gitea.bendtstudio.com
- Status: Working correctly, SSH authentication successful
- Note: "No shell access" message is normal for Gitea
Dokploy (Deployment Management)
- Web UI: https://dokploy.bendtstudio.com (port 3000)
- Usage:
- Push code to Gitea repository
- Dokploy automatically detects new commits
- Trigger manual redeployment via web UI
- Monitor build logs in real-time
- Build Process: Uses Nixpacks for containerization
- Current Status: ✅ Working with DNS fix
Bendtstudio Web Application
- Domain: https://bendtstudio.com
- Pancake App: https://bendtstudio.com/pancake
- Replicas: 5 containers for load balancing
- Static Assets: All serving correctly from /pancake/third_party/
- Database: MariaDB container for Pancake data
Troubleshooting Procedures
Docker Issues
# Check Docker status
sudo systemctl status docker
# Check container logs
docker logs <container_name>
# Check service status
docker service ls
# Restart Docker daemon
sudo systemctl restart docker
DNS Issues
# Check DNS resolution
nslookup ghcr.io
# Test from container
docker exec <container> curl -I https://ghcr.io
# Restart dnsmasq
sudo systemctl restart dnsmasq
# Check Docker DNS config
cat /etc/docker/daemon.json
Network Issues
# Check port mapping
docker port <container_name>
# Test external access
nc -v <host_ip> <port>
# Check Traefik routes
curl -s http://localhost:8080/api/http/routers
# Check container networks
docker inspect <container> --format '{{json .NetworkSettings.Networks}}'
Application Deployment Issues
# Check deployment logs
docker service logs <service_name> --tail 50
# Force redeployment
docker service update --force <service_name>
# Check service configuration
docker service inspect <service_name>
# Scale services
docker service scale <service_name>=<replicas>
Monitoring Commands
System Resources
# Disk usage
df -h
# Memory usage
free -h
# Docker space usage
docker system df
# Container resource usage
docker stats
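Since the original incident was a full main drive, the `df` check can be turned into a simple threshold test. This is a sketch under stated assumptions: `usage_from_df` and `at_or_above` are hypothetical helper names, the 80 percent limit is arbitrary, and the parser expects the POSIX output format of `df -P` with no spaces in the device name.

```shell
# Hypothetical helpers (not part of the original notes).

# usage_from_df: read `df -P` output on stdin and print the "Capacity"
# column of the first filesystem line as a bare integer (no % sign).
usage_from_df() {
  awk 'NR == 2 { sub(/%/, "", $5); print $5 }'
}

# at_or_above USED LIMIT: succeed if USED percent is at or above LIMIT percent.
at_or_above() {
  [ "$1" -ge "$2" ]
}

# Example (80 is an arbitrary limit chosen for illustration):
#   at_or_above "$(df -P / | usage_from_df)" 80 && echo "main drive is filling up"
```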
Docker Swarm Health
# Check swarm status
docker node ls
# Check service health
docker service ls
# Check individual services
docker service ps <service_name>
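With several services running, scanning `docker service ls` by eye gets tedious. The sketch below filters the list down to services whose running replica count differs from the desired count. `degraded_services` is a hypothetical helper; it only looks at the first two whitespace-separated fields, so any trailing annotation after the replica count is ignored.

```shell
# Hypothetical helper (not part of the original notes).

# degraded_services: read "NAME RUNNING/DESIRED" lines on stdin and print the
# name of every service whose running count differs from the desired count.
degraded_services() {
  awk '{
    split($2, counts, "/")
    if (counts[1] != counts[2]) print $1
  }'
}

# Example, using the CLI's Go-template output format:
#   docker service ls --format '{{.Name}} {{.Replicas}}' | degraded_services
```

An empty result means every listed service reports as many running tasks as it wants.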
Configuration Files
Docker Daemon Configuration
{
"data-root": "/mnt/nvme/docker",
"dns": ["127.0.0.1"]
}
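Writing this file with a plain `tee` replaces it wholesale, which is why the DNS fix had to repeat the `data-root` key. A small wrapper that keeps the previous version makes such rewrites easier to undo. This is a sketch: `write_with_backup` is a hypothetical helper, and it keeps only a single `.bak` copy.

```shell
# Hypothetical helper (not part of the original notes).

# write_with_backup FILE: replace FILE with the content read from stdin,
# first copying any existing FILE to FILE.bak so the old settings survive.
write_with_backup() {
  target=$1
  if [ -f "$target" ]; then
    cp -p "$target" "$target.bak"
  fi
  cat > "$target"
}

# Example (run as root, since the file lives under /etc):
#   printf '%s\n' '{"data-root": "/mnt/nvme/docker", "dns": ["127.0.0.1"]}' \
#     | write_with_backup /etc/docker/daemon.json
```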
Nginx Template Key Sections
# Static assets for pancake/third_party
location /pancake/third_party {
    alias ${NIXPACKS_PHP_ROOT_DIR}/pancake/third_party;
    expires 1y;
    add_header Cache-Control "public, immutable";
}
# Pretty URLs for Pancake
location /pancake {
    try_files $uri $uri/ @pancake_fallback;
}
location @pancake_fallback {
    rewrite ^.*$ /pancake/index.php last;
}
Future Improvements
DNS Enhancement
- Configure dnsmasq to forward internal domains to local DNS server
- Set up conditional forwarding for homelab services
- Add DNS caching for better performance
Backup Strategy
- Regular backups of Docker volumes to NVMe
- Automated snapshots of configuration files
- Git repository tracking of all changes
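One common way to back up a directory's contents, and so a volume's, is a compressed tar archive. The sketch below shows the pack and unpack steps on plain directories. `backup_dir` and `restore_dir` are hypothetical helper names, and the archive should be written outside the source directory so tar does not try to include it in itself.

```shell
# Hypothetical helpers (not part of the original notes).

# backup_dir SRC ARCHIVE: pack the contents of SRC into a gzip-compressed tar.
backup_dir() {
  tar -czf "$2" -C "$1" .
}

# restore_dir ARCHIVE DST: unpack ARCHIVE into DST, creating DST if needed.
restore_dir() {
  mkdir -p "$2"
  tar -xzf "$1" -C "$2"
}

# For a named Docker volume, the same tar command can run in a throwaway
# container with the volume mounted (alpine is an assumed image choice):
#   docker run --rm -v <volume_name>:/data -v "$(pwd)":/backup alpine \
#     tar -czf /backup/<volume_name>.tar.gz -C /data .
```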
Monitoring
- Set up Prometheus/Grafana for system monitoring
- Log aggregation for better troubleshooting
- Alert configuration for critical services
Emergency Procedures
Full System Recovery
# 1. Check all services
docker service ls
# 2. Restart critical services
docker service update --force dokploy
docker service update --force traefik
# 3. Check DNS resolution
curl -I https://ghcr.io
# 4. Verify storage
df -h
docker system df
Service Restoration
# Restore from backup if needed
docker volume ls
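# Docker has no built-in "volume restore" command; unpack a tar backup into
# the volume from a throwaway container (assumes a .tar.gz in the current directory)
docker run --rm -v <volume_name>:/data -v "$(pwd)":/backup alpine tar -xzf /backup/<backup_file> -C /data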
# Re-deploy from last known good state
git log --oneline -10
git checkout <commit_hash>
Last Updated: 2025-11-29
Maintainer: sirtimbly
Environment: Production Docker Swarm Cluster