diff --git a/agents.md b/agents.md
new file mode 100644
index 0000000..7746f06
--- /dev/null
+++ b/agents.md
@@ -0,0 +1,322 @@
+# Homelab Infrastructure & Deployment Notes
+
+## Overview
+This document collects notes on the homelab Docker Swarm cluster: setup, troubleshooting procedures, and deployment configuration.
+
+## Infrastructure Architecture
+
+### Host Configuration
+- **Main Node**: `tpi-n1` (192.168.2.130) - Docker Swarm Manager
+- **Worker Node**: `tpi-n2` - Docker Swarm Worker
+- **Storage**:
+  - Main drive: `/dev/mmcblk0p2` (29GB) - System and applications
+  - NVMe drive: `/mnt/nvme` (916GB) - Docker data storage
+
+### Docker Configuration
+- **Data Directory**: `/mnt/nvme/docker` (moved from `/var/lib/docker`)
+- **DNS Configuration**: Local dnsmasq forwarder at 127.0.0.1
+- **External DNS**: 8.8.8.8, 8.8.4.4, 1.1.1.1
+- **Docker Daemon Config**: `/etc/docker/daemon.json`
+
+### Services Running
+- **Traefik**: Load balancer and SSL termination
+- **Dokploy**: Deployment management (port 3000)
+- **Gitea**: Git server (port 2222 for SSH)
+- **Swarmpit**: Docker Swarm management UI (port 888)
+- **Bendtstudio**: Main web application (5 replicas)
+- **MariaDB**: Database for Pancake application
+
+## Major Maintenance Tasks
+
+### 1. Docker Data Migration (Completed ✅)
+**Problem**: Main drive 100% full (28G/29G used)
+**Solution**: Moved 19GB Docker data to NVMe drive
+
+**Commands Used**:
+```bash
+# Stop Docker services
+sudo systemctl stop docker docker.socket
+
+# Copy data to NVMe (cp -a leaves /var/lib/docker in place; remove it only after verifying the copy)
+sudo cp -a /var/lib/docker /mnt/nvme/
+
+# Update Docker config
+echo '{"data-root": "/mnt/nvme/docker"}' | sudo tee /etc/docker/daemon.json
+
+# Restart Docker
+sudo systemctl start docker
+```
+
+**Result**:
+- Freed 15GB on main drive (100% → 46% usage)
+- Docker data on fast NVMe storage
+- All services maintained without downtime
+
+### 2. Nginx Configuration Fix (Completed ✅)
+**Problem**: Pancake static assets returning 404, pretty URLs not working
+**Root Cause**: Apache .htaccess rules not translated to nginx properly
+
+**Files Modified**:
+- `nginx.template.conf` - Main configuration template
+- All running containers - Updated nginx configuration
+
+**Key Changes**:
+```nginx
+# Fixed static asset paths
+location /pancake/third_party {
+    alias ${NIXPACKS_PHP_ROOT_DIR}/pancake/third_party;
+    # ... caching headers
+}
+
+# Added pretty URL support
+location /pancake {
+    try_files $uri $uri/ @pancake_fallback;
+}
+
+location @pancake_fallback {
+    rewrite ^.*$ /pancake/index.php last;
+}
+
+# Fixed PHP handling
+location ~ \.php$ {
+    fastcgi_param SCRIPT_FILENAME $document_root$fastcgi_script_name;
+    # ... other fastcgi params
+}
+```
+
+**Result**:
+- ✅ Static assets serving correctly
+- ✅ Pretty URLs working (bendtstudio.com/pancake/admin)
+- ✅ Apache .htaccess functionality replicated in nginx
+
+### 3. DNS Resolution Fix (Completed ✅)
+**Problem**: Docker containers couldn't resolve `ghcr.io`, causing build failures
+**Root Cause**: No proper DNS forwarding for containers
+
+**Solution Implemented**:
+```bash
+# Install dnsmasq
+sudo apt install -y dnsmasq
+
+# Configure DNS forwarding (sudo tee, because a plain redirect would run without root)
+sudo tee /etc/dnsmasq.conf > /dev/null << EOF
+server=8.8.8.8
+server=8.8.4.4
+server=1.1.1.1
+listen-address=127.0.0.1
+bind-interfaces
+EOF
+
+# Start dnsmasq
+sudo systemctl enable dnsmasq && sudo systemctl start dnsmasq
+
+# Update system DNS
+echo 'nameserver 127.0.0.1' | sudo tee /etc/resolv.conf
+
+# Update Docker to use local DNS
+echo '{"data-root": "/mnt/nvme/docker", "dns": ["127.0.0.1"]}' | sudo tee /etc/docker/daemon.json
+sudo systemctl restart docker
+```
+
+**Result**:
+- ✅ ghcr.io resolution working
+- ✅ Docker builds successful
+- ✅ Deployments working via Dokploy UI
+
+## Service-Specific Notes
+
+### Gitea (Git Server)
+- **SSH Access**: `ssh://git@gitea.bendtstudio.com:2222/username/repo.git` (port 2222), or the scp-style `git@gitea.bendtstudio.com:username/repo.git` when the port is set in `~/.ssh/config`
+- **Web UI**: https://gitea.bendtstudio.com
+- **Status**: Working correctly, SSH authentication successful
+- **Note**: "No shell access" message is normal for Gitea
+
+### Dokploy (Deployment Management)
+- **Web UI**: https://dokploy.bendtstudio.com (port 3000)
+- **Usage**:
+  1. Push code to Gitea repository
+  2. Dokploy automatically detects new commits
+  3. Trigger manual redeployment via web UI
+  4. Monitor build logs in real-time
+- **Build Process**: Uses Nixpacks for containerization
+- **Current Status**: ✅ Working with DNS fix
+
+### Bendtstudio Web Application
+- **Domain**: https://bendtstudio.com
+- **Pancake App**: https://bendtstudio.com/pancake
+- **Replicas**: 5 containers for load balancing
+- **Static Assets**: All serving correctly from `/pancake/third_party/`
+- **Database**: MariaDB container for Pancake data
+
+## Troubleshooting Procedures
+
+### Docker Issues
+```bash
+# Check Docker status
+sudo systemctl status docker
+
+# Check container logs (CONTAINER is a container name or ID)
+docker logs CONTAINER
+
+# Check service status
+docker service ls
+
+# Restart Docker daemon
+sudo systemctl restart docker
+```
+
+### DNS Issues
+```bash
+# Check DNS resolution
+nslookup ghcr.io
+
+# Test from container (requires curl inside the container)
+docker exec CONTAINER curl -I https://ghcr.io
+
+# Restart dnsmasq
+sudo systemctl restart dnsmasq
+
+# Check Docker DNS config
+cat /etc/docker/daemon.json
+```
+
+### Network Issues
+```bash
+# Check port mapping
+docker port CONTAINER
+
+# Test external access
+nc -v HOST PORT
+
+# Check Traefik routes
+curl -s http://localhost:8080/api/http/routers
+
+# Check container networks
+docker inspect --format '{{json .NetworkSettings.Networks}}' CONTAINER
+```
+
+### Application Deployment Issues
+```bash
+# Check deployment logs
+docker service logs --tail 50 SERVICE
+
+# Force redeployment
+docker service update --force SERVICE
+
+# Check service configuration
+docker service inspect SERVICE
+
+# Scale services
+docker service scale SERVICE=REPLICAS
+```
+
+## Monitoring Commands
+
+### System Resources
+```bash
+# Disk usage
+df -h
+
+# Memory usage
+free -h
+
+# Docker space usage
+docker system df
+
+# Container resource usage
+docker stats
+```
+
+### Docker Swarm Health
+```bash
+# Check swarm status
+docker node ls
+
+# Check service health
+docker service ls
+
+# Check individual services
+docker service ps SERVICE
+```
+
+## Configuration Files
+
+### Docker Daemon Configuration
+```json
+{
+  "data-root": "/mnt/nvme/docker",
+  "dns": ["127.0.0.1"]
+}
+```
+
+### Nginx Template Key Sections
+```nginx
+# Static assets for pancake/third_party
+location /pancake/third_party {
+    alias ${NIXPACKS_PHP_ROOT_DIR}/pancake/third_party;
+    expires 1y;
+    add_header Cache-Control "public, immutable";
+}
+
+# Pretty URLs for Pancake
+location /pancake {
+    try_files $uri $uri/ @pancake_fallback;
+}
+
+location @pancake_fallback {
+    rewrite ^.*$ /pancake/index.php last;
+}
+```
+
+## Future Improvements
+
+### DNS Enhancement
+- Configure dnsmasq to forward internal domains to local DNS server
+- Set up conditional forwarding for homelab services
+- Add DNS caching for better performance
+
+### Backup Strategy
+- Regular backups of Docker volumes to NVMe
+- Automated snapshots of configuration files
+- Git repository tracking of all changes
+
+### Monitoring
+- Set up Prometheus/Grafana for system monitoring
+- Log aggregation for better troubleshooting
+- Alert configuration for critical services
+
+## Emergency Procedures
+
+### Full System Recovery
+```bash
+# 1. Check all services
+docker service ls
+
+# 2. Restart critical services
+docker service update --force dokploy
+docker service update --force traefik
+
+# 3. Check DNS resolution
+curl -I https://ghcr.io
+
+# 4. Verify storage
+df -h
+docker system df
+```
+
+### Service Restoration
+```bash
+# Restore from backup if needed
+docker volume ls
+# Note: the Docker CLI has no `docker volume restore` subcommand; restore the volume's contents from a backup archive instead
+
+# Re-deploy from last known good state
+git log --oneline -10
+git checkout COMMIT
+```
+
+---
+
+**Last Updated**: 2025-11-29
+**Maintainer**: sirtimbly
+**Environment**: Production Docker Swarm Cluster
\ No newline at end of file