bendtstudio0-static/agents.md
Tim Bendt 88cf1003a8 Add comprehensive homelab infrastructure documentation
- Document Docker data migration from /var/lib/docker to /mnt/nvme/docker
- Record nginx configuration fixes for Pancake pretty URLs and static assets
- Document DNS resolution fix using dnsmasq for container builds
- Add troubleshooting procedures for Docker, DNS, networking, and deployments
- Include monitoring commands and emergency recovery procedures
- Document service configurations for Gitea, Dokploy, Traefik, Swarmpit, Bendtstudio
- Add configuration file examples and future improvement suggestions

This provides complete reference for homelab maintenance and troubleshooting.
2025-11-29 17:25:25 -05:00


Homelab Infrastructure & Deployment Notes

Overview

These notes cover the homelab Docker Swarm cluster: its setup, troubleshooting procedures, and deployment configuration.

Infrastructure Architecture

Host Configuration

  • Main Node: tpi-n1 (192.168.2.130) - Docker Swarm Manager
  • Worker Node: tpi-n2 - Docker Swarm Worker
  • Storage:
    • Main drive: /dev/mmcblk0p2 (29GB) - System and applications
    • NVMe drive: /mnt/nvme (916GB) - Docker data storage

Docker Configuration

  • Data Directory: /mnt/nvme/docker (moved from /var/lib/docker)
  • DNS Configuration: Local dnsmasq forwarder at 127.0.0.1
  • External DNS: 8.8.8.8, 8.8.4.4, 1.1.1.1
  • Docker Daemon Config: /etc/docker/daemon.json

Services Running

  • Traefik: Load balancer and SSL termination
  • Dokploy: Deployment management (port 3000)
  • Gitea: Git server (port 2222 for SSH)
  • Swarmpit: Docker Swarm management UI (port 888)
  • Bendtstudio: Main web application (5 replicas)
  • MariaDB: Database for Pancake application

Major Maintenance Tasks

1. Docker Data Migration (Completed)

Problem: Main drive 100% full (28G/29G used)

Solution: Moved 19GB of Docker data to the NVMe drive

Commands Used:

# Stop Docker services
sudo systemctl stop docker docker.socket

# Copy data to NVMe (cp -a preserves ownership, permissions, and timestamps)
sudo cp -a /var/lib/docker /mnt/nvme/

# Update Docker config
echo '{"data-root": "/mnt/nvme/docker"}' | sudo tee /etc/docker/daemon.json

# Restart Docker
sudo systemctl start docker

Result:

  • Freed 15GB on main drive (100% → 46% usage)
  • Docker data on fast NVMe storage
  • All services came back up after the Docker restart
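
A quick way to confirm the result is to check disk usage on both drives and ask Docker where its data root is. The sketch below is a minimal helper under stated assumptions: the function names and the 80% default threshold are illustrative and not part of the original setup.

```shell
# Print the use% (digits only) of the filesystem that holds the given path.
usage_percent() {
    df -P "$1" | awk 'NR == 2 { gsub("%", "", $5); print $5 }'
}

# Report OK/WARN for a path against a threshold (default 80 is an assumed value).
check_disk() {
    path=$1
    limit=${2:-80}
    used=$(usage_percent "$path")
    if [ "$used" -ge "$limit" ]; then
        echo "WARN: $path is ${used}% full"
        return 1
    fi
    echo "OK: $path is ${used}% full"
}

# Usage on the host:
#   docker info --format '{{.DockerRootDir}}'   # should print /mnt/nvme/docker
#   check_disk / && check_disk /mnt/nvme
```

Running check_disk from cron or a login script would give early warning before the main drive fills up again.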

2. Nginx Configuration Fix (Completed)

Problem: Pancake static assets returning 404, pretty URLs not working

Root Cause: Apache .htaccess rules not translated to nginx properly

Files Modified:

  • nginx.template.conf - Main configuration template
  • All running containers - Updated nginx configuration

Key Changes:

# Fixed static asset paths
location /pancake/third_party {
    alias ${NIXPACKS_PHP_ROOT_DIR}/pancake/third_party;
    # ... caching headers
}

# Added pretty URL support
location /pancake {
    try_files $uri $uri/ @pancake_fallback;
}

location @pancake_fallback {
    rewrite ^.*$ /pancake/index.php last;
}

# Fixed PHP handling
location ~ \.php$ {
    fastcgi_param SCRIPT_FILENAME $document_root$fastcgi_script_name;
    # ... other fastcgi params
}

Result:

  • Static assets serving correctly
  • Pretty URLs working (bendtstudio.com/pancake/admin)
  • Apache .htaccess functionality replicated in nginx
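
The fix can be re-checked after each deployment with a small curl smoke test. This is a sketch, not the procedure originally used: the function names are illustrative, the https scheme is an assumption, and only /pancake/admin is a path taken from these notes.

```shell
# Return 0 when an HTTP status code counts as success (2xx or 3xx).
status_ok() {
    case "$1" in
        2??|3??) return 0 ;;
        *) return 1 ;;
    esac
}

# Request each path under a base URL and report its status code.
# Returns non-zero if any path fails.
smoke_test() {
    base=$1
    shift
    failed=0
    for path in "$@"; do
        code=$(curl -s -o /dev/null -w '%{http_code}' "$base$path")
        if status_ok "$code"; then
            echo "OK   $code $path"
        else
            echo "FAIL $code $path"
            failed=1
        fi
    done
    return "$failed"
}

# Usage (scheme assumed):
#   smoke_test https://bendtstudio.com /pancake/admin
```

Adding one known file under /pancake/third_party to the path list would also cover the static asset rule.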

3. DNS Resolution Fix (Completed)

Problem: Docker containers couldn't resolve ghcr.io, causing build failures

Root Cause: No proper DNS forwarding for containers

Solution Implemented:

# Install dnsmasq
sudo apt install -y dnsmasq

# Configure DNS forwarding
sudo tee /etc/dnsmasq.conf > /dev/null << EOF
server=8.8.8.8
server=8.8.4.4
server=1.1.1.1
listen-address=127.0.0.1
bind-interfaces
EOF

# Start dnsmasq
sudo systemctl enable dnsmasq && sudo systemctl start dnsmasq

# Update system DNS
echo 'nameserver 127.0.0.1' | sudo tee /etc/resolv.conf

# Update Docker to use local DNS
echo '{"data-root": "/mnt/nvme/docker", "dns": ["127.0.0.1"]}' | sudo tee /etc/docker/daemon.json
sudo systemctl restart docker

Result:

  • ghcr.io resolution working
  • Docker builds successful
  • Deployments working via Dokploy UI
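
If resolution breaks again, it helps to test each upstream resolver separately before blaming dnsmasq itself. The sketch below reads the plain server= lines from a dnsmasq config and queries each one with nslookup. The function names are illustrative, and it handles only the simple server=IP form used in the config above.

```shell
# Print the upstream resolvers listed as "server=IP" in a dnsmasq config file.
list_upstreams() {
    sed -n 's/^server=//p' "$1"
}

# Query every upstream directly for a name and report which ones answer.
check_upstreams() {
    conf=$1
    name=$2
    for server in $(list_upstreams "$conf"); do
        if nslookup "$name" "$server" > /dev/null 2>&1; then
            echo "OK   $server resolves $name"
        else
            echo "FAIL $server cannot resolve $name"
        fi
    done
}

# Usage:
#   check_upstreams /etc/dnsmasq.conf ghcr.io
#   nslookup ghcr.io 127.0.0.1    # then test the local forwarder itself
```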

Service-Specific Notes

Gitea (Git Server)

  • SSH Access: git@gitea.bendtstudio.com on port 2222; repositories are addressed as git@gitea.bendtstudio.com:username/repo.git
  • Web UI: https://gitea.bendtstudio.com
  • Status: Working correctly, SSH authentication successful
  • Note: "No shell access" message is normal for Gitea
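
Because the scp-style form (git@host:path) has no place for a port, a clone over port 2222 needs either an ssh:// URL or a Port entry in ~/.ssh/config. A minimal sketch of the URL form, where the helper name and the alice/demo repository are hypothetical:

```shell
GITEA_HOST=gitea.bendtstudio.com
GITEA_SSH_PORT=2222

# Build an ssh:// clone URL that carries the non-default SSH port.
# Arguments: owner, repository name (without .git).
gitea_clone_url() {
    printf 'ssh://git@%s:%s/%s/%s.git\n' "$GITEA_HOST" "$GITEA_SSH_PORT" "$1" "$2"
}

# Usage (hypothetical repository):
#   git clone "$(gitea_clone_url alice demo)"
```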

Dokploy (Deployment Management)

  • Web UI: https://dokploy.bendtstudio.com (port 3000)
  • Usage:
    1. Push code to Gitea repository
    2. Dokploy automatically detects new commits
    3. Trigger manual redeployment via web UI
    4. Monitor build logs in real-time
  • Build Process: Uses Nixpacks for containerization
  • Current Status: Working with DNS fix

Bendtstudio Web Application

Troubleshooting Procedures

Docker Issues

# Check Docker status
sudo systemctl status docker

# Check container logs
docker logs <container_name>

# Check service status
docker service ls

# Restart Docker daemon
sudo systemctl restart docker

DNS Issues

# Check DNS resolution
nslookup ghcr.io

# Test from container
docker exec <container> curl -I https://ghcr.io

# Restart dnsmasq
sudo systemctl restart dnsmasq

# Check Docker DNS config
cat /etc/docker/daemon.json

Network Issues

# Check port mapping
docker port <container_name>

# Test external access
nc -v <host_ip> <port>

# Check Traefik routes
curl -s http://localhost:8080/api/http/routers

# Check container networks
docker inspect <container> --format '{{json .NetworkSettings.Networks}}'

Application Deployment Issues

# Check deployment logs
docker service logs <service_name> --tail 50

# Force redeployment
docker service update --force <service_name>

# Check service configuration
docker service inspect <service_name>

# Scale services
docker service scale <service_name>=<replicas>

Monitoring Commands

System Resources

# Disk usage
df -h

# Memory usage
free -h

# Docker space usage
docker system df

# Container resource usage
docker stats

Docker Swarm Health

# Check swarm status
docker node ls

# Check service health
docker service ls

# Check individual services
docker service ps <service_name>
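
For a quicker read of swarm health, the output of docker service ls can be filtered down to services running fewer replicas than desired. A sketch with an illustrative function name:

```shell
# Read "name running/desired" lines on stdin and print the services
# whose running replica count is below the desired count.
degraded_services() {
    awk '{ split($2, r, "/"); if (r[1] + 0 < r[2] + 0) print $1, $2 }'
}

# Usage on a manager node:
#   docker service ls --format '{{.Name}} {{.Replicas}}' | degraded_services
```

No output means every service is at its desired replica count.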

Configuration Files

Docker Daemon Configuration

{
  "data-root": "/mnt/nvme/docker",
  "dns": ["127.0.0.1"]
}

Nginx Template Key Sections

# Static assets for pancake/third_party
location /pancake/third_party {
    alias ${NIXPACKS_PHP_ROOT_DIR}/pancake/third_party;
    expires 1y;
    add_header Cache-Control "public, immutable";
}

# Pretty URLs for Pancake
location /pancake {
    try_files $uri $uri/ @pancake_fallback;
}

location @pancake_fallback {
    rewrite ^.*$ /pancake/index.php last;
}

Future Improvements

DNS Enhancement

  • Configure dnsmasq to forward internal domains to local DNS server
  • Set up conditional forwarding for homelab services
  • Add DNS caching for better performance

Backup Strategy

  • Regular backups of Docker volumes to NVMe
  • Automated snapshots of configuration files
  • Git repository tracking of all changes
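
A volume backup could be sketched as a tar archive written by a temporary container. Nothing below is in place yet: the function names, the /mnt/nvme/backups directory, and the alpine image are assumptions.

```shell
# Build a dated archive name for a volume, e.g. myvolume-20251129.tar.gz.
backup_name() {
    printf '%s-%s.tar.gz\n' "$1" "$(date +%Y%m%d)"
}

# Archive a Docker volume into a backup directory via a temporary container.
# Arguments: volume name, destination directory (assumed default below).
backup_volume() {
    volume=$1
    dest=${2:-/mnt/nvme/backups}
    mkdir -p "$dest"
    docker run --rm \
        -v "$volume":/data:ro \
        -v "$dest":/backup \
        alpine tar czf "/backup/$(backup_name "$volume")" -C /data .
}

# Usage:
#   docker volume ls
#   backup_volume <volume_name>
```

Mounting the volume read-only keeps the backup from modifying live data. For a database volume, a dump taken with the database's own tools is usually safer than a file-level copy.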

Monitoring

  • Set up Prometheus/Grafana for system monitoring
  • Log aggregation for better troubleshooting
  • Alert configuration for critical services

Emergency Procedures

Full System Recovery

# 1. Check all services
docker service ls

# 2. Restart critical services
docker service update --force dokploy
docker service update --force traefik

# 3. Check DNS resolution
curl -I https://ghcr.io

# 4. Verify storage
df -h
docker system df
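
The same steps can be wrapped in a script with a dry-run switch, so the sequence can be reviewed before anything is restarted. A sketch with illustrative function names; the service names passed in must match what docker service ls actually shows.

```shell
# Run a command, or only print it when DRY_RUN=1.
run() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# Walk the recovery steps; arguments are the services to force-restart.
recover() {
    run docker service ls
    for svc in "$@"; do
        run docker service update --force "$svc"
    done
    run curl -I https://ghcr.io
    run df -h
    run docker system df
}

# Usage:
#   DRY_RUN=1 recover dokploy traefik   # print the commands only
#   recover dokploy traefik             # execute them
```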

Service Restoration

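# Restore from backup if needed. Docker has no "volume restore" subcommand;
# this assumes <backup_file> is a gzip tar archive in the current directory
docker volume ls
docker run --rm -v <volume_name>:/data -v "$(pwd)":/backup alpine \
    tar xzf /backup/<backup_file> -C /data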

# Re-deploy from last known good state
git log --oneline -10
git checkout <commit_hash>

Last Updated: 2025-11-29
Maintainer: sirtimbly
Environment: Production Docker Swarm Cluster