# Homelab Infrastructure & Deployment Notes

## Overview

This document collects notes on the homelab Docker Swarm cluster: its setup, troubleshooting procedures, and deployment configuration.

## Infrastructure Architecture

### Host Configuration

- **Main Node**: `tpi-n1` (192.168.2.130) - Docker Swarm manager
- **Worker Node**: `tpi-n2` - Docker Swarm worker
- **Storage**:
  - Main drive: `/dev/mmcblk0p2` (29GB) - system and applications
  - NVMe drive: `/mnt/nvme` (916GB) - Docker data storage

### Docker Configuration

- **Data Directory**: `/mnt/nvme/docker` (moved from `/var/lib/docker`)
- **DNS Configuration**: local dnsmasq forwarder listening on 127.0.0.1
- **Upstream DNS** (used by dnsmasq): 8.8.8.8, 8.8.4.4, 1.1.1.1
- **Docker Daemon Config**: `/etc/docker/daemon.json`

### Services Running

- **Traefik**: load balancer and SSL termination
- **Dokploy**: deployment management (port 3000)
- **Gitea**: Git server (SSH on port 2222)
- **Swarmpit**: Docker Swarm management UI (port 888)
- **Bendtstudio**: main web application (5 replicas)
- **MariaDB**: database for the Pancake application

## Major Maintenance Tasks

### 1. Docker Data Migration (Completed ✅)

**Problem**: Main drive 100% full (28G of 29G used)

**Solution**: Moved 19GB of Docker data to the NVMe drive

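**Commands Used**:

```bash
# Stop Docker (the socket unit too, so it cannot re-activate the daemon)
sudo systemctl stop docker docker.socket

# Copy the data to NVMe, preserving ownership, permissions, and timestamps
sudo cp -a /var/lib/docker /mnt/nvme/

# Point Docker at the new data root
# (note: this overwrites any existing /etc/docker/daemon.json)
echo '{"data-root": "/mnt/nvme/docker"}' | sudo tee /etc/docker/daemon.json

# Restart Docker
sudo systemctl start docker

# cp leaves the original in place: space on the main drive is only freed once
# the old /var/lib/docker is removed, after verifying Docker runs from the new root
```
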
**Result**:

- Freed 15GB on the main drive (usage dropped from 100% to 46%)
- Docker data now on fast NVMe storage
- All services maintained without downtime

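A migration like this one can be checked afterwards with the minimal sketch below, which assumes the paths above. `verify_data_root` is a hypothetical helper name. It relies on `docker info --format '{{.DockerRootDir}}'`, which reports the data root the running daemon is actually using, so it catches a `daemon.json` that was edited but never picked up.

```bash
#!/usr/bin/env bash
# Hypothetical helper: confirm the running daemon uses the expected data root.
verify_data_root() {
  local expected="${1:-/mnt/nvme/docker}"
  local actual
  actual="$(docker info --format '{{.DockerRootDir}}')" || return 2
  if [ "$actual" = "$expected" ]; then
    echo "OK: Docker data root is $actual"
  else
    echo "MISMATCH: expected $expected, daemon reports $actual" >&2
    return 1
  fi
}

# Usage (on tpi-n1):
#   verify_data_root && df -h / /mnt/nvme
```

A return code of 2 means the daemon could not be queried at all, which is distinct from a mismatch (1).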
### 2. Nginx Configuration Fix (Completed ✅)

**Problem**: Pancake static assets returned 404, and pretty URLs did not work

**Root Cause**: Apache `.htaccess` rules had not been translated properly to nginx

**Files Modified**:

- `nginx.template.conf` - main configuration template
- All running containers - updated nginx configuration

**Key Changes**:

```nginx
# Fixed static asset paths
# (${NIXPACKS_PHP_ROOT_DIR} must be substituted when the template is rendered;
# nginx itself does not expand environment variables)
location /pancake/third_party {
    alias ${NIXPACKS_PHP_ROOT_DIR}/pancake/third_party;
    # ... caching headers
}

# Added pretty URL support
location /pancake {
    try_files $uri $uri/ @pancake_fallback;
}

location @pancake_fallback {
    rewrite ^.*$ /pancake/index.php last;
}

# Fixed PHP handling
location ~ \.php$ {
    fastcgi_param SCRIPT_FILENAME $document_root$fastcgi_script_name;
    # ... other fastcgi params
}
```

**Result**:

- ✅ Static assets served correctly
- ✅ Pretty URLs working (bendtstudio.com/pancake/admin)
- ✅ Apache `.htaccess` behavior replicated in nginx

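Both fixes can be smoke-tested with the shell function sketched below. `check_status` is a hypothetical helper. The asset path in the usage line is a placeholder, to be replaced with a real file under `/pancake/third_party/`. The `/pancake/admin` URL comes from the result above. The function uses curl's `-w '%{http_code}'` to print only the HTTP status code.

```bash
#!/usr/bin/env bash
# Hypothetical helper: fetch a URL and compare its HTTP status to the expected one.
check_status() {
  local url="$1" expected="$2" code
  # -s: silent, -o /dev/null: discard the body, -w: print only the status code
  code="$(curl -s -o /dev/null -w '%{http_code}' "$url")"
  if [ "$code" = "$expected" ]; then
    echo "OK   $code $url"
  else
    echo "FAIL $code (expected $expected) $url"
    return 1
  fi
}

# Usage (<asset> is a placeholder for a real file under /pancake/third_party/):
#   check_status https://bendtstudio.com/pancake/third_party/<asset> 200
#   check_status https://bendtstudio.com/pancake/admin 200
```

For the pretty URL, the admin area may answer with a redirect (for example to a login page) instead of 200. In that case, pass that code as the expected status. What matters is that the response is no longer 404.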
### 3. DNS Resolution Fix (Completed ✅)

**Problem**: Docker containers could not resolve `ghcr.io`, causing build failures

**Root Cause**: No proper DNS forwarding for containers

**Solution Implemented**:

```bash
# Install dnsmasq
sudo apt install -y dnsmasq

# Configure DNS forwarding (this replaces the stock /etc/dnsmasq.conf).
# A plain `cat > /etc/dnsmasq.conf` runs the redirection as the unprivileged
# user and fails, so write the file through sudo tee instead.
sudo tee /etc/dnsmasq.conf > /dev/null << 'EOF'
server=8.8.8.8
server=8.8.4.4
server=1.1.1.1
listen-address=127.0.0.1
bind-interfaces
EOF

# Enable and start dnsmasq
sudo systemctl enable --now dnsmasq

# Point the host resolver at dnsmasq (if /etc/resolv.conf is managed by
# systemd-resolved, NetworkManager, or a DHCP client, it may be regenerated)
echo 'nameserver 127.0.0.1' | sudo tee /etc/resolv.conf

# Update Docker to use local DNS; keep data-root, since this rewrites the whole file
echo '{"data-root": "/mnt/nvme/docker", "dns": ["127.0.0.1"]}' | sudo tee /etc/docker/daemon.json
sudo systemctl restart docker
```

**Result**:

- ✅ `ghcr.io` resolution working
- ✅ Docker builds succeeding
- ✅ Deployments working via the Dokploy UI

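The sketch below re-checks this fix later, assuming the setup above. `check_dns` is a hypothetical helper. On the host, `getent hosts` resolves through the normal system resolver path (normally `/etc/resolv.conf`, and therefore dnsmasq). The container check uses a throwaway `debian:stable-slim` container on the default bridge network. That image is an assumption, chosen only because it ships `getent`.

```bash
#!/usr/bin/env bash
# Hypothetical helper: check that a name resolves on the host and inside a container.
check_dns() {
  local name="${1:-ghcr.io}" failed=0
  # Host: getent uses the normal resolver path (/etc/resolv.conf -> dnsmasq)
  if getent hosts "$name" > /dev/null; then
    echo "host: $name resolves"
  else
    echo "host: $name does NOT resolve"
    failed=1
  fi
  # Container: a throwaway container on the default bridge network
  # (debian:stable-slim is an assumed image choice; it includes getent)
  if docker run --rm debian:stable-slim getent hosts "$name" > /dev/null 2>&1; then
    echo "container: $name resolves"
  else
    echo "container: $name does NOT resolve"
    failed=1
  fi
  return "$failed"
}

# Usage: check_dns ghcr.io
```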
## Service-Specific Notes

### Gitea (Git Server)

- **SSH Access**: port 2222, e.g. `ssh://git@gitea.bendtstudio.com:2222/username/repo.git`. The scp-style form `git@gitea.bendtstudio.com:username/repo.git` cannot carry a port, so it connects on port 22 unless `~/.ssh/config` sets `Port 2222` for this host
- **Web UI**: https://gitea.bendtstudio.com
- **Status**: working correctly; SSH authentication succeeds
- **Note**: a "no shell access" message on SSH login is normal for Gitea

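The small sketch below builds the explicit-port form of a clone URL and shows the usual SSH authentication test. `gitea_ssh_url` is a hypothetical helper. The host and port 2222 come from the Gitea notes, while `username/repo` is a placeholder.

```bash
#!/usr/bin/env bash
# Hypothetical helper: build an ssh:// clone URL that carries Gitea's SSH port.
GITEA_HOST="gitea.bendtstudio.com"
GITEA_SSH_PORT=2222

gitea_ssh_url() {
  local owner="$1" repo="$2"
  printf 'ssh://git@%s:%s/%s/%s.git\n' "$GITEA_HOST" "$GITEA_SSH_PORT" "$owner" "$repo"
}

# Usage (username/repo are placeholders):
#   git clone "$(gitea_ssh_url username repo)"
#   ssh -T -p "$GITEA_SSH_PORT" "git@$GITEA_HOST"   # authenticates, then reports no shell access
```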
### Dokploy (Deployment Management)

- **Web UI**: https://dokploy.bendtstudio.com (port 3000)
- **Usage**:
  1. Push code to the Gitea repository
  2. Dokploy automatically detects new commits
  3. Trigger a manual redeployment via the web UI
  4. Monitor the build logs in real time
- **Build Process**: uses Nixpacks for containerization
- **Current Status**: ✅ working since the DNS fix

### Bendtstudio Web Application

- **Domain**: https://bendtstudio.com
- **Pancake App**: https://bendtstudio.com/pancake
- **Replicas**: 5 containers for load balancing
- **Static Assets**: all served correctly from `/pancake/third_party/`
- **Database**: MariaDB container for Pancake data

## Troubleshooting Procedures

### Docker Issues

```bash
# Check Docker status
sudo systemctl status docker

# Check container logs
docker logs <container_name>

# Check service status
docker service ls

# Restart Docker daemon
sudo systemctl restart docker
```

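When `docker service ls` lists many services, triage is faster with a filter that prints only the services running fewer replicas than desired. In the minimal sketch below, `degraded_services` is a hypothetical helper that parses the `Replicas` column (for example `3/5`) from `docker service ls --format` output.

```bash
#!/usr/bin/env bash
# Hypothetical helper: list Swarm services running fewer replicas than desired.
degraded_services() {
  local name replicas rest running desired
  docker service ls --format '{{.Name}} {{.Replicas}}' |
    while read -r name replicas rest; do
      running="${replicas%%/*}"   # "3/5" -> "3"
      desired="${replicas#*/}"    # "3/5" -> "5"
      # Skip anything that is not a plain running/desired pair of numbers
      case "$running$desired" in
        ""|*[!0-9]*) continue ;;
      esac
      if [ "$running" -lt "$desired" ]; then
        echo "$name $running/$desired"
      fi
    done
}

# Usage: degraded_services   (no output means every service is at its desired scale)
```

Extra text after the ratio, such as a `(max 1 per node)` suffix, lands in `rest` and is ignored.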
### DNS Issues

```bash
# Check DNS resolution
nslookup ghcr.io

# Test from inside a container (requires curl in the container image)
docker exec <container> curl -I https://ghcr.io

# Restart dnsmasq
sudo systemctl restart dnsmasq

# Check Docker DNS config
cat /etc/docker/daemon.json
```

### Network Issues

```bash
# Check port mapping
docker port <container_name>

# Test external access (-z: only check that the port accepts connections)
nc -zv <host_ip> <port>

# Check Traefik routes (only works if Traefik's API is exposed on port 8080,
# e.g. with api.insecure=true)
curl -s http://localhost:8080/api/http/routers

# Check container networks
docker inspect <container> --format '{{json .NetworkSettings.Networks}}'
```

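A loop over `nc -z` is enough to check every published port from the services list in one pass. In the minimal sketch below, `check_ports` is a hypothetical helper, and the default host is tpi-n1's address from Host Configuration. The default port list is an assumption assembled from Services Running (3000, 2222, 888), plus 443, which is assumed to be Traefik's HTTPS entrypoint.

```bash
#!/usr/bin/env bash
# Hypothetical helper: report which TCP ports on a host accept connections.
check_ports() {
  local host="${1:-192.168.2.130}"
  shift
  local ports=("$@") failed=0 port
  # Default port list is an assumption drawn from the services notes
  [ "${#ports[@]}" -eq 0 ] && ports=(443 3000 2222 888)
  for port in "${ports[@]}"; do
    # -z: connect without sending data, -w 3: give up after 3 seconds
    if nc -z -w 3 "$host" "$port" > /dev/null 2>&1; then
      echo "open   $host:$port"
    else
      echo "closed $host:$port"
      failed=1
    fi
  done
  return "$failed"
}

# Usage: check_ports 192.168.2.130 443 3000
```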
### Application Deployment Issues

```bash
# Check deployment logs
docker service logs <service_name> --tail 50

# Force redeployment
docker service update --force <service_name>

# Check service configuration
docker service inspect <service_name>

# Scale services
docker service scale <service_name>=<replicas>
```

## Monitoring Commands

### System Resources

```bash
# Disk usage
df -h

# Memory usage
free -h

# Docker space usage
docker system df

# Container resource usage (streams live; add --no-stream for a single snapshot)
docker stats
```

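The main drive has already filled up once, so a threshold check is worth running on a schedule (for example from cron). In the minimal sketch below, `disk_usage_pct` and `check_disk` are hypothetical helpers, and the 85% default threshold is an arbitrary assumption. The usage figure comes from the Capacity column of POSIX `df -P` output.

```bash
#!/usr/bin/env bash
# Hypothetical helpers: warn when a filesystem crosses a usage threshold.

# Print the usage percentage (without "%") of the filesystem holding a path.
disk_usage_pct() {
  # df -P prints one line per filesystem; column 5 is "Capacity", e.g. "46%"
  df -P "$1" | awk 'NR == 2 { sub(/%/, "", $5); print $5 }'
}

# Return 1 (and print a warning) if usage is at or above the threshold.
check_disk() {
  local path="$1" threshold="${2:-85}" pct
  pct="$(disk_usage_pct "$path")"
  if [ "$pct" -ge "$threshold" ]; then
    echo "WARNING: $path is at ${pct}% (threshold ${threshold}%)"
    return 1
  fi
  echo "OK: $path is at ${pct}%"
}

# Usage: check_disk / && check_disk /mnt/nvme
```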
### Docker Swarm Health

```bash
# Check swarm status (node list; must run on a manager)
docker node ls

# Check service health
docker service ls

# Check individual services
docker service ps <service_name>
```

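The nodes themselves can be checked the same way. In the minimal sketch below, `unhealthy_nodes` is a hypothetical helper that prints any node whose `Status` is not `Ready` or whose `Availability` is not `Active`. It uses `docker node ls --format`, which must run on a manager such as tpi-n1.

```bash
#!/usr/bin/env bash
# Hypothetical helper: print Swarm nodes that are not Ready and Active.
# Must run on a manager node.
unhealthy_nodes() {
  local host status availability
  docker node ls --format '{{.Hostname}} {{.Status}} {{.Availability}}' |
    while read -r host status availability; do
      if [ "$status" != "Ready" ] || [ "$availability" != "Active" ]; then
        echo "$host status=$status availability=$availability"
      fi
    done
}

# Usage: unhealthy_nodes   (no output means all nodes are Ready and Active)
```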
## Configuration Files

### Docker Daemon Configuration

```json
{
  "data-root": "/mnt/nvme/docker",
  "dns": ["127.0.0.1"]
}
```

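Both maintenance tasks above rewrote this file with `echo ... | sudo tee`, which replaces the whole file, so a later edit that forgets a key silently drops it. The minimal sketch below writes the full configuration in one place and validates it before Docker is restarted. `write_daemon_json` is a hypothetical helper. The JSON check assumes `python3` is installed (its `json.tool` module exits non-zero on invalid JSON) and is skipped otherwise.

```bash
#!/usr/bin/env bash
# Hypothetical helper: write the complete daemon.json, then validate it.
write_daemon_json() {
  local target="${1:-/etc/docker/daemon.json}"
  # Quoted 'EOF' keeps the shell from expanding anything inside the JSON
  cat > "$target" << 'EOF'
{
  "data-root": "/mnt/nvme/docker",
  "dns": ["127.0.0.1"]
}
EOF
  # Validate only if python3 is available
  if command -v python3 > /dev/null 2>&1; then
    python3 -m json.tool "$target" > /dev/null || {
      echo "invalid JSON in $target" >&2
      return 1
    }
  fi
}

# Usage (as root), restarting only if the write and validation succeeded:
#   write_daemon_json && systemctl restart docker
```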
### Nginx Template Key Sections

```nginx
# Static assets for pancake/third_party
location /pancake/third_party {
    alias ${NIXPACKS_PHP_ROOT_DIR}/pancake/third_party;
    expires 1y;
    add_header Cache-Control "public, immutable";
}

# Pretty URLs for Pancake
location /pancake {
    try_files $uri $uri/ @pancake_fallback;
}

location @pancake_fallback {
    rewrite ^.*$ /pancake/index.php last;
}
```

## Future Improvements

### DNS Enhancement

- Configure dnsmasq to forward internal domains to a local DNS server
- Set up conditional forwarding for homelab services
- Tune DNS caching for better performance (dnsmasq already caches by default; the cache size is set with `cache-size`)

### Backup Strategy

- Regular backups of Docker volumes to NVMe
- Automated snapshots of configuration files
- Git repository tracking of all changes

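Docker has no built-in volume backup command. The usual pattern mounts the volume into a short-lived container and archives it with `tar`. The minimal sketch below follows that pattern. It assumes a hypothetical backup directory `/mnt/nvme/backups` and the `alpine` image, and `backup_volume` and `restore_volume` are hypothetical helper names.

```bash
#!/usr/bin/env bash
# Hypothetical helpers: back up / restore a named Docker volume as a tar.gz.
BACKUP_DIR="${BACKUP_DIR:-/mnt/nvme/backups}"   # assumed location on the NVMe drive

backup_volume() {
  local volume="$1" file="$1-$(date +%Y%m%d-%H%M%S).tar.gz"
  mkdir -p "$BACKUP_DIR" || return 1
  # Mount the volume read-only and write the archive into the backup directory
  docker run --rm \
    -v "$volume":/volume:ro \
    -v "$BACKUP_DIR":/backup \
    alpine tar -czf "/backup/$file" -C /volume . &&
    echo "$BACKUP_DIR/$file"
}

restore_volume() {
  local volume="$1" file="$2"   # file: archive name inside BACKUP_DIR
  # Stop services using the volume first; this extracts over existing contents
  docker run --rm \
    -v "$volume":/volume \
    -v "$BACKUP_DIR":/backup:ro \
    alpine tar -xzf "/backup/$file" -C /volume
}

# Usage:
#   backup_volume <volume_name>
#   restore_volume <volume_name> <backup_file>
```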
### Monitoring

- Set up Prometheus/Grafana for system monitoring
- Log aggregation for better troubleshooting
- Alert configuration for critical services

## Emergency Procedures

### Full System Recovery

```bash
# 1. Check all services
docker service ls

# 2. Restart critical services
docker service update --force dokploy
docker service update --force traefik

# 3. Check DNS resolution
curl -I https://ghcr.io

# 4. Verify storage
df -h
docker system df
```

### Service Restoration

```bash
# Restore from backup if needed. Docker has no "volume restore" command;
# extract the archive (assumed here to be a tar.gz) into the volume
# from a throwaway container
docker volume ls
docker run --rm -v <volume_name>:/volume -v <backup_dir>:/backup alpine \
  tar -xzf /backup/<backup_file> -C /volume

# Re-deploy from last known good state
git log --oneline -10
git checkout <commit_hash>
```

---

**Last Updated**: 2025-11-29
**Maintainer**: sirtimbly
**Environment**: Production Docker Swarm Cluster


    @@ -82,7 +82,7 @@ http {
     
         # Static assets for pancake/third_party
         location /pancake/third_party {
    -        alias ${NIXPACKS_PHP_ROOT_DIR}/pancake/third_party;
    +        alias /app/pancake/third_party;
             expires 1y;
             add_header Cache-Control "public, immutable";