# Infrastructure Management

This directory contains backups, scripts, and documentation for managing the homelab infrastructure.

## Directory Structure

```
infrastructure/
├── compose/              # Docker Compose files backed up from cluster
├── stacks/               # Docker Stack definitions
├── traefik/              # Traefik configuration backups
├── scripts/              # Management and backup scripts
├── backups/              # Critical data backups (created by scripts)
└── BACKUP_MANIFEST.md    # Auto-generated backup manifest
```

## Quick Start

### 1. Back Up Compose Files

Run this to back up all compose configurations from the cluster:

```bash
./scripts/backup-compose-files.sh
```

This will:
- Copy all Dokploy compose files from `/etc/dokploy/compose/`
- Copy Traefik configuration
- Copy stack files
- Generate a backup manifest

### 2. Back Up Critical Data

Run this to back up databases and application data:

```bash
./scripts/backup-critical-data.sh
```

This will:
- Back up PostgreSQL databases (Dokploy, Immich, BewCloud)
- Back up MariaDB databases (Pancake)
- Back up application volumes (Memos, Gitea)
- Clean up backups older than 30 days
- Generate a backup report

## Automated Backups

### Set up cron jobs on your local machine:

```bash
# Edit crontab
crontab -e

# Add these lines:

# Backup compose files daily at 2 AM
0 2 * * * cd /Users/timothy.bendt/developer/cloud-compose/infrastructure/scripts && ./backup-compose-files.sh >> /var/log/homelab-backup.log 2>&1

# Backup critical data daily at 3 AM
0 3 * * * cd /Users/timothy.bendt/developer/cloud-compose/infrastructure/scripts && ./backup-critical-data.sh >> /var/log/homelab-backup.log 2>&1
```

### Or run manually whenever you make changes:

```bash
# After modifying any service
cd /Users/timothy.bendt/developer/cloud-compose
./infrastructure/scripts/backup-compose-files.sh
git add infrastructure/
git commit -m "Backup infrastructure configs"
git push
```

## Restore Procedures

### Restore a Compose File

1. Copy the compose file from `infrastructure/compose//docker-compose.yml`
2. Upload via Dokploy UI
3. Deploy

### Restore a Database

```bash
# Example: Restore Dokploy database
scp infrastructure/backups/dokploy-postgres-dokploy-2026-02-09.sql ubuntu@192.168.2.130:/tmp/
ssh ubuntu@192.168.2.130 "docker exec -i dokploy-postgres.1. psql -U postgres dokploy < /tmp/dokploy-postgres-dokploy-2026-02-09.sql"
```

### Restore Volume Data

```bash
# Example: Restore Memos data
scp infrastructure/backups/bewcloud-memos-ssogxn-memos-data-2026-02-09.tar.gz ubuntu@192.168.2.130:/tmp/
ssh ubuntu@192.168.2.130 "docker run --rm -v bewcloud-memos-ssogxn_memos_data:/data -v /tmp:/backup alpine sh -c 'cd /data && tar xzf /backup/bewcloud-memos-ssogxn-memos-data-2026-02-09.tar.gz'"
```

## SSH Access

### Controller (tpi-n1)

```bash
ssh -i ~/.ssh/id_ed25519 ubuntu@192.168.2.130
```

### Worker (tpi-n2)

```bash
ssh -i ~/.ssh/id_ed25519 ubuntu@192.168.2.19
```

### NAS (node-nas)

```bash
ssh tim@192.168.2.18
```

## Useful Commands

### Check Service Status

```bash
ssh ubuntu@192.168.2.130 "docker service ls"
```

### View Service Logs

```bash
ssh ubuntu@192.168.2.130 "docker service logs --tail 100 -f"
```

### Scale a Service

```bash
ssh ubuntu@192.168.2.130 "docker service scale ="
```

### Check Node Status

```bash
ssh ubuntu@192.168.2.130 "docker node ls"
```

## Web Interfaces

| Service | URL | Purpose |
|---------|-----|---------|
| Dokploy | http://192.168.2.130:3000 | Container management |
| Swarmpit | http://192.168.2.130:888 | Swarm monitoring |
| Traefik | http://192.168.2.130:8080 | Reverse proxy dashboard |
| MinIO | http://192.168.2.18:9001 | Object storage console |

## Backup Storage

### Local

Backups are stored in `infrastructure/backups/` with date stamps.

### Offsite (Recommended)

Consider copying backups to:
- MinIO bucket (`backups/`)
- External hard drive
- Cloud storage (AWS S3, etc.)
- Another server

Example:

```bash
# Copy to MinIO
mc cp infrastructure/backups/* minio/backups/
```

## Maintenance Checklist

### Daily
- [ ] Check backup logs for errors
- [ ] Verify critical services are running

### Weekly
- [ ] Review Swarmpit dashboard
- [ ] Check disk usage on all nodes
- [ ] Review backup integrity

### Monthly
- [ ] Test restore procedures
- [ ] Update documentation
- [ ] Review and update services
- [ ] Clean up unused images/volumes

### Quarterly
- [ ] Full disaster recovery drill
- [ ] Security audit
- [ ] Update base images
- [ ] Review access controls

---

*Infrastructure Management Guide - February 2026*
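
The weekly "Review backup integrity" item in the checklist above can be partly automated. The following is a minimal sketch, not part of the existing scripts: the function name `check_backups` is hypothetical, and it assumes volume backups end in `.tar.gz` and database dumps end in `.sql`, as in the restore examples. It only confirms that archives are readable and dumps are non-empty; it does not replace an actual restore test.

```bash
#!/usr/bin/env bash
# Sketch: flag unreadable archives and empty SQL dumps in a backup directory.
# Assumption: archives are *.tar.gz and dumps are *.sql (naming taken from
# the restore examples); check_backups is a hypothetical helper name.

check_backups() {
  local dir="${1:-infrastructure/backups}"
  local failed=0
  local f

  for f in "$dir"/*.tar.gz "$dir"/*.sql; do
    # Skip unexpanded glob patterns when no file matches.
    [ -e "$f" ] || continue

    case "$f" in
      *.tar.gz)
        # Listing the archive forces tar to read and decompress all of it.
        if ! tar -tzf "$f" > /dev/null 2>&1; then
          echo "CORRUPT: $f"
          failed=1
        fi
        ;;
      *.sql)
        # A zero-byte dump usually means the database export failed.
        if [ ! -s "$f" ]; then
          echo "EMPTY: $f"
          failed=1
        fi
        ;;
    esac
  done

  # Return 0 when every file passed, 1 otherwise.
  return "$failed"
}
```

After sourcing the file, a run from the repository root might look like `check_backups infrastructure/backups && echo "All backups look readable"`. The non-zero return code on failure makes it easy to call from cron and spot problems in the backup log.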