# Home Lab Quick Start Guide
**Welcome to your documented homelab!** This guide will help you manage and maintain your cluster.

---
## 🎯 What's Been Set Up
- **Cluster Audit Complete** - All services documented
- **Memos Fixed** - Running at https://memos.bendtstudio.com
- **WebStatic Scaled** - Down to 1 replica, stable
- **Documentation Created** - Service catalog and guides
- **Backup Scripts** - Automated backup tools ready
- **Infrastructure Repo** - All configs in version control
---
## 🚀 Quick Access
### Your Services (via HTTPS)
| Service | URL | Status |
|---------|-----|--------|
| **Memos** | https://memos.bendtstudio.com | ✅ Working |
| **Gitea** | https://gitea.bendtstudio.com:3000 | ✅ Git server |
| **BewCloud** | Check Traefik routes | ✅ Cloud storage |
| **Immich** | Check Traefik routes | ✅ Photos |
| **SilverBullet** | Check Traefik routes | ✅ Notes |
### Management Interfaces
| Tool | URL | Use For |
|------|-----|---------|
| **Dokploy** | http://192.168.2.130:3000 | Deploy containers |
| **Swarmpit** | http://192.168.2.130:888 | Monitor cluster |
| **Traefik** | http://192.168.2.130:8080 | View routes |
| **MinIO** | http://192.168.2.18:9001 | Object storage |
| **DNS** | http://192.168.2.130:5380 | Manage DNS |
---
## 📁 Important Files
### In This Repo
```
cloud-compose/
├── HOMELAB_AUDIT.md # Complete cluster documentation
├── SERVICE_CATALOG.md # All services documented
├── ACTION_PLAN.md # Step-by-step improvement plan
├── QUICK_REFERENCE.md # Quick access commands
├── memos-compose.yml # Fixed memos configuration
├── infrastructure/ # Backup scripts & configs
│ ├── README.md # How to use backups
│ ├── scripts/ # Backup automation
│ │ ├── backup-compose-files.sh
│ │ └── backup-critical-data.sh
│ └── compose/ # Backed up compose files
```
---
## 🔄 Regular Maintenance
### Daily (Recommended)
```bash
# Check cluster health
ssh ubuntu@192.168.2.130 "docker service ls"
# Quick backup
./infrastructure/scripts/backup-compose-files.sh
```
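The `docker service ls` check above means reading replica counts by eye. As a minimal sketch, the helper below flags any service whose running replica count differs from its desired count. It assumes the input comes from `docker service ls --format '{{.Name}} {{.Replicas}}'`; the function name `find_degraded_services` is hypothetical and not part of this repo.

```bash
#!/usr/bin/env bash
# Hypothetical helper: reads "name running/desired" lines on stdin and prints
# every service whose running count differs from its desired count.
# A service intentionally scaled to 0 (0/0) is not reported.
find_degraded_services() {
  awk '{
    split($2, r, "/")            # r[1] = running, r[2] = desired
    if (r[1] + 0 != r[2] + 0) print $1 " (" $2 ")"
  }'
}
```

One possible invocation: `ssh ubuntu@192.168.2.130 "docker service ls --format '{{.Name}} {{.Replicas}}'" | find_degraded_services`. Empty output means every service is at its desired replica count.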
### Weekly
```bash
# Full backup
./infrastructure/scripts/backup-critical-data.sh
# Review Swarmpit dashboard (`open` is macOS; use `xdg-open` on Linux)
open http://192.168.2.130:888
# Commit changes to git
git add .
git commit -m "Weekly backup $(date +%Y-%m-%d)"
git push
```
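The daily and weekly routines above could be scheduled with cron rather than run by hand. The sketch below only prints candidate crontab entries for review. The schedule (02:00 daily, 03:00 on Sundays), the log path, and the function name `print_backup_crontab` are assumptions, and it presumes the backup scripts can run unattended from the repo root.

```bash
#!/usr/bin/env bash
# Hypothetical helper: prints crontab entries for the two backup scripts.
# $1 = absolute path to the cloud-compose checkout
# $2 = log file (optional; the default is an assumption)
print_backup_crontab() {
  local repo_dir="$1"
  local log="${2:-$HOME/homelab-backup.log}"
  # Daily at 02:00: compose file backup
  printf '0 2 * * * cd %s && ./infrastructure/scripts/backup-compose-files.sh >> %s 2>&1\n' "$repo_dir" "$log"
  # Sundays at 03:00: full data backup
  printf '0 3 * * 0 cd %s && ./infrastructure/scripts/backup-critical-data.sh >> %s 2>&1\n' "$repo_dir" "$log"
}
```

Run `print_backup_crontab /path/to/cloud-compose`, check the two lines, then paste them into `crontab -e`. Printing instead of installing keeps the existing crontab untouched until the entries have been reviewed.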
---
## 🛠️ Common Tasks
### Deploy a New Service
1. Create a `docker-compose.yml` file.
2. Add Traefik labels for HTTPS:
   ```yaml
   labels:
     - traefik.http.routers.myapp.rule=Host(`myapp.bendtstudio.com`)
     - traefik.http.routers.myapp.tls.certresolver=letsencrypt
     - traefik.enable=true
   ```
3. Upload it via the Dokploy UI.
4. Deploy.
5. Add a DNS record in Technitium DNS.
6. Commit the compose file to this repo.
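The labels in step 2 follow a fixed pattern, so they can be generated per service instead of edited by hand. The sketch below is one way to do that; `traefik_labels` is a hypothetical name. It also emits a `loadbalancer.server.port` label, on the assumption that Traefik is reading labels through its Swarm provider, which does not detect ports automatically. Whether that label is needed here (and whether the labels belong under `deploy.labels`) depends on how Dokploy deploys the stack, so treat it as something to verify.

```bash
#!/usr/bin/env bash
# Hypothetical helper: prints the Traefik labels from step 2 for one app.
# $1 = router/service name, $2 = public hostname, $3 = container port
traefik_labels() {
  local name="$1" host="$2" port="$3"
  printf -- '- traefik.enable=true\n'
  printf -- '- traefik.http.routers.%s.rule=Host(`%s`)\n' "$name" "$host"
  printf -- '- traefik.http.routers.%s.tls.certresolver=letsencrypt\n' "$name"
  # Assumption: needed when Traefik uses the Swarm provider (no port detection).
  printf -- '- traefik.http.services.%s.loadbalancer.server.port=%s\n' "$name" "$port"
}
```

For example, `traefik_labels myapp myapp.bendtstudio.com 8080` prints four list items that can be pasted under the `labels:` key of the compose file.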
### Backup Everything
```bash
# Run both backup scripts
cd infrastructure/scripts
./backup-compose-files.sh
./backup-critical-data.sh
# Commit to git
cd ../..
git add infrastructure/
git commit -m "Backup $(date +%Y-%m-%d)"
git push
```
### Check Service Logs
```bash
# View logs
ssh ubuntu@192.168.2.130 "docker service logs <service-name> --tail 50 -f"
# Common services:
# - bewcloud-memos-ssogxn-memos
# - bendtstudio-webstatic-iq9evl
# - dokploy
```
### Restart a Service
```bash
# Via SSH
ssh ubuntu@192.168.2.130 "docker service update --force <service-name>"
# Or via Dokploy UI
open http://192.168.2.130:3000
```
---
## 🆘 Troubleshooting
### Service Won't Start
```bash
# Check logs
ssh ubuntu@192.168.2.130 "docker service logs <service-name> --tail 100"
# Common fixes:
# 1. Permission issues (like memos had)
# 2. Missing environment variables
# 3. Port conflicts
# 4. Image not found
```
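The four common causes listed above can be checked for mechanically as a first pass. The sketch below scans log text for a few typical error strings. The patterns are illustrative guesses rather than an exhaustive list, and `triage_logs` is a hypothetical name.

```bash
#!/usr/bin/env bash
# Hypothetical helper: reads log text on stdin and prints one line per
# suspected cause. No output means none of the sample patterns matched.
triage_logs() {
  local logs
  logs="$(cat)"
  printf '%s\n' "$logs" | grep -qiE 'permission denied|EACCES' \
    && echo "permission issue"
  printf '%s\n' "$logs" | grep -qiE 'environment variable|is not set|missing required' \
    && echo "missing environment variable"
  printf '%s\n' "$logs" | grep -qiE 'address already in use|port is already allocated' \
    && echo "port conflict"
  printf '%s\n' "$logs" | grep -qiE 'no such image|manifest unknown|pull access denied' \
    && echo "image not found"
  return 0
}
```

A possible use is `ssh ubuntu@192.168.2.130 "docker service logs <service-name> --tail 100" 2>&1 | triage_logs`. Image pull failures often show up in the task list rather than the logs, so piping `docker service ps --no-trunc <service-name>` through the same function may catch those.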
### Can't Access Website
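1. Check whether the service's tasks are running (`docker service ps` covers every node, while `docker ps` only lists containers on the node you are logged in to):
   ```bash
   ssh ubuntu@192.168.2.130 "docker service ps <service-name>"
   ```
2. Check Traefik routes:
   ```bash
   open http://192.168.2.130:8080
   ```
3. Check DNS:
   ```bash
   dig memos.bendtstudio.com @192.168.2.130
   ```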
### Database Issues
```bash
# Check the database service (docker ps would only list containers on the manager node)
ssh ubuntu@192.168.2.130 "docker service ls | grep postgres"
# Check database logs
ssh ubuntu@192.168.2.130 "docker service logs dokploy-postgres --tail 50"
```
---
## 📊 Current Status
### Nodes
- **tpi-n1** (192.168.2.130) - Manager ✅
- **tpi-n2** (192.168.2.19) - Worker ✅
- **node-nas** (192.168.2.18) - Storage ✅
### Running Services
```bash
# Check all services
ssh ubuntu@192.168.2.130 "docker service ls"
```
### Known Issues
- **Syncthing** - Stopped (0 replicas), start when needed:
```bash
docker service scale syncthing=1
```
---
## 📚 Documentation
- **Service Catalog** - `SERVICE_CATALOG.md`
- **Infrastructure Guide** - `infrastructure/README.md`
- **Audit Report** - `HOMELAB_AUDIT.md`
- **Action Plan** - `ACTION_PLAN.md`
---
## 🎓 Learning Resources
### Docker Swarm
- Official Docs: https://docs.docker.com/engine/swarm/
- Service Management: `docker service --help`
### Dokploy
- Documentation: https://dokploy.com/docs/
- UI at: http://192.168.2.130:3000
### Traefik
- Routing: http://192.168.2.130:8080/dashboard/
- Labels Reference: https://doc.traefik.io/traefik/routing/providers/docker/
---
## 📞 Quick Commands Reference
```bash
# SSH to nodes
ssh -i ~/.ssh/id_ed25519 ubuntu@192.168.2.130 # Manager (tpi-n1)
ssh -i ~/.ssh/id_ed25519 ubuntu@192.168.2.19 # Worker (tpi-n2)
ssh tim@192.168.2.18 # NAS (node-nas)
# Check cluster (run on the manager node)
docker node ls
docker service ls
docker stack ls
# View logs
docker service logs <service> --tail 100 -f
# Scale service
docker service scale <service>=<replicas>
# Update service
docker service update --force <service>
```
---
## ✅ Next Steps
### High Priority (This Week)
- [ ] Test backup scripts
- [ ] Set up automated daily backups (cron)
- [ ] Verify all services have working HTTPS
- [ ] Add missing compose files to version control
### Medium Priority (This Month)
- [ ] Create disaster recovery procedures
- [ ] Set up monitoring alerts
- [ ] Document all environment variables
- [ ] Test restore procedures
### Low Priority (When Time Allows)
- [ ] Set up Prometheus/Grafana
- [ ] Create network diagrams
- [ ] Automate SSL certificate renewal checks
- [ ] Security audit
---
**Questions or issues?** Check the detailed documentation in:
- `HOMELAB_AUDIT.md` - Complete technical details
- `SERVICE_CATALOG.md` - Service-specific information
- `infrastructure/README.md` - Backup and management guide

*Last Updated: February 9, 2026*