diff --git a/GITHUB_RUNNERS_PLAN.md b/GITHUB_RUNNERS_PLAN.md
new file mode 100644
index 0000000..f1f9d0e
--- /dev/null
+++ b/GITHUB_RUNNERS_PLAN.md
@@ -0,0 +1,685 @@
+# GitHub Actions Self-Hosted Runners Deployment Plan
+
+## Overview
+
+**Goal:** Deploy GitHub Actions self-hosted runners on your Docker Swarm cluster to run CI/CD workflows with unlimited minutes, custom environments, and access to your homelab resources.
+
+**Architecture:** Docker-based runners deployed as Swarm services and scaled by changing the replica count.
+
+---
+
+## Architecture Decision
+
+### Option 1: Docker Container Runners (Recommended for your setup)
+- ✅ Runs in Docker containers on your existing cluster
+- ✅ Scales horizontally by adding/removing containers
+- ✅ Uses your existing infrastructure (tpi-n1, tpi-n2, node-nas)
+- ✅ Easy to manage through Docker Swarm
+- ✅ ARM64 and x86_64 support for multi-arch builds
+
+### Option 2: VM/Physical Runners (Alternative)
+- Runners installed directly on VMs or bare metal
+- More isolated but harder to manage
+- Not recommended for your containerized setup
+
+**Decision:** Use Docker Container Runners (Option 1) with multi-arch support.
+
+---
+
+## Deployment Architecture
+
+```
+GitHub Repository
+       │
+       │ Webhook/REST API
+       ▼
+┌─────────────────────────────┐
+│   GitHub Actions Service    │
+└─────────────────────────────┘
+       │
+       │ Job Request
+       ▼
+┌─────────────────────────────┐
+│  Your Docker Swarm Cluster  │
+│                             │
+│  ┌─────────────────────┐    │
+│  │   Runner Service    │    │
+│  │ (Multiple Replicas) │    │
+│  │                     │    │
+│  │  ┌─────┐  ┌──────┐  │    │
+│  │  │ ARM │  │x86_64│  │    │
+│  │  │ 64  │  │      │  │    │
+│  │  └─────┘  └──────┘  │    │
+│  └─────────────────────┘    │
+│                             │
+│  ┌─────────────────────┐    │
+│  │  Host Docker daemon │    │
+│  │ (via mounted socket)│    │
+│  └─────────────────────┘    │
+│                             │
+└─────────────────────────────┘
+```
+
+---
+
+## Phase 1: Planning & Preparation
+
+### Step 1: Determine Requirements
+
+**Use Cases:**
+- [ ] Build and test applications
+- [ ] Deploy to your homelab (Kubernetes/Docker Swarm)
+- [ ] Run ARM64 builds (for Raspberry Pi/ARM apps)
+- [ ] Run x86_64 builds (standard applications)
+- [ ] Access private network resources (databases, internal APIs)
+- [ ] Build Docker images and push to your Gitea registry
+
+**Resource Requirements per Runner:**
+- CPU: 2+ cores recommended
+- Memory: 4GB+ RAM per runner
+- Disk: 20GB+ for workspace and Docker layers
+- Network: Outbound HTTPS to GitHub
+
+**Current Cluster Capacity:**
+- tpi-n1: 8 cores ARM64, 8GB RAM (Manager)
+- tpi-n2: 8 cores ARM64, 8GB RAM (Worker)
+- node-nas: 2 cores x86_64, 8GB RAM (Storage)
+
+**Recommended Allocation:**
+- 2 runners on tpi-n1 (ARM64)
+- 2 runners on tpi-n2 (ARM64)
+- 1 runner on node-nas (x86_64)
+
+**Note:** Two runners at 4GB each would claim all 8GB on an ARM node, so either lower the per-runner memory limit or start with one runner per node.
+
+### Step 2: GitHub Configuration
+
+**Choose Runner Level:**
+- [ ] **Repository-level** - Dedicated to a specific repo (recommended to start)
+- [ ] **Organization-level** - Shared across org repos
+- [ ] **Enterprise-level** - Shared across the enterprise
+
+**For your use case:** Start with **repository-level** runners, then expand to organization-level if needed.
+
+**Required GitHub Settings:**
+1. Go to: `Settings > Actions > Runners > New self-hosted runner`
+2. Note the **Registration Token** (expires after 1 hour; not needed when the runner image requests one itself using the PAT from Step 6)
+3. Note the **Runner Group** (default: "Default")
+4. Configure labels (e.g., `homelab`, `arm64`, `x86_64`, `self-hosted`)
+
+---
+
+## Phase 2: Infrastructure Setup
+
+### Step 3: Create Docker Network
+
+```bash
+# On controller (tpi-n1)
+ssh ubuntu@192.168.2.130
+
+# Create overlay network for runners
+docker network create --driver overlay --attachable github-runners-network
+
+# Verify
+docker network ls | grep github
+```
+
+### Step 4: Create Persistent Storage
+
+```bash
+# Note: local volumes are per-node, so each node keeps its own cache
+
+# Create volume for runner cache (shared across runners on the same node)
+docker volume create github-runner-cache
+
+# Create volume for Docker build cache
+docker volume create github-runner-docker-cache
+```
+
+### Step 5: Prepare Node Labels
+
+```bash
+# Verify node labels (`docker node ls` cannot print labels, so inspect each node)
+ssh ubuntu@192.168.2.130
+docker node ls -q | xargs docker node inspect --format '{{ .Description.Hostname }} {{ .Spec.Labels }}'
+
+# Expected output:
+# tpi-n1 map[infra:true role:storage storage:high]
+# tpi-n2 map[role:compute]
+# node-nas map[type:nas]
+
+# Add architecture labels if missing:
+docker node update --label-add arch=arm64 tpi-n1
+docker node update --label-add arch=arm64 tpi-n2
+docker node update --label-add arch=x86_64 node-nas
+```
+
+---
+
+## Phase 3: Runner Deployment
+
+### Step 6: Create Environment File
+
+Create `.env` file:
+```bash
+# GitHub Configuration
+GITHUB_TOKEN=your_github_personal_access_token
+GITHUB_OWNER=your-github-username-or-org
+GITHUB_REPO=your-repository-name  # Leave empty for org-level
+
+# Runner Configuration
+RUNNER_NAME_PREFIX=homelab
+RUNNER_LABELS=self-hosted,homelab,linux
+RUNNER_GROUP=Default
+
+# Docker Configuration
+DOCKER_TLS_CERTDIR=/certs
+
+# Optional: tools to bake into a custom image (see Step 16); not referenced by the stack file
+PRE_INSTALL_TOOLS="docker-compose,nodejs,npm,yarn,python3,pip,git"
+```
+
+### Step 7: Create Docker Compose Stack
+
+Create `github-runners-stack.yml`:
+
+```yaml
+version: "3.8"
+
+services:
+  # ARM64 Runners
+  runner-arm64:
+    image: myoung34/github-runner:latest
+    environment:
+      - ACCESS_TOKEN=${GITHUB_TOKEN}
+      - REPO_URL=https://github.com/${GITHUB_OWNER}/${GITHUB_REPO}
+      - RUNNER_NAME=${RUNNER_NAME_PREFIX}-arm64-{{.Task.Slot}}
+      - RUNNER_WORKDIR=/tmp/runner-work
+      - RUNNER_GROUP=${RUNNER_GROUP:-Default}
+      - RUNNER_SCOPE=repo
+      - LABELS=${RUNNER_LABELS},arm64
+      - DISABLE_AUTO_UPDATE=true
+      - EPHEMERAL=true  # One job per container
+    volumes:
+      - /var/run/docker.sock:/var/run/docker.sock
+      - github-runner-cache:/home/runner/cache
+      - github-runner-docker-cache:/var/lib/docker
+    networks:
+      - github-runners-network
+      - dokploy-network
+    deploy:
+      mode: replicated
+      replicas: 2  # Total across all arm64 nodes, not per node
+      placement:
+        constraints:
+          - node.labels.arch == arm64
+      restart_policy:
+        condition: any
+        delay: 5s
+        # No max_attempts: ephemeral runners exit after every job and must be restarted each time
+    # `privileged` is ignored by `docker stack deploy`; the mounted host socket provides Docker access
+
+  # x86_64 Runners
+  runner-x86_64:
+    image: myoung34/github-runner:latest
+    environment:
+      - ACCESS_TOKEN=${GITHUB_TOKEN}
+      - REPO_URL=https://github.com/${GITHUB_OWNER}/${GITHUB_REPO}
+      - RUNNER_NAME=${RUNNER_NAME_PREFIX}-x86_64-{{.Task.Slot}}
+      - RUNNER_WORKDIR=/tmp/runner-work
+      - RUNNER_GROUP=${RUNNER_GROUP:-Default}
+      - RUNNER_SCOPE=repo
+      - LABELS=${RUNNER_LABELS},x86_64
+      - DISABLE_AUTO_UPDATE=true
+      - EPHEMERAL=true
+    volumes:
+      - /var/run/docker.sock:/var/run/docker.sock
+      - github-runner-cache:/home/runner/cache
+      - github-runner-docker-cache:/var/lib/docker
+    networks:
+      - github-runners-network
+      - dokploy-network
+    deploy:
+      mode: replicated
+      replicas: 1
+      placement:
+        constraints:
+          - node.labels.arch == x86_64
+      restart_policy:
+        condition: any
+        delay: 5s
+        # No max_attempts: ephemeral runners exit after every job and must be restarted each time
+    # `privileged` is ignored by `docker stack deploy`; the mounted host socket provides Docker access
+
+  # Optional: Runner Autoscaler
+  # Caveat: Actions Runner Controller is a Kubernetes operator and does not scale Swarm services.
+  # On Swarm, change capacity with `docker service scale` and leave this service out.
+  autoscaler:
+    image: ghcr.io/actions-runner-controller/actions-runner-controller:latest
+    environment:
+      - GITHUB_TOKEN=${GITHUB_TOKEN}
+      - RUNNER_SCOPE=repo
+    volumes:
+      - /var/run/docker.sock:/var/run/docker.sock
+    networks:
+      - github-runners-network
+    deploy:
+      mode: replicated
+      replicas: 1
+      placement:
+        constraints:
+          - node.role == manager
+
+volumes:
+  # Fixed names so the stack uses the volumes from Step 4 and the backup script finds them
+  github-runner-cache:
+    name: github-runner-cache
+  github-runner-docker-cache:
+    name: github-runner-docker-cache
+
+networks:
+  github-runners-network:
+    external: true  # Created in Step 3
+  dokploy-network:
+    external: true
+```
+
+### Step 8: Deploy Runners
+
+```bash
+# Copy files to controller
+scp github-runners-stack.yml ubuntu@192.168.2.130:~/
+scp .env ubuntu@192.168.2.130:~/
+
+# SSH to controller
+ssh ubuntu@192.168.2.130
+
+# Load environment (docker stack deploy does not read .env on its own)
+set -a && source .env && set +a
+
+# Deploy stack
+docker stack deploy -c github-runners-stack.yml github-runners
+
+# Verify deployment
+docker stack ps github-runners
+docker service ls | grep github
+```
+
+---
+
+## Phase 4: GitHub Integration
+
+### Step 9: Verify Runners in GitHub
+
+1. Go to: `https://github.com/[OWNER]/[REPO]/settings/actions/runners`
+2. You should see your runners listed as "Idle"
+3. Labels should show: `self-hosted`, `homelab`, `linux`, and either `arm64` or `x86_64`
+
+### Step 10: Test with Sample Workflow
+
+Create `.github/workflows/test-self-hosted.yml`:
+
+```yaml
+name: Test Self-Hosted Runners
+
+on:
+  push:
+    branches: [ main ]
+  workflow_dispatch:
+
+jobs:
+  test-arm64:
+    runs-on: [self-hosted, homelab, arm64]
+    steps:
+      - uses: actions/checkout@v4
+
+      - name: Show runner info
+        run: |
+          echo "Architecture: $(uname -m)"
+          echo "OS: $(uname -s)"
+          echo "Node: $(hostname)"
+          echo "CPU: $(nproc)"
+          echo "Memory: $(free -h | grep Mem)"
+
+      - name: Test Docker
+        run: |
+          docker --version
+          docker info
+          docker run --rm hello-world
+
+  test-x86_64:
+    runs-on: [self-hosted, homelab, x86_64]
+    steps:
+      - uses: actions/checkout@v4
+
+      - name: Show runner info
+        run: |
+          echo "Architecture: $(uname -m)"
+          echo "OS: $(uname -s)"
+          echo "Node: $(hostname)"
+
+      - name: Test access to homelab
+        run: |
+          # Test connectivity to your services
+          curl -s http://gitea.bendtstudio.com:3000 || echo "Gitea not accessible"
+          curl -s http://192.168.2.130:3000 || echo "Dokploy not accessible"
+```
+
+---
+
+## Phase 5: Security Hardening
+
+### Step 11: Implement Security Best Practices
+
+**1. Use Short-Lived Tokens:**
+```bash
+# Prefer a GitHub App installation token over a long-lived PAT for runner registration
+# Inside workflows, use OpenID Connect (OIDC) to obtain cloud credentials instead of storing secrets
+```
+
+**2. Restrict Runner Permissions:**
+```yaml
+# Add to workflow
+jobs:
+  build:
+    runs-on: [self-hosted, homelab]
+    permissions:
+      contents: read
+      packages: write  # Only if pushing to registry
+```
+
+**3. Network Isolation:**
+```yaml
+# Modify stack to use isolated network
+networks:
+  github-runners-network:
+    driver: overlay
+    internal: true  # Blocks direct egress; runners then need an HTTPS proxy to reach GitHub
+```
+
+**4. Resource Limits:**
+```yaml
+# Add to service definition in stack
+deploy:
+  resources:
+    limits:
+      cpus: '2'
+      memory: 4G
+    reservations:
+      cpus: '1'
+      memory: 2G
+```
+
+### Step 12: Enable Ephemeral Mode
+
+Ephemeral runners (already configured with `EPHEMERAL=true`) provide better security:
+- Each runner handles only one job
+- Container is destroyed after job completion
+- Fresh environment for every build
+- Reduces the risk of credential leakage between jobs
+
+Because the host Docker socket is mounted into every runner, a job can still control the host's Docker daemon. Run only trusted workflows on these runners.
+
+---
+
+## Phase 6: Monitoring & Maintenance
+
+### Step 13: Set Up Monitoring
+
+**Create monitoring script** (`monitor-runners.sh`):
+```bash
+#!/bin/bash
+
+# Check runner status
+echo "=== Docker Service Status ==="
+docker service ls | grep github-runner
+
+# docker ps and docker stats only see containers on the node where the script runs
+echo -e "\n=== Runner Containers ==="
+docker ps --filter name=github-runner --format "table {{.Names}}\t{{.Status}}\t{{.Ports}}"
+
+echo -e "\n=== Recent Logs ==="
+docker service logs github-runners_runner-arm64 --tail 50
+docker service logs github-runners_runner-x86_64 --tail 50
+
+echo -e "\n=== Resource Usage ==="
+docker stats --no-stream --format "table {{.Name}}\t{{.CPUPerc}}\t{{.MemUsage}}" | grep github-runner
+```
+
+**Create cron job for monitoring:**
+```bash
+# Create the log file once so the ubuntu user can write to it
+sudo touch /var/log/github-runners.log && sudo chown ubuntu:ubuntu /var/log/github-runners.log
+
+# Add to crontab
+crontab -e
+
+# Check runner health every 5 minutes
+*/5 * * * * /home/ubuntu/github-runners/monitor-runners.sh >> /var/log/github-runners.log 2>&1
+```
+
+### Step 14: Set Up Log Rotation
+
+```bash
+# Create logrotate config
+sudo tee /etc/logrotate.d/github-runners << EOF
+/var/log/github-runners.log {
+    daily
+    rotate 7
+    compress
+    delaycompress
+    missingok
+    notifempty
+    create 644 ubuntu ubuntu
+}
+EOF
+```
+
+### Step 15: Backup Strategy
+
+```bash
+#!/bin/bash
+# Backup script for runner configuration and cache volumes
+set -euo pipefail
+
+BACKUP_DIR="/backup/github-runners/$(date +%Y%m%d)"
+mkdir -p "$BACKUP_DIR"
+
+# Backup configuration (.env holds the GitHub token, so keep the copy private)
+cp ~/github-runners-stack.yml "$BACKUP_DIR/"
+cp ~/.env "$BACKUP_DIR/"
+chmod 600 "$BACKUP_DIR/.env"
+
+# Backup volumes (local volumes: run this on each node that hosts runners)
+docker run --rm -v github-runner-cache:/data -v "$BACKUP_DIR":/backup alpine tar czf /backup/runner-cache.tar.gz -C /data .
+docker run --rm -v github-runner-docker-cache:/data -v "$BACKUP_DIR":/backup alpine tar czf /backup/docker-cache.tar.gz -C /data .
+
+echo "Backup completed: $BACKUP_DIR"
+```
+
+---
+
+## Phase 7: Advanced Configuration
+
+### Step 16: Cache Optimization
+
+**Mount host cache directories:**
+```yaml
+volumes:
+  - /home/ubuntu/.cache/npm:/root/.npm
+  - /home/ubuntu/.cache/pip:/root/.cache/pip
+  - /home/ubuntu/.cache/go-build:/root/.cache/go-build
+  - /home/ubuntu/.cargo:/root/.cargo
+```
+
+**Pre-install common tools in custom image** (`Dockerfile.runner`):
+```dockerfile
+FROM myoung34/github-runner:latest
+
+# Install common build tools
+RUN apt-get update && apt-get install -y \
+    build-essential \
+    nodejs \
+    npm \
+    python3 \
+    python3-pip \
+    golang-go \
+    openjdk-17-jdk \
+    maven \
+    gradle \
+    && rm -rf /var/lib/apt/lists/*
+
+# Install Docker Compose (release assets use a lowercase OS name; -f fails the build on HTTP errors)
+RUN curl -fsSL "https://github.com/docker/compose/releases/latest/download/docker-compose-linux-$(uname -m)" \
+    -o /usr/local/bin/docker-compose && \
+    chmod +x /usr/local/bin/docker-compose
+
+# Images cannot be pre-pulled here: no Docker daemon is available during `docker build`.
+# Pull node:lts-alpine and python:3.11-slim on each Swarm node instead.
+```
+
+Build and use custom image:
+```bash
+docker build -t your-registry/github-runner:custom -f Dockerfile.runner .
+docker push your-registry/github-runner:custom
+
+# Update stack to use custom image
+```
+
+### Step 17: Autoscaling Configuration
+
+**Actions Runner Controller (ARC) provides autoscaling, but it is a Kubernetes operator and does not manage Swarm services.** The definition below is kept for reference only. On Swarm, change capacity with `docker service scale github-runners_runner-arm64=<N>`.
+
+```yaml
+# Reference only: ARC needs a Kubernetes cluster
+autoscaler:
+  image: ghcr.io/actions-runner-controller/actions-runner-controller:latest
+  environment:
+    - GITHUB_TOKEN=${GITHUB_TOKEN}
+    - GITHUB_APP_ID=${GITHUB_APP_ID}
+    - GITHUB_APP_INSTALLATION_ID=${GITHUB_APP_INSTALLATION_ID}
+    - GITHUB_APP_PRIVATE_KEY=/etc/gh-app-key/private-key.pem
+  volumes:
+    - /path/to/private-key.pem:/etc/gh-app-key/private-key.pem:ro
+    - /var/run/docker.sock:/var/run/docker.sock
+  deploy:
+    mode: replicated
+    replicas: 1
+    placement:
+      constraints:
+        - node.role == manager
+```
+
+### Step 18: Multi-Repository Setup
+
+For organization-level runners, update environment:
+```bash
+# For org-level
+RUNNER_SCOPE=org
+ORG_NAME=your-organization
+
+# Remove REPO_URL from the stack and pass ORG_NAME to the runner services instead
+```
+
+---
+
+## Phase 8: Troubleshooting Guide
+
+### Common Issues & Solutions
+
+**1. Runner shows "Offline" in GitHub:**
+```bash
+# Check logs
+docker service logs github-runners_runner-arm64
+
+# Common causes:
+# - Expired token (regenerate in GitHub settings)
+# - Network connectivity issue (run on the node hosting the task)
+CONTAINER=$(docker ps -q --filter name=github-runners_runner-arm64 | head -n 1)
+docker exec "$CONTAINER" curl -I https://github.com
+
+# Restart service
+docker service update --force github-runners_runner-arm64
+```
+
+**2. Docker not working inside the runner:**
+```bash
+# Runners use the host Docker daemon through the mounted socket
+# Check Docker socket is mounted (run on the node hosting the task)
+CONTAINER=$(docker ps -q --filter name=github-runners_runner-arm64 | head -n 1)
+docker exec "$CONTAINER" docker ps
+
+# If failing, check AppArmor/SELinux
+sudo aa-status | grep docker
+```
+
+**3. Jobs stuck in "Queued":**
+```bash
+# Check if runners are picking up jobs
+docker service ps github-runners_runner-arm64
+
+# Verify the configured labels match the workflow's runs-on list
+docker service inspect github-runners_runner-arm64 --format '{{json .Spec.TaskTemplate.ContainerSpec.Env}}' | jq
+```
+
+**4. Out of disk space:**
+```bash
+# Clean up Docker system (removes ALL unused images, containers, and volumes on this node)
+docker system prune -a --volumes
+
+# Clean runner cache (fails while a runner still uses the volume; scale the service to 0 first)
+docker volume rm github-runner-docker-cache
+docker volume create github-runner-docker-cache
+```
+
+---
+
+## Implementation Checklist
+
+### Phase 1: Planning
+- [ ] Determine which repositories need self-hosted runners
+- [ ] Decide on runner count per architecture
+- [ ] Generate GitHub Personal Access Token
+
+### Phase 2: Infrastructure
+- [ ] Create Docker network
+- [ ] Create persistent volumes
+- [ ] Verify node labels
+
+### Phase 3: Deployment
+- [ ] Create `.env` file with GitHub token
+- [ ] Create `github-runners-stack.yml`
+- [ ] Deploy stack to Docker Swarm
+- [ ] Verify runners appear in GitHub UI
+
+### Phase 4: Testing
+- [ ] Create test workflow
+- [ ] Run test on ARM64 runner
+- [ ] Run test on x86_64 runner
+- [ ] Verify Docker builds work
+- [ ] Test access to homelab services
+
+### Phase 5: Security
+- [ ] Enable ephemeral mode
+- [ ] Set resource limits
+- [ ] Review and restrict permissions
+- [ ] Set up network isolation
+
+### Phase 6: Operations
+- [ ] Create monitoring script
+- [ ] Set up log rotation
+- [ ] Create backup script
+- [ ] Document maintenance procedures
+
+---
+
+## Cost & Resource Analysis
+
+**Compared to GitHub-hosted runners:**
+
+| Feature | GitHub-Hosted | Your Self-Hosted |
+|---------|---------------|------------------|
+| Cost | $0.008/minute Linux | No per-minute charge (electricity only) |
+| Minutes | 2,000/month free | Unlimited |
+| ARM64 | Limited | Full control |
+| Concurrency | 20 jobs | One job per runner (5 in this plan) |
+| Network | Internet only | Your homelab access |
+
+**Your Infrastructure Cost:**
+- Existing hardware: $0 (already running)
+- Electricity: ~$10-20/month additional load
+- Time: Initial setup ~2-4 hours
+
+---
+
+## Next Steps
+
+1. **Review this plan** and decide on your specific use cases
+2. **Generate GitHub PAT** with the `repo` scope (add `admin:org` only for organization-level runners)
+3. **Start with Phase 1** - Planning
+4. **Deploy a single runner first** to test before scaling
+5. **Iterate** based on your workflow needs
\ No newline at end of file