Add comprehensive GitHub Actions self-hosted runners deployment plan

2026-02-16 12:24:33 -05:00
parent 6c1eebd5e5
commit 0d710e82ee
1 changed files with 685 additions and 0 deletions
--- a/GITHUB_RUNNERS_PLAN.md
+++ b/GITHUB_RUNNERS_PLAN.md
@@ -0,0 +1,685 @@
+# GitHub Actions Self-Hosted Runners Deployment Plan
+
+## Overview
+
+**Goal:** Deploy GitHub Actions self-hosted runners on your Docker Swarm cluster to run CI/CD workflows with unlimited minutes, custom environments, and access to your homelab resources.
+
+**Architecture:** Docker-based runners deployed as Swarm services, scaled by adjusting each service's replica count (Swarm has no built-in autoscaler).
+
+---
+
+## Architecture Decision
+
+### Option 1: Docker Container Runners (Recommended for your setup)
+- ✅ Runs in Docker containers on your existing cluster
+- ✅ Scales horizontally by adding/removing containers
+- ✅ Uses your existing infrastructure (tpi-n1, tpi-n2, node-nas)
+- ✅ Easy to manage through Docker Swarm
+- ✅ ARM64 and x86_64 support for multi-arch builds
+
+### Option 2: VM/Physical Runners (Alternative)
+- Runners installed directly on VMs or bare metal
+- More isolated but harder to manage
+- Not recommended for your containerized setup
+
+**Decision:** Use Docker Container Runners (Option 1) with multi-arch support.
+
+---
+
+## Deployment Architecture
+
+```
+GitHub Repository
+        │
+        │ push / workflow_dispatch queues a job
+        ▼
+┌─────────────────────────────┐
+│  GitHub Actions Service     │
+└─────────────────────────────┘
+        ▲
+        │ Runners poll for jobs over
+        │ outbound HTTPS (no inbound ports)
+        │
+┌─────────────────────────────┐
+│  Your Docker Swarm Cluster  │
+│                             │
+│  ┌───────────────────────┐  │
+│  │  Runner Service       │  │
+│  │  (Multiple Replicas)  │  │
+│  │                       │  │
+│  │  ┌───────┐ ┌───────┐  │  │
+│  │  │ ARM64 │ │x86_64 │  │  │
+│  │  └───────┘ └───────┘  │  │
+│  └───────────────────────┘  │
+│                             │
+│  ┌───────────────────────┐  │
+│  │  Host Docker daemon   │  │
+│  │  (via docker.sock)    │  │
+│  └───────────────────────┘  │
+│                             │
+└─────────────────────────────┘
+```
+
+---
+
+## Phase 1: Planning & Preparation
+
+### Step 1: Determine Requirements
+
+**Use Cases:**
+- [ ] Build and test applications
+- [ ] Deploy to your homelab (Kubernetes/Docker Swarm)
+- [ ] Run ARM64 builds (for Raspberry Pi/ARM apps)
+- [ ] Run x86_64 builds (standard applications)
+- [ ] Access private network resources (databases, internal APIs)
+- [ ] Build Docker images and push to your Gitea registry
+
+**Resource Requirements per Runner:**
+- CPU: 2+ cores recommended
+- Memory: 4GB+ RAM per runner
+- Disk: 20GB+ for workspace and Docker layers
+- Network: Outbound HTTPS to GitHub
+
+**Current Cluster Capacity:**
+- tpi-n1: 8 cores ARM64, 8GB RAM (Manager)
+- tpi-n2: 8 cores ARM64, 8GB RAM (Worker)
+- node-nas: 2 cores x86_64, 8GB RAM (Storage)
+
+**Recommended Allocation:**
+- 2 runners on tpi-n1 (ARM64)
+- 2 runners on tpi-n2 (ARM64)
+- 1 runner on node-nas (x86_64)
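+
+You can sanity-check this allocation by dividing each node's capacity by the per-runner figures above (2 cores, 4GB). The `max_runners` helper below is a hypothetical sketch. The 2GB reserve for the OS and other services is an illustrative assumption, not a measured value.
+
+```bash
+#!/usr/bin/env bash
+# Sketch: how many runners fit on a node, given the per-runner requirements above.
+CORES_PER_RUNNER=2
+MEM_GB_PER_RUNNER=4
+
+# max_runners CORES RAM_GB [RESERVE_GB] -> prints the runner count allowed by
+# whichever resource runs out first (integer division rounds down)
+max_runners() {
+  local cores="$1" ram_gb="$2" reserve_gb="${3:-0}"
+  local by_cpu=$(( cores / CORES_PER_RUNNER ))
+  local by_mem=$(( (ram_gb - reserve_gb) / MEM_GB_PER_RUNNER ))
+  if (( by_mem < 0 )); then by_mem=0; fi
+  if (( by_cpu < by_mem )); then echo "$by_cpu"; else echo "$by_mem"; fi
+}
+
+# Cluster capacity from the table above: name cores ram_gb
+for node in "tpi-n1 8 8" "tpi-n2 8 8" "node-nas 2 8"; do
+  read -r name cores ram <<< "$node"
+  echo "$name: $(max_runners "$cores" "$ram" 0) runner(s), $(max_runners "$cores" "$ram" 2) with 2GB reserved"
+done
+```
+
+With no reserve, the helper reproduces the allocation above (2, 2, 1). Reserving 2GB on an 8GB node drops it to a single 4GB runner, so the ARM64 nodes have little memory headroom unless you reduce the per-runner memory.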
+
+### Step 2: GitHub Configuration
+
+**Choose Runner Level:**
+- [ ] **Repository-level** - Dedicated to specific repo (recommended to start)
+- [ ] **Organization-level** - Shared across org repos
+- [ ] **Enterprise-level** - Shared across enterprise
+
+**For your use case:** Start with **repository-level** runners, then expand to organization-level if needed.
+
+**Required GitHub Settings:**
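+1. Go to: `Settings > Actions > Runners > New self-hosted runner`
+2. Note the **Registration Token** (expires after 1 hour). You only need it for manual registration: the runner image in Step 7 requests its own token using the PAT in `ACCESS_TOKEN`.
+3. Note the **Runner Group** (default: "Default")
+4. Plan labels (e.g., `homelab`, `arm64`, `x86_64`, `self-hosted`); the stack sets them through the `LABELS` variable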
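+
+You can also request the registration token from the REST API. This is what runner images do when given a PAT. The sketch below assumes a PAT in `GITHUB_TOKEN` and the `GITHUB_OWNER`/`GITHUB_REPO` variables from Step 6. An empty repository selects the organization endpoint. Both function names are hypothetical.
+
+```bash
+#!/usr/bin/env bash
+# Sketch: request a runner registration token from the GitHub REST API.
+# Repo-level: POST /repos/{owner}/{repo}/actions/runners/registration-token
+# Org-level:  POST /orgs/{org}/actions/runners/registration-token
+
+# registration_token_url OWNER [REPO] -> prints the endpoint for that scope
+registration_token_url() {
+  local owner="$1" repo="${2:-}"
+  if [[ -n "$repo" ]]; then
+    echo "https://api.github.com/repos/${owner}/${repo}/actions/runners/registration-token"
+  else
+    echo "https://api.github.com/orgs/${owner}/actions/runners/registration-token"
+  fi
+}
+
+# fetch_registration_token: prints the JSON response ({"token": ..., "expires_at": ...})
+fetch_registration_token() {
+  curl -fsS -X POST \
+    -H "Accept: application/vnd.github+json" \
+    -H "Authorization: Bearer ${GITHUB_TOKEN:?set GITHUB_TOKEN}" \
+    "$(registration_token_url "${GITHUB_OWNER:?set GITHUB_OWNER}" "${GITHUB_REPO:-}")"
+}
+```
+
+Usage: `fetch_registration_token | jq -r .token` prints just the token. Like the one shown in the UI, it expires after an hour.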
+
+---
+
+## Phase 2: Infrastructure Setup
+
+### Step 3: Create Docker Network
+
+```bash
+# On controller (tpi-n1)
+ssh ubuntu@192.168.2.130
+
+# Create overlay network for runners
+docker network create --driver overlay --attachable github-runners-network
+
+# Verify
+docker network ls | grep github
+```
+
+### Step 4: Create Persistent Storage
+
+```bash
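+# Named volumes use the local driver, so each node keeps its own copy
+# and only runners on the same node share it. Docker creates a missing
+# named volume on first mount, so other nodes start with empty ones.
+
+# Create volume for runner cache
+docker volume create github-runner-cache
+
+# Create volume for Docker build cache
+docker volume create github-runner-docker-cache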
+```
+
+### Step 5: Prepare Node Labels
+
+```bash
+# Verify node labels
+ssh ubuntu@192.168.2.130
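+# `docker node ls --format` has no labels field, so read them with inspect
+docker node inspect --format '{{.Description.Hostname}} {{.Spec.Labels}}' $(docker node ls -q)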
+
+# Expected output:
+# tpi-n1      map[infra:true role:storage storage:high]
+# tpi-n2      map[role:compute]
+# node-nas    map[type:nas]
+
+# Add architecture labels if missing:
+docker node update --label-add arch=arm64 tpi-n1
+docker node update --label-add arch=arm64 tpi-n2
+docker node update --label-add arch=x86_64 node-nas
+```
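+
+Instead of labelling nodes by hand, you can derive the label from the architecture Docker already reports for each node (`.Description.Platform.Architecture`, e.g. `aarch64` or `x86_64`). This is a sketch that assumes it runs on a manager node. `arch_label` and `label_nodes` are hypothetical helper names.
+
+```bash
+#!/usr/bin/env bash
+# Sketch: set the `arch` node label from the architecture Docker reports.
+
+# arch_label MACHINE -> maps `uname -m` style names to the labels used in this plan
+arch_label() {
+  case "$1" in
+    aarch64|arm64) echo "arm64" ;;
+    x86_64|amd64)  echo "x86_64" ;;
+    *)             echo "unknown" ;;
+  esac
+}
+
+# label_nodes: labels every Swarm node, skipping unrecognised architectures
+label_nodes() {
+  local node machine label
+  for node in $(docker node ls -q); do
+    machine=$(docker node inspect --format '{{.Description.Platform.Architecture}}' "$node")
+    label=$(arch_label "$machine")
+    if [[ "$label" == unknown ]]; then
+      echo "skipping $node: unrecognised architecture '$machine'" >&2
+      continue
+    fi
+    docker node update --label-add "arch=$label" "$node"
+  done
+}
+```
+
+Swarm also exposes the architecture directly as the built-in `node.platform.arch` placement constraint (e.g. `node.platform.arch == aarch64`), which avoids custom labels entirely. The custom `arch` label keeps the shorter `arm64` name used throughout this plan.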
+
+---
+
+## Phase 3: Runner Deployment
+
+### Step 6: Create Environment File
+
+Create `.env` file:
+```bash
+# GitHub Configuration
+GITHUB_TOKEN=your_github_personal_access_token  # Classic PAT with `repo` scope
+GITHUB_OWNER=your-github-username-or-org
+GITHUB_REPO=your-repository-name  # Org-level runners need the Step 18 changes, not just an empty value
+
+# Runner Configuration
+RUNNER_NAME_PREFIX=homelab
+RUNNER_LABELS=self-hosted,homelab,linux
+RUNNER_GROUP=Default
+
+# Docker Configuration (only used by a docker:dind sidecar; not referenced by the stack below)
+DOCKER_TLS_CERTDIR=/certs
+
+# Not referenced by the stack below; bake tools into a custom image instead (Step 16)
+PRE_INSTALL_TOOLS="docker-compose,nodejs,npm,yarn,python3,pip,git"
+```
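+
+`docker stack deploy` replaces unset variables with empty strings. A missing token therefore only shows up later, as failing runner registrations. Below is a minimal pre-flight check. It assumes the variable names from the `.env` above, and `check_runner_env` is a hypothetical name.
+
+```bash
+#!/usr/bin/env bash
+# Sketch: fail fast if the variables the stack file substitutes are missing.
+
+# check_runner_env: prints each missing variable and returns 1 if any are unset
+check_runner_env() {
+  local missing=0 var
+  for var in GITHUB_TOKEN GITHUB_OWNER RUNNER_NAME_PREFIX RUNNER_LABELS; do
+    if [[ -z "${!var:-}" ]]; then   # indirect expansion: value of the variable named $var
+      echo "missing: $var" >&2
+      missing=1
+    fi
+  done
+  # The stack hard-codes RUNNER_SCOPE=repo, so a repository name is required too
+  if [[ -z "${GITHUB_REPO:-}" ]]; then
+    echo "missing: GITHUB_REPO (repo-scoped stack)" >&2
+    missing=1
+  fi
+  return "$missing"
+}
+```
+
+Usage: `set -a && source .env && set +a && check_runner_env && docker stack deploy -c github-runners-stack.yml github-runners`.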
+
+### Step 7: Create Docker Compose Stack
+
+Create `github-runners-stack.yml`:
+
+```yaml
+version: "3.8"
+
+services:
+  # ARM64 Runners
+  runner-arm64:
+    image: myoung34/github-runner:latest
+    environment:
+      - ACCESS_TOKEN=${GITHUB_TOKEN}
+      - REPO_URL=https://github.com/${GITHUB_OWNER}/${GITHUB_REPO}
+      - RUNNER_NAME=${RUNNER_NAME_PREFIX}-arm64-{{.Task.Slot}}
+      - RUNNER_WORKDIR=/tmp/runner-work
+      - RUNNER_GROUP=${RUNNER_GROUP:-Default}
+      - RUNNER_SCOPE=repo
+      - LABELS=${RUNNER_LABELS},arm64
+      - DISABLE_AUTO_UPDATE=true
+      - EPHEMERAL=true  # One job per container
+    volumes:
+      - /var/run/docker.sock:/var/run/docker.sock
+      - github-runner-cache:/home/runner/cache
+      - github-runner-docker-cache:/var/lib/docker
+    networks:
+      - github-runners-network
+      - dokploy-network
+    deploy:
+      mode: replicated
+      replicas: 2
+      placement:
+        constraints:
+          - node.labels.arch == arm64
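+      restart_policy:
+        condition: any  # Ephemeral runners exit after each job; Swarm starts a fresh one
+        delay: 5s
+        # No max_attempts: every job exit counts as a restart, so a cap
+        # would leave the slot empty after that many jobs
+    # No `privileged: true`: `docker stack deploy` ignores it, and the mounted
+    # docker.sock already gives jobs the host's Docker daemon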
+
+  # x86_64 Runners
+  runner-x86_64:
+    image: myoung34/github-runner:latest
+    environment:
+      - ACCESS_TOKEN=${GITHUB_TOKEN}
+      - REPO_URL=https://github.com/${GITHUB_OWNER}/${GITHUB_REPO}
+      - RUNNER_NAME=${RUNNER_NAME_PREFIX}-x86_64-{{.Task.Slot}}
+      - RUNNER_WORKDIR=/tmp/runner-work
+      - RUNNER_GROUP=${RUNNER_GROUP:-Default}
+      - RUNNER_SCOPE=repo
+      - LABELS=${RUNNER_LABELS},x86_64
+      - DISABLE_AUTO_UPDATE=true
+      - EPHEMERAL=true
+    volumes:
+      - /var/run/docker.sock:/var/run/docker.sock
+      - github-runner-cache:/home/runner/cache
+      - github-runner-docker-cache:/var/lib/docker
+    networks:
+      - github-runners-network
+      - dokploy-network
+    deploy:
+      mode: replicated
+      replicas: 1
+      placement:
+        constraints:
+          - node.labels.arch == x86_64
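+      restart_policy:
+        condition: any
+        delay: 5s
+        # No max_attempts: every job exit counts as a restart
+    # No `privileged: true`: ignored by `docker stack deploy`; the socket mount suffices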
+
+  # No autoscaler service: Actions Runner Controller (ARC) is a Kubernetes
+  # operator and does not run on Docker Swarm. Adjust replicas with
+  # `docker service scale` instead.
+
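+volumes:
+  # Created in Step 4; without `external`, the stack would create its own
+  # `github-runners_`-prefixed volumes instead
+  github-runner-cache:
+    external: true
+  github-runner-docker-cache:
+    external: true
+
+networks:
+  github-runners-network:
+    external: true  # Created in Step 3 with --attachable
+  dokploy-network:
+    external: true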
+```
+
+### Step 8: Deploy Runners
+
+```bash
+# Copy files to controller
+scp github-runners-stack.yml ubuntu@192.168.2.130:~/
+scp .env ubuntu@192.168.2.130:~/
+
+# SSH to controller
+ssh ubuntu@192.168.2.130
+
+# Load environment
+set -a && source .env && set +a
+
+# Deploy stack
+docker stack deploy -c github-runners-stack.yml github-runners
+
+# Verify deployment
+docker stack ps github-runners
+docker service ls | grep github
+```
+
+---
+
+## Phase 4: GitHub Integration
+
+### Step 9: Verify Runners in GitHub
+
+1. Go to: `https://github.com/[OWNER]/[REPO]/settings/actions/runners`
+2. You should see your runners listed as "Idle"
+3. Labels should show: `self-hosted`, `homelab`, `linux`, `arm64` or `x86_64`
+
+### Step 10: Test with Sample Workflow
+
+Create `.github/workflows/test-self-hosted.yml`:
+
+```yaml
+name: Test Self-Hosted Runners
+
+on:
+  push:
+    branches: [ main ]
+  workflow_dispatch:
+
+jobs:
+  test-arm64:
+    runs-on: [self-hosted, homelab, arm64]
+    steps:
+      - uses: actions/checkout@v4
+      
+      - name: Show runner info
+        run: |
+          echo "Architecture: $(uname -m)"
+          echo "OS: $(uname -s)"
+          echo "Node: $(hostname)"
+          echo "CPU: $(nproc)"
+          echo "Memory: $(free -h | grep Mem)"
+      
+      - name: Test Docker
+        run: |
+          docker --version
+          docker info
+          docker run --rm hello-world
+
+  test-x86_64:
+    runs-on: [self-hosted, homelab, x86_64]
+    steps:
+      - uses: actions/checkout@v4
+      
+      - name: Show runner info
+        run: |
+          echo "Architecture: $(uname -m)"
+          echo "OS: $(uname -s)"
+          echo "Node: $(hostname)"
+      
+      - name: Test access to homelab
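+        run: |
+          # Print only the HTTP status; fall back to a message if the connection fails
+          curl -s -o /dev/null -w "Gitea: %{http_code}\n" http://gitea.bendtstudio.com:3000 || echo "Gitea not accessible"
+          curl -s -o /dev/null -w "Dokploy: %{http_code}\n" http://192.168.2.130:3000 || echo "Dokploy not accessible"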
+```
+
+---
+
+## Phase 5: Security Hardening
+
+### Step 11: Implement Security Best Practices
+
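+GitHub recommends using self-hosted runners only with private repositories. A pull request from a fork can run arbitrary code on them, and with the Docker socket mounted, that code effectively has root on the node.
+
+**1. Use Short-Lived Tokens:**
+```bash
+# Prefer a GitHub App over a long-lived PAT: App installation tokens expire after 1 hour
+# If you keep a PAT, give it an expiration date and only the scopes it needs
+# OIDC gives workflows short-lived cloud credentials; it does not replace runner registration
+```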
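+
+While you still use a classic PAT, you can audit what it grants. GitHub returns a classic token's scopes in the `X-OAuth-Scopes` response header; fine-grained tokens do not send it. This is a sketch assuming the token is in `GITHUB_TOKEN`. `token_scopes` and `has_scope` are hypothetical names.
+
+```bash
+#!/usr/bin/env bash
+# Sketch: list a classic PAT's scopes and test for a specific one.
+
+# token_scopes: prints the X-OAuth-Scopes header value, e.g. "admin:org, repo"
+token_scopes() {
+  curl -fsS -o /dev/null -D - \
+    -H "Authorization: Bearer ${GITHUB_TOKEN:?set GITHUB_TOKEN}" \
+    https://api.github.com/user \
+    | tr -d '\r' | sed -n 's/^[Xx]-[Oo][Aa]uth-[Ss]copes: //p'
+}
+
+# has_scope SCOPES_HEADER SCOPE -> succeeds if SCOPE is in the comma-separated list
+has_scope() {
+  local list="$1" want="$2" s
+  local -a scopes=()
+  IFS=',' read -ra scopes <<< "$list"
+  for s in "${scopes[@]}"; do
+    s="${s// /}"   # drop the space GitHub puts after each comma
+    if [[ "$s" == "$want" ]]; then
+      return 0
+    fi
+  done
+  return 1
+}
+```
+
+Usage: `has_scope "$(token_scopes)" admin:org && echo "token can administer the organization"` flags a broader grant than repository-level runners need.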
+
+**2. Restrict `GITHUB_TOKEN` Permissions:**
+```yaml
+# Add to workflow; this limits the job's GITHUB_TOKEN, not what the runner host can do
+jobs:
+  build:
+    runs-on: [self-hosted, homelab]
+    permissions:
+      contents: read
+      packages: write  # Only if pushing to registry
+```
+
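+**3. Network Isolation:**
+
+Runners must reach GitHub over outbound HTTPS. An internal network therefore only works with an egress proxy that the runners reach through `HTTPS_PROXY`/`HTTP_PROXY`. The services must also drop `dokploy-network`, because any non-internal network attached to a container gives it a route out.
+```yaml
+# Stack-managed network; if the network is pre-created (Step 3),
+# pass --internal to `docker network create` instead
+networks:
+  github-runners-network:
+    driver: overlay
+    internal: true  # No outbound access except through a proxy on this network
+```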
+
+**4. Resource Limits:**
+```yaml
+# Add under each runner service's existing `deploy:` key in the stack
+deploy:
+  resources:
+    limits:
+      cpus: '2'
+      memory: 4G
+    reservations:
+      cpus: '1'
+      memory: 2G
+```
+
+### Step 12: Enable Ephemeral Mode
+
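+Ephemeral runners (already configured with `EPHEMERAL=true`) improve isolation between jobs:
+- Each runner registration handles only one job
+- The container exits after the job, and Swarm starts a fresh one
+- Each build starts from a clean runner filesystem
+- Credentials and files left inside the container do not carry over to the next job
+
+Ephemerality does not isolate jobs from the host. Images, containers, and volumes created through the mounted Docker socket persist across jobs, and so does anything written to the shared named volumes.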
+
+---
+
+## Phase 6: Monitoring & Maintenance
+
+### Step 13: Set Up Monitoring
+
+**Create monitoring script** (`monitor-runners.sh`):
+```bash
+#!/bin/bash
+
+# Check runner status
+echo "=== Docker Service Status ==="
+docker service ls | grep github-runner
+
+echo -e "\n=== Runner Containers (this node only) ==="
+docker ps --filter name=github-runner --format "table {{.Names}}\t{{.Status}}\t{{.Ports}}"
+
+echo -e "\n=== Recent Logs ==="
+docker service logs github-runners_runner-arm64 --tail 50
+docker service logs github-runners_runner-x86_64 --tail 50
+
+echo -e "\n=== Resource Usage (this node only) ==="
+docker stats --no-stream --format "table {{.Name}}\t{{.CPUPerc}}\t{{.MemUsage}}" | grep github-runner
+```
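+
+For the cron job, a check that exits non-zero is easier to alert on than a log of raw output. The sketch below flags services running fewer tasks than desired. It assumes the `github-runners` stack name; `under_replicated` and `check_runners` are hypothetical names.
+
+```bash
+#!/usr/bin/env bash
+# Sketch: report runner services with fewer running tasks than desired.
+
+# under_replicated: reads "NAME RUNNING/DESIRED" lines on stdin and prints
+# the services that are short of their desired count
+under_replicated() {
+  local name replicas running desired
+  while read -r name replicas; do
+    [[ -z "$name" ]] && continue
+    running="${replicas%%/*}"
+    desired="${replicas#*/}"
+    desired="${desired%% *}"   # drop suffixes such as "(max 1 per node)"
+    if (( running < desired )); then
+      echo "$name $running/$desired"
+    fi
+  done
+}
+
+# check_runners: prints a summary and returns 1 if any service is short
+check_runners() {
+  local bad
+  bad=$(docker service ls --filter name=github-runners \
+          --format '{{.Name}} {{.Replicas}}' | under_replicated)
+  if [[ -n "$bad" ]]; then
+    echo "UNHEALTHY: $bad"
+    return 1
+  fi
+  echo "all runner services at desired replicas"
+}
+```
+
+Ephemeral runners restart after every job, so a service can briefly show `1/2`. Alert only if the check fails on consecutive runs.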
+
+**Create cron job for monitoring:**
+```bash
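+# Make the script executable and create a log file the ubuntu user can write to
+chmod +x /home/ubuntu/github-runners/monitor-runners.sh
+sudo touch /var/log/github-runners.log
+sudo chown ubuntu:ubuntu /var/log/github-runners.log
+
+# Add to crontab
+crontab -e
+
+# Check runner health every 5 minutes
+*/5 * * * * /home/ubuntu/github-runners/monitor-runners.sh >> /var/log/github-runners.log 2>&1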
+```
+
+### Step 14: Set Up Log Rotation
+
+```bash
+# Create logrotate config
+sudo tee /etc/logrotate.d/github-runners << EOF
+/var/log/github-runners.log {
+    daily
+    rotate 7
+    compress
+    delaycompress
+    missingok
+    notifempty
+    create 644 ubuntu ubuntu
+}
+EOF
+```
+
+### Step 15: Backup Strategy
+
+```bash
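+#!/bin/bash
+# Backup script: saves the stack config and runner volumes to a dated directory
+set -euo pipefail
+BACKUP_DIR="/backup/github-runners/$(date +%Y%m%d)"
+mkdir -p "$BACKUP_DIR"
+
+# Backup configuration (.env holds the GitHub token, so keep the copy private)
+cp ~/github-runners-stack.yml "$BACKUP_DIR/"
+cp ~/.env "$BACKUP_DIR/"
+chmod 600 "$BACKUP_DIR/.env"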
+
+# Backup volumes
+docker run --rm -v github-runner-cache:/data -v "$BACKUP_DIR":/backup alpine tar czf /backup/runner-cache.tar.gz -C /data .
+docker run --rm -v github-runner-docker-cache:/data -v "$BACKUP_DIR":/backup alpine tar czf /backup/docker-cache.tar.gz -C /data .
+
+echo "Backup completed: $BACKUP_DIR"
+```
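+
+Daily backups accumulate, and each one copies the Docker cache volume. Below is a sketch of a retention step to run after the backup. It assumes the dated `YYYYMMDD` directory layout used above; those names sort chronologically as plain strings. `prune_backups` is a hypothetical name.
+
+```bash
+#!/usr/bin/env bash
+# Sketch: keep only the newest KEEP dated backup directories under ROOT.
+
+# prune_backups ROOT KEEP: deletes the oldest YYYYMMDD directories beyond KEEP
+prune_backups() {
+  local root="$1" keep="$2" d i
+  local -a dirs=()
+  # Date-stamped names sort chronologically as plain strings
+  while IFS= read -r d; do
+    dirs+=("$d")
+  done < <(find "$root" -mindepth 1 -maxdepth 1 -type d -name '[0-9]*' | sort)
+  local excess=$(( ${#dirs[@]} - keep ))
+  for (( i = 0; i < excess; i++ )); do
+    echo "removing ${dirs[i]}"
+    rm -rf -- "${dirs[i]}"
+  done
+}
+```
+
+Usage: `prune_backups /backup/github-runners 7` keeps the seven most recent backups.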
+
+---
+
+## Phase 7: Advanced Configuration
+
+### Step 16: Cache Optimization
+
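+**Mount host cache directories** (add these to each runner service's `volumes:` list; Swarm will not start a task whose bind-mount source path is missing on that node):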
+```yaml
+volumes:
+  - /home/ubuntu/.cache/npm:/root/.npm
+  - /home/ubuntu/.cache/pip:/root/.cache/pip
+  - /home/ubuntu/.cache/go-build:/root/.cache/go-build
+  - /home/ubuntu/.cargo:/root/.cargo
+```
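+
+These host directories must therefore exist on every node that may run a runner. The sketch below creates them under a given home directory. Running it over SSH on each node, as in the usage line, is an assumption about your access setup. `ensure_cache_dirs` is a hypothetical name.
+
+```bash
+#!/usr/bin/env bash
+# Sketch: create the host-side cache directories used as bind-mount sources.
+# The list mirrors the mounts above; adjust it if you add or remove mounts.
+CACHE_SUBDIRS=(.cache/npm .cache/pip .cache/go-build .cargo)
+
+# ensure_cache_dirs HOME_DIR: creates each cache directory if it is missing
+ensure_cache_dirs() {
+  local home="$1" sub
+  for sub in "${CACHE_SUBDIRS[@]}"; do
+    mkdir -p -- "$home/$sub"
+  done
+}
+```
+
+Usage on a remote node: `ssh ubuntu@<node> "$(declare -p CACHE_SUBDIRS; declare -f ensure_cache_dirs); ensure_cache_dirs /home/ubuntu"`. This ships the array and function definitions to the remote bash. It is safe to re-run.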
+
+**Pre-install common tools in custom image** (`Dockerfile.runner`):
+```dockerfile
+FROM myoung34/github-runner:latest
+
+# Install common build tools
+RUN apt-get update && apt-get install -y \
+    build-essential \
+    nodejs \
+    npm \
+    python3 \
+    python3-pip \
+    golang-go \
+    openjdk-17-jdk \
+    maven \
+    gradle \
+    && rm -rf /var/lib/apt/lists/*
+
+# Install Docker Compose
+RUN curl -fsSL "https://github.com/docker/compose/releases/latest/download/docker-compose-$(uname -s | tr '[:upper:]' '[:lower:]')-$(uname -m)" \
+    -o /usr/local/bin/docker-compose && \
+    chmod +x /usr/local/bin/docker-compose
+
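+# Note: `RUN docker pull` cannot work here because no Docker daemon is
+# available during `docker build`. Jobs use the host daemon through the
+# mounted socket, so pre-pull images on each node instead
+# (e.g. `docker pull node:lts-alpine` over SSH).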
+```
+
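+Build and push a multi-arch custom image (the cluster mixes ARM64 and x86_64 nodes):
+```bash
+# One-time: register QEMU emulators and create a multi-platform builder
+docker run --privileged --rm tonistiigi/binfmt --install all
+docker buildx create --name multiarch --use
+
+# Build both platforms and push them under one tag
+docker buildx build --platform linux/arm64,linux/amd64 \
+  -t your-registry/github-runner:custom -f Dockerfile.runner --push .
+
+# Update stack to use custom image (deploy with --with-registry-auth for a private registry)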
+```
+
+### Step 17: Autoscaling Configuration
+
+Actions Runner Controller (ARC) is GitHub's Kubernetes operator for autoscaling runners. It manages runner pods through the Kubernetes API and cannot run as a Docker Swarm service. On Swarm, scale the runner services by changing their replica counts:
+
+```bash
+# Scale ARM64 runners up for a busy period
+docker service scale github-runners_runner-arm64=4
+
+# Scale back down
+docker service scale github-runners_runner-arm64=2
+
+# Note: re-running `docker stack deploy` resets replicas to the values in the stack file
+```
+
+If the runners later move to Kubernetes, ARC is the supported way to autoscale them.
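+
+Because ARC targets Kubernetes, a Swarm-side alternative is a small polling script that sizes the runner service from GitHub's queue. This is a hypothetical sketch, not ARC. It counts queued workflow runs through the REST API (`GET /repos/{owner}/{repo}/actions/runs?status=queued`) and clamps the result between assumed minimum and maximum replica counts. It needs `curl`, `jq`, and the `.env` variables.
+
+```bash
+#!/usr/bin/env bash
+# Sketch: poll-based scaling for a Swarm runner service (not ARC).
+MIN_RUNNERS=1   # assumed floor
+MAX_RUNNERS=4   # assumed ceiling; keep within the node capacity from Phase 1
+
+# desired_replicas QUEUED MIN MAX -> QUEUED clamped to [MIN, MAX]
+desired_replicas() {
+  local n="$1" min="$2" max="$3"
+  if (( n < min )); then n="$min"; fi
+  if (( n > max )); then n="$max"; fi
+  echo "$n"
+}
+
+# queued_runs: number of queued workflow runs for the repository
+queued_runs() {
+  curl -fsS \
+    -H "Accept: application/vnd.github+json" \
+    -H "Authorization: Bearer ${GITHUB_TOKEN:?set GITHUB_TOKEN}" \
+    "https://api.github.com/repos/${GITHUB_OWNER}/${GITHUB_REPO}/actions/runs?status=queued&per_page=1" \
+    | jq '.total_count'
+}
+
+# scale_once [SERVICE]: one polling step; defaults to the ARM64 runners
+scale_once() {
+  local service="${1:-github-runners_runner-arm64}" n
+  n=$(desired_replicas "$(queued_runs)" "$MIN_RUNNERS" "$MAX_RUNNERS")
+  docker service scale --detach "${service}=${n}"
+}
+```
+
+Run `scale_once` from cron every minute or two. This sketch has two limitations. Queued runs include jobs meant for other runners or architectures. Scaling down can also stop a runner in the middle of a job, so a fuller version would check each runner's `busy` flag in the runners API before reducing replicas.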
+
+### Step 18: Multi-Repository Setup
+
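+For organization-level runners, change each runner service's `environment` in the stack. The `.env` values alone are not enough, because the stack hard-codes `RUNNER_SCOPE=repo` and `REPO_URL`:
+```bash
+# For org-level
+RUNNER_SCOPE=org
+ORG_NAME=your-organization
+
+# Remove REPO_URL; with RUNNER_SCOPE=org the image registers against ORG_NAME
+# The PAT also needs the admin:org scope
+```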
+
+---
+
+## Phase 8: Troubleshooting Guide
+
+### Common Issues & Solutions
+
+**1. Runner shows "Offline" in GitHub:**
+```bash
+# Check logs
+docker service logs github-runners_runner-arm64
+
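+# Common causes:
+# - Expired or revoked PAT in ACCESS_TOKEN (create a new one and redeploy)
+# - Network connectivity issue; run this on the node hosting the task
+#   (`docker service ps github-runners_runner-arm64` shows which node)
+docker exec <container> curl -I https://github.com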
+
+# Restart service
+docker service update --force github-runners_runner-arm64
+```
+
+**2. Docker-in-Docker not working:**
+```bash
+# Jobs use the host daemon through the mounted socket (Swarm ignores `privileged`)
+# Check that the socket is mounted and reachable from the runner
+docker exec <container> docker ps
+
+# If failing, check AppArmor/SELinux
+sudo aa-status | grep docker
+```
+
+**3. Jobs stuck in "Queued":**
+```bash
+# Check if runners are picking up jobs
+docker service ps github-runners_runner-arm64
+
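+# Verify labels match the workflow's runs-on list (every runs-on label must be present)
+curl -s -H "Authorization: Bearer $GITHUB_TOKEN" \
+  "https://api.github.com/repos/$GITHUB_OWNER/$GITHUB_REPO/actions/runners" \
+  | jq '.runners[] | {name, status, labels: [.labels[].name]}'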
+```
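+
+A self-hosted runner picks up a job only if it has every label in the job's `runs-on` list. Extra labels on the runner are fine, and label names are compared case-insensitively, which is why `linux` matches the default `Linux` label. This sketch checks a runner's label list against a workflow's `runs-on` list offline. `labels_match` is a hypothetical name.
+
+```bash
+#!/usr/bin/env bash
+# Sketch: would a runner with these labels pick up a job with this runs-on list?
+
+# labels_match RUNNER_LABELS_CSV RUNS_ON_CSV -> succeeds if every runs-on label is present
+labels_match() {
+  local runner_csv="${1,,}" wanted_csv="${2,,}"   # lowercase both sides
+  local -a have=() want=()
+  local w h found
+  IFS=',' read -ra have <<< "$runner_csv"
+  IFS=',' read -ra want <<< "$wanted_csv"
+  for w in "${want[@]}"; do
+    w="${w// /}"
+    [[ -z "$w" ]] && continue
+    found=0
+    for h in "${have[@]}"; do
+      if [[ "${h// /}" == "$w" ]]; then
+        found=1
+        break
+      fi
+    done
+    if (( found == 0 )); then
+      echo "missing label: $w" >&2
+      return 1
+    fi
+  done
+  return 0
+}
+```
+
+Usage: `labels_match "self-hosted,homelab,linux,arm64" "self-hosted,homelab,x86_64"` fails and reports `missing label: x86_64`, which is the situation that leaves a job stuck in "Queued".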
+
+**4. Out of disk space:**
+```bash
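+# Clean up Docker on the affected node; this removes ALL unused images,
+# containers, networks, and volumes there, not only the runners'
+docker system prune -a --volumes
+
+# Recreate the runner Docker cache; the volume must not be in use,
+# so scale down the services that mount it first
+docker service scale github-runners_runner-arm64=0
+docker volume rm github-runner-docker-cache
+docker volume create github-runner-docker-cache
+docker service scale github-runners_runner-arm64=2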
+```
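+
+Catching a filling disk early is cheaper than pruning after builds fail. This sketch assumes Docker's data lives under the default `/var/lib/docker`; the 85% threshold is an arbitrary example. `disk_usage_pct` and `check_disk` are hypothetical names.
+
+```bash
+#!/usr/bin/env bash
+# Sketch: warn before a node runs out of space for Docker layers.
+
+# disk_usage_pct PATH -> prints the used percentage of the filesystem holding PATH
+disk_usage_pct() {
+  # -P forces the POSIX one-line-per-filesystem format, so field 5 is "Capacity"
+  df -P "$1" | awk 'NR == 2 { sub("%", "", $5); print $5 }'
+}
+
+# check_disk [PATH] [THRESHOLD]: warns and fails when usage >= THRESHOLD percent
+check_disk() {
+  local path="${1:-/var/lib/docker}" threshold="${2:-85}" used
+  used=$(disk_usage_pct "$path")
+  if (( used >= threshold )); then
+    echo "WARNING: $path is ${used}% full; see 'docker system df'" >&2
+    return 1
+  fi
+  echo "$path: ${used}% used"
+}
+```
+
+Calling `check_disk` from the monitoring script makes low disk space visible before jobs start failing.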
+
+---
+
+## Implementation Checklist
+
+### Phase 1: Planning
+- [ ] Determine which repositories need self-hosted runners
+- [ ] Decide on runner count per architecture
+- [ ] Generate GitHub Personal Access Token
+
+### Phase 2: Infrastructure
+- [ ] Create Docker network
+- [ ] Create persistent volumes
+- [ ] Verify node labels
+
+### Phase 3: Deployment
+- [ ] Create `.env` file with GitHub token
+- [ ] Create `github-runners-stack.yml`
+- [ ] Deploy stack to Docker Swarm
+- [ ] Verify runners appear in GitHub UI
+
+### Phase 4: Testing
+- [ ] Create test workflow
+- [ ] Run test on ARM64 runner
+- [ ] Run test on x86_64 runner
+- [ ] Verify Docker builds work
+- [ ] Test access to homelab services
+
+### Phase 5: Security
+- [ ] Enable ephemeral mode
+- [ ] Set resource limits
+- [ ] Review and restrict permissions
+- [ ] Set up network isolation
+
+### Phase 6: Operations
+- [ ] Create monitoring script
+- [ ] Set up log rotation
+- [ ] Create backup script
+- [ ] Document maintenance procedures
+
+---
+
+## Cost & Resource Analysis
+
+**Compared to GitHub-hosted runners:**
+
+| Feature | GitHub-Hosted | Your Self-Hosted |
+|---------|---------------|------------------|
+| Cost | $0.008/minute (Linux, 2-core) | Free (electricity only) |
+| Minutes | 2,000/month free (Free plan, private repos) | Unlimited |
+| ARM64 | Limited | Full control |
+| Concurrency | 20 jobs (Free plan) | Limited by runner count (5 planned) |
+| Network | Internet only | Your homelab access |
+
+**Your Infrastructure Cost:**
+- Existing hardware: $0 (already running)
+- Electricity: ~$10-20/month additional load
+- Time: Initial setup ~2-4 hours
+
+---
+
+## Next Steps
+
+1. **Review this plan** and decide on your specific use cases
+2. **Generate GitHub PAT** with the `repo` scope (add `admin:org` only for organization-level runners)
+3. **Start with Phase 1** - Planning
+4. **Deploy a single runner first** to test before scaling
+5. **Iterate** based on your workflow needs
+