Add comprehensive GitHub Actions self-hosted runners deployment plan

Bendt
2026-02-16 12:24:33 -05:00
parent 6c1eebd5e5
commit 0d710e82ee

GITHUB_RUNNERS_PLAN.md Normal file

@@ -0,0 +1,685 @@
# GitHub Actions Self-Hosted Runners Deployment Plan
## Overview
**Goal:** Deploy GitHub Actions self-hosted runners on your Docker Swarm cluster to run CI/CD workflows with unlimited minutes, custom environments, and access to your homelab resources.
**Architecture:** Docker-based runners deployed as a Swarm service with auto-scaling capabilities.
---
## Architecture Decision
### Option 1: Docker Container Runners (Recommended for your setup)
- ✅ Runs in Docker containers on your existing cluster
- ✅ Scales horizontally by adding/removing containers
- ✅ Uses your existing infrastructure (tpi-n1, tpi-n2, node-nas)
- ✅ Easy to manage through Docker Swarm
- ✅ ARM64 and x86_64 support for multi-arch builds
### Option 2: VM/Physical Runners (Alternative)
- Runners installed directly on VMs or bare metal
- More isolated but harder to manage
- Not recommended for your containerized setup
**Decision:** Use Docker Container Runners (Option 1) with multi-arch support.
---
## Deployment Architecture
```
       GitHub Repository
               │ Webhook/REST API
               ▼
┌─────────────────────────────┐
│   GitHub Actions Service    │
└─────────────────────────────┘
               │ Job Request
               ▼
┌─────────────────────────────┐
│  Your Docker Swarm Cluster  │
│                             │
│   ┌─────────────────────┐   │
│   │   Runner Service    │   │
│   │ (Multiple Replicas) │   │
│   │                     │   │
│   │  ┌─────┐  ┌──────┐  │   │
│   │  │ARM64│  │x86_64│  │   │
│   │  └─────┘  └──────┘  │   │
│   └─────────────────────┘   │
│                             │
│   ┌─────────────────────┐   │
│   │  Docker-in-Docker   │   │
│   │ (for Docker builds) │   │
│   └─────────────────────┘   │
│                             │
└─────────────────────────────┘
```
---
## Phase 1: Planning & Preparation
### Step 1: Determine Requirements
**Use Cases:**
- [ ] Build and test applications
- [ ] Deploy to your homelab (Kubernetes/Docker Swarm)
- [ ] Run ARM64 builds (for Raspberry Pi/ARM apps)
- [ ] Run x86_64 builds (standard applications)
- [ ] Access private network resources (databases, internal APIs)
- [ ] Build Docker images and push to your Gitea registry
**Resource Requirements per Runner:**
- CPU: 2+ cores recommended
- Memory: 4GB+ RAM per runner
- Disk: 20GB+ for workspace and Docker layers
- Network: Outbound HTTPS to GitHub
**Current Cluster Capacity:**
- tpi-n1: 8 cores ARM64, 8GB RAM (Manager)
- tpi-n2: 8 cores ARM64, 8GB RAM (Worker)
- node-nas: 2 cores x86_64, 8GB RAM (Storage)
**Recommended Allocation:**
- 2 runners on tpi-n1 (ARM64)
- 2 runners on tpi-n2 (ARM64)
- 1 runner on node-nas (x86_64)
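The allocation can be checked against the per-runner memory requirement with simple arithmetic. The following is a minimal sketch that assumes the 4 GB per-runner figure listed above; `fits_on_node` is a hypothetical helper written for this plan, not part of any tool.

```bash
#!/usr/bin/env bash
# Sanity-check the planned runner allocation against node memory.
# fits_on_node is a hypothetical helper for this plan only.

# Usage: fits_on_node <node_ram_gb> <runners> <gb_per_runner>
# Prints the memory left over (GB); returns non-zero when oversubscribed.
fits_on_node() {
  local ram="$1" runners="$2" per_runner="$3"
  local left=$(( ram - runners * per_runner ))
  echo "$left"
  [ "$left" -ge 0 ]
}

# Planned allocation from this document: node, RAM (GB), runner count.
while read -r node ram runners; do
  if left=$(fits_on_node "$ram" "$runners" 4); then
    echo "$node: ${left} GB left for the OS and other services"
  else
    echo "$node: oversubscribed by $(( -left )) GB"
  fi
done <<'EOF'
tpi-n1 8 2
tpi-n2 8 2
node-nas 8 1
EOF
```

With these numbers the two ARM64 nodes end up with 0 GB of headroom, so it may be worth lowering the per-runner memory limit or running a single runner on the manager node.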
### Step 2: GitHub Configuration
**Choose Runner Level:**
- [ ] **Repository-level** - Dedicated to specific repo (recommended to start)
- [ ] **Organization-level** - Shared across org repos
- [ ] **Enterprise-level** - Shared across enterprise
**For your use case:** Start with **repository-level** runners, then expand to organization-level if needed.
**Required GitHub Settings:**
1. Go to: `Settings > Actions > Runners > New self-hosted runner`
2. Note the **Registration Token** (expires after 1 hour)
3. Note the **Runner Group** (default: "Default")
4. Configure labels (e.g., `homelab`, `arm64`, `x86_64`, `self-hosted`)
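The registration token from step 2 can also be requested through the GitHub REST API instead of the web UI. The sketch below assumes a token with sufficient rights is exported as `GITHUB_TOKEN`; the function names are hypothetical. The runner image used later in this plan is given a PAT and, as far as its documentation describes, requests this token on its own, so this is mainly useful for manual checks.

```bash
#!/usr/bin/env bash
# Build the REST endpoint that issues a runner registration token.
# Usage: registration_token_url repo <owner> <repo>
#        registration_token_url org  <org>
registration_token_url() {
  local scope="$1"
  case "$scope" in
    repo) echo "https://api.github.com/repos/$2/$3/actions/runners/registration-token" ;;
    org)  echo "https://api.github.com/orgs/$2/actions/runners/registration-token" ;;
    *)    echo "unknown scope: $scope" >&2; return 1 ;;
  esac
}

# Request a token; prints the raw JSON response (fields: token, expires_at).
# Expects GITHUB_TOKEN in the environment. Not called automatically.
request_registration_token() {
  curl -fsS -X POST \
    -H "Accept: application/vnd.github+json" \
    -H "Authorization: Bearer ${GITHUB_TOKEN}" \
    "$(registration_token_url "$@")"
}
```

Example call for a repository-level runner: `request_registration_token repo "$GITHUB_OWNER" "$GITHUB_REPO"`.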
---
## Phase 2: Infrastructure Setup
### Step 3: Create Docker Network
```bash
# On controller (tpi-n1)
ssh ubuntu@192.168.2.130
# Create overlay network for runners
docker network create --driver overlay --attachable github-runners-network
# Verify
docker network ls | grep github
```
### Step 4: Create Persistent Storage
```bash
# Create volume for runner cache (shared across runners)
docker volume create github-runner-cache
# Create volume for Docker build cache
docker volume create github-runner-docker-cache
```
### Step 5: Prepare Node Labels
```bash
# Verify node labels
ssh ubuntu@192.168.2.130
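# `docker node ls` has no label column, so read the labels with `docker node inspect`
docker node ls -q | xargs docker node inspect --format '{{.Description.Hostname}} {{.Spec.Labels}}'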
# Expected output:
# tpi-n1 map[infra:true role:storage storage:high]
# tpi-n2 map[role:compute]
# node-nas map[type:nas]
# Add architecture labels if missing:
docker node update --label-add arch=arm64 tpi-n1
docker node update --label-add arch=arm64 tpi-n2
docker node update --label-add arch=x86_64 node-nas
```
---
## Phase 3: Runner Deployment
### Step 6: Create Environment File
Create `.env` file:
```bash
# GitHub Configuration
GITHUB_TOKEN=your_github_personal_access_token
GITHUB_OWNER=your-github-username-or-org
GITHUB_REPO=your-repository-name # Leave empty for org-level
# Runner Configuration
RUNNER_NAME_PREFIX=homelab
RUNNER_LABELS=self-hosted,homelab,linux
RUNNER_GROUP=Default
# Docker Configuration
DOCKER_TLS_CERTDIR=/certs
# Optional: tools to bake into a custom image (not referenced by the stack file below; see Step 16)
PRE_INSTALL_TOOLS="docker-compose,nodejs,npm,yarn,python3,pip,git"
```
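Because `docker stack deploy` only sees exported variables, it can help to check the environment before deploying. This is a minimal sketch; `missing_vars` and `check_env` are hypothetical helpers, and the list of required names is an assumption taken from the variables the stack file in this plan interpolates.

```bash
#!/usr/bin/env bash
# Report which required variables are missing from the exported environment.
# Usage: missing_vars VAR1 VAR2 ...   (prints one missing name per line)
missing_vars() {
  local name
  for name in "$@"; do
    # printenv only sees exported variables, which is what child processes get
    [ -n "$(printenv "$name")" ] || echo "$name"
  done
}

# Variables interpolated by the stack file (GITHUB_REPO applies to repo-level runners).
REQUIRED_VARS="GITHUB_TOKEN GITHUB_OWNER GITHUB_REPO RUNNER_NAME_PREFIX RUNNER_LABELS"

check_env() {
  local missing
  missing=$(missing_vars $REQUIRED_VARS | tr '\n' ' ')
  if [ -n "$missing" ]; then
    echo "Missing or empty: $missing" >&2
    return 1
  fi
  echo "Environment looks complete"
}
```

Typical use: `set -a && source .env && set +a && check_env` before deploying the stack.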
### Step 7: Create Docker Compose Stack
Create `github-runners-stack.yml`:
```yaml
version: "3.8"
services:
  # ARM64 Runners
  runner-arm64:
    image: myoung34/github-runner:latest
    environment:
      - ACCESS_TOKEN=${GITHUB_TOKEN}
      - REPO_URL=https://github.com/${GITHUB_OWNER}/${GITHUB_REPO}
      - RUNNER_NAME=${RUNNER_NAME_PREFIX}-arm64-{{.Task.Slot}}
      - RUNNER_WORKDIR=/tmp/runner-work
      - RUNNER_GROUP=${RUNNER_GROUP:-Default}
      - RUNNER_SCOPE=repo
      - LABELS=${RUNNER_LABELS},arm64
      - DISABLE_AUTO_UPDATE=true
      - EPHEMERAL=true # One job per container
    volumes:
      - /var/run/docker.sock:/var/run/docker.sock
      - github-runner-cache:/home/runner/cache
      - github-runner-docker-cache:/var/lib/docker
    networks:
      - github-runners-network
      - dokploy-network
    deploy:
      mode: replicated
      replicas: 2
      placement:
        constraints:
          - node.labels.arch == arm64
      restart_policy:
        condition: any
        delay: 5s
        # Caution: ephemeral runners exit after every job; verify that
        # max_attempts does not stop them from being restarted
        max_attempts: 3
    # Not honored by `docker stack deploy` in Swarm mode (reported as an
    # unsupported option); Docker access comes from the host socket mounted above
    privileged: true
  # x86_64 Runners
  runner-x86_64:
    image: myoung34/github-runner:latest
    environment:
      - ACCESS_TOKEN=${GITHUB_TOKEN}
      - REPO_URL=https://github.com/${GITHUB_OWNER}/${GITHUB_REPO}
      - RUNNER_NAME=${RUNNER_NAME_PREFIX}-x86_64-{{.Task.Slot}}
      - RUNNER_WORKDIR=/tmp/runner-work
      - RUNNER_GROUP=${RUNNER_GROUP:-Default}
      - RUNNER_SCOPE=repo
      - LABELS=${RUNNER_LABELS},x86_64
      - DISABLE_AUTO_UPDATE=true
      - EPHEMERAL=true
    volumes:
      - /var/run/docker.sock:/var/run/docker.sock
      - github-runner-cache:/home/runner/cache
      - github-runner-docker-cache:/var/lib/docker
    networks:
      - github-runners-network
      - dokploy-network
    deploy:
      mode: replicated
      replicas: 1
      placement:
        constraints:
          - node.labels.arch == x86_64
      restart_policy:
        condition: any
        delay: 5s
        max_attempts: 3 # Same caution as for the ARM64 service
    privileged: true # Same caveat as for the ARM64 service
  # Optional: Runner Autoscaler
  # Note: actions-runner-controller is a Kubernetes operator; treat this
  # service as a placeholder and leave it out unless it is known to work on Swarm
  autoscaler:
    image: ghcr.io/actions-runner-controller/actions-runner-controller:latest
    environment:
      - GITHUB_TOKEN=${GITHUB_TOKEN}
      - RUNNER_SCOPE=repo
    volumes:
      - /var/run/docker.sock:/var/run/docker.sock
    networks:
      - github-runners-network
    deploy:
      mode: replicated
      replicas: 1
      placement:
        constraints:
          - node.role == manager
volumes:
  # Named volumes are local to each Swarm node, not shared across nodes
  github-runner-cache:
  github-runner-docker-cache:
networks:
  github-runners-network:
    driver: overlay
  dokploy-network:
    external: true
```
### Step 8: Deploy Runners
```bash
# Copy files to controller
scp github-runners-stack.yml ubuntu@192.168.2.130:~/
scp .env ubuntu@192.168.2.130:~/
# SSH to controller
ssh ubuntu@192.168.2.130
# Load environment
set -a && source .env && set +a
# Deploy stack
docker stack deploy -c github-runners-stack.yml github-runners
# Verify deployment
docker stack ps github-runners
docker service ls | grep github
```
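The verification step can be scripted so that a deploy fails visibly when replicas never come up. A minimal sketch, assuming the stack name `github-runners` used above; the function names, the 120-second default timeout and the 5-second poll interval are illustrative choices.

```bash
#!/usr/bin/env bash
# Returns 0 when a "running/desired" replica string such as "2/2" is converged.
replicas_converged() {
  local running="${1%%/*}"
  local desired="${1#*/}"
  desired="${desired%% *}"   # drop suffixes such as " (max 1 per node)"
  [ "$desired" -gt 0 ] && [ "$running" -eq "$desired" ]
}

# Reads "name running/desired" lines on stdin.
# Returns 0 only if at least one service was seen and all are converged.
all_converged() {
  local name replicas seen=0 pending=0
  while read -r name replicas; do
    seen=1
    replicas_converged "$replicas" || { echo "waiting: $name $replicas"; pending=1; }
  done
  [ "$seen" -eq 1 ] && [ "$pending" -eq 0 ]
}

# Poll the stack until every service is converged or the timeout expires.
# Usage: wait_for_stack <stack> [timeout_seconds]
wait_for_stack() {
  local stack="$1" timeout="${2:-120}" waited=0
  while [ "$waited" -lt "$timeout" ]; do
    if docker stack services "$stack" --format '{{.Name}} {{.Replicas}}' | all_converged; then
      return 0
    fi
    sleep 5
    waited=$(( waited + 5 ))
  done
  return 1
}
```

Example: `wait_for_stack github-runners 180 || docker stack ps github-runners --no-trunc`. Ephemeral runners restart after every job, so a replica count can dip briefly while a job finishes.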
---
## Phase 4: GitHub Integration
### Step 9: Verify Runners in GitHub
1. Go to: `https://github.com/[OWNER]/[REPO]/settings/actions/runners`
2. You should see your runners listed as "Idle"
3. Labels should show: `self-hosted`, `homelab`, `linux`, `arm64` or `x86_64`
### Step 10: Test with Sample Workflow
Create `.github/workflows/test-self-hosted.yml`:
```yaml
name: Test Self-Hosted Runners
on:
  push:
    branches: [ main ]
  workflow_dispatch:
jobs:
  test-arm64:
    runs-on: [self-hosted, homelab, arm64]
    steps:
      - uses: actions/checkout@v4
      - name: Show runner info
        run: |
          echo "Architecture: $(uname -m)"
          echo "OS: $(uname -s)"
          echo "Node: $(hostname)"
          echo "CPU: $(nproc)"
          echo "Memory: $(free -h | grep Mem)"
      - name: Test Docker
        run: |
          docker --version
          docker info
          docker run --rm hello-world
  test-x86_64:
    runs-on: [self-hosted, homelab, x86_64]
    steps:
      - uses: actions/checkout@v4
      - name: Show runner info
        run: |
          echo "Architecture: $(uname -m)"
          echo "OS: $(uname -s)"
          echo "Node: $(hostname)"
      - name: Test access to homelab
        run: |
          # Test connectivity to your services
          # (-f makes HTTP errors count as failures; --max-time avoids hanging)
          curl -fs --max-time 10 http://gitea.bendtstudio.com:3000 || echo "Gitea not accessible"
          curl -fs --max-time 10 http://192.168.2.130:3000 || echo "Dokploy not accessible"
```
---
## Phase 5: Security Hardening
### Step 11: Implement Security Best Practices
**1. Use Short-Lived Tokens:**
```bash
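# Prefer a GitHub App (short-lived installation tokens) over a long-lived PAT for runner registration
# OpenID Connect (OIDC) covers workflows authenticating to external services; it does not register runners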
```
**2. Restrict Runner Permissions:**
```yaml
# Add to workflow
jobs:
  build:
    runs-on: [self-hosted, homelab]
    permissions:
      contents: read
      packages: write # Only if pushing to registry
```
**3. Network Isolation:**
```yaml
# Modify stack to use isolated network
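networks:
  github-runners-network:
    driver: overlay
    # Runners still need outbound HTTPS to GitHub, so provide a proxy
    # or a second, non-internal network before enabling this
    internal: true # No external access except through proxy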
```
**4. Resource Limits:**
```yaml
# Add to service definition in stack
deploy:
  resources:
    limits:
      cpus: '2'
      memory: 4G
    reservations:
      cpus: '1'
      memory: 2G
```
### Step 12: Enable Ephemeral Mode
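Ephemeral runners (already configured with `EPHEMERAL=true`) provide better security:
- Each runner handles only one job
- The runner deregisters and its container exits after the job; Swarm's restart policy then starts a fresh one
- Fresh runner environment for every build
- Reduces credential leakage between jobs (the mounted host Docker socket and the shared cache volumes still persist across jobs)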
---
## Phase 6: Monitoring & Maintenance
### Step 13: Set Up Monitoring
**Create monitoring script** (`monitor-runners.sh`):
```bash
#!/bin/bash
# Check runner status
echo "=== Docker Service Status ==="
docker service ls | grep github-runner
echo -e "\n=== Runner Containers ==="
docker ps --filter name=github-runner --format "table {{.Names}}\t{{.Status}}\t{{.Ports}}"
echo -e "\n=== Recent Logs ==="
docker service logs github-runners_runner-arm64 --tail 50
docker service logs github-runners_runner-x86_64 --tail 50
echo -e "\n=== Resource Usage ==="
docker stats --no-stream --format "table {{.Name}}\t{{.CPUPerc}}\t{{.MemUsage}}" | grep github-runner
```
**Create cron job for monitoring:**
```bash
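# Create the log file once so the ubuntu user's cron job can write to it
sudo touch /var/log/github-runners.log && sudo chown ubuntu:ubuntu /var/log/github-runners.log
# Add to crontab
crontab -e
# Check runner health every 5 minutes
*/5 * * * * /home/ubuntu/github-runners/monitor-runners.sh >> /var/log/github-runners.log 2>&1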
```
### Step 14: Set Up Log Rotation
```bash
# Create logrotate config
sudo tee /etc/logrotate.d/github-runners << EOF
/var/log/github-runners.log {
daily
rotate 7
compress
delaycompress
missingok
notifempty
create 644 ubuntu ubuntu
}
EOF
```
### Step 15: Backup Strategy
```bash
#!/bin/bash
# Backup script (the shebang must be the first line of the file)
BACKUP_DIR="/backup/github-runners/$(date +%Y%m%d)"
mkdir -p "$BACKUP_DIR"
# Backup configuration
cp ~/github-runners-stack.yml "$BACKUP_DIR/"
cp ~/.env "$BACKUP_DIR/"
# Backup volumes
docker run --rm -v github-runner-cache:/data -v "$BACKUP_DIR":/backup alpine tar czf /backup/runner-cache.tar.gz -C /data .
docker run --rm -v github-runner-docker-cache:/data -v "$BACKUP_DIR":/backup alpine tar czf /backup/docker-cache.tar.gz -C /data .
echo "Backup completed: $BACKUP_DIR"
```
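The backup script creates one dated directory per run, so old backups accumulate unless something prunes them. The sketch below keeps only the newest few; the helper names and the default retention of 7 (chosen to mirror the log rotation above) are assumptions, not part of the original plan.

```bash
#!/usr/bin/env bash
# Reads backup directory names (YYYYMMDD) on stdin and prints those that fall
# outside the newest <keep> entries. Date-named directories sort chronologically.
expired_backups() {
  local keep="$1"
  sort -r | awk -v keep="$keep" 'NR > keep'
}

# Delete expired backup directories under <root>, keeping the newest <keep>.
# Only names made of exactly eight digits are considered.
# Usage: prune_backups <root> [keep]
prune_backups() {
  local root="$1" keep="${2:-7}" name
  ls -1 "$root" | grep -E '^[0-9]{8}$' | expired_backups "$keep" |
    while read -r name; do
      echo "removing $root/$name"
      rm -rf -- "${root:?}/$name"
    done
}
```

Example: `prune_backups /backup/github-runners 7`, run after the backup script completes.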
---
## Phase 7: Advanced Configuration
### Step 16: Cache Optimization
**Mount host cache directories:**
```yaml
volumes:
  - /home/ubuntu/.cache/npm:/root/.npm
  - /home/ubuntu/.cache/pip:/root/.cache/pip
  - /home/ubuntu/.cache/go-build:/root/.cache/go-build
  - /home/ubuntu/.cargo:/root/.cargo
```
**Pre-install common tools in custom image** (`Dockerfile.runner`):
```dockerfile
FROM myoung34/github-runner:latest
# Install common build tools
RUN apt-get update && apt-get install -y \
    build-essential \
    nodejs \
    npm \
    python3 \
    python3-pip \
    golang-go \
    openjdk-17-jdk \
    maven \
    gradle \
    && rm -rf /var/lib/apt/lists/*
# Install Docker Compose (release assets use lowercase OS names, e.g. docker-compose-linux-x86_64)
RUN curl -fsSL "https://github.com/docker/compose/releases/latest/download/docker-compose-$(uname -s | tr '[:upper:]' '[:lower:]')-$(uname -m)" \
    -o /usr/local/bin/docker-compose && \
    chmod +x /usr/local/bin/docker-compose
# Common images cannot be pre-pulled here: no Docker daemon is available during
# `docker build`. Pull them on each Swarm node instead, since jobs use the host socket:
#   docker pull node:lts-alpine
#   docker pull python:3.11-slim
```
Build and use custom image:
```bash
docker build -t your-registry/github-runner:custom -f Dockerfile.runner .
docker push your-registry/github-runner:custom
# Update stack to use custom image
```
### Step 17: Autoscaling Configuration
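**Actions Runner Controller (ARC) provides Kubernetes-style autoscaling, but it is a Kubernetes operator and is not designed to run as a Swarm service.** Treat the definition below as a placeholder. On this cluster, scale manually (for example `docker service scale github-runners_runner-arm64=3`) until a Swarm-compatible autoscaler is chosen: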
```yaml
# Add to stack
autoscaler:
  image: ghcr.io/actions-runner-controller/actions-runner-controller:latest
  environment:
    - GITHUB_TOKEN=${GITHUB_TOKEN}
    - GITHUB_APP_ID=${GITHUB_APP_ID}
    - GITHUB_APP_INSTALLATION_ID=${GITHUB_APP_INSTALLATION_ID}
    - GITHUB_APP_PRIVATE_KEY=/etc/gh-app-key/private-key.pem
  volumes:
    - /path/to/private-key.pem:/etc/gh-app-key/private-key.pem:ro
    - /var/run/docker.sock:/var/run/docker.sock
  deploy:
    mode: replicated
    replicas: 1
    placement:
      constraints:
        - node.role == manager
```
### Step 18: Multi-Repository Setup
For organization-level runners, update environment:
```bash
# For org-level
RUNNER_SCOPE=org
ORG_NAME=your-organization
# Remove REPO_URL from the service environment; with RUNNER_SCOPE=org the
# myoung34/github-runner image is expected to register against ORG_NAME
```
---
## Phase 8: Troubleshooting Guide
### Common Issues & Solutions
**1. Runner shows "Offline" in GitHub:**
```bash
# Check logs
docker service logs github-runners_runner-arm64
# Common causes:
# - Expired token (regenerate in GitHub settings)
# - Network connectivity issue
docker exec <container> curl -I https://github.com
# Restart service
docker service update --force github-runners_runner-arm64
```
**2. Docker-in-Docker not working:**
```bash
# Ensure privileged mode is enabled
# Check Docker socket is mounted
docker exec <container> docker ps
# If failing, check AppArmor/SELinux
sudo aa-status | grep docker
```
**3. Jobs stuck in "Queued":**
```bash
# Check if runners are picking up jobs
docker service ps github-runners_runner-arm64
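# Verify the labels passed to the runner match the workflow's runs-on list
docker service inspect github-runners_runner-arm64 --format '{{json .Spec.TaskTemplate.ContainerSpec.Env}}' | jq .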
```
**4. Out of disk space:**
```bash
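# Clean up Docker system
# (destructive: removes all unused images, stopped containers, networks and unused volumes on this node)
docker system prune -a --volumes
# Clean runner cache
# (the volume must not be in use: scale the runner services to 0 first)
docker volume rm github-runner-docker-cache
docker volume create github-runner-docker-cache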
```
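Disk pressure is easier to handle when it is noticed before jobs fail. The following sketch could be added to the monitoring script; the helper names, the default path `/var/lib/docker` and the 80% threshold are assumptions chosen for illustration.

```bash
#!/usr/bin/env bash
# Extract the usage percentage (without the % sign) from `df -P` output on stdin.
parse_df_pct() {
  awk 'NR == 2 { sub(/%/, "", $5); print $5 }'
}

# Return 0 when usage is at or above the limit.
over_threshold() {
  local used="$1" limit="$2"
  [ "$used" -ge "$limit" ]
}

# Usage: check_disk [path] [limit_percent]
# Returns 1 above the limit, 2 when usage could not be determined.
check_disk() {
  local path="${1:-/var/lib/docker}" limit="${2:-80}" used
  used=$(df -P "$path" 2>/dev/null | parse_df_pct)
  [ -n "$used" ] || { echo "cannot read disk usage for $path" >&2; return 2; }
  if over_threshold "$used" "$limit"; then
    echo "WARN: $path at ${used}% (limit ${limit}%)"
    return 1
  fi
  echo "OK: $path at ${used}%"
}
```

Calling `check_disk /var/lib/docker 80` from the cron-driven monitor writes a warning line to the log once the filesystem crosses the limit.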
---
## Implementation Checklist
### Phase 1: Planning
- [ ] Determine which repositories need self-hosted runners
- [ ] Decide on runner count per architecture
- [ ] Generate GitHub Personal Access Token
### Phase 2: Infrastructure
- [ ] Create Docker network
- [ ] Create persistent volumes
- [ ] Verify node labels
### Phase 3: Deployment
- [ ] Create `.env` file with GitHub token
- [ ] Create `github-runners-stack.yml`
- [ ] Deploy stack to Docker Swarm
- [ ] Verify runners appear in GitHub UI
### Phase 4: Testing
- [ ] Create test workflow
- [ ] Run test on ARM64 runner
- [ ] Run test on x86_64 runner
- [ ] Verify Docker builds work
- [ ] Test access to homelab services
### Phase 5: Security
- [ ] Enable ephemeral mode
- [ ] Set resource limits
- [ ] Review and restrict permissions
- [ ] Set up network isolation
### Phase 6: Operations
- [ ] Create monitoring script
- [ ] Set up log rotation
- [ ] Create backup script
- [ ] Document maintenance procedures
---
## Cost & Resource Analysis
**Compared to GitHub-hosted runners:**
| Feature | GitHub Hosted | Your Self-Hosted |
|---------|---------------|------------------|
| Cost | $0.008/minute Linux | No per-minute charge (electricity only) |
| Minutes | 2,000/month free | No quota |
| ARM64 | Limited | Full control |
| Concurrency | 20 jobs | Limited by runner count (5 in this plan) |
| Network | Internet only | Your homelab access |
**Your Infrastructure Cost:**
- Existing hardware: $0 (already running)
- Electricity: ~$10-20/month additional load
- Time: Initial setup ~2-4 hours
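The electricity estimate can be converted into the number of paid hosted-runner minutes that would cost the same. A minimal sketch using only the figures from the table above ($0.008 per minute, $10-20 per month); `break_even_minutes` is a hypothetical helper.

```bash
#!/usr/bin/env bash
# Minutes of paid hosted-runner time that cost as much as the monthly
# electricity estimate: minutes = electricity_usd / rate_per_minute.
# Usage: break_even_minutes <electricity_usd> <rate_usd_per_minute>
break_even_minutes() {
  awk -v cost="$1" -v rate="$2" 'BEGIN { printf "%d\n", cost / rate + 0.5 }'
}

for cost in 10 20; do
  echo "\$${cost}/month electricity = $(break_even_minutes "$cost" 0.008) paid minutes"
done
```

Under these figures the break-even point is 1,250 to 2,500 paid minutes per month, counted on top of the 2,000 free minutes, so self-hosting covers its electricity cost at roughly 3,250 to 4,500 minutes of monthly CI usage.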
---
## Next Steps
1. **Review this plan** and decide on your specific use cases
2. **Generate GitHub PAT** with the `repo` scope (add `admin:org` only for organization-level runners)
3. **Start with Phase 1** - Planning
4. **Deploy a single runner first** to test before scaling
5. **Iterate** based on your workflow needs