GitHub Actions Self-Hosted Runners Deployment Plan

Overview

Goal: Deploy GitHub Actions self-hosted runners on your Docker Swarm cluster to run CI/CD workflows with unlimited minutes, custom environments, and access to your homelab resources.

Architecture: Docker-based runners deployed as Docker Swarm services, scaled by adjusting each service's replica count.


Architecture Decision

Option 1: Docker Container Runners (Recommended)

  • Runs in Docker containers on your existing cluster
  • Scales horizontally by adding/removing containers
  • Uses your existing infrastructure (tpi-n1, tpi-n2, node-nas)
  • Easy to manage through Docker Swarm
  • ARM64 and x86_64 support for multi-arch builds

Option 2: VM/Physical Runners (Alternative)

  • Runners installed directly on VMs or bare metal
  • More isolated but harder to manage
  • Not recommended for your containerized setup

Decision: Use Docker Container Runners (Option 1) with multi-arch support.


Deployment Architecture

GitHub Repository
        │
        │ push / workflow_dispatch events
        ▼
┌─────────────────────────────┐
│  GitHub Actions Service     │
└─────────────────────────────┘
        ▲
        │ Runners long-poll for jobs (outbound HTTPS)
        │
┌─────────────────────────────┐
│  Your Docker Swarm Cluster  │
│                             │
│  ┌───────────────────────┐  │
│  │  Runner Service       │  │
│  │  (multiple replicas)  │  │
│  │                       │  │
│  │  ┌───────┐ ┌────────┐ │  │
│  │  │ ARM64 │ │ x86_64 │ │  │
│  │  └───────┘ └────────┘ │  │
│  └───────────────────────┘  │
│                             │
│  ┌───────────────────────┐  │
│  │  Host Docker daemon   │  │
│  │  (via mounted socket) │  │
│  └───────────────────────┘  │
│                             │
└─────────────────────────────┘

Phase 1: Planning & Preparation

Step 1: Determine Requirements

Use Cases:

  • Build and test applications
  • Deploy to your homelab (Kubernetes/Docker Swarm)
  • Run ARM64 builds (for Raspberry Pi/ARM apps)
  • Run x86_64 builds (standard applications)
  • Access private network resources (databases, internal APIs)
  • Build Docker images and push to your Gitea registry

Resource Requirements per Runner:

  • CPU: 2+ cores recommended
  • Memory: 4GB+ RAM per runner
  • Disk: 20GB+ for workspace and Docker layers
  • Network: Outbound HTTPS to GitHub

Current Cluster Capacity:

  • tpi-n1: 8 cores ARM64, 8GB RAM (Manager)
  • tpi-n2: 8 cores ARM64, 8GB RAM (Worker)
  • node-nas: 2 cores x86_64, 8GB RAM (Storage)

Recommended Allocation:

  • 2 runners on tpi-n1 (ARM64)
  • 2 runners on tpi-n2 (ARM64)
  • 1 runner on node-nas (x86_64)
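To sanity-check this allocation against the 4GB-per-runner requirement, a short shell sketch can compute the memory left on each node. The node figures come from the capacity list above; the function names and the idea of keeping a minimum amount free for the OS and other Swarm services are assumptions for illustration.

```shell
#!/bin/sh
# Sketch: memory (GB) left on a node after reserving RAM for its runners.
# Usage: headroom_gb <node_ram_gb> <runner_count> <gb_per_runner>
# Integer GB only; a negative result means the node is oversubscribed.
headroom_gb() {
  echo $(( $1 - $2 * $3 ))
}

# Reads "node ram_gb runners" lines on stdin and flags nodes left with less
# than <min_free_gb>. Usage: check_allocation <gb_per_runner> <min_free_gb>
check_allocation() {
  while read -r node ram runners; do
    [ -n "$runners" ] || continue   # skip blank or incomplete lines
    free=$(headroom_gb "$ram" "$runners" "$1")
    if [ "$free" -lt "$2" ]; then
      echo "$node: ${free}GB left (WARN)"
    else
      echo "$node: ${free}GB left"
    fi
  done
}
```

With the plan's numbers (printf 'tpi-n1 8 2\ntpi-n2 8 2\nnode-nas 8 1\n' | check_allocation 4 2), both Turing Pi nodes are flagged with 0GB left, which suggests either one runner per ARM64 node or smaller per-runner memory limits.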

Step 2: GitHub Configuration

Choose Runner Level:

  • Repository-level - Dedicated to specific repo (recommended to start)
  • Organization-level - Shared across org repos
  • Enterprise-level - Shared across enterprise

For your use case: Start with repository-level runners, then expand to organization-level if needed.

Required GitHub Settings:

  1. Go to: Settings > Actions > Runners > New self-hosted runner
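  2. Note the Registration Token (expires after 1 hour; only needed for manual registration, because the runner image in Step 7 requests its own registration token using the PAT)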
  3. Note the Runner Group (default: "Default")
  4. Configure labels (e.g., homelab, arm64, x86_64, self-hosted)

Phase 2: Infrastructure Setup

Step 3: Create Docker Network

# On controller (tpi-n1)
ssh ubuntu@192.168.2.130

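# Create overlay network for runners. The stack in Step 7 declares its own network
# (created as github-runners_github-runners-network); to use this pre-created one
# instead, declare it with external: true in the stack file.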
docker network create --driver overlay --attachable github-runners-network

# Verify
docker network ls | grep github

Step 4: Create Persistent Storage

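# Create volume for runner cache (shared only by runners on the same node:
# local volumes are per node, and this command affects only this node)
docker volume create github-runner-cache

# Create volume for Docker build cache
docker volume create github-runner-docker-cache

# Note: docker stack deploy creates its own stack-prefixed copies of the volumes
# declared in the stack (github-runners_github-runner-cache, ...) on each node that
# runs a runner task, so these manual volumes are only used if you declare them
# with external: true.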

Step 5: Prepare Node Labels

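# Verify node labels (docker node ls cannot print labels, so inspect each node)
ssh ubuntu@192.168.2.130
docker node ls -q | xargs docker node inspect --format '{{.Description.Hostname}} {{.Spec.Labels}}'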

# Expected output:
# tpi-n1      map[infra:true role:storage storage:high]
# tpi-n2      map[role:compute]
# node-nas    map[type:nas]

# Add architecture labels if missing:
docker node update --label-add arch=arm64 tpi-n1
docker node update --label-add arch=arm64 tpi-n2
docker node update --label-add arch=x86_64 node-nas

Phase 3: Runner Deployment

Step 6: Create Environment File

Create a .env file (it contains a token, so keep it out of Git and run chmod 600 .env):

# GitHub Configuration
GITHUB_TOKEN=your_github_personal_access_token
GITHUB_OWNER=your-github-username-or-org
GITHUB_REPO=your-repository-name  # Leave empty for org-level

# Runner Configuration
RUNNER_NAME_PREFIX=homelab
RUNNER_LABELS=self-hosted,homelab,linux
RUNNER_GROUP=Default

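# Docker Configuration (not read by the stack below, which uses the host's
# Docker socket; only relevant for a docker:dind sidecar setup)
DOCKER_TLS_CERTDIR=/certs

# Optional: tools to bake into a custom image (Step 16). Nothing in the stack
# reads this variable; it is a reminder list only.
PRE_INSTALL_TOOLS="docker-compose,nodejs,npm,yarn,python3,pip,git"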
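Because docker stack deploy substitutes ${...} values from the shell environment, a missing or placeholder value silently turns into an empty or bogus setting. A small guard can run right after loading the file. This is a sketch: the function name is hypothetical, and it assumes the four variables below are the required ones (GITHUB_REPO is left out because it may be empty for org-level runners).

```shell
#!/bin/sh
# Sketch: report required .env variables that are unset, empty, or still
# hold a "your_..." / "your-..." placeholder. Returns non-zero if any are.
check_env() {
  missing=0
  for var in GITHUB_TOKEN GITHUB_OWNER RUNNER_NAME_PREFIX RUNNER_LABELS; do
    eval "val=\${$var:-}"          # indirect lookup of the variable's value
    case "$val" in
      ""|your_*|your-*)
        echo "missing or placeholder: $var" >&2
        missing=1
        ;;
    esac
  done
  return "$missing"
}
```

Usage: after set -a && source .env && set +a, deploy only if the check passes, e.g. check_env && docker stack deploy -c github-runners-stack.yml github-runners.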

Step 7: Create Docker Compose Stack

Create github-runners-stack.yml:

version: "3.8"

services:
  # ARM64 Runners
  runner-arm64:
    image: myoung34/github-runner:latest
    environment:
      - ACCESS_TOKEN=${GITHUB_TOKEN}
      - REPO_URL=https://github.com/${GITHUB_OWNER}/${GITHUB_REPO}
      - RUNNER_NAME=${RUNNER_NAME_PREFIX}-arm64-{{.Task.Slot}}
      - RUNNER_WORKDIR=/tmp/runner-work
      - RUNNER_GROUP=${RUNNER_GROUP:-Default}
      - RUNNER_SCOPE=repo
      - LABELS=${RUNNER_LABELS},arm64
      - DISABLE_AUTO_UPDATE=true
      - EPHEMERAL=true  # One job per container
    volumes:
      # Jobs use the host's Docker daemon through this socket (Docker-outside-of-Docker),
      # so image layers and build cache live in the host's own /var/lib/docker
      - /var/run/docker.sock:/var/run/docker.sock
      - github-runner-cache:/home/runner/cache
    networks:
      - github-runners-network
      - dokploy-network
    deploy:
      mode: replicated
      replicas: 2  # Swarm's spread strategy normally puts one on each ARM64 node
      placement:
        constraints:
          - node.labels.arch == arm64
      restart_policy:
        # Ephemeral runners exit after every job, so leave max_attempts unset;
        # with max_attempts: 3 Swarm would give up on a runner after a few jobs
        condition: any
        delay: 5s
    # No "privileged" key: Swarm services do not support it (docker stack deploy
    # ignores it), and the socket mount above does not need it

  # x86_64 Runners
  runner-x86_64:
    image: myoung34/github-runner:latest
    environment:
      - ACCESS_TOKEN=${GITHUB_TOKEN}
      - REPO_URL=https://github.com/${GITHUB_OWNER}/${GITHUB_REPO}
      - RUNNER_NAME=${RUNNER_NAME_PREFIX}-x86_64-{{.Task.Slot}}
      - RUNNER_WORKDIR=/tmp/runner-work
      - RUNNER_GROUP=${RUNNER_GROUP:-Default}
      - RUNNER_SCOPE=repo
      - LABELS=${RUNNER_LABELS},x86_64
      - DISABLE_AUTO_UPDATE=true
      - EPHEMERAL=true
    volumes:
      # Host Docker socket (Docker-outside-of-Docker); build cache stays on the host
      - /var/run/docker.sock:/var/run/docker.sock
      - github-runner-cache:/home/runner/cache
    networks:
      - github-runners-network
      - dokploy-network
    deploy:
      mode: replicated
      replicas: 1
      placement:
        constraints:
          - node.labels.arch == x86_64
      restart_policy:
        condition: any  # no max_attempts: ephemeral runners restart after every job
        delay: 5s
    # No "privileged" key (unsupported by Swarm services, not needed with the socket mount)

  # Autoscaling: Actions Runner Controller (ARC) is a Kubernetes operator and
  # cannot run as a service in this Swarm stack, so no autoscaler is defined.
  # Scale the runner services with "docker service scale" instead.

volumes:
  github-runner-cache:
  github-runner-docker-cache:

networks:
  github-runners-network:
    driver: overlay
  dokploy-network:
    external: true

Step 8: Deploy Runners

# Copy files to controller
scp github-runners-stack.yml ubuntu@192.168.2.130:~/
scp .env ubuntu@192.168.2.130:~/

# SSH to controller
ssh ubuntu@192.168.2.130

# Load environment
set -a && source .env && set +a

# Deploy stack
docker stack deploy -c github-runners-stack.yml github-runners

# Verify deployment
docker stack ps github-runners
docker service ls | grep github

Phase 4: GitHub Integration

Step 9: Verify Runners in GitHub

  1. Go to: https://github.com/[OWNER]/[REPO]/settings/actions/runners
  2. You should see your runners listed as "Idle"
  3. Labels should show your custom labels (homelab, linux, and arm64 or x86_64) plus the defaults GitHub adds automatically (self-hosted, Linux, and ARM64 or X64)

Step 10: Test with Sample Workflow

Create .github/workflows/test-self-hosted.yml:

name: Test Self-Hosted Runners

on:
  push:
    branches: [ main ]
  workflow_dispatch:

jobs:
  test-arm64:
    runs-on: [self-hosted, homelab, arm64]
    steps:
      - uses: actions/checkout@v4
      
      - name: Show runner info
        run: |
          echo "Architecture: $(uname -m)"
          echo "OS: $(uname -s)"
          echo "Node: $(hostname)"
          echo "CPU: $(nproc)"
          echo "Memory: $(free -h | grep Mem)"
      
      - name: Test Docker
        run: |
          docker --version
          docker info
          docker run --rm hello-world

  test-x86_64:
    runs-on: [self-hosted, homelab, x86_64]
    steps:
      - uses: actions/checkout@v4
      
      - name: Show runner info
        run: |
          echo "Architecture: $(uname -m)"
          echo "OS: $(uname -s)"
          echo "Node: $(hostname)"
      
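      - name: Test access to homelab
        run: |
          # Test connectivity to your services (-f makes HTTP errors count as failures)
          curl -fsS -o /dev/null http://gitea.bendtstudio.com:3000 && echo "Gitea OK" || echo "Gitea not accessible"
          curl -fsS -o /dev/null http://192.168.2.130:3000 && echo "Dokploy OK" || echo "Dokploy not accessible"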

Phase 5: Security Hardening

Step 11: Implement Security Best Practices

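1. Use Short-Lived, Narrowly Scoped Credentials:

# Prefer a GitHub App (installation tokens expire after one hour) or a
# fine-grained PAT limited to the runner's repository over a classic PAT.
# OIDC is for workflows authenticating to external services (cloud providers,
# registries); it is not a way to register runners.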

2. Restrict GITHUB_TOKEN Permissions:

# Add to workflow
jobs:
  build:
    runs-on: [self-hosted, homelab]
    permissions:
      contents: read
      packages: write  # Only for GitHub Packages (ghcr.io); pushes to Gitea use Gitea credentials

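3. Network Isolation:

# Runners must still reach github.com over HTTPS, so an internal network only
# works with a proxy attached to both this network and an external one, and with
# HTTPS_PROXY set for the runners. Keeping the runners on dokploy-network (or any
# other non-internal overlay) would reopen direct outbound access.
# Modify stack to use isolated network:
networks:
  github-runners-network:
    driver: overlay
    internal: true  # No external access except through proxy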

4. Resource Limits:

# Add under deploy: in each runner service (merge with the existing keys)
deploy:
  resources:
    limits:
      cpus: '2'
      memory: 4G
    reservations:
      cpus: '1'
      memory: 2G

Step 12: Enable Ephemeral Mode

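Ephemeral runners (already configured with EPHEMERAL=true) improve security:

  • Each runner registration handles exactly one job
  • The runner exits after the job, and Swarm starts a fresh container in its place
  • Each job gets a clean runner filesystem (the shared cache volume and the host Docker daemon still persist between jobs)
  • Reduces the risk of credentials leaking from one job to the next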

Phase 6: Monitoring & Maintenance

Step 13: Set Up Monitoring

Create monitoring script (monitor-runners.sh):

#!/bin/bash

# Check runner status
echo "=== Docker Service Status ==="
docker service ls | grep github-runner

echo -e "\n=== Runner Tasks (all nodes) ==="
docker stack ps github-runners --filter desired-state=running --format "table {{.Name}}\t{{.Node}}\t{{.CurrentState}}"

echo -e "\n=== Recent Logs ==="
docker service logs github-runners_runner-arm64 --tail 50
docker service logs github-runners_runner-x86_64 --tail 50

echo -e "\n=== Resource Usage (containers on this node only) ==="
docker stats --no-stream --format "table {{.Name}}\t{{.CPUPerc}}\t{{.MemUsage}}" | grep github-runner
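When the script runs from cron, a non-zero exit status is more useful than output to read by eye. The sketch below (function name hypothetical; stack name github-runners as deployed in Step 8) reads NAME REPLICAS lines, such as those printed by docker service ls --filter name=github-runners --format '{{.Name}} {{.Replicas}}', and reports any service running fewer tasks than desired.

```shell
#!/bin/sh
# Sketch: exit non-zero if any service has fewer running replicas than desired.
# Input lines look like "github-runners_runner-arm64 2/2"; any extra text after
# the replica column (e.g. "(max 1 per node)") is ignored.
replicas_ok() {
  rc=0
  while read -r name replicas _rest; do
    [ -n "$replicas" ] || continue   # skip blank lines
    running=${replicas%%/*}          # text before the slash
    desired=${replicas#*/}           # text after the slash
    if [ "$running" -lt "$desired" ]; then
      echo "DEGRADED: $name ($running/$desired)"
      rc=1
    fi
  done
  return "$rc"
}
```

Because ephemeral runners exit after every job, a service can briefly show fewer running replicas than desired between jobs, so it is safer to alert only when a service stays degraded across several checks.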

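Make the script executable, give the log file to the ubuntu user, and add a cron job:

chmod +x /home/ubuntu/github-runners/monitor-runners.sh
sudo touch /var/log/github-runners.log
sudo chown ubuntu:ubuntu /var/log/github-runners.log

# Add to crontab
crontab -e

# Check runner health every 5 minutes
*/5 * * * * /home/ubuntu/github-runners/monitor-runners.sh >> /var/log/github-runners.log 2>&1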

Step 14: Set Up Log Rotation

# Create logrotate config
sudo tee /etc/logrotate.d/github-runners << EOF
/var/log/github-runners.log {
    daily
    rotate 7
    compress
    delaycompress
    missingok
    notifempty
    create 644 ubuntu ubuntu
}
EOF

Step 15: Backup Strategy

#!/bin/bash
# Backup script: runner configuration and cache
set -euo pipefail

BACKUP_DIR="/backup/github-runners/$(date +%Y%m%d)"
mkdir -p "$BACKUP_DIR"

# Backup configuration (.env holds the GitHub token, so keep the copy private)
cp ~/github-runners-stack.yml "$BACKUP_DIR/"
cp ~/.env "$BACKUP_DIR/"
chmod 600 "$BACKUP_DIR/.env"

# Backup the runner cache volume. Stack-created volumes carry the stack-name
# prefix and are node-local, so this only captures this node's copy.
# The Docker build cache is disposable and is not backed up.
docker run --rm -v github-runners_github-runner-cache:/data -v "$BACKUP_DIR":/backup alpine tar czf /backup/runner-cache.tar.gz -C /data .

echo "Backup completed: $BACKUP_DIR"

Phase 7: Advanced Configuration

Step 16: Cache Optimization

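Mount host cache directories (add to each runner service's volumes list; the host directories must exist on every node, and the /root paths assume the runner runs as root, which is assumed to be the image's default):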

volumes:
  - /home/ubuntu/.cache/npm:/root/.npm
  - /home/ubuntu/.cache/pip:/root/.cache/pip
  - /home/ubuntu/.cache/go-build:/root/.cache/go-build
  - /home/ubuntu/.cargo:/root/.cargo

Pre-install common tools in custom image (Dockerfile.runner):

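FROM myoung34/github-runner:latest

# Install common build tools
RUN apt-get update && DEBIAN_FRONTEND=noninteractive apt-get install -y \
    build-essential \
    nodejs \
    npm \
    python3 \
    python3-pip \
    golang-go \
    openjdk-17-jdk \
    maven \
    gradle \
    && rm -rf /var/lib/apt/lists/*

# Install Docker Compose v2 standalone binary. Release assets use a lowercase
# "linux" prefix, and uname -m reports the architecture being built
# (aarch64 or x86_64).
RUN curl -fL "https://github.com/docker/compose/releases/latest/download/docker-compose-linux-$(uname -m)" \
    -o /usr/local/bin/docker-compose && \
    chmod +x /usr/local/bin/docker-compose

# Images cannot be pre-pulled here: no Docker daemon is available during
# docker build. Jobs use the host daemon, so pre-pull on each node instead:
#   docker pull node:lts-alpine
#   docker pull python:3.11-slim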

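Build a multi-arch image (tpi-n1/tpi-n2 are ARM64 and node-nas is x86_64, so a plain docker build only produces the builder's architecture) and push it:

# Requires a buildx builder that can target both platforms (e.g. with QEMU emulation)
docker buildx build --platform linux/arm64,linux/amd64 \
  -t your-registry/github-runner:custom -f Dockerfile.runner --push .

# Update stack to use custom image: set image: your-registry/github-runner:custom
# in both runner services, load .env as in Step 8, then redeploy
docker stack deploy -c github-runners-stack.yml github-runners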
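This plan spells the same two architectures three ways: node labels (arm64, x86_64), Docker platforms (linux/arm64, linux/amd64) and Compose v2 release assets (aarch64, x86_64, matching uname -m). A small mapping sketch keeps them straight; the function names are hypothetical, and the docker-compose-linux-<arch> asset naming is assumed from current Compose v2 releases.

```shell
#!/bin/sh
# Sketch: map an architecture name (uname -m or node label) to the value
# expected by docker buildx build --platform.
docker_platform() {
  case "$1" in
    aarch64|arm64) echo "linux/arm64" ;;
    x86_64|amd64)  echo "linux/amd64" ;;
    *) echo "unsupported arch: $1" >&2; return 1 ;;
  esac
}

# Sketch: map the same names to the Compose v2 standalone release asset.
compose_asset() {
  case "$1" in
    aarch64|arm64) echo "docker-compose-linux-aarch64" ;;
    x86_64|amd64)  echo "docker-compose-linux-x86_64" ;;
    *) echo "unsupported arch: $1" >&2; return 1 ;;
  esac
}
```

For example, --platform "$(docker_platform aarch64),$(docker_platform x86_64)" expands to the linux/arm64,linux/amd64 pair used in the buildx command above.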

Step 17: Autoscaling Configuration

Actions Runner Controller (ARC) is a Kubernetes operator: it is installed into a Kubernetes cluster (typically with Helm) and cannot run as a service in this Swarm stack. On Docker Swarm, scale the runner services directly:

# Scale ARM64 runners (placement constraints still apply)
docker service scale github-runners_runner-arm64=4

# Verify
docker service ls | grep github-runners

If the homelab later moves to Kubernetes, ARC with GitHub App authentication becomes the better autoscaling option.
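On Swarm, scaling means changing a runner service's replica count with docker service scale. A minimal sketch of the decision step, assuming you obtain a queued-job count yourself, clamps that count between a minimum and a maximum; the function name and the bounds are hypothetical.

```shell
#!/bin/sh
# Sketch: desired replica count for a runner service, clamped to [min, max].
# Usage: desired_replicas <queued_jobs> <min> <max>
desired_replicas() {
  want=$1
  if [ "$want" -lt "$2" ]; then want=$2; fi   # never drop below the floor
  if [ "$want" -gt "$3" ]; then want=$3; fi   # never exceed node capacity
  echo "$want"
}
```

One way to get an approximate backlog is gh api "repos/OWNER/REPO/actions/runs?status=queued" --jq '.total_count' (this counts queued workflow runs, not individual jobs), then docker service scale github-runners_runner-arm64="$(desired_replicas "$queued" 1 4)". Scaling down stops tasks even if they are in the middle of a job, so scale down conservatively.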

Step 18: Multi-Repository Setup

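For organization-level runners, change the scope variables in each runner service's environment. The myoung34/github-runner image registers against an organization when RUNNER_SCOPE=org and ORG_NAME are set, and the PAT needs the admin:org scope:

# For org-level (replaces RUNNER_SCOPE=repo in the stack)
RUNNER_SCOPE=org
ORG_NAME=your-organization

# Remove REPO_URL; it is only used for repository-level runners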
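The two scopes differ only in which registration variables the runner container receives. The sketch below prints them for either scope; the variable names are assumed from the myoung34/github-runner README (confirm them against the image version you deploy), and the function name is hypothetical.

```shell
#!/bin/sh
# Sketch: print the registration variables for a runner scope.
# Usage: registration_env <repo|org> <owner> [repo]
registration_env() {
  case "$1" in
    repo)
      [ -n "$3" ] || { echo "repo scope needs a repository name" >&2; return 1; }
      echo "RUNNER_SCOPE=repo"
      echo "REPO_URL=https://github.com/$2/$3"
      ;;
    org)
      echo "RUNNER_SCOPE=org"
      echo "ORG_NAME=$2"
      ;;
    *)
      echo "unknown scope: $1" >&2
      return 1
      ;;
  esac
}
```

The printed lines can be pasted into the environment list of each runner service in place of the repository-level entries.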

Phase 8: Troubleshooting Guide

Common Issues & Solutions

1. Runner shows "Offline" in GitHub:

# Check logs
docker service logs github-runners_runner-arm64

# Common causes:
# - Expired token (regenerate in GitHub settings)
# - Network connectivity issue
docker exec <container> curl -I https://github.com

# Restart service
docker service update --force github-runners_runner-arm64

2. Docker commands fail inside jobs:

# Jobs use the host daemon through the mounted socket (Swarm services do not
# run privileged), so check the socket is mounted and reachable
docker exec <container> docker ps

# If failing, check AppArmor/SELinux
sudo aa-status | grep docker

3. Jobs stuck in "Queued":

# Check if runners are picking up jobs
docker service ps github-runners_runner-arm64

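# Verify labels match: every label in the job's runs-on list must be on one runner
gh api repos/<owner>/<repo>/actions/runners --jq '.runners[] | {name, status, labels: [.labels[].name]}'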
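GitHub only assigns a job to a runner that carries every label in the job's runs-on list, and label comparison is case-insensitive. The sketch below (hypothetical helper) checks a job's comma-separated label list against a runner's label list, for example one copied from the runner's page in the repository settings.

```shell
#!/bin/sh
# Sketch: succeed if every label in <job_labels> is present in <runner_labels>.
# Both arguments are comma-separated; spaces are ignored and case is folded.
# Usage: labels_match <job_labels> <runner_labels>
labels_match() {
  runner=",$(printf '%s' "$2" | tr -d ' ' | tr '[:upper:]' '[:lower:]'),"
  old_ifs=$IFS
  IFS=,
  for label in $(printf '%s' "$1" | tr -d ' ' | tr '[:upper:]' '[:lower:]'); do
    case "$runner" in
      *",$label,"*) ;;   # label found on the runner
      *) IFS=$old_ifs; echo "no runner label: $label" >&2; return 1 ;;
    esac
  done
  IFS=$old_ifs
}
```

A job that targets [self-hosted, homelab, x86_64] will stay queued if only ARM64 runners are online, which is exactly the case this check makes visible.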

4. Out of disk space:

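# Clean up unused build cache and dangling images first
docker builder prune -f
docker image prune -f

# Heavier option: removes stopped containers, unused networks, ALL unused images,
# build cache and unused volumes on this node, including images other stacks may
# need to restart
docker system prune -a --volumes

# Clean runner cache: volumes are node-local and cannot be removed while in use,
# so scale the runners down first; Swarm recreates the volume when tasks restart
docker service scale github-runners_runner-arm64=0 github-runners_runner-x86_64=0
docker volume rm github-runners_github-runner-cache
docker service scale github-runners_runner-arm64=2 github-runners_runner-x86_64=1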

Implementation Checklist

Phase 1: Planning

  • Determine which repositories need self-hosted runners
  • Decide on runner count per architecture
  • Generate GitHub Personal Access Token

Phase 2: Infrastructure

  • Create Docker network
  • Create persistent volumes
  • Verify node labels

Phase 3: Deployment

  • Create .env file with GitHub token
  • Create github-runners-stack.yml
  • Deploy stack to Docker Swarm
  • Verify runners appear in GitHub UI

Phase 4: Testing

  • Create test workflow
  • Run test on ARM64 runner
  • Run test on x86_64 runner
  • Verify Docker builds work
  • Test access to homelab services

Phase 5: Security

  • Enable ephemeral mode
  • Set resource limits
  • Review and restrict permissions
  • Set up network isolation

Phase 6: Operations

  • Create monitoring script
  • Set up log rotation
  • Create backup script
  • Document maintenance procedures

Cost & Resource Analysis

Compared to GitHub-hosted runners:

Feature        GitHub-Hosted                        Your Self-Hosted
Cost           $0.008/minute (Linux)                Free (electricity)
Minutes        2,000/month free (private repos)     Unlimited
ARM64          Limited                              Full control
Concurrency    20 jobs (Free plan)                  Limited by your runner count
Network        Internet only                        Your homelab access
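The cost row can be turned into a monthly estimate. This sketch uses the table's figures (2,000 included minutes, then $0.008 per Linux minute) as defaults; GitHub's prices and included minutes change, so treat both as assumptions to update. The function name is hypothetical.

```shell
#!/bin/sh
# Sketch: estimated monthly cost of GitHub-hosted Linux minutes.
# Usage: hosted_cost <minutes> [free_minutes] [usd_per_minute]
hosted_cost() {
  awk -v m="$1" -v free="${2:-2000}" -v rate="${3:-0.008}" \
    'BEGIN {
       billable = m - free
       if (billable < 0) billable = 0   # usage within the free tier costs nothing
       printf "%.2f\n", billable * rate
     }'
}
```

For example, 5,000 Linux minutes in a month leave 3,000 billable minutes, or $24.00 with these figures.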

Your Infrastructure Cost:

  • Existing hardware: $0 (already running)
  • Electricity: ~$10-20/month additional load
  • Time: Initial setup ~2-4 hours

Next Steps

  1. Review this plan and decide on your specific use cases
  2. Generate a GitHub PAT with the repo scope (plus admin:org for organization-level runners)
  3. Start with Phase 1 - Planning
  4. Deploy a single runner first to test before scaling
  5. Iterate based on your workflow needs
