Bendt 0d710e82ee Add comprehensive GitHub Actions self-hosted runners deployment plan

2026-02-16 12:24:33 -05:00

GitHub Actions Self-Hosted Runners Deployment Plan

Overview

Goal: Deploy GitHub Actions self-hosted runners on your Docker Swarm cluster to run CI/CD workflows with unlimited minutes, custom environments, and access to your homelab resources.

Architecture: Docker-based runners deployed as Swarm services. Swarm has no built-in autoscaler, so scaling means changing the replica count, either by hand or from a script.

Architecture Decision

Option 1: Docker Container Runners (Recommended for your setup)

✅ Runs in Docker containers on your existing cluster
✅ Scales horizontally by adding/removing containers
✅ Uses your existing infrastructure (tpi-n1, tpi-n2, node-nas)
✅ Easy to manage through Docker Swarm
✅ ARM64 and x86_64 support for multi-arch builds

Option 2: VM/Physical Runners (Alternative)

Runners installed directly on VMs or bare metal
More isolated but harder to manage
Not recommended for your containerized setup

Decision: Use Docker Container Runners (Option 1) with multi-arch support.

Deployment Architecture

GitHub Repository
        │
        │ Webhook/REST API
        ▼
┌─────────────────────────────┐
│  GitHub Actions Service     │
└─────────────────────────────┘
        │
        │ Job Request
        ▼
┌─────────────────────────────┐
│  Your Docker Swarm Cluster  │
│                             │
│  ┌─────────────────────┐   │
│  │  Runner Service     │   │
│  │  (Multiple Replicas)│   │
│  │                     │   │
│  │  ┌─────┐ ┌─────┐   │   │
│  │  │ ARM │ │x86_64│   │   │
│  │  │64   │ │     │   │   │
│  │  └─────┘ └─────┘   │   │
│  └─────────────────────┘   │
│                             │
│  ┌─────────────────────┐   │
│  │  Docker-in-Docker   │   │
│  │  (for Docker builds)│   │
│  └─────────────────────┘   │
│                             │
└─────────────────────────────┘

Phase 1: Planning & Preparation

Step 1: Determine Requirements

Use Cases:

Build and test applications
Deploy to your homelab (Kubernetes/Docker Swarm)
Run ARM64 builds (for Raspberry Pi/ARM apps)
Run x86_64 builds (standard applications)
Access private network resources (databases, internal APIs)
Build Docker images and push to your Gitea registry

Resource Requirements per Runner:

CPU: 2+ cores recommended
Memory: 4GB+ RAM per runner
Disk: 20GB+ for workspace and Docker layers
Network: Outbound HTTPS to GitHub

Current Cluster Capacity:

tpi-n1: 8 cores ARM64, 8GB RAM (Manager)
tpi-n2: 8 cores ARM64, 8GB RAM (Worker)
node-nas: 2 cores x86_64, 8GB RAM (Storage)

Recommended Allocation:

2 runners on tpi-n1 (ARM64)
2 runners on tpi-n2 (ARM64)
1 runner on node-nas (x86_64)
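
The allocation above can be sanity-checked against node memory. The following is a minimal sketch, assuming the 4 GB-per-runner figure and the node sizes listed earlier in this step; the function name `headroom_gb` is hypothetical.

```shell
# Hypothetical helper: memory left on a node after placing runners.
# Usage: headroom_gb <node_ram_gb> <runner_count> <gb_per_runner>
headroom_gb() {
  echo $(( $1 - $2 * $3 ))
}

# Node sizes and runner counts are taken from the plan above.
for spec in "tpi-n1 8 2" "tpi-n2 8 2" "node-nas 8 1"; do
  set -- $spec   # split "name ram runners" into $1 $2 $3
  echo "$1: $(headroom_gb "$2" "$3" 4) GB left for the OS and other services"
done
```

Two 4 GB runners leave no headroom on an 8 GB node, so it may be worth either running one runner per ARM64 node or capping each runner below 4 GB with the resource limits from Step 11.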

Step 2: GitHub Configuration

Choose Runner Level:

Repository-level - Dedicated to specific repo (recommended to start)
Organization-level - Shared across org repos
Enterprise-level - Shared across enterprise

For your use case: Start with repository-level runners, then expand to organization-level if needed.

Required GitHub Settings:

Go to: Settings > Actions > Runners > New self-hosted runner
Note the Registration Token (expires after 1 hour)
Note the Runner Group (default: "Default")
Configure labels (e.g., homelab, arm64, x86_64, self-hosted)
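
The registration token can also be requested from the GitHub REST API instead of the settings page. This is a sketch only: the function names are hypothetical, it assumes `curl` is installed and a PAT is exported as GITHUB_TOKEN, and the runner image used in Step 7 appears to request this token itself when given a PAT, so the call is mainly useful for registering a runner by hand.

```shell
# Build the REST endpoint that issues a registration token.
# Pass an empty repo name for an organization-level runner.
registration_token_url() {
  if [ -n "${2:-}" ]; then
    echo "https://api.github.com/repos/$1/$2/actions/runners/registration-token"
  else
    echo "https://api.github.com/orgs/$1/actions/runners/registration-token"
  fi
}

# Request a token (not called here; needs network access and GITHUB_TOKEN).
# The JSON response carries the token and its expiry time.
request_registration_token() {
  curl -fsS -X POST \
    -H "Accept: application/vnd.github+json" \
    -H "Authorization: Bearer $GITHUB_TOKEN" \
    "$(registration_token_url "$1" "${2:-}")"
}
```

Example call: `request_registration_token your-github-username-or-org your-repository-name`.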

Phase 2: Infrastructure Setup

Step 3: Create Docker Network

# On controller (tpi-n1)
ssh ubuntu@192.168.2.130

# Create overlay network for runners
docker network create --driver overlay --attachable github-runners-network

# Verify
docker network ls | grep github

Step 4: Create Persistent Storage

# Create volume for runner cache (named volumes are local to a node, so only runners on the same node share it)
docker volume create github-runner-cache

# Create volume for Docker build cache
docker volume create github-runner-docker-cache

Step 5: Prepare Node Labels

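# Verify node labels (docker node ls has no Labels field, so inspect each node instead)
ssh ubuntu@192.168.2.130
docker node inspect --format '{{.Description.Hostname}} {{.Spec.Labels}}' $(docker node ls -q)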

# Expected output:
# tpi-n1      map[infra:true role:storage storage:high]
# tpi-n2      map[role:compute]
# node-nas    map[type:nas]

# Add architecture labels if missing:
docker node update --label-add arch=arm64 tpi-n1
docker node update --label-add arch=arm64 tpi-n2
docker node update --label-add arch=x86_64 node-nas
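
The arch labels in this plan do not always match what the kernel reports: 64-bit ARM Linux prints `aarch64` from `uname -m`, while the label used here is `arm64`. A small sketch of the mapping, with a hypothetical function name, that can be run on any node to confirm which label it should carry:

```shell
# Map `uname -m` output to the arch label used in this plan.
arch_label() {
  case "$1" in
    aarch64|arm64) echo arm64 ;;
    x86_64|amd64)  echo x86_64 ;;
    *)             echo unknown ;;
  esac
}

# Print the label this machine should carry.
echo "arch=$(arch_label "$(uname -m)")"
```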

Phase 3: Runner Deployment

Step 6: Create Environment File

Create .env file:

# GitHub Configuration
GITHUB_TOKEN=your_github_personal_access_token
GITHUB_OWNER=your-github-username-or-org
GITHUB_REPO=your-repository-name  # Leave empty for org-level

# Runner Configuration
RUNNER_NAME_PREFIX=homelab
RUNNER_LABELS=self-hosted,homelab,linux
RUNNER_GROUP=Default

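# Docker Configuration
# (not referenced by the stack in Step 7; DOCKER_TLS_CERTDIR matters only to a separate Docker-in-Docker daemon)
DOCKER_TLS_CERTDIR=/certs

# Optional: Pre-installed tools
# (not referenced by the stack in Step 7; kept as a checklist for the custom image in Step 16)
PRE_INSTALL_TOOLS="docker-compose,nodejs,npm,yarn,python3,pip,git"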
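
Because `docker stack deploy` substitutes empty strings for unset variables without failing, it can help to check the environment before deploying. A minimal sketch with hypothetical function names; the list of required variables is an assumption based on what the stack file in Step 7 references.

```shell
# Print the names of variables that are empty or unset (space-separated).
# Prints an empty line when everything is set.
missing_vars() {
  missing=""
  for name in "$@"; do
    # Indirect lookup; the names are fixed strings, so eval is safe here.
    eval "value=\${$name:-}"
    [ -n "$value" ] || missing="$missing $name"
  done
  echo "${missing# }"
}

# Returns non-zero when a variable needed by the stack is missing.
check_env() {
  missing=$(missing_vars GITHUB_TOKEN GITHUB_OWNER RUNNER_NAME_PREFIX RUNNER_LABELS)
  if [ -n "$missing" ]; then
    echo "Missing in .env: $missing" >&2
    return 1
  fi
}
```

Run `check_env` after loading the file with `set -a && source .env && set +a` and before `docker stack deploy`.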

Step 7: Create Docker Compose Stack

Create github-runners-stack.yml:

version: "3.8"

services:
  # ARM64 Runners
  runner-arm64:
    image: myoung34/github-runner:latest
    environment:
      - ACCESS_TOKEN=${GITHUB_TOKEN}
      - REPO_URL=https://github.com/${GITHUB_OWNER}/${GITHUB_REPO}
      - RUNNER_NAME=${RUNNER_NAME_PREFIX}-arm64-{{.Task.Slot}}
      - RUNNER_WORKDIR=/tmp/runner-work
      - RUNNER_GROUP=${RUNNER_GROUP:-Default}
      - RUNNER_SCOPE=repo
      - LABELS=${RUNNER_LABELS},arm64
      - DISABLE_AUTO_UPDATE=true
      - EPHEMERAL=true  # One job per container
    volumes:
      - /var/run/docker.sock:/var/run/docker.sock
      - github-runner-cache:/home/runner/cache
      - github-runner-docker-cache:/var/lib/docker
    networks:
      - github-runners-network
      - dokploy-network
    deploy:
      mode: replicated
      replicas: 2  # Total across both ARM64 nodes; set 4 for two runners per node
      placement:
        constraints:
          - node.labels.arch == arm64
      restart_policy:
        condition: any
        delay: 5s
        max_attempts: 3
    privileged: true  # Ignored by docker stack deploy (Swarm services do not support it); builds use the host daemon through the mounted socket

  # x86_64 Runners
  runner-x86_64:
    image: myoung34/github-runner:latest
    environment:
      - ACCESS_TOKEN=${GITHUB_TOKEN}
      - REPO_URL=https://github.com/${GITHUB_OWNER}/${GITHUB_REPO}
      - RUNNER_NAME=${RUNNER_NAME_PREFIX}-x86_64-{{.Task.Slot}}
      - RUNNER_WORKDIR=/tmp/runner-work
      - RUNNER_GROUP=${RUNNER_GROUP:-Default}
      - RUNNER_SCOPE=repo
      - LABELS=${RUNNER_LABELS},x86_64
      - DISABLE_AUTO_UPDATE=true
      - EPHEMERAL=true
    volumes:
      - /var/run/docker.sock:/var/run/docker.sock
      - github-runner-cache:/home/runner/cache
      - github-runner-docker-cache:/var/lib/docker
    networks:
      - github-runners-network
      - dokploy-network
    deploy:
      mode: replicated
      replicas: 1
      placement:
        constraints:
          - node.labels.arch == x86_64
      restart_policy:
        condition: any
        delay: 5s
        max_attempts: 3
    privileged: true  # Ignored by docker stack deploy, as above

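  # Optional: Runner Autoscaler (placeholder only: Actions Runner Controller is a Kubernetes operator and is not expected to run as a Swarm service; see Step 17)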
  autoscaler:
    image: ghcr.io/actions-runner-controller/actions-runner-controller:latest
    environment:
      - GITHUB_TOKEN=${GITHUB_TOKEN}
      - RUNNER_SCOPE=repo
    volumes:
      - /var/run/docker.sock:/var/run/docker.sock
    networks:
      - github-runners-network
    deploy:
      mode: replicated
      replicas: 1
      placement:
        constraints:
          - node.role == manager

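volumes:
  # Explicit names stop Swarm from prefixing the stack name,
  # so these match the volumes from Step 4 and the backup script
  github-runner-cache:
    name: github-runner-cache
  github-runner-docker-cache:
    name: github-runner-docker-cache

networks:
  github-runners-network:
    external: true  # Reuse the overlay network created in Step 3
  dokploy-network:
    external: true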

Step 8: Deploy Runners

# Copy files to controller
scp github-runners-stack.yml ubuntu@192.168.2.130:~/
scp .env ubuntu@192.168.2.130:~/

# SSH to controller
ssh ubuntu@192.168.2.130

# Load environment
set -a && source .env && set +a

# Deploy stack
docker stack deploy -c github-runners-stack.yml github-runners

# Verify deployment
docker stack ps github-runners
docker service ls | grep github
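
Swarm starts tasks asynchronously, so the verification commands above may show runners that are still starting. The sketch below polls until a service reports all replicas running; the function names, the 5-second interval, and the default of 30 attempts are assumptions.

```shell
# True when a REPLICAS value such as "2/2" shows every desired task running.
replicas_ready() {
  running=${1%%/*}
  desired=${1##*/}
  desired=${desired%% *}   # drop suffixes such as " (max 1 per node)"
  [ "$desired" -gt 0 ] && [ "$running" -eq "$desired" ]
}

# Poll a service until it is fully running or the attempts run out.
# Not called here; requires a Swarm manager.
wait_for_service() {
  service=$1
  attempts=${2:-30}
  while [ "$attempts" -gt 0 ]; do
    state=$(docker service ls --filter "name=$service" --format '{{.Replicas}}' | head -n 1)
    if replicas_ready "${state:-0/0}"; then
      return 0
    fi
    attempts=$((attempts - 1))
    sleep 5
  done
  return 1
}
```

Example call: `wait_for_service github-runners_runner-arm64 || echo "runners did not start"`.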

Phase 4: GitHub Integration

Step 9: Verify Runners in GitHub

Go to: https://github.com/[OWNER]/[REPO]/settings/actions/runners
You should see your runners listed as "Idle"
Labels should show: self-hosted, homelab, linux, arm64 or x86_64

Step 10: Test with Sample Workflow

Create .github/workflows/test-self-hosted.yml:

name: Test Self-Hosted Runners

on:
  push:
    branches: [ main ]
  workflow_dispatch:

jobs:
  test-arm64:
    runs-on: [self-hosted, homelab, arm64]
    steps:
      - uses: actions/checkout@v4
      
      - name: Show runner info
        run: |
          echo "Architecture: $(uname -m)"
          echo "OS: $(uname -s)"
          echo "Node: $(hostname)"
          echo "CPU: $(nproc)"
          echo "Memory: $(free -h | grep Mem)"
      
      - name: Test Docker
        run: |
          docker --version
          docker info
          docker run --rm hello-world

  test-x86_64:
    runs-on: [self-hosted, homelab, x86_64]
    steps:
      - uses: actions/checkout@v4
      
      - name: Show runner info
        run: |
          echo "Architecture: $(uname -m)"
          echo "OS: $(uname -s)"
          echo "Node: $(hostname)"
      
      - name: Test access to homelab
        run: |
          # Test connectivity to your services
          curl -s http://gitea.bendtstudio.com:3000 || echo "Gitea not accessible"
          curl -s http://192.168.2.130:3000 || echo "Dokploy not accessible"

Phase 5: Security Hardening

Step 11: Implement Security Best Practices

1. Use Short-Lived Tokens:

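# Prefer a GitHub App over a long-lived PAT: the app issues installation tokens that expire after one hour
# OpenID Connect (OIDC) solves a different problem: it lets jobs authenticate to external services without stored secrets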

2. Restrict Runner Permissions:

# Add to workflow
jobs:
  build:
    runs-on: [self-hosted, homelab]
    permissions:
      contents: read
      packages: write  # Only if pushing to registry

3. Network Isolation:

# Modify stack to use isolated network
networks:
  github-runners-network:
    driver: overlay
    internal: true  # Blocks all external traffic; runners then need a proxy on a second network to reach GitHub over HTTPS

4. Resource Limits:

# Add to service definition in stack
deploy:
  resources:
    limits:
      cpus: '2'
      memory: 4G
    reservations:
      cpus: '1'
      memory: 2G

Step 12: Enable Ephemeral Mode

Ephemeral runners (already configured with EPHEMERAL=true) provide better security:

Each runner handles only one job
Container is destroyed after job completion
Fresh environment for every build
Prevents credential leakage between jobs

Phase 6: Monitoring & Maintenance

Step 13: Set Up Monitoring

Create monitoring script (monitor-runners.sh):

#!/bin/bash

# Check runner status
echo "=== Docker Service Status ==="
docker service ls | grep github-runner

echo -e "\n=== Runner Containers ==="
docker ps --filter name=github-runner --format "table {{.Names}}\t{{.Status}}\t{{.Ports}}"

echo -e "\n=== Recent Logs ==="
docker service logs github-runners_runner-arm64 --tail 50
docker service logs github-runners_runner-x86_64 --tail 50

echo -e "\n=== Resource Usage ==="
docker stats --no-stream --format "table {{.Name}}\t{{.CPUPerc}}\t{{.MemUsage}}" | grep github-runner

Create cron job for monitoring:

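# Create the log file once so the ubuntu user's cron job can write to it
sudo touch /var/log/github-runners.log
sudo chown ubuntu:ubuntu /var/log/github-runners.log

# Add to crontab
crontab -e

# Check runner health every 5 minutes
*/5 * * * * /home/ubuntu/github-runners/monitor-runners.sh >> /var/log/github-runners.log 2>&1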

Step 14: Set Up Log Rotation

# Create logrotate config
sudo tee /etc/logrotate.d/github-runners << EOF
/var/log/github-runners.log {
    daily
    rotate 7
    compress
    delaycompress
    missingok
    notifempty
    create 644 ubuntu ubuntu
}
EOF

Step 15: Backup Strategy

#!/bin/bash
# Backup script (the shebang must be the first line of the file)
BACKUP_DIR="/backup/github-runners/$(date +%Y%m%d)"
mkdir -p "$BACKUP_DIR"

# Backup configuration (.env holds the GitHub token, so restrict access to the backup directory)
cp ~/github-runners-stack.yml "$BACKUP_DIR/"
cp ~/.env "$BACKUP_DIR/"

# Backup volumes
docker run --rm -v github-runner-cache:/data -v "$BACKUP_DIR":/backup alpine tar czf /backup/runner-cache.tar.gz -C /data .
docker run --rm -v github-runner-docker-cache:/data -v "$BACKUP_DIR":/backup alpine tar czf /backup/docker-cache.tar.gz -C /data .

echo "Backup completed: $BACKUP_DIR"
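
The backup script creates one directory per day and never removes old ones. A possible retention step is sketched below; the function names are hypothetical, the retention period is an assumption to pass in, and `date -d` relies on GNU date as shipped with Ubuntu.

```shell
# Print backup directory names (YYYYMMDD) that are older than the cutoff.
# Names that are not eight digits are ignored.
expired_backups() {
  cutoff=$1
  shift
  for name in "$@"; do
    case "$name" in
      [0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9])
        if [ "$name" -lt "$cutoff" ]; then echo "$name"; fi ;;
    esac
  done
}

# Delete expired backups under a root directory (not called here).
prune_backups() {
  root=$1
  keep_days=$2
  cutoff=$(date -d "$keep_days days ago" +%Y%m%d)   # GNU date syntax
  for name in $(expired_backups "$cutoff" $(ls "$root")); do
    rm -rf "${root:?}/$name"
  done
}
```

Example call: `prune_backups /backup/github-runners 14` keeps roughly the last two weeks.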

Phase 7: Advanced Configuration

Step 16: Cache Optimization

Mount host cache directories:

volumes:
  - /home/ubuntu/.cache/npm:/root/.npm
  - /home/ubuntu/.cache/pip:/root/.cache/pip
  - /home/ubuntu/.cache/go-build:/root/.cache/go-build
  - /home/ubuntu/.cargo:/root/.cargo

Pre-install common tools in custom image (Dockerfile.runner):

FROM myoung34/github-runner:latest

# Install common build tools
RUN apt-get update && apt-get install -y \
    build-essential \
    nodejs \
    npm \
    python3 \
    python3-pip \
    golang-go \
    openjdk-17-jdk \
    maven \
    gradle \
    && rm -rf /var/lib/apt/lists/*

# Install Docker Compose (release assets use a lowercase OS name, e.g. docker-compose-linux-aarch64)
RUN curl -fsSL "https://github.com/docker/compose/releases/latest/download/docker-compose-linux-$(uname -m)" \
    -o /usr/local/bin/docker-compose && \
    chmod +x /usr/local/bin/docker-compose

# Images cannot be pre-pulled here: no Docker daemon is reachable during docker build.
# Pull node:lts-alpine and python:3.11-slim on each Swarm node instead.

Build and use custom image:

docker build -t your-registry/github-runner:custom -f Dockerfile.runner .
docker push your-registry/github-runner:custom

# Update stack to use custom image

Step 17: Autoscaling Configuration

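Actions Runner Controller (ARC) provides autoscaling, but it is a Kubernetes operator and does not target Docker Swarm. Treat the service definition below as a placeholder for a future Kubernetes setup; on Swarm, scale by changing replica counts, for example with docker service scale github-runners_runner-arm64=4.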

# Add to stack
autoscaler:
  image: ghcr.io/actions-runner-controller/actions-runner-controller:latest
  environment:
    - GITHUB_TOKEN=${GITHUB_TOKEN}
    - GITHUB_APP_ID=${GITHUB_APP_ID}
    - GITHUB_APP_INSTALLATION_ID=${GITHUB_APP_INSTALLATION_ID}
    - GITHUB_APP_PRIVATE_KEY=/etc/gh-app-key/private-key.pem
  volumes:
    - /path/to/private-key.pem:/etc/gh-app-key/private-key.pem:ro
    - /var/run/docker.sock:/var/run/docker.sock
  deploy:
    mode: replicated
    replicas: 1
    placement:
      constraints:
        - node.role == manager

Step 18: Multi-Repository Setup

For organization-level runners, update environment:

# For org-level
RUNNER_SCOPE=org
ORG_NAME=your-organization

# Remove REPO_URL; at org scope the runner image is expected to identify the organization from ORG_NAME

Phase 8: Troubleshooting Guide

Common Issues & Solutions

1. Runner shows "Offline" in GitHub:

# Check logs
docker service logs github-runners_runner-arm64

# Common causes:
# - Expired token (regenerate in GitHub settings)
# - Network connectivity issue
docker exec <container> curl -I https://github.com

# Restart service
docker service update --force github-runners_runner-arm64

2. Docker-in-Docker not working:

# Ensure privileged mode is enabled
# Check Docker socket is mounted
docker exec <container> docker ps

# If failing, check AppArmor/SELinux
sudo aa-status | grep docker

3. Jobs stuck in "Queued":

# Check if runners are picking up jobs
docker service ps github-runners_runner-arm64

# Verify labels match
docker exec <container> cat /home/runner/.runner | jq '.labels'

4. Out of disk space:

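# Clean up Docker system (warning: this affects every stack on the node; -a removes all unused images and --volumes also removes unused volumes)
docker system prune -a --volumes

# Clean runner cache (scale the runner services to 0 first; a volume that is in use cannot be removed)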
docker volume rm github-runner-docker-cache
docker volume create github-runner-docker-cache
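
A narrower alternative to a full prune is to clean up only when the disk is actually filling. This is a sketch under stated assumptions: the function names are hypothetical, the 80% threshold and the one-week image age are arbitrary, and /var/lib/docker is assumed to be where Docker stores its data.

```shell
# Percentage of the filesystem holding the given path that is in use.
disk_used_pct() {
  df -P "$1" | awk 'NR == 2 { sub(/%/, "", $5); print $5 }'
}

# Prune old images and build cache once usage crosses a threshold.
# Not called here; requires Docker on the node.
prune_if_full() {
  threshold=${2:-80}
  if [ "$(disk_used_pct "$1")" -ge "$threshold" ]; then
    docker image prune -a -f --filter "until=168h"
    docker builder prune -f
  fi
}
```

Example call: `prune_if_full /var/lib/docker 80`, which leaves named volumes untouched.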

Implementation Checklist

Phase 1: Planning

Determine which repositories need self-hosted runners
Decide on runner count per architecture
Generate GitHub Personal Access Token

Phase 2: Infrastructure

Create Docker network
Create persistent volumes
Verify node labels

Phase 3: Deployment

Create .env file with GitHub token
Create github-runners-stack.yml
Deploy stack to Docker Swarm
Verify runners appear in GitHub UI

Phase 4: Testing

Create test workflow
Run test on ARM64 runner
Run test on x86_64 runner
Verify Docker builds work
Test access to homelab services

Phase 5: Security

Enable ephemeral mode
Set resource limits
Review and restrict permissions
Set up network isolation

Phase 6: Operations

Create monitoring script
Set up log rotation
Create backup script
Document maintenance procedures

Cost & Resource Analysis

Compared to GitHub-hosted runners:

Feature      GitHub-Hosted            Your Self-Hosted
Cost         $0.008/minute (Linux)    No per-minute charge (electricity only)
Minutes      2,000/month free         Unlimited
ARM64        Limited                  Full control
Concurrency  20 jobs                  Limited only by the number of runners you deploy
Network      Internet only            Your homelab access

Your Infrastructure Cost:

Existing hardware: $0 (already running)
Electricity: ~$10-20/month additional load
Time: Initial setup ~2-4 hours
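
The figures above imply a break-even point. The sketch below converts the monthly electricity estimate into the number of paid GitHub-hosted Linux minutes it would buy, assuming the $0.008/minute rate quoted in the comparison, which may be out of date; the function name is hypothetical.

```shell
# Paid GitHub-hosted minutes that cost as much as a monthly power bill.
# Works in tenths of a cent to stay in integer arithmetic ($0.008 = 8).
break_even_minutes() {
  echo $(( $1 * 1000 / 8 ))
}

for usd in 10 20; do
  echo "\$$usd/month of electricity = $(break_even_minutes "$usd") paid minutes"
done
```

This prints 1250 and 2500 minutes. Since the first 2,000 minutes each month are free, self-hosting saves money on cost alone only once private-repository usage exceeds roughly 3,250 to 4,500 minutes per month; the other benefits listed in the comparison are separate from this calculation.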

Next Steps

Review this plan and decide on your specific use cases
Generate a GitHub PAT with the repo scope (add admin:org only if you move to organization-level runners)
Start with Phase 1 - Planning
Deploy a single runner first to test before scaling
Iterate based on your workflow needs
