Home Network Full Build: 2-VM Tailscale Mesh Architecture for Production Web Hosting

From home lab to production web hosting: this project documents a complete infrastructure overhaul, migrating four WordPress sites from a legacy server to a modern 2-VM architecture using Tailscale mesh networking, Docker containerization, and Cloudflare edge caching. The result? Enterprise-grade reliability on a home lab budget.

📋 Table of Contents

  • 1. Project Overview & Architecture
  • 2. The Two-VM Design Philosophy
  • 3. Traffic Flow & Caching Strategy
  • 4. Home VM: Docker Stack Configuration
  • 5. Oracle VM: Edge Cache & TLS Termination
  • 6. Tailscale Mesh Networking
  • 7. Migration Process & Database Strategy
  • 8. Security Considerations
  • 9. Troubleshooting & Lessons Learned
  • 10. Developer Access & Permission Management

1. Project Overview & Architecture

As an IT Project Engineer, I've managed countless infrastructure projects, but this one hit close to home, literally. The challenge was to migrate four production WordPress sites from an aging server (xanook) to a modern, resilient architecture that could survive home internet outages while maintaining enterprise-grade performance and security.

The solution leverages two virtual machines: a Home VM running on my personal QEMU/KVM host for the application layer, and an Oracle VM using Oracle Cloud's Always Free tier for edge caching and TLS termination. The magic happens through Tailscale, creating a secure WireGuard mesh tunnel between the two VMs that feels like they're on the same LAN, despite being physically separated by thousands of miles.

1.1 Sites Being Migrated

| Site                 | Domain        | Type               | Notes                                 |
|----------------------|---------------|--------------------|---------------------------------------|
| Website1             | website1.net  | WordPress          | Main brand site                       |
| TerryGluff Portfolio | terrygluff.me | WordPress          | Personal portfolio with video         |
| Website2             | website2.com  | WordPress + Custom | Twitch bots, APIs, tournament system  |
| Xanook               | xanook.com    | Static + WordPress | Hybrid: static root + WP subdirectory |

2. The Two-VM Design Philosophy

Why split across two VMs instead of consolidating everything? The answer lies in three critical requirements: resilience, security, and cost efficiency.

2.1 VM Role Summary

| VM        | Location                 | Role              | Key Services                            |
|-----------|--------------------------|-------------------|-----------------------------------------|
| Home VM   | Home QEMU/KVM Host       | Application Layer | Traefik, 4× Nginx, 4× PHP-FPM, MariaDB  |
| Oracle VM | Oracle Cloud (Free Tier) | Edge Layer        | Caddy TLS, Nginx Cache, Tailscale       |

🎯 Key Design Decisions

  • Traefik binds only to the Tailscale IP: the public internet and home LAN cannot reach the application layer directly. Only traffic arriving via the Tailscale tunnel from the Oracle VM gets through.
  • Nginx cache with stale-while-revalidate: if the home internet goes down, the Oracle VM serves cached pages, keeping sites online in read-only mode.
  • Caddy On-Demand TLS: automatic Let's Encrypt certificate provisioning, with domain authorization to prevent abuse.

2.2 IP Address Map

| VM        | IP Type    | Value                   |
|-----------|------------|-------------------------|
| Home VM   | LAN Static | -LOCALIP-               |
| Home VM   | Tailscale  | -TAILSCALEIP-           |
| Oracle VM | Public IP  | [Oracle-assigned]       |
| Oracle VM | Tailscale  | [Assigned by Tailscale] |

3. Traffic Flow & Caching Strategy

Understanding the traffic flow is crucial for debugging and optimization. Here's what happens when a user visits website1.net:

3.1 Normal Request (Cache MISS)

USER BROWSER
     │
     ▼
┌──────────────────────────────────────────────┐
│ CLOUDFLARE EDGE (Anycast, Free Tier)         │
│ • DDoS protection                            │
│ • SSL/TLS outermost layer                    │
│ • Mode: Full (Strict)                        │
└──────────────────────────────────────────────┘
     │ HTTPS 443
     ▼
┌──────────────────────────────────────────────┐
│ ORACLE VM (Always Free A1 ARM)               │
│                                              │
│ caddy-edge [ports 80, 443]                   │
│  └─► On-Demand TLS via Let's Encrypt         │
│  └─► Reverse proxy → localhost:8080          │
│                                              │
│ nginx-cache [port 8080]                      │
│  └─► proxy_cache disk-backed, 72h TTL        │
│  └─► CACHE MISS → forward to Home VM         │
│                                              │
│ Tailscale: host OS (tailscale0 interface)    │
└──────────────────────────────────────────────┘
     │ Tailscale WireGuard tunnel
     │ -TAILSCALEIP-:80
     ▼
┌──────────────────────────────────────────────┐
│ HOME VM (Arch Linux, QEMU/KVM)               │
│ LAN: -LOCALIP-   Tailscale: -TAILSCALEIP-    │
│                                              │
│ traefik [listens on Tailscale IP only]       │
│  └─► Routes by Host header                   │
│  └─► website1.net → nginx:80                 │
│                                              │
│ website1-nginx + website1-phpfpm             │
│  └─► WordPress application                   │
│                                              │
│ mariadb-shared (Docker DNS, no host port)    │
└──────────────────────────────────────────────┘

3.2 Cache HIT Scenario

When the requested page is already cached, the response is served entirely from Oracle VM without touching the Home VM:

✅ Cache HIT Flow

User → Cloudflare → Oracle VM Caddy (TLS) → Oracle VM Nginx (CACHE HIT) → User

Result: Sub-100ms response times, no load on home infrastructure, sites remain fast even during home maintenance windows.

3.3 Home Internet DOWN (Stale Serve)

This is where the architecture truly shines. When the home internet connection drops:

⚠️ Outage Resilience

User → Cloudflare → Oracle VM Caddy → Oracle VM Nginx

→ Tailscale tunnel TIMEOUT

→ proxy_cache_use_stale kicks in

→ Oracle VM serves last cached version

Result: Sites remain online in read-only mode. Visitors see cached content, no 502 errors, no SEO impact from downtime.

4. Home VM: Docker Stack Configuration

The Home VM runs a complete Dockerized stack with Traefik as the reverse proxy, individual Nginx containers per site, PHP-FPM for WordPress processing, and a shared MariaDB instance.

4.1 Directory Structure

The design separates configuration from runtime data:

  • /opt/homevm/: configuration files (docker-compose, nginx configs, traefik configs). Text-based and version-controllable.
  • /DATA/homevm/: runtime data (database files, WordPress uploads, site content). Large, binary, and backed up regularly.

4.2 Traefik Configuration

Traefik serves as the intelligent reverse proxy, routing requests based on the Host header to the appropriate Nginx container:

# traefik.yml - Key Configuration

entryPoints:
  web:
    address: "-TAILSCALEIP-:80"  # Tailscale IP only!
  traefik-dashboard:
    address: "-LOCALIP-:8080"    # LAN only

api:
  dashboard: true
  insecure: true

providers:
  docker:
    exposedByDefault: false
    network: web_backend

πŸ” Security Property

Traefik binds its web entrypoint only to -TAILSCALEIP-:80 (Tailscale IP). The public internet and home LAN cannot reach Traefik directly. Only traffic arriving via the Tailscale tunnel from the Oracle VM can reach it.
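Because exposedByDefault is false, each site container must opt in to routing through Docker labels. A sketch of how the website1 service might register with Traefik (the router and service names here are illustrative, not taken from the project's actual compose file):

```yaml
  website1-nginx:
    image: nginx:alpine
    networks:
      - web_backend
    labels:
      - "traefik.enable=true"
      - "traefik.http.routers.website1.rule=Host(`website1.net`)"
      - "traefik.http.routers.website1.entrypoints=web"
      - "traefik.http.services.website1.loadbalancer.server.port=80"
```

Traefik watches the Docker socket, sees these labels, and builds the Host-header route automatically; no static route list to maintain as sites are added.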

4.3 Per-Site Nginx Configuration

Each site has its own Nginx container with a tailored configuration. Key differences from the Apache setup on the old server:

| Aspect                 | Apache (xanook)          | Nginx (Docker)             |
|------------------------|--------------------------|----------------------------|
| WP permalink rewriting | .htaccess mod_rewrite    | try_files in nginx.conf    |
| SSL/TLS                | Certbot on server        | Caddy on Oracle VM         |
| Document root          | /var/www/sitename/html/  | /DATA/homevm/sitename/html/|
| DB_HOST                | localhost                | mariadb-shared             |
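The permalink change boils down to a few lines of Nginx config. A minimal sketch of the relevant per-site server block (the upstream name website1-phpfpm follows the article's container naming; port 9000 is the PHP-FPM default and is assumed here):

```nginx
# WordPress permalinks without .htaccess: fall back to index.php
location / {
    try_files $uri $uri/ /index.php?$args;
}

# Hand PHP requests to the site's PHP-FPM container over the Docker network
location ~ \.php$ {
    include       fastcgi_params;
    fastcgi_param SCRIPT_FILENAME $document_root$fastcgi_script_name;
    fastcgi_pass  website1-phpfpm:9000;
}
```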

4.4 Docker Compose Stack

The complete stack comprises 10 containers: 1 Traefik, 1 MariaDB, 4 Nginx, and 4 PHP-FPM instances:

# docker-compose.yml structure

networks:
  web_backend:
    driver: bridge

services:
  traefik:        # Reverse proxy (Tailscale IP only)
  mariadb-shared: # Shared database (no host port)
  
  # 4 sites × 2 containers each:
  #   website1-nginx   + website1-phpfpm
  #   terrygluff-nginx + terrygluff-phpfpm
  #   website2-nginx   + website2-phpfpm
  #   xanook-nginx     + xanook-phpfpm

5. Oracle VM: Edge Cache & TLS Termination

The Oracle VM serves as the public-facing edge, handling TLS termination with automatic Let's Encrypt certificates via Caddy, and caching responses with Nginx.

5.1 Why network_mode: host?

All three Oracle VM containers use network_mode: host because Docker bridge networks cannot reach the host's tailscale0 interface. With host networking:

  • Caddy reaches nginx-cache at localhost:8080
  • Nginx-cache connects to Home VM at -TAILSCALEIP-:80
  • Caddy queries auth-check at localhost:9123
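In compose terms, the edge containers might be declared like the sketch below (image tags and volume paths are assumptions, not taken from the project):

```yaml
services:
  caddy-edge:
    image: caddy:2
    network_mode: host   # shares the host's network namespace, so tailscale0 is reachable
    volumes:
      - ./Caddyfile:/etc/caddy/Caddyfile:ro
      - caddy-data:/data

  nginx-cache:
    image: nginx:stable
    network_mode: host   # listens on localhost:8080 for Caddy
    volumes:
      - ./nginx.conf:/etc/nginx/nginx.conf:ro
      - cache-data:/var/cache/nginx

volumes:
  caddy-data:
  cache-data:
```

Note that with host networking, `ports:` mappings are ignored; each service must bind its own ports without conflicting.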

5.2 Caddy On-Demand TLS

Caddy's On-Demand TLS automatically provisions certificates for authorized domains:

# Caddyfile

{
    email {$CADDY_EMAIL}
    on_demand_tls {
        ask      http://localhost:9123/check
        interval 2m
        burst    5
    }
}

https:// {
    tls {
        on_demand
    }
    reverse_proxy localhost:8080
}

The ask directive points to a Python-based authorization server that checks requests against a whitelist of allowed domains, preventing certificate abuse.
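The article's authorization server is Python-based; its core logic could be as small as the sketch below (the handler and function names are my own, and the real implementation may differ). Caddy calls GET /check?domain=&lt;name&gt; and treats any 2xx response as permission to issue a certificate:

```python
from http.server import BaseHTTPRequestHandler, HTTPServer
from urllib.parse import parse_qs, urlparse

# Only these domains may trigger certificate issuance
ALLOWED_DOMAINS = {"website1.net", "terrygluff.me", "website2.com", "xanook.com"}

def is_allowed(path: str) -> bool:
    """Return True if the request path authorizes issuance for its domain."""
    url = urlparse(path)
    domain = parse_qs(url.query).get("domain", [""])[0].lower()
    # Accept bare domains and www. subdomains of whitelisted sites
    domain = domain.removeprefix("www.")
    return url.path == "/check" and domain in ALLOWED_DOMAINS

class CheckHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        # 200 = allow issuance; 403 (any non-2xx) = deny
        self.send_response(200 if is_allowed(self.path) else 403)
        self.end_headers()

def serve(port: int = 9123) -> None:
    """Blockingly serve the check endpoint for Caddy (run on the Oracle VM)."""
    HTTPServer(("127.0.0.1", port), CheckHandler).serve_forever()
```

Binding to 127.0.0.1 matters: the endpoint should only be reachable by Caddy on the same host, never from the internet.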

5.3 Nginx Cache Configuration

The caching layer is the heart of the resilience strategy:

# nginx.conf key directives

proxy_cache_path /var/cache/nginx
    levels=1:2
    keys_zone=edge_cache:50m
    max_size=8g
    inactive=72h;

# Serve stale content if origin is down
proxy_cache_use_stale  error timeout updating
    http_500 http_502 http_503 http_504;

# Bypass cache for logged-in users
proxy_cache_bypass $cookie_wordpress_logged_in;
proxy_no_cache     $cookie_wordpress_logged_in;
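Tied together in a location block, the cache zone and the Tailscale upstream might look like this; the X-Cache-Status header is a common debugging convention I would add, not something from the article:

```nginx
location / {
    proxy_cache        edge_cache;
    proxy_cache_valid  200 301 302 72h;
    proxy_set_header   Host $host;
    proxy_pass         http://-TAILSCALEIP-:80;

    # Surfaces HIT / MISS / STALE so cache behavior is visible in responses
    add_header X-Cache-Status $upstream_cache_status always;
}
```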

6. Tailscale Mesh Networking

Tailscale provides the secure communication backbone between the two VMs. Running on the host OS (not in Docker), it creates a WireGuard mesh tunnel that makes the Home VM accessible from Oracle VM as if they were on the same LAN.

6.1 Installation

# On both VMs
curl -fsSL https://tailscale.com/install.sh | sh
sudo tailscale up

# Verify connection
tailscale status
tailscale ip -4

⚠️ Important: Disable Key Expiry

By default, Tailscale keys expire every 180 days. Disable this in the admin panel at https://login.tailscale.com/admin/machines for both VMs, or the tunnel will drop unexpectedly.

6.2 Verify Tunnel Connectivity

Before proceeding with the full stack, verify the tunnel works:

# From Oracle VM
ping -TAILSCALEIP-

curl -H "Host: website1.net" http://-TAILSCALEIP-/
# Should return WordPress HTML (HTTP 200)

7. Migration Process & Database Strategy

The migration involved moving both database content and filesystem data from the legacy xanook server to the new Dockerized infrastructure.

7.1 Database Migration

Each site's database was exported with proper charset handling, transferred to Home VM, and imported into the new MariaDB container:

# Export from xanook
mysqldump --single-transaction --routines --triggers \
  --default-character-set=utf8mb4 \
  soun3519224418 > db_website1.sql

# Transfer and import
rsync -avz terry@xanook:/tmp/db_exports/ /DATA/homevm/db_imports/
# Pass the password via the container's env var; a bare -p would try to
# prompt and swallow the first line of the redirected SQL instead
docker exec -i mariadb-shared sh -c \
  'exec mariadb -uroot -p"$MARIADB_ROOT_PASSWORD" db_website1' < db_website1.sql

🔄 Charset Upgrade

The website1 database was originally utf8 (3-byte). The dump was converted to utf8mb4 before import using sed replacements, ensuring full Unicode support including emojis.
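The sed pass might look like the sketch below, run here against a scratch copy so it is self-contained. Against a real dump, review the replacements first: a blind utf8-to-utf8mb4 substitution can corrupt strings that merely contain "utf8".

```shell
# Stand-in for the real mysqldump output
cat > db_sample.sql <<'EOF'
CREATE TABLE wp_posts (ID bigint) ENGINE=InnoDB DEFAULT CHARSET=utf8;
EOF

# Widen the 3-byte charset declarations to utf8mb4
sed -e 's/CHARSET=utf8;/CHARSET=utf8mb4;/g' \
    -e 's/COLLATE=utf8_general_ci/COLLATE=utf8mb4_unicode_ci/g' \
    db_sample.sql > db_sample_utf8mb4.sql

grep 'CHARSET' db_sample_utf8mb4.sql
```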

7.2 File Migration with rsync

# Sync WordPress files
rsync -avz --progress \
  terry@xanook:/var/www/website1.net/html/ \
  /DATA/homevm/website1/html/

# Update wp-config.php
define('DB_HOST', 'mariadb-shared');
define('FS_METHOD', 'direct');
# Remove FORCE_SSL and FTP defines

8. Security Considerations

Security was a primary driver for this architecture. Here's the layered approach:

8.1 Network Security

  • Traefik accessible only via Tailscale IP (not exposed to LAN or internet)
  • MariaDB has no host port mapping: Docker DNS only
  • Tailscale provides WireGuard encryption for all inter-VM traffic
  • Cloudflare provides DDoS protection and hides Oracle public IP
  • SSH key authentication for all administrative access

8.2 Application Security

  • Security headers in Nginx (X-Frame-Options, X-Content-Type-Options, Referrer-Policy)
  • PHP execution blocked in uploads directories
  • Hidden files (.htaccess, .env) access denied
  • WordPress admin URLs bypass cache (no stale admin panels)

9. Troubleshooting & Lessons Learned

Every infrastructure project has its challenges. Here are the key issues encountered and their resolutions:

9.1 WordPress Update Failures

Problem

WordPress could not update plugins, themes, or core. Errors like "Could not create directory" appeared in the admin panel.

Root Cause

PHP-FPM runs as UID 82 (www-data) inside the Alpine containers, which corresponds to the host's http user at the same UID. Files migrated from xanook, however, retained their original ownership (terry or root), so the container user could not write to the WordPress tree.

Solution

Apply Shared Group + SGID permissions. Set ownership to 82:82, directories to 2775, files to 664. Add the FS_CHMOD directives to wp-config.php to ensure new files have correct permissions.
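The scheme can be rehearsed on a scratch directory before touching /DATA (the real tree additionally needs sudo chown -R 82:82, omitted here so the sketch runs unprivileged):

```shell
# Build a throwaway tree mimicking a WordPress uploads directory
mkdir -p /tmp/permdemo/uploads
touch /tmp/permdemo/uploads/image.jpg

# Directories: rwxrwxr-x plus the SGID bit; files: rw-rw-r--
find /tmp/permdemo -type d -exec chmod 2775 {} +
find /tmp/permdemo -type f -exec chmod 664 {} +

# New subdirectories under an SGID directory inherit its group
mkdir -p /tmp/permdemo/uploads/2024

stat -c '%a %n' /tmp/permdemo/uploads /tmp/permdemo/uploads/image.jpg
```

With FS_CHMOD_DIR and FS_CHMOD_FILE set to matching values in wp-config.php, files WordPress itself creates follow the same pattern.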

9.2 Traefik Dashboard 404

The dashboard URL requires a trailing slash: http://-LOCALIP-:8080/dashboard/. Without the trailing slash, Traefik returns a 404.

9.3 Cache Bypass for Logged-In Users

Without cache bypass, logged-in WordPress users would see cached versions of pages, breaking admin functionality. The fix was adding proxy_cache_bypass $cookie_wordpress_logged_in to the nginx cache configuration.

10. Developer Access & Permission Management

A crucial requirement was enabling web developers to access and modify site files without requiring sudo privileges or compromising security.

10.1 Shared Group + SGID Strategy

The permission strategy allows both Docker containers (UID 82) and developers to collaborate seamlessly:

  • Owner: UID 82 (PHP-FPM container)
  • Group: GID 82 (webdevs group on host)
  • Permissions: 2775 for directories, 664 for files
  • SGID Bit: Ensures new files inherit the parent directory's group

10.2 SSH Jail for Developers

Developers access their assigned site via SFTP only, with a chroot jail limiting them to their site directory:

# /etc/ssh/sshd_config

Match User dev_website1
    ChrootDirectory /DATA/webvm/website1/jail
    ForceCommand internal-sftp
    AllowTcpForwarding no
    X11Forwarding no
    PasswordAuthentication yes

✅ Benefits

  • WordPress can update itself (plugins, themes, core)
  • Developers can edit files without sudo
  • Secure chroot jail limits developer access
  • No permission conflicts between container and host
  • SGID ensures consistent permissions on new files

Conclusion

This project transformed a legacy single-server WordPress deployment into a modern, resilient, and secure infrastructure. The 2-VM architecture with Tailscale mesh networking provides enterprise-grade reliability on a home lab budget, with the Oracle Cloud free tier handling edge caching and TLS termination.

The key takeaway is that infrastructure doesn't need to be expensive to be resilient. By leveraging free tier cloud services, open-source tools, and thoughtful architecture, it's possible to achieve high availability and security without enterprise budgets.

πŸ† Project Summary

  • 4 WordPress sites migrated from legacy Apache to Dockerized Nginx
  • 2-VM architecture with Tailscale mesh networking
  • Cloudflare + Oracle VM edge caching for resilience
  • Sites remain online during home internet outages (stale-serve)
  • Secure developer access via SSH chroot jail
  • Complete automation with Docker Compose

What's next? Future improvements include implementing automated backups, adding monitoring with Prometheus/Grafana, and exploring container orchestration with Docker Swarm for even higher availability.