Home Network Full Build: 2-VM Tailscale Mesh Architecture for Production Web Hosting
From home lab to production web hosting: this project documents a complete infrastructure overhaul, migrating four WordPress sites from a legacy server to a modern 2-VM architecture using Tailscale mesh networking, Docker containerization, and Cloudflare edge caching. The result? Enterprise-grade reliability on a home lab budget.
📋 Table of Contents
- 1. Project Overview & Architecture
- 2. The Two-VM Design Philosophy
- 3. Traffic Flow & Caching Strategy
- 4. Home VM: Docker Stack Configuration
- 5. Oracle VM: Edge Cache & TLS Termination
- 6. Tailscale Mesh Networking
- 7. Migration Process & Database Strategy
- 8. Security Considerations
- 9. Troubleshooting & Lessons Learned
- 10. Developer Access & Permission Management
1. Project Overview & Architecture
As an IT Project Engineer, I've managed countless infrastructure projects, but this one hit close to home, literally. The challenge was to migrate four production WordPress sites from an aging server (xanook) to a modern, resilient architecture that could survive home internet outages while maintaining enterprise-grade performance and security.
The solution leverages two virtual machines: a Home VM running on my personal QEMU/KVM host for the application layer, and an Oracle VM using Oracle Cloud's Always Free tier for edge caching and TLS termination. The magic happens through Tailscale, creating a secure WireGuard mesh tunnel between the two VMs that feels like they're on the same LAN, despite being physically separated by thousands of miles.
1.1 Sites Being Migrated
| Site | Domain | Type | Notes |
|---|---|---|---|
| Website1 | website1.net | WordPress | Main brand site |
| TerryGluff Portfolio | terrygluff.me | WordPress | Personal portfolio with video |
| Website2 | website2.com | WordPress + Custom | Twitch bots, APIs, tournament system |
| Xanook | xanook.com | Static + WordPress | Hybrid: static root + WP subdirectory |
2. The Two-VM Design Philosophy
Why split across two VMs instead of consolidating everything? The answer lies in three critical requirements: resilience, security, and cost efficiency.
2.1 VM Role Summary
| VM | Location | Role | Key Services |
|---|---|---|---|
| Home VM | Home QEMU/KVM Host | Application Layer | Traefik, 4× Nginx, 4× PHP-FPM, MariaDB |
| Oracle VM | Oracle Cloud (Free Tier) | Edge Layer | Caddy TLS, Nginx Cache, Tailscale |
🎯 Key Design Decisions
- Traefik binds only to the Tailscale IP: the public internet and home LAN cannot reach the application layer directly. Only traffic arriving via the Tailscale tunnel from the Oracle VM gets through.
- Nginx cache with stale-while-revalidate: if the home internet goes down, the Oracle VM serves cached pages, keeping sites online in read-only mode.
- Caddy On-Demand TLS: automatic Let's Encrypt certificate provisioning with domain authorization to prevent abuse.
2.2 IP Address Map
| VM | IP Type | Value |
|---|---|---|
| Home VM | LAN Static | -LOCALIP- |
| Home VM | Tailscale | -TAILSCALEIP- |
| Oracle VM | Public IP | [Oracle-assigned] |
| Oracle VM | Tailscale | [Assigned by Tailscale] |
3. Traffic Flow & Caching Strategy
Understanding the traffic flow is crucial for debugging and optimization. Here's what happens when a user visits website1.net:
3.1 Normal Request (Cache MISS)
On a cache miss, the request traverses the full chain to the Home VM, and the response is cached at the edge on the way back:
User → Cloudflare → Oracle VM Caddy (TLS) → Oracle VM Nginx (CACHE MISS) → Tailscale tunnel → Home VM Traefik → Site Nginx → PHP-FPM → response cached on Oracle VM → User
3.2 Cache HIT Scenario
When the requested page is already cached, the response is served entirely from Oracle VM without touching the Home VM:
✅ Cache HIT Flow
User → Cloudflare → Oracle VM Caddy (TLS) → Oracle VM Nginx (CACHE HIT) → User
Result: Sub-100ms response times, no load on home infrastructure, sites remain fast even during home maintenance windows.
3.3 Home Internet DOWN (Stale Serve)
This is where the architecture truly shines. When the home internet connection drops:
⚠️ Outage Resilience
User → Cloudflare → Oracle VM Caddy → Oracle VM Nginx
↓ Tailscale tunnel TIMEOUT
↓ proxy_cache_use_stale kicks in
↓ Oracle VM serves last cached version
Result: Sites remain online in read-only mode. Visitors see cached content, no 502 errors, no SEO impact from downtime.
4. Home VM: Docker Stack Configuration
The Home VM runs a complete Dockerized stack with Traefik as the reverse proxy, individual Nginx containers per site, PHP-FPM for WordPress processing, and a shared MariaDB instance.
4.1 Directory Structure
The design separates configuration from runtime data:
- `/opt/homevm/`: configuration files (docker-compose, nginx configs, traefik configs), text-based and version-controllable
- `/DATA/homevm/`: runtime data (database files, WordPress uploads, site content), large, binary, and backed up regularly
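As a hedged sketch, the split can be reproduced with a few `mkdir` calls. Here `$ROOT` stands in for `/` on the Home VM, and the `mariadb` data path is an assumption, not a documented directory:

```bash
# Sketch of the config/data split, created under a scratch root.
ROOT=$(mktemp -d)   # stand-in for / on the Home VM
for site in website1 terrygluff website2 xanook; do
  mkdir -p "$ROOT/opt/homevm/$site"        # text configs, version-controllable
  mkdir -p "$ROOT/DATA/homevm/$site/html"  # WordPress document root
done
mkdir -p "$ROOT/DATA/homevm/mariadb"       # database files (assumed path)

# Show the resulting layout, with the scratch prefix stripped
find "$ROOT" -mindepth 2 -maxdepth 3 -type d | sed "s|$ROOT||" | sort
```

Keeping `/opt/homevm` small and text-only makes it trivial to put under version control, while `/DATA/homevm` can be backed up on its own schedule.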
4.2 Traefik Configuration
Traefik serves as the intelligent reverse proxy, routing requests based on the Host header to the appropriate Nginx container:
```yaml
# traefik.yml - Key Configuration
entryPoints:
  web:
    address: "-TAILSCALEIP-:80"   # Tailscale IP only!
  traefik-dashboard:
    address: "-LOCALIP-:8080"     # LAN only

api:
  dashboard: true
  insecure: true

providers:
  docker:
    exposedByDefault: false
    network: web_backend
```
🔒 Security Property
Traefik binds its web entrypoint only to -TAILSCALEIP-:80 (Tailscale IP). The public internet and home LAN cannot reach Traefik directly. Only traffic arriving via the Tailscale tunnel from the Oracle VM can reach it.
4.3 Per-Site Nginx Configuration
Each site has its own Nginx container with a tailored configuration. Key differences from the Apache setup on the old server:
| Aspect | Apache (xanook) | Nginx (Docker) |
|---|---|---|
| WP Permalink Rewriting | .htaccess mod_rewrite | try_files in nginx.conf |
| SSL/TLS | Certbot on server | Caddy on Oracle VM |
| Document Root | /var/www/sitename/html/ | /DATA/homevm/sitename/html/ |
| DB_HOST | localhost | mariadb-shared |
4.4 Docker Compose Stack
The complete stack comprises 10 containers: 1 Traefik, 1 MariaDB, 4 Nginx, and 4 PHP-FPM instances:
```yaml
# docker-compose.yml structure
networks:
  web_backend:
    driver: bridge

services:
  traefik:          # reverse proxy (Tailscale IP only)
  mariadb-shared:   # shared database (no host port)
  # 4 sites × 2 containers each:
  #   website1-nginx   + website1-phpfpm
  #   terrygluff-nginx + terrygluff-phpfpm
  #   website2-nginx   + website2-phpfpm
  #   xanook-nginx     + xanook-phpfpm
```
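As a hedged sketch, one site's container pair might look like the following. The image tags, volume paths, and router labels are assumptions for illustration, not the project's actual compose file:

```yaml
# Hypothetical excerpt for one site; the other three follow the same pattern.
services:
  website1-nginx:
    image: nginx:alpine                # assumed image
    networks: [web_backend]
    volumes:
      - /opt/homevm/website1/nginx.conf:/etc/nginx/conf.d/default.conf:ro
      - /DATA/homevm/website1/html:/var/www/html
    labels:
      - "traefik.enable=true"          # needed since exposedByDefault is false
      - "traefik.http.routers.website1.rule=Host(`website1.net`)"
  website1-phpfpm:
    image: php:8.2-fpm-alpine          # assumed image (www-data = UID 82)
    networks: [web_backend]
    volumes:
      - /DATA/homevm/website1/html:/var/www/html
```

Because `exposedByDefault: false` is set in traefik.yml, only containers that opt in with `traefik.enable=true` are routed, which keeps MariaDB and the PHP-FPM containers off the proxy entirely.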
5. Oracle VM: Edge Cache & TLS Termination
The Oracle VM serves as the public-facing edge, handling TLS termination with automatic Let's Encrypt certificates via Caddy, and caching responses with Nginx.
5.1 Why network_mode: host?
All three Oracle VM containers use network_mode: host because Docker bridge networks cannot reach the host's tailscale0 interface. With host networking:
- Caddy reaches nginx-cache at `localhost:8080`
- Nginx-cache connects to the Home VM at `-TAILSCALEIP-:80`
- Caddy queries auth-check at `localhost:9123`
5.2 Caddy On-Demand TLS
Caddy's On-Demand TLS automatically provisions certificates for authorized domains:
```
# Caddyfile
{
    email {$CADDY_EMAIL}
    on_demand_tls {
        ask http://localhost:9123/check
        interval 2m
        burst 5
    }
}

https:// {
    tls {
        on_demand
    }
    reverse_proxy localhost:8080
}
```
The ask directive points to a Python-based authorization server that checks requests against a whitelist of allowed domains, preventing certificate abuse.
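The actual authorization service is a Python HTTP server, but its core decision is just a whitelist lookup. The following shell sketch illustrates that logic only; the function name and domain list are stand-ins, not the real service:

```bash
# Stand-in for the allowlist check behind the /check endpoint.
# A real handler would read ?domain=... and answer HTTP 200 or 403.
ALLOWED="website1.net terrygluff.me website2.com xanook.com"

check_domain() {
  for d in $ALLOWED; do
    if [ "$1" = "$d" ]; then
      echo 200   # authorized: Caddy may issue a certificate
      return 0
    fi
  done
  echo 403       # unknown domain: refuse, preventing cert abuse
  return 1
}

check_domain website1.net          # prints 200
check_domain evil.example || true  # prints 403
```

Caddy only provisions a certificate when `ask` returns HTTP 200, so an attacker pointing a random domain at the Oracle VM's IP cannot trick it into burning Let's Encrypt rate limits.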
5.3 Nginx Cache Configuration
The caching layer is the heart of the resilience strategy:
```nginx
# nginx.conf key directives
proxy_cache_path /var/cache/nginx
                 levels=1:2
                 keys_zone=edge_cache:50m
                 max_size=8g
                 inactive=72h;

# Serve stale content if origin is down
proxy_cache_use_stale error timeout updating
                      http_500 http_502 http_503 http_504;

# Bypass cache for logged-in users
proxy_cache_bypass $cookie_wordpress_logged_in;
proxy_no_cache     $cookie_wordpress_logged_in;
```
6. Tailscale Mesh Networking
Tailscale provides the secure communication backbone between the two VMs. Running on the host OS (not in Docker), it creates a WireGuard mesh tunnel that makes the Home VM accessible from Oracle VM as if they were on the same LAN.
6.1 Installation
```bash
# On both VMs
curl -fsSL https://tailscale.com/install.sh | sh
sudo tailscale up

# Verify connection
tailscale status
tailscale ip -4
```
⚠️ Important: Disable Key Expiry
By default, Tailscale keys expire every 180 days. Disable this in the admin panel at https://login.tailscale.com/admin/machines for both VMs, or the tunnel will drop unexpectedly.
6.2 Verify Tunnel Connectivity
Before proceeding with the full stack, verify the tunnel works:
```bash
# From Oracle VM
ping -TAILSCALEIP-
curl -H "Host: website1.net" http://-TAILSCALEIP-/
# Should return WordPress HTML (HTTP 200)
```
7. Migration Process & Database Strategy
The migration involved moving both database content and filesystem data from the legacy xanook server to the new Dockerized infrastructure.
7.1 Database Migration
Each site's database was exported with proper charset handling, transferred to Home VM, and imported into the new MariaDB container:
```bash
# Export from xanook
mysqldump --single-transaction --routines --triggers \
  --default-character-set=utf8mb4 \
  soun3519224418 > db_website1.sql

# Transfer and import
rsync -avz terry@xanook:/tmp/db_exports/ /DATA/homevm/db_imports/
docker exec -i mariadb-shared mariadb -u root -p db_website1 < db_website1.sql
```
🔄 Charset Upgrade
The website1 database was originally utf8 (3-byte). The dump was converted to utf8mb4 before import using sed replacements, ensuring full Unicode support including emojis.
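The conversion above can be sketched as follows. This is the general idea, not the exact replacements used for the real dump (`db_legacy.sql` is a tiny stand-in for `db_website1.sql`):

```bash
# Create a stand-in dump with legacy 3-byte utf8 declarations
printf 'CREATE TABLE t (c TEXT) DEFAULT CHARSET=utf8 COLLATE=utf8_general_ci;\n' > db_legacy.sql

# Convert collations first, then bare utf8 tokens
cp db_legacy.sql db_utf8mb4.sql
sed -i \
  -e 's/utf8_general_ci/utf8mb4_unicode_ci/g' \
  -e 's/utf8\b/utf8mb4/g' \
  db_utf8mb4.sql   # \b leaves already-converted utf8mb4 tokens intact

cat db_utf8mb4.sql
# CREATE TABLE t (c TEXT) DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_unicode_ci;
```

The word-boundary anchor matters: a naive `s/utf8/utf8mb4/g` would mangle any string already containing `utf8mb4` into `utf8mb4mb4`.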
7.2 File Migration with rsync
```bash
# Sync WordPress files
rsync -avz --progress \
  terry@xanook:/var/www/website1.net/html/ \
  /DATA/homevm/website1/html/
```

Then update wp-config.php to point at the new database container and write directly to the filesystem (and remove the FORCE_SSL and FTP defines):

```php
define('DB_HOST', 'mariadb-shared');
define('FS_METHOD', 'direct');
```
8. Security Considerations
Security was a primary driver for this architecture. Here's the layered approach:
8.1 Network Security
- Traefik accessible only via Tailscale IP (not exposed to LAN or internet)
- MariaDB has no host port mapping: reachable via Docker DNS only
- Tailscale provides WireGuard encryption for all inter-VM traffic
- Cloudflare provides DDoS protection and hides Oracle public IP
- SSH key authentication for all administrative access
8.2 Application Security
- Security headers in Nginx (X-Frame-Options, X-Content-Type-Options, Referrer-Policy)
- PHP execution blocked in uploads directories
- Hidden files (.htaccess, .env) access denied
- WordPress admin URLs bypass cache (no stale admin panels)
9. Troubleshooting & Lessons Learned
Every infrastructure project has its challenges. Here are the key issues encountered and their resolutions:
9.1 WordPress Update Failures
Problem
WordPress could not update plugins, themes, or core. Errors like "Could not create directory" appeared in the admin panel.
Root Cause
PHP-FPM runs as UID 82 (www-data) inside the Alpine containers, and the Ubuntu ARM host also maps UID 82 to its HTTP user. Files migrated from xanook retained their original ownership (terry or root), so WordPress, running as UID 82, could not write to them.
Solution
Apply Shared Group + SGID permissions. Set ownership to 82:82, directories to 2775, files to 664. Add the FS_CHMOD directives to wp-config.php to ensure new files have correct permissions.
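The fix can be sketched as a few `find`/`chmod` passes. Here `$SITE` is a scratch stand-in for the real document root (`/DATA/homevm/website1/html`), and the `chown` is commented out because it needs root:

```bash
# Sketch of the Shared Group + SGID fix, run against a scratch copy.
SITE=$(mktemp -d)/html
mkdir -p "$SITE/wp-content/uploads"
touch "$SITE/wp-config.php"

# chown -R 82:82 "$SITE"                     # container UID/GID; requires root
find "$SITE" -type d -exec chmod 2775 {} +   # rwxrwsr-x: SGID on directories
find "$SITE" -type f -exec chmod 664 {} +    # rw-rw-r--

stat -c '%a' "$SITE"                 # prints 2775
stat -c '%a' "$SITE/wp-config.php"   # prints 664
```

The SGID bit (the leading 2) makes new files and directories inherit the parent's group, so uploads created by the container stay editable by developers in the shared group.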
9.2 Traefik Dashboard 404
The dashboard URL requires a trailing slash: http://-LOCALIP-:8080/dashboard/; without it, you get a 404.
9.3 Cache Bypass for Logged-In Users
Without cache bypass, logged-in WordPress users would see cached versions of pages, breaking admin functionality. The fix was adding proxy_cache_bypass $cookie_wordpress_logged_in to the nginx cache configuration.
10. Developer Access & Permission Management
A crucial requirement was enabling web developers to access and modify site files without requiring sudo privileges or compromising security.
10.1 Shared Group + SGID Strategy
The permission strategy allows both Docker containers (UID 82) and developers to collaborate seamlessly:
- Owner: UID 82 (PHP-FPM container)
- Group: GID 82 (webdevs group on host)
- Permissions: 2775 for directories, 664 for files
- SGID Bit: Ensures new files inherit the parent directory's group
10.2 SSH Jail for Developers
Developers access their assigned site via SFTP only, with a chroot jail limiting them to their site directory:
```
# /etc/ssh/sshd_config
Match User dev_website1
    ChrootDirectory /DATA/webvm/website1/jail
    ForceCommand internal-sftp
    AllowTcpForwarding no
    X11Forwarding no
    PasswordAuthentication yes
```
✅ Benefits
- WordPress can update itself (plugins, themes, core)
- Developers can edit files without sudo
- Secure chroot jail limits developer access
- No permission conflicts between container and host
- SGID ensures consistent permissions on new files
Conclusion
This project transformed a legacy single-server WordPress deployment into a modern, resilient, and secure infrastructure. The 2-VM architecture with Tailscale mesh networking provides enterprise-grade reliability on a home lab budget, with the Oracle Cloud free tier handling edge caching and TLS termination.
The key takeaway is that infrastructure doesn't need to be expensive to be resilient. By leveraging free tier cloud services, open-source tools, and thoughtful architecture, it's possible to achieve high availability and security without enterprise budgets.
📊 Project Summary
- 4 WordPress sites migrated from legacy Apache to Dockerized Nginx
- 2-VM architecture with Tailscale mesh networking
- Cloudflare + Oracle VM edge caching for resilience
- Sites remain online during home internet outages (stale-serve)
- Secure developer access via SSH chroot jail
- Complete automation with Docker Compose
What's next? Future improvements include implementing automated backups, adding monitoring with Prometheus/Grafana, and exploring container orchestration with Docker Swarm for even higher availability.