
Best Edge Computing Cluster Solution: Tailscale + K3s (2026 Edition)

A cross-cloud edge computing cluster solution based on self-hosted Tailscale control plane + K3s, with practical 8-node HA cluster deployment.

PS: Holiday’s coming, time for more tinkering!!

(manual dog head)

ใ€Programming Tech Zone Disclaimerใ€‘

My last k3s article was written in 2025, using WireGuard for cross-cloud networking.

A year has passed, and WireGuard configuration is still too tedious.

Every time you add a machine, you have to manually edit configs, add peers, adjust routes…

So this year, I decided to switch the underlying network entirely to Tailscale.

More specifically: self-hosting a Headscale control plane, without relying on the official SaaS.

This way, you can add as many nodes to the internal network as you want,

with a one-line command to join the network, no more manual WireGuard Peer configuration management.

At the same time, the k3s cluster has been upgraded from a humble 2-node version,

to an 8-node HA cluster with 4 Masters + 4 Edges.

Small but complete.

Let’s get to it.

Overall Architecture

First, here’s an architecture diagram for a global overview:

                    Internet
                      │
        ┌─────────────┼─────────────┐
        │             │             │
    [Tencent Cloud] [Rainyun]    [Local]
    (Domestic)   (Singapore)   (Home/Office)
        │             │             │
        └─────────────┼─────────────┘
                      │
           Tailscale VPN (100.64.0.0/10)
          Headscale Self-hosted (ts.example.com)
          Authentik OIDC Login (auth.example.com)
                      │
        ┌──────────┬──────────┬──────────┐
        │          │          │          │
    [control1] [control2] [control3] [control4]
    vm-0-8     vm-16-12   vm-28-17   vm-0-15
   100.64.0.6 100.64.0.5 100.64.0.7 100.64.0.10
    (Primary)  (Member)   (Member)  (TS Control)
        │          │          │          │
        └──────────┴──────────┴──────────┘
                      │
        K3S HA Cluster (Flannel VXLAN over Tailscale)
        Pod: 10.42.0.0/16  |  Service: 10.43.0.0/16
                      │
        ┌─────────┬─────────┬─────────┐
        │         │         │         │
     [haru]  [lgb-amd]  [rainyun]  [hc1]
    Domestic   Local    Overseas  Domestic
    Ready ✅  Ready ✅  Ready ✅  Ready ✅

Current cluster state:

| Node Name | Role | Tailscale IP | Status | Notes |
|---|---|---|---|---|
| vm-0-8-ubuntu-new | Master | 100.64.0.6 | Ready ✅ | Initial Master, --cluster-init |
| vm-16-12-ubuntu | Master | 100.64.0.5 | Ready ✅ | HA Master |
| vm-28-17-ubuntu | Master | 100.64.0.7 | Ready ✅ | HA Master |
| vm-0-15-ubuntu | Master | 100.64.0.10 | Ready ✅ | Headscale control plane node, NoSchedule |
| haru | Edge | 100.64.0.3 | Ready ✅ | Domestic node |
| lgb-amd-3700 | Edge | 100.64.0.4 | Ready ✅ | Local AMD host, 16GB |
| rainyun-ssh7pavp | Edge | 100.64.0.12 | Ready ✅ | Overseas Singapore node |
| hc1 | Edge | 100.64.0.9 | Ready ✅ | Domestic node |

8-node cluster running dozens of services, stable as a rock.

Part I: Setting up Tailscale - Cloud Internal Network Cluster

Why Not Manually Configure WireGuard Anymore?

In the 2025 article, I used the native WireGuard solution + k3s Flannel wireguard-native mode.

To be honest, WireGuard itself is very stable with good performance. But manually managing peers becomes a nightmare when you have multiple nodes:

  • Every time you add a node, you need to update WireGuard configs on all other nodes
  • Public key exchange and IP allocation are all manual
  • If a machine’s public IP changes, all peers need to be updated
  • No unified management interface

So this time, I went straight to Tailscale.

Why Self-host Headscale?

Tailscale’s official SaaS service is certainly convenient, and the personal plan now supports up to 100 devices, which is more than enough. But there are a few issues:

  1. Login requires external network: Tailscale's login authentication uses Google/Microsoft/GitHub OAuth, which is basically unusable from mainland China without a VPN, and especially troublesome on servers
  2. Data control: all node information lives on someone else's servers, which is not reassuring
  3. Domestic access: Tailscale's official coordination server is overseas, so domestic nodes have unstable connections and are occasionally slow to establish connections between nodes

So I chose Headscale, an open-source Tailscale control plane implementation.

Combined with Authentik for OIDC login, the experience is almost identical to official Tailscale, even more flexible.

Headscale Deployment

Server requirements: a small 2C2G machine is enough; I'm using Tencent Cloud Lighthouse with Ubuntu 24.04.

The core is a docker-compose.yml containing:

| Component | Description |
|---|---|
| Headscale v0.28.0 | Tailscale control plane |
| Authentik 2025.2.4 | OIDC username/password login |
| PostgreSQL 16 | Authentik database |
| Redis | Authentik cache |
| Nginx | HTTPS reverse proxy |
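
For reference, here's a minimal sketch of what that compose file can look like. Treat it as a skeleton under my assumptions, not an exact copy of the production file: the Authentik worker, most volumes, and real secrets are trimmed, and every "change-me" value is a placeholder.

# docker-compose.yml - minimal sketch of the five components above
cat > docker-compose.yml <<'EOF'
services:
  headscale:
    image: headscale/headscale:0.28.0
    command: serve
    volumes:
      - ./headscale/config:/etc/headscale
      - ./headscale/data:/var/lib/headscale
    restart: unless-stopped

  authentik:
    image: ghcr.io/goauthentik/server:2025.2.4
    command: server
    environment:
      AUTHENTIK_SECRET_KEY: "change-me"
      AUTHENTIK_POSTGRESQL__HOST: postgres
      AUTHENTIK_POSTGRESQL__USER: authentik
      AUTHENTIK_POSTGRESQL__PASSWORD: "change-me"
      AUTHENTIK_REDIS__HOST: redis
    depends_on: [postgres, redis]
    restart: unless-stopped

  postgres:
    image: postgres:16
    environment:
      POSTGRES_DB: authentik
      POSTGRES_USER: authentik
      POSTGRES_PASSWORD: "change-me"
    volumes:
      - ./postgres:/var/lib/postgresql/data
    restart: unless-stopped

  redis:
    image: redis:alpine
    restart: unless-stopped

  nginx:
    image: nginx:alpine
    ports: ["80:80", "443:443"]
    volumes:
      - ./nginx/conf.d:/etc/nginx/conf.d:ro
      - ./certs:/etc/nginx/certs:ro
    restart: unless-stopped
EOF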

After deployment, two domains are ready:

  • https://ts.example.com → Headscale control plane
  • https://auth.example.com → Authentik login management

I won’t expand on the detailed deployment process here (that’s for another article), but the key points are:

  1. Use acme.sh + Nginx for HTTPS; don't mess with fancy Traefik stuff
  2. Configure OIDC in Headscale, with the Issuer pointing to Authentik
  3. Use the 100.64.0.0/10 IP range; this is the CGNAT address range and won't conflict with internal networks
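
For points 2 and 3, the relevant Headscale config excerpt looks roughly like this. It's a sketch: the Authentik application slug "headscale" and the key layout of recent Headscale releases are assumptions, so check them against your version.

# Relevant excerpt for /etc/headscale/config.yaml (merge into your full config)
cat >> /etc/headscale/config.yaml <<'EOF'
server_url: https://ts.example.com
prefixes:
  v4: 100.64.0.0/10        # CGNAT range; older releases call this ip_prefixes
oidc:
  issuer: https://auth.example.com/application/o/headscale/
  client_id: headscale
  client_secret: "change-me"
EOF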

Node Onboarding

After setting up Headscale, getting any machine on the network is a one-line command:

# Install Tailscale client
curl -fsSL https://tailscale.com/install.sh | sh

# Connect to self-hosted control plane
tailscale up --login-server=https://ts.example.com --hostname=your-hostname --accept-dns=false

The terminal outputs a link; open it in a browser and it jumps to the Authentik login page. Enter username and password, authorize, and the node is on the network.

That simple.
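
For headless servers where opening a browser is awkward, Headscale also supports pre-auth keys, so a node can join non-interactively. A sketch (the user name "lgb" is a placeholder, and the flag details vary slightly between Headscale versions):

# On the Headscale host (prefix with `docker exec headscale` if containerized)
headscale users create lgb
headscale preauthkeys create --user lgb --expiration 1h
# Note: depending on the version, --user wants the user name or its numeric ID

# On the new node: join without the browser round-trip
tailscale up --login-server=https://ts.example.com \
  --auth-key=<key-from-above> --hostname=your-hostname --accept-dns=false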

Key recommendation:

Don’t rely on public cloud internal networks!

Even if your Master nodes are in the same cloud, it’s recommended to communicate uniformly through Tailscale.

The reason is simple: public cloud internal networks are black boxes. When you change machines or availability zones, internal IPs change. Tailscale IPs, by contrast, are allocated by you and won't change.

All nodes, whether cloud-based, local, or on other cloud platforms, should uniformly access via Tailscale for a clean network architecture.

My current Tailscale network looks like this:

100.64.0.2   lgb-macbookair-m4   macOS     ← My dev machine
100.64.0.3   haru                linux     ← Edge node
100.64.0.4   lgb-amd-3700        linux     ← Local Edge node
100.64.0.5   vm-16-12-ubuntu     linux     ← K3s Master
100.64.0.6   vm-0-8-ubuntu-new   linux     ← K3s Master (Primary)
100.64.0.7   vm-28-17-ubuntu     linux     ← K3s Master
100.64.0.8   rainyun-vja2g92e    linux     ← Overseas node
100.64.0.9   hc1                 linux     ← Edge node
100.64.0.10  ts-headscale        linux     ← Headscale control plane + K3s Master
100.64.0.12  ipv6radar           linux     ← Overseas node
100.64.0.13  localhost           android   ← Phone can join too (for slacking off)

No matter where you are, ping 100.64.0.6 works. This is what a proper internal network experience should be.

Part II: K3s Cluster Setup - Based on Tailscale Network

Core Principles

  1. All Master nodes in the same cloud region: my 4 Masters are all in Tencent Cloud, so etcd sync latency is low
  2. Masters communicate using Tailscale IPs: don't rely on the cloud internal network
  3. Edge nodes anywhere: home desktop, overseas VPS, office workstation, as long as Tailscale can reach them
  4. Gateway nodes on high-bandwidth machines: Ingress runs on lightweight cloud servers with sufficient traffic allowances

Installing Masters

Installation is actually straightforward; just follow the official docs: https://docs.k3s.io/quick-start

But there are a few critical parameters you must pay attention to.

First Master (initialize cluster):

curl -sfL https://rancher-mirror.rancher.cn/k3s/k3s-install.sh | \
  INSTALL_K3S_MIRROR=cn \
  sh -s - server \
  --cluster-init \
  --flannel-iface=tailscale0 \
  --node-ip=100.64.0.6 \
  --tls-san=100.64.0.6

Subsequent Masters joining cluster:

curl -sfL https://rancher-mirror.rancher.cn/k3s/k3s-install.sh | \
  INSTALL_K3S_MIRROR=cn \
  K3S_TOKEN="your-token" \
  sh -s - server \
  --server https://100.64.0.6:6443 \
  --flannel-iface=tailscale0 \
  --node-ip=$(tailscale ip -4) \
  --tls-san=$(tailscale ip -4)

Core parameter explanation:

| Parameter | Why it's necessary |
|---|---|
| --cluster-init | First Master uses this; enables embedded etcd HA |
| --flannel-iface=tailscale0 | Most critical! Makes Flannel VXLAN use the Tailscale NIC, not the physical NIC |
| --node-ip=$(tailscale ip -4) | Node IP uses the Tailscale IP, ensuring cross-cloud communication |
| --tls-san=<ip> | API Server certificate includes the Tailscale IP |
| INSTALL_K3S_MIRROR=cn | Domestic mirror acceleration; overseas nodes don't need it |

--flannel-iface=tailscale0 is a hard-learned lesson from countless pitfalls.

Without this parameter, Flannel will default to the physical NIC's IP (such as the public IP or the cloud internal IP) to build VXLAN tunnels. The result: same-cloud nodes can communicate, but cross-cloud Pod communication fails completely.

With this parameter, Flannel's VXLAN tunnels all go through the Tailscale virtual NIC, and cross-cloud Pod communication works perfectly.
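
A quick way to verify the parameter took effect: Flannel records the address it uses in a node annotation, which should be the Tailscale IP. A sketch (the node name is from my cluster):

# Should print the Tailscale IP (100.64.0.x), not a public or cloud-internal IP
kubectl get node rainyun-ssh7pavp -o \
  jsonpath='{.metadata.annotations.flannel\.alpha\.coreos\.com/public-ip}'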

Installing Edge Nodes

Edge nodes are Agents, even simpler.

Domestic nodes:

curl -sfL https://rancher-mirror.rancher.cn/k3s/k3s-install.sh | \
  K3S_URL="https://100.64.0.6:6443" \
  K3S_TOKEN="your-token" \
  INSTALL_K3S_MIRROR=cn \
  sh -s - agent \
  --node-ip=$(tailscale ip -4) \
  --flannel-iface=tailscale0 \
  --node-label="node.kubernetes.io/role=edge"

Overseas nodes:

curl -sfL https://get.k3s.io | \
  K3S_URL="https://100.64.0.6:6443" \
  K3S_TOKEN="your-token" \
  sh -s - agent \
  --node-ip=$(tailscale ip -4) \
  --flannel-iface=tailscale0 \
  --node-label="node.kubernetes.io/role=edge"

The difference is that domestic nodes use rancher-mirror.rancher.cn, while overseas nodes use the official get.k3s.io.
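
The node.kubernetes.io/role=edge label from the install commands pays off at scheduling time: you can pin workloads to Edge nodes with a simple nodeSelector. A minimal sketch (the Deployment name and image are placeholders):

kubectl apply -f - <<'EOF'
apiVersion: apps/v1
kind: Deployment
metadata:
  name: edge-demo
spec:
  replicas: 2
  selector:
    matchLabels:
      app: edge-demo
  template:
    metadata:
      labels:
        app: edge-demo
    spec:
      nodeSelector:
        node.kubernetes.io/role: edge   # only schedule onto Edge nodes
      containers:
      - name: web
        image: nginx:alpine
EOF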

Domestic Docker Image Pulling Issues

I need to specifically mention: Docker/containerd image pulling in domestic environments is a big hassle.

Docker Hub, ghcr.io, gcr.io and other image sources are basically semi-blocked in China. Some k3s built-in system component images (like pause, coredns, metrics-server) might not be pullable, causing nodes to stay NotReady.

Several solutions:

  1. Configure image accelerators: point /etc/rancher/k3s/registries.yaml at available domestic mirrors, if you can still find live ones (a sketch of this file follows below)
  2. Local export then import (recommended): pull images on overseas nodes or any machine that can pull normally, then export and transfer them to the domestic nodes for import:
    
    # Export on overseas node
    ctr -n k8s.io images export pause.tar registry.k8s.io/pause:3.9
    
    # Transfer to domestic node
    scp pause.tar root@<domestic-node-IP>:/tmp/
    
    # Import on domestic node
    ctr -n k8s.io images import /tmp/pause.tar
    
  3. Self-host a Harbor image registry: if you have many nodes, set up a private registry as a proxy cache and solve the problem once and for all

My approach is to pre-pull the needed images on the overseas nodes, export them with ctr images export, transfer them over the Tailscale internal network to the domestic nodes, and import them with ctr images import. Crude, but stable and reliable.
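
For completeness, here's what the mirror config from option 1 can look like. A sketch: the endpoint below is a placeholder, so substitute a mirror that is actually alive for you.

# /etc/rancher/k3s/registries.yaml - containerd mirror config for k3s
cat > /etc/rancher/k3s/registries.yaml <<'EOF'
mirrors:
  docker.io:
    endpoint:
      - "https://mirror.example.com"
  registry.k8s.io:
    endpoint:
      - "https://mirror.example.com"
EOF

# Restart so containerd picks up the new mirror config
systemctl restart k3s-agent   # or: systemctl restart k3s (on Masters)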

Get the token on the first Master:

cat /var/lib/rancher/k3s/server/node-token

If everything’s fine, you’ll see new nodes online within seconds:

$ kubectl get nodes
NAME                 STATUS   ROLES                       AGE    VERSION
vm-0-8-ubuntu-new    Ready    control-plane,etcd,master   30d    v1.32.3+k3s1
vm-16-12-ubuntu      Ready    control-plane,etcd,master   30d    v1.32.3+k3s1
vm-28-17-ubuntu      Ready    control-plane,etcd,master   30d    v1.32.3+k3s1
vm-0-15-ubuntu       Ready    control-plane,etcd,master   1d     v1.32.3+k3s1
haru                 Ready    <none>                      20d    v1.32.3+k3s1
lgb-amd-3700         Ready    <none>                      15d    v1.32.3+k3s1
rainyun-ssh7pavp     Ready    <none>                      10d    v1.32.3+k3s1
hc1                  Ready    <none>                      5d     v1.32.3+k3s1

All 8 nodes Ready.

Beautiful.

About Gateway Nodes

Ingress traffic entry requires public IP + sufficient bandwidth.

My approach:

  • Gateway nodes use lightweight cloud servers: Tencent Cloud Lighthouse, Alibaba Cloud Lighthouse, etc., a few tens of yuan per month with a sufficient traffic package
  • k3s's built-in Traefik Ingress is exposed on every node by default (via the svclb DaemonSet), but only one or two nodes need to be exposed externally
  • Domain DNS resolves to these lightweight cloud public IPs

Traffic path: User request → Lightweight cloud public IP → Traefik Ingress → Service → Pod (can be on any node)

Even if the Pod is on an overseas node, no problem: Flannel over Tailscale will route the traffic.
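
If you also want Traefik's own pods to sit on the gateway nodes, rather than just being reachable through them, k3s supports overriding its bundled chart with a HelmChartConfig. A sketch, assuming a self-chosen role=gateway label:

# Label the gateway node (the label key/value here are my own convention)
kubectl label node vm-0-8-ubuntu-new role=gateway

# Override the bundled Traefik chart so its pods land on gateway nodes only
kubectl apply -f - <<'EOF'
apiVersion: helm.cattle.io/v1
kind: HelmChartConfig
metadata:
  name: traefik
  namespace: kube-system
spec:
  valuesContent: |-
    nodeSelector:
      role: gateway
EOF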

Part III: Pitfall Chronicles

Pitfall 1: Flannel Using Wrong NIC (Hard-learned Lesson)

Symptom: Edge node is Ready, but cross-node Pod communication fails completely.

Cause: --flannel-iface=tailscale0 was missing, so Flannel defaulted to the public network NIC.

Diagnosis:

kubectl describe node rainyun-ssh7pavp | grep flannel
# If flannel.alpha.coreos.com/public-ip shows a public IP instead of 100.64.0.x, that's the bug

Solution: Uninstall k3s agent, reinstall with --flannel-iface=tailscale0.
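
For reference, the k3s installer drops an uninstall script on every node, which makes the redo painless (paths are the installer defaults):

# On the misconfigured node: remove the agent completely
/usr/local/bin/k3s-agent-uninstall.sh   # servers use k3s-uninstall.sh instead

# Then rerun the agent install command from Part II,
# this time including --flannel-iface=tailscale0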

This parameter is so important, say it three times:

  1. --flannel-iface=tailscale0
  2. --flannel-iface=tailscale0
  3. --flannel-iface=tailscale0

Pitfall 2: Unstable Tailscale Connection

A Vultr VPS had Tailscale connections dropping every few days.

The node kept bouncing between Ready and NotReady, with kubelet frantically reporting that it couldn't update the node status.

Investigation revealed it was a VPN link issue, nothing to do with K3s.

Solution: First isolate it with a taint; later I simply replaced the machine.

kubectl taint nodes vultr.guest node-problem=true:NoSchedule --overwrite
kubectl taint nodes vultr.guest node-problem=true:NoExecute --overwrite

Lesson: It's better to remove unstable nodes than to struggle with them. Replacing the machine is faster than troubleshooting network issues.
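
Removing a node cleanly takes two commands (node name from my case):

# Evict workloads (DaemonSet pods stay, emptyDir data is discarded),
# then remove the node from the cluster
kubectl drain vultr.guest --ignore-daemonsets --delete-emptydir-data
kubectl delete node vultr.guest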

Pitfall 3: Don’t Run Business Pods on Low-spec Masters

My fourth Master (vm-0-15-ubuntu) only has 2G memory and also runs the Headscale control plane.

If you let it run business Pods, it’ll OOM in minutes.

Solution: Add NoSchedule taint, only run control plane components + etcd.

kubectl taint nodes vm-0-15-ubuntu node-role.kubernetes.io/control-plane=:NoSchedule

With 2G of memory running the k3s control plane + etcd + the full Headscale stack, CPU and memory usage sit around 60%. It holds up fine.

Part IV: Troubleshooting Quick Reference

Node added but having issues? Check in this order:

# 1. Check node status
kubectl get nodes -o wide

# 2. Check node events
kubectl describe node <node-name>

# 3. Check Tailscale connectivity
tailscale ping <target-node-tailscale-IP>

# 4. Check Flannel configuration
kubectl describe node <node-name> | grep flannel

# 5. Check kubelet logs
ssh root@<node-IP> "journalctl -u k3s-agent -n 50"

# 6. Test cross-node Pod communication
kubectl run test --image=busybox --rm -it -- wget -qO- http://<other-node-PodIP>

90% of issues can be found in the first 4 steps.

Part V: Summary

Compared to the 2025 solution, key changes in this upgrade:

| Item | 2025 | 2026 |
|---|---|---|
| Networking | WireGuard manual config | Tailscale (Headscale self-hosted) |
| Control plane | Official Tailscale / manual WG | Self-hosted Headscale + OIDC |
| Master count | 2 | 4 (HA) |
| Edge nodes | 0 | 4 |
| Total nodes | 2 | 8 |
| New node onboarding | Change lots of configs | One-line command |
| Cross-cloud communication | Flannel wireguard-native | Flannel VXLAN over Tailscale |
| Management complexity | High | Low |

One-sentence solution summary:

  1. Use Tailscale (self-hosted Headscale) for the underlying network: all nodes join uniformly, without relying on any public cloud internal network
  2. K3s Masters in the same cloud region: low etcd latency, stable control plane
  3. Edge nodes added freely: home machines, overseas VPS, office workstations, as long as Tailscale can reach them
  4. Gateway nodes on lightweight cloud servers: cheap, sufficient bandwidth, fixed public IP

The entire cluster has been running for a month, stable as an old dog.

8/8 nodes all available (previously had 1 Vultr with network issues, already replaced).

Done.

The entire article was finished in my study at dawn. Happy New Year, everyone~

(manual dog head)


This article was written with AI assistance.
