nftables

I have a typical "hacker homenet" - a few real external IPv4 addresses, and an internal 10/8 divided up into /24 chunks for organization, not routing (for example, "webcams", "raspis", "kindles", "laptops", "android stuff", etc) on the theory that I may eventually want to lock them down differently or allocate specific frequency bands for them, like making the webcams 2.4GHz only and keeping laptops on non-colliding 5GHz.

While at home, the webcams are easily accessible to my phone and laptops via the magic of "being on the same wifi." What I wanted was a low-maintenance way to access them "on the road" from both my phone and my laptop. (Currently there's a high-maintenance path involving ssh tunnels that needs to be updated with every new camera - we're trading "do a little more configuration work up front and never touch the plumbing again" (wireguard) against various per-camera tweaks.)

What is wireguard, mechanically?

Wireguard is a widely deployed VPN mechanism built on some efficient cryptographic fundamentals - "but that's not really important right now". The details explain why it's good (and there are many articles, starting with https://wireguard.com itself, defending that); what we care about here is "what are the moving parts to set it up?"

Endpoint key pairs

As with ssh, you have a private key and a matching public key. Mechanically, wg genkey generates the private key and wg pubkey generates the public key from that. You keep the private key private - it appears in wg0.conf if you're using wg-quick (which, for now, you are) and nowhere else. Arguably, don't even back it up; if you're rebuilding the machine, it's simple to just generate new keys and add the new pubkey on any peer systems. (If you have reason to keep that connectivity working across a restore, then go ahead and make sure you have Sufficiently Secure Backups for keying material, just try and be explicit about what it's costing you.)

The wireguard tooling is a little odd in that it treats the (base64 encoded) public key as the name of the endpoint; primarily this removes a layer of indirection and potential inconsistency, but it makes some aspects of the configuration more tedious than they need to be - constants need names. (There may be hacks for this, but wg-quick roundtrips the live state and will probably lose any comments.) The idea seems to be that for small deployments it's not that hard, and for larger ones you'll have more sophisticated automation than just wg-quick.

The main point is that you'll have an obscure string of characters (the private key) that appears in exactly one place (on the machine that "owns" it) and a similar-looking string (the public key) that appears on every machine that directly connects to it. (We'll see later that there might end up not being very many machines, because ultimately this is a network system and other layers can provide plumbing without needing wireguard to explicitly do the work.)

Endpoint addresses

The "outside" of a wireguard connection is an IP address and port. These just need to be reachable (with UDP packets) from the machine you're configuring; if you're just going between machines on a campus or home net they don't even need to be public endpoints, but in the "outside world to homenet" configuration we're planning here, it's a lot easier if you have at least one public internet endpoint.

If all of the involved machines are public you can have them point to each other and spontaneously connect as protected packets arrive; in the modern dying age of IPv4, you're more likely to have one hub or cloud machine that has a public address and most other systems talk to it from behind NATs. In that case, the hub can't connect out to the clients, but the clients can set a PersistentKeepalive which makes sure that they keep pinging the hub and keeping the association alive in the intervening routers - the tools on the hub will show an endpoint, but that's not really the client endpoint, it's a transient address on the NAT box that will still forward packets as-if-they-were responses, through that existing association as long as it's kept alive from the inside/client side.

(If you're using wg-quick on the hub, the main thing you'll want to do is filter out the Endpoint entries after a save or down operation, since they'll be useless. This is probably optional.)

AllowedIPs

The wg-quick config has an AllowedIPs value under [Peer] (as you experiment, you can directly set this with wg set wg0 peer ... allowed-ips ....) This does two things:

  • it allows only packets from those source IP addresses through from the wireguard network interface when they come from that specific peer connection
  • it does an ip route add for that same range of addresses on that wireguard network interface. (If you're not using wg-quick you need to do this yourself explicitly - wg-quick up prints the literal ip route add command it runs, but it should be pretty obvious.)

Basically it says "this subnet is on the far side of this wgN network" - and that connections sent to those addresses should go via wireguard, and connections from those addresses should get responses via wireguard.

In the easy case, you've grabbed a bit of private-use network address space (see RFC-1918) for your network and your "client" machines just need to say "all of that is allowed from the hub"; the hub needs individual (usually /32) ranges for each of the clients.
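
As a sketch (using the 172.27/16 numbering that the examples below settle on; the client address here is a placeholder), the client's [Peer] stanza for the hub, versus the hub's [Peer] stanza for that one client, might carry:

```
# client side: the whole overlay range is reachable through the hub
AllowedIPs = 172.27.0.0/16

# hub side: one /32 per client
AllowedIPs = 172.27.0.5/32
```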

All of this explains why the AllowedIPs ranges on one machine can't overlap: wireguard itself needs to be able to pick where a packet goes.1

Hub and Spoke model

The details above are enough to generate a very minimal "network" - one "hub" machine with a public address, and one or more "spokes" that are anywhere that can reach that public address. (This does not include reaching each other via the hub, that's next.)

So, for every machine, apt install wireguard-tools so you have wg-quick and a properly permissioned /etc/wireguard. Get a root shell and cd there (it just shortens these steps if the working directory is already "safe" and is already where things are going to look.)

Generate the keys:

# cd /etc/wireguard
# umask 077
# wg genkey > privatekey
# wg pubkey < privatekey > publickey

(The umask 077 keeps the freshly created key files from being group- or world-readable.)

Configuration here can go two ways:

  • use ip and wg commands to get an active working system (an ephemeral one - rebooting will lose it all) then wg-quick save to stash the pieces
  • generate a wg0.conf with the correct values, and then wg-quick up to activate it (knowing that it'll activate the same way on reboot.)

The raw commands make things more incremental and help you understand which pieces bring which functionality rather than making it a magic whole; also if you're integrating it into your own script-generation system (maybe directly managed systemd files) you want to know where each bit goes. On the other hand, if you're just setting up half a dozen machines by hand, hand-constructing some nearly identical wg0.conf files is pretty quick too.2

Since there are advantages to both paths, I recommend doing what I did - do both! Set up one machine step by step, understand the resulting config, and just generate the config files for the other machines by hand, since the main difference will be the keys.

Hub and first spoke

Shortcuts we'll take here:

  • While you can have multiple wireguard interfaces, we'll just use wg0 in all examples since we only need one for this basic setup.
  • We'll just pick 172.27/16 for our RFC-1918 address space since
    • either you or your provider is already using some 192.168.*/24 net and collisions are confusing
    • 10/8 isn't totally uncommon either, but it's also technically one single network
    • Docker is probably already using 172.17/16 on your machine.
  • We'll use literal --priv-spoke-2-- and --pub-hub-1-- to substitute for the keys for those hosts; here we'll have multiple spoke-N systems and a single hub-1.
  • We'll use 198.51.100.15 as the public address of hub-1 (per RFC-5737 this is on TEST-NET-2, which is "provided for use in documentation".)
  • We'll just pick sequential wireguard-net addresses:
    • hub-1 172.27.0.1
    • spoke-1 172.27.0.2
    • spoke-2 172.27.0.3
  • Sample commands that start with # (evoking a root shell prompt) run as root on any machine; commands that show a host# prompt run as root on that particular named host.
  • Wireguard doesn't have an officially reserved port, but 51820 is the conventional choice that most documentation uses, so we'll do that too; feel free to change it to anything that doesn't collide with something else you're already running.

We've already generated privatekey and pubkey for spoke-1 and hub-1 above (with raw wg genkey commands because wg-quick doesn't give you any shortcuts for key generation anyway.) This didn't configure anything, just generated a pair of text files on each machine; we'll use the contents of those files below.

On both hub-1 and spoke-1, create the wg0 interface to hang this all off of:

# ip link add dev wg0 type wireguard

Set the local addresses on each:

hub-1# ip address add dev wg0 172.27.0.1
spoke-1# ip address add dev wg0 172.27.0.2

Bring the interfaces up (wg-quick does this for you; with raw ip commands it's an explicit step, and the routes below can't be added while the link is down):

# ip link set up dev wg0

Actually tell the kernel about the private key (by attaching it to the wg0 interface):

# wg set wg0 private-key /etc/wireguard/privatekey

(Filenames are required by this particular interface to avoid exposing the key via ps or history.)

Set up the first link from spoke-1 to hub-1:

hub-1# wg set wg0 listen-port 51820
spoke-1# wg set wg0 peer --pub-hub-1-- endpoint 198.51.100.15:51820 persistent-keepalive 25

The ports need to match, but it's not that important what they are. Also you can use a DNS name here (e.g. hub-1.example.com) if the endpoint is actually in DNS.

Note that this link won't work yet - there's no route. You can confirm this by running ping -c 3 172.27.0.1 and ip route get 172.27.0.1 on spoke-1; 100% packet loss, and the route shown will be your normal default route (compare it to what you get from ip route get 1.1.1.1) rather than via wireguard (which would show dev wg0.)
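
For concreteness, that check looks roughly like this before and after the routing steps below (the uplink gateway and source addresses here are placeholders from TEST-NET-1):

```
spoke-1# ip route get 172.27.0.1
172.27.0.1 via 192.0.2.1 dev eth0 src 192.0.2.77 uid 0
    cache

spoke-1# ip route get 172.27.0.1
172.27.0.1 dev wg0 src 172.27.0.2 uid 0
    cache
```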

spoke-1# wg set wg0 peer --pub-hub-1-- allowed-ips 172.27.0.0/16
spoke-1# ip route add 172.27.0.0/16 dev wg0

ping still won't work since this is only one direction, spoke to hub3, but ip route get should show dev wg0 which is an improvement. Set up the hub to spoke return path:

hub-1# wg set wg0 peer --pub-spoke-1-- allowed-ips 172.27.0.2/32
hub-1# ip route add 172.27.0.2/32 dev wg0

Now you should have an encrypted channel between a pair of machines, and you can access any TCP and UDP services on hub-1 from spoke-1 directly by IP address without any further configuration4. The reverse is also true: you can ping 172.27.0.2 from hub-1 now, so you might find this to be a useful way to demo a local service securely - with a little more work, but without needing a permanent or routable IP address for spoke-1 itself.
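
A quick sanity check at this point: wg show on either end should report a recent handshake. On spoke-1 it looks roughly like this (the listening port and transfer counts are placeholders):

```
spoke-1# wg show
interface: wg0
  public key: --pub-spoke-1--
  private key: (hidden)
  listening port: 46512

peer: --pub-hub-1--
  endpoint: 198.51.100.15:51820
  allowed ips: 172.27.0.0/16
  latest handshake: 12 seconds ago
  transfer: 1.18 KiB received, 2.62 KiB sent
  persistent keepalive: every 25 seconds
```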

Once it works, if you're using wg-quick you can save the config:

hub-1# touch wg0.conf
hub-1# wg-quick save wg0
spoke-1# touch wg0.conf
spoke-1# wg-quick save wg0

(as of Ubuntu 24.04, save won't work unless the file already exists, which might be a bug, or might be a way to defer setting correct permissions?)
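
For reference, the saved hub-1 wg0.conf should come out looking roughly like this (--priv-hub-1-- follows the same key-placeholder convention as above; the save may also include a transient Endpoint line for the spoke, which is the entry worth deleting):

```
[Interface]
Address = 172.27.0.1/32
ListenPort = 51820
PrivateKey = --priv-hub-1--

[Peer]
PublicKey = --pub-spoke-1--
AllowedIPs = 172.27.0.2/32
```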

Second spoke

The hub and spoke can talk to each other just fine but that's probably a lot less interesting than adding more spokes. Same key generation, but let's create wg0.conf for spoke-2 directly:

[Interface]
Address = 172.27.0.3/16
PrivateKey = --priv-spoke-2--

[Peer]
PublicKey = --pub-hub-1--
AllowedIPs = 172.27.0.0/16
Endpoint = 198.51.100.15:51820
PersistentKeepalive = 25

On hub-1 you could do the wg and ip commands directly, but it's simple to just add a new [Peer] stanza:

[Peer]
PublicKey = --pub-spoke-2--
AllowedIPs = 172.27.0.3/32

Note that the Hub side doesn't get an Endpoint, and thus can't use PersistentKeepalive either. Also the AllowedIPs range is very narrow, it says that this Peer link only carries traffic to and from spoke-2's wireguard address. (For now - we will be adding more later.)

Since nothing has run on spoke-2 yet, we can just bring it up directly from the configuration:

spoke-2# wg-quick up wg0

On the hub, there's a little more work since wg-quick doesn't have an incremental reload, so we bring it down and back up again:

hub-1# wg-quick down wg0
hub-1# sed -i -e '/^Endpoint =/d' wg0.conf
hub-1# wg-quick up wg0

It's probably safe to drop the Endpoint filtering here, since the "live" state will just be updated as soon as anything connects - but the hub would still generate noise for unreachable clients/spokes until then.

(The Ubuntu .service files pipe wg-quick strip to wg syncconf which is a less disruptive path, since it leaves existing peer sessions alone - I have not tested it at this time, but it's worth looking at once you get things running.)

Now we can do the same ping and ip route get tests we did with spoke-1 above and see that we can talk to hub-1 over wireguard. We can also see that we can't talk to spoke-1 from spoke-2 - the packets are getting to hub-1 but it isn't forwarding them.

Forwarding among spokes with iptables-nft

It's 2025 so we can assume you're using iptables-nft - the "new" (in 2014) kernel interface, but with a CLI layer that is still compatible with 15 years of stackoverflow answers.5

There are three steps - spoke-1 and spoke-2 won't be able to connect to each other until all three are done.

  1. sysctl -w net.ipv4.ip_forward=1 (depending on what else is going on with your hub-1 system this might already be set) lets your system forward packets at all. There's an ancient principle called "please don't melt down the Internet"6 that no system should ever forward packets "out of the box" without someone explicitly configuring it to do so - otherwise network admins would be spending vast amounts of time hunting down unexpected loops.
  2. iptables -A FORWARD -i wg0 -j ACCEPT which tells the kernel nftables/xtables layer that incoming packets from wireguard (on wg0) should be fed to the FORWARD "chain" (for possible forwarding)
  3. iptables -A FORWARD -o wg0 -j ACCEPT which likewise configures wg0 as an output interface for that chain.

I recommend trying the ping test after each step just to convince yourself, but it's only after wg0 is configured as both an input and an output interface that packets will flow through.

Again these commandline steps are "live", but not persistent - they'll go away on reboot. Since they're specific to wireguard, if you're using wg-quick it makes sense to add them to the [Interface] stanza of wg0.conf:

PreUp = sysctl -w net.ipv4.ip_forward=1
PostUp =   iptables -A FORWARD -i %i -j ACCEPT; iptables -A FORWARD -o %i -j ACCEPT
PostDown = iptables -D FORWARD -i %i -j ACCEPT; iptables -D FORWARD -o %i -j ACCEPT

You could use the exact commandline values here, but wg-quick automatically substitutes the correct wg0 value when you use %i - and all of the other tutorials and examples use %i - so you might as well be consistent even if you're not going beyond this single-network configuration.

Hub and Spoke and Hey Wasn't This About Webcams?

So far the spokes have been identical, but suppose spoke-2 is really on our actual home internal network, with direct access to a truly astonishing number of terribly cheap web cameras (attached primarily to windows that face bird feeders - imagine each camera as a cat staring out at the "kitty TV" provided by an outdoor bird feeder... As A Service.)

Now that the rest of the wireguard connectivity is in place, we need to

  1. configure spoke-1 to know that the camera network is "over on wireguard somewhere"
  2. configure hub-1 to know that the camera network is in the general direction of spoke-2
  3. configure spoke-2 to masquerade (NAT) requests to the camera network out its local wifi connection and back.

Let's add to the shortcuts above:

  • The web cameras are all on 10.11.12.* (This is not an actual /24, it's just a collection of addresses within 10/8 that share a common prefix.)
  • The local wifi interface on spoke-2 is wlp0s0f0 and it turns out we don't care what address it has (in particular it does not need to be in the 10.11.12.* range with the cameras.)

Step 1 is simple: just add , 10.11.12.0/24 to the AllowedIPs stanza for the --pub-hub-1-- peer on spoke-1. (If you're doing this manually, also do the explicit ip route add that wg-quick does for you.) If you later give another spoke access to the cameras, this is the only step you'll need to repeat. (Yes, I said this "wasn't a /24" and it still isn't as far as the "real" network is concerned - it's just a wireguard-specific fiction for a range of addresses.)

Step 2 is almost identical: on hub-1, add , 10.11.12.0/24 to AllowedIPs, but this time specifically for the --pub-spoke-2-- peer. This is the primary "access control" - if the hub doesn't allow this traffic in from wireguard, changing local configuration on any of the receiving spokes (like spoke-1) won't do anything to get access to it.

Step 3 on spoke-2 itself is almost unrelated to wireguard - it just involves the same basic iptables-nft setup we did on hub-1 to involve nftables at all, followed by a single command to NAT the local interface:

PreUp = sysctl -w net.ipv4.ip_forward=1
PostUp =   iptables -A FORWARD -i %i -j ACCEPT; iptables -A FORWARD -o %i -j ACCEPT; iptables -t nat -A POSTROUTING -o wlp0s0f0 -j MASQUERADE
PostDown = iptables -D FORWARD -i %i -j ACCEPT; iptables -D FORWARD -o %i -j ACCEPT; iptables -t nat -D POSTROUTING -o wlp0s0f0 -j MASQUERADE

The FORWARD chains arrange for xtables/nftables to care about wireguard packets at all; the iptables -t nat -A POSTROUTING -o wlp0s0f0 -j MASQUERADE lines feed packets that are heading out wlp0s0f0 into the source address rewriting machinery (MASQUERADE), so that those devices think the connection is coming from spoke-2's wifi address and know where to send the responses (the responses also get rewritten so they actually reach spoke-1 over wireguard.)

Note that we don't need to say anything about the addresses we're rewriting for here - it's implicit in the use of wlp0s0f0: it'll be any address associated with that interface, which in this case is actually 10.0.0.0/8, though the AllowedIPs rule (specifically the ip route part of it) on hub-1 will keep any packets outside of the 10.11.12.* range from showing up here. Exercise for the reader: encode the narrower limit here as well, perhaps directly with nft commands, or perhaps just with a wlp0s0f0:0 sub-interface with a narrower netmask?
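
One sketch of that narrower limit, staying within the same iptables-nft layer (untested in this setup - it scopes the nat rule itself by source and destination rather than relying on the interface):

```
iptables -t nat -A POSTROUTING -s 172.27.0.0/16 -d 10.11.12.0/24 -o wlp0s0f0 -j MASQUERADE
```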

Other Details

Benefits of wg-quick

wg-quick is "just barely enough" configuration to get a minimal wireguard network off the ground - which made it a convenient place for operating system packaging tooling to add quality-of-life improvements; Ubuntu, for example, includes systemd .service files that take care of restarting your wireguard network on boot.7

MTU

wg-quick does ip link set mtu 1420 up dev wg0 while running over links that are themselves mtu 1500; this prevents fragmentation from adding the 80 bytes of (IPv6) wireguard overhead to an already-MTU-sized packet (documented on Wikipedia). If you're only doing v4 over v4, you can save 20 bytes and use an MTU of 1440 - but if you're in a situation where you care that much, you likely don't need me to tell you that you are.
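
The arithmetic behind those numbers (outer IP header, plus UDP header, plus wireguard's 32 bytes of data-message framing):

```shell
# overhead = outer IP header + UDP header + wireguard framing
echo $((1500 - 40 - 8 - 32))   # IPv6 outer packet: 1420, wg-quick's default MTU
echo $((1500 - 20 - 8 - 32))   # IPv4-only outer packet: 1440
```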

Future work

It should be possible to fully VPN a browser using a podman container and a namespaced wireguard tunnel (possibly using wireguard over wireguard?) While it has some limitations (wireguard doesn't transport multicast so mDNS/avahi doesn't work) it should be suitable for most "operate from the correct network" cases. Stay Tuned.

Conclusion

Wireguard really has very few moving parts, but you really do need to get all of them right at once to do simple things with it. Fortunately there are helper tools like wg-quick, and wireguard itself is built at a level that works smoothly with the rest of the networking tools in Linux.


  1. This doesn't mean you can't have redundant connections to the same place - you just need to do the balancing yourself at another level? 

  2. in the "pets vs cattle" debate, a handful of pets isn't wrong as long as you have an honest assessment of your trajectory and have time to rework things when the stampede arrives. After all, if Moore's Law lets you just have "one machine and a backup", maybe you don't need a whole Kubernetes cluster... 

  3. You could demonstrate this with tcpdump -i wg0 on hub-1 while running the ping on spoke-1

  4. As long as they are listening on "any" interface - for example, sshd or a typical web server, which show up as *:22 or *:443 in lsof -i tcp or netstat output. 

  5. You can check this by running iptables -V and confirming the output contains (nf_tables). If you don't get that, you're using legacy iptables, and you may need to check if your kernel is new enough to even have wireguard. 

  6. RFC-1812 section 2.2.8.1 is more concrete about the 1995-era "hidden pitfalls" of having "an operating system with embedded router code". 

  7. See "Common tasks in WireGuard VPN" which explains using systemctl enable wg-quick@wg0 to turn on systemd support, and how systemctl reload lets you add or remove peers, but that [Interface] changes generally need a full systemctl restart (still more convenient than anything wg-quick supplies directly). 

One problem with having decent iptables compatibility layers for nftables is that there's a dearth of documentation for the new stuff; even 2024 stackoverflow questions (and answers) just use the iptables commands underneath. So, when I first set up the "obvious" nft commands to redirect :80 and :443 to internal non-privileged ports (which the "quadlet" PublishPort= directive would then re-namespace internally so that nginx would see them as 80 and 443 again) it worked, but only from off-box - connections (to the dns name and thus the outside IP address) from on-box didn't go through that path. Not great, but most of my searching turned up incorrect suggestions of turning on route_localnet so I just left it that way.

Then a Christmas Day power outage led me to discover that there was a race condition between the primary nftables service starting up and populating the "generic" tables, and my service adding to those tables, which caused the forwarding rules to fail to start.1 At least the command error showed up clearly in journalctl.

Application-specific tables

The one other service that I found using nftables is sshguard2 - which doesn't depend on the nftables service starting at all. Instead, sshguard.service does nft add table in an ExecStartPre and a corresponding nft delete table in ExecStopPost - then the sshguard command itself just adds a blacklist chain and updates an "attackers" set directly. This doesn't need any user-space parts of nftables to be up, just the kernel.

I updated my service file to do the same sort of thing, but all expanded out inline - ExecStartPre to add table (and add chain), a matching delete table in ExecStopPost, and the actual port redirects applied with ExecStart. The main rules were unchanged - a prerouting tcp dport 443 redirect worked the way it did before, only handling outside traffic; the new trick was to add a oifname "lo" tcp dport 443 redirect to an output chain, since locally-initiated traffic doesn't go through prerouting. (Likewise for port 80, since I still have explicit 301 redirects that promote http to https.)
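
Expanded out, those pieces look something like this (the table name and the internal port numbers are made-up stand-ins for the real unit's values):

```
nft add table ip webredir
nft add chain ip webredir prerouting '{ type nat hook prerouting priority dstnat ; }'
nft add chain ip webredir output '{ type nat hook output priority -100 ; }'
# outside traffic, same rules as before
nft add rule ip webredir prerouting tcp dport 80 counter redirect to :8080
nft add rule ip webredir prerouting tcp dport 443 counter redirect to :8443
# locally-initiated traffic skips prerouting, so repeat the redirects on output
nft add rule ip webredir output oifname "lo" tcp dport 80 counter redirect to :8080
nft add rule ip webredir output oifname "lo" tcp dport 443 counter redirect to :8443
```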

The cherry on top was to add counter to all of the rules - so I could see that each rule was firing for the appropriate kind of traffic, just by running curl commands bracketed by nft list table ... |grep counter and seeing the numbers go up.

Victory, mostly

The main reason for fixing localhost is that it let me run a bunch of "the websites are serving what I expect them to be" tests locally rather than on yet another machine. This is a bit of all-eggs-in-one-basket, but I'll probably just include a basic "is that machine even up" check on a specifically-off-site linode, or maybe a friend's machine - really, if the web server machine is entirely offline I don't need precise detail of each service anyway.

Slightly More Victory (2025-08-04 followup)

While setting up wireguard for access to some internal webcams, I kept getting Empty reply from server errors from curl. Switching to https just to see if I got different error messages gave me a startling

* Server certificate:
*  subject: CN=icecream.thok.org

which could only mean that I was talking to the webservers here... and in fact, I was; tcp dport 443 is much too broad, and catches traffic through this machine as well as to it. Adding an ip daddr for the external IP address of the web server to the rule was enough to correct the problem.


  1. This is the "normal" systemd problem, where After= means "after that other service successfully starts" which isn't good enough for trivial .service files that don't explicitly cooperate. It might be possible to fix this by moving the nft -f /etc/nftables.conf from ExecStart= to ExecStartPre= (since it's a Type=oneshot service anyway) so that it exits before the job is considered "started" but the sshguard-inspired workaround is cleaner anyway. 

  2. sshguard is a simple, configurable "watchdog" that detects things like repeated ssh authorization failures (by reading the logs in realtime) and responds with various levels of blackhole routes, using nftables to drop the packets. Professional grade, as long as you have an alternate way back in if it locks you out.