nftables for port redirection

Topics: https nftables

One problem with having decent iptables compatibility layers for nftables is that there's a dearth of documentation for the new stuff; even Stack Overflow questions (and answers) from 2024 just use the iptables commands underneath. So, when I first set up the "obvious" nft commands to redirect :80 and :443 to internal non-privileged ports (which the "quadlet" PublishPort= directive would then re-namespace internally so that nginx would see them as 80 and 443 again), it worked, but only from off-box: connections from on-box (to the DNS name and thus the outside IP address) didn't go through that path. Not great, but most of my searching turned up incorrect suggestions to turn on route_localnet, so I just left it that way.

Then a Christmas Day power outage led me to discover that there was a race condition between the primary nftables service starting up and populating the "generic" tables, and my service adding to those tables, which caused the forwarding rules to fail to start.[1] At least the command error showed up clearly in journalctl.

Application-specific tables

The one other service that I found using nftables is sshguard[2], which doesn't depend on the nftables service starting at all. Instead, sshguard.service does nft add table in an ExecStartPre and a corresponding nft delete table in ExecStopPost; the sshguard command itself then just adds a blacklist chain and updates an "attackers" set directly. This doesn't need any user-space parts of nftables to be up, just the kernel.

I updated my service file to do the same sort of thing, but all expanded out inline: ExecStartPre to add table (and add chain), a matching delete table in ExecStopPost, and then the actual port redirects applied with ExecStart. The main rules were unchanged - a prerouting tcp dport 443 redirect worked the way it did before, only handling outside traffic; the new trick was to add an oifname "lo" tcp dport 443 redirect to an output chain, since locally-initiated traffic doesn't go through prerouting. (Likewise for port 80, since I still have explicit 301 redirects that promote http to https.)
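As a rough sketch of that layout (not the actual service file), the commands group into three steps, one per systemd directive. The table name "webredir", the ip family, the numeric priority -100, and the target ports 8080/8443 are all assumptions made for illustration; the real targets are whatever unprivileged ports PublishPort= exposes. By default the script only prints the nft commands (without their shell quoting); running it as root with NFT=nft would apply them.

```shell
#!/bin/sh
# Sketch only: "webredir", the ip family and ports 8080/8443 are assumed names.
# NFT defaults to a dry run that prints each command instead of running it.
NFT="${NFT:-echo nft}"
TABLE="webredir"

# ExecStartPre=: create a private table and its two nat chains.
# Only the kernel is needed for this, not the nftables service.
create_table() {
    $NFT add table ip "$TABLE"
    $NFT add chain ip "$TABLE" prerouting '{ type nat hook prerouting priority -100; }'
    $NFT add chain ip "$TABLE" output '{ type nat hook output priority -100; }'
}

# ExecStart=: the redirects themselves.
add_redirects() {
    # Traffic arriving from other machines.
    $NFT add rule ip "$TABLE" prerouting tcp dport 80 counter redirect to :8080
    $NFT add rule ip "$TABLE" prerouting tcp dport 443 counter redirect to :8443
    # Locally initiated traffic, which never passes through prerouting.
    $NFT add rule ip "$TABLE" output oifname lo tcp dport 80 counter redirect to :8080
    $NFT add rule ip "$TABLE" output oifname lo tcp dport 443 counter redirect to :8443
}

# ExecStopPost=: deleting the table removes its chains and rules with it.
delete_table() {
    $NFT delete table ip "$TABLE"
}

create_table
add_redirects
```

Because the table is private to this service, nothing here depends on what /etc/nftables.conf has or hasn't loaded yet, which is the point of the sshguard-style approach.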

The cherry on top was to add counter to all of the rules - so I could see that each rule was firing for the appropriate kind of traffic, just by running curl commands bracketed by nft list table ... |grep counter and seeing the numbers go up.
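The bracketing can be wrapped in a tiny helper, sketched here under the assumption that the table is an ip-family table called "webredir" (a made-up name; substitute the real one). It lists the counter lines, runs whatever command it is given, then lists them again.

```shell
#!/bin/sh
# Sketch only: the table name "webredir" and the ip family are assumptions.
NFT="${NFT:-nft}"
TABLE="${TABLE:-webredir}"

# Print only the rules that carry a counter; each such line includes
# "counter packets N bytes M" alongside the rest of the rule.
show_counters() {
    $NFT list table ip "$TABLE" | grep counter
}

# Run the given command bracketed by two counter listings.
bracket() {
    show_counters
    "$@"
    show_counters
}
```

Something like `bracket curl -sS -o /dev/null https://example.com/` (with the site's real name in place of example.com) run on the box itself should bump the output-chain counter for 443, while the same curl from another machine should bump the prerouting one.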

Victory, mostly

The main reason for fixing localhost is that it lets me run a bunch of "the websites are serving what I expect them to be" tests locally rather than on yet another machine. This is a bit of all-eggs-in-one-basket, but I'll probably just add a basic "is that machine even up" check on a specifically off-site linode, or maybe a friend's machine - really, if the web server machine is entirely offline I don't need precise detail of each service anyway.


  1. This is the "normal" systemd problem, where After= means "after that other service successfully starts" which isn't good enough for trivial .service files that don't explicitly cooperate. It might be possible to fix this by moving the nft -f /etc/nftables.conf from ExecStart= to ExecStartPre= (since it's a Type=oneshot service anyway) so that it exits before the job is considered "started" but the sshguard-inspired workaround is cleaner anyway. 

  2. sshguard is a simple, configurable "watchdog" that detects things like repeated ssh authentication failures (by reading the logs in real time) and responds with various levels of blackhole routes, using nftables to drop the packets. Professional grade, as long as you have an alternate way back in if it locks you out.