nftables
One problem with having decent iptables compatibility layers for nftables is that there's a dearth of documentation for the new stuff; even 2024 Stack Overflow questions (and answers) just use the iptables commands underneath. So, when I first set up the "obvious" nft commands to redirect :80 and :443 to internal non-privileged ports (which the "quadlet" PublishPort= directive would then re-namespace internally so that nginx would see them as 80 and 443 again), it worked, but only from off-box. Connections from on-box (to the DNS name, and thus the outside IP address) didn't go through that path. Not great, but most of my searching turned up incorrect suggestions of turning on route_localnet, so I just left it that way.
Then a Christmas Day power outage led me to discover that there was a race condition between the primary nftables service starting up and populating the "generic" tables, and my service adding to those tables, which caused the forwarding rules to fail to start.[1] At least the command error showed up clearly in journalctl.
Application-specific tables
The one other service that I found using nftables is sshguard[2], which doesn't depend on the nftables service starting at all. Instead, sshguard.service does nft add table in an ExecStartPre and a corresponding nft delete table in ExecStopPost; the sshguard command itself then just adds a blacklist chain and updates an "attackers" set directly. This doesn't need any user-space parts of nftables to be up, just the kernel.
I updated my service file to do the same sort of thing, but all expanded out inline: ExecStartPre to add table (and add chain), a matching delete table in ExecStopPost, and then the actual port redirects applied with ExecStart. The main rules were unchanged: a prerouting tcp dport 443 redirect worked the way it did before, only handling outside traffic. The new trick was to add an oifname "lo" tcp dport 443 redirect to an output chain, since locally-initiated traffic doesn't go through prerouting. (Likewise for port 80, since I still have explicit 301 redirects that promote http to https.)
The cherry on top was to add counter to all of the rules, so I could see that each rule was firing for the appropriate kind of traffic just by running curl commands bracketed by nft list table ... | grep counter and seeing the numbers go up.
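The service file described above might look roughly like the following. This is a minimal sketch, not the actual file: it assumes Type=oneshot with RemainAfterExit=yes (so the unit stays "active" and ExecStopPost runs on stop), nft living at /usr/sbin/nft, IPv4 only (the ip family), a hypothetical table name webredirect, and hypothetical internal ports 8080 and 8443 standing in for whatever PublishPort= actually maps.

```ini
# web-redirect.service (hypothetical name): a sketch, not the original unit.
# Assumed: nft at /usr/sbin/nft, table "webredirect", internal ports 8080/8443.
[Unit]
Description=Redirect :80/:443 to unprivileged container ports (sketch)

[Service]
Type=oneshot
RemainAfterExit=yes
# Create a private table and its nat chains. This only needs the kernel's
# nf_tables support, not nftables.service or /etc/nftables.conf.
ExecStartPre=/usr/sbin/nft add table ip webredirect
ExecStartPre=/usr/sbin/nft add chain ip webredirect prerouting '{ type nat hook prerouting priority -100; }'
ExecStartPre=/usr/sbin/nft add chain ip webredirect output '{ type nat hook output priority -100; }'
# Off-box traffic arrives through the prerouting hook.
ExecStart=/usr/sbin/nft add rule ip webredirect prerouting tcp dport 443 counter redirect to :8443
ExecStart=/usr/sbin/nft add rule ip webredirect prerouting tcp dport 80 counter redirect to :8080
# Connections from this host to its own addresses (including the public IP)
# are routed via lo and traverse the output hook, not prerouting.
ExecStart=/usr/sbin/nft add rule ip webredirect output oifname lo tcp dport 443 counter redirect to :8443
ExecStart=/usr/sbin/nft add rule ip webredirect output oifname lo tcp dport 80 counter redirect to :8080
# Remove everything this unit added; runs on stop and also if startup fails.
ExecStopPost=/usr/sbin/nft delete table ip webredirect

[Install]
WantedBy=multi-user.target
```

With this layout, the counter check above becomes nft list table ip webredirect | grep counter before and after a curl against the public name: off-box requests should bump the prerouting counters, on-box ones the output counters. Because everything lives in its own table, systemctl stop removes it without touching whatever /etc/nftables.conf loads, and a restart starts from an empty table rather than appending duplicate rules.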
Victory, mostly
The main reason for fixing localhost is that it let me run a bunch of "the websites are serving what I expect them to be" tests locally rather than on yet another machine. This is a bit of all-eggs-in-one-basket, but I'll probably just add a basic "is that machine even up" check on a specifically-off-site Linode, or maybe a friend's machine; really, if the web server machine is entirely offline, I don't need precise detail about each service anyway.
[1] This is the "normal" systemd problem, where After= means "after that other service successfully starts", which isn't good enough for trivial .service files that don't explicitly cooperate. It might be possible to fix this by moving the nft -f /etc/nftables.conf from ExecStart= to ExecStartPre= (since it's a Type=oneshot service anyway) so that it exits before the job is considered "started", but the sshguard-inspired workaround is cleaner anyway.

[2] sshguard is a simple, configurable "watchdog" that detects things like repeated ssh authorization failures (by reading the logs in real time) and responds with various levels of blackhole routes, using nftables to drop the packets. Professional grade, as long as you have an alternate way back in if it locks you out.