One problem with having decent iptables compatibility layers for
nftables is that there's a dearth of documentation for the new
stuff; even 2024 stackoverflow questions (and answers) just use the
iptables commands underneath.  So, when I first set up the "obvious"
nft commands to redirect :80 and :443 to internal non-privileged
ports (which then the "quadlet" PublishPort= directive would
re-namespace internally so that nginx would see them as 80 and
443 again) it worked, but only from off-box - connections (to the
dns name and thus the outside IP address) from on-box didn't go
through that path.  Not great, but most of my searching turned up
incorrect suggestions of turning on route_localnet so I just left
it that way.
Then a Christmas Day power outage led me to discover that there was a
race condition between the primary nftables service starting up and
populating the "generic" tables, and my service adding to those
tables, which caused the forwarding rules to fail to start.1
At least the command error showed up clearly in journalctl.
Application-specific tables
The one other service that I found using nftables is
sshguard2 - which doesn't depend on the service starting
at all.  Instead, sshguard.service does nft add table in an
ExecStartPre and a corresponding nft delete table in
ExecStartPost - then the sshguard command itself just adds a
blacklist chain and updates an "attackers" set directly.  This
doesn't need any user-space parts of nftables to be up, just the
kernel.
I updated my service file to do the same sort of thing, but all
expanded out inline - used ExecStartPre to add table (and add
chain), a matching delete table in ExecStopPost, and then apply the
actual port redirects with ExecStart.  The main rules were
unchanged - a prerouting tcp dport 443 redirect worked the way it
did before, only handling outside traffic; the new trick was to add an
oifname "lo" tcp dport 443 redirect rule to an output chain, since
locally-initiated traffic doesn't go through prerouting.  (Likewise
for port 80 since I still have explicit 301 redirects that promote
http to https.)
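For concreteness, here's a sketch of what that ends up looking like in the service file - the table name, the numeric chain priorities, and the 8080/8443 target ports are illustrative, not necessarily the exact ones I use:

[Service]
Type=oneshot
RemainAfterExit=yes
# create our own table and nat chains up front - this only needs the kernel, not nftables.service
ExecStartPre=/usr/sbin/nft add table ip webredir
ExecStartPre=/usr/sbin/nft 'add chain ip webredir prerouting { type nat hook prerouting priority -100 ; }'
ExecStartPre=/usr/sbin/nft 'add chain ip webredir output { type nat hook output priority -100 ; }'
# outside traffic arrives via prerouting...
ExecStart=/usr/sbin/nft add rule ip webredir prerouting tcp dport 80 redirect to :8080
ExecStart=/usr/sbin/nft add rule ip webredir prerouting tcp dport 443 redirect to :8443
# ...but locally-initiated traffic only traverses the output hook
ExecStart=/usr/sbin/nft add rule ip webredir output oifname lo tcp dport 80 redirect to :8080
ExecStart=/usr/sbin/nft add rule ip webredir output oifname lo tcp dport 443 redirect to :8443
# tear the whole table down again on stop, sshguard-style
ExecStopPost=/usr/sbin/nft delete table ip webredir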
The cherry on top was to add counter to all of the rules - so I
could see that each rule was firing for the appropriate kind of
traffic, just by running curl commands bracketed by
nft list table ... |grep counter and seeing the numbers go up.
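In practice that bracketing is just something like the following (table name as in the sketch above, with one of the sites standing in as the test URL); whichever rule handled the connection should show a bigger counter the second time:

$ nft list table ip webredir | grep counter
$ curl -sS -o /dev/null https://thok.site/
$ nft list table ip webredir | grep counter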
Victory, mostly
The main reason for fixing localhost is that it lets me run a bunch of "the websites are serving what I expect them to be" tests locally rather than on yet another machine. This is a bit of all-eggs-in-one-basket, but I'll probably just include a basic "is that machine even up" check on a specifically-off-site linode, or maybe a friend's machine - really, if the web server machine is entirely offline I don't need precise detail about each service anyway.
Slightly More Victory (2025-08-04 followup)
While setting up wireguard for access to some internal
webcams, I kept getting Empty reply from server errors from curl.
Switching to https just to see if I got different error messages
gave me a startling
* Server certificate:
*  subject: CN=icecream.thok.org
which could only mean that I was talking to the webservers here... and
in fact, I was; tcp dport 443 is much too broad, and catches traffic
through this machine as well as traffic to it.  Adding an ip daddr match for
the external IP address of the web server to the rule was enough to
correct the problem.
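In rule terms, the fix is roughly this (same illustrative table and target ports as before, with 192.0.2.10 standing in for the web server's real external address):

add rule ip webredir prerouting ip daddr 192.0.2.10 tcp dport 80 counter redirect to :8080
add rule ip webredir prerouting ip daddr 192.0.2.10 tcp dport 443 counter redirect to :8443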
- 
This is the "normal" systemd problem, where After=means "after that other service successfully starts" which isn't good enough for trivial.servicefiles that don't explicitly cooperate. It might be possible to fix this by moving thenft -f /etc/nftables.conffromExecStart=toExecStartPre=(since it's aType=oneshotservice anyway) so that it exits before the job is considered "started" but thesshguard-inspired workaround is cleaner anyway. ↩
- 
sshguard is a simple, configurable "watchdog" that detects things like repeated ssh authorization failures (by reading the logs in realtime) and responds with various levels of blackhole routes, using nftables to drop the packets. Professional grade, as long as you have an alternate way back in if it locks you out. ↩
The ultimate goal of the Popular Web servers discussion
was to make up my mind about what to actually run.  The
diversity of options made me realize that SSL termination and web
serving was an inherently modular thing, and since I wanted some
amount of isolation for it anyway, this would be a good opportunity
to get comfortable with podman.
What shape is the modularity?
The interface has three basic legs:
- Listen on tcp ports 80 and 443
- Read a narrow collection of explicitly exported files
- (later) connect to other "service" containers.
(Some of the options in the survey, like haproxy, only do the
"listen" and "connect" parts, but that reduces the "read files" part
to running a "static files only" file server container (which has
access to the collection of files) and having haproxy connect to
that.  For the first pass I'm not actually going to do that, but
it's good to know in advance that this "shape" works.)
Listening ports
If I'm running this without privileges, how is it going to use traditionally "reserved" ports? Options include
- have systemd listen on them and pass a filehandle in to the container
- run a socat service to do the listening and reconnecting
- lower /proc/sys/net/ipv4/ip_unprivileged_port_start from 1024 to 79
- use firewall rules to "translate" those ports to some higher numbered ones.
I actually used the last one: a pair of nft commands, run in a
Type=oneshot systemd service file, of the form add rule ip
nat PREROUTING tcp dport 80 redirect to (each pointing at its
unprivileged target port).  This seemed like the simplest bit of limited privilege to
apply to this problem, as well as being efficient (no packet copying
outside the kernel, just NAT address rewriting) - but do let me know if
there's some other interface that would also do this.
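Concretely, that pair looked something like this (8080/8443 standing in for the actual unprivileged target ports), leaning on the stock "ip nat" table and PREROUTING chain already existing - which is the dependency that caused the boot-time race described above:

$ nft add rule ip nat PREROUTING tcp dport 80 redirect to :8080
$ nft add rule ip nat PREROUTING tcp dport 443 redirect to :8443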
Reading a set of files
docker and podman both have simple "volume" (actually "bind
mount") support to mount an outside directory into the container; this
also gives us some administrative options on the outside, like moving
around the disks that the files are on, or combining multiple
directories, without changing the internals at all.
Currently, the directory is mounted as /www inside the container,
and I went with the convention of /www/example.com to have a
directory for each FQDN.  (For now this means a bunch of copy&paste in
the nginx.conf but eventually it should involve some more automation
than that, though possibly on the outside.)1
In order to enable adding new sites without restarting the container,
the nginx.conf is also mounted from the outside, as a single-file
bind mount - using podman exec to run nginx -s reload avoids restarting the
container to apply the changes, and allows for automatic generation of the
config from outside, without giving the container itself access to
change the configuration.
Connecting to other containers
(Details to follow, pending actually using this feature; for now it's sufficient to know that the general model makes sense.)
Why podman over docker?
podman has a bunch of interesting advantages over docker:
- actual privilege isolation - docker itself manages access to a service that does all of the work as root; podman actually makes much more aggressive use of namespaces, and doesn't have a daemon at all, which also makes it easier to manage the containers themselves.
- podman started enough later than docker that they were able to make better design choices simply by looking at things that went wrong with docker and avoiding them, while still maintaining enough compatibility that it remained easy to translate experience with one into success with the other - from a unix perspective, less "emacs vs vi" and more "nvi vs vim".
Mount points
Originally I did the obvious Volume mount of nginx.conf from the git
checkout into /etc/nginx/nginx.conf inside the container.
Inconveniently - but correctly2 - doing git pull to change
that file does the usual atomic-replace, so there's a new file (and
new inode number) but the old mount point is still pointing to the
old inode.
The alternative approach is to mount a subdirectory with the conf file in it, and then symlink that file inside the container.3
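A sketch of that arrangement, with made-up paths (the important part is that the mount is the directory, so a replaced nginx.conf with a new inode is still visible inside):

# bind-mount the directory rather than the file itself
$ podman run ... -v /srv/nginx-conf:/etc/nginx/mounted:ro ...
# then, once, point nginx's expected path at the mounted copy
$ podman exec systemd-nginxpod ln -sf /etc/nginx/mounted/nginx.conf /etc/nginx/nginx.conf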
LetsEncrypt
We need the certbot and python3-certbot-nginx packages installed
in the pod.  python3-certbot-nginx handles adjusting the nginx
config during certbot operation (see
github:certbot/certbot
for the guts of it).
Currently, we stuff these into the primary nginx pod, because it
needs to control the live webserver to show that it controls the
live webserver.
When used interactively, certbot tells you that "Certbot has set up
a scheduled task to automatically renew this certificate in the
background."  What this actually means is that it provides a crontab
entry (in /etc/cron.d/certbot) and a system timer (certbot.timer)
which is great... except that in our podman config, we run nginx as
pid 1 of the container, don't run systemd, and don't even have
cron installed.  Not a problem - we just create the crontab
externally, and have it run certbot under podman periodically.
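Something along these lines does the job, in the crontab of whichever user owns the pod (the schedule is arbitrary, and rootless podman invoked from cron may also need XDG_RUNTIME_DIR set, depending on the setup):

# renewal runs on the host, where cron exists, and reaches into the pod
30 3 * * * podman exec systemd-nginxpod certbot renew -q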
Quadlets
Quadlets are just a
new type of systemd "Unit file" with a new [Container] section;
everything from the podman commandline should be expressible in the
.container file.  For the nginx case, we just need Image=,
PublishPort=4, and a handful of Volume= stanzas.
Note that if you could run the podman commands as yourself, the
.container Unit can also be a systemd "User Unit" that doesn't
need any additional privileges (possibly a loginctl enable-linger,
though with Ubuntu 24.04 I didn't actually need that.)
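For reference, a minimal nginx .container along those lines might look like this - the image tag, ports, and paths are placeholders rather than my actual file:

# ~/.config/containers/systemd/nginxpod.container (or /etc/containers/systemd/ for a system unit);
# quadlet names the running container systemd-nginxpod, which is what the podman exec commands below use
[Unit]
Description=nginx in a podman container

[Container]
Image=docker.io/library/nginx:stable
# the "local" high ports (where the nftables rules point) show up as 80/443 inside
PublishPort=8080:80
PublishPort=8443:443
Volume=/srv/www:/www:ro
Volume=/srv/nginx-conf:/etc/nginx/mounted:ro

[Install]
WantedBy=default.target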
Walkthrough of adding a new site on a new FQDN
DNS
Start with DNS.  Register a domain (in this case, thok.site), get it
pointed to your nameserver5, have that nameserver point to the
webserver.
Certbot
Once certbot is registered,
$ podman exec systemd-nginxpod certbot certonly --nginx --domain thok.site
takes a little while and then gets the certificate.  Note that at this
point, the nginx running in that pod knows nothing about the domain;
certbot is doing all the work.
Get the site content
I have a Makefile that runs git clone to get the site source, or
git pull if it's already present, and then uses ssite build to
generate the HTML in a separate directory (that the nginx pod has
mounted.)
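Roughly, the interesting steps boil down to this (the ssite output flag and the paths are assumptions, not copied from the real Makefile):

# fetch or update the source, then regenerate the static HTML where the pod can see it
$ git -C ~/src/thok.site pull || git clone <site repo url> ~/src/thok.site
$ cd ~/src/thok.site && ssite build -o /srv/www/thok.site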
Update the common nginx.conf
Currently nginx.conf is generated with cogapp, so it's just a
matter of adding
# [[[cog https("thok.site") ]]]
# [[[end]]]
and rerunning cogapp to expand it in place.
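I haven't included the macro itself here, but what it expands to is presumably a bog-standard pair of nginx server blocks, something like this (cert paths per certbot's usual layout; an illustration, not the macro's literal output):

server {
    listen 443 ssl;
    server_name thok.site;
    root /www/thok.site;
    ssl_certificate     /etc/letsencrypt/live/thok.site/fullchain.pem;
    ssl_certificate_key /etc/letsencrypt/live/thok.site/privkey.pem;
}
server {
    # the explicit 301 promotion from http to https
    listen 80;
    server_name thok.site;
    return 301 https://thok.site$request_uri;
}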
Kick nginx
make reload in the same Makefile, which just does
$ podman exec systemd-nginxpod nginx -s reload
Done! Check it...
At this point, the site is live. (Yes, the very site you're reading this on; the previous sites all had debugging steps that made the notes a lot less clear, so I didn't have a clean set of directions previously...) Check it in a client browser, and add it to whatever monitoring you have.
Conclusions
So we now have a relatively simple path from "an idea and some
writing" to "live website with basic presentation of content".  A bit
too much copy-and-paste currently, and the helper Makefile really
needs to be parameterized or become an outright standalone tool.
(Moving certbot to a separate pod also needs investigating.)  Now
back to the original tasks of moving web servers off of old hardware,
and pontificating (er, actually blogging)!
- 
Not yet as automated as I'd like, but currently using Ned Batchelder's Cog to write macros in python and have them update the nginx config in-place in the same file. Eliminates a bunch of data entry errors, but isn't quite an automatic "find the content directories and infer the config from them" - though it is the kind of rope that could become that. ↩
- 
In general, you want to "replace" a file by creating a temporary (in the same directory) then renaming it to the correct name; this causes the name to point to the new inode, and noone ever sees a partial version - either the new one, or the old one, because no "partial" file even exists, it's just a substitution in the name to inode mapping. There are a couple of edge cases, though - if the existing file has permissions that you can't recreate, if the existing file has hardlinks, or this one where it has bind mounts. Some editors, like emacs, have options to detect the multiple-hard-links case and trade off preserving the links against never corrupting the file; this mechanism won't detect bind mounts, though in theory you could find them in/proc/mounts. ↩
- 
While this is a little messy for a single config file, it would be a reasonable direction to skip the symlinks and just have a top-level config file inside the container include subdir/*.conf to pick up all of the (presumably generated) files there, one per site. This is only an organizational convenience, the resulting configuration is identical to having the same content in-line, and it's not clear there's any point to heading down that path instead of just generating them automatically from the content and never directly editing them in the first place. ↩
- 
The PublishPort option just makes the "local" aliases for ports 80 and 443 appear inside the container as 80 and 443; there's a separate pod-forward-web-ports.service that runs the nftables commands (with root) as a "oneshot" systemd System Service. ↩
- 
In my case, that means "update the zone file with emacs, so it auto-updates the serial number" and then push it to my CVS server; then get all of the actual servers to CVS pull it and reload bind. ↩ 
I posted1 a poll on mastodon:
What's your choice for an internet-facing web server, in 2024? Security over performance, ease (or lack) of configuration is a bonus; presence in Debian or Ubuntu preferred but anything I can build from source (so, probably not written in go) and has a good CVE story is of interest.
In the initial set I included Apache, nginx, lighttpd, caddy,
and webfs (based on them showing up in
popcon.)  So far nginx is in the lead
with caddy and Apache surprisingly close to tied, but the
fascinating bit was the followups about servers that either I hadn't
heard of, or didn't realize qualified.  (It got boosted early on by
Tim Bray and
Glyph which got it much broader
attention than I expected, which I believe really helped reach people
who provided some of the more unusual followups.)
Twisted Python
Twisted is actually packaged and has decades of usage - it just wasn't
tagged with the httpd virtual package in debian so it didn't come up
in my original search.  (It also doesn't currently include any config
to run by default, but really it would just be a basic .service file
to invoke twisted web with some arguments.)  Glyph points
out
that twisted's TLS support is in C, but that parsing HTTP with C in
2024 is just asking for trouble.
Kestrel+YARP
Kestrel is the web server component of DotNet Core - this combination handles all of the app service front end traffic for Azure.
YARP itself is a standalone MIT-licensed reverse proxy written in C# (nothing to do with Edgar Wright.)
Thanks to Blake Coverett for pointing this one out (they used it under Debian and Ubuntu in production!) but the DotNet ecosystem is pretty far outside my comfort zone/tech bubble.
OpenBSD httpd
OpenBSD ships a default http server with strong security and simple configuration. This does look solid and would be high on the list if I were running OpenBSD - there's some risk that it uses OpenBSD's advanced isolation features in ways that a naïve Linux port might not get right, but if I find an active one I'll look further.
There's an AsiaBSDCon 2015
paper which
describes the history of it replacing nginx (which itself replaced
an Apache 1 fork) as the native OpenBSD web server; this includes a
long discussion of their attempts to harden nginx that are worth a
look in terms of secure software development challenges.
haproxy
Marko
Karppinen
pointed out that haproxy (which is packaged but doesn't Provides:
httpd either) actually works directly as a web server - no direct
file support, but it can terminate HTTPS
connections and
pass the connections on to HTTP backends.  (As of haproxy 2.8,
acme.sh can update a running
haproxy
directly, without disruptive restarts.)
Traefik
Gigantos pointed out that Traefik can also terminate HTTPS directly, and has builtin ACME (Let's Encrypt) support as well as being able to do service discovery instead of needing direct per-site configuration - depending on the shape of those providers, that might not end up being less work, but it's arguably putting the information in a more correct place.
NGINX Unit
PointlessOne suggested that for very dynamic backends, NGINX Unit was worth a look - it supports a huge variety of languages while still having attention on security and performance.
Apache with mod_md
Most of the comments on apache were about how "it still works" and had
decades of attention, but Marcus
Bointon
pointed out
mod_md
which adds ACME support directly as an Apache Module (shipped with
apache since 2.4.30, which predates Ubuntu 20.04, so it's been around for
a while), defaulting to Let's Encrypt.  (He goes on to complain about
the lack of HTTP/3 support, but from my perspective it's evidence that
Apache isn't standing still after all.)
Lighttpd
There was actually one vote against lighttpd from Chris Siebenmann as having stagnated too much to seriously consider for new deployments. (It does still get active development but I'm going for general impressions here and this one was interesting.)
h2o
FunkyBob chimed in near the end of the survey with h2o (in front of Django.) h2o turns out to be
- MIT licensed
- Written in C
- Responds reasonably to CVEs
- Used to do releases on github but now takes the interesting approach that ... each commit to master branch is considered stable and ready for general use ...
- Packaged in ubuntu and debian (also without Provides: httpd, but it's a 2018 version with a bunch of cherry-picked fixes that look like they came from upstream, so I'm not sure how "actually" up-to-date that version is - late 2023 best-case, though).
- Also available as a library, which is common in go projects but a lot more unusual in C servers.
I thought I'd never heard of it, but I'd starred it on github at some unknown point.
Conclusions
Primarily Confirmation
- There's more life in Apache than I'd realized (mod_md in particular)
- nginx is still the mainstream choice
- caddy is definitely up-and-coming with an enthusiastic community
Actual final numbers: 491 people responded.
- 19% Apache
- 53% nginx
- 2% lighttpd
- 21% Caddy
- 0% webfs
- 4% other/explain
Unexpected Highlights
Not going to do another survey on them, but I was pleased (and surprised) at the number of serious alternatives that turned up, including a few things that I knew about but didn't realize were legitimate answers to my question:
- Twisted Web (including python3-txacme)
- Nginx UNIT
- haproxy
- Traefik
- Kestrel+YARP (dotnet)
Personal Decisions
Part of the motivation for the survey was that I was stuck on an upgrade path for some old blogs and project sites. While that sounds low-value, it's also my playground for professional builds and recommendations, so I take it way more seriously than I probably should...
While the survey results didn't give me a final answer (nor were they intended to) they did reduce some fretting and lead me to a more direct plan:
- put a bounded amount of time into building caddy to my latest-from-source standards
- prototype something with Twisted Web, particularly for the fast path "idea → domain registration → publication" projects, and see how it feels for more conventional use
- fall back to nginx if I don't get anywhere in a week.
What Actually Happened
Since I wanted to get at least one blog up and running quickly to publish this article, I took a shorter path:
- Installed blag which is probably the least-effort markdown blog to get going2
- Used my draft caddy-in-podman notes to do a quick nginx-in-podman, rootless
- Used nftables NAT support to forward 80/443 to the podman published ports.
That's just on my laptop but by the time you read this it'll be transplanted to a real server.
The key here is that the nginx-in-podman bit is just the server:
- it bind-mounts nginx.conf
- it bind-mounts a multiple-domain content directory
so content and operation are relatively separated, a new server can be
tested with the live content, and more importantly - if I succeed in
my caddy building efforts, I can drop in a caddy-in-podman container
and "effortlessly" swap from nginx to caddy without actually any real
sysadmin effort beyond a podman stop/podman run (which also leaves
me a quick path to rolling back to the working version.)  Yes, this is
the whole promise of container-based modularity, but I needed to see
it scale down without a bunch of larger scale complexity.3