podman
The ultimate goal of the Popular Web servers discussion was to make up my mind about what to actually run. The diversity of options made me realize that SSL termination and web serving are an inherently modular thing, and since I wanted some amount of isolation for them anyway, this would be a good opportunity to get comfortable with podman.
What shape is the modularity?
The interface has three basic legs:
- Listen on tcp ports 80 and 443
- Read a narrow collection of explicitly exported files
- (later) connect to other "service" containers.
(Some of the options in the survey, like haproxy, only do the "listen" and "connect" parts, but that just reduces the "read files" part to running a "static files only" file server container (which has access to the collection of files) and having haproxy connect to that. For the first pass I'm not actually going to do that, but it's good to know in advance that this "shape" works.)
Listening ports
If I'm running this without privileges, how is it going to use traditionally "reserved" ports? Options include
- have systemd listen on them and pass a file handle in to the container
- run a socat service to do the listening and reconnecting
- lower /proc/sys/net/ipv4/ip_unprivileged_port_start from 1024 to 79
- use firewall rules to "translate" those ports to some higher-numbered ones.
I actually used the last one: a pair of nft commands, run from a Type=oneshot systemd service file, each adding a rule of the form nft add rule ip nat PREROUTING tcp dport 80 redirect to (the corresponding unprivileged target port). This seemed like the simplest bit of limited privilege to apply to this problem, as well as being efficient (no packet copying outside the kernel, just NAT address rewriting), but do let me know if there's some other interface that would also do this.
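As a minimal sketch, here is a small Python helper that renders that pair of rules as command lines, for example to paste into ExecStart= lines of a Type=oneshot unit (oneshot units may have several). The 8080/8443 target ports are assumptions, not the ports used on this server, and the rules assume an ip nat table with a PREROUTING chain already exists.

```python
# Hypothetical sketch: render nft rules that redirect the privileged web
# ports to unprivileged ones. The 8080/8443 targets are assumptions.
import shlex

# privileged port -> unprivileged port the container actually publishes
REDIRECTS = {80: 8080, 443: 8443}


def nft_redirect_commands(redirects: dict[int, int]) -> list[list[str]]:
    """Return one `nft add rule ...` argv per port mapping.

    Assumes an `ip nat` table with a PREROUTING chain (hooked at
    prerouting) has already been created.
    """
    commands = []
    for public, private in sorted(redirects.items()):
        commands.append([
            "nft", "add", "rule", "ip", "nat", "PREROUTING",
            "tcp", "dport", str(public), "redirect", "to", f":{private}",
        ])
    return commands


if __name__ == "__main__":
    for argv in nft_redirect_commands(REDIRECTS):
        # Print shell-ready lines, e.g. for ExecStart= in a oneshot unit.
        print(shlex.join(argv))
```

Generating argv lists rather than strings keeps the same data usable with subprocess.run() if you would rather apply the rules from Python directly.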
Reading a set of files
docker and podman both have simple "volume" (actually "bind mount") support to mount an outside directory into the container; this also gives us some administrative options on the outside, like moving around the disks that the files are on, or combining multiple directories, without changing the internals at all.
Currently, the directory is mounted as /www inside the container, and I went with the convention of /www/example.com to have a directory for each FQDN. (For now this means a bunch of copy-and-paste in the nginx.conf, but eventually it should involve more automation than that, though possibly on the outside.)[1]
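A first step toward that automation could be discovering the per-FQDN directories from the outside. This is a hypothetical sketch, not something the current setup does. The "name contains a dot" test for what counts as a site directory is an assumption.

```python
# Hypothetical sketch: treat each subdirectory of the content root whose
# name looks like a domain (contains a dot, isn't hidden) as a site.
from pathlib import Path


def discover_sites(root: Path) -> list[str]:
    """Return the sorted FQDN directory names directly under `root`."""
    return sorted(
        entry.name
        for entry in root.iterdir()
        if entry.is_dir()
        and "." in entry.name          # heuristic: looks like a domain name
        and not entry.name.startswith(".")  # skip .git and friends
    )
```

Called as discover_sites(Path("/www")), the result could then feed whatever generates the nginx.conf stanzas.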
To allow adding new sites without restarting the container, nginx.conf is also mounted from the outside, as a single-file bind mount. Using podman exec to run nginx -s reload applies the changes without restarting the container, and keeping the config outside allows it to be generated automatically there, without giving the container itself any access to change its own configuration.
Connecting to other containers
(Details to follow, pending actually using this feature; for now it's sufficient to know that the general model makes sense.)
Why podman over docker?
podman has a bunch of interesting advantages over docker:
- actual privilege isolation: docker itself manages access to a service that does all of the work as root; podman makes much more aggressive use of namespaces, and doesn't have a daemon at all, which also makes it easier to manage the containers themselves.
- podman started enough later than docker that its developers could make better design choices simply by looking at what went wrong with docker and avoiding it, while still maintaining enough compatibility that experience with one translates easily into success with the other. From a unix perspective, it's less "emacs vs vi" and more "nvi vs vim".
Mount points
Originally I did the obvious Volume mount of nginx.conf from the git checkout into /etc/nginx/nginx.conf inside the container.
Inconveniently, but correctly[2], doing a git pull to change that file does the usual atomic replace, so there's a new file (with a new inode number), but the existing mount point still refers to the old inode.
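The alternative approach is to mount the subdirectory that contains the conf file, so that lookups by name inside the container always find whichever file currently has that name, and then symlink that file into place inside the container.[3]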
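The inode behaviour is easy to see from Python. This is a minimal sketch using a temporary directory. An open file handle stands in for the bind mount, since both keep referring to the original inode. atomic_replace() is a hypothetical helper that writes a temp file in the same directory and renames it over the target.

```python
# Demonstrates why a single-file bind mount goes stale: an atomic replace
# (temp file + rename) gives the name a new inode, while anything holding
# the old inode (here an open file, standing in for the mount) keeps it.
import os
import tempfile


def atomic_replace(path: str, data: str) -> None:
    """Replace `path` by writing a temp file alongside it and renaming it."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp = tempfile.mkstemp(dir=directory)
    try:
        with os.fdopen(fd, "w") as f:
            f.write(data)
        os.replace(tmp, path)  # atomic rename within one filesystem (POSIX)
    except BaseException:
        os.unlink(tmp)
        raise


def inode_changes_on_replace() -> bool:
    with tempfile.TemporaryDirectory() as d:
        conf = os.path.join(d, "nginx.conf")
        with open(conf, "w") as f:
            f.write("old\n")
        before = os.stat(conf).st_ino
        with open(conf) as held:  # stands in for the bind mount's reference
            atomic_replace(conf, "new\n")
            after = os.stat(conf).st_ino
            stale = held.read()   # still reads the old inode's contents
        return before != after and stale == "old\n"


if __name__ == "__main__":
    print(inode_changes_on_replace())
```

Mounting the directory instead avoids the problem because the name-to-inode lookup then happens inside the container at each open.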
LetsEncrypt
We need the certbot and python3-certbot-nginx packages installed in the pod. python3-certbot-nginx handles adjusting the nginx config during certbot operation (see github:certbot/certbot for the guts of it).
Currently, we stuff these into the primary nginx pod, because certbot needs control of the live webserver in order to prove that it controls the live webserver.
When used interactively, certbot tells you that "Certbot has set up a scheduled task to automatically renew this certificate in the background." What this actually means is that it provides a crontab entry (in /etc/cron.d/certbot) and a systemd timer (certbot.timer), which is great... except that in our podman config, we run nginx as pid 1 of the container, don't run systemd, and don't even have cron installed. Not a problem: we just create the crontab externally, and have it run certbot under podman periodically.
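Here is a sketch of what that external entry might look like, rendered by a small Python helper. The daily 03:17 schedule and the certbot renew flags are assumptions. --deploy-hook is included in case the stored renewal config doesn't already reload nginx. Because rootless podman containers are per-user, the line would go into the crontab (crontab -e) of the same user that runs the container.

```python
# Hypothetical sketch: the host-side crontab line that renews certificates
# inside the running container. Schedule and certbot flags are assumptions;
# the container name is the one used in this post.
import shlex

CONTAINER = "systemd-nginxpod"


def renewal_cron_line(minute: int = 17, hour: int = 3) -> str:
    """Return a user-crontab line (five time fields, no user field)."""
    if not (0 <= minute < 60 and 0 <= hour < 24):
        raise ValueError("minute or hour out of range")
    argv = [
        "podman", "exec", CONTAINER,
        "certbot", "renew", "--quiet",
        # Assumption: reload nginx (pid 1 in the container) after a renewal.
        "--deploy-hook", "nginx -s reload",
    ]
    return f"{minute} {hour} * * * {shlex.join(argv)}"


if __name__ == "__main__":
    print(renewal_cron_line())
```

certbot renew only acts on certificates close to expiry, so running it daily is cheap.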
Quadlets
Quadlets are just a new type of systemd "Unit file" with a new [Container] section; everything from the podman command line should be expressible in the .container file. For the nginx case, we just need Image=, PublishPort=[4], and a handful of Volume= stanzas.
Note that if you can run the podman commands as yourself, the .container Unit can also be a systemd "User Unit" that doesn't need any additional privileges (you may need loginctl enable-linger, but with Ubuntu 24.04 I didn't actually need it).
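Below is a sketch of such a unit, rendered from Python so the repeated keys are explicit. The image name, host ports, and host paths are all hypothetical placeholders. The text is built by hand because Volume= and PublishPort= can repeat, and configparser rejects duplicate keys. For a User Unit, Quadlet picks files up from ~/.config/containers/systemd/ after systemctl --user daemon-reload. Unless ContainerName= is set, the container defaults to systemd-%N, which would explain a name like systemd-nginxpod for a nginxpod.container file.

```python
# Hypothetical sketch: render a Quadlet .container unit. Image, ports, and
# host paths are illustrative assumptions, not this server's real values.
from pathlib import Path


def render_container_unit(image: str, ports: list[str], volumes: list[str]) -> str:
    """Build the INI-style unit text by hand; keys may repeat."""
    lines = ["[Container]", f"Image={image}"]
    lines += [f"PublishPort={p}" for p in ports]
    lines += [f"Volume={v}" for v in volumes]
    lines += ["", "[Install]", "WantedBy=default.target", ""]
    return "\n".join(lines)


unit = render_container_unit(
    image="localhost/nginx-certbot:latest",  # hypothetical image with certbot
    ports=["8080:80", "8443:443"],           # assumed unprivileged host ports
    volumes=[
        "/srv/www:/www:ro",                       # hypothetical content root
        "/srv/nginx-conf:/etc/nginx/external:ro",  # hypothetical config dir
        "/srv/letsencrypt:/etc/letsencrypt",      # keep certbot state across restarts
    ],
)

# Where a rootless (User Unit) Quadlet file would live.
target = Path.home() / ".config/containers/systemd/nginxpod.container"

if __name__ == "__main__":
    print(unit)
```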
Walkthrough of adding a new site on a new FQDN
DNS
Start with DNS. Register a domain (in this case, thok.site), get it pointed to your nameserver[5], and have that nameserver point to the webserver.
Certbot
Once certbot is registered,
$ podman exec systemd-nginxpod certbot certonly --nginx --domain thok.site
takes a little while and then gets the certificate. Note that at this point, the nginx running in that pod knows nothing about the domain; certbot is doing all the work.
Get the site content
I have a Makefile that runs git clone to get the site source, or git pull if it's already present, and then uses ssite build to generate the HTML in a separate directory (that the nginx pod has mounted).
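The "clone or pull, then build" logic can be sketched in Python as a list of commands. This is a hypothetical translation, not the actual Makefile. The post doesn't give ssite's arguments for choosing the output directory, so running a bare ssite build inside the checkout is an assumption.

```python
# Hypothetical sketch of the Makefile's "clone if missing, else pull, then
# build" step. Repo URL and paths are placeholders; ssite's output-directory
# arguments aren't given in the post, so only `ssite build` is shown.
import subprocess
from pathlib import Path
from typing import Optional

Command = tuple[list[str], Optional[Path]]  # (argv, working directory)


def update_commands(repo_url: str, checkout: Path) -> list[Command]:
    """Return the commands to fetch or refresh the source, then build it."""
    if (checkout / ".git").is_dir():
        fetch: Command = (["git", "-C", str(checkout), "pull"], None)
    else:
        fetch = (["git", "clone", repo_url, str(checkout)], None)
    build: Command = (["ssite", "build"], checkout)  # assumed invocation
    return [fetch, build]


def run(commands: list[Command]) -> None:
    """Execute the commands in order, stopping at the first failure."""
    for argv, cwd in commands:
        subprocess.run(argv, cwd=cwd, check=True)
```

Checking for .git is what makes the step idempotent: rerunning it on an existing checkout only pulls.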
Update the common nginx.conf
Currently nginx.conf is generated with cogapp, so it's just a matter of adding
# [[[cog https("thok.site") ]]]
# [[[end]]]
and rerunning cogapp to expand it in place.
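The body of the https() macro isn't shown here. As a hypothetical stand-in, a Python function like the following could produce the expansion. It emits a plain-HTTP redirect plus a TLS server block, using certbot's default /etc/letsencrypt/live/<domain>/ certificate paths and the /www/<fqdn> content convention, and it assumes the output lands inside nginx.conf's http { } block.

```python
# Hypothetical stand-in for the https() cog macro (its real body isn't
# shown): an HTTP->HTTPS redirect plus a TLS server block for one FQDN,
# using certbot's default certificate paths and the /www/<fqdn> layout.
def render_https(fqdn: str) -> str:
    live = f"/etc/letsencrypt/live/{fqdn}"
    return f"""\
server {{
    listen 80;
    server_name {fqdn};
    return 301 https://$host$request_uri;
}}
server {{
    listen 443 ssl;
    server_name {fqdn};
    ssl_certificate {live}/fullchain.pem;
    ssl_certificate_key {live}/privkey.pem;
    root /www/{fqdn};
}}
"""


# In nginx.conf, a cog block could emit it with something like:
#   # [[[cog cog.out(render_https("thok.site")) ]]]
#   # [[[end]]]
```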
Kick nginx
Run make reload from the same Makefile, which just does
$ podman exec systemd-nginxpod nginx -s reload
Done! Check it...
At this point, the site is live. (Yes, the very site you're reading this on; the previous sites all had debugging steps that made the notes a lot less clear, so I didn't have a clean set of directions previously...) Check it in a client browser, and add it to whatever monitoring you have.
Conclusions
So we now have a relatively simple path from "an idea and some writing" to "live website with basic presentation of content". There's a bit too much copy-and-paste currently, and the helper Makefile really needs to be parameterized or become an outright standalone tool. (Moving certbot to a separate pod also needs investigating.) Now back to the original tasks of moving web servers off of old hardware, and pontificating, er, actually blogging!
[1] Not yet as automated as I'd like, but I'm currently using Ned Batchelder's Cog to write macros in Python and have them update the nginx config in place in the same file. This eliminates a bunch of data entry errors, but isn't quite an automatic "find the content directories and infer the config from them"; it is, however, the kind of rope that could become that.
[2] In general, you want to "replace" a file by creating a temporary file (in the same directory) and then renaming it to the correct name. This makes the name point to the new inode, and no one ever sees a partial version, only the new one or the old one, because no "partial" file ever exists under that name; it's just a substitution in the name-to-inode mapping. There are a couple of edge cases, though: if the existing file has permissions that you can't recreate, if the existing file has hard links, or this one, where it has bind mounts. Some editors, like emacs, have options to detect the multiple-hard-links case and trade off preserving the links against never corrupting the file; this mechanism won't detect bind mounts, though in theory you could find them in /proc/mounts.
[3] While this is a little messy for a single config file, a reasonable direction would be to skip the symlinks and just have a top-level config file inside the container include subdir/*.conf to pick up all of the (presumably generated) files there, one per site. This is only an organizational convenience: the resulting configuration is identical to having the same content inline, and it's not clear there's any point to heading down that path instead of just generating the config automatically from the content and never editing it directly in the first place.
[4] The PublishPort option just makes the "local" aliases for ports 80 and 443 appear inside the container as 80 and 443; there's a separate pod-forward-web-ports.service that runs the nft commands (with root) as a "oneshot" systemd System Service.
[5] In my case, that means "update the zone file with emacs, so it auto-updates the serial number", then push it to my CVS server, then get all of the actual servers to CVS pull it and reload bind.