podman

Podman, Nginx, and Interchangable Parts

The ultimate goal of the Popular Web servers discussion was to actually make up my mind as to what to actually run. The diversity of options made me realize that SSL termination and web serving was an inherently modular thing, and since I wanted some amount of isolation for it anyway, this would be a good opportunity to get comfortable with podman.

What shape is the modularity?

The interface has three basic legs:

Listen on tcp ports 80 and 443
Read a narrow collection of explicitly exported files
(later) connect to other "service" containers.

(Some of the options in the survey, like haproxy, only do the "listen" and "connect" parts, but that reduces the "read files" part to running a "static files only" file server container (which has access to the collection of files) and having haproxy connect to that. For the first pass I'm not actually going to do that, but it's good to know in advance that this "shape" works.)

Listening ports

If I'm running this without privileges, how is it going to use traditionally "reserved" ports? Options include

have systemd listen on them and pass a filehandle in to the container
run a socat service to do the listening and reconnecting
lower /proc/sys/net/ipv4/ip_unprivileged_port_start from 1024 to 79
use firewall rules to "translate" those ports to some higher numbered ones.

I actually used the last one: a pair of nft commands, run in a Type=oneshot systemd service file to add rules that add rule ip nat PREROUTING tcp dport 80 redirect to (each unprivileged target port). This seemed like the simplest bit of limited privilege to apply to this problem, as well as being efficient (no packet copying outside the kernel, just NAT address rewriting) - but do let me know if there's some other interface that would also do this.

Reading a set of files

docker and podman both have simple "volume" (actually "bind mount") support to mount an outside directory into the container; this also gives us some adminstrative options on the outside, like moving around the disks that the files are on, or combining multiple directories, without changing the internals at all.

Currently, the directory is mounted as /www inside the container, and I went with the convention of /www/example.com to have a directory for each FQDN. (For now this means a bunch of copy&paste in the nginx.conf but eventually it should involve some more automation than that, though possibly on the outside.)¹

In order to enable adding new sites without restarting the container, the nginx.conf is also mounted from the outside, as a single-file bind mount - using exec to nginx -s reload avoids restarting the container to apply the changes, allows for automatic generation of the config from outside, without allowing the container itself access to change the configuration.

Connecting to other containers

(Details to follow, pending actually using this feature; for now it's sufficient to know that the general model makes sense.)

Why podman over docker?

podman has a bunch of interesting advantages over docker:

actual privilege isolation - docker itself manages access to a service that does all of the work as root; podman actually makes much more aggressive use of namespaces, and doesn't have a daemon at all, which also makes it easier to manage the containers themselves.
podman started enough later than docker that they were able to make better design choices simply by looking at things that went wrong with docker and avoid them, while still maintaining enough compatibility that it remained easy to translate experience with one into success with the other - from a unix perspective, less "emacs vs vi" and more "nvi vs vim".

Mount points

Originally I did the obvious Volume mount of nginx.conf from the git checkout into /etc/nginx/nginx.conf inside the container. Inconveniently - but correctly² - doing git pull to change that file does the usual atomic-replace, so there's a new file (and new inode number) but the old mount point is still pointing to the old inode.

The alternative approach is to mount a subdirectory with the conf file in it, and then symlink that file inside the container.³

LetsEncrypt

We need the certbot and python3-certbot-nginx packages installed in the pod. python3-certbot-nginx handles adjusting the nginx config during certbot operation (see github:certbot/certbot for the guts of it.

Currently, we stuff these into the primary nginx pod, because it needs to control the live webserver to show that it controls the live webserver.

When used interactively, certbot tells you that "Certbot has set up a scheduled task to automatically renew this certificate in the background." What this actually means is that it provides a crontab entry (in /etc/cron.d/certbot) and a system timer (certbot.timer) which is great... except that in our podman config, we run nginx as pid 1 of the container, don't run systemd, and don't even have cron installed. Not a problem - we just create the crontab externally, and have it run certbot under podman periodically.

Quadlets

Quadlets are just a new type of systemd "Unit file" with a new [Container] section; everything from the podman commandline should be expressible in the .container file. For the nginx case, we just need Image=, PublishPort=⁴, and a handful of Volume= stanzas.

Note that if you could run the podman commands as you, the .container Unit can also be a systemd "User Unit" that doesn't need any additional privileges (possibly a loginctl enable-linger but with Ubuntu 24.04 I didn't actually need that.)

Walkthrough of adding a new site on a new FQDN

DNS

Start with DNS. Register a domain (in this case, thok.site, get it pointed to your nameserver⁵, have that nameserver point to the webserver.

Certbot

Once certbot is registered,

$ podman exec systemd-nginxpod certbot certonly --nginx --domain thok.site

takes a little while and then gets the certificate. Note that at this point, the nginx running in that pod knows nothing about the domain; certbot is doing all the work.

Get the site content

I have a Makefile that runs git clone to get the site source, or git pull if it's already present, and then uses ssite build to generate the HTML in a separate directory (that the nginx pod has mounted.)

Update the common nginx.conf

Currently nginx.conf is generated with cogapp, so it's just a matter of adding

# [[[cog https("thok.site") ]]]
# [[[end]]]

and rerunning cogapp to expand it in place.

Kick nginx

make reload in the same Makefile, which just does

$ podman exec systemd-nginxpod nginx -s reload

Done! Check it...

At this point, the site is live. (Yes, the very site you're reading this on; the previous sites all had debugging steps that made the notes a lot less clear, so I didn't have a clean set of directions previously...) Check it in a client browser, and add it to whatever monitoring you have.

Conclusions

So we now have a relatively simple path from "an idea and some writing" to "live website with basic presentation of content". A bit too much copy-and-paste currently, and the helper Makefile really needs to be parameterized or become an outright standalone tool. (Moving certbot to a separate pod also needs investigating.) Now back to the original tasks of moving web servers off of old hardware, and ~~pontificating~~ actually blogging!

Not yet as automated as I'd like, but currently using Ned Batchelder's Cog to write macros in python and have them update the nginx config in-place in the same file. Eliminates a bunch of data entry errors, but isn't quite an automatic "find the content directories and infer the config from them" - but it is the kind of rope that could become that. ↩
In general, you want to "replace" a file by creating a temporary (in the same directory) then renaming it to the correct name; this causes the name to point to the new inode, and noone ever sees a partial version - either the new one, or the old one, because no "partial" file even exists, it's just a substitution in the name to inode mapping. There are a couple of edge cases, though - if the existing file has permissions that you can't recreate, if the existing file has hardlinks, or this one where it has bind mounts. Some editors, like emacs, have options to detect the multiple-hard-links case and trade off preserving the links against never corrupting the file; this mechanism won't detect bind mounts, though in theory you could find them in /proc/mounts. ↩
While this is a little messy for a single config file, it would be a reasonable direction to skip the symlinks and just have a top-level config file inside the container include subdir/*.conf to pick up all of the (presumably generated) files there, one per site. This is only an organizational convenience, the resulting configuration is identical to having the same content in-line, and it's not clear there's any point to heading down that path instead of just generating them automatically from the content and never directly editing them in the first place. ↩
The PublishPort option just makes the "local" aliases for ports 80 and 443 appear inside the container as 80 and 443; there's a separate pod-forward-web-ports.service that runs the nftable commands (with root) as a "oneshot" systemd System Service. ↩
In my case, that means "update the zone file with emacs, so it auto-updates the serial number" and then push it to my CVS server; then get all of the actual servers to CVS pull it and reload bind. ↩

2024-10-13

Topics: https podman LetsEncrypt