THOK operational notes

Historically, my online blog writing has always been distracted/dominated by the "plumbing" and other technical writing about blogging1, rather than writing itself2. Since I'm in the process of setting up new simplified infrastructure, I'm going to try that trick again3 but this time I already have a steady stream of things to write about because they keep leaking into the Rule 3 project, and this pile of notes will keep them out of there - this is about Personal Infrastructure4 while that one is more Pontificating About Engineering and Infrastructure (for your next startup5.)


  1. About That Blogging 

  2. More About That Blogging 

  3. Notes from the Management 

  4. Otherwise known as "a homelab with delusions of grandeur", in the process of turning from an OpenAFS cell with a pair of 4x2T RAID-4 HP Proliant boxes into a single ASUSstor with 6x2T RAID-6 (7.3T) as a unix filesystem. 

  5. Mekinok was a 2001 "instant infrastructure" startup; one of our lasting contributions was OpenAFS packaging in Debian to make setting up an AFS Cell (with Kerberos support) much less mysterious - which was useful to both TunePrint and MetaCarta over the next decade, and probably others. 

One problem with having decent iptables compatibility layers for nftables is that there's a dearth of documentation for the new stuff; even 2024 stackoverflow questions (and answers) just use the iptables commands underneath. So, when I first set up the "obvious" nft commands to redirect :80 and :443 to internal non-privileged ports (which then the "quadlet" PublishPort= directive would re-namespace internally so that nginx would see them as 80 and 443 again) it worked, but only from off-box - connections (to the dns name and thus the outside IP address) from on box didn't go through that path. Not great, but most of my searching turned up incorrect suggestions of turning on route_localnet so I just left it that way.

Then a Christmas Day power outage led me to discover that there was a race condition between the primary nftables service starting up and populating the "generic" tables, and my service adding to those tables, which caused the forwarding rules to fail to start.1 At least the command error showed up clearly in journalctl.

Application-specific tables

The one other service that I found using nftables is sshguard2 - which doesn't depend on the service starting at all. Instead, sshguard.service does nft add table in an ExecStartPre and a corresponding nft delete table in ExecStopPost - then the sshguard command itself just adds a blacklist chain and updates an "attackers" set directly. This doesn't need any user-space parts of nftables to be up, just the kernel.

I updated my service file to do the same sort of thing, but all expanded out inline - used ExecStartPre to add table (and add chain), matching delete table in ExecStopPost, and then apply the actual port redirects with ExecStart. The main rules were unchanged - a prerouting tcp dport 443 redirect worked the way it did before, only handling outside traffic; the new trick was to add an oifname "lo" tcp dport 443 redirect to an output chain, since locally-initiated traffic doesn't go through prerouting. (Likewise for port 80 since I still have explicit 301 redirects that promote http to https.)

The cherry on top was to add counter to all of the rules - so I could see that each rule was firing for the appropriate kind of traffic, just by running curl commands bracketed by nft list table ... |grep counter and seeing the numbers go up.
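The before/after counter check can also be scripted. This is only a sketch, not something from my actual setup: it assumes you have saved two copies of the nft list table output (one before the curl, one after) and it relies on the "counter packets N bytes M" text that nft prints for a rule with a counter. The function names are made up for illustration.

```python
import re

# Matches the "counter packets N bytes M" fragment that nft prints for a rule.
COUNTER_RE = re.compile(r"counter packets (\d+) bytes (\d+)")


def parse_counters(listing: str) -> dict[str, int]:
    """Map each rule (with its counter values blanked out) to its packet count."""
    counts = {}
    for line in listing.splitlines():
        match = COUNTER_RE.search(line)
        if match:
            # Blank the numbers so a rule gets the same key in both snapshots.
            rule = COUNTER_RE.sub("counter", line).strip()
            counts[rule] = int(match.group(1))
    return counts


def fired(before: str, after: str) -> dict[str, int]:
    """Return only the rules whose packet counters increased between snapshots."""
    old, new = parse_counters(before), parse_counters(after)
    return {rule: new[rule] - old.get(rule, 0)
            for rule in new if new[rule] > old.get(rule, 0)}
```

Feeding it the listing from before and after a local curl should show only the output-chain rule; doing the same around an off-box request should show only the prerouting rule.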

Victory, mostly

The main reason for fixing localhost is that it let me run a bunch of "the websites are serving what I expect them to be" tests locally rather than on yet another machine. This is a bit of all-eggs-in-one-basket, but I'll probably just include a basic "is that machine even up" checking on a specifically-off-site linode, or maybe a friend's machine - really, if the web server machine is entirely off line I don't really need precise detail of each service anyway.


  1. This is the "normal" systemd problem, where After= means "after that other service successfully starts" which isn't good enough for trivial .service files that don't explicitly cooperate. It might be possible to fix this by moving the nft -f /etc/nftables.conf from ExecStart= to ExecStartPre= (since it's a Type=oneshot service anyway) so that it exits before the job is considered "started" but the sshguard-inspired workaround is cleaner anyway. 

  2. sshguard is a simple, configurable "watchdog" that detects things like repeated ssh authorization failures (by reading the logs in realtime) and responds with various levels of blackhole routes, using nftables to drop the packets. Professional grade, as long as you have an alternate way back in if it locks you out. 

There is a feature request from 2019 which is surprisingly1 still open but not really going anywhere. There are scattered efforts to build pieces of it, so for clarity let's write down what I actually want.

What even is "cron-like behaviour"

The basic idea is that the output of a job gets dropped in my mailbox. This isn't because mail is suitable for this, just that it's a well established workflow and I don't need to build any new filtering or routing, and it's showing up at the right "attention level" - not interrupting me unless I'm already paused to check mail, easily "deferred" as unread, can handle long content.

Most uses fall into one of two buckets.

  • Long-timeline jobs (weekly backups, monthly letsencrypt runs) where I want to be reminded that they exist, so I want to see successful output (possibly with different subject lines.)
  • Jobs that run often but I don't want the reminder, only the failure reports (because I have a higher level way of noticing that they're still behaving - a monthly summary, or just "things are still working".)

The primary tools for this are

  • a working mail CLI
  • systemd timer files
  • systemd "parameterized service" files that get triggered by the timer failing (or passing.)

The missing pieces are how to actually collect the output.

Journal scraping?

We could just trust the journal - we can use journalctl --unit or --user-unit to pry out "the recent stuff" but if we can pass the PID of the job around, we can use _SYSTEMD_UNIT=xx _PID=yyy to get the relevant content.

(Hmm, we can pass %n into the mailing service (systemd.unit(5)), but not the pid?)

Separate capture?

Just run the program under script or chronic pointing the log to %t or %T, and generate it with things we know, and then OnFailure and OnSuccess can mail it and/or clean it up.

While it would be nice to do everything with systemd mechanisms, if we have to we can have the wrapper do all of the work so we have enough control.2
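For a sense of how small "the wrapper does all of the work" could be, here is a rough sketch. It is not what I ended up running; the names run_job and deliver are hypothetical, and it assumes a local MTA that provides /usr/sbin/sendmail. It covers both buckets from earlier: always mail the output, or only mail it when the job fails.

```python
import subprocess
from email.message import EmailMessage


def run_job(argv, subject, recipient, always_mail=False):
    """Run argv, capture stdout+stderr together, and build a mail if warranted.

    Returns (exit_status, message_or_None); message is None when the job
    succeeded and always_mail is False (the "failures only" bucket).
    """
    result = subprocess.run(argv, stdout=subprocess.PIPE,
                            stderr=subprocess.STDOUT, text=True)
    if result.returncode == 0 and not always_mail:
        return result.returncode, None
    message = EmailMessage()
    status = "ok" if result.returncode == 0 else f"FAILED ({result.returncode})"
    message["Subject"] = f"{subject}: {status}"
    message["To"] = recipient
    message.set_content(result.stdout or "(no output)\n")
    return result.returncode, message


def deliver(message):
    """Hand the message to the local MTA (assumes /usr/sbin/sendmail works)."""
    subprocess.run(["/usr/sbin/sendmail", "-t", "-oi"],
                   input=message.as_bytes(), check=True)
```

Keeping the "build the message" step separate from the "send it" step means the capture logic can be tried out on a machine that doesn't have mail delivery working yet.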

In the end

Once I started poking at the live system, I realized that I was getting ahead of myself - I didn't have working mail delivery.3 Setting up postfix took enough time that I decided against anything more clever for the services - so instead, I just went with a minimal .service file that did

WorkingDirectory=...
Type=exec
ExecStart=bash -c '(time ...) 2>&1 | mail -s "Weekly ..." ...'

and a matching .timer file with a variant on

[Timer]
OnCalendar=Monday *-*-* 10:00

The systemd.time(7) man page has a hugely detailed set of syntax examples, and if that's not enough, systemd-analyze calendar --iterations=3 ... shows you the next few actual times (displayed localtime, UTC, and as a human-readable relative time expression) so you can be confident about when your jobs will really happen.

For the initial services like "run an apt upgrade in the nginx container" I actually want to see all of the output, since Weekly isn't that noisy; for other services I'll mix in chronic and ifne so that it doesn't bother me as much, but for now, the confidence that things actually ran is more pleasing than the repetition is distracting.

I do want a cleaner-to-use tool at some point - not a more sophisticated tool, just something like "cronrun ..." that automatically does the capture and mail, maybe picks up the message subject from the .service file directly - so these are more readable. But for now, the swamp I'm supposed to be draining is "decommissioning two machines running an AFS cell" so I'm closing the timebox on this for now.


  1. but not unreasonably: "converting log output to mails should be outside of systemd's focus." 

  2. moreutils gives us chronic, ifne, and lckdo, and possibly mispipe and ts if we're doing the capturing. cronutils also has a few bits. 

  3. This was a surprise because this is the machine I'd been using as my primary mail client, specifically Gnus in emacs. Turns out I'd configured smtpmail-send-it so that emacs would directly talk to port 587 on fastmail's customer servers with authenticated SMTP... but I'd never gotten around to actually configuring the machine itself.

KPhotoAlbum has lots of built-in features, but in practice the more convenient1 way to interface with it from simple unix tools is to just operate on the index.xml where all of the metadata is stored.2

kpa-grep

I started kpa-grep back in 2011, around when I hit 90k pictures (I'm over 200k now.) The originally documented use case was kpa-grep --since "last week" --tags "office" which was probably for sorting work pictures out from personal ones. (The fuzzy timedateparser use was there from day one; since then, I'm not sure I've used anything other than "last week" or "last month", especially since I never implemented date ranges.) I've worked on it in bursts; usually there's feedback between trying to do something with a sub-gallery, trying to script it, and then enhancing kpa-grep to handle it. The most recent burst added two features, primarily inspired by the tooling around my Ice Cream Blog -

  • A sqlite-based cache of the XML file. Back in the day it took 6-8 seconds to parse the file; on a modern laptop with SSD and All The RAM it's more like 2.5 seconds. The sqlite processing takes a little longer than that but subsequent queries are near-instant, which makes it sensible to loop over kpa-grep output and do more kpa-grep processing on it. A typical "pictures are ready, create a dummy review post for the last ice cream shop with all pictures and some metadata" operation was over a minute without the cache, and is now typically 5-10 seconds even with a stale cache.
  • Better tag support - mostly fleshing out unimplemented combinations of options, but in particular allowing --tag and --since to filter --dump-tags, which let me pick out the most recent Locations which are tagged ice cream, filter out city names, and have a short list of ice cream shops to work with. (Coming soon: adding some explicit checks of them against which shops I've actually reviewed already.)

As far as I know I don't have any users, but nonetheless it is on github, so I've put some effort into keeping it clean3; recently that's also included coming up with a low-effort workflow for doing releases and release artifacts. This is currently a shell script involving debspawn build, dpkg-parsechangelog, and gh release upload which feels like an acceptable amount of effort for a single program with a man page.

pojkar

pojkar is a collection of Flickr upload tools that work off of KPhotoAlbum.4 The currently active tools are sync-to-flickr and auto-cropr.

sync-to-flickr

sync-to-flickr is the engine behind a simple workflow: when I'm reviewing photos in KPhotoAlbum, I choose particular images for posting by adding the Keyword tag flickr to the image. Once I've completed a set and quit out of KPhotoAlbum, I run sync-to-flickr sync which looks for everything tagged flickr, uploads it to Flickr with a title, description, and rotation (and possibly map coordinates, except there are none of those in the current gallery.) There's also a retry mechanism (both flickr's network and mine have improved in the last decade so this rarely triggers.) Once a picture has been uploaded, a flickd tag is added to it, so future runs know to skip it.

After all of that, the app collects up the tags for the posted set of pictures; since social media posting5 has length limits (and since humans are reading the list) we favor longer names and names that appear more often in the set; then we drop tags that are substrings of other tags (dropping Concord in favor of Concord Conservation Land since the latter implies the former well enough.) Finally we truncate the list to fit in a post.
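The tag-trimming steps are easier to see in code. The following is a sketch rather than the actual pojkar source: pick_tags is a name made up for illustration, and the exact weighting (frequency first, then length) and the ", " separator are my assumptions about one reasonable way to do it.

```python
from collections import Counter


def pick_tags(tags_per_picture, limit):
    """Choose tags for a post: prefer frequent and long names, drop tags that
    are substrings of other tags, then truncate to fit `limit` characters."""
    counts = Counter(tag for tags in tags_per_picture for tag in set(tags))
    # Drop any tag that is contained in a different tag.
    kept = [t for t in counts
            if not any(t != other and t in other for other in counts)]
    # Most frequent first, then longest; the name itself is a stable tie-break.
    kept.sort(key=lambda t: (-counts[t], -len(t), t))
    chosen, used = [], 0
    for tag in kept:
        cost = len(tag) + (2 if chosen else 0)  # account for the ", " separator
        if used + cost > limit:
            break
        chosen.append(tag)
        used += cost
    return chosen
```

With a set where every picture is tagged Concord and most are also tagged Concord Conservation Land, the shorter tag is dropped before the ranking even starts, so it never competes for space in the post.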

auto-cropr

Flickr has an obscure6 feature where you could select a rectangle on a picture (in the web interface) and add a "note" to that region. auto-cropr used the API to look for recent instances of that which contained a magic string - then picked up the geometry of the rectangle and cropped just that area, posting it as a new flickr picture - and then cross linking them, replacing the original comment with a link to the new image. Basically this let you draw the viewer's attention to a particular area and then let them click to zoom in on it and get more commentary as well as a "closeup".

Note that these "views" are only on Flickr, I don't download or back them up at all (I should fix that.)

fix-kpa-missing/kpa-insert

As part of the Nokia 6630 image fixing project there ended up being a couple of different cleanups which I needed to review carefully, so I wanted the tools to produce diffable changes, which lxml doesn't really guarantee7. Currently, the XML written out by KPhotoAlbum is pretty structured - in particular, any image with no tags is a one-line <image ... /> and I was particularly looking to make corrections to things that were fundamentally untagged8/untaggable (for fix-kpa-missing) or insert lines that were already one-line-per-picture, I just had to get them in the right place.

When I started the image recovery, I ended up just adding a bunch of images with their original datestamps (from 2005), but KPhotoAlbum just added them to the end of the index (since they were "new" and I don't use the sorting features.) So I had the correct lines for each image (with checksums and dimensions), I could just chop them out of the file. Then kpa-insert takes these lines, and walks through the main index as well. For basically any line that doesn't begin with <image it just copies it through unchanged to the new index; when it finds an image line, it grabs the attributes (startDate, md5sum, and pathname specifically) and then checks them against the current head of the insertion list9. Basically, if the head of the index was newer than the head of the insertions, copy insertions over until that's no longer true. If they match exactly - the original version just bailed so I could look at them, then once I figured out that they really were duplicates, I changed it to output rm commands for the redundant files (and only kept the "more original" line from the original index.)
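The merge loop looks roughly like this. This is a reconstruction for illustration, not the real kpa-insert: the attribute names (file, startDate, md5sum) and the assumption that startDate values sort correctly as plain strings are mine, and instead of printing rm commands it just returns the list of redundant files.

```python
import re

ATTR_RE = re.compile(r'(\w+)="([^"]*)"')


def image_key(line):
    """Return (startDate, md5sum, file) for a one-line <image .../> entry."""
    attrs = dict(ATTR_RE.findall(line))
    # Direct but brittle: a missing attribute raises KeyError and stops the run.
    return attrs["startDate"], attrs["md5sum"], attrs["file"]


def merge_index(index_lines, insert_lines):
    """Merge date-sorted insert_lines into index_lines by startDate.

    Returns (new_lines, duplicates); duplicates lists the file of each
    insertion that exactly matched an entry already in the index.
    """
    pending = list(insert_lines)
    merged, duplicates = [], []
    for line in index_lines:
        # The trailing space keeps the enclosing <images> tag from matching.
        if line.lstrip().startswith("<image "):
            date, md5, _ = image_key(line)
            # Copy insertions over while the index head is newer than theirs.
            while pending and image_key(pending[0])[0] < date:
                merged.append(pending.pop(0))
            # Exact match: keep the index's own line, note the redundant file.
            if pending and image_key(pending[0])[:2] == (date, md5):
                duplicates.append(image_key(pending.pop(0))[2])
        merged.append(line)
    if pending:
        raise ValueError(f"{len(pending)} insertions newer than every image")
    return merged, duplicates
```

Anything unexpected (a missing attribute, insertions left over at the end) stops the run instead of being handled, which fits a tool whose output gets reviewed as a diff anyway.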

The output was a diffable replacement index that I could review, and check that the "neighbor" <image> entries made sense, and that nothing was getting added from elsewhere in the tree, and other basic "eyeball" checks. Since I had to do this review anyway to make sure I hadn't made any mistakes of intent, it made sense to write the code in a "direct but brittle" style - anything weird, just bail with a good traceback; I wouldn't even look at the diffs until the code "didn't find anything weird." That also meant that I'd done the least amount of work10 necessary to get the right result - basically a degenerate case of Test Driven Development, where there's one input (my existing index) and one test (does the new index look right.)

I also didn't have any of my usual user interface concerns - no one (not even me) was ever going to run this code after making this one change. I did keep things relatively clean with small helper functions because I expected to mine it for snippets for later problems in the same space - which I did, almost immediately.

For fix-kpa-missing, I'd noticed some "dead space" in the main KPhotoAlbum thumbnail view, and figured that it was mostly the result of an old trailcam8 project. I was nervous about "losing" metadata that might point me at images I should instead be trying to recover, but here was a subset that I knew really were (improperly but correctly) discarded images - wouldn't it be nice to determine that they were the only missing images and clean it up once and for all?

So, the same "only look at <image lines" code from kpa-insert, extract the pathname from the attributes, and just check if the file exists; I could look for substrings of the pathname to determine that it was a trailcam pic and was "OK", plus I could continue with the "direct but brittle" approach and check that each stanza I was removing didn't have any tags/options - but just blow up if it found them. Since it found none, I knew that

  • I had definitely not (mis-)tagged any of the discarded pictures
  • I didn't have to write the options-handling code at all. (I suspect I will eventually need this, but the tools that are likely to need it will have other architectural differences, so it makes sense to hold off for now.)
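A sketch of that check, again as an illustration and not the script I actually ran: drop_missing is a hypothetical name, the file attribute name is an assumption, and "has no tags" is approximated by "the entry is a single self-closing line".

```python
import os
import re

FILE_RE = re.compile(r'\bfile="([^"]*)"')


def drop_missing(index_lines, root, ok_substring):
    """Drop one-line <image .../> entries whose file no longer exists.

    Bails (direct but brittle) if a missing image is not under an expected
    path, or is not a tagless one-line entry.
    """
    kept = []
    for line in index_lines:
        stripped = line.strip()
        if stripped.startswith("<image "):
            path = FILE_RE.search(stripped).group(1)
            if not os.path.exists(os.path.join(root, path)):
                assert ok_substring in path, f"unexpected missing file: {path}"
                assert stripped.endswith("/>"), f"entry has tags/options: {path}"
                continue  # safe to discard this entry
        kept.append(line)
    return kept
```

The two assertions are the whole safety story: a clean run means every dropped entry was in the expected subtree and carried no tags.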

There were a couple of additional scripts cobbled up out of these bits:

  • fix-kpa-PAlbTN which looked for Photo Album ThumbNails from the Nokia project and made sure they didn't exist anywhere else in the tree since I was discarding the ones that I had real pictures for and wanted to be sure I'd really finished up all of the related work while I still had Psion 5 code in my head...
  • find-mbm which used magic.from_file to identify all of the Psion Series 5 multi-bitmap image files (expensively, until the second or third pass when I realized that I had all the evidence I needed that they only existed in _PAlbTN subdirectories, and could just edit the script to do a cheap path test first - effectively running file on a couple of hundred files instead of two hundred thousand.) This was just to generate filenames for the conversion script, it didn't do any of the work directly.

Conclusion

I now have three entirely different sets of tooling to handle index.xml that take very different approaches:

  • kpa-grep uses SQL queries on a sqlite cache of the entire index (read-only, and generates it by LXML-parsing the whole file if it's out of date)
  • pojkar does direct LXML parsing and rewriting (since it's used for uploads that used to be expensive, it does one parse up front and then operates on an internal tree, writing that out every time an upload succeeds for consistency/checkpointing)
  • kpa-insert &c. treat the index.xml as a very structured text file - and operate efficiently but not very safely, relying on my reading the diffs to confirm that the ad-hoc tools worked correctly regardless of not being proper.

Fortunately I've done all of the data-cleaning I intend to do for now, and the kpa-grep issue list is short and mostly releng, not features. I do eventually want a full suite of "manipulate images and tags" CLI tools, and I want them to be faster than 2.5s per operation11 - but I don't have a driving project that needs them yet - my photoblogging tools are already Fast Enough™.


  1. "Ergonomic" might be a better word than convenient, but I have a hard time saying that about XML. 

  2. This does require discipline about only using the tools when KPhotoAlbum itself isn't running, but that's not too big a deal for a personal database - and it's more about not doing updates in two places; it's "one program wins", not a file locking/corruption problem. 

  3. Most of the cleanliness is personal style, but lintian and pylint are part of that. This covers having a man page (using ronn to let me write them in Markdown) and tests (since it's a CLI tool that doesn't export a python API, cram lets me write a bunch of CLI/bash tests in Markdown/doctest style.) 

  4. When I promoted it from "the stuff in my python/exif directory" to an Actual Project, it needed a name - Flickor is the Swedish word for "girls", and "boys" is Pojkar (pronounced poy-car.) 

  5. Originally this was twitter support, then I added mastodon support, then twitter killed their registered-but-non-paying API use so I dropped the twitter support - which let me increase the post size significantly. This also simplified the code - I previously used bits of thok-ztwitgw but now I can just shell out to toot.

  6. Notes actually went away, then came back, then got ACLed; they're also inconsistent: if you're in a search result or range of pictures (such as you get from clicking an image on someone's user page) the mouse only zooms and pans the image; if you edit the URL so it's just a single-image page, then you get rectangle-select back. I basically no longer use the feature and should probably do it directly client-side at some point, at which point the replacement tool should get described here. 

  7. It may be possible to pick a consistent output style at rendering time, but that might not be consistent with future KPhotoAlbum versions, and I just wanted to stick with something that worked reliably with the current output without doing too much (potentially pointless) futureproofing. 

  8. One subset was leftover trailcam pics from before I nailed down my trailcam workflow - most trailcam pics are discardable, false-positive triggers of the motion sensor due to wind - but initially I'd imported them into KPhotoAlbum first, and then deleted the discarded pictures - and this left dangling entries in index.xml that had no pictures, and left blank spots in the UI so I couldn't tag them even if I wanted to. 

  9. This is basically an easier version of the list-merge problem we used to ask as a MetaCarta interview question - because we actually did have a "combine multiple ranked search results" pass in our code that needed to be really efficient and it was a surprisingly relevant question - which is rare for "algorithm questions" in interviews. 

  10. In fact, it would have made a lot of sense to do this as a set of emacs macros, except that I didn't want to tackle the date parsing in elisp (and pymacs is years-dead.) 

  11. perhaps instead of pouring all of the attributes and tags into sqlite as a cache, I should instead be using it for an index that points back into the XML file, so I can do fast inserts as well as extracts? This will need a thorough test suite, and possibly an incremental backup system for the index to allow reconstruction to recover from design flaws. 

This was supposed to be a discussion of a handful of scripts that I wrote while searching for some particular long lost images... but the tale of quest/rathole itself "got away from me". The more mundane (and admittedly more interesting/relevant) part of the story will end up in a follow-on article.

Background

While poking at an SEO issue for my ice cream blog1 I noticed an oddity: a picture of a huge soft-serve cone on flickr that wasn't in my KPhotoAlbum archive. I've put a bunch of work into folding everything2 in to KPhotoAlbum, primarily because the XML format it uses is portable3 and straightforward4 to work with.

Since I wanted to use that picture in my KPhotoAlbum-centered ice cream blog5 I certainly could have just re-downloaded the picture, but one picture missing implied others (I eventually found 80 or so) and so I went down the rathole to solve this once and for all.

First Hints

The picture on flickr has some interesting details to work from:

  • A posting date of 2005-07-31 (which led me to some contemporary photos that I did have in my archive)
  • Tags for nokia6630 and lifeblog
  • A handwritten title (normally my uploads have a title that is just the on-camera filename, because they go via a laptop into KPhotoAlbum first, where I tag them for upload.)

As described in the Cindy's Drive-in story, this was enough to narrow it down to a post via the "Nokia Lifeblog Multimedia Diary" service, where I could take a picture from my Nokia 6630 phone, T9-type a short description, and have it get pushed directly to Flickr, with some automated tags and very primitive geolocation6. That was enough to convince me that there really was an entire category of missing pictures, but that it was confined to the Nokia 6630, and a relatively narrow window of time - one when I was driving around New England in my new Mini Cooper Convertible and taking lots of geolocated7 pictures.

Brute Force

I'd recently completed (mostly) a transition of my personal data hoard from a collection of homelab OpenAFS servers (2 primary machines with 8 large spinning-rust disks) to a single AsusStor device with a half dozen SSDs, which meant that this was a good chance to test out just how much of a difference this particular technology step function made - so I simply ran find -ls on the whole disk looking for any file from that day8:

$ time find /archive/ -ls 2>/dev/null |grep 'Jul 30  2005'

The first time through took five minutes and produced a little over a thousand files. Turns out this found things like a Safari cache from that day, dpkg metadata from a particular machine, mailing list archives from a few dozen lists that had posts on that exact day... and, entirely coincidentally, the last two files were in a nokia/sdb1/Images directory, and one of them was definitely the picture I wanted. (We'll get to the other one shortly.)

Since that worked so well, I figured I'd double check and see if there were any other places I had a copy of that file - as part of an interview question9 over a decade ago, I'd looked at the stats of my photo gallery and realized that image sizes (for JPGs) have surprisingly few duplicates, so I did a quick pass on size:

time find /archive -size 482597c -ls

Because I was searching the same 12 million files10 on a machine with 16G of RAM and very little competing use, this follow-up search took less than two minutes - all of the file metadata was (presumably) still in cache. This also turned up two copies - the one from the first pass, and one from what seems to be a flickr backup done with a Mac tool called "Bulkr"11 some time in 2010 (which didn't preserve flickr upload times, so it hadn't turned up in the first scan.) Having multiple copies was comforting, but it didn't include any additional metadata, so I went with the version that was clearly directly backed up from the memory of the Nokia phone itself.

That other file (side quest)

So I found 482597 Jul 30 2005 /archive/.../nokia/sdb1/Images/20050730.jpg and 3092 Jul 30 2005 /archive/.../nokia/sdb1/Images/_PAlbTN/20050730.jpg in that first pass. The 480k version was "obviously" big enough, and rendered fine; file reported the entirely sensible JPEG image data, Exif standard: [TIFF image data, little-endian, direntries=8, manufacturer=Nokia, model=6630, orientation=upper-left, xresolution=122, yresolution=130, resolutionunit=2], baseline, precision 8, 1280x960, components 3 which again looks like a normal-sized camera image. The 3k _PAlbTN/20050730.jpg version was some sort of scrap, right?12

I don't know what they looked like back then, but today the description said Psion Series 5 multi-bitmap image which suggested it was some kind of image, and that triggered my "I need to preserve this somehow" instinct13.

Wait, Psion? This is a Nokia... turns out that Psion created Symbian and pivoted to being "Symbian Ltd", which supplied a multi-platform embedded OS (on a variety of phones and PDAs) until it got bought out by Nokia. So "Psion" is probably more historically accurate here.

The format is also called EPOC_MBM in the data preservation space, and looking at documentation from the author of psiconv it turns out that it's a container format for a variety of different formats - spreadsheets, notes, password stores - and for our purposes, "Paint Data". In theory I could have picked up psiconv itself, the upstream Subversion sources haven't been touched since 2014 but do contain Debian packaging, so it's probably a relatively small "sub-rathole"14... but the files just aren't that big and the format information is pretty clear, so I figured I'd go down the "convert english to python" path instead. It helps that I only need to handle small images, generated from a very narrow range of software releases (Nokia phones did get software updates but not that many and it was only a couple of years) so I could probably thread a fairly narrow path through the spec - and it wouldn't be hard to keep track of the small number of bytes involved at the hexdump level.

Vintage File Formats

The mechanically important part of the format is that the outer layers of metadata are 32 bit little endian unsigned integers, which are either identifiers, file offsets, or lengths. For identifiers, we have the added complexity that the documentation lists them as hex values directly, and to remove a manual reformatting step we want a helper function that takes "37 00 00 10" and interprets it correctly. So, we read the files with unpack("<L", stream.read(4))[0], and interpret the hex strings with int("".join(reversed(letters.split())), 16) which allows directly checking and skipping identifiers with statements like assert getL(...) == h2i("37 00 00 10")15. This is also a place where the fact that we're only doing thumbnail images helps - we have a consistent Header Section Layout tag, the same File Kind and Application ID each time, and that meant a constant Header Checksum - so we could confirm the checksum without ever actually calculating it.
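Spelled out as functions, the two helpers are about as small as they sound. The names getL and h2i come from the assert statement above; the exact signatures here are a guess at a minimal version.

```python
import io
from struct import unpack


def getL(stream):
    """Read one 32-bit little-endian unsigned integer from a binary stream."""
    return unpack("<L", stream.read(4))[0]


def h2i(letters):
    """Convert a documentation-style byte string like "37 00 00 10" to an int."""
    return int("".join(reversed(letters.split())), 16)


# Example with an in-memory "file" holding the identifier quoted above.
stream = io.BytesIO(bytes.fromhex("37000010"))
assert getL(stream) == h2i("37 00 00 10")
```

Reversing the hex pairs before parsing is what lets the constants stay in the same byte order the documentation prints them in.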

Once we get past the header, we have the address of the Section Table Section16 which just points near the end of the current file - where we find a length of "1 entry" and a single pointer back to where we already were. (All this jumping around feels like a lot of overhead, but it's only about one percent of the file size.) That pointer brings us to the Paint Data Section which starts with a length (which helps us "account for" the other bytes in the file, since it covers everything up to the Section Table) and an offset (which we can ignore since the subsequent data just stacks up until we get to the pixels.) Finally we get the x and y pixel dimensions, some theoretical physical dimensions (specified as having units of ¹/₁₄₄₀ of an inch, but always zero in my actual files) and then a "bits per dot" and "color vs greyscale" flag. Given that these are photo thumbnails, it isn't surprising that these are consistent at "16 bits per pixel" and "color", but the spec is vague about that (as is the psiconv code itself, which just does some rounded fractional values for bit sizes that are larger than the 1/2/4 bit "magic lookup table" values.)

Finally we get to an encoding flag. On the first pass through I only saw 0 "Plain Data" for this, which simplified things... until I did the full run and found that many of the chronologically later thumbnails17 instead had 3 meaning "16-bit RLE". The particular RLE mechanism is pretty simple: values below 128 are a repeat count, and the following pixel should be "used" N+1 times; in order to avoid the RLE making highly varying files larger, values from 128 to 255 do the reverse: the subsequent 256-N 16-bit pixels18 are just used directly with no expansion.
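The RLE rule translates into a short loop. This is a sketch of the decoding as described above, not a copy of my converter: decode_rle16 is a made-up name, and it assumes each 16-bit pixel is stored little-endian like the header fields.

```python
from struct import unpack


def decode_rle16(data):
    """Expand 16-bit RLE data into a list of 16-bit pixel values."""
    pixels, pos = [], 0
    while pos < len(data):
        marker = data[pos]
        pos += 1
        if marker < 128:
            # Repeat run: the next pixel is used marker + 1 times.
            (pixel,) = unpack("<H", data[pos:pos + 2])
            pixels.extend([pixel] * (marker + 1))
            pos += 2
        else:
            # Literal run: the next 256 - marker pixels are copied as-is.
            count = 256 - marker
            pixels.extend(unpack(f"<{count}H", data[pos:pos + 2 * count]))
            pos += 2 * count
    return pixels
```

Truncated input makes unpack raise, which is the same "anything weird, just bail" behaviour I wanted from the rest of these tools.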

Ancient Pixels

While pixels are clearly labeled as 16 bit, we don't actually have any hints about which of those bits represent which colors. I tried a bunch of guesses that (with a couple of test images) were either too pink, too yellow, too magenta, or all of them at once. Finally I looked at the psiconv source - lib/psiconv/parse_image.c doesn't appear to directly handle 16 bit, it just has a fallback heuristic where red and green each get (16+2)/3 bits, and blue gets the rest, so you get 6/6/4 (which was one of the values I'd already guessed and discarded as "too pink".) To make sure it wasn't a more complicated misinterpretation, I just grabbed the upper 8 bits and used them for all three channels - for a snowy scene with a lot of white and black anyway, it looked pretty convincing, even if it was really just dumping everything but red (displaying it in monochrome probably made it easier to reinterpret, though.)

I also tried a few sample images that were also in the phone backup - flower.jpg was mostly yellow, blue.gif was shades of blue with white swirls - and still wasn't getting that far. At some point I realized that this was a kind of retrocomputing project and that perhaps I should be trying to figure out what "period" 16 bit pixel representations were - and wikipedia already had the answer! While there was a lot of "creativity" in smaller encodings, "RGB565" was basically it for 16 bit19. Since I'd already parameterized the bit lengths for the previous experiments, just dropping in rgbrange = [5, 6, 5] was enough to produce samples with convincing colors when compared to the original images. Victory! Now all I had to do was process the whole set. A little use of python3-magic20 let me identify which files were in this format, then convert the whole set.
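For illustration, here is a minimal sketch of a parameterized unpacker in that spirit. The function name and the scaling to 8 bits are my assumptions rather than the original code, and it assumes the two pixel bytes have already been combined into one 16-bit integer with red in the highest bits:

```python
def split_pixel(value: int, rgbrange=(5, 6, 5)) -> tuple:
    """Split a 16-bit pixel into 8-bit (r, g, b) using the given bit widths."""
    channels = []
    shift = sum(rgbrange)
    for bits in rgbrange:
        shift -= bits
        mask = (1 << bits) - 1
        raw = (value >> shift) & mask
        # Scale the n-bit value up to the full 0..255 range.
        channels.append(raw * 255 // mask)
    return tuple(channels)
```

Passing rgbrange=(6, 6, 4) instead reproduces the psiconv-style fallback split, which makes it easy to compare the two guesses on the same sample image.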

Great, now I have all of these thumbnails. And as thumbnails they look pretty good! On closer review they even match the full-sized images I'd already recovered, which confirms that nothing else is missing from that particular camera phone. The other thing that really stands out from that review is that these really are only 42x36 and that is tiny, and if you enlarge them at all they actually get significantly worse. Now that I've used them to be sure that I have all of the originals: I've deleted all of the _PAlbTN directories from my photogallery.21

Conclusion

This was a fairly deep (even excessively deep) rathole for this class of problem - and there are different branches I would have taken if I were doing this in a professional context - but it resolves some (personal) questions that have been lingering for over a decade, and gives me some increased confidence in the integrity of my lifetime photo archive. Worth it.


  1. I mentioned the blog to some old friends who asked "can I just google ice cream blog eichin and find it?" and at the time, I assumed that would work - not knowing that Alfred Eichin patented an ice cream scoop in 1954 that dominates the web, partly because his name was engraved on many of them and they turn up on collector sites, etsy, and ebay. (Not a relation, as far as I am aware.) 

  2. I've folded previous photogalleries in, with tag and description conversions (even if that meant a lot of cut&paste), and included even terrible digital photos all the way back to the little 640x480 shots from my 1999-era Largan camera. 

  3. I've published tools like kpa-grep and also built personal cropping tools (that used the old flickr region-note feature) and auto-posting tools (that generate my current social media posts as well.) All of these work directly with the KPhotoAlbum XML format, typically using python lxml.

  4. You've probably heard horrors about XML; while there are encoding issues (well handled by popular libraries - if you don't try and use regex you won't summon ZA̡͊͠͝LGΌ) the thing that matters here is that the model is very flat: a long list of images with a fixed vocabulary of attributes and a single list of (sets of) tags per image - no nesting, no CDATA, no entity cross-reference. 

  5. I literally run icecream-start shopname to grab all of the images tagged (with KPhotoAlbum Location tags) with that shop's name and assemble a first-draft markdown page that just assumes I want all of the pictures and will fill in text descriptions myself. 

  6. Originally the tags were just the real-time cell-tower ids, with a service that scanned participating flickr accounts and turned the "machine" tags into real-world locations afterwards. 

  7. I worked at MetaCarta - a geographic search company - at this time, so I had a professional interest, but we weren't actually acquired by Nokia until 5 years later. 

  8. Seems a bit crude, but the alternative is using touch to create two timestamp files and use -newer; I did run a quick test pass to catch the extra whitespace between the year and the day-of-month - since I also didn't want to turn this into another #awktober post

  9. The interview question was about cleaning up duplicates in a large but badly merged photogallery. The particular bit we were looking for was that you didn't need to do N² full-file comparisons on a terabyte of images when there were only 20k files involved; if you started with just comparing sizes, that was good, but we'd push a little harder and steer you towards comparing hashes in various ways. All straightforward stuff analogous to the kind of bulk data shuffling we were doing, without needing proprietary concepts like gazetteer imports... and most people had some concepts of digital photography at that point. The bit about sizes was realizing that if you shot "raw" most files would be the same uncompressed size, but JPGs are highly compressed and turned out to vary a lot - so as long as you did a full-file confirmation on each pair, using length as an initial discriminator was actually pretty good. (But really, you know about hashes, md5sum, that sort of thing, right? Especially for an infrastructure job where you've almost certainly downloaded a linux install ISO and checked the hashes?) 

  10. Since all of the archives involved are on one filesystem, I didn't need a filesystem cache to get this instantly - df -i reports IUsed and all of those correspond to what I was searching through, with little (and probably no) disk access at all. 

  11. As far as I can tell, bulkr only pulled down the "Original" images and named them from the flickr title, but didn't grab tags, comments, or geographic location. Fortunately that is still up on flickr for future preservation efforts. 

  12. I only finally got around to looking this up while writing this, turns out the internet believes that this is actually an abbreviation of Photo Album Thumb Nail - which is at least convincing, if not well documented. 

  13. Also, there were a number of these Psion "images" in my collection already - which KPhotoAlbum failed to render at all, just left unselectable blanks in the image view - which implied that if I did follow this thread to the end it would let me solve yet another archive quality issue... 

  14. If this were a work project, I'd have gone down the "update the package" path - mostly because at both MetaCarta and RightHand I had already built entire systems of plumbing to streamline the "build a package from upstream sources adding small rigorously tracked changes, and stuff it into a shared artifact repository" pipeline; I only have segments of that implemented in my homelab. 

  15. The actual code has more comments and variables-for-the-purpose-of-labelling because as I built it up I wanted to be clear on things like "I expect this to be a Header Section Layout but I got something else"; the documentation was clear enough (and the format simple enough) that there weren't that many experimental failures in the early stages, and by the time I got to the later stages where it would have been helpful I had already relaxed to the point of writing incomprehensible lines like seek(thing_offset) anyway. 

  16. Both the names and the indirection levels involved strongly suggest that whoever cooked up this format had been recently exposed to the ELF spec, with its Section Header Table and Program Header Table, and in fact Symbian E32Image turns out to be ELF. 

  17. My evidence-free theory here is that while phones of that era didn't get software updates very often, I do vaguely remember getting a few, so perhaps RLE support simply wasn't there as-shipped and was delivered as part of a later update, so only later images used it. 

  18. This was my only point of confusion from the documentation: it says "100-marker" in a context surrounded by other "obviously" hex numbers (with no 0x marker) and for some reason I missed that and interpreted 100 as decimal, which led to rather scrambled decoding until I checked the psiconv code itself - up until that point I'd actually done fairly well at implementing this by only looking at the specs, and I really can't blame the spec author for this one. 

  19. RGB565 was also known as "High Color" in Windows documentation of the era. (That page explains nominal human eyes being more green-sensitive and includes a sample image that attempts to justify that "the extra bit should be in green".) 

  20. "magic" refers to the magic number database used by the unix file utility to make a "heuristic but surprisingly good" fast guess as to what the contents of a file are (ignoring the name - remember, these Psion files all had .jpg or .gif extensions anyway, the directory name mattered but otherwise each thumbnail had exactly the same name as the image it was made from.) 

  21. I did keep them in the git repo for the conversion project - 400ish original thumbnails takes up 2M bytes, and they compress down to about half a meg - so there's no need to free up the space they take up, but there are good organizational reasons like "the photogallery should only have original images" to purge them from the gallery itself. This ends up guiding other clean-up and curation later on. 

The ultimate goal of the Popular Web servers discussion was to actually make up my mind as to what to actually run. The diversity of options made me realize that SSL termination and web serving was an inherently modular thing, and since I wanted some amount of isolation for it anyway, this would be a good opportunity to get comfortable with podman.

What shape is the modularity?

The interface has three basic legs:

  • Listen on tcp ports 80 and 443
  • Read a narrow collection of explicitly exported files
  • (later) connect to other "service" containers.

(Some of the options in the survey, like haproxy, only do the "listen" and "connect" parts, but that reduces the "read files" part to running a "static files only" file server container (which has access to the collection of files) and having haproxy connect to that. For the first pass I'm not actually going to do that, but it's good to know in advance that this "shape" works.)

Listening ports

If I'm running this without privileges, how is it going to use traditionally "reserved" ports? Options include

  • have systemd listen on them and pass a filehandle in to the container
  • run a socat service to do the listening and reconnecting
  • lower /proc/sys/net/ipv4/ip_unprivileged_port_start from 1024 to 80 (the value is the first port that doesn't need privileges)
  • use firewall rules to "translate" those ports to some higher numbered ones.

I actually used the last one: a pair of nft commands, run from a Type=oneshot systemd service file, each of the form add rule ip nat PREROUTING tcp dport 80 redirect to (the unprivileged target port), one for port 80 and one for port 443. This seemed like the simplest bit of limited privilege to apply to this problem, as well as being efficient (no packet copying outside the kernel, just NAT address rewriting) - but do let me know if there's some other interface that would also do this.
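To make the shape of those two rules concrete, here is a small sketch that only builds the command lines. The target ports 8080 and 8443 are hypothetical placeholders (the real ones are whatever the container's published ports are), and the actual service just runs the nft commands directly rather than going through python:

```python
def redirect_commands(port_map):
    """Build one nft argv list per (public port -> unprivileged port) pair."""
    return [
        ["nft", "add", "rule", "ip", "nat", "PREROUTING",
         "tcp", "dport", str(public), "redirect", "to", f":{target}"]
        for public, target in port_map.items()
    ]

# Hypothetical target ports, for illustration only.
commands = redirect_commands({80: 8080, 443: 8443})
for argv in commands:
    print(" ".join(argv))
```

Running the printed commands needs root (and an existing ip nat table with a PREROUTING chain), which is why that piece lives in a system service rather than in the unprivileged container setup.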

Reading a set of files

docker and podman both have simple "volume" (actually "bind mount") support to mount an outside directory into the container; this also gives us some administrative options on the outside, like moving around the disks that the files are on, or combining multiple directories, without changing the internals at all.

Currently, the directory is mounted as /www inside the container, and I went with the convention of /www/example.com to have a directory for each FQDN. (For now this means a bunch of copy&paste in the nginx.conf but eventually it should involve some more automation than that, though possibly on the outside.)1

In order to enable adding new sites without restarting the container, the nginx.conf is also mounted from the outside, as a single-file bind mount. Using podman exec to run nginx -s reload applies the changes without restarting the container, and keeping the file outside allows for automatic generation of the config there, without giving the container itself any access to change its own configuration.

Connecting to other containers

(Details to follow, pending actually using this feature; for now it's sufficient to know that the general model makes sense.)

Why podman over docker?

podman has a bunch of interesting advantages over docker:

  • actual privilege isolation - docker itself manages access to a service that does all of the work as root; podman actually makes much more aggressive use of namespaces, and doesn't have a daemon at all, which also makes it easier to manage the containers themselves.
  • podman started enough later than docker that they were able to make better design choices simply by looking at things that went wrong with docker and avoid them, while still maintaining enough compatibility that it remained easy to translate experience with one into success with the other - from a unix perspective, less "emacs vs vi" and more "nvi vs vim".

Mount points

Originally I did the obvious Volume mount of nginx.conf from the git checkout into /etc/nginx/nginx.conf inside the container. Inconveniently - but correctly2 - doing git pull to change that file does the usual atomic-replace, so there's a new file (and new inode number) but the old mount point is still pointing to the old inode.

The alternative approach is to mount a subdirectory with the conf file in it, and then symlink that file inside the container.3
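The inode behavior is easy to see in isolation. This is a minimal sketch, assuming the update is done the way described above (write a new file next to the old one, then rename it over the old name); the file names are just for illustration:

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmpdir:
    conf = os.path.join(tmpdir, "nginx.conf")
    with open(conf, "w") as f:
        f.write("# old config\n")
    old_inode = os.stat(conf).st_ino

    # Atomic replace: create a new file alongside, then rename it into place.
    tmp = conf + ".tmp"
    with open(tmp, "w") as f:
        f.write("# new config\n")
    os.replace(tmp, conf)
    new_inode = os.stat(conf).st_ino

    # Same path, different file: anything still holding the old inode
    # (like a single-file bind mount) keeps seeing the old contents.
    inode_changed = old_inode != new_inode
    print(inode_changed)
```

Mounting the containing directory instead sidesteps this, because the name gets looked up again inside the directory each time it is opened.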

LetsEncrypt

We need the certbot and python3-certbot-nginx packages installed in the pod. python3-certbot-nginx handles adjusting the nginx config during certbot operation (see github:certbot/certbot for the guts of it.)

Currently, we stuff these into the primary nginx pod, because it needs to control the live webserver to show that it controls the live webserver.

When used interactively, certbot tells you that "Certbot has set up a scheduled task to automatically renew this certificate in the background." What this actually means is that it provides a crontab entry (in /etc/cron.d/certbot) and a system timer (certbot.timer) which is great... except that in our podman config, we run nginx as pid 1 of the container, don't run systemd, and don't even have cron installed. Not a problem - we just create the crontab externally, and have it run certbot under podman periodically.

Quadlets

Quadlets are just a new type of systemd "Unit file" with a new [Container] section; everything from the podman commandline should be expressible in the .container file. For the nginx case, we just need Image=, PublishPort=4, and a handful of Volume= stanzas.

Note that if you could run the podman commands as you, the .container Unit can also be a systemd "User Unit" that doesn't need any additional privileges (possibly a loginctl enable-linger but with Ubuntu 24.04 I didn't actually need that.)

Walkthrough of adding a new site on a new FQDN

DNS

Start with DNS. Register a domain (in this case, thok.site), get it pointed to your nameserver5, and have that nameserver point to the webserver.

Certbot

Once certbot is registered,

$ podman exec systemd-nginxpod certbot certonly --nginx --domain thok.site

takes a little while and then gets the certificate. Note that at this point, the nginx running in that pod knows nothing about the domain; certbot is doing all the work.

Get the site content

I have a Makefile that runs git clone to get the site source, or git pull if it's already present, and then uses ssite build to generate the HTML in a separate directory (that the nginx pod has mounted.)

Update the common nginx.conf

Currently nginx.conf is generated with cogapp, so it's just a matter of adding

# [[[cog https("thok.site") ]]]
# [[[end]]]

and rerunning cogapp to expand it in place.
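The macro itself isn't shown here, so purely as a hypothetical sketch of what a helper like that might generate: the function name https_block and the exact directives are assumptions, with the certificate paths following certbot's default /etc/letsencrypt/live layout and the root following the /www/FQDN convention.

```python
def https_block(fqdn: str) -> str:
    """Return an nginx server block for one FQDN (hypothetical layout)."""
    return "\n".join([
        "server {",
        "    listen 443 ssl;",
        f"    server_name {fqdn};",
        f"    ssl_certificate /etc/letsencrypt/live/{fqdn}/fullchain.pem;",
        f"    ssl_certificate_key /etc/letsencrypt/live/{fqdn}/privkey.pem;",
        f"    root /www/{fqdn};",
        "}",
    ])

print(https_block("thok.site"))
```

Inside a cog block, the same text would be emitted with cog.outl() so that it lands between the cog and end markers.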

Kick nginx

make reload in the same Makefile, which just does

$ podman exec systemd-nginxpod nginx -s reload

Done! Check it...

At this point, the site is live. (Yes, the very site you're reading this on; the previous sites all had debugging steps that made the notes a lot less clear, so I didn't have a clean set of directions previously...) Check it in a client browser, and add it to whatever monitoring you have.

Conclusions

So we now have a relatively simple path from "an idea and some writing" to "live website with basic presentation of content". A bit too much copy-and-paste currently, and the helper Makefile really needs to be parameterized or become an outright standalone tool. (Moving certbot to a separate pod also needs investigating.) Now back to the original tasks of moving web servers off of old hardware, and pontificating actually blogging!


  1. Not yet as automated as I'd like, but currently using Ned Batchelder's Cog to write macros in python and have them update the nginx config in-place in the same file. Eliminates a bunch of data entry errors, but isn't quite an automatic "find the content directories and infer the config from them" - but it is the kind of rope that could become that. 

  2. While this is a little messy for a single config file, it would be a reasonable direction to skip the symlinks and just have a top-level config file inside the container include subdir/*.conf to pick up all of the (presumably generated) files there, one per site. This is only an organizational convenience, the resulting configuration is identical to having the same content in-line, and it's not clear there's any point to heading down that path instead of just generating them automatically from the content and never directly editing them in the first place. 

  3. The PublishPort option just makes the "local" aliases for ports 80 and 443 appear inside the container as 80 and 443; there's a separate pod-forward-web-ports.service that runs the nftable commands (with root) as a "oneshot" systemd System Service. 

  4. In my case, that means "update the zone file with emacs, so it auto-updates the serial number" and then push it to my CVS server; then get all of the actual servers to CVS pull it and reload bind. 

Got far enough into staticsite that it was time to go beyond the basic blog, and the ice cream blog turns out to be a good testbed for that.

Fix the images

Images (specifically, jpg files from cameras or modern cellphones) are, by default, large and messy, despite staticsite doing clever things with img.srcset. It turns out that there's a stack of problems:

  • ImageMagick convert doesn't update (or discard) EXIF.width and EXIF.height when resizing, and later parts of the toolchain (probably including the browser itself) get misled by small images with large dimensions.
  • Certain parts of the staticsite markdown processing path end up giving absolute instead of relative links to the produced images (still looking for where though) and so if you make a local sandbox copy of the main site, some of the img files that the browser fetches actually come from the upstream live site instead of the sandbox, completely confusing your debugging process.
  • I really want the images to use bootstrap's img-fluid which I can add using the markdown "attributes" extension, which is already turned on, but I want it consistently site wide.

On top of that, it may turn out that the part of the problem I care about needs to be fixed in the python-markdown layer instead of staticsite itself, but it may just be "non-overridable python code"1 rather than something I even can fix in a theme.

Current solutions:

  • github ticket #70 filed to describe the <img> problem and hostname part.
  • Use the python-markdown attribute extension {: class="img-fluid"} manually on all images, so that they scale-to-fit regardless of what processing they've been through.
  • Wrote a little icecream-start shop-name that takes kpa-grep output and fills in a blank markdown file with a title and filled in ![]() image includes for each image (so I can write the article and just delete the unneeded images as I go along - which will work better once #70 is fixed, for now half of the images go upstream instead of locally.)
  • Bigger hammer: icecream-start now uses jhead -autorot -purejpg2 which just rotates them losslessly and wipes out any conflicting EXIF metadata. This, combined with img-fluid and a width-clamp in site.css, was the minimal "image-heavy pages are actually good now" set of changes.

Finish taxonomy support

staticsite has Hugo-style taxonomies (to the point of linking to them for documentation.) It does a fine job building index pages, but stops there. The two followons to make them useful are

  • Link those index pages in the navbar (or the sidebar, but for photo-heavy mobile use I find that the sidebar is an utter failure, so my first template effort was to turn that off and use full width, "12 column" in bootstrap terms.)
  • The default page templates include the tags at the bottom, but only if they're from the tags taxonomy. Turns out we can just iterate over the available taxonomies and render all of them.

Current solutions:

  • navbar config is one line in the index.md metadata, done.
  • replacing the "tags for this article" with "all tags for all taxonomies for this article" was some simple nested loops in Jinja2, once I got past the scoping problem below.

A future possibility is to add some markup (possibly subverting the wikilinks syntax, or maybe just using links with a magic urltype) that lets me just use the tags in-line in the text without having to put them in the per-post metadata. (Future, not blocking for now, and ideally it would just be a hook into the same taxonomy plumbing.)

The template changes ran into some issues:

  • Jinja2 macros are file scoped, so an attempt to replace a single macro (like inline_page as called by pages) is silently ignored, instead you need to replace the entire file including the otherwise unchanged calling macro (at which point you might consider giving up on extending the existing theme in the first place.)
  • Some of the ssite subcommands will parse a .staticsite.py or settings.py in the top level of the site source, which would let you configure a theme; important ones like ssite show ignore that entirely and require a --theme argument.
  • For a while this looked like "syntactically bad themes (or settings) were silently not imported"; that turns out not to be true, it just wasn't importing them at all because the config was ignored instead.
  • The existing settings aren't actually in-scope in the settings file, though you may be able to import the global settings it's not clear that those are the correct ones after other processing.
  • Some of the data structures visible in the template act like strings but aren't strings - so for example, you can iterate over the taxonomies, and if you render that inline you get the names, but you can't then get the taxonomy from there because you end up attempting to use the object as a key and not the name. On top of that, python code in jinja2 templates has very limited access to python builtins - so you don't have dir or str (though you can simulate the latter with "" ~ var, it's not great.) Turns out that most of these objects have a .name you can use directly, but I haven't found good documentation for that - but at this point, I recognize it as a pattern, so "just try .name" is part of my experimentation repertoire.

System dark mode

blag had what turns out to be really simple bits of CSS3 for a dark mode that turns on when the browser is in dark mode (usually triggered by "system" darkmode, through xsettings and GTK themes.) It's worth adding that to the staticsite theme if we can do it in a simple way.

Current solutions:

  • Within the theme directory, static/css/*.css get installed, so just copy the default site.css there and add extra files that it explicitly @import's.
  • Specifically, @import "bootstrap-color-fix.css" screen and (prefers-color-scheme: dark); isolates all of the horror - so providing a color mode is only one mechanical line of CSS.
  • To create that file, just copy /usr/share/javascript/bootstrap4/css/bootstrap.css (include attribution comments, it is MIT licensed) and delete everything that isn't a color, which gets it down to about 700 entries; then cook up a little elisp to "invert" a color string in the buffer. Yes, this is gruesomely brute force - but it's short term: bootstrap 5.3 has proper dark-mode support built in, so when staticsite upgrades (not something I'm prepared to tackle myself right now1) we can just discard these changes and use that support instead. (I don't actually want any in-page controls for this, just automatic support for the viewer's system or in-browser choices.)
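The elisp isn't reproduced here, but the same brute-force idea can be sketched in python. This assumes "invert" means flipping each channel (255 minus the value) and only handles six-digit #rrggbb colors; both are my assumptions, and three-digit, rgb() and named colors would need their own handling:

```python
import re

# Matches six-digit hex colors like #212529 (but not #fff or #rrggbbaa).
HEX_COLOR = re.compile(r"#([0-9a-fA-F]{6})\b")

def invert_hex(match):
    """Replace one #rrggbb match with its per-channel inverse."""
    value = int(match.group(1), 16)
    return "#{:06x}".format(0xFFFFFF - value)

def invert_colors(css: str) -> str:
    """Invert every six-digit hex color found in a chunk of CSS."""
    return HEX_COLOR.sub(invert_hex, css)

print(invert_colors("body { color: #212529; background-color: #ffffff; }"))
```

Subtracting from 0xFFFFFF flips all three channels at once, since no channel can borrow from its neighbor.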

More markdown extensions

It's a little messy to even turn on extensions; the documentation (doc/reference/pages/markdown.md) says you can set MARKDOWN_EXTENSIONS but it doesn't actually say where (and see the problem above about things ignoring settings.py.)

Aside from wikilinks for in-line taxonomy reference, I'd like to turn on whatever makes bare URLs into links; SO suggests just using <> which I'd forgotten, but also gives both a (mildly flawed) sample extension for it and a pointer to markdown2 which has link-patterns as a mechanism for this.

Geography

Saw Simon Willison's experiments with OpenFreeMap and MapLibre and realized it would be really easy to lay out my Ice Cream Journey on it. Not sure it's worth actually hosting an entire tileset (when by definition I only need Massachusetts), and later on I might just stash maps at various static zoom levels or something simple like that. For now, though, it's responsive and doesn't need an API key, and the Javascript interface is straightforward.

In fact, my use of the interface is probably too straightforward - rather than being generated from page metadata, there's just a hard-coded list of Names, markdown page names, and lat/long pairs, and two dozen lines of code to forEach the place list and create a maplibregl.Marker attached to a maplibregl.Popup for each; through the glory of Unicode, we can even have 🍨 markers for general ice cream and 🍦 for places that specialize in soft-serve. That all works fine, the only manual step is adding a single line of data to the map.html file for every review I do - technically moving it into per-page metadata wouldn't be less work, or more robust in any way, but it feels like the right place for it, so I'll get to that eventually.

Since this is still an experiment, I didn't want to just have "Map" in the navbar, I wanted a specific experimental marker in the title. The definition of the navbar is just a list in the metadata of index.md itself, but the titles are expected to be in the metadata of each of those pages - the main trick here is that raw html files aren't, they're actually J2Page Jinja2 templates, so you can stuff a {% block front_matter %} inside an HTML comment, and that works as a clean way to hide the metadata.4

Page Width

One final issue (and one of the only design aspects I've gotten feedback about from readers5) is that on a wide screen, the pictures are too huge and the text ends up ridiculously wide. It took decades but the web design industry did realize that the newspaper industry's use of narrow columns was good for reading,6 but Bootstrap itself doesn't appear to have any useful defaults for this (or even any good stackoverflow answers.) All it needs is

@media (min-width: 40em) {
    .container-fluid {
        width: 40em;
    }
}

(adjust 40em to taste, but probably keep it in character-width units to stay consistent with other user preference choices.) All this declares is that if the screen is 40em wide or larger, set the outermost bootstrap container width to 40em; this keeps smaller size layouts unchanged, and breaks smoothly as you get larger.


  1. It's open source python code, everything is overridable, but for me it's a big step towards just writing a new engine (or adding these features to one of my old ones) which I'm specifically shying away from in this moment. 

  2. github:Matthias-Wandel/jhead, yes, that Matthias Wandel of youtube woodworking fame

  3. See blag style.css for the prefers-color-scheme conditionals in @media stanzas; a mere 8 lines for each scheme. 

  4. This trick doesn't appear to work for generated references, so while I can add archive to the nav list, it gets the site title instead... currently worked around with a querySelector.textContent assignment in a DOMContentLoaded function in the blog.html and page.html templates, but ironically that doesn't fix the archive page itself. 

  5. Both of them! Dark mode, on the other hand, was entirely implemented for me personally, and worth the effort to get working when I was still looking at the site in draft, regardless of anyone else ever seeing it. 

  6. Even though it had very little to do with that and was more of an artifact of how to assemble type in frames for printing, up through linotype and phototypesetting in column inches that were literally pasted up. 

My earlier attempts to distill blogging (and blog creation) down from a software and sysadmin task to "just name something and start writing" have kind of failed, but as I'm shuffling around hardware and feeling inspired to procrastinate by writing, I'm doing another pass.

Given that I'm python-oriented, I wanted something primarily in python, open source, with extra points for "maintained in Debian" and "I haven't failed to use it previously."

Blag

blag is maintained by a Debian developer, easy to get launched, is named after an XKCD comic, and I actually put 3 draft blogs together with it in a couple of days before trying the next thing. (In particular, I had one site that was going to mostly be collected essays with some blog bits, and not primarily a blog; I still wanted an index and RSS and tagging, but I had some trouble reorganizing that one into the right shape.)

Definitely still worth a look, especially for anything "actually blog shaped" - I had filled half a whiteboard with notes on what I actually wanted before I stumbled on the next candidate, so it was very helpful in getting me to define what I meant by "static site blogging" and how that was different from what I thought I meant. Unlike many of the other systems discussed here, the developer actually notices github issues, which is commendable.

Staticsite

staticsite caught my eye in an odd sort of way - it's still a markdown blog with other features, an instant-blog tutorial (doc/tutorial/blog.md), and some obvious tooling. What stood out was that it had Hugo-inspired taxonomy support - when tags aren't enough but you want kinds of tags, this lets you name and label a group, and have automatic lists of pages in the navbar, just by using them (and creating one two-line file.) This was attractive, especially for my ice cream blog which is itself completely serious but also serves as a playground for tooling and rendering ideas; ice cream shops have flavors, towns, and novelties and I can just drop a little metadata on each page.

(2024-08-07 side note: still fixing some details like actually including those on the pages themselves like tags are1, doing user defined themes2 at all, and fixing the image handling3; I'm not stuck on any of those, just merely-part-way into them.)

(2024-08-21 side note: fixed the above and I'm using it live - see staticsite-itself for more in-depth usage and customization.)

Others

Others I've glanced at - didn't really dismiss, they just didn't end up on the fast-path before I got to staticsite:

Pelican

pelican is in Debian, and the initial description starts with metadata in a post; this wasn't originally an objectionable issue, but after using blag and staticsite I find I really want a minimal post to need no more than a # title (though I certainly want to be able to add metadata later, that's "being organizational", not "blogging", and is minor unexpected friction.) Is this excessive? Certainly, but I'm also someone that recommends that developers learn to touch-type (and pick an editor) early in their careers - I'm already committed to being excessive about flow and friction.

Nikola

Nikola: python, markdown, MathJax; also heavy on the required metadata (and seems to require a new_post command. ssite new is similar but optional, and is really just a generic "run a template for me" tool.) Looks very featureful, I was just in the mood for something with less rope.

Hyde

Hyde is named as a pun on Jekyll (a popular github-pages-capable ruby static site tool.) It's not in Debian; it is on pypi but the last release was 9 years ago, the description page has many dead links, and it doesn't yet have a completed python3 port.

other sources


  1. the main blog template renders tags in-line but doesn't automatically notice taxonomies (or better yet, taxonomies mentioned in the nav bar.) 

  2. turns out that ssite show ignores .staticsite.py so you can't set an explicit path to a theme, but it takes a --theme argument; misleadingly, ssite shell does read the settings. There are probably 2 or 3 issues here, I'm just not sure which ones are real (the "show ignores settings" bit might just be an under-documented security concern) and haven't filed them yet. 

  3. recently figured out that ImageMagick convert -resize produces a smaller JPEG, but doesn't update the EXIF data, which definitely misleads the browser, and is probably also misleading ssite when it generates the smaller images (since it also doesn't discard the EXIF data.) Again, still needs a couple of experiments where I do clean up and let it re-run before deciding which parts are actually issues. (In the end, I stomped on the native size-handling with bootstrap's img-fluid.) 

I posted1 a poll on mastodon:

What's your choice for an internet-facing web server, in 2024? Security over performance, ease (or lack) of configuration is a bonus; presence in Debian or Ubuntu preferred but anything I can build from source (so, probably not written in go) and has a good CVE story is of interest.

In the initial set I included Apache, nginx, lighttpd, caddy, and webfs (based on them showing up in popcon.) So far nginx is in the lead with caddy and Apache surprisingly close to tied, but the fascinating bit was the followups about servers that either I hadn't heard of, or didn't realize qualified. (It got boosted early on by Tim Bray and Glyph, which got it much broader attention than I expected, and I believe that really helped reach people who provided some of the more unusual followups.)

Twisted Python

Twisted is actually packaged and has decades of usage - it just wasn't tagged with the httpd virtual package in debian so it didn't come up in my original search. (It also doesn't currently include any config to run by default, but really it would just be a basic .service file to invoke twisted web with some arguments.) Glyph points out that twisted's TLS support is in C, but that parsing HTTP with C in 2024 is just asking for trouble.

Kestrel+YARP

Kestrel is the web server component of DotNet Core - this combination handles all of the app service front end traffic for Azure.

YARP itself is a standalone MIT-licensed reverse proxy written in C# (nothing to do with Edgar Wright.)

Thanks to Blake Coverett for pointing this one out (they used it under Debian and Ubuntu in production!) but the DotNet ecosystem is pretty far outside my comfort zone/tech bubble.

OpenBSD httpd

OpenBSD ships a default http server with strong security and simple configuration. This does look solid and would be high on the list if I were running OpenBSD - there's some risk that it uses OpenBSD's advanced isolation features in ways that a naïve Linux port might not get right, but if I find an active one I'll look further.

There's an AsiaBSDCon 2015 paper which describes the history of it replacing nginx (which itself replaced an Apache 1 fork) as the native OpenBSD web server; this includes a long discussion of their attempts to harden nginx that is worth a look in terms of secure software development challenges.

haproxy

Marko Karppinen pointed out that haproxy (which is packaged but doesn't Provides: httpd either) actually works directly as a web server - no direct file support, but it can terminate HTTPS connections and pass the connections on to HTTP backends. (As of haproxy 2.8, acme.sh can update a running haproxy directly, without disruptive restarts.)

Traefik

Gigantos pointed out that Traefik can also terminate HTTPS directly, and has builtin ACME (Let's Encrypt) support as well as being able to do service discovery instead of needing direct per-site configuration - depending on the shape of those providers that might not end up being less work but it's arguably putting the information in a more correct place.

NGINX Unit

PointlessOne suggested that for very dynamic backends, NGINX Unit was worth a look - it supports a huge variety of languages while still having attention on security and performance.

Apache with mod_md

Most of the comments on apache were about how "it still works" and had decades of attention, but Marcus Bointon pointed out mod_md which adds ACME support directly as an Apache Module (shipped with apache since 2.4.30, which predates Ubuntu 20.04, so it's been around for a while) defaulting to Let's Encrypt. (He goes on to complain about the lack of HTTP/3 support, but from my perspective it's evidence that Apache isn't standing still after all.)

Lighttpd

There was actually one vote against lighttpd from Chris Siebenmann as having stagnated too much to seriously consider for new deployments. (It does still get active development but I'm going for general impressions here and this one was interesting.)

h2o

FunkyBob chimed in near the end of the survey with h2o (in front of Django.) h2o turns out to be

  • MIT licensed
  • Written in C
  • Responds reasonably to CVEs
  • Used to do releases on github but now takes the interesting approach that ... each commit to master branch is considered stable and ready for general use ...
  • Packaged in ubuntu and debian (also without Provides: httpd), but it's a 2018 version with a bunch of cherry-picked fixes that look like upstream, so I'm not sure how "actually" up-to-date that version is (late 2023 best-case though.)
  • Also available as a library, which is common in go projects but a lot more unusual in C servers.

I thought I'd never heard of it, but I'd starred it on github at some unknown point.

Conclusions

Primarily Confirmation

  • There's more life in Apache than I'd realized (mod_md in particular)
  • nginx is still the mainstream choice
  • caddy is definitely up-and-coming with an enthusiastic community

Actual final numbers: 491 people responded.

  • 19% Apache
  • 53% nginx
  • 2% lighttpd
  • 21% Caddy
  • 0% webfs
  • 4% other/explain
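
To put those shares back into rough head counts, here is a small sketch that multiplies each reported percentage by the 491 respondents. The counts are only estimates: the percentages are already rounded to whole numbers (they add up to 99%, not 100%), so each estimate can be off by a couple of votes either way.

```python
# Poll shares as reported above; 491 is the reported number of respondents.
RESPONDENTS = 491
SHARES = {
    "Apache": 19,
    "nginx": 53,
    "lighttpd": 2,
    "Caddy": 21,
    "webfs": 0,
    "other/explain": 4,
}


def approx_votes(percent: int, total: int = RESPONDENTS) -> int:
    """Estimate a vote count from a whole-number percentage (nearest integer)."""
    return round(total * percent / 100)


# Estimated votes per option; these are reconstructions, not reported figures.
ESTIMATES = {name: approx_votes(pct) for name, pct in SHARES.items()}

if __name__ == "__main__":
    for name, votes in ESTIMATES.items():
        print(f"{name:>14}: ~{votes} votes")
    print(f"shares add up to {sum(SHARES.values())}%")
```

Read this way, the gap between nginx and everything else is a lot more than a rounding artifact, while the Caddy/Apache difference is on the order of ten votes.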

Unexpected Highlights

Not going to do another survey on them, but I was pleased (and surprised) at the number of serious alternatives that turned up, including a few things that I knew about but didn't realize were legitimate answers to my question:

  • Twisted Web (including python3-txacme)
  • NGINX Unit
  • haproxy
  • Traefik
  • Kestrel+YARP (dotnet)

Personal Decisions

Part of the motivation for the survey was that I was stuck on an upgrade path for some old blogs and project sites. While that sounds low-value, it's also my playground for professional builds and recommendations, so I take it way more seriously than I probably should...

While the survey results didn't give me a final answer (nor were they intended to) they did reduce some fretting and led me to a more direct plan:

  • put a bounded amount of time into building caddy to my latest-from-source standards
  • prototype something with Twisted Web, particularly for the fast path "idea → domain registration → publication" projects, and see how it feels for more conventional use
  • fall back to nginx if I don't get anywhere in a week.

What Actually Happened

Since I wanted to get at least one blog up and running quickly to publish this article, I took a shorter path:

  • Installed blag which is probably the least-effort markdown blog to get going2
  • Used my draft caddy-in-podman notes to do a quick nginx-in-podman, rootless
  • Used nftables NAT support to forward 80/443 to the podman published ports.
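
As a rough illustration of the last step (not the actual rules from this setup), the sketch below renders a minimal native nftables ruleset that redirects incoming 80/443 to unprivileged ports. The table name `webredirect` and the internal ports 8080/8443 are made up for the example; substitute whatever the podman published ports really are. It uses the `ip` family, so it only covers IPv4.

```python
def render_redirect_ruleset(port_map: dict[int, int], table: str = "webredirect") -> str:
    """Render an nftables ruleset that redirects public TCP ports to local ones.

    port_map maps public port -> unprivileged local port. The table name is a
    hypothetical placeholder, not something taken from the original setup.
    """
    lines = [
        f"table ip {table} {{",
        "    chain prerouting {",
        # A nat chain on the prerouting hook sees traffic arriving from other hosts.
        "        type nat hook prerouting priority dstnat; policy accept;",
    ]
    for public, internal in sorted(port_map.items()):
        lines.append(f"        tcp dport {public} redirect to :{internal}")
    lines += ["    }", "}"]
    return "\n".join(lines) + "\n"


# Assumed port choices for illustration only.
RULESET = render_redirect_ruleset({80: 8080, 443: 8443})

if __name__ == "__main__":
    # Save the output to a file and load it with `nft -f <file>` (as root).
    print(RULESET, end="")
```

Because the chain hangs off the prerouting hook, it only affects connections coming in from other machines; connections originating on the box itself never pass through it and would presumably need a similar chain on the output hook.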

That's just on my laptop but by the time you read this it'll be transplanted to a real server.

The key here is that the nginx-in-podman bit is just the server:

  • it bind-mounts nginx.conf
  • it bind-mounts a multiple-domain content directory

so content and operation are relatively separated, and a new server can be tested with the live content. More importantly, if I succeed in my caddy building efforts, I can drop in a caddy-in-podman container and "effortlessly" swap from nginx to caddy without any real sysadmin effort beyond a podman stop/podman run (which also leaves me a quick path to rolling back to the working version.) Yes, this is the whole promise of container-based modularity, but I needed to see it scale down without a bunch of larger scale complexity.3
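
To make the "swap the server, keep the content" idea concrete, here is a sketch that builds the two `podman run` command lines side by side. Everything specific in it is an assumption for illustration: the host paths under `/srv/web`, the container name `web`, the published ports 8080/8443, the image names (the caddy one stands in for a locally built image), and the in-container paths, which are the usual defaults for the stock nginx and caddy images rather than anything confirmed by this setup.

```python
import shlex


def podman_run_argv(name, image, config, content, ports):
    """Build a `podman run` argv for a web server container.

    config and content are (host_path, container_path) pairs that get
    bind-mounted read-only; ports maps host port -> container port.
    """
    argv = ["podman", "run", "--detach", "--rm", "--name", name]
    for host_port, container_port in sorted(ports.items()):
        argv += ["--publish", f"{host_port}:{container_port}"]
    for host_path, container_path in (config, content):
        argv += ["--volume", f"{host_path}:{container_path}:ro"]
    argv.append(image)
    return argv


# Hypothetical shared settings: the same ports and content for either server.
PORTS = {8080: 80, 8443: 443}
CONTENT_DIR = "/srv/web/sites"

NGINX = podman_run_argv(
    "web", "docker.io/library/nginx",
    ("/srv/web/nginx.conf", "/etc/nginx/nginx.conf"),
    (CONTENT_DIR, "/usr/share/nginx/html"),
    PORTS,
)
CADDY = podman_run_argv(
    "web", "localhost/caddy-from-source",  # placeholder for a locally built image
    ("/srv/web/Caddyfile", "/etc/caddy/Caddyfile"),
    (CONTENT_DIR, "/srv"),
    PORTS,
)

if __name__ == "__main__":
    # The swap: stop the running server, then start the other one.
    print("podman stop web")
    print(shlex.join(CADDY))
```

The only things that differ between the two commands are the image and the config mount; the content directory and published ports stay put, which is what makes rolling back a matter of running the other command. On SELinux hosts the volume options would likely also need a relabel flag, which this sketch leaves out.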


  1. Three entire months ago, in July 2024, simplifying the short-mastodon-rant to large-blog-rant pipeline is the entire thing I was trying to kick off here... 

  2. But see followup discussion on static site tools

  3. Does Kubernetes bring anything to an environment with 2 or 3 containers? I'm not prepared to find out just yet.