systemd

The biggest and last website to move over to new hardware was THOK.ORG itself. Bits of this website go back decades, to a slightly overclocked 486DX/25 on a DSL line - while static websites have some significant modern advantages, the classic roots are in "not actually having real hardware to run one". That said, it has a lot of sentimental value and holds a lot of personal memory - mainly personal project notes, for things like "palm pilot apps" or "what even is this new blogging thing" - so I do care about keeping it running, but at the same time I was a little nervous about the move.

(Spoiler warning: as of this posting, the conversion is complete and mostly uneventful, and I've made updates to the new site - this is just notes on some of the conversion process.)

Why is a static site complicated?

"static site" can mean a lot of things, but the basic one is that the web server itself only delivers files over http/https and doesn't do anything dynamic to actually deliver the content.1 This has security benefits (you don't have privilege boundaries if there are no privileges) and run-time complexity benefits (for one example, you're only using the most well-tested paths through the server code) but it also has testing and reliability benefits - if you haven't changed anything in the content, you can reasonably expect that the server isn't going to do anything different with it, so if it worked before, it works now.

This also means that you will likely have a "build" step where you take the easiest-to-edit form and turn it into deliverable HTML. Great for testing - you can render locally, browse locally, and then push the result to the live site - but it does mean that you want some kind of local tooling, even if it's just the equivalent of find | xargs pandoc and a stylesheet.
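A minimal version really is about that small. Here's a sketch - assuming pandoc is installed and a style.css sits at the site root; this is an illustration, not what THOK.ORG actually uses:

find . -name '*.md' | while read -r f; do
    pandoc -s --css style.css "$f" -o "${f%.md}.html"
done

(-s produces a standalone HTML page, and --css links the stylesheet into it.)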

For THOK.ORG, I cared very little about style and primarily wanted to put up words (and code snippets) - Markdown would have been the obvious choice, but it hadn't been invented yet! I was already in the habit of writing up project notes using a hotkey that dropped a username-and-datestamp marker into a file, plus various "rich text" conventions from 1990s email (nothing more than italic, bold, and code) - I wasn't even thinking of them as markup, just as conventions that people recognized in email without further rendering. So while the earliest versions of the site were just HTML, later ones were a little code to take "project log" files and expand them into blog-like entries. All very local, README → README.html and that was it.

Eventually I wrote a converter that turned the project logs into "proper" markdown - not a perfect one (while using a renderer helped bring my conventions in line with what rendered OK, I never managed to really formalize it, and some stuff was just poorly rendered), just one that was good enough that I could clean up the markdown by hand and go all in on it. There was a "side trip" of using Tumblr as a convenient mobile blogging service: phone browsers were just good enough that I could write articles in markdown on a phone with a folding bluetooth keyboard at the pycon.ca conference (2012) and get stuff online directly. I didn't actually stick with this, and eventually converted those posts back to local markdown blogs (and then still didn't update them.)

Finally (2014 or so) I came up with a common unifying tool to drag bits of content together and do all of the processing for the content I'd produced over the years. thoksync included a dependency declaration system that allowed parallelized processing, and various performance hacks that have been overtaken by Moore's Law in the decade since. The main thing is that it was fast enough to run in a git post-update hook, so when I pushed changes to markdown files, they'd get directly turned into live site updates. Since I was focused on other things in the meantime (including a new startup in 2015) and the code worked, I hadn't really touched it in the last decade... so it was still Python 2 code.

Python 2 to Python 3 conversion

Having done a big chunk of work (including a lot of review, guidance, and debugging) on a Python 3 conversion of a commercial code base, I was familiar with the process - and had not expected to ever need to touch it again; that product conversion was far later than was in any way reasonable, and most other companies would have been forced to convert sooner. It was a bit of a surprise to discover another 2000+ lines of Python 2 code that were My Problem!

While there were only a few small CLI-tool tests in the code (which I was nonetheless glad to have), I did have the advantage of a "perfect" test suite - the entire thok.org site. All I had to do was make sure that the rendering from the Python 3 code matched the output from the Python 2 code - 80,000 lines of HTML that should be identical should be easy to review, right?

This theory worked out reasonably well at first - any time the partially converted code crashed, well, that was obviously something that needed fixing.

Here in 2025, with Python 3.14 released and the Python Documentary published, no one really cares about the conversion process as anything but a historical curiosity... but I had a bunch of notes about this particular project, so I might as well collect them in one place.

  • Trivia
    • #! update (I prefer /usr/bin/python3 but there are solid arguments that /usr/bin/env python3 is better; I just don't happen to use venv or virtualenv, so for my workflow they're equivalent.)
    • print → print(), >> → file= - print itself was one of the original big obnoxious changes that broke Python 2 code instantly; it wasn't until relatively late that from __future__ import print_function came along, which didn't help existing code but gave you a chance to upgrade partially and have shared code that was still importable from both versions. (Sure, library code shouldn't call print - it still did, so it was still a source of friction. Personally I would have preferred a mechanism for paren-less function calls or definitions... but I wanted that when I first started using Python 2, and it was pretty clear that it wasn't going to happen. M-expressions didn't catch on either.) There's a combined sketch of these mechanical changes after this list.
    • Popen(text=True) was a fairly late way of saying "the python 2 behaviour was fine for most things, let's have that back instead of littering every read and write with conversion code." (universal_newlines=True did the same thing earlier, kind of accidentally.)
    • file() → open() wasn't particularly important.
    • long → int (only in tumblr2thoksync; most of this code was string handling, not numeric) - this was just dropping an alias for consistency, since the two types had long behaved interchangeably even in Python 2.
    • import rfc822 → import email.utils (parsedate and formatdate were used in a few RSS-related places; just (reasonable) reorganization, the functions were unchanged.)
    • SimpleHTTPServer, BaseHTTPServer → http.server
    • isinstance(basestring) → isinstance(str) - string/byte/unicode handling was probably the largest single point where reasoning about large chunks of code from a 2-and-3 perspective was necessary; it's also somewhere that having type hints in Python 2 would have been an enormous help, but the syntax didn't exist. Fortunately, for this project none of the subtleties applied - most of the checks were really that something was not an xml.etree fragment, and it didn't matter at all what kind of string it was.
  • Language improvements
    • except as - nicer to stuff an exception that you're intentionally poking at into a relevantly-named variable instead of rummaging around in sys.exc_info. (raise from is also great but nothing in this codebase needed it.)
    • f=open() → with open() as f encourages paying attention to file handle lifetimes, reducing the risk of handle leakage and avoiding certain classes of bugs caused by files not flushing when you expect them to ("when the handle gets garbage collected" vs. the much more explicit and visible "when you leave the scope of the with clause".)
    • argument "tuple unpacking" is gone - this wasn't an improvement so much as "other function syntax made it harder to get this right and it wasn't used much, and there was a replacement syntax (explicit unpacking) so it was droppable." Not great but maybe it was excessively clever to begin with.
    • Python 2 allowed sorting functions by id; Python 3 doesn't, so just extract the names in key= (the actual order never mattered, just that the sort was consistent within a run.)
  • Third-party library changes (after all, if your callers need massive changes anyway, might as well clean up some of your own technical debt, since you can get away with incompatible changes.)
    • Markdown library
      • markdown.inlinepatterns.Pattern → InlineProcessor (the old API still exists, but some porting difficulties meant that the naïve port wasn't going to work anyway, so it made sense to debug the longer-lived new API.)
      • etree no longer leaked from markdown.util (trivial)
      • grouping no longer mangled, so .group(1) is correct and what I'd wanted to use in the first place
      • add → register (trivial)
      • different return interface (handleMatch now returns the element plus the start/end of the span it consumed, rather than just a node)
      • the string hack for WikiLinkExtension arguments no longer works; the class-based interface was well documented and had better language-level sanity checking anyway.
    • lost the feedvalidator package entirely, so minivalidate.py doesn't actually work yet (probably not worth fixing; external RSS validators are better cared for and independent anyway.)
    • lxml.xml.tostring → encoding="unicode" in a few places to json-serialize sanely
      • in a few places, keep it bytes but open("w" → "wb") instead
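To make the language-level items above concrete, here's a small hypothetical sketch - not code from thoksync, just the same patterns (print with file=, with open, except as, sorting with an extracted key):

import sys

def slurp(path):
    # "f = open(path)" becomes a with-block; the close is explicit and visible
    try:
        with open(path) as f:
            return f.read()
    except OSError as e:  # "except IOError, e:" becomes "except OSError as e:"
        print("failed:", e, file=sys.stderr)  # "print >>sys.stderr, ..." becomes file=
        return ""

def consistent_order(handlers):
    # Python 3 won't compare function objects, so sort on an extracted name
    return sorted(handlers, key=lambda fn: fn.__name__)

Similarly, for the Markdown library changes: a minimal custom inline pattern under the new InlineProcessor interface looks roughly like this (following the library's documented API, not my actual extensions):

import xml.etree.ElementTree as etree

import markdown
from markdown.extensions import Extension
from markdown.inlinepatterns import InlineProcessor

class DelProcessor(InlineProcessor):
    def handleMatch(self, m, data):
        # new return interface: the element plus the span of text it consumed
        el = etree.Element('del')
        el.text = m.group(1)  # group(1) really is the first group now
        return el, m.start(0), m.end(0)

class DelExtension(Extension):
    def extendMarkdown(self, md):
        # register() with an explicit priority replaces the old add()
        md.inlinePatterns.register(DelProcessor(r'~~(.+?)~~', md), 'del', 175)

print(markdown.markdown('some ~~struck~~ text', extensions=[DelExtension()]))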

Once the tooling got to the point where it ran on the entire input without crashing, the next "the pre-existing code is by definition correct" test was to just diff the built site (the output) against the existing Python 2 version. The generated HTML converged quickly, but the diff did turn up some corrupted jpg files and other large binaries; these were all repairable from other sources, but it does suggest that more long-term content verification (or at the very least, "checking more things into git") should be an ongoing task. (All of the damage was recoverable; it was just distressing that it went undiscovered as long as it did.)

Attempting to get out of the blog tooling business

The tooling described here evolved around a particular kind of legacy data and ideas, and isn't really shaped appropriately for anyone else to use - it isn't even well-shaped for me to use on any other sites. While the port did allow me to do some long-overdue content maintenance of thok.org itself, it was getting in the way of a number of other web-writing projects. Attempting to apply the Codes Well With Others principle, I dug into using staticsite, which was simple, written in Python 3, based on markdown and Jinja2, and had at least some recent development work. I ended up using it for several sites, including this one, though not thok.org itself (at this time.)

I may end up writing a replacement for staticsite eventually, but since it has really worked pretty well, I expect to keep any replacement shaped like staticsite so I can use it as a drop-in for the current handful of sites. (I will probably start with just a replacement template - using plain HTML rather than upgrading to a current version of React - since most of what I want is very simple.) The other possibility is to move to pandoc as the engine, because it tries hard in entirely different ways.

Things Left Behind

The old system had a notification mechanism called Nagaina with a plugin mechanism for "probes" (AFS, Kerberos, NTP, disks, etc.) It took a crude approach: run all current probes, then diff against the previous run and notify (via Zephyr) if anything changed. The biggest flaw of this approach was that it relied on sending messages via MIT's Zephyr infrastructure; the second biggest was that it actually worked, so I never felt compelled to improve it (or move to something else.)

The new system has a bunch of systemd timer jobs that do things and report on them by email; OpenAFS notification is gone because the cell is gone, and other things have simpler failure modes and just need less monitoring. I have an extensive folder of possible replacement notification mechanisms - some day I'll pick one and then work backwards to tie anomaly detection and alerting into it.


  1. This definition of static doesn't preclude things with client-side javascript - I've seen one form of static site where the server delivered markdown files directly to the client and the javascript rendered them there, which is almost clever but requires some visible mess in the files, so I've never been that tempted; it would also mean implementing my own markdown extensions in javascript instead of python, and... no. 

There is a systemd feature request from 2019 which is surprisingly1 still open but not really going anywhere. There are scattered efforts to build pieces of it, so for clarity let's write down what I actually want.

What even is "cron-like behaviour"

The basic idea is that the output of a job gets dropped in my mailbox. This isn't because mail is particularly suitable for this, just that it's a well-established workflow - I don't need to build any new filtering or routing, and it shows up at the right "attention level": not interrupting me unless I'm already paused to check mail, easily "deferred" as unread, and able to handle long content.

Most uses fall into one of two buckets.

  • Long-timeline jobs (weekly backups, monthly letsencrypt runs) where I want to be reminded that they exist, so I want to see successful output (possibly with different subject lines.)
  • Jobs that run often but I don't want the reminder, only the failure reports (because I have a higher level way of noticing that they're still behaving - a monthly summary, or just "things are still working".)

The primary tools for this are

  • a working mail CLI
  • systemd timer files
  • systemd "parameterized service" files that get triggered when the timed job fails (or succeeds.)

The missing piece is how to actually collect the output.

Journal scraping?

We could just trust the journal - we can use journalctl --unit or --user-unit to pry out "the recent stuff", but if we can pass the PID of the job around, we can use _SYSTEMD_UNIT=xx _PID=yyy to get exactly the relevant content.
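Concretely, something like this - the unit name and PID are placeholders:

# everything the unit logged recently:
journalctl --unit=weekly-backup.service --since=-15min --no-pager -o cat
# or, with the main PID in hand, just that one process:
journalctl _SYSTEMD_UNIT=weekly-backup.service _PID=12345 -o cat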

(Hmm, we can pass %n into the mailing service (systemd.unit(5)), but not the pid?)

Separate capture?

Just run the program under script or chronic, pointing the log at a path built from %t or %T and other specifiers we already know; then OnFailure and OnSuccess handlers can mail it and/or clean it up.

While it would be nice to do everything with systemd mechanisms, if we have to we can have the wrapper do all of the work so we have enough control.2
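A sketch of that "separate capture" shape - the unit names and address are placeholders, and the %t path assumes a user service (where %t and $XDG_RUNTIME_DIR point at the same directory):

# weekly-backup.service - the job itself
[Unit]
OnFailure=mail-log@%n.service

[Service]
Type=oneshot
# script(1) captures stdout/stderr; -e makes it return the job's exit status
ExecStart=/usr/bin/script -qec '/usr/local/bin/do-backup' %t/%n.log

# mail-log@.service - parameterized; %i becomes the failing unit's name
[Service]
Type=oneshot
ExecStart=/bin/sh -c 'mail -s "FAILED: %i" me@example.org < "$XDG_RUNTIME_DIR/%i.log"'

(On a new enough systemd, OnSuccess= works the same way for the mail-it-anyway case.)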

In the end

Once I started poking at the live system, I realized that I was getting ahead of myself - I didn't have working mail delivery.3 Setting up postfix took enough time that I decided against anything more clever for the services - so instead, I just went with a minimal .service file that did

WorkingDirectory=...
Type=exec
ExecStart=bash -c '(time ...) 2>&1 | mail -s "Weekly ..." ...'

and a matching .timer file with a variant on

[Timer]
OnCalendar=Monday *-*-* 10:00

The systemd.time(7) man page has a hugely detailed set of syntax examples, and if that's not enough, systemd-analyze calendar --iterations=3 ... shows you the next few actual trigger times (displayed as local time, UTC, and a human-readable relative time) so you can be confident about when your jobs will really happen.
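For the timer above, that's just:

systemd-analyze calendar --iterations=3 'Monday *-*-* 10:00'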

For the initial services like "run an apt upgrade in the nginx container" I actually want to see all of the output, since weekly mail isn't that noisy; for other services I'll mix in chronic and ifne so that they don't bother me as much, but for now, the confidence that things actually ran is more pleasing than the repetition is distracting.

I do want a cleaner-to-use tool at some point - not a more sophisticated tool, just something like "cronrun ..." (sketched below) that automatically does the capture and mail, and maybe picks up the message subject from the .service file directly - so these are more readable. But the swamp I'm supposed to be draining is "decommissioning two machines running an AFS cell", so I'm closing the timebox on this for now.
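For reference, the hypothetical cronrun could be as little as this sketch (the name and mail target are made up; chronic/ifne behaviour would layer on top):

#!/bin/sh
# cronrun (hypothetical): run a job, capture all of its output, mail it.
# Usage: cronrun "subject" command [args...]
subject="$1"; shift
log=$(mktemp) || exit 1
"$@" >"$log" 2>&1
status=$?
[ "$status" -ne 0 ] && subject="FAILED ($status): $subject"
mail -s "$subject" "$USER" < "$log"
rm -f "$log"
exit "$status"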


  1. but not unreasonably: "converting log output to mails should be outside of systemd's focus." 

  2. moreutils gives us chronic, ifne, and lckdo, and possibly mispipe and ts if we're doing the capturing. cronutils also has a few bits. 

  3. This was a surprise because this is the machine I'd been using as my primary mail client, specifically Gnus in emacs. Turns out I'd configured smtpmail-send-it so that emacs would talk directly to port 587 on fastmail's customer servers with authenticated SMTP... but I'd never gotten around to actually configuring the machine itself.