The biggest and last website to move over to new hardware was THOK.ORG itself. Bits of this website go back decades, to a slightly overclocked 486DX/25 on a DSL line - while static websites have some significant modern advantages, the classic roots are in "not actually having real hardware to run one". That said, it does have a lot of sentimental value, and a lot of personal memory - mainly personal project notes, for things like "palm pilot apps" or "what even is this new blogging thing" - so I do care about keeping it running, but at the same time I'm a little nervous about touching it.
(Spoiler warning: as of this posting, the conversion is complete and mostly uneventful, and I've made updates to the new site - this is just notes on some of the conversion process.)
Why is a static site complicated?
"static site" can mean a lot of things, but the basic one is that the web server itself only delivers files over http/https and doesn't do anything dynamic to actually deliver the content.1 This has security benefits (you don't have privilege boundaries if there are no privileges) and run-time complexity benefits (for one example, you're only using the most well-tested paths through the server code) but it also has testing and reliability benefits - if you haven't changed anything in the content, you can reasonably expect that the server isn't going to do anything different with it, so if it worked before, it works now.
This also means that you will likely have a "build" step where you
take the easiest-to-edit form and turn it into deliverable HTML.
Great for testing - you can render locally, browse locally, and then
push the result to the live site - but it does mean that you want some
kind of local tooling, even if it's just the equivalent of
`find | xargs pandoc` and a stylesheet.
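As a sketch of how minimal that tooling can be - this assumes the `markdown` package and hypothetical `src/` and `site/` directories, not the actual thok.org layout:

```python
#!/usr/bin/python3
# Minimal static-site "build step": render every Markdown file under src/ into site/.
from pathlib import Path

import markdown

for md_file in Path("src").rglob("*.md"):
    out = Path("site") / md_file.relative_to("src").with_suffix(".html")
    out.parent.mkdir(parents=True, exist_ok=True)
    out.write_text(markdown.markdown(md_file.read_text()))
```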
For THOK.ORG, I cared very little about style and primarily wanted to
put up words (and code snippets) -
Markdown was the obvious
choice, but it hadn't been invented yet! I was already in the habit of
writing up project notes using a hotkey that dropped a username and
datestamp marker in a file, and then various "rich text" conventions
from 1990s email (nothing more than italic, bold, and code) - I
wasn't even thinking of them as markup, just as conventions that
people recognized in email without further rendering. So while the
earliest versions of the site were just HTML, later ones were a little
code to take "project log" files and expand them into blog-like
entries. All very local, `README` → `README.html`, and that was it.
Eventually I wrote a converter that turned the project logs into
"proper" markdown - not a perfect one (while using a renderer helped
bring my conventions in line with what rendered ok, I never managed to
really formalize it and some stuff was just poorly rendered), just one
that was good enough that I could clean up the markdown by hand and go
all in on it. There was a "side trip" of using
Tumblr as a convenient mobile
blogging service - phone browsers were just good enough that I could
write articles in markdown on a phone with a folding bluetooth
keyboard at the pycon.ca
conference
(2012) and get stuff online directly - I
didn't actually stick with this and eventually converted them back
to local markdown blogs (and then still didn't update them.)
Finally (2014 or so) I came up with a common unifying tool to drag
bits of content together and do all of the processing for the content
I'd produced over the years. `thoksync` included a dependency
declaration system that allowed parallelized processing, and various
performance hacks that have been overtaken by Moore's Law in the last
decade. The main thing is that it was fast enough to run in a git
`post-update` hook, so when I pushed changes to markdown files, they'd
get directly turned into live site updates. Since I was focused on
other things in the meantime (including a new startup in 2015) and the
code worked, I hadn't really touched it in the last decade... so it
was still Python 2 code.
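The hook mechanism itself is the simple part - something along these lines (a hypothetical sketch; the paths and the direct `thoksync` invocation are invented for illustration) in the bare repo's `hooks/post-update` is enough to turn a push into a live-site rebuild:

```python
#!/usr/bin/python3
# hooks/post-update (hypothetical): check out the pushed content, then rebuild.
import subprocess

WORKTREE = "/var/www/thok-src"  # invented path
subprocess.run(["git", "--work-tree", WORKTREE, "checkout", "-f"], check=True)
subprocess.run(["thoksync"], cwd=WORKTREE, check=True)  # regenerate the live pages
```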
Python 2 to Python 3 conversion
Having done a big chunk of work (including a lot of review, guidance, and debugging) on a Python 3 conversion of a commercial code base, I was both familiar with the process and had not expected to ever need to touch it again - the product conversion itself was far later than was in any way reasonable, and most other companies would have been forced to convert sooner. It was a bit of a surprise to discover another 2000+ lines of Python 2 code that was My Problem!
While there were only a few small CLI-tool tests in the code (which I was nonetheless glad to have) I did have the advantage of a "perfect" test suite - the entire thok.org site. All I had to do was make sure that the rendering from the Python 3 code matched the output from the Python 2 code - 80,000 lines of HTML that should be the same should be easy to review, right?
This theory worked out reasonably well at first - any time the partially converted code crashed, well, that was obviously something that needed fixing.
Here in 2025, with Python 3.14 released and the Python Documentary published, no one really cares about the conversion process as anything but a historical curiosity... but I had a bunch of notes about this particular project, so I might as well collect them in one place.
- Trivia
  - `#!` update (I prefer `/usr/bin/python3` but there are solid arguments that `/usr/bin/env python3` is better; I just don't happen to use `venv` or `virtualenv`, so for my workflow they're equivalent.)
  - `print` → `print()`, `>>` → `file=` - print itself was one of the original big obnoxious changes that broke Python 2 code instantly; it wasn't until relatively late that `from __future__ import print_function` came along, which didn't help existing code but gave you a chance to upgrade partially and have shared code that was still importable from both versions. (Sure, library code shouldn't call `print` - it still did, so it was still a source of friction.) Personally I would have preferred a mechanism for paren-less function calls or definitions... but I wanted that when I first started using Python 2, and it was pretty clear that it wasn't going to happen. M-expressions didn't catch on either... (A few of these idioms are sketched in code after this list.)
  - `Popen(text=True)` was a fairly late way of saying "the Python 2 behaviour was fine for most things, let's have that back instead of littering every read and write with conversion code." (`universal_newlines=True` did the same thing earlier, kind of accidentally.)
  - `file()` → `open()` wasn't particularly important.
  - `long` → `int` (only in `tumblr2thoksync`; most of this code was string handling, not numeric) - this was just dropping an alias for consistency, they'd long been identical even in Python 2.
  - `import rfc822` → `import email.utils` (`parsedate` and `formatdate` were used in a few RSS-related places. Just (reasonable) reorganization; the functions were unchanged.)
  - `SimpleHTTPServer`, `BaseHTTPServer` → `http.server`
  - `isinstance(basestring)` → `isinstance(str)` - string/byte/unicode handling was probably the largest single point where reasoning about large chunks of code from a 2-and-3 perspective was necessary; it's also somewhere that having type hints in Python 2 would have been an enormous help, but the syntax didn't exist. Fortunately, for this project none of the subtleties applied - most of the checks were really that something was not an `xml.etree` fragment, and it didn't matter at all what kind of string it was.
- Language improvements
  - `except as` - nicer to stuff an exception that you're intentionally poking at into a relevantly-named variable instead of rummaging around in `sys.exc_info`. (`raise from` is also great but nothing in this codebase needed it.)
  - `f=open()` → `with open() as f` encourages paying attention to file handle lifetimes, reducing the risk of handle leakage and avoiding certain classes of bugs caused by files not flushing when you expect them to ("when the handle gets garbage collected" vs. the much more explicit and visible "when you leave the scope of the `with` clause.")
  - argument "`tuple` unpacking" is gone - this wasn't an improvement so much as "other function syntax made it harder to get this right and it wasn't used much, and there was a replacement syntax (explicit unpacking) so it was droppable." Not great, but maybe it was excessively clever to begin with.
  - Python 2 allowed sorting functions by `id`; Python 3 doesn't, so just extract the names in `key=` (the actual order never mattered, just that the sort was consistent within a run.)
- Third-party library changes (after all, if your callers need massive changes anyway, might as well clean up some of your own technical debt, since you can get away with incompatible changes.)
  - Markdown library:
    - `markdown.inlinepatterns.Pattern` → `InlineProcessor` (the old API still exists, but some porting difficulties meant that the naïve port wasn't going to work anyway, so it made sense to debug the longer-lived new API. See the sketch after this list.)
    - `etree` no longer leaked from `markdown.util` (trivial)
    - grouping no longer mangled, so `.group(1)` is correct and what I'd wanted to use in the first place
    - `add` → `register` (trivial)
    - different return interface
    - string hack for `WikiLinkExtension` arguments no longer works; the class-based interface was well documented and had better language-level sanity checking anyway.
  - lost the `feedvalidator` package entirely, so `minivalidate.py` doesn't actually work yet (probably not worth fixing; external RSS validators are more well cared for and independent anyway.)
  - `lxml.xml.tostring` → `encoding="unicode"` in a few places to json-serialize sanely
    - in a few places, keep it `bytes` but `open("w" → "wb")` instead
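For concreteness, here's what a few of those mechanical changes look like side by side - a hypothetical before/after fragment in the spirit of the code, not actual lines from `thoksync`:

```python
import subprocess

# Python 2 flavor, as comments (it no longer parses):
#   log = open("build.log", "a")
#   print >> log, "rebuilt", page
#   out = subprocess.Popen(cmd, stdout=subprocess.PIPE).communicate()[0]  # bytes!

def rebuild_one(cmd, page):
    with open("build.log", "a") as log:       # handle closed/flushed at end of scope
        print("rebuilt", page, file=log)      # print >> log  becomes  file=log
    try:
        # text=True says "the Python 2 str behaviour was fine, give it back"
        return subprocess.run(cmd, stdout=subprocess.PIPE,
                              text=True, check=True).stdout
    except subprocess.CalledProcessError as err:  # except ... as names the exception
        raise RuntimeError(f"{cmd[0]} failed ({err.returncode})") from err
```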
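And the shape of the new Markdown API - a minimal, hypothetical inline extension (strikethrough, not one of the site's actual extensions) showing `register`, the now-unmangled `.group(1)`, and the different return interface:

```python
import xml.etree.ElementTree as etree

from markdown.extensions import Extension
from markdown.inlinepatterns import InlineProcessor

class StrikeProcessor(InlineProcessor):
    def handleMatch(self, m, data):
        el = etree.Element("del")
        el.text = m.group(1)               # group numbers match the pattern as written
        return el, m.start(0), m.end(0)    # new interface: element plus match bounds

class StrikeExtension(Extension):
    def extendMarkdown(self, md):
        # register() replaces the old add(); the number is an explicit priority
        md.inlinePatterns.register(StrikeProcessor(r"~~(.+?)~~", md), "strike", 175)
```

(Used as `markdown.markdown(text, extensions=[StrikeExtension()])`.)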
Once the tooling got to the point where it ran on the entire input
without crashing, the next "the pre-existing code is by definition
correct" test was to just diff the built site (the output) against the
existing Python 2 version. The generated HTML converged quickly,
but the diff did turn up some corrupted `jpg` files and other large
binaries; these were all repairable from other sources, but it does
suggest that more long-term content verification (or at the very
least, "checking more things into git") should be an ongoing task.
(All of the damage was recoverable, it was just distressing that it
went undiscovered as long as it did.)
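"Checking more things into git" is one answer; even a dumb checksum manifest would have caught the corruption much earlier. A sketch of the idea (invented filenames, not an existing tool):

```python
#!/usr/bin/python3
# Record sha256 checksums of everything under site/, or verify against the last run.
import hashlib
import json
import sys
from pathlib import Path

manifest = {str(p): hashlib.sha256(p.read_bytes()).hexdigest()
            for p in sorted(Path("site").rglob("*")) if p.is_file()}
if sys.argv[1:] == ["verify"]:
    for path, digest in json.loads(Path("manifest.json").read_text()).items():
        if manifest.get(path) != digest:
            print("changed or missing:", path)
else:
    Path("manifest.json").write_text(json.dumps(manifest, indent=1))
```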
Attempting to get out of the blog tooling business
The tooling described here evolved around a particular kind of legacy
data and ideas, and isn't really shaped appropriately for anyone else
to use - it isn't even well-shaped for me to use on any other sites.
While the port did allow me to do some long-overdue content
maintenance of thok.org itself, it was getting in the way of a number
of other web-writing projects. Attempting to apply the Codes Well
With Others principle, I dug into
using `staticsite`, which was
simple, written in Python 3, based on `markdown` and `Jinja2`, and had
at least some recent development work. I ended up using it for
several sites including this one, though not thok.org itself (at this
time.)
I may end up going back and doing a replacement for `staticsite`, but
I expect to keep it shaped like `staticsite` so I can use it as a
drop-in replacement for the current handful of sites, since it's really
worked pretty well. (I will probably try to start with just a
replacement template - using plain HTML rather than upgrading to a
current version of React - since most of what I want is very simple.)
The other possibility is to move to `pandoc` as the engine, because it
tries hard in entirely different ways.
Things Left Behind
The old system had a notification mechanism called Nagaina, with a plugin mechanism for "probes" (AFS, Kerberos, NTP, disks, etc.) It had a crude cycle of "run all current probes, then diff against the previous run and notify (via Zephyr) if anything changed". The biggest flaw of this approach was that it relied on sending messages via MIT's Zephyr infrastructure; the second biggest was that it actually worked, so I never felt all that compelled to improve it (or move to something else.)
The new system has a bunch of `systemd` timer jobs that do things and
report on them by email; OpenAFS notification is gone because the
cell is gone, and other things have simpler failure modes and just
need less monitoring. I have an extensive folder of possible
replacement notification mechanisms - some day I'll pick one and then
work backwards to tie anomaly detection and alerting into it.
- This definition of static doesn't preclude things with client-side javascript - I've seen one form of static site where the server delivered markdown files directly to the client and the javascript rendered them there, which is almost clever but requires some visible mess in the files, so I've never been that tempted; it would also mean implementing my own markdown extensions in javascript instead of python, and... no. ↩
There is a systemd feature request from 2019 for cron-like mailing of job output, which is surprisingly1 still open but not really going anywhere. There are scattered efforts to build pieces of it, so for clarity let's write down what I actually want.
What even is "cron-like behaviour"
The basic idea is that the output of a job gets dropped in my mailbox. This isn't because mail is particularly suitable for this, just that it's a well-established workflow - I don't need to build any new filtering or routing, and it shows up at the right "attention level": not interrupting me unless I've already paused to check mail, easily "deferred" as unread, and able to handle long content.
Most uses fall into one of two buckets.
- Long-timeline jobs (weekly backups, monthly letsencrypt runs) where I want to be reminded that they exist, so I want to see successful output (possibly with different subject lines.)
- Jobs that run often but I don't want the reminder, only the failure reports (because I have a higher level way of noticing that they're still behaving - a monthly summary, or just "things are still working".)
The primary tools for this are

- a working `mail` CLI
- `systemd` timer files
- `systemd` "parameterized service" files that get triggered by the timer failing (or passing.)
The missing piece is how to actually collect the output.
Journal scraping?
We could just trust the journal - we can use `journalctl --unit` or
`--user-unit` to pry out "the recent stuff", but if we can pass the PID
of the job around, we can use `_SYSTEMD_UNIT=xx _PID=yyy` to get the
relevant content.
(Hmm, we can pass `%n` into the mailing service
(`systemd.unit(5)`), but not the pid?)
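For reference, the usual shape of that pattern - a hypothetical `status-email@.service` template unit, triggered from the real job with `OnFailure=status-email@%n.service`, where the `%n` (the failing unit's name) arrives here as the instance `%i` (instance-name escaping glossed over in this sketch):

```
[Unit]
Description=Mail recent journal output for %i

[Service]
Type=oneshot
ExecStart=/bin/sh -c 'journalctl --unit "%i" --since "-10min" | mail -s "failed: %i" root'
```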
Separate capture?
Just run the program under `script` or `chronic` pointing the log to
`%t` or `%T`, and generate it with things we know, and then
`OnFailure` and `OnSuccess` can mail it and/or clean it up.
While it would be nice to do everything with `systemd` mechanisms, if
we have to we can have the wrapper do all of the work so we have
enough control.2
In the end
Once I started poking at the live system, I realized that I was
getting ahead of myself - I didn't have working mail delivery.3
Setting up `postfix` took enough time that I decided against anything more
clever for the services - so instead, I just went with a minimal
`.service` file that did

```
[Service]
WorkingDirectory=...
Type=exec
ExecStart=bash -c '(time ...) 2>&1 | mail -s "Weekly ..." ...'
```
and a matching `.timer` file with a variant on

```
[Timer]
OnCalendar=Monday *-*-* 10:00
```
The `systemd.time(7)` man page has a hugely detailed set of syntax
examples, and if that's not enough,
`systemd-analyze calendar --iterations=3 ...`
shows you the next few actual times (displayed as localtime, UTC, and
a human-readable relative time expression) so you can be confident
about when your jobs will really happen.
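For the timer above, that's:

```
systemd-analyze calendar --iterations=3 'Monday *-*-* 10:00'
```

(the quoting keeps the calendar expression as a single argument.)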
For the initial services like "run an `apt` upgrade in the `nginx`
container" I actually want to see all of the output, since a weekly job
isn't that noisy; for other services I'll mix in `chronic` and `ifne`
so that they don't bother me as much, but for now, the confidence that
things actually ran is more pleasing than the repetition is
distracting.
I do want a cleaner-to-use tool at some point - not a more
sophisticated tool, just something like "`cronrun ...`" that
automatically does the capture and mail, and maybe picks up the message
subject from the `.service` file directly - so these are more
readable. But for now, the swamp I'm supposed to be draining is
"decommissioning two machines running an AFS cell", so I'm closing the
timebox on this.
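If I do get to it, the whole tool is barely a page - a hypothetical sketch (the name, recipient, and argument convention are all invented):

```python
#!/usr/bin/python3
# cronrun (hypothetical): run a job, mail its output - but only when there is
# any output, or the job failed (the ifne-style behaviour described above).
import subprocess
import sys

def main():
    subject, *cmd = sys.argv[1:]
    job = subprocess.run(cmd, stdout=subprocess.PIPE,
                         stderr=subprocess.STDOUT, text=True)
    if job.stdout or job.returncode != 0:
        subprocess.run(["mail", "-s", f"{subject} (rc={job.returncode})", "root"],
                       input=job.stdout, text=True, check=True)
    return job.returncode

if __name__ == "__main__":
    sys.exit(main())
```

which would reduce the `ExecStart` lines above to something like `ExecStart=cronrun "Weekly ..." ...`.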
- but not unreasonably: "converting log output to mails should be outside of systemd's focus." ↩
- `moreutils` gives us `chronic`, `ifne`, and `lckdo`, and possibly `mispipe` and `ts` if we're doing the capturing. `cronutils` also has a few bits. ↩
- This was a surprise because this is the machine I'd been using as my primary mail client, specifically Gnus in `emacs`. Turns out I'd configured `smtpmail-send-it` so that emacs would directly talk to port 587 on fastmail's customer servers with authenticated SMTP... but I'd never gotten around to actually configuring the machine itself. ↩