THOK.ORG itself

The biggest and last website to move over to new hardware was THOK.ORG itself. Bits of this website go back decades, to a slightly overclocked 486DX/25 on a DSL line - while static websites have some significant modern advantages, the classic roots are in "not actually having real hardware to run one". That said, it does have a lot of sentimental value, and a lot of personal memory - mainly personal project notes, for things like "palm pilot apps" or "what even is this new blogging thing" - so I do care about keeping it running, but at the same time am a little nervous about it.

(Spoiler warning: as of this posting, the conversion is complete and mostly uneventful, and I've made updates to the new site - this is just notes on some of the conversion process.)

Why is a static site complicated?

"static site" can mean a lot of things, but the basic one is that the web server itself only delivers files over http/https and doesn't do anything dynamic to actually deliver the content.1 This has security benefits (you don't have privilege boundaries if there are no privileges) and run-time complexity benefits (for one example, you're only using the most well-tested paths through the server code) but it also has testing and reliability benefits - if you haven't changed anything in the content, you can reasonably expect that the server isn't going to do anything different with it, so if it worked before, it works now.

This also means that you will likely have a "build" step where you take the easiest-to-edit form and turn it into deliverable HTML. Great for testing - you can render locally, browse locally, and then push the result to the live site - but it does mean that you want some kind of local tooling, even if it's just the equivalent of find | xargs pandoc and a stylesheet.
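As a rough sketch of what that minimal local tooling can look like in Python rather than shell: the function names, the `.md` suffix, and the `style.css` default below are all assumptions for illustration, not anything the real site uses. The planning step is kept separate from the rendering step so the mapping can be checked without pandoc installed.

```python
from pathlib import Path
import shutil
import subprocess


def plan_build(src: Path, out: Path) -> list[tuple[Path, Path]]:
    """Pair every Markdown file under src with the HTML file it should become."""
    return [
        (md, out / md.relative_to(src).with_suffix(".html"))
        for md in sorted(src.rglob("*.md"))
    ]


def build(src: Path, out: Path, stylesheet: str = "style.css") -> int:
    """Render each planned page with pandoc; returns the number of pages built."""
    if shutil.which("pandoc") is None:
        raise RuntimeError("pandoc not found on PATH")
    pages = plan_build(src, out)
    for md, html in pages:
        html.parent.mkdir(parents=True, exist_ok=True)
        # --standalone emits a full HTML document; --css links the stylesheet.
        subprocess.run(
            ["pandoc", "--standalone", "--css", stylesheet, "-o", str(html), str(md)],
            check=True,
        )
    return len(pages)
```

Calling `build(Path("content"), Path("public"))` would render a tree locally, which can then be browsed before anything is pushed to the live site.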

For THOK.ORG, I cared very little about style and primarily wanted to put up words (and code snippets) - Markdown was the obvious choice, but it hadn't been invented yet! I was already in the habit of writing up project notes using a hotkey that dropped a username and datestamp marker in a file, and then various "rich text" conventions from 1990s email (nothing more than italic, bold, and code) - I wasn't even thinking of them as markup, just as conventions that people recognized in email without further rendering. So while the earliest versions of the site were just HTML, later ones were a little code to take "project log" files and expand them into blog-like entries. All very local, README → README.html and that was it.

Eventually I wrote a converter that turned the project logs into "proper" markdown - not a perfect one (while using a renderer helped bring my conventions in line with what rendered ok, I never managed to really formalize it and some stuff was just poorly rendered), just one that was good enough that I could clean up the markdown by hand and go all in on it. There was a "side trip" of using Tumblr as a convenient mobile blogging service - phone browsers were just good enough that I could write articles in markdown on a phone with a folding bluetooth keyboard at the pycon.ca conference (2012) and get stuff online directly - I didn't actually stick with this and eventually converted them back to local markdown blogs (and then still didn't update them.)

Finally (2014 or so) I came up with a common unifying tool to drag bits of content together and do all of the processing for the content I'd produced over the years. thoksync included a dependency declaration system that allowed parallelized processing, and various performance hacks that have been overtaken by Moore's Law in the last decade. The main thing is that it was fast enough to run in a git post-update hook so when I pushed changes to markdown files, they'd get directly turned into live site updates. Since I was focussed on other things in the meantime (including a new startup in 2015) and the code worked I hadn't really touched it in the last decade... so it was still python 2 code.
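The general idea of "declare dependencies, then process in parallel" can be sketched with the standard library alone. This is not thoksync's actual design, just a minimal illustration: `run_tasks`, the `deps` mapping, and the task names are hypothetical, and it leans on `graphlib.TopologicalSorter` (Python 3.9+) to decide what is ready to run.

```python
from concurrent.futures import FIRST_COMPLETED, ThreadPoolExecutor, wait
from graphlib import TopologicalSorter


def run_tasks(deps, actions, max_workers=4):
    """Run actions[name]() for every task, in parallel where dependencies allow.

    deps maps a task name to the set of task names it depends on.
    Returns the order in which tasks finished.
    """
    sorter = TopologicalSorter(deps)
    sorter.prepare()  # raises graphlib.CycleError on circular dependencies
    finished = []
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        pending = {}  # future -> task name
        while sorter.is_active():
            # Everything get_ready() returns has all of its dependencies done.
            for name in sorter.get_ready():
                pending[pool.submit(actions[name])] = name
            done, _ = wait(pending, return_when=FIRST_COMPLETED)
            for future in done:
                future.result()  # re-raise any failure from the worker
                name = pending.pop(future)
                finished.append(name)
                sorter.done(name)  # may unblock tasks that depend on this one
    return finished
```

With `deps = {"index.html": {"a.html"}, "a.html": {"a.md"}, "a.md": set()}`, the index page is only rebuilt after the page it links to, while unrelated pages are free to render at the same time.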

Python 2 to Python 3 conversion

Having done a big chunk of work (including a lot of review, guidance, and debugging) on a python 3 conversion of a commercial code base, I was both familiar with the process and had not expected to ever need to touch it again - the product conversion itself was far later than was in any way reasonable, and most other companies would have been forced to convert sooner. It was a bit of a surprise to discover another 2000+ lines of python 2 code that was My Problem!

While there were only a few small CLI-tool tests in the code (which I was nonetheless glad to have) I did have the advantage of a "perfect" test suite - the entire thok.org site. All I had to do was make sure that the rendering from the python 3 code matched the output from the python 2 code - 80,000 lines of HTML that should be the same should be easy to review, right?

This theory worked out reasonably well at first - any time the partially converted code crashed, well, that was obviously something that needed fixing.

Here in 2025, with Python 3.14 released and the Python Documentary published, no one really cares about the conversion process as anything but a historical curiosity... but I had a bunch of notes about this particular project so I might as well collect them in one place.

Once the tooling got to the point where it ran on the entire input without crashing, the next "the pre-existing code is by definition correct" test was to just diff the built site (the output) with the existing Python 2 version. The generated HTML code converged quickly, but the diff did turn up some corrupted jpg files and other large binaries; these were all repairable from other sources, but this does suggest that using more long-term content verification (or at the very least, "checking more things into git") should be an ongoing task. (All of the damage was recoverable, it was just distressing that it went undiscovered as long as it did.)
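A plain `diff -r` does this job fine; the sketch below is a Python variant that compares two built trees by content hash, which has the side benefit that the same manifest could be saved and re-checked later as a crude form of long-term content verification. The function names and the choice of SHA-256 are assumptions for illustration, not what was actually used.

```python
import hashlib
from pathlib import Path


def manifest(root: Path) -> dict[str, str]:
    """Map each file's path (relative to root) to the SHA-256 of its bytes."""
    return {
        path.relative_to(root).as_posix(): hashlib.sha256(path.read_bytes()).hexdigest()
        for path in sorted(root.rglob("*"))
        if path.is_file()
    }


def compare_trees(old_root: Path, new_root: Path) -> dict[str, list[str]]:
    """Report files that are missing, added, or different between two builds."""
    old, new = manifest(old_root), manifest(new_root)
    return {
        "only_old": sorted(old.keys() - new.keys()),
        "only_new": sorted(new.keys() - old.keys()),
        "changed": sorted(p for p in old.keys() & new.keys() if old[p] != new[p]),
    }
```

Anything listed under `changed` still needs a real diff to see what differs, but for binaries like images the hash mismatch alone is the interesting signal.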

Attempting to get out of the blog tooling business

The tooling described here evolved around a particular kind of legacy data and ideas, and isn't really shaped appropriately for anyone else to use - it isn't even well-shaped for me to use on any other sites. While the port did allow me to do some long-overdue content maintenance of thok.org itself, it was getting in the way of a number of other web-writing projects. Attempting to apply the Codes Well With Others principle, I dug into using staticsite, which was simple, written in Python 3, based on markdown and Jinja2 and had at least some recent development work. I ended up using it for several sites including this one, though not thok.org itself (at this time.)

I may end up going back and doing a replacement for staticsite, but I expect to keep it shaped like staticsite so I can use it as a drop-in replacement for the current handful of sites, and it's really worked pretty well. (I will probably try to start with just a replacement template - using plain HTML rather than upgrading to a current version of React - since most of what I want is very simple.) The other possibility is to move to pandoc as the engine, because it tries hard in entirely different ways.

Things Left Behind

The old system had a notification mechanism called Nagaina that had a plugin mechanism for "probes" (AFS, Kerberos, NTP, disks, etc.) It took a crude approach: "run all current probes, then diff against the previous run and notify (via Zephyr) if anything changed". The biggest flaw of this approach was that it relied on sending messages via MIT's Zephyr infrastructure; the second biggest was that it actually worked so I didn't feel that compelled to improve it (or move to something else.)
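The "run everything, diff against last time, notify on change" pattern is small enough to sketch in a few lines. This is not Nagaina's code: the probe interface (a callable returning a status string), the JSON state file, and the `notify` callback standing in for Zephyr are all assumptions made for the example.

```python
import json
from pathlib import Path


def run_probes(probes):
    """Call every probe and collect its status string under the probe's name."""
    return {name: probe() for name, probe in probes.items()}


def changed_since(previous, current):
    """Return {name: (old, new)} for every probe whose status differs."""
    names = previous.keys() | current.keys()
    return {
        name: (previous.get(name), current.get(name))
        for name in sorted(names)
        if previous.get(name) != current.get(name)
    }


def check(probes, state_file: Path, notify):
    """One pass: run probes, diff against the saved run, notify on changes."""
    previous = json.loads(state_file.read_text()) if state_file.exists() else {}
    current = run_probes(probes)
    changes = changed_since(previous, current)
    for name, (old, new) in changes.items():
        notify(f"{name}: {old} -> {new}")
    # Save this run so the next pass has something to diff against.
    state_file.write_text(json.dumps(current))
    return changes
```

Because only changes are reported, a probe that stays broken goes quiet after its first message, which is part of what makes the approach crude.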

The new system has a bunch of systemd timer jobs that do things and report on them by email; OpenAFS notification is gone because the cell is gone, and other things have simpler failure modes and just need less monitoring. I have an extensive folder of possible replacement notification mechanisms - some day I'll pick one and then work backwards to tying anomaly detection and alerting into it.


  1. This definition of static doesn't preclude things with client-side javascript - I've seen one form of static site where the server delivered markdown files directly to the client and the javascript rendered them there, which is almost clever but requires some visible mess in the files, so I've never been that tempted; it would also mean implementing my own markdown extensions in javascript instead of python, and... no.