Mid-2022 edit: I’ve decided to switch to a different, managed platform. However, I’m keeping this post around for historical purposes.
I’ve been looking to restart blogging for a while now. A number of things held me back, not the least of which is blog technology. I’ve written a couple of iterations of my own .NET-based static site generator, but it just wasn’t very exciting for me. I think I’ve finally figured out why.
Static site generators
The standard static site generator uses config and markup (well, Markdown, usually) files in various locations on a filesystem, in conjunction with themed templates, to generate websites that then get hosted on a basic web server without an interactive backend. I love that from a security perspective: that’s a whole class of possible vulnerabilities gone. But I’ve only recently realized that I don’t enjoy it from a UX perspective.
As a techie and software developer, I live and breathe the filesystem, config files, and command-line tools. So it came as a surprise when I realized that I don’t want to work with these things (that I normally hold dear) when writing blog posts. I think there are two major reasons for this: perfectionism and overall mindset.
I am often a perfectionist to a fault. Not to digress too far, but this trait has caused me a lot of trouble, especially when it comes to working on — and completing — personal projects. How does that manifest when it comes to using static site generators for blogging? To put it simply, I don’t think any of them are good enough. Each one has its own limitations, which I invariably encounter, and that severely detracts from my enjoyment of the tools. This, of course, includes my own creations. As I attempt to fix the issues I see with other generators, I introduce new issues and complexities into my own solutions. It’s become a never-ending cycle of annoyance and rework. And in the meantime, I’m not blogging. I’m just working on code that would theoretically, eventually enable me to do so.
The other aspect of UX I found that I miss is the old WYSIWYG editor. Or even a basic rich text editor. Static site generators, by their very nature, don’t come with anything like that. You can instead use whatever editor application you prefer. Lots of editors let you preview Markdown text as you type it, in fact. That’s great, but I’ve discovered that I go into a different mindset when I write English rather than code, and mixing the two will often make me lose sight of what I’m saying and focus too much on the technical aspects of getting it displayed just right.
On the other side of the blogging technology spectrum are the “traditional” blogging platforms, content management systems (CMS), and such. While these do provide a much better UX, they are less secure than static sites because they do backend processing, which enlarges the attack surface. With that concern comes the requirement to keep the platform hardened and up to date, or to pay someone else to do it.
In addition to the increased security concerns, the traditional platforms are naturally slower, again because they do backend processing. As optimized as they may get, it would still be hard to beat a simple web server hosting static files.
I could pay for blog hosting. I probably wouldn’t notice any appreciable difference in performance between a true static site and a CMS-backed one. But I want to be in control of my data, and there’s still the nagging feeling of wanting to implement a customized solution just for me, not some bog-standard (cough, Medium, cough) thing that everyone else uses.
As I mentioned above, blogging technologies lie on a spectrum. What other options are out there that lie between the two extremes?
Headless and hybrid CMS
The concept of a headless CMS has been gaining popularity of late. In essence, a headless CMS is one that has a full-featured backend infrastructure, along with the associated content creation and management user interfaces, but has no frontend (“head”) and instead provides an API to retrieve the content for displaying in another system. That term, along with “hybrid” when applied to a CMS, is not well defined today, so there is some variability in what different vendors call “headless”. For the purposes of this post, let’s say that “headless” means no frontend whatsoever, and “hybrid” means there is a default frontend in addition to an API.
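To make the “headless” idea concrete, here is a minimal sketch of a separate frontend consuming CMS content as JSON. It assumes a Ghost-style Content API that is read with a `key` query parameter and returns a top-level `posts` array; the exact path varies between Ghost versions, so the `/ghost/api/content/` prefix below is an assumption, and the function names are made up for illustration.

```python
import html
import json
from urllib.parse import urlencode


def content_api_url(site: str, resource: str, key: str) -> str:
    """Build a read-only content URL (path layout is an assumption)."""
    query = urlencode({"key": key})
    return f"{site.rstrip('/')}/ghost/api/content/{resource}/?{query}"


def render_titles(response_body: str) -> str:
    """Play the role of the 'head': turn the API's JSON into markup."""
    posts = json.loads(response_body)["posts"]
    return "\n".join(f"<h2>{html.escape(post['title'])}</h2>" for post in posts)
```

The point is only that the CMS hands over data and something else decides how it looks; a hybrid CMS ships its own frontend in addition to an API like this.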
The Ghost platform falls into the hybrid category. It has a frontend, with theming support, but it also has a content API. There are static site generator integrations available, some supported by Ghost directly. When I got this far in my research, I was quite excited, since it appeared that I’d found the solution I was looking for. Alas, I ran into a couple of issues with this approach that prevented me from choosing it. The first was that none of the static site generator themes I found worked nearly as well as the built-in, freely available ones for Ghost. That wouldn’t normally be a big deal, but I had already started an instance of Ghost in a Docker container and was starting to appreciate its theming. So when I saw how posts looked when rendered in other systems, with several different themes coming nowhere close to the polish of the default Ghost themes, I was pretty disappointed. Even the ported versions of the default “Casper” theme didn’t work as well outside of Ghost.
The other issue was that I would now have to configure and maintain two independent systems to blog the way I want. And I’ve witnessed far too many breaking API changes and other integration problems between applications for this to be something I’d enjoy maintaining.
Instead, I decided to try a different approach: scraping!
Yeah, I know, this isn’t the early 2000s, and I’m not a search aggregator. But scraping actually makes sense in this instance:
- I have control over the source system;
- Being a hybrid CMS, Ghost does minimal backend processing to display content;
- Themes should work just fine after being scraped.
Turns out, there’s already a project that does something like this specifically for Ghost: buster. Unfortunately, it’s unmaintained and doesn’t actually do a proper job scraping because under the hood it uses wget, which makes some unfortunate changes to the retrieved contents, such as indiscriminately turning all same-site absolute URLs into relative ones. I couldn’t come up with an obvious way to fix that tool.
That means, of course, I wrote my own. Keeping with the theme, I called it ecto1. It’s a Python 3 script that uses BeautifulSoup and tinycss2 to parse content for further URLs to fetch. Pointing it at Ghost’s sitemap file reveals all the content URLs, while those content pages in turn point to resources such as images and scripts. Since essentially everything is downloaded, Ghost themes work just fine: after all, they’re just HTML and associated resources.
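The discovery step described above can be sketched roughly like this. This is not ecto1’s actual code: to stay dependency-free it swaps in the standard library (`xml.etree.ElementTree` and `html.parser`) where the real script uses BeautifulSoup and tinycss2, it skips CSS parsing entirely, and the function names are made up for illustration. Note that it only collects URLs and never rewrites them in the page.

```python
import xml.etree.ElementTree as ET
from html.parser import HTMLParser
from urllib.parse import urljoin, urlsplit

# Namespace used by the standard sitemap protocol.
SITEMAP_NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}


def sitemap_locations(sitemap_xml: str) -> list[str]:
    """Return every <loc> URL found in a sitemap (or sitemap index)."""
    root = ET.fromstring(sitemap_xml)
    return [loc.text.strip() for loc in root.iterfind(".//sm:loc", SITEMAP_NS) if loc.text]


class ResourceLinkParser(HTMLParser):
    """Collect the values of href and src attributes from any tag."""

    def __init__(self) -> None:
        super().__init__()
        self.links: list[str] = []

    def handle_starttag(self, tag, attrs):
        for name, value in attrs:
            if name in ("href", "src") and value:
                self.links.append(value)


def same_site_resources(page_url: str, page_html: str) -> set[str]:
    """Resolve links against the page URL and keep only same-host ones."""
    parser = ResourceLinkParser()
    parser.feed(page_html)
    site = urlsplit(page_url).netloc
    resolved = (urljoin(page_url, link) for link in parser.links)
    return {url for url in resolved if urlsplit(url).netloc == site}
```

A crawler would feed each URL from `sitemap_locations` through a fetch, pass the HTML to `same_site_resources`, and repeat until no new URLs turn up.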
To complete the workflow, I’m using GitHub Pages for hosting. I intended to use GitHub Actions along with Ghost’s built-in webhook support, but there doesn’t appear to be any way to trigger a GitHub Action with the built-in integrations, so I’ve got a simple script running alongside Ghost that gets triggered to execute ecto1. I’m going to experiment with this setup for a little while to see if I like it. Meanwhile, hello again, world!
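For the curious, a trigger script of the kind mentioned above could be sketched as follows. This is not the actual script: it assumes Ghost is configured to send a webhook POST to this listener, it ignores the payload, it does no authentication, and the command it runs is a hypothetical placeholder, since ecto1’s command line isn’t covered in this post.

```python
import subprocess
from http.server import BaseHTTPRequestHandler, HTTPServer


def make_handler(command):
    """Build a handler class that runs `command` on every POST request."""

    class WebhookHandler(BaseHTTPRequestHandler):
        def do_POST(self):
            # Drain the request body; this sketch does not inspect it.
            length = int(self.headers.get("Content-Length", 0))
            self.rfile.read(length)
            # Run the scrape synchronously and report success or failure.
            result = subprocess.run(command, check=False)
            self.send_response(204 if result.returncode == 0 else 500)
            self.end_headers()

        def log_message(self, format, *args):
            pass  # keep the console quiet

    return WebhookHandler


def serve(command, host="127.0.0.1", port=8000):
    """Listen for webhook calls until interrupted (host/port are placeholders)."""
    HTTPServer((host, port), make_handler(command)).serve_forever()
```

Calling `serve([...])` with whatever command runs the scraper and publishes the result would keep the listener running next to Ghost. Binding to localhost by default keeps it off the public network, which matters given the lack of authentication.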