PBS 175 of X: Static Site Generators (GitHub Pages)

01 Feb 2025

This instalment is the start of a new series-within-a-series focusing on the technology that powers this website and that hosts the new JavaScript version of XKPasswd: GitHub Pages.

GitHub Pages provides a very interesting free web hosting option that inherently provides a full version history to easily recover from any kind of whoopsie. It’s also surprisingly versatile and powerful. On one end of the spectrum you can use it to simply host a folder of HTML, CSS & JavaScript files exactly as they are, and on the other end of the spectrum you can build a full content management system with all the taxonomies and content organisation features you would expect from a typical blog or corporate site. The mechanism GitHub Pages uses to achieve this is completely different from that used by the most popular content management system in the world, WordPress, but is fundamentally the same as that used by the fashionable new up-and-comer, Hugo.

In this first instalment we’ll set the scene, explaining the difference between dynamic content management systems (CMS) like WordPress, and static site generators like Github Pages and Hugo. We’ll also give a big-picture overview of how GitHub Pages works, and hence, the technologies we’ll be exploring over the coming months.

Matching Podcast Episode

You can also Download the MP3

Read an unedited, auto-generated transcript with chapter marks: PBS_2025_02_01

What Problems Do CMS Platforms Solve?

Before we get lost in the weeds of the two different architectures for CMS platforms, let’s step back for a moment and describe the problem all these systems set out to solve in their own distinct ways.

Separation of content creation from content presentation
1. Separation at the task level — even if one person is running the entire site, they can easily focus on either designing it or creating content, they never have to wear two proverbial hats at once
2. Separation at the storage level — regardless of how the data is stored, the content is segregated from the site’s design
3. It should be possible to change the content without needing to edit the design, and to change the design without editing the content
Content needs to be organised into sensible taxonomies, and new content should automatically appear in the correct locations within the site based on the taxonomies the creator applies. Examples of taxonomies include:
1. Nested Page hierarchies
2. Time-based sequences of posts or articles (blogs)
3. Categories
4. Tags
Support an editorial process — the details will differ from organisation to organisation
1. Content management systems need to facilitate an organization’s desired process for creation, review, and publishing of content
2. For personal sites or sites for small organisations, the needs here may be very basic — perhaps as simple as a mechanism for marking content as being in draft before publishing

Two World Views — Dynamic & Static

All CMS platforms combine content with a theme of some kind to produce the HTML, CSS & JavaScript behind the reader’s view in their browsers. The two big questions are:

When does the content get rendered to HTML, CSS & JavaScript?
Where is the content stored?

In theory, these two questions are not linked, but in reality, they almost always are, so we get answers to both questions with just two categories of CMS.

Read-Time Rendering — Dynamic CMS Platforms

More traditional CMS platforms (like WordPress) render pages on demand the moment the reader would like to read them. This is done using a server-side language like PHP, and the content is stored in a relational database.

When you visit a WordPress site like Podfeet.com, your browser sends a request to Allison’s web server which then runs PHP code which reaches back to a web server Allison runs and builds the web page you requested, and returns it to your browser.

As you browse Allison’s site from page to page, the same process happens again and again.

Pros & Cons

There is a significant amount of infrastructure needed to power a website like this:

A Linux or Windows server with a web server like Apache or NGINX and PHP
A Database server
The WordPress code

The server has quite a bit of work to do on each page view, so a small number of users can make a lot of work for this infrastructure. Spikes in traffic can easily cause excessive loads on the server, so for even vaguely popular sites, an additional layer of infrastructure is needed — a caching server. For this cache to be effective it needs to be tightly integrated with WordPress so it can know when a particular post has been edited, or a new comment added, so it can invalidate its cached copy of that page and ask the server for a fresh rendering next time a user asks for it.

Running a dynamic website involves continued bills and at least some amount of maintenance work to keep everything patched and secure.

But you do of course get some advantages:

Content publishes instantly
Content can be published at a high rate
User interaction like comments is natively supported
Authors and editors can use a web interface to manage the content
Once things are set up properly, very little technical skill is required to create content
There is a vast community of people creating themes and plugins to enhance the things the site can do (especially for WordPress)

Publication-Time Rendering — Static Site Generators

Static Site Generators (SSGs) like GitHub Pages turn things on their head completely. The entire site is rendered to HTML, CSS & JavaScript when new content is published, and those pre-generated files are copied to a basic web server that serves those same simple HTML files to all visitors.

Static Site Generators generally store their content in files on the file system, so there is no database, and hence no need for a database server. The web server doesn’t do any rendering, so its load is minimal, and it doesn’t require any server-side languages.

While the generated files need to make their way to a web server to be published to the internet, the generation itself can be done anywhere. You can even do the generation on your own PC and then move the generated files to your web server by any file-transfer protocol the server can accept.

For small sites with a single author that are not updated too often, it is perfectly practical for authors to install their SSG software on their PC and keep and manage the content files for their site like they would any other documents. Then, each time they update their content they’d run their SSG software to generate the updated website files, connect to their web server with their favourite file transfer client, and just drag-and-drop the newly generated website files to their server to publish the updated site.

This is, of course, not practical for multi-author sites, so an SSG can also be run on a centralised server of some kind. The key requirement is that authors have a mechanism for accessing the content files on that server, that the server can run the SSG app to convert the content files to the website files, and then transfer those generated files to the web server in some way. A SAMBA file share and a few scripts to trigger a rebuild and use Secure FTP to copy the files would work just fine.

These kinds of scripting approaches are simple bread-and-butter stuff to sysadmins, but they’re not practical for home users, and, there are better tools available.

Git+CI/CD and Static Site Generators are a Perfect Fit!

When you stop and think about it, the site content files are just a folder of text files that multiple people need to edit in a controlled way, and where changes to those files need to trigger actions. That’s no different from writing software where you have a folder of source code that needs to be edited by multiple people on a controlled way and then trigger some kind of build process to test and publish the updated app or library or what ever!

That means we can use the kind of tooling perfected by developers collaborating on software to collaborate on a statically generated website.

The content creation can be done using your favourite text editor, usually in Markdown, and collaboration and editorial control can be handled by Git. No matter how complex your editorial process is, it can be implemented using Git features like branches, permissions, and pull requests.

Content publishing can then be automated using the Continuous Integration/Continuous Deployment (CI/CD) pipelines that are now ubiquitous in Git environments. CI/CD pipelines are effectively scripts that run in the cloud that are triggered by Git events like commits to specific branches. GitHub supports CI/CD through GitHub Actions.

This is the process by which these very show notes were published. Myself and Allison are both authors with full permissions on the Git repository for these notes. The repo contains Markdown files with the content. We use a temporary Git branch to draft each new instalment, and then when we’re ready to publish the instalment, we merge the temporary branch into master which triggers a GitHub Actions action that uses a static site generator to convert the Markdown files into the files for the website and publishes those generated files to a web server to update the website.

In terms of editorial process (not necessarily technical implementation), it’s very simple — myself and Allison are peers with full publishing power, the community are contributors who use Pull Requests to submit proposed fixes and changes to us. We can then choose to accept those changes and merge them into master (triggering a site rebuild), or we can reject them (which we almost never do).

Pros & Cons

The infrastructure requirements for a static site are generally simpler than those for dynamic sites. They range from simply a PC with a few apps and the most basic of web hosting to a Git + CI/CD infrastructure and a basic web server. Regardless of your scale, a static site will be architecturally simpler and cheaper to run than an equivalently sized dynamic site.

OK, so why does the whole web not run on static site generators? Because there are, of course, some downsides, too 🙂

Publication of new content is never real-time. Sure, you can automate it with triggers, but the entire site needs to be re-built each time, so it will always take a few minutes
Interaction with your readers needs to be off-loaded from the site — you can embed 3rd-party cloud discussion group services like Disqus, or you can simply utilise social media for audience engagement. Whatever you do, that engagement will not be tightly integrated with your CMS; it will be in some way external to it. This adds a layer of complexity to this approach.
While the details will vary widely, in general, the interface for authors will not be as simple and self-contained as a unified web interface. Depending on who your authors are, and what their technological experience is, that might be a bug or a feature, but regardless, it’s something you need to consider carefully

Which to Choose?

It’s vital to realise that neither approach is objectively better or worse. Each has advantages and disadvantages. The question is not “which approach is best?”, but “which approach is best for me/us?”. Who you are, what your goals are, and what resources you have at your disposal all need to feed into your evaluations.

Some key questions to consider:

What’s the cadence for new content? A few posts and hour, a day, a week, a month?
How many readers do you need to cater for?
Do you want user comments embedded into your content management system?
What’s your budget?
What technical skills do you have at your disposal for operating the site?
How are your authors with technology?
1. What skills do they have?
2. How would they feel about learning new skills?
3. Do they have strong feelings about tooling choices?
4. Are they employees you can command, or volunteers you need to attract?
How much content are you likely to create? SSGs may have total size limitations (GitHub Pages limit is 1GB per repo and individual files less than 100MB)

At the extremes, it’s usually quite obvious what the right fit is. Let’s take two really popular sites as examples.

First, let’s consider Ars Technica — it has tens if not hundreds of authors, editors, and other contributors, new content is published many times an hour, and they integrate user comments into their pages.

It should come as no surprise that they use a dynamic system that must cost an absolute fortune to run because it needs to support a massive audience.

But now let’s consider John Gruber’s Daring Fireball Apple site. This is no more of a hobby project than Ars Technica — running that site puts food on John’s table, but, only on John’s table. He has no employees and no complex editorial process. He publishes a handful of times a day at most and has never allowed any kind of user-submitted content onto his site. He also has a massive audience, so he needs his site to be able to handle that load without costing him more than he can make from sponsorships.

John uses a static site generator, and it really is the perfect fit for him.

Both Ars and Daring Fireball are huge sites, neither is in any way amateur, both have made the perfect technical decision for their needs, and one has opted for a dynamic site, and one for a statically generated one.

The annoying thing is that for smaller sites, the tradeoffs tend to be less clear. For decades I’ve operated both my personal website and the site for my podcasts as dynamic sites, but I’m completely fed up with both. I’m done with managing servers and databases and paying substantial hosting bills. I want something simpler and cheaper, so I’m spending 2025 moving both of those sites to static site generators. Specifically, I’m moving both to GitHub Pages.

Why You Might Consider Using GitHub Pages

Before we finish this instalment with a big-picture overview of how GitHub Pages works, let’s just look at some reasons you might be interested in learning a little more about GitHub Pages.

Free web hosting 🙂 — sure, it’s limited to static content, but you can use GitHub Actions with static site generators to run complex sites, and you can even use your own domain names if you like!
A simple way to publish the matching documentation with any code share with others on GitHub.
You can host the kind of client-side web apps we’ve been developing in Programming By Stealth without even learning about Static Site Generators. (Like Allison has done with her Elapsed Time Adder app!)
If you’re the kind of person who’s enjoying this series, your skillset and mindset are likely to line up well with how GitHub Pages does things.

A Big-Picture Overview of GitHub Pages

GitHub Pages converts content within a GitHub Git repository into a published web page using GitHub Actions. The default publication action uses a somewhat customised and extended version of the open source Static Site Generator Jekyll.

Note that by default, Jekyll is always doing the site generation, but its presence can be invisible. If you give it only regular HTML, CSS, and JavaScript files as input, then it will pass them straight through to the ‘generated’ site unchanged. It will have checked each file for front matter and template strings, found none, and hence made no changes. This is what happens each time Allison updates her Elapsed Time Adder app.

But GitHub Pages doesn’t have to use Jekyll, it can be configured to serve HTML, CSS, and Javascript generated through just about any build process (including Webpack which we learned about in instalments 137, 138 & 139). When used as a content management system in its default configuration, you write your content in Markdown and add your metadata as YAML front matter at the top of your Markdown files. This is one of the reasons we covered YAML in instalments 168 & 169.

Some very basic aspects of GitHub Pages’ operation are configured in the GitHub web interface, but most of the configuration is managed via a YAML settings file.

GitHub provides a number of built-in themes, and using the configuration file, it’s possible to import other themes hosted elsewhere, including on GitHub.

To truly customise a GitHub Pages site you need to extend an existing Jekyll theme or create your own from scratch. Being an open source project, Jekyll leverages other open source technologies within it’s themes, including:

The open source Liquid templating engine created by Shopify, which in turn leverages the Jinja2 templating engine for data processing.
The CSS pre-processor Sass

GitHub Pages in no way enforces the use of Bootstrap, but given that we’ve relied heavily on it throughout this series, and that its customisation features are implemented in Sass, we’ll be using Bootstrap 5 with GitHub Pages in this series.

Putting it all together, you can see just how many technologies are in the mix here, how some show up repeatedly, and that while most are familiar to us already, quite a few are not:

Git & GitHub ✅
- GitHub Actions ❌
  - YAML ✅
  - NodeJS & NPM ✅ (for publishing web apps with Webpack rather than than Jekyll in pass-through mode)
HTML, CSS & JavaScript ✅
- NodeJS & NPM ✅
- Bootstrap ✅
  - Bootstrap customisation ❌
    - Sass ❌
Jekyll ❌
- YAML ✅
- Markdown ✅
- HTML, CSS & JavaScript ✅
- Liquid Templates ❌
  - Jinja2 ❌
- Sass ❌

Final Thoughts

Our approach in this series will be to break the broad topic down into two distinct sections, each solving a different problem. If you’re envisaging this as a 50/50 split, don’t, it will most likely prove to be much closer to a 10/90 split!

The first problem to be solved will be using GitHub Pages to host the HTML+CSS+Javascript web apps we were developing earlier in this series. This use of GitHub Pages ignores Jekyll completely, but allows us to explore how GitHub Actions work, and in the process get a taste for how the CI/CD workflows we’ve mentioned in passing in recent instalments work. We’ll probably wrap this up in a single instalment.

The remainder of the series will focus on solving the much more expansive problem of using GitHub Pages with Jekyll as a content management system. The end goal for this much larger second act will be a content management system with an entirely bespoke theme based on Bootstrap with customisations that supports rich taxonomies. We’ll build this up gradually, starting with the most simplistic base theme possible with Bootstrap installed in a customisable way, and we’ll build things up from there, starting with the basics of each of the new technologies early in the series, and then filling ever more of the detail in all the new technologies as we progress to ever more advanced use cases.

From the very start of the series we’ll built everything in such a way that we can test our sites locally on our own local computers before pushing our changes to GitHub for publication to the internet. This will greatly speed up the development process by allowing us to debug our sites with minimal friction.

No Previous (on First) GitHub Pages Miniseries Deploying a JavaScript Web App with Webpack & GitHub Actions

174. Working with Submodules Programming by Stealth 176. Deploying a JavaScript Web App with Webpack & GitHub Actions

Join the Community

Find us in the PBS channel on the Podfeet Slack.

Podfeet Slack