Static Site Generators by Brian Rinaldi

Chapter 1. What Are Static Sites?

A Little Background

There’s been a lot of talk recently about static sites and the new generation of tools used to create them, commonly referred to as “static site generators” or “static site engines.” As with any new technology, it can sometimes be hard to differentiate the hype from the reality. This book aims to give you a broad understanding of the technology: what it is and where it best applies. First, however, we need to define what static sites are and where they came from.

The term “static site” is an interesting one if you think about it, as it defines itself by what it lacks. The “static” aspect doesn’t so much describe a feature as the absence of one: dynamic page rendering. Once upon a time, probably before we commonly used the term “static site,” this would have been considered a weakness.

Those of us who’ve been working in web development for some time probably recall building static sites using tools like Dreamweaver, HomeSite, or (heaven forbid) FrontPage. The content on these pages could only be changed by manually altering the existing site files and replacing the files on the server via FTP.

There were a number of issues with this process. Adding content to the site required a moderately high level of technical knowledge: either knowledge of the specific tool used to design and build the site, or enough HTML to hand-code it. One also needed to understand how to deploy the site to a host via FTP, which isn’t necessarily straightforward for nontechnical users. This meant that content creators, who are frequently nontechnical, could not directly or easily contribute to the site and needed the assistance of a web developer to add new content.

Creating new pages typically required copying and tweaking existing pages. As the site grew, maintaining proper navigation and links typically became both tedious and extremely error prone. Some tools offered features like templates that tried to solve these issues, but these could be complicated or cumbersome to create.

In addition to these issues, dynamic features such as comments or forums were simply not possible in a purely static site.

The Dynamic Site Era

Dynamic sites seemed to fix these issues. Nontechnical content creators could create and update pages via backend forms without the need to understand the specifics of website development tools or HTML. Since the content and pages were all driven from a database, navigation could be generated automatically. In addition, by definition, dynamic sites allow for dynamic features such as forums or comments.

In the case of content-focused web pages, dynamic sites often took the form of a content management system (CMS). These could be custom built to the needs of the site or, very frequently, selected from a number of commercial or open source options.

To this day, most of the content published on the Web runs through some form of content management system. Popular open source options include Drupal, Joomla, and TYPO3 (see Figure 1-1). Nowadays, these systems typically handle much more than simply content creation and publication, with features such as complex roles and access control, workflow management, document management, and syndication.

Figure 1-1. Adding an article in the Drupal CMS (source: Drupal.org).

These additional features lead to the biggest issue with dynamic sites, which is that the solution is often more complex than the problem. By virtue of its need to cater to a broad set of customers, a pre-built CMS often has a steep learning curve for both developers and content creators. Meanwhile, a custom CMS requires both extensive development efforts and access to a developer should issues or necessary changes arise.

Hosting dynamic sites is complicated by the need for database storage (and backups) as well as support for whatever dynamic language the site is built upon (PHP, Ruby, etc.). Factor in the need for regular updates to the dynamic language, database solution and even the CMS software itself, and it becomes rather obvious that, while dynamic sites solve many difficult problems, they bring with them their own set of complications.

The Rise of Blog Engines

The complexity of content management systems was not well suited to smaller, content-focused sites or blogs that didn’t require advanced features like complex user roles or workflow. Blogging engines, the most popular being WordPress (see Figure 1-2), aimed to solve this by making development simple, with pre-built and easily customizable templates, and by making publishing quick and easy.

Blog engines don’t negate the need for supporting a dynamic language (PHP in the case of WordPress) or for a database (typically MySQL for WordPress). WordPress, however, became popular enough that many hosts made “out-of-the-box” hosting solutions that simplified setup and maintenance. To give you a sense of the popularity of WordPress, according to W3Techs, as of May 2015, WordPress is used on approximately 23.9% of the top 10 million sites, a percentage that dwarfs every other content management system.

Figure 1-2. The WordPress dashboard (source: WordPress.org).

Nonetheless, over time, WordPress has begun to gain some of the complexity of a typical CMS, and it is generally lumped into the CMS category by most industry research. Many sites depend heavily on features that are added via plug-ins, the quantity and quality of which can dramatically impact site performance. In addition, features like plug-ins and “shortcodes” can impact the portability of content, keeping your site tied to the WordPress platform.

Some in the blogging community felt that WordPress and competing blog engines like Movable Type had strayed so far from the simplicity of their initial blogging focus that they created new projects, such as Ghost (see Figure 1-3), that aimed to get back to the basics of just blogging. Ghost’s tagline is, in fact, “Just a blogging platform.”

Figure 1-3. Ghost offers an intentionally simple and sparse editor (source: Ghost.org).

Static Pages Get New Life

Whatever complexity dynamic sites may bring, for most use cases, there is simply no avoiding the need for dynamic data. Even the most basic content site, like a personal blog, generally has dynamic aspects: commenting, feedback or contact forms and search, to name just a few. So it wasn’t until the rise of new services that can fill these voids that static sites really became a viable option for more than just “brochureware”.

There are numerous services, both free and paid, that offer the ability to add dynamic aspects into static pages (it’s important to note that these services are not specifically intended for use only on static sites). Some popular options include:

There are many more, covering a full range of typical site requirements. There are even BaaS (backend as a service) solutions like Parse or Kinvey that offer APIs that allow developers to pull any form of arbitrary dynamic data into a static page.

Overview of Popular Services

If you’re interested in some of the services listed above as well as implementation details, Raymond Camden wrote an article on the topic called “Moving to Static and Keeping Your Toys”.

What makes all of these services work is the ability to load remote data via Ajax. As an example, let’s look at how to load Disqus comments onto a page. The following is from my personal blog:

<div id="disqus_thread"></div>
<script type="text/javascript">
    /* * * CONFIGURATION VARIABLES: EDIT BEFORE PASTING INTO YOUR WEBPAGE * * */
    var disqus_shortname = 'remotesynthesis'; // required: replace example with your forum shortname

    /* * * DON'T EDIT BELOW THIS LINE * * */
    (function() {
        var dsq = document.createElement('script'); dsq.type = 'text/javascript'; dsq.async = true;
        dsq.src = '//' + disqus_shortname + '.disqus.com/embed.js';
        (document.getElementsByTagName('head')[0] || document.getElementsByTagName('body')[0]).appendChild(dsq);
    })();
</script>
<noscript>Please enable JavaScript to view the <a href="https://disqus.com/?ref_noscript">comments powered by Disqus.</a></noscript>

In a nutshell, the script creates a new <script> element whose source is a JavaScript file on the Disqus server. The file URL is specific to the forum because it is built from the configuration variable disqus_shortname, which lets Disqus identify the forum from the URL of the script. This file then performs a number of actions to remotely retrieve comment data and display it on the page.
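
The pattern in the snippet above can be pulled out into a small reusable function. The following is only a sketch, not Disqus code: the names disqusEmbedUrl, injectScript, and createStubDocument are made up for illustration, and the stub document exists solely so the example can run outside a browser (under Node, for instance). In a real page you would pass the global document instead.

```javascript
// Build the embed URL the same way the snippet above does.
function disqusEmbedUrl(shortname) {
  return '//' + shortname + '.disqus.com/embed.js';
}

// Create an async <script> element and append it to <head>,
// falling back to <body>. `doc` is any object that exposes
// createElement and getElementsByTagName, such as the browser's `document`.
function injectScript(doc, src) {
  const script = doc.createElement('script');
  script.async = true;
  script.src = src;
  const parent =
    doc.getElementsByTagName('head')[0] || doc.getElementsByTagName('body')[0];
  parent.appendChild(script);
  return script;
}

// Hypothetical stand-in for `document`, only so the sketch runs without a browser.
function createStubDocument() {
  const head = {
    children: [],
    appendChild(el) { this.children.push(el); return el; },
  };
  return {
    head,
    createElement(tag) { return { tagName: tag.toUpperCase() }; },
    getElementsByTagName(tag) { return tag === 'head' ? [head] : []; },
  };
}

const stubDoc = createStubDocument();
const injected = injectScript(stubDoc, disqusEmbedUrl('remotesynthesis'));
console.log(injected.src);
```

Because the browser fetches and runs the injected file on its own, the page itself stays a plain static file; all of the dynamic work happens in the script that the remote service returns.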

How Disqus Works

If you’re curious for a more specific description, see “How does Disqus work?” in the Disqus documentation.

Of course, one need not rely on these services for loading dynamic data onto a static page—a savvy developer could write his or her own solution using similar techniques—but these out-of-the-box services make static pages a much more appealing, and far less daunting, option than they once were.
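
As a rough idea of what such a homegrown solution might look like, here is a minimal sketch that fetches comments as JSON and turns them into HTML. Everything in it is an assumption made for illustration: the function names, the { author, body } shape of each comment, and the idea that some endpoint of your own returns that JSON. It is not how Disqus or any other service works internally.

```javascript
// Escape text so that user-supplied comments cannot inject markup.
function escapeHtml(text) {
  return String(text)
    .replace(/&/g, '&amp;')
    .replace(/</g, '&lt;')
    .replace(/>/g, '&gt;')
    .replace(/"/g, '&quot;')
    .replace(/'/g, '&#39;');
}

// Turn an array of { author, body } objects (an assumed shape)
// into an HTML list.
function renderComments(comments) {
  const items = comments.map(
    (c) =>
      '<li><strong>' + escapeHtml(c.author) + '</strong>: ' +
      escapeHtml(c.body) + '</li>'
  );
  return '<ul class="comments">' + items.join('') + '</ul>';
}

// Fetch comments as JSON from `url` and return the rendered HTML.
// `fetchImpl` defaults to the global fetch; passing it in keeps the
// function usable in tests without a network connection.
async function loadComments(url, fetchImpl = globalThis.fetch) {
  const response = await fetchImpl(url);
  if (!response.ok) {
    throw new Error('Comment request failed with status ' + response.status);
  }
  return renderComments(await response.json());
}
```

In a page, you would call loadComments with the URL of your own comment endpoint and assign the resulting string to the innerHTML of a placeholder element. Escaping matters here because the comment text comes from visitors rather than from the site's own files.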

Defining a Static Website

So far we’ve covered some background showing how the static web pages of old failed to meet the needs of the Web as websites became more complex and interactive. We discussed how dynamic sites generally and content management systems specifically solved some of these problems but led to increased complexity in both development and authoring. Blog engines partially addressed these issues but also took on some of that complexity over time. Finally, we saw how Ajax and the rise of services have helped make static pages a viable option again.

However, before we explore static site generators, I’d like to end our current discussion by laying out a clear definition of a static site. Understanding what a static site is (and isn’t) is essential for evaluating whether a static site generator is a workable solution for your project:

Static site files are delivered to the end user exactly as they are on the server.
This is probably the key defining characteristic of a static site and part of why static sites tend to perform so well: there is no server-side generation at runtime. This means, for instance, that every visitor to your static site will be served an identical copy of index.html from the server until it is manually overwritten, say by uploading a new file via FTP.
There is no server-side language.
It follows from the preceding characteristic that there would be no server-side language (like Ruby or PHP for example) involved. Note, however, that some static site generators are written in these languages; they are intended to be run locally, not on the web server.
There is no database.
As there is no server-side language to speak to a database, there is therefore no database. This does not mean that there is no data. There can be data stored as files or via an external service like the ones discussed earlier. This means that if you need common features like user registration/login, this would need to be via an external service.
Static sites are HTML, CSS, and JavaScript.
This seems fairly obvious, but it should be clear that since static sites are intended to run in the browser, they must rely on web technologies to function. Of course, this can also include image formats like JPEG and GIF, graphics technologies like SVG and WebGL, or data formats like JSON or XML.
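
To make the first characteristic concrete, here is a sketch of roughly what a host does when it serves a static site: it maps the requested URL to a file and copies that file’s bytes to the response, with nothing generated per request. It is written for Node only to keep the chapter’s examples in one language; any web server does the equivalent, and you would not write this yourself in practice. The names resolveStaticPath and createStaticServer are made up, and the sketch skips details such as Content-Type headers and caching.

```javascript
const http = require('http');
const fs = require('fs');
const path = require('path');

// Map a request URL to a file under `root`. Returns null if the URL is
// malformed or tries to escape the root directory (for example "/../x").
function resolveStaticPath(root, requestUrl) {
  let pathname;
  try {
    pathname = decodeURIComponent(requestUrl.split('?')[0]);
  } catch (err) {
    return null; // malformed percent-encoding
  }
  // A trailing slash means "the index.html in that directory".
  const relative = pathname.endsWith('/') ? pathname + 'index.html' : pathname;
  const absoluteRoot = path.resolve(root);
  const filePath = path.resolve(absoluteRoot, '.' + relative);
  const insideRoot = filePath.startsWith(absoluteRoot + path.sep);
  return insideRoot ? filePath : null;
}

// Create (but do not start) a server that streams files exactly as they
// are stored on disk. Call .listen(port) on the result to start it.
function createStaticServer(root) {
  return http.createServer((req, res) => {
    const filePath = resolveStaticPath(root, req.url);
    if (!filePath) {
      res.writeHead(403);
      res.end('Forbidden');
      return;
    }
    const stream = fs.createReadStream(filePath);
    stream.on('error', () => {
      res.writeHead(404);
      res.end('Not found');
    });
    stream.pipe(res);
  });
}
```

Every visitor who requests the same path gets the same bytes until the file on disk is replaced, which is exactly the behavior described in the first item of the definition.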

Benefits of Static Sites

While each of the preceding features brings with it certain limitations, they also lead to some of the primary benefits of static sites:

Performance

There is no server-side processing and no database to connect to, meaning that there is nothing to slow down getting a static page from the server to your end user. This also means that there are no bottlenecks that might cause slowness or outages should you encounter a significant traffic surge.

Hosting

Since no server-side language is required, hosting requires no complicated setup or maintenance, making it cheap and easy. In fact, there are even free options, like GitHub Pages or Surge, for instance (we’ll explore deployment options in a later chapter).

Security

There are no server-side language issues to exploit and no database to hack. Basically, as long as the files on your host are secure, your static site is secure.

Content versioning

Since your entire site, from configuration to content, is file-based, it is very easy to keep all aspects of it within a version control system like Git. This can be especially advantageous for things like documentation, where you may want to allow community contributions via, for example, pull requests on GitHub.

Despite these benefits, static sites, even with the help of a static site generator, are not the solution for every type of site. In upcoming chapters, we’ll discuss in more detail some of the limitations of static sites and the types of sites these solutions are best suited for.

A Word (or More) About Markdown

Before we dig into static site generators, there’s one last item we need to discuss: Markdown. Markdown has become a de facto part of the static site stack. It is a shorthand way to write HTML and is the default tool for writing post and page content in most static site generators. However, it is often unfamiliar to anyone who isn’t a web developer.

What is Markdown?

Markdown is essentially a syntax for a simple, easy-to-read, plain text format that is designed to be converted to HTML. It was originally created in 2004 by John Gruber, who is well known for his commentary on the technology industry, and he owns the copyright as well as rights to the name Markdown, though the original conversion tool is licensed under the BSD open source license.

Markdown has been widely adopted across the industry as a way to quickly create web content using a simple shorthand. Many popular web-development tools offer Markdown support out of the box, including Sublime Text, Atom, Visual Studio Code, and Brackets. Most blog engines have started offering support for Markdown, including WordPress.

There’s even a burgeoning market for standalone Markdown editors, with some popular options being Mou on Mac, MarkdownPad on Windows, and Dillinger in the browser. Markdown support is also central to new services like Beegit, which offers online document collaboration.

More Markdown Tools

If you are interested in the tool ecosystem around Markdown, I wrote a post that covers more standalone options as well as conversion tools for tasks like converting Word documents to Markdown.

Markdown syntax

Markdown’s appeal is the simplicity of its syntax. Its philosophy emphasizes being easy to read first and easy to write second. Let’s look at some examples to see how this works.

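Headers are generally indicated using the pound symbol. Put a space after the pound symbols: the original Markdown doesn’t require one, but stricter flavors such as CommonMark do. So:

# My Title

results in:

<h1>My Title</h1>

And:

## My Header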

results in:

<h2>My Header</h2>

The number of pound symbols indicates the header level. Markdown often offers multiple syntax options for elements, so headers can also be indicated via underlining. The following would also result in an <h1> block:

My Title
=============

Unordered lists can be created using asterisks, pluses, or hyphens:

* My first bullet
* My second bullet

results in:

<ul>
    <li>My first bullet</li>
    <li>My second bullet</li>
</ul>

Replacing the * with + or - will result in the same HTML output.

Ordered lists use numbers but do not require that the number actually correlate to the item’s position in the list. So:

1. My first item
1. My second item
8. My third item

results in:

<ol>
    <li>My first item</li>
    <li>My second item</li>
    <li>My third item</li>
</ol>
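
To see why the numbers don’t matter, it helps to look at how a converter might treat list lines. The sketch below is a simplified illustration, not the code of any real Markdown parser: the function name listToHtml is made up, and it ignores nesting, multi-paragraph items, and everything else a real parser has to handle. The marker on each line only decides whether the line is a list item and which kind of list it belongs to; the marker itself is then discarded.

```javascript
// Convert a block of Markdown list lines into an HTML list.
// Returns null if the block is not a simple, single-level list.
function listToHtml(markdown) {
  const lines = markdown.split('\n').filter((line) => line.trim() !== '');
  if (lines.length === 0) return null;

  const unordered = /^[*+-]\s+(.*)$/; // "* item", "+ item", or "- item"
  const ordered = /^\d+\.\s+(.*)$/;   // "1. item", "8. item", ...

  // The first line decides which kind of list this is.
  let pattern = null;
  if (unordered.test(lines[0])) pattern = unordered;
  else if (ordered.test(lines[0])) pattern = ordered;
  if (!pattern) return null;

  const tag = pattern === unordered ? 'ul' : 'ol';
  const items = [];
  for (const line of lines) {
    const match = line.match(pattern);
    if (!match) return null; // mixed or malformed block
    // Only the text after the marker is kept; the number is thrown away.
    items.push('    <li>' + match[1] + '</li>');
  }
  return '<' + tag + '>\n' + items.join('\n') + '\n</' + tag + '>';
}

console.log(listToHtml('1. My first item\n1. My second item\n8. My third item'));
```

Because the numbers are dropped and the browser numbers the <li> elements itself, items can be reordered or inserted in the Markdown source without renumbering.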

Italic and bold text typically use the asterisk, but can also use the underscore. So:

*This is italic* and _this is italic_
but **this is bold** and __this is bold__

results in:

<em>This is italic</em> and <em>this is italic</em>
but <strong>this is bold</strong> and <strong>this is bold</strong>

Links and images use a similar syntax, one that, in my experience, is the least intuitive part of Markdown’s shorthand. So:

![O'Reilly logo](http://cdn.oreillystatic.com/images/sitewide-headers/ml-header-home-blinking.gif)

And this would be a [link to O'Reilly](http://oreilly.com)

results in:

<img src="http://cdn.oreillystatic.com/images/sitewide-headers/ml-header-home-blinking.gif" alt="O'Reilly logo">

<p>And this would be a <a href="http://oreilly.com">link to O'Reilly</a></p>

Hopefully this gives you a sense of what the Markdown syntax looks like. There is also shorthand syntax for things like block quotes, code blocks, and horizontal rules. If you would like a comprehensive overview of the entire syntax, refer to John Gruber’s original syntax documentation.
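
Since Markdown is designed to be converted to HTML, it can be instructive to see how small a converter for the examples above can be. The following is a deliberately naive sketch, with made-up function names, that handles only single-line headers, emphasis, links, and images. It does no HTML escaping, ignores paragraphs and nesting, and accepts headers with or without a space after the pound symbols, so it should not be mistaken for how a real Markdown processor is implemented.

```javascript
// Convert **bold** / __bold__ and *italic* / _italic_ to HTML.
// Bold is handled first so "**" is not read as two italic markers.
function emphasisToHtml(text) {
  return text
    .replace(/(\*\*|__)(.+?)\1/g, '<strong>$2</strong>')
    .replace(/(\*|_)(.+?)\1/g, '<em>$2</em>');
}

// Convert inline Markdown to HTML. Links and images are located first so
// that emphasis rules are never applied to the characters inside a URL.
function inlineToHtml(text) {
  const linkOrImage = /(!?)\[([^\]]*)\]\(([^)\s]+)\)/g;
  let html = '';
  let last = 0;
  for (const m of text.matchAll(linkOrImage)) {
    html += emphasisToHtml(text.slice(last, m.index));
    html += m[1] === '!'
      ? '<img src="' + m[3] + '" alt="' + m[2] + '">'
      : '<a href="' + m[3] + '">' + emphasisToHtml(m[2]) + '</a>';
    last = m.index + m[0].length;
  }
  return html + emphasisToHtml(text.slice(last));
}

// Convert one line: a header marked with one to six pound symbols,
// or otherwise plain inline text.
function lineToHtml(line) {
  const header = line.match(/^(#{1,6})\s*(.+?)\s*#*$/);
  if (header) {
    const level = header[1].length;
    return '<h' + level + '>' + inlineToHtml(header[2]) + '</h' + level + '>';
  }
  return inlineToHtml(line);
}

console.log(lineToHtml('## My Header'));
console.log(lineToHtml('but **this is bold** and __this is bold__'));
```

The only difference between the image and link cases is the leading exclamation point, which is a large part of why the two are so easy to mix up when writing by hand.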

The problem(s) with Markdown

Markdown’s biggest flaw is also the simplicity of its syntax. Once you become comfortable with it, writing Markdown documents can be very quick and easy. But Markdown’s syntax covers only a limited subset of HTML. To work around this limitation, Markdown allows you to include HTML directly within a Markdown document, but this means that you’ll need to know HTML to properly use Markdown for authoring. There are also multiple “flavors” of Markdown to deal with. These issues can complicate using Markdown with content contributors. Let’s look at each of these two problems in turn:

Problem 1: the lack of a standard

There are numerous Markdown variations, called “flavors,” available. GitHub relies on Markdown as a standard for its documentation and uses GitHub-Flavored Markdown. Stack Overflow has its own additions to Markdown. According to Wikipedia, other variations of Markdown also exist from Reddit, Diaspora, OpenStreetMap, and SourceForge.

There was even an attempt to standardize Markdown, which ran into issues over the name, as John Gruber owns the rights to the Markdown name. The effort now exists under the name CommonMark.

The problem with a lack of a standard is that much of the tooling around Markdown is built for one variant or another. Some support multiple variants, but trying to teach a nontechnical content contributor about the complexity of the Markdown ecosystem can become a barrier.

Problem 2: Markdown doesn’t replace HTML

Markdown covers a very limited subset of HTML, which means that authors will need to understand the situations that aren’t covered as well as know the HTML to use for those situations. This forces a content contributor to not only learn Markdown, but also what Markdown cannot do, and then learn HTML to fill those gaps.

Let’s look at a very common example. Markdown currently has no syntax for named anchors, but named anchors are frequently used in content to allow a user to quickly jump to a location in a page. In order to achieve a named anchor, you’ll need to mix Markdown and HTML as follows:

<a name="mysubheader"></a>
## My Subheader

While Markdown’s support for embedded HTML means that there is nothing HTML can do that Markdown cannot, it adds a great deal of complexity, especially for a content contributor who is unfamiliar with HTML.
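
Some Markdown processors work around this particular gap by generating an id for every header automatically, so contributors never have to write the anchor by hand. The sketch below shows one way such a feature could work. The function names and the exact rules for building the id are assumptions made for illustration; real tools each have their own rules, and this version simply drops any character outside a-z, 0-9, spaces, and hyphens.

```javascript
// Turn header text into a URL-friendly id,
// for example "My Subheader" becomes "my-subheader".
function slugify(text) {
  return text
    .toLowerCase()
    .trim()
    .replace(/[^a-z0-9\s-]/g, '') // drop punctuation and other characters
    .replace(/\s+/g, '-')         // spaces become hyphens
    .replace(/-+/g, '-');         // collapse repeated hyphens
}

// Render a Markdown header line as an HTML header that carries an id,
// so it can be linked to with "#my-subheader". Returns null for non-headers.
function headerWithAnchor(line) {
  const match = line.match(/^(#{1,6})\s*(.+?)\s*$/);
  if (!match) return null;
  const level = match[1].length;
  return (
    '<h' + level + ' id="' + slugify(match[2]) + '">' +
    match[2] + '</h' + level + '>'
  );
}

console.log(headerWithAnchor('## My Subheader'));
```

Whether you get this behavior depends entirely on the flavor and tool in use, which ties this problem back to the lack of a standard.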

In addition, standalone Markdown editors are not WYSIWYG, opting instead to offer a live preview of hand-written code. As Markdown continues to grow in use, the tools keep improving, but the current state of Markdown tooling offers a very unfamiliar experience for many content contributors.

Word to Markdown

One option for content contributors familiar with working in Word for content authoring is the Microsoft Word to Markdown Converter project by Ben Balter. My own personal use of this project has shown that while the output needs manual cleaning, it is generally reliable.

Despite these issues, as we’ll see when we look deeper at static site generators, Markdown has become the standard for writing content within these tools.
