Static Web Site Generation (Part 1)

Mar 23, 2023

Creating a static web site generator seems to have become a rite of passage. Perhaps it always has been?

A static web site generator is a tool for taking some representation of the content of a website, and generating from it a set of static web pages in a way that can be served by an off-the-shelf web server, generally using simple file-based routing.

There is no shortage of static web site generators out there. GitHub supports Jekyll out of the box. Friends have successfully used Hugo. I've used Gatsby myself for a work project before switching to Next.js. There's a lot of great technology out there for static web site generation.

Yet, here I am, maintaining my website using a DIY static web site generator. It's the age-old story. My website has always been a set of static pages hosted on GitHub pages, because I wanted it to be low maintenance. But eventually I wanted a consistent navigation bar atop my site across these static pages. Using Javascript for that felt like overkill. So I cooked up a small script to add the navigation bar to every page.

The next natural step was to generalize the script to control where the banner went on a page. From there, it was a small step to use a page layout as a template that would get filled by the content specific to any given page of the site. Once you have the notion of a page layout, it is another small step to support different page layouts for different part of the site. And once you have the page content separated from the layout of the page, well, that page content can easily be written in some other format and converted to HTML. Which is the first step in supporting content written as posts for a blog hosted on the site, doesn't it? The kind of blog post like the one you're currently reading.

Yes, I know. I simply recreated the exact reasoning that went into the creation of these static web site generators I just mentioned. I am not claiming originality here. But it's been a fun and natural process, and I ended up with something that feels perfectly adapted to my current workflow. My dependencies are minimal and I spend no time maintaining the tool.

I am of course not alone in this approach. It's an easy way to experiment with site generation. Designers seem to feel the same about their own website.

This post (and upcoming ones) will dig into the details of my tool, in case you're curious.

The webgen tool

The tool, called webgen, is built around two ideas:

  • templates living at multiple levels of the folder hierarchy;
  • dependency-based content generation.

These are obviously not new ideas, but finding a balance between simplicity and usefulness has been entertaining.

The tool is implemented in Go, my preferred programming language for personal projects for the last year. It is statically typed, completely procedural, and the closest to programming in C that I've felt in a long while — but without the memory management headaches. That Go compiles to a single statically-linked executable is a bonus. The code for the tool is part of the Git repository of this website, which I recognize is not great for sharing. (I will likely move it to its own git repository at some future date, or if there's a need to it. UPDATE: done) This website as a whole in fact serves as an example of using webgen to create a static web site.

The idea of templates living at multiple levels of the folder hierarchy is simple. You create your static website as usual, with possibly nested folders of HTML, CSS, and Javascript files laid out however you want them. Any folder can have a source folder __src that includes content that webgen can use to generate HTML files in addition to those HTML files that are already present. The folder containing a __src folder is called the anchor folder of that __src folder. Any file xyz.content (containing an HTML fragment) in a __src folder is converted to an HTML file xyz.html in the anchor folder by inserting it into a template. That template is a file CONTENT.template which is found either in the same __src folder as the xyz.content file or in a __src folder hanging off a parent of the anchor folder. The first template found in a __src folder moving up from the anchor folder is the one that gets used. Thus, a template put in a __src folder at the root of the website folder hierarchy can serve as a generic template used for all pages, except when overridden by a template in a sub-folder.

Templates are implemented using Go's html/template package. They are simply HTML documents with a placeholder of the form {{.Body}} that gets replaced by some the content from a xyz.content file to create a file xyz.html in the anchor folder of the __src folder containing xyz.content.

For example, the structure

root/
    __src/
        CONTENT.template
        index.content
    A/
        __src/
            page-A.content
    B/
        __src/
            CONTENT.template
            page-B.content

will generate the following files:

root/
    index.html
    A/
        page-A.html
    B/
        page-B.html

where root/index.html and root/A/page-A.html are generated by plugging root/__src/index.content and root/A/__src/page-A1.content into template root/__src/CONTENT.template (respectively), while root/B/page-B.html is generated by plugging root/B/__src/page-B.content into template root/B/__src/CONTENT.template.

The structure of the HTML generation code is straightforward:

for every folder F:
   if F/__src/ exists:
      find nearest enclosing template T
      for all files F/__src/C.content:
          insert C.content into T and create F/C.html 

In the tool, the above algorithm is contained in function WalkAndProcessContents(), and relies on package function filepath.WalkDir() to walk the folder hierarchy from a given root path:

func WalkAndProcessContents(root string) {
    cwd, err := os.Getwd()
    if err != nil {
        rep.Fatal("ERROR: %s\n", err)
    }
    walk := func(path string, d fs.DirEntry, err error) error {
        if err != nil {
            // Error in processing the path - skip.
            return nil
        }
        if !d.IsDir() {
            // Skip over files.
            return nil
        }
        if filepath.Base(path) == ".git" {
            return fs.SkipDir
        }
        if isGenDir(path) {
            // Skip GENDIR.
            return fs.SkipDir
        }
        ProcessFilesContent(cwd, path)
        return nil
    }
    if err := filepath.WalkDir(root, walk); err != nil {
        rep.Fatal("ERROR: %s\n", err)
    }
}

(Variable GENDIR holds the name of source folder __src.)

The above code finds every anchor folder (skipping files) and calls ProcessFilesContent() on any folder F to handle HTML file generation in F from a content file appearing in F/__src. Note the special cases for .git and for __src to avoid generating files in .git/ and in __src/. Clearly, this could and should be generalized into a generic list of folders to skip.

Function ProcessFilesContent() does the bulk of the work:

func ProcessFilesContent(cwd string, path string) {
    genDir, err := identifyGenDir(path)
    if err != nil {
        return
    }
    entries, err := os.ReadDir(filepath.Join(path, genDir))
    if err != nil {
        // if we can't read GENDIR, skip.
        return
    }
    for _, d := range entries {
        if !d.IsDir() && isContent(d.Name()) {
            relPath, err := filepath.Rel(cwd, path)
            if err != nil {
                relPath = path
            }
            target := filepath.Join(relPath, targetFilename(d.Name(), "content", "html"))
            w, err := os.Create(target)
            if err != nil {
                w.Close()
                rep.Printf("ERROR: %s\n", err)
                continue
            }
            if err := ProcessFileContent(w, filepath.Join(relPath, genDir, d.Name())); err != nil {
                w.Close()
                rep.Printf("ERROR: %s\n", err)
                continue
            }
            rep.Printf("  wrote %s", target)
            w.Close()
        }
    }
}

func ProcessFileContent(w io.Writer, fname string) error {
    rep.Printf("%s\n", fname)
    tmpl, err := FindTemplate(fname)
    if err != nil {
        return err
    }
    main, err := ioutil.ReadFile(fname)
    if err != nil {
        return err
    }
    current := template.HTML(main)
    rep.Printf("  using template %s\n", tmpl.name)
    c := Content{"", time.Time{}, "", "", current}
    filledTmpl, err := ProcessTemplate(tmpl.template, c)
    if err != nil {
        return err
    }
    bytes := []byte(filledTmpl)
    if _, err := w.Write(bytes); err != nil {
        return err
    }
    return nil
}

Finding templates is handled by function FindTemplate() and inserting content into a template by function ProcessTemplate():

type template_info struct {
    template *template.Template
    name     string
}

func FindTemplate(path string) (template_info, error) {
    // Given a path, find the nearest enclosing CONTENT.template file.
    previous, _ := filepath.Abs(path)
    current := filepath.Dir(previous)
    for current != previous {
        gdPath, err := identifyGenDirPath(current)
        if err == nil {
            tname := filepath.Join(gdPath, TEMPLATE)
            tpl, err := template.ParseFiles(tname)
            if err == nil {
                result = template_info{tpl, tname}
                return result, nil
            }
        }
        previous = current
        current = filepath.Dir(current)
    }
    return template_info{}, fmt.Errorf("no template found")
}

func ProcessTemplate(tpl *template.Template, content Content) (template.HTML, error) {
    var b strings.Builder
    if err := tpl.Execute(&b, content); err != nil {
        return template.HTML(""), err
    }
    result := template.HTML(b.String())
    return result, nil
}

Nested templates is a straightforward extension of the above. They let an HTML fragment file.content be injected into a template to create a larger fragment that itself can be injected into a template further up the folder hierarchy. Details are left as an exercise to the reader.

And that's it, really. I have some glue code to launch the tool from the command line and helper functions to simplify the code, but the core is as simple as can be.

The second idea underlying webgen, dependency-based content generation, I will address in Part 2, which covers Markdown content.

Daniel Martin (by John Fowles)