Static Web Site Generation (Part 2)
May 2, 2023
Previous post in the series: Static Web Site Generation (Part 1).
In my last post, I talked about webgen, the tool I wrote to maintain this website. It is a
fairly conventional static web site generator, based on the simplest of all ideas: HTML templates
that can be filled with HTML content. Writing it from scratch has been a good exercise and gave me a
tool perfectly suited to my workflow that I can modify however and whenever said workflow changes.
A quick recap: the tool assumes that a web site is structured as a folder hierarchy containing the
HTML files to be served statically through a standard web server. Some of those HTML files are
pre-written files, while others are generated from content and HTML templates. Site generation is
done before the web site is deployed, so that only static files are served by the web server.
The tool scans all folders in the hierarchy. For any folder F with a __src subfolder containing a
file foo.content holding some HTML content, it generates a file foo.html in folder F by inserting
__src/foo.content into an HTML template. That HTML template is found by searching F and then its
parent folders for a template __src/CONTENT.template. Templates can be stored at any level of the
folder hierarchy, and the template in the nearest enclosing folder is used.
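The template lookup can be sketched in a few lines of Go. This is a minimal illustration and not webgen's actual code: the name findNearestTemplate, the injected exists predicate, and the choice to keep climbing to the top of the path (rather than stopping at the site root) are all assumptions.

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
)

// genDir is the source subfolder name used throughout the post.
const genDir = "__src"

// findNearestTemplate looks for <dir>/__src/<name>, then climbs to each
// parent folder in turn and returns the first match. The exists predicate
// is injected so the search can be exercised without touching the disk.
// (Hypothetical helper, not webgen's own function.)
func findNearestTemplate(dir, name string, exists func(string) bool) (string, bool) {
	dir = filepath.Clean(dir)
	for {
		candidate := filepath.Join(dir, genDir, name)
		if exists(candidate) {
			return candidate, true
		}
		parent := filepath.Dir(dir)
		if parent == dir {
			// Reached "." or the filesystem root without a match.
			return "", false
		}
		dir = parent
	}
}

// fileExists reports whether path names an existing regular file.
func fileExists(path string) bool {
	info, err := os.Stat(path)
	return err == nil && info.Mode().IsRegular()
}

func main() {
	if tpl, ok := findNearestTemplate("root/D", "CONTENT.template", fileExists); ok {
		fmt.Println("using template", tpl)
	} else {
		fmt.Println("no template found")
	}
}
```

Because the search starts at the folder itself, a template placed next to a page overrides any template higher up, which is what "nearest enclosing folder" implies.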
This means, for example, that you can write a site template capturing the structure of all the
pages of the final website (header, footer, navigation bar, layout, CSS), that you can write the HTML
content of the various pages within those various content files, and that you can then generate the
final website by simply injecting each content file into the site template to obtain the final pages.
Moreover, and this is the point of this post, there is no reason why the content used to generate
the final pages needs to be HTML. It is easy enough to generate HTML from markup languages such as
Markdown or Org files. The approach that webgen takes is to cascade content generation. For
Markdown, for example, there is a Markdown-to-HTML translator in webgen that knows how to translate
Markdown files into HTML content files that can then be processed using the HTML content generation
described above as though they were original content files. There is no difference between
generated content files and original content files. This compositionality was an important design
criterion for webgen.
More specifically, every __src/foo.md file gets translated into a __src/foo.content HTML file in the
same source folder. The implementation uses the library Blackfriday for parsing Markdown and
generating the HTML. The translation from Markdown to HTML is completely standard: headers get
translated into <h1>, <h2>, ..., italics into <em>, bold into <strong>, lists into <ul>, etc.
A Markdown-specific HTML template can additionally be used to mediate the translation from Markdown
to HTML. The tool searches for a template __src/MARKDOWN.template by walking up the folder
hierarchy from the source folder of any Markdown file, in the same way as it searches for a template
__src/CONTENT.template. If it finds such a Markdown template file, the HTML obtained by
translating the Markdown file is inserted into the template to create the final content file.
The implementation supporting this process is a simple extension of what's already available. Recall
from last time that the tool does a recursive walk over the folder hierarchy of the web site to
generate foo.html from every __src/foo.content it finds. To handle Markdown translation, the tool
first does a preliminary recursive walk to translate every Markdown file into a content file. Thus,
given the following hierarchy:
    root/
        __src/
            MARKDOWN.template
        C/
            __src/
                page-C.md
        D/
            __src/
                MARKDOWN.template
                page-D.md
the tool will first generate the following files in the Markdown translation pass:
    root/
        C/
            __src/
                page-C.content
        D/
            __src/
                page-D.content
using template root/__src/MARKDOWN.template for page-C.md and root/D/__src/MARKDOWN.template for
page-D.md. The subsequent recursive walk will generate page-C.html and page-D.html in the usual way:
    root/
        C/
            page-C.html
        D/
            page-D.html
The underlying walk-and-translate algorithm is straightforward:
    for every folder F:
        if F/__src/ exists:
            for all files F/__src/C.md:
                transform C.md into some HTML content
                if there is a nearest MARKDOWN.template file T:
                    insert HTML content into T to update HTML content
                write HTML content to F/__src/C.content
In the tool, the above algorithm is contained in function WalkAndProcessMarkdowns(), invoked before
WalkAndProcessContents() from last time and working very similarly:
    func WalkAndProcessMarkdowns(root string) {
        cwd, err := os.Getwd()
        if err != nil {
            rep.Fatal("ERROR: %s\n", err)
        }
        walk := func(path string, d fs.DirEntry, err error) error {
            if err != nil {
                // Error in processing the path - skip.
                return nil
            }
            if !d.IsDir() {
                // Skip over files.
                return nil
            }
            if filepath.Base(path) == ".git" {
                // Skip version-control metadata.
                return fs.SkipDir
            }
            if isGenDir(path) {
                // Skip GENDIR.
                return fs.SkipDir
            }
            if isGenPosts(path) {
                return fs.SkipDir
            }
            ProcessFilesMarkdown(cwd, path)
            return nil
        }
        if err := filepath.WalkDir(root, walk); err != nil {
            rep.Fatal("ERROR: %s\n", err)
        }
    }
(Variable GENDIR abstracts the name of the source folder __src in case I want to change it in the
future.)
A function ProcessFilesMarkdown() does the bulk of the work of translating all the Markdown files in
a __src folder in a way similar to ProcessFilesContents() for content files:
    func ProcessFilesMarkdown(cwd string, path string) {
        gdPath, err := identifyGenDirPath(path)
        if err != nil {
            return
        }
        entries, err := os.ReadDir(gdPath)
        if err != nil {
            // If we can't read GENDIR, skip.
            return
        }
        for _, d := range entries {
            if !d.IsDir() && isMarkdown(d.Name()) {
                // Report paths relative to the working directory when possible.
                relPath, err := filepath.Rel(cwd, gdPath)
                if err != nil {
                    relPath = gdPath
                }
                target := filepath.Join(relPath, targetFilename(d.Name(), "md", "content"))
                w, err := os.Create(target)
                if err != nil {
                    // No file was opened, so there is nothing to close.
                    rep.Printf("ERROR: %s\n", err)
                    continue
                }
                if err := ProcessFileMarkdown(w, filepath.Join(relPath, d.Name())); err != nil {
                    w.Close()
                    rep.Printf("ERROR: %s\n", err)
                    continue
                }
                rep.Printf(" wrote %s", target)
                w.Close()
            }
        }
    }
    func ProcessFileMarkdown(w io.Writer, fname string) error {
        rep.Printf("%s\n", fname)
        // os.ReadFile replaces the deprecated ioutil.ReadFile (Go 1.16+).
        md, err := os.ReadFile(fname)
        if err != nil {
            return err
        }
        // Split the optional metadata header from the Markdown body.
        metadata, restmd, err := ExtractMetadata(md)
        if err != nil {
            return err
        }
        output := blackfriday.Run(restmd, blackfriday.WithNoExtensions())
        // A nil template means no MARKDOWN.template applies; err is not checked here.
        tpl, tname, err := FindMarkdownTemplate(fname)
        if tpl != nil {
            rep.Printf(" using markdown template %s\n", tname)
            result, err := ProcessMarkdownTemplate(tpl, metadata, template.HTML(output))
            if err != nil {
                return err
            }
            output = []byte(result)
        }
        if _, err := w.Write(output); err != nil {
            return err
        }
        return nil
    }
(The recursive walk to find Markdown files and the code to process them both parallel the structure
of generating HTML from content files, which clearly suggests that there's an abstraction here that
I can isolate. I'll work on that kind of refactoring the next time I touch this code base.)
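One possible shape for such an abstraction is sketched here, purely as an illustration. The pass type, runPass, and the stand-in translator are hypothetical names, and the sketch only covers passes that write their output next to the source inside __src (the content-to-HTML pass writes to the parent folder, so it would need an extra hook). It also omits the folder skipping done by the real walk.

```go
package main

import (
	"fmt"
	"io/fs"
	"os"
	"path/filepath"
	"strings"
)

// pass describes one step of the cascade: files ending in srcExt inside a
// __src folder are translated and written next to them with dstExt.
type pass struct {
	srcExt, dstExt string
	translate      func(src []byte) ([]byte, error)
}

// runPass walks root and applies p to every matching file in a __src folder.
func runPass(root string, p pass) error {
	return filepath.WalkDir(root, func(path string, d fs.DirEntry, err error) error {
		if err != nil {
			return err
		}
		if d.IsDir() || filepath.Base(filepath.Dir(path)) != "__src" ||
			!strings.HasSuffix(path, p.srcExt) {
			return nil
		}
		src, err := os.ReadFile(path)
		if err != nil {
			return err
		}
		out, err := p.translate(src)
		if err != nil {
			return fmt.Errorf("%s: %w", path, err)
		}
		target := strings.TrimSuffix(path, p.srcExt) + p.dstExt
		return os.WriteFile(target, out, 0o644)
	})
}

// demo builds a throwaway site, runs a stand-in ".md" to ".content" pass,
// and returns the generated content.
func demo() (string, error) {
	root, err := os.MkdirTemp("", "site")
	if err != nil {
		return "", err
	}
	defer os.RemoveAll(root)
	src := filepath.Join(root, "C", "__src")
	if err := os.MkdirAll(src, 0o755); err != nil {
		return "", err
	}
	if err := os.WriteFile(filepath.Join(src, "page-C.md"), []byte("hello"), 0o644); err != nil {
		return "", err
	}
	// Stand-in translator: a real pass would call the Markdown library here.
	wrap := pass{
		srcExt: ".md",
		dstExt: ".content",
		translate: func(b []byte) ([]byte, error) {
			return []byte("<p>" + string(b) + "</p>"), nil
		},
	}
	if err := runPass(root, wrap); err != nil {
		return "", err
	}
	out, err := os.ReadFile(filepath.Join(src, "page-C.content"))
	return string(out), err
}

func main() {
	out, err := demo()
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(out)
}
```

With this shape, adding a step to the cascade amounts to running one more pass value before the existing ones.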
The actual Markdown-to-HTML translation is achieved with a call to blackfriday.Run() in the
Blackfriday library. Functions FindMarkdownTemplate() and ProcessMarkdownTemplate() are used
to find a __src/MARKDOWN.template and to insert the produced HTML into such a template, respectively.
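The post does not show how the HTML is inserted into the template, so the following is only a sketch of one way to do it with Go's standard html/template package. The function fillTemplate and the field names Title, Date, and Body are assumptions, not webgen's own.

```go
package main

import (
	"bytes"
	"fmt"
	"html/template"
)

// page is the data handed to the template. The field names are assumptions.
type page struct {
	Title string
	Date  string
	Body  template.HTML // already-rendered HTML, emitted without escaping
}

// fillTemplate parses tplText and executes it with the given metadata and
// rendered body. Title and date are escaped by html/template; the body is
// trusted HTML and is inserted verbatim.
func fillTemplate(tplText, title, date, bodyHTML string) (string, error) {
	tpl, err := template.New("markdown").Parse(tplText)
	if err != nil {
		return "", err
	}
	var buf bytes.Buffer
	data := page{Title: title, Date: date, Body: template.HTML(bodyHTML)}
	if err := tpl.Execute(&buf, data); err != nil {
		return "", err
	}
	return buf.String(), nil
}

func main() {
	const tplText = `<article><h1>{{.Title}}</h1><p class="date">{{.Date}}</p>{{.Body}}</article>`
	out, err := fillTemplate(tplText, "Static Web Site Generation (Part 2)",
		"2023-05-02", "<p>Hello <em>world</em></p>")
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(out)
}
```

Wrapping the body in template.HTML matters: a plain string would be escaped and the generated markup would show up as literal text on the page.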
One additional detail worth mentioning is that I allow Markdown files to have YAML-style metadata at
the top of the file, such as:
---
title: Static Web Site Generation (Part 1)
date: 2023-03-23
---
This metadata can be used in a Markdown template, although the implementation is not yet generic: I
support only a few fields, such as title and date as above, mostly dedicated to the creation of blog
posts. I will most likely talk about blog support in webgen in a future post. Markdown metadata is
read using a function ExtractMetadata().
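For readers who want to try something similar, here is a minimal sketch of splitting such a header from the rest of a Markdown file. It is not the ExtractMetadata() used by webgen: the name splitFrontMatter, the string-based signature, and the flat "key: value" parsing (no nesting, no quoting) are simplifying assumptions.

```go
package main

import (
	"errors"
	"fmt"
	"strings"
)

// splitFrontMatter separates a leading "---" delimited block of key: value
// lines from the rest of the document. A document without such a block is
// returned unchanged with empty metadata.
func splitFrontMatter(md string) (map[string]string, string, error) {
	meta := map[string]string{}
	lines := strings.Split(md, "\n")
	if strings.TrimSpace(lines[0]) != "---" {
		return meta, md, nil
	}
	for i := 1; i < len(lines); i++ {
		line := strings.TrimSpace(lines[i])
		if line == "---" {
			// Everything after the closing delimiter is the Markdown body.
			return meta, strings.Join(lines[i+1:], "\n"), nil
		}
		if line == "" {
			continue
		}
		// Split on the first colon only, so values may contain colons.
		key, value, ok := strings.Cut(line, ":")
		if !ok {
			return nil, "", fmt.Errorf("line %d: expected \"key: value\", got %q", i+1, line)
		}
		meta[strings.TrimSpace(key)] = strings.TrimSpace(value)
	}
	return nil, "", errors.New("metadata block is not closed by ---")
}

func main() {
	doc := "---\ntitle: Static Web Site Generation (Part 1)\ndate: 2023-03-23\n---\n# Heading\n"
	meta, body, err := splitFrontMatter(doc)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(meta["title"], "/", meta["date"])
	fmt.Print(body)
}
```

A full YAML parser would be the more general choice; the hand-rolled split is enough when only a few flat fields are supported.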
And that's about it. Clearly the above generalizes to other markup formats, as long as you can
define an HTML translator for those formats.
In a precise sense, webgen works bottom-up: it first transforms every Markdown file it finds into a
content file in a first pass, and then transforms every content file it finds into an HTML file in a
second pass. It is easy to add more steps to the cascade. For instance, I explored the idea of
adding LaTeX-style mathematical markup to Markdown documents to help write mathematical posts. So
as not to reinvent the Markdown-translation wheel, or hack the underlying Markdown translation
library, the easiest way to support this kind of extended Markdown is to first translate the
extended Markdown file into a normal Markdown file in which the mathematical markup has been
translated to HTML, leaving all standard Markdown untouched. This latter Markdown file can then be
translated to HTML using the process described above, taking advantage of the fact that the
Blackfriday library leaves HTML markup in a Markdown file untouched during translation. (You can see
the result of this experiment on these lecture notes on algorithmic analysis. The translation from
LaTeX to HTML was done through Temml.)
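The extra step can be pictured as a small text-to-text filter placed in front of the Markdown pass. The sketch below is only an illustration under stated assumptions: the post does not say which delimiters mark mathematics, so the single-line $...$ syntax is a guess, and the renderer is a placeholder rather than a call to Temml.

```go
package main

import (
	"fmt"
	"regexp"
)

// inlineMath matches $...$ spans on a single line. The delimiter is an
// assumption made for this sketch.
var inlineMath = regexp.MustCompile(`\$([^$\n]+)\$`)

// expandMath replaces each math span with the HTML produced by render and
// leaves every other byte of the Markdown source untouched.
func expandMath(md string, render func(latex string) string) string {
	return inlineMath.ReplaceAllStringFunc(md, func(span string) string {
		// Strip the surrounding dollar signs before rendering.
		return render(span[1 : len(span)-1])
	})
}

func main() {
	// Placeholder renderer; a real one would delegate to a LaTeX-to-HTML tool.
	render := func(latex string) string {
		return `<span class="math">` + latex + `</span>`
	}
	fmt.Println(expandMath("The cost is $O(n \\log n)$ in the *worst* case.", render))
}
```

The output is still Markdown (the emphasis markers survive), which is what lets the regular Markdown pass pick it up unchanged. A real filter would also need to escape characters such as < inside the mathematics.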
The current implementation of webgen is not smart about generation. It will regenerate all files
whenever it is run, whether their source files have been modified or not since the last
generation. I don't see it as a problem right now because this site is small: generating it from
scratch every time takes only a few seconds. Of course, a more efficient generation process would
be to only regenerate files whose sources have changed. I'm leaving this optimization for a future
refactoring of the code.
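A common way to approach that optimization is to compare modification times, as build tools like make do. The sketch below shows the idea under that assumption; needsRegen is a hypothetical helper, not part of webgen. It takes several sources because a page depends on its template as well as on its content file.

```go
package main

import (
	"errors"
	"fmt"
	"io/fs"
	"os"
	"path/filepath"
	"time"
)

// needsRegen reports whether target is missing or older than any source.
func needsRegen(target string, sources ...string) (bool, error) {
	tinfo, err := os.Stat(target)
	if errors.Is(err, fs.ErrNotExist) {
		return true, nil
	}
	if err != nil {
		return false, err
	}
	for _, src := range sources {
		sinfo, err := os.Stat(src)
		if err != nil {
			return false, err
		}
		if sinfo.ModTime().After(tinfo.ModTime()) {
			return true, nil
		}
	}
	return false, nil
}

// demo exercises needsRegen in a temporary folder and returns its answer
// for three cases: missing target, older source, newer source.
func demo() ([3]bool, error) {
	var got [3]bool
	dir, err := os.MkdirTemp("", "regen")
	if err != nil {
		return got, err
	}
	defer os.RemoveAll(dir)
	src := filepath.Join(dir, "page.md")
	dst := filepath.Join(dir, "page.content")
	if err := os.WriteFile(src, []byte("# hi"), 0o644); err != nil {
		return got, err
	}
	// Case 0: the target does not exist yet.
	if got[0], err = needsRegen(dst, src); err != nil {
		return got, err
	}
	if err := os.WriteFile(dst, []byte("<h1>hi</h1>"), 0o644); err != nil {
		return got, err
	}
	now := time.Now()
	// Case 1: the source is an hour older than the target.
	if err := os.Chtimes(src, now, now.Add(-time.Hour)); err != nil {
		return got, err
	}
	if got[1], err = needsRegen(dst, src); err != nil {
		return got, err
	}
	// Case 2: the source was edited after the target was written.
	if err := os.Chtimes(src, now, now.Add(time.Hour)); err != nil {
		return got, err
	}
	got[2], err = needsRegen(dst, src)
	return got, err
}

func main() {
	got, err := demo()
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(got)
}
```

Modification times are cheap to check but can mislead after a fresh checkout or a clock change; comparing content hashes is a sturdier, if slower, alternative.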