Getting a Grip on Hugo's URL Space
When I converted my static site to Hugo in January of 2020, everything went smoothly except the URL mapping. I wanted all the existing pages to retain their URLs, but Hugo has Very Definite Opinions about how the URL space should be organized.
Unfortunately, Hugo’s documentation doesn’t clearly articulate these opinions, and I wasted many hours trying to understand what was going on.
My site had a fairly standard layout of directories, each containing one or more HTML files or subdirectories. An index.html file would be presented when loading the bare directory: .../www/foo/bar/index.html has the URL http://site.com/foo/bar/. Other *.html files in a directory would have URLs ending in .html: .../www/baz/page.html has the URL http://site.com/baz/page.html.
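To pin down the old scheme, here is a small Python sketch of the mapping from a file under www/ to its URL. The function name legacy_url is hypothetical and the host http://site.com is the placeholder from the text; it only models the two rules just described.

```python
from pathlib import PurePosixPath


def legacy_url(rel_path, site="http://site.com"):
    """Return the URL the old server gave a file under www/ (hypothetical helper)."""
    p = PurePosixPath(rel_path)
    if p.name == "index.html":
        # index.html is served for the bare directory, so the URL ends in a slash
        parent = p.parent.as_posix()
        prefix = "" if parent == "." else parent + "/"
        return f"{site}/{prefix}"
    # Any other *.html file keeps its own name in the URL
    return f"{site}/{p.as_posix()}"


print(legacy_url("foo/bar/index.html"))  # http://site.com/foo/bar/
print(legacy_url("baz/page.html"))       # http://site.com/baz/page.html
```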
In my first pass at converting to Hugo, I retained the directory and file layout and just converted everything from html to md. So we have something like this:
$ ls -RF content/
content/:
dir-one/ index.md
content/dir-one:
index.md stuff.md subdir/
content/dir-one/subdir:
index.md sub-stuff.md
Publishing with hugo -D got me the following (eliding *.xml, dist/*, categories, tags, and other boilerplate):
$ ls -RF public/
public/:
index.html
What the hell? Where is all of my content?
I read more docs and figured out that I needed to name the index files _index.md (with a leading underscore), not index.md. Like so:
$ ls -RF content/
content/:
dir-one/ _index.md
content/dir-one:
_index.md stuff.md subdir/
content/dir-one/subdir:
_index.md sub-stuff.md
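The empty first result is consistent with Hugo’s page-bundle rules as documented: a directory containing index.md is a leaf bundle, and the files beneath it are treated as resources of that single page rather than as pages of their own, while a directory containing _index.md is a branch bundle (a section). The Python sketch below is a rough, hypothetical model of that classification (classify and its labels are illustrative, not Hugo code) and ignores headless bundles, non-Markdown content, and other edge cases.

```python
from pathlib import PurePosixPath


def classify(files):
    """Label content/ Markdown files roughly the way Hugo's bundle rules treat them (sketch)."""
    paths = [PurePosixPath(f) for f in files]
    # Every directory holding an index.md is a leaf bundle
    leaf_dirs = {p.parent for p in paths if p.name == "index.md"}
    labels = {}
    for p in paths:
        # Leaf bundles enclosing this file, excluding the bundle whose own index.md this is
        owners = [d for d in leaf_dirs
                  if d in p.parents and not (p.name == "index.md" and d == p.parent)]
        if owners:
            labels[str(p)] = "resource (not rendered as a page)"
        elif p.name == "index.md":
            labels[str(p)] = "leaf bundle"
        elif p.name == "_index.md":
            labels[str(p)] = "branch bundle (section page)"
        else:
            labels[str(p)] = "regular page"
    return labels
```

With the first layout, the root index.md turns the whole tree into one leaf bundle and everything else becomes a resource; with the underscore names, stuff.md and sub-stuff.md come out as regular pages.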
Publish, and now I can at least see something:
$ ls -RF public/
public/:
dir-one/ index.html
public/dir-one:
index.html stuff/ subdir/
public/dir-one/stuff:
index.html
public/dir-one/subdir:
index.html sub-stuff/
public/dir-one/subdir/sub-stuff:
index.html
But it’s nothing like the simple layout that I’ve got in my content directory. Each leaf-node file has become its own directory containing an index.html, instead of being an HTML file in its parent directory. What the heck?
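The listing above follows a simple rule in Hugo’s default (“pretty URL”) mode, as far as this example shows: a section’s _index.md is rendered to index.html in the matching public/ directory, and every other page gets a directory named after its file with an index.html inside. A minimal Python sketch of that rule; pretty_output is a hypothetical name, and only Markdown files under content/ with no permalink or slug overrides are considered.

```python
from pathlib import PurePosixPath


def pretty_output(content_path):
    """Where default (pretty URL) mode writes a content/ Markdown file under public/ (sketch)."""
    p = PurePosixPath(content_path)
    if p.name == "_index.md":
        # A section's list page lives in the section's own directory
        out_dir = p.parent
    else:
        # A regular page gets a directory named after the file
        out_dir = p.parent / p.stem
    return (PurePosixPath("public") / out_dir / "index.html").as_posix()


for f in ["_index.md", "dir-one/_index.md", "dir-one/stuff.md",
          "dir-one/subdir/_index.md", "dir-one/subdir/sub-stuff.md"]:
    print(f, "->", pretty_output(f))
```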
OK, so the docs talk about uglyurls. Let’s turn that on and maybe I’ll get something sensible…
$ ls -RF public/
public/:
dir-one.html index.html dir-one/
public/dir-one:
stuff.html subdir/ subdir.html
public/dir-one/subdir:
page/ sub-stuff.html
That’s … not even wrong. The leaf-node files are in the right place, but the index files are now leaf nodes in their parent directory (dir-one.html, subdir.html). Insane.
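Read off the listing, uglyURLs applies a different rule: a regular page becomes <name>.html in its own directory, and a section’s _index.md becomes <section>.html beside the section directory, while the home page stays at index.html. A Python sketch of that mapping; ugly_output is a hypothetical name, and the page/ directory in the listing is not modeled.

```python
from pathlib import PurePosixPath

ROOT = PurePosixPath(".")


def ugly_output(content_path):
    """Where uglyURLs mode writes a content/ Markdown file under public/ (sketch)."""
    p = PurePosixPath(content_path)
    public = PurePosixPath("public")
    if p.name == "_index.md":
        if p.parent == ROOT:
            return (public / "index.html").as_posix()  # the home page is unchanged
        # A section page becomes <section>.html beside the section directory
        section = p.parent
        return (public / section.parent / (section.name + ".html")).as_posix()
    # A regular page becomes <name>.html inside its own directory
    return (public / p.parent / (p.stem + ".html")).as_posix()
```

This is why the old index.html URLs cannot be reproduced just by flipping the switch: the section page moves up a level instead of staying inside its directory.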
At this point I gave up trying to get Hugo to do something sensible automatically, and I just put a url parameter in the front matter of every leaf-node page. So now content/dir-one/stuff.md is:
---
title: "Stuff"
date: 2020-02-02T18:11:31-08:00
url: "/dir-one/stuff.html"
---
This is stuff
This means I’d need to hand-edit the front matter if I ever rearrange my site, but I don’t think I’m going to need to move the legacy content around very much. For most of the new content I’ll probably just do things “The Hugo Way”, and live with its ideas about URLs.
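Writing those url lines by hand is tedious, so here is a hypothetical Python helper that computes, for each leaf page under content/, the url value that reproduces its old .html address. It assumes the legacy rule that a page’s URL is its path with .md swapped for .html, skips _index.md files (which already land at the directory URL by default), and only returns the values rather than editing any files.

```python
from pathlib import Path


def legacy_urls(content_root):
    """Map each leaf page under content_root to a url: value preserving its old .html URL."""
    root = Path(content_root)
    urls = {}
    for md in sorted(root.rglob("*.md")):
        if md.name == "_index.md":
            continue  # section pages already get <dir>/ under the default pretty URLs
        rel = md.relative_to(root)
        # Old URL = same relative path, with the .md extension swapped for .html
        urls[rel.as_posix()] = "/" + rel.with_suffix(".html").as_posix()
    return urls
```

On the layout above, legacy_urls("content") would return {"dir-one/stuff.md": "/dir-one/stuff.html", "dir-one/subdir/sub-stuff.md": "/dir-one/subdir/sub-stuff.html"}, which is exactly the set of front matter values to paste in.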
After all of this headdesking I came to some conclusions about Hugo. I don’t know if its developers would agree with them, but they helped me get a working mental model.
- Hugo has a lot of plumbing for dealing with blog-like content. It has built-in support for “categories”, each having multiple “posts”, generally organized by date.
- The assumption is that you’ll have a fairly simple URL space outside of your blog-ish content. Maybe an “about” page and a few other static bits.
- Hugo wants every page to have a URL ending in a slash, even if it’s represented by a single markdown file on disk.
- The URL space of your published website is not (and should not be) closely tied to the directory layout of your development environment on disk.
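The last two points condense into a small mental model: an explicit url in the front matter wins outright, and otherwise the published URL is derived from the file’s path and ends in a slash. This Python sketch is hypothetical (published_url is not a Hugo API) and assumes default pretty URLs with no permalinks configuration or slug overrides.

```python
from pathlib import PurePosixPath


def published_url(content_path, front_matter=None):
    """Front matter url wins; otherwise derive a slash-terminated URL from the path (sketch)."""
    fm = front_matter or {}
    if "url" in fm:
        return fm["url"]  # explicit URL, independent of where the file sits on disk
    p = PurePosixPath(content_path)
    # _index.md maps to its directory; any other page maps to a directory named after the file
    target = p.parent if p.name == "_index.md" else p.parent / p.stem
    return "/" if target == PurePosixPath(".") else f"/{target.as_posix()}/"
```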