Search, a Client-based Approach

Image of a MacBook Pro with backlit keyboard in the dark
You could do search locally. With a less than humongous dataset, it's much faster than server-side search. Photo by Tobias Horvath.

Right now, serving this website executes no code on the main web server. Other than, you know, the server itself. So I took a client-based search approach. It’s blazingly fast, but there are also a few downsides. Let me elaborate.

If you serve your site or at least parts of it dynamically, you can use the search engine of your CMS or blogging engine of choice.

This site was born from the idea of being all static. This doesn’t necessarily make sense, but I wanted that. So what are the choices for search if your site should stay strictly non-dynamic?

  • Use Google Site Search and have a Google-branded page handle your visitors' requests for search. It rips them from your site and delivers results only as current as the last time Google crawled and processed your content.
  • Use a JavaScript-based full-text search framework like lunr.js and make sure to feed it with an index of your content.
  • Use a JavaScript-based solution, tinier and optimized for the task at hand. Alex Pearce discussed this on his blog at alexpearce.me. Based on this, Christian Fei made a Jekyll plug-in that you can readily use.
  • Develop your own tiny solution.

I went with the last option.

Given my principle of not exposing visitors to third parties unless necessary, and since I don't consider outsourcing on-site search an admirable quality, Google Site Search was out.

lunr.js, on the other hand, is a great tool, but it was too big for what I needed.

The Jekyll plug-in solution works similarly to my search, but I still wanted to reduce the code and learn a bit while doing so.

Keywords

To make articles findable by keyword as well as by title, I decided to add tags to the YAML front matter of my posts and export that data as an array in my JSON output.

For example, my Arch Linux with Whole Disk Encryption article has the following in its front matter:

---
title: "Arch Linux with Whole Disk Encryption"
tags: linux, arch, lvm, dmcrypt, wde, mkinitcpio, systemd
---

You have to do some reverse thinking but I actually like the way this works.

If someone is on the site looking for the wde (short for whole disk encryption) article, they will find it by entering wde in search, even though it’s not part of the article title.
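To make the idea concrete, here is a minimal sketch of matching a query against titles and tags. It is not the code running on this site. The shape of the index entries (the title, url and tags fields), the URLs and the search function are all assumptions for illustration.

```javascript
// Hypothetical contents of search.json: one entry per article.
// Field names and URLs are made up for this sketch.
const index = [
  {
    title: "Arch Linux with Whole Disk Encryption",
    url: "/arch-linux-wde/",
    tags: ["linux", "arch", "lvm", "dmcrypt", "wde", "mkinitcpio", "systemd"],
  },
  {
    title: "Search, a Client-based Approach",
    url: "/search/",
    tags: ["search", "javascript", "json"],
  },
];

// Return every entry whose title or tags contain all words of the query.
// Matching is a case-insensitive substring test, nothing smarter.
function search(entries, query) {
  const words = query.toLowerCase().split(/\s+/).filter(Boolean);
  if (words.length === 0) return [];
  return entries.filter((entry) => {
    const haystack = [entry.title, ...entry.tags].join(" ").toLowerCase();
    return words.every((word) => haystack.includes(word));
  });
}
```

With this data, search(index, "wde") returns only the Arch Linux entry, even though "wde" appears nowhere in its title. Because this is plain substring matching, a query like "arch" would also hit "Search", which may or may not be what you want.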

If it gets bigger

Once the site grows considerably in size, the solution may become clumsy. After all, search.json contains all article titles and keywords at all times. While not bigger than the Archive page itself, it would still have to load at least once for each visitor.

Make sure to set caching headers for the JSON file so browsers don’t download the file multiple times.

Things to consider:

  • Defer loading of the search.json until after someone enters the search field deliberately. This might delay the search results by a tiny bit.
  • Develop a server-based search that you simply query client-side. And that’s where a pure client-based solution breaks.
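The first idea, deferring the download, could be sketched roughly like this. The createLazyIndex helper, the "search" element id and the /search.json path are assumptions for illustration, not what this site actually uses. The fetching function is passed in so the helper itself does not depend on a browser.

```javascript
// Build a loader that fetches the index on first use and hands out the
// same promise afterwards, so the file is requested at most once.
function createLazyIndex(fetchJson) {
  let pending = null;
  return function load() {
    if (pending === null) {
      pending = fetchJson().catch((error) => {
        pending = null; // allow a retry after a failed download
        throw error;
      });
    }
    return pending;
  };
}

// Possible browser wiring (element id and path are assumed):
// const loadIndex = createLazyIndex(() =>
//   fetch("/search.json").then((response) => response.json()));
// document.getElementById("search")
//   .addEventListener("focus", () => loadIndex(), { once: true });
```

Starting the download on focus rather than on the first keystroke gives the browser a small head start, which should hide most of the delay mentioned above.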