Static site search optimization - Chris' Externalized Inner Monologue

I get this recurring urge to have a nice website, and usually fail to follow through: I do some interesting things and then lose interest. The jury is still out on how it’s going to go this time, but I have certainly reached an interesting thing: search without a full-blown search engine.

Zola, the static site generator that’s in use here, comes with support for two search libraries out of the box:

Fuse.js (a few more features)
Elasticlunr.js (leaner bundle)

Both work on the same basic principle: the static site generator produces an index file that is loaded and searched directly on the client. Elasticlunr.js has a slightly smaller footprint, so I went with that one.

I have a bit of a background in search technology, so having search on my site is absolutely necessary. Also, to avoid embarrassing myself, it should be decent.

Some functionality I added, and why:

Search quality

The improvements in this section are implemented by passing additional options when calling Index.search.

Weigh fields by signal strength

Matches in highly specific fields are stronger indicators of relevance than matches in fields containing lots of generic text. Zola’s Elasticlunr.js index generation doesn’t expose many fields, so there isn’t much we can do here; boosting specific tags, for example, isn’t an option.

At least we can give titles, which are densely filled with relevant information, a healthy boost relative to the document body.

{
  fields: {
    title: { boost: 1.5 },
    body: { boost: 1.0 }
  }
}
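
As a rough sketch of how such a configuration might be handed to the search call: the helper below assumes `index` is an Elasticlunr.js index whose `search(query, options)` method accepts the object shown above. The names `FIELD_BOOSTS` and `searchWithBoost` are made up for this illustration.

```javascript
// Hypothetical constant: the boost values mirror the configuration above.
const FIELD_BOOSTS = {
  title: { boost: 1.5 },
  body: { boost: 1.0 },
};

// `index` is assumed to expose Elasticlunr.js's search(query, options).
function searchWithBoost(index, query) {
  return index.search(query, { fields: FIELD_BOOSTS });
}
```

In the browser this might be called as `searchWithBoost(elasticlunr.Index.load(window.searchIndex), "zola")`, assuming the index was emitted in Zola’s JavaScript format and is exposed under that global.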

Enable term expansion

Adding expand: true to the search options lets the search engine expand each query term into tokens that exist in the index, so a partially typed word can match the longer indexed words it begins. This is useful for autocomplete and live search, where results should appear for incomplete user input.
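
To make the idea concrete, here is a toy model of prefix-based expansion. It is only an illustration of the concept, not Elasticlunr.js’s actual implementation; `expandToken` and the sample vocabulary are invented for this sketch.

```javascript
// Toy model of prefix-based term expansion; NOT Elasticlunr.js internals.
// Returns every indexed token that starts with what the user has typed so far.
function expandToken(token, vocabulary) {
  return vocabulary.filter((candidate) => candidate.startsWith(token));
}

const vocabulary = ["search", "searching", "season", "static"];
console.log(expandToken("sea", vocabulary)); // → ["search", "searching", "season"]
```

With the real library, the equivalent behaviour is requested by passing `expand: true` in the options object alongside the field boosts.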

Fuzzy search

Unfortunately, Elasticlunr.js doesn’t support real fuzzy search with spellchecking and the like, so I’d have to build that myself. Loading the whole index on the client is already a lot, and shipping a full dictionary on top wouldn’t help with page speed, so I’m not going down that road yet.

As a compromise, I check the number of results after searching and, if the strict search comes up short, switch to OR logic, which requires only some of the query terms to match rather than all of them. It’s basically just sending another request with bool: 'OR'.

This will still provide somewhat relevant results if one of the words is mistyped, but won’t help if the search query consists only of a single word.
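
The two-pass approach could be sketched roughly like this. The threshold is an assumption (the text above does not say how few results trigger the fallback), and `searchWithFallback` and `MIN_STRICT_RESULTS` are hypothetical names; `index` is assumed to be an Elasticlunr.js index.

```javascript
// Assumed threshold: fall back only when the strict pass finds nothing at all.
const MIN_STRICT_RESULTS = 1;

// Runs a strict (AND) search first and retries with OR if too few results come back.
// The returned `fuzzy` flag records which pass produced the results.
function searchWithFallback(index, query, baseOptions = {}) {
  const strict = index.search(query, { ...baseOptions, bool: "AND" });
  if (strict.length >= MIN_STRICT_RESULTS) {
    return { results: strict, fuzzy: false };
  }
  const relaxed = index.search(query, { ...baseOptions, bool: "OR" });
  return { results: relaxed, fuzzy: true };
}
```

Returning the flag alongside the results keeps the decision visible to whatever renders the result list.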

User experience

Search as you type

Search is notoriously unreliable in most places, so someone using it at all is already a sign of trust. Giving plenty of feedback nurtures that trust and guides the user towards a search query that works well for them.

Autocomplete would be best for that, because it gives a more focused overview of how well the query is finding results, but then I’d end up with a second thing to maintain.

Live search/search as you type is the next best thing, as it combines the benefits of frequent feedback with low maintenance.
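
A minimal sketch of a live-search handler, under some assumptions of my own: `runSearch` and `render` are hypothetical callbacks supplied by the page, and the minimum query length of 2 is an arbitrary choice to avoid searching on a single keystroke.

```javascript
// Builds a handler that re-runs the search whenever the input value changes.
// `runSearch(query)` returns results; `render(results)` displays them (both hypothetical).
function makeLiveSearch(runSearch, render, minLength = 2) {
  let lastQuery = null;
  return function onInput(rawValue) {
    const query = rawValue.trim();
    if (query === lastQuery) return; // nothing changed, e.g. only a trailing space was typed
    lastQuery = query;
    // Queries shorter than minLength clear the result list instead of searching.
    render(query.length >= minLength ? runSearch(query) : []);
  };
}
```

In a page, the handler would typically be attached with something like `input.addEventListener("input", (event) => onInput(event.target.value))`.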

Fuzzy search transparency

Flip-flopping between lots of semi-relevant results (with fuzzy search) and few specific results (with strict search) could confuse users and erode trust in the quality of the search feature. To mitigate this, the page shows next to the number of results whenever fuzzy search was used.
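
One way such a status line might be produced; the exact wording and the `describeResults` name are assumptions for illustration only.

```javascript
// Formats the result count and flags when the relaxed (fuzzy) search was used.
function describeResults(count, usedFuzzy) {
  const noun = count === 1 ? "result" : "results";
  return usedFuzzy ? `${count} ${noun} (fuzzy search)` : `${count} ${noun}`;
}

console.log(describeResults(12, true)); // → "12 results (fuzzy search)"
```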

Linkable search results

Searching automatically adds the user query to the URL as ?q=<user query>. Opening any page with this parameter automatically triggers a search, which makes search results linkable.

This is more useful in a context in which specific sets of products are often shared with others, e.g. e-commerce, but it helped me while working on search, because I didn’t have to type stuff to see the effect of my most recent changes after reloading.
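
The URL handling can be sketched with the standard `URL` and `URLSearchParams` APIs. The helper names `queryFromUrl` and `urlWithQuery` are hypothetical; only the `q` parameter name comes from the description above.

```javascript
// Reads the search query from a URL; returns "" when there is no ?q= parameter.
function queryFromUrl(href) {
  return new URL(href).searchParams.get("q") ?? "";
}

// Returns the URL with ?q= set to the query, or with the parameter removed if the query is empty.
function urlWithQuery(href, query) {
  const url = new URL(href);
  if (query) {
    url.searchParams.set("q", query);
  } else {
    url.searchParams.delete("q");
  }
  return url.toString();
}
```

In the browser, each search could then call `history.replaceState(null, "", urlWithQuery(location.href, query))`, and the page could run `queryFromUrl(location.href)` on load to decide whether to trigger a search.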

Impractical things I would have liked to do

Highlighting

Showing a snippet of the matching document that contains the search query would have been great to illustrate the relevance of the results, but Elasticlunr.js doesn’t offer such functionality, so I’d have to build my own highlighter that extracts relevant snippets from the document.

Doing so isn’t particularly hard, but the index size increases if every article’s content has to be fully included in the index file, which is detrimental to overall performance.
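
For a sense of what such a highlighter involves, here is a naive snippet extractor. It is a sketch under stated assumptions, not something this site uses: it matches literal substrings (so it ignores stemming and would miss matches the index finds that way), and `makeSnippet` and the `radius` default are invented for illustration.

```javascript
// Returns a window of text around the earliest occurrence of any query term.
// `radius` is the number of context characters kept on each side (arbitrary default).
function makeSnippet(body, terms, radius = 40) {
  // Note: assumes lowercasing does not change string length, which holds for typical text.
  const haystack = body.toLowerCase();
  let hit = -1;
  let hitLength = 0;
  for (const term of terms) {
    const at = haystack.indexOf(term.toLowerCase());
    if (at !== -1 && (hit === -1 || at < hit)) {
      hit = at;
      hitLength = term.length;
    }
  }
  if (hit === -1) return body.slice(0, radius * 2); // no match: fall back to the opening text
  const start = Math.max(0, hit - radius);
  const end = Math.min(body.length, hit + hitLength + radius);
  return (start > 0 ? "…" : "") + body.slice(start, end) + (end < body.length ? "…" : "");
}
```

The function needs the full `body` text on the client, which is exactly the index-size cost described above.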

Synonyms

Synonyms would be useful to bridge linguistic ambiguity that humans take for granted, like finding house when searching for home.

This is yet another feature that Elasticlunr.js doesn’t have, and Zola can’t be extended in that regard without submitting a patch upstream. The DIY effort isn’t worth it.

An interesting alternative could be vector search, but that could blow up index sizes and require WebGPU to work well, if decent-enough models aren’t too large to deliver to the browser in the first place.

Wait, should I have used Fuse.js instead?

Fuse.js can do better fuzzy search, but is more limited in language processing, so on balance it’s a draw.

If I switch to another solution, it will probably be something on the server side so I can do things that would be too heavy for the browser, like vector search.

Conclusion

Being used to working with the likes of Elasticsearch and Solr, it’s quite refreshing to see how far you can get in terms of search quality without using these behemoths. It’s clear that there is no reason to run a full search daemon for small sites.