Faceted search

Update: check out facets and pagefind-instantsearch

Faceted search is a parametric search in which the user can see upfront how results are distributed across different categories (facets). A further improvement is when the system suggests the most relevant facets depending on the type of search. For example, if the user searches:

  • for screen - resolution and diagonal are relevant parameters
  • for fridge - volume and energy efficiency are relevant parameters
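
To make the idea of "distribution of results by facets" concrete, here is a minimal sketch that counts how many items fall under each value of a field. The `Product` shape, the sample data and the `facetCounts` helper are hypothetical and exist only for illustration; real engines compute these counts far more efficiently.

```typescript
// Hypothetical product shape, used only for illustration.
type Product = { title: string; category: string; brand: string };

const products: Product[] = [
  { title: "UltraView 27", category: "screen", brand: "Acme" },
  { title: "PixelPro 32", category: "screen", brand: "Globex" },
  { title: "CoolBox 300", category: "fridge", brand: "Acme" },
];

// Count how many items fall under each value of the given field.
function facetCounts<T, K extends keyof T>(items: T[], field: K): Map<T[K], number> {
  const counts = new Map<T[K], number>();
  for (const item of items) {
    const value = item[field];
    counts.set(value, (counts.get(value) ?? 0) + 1);
  }
  return counts;
}

const byCategory = facetCounts(products, "category");
const byBrand = facetCounts(products, "brand");
```

Showing these counts next to each filter option is what lets the user judge the result distribution before clicking anything.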

Examples #

The concept is easier to grasp by looking at examples: algolia.com, searchkit.co, reactiveapps.io, addsearch.com, search.io, coveo.github.io, analysis-tools.dev.

E-commerce example #

The classic example of faceted search is search on an e-commerce website.

Typically you would see the following filters:

  • Hierarchical
  • Categorical
  • Numerical range

Additionally there would be:

  • results displayed as a grid (first screenshot) or as a list
  • pagination
  • sorting, for example by price, by popularity or relevance

Rental website #

The idea is the same, but additionally:

Similar ideas can be seen in data tables, in computational notebooks, like Jupyter, Kaggle, Observable etc.:

  • Kaggle
  • Observable

Or in some plotting libraries:

  • Range slider
  • Scatterplot Matrix (SPLOM)

Backend #

Server #

The typical solution is to use a search engine with support for faceted search, for example:

UI #

Besides the backend, you also need some kind of UI. There are a lot of candidates:

Client #

But I’m more interested in client-side faceted search. There are a lot of client-side full-text search engines: orama, pagefind, lunr.js, flexsearch.

And even more fuzzy-text search “engines”: uFuzzy, fuse, fzf-for-js, fuzzysort, quick-score.

In a similar way, we can do faceted search on the client side. I found 3 libraries:

Experiment #

I decided to try them out. I started with tanstack and shadcn/ui (React, Radix, TailwindCSS). Then I replaced faceting capabilities with Orama, but preserved UI. Then I replaced faceting capabilities with ItemsJS.

I found a couple of datasets for the demo:

The demo is not ideal, but it is enough to compare approaches:

  • For the filter with checkboxes I use the Command component, which is probably wrong. Instead, the component should be able to load more options and use some kind of fuzzy search
  • The filter with a slider lacks number marks. See #1188
  • Filters should be collapsible, like Accordion component
  • I need to store state of filter in URL
  • UI “jumps” - scroll position changes unexpectedly (sometimes)
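
For the point about storing filter state in the URL, a minimal sketch could look like the following. The `FilterState` shape and the `toQuery` / `fromQuery` helpers are hypothetical names, not part of the demo; the sketch encodes spaces as `%20` and does not handle `+` as a space.

```typescript
// Selected options per facet, e.g. { brand: ["Acme", "Globex"] }.
type FilterState = Record<string, string[]>;

// Serialize to a query string, repeating the key once per selected option.
function toQuery(state: FilterState): string {
  const parts: string[] = [];
  for (const facet of Object.keys(state).sort()) {
    for (const option of state[facet] ?? []) {
      parts.push(`${encodeURIComponent(facet)}=${encodeURIComponent(option)}`);
    }
  }
  return parts.join("&");
}

// Parse a query string (with or without a leading "?") back into filter state.
function fromQuery(query: string): FilterState {
  const state: FilterState = {};
  const trimmed = query.charAt(0) === "?" ? query.slice(1) : query;
  for (const part of trimmed.split("&")) {
    if (part === "") continue;
    const eq = part.indexOf("=");
    const key = decodeURIComponent(eq === -1 ? part : part.slice(0, eq));
    const value = decodeURIComponent(eq === -1 ? "" : part.slice(eq + 1));
    const options = state[key] ?? [];
    options.push(value);
    state[key] = options;
  }
  return state;
}

const query = toQuery({ brand: ["Acme", "Globex"], category: ["screen"] });
const restored = fromQuery("?" + query);
```

In a browser, the string could be written with `history.replaceState(null, "", "?" + query)` and read back from `location.search`; the built-in `URLSearchParams` is an alternative to the manual encoding.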

Tanstack table native faceting #

I’m impressed by Tanstack table: it packs so many features and has an elegant API layer.

  • Filter with checkboxes
    • Options should be sorted by frequency
    • Options should be limited to first 10-20, with ability to fetch more on request
  • Search and sorting are done on the main thread, so there is slight latency on keyboard input
  • There is no full-text search (only substring match), but this is irrelevant, because I’m mainly interested in faceting
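
Sorting options by frequency and limiting them can be done outside the table. The sketch below assumes the counts arrive as a `Map` from option value to count, which is the shape I believe `column.getFacetedUniqueValues()` returns in Tanstack table; `FacetOption` and `topOptions` are hypothetical names.

```typescript
type FacetOption = { value: string; count: number };

// Sort options by descending count (ties broken alphabetically) and keep the
// first `limit`; `hasMore` tells the UI whether to offer a "show more" action.
function topOptions(
  counts: Map<string, number>,
  limit: number
): { options: FacetOption[]; hasMore: boolean } {
  const all: FacetOption[] = [];
  counts.forEach((count, value) => all.push({ value, count }));
  all.sort((a, b) => b.count - a.count || a.value.localeCompare(b.value));
  return { options: all.slice(0, limit), hasMore: all.length > limit };
}

const shown = topOptions(
  new Map<string, number>([
    ["red", 2],
    ["blue", 5],
    ["green", 2],
  ]),
  2
);
```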

Tanstack table + Orama #

I wanted to preserve the same UI, so I integrated Orama in Tanstack table.

The initial load of the data (10000 records) was so slow that I had to move it into a Web Worker. Later, I limited the demo to 1000 records.

Orama has decent full-text search, but faceting is sad:

  • Filter with checkboxes
    • Options for string facets are sorted by frequency, but options for string[] facets are not
    • When an option is selected, the other values disappear from the same facet; instead, a selection should only affect other facets
    • There is no way to limit the number of options returned for a facet
  • Filter with slider
    • There are no min and max values for facets, so this filter in the demo is broken

There are other small bugs as well.
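
The checkbox behavior I expected (a selection narrows other facets but keeps the options of its own facet) is sometimes called disjunctive faceting. A minimal sketch of it: when counting options for a facet, apply the selections of every facet except that one. The `Item` shape, the data and all function names are hypothetical, and this is not how Orama or any other library implements it.

```typescript
type Item = { brand: string; color: string };
type Selected = { [F in keyof Item]?: string[] };

const FACETS: (keyof Item)[] = ["brand", "color"];

const items: Item[] = [
  { brand: "Acme", color: "red" },
  { brand: "Acme", color: "blue" },
  { brand: "Globex", color: "red" },
];

// An item passes a facet when nothing is selected there or its value is selected.
function passes(item: Item, selected: Selected, facet: keyof Item): boolean {
  const chosen = selected[facet];
  return chosen === undefined || chosen.length === 0 || chosen.indexOf(item[facet]) !== -1;
}

// Count options of `facet`, applying the selections of all OTHER facets only.
function disjunctiveCounts(all: Item[], selected: Selected, facet: keyof Item): Map<string, number> {
  const others = FACETS.filter((f) => f !== facet);
  const counts = new Map<string, number>();
  for (const item of all) {
    if (others.every((f) => passes(item, selected, f))) {
      counts.set(item[facet], (counts.get(item[facet]) ?? 0) + 1);
    }
  }
  return counts;
}

const picked: Selected = { brand: ["Acme"] };
const brandCounts = disjunctiveCounts(items, picked, "brand");
const colorCounts = disjunctiveCounts(items, picked, "color");
```

With "Acme" selected, the brand facet still lists "Globex" (so the user can add it), while the color facet only counts Acme items.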

Tanstack table + ItemsJS #

ItemsJS focuses on faceting, and full-text search is outsourced - by default, it uses Lunr. But you can switch to another solution, for example, minisearch.

The secret sauce is FastBitSet.js.

It supports:

  • moving selected options to top
  • limiting number of options per facet
  • min, max values for numerical facets
  • preserving unselected options in facets
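
To illustrate why bitsets help with faceting, here is a toy sketch: each facet value gets a bitset of the item ids that have it, and counting an option under the current filter becomes a bitwise AND followed by a population count. The `BitSet` class below is a hypothetical stand-in written for this example; it is not the FastBitSet.js API and not ItemsJS's actual code.

```typescript
// A tiny fixed-size bitset over item ids 0..size-1.
class BitSet {
  private words: Uint32Array;
  constructor(size: number) {
    this.words = new Uint32Array(Math.ceil(size / 32));
  }
  add(id: number): void {
    this.words[id >>> 5] |= 1 << (id & 31);
  }
  // Intersection: ids present in both sets (assumes equal sizes).
  and(other: BitSet): BitSet {
    const result = new BitSet(this.words.length * 32);
    for (let i = 0; i < this.words.length; i++) {
      result.words[i] = this.words[i] & other.words[i];
    }
    return result;
  }
  // Number of ids in the set (clears the lowest set bit until none remain).
  count(): number {
    let n = 0;
    for (let i = 0; i < this.words.length; i++) {
      let w = this.words[i];
      while (w !== 0) {
        w &= w - 1;
        n++;
      }
    }
    return n;
  }
}

// Build value -> bitset of item ids; the item id is the array index.
function buildIndex(values: string[]): Map<string, BitSet> {
  const index = new Map<string, BitSet>();
  values.forEach((value, id) => {
    let bits = index.get(value);
    if (!bits) {
      bits = new BitSet(values.length);
      index.set(value, bits);
    }
    bits.add(id);
  });
  return index;
}

const colorIndex = buildIndex(["red", "blue", "red", "green"]);
const sizeIndex = buildIndex(["S", "S", "M", "S"]);

// With size "S" selected, count how many matching items are red.
const sizeS = sizeIndex.get("S")!;
const redInS = colorIndex.get("red")!.and(sizeS).count();
const redTotal = colorIndex.get("red")!.count();
```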

I found almost no downsides, except:

  • TypeScript signatures could be better (extends {})
  • For a one-letter search, it returns an empty result, but I think this is due to the full-text search engine

Other things to try:

  • integrate a different full-text or fuzzy search engine
  • move it to Web Worker
  • integrate with Instantsearch
  • implement slider component with mini-plot
  • implement date-range component
  • implement hierarchical categories component, like file tree

Prebuilt index for static websites #

The typical solution for search on static websites, such as those built with Hugo, is to load the data as JSON into memory and then index it. Is there a way to build the index upfront and fetch it from the server with HTTP range requests? It could use a read-optimized format, like Arrow.

  • stork (deprecated) has a CLI for building the index and a JS library to consume it.
  • orama/plugin-data-persistence can store index data as JSON or as dpack, but I am not sure whether it stores the raw data or the index.
  • Pagefind has a CLI for building the index and a JS library to consume it. It stores the index as CBOR.
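
As a rough sketch of the HTTP range request idea (not how any of the tools above work), a client could ask for only a slice of a prebuilt index file. The `RangeFetch` type, `byteRange` and `fetchSlice` are hypothetical names; the fetch function is passed in as a parameter so the sketch does not depend on a particular runtime, and the server is assumed to support range requests.

```typescript
// Minimal shape of a fetch-like function, declared here to avoid DOM typings.
type RangeFetch = (
  url: string,
  init: { headers: { Range: string } }
) => Promise<{ status: number; arrayBuffer(): Promise<ArrayBuffer> }>;

// Build the Range header value for `length` bytes starting at `offset`.
// The end position in a byte range is inclusive.
function byteRange(offset: number, length: number): string {
  return `bytes=${offset}-${offset + length - 1}`;
}

// Fetch one slice of a remote file; fails if the server ignores the range.
async function fetchSlice(
  fetchFn: RangeFetch,
  url: string,
  offset: number,
  length: number
): Promise<Uint8Array> {
  const response = await fetchFn(url, { headers: { Range: byteRange(offset, length) } });
  // 206 Partial Content means the server honored the requested range.
  if (response.status !== 206) {
    throw new Error(`expected 206 Partial Content, got ${response.status}`);
  }
  return new Uint8Array(await response.arrayBuffer());
}
```

In a browser, the global `fetch` could be passed as `fetchFn`. The index format would still need a small header or directory at a known offset so the client knows which byte ranges to request.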

Related: RoaringFormat.