Faceted search
Faceted search is parametric search with one difference: the user can see upfront how the results are distributed across different categories (facets). A further improvement is when the system suggests the most relevant facets depending on the type of search. For example, if the user searches:
- for a screen - resolution and diagonal are relevant parameters
- for a fridge - volume and energy efficiency are relevant parameters
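To make "distribution of results by facets" concrete, here is a minimal sketch of counting facet values over a list of records. The `Product` type, its fields, and the sample data are made up for illustration; real engines compute the same numbers with much more efficient data structures.

```typescript
// Hypothetical product records; the field names are made up for illustration.
type Product = { name: string; category: string; brand: string };

const products: Product[] = [
  { name: "Monitor 24", category: "screen", brand: "Acme" },
  { name: "Monitor 27", category: "screen", brand: "Globex" },
  { name: "Fridge S", category: "fridge", brand: "Acme" },
];

// Count how many items carry each value of one facet field.
function countFacet<T>(items: T[], field: keyof T): Map<string, number> {
  const counts = new Map<string, number>();
  for (const item of items) {
    const value = String(item[field]);
    counts.set(value, (counts.get(value) ?? 0) + 1);
  }
  return counts;
}

const categoryCounts = countFacet(products, "category");
const brandCounts = countFacet(products, "brand");
console.log(Object.fromEntries(categoryCounts)); // { screen: 2, fridge: 1 }
```

These counts are what a faceted UI shows next to each filter option, so the user knows how many results to expect before clicking.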
Examples #
The concept is easier to grasp by looking at examples: algolia.com, searchkit.co, reactiveapps.io, addsearch.com, search.io, coveo.github.io, analysis-tools.dev.
E-commerce example #
A classic example of faceted search is search on an e-commerce website.
Typically you would see the following filters:
Hierarchical | Categorical | Numerical range |
---|---|---|
Additionally, there would be:
- results displayed as a grid (first screenshot) or as a list
- pagination
- sorting, for example by price, by popularity or relevance
Rental website #
The idea is the same, but additionally:
- results can be displayed on a map
- filters may include a date range
Related #
Similar ideas can be seen in data tables and in computational notebooks such as Jupyter, Kaggle, Observable, etc.:
Kaggle | Observable |
---|---|
Or in some plotting libraries:
Range slider | Scatterplot Matrix (SPLOM) |
---|---|
Backend #
Server #
A typical solution is to use a search engine with support for faceted search, for example:
- meilisearch
- typesense
- tantivy
  - There is an attempt to compile it to WASM
- DuckDB is not a typical choice, but it may also work, because it has Full Text Search and GROUPING SETS
  - There is a WASM version, but it is kind of big
UI #
Besides the backend, you need some kind of UI. There are a lot of candidates:
- instantsearch: Plain JS, React, Vue, Angular
- reactivesearch: React, Vue
- AddSearch/search-ui: Plain JS
- coveo/search-ui: Plain JS
- sajari/search-ui: React
- Flowbite: Tailwind CSS Faceted Search Drawers
Client #
But I’m more interested in client-side faceted search. There are a lot of client-side full-text search engines: orama, pagefind, lunr.js, flexsearch.
And even more fuzzy-text search “engines”: uFuzzy, fuse, fzf-for-js, fuzzysort, quick-score.
In a similar way, we can do faceted search on the client side. I found 3 libraries:
Experiment #
I decided to try them out. I started with tanstack and shadcn/ui (React, Radix, TailwindCSS). Then I replaced the faceting capabilities with Orama, but preserved the UI. Then I replaced the faceting capabilities with ItemsJS.
I found a couple of datasets for the demo:
The demo is not ideal, but it is enough to compare approaches:
- For the filter with checkboxes I use the `Command` component, which is probably wrong. Instead, the component should be able to load more options and use some kind of fuzzy search
- The filter with a slider is missing number marks. See #1188
- Filters should be collapsible, like the Accordion component
- I need to store the filter state in the URL
- UI “jumps” - scroll position changes unexpectedly (sometimes)
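Storing the filter state in the URL could look roughly like the sketch below. It is not what the demo does; the `FilterState` shape and both function names are assumptions made for illustration. It relies only on the standard `URLSearchParams` API and repeats a key once per selected value.

```typescript
// Hypothetical filter state: facet name -> selected values.
type FilterState = Record<string, string[]>;

// Serialize the state into a query string, one repeated key per selected value.
// Facets are sorted by name so the same state always yields the same URL.
function filtersToQuery(state: FilterState): string {
  const params = new URLSearchParams();
  const entries = Object.entries(state).sort(([a], [b]) =>
    a < b ? -1 : a > b ? 1 : 0,
  );
  for (const [facet, values] of entries) {
    for (const value of values) params.append(facet, value);
  }
  return params.toString();
}

// Parse a query string back into the same shape.
function queryToFilters(query: string): FilterState {
  const state: FilterState = {};
  new URLSearchParams(query).forEach((value, facet) => {
    const values = state[facet] ?? [];
    values.push(value);
    state[facet] = values;
  });
  return state;
}

const query = filtersToQuery({ category: ["screen"], brand: ["Acme", "Globex"] });
const restored = queryToFilters(query);
console.log(query); // brand=Acme&brand=Globex&category=screen
```

In a browser, the resulting string could be applied with `history.replaceState(null, "", "?" + query)` so that filters survive a reload and can be shared as a link.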
Tanstack table native faceting #
I’m impressed by Tanstack table: it packs so many features and has an elegant API layer.
- Filter with checkboxes
  - Options should be sorted by frequency
  - Options should be limited to the first 10-20, with the ability to fetch more on request
- Search and sorting are done in the main thread, so there is slight latency on keyboard input
- There is no full-text search (only substring match), but this is irrelevant, because I’m mainly interested in faceting
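The two wishes for the checkbox filter (sort options by frequency, show only the first few) can be layered on top of any value-to-count map. The sketch below assumes a `Map<string, number>` as input, which I believe is the shape that `column.getFacetedUniqueValues()` returns in TanStack Table v8; the `topOptions` helper and the sample counts are hypothetical.

```typescript
type FacetOption = { value: string; count: number };

// Sort facet options by descending count (ties broken alphabetically)
// and keep only the first `limit` options, plus a flag for a "show more" button.
function topOptions(
  counts: Map<string, number>,
  limit: number,
): { options: FacetOption[]; hasMore: boolean } {
  const all = Array.from(counts, ([value, count]) => ({ value, count }));
  all.sort((a, b) => b.count - a.count || a.value.localeCompare(b.value));
  return { options: all.slice(0, limit), hasMore: all.length > limit };
}

// Made-up counts for illustration.
const counts = new Map<string, number>([
  ["Acme", 5],
  ["Globex", 12],
  ["Initech", 5],
  ["Umbrella", 1],
]);

const top = topOptions(counts, 3);
console.log(top.options.map((o) => o.value)); // [ 'Globex', 'Acme', 'Initech' ]
```

"Fetch more on request" then amounts to calling the same function with a larger `limit` when `hasMore` is true.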
Tanstack table + Orama #
I wanted to preserve the same UI, so I integrated Orama into Tanstack table.
The initial load of the data (10000 records) was so slow that I had to move it into a Web Worker. Later, I limited the demo to 1000 records.
Orama has decent full-text search, but faceting is sad:
- Filter with checkboxes
  - Options for `string` facets are sorted by frequency, but for `string[]` they are not
  - When an option is selected, it removes values from the same facet, but instead it should only change other facets
  - There is no way to limit the number of options returned for a facet
- Filter with slider
  - There are no min and max values for facets, so this filter in the demo is broken
And there are other small bugs.
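The behaviour expected from the checkbox filter (a selection changes other facets, but keeps the unselected options of its own facet) is often called disjunctive faceting. A minimal sketch of the idea in plain TypeScript, with made-up records and a hypothetical `Selection` shape, not tied to Orama or any other library:

```typescript
// Hypothetical records and selection state, for illustration only.
type Item = { brand: string; color: string };
type Selection = { brand: string[]; color: string[] };

const items: Item[] = [
  { brand: "Acme", color: "red" },
  { brand: "Acme", color: "blue" },
  { brand: "Globex", color: "red" },
  { brand: "Initech", color: "green" },
];

// An item passes a facet when nothing is selected in it,
// or when the item's value is one of the selected values.
// The optional `skip` facet is ignored entirely.
function matches(item: Item, selection: Selection, skip?: keyof Item): boolean {
  return (Object.keys(selection) as (keyof Item)[]).every(
    (facet) =>
      facet === skip ||
      selection[facet].length === 0 ||
      selection[facet].includes(item[facet]),
  );
}

// Counts for one facet ignore that facet's own selection,
// so unselected options in the same facet stay visible.
function facetCounts(facet: keyof Item, selection: Selection): Map<string, number> {
  const counts = new Map<string, number>();
  for (const item of items) {
    if (!matches(item, selection, facet)) continue;
    counts.set(item[facet], (counts.get(item[facet]) ?? 0) + 1);
  }
  return counts;
}

const selection: Selection = { brand: ["Acme"], color: [] };
const brandOptions = facetCounts("brand", selection); // still lists Globex and Initech
const colorOptions = facetCounts("color", selection); // narrowed to colors of Acme items
const results = items.filter((item) => matches(item, selection));
```

With "Acme" selected, the brand facet keeps all three brands while the color facet drops "green", which is the behaviour described above as the correct one.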
Tanstack table + ItemsJS #
ItemsJS focuses on faceting, and full-text search is outsourced - by default, it uses Lunr. But you can switch to another solution, for example, minisearch.
The secret sauce is FastBitSet.js.
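Why bitsets help with faceting: if every facet value is stored as a set of document ids, a facet count is just the size of the intersection between that set and the current result set. The sketch below uses a tiny hand-written `BitSet` class as a stand-in; it is not the FastBitSet.js API and not the actual ItemsJS implementation, and the sample ids are made up.

```typescript
// Count set bits in a 32-bit word by repeatedly clearing the lowest set bit.
function popcount(word: number): number {
  let n = 0;
  for (let w = word >>> 0; w !== 0; w &= w - 1) n++;
  return n;
}

// A tiny fixed-size bitset over document ids 0..size-1.
// Stand-in for illustration only, not the FastBitSet.js API.
class BitSet {
  readonly words: Uint32Array;

  constructor(size: number, ids: number[] = []) {
    this.words = new Uint32Array(Math.ceil(size / 32));
    for (const id of ids) this.words[id >>> 5] |= 1 << (id & 31);
  }

  // Number of ids present in both sets, without materializing the result.
  intersectionSize(other: BitSet): number {
    let total = 0;
    for (let i = 0; i < this.words.length; i++) {
      total += popcount(this.words[i] & other.words[i]);
    }
    return total;
  }
}

const size = 6; // six documents with ids 0..5
const brandAcme = new BitSet(size, [0, 1, 4]); // documents where brand = Acme
const colorRed = new BitSet(size, [1, 2, 4, 5]); // documents where color = red
const searchHits = new BitSet(size, [0, 1, 2, 3]); // ids returned by full-text search

// Facet counts within the current search results.
const acmeInResults = brandAcme.intersectionSize(searchHits); // ids 0 and 1
const redInResults = colorRed.intersectionSize(searchHits); // ids 1 and 2
```

Each count touches one machine word per 32 documents, which is why this approach stays fast even with many facet values.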
It supports:
- moving selected options to top
- limiting number of options per facet
- min, max values for numerical facets
- preserving unselected options in facets
I found almost no downsides, except:
- TypeScript signatures could be better (`extends {}`)
- For a one-letter search, it returns an empty result, but I think this is due to the full-text search engine
Other things to try:
- integrate a different full-text or fuzzy search engine
- move it to a Web Worker
- integrate with Instantsearch
- implement slider component with mini-plot
- implement date-range component
- implement hierarchical categories component, like file tree
Other ideas and links #
Prebuild index for static websites #
A typical solution for search on static websites, such as those built with Hugo, is to load the data as JSON into memory and then index it. Is there a way to build the index upfront and fetch it from the server with HTTP range requests? It could be a read-optimized format, like Arrow.
- stork (deprecated) has CLI for building index and JS library to consume it.
- orama/plugin-data-persistence can store index data as JSON or as dpack, but I am not sure whether it stores the raw data or the index.
- Pagefind has CLI for building index and JS library to consume it. Stores index as CBOR.
Related: RoaringFormat.
Benchmarks for full text search #
Read more: Facets, search_syntax