A search engine in CSS

Published in Algolia Stories, Jan 9, 2018

Last November I had the opportunity and the pleasure to give a talk at dotCSS. All the dotConferences are high quality, but dotCSS is my favorite. I’ve been inspired by talks I saw there year after year. Being on stage myself and sharing my knowledge with such a large audience was a very enjoyable experience.

April Fools’ joke

The talk I gave was the technical explanation of the April Fools’ joke we played last April at Algolia. On the first of April, we released our brand new CSS API Client, announcing that we were about to close most of our servers because we had invented the first API client that does not call its own API servers.

This was of course a joke, but we tried to create an illusion appealing enough that people would start wondering whether we were joking or serious. A central piece of that endeavor was the live demo.

Search the list of Algolia employees by name, job title or team, using only CSS.

Hacking spirit

What started as a friendly challenge with coworkers, to see how far we could go in mimicking the features of a search engine with CSS, ended with something that actually worked!

The whole demo was done in a spirit of hacking. Not hacking as the act of finding vulnerabilities, but hacking as the act of overcoming the limitations of a system to achieve new outcomes. In this case, the system and its limitations were CSS, and the new outcome was to build a search engine.

If we try to get to the core of what a search engine is, we quickly realize it’s nothing more than a simple machine that expects keywords as input and gives you results as output. What happens inside the machine determines whether the search experience is great or average.

Of course I could not settle for “just an average” search experience. I knew I needed the three pillars of great search: relevance, speed and UX. I was happy with the results of the demo, as I think it had all three. Of course it can still be improved, but for a CSS hack I think it’s pretty decent.

Overcoming the limitations of CSS

CSS is a styling language, so its building blocks are different from those of classical programming languages. Take Ruby, PHP, C++ or JavaScript and you will have variables, functions, loops, conditions and regular expressions as part of the core building blocks of the language.

In CSS you have none of that, or at least not in the same standard form as in those other languages. When starting to build a search engine, one might think that variables, loops and regular expressions are mandatory, and that a language that can’t provide them can’t be used to build a search engine.

As I really wanted to build a search engine in CSS, I could not let this block me. I could not start a project thinking it was not doable. So I decided not to focus on what CSS could not do, but focus on what it could do well instead.

I quickly realized that the main strength of CSS lies in its selector engine. CSS can target elements using their tag name, class name, id, the value of their attributes or even their ancestors or siblings in the markup. And best of all, you can actually combine all those selectors to make some really precise ones.

CSS cannot work by itself. You always need an HTML document to apply CSS to. But even if you have the messiest, least semantic HTML file, you’ll be able to craft the perfect CSS selector that will target exactly what you want. No matter how large the pool of potential, irrelevant elements in your HTML, CSS will still be able to target only the ones you’re interested in.

This ability to only select relevant items from a large pool of choices is very important, as this is exactly what you expect from a search engine.

With a search engine you have a large pool of potential results, but you’re only interested in the ones that are relevant to your keywords.

When I realized that, I knew I was on to something. If I could align the strength of CSS with the desired outcome of the search engine, I would be able to build this actual demo.

Starting small with some markup

The whole hack is based on very simple markup. All you need is an `input` (that will act as your search bar) and an empty `div` (that will hold your results):

Using `input[value="tim" i]` as a selector, you can actually target the input, based on its current value. Here, using the ` i` specifies that the match should be case insensitive.

On top of that you can add `~ #result` to your selector so it will now select the `<div id="result">` that is placed after the input. Now that you can target the empty div, you can completely change its styling and its content with a `:before` pseudo-element and some `content`.

The final CSS selector looks like this:
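A minimal sketch of such a rule, assuming the input and the `#result` div are siblings as described above. The `cssString` and `resultRule` helpers, the `search` id and the displayed text are illustrative assumptions; the rule is held in a JavaScript string only so that it can be generated and checked:

```javascript
// Markup assumed by this sketch (the `search` id is illustrative):
//   <input id="search" type="text" value="">
//   <div id="result"></div>

// Escape backslashes and double quotes so a value is safe inside a CSS string.
function cssString(value) {
  return '"' + value.replace(/["\\]/g, "\\$&") + '"';
}

// Build a rule that fills #result when the input's `value` attribute
// matches `keyword`, ignoring case thanks to the `i` flag.
function resultRule(keyword, text) {
  return `input[value=${cssString(keyword)} i] ~ #result:before { content: ${cssString(text)}; }`;
}

const timRule = resultRule("tim", "Tim");
// timRule is: input[value="tim" i] ~ #result:before { content: "Tim"; }
```

In a page, this string would go into a `<style>` element or a stylesheet.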

When you get to that point, you have a CSS selector that, based on the value of an input element somewhere on the page, will let you change the content and the styling of another, completely unrelated, element.

This long selector is the cornerstone of the whole hack but comes with a small drawback: it does not (really) work.

JavaScript to the rescue

It does not really work because when you load the page for the first time, your search bar is actually empty. So the HTML `value` attribute is set to an empty string.

When you start typing something in the input, the HTML `value` attribute of the `input` is not updated. Typing in an input only updates a dynamic value, not the static one that is present in the markup. That’s really a shame because it’s specifically this static value that CSS is reading.

At that point, I could have simply submitted the form by pressing Enter. On the server-side I could have intercepted the form value, written it back in the markup and re-rendered the page. It would have worked, but it would not have been fast. And as I really wanted something instant, I had to resort to using JavaScript.

I would have loved not to use JavaScript and stay with a pure CSS solution, but I could not find a way. If you, my dear reader, know a way to avoid using JavaScript, let me know!

Anyway, the JavaScript I added was really minimal. It’s a simple `oninput` handler on the input that will read the dynamic value and set it back in the HTML `value` attribute. Triggered at each change in the input, it will allow me to have instant results as I wanted.
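A minimal sketch of such a handler. The `syncValueAttribute` name and the `#search` selector are assumptions made for illustration; only the idea of copying the live value back into the attribute comes from the description above:

```javascript
// Copy the live value (the DOM property) back into the `value` attribute,
// which is what the CSS attribute selector reads.
function syncValueAttribute(input) {
  input.setAttribute("value", input.value);
}

// Browser-only wiring; `#search` is a hypothetical id for the search bar.
if (typeof document !== "undefined") {
  const searchInput = document.querySelector("#search");
  if (searchInput) {
    // Runs on every change, so the selectors are re-evaluated as you type.
    searchInput.addEventListener("input", () => syncValueAttribute(searchInput));
  }
}
```

The same thing can be written inline on the element as `oninput="this.setAttribute('value', this.value)"`.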

Several results

So far the markup I have only allows me to display one result when I’m typing the correct keyword. What I would like is to display all the matching results. For example, if searching for “Alexandre”, I want to display all the people named “Alexandre” in the company.

To do that, I only have to slightly edit the markup. Instead of one empty `div` to hold the result, I actually create 150 empty `div`s. One per potential result, i.e., one per employee.

In addition, I’ll also pre-fill each `div` with the name of an employee. I’ll be using the same `:before` and `content` trick here. But I will not actually display any of those `div`s. They will all have a `display: none` by default.

I’ll display them only in specific circumstances. I’ll switch their `display` to `block` only when a matching keyword is typed.

Now, whenever I type `Alexandre`, I’ll actually have `#result15`, `#result16` and `#result17` displayed while all the others stay hidden.
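The rules behind this behaviour can be sketched as generated CSS. This is an illustration in JavaScript rather than the original stylesheet: the `employeeRules` helper is hypothetical, the names are placeholders, and the `#result15` to `#result17` ids follow the example above:

```javascript
// Build the rules for one result slot: hidden by default, pre-filled with a
// name, and shown only when the input's value attribute equals a keyword.
function employeeRules(index, name, keywords) {
  const id = `#result${index}`;
  const rules = [
    `${id} { display: none; }`,
    `${id}:before { content: "${name}"; }`,
  ];
  for (const keyword of keywords) {
    // The extra input[...] part makes this selector more specific than the
    // bare id above, so display: block wins when the keyword matches.
    rules.push(`input[value="${keyword}" i] ~ ${id} { display: block; }`);
  }
  return rules;
}

// Placeholder names for the three people named Alexandre.
const alexandreCss = [15, 16, 17]
  .flatMap((index) => employeeRules(index, `Alexandre (employee ${index})`, ["alexandre"]))
  .join("\n");
```

Each employee contributes one hidden, pre-filled slot plus one `display: block` rule per keyword that should reveal it.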

Understanding the search engine

When I got to that point I was already quite happy with the results. I knew how to display results based on the typed keywords. Now the question was turning to “what keywords should yield results?”.

It is exactly at that moment that I decided to take a step back from the project and try to really understand what I was trying to achieve. So far I had mostly solved the challenge by using clever CSS hacks. But if I wanted to go further, I needed to actually better understand what I was building.

Fortunately, working at Algolia, I’m surrounded by people who use and build search engines all day. I sat with a few colleagues and they explained how a search engine actually works.

Every single search engine (be it Algolia, ElasticSearch or Solr) is made of two parts: the indexing and the searching.

The searching might be the most obvious. It’s what we’re confronted with every time we use a search bar. Every time we type something in a search bar, we are asking ourselves an implicit question: “what will I find if I type `alex`?”.

But when actually building a search engine, you have to ask yourself a very different set of questions. You have to take every single potential result and ask yourself: “What should a user type to be able to find `Alexandre Meunier`?”.

So I started taking the list of all employees to see what should be typed to find them. I wanted to find all the people named Alexandre by just typing `alex`. Actually, `alex` should also find people named Alexander, Alexandra or Alexandria…

It goes even further. I should be able to find everyone named `alex` by just typing `ale`, or `al` or even just `a`. And I should be able to find Alexandre, Alexandra and Alexandria by typing `alexandr`.

But I also wanted to find people by their last name and job title…

Building the n-grams

What I ended up doing was generating all the potential n-grams that would lead to a result in my list of employees. Here, an n-gram is a sequence of leading characters (a prefix) that will lead to a result. For example `t`, `ti` and `tim` are all the n-grams that could find `Tim`.
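As a minimal JavaScript sketch of this idea, with a hypothetical `prefixes` helper, the n-grams of a word can be produced by taking every leading slice of it:

```javascript
// Return every prefix of a word, lowercased: the keys that should find it.
function prefixes(word) {
  // Array.from splits by code point, so characters outside the BMP stay whole.
  const chars = Array.from(word.toLowerCase());
  return chars.map((_, i) => chars.slice(0, i + 1).join(""));
}

const timKeys = prefixes("Tim"); // ["t", "ti", "tim"]
```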

Of course I did not do that by hand — I wrote a Ruby script to help me generate n-grams from strings. I applied it to employee names, and then used the results to write a long list of CSS selectors that look like this:
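A sketch of what such generated selectors could look like, written in JavaScript rather than the Ruby of the original script. The `displayRule` helper and the `result42` id are assumptions made for illustration:

```javascript
// Every prefix of a word, lowercased.
function prefixes(word) {
  const chars = Array.from(word.toLowerCase());
  return chars.map((_, i) => chars.slice(0, i + 1).join(""));
}

// One rule per result, with one exact-match selector per n-gram.
function displayRule(word, resultId) {
  const selectors = prefixes(word).map(
    (ngram) => `input[value="${ngram}" i] ~ #${resultId}`
  );
  return `${selectors.join(",\n")} { display: block; }`;
}

const timDisplayRule = displayRule("Tim", "result42");
// input[value="t" i] ~ #result42,
// input[value="ti" i] ~ #result42,
// input[value="tim" i] ~ #result42 { display: block; }
```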

Using selectors like this, I can now type only the start of any first name, and it will display the matching results. It is very verbose, but it does actually work.

The final CSS generated is quite long, as you have to do this for the first name, last name and job title of every single one of the 150 employees. This is also the step where you start adding more features. I wanted the hack to be easy to use, so I also added support for accented characters, synonyms and some (limited) typo-tolerance.
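How accented characters were handled is not detailed here. One possible approach, sketched below purely as an assumption, is to generate n-grams for both the original spelling and an accent-free version, so that typing either form matches. The `foldAccents` and `searchKeys` helpers and the sample name are hypothetical:

```javascript
// Strip diacritics: decompose to NFD, then drop the combining marks.
function foldAccents(text) {
  return text.normalize("NFD").replace(/[\u0300-\u036f]/g, "");
}

// Every prefix of a word, lowercased.
function prefixes(word) {
  const chars = Array.from(word.toLowerCase());
  return chars.map((_, i) => chars.slice(0, i + 1).join(""));
}

// Prefixes of both spellings, without duplicates.
function searchKeys(word) {
  const composed = word.normalize("NFC"); // keep accented letters as single characters
  return [...new Set([...prefixes(composed), ...prefixes(foldAccents(composed))])];
}

const zoeKeys = searchKeys("Zoé"); // ["z", "zo", "zoé", "zoe"]
```

Each extra key becomes one more selector in the generated CSS, which is part of why the file grows so quickly.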

Improving the display

The last details I added were to make the search engine even easier to use. I added some ordering of the results using the `order` property. I grouped relevant results together: a match in the first name will be ranked higher than a match in the last name.

I also added highlighting, so the typed keyword was bolded in the results. Highlighting is an often overlooked part of search engines, but it is a very important one: this is how you explain to your users why a result is there.
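A sketch of how such ranking rules could be generated. It assumes the input and the results sit in a flex container, since `order` only applies to flex and grid items. The rank given to job titles, the helper name, the `#search-demo` container and the result ids are also assumptions:

```javascript
// Lower `order` values are laid out first. First name before last name comes
// from the text; the job title rank is an assumption.
const FIELD_RANK = { firstName: 1, lastName: 2, jobTitle: 3 };

// Container rule assumed by this sketch (`#search-demo` is hypothetical).
const containerRule = "#search-demo { display: flex; flex-direction: column; }";

// Show a result and rank it according to the field the keyword matched.
function rankedRule(keyword, resultId, field) {
  const rank = FIELD_RANK[field];
  if (rank === undefined) throw new Error(`Unknown field: ${field}`);
  return `input[value="${keyword}" i] ~ #${resultId} { display: block; order: ${rank}; }`;
}

const rankedCss = [
  containerRule,
  rankedRule("alex", "result15", "firstName"),
  rankedRule("alex", "result42", "lastName"),
].join("\n");
```

The input keeps the default `order: 0`, so it stays above every ranked result.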

The difficulty here was that I was adding all my content through `:before` and `content`. I had no HTML to style, only pure text, already included in CSS.

In order to bold a specific part of the display, I had to cheat. I created a new font by merging the regular font and its bolded version into one. I put all the bolded glyphs into the Private Use Area of Unicode. This is a specific range of code points where you can put whichever glyphs you want. In the end, I have a set of glyphs that look exactly like bolded versions of my regular characters, accessible through obscure code points.

All I have to do is replace the regular glyphs with their bolded versions when I need highlighting, and that’s it. The CSS is completely unreadable but it works. The downside is that you have to do that for every single n-gram you generated previously, making the CSS file grow even larger.
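The substitution can be sketched as follows. The offset into the Private Use Area (`0xE000` plus the character's code point) is an assumption chosen for illustration, as are the helper names; the actual mapping depends on how the merged font was built:

```javascript
const PUA_OFFSET = 0xe000; // assumed position of the bold glyphs in the font

// Map one character to its (assumed) bold twin in the Private Use Area.
function toBoldGlyph(char) {
  return String.fromCodePoint(PUA_OFFSET + char.codePointAt(0));
}

// Replace the first `length` characters (the typed n-gram) with bold glyphs.
function highlightPrefix(text, length) {
  const chars = Array.from(text);
  return chars.map((c, i) => (i < length ? toBoldGlyph(c) : c)).join("");
}

// Write non-ASCII characters as CSS hex escapes; the trailing space ends the escape.
function cssEscapeNonAscii(text) {
  return Array.from(text, (c) =>
    c.codePointAt(0) > 0x7f ? `\\${c.codePointAt(0).toString(16)} ` : c
  ).join("");
}

const highlightedTim = cssEscapeNonAscii(highlightPrefix("Tim", 2));
// highlightedTim is: \e054 \e069 m
```

Used as the `content` of the rule for the n-gram `ti`, this string would render a bold "Ti" followed by a regular "m", provided the font really has bold glyphs at those code points.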

Conclusion

The final CSS file was 8MB, which I managed to get down to 5MB once minified. This is still way too much, which is why I would strongly discourage anyone from using this in production!

The whole project might have seemed crazy or even impossible at the start, but as I like saying:

If there’s one thing I learned from making a project like this, it’s that using a language to make it do things it was not meant to be doing is an incredible learning experience. I learned so much about the strengths and weaknesses of CSS by pushing it to its limits.

I can only encourage you to try crazy things with CSS (or any other language for that matter). It will give you a lot of clarity on the best ways to use the language.

I hope I inspired a few of you, and I’d love to see what you’ll build. Leave a comment or send a tweet any time!