SEO-friendly e-commerce with Algolia’s instantsearch.js

Baz
Published in Algolia Stories
10 min read · Jun 27, 2017

When I first came across Algolia, and in particular instantsearch.js, I fell in love with the speed, the great way the UI components interacted with each other, and the ease of implementation. But very soon after experimenting and understanding how it worked, I became, like many others it seems, quite concerned about using it heavily in a website that would rely on SEO, indexing of large amounts of content, and ranking in the SERPs.

Nevertheless, I continued, and feel like I overcame the various SEO challenges that come with any AJAX-dependent website. Below is a detailed explanation of my lessons-learned and the functionality I was able to implement.

Some background

All The Dresses brings together all the women’s clothing available for hire in Australia into one beautiful, easy-to-use and extremely fast (thanks to Algolia) website. We currently list around 3,000 items from 15 of Australia’s best online rental boutiques.

All The Dresses on Desktop

Yes, technically, All The Dresses isn’t an e-commerce website, as it doesn’t have a cart or checkout, but it still follows the familiar concept of presenting product listings to consumers in various categories/segments, which then lead to individual pages for each product. As such, all of the content below should still be relevant to the majority of e-tail websites.

Before we get stuck into the detail, let me state that All The Dresses was written entirely from scratch (apart from leveraging several open-source JS scripts for specific UI functionality). That is, we are not using any e-commerce platform such as Magento, WooCommerce or Shopify. The main languages/platforms/frameworks used are PHP, MySQL and jQuery. The site uses infinite scrolling, as described by Algolia here, with a few tweaks.

A single PHP search/listings page handles the homepage view (or “Our Favourites”), designer view and category view.

All The Dresses on Mobile

SEO concerns and how I dealt with them

Loading content in the source HTML

The major concern most would have with instantsearch.js, or any website relying on AJAX calls to populate content, is the doubt about whether Googlebot will see and index that content. Historically, Google's crawlers would only obtain source HTML, but advancements over recent years have proven that Google is indeed able to execute Javascript and index content loaded via Javascript on page load. Having said that, there is no definitive answer on whether it always does this, how well it does it, and whether it does it for websites of all kinds, however complex. In other words, Googlebot is not as predictable in indexing dynamic content as it is in indexing source (or static) content.

“…Googlebot is not as predictable in indexing dynamic content as it is in indexing source (or static) content.”

To give me some level of confidence, I used the "Fetch as Google" function within Google Search Console. This tool shows you a screenshot render of a page on your website, as seen by the Googlebot crawler. If you run this tool over a page and see all your content in the generated screenshot, you can have a little more confidence that the content will in fact be indexed and presented in SERPs. If you can't see your content in the screenshot, you probably have a problem and should reconsider the structure/architecture of your page.

Quick Tip: Ensure that all .js, .css and image files are not blocked from Googlebot via robots.txt or any other mechanism so that Googlebot is able to execute the Javascript you need it to.
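To illustrate the tip above, here is a minimal robots.txt sketch (the paths are illustrative, not from the article). By default everything is crawlable; the point is that any Disallow rules you do have must not cover the assets Googlebot needs, and explicit Allow rules can carve assets out of a broader Disallow:

```
# Illustrative robots.txt: keep JS, CSS and images crawlable so
# Googlebot can execute and render the page.
User-agent: Googlebot
Allow: /assets/*.js$
Allow: /assets/*.css$
# A broad Disallow like the one below would otherwise block them:
# Disallow: /assets/
```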

Here is an example of what “Fetch as Google” saw for one of our pages, a designer product listing for Zimmermann. All instantsearch widgets appear to have successfully loaded.

How Googlebot saw the Zimmermann designer listing page

I only discovered this feature fairly close to launching the website. I had decided early on that, to be on the safe side, content should be loaded as part of the source HTML where possible. Even if I had discovered what I did about Googlebot early on, I would still stick by this decision. To do this I used the Algolia PHP API to make a request that matched the instantsearch.js request that would be made on load.

“…content should be loaded as part of the source HTML where possible.”

Instantsearch.js offers two options for URL syncing (updating the URL as filters or the search query are modified by the user): query string and hash-based. You should use query strings. Google treats each query-string URL as a distinct, crawlable page, whereas hash fragments are never sent to the server, and query strings also make it easy for you to utilise those values in PHP.
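As a sketch of this, here is a small hypothetical helper that parses the synced query string so the same values can drive both the server-side request and the client-side initialisation (in instantsearch.js v1, the version current when this was written, query-string syncing is enabled via the urlSync option):

```javascript
// Hypothetical helper: parse synced query-string parameters (e.g. "?q=red+dress&page=2")
// into a plain object usable when initialising the search on page load.
function parseSearchParams(queryString) {
  const params = {};
  queryString.replace(/^\?/, '').split('&').forEach(pair => {
    if (!pair) return;
    const [key, value] = pair.split('=');
    params[decodeURIComponent(key)] =
      decodeURIComponent((value || '').replace(/\+/g, ' '));
  });
  return params;
}

// In instantsearch.js v1, query-string (rather than hash) syncing looks
// roughly like this (credentials/index name are placeholders):
// instantsearch({
//   appId: 'APP_ID', apiKey: 'SEARCH_KEY', indexName: 'products',
//   urlSync: { useHash: false }
// });
```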

After building up the appropriate filters in PHP and making the request to Algolia via the API, I output the results to HTML that is very close to exactly the same as what is produced by instantsearch.js. Instantsearch is set to display 20 items at a time, so this is all that I output to the source HTML via PHP. If you visit a designer listing page on All The Dresses, for example, Talulah Dresses, then “View Source” in your browser, you’ll see that the details of the first 20 products are in the source HTML as part of regular markup. I also output the refinement lists for Designer and Website, which are returned by the PHP API as part of the same request if you specify facets.
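The key to this approach is that the server-side output and the instantsearch.js hit template produce identical markup. The article's actual rendering is done in PHP; as an illustration only (field names and markup are hypothetical, not All The Dresses' real template), the shared shape looks something like:

```javascript
// Illustrative only: render the first PAGE_SIZE hits to the same markup the
// instantsearch.js hit template produces, so server-rendered and
// client-rendered HTML match exactly. Field names are hypothetical.
const PAGE_SIZE = 20;

function renderHits(hits) {
  return hits.slice(0, PAGE_SIZE).map(hit =>
    '<div class="hit">' +
      '<a href="/item/' + hit.objectID + '">' +
        '<img src="' + hit.image + '" alt="' + hit.name + '">' +
        '<h2>' + hit.name + '</h2>' +
        '<p>' + hit.designer + '</p>' +
      '</a>' +
    '</div>'
  ).join('\n');
}
```

The same template string would be handed to the instantsearch hits widget, so the client-side replacement on load is a no-op visually.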

Instantsearch.js runs on load, and dynamically replaces the source HTML with the results it has fetched. This has basically no impact on the page display, as for the most part it is replacing source HTML with exactly the same HTML. The perfectionist in me was annoyed by this concept (making the same request twice, once via PHP and once via instantsearch, as well as replacing content with the exact same content) but I got over it. For All The Dresses, it has minimal impact on performance or user experience. Ideally, instantsearch.js would offer the possibility of passing it a starting state, without the need to make an initial request to Algolia, but unfortunately this isn't available. The team at Algolia have said this is something they are including in a new release of React instantsearch; let's hope they do the same for instantsearch.js :)

Infinite Scrolling

OK, so I've handled outputting the source HTML for the first 20 products of any query. But All The Dresses utilises infinite scrolling, and a particular query will often return more than 20 products, so how do I get Googlebot to index the rest of the products on a listings page?

The simple way would’ve been to avoid using infinite scrolling at all and to go back to using pagination or Prev/Next buttons, but I just loved the experience infinite scrolling gave the user, so I was determined to persist.

“I just loved the experience infinite scrolling gave the user, so I was determined to persist.”

Previous and Next pages can be specified in the <head> of your page using <link> elements with rel="next" and rel="prev" attributes. This guides Google on the logical sequence of your pages. In these tags I would specify the next/prev pages to be the same URL as the current page, but with a starting page parameter attached. For example:

<link rel="next" href="/designer/Self-Portrait/1">

I use URL rewrites to make the URLs search-engine friendly (more on this later), but this is basically passing a parameter of startPage=1 (0 represents the first page). I pass this to both the Algolia PHP API and to instantsearch so that the results start at product #21 for that query. On page 2, there is a rel=prev with a URL setting startPage to 0 (or left unspecified), and a rel=next with a URL setting startPage to 2. This continues until there are no more products for that query. Alongside the 20 products, the Algolia PHP API also returns the total number of results available for the query. You can use this to determine whether there are any more products after a particular page, and therefore whether you should include a rel=next tag.
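The prev/next logic just described can be sketched as a small function (paths follow the example above; nbHits is the total-result count Algolia returns alongside the hits, the function itself is illustrative):

```javascript
// Build the rel=prev / rel=next <link> tags for a listings page.
// startPage is zero-based; nbHits is the total number of results for the
// query; pageSize is the number of products shown per page (20 here).
function paginationLinks(basePath, startPage, nbHits, pageSize) {
  const links = [];
  if (startPage > 0) {
    links.push('<link rel="prev" href="' + basePath + '/' + (startPage - 1) + '">');
  }
  // Last zero-based page that still has products on it.
  const lastPage = Math.ceil(nbHits / pageSize) - 1;
  if (startPage < lastPage) {
    links.push('<link rel="next" href="' + basePath + '/' + (startPage + 1) + '">');
  }
  return links;
}
```

With 50 results and a page size of 20, page 1 (zero-based) gets both a rel=prev pointing at page 0 and a rel=next pointing at page 2; page 0 of a 15-result query gets neither.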

Googlebot follows each of these URLs and therefore indexes the content.

Note: When a user visits a page that lists products starting from anything other than number 1, they can access all products after that starting point via infinite scroll, but they can’t access products prior to that starting point. This really only affects users if they’ve arrived at the website via a Google result specifying a startPage other than 1, as there is no other natural way to arrive at these pages.

Ideally, the page would have reverse infinite scroll, which wasn’t a feature of Algolia’s infinite scroll solution. I may implement this at some point in the future, but it isn’t a priority right now. Here’s a great example of search engine friendly infinite scroll with reverse capability: http://scrollsample.appspot.com/items

Filters & Friendly URLs

For All The Dresses, "Designer" was the only filter parameter that was really important for SEO. Using Apache mod_rewrite to enable friendly URLs, I created URLs that point directly to product listings for a specific designer, such as:

http://allthedresses.com.au/designer/Thurley

This would pass a parameter of designer=Thurley to the same page that handles all listings (search.php). As before, I use the parameters passed to the page to build up the filters that are sent to the Algolia PHP API. In addition, I output these parameters as Javascript, which is then passed to the instantsearch initialisation method.
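A rough sketch of that parameter-to-filter step (the facet names are from the article; the helper and the wiring are illustrative, not the site's actual code):

```javascript
// Turn rewritten URL parameters (e.g. { designer: 'Thurley' }) into the
// facetFilters array that both the server-side Algolia request and the
// instantsearch initialisation receive.
function buildFacetFilters(params) {
  const filters = [];
  if (params.designer) filters.push('designer:' + params.designer);
  if (params.website)  filters.push('website:' + params.website);
  return filters;
}

// The same values then seed instantsearch's starting state, roughly:
// instantsearch({
//   ...,
//   searchParameters: { facetFilters: buildFacetFilters(params) }
// });
```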

Googlebot finds these URLs in one of three ways:

  • Each designer in the refinement list is actually an <a> tag with an href to the designer-specific friendly URL. Clicking it doesn't actually take you there, it just applies the filter on instantsearch, but as far as Google is concerned it's a valid link and it will follow it.
  • I have a dedicated Designer listing page, which lists every designer, each with a link to the appropriate friendly URL.
  • A sitemap submitted to Google, generated using the sitemap-php PHP class.

Apart from specifying a page number as described earlier, the only other URL feature I’ve implemented is for predefined searches. In a nutshell, I take common search terms (or categories, if you will), such as “Red Dress” and “Floral Playsuit” and make a search for these terms available at a URL like:

http://allthedresses.com.au/items/White+Lace+Dress

This conducts both a PHP API query and an instantsearch query for the phrase "White Lace Dress". The results displayed aren't always white lace dresses, but generally it's pretty accurate, and it's a hell of a lot easier than having to categorise each product in the database. Overall, this increases the chances of appearing in Google results for these search terms.
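Recovering the search phrase from one of these predefined-search URLs is a one-liner; here's an illustrative version (the /items/ path segment is from the article, the helper itself is hypothetical):

```javascript
// Extract the search phrase from a predefined-search URL such as
// "/items/White+Lace+Dress". Pluses become spaces, then any remaining
// percent-encoding is decoded.
function slugToQuery(path) {
  const slug = path.split('/').pop();
  return decodeURIComponent(slug.replace(/\+/g, ' '));
}
```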

I make these links available via the Sitemap, as well as a Categories page (which deliberately isn’t that easy to find).

Handling Titles and Short Descriptions

So we've now got individual search-engine-friendly URLs for each designer and each category. To improve the SEO on these pages, I need an <h1> heading related to the designer or category, and for some designers I chose to have a keyword-rich description of 100 or so words.

Let’s use the designer listing page for Dion Lee, as an example. The challenge here is that we now have a page with a heading of “Dion Lee” and a short description above the product listings, but the user has the ability to completely change what is displayed via the instantsearch refinements. If the user chooses to add another designer as a refinement, should the page still be showing a heading of “Dion Lee”? I would think not. What if the user deselects “Dion Lee” from the designer refinement list? Then I would definitely think not.

I solved this by setting a “title type” (either designer, category, or homepage) data attribute on the heading and using a simple custom searchFunction (an option of the instantsearch() main method). This checks the state of current filters to see if the filter that is aligned to the value of “title type” attribute has been changed. If it has changed, then hide the heading.

Custom function to check current search & refinements against page title
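The screenshot referenced above isn't reproduced here, but the logic it describes can be sketched roughly like this. This is a reconstruction from the description, not the original code; disjunctiveFacetsRefinements is the shape the algoliasearch-helper state exposes for refinement-list selections, and the function/attribute names are otherwise hypothetical:

```javascript
// Decide whether the page heading should be hidden, given the heading's
// "title type" data attribute, the current helper state, and the value the
// page was loaded for (e.g. the designer name 'Dion Lee').
// Returns true when the heading no longer matches what is displayed.
function shouldHideHeading(titleType, state, pageValue) {
  if (titleType === 'designer') {
    // Hide unless the designer refinement is exactly this page's designer.
    const refined = (state.disjunctiveFacetsRefinements &&
                     state.disjunctiveFacetsRefinements.designer) || [];
    return refined.length !== 1 || refined[0] !== pageValue;
  }
  if (titleType === 'category') {
    // Hide as soon as the query diverges from the predefined search phrase.
    return state.query !== pageValue;
  }
  return false; // homepage heading is never hidden
}

// Wired up via instantsearch's searchFunction option, roughly:
// searchFunction: function (helper) {
//   const hide = shouldHideHeading(headingEl.dataset.titleType,
//                                  helper.state, headingEl.dataset.value);
//   headingEl.style.display = hide ? 'none' : '';
//   helper.search();
// }
```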

This way, when viewing a designer listing, if the user modifies filters such as Size or Price, the heading stays. As soon as the designer refinement list is modified, the heading is hidden.

Wrapping Up

I hope the information above helps you decide whether to use instantsearch.js in your e-commerce site. The truth is, All The Dresses isn't old enough for me to categorically say what kind of impact using instantsearch has had or will have on our Google indexing and rankings. I also don't have a baseline to compare against (i.e. a pre-instantsearch period). But at least you can see that I've thought through the challenges, and you now know how I navigated them.

Within 48 hours of launch, All The Dresses was already receiving over 100 visits per day from Google, and this has rapidly grown each week over the last 8 weeks. This is without a single significant backlink to boost our domain or page authority. Admittedly, much of the Google traffic lands at a specific product detail page, which doesn’t need or use instantsearch.

I’d love your feedback on the methods used above as well as our website in general. SEO aside, we think it’s a pretty cool implementation of Algolia instantsearch. Let us know what you think in the comments. Oh, and if you enjoyed the read, please click the heart below to recommend this post. Cheers!

We have just released All The Dresses New Zealand at allthedresses.co.nz. It's a sister site to All The Dresses (Australia) and runs entirely off the same codebase; configuration settings just point at a different database and Algolia application/index depending on the domain the user has come in on.
