How-To Solr E-Commerce Search

Solr Ecommerce Search - A Best Practice Guide — Part 1

How do you do Solr e-commerce search just right? Well, imagine you want to drive to the mountains for a holiday. You take along your husband or wife and your two children (does it get any more stereotypical?). What kind of car would you take: the two-seater sports car or the station wagon? Easy choice, you say? Well, choosing Solr as your e-commerce search engine is a bit like taking the sports car on the family trip.

Part of the issue is how Solr was originally conceived. Initially, Solr was designed to perform as a full-text search engine for content, not products. Although it has evolved "a little" since then, there are still a few pitfalls that you should avoid.

That said, I'd like to show you some best practices and tips from one of my projects. In the end, I think Solr is good at getting the job done after all ;-)

How to Not Reinvent the Wheel When Optimizing Solr for E-commerce Search

First, don't reinvent the wheel when integrating basic things like synonyms and boostings on the Lucene level. These can be more easily managed using open-source add-ons like Querqy.

If you want to perform basic tasks such as excluding specific keywords, replacing words with alternatives that better match your product data, or simply setting up synonyms and boostings, Querqy does the job with a minimum of effort.

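By default, Solr scores documents with a model based on TF/IDF (Term Frequency/Inverse Document Frequency): older versions used classic TF/IDF, and since Solr 6 the default is BM25, a refinement of the same idea. In short, a document scores higher the more often it contains a search term, and a term carries more weight the fewer documents contain it.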
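The following Python sketch shows why this hurts in e-commerce. It implements a simplified classic TF/IDF weight, loosely modelled on Lucene's classic similarity: the square root of the term frequency times a logarithmic IDF. It is not Lucene's exact formula, and it ignores BM25's term-frequency saturation and length normalisation. The catalogue size, document frequency and products are hypothetical.

```python
import math


def tf_idf(term_freq: int, doc_freq: int, num_docs: int) -> float:
    """Simplified classic TF/IDF weight of one term in one document field.

    tf  = sqrt(term_freq)                   -> more occurrences, higher score
    idf = 1 + ln(num_docs / (doc_freq + 1)) -> rarer terms, higher weight
    """
    if term_freq == 0:
        return 0.0
    tf = math.sqrt(term_freq)
    idf = 1.0 + math.log(num_docs / (doc_freq + 1))
    return tf * idf


# Hypothetical catalogue: 1,000 products, 50 of which contain the term "tv".
NUM_DOCS = 1000
DOC_FREQ_TV = 50

# A television that says "TV" once vs. a wall mount whose text says "TV" four times.
television = tf_idf(term_freq=1, doc_freq=DOC_FREQ_TV, num_docs=NUM_DOCS)
wall_mount = tf_idf(term_freq=4, doc_freq=DOC_FREQ_TV, num_docs=NUM_DOCS)
print(f"television: {television:.2f}  wall mount: {wall_mount:.2f}")
```

The wall mount repeats the term four times, so it scores twice as high as the actual television. A shopper searching for "tv" would not expect that order.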

For general use cases, how often a search term occurs in a document may well be important; for e-commerce search, however, it usually is not.

E-commerce search cares less about how often a search term occurs than about where, that is, in which field, it is found.

How-To Teach Solr to Think Like an E-Commerce Search Manager

To help Solr account for this, simply set the "tie" parameter of your request handler to 0.0. This has the positive effect that only the best-matching field counts. Solr then no longer sums up the scores of all fields. Summing could otherwise let several lower-weighted fields together outscore a match in your most important field.
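A disjunction-max query combines field scores as the best field's score plus `tie` times the sum of the other matching fields. The Python sketch below illustrates this, assuming hypothetical, already weighted per-field scores for two products.

```python
from __future__ import annotations


def dismax_score(field_scores: dict[str, float], tie: float) -> float:
    """Combine per-field scores like a disjunction-max query:
    best field score + tie * (sum of the scores of all other matching fields)."""
    if not field_scores:
        return 0.0
    best = max(field_scores.values())
    others = sum(field_scores.values()) - best
    return best + tie * others


# Hypothetical, already weighted per-field scores for the query "tv".
tv_set = {"name": 10.0}                                             # one strong match
accessory = {"description": 4.0, "category": 3.0, "keywords": 4.0}  # several weak ones

for tie in (1.0, 0.0):
    print(f"tie={tie}: tv set {dismax_score(tv_set, tie)}, "
          f"accessory {dismax_score(accessory, tie)}")
```

With tie=1.0, the accessory's weak matches add up to 11.0 and beat the television's 10.0. With tie=0.0, only each product's best field counts, and the television wins 10.0 to 4.0.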

How-To Fix Solr’s Similarity Issues for E-Commerce Search

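Secondly, turn off similarity scoring by setting uq.similarityScore to "off". This is a Querqy parameter, so it only takes effect when your request handler uses the Querqy query parser. Together with the tie setting, the defaults of your request handler look like this: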

<lst name="defaults">
  <float name="tie">0.0</float>
  <str name="uq.similarityScore">off</str>
</lst>

This makes scoring far more usable in e-commerce scenarios. Without similarity scoring, the result order is also more customer-centric and easier to understand. A match in the product name field scores higher than a match in the description text, no matter how often the term appears there. This is also why you must set up your field boostings correctly!
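The Python sketch below approximates the combined effect of tie=0.0 and disabled similarity scoring: each product scores the weight of its best field containing the term. It is not Querqy's implementation. The qf value, field names, products and simple whitespace tokenisation are assumptions made for illustration.

```python
from __future__ import annotations


def parse_qf(qf: str) -> dict[str, float]:
    """Parse a qf string such as "name^10 description" into field weights;
    a field without ^boost gets weight 1.0, as in Solr's qf syntax."""
    weights = {}
    for token in qf.split():
        field, _, boost = token.partition("^")
        weights[field] = float(boost) if boost else 1.0
    return weights


def boost_only_score(product: dict[str, str], term: str, weights: dict[str, float]) -> float:
    """Approximation of tie=0.0 plus similarity off: the score is the weight of
    the best field containing the term; how often it occurs does not matter."""
    term = term.lower()
    matched = [w for field, w in weights.items()
               if term in product.get(field, "").lower().split()]
    return max(matched, default=0.0)


weights = parse_qf("name^10 description^1")  # hypothetical field boostings
products = [
    {"id": "wall-mount", "name": "Universal wall mount",
     "description": "Fits every tv from 32 to 65 inch and any tv up to 40 kg"},
    {"id": "oled-55", "name": "55 inch OLED TV", "description": "Smart television with HDR"},
]
ranking = sorted(products, key=lambda p: boost_only_score(p, "tv", weights), reverse=True)
print([p["id"] for p in ranking])
```

The television ranks first because its name field matches, even though the wall mount's description mentions "tv" twice.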

Give my previous blog post about search relevancy a read for more advice on what to consider for good scores.

Even with the best scoring and result sorting, the number of items returned can overwhelm the user, especially for generic queries like "smartphone", "washing machine", or "tv".

How-To Do Facets Correctly in Solr

The logical answer to this problem is, of course, faceting.

Enabling your visitors to drill down to their desired products is critical.

Within a relatively homogeneous result set, it may be simple to know upfront which facets are relevant to a particular category. The more heterogeneous the search results become, the greater the challenge. And, of course, you don't want to waste CPU power and time on facets that are irrelevant to your current result set, especially if you have hundreds or even thousands of them.

So, wouldn't it be nice to know which fields Solr should use as facets before calling it? It's not THAT easy, though: you need to take a two-step approach.

For this to work, store the names of all facet fields relevant to a single product in a dedicated field. Let's call it "facet_fields". It contains an array of field names, for example:

Facets For Product 1 (tablet):

"category", "brand", "price", "rating", "display_size", "weight"

Facets For Product 2 (freezer):

"category", "brand", "price", "width", "height", "length", "cooling_volume"

Facets For Product 3 (tv):

"category", "brand", "price", "display_size", "display_technology", "vesa_wall_mount"
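You could populate facet_fields at indexing time, assuming it is defined as a multivalued string field in your schema. The Python sketch below derives the list from the attributes a product actually has. The NON_FACET_FIELDS set, the sample product and the automatic derivation are all assumptions. You may prefer to maintain the list per category instead.

```python
import json

# Hypothetical: fields that describe a product but are never offered as facets.
NON_FACET_FIELDS = {"id", "name", "description"}


def with_facet_fields(doc: dict) -> dict:
    """Return a copy of the document with a "facet_fields" array listing every
    facet-able attribute the product actually has (in insertion order)."""
    facet_fields = [f for f in doc if f not in NON_FACET_FIELDS]
    return {**doc, "facet_fields": facet_fields}


tv = {
    "id": "p3", "name": "55 inch OLED TV", "category": "tv", "brand": "LG",
    "price": 1299.0, "display_size": 55, "display_technology": "OLED",
    "vesa_wall_mount": "300x300",
}
# JSON body you could POST to your core's /update handler.
print(json.dumps([with_facet_fields(tv)], indent=2))
```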

If a specific product type, such as "televisions", is searched, you can now make an initial call to Solr with just ONE facet, on the "facet_fields" field. It returns the facet field names that occur among the televisions found.

You can also significantly reduce overhead by not requesting any product data in this first call (rows=0).
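The first request can be expressed as plain Solr query parameters, as in the Python sketch below, which uses only the standard library. The host, the core name products, and sending the request as a GET are assumptions.

```python
from urllib.parse import urlencode


def first_call_params(user_query: str) -> dict:
    """Step 1: ask Solr only which facet fields occur in the result set."""
    return {
        "q": user_query,
        "rows": "0",                    # no product documents needed yet
        "facet": "true",
        "facet.field": "facet_fields",  # the ONE facet: names of relevant facet fields
        "facet.mincount": "1",          # skip field names no matching product has
        "facet.limit": "-1",            # all field names, not just the default top 100
    }


# Hypothetical host and core name; adjust to your installation.
url = "http://localhost:8983/solr/products/select?" + urlencode(first_call_params("televisions"))
print(url)
```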

This is also the right time to check whether you get any matches at all, or whether you have ended up on the zero-result page.

If that is the case, you can either try the "spellcheck" component of Solr to fix typos in your query or implement our SmartQuery technology to avoid these situations in most cases right from the start.

In the second call to Solr, use the information collected in the first call to request the facets "category", "brand", "price", "display_size", "display_technology", and "vesa_wall_mount".
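The Python sketch below reads the facet field names from the first response and builds the second request. It assumes Solr's default JSON layout for facet counts, a flat [name, count, ...] list. The sample response, its counts and the page size of 24 are hypothetical.

```python
from __future__ import annotations

from urllib.parse import urlencode


def facet_names_from_first_call(response: dict) -> list[str]:
    """Extract facet field names from the first call's response.

    Solr returns facet.field counts as a flat [name, count, name, count, ...]
    list by default (json.nl=flat). An empty list means a zero-result page."""
    if response["response"]["numFound"] == 0:
        return []  # zero results: try spellcheck instead of faceting
    flat = response["facet_counts"]["facet_fields"]["facet_fields"]
    return [flat[i] for i in range(0, len(flat), 2) if flat[i + 1] > 0]


def second_call_params(user_query: str, facet_names: list[str],
                       rows: int = 24) -> list[tuple[str, str]]:
    """Step 2: fetch the products plus exactly the facets found in step 1.
    A list of pairs is used because facet.field is repeated once per facet."""
    params = [("q", user_query), ("rows", str(rows)), ("facet", "true")]
    params += [("facet.field", name) for name in facet_names]
    return params


# Trimmed, hypothetical response of the first call for "televisions".
first_response = {
    "response": {"numFound": 37, "start": 0, "docs": []},
    "facet_counts": {"facet_fields": {"facet_fields": [
        "category", 37, "brand", 37, "price", 37, "display_size", 37,
        "display_technology", 37, "vesa_wall_mount", 30,
    ]}},
}
names = facet_names_from_first_call(first_response)
print(urlencode(second_call_params("televisions", names)))
```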

How-To Reduce Load with Intelligent Facet Rules


You might argue that some of these facets, such as category, brand, and price, are so general that there is no need to store and request them every time. And you would be right. To save memory, keep a whitelist of generic facets and combine it with the specific facets returned by your initial request.

Let's have a look at an example. Imagine someone searches for "Samsung". This returns a very mixed set of results, with products from all three areas of the facet example above. Nevertheless, you can use the counts from the first call to Solr to filter out facets that do not apply to a significant share of the result.

A note of caution: filtering out facets with low coverage pays off more at a later stage. Once additional filters are applied (on the category, for example), they may reveal that a facet is relevant after all, which was not evident initially. Once the user narrows a search for "Samsung" down to "Smartwatches", "wrist size" suddenly gains importance. This is why, on the initial search, we only drop facets that are not present in our result set at all.
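The Python sketch below combines the whitelist with the specific facets and applies an optional coverage threshold. The whitelist, the counts for "Samsung" and the 10% threshold are hypothetical. With the default threshold of 0.0, a facet is dropped only when no product in the result has it.

```python
from __future__ import annotations

# Hypothetical whitelist of generic facets that are NOT stored in facet_fields.
GENERIC_FACETS = ["category", "brand", "price"]


def select_facets(field_counts: dict[str, int], num_found: int,
                  min_coverage: float = 0.0) -> list[str]:
    """Combine the whitelist with the specific facets from the first call.

    field_counts maps facet field name -> number of matching products that have it.
    With min_coverage=0.0 a facet is dropped only if no product in the result has it;
    a higher threshold is better applied once the result set has been narrowed."""
    selected = list(GENERIC_FACETS)
    for name, count in field_counts.items():
        if name in selected or count == 0:
            continue
        if count / num_found >= min_coverage:
            selected.append(name)
    return selected


# Hypothetical first-call counts for the query "Samsung" (200 matches).
samsung_counts = {"display_size": 120, "weight": 40, "cooling_volume": 30,
                  "wrist_size": 6, "vesa_wall_mount": 80, "screen_curvature": 0}
print(select_facets(samsung_counts, 200))                    # keeps wrist_size
print(select_facets(samsung_counts, 200, min_coverage=0.1))  # drops wrist_size
```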

Now that the result has facets, it might make sense to offer the user a multi-select option for the values. This allows them to choose, side by side, whether the TV is from LG, Samsung, or Sony.

How-To Exclude Erroneous Facet Results

The good news is that Solr has a built-in option to ignore active filters when generating a specific facet.

facet.field={!ex=brand}brand fq={!tag=brand}brand:("SAMSUNG" OR "LG" OR "SONY")

The filter query is tagged with {!tag=brand}, and the facet field names that same tag in {!ex=brand}. Solr therefore ignores this filter when counting the brand facet, so the other brands keep their counts and stay selectable.

You can also use other tags; just be sure to keep track of which tag you use for which facet! So something like this also works, using "br" instead of the full field name "brand". This is useful if you have more structured field names like "facet_fields.brand":

facet.field={!ex=br}facet_fields.brand fq={!tag=br}facet_fields.brand:("SAMSUNG" OR "LG" OR "SONY")
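The tag-and-exclude pattern is easy to generate programmatically. The Python sketch below builds both parameters for one multi-select facet. The helper name and the quote escaping are illustrative assumptions; the generated strings match the two examples above.

```python
from __future__ import annotations

from typing import Optional


def multi_select_params(field: str, values: list[str],
                        tag: Optional[str] = None) -> list[tuple[str, str]]:
    """Build a facet.field that ignores its own filter, plus the tagged filter.

    The fq is tagged {!tag=...}; the facet.field names the same tag in {!ex=...},
    so Solr ignores this filter when counting values for this facet only."""
    tag = tag or field
    params = [("facet.field", f"{{!ex={tag}}}{field}")]
    if values:
        quoted = " OR ".join('"' + v.replace('"', '\\"') + '"' for v in values)
        params.append(("fq", f"{{!tag={tag}}}{field}:({quoted})"))
    return params


for key, value in multi_select_params("brand", ["SAMSUNG", "LG", "SONY"]):
    print(f"{key}={value}")
for key, value in multi_select_params("facet_fields.brand", ["SAMSUNG", "LG", "SONY"], tag="br"):
    print(f"{key}={value}")
```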

Define Constraints for Numeric Fields for Slider-Facets

But what about numeric fields like price or measurements like width, height, etc.?

Gathering the data needed for a slider facet from these fields is fairly easy.

Just enable the stats component and name which details you require:

stats=true stats.field={!ex=price min=true max=true count=true}price

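The response includes the minimum and maximum values for your result set. These form the absolute borders of your slider. Because of ex=price, a price filter tagged with {!tag=price} is ignored when computing them, so the borders don't shrink once the user moves the slider.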

You can also use the count to filter out irrelevant facets by their coverage, i.e., the share of matching products that have a value in this field.

{
    "stats": {
        "stats_fields": {
            "price": {
                "min": 89.0,
                "max": 619.0,
                "count": 188
            }
        }
    }
}
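The Python sketch below turns such a stats response into slider borders and a coverage factor. The helper names and numFound value of 200 are hypothetical. For a meaningful coverage value, numFound should come from the same result set the stats were computed on.

```python
from __future__ import annotations

import json


def stats_field_param(field: str, tag: str) -> str:
    """stats.field value asking only for min, max, and count, ignoring the
    filter tagged `tag` so the slider borders stay stable while filtering."""
    return f"{{!ex={tag} min=true max=true count=true}}{field}"


def slider_borders(response: dict, field: str,
                   num_found: int) -> tuple[float, float, float] | None:
    """Return (min, max, coverage) for a numeric field, or None if no product has a value."""
    stats = response["stats"]["stats_fields"].get(field) or {}
    count = stats.get("count", 0)
    if not count or not num_found:
        return None  # nothing to show: skip this slider
    return stats["min"], stats["max"], count / num_found


raw = """{"stats": {"stats_fields": {"price": {"min": 89.0, "max": 619.0, "count": 188}}}}"""
borders = slider_borders(json.loads(raw), "price", num_found=200)  # hypothetical numFound
print(stats_field_param("price", "price"))
print(borders)
```

A facet whose coverage falls below your chosen threshold can then be hidden, just like the text facets above.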

If you filter on price, remember to set the slider's lower and upper handles to the actual filter values!

Otherwise, your customers will have to select their range over and over again ;-)

So the stats response gives you the absolute minimum and maximum (the slider's borders), and the active filter gives you the selected minimum and maximum (the handle positions).
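The Python sketch below handles this bookkeeping. The SliderState class, the clamping rule and the price_filter helper are hypothetical illustrations. The filter is tagged price so that the stats.field request with ex=price still computes the absolute borders as if no price filter were applied.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Optional


@dataclass
class SliderState:
    range_min: float   # absolute border from the stats response
    range_max: float
    handle_min: float  # lower handle: the active filter value, if any
    handle_max: float  # upper handle


def slider_state(stats_min: float, stats_max: float,
                 filter_min: Optional[float] = None,
                 filter_max: Optional[float] = None) -> SliderState:
    """Borders come from stats; handles follow the active filter, clamped to the borders."""
    lo = stats_min if filter_min is None else max(stats_min, filter_min)
    hi = stats_max if filter_max is None else min(stats_max, filter_max)
    return SliderState(stats_min, stats_max, lo, hi)


def price_filter(filter_min: Optional[float], filter_max: Optional[float],
                 tag: str = "price") -> str:
    """Tagged range filter on the price field; an open end becomes * in Solr's range syntax."""
    lo = "*" if filter_min is None else filter_min
    hi = "*" if filter_max is None else filter_max
    return f"{{!tag={tag}}}price:[{lo} TO {hi}]"


print(slider_state(89.0, 619.0))                # no filter: handles at the borders
print(slider_state(89.0, 619.0, 150.0, 400.0))  # filtered: handles at 150 and 400
print(price_filter(150, 400))
```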

Solr E-Commerce Search - Not Bad After All

Congratulations! You now know how to tune Solr's basic scoring to perform best in e-commerce scenarios. Not only that, you know how to make the best use of facets within Solr.

In the next episode of this best-practice guide, I would like to dive deeper into how to correctly weight and boost your products. I also want to pull back the curtain on how to master larger multi-channel environments without ending up in copy/paste hell. So stay tuned!

© 2021 searchhub.io