Framing the Hurdles to Relevant Ecommerce Search
Every ecommerce shop owner wants relevant search results on their website.
But what does that really mean?
As the German singer Herbert Grönemeyer once stated: “It could all be so easy, but it isn’t”. *1
What is Search Relevance for Your Visitors?
For them, it may mean finding products that match their search intent.
All of them.
New ones on top.
Or the top-sellers.
Or the ones with the best ratings.
Or the ones currently in stock.
Or maybe the cheap ones, where they can save the most money right now?
What is Search Relevance for the Shop Owner?
For the shop owner, it may mean something entirely different, like putting the products with the best margin on top. Or the old stock that should be cleared to free up space in the warehouse.
Clearly, these goals are not the same; sometimes they even contradict each other.
How to Overcome Hurdles to Ecommerce Search Relevance?
As with most things, the solution is a blend of several strategies. Together, they provide a strong foundation that reaches a broad audience while retaining enough focus to meet individual customer intent.
But even with the perfect ranking cocktail, you still have to do your homework on the basic mechanics: finding all relevant products in the first place.
So let’s start with that.
Step #1: Data Retrieval is Key to Making Search Relevant
Ask yourself what kind of data you need and whether you are already using its full potential.
Don’t forgo the basics!
It helps to keep the following analogy in mind: imagine you are building a skyscraper!
If the foundation is not level, the construction will fall apart no matter how hard you try.
Or, to borrow another analogy: painting a wrecked ship a fancy orange still leaves you with a shipwreck. So don’t try to compensate for crappy data with fancy stuff like machine learning.
Achieving ecommerce search relevance is as much about wisely using every piece of data available in your databases as it is about designing a structure that supports it.
Keep in mind details like the findability of terms. Having many technical specifications is great. Having them in a normalized manner is even better.
Colors are a simple example. Brands tend to give their products fancy color names like “space grey” or “midnight green”.
But that is not what your customers will search for. At least not the majority of customers.
For searchability and facetability, you therefore need to map all brand-specific terms to commonly used terms like grey and green.
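A minimal sketch of such a mapping, assuming a hand-maintained lookup table. COLOR_MAP and normalize_color are hypothetical names, and the target colors are only examples; the generic value goes into the facet field, while the brand name can stay in the searchable text.

```python
# Hypothetical lookup table: brand-specific color names -> generic colors.
# Extend it with whatever terms your catalog and your customers actually use.
COLOR_MAP = {
    "space grey": "grey",
    "space gray": "grey",
    "midnight green": "green",
    "jet black": "black",
}


def normalize_color(raw_color: str) -> str:
    """Map a brand color name to a generic color; unknown names pass through."""
    # Lowercase and collapse whitespace so "Space  Grey" matches "space grey".
    key = " ".join(raw_color.lower().split())
    return COLOR_MAP.get(key, key)


print(normalize_color("Midnight Green"))  # green
print(normalize_color("Space  Grey"))     # grey
print(normalize_color("Red"))             # red
```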
Keep it simple!
The same goes for sizes: if your customers search for them in different ways, e.g. 1TB vs. 1000GB, make sure both variants find the same products.
The key to this approach is structurally separating facet data from search data: all variations must be findable, but only the core values should be used for faceting.
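As an illustration, the hypothetical size_fields function below turns a size string into one canonical facet value plus several searchable spellings. It assumes decimal units (1 TB = 1000 GB, as storage vendors usually label them) and only handles GB and TB; choosing GB as the canonical facet unit is likewise an assumption.

```python
import re

# Decimal units: 1 TB = 1000 GB.
UNIT_FACTORS_GB = {"gb": 1, "tb": 1000}
SIZE_RE = re.compile(r"(\d+(?:\.\d+)?)\s*(gb|tb)\b", re.IGNORECASE)


def _fmt(value: float) -> str:
    """Format without trailing zeros or exponent: 1000.0 -> '1000', 0.5 -> '0.5'."""
    return format(value, "f").rstrip("0").rstrip(".")


def size_fields(raw: str) -> dict:
    """Split a size into one facet value and several searchable spellings."""
    match = SIZE_RE.search(raw)
    if match is None:
        # Nothing to normalize: keep the text searchable, no facet value.
        return {"facet": None, "search": [raw]}
    gb = float(match.group(1)) * UNIT_FACTORS_GB[match.group(2).lower()]
    tb = gb / 1000
    search_terms = {
        f"{_fmt(gb)}gb", f"{_fmt(gb)} gb",
        f"{_fmt(tb)}tb", f"{_fmt(tb)} tb",
    }
    # Only the canonical value is used for faceting.
    return {"facet": f"{_fmt(gb)} GB", "search": sorted(search_terms)}


print(size_fields("SSD 1TB"))
print(size_fields("SSD 1000 GB"))
```

Both products end up with the same facet value and the same set of searchable spellings, so a query for either notation finds both.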
True, several software vendors can help you normalize your product data. However, a few simple processing steps plugged into your data processing pipeline will often improve your data enough to considerably increase both findability and facetability.
Step #2: Data Structuring for Ecommerce Search Relevance
Assuming you are satisfied with your general data quality, the next important step is to think about the database structure. This structure should help you and your customers not only find all products related to a given query, but also get them back in the right order. Or at least more or less the right order, but we’ll get to that later.
Naturally, part of your data structure is weighting the different pieces of information you declare searchable. For example, the product name is more important than the technical features, while the features still take precedence over the long description.
An often-missed piece of the relevancy puzzle is determining which parts of your data structure are actually essential for relevant, intent-based results.
In many cases, it has proven more lucrative to remove long descriptions from the searchable fields altogether, as they unnaturally bloat your search results. Random hits are unlikely to add value to the overall experience.
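To make the weighting concrete, here is a toy scoring function; it is not how any particular engine computes relevance. It assumes each product is a dict of text fields and adds a field's weight when that field contains every query term. The weights are illustrative, and a weight of 0 for the description has the same effect here as removing it from the searchable fields.

```python
# Hypothetical field weights: name > features > long description.
FIELD_WEIGHTS = {"name": 3.0, "features": 2.0, "description": 0.0}


def field_score(query: str, product: dict, weights: dict = FIELD_WEIGHTS) -> float:
    """Sum the weights of all fields that contain every query term."""
    terms = query.lower().split()
    score = 0.0
    for field, weight in weights.items():
        tokens = set(product.get(field, "").lower().split())
        if terms and all(term in tokens for term in terms):
            score += weight
    return score


tv = {"name": "Smart TV 55 inch", "features": "OLED HDR",
      "description": "A smart TV for movie nights"}
cable = {"name": "HDMI cable", "features": "2 m",
         "description": "Works with any smart TV"}

# Only products with a positive score count as hits.
hits = [p["name"] for p in (tv, cable) if field_score("smart tv", p) > 0]
print(hits)  # ['Smart TV 55 inch']
```

Raising the description weight to 1.0 would pull the cable into the result for "smart tv", which is exactly the kind of random hit described above.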
It’s always a trade-off between “total recall” (return everything that could be relevant and live with some false results) and precision (return the right stuff, albeit not every item).
What About Stemming to Increase Relevance?
Some search engines let you fine-tune the matching algorithm on a per-field level.
Stemming is useful on fields with a lot of natural language. But please, don’t use stemming on brand names!
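A sketch of per-field stemming, assuming a tiny suffix stripper (toy_stem, hypothetical) in place of a real stemmer such as Snowball. Only fields listed in STEMMED_FIELDS are stemmed, so the brand stays intact; the last line shows what a blind stemmer would do to “Asus”.

```python
def toy_stem(token: str) -> str:
    """Toy stemmer (hypothetical): only strips a few English plural endings."""
    if token.endswith("ies") and len(token) > 4:
        return token[:-3] + "y"  # batteries -> battery
    if token.endswith("s") and not token.endswith("ss") and len(token) > 3:
        return token[:-1]        # laptops -> laptop
    return token


# Stem natural-language fields only; never brands.
STEMMED_FIELDS = {"description"}


def index_tokens(product: dict) -> dict:
    """Tokenize each field, stemming only the fields in STEMMED_FIELDS."""
    tokens = {}
    for field, text in product.items():
        words = text.lower().split()
        if field in STEMMED_FIELDS:
            words = [toy_stem(word) for word in words]
        tokens[field] = words
    return tokens


print(index_tokens({"brand": "Asus", "description": "Laptops with two batteries"}))
print(toy_stem("asus"))  # 'asu': why brand fields must not be stemmed
```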
On a similar note, technical features can have units, e.g. “55 inch” or “1.5 kg”. Making these values findable can be tricky because people search for them in different ways (1.5kg vs. 1.5 kg).
For this reason, it’s important to:
- normalize them in your data, and
- apply the same steps at query time.
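The key point is that one and the same function runs at indexing time and at query time. A minimal sketch, assuming a hypothetical alias table that covers only kilograms and inches; the short alias “in” is deliberately left out so that queries like “2 in 1 charger” are not mangled.

```python
import re

# Hypothetical alias table: spellings customers use -> canonical unit.
UNIT_ALIASES = {
    "kg": "kg", "kgs": "kg", "kilogram": "kg", "kilograms": "kg",
    "inch": "inch", "inches": "inch",
}
# Longest aliases first so "inches" is tried before "inch".
_ALT = "|".join(re.escape(a) for a in sorted(UNIT_ALIASES, key=len, reverse=True))
QUANTITY_RE = re.compile(rf"(\d+(?:\.\d+)?)\s*({_ALT})\b", re.IGNORECASE)


def normalize_units(text: str) -> str:
    """Rewrite quantities as '<number><unit>', e.g. '1.5 KG' -> '1.5kg'."""
    def repl(match: re.Match) -> str:
        return match.group(1) + UNIT_ALIASES[match.group(2).lower()]
    return QUANTITY_RE.sub(repl, text)


# Same function for the product data ...
print(normalize_units("Dumbbell 1.5 kg"))   # Dumbbell 1.5kg
# ... and for the incoming query.
print(normalize_units("dumbbell 1.5KG"))    # dumbbell 1.5kg
```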
How Best to Structure Multi-Language Product Data Feeds for Optimal Relevance?
If you sell into multiple countries with different languages, set up your indexes to use the correct type of normalization for special characters like umlauts, or for different spellings of the same character.
Recently, I ran into a case that illustrates this problem quite well: people were searching for iPhone using characters like í or ì instead of the normal i. Needless to say, these cases must be handled correctly. Fortunately, you don’t have to configure everything on your own: ready-to-configure libraries are available for a variety of search engines.
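If you need to roll your own, Python's standard unicodedata module can fold accented characters: decompose with NFKD, then drop the combining marks. For German, plain folding turns ü into u, while many customers type ue instead, so the sketch also produces an expanded variant. The GERMAN_EXPANSIONS table and both function names are assumptions for illustration.

```python
import unicodedata

# Hypothetical German transliterations, indexed as an extra variant.
GERMAN_EXPANSIONS = {"ä": "ae", "ö": "oe", "ü": "ue", "ß": "ss"}


def fold_diacritics(text: str) -> str:
    """'ìPhone' -> 'iPhone': decompose, then drop combining accent marks."""
    decomposed = unicodedata.normalize("NFKD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def search_variants(text: str) -> set:
    """Folded forms to index (and to apply to queries) in German-speaking shops."""
    # NFC first, so that a decomposed 'u' + diaeresis is seen as 'ü'.
    lowered = unicodedata.normalize("NFC", text).lower()
    expanded = "".join(GERMAN_EXPANSIONS.get(ch, ch) for ch in lowered)
    return {fold_diacritics(lowered), fold_diacritics(expanded)}


print(fold_diacritics("íPhone"))    # iPhone
print(search_variants("Müsli"))     # {'musli', 'muesli'} (set order may vary)
```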
Ecommerce Product Ranking
As stated in the introduction, because a user’s intent and the shop owner’s goals can contradict each other, ranking the found items can be tricky.
However, under normal circumstances, you only need to apply a few basic rules, like de-boosting accessory articles, to get the desired results. To do this, you must first be able to identify what an accessory item is. Ideally, you have a flag you can set in your data. If there is no flag and no way of marking articles in the database, you may get lucky and have a well-maintained category structure. In that case, you can de-boost articles from specific categories instead.
You may also find it helpful to identify accessory items by looking for “joining words” like “for” (case for smartphone xy, cartridge for printer yz).
If neither is the case (haha), I strongly suggest you start flagging your items now. Otherwise, it will be much harder to achieve ecommerce search relevance.
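The three fallbacks (flag, category, joining word) can be sketched as follows. The is_accessory field, the category names and the 0.25 de-boost factor are hypothetical; the joining-word check is only a heuristic and will misfire on names like “Smartphone for kids”.

```python
# Hypothetical data model: an optional boolean "is_accessory" flag,
# plus "category" and "name" strings per product.
ACCESSORY_CATEGORIES = {"phone cases", "printer cartridges"}
JOINING_WORDS = {"for"}
ACCESSORY_FACTOR = 0.25  # hypothetical de-boost multiplier


def is_accessory(product: dict) -> bool:
    # 1. An explicit flag is the most reliable signal, so it always wins.
    if "is_accessory" in product:
        return bool(product["is_accessory"])
    # 2. Fallback: a well-maintained category tree.
    if product.get("category", "").lower() in ACCESSORY_CATEGORIES:
        return True
    # 3. Heuristic: a joining word after the first word ("case for smartphone xy").
    words = product.get("name", "").lower().split()
    return any(word in JOINING_WORDS for word in words[1:])


def deboost_accessories(score: float, product: dict) -> float:
    """Multiply the score of accessory items by ACCESSORY_FACTOR."""
    return score * ACCESSORY_FACTOR if is_accessory(product) else score


print(deboost_accessories(10.0, {"name": "Case for smartphone xy"}))  # 2.5
print(deboost_accessories(10.0, {"name": "Smartphone xy"}))           # 10.0
```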
The remaining ranking rules depend on your audience and your preferences. Make sure you have ample data in your database to draw from, such as “margin” or “sold items count”. This gives you the flexibility to try different approaches and even A/B test them. Feel free to add any other values to your data that you deem relevant for scoring your products!
These types of rankings are applied globally and are completely query-agnostic.
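Because such a score does not depend on the query, you can compute it once per product, e.g. at indexing time. A sketch with hypothetical weights, assuming margin is stored as a fraction; sales are log-dampened so that a few bestsellers don't swamp everything else. Swapping the weight dict is one easy way to define A/B variants.

```python
import math

# Hypothetical weights for two A/B variants.
VARIANT_A = {"margin": 1.0, "sold_items_count": 0.5}
VARIANT_B = {"margin": 0.2, "sold_items_count": 1.0}


def global_rank_score(product: dict, weights: dict = VARIANT_A) -> float:
    """Query-agnostic score, computed once per product."""
    margin = product.get("margin", 0.0)                      # e.g. 0.25 = 25 %
    sold = math.log1p(product.get("sold_items_count", 0))    # dampen bestsellers
    return weights["margin"] * margin + weights["sold_items_count"] * sold


bestseller = {"margin": 0.1, "sold_items_count": 1000}
niche = {"margin": 0.4, "sold_items_count": 10}
print(global_rank_score(bestseller), global_rank_score(niche))
print(global_rank_score(bestseller, VARIANT_B), global_rank_score(niche, VARIANT_B))
```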
Tracking and Search Term Boosting
Now we come to the part where you let your customers do the work for you.
How, you ask? By using customer behavior within the shop to enhance the results. Simply take the queries, clicks, add-to-cart and purchase events and combine them at session level.
Why bother with the session? Isn’t it possible to just use the distinct “click path”? Let me take you through an example. Imagine your customer is searching for something but doesn’t find it because of a typo or different naming in the shop. As a result, he might leave or try to find the right product via the shop’s category navigation. If he finds what he’s looking for, you both get lucky: you now have a link between the original query and the correct product.
This may even teach you new synonyms. Be careful, though: if your thresholds are too low to filter out random links, you may end up with many false results.
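A minimal sketch of session-level linking, assuming events arrive in time order as (session_id, event_type, value) tuples. Every product interaction is linked to all queries issued earlier in the same session, weighted by a hypothetical EVENT_WEIGHTS table, and links supported by fewer than min_sessions sessions are dropped as probably random. Attributing every later action to every earlier query is deliberately naive; in practice you may want to attribute actions only to the most recent query.

```python
from collections import defaultdict

# Hypothetical event weights: stronger signals count more.
EVENT_WEIGHTS = {"click": 1.0, "add_to_cart": 2.0, "purchase": 3.0}


def link_queries_to_products(events, min_sessions=2):
    """Return {(query, product_id): score} for links seen in enough sessions.

    A "query" event carries the search term; other events carry a product id.
    """
    queries_so_far = defaultdict(list)  # session -> queries issued so far
    scores = defaultdict(float)         # (query, product) -> summed weights
    sessions = defaultdict(set)         # (query, product) -> supporting sessions
    for session_id, event_type, value in events:
        if event_type == "query":
            queries_so_far[session_id].append(value)
        elif event_type in EVENT_WEIGHTS:
            for query in set(queries_so_far[session_id]):
                scores[(query, value)] += EVENT_WEIGHTS[event_type]
                sessions[(query, value)].add(session_id)
    # Threshold: drop links seen in too few sessions.
    return {pair: s for pair, s in scores.items() if len(sessions[pair]) >= min_sessions}


events = [
    ("s1", "query", "galxy s24"), ("s1", "click", "phone-42"), ("s1", "purchase", "phone-42"),
    ("s2", "query", "galxy s24"), ("s2", "click", "phone-42"),
    ("s3", "query", "galxy s24"), ("s3", "click", "cable-7"),
]
print(link_queries_to_products(events))  # {('galxy s24', 'phone-42'): 5.0}
```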
Now that you have a link between queries and products, you can attach the query to the products and use that for boosting at query time.
Keep in mind that this kind of boosting is fairly safe as long as your engine emphasizes precision over recall. If you return large result sets with blurry matches, you may want to stick to tracking click paths. Either way, make sure the query truly belongs to the subsequent actions, so that you don’t attribute every action within a session to it.
These optimizations will already produce visibly better results, at least for your most popular products. To mitigate a positive feedback loop (popular products get all the attention), ensure new products get a fair chance of being shown. A simple way to do this is to boost new products for a short time after their release.
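At query time, the learned link score and the new-product boost can simply be added to a product's score. A sketch assuming the links are stored as a {(query, product_id): score} dict built from session data as described above; the 30-day window and the boost value are hypothetical.

```python
from datetime import date

NEW_PRODUCT_DAYS = 30    # hypothetical length of the "new product" window
NEW_PRODUCT_BOOST = 2.0  # hypothetical extra score for new products


def new_product_boost(release_date: date, today: date) -> float:
    """Constant boost for a short time after release, then nothing."""
    age_days = (today - release_date).days
    return NEW_PRODUCT_BOOST if 0 <= age_days <= NEW_PRODUCT_DAYS else 0.0


def query_time_boost(query: str, product: dict, links: dict, today: date) -> float:
    """Direct query->product link score plus the new-product boost."""
    direct = links.get((query, product["id"]), 0.0)
    return direct + new_product_boost(product["released"], today)


links = {("galaxy s24", "phone-42"): 5.0}
old = {"id": "phone-42", "released": date(2023, 1, 10)}
new = {"id": "phone-99", "released": date(2024, 3, 1)}
today = date(2024, 3, 15)
print(query_time_boost("galaxy s24", old, links, today))  # 5.0
print(query_time_boost("galaxy s24", new, links, today))  # 2.0
```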
But How Do I Achieve Search Relevance for the Rest of My Products?
Let’s go one level further and generalize the links we created in the previous step.
If, for example, users interact with actual phones after searching for “galaxy”, we can infer that this behavior also applies to the rest of the products in that category or of that product type. As mentioned previously, clean data is imperative here, so that you don’t mix up things like “smartphones & accessories”. Good luck if you’re using a category like that as the key to generalize your tracking links! Don’t do it; clean your data first!
In the example at hand, we want to link the query to all products of the type “smartphone”. Then we can add a boost for all the smartphones found, and voilà…
You get a result with smartphones on top, with the most relevant ones getting an extra punch from the direct query relation.
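A sketch of the generalization step, assuming clean product-type data in a {product_id: product_type} dict. Link scores are summed per type, and a type is linked to the query if it collects at least min_share (a hypothetical threshold) of that query's total link score.

```python
from collections import defaultdict


def generalize_links(links: dict, product_types: dict, min_share: float = 0.5) -> dict:
    """Lift {(query, product_id): score} links to {(query, product_type): share}."""
    totals = defaultdict(float)    # query -> total link score
    per_type = defaultdict(float)  # (query, type) -> link score
    for (query, product_id), score in links.items():
        product_type = product_types.get(product_id)
        if product_type is None:
            continue  # untyped products cannot be generalized
        totals[query] += score
        per_type[(query, product_type)] += score
    return {
        (query, ptype): score / totals[query]
        for (query, ptype), score in per_type.items()
        if score / totals[query] >= min_share
    }


links = {("galaxy", "p1"): 5.0, ("galaxy", "p2"): 3.0, ("galaxy", "c1"): 2.0}
types = {"p1": "smartphone", "p2": "smartphone", "c1": "phone case"}
print(generalize_links(links, types))  # {('galaxy', 'smartphone'): 0.8}
```

Every smartphone, including ones nobody has clicked yet, can then receive a boost for “galaxy”.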
In the end, the relevance of a product is a stack of boosts:
- First by the field weight
- Then by ranking criteria
- And in the end by the tracking events.
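Put together, the stack can be sketched as a weighted sum of the individual layers. The Signals fields and LAYER_WEIGHTS values are hypothetical, and real engines combine scores in their own ways (e.g. multiplicative boosts), so read this as an illustration of the layering only.

```python
from dataclasses import dataclass


@dataclass
class Signals:
    field_score: float    # text match, weighted by field (name > features > ...)
    ranking_score: float  # query-agnostic: margin, sales, accessory de-boost
    type_boost: float     # tracking: query -> product-type link
    direct_boost: float   # tracking: query -> product link


# Hypothetical layer weights; tune them, ideally via A/B tests.
LAYER_WEIGHTS = {"field_score": 1.0, "ranking_score": 0.5,
                 "type_boost": 2.0, "direct_boost": 1.0}


def stacked_score(signals: Signals) -> float:
    """Weighted sum over all layers of the stack."""
    return sum(weight * getattr(signals, name) for name, weight in LAYER_WEIGHTS.items())


phone = Signals(field_score=3.0, ranking_score=1.0, type_boost=0.8, direct_boost=5.0)
case = Signals(field_score=3.0, ranking_score=0.25, type_boost=0.0, direct_boost=0.0)
print(stacked_score(phone) > stacked_score(case))  # True
```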
If you’ve made it this far, you might also be interested in more advanced techniques like “learning to rank”. *2
This method applies machine learning to product ranking. However, it requires some supervision to learn the right things!
Or perhaps you want to integrate personalization for individual visitors. Wait a minute… maybe that topic is so comprehensive that it’s better left for another blog post…
So, now we’re done, right?
Well, not so fast 😉
Query Preprocessing for Ecommerce Search Relevance
The whole data part is only one side of the coin. Your customers may still need some help finding what they are looking for.
That’s why you should preprocess incoming queries before forwarding them to your search engine.
Preprocessing can be as simple as removing so-called stop words, i.e. filler words like a, the, at, also, etc.
If your engine does not come with a list of stop words, you can find lists online and adapt one to your needs. Counting the words in your data and checking which of the most frequent ones qualify as stop words for you can also be very effective.
Some search engines even let you reduce the weight of those words to a bare minimum instead of removing them. This way, a product where the whole phrase matches (e.g. “live at Wembley”) can outrank one that only contains the individual words (“live … Wembley”).
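A minimal sketch of both ideas: stripping stop words from the query, and finding candidate stop words by counting the most frequent words in your own product texts. STOP_WORDS is an illustrative, incomplete list, and the candidates still need a manual review; if your engine supports down-weighting instead, prefer that for phrases like “live at Wembley”.

```python
from collections import Counter

STOP_WORDS = {"a", "the", "at", "also"}  # illustrative; adapt to your shop


def remove_stop_words(query: str, stop_words: set = STOP_WORDS) -> str:
    """Drop stop words, but never strip a query down to nothing."""
    kept = [word for word in query.lower().split() if word not in stop_words]
    return " ".join(kept) if kept else query.lower()  # e.g. the band "The The"


def stop_word_candidates(texts: list, top_n: int = 10) -> list:
    """Most frequent words across your product texts; review them by hand."""
    counts = Counter(word for text in texts for word in text.lower().split())
    return [word for word, _ in counts.most_common(top_n)]


print(remove_stop_words("Live at the Wembley"))  # live wembley
print(stop_word_candidates(["the red case", "the blue case", "the cable"], top_n=1))  # ['the']
```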
We also mustn’t forget customers whose vocabulary differs from the one used to describe your products. For these cases, you need to establish a set of synonyms; otherwise, their searches end up with no results.
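A sketch of query-side synonym expansion with a hypothetical, one-directional SYNONYMS table: each query term becomes a group of alternatives, and a product matches if every group has at least one alternative in its text.

```python
# Hypothetical synonym table: customer wording -> catalog wording.
SYNONYMS = {"mobile": ["smartphone"], "sneakers": ["trainers"]}


def expand_query(query: str) -> list:
    """Turn each query term into a group of alternatives."""
    return [[term, *SYNONYMS.get(term, [])] for term in query.lower().split()]


def matches(groups: list, text: str) -> bool:
    """True if every group has at least one alternative in the text."""
    tokens = set(text.lower().split())
    return all(any(alt in tokens for alt in group) for group in groups)


groups = expand_query("Mobile case")
print(groups)                                    # [['mobile', 'smartphone'], ['case']]
print(matches(groups, "Smartphone case black"))  # True
```

Without the synonym, “mobile case” would return nothing for a catalog that only says “smartphone”.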
If your search engine also provides a way to define antonyms for similar words with diverging meanings, e.g. “bad” and “bat”, make sure you fully understand how this shapes the results. In some cases, products containing both words are kicked out of the result because they trigger the antonym rule from both sides.
If you can, de-boost the antonym instead of removing matching products completely. It can save your day!
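The difference between removing and de-boosting can be sketched like this, with a hypothetical ANTONYMS table and a factor of 0.5: products containing the antonym are pushed down, but nothing disappears from the result.

```python
# Hypothetical antonym pairs: similar words with diverging meanings.
ANTONYMS = {"bad": {"bat"}, "bat": {"bad"}}
ANTONYM_FACTOR = 0.5  # hypothetical: halve the score instead of dropping the product


def apply_antonyms(query: str, results: list) -> list:
    """results: list of (product_text, score). Re-scores; never removes items."""
    query_terms = set(query.lower().split())
    blocked = set().union(*(ANTONYMS.get(term, set()) for term in query_terms))
    rescored = []
    for text, score in results:
        if set(text.lower().split()) & blocked:
            score *= ANTONYM_FACTOR  # de-boost instead of filtering out
        rescored.append((text, score))
    return sorted(rescored, key=lambda item: item[1], reverse=True)


print(apply_antonyms("bat", [("bad boys dvd", 4.0), ("baseball bat", 3.0)]))
# [('baseball bat', 3.0), ('bad boys dvd', 2.0)]
```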
And finally, your customers might mispell words like I just did… did you notice? Well, your search engine will notice.
Or what about when your search just won’t find the right things? Or anything at all, for that matter? Or maybe the results for an alternative spelling are worse, because only the frequent spelling of a query benefits from the tracking-based boosts.
In these cases, you could add a preprocessing rule, and for some frequently used queries, that might work out.
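Such a hand-written rule could look like the following sketch: normalize case, spaces, hyphens and a hypothetical abbreviation table, then fall back to fuzzy matching with Python's difflib against a list of master queries. This is a generic illustration, not how search|hub works; MASTER_QUERIES, ABBREVIATIONS and the 0.8 cutoff are assumptions. Requiring the numbers to match keeps “ps4” from being pulled to “playstation 5”.

```python
import difflib
import re

# Hypothetical master queries (e.g. the best-converting spelling per cluster).
MASTER_QUERIES = ["playstation 5", "iphone 15"]
# Hypothetical abbreviation table, applied word by word.
ABBREVIATIONS = {"ps": "playstation"}


def _key(query: str) -> str:
    """'Play-Station 5', 'ps 5', 'PS5' -> 'playstation5'."""
    # Split into runs of letters and runs of digits; spaces and hyphens vanish.
    parts = re.findall(r"[^\W\d_]+|\d+", query.lower())
    return "".join(ABBREVIATIONS.get(part, part) for part in parts)


_MASTER_BY_KEY = {_key(m): m for m in MASTER_QUERIES}


def to_master_query(query: str, cutoff: float = 0.8) -> str:
    key = _key(query)
    if key in _MASTER_BY_KEY:
        return _MASTER_BY_KEY[key]
    # Fuzzy fallback for typos, but only if the numbers are identical.
    digits = re.findall(r"\d+", key)
    for candidate in difflib.get_close_matches(key, list(_MASTER_BY_KEY), n=3, cutoff=cutoff):
        if re.findall(r"\d+", candidate) == digits:
            return _MASTER_BY_KEY[candidate]
    return query  # unknown query: pass through unchanged


for q in ["PS5", "play-station 5", "plyastation 5", "ps4"]:
    print(q, "->", to_master_query(q))
```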
But eventually, you will get completely lost in the long tail of queries.
Tools like our search|hub can help you match all variations of a query to the perfect master query. This master query is then sent to your search engine, whatever flavor of search engine you might have.
search|hub identifies all kinds of query variations: different but correct spellings (playstation 5, ps 5, PS5, play-station 5) as well as typos (plystation 5, plyastation 5, etc.).
We know which query performs best, so that you don’t have to!
If you want to see your shop’s queries clustered around top-performing master queries and learn how much this could improve your conversions, feel free to contact us!