Why the smartSuggest module might matter to you
We at searchhub live to make existing search engines better understand humans and deliver exceptional search experiences. So why have we now created our own smartSuggest module, and why does this matter to you? Until a couple of months ago we focused mainly on rewriting user queries into the best possible search engine queries, helping our customers deliver better results for a given user query and gain fantastic uplifts in their key business metrics.
But we soon realized that we were still leaving much of search's potential unused. Why? Because search is a process and not a single feature. Until now we had completely ignored the part of the search process where the user formulates the query. And guess what: there is already a nice feature for that, called “autosuggest”, “query suggestions”, or “auto-complete” in the search universe.
1. The goal of serving Query Suggestions – smartSuggest
Disclaimer — in this article we’ll not talk about UI and frontend implementations at all — instead we are going to focus on information-need and information delivery.
Since we are a highly data-driven company, we started by analyzing our tracking data. We wanted strong evidence that it is worth spending a significant amount of development time on either improving an existing query suggestion system or building our own, and we wanted to identify the areas we should focus on.
But before we take a closer look at what the data revealed let’s check if we can find some best-practice articles on the internet and see what they recommend:
The Nielsen Norman Group recommends using query-suggestions to:
- Facilitate accurate and efficient data entry
- Select from a finite list of names or symbols, especially if the item can be selected reliably after typing the first 1–3 characters
- Facilitate novel query reformulations
- Encourage exploratory search (with a degree of complexity and mental effort that is appropriate to the task). Where appropriate, complement search suggestions with recent searches
In our summary this boils down to — guide the user during the search formulation process to facilitate accurate data entry and encourage exploratory search.
However, this is very much biased towards the user of a webshop. What about the goals and needs of a webshop owner? Again, we can find some inspiration on the internet.
Lucidworks for example points out some opportunities in terms of merchandising when it comes to query suggestions.
- Customize autocomplete suggestions according to where a visitor is on the site.
- Retailers can use autocomplete search suggestions to draw customers’ attention to certain merchandise, for example products that are on sale, come from certain brands, or have a higher margin.
- Use past online behavior to shape search recommendations.
- Tie autocomplete results to customer trends.
- Factor geography into autocomplete recommendations.
Time for another summary: while guiding the user during the search formulation process, encourage exploratory search and boost product discovery.
If we combine the essence of both summaries we end up with something like:
Guide the user during the search formulation process to facilitate accurate data entry, encourage exploratory search and boost product discovery.
Now that we have a goal, we can say that query suggestions work well if we observe that they help users articulate better search queries and better discover the product offering. It is less about speeding up the search process and more about guiding users: lending them a helping hand in constructing their search query and walking them through the available options.
2. Validating the goal and identifying the most valuable use cases
Now that we know what query suggestions should enable us to offer the user, let’s slice and dice some logs and tracking data to come up with the most valuable use cases we need to enable.
To validate our goals and assumptions, we sampled around 120,000 search sessions across several customers. We then filtered them down to roughly 57,000 search sessions by only looking at sessions that consist of two or more different searches, where at least one of these searches was either a “typed search” or a “suggested search”.
- In this context a “typed search” is defined as a query formulation process where the user typed each and every letter, digit, or punctuation that resulted in a search.
- A “suggested search” is defined as a query formulation process where the user typed something and selected a query suggestion that resulted in a search.
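The session filter described above can be sketched in a few lines of Python. This is only an illustration of the selection rule, not our actual analytics pipeline: the `Search` record, its field names, and the reading of "different searches" as distinct query strings are all assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Search:
    query: str
    kind: str  # hypothetical labels: "typed", "suggested", or anything else


def is_relevant_session(searches: list[Search]) -> bool:
    """Keep sessions with >= 2 different searches, at least one typed or suggested."""
    # Assumption: "different searches" means distinct query strings.
    distinct_queries = {s.query.strip().lower() for s in searches}
    has_typed_or_suggested = any(s.kind in {"typed", "suggested"} for s in searches)
    return len(distinct_queries) >= 2 and has_typed_or_suggested


# Made-up sample sessions, keyed by a session id.
sessions = {
    "a": [Search("tv", "typed"), Search("samsung tv", "suggested")],
    "b": [Search("tv", "typed")],                                    # only one search
    "c": [Search("shoes", "other"), Search("boots", "other")],      # no typed/suggested search
}

kept = [sid for sid, searches in sessions.items() if is_relevant_session(searches)]
print(kept)
```

Only session "a" passes both conditions; "b" has a single search and "c" contains neither a typed nor a suggested search.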
From here on, we compared the different search types in terms of their KPIs. Query suggestions have a large positive impact on the probability of add-2-cart and the probability of buy, and they substantially reduce the probability of a spelling mistake and the probability of a zero-result search. In other words, serving query suggestions improves all four metrics. Two further observations stand out:
- Query suggestions are used only if they are relevant and good enough to provide genuine guidance during the query formulation process. This is what we call the intent matching or retrieval process.
- The likelihood of influencing the user’s query formulation process with query suggestions is highly dependent on the session context resulting in a need for query and ranking flexibility. This is the so-called scoping, filtering, and ranking process.
3. The Task of matching user intent and serving query suggestions
How often have you already cursed your smartphone’s autocorrect?
Any query recommendation should be relevant to the user. If irrelevant information (false promises or unintended suggestions) appears too often, the user’s confidence in the results will diminish, as will engagement. During the intent matching or retrieval process, two parts mainly decide whether you can provide relevant and inspiring query suggestions that guide users: the first is the suggestion corpus, and the second is the matching strategy.
Building the suggestion corpus.
Let us first focus on the suggestion corpus. As with any data-driven application, the fundamental rule (bullshit in — bullshit out) still stands. The quality of the displayed query suggestions will mainly be dependent on building a quality corpus. A smart query suggestion solution needs to provide a robust process of building and updating the suggestion corpus(es).
This corpus may rely on different sources such as customer query logs, product data, or even other data pools like a knowledge graph. Only by combining these data sources can you provide the diversity your suggestions need. But this combination comes at a cost: redundancy.
Query suggestions that are semantically identical but spelled differently should only be displayed once, since there is no value in showing the same phrase in several close variants. For example:
- Singular vs. plural forms of nouns (“women dress” vs. “women dresses”)
- Order of words (“long blue dress” vs. “blue long dress”)
- Compound words (“dishwasher” vs. “dish washer”)
- With and without stop-words (“women dress” vs. “dress for women”)
- Special characters (“swell bottle” vs. “s’well bottle”)
- Alternative spellings (“barbecue” vs. “barbeque”)
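One simple way to think about this kind of deduplication is to map every phrase to a normalized key and keep only one suggestion per key. The sketch below is a deliberately naive illustration, not the semantic deduplication searchhub applies in production: the stop-word list, the alias and compound tables, and the crude plural rule are all hypothetical stand-ins for much richer language resources.

```python
import re

# All three tables are tiny, hypothetical examples.
STOP_WORDS = {"for", "the", "a", "an", "of", "with"}
ALIASES = {"barbeque": "barbecue"}              # alternative spellings
COMPOUNDS = {("dish", "washer"): "dishwasher"}  # split compounds to join


def singularize(token: str) -> str:
    """Very crude English plural stripping, good enough for the examples."""
    if token.endswith("sses"):
        return token[:-2]  # "dresses" -> "dress"
    if token.endswith("s") and not token.endswith("ss"):
        return token[:-1]  # "bottles" -> "bottle"
    return token


def dedup_key(phrase: str) -> tuple[str, ...]:
    # Drop apostrophes so "s’well" and "swell" collapse, then tokenize.
    text = phrase.lower().replace("'", "").replace("’", "")
    tokens = [t for t in re.findall(r"[a-z0-9]+", text) if t not in STOP_WORDS]
    merged, i = [], 0
    while i < len(tokens):
        pair = tuple(tokens[i:i + 2])
        if pair in COMPOUNDS:  # join known split compounds
            merged.append(COMPOUNDS[pair])
            i += 2
        else:
            merged.append(tokens[i])
            i += 1
    singular = [singularize(t) for t in merged]
    # Sorting the tokens makes the key independent of word order.
    return tuple(sorted(ALIASES.get(t, t) for t in singular))


def deduplicate(suggestions: list[str]) -> list[str]:
    """Keep the first suggestion seen for every key (input order = priority)."""
    seen: dict[tuple[str, ...], str] = {}
    for s in suggestions:
        seen.setdefault(dedup_key(s), s)
    return list(seen.values())


print(deduplicate(["women dress", "women dresses", "dress for women", "blue long dress"]))
```

Each of the six cases in the list above maps to the same key for both variants. A real system would need proper stemming or lemmatization and curated dictionaries per language; the point here is only that deduplication becomes a lookup once a good key exists.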
Being able to ingest, combine, clean, and update this suggestion corpus in almost real-time is the key challenge for every query suggestion system and, by the way, a demanding engineering task.
The query or user intent matching strategy
The second part is how to match the given user query or user intent against the corpus and respond with a relevant and helpful list of query suggestions. To do so you need a system that is able to handle the following cases in an intelligent and graceful way.
- query normalization and spell correction. Since user input tends to be messy, your system needs to provide normalization and spelling correction functionality. When a customer misspells a word or a phrase in the search box, autocomplete identifies the misspelling, fixes it on the fly, and displays the correctly spelled suggestions instead.
- partial and multi-matching. Multi-match is used in product searches to allow matching of different tokens of a phrase on the same product attribute or value.
To handle all these cases, your query suggestion system must provide different types of suggesters. With built-in suggesters, for example, you can choose an implementation that allows for fuzzy matches (celvin can return calvin) or another one that matches infixes (calvin can return calvinklein for men), but you can’t have both. A good query suggestion system can do both (celvin can return calvinklein for men).
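To make the combination of fuzzy and infix matching more concrete, here is a minimal sketch. It is a naive linear scan written for illustration, not how Lucene suggesters or smartSuggest work internally; the function names and the one-edit tolerance are assumptions. The idea: every query token must be a fuzzy prefix of some token anywhere in the suggestion.

```python
def edit_distance(a: str, b: str) -> int:
    """Classic Levenshtein distance using two rolling rows."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                # deletion
                           cur[j - 1] + 1,             # insertion
                           prev[j - 1] + (ca != cb)))  # substitution
        prev = cur
    return prev[-1]


def fuzzy_prefix(query: str, token: str, max_edits: int = 1) -> bool:
    """True if some prefix of `token` is within `max_edits` of `query`."""
    lo = max(0, len(query) - max_edits)
    hi = min(len(token), len(query) + max_edits)
    return any(edit_distance(query, token[:n]) <= max_edits for n in range(lo, hi + 1))


def suggest(query: str, corpus: list[str], max_edits: int = 1) -> list[str]:
    """Infix + fuzzy: each query token must fuzzily start some suggestion token."""
    q_tokens = query.lower().split()
    if not q_tokens:
        return []
    hits = []
    for phrase in corpus:
        p_tokens = phrase.lower().split()
        if all(any(fuzzy_prefix(q, p, max_edits) for p in p_tokens) for q in q_tokens):
            hits.append(phrase)
    return hits


corpus = ["calvinklein for men", "levis jeans"]  # made-up suggestion corpus
print(suggest("celvin", corpus))  # typo in the first token still matches
print(suggest("men", corpus))     # match on a token in the middle of the phrase
```

Scanning the whole corpus like this does not scale; production systems rely on specialized index structures instead. The sketch only shows that fuzzy and infix matching are not mutually exclusive as concepts.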
4. The art of scoping, filtering and ranking query suggestions
Once we have managed to get the matching or retrieval right and we can receive meaningful and helpful query suggestions we still have to work on the scoping, ranking, and filtering process to make the query suggestions even more relevant, more diverse, and more inspiring.
- query suggestion scoping. If we already know that we can help users articulate their intent by scoping a broad query (“tv”) with relevant categories or important features, the chances that they find what they are looking for increase.
- query suggestion filtering. There will always be situations where you need to exclude or filter specific suggestions based on different data points. Some common examples are:
  - false promises: query suggestions that make a false promise or yield zero results should be excluded from the autocomplete display.
  - blacklisted queries: some phrases may be suppressed via a blacklist.
- query suggestion ranking. Since we are going to present the user with a list of possible choices, ranking becomes a powerful tool to guide and inspire. Again, some common examples:
  - ranking query suggestions by business metrics: the most obvious approach is to rank suggested phrases by the number of search events, which works fairly well. But a higher number of search events does not necessarily mean a higher business value. Other metrics that affect business value, such as the number of sales or the margin, could be considered in the ranking. If business metrics are collected over a long period, it can be useful to boost the value of more recent events.
  - promote brands or important features: if a user types in a generic subject, say “tv”, a smart query suggestion system will use this opportunity to suggest tv brands, like “Samsung TVs”, or tv types, like “curved TVs” or “4K TVs”. This gives users a helpful suggestion and also applies a merchandiser’s business logic of promoting a brand or a specific type of tv.
  - promote query suggestions by geographic segmentation: considering the user’s geographic location might improve ranking results. Users coming from different countries might have different interests.
  - promote query suggestions based on taxonomy location: taking into account the user’s location in the product taxonomy might add context to the user query. For example, a user typing “t-shirt” while in the menswear section is probably more interested in shirts for men than in shirts for all genders.
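The filtering and ranking ideas above can be combined into one small scoring function. The following is a sketch under stated assumptions rather than the smartSuggest ranking model: the event weights, the 30-day half-life, the boost factor, and the `Suggestion` fields are all hypothetical values chosen to keep the arithmetic easy to follow.

```python
from dataclasses import dataclass, field

# Hypothetical tuning knobs, not values used by any real system.
EVENT_WEIGHTS = {"search": 1.0, "sale": 5.0}  # a sale counts more than a search
HALF_LIFE_DAYS = 30.0                         # an event loses half its value every 30 days
CONTEXT_BOOST = 2.0                           # multiplier when a tag matches the session context


@dataclass
class Suggestion:
    phrase: str
    result_count: int                    # number of products the phrase would return
    events: list[tuple[float, str]]      # (days_ago, event kind)
    tags: set[str] = field(default_factory=set)


def score(s: Suggestion, context_tags: set[str]) -> float:
    # Business value with exponential time decay, so recent events weigh more.
    base = sum(EVENT_WEIGHTS[kind] * 0.5 ** (days_ago / HALF_LIFE_DAYS)
               for days_ago, kind in s.events)
    return base * (CONTEXT_BOOST if s.tags & context_tags else 1.0)


def filter_and_rank(candidates, blacklist, context_tags, limit=5):
    # Filtering: drop false promises (zero results) and blacklisted phrases.
    allowed = [s for s in candidates
               if s.result_count > 0 and s.phrase.lower() not in blacklist]
    # Ranking: highest score first.
    allowed.sort(key=lambda s: score(s, context_tags), reverse=True)
    return [s.phrase for s in allowed[:limit]]


candidates = [
    Suggestion("t-shirt women", 120, [(0, "search")] * 3, {"womenswear"}),
    Suggestion("t-shirt men", 80, [(0, "search")] * 2, {"menswear"}),
    Suggestion("t-shirt vintage", 10, [(60, "sale")]),
    Suggestion("t-shirt xxs purple", 0, [(0, "search")] * 9),  # false promise
    Suggestion("t-shirt fake", 5, [(0, "search")] * 9),        # blacklisted
]
blacklist = {"t-shirt fake"}

print(filter_and_rank(candidates, blacklist, context_tags={"menswear"}))
print(filter_and_rank(candidates, blacklist, context_tags=set()))
```

In the menswear context, “t-shirt men” moves ahead of “t-shirt women” even though it has fewer search events, which mirrors the taxonomy example above. Geographic segmentation could be modelled the same way with location tags.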
Again, a system with maximum flexibility helps you improve over time and adapt to upcoming trends, new ideas, and business opportunities. Being able to influence and optimize the ranking of your query suggestions based on behavioral and conditional signals is crucial when it comes to anticipating a customer’s search intent and providing useful suggestions. These suggestions help guide the customer through the product discovery experience and remove barriers to finding new products online.
The smartSuggest lib
Since we at searchhub already solve the task of matching user intent, apply semantic deduplication, and provide query sharpening or query relaxation, we tried to find an existing system for serving query suggestions that also covers the three use cases identified above: scoping, filtering, and ranking query suggestions. Unfortunately, we could not find such a system, so we went down the path of building our own, based on Lucene.
With the data provider SPI, any kind of data source can be used to build up the suggestion corpus. The built data is then already tagged for boosting and filtering. It will then be indexed with different indexers, each one optimized for the particular search approach.
For a search request, we search these indexes one after the other until enough suggestions have been retrieved. This means that if there are enough prefix matches, no unnecessary fuzzy matching is done. In a final step, the results are ranked and optionally grouped and truncated. This way, we achieve maximum performance with the necessary feature set: everything that is not needed is skipped and does not affect response time.
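The "search the indexes one after the other" idea can be illustrated with a small cascade. This is a simplified sketch, not the smartSuggest implementation: it uses an in-memory dictionary instead of Lucene indexes, the matcher functions and the `calls` log are hypothetical, the fuzzy stage borrows `difflib.get_close_matches` from the Python standard library, and the optional grouping step is left out.

```python
import difflib

# Made-up corpus: suggestion phrase -> ranking weight.
CORPUS = {"tv": 90, "tv stand": 40, "tv wall mount": 30, "television": 20}

calls = []  # records which stages actually ran, for demonstration only


def prefix_matcher(query):
    calls.append("prefix")
    return [(p, w) for p, w in CORPUS.items() if p.startswith(query)]


def fuzzy_matcher(query):
    calls.append("fuzzy")
    close = difflib.get_close_matches(query, list(CORPUS), n=10, cutoff=0.6)
    return [(p, CORPUS[p]) for p in close]


# Cheapest stage first, more expensive stages later.
STAGES = [prefix_matcher, fuzzy_matcher]


def cascade_suggest(query, stages, wanted=5):
    """Run the stages in order and stop as soon as enough suggestions exist."""
    found = {}
    for matcher in stages:
        for phrase, weight in matcher(query):
            found.setdefault(phrase, weight)  # keep the first hit per phrase
        if len(found) >= wanted:
            break  # later (more expensive) stages are skipped
    ranked = sorted(found.items(), key=lambda kv: kv[1], reverse=True)
    return [phrase for phrase, _ in ranked[:wanted]]  # rank, then truncate


print(cascade_suggest("tv", STAGES, wanted=3), calls)
```

For the query "tv" the prefix stage alone yields enough suggestions, so the fuzzy stage never runs. A misspelled query such as "televsion" finds no prefix match and falls through to the fuzzy stage.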
The plugin is built as a production-grade workhorse, handling a load of up to 1,500 QPS. And for customers using our smart query suggestions, more than 40% of all user sessions already start with a clicked query suggestion, which we see as strong evidence of the quality of the suggestions served.
smartSuggest can be a powerful discovery tool when implemented correctly, since it offers a simple, clean API to shape query suggestions the way you want through contextual boosting tags, contextual filters and scopes, blacklists, and business ranking. smartSuggest is simple to integrate, too. You are only three steps away from testing it:
1. Provide your search analytics data API (e.g. GA) or use our Search Collector SDK
2. Start the smartSuggest service in your environment or request a SaaS instance
3. Integrate the smartSuggest API in your Frontend…
The value smarter query suggestions bring
Especially in mobile-first scenarios, where the smaller screen and keyboard limit the use of more traditional faceted search selectors, smart query suggestions do more than merely forecast the words or phrases the user is typing. smartSuggest goes a huge step further and anticipates the user’s intentions to make helpful suggestions.
These suggestions improve the user’s search experience, increasing both online conversion rates and average online cart value. Overall, smart query suggestions improve the customer’s experience while also helping the retailer’s merchandisers and the business bottom line. Investing in such features will consistently improve online conversion rates and the size of online shopping carts, especially on mobile devices. Given the high impact of this feature, retailers with a large online catalog are essentially leaving money on the table without a powerful smart query suggestion solution.
The technology behind search|hub is specifically designed to enhance our customers’ existing search, not replace it. With just two API calls, search|hub integrates as a proxy between your frontend application and your existing search engine(s), injecting its deep knowledge.
If you’re excited about advancing search|hub technology and enabling companies to create meaningful search experiences for the people around us, join us! We are actively hiring for senior Java DEVs and Data Scientists to work on next-generation API technology.