Do you know Querqy? If you have a Lucene-based search engine in place - which will be Solr or Elasticsearch in most cases - you should have heard about Querqy (sounds like "quirky"). It's a powerful query parsing and enhancement engine. It uses different rewriters to add context to incoming search queries. The most basic rewriter uses a manual rule configuration to add synonyms, filters, and up- and down-boosts to the final Lucene query. Other rewriters handle decomposition, number-unit normalization, and replacements.
Error tolerance: know when to say when
If you use another search engine, you most likely have similar tools to handle synonyms, filtering, and so on. So this post is also for you because search engines all share one big problem: rules have to be maintained manually! And all those rules are not error-tolerant. So let's have a look at some examples.
Example 1 - the onsite search typo
Your rule: synonym "mobile" = "smartphone"
The query: "mobil case"
As you can see, this rule won't match because of the missing "e" in "mobile". So in this example, the customer won't see the smartphone cases.
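To make the failure concrete, here is a minimal sketch of exact-token synonym expansion. It is a toy illustration, not Querqy's implementation: the `SYNONYMS` table and the `expand` function are hypothetical names made up for this example.

```python
# Toy rule table (hypothetical): a token maps to the synonyms added to the query.
SYNONYMS = {"mobile": ["smartphone"]}


def expand(query: str) -> list[str]:
    """Return the query tokens plus synonyms for tokens that match a rule exactly."""
    expanded = []
    for token in query.lower().split():
        expanded.append(token)
        # Exact lookup only: a single missing letter means no match.
        expanded.extend(SYNONYMS.get(token, []))
    return expanded


print(expand("mobile case"))  # ['mobile', 'smartphone', 'case']
print(expand("mobil case"))   # ['mobil', 'case'] -> the synonym is not applied
```

The lookup is a plain dictionary access, so there is no notion of "close enough". That is exactly the missing error tolerance described above.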
Example 2 - the search term composition
The same rule, another query: "mobilecase"
Again, the synonym won't be applied since the words are not separated correctly. For such queries, you should consider Querqy's word-break rewriter.
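The idea behind splitting such a query can be sketched with a simple dictionary lookup. This is only an assumption about how word breaking works in principle, not a description of Querqy's word-break rewriter; `VOCABULARY` and `split_compound` are hypothetical names, and a real engine would derive its vocabulary from the index.

```python
# Hypothetical vocabulary of known terms.
VOCABULARY = {"mobile", "case", "smartphone"}


def split_compound(token: str, vocabulary: set[str] = VOCABULARY) -> list[str]:
    """Split a token into two known words if possible, else return it unchanged."""
    if token in vocabulary:
        return [token]
    # Try every split position and accept the first one where both halves are known.
    for i in range(1, len(token)):
        left, right = token[:i], token[i:]
        if left in vocabulary and right in vocabulary:
            return [left, right]
    return [token]


print(split_compound("mobilecase"))  # ['mobile', 'case']
print(split_compound("mobilcase"))   # ['mobilcase'] -> a typo still defeats the split
```

Note that the split itself depends on exact matches again, so a composed query with a typo falls through both the word break and the synonym rule.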
Example 3 - search term word order
Your rule: synonym "women clothes" = "ladies clothes"
The query: "clothes for women" or "women's outdoor clothes"
A particular problem arises when using rules for multiple words. There will be many cases where the word order changes or other words appear in between, and the rules won't match anymore.
These are just a few constructed examples, but there are plenty more. None of them are fundamental, but they stack up quickly. Additionally, different languages come with their own nuances and tricky spelling issues. For us in Germany, word compositions are one of the most significant problems. In our experience, at least 10-20% of search traffic contains queries with such errors. And we know that there is even more potential for improvement: our working hypothesis is that around 30% of traffic can be rephrased into a unified and corrected form.
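A multi-word rule typically has to match a contiguous token sequence. The following sketch assumes that behavior in order to show why both example queries miss the rule; `PHRASE_SYNONYMS` and `matches_phrase_rule` are hypothetical names, not part of any real library.

```python
# Hypothetical phrase rule: the input phrase maps to its synonym phrase.
PHRASE_SYNONYMS = {("women", "clothes"): ("ladies", "clothes")}


def matches_phrase_rule(query: str) -> bool:
    """Return True if any rule phrase occurs as a contiguous token sequence."""
    tokens = tuple(query.lower().split())
    for phrase in PHRASE_SYNONYMS:
        n = len(phrase)
        # Slide a window of the phrase length over the query tokens.
        if any(tokens[i:i + n] == phrase for i in range(len(tokens) - n + 1)):
            return True
    return False


print(matches_phrase_rule("women clothes"))            # True
print(matches_phrase_rule("clothes for women"))        # False: different order
print(matches_phrase_rule("women's outdoor clothes"))  # False: other tokens
```

Covering these variants with rules alone would mean one extra rule per ordering and per inflection, which is how rule sets grow so quickly.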
What options do you have? Well, you could add many more rules, but you'll run into the following problem: Complexity.
We've seen many home-grown search configurations with thousands of rules. Over time, these become problematic because the product base changes, meaning old rules lead to unexpected results. For example, the synonym rule "pants" = "jeans" was a good idea once, but since the data changed, you now get a lot of mismatches because, in the meantime, the word "jeans" has come to reference many different concepts.
SearchHub - your onsite search's intuitive brain!
With SearchHub, we reduce the number of manual rules by unifying misspellings, composition and word-order variants, and conceptually similar queries.
If you don't know SearchHub yet: our solution groups different queries with the same intent and picks the best candidate for each group. Then, at search time, we transform unwanted query variants into their respective best candidate.
What does that mean for your rules? First, you can focus on error-free, unified, and standard queries. SearchHub handles all spelling errors, composition alternatives, and word-order variations.
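Conceptually, the search-time step is a lookup from a query variant to the best candidate of its group. The sketch below illustrates only that idea and is not SearchHub's API: the `CLUSTERS` data is hand-written for this example, the names are hypothetical, and how groups and best candidates are actually determined is not shown here.

```python
# Hypothetical groups: best candidate -> known variants with the same intent.
CLUSTERS = {
    "mobile case": ["mobil case", "mobilecase", "case mobile"],
    "women clothes": ["clothes for women", "womens clothes"],
}

# Invert the groups into a lookup table: variant -> best candidate.
VARIANT_TO_BEST = {
    variant: best for best, variants in CLUSTERS.items() for variant in variants
}


def normalize(query: str) -> str:
    """Map a known variant to its best candidate; pass other queries through."""
    key = " ".join(query.lower().split())  # lowercase and collapse whitespace
    return VARIANT_TO_BEST.get(key, key)


print(normalize("Mobil  Case"))        # mobile case
print(normalize("clothes for women"))  # women clothes
print(normalize("red shoes"))          # red shoes (unknown queries are unchanged)
```

With such a step in front of the rule engine, a single synonym rule for "mobile" would also cover the typo and composition variants, because they arrive already rewritten.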
Additionally, you can forgo rules whose only purpose is to add context to your queries. For example, it might be tempting to add "apple" when someone searches for "iphone". But this could lead to false positives when searching for iPhone accessories from different brands. SearchHub, on the other hand, only adds context to queries where people actually search for such connections. In the case of ambiguous queries, you can further split them into two distinct intents.
Use the best tools
Querqy is great. It allows you to add the missing knowledge to the user's queries. But don't misuse it for problems like query normalization and unified intent formulation; for that, there's SearchHub. The combination of these tools makes for a perfect symbiosis. Each one increases the effectiveness of the other. Leveraging both will make your query parsing method a finely tuned solution.