
The Solr Wildcard Problem And Multiterm Solution

A while ago I ran into a problem with Solr and wildcard searches. It turns out to be more of a little-known fact than a problem: Solr treats wildcards a bit differently than one would expect.

Normally, when a user enters a search term, we append '*' to it so that the search finds words starting with that term. Common sense dictates that when someone searches for "train*", the results will also contain the word "train" itself. But they don't. When looking for "train*", Solr returns "trainer", "training" and "trains", but it won't return results containing the word "train".

If you want documents matching the exact word as well, you have to change your Solr query. Instead of searching for "word*", search for "word OR word*"; only then will documents matching "word" turn up in the result set.
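As a minimal sketch of that workaround, here is how the query string could be built in Python. The function name `with_exact_match` is made up for this example, and the sketch assumes the term is a single word with no special query characters that would need escaping.

```python
def with_exact_match(term: str) -> str:
    """Build a query that matches the exact term or any word starting with it.

    Hypothetical helper: it does not escape Solr query syntax characters,
    so it is only suitable for plain single-word terms.
    """
    term = term.strip()
    if not term:
        raise ValueError("search term must not be empty")
    # "train" becomes "train OR train*"
    return f"{term} OR {term}*"


query = with_exact_match("train")
```

The resulting string would then be passed as the query to whatever Solr client you are using.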

After I discussed it with Nick Veenhof, it turned out that as of Solr 3.6 a new analyzer type has been introduced that addresses this shortcoming (although that seems to be more of a side effect): the multiterm analyzer. To use it, copy the query analyzer section of the text fieldType and change its type to "multiterm". With multiterm in place, a search for "train*" runs as if you had searched for "train", "trains", "training" and "trainer". You might be inclined to think that this is the same as searching for anything that starts with "train", but that is not the case. Without a multiterm analyzer, wildcard searches are not analyzed at all, so stemming, lowercasing and accent mapping are never applied to them.

Specifying the new multiterm analyzer also "expands" the wildcard search to the actual word, in our case "train", which is what we were looking for.
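The "copy the query analyzer and change the type" step can be illustrated with a small Python sketch. The fieldType below is a made-up, stripped-down example and not the one from any real schema.xml; the tokenizer and filter in it are just placeholders for whatever your own query analyzer contains. In practice you would most likely make this edit by hand, and the script only shows what the result should look like.

```python
import copy
import xml.etree.ElementTree as ET

# Hypothetical fieldType for illustration; your schema.xml will differ.
FIELD_TYPE = """
<fieldType name="text" class="solr.TextField">
  <analyzer type="index">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>
"""


def add_multiterm_analyzer(field_type_xml: str) -> str:
    """Copy the query analyzer of a fieldType and add it as type="multiterm"."""
    field_type = ET.fromstring(field_type_xml)
    query = field_type.find("analyzer[@type='query']")
    if query is None:
        raise ValueError("fieldType has no query analyzer to copy")
    # Only add the copy if no multiterm analyzer is defined yet.
    if field_type.find("analyzer[@type='multiterm']") is None:
        multiterm = copy.deepcopy(query)
        multiterm.set("type", "multiterm")
        field_type.append(multiterm)
    return ET.tostring(field_type, encoding="unicode")


print(add_multiterm_analyzer(FIELD_TYPE))
```

The printed fieldType has three analyzer sections: index, query, and a multiterm section that is an exact copy of the query one apart from its type attribute.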

If you're using Drupal to interface with Solr, there's an issue filed about this, along with a simple example explaining the difference between a non-analyzed wildcard search and a multiterm search.