How to Customize Stemming in Apache Solr?

Stemming is the process of reducing inflected (or sometimes derived) words to their stem, base, or root form.

Suffix Stemming Example

Original Token    Token After Stemming
Reading           Read
Cars              Car
Fixed             Fix
Actively          Active

Objective of Stemming

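Stemming is done to improve search recall: a query for "car" also matches documents that contain "cars". The trade-off is precision. Because different words can be reduced to the same stem, some of the extra matches will be irrelevant.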

Solr Stemmers suitable for English

Stemmer Name        How It Works
Snowball Stemmer    Aggressive and algorithm-based.
Solr Stemmer        Less aggressive; typically targets the most common nouns and adjectives.
Hunspell Stemmer    Both dictionary- and rule-based.
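
For illustration, here is a rough sketch of how two of these stemmers might be wired into a field type in the schema. The field type names (text_snowball, text_hunspell) and the dictionary file names (en_US.dic, en_US.aff) are assumptions made for this example; the Hunspell files would have to exist in the core's conf directory.

```xml
<!-- Hypothetical field type using the algorithmic Snowball stemmer -->
<fieldType name="text_snowball" class="solr.TextField">
  <analyzer>
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.SnowballPorterFilterFactory" language="English"/>
  </analyzer>
</fieldType>

<!-- Hypothetical field type using the dictionary- and rule-based Hunspell stemmer.
     en_US.dic and en_US.aff are assumed file names, not shipped defaults. -->
<fieldType name="text_hunspell" class="solr.TextField">
  <analyzer>
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.HunspellStemFilterFactory"
            dictionary="en_US.dic"
            affix="en_US.aff"
            ignoreCase="true"/>
  </analyzer>
</fieldType>
```

Only one stemmer should normally appear in a given analyzer chain, which is why the two are shown as separate field types.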

Customize Stemming

Sometimes the standard out-of-the-box stemmers do not stem words the way a particular domain requires. I have seen scenarios in which a product name or an author name (a proper noun) gets trimmed because of the stemmer configuration. The solr.KeywordMarkerFilterFactory class is useful for protecting a predefined set of words from stemming. In effect, it lets us create an exception list.
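
A minimal sketch of such a configuration is shown here. The field type name (text_product), the tokenizer, and the lowercase filter are assumptions chosen for illustration; the essential part is that KeywordMarkerFilterFactory appears before the stemmer and that its protected attribute names the exception file.

```xml
<!-- Hypothetical field type; only the two filters marked below come from the article -->
<fieldType name="text_product" class="solr.TextField">
  <analyzer>
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <!-- Marks every token listed in productNames.txt as a keyword -->
    <filter class="solr.KeywordMarkerFilterFactory" protected="productNames.txt"/>
    <!-- The stemmer skips tokens that were marked as keywords -->
    <filter class="solr.PorterStemFilterFactory"/>
  </analyzer>
</fieldType>
```

The productNames.txt file sits in the same conf directory as the schema and lists one protected word per line. Because this sketch lowercases tokens before the marker filter runs, the entries in the file would need to be lowercase as well, unless ignoreCase="true" is added to KeywordMarkerFilterFactory.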

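With KeywordMarkerFilterFactory placed ahead of PorterStemFilterFactory in the analyzer chain and its protected attribute pointing to productNames.txt, the words listed in that file are marked as keywords, and PorterStemFilterFactory leaves them unstemmed.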
