Stemming is the process of reducing inflected (and sometimes derived) words to their stem, base, or root form.
Suffix Stemming Example
Original Token | Token After Stemming |
---|---|
Reading | Read |
Cars | Car |
Fixed | Fix |
Actively | Active |
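The table can be reproduced with a deliberately naive suffix-stripping sketch in Python. The suffix list and the `naive_stem` function are hypothetical, chosen only to match the four rows above. Production stemmers such as Porter or Snowball apply far more careful, multi-step rules.

```python
# Hypothetical suffix list, ordered so that longer suffixes are tried first.
SUFFIXES = ["ing", "ly", "ed", "s"]


def naive_stem(token: str) -> str:
    """Strip the first matching suffix, but only if at least 3 characters remain."""
    for suffix in SUFFIXES:
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)]
    return token  # no suffix matched: return the token unchanged


for word in ["Reading", "Cars", "Fixed", "Actively"]:
    print(word, "->", naive_stem(word))
# Reading -> Read
# Cars -> Car
# Fixed -> Fix
# Actively -> Active
```

The same crudeness shows why stemming needs care: `naive_stem("Lenses")` returns `"Lense"`.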
Objective of Stemming
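Stemming is done to improve the recall of search results: a query for "car" can also match documents that contain "cars". Although stemming improves recall, it can reduce precision, because unrelated words may be reduced to the same stem and match queries they have nothing to do with.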
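A small worked example makes the trade-off concrete. The four documents, the relevance judgments, and the crude suffix stripper below are all made up for illustration. Stemming lets the query `car` find `cars`, which raises recall, but the naive stripper also reduces `caring` to `car` and pulls in an irrelevant document, which lowers precision.

```python
# Hypothetical toy stemmer: strip one known suffix if at least 3 characters remain.
SUFFIXES = ["ing", "ly", "ed", "s"]


def naive_stem(token: str) -> str:
    for suffix in SUFFIXES:
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)]
    return token


# Hypothetical corpus; only d1 and d2 are actually about cars.
DOCS = {
    "d1": "used car dealer",
    "d2": "cars for sale",
    "d3": "caring for elderly parents",
    "d4": "bicycle repair",
}
RELEVANT = {"d1", "d2"}


def search(query: str, stem: bool) -> set[str]:
    """Return ids of documents containing the (optionally stemmed) query term."""
    normalize = naive_stem if stem else (lambda t: t)
    term = normalize(query)
    return {
        doc_id
        for doc_id, text in DOCS.items()
        if term in {normalize(tok) for tok in text.split()}
    }


def precision_recall(retrieved: set[str]) -> tuple[float, float]:
    hits = len(retrieved & RELEVANT)
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(RELEVANT)
    return precision, recall


for stem in (False, True):
    p, r = precision_recall(search("car", stem))
    print(f"stemming={stem}: precision={p:.2f} recall={r:.2f}")
# stemming=False: precision=1.00 recall=0.50
# stemming=True: precision=0.67 recall=1.00
```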
Solr Stemmers suitable for English
Stemmer Name | How It Works |
---|---|
Snowball Stemmer | Algorithmic and aggressive. |
Solr Stemmer | Less aggressive; typically targets the most common nouns and adjectives. |
Hunspell Stemmer | Both dictionary-based and rule-based. |
Customize Stemming
Sometimes the standard out-of-the-box stemmers do not stem words correctly for a particular domain. I have seen cases where product names or author names (proper nouns) were trimmed because of the stemmer configuration. The solr.KeywordMarkerFilterFactory class is useful for protecting a predefined set of words from stemming; it basically lets us create an exception list.
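```xml
<fieldType name="productName" class="solr.TextField">
  <analyzer>
    <!-- Split the input into tokens on whitespace -->
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <!-- Mark every token listed in productNames.txt as a keyword -->
    <filter class="solr.KeywordMarkerFilterFactory" protected="productNames.txt"/>
    <!-- Stem all tokens that were not marked as keywords -->
    <filter class="solr.PorterStemFilterFactory"/>
  </analyzer>
</fieldType>
```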
The protected words go in productNames.txt, one word per line:

```
# add product names here to avoid stemming
Adidas
Lenses
Belts
```
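With the above configuration, KeywordMarkerFilterFactory marks each word listed in productNames.txt as a keyword, and PorterStemFilterFactory leaves those keyword tokens unstemmed. For this to work, the marker filter must come before the stemmer in the analyzer chain.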
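To illustrate how the marker and the stemmer cooperate, here is a minimal Python sketch of the same three-stage chain. It is not Solr code: `Token`, `keyword_marker`, and `naive_stem_filter` are hypothetical stand-ins for the whitespace tokenizer, KeywordMarkerFilterFactory, and PorterStemFilterFactory, and the stemmer is a crude suffix stripper rather than Porter. The point is the ordering: the marker flags protected tokens first, and the stemmer skips anything flagged.

```python
from dataclasses import dataclass

# Stands in for the contents of productNames.txt (matched case-sensitively here).
PROTECTED = {"Adidas", "Lenses", "Belts"}
# Hypothetical suffix list for the toy stemmer.
SUFFIXES = ["ing", "ly", "ed", "s"]


@dataclass
class Token:
    text: str
    keyword: bool = False  # set by the marker stage; stemmer skips it when True


def whitespace_tokenize(text: str) -> list[Token]:
    return [Token(t) for t in text.split()]


def keyword_marker(tokens: list[Token], protected: set[str]) -> list[Token]:
    """Flag tokens found in the exception list as keywords."""
    for tok in tokens:
        if tok.text in protected:
            tok.keyword = True
    return tokens


def naive_stem_filter(tokens: list[Token]) -> list[Token]:
    """Strip one suffix from each non-keyword token (at least 3 chars must remain)."""
    for tok in tokens:
        if tok.keyword:
            continue
        for suffix in SUFFIXES:
            if tok.text.endswith(suffix) and len(tok.text) - len(suffix) >= 3:
                tok.text = tok.text[: -len(suffix)]
                break
    return tokens


def analyze(text: str, protected: set[str]) -> list[str]:
    tokens = whitespace_tokenize(text)
    tokens = keyword_marker(tokens, protected)  # must run before stemming
    tokens = naive_stem_filter(tokens)
    return [tok.text for tok in tokens]


print(analyze("Adidas Belts and Lenses for cars", PROTECTED))
# ['Adidas', 'Belts', 'and', 'Lenses', 'for', 'car']
```

Without the exception list, `analyze("Adidas Belts and Lenses for cars", set())` returns `['Adida', 'Belt', 'and', 'Lense', 'for', 'car']`, which is exactly the kind of product-name damage described above.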