Searching hyphenated words in SharePoint Server 2013

One of the issues we came across recently was that SharePoint Search wasn’t returning results for words that were hyphenated across a line break (e.g. commun-ication). This was a problem especially in PDF documents.
It looks like “-” is treated as a word-break character in SharePoint, so the two halves of the word are indexed as separate tokens and a search for the whole word finds nothing.

Word breaking is one of the key Natural Language Processing (NLP) features that enable search and improve search results. Word breakers split a stream of text into individual words or tokens on which additional language processing can happen. Word breakers are language-specific. In addition to built-in word breakers, Search in SharePoint 2013 enables the use of custom word breakers so that users can manage word breaking behavior according to their needs.
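
To make the effect concrete, here is a toy sketch in Python of what breaking on “-” does to a hyphenated word. It is only an illustration: the function name break_words and the regular expressions are assumptions made for this example, not how the SharePoint word breaker is actually implemented.

```python
import re


def break_words(text, breakers=r"[\s\-]+"):
    """Split text into lowercase tokens on the given break characters.

    Illustrative only: a real word breaker is language-specific and
    far more involved than a single regular expression.
    """
    return [token.lower() for token in re.split(breakers, text) if token]


# With "-" treated as a break character, the hyphenated word becomes two tokens.
with_hyphen_break = break_words("commun-ication skills")

# Breaking on whitespace only keeps the hyphenated form as a single token.
without_hyphen_break = break_words("commun-ication skills", breakers=r"\s+")

print(with_hyphen_break)     # ['commun', 'ication', 'skills']
print(without_hyphen_break)  # ['commun-ication', 'skills']
```

In both cases the token "communication" never appears in the output, which is why a query for the full word returns nothing until the word breaking behavior is changed.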

How to switch to a custom word breaker in SharePoint Server 2013
Follow these steps to replace the existing word breaker with either a custom word breaker or a word breaker for another language.

Open the Registry Editor, as follows:
Choose Start, and then choose Run.
In the Open dialog box, type Regedit, and then choose OK.
In Registry Editor, select the following registry subkey:
HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Office Server\15.0\Search\Setup\ContentIndexCommon\LanguageResources\Default\Your language (where Your language is the subkey for the language whose word breaker you want to replace).
Modify the WBDLLPathOverride registry value. In the Edit String dialog box, in the Value data box, type the path to your custom word breaker DLL, and then choose OK. The new DLL should be located in the same path as the existing DLL that is being replaced.
Modify the WBreakerClass registry value. In the Edit String dialog box, in the Value data box, type the class ID of your custom word breaker, and then choose OK.
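
If you need to repeat the two registry edits above on several servers, one option is to generate a .reg file rather than typing the values by hand. The following Python sketch only builds the file text and does not touch the registry. The helper name build_reg_file, the DLL path, and the class ID are hypothetical placeholders, and "Your language" is left as the same placeholder used in the steps above. I have not run the result against a SharePoint farm, so treat it as a starting point.

```python
def build_reg_file(language_key, dll_path, class_id):
    """Return .reg file text that sets the two word breaker values.

    language_key, dll_path and class_id are supplied by the caller;
    nothing here checks that they exist on the server.
    """
    subkey = (
        "HKEY_LOCAL_MACHINE\\SOFTWARE\\Microsoft\\Office Server\\15.0\\Search"
        "\\Setup\\ContentIndexCommon\\LanguageResources\\Default\\" + language_key
    )

    def quote(value):
        # String values in a .reg file escape backslashes and double quotes.
        return '"' + value.replace("\\", "\\\\").replace('"', '\\"') + '"'

    lines = [
        "Windows Registry Editor Version 5.00",
        "",
        "[" + subkey + "]",
        '"WBDLLPathOverride"=' + quote(dll_path),
        '"WBreakerClass"=' + quote(class_id),
        "",
    ]
    return "\n".join(lines)


# Hypothetical placeholder values: substitute your own language subkey,
# DLL path and class ID before using the output.
reg_text = build_reg_file(
    language_key="Your language",
    dll_path=r"C:\Example\CustomWordBreaker.dll",
    class_id="{00000000-0000-0000-0000-000000000000}",
)
print(reg_text)
```

Review the generated text before importing it, and export the existing subkey first so you can restore the original word breaker if needed.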

Restart the SharePoint Search Host Controller and SharePoint Server 2013.

Run a full crawl of the affected content sources so that the content is re-indexed with the new word breaker.

Searches for the hyphenated words should now return the results you expect.
