Customizing Elasticsearch analysis in Sugar 7.6

Here is a guest post from a couple of members of the Sugar Developer community. Yann Berges and Cédric Mourizard from Synolia, an Elite SugarCRM Partner, share some insights on how to improve the quality of Elasticsearch results in Sugar.

Elasticsearch in Sugar 7

Elasticsearch has been included as a core feature of the Sugar application since Sugar 6.5, and it became a required component in Sugar 7.x releases. You can find information on installing, configuring, and monitoring Elasticsearch in the Knowledge Base.

It works very well and is pretty fast!

However, the default configuration is often too strict with diacritics, such as the accented letters found in languages like French. For example, the default configuration will not match e with é during a global search, which is not desirable for us.



Below we will explore how Elasticsearch Analyzers can be used to address this issue.  It is quite easy and does not require custom code!

To give global search this capability, you need to set up an Analyzer. Analyzers are how Elasticsearch generates tokens from the input data to be indexed. An analyzer can also chain several transformations, such as converting text to lowercase, excluding words with a stopword list, or applying regular expressions, among many other capabilities.
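To make the idea of an analysis chain concrete, here is a minimal sketch in plain PHP. It is not Elasticsearch code and does not reflect how Elasticsearch is implemented; the function names and the stopword list are hypothetical, and it assumes the mbstring extension is available. It only illustrates the principle: a tokenizer splits the text, then each filter transforms the token list in turn.

```php
<?php
// Illustrative only: a tiny stand-in for an analysis chain, not Elasticsearch code.
// All function names below are hypothetical.

// Tokenizer: split on anything that is not a letter or a digit.
function sketch_tokenize($text)
{
    return preg_split('/[^\p{L}\p{N}]+/u', $text, -1, PREG_SPLIT_NO_EMPTY);
}

// Filter: lowercase every token (assumes the mbstring extension).
function sketch_lowercase(array $tokens)
{
    return array_map(function ($token) {
        return mb_strtolower($token, 'UTF-8');
    }, $tokens);
}

// Filter: drop tokens that appear in a stopword list.
function sketch_stopwords(array $tokens, array $stopwords)
{
    return array_values(array_diff($tokens, $stopwords));
}

// The "analyzer": one tokenizer followed by a chain of filters.
function sketch_analyze($text)
{
    $tokens = sketch_tokenize($text);
    $tokens = sketch_lowercase($tokens);
    return sketch_stopwords($tokens, array('the', 'a', 'of'));
}

print_r(sketch_analyze('The Art of French Cooking'));
// tokens: art, french, cooking
```

In a real index, the resulting tokens are what gets stored and searched, which is why changing the filter chain changes what a query can match.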

Configuring Elasticsearch Analyzers in Sugar 7.6

One of the out-of-the-box token filters, called asciifolding, converts Unicode characters that fall outside basic ASCII into their ASCII equivalents, when such equivalents exist. You can add this kind of configuration very easily with an entry in $sugar_config.

Add the following configuration setting to your config_override.php at the root of your Sugar 7.6 installation.  You may need to create this PHP file if it doesn't exist already.  The significant part is the line where the 'asciifolding' filter is added.

SugarElasticsearchConfgWithAsciifolding.php

<?php

$sugar_config['full_text_engine']['Elastic']['index_settings']['default']['index'] = array(
    'analysis' => array(
        'analyzer' => array(
            'core_email_lowercase' => array(
                'type' => 'custom',
                'tokenizer' => 'uax_url_email',
                'filter' => array(
                    'lowercase',
                ),
            ),
            'standard' => array(
                'type' => 'custom',
                'tokenizer' => 'standard',
                'filter' => array(
                    'asciifolding',
                    'lowercase',
                ),
            ),
        ),
    ),
);

After updating your config, you need to run a Quick Repair & Rebuild and then perform a full system index, which rebuilds the index data using the new analyzer settings.

Now if one of your modules contains, for example, cooking recipes, you can search for “Saute” and find the right results, with or without accents, such as “Sautéed Tuna Steaks”.
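The effect can be sketched in plain PHP. This is only an illustration of the principle, not how Elasticsearch or Sugar implement it: the function names are hypothetical, the folding table covers just a handful of French characters, the mbstring extension is assumed, and prefix matching of query terms is an assumption made for the example. The key point is that the indexed text and the query go through the same folding, so accents no longer matter on either side.

```php
<?php
// Illustrative only: mimics the effect of 'asciifolding' + 'lowercase' in plain PHP.
// Function names and the folding table are hypothetical and far from complete.

function sketch_fold($token)
{
    $map = array(
        'é' => 'e', 'è' => 'e', 'ê' => 'e', 'ë' => 'e',
        'à' => 'a', 'â' => 'a', 'ç' => 'c',
        'î' => 'i', 'ï' => 'i', 'ô' => 'o',
        'ù' => 'u', 'û' => 'u', 'ü' => 'u',
    );
    // Lowercase first so the table only needs lowercase entries.
    return strtr(mb_strtolower($token, 'UTF-8'), $map);
}

// Split text into words and fold each one.
function sketch_tokens($text)
{
    $words = preg_split('/[^\p{L}\p{N}]+/u', $text, -1, PREG_SPLIT_NO_EMPTY);
    return array_map('sketch_fold', $words);
}

// Assumption: a query term matches when it is a prefix of an indexed token.
function sketch_matches($query, $document)
{
    $docTokens = sketch_tokens($document);
    foreach (sketch_tokens($query) as $term) {
        $found = false;
        foreach ($docTokens as $token) {
            if (strpos($token, $term) === 0) {
                $found = true;
                break;
            }
        }
        if (!$found) {
            return false;
        }
    }
    return true;
}

var_dump(sketch_matches('Saute', 'Sautéed Tuna Steaks')); // bool(true)
var_dump(sketch_matches('Sauté', 'Sauteed Tuna Steaks')); // bool(true)
```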



Enjoy your (Elastic)search!