Elasticsearch Indexing Jobs in Sugar 7

Blog Post created by Matt Marum on Sep 14, 2015
In this post, Jelle Vink, SugarCRM's Security Architect and resident Elasticsearch expert, explains how the Sugar Job Scheduler and Job Queue affect Sugar 7's record indexing behavior.

 

Cron.php Execution

 

When cron.php is executed, there is a limit on how many jobs the driver executes and on how long it will run. When either maximum is reached, the current cycle terminates. The default maximums are 25 jobs and 1,800 seconds. Both can be changed in config_override.php:

$sugar_config['cron']['max_cron_jobs'] = 25;
$sugar_config['cron']['max_cron_runtime'] = 1800;
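To illustrate how the two limits interact, here is a minimal sketch in plain PHP. It is not Sugar's actual cron driver: the function name runCronCycle, the simulated job durations, and the choice to check both limits before each job starts are all assumptions made for illustration.

```php
<?php
// Hypothetical sketch, not Sugar's real driver: simulates one cron cycle.
// $jobDurations holds the simulated run time (in seconds) of each queued job.
function runCronCycle(array $jobDurations, int $maxJobs = 25, int $maxRuntime = 1800): int
{
    $executed = 0;
    $elapsed = 0;
    foreach ($jobDurations as $duration) {
        // Assumption: both limits are checked before a job is started.
        if ($executed >= $maxJobs || $elapsed >= $maxRuntime) {
            break;
        }
        $elapsed += $duration;
        $executed++;
    }
    return $executed;
}

// 40 short jobs: the job limit ends the cycle first.
echo runCronCycle(array_fill(0, 40, 10)), "\n";  // 25
// 10 slow jobs of 600 seconds each: the runtime limit ends the cycle first.
echo runCronCycle(array_fill(0, 10, 600)), "\n"; // 3
```

In both calls the remaining jobs simply stay queued for a later cycle.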

 

There is also a minimum interval between cycles, expressed in minutes (the default is 1). If cron is executed several times in quick succession, it only does work once the minimum interval has elapsed since the previous cycle. To allow a new cycle to start immediately after the previous one finishes, set the interval to 0:

 $sugar_config['cron']['min_cron_interval'] = 0;
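The interval check can be pictured roughly as follows. This is a simplified sketch rather than Sugar's implementation: the helper name cronIntervalMet, the use of Unix timestamps, and the exact comparison are assumptions.

```php
<?php
// Hypothetical helper: decides whether a cron invocation should do any work.
// Timestamps are Unix seconds; the interval is expressed in minutes.
function cronIntervalMet(int $lastRun, int $now, int $minIntervalMinutes = 1): bool
{
    // Assumption: a cycle may start once the full interval has elapsed.
    return ($now - $lastRun) >= $minIntervalMinutes * 60;
}

$lastRun = 1000000;
var_dump(cronIntervalMet($lastRun, $lastRun + 20));    // bool(false): too soon with the default
var_dump(cronIntervalMet($lastRun, $lastRun + 60));    // bool(true)
var_dump(cronIntervalMet($lastRun, $lastRun + 20, 0)); // bool(true): interval set to 0
```

With the interval set to 0, every invocation passes the check, which is what makes back-to-back runs useful during development.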

 



 

 

Elasticsearch Job Creation

 

A number of schedulers are configured out of the box in Sugar 7. When cron is executed, the driver starts by executing the schedulers that are due. These schedulers are not jobs themselves. They simply create new jobs to be executed. These jobs are then stored in the job_queue table.

 

Once the schedulers have created the necessary jobs, the driver starts executing them based on order of creation, status, job delay and execution time. For Elasticsearch there is one scheduler, which is configured to run as often as possible - that is, every time cron is executed. This scheduler creates a consumer job for every module that has queued Elasticsearch records in the fts_queue table.

When a full reindex has been triggered by a Sugar Administrator, a consumer job is created and queued for every FTS-enabled module.
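As a rough picture of "one consumer job per module with queued records", consider the sketch below. The function name createConsumerJobs, the 'module' array key and the job labels are hypothetical and are not taken from Sugar's schema or code.

```php
<?php
// Hypothetical sketch: derive one consumer job per module from queued records.
// Each row mimics a queued record; the 'module' key is an illustrative name,
// not necessarily the real column name in fts_queue.
function createConsumerJobs(array $queuedRows): array
{
    $modules = [];
    foreach ($queuedRows as $row) {
        $modules[$row['module']] = true; // de-duplicate by module
    }
    $jobs = [];
    foreach (array_keys($modules) as $module) {
        $jobs[] = 'consumer:' . $module; // illustrative job label
    }
    return $jobs;
}

$queue = [
    ['module' => 'Accounts'],
    ['module' => 'Contacts'],
    ['module' => 'Accounts'],
];
print_r(createConsumerJobs($queue));
```

Three queued records across two modules yield two consumer jobs, one for Accounts and one for Contacts. After a full reindex, the same logic would produce one job per FTS-enabled module.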

 

Always remember that your Elasticsearch jobs are not alone in the job queue. Other schedulers create jobs too, for example for email reminders, database pruning and checking inbound mailboxes. Jobs can also be created outside of schedulers via logic hooks or other custom code.

 

Job execution

 

As explained above, the cron driver runs at most 25 jobs (by default) from the queue during each cycle. There is no guarantee that these are going to be Elasticsearch jobs, because other jobs may also be waiting in the queue. Elasticsearch jobs are not given priority: all jobs are treated equally to guarantee that every job is executed eventually.

 

For Elasticsearch-specific jobs there is also a maximum number of records that one job will consume from the queue. As explained above, one Elasticsearch (consumer) job processes a single module. By default, a consumer job processes at most 15,000 records for its module. This can be configured using the following setting.

$sugar_config['search_engine']['max_bulk_query_threshold'] = 15000;
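The practical consequence is simple arithmetic: a module with more queued records than the threshold needs more than one consumer job run. The helper below is a hypothetical illustration (the name consumerRunsNeeded is not part of Sugar), and it assumes each run consumes exactly up to the threshold.

```php
<?php
// Hypothetical helper: how many consumer job runs a single module needs,
// assuming each run consumes at most $threshold queued records.
function consumerRunsNeeded(int $queuedRecords, int $threshold = 15000): int
{
    return (int) ceil($queuedRecords / $threshold);
}

echo consumerRunsNeeded(15000), "\n"; // 1
echo consumerRunsNeeded(40000), "\n"; // 3
```

If only one consumer job per module is created in each cron cycle, a module with 40,000 queued records would therefore need at least three cycles with the default threshold.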

 

Effects on Elasticsearch indexing

 

In the demo data, no single module has more than 15,000 records. The only limiting factor here is the number of jobs created, which in certain cases exceeds the default of 25. To get everything indexed during a full reindex, on average at least 2 cron runs are needed.

When testing Elasticsearch (full) reindexing after running cron, you should ensure that there are no records left in the fts_queue table. This is the only confirmation that all records are present in Elasticsearch.  A single cycle may not be enough to ensure all records have been indexed!
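The "at least 2 cron runs" estimate follows from the job limit alone. The simulation below is a simplified sketch: the function name cyclesToDrain and the figure of 40 queued jobs are illustrative assumptions, and it ignores new jobs arriving and the runtime limit.

```php
<?php
// Hypothetical simulation: count cron cycles until a fixed set of jobs is done.
// Assumes no new jobs arrive and that each cycle is only limited by $maxJobs.
function cyclesToDrain(int $queuedJobs, int $maxJobs = 25): int
{
    if ($maxJobs < 1) {
        throw new InvalidArgumentException('maxJobs must be at least 1');
    }
    $cycles = 0;
    while ($queuedJobs > 0) {
        $queuedJobs -= min($queuedJobs, $maxJobs);
        $cycles++;
    }
    return $cycles;
}

echo cyclesToDrain(40), "\n";      // 2
echo cyclesToDrain(40, 500), "\n"; // 1
```

With the default limit of 25, 40 queued jobs take two cycles. Raising the limit to 500 lets the same queue drain in one, provided the runtime limit is not reached first.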

 

While this may cause an issue for Sugar Developers doing local development without cron set up, it is not an issue on a properly configured production system. For example, once a cron cycle stops after 25 jobs, the next cycle will happen soon - we typically recommend triggering cron every minute. That next run will pick up the next 25 jobs, and so on, until indexing is complete.

 

Additional Elasticsearch fine tuning

 

The following config_override options are available for an admin to fine tune indexing performance. This might change in the future, as we are considering moving our queue out of the Sugar database. The values below are the defaults:

$sugar_config['search_engine']['max_bulk_query_threshold'] = 15000;
$sugar_config['search_engine']['max_bulk_delete_threshold'] = 3000;
$sugar_config['search_engine']['force_async_index'] = false;
$sugar_config['search_engine']['max_bulk_threshold'] = 100;

 

Development / QA recommendations

We recommend adding the following to our deployment automation to avoid issues with Elasticsearch (re)indexing and general cron usage.

 

All changes have to be done in config_override.php:

$sugar_config['cron']['max_cron_jobs'] = 500;
$sugar_config['cron']['min_cron_interval'] = 0;

 

This ensures that when a QA person or Sugar Developer executes cron.php multiple times in a short time frame, cron runs immediately each time and tends to clear the queue fully, even when there are a lot of jobs to be run.
