A client of ours is having problems with ElasticSearch (ES): some records (typically Account records, since those are what they search on most) are not returned by global search, even though they are present in the CRM and can be found through the list view. The vast majority of records are properly indexed, but some are not, and the client is concerned about how many other records might be affected without their knowledge.
We are running SugarCRM 8.0.0 (Build 211) with ElasticSearch 5.6.10 (single node). The infrastructure is based on Debian 8, and SugarCRM, ES and MySQL are all on separate servers.
There does not seem to be any pattern to the affected records: some are new, whilst others were part of the original data migration. We have performed ES reindexing (with and without clearData) numerous times since.
Interestingly, a simple workaround is to edit and save the missing record, which causes it to get picked up and properly indexed.
I've found several community posts about the same problem ("missing global search results until I manually re-save the record in question"), but the solutions they outline did not help us resolve the issue on our end.
Some of the advice I found:
- increase PHP memory_limit so that it's not lower than ES memory consumption - fine on our end, PHP has 6GB whereas ES is set up with a 2GB heap size
- check the job_queue table for failed/errored reindexing jobs - there aren't any on our end
- ensure the searched records are actually visible (teams/permissions) to the person doing the search - checked, not the case here
- increase the RAM available for the web server - checked, RAM looks good and we did not increase it
- make sure the ES server is not out of disk space (it was advised to leave 2.5x the data volume as free disk space) - checked, seems fine
The ES instance as such is up and running and all the relevant modules are search-enabled. The total count of indexed documents looks approximately ok and I can see the numbers growing in time which means that indexing does work.
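Since an approximate document count cannot show which records are missing, one option is to compare record IDs from the database against the index directly. The following is only a minimal sketch, not something taken from SugarCRM itself: it uses the Elasticsearch multi-get (`_mget`) API, and it assumes that documents are keyed by the SugarCRM record ID and that the document type equals the module name (as the `create_mapping` log entry further down suggests). The function names, `es_url`, `index` and `doc_type` are hypothetical placeholders.

```python
import json
import urllib.request


def build_mget_body(record_ids):
    """Build the JSON request body for the Elasticsearch _mget API."""
    return json.dumps({"ids": list(record_ids)}).encode("utf-8")


def missing_ids(mget_response):
    """Return the IDs that an _mget response reports as not found."""
    return [
        doc["_id"]
        for doc in mget_response.get("docs", [])
        if not doc.get("found", False)
    ]


def find_unindexed(es_url, index, doc_type, record_ids, batch_size=500):
    """Ask Elasticsearch, in batches, which record IDs have no document.

    Assumption: the index stores one document per record, keyed by record ID.
    """
    ids = list(record_ids)
    missing = []
    for start in range(0, len(ids), batch_size):
        batch = ids[start:start + batch_size]
        request = urllib.request.Request(
            f"{es_url}/{index}/{doc_type}/_mget",
            data=build_mget_body(batch),
            headers={"Content-Type": "application/json"},
            method="POST",
        )
        with urllib.request.urlopen(request) as response:
            missing.extend(missing_ids(json.load(response)))
    return missing
```

The ID list could come from a query such as `SELECT id FROM accounts WHERE deleted = 0`, and the result would be the exact set of Account records that exist in the database but have no document in the index, which is more useful for spotting a pattern than individual user reports.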
I checked RAM and disk space on the server that hosts ES:
- Current RAM utilization is at around 3GB out of 8GB available.
- Current disk utilization is at around 19%, 22GB out of 92GB.
The yml configuration for ES is simple and presents itself as follows (the real IP address was redacted):
network.host: 10.20.30.40 #redacted
The jvm options are as follows:
Looking at the recent ES logs, I did not identify any problems. In the last month I can see mostly GC details:
[2018-10-04T09:58:11,782][INFO ][o.e.m.j.JvmGcMonitorService] [node-1] [gc] overhead, spent [287ms] collecting in the last [1s]
[2018-10-08T10:03:51,243][INFO ][o.e.m.j.JvmGcMonitorService] [node-1] [gc] overhead, spent [331ms] collecting in the last [1.3s]
[2018-10-11T05:13:46,113][WARN ][o.e.m.j.JvmGcMonitorService] [node-1] [gc] overhead, spent [516ms] collecting in the last [1s]
[2018-10-20T05:32:15,278][INFO ][o.e.m.j.JvmGcMonitorService] [node-1] [gc] overhead, spent [554ms] collecting in the last [1.1s]
[2018-10-26T10:29:12,133][INFO ][o.e.m.j.JvmGcMonitorService] [node-1] [gc] overhead, spent [322ms] collecting in the last [1s]
[2018-10-26T10:29:13,136][INFO ][o.e.m.j.JvmGcMonitorService] [node-1] [gc] overhead, spent [271ms] collecting in the last [1s]
[2018-10-28T14:44:16,148][INFO ][o.e.c.m.MetaDataMappingService] [node-1] [8c5202da11c5318f90bfca50abc4991c_shared/BNiu_wrrSh-31Li75PYPTA] create_mapping [PLC_Placement_Categories]
[2018-10-28T15:02:34,060][WARN ][o.e.m.j.JvmGcMonitorService] [node-1] [gc] overhead, spent [1s] collecting in the last [1.5s]
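To judge whether these GC entries line up with the times when records went missing, it may help to turn each entry into a fraction of time spent collecting. The snippet below is a rough sketch that assumes the log format shown above; `parse_gc_overhead` is a hypothetical helper name, not part of any ES tooling.

```python
import re

# Seconds per unit for the time values that appear in the GC overhead lines.
_UNIT_SECONDS = {"ms": 0.001, "s": 1.0, "m": 60.0}

_GC_PATTERN = re.compile(
    r"^\[(?P<ts>[^\]]+)\]\[(?P<level>\w+)\s*\].*overhead, "
    r"spent \[(?P<spent>[\d.]+)(?P<spent_unit>ms|s|m)\] "
    r"collecting in the last \[(?P<window>[\d.]+)(?P<window_unit>ms|s|m)\]"
)


def parse_gc_overhead(line):
    """Return (timestamp, level, gc_fraction) for a GC overhead log line.

    Returns None for lines that are not GC overhead entries.
    """
    match = _GC_PATTERN.match(line)
    if match is None:
        return None
    spent = float(match["spent"]) * _UNIT_SECONDS[match["spent_unit"]]
    window = float(match["window"]) * _UNIT_SECONDS[match["window_unit"]]
    return match["ts"], match["level"], spent / window


if __name__ == "__main__":
    sample = (
        "[2018-10-28T15:02:34,060][WARN ][o.e.m.j.JvmGcMonitorService] "
        "[node-1] [gc] overhead, spent [1s] collecting in the last [1.5s]"
    )
    print(parse_gc_overhead(sample))
```

For the last entry above, this gives roughly two thirds of the sampled interval spent in garbage collection. Whether that is related to the missing documents is an open question, but timestamps with a high fraction could be compared against the modification dates of the affected records.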
Inspecting the SugarCRM database, I found a few things that caught my attention, but I am not 100% sure they are related or even something to be worried about. Maybe I am not fully aware of how things should work there:
- The fts_queue table is empty and remained that way during my investigation. I confirmed the queue was empty with the elastic:queue CLI command.
- The job_queue did not contain any entries with "FTS" in their name which seems odd to me.
- The job_queue did contain a lot of entries for "Elasticsearch Queue Scheduler", all of them were successful but also ALL of them had the message "No records currently in queue - nothing to do".
I am looking for advice on how to mitigate the issue described in the first paragraph, or at least on what to check and where to look further to pinpoint the actual cause of the problem. Thank you in advance to all who respond, any help is highly appreciated!