Deadlock in "metadata_cache" table cause the system having no response

Symptoms:
When the deadlock happened, user could not do any operation as they got no response from system.
This issue also happened last year and it seems that it could be happened repeatedly. As the "metadata_cache" table being deadlock 3 times within 2 years and happened different time of day, it seems like random occurrence.

Investigation:
After we looking at the activity monitor in SQL Server Management Studio, we found that there were quite a few processes with "DELETE FROM metadata_cache WHERE id = 'xxxx-xxx-xx-xxx' " having deadlock. We also found that there are 3 records in metadata_cache table with the same type "meta_hash_base".

Action after encountering the issue:
The server being rebooted, the metadata_cache table was truncated and then asked user to login to the system. The records in metadata_cache table were then rebuilt and the system resumed.

Issue simulation:
I tried to create multiple "meta_hash_base" records in metadata_cache table. After I login to the system using a new browser instance, the system is then no response and the metadata_cache table being deadlock.

I would like to ask few questions regarding the metadata_cache table:

1. What is the role of metadata cache table? (There is an article on SugarCRM developer guide but it seems focus on the file and architecture on application server)
2. The operation that involved lookup on metadata cache table. (e.g. View the record in some modules)
3. How often will the entries in metadata cache table be updated / deleted / inserted?
4. Could we set the field "Type" as the key so that meta_hash_base entry will not be duplicated?

Thank you.

Environment: SugarCRM Pro 7.6.2.1, MS SQL Server 2014, PHP 5.4.45

  • I ran into this issue on an on-premise 7.7.0.0 instance in a Windows environment.  The fix from Sugar Support was always the same:
    1. Delete the contents of the 'cache' directory in the Sugar file system.
    2. Truncate the 'metadata_cache' table.
    3. Refresh a page in Sugar, so that both the table and the cache directory are rebuilt.

    Sugar Support also noted that: "We have made numerous bug fixes directly related to metadata cache problems available in 7.7.2.1."  You should try upgrading.  We eventually re-built the instance in ondemand and migrated the data to the new ondemand instance.

  • not very helpful but additional input:

    same experience but SugarCRM 7.9.4 on Windows environment and MS SQL.

    Truncate or delete from metadata_cache solves the problem. It occures randomly but only by using Studio or Repair and rebuild.

    Bests

    Björn

  • I guess that means they did not fix all of the metadata_cache issues in 7.7.2.1.  Have you upgraded past 7.9.4?  Did the issue continue?

    I've only ever seen this issue on the one Windows/MSSQL instance I inherited.  I've never had this issue on LAMP.

  • I can confirm this occurs on Windows / MSSQL system with SugarCRM 7.9.4

    All admin privileged people are aware of it and don't do any changes during office time to avoid productive disurbances so you can say it has a direct impact on how to use SugarCRM

    Bests

    Björn

  • Hi Justin,

    Thank you for your reply.

    When I take a look at the release note of version 7.7.2.1,

    it stated that:

    "Known Issue 74919 : Performing certain actions (e.g. Quick Repair and Rebuild) in Sugar that rebuild the cache files may cause unexpected issues in the system if there are multiple users logged in and utilizing Sugar."

    However, they did not mention any changes related to cache in 7.7.2.1.

    May I ask the server condition when you tackle the deadlock issue? I found that the DB server having high memory usage (Used to occupied all the system memory, then the system having very slow / no response) and limited the DB Server to occupied up to 8x% of system memory. I'm not sure if this issue is triggered by high memory usage and multiple user logged in and using the function in Sugar.

    Update the version would be definitely an option to solve / alleviate the problem and we would schedule to update to current supported version.

    Thanks.

    Kent

  • Well, I used to ask for the help from sugar support and the technical support reply that he could not simulate the issue in Linux and MySQL. So it maybe the way that MS SQL server handle under certain condition is different from MySQL.

    Thank you for your reply.

    Kent

  • Hi Björn,

    Thank you for your reply.

    In my case, it happened even when the system is not running "Repair and rebuild" and it seems that the deadlock was triggered by large number of users logged in and the changes of cache files / metadata table under the MS SQL server environment. I could confirm that "Repair and rebuild" would only be carried out when there are some customization deployed in production environment during non-peak hour and there was no deployment being carried out when deadlock happened.

    As the "metadata_cache" table being deadlock 3 times within 2 years and I found that all 3 cases having multiple "meta_hash_base" records in metadata table, under this situation, the focus of deadlock table may turns to:

    1. Why the records in metadata table being duplicated?

    2. How often when the records in metadata changed (Update / Delete)?

    3. How to prevent the record being duplicated?

    Thanks.

    Kent

  • We never did figure out what was causing the issue or even do much debugging around it.  The metadata_cache issue was a small problem compared to some other issues the instance had.  As I said, we inherited the instance with many issues and eventually migrated it to ondemand by configuring/customizing a brand new instance and then migrating the data via StarfishETL.

  • Hi Justin,

    Thanks for your feedback.

    In my case, it would be confusing as there are many users using the system and the action of "Truncate and Rebuild metadata_cache" table would cause around 10 minutes downtime. Also, the metadata_table deadlock behavior is unpredictable so that extra effort in handling this issue is required. Unless there are some preventive measures on prevent the record in metadata_table being corrupted / duplicated.

    Due to some limitations, we could not inherit the instance from on-premise to on-demand one. The way I could think of is to consider upgrading the instance to latest version of SugarCRM and see if it helps. I still would like to know if there is any way to prevent metadata_table being deadlock.

    Thanks.

    Kent

  • We faced exactly same issue on Linux + MSSQL (SugarCRM ENT 7.9.2).

    Alex Nassi is there anyone which are able to take care of this issue? I just filed the Case #342495

    Regards

    André Lopes
    Lampada Global
    Skype: andre.lampada